PO format for translations

Started by Alan v.Drake, Mon 24/04/2023 14:46:56


Alan v.Drake

Since I have some time right now, I'm trying to finish implementing basic support for PO translations.
So, I'd like to discuss some things and plan what to implement.

I have an open ticket for this activity:
https://github.com/adventuregamestudio/ags/issues/1780

What are PO translations?
The PO format is a widely adopted standard for translating strings.

What are the advantages over .trs files?
It's more robust.
Currently you can mistakenly add spaces to an original string, and the Editor ends up treating it as a new entry.
Or you can add a newline, which swaps original and translation in every subsequent "original<->translation" pair.
With this new format, besides avoiding those problems, we can add some metadata which may come in handy, and most importantly, there are plenty of mature tools to edit PO files (Poedit, web platforms, etc).
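To illustrate why a stray newline cannot shift pairs in PO: every original and every translation is an explicitly labelled, quoted literal, and a line break inside a string is escaped as `\n` rather than starting a new record. Below is a minimal Python sketch of that escaping, purely illustrative and not AGS code; `po_quote` and `po_entry` are hypothetical helper names, and gettext's wrapping of long strings over several lines is omitted.

```python
def po_quote(text: str) -> str:
    """Escape a string so it fits inside one PO double-quoted literal."""
    escaped = (text.replace("\\", "\\\\")   # backslash first, so later escapes aren't doubled
                   .replace('"', '\\"')
                   .replace("\n", "\\n")    # an embedded newline stays inside the literal
                   .replace("\t", "\\t"))
    return f'"{escaped}"'


def po_entry(msgid: str, msgstr: str = "") -> str:
    """Render one original/translation pair as a two-line PO entry."""
    return f"msgid {po_quote(msgid)}\nmsgstr {po_quote(msgstr)}\n"


if __name__ == "__main__":
    # The original contains a real newline, yet the entry is still exactly
    # one msgid line plus one msgstr line.
    print(po_entry('Say "hi"\nthen leave'))
```

Because each field carries its own keyword, a parser never relies on line parity to tell originals from translations.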

I currently do not plan to implement pluralization or using different translations depending on context, as these would require some bigger changes to both editor and engine.

What's there to think about?
Because PO files support a variety of features (tracking source lines, comments for translators, etc), I'd like your input on how we should implement them in the future.

1. Comments for translators:
I think the standard here is a "// TRANSLATORS: lorem ipsum" comment.
I'll probably go with that unless someone has better ideas. These comments show up in tools like Poedit and could be useful for providing specific context.
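A rough sketch of how such comments could be picked up while collecting strings, in the spirit of GNU xgettext's `--add-comments=TRANSLATORS` option, which writes matching comments into the PO file as `#.` (extracted comment) lines. This is an illustrative Python sketch, not how the AGS Editor parses scripts: the regex is a simplification (it would also match quotes inside ordinary comments), and `collect_strings`/`render` are hypothetical names.

```python
import re

# Simplified C-style string literal: captures the raw (still escaped) contents.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
TRANSLATOR_TAG = "// TRANSLATORS:"


def collect_strings(script_lines):
    """Yield (text, translator_comment) for each string literal found.

    A '// TRANSLATORS:' comment applies to the literals on the next line
    that contains any; it is then cleared.
    """
    pending = None
    for line in script_lines:
        stripped = line.strip()
        if stripped.startswith(TRANSLATOR_TAG):
            pending = stripped[len(TRANSLATOR_TAG):].strip()
            continue
        literals = STRING_RE.findall(line)
        for text in literals:
            yield text, pending
        if literals:
            pending = None


def render(entries):
    """Emit PO entries, writing translator notes as '#.' extracted comments."""
    out = []
    for text, comment in entries:
        if comment:
            out.append(f"#. TRANSLATORS: {comment}")
        out.append(f'msgid "{text}"')
        out.append('msgstr ""')
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    script = [
        "// TRANSLATORS: shown on the door sign",
        'player.Say("Closed");',
        'player.Say("Open");',
    ]
    print(render(collect_strings(script)))
```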

2. Context tag:
The PO format has a context field, which is typically used to distinguish different translations of the same words.
Alas, we're not going to implement that any time soon, but we could use it as a visual aid since some editors display it as a label.
One idea is to use the script name as the context, although that information can already be included in the source reference. Still, the script name may be the most sensible choice.

For duplicate strings, my suggestion remains to use unique IDs, and then provide a translation for the current language.
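One side effect worth weighing when using the script name as context: gettext identifies an entry by the pair (msgctxt, msgid), not by msgid alone. Without a context, the same string used in two scripts becomes a single entry listing both `#:` source references; with the script name as msgctxt, it becomes two entries that each need translating. A small illustrative Python sketch of that grouping follows; `build_catalog` and `render` are hypothetical names, and strings are assumed to be already PO-escaped.

```python
def build_catalog(occurrences, use_context):
    """Group (script, line, text) occurrences into PO entries.

    Entries are keyed by (msgctxt, msgid), as in gettext. Without a
    context, identical strings from different scripts collapse into one
    entry that lists every source reference.
    """
    catalog = {}  # insertion-ordered: (msgctxt or None, msgid) -> ["file:line", ...]
    for script, line, text in occurrences:
        ctxt = script if use_context else None
        catalog.setdefault((ctxt, text), []).append(f"{script}:{line}")
    return catalog


def render(catalog):
    """Emit '#:' references, optional msgctxt, msgid and an empty msgstr."""
    out = []
    for (ctxt, text), refs in catalog.items():
        out.append("#: " + " ".join(refs))
        if ctxt is not None:
            out.append(f'msgctxt "{ctxt}"')
        out.append(f'msgid "{text}"')
        out.append('msgstr ""')
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    occ = [("GlobalScript.asc", 12, "Open"), ("room1.asc", 40, "Open")]
    print(render(build_catalog(occ, use_context=False)))  # one entry, two refs
    print(render(build_catalog(occ, use_context=True)))   # two separate entries
```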

3. The problem of unwanted strings:
When you include a module in your project, all of its strings end up in the translation file, creating unnecessary clutter.
We should perhaps think about mechanisms to exclude scripts while building/updating translations.
Dialogs and rooms almost always need translating. I'm not sure about the other scripts; maybe we should add a property to scripts, or some magic comment, to enable/disable string fetching for translations.
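Both mechanisms boil down to a filter applied before strings are collected. The Python sketch below is illustrative only: the `//#no-translate` marker and the `excluded` set standing in for a per-script property are hypothetical, since AGS defines neither today, and `scripts_to_scan` is an invented name.

```python
# Hypothetical magic comment; AGS does not define such a marker today.
NO_TRANSLATE_MARKER = "//#no-translate"


def scripts_to_scan(scripts, excluded=frozenset()):
    """Return the names of scripts whose strings should be collected.

    scripts:  mapping of script name -> source text (insertion-ordered).
    excluded: names flagged through a (hypothetical) per-script property.
    A script is skipped if it is in 'excluded' or if any of its lines
    is exactly the marker comment.
    """
    selected = []
    for name, source in scripts.items():
        if name in excluded:
            continue
        if any(line.strip() == NO_TRANSLATE_MARKER for line in source.splitlines()):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    scripts = {
        "GlobalScript.asc": 'Display("Hi");',
        "Tween.asc": '//#no-translate\nString s = "debug";',
        "Room1.asc": 'Display("Hello");',
    }
    print(scripts_to_scan(scripts))  # the module marked with the comment is skipped
```

A property has the advantage of not touching third-party module code; a comment travels with the module when it is exported.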

4. What to do with obsolete strings?
Currently they remain in the translation file forever.
I'm on the fence between keeping them, marked as "fuzzy" (meaning they need to be verified) and perhaps with an extra "deleted" comment, or removing them completely.
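For comparison, GNU gettext's msgmerge takes a third route: strings that disappeared from the source are kept as obsolete entries, commented out with a `#~` prefix, so the old translation is preserved while msgfmt ignores it when compiling. The Python sketch below shows all three policies side by side; it is illustrative only, `merge` and the policy names are hypothetical, and msgids/msgstrs are assumed to be already PO-escaped.

```python
def merge(old, current_ids, policy):
    """Merge a previous translation with the strings currently in the game.

    old:         dict msgid -> msgstr from the previous PO file.
    current_ids: list of msgids collected from the game now.
    policy for msgids no longer in the game:
      "drop"     - remove them;
      "fuzzy"    - keep them active, flagged fuzzy, with a "deleted" comment;
      "obsolete" - keep them commented out with '#~', as msgmerge does.
    """
    current = set(current_ids)
    lines = []
    for msgid in current_ids:
        msgstr = old.get(msgid, "")  # reuse the existing translation if any
        lines += [f'msgid "{msgid}"', f'msgstr "{msgstr}"', ""]
    for msgid, msgstr in old.items():
        if msgid in current:
            continue
        if policy == "fuzzy":
            lines += ["# deleted from game", "#, fuzzy",
                      f'msgid "{msgid}"', f'msgstr "{msgstr}"', ""]
        elif policy == "obsolete":
            lines += [f'#~ msgid "{msgid}"', f'#~ msgstr "{msgstr}"', ""]
        elif policy != "drop":
            raise ValueError(f"unknown policy: {policy}")
    return "\n".join(lines)


if __name__ == "__main__":
    old = {"Open": "Apri", "Shut": "Chiudi"}
    print(merge(old, ["Open", "Close"], "obsolete"))
```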


I can promise nothing, but having some ideas could help chart a course to make the translation system better. Let me know what you think.

EDIT: Just to be clear, this will be landing in AGS4

- Alan

Crimson Wizard

#1
So, from what I understand, the primary goal of this is to have a cleaner and stricter translation source format, which:

- is a widespread standard that can be viewed and edited with multiple tools;
- doesn't break as easily as TRS when you add or misplace a line somewhere;
- provides extra fields that could be used in the future for more translation fixes and features.

I guess that additional features, like contexts, are more of a job for the Editor/Engine than for the PO format. The format itself only provides a way to store extra data; the problem remains how to generate that data and use it in the game.


I propose to separate two kinds of features:
- The tasks of generating and updating a translation source, which involve only parsing the game and generating a PO file.
- The features that require engine to treat translations differently (contexts, plurals, etc).

This is for organizational purposes, as the former kind may be easier to do than the latter.
