by Chusslove Illich (Часлав Илић) <chaslav@sezampro.yu>
December 10th, 2003.
Copyright © 2003, Chusslove Illich
1. Introduction
A few months ago, when I started working on program translations using the gettext (or a gettext-based) system, it occurred to me that translators could be given more flexibility, and programmers relieved of some burden, if the translation itself could be programmed. To be clear, I am thinking of a run-time solution (i.e. an interpreted, Lisp-like translation language), not something that would require recompiling the application when new translations are added.
The key point is that the translation system would hand to the translation not only message strings with argument placeholders, but also the arguments themselves, in the form of formatted strings. Using built-in and user-defined functions, the translator would be able to make the translation sensitive to context, that is, to the given arguments.
This idea itched me for some time, and it turned out that writing a piece of rough demonstration code in C++ was not too time-consuming. I'll give a few explanations and examples of common problems; for translation examples I'll use my native language, Serbian, in pure ASCII transcription. I'll refer to this programmable system as `cotras' (context translation of strings).
2. Basics
In general, a message in a cotras catalog has the following format:
def message (function arg1 arg2 arg3 ... )
Here def is a reserved keyword that marks the start of a message definition, and message is a constant string, or several strings with C-like automatic concatenation. Each argument to function can be a constant string, a variable, or a subfunction. All functions return a string as their result.
A simple gettext message that looks like this:
msgid "Programmable translation demo"
msgstr "Demo programabilnog prevoda"
would look like this in a cotras catalog:
def "Programmable translation demo" (cat "Demo programabilnog prevoda")
The function cat concatenates all the strings it is given, so the same translation would be produced by:
def "Programmable translation demo" (cat "Demo " "programabilnog " "prevoda")
or
def "Programmable translation demo" (cat "Demo " (cat "programabilnog " "prevoda"))
etc.
Now let's look at a gettext message containing an argument:
msgid "Total %1 files found"
msgstr "Nadjeno je ukupno %1 fajlova"
The cotras message could look like:
def "Total %1 files found" (cat "Nadjeno je ukupno %1 fajlova")
As in a gettext-based system, cotras replaces the %1 placeholder with real data once it has the final string. But remember, the arguments themselves are also passed around. They can be referenced as variables that start with a percent sign followed by a number (i.e. the same as the placeholder). So the same result could be produced by:
def "Total %1 files found" (cat "Nadjeno je ukupno " %1 " fajlova")
Here %1 expands to a string and gets concatenated with the rest of the strings.
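To make the two variants concrete, here is a minimal Python sketch of how they could be evaluated. It is not the demonstration code itself; the names fill_placeholders, via_placeholder and via_variable, the sample argument "42", and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re


def cat(*parts):
    """Concatenate all given strings, like the cotras built-in cat."""
    return "".join(parts)


def fill_placeholders(text, args):
    """Replace %1, %2, ... in the final string with the message arguments."""
    def lookup(match):
        index = int(match.group(1))
        # Assumption of this sketch: unknown placeholders are left untouched.
        return args[index - 1] if 1 <= index <= len(args) else match.group(0)
    return re.sub(r"%(\d+)", lookup, text)


args = ["42"]  # arguments arrive as already formatted strings

# Variant 1: the placeholder stays in the string and is filled in at the end.
via_placeholder = fill_placeholders(cat("Nadjeno je ukupno %1 fajlova"), args)

# Variant 2: the argument is used as a variable inside the expression.
via_variable = fill_placeholders(cat("Nadjeno je ukupno ", args[0], " fajlova"), args)

print(via_placeholder)
print(via_variable)
```

Both variants end up with the same final string; the second one simply has nothing left for the placeholder pass to do.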
Let's introduce one more built-in function of cotras: the multiple selection function sel, which has the following format:
(sel match case1 return1 case2 return2 case3 return3 ... )
It returns the first return string whose case string matches the match string. Here is an example:
def "Found %1 result(s)"
    (cat "Found "
         (sel %1 "1" "one"
                 "2" "two"
                 "3" "three"
                 ""  "many")
         " result(s)")
Here you can see the message being modified directly, based on context. The argument %1 is used as the match value, and sel replaces the number that would normally be output with `one', `two', `three' or `many' (the empty string, "", matches any string).
As a minor but important detail, case values in sel can contain prefix and suffix marks. The value "-ion" would match anything ending with `ion'; similarly, "ion-" would match anything starting with `ion', and "-ion-" anything containing `ion'.
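The matching rules of sel can be sketched in a few lines of Python. This is only my reading of the rules described above, not the actual implementation; the helper name matches is hypothetical, and returning an empty string when no case matches is an assumption, since that situation is not specified here.

```python
def matches(case, value):
    """Return True if the case pattern matches value."""
    if case == "":
        return True  # the empty case matches any string
    leading = case.startswith("-")                  # "-ion": anything may precede
    trailing = len(case) > 1 and case.endswith("-")  # "ion-": anything may follow
    core = case[1:] if leading else case
    core = core[:-1] if trailing else core
    if leading and trailing:
        return core in value
    if leading:
        return value.endswith(core)
    if trailing:
        return value.startswith(core)
    return value == core


def sel(match, *pairs):
    """Return the result paired with the first case that matches."""
    for case, result in zip(pairs[0::2], pairs[1::2]):
        if matches(case, match):
            return result
    return ""  # assumption: no matching case yields an empty string


def num_to_text(number):
    return sel(number, "1", "one", "2", "two", "3", "three", "", "many")


for n in ("1", "3", "7"):
    print("Found " + num_to_text(n) + " result(s)")
```

Because the first matching case wins, the catch-all empty case has to come last.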
The translator can also define functions using the keyword defv, with a non-quoted string as the function name:
defv name (function arg1 arg2 arg3 ... )
For example:
defv num-to-text
    (sel %1 "1" "one"
            "2" "two"
            "3" "three"
            ""  "many")
In this example, %1 is the argument passed to num-to-text by the calling function. So, once num-to-text is defined, we could rewrite the previous example like this:
def "Found %1 result(s)" (cat "Found " (num-to-text %1) " result(s)")
The argument reference variables shown so far are local: a % reference in a called user function refers to an argument passed to that function, not to a top argument of the untranslated message. However, there is a way to refer to top arguments from any function (they are, in effect, global). These arguments are also cached from previous calls, to some depth. For example, %3-0 refers to top argument %3 of the current message, while %3-1 refers to top argument %3 of the message called immediately before the current one.
In case you are wondering, %0 also has a meaning: it is the message string or the function name itself. This is the same as the passing of the command line to C programs, where argument 0 is the name of the executable.
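One way to picture the cached top arguments is as a short history of calls, newest first. The Python sketch below is only an illustration under my own assumptions: the class name ArgumentHistory, its methods, the depth of 4, and the two sample messages are all hypothetical.

```python
from collections import deque


class ArgumentHistory:
    """Keeps the top arguments of the most recent messages, newest first."""

    def __init__(self, depth=4):
        # Assumed limit; the text only says arguments are cached "to some depth".
        self._calls = deque(maxlen=depth)

    def begin_message(self, msgid, *args):
        # Index 0 holds the untranslated message itself, like argv[0] in C.
        self._calls.appendleft((msgid, *args))

    def top(self, number, back=0):
        """Resolve a reference such as %3-1 (number=3, back=1)."""
        return self._calls[back][number]


history = ArgumentHistory()
history.begin_message("Copied %1 of %2 to %3", "10", "20", "/home")
history.begin_message("Moved %1 of %2 to %3", "5", "8", "/tmp")

print(history.top(3, 0))  # corresponds to %3-0
print(history.top(3, 1))  # corresponds to %3-1
print(history.top(0, 1))  # corresponds to %0-1
```

With a bounded history, the oldest call simply falls off the end once the depth is exceeded.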
Most of these examples are solvable using the gettext approach, but that requires tight cooperation between programmers and translators, which in my experience rarely happens. Recently, in the translation of KDE, a translator reported a problem similar to example 3.2 and was told that it was impossible to change, so he should be `creative', i.e. try to make things look less ugly :)
The first important thing is that, unlike with gettext, the programmer doesn't need to take care of plural handling (that is, beyond English).
The programmer can happily use a message like "We have %1 train(s)", since English has only two plural forms, `train' and `trains'. A Serbian translator cannot do the same, because his language has three plural forms, chosen by the last digits of the number.
So he would write a function serbian-plural, which takes the number (%1) and the three plural forms (%2, %3, %4):
defv serbian-plural
    (sel %1 "-11" %4
            "-12" %4
            "-13" %4
            "-14" %4
            "-1"  %2
            "-2"  %3
            "-3"  %3
            "-4"  %3
            ""    %4)
With this, the translation for the message "We have %1 train(s)" is:
def "We have %1 train(s)" (cat "Imamo %1 " (serbian-plural %1 "voz" "voza" "vozova"))
Of course, you could also use a gettext-style rewrite of the complete string:
def "We have %1 train(s)"
    (serbian-plural %1 "Imamo %1 voz"
                       "Imamo %1 voza"
                       "Imamo %1 vozova")
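For readers who prefer to see the selection logic spelled out, here is a rough Python equivalent of serbian-plural. It is a sketch, not part of cotras: the rule table and the helper we_have_trains are names I made up, and the numbers tried at the end are arbitrary. The rules are kept in the same order as the sel cases, which matters because "-11" has to be tried before "-1".

```python
# (suffix, form index) pairs, in the same order as the cases of serbian-plural.
SERBIAN_PLURAL_RULES = [
    ("11", 2), ("12", 2), ("13", 2), ("14", 2),  # ...11 to ...14 take the third form
    ("1", 0),                                    # ...1 takes the first form
    ("2", 1), ("3", 1), ("4", 1),                # ...2 to ...4 take the second form
]


def serbian_plural(number, form1, form2, form3):
    """Pick the plural form for a number given as a string of digits."""
    forms = (form1, form2, form3)
    for suffix, index in SERBIAN_PLURAL_RULES:
        if number.endswith(suffix):
            return forms[index]
    return form3  # the "" case: everything else


def we_have_trains(count):
    n = str(count)
    return "Imamo " + n + " " + serbian_plural(n, "voz", "voza", "vozova")


for count in (1, 3, 5, 11, 21, 102):
    print(we_have_trains(count))
```

If the "1" rule came first, 11 would wrongly get the first form, which is why the longer suffixes lead the table.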
Let's now consider a trickier example, which would require some acrobatics (from both programmer and translator) to solve using gettext: "We have %1 train(s) and %2 ship(s)". This is easily solvable with cotras:
def "We have %1 train(s) and %2 ship(s)"
    (cat "Imamo %1 "
         (serbian-plural %1 "voz" "voza" "vozova")
         " i %2 "
         (serbian-plural %2 "brod" "broda" "brodova"))
Because English words do not change form, something that sounds acceptable in English may be completely wrong in many other languages. For example, consider the message "Undo %1", where %1 is some action, like "Delete". Here `Delete' serves both as a verb in the imperative form on its own and as a noun used as the object in the `Undo...' message. In a gettext catalog, there would be something like:
msgid "Delete"
msgstr "Obrisi"

msgid "Cut"
msgstr "Iseci"

msgid "Paste"
msgstr "Nalepi"

msgid "Undo %1"
msgstr "Ponisti %1"
In Serbian, the message `Undo Delete' would come out as `Ponisti Obrisi', which is totally unacceptable. The correct message would be `Ponisti brisanje'. The way to get it is to define a function action-to-noun-object:
defv action-to-noun-object
    (sel %1 "Obrisi" "brisanje"       # Delete
            "Iseci"  "isecanje"       # Cut
            "Nalepi" "nalepljivanje"  # Paste
            ...
            ""       %1               # If there is no match, just return original
    )
Messages in the cotras catalog would then look like:
def "Delete" (cat "Obrisi")
def "Cut" (cat "Iseci")
def "Paste" (cat "Nalepi")
def "Undo %1" (cat "Ponisti " (action-to-noun-object %1))
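The same idea can be mimicked in Python with a lookup table that falls back to the original string. This is just a sketch with names of my own choosing; it lists only the three actions mentioned above and uses exact matching, whereas sel cases could also carry prefix and suffix marks.

```python
# Translated imperative -> noun usable as the object of "Ponisti".
# Only the three actions from the text are listed here.
ACTION_TO_NOUN_OBJECT = {
    "Obrisi": "brisanje",       # Delete
    "Iseci": "isecanje",        # Cut
    "Nalepi": "nalepljivanje",  # Paste
}


def action_to_noun_object(action):
    # If there is no match, just return the original string.
    return ACTION_TO_NOUN_OBJECT.get(action, action)


def undo(action):
    return "Ponisti " + action_to_noun_object(action)


print(undo("Obrisi"))
print(undo("Nalepi"))
```

The fallback keeps an unlisted action readable, if not grammatical, instead of dropping it from the message.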
Again, because English words do not change form, we can run into the problem of the same short message (say, one or two words) needing two different translations.
For example, we might have lists titled `Defined actions' and `Defined filters', and a short adjective message `None' shown when no filters or actions are defined. In Serbian, the form of an adjective depends (among other things) on the gender of the noun, and the nouns `action' and `filter' have different genders. For `action', `none' should be `nijedna', and for `filter' it should be `nijedan'. So, in the gettext catalog we have:
msgid "Defined actions"
msgstr "Definisane akcije"

msgid "Defined filters"
msgstr "Definisani filteri"

msgid "None"
msgstr "?"  # This should be either "Nijedna" or "Nijedan"...
Using cotras, the Serbian translator can solve this problem with the cached top arguments of message calls; let's assume that the `Defined...' messages are called immediately before `None'. First, let's define a function none-gender-form that determines the form of `none' based on gender:
defv none-gender-form
    (sel %1 "-action-" "Nijedna"
            "-filter-" "Nijedan")
Now the messages would look like:
def "Defined actions" (cat "Definisane akcije")
def "Defined filters" (cat "Definisani filteri")
def "None" (none-gender-form %0-1)
The reference %0-1 means: argument 0 of the message called immediately before the current one. Remember, argument 0 is the untranslated message itself. So, if "Defined actions" was called just before "None", %0-1 expands to "Defined actions", which is matched by "-action-" in none-gender-form, so the proper form of the translation for `none' is returned.
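The whole round trip can be simulated with a small Python sketch that remembers the previously translated message. The Translator class and its attributes are hypothetical, and returning an empty string when neither noun is found is my assumption, since none-gender-form defines no catch-all case.

```python
def none_gender_form(previous_message):
    """Pick the form of `none' from the message translated just before."""
    if "action" in previous_message:  # case "-action-"
        return "Nijedna"
    if "filter" in previous_message:  # case "-filter-"
        return "Nijedan"
    return ""  # assumption: the behaviour without a match is not specified


class Translator:
    def __init__(self):
        self.previous = ""  # untranslated text of the previous message (%0-1)
        self.catalog = {
            "Defined actions": lambda: "Definisane akcije",
            "Defined filters": lambda: "Definisani filteri",
            "None": lambda: none_gender_form(self.previous),
        }

    def translate(self, msgid):
        text = self.catalog[msgid]()
        self.previous = msgid  # becomes %0-1 for the next call
        return text


tr = Translator()
print(tr.translate("Defined actions"), "/", tr.translate("None"))
print(tr.translate("Defined filters"), "/", tr.translate("None"))
```

The sketch also shows the weak spot of the approach: it works only as long as the `Defined...' message really is translated immediately before `None'.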