[sugar] Develop activity (Oops...)

Marc-Antoine Parent maparent
Mon Aug 13 16:40:53 EDT 2007


Good day, all!

Finally had time to read and think a bit, especially Jameson's design.

(For the record: Andrew kindly included me in your discussion,  
because I have worked on a few multilingual projects in the past, and  
got to think about these issues somewhat.)

I like many aspects of the design a lot; especially the idea that  
the .py files should be in English as much as possible, with  
translation on load and save. This has two obvious advantages, namely
a) we do not have to store translation between a matrix of languages,  
but we are allowed a simpler hub-and-spoke model with English as the hub
b) the current interpreter can work without knowing about all this.
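
A minimal sketch of the hub-and-spoke idea, assuming one hypothetical table per language that maps English identifiers to local words (the table contents and function names below are invented for illustration, not part of Jameson's design):

```python
# Hypothetical per-language tables: English identifier -> local word.
TO_LOCAL = {
    "fr": {"a_function": "une_fonction", "first_module": "premier_module"},
    "es": {"a_function": "una_funcion", "first_module": "primer_modulo"},
}

def to_english(word, lang):
    """Map a local word back to its English hub identifier (unchanged if unknown)."""
    reverse = {local: english for english, local in TO_LOCAL[lang].items()}
    return reverse.get(word, word)

def translate(word, source_lang, target_lang):
    """Translate between two spokes by passing through the English hub."""
    english = to_english(word, source_lang)
    return TO_LOCAL[target_lang].get(english, english)

print(translate("une_fonction", "fr", "es"))
```

With N languages this needs N tables rather than one table per language pair.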

However, Jameson, if I may, I would take issue with a few assumptions  
of your model: most especially that of a "preferred language" for  
modules.
I am referring to your point 3:
> 3-This dictionary ONLY contains translations for the "public  
> interface" of somemodule.py, that is, those identifiers which are  
> used in importer modules. It also defines a single, unchanging  
> "preferred language" for that file, which is the assumed language  
> for all non-translated identifiers in that file.

I am especially interested in collaborative work; and I believe it is  
not unreasonable to hope that children between schools in different  
countries will get to share some work.
That would mean that a given module may have many editors, possibly  
introducing identifiers in more than one non-English language.
 From that point of view, "preferred language" is a feature of an  
editing environment, not of a module. New identifiers should be  
individually tagged by language; I see that tagging as appropriate  
work for the editing environment. Basically, upon loading the file,  
all local identifiers would be read in memory; upon saving, new ones  
would be saved with a language tag. (Plausibly as a postfix,  
Identifier_i18n_2letterLanguageCode...)
(I would otherwise follow Mike's suggestion to use a fixed  
transliteration table for non-latin scripts.)
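
A rough sketch of what the editor could do on save and load, assuming the double-underscore form of the tag used in the examples in this message (name__i18n_xx); the helper names are hypothetical:

```python
import re

# Assumed tag format: <name>__i18n_<two-letter language code>
TAG_RE = re.compile(r"^(?P<name>.+)__i18n_(?P<lang>[a-z]{2})$")

def tag_identifier(name, lang):
    """On save: mark a new, not yet translated identifier with its language."""
    if not re.fullmatch(r"[a-z]{2}", lang):
        raise ValueError("expected a two-letter language code")
    return f"{name}__i18n_{lang}"

def split_identifier(identifier):
    """On load: return (name, lang), with lang None for untagged identifiers."""
    match = TAG_RE.match(identifier)
    if match is None:
        return identifier, None
    return match.group("name"), match.group("lang")

print(tag_identifier("une_fonction", "fr"))
print(split_identifier("une_fonction__i18n_fr"))
print(split_identifier("a_function"))
```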

This only applies until we have a valid English version of the  
identifier, of course; at that point, it will serve as the hub. But  
that raises another issue, which you tackle in points 5 and 6: what  
happens with imports in other modules that use the old generated  
identifier? You suggest keeping a separate history. It is a  
possibility, but I fear it goes counter to the goal of making the  
files usable by the existing interpreter. (Though you may have  
thought of a workaround for this that I have missed.)
My suggestion would be as follows:
(I will use French for my example.)

in premier_module.py:
def une_fonction__i18n_fr(): ...
EOF

in deuxieme_module.py:
from premier_module import une_fonction__i18n_fr
...
EOF

Then, the translation a_function is introduced for une_fonction...

So premier_module.py becomes:

def a_function(): ...

#  -*- Translation history block -*-
une_fonction__i18n_fr = a_function
# -*- End translation history block -*-
EOF

(N.B. 1: The translation history block could be hidden in a  
knowledgeable editor; but we should have access to it, so as to  
explain why that word is still reserved.)
(N.B. 2: Actually, it is likely that premier_module.py has been  
renamed to first_module.py, and the package's __init__.py has a  
similar equivalence in _its_ translation block!!!)

That way, the original import in deuxieme_module still works, in an  
unmodified python interpreter.
(Until the knowledgeable editor gets to work on deuxieme_module.py  
again, of course.)
Even if someone decides on a better translation later on, more than  
one version may be kept in the translation block.
This has the disadvantage of polluting the code, but the advantage of  
polluting the filesystem less.
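
To check that the old import really survives in an unmodified interpreter, here is a small self-contained sketch. It builds the module in memory instead of on disk (an assumption made only to keep the example runnable in one file), and the function body is a placeholder:

```python
import sys
import types

# Contents of the module after the English translation has been introduced.
FIRST_MODULE_SOURCE = '''
def a_function():
    return "hello"

#  -*- Translation history block -*-
une_fonction__i18n_fr = a_function
# -*- End translation history block -*-
'''

# Register the source under its old module name, standing in for the file.
module = types.ModuleType("premier_module")
exec(FIRST_MODULE_SOURCE, module.__dict__)
sys.modules["premier_module"] = module

# The import written earlier in deuxieme_module.py, unchanged.
from premier_module import une_fonction__i18n_fr

print(une_fonction__i18n_fr())
print(une_fonction__i18n_fr is module.a_function)
```

The old name is just a second binding to the same function object, so the cost at run time is one extra entry in the module namespace.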

I am realizing a broader application of this mechanism: the  
translation block could be tagged with a revision number (if  
__revision__<540:), and the "import" command could mention the last  
known revision; so translation blocks would only be activated at  
need. But that's all another story.

Another quick related note: What if someone adds a translation  
between two non-English languages? In your first email, you  
explicitly forbid it; I am not sure that is necessary. (I am not sure  
you think of it as necessary in your later design as well.) Clearly,  
however, X to Y translations may have to refer to the history (as  
language X is replaced by English) so as to become English to Y.
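
One possible reading of that last step, as a sketch: once the history says which English identifier replaced each language-X identifier, an X-to-Y table can be re-keyed as English-to-Y. The data layout (plain dictionaries) and the function name are assumptions for illustration:

```python
def rekey_through_history(x_to_y, history):
    """Turn an X->Y table into an English->Y table using the translation history.

    x_to_y:  tagged language-X identifier -> language-Y word
    history: tagged language-X identifier -> English identifier that replaced it
    Returns (english_to_y, pending), where pending holds the entries whose
    X identifier has no English replacement yet.
    """
    english_to_y = {}
    pending = {}
    for x_name, y_name in x_to_y.items():
        if x_name in history:
            english_to_y[history[x_name]] = y_name
        else:
            pending[x_name] = y_name
    return english_to_y, pending

history = {"une_fonction__i18n_fr": "a_function"}
fr_to_es = {
    "une_fonction__i18n_fr": "una_funcion",
    "autre_chose__i18n_fr": "otra_cosa",
}
print(rekey_through_history(fr_to_es, history))
```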

To finish with your design points, you introduce what I see as a  
severe limitation in your point 4:

> 4-There is good UI support for creating a new translation for a  
> word. However, the assumed user model is that words will be  
> translated INTO a users preferred language; FROM the context of an  
> importer module (you'd generally not add translations for a module  
> from that module itself, since generally you wouldn't even have  
> modules open whose preferred language is not your own); and  
> therefore WITH an explicit user decision as to which module this  
> translation belongs in (they want to use their language for  
> identifier X which is in English, well, they must have had a reason  
> to write it in English rather than their language so they  
> presumably know what imported module it comes from.)

What really made me jump is the notion that "you wouldn't even have  
modules open whose preferred language is not your own". Again, this  
assumes a single preferred language per module, which is something I  
would rather avoid, and I believe is not necessary if identifiers  
have a language mark.

However, I suspect your mention of "from the context of an importer  
module" comes from the issues you encountered with memorizing the  
import structure. I would like to hear more about the problems you  
ran into there, because I believe it is necessary (for reasons to be  
detailed below.)

Now, a few suggestions and pitfalls of my own:

a) I believe there should be one translation file per language. More  
file pollution, less parsing.
I suspect that something akin to the gettext file structure would be  
appropriate:
package/module1.py
would be translated in
package/_t9n_/fr/module1.pyt
package/_t9n_/fr/module1.pyto (object, like a .mo file)
package/_t9n_/es/module1.pyt
package/_t9n_/es/module1.pyto
and so on.
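
As a sketch, locating the two files for a given module and language could be as simple as the following (the function name is hypothetical; the _t9n_ directory and the .pyt/.pyto extensions are the ones proposed above):

```python
from pathlib import PurePosixPath

def translation_paths(module_path, lang):
    """Return the (.pyt, .pyto) paths for a module and a language code."""
    module = PurePosixPath(module_path)
    base = module.parent / "_t9n_" / lang
    return base / (module.stem + ".pyt"), base / (module.stem + ".pyto")

source, compiled = translation_paths("package/module1.py", "fr")
print(source)
print(compiled)
```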

b) A particularly fancy editor would color-code words in other  
languages instead of showing the _i18n_xx tag.
Of course there would be a way to access online translation services  
to get suggestions (as has been suggested by many.)

c) Sci-fi scenario:  any new translation suggestion by a child or  
educator should be made available to others using a distributed  
database system... (they are likely to work on common projects, and  
hence on common modules.)
Children and educators known to be knowledgeable about a given  
language pair should have a way to vet translations in that database.
Oh, and let's send it to planet python so we have a basis to build  
the translation files to the standard library for very obscure  
languages ;-)
(OK, that _is_ sci-fi. Still worth thinking about!)

d) Back to earth: I said we really had to know the import  
structure... here is a slew of related problems:

Suppose we are editing a module that is importing something from the  
core library:

from moduleX import f1
from moduleY import f2
f1()
f2()

Now, suppose f1 and f2 both translate as "sigma" in the current  
editing language... Then, though the .py code is unambiguous, the  
translated on-screen code looks ambiguous; and worse, the  
untranslation process on save is not well-defined.
The solution is to actually un-specify the imports in the source code:

import moduleX
import moduleY
moduleX.sigma()
moduleY.sigma()

This refactoring should be possible in most cases, unless two  
top-level modules have the same translation (say moduleX and moduleY  
both translate as "modula").
This situation should be marked as an error; or alternately _display_  
the following:

import moduleX__i18n_en_ as modula_1
import moduleY__i18n_en_ as modula_2
modula_1.sigma()
modula_2.sigma()

This is not an interpreter-level change, but a disguised display. (or  
rather a refactoring which can be memorized, and reverted by the  
untranslation machinery.)
Note that display-only import disambiguation may also be necessary if  
the above code happens in a core library file (which we would never  
modify.)
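
A sketch of how the display layer might pick those numbered aliases, assuming it is handed a mapping from real module names to their translations (the function name and the _1, _2 numbering scheme are illustrative only):

```python
from collections import defaultdict

def display_aliases(module_translations):
    """Map each real module name to the alias shown on screen.

    Modules whose translation is unique keep it as-is; modules that share
    a translation get numbered aliases, in the order they were given.
    """
    groups = defaultdict(list)
    for real_name, shown in module_translations.items():
        groups[shown].append(real_name)

    aliases = {}
    for shown, real_names in groups.items():
        if len(real_names) == 1:
            aliases[real_names[0]] = shown
        else:
            for index, real_name in enumerate(real_names, start=1):
                aliases[real_name] = f"{shown}_{index}"
    return aliases

print(display_aliases({"moduleX": "modula", "moduleY": "modula"}))
```

This sketch does not handle the case where a numbered alias itself collides with another translation, and the mapping would have to be memorized so that the untranslation machinery can revert it on save.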
In any case it is useful to flag as an error any translation that  
introduces ambiguity within the same namespace.
Similar transformations may be made necessary by "from moduleX import  
*" syntax.
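
The error check itself could look something like this sketch, assuming the editor can list, for one namespace, every English identifier together with its translation in the current editing language (the function name is hypothetical):

```python
def find_ambiguities(namespace_translations):
    """Return {translation: [english identifiers]} for every translation
    that is shared by more than one identifier in the same namespace."""
    seen = {}
    for english, shown in namespace_translations.items():
        seen.setdefault(shown, []).append(english)
    return {
        shown: sorted(names)
        for shown, names in seen.items()
        if len(names) > 1
    }

# f1 and f2 both display as "sigma", so saving could not be untranslated.
print(find_ambiguities({"f1": "sigma", "f2": "sigma", "g": "gamma"}))
```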

None of this is simple, as I said; but alas probably necessary.

e) Would we display numbers as the equivalent numerics in other  
writing systems?
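
For what it is worth, a display-only conversion looks cheap. The sketch below uses Arabic-Indic digits as an arbitrary example and hypothetical function names; it relies on the fact that Python's int() already accepts Unicode decimal digits, while numeric literals in the .py file itself would still have to be written with ASCII digits:

```python
# Arabic-Indic digits occupy U+0660..U+0669, in the same order as 0..9.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC_DIGITS)

def show_number(value):
    """Display form: replace ASCII digits with Arabic-Indic digits."""
    return str(value).translate(TO_ARABIC_INDIC)

def read_number(text):
    """Untranslation: int() parses any Unicode decimal digits."""
    return int(text)

shown = show_number(2007)
print(shown)
print(read_number(shown))
```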

f) Docstrings... are another issue entirely. I still like my idea of  
a distributed database, so children puzzling out a foreign (to them)  
docstring with online help can put their minds together.

OK, I am giving more problems than solutions, here; and  
unfortunately, my spare time is otherwise quite occupied, so I doubt  
I can contribute to implementation; still, I hope that spelling some  
of these things out is useful to others. I'll try to keep my thinking  
cap on as this discussion evolves.

Cheers,
Marc-Antoine Parent
http://maparent.ca/

P.S. I _love_ your idea of arrows in the margin to indicate flow!



