[Sugar-devel] Unicode strings in translations

Martin Langhoff martin.langhoff at gmail.com
Wed Aug 15 09:40:45 EDT 2012


On Wed, Aug 15, 2012 at 9:20 AM, Manuel Kaufmann <humitos at gmail.com> wrote:
> Oh, it's OK. I agree with the result. Now, let's check what Python says
> if I use my default encoding (UTF8) for this simple task:
>
>>>> len("camión")
> 7

CAREFUL HERE. You don't understand what is happening -- it is not as
simple as you think it is.

When you type len("camión"), you are typing it into a terminal
(Gnome's Terminal, Sugar Terminal, xterm) that is set to use UTF-8.

However, Python 2 does not decode what sits between the " characters:
it stores the raw bytes exactly as the terminal sends them. In UTF-8,
ó takes two bytes, so your terminal IS sending to Python what looks
like 7 chars -- definitely 7 bytes -- and len() counts bytes.
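
A minimal sketch of where the 7 comes from, assuming the terminal is
sending UTF-8. The variable names are only illustrative, and the word is
built from an escape (\u00f3 is ó) so the result does not depend on the
terminal or source file encoding:

```python
# Build the word from code points, so nothing depends on the terminal.
word = u"cami\u00f3n"

# What a UTF-8 terminal sends: one byte per ASCII letter,
# two bytes (0xC3 0xB3) for the accented o.
utf8_bytes = word.encode("utf-8")
o_acute_bytes = u"\u00f3".encode("utf-8")

char_count = len(word)        # characters in the unicode string
byte_count = len(utf8_bytes)  # bytes in the encoded string

print(char_count)          # 6
print(byte_count)          # 7
print(len(o_acute_bytes))  # 2
```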

However, there is a single-byte representation of "camión" that has 6
bytes, using the Latin-1 (ISO-8859-1) encoding. Latin-1 is a superset
of ASCII, not ASCII itself. In fact, install an old Linux system, open
an xterm or a VT, retry your example and you'll probably see that
camión has 6 bytes.
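
The same comparison can be made without hunting for an old system. This
is only a sketch with illustrative variable names: it encodes the word
both ways, then decodes the UTF-8 bytes, which is the step that recovers
the character count:

```python
word = u"cami\u00f3n"  # \u00f3 is the accented o

latin1_bytes = word.encode("latin-1")  # one byte per character
utf8_bytes = word.encode("utf-8")      # the accented o takes two bytes

print(len(latin1_bytes))  # 6
print(len(utf8_bytes))    # 7

# Decoding with the right codec gives back 6 characters.
decoded = utf8_bytes.decode("utf-8")
print(len(decoded))       # 6

# Decoding with the wrong codec does not fail; each of the 7 bytes
# simply becomes one (wrong) character.
misread = utf8_bytes.decode("latin-1")
print(len(misread))       # 7
```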

I agree we should all use Unicode, specifically UTF-8, everywhere. We
should also make an effort to understand the mechanics of what is
actually happening behind the scenes.

cheers,



m
--
 martin.langhoff at gmail.com
 martin at laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

