[Sugar-devel] Unicode strings in translations
Martin Langhoff
martin.langhoff at gmail.com
Wed Aug 15 09:40:45 EDT 2012
On Wed, Aug 15, 2012 at 9:20 AM, Manuel Kaufmann <humitos at gmail.com> wrote:
> Oh, it's OK. I agree with the result. Now, let's check what Python says
> if I use my default encoding (UTF8) for this simple task:
>
> >>> len("camión")
> 7
CAREFUL HERE. You don't understand what is happening -- it is not as
simple as you think it is.
When you say len("camión"), you are writing that from a terminal
(Gnome's Terminal, Sugar Terminal, xterm) that is set to use utf-8.
However, Python 2 treats the sequence between " characters as a plain
byte string, with no encoding attached. So your terminal IS sending to
Python what looks like 7 chars -- definitely 7 bytes, because UTF-8
encodes "ó" as two bytes, and len() counts bytes.
However, there is a single-byte representation of "camión" that has 6
bytes, using the Latin-1 (ISO-8859-1) encoding, which extends ASCII and
stores "ó" in one byte. In fact, install an old Linux system, open an
xterm or a VT, retry your example and you'll probably see that camión
has 6 bytes.
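A minimal sketch of the two cases, so the byte counts can be checked
without depending on how the terminal is configured. It assumes a
Python that accepts u"..." literals (Python 2, or Python 3.3 and
later). The variable names are only illustrative. The word is written
with an escape so the source file stays pure ASCII.

```python
# "camión" as Unicode text; \u00f3 is the code point for "ó".
text = u"cami\u00f3n"

# Encode the same text with two different encodings.
utf8_bytes = text.encode("utf-8")      # "ó" becomes two bytes: C3 B3
latin1_bytes = text.encode("latin-1")  # "ó" becomes one byte:  F3

print(len(text))          # number of characters (code points)
print(len(utf8_bytes))    # number of bytes in UTF-8
print(len(latin1_bytes))  # number of bytes in Latin-1
print(repr(utf8_bytes))
print(repr(latin1_bytes))
```

The character count stays at 6 either way; only the byte count changes
with the encoding (7 for UTF-8, 6 for Latin-1). A plain "..." literal
typed into a Python 2 prompt behaves like utf8_bytes or latin1_bytes
here, depending on what the terminal sends.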
I agree we should all use Unicode, specifically UTF-8, everywhere. We
should also make an effort to understand the mechanics of what is
actually happening behind the scenes.
cheers,
m
--
martin.langhoff at gmail.com
martin at laptop.org -- Software Architect - OLPC
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff