[Sugar-devel] Unicode strings in translations

S. Daniel Francis francis at sugarlabs.org
Mon Aug 13 19:15:46 EDT 2012


2012/8/13 Gonzalo Odiard <gonzalo at laptop.org>:
> You can read a utf8 encoded file with codecs.open too.
>
> http://docs.python.org/library/codecs.html
>
> Gonzalo
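Here is a minimal sketch of Gonzalo's suggestion. It is written so it also runs on Python 3. The function name read_utf8_file, the temporary file and the sample word are only for illustration. codecs.open(path, mode, encoding=...) is the standard-library call; newer Python 3 code usually uses the built-in open() with an encoding argument instead.

```python
import codecs
import os
import tempfile

def read_utf8_file(path):
    """Hypothetical helper: read a UTF-8 encoded file and return decoded text."""
    # codecs.open decodes the bytes while reading, so f.read() returns text, not raw bytes
    with codecs.open(path, "r", encoding="utf-8") as f:
        return f.read()

# Demo: write a small UTF-8 file to a temporary location, then read it back
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(u"Ma\u00f1ana".encode("utf-8"))  # U+00F1 is the small n with tilde

text = read_utf8_file(path)
os.remove(path)
print(len(text))  # 6 characters, although the file holds 7 bytes
```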

I see some people need to know more about Unicode:

In Python 2, source files are assumed to be ASCII by default, but ASCII
can't represent every Unicode character. That's where UTF-8 comes in:
adding an encoding declaration at the top of the file tells Python the
source is UTF-8, so plain string literals are stored as UTF-8 bytes:

# -*- coding: utf-8 -*-
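A small sketch of why ASCII is not enough and UTF-8 is. It runs on Python 2 or 3. The helper try_encode is hypothetical. The \u00d1 escape (capital N with tilde) keeps the source ASCII-only, so no declaration is needed.

```python
def try_encode(ch, codec):
    """Hypothetical helper: return the encoded bytes, or None if codec can't represent ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

# "A" is inside ASCII's range (0-127); U+00D1 (N with tilde) is not
results = {}
for ch in [u"A", u"\u00d1"]:
    results[ch] = (try_encode(ch, "ascii"), try_encode(ch, "utf-8"))

# "A" is the same single byte in both codecs, because UTF-8 is ASCII-compatible
print(results[u"A"])       # (b'A', b'A') on Python 3
# U+00D1 has no ASCII form, but UTF-8 encodes it as two bytes
print(results[u"\u00d1"])  # (None, b'\xc3\x91') on Python 3
```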

If we have a variable of type unicode:

my_unicode = u'Hello World'

you can get a string in utf-8 with the following line:

utf8_string = my_unicode.encode('utf-8')

To get a unicode object back from a UTF-8 string:

new_unicode = utf8_string.decode('utf-8')
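The same round trip with a non-ASCII character. This sketch runs on Python 2 or 3. It also shows that bytes have to be decoded with the codec that produced them: asking for ASCII fails.

```python
# U+00D1 is the capital N with tilde; the escape keeps this source ASCII-only
my_unicode = u"My string with \u00d1"

# unicode -> UTF-8 encoded bytes
utf8_string = my_unicode.encode("utf-8")

# UTF-8 bytes -> unicode again; the round trip loses nothing
new_unicode = utf8_string.decode("utf-8")
print(new_unicode == my_unicode)  # True

# Decoding with the wrong codec fails: 0xC3 is not a valid ASCII byte
try:
    utf8_string.decode("ascii")
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False
print(ascii_decode_ok)  # False
```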

When do you need unicode?
In UTF-8, some characters take two or more bytes (Ñ takes two: 0xC3
0x91). When you iterate over a byte string you get bytes, not
characters, so it works well with 1-byte (ASCII) characters but not as
expected with the others. With unicode you can iterate over the text
character by character.
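A quick way to see the bytes-vs-characters difference is to compare lengths and to cut the byte string in the middle of a character. This sketch runs on Python 2 or 3; the cut position 16 is specific to this sample text.

```python
# Same text as in the examples, with U+00D1 written as an escape
text = u"My string with \u00d1"
data = text.encode("utf-8")

print(len(text))  # 16 characters
print(len(data))  # 17 bytes: the last character needs two bytes in UTF-8

# Keeping only the first 16 bytes splits the last character in half,
# which leaves an incomplete UTF-8 sequence that can't be decoded
try:
    data[:16].decode("utf-8")
    cut_ok = True
except UnicodeDecodeError:
    cut_ok = False

print(cut_ok)  # False
```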

Code example:

>>> for i in "My string with Ñ": print i
M
y

s
t
r
i
n
g

w
i
t
h

�
�

>>> for x in [i.encode('utf-8') for i in u"My string with Ñ"]: print x
M
y

s
t
r
i
n
g

w
i
t
h

Ñ
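On Python 3, print is a function and a plain "..." literal is already unicode. The session above could be written roughly like this. It is a sketch that checks the values instead of printing raw bytes to the terminal.

```python
text = "My string with Ñ"  # in Python 3 a str literal is already unicode

# Iterating a str yields whole characters: the last item is the complete "Ñ"
chars = [ch for ch in text]

# Iterating the UTF-8 bytes yields ints in Python 3, one per byte,
# so "Ñ" shows up as two separate items
byte_values = [b for b in text.encode("utf-8")]

print(chars[-1] == "\u00d1")               # True
print([hex(b) for b in byte_values[-2:]])  # ['0xc3', '0x91']
```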


http://wiki.python.org/moin/Unicode

Regards.
~danielf

