[Sugar-devel] Unicode strings in translations

Martin Langhoff martin.langhoff at gmail.com
Wed Aug 15 12:05:50 EDT 2012


On Wed, Aug 15, 2012 at 11:40 AM, S. Daniel Francis
<francis at sugarlabs.org> wrote:
> So, the Python strings can be encoded in a Unicode compatible charset
> like utf-8, the Python Unicode type is a way to encode a string if you
> don't like to add a header and the recommended way to work in the
> program internally, so you mustn't use it for output, you will have to
> encode the content of type PyUnicode in a PyString with the UTF-8
> charset for the output and it'll not generate any conflict.

In general, yes, switching to the assumption that strings are UTF-8
will not cause any conflict, especially since that is what we had
before, as others pointed out.

Outside of existing Sugar code, you have to be careful with libraries
that deal with binary data. Before UTF-8, handling binary data and
handling strings was just about the same. For example, a sequence of
bytes and a sequence of chars would both work transparently with len()
and general array manipulations (e.g. myvar[33]).
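
A minimal sketch of that difference, assuming a sample word chosen only
for illustration (the variable names are hypothetical, not from Sugar):
once a string contains a non-ASCII character, the decoded text and its
UTF-8 bytes no longer agree on length or on positions.

```python
# -*- coding: utf-8 -*-
# Illustrative sample: "manana" with n-tilde (U+00F1) as the third character.
text = u"ma\u00f1ana"         # text: 6 characters
raw = text.encode("utf-8")    # bytes: U+00F1 takes two bytes in UTF-8

print(len(text))   # number of characters
print(len(raw))    # number of bytes, one more than the characters

# Position 2 is one whole character in the text, but only the first
# half of that character in the byte sequence.
print(text[2] == u"\u00f1")
print(raw[2:4] == b"\xc3\xb1")
```

Code that uses len() or an index to walk a buffer therefore has to know
which of the two it was handed.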

So, for example, a pure Python zip compression/decompression
implementation now needs to state clearly that it is _not_ working on
UTF-8 streams.
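
As a rough illustration, here is a sketch that uses the standard zlib
module as a stand-in for such a compression library (it is not pure
Python, and the payload and helper name are made up for the example).
The data stays a byte sequence end to end, and arbitrary binary input
does not decode as UTF-8.

```python
import zlib

# Hypothetical binary payload: every possible byte value, once each.
payload = bytes(bytearray(range(256)))

# Compress and decompress while keeping everything as raw bytes.
packed = zlib.compress(payload)
restored = zlib.decompress(packed)


def is_valid_utf8(data):
    """Return True if the byte sequence decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(restored == payload)     # the round trip preserves the bytes
print(is_valid_utf8(payload))  # the payload is not UTF-8 text
```

Any layer that tried to decode or re-encode these buffers as text along
the way would either fail or corrupt them.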

We aren't really making an ASCII to UTF-8 transition; we are restoring
UTF-8-as-default, so this is not an issue here. But any time you make
such a transition, you have to review & retest any code working with
raw binary data.

cheers,


m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

