[Sugar-devel] [PATCH sugar] Do not display APs that announce the ssid in invalid utf-8 data, OLPC #11698

Simon Schampijer simon at schampijer.de
Wed Mar 28 04:12:03 EDT 2012


On 03/27/2012 11:32 PM, Sascha Silbe wrote:
> Excerpts from Simon Schampijer's message of 2012-03-27 10:19:30 +0200:
>
>> Sugar is not doing well in dealing with non-utf8 data. If an AP
>> does not announce the ssid in valid utf-8 data Sugar will fail in
>> certain ways.
>
> That's an odd way to put it. The actual problem is that - similar to
> POSIX file systems - IEEE 802.11 [7] only defines the SSID to be a
> sequence of octets (i.e. bytes), but Sugar treats it as UTF-8 character
> data. While in most cases the SSID is actually some human-readable
> string, there's neither a guarantee for that nor does any (de-facto or
> de-jure) standard specify the encoding to use. As a result, we'll
> encounter SSIDs in a large variety of encodings and will also need to
> cope with arbitrary byte strings. Any assumption of a single (or in fact
> any) character encoding is incorrect.
>
> The D-Bus API of NetworkManager 0.9 [8] passes SSIDs as uninterpreted
> byte strings (D-Bus signature "ay"). Before SSIDs can be displayed on
> screen, some kind of interpretation must happen. A hex dump would be the
> most straightforward and simple approach, but also the least useful to
> users (for obvious reasons).
>
> networkmanager-applet (and probably its successor in the Gnome 3 Shell)
> uses nm_utils_ssid_to_utf8() [9] from libnm-util to get a display name
> (in UTF-8) for a given SSID. This function contains a set of heuristics
> to interpret the SSID and should produce useful results in the majority
> of cases where the user has the same language configured as the operator
> of the access point. Arguably it could do even better by falling back to
> hex escapes (rather than a question mark) for bytes that cannot be
> converted to a UTF-8 character (because all heuristics failed or it
> simply isn't a representation of a character), but hopefully those are
> rare enough in practice not to matter.
>
> Some of this has already been discussed (by yourself and others) in
> SL#2023 [6]; I'm surprised that ticket wasn't referenced in the
> description as it's the relevant upstream (i.e. Sugar Labs) ticket.
>
>
>> In some cases Sugar will crash when fed non-UTF-8 data as the SSID:
>> glib.markup_escape_text segfaults when Sugar tries to display the
>> SSID, because glib.markup_escape_text assumes that correct UTF-8 data
>> is passed to it [3].
>
> While it doesn't explicitly state that assumption in the docs you cite,
> it also doesn't promise to handle non-UTF-8 data properly. Seems we need
> another round of inspecting the code for potential sources of invalid
> data and catch malformed UTF-8 strings.

It says: "the UTF-8 string to be escaped". And
https://bugzilla.gnome.org/show_bug.cgi?id=672546 also says that
you should feed it valid UTF-8 data.

>> The patch does check early when the AP is announced by NM and verify
>> that we do have a ssid with valid utf-8 data. If not, we don't display
>> the AP and log a debug message.
>
> That may be a reasonable stopgap measure downstream for the crash you
> encounter [2], but I don't think we should do this upstream. Assuming
> UTF-8 is simply incorrect; the way we currently handle it and the way
> you propose both mean users are rendered unable to use the access points
> in question (rather than just visual artifacts). While it's technically
> not a regression, we should fix it properly rather than putting a small
> band-aid on a major hole.

No, it is not just about a visual artifact. Even if the AP is displayed
and the Palette shows no SSID, you cannot connect to that AP, because
we try to send this non-UTF-8 string over D-Bus, and D-Bus chokes on
data that is not valid UTF-8.
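
To illustrate the kind of early check described above, here is a minimal
sketch. It is not the patch itself: the function name is hypothetical, and
it assumes the SSID arrives either as a bytes object or as a sequence of
integers in the range 0-255 (as a D-Bus "ay" value would).

```python
def is_valid_utf8_ssid(ssid):
    """Return True if the raw SSID octets decode cleanly as UTF-8.

    `ssid` may be a bytes object or any sequence of integers 0-255.
    Hypothetical helper for illustration only.
    """
    raw = bytes(bytearray(ssid))  # normalise to a plain byte string
    try:
        raw.decode('utf-8')       # strict decoding raises on invalid data
    except UnicodeDecodeError:
        return False
    return True
```

An AP whose SSID fails this check would be skipped and a debug message
logged, instead of the invalid string reaching glib.markup_escape_text or
D-Bus.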

> I can imagine the following approaches:
>
> 1. Use nm_utils_ssid_to_utf8() via gobject-introspection:
>
>     >>>  from gi.repository import NetworkManager
>     >>>  print unicode(NetworkManager.utils_ssid_to_utf8(list("äöüß")), 'utf-8')
>     äöüß
>
>     We know that PyGTK and GTK3 via gobject-introspection shouldn't be
>     mixed inside a single process; does that apply to mixing PyGTK with
>     gobject-introspection in general or would calling a self-contained
>     utility function be fine?

We can do this once the shell is ported to gobject-introspection and
our code is ported to NetworkManager glib.

> 2. Use nm_utils_ssid_to_utf8() via ctypes.
>
>     I didn't get this to work within a few minutes (returns an empty
>     string), but I don't see any fundamental reason this shouldn't work.

You will hit https://bugzilla.gnome.org/show_bug.cgi?id=672889 going 
down that road.

import ctypes
libnm = ctypes.CDLL('libnm-util.so')
s = getattr(libnm, 'nm_utils_ssid_to_utf8')
s('hallo')

> 3. Duplicate (a subset of) nm_utils_ssid_to_utf8() in Sugar.
>
>     It should be pretty straightforward to write a Python function that
>     can at least handle the most common cases. At the very least we can
>     do a hex dump for non-UTF-8 strings, that would be only slightly more
>     complex than your patch.

I doubt that a user would recognize their desired AP from a hex dump.
I think doing the same as NM-applet does is a good approach. We can do
(1) when we port the shell. I think for 0.96.0 my fix is fine, unless
you have a spare moment this week to code up nm_utils_ssid_to_utf8 in
Python.
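
For reference, a rough sketch of what a reduced Python version of option
(3) might look like. It does not reproduce the heuristics of
nm_utils_ssid_to_utf8(); the function name and the `fallback_encodings`
parameter are hypothetical, and it assumes Python 3.5 or later (for the
'backslashreplace' error handler on decoding). It tries UTF-8 first, then
any encodings the caller supplies, and finally falls back to hex escapes
for the bytes that could not be decoded.

```python
def ssid_to_display_name(ssid, fallback_encodings=()):
    """Turn raw SSID octets into a displayable unicode string.

    `ssid` may be a bytes object or a sequence of integers 0-255.
    `fallback_encodings` is a hypothetical list of codec names to try
    when the SSID is not valid UTF-8 (e.g. chosen from the locale).
    """
    raw = bytes(bytearray(ssid))
    for encoding in ('utf-8',) + tuple(fallback_encodings):
        try:
            return raw.decode(encoding)  # strict: fails on invalid bytes
        except UnicodeDecodeError:
            continue
    # Last resort: keep decodable parts, show the rest as \xNN escapes.
    return raw.decode('utf-8', 'backslashreplace')
```

Note that this only produces a display name; the original byte string
would still have to be kept for talking to NetworkManager.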

Regards,
    Simon
