[Sugar-devel] [PATCH sugar] Do not display APs that announce the ssid in invalid utf-8 data, OLPC #11698

Sascha Silbe silbe at activitycentral.com
Tue Mar 27 17:32:46 EDT 2012


Excerpts from Simon Schampijer's message of 2012-03-27 10:19:30 +0200:

> Sugar is not doing well in dealing with non-utf8 data. If an AP
> does not announce the ssid in valid utf-8 data Sugar will fail in
> certain ways.

That's an odd way to put it. The actual problem is that - similar to
POSIX file systems - IEEE 802.11 [7] only defines the SSID to be a
sequence of octets (i.e. bytes), but Sugar treats it as UTF-8 character
data. While in most cases the SSID is actually some human-readable
string, there's neither a guarantee for that nor does any (de-facto or
de-jure) standard specify the encoding to use. As a result, we'll
encounter SSIDs in a large variety of encodings and will also need to
cope with arbitrary byte strings. Any assumption of a single (or in fact
any) character encoding is incorrect.

The D-Bus API of NetworkManager 0.9 [8] passes SSIDs as uninterpreted
byte strings (D-Bus signature "ay"). Before SSIDs can be displayed on
screen, some kind of interpretation must happen. A hex dump would be the
most straightforward and simple approach, but also the least useful to
users (for obvious reasons).
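
For illustration, a minimal sketch of such a hex dump, written for
Python 3. The helper name ssid_hex_dump is made up for this example, and
it assumes the SSID arrives as a bytes object or as a sequence of
integer octets (which is what a D-Bus "ay" value amounts to):

```python
def ssid_hex_dump(ssid):
    """Render an SSID (bytes or a sequence of octets) as colon-separated hex."""
    return ":".join("{:02x}".format(octet) for octet in bytearray(ssid))


# "caf" followed by 0xe9 (Latin-1 e-acute), which is not valid UTF-8 on its own
print(ssid_hex_dump(b"caf\xe9"))
# The same octets as a plain list of integers
print(ssid_hex_dump([0x63, 0x61, 0x66, 0xE9]))
```

This never fails regardless of the input, which is exactly why it is
the simplest option, but "63:61:66:e9" is clearly harder to recognise
than a decoded name.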

networkmanager-applet (and probably its successor in the GNOME 3 Shell)
uses nm_utils_ssid_to_utf8() [9] from libnm-util to get a display name
(in UTF-8) for a given SSID. This function contains a set of heuristics
to interpret the SSID and should produce useful results in the majority
of cases where the user has the same language configured as the operator
of the access point. Arguably it could do even better by falling back to
hex escapes (rather than a question mark) for bytes that cannot be
converted to a UTF-8 character (because all heuristics failed or it
simply isn't a representation of a character), but hopefully those are
rare enough in practice not to matter.
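
To make the difference between the two fallbacks concrete, here is a
small sketch using only the standard codec error handlers of Python 3
(3.5 or later for "backslashreplace" on decoding). It is not what
nm_utils_ssid_to_utf8() does internally; it only contrasts a lossy
placeholder with hex escapes for a byte that cannot be decoded:

```python
# Latin-1 encoded "cafe net" with an e-acute; the lone 0xe9 is invalid UTF-8
raw = b"caf\xe9 net"

# Placeholder approach: the offending byte is lost (replaced by U+FFFD)
lossy = raw.decode("utf-8", errors="replace")

# Hex escape approach: the offending byte stays visible as \xe9
escaped = raw.decode("utf-8", errors="backslashreplace")

print(repr(lossy))
print(repr(escaped))
```

With hex escapes, two SSIDs that differ only in their undecodable bytes
remain distinguishable on screen, which is not the case with a
placeholder character.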

Some of this has already been discussed (by yourself and others) in
SL#2023 [6]; I'm surprised that ticket wasn't referenced in the
description as it's the relevant upstream (i.e. Sugar Labs) ticket.


> In some cases Sugar will crash when feeded non utf-8 compliant data,
> as ssid which segfaults glib.markup_escape_text when trying to display
> the ssid: glib.markup_escape_text does assume correct utf8 data to be
> passed [3].

While it doesn't explicitly state that assumption in the docs you cite,
it also doesn't promise to handle non-UTF-8 data properly. It seems we
need another round of inspecting the code for potential sources of
invalid data and catching malformed UTF-8 strings.
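
A rough sketch of what such a guard could look like, in Python 3. The
helper name safe_markup is hypothetical, and xml.sax.saxutils.escape is
used here purely as a stand-in for glib.markup_escape_text so that the
example runs without GLib bindings:

```python
from xml.sax.saxutils import escape


def safe_markup(data):
    """Return escaped markup for UTF-8 encoded bytes, or None if malformed.

    Decoding strictly first means the escaping function is only ever
    handed well-formed text.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return escape(text)


print(safe_markup(b"Alice & Bob <wifi>"))
print(safe_markup(b"\xff\xfe broken"))
```

The caller still has to decide what to do with the None case; the point
is only that untrusted byte strings get validated at the boundary
rather than deep inside the rendering code.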


> The patch does check early when the AP is announced by NM and verify
> that we do have a ssid with valid utf-8 data. If not, we don't display
> the AP and log a debug message.

That may be a reasonable stopgap measure downstream for the crash you
encounter [2], but I don't think we should do this upstream. Assuming
UTF-8 is simply incorrect; both the way we currently handle it and the
way you propose leave users unable to use the access points in question
(rather than merely showing visual artifacts). While it's technically
not a regression, we should fix it properly rather than putting a small
band-aid on a major hole.

I can imagine the following approaches:

1. Use nm_utils_ssid_to_utf8() via gobject-introspection:

   >>> from gi.repository import NetworkManager
   >>> print unicode(NetworkManager.utils_ssid_to_utf8(list("äöüß")), 'utf-8')
   äöüß

   We know that PyGTK and GTK3 via gobject-introspection shouldn't be
   mixed inside a single process; does that apply to mixing PyGTK with
   gobject-introspection in general or would calling a self-contained
   utility function be fine?

2. Use nm_utils_ssid_to_utf8() via ctypes.

   I didn't get this to work within a few minutes (returns an empty
   string), but I don't see any fundamental reason this shouldn't work.

3. Duplicate (a subset of) nm_utils_ssid_to_utf8() in Sugar.

   It should be pretty straightforward to write a Python function that
   can at least handle the most common cases. At the very least we can
   do a hex dump for non-UTF-8 strings; that would be only slightly more
   complex than your patch.
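
As a rough illustration of approach 3, here is a sketch in Python 3. It
is not a port of nm_utils_ssid_to_utf8(); the function name
ssid_display_name and the fallback_encodings parameter are invented for
this example, and which encodings to try for a given locale is left to
the caller. It tries UTF-8 first, then any candidate encodings, and
finally falls back to a hex dump:

```python
def ssid_display_name(ssid, fallback_encodings=()):
    """Return a displayable text string for an SSID given as raw octets.

    ssid may be a bytes object or a sequence of integer octets.
    fallback_encodings is a hypothetical, caller-supplied list of codec
    names to try when the SSID is not valid UTF-8.
    """
    raw = bytes(bytearray(ssid))
    for encoding in ("utf-8",) + tuple(fallback_encodings):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: show the octets rather than hiding the access point
    return "0x" + "".join("{:02x}".format(octet) for octet in bytearray(raw))


print(ssid_display_name("äöüß".encode("utf-8")))
print(ssid_display_name(b"\xff\xfe\x00"))
```

Note that the returned string is for display only. Whatever is passed
back to NetworkManager when connecting has to be the original byte
string, so the two need to be kept side by side. Control characters
that happen to decode successfully are not handled in this sketch.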


Sascha

> [1] http://developer.gnome.org/libnm-util/unstable/libnm-util-nm-utils.html#nm-utils-ssid-to-utf8
> [2] http://dev.laptop.org/ticket/11698
> [3] http://developer.gnome.org/pygobject/stable/glib-functions.html#function-glib--markup-escape-text
> [4] https://bugzilla.gnome.org/show_bug.cgi?id=672546
> [5] http://developer.gnome.org/glib/2.30/glib-Unicode-Manipulation.html#g-utf8-validate
[6] https://bugs.sugarlabs.org/ticket/2023
[7] http://standards.ieee.org/getieee802/download/802.11-2007.pdf (section 7.3.2.1)
[8] http://projects.gnome.org/NetworkManager/developers/api/09/spec.html
[9] http://projects.gnome.org/NetworkManager/developers/libnm-util/09/libnm-util-nm-utils.html#nm-utils-ssid-to-utf8
-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/