[sugar] Utility to run multiple OLPC/Sugar instances in UML on virtual network

Andrew Clunis orospakr
Sun Nov 5 11:03:40 EST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey folks,

This is my first post on the Sugar list. :)

I've spent some time writing a utility that runs a set of Sugar
virtual machines under UML.  The idea is to facilitate debugging of
Sugar itself and of Activities that make heavy use of the network.

Without something like this, I imagine this sort of work would get
very tedious very quickly, because one would have to use several
machines to test this stuff effectively (plus all the sneakernet use
that would entail).

I must admit I was inspired by Michael Richardson's test suite for
Openswan, which does something similar in that it runs a set of UML
instances in a specific network topology so that they can effectively
test the entire IPsec stack without ever leaving the machine.

The script can be found at:
http://orospakr.is-a-geek.org/stuff/olpc_uml/olpc_uml_emu.py.
SHA1 sum is: cffe967fd64858dc5d990b4c13ac07be3c37dbc2

As described in the documentation, the disk image only has a few small
modifications.  It would be awesome if we could see some love between
this and pilgrim...

I apologise for any ugliness in the script, but it's not like I could
have written a shiny MVC framework for this (that and I'm down and
dirty with UNIX and UML in all its buggy glory, heh).

Documentation follows:

One Laptop Per Child Sugar UML Test Environment script version 0.1

usage: 
    ... read the below document! ;)
    
    olpc_uml_emu.py
    
    Author:         Andrew Clunis <orospakr at linux.ca>
                    Released under the GNU General Public License, version 2.
    
    Date:           Nov 5, 2006
    Version:        0.1
    
    Description:    This script allows you to run multiple OLPC CM1 instances
                    simultaneously within separate User Mode Linux instances,
                    all connected to a virtual network emulated at the Layer 2
                    level.
                    
                    The practical upshot of all this is that you can create a
                    virtual classroom, if you will, of CM1 laptops for testing
                    Sugar and Activities without all that tedious mucking
                    about with multiple computers.
                    
                    I strongly suggest you read the following, otherwise you
                    might be in for some fun surprises...

    Dependencies:   The UML Tools package.  Under Ubuntu, the package name is
                    "uml-utilities".  You can get them and install them from
                    source at:
                    http://www.user-mode-linux.org/~blaisorblade/uml-utilities/
                    
                    ISC dhcpd v3 (does not need to be running).
                    
                    Ncurses headers (just the usual kernel deps).
                    
                    Bind installed and enabled.  Just the default bind config
                    is necessary, as it needs to answer DNS queries for the
                    instances (I haven't implemented NAT yet, so this isn't
                    strictly necessary...).
                    
                    Reconfigured /dev/shm, most likely.  UML actually stores
                    its memory image in /dev/shm, and thus it is important to
                    ensure that the /dev/shm tmpfs is at least large enough to
                    accommodate all four of the UMLs' physical memory.  The
                    default is generally half of your system's physical memory,
                    and it is quite likely that this will not be enough.  This
                    would be hard to diagnose, because the failure condition is
                    that some of the UML instances will oops for no real reason,
                    citing "Kernel Mode Signal 7".  Specify the desired /dev/shm
                    size by changing the SHM_SIZE directive in
                    /etc/default/tmpfs to the desired size in bytes.  It
                    appears to be OK to set this to a size larger than your
                    physical RAM, as memory is only taken on demand and swapping
                    will be done as needed.
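
A quick way to check the /dev/shm sizing before starting the instances might look like the sketch below. It is written for Python 3 and is not part of olpc_uml_emu.py; the helper names and the 128 MiB per-instance figure are assumptions made up for illustration (the text above only says there are four instances).

```python
import os

MIB = 1024 * 1024


def tmpfs_size_bytes(path="/dev/shm"):
    """Return the total size in bytes of the filesystem mounted at path."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks


def shm_is_large_enough(instances, mem_per_instance_mib, path="/dev/shm"):
    """True if the filesystem at path can hold every instance's memory image."""
    required = instances * mem_per_instance_mib * MIB
    return tmpfs_size_bytes(path) >= required


if __name__ == "__main__":
    # Four instances comes from the text; 128 MiB each is a hypothetical value.
    if os.path.isdir("/dev/shm") and not shm_is_large_enough(4, 128):
        print("/dev/shm looks too small; expect 'Kernel Mode Signal 7' oopses")
```

Note that this compares against the total size of the tmpfs, not the free space, so anything else already stored in /dev/shm is not accounted for.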
                    
                    A fair amount of RAM.  I did it in 512 MiB, but it was a
                    squeeze and I was definitely pushing my host a fair way
                    into swap.
                    
                    Lots of disk space.  The disk.img file alone is 500 MiB.
                    You'll want at least 2 GiB free to account for builds,
                    etc.
                    
                    There's nothing important in 172.16.0.0/16 routable from
                    your location.  I picked this RFC 1918 network because it
                    isn't very popular and hopefully will tread on as few toes
                    as possible.
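
To see whether a network you already route to would clash with 172.16.0.0/16, something along these lines should do. This is a sketch using Python 3's standard ipaddress module, not code from the script, and the function name is made up for illustration.

```python
import ipaddress

# The network used for the virtual OLPC LAN, as stated above.
OLPC_NET = ipaddress.ip_network("172.16.0.0/16")


def collides_with_olpc_net(cidr):
    """True if the given IPv4 network overlaps the virtual OLPC network."""
    return ipaddress.ip_network(cidr, strict=False).overlaps(OLPC_NET)
```

For example, collides_with_olpc_net("172.16.42.0/24") is true, while a typical home network such as "192.168.1.0/24" is not affected.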
                    
                    Everything else needed should be satisfied by the
                    dependencies for sugar-jhbuild.
                    
    Network:        Given the nature of virtual networking, the setup procedure
                    is somewhat invasive.  This program needs to be temporarily
                    run as root to set up the TAP adapter and a DHCP server
                    for the UML instances.  This is done in a non-persistent
                    way; the computer will be back to its prior state upon
                    reboot.
                    
    How it Works:   Basically, these steps occur when you invoke 'run':
    
                    1. olpc_uml_emu.py starts up four Xephyrs.
                    
                    2. olpc_uml_emu.py starts up four UMLs, each with a CoW
                       image, so they all "share" the same disk.img, but
                       maintain their own set of differences as they go.
                       Each one is given the boot option uml_jhbuild, which
                       contains the path on your *host* of your jhbuild
                       directory.
                       
                    3. All the UMLs connect to the uml_switch daemon started
                       by olpc_uml_emu.py netstart, which is basically that,
                       a virtual Ethernet switch.  This virtual switch is also
                       plugged into a TAP device on the host computer, created
                       by olpc_uml_emu.py netsetup as well.
                       
                    4. All the userlands in the UMLs come up, and
                       NetworkManager fetches an IP address from the dhcpd
                       server running on the host machine.
                       
                    5. A special initscript in the userland of each UML detects
                       the uml_jhbuild= boot option and hostfs mounts the
                       directory it names (basically, UML maps a piece of the
                       host's VFS into its own).  It is important to note that
                       the UML maps it into the *same place* on its own VFS as
                       it is on the host.  I did this because jhbuild gets very
                       upset if it is run in a different absolute path from
                       the one it was built in.  From there, the initscript
                       invokes olpc_uml_emu.py start-olpc-environment, a secret
                       ninja hidden option.
                       
                    6. With control back in the hands of olpc_uml_emu.py,
                       I am able to infer which instance the UML is by
                       looking up the UML's IPv4 address in the list of
                       instances stored in the script.  This trick allows all
                       the UMLs to use the same rootfs but still boot up
                       pointing to their respective X servers or execute any
                       other instance-specific behaviours.
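
Steps 2, 5 and 6 above can be sketched roughly as follows. This is a Python 3 illustration under stated assumptions, not the code in olpc_uml_emu.py: the IPv4 addresses, the ./linux path, the .cow file names and all function names are invented. The instance names come from this document, as do displays :40 and :41; :42 and :43 are inferred from the "etc." in the Known Issues section. ubd0=<cow>,<backing> and eth0=daemon are standard UML boot options.

```python
# Hypothetical address table: only the names and :40/:41 are from the document.
INSTANCES = {
    "172.16.0.10": ("Maurice", ":40"),
    "172.16.0.11": ("Sally", ":41"),
    "172.16.0.12": ("Miles", ":42"),
    "172.16.0.13": ("Antoine", ":43"),
}


def uml_argv(name, jhbuild_dir, disk_image="disk.img"):
    """Build a UML command line for one instance (steps 2 and 3)."""
    return [
        "./linux",  # assumed location of the UML kernel binary
        # Private copy-on-write layer on top of the shared, read-only image.
        "ubd0=%s.cow,%s" % (name.lower(), disk_image),
        # Attach eth0 to the uml_switch daemon's default socket.
        "eth0=daemon",
        # Host path of the jhbuild directory, read back by the initscript.
        "uml_jhbuild=%s" % jhbuild_dir,
    ]


def jhbuild_dir_from_cmdline(cmdline):
    """Return the value of uml_jhbuild= in a kernel command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "uml_jhbuild" and sep:
            return value
    return None


def instance_for_address(ipv4):
    """Map a guest's IPv4 address to its (name, X display) pair, or None."""
    return INSTANCES.get(ipv4)
```

Inside a guest, the command line would typically be read from /proc/cmdline and passed to jhbuild_dir_from_cmdline(); the resulting path is then the hostfs mount point, identical to the path on the host.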
                       
                    
    Usage:          Place this file in your sugar-jhbuild/ directory.
                    To build UML and set up the environment:
                    
                    ./olpc_uml_emu.py setup
                    
                    To temporarily configure the virtual TAP network:
                    
                    sudo ./olpc_uml_emu.py --tapowner=yourusername netstart
                    
                    I've tried very hard to make sure I won't futz up folks'
                    workstations with this too badly.  This does not persist
                    across reboots.
                    
                    To run the emulated instances:
                    
                    ./olpc_uml_emu.py run
                    
                    The --tapdev option can be used with both run and setup
                    to specify a different TAP device than the default, should
                    you be using tap0 for something already.

                    You should only need to run setup once per release of
                    olpc_uml_emu.py.  However, netsetup needs to be done on
                    every boot of the host.
                    
                    If you run into any kind of trouble, in particular where
                    stuff appears to silently fail, look in the log/ directory!
                    I log quite a bit of stuff in there.  It's also worth noting
                    that the way I've set up the initscript in the disk image
                    causes all sugar errors to end up in the log file for that
                    UML, which should make debugging easier.  tail -f works
                    nicely here as well.
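
If you would rather get a one-shot summary than keep several tail -f sessions open, a small helper like the one below can print the last few lines of every file in log/. It is a Python 3 sketch and not part of the script; the function names are made up, and nothing is assumed about how the log files are named.

```python
import collections
import glob
import os


def tail_lines(path, count=10):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, errors="replace") as handle:
        last = collections.deque(handle, maxlen=count)
    return [line.rstrip("\n") for line in last]


def tail_all(log_dir="log", count=10):
    """Map each regular file in log_dir to its last `count` lines."""
    result = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "*"))):
        if os.path.isfile(path):
            result[path] = tail_lines(path, count)
    return result


if __name__ == "__main__":
    for path, lines in tail_all().items():
        print("==> %s <==" % path)
        print("\n".join(lines))
```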
                    
                    Naturally, --help will give you a list of all the switches.

    What it makes:  The setup function will create the following directories
                    and files in jhbuild/:

                    uml_test_env/-+-disk.img     # UML disk image containing the
                                  |              # Fedora Core ext3 image.
                                  |              # Hacked from J5's boot
                                  |              # image.  This won't be
                                  |              # modified at runtime.
                                  +-kernel/      # linux kernel build directory.
                                  +-instance/    # instance specific files.

                    There are four instances, "Maurice", "Sally", "Miles", and
                    "Antoine".

                    To get rid of or reset the UML test system, just remove
                    the uml_test_env/ directory.
                    
                    To see more help information, run olpc_uml_emu.py --help.
                    
    The Disk Image: This script basically uses J5's boot image, with a few
                    small modifications, including the addition of a small
                    initscript enabled in runlevel 5 with priority 99 and the
                    disabling of most of the virtual consoles in /etc/inittab,
                    because otherwise UML helpfully spams your desktop with
                    xterms.
                    
    Known Issues:   This script was written on Ubuntu, and I may have
                    inadvertently committed some Ubuntu/Debianisms.  Since the
                    target audience here will mostly be running Fedora Core
                    systems, a few bugs are possible.  But I really can't be
                    bothered to haul down Fedora Core just to test this.
                    Patches welcome. :)
                    
                    Boot time is a little slow.  This is mostly udev's fault,
                    which with the load of all four instances starting up takes
                    about a minute on modest hardware.
                    
                    Forget about x86_64.  But then, Sugar doesn't work there
                    anyway (-fPIC screwups in Mozilla).  Xephyr also seems to
                    be broken there, at least on Ubuntu (it might just be
                    nvidia being screwed up again, though).
                    
                    I haven't implemented NAT yet, so the instances don't yet
                    have IPv4 Internet access.
                    
                    UML uses some very hacky^Wclever tricks in order to make
                    its terminal emulation work right.  It "discovers" the tty
                    it's running on, regardless of whether or not its stdout
                    is redirected elsewhere, and puts it into "raw" mode.
                    Thus, the tty that olpc_uml_emu.py runs on gets screwed
                    up on a regular basis. In fact the reason why I made a quick
                    GTK GUI with the "Stop OLPC emulators" button on it was
                    because this broke interactive use of the terminal.  At
                    any rate, I now have `stty sane` incantations in strategic
                    places in the code so as to at least return your tty to
                    you in a usable state.
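
The general shape of that tty workaround is a try/finally around whatever launches UML. The following Python 3 sketch shows the idea; it is not lifted from the script, and the function names are invented for illustration.

```python
import subprocess
import sys


def restore_tty():
    """Run `stty sane` if stdin is a terminal; do nothing otherwise."""
    if sys.stdin is not None and sys.stdin.isatty():
        subprocess.call(["stty", "sane"])


def run_and_restore(action, restore=restore_tty):
    """Call action(), then always restore the terminal, even on errors."""
    try:
        return action()
    finally:
        restore()
```

The restore argument exists only so the behaviour can be exercised without a real terminal; in normal use the default is enough.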
                    
                    For some reason, the pilgrim disk image does not support
                    IPv6 out of the box.  Despite this, I've made a point of
                    making this script build UML with IPv6 support enabled
                    because more and more of Sugar will ultimately rely on
                    IPv6.  However, I'm not bothering (unless people clamour
                    for it) to add support for talking to the outside world
                    via IPv6 because there simply *isn't* any IPv6 Internet to
                    talk to (yet). :(
                     
                    There are two weird-isms to do with hostfs and jhbuild.
                    jhbuild requires that it have the same absolute directory
                    name in the VFS when it is run as it did when built.  Thus,
                    jhbuild is mounted in the same directory inside the UMLs
                    as it is on the host machine.  If you didn't have it in a
                    particularly weird place, this should be fine.  However,
                    all of the UIDs on the files will appear as integers
                    (or perhaps just a nonsensical username, depending on how
                    your host box was configured).  This shouldn't cause any
                    major issues.
                    
                    Right now the disk image contains an old copy of Sugar 
                    installed from RPMs.  While this isn't really a big deal
                    because jhbuild does a fine job of isolating the build copy
                    from any system stuff, it could potentially cause problems
                    in the future.  A new pilgrim build target just for
                    olpc_uml_emu.py might make sense.
                    
                    For some reason, if an X11 Bad Auth error happens at
                    startup, the GTK library seems to forcibly quit the
                    python interpreter, rather than raise an exception.
                    Thus, whenever this condition occurs, the script will fail,
                    even if you're just using the netstart and netstop options,
                    which don't use GTK (after a su root, for instance).  Unset
                    DISPLAY to get around it.
                    
                    Sometimes UML gets all confused and pins the CPU at 100% and
                    doesn't quit.  killall linux is your friend.
                    
                    You have to enter the machine nicknames manually into
                    sugar-shell at first boot, and the Xephyrs don't have very
                    identifying titles.  Sugar's first boot screen has no means 
                    of setting a default name in the Entry box, and Xephyr has
                    no means to change the title.  I am sorry, but this means 
                    that the nicknames of the machines are nonobvious at 
                    runtime.  My advice is just to remember the X11 display
                    numbers.  :40 is Maurice, :41 is Sally, etc.
                    
                    There's no (useful) sound support.  UML doesn't have any
                    ALSA support at all, and the OSS support doesn't seem to
                    work even though I enabled it in .config.  Sorry, TamTam.
                    
                    The X servers listen on 0.0.0.0 with access controls
                    disabled.  'nuff said.
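
For anyone wanting to see where that comes from, a Xephyr invocation of this kind might be built as in the sketch below. -ac is the standard X server flag that disables access control and -screen sets the Xephyr window size; whether the script passes exactly these arguments is an assumption, and the function name is made up. The 640x480 default matches the --xwidth and --xheight defaults listed below.

```python
def xephyr_argv(display, width=640, height=480, access_control=False):
    """Build a Xephyr command line for one instance's X display."""
    argv = ["Xephyr", display, "-screen", "%dx%d" % (width, height)]
    if not access_control:
        # -ac turns host-based access control off, so any client may connect.
        argv.append("-ac")
    return argv
```

Leaving -ac out would mean each guest needs some other way to authenticate to its X server, which this sketch does not attempt to solve.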
                    
    Happy Hacking!               

usage: olpc_uml_emu.py [options] <netsetup, setup, start, netstop>

options:
  -h, --help            show this help message and exit
  -x XWIDTH, --xwidth=XWIDTH
                        Width of the X instances. (default: 640)
  -y XHEIGHT, --xheight=XHEIGHT
                        Height of the X instances. (default: 480)
  -t TAPDEV, --tapdev=TAPDEV
                        TAP network adapter name to create/use the virtual
                        OLPC network on. DON'T use a 'tun' device. (default:
                        tap0)
  -o TAPOWNER, --tapowner=TAPOWNER
                        Owner of the tap device created by netstart.  This
                        will probably be the user who will run the OLPC test
                        environment.
~~~

Regards,
Andrew Clunis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFFTa9PALkUMXSNow8RAroWAKCzvHqMWlx6D+lfeGNpQ/q8T9ltVACdHgko
VHmy5HvKspRw820Pi9JCTv4=
=MmdR
-----END PGP SIGNATURE-----

