[sugar] Utility to run multiple OLPC/Sugar instances in UML on virtual network
Andrew Clunis
orospakr
Sun Nov 5 11:03:40 EST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hey folks,
This is my first post on the Sugar list. :)
I've spent some time writing a utility that runs a set of Sugar
virtual machines under UML. The idea is to make it easier to debug
Sugar itself, as well as Activities that make heavy use of the network.
Otherwise this sort of work would get very tedious very quickly,
because one would have to use several machines to test this
stuff effectively (plus all the sneakernet that would entail).
I must admit I was inspired by Michael Richardson's test suite for
Openswan, which does something similar in that it runs a set of UML
instances in a specific network topology so that they can effectively
test the entire IPsec stack without ever leaving the machine.
The script can be found at:
http://orospakr.is-a-geek.org/stuff/olpc_uml/olpc_uml_emu.py
Its SHA1 sum is: cffe967fd64858dc5d990b4c13ac07be3c37dbc2
As described in the documentation, the disk image only has a few small
modifications. It would be awesome if we could see some love between
this and pilgrim...
I apologise for any ugliness in the script, but it's not like I could
have written a shiny MVC framework for this (that and I'm down and
dirty with UNIX and UML in all its buggy glory, heh).
Documentation follows:
One Laptop Per Child Sugar UML Test Environment script version 0.1
usage:
... read the below document! ;)
olpc_uml_emu.py
Author: Andrew Clunis <orospakr at linux.ca>
Released under the GNU General Public License, version 2.
Date: Nov 5, 2006
Version: 0.1
Description: This script allows you to run multiple OLPC CM1 instances
simultaneously within separate User Mode Linux instances,
all connected to a virtual network emulated at Layer 2.
The practical upshot of all this is that you can create a
virtual classroom, if you will, of CM1 laptops for testing
Sugar and Activities without all that tedious mucking about
with multiple computers.
I strongly suggest you read the following; otherwise you
might be in for some fun surprises...
Dependencies: The UML Tools package. Under Ubuntu, the package name is
"uml-utilities". You can get them and install them from
source at:
http://www.user-mode-linux.org/~blaisorblade/uml-utilities/
ISC dhcpd v3 (does not need to be running).
Ncurses headers (the usual kernel build dependencies).
BIND installed and enabled. The default BIND config is
sufficient, as it only needs to answer DNS queries for the
instances (since I haven't implemented NAT yet, this isn't
strictly necessary...).
Reconfigured /dev/shm, most likely. UML actually stores
its memory image in /dev/shm, so it is important to
ensure that the /dev/shm tmpfs is large enough to
hold the physical memory of all four UMLs. The
default is generally half of your system's physical memory,
and it is quite likely that this will not be enough. This
is hard to diagnose, because the failure condition is
that some of the UML instances oops for no apparent reason,
citing "Kernel Mode Signal 7". Specify the desired /dev/shm
size by setting the SHM_SIZE directive in
/etc/default/tmpfs to the desired size in bytes. It
appears to be OK to set this larger than your
physical RAM, as memory is only taken on demand and swapping
will be done as needed.
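To catch the "Kernel Mode Signal 7" case before it happens, a pre-flight check can compare the size of /dev/shm with what the four UMLs need. This is a minimal sketch, not part of olpc_uml_emu.py; the per-instance memory size is an assumption (the post doesn't state it), so adjust MEM_PER_INSTANCE to whatever mem= you boot the UMLs with.

```python
import os

NUM_INSTANCES = 4
# Assumption: each UML is booted with 128 MiB; the post doesn't give a figure.
MEM_PER_INSTANCE = 128 * 1024 * 1024  # bytes


def tmpfs_size(path="/dev/shm"):
    """Return the total size in bytes of the filesystem mounted at path."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks


def shm_is_large_enough(shm_bytes, instances=NUM_INSTANCES,
                        mem_per_instance=MEM_PER_INSTANCE):
    """True if shm_bytes can hold the physical memory of every instance."""
    return shm_bytes >= instances * mem_per_instance
```

On the host, `shm_is_large_enough(tmpfs_size())` returning False means SHM_SIZE in /etc/default/tmpfs should be raised to at least NUM_INSTANCES * MEM_PER_INSTANCE bytes.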
A fair amount of RAM. I did it in 512 MiB, but it was a
squeeze and I was definitely pushing my host a fair way
into swap.
Lots of disk space. The disk.img file alone is 500 MiB.
You'll want at least 2 GiB free to account for builds,
etc.
Nothing important in 172.16.0.0/16 is routable from
your location. I picked this RFC 1918 network because it
isn't very popular and hopefully will tread on as few toes as
possible.
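Whether 172.16.0.0/16 clashes with anything on your host can be checked mechanically. A minimal sketch using Python's standard ipaddress module; feeding it your host's routes (for example, the destinations printed by `route -n` or `ip route`) is left to the caller and is an assumption about how you'd gather them.

```python
import ipaddress

# The virtual OLPC network used by the script.
OLPC_NET = ipaddress.ip_network("172.16.0.0/16")


def conflicting_routes(local_networks, olpc_net=OLPC_NET):
    """Return the CIDR strings in local_networks that overlap olpc_net."""
    conflicts = []
    for cidr in local_networks:
        # strict=False tolerates host bits, e.g. "172.16.5.1/24".
        net = ipaddress.ip_network(cidr, strict=False)
        # IPv6 routes can never clash with an IPv4 network.
        if net.version == olpc_net.version and net.overlaps(olpc_net):
            conflicts.append(cidr)
    return conflicts
```

An empty result means the virtual network shouldn't shadow any of your real routes.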
Otherwise, everything else needed should be satisfied by
the dependencies for sugar-jhbuild.
Network: Given the nature of virtual networking, the setup procedure
is somewhat invasive. This program needs to be temporarily
run as root to set up the TAP adapter and a DHCP server
for the UML instances. This is done in a non-persistent
way; the computer will be back to its prior state upon
reboot.
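Keeping the DHCP server non-persistent mostly means giving dhcpd its own config and lease file (via dhcpd's -cf and -lf options) instead of editing the system ones. The sketch below renders such an ISC dhcpd v3 config; it is hypothetical, not the script's actual config. It assumes the host end of the TAP device is 172.16.0.1 and also serves DNS, and it pins each instance to a fixed address by MAC so that per-instance IPs stay predictable; the MACs and IPs you pass in are your own choice.

```python
import ipaddress

# Assumption: the host end of the TAP device, also running BIND.
HOST_IP = "172.16.0.1"


def dhcpd_conf(instances, net="172.16.0.0/16", host_ip=HOST_IP):
    """Render an ISC dhcpd v3 config giving each instance a fixed IP.

    instances maps a lowercase instance name to a (mac, ip) tuple.
    """
    network = ipaddress.ip_network(net)
    lines = [
        "ddns-update-style none;",
        "subnet %s netmask %s {" % (network.network_address, network.netmask),
        "  option routers %s;" % host_ip,
        "  option domain-name-servers %s;" % host_ip,
        "}",
    ]
    # One host declaration per instance; dhcpd matches on the MAC address.
    for name, (mac, ip) in sorted(instances.items()):
        lines.append("host %s { hardware ethernet %s; fixed-address %s; }"
                     % (name, mac, ip))
    return "\n".join(lines) + "\n"
```

Since the subnet has no range statement, dhcpd only answers the listed MACs, which keeps it from handing out leases to anything else that ends up on the TAP network.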
How it Works: Basically, these steps occur when you invoke 'run':
1. olpc_uml_emu.py starts up four Xephyrs.
2. olpc_uml_emu.py starts up four UMLs, each with a CoW
image, so they all "share" the same disk.img, but
maintain their own set of differences as they go.
Each one is given the boot option uml_jhbuild, which
contains the path on your *host* of your jhbuild
directory.
3. All the UMLs connect to the uml_switch daemon started
by olpc_uml_emu.py netstart, which is just what it sounds
like: a virtual Ethernet switch. This virtual switch is also
plugged into a TAP device on the host computer, created
by olpc_uml_emu.py netsetup as well.
4. All the userlands in the UMLs come up, and
NetworkManager fetches an IP address from the dhcpd
server running on the host machine.
5. A special initscript in the userland of each UML detects
the uml_jhbuild= boot option and hostfs-mounts the
directory it names (hostfs basically lets UML map a piece
of the host's VFS into its own). It is important to note that the
UML maps it into the *same place* on its own VFS as it
is on the host. I did this because jhbuild gets very
upset if it is run from a different absolute path than
the one it was built in. From there the initscript
invokes olpc_uml_emu.py start-olpc-environment, a secret ninja
hidden option.
6. With control back in the hands of olpc_uml_emu.py,
I am able to infer which instance the UML is by
looking up the UML's IPv4 address in the list of
instances stored in the script. This trick allows all
the UMLs to use the same rootfs but still boot up
pointing to their respective X servers or execute any
other instance-specific behaviours.
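Steps 5 and 6 boil down to two small lookups inside each UML: reading uml_jhbuild= from the kernel command line (/proc/cmdline), and mapping the UML's own IPv4 address to an instance. A minimal sketch, not the script's code. The names and the :40/:41 displays come from this document; the :42/:43 displays for Miles and Antoine and all the IPv4 addresses are hypothetical.

```python
# Hypothetical instance table: (name, IPv4 address, X display).
# Only the names and :40/:41 are from the docs; the rest is assumed.
INSTANCES = [
    ("Maurice", "172.16.0.10", ":40"),
    ("Sally", "172.16.0.11", ":41"),
    ("Miles", "172.16.0.12", ":42"),
    ("Antoine", "172.16.0.13", ":43"),
]


def jhbuild_path(cmdline):
    """Return the value of the uml_jhbuild= boot option, or None."""
    for arg in cmdline.split():
        if arg.startswith("uml_jhbuild="):
            return arg[len("uml_jhbuild="):]
    return None


def identify_instance(my_ip, instances=INSTANCES):
    """Map this UML's IPv4 address to (name, display), or None."""
    for name, ip, display in instances:
        if ip == my_ip:
            return name, display
    return None
```

Inside a UML, `jhbuild_path(open("/proc/cmdline").read())` yields the directory to hostfs-mount, and the display from `identify_instance()` tells Sugar which Xephyr on the host to point DISPLAY at.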
Usage: Place this file in your sugar-jhbuild/ directory.
To build UML and set up the environment:
./olpc_uml_emu.py setup
To temporarily configure the virtual TAP network:
sudo ./olpc_uml_emu.py --tapowner=yourusername netstart
I've tried very hard to make sure I won't futz up folks'
workstations with this too badly. This does not persist
across reboots.
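Roughly, netstart has to create a TAP device owned by your user, give the host end an address, and start uml_switch on it. The sketch below only builds those command lines (using tunctl and uml_switch from uml-utilities, plus ifconfig); it is an illustration under assumptions, not the script's actual sequence. The host address 172.16.0.1 is assumed, and starting dhcpd is left out.

```python
def netstart_commands(tapowner, tapdev="tap0", host_ip="172.16.0.1",
                      netmask="255.255.0.0"):
    """Return the argv lists needed to bring the virtual network up."""
    return [
        # Create a TAP device owned by the unprivileged user; it lasts
        # until deleted or until the host reboots.
        ["tunctl", "-u", tapowner, "-t", tapdev],
        # Give the host end of the TAP device an address on the OLPC net.
        ["ifconfig", tapdev, host_ip, "netmask", netmask, "up"],
        # Virtual Ethernet switch, plugged into the TAP device. It runs in
        # the foreground, so a caller would start it in the background.
        ["uml_switch", "-tap", tapdev],
    ]


def netstop_commands(tapdev="tap0"):
    """Return argv lists undoing netstart (uml_switch is killed separately)."""
    return [
        ["ifconfig", tapdev, "down"],
        ["tunctl", "-d", tapdev],
    ]
```

Each list can be handed to `subprocess.check_call()` when running as root; keeping them as data makes it easy to log exactly what was done to the host.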
To run the emulated instances:
./olpc_uml_emu.py run
The --tapdev option can be used with both run and setup
to specify a different TAP device than the default, should
you be using tap0 for something already.
You should only need to run setup once per release of
olpc_uml_emu.py. However, netsetup needs to be done on
every boot of the host.
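For each instance, run starts a Xephyr and a UML booted on a copy-on-write layer over the shared disk.img and attached to the switch. A minimal sketch of those command lines, assuming the uml_test_env/ layout described later in this document. The kernel binary name `linux` matches the `killall linux` tip below; the COW file naming, `mem=128M`, and the switch socket `/tmp/uml.ctl` (uml_switch's usual default) are assumptions.

```python
import os


def xephyr_argv(display, width=640, height=480):
    """Nested X server for one instance; -ac disables access control."""
    return ["Xephyr", display, "-screen", "%dx%d" % (width, height), "-ac"]


def uml_argv(name, env_dir, jhbuild_dir, mem="128M",
             switch_ctl="/tmp/uml.ctl"):
    """UML command line for one instance (file names are assumptions).

    ubd0=<cow>,<backing> layers a per-instance COW file over the shared
    disk.img; eth0=daemon,,unix,<ctl> attaches eth0 to uml_switch.
    """
    cow = os.path.join(env_dir, "instance", name.lower() + ".cow")
    disk = os.path.join(env_dir, "disk.img")
    kernel = os.path.join(env_dir, "kernel", "linux")
    return [
        kernel,
        "ubd0=%s,%s" % (cow, disk),
        "eth0=daemon,,unix,%s" % switch_ctl,
        "mem=%s" % mem,
        "umid=%s" % name.lower(),
        # Read back by the initscript inside the UML.
        "uml_jhbuild=%s" % jhbuild_dir,
    ]
```

Because disk.img is only ever the backing file, all four instances can share it while their changes accumulate in the separate .cow files.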
If you run into any kind of trouble, in particular where
stuff appears to silently fail, look in the log/ directory!
I log quite a bit of stuff in there. It's also worth noting
that the way I've set up the initscript in the disk image
causes all sugar errors to end up in the log file for that
UML, which should make debugging easier. tail -f works
nicely here as well.
Naturally, --help will give you a list of all the switches.
What it makes: The setup function will create the following directories
and files in jhbuild/:
uml_test_env/-+-disk.img      # UML disk image containing the
              |               # Fedora Core ext3 image.
              |               # Hacked from J5's boot
              |               # image. This won't be
              |               # modified at runtime.
              +-kernel/       # linux kernel build directory.
              +-instance/     # instance specific files.
There are four instances, "Maurice", "Sally", "Miles", and
"Antoine".
To get rid of or reset the UML test system, just remove
the uml_test_env/ directory.
To see more help information, run olpc_uml_emu.py --help.
The Disk Image: This script basically uses J5's boot image, with three
modifications, including the addition of a small initscript enabled
in runlevel 5 with priority 99, and the disabling of most of the
virtual consoles in /etc/inittab, because otherwise
UML helpfully spams your desktop with xterms.
Known Issues: This script was written on Ubuntu, and I may have
inadvertently committed some Ubuntu/Debianisms. Since the
target audience here will mostly be running Fedora Core
systems, a few bugs are possible. But I really can't be
bothered to haul down Fedora Core just to test this.
Patches welcome. :)
Boot time is a little slow. This is mostly udev's fault:
with all four instances starting up at once, it takes
about a minute on modest hardware.
Forget about x86_64. But then, Sugar doesn't work there
anyway (-fPIC screwups in Mozilla). Xephyr also seems to
be broken there, at least on Ubuntu (it might just be
nvidia being screwed up again, though).
I haven't implemented NAT yet, so the instances don't yet
have IPv4 Internet access.
UML uses some very hacky^Wclever tricks in order to make
its terminal emulation work right. It "discovers" the tty
it's running on, regardless of whether or not its stdout
is redirected elsewhere, and puts it into "raw" mode.
Thus, the tty that olpc_uml_emu.py runs on gets screwed
up on a regular basis. In fact, the reason I made a quick
GTK GUI with the "Stop OLPC emulators" button on it was
that this broke interactive use of the terminal. At
any rate, I now have `stty sane` incantations
in strategic places in the code so as to at least return
your tty to you in a usable state.
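An alternative to `stty sane`, swapped in here rather than taken from the script, is to save the terminal's exact attributes with Python's termios module before launching the UMLs and restore them afterwards. A minimal sketch; it only helps if the Python process exits normally, so `stty sane` remains the fallback after a crash or kill.

```python
import contextlib
import os
import termios


@contextlib.contextmanager
def preserved_tty(fd):
    """Save the terminal attributes of fd and restore them on exit.

    Does nothing if fd is not a terminal (e.g. input is redirected).
    """
    if not os.isatty(fd):
        yield
        return
    saved = termios.tcgetattr(fd)
    try:
        yield
    finally:
        # TCSADRAIN applies the settings after pending output is written.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Usage would be along the lines of `with preserved_tty(sys.stdin.fileno()):` around the code that starts and waits for the UMLs.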
For some reason, the pilgrim disk image does not support
IPv6 out of the box. Despite this, I've made a point of
making this script build UML with IPv6 support enabled
because more and more of Sugar will ultimately rely on
IPv6. However, I'm not bothering (unless people clamour
for it) to add support for talking to the outside world
via IPv6, because there simply *isn't* any IPv6 Internet to talk
to (yet). :(
There are two weird-isms to do with hostfs and jhbuild.
jhbuild requires that it have the same absolute directory
name in the VFS when it is run as it did when built. Thus,
jhbuild is mounted in the same directory inside the UMLs
as it is on the host machine. Unless you keep it in a
particularly weird place, this should be fine. However,
the UIDs on the files will appear as integers
(or perhaps as a nonsensical username, depending on how
your host box was configured). This shouldn't cause any
major issues.
Right now the disk image contains an old copy of Sugar
installed from RPMs. While this isn't really a big deal
because jhbuild does a fine job of isolating the build copy
from any system stuff, it could potentially cause problems
in the future. A new pilgrim build target just for
olpc_uml_emu.py might make sense.
For some reason, if an X11 Bad Auth error happens at
startup, the GTK library seems to forcibly quit the
Python interpreter rather than raise an exception.
Thus, whenever this condition occurs (after a su to root,
for instance), the script will fail, even if you're just
using the netstart and netstop options, which don't use
GTK. Unset DISPLAY to get around it.
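The same workaround could be applied inside the script itself by dropping DISPLAY for the network-only commands before GTK is ever imported. A minimal sketch; it assumes the GTK import can be deferred until after this call (for example, into the GUI code path), which is an assumption about the script's structure.

```python
import os

# Commands that, per the text above, don't use GTK.
NETWORK_COMMANDS = ("netstart", "netstop")


def prepare_environment(command, environ=os.environ):
    """Drop DISPLAY for network-only commands so GTK never sees X."""
    if command in NETWORK_COMMANDS:
        environ.pop("DISPLAY", None)
    return environ
```

Called first thing in the command dispatcher, this makes `sudo ./olpc_uml_emu.py netstart` behave the same whether or not root can reach your X display.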
Sometimes UML gets all confused and pins the CPU at 100% and
doesn't quit. killall linux is your friend.
You have to enter the machine nicknames manually into
sugar-shell at first boot, and the Xephyrs don't have very
distinctive titles. Sugar's first boot screen has no means
of setting a default name in the Entry box, and Xephyr has
no means to change the title. I am sorry, but this means
that the nicknames of the machines are nonobvious at
runtime. My advice is just to remember the X11 display
numbers. :40 is Maurice, :41 is Sally, etc.
There's no (useful) sound support. UML doesn't have any
ALSA support at all, and the OSS support doesn't seem to
work even though I enabled it in .config. Sorry, TamTam.
The X servers listen on 0.0.0.0 with access controls
disabled. 'nuff said.
Happy Hacking!
usage: olpc_uml_emu.py [options] <netsetup, setup, start, netstop>
options:
  -h, --help            show this help message and exit
  -x XWIDTH, --xwidth=XWIDTH
                        Width of the X instances. (default: 640)
  -y XHEIGHT, --xheight=XHEIGHT
                        Height of the X instances. (default: 480)
  -t TAPDEV, --tapdev=TAPDEV
                        TAP network adapter name to create/use the virtual
                        OLPC network on. DON'T use a 'tun' device. (default:
                        tap0)
  -o TAPOWNER, --tapowner=TAPOWNER
                        Owner of the tap device created by netstart. This
                        will probably be the user who will run the OLPC test
                        environment.
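For anyone reimplementing or wrapping the tool, the option set in the help output above maps directly onto a parser. This is a reconstruction with argparse from the help text alone, not the script's own (optparse-style) code; the command word is left free-form because this document names the commands inconsistently (netsetup/netstart, start/run).

```python
import argparse


def build_parser():
    """Parser rebuilt from the --help output (not the original code)."""
    parser = argparse.ArgumentParser(prog="olpc_uml_emu.py")
    parser.add_argument("-x", "--xwidth", type=int, default=640,
                        help="Width of the X instances. (default: 640)")
    parser.add_argument("-y", "--xheight", type=int, default=480,
                        help="Height of the X instances. (default: 480)")
    parser.add_argument("-t", "--tapdev", default="tap0",
                        help="TAP adapter for the virtual OLPC network; "
                             "don't use a 'tun' device. (default: tap0)")
    parser.add_argument("-o", "--tapowner",
                        help="Owner of the tap device created by netstart.")
    # Free-form: the docs disagree on the exact command names.
    parser.add_argument("command", help="e.g. setup, netstart, run, netstop")
    return parser
```

For example, `build_parser().parse_args(["-x", "800", "run"])` gives xwidth=800 with the other options at their documented defaults.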
~~~
Regards,
Andrew Clunis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
iD8DBQFFTa9PALkUMXSNow8RAroWAKCzvHqMWlx6D+lfeGNpQ/q8T9ltVACdHgko
VHmy5HvKspRw820Pi9JCTv4=
=MmdR
-----END PGP SIGNATURE-----