[IAEP] Squeak sources (was: Sugar on Ubuntu - Summary)

Jecel Assumpcao Jr jecel at merlintec.com
Fri Nov 7 13:30:57 EST 2008


David Van Assche wrote on Fri, 7 Nov 2008 14:39:00 +0100
> I've just been corrected on this by a ubuntu dev, who states for it to
> get into universe, it must have source code available. That means that
> for now, squeak must stay in Multiverse.... the only option seems to be
> to separate squeak from sugar, if sugar is to get into main or universe.

Since Bert's authoritative, but brief, clarification seems to have had no
effect on this thread, I am forced to go into more detail than you all
probably wanted to know about Squeak in order to clear up this
confusion.

If you download a typical Squeak distribution you will get four files:

1) an executable called just "squeak", "squeak.exe" or something more
complicated depending on your OS. This is the virtual machine (VM - not
virtual memory in this context) which simulates the imaginary computer
on which the rest of Squeak runs. This is the only part of the system
that is different for each platform to which Squeak has been ported
(and there are quite a few). In many systems this virtual machine is broken
up into a main executable and a few dynamically loaded libraries but for
the rest of this post I will consider it to always be a single file to
make things simpler. The sources for this are fully available but it is
a complicated story and not at all the focus of the Debian discussions
so I will explain them at the very end of this email.

2) an image called something like "squeak3.11-dev.image" which is a
simple memory dump of the VM, very much like a "suspend to disk" file in
a laptop PC. This means that when you start up Squeak it is in exactly
the same state as when you last used it. If you were in the middle of
typing some text then that window will be open and the cursor will be in
the same place. Everything that matters is in the image.

3) and 4) are files called something like "squeakV3.sources" and
"squeak3.11-dev.changes", which are the FULL TEXT SOURCES OF EVERY LITTLE
BIT OF CODE INCLUDED IN THE IMAGE. They are logically just a single file
but are split in two in order to deal with size limits and to save disk
space when you have lots of .image/.changes pairs on your machine. You
can look at them in any ASCII text editor that can deal with Mac line
ends and with such large files. You can do a tr '\r' '\n' <*.changes |
whatever to let loose all the Unix text apps on them. And note that
though the .changes file can be compressed to keep only the latest
version of each method, that is typically not done and so you can
normally get the sources for all versions of a given method in a normal
Unix text editor without even firing up Squeak at all.
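
For those who would rather script this than use tr, here is a minimal
Python sketch of the same idea. It converts the Mac line ends and then
lists the author stamps of every saved version of one method. The
function names (cr_to_lf, method_versions) are made up for this example,
and the header pattern assumes the usual chunk layout of a line such as
!Foo methodsFor: 'accessing' stamp: 'ab 1/1/2000 10:00'! followed by the
method text. Older entries may lack the stamp, and decoding as latin-1 is
just a convenient assumption to avoid errors on non-ASCII bytes.

```python
import re
import sys

# Assumed header layout of a method entry in a .sources/.changes file:
#   !ClassName methodsFor: 'category' stamp: 'initials date time'!
# The stamp part is treated as optional.
HEADER = re.compile(
    r"^!(?P<cls>\w+(?: class)?) methodsFor: '(?P<cat>[^']*)'"
    r"(?: stamp: '(?P<stamp>[^']*)')?.*!$",
    re.MULTILINE,
)


def cr_to_lf(data: bytes) -> bytes:
    """Same effect as tr '\\r' '\\n': turn Mac line ends into Unix ones."""
    return data.replace(b"\r", b"\n")


def method_versions(text: str, cls: str, selector: str) -> list:
    """Return the stamps of all entries for cls whose first source line
    starts with selector (a unary selector or a first keyword like 'at:')."""
    starts_with_selector = re.compile(re.escape(selector) + r"(?!\w)")
    stamps = []
    for m in HEADER.finditer(text):
        if m.group("cls") != cls:
            continue
        body_start = m.end() + 1          # skip the newline after the header
        body_end = text.find("\n", body_start)
        if body_end == -1:
            body_end = len(text)
        first_line = text[body_start:body_end]
        if starts_with_selector.match(first_line):
            stamps.append(m.group("stamp") or "")
    return stamps


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python versions.py some.changes Foo bar
    with open(sys.argv[1], "rb") as f:
        contents = cr_to_lf(f.read()).decode("latin-1")
    for stamp in method_versions(contents, sys.argv[2], sys.argv[3]):
        print(stamp)
```

This reads the whole file into memory, which is fine for a quick look but
not tuned for very large .changes files.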

Why is there any talk about "binary blobs without sources", then? The
.image file can contain arbitrary objects including drawings, photos,
waveforms and so on which are built with editors running inside Squeak
directly instead of from some textual representation. These objects are
their own sources, as is any JPEG or MP3 file. You won't find any trace
of them looking at .sources and .changes. Some people object even to
that, though I fail to see why - any Linux distribution has plenty of
icons, images and other stuff of the same kind.

A far more interesting objection is that Squeak's tools allow someone
smart enough to create code inside the image that doesn't have its
source saved to the .changes file. But note that any code in the image
is in the form of bytecodes and there is a decompiler which kicks in
automatically whenever the source can't be found in .sources or
.changes. The comments will be missing and local variables will be named
t1, t2, t3 and so on but the result is good enough that more than once I
got this instead of the actual sources (normally due to some permission
problem with the source files) and it took me a very long time to
notice.

The objection can be extended to state that since the decompiler and
other stuff is inside Squeak, someone really, really smart could not
only insert nasty sourceless code in the image but also patch all tools
to keep that hidden. But we have tools to examine one image from inside
another (supposedly trusted) image as a side effect of the way Squeak
was initially developed.

So the whole discussion has never been about sources at all, but about
security. In Squeak we sacrifice (in theory) some security for absolute
freedom. We allow anyone to do anything and this means some bad person
could hurt us. I hope you can see the irony of people classifying Squeak
as "non free" due to our choice.

Back to the VM and its sources - you can read about how Squeak was
bootstrapped from Apple Smalltalk in the "Back to the Future" paper
(http://users.ipa.net/~dwighth/squeak/oopsla_squeak.html). The VM was
written in a very restricted subset of Smalltalk (which came to be
called "Slang" though it has no relation with other programming
languages with that name) which was fully debugged in Smalltalk itself
and then translated into C by a specially written tool. The C version
was compiled and became the first Squeak VM. Both this Slang code and
its C translator were in all the images (and so .sources or .changes)
distributed for many years. As Squeak was ported to Unix, Windows and
other systems we started to get some C files that were distributed
separately. Note that these were true source files, unlike the results
of the C translator where the Slang code would have to be considered the
true sources. Eventually the process became complicated enough that a
package called VMMaker was created to deal with it. This VMMaker and VM
Slang sources moved outside the official Squeak image and became an
optional package that can be easily loaded by anyone interested in that
kind of thing. The hand-crafted C sources live in an SVN server which
should be very familiar to any Unix developer.

In short:

full VM sources: Slang code in VMMaker package + C files in SVN server
full image sources: the .sources + .changes text files

-- Jecel


