[IAEP] etoys now available in Debian's non-free repository

Jim Gettys jg at laptop.org
Wed Jun 25 04:31:57 CEST 2008

On Tue, 2008-06-24 at 11:41 -0700, John Gilmore wrote:
> Jim:
> > My point is somewhat different: the only way out of the compilation
> > trust trap is another compiler.  Unless someone has done this for gcc,
> > it has the identical problem, and there are many possible upstream
> > attacks.  I see no reason (probably less) to trust the chain of trust
> > for gcc than I do Squeak, as the rewards of attacking gcc are so much
> > higher.
> Alan:
> > We are dealing with stories here, not with any kind of reality. 
> Here's a story for you.
> GCC is regularly compiled with non-GCC compilers.  It has been
> bootstrapped thousands of times, using dozens of different compilers
> (Sun's, Intel's, Apple's MPW, Green Hills, Code Warrior, MIPS's, ...).
> GCC includes lots of hacks and configuration code to allow it to build
> on other vendors' buggy or divergent compilers.  I've bootstrapped it
> myself hundreds of times, and Cygnus built the testing infrastructure
> to do it nightly on many architectures.  The makefiles are set up to
> do the classic 3-stage bootstrap:
>   stage1 = binaries from GCC sources compiled with vendor compiler
>   stage2 = binaries from GCC sources compiled with stage1 GCC
>   stage3 = binaries from GCC sources compiled with stage2 GCC
> We then make sure that the stage2 and stage3 binaries are identical.
> (This check has caught hundreds of bugs in gcc, binutils, and in
> vendor compilers.)

Ah, yes, I remember this.  I've even struggled to do this once or twice,
but that was about 15 years ago...
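
As a rough illustration of the stage2/stage3 check John describes, here is
a minimal Python sketch.  It is not GCC's actual makefile logic: the
function names and the "stage2"/"stage3" directory layout are assumptions
made up for this example, and it simply hashes whole files.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _relative_files(root: Path) -> set:
    """Collect the relative POSIX-style paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_stages(stage2_dir: str, stage3_dir: str) -> list:
    """List relative paths that differ, or exist in only one tree.

    An empty list means the two (hypothetical) stage directories are
    byte-for-byte identical, which is what the bootstrap check expects.
    """
    a, b = Path(stage2_dir), Path(stage3_dir)
    names_a, names_b = _relative_files(a), _relative_files(b)
    differing = []
    for name in sorted(names_a | names_b):
        if name not in names_a or name not in names_b:
            differing.append(name)  # present in only one stage
        elif file_digest(a / name) != file_digest(b / name):
            differing.append(name)  # present in both, contents differ
    return differing
```

Calling compare_stages("stage2", "stage3") on two build trees would return
the files to investigate; any non-empty result points at a bug somewhere
in the chain, as John notes.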

> (The 3-stage bootstrap does not prevent or detect a Ken Thompson style
> all-binary attack -- but to succeed on a particular platform, you'd
> have to patch both the vendor compiler and GCC, and keep these patches
> in sync over time.  GCC also supports full blown cross-compilation (of
> itself and anything else), so a cross-bootstrap from any other host
> architecture would break the chain of an all-binary attack.  Debian's
> issue is not about resistance to Thompson attacks; this is just an aside.)
>
> It is trivial to build working binaries of GCC from a source tree, if you
> have a C compiler from any other source.  "./configure && make install".
> This is quite different from the eToys situation, in which there is a
> single binary implementation of the language; and the sources, where
> present, are all mixed into a binary blob that's only readable by the
> single implementation.  I have the same concerns that Debian does.  Is
> there even a tool internal to eToys that confirms that everything in a
> blob includes the matching source?  Let alone a tool that would
> extract that source and rebuild the blob from scratch, using a
> virgin binary environment.
>
> We could've bootstrapped GCC once, and limped along ever afterward
> with binaries built from that one original GCC binary.  (In a sense,
> the entire C compiler market has done this.  Bell Labs' original C
> compiler was bootstrapped from a BCPL compiler, and every other C
> compiler probably bootstrapped from Bell's C compilers.)  Instead, the
> GCC maintainers built lots of infrastructure to allow GCC to be
> bootstrapped anytime somebody wants to.  And to test it regularly.
> That's the part that eToys hasn't done.
>
> It turns out to be quite hard to reproduce a GCC release or Linux
> release, bit-for-bit identical, from its source tree.  If you haven't
> tested it, I'd pretty much guarantee that it isn't working.  It has
> subtle dependencies on its build environment, which predates the
> release.  GCC depends on the binutils, on libc, and on many external
> include files, for example.  Even unpacking a source release depends
> on tar and gzip or bzip2, each of which depends on the installed
> shared libc.  Cygnus was bitten by such build-environment dependencies
> several times, when trying to produce patches against its old GCC
> binary releases.
>
> Are Fedora releases verified before release to build an identical
> binary CD/DVD image, starting from only a bare Fedora (release X)
> bootable binary CD/DVD, a freshly bought bare PC, and the pile of
> source code RPMs in the Fedora (release X) source CD/DVD?  (I think
> not -- I don't see source or binaries of pilgrim, which makes the
> bootable CD image.)
>
> Are Debian releases tested to reproduce themselves the same way?
> Perhaps eToys can get a reprieve until Debian confirms that it,
> itself, can do what it asks of eToys.
>
> 	John
> PS: Several distros don't make it easy to *find* source code releases.
> There's no obvious path on http://getfedora.com to get the definitive
> F9 source code release; you have to troll around in the mirrors
> looking for it.  OLPC will groan when it gets the first letter asking
> for the matching sources for its binary (flash) release.  It's only a
> few cranky geeks like me who insist on actually obtaining the sources
> for their "free" binary software.

I hope/believe Dennis has this cleaned up now....  I'll check to make
sure.  Please pick on Fedora first, as they are upstream of us ;-).

Back to the real matter at hand:

Note the following: Squeak is actually a system in several parts:

The VM (virtual machine) is compiled using a C compiler.  Its code is
examined closely and regularly for performance reasons, and it is
recompiled with some regularity with your favorite C compiler.  As I
understand it, Squeak generates this C code itself.

This VM interprets the image file.  The C code of the VM can be, and
regularly is, examined, as Yoshiki points out; and the code in the image
file can be decompiled by tools and examined.  In fact, the binary image
is routinely decompiled whenever debugging is done in Squeak.

So, as Yoshiki points out, it is actually feasible to complete this loop
and verify that the binary in the image file gives the same result;
external programs to do so exist (or have existed), in Yoshiki's example,
in Squeak.  In this case, the Thompson attack seems unlikely; having
Squeak be able to recognize that you are compiling a program intended to
decompile an image seems pretty far-fetched to me (it isn't the same as a
compiler recognizing that it is compiling itself).
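
To make "completing the loop" concrete, here is a minimal Python sketch of
the kind of check being discussed.  Everything in it is an assumption for
illustration: the image is modeled as a plain dictionary from method name
to (source, compiled bytes), and toy_compile is a stand-in that has
nothing to do with Squeak's real compiler or image format.  In practice
the compile function would ideally come from an independent tool.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model of an image: method name -> (source text, compiled bytes).
Image = Dict[str, Tuple[str, bytes]]


def find_mismatches(image: Image, compile_fn: Callable[[str], bytes]) -> List[str]:
    """Return names of methods whose stored binary is not what the source compiles to."""
    mismatched = []
    for name, (source, stored_binary) in sorted(image.items()):
        if compile_fn(source) != stored_binary:
            mismatched.append(name)
    return mismatched


def toy_compile(source: str) -> bytes:
    """Stand-in compiler for the demo only: deterministic, nothing like Squeak's."""
    return source.encode("utf-8")[::-1]


if __name__ == "__main__":
    image = {
        "Point>>x": ("^x", toy_compile("^x")),  # binary matches its source
        "Point>>y": ("^y", b"tampered"),         # binary does not match
    }
    print(find_mismatches(image, toy_compile))  # prints ['Point>>y']
```

An empty result would mean every binary in the modeled image is accounted
for by its source; any name returned is a method worth decompiling and
reading by hand.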
                           - Jim

Jim Gettys <jg at laptop.org>
One Laptop Per Child
