No subject

Sun May 31 08:28:51 EDT 2009

similar to the download button on .  The user clicks
a button (link) and the rest happens behind the scenes.  If you
watch the status bar in the lower left corner of Firefox as you
click the download link, you can see the redirect flashing by.

The mirror system works through several processes:
1.  The mirrors pull updates from the primary server.  Individual
mirrors are controlled by their local mirror maintainers.  Those
maintainers configure their mirrors to update from the  It is common for maintainers to sync with
the primary servers anywhere from once an hour to once a day.  A good
example of mirror ages can be seen at

2. Mirrorbrain pings all of the mirrors once a minute to make sure that
they are still alive.  (Redirecting to a dead mirror is not good.)

3.  Mirrorbrain checks what files are available on each mirror every
five minutes.  (This is a check on step 1 to determine _when_ each
individual mirror has updated itself.)

4.  When a download request comes in, Mirrorbrain uses the information
gathered in steps 1-3 to redirect the download request correctly:

4.1. When a download request comes in to ,
Mirrorbrain checks to see if the file exists in /srv/uploads (the name
is a bit of a kludge... /var/www-sugarlabs/download was a symlink to
/srv/upload for historical purposes).  If the file does not exist, the
user receives a file not found error.

4.2. If the file exists, Mirrorbrain checks the file size.  Anything
smaller than 4K is served directly.  (It is not worth the database
lookup and redirect traffic for files smaller than 4K.)

4.3 If a request is for a file larger than 4K:

4.3.1 Mirrorbrain determines the physical location where the download
request originated (for example: Onalaska, WI, US, North America or
Berlin, Germany, Europe).

4.3.2 Mirrorbrain searches its database for the closest (good) mirror
which has the requested file (as determined in step 3).

4.3.3 If a good mirror is found, the download request is redirected to
the mirror.

4.3.4. If no mirror is found, the file is served straight from .
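
The decision in step 4 can be sketched roughly as follows.  This is only
an illustration of the flow described above, not Mirrorbrain's actual
code: the Mirror class, handle_request(), the example hostnames and the
plain great-circle distance used to pick the "closest" mirror are all
assumptions made for the sketch.  The `alive` and `files` fields stand in
for the results of steps 2 and 3.

```python
import math
import os
from dataclasses import dataclass, field

# Step 4.2: files smaller than 4K are served directly.
# (How a file of exactly 4K is treated is an assumption here.)
SIZE_THRESHOLD = 4 * 1024


@dataclass
class Mirror:
    """Hypothetical record of what is known about one mirror."""
    base_url: str
    lat: float
    lon: float
    alive: bool = True                         # outcome of the ping (step 2)
    files: set = field(default_factory=set)    # outcome of the scan (step 3)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the haversine formula."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def handle_request(rel_path, client_lat, client_lon, mirrors, root):
    """Return (action, target) for one download request."""
    local = os.path.join(root, rel_path)

    # 4.1: the file must exist on the primary server.
    if not os.path.isfile(local):
        return ("not_found", None)

    # 4.2: small files are not worth a lookup and a redirect.
    if os.path.getsize(local) < SIZE_THRESHOLD:
        return ("serve", local)

    # 4.3.2: only live mirrors that actually have the file qualify.
    candidates = [m for m in mirrors if m.alive and rel_path in m.files]

    # 4.3.4: no good mirror, so serve from the primary server.
    if not candidates:
        return ("serve", local)

    # 4.3.3: redirect to the closest good mirror.
    best = min(
        candidates,
        key=lambda m: distance_km(client_lat, client_lon, m.lat, m.lon),
    )
    return ("redirect", best.base_url.rstrip("/") + "/" + rel_path)
```

With one mirror near Berlin and one near Onalaska, a request from Paris
would be redirected to the Berlin mirror, and the same request would
fall back to the primary server if both mirrors were marked dead.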

There are some side effects of this process:
1. is the weak link.  If it goes down, the
entire mirror system becomes inoperable.  Because of this, the
infrastructure team hosts it on a very reliable machine.

2. There is a lag between when files are available for download and
when they are available on the mirrors.  During this lag, a file is
served directly from  At current traffic
levels that is not a problem.

2.1 Projects with popular products usually add a third layer called
staging.  For example, when a new version of Firefox (or Fedora) is
released, downloads spike immediately, so the mirrors compete with
normal downloads for copies of the content.  This competition can
crash the primary server.

Instead, the mirrors synchronize against staging.  A new popular product
is first added to the staging tree a couple of days before the
actual public release date.  This gives the mirrors several days to
update.  On the public release date the file is added to the download
tree.  At this point it is available for public download and the
mirrors have already been pre-seeded.

2.2.  A harder challenge will be .  When
activities are approved they are immediately made available for public
download.  This could be a problem if every student in Uruguay updates
their computer within minutes of a large and popular activity such as
etoys being released.  The good news is that we have at least a year
before that becomes an issue, and the Mozilla and Mirrorbrain
developers are also working on the issue.

3. Security.  We are going to have to consider that mirrors can be
hijacked.  ISOs will have to be shipped with md5 hashes.  The md5
hashes are small enough that they can always be served from the primary
server.  This makes an attack harder, because both the ISO and the hash
would have to be compromised.  The activity installer will need to
check the md5 hash of activity bundles before installing them.  The
hash is calculated as part of the process to upload to

4. Download tree size.  We are going to have to consider the size of
the download tree.  For example, the tree currently has 40 to 50 soas
snapshots which take about 20GB of space.  We are going to have to
determine what gets mirrored.  I am looking at setting up two separate
rsync groups called 'releases' and 'entire.' This would allow
individual mirror maintainers to choose between the small 'releases'
group and the entire tree.
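
For the hash check mentioned under item 3 (Security), a minimal sketch
might look like the one below.  The function names md5_of_file() and
verify_download() are made up for illustration, and how the installer
actually fetches the expected hash from the primary server is not
covered here; the sketch only assumes the hash arrives as a hex string.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks
    so that a large ISO does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file downloaded from a mirror matches the
    hash that was obtained from the primary server."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

An installer would call verify_download() on the bundle it received
from the mirror and refuse to install when the result is False.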

I'll try to put this onto the CDN wiki page.

