[IAEP] [Systems] ASLO updates

David Farning dfarning at sugarlabs.org
Thu Oct 15 14:13:13 EDT 2009


I'll try to document how mirrorbrain is set up and how it affects
other systems in the wiki this afternoon.

Changes to the devel, testing, or production activities.sl.o should not
affect one another.  They are three separate instances consisting of:
1. Separate code trees.
2. Separate database instances.
3. Separate file storage.

Two variations of the file tree are stored.  These trees are (I am
using Mozilla terminology for the trees):
1. Application tree.
2. Repo tree.

Each instance has an individual application tree stored at
www-sugarlabs/activitie-*/files.  Each instance interacts directly with
its application tree.  From a user pov this tree is called the sandbox.

Each instance can have an _optional_ repo tree.  An instance interacts
with its repo tree via the 'download server.'  From a user pov this
tree is the set of downloadable files.

Our confusion arose because we were pointing the download server at the
application tree via a symlink.  When the download server was the same
machine as the application server it worked correctly.  Adding
mirrorbrain forced us to clearly draw the line between the download
server and the application server.  Long term, it is good system
design.  Short term, a bug in mirrorbrain caused it to incorrectly
handle symlinks in the vhost directory part.
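
As a rough illustration of the symlink issue described above, here is a
minimal Python sketch that reports which components of a served path
are symlinks.  The function name and the example layout are
hypothetical; this is not how mirrorbrain itself resolves paths, only a
way to check whether a download root depends on a symlink.

```python
import os


def symlinked_components(root, relative_path):
    """Return every path under `root` along `relative_path` that is a symlink.

    Hypothetical helper for illustration; an empty list means the
    download tree does not depend on any symlink below `root`.
    """
    found = []
    current = root
    for part in relative_path.split("/"):
        if not part:
            continue  # skip empty components from leading or doubled slashes
        current = os.path.join(current, part)
        if os.path.islink(current):
            found.append(current)
    return found
```

Running it against the download root before pointing a separate
download server at it would show whether the tree still leans on a
symlink into the application tree.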

Moving forward--
1. Work Flow -- Whatever you decide for a work flow is fine.
2. A.sl.o modularization -- a.sl.o is designed to be split up into
several pieces as it grows:
2.1 Application server.  This is the primary php application.  It can
be split across multiple machines using mod_perlbal.
2.2 Database server.  This is the primary database.  It can be split
across multiple machines as necessary.  There are limits to
scalability due to db replication issues.
2.3 Download server.  This is the primary download server.  It can be
scaled using various CDN techniques.
2.4 Memcache server.  Memcache sits between the application server and
the database.  Memcache reduces database server load and is easier to
scale than multiple database servers.
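
To make the role of memcache in 2.4 concrete, here is a minimal Python
sketch of the cache-aside pattern: look in the cache first and only go
to the database on a miss.  The class name, the plain dict standing in
for a memcache client, and the load_from_db callable are all
assumptions for illustration; the real a.sl.o application is php and
its caching code is not shown here.

```python
class CacheAside:
    """Answer reads from the cache; fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical database lookup
        self._cache = {}                   # dict standing in for memcache
        self.db_hits = 0                   # how often the database was used

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no database load
        self.db_hits += 1
        value = self._load_from_db(key)    # cache miss: query the database
        self._cache[key] = value           # remember it for later reads
        return value
```

Repeated reads of the same key reach the database only once, which is
why adding cache machines is cheaper than adding database servers.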

Currently, all of these pieces are sitting on Sunjammer.  As a result
we were a bit hand-wavy about the abstraction barriers between the
pieces.

Due to the improvements Bernie and Danny have made to Sunjammer we are
maxing out at 45% cpu usage.

Due to mirrorbrain we have reduced our bandwidth usage on
Sunjammer from between 100 and 150 GB per day to less than five GB.
The big gain is that offloading the downloads reduces the cache churn
and eth0 interrupts on Sunjammer.
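
For a sense of scale, the figures above work out to a reduction of at
least 95%.  A small Python sketch of the arithmetic, taking five GB as
the upper bound for the new daily usage (the function name is only for
illustration):

```python
def reduction_percent(before_gb, after_gb):
    """Percentage drop in daily bandwidth from before_gb to after_gb."""
    return 100.0 * (before_gb - after_gb) / before_gb


# Daily totals quoted above: 100 to 150 GB before, under 5 GB after.
low_estimate = reduction_percent(100, 5)    # 95.0
high_estimate = reduction_percent(150, 5)   # roughly 96.7
```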

Recommended growth roadmap--
Based on discussions with the AMO infrastructure team the recommended
plans for growth are:
1. Split off download server.  We have effectively done that via
mirrorbrain.  But at some point we will need to see about putting
mirrorbrain on a separate machine.
2. Split off database server.  At some point, we will need to put the
database on a separate machine which will grow into a cluster of
machines.
3. Split off memcache server.  At some point we will need to split off
memcache machines.  Memcache machines can just be a bunch of old
machines with lots of memory.
4. Build multiple application servers.  This is just a matter of using
mod_perlbal to distribute across multiple servers.

I have no idea when we are going to have to make the above changes.
Current growth trajectory for a.sl.o is about 25% per month.  But,
this could change when:
1. SoaS Blueberry comes out.
2. Sugar 0.86 starts to hit the street and the updater starts
automatically pinging a.sl.o for updates.
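
To put the 25% figure in perspective, here is a small Python sketch of
compound growth.  It assumes the rate stays constant, which, as noted
above, may well not hold; the function names are only for illustration.

```python
import math


def months_to_multiply(factor, monthly_growth=0.25):
    """Months until load grows by `factor` at a constant monthly rate."""
    return math.log(factor) / math.log(1.0 + monthly_growth)


def load_after(months, monthly_growth=0.25):
    """Load relative to today after `months` of constant growth."""
    return (1.0 + monthly_growth) ** months
```

At a steady 25% per month, months_to_multiply(2) gives a doubling time
of a little over three months, and load_after(12) comes to roughly
14.5 times today's load.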

I do want to make clear that even though a.sl.o _looks_ like a black
box, scaling a.sl.o is a solved problem.

Hope this helps.
david

On Thu, Oct 15, 2009 at 3:04 AM, Aleksey Lim <alsroot at member.fsf.org> wrote:
> On Thu, Oct 15, 2009 at 07:34:50AM +0000, Aleksey Lim wrote:
>> On Thu, Oct 15, 2009 at 02:42:15AM -0400, Bernie Innocenti wrote:
>> > [cc += dfarning, alsroot, systems@]
>> >
>> > On Wed, 14-10-2009 at 17:47 -0700, Josh Williams wrote:
>> > > I've made some bug fixes to the new ASLO design, I've tested it lightly
>> > > and it seems to work in all major browsers (even ie6). If you have a few
>> > > moments, please test it out (download/upload activities, browse around)
>> > > and let me know if you see any display bugs or major usability issues.
>> > >
>> > > http://activities-devel.sugarlabs.org/en-US/sugar/
>> >
>> > All links to activity bundles appear to be broken :-(
>> > For example:
>> >
>> >  http://activities-devel.sugarlabs.org/en-US/sugar/downloads/file/26072/xpi/labyrinth-7.xo?src=addondetail
>> >
>> > I'm not sure how to fix it, but I can imagine that it may be related
>> > with moving the activity bundles from their old location
>> > (/srv/www-sugarlabs/activities/files) to the upload directory
>> > (/srv/upload/activities/) done by Dfarning in order to enable
>> > Mirrorbrain.
>> >
>> > Earlier today, alsroot asked me to fix some permission issues that would
>> > prevent aslo from writing new activities in the new location.
>>
>> That's intended to be so; activities-devel is just a mysql copy of
>> activities.  I thought it shouldn't affect activities-devel testing (but
>> you can create a new activity/version and it should be downloadable).
>>
>> > also noticing that there's still a copy of the activities in the old
>> > location, and it is also bigger by 40MB!
>>
>> That's because /srv/www-sugarlabs/activities/files contains some tmp
>> directory; it shouldn't affect .xo downloading.
>>
>> > /me is very confused :-/
>> >
>> > Could anyone who was involved please write a short description of what
>> > was changed exactly? I'm only trying to reconstruct the current
>> > situation, not looking for a scapegoat.
>>
>> We just tried to utilize the AMO feature that lets users download
>> public .xos from mirror sources and from files/ in other cases.
>> Recently all .xo were downloaded from files/ (even after creating a
>> symlink in /upload to files/), and it was done over several attempts.
>>
>> For now we have two independent sources for .xo downloads:
>> * /srv/www-sugarlabs/activities/files for non-public activities
>>   and for public activities less than 30min old
>> * mirrored /upload/activities for all public activities,
>>   after making new .xo public, ASLO uploads it to /upload/activities
>>
>> > Hmm... well, perhaps we can learn something from this accident:
>> >
>> > The classic way to avoid the "too many cooks" syndrome would be to
>> > appoint a single official maintainer and make all the change requests go
>> > through him. However, I feel this "solution" would create lots of
>> > critical roles and ultimately defeat our ongoing attempts to
>> > decentralize system administration.
>> >
>> > Instead, we shall establish simple procedures to improve sysadmin
>> > coordination and communication. For example:
>> >
>> > 1) commit configuration changes in git along with a short description
>> >    of what was done and why. We already have repositories for /etc and
>> >    also and we could create more repos as needed;
>>
>> Yeah, for now, ASLO configs are stored only in
>> ~activities/site/app/config
>
> hmm.. but what about passwords in ASLO configs
>
>>
>> > 2) write a short report for systems@ when we make substantial changes
>> >    to a service;
>> >
>> > 3) write or update the wiki documentation for important sysadm
>> >    procedures such as installing a new instance of a service
>> >
>> > Use your common sense to decide what needs to be documented and how much
>> > detail is needed. At all costs, we want to avoid putting too much
>> > bureaucratic burden on volunteers because it's the most effective way to
>> > make them look for something more exciting to do.
>> >
>> > We could save time by coalescing steps (1) and (2): all we need to do is
>> > enabling a post-commit hook in the repositories that would send patches
>> > to systems-logs@ . We need to be extra careful not to expose passwords
>> > in this way.
>
> not sure, ASLO configs could be not obvious for others, so we need
> explanation in email anyway.
>
>> > Any volunteers to write and test this procedure?
>>
>> Well, we don't need such ASLO specific administration 24x7, most of time
>> it could be just regular file-permissions/apache/etc administration.
>
> Maybe just reuse issue tracker, I mean it could be a good way from
> decentralization pov and all interested people can subscribe to
> bugs.sl.o email notifications. So all administration related tasks(not
> only ASLO) could be requested on bugs.sl.o at first.
>
> --
> Aleksey
>

