[Systems] aslo cluster progress

David Farning dfarning at gmail.com
Fri Mar 12 13:48:24 EST 2010


We had a pretty big success yesterday.  We brought up a third aslo node,
going from a stock Ubuntu Server install to a fully functioning aslo node
with one command.  The goal is to reduce the work of managing the aslo
cluster as much as possible:
1. No backups. It is easier to rebuild a machine than restore it from
backup.
2. Easy to replace machines.  Replacing a dead machine or adding a machine
to the cluster is easy and foolproof.  (Easy and foolproof is important for
anything we have to maintain.)
3. If we need to change a configuration, that change is automatically
propagated to all machines.
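Goals 1 and 3 both rest on describing each node's desired state declaratively and applying it idempotently, which is the model Puppet uses.  The sketch below illustrates only that idea in plain Python; the resource names and the dict standing in for a node's state are hypothetical, not the actual aslo manifests.

```python
# Minimal sketch of declarative, idempotent configuration. The resource
# names and the dict-based "node state" are hypothetical stand-ins, not
# the actual aslo Puppet manifests.
from typing import Dict, List

# Desired state shared by every node (hypothetical keys and values).
DESIRED: Dict[str, str] = {
    "package:apache2": "installed",
    "site:000-default": "disabled",
    "service:apache2": "running",
}


def converge(node_state: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """Bring node_state in line with desired and return the keys changed.

    Only entries that differ are touched, so running it again on an
    already-converged node changes nothing.
    """
    changed: List[str] = []
    for key, value in desired.items():
        if node_state.get(key) != value:
            node_state[key] = value  # stand-in for the real action
            changed.append(key)
    return changed


if __name__ == "__main__":
    node: Dict[str, str] = {}  # a stock install with nothing configured
    print(converge(node, DESIRED))  # first run applies every entry
    print(converge(node, DESIRED))  # second run prints []
```

Because a fresh machine and a drifted machine are handled by the same function, rebuilding a node is just running the converge step again, which is why backups become unnecessary.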

We had only one problem that needed manual intervention.  I screwed up the
order of disabling the Apache default site and installing Apache, so I had to
go back in and remove the default site by hand :)  Still, it was pretty cool.
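The fix for this kind of mistake is to declare the dependency explicitly so the tool works out the order (in Puppet, via the `require` and `before` metaparameters).  As a rough illustration in Python, assuming hypothetical step names, a topological sort guarantees Apache is installed before its default site is disabled.

```python
# Sketch: run configuration steps in dependency order. The step names are
# hypothetical; Puppet expresses the same idea with `require`/`before`.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must run before it. On
# Debian/Ubuntu the apache2 package ships with the default site enabled,
# so disabling it only makes sense after the package is installed.
DEPENDENCIES = {
    "install-apache": set(),
    "disable-default-site": {"install-apache"},
    "enable-aslo-site": {"install-apache"},
    "restart-apache": {"disable-default-site", "enable-aslo-site"},
}


def apply_order(deps):
    """Return one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in apply_order(DEPENDENCIES):
        print(step)
```

Independent steps (here the two site changes) may come out in either order; only the declared dependencies are guaranteed.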

Either node two or node three can die without affecting service.

Node one is still a bottleneck (a single point of failure) for:
1. The database.
2. The shared file system.
3. The load balancer.
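One way to make the list above checkable is to record which nodes host each service and flag any service that lives on only one node.  This is a hypothetical sketch: the "web" entry assumes the web tier runs on all three nodes (consistent with nodes two and three being expendable), and the names are illustrative.

```python
# Sketch: find single points of failure from a service -> nodes map.
# Placement mirrors the description above; names are illustrative.
from typing import Dict, List, Set

PLACEMENT: Dict[str, Set[str]] = {
    "web": {"node1", "node2", "node3"},  # assumed: served by every node
    "database": {"node1"},
    "shared-filesystem": {"node1"},
    "load-balancer": {"node1"},
}


def single_points_of_failure(placement: Dict[str, Set[str]]) -> List[str]:
    """Services hosted on exactly one node, sorted by name."""
    return sorted(svc for svc, nodes in placement.items() if len(nodes) == 1)


def survives_loss_of(placement: Dict[str, Set[str]], dead_node: str) -> bool:
    """True if every service still has at least one live node."""
    return all(nodes - {dead_node} for nodes in placement.values())


if __name__ == "__main__":
    print(single_points_of_failure(PLACEMENT))
    for node in ("node1", "node2", "node3"):
        print(node, survives_loss_of(PLACEMENT, node))
```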

Work this weekend will focus on setting up redundant 'masters' for the
database, filesystem, and load balancer.  The goal will be for an admin
(human) to be able to manually switch control of the cluster from one node
to the other.  From there it will be a matter of adding the High
Availability (HA) function so the cluster can pass control around on its
own.
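The automatic step usually starts with a heartbeat: the standby expects regular messages from the master and takes over when they stop for longer than a timeout.  The class below is a hypothetical, single-process sketch with an injectable clock so it can be tested; production HA stacks such as Pacemaker handle far more (quorum, resource agents, fencing).

```python
# Sketch of heartbeat-based failure detection for automatic failover.
# Hypothetical and single-process; real HA tools do much more.
import time
from typing import Callable


class StandbyMonitor:
    """Promote this node if the master's heartbeat is silent too long."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        # Start the timer at construction so a fresh standby gets a full
        # timeout's grace period before it considers taking over.
        self.last_heartbeat = clock()
        self.is_master = False

    def heartbeat(self) -> None:
        """Record a heartbeat received from the current master."""
        self.last_heartbeat = self.clock()

    def check(self) -> bool:
        """Take over if the master has been silent past the timeout."""
        if not self.is_master and self.clock() - self.last_heartbeat > self.timeout:
            self.is_master = True  # master presumed dead
        return self.is_master
```

A timeout alone cannot tell a dead master from a slow or partitioned one, which is exactly the risk described in the next paragraph.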

The interesting problem is not switching control from one machine to
another.  The main problem is ensuring that the original 'master' does not
wake up and think that it is still in control.
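One standard technique for this (not necessarily what aslo will use) is a fencing token: every promotion gets a strictly higher epoch number, and the shared resource refuses any request carrying an epoch older than the newest it has seen.  A woken-up old master then fails loudly instead of corrupting data.  The class names below are hypothetical.

```python
# Sketch of fencing tokens: each new master gets a higher epoch, and the
# shared resource rejects requests from an older epoch. Names are
# hypothetical; this is one common technique, not aslo's design.
from dataclasses import dataclass, field
from typing import List


class StaleMasterError(Exception):
    """Raised when a deposed master tries to act."""


@dataclass
class EpochAuthority:
    """Issues monotonically increasing epochs, one per promotion."""
    current: int = 0

    def promote(self) -> int:
        self.current += 1
        return self.current


@dataclass
class SharedResource:
    """Stands in for the database or shared filesystem."""
    highest_seen: int = 0
    log: List[str] = field(default_factory=list)

    def write(self, epoch: int, data: str) -> None:
        if epoch < self.highest_seen:
            raise StaleMasterError(f"epoch {epoch} < {self.highest_seen}")
        self.highest_seen = epoch
        self.log.append(data)


if __name__ == "__main__":
    authority, store = EpochAuthority(), SharedResource()
    old = authority.promote()       # node one becomes master (epoch 1)
    store.write(old, "from node1")
    new = authority.promote()       # failover to node two (epoch 2)
    store.write(new, "from node2")
    try:
        store.write(old, "stale")   # node one wakes up and tries to write
    except StaleMasterError as err:
        print("rejected:", err)
```

The epoch authority must itself survive failures (for example, a quorum-based service), or it becomes the new single point of failure.  A common complementary approach is to power off or isolate the old master before promoting the new one, often called STONITH.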

After a couple of slow weeks learning puppet it is nice to be moving forward
again.

david