[Systems] pointing activities.sugarlabs.org at the proxy

Bernie Innocenti bernie at codewiz.org
Mon Nov 16 02:29:36 EST 2009

On Sun, 2009-11-15 at 23:27 -0600, David Farning wrote:

> 1. aslo-proxy - This is the secure front end which contains the squid
> reverse proxy and will contain the haproxy (for ha and load balancing)
> .  This will sit on the public internet.  The main constraints here
> will be memory and IO speed. The big win is that we are caching the
> static content (images, css, and js) before it hits the php servers.

These static files make up 60-80% of the requests, but Apache usually
serves them in fractions of a millisecond. Compare this to the
800-900ms of a typical aslo page, and offloading them to Squid saves
less than 1% of the total.
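A quick back-of-envelope check of that figure, using the rough numbers from this thread (the exact percentages and timings are estimates, not measurements):

```python
# Estimate the saving from offloading static files to Squid.
static_fraction = 0.7      # static files are ~60-80% of requests
static_ms = 0.5            # Apache serves them in fractions of 1 ms
php_ms = 850.0             # a typical aslo page takes 800-900 ms

# Average time per request today:
avg = static_fraction * static_ms + (1 - static_fraction) * php_ms

# If Squid absorbed every static request, Apache would only see PHP:
avg_with_squid = (1 - static_fraction) * php_ms

saving = (avg - avg_with_squid) / avg
print(f"saving: {saving:.2%}")   # → saving: 0.14%
```

Even with static files at 80% of requests, the arithmetic stays well under a 1% saving, because nearly all the wall-clock time is in the PHP pages.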

We could still use Squid as a load balancer, but haproxy seems to be a
better tool for this job.
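For reference, a minimal haproxy setup for this job could be as small as the following sketch (the section names and backend addresses are made up for illustration):

```
# Hypothetical haproxy config balancing across two aslo-web nodes.
frontend aslo
    bind *:80
    default_backend aslo-web

backend aslo-web
    balance roundrobin
    option httpchk GET /
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```

Adding another aslo-web node would then be a one-line change plus a reload.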

The argument for security is sound, but we'd have to hide the aslo-web
slaves away from the public Internet, which is hard since our
infrastructure is not centralized (we have no LAN).

> 2. aslo-web - This is where the php happens.  The main constraint here
> appears to be CPU.

Yep. I expect this is the only part of the cluster that needs to be
replicated. This is why I was proposing to merge the other VMs.

> 3. aslo-db - This is where the mysql and memcache will live.  Mysql is
> cpu and IO bound while memcache is memory bound.

Why would we centralize memcache if we're not forced to? It would
generate a lot of extra network traffic, adding a lot of latency.

Actually, I'd propose an opposite strategy: create MySQL replica slaves
on each aslo-web node to maximize throughput.

You can bet that the vast majority of queries will be read-only, which
can be served by the replica slaves without an extra TCP/IP connection
to the master database server.
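A sketch of what that read/write split could look like in the application layer, assuming each aslo-web node runs its own replica (the hostname and the query heuristic below are illustrative, not aslo's actual code):

```python
# Route read-only queries to the local replica, writes to the master.
def is_read_query(sql: str) -> bool:
    # Read-only statements can be answered by the local replica.
    return sql.lstrip().split(None, 1)[0].upper() in ("SELECT", "SHOW")

def pick_server(sql: str) -> str:
    # Reads stay on the node (no extra TCP/IP hop); writes go to the
    # master, which the slaves then replay from the binary log.
    return "localhost" if is_read_query(sql) else "aslo-db.example.org"

print(pick_server("SELECT * FROM addons"))           # → localhost
print(pick_server("UPDATE addons SET downloads=1"))  # → aslo-db.example.org
```

The caveat is replication lag: a page that reads back a row it just wrote may need to be pinned to the master for that request.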

Many years ago I set up MySQL in a master-slave configuration using
the binary logs. It was very easy to do.
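From memory, the binlog-based setup boils down to a few lines of my.cnf on each side plus one statement on the slave (server ids, hostnames and credentials below are placeholders):

```
# Master (my.cnf):
[mysqld]
server-id = 1
log-bin   = mysql-bin

# Each slave (my.cnf):
[mysqld]
server-id = 2
read-only = 1

# Then, on the slave:
#   CHANGE MASTER TO MASTER_HOST='aslo-db', MASTER_USER='repl',
#       MASTER_PASSWORD='...', MASTER_LOG_FILE='mysql-bin.000001',
#       MASTER_LOG_POS=0;
#   START SLAVE;
```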

> At this point the issue is not scalability but rather how to
> determine our future scalability needs.  You are right that it would
> be easier and more efficient to stick the whole stack on one machine.
> By doing the abstraction now, it will be easier to scale in the future.

I agree on the general principle, but Squid is a tiny service that can
be moved from one machine to another in 10 minutes. Dedicating an entire
virtual machine to it seems just overkill.

Compared to physical boxes, VMs are easy to set up and clone.
Unsurprisingly, many sysadmins tend to get a bit overexcited when they
discover the potential. But VMs also hide several less obvious
downsides. Within an intricate network of interdependent nodes,
complexity tends to explode and reliability drops. Chasing problems
generally requires opening 3-4 ssh sessions and comparing multiple logs.

You don't want to be in this situation unless you absolutely have to. 

> > Also, I feel that the proxy and the database shouldn't run off a crappy
> > qcow2 file, for performance reasons. As the number of aslo-web
> > front-ends increases, they will probably perform a lot of disk I/O
> I have not been following how you are setting up the VMs.  On my test
> machine at home I have my entire disk set up as LVM on RAID (there is
> a small ext2 boot partition to start dom0).  I just move and resize
> memory and hard drive space as needed for individual vms.

Yes, it should be this way on treehouse too. These are the commands to
set up a new VM:

 virsh vol-create-as treehouse FOOBAR 10G
 virt-clone -o template-jaunty --file=/dev/treehouse/FOOBAR -n FOOBAR

It should automatically set up a new LVM logical volume in
the /dev/treehouse VG.

If you agree, I'll migrate aslo-db, aslo-web and aslo-proxy to their
own logical volumes.

> aslo-proxy is ready to go into production.  It is all set up. I
> thought you set up the backup last night.  aslo-proxy is currently
> pointing at the existing aslo instance on sunjammer.  I would like to
> spend a week or so tuning before pointing it at the new aslo-web
> instance.

From my benchmarks, Squid doesn't seem to help increase the throughput.

We could transition it into production anyway, but until we multiply the
number of application servers, the net result would be a slight increase
in complexity and fragility with no gain.

Do we really want to go ahead anyway?

> I would like to emphasise that the point of using a layer of VMs is
> not because VMs are cool.  As Bernie correctly states, they are a pain
> in the ass to set up.  The point of using the VMs is to ensure that I
> have the architectural design and abstraction barriers right for when
> we need to migrate the VMs to their own physical machines.

Once we start depending on a myriad of VMs for production services, it
will be harder to go back and consolidate on fewer machines.

I don't want to delay the much needed aslo scalability work, but at this
time I think the only thing we need is to get aslo-web into production
with the current database backend and load balance requests to it.

I'd bet a pizza that we could get to 6-8 aslo-web machines before
we'd start to require a dedicated machine for MySQL. At this size, Squid
would start to become necessary, but only if we make it cache php pages,
which is tricky to get right.
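To give an idea of why it's tricky: Squid will only cache a PHP page if the response says it may, so both sides have to cooperate. A hypothetical sketch (the patterns and lifetimes below are illustrative):

```
# squid.conf fragment: cache static assets aggressively, dynamic
# pages only as far as their headers allow.
refresh_pattern -i \.(css|js|png|jpg|gif)$  1440 90% 10080
refresh_pattern .                           0    20% 4320

# The PHP side would still have to emit explicit cache headers and
# vary correctly on cookies and language, e.g.:
#   header('Cache-Control: public, max-age=300');
#   header('Vary: Accept-Language, Cookie');
```

Getting the Vary handling wrong means serving one user's logged-in page to another, which is why this needs careful tuning before going live.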

   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/

More information about the Systems mailing list