[Systems] pointing activities.sugarlabs.org at the proxy

David Farning dfarning at sugarlabs.org
Mon Nov 16 03:27:59 EST 2009

2009/11/16 Bernie Innocenti <bernie at codewiz.org>:
> On Sun, 15-11-2009 at 23:27 -0600, David Farning wrote:
>> 1. aslo-proxy - This is the secure front end which contains the squid
>> reverse proxy and will contain the haproxy (for ha and load balancing)
>> .  This will sit on the public internet.  The main constraints here
>> will be memory and IO speed. The big win is that we are caching the
>> static content (images, css, and js) before it hits the php servers.
> These static files make up 60-80% of the requests, but Apache usually
> serves these in fractions of 1ms. Compare this to 800-900ms of a typical
> aslo page and we get a total saving of less than 1% by offloading these
> to Squid.
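Bernie's "less than 1%" figure checks out on a back-of-envelope basis (using rough midpoints of the numbers quoted above, so the exact inputs here are approximations):

```python
# Rough estimate of how much total server time static requests account for,
# using approximate midpoints of the figures quoted above.
static_fraction = 0.7     # 60-80% of requests are static
static_ms = 1.0           # Apache serves these in ~1 ms
dynamic_ms = 850.0        # a typical aslo PHP page takes 800-900 ms

# Average server time per request across the whole request mix:
total_ms = static_fraction * static_ms + (1 - static_fraction) * dynamic_ms

# Best-case saving from offloading every static request to Squid:
saving = (static_fraction * static_ms) / total_ms
print(f"static share of server time: {saving:.2%}")
```

The point being that throughput is dominated by the PHP pages, not the static files, so caching the latter can only ever recover a fraction of a percent.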

Let's agree to disagree on this until we get the benchmarks in place.
The knowledge I am gaining running reliable benchmarks is probably
worth the effort.  If the experiment fails, I will gladly throw it out.

> We could still use Squid as a load balancer, but haproxy seems to be a
> better tool for this job.
> The argument for security is sound, but we'd have to hide the aslo-web
> slaves away from the public Internet, which is hard since our
> infrastructure is not centralized (we have no LAN).

Ideally this will be the case.  Have you talked to Steve at RIT?
Apparently they have an entire rack of blade servers sitting idle in
the innovation center.

>> 2. aslo-web - This is where the php happens.  The main constraint here
>> appears to be CPU.
> Yep. I expect this is the only part of the cluster that needs to be
> replicated. This is why I was proposing to merge the other VMs.
>> 3. aslo-db - This is where the mysql and memcache will live.  Mysql is
>> cpu and IO bound while memcache is memory bound.
> Why would we centralize memcache if we're not forced to? It would
> generate a lot of extra network traffic, adding a lot of latency.
> Actually, I'd propose an opposite strategy: create MySQL replica slaves
> on each aslo-web node to maximize throughput.

Eventually we are going to need both.  The downside of depending on
master-slave databases for scalability is that each db server caches
the same hot data.  If we set up a master-slave combo, each with 8GB
of cache, the same 8GB of objects will be cached on each machine.
Memcached, while slower per hit, shares one cache across multiple php
servers and db servers.
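The cache-aside pattern behind this can be sketched as follows (a plain dict stands in for the memcached client here, and the key name and `fetch_from_db` helper are made up for illustration):

```python
# Sketch of the cache-aside pattern: every web/db node shares ONE cache,
# so a hot object is stored once, instead of once per database replica.
# A plain dict stands in for a real memcached client; get/set mirror the
# memcached protocol's semantics.

shared_cache = {}  # stand-in for memcached

def fetch_from_db(key):
    # Placeholder for the real MySQL query; made up for illustration.
    return f"row-for-{key}"

def cached_get(key):
    value = shared_cache.get(key)      # 1. try the shared cache
    if value is None:
        value = fetch_from_db(key)     # 2. miss: hit the database
        shared_cache[key] = value      # 3. populate for *all* nodes
    return value

# Two "front-end nodes" asking for the same hot object share one copy:
a = cached_get("addon:42")
b = cached_get("addon:42")
assert a == b and len(shared_cache) == 1
```

With per-slave InnoDB caches instead, each replica would independently warm up its own copy of that same hot set.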

> You can bet that the vast majority of queries will be read-only, which
> can be served by the replica slaves without an extra TCP/IP connection
> to the master database server.

aslo is getting a ratio of about 90% reads to 10% writes.
aslo-db is getting a cache hit rate of just below 90% with 4GB of
memory and 3GB of dedicated innodb_cache.
The entire database is about 70GB.
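Taken together, those two numbers imply that very little query traffic actually reaches the disks (rounding the "just below 90%" hit rate to 0.90 for the sake of the arithmetic):

```python
# What the quoted ratios imply about disk traffic, approximately.
read_fraction = 0.90   # ~90% of queries are reads
hit_rate = 0.90        # InnoDB cache hit rate, "just below 90%"

# Reads that miss the cache and must be served from disk:
disk_reads = read_fraction * (1 - hit_rate)
print(f"reads served from disk: {disk_reads:.0%} of all queries")
```

So only on the order of one query in ten touches disk for a read, plus the ~10% writes, even though the cache covers a small slice of the 70GB database.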

> Many years ago I did setup MySQL in a master-slave configuration using
> the binary logs. It was very easy to do.

I have a master-slave set up on my home aslo development system.  Very easy.
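For reference, the binary-log replication Bernie describes boils down to roughly this (a sketch only; the host, user, password, and log file name are placeholders, and exact options vary by MySQL version):

```sql
-- master's my.cnf:  [mysqld]  server-id=1  log-bin=mysql-bin
-- slave's  my.cnf:  [mysqld]  server-id=2
-- Then, on the slave, point it at the master's binary log:
CHANGE MASTER TO
  MASTER_HOST='aslo-db.example',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=0;
START SLAVE;
```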

>> At this point the issue is not scalability but rather how to
>> determine our future scalability needs.  You are right it would be
>> easier and more efficient to stick the whole stack on one machine.  By
>> doing the abstraction now it will be easier to scale in the future.
> I agree on the general principle, but Squid is a tiny service that can
> be moved from one machine to another in 10 minutes. Dedicating an entire
> virtual machine to it seems just overkill.
> Compared to physical boxes, VMs are easy to setup and clone.
> Unsurprisingly, many sysadmins tend to get a bit overexcited when they
> discover the potential. There's more to it: VMs also hide several, less
> obvious downsides. Within an intricate network of interdependent nodes,
> complexity tends to explode and reliability drops. Chasing problems
> generally requires opening 3-4 ssh sessions and comparing multiple logs.
> You don't want to be in this situation unless you absolutely have to.

Yes, I agree VMs incur an unavoidable cost in complexity, fragility,
and overhead.  At our current scale they are nothing more than a pain
in the ass :(  (Note: dfarning does not like using VMs unnecessarily
in production.)

My goal is to learn to set up, learn to test, learn to benchmark, and
learn to maintain aslo when it is spread across multiple physical
machines.

>> > Also, I feel that the proxy and the database shouldn't run off a crappy
>> > qcow2 file, for performance reasons. As the number of aslo-web
>> > front-ends increases, they will probably perform a lot of disk I/O
>> I have not been following how you are setting up the VMs.  On my test
>> machine at home I have my entire disk set up as LVM on raid (there is a
>> small ext2 boot partition to start dom0).  I just move and resize
>> memory and hard drive space as needed for individual vms.
> Yes, it should be this way also on treehouse. This is the command to
> setup a new VM:
>  virsh vol-create-as treehouse FOOBAR 10G
>  virt-clone -o template-jaunty --file=/dev/treehouse/FOOBAR -n FOOBAR
> It should automatically setup a new LVM logical volume in
> the /dev/treehouse VG.
> If you agree, I'll migrate aslo-db, aslo-web and aslo-proxy to their
> partitions.

Yes, I think that would help significantly.

>> aslo-proxy is ready to go into production.  It is all set up. I
>> thought you set up the backup last night.  aslo-proxy is currently
>> pointing at the existing aslo instance on sunjammer.  I would like to
>> spend a week or so tuning before pointing it at the new aslo-web
>> instance.
> From my benchmarks, Squid doesn't seem to help increase the throughput.
> We could transition it into production anyway, but until we multiply the
> number of application servers, the net result would be a slight increase
> in complexity and fragility with no gain.
> Do we really want to go ahead anyway?
>> I would like to emphasize that the point of using a layer of VMs is
>> not because VMs are cool.  As Bernie correctly states, they are a pain
>> in the ass to set up.  The point of using the VMs is to ensure that I
>> have the architectural design and abstraction barriers right for when
>> we need to migrate the VMs to their own physical machines.
> Once we start depending on a myriad of VMs for production services, it
> will be harder to go back and consolidate on fewer machines.
> I don't want to delay the much needed aslo scalability work, but at this
> time I think the only thing we need is to get aslo-web into production
> with the current database backend and load balance requests to it.
> I'd bet a pizza that we could get to 6-8 aslo-web machines before
> we'd start to require a dedicated machine for MySQL. At this size, Squid
> would start to become necessary, but only if we make it cache php pages,
> which is tricky to get right.
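On that last point: making Squid cache PHP pages only works if the application emits explicit freshness headers, which is exactly why it is tricky. A rough, untested sketch of what that involves:

```squid
# squid.conf sketch: static assets cache aggressively; dynamic pages
# are only cacheable if PHP itself sends freshness headers such as
#   Cache-Control: public, max-age=300
# A refresh_pattern alone is NOT enough for pages with cookies/sessions.
refresh_pattern -i \.(css|js|png|jpg|gif)$ 1440 80% 10080
refresh_pattern .                          0    20% 4320
```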

yummmm.  I have been going to a different pizzeria every evening for a
pizza and beer.  I'll take you up on that bet.


> --
>   // Bernie Innocenti - http://codewiz.org
>  \X/  Sugar Labs       - http://sugarlabs.org/
