[Systems] Load on treehouse

Bernie Innocenti bernie at codewiz.org
Thu Sep 2 07:26:40 EDT 2010


On Sun, 29-08-2010 at 21:14 +0200, Sascha Silbe wrote:
> BTW, is IPv6 to sunjammer working fine for you? It has been broken for me
> for several days now. Since squid isn't too smart about connecting to
> IPv4/IPv6 multi-homed hosts, it's rather annoying (I have to abort and
> resubmit every other HTTP(S) request).

Works perfectly for me. I tested ping, http and ssh from treehouse and
bender.


> > > by the way cpu usage score on treehouse:
> > > 1) aslo-web
> > > 2) lightwave
> What on lightwave causes the CPU load (every time I run top it's completely
> idle so I can't tell)?

I think it's the sks thing.

I've now installed atop, so we can look back at its historical logs and
see what went wrong. Does sks need much memory? Could it make the machine
swap?

Anyway, I was planning to upgrade lightwave to lucid (sounds like a pun,
eh?). It shouldn't be hard, as it only runs named and sks. We'll
re-measure performance after the upgrade.


> > I figured out why lightwave causes high load. From time to time, sks
> > wakes up and thrashes the disk for a few minutes. Sascha, why is that?
> Ah, I already suspected that (and changed the configuration to verify my
> suspicion, but haven't checked Munin again yet).
> SKS does a daily statistics calculation which is heavily IO-bound. A few
> minutes doesn't sound bad, so in general it should be quite fine. The
> reason it's a problem on treehouse is still the combination of
> 
> 1. KVM hiding away any prioritisation (AKA ionice) done inside the VM

Of course. And anyway, ionice cannot work miracles even in the
non-virtualized case, due to the ordering constraints of journaling
filesystems. Moreover, seek time dominates, while the Linux block layer
only accounts for I/O bandwidth.

As of 2010, fair I/O scheduling at the OS level is still a largely
unsolved problem (researchers thought they had solved it in publications
from the 1970s, but they forgot to tell us how to implement it in a real
kernel with a real filesystem and a real workload :-)


> 2. running high-priority (LDAP) and low-priority (SKS) servers inside
>    the same VM (so we can't just ionice the entire VM).

The LDAP server is still not running on lightwave. Maybe I'm paranoid,
but even after 175 days of uptime I still don't trust treehouse's
stability enough. The host has no redundant fans or PSUs, and I have no
physical access yet.

There are a number of pending infrastructure changes that I'm planning
to make after I've settled in Boston, because they are too risky to do
remotely.


> So maybe we should move SKS into a new VM or at least a different,
> equally low-priority one (that we will ionice).

There are no priorities for VMs, neither for I/O nor for CPU.

I experimented with libvirt + cgroups on housetree, resulting in two
kernel crashes. Moreover, the schedinfo settings are not yet being saved
by libvirt.

It all looks like unfinished, undocumented, unstable crap, especially on
Ubuntu, which always lags behind on both the kernel and libvirt.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/


