[Systems] Sunjammer crash postmortem

Samuel Cantero scanterog at gmail.com
Fri Jan 29 08:05:04 EST 2016


On Fri, Jan 29, 2016 at 9:49 AM, Samuel Cantero <scanterog at gmail.com> wrote:

> On Fri, Jan 29, 2016 at 12:33 AM, Bernie Innocenti <bernie at sugarlabs.org>
> wrote:
>
>> On 01/28/2016 08:32 PM, Samuel Cantero wrote:
>> > Apache wasn't the only process invoking the oom-killer. I also found
>> > opendkim, spamc, mb, uwsgi, and smtpd.
>> >
>> > The first incident was at Jan 28 03:07:25. Usually we have a lot of
>> > memory available in sunjammer. Munin stopped plotting at 02:40 and the
>> > memory was low as expected. I can only imagine some kind of unmanaged
>> > over-commitment (over-provisioning) in the Xen Dom0.
>>
>> I don't think the Dom0 can steal RAM from the domU. That's fixed, unless
>> you use those weird virtio balloon devices.
>>
>
> As far as I know, Xen has a memory over-commit feature. With this
> feature, memory is taken from one domain and given to another using the Xen
> "ballooning" mechanism. My simple theory is that this feature is enabled on
> the Xen host running at the FSF and that the host suddenly got overloaded
> and could not provide the memory our VM needed.
>
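One way to test this theory from inside the domU, assuming the PV balloon
driver logs its activity on this kernel, would be to look for balloon
messages around the time of the incident (kern.log is an assumption about
where kernel messages end up on this box):

dmesg | grep -i balloon
grep -i balloon /var/log/kern.log

If the host was reclaiming pages from us, there should be some trace of the
balloon inflating between 02:40 and 03:07.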
>>
>> I'm not really sure what allocated all the memory, but it had to be
>> something internal. The kernel dumped a list of processes and their
>> memory usage at each oom iteration, but none of them is particularly big.
>>
>> The real memory usage of apache is very hard to estimate, because it
>> forks plenty of children, each with big VSS and RSS figures. However,
>> most of the pages should be shared, so they don't add up.
>
>
I didn't read it carefully before. It makes sense now. It is really hard to
estimate how many of those pages are shared.

>
> Currently, we are running apache in prefork mode (multiple child processes
> with one thread each). I have done a little test in order to size it now.
> We can check this value again when apache is crawling.
>
> Total number of apache2 processes: *53*
>
> ps aux | grep 'apache2' | grep -v grep | wc -l
>
> Our ServerLimit value is 512.
>
> Total RSS (resident set size) value for the total number of apache2
> processes: *3553.78 MB*
>
> ps aux | grep 'apache2' | grep -v grep | awk '{sum += $6;} END {print sum/1024 " MB";}'
>
> According to the ps man page, RSS is the non-swapped physical memory that
> a task has used. However, this sum does not match the output of the free command.
>
> scg at sunjammer:~$ free -m
>             total       used       free     shared    buffers     cached
> Mem:         11992      10848       1144          0        682       8036
> -/+ buffers/cache:       2128       9864
> Swap:         8191          0       8191
>
> Here the used memory (minus buffers/cache) is only about 2 GB. The rest is
> cached memory. Am I missing something?
>
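Part of the answer is probably that summing RSS counts every shared page
once per child, while free counts it only once (plus the ~8 GB of page
cache). A fairer per-process figure would be PSS (proportional set size);
assuming this kernel exposes a Pss field in /proc/<pid>/smaps, something
like this sketch should give a more realistic total for apache2 (run as
root so every smaps file is readable):

sudo sh -c 'for pid in $(pgrep apache2); do cat /proc/$pid/smaps; done' | awk '/^Pss:/ {sum += $2} END {print sum/1024 " MB"}'

If that comes out well under the 3.5 GB RSS sum, most of those pages are
indeed shared, as Bernie suggested.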
>
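Also, a hedged back-of-envelope on the prefork limit: 3553.78 MB over 53
children is roughly 67 MB of RSS per child, and with ServerLimit at 512 the
worst case would be on the order of 34 GB, well beyond our 12 GB of RAM.
That figure overstates reality because of the shared pages counted once per
child, but it does suggest 512 is a very generous limit for this box.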
>> > Regarding disk I/O:
>> >
>> > iostat shows:
>> >
>> >   * An average of 32 tps (IOPS) on the first partition (/root). iostat -x
>> >     shows an average latency (await) of 126 ms; 25% are read operations
>> >     and 75% are write operations. Munin has shown an average latency of
>> >     145 ms since we started running the diskstats plugin.
>> >   * An average of 26 tps on the third partition (/srv). iostat -x shows
>> >     an average latency of 16.5 ms; 81% are read operations and 19% are
>> >     write operations. Munin shows an average latency of 14.5 ms.
>> >
>> > sar -dp -f /var/log/sysstat/sa[day] shows (for some days):
>> >
>> >   * Jan 27:
>> >       o An avg of 26 tps (IOPS) in the first partition (/root). An avg
>> >         latency of 126 ms.
>> >       o An avg of 11 tps in the third partition (/srv). An avg latency
>> >         of 29 ms.
>> >
>> >   * Jan 26:
>> >       o An avg of 27 tps (IOPS) in the first partition (/root). An avg
>> >         latency of 126 ms.
>> >       o An avg of 11 tps in the third partition (/srv). An avg latency
>> >         of 29 ms.
>> >
>> > I can check this average for the other days (see the sar one-liner below).
>> >
>> > As we can see, we have high latency on the first partition (where the
>> > databases reside), and taking into account that our VM is competing for
>> > disk I/O on an old disk subsystem, it is likely that those 37 IOPS are a
>> > big share of the maximum IOPS the backend can deliver.
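For pulling those daily figures quickly: sar prints per-device averages for
the whole day at the end of its report, so a loop along these lines should
work (the day numbers are only an example; adjust to whatever sa files are
still present in /var/log/sysstat):

for day in 24 25 26 27; do echo "== sa$day =="; sar -dp -f /var/log/sysstat/sa$day | grep '^Average'; done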
>>
>> Great analysis. Ruben and I upgraded the kernel to 3.0.0, which is still
>> ancient, but at least better than what we had before. We also disabled
>> barriers, which might not play well with the dom0, since it is also
>> running a very old kernel.
>>
>> Let's see if this brings down the damn latency.
>>
>> --
>> Bernie Innocenti
>> Sugar Labs Infrastructure Team
>> http://wiki.sugarlabs.org/go/Infrastructure_Team
>>
>
>
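To double-check that the barriers really ended up disabled after the
remount, the mount options should tell us, assuming they are reflected in
/proc/mounts on this kernel (the exact option name depends on the
filesystem):

grep -E 'barrier|nobarrier' /proc/mounts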

