[Systems] Sunjammer crash postmortem

Samuel Cantero scanterog at gmail.com
Fri Jan 29 11:27:15 EST 2016


You're right. It is the same in Xen. You need a balloon driver in the guest
OS.
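
A rough way to check this from inside the guest, as a sketch only. It assumes the domU kernel uses the standard Linux Xen balloon driver and exposes its state under the sysfs directory below; that path is an assumption and was not checked on sunjammer. The helper name balloon_status is made up for illustration.

```shell
# Print the Xen balloon driver's target and current allocation, in kB.
# ASSUMPTION: the default directory is where the Linux Xen balloon driver
# usually exposes its state; pass another directory as $1 if yours differs.
balloon_status() {
    dir="${1:-/sys/devices/system/xen_memory/xen_memory0}"
    if [ ! -r "$dir/target_kb" ] || [ ! -r "$dir/info/current_kb" ]; then
        echo "no balloon driver state under $dir"
        return 1
    fi
    target=$(cat "$dir/target_kb")
    current=$(cat "$dir/info/current_kb")
    echo "target=${target}kB current=${current}kB"
}

# On a guest without the driver this only prints the "no balloon" message.
balloon_status || true
```

If the directory is missing, the guest probably has no balloon driver loaded, and the dom0 should not be able to shrink its memory behind its back.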

On Fri, Jan 29, 2016 at 12:35 PM, Bernie Innocenti <bernie at codewiz.org>
wrote:

> I thought memory ballooning requires a kernel module loaded in the guest
> to free up pages before releasing them:
>
> https://rwmj.wordpress.com/2010/07/17/virtio-balloon/
>
> Dunno about XEN, but memory can't just vanish without the guest kernel
> knowing.
>
> On January 29, 2016 8:05:04 AM EST, Samuel Cantero <scanterog at gmail.com>
> wrote:
>
>>
>>
>> On Fri, Jan 29, 2016 at 9:49 AM, Samuel Cantero <scanterog at gmail.com>
>> wrote:
>>
>>> On Fri, Jan 29, 2016 at 12:33 AM, Bernie Innocenti <bernie at sugarlabs.org>
>>> wrote:
>>>
>>>> On 01/28/2016 08:32 PM, Samuel Cantero wrote:
>>>> > Apache wasn't the only process that triggered the oom-killer. I also
>>>> > found opendkim, spamc, mb, uwsgi, and smtpd.
>>>> >
>>>> > The first incident was at Jan 28 03:07:25. Usually we have a lot of
>>>> > memory available in sunjammer. Munin stopped plotting at 02:40, and
>>>> > memory usage was low at that point, as expected. I can only imagine
>>>> > some kind of unmanaged over-commitment (over-provisioning) in the Xen
>>>> > Dom0.
>>>>
>>>> I don't think the Dom0 can steal ram from the domU. That's fixed, unless
>>>> you use those weird virtio balloon devices.
>>>>
>>>
>>> As far as I know, Xen has a memory over-commit feature. With it,
>>> memory is taken from one domain and given to another using the Xen
>>> "ballooning" mechanism. My simple theory is that this feature is enabled in
>>> the Xen instance running at the FSF and the host suddenly became overloaded,
>>> without enough capacity to provide the memory our VM needed.
>>>
>>>>
>>>> I'm not really sure what allocated all the memory, but it had to be
>>>> something internal. The kernel dumped a list of processes and their
>>>> memory usage at each oom iteration, but none of them is particularly
>>>> big.
>>>>
>>>> The real memory usage of apache is very hard to estimate, because it
>>>> forks plenty of children, each with big VSS and RSS figures. However,
>>>> most of the pages should be shared, so they don't add up.
>>>
>>>
>> I didn't read this carefully before. It makes sense now. It is really hard
>> to estimate if those pages are shared.
>>
>>>
>>> Currently, we are running apache in prefork mode (multiple child
>>> processes with one thread each). I ran a quick test to size it. We can
>>> check these values again when apache is crawling.
>>>
>>> Total number of apache2 processes: *53*
>>>
>>> ps aux | grep 'apache2' | grep -v grep | wc -l
>>>
>>> Our server limit value is 512.
>>>
>>> Total RSS (resident set size) value for the total number of apache2
>>> processes: *3553.78 MB*
>>>
>>> ps aux | grep 'apache2' | grep -v grep | awk '{sum += $6;} END {print sum/1024 " MB";}'
>>>
>>> According to the ps man page, RSS is the non-swapped physical memory
>>> that a task has used. However, this sum does not match the output of
>>> free:
>>>
>>> scg at sunjammer:~$ free -m
>>>             total       used       free     shared    buffers     cached
>>> Mem:         11992      10848       1144          0        682       8036
>>> -/+ buffers/cache:       2128       9864
>>> Swap:         8191          0       8191
>>>
>>> Here the used memory is only about 2 GB; the rest is cache. Am I
>>> missing something?
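
One way to get closer to the real figure, as a sketch: RSS counts every shared page once per process, so summing it over forked apache children overstates usage. The Pss field in /proc/<pid>/smaps divides each shared page among the processes that map it, so summing Pss does not double count. The function names sum_pss_kb and apache_pss_mb are made up for illustration, and reading other users' smaps files normally requires root.

```shell
# Sum the "Pss:" lines of smaps-formatted input read from stdin, in kB.
sum_pss_kb() {
    awk '/^Pss:/ { sum += $2 } END { print sum + 0 }'
}

# Total PSS of all processes named apache2, in MB.
# Processes that exit mid-loop or cannot be read are silently skipped.
apache_pss_mb() {
    for pid in $(pgrep apache2); do
        cat "/proc/$pid/smaps" 2>/dev/null
    done | sum_pss_kb | awk '{ printf "%.2f MB\n", $1 / 1024 }'
}

apache_pss_mb
```

The result should be comparable with the 2128 MB that free reports as used once buffers and cache are excluded, which the RSS sum of 3553.78 MB is not.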
>>>
>>>
>>>> > Regarding disk I/O:
>>>> >
>>>> > iostat shows:
>>>> >
>>>> >   * An average of 32 tps (IOPS) on the first partition (/root). iostat
>>>> >     -x shows an average latency (await) of 126 ms. 25% of operations
>>>> >     are reads and 75% are writes. Munin shows an average latency of
>>>> >     145 ms since we started running the diskstats plugin.
>>>> >   * An average of 26 tps on the third partition (/srv). iostat -x
>>>> >     shows an average latency of 16.5 ms. 81% of operations are reads
>>>> >     and 19% are writes. Munin shows an average latency of 14.5 ms.
>>>> >
>>>> > sar -dp -f /var/log/sysstat/sa[day] shows (for some days):
>>>> >
>>>> >   * Jan 27:
>>>> >       o An avg of 26 tps (IOPS) in the first partition (/root). An avg
>>>> >         latency of 126 ms.
>>>> >       o An avg of 11 tps in the third partition (/srv). An avg latency
>>>> >         of 29 ms.
>>>> >
>>>> >   * Jan 26:
>>>> >       o An avg of 27 tps (IOPS) in the first partition (/root). An avg
>>>> >         latency of 126 ms.
>>>> >       o An avg of 11 tps in the third partition (/srv). An avg latency
>>>> >         of 29 ms.
>>>> >
>>>> > I can check this average for other days.
>>>> >
>>>> > As we can see, latency is high on the first partition (where the
>>>> > databases reside). Given that our VM competes for disk I/O on an
>>>> > old disk subsystem, 37 IOPS is likely a large share of the maximum
>>>> > IOPS available.
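
To put the latency figures in perspective, a small worked example using Little's law: the mean number of requests outstanding on a device is roughly its request rate (tps) times the mean time each request spends there (await). The helper name outstanding is made up for illustration; the inputs are the iostat averages quoted above.

```shell
# Little's law: mean outstanding requests = tps * await, with await in ms.
outstanding() {
    awk -v tps="$1" -v await_ms="$2" \
        'BEGIN { printf "%.2f\n", tps * await_ms / 1000 }'
}

outstanding 32 126     # first partition: prints 4.03
outstanding 26 16.5    # third partition (/srv): prints 0.43
```

On these averages about four requests are in flight on the first partition at any moment, against less than one on /srv, which fits the picture of the first partition being the one that queues.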
>>>>
>>>> Great analysis. Ruben and I upgraded the kernel to 3.0.0, which is still
>>>> ancient, but at least better than what we had before. We also disabled
>>>> barriers, which might not play well with the dom0 which is also running
>>>> a very old kernel.
>>>>
>>>> Let's see if this brings down the damn latency.
>>>>
>>>> --
>>>> Bernie Innocenti
>>>> Sugar Labs Infrastructure Team
>>>> http://wiki.sugarlabs.org/go/Infrastructure_Team
>>>>
>>>
>>>
>> ------------------------------
>>
>> Systems mailing list
>> Systems at lists.sugarlabs.org
>> http://lists.sugarlabs.org/listinfo/systems
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>