[Systems] Sunjammer crash postmortem

Bernie Innocenti bernie at codewiz.org
Fri Jan 29 10:35:16 EST 2016


I thought memory ballooning required a kernel module loaded in the guest, which gives up pages so the host can reclaim them:

https://rwmj.wordpress.com/2010/07/17/virtio-balloon/

Dunno about XEN, but memory can't just vanish without the guest kernel knowing.
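
Just to rule it out, here's a rough way to check from inside the guest whether a balloon driver is even present and whether the hypervisor has lowered our memory target. The sysfs paths are the ones the Linux Xen balloon driver normally exposes; if they don't exist on our kernel, that's already an answer:

lsmod | grep -i balloon        # virtio_balloon would show up here on KVM guests
dmesg | grep -i balloon        # balloon driver messages from boot
cat /sys/devices/system/xen_memory/xen_memory0/target_kb        # what the host wants us to have
cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb  # what we actually have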

On January 29, 2016 8:05:04 AM EST, Samuel Cantero <scanterog at gmail.com> wrote:
>On Fri, Jan 29, 2016 at 9:49 AM, Samuel Cantero <scanterog at gmail.com>
>wrote:
>
>> On Fri, Jan 29, 2016 at 12:33 AM, Bernie Innocenti <bernie at sugarlabs.org>
>> wrote:
>>
>>> On 01/28/2016 08:32 PM, Samuel Cantero wrote:
>>> > Apache wasn't the only process calling oom-killer. I also found
>>> > opendkim, spamc, mb, uwsgi, and smtpd.
>>> >
>>> > The first incident was on Jan 28 at 03:07:25. Usually we have a lot of
>>> > memory available in sunjammer. Munin stopped plotting at 02:40 and the
>>> > memory was low as expected. I can only imagine some kind of
>>> > unmanaged over-commitment (over-provisioning) in the Xen Dom0.
>>>
>>> I don't think the Dom0 can steal RAM from the domU. That's fixed, unless
>>> you use those weird virtio balloon devices.
>>>
>>
>> As far as I know, Xen has a memory over-commit feature. With this
>> feature, memory is taken from one domain and given to another using the
>> Xen "ballooning" mechanism. My simple theory is that this feature is
>> enabled in the Xen instance running at the FSF, and the host suddenly got
>> overloaded without the capacity to provide the memory needed for our VM.
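>>
>> If that theory is right, whoever has access to the dom0 at the FSF could
>> confirm it with something along these lines (xl on newer toolstacks, xm on
>> older ones; just a sketch of what to ask for):
>>
>> xl info | grep -E 'total_memory|free_memory'   # host RAM and what's left free
>> xl list                                        # memory currently assigned to each domain
>>
>> If sunjammer shows up there with less than the ~12 GB we expect, the host
>> really is ballooning us down.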
>>
>>>
>>> I'm not really sure what allocated all the memory, but it had to be
>>> something internal. The kernel dumped a list of processes and their
>>> memory usage at each oom iteration, but none of them is particularly big.
>>>
>>> The real memory usage of apache is very hard to estimate, because it
>>> forks plenty of children, each with big VSS and RSS figures. However,
>>> most of the pages should be shared, so they don't add up.
>>
>>
>I didn't read it well before. It makes sense now. It is really hard to
>estimate how much of that memory is actually shared.
>
>>
>> Currently, we are running apache in prefork mode (multiple child processes
>> with one thread each). I have done a little test to size it now.
>> We can check this value again when apache is crawling.
>>
>> Total number of apache2 processes: *53*
>>
>> ps aux | grep 'apache2' | grep -v grep | wc -l
>>
>> Our server limit value is 512.
>>
>> Total RSS (resident set size) across all apache2 processes: *3553.78 MB*
>>
>> ps aux | grep 'apache2' | grep -v grep | awk '{sum += $6;} END {print
>> sum/1024 " MB";}'
>>
>> According to the ps man page, the RSS is the non-swapped physical memory
>> that a task has used. However, this sum does not match the free cmd output.
>>
>> scg at sunjammer:~$ free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         11992      10848       1144          0        682       8036
>> -/+ buffers/cache:       2128       9864
>> Swap:         8191          0       8191
>>
>> Here the used memory is only 2 GB; the rest is cached memory. Am I
>> missing something?
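>>
>> Part of the discrepancy is just how the old free output works: the 10848 MB
>> "used" on the Mem line includes the page cache, and the "-/+ buffers/cache"
>> line (2128 MB) is what processes actually hold. The rest is shared pages:
>> RSS counts every shared page once per process, so summing RSS over 53
>> children overstates apache badly. A rough way to account for the sharing is
>> to sum PSS (proportional set size) instead, which splits each shared page
>> among the processes mapping it; a sketch (run as root so all smaps are
>> readable; the smem tool gives the same number with less typing):
>>
>> for pid in $(pgrep apache2); do
>>   awk '/^Pss:/ {s += $2} END {print s}' /proc/$pid/smaps
>> done | awk '{t += $1} END {print t/1024 " MB"}'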
>>
>>
>>> > Regarding disk I/O:
>>> >
>>> > iostat shows:
>>> >
>>> >   * An average of 32 tps (IOPS) on the first partition (/root).
>>> >     iostat -x shows an average latency (await) of 126 ms. 25% are read
>>> >     operations and 75% are write operations. Munin shows an average
>>> >     latency of 145 ms since we're running the diskstats plugin.
>>> >   * An average of 26 tps on the third partition (/srv). iostat -x shows
>>> >     an average latency of 16.5 ms. 81% are read operations and 19% are
>>> >     write operations. Munin shows an average latency of 14.5 ms.
>>> >
>>> > sar -dp -f /var/log/sysstat/sa[day] shows (for some days):
>>> >
>>> >   * Jan 27:
>>> >       o An avg of 26 tps (IOPS) on the first partition (/root). An avg
>>> >         latency of 126 ms.
>>> >       o An avg of 11 tps on the third partition (/srv). An avg latency
>>> >         of 29 ms.
>>> >
>>> >   * Jan 26:
>>> >       o An avg of 27 tps (IOPS) on the first partition (/root). An avg
>>> >         latency of 126 ms.
>>> >       o An avg of 11 tps on the third partition (/srv). An avg latency
>>> >         of 29 ms.
>>> >
>>> > I can check this average for the other days as well.
>>> >
>>> > As we can see, we have high latency on the first partition (where the
>>> > databases reside), and taking into account that our VM is struggling for
>>> > disk I/O on an old disk subsystem, it is likely that 37 IOPS is a big
>>> > part of the total maximum IOPS available.
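>>> >
>>> > (Back-of-envelope, assuming a single 7200 RPM spindle behind the dom0,
>>> > which typically tops out around 75-100 random IOPS: 37 / 80 ≈ 46%, so our
>>> > VM alone could be eating close to half of the disk's random I/O budget
>>> > before the other domUs are even counted.)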
>>>
>>> Great analysis. Ruben and I upgraded the kernel to 3.0.0, which is still
>>> ancient, but at least better than what we had before. We also disabled
>>> barriers, which might not play well with the dom0, which is also running
>>> a very old kernel.
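>>>
>>> (If anyone needs to reproduce this elsewhere: barriers are normally turned
>>> off with a mount option; a rough sketch, device name and fs type are
>>> guesses:
>>>
>>> # /etc/fstab -- barrier=0 on ext3; nobarrier also works on ext4
>>> /dev/xvda1  /  ext3  defaults,barrier=0  0  1
>>>
>>> mount -o remount,barrier=0 /   # apply without a reboot)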
>>>
>>> Let's see if this brings down the damn latency.
>>>
>>> --
>>> Bernie Innocenti
>>> Sugar Labs Infrastructure Team
>>> http://wiki.sugarlabs.org/go/Infrastructure_Team
>>>
>>
>>
>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.