[Systems] Sunjammer Swapping
Bernie Innocenti
bernie at codewiz.org
Sat Jan 2 03:53:20 EST 2016
Nice summary Quozl! I'll add my ramblings to it:
Swap is the bane of all real-time applications. In Google's engineering
culture, HTTP queries have soft real-time requirements if a remote user
may be waiting for the response. All the involved back-ends must be
provisioned to respond within a bounded 99th-percentile latency, or else
they might as well be dead.
The unpredictable latency of swap makes it a bad fit for serving. It is
better to return a "503 Service Unavailable" immediately and let the load
balancer retry the query on another machine.
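
To make the "fail fast and retry elsewhere" idea concrete, here is a
minimal sketch in Python. It is not how any real load balancer works:
LoadShedder, balance and the max_in_flight limit are names I made up for
illustration, and the backends are plain in-process objects rather than
machines.

```python
import threading


class LoadShedder:
    """Toy backend that rejects work immediately once it is saturated."""

    def __init__(self, max_in_flight):
        # Hypothetical capacity limit; a real server would derive this
        # from its memory and CPU provisioning.
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def handle(self, work):
        # Refuse right away instead of queueing (or swapping).
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return 503, "Service Unavailable"
            self._in_flight += 1
        try:
            return 200, work()
        finally:
            with self._lock:
                self._in_flight -= 1


def balance(backends, work):
    """Toy load balancer: try each backend in turn until one accepts."""
    for backend in backends:
        status, body = backend.handle(work)
        if status != 503:
            return status, body
    # Every backend was busy, so the client sees the error after all.
    return 503, "Service Unavailable"
```

The point is that a saturated backend answers in bounded time either
way: it serves the query or refuses it, and the retry cost is paid by
the balancer rather than by a thrashing machine.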
Swap MIGHT be an option for those large batch jobs where you don't know
ahead of time what the size of your working set is going to be. But many
large processes can be sharded, checkpointed and dispatched to many
slave machines. So the modern way to deal with this is to automatically
restart the slaves that failed due to OOM with a larger allocation.
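
As a rough sketch of that restart loop, assuming a scheduler that can
tell an OOM failure apart from other errors: OutOfMemory,
run_with_growing_memory and the doubling factor below are all
hypothetical, and the task is just a Python callable standing in for a
remote shard.

```python
class OutOfMemory(Exception):
    """Stand-in for 'the job was killed for exceeding its memory limit'."""


def run_with_growing_memory(task, start_mb, max_mb, factor=2):
    """Run task(limit_mb), retrying with a larger limit after each OOM.

    Returns (result, limit_mb) for the first attempt that succeeds.
    """
    limit_mb = start_mb
    while limit_mb <= max_mb:
        try:
            return task(limit_mb), limit_mb
        except OutOfMemory:
            # The shard died from OOM: restart it with a bigger allocation.
            limit_mb *= factor
    raise OutOfMemory("gave up: task needs more than %d MB" % max_mb)


def make_task(needed_mb):
    """Build a fake task that fails unless given at least needed_mb."""
    def task(limit_mb):
        if limit_mb < needed_mb:
            raise OutOfMemory()
        return "done"
    return task
```

For example, run_with_growing_memory(make_task(3000), 1024, 8192) tries
1024 MB and 2048 MB, fails both times, and succeeds at 4096 MB.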
If your working set is both too large for physical memory and
impossible to shard, then you're out of luck. Come back in a few years,
or see if adding swap helps complete the processing before then :-)
So far we only discussed workloads which are somewhat homogeneous and
known ahead of time. Sunjammer, however, is a mess of different
workloads; a general-purpose system in which several shell users can
start random processes and those compete for resources with apache and
other serving processes.
Arguably, this is bad system design. Running heterogeneous workloads on
a single machine is totally fine and actually helps maximize
utilization, but there should be better resource isolation between them.
ESPECIALLY for memory!
I guess I can be excused for designing sunjammer as a big promiscuous
machine back in 2008, when cloud hosting wasn't a mature option and
managing and running many virtual machines was both inefficient and
required a lot of manual labor.
Today, the SL infrastructure is moving towards per-service Docker
containers and it seems like a huge improvement: small footprint, better
isolation between services, good security and overall easier
maintenance. Management of images still seems a little sketchy, but I'm
confident it will improve over time.
We have already removed a number of problematic services from Sunjammer,
including Trac, and I personally think that we should continue
decomposing it into Docker containers until Sunjammer becomes nothing
but a plain shell server... perhaps also doing email handling for
@sugarlabs.org, but without Mailman.
Once we reach that point, swap can stay enabled since it would no longer
harm web serving.
On 01/02/2016 01:20 AM, James Cameron wrote:
> Good questions.
>
> No, it's not expected or desirable to have swap used, but seeing it
> used is not a problem.
>
> When a workload causes swapping, it is better to increase physical
> memory or decrease workload. But both of these require investment,
> so swap is the next best thing.
>
> Short peaks in workload are better handled with swap than not handled.
>
> The evidence you gave showed a workload larger than physical memory.
> I just couldn't tell whether it was a peak in workload (as Bernie has
> identified) or an ongoing behaviour.
>
> Yes, some systems are set up without swap. When a peak in workload
> exceeds physical memory, there are two outcomes:
>
> - Systems with swap slow down, while they thrash with paging,
> substituting secondary storage (disk) for primary (RAM),
>
> - Systems without swap fail, either by reporting an error to whatever
> program is allocating memory, or killing a process (Linux, OOM).
>
> The design question then becomes: do you want a system to slow down or
> refuse a transaction?
>
> The OLPC XO laptops were originally set up without swap. Recently
> swap was added to the XO-1 in order to release about 32 MB of the 256
> MB.
>
> Virtual memory as a technology dates back to the 1970s, and my work
> tuning VAX systems running VMS was most rewarding. It is really nice
> that virtual memory has stood the test of time.
--
_ // Bernie Innocenti
\X/ http://codewiz.org