[Systems] solarsail: http down

Ivan Krstić krstic at solarsail.hcs.harvard.edu
Sat Mar 14 22:05:28 EDT 2009


On Mar 14, 2009, at 11:37 PM, Sascha Silbe wrote:
> Both dev.sugarlabs.org and wiki.sugarlabs.org were down for over an  
> hour.

A third-party service floods my cell phone with alerts if anything
is down for an hour, and I received no notifications about this
before the kernel problem kicked in. I'll have to investigate
further; the machine probably slowed down at first but still
worked.

In general, downtime with this machine has been exceedingly rare, so I
haven't felt compelled to improve upon the combination of hourly
monitoring and a 20-minute watchdog. It's now time to rethink this.
Tomorrow, I will set up off-site Nagios with a 60-second check
interval for all of the SL infrastructure and post the details here.
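
To illustrate what a 60-second check buys over an hourly one, here is a
minimal Python sketch of a polling HTTP monitor. It is not the Nagios
setup itself, only an illustration under stated assumptions: the names
http_up and monitor, the 10-second timeout, and the alert-on-state-change
rule are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_up(url, timeout=10.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts, and
        # HTTP 4xx/5xx responses (urlopen raises HTTPError for those).
        return False


def monitor(urls, probe=http_up, interval=60.0, rounds=None,
            sleep=time.sleep, alert=print):
    """Probe every URL once per interval and alert on each state change.

    Hypothetical helper: 'rounds' limits the number of passes
    (None means run forever); 'probe', 'sleep' and 'alert' are
    injectable so the loop can be exercised without a network.
    """
    state = {}  # last known status per URL; assumed up until seen down
    done = 0
    while rounds is None or done < rounds:
        for url in urls:
            up = probe(url)
            if state.get(url, True) != up:
                alert(f"{url} is {'UP' if up else 'DOWN'}")
            state[url] = up
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Calling monitor() with the URLs for dev.sugarlabs.org and
wiki.sugarlabs.org from a machine outside the monitored host would
report an outage within about a minute rather than an hour.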

Cheers,

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


