[Systems] Ailing drive on housetree
bernie at sugarlabs.org
Fri Nov 11 22:48:30 EST 2011
On Fri, 2011-11-11 at 22:32 -0500, Chris Leonard wrote:
> On Fri, Nov 11, 2011 at 9:46 PM, Bernie Innocenti <bernie at codewiz.org> wrote:
> > Today I finally figured out why housetree was reporting high load
> > occasionally without any apparent activity on the VMs.
> > It turns out that sdb is dying, and we didn't even have smartd running.
> > Luckily, sdb is part of RAID1 arrays with triple redundancy. We could
> > continue operating with 2 drives for some time, but I'd feel safer if we
> > replaced the drive as soon as possible. After all, the remaining drives
> > are the same model and have been operating for the same amount of time
> > (806 days).
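
For readers following along, here is a minimal sketch of how a degraded
md RAID1 array could be spotted by parsing the text of /proc/mdstat. It
is not the check that was run on housetree. The function name
degraded_arrays, the SAMPLE text, and the partition names in it are
illustrative assumptions, not output captured from the real machine.

```python
import re

# Illustrative /proc/mdstat contents (assumed, not taken from housetree):
# md0 has lost one of its three mirrors, md1 is healthy.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[2] sdb1[1](F) sda1[0]
      104320 blocks [3/2] [U_U]

md1 : active raid1 sdc2[2] sdb2[1] sda2[0]
      488279488 blocks [3/3] [UUU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Map each degraded array to (expected, active, failed_members)."""
    results = {}
    current = None
    failed = []
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md0 : active raid1 sdb1[1](F) sda1[0]"
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            # Members flagged (F) have been marked faulty by the kernel.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            continue
        # Status line carries "[expected/active]", e.g. "[3/2]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected or failed:
                results[current] = (expected, active, failed)
            current = None
    return results


if __name__ == "__main__":
    for name, (expected, active, failed) in degraded_arrays(SAMPLE).items():
        print(f"{name}: {active}/{expected} members active, failed: {failed}")
```

On a live host the same function could be fed the contents of
/proc/mdstat instead of SAMPLE. It only reports what the kernel has
already marked as failed, so it complements rather than replaces SMART
monitoring with smartd.
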
> My personal rule is that failed drives, even if they are only hot
> spares, should be replaced as soon as possible. Does this require
> expenditure of funds beyond your SLOBs delegated limit? If so, then
> hardware failure replacement like this needs to be added to your
> authorization as a general case. The only e-mail you should have to
> send to get a replacement ordered is to your favorite vendor.
The cost for a new drive should be within my existing cap of $200, but
you're right: we should have a higher pre-authorized expense limit for
emergency hardware replacements.
BTW: just in case, I'm pre-seeding the filesystem of jita.sugarlabs.org
Sugar Labs Infrastructure Team