[Systems] Ailing drive on housetree

Bernie Innocenti bernie at sugarlabs.org
Fri Nov 11 22:48:30 EST 2011


On Fri, 2011-11-11 at 22:32 -0500, Chris Leonard wrote:

> On Fri, Nov 11, 2011 at 9:46 PM, Bernie Innocenti <bernie at codewiz.org>
> wrote:
>         Today I finally figured out why housetree was reporting high load
>         occasionally without any apparent activity on the VMs.
>         
>         It turns out that sdb is dying, and we didn't even have smartd running.
>         Luckily, sdb is part of RAID1 arrays with triple-redundancy. We could
>         continue operating with 2 drives for some time, but I'd feel safer if we
>         replaced the drive as soon as possible. After all, the remaining drives
>         are the same model and have been operating for the same amount of time
>         (806 days).
>         
> 
> Bernie,
> 
> My personal rule is that failed drives, even if they are only hot
> spares, should be replaced as soon as possible.  Does this require
> expenditure of funds beyond your SLOBs delegated limit?  If so, then
> hardware failure replacement like this needs to be added to your
> authorization as a general case.  The only e-mail you should have to
> send to get a replacement ordered is to your favorite vendor.

The cost for a new drive should be within my existing cap of $200, but
you're right: we should have a higher pre-authorized expense limit for
emergency hardware replacements.

BTW: just in case, I'm pre-seeding the filesystem of jita.sugarlabs.org
on treehouse.

-- 
Bernie Innocenti
Sugar Labs Infrastructure Team
http://wiki.sugarlabs.org/go/Infrastructure_Team


