[Systems] Disk array issues on cloud9.fsf.org affecting sunjammer

Ruben Rodriguez ruben at fsf.org
Sun May 17 14:50:03 EDT 2020


Hi there, FSF CTO and sysadmin here.

Last night a drive or a controller (unclear which) in cloud9 failed and
the disk got kicked out of the array. I removed it from the controller
and reset it; the disk came back and I re-added it to the array. After
some resync time there were errors reading from the non-failed disk, and
the resync couldn't finish. Although there are some further steps we
could take to fix that array, the chances of failure are high and other
disks in that box are in bad condition.

I have started to copy the data over to our new server stack, which runs
on Ceph instead of RAID. I'm copying it over with rsync from the
running sunjammer. When that is done in a few hours, I'll have to turn
the old SJ off, mount the volume on the host, do a final rsync, and turn
on the new SJ VM. Downtime should only be a few minutes.
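
For anyone curious what a two-pass copy like this can look like, here is
a rough sketch. It is not the actual set of commands used here: the
source and destination paths, the host name, and the choice of rsync
flags are all assumptions made for illustration. The idea is that the
first pass runs while the old machine is still up, and the final pass
runs once it is off, adding --delete so files removed in the meantime
disappear from the copy too.

```python
import shlex

# Hypothetical locations, not taken from this message.
LIVE_SOURCE = "root@sunjammer.example:/"   # assumed: the running old VM over ssh
OFFLINE_SOURCE = "/mnt/old-sj/"            # assumed: old volume mounted on the host
DEST = "/mnt/new-sj/"                      # assumed: new volume mounted on the host


def rsync_command(source, dest, final_pass=False):
    """Build the rsync argument list for one pass of the copy."""
    cmd = [
        "rsync",
        "-aHAX",          # archive mode, plus hard links, ACLs and xattrs
        "--numeric-ids",  # keep uid/gid numbers instead of mapping by name
        "-x",             # stay on one filesystem; separately mounted
                          # volumes (such as /srv) need their own pass
    ]
    if final_pass:
        # Source is quiescent now, so it is safe to drop files from the
        # destination that no longer exist on the source.
        cmd.append("--delete")
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is copied
    # by accident when trying the sketch out.
    print(shlex.join(rsync_command(LIVE_SOURCE, DEST)))
    print(shlex.join(rsync_command(OFFLINE_SOURCE, DEST, final_pass=True)))
```

Because the first pass already moved the bulk of the data, the final
pass only has to transfer what changed since, which is why the downtime
can stay short.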

That is all assuming that the current SJ, which is still running on the
degraded array, doesn't hit I/O errors (it hasn't yet). I'll try my best
to minimize changes to the VM structure (e.g. /srv/ is on a separate
volume using quotas).

You can find me on #sugar or #fsfsys as quidam.

