[IAEP] [Systems] ALERT JUSTICE DOWN [24 hours without response] (!)
Stefan Unterhauser
stefan at unterhauser.name
Thu Jun 18 19:18:47 EDT 2015
On Thu, Jun 18, 2015 at 3:14 PM, Samuel Greenfeld <samuel at greenfeld.org>
wrote:
> Unless the hardware is newer than I think it is, it likely is quite old.
>
< 3 years
> OLPC's hardware in the Media Lab kept flaking out to the point most (all?)
> of it was eventually virtualized.
>
> How much would it cost to look into getting new hardware and/or using
> someone's virtualization platform?
>
> Sugar seems to change their setup a bit more than OLPC, so it may be worth
> investigating a scenario where resources could be spun up on demand.
>
>
> On Thu, Jun 18, 2015 at 3:09 PM, Bernie Innocenti <bernie at sugarlabs.org>
> wrote:
>
>> On 06/18/2015 03:01 PM, Gonzalo Odiard wrote:
>> > Any chance to check whether the disks are dying, or is there some other
>> > reason for these instabilities?
>>
>> Nothing odd from smartctl, and anyway the server would keep responding
>> to pings even if both disks in the raid array were dead.
>>
+1 +2 +3 +... on the console the system completely freezes
>> So I'm thinking it's either a kernel bug, or unstable hardware.
>>
>>
>> > Gonzalo
>> >
>> > On Thu, Jun 18, 2015 at 3:56 PM, Bernie Innocenti <bernie at sugarlabs.org>
>> > wrote:
>> >
>> > +systems@
>>
thanks Bernie
>> >
>> > I rebooted justice from the management console and it's now responding
>> > to pings.
>> >
>> > I couldn't view the screen capture and I had no time to go to the Media
>> > Lab to physically inspect the machine, so I don't understand the
>> > root cause.
>> >
>> > As reported by Dogi, Justice seems to crash every 1-2 months.

more like ~3 months

>> > I suggest we try the following steps:
>> >
>> > 1. upgrade justice to Ubuntu 14.04 (like we did with freedom 1yr ago)
>> >
>>
+1, especially since justice, compared to freedom, has a long history of being
upgraded (it has its roots in the housetree server build, which means it has
been running since 2009 ... freedom was freshly installed ~2012/13).
This is why, after that, we should consider a completely fresh install of
justice, since my guess is that it is a software issue (justice always
lasts 2+ months).
The reason I think it is not a hardware issue is that these crashes have
been happening for the last 2 years (I did 95% of all reboots) and started
with our last system upgrade (something got upgraded to unstable).
>> > 2. if crashes continue, go to the server room and swap the drives with
>> > freedom (which is our hot-swap server and doesn't currently run anything
>> > critical)
>> > critical)
>> >
>> > 3. Ask the ML (Media Lab) again to give one of us physical access to the
>> > server room. I work nearby, but I have trouble leaving during office
>> > hours on a personal errand, and if anything happens over a weekend we're
>> > in trouble.
>> >
>> > Sebastian: you should at least get access to the management console.
>> > Ping me on IRC and I'll send you the credentials on a secure channel.
>> >
>> >
>> > On 06/18/2015 10:40 AM, Sebastian Silva wrote:
>> > > Hello Sugar Oversight Board, Sugar Labs Members,
>> > >
>> > > Our main production server virtual machine host is down and I can't
>> > > reach it.
>> > > We have several systems that depend on this infrastructure, including
>> > > the Pootle server, which was actively being used by translators of the
>> > > Aymara and Awajun native languages.
>> > >
>> > > I respectfully request that you call on the phone whoever has physical
>> > > access to this machine and we try to bring it back online. I think this
>> > > should be either Bernie Innocenti or Stefan Unterhauser.

my phone number is 617 767 2668 ... just call

>> > >
>> > > Also, I would like to request more volunteers from the infrastructure
>> > > team to have virtual terminal access to these machines (not just ssh),
>> > > or to put them in a proper colocation service where we can get some
>> > > support.
>> > >
>> > > Thanks in advance for your help.
>> > > Sebastian
>> > >
>> > > On 17/06/15 20:55, Sebastian Silva wrote:
>> > >> Affected services:
>> > >> translate.sugarlabs.org
>> > >> git.sugarlabs.org
>> > >> packages.sugarlabs.org
>> > >>
>> > >>
>> > >>
>> > >> On 17/06/15 20:48, Sebastian Silva wrote:
>> > >>> We can't reach it.
>> > >>>
>> > >>> Anybody with physical access to the machine please respond.
>> > >>>
>> > >>>
>> > >>> Regards,
>> > >>> Sebastian
>> >
>> > --
>> > Bernie Innocenti
>> > Sugar Labs Infrastructure Team
>> > http://wiki.sugarlabs.org/go/Infrastructure_Team
>> > _______________________________________________
>> > IAEP -- It's An Education Project (not a laptop project!)
>> > IAEP at lists.sugarlabs.org
>> > http://lists.sugarlabs.org/listinfo/iaep
>> >
>> >
>> >
>> >
>> > --
>> > Gonzalo Odiard
>> >
>> > SugarLabs - Software for children learning
>>
>>
>> --
>> Bernie Innocenti
>> Sugar Labs Infrastructure Team
>> http://wiki.sugarlabs.org/go/Infrastructure_Team
>>
>
>
> _______________________________________________
> Systems mailing list
> Systems at lists.sugarlabs.org
> http://lists.sugarlabs.org/listinfo/systems
>
>