[IAEP] [Systems] ALERT JUSTICE DOWN [24 hours without response] (!)

Bernie Innocenti bernie at sugarlabs.org
Fri Jun 19 15:33:55 EDT 2015


As dogi said, freedom and justice are relatively new and completely
independent from the OLPC machines which live in the same server room.

Our servers are pretty decent and we're using less than 50% of the
available capacity. I don't see a reason to replace them for another 2-3
years.

What's being neglected is software upgrades: justice is on Ubuntu 12.04
and sunjammer is on 10.04. Upgrading justice should be safe and easy (since
we already walked the same upgrade path with freedom a while ago).

Upgrading sunjammer, otoh, is going to be tricky. It's also our most
critical piece of infrastructure, so we can't afford prolonged downtime.

If someone had some spare cycles, I'd try to decompose sunjammer into
smaller, more manageable chunks, for instance by moving all the wikis and
the main website to a separate VM. SamP probably already has a plan for ASLO.

I'd leave shell accounts, LDAP, email processing and mailing lists on
sunjammer because they're vaguely related and interconnected.

On 06/18/2015 07:18 PM, Stefan Unterhauser wrote:
> 
> 
> On Thu, Jun 18, 2015 at 3:14 PM, Samuel Greenfeld <samuel at greenfeld.org> wrote:
> 
>     Unless the hardware is newer than I think it is, it likely is quite old.
> 
> < 3 years
>
>     OLPC's hardware in the Media Lab kept flaking out to the point most
>     (all?) of it was eventually virtualized.
>
>     How much would it cost to look into getting new hardware and/or
>     using someone's virtualization platform?
> 
>     Sugar seems to change their setup a bit more than OLPC, so it may be
>     worth investigating a scenario where resources could be spun up on
>     demand.
> 
> 
>     On Thu, Jun 18, 2015 at 3:09 PM, Bernie Innocenti
>     <bernie at sugarlabs.org> wrote:
> 
>         On 06/18/2015 03:01 PM, Gonzalo Odiard wrote:
>         > Any chance to check if disks are dying or if there is another reason
>         > for these instabilities?
> 
>         Nothing odd from smartctl, and anyway the server would keep
>         responding to pings even if both disks in the RAID array were dead.
> 
> +1 +2 +3 +...
> 
> on the console the system completely freezes 
> 
>  
> 
>         So I'm thinking it's either a kernel bug, or unstable hardware.
> 
> 
>         > Gonzalo
>         >
>         > On Thu, Jun 18, 2015 at 3:56 PM, Bernie Innocenti <bernie at sugarlabs.org> wrote:
>         >
>         >     +systems@
> 
> 
> thanks bernie
>  
> 
>         >
>         >     I rebooted justice from the management console and it's now
>         >     responding to pings.
>         >
>         >     I couldn't view the screen capture and I had no time to go to the
>         >     Media Lab to physically inspect the machine, so I don't understand
>         >     the root cause.
>         >
>         >     As reported by Dogi, Justice seems to crash every 1-2 months.
> 
>  
> more ~3 months 
> 
>         >     I suggest we try the following steps:
>         >
>         >     1. upgrade justice to Ubuntu 14.04 (like we did with freedom 1yr ago)
>         >
> 
> +1, especially since justice, compared to freedom, has a long history of
> being upgraded (it has its roots in the housetree server build, which means
> it has been running since 2009 ... freedom got a fresh install ~2012/13)
> 
> this is why, after that, we should just consider a completely new install
> of justice, since my guess is that it is a software issue (justice always
> lasts 2+ months)
> 
> I think it is not a hardware issue because this crashing has been going on
> for the last 2 years (I did 95% of all reboots) and started with
> our last system upgrade (something got upgraded to unstable)
> 
>  
> 
>         >     2. if crashes continue, go to the server room and swap the drives
>         >     with freedom (which is our hot-swap server and doesn't currently
>         >     run anything critical)
>         >
>         >     3. Ask the ML again to give one of us physical access to the
>         >     server room. I work nearby, but I have trouble leaving during
>         >     office hours on a personal errand, and if anything happens over a
>         >     week-end we're in trouble.
>         >
>         >     Sebastian: you should at least get access to the management
>         >     console. Ping me on IRC and I'll send you the credentials on a
>         >     secure channel.
>         >
>         >
>         >     On 06/18/2015 10:40 AM, Sebastian Silva wrote:
>         >     > Hello Sugar Oversight Board, Sugar Labs Members,
>         >     >
>         >     > Our main production server virtual machine host is down and I
>         >     > can't reach it.
>         >     > We have several systems that depend on this infrastructure,
>         >     > including the Pootle server, which was actively being used by
>         >     > translators of the Aymara and Awajun native languages.
>         >     >
>         >     > I respectfully request that you call on the phone whoever has
>         >     > physical access to this machine and we try to bring it back
>         >     > online. I think this should be either Bernie Innocenti or Stefan
>         >     > Unterhauser.
> 
> 
> my phonenumber is 617 767 2668 <tel:617%20767%202668> ... just call
>  
> 
>         >     >
>         >     > Also, I would like to request that more volunteers from the
>         >     > infrastructure team have virtual terminal access to these
>         >     > machines (not just ssh), or that we put them in a proper
>         >     > colocation service where we can get some support.
>         >     >
>         >     > Thanks in advance for your help.
>         >     > Sebastian
>         >     >
>         >     > On 17/06/15 20:55, Sebastian Silva wrote:
>         >     >> Affected services:
>         >     >> translate.sugarlabs.org <http://translate.sugarlabs.org>
>         >     >> git.sugarlabs.org <http://git.sugarlabs.org>
>         >     >> packages.sugarlabs.org <http://packages.sugarlabs.org>
>         >     >>
>         >     >>
>         >     >>
>         >     >> On 17/06/15 20:48, Sebastian Silva wrote:
>         >     >>> We can't reach it.
>         >     >>>
>         >     >>> Anybody with physical access to the machine please respond.
>         >     >>>
>         >     >>>
>         >     >>> Regards,
>         >     >>> Sebastian
>         >
>         >     --
>         >     Bernie Innocenti
>         >     Sugar Labs Infrastructure Team
>         >     http://wiki.sugarlabs.org/go/Infrastructure_Team
>         >     _______________________________________________
>         >     IAEP -- It's An Education Project (not a laptop project!)
>         >     IAEP at lists.sugarlabs.org
>         >     http://lists.sugarlabs.org/listinfo/iaep
>         >
>         >
>         >
>         >
>         > --
>         > Gonzalo Odiard
>         >
>         > SugarLabs - Software for children learning
> 
> 
>         --
>         Bernie Innocenti
>         Sugar Labs Infrastructure Team
>         http://wiki.sugarlabs.org/go/Infrastructure_Team
> 
> 
> 
>     _______________________________________________
>     Systems mailing list
>     Systems at lists.sugarlabs.org
>     http://lists.sugarlabs.org/listinfo/systems
> 
> 


-- 
Bernie Innocenti
Sugar Labs Infrastructure Team
http://wiki.sugarlabs.org/go/Infrastructure_Team

