[Sugar-devel] Datastore index corruption

Aleksey Lim alsroot at member.fsf.org
Sat Jun 19 21:41:35 EDT 2010


On Sun, Jun 20, 2010 at 01:10:16AM +0000, Aleksey Lim wrote:
> On Sat, Jun 19, 2010 at 08:33:50PM -0400, Bernie Innocenti wrote:
> > We've found an XO-1 running Fedora 11 + Sugar 0.84 with an interesting
> > datastore corruption issue.
> > 
> > The journal was showing just one object, but the
> > ~/.sugar/default/datastore directory contained 4-5 invisible entries.
> > After removing index_updated and restarting Sugar, the entries
> > reappeared.
> > 
> > There was no time to analyze the problem in detail, but I have a strong
> > feeling that this isn't an isolated case. It would explain other reports
> > of files disappearing and mysteriously reduced disk space.
> > 
> > Some reflections:
> > 
> >  * the corruption could be caused by flash problems. I have found
> >    laptops in the field that wouldn't boot because /sbin/lvm was
> >    corrupted
> > 
> >  * we can't exclude jffs2 problems either: when it's almost full, it does
> >    slow garbage collection passes on boot which kids interrupt by
> >    power cycling. I wonder how robust jffs2 is in this case.
> > 
> >  * there might be a bug in xapian. If so, we'll see this issue also
> >    on the XO-1.5
> > 
> >  * I'm skeptical it's a new issue in 0.84 or F-11: the older builds
> >    had so many data loss issues that a subtler problem like this
> >    could have easily gone unnoticed.
> > 
> >  * can the datastore detect index corruption in the most obvious cases?
> >    If so, what would it do?
> > 
> >  * how long does it take to rebuild the index on a busy journal?
> >    Can we afford to rebuild from time to time? On every boot?
> > 
> >  * finally, if we can't find a 100% robust solution, would it make
> >    sense to add a "Reindex Journal" button somewhere? Where would
> >    you put it?
> 
> There were patches on bugs.sl.o that call fsync on every ds change
> (file-based fsync) to make the journal more robust. I'm personally not
> happy with this method. Maybe most of the journal issues with "lost"
> data could be effectively solved just by keeping the index up to date.
> 
> So, what about using a dirty flag? For example:
> 
> * before any ds change, set the dirty flag (on the fs as well)
> * process several ds requests
> * after some time (a few minutes), remove the dirty flag and sync the
>   entire fs
> * expose a ds sync method so a sync can be triggered directly, e.g. on
>   a Reboot button click

Since the ds already calls xapian's flush, we don't need to sync the
whole fs manually; a file-based fsync of the dirty flag is enough.
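
A rough sketch of the dirty-flag idea in Python, to make the proposal
concrete. Everything named here is an assumption for illustration: the
DirtyFlag class, the "ds_dirty" file name and the flush_index callback
are hypothetical and not part of the current datastore code. The
callback stands in for whatever the ds already does to flush the xapian
index. Fsyncing a directory is assumed to work as it does on Linux.

```python
import os
import tempfile


class DirtyFlag:
    """Marker file that exists while the index may be behind the data.

    Hypothetical helper: the class and the file name are illustrative only.
    """

    def __init__(self, root, name="ds_dirty"):
        self.root = root
        self.path = os.path.join(root, name)

    def _fsync_dir(self):
        # Persist the directory entry so that creating or removing the
        # flag survives a power cut.
        try:
            fd = os.open(self.root, os.O_RDONLY)
        except OSError:
            return  # platform cannot open a directory this way
        try:
            os.fsync(fd)
        except OSError:
            pass  # platform cannot fsync a directory
        finally:
            os.close(fd)

    def set(self):
        """Call before the first ds change of a batch."""
        if os.path.exists(self.path):
            return  # already dirty, nothing to write
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.fsync(fd)  # file-based fsync of the flag only
        finally:
            os.close(fd)
        self._fsync_dir()

    def clear(self, flush_index):
        """Call after a few minutes, or from an explicit ds sync method."""
        flush_index()  # the index must hit the disk before the flag goes
        if os.path.exists(self.path):
            os.remove(self.path)
            self._fsync_dir()

    def needs_reindex(self):
        """At startup: a leftover flag means the last session died dirty."""
        return os.path.exists(self.path)


if __name__ == "__main__":
    flag = DirtyFlag(tempfile.mkdtemp())
    flag.set()
    print("dirty after set:", flag.needs_reindex())
    flag.clear(lambda: None)  # no real index in this demo
    print("dirty after clear:", flag.needs_reindex())
```

On startup the ds would check needs_reindex() and rebuild the index only
when the flag is still there, so a clean shutdown costs nothing extra.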

-- 
Aleksey

