[Sugar-devel] Datastore index corruption

Tomeu Vizoso tomeu at sugarlabs.org
Mon Jun 21 03:33:20 EDT 2010


On Sun, Jun 20, 2010 at 03:10, Aleksey Lim <alsroot at member.fsf.org> wrote:
> On Sat, Jun 19, 2010 at 08:33:50PM -0400, Bernie Innocenti wrote:
>> We've found an XO-1 running Fedora 11 + Sugar 0.84 with an interesting
>> datastore corruption issue.
>>
>> The journal was showing just one object, but the
>> ~/.sugar/default/datastore directory contained 4-5 invisible entries.
>> After removing index_updated and restarting Sugar, the entries
>> reappeared.
>>
>> There was no time to analyze the problem in detail, but I have a strong
>> feeling that this isn't an isolated case. It would explain other reports
>> of files disappearing and mysteriously reduced disk space.
>>
>> Some reflections:
>>
>>  * the corruption could be caused by flash problems. I have found
>>    laptops in the field that wouldn't boot because /sbin/lvm was
>>    corrupted
>>
>>  * we can't exclude jffs2 problems either: when it's almost full, it does
>>    slow garbage collection passes on boot which kids interrupt by
>>    power cycling. I wonder how robust jffs2 is in this case.
>>
>>  * there might be a bug in xapian. If so, we'll see this issue also
>>    on the XO-1.5
>>
>>  * I'm skeptical it's a new issue in 0.84 or F-11: the older builds
>>    had so many data loss issues that a subtler problem like this
>>    could have easily gone unnoticed.
>>
>>  * can the datastore detect index corruption in the most obvious cases?
>>    If so, what would it do?
>>
>>  * how long does it take to rebuild the index on a busy journal?
>>    Can we afford to rebuild from time to time? On every boot?
>>
>>  * finally, if we can't find a 100% robust solution, would it make
>>    sense to add a "Reindex Journal" button somewhere? Where would
>>    you put it?
>
> There were patches on bugs.sl.o that call fsync on every ds change
> (file-based fsync) to make the journal more robust. I'm personally not
> happy with this method. Maybe most of the journal issues with "lost" data
> could be effectively solved just by keeping the index up to date.
>
> So, what about using a dirty flag? For example:
>
> * before any ds change, set a dirty flag (written to the fs as well)
> * process several ds requests
> * after some time (a few minutes), remove the dirty flag and sync the
>   entire fs
> * expose a ds sync method so a sync can be triggered directly, e.g. on a
>   Reboot button click
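
The dirty-flag steps quoted above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from
sugar-datastore: the class DirtyFlag, the function needs_reindex and the
file name "index_dirty" are all hypothetical, and os.sync() stands in for
"sync the entire fs".

```python
import os
import tempfile


class DirtyFlag:
    """Marker file that exists while the store may be ahead of the index."""

    def __init__(self, root, name="index_dirty"):  # file name is hypothetical
        self.root = root
        self.path = os.path.join(root, name)

    def _fsync_root(self):
        # fsync the directory so the flag's creation/removal reaches the disk
        fd = os.open(self.root, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    def set(self):
        # Before any ds change: create the flag and sync it
        fd = os.open(self.path, os.O_CREAT | os.O_WRONLY, 0o644)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        self._fsync_root()

    def clear(self):
        # After a batch of requests: sync the whole fs, then drop the flag
        os.sync()
        if os.path.exists(self.path):
            os.remove(self.path)
        self._fsync_root()

    def is_set(self):
        return os.path.exists(self.path)


def needs_reindex(root):
    """At startup, a leftover flag means the last session never synced."""
    return DirtyFlag(root).is_set()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    flag = DirtyFlag(root)
    flag.set()      # ... process several ds requests here ...
    flag.clear()    # periodic sync, or the exposed "sync" method
    print(needs_reindex(root))
```

Note that this only tells us, at the next start, that a rebuild might be
needed; it does not by itself say how the index got out of date.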

Please see how flush is actually implemented:

http://git.sugarlabs.org/projects/sugar-datastore/repos/mainline/blobs/master/src/carquinyol/indexstore.py#line313

Xapian's flush actually flushes to the filesystem:

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#d0077acafa9485c97b73b8726c375732
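
For anyone following along, the distinction that matters in this
discussion is between flushing a userspace buffer and asking the kernel
to write the data to the device. A minimal sketch of that distinction
using only the Python standard library; it is not taken from
indexstore.py or from Xapian, and durable_write is a hypothetical helper
name:

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the kernel to push them to storage."""
    with open(path, "wb") as f:
        f.write(data)         # data sits in Python's userspace buffer
        f.flush()             # buffer is handed to the kernel page cache
        os.fsync(f.fileno())  # kernel is asked to write it to the device
    return os.path.getsize(path)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "entry.metadata")
    print(durable_write(target, b"title=example\n"))
```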

Btw, am I the only one who feels that this thread has a bit too much
hand-waving? The software in use is FOSS so the sources are readily
available (we even have some docs), and the issue is grave enough to
deserve serious attention and analysis.

We should try to avoid any easy "solutions" that give us a false
sense of security because getting real data about the net effects of
patches will be hard.

Regards,

Tomeu
