[Sugar-devel] Datastore index corruption

Tomeu Vizoso tomeu at sugarlabs.org
Mon Jun 21 03:26:36 EDT 2010


On Sun, Jun 20, 2010 at 02:33, Bernie Innocenti <bernie at codewiz.org> wrote:
> We've found an XO-1 running Fedora 11 + Sugar 0.84 with an interesting
> datastore corruption issue.
>
> The journal was showing just one object, but the
> ~/.sugar/default/datastore directory contained 4-5 invisible entries.
> After removing index_updated and restarting Sugar, the entries
> reappeared.
>
> There was no time to analyze the problem in detail, but I have a strong
> feeling that this isn't an isolated case. It would explain other reports
> of files disappearing and mysteriously reduced disk space.

All instances of mysteriously reduced disk space were due to leaking
temp files, leaking hard links or activities storing too much stuff in
their data dirs inside ~/.sugar/default. Has anybody seen a case in
which the Journal was empty but the DS dir was taking an anomalous
amount of space?

> Some reflections:
>
>  * the corruption could be caused by flash problems. I have found
>   laptops in the field that wouldn't boot because /sbin/lvm was
>   corrupted
>
>  * we can't exclude jffs2 problems too: when it's almost full, it does
>   slow garbage collection passes on boot which kids interrupt by
>   power cycling. I wonder how robust jffs2 is in this case.
>
>  * there might be a bug in xapian. If so, we'll see this issue also
>   on the XO-1.5
>
>  * I'm skeptical it's a new issue in 0.84 or F-11: the older builds
>   had so many data loss issues that a subtler problem like this
>   could have easily gone unnoticed.

Note that the DS in 0.84 is completely different from the one in 0.82.

>  * can the datastore detect index corruption in the most obvious cases?
>   If so, what would it do?

If it cannot open the index:
http://git.sugarlabs.org/projects/sugar-datastore/repos/mainline/blobs/master/src/carquinyol/datastore.py#line73

If it cannot perform a query:
http://git.sugarlabs.org/projects/sugar-datastore/repos/mainline/blobs/master/src/carquinyol/datastore.py#line223
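
As a rough illustration of that kind of fallback (this is not the
carquinyol code; the JSON index file and the names open_index and
rebuild_index are made up for the example), the pattern is to try to
use the index and, if that fails, throw it away and rebuild it from
the entries on disk:

```python
import json
import os


def rebuild_index(entries_dir, index_path):
    """Recreate the (hypothetical) index from the entry dirs on disk."""
    index = {"uids": sorted(os.listdir(entries_dir))}
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(index, f)
    # Replace the old index in one step so readers never see a partial file.
    os.replace(tmp_path, index_path)
    return index


def open_index(entries_dir, index_path):
    """Load the index; rebuild it if it is missing or unreadable."""
    try:
        with open(index_path) as f:
            index = json.load(f)
        if not isinstance(index.get("uids"), list):
            raise ValueError("unexpected index layout")
        return index
    except (OSError, ValueError, AttributeError):
        # Missing file, invalid JSON or wrong structure: start over.
        return rebuild_index(entries_dir, index_path)
```

This only catches the obvious cases, such as an index that cannot be
read at all. An index that opens fine but is missing entries would go
unnoticed.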

>  * how long does it take to rebuild the index on a busy journal?
>   Can we afford to rebuild from time to time? On every boot?

Given the I/O and CPU on both XOs, I don't think we can afford much. We
should be systematic and work out what is actually going on.

>  * finally, if we can't find a 100% robust solution, would it make
>   sense to add a "Reindex Journal" button somewhere? Where would
>   you put it?

With atomic dir moving and xapian transactions, we should be able to
keep the DS consistent at all times. That's not "100% robustness" but
it should make a "Reindex Journal" button unnecessary.
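
To make the "atomic dir moving" part concrete, here is a minimal
sketch, assuming a layout with "staging" and "entries" directories on
the same filesystem. The layout and the store_entry name are
hypothetical, not what the DS actually uses. The entry is written
completely in a staging directory and only then renamed into place, so
a power cut leaves either the whole entry or nothing:

```python
import os
import tempfile


def store_entry(root, uid, metadata, data):
    """Write an entry into a staging dir, then rename it into place."""
    entries = os.path.join(root, "entries")
    staging_root = os.path.join(root, "staging")
    os.makedirs(entries, exist_ok=True)
    os.makedirs(staging_root, exist_ok=True)

    staging = tempfile.mkdtemp(dir=staging_root)
    for name, content in (("metadata", metadata), ("data", data)):
        with open(os.path.join(staging, name), "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename

    final = os.path.join(entries, uid)
    # rename() is atomic on POSIX when source and target share a filesystem.
    os.rename(staging, final)
    return final
```

Anything left behind in "staging" after a crash is an incomplete write
and can be deleted at startup. The index update is a separate step and
is not shown here; that is where the xapian transaction comes in.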

Please consider reading the DS code: it's really simple, less than
1600 lines in total.

The file structure is explained here:
http://wiki.sugarlabs.org/go/Development_Team/Datastore_Rewrite

Regards,

Tomeu

> --
>   // Bernie Innocenti - http://codewiz.org/
>  \X/  Sugar Labs       - http://sugarlabs.org/
>
> _______________________________________________
> Sugar-devel mailing list
> Sugar-devel at lists.sugarlabs.org
> http://lists.sugarlabs.org/listinfo/sugar-devel
>