[Sugar-devel] Datastore redesign

Tomeu Vizoso tomeu at sugarlabs.org
Mon Jul 6 06:29:53 EDT 2009


On Sun, Jul 5, 2009 at 22:32, Sascha Silbe
<sascha-ml-ui-sugar-devel at silbe.org> wrote:
> Hi!
>
> After reading the recent threads about Journal / data store and the IDs used
> by them the big picture is much more clear now, thanks!
> I've adjusted my datastore redesign proposal [1] accordingly and would like
> to submit it for review now.
> It's not clear a full redesign is The Right Thing to do now, but I'd rather
> like to specify what it should look like, not what steps to take next
> towards it.

Agreed. I have to say that your proposal is excellent, congratulations!

> There's an important design decision that's still open:
>
> Is the asynchronous API design useful enough to warrant more complex
> implementation?

I'm not sure, but I think that whatever decision we take should be
made based on actual usage of the DS. What about proposing an example
of how an existing activity would be modified to use the new API?

>  - DBus operations can be run asynchronously so UI responsiveness shouldn't
>    be an issue
>  - For save() calls activity needs to wait for result (containing new
>    version_id) before it can invoke save() again for the same object
>    which can take quite some time if save() is sync - especially if other
>    activities are saving at the same time.

What about having a separate call that synchronously returns a new
tree_id and/or version_id?
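
A minimal sketch of that idea, assuming a hypothetical SketchDataStore
class with reserve_ids() and save_async() methods (neither name comes
from the proposal or from the current DS). The ids are handed out
immediately, so an activity can queue a second save for the same
object without waiting for the first write to finish:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class SketchDataStore:
    """Hypothetical stand-in for the data store; not the real Sugar API."""

    def __init__(self):
        # A single worker keeps the queued writes in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.committed = {}

    def reserve_ids(self, tree_id=None):
        """Return (tree_id, version_id) right away, without any disk I/O."""
        if tree_id is None:
            tree_id = str(uuid.uuid4())
        return tree_id, str(uuid.uuid4())

    def save_async(self, tree_id, version_id, data):
        """Queue the slow write; the caller already knows both ids."""
        return self._pool.submit(self._commit, tree_id, version_id, data)

    def _commit(self, tree_id, version_id, data):
        self.committed[(tree_id, version_id)] = data
        return tree_id, version_id


ds = SketchDataStore()
tree_id, v1 = ds.reserve_ids()
first = ds.save_async(tree_id, v1, b"first draft")
_, v2 = ds.reserve_ids(tree_id)  # no need to wait for the first save
second = ds.save_async(tree_id, v2, b"second draft")
first.result()
second.result()
```

Whether random ids are acceptable here is an open question; the sketch
only shows that id allocation and the write itself can be decoupled.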

> Making the API fully asynchronous is the cause for much of the complexity of
> my proposal, but if we eliminate the queueing the response times for write
> accesses and checkout() can be very long even for unrelated operations.

Why for unrelated operations?

> [1]
> http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/blobs/master/datastore-redesign.html

Some comments and questions:

# do we want an optimized way to determine (only) the branch HEADs of
a given tree_id?

This depends on the intended UI. My opinion is that if we branch at
every interesting modification (triggered by the activity detecting an
interesting change or by the user clicking on the Keep button), we
would like to display in the object list all the HEADs of each branch
in each tree_id. In that case yes, we need a way to retrieve that list
that is fast on both the client and the server side.
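
As an illustration of what "the HEADs of each branch" means when each
version records its parent_id, here is a small sketch. The dictionary
layout and the branch_heads() name are assumptions made for the
example; a HEAD is simply a version that no other version names as its
parent:

```python
def branch_heads(versions):
    """Return the version_ids that are not the parent of any other version.

    versions maps version_id -> parent_id (None for the first version).
    """
    parents = {parent for parent in versions.values() if parent is not None}
    return sorted(v for v in versions if v not in parents)


# "a" is the root, "b" and "d" both derive from "a", "c" derives from "b".
history = {"a": None, "b": "a", "c": "b", "d": "a"}
heads = branch_heads(history)
print(heads)  # ['c', 'd']
```

This scans every version of the tree_id on each call. A server that
has to answer this for every entry in the object list would more
likely keep the set of HEADs up to date on each save instead.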

# using symlink instead of hardlink for "incoming" queue since we want
to support directory trees, not just files

What justifies this new requirement? I agree it would be nice to have,
but I would prefer to start with the basics then grow step by step. We
have had (several) cases before of overpromising datastores that
failed to provide the bare minimum.

# since an index rebuild can take a lot of time we need to provide UI
feedback while doing that

Any I/O operation can potentially take a lot of time, but with the
current version of the DS, rebuilding an index with a few thousand
entries is not that slow on the XO. We should never need to rebuild the
index, so this new requirement might not be justified (given the
current resources, all the other work we need to do, etc).

# detecting identical files across objects isn't as important since
duplicates are mostly expected to occur as versions of the same object

Based on how current activities are using the DS, that is not the
case. The most common case I have heard about from the field is
children downloading the same PDF several times to read it. An
alternative to the current method for detecting duplicates is moving
this task to activities; is that what you suggest?
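
For reference, duplicates across unrelated objects (the same PDF
downloaded several times) can be found by grouping files on a digest
of their content. This is a generic sketch, not a description of how
the current DS does it; content_digest() and find_duplicates() are
names made up for the example:

```python
import hashlib
from collections import defaultdict


def content_digest(path, chunk_size=65536):
    """Hash a file in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[content_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Each group returned could then share a single copy on disk, for
example through hard links.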

About the benefits of differential compression, I would like to note
that if you analyze a real-world journal, the biggest files are
videos, MP3s, PDFs, etc., that is, files in formats not easily
editable with the activities we currently have. By that I don't mean
it is not an interesting challenge or something that we won't need in
the future, just that it has a relatively low impact as of today.

# activities should not submit new entries while the previously
submitted one hasn't been fully committed yet

Why so?

# version_id and parent_id

Have you thought about version_id being of the form '2.1.4'? That
would make parent_id unneeded because we could refer to the parent as
(tree_id, 2.1.3). It would also allow us to identify the HEAD of each
branch.
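
A rough sketch of how dotted version ids could be used this way. The
helper names are hypothetical, and two things are assumed rather than
taken from the thread: all components but the last identify the
branch, and the last component counts versions on that branch. How
the first version of a branch finds its parent is not specified here,
so parent_of() returns None in that case:

```python
def parse(version_id):
    """'2.1.4' -> (2, 1, 4)"""
    return tuple(int(part) for part in version_id.split("."))


def parent_of(version_id):
    """Previous version on the same branch, or None if it is the first.

    The branch-point rule for the first version of a branch is left
    open; this sketch does not guess it.
    """
    parts = parse(version_id)
    if parts[-1] <= 1:
        return None
    return ".".join(str(p) for p in parts[:-1] + (parts[-1] - 1,))


def branch_heads(version_ids):
    """Highest version on each branch (branch = all but last component)."""
    heads = {}
    for version_id in version_ids:
        parts = parse(version_id)
        branch = parts[:-1]
        if branch not in heads or parts > heads[branch]:
            heads[branch] = parts
    return sorted(".".join(str(p) for p in parts) for parts in heads.values())


print(parent_of("2.1.4"))  # 2.1.3
print(branch_heads(["1", "2", "2.1.1", "2.1.2", "2.1.3", "2.1.4"]))
# ['2', '2.1.4']
```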

# creator

What is it for?

# activity saves data to a disk, ensuring it has been committed (sync)
and proper access rights for data store

By sync, do you mean written to disk? Why do activities need to worry
about this?

#    Changes the (unversioned/version-specific) metadata of the given
object to match metadata. Fully synchronous, no return value.

How do we know which properties are version-specific and which aren't?

#     Remove (all versions of) given object from data store. Fully
synchronous. Doesn't return anything, emits signal Deleted(tree_id).

Do we have any operation in the UI that matches this?

# Get/Got

Maybe we should make it a bit more verbose? Like GetData?

# Prefixing a key name with '!' inverts the sense of matching

Is this used by the UI?

# prefixing it with '*' enables regular expression search

Is this used by the UI? I think it's good to think now how possibly
interesting new features would be added in the future, but based on
past experiences I think it would be better to only implement what we
need right now.
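
To make the two prefixes concrete, here is a sketch of the matching
semantics as I read them from the quoted text. It assumes both
prefixes go on the key name, that '!' comes before '*' when both are
used, and that metadata values are strings; none of that is confirmed
by the proposal, and matches() is a made-up name:

```python
import re


def matches(metadata, query):
    """Return True if metadata satisfies every condition in query."""
    for raw_key, wanted in query.items():
        negate = raw_key.startswith("!")
        key = raw_key[1:] if negate else raw_key
        use_regex = key.startswith("*")
        key = key[1:] if use_regex else key

        value = metadata.get(key, "")
        if use_regex:
            hit = re.search(wanted, value) is not None
        else:
            hit = value == wanted

        # A plain condition fails on a miss, a negated one fails on a hit.
        if hit == negate:
            return False
    return True


entry = {"title": "Book report", "mime_type": "application/pdf"}
print(matches(entry, {"!mime_type": "text/plain"}))  # True
print(matches(entry, {"*title": "^Book"}))           # True
print(matches(entry, {"!*title": "^Book"}))          # False
```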

# Arbitrary key names are allowed, but speed may vary (i.e. not
everything is indexed).

Same here, I would raise an exception for a non-indexed field before
implementing searches on arbitrary properties.
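
A sketch of that stricter behaviour, with a made-up INDEXED_KEYS set,
NotIndexedError class and find() signature (the real list of indexed
properties and the real API are not given in this thread):

```python
# Hypothetical set of indexed properties, for illustration only.
INDEXED_KEYS = frozenset({"title", "mime_type", "timestamp", "activity"})


class NotIndexedError(KeyError):
    """Raised when a query uses a property that is not indexed."""


def find(entries, query):
    """Exact-match search that refuses to scan non-indexed properties."""
    unknown = sorted(set(query) - INDEXED_KEYS)
    if unknown:
        raise NotIndexedError("not indexed: " + ", ".join(unknown))
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in query.items())
    ]


entries = [
    {"title": "Notes", "mime_type": "text/plain", "mood": "happy"},
    {"title": "Manual", "mime_type": "application/pdf"},
]
print(find(entries, {"mime_type": "text/plain"}))
# find(entries, {"mood": "happy"}) raises NotIndexedError
```

Failing loudly keeps the cost of every accepted query predictable, and
support for arbitrary keys can still be added later without changing
the behaviour of queries that already work.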

#     if True returns all matching versions of an object instead of
only the latest one

Where in the UI would we list only the latest versions of several
tree_ids?

# textsearch(querystring, options)

What if the user has a date filter and enters a fulltext query? I
don't see how this would be implemented with the proposed
find/textsearch split.
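
With two separate calls, the only option I can see is for the client
to run both and intersect the results itself, roughly as sketched
below. The result format (dicts with tree_id and version_id) and the
combined_search() helper are assumptions made for the example:

```python
def combined_search(find_results, textsearch_results):
    """Keep the text search order, drop hits that fail the property filter."""
    allowed = {(e["tree_id"], e["version_id"]) for e in find_results}
    return [
        e for e in textsearch_results
        if (e["tree_id"], e["version_id"]) in allowed
    ]


# Pretend find() applied the date filter and textsearch() the query string.
by_date = [
    {"tree_id": "t1", "version_id": "1"},
    {"tree_id": "t2", "version_id": "1"},
]
by_text = [
    {"tree_id": "t3", "version_id": "1"},
    {"tree_id": "t2", "version_id": "1"},
]
both = combined_search(by_date, by_text)
print(both)  # [{'tree_id': 't2', 'version_id': '1'}]
```

This means transferring both full result sets over DBus before
filtering, which is the kind of cost a single combined query would
avoid.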

# Stopped()

What is this for?

#      The internal data structures of datastore or one of its
backends are corrupted. Should only happen in case of hardware defects
or OS bugs.

Is a power failure considered a hardware defect here, or does the
proposed design protect against that?

Thanks,

Tomeu

> CU Sascha
>
> --
> http://sascha.silbe.org/
> http://www.infra-silbe.de/

