[Sugar-devel] Datastore redesign

Tomeu Vizoso tomeu at sugarlabs.org
Thu Jul 9 07:39:40 EDT 2009


On Mon, Jul 6, 2009 at 16:02, Sascha
Silbe <sascha-ml-ui-sugar-devel at silbe.org> wrote:
> On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:
>
>> Agreed. I have to say that your proposal is excellent, congratulations!
>
> Thanks, I'm flattered. :)
>
>
>>> Is the asynchronous API design useful enough to warrant more complex
>>> implementation?
>>
>> I'm not sure, but I think that whatever decision we take should be
>> made based on actual usage of the DS. What about proposing an example
>> of how an existing activity would be modified to use the new API?
>
> OK, will work on one.
>
>
>>>  - For save() calls the activity needs to wait for the result (containing
>>>    the new version_id) before it can invoke save() again for the same
>>>    object, which can take quite some time if save() is sync - especially
>>>    if other activities are saving at the same time.
>>
>> What about having a separate call that returns synchronously a new
>> tree_id and/or version_id?
>
> Interesting idea, need to think about it. As we're going to use UUIDs, not
> using requested versions shouldn't be an issue (for other version numbering
> schemes, like the one you propose below, "holes" in the numbering could be
> troublesome).

The issue I see with this is the actual creation or update operation
failing, which would make further operations on that uuid invalid. I
guess activities will need to be able to notice that the creation
failed and react accordingly.
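
To make this concrete, the flow I have in mind would look roughly like
this (allocate_ids() and the exact save() signature are hypothetical,
as are the helper callbacks):

    # Hypothetical flow: get IDs synchronously, save asynchronously.
    tree_id, version_id = datastore.allocate_ids()

    def saved_cb():
        pass  # version_id is now backed by a committed version

    def error_cb(err):
        # The save never happened, so version_id is invalid; the
        # activity has to discard it and retry or tell the user.
        log_and_retry(err)  # hypothetical helper

    datastore.save(tree_id, version_id, metadata, file_path,
                   reply_handler=saved_cb, error_handler=error_cb)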

>>> Making the API fully asynchronous is the cause for much of the
>>> complexity of my proposal, but if we eliminate the queueing the
>>> response times for write accesses and checkout() can be very long
>>> even for unrelated operations.
>>
>> Why for unrelated operations?
>
> Because we're serializing VCS operations. They are IO bound (more
> specifically: disk bound) and parallelisation would only lead to IO
> starvation, especially for HDDs.

What's the scenario in which starvation would happen, and what would
the consequences be?

>> # do we want an optimized way to determine (only) the branch HEADs of
>> a given tree_id?
>>
>> This depends on the intended UI. My opinion is that if we branch at
>> every interesting modification (triggered by the activity detecting an
>> interesting change or by the user clicking on the Keep button), we
>> would like to display in the object list all the HEADs of each branch
>> in each tree_id. In that case yes, we need a way to retrieve that list
>> that is fast on both the client and the server side.
>
> My imagined usage of branches was to create them automatically upon altering
> a non-HEAD version.
> A user basing off an old version could mean the newer version is "broken"
> (in that case promoting the new version to the HEAD of the current branch
> makes more sense) or that (s)he uses the older version as a kind of template
> to create derivatives (so creating a branch would make most sense).
> But I'm open to alternative suggestions. We'd most likely need a way to
> explicitly create branches then.

I actually would create a branch at every resume, even when resuming the
current HEAD.

>> # using symlink instead of hardlink for "incoming" queue since we want
>> to support directory trees, not just files
>
>> What justifies this new requirement?
>
> That it's
> a) of use to activities (IIRC some of them use ZIP files right now instead),
> b) easy enough to achieve with the new design and
> c) leads to better delta compression and thus disk space efficiency.

That's fine, but I think we should first address present needs; once we
are done we can try to anticipate new ones. We are already planning to
make very big changes in one single go, and it would be very unfortunate
to have to back those out because of peripheral requirements that were
added just because they seemed easy enough at the time. I tell you this
because of my experience releasing broken datastores. It would be good
if we could learn from the past.

>> # since an index rebuild can take a lot of time we need to provide UI
>> feedback while doing that
>>
>> Any I/O operation can potentially take a lot of time, but with the
>> current version of the DS, rebuilding an index with a few thousand
>> entries is not so slow on the XO. We should never need to rebuild the
>> index, so this new requirement might not be justified (given the
>> current resources, all the other work we need to do, etc.).
>
> OK, good to know index rebuilding is fast. So the simple, boolean API I
> proposed (check_ready() / Ready()) suffices.

Well, that's the index as implemented in the current DS; it may not
hold for future implementations.
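
For the record, the consumer side of that boolean API stays trivial; a
sketch assuming a dbus-python proxy (the UI helpers are hypothetical):

    # Show feedback only while the index is being rebuilt.
    if not datastore.check_ready():
        show_rebuild_feedback()  # hypothetical UI helper
        datastore.connect_to_signal('Ready', hide_feedback_and_refresh)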

>> # detecting identical files across objects isn't as important since
>> duplicates are mostly expected to occur as versions of the same object
>
>> Based on how current activities are using the DS, this isn't the case.
>> The most common case I have heard from the field is children
>> downloading a PDF for reading several times.
>
> Oh, didn't know that, so it's a new requirement.
>
>> An alternative to the current method for detecting duplicates is moving
>> this task to
>> activities, is that what you suggest?
>
> I'm ambivalent about it. On one hand it's not so easy to achieve in the
> datastore (for various backends) and it's more indicative of UI deficiencies
> (why did the children download the file several times in the first place?
> it's a waste of bandwidth as well); on the other hand it might not be easy
> to do in Browse either. But maybe storing the URL as metadata and looking
> for that is enough for most cases? I guess it happens during a single
> session, so the URL (even if including a session ID or whatever) should be
> stable enough?

Maybe, but I would try to avoid making such assumptions. We could store
a hash as metadata (I think we already do) and let the activity query
by it. But then each activity needs to perform this potentially
expensive operation and probably give some kind of feedback to the
user. Currently it's done in the DS as an optimization.

I agree though that the ideal would be that the user doesn't feel the
need to download stuff already in the journal.
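
For example, Browse could do something like this after a download
finishes (the 'checksum' property name and the find() return shape are
assumptions based on the current DS):

    import hashlib

    def already_in_journal(datastore, path):
        # Sketch: hash the file and query the datastore for an entry
        # with the same checksum, to warn the user about duplicates.
        md5 = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                md5.update(chunk)
        results, count = datastore.find({'checksum': md5.hexdigest()})
        return count > 0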

>> About the benefits of differential compression I would like to note
>> that if you analyze a real-world journal, the biggest files are
>> videos, mp3s, pdfs, etc., so files in formats not easily editable with
>> the activities we currently have.
>
> Which is neither an argument for nor against delta compression, as storage
> requirements should be about the same either way.
> OTOH most activities that do support modification currently save in a
> text-based format, so for the large number of versions I expect (remember
> we're autosaving on activity switch) it could be a huge gain (not with git
> though, since AFAICT it stores the entire blob every time, not just the
> differences).

I'm not saying there would be any harm in having differential
compression, just that it's not one of the things that will make a big
difference to users. Same as with storing dirs instead of files, I
think we should spend any extra energy on making the datastore more
robust and polished.

>> With that I don't mean it's not an
>> interesting challenge or something that we won't need in the future,
>> just that it has a relatively low impact as of today.
>
> Which is why the minimal "delta compression" in git should be sufficient for
> now. :)
> What's more of a problem is one of the points mtd raised, though: git
> potentially choking on large files (mmap should be fine OTOH).

Do we already know that we need to use git? If we haven't decided that
yet, then we may better focus now on what we actually need. Then once
we have decided on a backend based on those requirements, we can talk
about possible opportunities brought by the chosen backend.

>> # activities should not submit new entries while the previously
>> submitted one hasn't been fully committed yet
>>
>> Why so?
>
> This is the answer I gave before:
>
> Looks like I need to define should/must/etc. for the final version of the
> document. It's advice, not a requirement. The intention is to avoid
> having an ever-increasing backlog because the activity saves faster than
> the datastore can process.

Hmm, that scares me a bit. Which expensive operations need to happen
during checkin? I think this should always be a fast operation, and
any chance of having a big backlog should be due to abuse or a gross
bug on the activity side. I was expecting that Rainbow would do some
rate-limiting for these cases.

>> # version_id and parent_id
>>
>> Have you thought about version_id being of the form of '2.1.4'?
>
> Yes, that's what I intended originally. But someone (Ben?) made a good
> argument for random IDs in one of the recent threads. Besides, the current
> prototype already uses them. :)
> Using random IDs and storing the relationship in metadata is easier to
> implement than constructing and parsing structured IDs. It's not clear the
> latter would buy us anything real.

OK, but isn't having a reference to the parent_id in each version more
fragile? What happens if I delete a version? Are the children changed
to point to their grandparent?
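
To make the concern concrete, deleting a single version would
presumably need a fix-up pass on the datastore side, roughly like this
(all helper names are made up):

    def delete_version(tree_id, version_id):
        # Re-point every child of the deleted version to its
        # grandparent so the version graph stays connected.
        parent = get_metadata(tree_id, version_id)['parent_id']
        for child_id in find_children(tree_id, version_id):
            change_metadata(tree_id, child_id, {'parent_id': parent})
        remove_version_data(tree_id, version_id)  # made-up helper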

>> That would make parent_id unneeded because we could refer to the parent
>> as (tree_id, 2.1.3). And it would also allow us to identify the HEAD of
>> each branch.
>
> But only inside the datastore - API consumers shouldn't make assumptions
> about the version format.

Would they need to know about versions at all?

>> # creator
>>
>> What is it for?
>
> E.g. to determine the default activity for resuming. The current name of
> this property is 'activity'.

Then the example may be wrong: "org.laptop.WebActivity for downloaded files"?

>> # activity saves data to a disk, ensuring it has been committed (sync)
>> and proper access rights for data store
>>
>> By sync you mean written to disk? Why do activities need to worry about this?
>
> Because activities know best what exactly needs to be synced. We should be
> able to remove this requirement in exchange for reduced datastore
> performance (esp. for directory objects).
> I'm not perfectly sure fdatasync() done in datastore will cause data written
> by the activity to be written to disk (though I read the POSIX definition of
> fdatasync() that way) but there are ways to find that out. :)

Say I'm the author of Paint: what would I need to sync (and what
exactly do you mean by that)?
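
Just so we're talking about the same thing, here is what I understand
"syncing" to mean for an activity (a sketch; whether fsync() or
fdatasync() suffices is exactly the open question):

    import os

    def write_canvas(path, data):
        # Write the canvas and make sure it reaches the disk before
        # handing the path over to the datastore.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, data)
            os.fsync(fd)  # or os.fdatasync(fd) on Linux
        finally:
            os.close(fd)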

>> #    Changes the (unversioned/version-specific) metadata of the given
>> object to match metadata. Fully synchronous, no return value.
>>
>> How do we know which properties are version-specific and which aren't?
>
> By treating them accordingly. :)
> The datastore is agnostic to this property (of metadata entries). Metadata
> is bound to each version but modifiable. For "versioned" metadata the API
> consumer is supposed to call save(); for "unversioned/version-specific"
> metadata it should call change_metadata() instead.

That sounds good to me.
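
In other words, something like this (the argument lists are my guess
from the proposal):

    # Versioned change: creates a new version of the object.
    datastore.save(tree_id, parent_id, metadata, file_path)

    # Version-specific change: fixes up the metadata of an existing
    # version in place, without creating a new version.
    datastore.change_metadata(tree_id, version_id, {'title': 'Trip notes'})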

> If we decide to make some metadata global (i.e. common to all versions) I'd
> just hardcode those few names.

Can you think of a use case for it?

> [delete(tree_id)]
>>
>> #     Remove (all versions of) given object from data store. Fully
>> synchronous. Doesn't return anything, emits signal Deleted(tree_id).
>>
>> Do we have any operation in the UI that matches this?
>
> Sure. It's exactly the same as delete(uid) in the current API, used by the
> Journal.
> You might convince me to add a variant to remove single versions, but keep
> in mind that deleting a single version from a VCS repository can be quite
> tough.
> A variant to remove branches might be easier to implement, but we should
> decide how to use branches before thinking about how useful that would be.

What we expose in the UI is the possibility to delete one of the
entries the user is seeing. If we are going to represent several
versions of the same tree_id in the list view, then the user needs to
be able to delete only one of those entries, not all the entries with
the same tree_id.

>> # Get/Got
>> Maybe we should make it a bit more verbose? Like GetData?
>
> Makes sense, as we're only returning data anyway, not metadata, so it's
> not the exact opposite of save(). Changed, thanks for the suggestion.
>
>
>> # Prefixing a key name with '!' inverts the sense of matching
>>
>> Is this used by the UI?
>
> Currently not, but it's easy to implement (on the datastore side) and
> AFAIR it was talked about in one of the recent threads.
>
>
>> # prefixing it with '*' enables regular expression search
>>
>> Is this used by the UI? I think it's good to think now about how
>> possibly interesting new features would be added in the future, but
>> based on past experience I think it would be better to only implement
>> what we need right now.
>
> This is one feature I'm easily convinced to throw out. :)
> As I included the textsearch() API call now (since the current Journal
> needs it) we can rely on that instead. Marked as OPTIONAL for now. It will
> only be implemented if it's just a few SLoCs (as I expect it to be).
>
>
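For the archives, a find() query using both conventions might look like
this (the property names are just examples):

    # '!' inverts a match, '*' switches to regex matching (OPTIONAL).
    results, count = datastore.find({
        'activity': 'org.laptop.WebActivity',  # plain exact match
        '!mime_type': 'text/html',             # everything except HTML
        '*title': '^Chapter [0-9]+',           # regex matching
    })
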
>> # Arbitrary key names are allowed, but speed may vary (i.e. not
>> everything is indexed).
>>
>> Same here, I would return an exception for a non-indexed field before
>> implementing searches for arbitrary properties.
>
> I think that's crippling potential Journal development / alternatives too
> much. See Library for example: it uses arbitrary metadata and currently has
> to read the whole datastore contents.

The point I'm trying to get across (and perhaps not doing a good job
of it) is that Journal development has been crippled by datastores
that tried to do too much and failed at the basics. If the basics seem
trivial to you, it may be because you haven't had to deal with issues
after deployment.

Sorry if I sound irritated, but I have been promised 4 datastore
implementations already by very smart people, and in the end I have
had to maintain the initial prototype with great pain for one year and
then do a totally unexciting rewrite that at least: exists, doesn't
lose data, and has acceptable performance on the XO. I'm just trying
to avoid falling into the same trap for the 5th time...

It would be wonderful if we had a DS that is everything we would like,
but we should get our priorities straight and think of ways to give
developers of alternatives freedom without putting our users at risk.

Something we could explore in this sense is adding a direct, read-only
access mode for privileged activities such as Journal and Library. Let
them directly read the DS storage backend and build their own search
indexes.

That way people can get fancy about displaying the data, while we keep
a DS that is relatively simple, robust and fast. Having direct access
to the index and data without going through D-Bus would mean better
performance in the Journal, and eliminating one cache layer would also
save memory.
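
Roughly something like this on the Journal side (the on-disk layout and
paths are completely made up, just to illustrate the idea):

    import json
    import os

    DS_ROOT = os.path.expanduser('~/.sugar/default/datastore')

    def scan_entries():
        # Read metadata straight from the backend, bypassing D-Bus,
        # so the Journal can feed it into its own search index.
        for entry_id in os.listdir(DS_ROOT):
            meta_path = os.path.join(DS_ROOT, entry_id, 'metadata.json')
            if os.path.exists(meta_path):
                with open(meta_path) as f:
                    yield entry_id, json.load(f)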

>> #     if True returns all matching versions of an object instead of
>> only the latest one
>>
>> Where in the UI would we list only the latest versions of several tree_ids?
>
> In the current Journal list view and in the object picker (which I don't
> particularly like but that's a topic on its own). There's a case to be made
> to return the HEADs of all branches instead, see also the corresponding TODO
> entry.

Yeah, I think we'll be making more versions than we want to expose in
the list view.

>> # textsearch(querystring, options)
>>
>> What if the user has a date filter and enters a fulltext query? I
>> don't see how this would be implemented with the proposed
>> find/textsearch split.
>
> That's a tough example. All other filters are easily replaced by prefix
> terms, but date is a range so it needs to be a value inside Xapian, not a
> term.
> How about just adding "query" from find() to it? Then most activities could
> rely on the stable interface of find() and the few advanced consumers (like
> Journal) would need to be adapted to a new IR search API anyway in order to
> provide better user experience (spelling corrections, tag suggestions, ...).

Sorry, I fail to see what splitting find() into two functions would
bring us. And I think the Journal should be the main driver of the
find() function; most other activities use the object chooser.

>> # Stopped()
>> What is this for?
>
> Tell me. :-P
> Maybe to delay shutdown until datastore has finished writing?

Oh, I think this was related to backups; we can drop it now because
the index doesn't need to be backed up any more.

>> #      The internal data structures of datastore or one of its
>> backends are corrupted. Should only happen in case of hardware defects
>> or OS bugs.
>>
>> Is power failure considered a hw defect here, or does the proposed design
>> protect against it?
>
> The latter option. Actually there's another way to corrupt the data
> structures, namely improper tuning of filesystem / fs options (e.g.
> data=writeback on ext3 or using VFAT), but it could be argued that it's just
> an OS bug since the API contract is broken then. ;)

Regards,

Tomeu

> CU Sascha
>
> --
> http://sascha.silbe.org/
> http://www.infra-silbe.de/

