[Sugar-devel] Datastore redesign
Tomeu Vizoso
tomeu at sugarlabs.org
Thu Jul 9 08:00:59 EDT 2009
On Mon, Jul 6, 2009 at 16:33, Eben Eliason<eben at laptop.org> wrote:
> On Mon, Jul 6, 2009 at 10:02 AM, Sascha
> Silbe<sascha-ml-ui-sugar-devel at silbe.org> wrote:
>> On Mon, Jul 06, 2009 at 12:29:53PM +0200, Tomeu Vizoso wrote:
>>
>>> Agreed. I have to say that your proposal is excellent, congratulations!
>>
>> Thanks, I'm flattered. :)
>>
>>
>>>> Is the asynchronous API design useful enough to warrant more complex
>>>> implementation?
>>>
>>> I'm not sure, but I think that whatever decision we take should be
>>> made based on actual usage of the DS. What about proposing an example
>>> of how an existing activity would be modified to use the new API?
>>
>> OK, will work on one.
>>
>>
>>>> - For save() calls activity needs to wait for result (containing new
>>>> version_id) before it can invoke save() again for the same object
>>>> which can take quite some time if save() is sync - especially if other
>>>> activities are saving at the same time.
>>>
>>> What about having a separate call that returns synchronously a new
>>> tree_id and/or version_id?
>>
>> Interesting idea, need to think about it. As we're going to use UUIDs not
>> using requested versions shouldn't be an issue (for other version number
>> schemes like the one you propose below "holes" in the numbering could be
>> troublesome).
>
> I think "holes" can be expected, so we should definitely be prepared for them.
>
>>
>>>> Making the API fully asynchronous is the cause for much of the complexity
>>>> of
>>>> my proposal, but if we eliminate the queueing the response times for
>>>> write
>>>> accesses and checkout() can be very long even for unrelated operations.
>>>
>>> Why for unrelated operations?
>>
>> Because we're serializing VCS operations. They are IO bound (more
>> specifically: disk bound) and parallelisation would only lead to IO
>> starvation, especially for HDDs.
>>
>>
>>> # do we want an optimized way to determine (only) the branch HEADs of
>>> a given tree_id?
>>>
>>> This depends on the intended UI. My opinion is that if we branch at
>>> every interesting modification (triggered by the activity detecting an
>>> interesting change or by the user clicking on the Keep button), we
>
> I don't think we need to branch in these instances. These events
> should create new versions, but not necessarily new branches.
In my proposal, branch is not a concept directly exposed to the user.
Is just an artefact that allows the journal to display the relevant
info to an user. If we branch at the following events, displaying only
HEADs of branches in the journal list view makes sense for the user:
- new tree_id,
- resume entry,
- keep button,
- after a big change in the activity model (user deletes the whole
drawing, etc).
There are other ways to manage what we want, but this approach made it
very easy to implement.
Just to make sure I'm understood, I see why using branches this way
may seem conceptually wrong, it's not how we would work in a VCS or
CMS. But by creating new branches at those points and displaying only
HEADs of branches in the list view is the simplest way I found of
displaying the relevant entries in a robust way (resisting activity
crashes).
>>> would like to display in the object list all the HEADs of each branch
>>> in each tree_id. In that case yes, we need a way to retrieve that list
>>> that is fast on both the client and the server side.
>>
>> My imagined usage of branches was to create them automatically upon altering
>> a non-HEAD version.
>
> This makes sense to me, personally.
>
>> A user basing off an old version could mean the newer version is "broken"
>> (in that case promoting the new version to the HEAD of the current branch
>> makes more sense) or that (s)he uses the older version as a kind of template
>> to create derivates (so creating a branch would make most sense).
>> But I'm open to alternative suggestions. We'd most likely need a way to
>> explicitly create branches then.
>>
>>
>>> # using symlink instead of hardlink for "incoming" queue since we want
>>> to support directory trees, not just files
>>
>>> What justifies this new requirement?
>>
>> That it's
>> a) of use to activities (IIRC some of them use ZIP files right instead now),
>
> This has its own merits, though. Encapsulating the related files as a
> single archive makes transporting the file around, or sending it to a
> friend, trivial. It's good for the same reason that activity bundles
> are good.
But this may be considered an internal implementation detail of the
DS, unless we want to support users directly browsing the DS backend.
>> b) easy enough to achieve with the new design and
>> c) leads to better delta compression and thus disk space effiency.
>>
>>
>>> # since an index rebuild can take a lot of time we need to provide UI
>>> feedback while doing that
>>>
>>> Any I/O operation can potentially take a lot of time, but with the
>>> current version of the DS rebuilding an index with a few thousands of
>>> entries is not so slow on the XO. We should never need to rebuild the
>>> index, so this new requirement might not be justified (given the
>>> current resources, all the other work we need to do, etc).
>>
>> OK, good to know index rebuilding is fast. So the simple, boolean API I
>> proposed (check_ready() / Ready()) suffices.
>>
>>
>>> # detecting identical files across objects isn't as important since
>>> duplicates are mostly expected to occur as versions of the same object
>>
>>> Based on how current activities are using the DS, this isn't like
>>> that.
>>> The most common case I have heard from the field are children
>>> downloading a PDF for reading several times.
>>
>> Oh, didn't know that, so it's a new requirement.
>>
>>> An alternative to the current method for detecting duplicates is moving
>>> this task to
>>> activities, is that what you suggest?
>>
>> I'm ambivalent about it. On one hand it's not so easy to achieve in
>> datastore (for various backends) and more indicative of UI deficiencies (why
>
> This might be the case, indeed. On other operating systems/browsers,
> this (downloading multiple copies if the link is clicked multiple
> times) is expected behavior. Perhaps we can work out some ways to make
> the UI clearer.
That would be great.
>> did the children download the file several times in the first place; it's
>> bandwidth wastage as well), on the other hand it might not be easy to do in
>> Browse, too. But maybe storing the URL as metadata and looking for that is
>> enough for most cases? I guess it happens during a single session so the URL
>> (even if including a session ID or whatever) should be stable enough?
>>
>>
>>> About the benefits of differential compression I would like to note
>>> that if you analize a real world journal, the biggest files are
>>> videos, mp3, pdfs, etc., so files in formats not easily editable with
>>> the activities we currently have.
>>
>> Which is neither an argument pro nor contra delta compression as storage
>> requirements should be about the same either way.
>> OTOH most activities that do support modification currently save in a text
>> based format, so for the large number of versions I expect (remember we're
>> autosaving on activity switch) it could be a huge gain (not with git though
>> since AFAICT it stores the entire blob every time, not just the
>> differences).
>>
>>
>>> With that I don't mean is not an
>>> interesting challenge or something that we won't need in the future,
>>> just that it has a relatively low impact as of today.
>>
>> Which is why the minimal "delta compression" in git should be sufficient for
>> now. :)
>> What's more of a problem is one of the points mtd raised, though: git
>> potentially choking on large files (mmap should be fine OTOH).
>>
>>
>>> # activities should not submit new entries while the previously
>>> submitted one hasn't been fully committed yet
>>>
>>> Why so?
>>
>> This is the answer I gave before:
>>
>> Looks like I need to define should/must/etc. for the final version of the
>> document. It's an advice, not a requirement. The intention is to avoid
>> having an ever-increasing backlog because the activity saves faster than
>> the datastore can process.
>
> Another consideration could be to allow overlapping saves, but to
> institute a policy by which the new request supersedes the working
> one, which would get dropped.
>
>>
>>> # version_id and parent_id
>>>
>>> Have you thought about version_id being of the form of '2.1.4'?
>>
>> Yes, that's what I intended originally. But someone (Ben?) made a good
>> argument for random IDs in one of the recent threads. Besides: the current
>> prototype already uses the latter ones. :)
>
> The short summary here is that we expect two kids to take version 1.2
> home with them, modify it, and come back together later on. If we use
> this scheme, their versions will both be identified as 1.3, which is
> misleading since the (tree_id, version_id) pair should identify a
> unique blob, whereas their resulting versions will be different.
That's a good point.
>> Using random IDs and storing relationship in metadata is easier to implement
>> than constructing and parsing structured IDs. It's not clear the latter
>> would buy us anything real.
>>
>>
>>> That
>>> would make parent_id unneeded because we could refer to the parent as
>>> (tree_id, 2.1.3). And would also allow us to identify the HEAD of each
>>> branch.
>>
>> But only inside datastore - any API consumer shouldn't make assumptions
>> about version format.
>>
>>
>>> # creator
>>>
>>> What is it for?
>>
>> E.g. to determine the default activity for resuming. Current name of this
>> property is 'activity'.
>>
>>
>>> # activity saves data to a disk, ensuring it has been committed (sync)
>>> and proper access rights for data store
>>>
>>> By sync you mean written to disk? Why activities need to worry about this?
>>
>> Because activities know best what exactly needs to be synced. We should be
>> able to remove this requirement in exchange for reduced datastore
>> performance (esp. for directory objects).
>> I'm not perfectly sure fdatasync() done in datastore will cause data written
>> by the activity to be written to disk (though I read the POSIX definition of
>> fdatasync() that way) but there are ways to find that out. :)
>>
>>
>>> # Changes the (unversioned/version-specific) metadata of the given
>>> object to match metadata. Fully synchronous, no return value.
>>>
>>> How do we know which properties are version-specific and which aren't?
>>
>> By treating them accordingly. :)
>> Datastore is agnostic to this property (of metadata entries). Metadata is
>> bound to each version but modifiable. For "versioned" metadata the API
>> consumer is supposed to call save(), for "unversioned/version-specific"
>> metadata it should call change_metadata() instead.
>> If we decide to make some metadata global (i.e. common to all versions) I'd
>> just hardcode those few names.
>>
>>
>> [delete(tree_id)]
>>>
>>> # Remove (all versions of) given object from data store. Fully
>>> synchronous. Doesn't return anything, emits signal Deleted(tree_id).
>>>
>>> Do we have any operation in the UI that matches this?
>
> When we created mockups of versions in the Journal we did provide this
> functionality as a secondary option under the erase button: "Erase all
> versions"
>
>> Sure. It's exactly the same as delete(uid) in the current API, used by
>> Journal.
>> You might convince me to add a variant to remove single versions, but keep
>> in my mind that deleting a single version from a VCS repository can be quite
>> tough.
>
> Tough or not, I think this is actually a critical feature, and a
> rather likely case. It was (though we could invert this, I guess) the
> primary delete operation in our minds as we were designing a Journal
> with versions.
>
>> A variant to remove branches might be easier to implement, but we should
>> decide how to use branches before thinking about how useful that would be.
>>
>>
>>> # Get/Got
>>> Maybe should we make it a bit more verbose? Like GetData?
>>
>> Makes sense as we're only returning data anyway, not metadata so it's not
>> the exact opposite of save(). Changed, thanks for the suggestion.
>>
>>
>>> # Prefixing a key name with '!' inverts the sense of matching
>>>
>>> Is this used by the UI?
>>
>> Currently not but easy to implement (on datastore side) and AFAIR talked
>> about in one of the current threads.
>>
>>
>>> # prefixing it with '*' enables regular expression search
>>>
>>> Is this used by the UI? I think it's good to think now how possibly
>>> interesting new features would be added in the future, but based on
>>> past experiences I think it would be better to only implement what we
>>> need right now.
>>
>> This is one features I'm easy to convince to throw out. :)
>> As I included the textsearch() API call now (since current Journal needs it)
>> we can rely on that instead. Marked as OPTIONAL for now. Will only be
>> implemented if it's just a few SLoCs (as I expect it to be).
>>
>>
>>> # Arbitrary key names are allowed, but speed may vary (i.e. not
>>> everything is indexed).
>>>
>>> Same here, I would return an exception for a non-indexed field before
>>> implementing searches for arbitrary properties.
>>
>> I think that's crippling potential Journal development / alternatives too
>> much. See Library for example, it uses arbitrary metadata and has to read
>> the whole datastore contents currently.
>>
>>> # if True returns all matching versions of an object instead of
>>> only the latest one
>>>
>>> Where in the UI we would list only the last versions of several tree_id?
>
> The object list may only show the most recent version of each object.
> Revealing all versions might be done only in details, or perhaps by
> expanding the entry somehow. Even in these cases, though, it's not
> clear that we'd want to return ONLY the latest.
I was thinking that we intended to display the "relevant" versions of
a tree_id, not just the last one. That's what was implemented a couple
of years ago in the versions prototype. If we only want to display the
last version, then we can disregard what I wrote before about
branches.
Then my question is: if we don't want branches for knowing which
versions are relevant, what do we want branches for?
Regards,
Tomeu
>> In the current Journal list view and in the object picker (which I don't
>> particularly like but that's a topic on its own). There's a case to be made
>> to return the HEADs of all branches instead, see also the corresponding TODO
>> entry.
>>
>>
>>> # textsearch(querystring, options)
>>>
>>> What if the user has a date filter and enters a fulltext query? I
>>> don't see how this would be implemented with the proposed
>>> find/textsearch split.
>>
>> That's a tough example. All other filters are easily replaced by prefix
>> terms, but date is a range so it needs to be a value inside Xapian, not a
>> term.
>> How about just adding "query" from find() to it? Then most activities could
>> rely on the stable interface of find() and the few advanced consumers (like
>> Journal) would need to be adapted to a new IR search API anyway in order to
>> provide better user experience (spelling corrections, tag suggestions, ...).
>>
>>
>>> # Stopped()
>>> What is this for?
>>
>> Tell me. :-P
>> Maybe to delay shutdown until datastore has finished writing?
>>
>>
>>> # The internal data structures of datastore or one of its
>>> backends are corrupted. Should only happen in case of hardware defects
>>> or OS bugs.
>>>
>>> Is power failure considered here hw defect or does the proposed design
>>> protects against that?
>>
>> The latter option. Actually there's another way to corrupt the data
>> structures, namely improper tuning of filesystem / fs options (e.g.
>> data=writeback on ext3 or using VFAT), but it could be argued that it's just
>> an OS bug since the API contract is broken then. ;)
>>
>> CU Sascha
>
>
> Thanks for entertaining my limited feedback.
> Eben
>
>> --
>> http://sascha.silbe.org/
>> http://www.infra-silbe.de/
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (GNU/Linux)
>>
>> iQEcBAEBAgAGBQJKUgPpAAoJELpz82VMF3DaO50H/05SRNcD0NTLeTFLMIto6kDR
>> 2W+j91nBc0fl5BAGEkSo4COdbJfXDkUMYcXjOTyoV5fspl5DVzP/OO2eATHhLgdu
>> H9A5Lyc0YQXOdeKp3mhPVdXvlj67AdRl8iBuNDAVcu8qbLBTVl3Zb+NB2iFXun7B
>> SUORrSoQMf6l8gBsUm38CU0xZpO/Bs8hR7bfJwEqpI28b+lbceMEizBoVR5xflZ0
>> 5rwfPG7IobelVu9RnRMoXmwg/gKhWN7LYmtTb+94OJbm92+hoj2rzkZXxVV9o8zw
>> Eci6CoiReLqHg7dvA9ewe6kMRc1XGGTlE7/N6DN9fYTmwwF9W4mkJkMLNgbJBUA=
>> =yQws
>> -----END PGP SIGNATURE-----
>>
>> _______________________________________________
>> Sugar-devel mailing list
>> Sugar-devel at lists.sugarlabs.org
>> http://lists.sugarlabs.org/listinfo/sugar-devel
>>
>>
>
More information about the Sugar-devel
mailing list