[Sugar-devel] Plain Query Format proposal

Aleksey Lim alsroot at member.fsf.org
Sun Jul 19 22:12:59 EDT 2009


On Sun, Jul 19, 2009 at 02:02:14PM +0200, Sascha Silbe wrote:
> On Sun, Jul 19, 2009 at 01:11:42AM +0000, Aleksey Lim wrote:
> 
> > I'm going to implement [1] for 0.86.
> Have you taken a look at my datastore redesign proposal [3]? I'm 
> currently busy implementing the new API in the old datastore. Or rather 
> busy adapting the Python side (several parts of Sugar each interface 
> directly with the datastore via DBus, I'm trying to unify it) - the 
> datastore is already mostly done, though not thoroughly tested for 
> obvious reasons. Have already found a number of (pre-existing) bugs in 
> the Journal, though. ;)
>
> Added today:
> - list and range queries
> - prefixes in Xapian (string) queries
> 
> Still missing:
> - support for (filtering by) arbitrary metadata
>     - quite hard to implement: range matches are only supported on Xapian
>       Values, not Xapian Terms. Values are accessed by a 32bit number and
>       I'm not sure how storage space will be affected by "holes" in the
>       numbering.
>     - probably won't be implemented for current datastore

I'm not sure it's worth doing for arbitrary fields (terms). I think a set
of predefined terms [5] is quite sufficient here (and efficient).

> - regular expression search (requires full scan)
>     - probably won't be implemented for current datastore
> - match inversion ("!" prefix)

Do you mean to implement it outside of Xapian?
Xapian's query parser already has minimal (but imho sufficient) features:
the "*" wildcard, the NOT operator, etc. [8].

> Long story short: AFAICT most of your proposal is already supported by 
> the textsearch() API call. The "few" metadata keys you have proposed can 
> be added to (=hardcoded in) the DS, avoiding the "arbitrary metadata" 
> problem for now.

In fact, I didn't think of them as separate metadata fields. They could
be substrings (with prefixes) in the tags field, but I guess it doesn't
matter to users (shell/activities) where these terms (not exactly DS
fields) live.

> The proposed unit tests would be quite nice to have, of course. :)
> 
> What I don't understand about your proposal, though, is why you intend 
> to _replace_ the query dictionary instead of supplementing it like in my 
> proposal (and already supported in the old datastore by passing 'query': 
> 'whatever' as part of the query dictionary). In what way is formatting 
> (including quoting/escaping!) a string easier than creating a 
> dictionary?

Well, a plain string format is more new-feature-proof than a dictionary.
E.g. after a new DS field is added, existing user-driven search code will
keep working (since it only takes the query string from the user and
passes it to the DS).

But maybe you mean that in [3] the user can pass system terms via the
"query" key of find()'s dictionary argument. In that case, imho, it makes
more sense to avoid implicit behavior (implicitly ORing or ANDing
dictionary keys) and to have only a plain Xapian query string. From
Xapian's pov the dictionary doesn't add much: we can pass dictionary keys
to Xapian, but Xapian can do the same job by parsing the query string.

> > * let experienced users use system terms in Journal search bar
> FWIW, the current Journal already supports this, albeit with a very 
> limited numbers of terms and single-character prefixes (no colon):
> 
> _PREFIX_UID = 'Q'
> _PREFIX_ACTIVITY = 'A'
> _PREFIX_ACTIVITY_ID = 'I'
> _PREFIX_MIME_TYPE = 'M'
> _PREFIX_KEEP = 'K'
> 
> My prototype also supports prefixes now (e.g. "mime_type:text/plain").

The purpose of [1] is to have not only system terms but also users'
predefined terms [5] and users' terms in the tags field [6].
But the main purpose is to have all these terms available to implement
the tags feature [7]: we can implement tags efficiently by utilizing
Xapian terms.
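
A minimal sketch of how readable prefixes in a query string could be
mapped onto short term prefixes. The single-character values are the
ones quoted above from the current Journal, and "mime_type" is the prefix
name mentioned for the prototype. The other key names, the "tag" entry
with its "XTAG" prefix and the helper name to_prefixed_terms are
assumptions made for illustration, not part of [1] or of the datastore.

```python
# Prefix names mapped to term prefixes. The first five values are the
# ones used by the current Journal; the key names other than 'mime_type'
# and the whole 'tag' entry are hypothetical.
PREFIXES = {
    'uid': 'Q',
    'activity': 'A',
    'activity_id': 'I',
    'mime_type': 'M',
    'keep': 'K',
    'tag': 'XTAG',  # hypothetical prefix for user tags
}


def to_prefixed_terms(query):
    """Split a query on whitespace and prefix the 'name:value' tokens.

    Tokens without a known prefix name are returned unchanged, the way
    free text would be.
    """
    terms = []
    for token in query.split():
        name, sep, value = token.partition(':')
        if sep and value and name in PREFIXES:
            terms.append(PREFIXES[name] + value)
        else:
            terms.append(token)
    return terms


terms = to_prefixed_terms('mime_type:text/plain tag:school homework')
```

With such a table, adding a new kind of term only means adding one
entry; code that just forwards the user's query string is unaffected.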

> >   * existed implementation has hard-coded logic for example in case of 
> > having several mime_types in query(all mime_types will be ORed despite 
> > what user wants).
> Sorry, I fail to see your point.

In the current implementation and in your proposal, the "query" argument
is a dictionary (moreover, in the current implementation the value of the
"mime_type" key in that dictionary is a dictionary as well), so the
relationship between these keys/values is hardcoded (ORed or ANDed).
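
A small sketch of what "hardcoded" means here. The function only
illustrates the behavior described above and is not the datastore's
actual code: dict_to_query is a hypothetical name, the activity id is
made up, and ANDing different keys is an assumption of this sketch (the
ORing of several mime_types is the case mentioned earlier).

```python
def dict_to_query(query):
    """Turn a query dictionary into a boolean query string.

    The operators are fixed: several values of one key are ORed and
    different keys are ANDed (the AND is an assumption of this
    sketch). The caller has no way to ask for anything else.
    """
    clauses = []
    for key in sorted(query):
        value = query[key]
        values = value if isinstance(value, (list, tuple)) else [value]
        alternatives = ['%s:%s' % (key, v) for v in values]
        clause = ' OR '.join(alternatives)
        if len(alternatives) > 1:
            clause = '(%s)' % clause
        clauses.append(clause)
    return ' AND '.join(clauses)


hardcoded = dict_to_query({
    'mime_type': ['text/plain', 'image/png'],
    'activity': 'org.example.Write',
})

# With a plain query string the user picks the operators, for example:
plain = 'mime_type:text/plain AND NOT activity:org.example.Write'
```

The second query cannot be expressed through the dictionary without
adding yet another convention to it.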

> What both your and my proposal don't address, BTW, are stemming and 
> spelling corrections - those need to know what language a given string 
> is in, so are not that easy to handle.

I'm thinking about passing a "lang" argument to find() [4] to set up
Xapian's stemming feature while parsing the request string.
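
One possible way to handle that argument, sketched under assumptions:
the helper name stemmer_language, the list of supported codes and the
"no stemming when unknown" fallback are all illustrative and not taken
from [4].

```python
# Language codes the datastore would be willing to stem; an assumed,
# deliberately short list for this sketch.
SUPPORTED = {'en', 'de', 'es', 'fr', 'ru'}


def stemmer_language(lang):
    """Reduce a locale-like string ('en_US.UTF-8') to a language code.

    Returns None when the language is missing or unknown, meaning
    "do not stem".
    """
    if not lang:
        return None
    code = lang.split('.')[0].split('_')[0].split('-')[0].lower()
    return code if code in SUPPORTED else None
```

The returned code would then be used to choose the stemmer before the
request string is parsed. A None result leaves stemming switched off.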

[4] http://wiki.sugarlabs.org/go/Features/Plain_Query_Format#Result_set_control_parameters
[5] http://wiki.sugarlabs.org/go/Features/Plain_Query_Format#Users_predefined_terms
[6] http://wiki.sugarlabs.org/go/Features/Plain_Query_Format#Another_ways_to_differentiate_DS_objects
[7] http://wiki.sugarlabs.org/go/Features/Tags_in_Journal
[8] http://www.xapian.org/docs/queryparser.html

> > [1] http://wiki.sugarlabs.org/go/Features/Plain_Query_Format
> [3] http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/blobs/master/datastore-redesign.html (follow "raw blob" link)

-- 
Aleksey

