[Bugs] #1090 UNSP: refactor indexing/query parsing, add support for prefixes
SugarLabs Bugs
bugtracker-noreply at sugarlabs.org
Fri Aug 7 06:18:05 EDT 2009
#1090: refactor indexing/query parsing, add support for prefixes
------------------------------------------+---------------------------------
Reporter: sascha_silbe | Owner: tomeu
Type: enhancement | Status: new
Priority: Unspecified by Maintainer | Milestone: Unspecified by Release Team
Component: sugar-datastore | Version: Git as of bugdate
Severity: Unspecified | Resolution:
Keywords: r- | Distribution: Unspecified
Status_field: New |
------------------------------------------+---------------------------------
Changes (by tomeu):
* keywords: r? => r-
Comment:
I'm unsure about date ranges, do you really expect people will use them
without any hint about the expected syntax? I had to look at the code,
then the strptime() man page, then run in python time.strftime() in order
to know the exact date syntax.
I'm also not happy about the datastore using locale-dependent functions
because it means that the DS needs to run in the same locale as the user
expects, which puts additional burden on the distributor. Hard to track
bugs will appear because of this in different platforms (we have seen this
with the older DS implementation).
While I see some value for prefixes in the free text query, I don't see
date ranges being really used in the field.
There's also the issue with the Journal appending a * to the query, how do
you plan to solve this?
Are we changing anything in what is stored in the index? If so, the index
needs to be rebuilt? Any change in the size of the index?
Barring that, the patch looks very good and will be happy to push it soon.
Some small issues, only noting the first occurrence:
{{{
_query_term_map = {
}}}
See PEP8 about the naming of constants.
{{{
# add full value for dictionary-based searches
...
# index with and without prefix so specifying a prefix is optional
# in query strings
}}}
If you need comments to explain what your code does, consider
restructuring it so it's clearer, or use API comments instead (likely the
best option in this case).
{{{
# TODO: change query parser to generate phrase query instead
}}}
Can you extend on this?
{{{
doc.add_term(prefix+value)
}}}
What's the value of having a term without any prefix if we are already
calling index_text() for that value? See PEP8 about spaces around
operators.
{{{
for (name, prefix) in _query_term_map.items():
}}}
Unneeded parens.
{{{
def _parse_query_term(self, m_name, prefix, m_value):
}}}
No idea what m_ means here, please don't use abbreviations if at all
possible.
{{{
+ return Query(Query.OP_OR, [
+ self._parse_query_term(m_name, prefix, word)
+ for word in m_value])
}}}
Can you split this in several statements? That will also give you the
chance to give names to expressions, thus making clearer what is going on
there.
{{{
+ "Only tuples of size 2 have a defined meaning. " \
+ "Did you mean to pass a list instead?")
}}}
Strings don't need \. If we can use ' everywhere instead of ", would be
great.
{{{
+ return Query(Query.OP_VALUE_RANGE,
+ value_no, str(start), str(end))
}}}
Fits in one line.
{{{
+ # compatibility option for timestamp: {'start': 0, 'end': 1}
}}}
Does this mean that ranges in the query dict would get deprecated by the
ranges in the query string?
{{{
+ except xapian.QueryParserError, exception:
+ logging.warning("Invalid query string: "+exception.get_msg())
+ return Query()
}}}
Hmm, if the exception was propagated, the Journal could say that an error
has happened, similar to how we say today that the query returned no
results. Wouldn't that be more useful feedback than returning the results
of a blank query?
{{{
+ queries += [
+ self._parse_query_term(name, prefix, query_dict.pop(name))
+ for (name, prefix) in _query_term_map.items()
+ if name in query_dict]
+
+ queries += [
+ self._parse_query_value(name, value_no, query_dict.pop(name))
+ for (name, value_no) in _query_value_map.items()
+ if name in query_dict]
}}}
Please split this in several statements. Not only helps with readability
but also keeps future patches more clear.
{{{
+ self._query_parser = QueryParser()
+ self._query_parser.set_database(self._database)
+ self._term_generator = TermGenerator()
}}}
Are these operations expensive enough to cache those objects? Don't
remember seeing them when I profiled the DS.
I'm trying to improve the review process docs, comments welcome:
http://wiki.sugarlabs.org/go/Development_Team/CodeReview
--
Ticket URL: <http://dev.sugarlabs.org/ticket/1090#comment:4>
Sugar Labs <http://sugarlabs.org/>
Sugar Labs bug tracking system
More information about the Bugs
mailing list