[Bugs] #1090 UNSP: refactor indexing/query parsing, add support for prefixes

SugarLabs Bugs bugtracker-noreply at sugarlabs.org
Fri Aug 7 06:18:05 EDT 2009


#1090: refactor indexing/query parsing, add support for prefixes
------------------------------------------+---------------------------------
    Reporter:  sascha_silbe               |          Owner:  tomeu                      
        Type:  enhancement                |         Status:  new                        
    Priority:  Unspecified by Maintainer  |      Milestone:  Unspecified by Release Team
   Component:  sugar-datastore            |        Version:  Git as of bugdate          
    Severity:  Unspecified                |     Resolution:                             
    Keywords:  r-                         |   Distribution:  Unspecified                
Status_field:  New                        |  
------------------------------------------+---------------------------------
Changes (by tomeu):

  * keywords:  r? => r-


Comment:

 I'm unsure about date ranges, do you really expect people will use them
 without any hint about the expected syntax? I had to look at the code,
 then the strptime() man page, then run in python time.strftime() in order
 to know the exact date syntax.

 I'm also not happy about the datastore using locale-dependent functions
 because it means that the DS needs to run in the same locale as the user
 expects, which puts additional burden on the distributor. Hard to track
 bugs will appear because of this in different platforms (we have seen this
 with the older DS implementation).

 While I see some value for prefixes in the free text query, I don't see
 date ranges being really used in the field.

 There's also the issue with the Journal appending a * to the query, how do
 you plan to solve this?

 Are we changing anything in what is stored in the index? If so, the index
 needs to be rebuilt? Any change in the size of the index?

 Barring that, the patch looks very good and will be happy to push it soon.
 Some small issues, only noting the first occurrence:

 {{{
 _query_term_map = {
 }}}

 See PEP8 about the naming of constants.

 {{{
         # add full value for dictionary-based searches
 ...
         # index with and without prefix so specifying a prefix is optional
         # in query strings
 }}}

 If you need comments to explain what your code does, consider
 restructuring it so it's clearer, or use API comments instead (likely the
 best option in this case).

 {{{
         # TODO: change query parser to generate phrase query instead
 }}}

 Can you extend on this?

 {{{
         doc.add_term(prefix+value)
 }}}

 What's the value of having a term without any prefix if we are already
 calling index_text() for that value? See PEP8 about spaces around
 operators.

 {{{
         for (name, prefix) in _query_term_map.items():
 }}}

 Unneeded parens.

 {{{
     def _parse_query_term(self, m_name, prefix, m_value):
 }}}

 No idea what m_ means here, please don't use abbreviations if at all
 possible.

 {{{
 +            return Query(Query.OP_OR, [
 +                self._parse_query_term(m_name, prefix, word)
 +                for word in m_value])
 }}}

 Can you split this in several statements? That will also give you the
 chance to give names to expressions, thus making clearer what is going on
 there.

 {{{
 +                "Only tuples of size 2 have a defined meaning. " \
 +                "Did you mean to pass a list instead?")
 }}}

 Strings don't need \. If we can use ' everywhere instead of ", would be
 great.

 {{{
 +        return Query(Query.OP_VALUE_RANGE,
 +            value_no, str(start), str(end))
 }}}

 Fits in one line.

 {{{
 +            # compatibility option for timestamp: {'start': 0, 'end': 1}
 }}}

 Does this mean that ranges in the query dict would get deprecated by the
 ranges in the query string?

 {{{
 +        except xapian.QueryParserError, exception:
 +            logging.warning("Invalid query string: "+exception.get_msg())
 +            return Query()
 }}}

 Hmm, if the exception was propagated, the Journal could say that an error
 has happened, similar to how we say today that the query returned no
 results. Wouldn't that be more useful feedback than returning the results
 of a blank query?

 {{{
 +        queries += [
 +            self._parse_query_term(name, prefix, query_dict.pop(name))
 +            for (name, prefix) in _query_term_map.items()
 +            if name in query_dict]
 +
 +        queries += [
 +            self._parse_query_value(name, value_no, query_dict.pop(name))
 +            for (name, value_no) in _query_value_map.items()
 +            if name in query_dict]
 }}}

 Please split this in several statements. Not only helps with readability
 but also keeps future patches more clear.

 {{{
 +        self._query_parser = QueryParser()
 +        self._query_parser.set_database(self._database)
 +        self._term_generator = TermGenerator()
 }}}

 Are these operations expensive enough to cache those objects? Don't
 remember seeing them when I profiled the DS.

 I'm trying to improve the review process docs, comments welcome:

 http://wiki.sugarlabs.org/go/Development_Team/CodeReview

-- 
Ticket URL: <http://dev.sugarlabs.org/ticket/1090#comment:4>
Sugar Labs <http://sugarlabs.org/>
Sugar Labs bug tracking system


More information about the Bugs mailing list