[Sugar-devel] [PATCH] Convert Journal entries that have been saved to a storage device in 0.82

Simon Schampijer simon at schampijer.de
Tue Feb 22 11:08:38 EST 2011


On 02/19/2011 02:13 PM, Sascha Silbe wrote:
> Excerpts from Simon Schampijer's message of Mon Feb 14 23:04:42 +0100 2011:
>
> [src/jarabe/journal/volumestoolbar.py]
>> +import cPickle
>
> I was about to suggest "import cPickle as pickle" instead, but can't
> find anything in the docs to back up my preference.

Yes, I looked that up as well, but kept it like that, it is the module 
name in the end [1].

[1] http://docs.python.org/library/pickle.html

>> +import json
>
> Like for the other patch we should use simplejson for 0.92.

Yes, done.

> [_get_id()]
>> +    """Get the ID for the document in the xapian database."""
>> +    tl = document.termlist()
>> +    try:
>> +        term = tl.skip_to('Q').term
>> +        if len(term) == 0 or term[0] != 'Q':
>> +            return None
>> +        return term[1:]
>> +    except StopIteration:
>> +        return None
>
> The old data store contains code with a very similar function that's
> more succinct:
>
>      def get_all_ids(self):
>          return [ti.term[1:] for ti in self.read_index._index.allterms('Q')]
>
> Is there a noticable performance benefit from using skip_to() instead of
> allterms()?

Hmmm, I actually have the code from the old DS. I did not do performance 
tests on it.

> [_convert_entries()]
>> +    """Converts the entries written by the datastore version 0.
>> +    The metadata and the preview will be written using the new
>> +    scheme for writing Journal entries to removable storage
>> +    devices.
>
> PEP 257. ;)

Done.

>> +    try:
>> +        database = xapian.Database(os.path.join(root, _JOURNAL_0_METADATA_DIR,
>> +                                                'index'))
>> +    except xapian.DatabaseError, e:
>> +        logging.error('Convert DS-0 Journal entry. Error reading db: %s',
>> +                      os.path.join(root, _JOURNAL_0_METADATA_DIR, 'index'))
>> +        return
>
> How about: "Error reading Xapian data base written by pre-0.84 data
> store"?
> And s/error/exception/ so we log the full error. "e" can be dropped.

Dropped the e and the s/error/exception/. The error messages all have 
the same format (starting with: 'Convert DS-0 Journal entries:') and 
reflect the name of the functions as well. So I would like to keep them 
or we should change them all.

>> +    metadata_dir_path = os.path.join(root, model.JOURNAL_METADATA_DIR)
>> +    if not os.path.exists(metadata_dir_path):
>> +        os.mkdir(metadata_dir_path)
>
> We need to catch EnvironmentError.

Done.

>> +    for i in range(1, database.get_lastdocid() + 1):
>> +        try:
>> +            document = database.get_document(i)
>> +        except xapian.DocNotFoundError, e:
>> +            logging.debug('Convert DS-0 Journal entry. ' \
>> +                              'Error getting document %s: %s', i, e)
>> +            continue
>
> Since Xapian 1.0.0 [1] there's a better way [2]:
>
>      for posting_item in database.postlist(''):
>          document = database.get(posting_item.docid)

Done.

> Since all we need below this point (and inside the loop) is document,
> this is a good place to start a new function (_convert_entry()). Gets
> rid of one level of indentation (for the code inside the new function).

Done.

>> +        try:
>> +            metadata_loaded = cPickle.loads(document.get_data())
>> +        except cPickle.PickleError, e:
>> +            logging.debug('Convert DS-0 Journal entry. ' \
>> +                              'Error converting metadata: %s', e)
>> +            continue
 >
> Should be at least warning. Maybe even exception (i.e. error). Backslash
> isn't necessary; indentation doesn't match (proportional font?).

Since we don't fail and keep on iterating through the items I think that 
a debug message is enough. I wanted this code to be a 'try to convert if 
possible' approach what I want to reflect in the priority level for the 
messages as well.

>> +        if 'activity_id' in metadata_loaded and \
>> +                'mime_type' in metadata_loaded and \
>> +                'title' in metadata_loaded:
>
> A good opportunity for using guardian style again to get rid of one
> level of indentation:

Done.

>      for property in ['activity_id', 'mime_type', 'title']:
>          if property not in metadata_loaded:
>              return
>
>
>> +            metadata = {}
>> +
>> +            uid = _get_id(document)
>> +            if uid is None:
>> +                continue
>> +
>> +            for key, value in metadata_loaded.items():
>> +                metadata[str(key)] = str(value[0])
>
> What exactly does the JSON look like? (I can't easily tell from the
> source)

Here is an example of the metadata:

metadata={'activity_id': 
[dbus.String(u'82ed9018485d6c4d2e8227e7377455557e8946b6')], 
'title_set_by_user': [dbus.String(u\
'0')], 'vid': ['1.0'], 'title': [dbus.String(u'Write Activity')], 
'timestamp': ['1293567361'], 'mtime': ['1293567361.0'], 
dbus.String(u'share-scope'): [dbus\
.String(u'private')], 'filename': [u'Write Activity.odt'], 'icon-color': 
[dbus.String(u'#FF2B34,#F8E800')], 'activity': 
[dbus.String(u'org.laptop.AbiWordAct\
ivity')], dbus.String(u'suggested_filename'): [dbus.String(u'Write 
Activity.odt')], 'keep': ['0'], 'mime_type': 
[dbus.String(u'application/vnd.oasis.opendoc\
ument.text')]}

> I wonder if we should do ' '.join(value) instead so we don't silently
> drop any (meta)data.
>
>> +            if 'uid' not in metadata:
>> +                metadata['uid'] = uid
>
> Minor nitpick: metadata.setdefault('uid', uid) is shorter and at least
> as easy to understand.

Hmm, might be personal preference, I think the longer code is readable 
nicely :)

>> +            if 'filename' in metadata:
>> +                filename = metadata['filename']
>> +            else:
>> +                continue
>
> Guardian style. ;)

Done.

>> +            if metadata['title'] == '':
>
> I don't know the 0.82 data store too well, but it might be a good idea
> to use "if not metadata.get('title')" instead.
>
>> +                metadata['title'] = _('Untitled')
>> +                fn = model.get_file_name(metadata['title'],
>> +                                         metadata['mime_type'])
>> +                new_filename = model.get_unique_file_name(root, fn)
>> +                metadata['filename'] = new_filename
>
> Like for the preview, I don't think it's a good idea to save the
> absolute path of the file in metadata.

Ok, sounds good with me. Poped that metadata.

>> +                os.rename(os.path.join(root, filename),
>> +                          os.path.join(root, new_filename))
>> +                filename = new_filename
>
> This function is rather long (even with the proposed split), so it might
> be worth factoring this and the following preview handling out into
> two new functions.

After the split it looks ok for me.

>> +            preview_path = os.path.join(root, _JOURNAL_0_METADATA_DIR,
>> +                                        'preview', uid)
>
> If we "pre-compute" the root + metadata dir part, we might get away we
> a single line each time we need to compute a file name.

But we would have to do it for both "root, _JOURNAL_0_METADATA_DIR" and 
"root, model.JOURNAL_METADATA_DIR" and for both methods convert_entries 
and convert_entry, or pass it around, or...not sure we gain much here.

>> +            if os.path.exists(preview_path):
>
> Guardian style again to get rid of indentation (and thus line folding).

Hmm, I want to write the preview only if it exists and if not continue 
without logging something.

>> +                if not os.path.exists(new_preview_path):
>> +                    metadata['preview'] = preview_fname
>
> Like in the other patch we should simply drop preview from metadata.

Done.

>> +
>> +            metadata_fname = filename + '.metadata'
>> +            metadata_path = os.path.join(root, model.JOURNAL_METADATA_DIR,
>> +                                         metadata_fname)
>> +            if not os.path.exists(metadata_path):
>> +                (fh, fn) = tempfile.mkstemp(dir=root)
>> +                os.write(fh, json.dumps(metadata))
>> +                os.close(fh)
>> +                os.rename(fn, metadata_path)
>> +
>
>> +            logging.debug('Convert DS-0 Journal entry. Entry converted: ' \
>> +                              'File=%s Metadata=%s',
>> +                          os.path.join(root, filename), metadata)
>
> How about "Converted metadata for %r written by pre-0.84 data store: %r"?
>
> [VolumesToolbar._add_button()]
>> +        if os.path.exists(os.path.join(mount.get_root().get_path(),
>> +                                       _JOURNAL_0_METADATA_DIR)):
>
[...]
>
> [still _add_button()]
>> +            logging.debug('Convert DS-0 Journal entries.')
>
> How about "Pre-0.84 data store metadata detected, starting conversion."?

Made it: 'Convert DS-0 Journal entries: starting conversion'

> PS: I can do the refactorings for 0.94 if you'd like. Like for the other
> patch it would be nice to do the changes that are "observable from the
> outside" (dropping preview+filename from metadata etc.) before we include
> it in 0.92, though.

I have done all the changes listed above. See as well the mail about 
performance and the reasoning about the approach taken.

Thanks for the thorough review.

Regards,
    Simon


More information about the Sugar-devel mailing list