[Sugar-devel] [PATCH] Convert Journal entries that have been saved to a storage device in 0.82

Sascha Silbe sascha-ml-reply-to-2011-2 at silbe.org
Sat Feb 19 14:13:28 EST 2011


Excerpts from Simon Schampijer's message of Mon Feb 14 23:04:42 +0100 2011:

[src/jarabe/journal/volumestoolbar.py]
> +import cPickle

I was about to suggest "import cPickle as pickle" instead, but can't
find anything in the docs to back up my preference.

> +import json

Like for the other patch we should use simplejson for 0.92.

[_get_id()]
> +    """Get the ID for the document in the xapian database."""
> +    tl = document.termlist()
> +    try:
> +        term = tl.skip_to('Q').term
> +        if len(term) == 0 or term[0] != 'Q':
> +            return None
> +        return term[1:]
> +    except StopIteration:
> +        return None

The old data store contains code with a very similar function that's
more succinct:

    def get_all_ids(self):
        return [ti.term[1:] for ti in self.read_index._index.allterms('Q')]

Is there a noticable performance benefit from using skip_to() instead of
allterms()?


[_convert_entries()]
> +    """Converts the entries written by the datastore version 0.
> +    The metadata and the preview will be written using the new
> +    scheme for writing Journal entries to removable storage
> +    devices.

PEP 257. ;)

> +    try:
> +        database = xapian.Database(os.path.join(root, _JOURNAL_0_METADATA_DIR,
> +                                                'index'))
> +    except xapian.DatabaseError, e:
> +        logging.error('Convert DS-0 Journal entry. Error reading db: %s',
> +                      os.path.join(root, _JOURNAL_0_METADATA_DIR, 'index'))
> +        return

How about: "Error reading Xapian data base written by pre-0.84 data
store"?
And s/error/exception/ so we log the full error. "e" can be dropped.

> +    metadata_dir_path = os.path.join(root, model.JOURNAL_METADATA_DIR)
> +    if not os.path.exists(metadata_dir_path):
> +        os.mkdir(metadata_dir_path)

We need to catch EnvironmentError.

> +    for i in range(1, database.get_lastdocid() + 1):
> +        try:
> +            document = database.get_document(i)
> +        except xapian.DocNotFoundError, e:
> +            logging.debug('Convert DS-0 Journal entry. ' \
> +                              'Error getting document %s: %s', i, e)
> +            continue

Since Xapian 1.0.0 [1] there's a better way [2]:

    for posting_item in database.postlist(''):
        document = database.get(posting_item.docid)


Since all we need below this point (and inside the loop) is document,
this is a good place to start a new function (_convert_entry()). Gets
rid of one level of indentation (for the code inside the new function).

> +        try:
> +            metadata_loaded = cPickle.loads(document.get_data())
> +        except cPickle.PickleError, e:
> +            logging.debug('Convert DS-0 Journal entry. ' \
> +                              'Error converting metadata: %s', e)
> +            continue

Should be at least warning. Maybe even exception (i.e. error). Backslash
isn't necessary; indentation doesn't match (proportional font?).


> +        if 'activity_id' in metadata_loaded and \
> +                'mime_type' in metadata_loaded and \
> +                'title' in metadata_loaded:

A good opportunity for using guardian style again to get rid of one
level of indentation:

    for property in ['activity_id', 'mime_type', 'title']:
        if property not in metadata_loaded:
            return


> +            metadata = {}
> +
> +            uid = _get_id(document)
> +            if uid is None:
> +                continue
> +
> +            for key, value in metadata_loaded.items():
> +                metadata[str(key)] = str(value[0])

What exactly does the JSON look like? (I can't easily tell from the
source)

I wonder if we should do ' '.join(value) instead so we don't silently
drop any (meta)data.

> +            if 'uid' not in metadata:
> +                metadata['uid'] = uid

Minor nitpick: metadata.setdefault('uid', uid) is shorter and at least
as easy to understand.

> +            if 'filename' in metadata:
> +                filename = metadata['filename']
> +            else:
> +                continue

Guardian style. ;)

> +            if metadata['title'] == '':

I don't know the 0.82 data store too well, but it might be a good idea
to use "if not metadata.get('title')" instead.

> +                metadata['title'] = _('Untitled')
> +                fn = model.get_file_name(metadata['title'],
> +                                         metadata['mime_type'])
> +                new_filename = model.get_unique_file_name(root, fn)
> +                metadata['filename'] = new_filename

Like for the preview, I don't think it's a good idea to save the
absolute path of the file in metadata.

> +                os.rename(os.path.join(root, filename),
> +                          os.path.join(root, new_filename))
> +                filename = new_filename

This function is rather long (even with the proposed split), so it might
be worth factoring this and the following preview handling out into
two new functions.

> +            preview_path = os.path.join(root, _JOURNAL_0_METADATA_DIR,
> +                                        'preview', uid)

If we "pre-compute" the root + metadata dir part, we might get away we
a single line each time we need to compute a file name.

> +            if os.path.exists(preview_path):

Guardian style again to get rid of indentation (and thus line folding).

> +                if not os.path.exists(new_preview_path):
> +                    metadata['preview'] = preview_fname

Like in the other patch we should simply drop preview from metadata.

> +
> +            metadata_fname = filename + '.metadata'
> +            metadata_path = os.path.join(root, model.JOURNAL_METADATA_DIR,
> +                                         metadata_fname)
> +            if not os.path.exists(metadata_path):
> +                (fh, fn) = tempfile.mkstemp(dir=root)
> +                os.write(fh, json.dumps(metadata))
> +                os.close(fh)
> +                os.rename(fn, metadata_path)
> +

> +            logging.debug('Convert DS-0 Journal entry. Entry converted: ' \
> +                              'File=%s Metadata=%s',
> +                          os.path.join(root, filename), metadata)

How about "Converted metadata for %r written by pre-0.84 data store: %r"?

[VolumesToolbar._add_button()]
> +        if os.path.exists(os.path.join(mount.get_root().get_path(),
> +                                       _JOURNAL_0_METADATA_DIR)):

So this gets triggered as soon as somebody inserts a USB stick and will
scan all entries in one go, blocking the UI? I'm a bit worried about
response time for media with a lot of entries written by 0.82,
especially since we do it over and over again.

We could do it similar to how the data store does it: One entry at a
time, returning True (=> call again) unless we're done. We'd have to
block (show the progress bar) when the user tries to show the contents
before we're done with the conversion.

An alternative would be to trigger it only once the user tries to list
the contents, integrating it with the existing progress bar. I guess
it wouldn't make much of a difference for the code (we'd still need to
process only small chunks so the progress bar can keep going). The real
issue is whether we do it in the background or in the foreground. If we
do it in the background and the user does something else while we
convert the entry, the response to switch the view to the medium will
be much faster. OTOH if we do it in the foreground the user has a better
chance of noticing there's something being done every time and can
delete the 0.82 files.

BTW, why don't we rename the 0.82 files after we finished the
conversion? That way we don't need to do it over and over again. And if
we really discover a bug in the conversion, we can just check for the
renamed file, redo the conversion and rename to yet another name.


[still _add_button()]
> +            logging.debug('Convert DS-0 Journal entries.')

How about "Pre-0.84 data store metadata detected, starting conversion."?


PS: I can do the refactorings for 0.94 if you'd like. Like for the other
patch it would be nice to do the changes that are "observable from the
outside" (dropping preview+filename from metadata etc.) before we include
it in 0.92, though.

Sascha

[1] http://xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=47
[2] http://xapian.org/docs/apidoc/html/classXapian_1_1Database.html#504d8e800384d12a8f9defee39362f5e
-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 494 bytes
Desc: not available
URL: <http://lists.sugarlabs.org/archive/sugar-devel/attachments/20110219/62f8ee0a/attachment.pgp>


More information about the Sugar-devel mailing list