[Sugar-devel] [PATCH] Share Journal entries over external devices #1636

Sascha Silbe sascha-ml-reply-to-2010-3 at silbe.org
Fri Nov 26 10:33:13 EST 2010


Excerpts from simon's message of Thu Nov 25 10:51:16 +0100 2010:

> metadata and preview of an entry are stored in hidden files. For example,
> when copying an entry homework.pdf to the device, files called
> .homework.pdf.metadata and .homework.pdf.preview will be stored on the
> device as well. Those will be read back in when copying an entry from the
> external device to a Journal.

I'm not yet convinced by that approach. How do you detect and handle
"stale" metadata and preview files? Especially since you "hide" the
files (how do non-Unix OSes handle dot-files, BTW?), they will not get
deleted together with the data file. Any file created later (from
outside of Sugar) with the same name will suddenly have metadata that
the user never intended it to have.

> The approach has been discussed in ticket #1636 and has been reviewed.

I missed that discussion because it happened on Trac. Please, let's have
all engineering discussions on sugar-devel where everybody can follow
them.

> This patch should go into master and 0.90 to allow for an update from
> 0.84 to 0.90.

I'm not sure we should really do non-bugfix changes to 0.90. Downstream
is welcome to cherry-pick everything they like from master, of course.


Let's do a "quick" run over the patch itself. I'll have a more thorough
look once we've decided on the general approach and you have (hopefully)
addressed the points below.


[src/jarabe/journal/model.py]
> @@ -289,8 +291,10 @@ class InplaceResultSet(BaseResultSet):
>          files = self._file_list[offset:offset + limit]
>  
>          entries = []
> -        for file_path, stat, mtime_, size_ in files:
> -            metadata = _get_file_metadata(file_path, stat)
> +        for file_path, stat, mtime_, size_, metadata in files:
> +            if metadata is None:
> +                # FIXME: the find should fetch metadata
> +                metadata = _get_file_metadata(file_path, stat)

Looks like you wanted to change something before submitting the patch
(and I agree you should ;) ).

[InplaceResultSet._real_recurse_dir()]
> +                        metadata = _get_file_metadata_from_json( \
> +                            dir_path, entry, preview=False)
> +                        if metadata is not None:
> +                            for f in ['fulltext', 'title',
> +                                      'description', 'tags']:
> +                                if f in metadata and \
> +                                        self._regex.match(metadata[f]):
> +                                    add_to_list = True
> +                                    break

Please break up this function. Nine levels of indentation are clearly too
much. Some refactoring would also help to make it easier to follow; e.g.
the add_to_list = False flag could be replaced by a simple early return,
as sketched below.
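
A rough sketch of what I have in mind, based on the code quoted above
(the helper name is just a suggestion):

    def _metadata_matches_query(self, dir_path, entry):
        """Return True if the entry's stored metadata matches the query."""
        metadata = _get_file_metadata_from_json(dir_path, entry,
                                                preview=False)
        if metadata is None:
            return False
        for name in ['fulltext', 'title', 'description', 'tags']:
            if name in metadata and self._regex.match(metadata[name]):
                return True
        return False

The nine-levels-deep block then collapses into a single call.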

> @@ -358,6 +371,17 @@ class InplaceResultSet(BaseResultSet):
>  
>  
>  def _get_file_metadata(path, stat):
> +    """Returns the metadata from the corresponding file
> +    on the external device or does create the metadata
> +    based on the file properties.
> +
> +    """
> +    filename = os.path.basename(path)
> +    dir_path = os.path.dirname(path)
> +    metadata = _get_file_metadata_from_json(dir_path, filename, preview=True)
> +    if metadata:
> +        return metadata
> +
>      client = gconf.client_get_default()
>      return {'uid': path,
>              'title': os.path.basename(path),
> @@ -370,6 +394,37 @@ def _get_file_metadata(path, stat):
>              'description': path}
>  
>  
> +def _get_file_metadata_from_json(dir_path, filename, preview=False):
> +    """Returns the metadata from the json file and the preview
> +    stored on the external device.
> +
> +    """

How are "binary" properties other than 'preview' handled? AFAICT JSON
can only transport unicode strings (json.dumps('\x85') throws a
UnicodeDecodeError).
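
For the record (Python 2; exact error message elided):

    >>> import json
    >>> json.dumps('\x85')
    Traceback (most recent call last):
      ...
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 0: ...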

> +    metadata = None
> +    metadata_path = os.path.join(dir_path,
> +                                 '.' + filename + '.metadata')
> +    if os.path.exists(metadata_path):
> +        try:
> +            metadata = json.load(open(metadata_path))
> +        except ValueError:
> +            logging.debug("Could not read metadata for file %r on" \
> +                              "external device.", filename)
> +        else:
> +            metadata['uid'] = os.path.join(dir_path, filename)
> +    if preview:
> +        preview_path = os.path.join(dir_path,
> +                                    '.' + filename + '.preview')
> +        if os.path.exists(preview_path):
> +            try:
> +                metadata['preview'] = dbus.ByteArray(open(preview_path).read())
> +            except:
> +                logging.debug("Could not read preview for file %r on" \
> +                                  "external device.", filename)
> +    else:
> +        if metadata and 'preview' in metadata:
> +            del(metadata['preview'])
> +    return metadata

I would prefer an approach with fewer and shorter conditional parts,
making it easier to follow the function flow:

    metadata_path = os.path.join(dir_path, '.%s.metadata' % (filename, ))
    metadata = {}
    if os.path.exists(metadata_path):
        try:
            metadata = json.load(open(metadata_path))
        except (ValueError, UnicodeDecodeError):
            logging.warning("Could not read metadata for file %r on "
                            "external device.", filename)
    metadata['uid'] = os.path.join(dir_path, filename)

    if not preview:
        metadata.pop('preview', None)
        return metadata

    preview_path = os.path.join(dir_path, '.%s.preview' % (filename, ))
    if not os.path.exists(preview_path):
        return metadata

    try:
        metadata['preview'] = open(preview_path).read()
    except IOError:
        logging.warning("Could not read preview %r", filename)

    return metadata

> @@ -513,17 +578,83 @@ def write(metadata, file_path='', update_mtime=True, transfer_ownership=True):
> -        if not os.path.exists(file_path):
> -            raise ValueError('Entries without a file cannot be copied to '
> -                             'removable devices')
> +        object_id = _write_entry_on_external_device(metadata, file_path)
> +
> +    return object_id

Once you're done with your patch, we should replace all "object_id ="
in this function with "return".
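
I.e. instead of

    object_id = _write_entry_on_external_device(metadata, file_path)

    return object_id

simply

    return _write_entry_on_external_device(metadata, file_path)

and likewise for the other branches.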


> +def _write_entry_on_external_device(metadata, file_path):
> +    """This creates and updates an entry copied from the
> +    DS to external storage device. Besides copying the
> +    associated file a hidden file for the preview and one
> +    for the metadata are stored. We make sure that the
> +    metadata and preview file are in the same directory
> +    as the data file.
> +
> +    This function handles renames of an entry on the
> +    external device and avoids name collisions. Renames are
> +    handled failsafe.
> +
> +    """

Please follow PEP-257 [1]:
- one-line summary at the start of multi-line docstrings
- "The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ..."."

Please mention the return value.
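
For illustration, something along these lines would address both points
(adjust the wording and the return-value description to what the function
actually does):

    """Copy a Journal entry to an external storage device.

    Write the data file plus a hidden metadata file and a hidden
    preview file into the same directory on the device, handling
    renames of an existing entry and avoiding name collisions.

    Return the object_id of the entry on the device.
    """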

> +    if 'uid' in metadata and os.path.exists(metadata['uid']):
> +        file_path = metadata['uid']

I don't understand why this is correct or even why you do it at all.

> +    if not file_path or not os.path.exists(file_path):
> +        raise ValueError('Entries without a file cannot be copied to '
> +                         'removable devices')

Why don't we allow this now that we store the metadata?

> +    file_name = _get_file_name(metadata['title'], metadata['mime_type'])

If we rely on mime_type being set, we should mention in the docstring
that we expect a DSMetadata instance (instead of a dictionary) to be
passed in.

> +    metadata_path = os.path.join(metadata['mountpoint'],
> +                                 '.' + file_name + '.metadata')

> +    (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
> +    os.write(fh, json.dumps(metadata_copy))
> +    os.close(fh)
> +    os.rename(fn, metadata_path)

If the intention behind this pattern is to ensure that we don't end up
with garbage in the original file after a crash, you should fsync()
before rename() (i.e. before close()).
As you use it several times, please factor it out; atomic_file_replace()
might be a suitable name (a sketch follows below).
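
A rough sketch of such a helper (cleanup of the temporary file after a
failed write is left out for brevity):

    import os
    import tempfile

    def atomic_file_replace(data, destination_path):
        """Write data to destination_path, replacing it atomically."""
        # mkstemp() in the same directory, so that rename() stays on
        # one file system and is guaranteed to be atomic
        fd, temp_path = tempfile.mkstemp(
            dir=os.path.dirname(destination_path))
        try:
            os.write(fd, data)
            os.fsync(fd)  # make sure the data is on disk before renaming
        finally:
            os.close(fd)
        os.rename(temp_path, destination_path)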

> +    if os.path.dirname(destination_path) == os.path.dirname(file_path):
[...]

It would be nice to have the rename functionality (which I guess this is)
factored out completely. If larger blocks of code are shared, they could
be factored out into individual functions.

Sascha

[1] http://www.python.org/dev/peps/pep-0257/
--
http://sascha.silbe.org/
http://www.infra-silbe.de/