[Sugar-devel] [PATCH] Journal entry sharing using a mass storage device

Sascha Silbe sascha-ml-reply-to-2011-2 at silbe.org
Fri Feb 18 15:25:09 EST 2011


Excerpts from Simon Schampijer's message of Mon Feb 14 23:13:40 +0100 2011:

> http://wiki.sugarlabs.org/go/Features/Journal_Entry_Sharing

We should come up with a better description before we push the patch.


[src/jarabe/journal/model.py]
> +import json

As agreed on in IRC, we should stick with simplejson for 0.92.

> +JOURNAL_METADATA_DIR = '.Sugar-Metadata'

Do you intend to use this from other modules?


[InplaceResultSet.find()]
> -        for file_path, stat, mtime_, size_ in files:
> -            metadata = _get_file_metadata(file_path, stat)
> +        for file_path, stat, mtime_, metadata, size_ in files:
> +            if metadata is None:
> +                metadata = _get_file_metadata(file_path, stat)

I would prefer to handle metadata reading / construction in a single
place, i.e. to move the _get_file_metadata() call down below the
_get_file_metadata_from_json() call.

As mentioned on IRC we should remove the unused parts from the list. I
don't mind doing it in a follow-up patch, though.


[InplaceResultSet._scan_a_file()]
> @@ -365,7 +371,20 @@ class InplaceResultSet(BaseResultSet):
>  
>          if self._regex is not None and \
>                  not self._regex.match(full_path):
> -            return
> +            filename = os.path.basename(full_path)
> +            dir_path = os.path.dirname(full_path)
> +            metadata = _get_file_metadata_from_json( \
> +                dir_path, filename, preview=False)
> +            add_to_list = False
> +            if metadata is not None:
> +                for f in ['fulltext', 'title',
> +                          'description', 'tags']:
> +                    if f in metadata and \
> +                            self._regex.match(metadata[f]):
> +                        add_to_list = True
> +                        break
> +            if not add_to_list:
> +                return

This is getting rather deeply nested again. If we move it below the
other checks we can unconditionally parse or construct the metadata
and move the full-text matches two indentation levels up.

BTW, is it intentional that you search only a subset of the properties,
unlike what the data store does?


[InplaceResultSet._get_file_metadata()]
>  def _get_file_metadata(path, stat):

We should consider renaming this to make clear that it constructs the
metadata from file system metadata and heuristics (MIME type), as
opposed to reading the serialised metadata from disk as
_get_file_metadata_from_json() does.


> +def _get_file_metadata_from_json(dir_path, filename, preview=False):

The reference to preview in the code below confused me for a few
minutes. We should rename it to e.g. fetch_preview.

> +    """Returns the metadata from the json file and the preview
> +    stored on the external device. If the metadata is corrupted
> +    we do remove it and the preview as well.

PEP 257 (referenced by PEP 8) says:

>> The docstring is a phrase ending in a period. It prescribes the
>> function or method's effect as a command ("Do this", "Return that"),
>> not as a description; e.g. don't write "Returns the pathname ...".
[...]
>> Multi-line docstrings consist of a summary line just like a one-line
>> docstring, followed by a blank line, followed by a more elaborate
>> description.

> +    metadata = None
> +    metadata_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
> +                                 filename + '.metadata')
> +    preview_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
> +                                filename + '.preview')
> +
> +    if os.path.exists(metadata_path):
> +        try:

Please change this to guard-clause style to a) get rid of the
indentation and b) prevent an exception (that currently gets caught by
your catch-all except:) if the preview exists but the metadata file
doesn't. I.e. please use:

    if not os.path.exists(metadata_path):
        return None

> +            metadata = json.load(open(metadata_path))
> +        except ValueError:

We should catch EnvironmentError (base class of IOError and OSError),
too. The external medium might be formatted using a non-FAT file system
(=> permission denied) or be defective (many different errors).

> +            os.unlink(metadata_path)
> +            if os.path.exists(preview_path):
> +                os.unlink(preview_path)

Ditto.

> +            logging.debug("Could not read metadata for file %r on " \
> +                              "external device.", filename)

Since the file existed and gets removed by us, we should use log level
error.

> +        else:
> +            metadata['uid'] = os.path.join(dir_path, filename)

If we can't read the metadata (file does not exist or is corrupt), we
should call _get_file_metadata(). That will fill in several properties,
including uid.

I'm not sure if we should try reading the preview if we couldn't read
the metadata file.

> +    if preview:

Again, please use guard-clause style.

> +        if os.path.exists(preview_path):
> +            try:
> +                metadata['preview'] = dbus.ByteArray(open(preview_path).read())
> +            except:

PEP 8 says:

>> When catching exceptions, mention specific exceptions whenever
>> possible instead of using a bare 'except:' clause.

EnvironmentError would seem to be a good fit once we are rid of the
corner case metadata=None (see above).

> +                logging.debug("Could not read preview for file %r on " \
> +                                  "external device.", filename)

The backslash is not necessary here.


[delete()]
>      """
>      if os.path.exists(object_id):

Again, guard-clause style would make it easier to understand (at least
for me). Just put the datastore case inside an
"if not os.path.exists(object_id)" block.

>          os.unlink(object_id)
> +        dir_path = os.path.dirname(object_id)
> +        filename = os.path.basename(object_id)
> +        old_files = [os.path.join(dir_path, JOURNAL_METADATA_DIR,
> +                                  filename + '.metadata'),
> +                     os.path.join(dir_path, JOURNAL_METADATA_DIR,
> +                                  filename + '.preview')]

This might be slightly more readable if we do the path construction
inside the loop. Or maybe not.

> +        for old_file in old_files:
> +            if os.path.exists(old_file):
> +                try:
> +                    os.unlink(old_file)
> +                except:
> +                    pass

We should at least log an error. And EnvironmentError again. :)
And I still wish regular for loops would take a condition like list
comprehensions do. :)

[write()]
>      else:
> -        if not os.path.exists(file_path):
> -            raise ValueError('Entries without a file cannot be copied to '
> -                             'removable devices')
> +        object_id = _write_entry_on_external_device(metadata, file_path)
> +
> +    return object_id

I like this change, but please reverse the condition to make it
guard-clause style. You can drop the intermediate variable object_id
inside the (new) if block, too.
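I.e. the control flow I'd like to see, sketched with hypothetical stubs
standing in for the real datastore and device writers:

```python
def _write_to_datastore(metadata, file_path):
    return 'datastore-uid'  # stub for illustration


def _write_entry_on_external_device(metadata, file_path):
    return file_path  # stub for illustration


def write(metadata, file_path):
    """Reversed condition: the external-device case returns right away,
    so no intermediate object_id variable is needed."""
    if metadata.get('mountpoint', '/') != '/':
        return _write_entry_on_external_device(metadata, file_path)
    return _write_to_datastore(metadata, file_path)
```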

[_write_entry_on_external_device()]
> +    """This creates and updates an entry copied from the
> +    DS to external storage device. Besides copying the
> +    associated file a hidden file for the preview and one
> +    for the metadata are stored in the hidden directory
> +    .Sugar-Metadata.

PEP 257 again.

> -        file_name = _get_file_name(metadata['title'], metadata['mime_type'])
> -        file_name = _get_unique_file_name(metadata['mountpoint'], file_name)
> +    This function handles renames of an entry on the
> +    external device and avoids name collisions. Renames are
> +    handled failsafe.
>  
> +    """
> +    if 'uid' in metadata and os.path.exists(metadata['uid']):
> +        file_path = metadata['uid']

Wouldn't this break in the case that you read a metadata file from disk,
containing a UUID for uid instead of the file path? Or do I
misunderstand what the code is supposed to do?

> +    if metadata['title'] == '':
> +        metadata['title'] = _('Untitled')

metadata might have been modified, so we shouldn't assume any
non-essential property exists:

    if not metadata.get('title'):

(None evaluates to False in boolean contexts, so we don't need to pass
a default value to get).

> +    file_name = get_file_name(metadata['title'], metadata['mime_type'])

Ditto.

> +    destination_path = os.path.join(metadata['mountpoint'], file_name)
> +    if destination_path != file_path:
> +        file_name = get_unique_file_name(metadata['mountpoint'], file_name)
>          destination_path = os.path.join(metadata['mountpoint'], file_name)
> +        clean_name, extension_ = os.path.splitext(file_name)
> +        metadata['title'] = clean_name

We shouldn't override a potentially user-chosen title.

> +    if 'uid' in metadata_copy:
> +        del metadata_copy['uid']

Why?

> +    metadata_dir_path = os.path.join(metadata['mountpoint'],
> +                                     JOURNAL_METADATA_DIR)

Given how many times we construct these names, each taking a line
folded once or several times, it would make sense to create a small
function to do it.

> +    if not os.path.exists(metadata_dir_path):
> +        os.mkdir(metadata_dir_path)
> +
> +    if 'preview' in metadata_copy:
> +        preview = metadata_copy['preview']
> +        preview_fname = file_name + '.preview'
> +        preview_path = os.path.join(metadata['mountpoint'],
> +                                    JOURNAL_METADATA_DIR, preview_fname)
> +        metadata_copy['preview'] = preview_fname

Why don't we just .pop() the preview? preview_path is an absolute path
on the system writing the entry. It's not going to be useful to the
reader. Or rather if it is, we don't want it to be (information
leakage). :)

> +        (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
> +        os.write(fh, preview)
> +        os.close(fh)
> +        os.rename(fn, preview_path)

Candidate for factoring out (it's used at least two times).

> +    metadata_path = os.path.join(metadata['mountpoint'],
> +                                 JOURNAL_METADATA_DIR,
> +                                 file_name + '.metadata')
> +    (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
> +    os.write(fh, json.dumps(metadata_copy))
> +    os.close(fh)
> +    os.rename(fn, metadata_path)

We might want to do the json.dumps() early so that trying to write an
entry with invalid metadata fails right away instead of starting to
overwrite parts of the entry.

> +            for ofile in old_files:
> +                if os.path.exists(ofile):
> +                    try:
> +                        os.unlink(ofile)
> +                    except:
> +                        pass
Error message, EnvironmentError.

[...]

This function (_write_entry_on_external_device()) is rather long. Can we
split it up into several functions, please? data, preview and metadata
writing and entry renaming would be natural candidates.

> -def _get_file_name(title, mime_type):
> +def get_file_name(title, mime_type):
[...]
> -def _get_unique_file_name(mount_point, file_name):
> +def get_unique_file_name(mount_point, file_name):

Aleksey is going to be delighted - one of his outstanding patches does
the same thing. :)

Sascha

-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/