[Sugar-devel] [PATCH] Journal entry sharing using a mass storage device

Simon Schampijer simon at schampijer.de
Mon Feb 21 15:40:02 EST 2011


On 02/18/2011 03:25 PM, Sascha Silbe wrote:
> Excerpts from Simon Schampijer's message of Mon Feb 14 23:13:40 +0100 2011:
>
>> http://wiki.sugarlabs.org/go/Features/Journal_Entry_Sharing
>
> We should come up with a better description before we push the patch.

I would name it:

"Add functionality to copy a Journal entry including the metadata to a 
removable device"

> [src/jarabe/journal/model.py]
>> +import json
>
> As agreed on in IRC, we should stick with simplejson for 0.92.

Done.

>> +JOURNAL_METADATA_DIR = '.Sugar-Metadata'
>
> Do you intend to use this from other modules?

Yes, from "[PATCH] Convert Journal entries that have been saved to 
a storage device in 0.82".

> [InplaceResultSet.find()]
>> -        for file_path, stat, mtime_, size_ in files:
>> -            metadata = _get_file_metadata(file_path, stat)
>> +        for file_path, stat, mtime_, metadata, size_ in files:
>> +            if metadata is None:
>> +                metadata = _get_file_metadata(file_path, stat)
>
> I would prefer to handle metadata reading / construction in a single
> place, i.e. to move the _get_file_metadata() call down below the
> _get_file_metadata_from_json() call.

Actually, I did that, but then scanning took longer. So I 
reverted to the previous behavior.
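For reference, the fallback discussed here can be sketched as follows. This is an illustrative simplification, not the patch's code: `_construct_metadata()` stands in for `_get_file_metadata()`, which in the patch builds metadata from file system information and MIME-type heuristics.

```python
def _construct_metadata(file_path):
    # Stand-in for _get_file_metadata(file_path, stat), which derives
    # metadata (uid, title, mime_type, ...) from the file itself.
    return {'uid': file_path, 'title': file_path}

def resolve_metadata(file_path, metadata):
    # Prefer the metadata read from the json sidecar during scanning;
    # only construct it when none was found.
    if metadata is None:
        metadata = _construct_metadata(file_path)
    return metadata
```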

> As mentioned on IRC we should remove the unused parts from the list. I
> don't mind doing it in a follow-up patch, though.

Actually, I think you need these in order to filter the entries.

> [InplaceResultSet._scan_a_file()]
>> @@ -365,7 +371,20 @@ class InplaceResultSet(BaseResultSet):
>>
>>           if self._regex is not None and \
>>                   not self._regex.match(full_path):
>> -            return
>> +            filename = os.path.basename(full_path)
>> +            dir_path = os.path.dirname(full_path)
>> +            metadata = _get_file_metadata_from_json( \
>> +                dir_path, filename, preview=False)
>> +            add_to_list = False
>> +            if metadata is not None:
>> +                for f in ['fulltext', 'title',
>> +                          'description', 'tags']:
>> +                    if f in metadata and \
>> +                            self._regex.match(metadata[f]):
>> +                        add_to_list = True
>> +                        break
>> +            if not add_to_list:
>> +                return
>
> This is getting rather deeply nested again. If we move it below the
> other checks we can unconditionally parse resp. construct the metadata
> and move the full text matches two indentation levels up.

I was able to remove one level. I hope this is ok now.

> BTW, is it intentional that you search only a subset of the properties,
> unlike what the data store does?

Before this patch you could only search for filenames on a removable 
device. Now you can also search the title, description and tags of 
items that do have metadata. In general, searches like 'Write activity 
entries' are not supported for a removable device. Maybe we could add 
this later; for now I think it is fine like that.
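The resulting matching logic might look roughly like this. `SEARCHABLE_FIELDS` and `matches_query` are illustrative names of mine, not the patch's; the field list follows the quoted diff above.

```python
import re

# Fields checked against the query when sidecar metadata is available,
# as in the quoted hunk above.
SEARCHABLE_FIELDS = ['fulltext', 'title', 'description', 'tags']

def matches_query(regex, full_path, metadata):
    """Return True if the query matches the file path or, when
    metadata is available, any of the searchable fields (sketch)."""
    if regex.match(full_path):
        return True
    if metadata is None:
        # No sidecar metadata: only the filename was searchable.
        return False
    return any(field in metadata and regex.match(metadata[field])
               for field in SEARCHABLE_FIELDS)
```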

> [InplaceResultSet._get_file_metadata()]
>>   def _get_file_metadata(path, stat):
>
> We should consider renaming this to make clear it's constructing the
> metadata based on file system metadata and heuristics (MIME type), as
> opposed to reading the serialised metadata from disk as
> _get_file_metadata_from_json() does.

The method now gets the metadata for a file: from the json file if 
available, otherwise it constructs it. So I think this is the correct 
name now.

>> +def _get_file_metadata_from_json(dir_path, filename, preview=False):
>
> The reference to preview in the code below confused me for a few
> minutes. We should rename it to e.g. fetch_preview.

Good catch, done.

>> +    """Returns the metadata from the json file and the preview
>> +    stored on the external device. If the metadata is corrupted
>> +    we do remove it and the preview as well.
>
> PEP 257 (referenced by PEP 8) says:
>
>>> The docstring is a phrase ending in a period. It prescribes the
>>> function or method's effect as a command ("Do this", "Return that"),
>>> not as a description; e.g. don't write "Returns the pathname ...".
> [...]
>>> Multi-line docstrings consist of a summary line just like a one-line
>>> docstring, followed by a blank line, followed by a more elaborate
>>> description.

Done.

>> +    metadata = None
>> +    metadata_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> +                                 filename + '.metadata')
>> +    preview_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> +                                filename + '.preview')
>> +
>> +    if os.path.exists(metadata_path):
>> +        try:
>
> Please change this to "guardian" style to a) get rid of the indentation
> and b) prevent an exception (that currently gets caught by your
> catch-all except:) if the preview exists, but the metadata file doesn't.
> I.e. please use:
>
>      if not os.path.exists(metadata_path):
>          return None

Done.
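As a general illustration of the guard-clause ("guardian") style being asked for (the function name is mine): the uninteresting case returns early, so the rest of the function stays at one indentation level.

```python
import os

def read_sidecar(metadata_path):
    # Guard clause: bail out early instead of nesting the happy path.
    # Also avoids touching the preview when no metadata file exists.
    if not os.path.exists(metadata_path):
        return None
    with open(metadata_path) as f:
        return f.read()
```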

>> +            metadata = json.load(open(metadata_path))
>> +        except ValueError:
>
> We should catch EnvironmentError (base class of IOError and OSError),
> too. The external medium might be formatted using a non-FAT file system
> (=>  permission denied) or be defective (many different errors).

Yes, sounds good, done.
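A sketch of the widened exception handling (the stdlib `json` module stands in for simplejson here):

```python
import json

def safe_load_json(path):
    """Return parsed json from path, or None if the file is corrupt
    or unreadable (sketch of the error handling discussed above)."""
    try:
        with open(path) as f:
            return json.load(f)
    except (ValueError, EnvironmentError):
        # ValueError: corrupt json. EnvironmentError (base class of
        # IOError and OSError): e.g. permission denied on a non-FAT
        # file system, or read errors on a defective medium.
        return None
```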

>> +            os.unlink(metadata_path)
>> +            if os.path.exists(preview_path):
>> +                os.unlink(preview_path)
>
> Ditto.

Done.

>> +            logging.debug("Could not read metadata for file %r on " \
>> +                              "external device.", filename)
>
> Since the file existed and gets removed by us, we should use log level
> error.

Done.

>> +        else:
>> +            metadata['uid'] = os.path.join(dir_path, filename)
>
> If we can't read the metadata (file does not exist or is corrupt), we
> should call _get_file_metadata(). That will fill in several properties,
> including uid.

Actually, the case you describe is handled exactly like that.

> I'm not sure if we should try reading the preview if we couldn't read
> the metadata file.

We actually don't. It is taken care of.

>> +    if preview:
>
> Again, please use guardian style.
>
>> +        if os.path.exists(preview_path):
>> +            try:
>> +                metadata['preview'] = dbus.ByteArray(open(preview_path).read())
>> +            except:
>
> PEP 8 says:
>
>>> When catching exceptions, mention specific exceptions whenever
>>> possible instead of using a bare 'except:' clause.
>
> EnvironmentError would seem to be a good fit once we are rid of the
> corner case metadata=None (see above).

Done.

>> +                logging.debug("Could not read preview for file %r on " \
>> +                                  "external device.", filename)
>
> The backslash is not necessary here.

Done.

> [delete()]
>>       """
>>       if os.path.exists(object_id):
>
> Again guardian style would make it easier to understand (at least for
> me). Just put the datastore case inside an
> "if not os.path.exists(object_id)" block.

Done.

>>           os.unlink(object_id)
>> +        dir_path = os.path.dirname(object_id)
>> +        filename = os.path.basename(object_id)
>> +        old_files = [os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> +                                  filename + '.metadata'),
>> +                     os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> +                                  filename + '.preview')]
>
> This might be slightly more readable if we do the path construction
> inside the loop. Or maybe not.

I think it is fine like that, actually.

>> +        for old_file in old_files:
>> +            if os.path.exists(old_file):
>> +                try:
>> +                    os.unlink(old_file)
>> +                except:
>> +                    pass
>
> We should at least log an error. And EnvironmentError again. :)
> And I still wish regular for loops would take a condition like list
> comprehensions do. :)

Done.
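With the agreed changes, the cleanup loop would log failures instead of silently passing. A rough sketch (the function name is mine):

```python
import logging
import os

def remove_stale_files(old_files):
    """Remove leftover metadata/preview files, logging any failure
    (sketch of the cleanup discussed above)."""
    for old_file in old_files:
        if not os.path.exists(old_file):
            continue
        try:
            os.unlink(old_file)
        except EnvironmentError:
            logging.error('Could not remove file %r', old_file)
```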

> [write()]
>>       else:
>> -        if not os.path.exists(file_path):
>> -            raise ValueError('Entries without a file cannot be copied to '
>> -                             'removable devices')
>> +        object_id = _write_entry_on_external_device(metadata, file_path)
>> +
>> +    return object_id
>
> I like this change, but please reverse the condition to make it guardian
> style. You can drop the intermediate variable object_id inside the (new)
> if block, too.

The check is still the same, it just lives inside 
_write_entry_on_external_device now.

> [_write_entry_on_external_device()]
>> +    """This creates and updates an entry copied from the
>> +    DS to external storage device. Besides copying the
>> +    associated file a hidden file for the preview and one
>> +    for the metadata are stored in the hidden directory
>> +    .Sugar-Metadata.
>
> PEP 257 again.

Done.

>> -        file_name = _get_file_name(metadata['title'], metadata['mime_type'])
>> -        file_name = _get_unique_file_name(metadata['mountpoint'], file_name)
>> +    This function handles renames of an entry on the
>> +    external device and avoids name collisions. Renames are
>> +    handled failsafe.
>>
>> +    """
>> +    if 'uid' in metadata and os.path.exists(metadata['uid']):
>> +        file_path = metadata['uid']
>
> Wouldn't this break in the case that you read a metadata file from disk,
> containing a UUID for uid instead of the file path? Or do I
> misunderstand what the code is supposed to do?



>> +    if metadata['title'] == '':
>> +        metadata['title'] = _('Untitled')
>
> metadata might have been modified, so we shouldn't assume any
> non-essential property exists:
>
>      if not metadata.get('title'):
>
> (None evaluates to False in boolean contexts, so we don't need to pass
> a default value to get).

Ok, done.
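The point about falsiness can be shown in isolation: `metadata.get('title')` returns None for a missing key, and the empty string is falsy too, so a single check covers both cases (helper name is mine).

```python
def ensure_title(metadata, default='Untitled'):
    # Covers both a missing 'title' key (get() -> None) and an empty
    # title, since both are falsy in a boolean context.
    if not metadata.get('title'):
        metadata['title'] = default
    return metadata
```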

>> +    file_name = get_file_name(metadata['title'], metadata['mime_type'])
>
> Ditto.
>
>> +    destination_path = os.path.join(metadata['mountpoint'], file_name)
>> +    if destination_path != file_path:
>> +        file_name = get_unique_file_name(metadata['mountpoint'], file_name)
>>           destination_path = os.path.join(metadata['mountpoint'], file_name)
>> +        clean_name, extension_ = os.path.splitext(file_name)
>> +        metadata['title'] = clean_name
>
> We shouldn't override a potentially user-chosen title

The clean_name is based on the title (via get_file_name). We need to 
reset the title here because get_unique_file_name has changed the file 
name in this case, and the title should match the filename.
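That re-derivation can be sketched like this (an illustration of the idea, not the patch's code; `sync_title` is my name):

```python
import os

def sync_title(metadata, unique_file_name):
    # get_unique_file_name may have altered the name to avoid a
    # collision; re-derive the title from the final file name so
    # the two stay in sync.
    clean_name, _extension = os.path.splitext(unique_file_name)
    metadata['title'] = clean_name
    return metadata
```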

>> +    if 'uid' in metadata_copy:
>> +        del metadata_copy['uid']
>
> Why?

We don't want to carry the 'uid' metadata around on the usb key. It only 
has a meaning when we store something in the DS.

>> +    metadata_dir_path = os.path.join(metadata['mountpoint'],
>> +                                     JOURNAL_METADATA_DIR)
>
> Given the many times we construct these names, each time taking a line
> folded one or several times, it would make sense to create a small
> function to do it.

Oh, I actually just have to use metadata_dir_path in the other places.

>> +    if not os.path.exists(metadata_dir_path):
>> +        os.mkdir(metadata_dir_path)
>> +
>> +    if 'preview' in metadata_copy:
>> +        preview = metadata_copy['preview']
>> +        preview_fname = file_name + '.preview'
>> +        preview_path = os.path.join(metadata['mountpoint'],
>> +                                    JOURNAL_METADATA_DIR, preview_fname)
>> +        metadata_copy['preview'] = preview_fname
>
> Why don't we just .pop() the preview? preview_path is an absolute path
> on the system writing the entry. It's not going to be useful to the
> reader. Or rather if it is, we don't want it to be (information
> leakage). :)

Yes, sounds like a good idea. I pop('preview', None) it now.

>> +        (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
>> +        os.write(fh, preview)
>> +        os.close(fh)
>> +        os.rename(fn, preview_path)
>
> Candidate for factoring out (it's used at least two times).

I think that here, given the arguments we would have to pass around, we 
would not gain much by factoring it out.

>> +    metadata_path = os.path.join(metadata['mountpoint'],
>> +                                 JOURNAL_METADATA_DIR,
>> +                                 file_name + '.metadata')
>> +    (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
>> +    os.write(fh, json.dumps(metadata_copy))
>> +    os.close(fh)
>> +    os.rename(fn, metadata_path)
>
> We might want to do the json.dumps() early so that trying to write an
> entry with invalid metadata fails right away instead of starting to
> overwrite parts of the entry.

Done.
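Serialising before touching the medium means a broken metadata dict fails before any existing files are overwritten; combined with the mkstemp-plus-rename pattern already used in the patch, the write stays atomic. A sketch (the stdlib `json` module stands in for simplejson, and the helper name is mine):

```python
import json
import os
import tempfile

def write_metadata(metadata, metadata_path, mountpoint):
    """Write metadata atomically: serialise first, then write a temp
    file on the same file system and rename it into place (sketch)."""
    # Fail early, before anything on the medium has been touched.
    serialized = json.dumps(metadata)
    fh, temp_path = tempfile.mkstemp(dir=mountpoint)
    try:
        os.write(fh, serialized.encode('utf-8'))
    finally:
        os.close(fh)
    # rename() within one file system replaces the target atomically.
    os.rename(temp_path, metadata_path)
```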

>> +            for ofile in old_files:
>> +                if os.path.exists(ofile):
>> +                    try:
>> +                        os.unlink(ofile)
>> +                    except:
>> +                        pass
> Error message, EnvironmentError.
>
> [...]
>
> This function (_write_entry_on_external_device()) is rather long. Can we
> split it up into several functions, please? data, preview and metadata
> writing and entry renaming would be natural candidates.

I split out the renaming (I think it is clearer now); for the others I 
think the extra argument passing would outweigh the benefit.

>> -def _get_file_name(title, mime_type):
>> +def get_file_name(title, mime_type):
> [...]
>> -def _get_unique_file_name(mount_point, file_name):
>> +def get_unique_file_name(mount_point, file_name):
>
> Aleksey is going to be delighted - one of his outstanding patches does
> the same thing. :)

Yes, I need to call them from "[PATCH] Convert Journal entries that have 
been saved to a storage device in 0.82".

Thanks for the thorough review,
    Simon

