[Sugar-devel] [PATCH] Journal entry sharing using a mass storage device
Simon Schampijer
simon at schampijer.de
Mon Feb 21 15:40:02 EST 2011
On 02/18/2011 03:25 PM, Sascha Silbe wrote:
> Excerpts from Simon Schampijer's message of Mon Feb 14 23:13:40 +0100 2011:
>
>> http://wiki.sugarlabs.org/go/Features/Journal_Entry_Sharing
>
> We should come up with a better description before we push the patch.
I would name it:
"Add functionality to copy a Journal entry including the metadata to a
removable device"
> [src/jarabe/journal/model.py]
>> +import json
>
> As agreed on in IRC, we should stick with simplejson for 0.92.
Done.
>> +JOURNAL_METADATA_DIR = '.Sugar-Metadata'
>
> Do you intend to use this from other modules?
Yes, from "[PATCH] Convert Journal entries that have been saved to
a storage device in 0.82".
> [InplaceResultSet.find()]
>> - for file_path, stat, mtime_, size_ in files:
>> - metadata = _get_file_metadata(file_path, stat)
>> + for file_path, stat, mtime_, metadata, size_ in files:
>> + if metadata is None:
>> + metadata = _get_file_metadata(file_path, stat)
>
> I would prefer to handle metadata reading / construction in a single
> place, i.e. to move the _get_file_metadata() call down below the
> _get_file_metadata_from_json() call.
Actually, I did try that, but then scanning took longer, so I
reverted to the previous behavior.
> As mentioned on IRC we should remove the unused parts from the list. I
> don't mind doing it in a follow-up patch, though.
Actually, I think you need these in order to filter the entries.
> [InplaceResultSet._scan_a_file()]
>> @@ -365,7 +371,20 @@ class InplaceResultSet(BaseResultSet):
>>
>> if self._regex is not None and \
>> not self._regex.match(full_path):
>> - return
>> + filename = os.path.basename(full_path)
>> + dir_path = os.path.dirname(full_path)
>> + metadata = _get_file_metadata_from_json( \
>> + dir_path, filename, preview=False)
>> + add_to_list = False
>> + if metadata is not None:
>> + for f in ['fulltext', 'title',
>> + 'description', 'tags']:
>> + if f in metadata and \
>> + self._regex.match(metadata[f]):
>> + add_to_list = True
>> + break
>> + if not add_to_list:
>> + return
>
> This is getting rather deeply nested again. If we move it below the
> other checks we can unconditionally parse resp. construct the metadata
> and move the full text matches two indentation levels up.
I was able to remove one level. I hope this is ok now.
> BTW, is it intentional that you search only a subset of the properties,
> unlike what the data store does?
Before this patch you could only search for filenames on a removable
device. Now you can also search the title, description and tags of
items that do have metadata. Broader searches like 'Write activity
entries' are still not supported for a removable device. Maybe we
could add this later, for now I think it is fine like that.
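For illustration, the less nested check inside _scan_a_file() could
look roughly like this (sketch only, not necessarily the exact hunk;
fetch_preview is the keyword rename discussed further down):

    if self._regex is not None and not self._regex.match(full_path):
        metadata = _get_file_metadata_from_json(
            os.path.dirname(full_path), os.path.basename(full_path),
            fetch_preview=False)
        if metadata is None:
            # No serialized metadata, so only the file name could match.
            return
        add_to_list = False
        for field in ['fulltext', 'title', 'description', 'tags']:
            if field in metadata and self._regex.match(metadata[field]):
                add_to_list = True
                break
        if not add_to_list:
            return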
> [InplaceResultSet._get_file_metadata()]
>> def _get_file_metadata(path, stat):
>
> We should consider renaming this to make clear that it's constructing the
> metadata based on file system metadata and heuristics (MIME type), as
> opposed to reading the serialised metadata from disk as
> _get_file_metadata_from_json() does.
The method now gets the metadata for a file: from the JSON file if
available, otherwise it constructs it. So I think this is the correct name now.
>> +def _get_file_metadata_from_json(dir_path, filename, preview=False):
>
> The reference to preview in the code below confused me for a few
> minutes. We should rename it to e.g. fetch_preview.
Good catch, done.
>> + """Returns the metadata from the json file and the preview
>> + stored on the external device. If the metadata is corrupted
>> + we do remove it and the preview as well.
>
> PEP 257 (referenced by PEP 8) says:
>
>>> The docstring is a phrase ending in a period. It prescribes the
>>> function or method's effect as a command ("Do this", "Return that"),
>>> not as a description; e.g. don't write "Returns the pathname ...".
> [...]
>>> Multi-line docstrings consist of a summary line just like a one-line
>>> docstring, followed by a blank line, followed by a more elaborate
>>> description.
Done.
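i.e. something along these lines (the exact wording is just a
suggestion):

    def _get_file_metadata_from_json(dir_path, filename, fetch_preview=False):
        """Read an entry's metadata from the JSON file on the external device.

        Optionally read the preview as well. If the metadata turns out
        to be corrupted, remove it together with the preview.
        """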
>> + metadata = None
>> + metadata_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> + filename + '.metadata')
>> + preview_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> + filename + '.preview')
>> +
>> + if os.path.exists(metadata_path):
>> + try:
>
> Please change this to "guardian" style to a) get rid of the indentation
> and b) prevent an exception (that currently gets caught by your
> catch-all except:) if the preview exists, but the metadata file doesn't.
> I.e. please use:
>
> if not os.path.exists(metadata_path):
> return None
Done.
>> + metadata = json.load(open(metadata_path))
>> + except ValueError:
>
> We should catch EnvironmentError (base class of IOError and OSError),
> too. The external medium might be formatted using a non-FAT file system
> (=> permission denied) or be defective (many different errors).
Yes, sounds good, done.
>> + os.unlink(metadata_path)
>> + if os.path.exists(preview_path):
>> + os.unlink(preview_path)
>
> Ditto.
Done.
>> + logging.debug("Could not read metadata for file %r on " \
>> + "external device.", filename)
>
> Since the file existed and gets removed by us, we should use log level
> error.
Done.
>> + else:
>> + metadata['uid'] = os.path.join(dir_path, filename)
>
> If we can't read the metadata (file does not exist or is corrupt), we
> should call _get_file_metadata(). That will fill in several properties,
> including uid.
Actually, the case you describe is handled exactly like that.
> I'm not sure if we should try reading the preview if we couldn't read
> the metadata file.
We actually don't. It is taken care of.
>> + if preview:
>
> Again, please use guardian style.
>
>> + if os.path.exists(preview_path):
>> + try:
>> + metadata['preview'] = dbus.ByteArray(open(preview_path).read())
>> + except:
>
> PEP 8 says:
>
>>> When catching exceptions, mention specific exceptions whenever
>>> possible instead of using a bare 'except:' clause.
>
> EnvironmentError would seem to be a good fit once we are rid of the
> corner case metadata=None (see above).
Done.
>> + logging.debug("Could not read preview for file %r on " \
>> + "external device.", filename)
>
> The backslash is not necessary here.
Done.
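For reference, the whole reader then looks roughly like this
(simplified sketch, not the literal patch; it relies on the
module-level os/logging/dbus imports, JOURNAL_METADATA_DIR and the
simplejson module agreed on above, imported as json):

    def _get_file_metadata_from_json(dir_path, filename, fetch_preview=False):
        """Read an entry's metadata (and optionally its preview) from disk.

        Remove both files if the metadata cannot be parsed.
        """
        metadata_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
                                     filename + '.metadata')
        preview_path = os.path.join(dir_path, JOURNAL_METADATA_DIR,
                                    filename + '.preview')

        if not os.path.exists(metadata_path):
            return None

        try:
            metadata = json.load(open(metadata_path))
        except (EnvironmentError, ValueError):
            os.unlink(metadata_path)
            if os.path.exists(preview_path):
                os.unlink(preview_path)
            logging.error('Could not read metadata for file %r on '
                          'external device.', filename)
            return None

        metadata['uid'] = os.path.join(dir_path, filename)

        if fetch_preview and os.path.exists(preview_path):
            try:
                metadata['preview'] = \
                    dbus.ByteArray(open(preview_path).read())
            except EnvironmentError:
                logging.debug('Could not read preview for file %r on '
                              'external device.', filename)

        return metadata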
> [delete()]
>> """
>> if os.path.exists(object_id):
>
> Again guardian style would make it easier to understand (at least for
> me). Just put the datastore case inside an
> "if not os.path.exists(object_id)" block.
Done.
>> os.unlink(object_id)
>> + dir_path = os.path.dirname(object_id)
>> + filename = os.path.basename(object_id)
>> + old_files = [os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> + filename + '.metadata'),
>> + os.path.join(dir_path, JOURNAL_METADATA_DIR,
>> + filename + '.preview')]
>
> This might be slightly more readable if we do the path construction
> inside the loop. Or maybe not.
I think it is fine like that, actually.
>> + for old_file in old_files:
>> + if os.path.exists(old_file):
>> + try:
>> + os.unlink(old_file)
>> + except:
>> + pass
>
> We should at least log an error. And EnvironmentError again. :)
> And I still wish regular for loops would take a condition like list
> comprehensions do. :)
Done.
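i.e. something like (old_files as constructed in the hunk above):

    for old_file in old_files:
        if os.path.exists(old_file):
            try:
                os.unlink(old_file)
            except EnvironmentError:
                logging.error('Could not remove old metadata file %s',
                              old_file)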
> [write()]
>> else:
>> - if not os.path.exists(file_path):
>> - raise ValueError('Entries without a file cannot be copied to '
>> - 'removable devices')
>> + object_id = _write_entry_on_external_device(metadata, file_path)
>> +
>> + return object_id
>
> I like this change, but please reverse the condition to make it guardian
> style. You can drop the intermediate variable object_id inside the (new)
> if block, too.
The style is still the same; it is just inside _write_entry_on_external_device now.
> [_write_entry_on_external_device()]
>> + """This creates and updates an entry copied from the
>> + DS to external storage device. Besides copying the
>> + associated file a hidden file for the preview and one
>> + for the metadata are stored in the hidden directory
>> + .Sugar-Metadata.
>
> PEP 257 again.
Done.
>> - file_name = _get_file_name(metadata['title'], metadata['mime_type'])
>> - file_name = _get_unique_file_name(metadata['mountpoint'], file_name)
>> + This function handles renames of an entry on the
>> + external device and avoids name collisions. Renames are
>> + handled failsafe.
>>
>> + """
>> + if 'uid' in metadata and os.path.exists(metadata['uid']):
>> + file_path = metadata['uid']
>
> Wouldn't this break in the case that you read a metadata file from disk,
> containing a UUID for uid instead of the file path? Or do I
> misunderstand what the code is supposed to do?
>> + if metadata['title'] == '':
>> + metadata['title'] = _('Untitled')
>
> metadata might have been modified, so we shouldn't assume any
> non-essential property exists:
>
> if not metadata.get('title'):
>
> (None evaluates to False in boolean contexts, so we don't need to pass
> a default value to get).
Ok, done.
>> + file_name = get_file_name(metadata['title'], metadata['mime_type'])
>
> Ditto.
>
>> + destination_path = os.path.join(metadata['mountpoint'], file_name)
>> + if destination_path != file_path:
>> + file_name = get_unique_file_name(metadata['mountpoint'], file_name)
>> destination_path = os.path.join(metadata['mountpoint'], file_name)
>> + clean_name, extension_ = os.path.splitext(file_name)
>> + metadata['title'] = clean_name
>
> We shouldn't override a potentially user-chosen title
The clean_name is based on the title (via get_file_name); we need to
reset the title here because get_unique_file_name has changed the
filename and the title should match the filename.
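To spell that case out (sketch of the relevant lines only): if the
destination differs from the current path, get_unique_file_name avoids
clashing with existing files (e.g. by appending a counter), and the
title is reset so it keeps matching the file name:

    file_name = get_file_name(metadata['title'], metadata['mime_type'])
    destination_path = os.path.join(metadata['mountpoint'], file_name)
    if destination_path != file_path:
        # A collision may force a different file name; keep the stored
        # title in line with whatever name we actually end up using.
        file_name = get_unique_file_name(metadata['mountpoint'], file_name)
        destination_path = os.path.join(metadata['mountpoint'], file_name)
        clean_name, extension_ = os.path.splitext(file_name)
        metadata['title'] = clean_name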
>> + if 'uid' in metadata_copy:
>> + del metadata_copy['uid']
>
> Why?
We don't want to carry the 'uid' metadata around on the USB key. It only
has a meaning when we store something in the data store.
>> + metadata_dir_path = os.path.join(metadata['mountpoint'],
>> + JOURNAL_METADATA_DIR)
>
> Given the many times we construct these names, each time taking a line
> folded one or several times, it would make sense to create a small
> function to do it.
Oh, I actually just have to reuse metadata_dir_path in the other places.
>> + if not os.path.exists(metadata_dir_path):
>> + os.mkdir(metadata_dir_path)
>> +
>> + if 'preview' in metadata_copy:
>> + preview = metadata_copy['preview']
>> + preview_fname = file_name + '.preview'
>> + preview_path = os.path.join(metadata['mountpoint'],
>> + JOURNAL_METADATA_DIR, preview_fname)
>> + metadata_copy['preview'] = preview_fname
>
> Why don't we just .pop() the preview? preview_path is an absolute path
> on the system writing the entry. It's not going to be useful to the
> reader. Or rather if it is, we don't want it to be (information
> leakage). :)
Yes, sounds like a good idea. I use pop('preview', None) now.
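Together with dropping 'uid' above, the copy that gets serialised ends
up being prepared roughly like this (sketch only):

    metadata_copy = metadata.copy()
    # Strip keys that only make sense on the writing system; the data
    # store uid and the raw preview bytes don't belong in the JSON file.
    metadata_copy.pop('uid', None)
    preview = metadata_copy.pop('preview', None)

    if preview is not None:
        # The preview bytes themselves go into a separate hidden file.
        preview_path = os.path.join(metadata_dir_path,
                                    file_name + '.preview')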
>> + (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
>> + os.write(fh, preview)
>> + os.close(fh)
>> + os.rename(fn, preview_path)
>
> Candidate for factoring out (it's used at least two times).
I think that, with the arguments we would have to pass around, we would
not gain much by factoring it out here.
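If we ever change our minds, though, a tiny helper would do
(hypothetical name, same temp-file-plus-rename pattern as in the
patch; os and tempfile are already imported by the module):

    def _write_file_atomically(mount_point, destination_path, data):
        """Write data via a temporary file on the same mount point.

        os.rename() within one file system is atomic, so readers never
        see a half-written metadata or preview file.
        """
        fh, temp_path = tempfile.mkstemp(dir=mount_point)
        try:
            os.write(fh, data)
        finally:
            os.close(fh)
        os.rename(temp_path, destination_path)

Writing the metadata would then be
_write_file_atomically(metadata['mountpoint'], metadata_path,
json.dumps(metadata_copy)), and the same for the preview.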
>> + metadata_path = os.path.join(metadata['mountpoint'],
>> + JOURNAL_METADATA_DIR,
>> + file_name + '.metadata')
>> + (fh, fn) = tempfile.mkstemp(dir=metadata['mountpoint'])
>> + os.write(fh, json.dumps(metadata_copy))
>> + os.close(fh)
>> + os.rename(fn, metadata_path)
>
> We might want to do the json.dumps() early so that trying to write an
> entry with invalid metadata fails right away instead of starting to
> overwrite parts of the entry.
Done.
>> + for ofile in old_files:
>> + if os.path.exists(ofile):
>> + try:
>> + os.unlink(ofile)
>> + except:
>> + pass
> Error message, EnvironmentError.
>
> [...]
>
> This function (_write_entry_on_external_device()) is rather long. Can we
> split it up into several functions, please? data, preview and metadata
> writing and entry renaming would be natural candidates.
I split out the renaming (I think it is clearer now); for the others I
think the extra argument passing would outweigh the benefit.
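The split-out rename could look roughly like this (sketch, not the
literal patch; it renames the data file and drops the stale companion
files, and the caller writes fresh ones for the new name afterwards):

    def _rename_entry_on_external_device(file_path, destination_path,
                                         metadata_dir_path):
        """Rename an entry on the external device.

        Drop the stale metadata and preview files of the old name; the
        caller writes fresh ones for the new name afterwards.
        """
        if file_path == destination_path:
            return
        os.rename(file_path, destination_path)
        old_fname = os.path.basename(file_path)
        for old_file in [os.path.join(metadata_dir_path,
                                      old_fname + '.metadata'),
                         os.path.join(metadata_dir_path,
                                      old_fname + '.preview')]:
            if os.path.exists(old_file):
                try:
                    os.unlink(old_file)
                except EnvironmentError:
                    logging.error('Could not remove %s', old_file)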
>> -def _get_file_name(title, mime_type):
>> +def get_file_name(title, mime_type):
> [...]
>> -def _get_unique_file_name(mount_point, file_name):
>> +def get_unique_file_name(mount_point, file_name):
>
> Aleksey is going to be delighted - one of his outstanding patches does
> the same thing. :)
Yes, I need to call them from "[PATCH] Convert Journal entries that have
been saved to a storage device in 0.82".
Thanks for the thorough review,
Simon