[Sugar-devel] [PATCH sugar-0.84] journal scan of external media

Sascha Silbe sascha-ml-reply-to-2010-3 at silbe.org
Wed Sep 8 11:32:39 EDT 2010


Excerpts from James Cameron's message of Wed Sep 08 09:00:31 +0200 2010:

> I've reworked it to be as asynchronous as I can make it without moving
> into a separate thread or process.

Nice!

> The patch below is a rewrite of the function _recurse_dir, and I've
> replaced tabs with spaces only because the indenting changes make the
> patch hard to read otherwise.

Are you sure you used spaces? The patch looked strangely formatted in
both my MUA and my editor.

> I'm not asking for the patch to be accepted as is, I presume I'd be
> asked to break it down into ten different patches so that each change
> is explained in a different way to how my thought processes proceeded.

<g>

> I think it needs to be split into smaller functions, and it doesn't
> address the issue of relative recursive links.

OK.

> +    def _recurse_dir(self):
> +    if self._stopped:
> +        return False
> +
> +        except OSError, e:
> +        if e.errno not in [errno.ENOENT, errno.EPERM]:
> +            logging.error('Error reading metadata of file %r: %s' %
> +                  (full_path, traceback.format_exc()))

You can simplify this by using logging.exception:

               logging.exception('Error reading metadata of file %r',
                   full_path)

While we're at it, let's improve on the original code:

> +        if self._regex is not None and \
> +            not self._regex.match(full_path):
> +            add_to_list = False

All following ifs should be elif to avoid further examining entries that
get skipped anyway. Especially for the MIME type comparison it will make
a huge difference as it scans the file content.

> +        if None not in [self._date_start, self._date_end] and \
> +            (stat.st_mtime < self._date_start or
> +             stat.st_mtime > self._date_end):
> +            add_to_list = False

How about either checking just self._date_start or splitting it up into
two separate rules?

> +        if self._mime_types:
> +            mime_type = gio.content_type_guess(filename=full_path)

We should use a single way to derive MIME types (sugar.mime) in our code
to avoid using different MIME types in different parts of Sugar.

Back to your code:

> +    if len(self._pending_directories) > 0:
> +        dir_path = self._pending_directories[0]
> +        self._pending_directories.remove(dir_path)

Why not .pop(0)?

> +        try:
> +        entries = os.listdir(dir_path)
> +        except OSError, e:
> +        if e.errno not in [errno.ENOENT, errno.EACCES,
> +                   errno.EPERM, errno.ENOTDIR]:
> +            logging.error('Error reading directory %r: %s' %
> +                  (dir_path, traceback.format_exc()))

logging.exception() again. Also should we really ignore ENOTDIR?
I don't see any way we could trigger ENOTDIR. If a symlink is broken,
we won't recognise it as a directory, even if it was meant to point to
a directory.

> +        self._pending_files.append(dir_path + '/' + entry)

What's the memory footprint of the two lists? I'm wondering whether it
might make sense to store only the relative path in self._pending_files
and construct the full path for each individual file. Would need to
either save the name of the current directory in an additional attribute
or adjust the code to remove the directory only after it was processed.

> +        self._pending_files.reverse()

Why? The order in which os.listdir() returns the entries is undefined
anyway.

Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
Url : http://lists.sugarlabs.org/archive/sugar-devel/attachments/20100908/5b42daeb/attachment-0001.pgp 


More information about the Sugar-devel mailing list