[Sugar-devel] [PATCH sugar-0.84] journal scan of external media
James Cameron
quozl at laptop.org
Mon Sep 13 05:19:07 EDT 2010
On Fri, Sep 10, 2010 at 05:22:58PM +0200, Sascha Silbe wrote:
> Excerpts from James Cameron's message of Thu Sep 09 03:24:36 +0200 2010:
> > > > + if None not in [self._date_start, self._date_end] and \
> > > > + (stat.st_mtime < self._date_start or
> > > > + stat.st_mtime > self._date_end):
> > > > + add_to_list = False
> > >
> > > How about either checking just self._date_start or splitting it up into
> > > two separate rules?
> >
> > Don't understand, sorry.
>
> I mean either:
>
> if self._date_start is not None and self.st_mtime < self._date_start:
> return True
> if self._date_end is not None and self.st_mtime > self._date_end:
> return True
Thanks, taken.
> [ENOTDIR during os.listdir()]
> > The reproducible instance is /media/disk/dev/fd/24 ... since fd is a
> > symlink to /proc/self/fd ... (awesome isn't it? One can insert a USB key
> > that will list the contents of any directory on the laptop, provided
> > access is allowed.)
>
> Ah, it's good that came up during testing, didn't think about it. We
> should make sure symlinks don't point outside the filesystem by resolving
> them (os.path.abspath()) and checking against the root path.
Agreed. ENOTDIR removed and os.path.abspath added.
> > os.lstat of an entry returned by os.listdir
> [...]
> > exception EIO a total of 7 times
>
> Interesting. Was the storage medium broken in some way (read error)?
> I don't even see EIO as a documented error code for lstat(2).
Yes, storage medium is broken. It is a USB HDD with reallocated sectors
and no spare area sectors available, or with a pending unstable sector
waiting to be remapped. dmesg shows:
[ 5070.582713] sd 7:0:0:0: [sda] Unhandled sense code
[ 5070.582726] sd 7:0:0:0: [sda] Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[ 5070.582740] sd 7:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 5070.582757] sd 7:0:0:0: [sda] Add. Sense: No additional sense information
[ 5070.582774] end_request: I/O error, dev sda, sector 66846816
[ 5070.584114] EXT2-fs error (device sda1): ext2_get_inode: unable to read inode block - inode=2084987, block=8355848
The drive itself is reasonably old and should have completely failed
long ago.
> > > Why? The order in which os.listdir() returns the entries is undefined
> > > anyway.
> >
> > When one uses os.listdir and then reverses the order before going
> > through them, the resulting performance is low, since the files are
> > being accessed in reverse. Because I didn't know about pop(0), I had to
> > use .reverse() to maintain ordering.
>
> Ah, don't you love how undefined behaviour still affects performance?
I think that's quite expected, especially since performance is equally
undefined. Although the order may not be defined, there is likely to be
an ordering imposed by the underlying implementation.
> [_scan()]
> > + if len(self._pending_files) > 0:
> > + return self._scan_a_file()
> > +
> > + if len(self._pending_directories) > 0:
> > + return self._scan_a_directory()
>
> PEP-8 says we should use "if self._pending_*" instead (without len()).
>
> If we just call the functions and return True in _scan(), we don't
> need to ensure that we always return True in self._scan_a_*().
Thanks, taken both points.
On Fri, Sep 10, 2010 at 06:27:50PM +0200, Sascha Silbe wrote:
> Excerpts from James Cameron's message of Thu Sep 09 06:39:53 +0200 2010:
>
> > + if link == '.':
> > + return True
>
> > + if link.startswith('/') and full_path.startswith(link):
> > + return True
>
> Both shouldn't be necessary (will be catched by the loop detection).
>
> As mentioned in the other mail, we should also make sure we stay on the
> current device:
>
> if not os.path.abspath(full_path).startswith(self._mount_point):
> return True
Taken, with modification ... only need to check the absolute path of a
link after it is read, not before.
Repeated test on Sugar 0.84 with my set of working and broken devices.
The following diff is proposed for final review.
diff --git a/src/jarabe/journal/model.py b/src/jarabe/journal/model.py
index 50e8dc1..6b1011c 100644
--- a/src/jarabe/journal/model.py
+++ b/src/jarabe/journal/model.py
@@ -1,4 +1,4 @@
-# Copyright (C) 2007-2008, One Laptop Per Child
+# Copyright (C) 2007-2010, One Laptop per Child
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -16,10 +16,11 @@
import logging
import os
+import errno
from datetime import datetime
import time
import shutil
-from stat import S_IFMT, S_IFDIR, S_IFREG
+from stat import S_IFLNK, S_IFMT, S_IFDIR, S_IFREG
import traceback
import re
@@ -258,7 +259,9 @@ class InplaceResultSet(BaseResultSet):
BaseResultSet.__init__(self, query, cache_limit)
self._mount_point = mount_point
self._file_list = None
- self._pending_directories = 0
+ self._pending_directories = []
+ self._visited_directories = []
+ self._pending_files = []
self._stopped = False
query_text = query.get('query', '')
@@ -283,7 +286,10 @@ class InplaceResultSet(BaseResultSet):
def setup(self):
self._file_list = []
- self._recurse_dir(self._mount_point)
+ self._pending_directories = [self._mount_point]
+ self._visited_directories = []
+ self._pending_files = []
+ gobject.idle_add(self._scan)
def stop(self):
self._stopped = True
@@ -317,51 +323,100 @@ class InplaceResultSet(BaseResultSet):
return entries, total_count
- def _recurse_dir(self, dir_path):
+ def _scan(self):
if self._stopped:
+ return False
+
+ self.progress.send(self)
+
+ if self._pending_files:
+ self._scan_a_file()
+ return True
+
+ if self._pending_directories:
+ self._scan_a_directory()
+ return True
+
+ self.setup_ready()
+ self._visited_directories = []
+ return False
+
+ def _scan_a_file(self):
+ full_path = self._pending_files.pop(0)
+
+ try:
+ stat = os.lstat(full_path)
+ except OSError, e:
+ if e.errno != errno.ENOENT:
+ logging.exception(
+ 'Error reading metadata of file %r', full_path)
+ return
+
+ if S_IFMT(stat.st_mode) == S_IFLNK:
+ try:
+ link = os.readlink(full_path)
+ except OSError, e:
+ logging.exception(
+ 'Error reading target of link %r', full_path)
+ return
+
+ if not os.path.abspath(link).startswith(self._mount_point):
+ return
+
+ try:
+ stat = os.stat(full_path)
+
+ except OSError, e:
+ if e.errno != errno.ENOENT:
+ logging.exception(
+ 'Error reading metadata of linked file %r', full_path)
+ return
+
+ if S_IFMT(stat.st_mode) == S_IFDIR:
+ id_tuple = stat.st_ino, stat.st_dev
+ if not id_tuple in self._visited_directories:
+ self._visited_directories.append(id_tuple)
+ self._pending_directories.append(full_path)
+ return
+
+ if S_IFMT(stat.st_mode) != S_IFREG:
+ return
+
+ if self._regex is not None and \
+ not self._regex.match(full_path):
+ return
+
+ if self._date_start is not None and self.st_mtime < self._date_start:
+ return
+
+ if self._date_end is not None and self.st_mtime > self._date_end:
+ return
+
+ if self._mime_types:
+ mime_type = gio.content_type_guess(filename=full_path)
+ if mime_type not in self._mime_types:
+ return
+
+ file_info = (full_path, stat, int(stat.st_mtime))
+ self._file_list.append(file_info)
+
+ return
+
+ def _scan_a_directory(self):
+ dir_path = self._pending_directories.pop(0)
+
+ try:
+ entries = os.listdir(dir_path)
+ except OSError, e:
+ if e.errno != errno.EACCES:
+ logging.exception('Error reading directory %r', dir_path)
return
- for entry in os.listdir(dir_path):
+ for entry in entries:
if entry.startswith('.'):
continue
- full_path = dir_path + '/' + entry
- try:
- stat = os.stat(full_path)
- if S_IFMT(stat.st_mode) == S_IFDIR:
- self._pending_directories += 1
- gobject.idle_add(lambda s=full_path: self._recurse_dir(s))
-
- elif S_IFMT(stat.st_mode) == S_IFREG:
- add_to_list = True
-
- if self._regex is not None and \
- not self._regex.match(full_path):
- add_to_list = False
-
- if None not in [self._date_start, self._date_end] and \
- (stat.st_mtime < self._date_start or
- stat.st_mtime > self._date_end):
- add_to_list = False
-
- if self._mime_types:
- mime_type = gio.content_type_guess(filename=full_path)
- if mime_type not in self._mime_types:
- add_to_list = False
-
- if add_to_list:
- file_info = (full_path, stat, int(stat.st_mtime))
- self._file_list.append(file_info)
-
- self.progress.send(self)
-
- except Exception:
- logging.error('Error reading file %r: %s' % \
- (full_path, traceback.format_exc()))
-
- if self._pending_directories == 0:
- self.setup_ready()
- else:
- self._pending_directories -= 1
+ self._pending_files.append(dir_path + '/' + entry)
+ return
def _get_file_metadata(path, stat):
client = gconf.client_get_default()
--
James Cameron
http://quozl.linux.org.au/
More information about the Sugar-devel
mailing list