[Sugar-devel] Analysing code changes, git diff

Sascha Silbe sascha-ml-reply-to-2010-3 at silbe.org
Thu Sep 9 08:08:32 EDT 2010


Excerpts from James Cameron's message of Thu Sep 09 03:24:36 +0200 2010:

> Sorry, I used tabify in emacs, so as to force the diff to show the
> entire function as changed.  I had said I "replaced tabs with spaces",
> but I should have said "replaced spaces with tabs".  This is only
> because I don't know how to drive git diff well enough.

This brings up something I have been brooding about for some time now:

The default output of "git diff" often isn't very useful for
understanding a patch or even just merging changes. It's a low-level
representation and, when comparing Python code, often not even a good
one.

I know git can be configured to use external "diff drivers" and "merge
drivers" (see also [6]), so maybe we can improve this. That's why I
started a new thread: Others might be aware of tools that can be used,
either as-is or after some integration work (or at least on their own).


There are two different things that bug me about "git diff" when applied
to the Sugar source code:


1. The built-in algorithm is generic (good) and always language-
   agnostic (bad). For Python code, this often results in:

   I. Choosing unsuitable borders.
      Example:

@@ -81,15 +82,19 @@ def _async_copy(self, file_path, destination_path, completion_cb,
                 unlink_src)
         async_copy.start()
 
-    def retrieve(self, uid, user_id, extension):
+    def has_data(self, object_id):
+        file_path = layoutmanager.get_instance().get_data_path(object_id)
+        return os.path.exists(file_path)
+
+    def retrieve(self, object_id, user_id, extension):
         """Place the file associated to a given entry into a directory
            where the user can read it. The caller is reponsible for
            deleting this file.
 
         """

      This mixes up the changes to retrieve (only signature changed) and
      adding a new method has_data(). When handling conflicts, the above
      output is unnecessarily hard to understand and resolve.
      What I would have liked to see instead is:

+    def has_data(self, object_id):
+        file_path = layoutmanager.get_instance().get_data_path(object_id)
+        return os.path.exists(file_path)
+
-    def retrieve(self, uid, user_id, extension):
+    def retrieve(self, object_id, user_id, extension):
         """Place the file associated to a given entry into a directory
            where the user can read it. The caller is reponsible for
            deleting this file.

         """


   II. Less useful "hunk headers".
       Default output from git diff:

@@ -54,7 +251,8 @@ class DataStore(dbus.service.Object):
                                         bus=dbus.SessionBus(),
                                         replace_existing=False,
                                         allow_replacement=False)
-        dbus.service.Object.__init__(self, bus_name, DS_OBJECT_PATH)
+        dbus.service.Object.__init__(self, bus_name, DS_OBJECT_PATH2)
+        self._api_v1 = DBusAPIv1(self)
 
         migrated = self._migrate()

      I can see that the change happens in class DataStore, but not
      in which function (line numbers and even context can vary
      massively in my version support branch).
      If I set diff=python in .gitattributes, I get:

@@ -54,7 +251,8 @@ def __init__(self, **options):
                                         bus=dbus.SessionBus(),
                                         replace_existing=False,
                                         allow_replacement=False)
-        dbus.service.Object.__init__(self, bus_name, DS_OBJECT_PATH)
+        dbus.service.Object.__init__(self, bus_name, DS_OBJECT_PATH2)
+        self._api_v1 = DBusAPIv1(self)
 
         migrated = self._migrate()

      OK, so it's in method __init__(), but of which class?
      Both combined would be great:

@@ -54,7 +251,8 @@ DataStore.__init__(self, **options):



2. What I actually want to see most of the time is some kind of
   "semantic diff". Something that shows me the refactorings that have
   taken place, like "factor out the following block of code from foo
   into _bar". I know there are tools to perform these kind of changes
   [1-3] but I don't know of anything that can analyse some source code
   and output the differences in terms of refactorings (and additional
   code changes). There's at least one tool that seems to do something
   along this line (though its output isn't exactly easy to read [4])
   but it's proprietary, Windows-only software [5].
   SSDDiff [7] looks fine and is open source, but only operates on XML.

   Research on this topic has been going on for at least two decades [8],
   but what I'm looking for is actual, usable tools, not papers.
   Something like SSDDiff operating on ASTs instead of XML would be
   awesome.


Thanks for taking the time to read my entire post. Looking forward to
any suggestion on what tools we can use to better analyse code changes.

Sascha (wishing for wiki formatting in EMails)

[1] http://bicyclerepair.sourceforge.net/
[2] http://www.eclipse.org/
[3] http://www.eclipse.org/articles/article.php?file=Article-Unleashing-the-Power-of-Refactoring/index.html
[4] http://www.semdesigns.com/Products/SmartDifferencer/CSmartDifferencerExample.html#CSharpSmartDifferencerOutput
[5] http://www.semdesigns.com/Products/SmartDifferencer/CSmartDifferencer.html
[6] http://idnotfound.wordpress.com/2009/05/09/word-by-word-diffs-in-git/
[7] http://ssddiff.alioth.debian.org/
[8] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.3377
--
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
Url : http://lists.sugarlabs.org/archive/sugar-devel/attachments/20100909/ed3dd992/attachment-0001.pgp 


More information about the Sugar-devel mailing list