This post lists papers and tools on semantic source code differencing, which is a special kind of tree differencing.
Semantic source code diff
Unix diff and its successors (CVS diff, git diff) are line-based. In contrast, semantic source code diff tools work on the abstract syntax tree (AST).
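To make the contrast concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not one of the tools listed in this post: the helper names `line_diff` and `same_ast` are made up for the example, and comparing `ast.dump` output only tells whether two ASTs are identical, not which edit operations separate them.

```python
import ast
import difflib

old_src = "def area(w, h):\n    return w * h\n"
# Same program, reformatted: spacing, a blank line, a comment, extra parentheses.
new_src = "def area(w,h):\n\n    # width times height\n    return (w * h)\n"

def line_diff(a, b):
    """Return the added/removed lines reported by a line-based unified diff."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

def same_ast(a, b):
    """Compare two programs structurally, ignoring layout and comments."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

textual_changes = line_diff(old_src, new_src)    # every line is reported as changed
structurally_equal = same_ast(old_src, new_src)  # the ASTs are identical
```

The line-based view reports a change on every line, while the AST view sees no change at all. A real semantic diff tool goes further and computes an edit script between two different trees.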
Papers:
- Dex: a semantic-graph differencing tool for studying changes in large code bases (Raghavan et al., 2004)
- Understanding source code evolution using abstract syntax tree matching (Neamtiu et al., 2005)
- Diff/TS: A Tool for Fine-Grained Structural Change Analysis (Hashimoto and Mori, 2008) (contributed by JR Falleri)
- Change distilling: tree differencing for fine-grained source code change extraction (Fluri et al., 2007)
- Detecting similar Java classes using tree algorithms (Sager et al., 2009)
- Operation-based, fine-grained version control model for tree-based representation (Nguyen et al., 2010)
I don't know of any comparative evaluation of those algorithms (do you?).
Tools:
- Eclipse Structure Compare
- Evolizer/ChangeDistiller: http://www.ifi.uzh.ch/seal/research/tools/changeDistiller.html
- Ydiff: http://github.com/yinwang0/ydiff, http://www.cs.indiana.edu/~yw21/ydiff.html (written in Racket/Scheme)
- GumTree: https://code.google.com/p/harmony/source/browse/gumtree?repo=vpraxis-maven
I've used ChangeDistiller to diff Java code and it works great. The demos of Ydiff look nice.
XML diff
If one transforms an AST to XML (for instance using srcML), one can use XML diff tools:
Tools:
- diffxml: http://diffxml.sourceforge.net/
- fc-xmldiff: http://fc-xmldiff.googlecode.com/
- diffx: http://www.topologi.com/diffx/
- xmldiff: http://www.logilab.org/project/xmldiff
- x-diff: http://pages.cs.wisc.edu/~yuanwang/xdiff.html
- tools mentioned at http://www.w3.org/wiki/XmlDiff
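As a rough idea of what "transforming an AST to XML" can look like, here is a small Python sketch. It is a stand-in written for this post and does not reproduce srcML's format or command line: the function `ast_to_xml` and the element layout (one element per node type, one wrapper element per field) are assumptions chosen for the example.

```python
import ast
import xml.etree.ElementTree as ET

def ast_to_xml(node):
    """Convert a Python AST node into an XML element (hypothetical layout)."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node: wrap it in an element named after the field.
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            # A list of children (e.g. a body of statements).
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(ast_to_xml(item))
                else:
                    ET.SubElement(wrapper, "value").text = repr(item)
        elif value is not None:
            # Plain values (identifiers, constants) become attributes.
            elem.set(field, repr(value))
    return elem

def to_xml_text(source):
    """Parse Python source and serialize its AST as an XML string."""
    return ET.tostring(ast_to_xml(ast.parse(source)), encoding="unicode")

xml_old = to_xml_text("x = 1 + 2")
xml_new = to_xml_text("x = 1 - 2")
```

The two resulting XML documents differ only in the operator element, so they could be handed to any of the XML diff tools above.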
Seminal papers on tree differencing
- Simple fast algorithms for the editing distance between trees and related problems (Zhang and Shasha, 1989) (contributed by JR Falleri)
- Change detection in hierarchically structured information (Chawathe et al., 1996) (contributed by JR Falleri)
See also: A survey on tree edit distance and related problems (Bille, 2005)
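For readers new to the topic, the ordered tree edit distance can be sketched with the plain forest recursion that surveys of the field describe. The code below is a minimal illustration with unit costs and memoization. It is not the optimized dynamic program of Zhang and Shasha, and the tree encoding `(label, children)` and the names `forest_distance` and `tree_distance` are assumptions made for this example.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees.
# A forest is a tuple of trees. All edit operations cost 1.

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (insert, delete, rename)."""
    if not f and not g:
        return 0
    if not f:
        _, kids = g[-1]
        # Insert the rightmost root of g; its children take its place.
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        _, kids = f[-1]
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + f_kids, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + g_kids) + 1,  # insert rightmost root of g
        # Match the two rightmost roots (rename if the labels differ).
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (f_label != g_label),
    )

def tree_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))

leaf = lambda label: (label, ())
renamed = tree_distance(("f", (leaf("a"), leaf("b"))), ("f", (leaf("a"), leaf("c"))))
unwrapped = tree_distance(("f", (("g", (leaf("a"),)),)), ("f", (leaf("a"),)))
```

Here `renamed` counts one rename (b to c) and `unwrapped` counts one deletion (the intermediate node g). This sketch is only practical for small trees.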
All papers on this subject tend to use one of the following terms: