Pointers on abstract syntax tree differencing algorithms and tools

by Martin Monperrus

This post presents papers and tools on semantic source code differencing. It is a special kind of tree differencing.

Unix diff and successors (CVS, GIT diff) are line-based. On the contrary, semantic source code diff work on the abstract syntax tree (AST) [1,2,3,4,5,6,11]. They are based on known algorithms [7,8,9,10].

AST differencing

Tools:

I recommend GumTree and for Java GumTree-Spoon (full disclosure: I’m one of the authors :-)

Tree differencing

JSON differencing

If one transforms an AST to JSON (for instance using shift-ast), one could use JSON diff tools for AST diff:

XML differencing

If one transforms an AST to XML (for instance using srcML), one could use XML diff tools for AST diff:

Tools:

Terminology

All papers on this subject tend to use one of the following terms:

Bibliography

  1. Raghavan et al., “Dex: a semantic-graph differencing tool for studying changes in large code bases”, 2004
  2. Neamtiu et al., “Understanding source code evolution using abstract syntax tree matching”, 2005
  3. Hashimoto and Mori, “Diff/TS: A Tool for Fine-Grained Structural Change Analysis”, 2008
  4. Fluri et al., “Change distilling:tree differencing for fine-grained source code change extraction”, 2007
  5. Sager et al., “Detecting similar Java classes using tree algorithms”, 2009
  6. NGuyen et al., “Operation-based, fine-grained version control model for tree-based representation”, 2010
  7. Zhang and Shasha, “Simple fast algorithms for the editing distance between trees and related problems”, 1989
  8. Chawathe et al., “Change detection in hierarchically structured information”, 1996
  9. Rönnau and Borghoff, XCC: change control of XML documents, 2010
  10. Bille, “A survey on tree edit distance and related problems”, 2005
  11. Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, Martin Monperrus, “Fine-grained and Accurate Source Code Differencing”, In Proceedings of the International Conference on Automated Software Engineering, 2014.
  12. Jonathan I. Maletic, Michael L. Collard, “Supporting Source Code Difference Analysis” 2004

Acknowledgements

JR Falleri contributed to this page

Tagged as: