This post presents papers and tools on semantic source code differencing, which is a special kind of tree differencing.
Unix diff and its successors (CVS diff, Git diff) are line-based. In contrast, semantic source code differencing tools work on the abstract syntax tree (AST) [1,2,3,4,5,6,11]. They build on known tree differencing algorithms [7,8,9,10].
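To make the contrast concrete, here is a minimal sketch in Python using only the standard library (`difflib` for the line-based view, `ast` for the tree view). The two snippets and the variable names are made up for illustration, and comparing `ast.dump` output only tells whether two trees are identical; it is not what the tools below do to compute an edit script.

```python
import ast
import difflib

# Two versions that differ only in layout and redundant parentheses.
before = "def area(w, h):\n    return w * h\n"
after = "def area(w,\n         h):\n    return (w * h)\n"

# Line-based view: keep only the added/removed lines of a unified diff.
line_changes = [
    line
    for line in difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Tree-based view: ast.dump omits line/column attributes by default,
# so formatting-only edits produce the same dump.
same_tree = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))

print(len(line_changes), "changed lines according to the line diff")
print("identical ASTs:", same_tree)
```

The line diff flags every line as changed, while the two ASTs are identical, which is the kind of noise AST differencing is meant to avoid.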
AST differencing
Tools:
- GumTree [11] (language independent): https://github.com/GumTreeDiff/gumtree/
- GumTree for Java: https://github.com/SpoonLabs/gumtree-spoon-ast-diff/
- GumTree for C: https://github.com/GumTreeDiff/cgum
- ChangeNodes (for Java): https://github.com/ReinoutStevens/ChangeNodes/
- CLDiff (for Java): https://github.com/FudanSELab/CLDIFF
- treedifferencing (for Java): https://github.com/FAU-Inf2/treedifferencing
- ChangeDistiller (for Java): http://www.ifi.uzh.ch/seal/research/tools/changeDistiller.html
- LAS (for Java): https://github.com/thwak/LAS
- Ydiff (for Lisp): https://github.com/bartuer/ydiff
- APTED (generic tree distance): https://github.com/DatabaseGroup/apted
- IJM (for Java): https://github.com/VeitFrick/IJM
- truediff: https://gitlab.rlp.net/plmz/truediff
- difftastic: https://github.com/Wilfred/difftastic
- diffsitter: https://github.com/afnanenayet/diffsitter
- prettydiff: https://prettydiff.com/
I recommend GumTree and, for Java, GumTree-Spoon (full disclosure: I’m one of the authors :-) ).
Tree differencing
- treediff-rs (extracts differences between arbitrary data structures): https://github.com/Byron/treediff-rs
JSON differencing
If one transforms an AST to JSON (for instance using shift-ast), one could use JSON diff tools for AST diff:
- https://github.com/zgrossbart/jdd (JavaScript)
- https://github.com/benjamine/jsondiffpatch (JavaScript)
- https://github.com/andreyvit/json-diff (CoffeeScript)
- https://github.com/yudai/gojsondiff (Go)
- https://github.com/gnieh/diffson (Scala)
- https://github.com/xlwings/jsondiff (Python)
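The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Python's own `ast` module instead of shift-ast, and `to_jsonable` and `json_diff` are hypothetical helpers written for this post, not the API of any of the tools listed above.

```python
import ast
import json


def to_jsonable(node):
    """Convert a Python ast node into plain dicts, lists and scalars."""
    if isinstance(node, ast.AST):
        out = {"type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            out[name] = to_jsonable(value)
        return out
    if isinstance(node, list):
        return [to_jsonable(item) for item in node]
    return node  # str, int, None, ...


def json_diff(a, b, path="$"):
    """Yield (path, old, new) for every differing leaf or shape mismatch."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            yield from json_diff(a.get(key), b.get(key), f"{path}.{key}")
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from json_diff(x, y, f"{path}[{i}]")
    elif a != b:
        yield (path, a, b)


old_json = to_jsonable(ast.parse("x = a + 1"))
new_json = to_jsonable(ast.parse("x = a + 2"))

# The converted tree is ordinary JSON, so it could be fed to any JSON diff tool.
json.dumps(old_json)

changes = list(json_diff(old_json, new_json))
print(changes)
```

Note that this naive comparison is positional: inserting a statement at the top of a block makes every following list element look changed, which is one reason dedicated tree differencing algorithms compute a matching between nodes first.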
XML differencing
If one transforms an AST to XML (for instance using srcML), one could use XML diff tools for AST diff:
Tools:
- xcc: https://launchpad.net/xcc
- diffxml: http://diffxml.sourceforge.net/
- fc-xmldiff: http://fc-xmldiff.googlecode.com/
- diffx: http://www.topologi.com/diffx/
- xmldiff: http://www.logilab.org/project/xmldiff
- x-diff: http://pages.cs.wisc.edu/~yuanwang/xdiff.html
- Microsoft XML Diff: http://www.microsoft.com/en-us/download/details.aspx?id=24313
- nokogiri-diff: https://github.com/postmodern/nokogiri-diff
- tdiff: https://github.com/postmodern/tdiff
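As a rough illustration of the XML route, the following Python sketch compares two small XML trees with the standard `xml.etree.ElementTree` module. The XML snippets are hand-written and only resemble an AST serialization (they are not actual srcML output), and `xml_diff` is a hypothetical helper, far simpler than the tools listed above.

```python
import xml.etree.ElementTree as ET


def xml_diff(a, b, path=""):
    """Positional comparison of two elements.

    Returns a list of (path, kind, old, new) tuples.
    """
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        return [(here, "tag", a.tag, b.tag)]
    changes = []
    if a.attrib != b.attrib:
        changes.append((here, "attrib", dict(a.attrib), dict(b.attrib)))
    text_a, text_b = (a.text or "").strip(), (b.text or "").strip()
    if text_a != text_b:
        changes.append((here, "text", text_a, text_b))
    kids_a, kids_b = list(a), list(b)
    if len(kids_a) != len(kids_b):
        changes.append((here, "children", len(kids_a), len(kids_b)))
    for x, y in zip(kids_a, kids_b):
        changes.extend(xml_diff(x, y, here))
    return changes


old = ET.fromstring(
    "<assign><name>x</name>"
    "<binop op='+'><name>a</name><literal>1</literal></binop></assign>"
)
new = ET.fromstring(
    "<assign><name>x</name>"
    "<binop op='+'><name>a</name><literal>2</literal></binop></assign>"
)

changes = xml_diff(old, new)
print(changes)
```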
Terminology
Papers on this subject tend to use one of the following terms:
- tree editing distance / tree edit distance
- tree edit script
- change detection on trees
- tree difference / tree differencing
- tree matching
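To give a feel for what a tree edit distance measures, here is a small Python sketch of a restricted, top-down variant in which whole subtrees are inserted or deleted and nodes can be relabeled, all at unit cost per node. It is deliberately not the Zhang and Shasha algorithm [7] nor the algorithm of any tool above; the tuple encoding of trees and the function names are assumptions made for this example.

```python
from functools import lru_cache

# A tree is encoded as (label, (child, child, ...)); tuples keep it hashable.


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def top_down_distance(t1, t2):
    """Cost of turning t1 into t2 with relabels and subtree inserts/deletes."""
    (label1, kids1), (label2, kids2) = t1, t2
    relabel = 0 if label1 == label2 else 1
    n, m = len(kids1), len(kids2)
    # Sequence edit distance over the two child lists.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + size(kids1[i - 1])  # delete subtree
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + size(kids2[j - 1])  # insert subtree
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids1[i - 1]),
                d[i][j - 1] + size(kids2[j - 1]),
                d[i - 1][j - 1] + top_down_distance(kids1[i - 1], kids2[j - 1]),
            )
    return relabel + d[n][m]


# f(x) versus f(x, y): one inserted leaf.
old = ("call", (("name:f", ()), ("arg:x", ())))
new = ("call", (("name:f", ()), ("arg:x", ()), ("arg:y", ())))
distance = top_down_distance(old, new)
print(distance)
```

The sequence of operations that achieves this cost is what the literature calls a tree edit script.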
Bibliography
[1] Raghavan et al., “Dex: a semantic-graph differencing tool for studying changes in large code bases”, 2004
[2] Neamtiu et al., “Understanding source code evolution using abstract syntax tree matching”, 2005
[3] Hashimoto and Mori, “Diff/TS: A Tool for Fine-Grained Structural Change Analysis”, 2008
[4] Fluri et al., “Change distilling: tree differencing for fine-grained source code change extraction”, 2007
[5] Sager et al., “Detecting similar Java classes using tree algorithms”, 2009
[6] Nguyen et al., “Operation-based, fine-grained version control model for tree-based representation”, 2010
[7] Zhang and Shasha, “Simple fast algorithms for the editing distance between trees and related problems”, 1989
[8] Chawathe et al., “Change detection in hierarchically structured information”, 1996
[9] Rönnau and Borghoff, “XCC: change control of XML documents”, 2010
[10] Bille, “A survey on tree edit distance and related problems”, 2005
[11] Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, Martin Monperrus, “Fine-grained and Accurate Source Code Differencing”, In Proceedings of the International Conference on Automated Software Engineering, 2014
[12] Jonathan I. Maletic, Michael L. Collard, “Supporting Source Code Difference Analysis”, 2004
Acknowledgements
JR Falleri contributed to this page.