Lecture "Empirical Software Engineering" at EJCP'2015

by Martin Monperrus

This page presents the material related to the tutorial on Empirical Software Engineering that I'll give at EJCP'2015. The tutorial will be a mix of lecture, DIY and student presentations.

Don't hesitate to ask me questions in advance by email.


Course goals

The course is a success if:

Rules: Ask questions, interrupt me, you don't have to listen to me.

Course Notes

Introduction to Empirical Software Engineering: http://www.monperrus.net/martin/introduction-to-empirical-software-engineering.pdf


At the end of the day, you will be asked to present an article about empirical research. You can do the exercise alone or in pairs.

Read an empirical software engineering paper (paper choice discussed below) so as to answer the following questions:

You can plan one slide per question. The presentation is expected to last 5 minutes, plus questions and discussion afterwards.

Paper choice:

Do It Yourself (DIY)

Source code

GitHub Java Corpus: http://groups.inf.ed.ac.uk/cup/javaGithub/ (1.8GB)

UCI Source Code Data Sets: http://www.ics.uci.edu/~lopes/datasets/ (80GB, 390GB, )

Qualitas Corpus: http://qualitascorpus.com/download/

Questions and tools

What is the distribution of file sizes? Why? Tool: wc on source files
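As a rough illustration of the file-size question, the sketch below counts lines per source file and summarizes the distribution. The function names, the default ".java" suffix and the choice of summary statistics are assumptions made for this example; note also that it counts a final line without a trailing newline, which wc -l does not.

```python
import statistics
from pathlib import Path


def line_counts(root, suffix=".java"):
    """Return the number of lines of every file under root whose name ends with suffix."""
    counts = []
    for path in Path(root).rglob("*" + suffix):
        if path.is_file():
            # Read in binary mode so that odd encodings do not break the count.
            with open(path, "rb") as handle:
                counts.append(sum(1 for _ in handle))
    return counts


def summarize(sizes):
    """Summarize a non-empty list of file sizes (in lines)."""
    ordered = sorted(sizes)
    return {
        "files": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
    }
```

Calling summarize(line_counts("src")) on a checked-out project gives a first impression; a mean far above the median hints at a skewed, long-tailed distribution, which is worth plotting on a log scale.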

What is the distribution of dependencies? Do the distributions of incoming and outgoing dependencies differ? Tool: for Java, DependencyFinder (./DependencyExtractor -xml foo.jar)
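Once dependencies have been extracted, comparing incoming and outgoing distributions is a matter of counting degrees in a directed graph. The sketch below assumes the dependencies have already been converted to (source, target) pairs; that conversion from the DependencyFinder XML output is not shown, and the function names are made up for this example.

```python
from collections import Counter


def degree_counts(edges):
    """Count outgoing and incoming dependencies per node from (source, target) pairs."""
    outgoing = Counter()
    incoming = Counter()
    # Deduplicate so that a dependency reported twice is counted once.
    for source, target in set(edges):
        outgoing[source] += 1
        incoming[target] += 1
    return outgoing, incoming


def distribution(degrees):
    """Map each degree value to the number of nodes having that degree.

    Nodes with degree zero never appear in the counters, so they are not reported.
    """
    return dict(sorted(Counter(degrees.values()).items()))
```

For example, degree_counts([("A", "B"), ("A", "C"), ("B", "C")]) reports that A has two outgoing dependencies and C has two incoming ones.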

How are the code repository and the bug repository linked? Tool: git log (git log --oneline | grep MATH- | less)
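The grep above only shows matching commits. To study the linking itself, it helps to build a map from issue identifiers to commits. The following sketch assumes the input is the text printed by git log --oneline and that issue identifiers look like a project key followed by a dash and a number (here the MATH- prefix of the grep example); the function name is hypothetical.

```python
import re
from collections import defaultdict


def issue_links(oneline_log, project_key="MATH"):
    """Map each issue identifier (e.g. MATH-123) to the commits whose subject mentions it."""
    pattern = re.compile(r"\b" + re.escape(project_key) + r"-\d+\b")
    links = defaultdict(list)
    for line in oneline_log.splitlines():
        # Each line is "<abbreviated hash> <subject>".
        commit, _, subject = line.partition(" ")
        for issue in sorted(set(pattern.findall(subject))):
            links[issue].append(commit)
    return dict(links)
```

Dividing the number of commits that appear in the result by the total number of commits gives the proportion of linked commits, one possible measure of how tightly both repositories are connected.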

What is the code ownership map of software X? Tool: git log (for i in `find src/ -name "*.java"`; do echo `git log --pretty=%ce "$i" | sort -u | wc -l` "$i"; done | sort -n)
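The shell loop runs git log once per file. A possible alternative, sketched below, parses a single git log run. It assumes the log was produced with git log --name-only --pretty='format:@@ %ce' -- src, where the "@@ " marker is an arbitrary choice made for this example. Unlike the loop, it also reports files that no longer exist and does not follow renames.

```python
from collections import defaultdict

MARKER = "@@ "  # arbitrary prefix chosen for this sketch, see the git command above


def owners_per_file(log_text, marker=MARKER):
    """Map each file path to the set of committer emails that touched it."""
    owners = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        if line.startswith(marker):
            # A new commit starts: remember its committer email.
            current = line[len(marker):]
        elif line.strip() and current is not None:
            # Any other non-empty line is a path changed by the current commit.
            owners[line].add(current)
    return dict(owners)


def ranking(owners):
    """Return (number of distinct committers, path) pairs, fewest committers first."""
    return sorted((len(emails), path) for path, emails in owners.items())
```

Files at the top of the ranking have a single committer, files at the bottom are shared by many people.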

What is the distribution of commit sizes? Tool: git log (see the documentation of --pretty)
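One way to obtain commit sizes, sketched here under assumptions, is to combine --pretty with the --numstat option of git log, which prints added and deleted line counts per file. The code assumes the log was produced with git log --numstat --pretty='format:@@ %h'; the "@@ " marker and the definition of size as added plus deleted lines are choices made for this example.

```python
def commit_sizes(log_text, marker="@@ "):
    """Map each commit to its size, defined here as added plus deleted lines."""
    sizes = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith(marker):
            current = line[len(marker):]
            sizes[current] = 0
            continue
        # numstat lines look like "<added>\t<deleted>\t<path>".
        fields = line.split("\t")
        if current is None or len(fields) < 3:
            continue
        added, deleted = fields[0], fields[1]
        # Binary files are reported as "-" and are skipped.
        if added.isdigit() and deleted.isdigit():
            sizes[current] += int(added) + int(deleted)
    return sizes
```

Sorting the resulting values and looking at the median and the largest commits usually tells more than the mean, since a few very large commits tend to dominate.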

What is the uniqueness of source code? Tool: njrams https://github.com/monperrus/njrams
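To make the notion of uniqueness concrete, here is a minimal sketch that measures the fraction of distinct token n-grams occurring exactly once in a corpus. This is one possible operationalization chosen for illustration, not necessarily what njrams computes; the crude regular-expression tokenizer and all function names are assumptions.

```python
import re
from collections import Counter

# Crude tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    return TOKEN.findall(source)


def ngrams(tokens, n):
    """Return all contiguous token sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unique_fraction(sources, n=3):
    """Fraction of distinct n-grams that occur exactly once across all sources."""
    counts = Counter()
    for source in sources:
        counts.update(ngrams(tokenize(source), n))
    if not counts:
        return 0.0
    singletons = sum(1 for count in counts.values() if count == 1)
    return singletons / len(counts)
```

A value close to 1 means that almost every sequence of n tokens is seen only once; lower values indicate more repetition. Varying n shows at which granularity code starts to become unique.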

What is the naturalness of source code? Tool: njrams https://github.com/monperrus/njrams
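Naturalness is commonly measured as the cross-entropy of code under a statistical language model: the lower the value, the more predictable the code. The sketch below uses a bigram model with add-one smoothing over already tokenized code. The model choice, the smoothing and the function names are assumptions made for this example and are not taken from njrams.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect the counts needed for an add-one smoothed bigram model."""
    padded = ["<s>"] + list(tokens)
    context = Counter(padded[:-1])             # how often each token is followed by another
    bigram = Counter(zip(padded, padded[1:]))  # how often each pair of tokens occurs
    vocabulary = set(tokens) | {"<unk>"}
    return context, bigram, vocabulary


def cross_entropy(model, tokens):
    """Average number of bits per token needed to encode tokens under the model."""
    if not tokens:
        raise ValueError("cannot compute the cross-entropy of an empty sequence")
    context, bigram, vocabulary = model
    # Tokens never seen in training are mapped to a single unknown symbol.
    padded = ["<s>"] + [t if t in vocabulary else "<unk>" for t in tokens]
    total = 0.0
    for previous, word in zip(padded, padded[1:]):
        probability = (bigram[(previous, word)] + 1) / (context[previous] + len(vocabulary))
        total -= math.log2(probability)
    return total / len(tokens)
```

A typical experiment trains the model on most files of a project and computes the cross-entropy on the remaining ones, then compares the result with the same experiment on English text.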

What is the average depth of if statements? Tool: analyze the abstract syntax tree of C/C++/Java/C# code with srcML and the XML library of your choice: src2srcml --literal --operator --modifier `find src -name "*.java"`
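With the XML produced by srcML at hand, the nesting depth can be computed by walking the tree. The sketch below uses Python's standard xml.etree.ElementTree and assumes that if statements appear as elements whose local name is "if"; this element name may need adjusting to the srcML version in use, and else-if chains may show up as nested elements and inflate the depth.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from an ElementTree tag such as '{uri}if'."""
    return tag.rsplit("}", 1)[-1]


def if_depths(xml_text, element="if"):
    """Return the nesting depth of every matching element (1 means not nested)."""
    depths = []
    # Iterative traversal, to avoid hitting the recursion limit on deep trees.
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, depth = stack.pop()
        if local_name(node.tag) == element:
            depth += 1
            depths.append(depth)
        stack.extend((child, depth) for child in node)
    return depths


def average(values):
    return sum(values) / len(values) if values else 0.0
```

Here depth counts the enclosing if elements, the element itself included; average(if_depths(xml_text)) then answers the question for one srcML document.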
