This page presents the material for the tutorial on Empirical Software Engineering that I'll give at EJCP'2015. The tutorial will be a mix of lectures, DIY exercises and student presentations.
Don't hesitate to ask me questions in advance by email.
The course is a success if:
You come up with your own empirical research questions.
You make one fun DIY empirical software engineering experiment.
You give a good presentation about a cool empirical software engineering paper.
Rules: ask questions and interrupt me; you don't have to listen to me.
Introduction to Empirical Software Engineering: http://www.monperrus.net/martin/introduction-to-empirical-software-engineering.pdf
At the end of the day, you will be asked to present an article about empirical research. You can do the exercise alone or in pairs.
Read an empirical software engineering paper (paper choice discussed below) so as to answer the following questions:
What kind of empirical research is it?
What are the main research questions?
What software tools are used for performing the study? Are there ad-hoc tools that were specifically developed for the study?
What statistical techniques are used (if any)?
Is the result actionable (if yes, explain)?
Do you like/dislike the paper? Why?
Summary of the paper in one sentence?
You can plan one slide per question. The presentation is expected to last 5 minutes, plus questions and discussion afterwards.
Paper choice:
A paper cited in the course notes.
A paper from http://www.dblp.org/search/index.php?query=controlled%7Cstudy%7Cexperiment%7Cempiric%20ce:venue:ieee_trans_software_eng_tse_ (TSE papers with some keywords in the title).
Any empirical paper that fits your interests. Please ask me for validation.
Do It Yourself (DIY)
GitHub Java Corpus: http://groups.inf.ed.ac.uk/cup/javaGithub/ (1.8GB)
UCI Source Code Data Sets: http://www.ics.uci.edu/~lopes/datasets/ (80GB, 390GB)
Qualitas Corpus: http://qualitascorpus.com/download/
Questions and tools
What is the distribution of file sizes? Why? Tool: wc on source files
What is the distribution of dependencies? Do the distributions of incoming and outgoing dependencies differ? Tool: in Java, DependencyFinder (./DependencyExtractor -xml foo.jar)
How are the code repository and the bug repository linked? Tool: git log (git log --oneline | grep MATH- | less)
What is the code ownership map of software X? Tool: git log, counting the distinct committer emails per file: for i in `find src/ -name "*.java"`; do echo `git log --pretty=%ce "$i" | sort -u | wc -l` "$i"; done | sort -n
What is the distribution of commit sizes? Tool: git log (see the documentation of --pretty)
What is the average depth of if statements? Tool: analyze the abstract syntax tree of C/C++/Java/C# with srcML and the XML library of your choice (src2srcml --literal --operator --modifier `find src -name "*.java"`)
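For the file size question, here is a minimal Python sketch that goes one step beyond wc by collecting the line count of every source file and summarizing the distribution. The function names `line_counts` and `summarize` are hypothetical, and the `.java` suffix is only an assumption that matches the Java corpora listed above.

```python
from pathlib import Path
from statistics import median


def line_counts(root, suffix=".java"):
    """Return the sorted line counts of all files under root ending with suffix."""
    counts = []
    for path in Path(root).rglob("*" + suffix):
        if path.is_file():
            # Binary mode avoids decoding errors on files with odd encodings.
            # Unlike `wc -l`, a last line without a trailing newline is counted.
            with open(path, "rb") as handle:
                counts.append(sum(1 for _ in handle))
    return sorted(counts)


def summarize(counts):
    """Summarize a non-empty list of file sizes (in lines)."""
    return {
        "files": len(counts),
        "min": min(counts),
        "median": median(counts),
        "mean": sum(counts) / len(counts),
        "max": max(counts),
    }
```

Calling `summarize(line_counts("src"))` on a checked-out project gives a first impression of the distribution. Comparing the mean with the median is a quick way to see whether the distribution is skewed.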
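For the question on linking the code repository to the bug repository, the grep on `MATH-` can be turned into a small measurement. The sketch below assumes that commit messages mention issues as `<PROJECT>-<number>` (as in the MATH- example above) and that the input is the text printed by `git log --oneline`. The function names are hypothetical.

```python
import re


def linked_issues(oneline_log, project="MATH"):
    """Map each commit hash to the issue keys mentioned in its subject line."""
    pattern = re.compile(r"\b" + re.escape(project) + r"-\d+\b")
    links = {}
    for line in oneline_log.splitlines():
        # Each line of `git log --oneline` is "<abbreviated hash> <subject>".
        commit, _, subject = line.partition(" ")
        keys = pattern.findall(subject)
        if keys:
            links[commit] = keys
    return links


def link_ratio(oneline_log, project="MATH"):
    """Fraction of commits whose subject mentions at least one issue key."""
    commits = [line for line in oneline_log.splitlines() if line.strip()]
    if not commits:
        return 0.0
    return len(linked_issues(oneline_log, project)) / len(commits)
```

The ratio only captures links that developers wrote explicitly in the subject line, so it should be read as a lower bound on the actual linking.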
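For the commit size question, one possible approach (an assumption of this sketch, not the only option) is to combine `--pretty` with `--numstat`, which makes git log print the added and deleted line counts per file. The `@` prefix used to mark commit lines and the function names are arbitrary choices made for this example.

```python
import subprocess


def parse_numstat(log_text):
    """Map each commit hash to its size in changed lines (added + deleted)."""
    sizes = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Commit header line produced by --pretty=format:@%H
            current = line[1:]
            sizes[current] = 0
        elif line.strip() and current is not None:
            added, deleted, _path = line.split("\t", 2)
            # Binary files are reported as "-" and are skipped here.
            if added != "-":
                sizes[current] += int(added) + int(deleted)
    return sizes


def commit_sizes(repo="."):
    """Run git log in repo and return the size of every commit."""
    result = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:@%H"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Merge commits show no numstat lines by default, so they appear with size 0. Whether to keep or drop them is a decision worth discussing when interpreting the distribution.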
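For the if-depth question, the srcML output can be processed with Python's standard `xml.etree.ElementTree`. The sketch below assumes that if statements appear as elements whose local name is `if`, and it ignores the XML namespace because the namespace URI differs between srcML versions. Depending on the srcML version, an else-if chain may or may not be represented as nested elements, so check the output of your version first. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def if_depths(root):
    """Return the nesting depth of every if element (1 means not nested)."""
    depths = []
    stack = [(root, 0)]  # explicit stack instead of recursion for deep trees
    while stack:
        element, depth = stack.pop()
        if local_name(element.tag) == "if":
            depth += 1
            depths.append(depth)
        stack.extend((child, depth) for child in element)
    return depths


def average_if_depth(xml_text):
    """Average nesting depth of if elements in a srcML document (0.0 if none)."""
    depths = if_depths(ET.fromstring(xml_text))
    return sum(depths) / len(depths) if depths else 0.0
```

Here depth counts the number of enclosing `if` elements including the statement itself, which is one possible definition among several. Counting only the maximum depth per method would be another reasonable choice.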