Research topics for a thesis or an internship

by Martin Monperrus

Are you a KTH student looking for a fun and challenging topic in software engineering? Do your Master's thesis / Bachelor's thesis under my supervision; or register for "Advanced Individual Course in Computer Science" with me as supervisor; or apply to be a research assistant (aka research amanuens), which is a 20% research job, where you get paid by KTH.

Are you a brilliant international student looking for an internship in a world-class research lab (remote internship included)?

Contact me by email if you want to join my group –Martin


     Topics in Program Repair
          Artificial Intelligence for Repair
               Study of the importance of pre-training for machine learning on code
               A Large-scale study of patch embedding (collaboration with Univ. Wisconsin-Madison)
               Automatic transfer of formatting with machine learning
               Deep-learning of variable relationships for automatic program repair
               Sequence-to-sequence machine learning for automatic program repair
          Program Analysis for Repair
               Comparing execution traces to identify buggy inputs.
               Automatic Repair of SonarQube Static Warnings.
          Artificial Software Developer on Github (Repairnator)
               An Artificial Bug Fixing Bot for Python.
               Automated stewardship of open-source projects.
               Automated understanding and usage of software development emoticons.
     Topics in Self-healing Software & Chaos Engineering
               Automatic Repair of Broken Websites due to Privacy-enhancing Plugins
               Preventing algorithmic DOS attacks with blackbox randomization.
     Topics in Software Testing
               Automatic Repair of Flaky Tests
               Automatic Renaming of Test Variables to Improve Maintainability
     Topics in Code Transformation
               Declarative Code Transformation with the Semantic Patch Language for Java
               Metamorphic code transformation for Java

Topics in Program Repair

Artificial Intelligence for Repair

Study of the importance of pre-training for machine learning on code

Supervisors: Martin Monperrus (KTH) / Hugo Mougard (source{d})

Circa 2014, the field of computer vision saw its first publications on efficient pre-training and transfer learning (https://en.wikipedia.org/wiki/Transferlearning). In natural language processing, a major result was published in 2018, with effective pre-training on generic tasks [2] (such as word or sentence completion) allowing models to achieve state-of-the-art results with minor changes to the base model. Today, there is a growing body of research on machine learning on code [3], with a variety of models and embeddings used for a variety of tasks (code completion, program repair, renaming, etc.). Yet, there is no research on the importance and feasibility of pre-training for those machine-learning-on-code tasks. The student will design, implement and perform an experiment to study the importance of pre-training for machine learning on code. The student will have access to the full code analysis stack of source{d} to run the experiments. Weekly progress reports with the source{d} engineers will be organized.

  1. Very Deep Convolutional Networks for Large-Scale Image Recognition

  2. Deep contextualized word representations

  3. A survey of machine learning for big code and naturalness

A Large-scale study of patch embedding (collaboration with Univ. Wisconsin-Madison)

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Many automatic bug-fix generation techniques rely on studying past patches. Recent work has proposed to embed source code changes in a real-valued vector space [1]. The student will implement and extend this embedding technique, in order to make it work on Java code and large patches [2]. The student will perform the experiment by running it on a scientific computing grid.

  1. Learning to represent edits (2018)

  2. An Empirical Study on Real Bug Fixes (2015).

Automatic transfer of formatting with machine learning

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: It is common practice to use and enforce a certain coding style in software projects. This can become a nightmare when one copies files from one project to another and the projects use different conventions. There is therefore a need to transfer the style of one project to files coming from potentially anywhere, written in any style. It may be possible to use machine learning to perform the transfer [1,2]. The student will devise, implement and evaluate an approach to automatically transfer coding style. The student will perform the experiments on a scientific computing grid.

  1. Learning Natural Coding Conventions (https://github.com/mast-group/naturalize)

  2. Towards a Universal Code Formatter through Machine Learning (https://github.com/antlr/codebuff)

Deep-learning of variable relationships for automatic program repair

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: Most patches don't introduce new variables; they only reuse existing variables and method calls. The student will set up and perform an experiment to use deep learning for mining variable relationships. The planned methodology is as follows: 1) extract a dataset of variable relations based on existing code 2) apply deep learning on this data and analyze the results 3) devise, implement and assess an extension of Nopol/Astor to use the mined information. The student will perform the experiment by running it on a scientific computing grid.

  1. ASTOR: A Program Repair Library for Java

  2. Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
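
Step 1 of the methodology (extracting a dataset of variable relations) can be illustrated with a minimal sketch. It is written in Python with the standard `ast` module only to stay self-contained; the actual experiment would target Java code. The notion of "relation" used here (two variables appearing in the same simple statement) and the names `variable_pairs` and `SAMPLE` are assumptions made for illustration, not part of the topic description.

```python
import ast
from collections import Counter
from itertools import combinations

def variable_pairs(source: str) -> Counter:
    """Count how often two variable names are used in the same simple statement."""
    pairs = Counter()
    for node in ast.walk(ast.parse(source)):
        # Restrict to simple statements so a pair means "used together on one line".
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.Return, ast.Expr)):
            names = sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})
            pairs.update(combinations(names, 2))
    return pairs

# Hypothetical input program, used only to exercise the extractor.
SAMPLE = """
def area(width, height):
    result = width * height
    return result

def perimeter(width, height):
    total = 2 * (width + height)
    return total
"""

pairs = variable_pairs(SAMPLE)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

The resulting co-occurrence counts are the kind of raw data a learning step could consume; a repair tool could then prefer candidate variables that frequently appear together with those already present in the buggy statement.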

Sequence-to-sequence machine learning for automatic program repair

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

A lot of automatic bug fixing generation techniques rely on slightly modifying the existing code. The student will devise and evaluate a new repair algorithm that will learn from past diffs, using sequence-to-sequence learning. The planned methodology is as follows: 1) set up a training and evaluation dataset based on diffs 2) devise, implement and assess a new repair algorithm based on this data. The student will perform the experiment by running it on a scientific computing grid.

  1. ASTOR: A Program Repair Library for Java

  2. An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation (ASE18)
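
Step 1 of the methodology (building a dataset from diffs) could start from something like the sketch below: it turns two versions of a file into (before, after) pairs of changed lines, the usual input/output shape for sequence-to-sequence training. It uses Python's standard `difflib`; the function name `changed_line_pairs` and the example snippets are hypothetical, and a real dataset would be mined from version-control history.

```python
import difflib

def changed_line_pairs(before: str, after: str):
    """Return (old_lines, new_lines) for every replaced hunk between two versions."""
    old, new = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only hunks where existing lines were rewritten.
        if tag == "replace":
            pairs.append(("\n".join(old[i1:i2]), "\n".join(new[j1:j2])))
    return pairs

# Hypothetical buggy and fixed versions of the same Java method.
BUGGY = """int max(int a, int b) {
    if (a < b) return a;
    return b;
}"""
FIXED = """int max(int a, int b) {
    if (a > b) return a;
    return b;
}"""

training_pairs = changed_line_pairs(BUGGY, FIXED)
for source, target in training_pairs:
    print("source:", source.strip())
    print("target:", target.strip())
```

Pure insertions and deletions are ignored in this sketch; whether to include them, and how much surrounding context to attach to each pair, are design decisions for the dataset.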

Program Analysis for Repair

Comparing execution traces to identify buggy inputs.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: When one uses test generation on a buggy program, one does not know whether the generated input or scenario is in the buggy input domain or in the correct input domain [1]. Assuming one has an execution trace of a buggy input, the idea is to estimate whether a new input is buggy or correct, by comparing two executions [2]. The goal of this thesis is to study, design and implement a machine-learning based system for measuring the likelihood of an execution trace to be buggy.

  1. Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System (EMSE 17)

  2. Identifying Patch Correctness in Test-Based Automatic Program Repair (ICSE 18)
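
As a rough illustration of comparing two executions, the sketch below scores how close a new trace is to a known buggy trace. It is a plain sequence-similarity baseline built on Python's `difflib`, not the machine-learning system the thesis aims for; the trace format (a list of event strings), the threshold value and all names are assumptions.

```python
import difflib

def trace_similarity(trace_a, trace_b) -> float:
    """Similarity in [0, 1] between two traces given as lists of event strings."""
    return difflib.SequenceMatcher(a=trace_a, b=trace_b, autojunk=False).ratio()

def looks_buggy(buggy_trace, new_trace, threshold=0.8) -> bool:
    """Flag the new input as likely buggy if its trace is close to the buggy one."""
    # The threshold is a hypothetical value; in practice it would be calibrated.
    return trace_similarity(buggy_trace, new_trace) >= threshold

# Hypothetical traces: each entry is one recorded event of an execution.
buggy = ["enter parse", "branch empty=True", "call head", "raise IndexError"]
same_path = ["enter parse", "branch empty=True", "call head", "raise IndexError"]
other_path = ["enter parse", "branch empty=False", "call head", "return"]

print(trace_similarity(buggy, same_path), looks_buggy(buggy, same_path))
print(trace_similarity(buggy, other_path), looks_buggy(buggy, other_path))
```

A learned model would replace the fixed threshold with a likelihood estimated from features of the two traces.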

Automatic Repair of SonarQube Static Warnings.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: There exist several systems for statically identifying problems (FindBugs, PMD, SpotBugs, SonarQube). The student will work on a system to automatically repair those issues. She will focus on SonarQube warnings. The repair algorithms will be code transformations written using the Spoon transformation library for Java.

  1. SonarQube rules explorer

  2. Automatic Software Repair: a Bibliography (CSUR 17)

  3. Are Static Analysis Violations Really Fixed? A Closer Look at Realistic Usage of SonarQube. Dataset for OSS organizations

Artificial Software Developer on Github (Repairnator)

Example of related work: https://medium.com/@martin.monperrus/human-competitive-patches-in-automatic-program-repair-with-repairnator-359042e00f6a.

An Artificial Bug Fixing Bot for Python.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: On Travis, the #1 language is Python (millions of builds, 4x more than Java). In order to increase the impact of Repairnator, the goal of this work is to implement a first prototype of Repairnator in Python. The student will devise, implement and evaluate an automatic repair system for Python and Travis CI.

  1. Human-competitive Patches in Automatic Program Repair with Repairnator

  2. How to Design a Program Repair Bot? Insights from the Repairnator Project

  3. https://github.com/Spirals-Team/repairnator/

Automated stewardship of open-source projects.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: On Github, certain projects are very popular but the maintainers do not have enough bandwidth to keep up with the pace of pull requests. Those projects literally die under too many pull requests. For those projects, the maintainers need a robot to help them merge pull requests. The student will devise, implement and evaluate an automated steward for software development projects. The steward will try to take over dead yet popular software projects.

  1. Human-competitive Patches in Automatic Program Repair with Repairnator

  2. How to Design a Program Repair Bot? Insights from the Repairnator Project

Automated understanding and usage of software development emoticons.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: Repairnator is a software development bot on Github [1,2]. It constantly monitors software bugs discovered during continuous integration of open-source software and tries to fix them automatically. If it succeeds in synthesizing a valid patch, Repairnator proposes the patch to the human developers, disguised under a fake human identity. A good Github developer is able to express her intentions and feelings with the appropriate usage of emoticons. Repairnator should be able to do the same: decorate its pull requests and comments with emoticons. The student will use the whole Github data to train a predictor of emoticons for software development.

  1. Human-competitive Patches in Automatic Program Repair with Repairnator

  2. How to Design a Program Repair Bot? Insights from the Repairnator Project

Topics in Self-healing Software & Chaos Engineering

Example of recent work in my group:

Automatic Repair of Broken Websites due to Privacy-enhancing Plugins

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: Users can improve their online privacy by installing privacy-enhancing plugins, such as uBlock. However, those plugins break important functionality of some websites [1]. The student will build on the ideas of BikiniProxy [2] to provide automatic repair of websites broken by privacy-enhancing technology. The student will design and implement the system (e.g. "uBlock-repair") either as a browser plugin or as a proxy.

  1. A comparison of web privacy protection techniques

  2. Fully Automated HTML and Javascript Rewriting for Constructing a Self-healing Web Proxy

Preventing algorithmic DOS attacks with blackbox randomization.

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: The goal of this thesis is to study counter-measures to algorithmic denial-of-service attacks. An algorithmic DOS consists of an input specifically designed by the attacker to trigger the worst-case execution of a program [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering algorithmic DOS attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.

  1. Denial of Service via Algorithmic Complexity Attacks

  2. Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation
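
To make the idea concrete, here is a small sketch of an algorithmic DOS and of one randomization point that defuses it. Quicksort with a fixed first-element pivot is a textbook example chosen for illustration, not a subject named in the topic; the cost measure (elements examined per partition) and all names are assumptions. The seeded generator is only there to make the run reproducible; a real defense would need randomness the attacker cannot predict.

```python
import random

def quicksort(items, choose_pivot, cost):
    """Return a sorted copy of items; cost[0] accumulates the elements examined."""
    if len(items) < 2:
        return list(items)
    cost[0] += len(items)
    pivot = items[choose_pivot(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, choose_pivot, cost) + equal + quicksort(larger, choose_pivot, cost)

def first_element(size):
    """Deterministic pivot choice: predictable, hence attackable."""
    return 0

def make_random_pivot(rng):
    """Randomization point: the pivot index is drawn at random."""
    return lambda size: rng.randrange(size)

# Attacker-crafted input: already sorted, worst case for the first-element pivot.
attack_input = list(range(200))

fixed_cost = [0]
fixed_result = quicksort(attack_input, first_element, fixed_cost)

random_cost = [0]
random_result = quicksort(attack_input, make_random_pivot(random.Random(0)), random_cost)

print("fixed pivot cost:", fixed_cost[0])
print("random pivot cost:", random_cost[0])
```

With the fixed pivot the cost grows quadratically on the crafted input, while the randomized variant produces the same sorted output at a much lower cost, because the attacker can no longer choose an input that is bad for every pivot sequence.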

Topics in Software Testing

Automatic Repair of Flaky Tests

Supervisors: Benoit Baudry, Martin Monperrus (KTH)

Description: Flaky tests are tests that fail in a non-deterministic way, which is a big problem in industry. Following the automatic repair philosophy, one can automatically repair some of them by improving sandboxing or virtualizing time. The student will design, implement and evaluate a prototype system for automatic repair of flaky tests.

  1. An empirical analysis of flaky tests (2014)

  2. Automatic Software Repair: a Bibliography (CSUR 17)
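
Virtualizing time can be sketched as follows: code that reads the wall clock directly makes tests depend on scheduling and sleeps, whereas an injectable clock lets the test advance time deterministically. The class names `SessionCache` and `FakeClock` are hypothetical, and the example shows the target shape of a repaired test by hand; it is not an automatic repair.

```python
import time

class SessionCache:
    """Tiny cache with expiry; the clock is injectable so tests can control time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        value, expires_at = self._entries.get(key, (None, 0.0))
        return value if self._clock() < expires_at else None

class FakeClock:
    """Virtual clock: time only moves when the test says so."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds

def test_entry_expires():
    clock = FakeClock()
    cache = SessionCache(ttl_seconds=30, clock=clock)
    cache.put("user", "alice")
    clock.advance(29)          # no real sleep, no timing jitter
    assert cache.get("user") == "alice"
    clock.advance(2)
    assert cache.get("user") is None

test_entry_expires()
```

A flaky version of the same test would call `time.sleep` and compare against the real clock; an automated repair could rewrite such tests to inject a virtual clock of this kind.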

Automatic Renaming of Test Variables to Improve Maintainability

Supervisors: Benoit Baudry, Martin Monperrus (KTH)

Description: In test code, it is very important to have good variable names, so that the test intention is clear, and so that the test is maintainable. Recent research has shown that we can use machine learning to predict good names in code. The student will work on applying the state-of-the-art technique in variable renaming to test code in C++ or Java. The work will be related to the EU H2020 project STAMP.

  1. code2vec: Learning Distributed Representations of Code (2018), implementation at https://github.com/tech-srl/code2vec.

  2. Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts (2018)

Topics in Code Transformation

Declarative Code Transformation with the Semantic Patch Language for Java

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: Code transformation is a powerful tool in dynamic software analysis [1]. Declarative transformations are easier to specify, understand and maintain. In that realm, the "Semantic Patch Language (SmPL)" is the state-of-the-art [2]. The student will implement SmPL for Java. The interpretation engine will be made in the Spoon library [1].

  1. Spoon: A Library for Implementing Analyses and Transformations of Java Source Code

  2. SmPL: A Domain-Specific Language for Specifying Collateral Evolutions in Linux Device Drivers
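
To give a feel for what "declarative" means here, the sketch below expresses a transformation as a before/after pair with a metavariable, and lets a small generic engine do the matching and rewriting. It is written in Python with the standard `ast` module purely for illustration: it is neither SmPL syntax nor the Spoon API, and the `META_` naming convention, the `Rule` class and the example rule are all assumptions. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

def match(pattern, node, bindings):
    """Structural match; pattern names starting with 'META_' bind to any expression."""
    if isinstance(pattern, ast.Name) and pattern.id.startswith("META_"):
        bindings[pattern.id] = node
        return True
    if type(pattern) is not type(node):
        return False
    for field, expected in ast.iter_fields(pattern):
        actual = getattr(node, field, None)
        if isinstance(expected, ast.AST):
            if not isinstance(actual, ast.AST) or not match(expected, actual, bindings):
                return False
        elif isinstance(expected, list):
            if not isinstance(actual, list) or len(expected) != len(actual):
                return False
            for e, a in zip(expected, actual):
                ok = match(e, a, bindings) if isinstance(e, ast.AST) else e == a
                if not ok:
                    return False
        elif expected != actual:
            return False
    return True

class Substitute(ast.NodeTransformer):
    """Replace metavariable names in the 'after' template by the bound nodes."""

    def __init__(self, bindings):
        self.bindings = bindings

    def visit_Name(self, node):
        return self.bindings.get(node.id, node)

class Rule(ast.NodeTransformer):
    """A declarative rule: rewrite every expression matching 'before' into 'after'."""

    def __init__(self, before, after):
        self.before = ast.parse(before, mode="eval").body
        self.after = after

    def visit(self, node):
        node = self.generic_visit(node)  # rewrite children first
        bindings = {}
        if isinstance(node, ast.expr) and match(self.before, node, bindings):
            template = ast.parse(self.after, mode="eval").body
            return Substitute(bindings).visit(template)
        return node

    def apply(self, source):
        tree = self.visit(ast.parse(source))
        return ast.unparse(ast.fix_missing_locations(tree))

# The whole transformation is specified by this one line.
rule = Rule("META_E == None", "META_E is None")
rewritten = rule.apply("if user.name == None:\n    reset(user)\n")
print(rewritten)
```

The point of the design is that the rule author writes only the before/after pair; matching, binding and rewriting live in the engine. The sketch does not check that a metavariable used twice in a pattern binds to the same code, which a real engine would need to do.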

Metamorphic code transformation for Java

Supervisor: Martin Monperrus, KTH Royal Institute of Technology, EECS/TCS

Description: Code transformation is a powerful tool in dynamic software analysis. It can be used as source code transformation or binary code transformation. For the programmer, it is much easier to write a transformation at the source code level, because she is familiar with the language constructs. However, for applicability, it is better to be able to apply the transformations at the binary level. The student will design a transformation system such that the transformations are applicable on source code or binary code interchangeably. This will be done in the context of Java, which means that the transformation system will be able to work both on Java source code and on JVM bytecode. The student will study different options, including: 1) compiling a source code transformation in Spoon to a binary code transformation in ASM/Javassist 2) applying the transformation after decompilation.

  1. Spoon: A Library for Implementing Analyses and Transformations of Java Source Code
