Research topics for a thesis or an internship

by Martin Monperrus

Under my supervision, you can do cool research in software technology, here are our current hot topics.

Are you a KTH student? See Master's thesis / Bachelor's thesis guidelines and contact me by email

Are you a brilliant international student? Contact me by email


Category Code Transformation
     Source Code
               Diverse Double Compilation for Bitcoin
     Binary Code
               Automated Software Diversity via Multi-compilation
               Patch Presence Assessment on Binary Code
               Decompilation for WebAssembly/WASM using Datalog
Category Program Repair
     Machine Learning for Program Repair
               Trace Analysis for Patch Generation
               Unsupervised Learning for Program Repair (Sequencer)
               Automatic transfer of formatting with machine learning
     Program Analysis for Automatic Repair
               Program Repair with Datalog
               Automatic repair of unreproducible builds (Repairnator)
               Automatic ranking of pull-requests for merging or rejecting
     Generate-and-validate program repair
               Automatic Program Repair of Bears with Repair Templates in Astor
               Explainable repair templates for Astor
               Automatic Repair of OCRed programs
Category Program Hardening
     Chaos Engineering
               Master's thesis Chaos Engineering at Electrolux
               Maximum Likelihood Perturbations for Chaos Engineering
               Chaos Engineering for Microservices with Resilience4J
     Randomization & Self-Healing
               Self-healing capabilities for Deployed WebAssembly
               Preventing cache attacks on AES with blackbox randomization
               Preventing algorithmic DOS attacks with blackbox randomization
Category Software Testing
               Collection and Analysis of Code Coverage in Production
               Automatic Identification of Pseudo-tested Conditions
               Automatic Debloat of Test Suites
               Automatic Repair of Rotten Green Tests

Category Code Transformation

Source Code

See also topics on the Spoon repository: https://github.com/INRIA/spoon/labels/mastersthesis

Diverse Double Compilation for Bitcoin

Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology

Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use doule diverse compilation to mitigate the problem [3] Yet it is hard to change the compilation pipeline of real software. You will design, implement and evaluate diverse double compilation for bitcoin-core.

  1. Reflection on Trusting trust 1983

  2. Defending Against Compiler-Based Backdoors

  3. Countering trusting trust through diverse double-compiling

Binary Code

Automated Software Diversity via Multi-compilation

Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology

Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use software diversity [3]. For instance, one can compile a C program with both GCC and CLANG. You will design, implement and evaluate a multi-compiler scheme for C.

  1. Reflection on Trusting trust 1983

  2. Defending Against Compiler-Based Backdoors Java Decompiler Diversity and its Application to Meta-decompilation

Patch Presence Assessment on Binary Code

Supervisors: Martin Monperrus

Description: It is essential to know whether a specific binary contains all required vulnerability fixes. For: we need to components: 1) extracting signatures of vulnerability fixes 2) checking for the presence of those fixes in a binary. The student will work on those components in the context of Java code.

  1. Patch Based Vulnerability Matching for Binary Programs

  2. BScout: Direct Whole Patch Presence Test for Java Executables

Decompilation for WebAssembly/WASM using Datalog

A decompiler takes compiled binary code and produces textual source code. Decompilation is an essential step for program comprehension, security analyses, etc. However, it is challenging to write an accurate decompiler (that can always reconstruct the source code that actually corresponds to the compiled code) and the implementation of decompilers currently relies on the careful, manual design of decompilation rules. Recently, researchers have proposed to use Datalog for decompilation [1]. In this work, you will will design and evaluate a decompiler for WebAssembly/WASM based on Datalog.

  1. Datalog disassembly

  2. Java Decompiler Diversity and its Application to Meta-decompilation

Category Program Repair

Machine Learning for Program Repair

You would commit in project https://github.com/KTH/sequencer.

Trace Analysis for Patch Generation

Supervision: Zimin Chen, Martin Monperrus (KTH)

Description: To reason about programs one can use source code or/and dynamic traces. Most works in machine learning on programs only use code [1]. However, the usage of traces is a very promising research direction [2,3]. You will invent a new patch generation systems, based on trace embeddings. The work will be done on top of recent works in the field [2,3], for generating patches in Java.

  1. A survey of machine learning for big code and naturalness

  2. Learning Blended, Precise Semantic Program Embeddings (2019)

  3. Learning to Encode and Classify Test Executions (2020)

Unsupervised Learning for Program Repair (Sequencer)

Supervision: Zimin Chen, Martin Monperrus (KTH)

Recently, it has been shown that one can use self-supervised learning to synthesize code [1,2]. This opens up many new possibilities for program repair. The goal of this work is to use self-supervised learning to learn how to fix one-line bugs [3].

  1. Generative Code Modeling with Graphs (2018

  2. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

  3. Graph-based Self-Supervised Program Repair from Diagnostic Feedback (2020)

Automatic transfer of formatting with machine learning

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: It is common practice to use and enforce a certain coding style in software projects. This can become a nightmare when one copies files from one project to another, where the project use different conventions. For this, there is a need to be able to transfer the style one project to files coming from potentially anywhere with any style. It may be possible to use machine learning to perform the transfer [1,2] The student will devise, implement and evaluate an approach to automatically transfer coding style. The student will perform the experiments a scientific computing grid.

  1. Learning Natural Coding Conventions (https://github.com/mast-group/naturalize)

  2. Towards a Universal Code Formatter through Machine Learning (https://github.com/antlr/codebuff)

Program Analysis for Automatic Repair

You would commit in project https://github.com/eclipse/repairnator.

Program Repair with Datalog

Supervision: Martin Monperrus (KTH)

Description: Static analysis tools are much used in industry to statically detect bugs and code smells [1]. It has been shown that anti-unification can be used to synthesize patches for static analysis warnings [2]. Datalog is a language that is becoming more and popular for program analysis due to its effectiveness [3]. You will design, implement and evaluate a program repair system to repair SonarQube static analysis warnings using anti-unification and Datalog.

  1. SonaQube-java

  2. Phoenix: Automated data-driven synthesis of repairs for static analysis violations

  3. Datalog disassembly

Automatic repair of unreproducible builds (Repairnator)

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: Achieving reproducible builds is essential for achieving a secure software supply chain. In particular, one must be sure that no malicious actor has injected malware during the build process. The Debian linux distribution has deployed this in the large. However, when a project build process is no reproducible or becomes unstable again, it can be really hard to find the root cause and fix it. You will design, implement and evaluate a system that automatically suggests repair actions for unreproducible build processes.

  1. Automated Localization for Unreproducible Builds (2018)

  2. Debian Reproducible Builds

Automatic ranking of pull-requests for merging or rejecting

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: On Github, certain projects have too many pull requests and the maintainers need a way to order the pull requests. The ranking could be according to their maturity, importance or likelihood to be merged or rejected. You will devise, implement and evaluate an approach for ranking pull-requests based the history of merged and rejected pull-requests.

  1. Wait for it: determinants of pull request evaluation latency on GitHub

  2. Improving the pull requests review process using learning-to-rank algorithms

Generate-and-validate program repair

Automatic Program Repair of Bears with Repair Templates in Astor

Supervisor: Martin Monperrus (KTH Royal Institute of Technology)

Description: In automatic program repair, template-based repair is an effective way to reduce programs with little overfitting [1]. The student will implement template-based repair in Astor [3], based on the consolidated list by Liu et al [1]. The student will design design and perform a large scale experiment of template-based repair on the BEARS benchmark [2], with quantitative and qualitative analysis.

  1. TBar: Revisiting Template-based Automated Program Repair

  2. Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies

  3. https://github.com/SpoonLabs/astor

Explainable repair templates for Astor

Supervisor: Martin Monperrus (KTH Royal Institute of Technology), Matias Martinez (University of Valenciennes)

Description: In practice, template-based repair is promising, because they are very easy to explain to the developer and they also provide reduced overfitting. The student will design and implement a full-fledged template based system for Astor, based on the latest literature on repair templates. A large scale evaluation will be done on the novel repair benchmarks Bugs.jar and BEARS. we can play with the vocabulary to have pure language based templates and project-specific ones

  1. Explainable Software Bot Contributions: Case Study of Automated Bug Fixes

  2. Revisiting template-based automated program repair

  3. FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

  4. https://github.com/SpoonLabs/astor

Automatic Repair of OCRed programs

Supervisor: Martin Monperrus (KTH Royal Institute of Technology)

Description: In difficult situations, programs have to be transferred with other means than electronic channels. For instance, in the 90ies, the code of PGP had to be sent outside the USA as a printed book [1]. Yet, performing OCR on programs is a challenge, because a single incorrectly recognized character can make the program non compilable or non executable. For this, automatic repair of OCR errors can be used [2] or one can encode the program differently [3]. The goal of this thesis is to push the state of the art of automatic repair of OCRed programs.

  1. Scanning the source code

  2. Tools for Publishing Source Code via OCR

  3. How to get perfect OCR of digital data with 100% accuracy?

Category Program Hardening

See also https://github.com/KTH/royal-chaos.

Chaos Engineering

Master's thesis Chaos Engineering at Electrolux

See https://www.linkedin.com/jobs/view/2260031591/.

Maximum Likelihood Perturbations for Chaos Engineering

Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology

Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Our group is doing research in this field, in the context of Java applications [2]. Today, perturbation models are naive: they are either random or based on enumeration. You will study, design and implement a chaos engineering algorithm where each perturbation is chosen according to a utility function in order to maximize the knowledge gained from each perturbation.

  1. Chaos Engineering (the book)

  2. A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM

Chaos Engineering for Microservices with Resilience4J

Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology

Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Resilience4J [2] is a notable middleware for microservices and DevOps orginally done by Netflix. You will study, design and implement a chaos engineering framework for Resilience4J.

  1. Chaos Engineering (the book)

  2. https://github.com/resilience4j/resilience4j

  3. A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM

Randomization & Self-Healing

Self-healing capabilities for Deployed WebAssembly

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: Over the last few years, the complexity of web applications has extraordinarily increased and WebAssembly has been introduced to overcome performance issues with Javascript. You will design and implement a system that provides self-healing capabilities for web applications with WebAssembly code. The self-healing techniques will be inspired from [1].

  1. Fully Automated HTML and Javascript Rewriting for Constructing a Self-healing Web Proxy

  2. Assure: automatic software self-healing using rescue points

  3. From hack to elaborate technique—a survey on binary rewriting

Preventing cache attacks on AES with blackbox randomization

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: The goal of this thesis is to study counter-measures to cache attacks on AES. A cache attack consists of inferring a private key by means of cache observation [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering cache attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.

  1. Efficient Cache Attacks on AES, and Countermeasures

  2. Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation

Preventing algorithmic DOS attacks with blackbox randomization

Supervisor: Martin Monperrus, KTH Royal Institute of Technology

Description: The goal of this thesis is to study counter-measures to algorithmic denial of service attacks. An algorithmic DOS consists of a input specifically designed by the attacker to trigger the worst case execution of a program [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering algorithmic DOS attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.

  1. Denial of Service via Algorithmic Complexity Attacks

  2. Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation

  3. Slowfuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities

Category Software Testing

Collection and Analysis of Code Coverage in Production

Supervisors: Martin Monperrus (KTH)

Description: Code coverage usually relates to test code. Production code coverage is the coverage over real interactions made by users in production. Obtaining and analysing production code coverage enables to identify useless code as well as relevant test data and values. It enables testers and developers to better align the test intentions with what matters for users. The student will compare and analyze techniques for automatically collecting code coverage in production for Java software.

  1. Code Pulse: Real-time code coverage for penetration testing activities

  2. Measuring production code coverage with JaCoCo

  3. Perpetual testing

Automatic Identification of Pseudo-tested Conditions

Supervisors: Benoit Baudry, Martin Monperrus (KTH)

Description: One of the problems with coverage is that it does not detect unspecified code [1]. There is the same problem with conditions, some conditions are well covered according to edge coverage yet the code can actually take any branch without failing. The student will design, implement and evaluate a novel technique for automatically identifying pseudo-tested conditions in Java software.

  1. A Comprehensive Study of Pseudo-tested Methods

  2. https://github.com/STAMP-project/pitest-descartes/

Automatic Debloat of Test Suites

Supervisors: Benoit Baudry, Martin Monperrus (KTH)

Description: Today, test suites are really big. Not all tests are equally effective, some tests are useless, some tests overlap. Test code has to be maintained and having too much test code can be considered as technical debt. The student will design, implement and evaluate a novel technique for automatically debloating test suites in Java in order to increase test effectiveness.

  1. An empirical study of junit test-suite reduction

  2. Balancing trade-offs in test-suite reduction (2014)

Automatic Repair of Rotten Green Tests

Supervisors: Benoit Baudry, Martin Monperrus (KTH)

Description: Rotten green tests provide a false sense of security, and it is is a big problem in practice. Following the automatic repair philosophy, one can automatically repair such tests by refactoring the test case. The student will design, implement and evaluation a prototype system for automatic repair of rotten green tests.

  1. RTj: a Java framework for detecting and refactoring rotten green test cases (2019) Github: https://github.com/UPHF/RTj

  2. Intent-Preserving Test Repair

  3. Automatic Software Repair: a Bibliography (CSUR 17)

Tagged as: