Under my supervision, you can do cool research in software technology, here are our current hot topics.
Are you a KTH student? See Master's thesis / Bachelor's thesis guidelines and contact me by email
Are you a brilliant international student? Contact me by email
Category Code Transformation
Source Code
Diverse Double Compilation for Bitcoin
Binary Code
Automated Software Diversity via Multi-compilation
Patch Presence Assessment on Binary Code
Decompilation for WebAssembly/WASM using Datalog
Category Program Repair
Machine Learning for Program Repair
Trace Analysis for Patch Generation
Unsupervised Learning for Program Repair (Sequencer)
Automatic transfer of formatting with machine learning
Program Analysis for Automatic Repair
Program Repair with Datalog
Automatic repair of unreproducible builds (Repairnator)
Automatic ranking of pull-requests for merging or rejecting
Generate-and-validate program repair
Automatic Program Repair of Bears with Repair Templates in Astor
Explainable repair templates for Astor
Automatic Repair of OCRed programs
Category Program Hardening
Chaos Engineering
Master's thesis Chaos Engineering at Electrolux
Maximum Likelihood Perturbations for Chaos Engineering
Chaos Engineering for Microservices with Resilience4J
Randomization & Self-Healing
Self-healing capabilities for Deployed WebAssembly
Preventing cache attacks on AES with blackbox randomization
Preventing algorithmic DOS attacks with blackbox randomization
Category Software Testing
Collection and Analysis of Code Coverage in Production
Automatic Identification of Pseudo-tested Conditions
Automatic Debloat of Test Suites
Automatic Repair of Rotten Green Tests
Category Code Transformation
Source Code
See also topics on the Spoon repository: https://github.com/INRIA/spoon/labels/mastersthesis
Diverse Double Compilation for Bitcoin
Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use doule diverse compilation to mitigate the problem [3] Yet it is hard to change the compilation pipeline of real software. You will design, implement and evaluate diverse double compilation for bitcoin-core.
Binary Code
Automated Software Diversity via Multi-compilation
Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use software diversity [3]. For instance, one can compile a C program with both GCC and CLANG. You will design, implement and evaluate a multi-compiler scheme for C.
Defending Against Compiler-Based Backdoors Java Decompiler Diversity and its Application to Meta-decompilation
Patch Presence Assessment on Binary Code
Supervisors: Martin Monperrus
Description: It is essential to know whether a specific binary contains all required vulnerability fixes. For: we need to components: 1) extracting signatures of vulnerability fixes 2) checking for the presence of those fixes in a binary. The student will work on those components in the context of Java code.
Decompilation for WebAssembly/WASM using Datalog
A decompiler takes compiled binary code and produces textual source code. Decompilation is an essential step for program comprehension, security analyses, etc. However, it is challenging to write an accurate decompiler (that can always reconstruct the source code that actually corresponds to the compiled code) and the implementation of decompilers currently relies on the careful, manual design of decompilation rules. Recently, researchers have proposed to use Datalog for decompilation [1]. In this work, you will will design and evaluate a decompiler for WebAssembly/WASM based on Datalog.
Category Program Repair
Machine Learning for Program Repair
You would commit in project https://github.com/KTH/sequencer.
Trace Analysis for Patch Generation
Supervision: Zimin Chen, Martin Monperrus (KTH)
Description: To reason about programs one can use source code or/and dynamic traces. Most works in machine learning on programs only use code [1]. However, the usage of traces is a very promising research direction [2,3]. You will invent a new patch generation systems, based on trace embeddings. The work will be done on top of recent works in the field [2,3], for generating patches in Java.
Unsupervised Learning for Program Repair (Sequencer)
Supervision: Zimin Chen, Martin Monperrus (KTH)
Recently, it has been shown that one can use self-supervised learning to synthesize code [1,2]. This opens up many new possibilities for program repair. The goal of this work is to use self-supervised learning to learn how to fix one-line bugs [3].
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
Graph-based Self-Supervised Program Repair from Diagnostic Feedback (2020)
Automatic transfer of formatting with machine learning
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: It is common practice to use and enforce a certain coding style in software projects. This can become a nightmare when one copies files from one project to another, where the project use different conventions. For this, there is a need to be able to transfer the style one project to files coming from potentially anywhere with any style. It may be possible to use machine learning to perform the transfer [1,2] The student will devise, implement and evaluate an approach to automatically transfer coding style. The student will perform the experiments a scientific computing grid.
Learning Natural Coding Conventions (https://github.com/mast-group/naturalize)
Towards a Universal Code Formatter through Machine Learning (https://github.com/antlr/codebuff)
Program Analysis for Automatic Repair
You would commit in project https://github.com/eclipse/repairnator.
Program Repair with Datalog
Supervision: Martin Monperrus (KTH)
Description: Static analysis tools are much used in industry to statically detect bugs and code smells [1]. It has been shown that anti-unification can be used to synthesize patches for static analysis warnings [2]. Datalog is a language that is becoming more and popular for program analysis due to its effectiveness [3]. You will design, implement and evaluate a program repair system to repair SonarQube static analysis warnings using anti-unification and Datalog.
Automatic repair of unreproducible builds (Repairnator)
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: Achieving reproducible builds is essential for achieving a secure software supply chain. In particular, one must be sure that no malicious actor has injected malware during the build process. The Debian linux distribution has deployed this in the large. However, when a project build process is no reproducible or becomes unstable again, it can be really hard to find the root cause and fix it. You will design, implement and evaluate a system that automatically suggests repair actions for unreproducible build processes.
Automatic ranking of pull-requests for merging or rejecting
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: On Github, certain projects have too many pull requests and the maintainers need a way to order the pull requests. The ranking could be according to their maturity, importance or likelihood to be merged or rejected. You will devise, implement and evaluate an approach for ranking pull-requests based the history of merged and rejected pull-requests.
Wait for it: determinants of pull request evaluation latency on GitHub
Improving the pull requests review process using learning-to-rank algorithms
Generate-and-validate program repair
Automatic Program Repair of Bears with Repair Templates in Astor
Supervisor: Martin Monperrus (KTH Royal Institute of Technology)
Description: In automatic program repair, template-based repair is an effective way to reduce programs with little overfitting [1]. The student will implement template-based repair in Astor [3], based on the consolidated list by Liu et al [1]. The student will design design and perform a large scale experiment of template-based repair on the BEARS benchmark [2], with quantitative and qualitative analysis.
Explainable repair templates for Astor
Supervisor: Martin Monperrus (KTH Royal Institute of Technology), Matias Martinez (University of Valenciennes)
Description: In practice, template-based repair is promising, because they are very easy to explain to the developer and they also provide reduced overfitting. The student will design and implement a full-fledged template based system for Astor, based on the latest literature on repair templates. A large scale evaluation will be done on the novel repair benchmarks Bugs.jar and BEARS. we can play with the vocabulary to have pure language based templates and project-specific ones
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Automatic Repair of OCRed programs
Supervisor: Martin Monperrus (KTH Royal Institute of Technology)
Description: In difficult situations, programs have to be transferred with other means than electronic channels. For instance, in the 90ies, the code of PGP had to be sent outside the USA as a printed book [1]. Yet, performing OCR on programs is a challenge, because a single incorrectly recognized character can make the program non compilable or non executable. For this, automatic repair of OCR errors can be used [2] or one can encode the program differently [3]. The goal of this thesis is to push the state of the art of automatic repair of OCRed programs.
Category Program Hardening
See also https://github.com/KTH/royal-chaos.
Chaos Engineering
Master's thesis Chaos Engineering at Electrolux
See https://www.linkedin.com/jobs/view/2260031591/.
Maximum Likelihood Perturbations for Chaos Engineering
Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology
Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Our group is doing research in this field, in the context of Java applications [2]. Today, perturbation models are naive: they are either random or based on enumeration. You will study, design and implement a chaos engineering algorithm where each perturbation is chosen according to a utility function in order to maximize the knowledge gained from each perturbation.
Chaos Engineering for Microservices with Resilience4J
Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology
Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Resilience4J [2] is a notable middleware for microservices and DevOps orginally done by Netflix. You will study, design and implement a chaos engineering framework for Resilience4J.
Randomization & Self-Healing
Self-healing capabilities for Deployed WebAssembly
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: Over the last few years, the complexity of web applications has extraordinarily increased and WebAssembly has been introduced to overcome performance issues with Javascript. You will design and implement a system that provides self-healing capabilities for web applications with WebAssembly code. The self-healing techniques will be inspired from [1].
Fully Automated HTML and Javascript Rewriting for Constructing a Self-healing Web Proxy
From hack to elaborate technique—a survey on binary rewriting
Preventing cache attacks on AES with blackbox randomization
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: The goal of this thesis is to study counter-measures to cache attacks on AES. A cache attack consists of inferring a private key by means of cache observation [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering cache attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.
Preventing algorithmic DOS attacks with blackbox randomization
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: The goal of this thesis is to study counter-measures to algorithmic denial of service attacks. An algorithmic DOS consists of a input specifically designed by the attacker to trigger the worst case execution of a program [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering algorithmic DOS attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.
Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation
Slowfuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities
Category Software Testing
Collection and Analysis of Code Coverage in Production
Supervisors: Martin Monperrus (KTH)
Description: Code coverage usually relates to test code. Production code coverage is the coverage over real interactions made by users in production. Obtaining and analysing production code coverage enables to identify useless code as well as relevant test data and values. It enables testers and developers to better align the test intentions with what matters for users. The student will compare and analyze techniques for automatically collecting code coverage in production for Java software.
Automatic Identification of Pseudo-tested Conditions
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: One of the problems with coverage is that it does not detect unspecified code [1]. There is the same problem with conditions, some conditions are well covered according to edge coverage yet the code can actually take any branch without failing. The student will design, implement and evaluate a novel technique for automatically identifying pseudo-tested conditions in Java software.
Automatic Debloat of Test Suites
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: Today, test suites are really big. Not all tests are equally effective, some tests are useless, some tests overlap. Test code has to be maintained and having too much test code can be considered as technical debt. The student will design, implement and evaluate a novel technique for automatically debloating test suites in Java in order to increase test effectiveness.
Automatic Repair of Rotten Green Tests
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: Rotten green tests provide a false sense of security, and it is is a big problem in practice. Following the automatic repair philosophy, one can automatically repair such tests by refactoring the test case. The student will design, implement and evaluation a prototype system for automatic repair of rotten green tests.