Under my supervision, you can do cool research in software technology, here are our current hot topics.
Are you a KTH student? See Master's thesis / Bachelor's thesis guidelines and contact me by email
Are you a brilliant international student? Contact me by email
Category Code Transformation
Source Code
Delta-debugging for debloating Java source code using Spoon
Advanced constructs in the Semantic Patch Language for Spoon
Diverse Double Compilation for Bitcoin
Binary Code
Automated Software Diversity via Multi-compilation
Patch Presence Assessment on Binary Code
Decompilation for WebAssembly/WASM using Datalog
Category Program Repair
Machine Learning for Program Repair
Addressing Catastrophing Forgetting in Patch Generation
Execution Trace Analysis for Patch Generation
Automatic transfer of formatting with machine learning
Program Analysis for Automatic Repair
Automatic repair of unreproducible builds (Repairnator)
Automatic ranking of pull-requests for merging or rejecting
Generate-and-validate program repair
Automatic Program Repair of Bears with Repair Templates in Astor
Explainable repair templates for Astor
Automatic Repair of OCRed programs
Category Program Hardening
Chaos Engineering
Maximum Likelihood Perturbations for Chaos Engineering
Chaos Engineering for Microservices with Resilience4J
Randomization & Cyber-deception
Machine learning to synthesize faux patches
Self-healing capabilities for Deployed WebAssembly
Preventing cache attacks on AES with blackbox randomization
Category Software Testing
Collection and Analysis of Code Coverage in Production
Automatic Identification of Pseudo-tested Conditions
Automatic Debloat of Test Suites
Automatic Repair of Rotten Green Tests
Category Code Transformation
Source Code
See also topics on the Spoon repository: https://github.com/INRIA/spoon/labels/mastersthesis
Delta-debugging for debloating Java source code using Spoon
Supervisor: César Soto Valero, Martin Monperrus, KTH Royal Institute of Technology
Description: The problem of removing unnecessary code (aka software debloat) is relevant and challenging [1]. Hierarchical Delta Debugging (HDD) is a useful technique for minimizing all failure-inducing inputs for more effective debugging [2]. HDD can be used to improve the debloating process [3]. It allows determining the set of instructions in a program that, after execution, satisfies a set of user-provided outputs. Yet, there is currently no tool that to debloat Java code based on HDD techniques. The student will design, implement, and evaluate an HDD-based approach to debloat the source code of Java programs using the Spoon library.
Misherghi, Ghassan, and Zhendong Su. “HDD: hierarchical delta debugging.” Proceedings of the 28th international conference on Software engineering. 2006.
Heo, Kihong, et al. “Effective program debloating via reinforcement learning.” Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2018.
Advanced constructs in the Semantic Patch Language for Spoon
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: Code transformation is a powerful tool in dynamic software analysis [1]. Declarative transformations are easier to specify, understand and maintain. In that realm, the "Semantic Patch Language (SmPL)" is the state-of-the-art [3,4]. The student will extend SmPL in the Spoon library [1,2], starting with 1) Loop support 2) Matching on "for" elements specifically. 3) Matching on type arguments e.g <String>. 4) Matching and transformation on "try" elements.
Spoon: A Library for Implementing Analyses and Transformations of Java Source Code
Design and Implementation of Semantic Patch Support for the Spoon Java Transformation Engine
Diverse Double Compilation for Bitcoin
Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use doule diverse compilation to mitigate the problem [3] Yet it is hard to change the compilation pipeline of real software. You will design, implement and evaluate diverse double compilation for bitcoin-core.
Binary Code
Automated Software Diversity via Multi-compilation
Supervisor: Benoit Baudry, Martin Monperrus, KTH Royal Institute of Technology
Description: The problem of deceptive compilers introducing malicious code is relevant and hard [1,2]. One solution for this is to use software diversity [3]. For instance, one can compile a C program with both GCC and CLANG. You will design, implement and evaluate a multi-compiler scheme for C.
Defending Against Compiler-Based Backdoors Java Decompiler Diversity and its Application to Meta-decompilation
Patch Presence Assessment on Binary Code
Supervisors: Martin Monperrus
Description: It is essential to know whether a specific binary contains all required vulnerability fixes. For: we need to components: 1) extracting signatures of vulnerability fixes 2) checking for the presence of those fixes in a binary. The student will work on those components in the context of Java code.
Decompilation for WebAssembly/WASM using Datalog
A decompiler takes compiled binary code and produces textual source code. Decompilation is an essential step for program comprehension, security analyses, etc. However, it is challenging to write an accurate decompiler (that can always reconstruct the source code that actually corresponds to the compiled code) and the implementation of decompilers currently relies on the careful, manual design of decompilation rules. Recently, researchers have proposed to use Datalog for decompilation [1]. In this work, you will will design and evaluate a decompiler for WebAssembly/WASM based on Datalog.
Category Program Repair
Machine Learning for Program Repair
You would commit in project https://github.com/KTH/sequencer.
Addressing Catastrophing Forgetting in Patch Generation
Supervision: Martin Monperrus (KTH)
Description: Recent approaches to automatic repair use machine learning on source code [1,2]. Yet, it has been shown that when done in an online context, ML-based repair systems tend to "forget" how to fix programs [3]. This is problem called catastrophic forgetting, which occurs when newly learned knowledge interferes with capabilities previously learned by the model [4]. You will study in depth the reasons and the mitigations for catastrophic forgetting in automatic patch generation.
Execution Trace Analysis for Patch Generation
Supervision: Zimin Chen, Martin Monperrus (KTH)
Description: To reason about programs one can use source code or/and dynamic traces. Most works in machine learning on programs only use code [1]. However, the usage of traces is a very promising research direction [2,3]. You will invent a new patch generation systems, based on trace embeddings. The work will be done on top of recent works in the field [2,3], for generating patches in Java.
Automatic transfer of formatting with machine learning
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: It is common practice to use and enforce a certain coding style in software projects. This can become a nightmare when one copies files from one project to another, where the project use different conventions. For this, there is a need to be able to transfer the style one project to files coming from potentially anywhere with any style. It may be possible to use machine learning to perform the transfer [1,2] The student will devise, implement and evaluate an approach to automatically transfer coding style. The student will perform the experiments a scientific computing grid.
Learning Natural Coding Conventions (https://github.com/mast-group/naturalize)
Towards a Universal Code Formatter through Machine Learning (https://github.com/antlr/codebuff)
Program Analysis for Automatic Repair
You would commit in project https://github.com/eclipse/repairnator.
Automatic repair of unreproducible builds (Repairnator)
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: Achieving reproducible builds is essential for achieving a secure software supply chain. In particular, one must be sure that no malicious actor has injected malware during the build process. The Debian linux distribution has deployed this in the large. However, when a project build process is no reproducible or becomes unstable again, it can be really hard to find the root cause and fix it. You will design, implement and evaluate a system that automatically suggests repair actions for unreproducible build processes.
Automatic ranking of pull-requests for merging or rejecting
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: On Github, certain projects have too many pull requests and the maintainers need a way to order the pull requests. The ranking could be according to their maturity, importance or likelihood to be merged or rejected. You will devise, implement and evaluate an approach for ranking pull-requests based the history of merged and rejected pull-requests.
Wait for it: determinants of pull request evaluation latency on GitHub
Improving the pull requests review process using learning-to-rank algorithms
Generate-and-validate program repair
Automatic Program Repair of Bears with Repair Templates in Astor
Supervisor: Martin Monperrus (KTH Royal Institute of Technology)
Description: In automatic program repair, template-based repair is an effective way to reduce programs with little overfitting [1]. The student will implement template-based repair in Astor [3], based on the consolidated list by Liu et al [1]. The student will design design and perform a large scale experiment of template-based repair on the BEARS benchmark [2], with quantitative and qualitative analysis.
Explainable repair templates for Astor
Supervisor: Martin Monperrus (KTH Royal Institute of Technology), Matias Martinez (University of Valenciennes)
Description: In practice, template-based repair is promising, because they are very easy to explain to the developer and they also provide reduced overfitting. The student will design and implement a full-fledged template based system for Astor, based on the latest literature on repair templates. A large scale evaluation will be done on the novel repair benchmarks Bugs.jar and BEARS. we can play with the vocabulary to have pure language based templates and project-specific ones
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Automatic Repair of OCRed programs
Supervisor: Martin Monperrus (KTH Royal Institute of Technology)
Description: In difficult situations, programs have to be transferred with other means than electronic channels. For instance, in the 90ies, the code of PGP had to be sent outside the USA as a printed book [1]. Yet, performing OCR on programs is a challenge, because a single incorrectly recognized character can make the program non compilable or non executable. For this, automatic repair of OCR errors can be used [2] or one can encode the program differently [3]. The goal of this thesis is to push the state of the art of automatic repair of OCRed programs.
Category Program Hardening
See also https://github.com/KTH/royal-chaos.
Chaos Engineering
Maximum Likelihood Perturbations for Chaos Engineering
Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology
Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Our group is doing research in this field, in the context of Java applications [2]. Today, perturbation models are naive: they are either random or based on enumeration. You will study, design and implement a chaos engineering algorithm where each perturbation is chosen according to a utility function in order to maximize the knowledge gained from each perturbation.
Chaos Engineering for Microservices with Resilience4J
Supervision: Long Zhang, Martin Monperrus, KTH Royal Institute of Technology
Description: Chaos Engineering [1] is the discipline of verifying resilience capabilities of software systems in production. Resilience4J [2] is a notable middleware for microservices and DevOps orginally done by Netflix. You will study, design and implement a chaos engineering framework for Resilience4J.
Randomization & Cyber-deception
Machine learning to synthesize faux patches
Supervisor: Benoit Baudry, Martin Monperrus, Pontus Johnson, KTH Royal Institute of Technology
Description: Cyber-deception is the body of knowledge for securing software systems by "denial, deceit, misinformation, camouflage and obfuscation" [1]. Introducing faux patches in software [2] is a way to prevent attackers from easily reverse engineer legitimate vulnerability patches. The student will study a way to automatically synthesize faux patches with machine learning. The work will be based on the latest progress made at KTH on sequence to sequence learning for vulnerabilities [3].
Self-healing capabilities for Deployed WebAssembly
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: Over the last few years, the complexity of web applications has extraordinarily increased and WebAssembly has been introduced to overcome performance issues with Javascript. You will design and implement a system that provides self-healing capabilities for web applications with WebAssembly code. The self-healing techniques will be inspired from [1].
Fully Automated HTML and Javascript Rewriting for Constructing a Self-healing Web Proxy
From hack to elaborate technique—a survey on binary rewriting
Preventing cache attacks on AES with blackbox randomization
Supervisor: Martin Monperrus, KTH Royal Institute of Technology
Description: The goal of this thesis is to study counter-measures to cache attacks on AES. A cache attack consists of inferring a private key by means of cache observation [1]. Black-box randomization consists of identifying and injecting randomization points in software, without any knowledge of the application domain and implementation choices. The goal of this thesis is to study the usage of black-box randomization for countering cache attacks. The student will devise and perform a scientific experiment in this context. She/he will read the literature, implement the required software for supporting the experiment, design the inclusion criteria for subjects and run the experiment on a scientific computing grid.
Category Software Testing
Collection and Analysis of Code Coverage in Production
Supervisors: Martin Monperrus (KTH)
Description: Code coverage usually relates to test code. Production code coverage is the coverage over real interactions made by users in production. Obtaining and analysing production code coverage enables to identify useless code as well as relevant test data and values. It enables testers and developers to better align the test intentions with what matters for users. The student will compare and analyze techniques for automatically collecting code coverage in production for Java software.
Automatic Identification of Pseudo-tested Conditions
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: One of the problems with coverage is that it does not detect unspecified code [1]. There is the same problem with conditions, some conditions are well covered according to edge coverage yet the code can actually take any branch without failing. The student will design, implement and evaluate a novel technique for automatically identifying pseudo-tested conditions in Java software.
Automatic Debloat of Test Suites
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: Today, test suites are really big. Not all tests are equally effective, some tests are useless, some tests overlap. Test code has to be maintained and having too much test code can be considered as technical debt. The student will design, implement and evaluate a novel technique for automatically debloating test suites in Java in order to increase test effectiveness.
Automatic Repair of Rotten Green Tests
Supervisors: Benoit Baudry, Martin Monperrus (KTH)
Description: Rotten green tests provide a false sense of security, and it is is a big problem in practice. Following the automatic repair philosophy, one can automatically repair such tests by refactoring the test case. The student will design, implement and evaluation a prototype system for automatic repair of rotten green tests.