Open research topics

by Martin Monperrus

Under my supervision, you can do cool research in software technology. Here are our current hot topics.

Are you a KTH student? See Master's thesis / Bachelor's thesis guidelines and contact me by email

Are you a brilliant international student? Contact me by email


Category Program Repair
     Machine Learning for Program Repair
          Datasets for Machine Learning-Driven Security in Go
          Analysis and Categorization of Patches for the SWE-bench Benchmark
          Explaining Code LLMs with Monosemanticity
          Test-Time Compute for Program Repair
          Automated Prompt Engineering for Program Repair
          Learning Program Transformations with Transformers
          Optimizing Code Diff Representation Strategies for Large Language Models: A Comparative Analysis and Framework
          An Empirical Comparison on Semantics Preserving Transformation Tools
     Code Analysis for Program Repair
          Semantic Tree based Resolution for Git Merge Conflicts
          Self-supervised learning for proving program equivalence in LLVM
          Automated Program Repair for Smart Contracts
          Identifying the Best Code LLM for Embedding Functions
Category Software Supply Chain (CHAINS)
          Empirical Study of Compilation Reproducibility in Solidity
          Study of non-reproducible builds in the Java ecosystem
          Dynamic Integrity Verification & Repair for Java Applications
          Dynamic Introspection of Dependencies in Java Applications
          Automatic Backporting of Java Libraries to Older Bytecode Versions
          Invariant Generation from Production Runs
Category Crypto & Smart Contracts
          Automated Program Repair for Smart Contracts
          Tracing Private Key Access in Crypto Wallet Dependencies
          Automatic Exploit Synthesis for Smart Contracts

Category Program Repair

Previous work in my group on this topic: https://www.monperrus.net/martin/bibtexbrowser.php?keywords=repair&bib=monperrus.bib

Machine Learning for Program Repair

Datasets for Machine Learning-Driven Security in Go

Machine learning for automatically detecting malicious code, malware, and vulnerabilities relies heavily on high-quality, labeled datasets for training classifiers. While such datasets exist for malware (e.g., in ecosystems like PyPI and npm) and vulnerabilities (e.g., in languages like C and Java), similar efforts are lacking in Go, hindering the widespread adoption of automated methods for detecting malicious code and vulnerabilities in this ecosystem. In this thesis, you will bridge the gap in Go security by creating datasets for downstream machine learning security tasks. This involves, e.g., developing methodologies for generating malicious code samples or collecting and processing samples in the wild, and validating the resulting dataset's quality and effectiveness for machine learning-based security tasks.

Analysis and Categorization of Patches for the SWE-bench Benchmark

The project aims to investigate and classify patches generated for the SWE-bench benchmark, a suite of software defects and corresponding fixes, using advanced techniques such as Abstract Syntax Tree (AST) analysis and machine learning. The objectives include conducting a detailed analysis of the structural and semantic characteristics of these patches, and developing a categorization framework based on criteria like bug type, patch complexity, and affected programming constructs. The methodology involves analyzing a dataset of patches, utilizing the Gumtree AST diff library for structural analysis, extracting key features from the patches, and using machine learning models to predict patch categories. The expected outcomes include a comprehensive understanding of SWE-bench patches, contributing to the advancement of automated program repair in the software engineering field.
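As a starting point before full AST analysis, surface-level features can be extracted directly from a patch in unified diff format; a minimal sketch (the `patch_features` helper is hypothetical, not part of SWE-bench tooling):

```python
def patch_features(diff_text: str) -> dict:
    # Count files touched and lines added/removed in a unified diff.
    # Header lines ("---"/"+++") are excluded from the line counts.
    files, added, removed = set(), 0, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            files.add(line[4:].strip())
        elif line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return {"files_touched": len(files),
            "lines_added": added,
            "lines_removed": removed}
```

Such features (patch size, number of files) are the kind of inputs a categorization model could start from, complemented by the structural edit actions that Gumtree computes.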

Explaining Code LLMs with Monosemanticity

LLMs have revolutionized machine learning on code. However, they are mostly black boxes which we still do not understand. In this project, you will explore monosemanticity in LLMs trained on code. Monosemanticity is a recent area of mechanistic interpretability which learns monosemantic (i.e., single-meaning) linear combinations of neuron activations, overcoming the problem of a single neuron representing multiple semantic features. Your work will aim to understand and activate features related to code, specifically ones that improve code quality.

  1. https://transformer-circuits.pub/2023/monosemantic-features/index.html (original paper)

  2. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html (scaling paper)

  3. https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand (explanation blogpost)

  4. https://github.com/jbloomAus/SAELens (open-source implementation of sparse auto-encoder)

  5. https://transformerlensorg.github.io/TransformerLens/index.html (mech interpretability repo, supports loading of SAE trained by SAELens)
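The core tool of the monosemanticity line of work is a sparse autoencoder trained on model activations. Real experiments would use SAELens on transformer activations; the sketch below only illustrates the objective (reconstruction error plus an L1 sparsity penalty) in plain Python, with made-up toy weights:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sae_loss(W_enc, b_enc, W_dec, x, l1_coef=0.1):
    # Encode the activation vector x into an overcomplete feature vector.
    h = relu([s + b for s, b in zip(matvec(W_enc, x), b_enc)])
    # Decode back to the original activation space.
    x_hat = matvec(W_dec, h)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # The L1 penalty pushes toward few active, hopefully monosemantic, features.
    sparsity = sum(h)
    return mse + l1_coef * sparsity, h
```

Training minimizes this loss over many activations; the learned feature directions are then inspected (and can be artificially activated) to steer the model, e.g., toward higher code quality.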

Test-Time Compute for Program Repair

The project focuses on exploring test-time training (TTT) for optimizing the performance of code LLMs on program repair tasks. TTT involves adapting the model during inference by leveraging explanations in the input data. This approach allows the model to enhance its ability to understand and process programming languages and to effectively address program repair challenges. By dynamically producing reasoning about the task at hand, test-time training and test-time compute (TTT/TTC) aim to improve the efficiency and accuracy of LLMs on code-related tasks.

  1. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

  2. The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

  3. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
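One simple form of test-time compute in repair is to sample several candidate patches and keep the first that passes the test suite; a minimal sketch, where the candidate list stands in for samples from a code LLM:

```python
def repair_with_ttc(candidates, tests, budget=8):
    # Spend extra inference-time compute: try up to `budget` candidate
    # patches, validating each against the test suite.
    for candidate in candidates[:budget]:
        if all(t(candidate) for t in tests):
            return candidate
    return None  # budget exhausted without a plausible patch
```

More sophisticated TTC variants replace blind sampling with chain-of-thought reasoning or gradient updates on the problem instance, as in the references above.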

Automated Prompt Engineering for Program Repair

Description: Prompt engineering is a crucial aspect of utilising large language models effectively. In this project, you will explore the use of automated prompt engineering methods in the context of program repair. The goal is to develop, implement, and evaluate a system that can generate effective prompts to guide a language model in repairing faulty code.

  1. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

  2. Large Language Models as Optimizers
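A minimal sketch of the idea, assuming a greedy hill-climbing search over prompt variants scored by a repair metric (e.g., fraction of generated patches that pass tests); frameworks like DSPy automate this far more systematically:

```python
import random

def optimize_prompt(base_prompt, mutations, score, iterations=20, seed=0):
    # Greedy hill-climbing: apply a random mutation, keep it if the
    # task score improves. `score` and `mutations` are stand-ins for a
    # real evaluation harness and LLM-generated rewrites.
    rng = random.Random(seed)
    best, best_score = base_prompt, score(base_prompt)
    for _ in range(iterations):
        candidate = rng.choice(mutations)(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```

The thesis would replace the toy scorer with end-to-end repair benchmarks and compare against manually engineered prompts.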

Learning Program Transformations with Transformers

Description: The application of program transformations, such as bug fixing and refactoring, is essential for maintaining and improving software quality. In this project, you will investigate the use of transformer models to learn from a diverse set of program transformation applied across multiple projects. The objective is to develop a system that can automatically generate transformations for given code snippets by training on historical transformation data. This involves collecting a dataset of projects, curating code transformations, designing an appropriate transformer architecture, and evaluating the model's ability to generalize transformations to unseen code.

  1. Attention is all you need

  2. Learning to represent edits

Optimizing Code Diff Representation Strategies for Large Language Models: A Comparative Analysis and Framework

This thesis explores the effectiveness of different code diff representation methods when interacting with LLMs. The study will evaluate various diff formats (unified, side-by-side, contextual, semantic) across multiple LLMs (GPT, Claude, etc.) to determine which combinations yield the most accurate parsing results from the LLM output. The research will develop a scoring framework considering factors like diff application accuracy. The outcome will be a decision matrix and a tool for selecting optimal prompting strategies based on specific LLM characteristics and use cases.

  1. Evaluating large language models trained on code

  2. Octopack: Instruction tuning code large language models
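Two of the formats under study can be produced directly with Python's standard library, which is a convenient basis for building the evaluation harness:

```python
import difflib

before = ["def add(a, b):", "    return a - b"]
after = ["def add(a, b):", "    return a + b"]

# Unified format: changed lines prefixed with -/+.
unified = "\n".join(
    difflib.unified_diff(before, after, "a.py", "b.py", lineterm=""))

# Context format: changed lines prefixed with "! ", before/after blocks.
context = "\n".join(
    difflib.context_diff(before, after, "a.py", "b.py", lineterm=""))
```

Feeding the same change in each format to several LLMs, then measuring how reliably the model's output diff can be re-applied, would populate the proposed scoring matrix.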

An Empirical Comparison on Semantics Preserving Transformation Tools

Description: In recent years, various tools have been developed to generate equivalent programs using semantics preserving transformations. These tools aim to produce code that is semantically identical but syntactically different from the original code. In this thesis, you will embark on a comparative study of these existing tools, examining their efficiency and effectiveness in generating equivalent programs. This comparative study will shed light on the strengths and weaknesses of each tool, potentially inspiring further advancements in the field of semantics preserving transformations.

  1. On the generalizability of Neural Program Models with respect to semantic-preserving program transformations

  2. Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

  3. NatGen: generative pre-training by “naturalizing” source code
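The simplest semantics-preserving transformation is alpha-renaming. A sketch using Python's `ast` module, including a behavioral check that the variant is equivalent on sample inputs (a full tool would verify this more rigorously):

```python
import ast

class RenameLocals(ast.NodeTransformer):
    # Alpha-renaming: a classic semantics-preserving transformation.
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

src = "def f(x):\n    y = x * 2\n    return y + 1\n"
tree = RenameLocals({"x": "a", "y": "b"}).visit(ast.parse(src))
variant = ast.unparse(tree)

# Sanity-check semantic equivalence on sample inputs.
env1, env2 = {}, {}
exec(src, env1)
exec(variant, env2)
assert all(env1["f"](i) == env2["f"](i) for i in range(10))
```

The tools compared in this thesis apply much richer transformations (statement reordering, loop rewriting, dead-code insertion); the comparison would measure how many such transformations each tool supports and how often its output really is equivalent.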

Code Analysis for Program Repair

Semantic Tree based Resolution for Git Merge Conflicts

In modern pull-request-based development, merge conflicts are a common problem. Today, both client- and server-side Git software still rely on basic line-based merge resolution. The student will design and implement the next generation of our flagship AST-based merge resolution system, Spork.

  1. Spork: Structured Merge for Java with Formatting Preservation

  2. Evaluation of Version Control Merge Tools
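To see why line-based merging is limited, consider a naive three-way merge; the sketch below assumes the three versions are already line-aligned (real tools align them with a diff first). Any two edits that land on the same line conflict, even when an AST-level merge like Spork could combine them:

```python
def merge3(base, left, right):
    # Naive line-based three-way merge over aligned versions:
    # take whichever side changed a line; emit conflict markers
    # when both sides changed it differently.
    out = []
    for b, l, r in zip(base, left, right):
        if l == r or r == b:
            out.append(l)
        elif l == b:
            out.append(r)
        else:
            out += ["<<<<<<<", l, "=======", r, ">>>>>>>"]
    return out
```

A formatting-only change on one side plus a semantic change on the other already triggers a spurious conflict here, which is exactly the class of cases a structured merge resolves automatically.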

Self-supervised learning for proving program equivalence in LLVM

In recent years, self-supervised learning has emerged as a powerful technique for encoding high-level semantic properties in the absence of explicit supervision signals. The focus of this thesis is to explore the application of self-supervised learning methodologies towards proving program equivalence in LLVM bitcode. LLVM provides a structured format for representing program constructs at the intermediate level. Program equivalence is a fundamental problem in computer science, concerned with proving that two programs exhibit the same behavior under all possible inputs. By utilizing self-supervised learning techniques, we aim to develop a practical approach for efficient and accurate program equivalence verification in a mainstream binary format.

  1. On the generalizability of Neural Program Models with respect to semantic-preserving program transformations

  2. Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

Automated Program Repair for Smart Contracts

Description: Smart contracts are software, and hence, cannot be perfect. Smart contracts suffer from bugs, some of which put high financial stakes at risk. There is a new line of research on automated patching of smart contracts. You will devise, perform, and analyze a comparative experiment to identify the successes, challenges, and limitations of automated program repair for smart contracts.

  1. Elysium: Automagically Healing Vulnerable Smart Contracts Using Context-Aware Patching

  2. EVMPatch: Timely and automated patching of ethereum smart contracts

Identifying the Best Code LLM for Embedding Functions

Description: This thesis aims to identify the most effective LLM for function-level embedding. By evaluating various state-of-the-art open-source code LLMs, the research will assess their performance in capturing semantic information and contextual relationships within code. The goal is to design and operate an evaluation framework for selecting the best function-level embedding model. This research will provide practical insights for researchers working on downstream tasks such as vulnerability detection and automated program repair.

  1. CodeBERT: A Pre-Trained Model for Programming and Natural Languages

  2. A Literature Study of Embeddings on Source Code

  3. A Survey on Pre-Trained Models of Source Code
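Whatever model is chosen, the evaluation framework ultimately compares embedding vectors; cosine similarity is the standard metric for judging whether a model places semantically similar functions close together (the vectors below are toy stand-ins for real model outputs):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors: 1.0 for
    # identical directions, 0.0 for orthogonal ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

A benchmark could embed pairs of functions known to be clones (or non-clones) and check that cosine similarity separates the two groups, ranking candidate models by that separation.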

Category Software Supply Chain (CHAINS)

Work done as part of the CHAINS research project. See also [other Chains topics](https://chains.proj.kth.se/master-thesis.html).

Empirical Study of Compilation Reproducibility in Solidity

Description: The reproducibility of software builds is a critical aspect of secure software development. This concept has been pushed forward in the context of Solidity, the programming language used for writing smart contracts on the Ethereum blockchain, with the notion of "verified contracts". In this thesis, you will conduct an empirical study on the reproducibility of compilation in Solidity. You will recompile verified Solidity contracts and analyze the consistency of the results. The datasets for this study will be sourced from Etherscan and Sourcify. This research will contribute to the understanding of software integrity in blockchain technology and could potentially inform best practices for Solidity development.

  1. Reproducible Builds: Increasing the Integrity of Software Supply Chains

  2. Etherscan

  3. Sourcify
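One practical detail of this study: Solidity appends CBOR-encoded metadata (including a hash of the source metadata) to the compiled bytecode, with the trailer's length stored in the final two bytes. A sketch of a comparison that separates metadata-only differences from real code differences (`reproducible` is a hypothetical helper name):

```python
import hashlib

def strip_metadata(bytecode: bytes) -> bytes:
    # The last two bytes encode the length of the CBOR metadata trailer;
    # drop the trailer plus those two length bytes.
    n = int.from_bytes(bytecode[-2:], "big")
    return bytecode[: -(n + 2)]

def reproducible(a: bytes, b: bytes) -> bool:
    # Bit-by-bit comparison of the code proper, ignoring the metadata.
    return (hashlib.sha256(strip_metadata(a)).digest()
            == hashlib.sha256(strip_metadata(b)).digest())
```

Classifying recompiled contracts as fully identical, metadata-only divergent, or code-divergent would be a natural first axis of the empirical study.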

Study of non-reproducible builds in the Java ecosystem

Description: Build reproducibility means that a software build always results in a bit-by-bit identical output, provided the source code and build environment are exactly the same [1]. This property is a good safeguard against the threat of a compromised build process [2] and hence an important safeguard for software supply chain security. In the Java ecosystem, Reproducible Central attempts to reproduce Maven/Gradle/sbt artifacts on Maven Central. It does so by building the artifact from source and then comparing it with the artifact in the Maven registry. If the two are bit-by-bit identical, then the Maven package is said to be reproducible; otherwise the package is non-reproducible. In this thesis, you will create a taxonomy of reasons for non-reproducible builds of Maven packages.

  1. https://reproducible-builds.org/

  2. AROMA: Automatic Reproduction of Maven Artifacts

  3. Example problems: https://github.com/algomaster99/reproducible-central/issues/4 and https://github.com/algomaster99/reproducible-central/issues/5

Dynamic Integrity Verification & Repair for Java Applications

Description: Attackers constantly try to tamper with the code of software applications in production. Chang and Atallah have proposed a technique to not only detect modifications but also repair the code after attacks, using a network of small security units called guards. These guards can be programmed to perform tasks such as checksumming the program code, and they work in concert to create mutual protection. In this thesis, you will devise, implement and evaluate such an approach in the context of modern Java software with dependencies. An open question is how to set up guards inside or around dependency code.

  1. Protecting Software Code by Guards

  2. Reflection as a mechanism for software integrity verification

  3. Practical integrity protection with oblivious hashing
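The guard concept can be sketched compactly. The thesis targets Java bytecode; the toy version below checksums Python code objects instead, purely to illustrate the check-at-runtime idea (`make_guard` is a hypothetical name):

```python
import hashlib

def make_guard(fn):
    # A guard (in the sense of Chang & Atallah) records a checksum of
    # the protected code at install time and re-verifies it at runtime.
    # Here we hash the function's instruction bytes and constants.
    def digest():
        code = fn.__code__
        return hashlib.sha256(
            code.co_code + repr(code.co_consts).encode()).hexdigest()
    expected = digest()
    def check() -> bool:
        return digest() == expected
    return check
```

A real deployment would interleave many guards that also check each other, and, following the thesis topic, repair tampered regions from a protected copy rather than merely detect the change.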

Dynamic Introspection of Dependencies in Java Applications

Description: We aim to design and develop a prototype for dynamic introspection of dependencies in Java applications. This would enable real-time tracking and decision-making based on the dependency execution context. By leveraging Java's instrumentation capabilities, the proposed system will monitor and identify the active dependencies at any given point during program execution. The focus will be on minimizing performance overhead to ensure that the introspection process does not significantly impact the application's responsiveness or efficiency, while integrating seamlessly with any existing Java application. Rigorous evaluation against various benchmarks will be done to assess its accuracy, performance, and usability.

  1. Reflection as a mechanism for software integrity verification

Automatic Backporting of Java Libraries to Older Bytecode Versions

Description: With the rapid evolution of Java, libraries often get updated to new bytecode versions. This causes compatibility issues and breakages for applications that are still running on older versions of Java. To address this, a possible solution is to automatically backport Java libraries to older bytecode versions. This thesis will focus on designing and implementing an automated tool for backporting Java libraries. The tool should be capable of translating new bytecode instructions to their older equivalents, maintaining the functional behavior of the library while ensuring compatibility with older Java versions. An open question is how to handle new language features and APIs that do not have direct equivalents in older versions.

  1. Back to the past–analysing backporting practices in package dependency networks

  2. Recommending code changes for automatic backporting of Linux device drivers

  3. Transforming C++11 Code to C++03 to Support Legacy Compilation Environments
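The bytecode version a library targets is visible directly in its class files: every `.class` file starts with the magic number `0xCAFEBABE` followed by minor and major version fields (major 52 = Java 8, 55 = Java 11, 61 = Java 17). A small parser for that header, the natural entry point of a backporting tool:

```python
import struct

JAVA_MAJOR = {52: "Java 8", 55: "Java 11", 61: "Java 17"}

def class_file_version(data: bytes):
    # Header layout: u4 magic, u2 minor_version, u2 major_version,
    # all big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major, minor
```

Rewriting the major version down is the trivial part; the research challenge flagged above is translating instructions and APIs introduced after the target version.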

Invariant Generation from Production Runs

The goal is to propose a novel approach to invariant generation from production runs of software systems, building upon the foundational work of Daikon. We aim to enhance the accuracy and efficiency of invariant generation by integrating machine learning techniques and runtime analysis, thereby improving the quality and applicability of invariants in real-world scenarios.

  1. The Daikon system for dynamic detection of likely invariants

  2. Augmenting Test Oracles with Production Observations
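The Daikon idea in miniature: propose candidate invariants per variable and keep only those that hold on every observed run. A toy sketch over traces represented as dicts (real systems check a much larger grammar of candidate invariants, and production runs add noise that this ignores):

```python
def detect_invariants(traces):
    # traces: list of {variable: value} snapshots from production runs.
    # Assumes all traces observe the same variables.
    invariants = {}
    for var in traces[0]:
        values = [t[var] for t in traces]
        found = []
        if all(isinstance(v, (int, float)) for v in values):
            if all(v >= 0 for v in values):
                found.append(f"{var} >= 0")
            if len(set(values)) == 1:
                found.append(f"{var} == {values[0]}")
        invariants[var] = found
    return invariants
```

The thesis would go beyond this falsification-only scheme, e.g., by ranking candidate invariants with learned models and by coping with the volume and bias of production data.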

Category Crypto & Smart Contracts

Automated Program Repair for Smart Contracts

Description: Smart contracts are software, and hence, cannot be perfect. Smart contracts suffer from bugs, some of which put high financial stakes at risk. There is a new line of research on automated patching of smart contracts. You will devise, perform, and analyze a comparative experiment to identify the successes, challenges, and limitations of automated program repair for smart contracts.

  1. Elysium: Automagically Healing Vulnerable Smart Contracts Using Context-Aware Patching

  2. EVMPatch: Timely and automated patching of ethereum smart contracts

Tracing Private Key Access in Crypto Wallet Dependencies

Description: Software supply chain attacks pose a critical risk to cryptocurrency wallets, particularly when dependencies have access to sensitive cryptographic material like private keys. This research proposes to conduct a comprehensive analysis of major cryptocurrency wallets to trace and map all third-party libraries and suppliers that have potential access to users' private keys during runtime. The investigation will employ static and dynamic analysis techniques to identify the complete dependency chain and privilege levels of each component that interacts with private key material. This is crucial because any compromised dependency with key access could lead to catastrophic loss of funds, as demonstrated in the 2018 Copay wallet incident. The goal is to create a detailed risk assessment framework that ranks wallets based on their private key exposure surface to third-party code, and to propose architectural improvements that minimize unnecessary private key access across the software supply chain.

  1. Annotating, tracking, and protecting cryptographic secrets with cryptompk

  2. Security Aspects of Cryptocurrency Wallets: A Systematic Literature Review

  3. Software supply chain attacks on crypto

  4. Backstabber's knife collection: A review of open source software supply chain attacks

Automatic Exploit Synthesis for Smart Contracts

Smart contracts typically hold large stakes and consequently, they are under constant attack by malicious actors. As a counter-measure, engineering smart contracts involves auditing and formal verification. Another option is automatic exploit synthesis. In this thesis, you will evaluate the state of the art of exploit synthesis for smart contracts. You will then design, implement and evaluate a better system that improves upon the state of the art.

  1. ExGen: Cross-platform, Automated Exploit Generation for Smart Contract Vulnerabilities

  2. FlashSyn: Flash Loan Attack Synthesis via Counter Example Driven Approximation

  3. Smart Contract and DeFi Security: Insights from Tool Evaluations and Practitioner Surveys