Open research topics

by Martin Monperrus

Under my supervision, you can do cool research in software technology. Here are our current hot topics.

Are you a KTH student? See Master's thesis / Bachelor's thesis guidelines and contact me by email

Are you a brilliant international student? Contact me by email


Category Program Repair
     Machine Learning for Program Repair
          Datasets for Machine Learning-Driven Security in Go
          Analysis and Categorization of Patches for the SWE-bench Benchmark
          Explaining Code LLMs with Monosemanticity
          Test-Time Compute for Program Repair
          Automated Prompt Engineering for Program Repair
          Learning Program Transformations with Transformers
          Optimizing Code Diff Representation Strategies for Large Language Models: A Comparative Analysis and Framework
          An Empirical Comparison on Semantics Preserving Transformation Tools
     Code Analysis for Program Repair
          Semantic Tree based Resolution for Git Merge Conflicts
          Self-supervised learning for proving program equivalence in LLVM
          Automated Program Repair for Smart Contracts
          Identifying the Best Code LLM for Embedding Functions
Category Software Supply Chain (CHAINS)
          Empirical Study of Compilation Reproducibility in Solidity
          Study of non-reproducible builds in the Java ecosystem
          Dynamic Integrity Verification & Repair for Java Applications
          Dynamic Introspection of Dependencies in Java Applications
          Automatic Backporting of Java Libraries to Older Bytecode Versions
          Invariant Generation from Production Runs
Category Crypto & Smart Contracts
          Automated Program Repair for Smart Contracts
          Tracing Private Key Access in Crypto Wallet Dependencies
          Automatic Exploit Synthesis for Smart Contracts

Category Program Repair

Previous work in my group on this topic: https://www.monperrus.net/martin/bibtexbrowser.php?keywords=repair&bib=monperrus.bib

Machine Learning for Program Repair

Datasets for Machine Learning-Driven Security in Go

Machine learning for automatically detecting malicious code, malware, and vulnerabilities relies heavily on high-quality, labeled datasets for training classifiers. While such datasets exist for malware (e.g., in ecosystems like PyPI and npm) and vulnerabilities (e.g., in languages like C and Java), similar efforts are lacking in Go, hindering the widespread adoption of automated methods for detecting malicious code and vulnerabilities in this ecosystem. In this thesis, you will bridge the gap in Go security by creating datasets for downstream machine learning security tasks. This involves, e.g., developing methodologies for generating malicious code samples or collecting and processing samples in the wild, and validating the resulting dataset's quality and effectiveness for machine learning-based security tasks.

Analysis and Categorization of Patches for the SWE-bench Benchmark

The project aims to investigate and classify patches generated for the SWE-bench benchmark, a suite of software defects and corresponding fixes, using advanced techniques such as Abstract Syntax Tree (AST) analysis and machine learning. The objectives include conducting a detailed analysis of the structural and semantic characteristics of these patches, and developing a categorization framework based on criteria like bug type, patch complexity, and affected programming constructs. The methodology involves analyzing a dataset of patches, utilizing the Gumtree AST diff library for structural analysis, extracting key features from the patches, and using machine learning models to predict patch categories. The expected outcomes include a comprehensive understanding of SWE-bench patches, contributing to the advancement of automated program repair in the software engineering field.
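As a starting point before full AST analysis, surface-level features can be extracted directly from a patch in unified diff format; a minimal sketch (the `patch_features` helper is hypothetical, not part of SWE-bench tooling):

```python
def patch_features(diff_text: str) -> dict:
    # Count files touched and lines added/removed in a unified diff.
    # Header lines ("---"/"+++") are excluded from the line counts.
    files, added, removed = set(), 0, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            files.add(line[4:].strip())
        elif line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return {"files_touched": len(files),
            "lines_added": added,
            "lines_removed": removed}
```

Such features (patch size, number of files) are the kind of inputs a categorization model could start from, complemented by the structural edit actions that Gumtree computes.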

Explaining Code LLMs with Monosemanticity

LLMs have revolutionized machine learning on code. However, they are mostly black boxes which we still do not understand. In this project, you will explore monosemanticity in LLMs trained on code. Monosemanticity is a recent area of mechanistic interpretability which learns monosemantic (i.e., single-meaning) linear combinations of neuron activations, overcoming the problem of a single neuron representing multiple semantic features. Your work will aim to understand and activate features related to code, specifically ones that improve code quality.

  1. https://transformer-circuits.pub/2023/monosemantic-features/index.html (original paper)

  2. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html (scaling paper)

  3. https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand (explanation blogpost)

  4. https://github.com/jbloomAus/SAELens (open-source implementation of sparse auto-encoder)

  5. https://transformerlensorg.github.io/TransformerLens/index.html (mech interpretability repo, supports loading of SAE trained by SAELens)
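The core tool of the monosemanticity line of work is a sparse autoencoder trained on model activations. Real experiments would use SAELens on transformer activations; the sketch below only illustrates the objective (reconstruction error plus an L1 sparsity penalty) in plain Python, with made-up toy weights:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sae_loss(W_enc, b_enc, W_dec, x, l1_coef=0.1):
    # Encode the activation vector x into an overcomplete feature vector.
    h = relu([s + b for s, b in zip(matvec(W_enc, x), b_enc)])
    # Decode back to the original activation space.
    x_hat = matvec(W_dec, h)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # The L1 penalty pushes toward few active, hopefully monosemantic, features.
    sparsity = sum(h)
    return mse + l1_coef * sparsity, h
```

Training minimizes this loss over many activations; the learned feature directions are then inspected (and can be artificially activated) to steer the model, e.g., toward higher code quality.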

Test-Time Compute for Program Repair

The project focuses on exploring test-time training (TTT) for optimizing the performance of code LLMs on program repair tasks. TTT involves adapting the model during inference by leveraging explanations in the input data. This approach allows the model to enhance its ability to understand and process programming languages and to effectively address program repair challenges. By dynamically producing reasoning about the task at hand, test-time training and test-time compute (TTT/TTC) aim to improve the efficiency and accuracy of LLMs on code-related tasks.

  1. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

  2. The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

  3. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
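One simple form of test-time compute in repair is to sample several candidate patches and keep the first that passes the test suite; a minimal sketch, where the candidate list stands in for samples from a code LLM:

```python
def repair_with_ttc(candidates, tests, budget=8):
    # Spend extra inference-time compute: try up to `budget` candidate
    # patches, validating each against the test suite.
    for candidate in candidates[:budget]:
        if all(t(candidate) for t in tests):
            return candidate
    return None  # budget exhausted without a plausible patch
```

More sophisticated TTC variants replace blind sampling with chain-of-thought reasoning or gradient updates on the problem instance, as in the references above.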

Automated Prompt Engineering for Program Repair

Description: Prompt engineering is a crucial aspect of utilising large language models effectively. In this project, you will explore the use of automated prompt engineering methods in the context of program repair. The goal is to develop, implement, and evaluate a system that can generate effective prompts to guide a language model in repairing faulty code.

  1. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

  2. Large Language Models as Optimizers
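A minimal sketch of the idea, assuming a greedy hill-climbing search over prompt variants scored by a repair metric (e.g., fraction of generated patches that pass tests); frameworks like DSPy automate this far more systematically:

```python
import random

def optimize_prompt(base_prompt, mutations, score, iterations=20, seed=0):
    # Greedy hill-climbing: apply a random mutation, keep it if the
    # task score improves. `score` and `mutations` are stand-ins for a
    # real evaluation harness and LLM-generated rewrites.
    rng = random.Random(seed)
    best, best_score = base_prompt, score(base_prompt)
    for _ in range(iterations):
        candidate = rng.choice(mutations)(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```

The thesis would replace the toy scorer with end-to-end repair benchmarks and compare against manually engineered prompts.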

Learning Program Transformations with Transformers

Description: The application of program transformations, such as bug fixing and refactoring, is essential for maintaining and improving software quality. In this project, you will investigate the use of transformer models to learn from a diverse set of program transformation applied across multiple projects. The objective is to develop a system that can automatically generate transformations for given code snippets by training on historical transformation data. This involves collecting a dataset of projects, curating code transformations, designing an appropriate transformer architecture, and evaluating the model's ability to generalize transformations to unseen code.

  1. Attention is all you need

  2. Learning to represent edits

Optimizing Code Diff Representation Strategies for Large Language Models: A Comparative Analysis and Framework

This thesis explores the effectiveness of different code diff representation methods when interacting with LLMs. The study will evaluate various diff formats (unified, side-by-side, contextual, semantic) across multiple LLMs (GPT, Claude, etc.) to determine which combinations yield the most accurate parsing results from the LLM output. The research will develop a scoring framework considering factors like diff application accuracy. The outcome will be a decision matrix and a tool for selecting optimal prompting strategies based on specific LLM characteristics and use cases.

  1. Evaluating large language models trained on code

  2. Octopack: Instruction tuning code large language models
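Two of the formats under study can be produced directly with Python's standard library, which is a convenient basis for building the evaluation harness:

```python
import difflib

before = ["def add(a, b):", "    return a - b"]
after = ["def add(a, b):", "    return a + b"]

# Unified format: changed lines prefixed with -/+.
unified = "\n".join(
    difflib.unified_diff(before, after, "a.py", "b.py", lineterm=""))

# Context format: changed lines prefixed with "! ", before/after blocks.
context = "\n".join(
    difflib.context_diff(before, after, "a.py", "b.py", lineterm=""))
```

Feeding the same change in each format to several LLMs, then measuring how reliably the model's output diff can be re-applied, would populate the proposed scoring matrix.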

An Empirical Comparison on Semantics Preserving Transformation Tools

Description: In recent years, various tools have been developed to generate equivalent programs using semantics preserving transformations. These tools aim to produce code that is semantically identical but syntactically different from the original code. In this thesis, you will embark on a comparative study of these existing tools, examining their efficiency and effectiveness in generating equivalent programs. This comparative study will shed light on the strengths and weaknesses of each tool, potentially inspiring further advancements in the field of semantics preserving transformations.

  1. On the generalizability of Neural Program Models with respect to semantic-preserving program transformations

  2. Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

  3. NatGen: generative pre-training by “naturalizing” source code
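The simplest semantics-preserving transformation is alpha-renaming. A sketch using Python's `ast` module, including a behavioral check that the variant is equivalent on sample inputs (a full tool would verify this more rigorously):

```python
import ast

class RenameLocals(ast.NodeTransformer):
    # Alpha-renaming: a classic semantics-preserving transformation.
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

src = "def f(x):\n    y = x * 2\n    return y + 1\n"
tree = RenameLocals({"x": "a", "y": "b"}).visit(ast.parse(src))
variant = ast.unparse(tree)

# Sanity-check semantic equivalence on sample inputs.
env1, env2 = {}, {}
exec(src, env1)
exec(variant, env2)
assert all(env1["f"](i) == env2["f"](i) for i in range(10))
```

The tools compared in this thesis apply much richer transformations (statement reordering, loop rewriting, dead-code insertion); the comparison would measure how many such transformations each tool supports and how often its output really is equivalent.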

Code Analysis for Program Repair

Semantic Tree based Resolution for Git Merge Conflicts

In modern pull-request-based development, merge conflicts are a common problem. Today, both client- and server-side Git software still rely on basic line-based merge resolution. The student will design and implement the next generation of our flagship AST-based merge resolution system, Spork.

  1. Spork: Structured Merge for Java with Formatting Preservation

  2. Evaluation of Version Control Merge Tools
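To see why line-based merging is limited, consider a naive three-way merge; the sketch below assumes the three versions are already line-aligned (real tools align them with a diff first). Any two edits that land on the same line conflict, even when an AST-level merge like Spork could combine them:

```python
def merge3(base, left, right):
    # Naive line-based three-way merge over aligned versions:
    # take whichever side changed a line; emit conflict markers
    # when both sides changed it differently.
    out = []
    for b, l, r in zip(base, left, right):
        if l == r or r == b:
            out.append(l)
        elif l == b:
            out.append(r)
        else:
            out += ["<<<<<<<", l, "=======", r, ">>>>>>>"]
    return out
```

A formatting-only change on one side plus a semantic change on the other already triggers a spurious conflict here, which is exactly the class of cases a structured merge resolves automatically.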

Self-supervised learning for proving program equivalence in LLVM

In recent years, self-supervised learning has emerged as a powerful technique for encoding high-level semantic properties in the absence of explicit supervision signals. The focus of this thesis is to explore the application of self-supervised learning methodologies towards proving program equivalence in LLVM bitcode. LLVM provides a structured format for representing program constructs at the intermediate level. Program equivalence is a fundamental problem in computer science, concerned with proving that two programs exhibit the same behavior under all possible inputs. By utilizing self-supervised learning techniques, we aim to develop a practical approach for efficient and accurate program equivalence verification in a mainstream binary format.

  1. On the generalizability of Neural Program Models with respect to semantic-preserving program transformations

  2. Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

Automated Program Repair for Smart Contracts

Description: Smart contracts are software, and hence, cannot be perfect. Smart contracts suffer from bugs, some of which put high financial stakes at risk. There is a new line of research on automated patching of smart contracts. You will devise, perform, and analyze a comparative experiment to identify the successes, challenges, and limitations of automated program repair for smart contracts.

  1. Elysium: Automagically Healing Vulnerable Smart Contracts Using Context-Aware Patching

  2. EVMPatch: Timely and automated patching of ethereum smart contracts

Identifying the Best Code LLM for Embedding Functions

Description: This thesis aims to identify the most effective LLM for function-level embedding. By evaluating various state-of-the-art open-source code LLMs, the research will assess their performance in capturing semantic information and contextual relationships within code. The goal is to design and operate an evaluation framework for selecting the best function-level embedding model. This research will provide practical insights for researchers working on downstream tasks such as vulnerability detection and automated program repair.

  1. CodeBERT: A Pre-Trained Model for Programming and Natural Languages

  2. A Literature Study of Embeddings on Source Code

  3. A Survey on Pre-Trained Models of Source Code
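Whatever model is chosen, the evaluation framework ultimately compares embedding vectors; cosine similarity is the standard metric for judging whether a model places semantically similar functions close together (the vectors below are toy stand-ins for real model outputs):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors: 1.0 for
    # identical directions, 0.0 for orthogonal ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

A benchmark could embed pairs of functions known to be clones (or non-clones) and check that cosine similarity separates the two groups, ranking candidate models by that separation.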

Category Software Supply Chain (CHAINS)

Work done as part of the CHAINS research project. See also [other Chains topics](https://chains.proj.kth.se/master-thesis.html).

Empirical Study of Compilation Reproducibility in Solidity

Description: The reproducibility of software builds is a critical aspect of secure software development. This concept has been pushed forward in the context of Solidity, the programming language used for writing smart contracts on the Ethereum blockchain, with the notion of "verified contracts". In this thesis, you will conduct an empirical study on the reproducibility of compilation in Solidity. You will recompile verified Solidity contracts and analyze the consistency of the results. The datasets for this study will be sourced from Etherscan and Sourcify. This research will contribute to the understanding of software integrity in blockchain technology and could potentially inform best practices for Solidity development.

  1. Reproducible Builds: Increasing the Integrity of Software Supply Chains

  2. Etherscan

  3. Sourcify
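One practical detail of this study: Solidity appends CBOR-encoded metadata (including a hash of the source metadata) to the compiled bytecode, with the trailer's length stored in the final two bytes. A sketch of a comparison that separates metadata-only differences from real code differences (`reproducible` is a hypothetical helper name):

```python
import hashlib

def strip_metadata(bytecode: bytes) -> bytes:
    # The last two bytes encode the length of the CBOR metadata trailer;
    # drop the trailer plus those two length bytes.
    n = int.from_bytes(bytecode[-2:], "big")
    return bytecode[: -(n + 2)]

def reproducible(a: bytes, b: bytes) -> bool:
    # Bit-by-bit comparison of the code proper, ignoring the metadata.
    return (hashlib.sha256(strip_metadata(a)).digest()
            == hashlib.sha256(strip_metadata(b)).digest())
```

Classifying recompiled contracts as fully identical, metadata-only divergent, or code-divergent would be a natural first axis of the empirical study.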

Study of non-reproducible builds in the Java ecosystem

Description: Build reproducibility means that a software build always results in a bit-by-bit identical output, provided the source code and build environment are exactly the same [1]. This property is a good safeguard against the threat of a compromised build process [2] and hence an important safeguard for software supply chain security. In the Java ecosystem, Reproducible Central attempts to reproduce Maven/Gradle/sbt artifacts on Maven Central. It does so by building the artifact from source and then comparing it with the artifact in the Maven registry. If the two are bit-by-bit identical, then the Maven package is said to be reproducible; otherwise the package is non-reproducible. In this thesis, you will create a taxonomy of reasons for non-reproducible builds of Maven packages.

  1. https://reproducible-builds.org/

  2. AROMA: Automatic Reproduction of Maven Artifacts

  3. Example problems: https://github.com/algomaster99/reproducible-central/issues/4 and https://github.com/algomaster99/reproducible-central/issues/5

Dynamic Integrity Verification & Repair for Java Applications

Description: Attackers constantly try to tamper with the code of software applications in production. Chang and Atallah have proposed a technique to not only detect modifications but also repair the code after attacks, using a network of small security units called guards. These guards can be programmed to perform tasks such as checksumming the program code, and they work in concert to create mutual protection. In this thesis, you will devise, implement and evaluate such an approach in the context of modern Java software with dependencies. An open question is how to set up guards inside or around dependency code.

  1. Protecting Software Code by Guards

  2. Reflection as a mechanism for software integrity verification

  3. Practical integrity protection with oblivious hashing
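The guard concept can be sketched compactly. The thesis targets Java bytecode; the toy version below checksums Python code objects instead, purely to illustrate the check-at-runtime idea (`make_guard` is a hypothetical name):

```python
import hashlib

def make_guard(fn):
    # A guard (in the sense of Chang & Atallah) records a checksum of
    # the protected code at install time and re-verifies it at runtime.
    # Here we hash the function's instruction bytes and constants.
    def digest():
        code = fn.__code__
        return hashlib.sha256(
            code.co_code + repr(code.co_consts).encode()).hexdigest()
    expected = digest()
    def check() -> bool:
        return digest() == expected
    return check
```

A real deployment would interleave many guards that also check each other, and, following the thesis topic, repair tampered regions from a protected copy rather than merely detect the change.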

Dynamic Introspection of Dependencies in Java Applications

Description: We aim to design and develop a prototype for dynamic introspection of dependencies in Java applications. This would enable real-time tracking and decision-making based on the dependency execution context. By leveraging Java's instrumentation capabilities, the proposed system will monitor and identify the active dependencies at any given point during program execution. The focus will be on minimizing performance overhead to ensure that the introspection process does not significantly impact the application's responsiveness or efficiency, while integrating seamlessly with any existing Java application. Rigorous evaluation against various benchmarks will be done to assess its accuracy, performance, and usability.

  1. Reflection as a mechanism for software integrity verification

Automatic Backporting of Java Libraries to Older Bytecode Versions

Description: With the rapid evolution of Java, libraries often get updated to new bytecode versions. This causes compatibility issues and breakages for applications that are still running on older versions of Java. To address this, a possible solution is to automatically backport Java libraries to older bytecode versions. This thesis will focus on designing and implementing an automated tool for backporting Java libraries. The tool should be capable of translating new bytecode instructions to their older equivalents, maintaining the functional behavior of the library while ensuring compatibility with older Java versions. An open question is how to handle new language features and APIs that do not have direct equivalents in older versions.

  1. Back to the past–analysing backporting practices in package dependency networks

  2. Recommending code changes for automatic backporting of Linux device drivers

  3. Transforming C++11 Code to C++03 to Support Legacy Compilation Environments
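The bytecode version a library targets is visible directly in its class files: every `.class` file starts with the magic number `0xCAFEBABE` followed by minor and major version fields (major 52 = Java 8, 55 = Java 11, 61 = Java 17). A small parser for that header, the natural entry point of a backporting tool:

```python
import struct

JAVA_MAJOR = {52: "Java 8", 55: "Java 11", 61: "Java 17"}

def class_file_version(data: bytes):
    # Header layout: u4 magic, u2 minor_version, u2 major_version,
    # all big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major, minor
```

Rewriting the major version down is the trivial part; the research challenge flagged above is translating instructions and APIs introduced after the target version.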

Invariant Generation from Production Runs

The goal is to propose a novel approach to invariant generation from production runs of software systems, building upon the foundational work of Daikon. We aim to enhance the accuracy and efficiency of invariant generation by integrating machine learning techniques and runtime analysis, thereby improving the quality and applicability of invariants in real-world scenarios.

  1. The Daikon system for dynamic detection of likely invariants

  2. Augmenting Test Oracles with Production Observations
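The Daikon idea in miniature: propose candidate invariants per variable and keep only those that hold on every observed run. A toy sketch over traces represented as dicts (real systems check a much larger grammar of candidate invariants, and production runs add noise that this ignores):

```python
def detect_invariants(traces):
    # traces: list of {variable: value} snapshots from production runs.
    # Assumes all traces observe the same variables.
    invariants = {}
    for var in traces[0]:
        values = [t[var] for t in traces]
        found = []
        if all(isinstance(v, (int, float)) for v in values):
            if all(v >= 0 for v in values):
                found.append(f"{var} >= 0")
            if len(set(values)) == 1:
                found.append(f"{var} == {values[0]}")
        invariants[var] = found
    return invariants
```

The thesis would go beyond this falsification-only scheme, e.g., by ranking candidate invariants with learned models and by coping with the volume and bias of production data.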

Category Crypto & Smart Contracts

Automated Program Repair for Smart Contracts

Description: Smart contracts are software, and hence, cannot be perfect. Smart contracts suffer from bugs, some of which put high financial stakes at risk. There is a new line of research on automated patching of smart contracts. You will devise, perform, and analyze a comparative experiment to identify the successes, challenges, and limitations of automated program repair for smart contracts.

  1. Elysium: Automagically Healing Vulnerable Smart Contracts Using Context-Aware Patching

  2. EVMPatch: Timely and automated patching of ethereum smart contracts

Tracing Private Key Access in Crypto Wallet Dependencies

Description: Software supply chain attacks pose a critical risk to cryptocurrency wallets, particularly when dependencies have access to sensitive cryptographic material like private keys. This research proposes to conduct a comprehensive analysis of major cryptocurrency wallets to trace and map all third-party libraries and suppliers that have potential access to users' private keys during runtime. The investigation will employ static and dynamic analysis techniques to identify the complete dependency chain and privilege levels of each component that interacts with private key material. This is crucial because any compromised dependency with key access could lead to catastrophic loss of funds, as demonstrated in the 2018 Copay wallet incident. The goal is to create a detailed risk assessment framework that ranks wallets based on their private key exposure surface to third-party code, and to propose architectural improvements that minimize unnecessary private key access across the software supply chain.

  1. Annotating, tracking, and protecting cryptographic secrets with cryptompk

  2. Security Aspects of Cryptocurrency Wallets: A Systematic Literature Review

  3. Software supply chain attacks on crypto

  4. Backstabber's knife collection: A review of open source software supply chain attacks

Automatic Exploit Synthesis for Smart Contracts

Smart contracts typically hold large stakes and consequently, they are under constant attack by malicious actors. As a counter-measure, engineering smart contracts involves auditing and formal verification. Another option is automatic exploit synthesis. In this thesis, you will evaluate the state of the art of exploit synthesis for smart contracts. You will then design, implement and evaluate a better system that improves upon the state of the art.

  1. ExGen: Cross-platform, Automated Exploit Generation for Smart Contract Vulnerabilities

  2. FlashSyn: Flash Loan Attack Synthesis via Counter Example Driven Approximation

  3. Smart Contract and DeFi Security: Insights from Tool Evaluations and Practitioner Surveys