I’m often invited to serve as an examiner for PhD theses. In this role, I’ve seen firsthand how much open science practices vary. This post offers guidance for PhD candidates aiming to improve their open science profile and overall research impact.
Open science refers to the movement towards making research more transparent and reproducible. It involves sharing all parts of the research process, including data, code, and analyses, with fellow researchers and the wider public:
- Accessibility: Researchers make their work available to everyone, without barriers.
- Transparency: Detailed documentation allows for verification and replication of results.
- Reproducibility: When data and code are provided, others can reproduce and build upon existing work, which strengthens scientific validity.
Embracing open science is both a moral imperative and a pragmatic strategy to amplify your research impact. When you share your methods, data, and analyses, you not only enhance the credibility of your work but also pave the way for more citations and robust scientific progress.
- Accelerated Discovery: By providing communal access to your research artifacts, you enable others to build upon your work, reducing redundancy and speeding up the pace of discovery.
- Enhanced Impact: Open research garners increased visibility, attracting a diverse range of engagements that often translate into higher citation counts.
What is an Open Science Repository?
An open science repository is a publicly accessible online location for sharing all essential research artifacts. Typically, an open science repository contains:
- Code: All scripts, algorithms, and tools used for data analysis and experimentation.
- Benchmarks: Curated input subjects that help evaluate and compare computational models or experimental methods.
- Datasets: Both raw and processed data sets that provide the foundation for research findings.
- Intermediate Experimental Data: Data and artifacts generated during various stages of your experiments, helpful for tracking your research process.
- Final Experimental Results: The conclusive outcomes and processed data that underpin your research contributions.
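As a rough illustration, the five categories above could map onto a folder skeleton like the one created below. This is only a sketch: the folder names and the `scaffold` helper are assumptions made for this example, not an established convention, so adapt them to your field and tooling.

```python
from pathlib import Path

# Hypothetical folder names, one per artifact category listed above.
LAYOUT = {
    "code": "Scripts, algorithms, and tools used for analysis and experimentation",
    "benchmarks": "Curated input subjects for evaluation and comparison",
    "data/raw": "Raw datasets",
    "data/processed": "Processed datasets",
    "intermediate": "Data and artifacts generated during the experiments",
    "results": "Final experimental results",
}


def scaffold(root):
    """Create the folder skeleton under `root` and return the created folders."""
    root = Path(root)
    created = []
    for rel, description in LAYOUT.items():
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        # A short README per folder tells visitors what belongs there.
        readme = folder / "README.md"
        if not readme.exists():
            readme.write_text(f"# {rel}\n\n{description}.\n", encoding="utf-8")
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in scaffold("my-thesis-artifacts"):
        print(folder)
```

Keeping raw and processed data in separate folders makes it easier for a reader to see which files are inputs and which are derived.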
There are different platforms for sharing open science data, for example:
- GitHub Repository: A platform for hosting code, algorithms, and reproducible analyses, often complete with version control, issue tracking, and contributor guidelines.
- Zenodo Package: A digital repository that archives datasets, supplementary materials, and finalized research results. Zenodo assigns a DOI for your work, making it citable and ensuring long-term preservation and accessibility.
Common Issues in Open Science Repositories
Below are typical obstacles that may hinder the effectiveness of open science practices:
- No Dedicated Repository: Without a single, centralized repository, research artifacts can be scattered, making them hard to locate.
- Insufficient Content: A repository that contains only minimal or superficial material, which I would call “open science theater”, may actually be worse than no repository at all.
- Poor Documentation: Clear guidelines and documentation are essential for reproducibility. Repositories that lack detailed explanations discourage others from leveraging the shared resources.
- Usability Challenges: Repositories that are difficult to use limit the impact of the research.
- Missing Data: Comprehensive sharing involves not only code and methods but also the underlying data. Missing data can lead to challenges in validating research results.
- Unaddressed Issues: Proactive maintenance, including responding to user feedback and troubleshooting, is crucial for ensuring the repository remains a valuable resource.
How to Create an Effective Open Science Repository
A well-structured repository significantly enhances the reproducibility and impact of your research. Use this checklist to ensure your repository meets open science standards:
- Create a dedicated repository with a clear, descriptive title and README
- Include comprehensive documentation with setup instructions and dependency lists
- Share all code including analysis scripts, computational models, and visualization tools
- Provide complete datasets in accessible formats with clear documentation
- Include intermediate results to allow verification of your analysis pipeline
- Document your research environment with version information for all tools and libraries
- Establish a clear organization structure with logical file naming and folder hierarchy
- Add license information specifying how others can use your work
- Create reproducible examples demonstrating key findings
- Include citation information showing how to properly cite your repository
- Test reproduction steps from a clean environment before sharing
- Create permanent links using DOIs or other persistent identifiers
- Respond to issues promptly when users encounter problems
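A few of the checklist items can be checked mechanically before you share the repository. The sketch below looks for a README, a license, citation information, and a dependency list in the top-level folder. The file names in `CHECKS` are assumptions based on common conventions, and the `audit` and `report` helpers are hypothetical names for this example; items such as documentation quality or reproduction from a clean environment still need a human.

```python
from pathlib import Path

# Hypothetical mapping from checklist items to file names that commonly satisfy them.
CHECKS = {
    "README": ["README.md", "README.rst", "README.txt", "README"],
    "License": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "Citation information": ["CITATION.cff", "CITATION.bib", "CITATION.md"],
    "Dependency list": ["requirements.txt", "environment.yml", "pyproject.toml", "Dockerfile"],
}


def audit(repo):
    """Return {checklist item: True/False} for the top-level folder of `repo`."""
    repo = Path(repo)
    return {
        item: any((repo / name).is_file() for name in names)
        for item, names in CHECKS.items()
    }


def report(repo):
    """Format the audit as a plain-text checklist, one line per item."""
    return "\n".join(
        f"[{'x' if ok else ' '}] {item}" for item, ok in audit(repo).items()
    )


if __name__ == "__main__":
    import sys

    # Usage: python audit.py path/to/repository
    print(report(sys.argv[1] if len(sys.argv) > 1 else "."))
```

A passing audit only shows that the files exist; it says nothing about whether their content is complete or correct.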
Inviting me as an examiner
If you’re inviting me to be an examiner for your PhD thesis, please include a dedicated section in your thesis titled “Open Science Resources” that includes:
- Consolidated list of all repository links
- Brief description of what each repository contains
- Access instructions for any gated resources (if applicable)
- Explanation of how each repository relates to the manuscript chapters
This consolidated approach saves time, demonstrates your commitment to open science, and ensures I can thoroughly evaluate the reproducibility and transparency of your research.
Remember that open science is not just about checking boxes; it’s about facilitating genuine scientific progress by collaborating on data and code, not only on papers.
Martin Monperrus
April 2025