Open-science and Double-blind Peer-Review

Recently, double-blind peer-review has swept through my research community like a storm. Unfortunately, beyond its noble goal of reducing unfairness, double-blind peer-review may have detrimental side effects on open-science.

Open-science is a scientific movement that aims to improve the scientific process through openness, where openness mostly refers to transparency and freedom. Intuitively, transparency seems to be at odds with blindness. While “double-blind open-science” is not a perfect oxymoron, there are still important issues to consider, which are discussed in this article.

In this article, I argue that open-science must not suffer from double-blind peer-review. The main reason is that the timely dissemination of knowledge is paramount. In particular, I explain that the boundary of anonymization is the paper plus its appendix (possibly an online appendix). The boundary of “blindness” is not the whole world. Consequently, the reviewer is solely responsible for breaking double-blind as soon as she searches on Google, or anywhere else on the Internet.

The Risks of Double-blind Review for Open-science

Over the past years, I have witnessed two problems that double-blind review poses for open-science:

a risk to reproducible science: under a double-blind review process, authors may think that attaching data or an online appendix would break double-blind, or they may skip the creation of an anonymized online appendix to save valuable time.
a risk to early dissemination: under a double-blind review process, authors may not publish preprints / working papers, or, even worse, they may be forbidden to do so.

I discuss those two points in the following.

Anonymization of Open-science Data

For experimental disciplines, an open-science approach to submitting papers is to always attach the data or code that supports the claims in the paper. This has two main advantages. First, reviewers can complement their reviews of the paper by looking at the data, and they can qualify their assessment by checking whether the data is good enough to be used in future research. Second, if the assessment of the code/data is part of the review process, this creates a strong incentive for authors to build a good reproduction package.

It must be emphasized that double-blind does not prevent open-science: double-blind review does not mean absence of data or absence of appendix. As with paper anonymization, in a double-blind review process, the open-science data or code that is in the online appendix must be anonymized. The authors must:

anonymize URLs: the name of the institution/department/group/authors should not appear in the URLs of the open-science appendix
anonymize the appendix content itself

Rule 1: Under double-blind, make your data open for review, as with single-blind, but take care of anonymization.

Anonymizing an open-science appendix takes some work, but fortunately this can be automated: see “Github anonymous” below.
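As an illustration of the two requirements above (URLs or paths, and content), here is a minimal Python sketch of a pre-submission check. It is only an assumption of how one might do it, not part of any tool mentioned in this article: the function name find_identifying_terms and the term list are hypothetical, and matching is case-insensitive by choice.

```python
import re
from pathlib import Path


def find_identifying_terms(root, terms):
    """Return (relative_path, term) pairs for each term found in a file path or file content.

    Hypothetical helper: `terms` is a list supplied by the authors,
    e.g. institution name, author names, logins.
    """
    terms = [t for t in terms if t]
    if not terms:
        # An empty pattern would match everywhere, so stop early.
        return []
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # File and directory names can leak identity as much as content does.
        for match in pattern.finditer(relative):
            hits.append((relative, match.group(0).lower()))
        # Read as text and ignore undecodable bytes so binary files do not stop the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for match in pattern.finditer(text):
            hits.append((relative, match.group(0).lower()))
    return hits
```

Calling find_identifying_terms("appendix", ["example university", "jdoe"]) on a local copy of the appendix would return an empty list when nothing identifying is left; any pair it returns points at a file to fix before submission.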

Reviewer Responsibility

Science is a conversation. Ideas flow. The Internet is the most wonderful salon ever, where scientific ideas, data and code are exchanged at a very high speed. Search engines are built for finding them super efficiently. Consequently, it is not surprising to be able to identify the authors of a paper by using search engines. There are many reasons for this: 1) the paper under review resembles previous ones by the same authors, 2) the paper under review has already been discussed during public outreach, 3) a previous version of the paper has already been put online, e.g., as a working paper on a webpage or on Arxiv (this often happens when a paper is a resubmission). In short, there is a high chance that a single query on Google (or Google Scholar) will reveal the identity of the authors. This is just fine. Authors can do nothing about this.

In double-blind peer-review, the boundary of anonymization is the paper plus its online appendix, and only this: it is not the whole world. Googling any part of the paper or the online appendix can be considered a deliberate attempt to break anonymity. This means that taking care of anonymity is not only on the author side, but also on the reviewer side. Both may be responsible for breaking double-blind.

Rule 2: The reviewer is solely responsible for breaking double-blind as soon as she searches for information about the paper, whether by asking her colleagues, Google, or anywhere else on the Internet.

The authors are not responsible if it is possible to find their work, or traces of their work, or comments on their work on the Internet.

Double-blind and Preprints / Arxiv

It is often asked whether one can publish a work on Arxiv while it is under review in a double-blind process. There are different answers:

legally: yes, one can publish on Arxiv; submitting a paper usually does not bind you in any way.
practically: maybe, it depends on whether the review guidelines explicitly allow or forbid it (don’t hesitate to ask the editor / PC chair in advance). If the guidelines are unclear, it is a trade-off for the authors between the risk of having the paper rejected because of an Arxiv version and the benefit of being able to claim precedence and of being cited early.
ethically: yes, especially if you believe that scientific dissemination is above the artifacts of peer-review.

My own humble answer is YES, one can publish a work on Arxiv that is under double-blind review. Even more, one should really do so. First, because the essence of science is the dissemination of knowledge, and this is what a preprint is all about. A preprint is a good starting point for a scientific conversation.

Second, because publishing a preprint is not only about dissemination, it is also about being able to claim precedence for a discovery, an idea or an invention. Double-blind peer-reviewers are no less likely to reuse or leak an idea, or to simply “be inspired” by your work. This holds whether your paper is accepted or not. You actually need a preprint backup much more if your paper is rejected…

Also, publishing on Arxiv is a way to have an early impact. It has happened to me several times that my work has been cited by papers in the same conference, because it was published on Arxiv and consequently could be read and cited early.

Rule 3: Double blind allows you to publish on Arxiv as early as needed, and the incentives of early preprints – dissemination, precedence – are as clear and strong as with single-blind peer-review.

(Not to mention the case of papers that are resubmitted after one or several rejections, which should obviously be made public before the final acceptance…)

Automated Anonymization of Open-Science Github Repository

Github is now heavily used to host scientific code and data. The open-access platform Zenodo, supported by CERN, even assigns DOIs to Github releases! Now, let’s come back to the idea that double-blind peer-review should not prevent an open-science appendix. What does this mean for Github open-science repositories?

It means that for a Github replication repository accompanying a paper under double-blind review:

the owner / organization / repository name must be anonymized
the content of the repository must be anonymized.

Imagine how tedious this is, especially under the pressure of a deadline. Consider the tension between doing good open-science and keeping precious hours for doing something other than anonymizing an open-science Github repository.

The good news is that this can be automated! My talented student Thomas Durieux has developed anonymous_github, which automatically anonymizes both the URL and the content of a Github repository. The anonymization of the URL is achieved through proxying the requests, and the anonymization of the content is done by replacing all occurrences of words in a list by “XXX”. The word list is provided by the authors, and typically contains the institution name, author names, logins, etc.
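To make the word-list idea concrete, here is a minimal Python sketch of the content replacement step. It is not the code of anonymous_github: the function name anonymize_text is hypothetical, and matching case-insensitively and trying longer words first are my own assumptions for the illustration.

```python
import re


def anonymize_text(text, words, mask="XXX"):
    """Replace every occurrence of each listed word by the mask, ignoring case.

    Illustrative sketch only; `words` stands for the author-provided list.
    """
    # Drop empty entries, which would otherwise match at every position.
    words = [w for w in words if w]
    if not words:
        return text
    # Try longer words first so that "Jane Doe" wins over "Doe" where both match.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    return pattern.sub(mask, text)
```

For instance, anonymize_text("Contact jdoe at Example University.", ["jdoe", "Example University"]) gives "Contact XXX at XXX.". Note that plain substring replacement also masks a listed word when it appears inside a longer one, so short entries in the word list deserve some care.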

A public instance of anonymous_github is hosted at 4open.science:

http://anonymous.4open.science/

To use it, one simply fills in the Github repo URL and the word list on the main page (the word list can be updated afterwards).

Other Aspects of Open-science and Peer-review

For an overview of open-science, I refer the reader to [@fecher2014open], who survey the different facets of open-science, in particular open infrastructure, open access, and open scientific data. However, they do not discuss the relationship between open-science and peer-review.

Beyond open scientific data and early dissemination with preprints, open-science has already met peer-review in a number of ways:

some claim that the reviews of accepted papers should be made public, so that everybody can understand why and how a paper has been accepted [@carmi2007improving].
others propose “attributed peer-review”, where reviewers sign their reviews and take full responsibility for what they write.
in some communities, a submission is simply an Arxiv paper id. By doing so, transparency is maximal: everybody can see the initial, reviewed form of accepted papers.

Murphy discusses peer-review of data [@murphy2016update]. From the perspective of double-blind peer-review, this poses the interesting problem of anonymizing research data. While “Github anonymous” is a first step in this direction, there is certainly more to be done.
