How to set up your researcher identity for academic search engines?

by Martin Monperrus

A publication has a set of authors. A single author publishes several papers. Some authors share the same name. An author has (most of the time) at least one first name and one family name.

These rules are easy for humans, but surprisingly hard for a computer program to handle correctly in a fully automated manner. Academic search engines are programs that have to canonicalize and deduplicate authors.

How to maximize the likelihood of proper indexing by academic search engines?

Happy path

The happy path is that you have a single-word first name and a single-word family name. In this case, everything will be fine, as long as you always put your first name first: “Jane Doe”. If you are OK with sticking to this simple format, it will make your researcher life (research page, bibliometrics) and the search engine’s life easier.

Caveat 1 (order): in some languages, the family name comes first (e.g. Chinese). However, most academic search engines are designed according to Western usage. Recommendation: always put the family name in last position.

Caveat 2 (consistency). If you or your co-authors use different spellings of your name, the academic search engines won’t be able to consolidate your publications under the same identity. For example, “{Jean-Christophe} {Bela-Suza}”, “{Jean} {Christophe Bela-Suza}”, “{Jean-Christophe} {Suza}” may be considered different authors. Recommendation: once you choose a naming scheme, stick to it consistently from your first to your last publication. Also, your researcher identity does not have to be exactly the same as your passport identity.
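One way to check your own consistency is to count the distinct spellings of your name across your publication list. The following is a minimal sketch under stated assumptions, not a description of how any search engine works: the helper name spelling_variants and the sample data are hypothetical.

```python
from collections import Counter


def spelling_variants(author_strings):
    """Count each distinct spelling after collapsing whitespace.

    Hypothetical helper for illustration: it only compares strings,
    it does not guess which spellings belong to the same person.
    """
    normalized = (" ".join(s.split()) for s in author_strings)
    return Counter(normalized)


# Your name as it might appear in three of your own papers (made-up data).
my_papers = [
    "Jean-Christophe Bela-Suza",
    "Jean Christophe Bela-Suza",
    "Jean-Christophe  Bela-Suza",  # same spelling, stray extra space
]

variants = spelling_variants(my_papers)
if len(variants) > 1:
    print("Inconsistent spellings found:")
    for name, count in variants.most_common():
        print(f"  {count} x {name}")
```

With this sample data, two spellings are reported, which is a signal that one of the papers should be fixed.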

Classical Caveats

Caveat 3 (compound name): if your first or last name is a compound name, it may be hard for search engines to tell where the first name ends and the family name starts. Recommendation: always use a dash between the parts, for example a first name may be “Jean-Christophe” and a family name may be “Bela-Suza”.
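To see why the dash helps, consider a deliberately naive splitter. This is only a sketch: it assumes that an indexer takes the last whitespace-separated token as the family name, which is a simplification chosen for illustration, and the function name naive_split is hypothetical.

```python
def naive_split(full_name):
    """Split "given family" the way a very simple indexer might.

    Assumption for illustration only: the last whitespace-separated
    token is the family name, everything before it is the given name.
    """
    tokens = full_name.split()
    if not tokens:
        raise ValueError("empty name")
    return " ".join(tokens[:-1]), tokens[-1]


# With dashes, both compound parts survive the split.
print(naive_split("Jean-Christophe Bela-Suza"))

# Without dashes, half of the family name ends up in the given name.
print(naive_split("Jean Christophe Bela Suza"))
```

The first call yields the given name “Jean-Christophe” and the family name “Bela-Suza”, while the second one yields the family name “Suza” only.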

Caveat 4 (diacritics): if your name contains diacritics (e.g. é or â), make sure they are handled correctly in PDF documents (“Josâ” and not “Jos¨a”). In LaTeX, this means that the document must use the right font encoding (\usepackage[T1]{fontenc}, see this page). Recommendation: before pushing a PDF to the internet, always copy-paste your name from the document to check that it is extracted correctly.
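The copy-paste check can be partly automated. The sketch below is a heuristic for Latin-script names, not a complete validator: it flags standalone accent characters and combining marks that could not be merged with a letter. The function name suspicious_characters is hypothetical; unicodedata is part of the Python standard library.

```python
import unicodedata


def suspicious_characters(text):
    """Return characters suggesting a diacritic got detached from its letter.

    Heuristic, assumed for illustration:
    - category "Sk": spacing accents such as the standalone diaeresis U+00A8
    - category "Mn": combining marks still present after NFC normalization,
      i.e. marks that could not be merged with the preceding letter
    """
    composed = unicodedata.normalize("NFC", text)
    return [ch for ch in composed if unicodedata.category(ch) in ("Sk", "Mn")]


# Paste here the name exactly as copied from the PDF.
for pasted in ["Josâ", "Jos¨a"]:
    bad = suspicious_characters(pasted)
    status = "looks fine" if not bad else f"suspicious: {bad}"
    print(f"{pasted!r}: {status}")
```

Scripts that legitimately rely on combining marks would need a different check.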

Caveat 5 (middle name): in Anglo-Saxon countries, there is a tradition of having a middle name (e.g. “Earl T. Barr”). It is typically abbreviated, hence search engines will typically recognize an abbreviated name as a middle name. Recommendation: if you really want to have your middle name as part of your researcher identity, use the abbreviated form, and always use your middle name, consistently.

Caveat 6 (marital status): in some cultures, married people may change family names. If that happens to you, it can be considered unrelated to your researcher identity. Recommendation: stick to one single researcher identity, independently of your marital status.

Continuous fixing

Caveat 7 (mistakes): sooner or later, some author metadata will be wrong. However, it can always be fixed, although it then takes a couple of weeks, or even months, for search engines to discover the change and update their internal database. Recommendation: when you spot a mistake in the spelling of your name, reach out to the publisher to 1) get the metadata fixed and 2) get the PDF updated.

Metadata

Help the search engines with appropriate metadata, starting with creating an ORCID identifier and putting it in all your papers, to the extent possible.
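A mistyped ORCID identifier defeats its purpose, so it can be worth validating it before it goes into a paper. ORCID identifiers end with a check character computed with the ISO 7064 MOD 11-2 scheme. The following is a minimal sketch of that check: the function names are hypothetical, and it only verifies the length and the check character, not that the identifier is actually registered.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_plausible_orcid(orcid):
    """Check the length and check character of an ORCID iD such as 0000-0002-1825-0097.

    Dashes are ignored; this does not query ORCID, so a passing value
    may still not belong to anyone.
    """
    compact = orcid.replace("-", "")
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15].upper()


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_plausible_orcid("0000-0002-1825-0097"))  # True
print(is_plausible_orcid("0000-0002-1825-0098"))  # False: wrong check character
```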
