
Copy-pastable ASCII characters with pdftex/pdflatex

by Martin Monperrus
This document shows how to obtain fully copyable PDF documents from LaTeX. By "fully", I mean that every ASCII character is copied as the same ASCII character, whether with Ctrl-C in a PDF viewer (evince, xpdf, Acrobat Reader, etc.) or with pdftotext on the command line. By default, certain characters are transformed, e.g. ' becomes ’.
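To see the problem on your own setup, a minimal test document along these lines can help. It is only a sketch: the file name test.tex is arbitrary, and the exact characters you get back depend on your fonts and PDF viewer.

```latex
% test.tex (hypothetical file name)
\documentclass{article}
\begin{document}
% With the default encoding, the straight quotes typed below are
% typically typeset as curly quotes, so they may not copy back as ASCII.
It's a "test".
\end{document}
```

Compile it with pdflatex test.tex, then run pdftotext test.pdf - (the final dash sends the extracted text to standard output) and compare the result with what you typed.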

Let's list the printable ASCII characters:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~

Alphanumeric characters are always copyable. Many punctuation characters work out of the box if powerful fonts are correctly set up (e.g. with cm-super): . / : ; < = > ? @ [ ].

Many other characters just need to be escaped in order to be copy-pastable:
\#, \$, \%, \&, \{, \}
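As a small illustration of the escaped forms listed above, here is a minimal document that typesets only those six characters. It is a sketch for experimentation, not a tested claim about any particular font setup.

```latex
\documentclass{article}
\begin{document}
% Each of these characters is special to TeX, so it is written
% with a leading backslash to get the literal character.
\# \$ \% \& \{ \}
\end{document}
```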

The problems start with " (double quotes), \ (backslash), ' (single quote, apostrophe), - (hyphen or minus), ^ (caret), | (pipe), ` (backtick, backquotes, accent grave), ~ (equivalency sign - tilde), _ (underscore).

It turns out that obtaining those characters is not straightforward: it depends on your default fonts and on whether \usepackage[T1]{fontenc} is present.
Here are commands that might work:
On my Debian Linux / TeX Live setup, with cm-super installed and with \usepackage[T1]{fontenc}, the following produces a completely copy-pastable PDF:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\char34\#\$\%\&\verb|\|\textquotesingle()*+,\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
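A possible variant of the T1 document above spells the problematic characters with LaTeX's named text-symbol commands instead of \verb and \char. This is an untested sketch, not the author's method: it assumes T1 encoding with textcomp loaded (\textquotedbl, for instance, is not available in the default encoding), and whether each glyph copies back as plain ASCII still depends on the installed fonts.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp} % provides \textquotesingle and \textasciigrave
\begin{document}
% Same character sequence as above, with named commands for
% " \ ' ^ _ ` | ~ (copy-paste behaviour not verified here).
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\textquotedbl\#\$\%\&\textbackslash\textquotesingle()*+,-./:;<=>?@[]\textasciicircum\textunderscore\textasciigrave\{\textbar\}\textasciitilde
\end{document}
```

The named commands are easier to read and can be used inside arguments of other commands, where \verb is not allowed.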
Without \usepackage[T1]{fontenc}, the following works:
\documentclass{article}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\verb|"|\#\$\%\&\verb|\|\texttt{\char"0D}()*+,\verb|-|./:;\texttt{<}=\texttt{>}?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
See also Copy-pastable listings in PDF from LaTeX