Copy-pastable ASCII characters with pdftex/pdflatex

by Martin Monperrus

This document explains how to obtain fully copyable PDF documents from LaTeX. By "fully", I mean that every ASCII character is copied as the same ASCII character, whether with Ctrl-C in a PDF viewer (evince, xpdf, Acrobat Reader, etc.) or with pdftotext on the command line. By default, certain characters are transformed; for example, a straight apostrophe ' is typically copied as a curly quote ’.

Let’s list the printable ASCII characters: ’’0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~’’

Alphanumeric characters are always copyable. Many punctuation characters work out of the box if powerful fonts are correctly set up (e.g. with cm-super): . / : ; < = > ? @ [ ].

Many other characters just need to be escaped in order to be copy-pastable: #, $, %, &, {, }
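
As a minimal sketch of the escaping described above (the sample sentence is made up for illustration), each of the six characters is written with a leading backslash:

```latex
\documentclass{article}
\begin{document}
% #, $, %, &, { and } are special to TeX, so each one is escaped with a backslash.
Issue \#42 costs \$5 after a 10\% discount (R\&D budget), for the set \{a, b\}.
\end{document}
```

Whether every one of them is copied back as plain ASCII may still depend on the fonts in use, so it is worth checking the result with pdftotext.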

The problems start with " (double quote), \ (backslash), ' (single quote, apostrophe), - (hyphen or minus), ^ (caret), | (pipe), ` (backtick, backquote, grave accent), ~ (tilde, equivalency sign), and _ (underscore).

It turns out that obtaining those characters is not straightforward: it depends on your default fonts and on whether ’’\usepackage[T1]{fontenc}’’ is present. Here are commands that might work.

On my Debian Linux / TeX Live, with cm-super installed, and **with \usepackage[T1]{fontenc}**, the following produces a completely copy-pastable PDF:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\char34\#\$\%\&\verb|\|\textquotesingle()*+,\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
Without \usepackage[T1]{fontenc}, the following works:
\documentclass{article}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\verb|"|\#\$\%\&\verb|\|\char"0D()*+,\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
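
One practical limitation of the listings above is that \verb cannot be used inside the argument of another command (a footnote, for instance). The following is only a sketch of a possible workaround, assuming the T1 setup from the first listing: the macro names (\cpbackslash, \cphyphen, etc.) are hypothetical and not part of any package. Each macro selects the same character slot in the typewriter font that the corresponding \verb call would, so it should copy the same way, but I have not checked this in every viewer.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
% Hypothetical helper macros: each picks an ASCII slot by its character code
% in the typewriter font, mirroring what \verb|...| typesets.
\newcommand{\cpbackslash}{\texttt{\char92}}   % backslash
\newcommand{\cpcaret}{\texttt{\char94}}       % caret
\newcommand{\cpunderscore}{\texttt{\char95}}  % underscore
\newcommand{\cptilde}{\texttt{\char126}}      % tilde
\newcommand{\cphyphen}{\texttt{\char45}}      % hyphen
\newcommand{\cppipe}{\texttt{|}}              % pipe
\newcommand{\cpquote}{\textquotesingle}       % straight single quote (textcomp)
\begin{document}
% Unlike \verb, these macros can appear inside a command argument such as \footnote.
Run the tool\footnote{For example
\texttt{my\cpunderscore{}tool \cphyphen{}\cphyphen{}help \cppipe{} less}.}
and check the output with pdftotext.
\end{document}
```

The empty braces after each macro stop TeX from swallowing the following space or letter.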

See also [[http://www.monperrus.net/martin/copy-pastable-listings-in-pdf-from-latex|Copy-pastable listings in PDF from LaTeX]]
