How to Write Good Object-oriented API Documentation

The success of a piece of software depends on many factors, including the quality of its documentation. Documentation comes in many kinds; in this post, we concentrate on one of them: the API documentation of object-oriented software (Java, .NET, Python, Smalltalk, Ruby, etc.). In the Java world, it is often referred to as the "Javadoc" [7] (and sometimes as the "doc comment" [10]). For instance:

import java.io.File;

/** parses Java files to produce an AST */
class JavaParser {

  public JavaParser(File file) { /* ... */ }

  /** returns an AST, must be called once. */
  public AST produceAST() { /* ... */ }
}

class PythonParser:
  """parses Python files to produce an AST"""

  def __init__(self, file):
    ...

  def produceAST(self):
    """returns an AST, must be called once."""
    ...

Since it is a good idea to put the API documentation directly in the source code [10], it may appear at the package, class, method and field levels. Writing good API documentation is difficult. In this post, we present rules that help in writing good API documentation. Some rules are general and apply to all levels; others are more restricted in scope. They are meant to be language independent, hence the examples in both Java and Python.

These rules come from different sources: recent software engineering research papers ([1], [3]), my experience reviewing API documentation written by students, and good posts on the Internet (e.g. [4]).

Don't hesitate to comment on this page if you disagree or would like to add something! Thanks!

--Martin

General Rules

First of all, good API design facilitates understanding, and so do good names (class names, method names, parameter names). I recommend [12] for an in-depth discussion of API design and API naming.

Then, the first rule is (R1): every sentence of the API documentation should add value by giving additional information. If the documentation only copies or rephrases the element name (whether class, method or parameter name), its added value is zero: it simply wastes the writer's and the reader's time.

Examples:

/** parses Java files to produce an AST */ class JavaParser {...
  Added value: the class takes a file as parameter and outputs an object of type AST.
/** sets the name of this object */ void setName(String name) {...
  Added value: none (this comment can be deleted).

There are other ways of phrasing this rule. Dustin Marx puts it as "You Don't Necessarily Need to Javadoc Everything!" [4], and Adam Bien as: sometimes, "No Doc, is the Best Doc". Uncle Bob also argues in this direction [6]. Likewise, don't use tags (e.g. @param, @return) if they only duplicate information.
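To make R1 concrete in Python, here is a minimal sketch contrasting two docstrings for the same kind of setter. The first only rephrases the function name; the second states the normalization, the accepted values and the failure mode, none of which a reader can guess from the name. The functions set_name and set_name_checked and the 64-character limit are hypothetical, chosen only for illustration.

```python
def set_name(record: dict, name: str) -> None:
    """Sets the name."""  # added value: none, it only rephrases the function name
    record["name"] = name


def set_name_checked(record: dict, name: str) -> None:
    """Stores name in record after stripping surrounding whitespace.

    Raises ValueError if the stripped name is empty or longer than
    64 characters; record is left unchanged in that case.
    """
    stripped = name.strip()
    # Validate before mutating, so that a failed call has no side effect.
    if not stripped or len(stripped) > 64:
        raise ValueError(f"invalid name: {name!r}")
    record["name"] = stripped
```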

Furthermore, (R2) the first sentence matters [4]. People always read the first sentence, but there is no guarantee that they will read the rest, so the first sentence should convey the key piece of information. A corollary is that one good sentence is better than five unclear and unstructured ones. Finding a good first sentence is an art and takes time. Starting the first sentence with a verb is always a good idea (especially for class- and method-level comments) [7]. For instance, for class-level API documentation, "provides" forces one to identify a key (and hopefully single) responsibility [7]; for method-level API documentation, "returns" and "computes" are good default choices.
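Documentation tools often show only the first sentence in summary tables (Javadoc does, and so does Sphinx's autosummary extension), which is one more reason for R2. The sketch below uses a hypothetical helper, summary, that mimics such a tool by keeping only the first line of a docstring; the first line of mean starts with a verb and carries the key information on its own.

```python
import inspect


def mean(values: list) -> float:
    """Returns the arithmetic mean of a non-empty list of numbers.

    Raises ZeroDivisionError if values is empty.
    """
    return sum(values) / len(values)


def summary(obj) -> str:
    """Returns the first line of obj's docstring, or "" if it has none."""
    doc = inspect.getdoc(obj)  # dedented docstring, or None
    return doc.splitlines()[0] if doc else ""
```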

Lastly, keep in mind that the API documentation will be read in the source code as much as in a generated output format (e.g. HTML). Too much HTML (and, more generally, too much heavyweight markup) hinders readability when reading the API documentation directly in the code (and is sometimes called "an abomination" [6]).

Class-level API documentation

First of all, don't forget rule R1.

/** represents a bank account */ (added value: none)
class BankAccount {

There are many interesting things to put at the class level (or interface level), which go far beyond the semantics of the class name.

Where to obtain instances? For interfaces and abstract classes, and for classes with no public constructor or with a factory or builder in place, telling the developer where to obtain instances is a very important piece of information.
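Python has no private constructors, so the class docstring is often the only place that tells readers to go through a factory method. In this sketch, the class Connection, its open() factory and the "kv://" URL scheme are all hypothetical.

```python
class Connection:
    """Provides a session with a key-value store.

    Do not call the constructor directly: obtain instances with
    Connection.open(url), which validates the URL first.
    """

    def __init__(self, url: str) -> None:
        # Treated as internal: only open() is meant to call it.
        self._url = url

    @classmethod
    def open(cls, url: str) -> "Connection":
        """Returns a new Connection to url.

        Raises ValueError unless url starts with "kv://".
        """
        if not url.startswith("kv://"):
            raise ValueError(f"unsupported URL: {url!r}")
        return cls(url)

    @property
    def url(self) -> str:
        """Returns the URL this connection was opened with."""
        return self._url
```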

What are the main methods? Not all methods are equally important; certain methods carry the key responsibility of the class. If you point the reader to the main domain methods, she does not have to browse a list of getters and utility methods before grasping how to use the class.

Is the class abstract? (Unless the class name contains "Abstract", see R1.) This is especially important for dynamically typed languages. There is also some added value in languages where the concept (and keyword) "abstract" exists, because the generated documentation may not retain this information. For instance, in Java, abstract class Output ... appears on the package summary page only as "Output: provides a way to output text (whether in the Console or in a File)", without the word "abstract". For abstract classes and interfaces, referring to known subclasses or implementations also adds value.
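A minimal Python sketch of the Output example, assuming hypothetical subclasses ConsoleOutput and FileOutput. Here the standard abc module makes Output() fail with a TypeError, but a plain duck-typed class would give no such signal, so the docstring states that the class is abstract and names the known subclasses.

```python
import sys
from abc import ABC, abstractmethod


class Output(ABC):
    """Provides a way to output text (whether in the console or in a file).

    This class is abstract: instantiate one of its known subclasses,
    ConsoleOutput or FileOutput, instead.
    """

    @abstractmethod
    def write(self, text: str) -> None:
        """Writes text as-is, without appending a newline."""


class ConsoleOutput(Output):
    """Writes text to standard output."""

    def write(self, text: str) -> None:
        sys.stdout.write(text)


class FileOutput(Output):
    """Appends text to the UTF-8 file at path, creating it if needed."""

    def __init__(self, path: str) -> None:
        self._path = path

    def write(self, text: str) -> None:
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(text)
```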

Who are the main clients of this class? A class is meant to be used by other code, and the class-level API documentation is a good place to list the classes and methods that use it. This is closely related to code samples.

How to use the class? (code sample). Like many authors [8][9][11], I consider code samples very important for quickly understanding and correctly using a class, and class-level documentation is the perfect place for them. Writing a good code sample is an art: it should be short, hide unimportant details, and be well documented (see also [11]).
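In Python, a class-level code sample can be written as a doctest, which the standard doctest module can execute so that the sample does not silently go stale. The class WordCounter below is a hypothetical example; its docstring also points to the main methods, as suggested above.

```python
from collections import Counter


class WordCounter:
    """Counts word occurrences in text; the main methods are add() and most_common().

    Example:

    >>> counter = WordCounter()
    >>> counter.add("the cat and the hat")
    >>> counter.most_common(1)
    [('the', 2)]
    """

    def __init__(self) -> None:
        self._counts = Counter()

    def add(self, text: str) -> None:
        """Counts each whitespace-separated word of text, case-insensitively."""
        self._counts.update(word.lower() for word in text.split())

    def most_common(self, n: int) -> list:
        """Returns up to n (word, count) pairs, most frequent first."""
        return self._counts.most_common(n)
```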

Other added value:
- whether the class is immutable
- whether the class can be used in a concurrent setting
- whether some alternative classes exist (and some hints on how to choose between them)
- whether the class takes part in a design pattern (if the class name does not suggest it, see R1)
- how to correctly subclass/implement the class (known as "subclassing directives", see [2], [1])
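Several items of the list above can fit in a few docstring lines. This sketch assumes a hypothetical value class, Money, whose docstring states immutability, thread safety and when to prefer it over an alternative (plain floats).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Holds an amount of money in a single currency; instances are immutable.

    Thread safety: since instances cannot be modified after creation, they
    can be shared between threads without locking.
    Alternatives: prefer this class over float for monetary values, because
    Decimal avoids binary rounding errors such as 0.1 + 0.2 != 0.3.
    """

    amount: Decimal
    currency: str  # ISO 4217 code, e.g. "EUR"

    def plus(self, other: "Money") -> "Money":
        """Returns a new Money holding the sum; raises ValueError on currency mismatch."""
        if other.currency != self.currency:
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)
```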

Method-level API documentation

First, the method name and parameter names are the most basic API documentation: they carry semantics about the purpose of the method and must be chosen carefully [6] (but this post is not about naming). Then, the method-level documentation should start with a verb (a method does something) and can contain many interesting pieces of information.

Does this method have side effects? The method-level API documentation must state to what extent the state of the program is changed:
- at the object level
- at the parameter level (are the parameters modified?)
- at the class level, through static fields
- at the application level (through global variables and changes on the hard drive)
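A minimal sketch of a side-effect section in a Python docstring. The function move_items is hypothetical; its docstring makes explicit that both arguments are modified in place and that nothing else is touched.

```python
def move_items(source: list, target: list) -> int:
    """Moves all items from source to the end of target; returns how many were moved.

    Side effects: source is emptied and target is extended, both in place.
    No other state (object, class or global) is changed.
    Pre-condition: source and target must be two distinct lists.
    """
    count = len(source)
    target.extend(source)  # parameter-level side effect on target
    source.clear()         # parameter-level side effect on source
    return count
```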

What is the returned object (if any)? The API documentation should state the type and the semantics of the returned object. This is especially important for dynamically typed languages.
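For a dynamically typed language, a sketch of what documenting the returned object can look like. The function parse_port is hypothetical; its docstring gives the type, the range of the result, and what None means.

```python
from typing import Optional


def parse_port(text: str) -> Optional[int]:
    """Returns the TCP port in text as an int, or None if text is not a valid port.

    A non-None result is always in the range 1..65535. This function never
    raises for str input: invalid text yields None rather than an exception.
    """
    # isascii() rules out non-ASCII digits such as "²", which int() rejects.
    if not (text.isascii() and text.isdigit()):
        return None
    port = int(text)
    return port if 1 <= port <= 65535 else None
```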

Note that for abstract methods (from abstract classes or interfaces), the API documentation is actually a specification, which is often a contract (pre-conditions, post-conditions) between client code and the implementation, beyond the syntactic contract of the method signature. I would say that the API specification of abstract methods requires even more care than standard API documentation (an excellent example is the specification of java.util.Collection).
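A small Python sketch of abstract methods documented as a contract. The Stack and ListStack classes are hypothetical and only loosely inspired by the contract style described above; they are not a port of java.util.Collection.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Provides a last-in, first-out collection.

    The docstrings of the abstract methods are a contract: every
    implementation must honour their pre- and post-conditions.
    """

    @abstractmethod
    def push(self, item) -> None:
        """Adds item on top of the stack.

        Post-condition: size() grows by one and the next pop() returns item.
        """

    @abstractmethod
    def pop(self):
        """Removes and returns the top item.

        Pre-condition: size() > 0, otherwise IndexError is raised.
        Post-condition: size() shrinks by one.
        """

    @abstractmethod
    def size(self) -> int:
        """Returns the number of items, always >= 0."""


class ListStack(Stack):
    """Stack implementation backed by a list; push() and pop() are amortized O(1)."""

    def __init__(self) -> None:
        self._items = []

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def size(self) -> int:
        return len(self._items)
```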

The method-level API documentation also contains the documentation of the method parameters, which I call parameter-level API documentation. Some added value for parameter-level documentation:
- acceptable range (e.g. an interval for numbers, a regular expression for Strings)
- whether null/None can be passed, and what it means
- relations between parameters (e.g. if a > 0 then b must be > 0)
- type of the parameters (esp. for dynamically typed languages)
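A sketch of parameter-level documentation covering the items above: range, the meaning of None, a relation between parameters, and types. The function page is hypothetical, and the "Args:/Raises:" layout follows the Google docstring convention, one of several common Python conventions.

```python
from typing import Optional


def page(items: list, offset: int, limit: Optional[int] = None) -> list:
    """Returns one page of items, starting at index offset.

    Args:
        items: the full list of results (any element type); not modified.
        offset: index of the first item to return; must satisfy
            0 <= offset <= len(items) (relation with items).
        limit: maximum number of items to return, must be >= 1;
            None means "no limit", i.e. every item from offset onwards.

    Raises:
        ValueError: if offset or limit is out of range.
    """
    if not 0 <= offset <= len(items):
        raise ValueError(f"offset out of range: {offset}")
    if limit is not None and limit < 1:
        raise ValueError(f"limit must be >= 1, got {limit}")
    end = len(items) if limit is None else offset + limit
    return items[offset:end]
```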

Other added value for method-level API documentation:
- whether the method is part of the programming interface (i.e. whether it is public and whether client code can rely on it). This is important for languages without visibility modifiers.
- whether the method takes part in an object protocol, i.e. a recommended or required sequence of method calls (this method must be called after/before that one, this method must not be called twice)
- some performance considerations ("this method is O(1)", "this method is slow, O(n^2)") [3]
- whether the method is synchronized (same argument as for "abstract": the generated documentation may not show it)
- the situations in which the method throws an exception, and the type of the thrown exception
- when callbacks are passed, when they will be called
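A sketch combining several of the items above in one docstring: an object protocol, a performance note, the exceptions raised, and when a callback fires. The class Job and its methods are hypothetical.

```python
from typing import Callable, List, Optional


class Job:
    """Applies a pipeline of steps to a value; call start() once, then result()."""

    def __init__(self, steps: List[Callable[[int], int]]) -> None:
        self._steps = steps
        self._result: Optional[int] = None
        self._started = False
        self._finished = False

    def start(self, value: int,
              on_step: Optional[Callable[[int, int], None]] = None) -> None:
        """Runs every step in order, starting from value.

        Protocol: must be called exactly once, before result(); a second
        call raises RuntimeError.
        Performance: one call per step, i.e. O(n) for n constant-time steps.
        Exceptions raised by a step propagate unchanged.
        Callback: if given, on_step(index, value) is called synchronously,
        in the caller's thread, right after each step.
        """
        if self._started:
            raise RuntimeError("start() must not be called twice")
        self._started = True
        for index, step in enumerate(self._steps):
            value = step(value)
            if on_step is not None:
                on_step(index, value)
        self._result = value
        self._finished = True

    def result(self) -> int:
        """Returns the final value; raises RuntimeError unless start() completed."""
        if not self._finished:
            raise RuntimeError("start() has not completed")
        return self._result
```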

Package-level API documentation

As added value, the package level documentation can provide:
- the rationale behind the creation of this package, for instance "all subclasses of Processors", "only exposes interfaces", "contains classes to filter elements"
- a class diagram if the classes of the package are meant to be used in conjunction
- a description of the domain and the main domain classes

Field-level API documentation

In general, it is good practice to make fields private, so their API documentation is less important. However, it can add value by stating:
- the field semantics (if the field name is not self-describing)
- whether the field value is constant or not
- in which methods the field is used
- the field type (for dynamically typed languages)
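A sketch of field-level documentation in Python, where attributes have no dedicated doc comment. It assumes the common convention of an "Attributes:" section in the class docstring, plus a comment on a class-level constant. The class RetryPolicy is hypothetical, and typing.Final is only checked by static type checkers, not enforced at runtime.

```python
from typing import Final


class RetryPolicy:
    """Decides whether a failed call should be retried.

    Attributes:
        max_attempts (int): total number of attempts, including the first
            one; not modified after construction. Read by should_retry().
    """

    # Class-level constant; Final is only checked by static type checkers.
    DEFAULT_MAX_ATTEMPTS: Final = 3

    def __init__(self, max_attempts: int = DEFAULT_MAX_ATTEMPTS) -> None:
        self.max_attempts = max_attempts

    def should_retry(self, attempt: int) -> bool:
        """Returns True if another attempt is allowed after attempt (1-based)."""
        return attempt < self.max_attempts
```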

Subtleties on the term API documentation

First, API documentation is different from inner-method comments. Unlike such comments, it also documents compiled code ("Javadoc" refers both to the comment in the source and to the generated HTML files). In general, inner comments explain what the code does, while API documentation tells the reader how the element (whether class or method) should be used.
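A small Python sketch of the distinction: the docstring tells callers how to use median, while the inner comments explain how the body works. In Python, the docstring is also kept at runtime (it is what help() displays, unless docstrings are stripped with -OO), whereas comments are discarded. The function is a hypothetical example.

```python
def median(values: list) -> float:
    """Returns the median of a non-empty list of numbers; values is not modified.

    Raises ValueError if values is empty.
    """
    if not values:
        raise ValueError("median of an empty list")
    # Sort a copy so that the caller's list is left untouched.
    ordered = sorted(values)
    mid = len(ordered) // 2
    # Even length: average the two middle elements.
    if len(ordered) % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return float(ordered[mid])
```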

Second, the term API documentation refers to all structural pieces of documentation of Java software. However, it is especially meaningful for libraries and frameworks (the "P" of API) and for public/exposed members (the "I" of API). As a rule of thumb, the more users a code element has, the more care should go into its API documentation.

Bibliography

[1] What Should Developers Be Aware Of? An Empirical Study on the Directives of API Documentation (Martin Monperrus, Michael Eichberg, Elif Tekes, Mira Mezini), Empirical Software Engineering, 2012
[2] Mining Subclassing Directives to Improve Framework Reuse (Marcel Bruch, Mira Mezini, Martin Monperrus), MSR'2010
[3] Increasing Awareness of Delocalized Information to Facilitate API Usage (Uri Dekel), PhD thesis, 2009
[4] More Effective Javadoc (Dustin Marx)
[5] How to JavaDoc (efficient and maintainable) - 6 steps (Adam Bien)
[6] Clean Code (Robert C. Martin) -- Chapter 4: Comments
[7] How to Write Doc Comments for the Javadoc Tool (Oracle)
[8] A Coder’s Guide to Writing API Documentation (Peter Gruenbaum)
[9] Example Embedding (Ohad Barzilay)
[10] API Documentation from Source Code Comments: A Case Study of Javadoc (Douglas Kramer), 1999
[11] Synthesizing API Usage Examples (Raymond P.L. Buse, Westley Weimer), ICSE 2012
[12] Framework Design Guidelines (Krzysztof Cwalina, Brad Abrams)