Documentation

Document it or it doesn’t exist

What is it?

Documentation is:

  • User’s manual
  • Blueprints
  • Advertisements
  • Sticky notes
  • Textbook
  • Prose

Documentation Types


  • Code comments (in-line, block)
  • Docstrings
  • Readme
  • Project documentation

Code Comments: In-Line


Comments made on the same line as code. Separate them from the code with at least two spaces.


extent = width + 1  # Accounts for screen border

Code Comments: In-Line


Use sparingly. Explain why, not what.


extent = width + 1  # add one to width  

Code Comments: In-Line


Don’t use comments to mask other issues


x = 1.7  # Event magnitude  

Code Comments: In-Line


Don’t use comments to mask other issues


event_magnitude = 1.7  

Code Comments: Block


  • One or more lines which start with #
  • A single space follows each #
  • Can span many lines
  • Should match indentation level of code

Code Comments: Block


# We loop over the data list here because the rows 
# can have different lengths which precludes using
# an array.
for row in data:
    ...

Code Comments: Block


Don’t state the obvious

# Loop over 0-9, add 42 and print.
for num in range(10):
    new_num = num + 42
    print(new_num)

Code Comments: Block


Note

In some cases, a comment above a group of lines that explains what they do can help with “human parsability”. But it may be better to move those lines into a function with a descriptive name and a docstring.
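
As a rough sketch of that second option, the loop from the previous slide could become a small function. The name `shifted` and its `offset` parameter are made up for illustration:

```python
def shifted(numbers, offset=42):
    """Return each number in ``numbers`` increased by ``offset``."""
    return [num + offset for num in numbers]


# The call site now reads as a sentence, so no block comment is needed.
for new_num in shifted(range(10)):
    print(new_num)
```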

Code Comments: Block


Explain/justify unorthodoxy

# Keep imports sorted this way to avoid circular
# imports between sub-modules
from . import utils
from . import processing

Code Comments: Block


Comment on attributes/variables meaning.

# Planck's constant in m**2 kg / s
planck = 6.62607015e-34

Code Comments: In the Wild


class record(nt.void):
    """A data-type scalar that allows field
     access as attribute lookup.
    """
    # manually set name and module so that 
    # this class's type shows up as numpy.record 
    # when printed
    __name__ = 'record'
    __module__ = 'numpy'

Code Comments: In the Wild


# GH#23758: We may still need to 
# localize the result with tz
# GH#25546: Apply tz_parsed first (from arg),
# then tz (from caller)
# result will be naive but in UTC
result = (
    result.tz_localize("UTC")
    .tz_convert(tz_parsed)
)

Understanding Check

How can we make this comment better?

import numpy as np

if __name__ == "__main__":
    # Create sin array with given amplitude and frequency
    dt = 1. / 1_000
    time = np.arange(1_000) * dt
    amplitude = 10  # amplitude
    frequency = 10  # frequency in Hz
    sin_data = amplitude * np.sin(2.0 * np.pi * frequency * time)

Understanding Check

Delete it!

import numpy as np

def make_sin_array(dt, samples, amplitude, frequency):
    """Create a sin wave.
    ...
    """
    time = np.arange(samples) * dt
    sin_data = amplitude * np.sin(2.0 * np.pi * frequency * time)
    return sin_data

if __name__ == "__main__":
    sin_array = make_sin_array(dt=1 / 1_000, samples=1_000, amplitude=10, frequency=10)

Docstrings


A docstring is a string literal placed as the first statement of a function, method, class, or module. It explains the purpose and use of the code.


def recombobulate(bob, bits):
    """Put Bob and bits back together"""

Docstrings


Docstrings are accessed via __doc__


def recombobulate(bob, bits):
    """Put Bob and bits back together"""

print(recombobulate.__doc__)
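
The standard library also offers `help()` and `inspect.getdoc()`. The latter returns the same text with the source indentation stripped. A small sketch, where the second docstring paragraph is added only to show the difference:

```python
import inspect


def recombobulate(bob, bits):
    """Put Bob and bits back together.

    The indented second paragraph shows what getdoc cleans up.
    """


# Raw attribute: continuation lines keep their source indentation.
raw = recombobulate.__doc__

# inspect.getdoc: same text with the common indentation removed.
clean = inspect.getdoc(recombobulate)

print(clean)
```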

Docstrings


Ideal docstrings for public objects (names that don’t start with _):

  • Start with a single line description
  • Describe the input parameters
  • Describe the return values (when needed)
  • Provide an example

Docstrings


There are several common flavors of docstrings

  • Numpy
  • ReST
  • Google
  • Just markdown (I wish)

Docstrings: Numpy


  • Most common docstring style for scientific/machine learning codes
  • More human readable than some other styles
  • Uses several sections underlined with dashes (-----)
  • Examples section uses standard doctest format

Docstrings: Numpy

def recombobulate(bob, bits=None):
    """Create new Bob with bits put back.
    
    Parameters
    ----------
    bob : Bob
        The Bob object which was discombobulated.
    bits : Bits, optional
        The Bob bits which fell off.
    
    Returns
    -------
    Bob
        A new Bob object without missing bits.
        
    Examples
    --------
    >>> new_bob = recombobulate(bob, bits)
    """

Docstrings: Numpy

def recombobulate(
    bob: Bob,
    bits: None | Bits = None,
) -> Bob:
    """Create new Bob with bits put back.

    Parameters
    ----------
    bob
        The Bob object which was discombobulated.
    bits
        The Bob bits which fell off.

    Examples
    --------
    >>> new_bob = recombobulate(bob, bits)
    """

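To try the type-hinted version, `Bob` and `Bits` need definitions. A minimal runnable sketch follows, assuming two hypothetical dataclasses and a made-up function body. None of this comes from a real library:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bits:
    """Hypothetical container for the pieces that fell off a Bob."""

    names: list = field(default_factory=list)


@dataclass
class Bob:
    """Hypothetical Bob, tracked only by the parts it currently has."""

    parts: list = field(default_factory=list)


def recombobulate(bob: Bob, bits: Optional[Bits] = None) -> Bob:
    """Create new Bob with bits put back.

    Parameters
    ----------
    bob
        The Bob object which was discombobulated.
    bits
        The Bob bits which fell off.

    Examples
    --------
    >>> bob = Bob(parts=["head"])
    >>> recombobulate(bob, Bits(names=["arm"])).parts
    ['head', 'arm']
    """
    # Illustrative body: build a new Bob rather than mutating the input.
    extra = bits.names if bits is not None else []
    return Bob(parts=[*bob.parts, *extra])
```

The types live only in the signature, so the docstring does not repeat them.
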
Docstrings: Doctests

def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> factorial(30)
    265252859812191058636308480000000
    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0
    """

Docstrings: Doctests

Directives


"""
>>> print(list(range(20)))  # doctest: +NORMALIZE_WHITESPACE
[0,   1,  2,  3,  4,  5,  6,  7,  8,  9,
10,  11, 12, 13, 14, 15, 16, 17, 18, 19]
"""

Docstrings: Doctests

Directives


"""
>>> print(list(range(20)))  # doctest: +ELLIPSIS
[0, 1, ..., 18, 19]
"""

Docstrings: Doctests


Examples can (and should) be run to make sure they still work. Pytest does this:


pytest --doctest-modules
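
The standard library's `doctest` module can run the same examples without pytest, either with `python -m doctest file.py` or from code. A minimal sketch of the second route, using a hypothetical `double` function:

```python
import doctest


def double(values):
    """Return a list with every value doubled.

    >>> double([1, 2, 3])
    [2, 4, 6]
    >>> double(range(20))  # doctest: +ELLIPSIS
    [0, 2, ..., 36, 38]
    """
    return [2 * value for value in values]


# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# results.failed counts broken examples; results.attempted counts all of them.
print(results)
```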

Docstrings: Tips


  • Use a spell checker
  • Use complete sentences
  • Try to be empathic, try to be a friend
  • It’s fine if the docstring is longer than the code
  • Type hints are good too; don’t repeat them in the docstring

Understanding Check

What’s wrong here?

def my_func(thing_a, thing_b):
    """
    
    Parameters
    ----------
    thing_a
        The first thing 
    """

Understanding Check


When to use a comment vs a docstring?

Readme

Your project’s elevator pitch / note in a bottle

  • Concise
  • Talk is cheap, show code right away
  • Describe basic functionality/features
  • After 2 minutes readers should:
    • Know if the library might solve their problem
    • See an example
    • Know if the code is well used/maintained (badges)

Project Documentation

  • Supplement/explain the codebase
  • Are written in ReST, Markdown, or notebooks
  • Usually compiled to HTML/PDF
  • Can include:
    • Readme
    • Tutorials
    • Technical references
    • How-to guides

Diátaxis

Tutorials

Take the reader through a series of steps (learning-oriented).

  • Get the user started
  • Provide a complete picture at the start
  • Ensure that the tutorial works reliably
  • Ensure the user sees results immediately
  • Describe concrete steps, not abstract concepts
  • Offer only the minimum necessary explanation
  • Ignore options and alternatives

How-tos

Walk the reader through solving a real-world problem (goal-oriented).

  • Solve a problem
  • Don’t explain concepts
  • Are flexible
  • Omit unnecessary info
  • Are well named

References

Technical descriptions of the machinery and how to operate it (information-oriented).

  • Are consistent
  • Do nothing but describe
  • Provide examples
  • Are accurate
  • “One hardly reads reference material; one consults it.”
  • Most common reference: API documentation

Explanation

Clarifies and illuminates a particular topic (understanding-oriented).

  • Makes connections
  • Provides context
  • Talks about the subject
  • Discusses alternatives and opinions
  • Doesn’t instruct or provide technical reference
  • Example: theory / design decisions

Advice


  • Don’t cross the streams! Try to keep each doc type separate but add links where needed.
  • Ensure code is executed when building docs (when possible)
  • Have someone who doesn’t know the library review the docs

Building and Hosting Docs


The most common libraries for building Python docs:

The pages can be hosted on Read the Docs, GitHub Pages, Netlify, or elsewhere.

Virtual Field Trip


With a partner, check out two of these projects’ readmes: