Documentation¶
Most people think of writing documentation as an unpleasant but necessary task, done for the benefit of other people with no real benefit to themselves. So they choose not to do it, or they do it with little care.
But even if you are the only person who will ever use your code, it’s still a good idea to document it well. Being able to document your own code gives you confidence that you understand it yourself, and a sign of well-written code is that it can be easily documented. Code you wrote a few weeks ago may as well have been written by someone else, and you will be glad that you documented it.
The good news is that writing documentation can be fun, and you really don’t need to write a lot of it.
Docstrings and Comments¶
Documentation is not comments.
A docstring in Python is a string literal that appears at the beginning of a module, function, class, or method.
"""
A docstring in Python that appears
at the beginning of a module, function, class or method.
"""
Let’s add a trivial docstring to our code.
def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
The docstring of a module, function, class or method becomes the __doc__ attribute of that object, and is printed if you type help(object):
In [1]: from python201.algorithms import cumulative_product
In [2]: help(cumulative_product)
Help on function cumulative_product in module python201.algorithms:

cumulative_product(array)
    Compute the cumulative product of an array of numbers.
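The same text is also available programmatically. As a minimal sketch (the function is repeated so the snippet is self-contained), the raw string lives in __doc__, and the standard library's inspect.getdoc returns it with the surrounding whitespace cleaned up:

```python
import inspect


def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result


# The raw attribute holds the string literal, including the newlines
# around the text.
raw = cumulative_product.__doc__

# inspect.getdoc() removes common indentation and the blank lines at
# the start and end, which is what most tools display.
clean = inspect.getdoc(cumulative_product)
print(clean)
```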
A comment in Python starts with a # and runs to the end of the line:
# a comment.
Note
The purpose of a docstring is to document a module, function, class, or method. The purpose of a comment is to explain a very difficult piece of code, or to justify a choice that was made while writing it.
Docstrings should not be used in place of comments, or vice versa. Don’t do the following:
def cumulative_product(array):
    # Compute the cumulative product of an array of numbers.
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
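One practical reason, sketched here with two hypothetical functions that are not part of the tutorial's package: a comment is thrown away when the code is compiled, so help() and documentation generators have nothing to show, whereas a docstring stays attached to the object.

```python
def commented(values):
    # Return the values as a new list.
    return list(values)


def documented(values):
    """Return the values as a new list."""
    return list(values)


# The comment never reaches the function object; the docstring does.
print(commented.__doc__)
print(documented.__doc__)
```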
Deleting code¶
Incidentally, many people use comments and string literals as a way of “deleting” code - also known as commenting out code. See this article on a better way to delete code.
What to document?¶
So what goes in a docstring?
At minimum, the docstring for a function or method should consist of the following:
A Summary section that describes in a sentence or two what the function does.
A Parameters section that provides a description of the parameters to the function, their types, and default values (in the case of optional arguments).
A Returns section that similarly describes the return values.
Optionally, a Notes section that describes the implementation, and includes references.
Let’s add some more information to our docstring.
def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.

    Parameters:
        array (list): An array of numeric values.

    Returns:
        result (list): A list of the same shape as `array`.
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
Here we’ve followed a particular style guide: Google’s documentation guidelines. Sphinx can parse docstrings in this style through its sphinx.ext.napoleon extension (more on Sphinx later). NumPy’s documentation guidelines are also a great reference for what and how to document in your code, and there are other style guides you might prefer.
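For comparison, here is a sketch of the same docstring laid out in NumPy's style, which uses underlined section headers instead of indented ones. The descriptions are carried over unchanged; only the layout differs.

```python
def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.

    Parameters
    ----------
    array : list
        An array of numeric values.

    Returns
    -------
    result : list
        A list of the same shape as `array`.
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
```

Whichever style you pick, the important thing is to use it consistently across the project.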
Doctests¶
In addition to the sections above, your documentation can also contain runnable tests. This is possible using the doctest module. Include a section of examples in the following format and pytest will discover them and validate that the expected output is indeed generated.
def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.

    Parameters:
        array (list): An array of numeric values.

    Returns:
        result (list): A list of the same shape as `array`.

    Example:
        >>> cumulative_product([1, 2, 3, 4, 5])
        [1, 2, 6, 24, 120]
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
You can tell pytest to run doctests as well as other tests using the --doctest-modules switch:
$ pytest --doctest-modules python201/algorithms.py
================================== test session starts ===================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/glentner/code/github.com/glentner/python201
plugins: hypothesis-5.20.3
collected 1 item
python201/algorithms.py . [100%]
=================================== 1 passed in 0.02s ====================================
Note
Doctests are great because they double up as documentation as well as tests. But they shouldn’t be the only kind of tests you write.
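pytest is not required to run doctests. As a minimal sketch using only the standard library's doctest module, the examples in a single docstring can be parsed and run directly. The function is repeated (with a shortened docstring) so the snippet is self-contained.

```python
import doctest


def cumulative_product(array):
    """
    Compute the cumulative product of an array of numbers.

    Example:
        >>> cumulative_product([1, 2, 3, 4, 5])
        [1, 2, 6, 24, 120]
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result


# Extract the examples from the docstring. The globs dict supplies the
# names that the examples are allowed to use.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    cumulative_product.__doc__,
    {"cumulative_product": cumulative_product},
    "cumulative_product",  # name used in failure reports
    None,                  # filename (unknown here)
    0,                     # line number offset
)

# Run the examples and compare the actual output with the expected output.
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

In a module of your own, the more common idiom is to call doctest.testmod() under an if __name__ == "__main__": guard, or to run python -m doctest on the file.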
Documentation-Driven Development¶
Just as test-driven development (TDD) forces you to think clearly about how the feature you intend to develop should behave, so too does documentation-driven development (DDD).
The idea is as follows: you must first be able to describe what the thing does before you can build the thing that does it. In this way, documentation-driven development precedes test-driven development. Think of writing your docstrings first as a sort of planning phase. Once you’ve sorted out the documentation, write the tests your code should pass; then, and only then, write the implementation.
Note
We have of course gone in precisely the wrong order in this tutorial, but it’s a tutorial, so we’ll make an exception for ourselves.
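As a rough sketch of that order of work, consider a hypothetical cumulative_sum function (it is not part of this tutorial's package). The docstring comes first on a stub, then a test that encodes the documented behaviour, and only then the implementation.

```python
# Step 1: describe the behaviour in a docstring before any logic exists.
def cumulative_sum(array):
    """
    Compute the cumulative sum of an array of numbers.

    Parameters:
        array (list): An array of numeric values.

    Returns:
        result (list): A list of the same shape as `array`.
    """
    raise NotImplementedError


# Step 2: write a test for the documented behaviour.
# Run against the stub above, it fails, as expected.
def test_cumulative_sum():
    assert cumulative_sum([1, 2, 3, 4]) == [1, 3, 6, 10]
    assert cumulative_sum([]) == []


# Step 3: replace the stub body with the implementation, keeping the
# docstring. (Redefined here only to show the steps in one file.)
def cumulative_sum(array):
    """
    Compute the cumulative sum of an array of numbers.

    Parameters:
        array (list): An array of numeric values.

    Returns:
        result (list): A list of the same shape as `array`.
    """
    result = []
    total = 0
    for value in array:
        total += value
        result.append(total)
    return result
```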
Extras¶
Automatic Documentation Generation¶
Finally, you can turn your documentation into a beautiful website (like this one!), a PDF manual, and various other formats, using a document generator such as Sphinx.
Sphinx¶
For a Python project like this, it is common practice to have a docs folder at the top level of your project with the source to a Sphinx website. We won’t include a complete guide to using Sphinx here; there are many such guides online.
To get started, create the directory and run the sphinx-quickstart command from inside it. It will ask you about a few options.
$ mkdir docs
$ cd docs
$ sphinx-quickstart
Depending on how you answered the prompts from the quickstart command, you will have a new source tree with an index.rst and a conf.py file. The build directory will either be within this same folder as _build, or you will have explicit, adjacent source and build directories. Either setup is fine; I prefer to have them separate.
The conf.py file is your Sphinx configuration for the project. It contains essential, high-level information (e.g., the name and version number of your project, copyright information, etc.), as well as detailed options that may be specific to the theme you are using. Typically, Sphinx themes are installable as pip packages and need merely to be assigned in conf.py. We’re using the pydata_sphinx_theme.
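As an illustration, a minimal conf.py might look like the sketch below. The names project, author, copyright, release and html_theme are standard Sphinx configuration values; the values assigned to them are placeholders, not this project's actual settings. The theme package has to be installed separately (for example with pip install pydata-sphinx-theme).

```python
# conf.py: minimal Sphinx configuration (illustrative values only)

# High-level project information shown in the generated pages.
project = "python201"
author = "Your Name"            # placeholder
copyright = "2020, Your Name"   # placeholder; Sphinx expects this name
release = "0.1.0"               # placeholder version string

# Select the theme by name once its package is installed.
html_theme = "pydata_sphinx_theme"
```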
The pages for your documentation are reStructuredText files (kind of like Markdown), and the top-level index.rst (as well as the index.rst within any folder) behaves just as an index.html page would.
To build your documentation, use the provided Makefile (or make.bat on Windows).
$ make html
Sphinx doesn’t just create HTML. The whole point of Sphinx is that you create layers of content files that you can build into multiple formats, including HTML, PDF, man pages, etc.
The nice thing about using Sphinx with Python is that it knows about Python docstrings. We’ll neglect a full exposition here, but to illustrate the point, documenting the API for your project could quite literally be as simple as creating an api.rst page with something like the following.
API
===

.. automodule:: python201
    :members:

:mod:`python201.algorithms`
---------------------------

.. automodule:: python201.algorithms
    :members:
If we maintain a certain style in our docstrings as described here, now we only need to manage a single copy of the documentation! Sphinx can pull out and format our docstrings into a fully functioning website!
Note
This kind of special functionality and other features like it are often provided as a built-in or third-party extension; in this case we are using the built-in sphinx.ext.autodoc extension. You can simply add these to the list of extensions activated in your conf.py.
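A sketch of the relevant part of conf.py follows. It assumes that conf.py sits in a docs folder directly below the project root; adjust the path if you keep separate source and build directories. sphinx.ext.napoleon is included on the assumption that your docstrings use Google or NumPy style sections, as the examples in this tutorial do.

```python
# conf.py (excerpt)
import os
import sys

# autodoc imports your package to read its docstrings, so the package
# must be importable. Assumption: conf.py lives in <project>/docs/.
sys.path.insert(0, os.path.abspath(".."))

extensions = [
    "sphinx.ext.autodoc",    # pull documentation from docstrings
    "sphinx.ext.napoleon",   # understand Google/NumPy style sections
]
```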
Hosting¶
If you put your project under version control, typically using git, and host it online using a provider (such as github.com), you can use webhooks to automatically trigger an update to a website. Basically, services can register themselves with your repository, and when a particular event occurs (like a push to the master branch), they’ll take some action (like pulling the latest changes and rebuilding the website).
This tutorial is hosted using GitHub Pages. In the repository settings on GitHub I have it pointing to my docs folder, with some additional necessary bits to tell GitHub what lives where. When I push changes to GitHub, it automatically syncs the contents of my docs/build directory.
Many open-source projects like to use readthedocs.org, especially for Python projects. You can create an account and authenticate with GitHub, point to your repository, and follow some simple setup procedures. Not only will it host your Sphinx documentation, it will build it for you!
Type Annotations¶
A relatively new concept in Python (3.5+), type annotations are a powerful feature that lets you be more precise about your intentions with code. Many of the tools we rely on to develop code can use type annotations to help you catch bugs before you even get to your unit tests.
A trivial example might be as follows.
def greeting(name: str) -> str:
    return 'Hello ' + name
Here we’re saying that name should be of type str and that greeting also returns a str. The topic of type annotations can unveil some deep philosophical questions about how to write Python code, or even what it means for code to be Pythonic. We won’t crack that egg (pun intended) open here, but type annotations are an officially supported part of the language, and with tooling like what we’ll point out next, they let you perform type checking at development time instead of at run time.
The mypy project provides static type checking for your project using these type annotations. Editors like PyCharm will alert you if you use a method in a way that doesn’t conform to the annotations provided.
Type annotations in Python, in a sense, are part of documentation-driven development. If you cannot annotate your code, perhaps you should reconsider its design. And you will thank yourself later when trying to use your own code.
Note
Type annotations currently are not (and may never be) “real code”. That is, Python does not enforce them at run time: it is not in fact an error to provide an argument that doesn’t conform to the given type annotation.
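A short sketch makes this concrete. The greeting function is the one from above; double is a hypothetical function added only for this illustration.

```python
def greeting(name: str) -> str:
    return 'Hello ' + name


def double(value: int) -> int:
    return value * 2


# Annotations are stored on the function object as plain metadata...
print(greeting.__annotations__)

# ...but nothing checks them at run time. Passing a str where an int
# is annotated runs without error and simply repeats the string.
print(double("ab"))
```

A static checker such as mypy would report the call double("ab") as an incompatible argument type, which is exactly the kind of mistake annotations are meant to surface before the code runs.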
We can add annotations to our code as follows.
from typing import List


def cumulative_product(array: List[float]) -> List[float]:
    """
    Compute the cumulative product of an array of numbers.

    Parameters:
        array (list): An array of numeric values.

    Returns:
        result (list): A list of the same shape as `array`.

    Example:
        >>> cumulative_product([1, 2, 3, 4, 5])
        [1, 2, 6, 24, 120]
    """
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
Note
Using type float in this instance is actually sufficient to annotate as a generic numeric type.
From PEP 484:
“Rather than requiring that users write import numbers and then use numbers.Float etc., this PEP proposes a straightforward shortcut that is almost as effective: when an argument is annotated as having type float, an argument of type int is acceptable…”
CI / CD¶
Continuing from the previous section on CI/CD in testing, it is pretty straightforward to set this up for your documentation as well. If you are using readthedocs as mentioned previously, everything is taken care of for you: your Sphinx documentation is built and published at <project>.readthedocs.io. You can of course set up your own hosting and a workflow to build and publish your documentation, with, e.g., GitHub Actions acting as a trigger to kick off that workflow.