Testing¶
Now that we’ve encapsulated our code in a well-structured and installable package, how can we validate that it is robust and works as expected?
We want to address the following questions:
How can you write modular, extensible, and reusable code?
After making changes to a program, how do you ensure that it will still give the same answers as before?
How can we make finding and fixing bugs an easy, fun, and rewarding experience?
These seemingly unrelated questions all have the same answer: automated testing.
Note
This section was originally based on Ned Batchelder’s excellent article and PyCon 2014 talk Getting Started Testing.
Testing Interactively¶
Let’s look at our initial bit of code.
def cumulative_product(array):
    result = array[:1]
    last_value = array[-1]
    for value in array[1:]:
        result.append(result[-1] * value)
        if value == last_value:
            break
    return result
While we were developing this function, we would have likely started up an IPython console and either copied the code snippet or imported the function and tested it on a few simple cases to validate that it returns expected results.
In [1]: from python201.algorithms import cumulative_product
In [2]: cumulative_product([1, 2, 3])
Out[2]: [1, 2, 6]
In [3]: cumulative_product([3, 2, 1])
Out[3]: [3, 6, 6]
In [4]: cumulative_product([1, 2, 3, 4])
Out[4]: [1, 2, 6, 24]
While this kind of testing is better than not doing any testing at all, it leaves much to be desired. First, it needs to be done each time cumulative_product is changed. It also requires that we manually inspect the output from each test to decide if the code “passes” or “fails” that test. Further, we need to remember all the tests we came up with today if we want to test again tomorrow.
Test Scripts¶
A much better way to write tests is to put them in a script.
from python201.algorithms import cumulative_product
array = [1, 2, 3]
print(f'cumulative_product({array}) ==', cumulative_product(array))
array = [3, 2, 1]
print(f'cumulative_product({array}) ==', cumulative_product(array))
array = [1, 2, 3, 4]
print(f'cumulative_product({array}) ==', cumulative_product(array))
Now, running and re-running our tests is very easy - we just run the script.
$ python tests/test_algorithms.py
cumulative_product([1, 2, 3]) == [1, 2, 6]
cumulative_product([3, 2, 1]) == [3, 6, 6]
cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
It’s also easy to add new tests, and there’s no need to remember all the tests we come up with.
Assertions¶
One problem with the method above is that we still need to manually inspect the results of our tests. Assertions can help with this. The assert statement in Python is very simple: given a condition, like 1 == 2, it checks whether the condition is true or false. If it is true, assert does nothing; if it is false, it raises an AssertionError.
In [1]: assert 1 == 1
In [2]: assert 1 < 2
In [3]: assert 1 > 2
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-3-f53b9196f459> in <module>
----> 1 assert 1 > 2
AssertionError:
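As an aside, the assert statement also accepts an optional message after a comma; the message is attached to the AssertionError when the condition is false, which can make failures easier to diagnose. A small sketch:

result = [1, 2, 7]
expected = [1, 2, 6]

# The message after the comma is included in the AssertionError
# if the comparison fails.
assert result == expected, f'expected {expected}, got {result}'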
We can re-write our script as follows.
from python201.algorithms import cumulative_product
assert cumulative_product([1, 2, 3]) == [1, 2, 6]
assert cumulative_product([3, 2, 1]) == [3, 6, 6]
assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
And we still run our tests the same way.
$ python tests/test_algorithms.py
This time, there’s no need to inspect the test results. If we get an AssertionError, then a test failed; if not, all our tests passed.
That said, there’s no way to know whether more than one test failed: the script stops executing after the first AssertionError is encountered. Let’s add another test to our test script and re-run it.
from python201.algorithms import cumulative_product
assert cumulative_product([1, 2, 3]) == [1, 2, 6]
assert cumulative_product([3, 2, 1]) == [3, 6, 6]
assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]
$ python tests/test_algorithms.py
Traceback (most recent call last):
File "tests/test_algorithms.py", line 8, in <module>
assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]
AssertionError
This time we get a failed test because, as we said, our code is “bad”. Before adding more tests to investigate, we’ll discuss one more method for running tests.
Automated Testing¶
A test runner takes a bunch of tests, executes them all, and then reports which of them passed and which of them failed. A very popular test runner for Python is pytest.
Like many testing frameworks, pytest can be quite sophisticated. For the purposes of this tutorial, we’ll stick to the basics. Essentially, if you place all of your tests within the appropriate layout, pytest will automatically find and execute them.
We want our tests to live in files whose names start with test, and each test to be encapsulated by a function whose name also starts with test. A nice approach is to have a top-level tests folder in your project with a structure that mirrors your Python package, including a test_X.py partner for every module in your package.
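For our project, such a layout looks roughly like this (only the files relevant here are shown; the __init__.py is an assumption carried over from the earlier packaging section):

python201/
├── python201/
│   ├── __init__.py
│   └── algorithms.py
└── tests/
    └── test_algorithms.py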
In our case, tests/test_algorithms.py would contain:
from python201.algorithms import cumulative_product


def test_cumulative_product():
    assert cumulative_product([1, 2, 3]) == [1, 2, 6]
    assert cumulative_product([3, 2, 1]) == [3, 6, 6]
    assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
    assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]
To run our tests, we simply execute pytest at the command line from the top of our project.
$ pytest
=================================== test session starts ====================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/glentner/code/github.com/glentner/python201
plugins: hypothesis-5.20.3
collected 1 item
tests/test_algorithms.py F [100%]
========================================= FAILURES =========================================
_________________________________ test_cumulative_product __________________________________
    def test_cumulative_product():
        assert cumulative_product([1, 2, 3]) == [1, 2, 6]
        assert cumulative_product([3, 2, 1]) == [3, 6, 6]
        assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
>       assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]
E       assert [1, 2, 6] == [1, 2, 6, 18]
E         Right contains one more item: 18
E         Use -v to get the full diff

tests/test_algorithms.py:9: AssertionError
================================= short test summary info ==================================
FAILED tests/test_algorithms.py::test_cumulative_product - assert [1, 2, 6] == [1, 2, 6, 18]
==================================== 1 failed in 0.21s =====================================
Pytest has found our test module and run all of our tests. Each test module is reported on its own line: a dot is printed for each test that passes, and an F is printed when a test fails, along with a summary of what happened. Here we see that our final comparison failed, and we are told precisely what the problem is.
Useful Tests¶
Now that we know how to write and run tests, what kind of tests should we write?
Testing cumulative_product on arbitrary inputs like [1, 2, 3] might not tell us much about where a problem lies. Instead, we should choose tests that exercise specific functionality of the code we are testing, or that represent different conditions the code may be exposed to.
For example:
An array of length 0 or 1.
An array of mixed sign or precision.
An array containing NaN values.
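As a sketch, tests exercising those conditions might look like the following; the mixed-sign and NaN expectations are our own assumptions about how cumulative_product should behave, and some of these would fail (or even raise an error) against the buggy implementation above.

import math

from python201.algorithms import cumulative_product


def test_cumulative_product_short_arrays():
    # Arrays of length 0 and 1 should pass through unchanged.
    assert cumulative_product([]) == []
    assert cumulative_product([5]) == [5]


def test_cumulative_product_mixed_sign():
    # The sign should flip each time a negative value is multiplied in.
    assert cumulative_product([-1, 2, -3]) == [-1, -2, 6]


def test_cumulative_product_with_nan():
    # A NaN should propagate through every later product.
    result = cumulative_product([1.0, float('nan'), 2.0])
    assert result[0] == 1.0
    assert math.isnan(result[1]) and math.isnan(result[2])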
In our case, the problem was even simpler than any of these edge cases: any value equal to the final value of the array prematurely truncates the sequence.
Note
Handling edge cases like those listed above is of course important, but even simple tests that may seem silly are equally important sanity checks that will exercise your code when you make changes.
Fixing the Code¶
Let’s rewrite our function to be a bit more Pythonic and without that troublesome bug.
def cumulative_product(array):
    result = list(array).copy()
    for i, value in enumerate(array[1:]):
        result[i+1] = result[i] * value
    return result
Not necessarily perfect, but clean and concise. The code expresses our intent more clearly, we’ve become a bit more flexible about the possible input data types, and we’ve eliminated the bug! Let’s also update tests/test_algorithms.py with a few more tests.
from python201.algorithms import cumulative_product


def test_cumulative_product_simple():
    assert cumulative_product([1, 2, 3]) == [1, 2, 6]
    assert cumulative_product([3, 2, 1]) == [3, 6, 6]
    assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
    assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]


def test_cumulative_product_empty():
    assert cumulative_product([]) == []


def test_cumulative_product_starts_with_zero():
    assert cumulative_product([0] + list(range(100))) == [0] * 101
Let’s run our tests again.
$ pytest
================================== test session starts ===================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/glentner/code/github.com/glentner/python201
plugins: hypothesis-5.20.3
collected 3 items
tests/test_algorithms.py ... [100%]
=================================== 3 passed in 0.09s ====================================
Types of Testing¶
Software testing is a vast topic and there are many levels and types of software testing. For scientific and research software, the focus of testing efforts is primarily:
Unit tests: Unit tests aim to test small, independent sections of code (a function or parts of a function), so that when a test fails, the failure can easily be associated with that section of code. This is the kind of testing that we have been doing so far.
Regression tests: Regression tests aim to check whether changes to the program result in it producing different results from before. Regression tests can test larger sections of code than unit tests. As an example, if you are writing a machine learning application, you may want to run your model on small data in an automated way each time your software undergoes changes, and make sure that the same (or a better) result is produced.
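As a sketch of this regression idea: the reference file path below is hypothetical; in practice you would generate it once from a known-good version of the code and commit it alongside the tests.

import json
from pathlib import Path

from python201.algorithms import cumulative_product

# Hypothetical reference output recorded from a known-good version of the code.
REFERENCE = Path('tests/data/cumulative_product_reference.json')


def test_cumulative_product_regression():
    # If a change to the algorithm alters its output, this comparison
    # against the recorded result will fail.
    expected = json.loads(REFERENCE.read_text())
    assert cumulative_product(list(range(1, 11))) == expected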
Test-Driven Development¶
Test-driven development (TDD) is the practice of writing tests for a function or method before actually writing any code for that function or method. The TDD process is to:
Write a test for a function or method
Write just enough code that the function or method passes that test
Ensure that all tests written so far pass
Repeat the above steps until you are satisfied with the code
Proponents of TDD suggest that this results in better code. Whether or not TDD sounds appealing to you, writing tests should be part of your development process, and never an afterthought. In the process of writing tests, you often come up with new corner cases for your code, and realize better ways to organize it. The result is usually code that is more modular, more reusable and of course, more testable, than if you didn’t do any testing.
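For example, suppose we wanted to add a (hypothetical) cumulative_sum function to python201 using TDD. We would start from a failing test and only then write the implementation; a sketch:

# Step 1: write the test first. Running pytest now fails,
# because cumulative_sum does not exist yet.
from python201.algorithms import cumulative_sum


def test_cumulative_sum():
    assert cumulative_sum([]) == []
    assert cumulative_sum([1, 2, 3]) == [1, 3, 6]


# Step 2: write just enough code in python201/algorithms.py to pass, e.g.:
#
#     def cumulative_sum(array):
#         result = list(array)
#         for i, value in enumerate(array[1:]):
#             result[i + 1] = result[i] + value
#         return result
#
# Step 3: run pytest, confirm everything passes, and repeat.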
Growing a Useful Test Suite¶
More tests are always better than fewer, and your code should have as many tests as you are willing to write. That being said, some tests are more useful than others. Designing a useful suite of tests is a challenge in itself, and it helps to keep the following in mind when growing a test suite:
Tests should run quickly: testing is meant to be done as often as possible. Your entire test suite should complete in no more than a few seconds, otherwise you won’t run your tests often enough for them to be useful. Always test your functions or algorithms on very small and simple data, even if in practice they will deal with much larger and more complex datasets.
Tests should be focused: each test should exercise a small part of your code. When a test fails, it should be easy for you to figure out which part of your program you need to focus debugging efforts on. This can be difficult if your code isn’t modular, i.e., if different parts of your code depend heavily on each other. This is one of the reasons TDD is said to produce more modular code.
Tests should cover all possible code paths: if your function has multiple code paths (e.g., an if-else statement), write tests that execute both the “if” part and the “else” part (see the sketch after this list). Otherwise, you might have bugs in your code and still have all tests pass.
Test data should include difficult and edge cases: it’s easy to write code that only handles cases with well-defined inputs and outputs. In practice, however, your code may have to deal with input data for which it isn’t clear what the behavior should be. For example, what should cumulative_product([]) return? Make sure you write tests for such cases, so that you force your code to handle them.
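As a small illustration of the code-path point, here is a made-up function (not part of python201) with two branches, and one test for each:

def safe_reciprocal(x):
    # Two code paths: zero input and non-zero input.
    if x == 0:
        return None
    return 1 / x


def test_safe_reciprocal_zero():
    # Exercises the "if" branch.
    assert safe_reciprocal(0) is None


def test_safe_reciprocal_nonzero():
    # Exercises the "else" branch.
    assert safe_reciprocal(4) == 0.25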
Extras¶
Hypothesis¶
Like many such frameworks, pytest has a plugin system that allows its functionality to be extended by design. A notable package that works as a plugin for pytest is Hypothesis.
Hypothesis implements property-based testing, which lets you write unit tests that aren’t hard-coded: you define strategies for generating inputs, and hypothesis automatically generates entire ensembles of test cases from a single definition, including edge cases you would want to cover.
For our zero test, if the initial value of the array is zero, it simply doesn’t matter what the remaining values are: the result will be an array of the same length containing all zeros. So our test can use hypothesis to define a strategy that tests many cases without us hard-coding them.
from hypothesis.strategies import lists, integers
from hypothesis import given

from python201.algorithms import cumulative_product


def test_cumulative_product_simple():
    assert cumulative_product([1, 2, 3]) == [1, 2, 6]
    assert cumulative_product([3, 2, 1]) == [3, 6, 6]
    assert cumulative_product([1, 2, 3, 4]) == [1, 2, 6, 24]
    assert cumulative_product([1, 2, 3, 3]) == [1, 2, 6, 18]


def test_cumulative_product_empty():
    assert cumulative_product([]) == []


@given(lists(integers()))
def test_cumulative_product_starts_with_zero(values):
    array = [0] + list(values)
    assert cumulative_product(array) == [0] * len(array)
CI / CD¶
The term CI/CD stands for “continuous integration / continuous deployment”. The idea is to integrate development with software operations, and it can be a non-trivial topic to cover, let alone implement. The concept applies equally to the previous section on packaging and the next section on documentation; we include it here because automated testing is a great place to start.
Ideally, we would automate running our tests on every change to the code: not just locally, but through our version control system, so that contributions are tested automatically without us needing to pull down the branch and run the tests by hand. We want a nice “green light” to give us the all-clear, always.
There are several services that integrate with your version control hosting platform and run a customizable workflow when certain events happen (e.g., a pull request or a new push to the repository). If you host your project on GitHub, the freely available GitHub Actions is a simple way to provide this functionality for your Python project.
We basically just need to include a configuration file at a special location in our project (under .github/workflows/) and GitHub will automatically understand what to do. Here is a really basic example to get started: it runs pytest for us, using multiple versions of Python, every time new commits are pushed to the master or develop branches or a pull request targets the develop branch.
name: tests

on:
  push:
    branches: [ master, develop ]
  pull_request:
    branches: [ develop ]

jobs:
  testing:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9']
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Project
        run: |
          python -m pip install --upgrade pip setuptools wheel pytest hypothesis
          python -m pip install pipenv
          pipenv install --deploy --system
      - name: Run Tests
        run: |
          pytest -v