Testing

How else do we know it works?

Motivation


def add(thing1, thing2):
    """A function to add things."""
    return thing1 + thing2


Does it work?

Is it correct?

How can we prove it?

Motivation


def add(thing1, thing2):
    """A function to add things."""
    return thing1 + thing2

out = add(1, 1)

Motivation


def add(thing1, thing2):
    """A function to add things."""
    return thing1 + thing2

out = add(1.0, 1)

Motivation


def add(thing1, thing2):
    """A function to add things."""
    return thing1 + thing2

out = add(1.0, '1')

Motivation


def add(thing1, thing2):
    """A function to add things."""
    return thing1 + thing2

out = add([1.0, 2.0], 1.0)

Code Planes: Desired Behavior

Code Planes: Actual Behavior

Testing Taxonomy

Testing Domains

(What the planes mean)

  • Compilation
  • Unit
  • Integration
  • End to End

Testing Approaches

(How we measure residuals)

  • Example
  • Property
  • Mutation
  • Mathematical
  • Level of Automation is also important

Testing Domains


Testing Domains: Unit


import numpy as np
import pytest
from scipy import sparse
from sklearn.linear_model import LinearRegression

def test_raises_type_error_if_positive_and_sparse():
    error_msg = "dense data is required."
    # X must not be sparse if positive == True
    X = sparse.eye(10)
    y = np.ones(10)

    reg = LinearRegression(positive=True)

    with pytest.raises(TypeError, match=error_msg):
        reg.fit(X, y)

Testing Domains: Integration

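from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

iris = load_iris()

def test_pipeline_methods_anova():
    X, y = iris.data, iris.target
    # Test with Anova + LogisticRegression
    clf = LogisticRegression()
    filter1 = SelectKBest(f_classif, k=2)
    anova = ("anova", filter1)
    logistic = ("logistic", clf)
    pipe = Pipeline([anova, logistic])
    pipe.fit(X, y)
    pipe.predict(X)
    pipe.predict_proba(X)
    pipe.predict_log_proba(X)
    pipe.score(X, y)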

Testing Domains: End to End (E2E)

import dascore as dc

def test_user_story_1():
    # read data from disk
    spool = dc.load('example_files')
    # chunk data, filter
    out = []
    for patch in spool.chunk(time=10):
        pa = patch.pass_filter(time=(None, 10))
        out.append(pa)
    # create new spool
    proc_spool = dc.spool(out)
    # plot first patch
    proc_spool[0].viz.waterfall(show=True)
    # save to disk
    proc_spool.io.write('processed', 'DASDAE')

Testing Approaches: Example


  • Pick out example inputs
  • Ensure expected and actual behavior match
  • Include inputs related to bugs
  • How do we know when we have picked enough examples?

Testing Approaches: Example


def add(thing1, thing2):
    return thing1 + thing2

def test_add_some_floats():
    """Add floats together"""
    assert add(1., 3.) == 4.
    assert add(10., 12.) == 22.
    assert add(-5., 5.) == 0.
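
Picking examples well matters: some inputs break a naive expected value. A sketch (the test name is hypothetical) using the standard library's math.isclose; inside pytest, pytest.approx plays the same role.

```python
import math

def add(thing1, thing2):
    return thing1 + thing2

def test_add_inexact_floats():
    """0.1 and 0.2 have no exact binary representation."""
    # add(0.1, 0.2) returns 0.30000000000000004, so `== 0.3` would fail;
    # compare with a relative tolerance instead.
    assert math.isclose(add(0.1, 0.2), 0.3)
    # Exact comparison is still fine for exactly representable values.
    assert add(0.5, 0.25) == 0.75
```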

Testing Approaches: Example

Testing Approaches: Property


  • Generate many examples (distribution)
  • Keep track of failures
  • Emphasis on edge cases (e.g., 0, -1, NaN, inf)
  • Tests can take much longer
  • Non-deterministic
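
The bullets above are what a framework like hypothesis automates. A stdlib-only sketch of the idea (not hypothesis's algorithm; EDGE_CASES, check_property, and the two properties are illustrative names): try every pair of edge cases, then many random pairs, and record the failures. The fixed seed makes this sketch repeatable, unlike a default randomized run.

```python
import itertools
import math
import random

def add(thing1, thing2):
    return thing1 + thing2

# Edge cases are always tried first.
EDGE_CASES = [0.0, -0.0, 1.0, -1.0, math.nan, math.inf, -math.inf, 1e308, -1e308]

def float_pairs(n_random, seed):
    """Yield every pair of edge cases, then n_random random pairs."""
    yield from itertools.product(EDGE_CASES, repeat=2)
    rng = random.Random(seed)  # fixed seed: same inputs on every run
    for _ in range(n_random):
        yield rng.uniform(-1e6, 1e6), rng.uniform(-1e6, 1e6)

def check_property(prop, n_random=1000, seed=0):
    """Return every input pair for which prop(a, b) is False."""
    return [(a, b) for a, b in float_pairs(n_random, seed) if not prop(a, b)]

def naive_property(a, b):
    """add matches +, compared with == (breaks when the result is NaN)."""
    return add(a, b) == a + b

def commutative(a, b):
    """add(a, b) equals add(b, a), treating two NaN results as equal."""
    x, y = add(a, b), add(b, a)
    return x == y or (math.isnan(x) and math.isnan(y))

print(len(check_property(naive_property)))  # 19 (every pair with NaN, plus inf + -inf)
print(check_property(commutative))  # []
```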

Testing Approaches: Property


from hypothesis import given
import hypothesis.strategies as st

def add(thing1, thing2):
    return thing1 + thing2

# The framework injects f1/f2 into the test many times
@given(f1=st.floats(), f2=st.floats())
def test_add_many_floats_floats(f1, f2):
    """Add floats together"""
    # Fails once hypothesis tries NaN (or inf + -inf), because NaN != NaN
    assert add(f1, f2) == f1 + f2

Testing Approaches: Property

Testing Approaches: Mutation

Testing Approaches: Mutation


  • Approach to test the tests
  • Make syntactically valid changes to the code
  • Run the test suite
  • Expect a test failure -> “killed the mutant”
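
A toy, stdlib-only sketch of that loop (dedicated tools do this across a whole codebase; make_mutant, mutation_report, and the check_* functions are illustrative names): parse add, swap its + for another operator, rerun the checks, and record which mutants survive.

```python
import ast

# The function under test, kept as source text so it can be mutated.
SOURCE = """
def add(thing1, thing2):
    return thing1 + thing2
"""

class SwapBinOp(ast.NodeTransformer):
    """Replace the operator of every binary operation with new_op."""

    def __init__(self, new_op):
        self.new_op = new_op

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self.new_op()
        return node

def make_mutant(source, new_op):
    """Return the add function compiled from a mutated copy of source."""
    tree = ast.fix_missing_locations(SwapBinOp(new_op).visit(ast.parse(source)))
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["add"]

def mutation_report(source, check, ops=(ast.Sub, ast.Mult, ast.Div, ast.Pow)):
    """Map each operator name to 'killed' or 'survived'."""
    report = {}
    for op in ops:
        try:
            check(make_mutant(source, op))
            report[op.__name__] = "survived"
        except Exception:  # any failure or error kills the mutant
            report[op.__name__] = "killed"
    return report

def check_two_examples(add):
    assert add(1., 2.) == 3.
    assert add(2., 2.) == 4.

def check_one_example(add):
    assert add(2., 2.) == 4.

print(mutation_report(SOURCE, check_two_examples))
# {'Sub': 'killed', 'Mult': 'killed', 'Div': 'killed', 'Pow': 'killed'}
print(mutation_report(SOURCE, check_one_example))
# {'Sub': 'killed', 'Mult': 'survived', 'Div': 'killed', 'Pow': 'survived'}
```

The single-example check lets the * and ** mutants survive because 2 * 2 == 2 ** 2 == 2 + 2: exactly the kind of gap mutation testing is meant to expose.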

Testing Approaches: Mutation


Original function and tests

def add(thing1, thing2):
    return thing1 + thing2

def test_add_floats():
    """Test add floats together"""
    assert add(1., 2.) == 3.
    assert add(2., 2.) == 4.

Testing Approaches: Mutation


Mutated function (tests unchanged)

def add(thing1, thing2):
    return thing1 - thing2

def test_add_floats():
    """Test add floats together"""
    assert add(1., 2.) == 3.
    assert add(2., 2.) == 4.

Testing Approaches: Mutation

Testing Approaches: Mathematical


  • Programs, especially in pure functional languages, can be proven correct with formal proofs
  • How can we prove the proof is correct?
  • Possible, but difficult in practice
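
For a taste of the proof approach, a Lean 4 sketch: myAdd is a hypothetical stand-in for add, restricted to natural numbers (Python floats would need a far more involved model). The theorem covers every input pair, not a sample of them.

```lean
-- Hypothetical stand-in for add, over natural numbers only.
def myAdd (thing1 thing2 : Nat) : Nat := thing1 + thing2

-- Holds for every a and b, not just the examples a test would try.
theorem myAdd_comm (a b : Nat) : myAdd a b = myAdd b a := by
  unfold myAdd
  exact Nat.add_comm a b
```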

When/How to Test?

The law of diminishing returns applies!

When/How to Test?

Selecting the right level of testing


  • How long will the software be in use?
  • How many people will interact with the software?
  • What are the consequences of failure?
  • How many lines of code/files could this grow to?

When/How to Test?


  • Running a few E2E tests (does it generate figures, does it look right?)
  • Writing targeted unit tests (most important/difficult parts)
  • Simple tests can be in the same files as code
  • For libraries/applications use a testing framework!

Single File Tests


def add(thing1, thing2):
    return thing1 + thing2

def test_add_many_floats_floats():
    """Test add floats together"""
    assert add(1., 2.) == 3.
    assert add(2., 2.) == 4.

if __name__ == "__main__":
    test_add_many_floats_floats()
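
Another way to keep simple tests next to the code is the standard library's doctest module, which runs the examples written in a docstring and compares the printed results. A minimal sketch:

```python
def add(thing1, thing2):
    """Add two things.

    >>> add(1., 2.)
    3.0
    >>> add("a", "b")
    'ab'
    """
    return thing1 + thing2

if __name__ == "__main__":
    import doctest
    # Runs the >>> examples in this file's docstrings; silent when all pass.
    doctest.testmod()
```

Running the file with -v makes doctest list each example as it runs.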

Pytest


  • De facto Python testing framework (alongside the standard library's unittest)
  • Highly extensible (100+ plugins)
  • Separates setup, testing, teardown
  • Configurable with command-line arguments

Pytest

How to use pytest?

  • Call pytest from the command line; it will then:
    • Parse command-line arguments
    • Discover test files (test_*.py or *_test.py)
    • Collect tests and fixtures (classes Test*, functions/methods test*)
    • Resolve fixture/test dependencies
    • Run the selected tests
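
A toy imitation of the discover/collect/run steps, to make the sequence concrete. It is not how pytest is implemented (no fixtures, classes, or *_test.py files), and discover and run are illustrative names.

```python
import importlib.util
import inspect
import pathlib
import tempfile

def discover(root):
    """Yield (node_id, function) for test* functions in test_*.py files."""
    for path in sorted(pathlib.Path(root).rglob("test_*.py")):
        # Import the file as a module without touching sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # getmembers returns members sorted by name.
        for name, func in inspect.getmembers(module, inspect.isfunction):
            if name.startswith("test"):
                yield f"{path.name}::{name}", func

def run(root):
    """Call every discovered test; an AssertionError counts as a failure."""
    results = {}
    for node_id, func in discover(root):
        try:
            func()
            results[node_id] = "passed"
        except AssertionError:
            results[node_id] = "failed"
    return results

# Usage: write a throwaway test file and run the toy collector over it.
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "test_demo.py").write_text(
        "def test_passes():\n    assert 1 + 1 == 2\n\n\n"
        "def test_fails():\n    assert 1 + 1 == 3\n\n\n"
        "def helper():\n    return 42\n"
    )
    results = run(tmp)

print(results)
# {'test_demo.py::test_fails': 'failed', 'test_demo.py::test_passes': 'passed'}
```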

Pytest

pytest to test our add function:

test_add.py
# assumes add function is in myadd.py
from myadd import add 

def test_add_floats():
    """Test add floats together"""
    assert add(1., 2.) == 3.
    assert add(2., 2.) == 4.

Pytest

Organizing tests

test_add.py
from myadd import add

def test_add_floats():
    ...

def test_add_strings():
    ...
   
def test_add_lists():
    ...

Pytest

Organizing tests

test_add.py
from myadd import add

class TestAdd:
    def test_add_floats(self):
        ...
    
    def test_add_strings(self):
        ...
       
    def test_add_lists(self):
        ...

Pytest

Example: Write extra tests for numpy’s linalg norm

test_norm.py
import numpy as np

def test_l2_norm_positive():
    random = np.random.random(100)
    assert np.linalg.norm(random, ord=2) > 0

def test_nan_returns_nan():
    random = np.random.random(100)
    random[10] = np.nan
    out = np.linalg.norm(random)
    assert np.isnan(out)
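
A few more norm tests in the same spirit, sketched here with hypothetical test names: a known answer, scaling by a constant, and the triangle inequality. np.isclose guards the comparisons against rounding error.

```python
import numpy as np

def test_l2_norm_of_3_4_is_5():
    # Known answer: sqrt(3**2 + 4**2) == 5
    assert np.isclose(np.linalg.norm(np.array([3.0, 4.0])), 5.0)

def test_norm_scales_with_abs_of_constant():
    rng = np.random.default_rng(seed=42)
    x = rng.random(100)
    for c in (-3.0, 0.0, 2.5):
        assert np.isclose(np.linalg.norm(c * x), abs(c) * np.linalg.norm(x))

def test_triangle_inequality():
    rng = np.random.default_rng(seed=0)
    x, y = rng.random(100), rng.random(100)
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```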

Pytest

Run tests from the command line using pytest.

pytest                                     # run every test pytest discovers

pytest test_norm.py                        # run the tests in one file

pytest test_norm.py::test_nan_returns_nan  # run a single test

Knowledge Check


What domain and approach does test_norm use?

Domain: Unit

Approach: Example

Pytest: Testing Errors


test_add.py
import pytest

from myadd import add

def test_add_string_num_raises():
    msg = "unsupported operand type"
    with pytest.raises(TypeError, match=msg):
        add(1, '1')

Pytest: Parametrization

test_add.py
import pytest
from myadd import add

io_tuple = (
    (1, 2, 3),
    (4, 5, 9),
    ("a", "b", "ab"),
)

@pytest.mark.parametrize("a,b,expected", io_tuple)
def test_add_parametrized(a, b, expected):
    assert add(a, b) == expected

Anatomy of a Software Test


  • Arrange - setup test conditions
  • Act - perform behavior under test
  • Assert - measure difference between expected and observed
  • Cleanup - put program state back in order

Anatomy of a Software Test


from myadd import add

def test_add_many_floats_floats():
    """Test add floats together"""
    # Arrange
    value_1 = 1.
    value_2 = 2.
    # Act
    out = add(value_1, value_2)
    # Assert
    assert out == 3
    # Cleanup (trivial here; real tests may close files or reset state)
    del value_1, value_2

Pytest: Fixtures


  • Used to manage arrange, cleanup (and sometimes act)
  • Can “inject” objects under test
  • Scope controls when fixtures run
  • Tests “request” fixtures by using their name as parameters
  • Many useful built-in fixtures (tmp_path, monkeypatch, capsys)

Pytest: Fixtures

test_norm1.py
import numpy as np
import pytest

@pytest.fixture()
def array_with_nans():
    random = np.random.random(100)
    random[10] = np.nan
    yield random
    del random

def test_nan_returns_nan(array_with_nans):
    out = np.linalg.norm(array_with_nans)
    assert np.isnan(out)

Pytest: Fixtures Scope

Scopes control how often fixtures are run: “function” (the default), “class”, “module”, “package”, “session”

test_norm1.py
import numpy as np
import pytest

@pytest.fixture(scope='class')
def array_with_nans():
    random = np.random.random(100)
    random[10] = np.nan
    yield random
    del random

Knowledge Check

What’s wrong here?

import numpy as np
import pytest

@pytest.fixture()
def data():
    return [1, 2, 3]
    
def test_data_1():
    assert isinstance(data, list)

test_data_1 needs the data argument!

Knowledge Check

What’s wrong here? How can we fix it?

import numpy as np
import pytest

@pytest.fixture(scope='session')
def data():
    return [1, 2, 3]
    
def test_data_1(data):
    data[0] = 10

def test_data_2(data):
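    assert sum(data) == 6

The session-scoped list is shared by every test, so test_data_1's change breaks test_data_2! Use the default function scope so each test gets a fresh list.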

Pytest Marks


  • Marks allow organizing tests/fixtures/modules
    • Custom marks defined in pytest.ini or pyproject.toml
  • Running tests can be controlled by marks
  • Marks can also skip/xfail
  • Marks can control parametrization
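
Custom marks should be registered; otherwise pytest warns about unknown marks (and errors under --strict-markers). Besides pytest.ini or pyproject.toml, a conftest.py hook can do it. A sketch for the slow and webtest marks used in the next slides (the descriptions are illustrative):

```python
# conftest.py (at the root of the tests directory)

def pytest_configure(config):
    """Register custom marks so pytest recognizes them."""
    config.addinivalue_line("markers", "slow: marks tests as slow to run")
    config.addinivalue_line("markers", "webtest: marks tests that need network access")
```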

Pytest Marks

import sys

import pytest

on_windows = sys.platform.startswith("win")
    
@pytest.mark.slow
def test_make_new_friends():
    ...

@pytest.mark.skip(reason="Not safe!")
def test_gaze_into_the_abyss():
    ...
    
@pytest.mark.skipif(on_windows, reason="windows sucks")
def test_do_useful_things():
    ...
    
@pytest.mark.xfail
def test_impress_my_father():
    ...

Pytest Marks


import pytest
 
pytestmark = pytest.mark.slow

@pytest.mark.webtest
class TestDownloadData:
    ...

Pytest Marks


Run slow tests

pytest -m slow


Run slow tests, but not webtests

pytest -m "slow and not webtest"

Pytest More Skips


import pytest

def test_function():
    # valid_config() stands in for your own configuration check
    if not valid_config():
        pytest.skip("unsupported configuration")

def test_thing_with_optional_library():
    np = pytest.importorskip("numpy")
    ...

Pytest: Derrick’s Test Organization

  • Group related tests into classes
  • Use fixtures when:
    • Arrange is more than ~3 lines
    • Other tests need the same setup
    • Cleanup is needed
  • Keep fixtures as close to the tests that use them as possible
  • Promote fixtures from classes, to modules, to conftest.py as more tests share them
  • Each Python file (x.py) should have a test file (test_x.py)
  • Test files mirror the package layout in a tests/ directory

Pytest: Derrick’s Test Organization

import numpy as np

class TestThing1:
    def test_1_thing_1(self):
        ar = np.array([1, 2, 3])
        ...
    
    def test_2_thing_1(self):
        ar = np.array([1, 2, 3])
        ...

Pytest: Derrick’s Test Organization

import numpy as np
import pytest

class TestThing1:
    @pytest.fixture()
    def array(self):
        return np.array([1, 2, 3])
    
    def test_1_thing_1(self, array):
        ...
    
    def test_2_thing_1(self, array):
        ...

Pytest: Derrick’s Test Organization

import numpy as np
import pytest

@pytest.fixture()
def array():
    return np.array([1, 2, 3])

class TestThing1:
    
    def test_1_thing_1(self, array):
        ...
    
    def test_2_thing_1(self, array):
        ...

class TestThing2:
    def test_1_thing_2(self, array):
        ...

Pytest: Tips/Tricks


  • The --pdb flag drops into the debugger after a test failure
  • Check coverage with pytest-cov
  • Testing requirements differ for packages/libraries vs. research scripts
  • Pytest integrates with many IDEs (VS Code, PyCharm)
  • Testing helps you write smaller, more modular code
  • Have fun!

Pytest Demo Time!