Overview and Implementation With Python
Introduction
When maintaining a data pipeline, it’s important to test the underlying code after making any changes. That way, you can be alerted whenever any refactored code fails to perform as expected.
Most beginners tend to test their code manually, running their functions against different arguments one at a time. This approach is simple, but it does not scale well.
Testing a piece of code against a set of arguments once shouldn’t be very time-consuming, but how long would it take to do it 10 times? 50 times? 100 times?
In projects that span months or years, where a data pipeline’s underlying code is frequently refactored, testing code manually will cost programmers a considerable amount of time.
Those taking on long-term projects will benefit from automating this process by using pytest, a Python testing framework that enables users to execute tests with little code.
Here, we uncover the benefits of pytest and show how data scientists can leverage this package to write basic tests through a case study.
Is Pytest Worth The Effort?
Writing tests with pytest requires users to learn a new framework and even adopt new programming habits. So, it might be tempting to shun this tool and stick to testing code by manually running it in the cells of a Jupyter Notebook (confession: this used to be me).
However, the benefits of pytest more than make up for the time and effort it takes to learn the framework and write the tests.
1. Pytest requires little run time
The time needed to run the given tests is negligible. Moreover, thanks to the pytest package’s easy syntax, a single command is sufficient for running all tests against the pre-defined arguments.
2. Pytest improves the debugging experience
The reports generated by pytest are very informative. They identify the tests that pass/fail, while also pointing out the cause of the failed tests. This makes it easier for programmers to spot their mistake and rectify it.
3. Pytest provides documentation
The scripts written with pytest serve as an additional piece of documentation. Collaborators looking to understand a piece of code can use the test functions to determine its purpose without needing to skim through endless lines of source code.
4. Pytest breeds confidence
With pytest, users can safely push code to production knowing that the code still performs as expected. This allays any fears that data scientists may have of pushing code that sabotages the data pipeline.
Case Study
For this case study, we will write a number of test functions using pytest to test the functions in the module.py file. This file contains two functions: add_lists and subtract_lists.
The add_lists function combines two lists by adding the numbers at matching positions. For instance:
add_lists([1, 1], [2, 2]) = [3, 3]  # [1+2, 1+2]
add_lists([1, 2], [3, 4]) = [4, 6]  # [1+3, 2+4]
add_lists([2, 3], [2, 1]) = [4, 4]  # [2+2, 3+1]
The subtract_lists function combines two lists by subtracting each number in the second list from the number at the same position in the first list. For instance:
subtract_lists([1, 1], [2, 2]) = [-1, -1]  # [1-2, 1-2]
subtract_lists([1, 2], [3, 4]) = [-2, -2]  # [1-3, 2-4]
subtract_lists([2, 3], [2, 1]) = [0, 2]  # [2-2, 3-1]
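The contents of module.py are not shown in this article, so the following is only a minimal sketch of how the two functions might be written to reproduce the examples above. Pairing the elements with zip, which silently stops at the end of the shorter list, is an assumption rather than the confirmed implementation.

```python
# src/module.py (hypothetical contents; the original file is not shown)


def add_lists(list1, list2):
    """Return a new list holding the element-wise sums of two lists."""
    # zip pairs items by position and stops at the shorter list (an assumption).
    return [a + b for a, b in zip(list1, list2)]


def subtract_lists(list1, list2):
    """Return a new list holding the element-wise differences of two lists."""
    return [a - b for a, b in zip(list1, list2)]
```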
Setting Up the Environment
First, install pytest by running the following command:
pip install -U pytest
Long-term projects tend to comprise multiple scripts, so it is common practice for the written tests to be placed in a separate folder that doesn’t contain the source code. For this case study, we will adhere to this practice by setting up the project in the following manner.
.
└── Project/
    ├── src/
    │   └── module.py
    └── test/
        └── test_module.py
The file named test_module.py will store all of the test functions.
Writing Our First Tests
Now, we can write our first test functions, which test the add_lists and subtract_lists functions. As explained by the pytest documentation, a test typically has 4 steps:
- Arrange: Prepare everything needed for the test.
- Act: Run the function that is being tested.
- Assert: Check if the tested code’s output matches what was expected.
- Cleanup: Remove any objects generated from the test (if any) so that other tests aren’t affected. This step is optional.
Let’s write two test functions that follow these steps.
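A minimal sketch of two such tests might look like the following. In the project layout described earlier, add_lists and subtract_lists would be imported from module.py; because neither that file nor the import setup is shown in this article, simple stand-in definitions are included so that the example runs on its own. The input values are illustrative.

```python
# test/test_module.py (sketch)

# Stand-ins for the functions under test. In the real project they would be
# imported from src/module.py (contents assumed, not shown in the article).
def add_lists(list1, list2):
    return [a + b for a, b in zip(list1, list2)]


def subtract_lists(list1, list2):
    return [a - b for a, b in zip(list1, list2)]


def test_add_lists():
    # Arrange: define the inputs and the expected output
    list1 = [1, 2]
    list2 = [3, 4]
    expected = [4, 6]

    # Act: run the function being tested
    actual = add_lists(list1, list2)

    # Assert: compare the actual output with the expected output
    assert actual == expected


def test_subtract_lists():
    # Arrange
    list1 = [1, 2]
    list2 = [3, 4]
    expected = [-2, -2]

    # Act
    actual = subtract_lists(list1, list2)

    # Assert
    assert actual == expected
```

No cleanup step is needed here because the tests create nothing that outlives them.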
Note: For pytest to discover the tests, the test files must be named in the form test_*.py or *_test.py, and the names of the test functions inside them must start with test.
In these two test functions, we arrange the tests by establishing the input lists as well as the expected outputs. Then, we act by running the add_lists and subtract_lists functions with the provided inputs. Finally, we assert by using assert statements to check if the returned values match the expected values.
Note: Assert statements are a core component in many tests. An assert statement checks a condition and raises an AssertionError when the condition is false, which is how pytest knows that a test has failed.
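As a quick refresher, the sketch below shows the two possible outcomes of an assert statement. The helper check_equal and the variable failure_message are hypothetical names used only for illustration.

```python
def check_equal(actual, expected):
    # The text after the comma becomes the AssertionError message.
    assert actual == expected, f"expected {expected}, got {actual}"


# Condition is true: nothing happens and execution continues.
check_equal([3, 3], [3, 3])

# Condition is false: an AssertionError is raised with the message above.
try:
    check_equal([3, 3], [4, 4])
except AssertionError as error:
    failure_message = str(error)
```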
Tests can be run by using the following command in the command line:
pytest <file_name>
Interpreting The Pytest Report
Let’s get familiar with the pytest report by running the tests in the test_module.py file with the command:
pytest test_module.py
If all test functions pass, the report prints the name of the test file followed by one . character per passing test, and closes with a summary line stating how many tests passed.
If one or more test functions fail, the report marks each failing test with the F character and adds a FAILURES section that shows the traceback for every failed test.
When a test fails, the report points to the assert statement that isn’t being met with the > symbol and presents the error message right under it.
Overall, the reports generated by pytest are very informative. They tell us how many tests were run, how many tests passed/failed, and why the tests failed (if they did).
Generating Reports With Greater Verbosity
The pytest <file_name> command is sufficient for running tests, but if you want to increase the amount of information reported in the output, you can simply use the -v flag.
pytest <file_name> -v
Let’s see the pytest report again after using the -v flag.
pytest test_module.py -v
This time, we can explicitly see the name of the test functions as well as their outcomes.
Testing a Function With Multiple Arguments
The test_add_lists function currently tests the add_lists function with only one case. However, there are many situations where functions need to be tested with multiple cases.
Take the add_lists function, for example. Although adding numbers in two lists is a simple task, there are a few edge cases that need to be considered:
- Adding lists with unequal length
- Adding empty lists
- Adding lists with strings
We could test the add_lists function with all of these cases by creating test functions that test one case each.
def test_function1():
    # test function with the first argument
    ...

def test_function2():
    # test function with the second argument
    ...

def test_function3():
    # test function with the third argument
    ...
However, that would require repeating many lines of code. Instead, we can run tests against multiple inputs by using the pytest.mark.parametrize decorator.
We can modify the current test_add_lists function so that it tests multiple sets of arguments.
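One way the parametrized test might look is sketched below. The three cases are illustrative choices, and add_lists is a stand-in defined inline so the example runs on its own; in the project it would be imported from module.py, whose contents are assumed here.

```python
import pytest


# Stand-in for the function under test (assumed implementation).
def add_lists(list1, list2):
    return [a + b for a, b in zip(list1, list2)]


# Arrange: each tuple supplies list1, list2, and the expected output.
@pytest.mark.parametrize(
    "list1, list2, expected",
    [
        ([1, 2], [3, 4], [4, 6]),  # ordinary case
        ([2, 3], [2, 1], [4, 4]),  # another ordinary case
        ([], [], []),              # edge case: empty lists
    ],
)
def test_add_lists(list1, list2, expected):
    # Act: run the function being tested
    actual = add_lists(list1, list2)

    # Assert: compare the actual output with the expected output
    assert actual == expected
```

The first argument of the decorator names the parameters of the test function, and the second lists one tuple of values per case.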
This time, the pytest.mark.parametrize decorator defines 3 inputs as well as the expected outputs. This serves as the arrange phase of the test. The act and assert phases don’t need to be changed.
When we execute the test for this function, we get the following results:
pytest test_module.py -v
As shown by the generated report, each input defined in the pytest.mark.parametrize decorator is treated as an individual test. Hence, the report shows the results of 3 tests.
Testing Multiple Functions With the Same Data
So far, we have been creating the input data inside each test function.
This raises the question: how should we approach tests where we wish to test multiple functions with the same data?
The simplest approach would be to instantiate the same input data inside each function.
def test_function1():
    # instantiate input data
    ...

def test_function2():
    # instantiate input data
    ...

def test_function3():
    # instantiate input data
    ...
However, this is an undesirable practice for a number of reasons.
Firstly, it would entail repeating the same lines of code, which hampers readability. Secondly, loading data repeatedly can be a time-consuming and computationally intensive process. If the data needed for tests comes from a database or flat file, performing reads repeatedly on the same data would be highly inefficient.
Fortunately, users of pytest can resolve this issue by using fixtures.
A fixture is a function that uses the pytest.fixture decorator. It returns the data needed for subsequent tests. Tests that need data from a fixture can access it by listing the name of the fixture function as one of their parameters; pytest then calls the fixture and passes its return value to the test.
As an example, let’s suppose that we wish to test the add_lists and subtract_lists functions with the same data.
To do so, we can first create a function with the pytest.fixture decorator called example_data that returns the data that will be used for the tests.
This data can be accessed by the test functions test_add_lists and test_subtract_lists by using example_data as the name of their parameter.
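A minimal sketch of this setup could look as follows. The values returned by example_data are illustrative, and the two functions under test are stand-ins defined inline so the example runs on its own; in the project they would be imported from module.py, whose contents are assumed here.

```python
import pytest


# Stand-ins for the functions under test (assumed implementations).
def add_lists(list1, list2):
    return [a + b for a, b in zip(list1, list2)]


def subtract_lists(list1, list2):
    return [a - b for a, b in zip(list1, list2)]


@pytest.fixture
def example_data():
    # Arrange in one place: the two input lists shared by both tests
    return [1, 2], [3, 4]


def test_add_lists(example_data):
    # pytest passes the fixture's return value in as example_data
    list1, list2 = example_data
    assert add_lists(list1, list2) == [4, 6]


def test_subtract_lists(example_data):
    list1, list2 = example_data
    assert subtract_lists(list1, list2) == [-2, -2]
```

By default, pytest re-runs a fixture for every test that requests it. If the data is expensive to load, passing scope="module" to pytest.fixture makes pytest build it once per test file instead.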
pytest test_module.py -v
Conclusion
Well done! You have now learned to write and run basic tests with pytest!
While this beginner-level case study doesn’t provide a comprehensive breakdown of all the features in pytest, it has hopefully encouraged users to embrace the practice of writing scripts using the package for a more structured, efficient, and scalable approach toward testing.
I wish you the best of luck in your data science endeavors!