In his article The Anatomy of a Unit Test, Alastair Smith compares tests to science experiments: there is a stated hypothesis, and that hypothesis is either proved or disproved.
The Scientific Method suggests the following general structure for an experiment:
- Hypothesis: the idea or theory we wish to prove or disprove
- Method: a detailed description of how we intend to go about proving or disproving our hypothesis
- Observations: the raw results obtained by following the method
- Analysis: an inspection of the observations to determine their significance
- Evaluation: based on the analysis, a conclusion on whether we can prove/disprove our hypothesis
There are three important consequences of the Scientific Method, namely that the experiments are reproducible, falsifiable, and provide a measure of confidence in the results.
In implementation terms, the proposed parallel maps:
- the test name to the hypothesis
- the test function to the method
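The mapping can be sketched in a plain test function. This is a hypothetical pytest-style example (the `add` function is invented for illustration): the test name states the hypothesis, the body carries out the method, and the assertion performs the analysis and evaluation in one step.

```python
def add(a, b):
    """Trivial unit under test, assumed for illustration."""
    return a + b

def test_add_returns_sum_of_two_positive_integers():  # hypothesis, in the name
    result = add(2, 3)      # method: follow the procedure; observation: raw result
    assert result == 5      # analysis + evaluation: proved or disproved
```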
Software Testing Levels
Software Testing Levels are the different stages of the software development lifecycle where testing is conducted.
- UNIT TESTING validates that each unit of the software performs as designed.
- INTEGRATION TESTING exposes faults in the interaction between integrated units.
- SYSTEM TESTING (performed by developers and/or QA, maybe automated): evaluates the system’s compliance with the specified requirements.
- ACCEPTANCE TESTING (performed by customers and/or managers): evaluates the system’s compliance with the business requirements and assesses whether it is acceptable for delivery.
Regression Testing is just a type of testing that can be performed at any of the four main levels.
Unit Testing
F.I.R.S.T Principles
The FIRST mnemonic is a concise set of criteria for effective unit tests:
- FAST
- ISOLATED
- REPEATABLE
- SELF-VALIDATING
- TIMELY
There are variations in which I stands for INDEPENDENT and/or T stands for THOROUGH.
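Several of these criteria can be seen in a single small test. The sketch below is invented for illustration (`parse_price` is not from the article): the test is fast (no I/O), isolated (it builds everything it needs), repeatable (no dependence on clock, network, or test ordering), and self-validating (a boolean assertion decides pass/fail automatically, rather than printing output for a human to inspect).

```python
def parse_price(text):
    """Assumed unit under test: parse a currency string like '$19.99'."""
    return round(float(text.strip("$")), 2)

def test_parse_price_strips_currency_symbol():
    # Self-validating: the assert decides the outcome, no manual inspection.
    assert parse_price("$19.99") == 19.99
```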
Approaches
There are two different approaches to how a test is performed:
- BLACK BOX TESTING (a.k.a. BEHAVIORAL TESTING) is where you test a unit without knowing its internal structure/design/implementation details.
- WHITE BOX TESTING (a.k.a. STRUCTURAL TESTING) is where you test a unit with knowledge of its internals; it can help capture obscure scenarios that you might not think about without knowing implementation details.
We can also make a distinction according to what we assert:
- TESTING BEHAVIOUR OR STATE involves performing an action on a unit and then checking that either an expected result was returned, or the state of the unit has been updated as expected (e.g. "I don’t care how you come up with the answer, just make sure that the answer is correct under this set of circumstances").
- TESTING IMPLEMENTATION (only possible with WBT) is where you check that certain methods are invoked or not during the execution of an action. You verify that internal behavior is doing what is expected (e.g. "I don’t care what the answer is, just make sure you do this thing while figuring it out.").
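The contrast can be illustrated with Python's `unittest.mock` (the `Greeter` class and its logger collaborator are invented for this sketch): the first test asserts only on the returned value, while the second verifies an interaction with a collaborator.

```python
from unittest.mock import Mock

class Greeter:
    """Hypothetical unit under test with a logger collaborator."""
    def __init__(self, logger):
        self.logger = logger

    def greet(self, name):
        self.logger.info("greeting requested")
        return f"Hello, {name}!"

def test_greet_returns_greeting():
    # Behaviour/state test: only the observable result matters.
    greeter = Greeter(logger=Mock())
    assert greeter.greet("Ada") == "Hello, Ada!"

def test_greet_logs_the_request():
    # Implementation test (white box): verify the logger was invoked.
    logger = Mock()
    Greeter(logger).greet("Ada")
    logger.info.assert_called_once_with("greeting requested")
```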
Anatomy
The anatomy of a test depends on the testing level and/or the testing framework you are using, but most authors propose similar, largely overlapping formulations.
See for instance Gerard Meszaros’s 4 phases:
- SETUP
- EXERCISE
- VERIFY
- TEARDOWN
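A minimal `unittest` sketch mapping the four phases onto the framework's hooks (the temporary-file scenario is invented for illustration): `setUp` and `tearDown` bracket each test, while the test method itself exercises and verifies.

```python
import tempfile
import unittest
from pathlib import Path

class TestConfigFile(unittest.TestCase):
    def setUp(self):                                 # SETUP
        self.tmp = tempfile.TemporaryDirectory()
        self.path = Path(self.tmp.name) / "config.txt"

    def test_roundtrip(self):
        self.path.write_text("debug=1")              # EXERCISE
        self.assertEqual(self.path.read_text(), "debug=1")  # VERIFY

    def tearDown(self):                              # TEARDOWN
        self.tmp.cleanup()
```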
TDD flavour
The AAA approach is common among TDD practitioners and when tests will only be read by developers.
- ARRANGE all necessary preconditions and inputs.
- ACT on the object or method under test.
- ASSERT that the expected results have occurred.
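The three sections are usually made explicit with comments; `ShoppingCart` below is a hypothetical unit invented for this sketch.

```python
class ShoppingCart:
    """Hypothetical unit under test."""
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)

def test_total_sums_item_prices():
    # Arrange: build the object and its preconditions
    cart = ShoppingCart()
    cart.add("book", 10.0)
    cart.add("pen", 2.5)
    # Act: perform the single action under test
    total = cart.total()
    # Assert: verify the expected outcome
    assert total == 12.5
```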
BDD flavour
The GWT approach is preferred when tests will be read by business users and Domain experts. It is also very popular in TDD.
- GIVEN describes the state of the world before you begin the behavior you’re specifying in this scenario (pre-conditions).
- WHEN is that behavior that you’re specifying.
- THEN describes the changes you expect due to the specified behavior.
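GWT reads much like AAA with business-facing vocabulary. BDD frameworks such as behave or pytest-bdd bind these clauses to plain-text feature files, but the structure also works in an ordinary test; the account scenario below is invented for illustration.

```python
class Account:
    """Hypothetical domain object."""
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

def test_withdrawal_reduces_balance():
    # Given an account with a balance of 100
    account = Account(balance=100)
    # When 30 is withdrawn
    account.withdraw(30)
    # Then the balance is 70
    assert account.balance == 70
```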
Test Doubles
As Martin Fowler defines it, a Test Double is a generic term for any case where you replace a production object for testing purposes.
- DUMMY objects are passed around but never actually used (e.g. a mandatory method parameter).
- FAKE objects actually have working implementations, but take shortcuts which make them not suitable for production (e.g. InMemoryTestDatabase).
- STUBS provide canned answers to calls made during the test (fake responses).
- SPIES are stubs that also record some information based on how they were called (e.g. email service that records the number of messages sent).
- MOCKS are pre-programmed with expectations which form a specification of the calls they are expected to receive (provide means to check if a method is called or if a property is set).
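Rough sketches of the five doubles using Python's `unittest.mock` (all class and attribute names here are invented for illustration):

```python
from unittest.mock import Mock

# DUMMY: passed around only to satisfy a signature, never actually used.
dummy_logger = object()

# FAKE: a working but simplified implementation (not production-grade).
class InMemoryTestDatabase:
    def __init__(self):
        self.rows = {}
    def save(self, key, value):
        self.rows[key] = value
    def load(self, key):
        return self.rows[key]

# STUB: provides canned answers, no behaviour verification.
stub_rates = Mock()
stub_rates.usd_to_eur.return_value = 0.9

# SPY: a stub that also records how it was called.
spy_mailer = Mock()
spy_mailer.send("hello")
assert spy_mailer.send.call_count == 1

# MOCK: expectations about received calls, verified explicitly.
mock_gateway = Mock()
mock_gateway.charge(42)
mock_gateway.charge.assert_called_once_with(42)
```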
Anti-pattern
- PARTIAL MOCKS: objects that have been wrapped or changed to provide an artificial response for some methods but not others. Their use results in decreased test comprehensibility and more difficult setup.