Mutation Testing
Written by
Jan Smets
Picture this: you've been working night and day to get your new and revolutionary feature ready for your customer. You've used TDD, done your testing, had the code reviewed, and your dashboard shows an astonishing 99% line, method and branch coverage. You deploy your feature on acceptance and go home, feeling good. The next day you arrive at work and - out of the blue - your mailbox is filled with test findings from the customer... What happened? After some analysis of these findings, it turns out that the 99% line and branch coverage didn't give a complete picture of the quality of the code. How is this possible?
Category:
Technology
What went wrong?
Why can't we rely on our old pals, line coverage and branch coverage? Well, the problem is that high line or branch coverage only tells you which code your tests executed, not whether the tests actually check what that code does. Let's look at three problems with line/branch coverage:
What is missing? Well, there's no check on the invocation of the perform method, so none of the side effects of this method (additional parameters being set, context being changed...) are covered by this test.
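A minimal sketch of this first problem in plain Java. Only the method name `perform` comes from the text; `AuditLog`, `OrderService` and the hand-rolled boolean "tests" (used instead of a test framework to keep the example self-contained) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collaborator whose perform() method has a side effect.
class AuditLog {
    final List<String> entries = new ArrayList<>();

    void perform(String action) {
        entries.add(action);
    }
}

// Hypothetical class under test.
class OrderService {
    private final AuditLog log;

    OrderService(AuditLog log) {
        this.log = log;
    }

    int process(int quantity) {
        log.perform("processed " + quantity); // side effect
        return quantity * 2;
    }
}

class SideEffectExample {
    // Weak test: executes every line of process(), so coverage is 100%,
    // but it never looks at what perform() did.
    static boolean weakTest() {
        OrderService service = new OrderService(new AuditLog());
        return service.process(3) == 6;
    }

    // Stronger test: also checks the side effect of perform().
    static boolean strongTest() {
        AuditLog log = new AuditLog();
        OrderService service = new OrderService(log);
        return service.process(3) == 6
                && log.entries.equals(Collections.singletonList("processed 3"));
    }
}
```

If the call to `perform` were removed from `process`, the weak test would still pass, while the stronger one would fail.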
Is this test complete? No, a check is missing on the boundary case of a 0 value for the variable "i", even though boundary values often have a special meaning, can trigger separate behaviour or can even trigger undesired exceptions (division by 0, etc.).
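The boundary problem can be sketched as follows. The method `classify` and its mutant are hypothetical; only the variable name `i` and the 0 boundary come from the text.

```java
class BoundaryExample {
    // Hypothetical method under test.
    static String classify(int i) {
        if (i > 0) {
            return "positive";
        }
        return "not positive";
    }

    // The same method with ">" changed to ">=", a typical boundary mutation.
    static String classifyMutant(int i) {
        if (i >= 0) {
            return "positive";
        }
        return "not positive";
    }

    // These two checks reach every line and both branches of classify(),
    // but they never exercise i == 0.
    static boolean coverageOnlyTests() {
        return classify(5).equals("positive")
                && classify(-5).equals("not positive");
    }

    // Only a check at the boundary pins down the behaviour for i == 0.
    static boolean boundaryTest() {
        return classify(0).equals("not positive");
    }
}
```

With only the first two checks, `classifyMutant` behaves exactly like `classify`; the two differ only for `i == 0`, which is where the boundary test looks.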
What's wrong with the two tests above? The return value of method foo is completely ignored, even though a return value has an explicit meaning (otherwise it wouldn't be returned) and can have a direct impact on the result of the process flow.
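A hedged illustration of the ignored return value. The body of `foo` is an assumption (the text only names the method), and the boolean "tests" again stand in for real unit tests.

```java
class ReturnValueExample {
    // Hypothetical foo(): reports whether a value lies in the range 0..100.
    static boolean foo(int value) {
        return value >= 0 && value <= 100;
    }

    // Weak test: calls foo() twice and only verifies that nothing is thrown.
    // Both calls count towards coverage, yet the results are thrown away.
    static boolean weakTest() {
        foo(50);
        foo(-1);
        return true; // passes no matter what foo() returned
    }

    // Stronger test: asserts on the value that foo() actually returns.
    static boolean strongTest() {
        return foo(50) && !foo(-1) && foo(0) && foo(100) && !foo(101);
    }
}
```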
To make it even more concrete, check the test below, a test for the power-function. What can go wrong if this is your only test?
Answer: power() could perform an addition or a multiplication, or compute y to the power x instead of x to the power y, etc., and the test still wouldn't fail.
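To see why, assume the only test checks that 2 to the power 2 equals 4 (an assumption, chosen because it fits the answer above). The sketch below runs that single check against a correct implementation and three wrong ones. All names are illustrative.

```java
import java.util.function.IntBinaryOperator;

class PowerExample {
    // Correct implementation: x to the power y, for y >= 0.
    static int power(int x, int y) {
        int result = 1;
        for (int k = 0; k < y; k++) {
            result *= x;
        }
        return result;
    }

    // Three wrong implementations.
    static int addition(int x, int y) {
        return x + y;
    }

    static int multiplication(int x, int y) {
        return x * y;
    }

    static int swapped(int x, int y) {
        return power(y, x); // y to the power x
    }

    // The assumed single test: 2 to the power 2 is 4.
    static boolean weakTest(IntBinaryOperator impl) {
        return impl.applyAsInt(2, 2) == 4;
    }

    // A more discriminating test: 2 to the power 3 is 8.
    static boolean strongTest(IntBinaryOperator impl) {
        return impl.applyAsInt(2, 3) == 8;
    }
}
```

All four implementations pass `weakTest`, since 2 + 2, 2 * 2 and 2 to the power 2 are all 4. Only `power` passes `strongTest`, because the wrong versions yield 5, 6 and 9 for the inputs 2 and 3.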
What do we need - is there a better way?
A solution for these problems can be found in mutation testing. Mutation testing is all about testing your tests: checking the quality of your tests. With mutation testing, you run your tests against slightly modified versions of your source code. Such a modified version of your code is called a mutant. The goal is to get all mutants - or as many as possible - killed. Killing a mutant means that at least one of your tests fails on that mutant.
Mutation testing is quite an old idea: it was conceived in the 1970s but didn't become mainstream until recently, because a huge number of mutants can be generated from your code. This makes running mutation tests very resource-intensive. Now that computers have become far more powerful, the concept is gaining more and more traction.
How does it work?
How does a mutation test work? The tool builds a tree structure of your source code, and a mutation can be applied in every node that contains a condition, a constant value for a variable, etc. Such a mutation can - for example - entail removing or negating a condition.
For every generated mutant, the unit tests are run. As soon as one test fails, the mutant is killed and the test run for that mutant stops. That way, all mutants are processed, and at the end you get an overview of the percentage of mutants killed.
A tool such as PIT can use this outcome to generate a detailed report on the quality of your tests per class, package, etc.
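The run loop described here can be sketched in a few lines. This is a toy model, not how PIT or any other tool is implemented: the mutants of a hypothetical condition `age >= 18` are written by hand as lambdas, and each "test" is a function that returns true when it passes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

class MutationRunExample {
    // Each test receives an implementation and returns true when it passes.
    static final List<Predicate<IntPredicate>> TESTS = Arrays.asList(
            impl -> impl.test(30),   // an adult is recognised
            impl -> !impl.test(10)   // a child is rejected
    );

    // Hand-written mutants of the original condition "age >= 18".
    static final List<IntPredicate> MUTANTS = Arrays.asList(
            age -> age > 18,   // boundary changed
            age -> age < 18,   // condition negated
            age -> true,       // return value replaced by true
            age -> false       // return value replaced by false
    );

    // A mutant is killed as soon as one test fails on it.
    static boolean isKilled(IntPredicate mutant) {
        for (Predicate<IntPredicate> test : TESTS) {
            if (!test.test(mutant)) {
                return true; // stop at the first failing test
            }
        }
        return false; // every test passed: the mutant survived
    }

    // Percentage of mutants killed by the test suite.
    static int killedPercentage() {
        int killed = 0;
        for (IntPredicate mutant : MUTANTS) {
            if (isKilled(mutant)) {
                killed++;
            }
        }
        return 100 * killed / MUTANTS.size();
    }
}
```

With these two tests, three of the four mutants are killed. The boundary mutant `age > 18` survives because no test uses the value 18, which points directly at the missing test.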
Mutator types
What are the different types of mutators? Here you can see the most important types:
You have - as already mentioned - the mutations on conditions; next to that, you have mutations on math operators or logical operators. If you look at return values, you can replace a boolean true by false, negate a numeric value, etc. and - in case of collections - you can create a mutation that returns an empty collection instead of the intended one.
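Each of these mutator types is shown below as an original method next to a hand-written mutant. The methods are made up for illustration, and the selection does not claim to match the exact mutator set of PIT or any other tool.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MutatorExamples {
    // Condition: the comparison is negated.
    static boolean isPositive(int n)       { return n > 0; }
    static boolean isPositiveMutant(int n) { return n <= 0; }

    // Math operator: "+" is replaced by "-".
    static int sum(int a, int b)       { return a + b; }
    static int sumMutant(int a, int b) { return a - b; }

    // Logical operator: "&&" is replaced by "||".
    static boolean both(boolean a, boolean b)       { return a && b; }
    static boolean bothMutant(boolean a, boolean b) { return a || b; }

    // Boolean return value: true is replaced by false.
    static boolean isEnabled()       { return true; }
    static boolean isEnabledMutant() { return false; }

    // Numeric return value: the value is negated.
    static int price()       { return 42; }
    static int priceMutant() { return -42; }

    // Collection return value: an empty collection is returned instead.
    static List<String> names()       { return Arrays.asList("Ann", "Bob"); }
    static List<String> namesMutant() { return Collections.emptyList(); }
}
```

A test suite kills one of these mutants only if some test observes the difference. For example, `sumMutant` survives a test that only calls `sum(1, 0)`, because 1 + 0 and 1 - 0 are both 1.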
Do we need perfection?
Do we need perfection? No, this isn't necessary in all cases. As you can see in the screenshot, which shows the generation of an informational message, mutants survive because there is no test for the constructed message. Is this a test we absolutely need? Maybe we don't...
Extra benefits
What did we get from improving the mutation coverage, other than being able to leave work without fearing the next day 😊?
Well, we discovered a few bugs that hadn't been reported, and we added tests that cover unforeseen but probable scenarios. We also removed dead code, unnecessary checks, and so on. So simply going through the process delivered value. On top of this, our tests became more robust and more complete. So I would say it's a no-brainer to add this type of testing to the development process.
Java Only?
Is mutation testing only available for Java development? No, not at all. There are mutation testing tools for a whole range of programming languages and frameworks:
A well-known mutation testing framework is Stryker, which is available for JavaScript, TypeScript, .NET and Scala.
References:
- https://pitest.org/
- https://en.wikipedia.org/wiki/Mutation_testing
- https://www.softwaretestinghelp.com/what-is-mutation-testing/
- https://www.youtube.com/watch?v=G0MbITvWfgY
- https://stryker-mutator.io/