* It promotes actually looking at the code before considering it done
* It promotes refactoring
* It helps prevent breaking changes to things that weren't supposed to change
* The tests don't actually test functionality, edge cases, etc., just that things don't crash on the happy path.
* Any change to an implementation breaks a test needlessly, because the test tests specifics of the implementation, not correctness. Thus it actually makes refactoring harder: your test says you broke something, but you probably didn't, and now you have the extra work of writing a new test.
* In codebases for dynamic languages, most of what these tests end up catching is stuff a compiler would catch in a statically typed language.
(x,y) => x + y
Orgs targeting code coverage write a test for (1, 2) => 3, get 100% coverage, and then stop, as there is no incentive to go further. They don't write tests for, say, (1, null)
(null, null)
('x', 1)
(NaN, Infinity)
and so on [1]. These additional tests would improve coverage of scenarios, yet the code-coverage number would not move. I have seen projects where a test runs a sequence of steps that trigger the code but the assertion is effectively true === true, or where the test replicates the same function instead of generating proper mock data, or any of a myriad of other absurd testing approaches. This comes from the twin pressures of showing coverage and having tests pass.
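To make that concrete, here is a runnable sketch of the example above: the happy-path assertion alone yields 100% line coverage of add, while the edge cases change nothing in the coverage report.

  // The function under test and its one happy-path test
  const add = (x, y) => x + y;
  console.assert(add(1, 2) === 3);            // 100% coverage; orgs stop here

  // None of these move the coverage number, but each exposes behaviour
  // the happy-path test never questioned:
  console.assert(add(1, null) === 1);         // null silently coerces to 0
  console.assert(add(null, null) === 0);
  console.assert(add('x', 1) === 'x1');       // silent string concatenation
  console.assert(Number.isNaN(add(NaN, Infinity)));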
Coverage is also a challenge in code that uses AI/ML libraries or third-party services. These really need statistical testing with a large volume of diverse, well-maintained samples, and the results need statistical analysis of error rates, not unlike how manufacturing does it; I don't see that often. For code using face detection, for example, a single face getting detected or not is hardly an adequate test.
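A minimal sketch of what that could look like; detectFaces and the sample set here are hypothetical stand-ins for a real detector and thousands of labelled images, and the 2% threshold is an arbitrary assumption:

  // Stand-ins: a real suite would load a large, labelled sample set
  const detectFaces = (image) => image.faces;   // hypothetical detector
  const samples = [
    { image: { faces: [{}] }, expectedFaces: 1 },
    { image: { faces: [] },   expectedFaces: 0 },
    // ...thousands more diverse samples in practice
  ];

  const MAX_ERROR_RATE = 0.02;  // acceptance threshold, an assumption

  const errors = samples.filter(
    (s) => detectFaces(s.image).length !== s.expectedFaces
  ).length;
  const rate = errors / samples.length;

  console.assert(
    rate <= MAX_ERROR_RATE,
    `error rate ${rate} exceeds threshold ${MAX_ERROR_RATE}`
  );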
Finally, it is easier to improve coverage by testing simple code than to improve coverage of (or refactor) a function with, say, 10 nested branches, so it is not uncommon to see 90% coverage while the 10% containing the most used, most error-prone code is poorly tested or not tested at all.
There are methods that address these problems, like mutation testing (sketched below) or holding retros when tests fail to catch production bugs, but they are not easy to measure, and coverage-driven orgs will not see their metrics move by doing them.
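For the unfamiliar, mutation testing deliberately breaks the code and checks whether the suite notices. A hand-rolled sketch using the add example; real tools such as Stryker generate the mutants automatically:

  // Each mutant is a deliberately broken variant of (x, y) => x + y
  const mutants = [
    (x, y) => x - y,   // '+' mutated to '-'
    (x, y) => x * y,   // '+' mutated to '*'
    (x, y) => x,       // second operand dropped
  ];

  // A weak happy-path suite: one assertion, 100% statement coverage
  const suitePasses = (add) => add(2, 2) === 4;

  const killed = mutants.filter((m) => !suitePasses(m)).length;
  console.log(`mutation score: ${killed}/${mutants.length}`);
  // The 'x * y' mutant survives (2 * 2 === 4), so the score is 2/3
  // even though coverage is 100%.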
Well-written test suites will also have good coverage, but not necessarily the other way around. Developers who care, and who understand what they are doing and why, will use coverage only as a first step to see where the gaps in their tests are.
Tests are also code that needs to be peer reviewed and maintained. If the tests depend on the implementation and constantly break, contain improper mocks, or assert poorly, a pile of badly written tests is more of a hindrance to development than an aid.
[1] Yes, most of these are not applicable in a strongly typed language, but it is far easier as an illustration.
> The tests don't actually test functionality, edge cases, etc., just that things don't crash on the happy path.

This is low coverage.
> Any change to an implementation breaks a test needlessly, because the test tests specifics of the implementation, not correctness.
This is bad design.
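As an illustration of the difference, here is a hypothetical Cart, invented purely to show what each style of test couples itself to:

  class Cart {
    constructor() { this.items = []; }        // internal detail
    addItem(item) { this.items.push(item); }
    total() { return this.items.reduce((sum, i) => sum + i.price, 0); }
  }

  const cart = new Cart();
  cart.addItem({ price: 5 });

  // Brittle: coupled to the implementation; breaks if `items`
  // becomes a Map, even though nothing observable changed.
  console.assert(cart.items.length === 1);

  // Robust: coupled to behaviour; survives any correct refactor.
  console.assert(cart.total() === 5);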
> In codebases for dynamic languages, most of what these tests end up catching is stuff a compiler would catch in a statically typed language.
So they are not useless.
No, as a sibling comment to mine shows, it's actually easy to reach 100% coverage with bad tests, since nothing challenges the implementation to handle edge cases.
I am not aware of any language (outside of intentionally-minimalist esolangs) that doesn't support floating point numbers. In some languages (like JavaScript) that's the only kind of number you get.
It was a pretty thorough study:
> Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for.
Given their data, their conclusion seems pretty plausible:
> Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
That's certainly how I approach testing: I value having a thorough test suite, but I do not treat coverage as a target or use it as a requirement for other people working on the same project.
[1]: https://neverworkintheory.org/2021/09/24/coverage-is-not-str...
AFAIK, «high coverage» may have different meanings for different people. For me, it means «high quality»; for others it means «high percentage», e.g. «full coverage» or «80% coverage», which is easy to turn into an OKR.
Which is what makes this whole concept of code coverage such toxic nonsense...
Not to argue against writing 'quality' tests, but high 'coverage' actually decreases quality, objectively speaking, since erroneous coverage of code serves negative purposes such as obscuring important tests and enshrining bugs within the test suite.
I would make the case here that Copilot and all such 'AI' tools should be banned from production, at least until they solve the above problem, since as it stands they will serve to shovel out piles of useless or, worse, incorrect tests.
It is also important to remember what AI does, i.e. produce networks whose results optimize for the desired metrics; if the metrics are wrong or incomplete, you produce and propagate bad design.
So yes, people use it now as a learning tool (fine) and it will get 'better' (sure), but as a tool, when it gets better, it will constrain more, not less, along whatever lines have been deemed better, and it will become harder, not easier, to adjust.
Never encountered it before (and I'm ashamed to say so, as C# has been my main language since 2009).