zlacker

[parent] [thread] 17 comments
1. manque+(OP)[view] [source] 2021-10-27 18:49:51
Writing tests for the sake of coverage is already practically useless, which is what a lot of orgs do. This could maybe generate such tests, but it doesn't materially impact quality now, so there's not much difference if it's automated.

One of the main value props of writing meaningful unit tests is that it helps the developer think differently about the code they are writing tests for, and that improves the quality of the code's composition.

replies(2): >>Graffu+57 >>xtract+hq
2. Graffu+57[view] [source] 2021-10-27 19:23:53
>>manque+(OP)
Why is that useless? Codebases I have worked on that had high code coverage requirements had very few bugs.

* It promotes actually looking at the code before considering it done

* It promotes refactoring

* It helps to prevent breaking changes for stuff that wasn't supposed to change

replies(3): >>matsem+28 >>manque+mb >>tikhon+mH
3. matsem+28[view] [source] [discussion] 2021-10-27 19:28:06
>>Graffu+57
My experience is the opposite in codebases where high coverage has been a priority:

* The tests don't actually test functionality, edge cases etc., just that things don't crash on a happy path.

* Any change to an implementation breaks a test needlessly, because the test tests specifics of the implementation, not correctness. That actually makes refactoring harder: your test says you broke something, but you probably didn't, and now you have the double work of writing a new test.

* In codebases for dynamic languages, most of what these tests end up catching is stuff a compiler would catch in a statically typed language.
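A minimal sketch of that second point, with a hypothetical `totalPrice` function (the `.toString()` check stands in for the mock-heavy, call-order-asserting tests this kind of coupling usually takes):

```javascript
// Hypothetical function under test.
function totalPrice(items) {
  return items.reduce((sum, i) => sum + i.price, 0);
}

// Behaviour-coupled test: survives any refactor that preserves the result.
console.assert(totalPrice([{ price: 2 }, { price: 3 }]) === 5);

// Implementation-coupled test: asserts *how* the sum is computed, so
// rewriting the body as a plain for-loop "breaks" this test even though
// nothing observable about the function changed.
console.assert(totalPrice.toString().includes('reduce'));
```

The second assertion is deliberately absurd, but it is the same failure mode as a test that pins down which internal methods were called and in what order.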

replies(1): >>drran+ik
4. manque+mb[view] [source] [discussion] 2021-10-27 19:42:25
>>Graffu+57
The example I usually give [1] in JavaScript is: say you have the function

    (x,y) => x + y
Orgs targeting code coverage write a test for (1, 2) => 3, get 100% coverage, and then stop, as there is no incentive to go further. They don't write tests for, say,

    (1, null)
    (null, null)
    ('x', 1)
    (NaN, Infinity)
and so on. These additional tests improve the coverage of scenarios, but the code coverage number will not move.
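A runnable sketch of this, using the hypothetical `add` function from above. One happy-path test exercises the only line, so coverage reads 100%; asserting any of the edge-case behaviours would not move the number at all:

```javascript
// The hypothetical add function from the comment above.
const add = (x, y) => x + y;

// One happy-path test exercises the only line: 100% line coverage.
console.assert(add(1, 2) === 3);

// None of these behaviours is asserted anywhere, yet testing them
// would leave the coverage metric exactly where it is:
console.assert(add(1, null) === 1);              // null coerces to 0
console.assert(Number.isNaN(add(undefined, 1))); // undefined coerces to NaN
console.assert(add('x', 1) === 'x1');            // string concatenation
console.assert(Number.isNaN(add(NaN, Infinity)));
```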

I have seen projects where a test has a sequence of steps that trigger the code but the assertion is effectively true === true, or where tests replicate the same function instead of generating proper mock data, or myriad other absurd testing approaches. This comes from the twin pressures of showing coverage and having tests pass.

Coverage is also a challenge in code that uses AI/ML libraries or third-party services. These really need statistical testing with a large volume of diverse, well-maintained samples, and the results need statistical analysis for error rates, not unlike how manufacturing does it. I don't see that often. For code using face detection, for example, a single face getting detected or not is hardly an adequate test.

Finally, it is easier to improve coverage by testing simpler code than to improve coverage of (or refactor) a function with, say, 10 nested branches, so it is not uncommon to see 90% coverage while the 10% of the code that is most used and most error prone is poorly tested or not tested at all.

There are some methods to address these, like mutation testing or retros on why tests failed to catch production bugs, but they are not easy to measure, and coverage-driven orgs will not see their metrics move by doing them.
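For readers unfamiliar with mutation testing, here is a toy sketch of the idea with hand-written "mutants" (real tools such as Stryker generate them automatically): a mutant your suite still passes against reveals a gap that the coverage number cannot see.

```javascript
// Function under test.
const add = (x, y) => x + y;

// Hand-written mutants a mutation-testing tool might generate:
const mutants = [
  (x, y) => x - y, // '+' mutated to '-'
  (x, y) => x,     // second operand dropped
];

// A weak suite: only checks that the call doesn't crash (true === true style).
const weakSuite = (fn) => { fn(1, 2); return true; };
// A real suite: asserts the actual result.
const strongSuite = (fn) => fn(1, 2) === 3;

// A mutant "survives" when the suite still passes against it.
const survivors = (suite) => mutants.filter((m) => suite(m)).length;

console.log(survivors(weakSuite));   // 2: both mutants survive the weak suite
console.log(survivors(strongSuite)); // 0: both mutants are killed
```

Both suites give identical line coverage of `add`; only the mutation score tells them apart.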

Well-written test suites will also have good coverage, but not necessarily the other way around. Developers who care and understand what they are doing, and why, will use coverage only as a first step to see where the gaps in their tests are.

Tests are also code that needs to be peer reviewed and maintained. If tests depend on the implementation and constantly break, or contain improper mocks, or assert poorly, then a lot of badly written tests are a hindrance to development rather than an aid.

[1] Yes, most of these are not applicable in a strongly typed language, but it is far easier as an illustration.

replies(2): >>Graffu+Fg >>jmnico+Oi
5. Graffu+Fg[view] [source] [discussion] 2021-10-27 20:08:02
>>manque+mb
So one argument against code coverage requirements is that poor engineers won't test correctly. Without the code coverage requirements you're in the same situation.
replies(2): >>crysin+Uo >>manque+uK
6. jmnico+Oi[view] [source] [discussion] 2021-10-27 20:19:29
>>manque+mb
I don't think there's Infinity in my language. What do you use it for, except maths?
replies(2): >>dpryde+Yy >>eloisi+zC
7. drran+ik[view] [source] [discussion] 2021-10-27 20:28:07
>>matsem+28
> The tests don't actually test functionality, edge cases etc., just that things don't crash on a happy path.

This is low coverage.

> Any change to an implementation breaks a test needlessly, because the test tests specifics of the implementation, not correctness.

This is bad design.

> In codebases for dynamic languages, most of what these tests end up catching is stuff a compiler would catch in a statically typed language.

So they are not useless.

replies(1): >>matsem+Yk
8. matsem+Yk[view] [source] [discussion] 2021-10-27 20:32:17
>>drran+ik
> This is low coverage.

No, as a sibling comment to mine shows, it's actually easy to get 100% coverage with bad tests, since one doesn't challenge the implementation to handle edge cases.

replies(2): >>Number+0H >>drran+GH
9. crysin+Uo[view] [source] [discussion] 2021-10-27 20:53:55
>>Graffu+Fg
The problem is that with 100% code coverage of badly guarded or badly implemented code, you'll have a false sense of security if you're just looking at coverage as the metric of quality. Any time I've worked with a company that had a required code coverage percentage, they never actually cared what the covered code looked like, only that it was covered by some test.
10. xtract+hq[view] [source] 2021-10-27 21:00:05
>>manque+(OP)
I've found that for large codebases in dynamically typed, interpreted languages, test coverage is very useful at preventing typos or subtle bugs that wouldn't be caught otherwise.
11. dpryde+Yy[view] [source] [discussion] 2021-10-27 21:54:51
>>jmnico+Oi
If you are using floating point numbers implemented in hardware, then infinity is absolutely a valid value and one that your code will encounter. This is true regardless of language, as long as the language requires or allows IEEE-754 semantics.

I am not aware of any language (outside of intentionally-minimalist esolangs) that doesn't support floating point numbers. In some languages (like JavaScript) that's the only kind of number you get.
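A few ways Infinity shows up in ordinary JavaScript, since IEEE-754 arithmetic never throws:

```javascript
// Infinity appears in everyday floating-point code, not just "maths":
console.log(1 / 0);                // Infinity (IEEE-754: no exception thrown)
console.log(-1 / 0);               // -Infinity
console.log(Number.MAX_VALUE * 2); // Infinity (overflow)
console.log(Math.log(0));          // -Infinity
console.log(Infinity - Infinity);  // NaN
```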

replies(1): >>jmnico+KE1
12. eloisi+zC[view] [source] [discussion] 2021-10-27 22:22:30
>>jmnico+Oi
If you are seeking a minimum value within some complicated iteration, it's easier to start your min accumulator at Infinity than at null with extra null checks.
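A sketch of that pattern, with a hypothetical `minOf` helper:

```javascript
// Start the accumulator at Infinity: no null sentinel, no null checks.
function minOf(values) {
  let min = Infinity;
  for (const v of values) {
    if (v < min) min = v;
  }
  return min; // Infinity for an empty input, which composes cleanly
}

console.log(minOf([3, 1, 2])); // 1
console.log(minOf([]));        // Infinity
```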
13. Number+0H[view] [source] [discussion] 2021-10-27 23:01:00
>>matsem+Yk
I think maybe you are using different definitions of coverage -- textual coverage vs logic coverage.
14. tikhon+mH[view] [source] [discussion] 2021-10-27 23:03:36
>>Graffu+57
I saw a cool study recently (summarized well here[1]) with an empirical experiment on how well code coverage predicts how well a test suite catches bugs. They found that the number of test cases correlated well with the test suite's effectiveness, but, when controlling for the number of tests, code coverage didn't.

It was a pretty thorough study:

> Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for.

Given their data, their conclusion seems pretty plausible:

> Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.

That's certainly how I approach testing: I value having a thorough test suite, but I do not treat coverage as a target or use it as a requirement for other people working on the same project.

[1]: https://neverworkintheory.org/2021/09/24/coverage-is-not-str...

15. drran+GH[view] [source] [discussion] 2021-10-27 23:06:04
>>matsem+Yk
It's easy to achieve 100% coverage with happy-path code and low quality shallow tests, agreed.

AFAIK, «high coverage» may have different meaning for different people. For me, it's «high quality», for others it's «high percentage», e.g. «full coverage» or «80% coverage», which is easy to OKR.

replies(1): >>smaude+5a1
16. manque+uK[view] [source] [discussion] 2021-10-27 23:21:46
>>Graffu+Fg
Without going down the rabbit hole of Goodhart's law, code coverage % is a poor metric particularly when used standalone.
17. smaude+5a1[view] [source] [discussion] 2021-10-28 03:02:27
>>drran+GH
It's the fact that this could even have a different meaning that makes it a useless metric: defining 'quality' or 'coverage' is subjective. The majority of tests written are meaningless noise, and serve mainly to distract from covering 'critical' failures. Again, a subjective measure, in the sense that what is critical to you and to me may not be the same thing.

Which is what makes this whole concept of code coverage so much toxic nonsense...

Not to argue against writing 'quality' tests, but high 'coverage' actually decreases quality, objectively speaking, since erroneous coverage serves negative purposes such as obscuring important testing and enshrining bugs within tests.

I would make the case here that Copilot and all such 'AI' tools should be banned from production, at least until they solve the above problem, since as it stands they will serve to shovel piles of useless or, worse, incorrect tests.

It is also important to remember what AI does, i.e. produce networks which create results based upon desired metrics. If the metrics were wrong or incomplete, you produce and propagate bad design.

So yes, people use it now as a learning tool (fine) and it will get 'better' (sure), but as a tool, when it gets better, it will constrain more, not less, along whatever lines have been deemed better, and it will become harder, not easier, to adjust.

18. jmnico+KE1[view] [source] [discussion] 2021-10-28 08:41:37
>>dpryde+Yy
You're right, there's double.PositiveInfinity and double.NegativeInfinity in C#.

Never encountered it before (and I'm ashamed to say C# is my main language since 2009).
