
rachof (OP) 2022-06-17 00:57:15
> I'm interested in knowing how you determine what an "absolute level of skill" is though.

They're set by experts in the area, the same as the ones who write our question content.

To give a little more detail: the tests run in a beta state for a while before being fully released. We gather a bunch of data and calibrate parameters for our IRT model based on that. So the ordinal ordering of performance is entirely mathematical and data-driven. (When we were still doing in-house human interviews, those were part of the data set as well, and still are for the subjects that overlap them.) But that produces a continuous, hard-to-interpret, and population-dependent score distribution, so SMEs draw the lines with which we bucket those scores. (For those of you familiar with IRT as a framework, they set theta thresholds.)
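
To make that concrete, here's a toy version of the two steps (fit a theta, then bucket it). This is a minimal sketch, not our production code: the item parameters and thresholds are made up, and I'm using a simple 2PL model just for illustration.

    # Toy sketch: estimate ability (theta) from a response pattern
    # under a 2PL IRT model, then bucket it with SME-set thresholds.
    # All numbers here are invented for illustration.
    import numpy as np
    from scipy.optimize import minimize_scalar

    items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.2)]  # (discrimination a, difficulty b)
    responses = [1, 1, 0, 1]  # 1 = correct, 0 = incorrect

    def p_correct(theta, a, b):
        # 2PL: probability of a correct answer at ability theta
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def neg_log_likelihood(theta):
        ll = 0.0
        for (a, b), r in zip(items, responses):
            p = p_correct(theta, a, b)
            ll += r * np.log(p) + (1 - r) * np.log(1 - p)
        return -ll

    theta_hat = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x

    # The continuous theta is data-driven; the bucket boundaries are
    # the SME-set part.
    thresholds = [-1.0, 0.0, 1.0, 2.0]  # boundaries for levels 2, 3, 4, 5
    level = 1 + sum(theta_hat >= t for t in thresholds)
    print(f"theta = {theta_hat:.2f} -> level {level}")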

But yes, there is some chance involved. It's a tradeoff between the standard error in our scores and the length of the quiz, and we try to optimize for a sweet spot there (since most people don't want to take two hours of quizzes). And we are absolutely going to get it wrong sometimes. That's both for in-model reasons (the statistical standard error is enough that we'll be off by a level either way around 20-25% of the time or something like that) and for out-of-model ones (maybe some of our questions just test the wrong thing in ways that don't show up in the data). Assuming your self-assessment is correct (and I will say that many people's are not! confidence correlates with skill, but with a whooooole lot of noise), then yeah, you probably had a bad roll of the dice on one and not on the other.
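
To put rough numbers on that: if you treat the theta estimate as approximately normal around the true ability, you can back out how often the estimate lands in the wrong bucket. The SE and bucket boundaries below are invented for illustration, not our real values.

    # Back-of-the-envelope for the in-model error rate: treat the
    # theta estimate as roughly normal around the true ability.
    from scipy.stats import norm

    se = 0.4                   # assumed standard error of theta
    lower, upper = 0.0, 1.0    # boundaries of the test-taker's true bucket

    def p_wrong_bucket(theta_true):
        # probability the estimate lands outside the true bucket
        p_inside = norm.cdf(upper, theta_true, se) - norm.cdf(lower, theta_true, se)
        return 1.0 - p_inside

    print(p_wrong_bucket(0.5))  # mid-bucket: ~0.21
    print(p_wrong_bucket(0.9))  # near a boundary: ~0.41

Someone sitting in the middle of a level gets bucketed correctly most of the time; someone near a boundary gets mis-bucketed much more often, and the population mix is where the 20-25% above comes from. Shrinking it meaningfully means a longer quiz, which is the tradeoff.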

As I say a lot (in this thread and elsewhere), we can't reasonably bat 1.000: our goal is to bat better than the next guy. And I think we do do that, messy though the entire space can be in practice.

---

For the record, when we talk to companies, here's what we tell them about scores:

2 = knows something in this area, but we can't say with confidence that they know enough to handle things independently. OK for entry-level roles, but lower than you'd like for others. We don't show a 2 on profiles; the only place companies see it is when they use our Screen product to screen their own candidates. The idea is that if you have the choice of whether to take an assessment in the first place, it shouldn't really hurt you to try.

3 ("Proficient") = professional competence in that area, can work independently in it. A score we'd expect of a mid-level engineer within their area of expertise. A recommendation for most roles, maybe not very senior ones (but not a point against even for senior roles). A score of 3 or above counts as certified, meaning it earns a shareable certificate and makes you appear in search results for a particular quiz score.

4 ("Advanced") = significant expertise, something more typical of a senior eng who really knows their way around the subject. A recc for all levels, even very senior ones.

5 ("Expert") = exceptional, above and beyond even by the standards of senior roles
