Why? My experience with them was pretty bad. I took their assessment for web development (I think I even did an assignment) and got put on a video call with someone from Triplebyte. He never cracked a smile. Suddenly I was being asked a bunch of CS questions that really weren't relevant to web development, some of which were entirely inappropriate, like sorting a binary search tree. I even told the guy that I thought I was getting those questions wrong, and he just scowled and said, "well, you just don't know when you're going to use this stuff." "My point exactly," I thought.
Ultimately I got rejected.
The whole idea that you can boil a candidate down to some coding challenges and a video quiz is bad. I do like the idea of streamlining the hiring process for developers, but there's more to it than knowing a bunch of stuff, because that can be gamed. And quizzing me on irrelevant material was a bad move. A firm like Triplebyte won't be as good at interviewing a candidate as the employer itself, and may even keep perfectly qualified candidates out of view of every employer affiliated with them.
I started using them about a year ago (first passively looking, then actively looking).
I really enjoyed the ability to be assessed on something besides Leetcode-style questions.
I didn't take a job through their platform (though I did get one really strong offer), but even still, I found the assessments incredibly useful, since they give you a percentile distribution of your performance on each topic-specific test.
After taking their assessments, when interviewers asked me how good I was at, say, Python, I could tell them I have a hard time assessing my own capabilities: "But hey, I took this standardized test that says I'm in the 85th percentile; not sure how good a metric it is" (without mentioning that I think I'm OK at best at Python).
It's the only way I've found to get a measure of your talents compared to the rest of the field (even if it might not be reliable/useful)
A lot of the companies that interview through Triplebyte also skip LC mediums because they have a different signal about your potential suitability as a candidate.
Way too much of engineering is non-quantifiable. Putting a number to someone's skills is bound to be reductive at best.
Like honestly I might think I'm a 3 at X, but if some test that thousands of other people took tells me I'm in the 90th percentile of X users, that information is still useful to me.
One complaint I do have is that (in addition to the percentile bucket) they give you a 1-5 rating, where 4 is "senior engineer level" and 5 is something like "exceptional performance, a leader in the field".
But the rating thresholds seem to fall at different percentiles for each test.
For example, I might get 80th percentile on one test, but get a 3 rating, and for another test, 80th percentile is a 5.
In general, different quizzes have vastly different populations of people attempting them. For example, our front-end quiz gets a lot of beginners and hobbyists, and thus has a very bottom-loaded score distribution. Our devops-related quizzes, on the other hand, draw a population that skews skilled and senior, and have very top-loaded score distributions.
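To make that concrete, here's a toy sketch (populations and numbers invented for illustration, not real quiz data): the same absolute skill level can land around the 90th percentile on a beginner-heavy quiz and around the 50th on a senior-heavy one.

```python
# Toy illustration with invented numbers: identical skill, very different
# percentiles, purely because of who else takes each quiz.
import numpy as np

rng = np.random.default_rng(0)
candidate_skill = 1.0  # pretend skill is measured on one common scale

# Bottom-loaded population (lots of beginners/hobbyists), like a front-end quiz.
frontend_pop = rng.normal(loc=-0.5, scale=1.0, size=100_000)
# Top-loaded population (skews skilled/senior), like a devops quiz.
devops_pop = rng.normal(loc=1.0, scale=0.8, size=100_000)

def percentile_of(pop, x):
    return 100 * (pop < x).mean()

print(f"front-end percentile: {percentile_of(frontend_pop, candidate_skill):.0f}")  # ~93
print(f"devops percentile:    {percentile_of(devops_pop, candidate_skill):.0f}")    # ~50
```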
Communicating this information to our users (particularly the less-quantitatively-oriented ones on the company side) has been a source of considerable UI challenges for us.
Personally speaking, I've used Python a handful of times over the years, but never as a primary language for any work I've done. I got a 4 on the Python test.
Compare that to front-end, which I've used professionally and have also been dabbling in for ~20 years (still keeping up with developments in the years when I wasn't primarily doing front-end dev professionally).
I got a 3.
I definitely know 100X as many random facts about front-end APIs, libraries, tooling, and technologies as I do about Python. So perhaps it just came down to luck (unlucky guesses on the front-end quiz, lucky ones on Python). Or perhaps there's just so much more in scope for the front-end quiz than for Python that you can spend 20 years learning front-end technologies and still be "middle-of-the-road" in terms of "absolute level of skill".
But I think that makes your descriptions of the 1-5 rankings a bit disingenuous. If people who have (what most other companies would consider) senior-level knowledge are generally scored a 3 by your system, a more honest set of descriptions would change "4: level expected of seniors" to "4: knows roughly 80% or more of all there is to know about this subject".
They're set by subject-matter experts in the area, the same people who write our question content.
To give a little more detail: the tests run in a beta state for a while before being fully released. We gather a bunch of data and calibrate the parameters of our IRT model based on that. So the ordinal ordering of performance is entirely mathematical and data-driven. (When we were still doing in-house human interviews, those were part of the data set as well, and they still are for the subjects that overlap.) But that produces a continuous, hard-to-interpret, and population-dependent score distribution, and SMEs draw the lines with which we bucket those scores. (For those of you familiar with IRT as a framework, they set theta thresholds.)
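For the curious, here's a minimal sketch of what that score-then-bucket flow can look like under a 2PL IRT model. The item parameters and thresholds below are invented for illustration; they're not real calibration values.

```python
# Minimal 2PL IRT sketch: estimate a test-taker's ability (theta) from their
# item responses, then bucket the continuous score with expert-set thresholds.
# All parameters here are invented for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

# Calibrated item parameters (normally fit from beta-period response data):
# a = discrimination, b = difficulty.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2), (2.0, 2.0)]
responses = np.array([1, 1, 1, 0, 0])  # 1 = correct, 0 = incorrect

def p_correct(theta, a, b):
    """2PL: probability of a correct response given ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def neg_log_likelihood(theta):
    p = np.array([p_correct(theta, a, b) for a, b in items])
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Maximum-likelihood estimate of theta for this response pattern.
theta_hat = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x

# SME-set theta thresholds map the continuous score into 1-5 buckets.
thresholds = [-1.5, -0.5, 0.5, 1.5]  # boundaries between levels 1|2|3|4|5
level = 1 + sum(theta_hat > t for t in thresholds)
print(f"theta = {theta_hat:.2f}, level = {level}")
```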
But yes, there is some chance involved. It's a tradeoff between the standard error in our scores and the length of the quiz, and we try to optimize for a sweet spot there (since most people don't want to take two hours of quizzes). And we are absolutely going to get it wrong sometimes. That's both for in-model reasons (the statistical standard error is enough that we'll be off by a level either way around 20-25% of the time, or something like that) and for out-of-model ones (maybe some of our questions just test the wrong thing in ways that don't show up in the data). Assuming your self-assessment is correct (and I will say that many people's are not! Confidence correlates with skill, but with a whooooole lot of noise), then yeah, you probably had a bad roll of the dice on one and not on the other.
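To give a rough feel for that tradeoff: in IRT, the standard error of the ability estimate shrinks roughly with the square root of the test information, which grows with the number of items. A toy sketch (item quality and difficulties invented):

```python
# Rough sketch of the error-vs-length tradeoff: SE(theta) ~ 1/sqrt(test
# information), and information adds up across items. Parameters invented.
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

theta = 0.5
for n_items in (10, 20, 40, 80):
    # Assume average-quality items (a=1) spread around the candidate's level.
    difficulties = np.linspace(theta - 1.5, theta + 1.5, n_items)
    info = sum(item_information(theta, 1.0, b) for b in difficulties)
    print(f"{n_items:3d} items -> SE(theta) ~ {1 / np.sqrt(info):.2f}")
# Halving the standard error costs roughly 4x as many questions.
```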
As I say a lot (in this thread and elsewhere), we can't reasonably bat 1.000: our goal is to bat better than the next guy. And I think we do do that, messy though the entire space can be in practice.
---
For the record, when we talk to companies, here's what we tell them about scores:
2 = knows something in this area, but we can't say with confidence that they know enough to handle things independently. OK for entry-level roles, but lower than you'd like for others. We don't show a 2 on profiles; the only place companies see one is if they're using our screens to screen their own candidates via our Screen product. The idea being that if you have the choice of whether to take an assessment in the first place, it shouldn't really hurt you to try.
3 ("Proficient") = professional competence in that area, can work independently in it. A score we'd expect of a mid-level engineer within their area of expertise. A recommendation for most roles, maybe not very senior ones (but not a point against even for senior roles). A score of 3 or above counts as certified, meaning it earns a shareable certificate and makes you appear in search results for a particular quiz score.
4 ("Advanced") = significant expertise, something more typical of a senior eng who really knows their way around the subject. A recc for all levels, even very senior ones.
5 ("Expert") = exceptional, above and beyond even by the standards of senior roles