Let's take Alice and Bob, who are both in the same class.
Alice has clinical depression, but on this particular Tuesday, she is feeling ok. She knows the material well and works through the test, answering all the questions. She is allowed 30 minutes of extra time, which is helpful as it allows her to work carefully and check her work.
Bob doesn't have a disability, but he was dumped by his long-term girlfriend yesterday and as a result barely slept last night. Because of his acute depression (a natural emotion that happens to all people sometimes), Bob has trouble focusing during the exam and his mind regularly drifts to ruminate on his personal issues. He knows the material well, but just can't stay on the task at hand. He runs out of time before even attempting all the problems.
Now, I can imagine two situations.
1. For this particular exam, there really isn't a need to evaluate whether the students can quickly recall and apply the material. In this situation, what reason is there to not also give Bob an extra 30 minutes, same as Alice?
2. For whatever reason, part of the evaluation criteria for this exam is that the test taker is able to quickly recall and apply the material. To achieve a high score, being able to recall all the material is insufficient; it must be done quickly. In this case, Alice and Bob basically took different tests that measured different things.
One problem is that we first have to clearly define the construct we want to measure with the test. That is often unclear and underdefined. When designing a test, we also need to be clear about which external influences only contribute noise / error and which differences come from the thing we are actually measuring. No test is free of a margin of error.
A simple / simplified example: When we measure IQ, we want to determine cognitive processing speed. So we need a fixed time for the test. But people may also read the questions faster or slower. This variation falls within a typical range, so when you look at actual IQ tests, they will not give just a score (the most likely score) but also a margin of error, and test theorists will be very unhappy if you don't take this margin of error seriously. Now take someone who is legally blind. That person's reading time will fall far outside the range that the margin of error covers for others. The margins of error account for typical inter-personal and intra-personal variation (a bad day, a girlfriend breaking up, etc.). But that doesn't work here. So we try to fix this and account for the new source of error differently, e.g. by giving more time.
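To make the margin-of-error point a bit more concrete, here is a rough sketch using the standard error of measurement from classical test theory, SEM = SD * sqrt(1 - reliability). The SD of 15 is the usual IQ scale convention; the reliability of 0.90 is just an assumed illustrative value, not a figure for any particular test, and the function names are made up for this example.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float = 15.0,
               reliability: float = 0.90, z: float = 1.96):
    """Approximate 95% band around an observed score.

    sd=15 is the conventional IQ scale; reliability=0.90 is an
    assumed illustrative value, not taken from a real test manual.
    """
    half_width = z * standard_error_of_measurement(sd, reliability)
    return observed - half_width, observed + half_width


def bands_overlap(score_a: float, score_b: float, **kwargs) -> bool:
    """True if the two bands overlap, i.e. the difference between the
    two observed scores could plausibly be measurement noise."""
    low_a, high_a = score_band(score_a, **kwargs)
    low_b, high_b = score_band(score_b, **kwargs)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    low, high = score_band(110)
    print(f"Observed 110 -> band roughly {low:.1f} to {high:.1f}")
    print("108 vs 112 distinguishable?", not bands_overlap(108, 112))
```

Under these assumptions, two scores a few points apart sit well inside each other's band, which is the "typical bad day" kind of noise. A band like this says nothing about a source of error it was never built for, such as a test taker who cannot read the questions at normal speed.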
So it highly depends on what you want to measure. If you are doing a test in CS, do you want to measure how well the student understood the material and how fast they can apply it? Or do you want to measure how fast the student could do an actual real-life coding task? Depending on your answer, you need a very different measurement strategy and you need to handle sources of error differently.
When looking at grades, people usually account for these margins of error intuitively. We don't just rely on grades when hiring, but also conduct interviews etc. so we can get a clearer picture.