Haha, agreed, it would take longer for sure.

What I meant is that, assuming you are using PyTorch or JAX, you could most likely code up the model pretty fast. Just compare it to Llama: sure, it is far behind, but the Llama model is under 1000 lines of code and pretty good.

There is tons of work in the training, the infra, preparing the data, and so on. I'd guess that adds up to millions of lines of code. But I would argue the core ideas and the model itself are likely thin. That is my point.
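
To make the "thin core" point a bit more concrete, here is a toy sketch of single-head causal self-attention, the central operation in a decoder-only transformer, in plain Python with no dependencies. This is not Llama's code: the function names are made up for illustration, and it leaves out everything a real model adds (multiple heads, rotary embeddings, normalization, the MLP, batching, a tensor library). It is only meant to show how little code the core idea needs.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, wq, wk, wv):
    """Single-head causal attention (hypothetical toy helper).

    x is a (seq_len, dim) list of rows; wq, wk, wv are (dim, dim) weights.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Position i only attends to positions 0..i (the causal mask).
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out.append([sum(w * v[j][d] for j, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out
```

In PyTorch or JAX the same thing collapses to a handful of tensor ops, which is roughly why the model file stays short while the training and data code does not.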