The dataset is the more challenging part, but here Microsoft can help, since they have Bing and GitHub as well. So they might be able to take a few shortcuts here.
The most time-consuming part is compute, but here again Microsoft has the compute.
Will they beat GPT-4 in a year? My guess is no. But they will come very close to it, and maybe it would not matter that much if you focus on the product.
What I meant is: most likely, assuming you are using PyTorch or JAX, you could code up the model pretty fast. Just compare it to LLaMA: sure, it is far behind GPT-4, but the LLaMA model is under 1000 lines of code and pretty good.
There is tons of work for the training, the infra, preparing the data and so on. That would, I'd guess, add up to millions of lines of code. But the core ideas and the model itself are likely thin, I would argue. So that is my point.
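To make the "thin core" point concrete, here is a rough numpy sketch of a single decoder block, the unit that GPT/LLaMA-style models stack a few dozen times. This is my own toy illustration, not any real model's code: the weight shapes, the ReLU MLP, and the single attention head are simplifications (real models use multi-head attention, RMSNorm, SwiGLU, rotary embeddings, etc.), but the structure fits in well under 50 lines.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def layernorm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_attention(x, Wq, Wk, Wv, Wo):
    # single-head self-attention over a (tokens, dim) sequence
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    # causal mask: token i may only attend to tokens j <= i
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -1e9
    return softmax(scores) @ v @ Wo

def mlp(x, W1, W2):
    # plain ReLU MLP; real models use gated variants like SwiGLU
    return np.maximum(x @ W1, 0.0) @ W2

def decoder_block(x, p):
    # pre-norm residual structure, as in GPT/LLaMA-style decoders
    x = x + causal_attention(layernorm(x), *p["attn"])
    x = x + mlp(layernorm(x), *p["mlp"])
    return x

d = 16  # toy embedding dimension
p = {
    "attn": [rng.normal(0, 0.1, (d, d)) for _ in range(4)],
    "mlp": [rng.normal(0, 0.1, (d, 4 * d)), rng.normal(0, 0.1, (4 * d, d))],
}
x = rng.normal(size=(8, d))   # 8 token embeddings
y = decoder_block(x, p)
print(y.shape)                # (8, 16): same shape, ready for the next block
```

The full model is essentially embeddings, N copies of this block, and an output projection; the millions of lines live around it, in the data and training pipelines.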