It's disappointing that this lacks a clear reference implementation, unless you count what's mixed into the almost-unreleased vLLM nightly build. I'm fine with Open Weights being a form of OSS in the case of models, because frankly I don't believe it is feasible, for large LLMs, to release the training data, all the orchestration machinery, and so forth. But it can't just be: here are the weights, we partnered with vLLM for inference. Come on. Open Weights must mean putting me in a position to easily write an implementation for any hardware.
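To make "easily" concrete, this is roughly the kind of plain reference path I'd expect to exist, a minimal sketch assuming the weights were published in a standard format (the model id below is a placeholder, not the real repo):

    # Hypothetical: plain PyTorch/transformers path, no vLLM nightly required
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "org/open-weights-model"  # placeholder, not the real repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tok("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))

Something like that, running on whatever backend you have, is what "open" should buy you.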
P.S. Even the demo uses a remote server via WebSocket.