>>daniel+(OP)
Is this going to need 1x or 2x of those RTX PRO 6000s to leave decent KV cache headroom at an active context length of 64-100k?
It's one thing running the model with an empty context, but coding agents build context up close to the max, and that slows generation down massively in my experience.
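For a rough sense of the KV budget, here's a back-of-envelope sketch. The model dimensions below (80 layers, 8 KV heads, head_dim 128, fp16) are hypothetical placeholders for a 70B-class GQA model, not the actual model under discussion:

```python
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 seq_len: int, dtype_bytes: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence.

    Per token we store one K and one V vector per layer per KV head,
    hence the leading factor of 2.
    """
    per_token_bytes = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token_bytes * seq_len / 2**30

# Hypothetical 70B-class GQA config at 100k context, fp16 cache:
print(round(kv_cache_gib(80, 8, 128, 100_000), 1))  # → 30.5
```

So on that assumed config, ~30 GiB of cache at 100k context on top of the weights; whether that fits on one 96 GB card depends entirely on the quant and the real model's layer/head counts.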