There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
The Flash version is 284B A13B in mixed FP8 / FP4, and the full native-precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3's. This looks very accessible for people running "large" local models. It's a nice follow-up to the Gemma 4 and Qwen3.5 small local models.
Although Mistral's model card seems to indicate that Devstral 2 doesn't support FIM, it would be very odd if it didn't. I have been meaning to test it.
Qwen Coder 30B A3B is far better than Qwen Coder Next, imo. I may have inference issues, or it may just be a problem with running Coder Next at IQ4_XS versus Q8 for the earlier/smaller model, but I don't find the 80B to be much better at coding, even in instruct mode, and the insane speed and low latency of the smaller model are way more useful. Good one-line completions often arrive in 300 ms.
Even bumping up to a 16-bit K cache should fit comfortably if you drop down to 64K context, which is still a pretty decent amount. I'd try both. I'm not sure how tolerant the Qwen3.5 series is of dropping the K cache to 8 bits.
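For anyone wanting to sanity-check these tradeoffs themselves, here's the usual back-of-envelope KV cache formula. The architecture numbers in the example (layer count, KV heads, head dim) are illustrative assumptions, not any specific model's real config:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    # 2x for separate K and V tensors cached per layer
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical GQA model: 48 layers, 8 KV heads, head_dim 128, 64K context,
# 16-bit (2-byte) cache entries
gb = kv_cache_bytes(48, 8, 128, 64 * 1024, 2) / 2**30
print(f"{gb:.1f} GiB")  # → 12.0 GiB; halves if you quantize the cache to 8 bits
```

Swapping `bytes_per_elem` between 2 and 1 is exactly the 16-bit vs 8-bit cache question, and halving `context_len` halves the total, so you can see quickly which knob buys you more headroom.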
These calculators are almost entirely useless. They don't understand specific model architectures. Even the ones that try to support only specific models (like the apxml one) get it very wrong a lot of the time.
For example, when I point the one you linked at a Qwen3.5 27B Q4_K_M GGUF [0], it says the model will require 338 GB of memory with a 16-bit KV cache. That is off by more than an order of magnitude.
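The sanity check is just params × bits-per-weight / 8. The ~4.85 effective bits/weight figure below is a rough assumption for a Q4_K_M-style mixed quant, not an exact number:

```python
def weights_gib(n_params, bits_per_weight):
    # File size of quantized weights: params * bits / 8, in GiB
    return n_params * bits_per_weight / 8 / 2**30

# 27B params at ~4.85 effective bits/weight (assumed Q4_K_M-ish average)
print(f"{weights_gib(27e9, 4.85):.1f} GiB")  # → 15.2 GiB
```

So the weights alone are around 15 GiB, and even a generous KV cache on top keeps the total in the tens of GB, nowhere near 338 GB.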
Excellent job with this! I tried a few combinations that completely fail on other calculators and yours gets VRAM usage pretty much spot on, and even the performance estimate is in the ballpark to what I see with mixed VRAM / RAM workloads.
It's a shame that search is so polluted these days that it's impossible to find good tools like yours.
But in this case, it's more likely just a tooling issue.