The circle of life

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

You should be able to get very decent performance with 128 GB of VRAM running Qwen 3.6 with something like https://github.com/itigges22/ATLAS, especially if you run MTP (multi-token prediction): https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF

A friend of mine gets something like 50 tokens a second with it, and output quality is quite decent.

dragnucs@lemmy.ml · 3 days ago

How does it compare to the largest DeepSeek and Claude Opus 4.6? I got used to blazing-fast speed and accurate results. I'm not buying a server and 128 GB of RAM just to run a model similar to GPT-4.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 days ago

ATLAS has some benchmarks in the repo, and it's comparable to Opus 4.6. You don't even need 128 GB for that: an 8-bit quantized model will run in around 32 GB and still perform quite well.
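As a rough sanity check on the 32 GB figure, weight storage for a dense model is about parameters × bits per weight / 8 bytes. The sketch below assumes 27 billion parameters (read off the "27B" in the linked GGUF repo name) and deliberately ignores KV cache, activations and runtime overhead, which is presumably where the remaining few GB would go.

```python
# Rough weight-memory estimate for a dense model: params * bits / 8 bytes.
# Assumption: 27e9 parameters, taken from the "27B" in the GGUF repo name.
# KV cache, activations and runtime overhead are NOT included.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS = 27e9  # assumed parameter count

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(PARAMS, bits):.1f} GB of weights")
```

This prints roughly 54 GB at 16-bit, 27 GB at 8-bit and 13.5 GB at 4-bit. Real quantization formats also store per-block scale data, so actual file sizes run somewhat above these idealized numbers.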