Set your hardware constraints to get a ranked list of open-weight LLMs, each shown at the highest-quality quant that fits your RAM and meets your speed floor. Models that don't fit are listed below with the reason.

Hardware

Device

Memory bandwidth caps token generation speed.
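As a rough illustration of why bandwidth is the ceiling: during generation, a dense model has to read roughly all of its weights (plus the KV cache) from memory for every token, so speed cannot exceed bandwidth divided by bytes read per token. The sketch below assumes that simple model. The function name and parameters are hypothetical and are not taken from LM Calc itself.

```python
def max_decode_tok_per_s(bandwidth_gb_s: float,
                         weights_gb: float,
                         kv_cache_gb: float = 0.0) -> float:
    """Estimate an upper bound on decode speed for a dense model.

    Assumption: each generated token streams all weights and the whole
    KV cache from memory once, and nothing else limits throughput.
    Real speeds are lower.
    """
    gb_read_per_token = weights_gb + kv_cache_gb
    if gb_read_per_token <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Example: 400 GB/s of bandwidth and 40 GB of weights
# gives a ceiling of 400 / 40 tokens per second.
print(max_decode_tok_per_s(400.0, 40.0))
```

Mixture-of-experts models read only their active parameters per token, so this bound would need the active size rather than the full size.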

Memory · GB

RAM available for the model and KV cache.
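A back-of-the-envelope sketch of how that memory might be split, assuming a standard transformer: weights take parameters times bits per weight, and the KV cache stores one key and one value vector per layer, per KV head, per token. The function names and the example model shape are hypothetical, and runtime overhead (activations, buffers) is ignored.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bits_per_elem: float = 16) -> float:
    """Approximate KV cache size in GB for a given context length.

    The factor 2 accounts for storing both keys and values.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return elems * bits_per_elem / 8 / 1e9


# Hypothetical 8B-parameter model: 32 layers, 8 KV heads, head dim 128.
w = weights_gb(8, 4)                      # 4-bit weights
kv = kv_cache_gb(32, 8, 128, 8192, 16)    # 8192-token context, 16-bit KV
print(f"weights {w:.2f} GB + KV cache {kv:.2f} GB = {w + kv:.2f} GB")
```

Because the KV cache grows linearly with context length, a long context can push a model that otherwise fits over the memory limit.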

Workload

Context length · tokens

Total tokens held in context (input + output).

Min speed · tok/s

Slower models are filtered out.

Weight quant

Picks the highest-quality weight quant that fits.
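One way such a selection could work is to walk the candidate quants from highest to lowest quality and stop at the first that fits in memory and meets the speed floor. This is a minimal sketch under that assumption, not LM Calc's actual logic. The quant labels, the bit widths, and `pick_quant` are hypothetical, and speed is estimated as bandwidth divided by total bytes read per token.

```python
from typing import Optional

# Hypothetical candidates, ordered from highest to lowest quality.
QUANTS = [("16-bit", 16.0), ("8-bit", 8.0), ("4-bit", 4.0)]


def pick_quant(params_billion: float, ram_gb: float,
               bandwidth_gb_s: float, min_tok_s: float,
               kv_cache_gb: float = 0.0) -> Optional[str]:
    """Return the first (highest-quality) quant that fits RAM and is
    fast enough, or None if the model is filtered out entirely."""
    for name, bits in QUANTS:
        size_gb = params_billion * bits / 8 + kv_cache_gb
        if size_gb > ram_gb:
            continue  # does not fit in memory
        if bandwidth_gb_s / size_gb < min_tok_s:
            continue  # fits, but estimated speed is below the floor
        return name
    return None


# An 8B model on a 12 GB device with 400 GB/s and a 20 tok/s floor.
print(pick_quant(8, 12, 400, 20, kv_cache_gb=1.0))
```

Returning `None` corresponds to a model that ends up in the filtered-out list, either because no quant fits or because none is fast enough.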

KV cache quant

Picks the highest-quality KV quant that fits.

Models

Developers:

40 models meet your constraints

LM Calc

31 filtered out