Hardware
Memory bandwidth caps token generation speed.
RAM available for the model and KV cache.
Workload
Tokens in scope (in + out)
10
Slower models are filtered out.
Picks the highest-quality weight quant that fits.
Picks the highest-quality KV quant that fits.
Models
Developers:
40 models meet your constraints