willitrunit.com
Local-LLM hardware matcher · data ~Jun 2026
~/llm will-it-run-it
Will it run it?
Pick your GPU or type your VRAM, and instantly see which open AI models your machine can actually run — how fast they'll generate, the best quantization, and a one-click Hugging Face download for each.
Physics engine: VRAM · KV-cache · bandwidth
Your hardware
Select GPU
— choose a GPU —
— OR —
Test any model
Filters
Minimum context
Any 2K 4K 8K 16K 32K 64K 128K 200K
Minimum speed (tokens/sec)
Any 5 10 20 30 50 100
↺ Reset filters
Results
Select a GPU or enter your VRAM to begin.
How it works. Memory need = quantized weights + KV cache (sized to your chosen context) + a runtime buffer.
A model runs fully on GPU if it all fits in VRAM, or with offload if it fits in VRAM + system RAM.
Speed ≈ memory bandwidth ÷ bytes read per token (for MoE models, only the active parameters are read). All numbers are practical estimates.
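The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual script: the names (`Model`, `memory_need`, `placement`, `tokens_per_sec`), the 1 GiB runtime buffer, the 16-bit KV cache, and the example model's shape are all assumptions. The KV-cache term uses the common estimate for transformer models: two tensors (K and V) per layer, each sized KV heads × head dimension × context length.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class Model:
    total_params: float   # all parameters (stored in memory)
    active_params: float  # parameters read per token (equals total for dense models)
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in bytes."""
    return params * bits_per_weight / 8


def kv_cache_bytes(m: Model, context_tokens: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors for every layer, sized to the chosen context."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_tokens * bytes_per_elem


def memory_need(m: Model, bits: float, context_tokens: int,
                buffer_bytes: int = 1 * GIB) -> float:
    """Quantized weights + KV cache + runtime buffer (buffer size is assumed)."""
    return (weight_bytes(m.total_params, bits)
            + kv_cache_bytes(m, context_tokens)
            + buffer_bytes)


def placement(need: float, vram_bytes: float, ram_bytes: float) -> str:
    """'gpu' if everything fits in VRAM, 'offload' if VRAM + RAM suffices, else 'no'."""
    if need <= vram_bytes:
        return "gpu"
    if need <= vram_bytes + ram_bytes:
        return "offload"
    return "no"


def tokens_per_sec(m: Model, bits: float, bandwidth_bytes_per_s: float) -> float:
    """Speed estimate: memory bandwidth divided by bytes read per token."""
    return bandwidth_bytes_per_s / weight_bytes(m.active_params, bits)


# Hypothetical 8B dense model at about 4.5 bits per weight, 8K context.
example = Model(total_params=8e9, active_params=8e9,
                n_layers=32, n_kv_heads=8, head_dim=128)
need = memory_need(example, bits=4.5, context_tokens=8192)

print(f"need: {need / GIB:.2f} GiB")
print("8 GiB VRAM:", placement(need, 8 * GIB, 16 * GIB))
print("4 GiB VRAM:", placement(need, 4 * GIB, 16 * GIB))
print(f"speed at 400 GB/s: {tokens_per_sec(example, 4.5, 400e9):.1f} tok/s")
```

For a mixture-of-experts model, `total_params` still drives the memory need while the smaller `active_params` drives the speed estimate, which is why such models can be large in memory yet generate quickly.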
♥ Support this tool
Single static file — host it anywhere. Model & GPU data live at the top of the script; add your own freely.
All models on Hugging Face ↗