Open-source tool ranks local LLMs by benchmark performance, not just hardware fit
Developer Andyyyy64 releases GitHub utility that auto-detects system specs and scores models on recency and real-world efficiency rather than parameter count alone.
A new open-source utility released on GitHub by developer Andyyyy64 aims to resolve a common bottleneck in local large language model deployment: determining which model offers the best performance on specific hardware. Named 'whichllm', the tool moves beyond simple compatibility checks to rank models on real, recency-aware benchmarks, so that efficiency and current-generation quality take precedence over raw parameter count.
The software auto-detects GPU, CPU, and RAM specifications to identify models from Hugging Face that fit the user's system. It then ranks the models that fit by benchmark performance rather than by size. For instance, the tool may rank a 27 billion parameter model above a 32 billion parameter model if the former scores better on real benchmarks and belongs to a newer generation, a nuance often missed by size-only compatibility tools.
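The filter-then-rank idea described above can be illustrated with a minimal Python sketch. It is not whichllm's code: the `Candidate` class, the model names, and the memory and quality figures are all hypothetical, chosen only to show a smaller model outranking a larger one once both fit.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical model name
    params_b: float    # total parameters, in billions
    vram_gb: float     # assumed memory needed to load the model
    quality: float     # assumed 0-100 benchmark-derived score


def rank_by_quality(candidates, available_vram_gb):
    """Keep only models that fit in memory, then sort by quality, best first."""
    fitting = [c for c in candidates if c.vram_gb <= available_vram_gb]
    return sorted(fitting, key=lambda c: c.quality, reverse=True)


# Illustrative data only; none of these figures come from the tool.
candidates = [
    Candidate("older-32b", 32, 20.0, 61.0),
    Candidate("newer-27b", 27, 17.0, 68.0),
    Candidate("huge-70b", 70, 42.0, 75.0),
]

ranked = rank_by_quality(candidates, available_vram_gb=24.0)
print([c.name for c in ranked])  # ['newer-27b', 'older-32b']
```

A size-only tool would place the 32 billion parameter model first. Here it drops to second because ranking happens on the score after the fit check, and the 70 billion parameter model is excluded for not fitting at all.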
Models are scored on a 0-100 quality scale using live data from sources such as LiveBench and Artificial Analysis. The benchmark data is split into two tiers. A 'current tier' combines LiveBench, the Artificial Analysis Index, and Aider, and is merged live when those sources are reachable. A 'frozen tier' includes the Open LLM Leaderboard v2 and Chatbot Arena ELO. This dual-tier approach helps ensure that newer or more efficient models are prioritised over larger models from older generations.
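One way to picture the two tiers is as a weighted blend that falls back to whichever tier has data. The sketch below is an assumption throughout: the article does not state how whichllm weights or merges its sources, and the function name `blend_quality` and the 0.75 weight are invented for illustration.

```python
def blend_quality(current_scores, frozen_scores, current_weight=0.75):
    """Blend two tiers of 0-100 scores into one 0-100 score.

    A score of None stands for a source that was unreachable or had no entry.
    The weighting scheme is hypothetical, not the tool's actual formula.
    """
    def mean(values):
        return sum(values) / len(values)

    current = [s for s in current_scores if s is not None]
    frozen = [s for s in frozen_scores if s is not None]

    if current and frozen:
        return current_weight * mean(current) + (1 - current_weight) * mean(frozen)
    if current:
        return mean(current)   # only live sources available
    if frozen:
        return mean(frozen)    # live sources unreachable, use frozen tier
    return None                # no benchmark evidence at all


# Two live sources reachable, one not, plus one frozen-tier score.
score = blend_quality([70.0, None, 60.0], [50.0])
```

Weighting the current tier more heavily is what would let a recent model with strong live results outrank an older model whose best numbers sit on a frozen leaderboard.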
To prevent stale leaderboards from over-rewarding older model generations, the tool applies lineage-aware recency demotion. It uses five resolution levels for benchmark evidence and discounts older data progressively. Additionally, the system rejects inherited scores when a model's parameter count differs by more than a factor of two from that of its family's dominant member, aiming to catch draft, MTP, or abliterated forks that share a family ID with a much larger base.
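The parameter-divergence rule lends itself to a short sketch. The version below assumes that "more than twice" means a ratio above 2 in either direction. The function name and the `max_ratio` parameter are hypothetical and not taken from the tool.

```python
def can_inherit_score(model_params_b, dominant_params_b, max_ratio=2.0):
    """Return True if a model may inherit its family's benchmark score.

    Assumption: inheritance is rejected when the parameter counts differ
    by more than max_ratio in either direction.
    """
    if model_params_b <= 0 or dominant_params_b <= 0:
        return False
    larger = max(model_params_b, dominant_params_b)
    smaller = min(model_params_b, dominant_params_b)
    return larger / smaller <= max_ratio


# A 1.5B draft model sharing a family ID with a 32B base is rejected;
# a 27B sibling of the same base is accepted.
draft_ok = can_inherit_score(1.5, 32.0)
sibling_ok = can_inherit_score(27.0, 32.0)
```

Under this reading, a small draft or pruned fork cannot borrow the benchmark results of a much larger base model simply because both carry the same family ID.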
The tool can also create isolated environments using 'uv' so that users can try a model without manual installation. It installs dependencies, downloads the model, and starts an interactive chat session. Speed is ranked on active parameters, while quality is ranked on total parameters, a distinction that matters most for Mixture of Experts (MoE) models. The project also offers a snapshot of top picks dated May 2026, with rankings that track live Hugging Face data, and users can simulate hardware configurations before purchase.
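Why speed follows active parameters can be shown with a common rule of thumb: token generation is often limited by memory bandwidth, so throughput scales roughly with bandwidth divided by the bytes of weights read per token. The sketch below uses that approximation. It is not whichllm's estimator, and the bandwidth, quantisation, and model sizes are illustrative assumptions.

```python
def estimate_tokens_per_second(active_params_b, bytes_per_param, bandwidth_gb_s):
    """Rough decode-speed estimate for a memory-bandwidth-bound model.

    active_params_b: parameters used per token, in billions
    bytes_per_param: e.g. 0.5 for 4-bit quantised weights (assumed)
    bandwidth_gb_s:  memory bandwidth in GB/s (assumed)
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical dense model: all 32B parameters are active for every token.
dense_tps = estimate_tokens_per_second(32.0, 0.5, 1000.0)

# Hypothetical MoE model: 30B total parameters, but only 3B active per token.
moe_tps = estimate_tokens_per_second(3.0, 0.5, 1000.0)
```

Under these assumptions the MoE model generates tokens roughly ten times faster than the dense model despite a similar total size. It still needs memory for all 30 billion parameters, which is why fit and quality are judged on the total count.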


