Tech

Open-source tool ranks local LLMs by benchmark performance, not just hardware fit

Developer Andyyyy64 releases GitHub utility that auto-detects system specs and scores models on recency and real-world efficiency rather than parameter count alone.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News
Whichllm uses live benchmarks to rank models, challenging size-centric compatibility checks

A new open-source utility released on GitHub by developer Andyyyy64 aims to resolve a common bottleneck in local large language model (LLM) deployment: determining which model performs best on a given machine. Called 'whichllm', the tool goes beyond simple compatibility checks and ranks models using recency-aware benchmark data, so that benchmark quality and model generation count for more than raw parameter count.

The software auto-detects GPU, CPU, and RAM specifications to identify models from HuggingFace that fit the user's system. However, it distinguishes itself by ranking models based on performance metrics rather than just size. For instance, the tool may rank a 27 billion parameter model higher than a 32 billion parameter model if the former scores better on real benchmarks and represents a newer generation, a nuance often missed by size-only compatibility tools.
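The fit-then-rank idea can be illustrated with a minimal Python sketch. This is not whichllm's code: the `Candidate` fields, the memory figures, and the quality scores are all invented for illustration, and the only point is that the sort key is a benchmark-derived score rather than parameter count.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str         # hypothetical model label
    params_b: float   # total parameters, in billions
    mem_gb: float     # assumed memory needed to load the model
    quality: float    # assumed benchmark-derived score on a 0-100 scale


def rank_for_hardware(candidates, available_gb):
    """Keep the models that fit in memory, then order them by quality score."""
    fitting = [c for c in candidates if c.mem_gb <= available_gb]
    return sorted(fitting, key=lambda c: c.quality, reverse=True)


# Illustrative numbers only; none of these come from a real leaderboard.
candidates = [
    Candidate("older-32b", 32, 20.0, 61.0),
    Candidate("newer-27b", 27, 17.0, 68.0),
    Candidate("huge-70b", 70, 42.0, 74.0),
]

ranked = rank_for_hardware(candidates, available_gb=24.0)
print([c.name for c in ranked])  # ['newer-27b', 'older-32b']
```

With 24 GB available, the 70B entry is dropped for not fitting, and the 27B entry outranks the 32B one because its score is higher, which mirrors the example in the paragraph above.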

Each model receives a quality score on a 0-100 scale, drawn from sources such as LiveBench and Artificial Analysis. The scoring combines a 'current tier' (LiveBench, the Artificial Analysis Index, and Aider, merged live when the sources are reachable) with a 'frozen tier' (the Open LLM Leaderboard v2 and Chatbot Arena Elo). The two-tier approach is intended to keep newer or more efficient models from being outranked by larger, older generations.
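One way to picture a two-tier merge is sketched below in Python. The article does not say how whichllm weights or normalises its sources, so the 0.7 weight, the averaging, the dictionary keys, and the fallback order are all assumptions made for illustration. A value of `None` stands for a source that could not be reached.

```python
from statistics import mean
from typing import Optional

CURRENT_WEIGHT = 0.7  # assumed weight for the current tier; not taken from whichllm


def combine_tiers(current: dict[str, Optional[float]],
                  frozen: dict[str, Optional[float]]) -> Optional[float]:
    """Blend two tiers of 0-100 scores, skipping unreachable (None) sources."""
    cur = [v for v in current.values() if v is not None]
    fro = [v for v in frozen.values() if v is not None]
    if cur and fro:
        return CURRENT_WEIGHT * mean(cur) + (1 - CURRENT_WEIGHT) * mean(fro)
    if cur:
        return mean(cur)   # only live sources answered
    if fro:
        return mean(fro)   # fall back to the frozen tier alone
    return None            # no evidence at all


# Hypothetical scores, assumed to be normalised to 0-100 already.
score = combine_tiers(
    current={"livebench": 70.0, "aa_index": None, "aider": 60.0},
    frozen={"open_llm_v2": 50.0, "arena_elo": None},
)
print(score)
```

The design point is that an unreachable live source lowers the amount of evidence rather than the score itself, and the frozen tier acts as a fallback rather than as an equal partner.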

To prevent stale leaderboards from over-rewarding older model generations, the tool applies lineage-aware recency demotion. It uses five resolution levels for benchmark evidence, with older data discounted progressively. The system also rejects inherited benchmark claims when a model's parameter count differs by more than a factor of two from that of its family's dominant member, a check aimed at draft, MTP (multi-token prediction), or abliterated forks that share a family ID with a much larger base.
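The factor-of-two inheritance check can be sketched in a few lines of Python. The function name, the symmetric treatment of larger and smaller forks, and the inclusive boundary at exactly 2x are assumptions; the article only states that inheritance is rejected beyond a twofold divergence.

```python
def can_inherit_score(model_params_b: float, dominant_params_b: float,
                      max_ratio: float = 2.0) -> bool:
    """Return True if a model is close enough in size to its family's
    dominant member to borrow that member's benchmark evidence."""
    if model_params_b <= 0 or dominant_params_b <= 0:
        raise ValueError("parameter counts must be positive")
    larger = max(model_params_b, dominant_params_b)
    smaller = min(model_params_b, dominant_params_b)
    return larger / smaller <= max_ratio


print(can_inherit_score(27.0, 32.0))  # similar size: True
print(can_inherit_score(0.6, 32.0))   # small draft fork of a 32B base: False
```

Under this sketch, a 0.6B draft model sharing a family ID with a 32B base would not be credited with the base model's scores.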

The tool can also create isolated environments using 'uv' so that users can try a model without manual installation: it installs dependencies, downloads the model, and starts an interactive chat session. For Mixture of Experts (MoE) models, speed is ranked on active parameters while quality is ranked on total parameters. A snapshot of top picks dated May 2026 tracks live HuggingFace data, and users can simulate hardware configurations before purchase.
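The split between active and total parameters can be made concrete with a rough Python sketch. It uses a common rule of thumb for memory-bound decoding (tokens per second is roughly memory bandwidth divided by the bytes read per token), which is an assumption here and not a description of whichllm's own estimator. The model specs, the 4-bit quantisation (0.5 bytes per parameter), and the 1000 GB/s bandwidth figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str               # hypothetical model label
    total_params_b: float   # all parameters, in billions
    active_params_b: float  # parameters used per token, in billions
    bytes_per_param: float  # e.g. 0.5 for 4-bit quantisation


def weights_gb(m: ModelSpec) -> float:
    """Memory footprint depends on total parameters: every expert is loaded."""
    return m.total_params_b * m.bytes_per_param


def est_tokens_per_sec(m: ModelSpec, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed: only active weights are read per token."""
    return bandwidth_gb_s / (m.active_params_b * m.bytes_per_param)


dense = ModelSpec("dense-32b", 32.0, 32.0, 0.5)
moe = ModelSpec("moe-30b-a3b", 30.0, 3.0, 0.5)

for m in (dense, moe):
    print(m.name, weights_gb(m), round(est_tokens_per_sec(m, 1000.0), 1))
```

In this sketch the two models need a similar amount of memory, but the MoE model's estimated decode speed is about ten times higher because only a tenth of its weights are read for each token.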
