Tech

Open-source tool ranks local LLMs by benchmark performance, not just hardware fit

Developer Andyyyy64 releases GitHub utility that auto-detects system specs and scores models on recency and real-world efficiency rather than parameter count alone.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: Hacker News

Artificial Intelligence Research


Whichllm uses live benchmarks to rank models, challenging size-centric compatibility checks

A new open-source utility released on GitHub by developer Andyyyy64 aims to resolve a common bottleneck in local large language model (LLM) deployment: determining which model performs best on a given machine. Called 'whichllm', the tool goes beyond simple compatibility checks and ranks models using recency-aware benchmark data, so that efficiency and current-generation quality take precedence over raw parameter count.

The software auto-detects GPU, CPU, and RAM specifications to identify models from Hugging Face that fit the user's system. It then ranks those models on benchmark performance rather than size alone. For instance, the tool may rank a 27-billion-parameter model above a 32-billion-parameter model if the former scores better on benchmarks and belongs to a newer generation, a nuance that size-only compatibility tools often miss.
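
The fit-then-rank idea described above can be illustrated with a minimal Python sketch. It is not whichllm's code: the `Model` class, the memory estimate (about 0.5 bytes per parameter, roughly 4-bit quantisation, plus a fixed overhead), the model names, and the scores are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    params_b: float  # total parameters, in billions
    quality: float   # hypothetical 0-100 benchmark score


def estimated_memory_gb(model, bytes_per_param=0.5, overhead_gb=2.0):
    # Assumption: 1B parameters at 0.5 bytes each is about 0.5 GB,
    # plus a fixed allowance for context and runtime overhead.
    return model.params_b * bytes_per_param + overhead_gb


def rank_models(models, available_gb):
    # Step 1: keep only models that fit in the available memory.
    fitting = [m for m in models if estimated_memory_gb(m) <= available_gb]
    # Step 2: order the survivors by benchmark score, not by size.
    return sorted(fitting, key=lambda m: m.quality, reverse=True)


# Made-up candidates: the 27B model is smaller but scores higher.
candidates = [
    Model("older-32b", 32, 61.0),
    Model("newer-27b", 27, 68.0),
    Model("huge-70b", 70, 75.0),
]
ranked = rank_models(candidates, available_gb=24)
for m in ranked:
    print(m.name, m.quality)
```

With 24 GB available, the 70B model is filtered out for not fitting, and the 27B model is listed ahead of the 32B model because its score is higher.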

Models are scored on a 0-100 quality scale built from benchmark data. The scoring combines two tiers: a 'current tier' of LiveBench, the Artificial Analysis Index, and Aider, which are merged live when reachable, and a 'frozen tier' comprising the Open LLM Leaderboard v2 and Chatbot Arena Elo ratings. This two-tier approach is intended to prioritise newer or more efficient models over larger, older generations.
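
One way to picture a two-tier merge is the Python sketch below. It makes several assumptions that the article does not confirm: that every source has already been normalised to the 0-100 scale, that reachable current-tier sources are simply averaged, and that frozen-tier scores are used only as a fallback with a hypothetical discount of 0.8. The function name and dictionary keys are illustrative.

```python
def merge_quality(current, frozen, frozen_discount=0.8):
    """Combine benchmark tiers into one 0-100 score (illustrative only).

    current: source name -> score, or None if the source was unreachable
    frozen:  source name -> score from archived leaderboards
    """
    # Use only the live sources that actually returned a score.
    live = [s for s in current.values() if s is not None]
    if live:
        return sum(live) / len(live)
    # Fall back to frozen data, discounted because it may be stale.
    old = list(frozen.values())
    if old:
        return frozen_discount * sum(old) / len(old)
    return None  # no evidence at all


score = merge_quality(
    {"livebench": 70.0, "aa_index": None, "aider": 60.0},
    {"open_llm_v2": 50.0},
)
print(score)
```

The point of the fallback is that a model with no live results is still scored, but cannot outrank a comparable model on the strength of old leaderboard data alone.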

To prevent stale leaderboards from over-rewarding older model generations, the tool applies lineage-aware recency demotion. It uses five resolution levels for benchmark evidence and discounts older data progressively. The system also rejects inheritance claims when a model's parameter count differs from that of its family's dominant member by more than a factor of two, aiming to catch draft, MTP, or abliterated forks that share a family ID with a much larger base.
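
The factor-of-two rule can be expressed as a short Python check. This is a sketch under stated assumptions, not the project's implementation: it assumes "inheritance" means a fork borrowing its family's benchmark results, and the function name `can_inherit` and the `max_ratio` parameter are hypothetical.

```python
def can_inherit(model_params_b, family_params_b, max_ratio=2.0):
    """Return True if a model is close enough in size to its family's
    dominant member to plausibly share that member's benchmark results."""
    if model_params_b <= 0 or family_params_b <= 0:
        return False
    # Compare in both directions, so a much smaller or a much larger
    # fork is rejected alike.
    ratio = max(model_params_b, family_params_b) / min(model_params_b, family_params_b)
    return ratio <= max_ratio


print(can_inherit(27, 32))   # similar size
print(can_inherit(1.5, 32))  # small draft-style fork of a 32B base
```

Under this check, a 1.5-billion-parameter draft model tagged with the family ID of a 32-billion-parameter base would not be credited with the base model's scores.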

The tool can also create an isolated environment using 'uv' to try a model without manual installation: it installs dependencies, downloads the model, and starts an interactive chat session. For ranking, speed is estimated from active parameters and quality from total parameters, a distinction that matters for Mixture of Experts (MoE) models, which activate only a subset of their parameters for each token. A snapshot of top picks dated May 2026 tracks live Hugging Face data, and the tool lets users simulate hardware configurations before purchase.
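
The split between active and total parameters can be shown with a rough Python estimate. It relies on a common rule of thumb rather than anything confirmed about whichllm: token generation is treated as memory-bandwidth bound, so speed scales with the bytes read per token (active parameters), while the memory needed to load the model scales with total parameters. The function names, the 0.5 bytes per parameter, and the 400 GB/s bandwidth figure are hypothetical.

```python
def estimated_tokens_per_second(active_params_b, bandwidth_gb_s, bytes_per_param=0.5):
    # Rule of thumb: each generated token reads every active weight once.
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


def weights_footprint_gb(total_params_b, bytes_per_param=0.5):
    # All weights, including inactive experts, must be held in memory.
    return total_params_b * bytes_per_param


# A dense 32B model versus a hypothetical MoE with 30B total, 3B active.
dense_speed = estimated_tokens_per_second(32, bandwidth_gb_s=400)
moe_speed = estimated_tokens_per_second(3, bandwidth_gb_s=400)
print(round(dense_speed, 1), round(moe_speed, 1))
print(weights_footprint_gb(32), weights_footprint_gb(30))
```

In this estimate the two models need a similar amount of memory, yet the MoE model generates tokens far faster, which is why a single parameter count cannot stand in for both speed and quality.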
