Scores & Benchmarks

Three scores. One opinion.

Every device in the registry is graded on three orthogonal axes. The Overall Score is a weighted blend (SCS 0.45 · ARS 0.35 · TLS 0.20). Weights are versioned and shown on each device page.

SCS

Spark Capability Score

Raw AI + compute capability per device. Weighted toward tokens/sec on representative open models.

  • AI throughput (tok/s, Q4 70B + Q5 13B)50%
  • CPU build & compile (Clang, Rust LTO)30%
  • GPU rendering & RTX features (DLSS, RT)20%
ARS

Agent Readiness Score

How well a device runs as a 24/7 local agent host: thermals, stability, sleep/resume, battery under mixed load.

  • 72-hour Agent Stress Suite (crash-free)40%
  • Sustained thermals & throttle headroom25%
  • Battery under mixed agent load20%
  • Wake/sleep with resident agents15%
TLS

Trust & Lifecycle Score

Governance and longevity. Driver cadence, OS compatibility, security posture, OEM track record.

  • Firmware & driver update cadence30%
  • Agent stack compatibility (OpenShell, llama.cpp, vLLM)25%
  • Security posture (TPM, secure boot, telemetry)25%
  • OEM long-term support history20%

Agent Stress Suite v1

A 72-hour mixed-agent workload that we run on every device under test. Anything that crashes, throttles, or refuses to resume is penalized.

Agent A — Coder
Qwen3-Coder-32B + Aider
Hourly refactor loop on a 250k LOC monorepo.
Agent B — Researcher
Llama-3.1-70B + Letta
Continuous web research with a 32k context.
Agent C — Meeting
Qwen3-7B + Whisper-large-v3
Live transcription + summary every 15 min.