Scores & Benchmarks

Three scores. One opinion.

Every device in the registry is graded on three orthogonal axes. The Overall Score is a weighted blend (SCS 0.45 · ARS 0.35 · TLS 0.20). Weights are versioned and shown on each device page.

SCS

Spark Capability Score

Raw AI + compute capability per device. Weighted toward tokens/sec on representative open models.

AI throughput (tok/s, Q4 70B + Q5 13B)50%
CPU build & compile (Clang, Rust LTO)30%
GPU rendering & RTX features (DLSS, RT)20%

ARS

Agent Readiness Score

How well a device runs as a 24/7 local agent host: thermals, stability, sleep/resume, battery under mixed load.

72-hour Agent Stress Suite (crash-free)40%
Sustained thermals & throttle headroom25%
Battery under mixed agent load20%
Wake/sleep with resident agents15%

TLS

Trust & Lifecycle Score

Governance and longevity. Driver cadence, OS compatibility, security posture, OEM track record.

Firmware & driver update cadence30%
Agent stack compatibility (OpenShell, llama.cpp, vLLM)25%
Security posture (TPM, secure boot, telemetry)25%
OEM long-term support history20%

Agent Stress Suite v1

A 72-hour mixed-agent workload that we run on every device under test. Anything that crashes, throttles, or refuses to resume is penalized.

Agent A — Coder

Qwen3-Coder-32B + Aider

Hourly refactor loop on a 250k LOC monorepo.

Agent B — Researcher

Llama-3.1-70B + Letta

Continuous web research with a 32k context.

Agent C — Meeting

Qwen3-7B + Whisper-large-v3

Live transcription + summary every 15 min.