Clusters & Scaling

From one laptop to a mini-DGX.

Three validated topologies for scaling RTX Spark into a multi-node agent host. Each blueprint covers networking, orchestration, and the models you can realistically serve.

2 nodes · 128GB effective

2-node 64GB Desktop Pair

Lowest-cost path to 128GB of effective memory with redundancy. Ideal for a small team running shared agents.

Interconnect: 10GbE point-to-point
Orchestration: vLLM tensor parallel
Per-node memory: 64GB unified
Total memory: 128GB

Expected throughput

Llama-3.1-70B Q4: 28 tok/s
Mixtral 8x22B Q3: 14 tok/s

┌─────────┐   10GbE   ┌─────────┐
│ Spark 1 │ ───────── │ Spark 2 │
│ 64 GB   │           │ 64 GB   │
└─────────┘           └─────────┘
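A rough weight-memory estimate is a quick way to sanity-check which models a topology can realistically serve. The sketch below assumes that weights dominate memory, that Q4 and Q3 quantizations cost about 4.5 and 3.5 effective bits per weight once scales are included, and that tensor parallelism splits weights evenly across nodes. It also reserves a hypothetical 16GB per node for the OS, activations, and KV cache. Parameter counts are approximate public figures (about 70.6B for Llama-3.1-70B, about 141B total for Mixtral 8x22B).

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    params_billions: float  # total parameters, in billions (approximate)
    bits_per_weight: float  # effective bits incl. quantization scales (assumed)


def weight_gb(model: ModelSpec) -> float:
    """Approximate weight footprint in decimal GB: params * bits / 8."""
    return model.params_billions * model.bits_per_weight / 8


def fits_tensor_parallel(model: ModelSpec, nodes: int, mem_per_node_gb: float,
                         reserve_gb: float = 16.0) -> bool:
    """True if an even tensor-parallel shard fits in each node's usable memory.

    reserve_gb is a hypothetical allowance for the OS, activations and KV cache.
    """
    shard_gb = weight_gb(model) / nodes
    return shard_gb <= mem_per_node_gb - reserve_gb


if __name__ == "__main__":
    llama70 = ModelSpec("Llama-3.1-70B Q4", 70.6, 4.5)
    mixtral = ModelSpec("Mixtral 8x22B Q3", 141.0, 3.5)
    for m in (llama70, mixtral):
        print(f"{m.name}: {weight_gb(m):.1f} GB total, "
              f"{weight_gb(m) / 2:.1f} GB per node, "
              f"fits on 2x64GB: {fits_tensor_parallel(m, 2, 64)}")
```

Under these assumptions Llama-3.1-70B Q4 needs roughly 40GB of weights (about 20GB per node) and Mixtral 8x22B Q3 roughly 62GB (about 31GB per node), so both fit, though Mixtral leaves noticeably less room for KV cache.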

4 nodes · 128GB effective

4-node 32GB Mini-PC Rack

Best cost per concurrent agent. Run a small-model replica on each node to host 30+ parallel agents.

Interconnect: 25GbE switch
Orchestration: Ray Serve + vLLM
Per-node memory: 32GB unified
Total memory: 128GB

Expected throughput

Qwen3-Coder-32B Q4 ×4: 120 tok/s
Llama-3.1-70B Q4 (TP=4): 22 tok/s

┌─────────── 25GbE switch ───────────┐
└──┬─────────┬─────────┬─────────┬───┘
┌──┴───┐  ┌──┴───┐  ┌──┴───┐  ┌──┴───┐
│Node 1│  │Node 2│  │Node 3│  │Node 4│
│ 32GB │  │ 32GB │  │ 32GB │  │ 32GB │
└──────┘  └──────┘  └──────┘  └──────┘
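Hosting 30+ agents on four replicas mostly comes down to spreading requests so that no node queues while another sits idle. In this blueprint Ray Serve does the routing. The sketch below is neither Ray Serve's API nor its routing policy. It is only a minimal stdlib illustration of least-outstanding-requests dispatch, using hypothetical replica names node1 to node4.

```python
import threading
from contextlib import contextmanager


class LeastLoadedDispatcher:
    """Route each request to the replica with the fewest in-flight requests."""

    def __init__(self, replicas: list[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self._inflight = {r: 0 for r in replicas}
        self._lock = threading.Lock()

    def acquire(self) -> str:
        with self._lock:
            # min() returns the first replica on ties, so order is deterministic.
            replica = min(self._inflight, key=self._inflight.__getitem__)
            self._inflight[replica] += 1
            return replica

    def release(self, replica: str) -> None:
        with self._lock:
            self._inflight[replica] -= 1

    @contextmanager
    def slot(self):
        """Hold a replica for the duration of one agent request."""
        replica = self.acquire()
        try:
            yield replica
        finally:
            self.release(replica)

    def load(self) -> dict[str, int]:
        with self._lock:
            return dict(self._inflight)


if __name__ == "__main__":
    dispatcher = LeastLoadedDispatcher(["node1", "node2", "node3", "node4"])
    held = [dispatcher.acquire() for _ in range(32)]  # 32 concurrent agents
    print(dispatcher.load())  # 8 in-flight requests per replica
```

Least-outstanding-requests adapts to uneven prompt lengths better than plain round-robin: a replica busy with a long generation stops receiving new work until it catches up.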

3 nodes · 288GB effective

1×128GB Laptop + 2 Headless Nodes

Travel-ready orchestrator that scales when docked. Headless nodes wake on demand.
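The blueprint does not say how the headless nodes are woken. One common mechanism, assumed here, is Wake-on-LAN over the 10GbE link, which requires WoL support in the nodes' NICs and firmware. A WoL "magic packet" is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast to port 9. The MAC and broadcast addresses used below are placeholders.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(digits)
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC for "Node A"; only builds the packet, does not send it.
    print(len(magic_packet("aa:bb:cc:dd:ee:01")))  # 102 bytes
```

On the docked laptop, a call such as `wake("aa:bb:cc:dd:ee:01", broadcast="192.168.10.255")` (placeholder values) would send the packet on the 10GbE subnet.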

Interconnect: Thunderbolt 5 + 10GbE
Orchestration: Custom router on laptop
Per-node memory: 96GB unified
Total memory: 288GB

Expected throughput

Qwen3-235B Q4: 18 tok/s
Llama-3.1-70B Q4 (HA): 24 tok/s

        ┌──────────────┐
        │ Laptop 128GB │ (orchestrator)
        └──────┬───────┘
               │ TB5 / 10GbE
       ┌───────┴───────┐
   ┌───┴────┐      ┌───┴────┐
   │ Node A │      │ Node B │
   └────────┘      └────────┘
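The custom router on the laptop is not described further. Below is a hypothetical sketch of one way it could serve the Llama-3.1-70B Q4 (HA) configuration: try replicas in priority order, skip any that failed recently, and fall back to the laptop's own copy when the headless nodes are unreachable. The backend names, the `send` callable, and the 30-second cooldown are illustrative assumptions, not part of the blueprint.

```python
import time
from typing import Callable, Optional


class FailoverRouter:
    """Send a request to the first healthy backend, in priority order."""

    def __init__(self, backends: list[str], cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backends = backends
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._down_until: dict[str, float] = {}

    def healthy(self) -> list[str]:
        now = self._clock()
        return [b for b in self.backends
                if b not in self._down_until or self._down_until[b] <= now]

    def route(self, prompt: str, send: Callable[[str, str], str]) -> str:
        """Call send(backend, prompt) on healthy backends until one succeeds."""
        last_error: Optional[Exception] = None
        for backend in self.healthy():
            try:
                return send(backend, prompt)
            except ConnectionError as exc:
                # Mark the backend down for cooldown_s, then try the next one.
                self._down_until[backend] = self._clock() + self.cooldown_s
                last_error = exc
        raise RuntimeError("no healthy backend available") from last_error


if __name__ == "__main__":
    def fake_send(backend: str, prompt: str) -> str:
        if backend == "node-a":
            raise ConnectionError("node A is asleep")
        return f"{backend}: ok"

    router = FailoverRouter(["node-a", "node-b", "laptop"])
    print(router.route("hello", fake_send))  # node-b: ok
    print(router.healthy())  # ['node-b', 'laptop'] until the cooldown expires
```

Passing the clock in as a parameter keeps the cooldown logic testable without real waiting.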