Clusters & Scaling

From one laptop to a mini-DGX.

Three validated topologies for scaling RTX Spark into a multi-node agent host. Each blueprint covers networking, orchestration, and the models you can realistically serve.

2 nodes · 128GB effective

2-node 64GB Desktop Pair

Lowest-cost path to 128GB of effective memory with redundancy. Ideal for a small team running shared agents.

Interconnect: 10GbE point-to-point
Orchestration: vLLM tensor parallel
Per-node memory: 64GB unified
Total memory: 128GB
Expected throughput:
  • Llama-3.1-70B Q4: 28 tok/s
  • Mixtral 8x22B Q3: 14 tok/s

┌─────────┐   10GbE   ┌─────────┐
│ Spark 1 │ ───────── │ Spark 2 │
│ 64 GB   │           │ 64 GB   │
└─────────┘           └─────────┘

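A quick way to sanity-check which models this pair can hold is to estimate the size of the quantized weights and compare it with the pooled memory. The sketch below is a rough estimate, not a sizing tool: the function names, the nominal bits-per-weight figures (real Q3/Q4 formats store slightly more than 3 or 4 bits per weight), and the 25% headroom reserved for KV cache, activations, and the OS are all assumptions for illustration.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         total_memory_gb: float, headroom_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving headroom (assumed 25%)
    for KV cache, activations, and the operating system."""
    usable_gb = total_memory_gb * (1 - headroom_fraction)
    return weight_footprint_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Llama-3.1-70B at a nominal 4 bits per weight.
    print(weight_footprint_gb(70, 4))
    # Mixtral 8x22B (about 141B total parameters) at a nominal 3 bits per weight:
    # compare a single 64GB node against the 128GB pair.
    print(fits(141, 3, total_memory_gb=64))
    print(fits(141, 3, total_memory_gb=128))
```

Under these assumptions the Mixtral 8x22B weights exceed the usable memory of one 64GB node but fit across the pair, which is the case this topology targets.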
4 nodes · 128GB effective

4-node 32GB Mini-PC Rack

Best cost per concurrent agent. Run one replica of a small model on each node to host 30+ parallel agents, or shard a larger model across all four nodes.

Interconnect: 25GbE switch
Orchestration: Ray Serve + vLLM
Per-node memory: 32GB unified
Total memory: 128GB
Expected throughput:
  • Qwen3-Coder-32B Q4 ×4: 120 tok/s
  • Llama-3.1-70B Q4 (TP=4): 22 tok/s

┌─────────── 25GbE switch ───────────┐
└───┬─────────┬─────────┬─────────┬──┘
    │         │         │         │
┌───┴──┐  ┌───┴──┐  ┌───┴──┐  ┌───┴──┐
│Node 1│  │Node 2│  │Node 3│  │Node 4│
│ 32GB │  │ 32GB │  │ 32GB │  │ 32GB │
└──────┘  └──────┘  └──────┘  └──────┘

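When each node runs its own model replica, agent requests have to be spread across the four nodes. The sketch below shows one common policy, least outstanding requests, in plain Python. It illustrates the idea only and is not how Ray Serve routes internally; the `Replica` and `LeastLoadedRouter` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One model replica, e.g. a vLLM server on one node (hypothetical wrapper)."""
    name: str
    in_flight: int = 0  # requests currently being served


class LeastLoadedRouter:
    """Send each new request to the replica with the fewest in-flight requests."""

    def __init__(self, names):
        self.replicas = [Replica(name) for name in names]

    def acquire(self) -> Replica:
        # min() returns the first replica on ties, so assignment is deterministic.
        replica = min(self.replicas, key=lambda r: r.in_flight)
        replica.in_flight += 1
        return replica

    def release(self, replica: Replica) -> None:
        if replica.in_flight <= 0:
            raise ValueError(f"{replica.name} has no in-flight requests")
        replica.in_flight -= 1


if __name__ == "__main__":
    router = LeastLoadedRouter(["node1", "node2", "node3", "node4"])
    for _ in range(30):  # 30 agents, each with one open request
        router.acquire()
    print([r.in_flight for r in router.replicas])  # [8, 8, 7, 7]
```

If the 120 tok/s figure is the aggregate across all four replicas, 30 agents generating at the same time would each see roughly 4 tok/s; agents that spend most of their time waiting on tools would see more.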
3 nodes · 288GB effective

1×128GB Laptop + 2 Headless Nodes

Travel-ready orchestrator that scales when docked. Headless nodes wake on demand.

Interconnect: Thunderbolt 5 + 10GbE
Orchestration: Custom router on laptop
Per-node memory: 96GB unified
Total memory: 288GB
Expected throughput:
  • Qwen3-235B Q4: 18 tok/s
  • Llama-3.1-70B Q4 (HA): 24 tok/s

        ┌──────────────┐
        │ Laptop 128GB │ (orchestrator)
        └──────┬───────┘
               │ TB5 / 10GbE
     ┌─────────┴─────────┐
┌────┴───┐          ┌────┴───┐
│ Node A │          │ Node B │
└────────┘          └────────┘
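The blueprint does not say how the headless nodes are woken. One plausible mechanism, assumed here, is Wake-on-LAN over the Ethernet link: the laptop's router broadcasts a magic packet (six 0xFF bytes followed by the target MAC address repeated 16 times) before dispatching work. The sketch below builds and sends such a packet; the function names, the broadcast address, and UDP port 9 are illustrative choices, and it presumes the nodes' Ethernet adapters and firmware have Wake-on-LAN enabled.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC address; replace with a headless node's real address.
    packet = magic_packet("00:11:22:33:44:55")
    print(len(packet))  # 102
```

A router built this way would call `wake()` for a sleeping node, then poll the node's serving endpoint until it responds before routing requests to it.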