FLEET · 14 NODES · 96% UTILISED

Compute substrate.

Mixed GPU pool spread across four colos. Inference jobs hit a global scheduler that fans them out by model affinity and current pool depth. Excess load fails over to Workers AI.
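A minimal sketch of that dispatch rule, for illustration only. The `Pool` fields, the `max_queue_depth` thresholds and the sample queue depths are assumptions rather than the scheduler's real configuration; only the pool names, model names and the `workers-ai` label come from this page. It also assumes a model with no matching pool spills the same way a saturated one does.

```python
from dataclasses import dataclass

# Label as it appears in the dispatch log; the actual hand-off is not modelled.
SPILL_TARGET = "workers-ai"


@dataclass
class Pool:
    name: str
    models: frozenset       # models this pool has affinity for (assumed)
    queue_depth: int        # jobs currently waiting
    max_queue_depth: int    # hypothetical saturation threshold


def dispatch(model: str, pools: list) -> str:
    """Return the pool name for a job, or SPILL_TARGET if no pool can take it."""
    # Keep only pools with affinity for the model and room in their queue.
    candidates = [
        p for p in pools
        if model in p.models and p.queue_depth < p.max_queue_depth
    ]
    if not candidates:
        return SPILL_TARGET
    # Prefer the shallowest queue; list order breaks ties.
    best = min(candidates, key=lambda p: p.queue_depth)
    best.queue_depth += 1
    return best.name


pools = [
    Pool("inference-a", frozenset({"llama-3.3-70b"}), queue_depth=4, max_queue_depth=4),
    Pool("inference-b", frozenset({"llama-3.3-70b"}), queue_depth=1, max_queue_depth=4),
    Pool("vision", frozenset({"sdxl"}), queue_depth=16, max_queue_depth=16),
]

print(dispatch("llama-3.3-70b", pools))  # inference-b (inference-a is full)
print(dispatch("sdxl", pools))           # workers-ai (vision is full)
```

Listing pools in preference order keeps the failover path explicit: the first pool wins ties, and the spill target is only reached when every candidate is saturated.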

pool temp avg · NOM · 67.0 °C · min 58 · max 80
pool load · NOM · 78 % · min 30 · max 98
queue depth · NOM · 4 · min 0 · max 24
jobs / min · NOM · 22 · min 4 · max 64
avg wall · NOM · 14.0 s · min 4 · max 36
draw · NOM · 14.6 kW · min 8 · max 22

// host map

Per-node load

Compute fleet

32 nodes · 88% utilised

28 busy · 5 warn · 2 crit · 2 idle

// pools

GPU pool inventory

Each pool is provisioned for a workload class. Switching a model between pools is a hot operation; no restart is required.

inference-a

H100 · 8×

LLM serving · 70B class

84%
12 active streams

inference-b

A100 · 12×

LLM serving · 8B/12B fallback

62%
7 active streams

vision

L40S · 6×

SDXL + control nets

73%
4 active batches

audio

4090 · 4×

Whisper / Whisper-X / Diarisation

28%
1 active stream

embed

A100 · 2×

BGE-large-en bulk embedding

91%
38 batches queued

fine-tune

H100 · 4×

LoRA fine-tune jobs

12%
0 jobs
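The inventory above can be rolled up into a GPU count and a GPU-weighted utilisation figure. This is a sketch under one assumption: that each pool's percentage is the mean utilisation across that pool's GPUs. The result is not expected to match the fleet headline figures, which count nodes and may be measured differently.

```python
from collections import defaultdict

# (pool, gpu_type, gpu_count, utilisation_percent), copied from the inventory above.
POOLS = [
    ("inference-a", "H100", 8, 84),
    ("inference-b", "A100", 12, 62),
    ("vision", "L40S", 6, 73),
    ("audio", "4090", 4, 28),
    ("embed", "A100", 2, 91),
    ("fine-tune", "H100", 4, 12),
]


def weighted_utilisation(pools):
    """Return (total GPU count, utilisation weighted by GPUs per pool)."""
    total_gpus = sum(count for _, _, count, _ in pools)
    busy = sum(count * util for _, _, count, util in pools)
    return total_gpus, busy / total_gpus


def gpus_by_type(pools):
    """Sum GPU counts per hardware type across pools."""
    totals = defaultdict(int)
    for _, gpu_type, count, _ in pools:
        totals[gpu_type] += count
    return dict(totals)


total, util = weighted_utilisation(POOLS)
print(total, round(util, 1))   # 36 61.0
print(gpus_by_type(POOLS))     # {'H100': 12, 'A100': 14, 'L40S': 6, '4090': 4}
```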

// schedule

Recent dispatch decisions

  • 00:00:08 0x7c4a sdxl vision // affinity match · 4 slots free
  • 00:00:12 0x8b9e llama-3.3-70b inference-a // memory class fits · qd=2 < 4
  • 00:00:21 0x3f02 whisper-v3 audio // only valid pool
  • 00:00:32 0x9a17 bge-large-en embed // batch coalesce · joined batch 12
  • 00:00:44 0x4c1d llama-3.3-70b inference-b // inference-a saturated · failover
  • 00:01:02 0x6e80 sdxl workers-ai // pool-vision qd > 16 · spill
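The entries above follow a regular shape: timestamp, job id, model, target, then a `//` reason. A minimal parsing sketch, assuming that layout holds for every line; the field names (`ts`, `job`, `model`, `target`, `reason`) are illustrative and not taken from any real log schema.

```python
import re
from collections import Counter

# One dispatch entry: optional bullet, HH:MM:SS, hex job id, model, target, "// reason".
LINE_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<ts>\d{2}:\d{2}:\d{2})\s+(?P<job>0x[0-9a-f]+)\s+"
    r"(?P<model>\S+)\s+(?P<target>\S+)\s+//\s+(?P<reason>.+)$"
)

LOG = """\
  • 00:00:08 0x7c4a sdxl vision // affinity match · 4 slots free
  • 00:00:12 0x8b9e llama-3.3-70b inference-a // memory class fits · qd=2 < 4
  • 00:00:21 0x3f02 whisper-v3 audio // only valid pool
  • 00:00:32 0x9a17 bge-large-en embed // batch coalesce · joined batch 12
  • 00:00:44 0x4c1d llama-3.3-70b inference-b // inference-a saturated · failover
  • 00:01:02 0x6e80 sdxl workers-ai // pool-vision qd > 16 · spill
"""


def parse_line(line):
    """Return the entry's fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


records = [r for r in map(parse_line, LOG.splitlines()) if r]
by_target = Counter(r["target"] for r in records)

print(len(records), by_target["workers-ai"])  # 6 1
```

Counting by target gives a quick read on how often jobs leave the local pools: in this sample, one of six decisions spilled to Workers AI.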