FLEET · 14 NODES · 96% UTILISED

Compute substrate.

Mixed GPU pool spread across four colos. Inference jobs hit a global scheduler that fans them out by model affinity and current pool depth. Excess load fails over to Workers AI.
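The routing rule above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: `Pool`, `dispatch`, the `affinity` table, and the `max_queue_depth` thresholds are hypothetical names and values, and the "first preferred pool with headroom wins" ordering is an assumption about how affinity and depth combine.

```python
from dataclasses import dataclass

SPILL_TARGET = "workers-ai"  # overflow destination named in the text


@dataclass
class Pool:
    queue_depth: int      # jobs currently waiting on this pool
    max_queue_depth: int  # hypothetical saturation threshold


def dispatch(model: str, affinity: dict[str, list[str]], pools: dict[str, Pool]) -> str:
    """Return the pool that takes the job, or SPILL_TARGET if none has room."""
    # Walk the model's preferred pools in order (assumed affinity ranking).
    for name in affinity.get(model, []):
        pool = pools[name]
        if pool.queue_depth < pool.max_queue_depth:
            pool.queue_depth += 1  # the job joins this pool's queue
            return name
    # No preferred pool has headroom (or the model is unknown): spill.
    return SPILL_TARGET


# Illustrative values only; none of these thresholds come from the dashboard.
pools = {
    "inference-a": Pool(queue_depth=4, max_queue_depth=4),   # saturated
    "inference-b": Pool(queue_depth=1, max_queue_depth=8),
    "vision": Pool(queue_depth=17, max_queue_depth=16),      # over threshold
}
affinity = {
    "llama-3.3-70b": ["inference-a", "inference-b"],
    "sdxl": ["vision"],
}

print(dispatch("llama-3.3-70b", affinity, pools))  # inference-b
print(dispatch("sdxl", affinity, pools))           # workers-ai
```

Keeping the preference order in a plain table means a fallback pool can be added or removed without touching the dispatch logic.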

pool temp avg NOM

67.0 °C

pool load NOM

78 %

queue depth NOM

4

jobs / min NOM

22

avg wall NOM

14.0 s

draw NOM

14.6 kW

// host map

Per-node load

Compute fleet

32 nodes · 88% utilised

28 busy · 5 warn · 2 crit · 2 idle

// pools

GPU pool inventory

Each pool is provisioned for a workload class. Switching a model between pools is a hot operation: no restart is needed.

inference-a

H100 · 8×

LLM serving · 70B class

84%

12 active streams

inference-b

A100 · 12×

LLM serving · 8B/12B fallback

62%

7 active streams

vision

L40S · 6×

SDXL + control nets

73%

4 active batches

audio

4090 · 4×

Whisper / Whisper-X / Diarisation

28%

1 active stream

embed

A100 · 2×

BGE-large-en bulk embedding

91%

38 batches queued

fine-tune

H100 · 4×

LoRA finetune jobs

12%

0 jobs
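For scripting against the inventory, the six pool cards above can be held as plain records. The sketch below is one possible representation: `PoolSpec` and the helper functions are hypothetical names, and the GPU-weighted utilisation it computes is only one way to summarise the pools. It is not claimed to be how the "pool load" tile is derived, and the two figures need not agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSpec:
    name: str
    gpu: str
    count: int            # number of GPUs in the pool
    workload: str
    utilisation_pct: int  # utilisation shown on the pool card


# Values copied from the pool cards above.
INVENTORY = [
    PoolSpec("inference-a", "H100", 8, "LLM serving · 70B class", 84),
    PoolSpec("inference-b", "A100", 12, "LLM serving · 8B/12B fallback", 62),
    PoolSpec("vision", "L40S", 6, "SDXL + control nets", 73),
    PoolSpec("audio", "4090", 4, "Whisper / Whisper-X / Diarisation", 28),
    PoolSpec("embed", "A100", 2, "BGE-large-en bulk embedding", 91),
    PoolSpec("fine-tune", "H100", 4, "LoRA finetune jobs", 12),
]


def total_gpus(inventory: list[PoolSpec]) -> int:
    """Sum the GPU counts across all pools."""
    return sum(p.count for p in inventory)


def weighted_utilisation(inventory: list[PoolSpec]) -> float:
    """Average utilisation, weighting each pool by its GPU count."""
    return sum(p.count * p.utilisation_pct for p in inventory) / total_gpus(inventory)


def hottest(inventory: list[PoolSpec]) -> PoolSpec:
    """Return the pool with the highest reported utilisation."""
    return max(inventory, key=lambda p: p.utilisation_pct)


print(total_gpus(INVENTORY))            # 36
print(weighted_utilisation(INVENTORY))  # 61.0
print(hottest(INVENTORY).name)          # embed
```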

// schedule

Recent dispatch decisions

00:00:08 0x7c4a sdxl → vision // affinity match · 4 slots free
00:00:12 0x8b9e llama-3.3-70b → inference-a // memory class fits · qd=2 < 4
00:00:21 0x3f02 whisper-v3 → audio // only valid pool
00:00:32 0x9a17 bge-large-en → embed // batch coalesce · joined batch 12
00:00:44 0x4c1d llama-3.3-70b → inference-b // inference-a saturated · failover
00:01:02 0x6e80 sdxl → workers-ai // pool-vision qd > 16 · spill
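The dispatch lines above follow a regular shape (timestamp, job id, model, arrow, target, then a `//` comment whose parts are separated by `·`), so they can be turned into structured records. The sketch below infers that grammar from the six lines shown; `Decision`, `parse_decision`, and `LINE_RE` are hypothetical names, and the real log format may have fields not visible here.

```python
import re
from typing import NamedTuple, Optional

# Grammar inferred from the sample lines, not from a published spec.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<job>0x[0-9a-f]+)\s+"
    r"(?P<model>\S+)\s+→\s+(?P<target>\S+)\s+"
    r"//\s*(?P<reason>.*)$"
)


class Decision(NamedTuple):
    timestamp: str  # kept as text; the log does not say what clock it uses
    job_id: str
    model: str
    target: str
    reasons: list   # comment split on the "·" separator


def parse_decision(line: str) -> Optional[Decision]:
    """Parse one dispatch line, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    reasons = [part.strip() for part in m["reason"].split("·")]
    return Decision(m["ts"], m["job"], m["model"], m["target"], reasons)


sample = "00:01:02 0x6e80 sdxl → workers-ai // pool-vision qd > 16 · spill"
decision = parse_decision(sample)
print(decision.target)   # workers-ai
print(decision.reasons)  # ['pool-vision qd > 16', 'spill']
```

With records like these, counting how many jobs spilled to `workers-ai` in a window becomes a one-line filter on `target`.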