67.0 °C
FLEET · 14 NODES · 96% UTILISED
Compute substrate.
Mixed GPU pool spread across four colos. Inference jobs hit a global scheduler that fans them out by model affinity and current pool depth. Excess load fails over to Workers AI.
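The routing rule described above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: the `Pool` fields, the per-pool `max_depth` limit, and the "first eligible pool in preference order" policy are all assumptions made for the example. Only the affinity filter, the depth check, and the Workers AI spill come from the text.

```python
from dataclasses import dataclass

SPILL_TARGET = "workers-ai"  # overflow target named in the text


@dataclass
class Pool:
    name: str
    models: frozenset   # models this pool has affinity for (hypothetical field)
    queue_depth: int    # jobs currently queued (hypothetical field)
    max_depth: int      # assumed per-pool saturation limit


def dispatch(model: str, pools: list[Pool]) -> str:
    """Pick a pool for a job, or spill when every matching pool is full.

    `pools` is assumed to be ordered by preference, so a primary pool
    is tried before its fallback.
    """
    for pool in pools:
        has_affinity = model in pool.models
        has_room = pool.queue_depth < pool.max_depth
        if has_affinity and has_room:
            return pool.name
    # No pool with affinity has capacity: send the job to the overflow target.
    return SPILL_TARGET
```

With a primary and a fallback pool both serving the same model, the job lands on the primary until its queue reaches the limit, then on the fallback, and only then spills.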
78 %
4
22
14.0 s
14.6 kW
// host map
Per-node load
Compute fleet
32 nodes · 88% utilised
28 busy · 5 warn · 2 crit · 2 idle
// pools
GPU pool inventory
Each pool is provisioned for a workload class. Switching a model between pools is a hot operation; no restart is required.
inference-a
H100 · 8× · LLM serving · 70B class
inference-b
A100 · 12× · LLM serving · 8B/12B fallback
vision
L40S · 6× · SDXL + control nets
audio
4090 · 4× · Whisper / Whisper-X / Diarisation
embed
A100 · 2× · BGE-large-en bulk embedding
fine-tune
H100 · 4× · LoRA finetune jobs
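For capacity planning it can help to hold the inventory above as data and total it per GPU model. The sketch below assumes the "N×" figure on each row is a GPU count for that pool, which the page does not state explicitly. The `INVENTORY` structure and the `gpus_by_model` helper are illustrative names.

```python
from collections import Counter

# (pool, GPU model, count) copied from the inventory rows.
# Assumption: "N×" denotes the number of GPUs in the pool.
INVENTORY = [
    ("inference-a", "H100", 8),
    ("inference-b", "A100", 12),
    ("vision", "L40S", 6),
    ("audio", "4090", 4),
    ("embed", "A100", 2),
    ("fine-tune", "H100", 4),
]


def gpus_by_model(inventory):
    """Sum GPU counts across pools, grouped by GPU model."""
    totals = Counter()
    for _pool, gpu_model, count in inventory:
        totals[gpu_model] += count
    return dict(totals)
```

Under that reading, `gpus_by_model(INVENTORY)` gives 12 H100, 14 A100, 6 L40S and 4 4090, or 36 GPUs in total.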
// schedule
Recent dispatch decisions
- 00:00:08 0x7c4a sdxl → vision // affinity match · 4 slots free
- 00:00:12 0x8b9e llama-3.3-70b → inference-a // memory class fits · qd=2 < 4
- 00:00:21 0x3f02 whisper-v3 → audio // only valid pool
- 00:00:32 0x9a17 bge-large-en → embed // batch coalesce · joined batch 12
- 00:00:44 0x4c1d llama-3.3-70b → inference-b // inference-a saturated · failover
- 00:01:02 0x6e80 sdxl → workers-ai // pool-vision qd > 16 · spill
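Dispatch entries like those above are regular enough to parse for quick analysis, for example to count how often jobs spill. The sketch below is an assumption-laden illustration: it takes each entry as one line of the form `timestamp job-id model → target // reason`, assumes job IDs are always `0x` plus four hex digits, and tolerates a missing space between the job ID and the model name. `Dispatch` and `parse_dispatch` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Dispatch(NamedTuple):
    timestamp: str
    job_id: str
    model: str
    target: str
    reason: str


# Assumed layout: [- ]HH:MM:SS 0xNNNN model → target // reason
LINE_RE = re.compile(
    r"^(?:-\s*)?(?P<ts>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<job>0x[0-9a-f]{4})\s*(?P<model>\S+)\s*→\s*"
    r"(?P<target>\S+)\s*//\s*(?P<reason>.+)$"
)


def parse_dispatch(line: str) -> Optional[Dispatch]:
    """Parse one dispatch entry; return None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Dispatch(
        timestamp=match["ts"],
        job_id=match["job"],
        model=match["model"],
        target=match["target"],
        reason=match["reason"].strip(),
    )
```

Filtering the parsed entries on `target == "workers-ai"` then gives the spilled jobs.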