EXPERIMENTS · 4 RUNNING · 2 QUEUED

Active research.

Every experiment is a hypothesis the lab is actively testing. Runs get their own GPU slice, their own artefact bucket, and a status row that's tailable from the console. Results hit the research log when they land.

running NOM

4

queued NOM

2

neurons / hr · k NOM

188

avg wall NOM

1.8 h

artefacts / hr NOM

68

fail rate NOM

4.0 %

// active

In-flight experiments

EX-014 RUNNING

Recursive critique tower

Two-model debate loop. Researcher proposes, critic refutes, three rounds, judge ranks. Tracking inter-rater agreement vs. round depth.

64%

#agents#judging#rlhf

EX-013 RUNNING

SDXL stylebank explorer

Sweeping 64 controlnet conditioning combos × 12 schedulers. 4 images per combo. Saving thumbnail grids for review.

28%

#vision#sweep#styles

EX-012 PAUSED

Whisper diarisation eval

Comparing v3-large turbo against pyannote-3.1 on a 14-speaker meeting corpus. Measuring DER, JER, word-attribution accuracy.

42%

#audio#eval

EX-011 DONE

Embedding cache half-life

Production trace replay against a TTL'd embed cache. Looking at hit-rate vs. evict policy: LRU, LFU, ARC, random.

100%

#embeddings#cache

EX-010 QUEUED

Tool-router fine-tune

LoRA on Llama-8b → tool-call routing classifier. Train on 1.2M synthetic dispatches. Eval on hand-labelled 4k set.

0%

#fine-tune#routing

EX-009 FAILED

Lab-wide drift watch

Continuous K-S test on response-length and tool-choice distributions. Alerts if today's drift exceeds 2σ from a 30-day baseline.

89%

#ops#monitoring

EX-008 RUNNING

Retriever recall sweep

Sweeping chunk size × overlap × encoder for an internal docs corpus. Measuring recall@k against a hand-curated query set.

51%

#rag#sweep

EX-007 QUEUED

Multilingual code summariser

Fine-tune for code → 1-paragraph English summary across 8 languages. Eval on BLEU + human rubric.

0%

#code#fine-tune

EX-006 DONE

Streaming tool-call protocol

Spec'ing a streaming protocol for tool calls: partial-result hooks, cancellation, recovery. Reference impl + conformance suite.

100%

#agents#protocol

EX-005 RUNNING

Vision agent for charts

Multi-modal agent for reading dashboard screenshots and answering natural-language questions about the data shown.

73%

#vision#agents

// archive

Completed last 30 days

EX-004 Cold-start cache priming // 34% p95 reduction across LLM endpoints 2026-05-19
EX-003 Multi-turn safety probe // 178 prompts. 9 break-throughs (4.7%), written up 2026-05-12
EX-002 Context window stress-test // Long-context recall degrades sharply past 96k 2026-05-04
EX-001 Tool-call latency budget // Spec landed: 800ms hard ceiling per tool 2026-04-28