[ audio-speakers:// ] experimental
Upload multi-person audio → get a per-speaker turn breakdown: who spoke, when, for how long, and a roll-up of speaking-time share.
// system prompt
You diarise multi-speaker audio. User uploads + names expected speaker count.

Output:

## Speaker timeline (turns)
TIME   SPEAKER    DURATION
----   -------    --------
00:00  Speaker 1  0:18
00:18  Speaker 2  0:42
...

## Speaking-time summary
Speaker 1: <X min / Y%>
Speaker 2: <…>

## Notable patterns
• <e.g. "Speaker 1 dominates the first 5 minutes then drops off; Speaker 3 is largely silent until minute 8">

## Quality notes
• <overlap detected / unclear transitions / single-channel quality limit>

Rules:
- Don't name speakers unless they identify themselves in the audio (use "Speaker 1", "Speaker 2", etc.).
- Round turn timestamps to seconds.
- Speaking-time summary uses minutes + % of total spoken time.
- "Notable patterns" surfaces things like dominance, silence stretches, turn-taking patterns. Useful for meeting health analytics.
- Diarisation accuracy depends on audio quality; flag overlap, cross-talk, single-channel recordings.
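The "minutes + % of total spoken time" rule is plain arithmetic over the turn list, and a minimal sketch may make it concrete. The `Turn` dataclass and both function names are hypothetical, introduced only for illustration; nothing here is the tool's actual implementation. The share is printed to one decimal place so that the rounding policy stays visible rather than assumed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    start: int     # seconds from the start of the recording
    speaker: str   # generic label such as "Speaker 1"
    duration: int  # seconds


def speaking_time_summary(turns):
    """Return {speaker: (total_seconds, percent_of_total_spoken_time)}.

    Speakers appear in order of first appearance in the timeline.
    """
    totals = defaultdict(int)
    for turn in turns:
        totals[turn.speaker] += turn.duration
    spoken = sum(totals.values())
    if spoken == 0:
        return {}
    return {s: (secs, 100.0 * secs / spoken) for s, secs in totals.items()}


def format_summary(summary):
    """Render one summary line per speaker as minutes, seconds, and share."""
    lines = []
    for speaker, (secs, share) in summary.items():
        minutes, seconds = divmod(secs, 60)
        lines.append(f"{speaker}: {minutes} min {seconds}s / {share:.1f}%")
    return lines


# The two template rows from the prompt: 0:18 and 0:42 of speech.
turns = [Turn(0, "Speaker 1", 18), Turn(18, "Speaker 2", 42)]
for line in format_summary(speaking_time_summary(turns)):
    print(line)
```

The denominator is total spoken time, not recording length, so silence between turns would not dilute anyone's share.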
⚡ powered by Cloudflare Workers AI · quota deducted on success
// sample output
## Speaker timeline (turns)
TIME   SPEAKER    DURATION
----   -------    --------
00:00  Speaker 1  0:18
00:18  Speaker 2  0:42
01:00  Speaker 1  0:12
01:12  Speaker 3  1:08
02:20  Speaker 2  0:30
02:50  Speaker 1  0:45
03:35  Speaker 3  0:25
04:00  Speaker 2  1:20
05:20  Speaker 1  2:15
07:35  Speaker 2  0:55
08:30  Speaker 3  1:30
10:00  (end)

## Speaking-time summary
Speaker 1: 3 min 30s / 35%
Speaker 2: 3 min 27s / 34%
Speaker 3: 3 min 3s / 31%

## Notable patterns
• Even distribution across speakers: close to 1/3 each, a healthy turn-taking pattern.
• Speaker 1 had the longest single turn (2:15 starting at 05:20) but also spoke the least in the first 5 minutes (1:15 in total).
• Two back-to-back monologue stretches longer than a minute (Speaker 2 for 1:20 at 04:00, then Speaker 1 for 2:15 at 05:20), natural for a presentation-style meeting.

## Quality notes
• Single-channel audio. Diarisation accuracy ~90% on clean turns; lower at fast hand-offs.
• Slight overlap detected at 02:18-02:22 (Speaker 3 and Speaker 2 talking over each other). Timestamps in that window are approximate.
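The sample timeline can be cross-checked mechanically. The sketch below assumes the rows are transcribed by hand from the sample output above; `to_seconds`, `check_contiguous`, and `totals` are hypothetical helper names. It checks that each turn ends where the next begins and recomputes the per-speaker totals in seconds.

```python
def to_seconds(stamp):
    """Convert "MM:SS" or "M:SS" to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# (start, speaker, duration) rows copied from the sample timeline.
SAMPLE = [
    ("00:00", "Speaker 1", "0:18"),
    ("00:18", "Speaker 2", "0:42"),
    ("01:00", "Speaker 1", "0:12"),
    ("01:12", "Speaker 3", "1:08"),
    ("02:20", "Speaker 2", "0:30"),
    ("02:50", "Speaker 1", "0:45"),
    ("03:35", "Speaker 3", "0:25"),
    ("04:00", "Speaker 2", "1:20"),
    ("05:20", "Speaker 1", "2:15"),
    ("07:35", "Speaker 2", "0:55"),
    ("08:30", "Speaker 3", "1:30"),
]
END = "10:00"


def check_contiguous(rows, end):
    """Return the rows whose start + duration does not meet the next start."""
    next_starts = [to_seconds(r[0]) for r in rows[1:]] + [to_seconds(end)]
    mismatches = []
    for (start, speaker, duration), expected in zip(rows, next_starts):
        if to_seconds(start) + to_seconds(duration) != expected:
            mismatches.append((start, speaker, duration))
    return mismatches


def totals(rows):
    """Sum speaking time in seconds per speaker, in order of first appearance."""
    out = {}
    for _, speaker, duration in rows:
        out[speaker] = out.get(speaker, 0) + to_seconds(duration)
    return out


print(check_contiguous(SAMPLE, END))  # an empty list means no gaps or overlaps
print(totals(SAMPLE))
```

The totals come to 210, 207, and 183 seconds out of 600, which matches the 3 min 30s, 3 min 27s, and 3 min 3s in the summary. The exact shares are 35.0%, 34.5%, and 30.5%, so the whole-number percentages shown depend on how the two half-point values are rounded to keep the total at 100.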