[ transcribe:// ] experimental
Upload audio (MP3, WAV, FLAC, or M4A; up to 9 MB per chunk, with longer audio chunked client-side). Whisper-large-v3-turbo transcribes it and auto-detects the language from among 99 languages. Outputs plain text, SRT, and VTT.
// system prompt
Audio transcription. Whisper handles language detection. Output JSON:
{ text, language?, segments?: [{ start, end, text }] }
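The shape above marks `language` and `segments` as optional, so a consumer has to tolerate their absence. A minimal sketch of parsing such a response follows; the names `Segment`, `Transcript`, and `parse_transcript` are hypothetical and not part of the tool, and the only thing assumed about the payload is the field list shown above.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


@dataclass
class Transcript:
    text: str
    language: Optional[str] = None          # optional in the schema
    segments: List[Segment] = field(default_factory=list)  # optional in the schema


def parse_transcript(raw: str) -> Transcript:
    """Parse a JSON response, treating `language` and `segments` as optional."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("text"), str):
        raise ValueError("response is missing the required 'text' field")
    segments = [
        Segment(float(s["start"]), float(s["end"]), str(s["text"]))
        for s in data.get("segments") or []
    ]
    return Transcript(
        text=data["text"],
        language=data.get("language"),
        segments=segments,
    )
```

A response with only `text` parses to a `Transcript` whose `language` is `None` and whose `segments` list is empty.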
The client UI renders the transcript with click-to-jump timestamps, plus SRT and VTT download buttons. Long-form audio (>1 min) is chunked client-side via ffmpeg.wasm before posting.
⚡ Cloudflare Workers AI · quota deducted on success
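Since long audio is split before posting, the per-chunk results have to be stitched back together somewhere. The page does not say how that is done. The sketch below assumes each chunk's segment times are relative to the start of that chunk and that the client knows each chunk's offset in seconds. `merge_chunks` is a hypothetical helper, not the tool's actual code.

```python
from typing import List, Tuple


def merge_chunks(chunks: List[Tuple[float, dict]]) -> dict:
    """Merge (offset_seconds, response) pairs into one transcript.

    Assumption: segment times in each response are relative to the
    start of its chunk, so they are shifted by the chunk's offset.
    """
    texts = []
    segments = []
    language = None
    # Sort by offset only; the response dicts themselves are not comparable.
    for offset, resp in sorted(chunks, key=lambda c: c[0]):
        text = resp["text"].strip()
        if text:
            texts.append(text)
        if language is None:
            language = resp.get("language")  # keep the first detected language
        for seg in resp.get("segments") or []:
            segments.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    merged = {"text": " ".join(texts), "segments": segments}
    if language is not None:
        merged["language"] = language
    return merged
```

Keeping the first detected language is a simplification; audio that switches language between chunks would need a different policy.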
// output
// sample output
{
  "language": "en",
  "text": "So the question I keep coming back to is: are we building the right thing, or are we just shipping the easiest thing?",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "So the question I keep coming back to is:" },
    { "start": 2.4, "end": 4.8, "text": "are we building the right thing," },
    { "start": 4.8, "end": 7.6, "text": "or are we just shipping the easiest thing?" }
  ]
}