// catalog · grep -e "Vision / Audio"
- 001 Vision / Audio describe-img:// Drop an image (PNG/JPG/WebP, ≤4MB). LLaVA-1.5-7B describes it in detail. Ask follow-up questions in a chat thread. Full visual Q&A.
- 002 Vision / Audio img-forge:// Generate an image from a text prompt with Stable Diffusion XL running on the edge. Pick a style preset (photo / illustration / pixel / poster).
- 003 Vision / Audio roast-shot:// Upload a screenshot. The model roasts your UI in 5 bullet points (kind but pointed) and suggests 3 specific fixes. Surprisingly useful as a quick design crit.
- 004 Vision / Audio transcribe:// Upload audio (MP3 / WAV / FLAC / M4A, ≤9MB per chunk). Whisper-large-v3-turbo transcribes it and returns plain text, SRT, and VTT output. Auto-detects 99 languages.
- 005 Vision / Audio vibe-shift:// Upload an image. Tell the AI what mood to shift it to (cyberpunk, vaporwave, film-noir, golden-hour, etc.). Returns a re-rendered version.
- 006 Vision / Audio alt-text:// Generates accessible alt text from any image. Returns three lengths: brief (under 10 words), standard (one sentence), detailed (2-3 sentences). Also flags decorative-only images that should get empty alt text, and tells you when an image is doing real semantic work that needs a longer description.
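
The three alt-text lengths in entry 006 can be checked mechanically. The sketch below is illustrative only: the dictionary keys (brief, standard, detailed), the function names, and the `decorative` flag are assumptions made for this example, not the tool's actual output format, and the sentence counter is a naive punctuation split.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def sentence_count(text: str) -> int:
    """Naive sentence count: split on runs of . ! ? followed by space or end."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p.strip()])


def check_alt_text(variants: dict, decorative: bool = False) -> dict:
    """Check hypothetical alt-text variants against the catalog's length tiers.

    Assumed keys: "brief", "standard", "detailed".
    A decorative-only image is expected to carry empty alt text in every tier.
    """
    brief = variants.get("brief", "")
    standard = variants.get("standard", "")
    detailed = variants.get("detailed", "")
    if decorative:
        all_empty = not (brief or standard or detailed)
        return {"brief": all_empty, "standard": all_empty, "detailed": all_empty}
    return {
        "brief": 0 < word_count(brief) < 10,             # under 10 words
        "standard": sentence_count(standard) == 1,       # one sentence
        "detailed": 2 <= sentence_count(detailed) <= 3,  # 2-3 sentences
    }


example = {
    "brief": "Red bicycle by brick wall",
    "standard": "A red bicycle leans against a brick wall.",
    "detailed": "A red bicycle leans against a brick wall. Its front basket holds flowers.",
}
result = check_alt_text(example)
```

A check like this only verifies length, not quality; abbreviations such as "e.g." would confuse the sentence counter, so treat it as a rough lint rather than a validator.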