// catalog · grep -e "Vision / Audio"
- 007 Vision / Audio image-prompt:// Generates a text-to-image prompt that would recreate (or remix) the uploaded image. Names subject, style, composition, lighting, palette. Useful for turning a reference image into a starting prompt you can iterate on in SDXL / Flux / Midjourney.
- 008 Vision / Audio image-caption:// Generates an image caption in your chosen voice: formal (museum-label), witty (one-line gag), poetic (3 lines), deadpan (literal-funny), or hype (social-post energy). Same image, different mood.
- 009 Vision / Audio image-classify:// Classifies any image: returns top-5 class labels with confidence percentages, the dominant category (object / scene / portrait / chart / screenshot / illustration), and 3 "what this image is NOT" anti-labels that other classifiers might get wrong.
- 010 Vision / Audio image-mood:// For designers: takes an image and translates it into a mood / palette / style brief you can hand to another designer or feed into a prompt. Names the colour palette in hex, the emotional read, the design references it echoes, and the era / context.
- 011 Vision / Audio image-ocr:// Optical character recognition via vision model. Extracts text from any image: handwriting, signage, document scans, screenshots, photos of receipts. Preserves layout where useful (line breaks, columns). Flags low-confidence words you should double-check.
- 012 Vision / Audio image-receipt:// Extracts structured data from receipt / invoice images. Returns JSON with vendor, date, line items (description + amount), subtotal, tax, total. Flags items with low confidence. Handles paper receipts, restaurant bills, and PDF invoices.
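
To make the image-receipt entry concrete, here is a minimal sketch of consuming the kind of JSON it describes. The catalog does not publish a schema, so the key names (`line_items`, `low_confidence`, amounts as strings) and the sample values are assumptions for illustration only, not the tool's actual output.

```python
import json
from decimal import Decimal

# Hypothetical payload: key names and values are assumptions based on the
# catalog description (vendor, date, line items, subtotal, tax, total).
SAMPLE = """
{
  "vendor": "Example Cafe",
  "date": "2024-01-15",
  "line_items": [
    {"description": "Espresso", "amount": "3.50", "low_confidence": false},
    {"description": "Croissant", "amount": "4.25", "low_confidence": true}
  ],
  "subtotal": "7.75",
  "tax": "0.62",
  "total": "8.37"
}
"""


def check_receipt(raw: str) -> dict:
    """Parse a receipt payload and cross-check its arithmetic."""
    data = json.loads(raw)
    items = data["line_items"]
    # Decimal avoids float rounding errors when adding currency amounts.
    items_sum = sum((Decimal(i["amount"]) for i in items), Decimal("0"))
    subtotal = Decimal(data["subtotal"])
    tax = Decimal(data["tax"])
    total = Decimal(data["total"])
    return {
        "items_match_subtotal": items_sum == subtotal,
        "total_matches": subtotal + tax == total,
        # Items the extractor flagged as uncertain, for manual review.
        "needs_review": [i["description"] for i in items if i.get("low_confidence")],
    }


report = check_receipt(SAMPLE)
print(report)
```

Cross-checking the sums is a cheap way to catch misread digits before the data reaches a spreadsheet or accounting tool; anything listed under `needs_review` still deserves a look against the original image.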