Content Processing Commands

Commands for extracting, transforming, validating, and generating content from documents, web pages, images, and data files.

Content Extraction

| Command | Input | Description |
|---|---|---|
| extract-pdf | PDF files | Extract PDF to markdown with OCR and template support |
| extract-web | URLs | Fetch web pages via Playwright to markdown/CSV |
| extract-youtube-auto | YouTube URL | Extract metadata, description, and transcript |

Media Processing

| Command | Input | Description |
|---|---|---|
| process-image | Images | Resize, crop, optimize for social media platforms |
| generate-logo-package | PNG logo | Generate multi-platform logo delivery package |
| images-to-pdf | Images | Convert images to a single PDF |

Data Transformation

| Command | Input | Description |
|---|---|---|
| csv-map | CSV | Map columns using YAML profile |
| md2mail | Markdown | Convert to Gmail-compatible rich text |
| generate-presentation | Markdown | Generate branded PPTX from structured markdown |

Quality & Validation

| Command | Input | Description |
|---|---|---|
| validate | Markdown files | Validate naming, frontmatter, and formatting rules |
| health-check | Repository | Check project health, dependencies, and environment |

Anonymization (CATE)

| Command | Input | Description |
|---|---|---|
| cate transform | Any file | Anonymize sensitive content (PII) |
| cate batch | Directory | Batch-process multiple files for anonymization |
| cate analyze | Any file | Detect sensitive entities without changes |
| inbox-poll | IMAP inbox | Poll email inbox and dispatch by route rules |

Typical Workflow

Document Extraction Pipeline

  1. Extract: extract-pdf --source ./invoices/ or extract-web --url https://example.com
  2. Validate: validate --path ./extracted/ to check formatting compliance
  3. Anonymize (if needed): cate analyze report.md then cate transform report.md
  4. Distribute: md2mail -i report.md to copy to clipboard for email
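
The four steps above can be chained in a small wrapper script. The following is a minimal sketch, not a documented part of the toolset: the commands and flags are the ones shown in the steps, while the `run` helper, the `extraction_pipeline` function, the `DRY_RUN` variable, the optional fourth argument, and the example paths are hypothetical names introduced for illustration. It also assumes that the extracted files land in `./extracted/` and that `report.md` is the file to anonymize and send.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it, so the script can be reviewed without the tools installed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper. Arguments: source directory, extracted directory,
# report file, and an optional "yes"/"no" toggle for the anonymization step.
extraction_pipeline() {
  local source_dir="$1" extracted_dir="$2" report="$3" anonymize="${4:-yes}"

  run extract-pdf --source "$source_dir"    # 1. Extract
  run validate --path "$extracted_dir"      # 2. Validate
  if [ "$anonymize" = "yes" ]; then         # 3. Anonymize (if needed)
    run cate analyze "$report"
    run cate transform "$report"
  fi
  run md2mail -i "$report"                  # 4. Distribute
}

extraction_pipeline ./invoices/ ./extracted/ report.md
```

Running the script with `DRY_RUN=0` executes the commands for real; because of `set -e`, the pipeline stops at the first step that exits with a non-zero status.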

Logo Delivery

  1. Generate package: generate-logo-package --source logo.png --client acme -o ./output
  2. Review: Check manifest.json for items flagged as review
  3. Custom sizes: process-image --source logo.png --width 800 --height 600 for one-off sizes
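
The logo delivery steps can be sketched the same way. This is an illustrative script under stated assumptions: the commands and flags come from the steps above, but the helper names (`run`, `logo_delivery`, `list_review_items`, `custom_size`) and the `DRY_RUN` variable are hypothetical. It further assumes that `manifest.json` is written into the output directory and that flagged entries contain the literal word `review`; the actual manifest schema is not described here, so the `grep` is only a rough first pass before a manual check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumption: flagged entries contain the literal word "review".
# Prints matching lines with their line numbers; prints nothing if none match.
list_review_items() {
  local manifest="$1"
  grep -n 'review' "$manifest" || true
}

# Hypothetical wrapper for steps 1 and 2.
logo_delivery() {
  local logo="$1" client="$2" out_dir="$3"
  run generate-logo-package --source "$logo" --client "$client" -o "$out_dir"
  # Assumption: the manifest is written to the output directory.
  if [ -f "$out_dir/manifest.json" ]; then
    list_review_items "$out_dir/manifest.json"
  fi
}

# Step 3: a one-off size, using only the flags shown above.
custom_size() {
  local logo="$1" width="$2" height="$3"
  run process-image --source "$logo" --width "$width" --height "$height"
}

logo_delivery logo.png acme ./output
custom_size logo.png 800 600
```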