Content Processing Commands
Commands for extracting, transforming, validating, and generating content from documents, web pages, images, and data files.
| Command | Input | Description |
|---|---|---|
| extract-pdf | PDF files | Extract PDF to markdown with OCR and template support |
| extract-web | URLs | Fetch web pages via Playwright to markdown/CSV |
| extract-youtube-auto | YouTube URL | Extract metadata, description, and transcript |
| Command | Input | Description |
|---|---|---|
| csv-map | CSV | Map columns using YAML profile |
| md2mail | Markdown | Convert to Gmail-compatible rich text |
| generate-presentation | Markdown | Generate branded PPTX from structured markdown |
Quality & Validation
| Command | Input | Description |
|---|---|---|
validate | Markdown files | Validate naming, frontmatter, and formatting rules |
health-check | Repository | Check project health, dependencies, and environment |
Anonymization (CATE)
| Command | Input | Description |
|---|---|---|
cate transform | Any file | Anonymize sensitive content (PII) |
cate batch | Directory | Batch-process multiple files for anonymization |
cate analyze | Any file | Detect sensitive entities without changes |
| inbox-poll | IMAP inbox | Poll email inbox and dispatch by route rules |
Typical Workflow
- Extract: `extract-pdf --source ./invoices/` or `extract-web --url https://example.com`
- Validate: `validate --path ./extracted/` to check formatting compliance
- Anonymize (if needed): `cate analyze report.md`, then `cate transform report.md`
- Distribute: `md2mail -i report.md` to copy to clipboard for email
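The steps above could be chained in a small wrapper script. The following is a minimal sketch, not part of the toolkit: the `run` and `process_report` helpers are hypothetical names, and the paths are placeholders (in particular, it assumes the extracted files end up in the directory you pass to `validate`, which this page does not confirm). Only the flags shown in the list above are used. By default it prints each command instead of executing it; set `DRY_RUN=0` to run the steps for real.

```shell
# Sketch only: helper names and paths are assumptions, not part of the toolkit.
set -eu  # stop at the first failing step or unset variable

# run: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# process_report SOURCE_DIR EXTRACTED_DIR REPORT
# Runs extract, validate, anonymize, and distribute in order.
process_report() {
  run extract-pdf --source "$1"   # 1. extract PDFs to markdown
  run validate --path "$2"        # 2. check formatting compliance
  run cate analyze "$3"           # 3a. detect sensitive entities, no changes
  run cate transform "$3"         # 3b. anonymize the report
  run md2mail -i "$3"             # 4. copy rich text to the clipboard
}
```

For example, `process_report ./invoices/ ./extracted/ report.md` prints the five commands in order, and `DRY_RUN=0 process_report ./invoices/ ./extracted/ report.md` executes them, stopping at the first step that fails.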
Logo Delivery
- Generate package: `generate-logo-package --source logo.png --client acme -o ./output`
- Review: Check `manifest.json` for items flagged as review
- Custom sizes: `process-image --source logo.png --width 800 --height 600` for one-off sizes
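The review step can be partly automated with a plain text search. This is a rough sketch under stated assumptions: the schema of `manifest.json` is not documented here, so the hypothetical `list_review_lines` helper simply lists every line that mentions "review" rather than parsing the JSON, and the location of the manifest inside the output directory is also assumed.

```shell
# list_review_lines MANIFEST
# Prints manifest lines that mention "review", prefixed with line numbers.
# Assumption: flagged items contain the word "review" in their entry.
list_review_lines() {
  manifest="$1"
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # grep exits with status 1 when nothing matches; treat that as "nothing to review".
  grep -n -i 'review' "$manifest" || true
}
```

A call such as `list_review_lines ./output/manifest.json` then gives a quick list of entries to inspect by hand. If the manifest uses a different marker for flagged items, adjust the search pattern accordingly.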