cate batch
Overview
Batch-process multiple files in a directory for anonymization. Supports glob patterns, parallel workers, resume capability for interrupted jobs, and configurable error handling.
Usage
python -m src.anonymization.cli batch DIRECTORY [OPTIONS]
Options
| Option | Description | Default |
|---|---|---|
| DIRECTORY | Directory containing files to process (positional, required) | -- |
| -o, --output-dir | Output directory | <dir>/_anonymized |
| -p, --pattern | Glob pattern(s) to match files; repeatable | * |
| -s, --strategy | Transformation strategy | placeholder |
| -j, --parallel | Number of parallel workers | 1 |
| -r, --resume | Resume a previously interrupted batch | false |
| --overwrite | Overwrite existing output files | false |
| --no-recursive | Don't search subdirectories | false |
| --continue-on-error | Continue if a file fails (default) | true |
| --stop-on-error | Stop on first error | false |
| -c, --config | Path to CATE configuration file | -- |
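When the batch command is driven from a script, it can help to assemble the argument list programmatically instead of concatenating strings. The sketch below is one way to do that; `build_batch_command` is a hypothetical helper (not part of the CLI), and it only emits the flags listed in the table above. It assumes the command is launched with the current Python interpreter from the repo root.

```python
import shlex
import sys
from typing import List, Optional, Sequence


def build_batch_command(
    directory: str,
    output_dir: Optional[str] = None,
    patterns: Sequence[str] = (),
    strategy: Optional[str] = None,
    parallel: Optional[int] = None,
    resume: bool = False,
    overwrite: bool = False,
    recursive: bool = True,
    stop_on_error: bool = False,
    config: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build argv for the documented batch command.

    Options left at None/False are omitted so the CLI's own defaults apply.
    """
    cmd = [sys.executable, "-m", "src.anonymization.cli", "batch", directory]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    for pattern in patterns:
        # --pattern is documented as repeatable: one flag per glob.
        cmd += ["--pattern", pattern]
    if strategy is not None:
        cmd += ["--strategy", strategy]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    if resume:
        cmd.append("--resume")
    if overwrite:
        cmd.append("--overwrite")
    if not recursive:
        cmd.append("--no-recursive")
    if stop_on_error:
        cmd.append("--stop-on-error")
    if config is not None:
        cmd += ["--config", config]
    return cmd


# Print a shell-quoted version of the command for inspection.
print(shlex.join(build_batch_command("./docs", patterns=["*.txt", "*.md"], parallel=4)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with glob patterns such as `*.txt`.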
Prerequisites
- Repo: content-conductor
- Install: pip install -r requirements.txt (run from the repo root)
Examples
Process entire directory
python -m src.anonymization.cli batch ./documents -o ./anonymized
Only text files with parallel processing
python -m src.anonymization.cli batch ./docs --pattern "*.txt" --parallel 4
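Before launching a large batch, it can be useful to preview which files a set of glob patterns is likely to select. The following is a rough approximation using Python's `pathlib`, not the CLI's actual matching code; `preview_matches` is a hypothetical helper, and the real command may treat hidden files, symlinks, or unsupported file types differently. It mirrors the documented defaults: pattern `*` and recursive search unless `--no-recursive` is given.

```python
from pathlib import Path
from typing import Iterable, List


def preview_matches(
    directory: str,
    patterns: Iterable[str] = ("*",),
    recursive: bool = True,
) -> List[Path]:
    """Hypothetical helper: list files under `directory` matching any pattern.

    This is an approximation for planning purposes; the CLI's own
    matching rules are assumed, not confirmed, to behave similarly.
    """
    root = Path(directory)
    # rglob searches subdirectories; glob only looks at the top level.
    finder = root.rglob if recursive else root.glob
    matched = set()
    for pattern in patterns:
        # Keep regular files only; a set removes duplicates across patterns.
        matched.update(p for p in finder(pattern) if p.is_file())
    return sorted(matched)
```

Calling `preview_matches("./docs", ["*.txt"])` would list the `.txt` files under `./docs` and its subdirectories, while `recursive=False` restricts the listing to the top level.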
Resume interrupted batch
python -m src.anonymization.cli batch ./documents --resume
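For unattended runs, the resume flag can be combined with a small retry wrapper. The sketch below rests on two assumptions that this page does not confirm: that the batch command exits with a non-zero status when it is interrupted or fails, and that re-running the same command with `--resume` added is safe. `run_with_resume` is a hypothetical helper, and the `runner` parameter exists only so the logic can be exercised without invoking the real CLI.

```python
import subprocess
from typing import Callable, List


def run_with_resume(
    base_cmd: List[str],
    max_attempts: int = 3,
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> int:
    """Hypothetical helper: run a batch command, retrying with --resume.

    Assumes a non-zero exit status means the batch did not finish.
    Returns the exit status of the last attempt.
    """
    cmd = list(base_cmd)  # copy so the caller's list is not modified
    returncode = 1
    for _ in range(max_attempts):
        returncode = runner(cmd).returncode
        if returncode == 0:
            break
        # After the first failure, ask the CLI to continue where it stopped.
        if "--resume" not in cmd:
            cmd.append("--resume")
    return returncode
```

A call might look like `run_with_resume([sys.executable, "-m", "src.anonymization.cli", "batch", "./documents"])`, executed from the repo root. Capping the number of attempts keeps a persistent failure from looping forever.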
Related Commands
- cc cate transform -- transform a single file
- cc cate analyze -- detect PII without making changes