Skip to content

Transform a DITA Package

The DITA Package Processor transforms a source DITA package into a materialized target package using a strict, deterministic pipeline.

The pipeline consists of:

  • Discovery – observe and classify the source package
  • Normalize – enforce structural invariants
  • Planning – generate an explicit execution plan
  • Execution – materialize content into a target directory

Filesystem mutation only occurs when explicitly enabled with --apply.

There is no in-place mutation.
There is no guessing.
There is no implicit execution.


The one command that matters

To transform a source DITA package into a materialized target package, run:

dita_package_processor run \
  --package /path/to/source-dita-package \
  --source-root /path/to/source-dita-package \
  --output /path/to/target-package \
  --apply

That’s the canonical invocation.

No extra flags.
No hidden defaults.
No accidental overwrites.


What each part actually does (no lies)

--package

This identifies the logical package root for discovery and planning.

It must contain:

  • index.ditamap
  • a resolvable main map
  • well-formed XML

Nothing in this directory is mutated.

Ever.


--source-root

This is the filesystem truth used during execution.

Key rule (the one that bit you):

All paths in the execution plan are resolved relative to --source-root.

If this is wrong, execution will “work” but find nothing.

This flag is mandatory for execute and required for run once execution is involved.


--output

This is the target directory where the processed package is written.

Rules enforced by the CLI:

  • Required when --apply is present
  • Must be writable
  • Is created if missing
  • Is never deleted implicitly
  • Is never allowed to overlap the source tree

There is no in-place mutation.

This is intentional and non-negotiable.


--apply

This flag explicitly authorizes filesystem mutation.

Without it:

  • discovery runs
  • normalization runs
  • planning runs
  • execution runs in dry-run mode only
  • no files are written

With it:

  • execution switches to the filesystem executor
  • actions are applied deterministically
  • files are written into --output
  • results are recorded in an execution report

No --apply means no writes. Always.


What actually happens internally (the real pipeline)

When you run the command above, the pipeline is:

Discovery
  ↓
Normalize
  ↓
Planning
  ↓
Execution
    • sandboxed filesystem access
    • explicit source-root resolution
    • deterministic action dispatch

Important clarifications:

  • There is no separate materialization phase anymore
  • Layout resolution is encoded directly in the plan
  • Collision prevention is enforced by:
  • explicit target paths
  • sandbox validation
  • executor policy checks

Execution never invents paths.
It only executes what the plan describes.


What you get at the end

After a successful run with --apply, you have:

  • A fully materialized DITA package in --output
  • Deterministic directory layout
  • Explicitly copied maps, topics, and media
  • Normalized map hierarchy
  • Optional semantic refactors applied
  • A package that opens cleanly in Oxygen

Example:

oxygen /path/to/target-package/<main-map>.ditamap

No cleanup required.
No manual repair.


dita_package_processor run \
  --package ./source \
  --source-root ./source \
  --output ./out

This:

  • executes the full pipeline
  • simulates execution
  • validates every action
  • proves path resolution
  • writes no files

If this fails, applying would have failed too.


Writing an execution report

dita_package_processor run \
  --package ./source \
  --source-root ./source \
  --output ./out \
  --apply \
  --report execution_report.json

The report records:

  • every action
  • execution status
  • handler identity
  • dry-run vs applied state
  • failures and skips
  • timing (started_at, finished_at, duration_ms)
  • discovery counts (maps, topics, media, missing_references, external_references)

It is a forensic artifact.
You can diff it, archive it, or feed it into CI.

Quick checks:

jq '{dry_run, summary, discovery}' execution_report.json
jq '{started_at, finished_at, duration_ms}' execution_report.json
jq '.results[] | select(.status == "failed") | {action_id, handler, error_type, error}' execution_report.json

For full usage examples, see Execution Report Guide.


The important meta-point

You did not build:

  • a script
  • a copier
  • a batch hack

You built:

  • a compiler-style pipeline
  • with explicit contracts
  • a validated plan
  • a sandboxed executor
  • and a boring execution phase

The CLI feels strict because it is.

Strict is what makes this safe at scale.