Design: DITA Package Processor¶
Overview¶
The current system is a deterministic, contract-driven DITA processing toolchain with a plugin-based extension model.
The public lifecycle is:
discover -> normalize -> plan -> execute
The run command orchestrates that lifecycle and wraps execution with materialization preflight and finalize behavior.
Design Goals¶
1. Deterministic Behavior¶
- the same input contract should produce the same plan
- the same plan and execution mode should produce the same execution behavior
- ordering is explicit and stable
- failures should be structural and loud, not heuristic
2. Contracts Between Phases¶
Each phase produces a durable artifact or model boundary:
- discovery produces inventory and report data
- normalization produces
PlanningInput - planning produces
plan.json-compatible action data - execution produces an
ExecutionReport
This keeps reasoning localized. If something is wrong, there is a concrete boundary to inspect.
3. Plugin-Based Extensibility¶
The current extension surface is the plugin system, not runtime pipeline step injection.
Plugins can contribute:
- discovery patterns
- planning action emission
- execution handlers
Built-in behavior is exposed through CorePlugin, which loads before any third-party plugin.
4. Explicit Mutation Boundaries¶
- discovery does not mutate
- normalization does not mutate
- planning does not mutate
- execution is dry-run by default
- real writes require
--applyand a bounded target root
High-Level Architecture¶
CLI
-> Discovery
-> Normalization
-> Planning
-> Materialization preflight
-> Execution
-> Execution report
Cross-cutting extension flow:
plugins.patterns() -> discovery evidence
plugins.emit_actions() -> plan actions
plugins.handlers() -> execution registry
Responsibility Map¶
| Concern | Current Location |
|---|---|
| CLI routing and global flags | cli.py and cli_* modules |
| Discovery scanning and reports | discovery/* |
| Discovery-to-planning normalization | planning/contracts/* |
| Plan generation | planning/planner.py |
| Pipeline orchestration | pipeline.py |
| Materialization safety/finalize | materialization/* |
| Executor selection and dispatch | orchestration.py, execution/* |
| Plugin contract and loading | plugins/* |
Plugin Design¶
DitaPlugin¶
DitaPlugin is the root extension contract.
Each plugin must expose:
nameversion
And may expose:
patterns()emit_actions(...)handlers()
Loading Model¶
Plugins are loaded from the Python entry-point group dita_package_processor.plugins.
Load order is deterministic:
CorePlugin- third-party plugins sorted by entry-point name
Conflict Rules¶
The system fails at startup on:
- duplicate pattern IDs across plugins
- duplicate handler
action_typevalues across plugins - structurally invalid plugins
This is intentional. Silent extension conflicts are not acceptable in this tool.
Discovery Design¶
Discovery observes package structure and classification evidence.
Current discovery classification is plugin-aware:
- the classifier asks the plugin registry for all patterns
- built-in and third-party patterns participate in the same evaluation path
- discovery carries evidence forward for planning
Discovery is read-only and should never encode mutation behavior.
Planning Design¶
Planning consumes PlanningInput only.
The planner:
- sorts artifacts deterministically
- asks every loaded plugin to emit actions for each artifact
- assigns stable action IDs after aggregation
- validates the resulting plan against the schema and invariants
The planner does not discover files, mutate content, or register handlers.
Execution Design¶
Execution is mediated by executors and handlers.
- executors enforce dry-run versus apply behavior
- handlers implement one action type each
- the execution handler registry is populated from plugin-contributed handlers
This means the execution layer is also plugin-aware, not manually hardcoded through import-time registration.
Materialization Design¶
Materialization remains the safety boundary around execution.
It is responsible for:
- preflight validation of the target root
- finalization around execution output
- report-related artifact handling
It does not replace planning or execution and should not become a hidden behavior layer.
Testing Strategy¶
Current testing emphasis:
- contract tests for discovery, planning, execution, and schemas
- execution tests around filesystem and safety behavior
- integration tests across end-to-end flows
- plugin-aware behavior verified through planning and execution surfaces
Non-Goals¶
The current system is not:
- an interactive editor
- an inference engine
- a silent repair tool
- an unbounded workflow engine
- a step-injection framework for external customization
Summary¶
The repo has moved from an older step-oriented architecture to a contract-and-plugin architecture:
- phases are explicit
- extension points are explicit
- mutation is explicit
- conflicts fail loudly
That is the current design center.