Contracts API
dita_etl.contracts
Stage input/output contracts.
All contracts are frozen (immutable) dataclasses that validate their contents on construction. They form the type-safe boundaries between pipeline stages: each stage states what it accepts and what it returns, so implicit assumptions become explicit and a stage can be refactored without breaking its neighbours.
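Example usage::

    from dita_etl.contracts import ExtractInput

    # extract_stage is assumed to be an already-configured Extract stage object.
    input_ = ExtractInput(
        source_paths=("docs/guide.md", "docs/ref.html"),
        intermediate_dir="build/intermediate",
    )
    output = extract_stage.run(input_)
    assert output.success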
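To make "frozen and validated on construction" concrete, here is a minimal sketch of the pattern. It does not reproduce the library's code: `ExampleInput` is a hypothetical contract, the `ContractError` defined here is only a stand-in (its real base class is not documented on this page), and the two checks are invented for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


class ContractError(ValueError):
    """Stand-in for the library's ContractError (base class assumed)."""


@dataclass(frozen=True)
class ExampleInput:
    """Hypothetical contract, not part of dita_etl."""

    source_paths: tuple[str, ...]
    intermediate_dir: str

    def __post_init__(self) -> None:
        # Validation runs once, when the instance is constructed.
        if not self.source_paths:
            raise ContractError("source_paths must not be empty")
        if not self.intermediate_dir:
            raise ContractError("intermediate_dir must not be empty")


ok = ExampleInput(
    source_paths=("docs/guide.md",),
    intermediate_dir="build/intermediate",
)

# Frozen dataclasses reject attribute assignment after construction.
try:
    ok.intermediate_dir = "elsewhere"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Invalid contents are rejected before any stage sees them.
try:
    ExampleInput(source_paths=(), intermediate_dir="build/intermediate")
except ContractError as exc:
    validation_message = str(exc)
else:
    validation_message = ""
```

Validating in `__post_init__` means a stage never has to re-check its input: if the object exists, it already passed the checks.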
ContractError
AssessInput
dataclass
Input contract for the Assess stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source_paths` | `tuple[str, ...]` | Absolute or relative paths of all source files to assess. | required |
| `output_dir` | `str` | Directory where assessment artefacts will be written. | required |
| `config_path` | `str` | Path to the assessment YAML configuration file. | required |
Source code in dita_etl/contracts.py
AssessOutput
dataclass
Output contract for the Assess stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inventory_path` | `str` | Path to the written inventory file. | required |
| `dedupe_path` | `str` | Path to the written dedupe file. | required |
| `report_path` | `str` | Path to the written HTML report file. | required |
| `plans_dir` | `str` | Directory containing per-file conversion plan JSONs. | required |
ExtractInput
dataclass
Input contract for the Extract stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source_paths` | `tuple[str, ...]` | Paths to source documents to convert. | required |
| `intermediate_dir` | `str` | Directory where intermediate DocBook XML files will be written. | required |
| `handler_overrides` | `dict[str, str]` | Optional mapping of file extension to extractor name. | `dict()` |
| `max_workers` | `int \| None` | Thread-pool size for parallel extraction. | `None` |
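The two optional fields can be left out entirely. The sketch below uses a stand-in class, `ExtractInputSketch`, that mirrors only the documented field names and defaults; it is not the library class, and `fake_extract` is a hypothetical placeholder for real extraction work. How the Extract stage actually consumes `max_workers` is not documented here; passing it to a `ThreadPoolExecutor` is one plausible reading.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ExtractInputSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    source_paths: tuple[str, ...]
    intermediate_dir: str
    # A mutable default needs default_factory so instances do not share one dict.
    handler_overrides: dict[str, str] = field(default_factory=dict)
    max_workers: Optional[int] = None


def fake_extract(path: str) -> str:
    # Hypothetical placeholder: swap the file extension for ".xml".
    return path.rsplit(".", 1)[0] + ".xml"


inp = ExtractInputSketch(
    source_paths=("docs/guide.md", "docs/ref.html"),
    intermediate_dir="build/intermediate",
)

# ThreadPoolExecutor accepts max_workers=None and then chooses its own pool size.
with ThreadPoolExecutor(max_workers=inp.max_workers) as pool:
    results = list(pool.map(fake_extract, inp.source_paths))
```

`pool.map` returns results in input order, so `results` lines up with `source_paths` regardless of which worker finishes first.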
ExtractOutput
dataclass
Output contract for the Extract stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `outputs` | `dict[str, str]` | Mapping of source path → intermediate XML path for every successfully extracted file. | required |
| `errors` | `dict[str, str]` | Mapping of source path → error message for every file that failed extraction. | required |
success
property
True when no extraction errors occurred.
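A property like this is most naturally derived from the `errors` mapping. The sketch below assumes exactly that rule (an empty `errors` dict means success) and uses a stand-in class, `ExtractOutputSketch`, rather than the library's own; the paths and the error message are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractOutputSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    outputs: dict[str, str]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        # Assumed rule: success means the errors mapping is empty.
        return not self.errors


clean = ExtractOutputSketch(
    outputs={"docs/guide.md": "build/intermediate/guide.xml"},
    errors={},
)
partial = ExtractOutputSketch(
    outputs={"docs/guide.md": "build/intermediate/guide.xml"},
    errors={"docs/ref.html": "unsupported markup"},
)

# A caller can report exactly which sources failed.
failed_sources = sorted(partial.errors)
```

Because failures are recorded per source path, a partially failed run still carries the outputs that did succeed.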
TransformInput
dataclass
Input contract for the Transform stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `intermediates` | `dict[str, str]` | Mapping of source path → intermediate XML path (output of the Extract stage). | required |
| `output_dir` | `str` | Directory where DITA topic files will be written. | required |
| `rules_by_filename` | `tuple[object, ...]` | Classification rules matched against filenames. | `tuple()` |
| `rules_by_content` | `tuple[object, ...]` | Classification rules matched against file content. | `tuple()` |
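Since `intermediates` is described as the output of the Extract stage, the hand-off between the two stages can be sketched as follows. Both classes are stand-ins that mirror only the documented fields, and `build/dita` is a hypothetical output directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractOutputSketch:
    """Stand-in for ExtractOutput; not the library class."""

    outputs: dict[str, str]
    errors: dict[str, str]


@dataclass(frozen=True)
class TransformInputSketch:
    """Stand-in for TransformInput; not the library class."""

    intermediates: dict[str, str]
    output_dir: str
    rules_by_filename: tuple[object, ...] = ()
    rules_by_content: tuple[object, ...] = ()


extract_out = ExtractOutputSketch(
    outputs={"docs/guide.md": "build/intermediate/guide.xml"},
    errors={},
)

# The Extract stage's outputs mapping becomes the Transform stage's intermediates.
transform_in = TransformInputSketch(
    intermediates=extract_out.outputs,
    output_dir="build/dita",  # hypothetical directory
)
```

Leaving both rule tuples at their empty defaults means no classification rules are supplied; what the Transform stage does in that case is not documented here.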
TransformOutput
dataclass
Output contract for the Transform stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `topics` | `dict[str, list[str]]` | Mapping of source path → list of generated DITA topic paths. | required |
| `errors` | `dict[str, str]` | Mapping of source path → error message for every file that failed transformation. | required |
success
property
True when no transform errors occurred.
LoadInput
dataclass
Input contract for the Load stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `topics` | `dict[str, list[str]]` | Mapping of source path → list of DITA topic paths (output of the Transform stage). | required |
| `output_dir` | `str` | Directory where the DITA map and assets will be written. | required |
| `map_title` | `str` | Human-readable title for the generated DITA map. | required |
| `intermediate_dir` | `str \| None` | Optional path to the intermediate directory so that assets (images, styles) can be copied to the output. | `None` |
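The `topics` mapping groups topic paths by source file, while a map needs one flat sequence of references. One way that flattening could be done is sketched below; this is an illustration only, the paths are invented, and whether the Load stage orders or counts topics this way is an assumption.

```python
from itertools import chain

# Hypothetical topics mapping, shaped like the documented field.
topics = {
    "docs/guide.md": ["out/topics/guide.dita", "out/topics/guide_setup.dita"],
    "docs/ref.html": ["out/topics/ref.dita"],
}

# Dicts preserve insertion order, so the topics keep their per-source grouping.
all_topics = list(chain.from_iterable(topics.values()))
topic_count = len(all_topics)
```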
LoadOutput
dataclass
Output contract for the Load stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `map_path` | `str` | Absolute path to the written DITA map file. | required |
| `topic_count` | `int` | Number of topic references included in the map. | required |
PipelineOutput
dataclass
Aggregated result returned by the full pipeline run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `assess` | `AssessOutput` | Output from the Assess stage. | required |
| `extract` | `ExtractOutput` | Output from the Extract stage. | required |
| `transform` | `TransformOutput` | Output from the Transform stage. | required |
| `load` | `LoadOutput` | Output from the Load stage. | required |
map_path
property
Convenience accessor for the final DITA map path.
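A convenience accessor of this kind usually just forwards to the nested stage output. The sketch below assumes it delegates to `load.map_path`; both classes are stand-ins, the other three stage fields are omitted for brevity, and the map path is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadOutputSketch:
    """Stand-in for LoadOutput; not the library class."""

    map_path: str
    topic_count: int


@dataclass(frozen=True)
class PipelineOutputSketch:
    """Stand-in for PipelineOutput with only the load field kept."""

    load: LoadOutputSketch

    @property
    def map_path(self) -> str:
        # Assumed behaviour: forward to the Load stage's output.
        return self.load.map_path


result = PipelineOutputSketch(
    load=LoadOutputSketch(map_path="/abs/out/index.ditamap", topic_count=3),
)
final_map = result.map_path
```

Callers that only care about the final artefact can read `result.map_path` without knowing which stage produced it.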