Contracts API

`dita_etl.contracts`

Stage input/output contracts.

All contracts are frozen dataclasses. Every input contract, plus `AssessOutput` and `LoadOutput`, validates its fields in `__post_init__` and raises `ContractError` on a violation. Together they form typed boundaries between pipeline stages, making implicit assumptions explicit and refactoring safer.

Example usage:

input_ = ExtractInput(
    source_paths=("docs/guide.md", "docs/ref.html"),
    intermediate_dir="build/intermediate",
)
output = extract_stage.run(input_)
assert output.success

`ContractError`

Bases: ValueError

Raised when a stage contract is violated at construction time.
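Because `ContractError` subclasses `ValueError`, callers can catch either type. The sketch below illustrates that, along with the effect of `frozen=True`. `DemoInput` and `try_build` are hypothetical names invented for this illustration and are not part of `dita_etl`.

```python
from dataclasses import FrozenInstanceError, dataclass


class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""


@dataclass(frozen=True)
class DemoInput:
    """Hypothetical contract, used only for this illustration."""

    output_dir: str

    def __post_init__(self) -> None:
        # Validation runs as part of construction.
        if not self.output_dir:
            raise ContractError("DemoInput.output_dir must not be empty")


def try_build(output_dir: str) -> str:
    """Return 'ok', or the violation message if construction fails."""
    try:
        DemoInput(output_dir=output_dir)
    except ValueError as exc:  # also catches ContractError
        return str(exc)
    return "ok"


valid = DemoInput(output_dir="build/out")
try:
    valid.output_dir = "elsewhere"  # frozen dataclasses reject rebinding
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Catching `ValueError` keeps calling code independent of the package, while catching `ContractError` distinguishes contract violations from other value errors.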

Source code in dita_etl/contracts.py

class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""

`AssessInput` `dataclass`

Input contract for the Assess stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source_paths` | `tuple[str, ...]` | Absolute or relative paths of all source files to assess. | required |
| `output_dir` | `str` | Directory where assessment artefacts will be written. | required |
| `config_path` | `str` | Path to the assessment YAML configuration file. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class AssessInput:
    """Input contract for the Assess stage.

    :param source_paths: Absolute or relative paths of all source files to
        assess.
    :param output_dir: Directory where assessment artefacts will be written.
    :param config_path: Path to the assessment YAML configuration file.
    """

    source_paths: tuple[str, ...]
    output_dir: str
    config_path: str

    def __post_init__(self) -> None:
        if not self.source_paths:
            raise ContractError("AssessInput.source_paths must not be empty")
        if not self.output_dir:
            raise ContractError("AssessInput.output_dir must not be empty")
        if not self.config_path:
            raise ContractError("AssessInput.config_path must not be empty")
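The checks above test only for emptiness. Python does not enforce the `tuple[str, ...]` annotation at runtime, so a list or even a bare string would pass. One possible way to normalise caller input is sketched below. `make_assess_input` is a hypothetical helper, and the paths are made-up values.

```python
from __future__ import annotations

from dataclasses import dataclass
from os import PathLike, fspath
from pathlib import PurePosixPath
from typing import Iterable


class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""


@dataclass(frozen=True)
class AssessInput:
    """Condensed copy of the contract shown above."""

    source_paths: tuple[str, ...]
    output_dir: str
    config_path: str

    def __post_init__(self) -> None:
        if not self.source_paths:
            raise ContractError("AssessInput.source_paths must not be empty")
        if not self.output_dir:
            raise ContractError("AssessInput.output_dir must not be empty")
        if not self.config_path:
            raise ContractError("AssessInput.config_path must not be empty")


def make_assess_input(
    paths: str | PathLike[str] | Iterable[str | PathLike[str]],
    output_dir: str,
    config_path: str,
) -> AssessInput:
    """Hypothetical helper: coerce path-like input to a tuple of str."""
    if isinstance(paths, (str, PathLike)):
        # A lone string would otherwise be split into single characters.
        paths = [paths]
    return AssessInput(
        source_paths=tuple(fspath(p) for p in paths),
        output_dir=output_dir,
        config_path=config_path,
    )


from_list = make_assess_input(
    ["docs/guide.md", PurePosixPath("docs/ref.html")],
    output_dir="build/assess",
    config_path="config/assess.yaml",
)
single = make_assess_input("docs/guide.md", "build/assess", "config/assess.yaml")
```

An empty iterable still reaches `__post_init__` as an empty tuple, so the contract's own check rejects it.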

`AssessOutput` `dataclass`

Output contract for the Assess stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inventory_path` | `str` | Path to the written `inventory.json` file. | required |
| `dedupe_path` | `str` | Path to the written `dedupe_map.json` file. | required |
| `report_path` | `str` | Path to the written HTML report file. | required |
| `plans_dir` | `str` | Directory containing per-file conversion plan JSONs. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class AssessOutput:
    """Output contract for the Assess stage.

    :param inventory_path: Path to the written ``inventory.json`` file.
    :param dedupe_path: Path to the written ``dedupe_map.json`` file.
    :param report_path: Path to the written HTML report file.
    :param plans_dir: Directory containing per-file conversion plan JSONs.
    """

    inventory_path: str
    dedupe_path: str
    report_path: str
    plans_dir: str

    def __post_init__(self) -> None:
        if not self.inventory_path:
            raise ContractError("AssessOutput.inventory_path must not be empty")

`ExtractInput` `dataclass`

Input contract for the Extract stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source_paths` | `tuple[str, ...]` | Paths to source documents to convert. | required |
| `intermediate_dir` | `str` | Directory where intermediate DocBook XML files will be written. | required |
| `handler_overrides` | `dict[str, str]` | Optional mapping of file extension to extractor name, e.g. `{".docx": "oxygen-docx"}`. | `dict()` |
| `max_workers` | `int \| None` | Thread-pool size for parallel extraction. `None` uses a sensible default based on CPU count. | `None` |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class ExtractInput:
    """Input contract for the Extract stage.

    :param source_paths: Paths to source documents to convert.
    :param intermediate_dir: Directory where intermediate DocBook XML files
        will be written.
    :param handler_overrides: Optional mapping of file extension to extractor
        name, e.g. ``{".docx": "oxygen-docx"}``.
    :param max_workers: Thread-pool size for parallel extraction. ``None``
        uses a sensible default based on CPU count.
    """

    source_paths: tuple[str, ...]
    intermediate_dir: str
    handler_overrides: dict[str, str] = field(default_factory=dict)
    max_workers: int | None = None

    def __post_init__(self) -> None:
        if not self.source_paths:
            raise ContractError("ExtractInput.source_paths must not be empty")
        if not self.intermediate_dir:
            raise ContractError("ExtractInput.intermediate_dir must not be empty")
        if self.max_workers is not None and self.max_workers < 1:
            raise ContractError("ExtractInput.max_workers must be >= 1")
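Since the dataclass is frozen, a changed setting needs a new instance. The standard-library `dataclasses.replace` builds one through the normal constructor, so `__post_init__` validates the result again. The sketch below uses a condensed copy of the class above with made-up paths. It also shows a caveat: `frozen=True` blocks attribute rebinding only, so the `handler_overrides` dict itself remains mutable.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""


@dataclass(frozen=True)
class ExtractInput:
    """Condensed copy of the contract shown above."""

    source_paths: tuple[str, ...]
    intermediate_dir: str
    handler_overrides: dict[str, str] = field(default_factory=dict)
    max_workers: int | None = None

    def __post_init__(self) -> None:
        if not self.source_paths:
            raise ContractError("ExtractInput.source_paths must not be empty")
        if not self.intermediate_dir:
            raise ContractError("ExtractInput.intermediate_dir must not be empty")
        if self.max_workers is not None and self.max_workers < 1:
            raise ContractError("ExtractInput.max_workers must be >= 1")


base = ExtractInput(
    source_paths=("docs/guide.md",),
    intermediate_dir="build/intermediate",
)

# Derive a tuned copy; the original instance is left unchanged.
tuned = replace(base, max_workers=4)

# replace() goes through __init__, so invalid values are still rejected.
try:
    replace(base, max_workers=0)
    rejected = False
except ContractError:
    rejected = True

# Caveat: the dict is shared by reference and can still be mutated in place.
base.handler_overrides[".docx"] = "oxygen-docx"
shared = tuned.handler_overrides is base.handler_overrides
```

If deep immutability matters, one option is to pass a fresh dict to each instance and treat it as read-only by convention.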

`ExtractOutput` `dataclass`

Output contract for the Extract stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `outputs` | `dict[str, str]` | Mapping of source path → intermediate XML path for every successfully extracted file. | required |
| `errors` | `dict[str, str]` | Mapping of source path → error message for every file that failed extraction. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class ExtractOutput:
    """Output contract for the Extract stage.

    :param outputs: Mapping of source path → intermediate XML path for every
        successfully extracted file.
    :param errors: Mapping of source path → error message for every file that
        failed extraction.
    """

    outputs: dict[str, str]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        """``True`` when no extraction errors occurred."""
        return len(self.errors) == 0

`success` `property`

True when no extraction errors occurred.
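An extraction can succeed for some files and fail for others, in which case `success` is `False` while `outputs` still holds usable results. The sketch below shows one way to hand those results to the Transform stage. `to_transform_input` is a hypothetical helper, the classes are condensed copies of the ones documented here, and the paths and error text are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""


@dataclass(frozen=True)
class ExtractOutput:
    """Condensed copy of the Extract output contract."""

    outputs: dict[str, str]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        return len(self.errors) == 0


@dataclass(frozen=True)
class TransformInput:
    """Condensed copy of the Transform input contract."""

    intermediates: dict[str, str]
    output_dir: str
    rules_by_filename: tuple[object, ...] = field(default_factory=tuple)
    rules_by_content: tuple[object, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not self.output_dir:
            raise ContractError("TransformInput.output_dir must not be empty")


def to_transform_input(extracted: ExtractOutput, output_dir: str) -> TransformInput:
    """Hypothetical helper: forward only the successfully extracted files."""
    # Copy the mapping so the two contracts do not share one dict.
    return TransformInput(intermediates=dict(extracted.outputs), output_dir=output_dir)


extracted = ExtractOutput(
    outputs={"docs/guide.md": "build/intermediate/guide.xml"},
    errors={"docs/ref.html": "unsupported markup"},
)
failed_sources = sorted(extracted.errors)  # sources to report or retry
next_input = to_transform_input(extracted, "build/dita")
```

Whether a partial failure should stop the pipeline is a policy decision for the caller; the contract only reports it.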

`TransformInput` `dataclass`

Input contract for the Transform stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `intermediates` | `dict[str, str]` | Mapping of source path → intermediate XML path (output of the Extract stage). | required |
| `output_dir` | `str` | Directory where DITA topic files will be written. | required |
| `rules_by_filename` | `tuple[object, ...]` | Classification rules matched against filenames. | `tuple()` |
| `rules_by_content` | `tuple[object, ...]` | Classification rules matched against file content. | `tuple()` |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class TransformInput:
    """Input contract for the Transform stage.

    :param intermediates: Mapping of source path → intermediate XML path
        (output of the Extract stage).
    :param output_dir: Directory where DITA topic files will be written.
    :param rules_by_filename: Classification rules matched against filenames.
    :param rules_by_content: Classification rules matched against file content.
    """

    intermediates: dict[str, str]
    output_dir: str
    rules_by_filename: tuple[object, ...] = field(default_factory=tuple)
    rules_by_content: tuple[object, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not self.output_dir:
            raise ContractError("TransformInput.output_dir must not be empty")

`TransformOutput` `dataclass`

Output contract for the Transform stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `topics` | `dict[str, list[str]]` | Mapping of source path → list of generated DITA topic paths. | required |
| `errors` | `dict[str, str]` | Mapping of source path → error message for every file that failed transformation. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class TransformOutput:
    """Output contract for the Transform stage.

    :param topics: Mapping of source path → list of generated DITA topic
        paths.
    :param errors: Mapping of source path → error message for every file that
        failed transformation.
    """

    topics: dict[str, list[str]]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        """``True`` when no transform errors occurred."""
        return len(self.errors) == 0

`success` `property`

True when no transform errors occurred.
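The `topics` mapping has the same shape as the `topics` field of `LoadInput`, so it can be passed along directly. The sketch below assumes a caller that refuses to build a map when any transformation failed; that policy, the helper name `to_load_input`, and all sample values are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


class ContractError(ValueError):
    """Raised when a stage contract is violated at construction time."""


@dataclass(frozen=True)
class TransformOutput:
    """Condensed copy of the Transform output contract."""

    topics: dict[str, list[str]]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        return len(self.errors) == 0


@dataclass(frozen=True)
class LoadInput:
    """Condensed copy of the Load input contract."""

    topics: dict[str, list[str]]
    output_dir: str
    map_title: str
    intermediate_dir: str | None = None

    def __post_init__(self) -> None:
        if not self.output_dir:
            raise ContractError("LoadInput.output_dir must not be empty")
        if not self.map_title:
            raise ContractError("LoadInput.map_title must not be empty")


def to_load_input(result: TransformOutput, output_dir: str, map_title: str) -> LoadInput:
    """Hypothetical helper: build the Load input, failing fast on errors."""
    if not result.success:
        failed = ", ".join(sorted(result.errors))
        raise RuntimeError(f"transform failed for: {failed}")
    return LoadInput(topics=result.topics, output_dir=output_dir, map_title=map_title)


transformed = TransformOutput(
    topics={"docs/guide.md": ["build/dita/intro.dita", "build/dita/setup.dita"]},
    errors={},
)
load_input = to_load_input(transformed, "build/dita", "User Guide")
generated = sum(len(paths) for paths in transformed.topics.values())
```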

`LoadInput` `dataclass`

Input contract for the Load stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `topics` | `dict[str, list[str]]` | Mapping of source path → list of DITA topic paths (output of the Transform stage). | required |
| `output_dir` | `str` | Directory where the DITA map and assets will be written. | required |
| `map_title` | `str` | Human-readable title for the generated DITA map. | required |
| `intermediate_dir` | `str \| None` | Optional path to the intermediate directory so that assets (images, styles) can be copied to the output. | `None` |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class LoadInput:
    """Input contract for the Load stage.

    :param topics: Mapping of source path → list of DITA topic paths (output
        of the Transform stage).
    :param output_dir: Directory where the DITA map and assets will be written.
    :param map_title: Human-readable title for the generated DITA map.
    :param intermediate_dir: Optional path to the intermediate directory so
        that assets (images, styles) can be copied to the output.
    """

    topics: dict[str, list[str]]
    output_dir: str
    map_title: str
    intermediate_dir: str | None = None

    def __post_init__(self) -> None:
        if not self.output_dir:
            raise ContractError("LoadInput.output_dir must not be empty")
        if not self.map_title:
            raise ContractError("LoadInput.map_title must not be empty")

`LoadOutput` `dataclass`

Output contract for the Load stage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `map_path` | `str` | Absolute path to the written DITA map file. | required |
| `topic_count` | `int` | Number of topic references included in the map. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class LoadOutput:
    """Output contract for the Load stage.

    :param map_path: Absolute path to the written DITA map file.
    :param topic_count: Number of topic references included in the map.
    """

    map_path: str
    topic_count: int

    def __post_init__(self) -> None:
        if self.topic_count < 0:
            raise ContractError("LoadOutput.topic_count must be >= 0")

`PipelineOutput` `dataclass`

Aggregated result returned by the full pipeline run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `assess` | `AssessOutput` | Output from the Assess stage. | required |
| `extract` | `ExtractOutput` | Output from the Extract stage. | required |
| `transform` | `TransformOutput` | Output from the Transform stage. | required |
| `load` | `LoadOutput` | Output from the Load stage. | required |

Source code in dita_etl/contracts.py

@dataclass(frozen=True)
class PipelineOutput:
    """Aggregated result returned by the full pipeline run.

    :param assess: Output from the Assess stage.
    :param extract: Output from the Extract stage.
    :param transform: Output from the Transform stage.
    :param load: Output from the Load stage.
    """

    assess: AssessOutput
    extract: ExtractOutput
    transform: TransformOutput
    load: LoadOutput

    @property
    def map_path(self) -> str:
        """Convenience accessor for the final DITA map path."""
        return self.load.map_path

`map_path` `property`

Convenience accessor for the final DITA map path.
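Since `PipelineOutput` bundles all four stage results, a caller can derive a run summary from it without touching the file system. The sketch below is one possible shape for such a summary. `summarize` is a hypothetical helper, the classes are condensed copies with validation omitted for brevity, and every path is a made-up value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssessOutput:
    inventory_path: str
    dedupe_path: str
    report_path: str
    plans_dir: str


@dataclass(frozen=True)
class ExtractOutput:
    outputs: dict[str, str]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        return len(self.errors) == 0


@dataclass(frozen=True)
class TransformOutput:
    topics: dict[str, list[str]]
    errors: dict[str, str]

    @property
    def success(self) -> bool:
        return len(self.errors) == 0


@dataclass(frozen=True)
class LoadOutput:
    map_path: str
    topic_count: int


@dataclass(frozen=True)
class PipelineOutput:
    assess: AssessOutput
    extract: ExtractOutput
    transform: TransformOutput
    load: LoadOutput

    @property
    def map_path(self) -> str:
        return self.load.map_path


def summarize(result: PipelineOutput) -> dict[str, object]:
    """Hypothetical helper: collapse a pipeline result into a flat summary."""
    return {
        "map_path": result.map_path,
        "topic_count": result.load.topic_count,
        "extract_errors": len(result.extract.errors),
        "transform_errors": len(result.transform.errors),
        "clean_run": result.extract.success and result.transform.success,
    }


result = PipelineOutput(
    assess=AssessOutput(
        inventory_path="build/assess/inventory.json",
        dedupe_path="build/assess/dedupe_map.json",
        report_path="build/assess/report.html",
        plans_dir="build/assess/plans",
    ),
    extract=ExtractOutput(
        outputs={"docs/guide.md": "build/intermediate/guide.xml"},
        errors={"docs/ref.html": "unsupported markup"},
    ),
    transform=TransformOutput(
        topics={"docs/guide.md": ["build/dita/intro.dita"]},
        errors={},
    ),
    load=LoadOutput(map_path="/work/build/dita/index.ditamap", topic_count=1),
)
summary = summarize(result)
```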
