Config API
dita_etl.config
Pipeline configuration dataclasses.
Configuration is loaded once at startup from a YAML file and passed immutably through the pipeline. No I/O occurs after initial loading.
Example YAML structure:

```yaml
tooling:
  pandoc_path: /usr/local/bin/pandoc
  java_path: /usr/bin/java
  saxon_jar: /opt/saxon/saxon9he.jar
source_formats:
  treat_as_html: [".html", ".htm"]
dita_output:
  output_folder: build/out
  map_title: "My Documentation Set"
classification_rules:
  by_filename:
    - match: "index"
      type: "concept"
  by_content:
    - match: "procedure"
      type: "task"
```

ClassificationRule
dataclass
A single topic-classification rule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pattern` | `str \| None` | Glob pattern (for filename rules) or regex fragment (for content rules). The legacy … | `None` |
| `type` | `str` | DITA topic type to assign when the rule matches: one of … | `''` |

Source code in dita_etl/config.py
topic_type
property
Alias for `type`, provided for API consistency.
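A rule pairs a pattern with a topic type, and `topic_type` simply mirrors `type`. The sketch below re-declares a stand-in dataclass so it runs without `dita_etl` installed. The `matches` and `classify` helpers, the `kind` argument, matching globs against the file's basename with `fnmatch`, searching content with `re.search`, and first-match-wins ordering are all assumptions drawn from the field descriptions, not the library's code.

```python
import fnmatch
import os
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical stand-in mirroring the documented fields."""

    pattern: Optional[str] = None
    type: str = ""

    @property
    def topic_type(self) -> str:
        # Alias for ``type``, as documented.
        return self.type

    def matches(self, text: str, kind: str) -> bool:
        """Hypothetical helper; ``kind`` is "filename" or "content"."""
        if self.pattern is None:
            return False
        if kind == "filename":
            # Filename rules: glob semantics, applied to the basename (assumption).
            return fnmatch.fnmatch(os.path.basename(text), self.pattern)
        # Content rules: the pattern is a regex fragment searched anywhere in the text.
        return re.search(self.pattern, text) is not None


def classify(rules: Iterable[ClassificationRule], text: str, kind: str) -> Optional[str]:
    """Return the topic type of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.matches(text, kind):
            return rule.topic_type
    return None


by_filename = [ClassificationRule(pattern="index*", type="concept")]
print(classify(by_filename, "docs/index.md", "filename"))
```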
Chunking
dataclass
Chunking parameters used during topic generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `level` | `int` | Heading level at which to split into separate topics. | `1` |
| `nested_topics` | `bool` | Whether to nest child topics under their parent. | `True` |

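To make `level` concrete, here is a minimal sketch of splitting ATX-style Markdown into topic chunks: any heading with at most `level` `#` characters starts a new chunk, while deeper headings stay inside the current one. The `split_topics` function and the `Chunk` type are hypothetical, the real generator may operate on a different intermediate format, and `nested_topics` (how chunks are arranged relative to their parents) is not modelled here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    level: int
    title: str
    lines: List[str] = field(default_factory=list)


def split_topics(markdown: str, level: int = 1) -> List[Chunk]:
    """Hypothetical splitter: one chunk per heading at or above ``level``."""
    chunks: List[Chunk] = []
    current: Optional[Chunk] = None
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) <= level:
            # Shallow enough heading: start a new topic chunk.
            current = Chunk(level=len(m.group(1)), title=m.group(2).strip())
            chunks.append(current)
        elif current is not None:
            # Body text and deeper headings stay in the current chunk.
            current.lines.append(line)
        # Text before the first qualifying heading is dropped in this sketch.
    return chunks


doc = "# Intro\ntext\n## Setup\nsteps\n# Usage\nmore"
print([c.title for c in split_topics(doc, level=2)])
```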
DITAOutput
dataclass
DITA output settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dita_version` | `str` | Target DITA version string (e.g. …). | `'1.3'` |
| `use_specialization` | `bool` | Whether to emit DITA specialization elements. | `False` |
| `output_folder` | `str` | Root folder for all pipeline build artefacts. | `'out/dita'` |
| `map_title` | `str` | Title written into the generated DITA map. | `'Documentation Set'` |

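Because configuration is passed immutably through the pipeline, a variant is derived rather than edited in place. This sketch uses a stand-in `DITAOutput` with the documented defaults; declaring it `frozen=True` is an assumption about how the immutability is enforced, and `dataclasses.replace` is the standard-library way to copy a dataclass with some fields changed.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class DITAOutput:
    """Hypothetical stand-in with the documented defaults (frozen=True is assumed)."""

    dita_version: str = "1.3"
    use_specialization: bool = False
    output_folder: str = "out/dita"
    map_title: str = "Documentation Set"


defaults = DITAOutput()
# Derive a copy with overrides instead of mutating the shared instance.
custom = dataclasses.replace(
    defaults, output_folder="build/out", map_title="My Documentation Set"
)

try:
    defaults.output_folder = "elsewhere"  # assignment is rejected on a frozen dataclass
except dataclasses.FrozenInstanceError:
    print("DITAOutput is immutable")
```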
Tooling
dataclass
External tool configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pandoc_path` | `str` | Absolute path (or command name) for the Pandoc binary. | `'pandoc'` |
| `oxygen_scripts_dir` | `str \| None` | Optional path to Oxygen XML Editor's scripts directory, required only when using the Oxygen DOCX extractor. | `None` |
| `saxon_jar` | `str` | Path to the Saxon HE JAR file for XSLT transformations. | `'saxon-he.jar'` |
| `java_path` | `str` | Absolute path (or command name) for the Java binary. | `'java'` |

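`java_path` and `saxon_jar` are the two pieces needed to run Saxon HE from the command line (`java -jar <jar> -s:<source> -xsl:<stylesheet> -o:<output>`). The sketch below assembles that argument list from a stand-in `Tooling` with the documented defaults, and uses `shutil.which` to show how a bare command name such as `'java'` could be resolved on `PATH`. The `saxon_command` and `resolve_tool` helpers are hypothetical; the pipeline's own invocation may differ.

```python
import os
import shutil
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tooling:
    """Hypothetical stand-in with the documented defaults."""

    pandoc_path: str = "pandoc"
    oxygen_scripts_dir: Optional[str] = None
    saxon_jar: str = "saxon-he.jar"
    java_path: str = "java"


def resolve_tool(path_or_name: str) -> Optional[str]:
    """Return an absolute path for a tool, or None if it cannot be found."""
    if os.path.isabs(path_or_name):
        return path_or_name if os.path.exists(path_or_name) else None
    # Bare command names are looked up on PATH.
    return shutil.which(path_or_name)


def saxon_command(tooling: Tooling, source: str, stylesheet: str, output: str) -> List[str]:
    """Build (but do not run) a Saxon HE XSLT invocation."""
    return [
        tooling.java_path,
        "-jar",
        tooling.saxon_jar,
        f"-s:{source}",
        f"-xsl:{stylesheet}",
        f"-o:{output}",
    ]
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`; keeping it as a list rather than a shell string avoids quoting problems with paths that contain spaces.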
Config
dataclass
Root configuration object for the full ETL pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source_formats` | `dict[str, list[str]]` | Mapping of treat-as keys to lists of file extensions, e.g. … | `{'treat_as_markdown': ['.md']}` |
| `classification_rules` | `dict[str, list[ClassificationRule]]` | Mapping with … | `{}` |
| `chunking` | `Chunking` | Chunking parameters. | `Chunking()` |
| `dita_output` | `DITAOutput` | DITA output settings. | `DITAOutput()` |
| `tooling` | `Tooling` | External tool paths. | `Tooling()` |

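A mutable default such as the `source_formats` dictionary has to come from a dataclass `default_factory`, so that each `Config` gets its own fresh object instead of one shared instance. The trimmed stand-in below (two fields only, plus a hypothetical `Chunking` mirror) illustrates the pattern; it is a sketch, not the library's source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Chunking:
    level: int = 1
    nested_topics: bool = True


@dataclass(frozen=True)
class Config:
    # A factory gives every instance its own dict; a plain ``= {}`` default
    # is rejected by the dataclasses module with a ValueError.
    source_formats: Dict[str, List[str]] = field(
        default_factory=lambda: {"treat_as_markdown": [".md"]}
    )
    # Nested settings objects are also built per instance.
    chunking: Chunking = field(default_factory=Chunking)


a, b = Config(), Config()
print(a.source_formats is b.source_formats)  # separate dicts per instance
```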
load(path)
staticmethod
Load and parse a YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the YAML configuration file. | *required* |

Returns:

| Type | Description |
|---|---|
| `Config` | Fully populated `Config` instance. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If `path` does not exist. |
| `yaml.YAMLError` | If the file is not valid YAML. |

source_extensions()
Return all configured source file extensions.
Returns:
| Type | Description |
|---|---|
| `list[str]` | Sorted, deduplicated list of extension strings (e.g. …). |
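Given the `source_formats` mapping, "sorted and deduplicated" amounts to a set union followed by a sort. Here it is as a standalone function taking the mapping rather than as the method itself; this is a sketch, and because the documentation does not say whether the real method normalizes case or leading dots, this version does neither.

```python
from typing import Dict, Iterable, List


def source_extensions(source_formats: Dict[str, Iterable[str]]) -> List[str]:
    """Sorted, deduplicated union of every configured extension list."""
    # The set comprehension removes duplicates across (and within) lists.
    return sorted({ext for exts in source_formats.values() for ext in exts})


print(source_extensions({"treat_as_markdown": [".md"], "treat_as_html": [".html", ".htm", ".html"]}))
```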