Transforms API
classify
dita_etl.transforms.classify
DITA topic-type classifier — pure functional core.
Classification proceeds in priority order:
- Filename rules (glob-style pattern matching against the basename).
- Content rules (regex search against the full document text).
- Built-in heuristics (keyword frequency in content).
- Default fallback: "concept".
All functions are pure: they take data and return data with no side effects.
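The priority order above can be sketched as a chain of early returns. This is a minimal illustration, not the module's source. The `ClassificationRule` fields used here (`pattern`, `topic_type`), the keyword lists, and the name `classify_sketch` are all assumptions, because this page does not document them.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRule:
    # Hypothetical stand-in: the real ClassificationRule fields are not documented here.
    pattern: str     # glob for filename rules, regex for content rules
    topic_type: str  # one of "concept", "task", "reference"


# Hypothetical keyword lists for the heuristic step.
_KEYWORDS = {
    "task": ("click", "install", "run", "select", "step"),
    "reference": ("parameter", "option", "default", "syntax", "returns"),
}


def classify_sketch(filename, content, rules_by_filename, rules_by_content):
    # 1. Filename rules: the first glob matching the basename wins.
    for rule in rules_by_filename:
        if fnmatch.fnmatch(filename, rule.pattern):
            return rule.topic_type
    # 2. Content rules: the first regex found anywhere in the text wins.
    for rule in rules_by_content:
        if re.search(rule.pattern, content):
            return rule.topic_type
    # 3. Heuristics: the type whose keywords occur most often, if any occur.
    words = re.findall(r"[a-z]+", content.lower())
    scores = {t: sum(words.count(k) for k in kws) for t, kws in _KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return best
    # 4. Default fallback.
    return "concept"
```

With empty rule lists, `classify_sketch("install.md", "Click the button to install...", [], [])` falls through to the keyword heuristic and returns `"task"`, mirroring the `classify_topic` example on this page. A matching rule short-circuits everything after it, which is what makes the order a priority order.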
TOPIC_TYPES = frozenset({'concept', 'task', 'reference'})
Module attribute: the DITA topic types recognised by this module.
classify_topic(filename, content, rules_by_filename, rules_by_content)
Determine the DITA topic type for a document.
:Example:

.. code-block:: python

    from dita_etl.transforms.classify import classify_topic

    result = classify_topic(
        "install.md",
        "Click the button to install...",
        rules_by_filename=[],
        rules_by_content=[],
    )
    assert result == "task"
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Basename of the source file (e.g. | required |
| `content` | `str` | Full text content of the (intermediate) document. | required |
| `rules_by_filename` | `list['ClassificationRule']` | Ordered list of filename classification rules. | required |
| `rules_by_content` | `list['ClassificationRule']` | Ordered list of content classification rules. | required |
Returns:

| Type | Description |
|---|---|
| `str` | One of |
dita
dita_etl.transforms.dita
Pure DITA XML construction functions (functional core).
All functions are pure: given the same inputs they always return the same output and have no side effects. They produce well-formed DITA 1.3 XML fragments as plain strings; serialisation to disk is handled by the imperative shell.
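The split between the pure core and the imperative shell can be illustrated with two hypothetical helpers, `render_title` and `write_fragment`. Neither is part of `dita_etl`; the sketch only shows where file I/O is meant to live.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def render_title(text: str) -> str:
    # Functional core: deterministic, returns a string, touches nothing else.
    return f"<title>{escape(text)}</title>"


def write_fragment(xml: str, out_path: Path) -> Path:
    # Imperative shell: the only step that writes to disk.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(xml, encoding="utf-8")
    return out_path
```

Because `render_title` has no side effects, it can be tested with a plain equality check; only `write_fragment` needs a temporary directory.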
extract_title(docbook_text)
Extract the first <title> value from DocBook XML text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `docbook_text` | `str` | Raw DocBook XML string. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Title text, or |
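A regex-based sketch of first-title extraction follows. It assumes inline markup inside the title should be stripped and XML entities decoded, and it returns an empty string when no `<title>` exists; that default and the name `extract_title_sketch` are assumptions. A production version might prefer an XML parser.

```python
import re
from html import unescape

# Match <title> or <title attr="..."> but not <titleabbrev>.
_TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]+>")


def extract_title_sketch(docbook_text: str, default: str = "") -> str:
    # search() returns the first match; the lazy (.*?) stops at the first </title>.
    match = _TITLE_RE.search(docbook_text)
    if match is None:
        return default  # assumed fallback
    # Drop inline markup such as <emphasis>, then decode entities like &amp;.
    return unescape(_TAG_RE.sub("", match.group(1))).strip()
```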
extract_body(docbook_text)
Extract paragraph content from DocBook XML text as DITA <p> elements.
The content of each <para> element becomes a <p>; if no <para> elements are found, the plain text is wrapped in a single <p>.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `docbook_text` | `str` | Raw DocBook XML string. | required |

Returns:

| Type | Description |
|---|---|
| `str` | String of one or more |
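A regex-based sketch of the behaviour described above, assuming inline markup inside each `<para>` is simply dropped (the real function may preserve it). `extract_body_sketch` and `_plain` are hypothetical names.

```python
import re
from html import unescape
from xml.sax.saxutils import escape

_PARA_RE = re.compile(r"<para(?:\s[^>]*)?>(.*?)</para>", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]+>")


def _plain(fragment: str) -> str:
    # Strip tags, decode entities, then re-escape so the text is safe inside <p>.
    return escape(unescape(_TAG_RE.sub("", fragment)).strip())


def extract_body_sketch(docbook_text: str) -> str:
    paras = _PARA_RE.findall(docbook_text)
    if not paras:
        # No <para> elements: wrap all remaining text in a single <p>.
        return f"<p>{_plain(docbook_text)}</p>"
    # One DITA <p> per DocBook <para>, in document order.
    return "".join(f"<p>{_plain(p)}</p>" for p in paras)
```

Decoding before re-escaping avoids turning an existing `&amp;` into `&amp;amp;`.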
build_topic(title, body, topic_type, topic_id='t1')
Render a minimal DITA 1.3 topic element.
:Example:

.. code-block:: python

    from dita_etl.transforms.dita import build_topic
    xml = build_topic("Installation", "<p>Run the installer.</p>", "task")
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `title` | `str` | Topic title text (will be XML-escaped). | required |
| `body` | `str` | Pre-formatted body content (inserted verbatim; the caller is responsible for its validity). | required |
| `topic_type` | `str` | One of | required |
| `topic_id` | `str` | Value for the element's | `'t1'` |

Returns:

| Type | Description |
|---|---|
| `str` | Serialised DITA topic XML string. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `topic_type` is not a known type. |
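A minimal sketch of the documented contract: escaped title, verbatim body, `ValueError` for unknown types, and `id` defaulting to `"t1"`. The per-type body wrappers are an assumption. In DITA 1.3, `<taskbody>` and `<refbody>` do not accept a bare `<p>` child, so this sketch places the body inside `<context>` and `<section>` respectively; the real `build_topic` may structure the body differently, and `build_topic_sketch` is a hypothetical name.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed wrappers: <conbody> accepts <p> directly; task and reference bodies do not.
_BODY_WRAPPERS = {
    "concept": ("<conbody>", "</conbody>"),
    "task": ("<taskbody><context>", "</context></taskbody>"),
    "reference": ("<refbody><section>", "</section></refbody>"),
}


def build_topic_sketch(title: str, body: str, topic_type: str, topic_id: str = "t1") -> str:
    if topic_type not in _BODY_WRAPPERS:
        raise ValueError(f"Unknown topic type: {topic_type!r}")
    open_body, close_body = _BODY_WRAPPERS[topic_type]
    return (
        f"<{topic_type} id={quoteattr(topic_id)}>"  # quoteattr adds the quotes
        f"<title>{escape(title)}</title>"
        f"{open_body}{body}{close_body}"  # body is inserted verbatim
        f"</{topic_type}>"
    )
```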
make_topicref(topic_path, base_dir)
Build a <topicref> element with a path relative to the map file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `topic_path` | `str` | Absolute or relative path to the DITA topic file. | required |
| `base_dir` | `str` | Directory that the DITA map will be written to. The | required |

Returns:

| Type | Description |
|---|---|
| `str` | A |
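A minimal sketch using `os.path.relpath`, assuming the `href` should be a forward-slash, percent-encoded URI reference. `make_topicref_sketch` is a hypothetical name, and the real function may emit additional attributes.

```python
import os
from pathlib import PurePath
from urllib.parse import quote
from xml.sax.saxutils import quoteattr


def make_topicref_sketch(topic_path: str, base_dir: str) -> str:
    # Express the topic path relative to the directory that will hold the map.
    rel = os.path.relpath(topic_path, base_dir)
    # href is a URI reference: forward slashes, and characters such as spaces encoded.
    href = quote(PurePath(rel).as_posix())
    return f"<topicref href={quoteattr(href)}/>"
```

Computing the path relative to `base_dir` keeps the map portable: the map and its topics can be moved together without rewriting `href` values.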
build_map(title, topic_paths, base_dir)
Build a complete DITA map XML document.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `title` | `str` | Human-readable map title (will be XML-escaped). | required |
| `topic_paths` | `list[str]` | Paths to all DITA topic files to include, in order. | required |
| `base_dir` | `str` | Directory where the map will be written (used to compute relative | required |

Returns:

| Type | Description |
|---|---|
| `str` | Complete DITA map XML as a string. |
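A sketch of the complete document, assuming an XML declaration, the standard OASIS DITA map DOCTYPE, a `<title>` child, and one `<topicref>` per path with an `href` relative to `base_dir`. `build_map_sketch` is hypothetical; the real output may differ in layout or DOCTYPE.

```python
import os
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

# Standard OASIS public identifier for the DITA map DTD.
_DOCTYPE = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'


def build_map_sketch(title: str, topic_paths: list[str], base_dir: str) -> str:
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        _DOCTYPE,
        "<map>",
        f"  <title>{escape(title)}</title>",
    ]
    for path in topic_paths:  # keep the caller's ordering
        rel = os.path.relpath(path, base_dir).replace(os.sep, "/")
        lines.append(f"  <topicref href={quoteattr(quote(rel))}/>")
    lines.append("</map>")
    return "\n".join(lines) + "\n"
```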