# API Reference
All public symbols are importable from the top-level package unless otherwise noted.
```python
from mplm import (
    run_pipeline,
    compile_pattern_spec,
    validate_draft,
    validate_draft_collection,
    render_validation_dashboard_html,
    save_validation_dashboard_html,
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
    Node,
    Pattern,
    PatternLibrary,
    Level,
)
from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec
from mplm.normalizer import Normalizer
from mplm.boundary import BoundaryDetector
from mplm.miner import PatternMiner
from mplm.compiler import compile_pattern_spec
from mplm.reports import (
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
)
from mplm.dashboard import (
    render_validation_dashboard_html,
    save_validation_dashboard_html,
)
from mplm.validator import (
    validate_draft,
    validate_draft_collection,
    ValidationBatchResult,
    ValidationResult,
)
from mplm.serializer import PatternRepository
```
## Pipeline function

### `run_pipeline(text, source, min_support)`

```python
from mplm import run_pipeline
```
Orchestrate all three transformation stages in sequence and return discovered patterns. This is the primary entry point for programmatic use.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | — | Raw file content (Markdown or plain text). |
| `source` | `str` | `"<mem>"` | Human-readable origin label propagated into node attrs (e.g. a file path). |
| `min_support` | `int` | `2` | Minimum number of occurrences a pattern must have to be included in the output. |
**Returns** `PatternLibrary`

Logs INFO-level structured messages at each stage boundary (enable with `logging.basicConfig(level=logging.INFO)` or `mplm --log-level INFO mine …`).
**Example**

```python
from mplm import run_pipeline
from mplm.io import load_text, save_library

text = load_text("corpus/chapter1.md")
lib = run_pipeline(text, source="chapter1.md", min_support=2)
save_library(lib, "patterns.yml")

for p in lib.patterns:
    print(p.id, p.level, p.structure["signature"], p.support)
```
## I/O functions

```python
from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec
```

All file-system access is isolated in this module so the pipeline stages remain pure functions.
### `load_text(path)`

Read a text or Markdown file and return its raw content.

**Parameters**

| Name | Type | Description |
|---|---|---|
| `path` | `str` | Absolute or relative path to the source file. |

**Returns** `str` — Full file content decoded as UTF-8.

**Raises** `FileNotFoundError` if `path` does not exist.
### `save_library(lib, path)`

Write a `PatternLibrary` to disk. Format is selected by extension: `.json` → JSON, anything else → YAML.

**Parameters**

| Name | Type | Description |
|---|---|---|
| `lib` | `PatternLibrary` | The library to persist. |
| `path` | `str` | Destination file path. |
### `load_library(path)`

Read a `PatternLibrary` from disk using the repository loader.

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source YAML or JSON path. |

**Returns** `PatternLibrary`
### `save_compiled_spec(spec, path)`

Write a compiled LLM-ready bridge spec to disk. `.json` writes JSON; everything else writes YAML.

| Parameter | Type | Description |
|---|---|---|
| `spec` | `Mapping[str, Any]` | Compiler output dict. |
| `path` | `str` | Destination YAML or JSON path. |
### `load_compiled_spec(path)`

Read a compiled bridge spec from YAML or JSON.

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source YAML or JSON path. |

**Returns** `Dict[str, Any]`
## Compiler

### `compile_pattern_spec(lib, ..., top_k_children=3)`

```python
from mplm import compile_pattern_spec
```

Compile a mined `PatternLibrary` into the LLM-ready bridge schema. The compiler fills factual fields deterministically and marks interpretive fields for review.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `lib` | `PatternLibrary` | — | Source mined library. |
| `pattern_library_path` | `str` | `"patterns.yml"` | Provenance path copied into the spec. |
| `corpus_label` | `Optional[str]` | `None` | Human-readable corpus label; defaults to the input filename stem. |
| `source_kind` | `Optional[str]` | `None` | Override for `"single_document"` or `"corpus"`. |
| `title` | `Optional[str]` | `None` | Optional output title. |
| `intent` | `Optional[str]` | `None` | Optional top-level purpose string. |
| `mined_at` | `Optional[str]` | `None` | Optional timestamp string. |
| `top_k_children` | `int` | `3` | Number of child-pattern candidates to attach to each structural slot. |
**Returns** `Dict[str, Any]`

**Example**

```python
from mplm.compiler import compile_pattern_spec
from mplm.io import load_library, save_compiled_spec

lib = load_library("patterns.yml")
spec = compile_pattern_spec(lib, pattern_library_path="patterns.yml")
save_compiled_spec(spec, "compiled-pattern-spec.yml")
```
## Validator

### `validate_draft(spec, draft_text, ..., contract_id=None, template_id=None)`

```python
from mplm import validate_draft
```

Validate generated or manually written text against a compiled spec's `validation_contracts`. The validator parses the draft into the existing AST, checks structural slot matches, checks lexical requirements, and checks policy constraints such as example overlap.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `spec` | `Mapping[str, Any]` | — | Compiled bridge spec. |
| `draft_text` | `str` | — | Generated or authored text to validate. |
| `contract_id` | `Optional[str]` | `None` | Explicit validation contract ID. |
| `template_id` | `Optional[str]` | `None` | Template ID used to resolve the contract. |
| `source` | `str` | `"<draft>"` | Provenance label attached during parsing. |
**Returns** `ValidationResult`

`ValidationResult.to_dict()` returns a serialisable dict with overall score, a dashboard-friendly numeric `sort_key`, category scores, a machine-readable `severity_summary`, and per-check outcomes.
**Example**

```python
from mplm.io import load_compiled_spec, load_text
from mplm.validator import validate_draft

spec = load_compiled_spec("compiled-pattern-spec.yml")
draft = load_text("draft.md")
result = validate_draft(spec, draft, template_id="tmpl-l5-para-para")

print(result.passed, result.score)
for check in result.checks:
    print(check.id, check.passed, check.message)
```
### `validate_draft_collection(spec, drafts, ..., contract_id=None, template_id=None)`

Validate a collection of drafts and return one `ValidationBatchResult` containing per-draft results plus aggregate counts and a failure index. `drafts` should be an iterable of `(path, text)` tuples.

```python
from mplm.validator import validate_draft_collection

batch = validate_draft_collection(
    spec,
    [
        ("drafts/a.md", draft_a),
        ("drafts/b.md", draft_b),
    ],
    template_id="tmpl-l5-para-para",
)
print(batch.total_drafts, batch.failed_drafts)
```
## Reports

### `render_validation_index_markdown(result)`

Render a `ValidationBatchResult` as a Markdown summary index.

### `render_validation_index_json(result)`

Render a `ValidationBatchResult` as formatted JSON.

### `save_validation_index(result, path)`

Write a batch validation summary index to disk. `.json` produces JSON; any other extension produces Markdown.

### `save_validation_index_bundle(result, index_path, detail_reports_dir=None)`

Write one aggregate validation index plus one detail report per draft. The detail reports are written into a sibling directory by default, and the index includes links to them.
For JSON bundles, the aggregate index includes:

- `detail_reports`: per-draft detail-report paths
- `detail_report_checks`: per-draft, per-check anchor metadata and hrefs
- `severity_index`: aggregate severity metadata for the batch

Each JSON detail report also includes:

- `index_path`: backlink to the aggregate index
- `check_anchors`: stable anchor metadata for each check
- `severity_summary`: per-draft severity counts, highest failed severity, and numeric severity ranks
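Once loaded, the keys above can be navigated directly. A minimal sketch, using only the documented key names; the value shapes shown here are illustrative assumptions, not mplm's exact output:

```python
import json

# Hypothetical bundle payload: real bundles carry more keys, and the
# value shapes under each key may differ.
index_json = json.dumps({
    "detail_reports": {"drafts/a.md": "validation-index.reports/a.json"},
    "severity_index": {"error": 1, "warning": 2},
})

payload = json.loads(index_json)
# Resolve the detail-report path recorded for one draft.
report_path = payload["detail_reports"]["drafts/a.md"]
```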
### `render_validation_report_markdown(result)`

Render a `ValidationResult` as a Markdown review report with failed checks listed before passing checks.

### `render_validation_report_json(result)`

Render a `ValidationResult` as formatted JSON.

### `save_validation_report(result, path)`

Write a validation report to disk. `.json` produces JSON; any other extension produces Markdown.
### `render_validation_dashboard_html(payload, title=None)`

Render a static HTML dashboard from a validation index JSON payload. The payload is expected to be the JSON written by `save_validation_index_bundle(..., "validation-index.json")` or `render_validation_index_json(...)`.

### `save_validation_dashboard_html(payload, path, title=None)`

Write the dashboard HTML document to disk.

```python
import json

from mplm.dashboard import save_validation_dashboard_html
from mplm.reports import save_validation_index_bundle

save_validation_index_bundle(batch, "validation-index.json")
with open("validation-index.json", encoding="utf-8") as fh:
    payload = json.load(fh)
save_validation_dashboard_html(payload, "validation-dashboard.html")
```
## Data models

```python
from mplm import Node, Pattern, PatternLibrary, Level
```

### `Level`

`IntEnum` representing the six hierarchical depths of a document.
| Constant | Value | Meaning |
|---|---|---|
| `Level.PHRASE` | 1 | Comma/semicolon/colon clause — finest unit. |
| `Level.LINE` | 2 | Sentence or logical line. |
| `Level.PARAGRAPH` | 3 | Consecutive non-blank lines; also list items and table rows. |
| `Level.CHUNK` | 4 | Topic grouping (reserved for future use). |
| `Level.SECTION` | 5 | Heading-delimited block. |
| `Level.DOCUMENT` | 6 | Entire file — coarsest unit. |
**Example**

```python
from mplm import Level

for p in lib.patterns:
    if p.level == Level.PARAGRAPH:
        print(p.structure["signature"])
```
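Because `Level` is an `IntEnum`, members compare equal to plain integers and `Level(n)` coerces an integer to a member. A standalone mirror of the documented constants, for illustration only (import the real enum from `mplm` in application code):

```python
from enum import IntEnum

class Level(IntEnum):
    # Values copied from the table above.
    PHRASE = 1
    LINE = 2
    PARAGRAPH = 3
    CHUNK = 4
    SECTION = 5
    DOCUMENT = 6

# IntEnum members interoperate with plain ints.
assert Level.PARAGRAPH == 3
assert Level(5) is Level.SECTION
```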
### `Node`

Dataclass representing a single structural unit at any `Level`.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | required | Unique identifier within the document tree. Must be non-empty. |
| `level` | `Level` | required | Structural depth. Integer values are coerced to `Level`. |
| `text` | `str` | required | Raw text content of the node. |
| `children` | `List[Node]` | `[]` | Ordered child nodes, one level finer than `level`. |
| `source_slice` | `Optional[Tuple[int, int]]` | `None` | `(start_line, end_line)` span in the source file. |
| `attrs` | `Dict[str, Any]` | `{}` | Stage-populated metadata — see table below. |
**`attrs` keys**

| Key | Type | Set by | Description |
|---|---|---|---|
| `"source"` | `str` | `Normalizer`, `BoundaryDetector` | Origin file path or label. |
| `"lineno"` | `int` | `Normalizer` | Line number of the raw line. |
| `"depth"` | `int` | `Normalizer` | Heading depth (`#` count) for SECTION nodes. |
| `"start_line"` | `int` | `BoundaryDetector` | First line of a paragraph or block. |
| `"end_line"` | `int` | `BoundaryDetector` | Last line of a paragraph or block. |
| `"kind"` | `str` | `BoundaryDetector` | Block sub-type: `"paragraph"`, `"list-item"`, or `"table-row"`. |
**Raises** `ValueError` if `id` is empty.

**Methods**

#### `walk() → List[Node]`

Return a flat list of this node and all descendants in pre-order (self first, then children recursively).

```python
all_nodes = doc.walk()
phrases = [n for n in all_nodes if n.level == Level.PHRASE]
```
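The traversal order can be illustrated with a stripped-down stand-in for `Node` (a sketch, not mplm's implementation):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MiniNode:
    # Minimal stand-in for Node: just an id and children.
    id: str
    children: List["MiniNode"] = field(default_factory=list)

    def walk(self) -> List["MiniNode"]:
        # Pre-order: self first, then each child subtree in order.
        out = [self]
        for child in self.children:
            out.extend(child.walk())
        return out

tree = MiniNode("doc", [MiniNode("s1", [MiniNode("p1")]), MiniNode("s2")])
order = [n.id for n in tree.walk()]  # ['doc', 's1', 'p1', 's2']
```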
#### `by_level(level) → List[Node]`

Shorthand for filtering `walk()` by level.

| Parameter | Type | Description |
|---|---|---|
| `level` | `Level` | The level to filter by. |

```python
paragraphs = doc.by_level(Level.PARAGRAPH)
```
### `Pattern`

Dataclass representing a discovered structural pattern.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | required | Pattern identifier, e.g. `"P-3-1292"`. Must be non-empty. |
| `level` | `Level` | required | The level at which this pattern appears. |
| `selector` | `Dict[str, Any]` | required | DSL query descriptor for matching, e.g. `{"engine": "dsl", "query": "level==3 and sig=='S'"}`. |
| `structure` | `Dict[str, Any]` | required | Contains key `"signature"` — the structural shape string. |
| `support` | `int` | required | Number of occurrences found. Must be `>= 0`. |
| `examples` | `List[Dict[str, Any]]` | `[]` | Up to 3 representative instances — see below. |
| `centroid` | `List[str]` | `[]` | `structure["signature"]` split into a list of tokens. |
**Example dict shape**

| Key | Type | Description |
|---|---|---|
| `"source"` | `str` | Origin file path. |
| `"lines"` | `List[int]` | `[start_line, end_line]` in the source. |
| `"excerpt"` | `str` | First 200 characters of the node's text. |
**Signature tokens**

| Token | Appears in | Meaning |
|---|---|---|
| `"EMPTY"` | leaf | 0 words |
| `"XS"` | leaf | 1–4 words |
| `"S"` | leaf or child label | 5–11 words / LINE child |
| `"M"` | leaf | 12–24 words |
| `"L"` | leaf | 25+ words |
| `"P"` | child label | PHRASE child |
| `"PARA"` | child label | PARAGRAPH child |
| `"CHUNK"` | child label | CHUNK child |
| `"SEC"` | child label | SECTION child |
**Raises** `ValueError` if `id` is empty or `support < 0`.

**Example**

```python
for p in lib.patterns:
    sig = p.structure["signature"]
    print(f"{p.id} level={int(p.level)} sig={sig!r} n={p.support}")
    for ex in p.examples:
        print(f"  {ex['source']}:{ex['lines']} {ex['excerpt'][:60]}")
```
### `PatternLibrary`

Ordered collection of `Pattern` objects with provenance metadata.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `patterns` | `List[Pattern]` | `[]` | Discovered patterns in insertion order. |
| `meta` | `Dict[str, Any]` | `{}` | Provenance metadata, e.g. `{"min_support": 2, "scope": "corpus"}`. |
**Methods**

#### `add(p) → None`

Append a pattern to the library.

| Parameter | Type | Description |
|---|---|---|
| `p` | `Pattern` | Pattern to add. |

**Example**

```python
# Filter patterns above a support threshold
strong = PatternLibrary(meta=lib.meta)
for p in lib.patterns:
    if p.support >= 10:
        strong.add(p)
```
## Pipeline stages

The three stages below are composed by `run_pipeline()`. Use them directly when you need access to intermediate AST representations.

### `Normalizer`

```python
from mplm.normalizer import Normalizer
```

Convert raw text into a rough AST. This is a pure transformation — no file I/O.
#### `parse(text, source) → Node`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | required | Raw file content. |
| `source` | `str` | `"<mem>"` | Origin label written into `node.attrs["source"]`. |

**Returns** A `Level.DOCUMENT` `Node` whose children are `Level.SECTION` nodes, each containing raw `Level.LINE` children.

```python
from mplm import Level
from mplm.normalizer import Normalizer

norm = Normalizer()
doc = norm.parse("# Intro\n\nFirst paragraph.", source="draft.md")
sections = doc.by_level(Level.SECTION)
```
### `BoundaryDetector`

```python
from mplm.boundary import BoundaryDetector
```

Refine the AST produced by `Normalizer` into block, sentence, and phrase nodes. This is a pure transformation — the input `Node` tree is not mutated.

#### `detect(doc) → Node`

| Parameter | Type | Description |
|---|---|---|
| `doc` | `Node` | A `Level.DOCUMENT` node from `Normalizer.parse()`. |

**Returns** A new `Level.DOCUMENT` node whose SECTION children contain `Level.PARAGRAPH` / `Level.LINE` blocks, each split into `Level.LINE` sentences and `Level.PHRASE` phrases.

```python
from mplm.boundary import BoundaryDetector

det = BoundaryDetector()
refined = det.detect(doc)  # doc is unchanged
paragraphs = refined.by_level(Level.PARAGRAPH)
```
**Subclassing** Override `detect()` to plug in a different splitting strategy (e.g. spaCy sentence segmentation):

```python
class SpacyDetector(BoundaryDetector):
    def detect(self, doc: Node) -> Node:
        ...  # spaCy-powered implementation
```
### `PatternMiner`

```python
from mplm.miner import PatternMiner
```

Discover repeated structural patterns by grouping nodes that share the same `(level, signature)`. This is a pure transformation — no I/O.
#### `PatternMiner(min_support)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `min_support` | `int` | `2` | Groups smaller than this are discarded. |
#### `mine(doc) → PatternLibrary`

| Parameter | Type | Description |
|---|---|---|
| `doc` | `Node` | A refined `Level.DOCUMENT` node from `BoundaryDetector.detect()`. |

**Returns** A `PatternLibrary` containing one `Pattern` per `(level, signature)` group that meets `min_support`.

```python
from mplm.miner import PatternMiner

miner = PatternMiner(min_support=3)
lib = miner.mine(refined)
print(len(lib.patterns), "patterns found")
```
## Serializer

```python
from mplm.serializer import PatternRepository
```

Read and write `PatternLibrary` objects. Format is auto-detected from the file extension (`.json` → JSON, anything else → YAML).
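The dispatch rule is purely extension-based. A sketch of the equivalent check, illustrative only (not mplm's internal code):

```python
from pathlib import Path

def storage_format(path: str) -> str:
    # ".json" selects JSON; every other extension (including none)
    # falls back to YAML, as described above.
    return "json" if Path(path).suffix == ".json" else "yaml"
```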
### `PatternRepository`

#### `dump(lib, path) → None`

| Parameter | Type | Description |
|---|---|---|
| `lib` | `PatternLibrary` | The library to write. |
| `path` | `str` | Destination path. |
#### `load(path) → PatternLibrary`

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source path. |

**Returns** A fully populated `PatternLibrary`.

```python
from mplm.serializer import PatternRepository

repo = PatternRepository()
repo.dump(lib, "patterns.yml")
loaded = repo.load("patterns.yml")
assert loaded.meta == lib.meta
```
## CLI

```
mplm [--log-level LEVEL] COMMAND [OPTIONS]
```

**Global option**

| Option | Values | Default | Description |
|---|---|---|---|
| `--log-level` | `DEBUG` `INFO` `WARNING` `ERROR` | `WARNING` | Logging verbosity. Set to `INFO` to see pipeline stage timings. |
### `mplm mine PATH`

Mine patterns from `PATH` and write to an output file.

```
mplm mine examples/example1.md --out patterns.yml --min-support 2
mplm --log-level INFO mine corpus/doc.md --out out.yml
```

| Option | Type | Default | Description |
|---|---|---|---|
| `--out` | path | `patterns.yml` | Output file. Extension selects format (`.yml` or `.json`). |
| `--min-support` | int | `2` | Minimum occurrence count. |
### `mplm preview PATH`

Print a hierarchical AST outline to stdout — useful for verifying that boundaries are detected correctly before mining.

```
mplm preview examples/example1.md
```

Output format: `- L{level} {node_id}: {first 60 chars of text}`
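The outline shape can be reproduced exactly from the documented format string; a sketch (the helper name is illustrative, not part of mplm):

```python
def outline_line(level: int, node_id: str, text: str) -> str:
    # Matches the documented shape:
    # "- L{level} {node_id}: {first 60 chars of text}"
    return f"- L{level} {node_id}: {text[:60]}"

print(outline_line(5, "sec-1", "Introduction"))  # - L5 sec-1: Introduction
```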
### `mplm validate-index SPEC_PATH DRAFTS_ROOT`

Validate a whole folder of drafts and write one aggregate report bundle.

```
mplm validate-index compiled-pattern-spec.yml drafts \
  --template-id tmpl-l5-para-para \
  --out validation-index.json
```

When `--out` ends in `.json`, the command writes a JSON bundle plus one linked JSON detail report per draft.
### `mplm render-dashboard [JSON_PATH]`

Render a static HTML dashboard in either of two modes:

- **Bundle mode:** read an existing validation index JSON bundle.
- **Direct mode:** read a compiled spec plus a drafts folder and validate on the fly.

```
mplm render-dashboard validation-index.json --out validation-dashboard.html
mplm render-dashboard validation-index.json --out dashboards/review.html --title "Template Review"
mplm render-dashboard \
  --spec-path compiled-pattern-spec.yml \
  --drafts-root drafts \
  --template-id tmpl-l5-para-para \
  --out validation-dashboard.html
```
| Option | Type | Default | Description |
|---|---|---|---|
| `--out` | path | `validation-dashboard.html` | HTML dashboard output path. |
| `--title` | str | `None` | Optional dashboard title override. |
| `--spec-path` | path | `None` | Compiled spec path for direct mode. |
| `--drafts-root` | path | `None` | Draft folder for direct mode. |
| `--contract-id` | str | `None` | Optional validation contract ID for direct mode. |
| `--template-id` | str | `None` | Optional template ID for direct mode. |
| `--exts` | str | `md,txt` | Draft file extensions for direct mode. |
| `--recursive` / `--no-recursive` | flag | recursive | Whether to recurse under `--drafts-root`. |
In bundle mode, if the dashboard is written to a different directory than the source JSON, detail-report links are automatically rebased so the deep links still work. The rendered dashboard also includes client-side CSV export for the currently filtered draft queue.
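The rebasing amounts to re-expressing each href relative to the dashboard's directory instead of the index's. A sketch under that assumption; the function name is hypothetical, not mplm's API:

```python
import posixpath

def rebase_href(index_dir: str, dashboard_dir: str, href: str) -> str:
    # An href stored relative to the index JSON is resolved against the
    # index directory, then re-expressed relative to the dashboard dir.
    resolved = posixpath.normpath(posixpath.join(index_dir, href))
    return posixpath.relpath(resolved, dashboard_dir)

rebase_href("out", "dashboards", "reports/a.json")  # "../out/reports/a.json"
```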
## Exceptions

| Exception | Raised by | Condition |
|---|---|---|
| `ValueError` | `Node.__init__` | `id` is an empty string |
| `ValueError` | `Node.__init__` | `level` cannot be coerced to `Level` |
| `ValueError` | `Pattern.__init__` | `id` is an empty string |
| `ValueError` | `Pattern.__init__` | `support < 0` |
| `FileNotFoundError` | `load_text()` | `path` does not exist |
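The `Pattern` constructor guards reduce to two simple checks; a sketch mirroring the documented conditions (the helper name is illustrative, not part of mplm):

```python
def check_pattern_args(pattern_id: str, support: int) -> None:
    # Mirrors the conditions in the table: an empty id or a negative
    # support count raises ValueError.
    if not pattern_id:
        raise ValueError("id must be a non-empty string")
    if support < 0:
        raise ValueError("support must be >= 0")
```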