# API Reference
All public symbols are importable from the top-level package unless otherwise noted.
```python
from mplm import (
    run_pipeline,
    compile_pattern_spec,
    validate_draft,
    validate_draft_collection,
    render_validation_dashboard_html,
    save_validation_dashboard_html,
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
    Node,
    Pattern,
    PatternLibrary,
    Level,
)
from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec
from mplm.normalizer import Normalizer
from mplm.boundary import BoundaryDetector
from mplm.miner import PatternMiner
from mplm.compiler import compile_pattern_spec
from mplm.reports import (
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
)
from mplm.dashboard import (
    render_validation_dashboard_html,
    save_validation_dashboard_html,
)
from mplm.validator import (
    validate_draft,
    validate_draft_collection,
    ValidationBatchResult,
    ValidationResult,
)
from mplm.serializer import PatternRepository
```
## Pipeline function

### `run_pipeline(text, source, min_support)`

```python
from mplm import run_pipeline
```
Orchestrate all three transformation stages in sequence and return discovered patterns. This is the primary entry point for programmatic use.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | — | Raw file content (Markdown or plain text). |
| `source` | `str` | `"<mem>"` | Human-readable origin label propagated into node attrs (e.g. a file path). |
| `min_support` | `int` | `2` | Minimum number of occurrences a pattern must have to be included in the output. |
**Returns** `PatternLibrary`

Logs INFO-level structured messages at each stage boundary (enable with `logging.basicConfig(level=logging.INFO)` or `mplm --log-level INFO mine …`).
**Example**

```python
from mplm import run_pipeline
from mplm.io import load_text, save_library

text = load_text("corpus/chapter1.md")
lib = run_pipeline(text, source="chapter1.md", min_support=2)
save_library(lib, "patterns.yml")

for p in lib.patterns:
    print(p.id, p.level, p.structure["signature"], p.support)
```
## I/O functions

```python
from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec
```

All file-system access is isolated in this module so the pipeline stages remain pure functions.
### `load_text(path)`

Read a text or Markdown file and return its raw content.

**Parameters**

| Name | Type | Description |
|---|---|---|
| `path` | `str` | Absolute or relative path to the source file. |

**Returns** `str` — Full file content decoded as UTF-8.

**Raises** `FileNotFoundError` if `path` does not exist.
### `save_library(lib, path)`

Write a `PatternLibrary` to disk. Format is selected by extension: `.json` → JSON, anything else → YAML.

**Parameters**

| Name | Type | Description |
|---|---|---|
| `lib` | `PatternLibrary` | The library to persist. |
| `path` | `str` | Destination file path. |
### `load_library(path)`

Read a `PatternLibrary` from disk using the repository loader.

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source YAML or JSON path. |

**Returns** `PatternLibrary`
### `save_compiled_spec(spec, path)`

Write a compiled LLM-ready bridge spec to disk. `.json` writes JSON; everything else writes YAML.

| Parameter | Type | Description |
|---|---|---|
| `spec` | `Mapping[str, Any]` | Compiler output dict. |
| `path` | `str` | Destination YAML or JSON path. |
### `load_compiled_spec(path)`

Read a compiled bridge spec from YAML or JSON.

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source YAML or JSON path. |

**Returns** `Dict[str, Any]`
## Compiler

### `compile_pattern_spec(lib, ..., top_k_children=3)`

```python
from mplm import compile_pattern_spec
```

Compile a mined `PatternLibrary` into the LLM-ready bridge schema. The compiler fills factual fields deterministically and marks interpretive fields for review.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `lib` | `PatternLibrary` | — | Source mined library. |
| `pattern_library_path` | `str` | `"patterns.yml"` | Provenance path copied into the spec. |
| `corpus_label` | `Optional[str]` | `None` | Human-readable corpus label; defaults to the input filename stem. |
| `source_kind` | `Optional[str]` | `None` | Override for `"single_document"` or `"corpus"`. |
| `title` | `Optional[str]` | `None` | Optional output title. |
| `intent` | `Optional[str]` | `None` | Optional top-level purpose string. |
| `mined_at` | `Optional[str]` | `None` | Optional timestamp string. |
| `top_k_children` | `int` | `3` | Number of child-pattern candidates to attach to each structural slot. |
**Returns** `Dict[str, Any]`

**Example**

```python
from mplm.compiler import compile_pattern_spec
from mplm.io import load_library, save_compiled_spec

lib = load_library("patterns.yml")
spec = compile_pattern_spec(lib, pattern_library_path="patterns.yml")
save_compiled_spec(spec, "compiled-pattern-spec.yml")
```
## Validator

### `validate_draft(spec, draft_text, ..., contract_id=None, template_id=None)`

```python
from mplm import validate_draft
```

Validate generated or manually written text against a compiled spec's `validation_contracts`. The validator parses the draft into the existing AST, checks structural slot matches, checks lexical requirements, and checks policy constraints such as example overlap.
**Parameters**

| Name | Type | Default | Description |
|---|---|---|---|
| `spec` | `Mapping[str, Any]` | — | Compiled bridge spec. |
| `draft_text` | `str` | — | Generated or authored text to validate. |
| `contract_id` | `Optional[str]` | `None` | Explicit validation contract ID. |
| `template_id` | `Optional[str]` | `None` | Template ID used to resolve the contract. |
| `source` | `str` | `"<draft>"` | Provenance label attached during parsing. |
**Returns** `ValidationResult`

`ValidationResult.to_dict()` returns a serialisable dict with overall score, a dashboard-friendly numeric `sort_key`, category scores, a machine-readable `severity_summary`, and per-check outcomes.
**Example**

```python
from mplm.io import load_compiled_spec, load_text
from mplm.validator import validate_draft

spec = load_compiled_spec("compiled-pattern-spec.yml")
draft = load_text("draft.md")
result = validate_draft(spec, draft, template_id="tmpl-l5-para-para")

print(result.passed, result.score)
for check in result.checks:
    print(check.id, check.passed, check.message)
```
### `validate_draft_collection(spec, drafts, ..., contract_id=None, template_id=None)`

Validate a collection of drafts and return one `ValidationBatchResult` containing per-draft results plus aggregate counts and a failure index. `drafts` should be an iterable of `(path, text)` tuples.

```python
from mplm.validator import validate_draft_collection

batch = validate_draft_collection(
    spec,
    [
        ("drafts/a.md", draft_a),
        ("drafts/b.md", draft_b),
    ],
    template_id="tmpl-l5-para-para",
)
print(batch.total_drafts, batch.failed_drafts)
```
## Reports

### `render_validation_index_markdown(result)`

Render a `ValidationBatchResult` as a Markdown summary index.

### `render_validation_index_json(result)`

Render a `ValidationBatchResult` as formatted JSON.

### `save_validation_index(result, path)`

Write a batch validation summary index to disk. `.json` produces JSON; any other extension produces Markdown.

### `save_validation_index_bundle(result, index_path, detail_reports_dir=None)`

Write one aggregate validation index plus one detail report per draft. The detail reports are written into a sibling directory by default, and the index includes links to them.
For JSON bundles, the aggregate index includes:

- `detail_reports`: per-draft detail-report paths
- `detail_report_checks`: per-draft, per-check anchor metadata and hrefs
- `severity_index`: aggregate severity metadata for the batch

Each JSON detail report also includes:

- `index_path`: backlink to the aggregate index
- `check_anchors`: stable anchor metadata for each check
- `severity_summary`: per-draft severity counts, highest failed severity, and numeric severity ranks
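Once loaded, the keys above can be navigated directly. A minimal sketch, using only the documented key names; the value shapes shown here are illustrative assumptions, not mplm's exact output:

```python
import json

# Hypothetical bundle payload: real bundles carry more keys, and the
# value shapes under each key may differ.
index_json = json.dumps({
    "detail_reports": {"drafts/a.md": "validation-index.reports/a.json"},
    "severity_index": {"error": 1, "warning": 2},
})

payload = json.loads(index_json)
# Resolve the detail-report path recorded for one draft.
report_path = payload["detail_reports"]["drafts/a.md"]
```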
### `render_validation_report_markdown(result)`

Render a `ValidationResult` as a Markdown review report with failed checks listed before passing checks.

### `render_validation_report_json(result)`

Render a `ValidationResult` as formatted JSON.

### `save_validation_report(result, path)`

Write a validation report to disk. `.json` produces JSON; any other extension produces Markdown.
### `render_validation_dashboard_html(payload, title=None)`

Render a static HTML dashboard from a validation index JSON payload. The payload is expected to be the JSON written by `save_validation_index_bundle(..., "validation-index.json")` or `render_validation_index_json(...)`.

### `save_validation_dashboard_html(payload, path, title=None)`

Write the dashboard HTML document to disk.

```python
import json

from mplm.dashboard import save_validation_dashboard_html
from mplm.reports import save_validation_index_bundle

save_validation_index_bundle(batch, "validation-index.json")
with open("validation-index.json", encoding="utf-8") as fh:
    payload = json.load(fh)
save_validation_dashboard_html(payload, "validation-dashboard.html")
```
## Data models

```python
from mplm import Node, Pattern, PatternLibrary, Level
```

### `Level`

`IntEnum` representing the six hierarchical depths of a document.
| Constant | Value | Meaning |
|---|---|---|
| `Level.PHRASE` | 1 | Comma/semicolon/colon clause — finest unit. |
| `Level.LINE` | 2 | Sentence or logical line. |
| `Level.PARAGRAPH` | 3 | Consecutive non-blank lines; also list items and table rows. |
| `Level.CHUNK` | 4 | Topic grouping (reserved for future use). |
| `Level.SECTION` | 5 | Heading-delimited block. |
| `Level.DOCUMENT` | 6 | Entire file — coarsest unit. |
**Example**

```python
from mplm import Level

for p in lib.patterns:
    if p.level == Level.PARAGRAPH:
        print(p.structure["signature"])
```
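Because `Level` is an `IntEnum`, members compare equal to plain integers and `Level(n)` coerces an integer to a member. A standalone mirror of the documented constants, for illustration only (import the real enum from `mplm` in application code):

```python
from enum import IntEnum

class Level(IntEnum):
    # Values copied from the table above.
    PHRASE = 1
    LINE = 2
    PARAGRAPH = 3
    CHUNK = 4
    SECTION = 5
    DOCUMENT = 6

# IntEnum members interoperate with plain ints.
assert Level.PARAGRAPH == 3
assert Level(5) is Level.SECTION
```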
### `Node`

Dataclass representing a single structural unit at any `Level`.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | required | Unique identifier within the document tree. Must be non-empty. |
| `level` | `Level` | required | Structural depth. Integer values are coerced to `Level`. |
| `text` | `str` | required | Raw text content of the node. |
| `children` | `List[Node]` | `[]` | Ordered child nodes, one level finer than `level`. |
| `source_slice` | `Optional[Tuple[int, int]]` | `None` | `(start_line, end_line)` span in the source file. |
| `attrs` | `Dict[str, Any]` | `{}` | Stage-populated metadata — see table below. |
**`attrs` keys**

| Key | Type | Set by | Description |
|---|---|---|---|
| `"source"` | `str` | `Normalizer`, `BoundaryDetector` | Origin file path or label. |
| `"lineno"` | `int` | `Normalizer` | Line number of the raw line. |
| `"depth"` | `int` | `Normalizer` | Heading depth (`#` count) for SECTION nodes. |
| `"start_line"` | `int` | `BoundaryDetector` | First line of a paragraph or block. |
| `"end_line"` | `int` | `BoundaryDetector` | Last line of a paragraph or block. |
| `"kind"` | `str` | `BoundaryDetector` | Block sub-type: `"paragraph"`, `"list-item"`, or `"table-row"`. |
**Raises** `ValueError` if `id` is empty.

**Methods**

#### `walk() → List[Node]`

Return a flat list of this node and all descendants in pre-order (self first, then children recursively).

```python
all_nodes = doc.walk()
phrases = [n for n in all_nodes if n.level == Level.PHRASE]
```
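The traversal order can be illustrated with a stripped-down stand-in for `Node` (a sketch, not mplm's implementation):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MiniNode:
    # Minimal stand-in for Node: just an id and children.
    id: str
    children: List["MiniNode"] = field(default_factory=list)

    def walk(self) -> List["MiniNode"]:
        # Pre-order: self first, then each child subtree in order.
        out = [self]
        for child in self.children:
            out.extend(child.walk())
        return out

tree = MiniNode("doc", [MiniNode("s1", [MiniNode("p1")]), MiniNode("s2")])
order = [n.id for n in tree.walk()]  # ['doc', 's1', 'p1', 's2']
```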
#### `by_level(level) → List[Node]`

Shorthand for filtering `walk()` by level.

| Parameter | Type | Description |
|---|---|---|
| `level` | `Level` | The level to filter by. |

```python
paragraphs = doc.by_level(Level.PARAGRAPH)
```
### `Pattern`

Dataclass representing a discovered structural pattern.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | required | Pattern identifier, e.g. `"P-3-1292"`. Must be non-empty. |
| `level` | `Level` | required | The level at which this pattern appears. |
| `selector` | `Dict[str, Any]` | required | DSL query descriptor for matching, e.g. `{"engine": "dsl", "query": "level==3 and sig=='S'"}`. |
| `structure` | `Dict[str, Any]` | required | Contains key `"signature"` — the structural shape string. |
| `support` | `int` | required | Number of occurrences found. Must be `>= 0`. |
| `examples` | `List[Dict[str, Any]]` | `[]` | Up to 3 representative instances — see below. |
| `centroid` | `List[str]` | `[]` | `structure["signature"]` split into a list of tokens. |
**Example dict shape**

| Key | Type | Description |
|---|---|---|
| `"source"` | `str` | Origin file path. |
| `"lines"` | `List[int]` | `[start_line, end_line]` in the source. |
| `"excerpt"` | `str` | First 200 characters of the node's text. |
**Signature tokens**

| Token | Appears in | Meaning |
|---|---|---|
| `"EMPTY"` | leaf | 0 words |
| `"XS"` | leaf | 1–4 words |
| `"S"` | leaf or child label | 5–11 words / LINE child |
| `"M"` | leaf | 12–24 words |
| `"L"` | leaf | 25+ words |
| `"P"` | child label | PHRASE child |
| `"PARA"` | child label | PARAGRAPH child |
| `"CHUNK"` | child label | CHUNK child |
| `"SEC"` | child label | SECTION child |
**Raises** `ValueError` if `id` is empty or `support < 0`.

**Example**

```python
for p in lib.patterns:
    sig = p.structure["signature"]
    print(f"{p.id} level={int(p.level)} sig={sig!r} n={p.support}")
    for ex in p.examples:
        print(f"  {ex['source']}:{ex['lines']} {ex['excerpt'][:60]}")
```
### `PatternLibrary`

Ordered collection of `Pattern` objects with provenance metadata.

**Constructor**
| Field | Type | Default | Description |
|---|---|---|---|
| `patterns` | `List[Pattern]` | `[]` | Discovered patterns in insertion order. |
| `meta` | `Dict[str, Any]` | `{}` | Provenance metadata, e.g. `{"min_support": 2, "scope": "corpus"}`. |
**Methods**

#### `add(p) → None`

Append a pattern to the library.

| Parameter | Type | Description |
|---|---|---|
| `p` | `Pattern` | Pattern to add. |

**Example**

```python
# Filter patterns above a support threshold
strong = PatternLibrary(meta=lib.meta)
for p in lib.patterns:
    if p.support >= 10:
        strong.add(p)
```
## Pipeline stages

The three stages below are composed by `run_pipeline()`. Use them directly when you need access to intermediate AST representations.

### `Normalizer`

```python
from mplm.normalizer import Normalizer
```

Convert raw text into a rough AST. This is a pure transformation — no file I/O.
#### `parse(text, source) → Node`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | required | Raw file content. |
| `source` | `str` | `"<mem>"` | Origin label written into `node.attrs["source"]`. |

**Returns** A `Level.DOCUMENT` `Node` whose children are `Level.SECTION` nodes, each containing raw `Level.LINE` children.

```python
from mplm import Level
from mplm.normalizer import Normalizer

norm = Normalizer()
doc = norm.parse("# Intro\n\nFirst paragraph.", source="draft.md")
sections = doc.by_level(Level.SECTION)
```
### `BoundaryDetector`

```python
from mplm.boundary import BoundaryDetector
```

Refine the AST produced by `Normalizer` into block, sentence, and phrase nodes. This is a pure transformation — the input `Node` tree is not mutated.

#### `detect(doc) → Node`

| Parameter | Type | Description |
|---|---|---|
| `doc` | `Node` | A `Level.DOCUMENT` node from `Normalizer.parse()`. |

**Returns** A new `Level.DOCUMENT` node whose SECTION children contain `Level.PARAGRAPH` / `Level.LINE` blocks, each split into `Level.LINE` sentences and `Level.PHRASE` phrases.

```python
from mplm.boundary import BoundaryDetector

det = BoundaryDetector()
refined = det.detect(doc)  # doc is unchanged
paragraphs = refined.by_level(Level.PARAGRAPH)
```
**Subclassing** Override `detect()` to plug in a different splitting strategy (e.g. spaCy sentence segmentation):

```python
class SpacyDetector(BoundaryDetector):
    def detect(self, doc: Node) -> Node:
        ...  # spaCy-powered implementation
```
### `PatternMiner`

```python
from mplm.miner import PatternMiner
```

Discover repeated structural patterns by grouping nodes that share the same `(level, signature)`. This is a pure transformation — no I/O.
#### `PatternMiner(min_support)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `min_support` | `int` | `2` | Groups smaller than this are discarded. |
#### `mine(doc) → PatternLibrary`

| Parameter | Type | Description |
|---|---|---|
| `doc` | `Node` | A refined `Level.DOCUMENT` node from `BoundaryDetector.detect()`. |

**Returns** A `PatternLibrary` containing one `Pattern` per `(level, signature)` group that meets `min_support`.

```python
from mplm.miner import PatternMiner

miner = PatternMiner(min_support=3)
lib = miner.mine(refined)
print(len(lib.patterns), "patterns found")
```
## Serializer

```python
from mplm.serializer import PatternRepository
```

Read and write `PatternLibrary` objects. Format is auto-detected from the file extension (`.json` → JSON, anything else → YAML).
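The dispatch rule is purely extension-based. A sketch of the equivalent check, illustrative only (not mplm's internal code):

```python
from pathlib import Path

def storage_format(path: str) -> str:
    # ".json" selects JSON; every other extension (including none)
    # falls back to YAML, as described above.
    return "json" if Path(path).suffix == ".json" else "yaml"
```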
### `PatternRepository`

#### `dump(lib, path) → None`

| Parameter | Type | Description |
|---|---|---|
| `lib` | `PatternLibrary` | The library to write. |
| `path` | `str` | Destination path. |
#### `load(path) → PatternLibrary`

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Source path. |

**Returns** A fully populated `PatternLibrary`.

```python
from mplm.serializer import PatternRepository

repo = PatternRepository()
repo.dump(lib, "patterns.yml")
loaded = repo.load("patterns.yml")
assert loaded.meta == lib.meta
```
## CLI

```
mplm [--log-level LEVEL] COMMAND [OPTIONS]
```

**Global option**

| Option | Values | Default | Description |
|---|---|---|---|
| `--log-level` | `DEBUG` `INFO` `WARNING` `ERROR` | `WARNING` | Logging verbosity. Set to `INFO` to see pipeline stage timings. |
### `mplm mine PATH`

Mine patterns from `PATH` and write to an output file.

```
mplm mine examples/example1.md --out patterns.yml --min-support 2
mplm --log-level INFO mine corpus/doc.md --out out.yml
```

| Option | Type | Default | Description |
|---|---|---|---|
| `--out` | path | `patterns.yml` | Output file. Extension selects format (`.yml` or `.json`). |
| `--min-support` | int | `2` | Minimum occurrence count. |
### `mplm preview PATH`

Print a hierarchical AST outline to stdout — useful for verifying that boundaries are detected correctly before mining.

```
mplm preview examples/example1.md
```

Output format: `- L{level} {node_id}: {first 60 chars of text}`
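The outline shape can be reproduced exactly from the documented format string; a sketch (the helper name is illustrative, not part of mplm):

```python
def outline_line(level: int, node_id: str, text: str) -> str:
    # Matches the documented shape:
    # "- L{level} {node_id}: {first 60 chars of text}"
    return f"- L{level} {node_id}: {text[:60]}"

print(outline_line(5, "sec-1", "Introduction"))  # - L5 sec-1: Introduction
```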
### `mplm validate-index SPEC_PATH DRAFTS_ROOT`

Validate a whole folder of drafts and write one aggregate report bundle.

```
mplm validate-index compiled-pattern-spec.yml drafts \
  --template-id tmpl-l5-para-para \
  --out validation-index.json
```

When `--out` ends in `.json`, the command writes a JSON bundle plus one linked JSON detail report per draft.
### `mplm render-dashboard [JSON_PATH]`

Render a static HTML dashboard in either of two modes:

- **Bundle mode:** read an existing validation index JSON bundle.
- **Direct mode:** read a compiled spec plus a drafts folder and validate on the fly.

```
mplm render-dashboard validation-index.json --out validation-dashboard.html
mplm render-dashboard validation-index.json --out dashboards/review.html --title "Template Review"
mplm render-dashboard \
  --spec-path compiled-pattern-spec.yml \
  --drafts-root drafts \
  --template-id tmpl-l5-para-para \
  --out validation-dashboard.html
```
| Option | Type | Default | Description |
|---|---|---|---|
| `--out` | path | `validation-dashboard.html` | HTML dashboard output path. |
| `--title` | str | `None` | Optional dashboard title override. |
| `--spec-path` | path | `None` | Compiled spec path for direct mode. |
| `--drafts-root` | path | `None` | Draft folder for direct mode. |
| `--contract-id` | str | `None` | Optional validation contract ID for direct mode. |
| `--template-id` | str | `None` | Optional template ID for direct mode. |
| `--exts` | str | `md,txt` | Draft file extensions for direct mode. |
| `--recursive` / `--no-recursive` | flag | recursive | Whether to recurse under `--drafts-root`. |
In bundle mode, if the dashboard is written to a different directory than the source JSON, detail-report links are automatically rebased so the deep links still work. The rendered dashboard also includes client-side CSV export for the currently filtered draft queue.
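The rebasing amounts to re-expressing each href relative to the dashboard's directory instead of the index's. A sketch under that assumption; the function name is hypothetical, not mplm's API:

```python
import posixpath

def rebase_href(index_dir: str, dashboard_dir: str, href: str) -> str:
    # An href stored relative to the index JSON is resolved against the
    # index directory, then re-expressed relative to the dashboard dir.
    resolved = posixpath.normpath(posixpath.join(index_dir, href))
    return posixpath.relpath(resolved, dashboard_dir)

rebase_href("out", "dashboards", "reports/a.json")  # "../out/reports/a.json"
```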
## Exceptions

| Exception | Raised by | Condition |
|---|---|---|
| `ValueError` | `Node.__init__` | `id` is an empty string |
| `ValueError` | `Node.__init__` | `level` cannot be coerced to `Level` |
| `ValueError` | `Pattern.__init__` | `id` is an empty string |
| `ValueError` | `Pattern.__init__` | `support < 0` |
| `FileNotFoundError` | `load_text()` | `path` does not exist |
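The `Pattern` constructor guards reduce to two simple checks; a sketch mirroring the documented conditions (the helper name is illustrative, not part of mplm):

```python
def check_pattern_args(pattern_id: str, support: int) -> None:
    # Mirrors the conditions in the table: an empty id or a negative
    # support count raises ValueError.
    if not pattern_id:
        raise ValueError("id must be a non-empty string")
    if support < 0:
        raise ValueError("support must be >= 0")
```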