API Reference

All public symbols are importable from the top-level package unless otherwise noted.

from mplm import (
    run_pipeline,
    compile_pattern_spec,
    validate_draft,
    validate_draft_collection,
    render_validation_dashboard_html,
    save_validation_dashboard_html,
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
    Node,
    Pattern,
    PatternLibrary,
    Level,
)
from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec
from mplm.normalizer import Normalizer
from mplm.boundary import BoundaryDetector
from mplm.miner import PatternMiner
from mplm.compiler import compile_pattern_spec
from mplm.reports import (
    render_validation_index_markdown,
    render_validation_index_json,
    save_validation_index,
    save_validation_index_bundle,
    render_validation_report_markdown,
    render_validation_report_json,
    save_validation_report,
)
from mplm.dashboard import (
    render_validation_dashboard_html,
    save_validation_dashboard_html,
)
from mplm.validator import (
    validate_draft,
    validate_draft_collection,
    ValidationBatchResult,
    ValidationResult,
)
from mplm.serializer import PatternRepository

Pipeline function

run_pipeline(text, source, min_support)

from mplm import run_pipeline

Orchestrate all three transformation stages in sequence and return discovered patterns. This is the primary entry point for programmatic use.

Parameters

Name Type Default Description
text str required Raw file content (Markdown or plain text).
source str "<mem>" Human-readable origin label propagated into node attrs (e.g. a file path).
min_support int 2 Minimum number of occurrences a pattern must have to be included in the output.

Returns PatternLibrary

Logs INFO-level structured messages at each stage boundary (enable with logging.basicConfig(level=logging.INFO) or mplm --log-level INFO mine …).

Example

from mplm import run_pipeline
from mplm.io import load_text, save_library

text = load_text("corpus/chapter1.md")
lib = run_pipeline(text, source="chapter1.md", min_support=2)
save_library(lib, "patterns.yml")

for p in lib.patterns:
    print(p.id, p.level, p.structure["signature"], p.support)

I/O functions

from mplm.io import load_text, save_library, load_library, load_compiled_spec, save_compiled_spec

All file-system access is isolated in this module so the pipeline stages remain pure functions.

load_text(path)

Read a text or Markdown file and return its raw content.

Parameters

Name Type Description
path str Absolute or relative path to the source file.

Returns str — Full file content decoded as UTF-8.

Raises FileNotFoundError if path does not exist.


save_library(lib, path)

Write a PatternLibrary to disk. Format is selected by extension: .json → JSON, anything else → YAML.

Parameters

Name Type Description
lib PatternLibrary The library to persist.
path str Destination file path.
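The extension rule can be sketched as a small helper (pick_format is hypothetical, not part of the package):

```python
from pathlib import Path

def pick_format(path: str) -> str:
    # Documented rule: ".json" selects JSON; any other extension selects YAML.
    return "json" if Path(path).suffix == ".json" else "yaml"

print(pick_format("patterns.yml"))   # yaml
print(pick_format("patterns.json"))  # json
```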

load_library(path)

Read a PatternLibrary from disk using the repository loader.

Parameter Type Description
path str Source YAML or JSON path.

Returns PatternLibrary


save_compiled_spec(spec, path)

Write a compiled LLM-ready bridge spec to disk. .json writes JSON; everything else writes YAML.

Parameter Type Description
spec Mapping[str, Any] Compiler output dict.
path str Destination YAML or JSON path.

load_compiled_spec(path)

Read a compiled bridge spec from YAML or JSON.

Parameter Type Description
path str Source YAML or JSON path.

Returns Dict[str, Any]


Compiler

compile_pattern_spec(lib, ..., top_k_children=3)

from mplm import compile_pattern_spec

Compile a mined PatternLibrary into the LLM-ready bridge schema. The compiler fills factual fields deterministically and marks interpretive fields for review.

Parameters

Name Type Default Description
lib PatternLibrary required Source mined library.
pattern_library_path str "patterns.yml" Provenance path copied into the spec.
corpus_label Optional[str] None Human-readable corpus label; defaults to the input filename stem.
source_kind Optional[str] None Override for "single_document" or "corpus".
title Optional[str] None Optional output title.
intent Optional[str] None Optional top-level purpose string.
mined_at Optional[str] None Optional timestamp string.
top_k_children int 3 Number of child-pattern candidates to attach to each structural slot.

Returns Dict[str, Any]

Example

from mplm.compiler import compile_pattern_spec
from mplm.io import load_library, save_compiled_spec

lib = load_library("patterns.yml")
spec = compile_pattern_spec(lib, pattern_library_path="patterns.yml")
save_compiled_spec(spec, "compiled-pattern-spec.yml")

Validator

validate_draft(spec, draft_text, ..., contract_id=None, template_id=None)

from mplm import validate_draft

Validate generated or manually written text against a compiled spec's validation_contracts. The validator parses the draft into the existing AST, checks structural slot matches, checks lexical requirements, and checks policy constraints such as example overlap.

Parameters

Name Type Default Description
spec Mapping[str, Any] required Compiled bridge spec.
draft_text str required Generated or authored text to validate.
contract_id Optional[str] None Explicit validation contract ID.
template_id Optional[str] None Template ID used to resolve the contract.
source str "<draft>" Provenance label attached during parsing.

Returns ValidationResult

ValidationResult.to_dict() returns a serialisable dict with overall score, a dashboard-friendly numeric sort_key, category scores, a machine-readable severity_summary, and per-check outcomes.

Example

from mplm.io import load_compiled_spec, load_text
from mplm.validator import validate_draft

spec = load_compiled_spec("compiled-pattern-spec.yml")
draft = load_text("draft.md")
result = validate_draft(spec, draft, template_id="tmpl-l5-para-para")

print(result.passed, result.score)
for check in result.checks:
    print(check.id, check.passed, check.message)

validate_draft_collection(spec, drafts, ..., contract_id=None, template_id=None)

Validate a collection of drafts and return one ValidationBatchResult containing per-draft results plus aggregate counts and a failure index.

drafts should be an iterable of (path, text) tuples.

from mplm.validator import validate_draft_collection

batch = validate_draft_collection(
    spec,
    [
        ("drafts/a.md", draft_a),
        ("drafts/b.md", draft_b),
    ],
    template_id="tmpl-l5-para-para",
)

print(batch.total_drafts, batch.failed_drafts)

Reports

render_validation_index_markdown(result)

Render a ValidationBatchResult as a Markdown summary index.

render_validation_index_json(result)

Render a ValidationBatchResult as formatted JSON.

save_validation_index(result, path)

Write a batch validation summary index to disk. .json produces JSON; any other extension produces Markdown.

save_validation_index_bundle(result, index_path, detail_reports_dir=None)

Write one aggregate validation index plus one detail report per draft. The detail reports are written into a sibling directory by default, and the index includes links to them.

For JSON bundles, the aggregate index includes:

  • detail_reports: per-draft detail-report paths
  • detail_report_checks: per-draft, per-check anchor metadata and hrefs
  • severity_index: aggregate severity metadata for the batch

Each JSON detail report also includes:

  • index_path: backlink to the aggregate index
  • check_anchors: stable anchor metadata for each check
  • severity_summary: per-draft severity counts, highest failed severity, and numeric severity ranks
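A minimal call, assuming batch is a ValidationBatchResult from validate_draft_collection:

```python
from mplm.reports import save_validation_index_bundle

# Writes validation-index.json plus one linked detail report per draft.
save_validation_index_bundle(batch, "validation-index.json")

# Pass detail_reports_dir to choose where the per-draft reports land.
save_validation_index_bundle(batch, "validation-index.json", detail_reports_dir="reports")
```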

render_validation_report_markdown(result)

Render a ValidationResult as a Markdown review report with failed checks listed before passing checks.

render_validation_report_json(result)

Render a ValidationResult as formatted JSON.

save_validation_report(result, path)

Write a validation report to disk. .json produces JSON; any other extension produces Markdown.

render_validation_dashboard_html(payload, title=None)

Render a static HTML dashboard from a validation index JSON payload. The payload is expected to be the JSON written by save_validation_index_bundle(..., "validation-index.json") or render_validation_index_json(...).

save_validation_dashboard_html(payload, path, title=None)

Write the dashboard HTML document to disk.

import json

from mplm.dashboard import save_validation_dashboard_html
from mplm.reports import save_validation_index_bundle

save_validation_index_bundle(batch, "validation-index.json")
with open("validation-index.json") as fh:
    payload = json.load(fh)
save_validation_dashboard_html(payload, "validation-dashboard.html")

Data models

from mplm import Node, Pattern, PatternLibrary, Level

Level

IntEnum representing the six hierarchical depths of a document.

Constant Value Meaning
Level.PHRASE 1 Comma/semicolon/colon clause — finest unit.
Level.LINE 2 Sentence or logical line.
Level.PARAGRAPH 3 Consecutive non-blank lines; also list items and table rows.
Level.CHUNK 4 Topic grouping (reserved for future use).
Level.SECTION 5 Heading-delimited block.
Level.DOCUMENT 6 Entire file — coarsest unit.

Example

from mplm import Level

for p in lib.patterns:
    if p.level == Level.PARAGRAPH:
        print(p.structure["signature"])

Node

Dataclass representing a single structural unit at any Level.

Constructor

Field Type Default Description
id str required Unique identifier within the document tree. Must be non-empty.
level Level required Structural depth. Integer values are coerced to Level.
text str required Raw text content of the node.
children List[Node] [] Ordered child nodes, one level finer than level.
source_slice Optional[Tuple[int, int]] None (start_line, end_line) span in the source file.
attrs Dict[str, Any] {} Stage-populated metadata — see table below.

attrs keys

Key Type Set by Description
"source" str Normalizer, BoundaryDetector Origin file path or label.
"lineno" int Normalizer Line number of the raw line.
"depth" int Normalizer Heading depth (# count) for SECTION nodes.
"start_line" int BoundaryDetector First line of a paragraph or block.
"end_line" int BoundaryDetector Last line of a paragraph or block.
"kind" str BoundaryDetector Block sub-type: "paragraph", "list-item", or "table-row".

Raises ValueError if id is empty.
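Constructing a node by hand (field values here are illustrative; attrs are normally stage-populated):

```python
from mplm import Node, Level

node = Node(
    id="sec-1",
    level=Level.SECTION,
    text="# Intro",
    attrs={"source": "draft.md", "depth": 1},
)
print(node.attrs["source"])  # draft.md
```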

Methods

walk() → List[Node]

Return a flat list of this node and all descendants in pre-order (self first, then children recursively).

all_nodes = doc.walk()
phrases = [n for n in all_nodes if n.level == Level.PHRASE]

by_level(level) → List[Node]

Shorthand for filtering walk() by level.

Parameter Type Description
level Level The level to filter by.
paragraphs = doc.by_level(Level.PARAGRAPH)

Pattern

Dataclass representing a discovered structural pattern.

Constructor

Field Type Default Description
id str required Pattern identifier, e.g. "P-3-1292". Must be non-empty.
level Level required The level at which this pattern appears.
selector Dict[str, Any] required DSL query descriptor for matching, e.g. {"engine": "dsl", "query": "level==3 and sig=='S'"}.
structure Dict[str, Any] required Contains key "signature" — the structural shape string.
support int required Number of occurrences found. Must be >= 0.
examples List[Dict[str, Any]] [] Up to 3 representative instances — see below.
centroid List[str] [] structure["signature"] split into a list of tokens.

Example dict shape

Key Type Description
"source" str Origin file path.
"lines" List[int, int] [start_line, end_line] in the source.
"excerpt" str First 200 characters of the node's text.

Signature tokens

Token Appears in Meaning
"EMPTY" leaf 0 words
"XS" leaf 1–4 words
"S" leaf or child label 5–11 words / LINE child
"M" leaf 12–24 words
"L" leaf 25+ words
"P" child label PHRASE child
"PARA" child label PARAGRAPH child
"CHUNK" child label CHUNK child
"SEC" child label SECTION child

Raises ValueError if id is empty or support < 0.

Example

for p in lib.patterns:
    sig = p.structure["signature"]
    print(f"{p.id}  level={int(p.level)}  sig={sig!r}  n={p.support}")
    for ex in p.examples:
        print(f"  {ex['source']}:{ex['lines']}  {ex['excerpt'][:60]}")

PatternLibrary

Ordered collection of Pattern objects with provenance metadata.

Constructor

Field Type Default Description
patterns List[Pattern] [] Discovered patterns in insertion order.
meta Dict[str, Any] {} Provenance metadata, e.g. {"min_support": 2, "scope": "corpus"}.

Methods

add(p) → None

Append a pattern to the library.

Parameter Type Description
p Pattern Pattern to add.

Example

# Filter patterns above a support threshold
strong = PatternLibrary(meta=lib.meta)
for p in lib.patterns:
    if p.support >= 10:
        strong.add(p)

Pipeline stages

The three stages below are composed by run_pipeline(). Use them directly when you need access to intermediate AST representations.

Normalizer

from mplm.normalizer import Normalizer

Convert raw text into a rough AST. This is a pure transformation — no file I/O.

parse(text, source) → Node

Parameter Type Default Description
text str required Raw file content.
source str "<mem>" Origin label written into node.attrs["source"].

Returns A Level.DOCUMENT Node whose children are Level.SECTION nodes, each containing raw Level.LINE children.

from mplm import Level
from mplm.normalizer import Normalizer

norm = Normalizer()
doc = norm.parse("# Intro\n\nFirst paragraph.", source="draft.md")
sections = doc.by_level(Level.SECTION)

BoundaryDetector

from mplm.boundary import BoundaryDetector

Refine the AST produced by Normalizer into block, sentence, and phrase nodes. This is a pure transformation — the input Node tree is not mutated.

detect(doc) → Node

Parameter Type Description
doc Node A Level.DOCUMENT node from Normalizer.parse().

Returns A new Level.DOCUMENT node whose SECTION children contain Level.PARAGRAPH / Level.LINE blocks, each split into Level.LINE sentences and Level.PHRASE phrases.

from mplm import Level
from mplm.boundary import BoundaryDetector

det = BoundaryDetector()
refined = det.detect(doc)           # doc is unchanged
paragraphs = refined.by_level(Level.PARAGRAPH)

Subclassing Override detect() to plug in a different splitting strategy (e.g. spaCy sentence segmentation):

from mplm import Node

class SpacyDetector(BoundaryDetector):
    def detect(self, doc: Node) -> Node:
        ...  # spaCy-powered implementation

PatternMiner

from mplm.miner import PatternMiner

Discover repeated structural patterns by grouping nodes that share the same (level, signature). This is a pure transformation — no I/O.

PatternMiner(min_support)

Parameter Type Default Description
min_support int 2 Groups smaller than this are discarded.

mine(doc) → PatternLibrary

Parameter Type Description
doc Node A refined Level.DOCUMENT node from BoundaryDetector.detect().

Returns A PatternLibrary containing one Pattern per (level, signature) group that meets min_support.

from mplm.miner import PatternMiner

miner = PatternMiner(min_support=3)
lib = miner.mine(refined)
print(len(lib.patterns), "patterns found")

Serializer

from mplm.serializer import PatternRepository

Read and write PatternLibrary objects. Format is auto-detected from the file extension (.json → JSON, anything else → YAML).

PatternRepository

dump(lib, path) → None

Parameter Type Description
lib PatternLibrary The library to write.
path str Destination path.

load(path) → PatternLibrary

Parameter Type Description
path str Source path.

Returns A fully populated PatternLibrary.

from mplm.serializer import PatternRepository

repo = PatternRepository()
repo.dump(lib, "patterns.yml")
loaded = repo.load("patterns.yml")
assert loaded.meta == lib.meta

CLI

mplm [--log-level LEVEL] COMMAND [OPTIONS]

Global option

Option Values Default Description
--log-level DEBUG | INFO | WARNING | ERROR WARNING Logging verbosity. Set to INFO to see pipeline stage timings.

mplm mine PATH

Mine patterns from PATH and write to an output file.

mplm mine examples/example1.md --out patterns.yml --min-support 2
mplm --log-level INFO mine corpus/doc.md --out out.yml

Option Type Default Description
--out path patterns.yml Output file. Extension selects format (.yml or .json).
--min-support int 2 Minimum occurrence count.

mplm preview PATH

Print a hierarchical AST outline to stdout — useful for verifying that boundaries are detected correctly before mining.

mplm preview examples/example1.md

Output format: - L{level} {node_id}: {first 60 chars of text}


mplm validate-index SPEC_PATH DRAFTS_ROOT

Validate a whole folder of drafts and write one aggregate report bundle.

mplm validate-index compiled-pattern-spec.yml drafts \
  --template-id tmpl-l5-para-para \
  --out validation-index.json

When --out ends in .json, the command writes a JSON bundle plus one linked JSON detail report per draft.


mplm render-dashboard [JSON_PATH]

Render a static HTML dashboard in either of two modes:

  • Bundle mode: read an existing validation index JSON bundle.
  • Direct mode: read a compiled spec plus a drafts folder and validate on the fly.

mplm render-dashboard validation-index.json --out validation-dashboard.html
mplm render-dashboard validation-index.json --out dashboards/review.html --title "Template Review"
mplm render-dashboard \
  --spec-path compiled-pattern-spec.yml \
  --drafts-root drafts \
  --template-id tmpl-l5-para-para \
  --out validation-dashboard.html

Option Type Default Description
--out path validation-dashboard.html HTML dashboard output path.
--title str None Optional dashboard title override.
--spec-path path None Compiled spec path for direct mode.
--drafts-root path None Draft folder for direct mode.
--contract-id str None Optional validation contract ID for direct mode.
--template-id str None Optional template ID for direct mode.
--exts str md,txt Draft file extensions for direct mode.
--recursive / --no-recursive flag recursive Whether to recurse under --drafts-root.

In bundle mode, if the dashboard is written to a different directory than the source JSON, detail-report links are automatically rebased so the deep links still work. The rendered dashboard also includes client-side CSV export for the currently filtered draft queue.


Exceptions

Exception Raised by Condition
ValueError Node.__init__ id is an empty string
ValueError Node.__init__ level cannot be coerced to Level
ValueError Pattern.__init__ id is an empty string
ValueError Pattern.__init__ support < 0
FileNotFoundError load_text() path does not exist
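The constructor checks can be exercised directly; for example:

```python
from mplm import Node, Level

try:
    Node(id="", level=Level.LINE, text="oops")
except ValueError as exc:
    print("rejected:", exc)
```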