Pattern Library Generator – User Guide
Overview
generate_pattern_library_md.py converts the YAML output from your pattern miner into a human-readable Pattern Library in Markdown.
What it does:
- Creates one combined Pattern Library (Markdown) and/or separate Markdown files for each level.
- Groups patterns by hierarchical level (macro to micro structure).
- Shows how each pattern is composed of lower-level patterns.
- For the smallest patterns, lists most common words in examples.
- Cross-links related patterns between levels.
- Embeds an optional Mermaid diagram of the hierarchy.
This lets you move from raw pattern mining output to something you can actually read, navigate, and use for content standardization, analysis, or redesign.
1. Prerequisites
You need:
- Python 3.8 or newer
- PyYAML:
  ```bash
  pip install pyyaml
  ```
- A YAML file produced by the pattern miner (e.g., corpus_mine.py), which mines patterns from .md or .txt files.
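The first two prerequisites can be verified before running the generator. The following is a minimal sketch, not part of the script itself; the function name `check_environment` and its parameters are hypothetical and exist only for illustration.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def check_environment(version_info=sys.version_info, module_name="yaml"):
    """Return a list of problems; an empty list means the checks passed.

    Both parameters are hypothetical hooks that make the check easy to test.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    # PyYAML is imported as "yaml"; find_spec returns None when it is absent.
    if importlib.util.find_spec(module_name) is None:
        problems.append("PyYAML is missing; run: pip install pyyaml")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the file prints nothing when both checks pass.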
2. Running the Script
Single Combined Markdown Output
```bash
python3 generate_pattern_library_md.py \
  --in /path/to/patterns.yml \
  --out /path/to/pattern-library.md \
  --top-k 5 \
  --lowest-words 12 \
  --mermaid
```
Separate Files Per Level
```bash
python3 generate_pattern_library_md.py \
  --in /path/to/patterns.yml \
  --out /path/to/output-folder \
  --split \
  --mermaid
```
This creates level-1.md, level-2.md, etc., in the specified folder.
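To preview which files a split run would produce, the naming scheme can be sketched as below. This assumes only the `level-N.md` pattern described here; the helper `level_output_paths` is hypothetical and is not taken from the script.

```python
from pathlib import Path


def level_output_paths(out_dir, levels):
    """Map each level number to its per-level Markdown path (level-N.md)."""
    folder = Path(out_dir)
    # Sorting keeps the mapping in ascending level order.
    return {level: folder / f"level-{level}.md" for level in sorted(levels)}


if __name__ == "__main__":
    for level, path in level_output_paths("/path/to/output-folder", [1, 2, 3]).items():
        print(level, path)
```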
3. Arguments
| Argument | Required | Description |
|---|---|---|
| --in | ✅ | Path to mined YAML patterns file. |
| --out | ✅ | Path to output file (or folder if --split is used). |
| --split | ❌ | If set, generates separate per-level Markdown files instead of one combined file. |
| --top-k | ❌ | Number of lower-level patterns to cross-reference for each higher-level token (default: 5). |
| --lowest-words | ❌ | Number of top words to list for level-1 patterns (default: 12). |
| --mermaid | ❌ | If set, embeds a Mermaid diagram showing the level hierarchy. |
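For readers who want to see how such an interface is typically declared, here is a sketch of the arguments above using the standard `argparse` module. It mirrors the table only; the script's actual parser may differ, and the attribute names `in_path` and `out_path` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="generate_pattern_library_md.py")
    # "in" is a Python keyword, so the value is stored under a different name.
    parser.add_argument("--in", dest="in_path", required=True,
                        help="Path to mined YAML patterns file.")
    parser.add_argument("--out", dest="out_path", required=True,
                        help="Path to output file (or folder if --split is used).")
    parser.add_argument("--split", action="store_true",
                        help="Write separate per-level Markdown files.")
    parser.add_argument("--top-k", type=int, default=5,
                        help="Lower-level patterns to cross-reference.")
    parser.add_argument("--lowest-words", type=int, default=12,
                        help="Top words to list for level-1 patterns.")
    parser.add_argument("--mermaid", action="store_true",
                        help="Embed a Mermaid diagram of the level hierarchy.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--in", "patterns.yml", "--out", "library.md"])
    print(args.top_k, args.lowest_words, args.split, args.mermaid)
```

Note that `argparse` converts the dashes in `--top-k` and `--lowest-words` to underscores, so the values are read as `args.top_k` and `args.lowest_words`.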
4. Output Structure
4.1 Meta Section
Displays metadata from the miner output (e.g., min_support, mining parameters).
4.2 Table of Contents
Links to all patterns grouped by level.
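A table of contents of this kind can be sketched as follows. The input shape (a list of dicts with `level` and `id` keys) and the lowercase anchor format are assumptions for illustration; the miner's real YAML schema and the generator's anchors may differ.

```python
from collections import defaultdict


def pattern_id(level, ident):
    """Format an ID following the P-<level>-<id> convention."""
    return f"P-{level}-{ident}"


def build_toc(patterns):
    """Return Markdown list lines linking to every pattern, grouped by level."""
    by_level = defaultdict(list)
    for pattern in patterns:
        by_level[pattern["level"]].append(pattern)

    lines = []
    for level in sorted(by_level):
        lines.append(f"- Level {level}")
        for pattern in sorted(by_level[level], key=lambda item: item["id"]):
            pid = pattern_id(level, pattern["id"])
            # Assumed anchor style: the pattern ID in lowercase.
            lines.append(f"  - [{pid}](#{pid.lower()})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_toc([{"level": 2, "id": 1}, {"level": 1, "id": 7}]))
```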
4.3 Per-Level Sections
Each level gets:
- Pattern ID (P-<level>-<id>)
- Level
- Signature (structural shape, e.g., PARA PARA)
- Support (# of occurrences)
- Composition (links to lower-level patterns)
- Top Terms (for level-1 patterns)
- Examples with excerpts and/or locations.
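The fields above could be rendered along these lines. This is a sketch under stated assumptions: the dict keys (`level`, `id`, `signature`, `support`, `children`, `examples`) are hypothetical stand-ins for the miner's schema, and the word-splitting rule in `top_terms` is a simple choice, not necessarily the one the script uses.

```python
import re
from collections import Counter


def top_terms(examples, limit=12):
    """Return the most frequent lowercase words across example excerpts."""
    words = []
    for text in examples:
        words.extend(re.findall(r"[a-z0-9']+", text.lower()))
    return [word for word, _ in Counter(words).most_common(limit)]


def render_pattern(pattern, lowest_words=12):
    """Render one pattern entry as Markdown lines (hypothetical layout)."""
    pid = f"P-{pattern['level']}-{pattern['id']}"
    lines = [
        pid,
        f"- Level: {pattern['level']}",
        f"- Signature: {pattern['signature']}",
        f"- Support: {pattern['support']}",
    ]
    children = pattern.get("children", [])
    if children:
        links = ", ".join(f"[{c}](#{c.lower()})" for c in children)
        lines.append(f"- Composition: {links}")
    examples = pattern.get("examples", [])
    # Top Terms are listed only for level-1 patterns.
    if pattern["level"] == 1 and examples:
        lines.append("- Top Terms: " + ", ".join(top_terms(examples, lowest_words)))
    for text in examples:
        lines.append(f"- Example: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"level": 1, "id": 4, "signature": "PHRASE", "support": 3,
            "examples": ["click the button", "click the link", "click save"]}
    print(render_pattern(demo, lowest_words=2))
```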
5. The Mermaid Diagram
If --mermaid is used, the output starts with:
```mermaid
flowchart TD
L6[Level 6: Document] --> L5[Level 5: Section]
L5 --> L4[Level 4: Chunk]
L4 --> L3[Level 3: Block]
L3 --> L2[Level 2: Sentence / Line]
L2 --> L1[Level 1: Phrase]
```
You can paste this into a Mermaid live editor (or GitHub Markdown viewer with Mermaid support) to see the hierarchy visually.
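A diagram with this shape can be generated from a mapping of level numbers to names. The sketch below reproduces the chart shown in this section; the `LEVEL_NAMES` mapping and the function name are illustrative assumptions, not the script's internals.

```python
# Level names as they appear in the diagram in this section.
LEVEL_NAMES = {
    6: "Document",
    5: "Section",
    4: "Chunk",
    3: "Block",
    2: "Sentence / Line",
    1: "Phrase",
}


def mermaid_hierarchy(level_names=LEVEL_NAMES):
    """Return Mermaid flowchart text linking each level to the one below it."""
    levels = sorted(level_names, reverse=True)

    def node(level):
        return f"L{level}[Level {level}: {level_names[level]}]"

    lines = ["flowchart TD"]
    for index, (parent, child) in enumerate(zip(levels, levels[1:])):
        # Only the first edge declares the parent's label; later edges reuse its ID.
        left = node(parent) if index == 0 else f"L{parent}"
        lines.append(f"{left} --> {node(child)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mermaid_hierarchy())
```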
6. How to Read the Pattern Library
- Start from Level 1 — See the most atomic recurring units (common phrases, word sequences).
- Go up levels — See how these smaller patterns combine into larger ones (sentences → paragraphs → sections).
- Check Support counts — High support means the pattern repeats often (candidate for templates).
- Use Composition links — Jump between parent and child patterns to trace reuse.
- Look at Examples — Understand how patterns appear in real content.
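When reviewing support counts, it can help to shortlist frequent patterns programmatically. The following sketch assumes each pattern is a dict with a `support` key; the threshold of 10 is an arbitrary illustrative value, not a recommendation from the tool.

```python
def template_candidates(patterns, min_support=10):
    """Return patterns at or above min_support, most frequent first."""
    frequent = [p for p in patterns if p["support"] >= min_support]
    return sorted(frequent, key=lambda p: p["support"], reverse=True)


if __name__ == "__main__":
    sample = [
        {"id": "P-1-1", "support": 4},
        {"id": "P-1-2", "support": 25},
        {"id": "P-2-1", "support": 12},
    ]
    for pattern in template_candidates(sample):
        print(pattern["id"], pattern["support"])
```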
7. Insights You Can Get
- Structural reuse — Which document shapes repeat across the corpus.
- Content standardization — Where templates would reduce inconsistency.
- Domain lexicon — Common terms in atomic patterns show the domain’s vocabulary.
- Pattern gaps — Missing structures may indicate weak or inconsistent documentation.
8. Example Flow
- Mining:
  ```bash
  python3 corpus_mine.py /docs-folder /output/patterns.yml
  ```
- Library Generation:
  ```bash
  python3 generate_pattern_library_md.py \
    --in /output/patterns.yml \
    --out /output/library.md \
    --mermaid
  ```
- Review Output:
  - Scroll to Level 1 for building blocks.
  - Use Mermaid chart to orient yourself in the hierarchy.
  - Follow cross-links to see pattern usage.