Pattern Library Generator – User Guide

Overview

generate_pattern_library_md.py converts the YAML output from your pattern miner into a human-readable Pattern Library in Markdown.

What it does:
- Creates one combined Pattern Library (Markdown) and/or separate Markdown files for each level.
- Groups patterns by hierarchical level (macro to micro structure).
- Shows how each pattern is composed of lower-level patterns.
- For the smallest patterns, lists most common words in examples.
- Cross-links related patterns between levels.
- Embeds an optional Mermaid diagram of the hierarchy.

This lets you move from raw pattern mining output to something you can actually read, navigate, and use for content standardization, analysis, or redesign.


1. Prerequisites

You need:
- Python 3.8 or newer
- PyYAML (install with: pip install pyyaml)
- A YAML file produced by the pattern miner (e.g., corpus_mine.py) from a corpus of .md or .txt files
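
The first two prerequisites can be checked before running the generator. The following is a minimal sketch, not part of generate_pattern_library_md.py; the function name check_prerequisites is hypothetical, and the only facts taken from this guide are the minimum Python version (3.8) and the PyYAML dependency.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def check_prerequisites(version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # PyYAML is imported as "yaml"; find_spec checks availability without importing it.
    if importlib.util.find_spec("yaml") is None:
        problems.append("PyYAML missing: run 'pip install pyyaml'")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result means both checks passed; the presence of the mined YAML file still has to be verified separately.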


2. Running the Script

Single Combined Markdown Output

python3 generate_pattern_library_md.py \
    --in /path/to/patterns.yml \
    --out /path/to/pattern-library.md \
    --top-k 5 \
    --lowest-words 12 \
    --mermaid

Separate Files Per Level

python3 generate_pattern_library_md.py \
    --in /path/to/patterns.yml \
    --out /path/to/output-folder \
    --split \
    --mermaid

This creates level-1.md, level-2.md, etc., in the specified folder.
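
The naming scheme for split output can be illustrated with a short sketch. It assumes only the level-N.md pattern described above; the helper name level_output_paths is hypothetical and is not taken from the script.

```python
from pathlib import Path


def level_output_paths(out_dir, levels):
    """Map each level number to its per-level Markdown path inside out_dir."""
    folder = Path(out_dir)
    return {level: folder / f"level-{level}.md" for level in sorted(levels)}


paths = level_output_paths("/path/to/output-folder", [2, 1, 3])
for level, path in paths.items():
    print(level, path.name)
```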


3. Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| --in | Yes | Path to mined YAML patterns file. |
| --out | Yes | Path to output file (or folder if --split is used). |
| --split | No | If set, generates separate per-level Markdown files instead of one combined file. |
| --top-k | No | Number of lower-level patterns to cross-reference for each higher-level token (default: 5). |
| --lowest-words | No | Number of top words to list for level-1 patterns (default: 12). |
| --mermaid | No | If set, embeds a Mermaid diagram showing the level hierarchy. |
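
For readers who want to see how such a command line could be declared, here is a sketch using Python's standard argparse module. The flag names and defaults come from the argument list above; marking --in and --out as required, and the attribute names in_path and out_path, are assumptions. The script's real parser is not shown in this guide and may differ.

```python
import argparse


def build_parser():
    """Declare the documented flags; an illustration, not the script's own parser."""
    parser = argparse.ArgumentParser(
        description="Render mined YAML patterns as a Markdown Pattern Library."
    )
    # "in" is a Python keyword, so the value is stored under another attribute name.
    parser.add_argument("--in", dest="in_path", required=True,
                        help="Path to mined YAML patterns file.")
    parser.add_argument("--out", dest="out_path", required=True,
                        help="Output file, or output folder when --split is set.")
    parser.add_argument("--split", action="store_true",
                        help="Write one Markdown file per level.")
    parser.add_argument("--top-k", type=int, default=5,
                        help="Lower-level patterns to cross-reference per token.")
    parser.add_argument("--lowest-words", type=int, default=12,
                        help="Top words to list for level-1 patterns.")
    parser.add_argument("--mermaid", action="store_true",
                        help="Embed a Mermaid diagram of the level hierarchy.")
    return parser


args = build_parser().parse_args(["--in", "patterns.yml", "--out", "library.md"])
print(args.top_k, args.lowest_words, args.split, args.mermaid)
```

argparse converts the hyphens in --top-k and --lowest-words to underscores, so the parsed values are read as args.top_k and args.lowest_words.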

4. Output Structure

4.1 Meta Section

Displays metadata from the miner output (e.g., min_support, mining parameters).

4.2 Table of Contents

Links to all patterns grouped by level.
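
A table of contents of this kind could be assembled roughly as follows. This is a sketch under assumptions: the function name build_toc is hypothetical, and the anchor rule (the pattern ID in lowercase) is a guess modelled on common Markdown heading slugs, not something confirmed for the generated library.

```python
def build_toc(patterns_by_level):
    """Return Markdown TOC lines: one entry per level, one link per pattern."""
    lines = []
    for level in sorted(patterns_by_level):
        lines.append(f"- Level {level}")
        for pattern_id in patterns_by_level[level]:
            # Assumed anchor rule: the pattern ID in lowercase.
            lines.append(f"  - [{pattern_id}](#{pattern_id.lower()})")
    return lines


toc = build_toc({2: ["P-2-1"], 1: ["P-1-1", "P-1-2"]})
print("\n".join(toc))
```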

4.3 Per-Level Sections

Each level section lists its patterns. Each pattern entry shows:
- Pattern ID (P-<level>-<id>)
- Level
- Signature (structural shape, e.g., PARA PARA)
- Support (number of occurrences)
- Composition (links to lower-level patterns)
- Top Terms (for level-1 patterns)
- Examples with excerpts and/or locations
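
To make the entry fields concrete, the sketch below renders one pattern entry and derives Top Terms by simple word counting. Everything here is illustrative: the function names, the parameter names, the line layout, and the tokenization rule are assumptions, and the generator's actual rendering may look different. Only the P-<level>-<id> ID format and the list of fields come from this guide.

```python
import re
from collections import Counter


def top_terms(examples, limit=12):
    """Return the most frequent lowercase words across example excerpts."""
    words = []
    for text in examples:
        # Assumed tokenization: runs of letters, digits, and apostrophes.
        words.extend(re.findall(r"[a-z0-9']+", text.lower()))
    return [word for word, _ in Counter(words).most_common(limit)]


def render_pattern(level, number, signature, support, children=(), examples=()):
    """Render one pattern entry as plain Markdown lines (hypothetical layout)."""
    lines = [
        f"Pattern ID: P-{level}-{number}",
        f"Level: {level}",
        f"Signature: {signature}",
        f"Support: {support}",
    ]
    if children:
        links = ", ".join(f"[{child}](#{child.lower()})" for child in children)
        lines.append(f"Composition: {links}")
    # Top Terms are listed only for the lowest level, as described above.
    if level == 1 and examples:
        lines.append("Top Terms: " + ", ".join(top_terms(examples)))
    for example in examples:
        lines.append(f"Example: {example}")
    return "\n".join(lines)


print(render_pattern(2, 4, "PARA PARA", 7, children=["P-1-1", "P-1-3"]))
```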


5. The Mermaid Diagram

If --mermaid is used, the output starts with:

flowchart TD
  L6[Level 6: Document] --> L5[Level 5: Section]
  L5 --> L4[Level 4: Chunk]
  L4 --> L3[Level 3: Block]
  L3 --> L2[Level 2: Sentence / Line]
  L2 --> L1[Level 1: Phrase]

You can paste this into a Mermaid live editor (or a Markdown viewer with Mermaid support, such as GitHub's) to see the hierarchy visually.
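
A diagram with this shape can be produced from a simple mapping of level numbers to names. The sketch below is one way to do it, assuming the six level names shown in the diagram; the constant LEVEL_NAMES and the function mermaid_hierarchy are hypothetical names, not the script's own.

```python
LEVEL_NAMES = {
    6: "Document",
    5: "Section",
    4: "Chunk",
    3: "Block",
    2: "Sentence / Line",
    1: "Phrase",
}


def mermaid_hierarchy(level_names):
    """Build a top-down flowchart linking each level to the one below it."""
    levels = sorted(level_names, reverse=True)
    lines = ["flowchart TD"]
    for index, (upper, lower) in enumerate(zip(levels, levels[1:])):
        # Only the first edge spells out the upper node's label;
        # later edges reuse the node ID that was already declared.
        if index == 0:
            left = f"L{upper}[Level {upper}: {level_names[upper]}]"
        else:
            left = f"L{upper}"
        lines.append(f"  {left} --> L{lower}[Level {lower}: {level_names[lower]}]")
    return "\n".join(lines)


print(mermaid_hierarchy(LEVEL_NAMES))
```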


6. How to Read the Pattern Library

  1. Start from Level 1 — See the most atomic recurring units (common phrases, word sequences).
  2. Go up levels — See how these smaller patterns combine into larger ones (sentences → paragraphs → sections).
  3. Check Support counts — High support means the pattern repeats often (candidate for templates).
  4. Use Composition links — Jump between parent and child patterns to trace reuse.
  5. Look at Examples — Understand how patterns appear in real content.
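
Step 3 can also be done programmatically once the patterns are loaded. The sketch below ranks patterns by support and keeps those above a threshold. The record layout (dictionaries with "id" and "support" keys), the sample values, and the threshold of 10 are all illustrative assumptions; the real YAML schema is defined by the miner and is not documented here.

```python
def template_candidates(patterns, min_support=10):
    """Return patterns at or above min_support, most frequent first."""
    frequent = [p for p in patterns if p["support"] >= min_support]
    return sorted(frequent, key=lambda p: p["support"], reverse=True)


# Made-up sample records, for illustration only.
sample = [
    {"id": "P-1-1", "support": 42},
    {"id": "P-2-3", "support": 4},
    {"id": "P-3-2", "support": 17},
]

for pattern in template_candidates(sample):
    print(pattern["id"], pattern["support"])
```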

7. Insights You Can Get

  • Structural reuse — Which document shapes repeat across the corpus.
  • Content standardization — Where templates would reduce inconsistency.
  • Domain lexicon — Common terms in atomic patterns show the domain’s vocabulary.
  • Pattern gaps — Missing structures may indicate weak or inconsistent documentation.

8. Example Flow

  1. Mining:

     python3 corpus_mine.py /docs-folder /output/patterns.yml

  2. Library Generation:

     python3 generate_pattern_library_md.py \
         --in /output/patterns.yml \
         --out /output/library.md \
         --mermaid

  3. Review Output:
     - Scroll to Level 1 for building blocks.
     - Use Mermaid chart to orient yourself in the hierarchy.
     - Follow cross-links to see pattern usage.