Manual Pattern Library Analysis Workflow
The goal is to go from "cryptic DSL signatures" like level==2 and sig=='P P P' to something that explains:
- What structure the pattern describes
- How often it occurs in the corpus
- What role it plays in the document set
1. Understand the Data Structure
Your pattern inventory (YAML or MD) contains repeating blocks like this:
- id: P-2-2119
  level: 2
  selector:
    engine: dsl
    query: level==2 and sig=='P P P'
  structure:
    signature: P P P
  support: 4
  examples:
    - excerpt: "..."
  centroid:
    - P
    - P
    - P
Here's what each field means:
- id -- Unique identifier for this pattern (P-[level]-[hash]).
- level -- The hierarchy level (1=Phrase, 2=Sentence/Line, 3=Paragraph/List/Table, 4=Chunk, 5=Section, 6=Document).
- signature -- The sequence of lower-level units that make up this pattern. Example: P P P means "three paragraphs in a row."
- support -- How many times this exact structure appears across the corpus.
- examples -- Snippets from the corpus showing where this structure occurs.
- centroid -- The abstracted list of child elements for the pattern.
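To make the field list concrete, here is a minimal Python sketch that loads one inventory block into a typed record. It assumes the block has already been parsed into a plain dict (for example by a YAML parser), that `signature` is nested under `structure`, and that `support` and `centroid` sit at the top level of the block. The names `Pattern` and `pattern_from_record` are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """One entry of the pattern inventory (hypothetical helper type)."""
    id: str
    level: int
    signature: str
    support: int
    centroid: list = field(default_factory=list)

    @property
    def tokens(self):
        # Split the signature into its child tokens: "P P P" -> ["P", "P", "P"]
        return self.signature.split()


def pattern_from_record(record):
    """Build a Pattern from one parsed inventory block (a plain dict).

    Assumes 'signature' lives under 'structure'; adjust if your file differs.
    """
    return Pattern(
        id=record["id"],
        level=int(record["level"]),
        signature=record["structure"]["signature"],
        support=int(record["support"]),
        centroid=list(record.get("centroid", [])),
    )


# The example block from above, written as the dict a parser would plausibly return.
record = {
    "id": "P-2-2119",
    "level": 2,
    "selector": {"engine": "dsl", "query": "level==2 and sig=='P P P'"},
    "structure": {"signature": "P P P"},
    "support": 4,
    "examples": [{"excerpt": "..."}],
    "centroid": ["P", "P", "P"],
}

pattern = pattern_from_record(record)
print(pattern.id, pattern.level, pattern.tokens, pattern.support)
```

Keeping the record as a small dataclass makes the later steps (decoding, sorting, tracing) operate on named fields rather than nested dict lookups.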
2. Decode the Signatures
You'll need to map the cryptic tokens (P, S, PARA, L, etc.) to human-readable terms.
Example map:
| Token | Meaning |
|-------|---------|
| P or PARA | Paragraph |
| S | Sentence |
| XS | Short phrase |
| L | List |
| M | Metadata |
| CHUNK | Chunk of related blocks |
| SEC | Section |
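The token map can be applied mechanically. The sketch below copies the example map into a Python dict and decodes a signature token by token; `TOKEN_NAMES` and `decode_signature` are hypothetical names, and unknown tokens are flagged rather than guessed.

```python
# Token map copied from the example table above.
TOKEN_NAMES = {
    "P": "Paragraph",
    "PARA": "Paragraph",
    "S": "Sentence",
    "XS": "Short phrase",
    "L": "List",
    "M": "Metadata",
    "CHUNK": "Chunk of related blocks",
    "SEC": "Section",
}


def decode_signature(signature, names=TOKEN_NAMES):
    """Translate each token of a signature; keep and flag tokens not in the map."""
    return [names.get(token, f"Unknown({token})") for token in signature.split()]


print(decode_signature("P P P"))
print(decode_signature("S L T"))  # "T" is not in the map, so it is flagged
```

Flagging unknown tokens, instead of dropping them, shows you which entries the map still needs.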
3. Build a Human-Readable Table
Go pattern by pattern and make a spreadsheet or Markdown table:
| Pattern ID | Level | Signature | Support | Template Name |
|---|---|---|---|---|
| P-2-2119 | 2 | P P P | 4 | Three-paragraph sequence |
| P-3-2617 | 3 | S | 108 | Single-sentence block |
| P-5-3244 | 5 | PARA × 8 | 3 | Section with eight short paragraphs |
This makes it much easier to scan and identify recurring document structures.
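If the inventory is large, the table can be generated rather than typed. The following sketch assumes each pattern has been reduced to a flat dict with a hand-written `name`; the template names still come from you, not from the code. The helper names and the `min_run` threshold (runs of four or more identical tokens are shortened to `TOKEN × N`) are illustrative choices.

```python
from itertools import groupby


def compress(signature, min_run=4):
    """Shorten long runs of one token, e.g. eight PARA tokens -> 'PARA × 8'."""
    parts = []
    for token, run in groupby(signature.split()):
        count = len(list(run))
        if count >= min_run:
            parts.append(f"{token} × {count}")
        else:
            parts.extend([token] * count)  # short runs stay spelled out
    return " ".join(parts)


def to_markdown(rows):
    """Render pattern rows as a Markdown table with the columns used above."""
    lines = [
        "| Pattern ID | Level | Signature | Support | Template Name |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        lines.append(
            f"| {row['id']} | {row['level']} | {compress(row['signature'])} "
            f"| {row['support']} | {row['name']} |"
        )
    return "\n".join(lines)


rows = [
    {"id": "P-2-2119", "level": 2, "signature": "P P P", "support": 4,
     "name": "Three-paragraph sequence"},
    {"id": "P-5-3244", "level": 5, "signature": " ".join(["PARA"] * 8), "support": 3,
     "name": "Section with eight short paragraphs"},
]
table = to_markdown(rows)
print(table)
```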
4. Look for Dominant Patterns
Sort by Support to find the most common structures.
Example insights:
- A very high support count for a lone P at Level 2 means most Level 2 units in the docs consist of a single P element rather than a sequence.
- Frequent Level 5 patterns (Section) made up of PARA PARA PARA suggest standard section layouts.
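Sorting by support is simple to script. The sketch below also reports each pattern's share of the total support at its own level, because raw counts from different levels are not directly comparable. The first two rows come from the table above; the third row is made up for illustration, and `rank_by_support` is a hypothetical helper.

```python
from collections import defaultdict


def rank_by_support(patterns):
    """Sort by support (descending) and attach each pattern's share of its level."""
    level_totals = defaultdict(int)
    for p in patterns:
        level_totals[p["level"]] += p["support"]
    ranked = sorted(patterns, key=lambda p: p["support"], reverse=True)
    return [
        (p["id"], p["support"], p["support"] / level_totals[p["level"]])
        for p in ranked
    ]


patterns = [
    {"id": "P-2-2119", "level": 2, "signature": "P P P", "support": 4},
    {"id": "P-3-2617", "level": 3, "signature": "S", "support": 108},
    # Hypothetical extra row, added only so the Level 3 share is not trivially 100%.
    {"id": "P-3-0000", "level": 3, "signature": "S S", "support": 12},
]

for pattern_id, support, share in rank_by_support(patterns):
    print(f"{pattern_id}: support={support}, share of level={share:.0%}")
```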
5. Identify Hierarchical Composition
Patterns at higher levels are made from patterns at lower levels.
To manually trace this:
1. Pick a high-level pattern (e.g., Level 5: PARA PARA PARA).
2. Look up what PARA maps to at Level 3.
3. Continue down until you hit Level 1 phrases.
This shows structural nesting, which reveals the blueprint of your corpus.
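The tracing steps above can be sketched as a small recursive expansion. This assumes you have already decided, for each token, which lower-level signature represents it (for instance the most frequent one); the `CHILD_SIGNATURE` values below are invented for illustration and do not come from a real inventory.

```python
# Hypothetical lookup: the representative child signature for each token.
CHILD_SIGNATURE = {
    "PARA": "S S",   # made-up: a paragraph expands to two sentences
    "S": "XS XS",    # made-up: a sentence expands to two short phrases
}


def trace(signature, children=CHILD_SIGNATURE, depth=0, max_depth=6):
    """Expand a signature level by level and return an indented outline."""
    lines = []
    for token in signature.split():
        lines.append("  " * depth + token)
        child = children.get(token)
        # Stop at tokens with no known expansion, or at the depth guard.
        if child is not None and depth + 1 < max_depth:
            lines.extend(trace(child, children, depth + 1, max_depth))
    return lines


print("\n".join(trace("PARA PARA PARA")))
```

The `max_depth` guard matches the six hierarchy levels and keeps a circular lookup from recursing indefinitely.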
6. Interpret the Function of the Patterns
Ask:
- What problem does this structure solve?
  - Example: A "Section → Multiple short paragraphs" pattern makes information scannable.
- How does it fit the content domain?
  - Technical docs often solve "reader needs quick facts" by using repetitive, predictable patterns.
- What variations exist?
  - Does the same topic get presented with 2 paragraphs in one doc but 5 in another? Why?
7. Create Human-Readable Templates
Once you understand the patterns, you can write templates:
Example:
## Section: VM Sizes Supported
- Intro paragraph summarizing purpose
- List of supported configurations
- One or more explanatory paragraphs
- Optional table for reference
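A first-draft skeleton for such a template can be derived from a section-level signature and then edited by hand into wording like the example above. This is only a sketch: the signature `PARA L PARA PARA` is hypothetical, and `LABELS` and `outline` are illustrative names.

```python
from itertools import groupby

# Lowercase labels for tokens taken from the token map; extend as needed.
LABELS = {"PARA": "paragraph", "P": "paragraph", "L": "list", "S": "sentence"}


def outline(title, signature):
    """Turn a signature into a bullet skeleton, merging runs of the same token."""
    lines = [f"## Section: {title}"]
    for token, run in groupby(signature.split()):
        label = LABELS.get(token, token)
        count = len(list(run))
        if count == 1:
            lines.append(f"- {label.capitalize()}")
        else:
            lines.append(f"- {count} {label}s")
    return "\n".join(lines)


print(outline("VM Sizes Supported", "PARA L PARA PARA"))
```

The output only lists the structural slots; the purpose of each slot ("intro paragraph", "explanatory paragraphs") still has to come from reading the corpus examples.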
Types of Insight You Get
- Structural consistency -- Whether your docs follow a standard composition.
- Pattern dominance -- Which layouts occur most often.
- Information density -- Short vs. long paragraph chains, sentence counts.
- Opportunities for reuse -- Frequent patterns can be turned into authoring templates.
- Reader experience clues -- Consistent patterns suggest intentional design for scannability, navigation, or comprehension.
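As one way to put a number on information density, the sketch below computes the support-weighted average signature length per level, using the three rows from the table in step 3. The function name is hypothetical, and signature length is only a rough proxy for density.

```python
from collections import defaultdict


def mean_signature_length(patterns):
    """Support-weighted mean number of child tokens, grouped by level."""
    token_total = defaultdict(int)
    occurrences = defaultdict(int)
    for p in patterns:
        length = len(p["signature"].split())
        token_total[p["level"]] += length * p["support"]
        occurrences[p["level"]] += p["support"]
    return {level: token_total[level] / occurrences[level]
            for level in sorted(occurrences)}


patterns = [
    {"id": "P-2-2119", "level": 2, "signature": "P P P", "support": 4},
    {"id": "P-3-2617", "level": 3, "signature": "S", "support": 108},
    {"id": "P-5-3244", "level": 5, "signature": " ".join(["PARA"] * 8), "support": 3},
]

print(mean_signature_length(patterns))
```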