
Rules Reference

Rule fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | integer | Yes | Unique rule identifier (strings are coerced to int) |
| name | string | Yes | Human-readable rule name |
| type | "header" or "body" | Yes | Whether to query metadata or the document body |
| query | string | Yes | Metadata key (header) or XPath expression (body) |
| flag | string | Yes | Processing mode (see Flags table) |
| operation | string | Yes | Comparison operator (see Operators table) |
| value | string | Yes | Expected value; comma-separated for multi-value checks |
| level | "Required" or "Suggested" | No | Default: "Required" |
| mitigation | string | No | Remediation message shown on failure |

level and CI exit codes: A Required rule that fails sets the CLI exit code to 1, blocking a CI merge gate. A Suggested rule that fails is reported but does not change the exit code. Use Suggested for style recommendations that should not block publication.
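
The exit-code behavior described above can be sketched as follows. This is a minimal illustration, not the tool's own code: `RuleResult` and `exit_code` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    # Hypothetical result record; field names are illustrative only.
    name: str
    passed: bool
    level: str = "Required"  # documented default for the level field


def exit_code(results):
    """Return 1 if any Required rule failed, otherwise 0."""
    for result in results:
        if not result.passed and result.level == "Required":
            return 1
    return 0


results = [
    RuleResult("topic-must-be-tutorial", passed=True),
    RuleResult("prefer-short-title", passed=False, level="Suggested"),
]
print(exit_code(results))  # 0: only a Suggested rule failed
```

A failing Suggested rule would still appear in the report; it just does not affect the value returned here.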


Flags

| Flag | Applies to | Description |
| --- | --- | --- |
| value | header | Evaluate the metadata value with operation and value |
| check | header | True if the key exists (ignores operation/value) |
| date | header | Compare metadata value as a date |
| pattern | header | Match metadata value against value as a regex |
| count | body | Count matching XPath nodes; compare with operation/value |
| text | body | Extract text of first matching XPath node |
| dom | body | Extract element tag names of matching nodes |
| all | body | Full plain-text content of the page |

Flag examples

value — check that a metadata field equals an expected string:

{
  "id": 1, "name": "topic-must-be-tutorial", "type": "header",
  "query": "ms.topic", "flag": "value", "operation": "==", "value": "tutorial",
  "mitigation": "Set ms.topic: tutorial in the front matter."
}

check — verify a required metadata key is present (value doesn't matter):

{
  "id": 2, "name": "description-must-exist", "type": "header",
  "query": "description", "flag": "check", "operation": "==", "value": "true",
  "mitigation": "Add a description field to the front matter."
}
For check, operation and value are ignored, but both are required fields, so the rule still supplies placeholder values.

date — check that ms.date is less than 365 days in the past. The value field is the maximum age in days; the operation is < (age must be strictly less than the threshold):

{
  "id": 3, "name": "date-must-be-fresh", "type": "header",
  "query": "ms.date", "flag": "date", "operation": "<", "value": "365",
  "mitigation": "Update ms.date to a date within the last year."
}
The metadata value must be a date string parseable by Python's dateutil (e.g. 2026-01-15). The comparison is: (today − ms.date).days < int(value).
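
The age comparison can be sketched in a few lines of Python. To stay dependency-free, this sketch parses ISO dates with the standard library rather than dateutil, and `is_fresh` is a hypothetical helper name, so treat it as an approximation of the documented formula only.

```python
from datetime import date


def is_fresh(metadata_value: str, value: str, today: date) -> bool:
    # Assumption: ISO-format dates only; the tool is described as using dateutil.
    published = date.fromisoformat(metadata_value)
    age_days = (today - published).days
    return age_days < int(value)


today = date(2026, 6, 1)
print(is_fresh("2026-01-15", "365", today))  # True: 137 days old
print(is_fresh("2025-06-01", "365", today))  # False: exactly 365 days old
```

Because the operator is a strict less-than, a date exactly at the threshold fails.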

pattern — match the metadata value against a regex (alias for r on header fields):

{
  "id": 4, "name": "author-github-format", "type": "header",
  "query": "author", "flag": "pattern", "operation": "r",
  "value": "^[a-z0-9-]+$",
  "mitigation": "author must be a lowercase GitHub username (letters, numbers, hyphens)."
}

count — count matching XPath nodes:

{
  "id": 5, "name": "must-have-exactly-one-h1", "type": "body",
  "query": "/html/body/h1", "flag": "count", "operation": "==", "value": "1",
  "mitigation": "The document must have exactly one H1 heading."
}

text — extract the text content of the first matching XPath node:

{
  "id": 6, "name": "h1-must-start-with-tutorial", "type": "body",
  "query": "/html/body/h1", "flag": "text", "operation": "[:", "value": "Tutorial:",
  "mitigation": "H1 must begin with 'Tutorial:' for tutorial articles."
}

dom — extract element tag names (useful for checking heading order):

{
  "id": 7, "name": "first-heading-after-h1-must-be-h2", "type": "body",
  "query": "/html/body/*[2]", "flag": "dom", "operation": "==", "value": "h2",
  "mitigation": "The second element in the body must be an H2."
}

all — run a check against the entire plain-text body:

{
  "id": 8, "name": "body-must-not-be-empty", "type": "body",
  "query": "/html/body", "flag": "all", "operation": ">", "value": "0",
  "mitigation": "The document body must not be empty."
}


Operators

| Token | Name | Notes |
| --- | --- | --- |
| == | Equal | Strips whitespace |
| != | Not equal | Strips whitespace |
| > | Greater than | Numeric comparison |
| < | Less than | Numeric comparison |
| [] | Contains | Case-insensitive |
| [: | Starts with | |
| :] | Ends with | |
| r | Regex match | Python re.search, DOTALL mode |
| l | Length limit | len(result) < int(value); True if shorter than threshold |
| s | Max sentences | sentence_count <= int(value); requires NLTK |
| p<N> | Part of speech | p1 = POS tag of the first word; uses Penn Treebank tags |
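
As a rough guide to the simpler operators, the sketch below restates the table in Python. It is not the evaluator's implementation: `compare` is a hypothetical function, and the sketch assumes that numeric operands are converted with `float`, that both sides are stripped for == and !=, and that starts-with and ends-with are case-sensitive (the table does not say).

```python
def compare(operation: str, result: str, value: str) -> bool:
    """Illustrative restatement of the operator table; not the tool's own code."""
    if operation == "==":
        return result.strip() == value.strip()
    if operation == "!=":
        return result.strip() != value.strip()
    if operation == ">":
        return float(result) > float(value)  # assumed numeric conversion
    if operation == "<":
        return float(result) < float(value)  # assumed numeric conversion
    if operation == "[]":
        return value.lower() in result.lower()  # case-insensitive contains
    if operation == "[:":
        return result.startswith(value)  # assumed case-sensitive
    if operation == ":]":
        return result.endswith(value)  # assumed case-sensitive
    if operation == "l":
        return len(result) < int(value)
    raise ValueError(f"unsupported operation: {operation}")


print(compare("==", " tutorial ", "tutorial"))        # True
print(compare("[]", "Quick Start Guide", "guide"))    # True
print(compare("l", "x" * 60, "60"))                   # False: 60 is not < 60
```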

Operator examples for non-obvious operators

r (regex) — check that the title matches a pattern:

{
  "flag": "text", "operation": "r", "value": "^Tutorial: [A-Z]",
  "mitigation": "Title must match 'Tutorial: ' followed by a capital letter."
}
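
The operators table describes r as Python's re.search in DOTALL mode. The sketch below shows what that implies for a pattern like the one above; `regex_match` is a hypothetical wrapper used only for illustration.

```python
import re


def regex_match(result: str, value: str) -> bool:
    # re.search scans the whole string; DOTALL lets "." match newlines too.
    return re.search(value, result, re.DOTALL) is not None


print(regex_match("Tutorial: Deploy an app", r"^Tutorial: [A-Z]"))  # True
print(regex_match("Tutorial: deploy an app", r"^Tutorial: [A-Z]"))  # False
print(regex_match("first line\nsecond line", r"first.*second"))     # True
```

Because re.search is used rather than re.match, a pattern without ^ can match anywhere in the extracted text.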

l (length limit) — ensure the title is under 60 characters:

{
  "flag": "text", "operation": "l", "value": "60",
  "mitigation": "Title must be under 60 characters."
}

s (sentence count) — limit the intro paragraph to 3 sentences:

{
  "flag": "all", "operation": "s", "value": "3",
  "mitigation": "Introduction must be no more than 3 sentences."
}
Requires NLTK corpora. Run python -m nltk.downloader punkt_tab averaged_perceptron_tagger_eng after installing the package.

p<N> (part of speech) — check that the third word of the H1 is a verb (VB):

{
  "flag": "text", "operation": "p3", "value": "VB",
  "mitigation": "The third word of the H1 should be a base-form verb."
}
Uses Penn Treebank tags: NN (noun), VB (base-form verb), VBZ (3rd-person singular present verb), JJ (adjective), RB (adverb). Position is 1-indexed.


Common mistakes

Multi-element XPath + equality

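When an XPath returns multiple elements and you use ==, the evaluator checks that every element satisfies the assertion (logical AND). That is rarely what the rule author intends, because the rule fails as soon as one matched element differs from the expected value.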

Wrong: "The second-to-last H2 is 'Clean up resources'"

{ "query": ".//h2[last()]/preceding-sibling::h2", "flag": "text", "operation": "==", "value": "Clean up resources" }
This XPath returns all H2 siblings before the last one. Requiring every one of them to equal 'Clean up resources' fails on any document with more than two H2s.

Right: Target the specific element directly:

{ "query": ".//h2[last()-1]", "flag": "text", "operation": "==", "value": "Clean up resources" }
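
The difference between the two queries can be reproduced with Python's standard library. This is a sketch under stated assumptions: ElementTree does not support the preceding-sibling axis, so the "wrong" query is emulated by taking every H2 except the last, and the all(...) check stands in for the evaluator's logical AND.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<h1>Tutorial: Deploy</h1>"
    "<h2>Prerequisites</h2>"
    "<h2>Clean up resources</h2>"
    "<h2>Next steps</h2>"
    "</body></html>"
)
expected = "Clean up resources"

# Emulates .//h2[last()]/preceding-sibling::h2: every H2 before the last one.
before_last = [h.text for h in doc.findall(".//h2")[:-1]]
wrong = all(text.strip() == expected for text in before_last)

# .//h2[last()-1] selects only the second-to-last H2.
second_to_last = [h.text for h in doc.findall(".//h2[last()-1]")]
right = all(text.strip() == expected for text in second_to_last)

print(before_last, wrong)     # ['Prerequisites', 'Clean up resources'] False
print(second_to_last, right)  # ['Clean up resources'] True
```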

Comma-separated multi-values

The value field is split on , to produce a list for operators that support multi-value checks ([]). The value "guide, article, topic" becomes three checks: "guide", " article", " topic" (note the leading space). Whitespace is stripped before comparison, so this works in practice — but a literal comma inside a single expected value has no escape mechanism.
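
A minimal sketch of the split-and-strip behavior, assuming it amounts to a plain split on "," followed by whitespace stripping (`split_values` is a hypothetical name):

```python
def split_values(value: str) -> list[str]:
    # Split on every comma, then strip surrounding whitespace from each part.
    return [part.strip() for part in value.split(",")]


print(split_values("guide, article, topic"))  # ['guide', 'article', 'topic']
print(split_values("Hello, world"))           # ['Hello', 'world']
```

The second call shows the limitation: an expected value that itself contains a comma is always broken into two separate values.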

Expressing absence (no negation operator)

Apart from !=, the rule language has no built-in negation, so there is no operator for "does not contain" or "does not match". To express "the title must NOT contain the word 'guide'", the idiomatic (but fragile) approach is to use workflow branching with a known-passing rule as a sentinel. The negate: true field planned for v0.3 will eliminate this pattern. See the Roadmap for details.


Workflow step language

A workflow is a comma-separated string of <source>-<target> tokens that chains rule evaluations with conditional branching.

| Pattern | Example | Meaning |
| --- | --- | --- |
| S-N | S-1 | Start: load rule N as the initial state |
| N-D | 1-D | Rule N result becomes the decision point |
| M-D | M-D | Merge state becomes the decision |
| T-N | T-2 | If decision is True, load rule N |
| F-N | F-3 | If decision is False, load rule N |
| T-R | T-R | If decision is True, reverse (negate) it |
| F-R | F-R | If decision is False, reverse (negate) it |
| N-M | 2-M | Rule N exits into merge state |
| M-N | M-3 | Merge state exits to rule N |
| M-E | M-E | Merge state ends the workflow |
| N-E | 4-E | Rule N result ends the workflow |
| N-N | 1-2 | Both rules must pass |
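
To make the token structure concrete, the sketch below splits a workflow string into (source, target) pairs. It covers only tokenizing, not evaluation, and `parse_workflow` is a hypothetical helper rather than part of the tool.

```python
def parse_workflow(workflow: str) -> list[tuple[str, str]]:
    """Split a workflow string into (source, target) pairs."""
    steps = []
    for token in workflow.split(","):
        # Each token has the form <source>-<target>, e.g. "S-1" or "2-M".
        source, target = token.strip().split("-", 1)
        steps.append((source, target))
    return steps


print(parse_workflow("S-1,1-E"))
# [('S', '1'), ('1', 'E')]
```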

Worked example 1 — simple rule check

S-1,1-E

Start with rule 1. Rule 1's result ends the workflow. This is the simplest possible workflow: a single rule.

Worked example 2 — conditional branch

S-1,1-D,T-2,F-3,2-M,3-M,M-E

Rule 1 is evaluated and its result becomes the decision. If True, rule 2 runs; if False, rule 3 runs. Both rules exit into merge state, which ends the workflow. The overall workflow passes if the branch that executed passed.

Practical use: Rule 1 checks ms.topic == "tutorial". Rule 2 checks that the H1 starts with "Tutorial:". Rule 3 checks that the H1 starts with "How to". This enforces topic-type-specific H1 conventions in a single workflow.
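
Read as ordinary control flow, the workflow in worked example 2 behaves roughly like the if/else below. This is a simplified sketch, not the workflow engine: the rules are plain callables, and `branch_workflow` and `check` are hypothetical names.

```python
def branch_workflow(rule1, rule2, rule3) -> bool:
    """Mirror S-1,1-D,T-2,F-3,2-M,3-M,M-E with plain callables."""
    decision = rule1()      # S-1, 1-D: rule 1 becomes the decision
    if decision:
        merged = rule2()    # T-2, 2-M
    else:
        merged = rule3()    # F-3, 3-M
    return merged           # M-E: merge state ends the workflow


def check(doc: dict) -> bool:
    return branch_workflow(
        lambda: doc["ms.topic"] == "tutorial",
        lambda: doc["h1"].startswith("Tutorial:"),
        lambda: doc["h1"].startswith("How to"),
    )


print(check({"ms.topic": "tutorial", "h1": "Tutorial: Deploy an app"}))  # True
print(check({"ms.topic": "how-to", "h1": "How to deploy an app"}))       # True
print(check({"ms.topic": "tutorial", "h1": "How to deploy an app"}))     # False
```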

Worked example 3 — conditional presence check

S-38,38-D,T-39,F-38,39-M,M-E

Rule 38 counts H2 headings. If the count is > 0 (True), rule 39 checks for H3s. If there are no H2s (False), rule 38's own (passing) result feeds into merge. This implements "H3s are only required if H2s are present."

For the full technical reference, see Design — Workflow Step Language.