Architecture

Layer diagram

graph TD
    CLI["CLI layer<br/>markdown_validator.cli<br/><em>Click batch validator + REPL</em>"]
    SVC["Services layer<br/>markdown_validator.services<br/><em>Scanner · WorkflowEngine</em>"]
    INF["Infrastructure layer<br/>markdown_validator.infrastructure<br/><em>parser · loader · reporter</em>"]
    DOM["Domain layer<br/>markdown_validator.domain<br/><em>models · operators · evaluator · pos</em>"]

    CLI --> SVC
    SVC --> INF
    SVC --> DOM
    INF --> DOM

Dependency rule: each layer may import only from layers below it. The CLI reaches infrastructure through the services layer and never calls it directly. The domain layer imports nothing else from the package and performs no I/O and has no side effects.
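One way to keep the dependency rule from eroding is to check it mechanically. The following is a minimal sketch, not part of the project: the helper `forbidden_imports` and the `ALLOWED` table are hypothetical, and the table assumes that the CLI may import the domain layer (the rule text permits it, although the diagram only draws the CLI to services edge).

```python
import ast

PACKAGE = "markdown_validator"

# Assumed reading of the dependency rule: importer -> layers it may import.
ALLOWED = {
    "cli": {"services", "domain"},  # assumption: domain allowed, infrastructure not
    "services": {"infrastructure", "domain"},
    "infrastructure": {"domain"},
    "domain": set(),
}


def forbidden_imports(source: str, layer: str) -> list[str]:
    """Return absolute package imports in `source` that `layer` may not make.

    Relative imports and `from markdown_validator import x` are ignored here.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] != PACKAGE or len(parts) < 2:
                continue  # stdlib or third-party import: outside the rule
            target = parts[1]
            if target != layer and target not in ALLOWED[layer]:
                found.append(name)
    return found


cli_source = (
    "import click\n"
    "from markdown_validator.services import scanner\n"
    "from markdown_validator.infrastructure import parser\n"
)
print(forbidden_imports(cli_source, "cli"))
# ['markdown_validator.infrastructure']
```

A check like this could run as an ordinary unit test over each module's source text, so a stray import fails the build rather than surfacing in review.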

Module responsibilities

Layer	Module	Responsibility
Domain	`models.py`	Pydantic v2 contract models — all frozen
Domain	`operators.py`	Pure comparison strategy functions; `OPERATOR_REGISTRY`
Domain	`evaluator.py`	Apply a single `RuleModel` to a `ParsedDocument` → `ValidationResult`
Domain	`pos.py`	Thin NLTK wrapper for POS tagging and sentence counting
Infrastructure	`parser.py`	Read `.md` file → `ParsedDocument` (sole file reader for source docs)
Infrastructure	`loader.py`	Read JSON rule-set file → `RuleSetModel` (Repository pattern)
Infrastructure	`reporter.py`	Write `ScanReport` to JSON or CSV (sole file writer for output)
Services	`scanner.py`	Facade: compose parser + loader + evaluator → `ScanReport`
Services	`workflow.py`	Chain-of-Responsibility: execute workflow step sequences
CLI	`main.py`	Click batch CLI — parse args, call Scanner, render output
CLI	`repl.py`	Interactive `cmd.Cmd` REPL for rule development

For the full design-pattern justifications, see the Design Document.
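To make the domain rows of the table more concrete, here is a minimal sketch of how frozen contract objects and a pure evaluator might fit together. It uses standard-library frozen dataclasses as a stand-in for the Pydantic v2 models, and every field name (`flag`, `expected`, `metadata`, `passed`, `actual`) as well as the simplified equality check are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RuleModel:
    name: str
    flag: str       # hypothetical: key of the document value to inspect
    expected: Any   # hypothetical: value the rule expects to find


@dataclass(frozen=True)
class ParsedDocument:
    path: str
    metadata: Mapping[str, Any]  # hypothetical: parsed front matter


@dataclass(frozen=True)
class ValidationResult:
    rule_name: str
    passed: bool
    actual: Any


def evaluate_rule(rule: RuleModel, doc: ParsedDocument) -> ValidationResult:
    """Pure function: reads its two arguments and returns a new result."""
    actual = doc.metadata.get(rule.flag)
    return ValidationResult(rule.name, actual == rule.expected, actual)


doc = ParsedDocument("guide.md", {"title": "Guide", "draft": False})
result = evaluate_rule(RuleModel("not-a-draft", "draft", False), doc)
print(result.passed)  # True

try:
    result.passed = False  # frozen: assignment is rejected
except FrozenInstanceError:
    print("results cannot be mutated after creation")
```

Because the evaluator neither reads files nor mutates its inputs, it can be tested with hand-built objects and no fixtures on disk.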

Data flow

sequenceDiagram
    participant CLI
    participant Scanner
    participant Parser
    participant Loader
    participant Evaluator
    participant Reporter

    CLI->>Scanner: validate(file, rules)
    Scanner->>Parser: parse(file)
    Parser-->>Scanner: ParsedDocument (frozen)
    Scanner->>Loader: load(rules)
    Loader-->>Scanner: RuleSetModel (frozen)
    loop Each rule
        Scanner->>Evaluator: evaluate_rule(rule, doc)
        Evaluator-->>Scanner: ValidationResult (frozen)
    end
    Scanner->>Reporter: write(ScanReport)
    Reporter-->>CLI: text / json / csv output
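
The sequence above can be sketched as a small facade. This is an illustration under stated assumptions rather than the project's `Scanner`: the constructor signature, the injected callables (`parse`, `load`, `evaluate`, `write`), and the `ScanReport` fields are all hypothetical, and the stubs replace the real parser, loader, evaluator, and reporter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class ScanReport:
    file: str
    results: tuple  # one entry per evaluated rule, in rule order


class Scanner:
    """Facade sketch: callers see one validate() call, not the collaborators."""

    def __init__(
        self,
        parse: Callable[[str], Any],
        load: Callable[[str], Iterable[Any]],
        evaluate: Callable[[Any, Any], Any],
        write: Callable[[ScanReport], Any],
    ) -> None:
        self._parse = parse
        self._load = load
        self._evaluate = evaluate
        self._write = write

    def validate(self, file: str, rules: str) -> ScanReport:
        doc = self._parse(file)        # Parser: file -> parsed document
        rule_set = self._load(rules)   # Loader: rule file -> rule set
        results = tuple(self._evaluate(rule, doc) for rule in rule_set)
        report = ScanReport(file=file, results=results)
        self._write(report)            # Reporter: emit the finished report
        return report


# Stub collaborators so the sketch runs without touching the file system.
written = []
scanner = Scanner(
    parse=lambda file: {"title": "Guide", "draft": False},
    load=lambda rules: [("title", "Guide"), ("draft", True)],
    evaluate=lambda rule, doc: doc.get(rule[0]) == rule[1],
    write=written.append,
)
report = scanner.validate("guide.md", "rules.json")
print(report.results)  # (True, False)
```

Injecting the collaborators is one way to get the in-memory style of testing; the real service may wire its dependencies differently.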

Extension points

The architecture is designed so that common extensions touch exactly one module.

What you want to add	Where to change	What to do
New operator (e.g., `>=`)	`domain/operators.py`	Add one function; add one entry to `OPERATOR_REGISTRY`
New flag (e.g., `word_count`)	`domain/evaluator.py`	Add one `elif` branch in the flag dispatch
New report format (e.g., Markdown table)	`infrastructure/reporter.py`	Add one `elif` branch in the format dispatch
Different Markdown parser	`infrastructure/parser.py`	Replace the `markdown` + `lxml` call; keep the `ParsedDocument` return type
New workflow step pattern	`services/workflow.py`	Add one `elif` branch in `_dispatch()`

None of these extensions requires changes to the CLI or to the Pydantic models, unless the new capability needs a new contract field.
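
As an illustration of the first row, adding an operator could look like the sketch below. Only the name `OPERATOR_REGISTRY` comes from this document; the function names, the `(actual, expected)` signature, and the `compare` helper are assumptions about how such a registry might be shaped.

```python
from typing import Any, Callable

# Assumed signature: each operator takes (actual, expected) and returns a bool.
OperatorFn = Callable[[Any, Any], bool]


def equals(actual: Any, expected: Any) -> bool:
    return actual == expected


def not_equals(actual: Any, expected: Any) -> bool:
    return actual != expected


OPERATOR_REGISTRY: dict[str, OperatorFn] = {
    "==": equals,
    "!=": not_equals,
}


# The extension: one new pure function ...
def greater_or_equal(actual: Any, expected: Any) -> bool:
    return actual >= expected


# ... and one new registry entry.
OPERATOR_REGISTRY[">="] = greater_or_equal


def compare(token: str, actual: Any, expected: Any) -> bool:
    """Hypothetical lookup helper: dispatch on the operator token."""
    try:
        fn = OPERATOR_REGISTRY[token]
    except KeyError:
        raise ValueError(f"unknown operator: {token!r}") from None
    return fn(actual, expected)


print(compare(">=", 5, 3))  # True
print(compare(">=", 2, 3))  # False
```

With a dictionary lookup, callers never branch on the token themselves, which is what keeps a change like this confined to one module.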

Test organisation

The test suite mirrors the layer structure, so you know exactly where to add a test for any change.

tests/
  unit/
    domain/
      test_models.py        ← Pydantic validation, coercions, frozen behaviour
      test_operators.py     ← one test per operator token
      test_evaluator.py     ← rule evaluation logic
    infrastructure/
      test_parser.py        ← parse front matter, render HTML, XPath availability
      test_loader.py        ← JSON loading, schema normalisation, backward compat
      test_reporter.py      ← JSON and CSV output format
    services/
      test_scanner.py       ← full validate() pipeline (in-memory rule injection)
      test_workflow.py      ← workflow step patterns
  integration/
    test_validate.py        ← CLI validate command end-to-end
  fixtures/
    checkworkflow.json      ← 26-rule reference rule set for integration tests
    concept.json            ← old-schema rule set (tests backward compat)

Coverage gate: ≥ 90% line coverage, enforced by pytest-cov in CI. The CLI modules (`cli/main.py`, `cli/repl.py`) are excluded from the gate because they require a live terminal. For requirements traceability, see the SRS.