Skip to content

Architecture

Layer diagram

graph TD
    CLI["CLI layer<br/>markdown_validator.cli<br/><em>Click batch validator + REPL</em>"]
    SVC["Services layer<br/>markdown_validator.services<br/><em>Scanner · WorkflowEngine</em>"]
    INF["Infrastructure layer<br/>markdown_validator.infrastructure<br/><em>parser · loader · reporter</em>"]
    DOM["Domain layer<br/>markdown_validator.domain<br/><em>models · operators · evaluator · pos</em>"]

    CLI --> SVC
    SVC --> INF
    SVC --> DOM
    INF --> DOM

Dependency rule: each layer may only import from layers below it. The CLI never calls infrastructure directly. Domain depends on nothing internal — no I/O, no side effects.

Module responsibilities

Layer Module Responsibility
Domain models.py Pydantic v2 contract models — all frozen
Domain operators.py Pure comparison strategy functions; OPERATOR_REGISTRY
Domain evaluator.py Apply a single RuleModel to a ParsedDocumentValidationResult
Domain pos.py Thin NLTK wrapper for POS tagging and sentence counting
Infrastructure parser.py Read .md file → ParsedDocument (sole file reader for source docs)
Infrastructure loader.py Read JSON rule-set file → RuleSetModel (Repository pattern)
Infrastructure reporter.py Write ScanReport to JSON or CSV (sole file writer for output)
Services scanner.py Facade: compose parser + loader + evaluator → ScanReport
Services workflow.py Chain-of-Responsibility: execute workflow step sequences
CLI main.py Click batch CLI — parse args, call Scanner, render output
CLI repl.py Interactive cmd.Cmd REPL for rule development

For the full design-pattern justifications, see Design Document.

Data flow

sequenceDiagram
    participant CLI
    participant Scanner
    participant Parser
    participant Loader
    participant Evaluator
    participant Reporter

    CLI->>Scanner: validate(file, rules)
    Scanner->>Parser: parse(file)
    Parser-->>Scanner: ParsedDocument (frozen)
    Scanner->>Loader: load(rules)
    Loader-->>Scanner: RuleSetModel (frozen)
    loop Each rule
        Scanner->>Evaluator: evaluate_rule(rule, doc)
        Evaluator-->>Scanner: ValidationResult (frozen)
    end
    Scanner->>Reporter: write(ScanReport)
    Reporter-->>CLI: text / json / csv output

Extension points

The architecture is designed so that common extensions touch exactly one module.

What you want to add Where to change What to do
New operator (e.g., >=) domain/operators.py Add one function; add one entry to OPERATOR_REGISTRY
New flag (e.g., word_count) domain/evaluator.py Add one elif branch in the flag dispatch
New report format (e.g., Markdown table) infrastructure/reporter.py Add one elif branch in the format dispatch
Different Markdown parser infrastructure/parser.py Replace the markdown + lxml call; keep the ParsedDocument return type
New workflow step pattern services/workflow.py Add one elif branch in _dispatch()

No extension requires touching the CLI or changing the Pydantic models (unless the new capability requires a new contract field).

Test organisation

The test suite mirrors the layer structure, so you know exactly where to add a test for any change.

tests/
  unit/
    domain/
      test_models.py        ← Pydantic validation, coercions, frozen behaviour
      test_operators.py     ← one test per operator token
      test_evaluator.py     ← rule evaluation logic
    infrastructure/
      test_parser.py        ← parse front matter, render HTML, XPath availability
      test_loader.py        ← JSON loading, schema normalisation, backward compat
      test_reporter.py      ← JSON and CSV output format
    services/
      test_scanner.py       ← full validate() pipeline (in-memory rule injection)
      test_workflow.py      ← workflow step patterns
  integration/
    test_validate.py        ← CLI validate command end-to-end
  fixtures/
    checkworkflow.json      ← 26-rule reference rule set for integration tests
    concept.json            ← old-schema rule set (tests backward compat)

Coverage gate: ≥ 90% line coverage enforced by pytest-cov in CI. CLI modules (cli/main.py, cli/repl.py) are excluded from the gate because they require a live terminal. For requirements traceability, see the SRS.