Infrastructure API

Markdown document parser.

Reads a .md file from disk and returns a markdown_validator.domain.models.ParsedDocument. This module is the only place in the codebase that touches the filesystem for reading source documents.

Pipeline

  1. Read the file as UTF-8 text.
  2. Split the YAML front-matter block from the body and parse it with PyYAML.
  3. Convert the body Markdown to HTML using the markdown library.
  4. Return an immutable markdown_validator.domain.models.ParsedDocument.

Raises:

Type               Description
FileNotFoundError  If the given path does not exist.
ValueError         If the YAML front-matter block is missing or malformed.
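Step 2 of the pipeline can be sketched as follows. This is a minimal illustration, not the module's private `_split_front_matter` helper, whose source is not shown on this page; the name `split_front_matter` and its error messages are assumptions. The sketch only separates the raw YAML text from the body. In the real pipeline the YAML text would then be parsed with PyYAML (for example with `yaml.safe_load`).

```python
def split_front_matter(raw: str) -> tuple[str, str]:
    """Split *raw* into (yaml_text, body); hypothetical helper for illustration."""
    lines = raw.splitlines(keepends=True)
    # The document must open with a '---' delimiter line.
    if not lines or lines[0].strip() != "---":
        raise ValueError("front-matter block is missing")
    # Look for the closing '---' delimiter after the opening one.
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            yaml_text = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return yaml_text, body
    raise ValueError("front-matter block is not terminated")


sample = "---\ntitle: My Article\nms.topic: tutorial\n---\n# Body starts here\n"
yaml_text, body = split_front_matter(sample)
print(repr(yaml_text))
print(repr(body))
```

Raising `ValueError` for both a missing and an unterminated block matches the error contract listed above, which is why the sketch uses it.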

find_markdown_files(directory: str | Path) -> list[Path]

Recursively discover all .md files under directory.

Parameters:

Name       Type        Description                Default
directory  str | Path  Root directory to search.  required

Returns:

Type        Description
list[Path]  Sorted list of pathlib.Path objects.

Raises:

Type                Description
NotADirectoryError  If directory is not a directory.

Source code in markdown_validator/infrastructure/parser.py
def find_markdown_files(directory: str | Path) -> list[Path]:
    """Recursively discover all ``.md`` files under *directory*.

    :param directory: Root directory to search.
    :return: Sorted list of :class:`pathlib.Path` objects.
    :raises NotADirectoryError: If *directory* is not a directory.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    files = sorted(root.rglob("*.md"))
    logger.info("find_markdown_files: found %d files under %s", len(files), root)
    return files
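To illustrate the discovery behavior, the sketch below runs the same `Path.rglob("*.md")` plus `sorted` logic against a temporary directory tree. The helper name `discover` and the file names are made up for the example. One detail worth knowing is that `rglob` matches on names only, so a directory whose name ends in `.md` is also returned. The optional `files_only` filter is an assumption about what a caller might want, not part of the documented function.

```python
import tempfile
from pathlib import Path


def discover(root: Path, files_only: bool = False) -> list[Path]:
    """Mirror the rglob-and-sort logic; `files_only` is a hypothetical extra."""
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    matches = sorted(root.rglob("*.md"))
    if files_only:
        # rglob matches names, so drop directories that happen to end in .md.
        matches = [p for p in matches if p.is_file()]
    return matches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide").mkdir()
    (root / "archive.md").mkdir()  # a directory, not a document
    (root / "index.md").write_text("# Index\n", encoding="utf-8")
    (root / "guide" / "setup.md").write_text("# Setup\n", encoding="utf-8")
    (root / "guide" / "notes.txt").write_text("ignored\n", encoding="utf-8")

    everything = [p.relative_to(root).as_posix() for p in discover(root)]
    documents = [p.relative_to(root).as_posix() for p in discover(root, files_only=True)]

print(everything)
print(documents)
```

Callers that pass the result straight to a parser may want to filter out non-files themselves, since the function as documented returns every match.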

parse_document(filepath: str | Path) -> ParsedDocument

Parse a Markdown file and return an immutable ParsedDocument.

The file must begin with a YAML front-matter block delimited by --- lines, for example:

    ---
    title: My Article
    ms.topic: tutorial
    ---
    # Body starts here

Parameters:

Name      Type        Description                     Default
filepath  str | Path  Path to the .md file to parse.  required

Returns:

Type            Description
ParsedDocument  Parsed and validated document representation.

Raises:

Type               Description
FileNotFoundError  If filepath does not exist.
ValueError         If front-matter is absent or cannot be parsed as YAML.

Source code in markdown_validator/infrastructure/parser.py
def parse_document(filepath: str | Path) -> ParsedDocument:
    """Parse a Markdown file and return an immutable :class:`ParsedDocument`.

    The file must begin with a YAML front-matter block delimited by ``---``
    lines, for example::

        ---
        title: My Article
        ms.topic: tutorial
        ---
        # Body starts here

    :param filepath: Path to the ``.md`` file to parse.
    :return: Parsed and validated document representation.
    :raises FileNotFoundError: If *filepath* does not exist.
    :raises ValueError: If front-matter is absent or cannot be parsed as YAML.
    """
    path = Path(filepath)
    logger.info("parse_document: loading %s", path)

    if not path.exists():
        raise FileNotFoundError(f"Markdown file not found: {path}")

    raw = path.read_text(encoding="utf-8")
    metadata, body = _split_front_matter(raw, path)
    html = markdown.markdown(body)

    logger.debug("parse_document: %d metadata keys, %d chars html", len(metadata), len(html))
    return ParsedDocument(filepath=path.resolve(), metadata=metadata, html=html)

Rule-set loader — Repository pattern.

Loads a markdown_validator.domain.models.RuleSetModel from a JSON file. This is the only place in the codebase that reads rule-set files from disk.

Design pattern: Repository. The RuleSetRepository class separates the service layer from the filesystem, so tests can substitute an in-memory rule set without touching the loader.

Raises:

Type               Description
FileNotFoundError  If the JSON file does not exist.
ValueError         If the JSON does not conform to the rule-set schema.

RuleSetRepository

Loads and validates rule-set JSON files.

Usage:

    repo = RuleSetRepository()
    rule_set = repo.load("path/to/rules.json")

Design pattern: Repository. It decouples the scanner service from direct file I/O, making the scanner independently testable.
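The testability claim can be illustrated with a small, self-contained sketch. None of these names come from the package: `RuleSetSource`, `InMemoryRuleSetSource`, `count_rules`, and the plain-dict rule set are hypothetical stand-ins for `RuleSetRepository`, a test double, the scanner service, and `RuleSetModel`. The point is only that a service which depends on a `load` method, rather than on file I/O, can be exercised without a file on disk.

```python
from typing import Protocol


class RuleSetSource(Protocol):
    """Anything that can produce a rule set for a given path (hypothetical)."""

    def load(self, filepath: str) -> dict: ...


class InMemoryRuleSetSource:
    """Test double that returns a canned rule set and never touches disk."""

    def __init__(self, rule_set: dict) -> None:
        self._rule_set = rule_set

    def load(self, filepath: str) -> dict:
        # The path is ignored; the canned rule set is returned as-is.
        return self._rule_set


def count_rules(source: RuleSetSource, filepath: str) -> int:
    """Stand-in for a service that only needs the loaded rule set."""
    rules = source.load(filepath)["rules"]
    return len(rules["header"]) + len(rules["body"])


fake = InMemoryRuleSetSource(
    {"rules": {"header": [{"id": 1}], "body": [{"id": 2}, {"id": 3}]}}
)
print(count_rules(fake, "unused.json"))
```

In the package itself, `load_from_dict` serves a similar purpose by building a validated rule set from a dictionary.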

Source code in markdown_validator/infrastructure/loader.py
class RuleSetRepository:
    """Loads and validates rule-set JSON files.

    Usage::

        repo = RuleSetRepository()
        rule_set = repo.load("path/to/rules.json")

    Design pattern: **Repository** — decouples the scanner service from
    direct file I/O, making the scanner independently testable.
    """

    def load(self, filepath: str | Path) -> RuleSetModel:
        """Load and validate a rule-set from a JSON file.

        :param filepath: Path to the ``.json`` rule-set file.
        :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
        :raises FileNotFoundError: If *filepath* does not exist.
        :raises ValueError: If the JSON is invalid or fails schema validation.
        """
        path = Path(filepath)
        logger.info("RuleSetRepository.load: reading %s", path)

        if not path.exists():
            raise FileNotFoundError(f"Rule-set file not found: {path}")

        raw = path.read_text(encoding="utf-8")
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Rule-set file {path} is not valid JSON: {exc}") from exc

        try:
            rule_set = RuleSetModel.model_validate(data)
        except ValidationError as exc:
            raise ValueError(
                f"Rule-set file {path} failed schema validation:\n{exc}"
            ) from exc

        n_rules = len(rule_set.rules.header) + len(rule_set.rules.body)
        n_workflows = len(rule_set.workflows)
        logger.info(
            "RuleSetRepository.load: loaded %d rules (%d header, %d body), %d workflows",
            n_rules,
            len(rule_set.rules.header),
            len(rule_set.rules.body),
            n_workflows,
        )
        return rule_set

    def load_from_dict(self, data: dict) -> RuleSetModel:  # type: ignore[type-arg]
        """Load and validate a rule-set from an already-parsed dictionary.

        Useful in tests to construct rule sets without filesystem access.

        :param data: Dictionary conforming to the rule-set schema.
        :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
        :raises ValueError: If *data* fails schema validation.
        """
        try:
            return RuleSetModel.model_validate(data)
        except ValidationError as exc:
            raise ValueError(f"Rule-set dict failed schema validation:\n{exc}") from exc

load(filepath: str | Path) -> RuleSetModel

Load and validate a rule-set from a JSON file.

Parameters:

Name      Type        Description                       Default
filepath  str | Path  Path to the .json rule-set file.  required

Returns:

Type          Description
RuleSetModel  A validated, immutable markdown_validator.domain.models.RuleSetModel.

Raises:

Type               Description
FileNotFoundError  If filepath does not exist.
ValueError         If the JSON is invalid or fails schema validation.

Source code in markdown_validator/infrastructure/loader.py
def load(self, filepath: str | Path) -> RuleSetModel:
    """Load and validate a rule-set from a JSON file.

    :param filepath: Path to the ``.json`` rule-set file.
    :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
    :raises FileNotFoundError: If *filepath* does not exist.
    :raises ValueError: If the JSON is invalid or fails schema validation.
    """
    path = Path(filepath)
    logger.info("RuleSetRepository.load: reading %s", path)

    if not path.exists():
        raise FileNotFoundError(f"Rule-set file not found: {path}")

    raw = path.read_text(encoding="utf-8")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Rule-set file {path} is not valid JSON: {exc}") from exc

    try:
        rule_set = RuleSetModel.model_validate(data)
    except ValidationError as exc:
        raise ValueError(
            f"Rule-set file {path} failed schema validation:\n{exc}"
        ) from exc

    n_rules = len(rule_set.rules.header) + len(rule_set.rules.body)
    n_workflows = len(rule_set.workflows)
    logger.info(
        "RuleSetRepository.load: loaded %d rules (%d header, %d body), %d workflows",
        n_rules,
        len(rule_set.rules.header),
        len(rule_set.rules.body),
        n_workflows,
    )
    return rule_set
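The error-translation pattern in `load`, where low-level exceptions are re-raised as `ValueError` with `from exc`, can be seen in isolation in the sketch below. It uses only the standard library and omits the schema-validation step; `read_json_strict` is a hypothetical name. Because of `raise ... from exc`, the original `json.JSONDecodeError` stays available on `__cause__` for callers that need the line and column of the failure.

```python
import json
import tempfile
from pathlib import Path


def read_json_strict(filepath: Path) -> object:
    """Read JSON, mapping decode failures to ValueError (illustrative only)."""
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"Rule-set file not found: {path}")
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Chain the original error so its position info is not lost.
        raise ValueError(f"Rule-set file {path} is not valid JSON: {exc}") from exc


with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "rules.json"
    broken.write_text('{"rules": ', encoding="utf-8")  # truncated on purpose
    try:
        read_json_strict(broken)
    except ValueError as err:
        cause = err.__cause__
        print(type(cause).__name__, cause.lineno, cause.colno)
```

A caller therefore only has to handle `FileNotFoundError` and `ValueError`, while the chained cause remains visible in tracebacks.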

load_from_dict(data: dict) -> RuleSetModel

Load and validate a rule-set from an already-parsed dictionary.

Useful in tests to construct rule sets without filesystem access.

Parameters:

Name  Type  Description                                    Default
data  dict  Dictionary conforming to the rule-set schema.  required

Returns:

Type          Description
RuleSetModel  A validated, immutable markdown_validator.domain.models.RuleSetModel.

Raises:

Type        Description
ValueError  If data fails schema validation.

Source code in markdown_validator/infrastructure/loader.py
def load_from_dict(self, data: dict) -> RuleSetModel:  # type: ignore[type-arg]
    """Load and validate a rule-set from an already-parsed dictionary.

    Useful in tests to construct rule sets without filesystem access.

    :param data: Dictionary conforming to the rule-set schema.
    :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
    :raises ValueError: If *data* fails schema validation.
    """
    try:
        return RuleSetModel.model_validate(data)
    except ValidationError as exc:
        raise ValueError(f"Rule-set dict failed schema validation:\n{exc}") from exc

Scan result reporter.

Writes markdown_validator.domain.models.ScanReport objects to disk in JSON or CSV format. This is the only place in the codebase that writes output files.

Raises:

Type     Description
OSError  If the destination directory cannot be created or the file cannot be written.

write_csv_report(report: ScanReport, output_path: str | Path) -> Path

Write a ScanReport as a flat CSV file.

Each row represents one markdown_validator.domain.models.ValidationResult. The parent directory is created if it does not already exist.

Parameters:

Name         Type        Description                          Default
report       ScanReport  Validated scan report to serialise.  required
output_path  str | Path  Destination .csv file path.          required

Returns:

Type  Description
Path  Resolved path of the written file.

Raises:

Type     Description
OSError  On filesystem errors.

Source code in markdown_validator/infrastructure/reporter.py
def write_csv_report(report: ScanReport, output_path: str | Path) -> Path:
    """Write a :class:`ScanReport` as a flat CSV file.

    Each row represents one :class:`~markdown_validator.domain.models.ValidationResult`.
    The parent directory is created if it does not already exist.

    :param report: Validated scan report to serialise.
    :param output_path: Destination ``.csv`` file path.
    :return: Resolved path of the written file.
    :raises OSError: On filesystem errors.
    """
    dest = Path(output_path)
    dest.parent.mkdir(parents=True, exist_ok=True)

    fieldnames = [
        "filepath",
        "rule_id",
        "rule_name",
        "passed",
        "level",
        "expected_value",
        "actual_value",
        "mitigation",
    ]

    with dest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for result in report.results:
            writer.writerow(
                {
                    "filepath": report.filepath,
                    "rule_id": result.rule_id,
                    "rule_name": result.rule_name,
                    "passed": result.passed,
                    "level": result.level,
                    "expected_value": result.expected_value,
                    "actual_value": result.actual_value,
                    "mitigation": result.mitigation,
                }
            )

    logger.info("write_csv_report: wrote %d rows to %s", len(report.results), dest)
    return dest.resolve()
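The row layout can be illustrated without the package itself. In the sketch below, `Result` and `Report` are hypothetical dataclasses standing in for `ValidationResult` and `ScanReport`, with only a subset of the columns and invented values. It shows two consequences of the flat layout: the report-level `filepath` is repeated on every row, and the `csv` module writes booleans as the strings `True` and `False`, so a reader has to convert them back.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Result:  # hypothetical stand-in for ValidationResult
    rule_id: int
    passed: bool
    mitigation: str


@dataclass
class Report:  # hypothetical stand-in for ScanReport
    filepath: str
    results: list = field(default_factory=list)


def to_csv_text(report: Report) -> str:
    """Render one row per result, repeating the report-level filepath."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=["filepath", "rule_id", "passed", "mitigation"]
    )
    writer.writeheader()
    for result in report.results:
        writer.writerow(
            {
                "filepath": report.filepath,
                "rule_id": result.rule_id,
                "passed": result.passed,
                "mitigation": result.mitigation,
            }
        )
    return buffer.getvalue()


report = Report("docs/intro.md", [Result(1, True, ""), Result(2, False, "Add a title")])
text = to_csv_text(report)
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[1]["passed"], rows[1]["filepath"])
```

Writing to an in-memory buffer keeps the sketch free of filesystem side effects; the documented function writes to `output_path` instead.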

write_json_report(report: ScanReport, output_path: str | Path) -> Path

Serialise a ScanReport to a JSON file.

The parent directory is created if it does not already exist.

Parameters:

Name         Type        Description                          Default
report       ScanReport  Validated scan report to serialise.  required
output_path  str | Path  Destination .json file path.         required

Returns:

Type  Description
Path  Resolved path of the written file.

Raises:

Type     Description
OSError  On filesystem errors.

Source code in markdown_validator/infrastructure/reporter.py
def write_json_report(report: ScanReport, output_path: str | Path) -> Path:
    """Serialise a :class:`ScanReport` to a JSON file.

    The parent directory is created if it does not already exist.

    :param report: Validated scan report to serialise.
    :param output_path: Destination ``.json`` file path.
    :return: Resolved path of the written file.
    :raises OSError: On filesystem errors.
    """
    dest = Path(output_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    payload = report.model_dump()
    dest.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    logger.info("write_json_report: wrote %s", dest)
    return dest.resolve()
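One thing to watch when combining `model_dump()` with `json.dumps` is that `json.dumps` only accepts JSON-native types. Whether `ScanReport` holds other types, such as a `pathlib.Path`, is not shown on this page, so the following is a sketch under that assumption, using a plain dict as a hypothetical payload. It shows the failure and one possible workaround, the standard `default=` hook. With pydantic v2, `model_dump(mode="json")` is another option.

```python
import json
from pathlib import Path

# Hypothetical payload, as model_dump() might return if a field holds a Path.
payload = {"filepath": Path("docs/intro.md"), "results": []}

try:
    json.dumps(payload, indent=2)
    failed = False
except TypeError:
    # Path is not a JSON-native type, so the default encoder rejects it.
    failed = True

# default=str converts any unsupported value to its string form.
text = json.dumps(payload, indent=2, default=str)
print(failed)
print(json.loads(text)["filepath"])
```

If the report model only contains strings, numbers, booleans, lists, and dicts, the code as documented needs no such adjustment.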