Infrastructure API

Markdown document parser.

Reads a .md file from disk and returns a :class:~markdown_validator.domain.models.ParsedDocument. This module is the only place in the codebase that touches the filesystem for reading source documents.

Pipeline

Read the raw file bytes (UTF-8).
Split the YAML front-matter block from the body using PyYAML.
Convert the body Markdown to HTML using the markdown library.
Return an immutable :class:~markdown_validator.domain.models.ParsedDocument.

Raises:

Type	Description
`FileNotFoundError`	If the given path does not exist.
`ValueError`	If the YAML front-matter block is missing or malformed.

`find_markdown_files(directory: str | Path) -> list[Path]`

Recursively discover all .md files under directory.

Parameters:

Name	Type	Description	Default
`directory`	`str \| Path`	Root directory to search.	required

Returns:

Type	Description
`list[Path]`	Sorted list of :class:`pathlib.Path` objects.

Raises:

Type	Description
`NotADirectoryError`	If directory is not a directory.

Source code in markdown_validator/infrastructure/parser.py

def find_markdown_files(directory: str | Path) -> list[Path]:
    """Recursively discover all ``.md`` files under *directory*.

    :param directory: Root directory to search.
    :return: Sorted list of :class:`pathlib.Path` objects.
    :raises NotADirectoryError: If *directory* is not a directory.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    files = sorted(root.rglob("*.md"))
    logger.info("find_markdown_files: found %d files under %s", len(files), root)
    return files

`parse_document(filepath: str | Path) -> ParsedDocument`

Parse a Markdown file and return an immutable :class:ParsedDocument.

The file must begin with a YAML front-matter block delimited by --- lines, for example::

---
title: My Article
ms.topic: tutorial
---
# Body starts here

Parameters:

Name	Type	Description	Default
`filepath`	`str \| Path`	Path to the `.md` file to parse.	required

Returns:

Type	Description
`ParsedDocument`	Parsed and validated document representation.

Raises:

Type	Description
`FileNotFoundError`	If filepath does not exist.
`ValueError`	If front-matter is absent or cannot be parsed as YAML.

Source code in markdown_validator/infrastructure/parser.py

def parse_document(filepath: str | Path) -> ParsedDocument:
    """Parse a Markdown file and return an immutable :class:`ParsedDocument`.

    The file must begin with a YAML front-matter block delimited by ``---``
    lines, for example::

        ---
        title: My Article
        ms.topic: tutorial
        ---
        # Body starts here

    :param filepath: Path to the ``.md`` file to parse.
    :return: Parsed and validated document representation.
    :raises FileNotFoundError: If *filepath* does not exist.
    :raises ValueError: If front-matter is absent or cannot be parsed as YAML.
    """
    path = Path(filepath)
    logger.info("parse_document: loading %s", path)

    if not path.exists():
        raise FileNotFoundError(f"Markdown file not found: {path}")

    raw = path.read_text(encoding="utf-8")
    metadata, body = _split_front_matter(raw, path)
    html = markdown.markdown(body)

    logger.debug("parse_document: %d metadata keys, %d chars html", len(metadata), len(html))
    return ParsedDocument(filepath=path.resolve(), metadata=metadata, html=html)

Rule-set loader — Repository pattern.

Loads a :class:~markdown_validator.domain.models.RuleSetModel from a JSON file. This is the only place in the codebase that reads rule-set files from disk.

Design pattern: Repository — the :class:RuleSetRepository class separates the service layer from the filesystem. Tests can substitute an in-memory rule set without touching the loader.

Raises:

Type	Description
`FileNotFoundError`	If the JSON file does not exist.
`ValueError`	If the JSON does not conform to the rule-set schema.

`RuleSetRepository`

Loads and validates rule-set JSON files.

Usage::

repo = RuleSetRepository()
rule_set = repo.load("path/to/rules.json")

Design pattern: Repository — decouples the scanner service from direct file I/O, making the scanner independently testable.

Source code in markdown_validator/infrastructure/loader.py

class RuleSetRepository:
    """Loads and validates rule-set JSON files.

    Usage::

        repo = RuleSetRepository()
        rule_set = repo.load("path/to/rules.json")

    Design pattern: **Repository** — decouples the scanner service from
    direct file I/O, making the scanner independently testable.
    """

    def load(self, filepath: str | Path) -> RuleSetModel:
        """Load and validate a rule-set from a JSON file.

        :param filepath: Path to the ``.json`` rule-set file.
        :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
        :raises FileNotFoundError: If *filepath* does not exist.
        :raises ValueError: If the JSON is invalid or fails schema validation.
        """
        path = Path(filepath)
        logger.info("RuleSetRepository.load: reading %s", path)

        if not path.exists():
            raise FileNotFoundError(f"Rule-set file not found: {path}")

        raw = path.read_text(encoding="utf-8")
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Rule-set file {path} is not valid JSON: {exc}") from exc

        try:
            rule_set = RuleSetModel.model_validate(data)
        except ValidationError as exc:
            raise ValueError(
                f"Rule-set file {path} failed schema validation:\n{exc}"
            ) from exc

        n_rules = len(rule_set.rules.header) + len(rule_set.rules.body)
        n_workflows = len(rule_set.workflows)
        logger.info(
            "RuleSetRepository.load: loaded %d rules (%d header, %d body), %d workflows",
            n_rules,
            len(rule_set.rules.header),
            len(rule_set.rules.body),
            n_workflows,
        )
        return rule_set

    def load_from_dict(self, data: dict) -> RuleSetModel:  # type: ignore[type-arg]
        """Load and validate a rule-set from an already-parsed dictionary.

        Useful in tests to construct rule sets without filesystem access.

        :param data: Dictionary conforming to the rule-set schema.
        :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
        :raises ValueError: If *data* fails schema validation.
        """
        try:
            return RuleSetModel.model_validate(data)
        except ValidationError as exc:
            raise ValueError(f"Rule-set dict failed schema validation:\n{exc}") from exc

`load(filepath: str | Path) -> RuleSetModel`

Load and validate a rule-set from a JSON file.

Parameters:

Name	Type	Description	Default
`filepath`	`str \| Path`	Path to the `.json` rule-set file.	required

Returns:

Type	Description
`RuleSetModel`	A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.

Raises:

Type	Description
`FileNotFoundError`	If filepath does not exist.
`ValueError`	If the JSON is invalid or fails schema validation.

Source code in markdown_validator/infrastructure/loader.py

def load(self, filepath: str | Path) -> RuleSetModel:
    """Load and validate a rule-set from a JSON file.

    :param filepath: Path to the ``.json`` rule-set file.
    :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
    :raises FileNotFoundError: If *filepath* does not exist.
    :raises ValueError: If the JSON is invalid or fails schema validation.
    """
    path = Path(filepath)
    logger.info("RuleSetRepository.load: reading %s", path)

    if not path.exists():
        raise FileNotFoundError(f"Rule-set file not found: {path}")

    raw = path.read_text(encoding="utf-8")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Rule-set file {path} is not valid JSON: {exc}") from exc

    try:
        rule_set = RuleSetModel.model_validate(data)
    except ValidationError as exc:
        raise ValueError(
            f"Rule-set file {path} failed schema validation:\n{exc}"
        ) from exc

    n_rules = len(rule_set.rules.header) + len(rule_set.rules.body)
    n_workflows = len(rule_set.workflows)
    logger.info(
        "RuleSetRepository.load: loaded %d rules (%d header, %d body), %d workflows",
        n_rules,
        len(rule_set.rules.header),
        len(rule_set.rules.body),
        n_workflows,
    )
    return rule_set

`load_from_dict(data: dict) -> RuleSetModel`

Load and validate a rule-set from an already-parsed dictionary.

Useful in tests to construct rule sets without filesystem access.

Parameters:

Name	Type	Description	Default
`data`	`dict`	Dictionary conforming to the rule-set schema.	required

Returns:

Type	Description
`RuleSetModel`	A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.

Raises:

Type	Description
`ValueError`	If data fails schema validation.

Source code in markdown_validator/infrastructure/loader.py

def load_from_dict(self, data: dict) -> RuleSetModel:  # type: ignore[type-arg]
    """Load and validate a rule-set from an already-parsed dictionary.

    Useful in tests to construct rule sets without filesystem access.

    :param data: Dictionary conforming to the rule-set schema.
    :return: A validated, immutable :class:`~markdown_validator.domain.models.RuleSetModel`.
    :raises ValueError: If *data* fails schema validation.
    """
    try:
        return RuleSetModel.model_validate(data)
    except ValidationError as exc:
        raise ValueError(f"Rule-set dict failed schema validation:\n{exc}") from exc

Scan result reporter.

Writes :class:~markdown_validator.domain.models.ScanReport objects to disk in JSON or CSV format. This is the only place in the codebase that writes output files.

Raises:

Type	Description
`OSError`	If the destination directory cannot be created or the file cannot be written.

`write_csv_report(report: ScanReport, output_path: str | Path) -> Path`

Write a :class:ScanReport as a flat CSV file.

Each row represents one :class:~markdown_validator.domain.models.ValidationResult. The parent directory is created if it does not already exist.

Parameters:

Name	Type	Description	Default
`report`	`ScanReport`	Validated scan report to serialise.	required
`output_path`	`str \| Path`	Destination `.csv` file path.	required

Returns:

Type	Description
`Path`	Resolved path of the written file.

Raises:

Type	Description
`OSError`	On filesystem errors.

Source code in markdown_validator/infrastructure/reporter.py

def write_csv_report(report: ScanReport, output_path: str | Path) -> Path:
    """Write a :class:`ScanReport` as a flat CSV file.

    Each row represents one :class:`~markdown_validator.domain.models.ValidationResult`.
    The parent directory is created if it does not already exist.

    :param report: Validated scan report to serialise.
    :param output_path: Destination ``.csv`` file path.
    :return: Resolved path of the written file.
    :raises OSError: On filesystem errors.
    """
    dest = Path(output_path)
    dest.parent.mkdir(parents=True, exist_ok=True)

    fieldnames = [
        "filepath",
        "rule_id",
        "rule_name",
        "passed",
        "level",
        "expected_value",
        "actual_value",
        "mitigation",
    ]

    with dest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for result in report.results:
            writer.writerow(
                {
                    "filepath": report.filepath,
                    "rule_id": result.rule_id,
                    "rule_name": result.rule_name,
                    "passed": result.passed,
                    "level": result.level,
                    "expected_value": result.expected_value,
                    "actual_value": result.actual_value,
                    "mitigation": result.mitigation,
                }
            )

    logger.info("write_csv_report: wrote %d rows to %s", len(report.results), dest)
    return dest.resolve()

`write_json_report(report: ScanReport, output_path: str | Path) -> Path`

Serialise a :class:ScanReport to a JSON file.

The parent directory is created if it does not already exist.

Parameters:

Name	Type	Description	Default
`report`	`ScanReport`	Validated scan report to serialise.	required
`output_path`	`str \| Path`	Destination `.json` file path.	required

Returns:

Type	Description
`Path`	Resolved path of the written file.

Raises:

Type	Description
`OSError`	On filesystem errors.

Source code in markdown_validator/infrastructure/reporter.py

def write_json_report(report: ScanReport, output_path: str | Path) -> Path:
    """Serialise a :class:`ScanReport` to a JSON file.

    The parent directory is created if it does not already exist.

    :param report: Validated scan report to serialise.
    :param output_path: Destination ``.json`` file path.
    :return: Resolved path of the written file.
    :raises OSError: On filesystem errors.
    """
    dest = Path(output_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    payload = report.model_dump()
    dest.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    logger.info("write_json_report: wrote %s", dest)
    return dest.resolve()