glazing.base

Base models and shared functionality.

base

Base models and utilities for the glazing package.

This module provides base classes and common functionality used throughout the glazing package. All dataset-specific models inherit from these base classes to ensure consistent behavior and validation.

CLASS DESCRIPTION
GlazingBaseModel

Extended Pydantic BaseModel with JSON Lines support.

CrossReferenceBase

Base class for cross-dataset references.

MappingBase

Base class for dataset mappings.

FUNCTION DESCRIPTION
validate_pattern

Validate a string against a regex pattern.

validate_confidence_score

Validate that a confidence score is between 0.0 and 1.0.

Notes

This module uses Pydantic v2 for data validation and serialization. All models support JSON Lines export/import for efficient data storage.

Classes

ConflictResolution pydantic-model

Bases: GlazingBaseModel

Model for representing mapping conflict resolution.

ATTRIBUTE DESCRIPTION
conflict_type

Type of conflict detected.

TYPE: ConflictType

resolution_strategy

Strategy used to resolve the conflict.

TYPE: str

selected_mapping

The mapping selected after resolution.

TYPE: CrossReferenceBase | None

rejected_mappings

Mappings that were rejected.

TYPE: list[CrossReferenceBase]

resolution_confidence

Confidence in the resolution.

TYPE: MappingConfidenceScore

Fields:

  • conflict_type (ConflictType)
  • resolution_strategy (str)
  • selected_mapping (CrossReferenceBase | None)
  • rejected_mappings (list[CrossReferenceBase])
  • resolution_confidence (MappingConfidenceScore)

Validators:

  • validate_resolution

Functions
validate_resolution() -> Self pydantic-validator

A resolution must include a selected mapping, at least one rejected mapping, or both.

Source code in src/glazing/base.py
@model_validator(mode="after")
def validate_resolution(self) -> Self:
    """A resolution must include a selected mapping or at least one rejected mapping."""
    if not self.selected_mapping and not self.rejected_mappings:
        raise ValueError("Resolution must have either selected or rejected mappings")
    return self

CrossReferenceBase pydantic-model

Bases: GlazingBaseModel

Base class for cross-dataset references.

Provides common fields and validation for references between FrameNet, PropBank, VerbNet, and WordNet.

ATTRIBUTE DESCRIPTION
source_dataset

The source dataset.

TYPE: DatasetType

source_id

Identifier in the source dataset.

TYPE: str

target_dataset

The target dataset.

TYPE: DatasetType

target_id

Identifier(s) in the target dataset.

TYPE: str | list[str]

mapping_type

Type of mapping relationship.

TYPE: MappingType

confidence

Confidence score for the mapping.

TYPE: MappingConfidenceScore | None

mapping_source

Provenance of the mapping.

TYPE: MappingSource | None

notes

Additional notes about the mapping.

TYPE: str | None

Fields:

  • source_dataset (DatasetType)
  • source_id (str)
  • target_dataset (DatasetType)
  • target_id (str | list[str])
  • mapping_type (MappingType)
  • confidence (MappingConfidenceScore | None)
  • mapping_source (MappingSource | None)
  • notes (str | None)

Validators:

  • validate_datasets
  • validate_ids (source_id, target_id)

Functions
get_confidence_score() -> float

Get confidence score with default fallback.

RETURNS DESCRIPTION
float

Confidence score, or 0.5 if not specified.

Source code in src/glazing/base.py
def get_confidence_score(self) -> float:
    """Get confidence score with default fallback.

    Returns
    -------
    float
        Confidence score, or 0.5 if not specified.
    """
    return self.confidence if self.confidence is not None else 0.5
is_high_confidence(threshold: float = 0.8) -> bool

Check if this is a high-confidence mapping.

PARAMETER DESCRIPTION
threshold

Minimum confidence score for high confidence.

TYPE: float DEFAULT: 0.8

RETURNS DESCRIPTION
bool

True if the confidence score is greater than or equal to the threshold.

Source code in src/glazing/base.py
def is_high_confidence(self, threshold: float = 0.8) -> bool:
    """Check if this is a high-confidence mapping.

    Parameters
    ----------
    threshold : float, default=0.8
        Minimum confidence score for high confidence.

    Returns
    -------
    bool
        True if the confidence score is greater than or equal to the threshold.
    """
    return self.get_confidence_score() >= threshold
validate_datasets() -> Self pydantic-validator

Source and target must be different datasets, except for inherited and transitive mappings.

Source code in src/glazing/base.py
@model_validator(mode="after")
def validate_datasets(self) -> Self:
    """Source and target must differ, except for inherited and transitive mappings."""
    if self.source_dataset == self.target_dataset and self.mapping_type not in (
        "inherited",
        "transitive",
    ):
        msg = f"Source and target datasets cannot be the same for {self.mapping_type} mappings"
        raise ValueError(msg)
    return self
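The exception for inherited and transitive mappings can be sketched as a plain function. The dataset names and the "direct" mapping type used below are placeholders chosen for illustration; the actual DatasetType and MappingType values are not shown on this page. check_datasets and is_allowed are hypothetical helpers.

```python
# Mapping types for which source and target may be the same dataset,
# taken from the validator above.
SAME_DATASET_ALLOWED = ("inherited", "transitive")


def check_datasets(source_dataset: str, target_dataset: str, mapping_type: str) -> None:
    """Raise ValueError when a same-dataset mapping is neither inherited nor transitive."""
    if source_dataset == target_dataset and mapping_type not in SAME_DATASET_ALLOWED:
        msg = f"Source and target datasets cannot be the same for {mapping_type} mappings"
        raise ValueError(msg)


def is_allowed(source_dataset: str, target_dataset: str, mapping_type: str) -> bool:
    """Return True when check_datasets accepts the combination."""
    try:
        check_datasets(source_dataset, target_dataset, mapping_type)
    except ValueError:
        return False
    return True
```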
validate_ids(v: str | list[str]) -> str | list[str] pydantic-validator

IDs must be non-empty strings or non-empty lists of non-empty strings.

Source code in src/glazing/base.py
@field_validator("source_id", "target_id")
@classmethod
def validate_ids(cls, v: str | list[str]) -> str | list[str]:
    """IDs must be non-empty strings or non-empty lists of non-empty strings."""
    if isinstance(v, str):
        if not v.strip():
            raise ValueError("ID cannot be empty")
    elif isinstance(v, list):
        if not v:
            raise ValueError("ID list cannot be empty")
        for item in v:
            if not isinstance(item, str) or not item.strip():
                raise ValueError("All IDs must be non-empty strings")
    return v

GlazingBaseModel pydantic-model

Bases: BaseModel

Base model class for all glazing data models.

Extends Pydantic's BaseModel with JSON Lines support and common validation functionality used across all linguistic datasets.

ATTRIBUTE DESCRIPTION
model_config

Pydantic configuration for the model.

TYPE: ConfigDict

METHOD DESCRIPTION
to_jsonl

Export model to JSON Lines format.

from_jsonl

Load model from JSON Lines format.

to_json_lines_file

Write model to JSON Lines file.

from_json_lines_file

Load models from a JSON Lines file.

Examples:

>>> class MyModel(GlazingBaseModel):
...     name: str
...     value: int
>>> model = MyModel(name="test", value=42)
>>> jsonl = model.to_jsonl()
>>> loaded = MyModel.from_jsonl(jsonl)

Config:

  • populate_by_name: True
  • validate_assignment: True
  • use_enum_values: False
  • json_schema_extra: {'description': 'Base model for glazing package data structures'}

Functions
from_json_lines_file(path: Path | str, skip_errors: bool = False) -> Generator[Self, None, None] classmethod

Load models from a JSON Lines file.

PARAMETER DESCRIPTION
path

Path to the JSON Lines file.

TYPE: Path | str

skip_errors

If True, skip lines that fail validation.

TYPE: bool DEFAULT: False

YIELDS DESCRIPTION
Self

Instances of the model class.

RAISES DESCRIPTION
ValueError

If skip_errors is False and a line fails validation.

Source code in src/glazing/base.py
@classmethod
def from_json_lines_file(
    cls, path: Path | str, skip_errors: bool = False
) -> Generator[Self, None, None]:
    """Load models from a JSON Lines file.

    Parameters
    ----------
    path : Path | str
        Path to the JSON Lines file.
    skip_errors : bool, default=False
        If True, skip lines that fail validation.

    Yields
    ------
    Self
        Instances of the model class.

    Raises
    ------
    ValueError
        If skip_errors is False and a line fails validation.
    """
    path = Path(path)
    with path.open("r", encoding="utf-8") as f:
        for line_num, raw_line in enumerate(f, 1):
            line = raw_line.strip()
            if not line:  # Skip empty lines
                continue
            try:
                yield cls.from_jsonl(line)
            except (json.JSONDecodeError, ValueError, TypeError) as e:
                if not skip_errors:
                    msg = f"Error on line {line_num}: {e}"
                    raise ValueError(msg) from e
from_jsonl(line: str) -> Self classmethod

Load model from a JSON Lines string.

PARAMETER DESCRIPTION
line

Single line of JSON Lines format.

TYPE: str

RETURNS DESCRIPTION
Self

Instance of the model class.

RAISES DESCRIPTION
ValueError

If the JSON is invalid or doesn't match the model schema.

Source code in src/glazing/base.py
@classmethod
def from_jsonl(cls, line: str) -> Self:
    """Load model from a JSON Lines string.

    Parameters
    ----------
    line : str
        Single line of JSON Lines format.

    Returns
    -------
    Self
        Instance of the model class.

    Raises
    ------
    ValueError
        If the JSON is invalid or doesn't match the model schema.
    """
    data = json.loads(line)
    return cls.model_validate(data)
to_json_lines_file(path: Path | str) -> None

Write the model to a JSON Lines file as a single line, overwriting any existing file at that path.

PARAMETER DESCRIPTION
path

Path to the output file.

TYPE: Path | str

Source code in src/glazing/base.py
def to_json_lines_file(self, path: Path | str) -> None:
    """Write model to a JSON Lines file.

    Parameters
    ----------
    path : Path | str
        Path to the output file.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        f.write(self.to_jsonl())
        f.write("\n")
to_jsonl() -> str

Export model to JSON Lines format.

RETURNS DESCRIPTION
str

JSON Lines string representation of the model.

Source code in src/glazing/base.py
def to_jsonl(self) -> str:
    """Export model to JSON Lines format.

    Returns
    -------
    str
        JSON Lines string representation of the model.
    """
    return json.dumps(self.model_dump(mode="json"), ensure_ascii=False)
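The json.dumps call above is what makes the output safe to store one record per line. A short sketch with a plain dict (the record contents are made up for illustration) shows the two relevant properties: non-ASCII text is kept as-is because of ensure_ascii=False, and newlines inside string values are escaped rather than emitted.

```python
import json

# Hypothetical record standing in for model_dump(mode="json") output.
record = {"lemma": "café", "gloss": "line one\nline two"}

# Same call shape as to_jsonl: compact, single-line, non-ASCII preserved.
line = json.dumps(record, ensure_ascii=False)

# Parsing the line restores the original values, including the embedded newline.
restored = json.loads(line)
```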
validate_many(items: Iterable[dict[str, ModelValue]]) -> list[tuple[Self | None, Exception | None]] classmethod

Validate multiple items and return results with errors.

PARAMETER DESCRIPTION
items

Items to validate.

TYPE: Iterable[dict[str, ModelValue]]

RETURNS DESCRIPTION
list[tuple[Self | None, Exception | None]]

List of (model, error) tuples. If validation succeeds, error is None. If validation fails, model is None.

Source code in src/glazing/base.py
@classmethod
def validate_many(
    cls, items: Iterable[dict[str, ModelValue]]
) -> list[tuple[Self | None, Exception | None]]:
    """Validate multiple items and return results with errors.

    Parameters
    ----------
    items : Iterable[dict[str, ModelValue]]
        Items to validate.

    Returns
    -------
    list[tuple[Self | None, Exception | None]]
        List of (model, error) tuples. If validation succeeds,
        error is None. If validation fails, model is None.
    """
    results: list[tuple[Self | None, Exception | None]] = []
    for item in items:
        try:
            model = cls.model_validate(item)
            results.append((model, None))
        except (ValueError, TypeError) as e:
            results.append((None, e))
    return results

MappingBase pydantic-model

Bases: GlazingBaseModel

Base class for mapping metadata.

Provides common fields for tracking mapping provenance, validation status, and versioning.

ATTRIBUTE DESCRIPTION
created_date

When the mapping was created.

TYPE: datetime

created_by

Person or system that created the mapping.

TYPE: str

modified_date

When the mapping was last modified.

TYPE: datetime | None

modified_by

Person or system that last modified the mapping.

TYPE: str | None

version

Dataset version this mapping was created for.

TYPE: VersionString

validation_status

Current validation status.

TYPE: ValidationStatus

validation_method

How the mapping was validated.

TYPE: str | None

Fields:

  • created_date (datetime)
  • created_by (str)
  • modified_date (datetime | None)
  • modified_by (str | None)
  • version (VersionString)
  • validation_status (ValidationStatus)
  • validation_method (str | None)

Validators:

  • validate_modification

Functions
mark_validated(method: str, validator: str | None = None) -> None

Mark the mapping as validated.

PARAMETER DESCRIPTION
method

Validation method used.

TYPE: str

validator

Person or system that performed validation.

TYPE: str | None DEFAULT: None

Source code in src/glazing/base.py
def mark_validated(self, method: str, validator: str | None = None) -> None:
    """Mark the mapping as validated.

    Parameters
    ----------
    method : str
        Validation method used.
    validator : str | None
        Person or system that performed validation.
    """
    # Temporarily disable validation to set both fields
    original_config = self.model_config.get("validate_assignment", True)
    self.model_config["validate_assignment"] = False

    try:
        self.validation_status = "validated"
        self.validation_method = method
        if validator:
            self.modified_by = validator
            self.modified_date = datetime.now(UTC)
    finally:
        self.model_config["validate_assignment"] = original_config
validate_modification() -> Self pydantic-validator

modified_by and modified_date must be set together: if either is set, the other is required.

Source code in src/glazing/base.py
@model_validator(mode="after")
def validate_modification(self) -> Self:
    """modified_by and modified_date must be set together."""
    if self.modified_date and not self.modified_by:
        raise ValueError("modified_by required when modified_date is set")
    if self.modified_by and not self.modified_date:
        raise ValueError("modified_date required when modified_by is set")
    return self

Functions

validate_confidence_score(value: float) -> float

Validate that a confidence score is between 0.0 and 1.0.

PARAMETER DESCRIPTION
value

The confidence score to validate.

TYPE: float

RETURNS DESCRIPTION
float

The validated confidence score.

RAISES DESCRIPTION
ValueError

If the value is not between 0.0 and 1.0.

Source code in src/glazing/base.py
def validate_confidence_score(value: float) -> float:
    """Validate that a confidence score is between 0.0 and 1.0.

    Parameters
    ----------
    value : float
        The confidence score to validate.

    Returns
    -------
    float
        The validated confidence score.

    Raises
    ------
    ValueError
        If the value is not between 0.0 and 1.0.
    """
    if not 0.0 <= value <= 1.0:
        msg = f"Confidence score must be between 0.0 and 1.0, got {value}"
        raise ValueError(msg)
    return value
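A quick check of the boundary behaviour, with the function body copied from the listing above so the snippet runs on its own. The is_rejected helper is hypothetical and exists only for the demonstration.

```python
import math


def validate_confidence_score(value: float) -> float:
    """Copied from the listing above: accept values in the closed interval [0.0, 1.0]."""
    if not 0.0 <= value <= 1.0:
        msg = f"Confidence score must be between 0.0 and 1.0, got {value}"
        raise ValueError(msg)
    return value


def is_rejected(value: float) -> bool:
    """Return True when validate_confidence_score raises for the value."""
    try:
        validate_confidence_score(value)
    except ValueError:
        return True
    return False
```

Both endpoints are accepted. NaN is rejected as well, because every comparison involving NaN is false, which makes the chained range test fail.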

validate_fe_name(value: str) -> str

Check FrameNet FE name format.

Source code in src/glazing/base.py
def validate_fe_name(value: str) -> str:
    """Check FrameNet FE name format."""
    return validate_pattern(value, FE_NAME_PATTERN, "frame element name")

validate_frame_id(value: int | str) -> str

Check FrameNet frame ID format (positive integer).

Source code in src/glazing/base.py
def validate_frame_id(value: int | str) -> str:
    """Check FrameNet frame ID format (positive integer)."""
    str_value = str(value)
    return validate_pattern(str_value, FRAME_ID_PATTERN, "frame ID")

validate_frame_name(value: str) -> str

Check FrameNet frame name format.

Source code in src/glazing/base.py
def validate_frame_name(value: str) -> str:
    """Check FrameNet frame name format."""
    return validate_pattern(value, FRAME_NAME_PATTERN, "frame name")

validate_hex_color(value: str) -> str

Check hex color format (#RRGGBB).

Source code in src/glazing/base.py
def validate_hex_color(value: str) -> str:
    """Check hex color format (#RRGGBB)."""
    return validate_pattern(value, HEX_COLOR_PATTERN, "hex color")

validate_lemma(value: str) -> str

Check that lemma contains valid characters.

Source code in src/glazing/base.py
def validate_lemma(value: str) -> str:
    """Check that lemma contains valid characters."""
    return validate_pattern(value, LEMMA_PATTERN, "lemma")

validate_pattern(value: str, pattern: str, field_name: str) -> str

Validate a string against a regex pattern.

PARAMETER DESCRIPTION
value

The value to validate.

TYPE: str

pattern

The regex pattern to match.

TYPE: str

field_name

Name of the field being validated (for error messages).

TYPE: str

RETURNS DESCRIPTION
str

The validated value.

RAISES DESCRIPTION
ValueError

If the value doesn't match the pattern.

Source code in src/glazing/base.py
def validate_pattern(value: str, pattern: str, field_name: str) -> str:
    """Validate a string against a regex pattern.

    Parameters
    ----------
    value : str
        The value to validate.
    pattern : str
        The regex pattern to match.
    field_name : str
        Name of the field being validated (for error messages).

    Returns
    -------
    str
        The validated value.

    Raises
    ------
    ValueError
        If the value doesn't match the pattern.
    """
    if not re.match(pattern, value):
        msg = f"Invalid {field_name} format: {value}"
        raise ValueError(msg)
    return value
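Because validate_pattern uses re.match, the pattern is anchored at the start of the string only; trailing text is accepted unless the pattern ends with an explicit anchor. The sketch below copies the function from the listing above and uses two hypothetical patterns to show the difference. They are illustrations, not the constants defined in glazing.

```python
import re


def validate_pattern(value: str, pattern: str, field_name: str) -> str:
    """Copied from the listing above."""
    if not re.match(pattern, value):
        msg = f"Invalid {field_name} format: {value}"
        raise ValueError(msg)
    return value


# Hypothetical patterns for a "lemma.NN" style identifier.
UNANCHORED = r"[a-z]+\.\d{2}"
ANCHORED = r"^[a-z]+\.\d{2}$"


def accepts(value: str, pattern: str) -> bool:
    """Return True when validate_pattern accepts the value."""
    try:
        validate_pattern(value, pattern, "example ID")
    except ValueError:
        return False
    return True
```

With the unanchored pattern, "give.01 extra" passes because the match succeeds on the leading "give.01". Patterns passed to validate_pattern should therefore end with an anchor when the whole value must conform.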

validate_percentage_notation(value: str) -> str

Check VerbNet's WordNet notation (lemma%#:#:#::).

Source code in src/glazing/base.py
def validate_percentage_notation(value: str) -> str:
    """Check VerbNet's WordNet notation (lemma%#:#:#::)."""
    return validate_pattern(value, PERCENTAGE_NOTATION_PATTERN, "percentage notation")

validate_propbank_roleset(value: str) -> str

Check PropBank roleset ID format (lemma.##).

Source code in src/glazing/base.py
def validate_propbank_roleset(value: str) -> str:
    """Check PropBank roleset ID format (lemma.##)."""
    return validate_pattern(value, PROPBANK_ROLESET_PATTERN, "PropBank roleset ID")

validate_verbnet_class(value: str) -> str

Check VerbNet class ID format (e.g., give-13.1).

Source code in src/glazing/base.py
def validate_verbnet_class(value: str) -> str:
    """Check VerbNet class ID format (e.g., give-13.1)."""
    return validate_pattern(value, VERBNET_CLASS_PATTERN, "VerbNet class ID")

validate_verbnet_key(value: str) -> str

Check VerbNet member key format.

Source code in src/glazing/base.py
def validate_verbnet_key(value: str) -> str:
    """Check VerbNet member key format."""
    return validate_pattern(value, VERBNET_KEY_PATTERN, "VerbNet key")

validate_wordnet_offset(value: str) -> str

Check WordNet synset offset format.

Source code in src/glazing/base.py
def validate_wordnet_offset(value: str) -> str:
    """Check WordNet synset offset format."""
    return validate_pattern(value, WORDNET_OFFSET_PATTERN, "WordNet offset")

validate_wordnet_sense_key(value: str) -> str

Check WordNet sense key format.

Source code in src/glazing/base.py
def validate_wordnet_sense_key(value: str) -> str:
    """Check WordNet sense key format."""
    return validate_pattern(value, WORDNET_SENSE_KEY_PATTERN, "WordNet sense key")