glazing.references¶
Cross-dataset reference resolution with automatic extraction and fuzzy matching.
Overview¶
The references module provides utilities for extracting and resolving cross-references between datasets. The new CrossReferenceIndex class wraps these utilities in a single API with automatic extraction, caching, and fuzzy matching.
Quick Start¶
from glazing.references.index import CrossReferenceIndex
# Automatic extraction and caching
xref = CrossReferenceIndex()
# Resolve references
refs = xref.resolve("give.01", source="propbank")
print(refs["verbnet_classes"]) # ['give-13.1']
# Enable fuzzy matching to tolerate typos and spelling variations
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
Main Classes¶
CrossReferenceIndex¶
The primary interface for cross-reference operations:
class CrossReferenceIndex(
    auto_extract: bool = True,
    cache_dir: Path | None = None,
    show_progress: bool = True,
)
Key Methods:
- resolve(entity_id, source, fuzzy=False) - Resolve cross-references
- find_mappings(source_id, source_dataset, target_dataset) - Find direct mappings
- extract_all() - Manually trigger extraction
- clear_cache() - Clear cached references
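The interplay between automatic extraction, extract_all(), and clear_cache() can be pictured with a small stand-in class. This is a minimal sketch and not glazing's implementation: the class name LazyCachedIndex, the lookup() method, the xrefs.json file name, and the JSON cache format are all assumptions made for illustration.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class LazyCachedIndex:
    """Hypothetical stand-in; NOT glazing's CrossReferenceIndex."""

    def __init__(self, cache_dir: Path, auto_extract: bool = True) -> None:
        self.cache_file = Path(cache_dir) / "xrefs.json"  # assumed file name
        self.auto_extract = auto_extract
        self.extraction_runs = 0  # counts how often the slow path ran
        self._refs: dict[str, list[str]] | None = None

    def _extract(self) -> dict[str, list[str]]:
        # Placeholder for the expensive scan over the datasets.
        self.extraction_runs += 1
        return {"propbank:give.01": ["give-13.1"]}

    def extract_all(self) -> None:
        """Run extraction and write the result to the cache file."""
        self._refs = self._extract()
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(json.dumps(self._refs))

    def clear_cache(self) -> None:
        """Forget in-memory references and delete the cache file."""
        self._refs = None
        self.cache_file.unlink(missing_ok=True)

    def lookup(self, key: str) -> list[str]:
        """Load from cache if possible, otherwise extract on first use."""
        if self._refs is None:
            if self.cache_file.exists():
                self._refs = json.loads(self.cache_file.read_text())
            elif self.auto_extract:
                self.extract_all()
            else:
                raise RuntimeError("no references loaded; call extract_all() first")
        return self._refs.get(key, [])


with tempfile.TemporaryDirectory() as tmp:
    first = LazyCachedIndex(Path(tmp))
    first_result = first.lookup("propbank:give.01")    # triggers extraction
    second = LazyCachedIndex(Path(tmp))
    second_result = second.lookup("propbank:give.01")  # served from the cache file
    runs = (first.extraction_runs, second.extraction_runs)
```

In this sketch the second instance never runs extraction because the first one already wrote the cache file, which is the behavior the auto_extract and cache_dir parameters suggest.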
Modules¶
- Models - Reference data models
- Extractor - Lower-level extraction interface
- Resolver - Reference resolution logic
- Mapper - Mapping between dataset identifiers
Features¶
- Automatic Extraction: References are extracted automatically on first use
- Caching: Extracted references are cached for fast subsequent loads
- Fuzzy Matching: Match entries despite typos, morphological variants, and spelling inconsistencies
- Confidence Scores: All mappings include confidence scores
- Progress Indicators: Visual feedback during extraction
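To make the fuzzy matching and confidence score features concrete, here is a minimal sketch built on the standard library's difflib. It is only an illustration of the general idea: the function fuzzy_resolve, the threshold of 0.8, and the KNOWN_IDS list are assumptions, and glazing may use a different similarity measure internally.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Illustrative identifiers only; not read from any dataset.
KNOWN_IDS = ["realize.01", "give.01", "organize.01"]


def fuzzy_resolve(
    query: str, candidates: list[str], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return (candidate, similarity) pairs at or above threshold, best first."""
    scored = [
        (c, SequenceMatcher(None, query.lower(), c.lower()).ratio())
        for c in candidates
    ]
    hits = [(c, round(score, 2)) for c, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# A British spelling still finds the American-spelled identifier.
matches = fuzzy_resolve("realise.01", KNOWN_IDS)
print(matches)
```

The similarity ratio doubles as a rough confidence score here: an exact match scores 1.0, and candidates below the threshold are dropped.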
references¶
Cross-reference models and resolution utilities.
This module provides models and utilities for managing cross-references between FrameNet, PropBank, VerbNet, and WordNet. It includes confidence scoring, transitive mapping resolution, and conflict detection.
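One way to picture transitive mapping resolution with confidence scoring is to follow chains of direct mappings and multiply the confidences along each chain. The sketch below assumes exactly that rule; the DIRECT table, the confidence values, the resolve_transitive function, and the give-13.1 to Giving link are illustrative assumptions rather than data or behavior taken from glazing.

```python
from __future__ import annotations

Node = tuple[str, str]  # (dataset, entity id)

# Direct mappings with made-up confidence values, for illustration only.
DIRECT: dict[Node, list[tuple[str, str, float]]] = {
    ("propbank", "give.01"): [("verbnet", "give-13.1", 0.9)],
    ("verbnet", "give-13.1"): [("framenet", "Giving", 0.8)],
}


def resolve_transitive(
    start: Node, target_dataset: str, max_hops: int = 3
) -> dict[str, float]:
    """Follow mapping chains from start into target_dataset.

    Returns {target id: best confidence}, where the confidence of a chain
    is the product of the confidences of its hops.
    """
    best: dict[str, float] = {}
    stack: list[tuple[Node, float, int]] = [(start, 1.0, 0)]
    while stack:
        node, conf, hops = stack.pop()
        if hops == max_hops:  # bound the search so cycles cannot loop forever
            continue
        for dataset, entity, edge_conf in DIRECT.get(node, []):
            chained = conf * edge_conf
            if dataset == target_dataset:
                best[entity] = max(best.get(entity, 0.0), chained)
            else:
                stack.append(((dataset, entity), chained, hops + 1))
    return best


result = resolve_transitive(("propbank", "give.01"), "framenet")
print(result)  # one entry for "Giving", with confidence close to 0.72
```

Multiplying confidences means longer chains are trusted less than direct mappings, which is one plausible design choice among several.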
| CLASS | DESCRIPTION |
|---|---|
| CrossReference | A mapping between entities in different datasets. |
| MappingConfidence | Confidence scoring for mappings. |
| UnifiedLemma | A lemma with representations across all datasets. |
| MappingIndex | Bidirectional index for fast mapping lookups. |
| CrossReferenceIndex | Automatic cross-reference extraction and resolution. |
| ReferenceExtractor | Extract references from datasets. |
| ReferenceResolver | Resolve cross-references between datasets. |
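The idea behind a bidirectional index, as described for MappingIndex in the table above, can be sketched with two dictionaries kept in sync. This is a hypothetical illustration: the class BidirectionalIndex, its method names, and the donate.01 pair are assumptions, and MappingIndex itself may be organized differently.

```python
from __future__ import annotations

from collections import defaultdict


class BidirectionalIndex:
    """Hypothetical sketch; NOT glazing's MappingIndex."""

    def __init__(self) -> None:
        self.forward: defaultdict[str, set[str]] = defaultdict(set)
        self.reverse: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, source: str, target: str) -> None:
        """Record the mapping in both directions."""
        self.forward[source].add(target)
        self.reverse[target].add(source)

    def targets_of(self, source: str) -> list[str]:
        return sorted(self.forward.get(source, ()))

    def sources_of(self, target: str) -> list[str]:
        return sorted(self.reverse.get(target, ()))


index = BidirectionalIndex()
index.add("propbank:give.01", "verbnet:give-13.1")
index.add("propbank:donate.01", "verbnet:give-13.1")  # made-up pair for the demo

print(index.targets_of("propbank:give.01"))
print(index.sources_of("verbnet:give-13.1"))
```

Storing both directions costs extra memory but lets either lookup run as a single dictionary access instead of a scan over all mappings.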
| FUNCTION | DESCRIPTION |
|---|---|
| get_default_index | Get or create the default global index. |
Examples:
>>> from glazing.references.index import CrossReferenceIndex
>>> xref = CrossReferenceIndex()
>>> refs = xref.resolve("give.01", source="propbank")
>>> print(refs["verbnet_classes"])
['give-13.1']