glazing.references

Cross-dataset reference resolution with automatic extraction and fuzzy matching.

Overview

The references module extracts and resolves cross-references between datasets. The new CrossReferenceIndex class wraps this in an ergonomic API with automatic extraction, caching, and optional fuzzy matching.

Quick Start

from glazing.references.index import CrossReferenceIndex

# Automatic extraction and caching
xref = CrossReferenceIndex()

# Resolve references
refs = xref.resolve("give.01", source="propbank")
print(refs["verbnet_classes"])  # ['give-13.1']

# Fuzzy matching finds entries despite spelling variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)

Main Classes

CrossReferenceIndex

The primary interface for cross-reference operations:

class CrossReferenceIndex(
    auto_extract: bool = True,
    cache_dir: Path | None = None,
    show_progress: bool = True
)

Key Methods:

  • resolve(entity_id, source, fuzzy=False) - Resolve cross-references
  • find_mappings(source_id, source_dataset, target_dataset) - Find direct mappings
  • extract_all() - Manually trigger extraction
  • clear_cache() - Clear cached references

Modules

  • Models - Reference data models
  • Extractor - Lower-level extraction interface
  • Resolver - Reference resolution logic
  • Mapper - Mapping between dataset identifiers

Features

  • Automatic Extraction: References are extracted automatically on first use
  • Caching: Extracted references are cached for fast subsequent loads
  • Fuzzy Matching: Find entries despite typos, morphological variants, and spelling inconsistencies
  • Confidence Scores: All mappings include confidence scores
  • Progress Indicators: Visual feedback during extraction
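
To make the fuzzy matching feature concrete, here is a minimal sketch of the general idea using only the standard library's difflib. It is not glazing's implementation: the `fuzzy_lookup` helper, the `KNOWN_ROLESETS` list, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical roleset IDs standing in for an extracted index.
KNOWN_ROLESETS = ["give.01", "realize.01", "abandon.01", "recognize.01"]


def fuzzy_lookup(query: str, candidates: list[str], cutoff: float = 0.8) -> list[str]:
    """Return candidates similar to the query, best match first."""
    # An exact hit short-circuits the similarity search.
    if query in candidates:
        return [query]
    # Otherwise keep up to three candidates whose similarity ratio meets the cutoff.
    return get_close_matches(query, candidates, n=3, cutoff=cutoff)


# A British spelling still finds the American-spelled entry.
matches = fuzzy_lookup("realise.01", KNOWN_ROLESETS)
print(matches)
```

A stricter cutoff reduces false positives at the cost of missing more distant variants; the right value depends on how noisy the identifiers are.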

references

Cross-reference models and resolution utilities.

This module provides models and utilities for managing cross-references between FrameNet, PropBank, VerbNet, and WordNet. It includes confidence scoring, transitive mapping resolution, and conflict detection.
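
As a rough illustration of transitive mapping resolution with confidence scoring, the sketch below chains two mapping tables and multiplies confidences along each path. It does not use glazing's models: the `Link` class, the sample scores, and the rule that "one source with several distinct targets is a conflict" are assumptions made for this example, and the library may define these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hypothetical mapping edge with a confidence in [0, 1]."""
    source: str
    target: str
    confidence: float


# Illustrative data only; the scores are made up for the sketch.
PROPBANK_TO_VERBNET = [Link("give.01", "give-13.1", 0.9)]
VERBNET_TO_FRAMENET = [Link("give-13.1", "Giving", 0.8)]


def compose(first: list[Link], second: list[Link]) -> list[Link]:
    """Chain two mapping tables, multiplying confidences along each path."""
    by_source: dict[str, list[Link]] = {}
    for link in second:
        by_source.setdefault(link.source, []).append(link)
    return [
        Link(a.source, b.target, a.confidence * b.confidence)
        for a in first
        for b in by_source.get(a.target, [])
    ]


def find_conflicts(links: list[Link]) -> dict[str, set[str]]:
    """Report sources that map to more than one distinct target."""
    targets: dict[str, set[str]] = {}
    for link in links:
        targets.setdefault(link.source, set()).add(link.target)
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


transitive = compose(PROPBANK_TO_VERBNET, VERBNET_TO_FRAMENET)
conflicts = find_conflicts(transitive + [Link("give.01", "Transfer", 0.5)])
```

Multiplying confidences means an inferred two-hop mapping never scores higher than either of its hops, which is a common conservative choice for derived links.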

CLASS                 DESCRIPTION
CrossReference        A mapping between entities in different datasets.
MappingConfidence     Confidence scoring for mappings.
UnifiedLemma          A lemma with representations across all datasets.
MappingIndex          Bidirectional index for fast mapping lookups.
CrossReferenceIndex   Automatic cross-reference extraction and resolution.
ReferenceExtractor    Extract references from datasets.
ReferenceResolver     Resolve cross-references between datasets.
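
To show what a bidirectional index buys you, here is a toy two-way lookup table. It is a sketch of the data structure only, not glazing's MappingIndex: the `BidirectionalIndex` class, its method names, and the `donate.01` entry are hypothetical.

```python
from collections import defaultdict


class BidirectionalIndex:
    """Toy two-way lookup table; not glazing's MappingIndex."""

    def __init__(self) -> None:
        self._forward: defaultdict[str, set[str]] = defaultdict(set)
        self._reverse: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, source_id: str, target_id: str) -> None:
        # Record the pair in both directions so either side is a dict lookup.
        self._forward[source_id].add(target_id)
        self._reverse[target_id].add(source_id)

    def targets_of(self, source_id: str) -> set[str]:
        # .get avoids creating empty entries for unknown keys.
        return set(self._forward.get(source_id, ()))

    def sources_of(self, target_id: str) -> set[str]:
        return set(self._reverse.get(target_id, ()))


index = BidirectionalIndex()
index.add("give.01", "give-13.1")
index.add("donate.01", "give-13.1")  # hypothetical second roleset
print(index.sources_of("give-13.1"))
```

Storing each pair twice doubles memory use but makes reverse lookups as cheap as forward ones, instead of scanning every entry.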

FUNCTION              DESCRIPTION
get_default_index     Get or create the default global index.

Examples:

>>> from glazing.references.index import CrossReferenceIndex
>>> xref = CrossReferenceIndex()
>>> refs = xref.resolve("give.01", source="propbank")
>>> print(refs["verbnet_classes"])
['give-13.1']