glazing.references.mapper

Mapping between dataset identifiers.

mapper

Role alignment and concept mapping across linguistic datasets.

This module aligns semantic roles, maps concepts, and calculates similarity between entities across FrameNet, PropBank, VerbNet, and WordNet.

CLASS DESCRIPTION
ReferenceMapper

Main class for role alignment and concept mapping.

Notes

The mapper uses various alignment strategies including direct mappings, syntactic position matching, and semantic similarity measures to establish correspondences between dataset elements.
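One common way to combine several alignment strategies is an ordered fallback chain, where the first strategy that produces a result wins. The sketch below illustrates that idea only; this page does not show how ReferenceMapper combines its strategies internally. The table entries, the function names, and the confidence values 0.9 and 0.4 are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical direct-mapping table; these entries are illustrative only.
DIRECT_TABLE: dict[str, str] = {"Agent": "ARG0", "Patient": "ARG1"}

Result = Optional[tuple[str, float]]


def direct_mapping(role: str) -> Result:
    """Look the role up in a fixed table; assumed high confidence on a hit."""
    target = DIRECT_TABLE.get(role)
    return (target, 0.9) if target else None


def positional_fallback(role: str) -> Result:
    """Toy stand-in for a weaker heuristic: always proposes ARG2."""
    return ("ARG2", 0.4)


def align(role: str, strategies: list[Callable[[str], Result]]) -> Result:
    """Return the first non-empty result, trying strategies in order."""
    for strategy in strategies:
        result = strategy(role)
        if result is not None:
            return result
    return None


strategies = [direct_mapping, positional_fallback]
agent_result = align("Agent", strategies)  # found in the table
theme_result = align("Theme", strategies)  # falls through to the heuristic
print(agent_result, theme_result)
```

Ordering the strategies from most to least reliable keeps a weak heuristic from overriding a direct mapping.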

Classes

ReferenceMapper()

Map roles and concepts across linguistic datasets.

This class provides algorithms for aligning semantic roles between datasets, unifying concepts, and calculating similarity scores.

ATTRIBUTE DESCRIPTION
role_alignments

Unified role mappings indexed by concept name.

TYPE: dict[str, UnifiedRoleMapping]

concept_alignments

Concept alignments indexed by concept name.

TYPE: dict[str, ConceptAlignment]

role_mapping_tables

Pre-defined role mapping tables.

TYPE: list[RoleMappingTable]

METHOD DESCRIPTION
align_roles

Align a VerbNet role with FrameNet FEs and PropBank arguments.

map_concepts

Create unified concept mapping across datasets.

calculate_similarity

Calculate semantic similarity between entities.

build_alignment_matrix

Build confidence matrix for entity alignments.

get_unified_lemma

Get unified representation of a lemma across datasets.

Initialize the reference mapper.

Source code in src/glazing/references/mapper.py
def __init__(self) -> None:
    """Initialize the reference mapper."""
    self.role_alignments: dict[str, UnifiedRoleMapping] = {}
    self.concept_alignments: dict[str, ConceptAlignment] = {}
    self.role_mapping_tables: list[RoleMappingTable] = []
    self._init_default_mappings()
Functions
align_roles(verbnet_role: ThematicRole, verbnet_class: VerbClass, framenet_frame: Frame | None = None, propbank_roleset: Roleset | None = None) -> UnifiedRoleMapping

Align a VerbNet role with FrameNet FEs and PropBank arguments.

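Uses multiple strategies, including direct mappings, syntactic position, and semantic similarity, to establish alignments. Mappings accumulate per role concept in role_alignments, so repeated calls for the same concept extend the existing entry. A FrameNet alignment is recorded with a default confidence of 0.8, and a PropBank alignment with 0.85.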

PARAMETER DESCRIPTION
verbnet_role

VerbNet thematic role to align.

TYPE: ThematicRole

verbnet_class

VerbNet class containing the role.

TYPE: VerbClass

framenet_frame

FrameNet frame to align with.

TYPE: Frame | None DEFAULT: None

propbank_roleset

PropBank roleset to align with.

TYPE: Roleset | None DEFAULT: None

RETURNS DESCRIPTION
UnifiedRoleMapping

Unified mapping for the role across datasets.

Source code in src/glazing/references/mapper.py
def align_roles(
    self,
    verbnet_role: ThematicRole,
    verbnet_class: VerbClass,
    framenet_frame: Frame | None = None,
    propbank_roleset: Roleset | None = None,
) -> UnifiedRoleMapping:
    """Align a VerbNet role with FrameNet FEs and PropBank arguments.

    Uses multiple strategies including direct mappings, syntactic position,
    and semantic similarity to establish alignments.

    Parameters
    ----------
    verbnet_role : ThematicRole
        VerbNet thematic role to align.
    verbnet_class : VerbClass
        VerbNet class containing the role.
    framenet_frame : Frame | None, default=None
        FrameNet frame to align with.
    propbank_roleset : Roleset | None, default=None
        PropBank roleset to align with.

    Returns
    -------
    UnifiedRoleMapping
        Unified mapping for the role across datasets.
    """
    concept = self._get_role_concept(verbnet_role.type)

    # Check if we already have this alignment
    if concept in self.role_alignments:
        mapping = self.role_alignments[concept]
        # Add the VerbNet role only if it is not already recorded
        vn_entry = (verbnet_class.id, verbnet_role.type)
        if vn_entry not in mapping.verbnet_roles:
            mapping.verbnet_roles.append(vn_entry)
    else:
        mapping = UnifiedRoleMapping(
            concept=concept,
            verbnet_roles=[(verbnet_class.id, verbnet_role.type)],
            framenet_fes=[],
            propbank_args=[],
            wordnet_restrictions=[],
            confidence_matrix={},
        )

    # Align with FrameNet
    if framenet_frame:
        fe_alignment = self._align_with_framenet_fe(verbnet_role, framenet_frame)
        if fe_alignment:
            mapping.framenet_fes.append((framenet_frame.name, fe_alignment))
            # Add confidence
            self._add_alignment_confidence(
                mapping,
                f"VerbNet:{verbnet_class.id}:{verbnet_role.type}",
                f"FrameNet:{framenet_frame.name}:{fe_alignment}",
                0.8,  # Default confidence
            )

    # Align with PropBank
    if propbank_roleset:
        pb_alignment = self._align_with_propbank_arg(verbnet_role, propbank_roleset)
        if pb_alignment:
            mapping.propbank_args.append((propbank_roleset.id, pb_alignment))
            # Add confidence
            self._add_alignment_confidence(
                mapping,
                f"VerbNet:{verbnet_class.id}:{verbnet_role.type}",
                f"PropBank:{propbank_roleset.id}:{pb_alignment}",
                0.85,  # Default confidence
            )

    # Extract WordNet restrictions from selectional restrictions
    if verbnet_role.sel_restrictions:
        restrictions = self._extract_wordnet_restrictions(verbnet_role)
        mapping.wordnet_restrictions.extend(restrictions)

    self.role_alignments[concept] = mapping
    return mapping
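The source above formats each entity as a "Dataset:container:role" string before recording its confidence. The following minimal sketch shows that bookkeeping with a simplified stand-in class. RoleMappingSketch, entity_key, and record_alignment are hypothetical names, and the nested layout of confidence_matrix is an assumption, since the body of _add_alignment_confidence is not shown on this page. The sample identifiers are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class RoleMappingSketch:
    """Simplified stand-in for UnifiedRoleMapping; not the library class."""

    concept: str
    framenet_fes: list[tuple[str, str]] = field(default_factory=list)
    # Assumed layout: source key -> target key -> confidence.
    confidence_matrix: dict[str, dict[str, float]] = field(default_factory=dict)


def entity_key(dataset: str, container: str, role: str) -> str:
    """Build a 'Dataset:container:role' key like the ones align_roles formats."""
    return f"{dataset}:{container}:{role}"


def record_alignment(
    mapping: RoleMappingSketch,
    frame: str,
    fe: str,
    source_key: str,
    confidence: float = 0.8,  # same default the source uses for FrameNet
) -> None:
    """Store the (frame, FE) pair and its confidence on the mapping."""
    mapping.framenet_fes.append((frame, fe))
    target_key = entity_key("FrameNet", frame, fe)
    mapping.confidence_matrix.setdefault(source_key, {})[target_key] = confidence


mapping = RoleMappingSketch(concept="agent")
source = entity_key("VerbNet", "give-13.1", "Agent")
record_alignment(mapping, "Giving", "Donor", source)
print(mapping.confidence_matrix)
```

Keying by a single formatted string keeps entries from different datasets distinct even when role names coincide.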
build_alignment_matrix(entities1: list[str], dataset1: DatasetType, entities2: list[str], dataset2: DatasetType) -> dict[str, dict[str, float]]

Build confidence matrix for entity alignments.

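Computes a similarity score for every pair of entities drawn from the two datasets. Pairs that score 0 are omitted, so the result is sparse; every entity in entities1 still gets a row, which may be empty.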

PARAMETER DESCRIPTION
entities1

Entities from first dataset.

TYPE: list[str]

dataset1

First dataset type.

TYPE: DatasetType

entities2

Entities from second dataset.

TYPE: list[str]

dataset2

Second dataset type.

TYPE: DatasetType

RETURNS DESCRIPTION
dict[str, dict[str, float]]

Nested dict mapping entity1 -> entity2 -> confidence.

Source code in src/glazing/references/mapper.py
def build_alignment_matrix(
    self,
    entities1: list[str],
    dataset1: DatasetType,
    entities2: list[str],
    dataset2: DatasetType,
) -> dict[str, dict[str, float]]:
    """Build confidence matrix for entity alignments.

    Creates a matrix of similarity scores between all pairs
    of entities from two datasets.

    Parameters
    ----------
    entities1 : list[str]
        Entities from first dataset.
    dataset1 : DatasetType
        First dataset type.
    entities2 : list[str]
        Entities from second dataset.
    dataset2 : DatasetType
        Second dataset type.

    Returns
    -------
    dict[str, dict[str, float]]
        Nested dict mapping entity1 -> entity2 -> confidence.
    """
    matrix: dict[str, dict[str, float]] = {}

    for e1 in entities1:
        matrix[e1] = {}
        for e2 in entities2:
            similarity = self.calculate_similarity(e1, dataset1, e2, dataset2)
            if similarity > 0:
                matrix[e1][e2] = similarity

    return matrix
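Because zero-score pairs are left out, callers have to handle empty rows and missing keys. This minimal sketch reproduces the loop with a toy similarity function and then picks the best match per row. build_matrix and toy_similarity are hypothetical stand-ins, not library functions.

```python
from typing import Callable, Optional


def build_matrix(
    entities1: list[str],
    entities2: list[str],
    similarity: Callable[[str, str], float],
) -> dict[str, dict[str, float]]:
    """Same loop shape as build_alignment_matrix, with the scorer injected."""
    matrix: dict[str, dict[str, float]] = {}
    for e1 in entities1:
        matrix[e1] = {}  # every entity gets a row, even if it stays empty
        for e2 in entities2:
            score = similarity(e1, e2)
            if score > 0:
                matrix[e1][e2] = score
    return matrix


def toy_similarity(a: str, b: str) -> float:
    """Toy scorer: 1.0 on a case-insensitive name match, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


matrix = build_matrix(["Giving", "Motion"], ["giving", "run"], toy_similarity)

# Pick the highest-scoring partner per row; empty rows map to None.
best: dict[str, Optional[str]] = {
    e1: (max(row, key=row.get) if row else None) for e1, row in matrix.items()
}
print(matrix, best)
```

When reading a single cell, `matrix[e1].get(e2, 0.0)` avoids a KeyError for omitted pairs.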
calculate_similarity(entity1: str, dataset1: DatasetType, entity2: str, dataset2: DatasetType) -> float

Calculate semantic similarity between entities.

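Estimates similarity in two steps. If both entities belong to the same concept alignment, it returns that alignment's confidence, or 0.8 when the confidence is unset or zero. Otherwise it returns the first positive score found among the role alignments, or 0.0 when nothing matches.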

PARAMETER DESCRIPTION
entity1

First entity ID.

TYPE: str

dataset1

First entity's dataset.

TYPE: DatasetType

entity2

Second entity ID.

TYPE: str

dataset2

Second entity's dataset.

TYPE: DatasetType

RETURNS DESCRIPTION
float

Similarity score (0.0-1.0).

Source code in src/glazing/references/mapper.py
def calculate_similarity(
    self,
    entity1: str,
    dataset1: DatasetType,
    entity2: str,
    dataset2: DatasetType,
) -> float:
    """Calculate semantic similarity between entities.

    Uses various heuristics to estimate similarity between
    entities from different datasets.

    Parameters
    ----------
    entity1 : str
        First entity ID.
    dataset1 : DatasetType
        First entity's dataset.
    entity2 : str
        Second entity ID.
    dataset2 : DatasetType
        Second entity's dataset.

    Returns
    -------
    float
        Similarity score (0.0-1.0).
    """
    # Check if they're in the same concept alignment
    for alignment in self.concept_alignments.values():
        in_first = False
        in_second = False

        if (
            (dataset1 == "framenet" and entity1 in alignment.framenet_frames)
            or (dataset1 == "propbank" and entity1 in alignment.propbank_rolesets)
            or (dataset1 == "verbnet" and entity1 in alignment.verbnet_classes)
            or (dataset1 == "wordnet" and entity1 in alignment.wordnet_synsets)
        ):
            in_first = True

        if (
            (dataset2 == "framenet" and entity2 in alignment.framenet_frames)
            or (dataset2 == "propbank" and entity2 in alignment.propbank_rolesets)
            or (dataset2 == "verbnet" and entity2 in alignment.verbnet_classes)
            or (dataset2 == "wordnet" and entity2 in alignment.wordnet_synsets)
        ):
            in_second = True

        if in_first and in_second:
            return alignment.confidence or 0.8

    # Check role alignments if both are roles
    for role_mapping in self.role_alignments.values():
        score = self._check_role_similarity(entity1, dataset1, entity2, dataset2, role_mapping)
        if score > 0:
            return score

    # Default: no similarity found
    return 0.0
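The concept-membership check in the source can be read as a lookup from dataset name to the attribute that holds that dataset's IDs. The sketch below expresses it that way and shows one consequence of `alignment.confidence or 0.8`: a stored confidence of 0.0 also falls back to 0.8. AlignmentSketch, FIELD_BY_DATASET, contains, and concept_similarity are hypothetical names, and the sample IDs and scores are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

# Dataset name -> attribute holding that dataset's member IDs.
FIELD_BY_DATASET = {
    "framenet": "framenet_frames",
    "propbank": "propbank_rolesets",
    "verbnet": "verbnet_classes",
    "wordnet": "wordnet_synsets",
}


@dataclass
class AlignmentSketch:
    """Simplified stand-in for ConceptAlignment; not the library class."""

    framenet_frames: list[str] = field(default_factory=list)
    propbank_rolesets: list[str] = field(default_factory=list)
    verbnet_classes: list[str] = field(default_factory=list)
    wordnet_synsets: list[str] = field(default_factory=list)
    confidence: Optional[float] = None


def contains(alignment: AlignmentSketch, dataset: str, entity: str) -> bool:
    """True if the entity is listed under the given dataset."""
    attr = FIELD_BY_DATASET.get(dataset)
    return attr is not None and entity in getattr(alignment, attr)


def concept_similarity(
    e1: str, d1: str, e2: str, d2: str, alignments: list[AlignmentSketch]
) -> float:
    """Return the confidence of the first alignment containing both entities."""
    for alignment in alignments:
        if contains(alignment, d1, e1) and contains(alignment, d2, e2):
            return alignment.confidence or 0.8  # None and 0.0 both give 0.8
    return 0.0


alignments = [
    AlignmentSketch(framenet_frames=["Giving"], verbnet_classes=["give-13.1"], confidence=0.95),
    AlignmentSketch(framenet_frames=["Self_motion"], verbnet_classes=["run-51.3.2"], confidence=0.0),
]
scored = concept_similarity("Giving", "framenet", "give-13.1", "verbnet", alignments)
fallback = concept_similarity("Self_motion", "framenet", "run-51.3.2", "verbnet", alignments)
unrelated = concept_similarity("Giving", "framenet", "run-51.3.2", "verbnet", alignments)
print(scored, fallback, unrelated)
```

If a zero confidence should be preserved, an explicit `is None` check would be needed in place of `or`.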
get_unified_lemma(lemma: str, pos: str, framenet_lus: list[str] | None = None, propbank_rolesets: list[str] | None = None, verbnet_members: list[str] | None = None, wordnet_senses: list[Sense] | None = None) -> UnifiedLemma

Get unified representation of a lemma across datasets.

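Creates a unified view of how a lemma is represented in each linguistic dataset. The part of speech is validated and normalized first, and an omitted wordnet_senses argument is treated as an empty list.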

PARAMETER DESCRIPTION
lemma

The lemma to unify.

TYPE: str

pos

Part of speech.

TYPE: str

framenet_lus

FrameNet lexical unit IDs.

TYPE: list[str] | None DEFAULT: None

propbank_rolesets

PropBank roleset IDs.

TYPE: list[str] | None DEFAULT: None

verbnet_members

VerbNet member keys.

TYPE: list[str] | None DEFAULT: None

wordnet_senses

WordNet senses.

TYPE: list[Sense] | None DEFAULT: None

RETURNS DESCRIPTION
UnifiedLemma

Unified lemma representation.

Source code in src/glazing/references/mapper.py
def get_unified_lemma(  # noqa: PLR0913
    self,
    lemma: str,
    pos: str,
    framenet_lus: list[str] | None = None,
    propbank_rolesets: list[str] | None = None,
    verbnet_members: list[str] | None = None,
    wordnet_senses: list[Sense] | None = None,
) -> UnifiedLemma:
    """Get unified representation of a lemma across datasets.

    Creates a unified view of how a lemma is represented in
    each linguistic dataset.

    Parameters
    ----------
    lemma : str
        The lemma to unify.
    pos : str
        Part of speech.
    framenet_lus : list[str] | None, default=None
        FrameNet lexical unit IDs.
    propbank_rolesets : list[str] | None, default=None
        PropBank roleset IDs.
    verbnet_members : list[str] | None, default=None
        VerbNet member keys.
    wordnet_senses : list[Sense] | None, default=None
        WordNet senses.

    Returns
    -------
    UnifiedLemma
        Unified lemma representation.
    """
    # Validate POS early
    normalized_pos = self._validate_and_normalize_pos(pos)

    # Build references for each dataset
    framenet_lu_refs = self._build_framenet_lu_refs(framenet_lus)
    propbank_roleset_refs = self._build_propbank_roleset_refs(propbank_rolesets)
    verbnet_member_refs = self._build_verbnet_member_refs(verbnet_members, lemma)
    wordnet_sense_list = wordnet_senses or []

    return UnifiedLemma(
        lemma=lemma,
        pos=normalized_pos,
        framenet_lus=framenet_lu_refs,
        propbank_rolesets=propbank_roleset_refs,
        verbnet_members=verbnet_member_refs,
        wordnet_senses=wordnet_sense_list,
    )
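The signature defaults each list to None and normalizes with `wordnet_senses or []` rather than using a mutable default. The minimal sketch below shows the two effects of that pattern in plain Python. unify_sketch is a hypothetical stand-in that returns a dict; whether UnifiedLemma copies the lists it receives is not shown on this page, and the sense identifier is a placeholder.

```python
from typing import Optional


def unify_sketch(lemma: str, senses: Optional[list[str]] = None) -> dict:
    """Toy version of the None-to-empty-list normalization."""
    sense_list = senses or []  # a fresh list on every call when omitted
    return {"lemma": lemma, "wordnet_senses": sense_list}


first = unify_sketch("give")
second = unify_sketch("run")

# Mutating one result does not leak into another call's default.
first["wordnet_senses"].append("placeholder-sense")

# A non-empty list passed in is reused as-is, not copied.
shared = ["placeholder-sense"]
third = unify_sketch("give", shared)
print(second["wordnet_senses"], third["wordnet_senses"] is shared)
```

Callers that keep mutating a list after passing it in may therefore want to pass a copy.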
map_concepts(concept_name: str, framenet_frames: list[str] | None = None, propbank_rolesets: list[str] | None = None, verbnet_classes: list[str] | None = None, wordnet_synsets: list[str] | None = None) -> ConceptAlignment

Create unified concept mapping across datasets.

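Maps a semantic concept to its representations in each dataset. Omitted dataset lists default to empty lists, and concept_type is always set to "event". The result is stored in concept_alignments under concept_name, replacing any earlier alignment with that name.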

PARAMETER DESCRIPTION
concept_name

Name of the semantic concept.

TYPE: str

framenet_frames

FrameNet frames representing the concept.

TYPE: list[str] | None DEFAULT: None

propbank_rolesets

PropBank rolesets representing the concept.

TYPE: list[str] | None DEFAULT: None

verbnet_classes

VerbNet classes representing the concept.

TYPE: list[str] | None DEFAULT: None

wordnet_synsets

WordNet synset offsets representing the concept.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
ConceptAlignment

Unified concept alignment.

Source code in src/glazing/references/mapper.py
def map_concepts(
    self,
    concept_name: str,
    framenet_frames: list[str] | None = None,
    propbank_rolesets: list[str] | None = None,
    verbnet_classes: list[str] | None = None,
    wordnet_synsets: list[str] | None = None,
) -> ConceptAlignment:
    """Create unified concept mapping across datasets.

    Maps a semantic concept to its representations in each dataset.

    Parameters
    ----------
    concept_name : str
        Name of the semantic concept.
    framenet_frames : list[str] | None, default=None
        FrameNet frames representing the concept.
    propbank_rolesets : list[str] | None, default=None
        PropBank rolesets representing the concept.
    verbnet_classes : list[str] | None, default=None
        VerbNet classes representing the concept.
    wordnet_synsets : list[str] | None, default=None
        WordNet synset offsets representing the concept.

    Returns
    -------
    ConceptAlignment
        Unified concept alignment.
    """
    alignment = ConceptAlignment(
        concept_name=concept_name,
        concept_type="event",  # Default, could be inferred
        framenet_frames=framenet_frames or [],
        propbank_rolesets=propbank_rolesets or [],
        verbnet_classes=verbnet_classes or [],
        wordnet_synsets=wordnet_synsets or [],
        confidence=self._calculate_concept_confidence(
            framenet_frames, propbank_rolesets, verbnet_classes, wordnet_synsets
        ),
        alignment_method="manual" if concept_name else "automatic",
        alignment_criteria=["semantic_similarity", "syntactic_pattern"],
    )

    self.concept_alignments[concept_name] = alignment
    return alignment
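Since the alignment is stored under concept_name by plain assignment, a second call with the same name replaces the first entry instead of merging into it. This minimal sketch shows the effect with a plain dict. registry and register_concept are hypothetical stand-ins, and the sample IDs are illustrative.

```python
from typing import Optional

# Stand-in for the concept_alignments attribute.
registry: dict[str, dict[str, list[str]]] = {}


def register_concept(
    name: str,
    framenet_frames: Optional[list[str]] = None,
    verbnet_classes: Optional[list[str]] = None,
) -> dict[str, list[str]]:
    """Store an entry under its name, overwriting any previous one."""
    entry = {
        "framenet_frames": framenet_frames or [],
        "verbnet_classes": verbnet_classes or [],
    }
    registry[name] = entry
    return entry


register_concept("transfer", framenet_frames=["Giving"])
register_concept("transfer", verbnet_classes=["give-13.1"])

# The frames from the first call are gone after the second call.
print(registry["transfer"])
```

To keep earlier data, pass every dataset list for a concept in a single call.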