glazing.wordnet.models

WordNet core data models.

models

WordNet data models.

This module implements WordNet 3.1 data models including synsets, words, senses, and relations using Pydantic v2 for validation and type safety.

CLASS DESCRIPTION
Synset

WordNet synset (set of cognitive synonyms).

Word

Word/lemma in a synset.

Pointer

Relation/pointer to another synset or word.

VerbFrame

Syntactic frame for a verb.

Sense

Word sense (word-meaning pair).

IndexEntry

Entry in WordNet index file.

ExceptionEntry

Morphological exception mapping.

WordNetCrossRef

Cross-reference to WordNet from other resources.

Examples:

>>> from glazing.wordnet.models import Synset, Word
>>> synset = Synset(
...     offset="00001740",
...     lex_filenum=5,
...     lex_filename="noun.animal",
...     ss_type="n",
...     words=[Word(lemma="dog", lex_id=0)],
...     pointers=[],
...     gloss="a domesticated carnivorous mammal"
... )

Classes

ExceptionEntry pydantic-model

Bases: GlazingBaseModel

Morphological exception mapping.

ATTRIBUTE DESCRIPTION
inflected_form

Inflected/irregular form.

TYPE: str

base_forms

Base/lemma forms.

TYPE: list[str]

Examples:

>>> entry = ExceptionEntry(
...     inflected_form="geese",
...     base_forms=["goose"]
... )

Attributes
base_forms: list[str] pydantic-field

Base/lemma forms

inflected_form: str pydantic-field

Inflected/irregular form

pos: WordNetPOS | None = None pydantic-field

Part of speech

Functions
validate_forms(v: str | list[str]) -> str | list[str] pydantic-validator

Validate word forms.

PARAMETER DESCRIPTION
v

The value to validate.

TYPE: str | list[str]

RETURNS DESCRIPTION
str | list[str]

The validated value.

RAISES DESCRIPTION
ValueError

If word form is invalid.

Source code in src/glazing/wordnet/models.py
@field_validator("inflected_form", "base_forms")
@classmethod
def validate_forms(cls, v: str | list[str]) -> str | list[str]:
    """Validate word forms.

    Parameters
    ----------
    v : str | list[str]
        The value to validate.

    Returns
    -------
    str | list[str]
        The validated value.

    Raises
    ------
    ValueError
        If word form is invalid.
    """
    if isinstance(v, str):
        cleaned = v.replace("_", "").replace("-", "").replace("'", "").replace(".", "")
        if not v or not cleaned.isalpha():
            msg = f"Invalid word form: {v}"
            raise ValueError(msg)
    elif isinstance(v, list):
        for form in v:
            cleaned = form.replace("_", "").replace("-", "").replace("'", "").replace(".", "")
            if not form or not cleaned.isalpha():
                msg = f"Invalid word form: {form}"
                raise ValueError(msg)
    return v
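
The rule applied by validate_forms can be tried outside the model. The sketch below is a standalone restatement of that check, not part of glazing; the helper name is_valid_form is hypothetical. Underscores, hyphens, apostrophes, and periods are ignored, and whatever remains must be non-empty and alphabetic.

```python
def is_valid_form(form: str) -> bool:
    """Return True if `form` would pass the check used by validate_forms."""
    # Strip the four punctuation characters the validator tolerates.
    cleaned = form.replace("_", "").replace("-", "").replace("'", "").replace(".", "")
    # An empty input, or one with nothing alphabetic left, is rejected.
    return bool(form) and cleaned.isalpha()


for form in ["geese", "mother-in-law", "o'clock", "ice_cream", "abc123", "---", ""]:
    print(f"{form!r}: {is_valid_form(form)}")
```

Because str.isalpha() is Unicode-aware, accented letters pass this check, while digits do not.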

IndexEntry pydantic-model

Bases: GlazingBaseModel

An entry in a WordNet index file.

ATTRIBUTE DESCRIPTION
lemma

Word form.

TYPE: str

pos

Part of speech.

TYPE: WordNetPOS

synset_cnt

Number of synsets.

TYPE: int

p_cnt

Number of pointer types.

TYPE: int

ptr_symbols

Pointer symbols for this word.

TYPE: list[PointerSymbol]

sense_cnt

Same as synset_cnt.

TYPE: int

tagsense_cnt

Number of senses in semantic concordances.

TYPE: int

synset_offsets

Synsets containing this word.

TYPE: list[SynsetOffset]

Examples:

>>> entry = IndexEntry(
...     lemma="dog",
...     pos="n",
...     synset_cnt=2,
...     p_cnt=4,
...     ptr_symbols=["!", "@", "~", "#m"],
...     sense_cnt=2,
...     tagsense_cnt=1,
...     synset_offsets=["00001740", "00002084"]
... )

Attributes
lemma: str pydantic-field

Word form

p_cnt: int pydantic-field

Number of pointer types

pos: WordNetPOS pydantic-field

Part of speech

ptr_symbols: list[PointerSymbol] pydantic-field

Pointer symbols for this word

sense_cnt: int pydantic-field

Same as synset_cnt

synset_cnt: int pydantic-field

Number of synsets

synset_offsets: list[SynsetOffset] pydantic-field

Synsets with this word

tagsense_cnt: int pydantic-field

Semantic concordance senses
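
To show how these fields relate to one another, here is a minimal sketch that splits one index line into the eight fields above. It assumes the field order used by WordNet's index.pos files (lemma, pos, synset_cnt, p_cnt, pointer symbols, sense_cnt, tagsense_cnt, offsets). IndexFields and parse_index_line are hypothetical names, and nothing here reflects how glazing's own loader is written.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexFields:
    """Plain stand-in for IndexEntry (not the glazing model)."""

    lemma: str
    pos: str
    synset_cnt: int
    p_cnt: int
    ptr_symbols: List[str]
    sense_cnt: int
    tagsense_cnt: int
    synset_offsets: List[str]


def parse_index_line(line: str) -> IndexFields:
    """Split one whitespace-separated index line into its fields."""
    tokens = line.split()
    lemma, pos = tokens[0], tokens[1]
    synset_cnt, p_cnt = int(tokens[2]), int(tokens[3])
    # p_cnt says how many pointer symbols follow the two counts.
    ptr_symbols = tokens[4 : 4 + p_cnt]
    rest = tokens[4 + p_cnt :]
    sense_cnt, tagsense_cnt = int(rest[0]), int(rest[1])
    synset_offsets = rest[2:]
    # The number of offsets should agree with synset_cnt.
    if len(synset_offsets) != synset_cnt:
        raise ValueError(f"Expected {synset_cnt} offsets, got {len(synset_offsets)}")
    return IndexFields(
        lemma, pos, synset_cnt, p_cnt, ptr_symbols,
        sense_cnt, tagsense_cnt, synset_offsets,
    )


# Made-up line for illustration; the offsets are not real WordNet data.
entry = parse_index_line("dog n 2 3 ! @ ~ 2 1 00001740 00002084")
print(entry.ptr_symbols)     # ['!', '@', '~']
print(entry.synset_offsets)  # ['00001740', '00002084']
```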

Pointer pydantic-model

Bases: GlazingBaseModel

A relation/pointer to another synset or word.

ATTRIBUTE DESCRIPTION
symbol

Relation type symbol.

TYPE: PointerSymbol

offset

Target synset offset.

TYPE: SynsetOffset

pos

Target part of speech.

TYPE: WordNetPOS

source

Source word number (0 = entire synset).

TYPE: int

target

Target word number (0 = entire synset).

TYPE: int

METHOD DESCRIPTION
is_lexical

Check if this is a lexical (word-to-word) relation.

is_semantic

Check if this is a semantic (synset-to-synset) relation.

Examples:

>>> pointer = Pointer(
...     symbol="@",
...     offset="00002084",
...     pos="n",
...     source=0,
...     target=0
... )
>>> pointer.is_semantic()
True

Attributes
offset: SynsetOffset pydantic-field

Target synset offset

pos: WordNetPOS pydantic-field

Target part of speech

source: int pydantic-field

Source word number (0 = entire synset)

symbol: PointerSymbol pydantic-field

Relation type symbol

target: int pydantic-field

Target word number (0 = entire synset)

Functions
is_lexical() -> bool

Check if this is a lexical (word-to-word) relation.

RETURNS DESCRIPTION
bool

True if either source or target is non-zero.

Source code in src/glazing/wordnet/models.py
def is_lexical(self) -> bool:
    """Check if this is a lexical (word-to-word) relation.

    Returns
    -------
    bool
        True if either source or target is non-zero.
    """
    return self.source != 0 or self.target != 0

is_semantic() -> bool

Check if this is a semantic (synset-to-synset) relation.

RETURNS DESCRIPTION
bool

True if both source and target are zero.

Source code in src/glazing/wordnet/models.py
def is_semantic(self) -> bool:
    """Check if this is a semantic (synset-to-synset) relation.

    Returns
    -------
    bool
        True if both source and target are zero.
    """
    return self.source == 0 and self.target == 0
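
The two predicates are complementary: a pointer is either semantic (both word numbers zero) or lexical (at least one non-zero). The sketch below illustrates this with a plain dataclass stand-in. It also assumes the WordNet data-file convention in which source and target are packed into one four-digit hexadecimal field; PointerSketch and decode_source_target are hypothetical names, not glazing APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerSketch:
    """Stand-in for Pointer carrying only the two word-number fields."""

    source: int
    target: int

    def is_lexical(self) -> bool:
        # Word-to-word relation: at least one side names a specific word.
        return self.source != 0 or self.target != 0

    def is_semantic(self) -> bool:
        # Synset-to-synset relation: both sides refer to the whole synset.
        return self.source == 0 and self.target == 0


def decode_source_target(field: str) -> PointerSketch:
    """Decode a four-hex-digit field: first two digits source, last two target."""
    if len(field) != 4:
        raise ValueError(f"Expected 4 hex digits, got {field!r}")
    return PointerSketch(source=int(field[:2], 16), target=int(field[2:], 16))


semantic = decode_source_target("0000")
lexical = decode_source_target("0102")
print(semantic.is_semantic(), semantic.is_lexical())  # True False
print(lexical.is_semantic(), lexical.is_lexical())    # False True
```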

Sense pydantic-model

Bases: GlazingBaseModel

A word sense (word-meaning pair).

ATTRIBUTE DESCRIPTION
sense_key

Unique sense identifier.

TYPE: SenseKey

lemma

Word form.

TYPE: str

ss_type

Synset type.

TYPE: WordNetPOS

lex_filenum

Lexical file number.

TYPE: int

lex_id

Lexical ID.

TYPE: LexID

head_word

For adjective satellites.

TYPE: str | None, default=None

head_id

Head word lex_id.

TYPE: int | None, default=None

synset_offset

Synset containing this sense.

TYPE: SynsetOffset

sense_number

Frequency-based ordering.

TYPE: SenseNumber

tag_count

Semantic concordance count.

TYPE: TagCount

METHOD DESCRIPTION
parse_sense_key

Parse sense key into components.

Examples:

>>> sense = Sense(
...     sense_key="dog%1:05:00::",
...     lemma="dog",
...     ss_type="n",
...     lex_filenum=5,
...     lex_id=0,
...     synset_offset="00001740",
...     sense_number=1,
...     tag_count=15
... )
>>> components = sense.parse_sense_key()
>>> components['lemma']
'dog'

Attributes
head_id: int | None = None pydantic-field

Head word lex_id

head_word: str | None = None pydantic-field

For adjective satellites

lemma: str pydantic-field

Word form

lex_filenum: int pydantic-field

Lexical file number

lex_id: LexID pydantic-field

Lexical ID

sense_key: SenseKey pydantic-field

Unique sense identifier

sense_number: SenseNumber pydantic-field

Frequency-based ordering

ss_type: WordNetPOS pydantic-field

Synset type

synset_offset: SynsetOffset pydantic-field

Synset containing this sense

tag_count: TagCount pydantic-field

Semantic concordance count

Functions
parse_sense_key() -> dict[str, str | int | None]

Parse sense key into components.

RETURNS DESCRIPTION
dict[str, str | int | None]

Dictionary with components: lemma, ss_type, lex_filenum, lex_id, head_word, head_id.

Examples:

>>> sense = Sense(sense_key="dog%1:05:00::", ...)
>>> components = sense.parse_sense_key()
>>> components['ss_type']
1

Source code in src/glazing/wordnet/models.py
def parse_sense_key(self) -> dict[str, str | int | None]:
    """Parse sense key into components.

    Returns
    -------
    dict[str, str | int | None]
        Dictionary with components: lemma, ss_type, lex_filenum, lex_id,
        head_word, head_id.

    Examples
    --------
    >>> sense = Sense(sense_key="dog%1:05:00::", ...)
    >>> components = sense.parse_sense_key()
    >>> components['ss_type']
    1
    """
    parts = self.sense_key.split("%")
    lemma = parts[0]
    rest = parts[1].split(":")
    return {
        "lemma": lemma,
        "ss_type": int(rest[0]),
        "lex_filenum": int(rest[1]),
        "lex_id": int(rest[2]),
        "head_word": rest[3] if rest[3] else None,
        "head_id": int(rest[4]) if rest[4] else None,
    }
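
As a rough standalone illustration of the sense key layout (lemma%ss_type:lex_filenum:lex_id:head_word:head_id), the sketch below splits a key the same way but adds a field-count check and a derived pos entry. The function name split_sense_key, the extra check, and the pos entry are assumptions of this example rather than part of parse_sense_key; the second key is invented to show the head fields being filled.

```python
from typing import Dict, Union

# Same numeric-to-letter mapping used by WordNetCrossRef.from_percentage_notation.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


def split_sense_key(sense_key: str) -> Dict[str, Union[str, int, None]]:
    """Split a sense key into its six components plus a derived POS letter."""
    lemma, _, rest = sense_key.partition("%")
    fields = rest.split(":")
    if len(fields) != 5:
        raise ValueError(f"Expected 5 colon-separated fields in {sense_key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = fields
    return {
        "lemma": lemma,
        "ss_type": int(ss_type),
        "pos": SS_TYPE_TO_POS[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        # The head fields are empty except for adjective satellites.
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


print(split_sense_key("dog%1:05:00::")["pos"])  # n
# Invented satellite key, not checked against WordNet data.
print(split_sense_key("quick%5:00:00:fast:01")["head_word"])  # fast
```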

Synset pydantic-model

Bases: GlazingBaseModel

A WordNet synset (set of cognitive synonyms).

ATTRIBUTE DESCRIPTION
offset

8-digit identifier.

TYPE: SynsetOffset

lex_filenum

Lexical file number (0-44).

TYPE: int

lex_filename

Validated lexical file name.

TYPE: LexFileName

ss_type

Synset type (n, v, a, r, s).

TYPE: WordNetPOS

words

Words in this synset.

TYPE: list[Word]

pointers

Relations to other synsets.

TYPE: list[Pointer]

frames

Verb frames (verbs only).

TYPE: list[VerbFrame] | None, default=None

gloss

Definition and examples.

TYPE: str

METHOD DESCRIPTION
get_lemmas

Get all lemmas in the synset.

get_hypernyms

Get hypernym pointers.

get_hyponyms

Get hyponym pointers.

Examples:

>>> synset = Synset(
...     offset="00001740",
...     lex_filenum=5,
...     lex_filename="noun.animal",
...     ss_type="n",
...     words=[Word(lemma="dog", lex_id=0)],
...     pointers=[],
...     gloss="a domesticated carnivorous mammal"
... )
>>> synset.get_lemmas()
['dog']

Attributes
frames: list[VerbFrame] | None = None pydantic-field

Verb frames (verbs only)

gloss: str pydantic-field

Definition and examples

lex_filename: LexFileName pydantic-field

Lexical file name

lex_filenum: int pydantic-field

Lexical file number (0-44)

offset: SynsetOffset pydantic-field

8-digit synset identifier

pointers: list[Pointer] pydantic-field

Relations

ss_type: WordNetPOS pydantic-field

Synset type

words: list[Word] pydantic-field

Words in this synset

Functions
get_hypernyms() -> list[Pointer]

Get hypernym pointers.

RETURNS DESCRIPTION
list[Pointer]

Pointers with '@' symbol.

Source code in src/glazing/wordnet/models.py
def get_hypernyms(self) -> list[Pointer]:
    """Get hypernym pointers.

    Returns
    -------
    list[Pointer]
        Pointers with '@' symbol.
    """
    return [p for p in self.pointers if p.symbol == "@"]

get_hyponyms() -> list[Pointer]

Get hyponym pointers.

RETURNS DESCRIPTION
list[Pointer]

Pointers with '~' symbol.

Source code in src/glazing/wordnet/models.py
def get_hyponyms(self) -> list[Pointer]:
    """Get hyponym pointers.

    Returns
    -------
    list[Pointer]
        Pointers with '~' symbol.
    """
    return [p for p in self.pointers if p.symbol == "~"]

get_lemmas() -> list[str]

Get all lemmas in the synset.

RETURNS DESCRIPTION
list[str]

List of lemma strings.

Source code in src/glazing/wordnet/models.py
def get_lemmas(self) -> list[str]:
    """Get all lemmas in the synset.

    Returns
    -------
    list[str]
        List of lemma strings.
    """
    return [word.lemma for word in self.words]

get_lexical_pointers() -> list[Pointer]

Get lexical (word-to-word) pointers only.

RETURNS DESCRIPTION
list[Pointer]

Pointers where source!=0 or target!=0.

Source code in src/glazing/wordnet/models.py
def get_lexical_pointers(self) -> list[Pointer]:
    """Get lexical (word-to-word) pointers only.

    Returns
    -------
    list[Pointer]
        Pointers where source!=0 or target!=0.
    """
    return [p for p in self.pointers if p.is_lexical()]

get_pointers_by_symbol(symbol: PointerSymbol) -> list[Pointer]

Get pointers by relation symbol.

PARAMETER DESCRIPTION
symbol

Relation symbol to filter by.

TYPE: PointerSymbol

RETURNS DESCRIPTION
list[Pointer]

Pointers with the specified symbol.

Examples:

>>> synset = Synset(...)
>>> antonyms = synset.get_pointers_by_symbol("!")

Source code in src/glazing/wordnet/models.py
def get_pointers_by_symbol(self, symbol: PointerSymbol) -> list[Pointer]:
    """Get pointers by relation symbol.

    Parameters
    ----------
    symbol : PointerSymbol
        Relation symbol to filter by.

    Returns
    -------
    list[Pointer]
        Pointers with the specified symbol.

    Examples
    --------
    >>> synset = Synset(...)
    >>> antonyms = synset.get_pointers_by_symbol("!")
    """
    return [p for p in self.pointers if p.symbol == symbol]

get_semantic_pointers() -> list[Pointer]

Get semantic (synset-to-synset) pointers only.

RETURNS DESCRIPTION
list[Pointer]

Pointers where source=0 and target=0.

Source code in src/glazing/wordnet/models.py
def get_semantic_pointers(self) -> list[Pointer]:
    """Get semantic (synset-to-synset) pointers only.

    Returns
    -------
    list[Pointer]
        Pointers where source=0 and target=0.
    """
    return [p for p in self.pointers if p.is_semantic()]

has_relation(symbol: PointerSymbol) -> bool

Check if synset has a specific relation type.

PARAMETER DESCRIPTION
symbol

Relation symbol to check for.

TYPE: PointerSymbol

RETURNS DESCRIPTION
bool

True if synset has at least one pointer with this symbol.

Examples:

>>> synset = Synset(...)
>>> has_hypernyms = synset.has_relation("@")

Source code in src/glazing/wordnet/models.py
def has_relation(self, symbol: PointerSymbol) -> bool:
    """Check if synset has a specific relation type.

    Parameters
    ----------
    symbol : PointerSymbol
        Relation symbol to check for.

    Returns
    -------
    bool
        True if synset has at least one pointer with this symbol.

    Examples
    --------
    >>> synset = Synset(...)
    >>> has_hypernyms = synset.has_relation("@")
    """
    return any(p.symbol == symbol for p in self.pointers)
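
All of the Synset query methods above are simple filters over the pointers list. The sketch below reproduces the same filters on a plain NamedTuple stand-in so the behavior can be seen without constructing full models; Ptr is a hypothetical type and every offset is made up.

```python
from collections import Counter
from typing import NamedTuple


class Ptr(NamedTuple):
    """Minimal stand-in for Pointer (not the glazing model)."""

    symbol: str
    offset: str
    source: int = 0
    target: int = 0


# Made-up pointers: one hypernym, two hyponyms, one word-level antonym.
pointers = [
    Ptr("@", "00002084"),
    Ptr("~", "00003000"),
    Ptr("~", "00003001"),
    Ptr("!", "00004000", source=1, target=1),
]

hypernyms = [p for p in pointers if p.symbol == "@"]               # like get_hypernyms()
hyponyms = [p for p in pointers if p.symbol == "~"]                # like get_hyponyms()
has_antonym = any(p.symbol == "!" for p in pointers)               # like has_relation("!")
lexical = [p for p in pointers if p.source != 0 or p.target != 0]  # like get_lexical_pointers()

# A per-symbol tally is a convenient summary when exploring a synset.
counts = Counter(p.symbol for p in pointers)
print(len(hypernyms), len(hyponyms), has_antonym, len(lexical))  # 1 2 True 1
print(counts["~"])  # 2
```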

VerbFrame pydantic-model

Bases: GlazingBaseModel

Syntactic frame for a verb.

ATTRIBUTE DESCRIPTION
frame_number

Frame number (1-35).

TYPE: VerbFrameNumber

word_indices

Word indices (0 = all words, or specific indices).

TYPE: list[int]

template

Natural language frame template (e.g., "Something ----s").

TYPE: str | None, default=None

example_sentence

Example sentence with %s placeholder for verb.

TYPE: str | None, default=None

Examples:

>>> frame = VerbFrame(frame_number=8, word_indices=[0])
>>> frame.frame_number
8

Attributes
example_sentence: str | None = None pydantic-field

Example sentence with %s placeholder

frame_number: VerbFrameNumber pydantic-field

Frame number (1-35)

template: str | None = None pydantic-field

Natural language frame template

word_indices: list[int] pydantic-field

Word indices (0 = all words)

Functions
validate_word_indices(v: list[int]) -> list[int] pydantic-validator

Validate word indices.

PARAMETER DESCRIPTION
v

The word indices to validate.

TYPE: list[int]

RETURNS DESCRIPTION
list[int]

The validated indices.

RAISES DESCRIPTION
ValueError

If any index is negative.

Source code in src/glazing/wordnet/models.py
@field_validator("word_indices")
@classmethod
def validate_word_indices(cls, v: list[int]) -> list[int]:
    """Validate word indices.

    Parameters
    ----------
    v : list[int]
        The word indices to validate.

    Returns
    -------
    list[int]
        The validated indices.

    Raises
    ------
    ValueError
        If any index is negative.
    """
    for idx in v:
        if idx < 0:
            msg = f"Word index cannot be negative: {idx}"
            raise ValueError(msg)
    return v
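
For context on where frame numbers and word indices come from, the sketch below parses frame entries as they appear in WordNet verb data lines, assuming the "+ f_num w_num" layout in which f_num is a two-digit decimal and w_num a two-digit hexadecimal (00 meaning all words). parse_frame_tokens is a hypothetical helper, and the range checks simply mirror the constraints documented for VerbFrame.

```python
from typing import List, Tuple


def parse_frame_tokens(tokens: List[str]) -> List[Tuple[int, int]]:
    """Turn ['+', '08', '00', ...] into (frame_number, word_index) pairs."""
    if len(tokens) % 3 != 0:
        raise ValueError(f"Expected groups of three tokens, got {len(tokens)}")
    frames = []
    for i in range(0, len(tokens), 3):
        plus, f_num, w_num = tokens[i : i + 3]
        if plus != "+":
            raise ValueError(f"Expected '+', got {plus!r}")
        frame_number = int(f_num)    # two-digit decimal
        word_index = int(w_num, 16)  # two-digit hexadecimal; 0 means all words
        if not 1 <= frame_number <= 35:
            raise ValueError(f"Frame number out of range: {frame_number}")
        if word_index < 0:
            raise ValueError(f"Word index cannot be negative: {word_index}")
        frames.append((frame_number, word_index))
    return frames


print(parse_frame_tokens("+ 08 00 + 02 01".split()))  # [(8, 0), (2, 1)]
```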

Word pydantic-model

Bases: GlazingBaseModel

A word/lemma in a synset.

ATTRIBUTE DESCRIPTION
lemma

Word form (lowercase, underscores for spaces).

TYPE: str

lex_id

Distinguishes same word in synset (0-15).

TYPE: LexID

sense_number

Frequency-based sense ordering from index.sense.

TYPE: int | None, default=None

tag_count

Semantic concordance tag count.

TYPE: int, default=0

Examples:

>>> word = Word(lemma="dog", lex_id=0)
>>> word.lemma
'dog'
>>> word.lex_id
0

Attributes
lemma: str pydantic-field

Word form (lowercase, underscores for spaces)

lex_id: LexID pydantic-field

Lexical ID distinguishing same word in synset

sense_number: int | None = None pydantic-field

Frequency-based sense ordering

tag_count: int = 0 pydantic-field

Semantic concordance tag count

Functions
validate_lemma(v: str) -> str pydantic-validator

Validate lemma format.

PARAMETER DESCRIPTION
v

The lemma to validate.

TYPE: str

RETURNS DESCRIPTION
str

The validated lemma.

RAISES DESCRIPTION
ValueError

If lemma format is invalid.

Source code in src/glazing/wordnet/models.py
@field_validator("lemma")
@classmethod
def validate_lemma(cls, v: str) -> str:
    """Validate lemma format.

    Parameters
    ----------
    v : str
        The lemma to validate.

    Returns
    -------
    str
        The validated lemma.

    Raises
    ------
    ValueError
        If lemma format is invalid.
    """
    if not re.match(LEMMA_PATTERN, v):
        msg = f"Invalid lemma format: {v!r}"
        raise ValueError(msg)
    return v
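
LEMMA_PATTERN is defined elsewhere and is not shown on this page, so the sketch below uses a clearly hypothetical pattern to illustrate the documented convention (lowercase, underscores for spaces). Both HYPOTHETICAL_LEMMA_PATTERN and the normalize_lemma helper are assumptions of this example; the real pattern may accept a different character set.

```python
import re

# Hypothetical pattern: the real LEMMA_PATTERN lives elsewhere in glazing
# and may differ from this guess.
HYPOTHETICAL_LEMMA_PATTERN = r"^[a-z0-9][a-z0-9_.'\-]*$"


def normalize_lemma(surface: str) -> str:
    """Lowercase a surface form and join whitespace-separated parts with underscores."""
    return "_".join(surface.lower().split())


def check_lemma(lemma: str) -> str:
    """Raise ValueError unless the lemma matches the hypothetical pattern."""
    if not re.match(HYPOTHETICAL_LEMMA_PATTERN, lemma):
        raise ValueError(f"Invalid lemma format: {lemma!r}")
    return lemma


print(check_lemma(normalize_lemma("Hot Dog")))  # hot_dog
```

Normalizing before validation keeps the error case for input that cannot be repaired by lowercasing and joining, such as an empty string.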

WordNetCrossRef pydantic-model

Bases: GlazingBaseModel

Cross-reference to WordNet from other resources.

ATTRIBUTE DESCRIPTION
sense_key

Stable sense identifier.

TYPE: SenseKey | None, default=None

synset_offset

Version-specific offset.

TYPE: SynsetOffset | None, default=None

lemma

Word lemma.

TYPE: str

pos

Part of speech.

TYPE: WordNetPOS

sense_number

Sense number for ordering.

TYPE: SenseNumber | None, default=None

METHOD DESCRIPTION
to_percentage_notation

Convert to VerbNet percentage notation.

from_percentage_notation

Parse VerbNet percentage notation.

is_valid_reference

Check if reference has valid identifiers.

get_primary_identifier

Get primary identifier (sense_key preferred).

Examples:

>>> ref = WordNetCrossRef(
...     sense_key="give%2:40:00::",
...     lemma="give",
...     pos="v"
... )
>>> notation = ref.to_percentage_notation()
>>> notation
'give%2:40:00'
>>> ref.is_valid_reference()
True

Attributes
lemma: str pydantic-field

Word lemma

pos: WordNetPOS pydantic-field

Part of speech

sense_key: SenseKey | None = None pydantic-field

Stable sense identifier

sense_number: SenseNumber | None = None pydantic-field

Sense ordering

synset_offset: SynsetOffset | None = None pydantic-field

Version-specific offset

Functions
from_percentage_notation(notation: str) -> WordNetCrossRef classmethod

Parse VerbNet percentage notation.

PARAMETER DESCRIPTION
notation

Percentage notation (e.g., "give%2:40:00").

TYPE: str

RETURNS DESCRIPTION
WordNetCrossRef

Cross-reference object.

RAISES DESCRIPTION
ValueError

If notation format is invalid.

Examples:

>>> ref = WordNetCrossRef.from_percentage_notation("give%2:40:00")
>>> ref.lemma
'give'
>>> ref.pos
'v'

Source code in src/glazing/wordnet/models.py
@classmethod
def from_percentage_notation(cls, notation: str) -> WordNetCrossRef:
    """Parse VerbNet percentage notation.

    Parameters
    ----------
    notation : str
        Percentage notation (e.g., "give%2:40:00").

    Returns
    -------
    WordNetCrossRef
        Cross-reference object.

    Raises
    ------
    ValueError
        If notation format is invalid.

    Examples
    --------
    >>> ref = WordNetCrossRef.from_percentage_notation("give%2:40:00")
    >>> ref.lemma
    'give'
    >>> ref.pos
    'v'
    """
    match = re.match(r"^([a-z_-]+)%([1-5]):([0-9]{2}):([0-9]{2})$", notation)
    if not match:
        msg = f"Invalid percentage notation: {notation}"
        raise ValueError(msg)

    lemma = match.group(1)
    ss_type = int(match.group(2))
    lex_filenum = match.group(3)
    lex_id = match.group(4)

    # Map ss_type to POS
    pos_map: dict[int, WordNetPOS] = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}
    pos = pos_map[ss_type]

    # Construct partial sense key
    sense_key = f"{lemma}%{ss_type}:{lex_filenum}:{lex_id}::"

    return cls(sense_key=sense_key, synset_offset=None, lemma=lemma, pos=pos, sense_number=None)

get_primary_identifier() -> str | None

Get primary identifier (sense_key preferred).

RETURNS DESCRIPTION
str | None

Sense key if available, otherwise synset offset; None if neither is set.

Examples:

>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.get_primary_identifier()
'give%2:40:00::'

Source code in src/glazing/wordnet/models.py
def get_primary_identifier(self) -> str | None:
    """Get primary identifier (sense_key preferred).

    Returns
    -------
    str | None
        Sense key if available, otherwise synset offset.

    Examples
    --------
    >>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
    >>> ref.get_primary_identifier()
    'give%2:40:00::'
    """
    return self.sense_key or self.synset_offset

is_valid_reference() -> bool

Check if reference has valid identifiers.

RETURNS DESCRIPTION
bool

True if has sense_key or synset_offset.

Examples:

>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.is_valid_reference()
True

Source code in src/glazing/wordnet/models.py
def is_valid_reference(self) -> bool:
    """Check if reference has valid identifiers.

    Returns
    -------
    bool
        True if has sense_key or synset_offset.

    Examples
    --------
    >>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
    >>> ref.is_valid_reference()
    True
    """
    return self.sense_key is not None or self.synset_offset is not None

to_percentage_notation() -> str

Convert to VerbNet percentage notation.

RETURNS DESCRIPTION
str

Percentage notation (e.g., "give%2:40:00"), or an empty string if sense_key is missing or cannot be split into at least three fields.

Examples:

>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.to_percentage_notation()
'give%2:40:00'

Source code in src/glazing/wordnet/models.py
def to_percentage_notation(self) -> str:
    """Convert to VerbNet percentage notation.

    Returns
    -------
    str
        Percentage notation (e.g., "give%2:40:00").

    Examples
    --------
    >>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
    >>> ref.to_percentage_notation()
    'give%2:40:00'
    """
    if self.sense_key:
        # Extract components from sense key (format: lemma%ss_type:lex_filenum:lex_id::)
        parts = self.sense_key.split("%")
        if len(parts) >= 2:
            sense_part = parts[1].split(":")
            if len(sense_part) >= 3:
                return f"{self.lemma}%{sense_part[0]}:{sense_part[1]}:{sense_part[2]}"
    return ""
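
The two conversions are inverses for well-formed input. The sketch below restates them as standalone functions (hypothetical names, no glazing import) using the same regular expression and POS mapping as the source above. One difference is flagged in the comments: this version takes the lemma from the sense key itself, whereas to_percentage_notation uses the model's lemma field.

```python
import re
from typing import Tuple

NOTATION_RE = re.compile(r"^([a-z_-]+)%([1-5]):([0-9]{2}):([0-9]{2})$")
POS_BY_SS_TYPE = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


def notation_to_sense_key(notation: str) -> Tuple[str, str, str]:
    """Return (lemma, pos, sense_key) for a notation such as 'give%2:40:00'."""
    match = NOTATION_RE.match(notation)
    if not match:
        raise ValueError(f"Invalid percentage notation: {notation}")
    lemma, ss_type, lex_filenum, lex_id = match.groups()
    pos = POS_BY_SS_TYPE[int(ss_type)]
    # The head_word and head_id fields are left empty, hence the trailing '::'.
    return lemma, pos, f"{lemma}%{ss_type}:{lex_filenum}:{lex_id}::"


def sense_key_to_notation(sense_key: str) -> str:
    """Drop the head fields from a sense key; return '' if it cannot be split."""
    # Unlike the model method, the lemma here comes from the key itself.
    lemma, sep, rest = sense_key.partition("%")
    fields = rest.split(":")
    if not sep or len(fields) < 3:
        return ""
    return f"{lemma}%{fields[0]}:{fields[1]}:{fields[2]}"


lemma, pos, key = notation_to_sense_key("give%2:40:00")
print(lemma, pos, key)             # give v give%2:40:00::
print(sense_key_to_notation(key))  # give%2:40:00
```

Note that the lemma group in the pattern only admits lowercase letters, underscores, and hyphens, so notations containing digits, apostrophes, or uppercase letters raise ValueError.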