glazing.wordnet.models
WordNet core data models.
This module implements data models for WordNet 3.1 (synsets, words, senses, and relations), using Pydantic v2 for validation and type safety.
| CLASS | DESCRIPTION |
|---|---|
| Synset | WordNet synset (set of cognitive synonyms). |
| Word | Word/lemma in a synset. |
| Pointer | Relation/pointer to another synset or word. |
| VerbFrame | Syntactic frame for a verb. |
| Sense | Word sense (word-meaning pair). |
| IndexEntry | Entry in WordNet index file. |
| ExceptionEntry | Morphological exception mapping. |
| WordNetCrossRef | Cross-reference to WordNet from other resources. |
Examples:
>>> from glazing.wordnet.models import Synset, Word
>>> synset = Synset(
... offset="00001740",
... lex_filenum=5,
... lex_filename="noun.animal",
... ss_type="n",
... words=[Word(lemma="dog", lex_id=0)],
... pointers=[],
... gloss="a domesticated carnivorous mammal"
... )
Classes
ExceptionEntry
pydantic-model
Bases: GlazingBaseModel
Morphological exception mapping.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| inflected_form | Inflected/irregular form. TYPE: str |
| base_forms | Base/lemma forms. TYPE: list[str] |
Fields:
- inflected_form (str)
- base_forms (list[str])
- pos (WordNetPOS | None)
Validators:
- validate_forms
Attributes
base_forms: list[str]
pydantic-field
Base/lemma forms
inflected_form: str
pydantic-field
Inflected/irregular form
pos: WordNetPOS | None = None
pydantic-field
Part of speech
Functions
validate_forms(v: str | list[str]) -> str | list[str]
pydantic-validator
Validate word forms.
| PARAMETER | DESCRIPTION |
|---|---|
| v | The value to validate. TYPE: str \| list[str] |
| RETURNS | DESCRIPTION |
|---|---|
| str \| list[str] | The validated value. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If word form is invalid. |
Source code in src/glazing/wordnet/models.py
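To illustrate the kind of data this model holds, the sketch below splits one line of a WordNet exception list (such as `noun.exc`), where the first token is the inflected form and the remaining tokens are its base forms. `ExceptionRecord` and `parse_exception_line` are hypothetical stand-ins built on the standard library, not part of glazing, and the actual rules applied by `validate_forms` are not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionRecord:
    """Hypothetical stand-in mirroring the ExceptionEntry fields."""

    inflected_form: str
    base_forms: list[str]
    pos: str | None = None


def parse_exception_line(line: str, pos: str | None = None) -> ExceptionRecord:
    """Split an exception-list line into an inflected form and its base forms."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(
            f"expected an inflected form and at least one base form: {line!r}"
        )
    return ExceptionRecord(inflected_form=tokens[0], base_forms=tokens[1:], pos=pos)


record = parse_exception_line("geese goose", pos="n")
print(record.inflected_form, record.base_forms)
```

The same field names could then be passed to `ExceptionEntry` itself, assuming its validators accept the values.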
IndexEntry
pydantic-model
Bases: GlazingBaseModel
An entry in a WordNet index file.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| lemma | Word form. TYPE: str |
| pos | Part of speech. TYPE: WordNetPOS |
| synset_cnt | Number of synsets. TYPE: int |
| p_cnt | Number of pointer types. TYPE: int |
| ptr_symbols | Pointer symbols for this word. TYPE: list[PointerSymbol] |
| sense_cnt | Same as synset_cnt. TYPE: int |
| tagsense_cnt | Number of senses in semantic concordances. TYPE: int |
| synset_offsets | Synsets containing this word. TYPE: list[SynsetOffset] |
Examples:
>>> entry = IndexEntry(
... lemma="dog",
... pos="n",
... synset_cnt=7,
... p_cnt=4,
... ptr_symbols=["!", "@", "~", "#m"],
... sense_cnt=7,
... tagsense_cnt=6,
... synset_offsets=["00001740", "00002084"]
... )
Fields:
- lemma (str)
- pos (WordNetPOS)
- synset_cnt (int)
- p_cnt (int)
- ptr_symbols (list[PointerSymbol])
- sense_cnt (int)
- tagsense_cnt (int)
- synset_offsets (list[SynsetOffset])
Attributes
lemma: str
pydantic-field
Word form
p_cnt: int
pydantic-field
Number of pointer types
pos: WordNetPOS
pydantic-field
Part of speech
ptr_symbols: list[PointerSymbol]
pydantic-field
Pointer symbols for this word
sense_cnt: int
pydantic-field
Same as synset_cnt
synset_cnt: int
pydantic-field
Number of synsets
synset_offsets: list[SynsetOffset]
pydantic-field
Synsets with this word
tagsense_cnt: int
pydantic-field
Semantic concordance senses
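The field order mirrors a data line in a WordNet index file: lemma, pos, synset_cnt, p_cnt, then p_cnt pointer symbols, sense_cnt, tagsense_cnt, and finally the synset offsets. The following is a minimal sketch of that layout, assuming whitespace-separated tokens. `parse_index_line` is a hypothetical helper, not a glazing function, and the sample line is made up for illustration.

```python
from __future__ import annotations


def parse_index_line(line: str) -> dict[str, object]:
    """Split one index-file data line into the IndexEntry fields."""
    tokens = line.split()
    p_cnt = int(tokens[3])
    # p_cnt pointer symbols sit between the two leading counts and sense_cnt.
    after_symbols = 4 + p_cnt
    return {
        "lemma": tokens[0],
        "pos": tokens[1],
        "synset_cnt": int(tokens[2]),
        "p_cnt": p_cnt,
        "ptr_symbols": tokens[4:after_symbols],
        "sense_cnt": int(tokens[after_symbols]),
        "tagsense_cnt": int(tokens[after_symbols + 1]),
        # Every remaining token is a synset offset.
        "synset_offsets": tokens[after_symbols + 2 :],
    }


# Made-up line for illustration, not copied from a WordNet release.
entry = parse_index_line("dog n 2 4 ! @ ~ #m 2 1 00001740 00002084")
print(entry["ptr_symbols"], entry["synset_offsets"])
```

The resulting dictionary uses the same keys as the `IndexEntry` fields, so it could be unpacked into the model, assuming the values pass its validation.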
Pointer
pydantic-model
Bases: GlazingBaseModel
A relation/pointer to another synset or word.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| symbol | Relation type symbol. TYPE: PointerSymbol |
| offset | Target synset offset. TYPE: SynsetOffset |
| pos | Target part of speech. TYPE: WordNetPOS |
| source | Source word number (0 = entire synset). TYPE: int |
| target | Target word number (0 = entire synset). TYPE: int |
| METHOD | DESCRIPTION |
|---|---|
| is_lexical | Check if this is a lexical (word-to-word) relation. |
| is_semantic | Check if this is a semantic (synset-to-synset) relation. |
Examples:
>>> pointer = Pointer(
... symbol="@",
... offset="00002084",
... pos="n",
... source=0,
... target=0
... )
>>> pointer.is_semantic()
True
Fields:
- symbol (PointerSymbol)
- offset (SynsetOffset)
- pos (WordNetPOS)
- source (int)
- target (int)
Attributes
offset: SynsetOffset
pydantic-field
Target synset offset
pos: WordNetPOS
pydantic-field
Target part of speech
source: int
pydantic-field
Source word number (0 = entire synset)
symbol: PointerSymbol
pydantic-field
Relation type symbol
target: int
pydantic-field
Target word number (0 = entire synset)
Functions
is_lexical() -> bool
Check if this is a lexical (word-to-word) relation.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if either source or target is non-zero. |
is_semantic() -> bool
Check if this is a semantic (synset-to-synset) relation.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if both source and target are zero. |
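The two checks are complements of each other, which the following sketch makes explicit. `PointerSketch` is a hypothetical stand-in that keeps only the `source` and `target` fields. It follows the return descriptions above and is not the glazing implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerSketch:
    """Hypothetical stand-in for Pointer with only the word numbers."""

    source: int
    target: int

    def is_semantic(self) -> bool:
        # Both word numbers are zero: the relation links whole synsets.
        return self.source == 0 and self.target == 0

    def is_lexical(self) -> bool:
        # Any non-zero word number ties the relation to a specific word.
        return self.source != 0 or self.target != 0


synset_level = PointerSketch(source=0, target=0)
word_level = PointerSketch(source=1, target=2)
print(synset_level.is_semantic(), word_level.is_lexical())
```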
Sense
pydantic-model
Bases: GlazingBaseModel
A word sense (word-meaning pair).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| sense_key | Unique sense identifier. TYPE: SenseKey |
| lemma | Word form. TYPE: str |
| ss_type | Synset type. TYPE: WordNetPOS |
| lex_filenum | Lexical file number. TYPE: int |
| lex_id | Lexical ID. TYPE: LexID |
| head_word | For adjective satellites. TYPE: str \| None |
| head_id | Head word lex_id. TYPE: int \| None |
| synset_offset | Synset containing this sense. TYPE: SynsetOffset |
| sense_number | Frequency-based ordering. TYPE: SenseNumber |
| tag_count | Semantic concordance count. TYPE: TagCount |
| METHOD | DESCRIPTION |
|---|---|
| parse_sense_key | Parse sense key into components. |
Examples:
>>> sense = Sense(
... sense_key="dog%1:05:00::",
... lemma="dog",
... ss_type="n",
... lex_filenum=5,
... lex_id=0,
... synset_offset="00001740",
... sense_number=1,
... tag_count=15
... )
>>> components = sense.parse_sense_key()
>>> components['lemma']
'dog'
Fields:
- sense_key (SenseKey)
- lemma (str)
- ss_type (WordNetPOS)
- lex_filenum (int)
- lex_id (LexID)
- head_word (str | None)
- head_id (int | None)
- synset_offset (SynsetOffset)
- sense_number (SenseNumber)
- tag_count (TagCount)
Attributes
head_id: int | None = None
pydantic-field
Head word lex_id
head_word: str | None = None
pydantic-field
For adjective satellites
lemma: str
pydantic-field
Word form
lex_filenum: int
pydantic-field
Lexical file number
lex_id: LexID
pydantic-field
Lexical ID
sense_key: SenseKey
pydantic-field
Unique sense identifier
sense_number: SenseNumber
pydantic-field
Frequency-based ordering
ss_type: WordNetPOS
pydantic-field
Synset type
synset_offset: SynsetOffset
pydantic-field
Synset containing this sense
tag_count: TagCount
pydantic-field
Semantic concordance count
Functions
parse_sense_key() -> dict[str, str | int | None]
Parse sense key into components.
| RETURNS | DESCRIPTION |
|---|---|
| dict[str, str \| int \| None] | Dictionary with components: lemma, ss_type, lex_filenum, lex_id, head_word, head_id. |
Examples:
>>> sense = Sense(sense_key="dog%1:05:00::", ...)
>>> components = sense.parse_sense_key()
>>> components['ss_type']
1
Source code in src/glazing/wordnet/models.py
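A WordNet sense key has the shape `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`, where the last two fields are empty unless the sense is an adjective satellite. The sketch below shows one way to split such a key into the components listed above. `split_sense_key` is a hypothetical standalone function, and converting the numeric fields to `int` is an assumption based on the example output. The real method may differ in detail.

```python
from __future__ import annotations


def split_sense_key(sense_key: str) -> dict[str, str | int | None]:
    """Split a sense key into lemma, ss_type, lex_filenum, lex_id, head_word, head_id."""
    lemma, sep, lex_sense = sense_key.partition("%")
    if not sep:
        raise ValueError(f"missing '%' separator: {sense_key!r}")
    parts = lex_sense.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected five ':'-separated fields: {sense_key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return {
        "lemma": lemma,
        "ss_type": int(ss_type),
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        # Empty strings become None for senses that are not adjective satellites.
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


components = split_sense_key("dog%1:05:00::")
print(components["lemma"], components["ss_type"])
```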
Synset
pydantic-model
Bases: GlazingBaseModel
A WordNet synset (set of cognitive synonyms).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| offset | 8-digit identifier. TYPE: SynsetOffset |
| lex_filenum | Lexical file number (0-44). TYPE: int |
| lex_filename | Validated lexical file name. TYPE: LexFileName |
| ss_type | Synset type (n, v, a, r, s). TYPE: WordNetPOS |
| words | Words in this synset. TYPE: list[Word] |
| pointers | Relations to other synsets. TYPE: list[Pointer] |
| frames | Verb frames (verbs only). TYPE: list[VerbFrame] \| None |
| gloss | Definition and examples. TYPE: str |
| METHOD | DESCRIPTION |
|---|---|
| get_lemmas | Get all lemmas in the synset. |
| get_hypernyms | Get hypernym pointers. |
| get_hyponyms | Get hyponym pointers. |
Examples:
>>> synset = Synset(
... offset="00001740",
... lex_filenum=5,
... lex_filename="noun.animal",
... ss_type="n",
... words=[Word(lemma="dog", lex_id=0)],
... pointers=[],
... gloss="a domesticated carnivorous mammal"
... )
>>> synset.get_lemmas()
['dog']
Fields:
- offset (SynsetOffset)
- lex_filenum (int)
- lex_filename (LexFileName)
- ss_type (WordNetPOS)
- words (list[Word])
- pointers (list[Pointer])
- frames (list[VerbFrame] | None)
- gloss (str)
Attributes
frames: list[VerbFrame] | None = None
pydantic-field
Verb frames (verbs only)
gloss: str
pydantic-field
Definition and examples
lex_filename: LexFileName
pydantic-field
Lexical file name
lex_filenum: int
pydantic-field
Lexical file number (0-44)
offset: SynsetOffset
pydantic-field
8-digit synset identifier
pointers: list[Pointer]
pydantic-field
Relations
ss_type: WordNetPOS
pydantic-field
Synset type
words: list[Word]
pydantic-field
Words in this synset
Functions
get_hypernyms() -> list[Pointer]
Get hypernym pointers.
| RETURNS | DESCRIPTION |
|---|---|
| list[Pointer] | Pointers with '@' symbol. |
get_hyponyms() -> list[Pointer]
Get hyponym pointers.
| RETURNS | DESCRIPTION |
|---|---|
| list[Pointer] | Pointers with '~' symbol. |
get_lemmas() -> list[str]
Get all lemmas in the synset.
| RETURNS | DESCRIPTION |
|---|---|
| list[str] | List of lemma strings. |
get_lexical_pointers() -> list[Pointer]
Get lexical (word-to-word) pointers only.
| RETURNS | DESCRIPTION |
|---|---|
| list[Pointer] | Pointers where source != 0 or target != 0. |
get_pointers_by_symbol(symbol: PointerSymbol) -> list[Pointer]
Get pointers by relation symbol.
| PARAMETER | DESCRIPTION |
|---|---|
| symbol | Relation symbol to filter by. TYPE: PointerSymbol |
| RETURNS | DESCRIPTION |
|---|---|
| list[Pointer] | Pointers with the specified symbol. |
Source code in src/glazing/wordnet/models.py
get_semantic_pointers() -> list[Pointer]
Get semantic (synset-to-synset) pointers only.
| RETURNS | DESCRIPTION |
|---|---|
| list[Pointer] | Pointers where source == 0 and target == 0. |
Source code in src/glazing/wordnet/models.py
has_relation(symbol: PointerSymbol) -> bool
Check if synset has a specific relation type.
| PARAMETER | DESCRIPTION |
|---|---|
| symbol | Relation symbol to check for. TYPE: PointerSymbol |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if synset has at least one pointer with this symbol. |
Source code in src/glazing/wordnet/models.py
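The pointer helpers on `Synset` all amount to filtering the `pointers` list, either by symbol or by whether the word numbers are zero. A minimal sketch of that filtering follows, using hypothetical stand-in classes (`PointerStub`, `SynsetSketch`) rather than the glazing models. The behaviour is taken from the return descriptions above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PointerStub:
    """Hypothetical stand-in for Pointer."""

    symbol: str
    source: int = 0
    target: int = 0


@dataclass
class SynsetSketch:
    """Hypothetical stand-in for Synset, holding only its pointers."""

    pointers: list[PointerStub] = field(default_factory=list)

    def get_pointers_by_symbol(self, symbol: str) -> list[PointerStub]:
        return [p for p in self.pointers if p.symbol == symbol]

    def get_hypernyms(self) -> list[PointerStub]:
        return self.get_pointers_by_symbol("@")

    def get_hyponyms(self) -> list[PointerStub]:
        return self.get_pointers_by_symbol("~")

    def has_relation(self, symbol: str) -> bool:
        return any(p.symbol == symbol for p in self.pointers)

    def get_semantic_pointers(self) -> list[PointerStub]:
        return [p for p in self.pointers if p.source == 0 and p.target == 0]

    def get_lexical_pointers(self) -> list[PointerStub]:
        return [p for p in self.pointers if p.source != 0 or p.target != 0]


synset = SynsetSketch(
    pointers=[PointerStub("@"), PointerStub("~"), PointerStub("!", 1, 1)]
)
print(len(synset.get_hypernyms()), synset.has_relation("!"))
```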
VerbFrame
pydantic-model
Bases: GlazingBaseModel
Syntactic frame for a verb.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| frame_number | Frame number (1-35). TYPE: VerbFrameNumber |
| word_indices | Word indices (0 = all words, or specific indices). TYPE: list[int] |
| template | Natural language frame template (e.g., "Something ----s"). TYPE: str \| None |
| example_sentence | Example sentence with %s placeholder for verb. TYPE: str \| None |
Fields:
- frame_number (VerbFrameNumber)
- word_indices (list[int])
- template (str | None)
- example_sentence (str | None)
Validators:
- validate_word_indices
Attributes
example_sentence: str | None = None
pydantic-field
Example sentence with %s placeholder
frame_number: VerbFrameNumber
pydantic-field
Frame number (1-35)
template: str | None = None
pydantic-field
Natural language frame template
word_indices: list[int]
pydantic-field
Word indices (0 = all words)
Functions
validate_word_indices(v: list[int]) -> list[int]
pydantic-validator
Validate word indices.
| PARAMETER | DESCRIPTION |
|---|---|
| v | The word indices to validate. TYPE: list[int] |
| RETURNS | DESCRIPTION |
|---|---|
| list[int] | The validated indices. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If any index is negative. |
Source code in src/glazing/wordnet/models.py
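The documented contract is small: return the indices, or raise `ValueError` if any is negative. The sketch below shows that check as a plain function. `check_word_indices` is a hypothetical name, and in the model the same logic would presumably sit inside a Pydantic field validator.

```python
from __future__ import annotations


def check_word_indices(indices: list[int]) -> list[int]:
    """Return the indices unchanged, or raise ValueError if any is negative."""
    negative = [i for i in indices if i < 0]
    if negative:
        raise ValueError(f"word indices must be non-negative, got {negative}")
    return indices


print(check_word_indices([0]))
```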
Word
pydantic-model
Bases: GlazingBaseModel
A word/lemma in a synset.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| lemma | Word form (lowercase, underscores for spaces). TYPE: str |
| lex_id | Distinguishes same word in synset (0-15). TYPE: LexID |
| sense_number | Frequency-based sense ordering from index.sense. TYPE: int \| None |
| tag_count | Semantic concordance tag count. TYPE: int |
Fields:
- lemma (str)
- lex_id (LexID)
- sense_number (int | None)
- tag_count (int)
Validators:
- validate_lemma
Attributes
lemma: str
pydantic-field
Word form (lowercase, underscores for spaces)
lex_id: LexID
pydantic-field
Lexical ID distinguishing same word in synset
sense_number: int | None = None
pydantic-field
Frequency-based sense ordering
tag_count: int = 0
pydantic-field
Semantic concordance tag count
Functions
validate_lemma(v: str) -> str
pydantic-validator
Validate lemma format.
| PARAMETER | DESCRIPTION |
|---|---|
| v | The lemma to validate. TYPE: str |
| RETURNS | DESCRIPTION |
|---|---|
| str | The validated lemma. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If lemma format is invalid. |
Source code in src/glazing/wordnet/models.py
WordNetCrossRef
pydantic-model
Bases: GlazingBaseModel
Cross-reference to WordNet from other resources.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| sense_key | Stable sense identifier. TYPE: SenseKey \| None |
| synset_offset | Version-specific offset. TYPE: SynsetOffset \| None |
| lemma | Word lemma. TYPE: str |
| pos | Part of speech. TYPE: WordNetPOS |
| sense_number | Sense number for ordering. TYPE: SenseNumber \| None |
| METHOD | DESCRIPTION |
|---|---|
| to_percentage_notation | Convert to VerbNet percentage notation. |
| from_percentage_notation | Parse VerbNet percentage notation. |
| is_valid_reference | Check if reference has valid identifiers. |
| get_primary_identifier | Get primary identifier (sense_key preferred). |
Examples:
>>> ref = WordNetCrossRef(
... sense_key="give%2:40:00::",
... lemma="give",
... pos="v"
... )
>>> notation = ref.to_percentage_notation()
>>> notation
'give%2:40:00'
>>> ref.is_valid_reference()
True
Fields:
- sense_key (SenseKey | None)
- synset_offset (SynsetOffset | None)
- lemma (str)
- pos (WordNetPOS)
- sense_number (SenseNumber | None)
Attributes
lemma: str
pydantic-field
Word lemma
pos: WordNetPOS
pydantic-field
Part of speech
sense_key: SenseKey | None = None
pydantic-field
Stable sense identifier
sense_number: SenseNumber | None = None
pydantic-field
Sense ordering
synset_offset: SynsetOffset | None = None
pydantic-field
Version-specific offset
Functions
from_percentage_notation(notation: str) -> WordNetCrossRef
classmethod
Parse VerbNet percentage notation.
| PARAMETER | DESCRIPTION |
|---|---|
| notation | Percentage notation (e.g., "give%2:40:00"). TYPE: str |
| RETURNS | DESCRIPTION |
|---|---|
| WordNetCrossRef | Cross-reference object. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If notation format is invalid. |
Examples:
>>> ref = WordNetCrossRef.from_percentage_notation("give%2:40:00")
>>> ref.lemma
'give'
>>> ref.pos
'v'
Source code in src/glazing/wordnet/models.py
get_primary_identifier() -> str | None
Get primary identifier (sense_key preferred).
| RETURNS | DESCRIPTION |
|---|---|
| str \| None | Sense key if available, otherwise synset offset. |
Examples:
>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.get_primary_identifier()
'give%2:40:00::'
Source code in src/glazing/wordnet/models.py
is_valid_reference() -> bool
Check if reference has valid identifiers.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if has sense_key or synset_offset. |
Examples:
>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.is_valid_reference()
True
Source code in src/glazing/wordnet/models.py
to_percentage_notation() -> str
Convert to VerbNet percentage notation.
| RETURNS | DESCRIPTION |
|---|---|
| str | Percentage notation (e.g., "give%2:40:00"). |
Examples:
>>> ref = WordNetCrossRef(sense_key="give%2:40:00::", lemma="give", pos="v")
>>> ref.to_percentage_notation()
'give%2:40:00'
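Judging by the examples, the percentage notation is the sense key without its trailing `::` (the empty head_word and head_id fields), and the digit after `%` encodes the part of speech. The sketch below round-trips between the two forms under that assumption. `sketch_to_percentage` and `sketch_from_percentage` are hypothetical functions that return plain values rather than a `WordNetCrossRef`. Rebuilding the sense key by appending `::` is an assumption, not confirmed library behaviour.

```python
from __future__ import annotations

# WordNet ss_type digits and their part-of-speech letters.
SS_TYPE_TO_POS = {"1": "n", "2": "v", "3": "a", "4": "r", "5": "s"}


def sketch_to_percentage(sense_key: str) -> str:
    """Drop the trailing empty head_word and head_id fields from a sense key."""
    return sense_key.removesuffix("::")


def sketch_from_percentage(notation: str) -> dict[str, str]:
    """Recover lemma, pos, and a full sense key from percentage notation."""
    lemma, sep, rest = notation.partition("%")
    fields = rest.split(":")
    if not sep or not lemma or len(fields) != 3 or fields[0] not in SS_TYPE_TO_POS:
        raise ValueError(f"invalid percentage notation: {notation!r}")
    return {
        "lemma": lemma,
        "pos": SS_TYPE_TO_POS[fields[0]],
        # Assumption: the full sense key is the notation plus the empty head fields.
        "sense_key": notation + "::",
    }


parsed = sketch_from_percentage(sketch_to_percentage("give%2:40:00::"))
print(parsed["lemma"], parsed["pos"])
```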