glazing.wordnet.loader
Loading WordNet data from JSON Lines.
WordNet database loader with index building and caching.
This module provides functionality to load WordNet data from JSON Lines files, build efficient indices for fast lookups, and construct relation graphs for traversal operations.
| CLASS | DESCRIPTION |
|---|---|
| WordNetLoader | Loads and indexes WordNet database from JSON Lines format with automatic loading. |

| FUNCTION | DESCRIPTION |
|---|---|
| load_wordnet | Load a complete WordNet database from JSON Lines files. |
Examples:
>>> from glazing.wordnet.loader import WordNetLoader
>>> # Data loads automatically on initialization
>>> loader = WordNetLoader()
>>> synset = loader.get_synset("00001740")
>>> senses = loader.get_senses_by_lemma("dog", pos="n")
>>>
>>> # Or disable autoload for manual control
>>> loader = WordNetLoader(autoload=False)
>>> loader.load() # Load manually when needed
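The module description mentions loading JSON Lines data and building lookup indices. A minimal sketch of that idea follows; it is not the library's implementation. The record fields (`offset`, `pos`, `lemmas`, `gloss`) and the helper name `build_lemma_index` are assumptions made for illustration.

```python
import io
import json
from collections import defaultdict


def build_lemma_index(lines):
    """Parse JSON Lines records and index synset offsets by (lemma, pos).

    The field names used here are hypothetical, not glazing's schema.
    """
    synsets = {}
    lemma_index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        synsets[record["offset"]] = record
        for lemma in record["lemmas"]:
            lemma_index[(lemma, record["pos"])].append(record["offset"])
    return synsets, dict(lemma_index)


# Two made-up records standing in for a JSONL file.
sample = io.StringIO(
    '{"offset": "00000001", "pos": "n", "lemmas": ["dog"], "gloss": "a canine"}\n'
    '{"offset": "00000002", "pos": "v", "lemmas": ["dog", "chase"], "gloss": "to follow"}\n'
)
synsets, lemma_index = build_lemma_index(sample)
print(lemma_index[("dog", "n")])
```

Keying the index on a `(lemma, pos)` tuple keeps a lookup such as "dog" as a noun to a single dictionary access.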
Classes
WordNetLoader(data_path: Path | str | None = None, lazy: bool = False, autoload: bool = True, cache_size: int = 1000)
Load and index WordNet database from JSON Lines format with automatic loading.
This class provides efficient loading and indexing of WordNet data, including synsets, senses, and morphological exceptions. It builds multiple indices for fast lookups and supports lazy loading of large datasets. By default, data is loaded automatically on initialization.
| PARAMETER | DESCRIPTION |
|---|---|
| data_path | Path to the WordNet JSONL file (e.g., wordnet.jsonl). If None, uses default path from environment. TYPE: Path \| str \| None |
| lazy | If True, load synsets on demand rather than all at once. TYPE: bool |
| autoload | Whether to automatically load data on initialization. Only applies when lazy=False. TYPE: bool |
| cache_size | Number of synsets to cache when using lazy loading. TYPE: int |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| synsets | All loaded synsets indexed by offset. |
| lemma_index | Index from lemmas to synset offsets by POS. |
| sense_index | Index from sense keys to sense objects. |
| exceptions | Morphological exceptions by POS. |
| METHOD | DESCRIPTION |
|---|---|
| load | Load all WordNet data from JSON Lines files. |
| get_synset | Get a synset by its offset. |
| get_senses_by_lemma | Get all senses for a lemma and optional POS. |
| get_sense_by_key | Get a sense by its unique sense key. |
Examples:
>>> # Automatic loading (default)
>>> loader = WordNetLoader()
>>> dog_synsets = loader.get_synsets_by_lemma("dog", "n")
>>> for synset in dog_synsets:
... print(f"{synset.offset}: {synset.gloss}")
>>> # Manual loading
>>> loader = WordNetLoader(autoload=False)
>>> loader.load()
>>> synsets = loader.synsets # Now accessible
Initialize WordNet loader.
| PARAMETER | DESCRIPTION |
|---|---|
| data_path | Path to the WordNet JSONL file (e.g., wordnet.jsonl). If None, uses default path from environment. TYPE: Path \| str \| None |
| lazy | If True, load synsets on demand. TYPE: bool |
| autoload | Whether to automatically load data on initialization. Only applies when lazy=False. TYPE: bool |
| cache_size | Size of LRU cache for lazy loading. TYPE: int |
Source code in src/glazing/wordnet/loader.py
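The `lazy` and `cache_size` parameters describe on-demand loading backed by an LRU cache. The sketch below shows one way such a cache can behave, using `functools.lru_cache`. The class `LazyStore` and its `fetch_count` counter are hypothetical and do not reflect how `WordNetLoader` is implemented.

```python
from functools import lru_cache


class LazyStore:
    """Hypothetical store that parses a record only when it is requested."""

    def __init__(self, raw_records, cache_size=1000):
        self._raw = raw_records  # offset -> unparsed text
        self.fetch_count = 0  # how many times a record was actually parsed
        # Wrap the bound method so each instance gets its own LRU cache.
        self.get = lru_cache(maxsize=cache_size)(self._fetch)

    def _fetch(self, offset):
        self.fetch_count += 1
        raw = self._raw.get(offset)
        return None if raw is None else raw.upper()  # stand-in for parsing


store = LazyStore({"00000001": "entity", "00000002": "thing"}, cache_size=1)
store.get("00000001")  # parsed on first access
store.get("00000001")  # served from the cache
store.get("00000002")  # evicts the first entry (cache_size=1)
store.get("00000001")  # parsed again after eviction
print(store.fetch_count)
```

A larger `cache_size` trades memory for fewer repeated parses, which is the usual reason to expose it as a parameter.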
Functions
get_exceptions(pos: WordNetPOS) -> dict[str, list[str]]
Get morphological exceptions for a POS.
| PARAMETER | DESCRIPTION |
|---|---|
| pos | The part of speech. TYPE: WordNetPOS |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, list[str]] | Mapping from inflected forms to base forms. |
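`get_exceptions` is documented as returning a mapping from inflected forms to base forms. A hedged sketch of how such a mapping might be consulted, falling back to the surface form when no exception is listed, is shown below. The sample dictionary and the helper `base_forms` are made up for illustration and are not part of glazing.

```python
def base_forms(word, exceptions):
    """Return exceptional base forms for word, or the word itself if none are listed."""
    return exceptions.get(word, [word])


# Made-up data in the documented shape: dict[str, list[str]].
noun_exceptions = {"geese": ["goose"], "mice": ["mouse"]}

print(base_forms("geese", noun_exceptions))
print(base_forms("dog", noun_exceptions))
```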
get_holonyms(synset: Synset) -> list[Synset]
Get all holonyms (wholes) of a synset.
| PARAMETER | DESCRIPTION |
|---|---|
| synset | The synset to get holonyms for. TYPE: Synset |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of holonym synsets. |
get_hypernyms(synset: Synset) -> list[Synset]
Get direct hypernyms of a synset.
| PARAMETER | DESCRIPTION |
|---|---|
| synset | The synset to get hypernyms for. TYPE: Synset |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of hypernym synsets. |
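`get_hypernyms` returns only direct hypernyms. To walk the full chain of ancestors, a caller could repeat the call breadth-first. The sketch below relies only on the documented method signature; `hypernym_closure`, `FakeLoader`, and the string "synsets" are hypothetical stand-ins so the example runs without WordNet data.

```python
from collections import deque


def hypernym_closure(loader, synset):
    """Collect all ancestors of synset by repeatedly calling loader.get_hypernyms."""
    seen = []
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in loader.get_hypernyms(current):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.append(parent)
                queue.append(parent)
    return seen


class FakeLoader:
    """Hypothetical stand-in exposing get_hypernyms over a tiny hand-made graph."""

    _parents = {"dog": ["canine"], "canine": ["carnivore"], "carnivore": []}

    def get_hypernyms(self, synset):
        return self._parents.get(synset, [])


print(hypernym_closure(FakeLoader(), "dog"))
```

The same loop should work for `get_hyponyms` by swapping the method called, since both are documented with the same signature.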
get_hyponyms(synset: Synset) -> list[Synset]
Get direct hyponyms of a synset.
| PARAMETER | DESCRIPTION |
|---|---|
| synset | The synset to get hyponyms for. TYPE: Synset |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of hyponym synsets. |
get_meronyms(synset: Synset) -> list[Synset]
Get all meronyms (parts) of a synset.
| PARAMETER | DESCRIPTION |
|---|---|
| synset | The synset to get meronyms for. TYPE: Synset |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of meronym synsets. |
get_sense_by_key(sense_key: SenseKey) -> Sense | None
Get a sense by its unique sense key.
| PARAMETER | DESCRIPTION |
|---|---|
| sense_key | The unique sense key. TYPE: SenseKey |

| RETURNS | DESCRIPTION |
|---|---|
| Sense \| None | The sense or None if not found. |
get_senses_by_lemma(lemma: str, pos: WordNetPOS | None = None) -> list[Sense]
Get all senses for a lemma.
| PARAMETER | DESCRIPTION |
|---|---|
| lemma | The word lemma to search for. TYPE: str |
| pos | Part of speech filter. TYPE: WordNetPOS \| None |

| RETURNS | DESCRIPTION |
|---|---|
| list[Sense] | List of senses for the lemma, sorted by sense number. |
Examples:
>>> senses = loader.get_senses_by_lemma("run", "v")
>>> for sense in senses:
... print(f"{sense.sense_key}: {sense.sense_number}")
get_synset(offset: SynsetOffset) -> Synset | None
Get a synset by its offset.
| PARAMETER | DESCRIPTION |
|---|---|
| offset | The 8-digit synset offset. TYPE: SynsetOffset |

| RETURNS | DESCRIPTION |
|---|---|
| Synset \| None | The synset or None if not found. |
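The offset is described as an 8-digit value, and the module example passes it as the string "00001740". If offsets arrive as integers from another source, zero-padding produces that form. `format_offset` is a hypothetical helper, not part of glazing.

```python
def format_offset(value):
    """Zero-pad a non-negative integer to an 8-digit offset string."""
    if not 0 <= value <= 99_999_999:
        raise ValueError("offset must fit in 8 digits")
    return f"{value:08d}"


print(format_offset(1740))
```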
get_synsets_by_lemma(lemma: str, pos: WordNetPOS | None = None) -> list[Synset]
Get all synsets containing a lemma.
| PARAMETER | DESCRIPTION |
|---|---|
| lemma | The word lemma to search for. TYPE: str |
| pos | Part of speech filter. If None, returns all POS. TYPE: WordNetPOS \| None |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of synsets containing the lemma. |
Examples:
>>> synsets = loader.get_synsets_by_lemma("run", "v")
>>> for synset in synsets:
... print(synset.gloss)
load() -> None
Load all WordNet data from JSON Lines files.
This method loads synsets from the primary JSONL file, builds lemma and relation indices from loaded data, and optionally loads supplementary sense and exception data.
| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the primary JSONL file doesn't exist. |
| ValidationError | If JSON data doesn't match expected schema. |
Functions
load_wordnet(data_path: Path | str, lazy: bool = False, cache_size: int = 1000) -> WordNetLoader
Load a WordNet database from JSON Lines files.
| PARAMETER | DESCRIPTION |
|---|---|
| data_path | Path to the WordNet JSONL file (e.g., wordnet.jsonl). TYPE: Path \| str |
| lazy | If True, load synsets on demand. TYPE: bool |
| cache_size | Size of LRU cache for lazy loading. TYPE: int |

| RETURNS | DESCRIPTION |
|---|---|
| WordNetLoader | Loaded WordNet database. |
Examples:
>>> wn = load_wordnet("data/wordnet.jsonl")
>>> dog = wn.get_synsets_by_lemma("dog", "n")[0]
>>> print(dog.gloss)