glazing.wordnet.search
Searching WordNet data.
search
WordNet search functionality.
This module provides search capabilities for WordNet data, including synset searches by lemma, offset, sense key, and pattern-based searches with domain-specific filtering.
| CLASS | DESCRIPTION |
|---|---|
| WordNetSearch | Search interface for WordNet data. |
Classes
WordNetSearch(synsets: list[Synset] | None = None, senses: list[Sense] | None = None)
Search interface for WordNet data.
Provides methods for finding synsets by various criteria including lemma, offset, sense key, pattern matching, and domain-specific searches.
| PARAMETER | DESCRIPTION |
|---|---|
| synsets | Initial synsets to index. If None, creates empty search. TYPE: `list[Synset] \| None` |
| senses | Initial senses to index. If None, creates empty search. TYPE: `list[Sense] \| None` |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| _synsets | Mapping from synset offset to synset object. |
| _synsets_by_lemma | Mapping from lemma and POS to synset offsets. |
| _senses | Mapping from sense key to sense object. |
| _synsets_by_domain | Mapping from lexical file name to synset offsets. |
| METHOD | DESCRIPTION |
|---|---|
| add_synset | Add a synset to the search index. |
| add_sense | Add a sense to the search index. |
| by_offset | Find synset by offset and optionally POS. |
| by_lemma | Find synsets containing a lemma. |
| by_sense_key | Find synset by sense key. |
| by_pattern | Find synsets matching a pattern. |
| by_domain | Find synsets in a specific domain. |
| by_gloss_pattern | Find synsets with glosses matching a pattern. |
| get_lemma_senses | Get all senses for a lemma. |
Examples:
>>> search = WordNetSearch()
>>> search.add_synset(dog_synset)
>>> synsets = search.by_lemma("dog", "n")
>>> synset = search.by_offset("02084442", "n")
Initialize WordNet search with optional initial data.
Source code in src/glazing/wordnet/search.py
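To make the attribute descriptions above concrete, the following is a minimal sketch of an offset index combined with a (lemma, POS) index. `MiniSynset` and `MiniIndex` are hypothetical stand-ins invented for this example; they are not glazing's classes, and the real `WordNetSearch` may be organized differently.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniSynset:
    """Hypothetical stand-in for a synset record (not glazing's Synset)."""

    offset: str  # 8-digit string such as "02084442"
    pos: str  # part-of-speech tag such as "n" or "v"
    lemmas: tuple[str, ...]


class MiniIndex:
    """Toy index: offset -> synset and (lemma, pos) -> offsets."""

    def __init__(self) -> None:
        self._synsets: dict[str, MiniSynset] = {}
        self._by_lemma: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add_synset(self, synset: MiniSynset) -> None:
        # Follow the documented contract: a duplicate offset is an error.
        if synset.offset in self._synsets:
            raise ValueError(f"synset {synset.offset} is already indexed")
        self._synsets[synset.offset] = synset
        for lemma in synset.lemmas:
            self._by_lemma[(lemma, synset.pos)].append(synset.offset)

    def by_offset(self, offset: str, pos: str | None = None) -> MiniSynset | None:
        synset = self._synsets.get(offset)
        # A POS filter that does not match is treated as "not found".
        if synset is not None and pos is not None and synset.pos != pos:
            return None
        return synset

    def by_lemma(self, lemma: str, pos: str | None = None) -> list[MiniSynset]:
        # A linear scan over the keys keeps the sketch short.
        return [
            self._synsets[offset]
            for (indexed_lemma, indexed_pos), offsets in self._by_lemma.items()
            if indexed_lemma == lemma and pos in (None, indexed_pos)
            for offset in offsets
        ]


index = MiniIndex()
dog = MiniSynset("02084442", "n", ("dog", "domestic_dog"))
index.add_synset(dog)
```

With this toy index, `index.by_lemma("dog", "n")` returns a list holding `dog`, while `index.by_offset("02084442", "v")` returns `None` because the POS filter does not match.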
Functions
add_sense(sense: Sense) -> None
Add a sense to the search index.

| PARAMETER | DESCRIPTION |
|---|---|
| sense | Sense to add to index. TYPE: `Sense` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If a sense with the same key already exists. |
add_synset(synset: Synset) -> None
Add a synset to the search index.

| PARAMETER | DESCRIPTION |
|---|---|
| synset | Synset to add to index. TYPE: `Synset` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If a synset with the same offset already exists. |
by_domain(domain: LexFileName) -> list[Synset]
Find synsets in a specific domain.

| PARAMETER | DESCRIPTION |
|---|---|
| domain | Lexical file name (domain). TYPE: `LexFileName` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets in the specified domain. |
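A domain here is a WordNet lexical file name such as `noun.animal` or `verb.motion`. The sketch below shows one plausible way to group synset offsets by domain and to list the domains in sorted order. The offsets are made-up placeholders, and the variable names are assumptions for illustration rather than glazing's internals.

```python
from collections import defaultdict

# Made-up (offset, lexical file name) pairs, for illustration only.
records = [
    ("00000001", "noun.animal"),
    ("00000002", "verb.motion"),
    ("00000003", "noun.animal"),
]

# Group offsets under their lexical file name.
offsets_by_domain = defaultdict(list)
for offset, lex_file_name in records:
    offsets_by_domain[lex_file_name].append(offset)

animal_offsets = offsets_by_domain["noun.animal"]
all_domains = sorted(offsets_by_domain)
```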
by_gloss_pattern(pattern: str, pos: WordNetPOS | None = None, case_sensitive: bool = False) -> list[Synset]
Find synsets with glosses matching a pattern.

| PARAMETER | DESCRIPTION |
|---|---|
| pattern | Regular expression pattern to match against glosses. TYPE: `str` |
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |
| case_sensitive | Whether search should be case-sensitive. TYPE: `bool` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets with glosses matching the pattern. |

| RAISES | DESCRIPTION |
|---|---|
| `re.error` | If the pattern is not a valid regular expression. |
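Because an invalid pattern raises an exception rather than returning an empty list, callers that accept user-supplied patterns may want to validate them first. The sketch below uses only the standard `re` module. `safe_compile` is a hypothetical helper, and mapping `case_sensitive=False` to `re.IGNORECASE` is an assumption about how such a flag is typically implemented, not a statement about glazing's code.

```python
import re


def safe_compile(pattern, case_sensitive=False):
    """Return a compiled regex, or None if the pattern is invalid."""
    flags = 0 if case_sensitive else re.IGNORECASE
    try:
        return re.compile(pattern, flags)
    except re.error:
        # An unbalanced parenthesis, for example, ends up here.
        return None


gloss = "a member of the genus Canis"
insensitive = safe_compile("canis")
sensitive = safe_compile("canis", case_sensitive=True)
broken = safe_compile("(")
```

Here `insensitive.search(gloss)` finds a match, `sensitive.search(gloss)` does not, and `broken` is `None`.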
by_lemma(lemma: str, pos: WordNetPOS | None = None) -> list[Synset]
Find synsets containing a lemma.

| PARAMETER | DESCRIPTION |
|---|---|
| lemma | Lemma to search for (lowercase, underscores for spaces). TYPE: `str` |
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets containing the lemma. |
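The lemma is expected in lowercase with underscores in place of spaces. If input comes from free text, a small normalization step along these lines may help. `normalize_lemma` is a hypothetical helper written for this example, not part of glazing.

```python
def normalize_lemma(text):
    """Lowercase the text and join its words with underscores."""
    return "_".join(text.strip().lower().split())


single = normalize_lemma("Dog")
multi = normalize_lemma("  Hot   Dog ")
```

The normalized string can then be passed as the `lemma` argument.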
by_offset(offset: SynsetOffset, pos: WordNetPOS | None = None) -> Synset | None
Find synset by offset and optionally POS.

| PARAMETER | DESCRIPTION |
|---|---|
| offset | Synset offset (8-digit string). TYPE: `SynsetOffset` |
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `Synset \| None` | Synset if found, None otherwise. |
by_pattern(pattern: str, pos: WordNetPOS | None = None, case_sensitive: bool = False) -> list[Synset]
Find synsets matching a pattern.

| PARAMETER | DESCRIPTION |
|---|---|
| pattern | Regular expression pattern to match against lemmas. TYPE: `str` |
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |
| case_sensitive | Whether search should be case-sensitive. TYPE: `bool` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets with lemmas matching the pattern. |

| RAISES | DESCRIPTION |
|---|---|
| `re.error` | If the pattern is not a valid regular expression. |
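How much a lemma pattern matches depends on whether it is anchored. The sketch below assumes search semantics (a match anywhere in the lemma), which this page does not confirm, so explicit `^` and `$` anchors are the safer way to express prefix or whole-lemma matches. `lemmas_matching` and the sample lemma list are hypothetical.

```python
import re

sample_lemmas = ["run", "outrun", "run_away", "rerun"]


def lemmas_matching(pattern, lemmas, case_sensitive=False):
    """Return the lemmas in which the pattern matches anywhere."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [lemma for lemma in lemmas if regex.search(lemma)]


anywhere = lemmas_matching(r"run", sample_lemmas)
prefix_only = lemmas_matching(r"^run", sample_lemmas)
whole_lemma = lemmas_matching(r"^run$", sample_lemmas)
```

Under that assumption, `anywhere` contains all four lemmas, `prefix_only` contains `run` and `run_away`, and `whole_lemma` contains only `run`.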
by_relation_type(relation_type: str) -> list[Synset]
Find synsets with specific relation type.

| PARAMETER | DESCRIPTION |
|---|---|
| relation_type | Relation type (e.g., "hypernym", "hyponym", "antonym"). TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets with the specified relation type. |
by_sense_key(sense_key: SenseKey) -> Synset | None
Find synset by sense key.

| PARAMETER | DESCRIPTION |
|---|---|
| sense_key | Sense key to look up. TYPE: `SenseKey` |

| RETURNS | DESCRIPTION |
|---|---|
| `Synset \| None` | Synset containing the sense, None if not found. |
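In the standard WordNet sense index, a sense key has the form `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`, for example `dog%1:05:00::`. The sketch below splits such a key into its fields. It assumes glazing's `SenseKey` follows this standard layout, and `parse_sense_key` is a hypothetical helper rather than a library function.

```python
# Standard WordNet ss_type codes mapped to part-of-speech tags.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


def parse_sense_key(sense_key):
    """Split a WordNet-style sense key into a dict of its fields."""
    lemma, separator, rest = sense_key.partition("%")
    fields = rest.split(":")
    if not lemma or not separator or len(fields) != 5:
        raise ValueError(f"not a sense key: {sense_key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = fields
    return {
        "lemma": lemma,
        "pos": SS_TYPE_TO_POS[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        # The head fields are only filled in for satellite adjectives.
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


parsed = parse_sense_key("dog%1:05:00::")
```

For this key, `parsed["pos"]` is `"n"` and `parsed["lex_filenum"]` is `5`.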
by_syntax(pattern: str) -> list[Synset]
Find synsets with verbs matching a syntactic pattern.

| PARAMETER | DESCRIPTION |
|---|---|
| pattern | Syntactic pattern (e.g., "NP V", "NP V NP", "NP V PP"). TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | Synsets containing verbs with matching syntactic frames. |
from_jsonl_files(synsets_path: Path | str | None = None, senses_path: Path | str | None = None) -> WordNetSearch
classmethod
Load search index from JSON Lines files.

| PARAMETER | DESCRIPTION |
|---|---|
| synsets_path | Path to JSON Lines file containing synsets. TYPE: `Path \| str \| None` |
| senses_path | Path to JSON Lines file containing senses. TYPE: `Path \| str \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `WordNetSearch` | Search index populated with data from files. |

| RAISES | DESCRIPTION |
|---|---|
| `FileNotFoundError` | If specified file does not exist. |
| `ValueError` | If file contains invalid data. |
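JSON Lines files hold one JSON object per line. The sketch below reads such a file with the standard library and raises the same two exception types documented above. `read_jsonl` is a hypothetical helper, and the `offset` and `pos` keys in the sample file are illustrative only; they are not a description of glazing's actual schema or loading code.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Read one JSON object per line and return them as a list."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(path)
    records = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


# Write a tiny sample file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "synsets.jsonl"
    sample.write_text(
        '{"offset": "02084442", "pos": "n"}\n{"offset": "02121620", "pos": "n"}\n',
        encoding="utf-8",
    )
    records = read_jsonl(sample)
```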
get_all_domains() -> list[LexFileName]
Get all lexical file names (domains).

| RETURNS | DESCRIPTION |
|---|---|
| `list[LexFileName]` | Sorted list of domains. |
get_all_lemmas(pos: WordNetPOS | None = None) -> list[str]
Get all unique lemmas.

| PARAMETER | DESCRIPTION |
|---|---|
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[str]` | Sorted list of unique lemmas. |
get_all_synsets() -> list[Synset]
Get all synsets in the search index.

| RETURNS | DESCRIPTION |
|---|---|
| `list[Synset]` | All synsets sorted by offset. |
get_lemma_senses(lemma: str, pos: WordNetPOS | None = None) -> list[Sense]
Get all senses for a lemma.

| PARAMETER | DESCRIPTION |
|---|---|
| lemma | Lemma to get senses for. TYPE: `str` |
| pos | Part of speech to filter by. TYPE: `WordNetPOS \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Sense]` | Senses for the lemma, ordered by sense number. |
get_statistics() -> dict[str, int]
Get search index statistics.

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, int]` | Statistics about indexed data. |
get_synset_by_id(synset_id: str) -> Synset | None
Get a synset by its ID string.

| PARAMETER | DESCRIPTION |
|---|---|
| synset_id | Synset ID in format "offsetpos" (e.g., "01234567n"). TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `Synset \| None` | The synset if found, None otherwise. |
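
The "offsetpos" format concatenates the 8-digit offset with a single POS letter. The sketch below splits such an ID into the two pieces. `split_synset_id` is a hypothetical helper, and the accepted POS letters (`n`, `v`, `a`, `r`, `s`) follow standard WordNet conventions, which may not match exactly what glazing's `WordNetPOS` allows.

```python
import re

# Eight digits followed by one WordNet POS letter.
SYNSET_ID = re.compile(r"(\d{8})([nvars])")


def split_synset_id(synset_id):
    """Split an ID such as "01234567n" into (offset, pos)."""
    match = SYNSET_ID.fullmatch(synset_id)
    if match is None:
        raise ValueError(f"not a synset ID: {synset_id!r}")
    return match.group(1), match.group(2)


offset, pos = split_synset_id("01234567n")
```

The resulting pair has the same shape as the `offset` and `pos` arguments of `by_offset`.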