glazing.wordnet.converter¶
Converting WordNet database to JSON Lines.
WordNet database file parser.
This module parses WordNet 3.1 database files: index files, data files, the sense index, and morphological exception files.
| CLASS | DESCRIPTION |
|---|---|
| WordNetConverter | Parse WordNet database files into JSON Lines format. |
| FUNCTION | DESCRIPTION |
|---|---|
| parse_index_file | Parse WordNet index file (index.noun, index.verb, etc.). |
| parse_data_file | Parse WordNet data file (data.noun, data.verb, etc.). |
| parse_sense_index | Parse WordNet sense index file. |
| parse_exception_file | Parse morphological exception file. |
Examples:
>>> from pathlib import Path
>>> from glazing.wordnet.converter import WordNetConverter
>>> converter = WordNetConverter()
>>> # pos is a WordNetPOS value; the codes "n" and "v" are assumed here
>>> synsets = converter.parse_data_file(Path("data.noun"), pos="n")
>>> index_entries = converter.parse_index_file(Path("index.verb"), pos="v")
>>> # Convert entire WordNet database
>>> converter.convert_wordnet_database(
...     wordnet_dir="wordnet31/dict",
...     output_file="wordnet.jsonl",
... )
Classes¶
WordNetConverter
¶
Parse WordNet database files into structured models.
Handles parsing of WordNet 3.1 database files including index files, data files, sense index, and morphological exception files.
| METHOD | DESCRIPTION |
|---|---|
| parse_data_file | Parse WordNet data file into list of Synset models. |
| parse_index_file | Parse WordNet index file into list of IndexEntry models. |
| parse_sense_index | Parse sense index file into list of Sense models. |
| parse_exception_file | Parse morphological exception file. |
| convert_wordnet_database | Convert entire WordNet database to JSON Lines. |
Functions¶
convert_exceptions(wordnet_dir: Path | str, output_file: Path | str) -> int
¶
Parse *.exc files and output ExceptionEntry objects to JSONL.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| wordnet_dir | `Path \| str` | Directory containing WordNet database files. |
| output_file | `Path \| str` | Output JSON Lines file path. |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of exception entries written. |
Source code in src/glazing/wordnet/converter.py
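As a rough illustration of what such a conversion involves, the sketch below reads exception lines in the standard WordNet layout (an inflected form followed by one or more base forms, separated by spaces) and writes one JSON object per line. It is not the library's implementation: `write_exceptions_jsonl` and the record keys `source_file`, `inflected_form`, and `base_forms` are hypothetical names chosen for this example.

```python
import json
from pathlib import Path


def write_exceptions_jsonl(exc_files: list[Path], output_file: Path) -> int:
    """Write one JSON object per exception line; return the number written.

    Hypothetical helper for illustration only, not part of glazing.
    """
    count = 0
    with output_file.open("w", encoding="utf-8") as out:
        for exc_file in exc_files:
            for line in exc_file.read_text(encoding="utf-8").splitlines():
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                record = {
                    "source_file": exc_file.name,
                    "inflected_form": fields[0],
                    "base_forms": fields[1:],
                }
                out.write(json.dumps(record) + "\n")
                count += 1
    return count
```

Returning the count mirrors the documented return value (the number of entries written), which makes it easy to sanity-check a run against the line count of the input files.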
convert_sense_index(wordnet_dir: Path | str, output_file: Path | str) -> int
¶
Parse index.sense and output Sense objects to JSONL.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| wordnet_dir | `Path \| str` | Directory containing WordNet database files. |
| output_file | `Path \| str` | Output JSON Lines file path. |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of sense entries written. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If index.sense file does not exist. |
convert_wordnet_database(wordnet_dir: Path | str, output_file: Path | str) -> dict[str, int]
¶
Convert entire WordNet database to JSON Lines.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| wordnet_dir | `Path \| str` | Directory containing WordNet database files. |
| output_file | `Path \| str` | Output JSON Lines file path. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, int] | Counts of processed items by file type. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If WordNet directory does not exist. |
parse_cntlist(filepath: Path | str) -> dict[str, int]
¶
Parse cntlist into a mapping of sense key to frequency count.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to cntlist file. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, int] | Mapping from sense key to frequency count. |
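A minimal sketch of building such a mapping, assuming the `cntlist` column order of the standard WordNet distribution (tag count, sense key, sense number; `cntlist.rev` puts the sense key first instead). The function name `parse_cntlist_lines` and the sample counts are made up for illustration and are not taken from the library or from real WordNet data.

```python
def parse_cntlist_lines(lines: list[str]) -> dict[str, int]:
    """Map sense key -> tag count for lines shaped 'tag_cnt sense_key sense_number'."""
    counts: dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or unexpected lines
        tag_cnt, sense_key, _sense_number = fields
        counts[sense_key] = int(tag_cnt)
    return counts


# Made-up sample values in the assumed layout.
sample = ["15 dog%1:05:00:: 1", "2 dog%1:18:01:: 3", ""]
counts = parse_cntlist_lines(sample)
```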
parse_data_file(filepath: Path | str, pos: WordNetPOS) -> list[Synset]
¶
Parse WordNet data file into list of Synset models.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to WordNet data file (e.g., data.noun). |
| pos | WordNetPOS | Part of speech for validation. |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of parsed Synset models. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the data file does not exist. |
| ValueError | If line format is invalid. |
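To make the data-file layout concrete, here is a simplified sketch of splitting one synset line. It assumes the standard WordNet field order (`synset_offset lex_filenum ss_type w_cnt word lex_id ... p_cnt pointers ... | gloss`), where `w_cnt` is hexadecimal and each pointer has four fields. `parse_data_line` and the dictionary keys are hypothetical, the sample line is abbreviated for illustration, and verb frames are ignored; the library's Synset model may differ.

```python
def parse_data_line(line: str) -> dict:
    """Split one data-file line into offset, words, pointers, and gloss."""
    body, _, gloss = line.partition("|")
    fields = body.split()
    if len(fields) < 6:
        raise ValueError(f"too few fields in data line: {line!r}")

    w_cnt = int(fields[3], 16)  # word count is a two-digit hexadecimal number
    i = 4
    words = []
    for _ in range(w_cnt):
        words.append(fields[i])
        i += 2  # each word is followed by its lex_id, which is skipped here

    p_cnt = int(fields[i])  # pointer count is a three-digit decimal number
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target_offset, target_pos, source_target = fields[i:i + 4]
        pointers.append(
            {"symbol": symbol, "offset": target_offset, "pos": target_pos,
             "source_target": source_target}
        )
        i += 4

    return {
        "offset": fields[0],
        "lex_filenum": fields[1],
        "ss_type": fields[2],
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# Abbreviated, illustrative line in the assumed layout.
sample = ("00001740 03 n 01 entity 0 002 ~ 00001930 n 0000 "
          "~ 00002137 n 0000 | that which is perceived to exist")
synset = parse_data_line(sample)
```

Raising `ValueError` on a short line follows the behavior documented above for invalid line formats; which conditions the library actually checks is not stated here.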
parse_exception_file(filepath: Path | str) -> list[ExceptionEntry]
¶
Parse morphological exception file.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to exception file (e.g., verb.exc). |

| RETURNS | DESCRIPTION |
|---|---|
| list[ExceptionEntry] | List of parsed ExceptionEntry models. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the exception file does not exist. |
parse_index_file(filepath: Path | str, pos: WordNetPOS) -> list[IndexEntry]
¶
Parse WordNet index file into list of IndexEntry models.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to WordNet index file (e.g., index.noun). |
| pos | WordNetPOS | Part of speech for validation. |

| RETURNS | DESCRIPTION |
|---|---|
| list[IndexEntry] | List of parsed IndexEntry models. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the index file does not exist. |
| ValueError | If line format is invalid. |
parse_sense_index(filepath: Path | str) -> list[Sense]
¶
Parse WordNet sense index file.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to sense index file (index.sense). |

| RETURNS | DESCRIPTION |
|---|---|
| list[Sense] | List of parsed Sense models. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the sense index file does not exist. |
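For orientation, a line of `index.sense` in the standard WordNet distribution has four space-separated fields: `sense_key synset_offset sense_number tag_cnt`, and the sense key itself has the shape `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The sketch below unpacks both. `SenseRecord` and `parse_sense_line` are hypothetical stand-ins and do not reproduce the library's Sense model; the tag count in the sample is made up.

```python
from dataclasses import dataclass


@dataclass
class SenseRecord:
    """Hypothetical record type used only in this example."""
    sense_key: str
    lemma: str
    ss_type: int
    synset_offset: str
    sense_number: int
    tag_count: int


def parse_sense_line(line: str) -> SenseRecord:
    """Parse 'sense_key synset_offset sense_number tag_cnt'."""
    # Unpacking raises ValueError if the line does not have four fields.
    sense_key, synset_offset, sense_number, tag_cnt = line.split()
    lemma, _, lex_sense = sense_key.partition("%")
    ss_type = int(lex_sense.split(":")[0])
    return SenseRecord(
        sense_key=sense_key,
        lemma=lemma,
        ss_type=ss_type,
        synset_offset=synset_offset,
        sense_number=int(sense_number),
        tag_count=int(tag_cnt),
    )


# Illustrative line; the tag count is a made-up value.
record = parse_sense_line("dog%1:05:00:: 02084071 1 42")
```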
parse_verb_framestext(filepath: Path | str) -> dict[int, str]
¶
Parse verb.Framestext into a mapping of frame number to template string.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to verb.Framestext file. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[int, str] | Mapping from frame number to template string. |
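A minimal sketch of the kind of mapping described above, assuming each line of `verb.Framestext` starts with a frame number followed by the template text (for example `8 Somebody ----s something`). The function name `parse_frame_lines` is hypothetical and the snippet is not the library's code.

```python
def parse_frame_lines(lines: list[str]) -> dict[int, str]:
    """Map frame number -> template for lines shaped '<number> <template>'."""
    frames: dict[int, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        frames[int(parts[0])] = parts[1]
    return frames


frames = parse_frame_lines([
    "1 Something ----s",
    "8 Somebody ----s something",
    "",
])
```

Splitting only on the first run of whitespace keeps the spaces inside the template intact.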
parse_verb_sentences(filepath: Path | str) -> dict[int, str]
¶
Parse sents.vrb into a mapping of frame number to example sentence.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to sents.vrb file. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[int, str] | Mapping from frame number to example sentence. |
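One way such a mapping might be used, sketched under the assumption that each `sents.vrb` line is a number followed by a sentence containing a `%s` placeholder for the verb. Both helper names (`parse_sentence_lines`, `fill_sentence`) and the sample line are illustrative assumptions, not part of the library.

```python
def parse_sentence_lines(lines: list[str]) -> dict[int, str]:
    """Map number -> example sentence for lines shaped '<number> <sentence>'."""
    sentences: dict[int, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        sentences[int(parts[0])] = parts[1]
    return sentences


def fill_sentence(template: str, verb: str) -> str:
    """Substitute the verb for the '%s' placeholder in a sentence template."""
    return template.replace("%s", verb)


sentences = parse_sentence_lines(["1 The children %s to the playground"])
example = fill_sentence(sentences[1], "run")
```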
Functions¶
convert_wordnet_database(wordnet_dir: Path | str, output_file: Path | str) -> dict[str, int]
¶
Convert entire WordNet database to JSON Lines.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| wordnet_dir | `Path \| str` | WordNet database directory. |
| output_file | `Path \| str` | Output JSON Lines file path. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, int] | Processing counts by file type. |
parse_data_file(filepath: Path | str, pos: WordNetPOS) -> list[Synset]
¶
Parse WordNet data file into list of Synset models.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to WordNet data file. |
| pos | WordNetPOS | Part of speech code. |

| RETURNS | DESCRIPTION |
|---|---|
| list[Synset] | List of parsed synsets. |
parse_exception_file(filepath: Path | str) -> list[ExceptionEntry]
¶
Parse morphological exception file.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to exception file. |

| RETURNS | DESCRIPTION |
|---|---|
| list[ExceptionEntry] | List of parsed exception entries. |
parse_index_file(filepath: Path | str, pos: WordNetPOS) -> list[IndexEntry]
¶
Parse WordNet index file into list of IndexEntry models.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| filepath | `Path \| str` | Path to WordNet index file. |
| pos | WordNetPOS | Part of speech code. |

| RETURNS | DESCRIPTION |
|---|---|
| list[IndexEntry] | List of parsed index entries. |