glazing.propbank.converter
PropBank XML to JSON Lines converter.
This module converts PropBank XML files to JSON Lines using the glazing PropBank models.
| CLASS | DESCRIPTION |
|---|---|
| PropBankConverter | Convert PropBank XML files to JSON Lines format. |

| FUNCTION | DESCRIPTION |
|---|---|
| convert_frameset_file | Convert a single frameset XML file to a Frameset model. |
| convert_framesets_directory | Convert all frameset files in a directory to JSON Lines. |
Examples:
>>> from pathlib import Path
>>> from glazing.propbank.converter import PropBankConverter
>>> converter = PropBankConverter()
>>> frameset = converter.convert_frameset_file("frames/abandon.xml")
>>> print(frameset.predicate_lemma)
abandon
Classes
PropBankConverter(validate_schema: bool = False)
Convert PropBank XML files to JSON Lines format.
| PARAMETER | DESCRIPTION |
|---|---|
| validate_schema | Whether to validate against DTD. TYPE: bool, DEFAULT: False |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| validate_schema | Whether to validate XML against schema. TYPE: bool |
| METHOD | DESCRIPTION |
|---|---|
| convert_frameset_file | Convert a frameset XML file to a Frameset model. |
| convert_framesets_directory | Convert all framesets in a directory to JSON Lines. |
Initialize the converter.
| PARAMETER | DESCRIPTION |
|---|---|
| validate_schema | Whether to validate XML against DTD. TYPE: bool, DEFAULT: False |
Source code in src/glazing/propbank/converter.py
Functions
convert_combined_frameset_file(filepath: Path | str) -> list[Frameset]
Convert a combined frameset XML file with multiple predicates.
Handles files like AMR-UMR-91-rolesets.xml where a single
convert_frameset_file(filepath: Path | str) -> Frameset
Convert a frameset XML file to a Frameset model.
| PARAMETER | DESCRIPTION |
|---|---|
| filepath | Path to the frameset XML file. TYPE: Path \| str |

| RETURNS | DESCRIPTION |
|---|---|
| Frameset | Parsed Frameset model instance. |
Examples:
>>> converter = PropBankConverter()
>>> frameset = converter.convert_frameset_file("frames/abandon.xml")
>>> print(f"Predicate: {frameset.predicate_lemma}")
Predicate: abandon
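For orientation, the input to this method is a PropBank frame file. The sketch below parses a small hand-written frameset with the standard library's `xml.etree.ElementTree` to show where a predicate lemma sits in the XML. It is not glazing's implementation: the sample XML is abbreviated and illustrative, and the helpers `read_predicate_lemmas` and `read_roleset_ids` are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written sample in the style of a PropBank frame file.
# Real frame files carry more elements (aliases, examples, notes).
SAMPLE_FRAMESET = """\
<frameset>
  <predicate lemma="abandon">
    <roleset id="abandon.01" name="leave behind">
      <roles>
        <role n="0" f="PAG" descr="abandoner"/>
        <role n="1" f="PPT" descr="thing abandoned"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def read_predicate_lemmas(xml_text):
    """Return the lemma attribute of every <predicate> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [pred.get("lemma", "") for pred in root.iter("predicate")]


def read_roleset_ids(xml_text):
    """Return the id attribute of every <roleset> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [roleset.get("id", "") for roleset in root.iter("roleset")]
```

A file with several `<predicate>` elements would yield several lemmas from the same helper, which is roughly the situation the combined-file method is described as handling.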
convert_framesets_directory(input_dir: Path | str, output_file: Path | str, pattern: str = '*.xml') -> int
Convert all frameset files in a directory to JSON Lines.
Also processes combined frameset files (e.g., AMR-UMR-91-rolesets.xml) found in the parent directory.
| PARAMETER | DESCRIPTION |
|---|---|
| input_dir | Directory containing frameset XML files. TYPE: Path \| str |
| output_file | Output JSON Lines file path. TYPE: Path \| str |
| pattern | File pattern to match. TYPE: str, DEFAULT: '*.xml' |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of framesets converted. |
Examples:
>>> converter = PropBankConverter()
>>> count = converter.convert_framesets_directory(
... "propbank-frames/frames",
... "framesets.jsonl"
... )
>>> print(f"Converted {count} framesets")
Converted 5559 framesets
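Because JSON Lines stores one JSON object per line, the output file can be read back with the standard library alone. The following is a minimal sketch: the helper `iter_jsonl` and the sample records are hypothetical, and the `predicate_lemma` key is assumed to mirror the model attribute used in the examples on this page rather than confirmed from the serialized output.

```python
import json
import tempfile
from pathlib import Path


def iter_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def demo_roundtrip():
    """Write two made-up records, read them back, and return their lemmas."""
    # Assumption: each line carries a "predicate_lemma" key.
    records = [{"predicate_lemma": "abandon"}, {"predicate_lemma": "abide"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "framesets.jsonl"
        path.write_text(
            "".join(json.dumps(record) + "\n" for record in records),
            encoding="utf-8",
        )
        return [row["predicate_lemma"] for row in iter_jsonl(path)]
```

Reading line by line keeps memory use flat even when the file holds thousands of framesets.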
Functions
convert_frameset_file(filepath: Path | str) -> Frameset
Convert a single frameset XML file to a Frameset model.
| PARAMETER | DESCRIPTION |
|---|---|
| filepath | Path to the frameset XML file. TYPE: Path \| str |

| RETURNS | DESCRIPTION |
|---|---|
| Frameset | Parsed Frameset model. |
convert_framesets_directory(input_dir: Path | str, output_file: Path | str, pattern: str = '*.xml') -> int
Convert all framesets in a directory to JSON Lines.
| PARAMETER | DESCRIPTION |
|---|---|
| input_dir | Directory with frameset XML files. TYPE: Path \| str |
| output_file | Output JSON Lines file. TYPE: Path \| str |
| pattern | File pattern to match. TYPE: str, DEFAULT: '*.xml' |

| RETURNS | DESCRIPTION |
|---|---|
| int | Number of framesets converted. |
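The default `'*.xml'` suggests that `pattern` is a glob-style pattern. Assuming it behaves like `pathlib.Path.glob` (an assumption, not something this page confirms), the sketch below shows which file names a given pattern would select. The helper `matching_files` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(directory, pattern="*.xml"):
    """Return the sorted names of files in `directory` that match a glob pattern."""
    return sorted(p.name for p in Path(directory).glob(pattern) if p.is_file())


def demo_patterns():
    """Create three empty files and report what two patterns select."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("abandon.xml", "abide.xml", "README.md"):
            (Path(tmp) / name).write_text("", encoding="utf-8")
        # Assumption: the converter's pattern is matched glob-style.
        return matching_files(tmp), matching_files(tmp, "aba*.xml")
```

Under that assumption, a narrower pattern such as `'aba*.xml'` would restrict a run to a subset of frame files, which can be handy when trying the conversion on a small sample first.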