
glazing.verbnet.converter

Converting VerbNet XML to JSON Lines.

converter

VerbNet XML to JSON Lines converter.

This module converts VerbNet XML to JSON Lines using the glazing VerbNet models. It handles the verb class hierarchy with role inheritance, selectional restrictions with nested logic, and cross-references.

CLASS DESCRIPTION
VerbNetConverter

Convert VerbNet XML files to JSON Lines format.

FUNCTION DESCRIPTION
convert_verbnet_file

Convert a single VerbNet XML file to VerbClass model.

convert_verbnet_directory

Convert all VerbNet XML files in a directory to JSON Lines.

parse_member_cross_references

Parse cross-references from member attributes.

parse_selectional_restrictions

Parse selectional restrictions with nested logic.

Examples:

>>> from pathlib import Path
>>> from glazing.verbnet.converter import VerbNetConverter
>>> converter = VerbNetConverter()
>>> verb_class = converter.convert_verbnet_file("verbnet/give-13.1.xml")
>>> print(verb_class.id)
give-13.1
>>> # Convert entire directory
>>> converter.convert_verbnet_directory(
...     input_dir="verbnet_v34",
...     output_file="verbnet.jsonl"
... )

Classes

VerbNetConverter

Convert VerbNet XML files to JSON Lines format.

Handles VerbNet XML parsing with proper inheritance resolution, cross-reference extraction, and complex selectional restrictions.

METHOD DESCRIPTION
convert_verbnet_file

Convert a single VerbNet XML file to VerbClass model.

convert_verbnet_directory

Convert all VerbNet XML files to JSON Lines.

parse_verb_class

Parse a VNCLASS element into VerbClass model.

parse_members

Parse MEMBERS element into list of Member models.

parse_themroles

Parse THEMROLES element into list of ThematicRole models.

parse_frames

Parse FRAMES element into list of VNFrame models.

Functions
convert_verbnet_directory(input_dir: Path | str, output_file: Path | str) -> int

Convert all VerbNet XML files in a directory to JSON Lines.

PARAMETER DESCRIPTION
input_dir

Directory containing VerbNet XML files.

TYPE: Path | str

output_file

Output JSON Lines file path.

TYPE: Path | str

RETURNS DESCRIPTION
int

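Number of files successfully converted and written to the output file. Files that fail to convert are skipped and not counted.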

RAISES DESCRIPTION
FileNotFoundError

If the input directory does not exist or contains no XML files.

Source code in src/glazing/verbnet/converter.py
def convert_verbnet_directory(self, input_dir: Path | str, output_file: Path | str) -> int:
    """Convert all VerbNet XML files in a directory to JSON Lines.

    Parameters
    ----------
    input_dir : Path | str
        Directory containing VerbNet XML files.
    output_file : Path | str
        Output JSON Lines file path.

    Returns
    -------
    int
        Number of files successfully converted and written to the output file.

    Raises
    ------
    FileNotFoundError
        If the input directory does not exist or contains no XML files.
    """
    input_dir = Path(input_dir)
    output_file = Path(output_file)

    if not input_dir.exists():
        msg = f"Input directory not found: {input_dir}"
        raise FileNotFoundError(msg)

    # Create output directory if needed
    output_file.parent.mkdir(parents=True, exist_ok=True)

    xml_files = list(input_dir.glob("*.xml"))
    if not xml_files:
        msg = f"No XML files found in {input_dir}"
        raise FileNotFoundError(msg)

    count = 0
    with output_file.open("w", encoding="utf-8") as f:
        for xml_file in xml_files:
            try:
                verb_class = self.convert_verbnet_file(xml_file)
                json_line = verb_class.model_dump_json()
                f.write(f"{json_line}\n")
                count += 1
            except (ValueError, FileNotFoundError) as e:
                # Report the error on stdout and continue with the remaining files
                print(f"Error processing {xml_file}: {e}")
                continue

    return count
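
The method above writes one serialized class per line, so the output can be read back line by line. The following is a minimal sketch of that read path using only the standard library. The helper name `read_jsonl` and the sample records are illustrative assumptions, not part of glazing; real output lines would hold full serialized VerbClass models rather than these stand-ins.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Load one JSON object per non-empty line (hypothetical helper)."""
    records = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


# Stand-in records for illustration only.
sample = [{"id": "give-13.1"}, {"id": "put-9.1"}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "verbnet.jsonl"
    # Mirror the writer: one JSON document followed by a newline per record.
    out.write_text("".join(json.dumps(r) + "\n" for r in sample), encoding="utf-8")
    loaded = read_jsonl(out)
```

Because each line is an independent JSON document, a reader can also stream the file without loading it all into memory.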
convert_verbnet_file(filepath: Path | str) -> VerbClass

Convert a single VerbNet XML file to VerbClass model.

PARAMETER DESCRIPTION
filepath

Path to VerbNet XML file.

TYPE: Path | str

RETURNS DESCRIPTION
VerbClass

Parsed VerbClass model with all subclasses.

RAISES DESCRIPTION
FileNotFoundError

If the input file does not exist.

ValueError

If XML parsing fails or structure is invalid.

Source code in src/glazing/verbnet/converter.py
def convert_verbnet_file(self, filepath: Path | str) -> VerbClass:
    """Convert a single VerbNet XML file to VerbClass model.

    Parameters
    ----------
    filepath : Path | str
        Path to VerbNet XML file.

    Returns
    -------
    VerbClass
        Parsed VerbClass model with all subclasses.

    Raises
    ------
    FileNotFoundError
        If the input file does not exist.
    ValueError
        If XML parsing fails or structure is invalid.
    """
    filepath = Path(filepath)
    if not filepath.exists():
        msg = f"VerbNet file not found: {filepath}"
        raise FileNotFoundError(msg)

    try:
        tree = etree.parse(str(filepath))
        root = tree.getroot()

        if root.tag != "VNCLASS":
            msg = f"Expected VNCLASS root element, got {root.tag}"
            raise ValueError(msg)

        return self.parse_verb_class(root)

    except etree.XMLSyntaxError as e:
        msg = f"XML parsing failed for {filepath}: {e}"
        raise ValueError(msg) from e
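
The error handling above maps both malformed XML and an unexpected root element to `ValueError`. The sketch below reproduces that pattern with the standard library's `xml.etree.ElementTree` instead of lxml, so it runs without extra dependencies. The function name `check_root` is an assumption made for illustration and is not part of glazing.

```python
import xml.etree.ElementTree as ET


def check_root(xml_text: str) -> str:
    """Return the ID attribute if the root is VNCLASS, else raise ValueError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        # Wrap the parser error so callers only need to handle ValueError.
        raise ValueError(f"XML parsing failed: {e}") from e

    if root.tag != "VNCLASS":
        raise ValueError(f"Expected VNCLASS root element, got {root.tag}")

    return root.get("ID", "")


class_id = check_root('<VNCLASS ID="give-13.1"></VNCLASS>')
```

Chaining with `from e` keeps the original parser error available as `__cause__` for debugging.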
parse_verb_class(element: etree._Element, parent_id: VerbClassID | None = None) -> VerbClass

Parse a VNCLASS element into VerbClass model.

PARAMETER DESCRIPTION
element

VNCLASS XML element.

TYPE: _Element

parent_id

Parent class ID for subclasses.

TYPE: VerbClassID | None DEFAULT: None

RETURNS DESCRIPTION
VerbClass

Parsed VerbClass model.

RAISES DESCRIPTION
ValueError

If required attributes are missing or invalid.

Source code in src/glazing/verbnet/converter.py
def parse_verb_class(
    self, element: etree._Element, parent_id: VerbClassID | None = None
) -> VerbClass:
    """Parse a VNCLASS element into VerbClass model.

    Parameters
    ----------
    element : etree._Element
        VNCLASS XML element.
    parent_id : VerbClassID | None, default=None
        Parent class ID for subclasses.

    Returns
    -------
    VerbClass
        Parsed VerbClass model.

    Raises
    ------
    ValueError
        If required attributes are missing or invalid.
    """
    attrs = parse_attributes(element)
    class_id = str(attrs.get("ID", "")).strip()

    if not class_id:
        msg = "VNCLASS element missing ID attribute"
        raise ValueError(msg)

    # Validate class ID format
    if not re.match(VERBNET_CLASS_PATTERN, class_id):
        msg = f"Invalid VerbNet class ID format: {class_id}"
        raise ValueError(msg)

    # Parse components
    members = self._parse_members(element)
    themroles = self._parse_themroles(element)
    frames = self._parse_frames(element)
    subclasses = self._parse_subclasses(element, class_id)

    return VerbClass(
        id=class_id,
        members=members,
        themroles=themroles,
        frames=frames,
        subclasses=subclasses,
        parent_class=parent_id,
    )

Functions

convert_verbnet_directory(input_dir: Path | str, output_file: Path | str) -> int

Convert all VerbNet XML files in a directory to JSON Lines.

PARAMETER DESCRIPTION
input_dir

Directory containing VerbNet XML files.

TYPE: Path | str

output_file

Output JSON Lines file path.

TYPE: Path | str

RETURNS DESCRIPTION
int

Number of files successfully converted and written to the output file.

Source code in src/glazing/verbnet/converter.py
def convert_verbnet_directory(input_dir: Path | str, output_file: Path | str) -> int:
    """Convert all VerbNet XML files in a directory to JSON Lines.

    Parameters
    ----------
    input_dir : Path | str
        Directory containing VerbNet XML files.
    output_file : Path | str
        Output JSON Lines file path.

    Returns
    -------
    int
        Number of files successfully converted and written to the output file.
    """
    converter = VerbNetConverter()
    return converter.convert_verbnet_directory(input_dir, output_file)

convert_verbnet_file(filepath: Path | str) -> VerbClass

Convert a single VerbNet XML file to VerbClass model.

PARAMETER DESCRIPTION
filepath

Path to VerbNet XML file.

TYPE: Path | str

RETURNS DESCRIPTION
VerbClass

Parsed VerbClass model.

Source code in src/glazing/verbnet/converter.py
def convert_verbnet_file(filepath: Path | str) -> VerbClass:
    """Convert a single VerbNet XML file to VerbClass model.

    Parameters
    ----------
    filepath : Path | str
        Path to VerbNet XML file.

    Returns
    -------
    VerbClass
        Parsed VerbClass model.
    """
    converter = VerbNetConverter()
    return converter.convert_verbnet_file(filepath)