glazing.propbank.loader
Loading PropBank data from JSON Lines.
PropBank data loader.
This module provides functionality for loading PropBank framesets and rolesets from JSON Lines files, with support for cross-reference resolution and lazy loading.
| CLASS | DESCRIPTION |
|---|---|
| PropBankLoader | Load and manage PropBank framesets and rolesets with automatic loading. |

| FUNCTION | DESCRIPTION |
|---|---|
| load_framesets | Load all framesets from a JSON Lines file. |
| load_frameset | Load a specific frameset by predicate lemma. |

Examples:
>>> from glazing.propbank.loader import PropBankLoader
>>> # Data loads automatically on initialization
>>> loader = PropBankLoader()
>>> framesets = loader.framesets # Access loaded framesets via property
>>> frameset = loader.get_frameset("abandon")
>>> roleset = loader.get_roleset("abandon.01")
>>>
>>> # Or disable autoload for manual control
>>> loader = PropBankLoader(autoload=False)
>>> framesets = loader.load() # Load manually when needed
Classes
PropBankLoader(data_path: Path | str | None = None, lazy: bool = False, autoload: bool = True, cache_size: int = 1000, **kwargs)
pydantic-model
Bases: GlazingBaseModel
Load and manage PropBank framesets and rolesets with automatic loading.
By default, data is loaded automatically on initialization.
| PARAMETER | DESCRIPTION | TYPE | DEFAULT |
|---|---|---|---|
| data_path | Path to PropBank JSON Lines file. If None, uses default path. | Path \| str \| None | None |
| lazy | Whether to use lazy loading for framesets. | bool | False |
| autoload | Whether to automatically load data on initialization. Only applies when lazy=False. | bool | True |
| cache_size | Maximum number of framesets to cache in memory. | int | 1000 |

| ATTRIBUTE | DESCRIPTION | TYPE |
|---|---|---|
| data_path | Path to the data file. | Path |
| lazy | Whether lazy loading is enabled. | bool |
| framesets | Property that returns loaded framesets, loading them if needed. | dict[PredicateLemma, Frameset] |
| cache | Cache for loaded framesets (only when lazy=True). | QueryCache \| None |
| frameset_index | Index mapping predicates to file positions. | dict[PredicateLemma, int] |
| roleset_index | Index mapping roleset IDs to predicates. | dict[RolesetID, PredicateLemma] |

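The cache attribute and the cache_size parameter describe a bounded in-memory cache of framesets. The page does not document QueryCache's eviction policy, so the sketch below only illustrates one common choice, least-recently-used eviction built on collections.OrderedDict. LRUCache is a hypothetical name and is not glazing's QueryCache API.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Hypothetical bounded LRU cache; a stand-in for illustration, not QueryCache."""

    def __init__(self, max_size: int = 1000) -> None:
        self.max_size = max_size
        self._data: OrderedDict[K, V] = OrderedDict()

    def get(self, key: K) -> V | None:
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: K, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the least recently used entry once capacity is exceeded.
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

Both move_to_end and popitem(last=False) run in constant time, which keeps every lookup and insertion O(1) regardless of cache_size.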
| METHOD | DESCRIPTION |
|---|---|
| load | Load all framesets into memory. |
| get_frameset | Get a specific frameset by predicate. |
| get_roleset | Get a specific roleset by ID. |
| build_indices | Build predicate and roleset indices. |
| resolve_cross_references | Resolve cross-references in a roleset. |

Examples:
>>> # Automatic loading (default)
>>> loader = PropBankLoader()
>>> framesets = loader.framesets # Already loaded
>>> frameset = loader.get_frameset("give")
>>> print(f"Found {len(frameset.rolesets)} rolesets")
Found 3 rolesets
Initialize PropBank loader.
| PARAMETER | DESCRIPTION | TYPE | DEFAULT |
|---|---|---|---|
| data_path | Path to PropBank JSON Lines file. If None, uses the default path from the environment. | Path \| str \| None | None |
| lazy | Whether to use lazy loading. | bool | False |
| autoload | Whether to automatically load data on initialization. Only applies when lazy=False. | bool | True |
| cache_size | Maximum number of framesets to cache in memory. | int | 1000 |
| \*\*kwargs | Additional keyword arguments. | | |

Config:
- arbitrary_types_allowed: True

Fields:
- data_path (Path)
- lazy (bool)
- cache (QueryCache | None)
- frameset_index (dict[PredicateLemma, int])
- roleset_index (dict[RolesetID, PredicateLemma])
- framesets_cache (dict[PredicateLemma, Frameset] | None)
Source code in src/glazing/propbank/loader.py
Attributes
framesets: dict[PredicateLemma, Frameset] (property)
Get loaded framesets.
| RETURNS | DESCRIPTION |
|---|---|
| dict[PredicateLemma, Frameset] | Dictionary of framesets mapped by predicate lemma. Loads automatically if not yet loaded. |

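The framesets property loads data on first access if it has not been loaded yet, which is what makes autoload=False safe to use. A minimal sketch of that load-on-first-access pattern, assuming each JSON line is an object with a predicate_lemma field (an assumed schema, not confirmed by this page). LazyFramesetStore is a hypothetical name, and records stay plain dicts rather than Frameset models.

```python
from __future__ import annotations

import json
from pathlib import Path


class LazyFramesetStore:
    """Hypothetical sketch of a framesets property that loads on first access."""

    def __init__(self, data_path: Path | str, autoload: bool = True) -> None:
        self.data_path = Path(data_path)
        self._framesets: dict[str, dict] | None = None
        if autoload:
            self.load()

    def load(self) -> dict[str, dict]:
        framesets: dict[str, dict] = {}
        with self.data_path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    # Assumed field name for the predicate lemma.
                    framesets[record["predicate_lemma"]] = record
        self._framesets = framesets
        return framesets

    @property
    def framesets(self) -> dict[str, dict]:
        # Nothing loaded yet (e.g. autoload=False): load now, then reuse.
        if self._framesets is None:
            return self.load()
        return self._framesets
```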
Functions
build_indices() -> None
Build predicate and roleset indices for fast lookup.
This method scans the JSON Lines file to build indices without loading all data into memory.
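A minimal sketch of building such indices in a single pass: it records the byte offset at which each line starts, so a single record can later be read back with seek(). The record schema {"predicate_lemma": ..., "rolesets": [{"id": ...}, ...]} is an assumption, and scan_indices is a hypothetical helper rather than glazing's build_indices.

```python
from __future__ import annotations

import json
from pathlib import Path


def scan_indices(path: Path | str) -> tuple[dict[str, int], dict[str, str]]:
    """Hypothetical sketch: index a JSON Lines file without keeping records in memory."""
    frameset_index: dict[str, int] = {}  # predicate lemma -> byte offset of its line
    roleset_index: dict[str, str] = {}  # roleset ID -> predicate lemma
    # Binary mode so tell()/seek() offsets are exact byte positions.
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break  # End of file.
            if not line.strip():
                continue  # Skip blank lines.
            record = json.loads(line)
            # Assumed schema; only the keys needed for the indices are read.
            lemma = record["predicate_lemma"]
            frameset_index[lemma] = offset
            for roleset in record.get("rolesets", []):
                roleset_index[roleset["id"]] = lemma
    return frameset_index, roleset_index
```

Each record is parsed once and then discarded, so memory use grows with the number of index entries rather than with the size of the records.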
get_frameset(predicate: PredicateLemma) -> Frameset | None
Get a specific frameset by predicate lemma.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| predicate | Predicate lemma to look up. | PredicateLemma |

| RETURNS | DESCRIPTION |
|---|---|
| Frameset \| None | The frameset if found, None otherwise. |

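With lazy loading, a lookup like this can combine a predicate-to-offset index with a bounded cache. The sketch below is one way to do that, using functools.lru_cache with maxsize taken from a cache_size argument. make_frameset_getter is hypothetical, returns raw dicts instead of Frameset objects, and does not describe how glazing implements get_frameset.

```python
from __future__ import annotations

import json
from collections.abc import Callable
from functools import lru_cache
from pathlib import Path


def make_frameset_getter(
    path: Path | str,
    frameset_index: dict[str, int],
    cache_size: int = 1000,
) -> Callable[[str], dict | None]:
    """Hypothetical sketch of lazy lookup: seek to an indexed offset and parse one line."""

    @lru_cache(maxsize=cache_size)
    def get_frameset(predicate: str) -> dict | None:
        offset = frameset_index.get(predicate)
        if offset is None:
            return None  # Unknown predicate, mirroring the Frameset | None contract.
        with open(path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())

    return get_frameset
```

Because lru_cache hands back the same object on every hit, callers should treat the returned record as read-only.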
get_roleset(roleset_id: RolesetID) -> Roleset | None
Get a specific roleset by ID.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| roleset_id | Roleset ID (e.g., "abandon.01"). | RolesetID |

| RETURNS | DESCRIPTION |
|---|---|
| Roleset \| None | The roleset if found, None otherwise. |

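Since roleset_index maps roleset IDs to predicate lemmas, a roleset lookup can be resolved in steps: ID to predicate, predicate to frameset, then a scan of that frameset's rolesets. A sketch over plain dicts; lookup_roleset is a hypothetical helper, and the "rolesets" and "id" keys are assumed.

```python
from __future__ import annotations


def lookup_roleset(
    roleset_id: str,
    roleset_index: dict[str, str],
    framesets: dict[str, dict],
) -> dict | None:
    """Hypothetical sketch: resolve a roleset ID through the roleset -> predicate index."""
    predicate = roleset_index.get(roleset_id)
    if predicate is None:
        return None  # ID not indexed.
    frameset = framesets.get(predicate)
    if frameset is None:
        return None  # Index points at a predicate that is not loaded.
    # Assumed schema: each frameset holds a "rolesets" list of dicts with an "id" key.
    for roleset in frameset.get("rolesets", []):
        if roleset.get("id") == roleset_id:
            return roleset
    return None
```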
get_statistics() -> dict[str, int | float | bool]
Get statistics about loaded PropBank data.
| RETURNS | DESCRIPTION |
|---|---|
| dict[str, int \| float \| bool] | Statistics including counts and coverage. |

iter_framesets(batch_size: int = 100) -> Generator[list[Frameset], None, None]
Iterate over framesets in batches.
| PARAMETER | DESCRIPTION | TYPE | DEFAULT |
|---|---|---|---|
| batch_size | Number of framesets per batch. | int | 100 |

| YIELDS | DESCRIPTION |
|---|---|
| list[Frameset] | Batch of framesets. |

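Batching of this kind can be sketched with itertools.islice, which pulls at most batch_size items at a time from any iterable (for example, the values of a frameset dict). iter_batches is a hypothetical helper; it yields lists to match the list[Frameset] yield type above, with only the final batch possibly shorter.

```python
from __future__ import annotations

from collections.abc import Generator, Iterable
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int = 100) -> Generator[list[T], None, None]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice consumes from the shared iterator, so each batch continues where the last stopped.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

On Python 3.12 and later, itertools.batched does the same job but yields tuples rather than lists.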
load() -> dict[PredicateLemma, Frameset]
Load all framesets into memory.
| RETURNS | DESCRIPTION |
|---|---|
| dict[PredicateLemma, Frameset] | All framesets mapped by predicate lemma. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the data file contains invalid JSON. |

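A sketch of eager loading that reports the offending line when parsing fails. json.JSONDecodeError is already a subclass of ValueError, so re-raising mainly adds the line number and file path as context. load_all and the predicate_lemma field are assumptions for illustration, not glazing's implementation of load.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_all(path: Path | str) -> dict[str, dict]:
    """Hypothetical sketch of eager JSON Lines loading with line-numbered errors."""
    framesets: dict[str, dict] = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Tolerate blank lines.
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"Invalid JSON on line {line_no} of {path}: {exc}") from exc
            # Assumed field name for the predicate lemma.
            framesets[record["predicate_lemma"]] = record
    return framesets
```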
resolve_cross_references(roleset: Roleset) -> None
Resolve cross-references in a roleset.
This method validates and enhances RoleLinks and LexLinks with additional metadata where available.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| roleset | The roleset to resolve references for. | Roleset |

search_by_pattern(pattern: str) -> list[Frameset]
Search for framesets by predicate pattern.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| pattern | Regular expression pattern to match predicates. | str |

| RETURNS | DESCRIPTION |
|---|---|
| list[Frameset] | Matching framesets. |

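The page does not say whether the pattern must match the whole predicate lemma or only part of it. The sketch below assumes re.search semantics (a match anywhere in the lemma) and returns results sorted by lemma; both choices are assumptions, and search_lemmas is a hypothetical helper over a plain dict.

```python
from __future__ import annotations

import re


def search_lemmas(framesets: dict[str, dict], pattern: str) -> list[dict]:
    """Hypothetical sketch: regex search over predicate lemmas, sorted by lemma."""
    regex = re.compile(pattern)  # Compile once, reuse for every lemma.
    # re.search is an assumption; the real method may anchor matches differently.
    return [framesets[lemma] for lemma in sorted(framesets) if regex.search(lemma)]
```

Under these semantics "give$" matches both "give" and "forgive"; anchoring with "^give$" restricts the result to the exact lemma.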
Functions
load_frameset(path: Path | str, predicate: PredicateLemma) -> Frameset | None
Load a specific frameset by predicate.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| path | Path to the JSON Lines file. | Path \| str |
| predicate | Predicate lemma to load. | PredicateLemma |

| RETURNS | DESCRIPTION |
|---|---|
| Frameset \| None | The frameset if found, None otherwise. |

Examples:
>>> frameset = load_frameset("propbank.jsonl", "abandon")
>>> if frameset:
... print(f"Found {len(frameset.rolesets)} rolesets")
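Without a prebuilt index, a one-off lookup can stream the file and stop at the first matching line, so records after the match are never parsed. scan_for_frameset is a hypothetical sketch that assumes a predicate_lemma field; it does not reflect how glazing's load_frameset is actually implemented.

```python
from __future__ import annotations

import json
from pathlib import Path


def scan_for_frameset(path: Path | str, predicate: str) -> dict | None:
    """Hypothetical sketch: stream a JSON Lines file and stop at the first match."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # Skip blank lines.
            record = json.loads(line)
            # Assumed field name for the predicate lemma.
            if record.get("predicate_lemma") == predicate:
                return record  # Early exit: later lines are never read or parsed.
    return None
```

This trades lookup speed (a linear scan per call) for zero setup cost, which suits a single query; repeated queries are better served by an index.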
load_framesets(path: Path | str) -> dict[PredicateLemma, Frameset]
Load all PropBank framesets from a JSON Lines file.
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| path | Path to the JSON Lines file. | Path \| str |

| RETURNS | DESCRIPTION |
|---|---|
| dict[PredicateLemma, Frameset] | All framesets mapped by predicate lemma. |

Examples: