glazing.downloader
Dataset downloading utilities.
downloader
Dataset downloaders for linguistic resources.
This module provides automatic downloading capabilities for FrameNet, PropBank, VerbNet, and WordNet datasets. Each downloader handles version tracking, progress indication, and archive extraction.

| CLASS | DESCRIPTION |
|---|---|
| BaseDownloader | Abstract base class for dataset downloaders. |
| VerbNetDownloader | Downloads VerbNet from GitHub with commit hash versioning. |
| PropBankDownloader | Downloads PropBank from GitHub with commit hash versioning. |
| WordNetDownloader | Downloads WordNet 3.1 from Princeton University. |
| FrameNetDownloader | Downloads FrameNet 1.7 from the NLTK data repository with commit hash versioning. |
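
| FUNCTION | DESCRIPTION |
|---|---|
| download_dataset | Download a specific dataset by name. |
| download_all | Download all available datasets. |
| get_downloader | Get a downloader instance for a dataset. |
| get_available_datasets | Get the list of datasets available for download. |
| get_dataset_info | Get name and version information for a dataset. |
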
Examples:
>>> from pathlib import Path
>>> from glazing.downloader import download_dataset
>>> path = download_dataset("verbnet", Path("data/raw"))
>>> print(f"VerbNet downloaded to: {path}")
>>> from glazing.downloader import VerbNetDownloader
>>> downloader = VerbNetDownloader()
>>> path = downloader.download(Path("data/raw"))
Classes
BaseDownloader
Bases: ABC
Abstract base class for dataset downloaders.
Provides common functionality for downloading and extracting datasets with progress tracking and error handling.

| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataset_name | Human-readable name of the dataset. TYPE: str |
| version | Version string or commit hash for the dataset. TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| download | Download the dataset to the specified directory. |

Attributes
dataset_name: str (abstractmethod, property)
Name of the dataset.

| RETURNS | DESCRIPTION |
|---|---|
| str | Human-readable dataset name. |

version: str (abstractmethod, property)
Version or commit hash.

| RETURNS | DESCRIPTION |
|---|---|
| str | Version identifier for reproducible downloads. |

Functions
download(output_dir: Path) -> Path (abstractmethod)
Download dataset to output directory.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download the dataset to. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the downloaded and extracted dataset. |

| RAISES | DESCRIPTION |
|---|---|
| DownloadError | If download fails. |
| ExtractionError | If archive extraction fails. |

Source code in src/glazing/downloader.py
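The abstract interface described above can be sketched as follows. This is a minimal illustration of the pattern, not glazing's source: `PlaceholderDownloader` and the directory layout it writes are hypothetical, and the real downloaders fetch and extract remote archives instead of writing a local file.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BaseDownloader(ABC):
    """Sketch of the documented interface: two abstract properties, one abstract method."""

    @property
    @abstractmethod
    def dataset_name(self) -> str:
        """Name of the dataset."""

    @property
    @abstractmethod
    def version(self) -> str:
        """Version or commit hash."""

    @abstractmethod
    def download(self, output_dir: Path) -> Path:
        """Download the dataset into output_dir and return the extracted path."""


class PlaceholderDownloader(BaseDownloader):
    """Hypothetical subclass that writes a local file instead of fetching anything."""

    @property
    def dataset_name(self) -> str:
        return "example"

    @property
    def version(self) -> str:
        return "0.1"

    def download(self, output_dir: Path) -> Path:
        # Hypothetical layout: <output_dir>/<dataset_name>-<version>/
        target = output_dir / f"{self.dataset_name}-{self.version}"
        target.mkdir(parents=True, exist_ok=True)
        (target / "README.txt").write_text(f"{self.dataset_name} {self.version}\n")
        return target


# Usage: the concrete subclass can be instantiated and called.
with tempfile.TemporaryDirectory() as tmp:
    result = PlaceholderDownloader().download(Path(tmp))
    contents = (result / "README.txt").read_text()
```

Calling `BaseDownloader()` directly raises `TypeError`, because Python refuses to instantiate a class whose abstract members (abstract property getters included) have not all been overridden.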
DownloadError
Bases: Exception
Raised when a download operation fails.
ExtractionError
Bases: Exception
Raised when archive extraction fails.
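One way to turn low-level archive failures into `ExtractionError` is sketched below. It rests on assumptions: the archive is taken to be a zip file, `extract_zip` is a hypothetical helper rather than part of glazing's public API, the exception class is a local stand-in, and the path-traversal check is an extra safeguard the documentation does not mention.

```python
import tempfile
import zipfile
from pathlib import Path


class ExtractionError(Exception):
    """Stand-in for glazing's ExtractionError."""


def extract_zip(archive: Path, output_dir: Path) -> Path:
    """Extract archive into output_dir, re-raising low-level failures as ExtractionError."""
    try:
        with zipfile.ZipFile(archive) as zf:
            root = output_dir.resolve()
            # Refuse members such as "../evil.txt" that would land outside output_dir.
            for name in zf.namelist():
                dest = (output_dir / name).resolve()
                if dest != root and root not in dest.parents:
                    raise ExtractionError(f"unsafe member path in {archive}: {name}")
            zf.extractall(output_dir)
    except (zipfile.BadZipFile, OSError) as exc:
        raise ExtractionError(f"failed to extract {archive}: {exc}") from exc
    return output_dir


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    # A well-formed archive extracts normally.
    good = workdir / "good.zip"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("data/readme.txt", "hello")
    extracted_text = (extract_zip(good, workdir / "out") / "data" / "readme.txt").read_text()

    # A corrupt archive surfaces as ExtractionError, with the original error chained.
    bad = workdir / "bad.zip"
    bad.write_bytes(b"not a zip archive")
    caught = None
    try:
        extract_zip(bad, workdir / "out2")
    except ExtractionError as exc:
        caught = exc
```

Chaining with `from exc` keeps the underlying `zipfile.BadZipFile` available on `__cause__` for debugging while callers only need to catch one documented exception type.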
FrameNetDownloader
Bases: BaseDownloader
Downloads FrameNet from NLTK data repository.
Downloads FrameNet v1.7 from the NLTK data GitHub repository, which provides the dataset without license restrictions.

| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataset_name | "framenet" TYPE: str |
| version | "1.7" TYPE: str |
| commit_hash | "427fc05d3a8cc1ca99e7ff93bdea937507cc9e7a" TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| download | Download FrameNet from the NLTK data repository. |

Attributes
commit_hash: str (property)
NLTK data repository commit hash.
dataset_name: str (property)
Name of the dataset.
version: str (property)
Version of FrameNet.
Functions
download(output_dir: Path) -> Path
Download FrameNet from the NLTK data repository.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download FrameNet into. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the extracted FrameNet directory. |

| RAISES | DESCRIPTION |
|---|---|
| DownloadError | If download fails. |
| ExtractionError | If extraction fails. |

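Because a commit hash pins a snapshot, a URL built from it keeps returning the same file contents even after the repository's branches move on. A minimal sketch of that idea follows; `PinnedGitHubFile` is hypothetical, and the `nltk`/`nltk_data` owner, repository and file path are assumed locations, not values taken from glazing. Only the commit hash is the documented one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedGitHubFile:
    """A single file in a GitHub repository, pinned to one commit (hypothetical helper)."""

    owner: str
    repo: str
    commit: str
    path: str

    def raw_url(self) -> str:
        # raw.githubusercontent.com serves a file as it existed at the given commit,
        # so this URL does not drift when the repository's branches move.
        return (
            f"https://raw.githubusercontent.com/"
            f"{self.owner}/{self.repo}/{self.commit}/{self.path}"
        )


# Assumed location of FrameNet 1.7 inside the NLTK data repository;
# only the commit hash is taken from the documented commit_hash attribute.
FRAMENET_ARCHIVE = PinnedGitHubFile(
    owner="nltk",
    repo="nltk_data",
    commit="427fc05d3a8cc1ca99e7ff93bdea937507cc9e7a",
    path="packages/corpora/framenet_v17.zip",
)
framenet_url = FRAMENET_ARCHIVE.raw_url()
```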
PropBankDownloader
Bases: BaseDownloader
Downloads PropBank from GitHub repository.
Downloads the PropBank frames from the official GitHub repository using a specific commit hash for reproducibility.

| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataset_name | "propbank" TYPE: str |
| version | "3.4.0" TYPE: str |
| commit_hash | "7280a04806b6ca3955ec82e28c4df96b6da76aef" TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| download | Download PropBank dataset. |

Attributes
commit_hash: str (property)
GitHub repository commit hash.
dataset_name: str (property)
Name of the dataset.
version: str (property)
Version of PropBank.
Functions
download(output_dir: Path) -> Path
Download PropBank dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download PropBank to. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the extracted PropBank directory. |

| RAISES | DESCRIPTION |
|---|---|
| DownloadError | If download fails. |
| ExtractionError | If extraction fails. |

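GitHub can serve a zip snapshot of any commit, and that snapshot unpacks into a single folder named after the repository and the full commit hash. The sketch below builds such a URL and renames the unpacked folder to a stable name. The `propbank`/`propbank-frames` coordinates and both helper functions are assumptions for illustration; only the commit hash comes from the attribute table above, and the unpacked snapshot is simulated rather than downloaded.

```python
import tempfile
from pathlib import Path


def github_archive_url(owner: str, repo: str, commit: str) -> str:
    """URL of GitHub's zip snapshot of repo at commit."""
    return f"https://github.com/{owner}/{repo}/archive/{commit}.zip"


def rename_snapshot_root(output_dir: Path, repo: str, commit: str, target: str) -> Path:
    """Rename the "<repo>-<commit>" folder a snapshot unpacks into to a stable name."""
    extracted = output_dir / f"{repo}-{commit}"
    if not extracted.is_dir():
        raise FileNotFoundError(extracted)
    destination = output_dir / target
    extracted.rename(destination)
    return destination


# Assumed repository coordinates; the commit hash is the documented one.
OWNER, REPO = "propbank", "propbank-frames"
COMMIT = "7280a04806b6ca3955ec82e28c4df96b6da76aef"
url = github_archive_url(OWNER, REPO, COMMIT)

# Simulate an already-unpacked snapshot, then normalize its folder name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / f"{REPO}-{COMMIT}" / "frames").mkdir(parents=True)
    final_dir = rename_snapshot_root(Path(tmp), REPO, COMMIT, "propbank")
    has_frames = (final_dir / "frames").is_dir()
```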
VerbNetDownloader
Bases: BaseDownloader
Downloads VerbNet from GitHub repository.
Downloads the VerbNet dataset from the official GitHub repository using a specific commit hash for reproducibility.

| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataset_name | "verbnet" TYPE: str |
| version | "3.4" TYPE: str |
| commit_hash | "ae8e9cfdc2c0d3414b748763612f1a0a34194cc1" TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| download | Download VerbNet dataset. |

Attributes
commit_hash: str (property)
GitHub repository commit hash.
dataset_name: str (property)
Name of the dataset.
version: str (property)
Version of VerbNet.
Functions
download(output_dir: Path) -> Path
Download VerbNet dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download VerbNet to. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the extracted VerbNet directory. |

| RAISES | DESCRIPTION |
|---|---|
| DownloadError | If download fails. |
| ExtractionError | If extraction fails. |

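Pinning a commit only pays off if a later run can tell whether the copy on disk matches the pin. One way to sketch that version tracking, assuming a small JSON marker file written next to the data, is shown below. The marker file name and both helpers are hypothetical; the documentation does not say how glazing records versions.

```python
import json
import tempfile
from pathlib import Path

MARKER_NAME = ".download-info.json"  # hypothetical marker file name


def write_marker(dataset_dir: Path, name: str, version: str, commit: str) -> None:
    """Record which version and commit were downloaded into dataset_dir."""
    payload = {"dataset": name, "version": version, "commit": commit}
    (dataset_dir / MARKER_NAME).write_text(json.dumps(payload, sort_keys=True))


def is_current(dataset_dir: Path, version: str, commit: str) -> bool:
    """Return True only if dataset_dir holds exactly the requested version and commit."""
    marker = dataset_dir / MARKER_NAME
    if not marker.is_file():
        return False
    info = json.loads(marker.read_text())
    return info.get("version") == version and info.get("commit") == commit


VERBNET_COMMIT = "ae8e9cfdc2c0d3414b748763612f1a0a34194cc1"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "verbnet"
    target.mkdir()
    before = is_current(target, "3.4", VERBNET_COMMIT)  # no marker yet
    write_marker(target, "verbnet", "3.4", VERBNET_COMMIT)
    after = is_current(target, "3.4", VERBNET_COMMIT)  # marker matches the pin
    stale = is_current(target, "3.4", "0" * 40)  # a different pin counts as stale
```

Comparing the commit as well as the version string matters here, since two snapshots of the same nominal release can differ.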
WordNetDownloader
Bases: BaseDownloader
Downloads WordNet 3.1 from Princeton University.
Downloads the WordNet 3.1 database from the official Princeton University distribution site.

| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataset_name | "wordnet" TYPE: str |
| version | "3.1" TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| download | Download WordNet dataset. |

Attributes
dataset_name: str (property)
Name of the dataset.
version: str (property)
Version of WordNet.
Functions
download(output_dir: Path) -> Path
Download WordNet dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download WordNet to. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the extracted WordNet directory. |

| RAISES | DESCRIPTION |
|---|---|
| DownloadError | If download fails. |
| ExtractionError | If extraction fails. |

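Progress indication for a large download such as the WordNet archive can be approached with a chunked copy that reports after each chunk. This is a sketch under assumptions: `copy_with_progress` and its callback signature are hypothetical, and in-memory streams stand in for the network and the output file. With `urllib.request.urlopen`, `source` would be the response object and `total` could be parsed from its `Content-Length` header when the server sends one.

```python
import io
from typing import BinaryIO, Callable, List, Optional, Tuple

ProgressCallback = Callable[[int, Optional[int]], None]


def copy_with_progress(
    source: BinaryIO,
    destination: BinaryIO,
    total: Optional[int],
    on_progress: ProgressCallback,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy source to destination chunk by chunk, reporting bytes copied after each chunk."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        destination.write(chunk)
        copied += len(chunk)
        on_progress(copied, total)
    return copied


# Usage with in-memory streams standing in for an HTTP response and a file on disk.
payload = b"x" * 250_000
reports: List[Tuple[int, Optional[int]]] = []
sink = io.BytesIO()
copied_bytes = copy_with_progress(
    io.BytesIO(payload), sink, len(payload), lambda done, total: reports.append((done, total))
)
```

Passing `total` as `Optional[int]` lets the callback fall back to a byte counter when the size is unknown instead of a percentage.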
Functions
download_all(output_dir: Path, datasets: list[DatasetType] | None = None) -> dict[DatasetType, Path | Exception]
Download all available datasets.

| PARAMETER | DESCRIPTION |
|---|---|
| output_dir | Directory to download datasets to. TYPE: Path |
| datasets | List of datasets to download. If None, downloads all supported datasets. TYPE: list[DatasetType] \| None |

| RETURNS | DESCRIPTION |
|---|---|
| dict[DatasetType, Path \| Exception] | Mapping of dataset names to either the download path (success) or the exception that occurred (failure). |

Examples:
>>> from pathlib import Path
>>> from glazing.downloader import download_all
>>> results = download_all(Path("data/raw"))
>>> for dataset, result in results.items():
...     if isinstance(result, Path):
...         print(f"{dataset}: success -> {result}")
...     else:
...         print(f"{dataset}: failed -> {result}")
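The return type of `download_all`, a mapping from each dataset to either a `Path` or an `Exception`, suggests that one failure does not stop the remaining downloads. A minimal sketch of that collect-rather-than-raise loop, with plain callables standing in for the downloaders (`run_all`, `ok_job` and `failing_job` are hypothetical):

```python
from pathlib import Path
from typing import Callable, Dict, Union

Outcome = Union[Path, Exception]


def run_all(jobs: Dict[str, Callable[[Path], Path]], output_dir: Path) -> Dict[str, Outcome]:
    """Run every job, recording its result path or the exception it raised."""
    results: Dict[str, Outcome] = {}
    for name, job in jobs.items():
        try:
            results[name] = job(output_dir)
        except Exception as exc:  # record the failure and continue with the next job
            results[name] = exc
    return results


def ok_job(output_dir: Path) -> Path:
    """Hypothetical job that succeeds."""
    return output_dir / "ok"


def failing_job(output_dir: Path) -> Path:
    """Hypothetical job that fails."""
    raise RuntimeError("network unreachable")


outcomes = run_all({"ok": ok_job, "broken": failing_job}, Path("data/raw"))
```

Catching the broad `Exception` is deliberate in this pattern: the caller inspects each value with `isinstance`, as the doctest above does.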
download_dataset(dataset: DatasetType | str, output_dir: Path) -> Path
Download a specific dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| dataset | Name of the dataset to download (case-insensitive). TYPE: DatasetType \| str |
| output_dir | Directory to download the dataset to. TYPE: Path |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the downloaded dataset directory. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If dataset is not supported. |
| DownloadError | If download fails. |
| ExtractionError | If extraction fails. |
| NotImplementedError | If the dataset requires manual download. |

Examples:
>>> from pathlib import Path
>>> from glazing.downloader import download_dataset
>>> path = download_dataset("verbnet", Path("data/raw"))
>>> print(f"Downloaded to: {path}")
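Since each documented exception signals a different problem, a caller may want to react to them separately. The sketch below shows one such call-site pattern. `fetch_or_explain` and `fake_fetch` are hypothetical, the two exception classes are local stand-ins for glazing's, and the advice strings are illustrative only.

```python
from pathlib import Path
from typing import Callable, List


class DownloadError(Exception):
    """Stand-in for glazing's DownloadError."""


class ExtractionError(Exception):
    """Stand-in for glazing's ExtractionError."""


def fetch_or_explain(fetch: Callable[[str, Path], Path], dataset: str, output_dir: Path) -> str:
    """Call a download_dataset-style function and map each documented error to advice."""
    try:
        return f"ok: {fetch(dataset, output_dir)}"
    except ValueError as exc:
        return f"unsupported dataset: {exc}"
    except NotImplementedError as exc:
        return f"manual download required: {exc}"
    except DownloadError as exc:
        return f"download failed, retry later: {exc}"
    except ExtractionError as exc:
        return f"archive unusable, delete it and download again: {exc}"


def fake_fetch(dataset: str, output_dir: Path) -> Path:
    """Hypothetical stub standing in for download_dataset."""
    if dataset == "verbnet":
        return output_dir / "verbnet"
    if dataset == "propbank":
        raise DownloadError("connection timed out")
    raise ValueError(dataset)


messages: List[str] = [
    fetch_or_explain(fake_fetch, name, Path("data/raw"))
    for name in ("verbnet", "propbank", "ontonotes")
]
```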
get_available_datasets() -> list[DatasetType]
Get list of available datasets for download.

| RETURNS | DESCRIPTION |
|---|---|
| list[DatasetType] | List of supported dataset names. |

Examples:
>>> from glazing.downloader import get_available_datasets
>>> datasets = get_available_datasets()
>>> print(datasets)
['VerbNet', 'PropBank', 'WordNet', 'FrameNet']
get_dataset_info(dataset: DatasetType | str) -> dict[str, str]
Get information about a dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| dataset | Name of the dataset (case-insensitive). TYPE: DatasetType \| str |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, str] | Dictionary with dataset information including name and version. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If dataset is not supported. |

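An information dictionary like this can be assembled from a downloader's documented properties. In the sketch below, the key names `"name"`, `"version"` and `"commit_hash"` are assumptions (the documentation only promises that name and version are included), and the two stub classes are hypothetical stand-ins for `WordNetDownloader` and `VerbNetDownloader`.

```python
from typing import Any, Dict


class WordNetStub:
    """Hypothetical stand-in with the same properties as WordNetDownloader."""

    dataset_name = "wordnet"
    version = "3.1"


class VerbNetStub:
    """Hypothetical stand-in with the same properties as VerbNetDownloader."""

    dataset_name = "verbnet"
    version = "3.4"
    commit_hash = "ae8e9cfdc2c0d3414b748763612f1a0a34194cc1"


def describe(downloader: Any) -> Dict[str, str]:
    """Collect name and version, adding commit_hash only when the downloader has one."""
    info = {"name": downloader.dataset_name, "version": downloader.version}
    commit = getattr(downloader, "commit_hash", None)
    if commit is not None:
        info["commit_hash"] = commit
    return info


wordnet_info = describe(WordNetStub())
verbnet_info = describe(VerbNetStub())
```

Using `getattr` with a default keeps the same function working for WordNet, which is versioned by release number and has no commit hash.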
Examples:
get_downloader(dataset: DatasetType | str) -> BaseDownloader
Get downloader instance for a dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| dataset | Name of the dataset to get downloader for (case-insensitive). TYPE: DatasetType \| str |

| RETURNS | DESCRIPTION |
|---|---|
| BaseDownloader | Downloader instance for the specified dataset. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If dataset is not supported. |

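A case-insensitive lookup that returns a fresh downloader, or raises `ValueError` for unknown names, is commonly built on a registry dictionary. The sketch below assumes that design; `REGISTRY`, `make_downloader` and the two small downloader classes are hypothetical and are not glazing's implementation.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Optional, Type


class Downloader(ABC):
    """Minimal stand-in for BaseDownloader."""

    @abstractmethod
    def download(self, output_dir: Path) -> Path:
        """Download into output_dir and return the dataset path."""


class VerbNetLike(Downloader):
    def download(self, output_dir: Path) -> Path:
        return output_dir / "verbnet"


class WordNetLike(Downloader):
    def download(self, output_dir: Path) -> Path:
        return output_dir / "wordnet"


# Hypothetical registry mapping lower-case dataset names to downloader classes.
REGISTRY: Dict[str, Type[Downloader]] = {"verbnet": VerbNetLike, "wordnet": WordNetLike}


def make_downloader(dataset: str) -> Downloader:
    """Return a new downloader for dataset, matching the name case-insensitively."""
    try:
        cls = REGISTRY[dataset.lower()]
    except KeyError:
        supported = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unsupported dataset {dataset!r}; choose from {supported}") from None
    return cls()


vn = make_downloader("VerbNet")
error_message: Optional[str] = None
try:
    make_downloader("OntoNotes")
except ValueError as exc:
    error_message = str(exc)
```

Storing classes rather than instances means every call gets an independent downloader, and `from None` hides the internal `KeyError` so users only see the documented `ValueError`.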
Examples: