glazing.initialize
Initialization and setup functions.
Initialize glazing by downloading and converting all datasets.
This module provides functionality to automatically download and convert all linguistic datasets on first use or installation.
| FUNCTION | DESCRIPTION |
|---|---|
| check_initialization | Check if datasets are initialized. |
| get_default_data_dir | Get the default data directory for glazing. |
| get_default_data_path | Get the default path for a converted data file. |
| initialize_datasets | Download and convert all datasets. |
| main | Set up all datasets. Downloads raw data and converts to JSON Lines format. |
Functions
check_initialization(data_dir: Path | None = None) -> bool
Check if datasets are initialized.
| PARAMETER | DESCRIPTION |
|---|---|
| data_dir | Data directory to check. TYPE: Path \| None. DEFAULT: None. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if initialized, False otherwise. |
Source code in src/glazing/initialize.py
get_default_data_dir() -> Path
Get the default data directory for glazing.
| RETURNS | DESCRIPTION |
|---|---|
| Path | Default data directory path. |
get_default_data_path(filename: str | None = None) -> Path
Get the default path for a converted data file.
| PARAMETER | DESCRIPTION |
|---|---|
| filename | Filename to append to the converted data directory. If None, returns the converted directory path. TYPE: str \| None. DEFAULT: None. |

| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the data file or directory. |
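
The filename behavior described for get_default_data_path can be illustrated with a small stand-in. This is a minimal sketch, not glazing's implementation: `resolve_data_path`, its `converted_dir` argument, the `data/converted` location, and the example file name are all hypothetical, since the real default directory is determined by the library.

```python
from pathlib import Path
from typing import Optional


def resolve_data_path(converted_dir: Path, filename: Optional[str] = None) -> Path:
    """Hypothetical stand-in mirroring the documented filename semantics."""
    if filename is None:
        # No filename given: return the converted-data directory itself.
        return converted_dir
    # Otherwise append the filename to the converted-data directory.
    return converted_dir / filename


base = Path("data") / "converted"  # assumed location, for illustration only
directory = resolve_data_path(base)
file_path = resolve_data_path(base, "example.jsonl")  # hypothetical file name
```

Here `directory` is the directory path itself, while `file_path` points at a file inside it, which matches the two cases in the parameter description.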
initialize_datasets(data_dir: Path | None = None, force: bool = False, verbose: bool = True) -> bool
Download and convert all datasets.
| PARAMETER | DESCRIPTION |
|---|---|
| data_dir | Directory to store data. If None, uses default. TYPE: Path \| None. DEFAULT: None. |
| force | Force re-download even if data exists. TYPE: bool. DEFAULT: False. |
| verbose | Print progress messages. TYPE: bool. DEFAULT: True. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if successful, False otherwise. |
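
One plausible way to combine initialize_datasets with check_initialization is to download only when the data is missing. The sketch below is an assumption about usage, not part of glazing: `ensure_datasets` is a hypothetical helper, and it receives the two functions as arguments so that the example runs without the library installed. The stand-in functions exist only for the demonstration.

```python
from pathlib import Path
from typing import Callable, Optional


def ensure_datasets(
    check: Callable[[Optional[Path]], bool],
    initialize: Callable[..., bool],
    data_dir: Optional[Path] = None,
    force: bool = False,
) -> bool:
    """Hypothetical helper: initialize only when needed (or when forced)."""
    if not force and check(data_dir):
        # Data already present and no re-download requested.
        return True
    # Keyword names follow the initialize_datasets signature documented above.
    return initialize(data_dir=data_dir, force=force, verbose=False)


# Stand-ins for demonstration only; they record how they were called.
calls = []


def fake_check(data_dir):
    return False  # pretend nothing has been downloaded yet


def fake_initialize(data_dir=None, force=False, verbose=True):
    calls.append((data_dir, force, verbose))
    return True


ok = ensure_datasets(fake_check, fake_initialize, Path("my_data"))
```

With glazing installed, the documented functions from glazing.initialize could be passed in place of the stand-ins, for example `ensure_datasets(check_initialization, initialize_datasets)`.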
main(data_dir: str | Path | None, force: bool, quiet: bool) -> None
Set up all datasets. Downloads raw data and converts to JSON Lines format.
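
JSON Lines stores one JSON object per line, so a converted file can be read one record at a time. The sketch below illustrates the format only: the `read_jsonl` helper, the file name, and the record fields are hypothetical and do not reflect glazing's actual schema or loaders.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny made-up file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(
        '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n',
        encoding="utf-8",
    )
    records = list(read_jsonl(sample))
```

Because each line is parsed independently, large files can be streamed without loading the whole dataset into memory.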