glazing.wordnet.morphy

Morphological processing utilities.

morphy

WordNet morphological processing (morphy).

This module implements WordNet's morphological processing algorithm for finding base forms (lemmas) of inflected words. It handles both regular inflections through suffix substitution rules and irregular forms through exception lists. It also supports collocations and multi-word expressions.

CLASS DESCRIPTION
Morphy

Morphological processor for finding word base forms.

FUNCTION DESCRIPTION
morphy

Find base forms of a word given its POS.

Examples:

>>> from glazing.wordnet import Morphy
>>> morphy = Morphy(loader)
>>> lemmas = morphy.morphy("running", "v")
>>> print(lemmas)  # ['run']

Classes

Morphy(loader: WordNetLoader)

Morphological processor for finding word base forms.

This class implements WordNet's morphological processing algorithm, which attempts to find the base form (lemma) of inflected words through a combination of exception list lookup and suffix substitution rules. It also handles collocations and multi-word expressions.

PARAMETER DESCRIPTION
loader

WordNet loader with exception lists and lemma index.

TYPE: WordNetLoader

ATTRIBUTE DESCRIPTION
loader

The WordNet loader instance.

TYPE: WordNetLoader

suffix_rules

Suffix substitution rules by POS.

TYPE: dict[WordNetPOS, list[tuple[str, str]]]

METHOD DESCRIPTION
morphy

Find base forms of a word for given POS.

apply_rules

Apply morphological rules to generate candidates.

check_exceptions

Check exception lists for irregular forms.

Notes

The algorithm follows WordNet's morphy implementation:

1. Check if the word itself exists in WordNet
2. Check exception lists for irregular forms
3. Apply suffix substitution rules
4. For each candidate, verify it exists in WordNet

Special cases handled:

- Multi-word expressions (collocations)
- Nouns ending with "ful"
- Verb-preposition collocations
- Abbreviations with periods
- Hyphenated words
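
The four steps in the notes above can be sketched with in-memory data. This is a minimal illustration, not glazing's implementation: `LEMMAS`, `EXCEPTIONS`, `RULES`, and `toy_morphy` are hypothetical names, the rule lists are only a small subset of the suffix rules documented for WordNet's morphy, and none of the special cases (collocations, "ful" nouns, doubled consonants) are handled.

```python
# Toy stand-ins for a lemma index and exception lists (hypothetical data).
LEMMAS = {"n": {"child", "dog", "box"}, "v": {"run", "go"}}
EXCEPTIONS = {"n": {"children": ["child"]}, "v": {"ran": ["run"], "went": ["go"]}}

# A small subset of WordNet-style (suffix, replacement) rules.
RULES = {
    "n": [("s", ""), ("ses", "s"), ("xes", "x"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}


def toy_morphy(word: str, pos: str) -> list[str]:
    """Return base forms of `word` that exist in the toy lemma index."""
    word = word.lower()
    found: list[str] = []

    def keep(candidate: str) -> None:
        # Step 4: only candidates present in the lemma index survive.
        if candidate in LEMMAS.get(pos, set()) and candidate not in found:
            found.append(candidate)

    # Step 1: the word itself may already be a lemma.
    keep(word)

    # Step 2: irregular forms come from the exception list.
    for base in EXCEPTIONS.get(pos, {}).get(word, []):
        keep(base)

    # Step 3: regular forms come from suffix substitution.
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            keep(word[: -len(suffix)] + replacement)

    return found


print(toy_morphy("children", "n"))  # ['child']
print(toy_morphy("boxes", "n"))     # ['box']
print(toy_morphy("goes", "v"))      # ['go']
```

Because every candidate is checked against the lemma index, over-generated strings such as "boxe" never reach the caller.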

Examples:

>>> morphy = Morphy(loader)
>>> morphy.morphy("children", "n")
['child']
>>> morphy.morphy("ran", "v")
['run']
>>> morphy.morphy("better", "a")
['good', 'well']
>>> morphy.morphy("attorneys general", "n")
['attorney general']

Initialize morphy with a WordNet loader.

PARAMETER DESCRIPTION
loader

WordNet loader with exception lists and lemma index.

TYPE: WordNetLoader

Source code in src/glazing/wordnet/morphy.py
def __init__(self, loader: WordNetLoader) -> None:
    """Initialize morphy with a WordNet loader.

    Parameters
    ----------
    loader : WordNetLoader
        WordNet loader with exception lists and lemma index.
    """
    self.loader = loader

    # Build suffix rules dictionary
    self.suffix_rules: dict[WordNetPOS, list[tuple[str, str]]] = {
        "n": self.NOUN_RULES,
        "v": self.VERB_RULES,
        "a": self.ADJ_RULES,
        "s": self.ADJ_RULES,  # Satellite adjectives use same rules
        "r": self.ADV_RULES,
    }
Functions
apply_rules(word: str, pos: WordNetPOS) -> list[str]

Apply morphological rules to generate candidates.

PARAMETER DESCRIPTION
word

The word to process (lowercase).

TYPE: str

pos

The part of speech.

TYPE: WordNetPOS

RETURNS DESCRIPTION
list[str]

List of candidate base forms.

Examples:

>>> morphy.apply_rules("running", "v")
['runn', 'run', 'runne', 'running']
Source code in src/glazing/wordnet/morphy.py
def apply_rules(self, word: str, pos: WordNetPOS) -> list[str]:
    """Apply morphological rules to generate candidates.

    Parameters
    ----------
    word : str
        The word to process (lowercase).
    pos : WordNetPOS
        The part of speech.

    Returns
    -------
    list[str]
        List of candidate base forms.

    Examples
    --------
    >>> morphy.apply_rules("running", "v")
    ['runn', 'run', 'runne', 'running']
    """
    candidates = []
    rules = self.suffix_rules.get(pos, [])

    for suffix, replacement in rules:
        if word.endswith(suffix):
            # Generate candidate by replacing suffix
            candidate = word[: -len(suffix)] + replacement if suffix else word + replacement

            if candidate and candidate not in candidates:
                candidates.append(candidate)

            # Handle doubled consonants (e.g., running -> run)
            # Check if the word has a doubled consonant before the suffix
            if suffix and len(word) > len(suffix) + 2:
                stem = word[: -len(suffix)]
                if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in "bdfglmnprst":
                    # Remove the doubled consonant
                    undoubled = stem[:-1] + replacement
                    if undoubled not in candidates:
                        candidates.append(undoubled)

    return candidates
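
To experiment with the suffix and doubled-consonant logic without a loader, the same idea can be written as a free function that takes the rules explicitly. This is a sketch under assumptions: `apply_suffix_rules` and `SAMPLE_VERB_RULES` are hypothetical names, and the two sample rules are not the full contents of `VERB_RULES`, which this page does not show.

```python
DOUBLING_CONSONANTS = "bdfglmnprst"

# Two illustrative (suffix, replacement) pairs; the real rule lists are longer.
SAMPLE_VERB_RULES = [("ing", "e"), ("ing", "")]


def apply_suffix_rules(word: str, rules: list[tuple[str, str]]) -> list[str]:
    """Generate candidate base forms, in rule order, without duplicates."""
    candidates: list[str] = []
    for suffix, replacement in rules:
        if not suffix or not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        variants = [stem]
        # "running" -> stem "runn" -> also try "run" (undoubled consonant).
        if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] in DOUBLING_CONSONANTS:
            variants.append(stem[:-1])
        for variant in variants:
            candidate = variant + replacement
            if candidate and candidate not in candidates:
                candidates.append(candidate)
    return candidates


print(apply_suffix_rules("running", SAMPLE_VERB_RULES))
# ['runne', 'rune', 'runn', 'run']
```

The output deliberately over-generates ("runne", "rune", "runn"); the candidates are only hypotheses until they are checked against the lemma index.
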
check_exceptions(word: str, pos: WordNetPOS) -> list[str]

Check exception lists for irregular forms.

PARAMETER DESCRIPTION
word

The word to check (lowercase).

TYPE: str

pos

The part of speech.

TYPE: WordNetPOS

RETURNS DESCRIPTION
list[str]

List of base forms from exception list.

Examples:

>>> morphy.check_exceptions("children", "n")
['child']
>>> morphy.check_exceptions("went", "v")
['go']
Source code in src/glazing/wordnet/morphy.py
def check_exceptions(self, word: str, pos: WordNetPOS) -> list[str]:
    """Check exception lists for irregular forms.

    Parameters
    ----------
    word : str
        The word to check (lowercase).
    pos : WordNetPOS
        The part of speech.

    Returns
    -------
    list[str]
        List of base forms from exception list.

    Examples
    --------
    >>> morphy.check_exceptions("children", "n")
    ['child']

    >>> morphy.check_exceptions("went", "v")
    ['go']
    """
    exceptions = self.loader.get_exceptions(pos)
    return exceptions.get(word, [])
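
WordNet distributes its exception lists as plain-text files (for example `noun.exc`), where each line holds an inflected form followed by one or more base forms separated by spaces. The sketch below shows how such lines could be turned into the kind of mapping that `check_exceptions` looks up. How `WordNetLoader.get_exceptions` actually builds its data is not shown on this page, so `parse_exception_lines` is a hypothetical helper and the sample lines are illustrative.

```python
def parse_exception_lines(lines: list[str]) -> dict[str, list[str]]:
    """Map each inflected form to its base forms, preserving file order."""
    exceptions: dict[str, list[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        inflected, *bases = parts
        entry = exceptions.setdefault(inflected, [])
        for base in bases:
            if base not in entry:
                entry.append(base)
    return exceptions


sample = ["children child", "geese goose", "better good well", ""]
table = parse_exception_lines(sample)
print(table.get("better", []))  # ['good', 'well']
print(table.get("dogs", []))    # []
```

Returning an empty list for unknown words mirrors the `exceptions.get(word, [])` lookup above, so callers can iterate over the result without a separate membership check.
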
get_base_forms(word: str, pos: WordNetPOS | None = None) -> list[str]

Get all possible base forms of a word.

This method returns all candidates without checking if they exist in WordNet. Useful for debugging or when you want all morphological variants.

PARAMETER DESCRIPTION
word

The word to process.

TYPE: str

pos

Part of speech. If None, tries all POS.

TYPE: WordNetPOS | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List of all candidate base forms.

Examples:

>>> forms = morphy.get_base_forms("running", "v")
>>> print(forms)
['running', 'run', 'runn', 'runne']
Source code in src/glazing/wordnet/morphy.py
def get_base_forms(self, word: str, pos: WordNetPOS | None = None) -> list[str]:
    """Get all possible base forms of a word.

    This method returns all candidates without checking if they
    exist in WordNet. Useful for debugging or when you want all
    morphological variants.

    Parameters
    ----------
    word : str
        The word to process.
    pos : WordNetPOS | None, default=None
        Part of speech. If None, tries all POS.

    Returns
    -------
    list[str]
        List of all candidate base forms.

    Examples
    --------
    >>> forms = morphy.get_base_forms("running", "v")
    >>> print(forms)
    ['running', 'run', 'runn', 'runne']
    """
    word = word.lower()

    # Determine POS tags
    pos_tags: list[WordNetPOS] = [pos] if pos is not None else ["n", "v", "a", "r"]

    all_forms = []
    seen = set()

    for pos_tag in pos_tags:
        # Add word itself
        if word not in seen:
            all_forms.append(word)
            seen.add(word)

        # Add exceptions
        for exc in self.check_exceptions(word, pos_tag):
            if exc not in seen:
                all_forms.append(exc)
                seen.add(exc)

        # Add rule-based candidates
        for candidate in self.apply_rules(word, pos_tag):
            if candidate not in seen:
                all_forms.append(candidate)
                seen.add(candidate)

    return all_forms
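
The `seen` set above keeps the first occurrence of each form while preserving order across POS tags. As a small standalone sketch of the same order-preserving merge (the helper name `merge_unique` is hypothetical and not part of glazing), `dict.fromkeys` gives an equivalent result:

```python
from collections.abc import Iterable


def merge_unique(*groups: Iterable[str]) -> list[str]:
    """Concatenate groups, keeping only the first occurrence of each item."""
    return list(dict.fromkeys(item for group in groups for item in group))


print(merge_unique(["running"], [], ["runn", "run"], ["running", "run"]))
# ['running', 'runn', 'run']
```
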
morphy(word: str, pos: WordNetPOS | None = None) -> list[str]

Find base forms of a word for given POS.

This is the main entry point for morphological processing. It returns all possible base forms found through exception lists and suffix rules. Handles collocations and special cases.

PARAMETER DESCRIPTION
word

The inflected word or collocation to process.

TYPE: str

pos

Part of speech. If None, tries all POS.

TYPE: WordNetPOS | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List of base forms found. Empty if none found.

Examples:

>>> lemmas = morphy.morphy("running", "v")
>>> print(lemmas)
['run']
>>> lemmas = morphy.morphy("geese", "n")
>>> print(lemmas)
['goose']
>>> lemmas = morphy.morphy("attorneys general", "n")
>>> print(lemmas)
['attorney general']
Source code in src/glazing/wordnet/morphy.py
def morphy(self, word: str, pos: WordNetPOS | None = None) -> list[str]:
    """Find base forms of a word for given POS.

    This is the main entry point for morphological processing.
    It returns all possible base forms found through exception
    lists and suffix rules. Handles collocations and special cases.

    Parameters
    ----------
    word : str
        The inflected word or collocation to process.
    pos : WordNetPOS | None, default=None
        Part of speech. If None, tries all POS.

    Returns
    -------
    list[str]
        List of base forms found. Empty if none found.

    Examples
    --------
    >>> lemmas = morphy.morphy("running", "v")
    >>> print(lemmas)
    ['run']

    >>> lemmas = morphy.morphy("geese", "n")
    >>> print(lemmas)
    ['goose']

    >>> lemmas = morphy.morphy("attorneys general", "n")
    >>> print(lemmas)
    ['attorney general']
    """
    # Normalize word to lowercase
    word = word.lower()

    # Strip trailing periods from potential abbreviations (interior periods are kept)
    word_no_period = word.rstrip(".")

    # Determine POS tags to try
    pos_tags: list[WordNetPOS]
    pos_tags = [pos] if pos is not None else ["n", "v", "a", "r"]

    base_forms: list[str] = []
    seen = set()

    # Check for collocations (multi-word expressions)
    if " " in word or "-" in word:
        # Process as collocation
        collocation_forms = self._morphy_collocation(word, pos_tags)
        for form in collocation_forms:
            if form not in seen:
                base_forms.append(form)
                seen.add(form)

    # Process as a single word; also try the form without trailing periods when it differs
    test_words = [word] if word_no_period == word else [word, word_no_period]
    for test_word in test_words:

        for pos_tag in pos_tags:
            # Get base forms for this POS
            forms = self._morphy_pos(test_word, pos_tag)

            # Add unique forms
            for form in forms:
                if form not in seen:
                    base_forms.append(form)
                    seen.add(form)

    return base_forms
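
The abbreviation handling relies on `str.rstrip(".")`, which removes only trailing periods. A minimal sketch of which surface forms end up being looked up, assuming the same lowercase-then-strip order as the method above (`surface_forms` is a hypothetical helper, not part of glazing):

```python
def surface_forms(word: str) -> list[str]:
    """Return the lowercase word, plus a period-stripped variant if it differs."""
    lowered = word.lower()
    stripped = lowered.rstrip(".")
    forms = [lowered]
    if stripped and stripped != lowered:
        forms.append(stripped)
    return forms


print(surface_forms("Etc."))    # ['etc.', 'etc']
print(surface_forms("U.S.A."))  # ['u.s.a.', 'u.s.a']
print(surface_forms("run"))     # ['run']
```

Interior periods survive, so "u.s.a." is tried as "u.s.a" rather than "usa".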

Functions

morphy(word: str, pos: WordNetPOS | None = None, loader: WordNetLoader | None = None) -> list[str]

Find base forms of a word.

Convenience function for morphological processing.

PARAMETER DESCRIPTION
word

The word to process.

TYPE: str

pos

Part of speech. If None, tries all POS.

TYPE: WordNetPOS | None DEFAULT: None

loader

WordNet loader. No default instance is created in the implementation shown below: passing None raises ValueError.

TYPE: WordNetLoader | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List of base forms.

RAISES DESCRIPTION
ValueError

If loader is None.

Examples:

>>> lemmas = morphy("running", "v", loader)
>>> print(lemmas)
['run']
Source code in src/glazing/wordnet/morphy.py
def morphy(
    word: str, pos: WordNetPOS | None = None, loader: WordNetLoader | None = None
) -> list[str]:
    """Find base forms of a word.

    Convenience function for morphological processing.

    Parameters
    ----------
    word : str
        The word to process.
    pos : WordNetPOS | None, default=None
        Part of speech. If None, tries all POS.
    loader : WordNetLoader | None, default=None
        WordNet loader. Passing None raises ValueError.

    Returns
    -------
    list[str]
        List of base forms.

    Raises
    ------
    ValueError
        If loader is None.

    Examples
    --------
    >>> lemmas = morphy("running", "v", loader)
    >>> print(lemmas)
    ['run']
    """
    if loader is None:
        raise ValueError("WordNet loader required for morphy")

    processor = Morphy(loader)
    return processor.morphy(word, pos)