lookout.style.typos.symspell

Module Contents

class lookout.style.typos.symspell.SymSpell(max_dictionary_edit_distance=2, prefix_length=7, count_threshold=1)

SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm.

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach of generating deletes, transposes, replaces, and inserts, and it is language independent. Unlike other algorithms, it requires only deletes: transposes, replaces, and inserts of the input term are transformed into deletes of the dictionary term. Replaces and inserts are expensive and language dependent; for example, Chinese has 70,000 Unicode Han characters!
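To make the "deletes only" idea concrete, here is a minimal standalone sketch. It is not the module's implementation: the helper names `deletes_within`, `build_index`, and `candidates` are hypothetical, and it omits the prefix optimization and the final edit-distance check that a real lookup would apply to each candidate.

```python
from itertools import combinations


def deletes_within(term, max_distance):
    """Return term plus every string made by deleting up to max_distance characters."""
    variants = {term}
    for k in range(1, min(max_distance, len(term)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for keep in combinations(range(len(term)), len(term) - k):
            variants.add("".join(term[i] for i in keep))
    return variants


def build_index(words, max_distance):
    """Map every delete variant to the dictionary words that produce it."""
    index = {}
    for word in words:
        for variant in deletes_within(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def candidates(term, index, max_distance):
    """Collect dictionary words that share a delete variant with the input term."""
    found = set()
    for variant in deletes_within(term, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["hello", "help", "world"], max_distance=1)
print(sorted(candidates("helo", index, max_distance=1)))   # ['hello', 'help']
print(sorted(candidates("wrold", index, max_distance=1)))  # ['world']
```

The missing letter in "helo", the replaced letter relative to "help", and the transposition in "wrold" are all found through deletes alone. Candidates found this way still need to be verified with a true edit-distance computation.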

SymSpell supports compound splitting / decompounding of multi-word input strings with three cases:
1. A mistakenly inserted space within a correct word led to two incorrect terms.
2. A mistakenly omitted space between two correct words led to one incorrect combined term.
3. Multiple independent input terms with/without spelling errors.
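The first two cases can be pictured with a small standalone sketch. The function names `split_combined` and `merge_split_word` are hypothetical, and the sketch only matches exact dictionary words. The actual method also tolerates spelling errors inside each part, which is not modelled here.

```python
def split_combined(term, known_words):
    """Case 2: return every (left, right) split where both halves are known words."""
    return [
        (term[:i], term[i:])
        for i in range(1, len(term))
        if term[:i] in known_words and term[i:] in known_words
    ]


def merge_split_word(left, right, known_words):
    """Case 1: rejoin two adjacent tokens if the result is a known word."""
    combined = left + right
    return combined if combined in known_words else None


known = {"spell", "check", "checker", "in"}
print(split_combined("spellcheck", known))    # [('spell', 'check')]
print(merge_split_word("che", "ck", known))   # check
```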

See https://github.com/wolfgarbe/SymSpell for details.

Args:
max_dictionary_edit_distance (int, optional): Maximum distance used to generate index. Also acts as an upper bound for max_edit_distance parameter in lookup() method. Defaults to 2.
prefix_length (int, optional): Prefix length. Should not be changed normally. Defaults to 7.
count_threshold (int, optional): Threshold corpus-count value for words to be considered correct. Defaults to 1; values below zero are also mapped to 1. Consider setting a higher value if your corpus contains mistakes.
_create_dictionary_entry(self, key, count)

Creates or updates a dictionary entry.

Args:
key (str): Word to insert or update.
count (int): Count to save or add to existing.
Returns:
bool: True if word was added to the dictionary, False if word was updated or ignored.
load_dictionary(self, corpus)

Loads dictionary from :param:`corpus` file.

File should contain space-separated word-count pairs, one per line.

Args:
corpus (str): Path to .csv corpus file.
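As an illustration of the expected file layout, the following standalone sketch writes a tiny corpus file and parses it. It does not call the module: `read_word_counts` is a hypothetical helper, the sample counts are made up for the example, and skipping malformed lines is a choice of this sketch, not documented behavior.

```python
import os
import tempfile


def read_word_counts(path):
    """Parse a file of space-separated "word count" lines into a dict."""
    counts = {}
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            parts = line.split()
            if len(parts) != 2:
                continue  # assumption of this sketch: ignore blank or malformed lines
            word, count = parts
            counts[word] = counts.get(word, 0) + int(count)
    return counts


# Write a three-line sample corpus in the documented format.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("the 1200\nof 800\nand 750\n")
    corpus_path = tmp.name

word_counts = read_word_counts(corpus_path)
os.remove(corpus_path)
print(word_counts)
```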
create_dictionary(self, corpus)

Creates dictionary from :param:`corpus` file.

Note:
Words are not preprocessed in any way. It is your duty to provide an appropriate corpus. Also keep in mind that the distance used to generate the index is specified at initialization. Consider purging below-threshold words afterwards.
Args:
corpus (str): Path to corpus file.
purge_below_threshold_words(self)

Purges words below threshold.

Consider using this method after creating a dictionary to reduce memory usage.
These words are not used in any way during lookup.
lookup(self, phrase, verbosity, max_edit_distance)

Attempts to correct the spelling of :param:`phrase`.

Note:
Phrase is not preprocessed in any way.
Args:
phrase (str): Word to correct. Should be a valid word.
verbosity (int): Output toggle, one of 0, 1 or 2. Set to 0 to output the closest most common correction, set to 1 to output the closest suggestion, set to 2 to output all suggestions.
max_edit_distance (int): Maximum edit distance to consider.
Returns:
list of SuggestionItem: Suggested corrections.
Raises:
AssertionError: If :param:`max_edit_distance` is larger than the maximum edit distance specified at initialization.
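The three verbosity levels of lookup() can be pictured with the standalone sketch below. It is an interpretation, not the module's code: `Suggestion` is a hypothetical stand-in for SuggestionItem, `select` is a hypothetical helper, and verbosity 1 is read here as "every suggestion at the smallest distance found", following how the upstream SymSpell project describes its closest mode.

```python
from collections import namedtuple

# Hypothetical stand-in for SuggestionItem, carrying the same three fields.
Suggestion = namedtuple("Suggestion", ["term", "distance", "count"])


def select(suggestions, verbosity):
    """Filter ranked suggestions according to the verbosity level (0, 1 or 2)."""
    # Smaller distance first; among equal distances, higher corpus count first.
    ranked = sorted(suggestions, key=lambda s: (s.distance, -s.count))
    if not ranked or verbosity == 2:
        return ranked
    closest = [s for s in ranked if s.distance == ranked[0].distance]
    return closest[:1] if verbosity == 0 else closest


items = [
    Suggestion("there", 1, 50),
    Suggestion("their", 1, 80),
    Suggestion("three", 2, 90),
]
print([s.term for s in select(items, 0)])  # ['their']
print([s.term for s in select(items, 1)])  # ['their', 'there']
print([s.term for s in select(items, 2)])  # ['their', 'there', 'three']
```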
lookup_compound(self, phrase, max_edit_distance)

Attempts to correct the spelling of :param:`phrase`.

Note:
Phrase is not preprocessed in any way.
Args:
phrase (str): Sentence to correct.
max_edit_distance (int): Maximum edit distance to consider for each word.
Returns:
list of SuggestionItem: Length-one list with suggested correction.
Raises:
AssertionError: If :param:`max_edit_distance` is larger than the maximum edit distance specified at initialization.
_delete_in_suggestion_prefix(self, delete, delete_len, suggestion, suggestion_len)

Helper method to check if :param:`delete` is prefix of :param:`suggestion`.

Args:
delete (str): String to look for in prefix.
delete_len (int): Length of :param:`delete`.
suggestion (str): String to take prefix from.
suggestion_len (int): Length of :param:`suggestion`.
Returns:
bool: True if :param:`delete` is prefix of :param:`suggestion`, False otherwise.
_edits(self, word, edit_distance, delete_words)

Recursive helper method to generate deletes.

Refer to article for details.

Args:
word (str): Word to generate deletes from.
edit_distance (int): Maximum edit distance to consider, recursion depth.
delete_words (set): Generated deletes, pass empty set first time.
Returns:
delete_words (set): Generated deletes.
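The recursion can be pictured roughly as follows. This is a standalone sketch and not the module's code: the function name `edits` is hypothetical, it only mirrors the documented argument order, and stopping before the empty string is a choice of this sketch.

```python
def edits(word, edit_distance, delete_words):
    """Add every string reachable by deleting up to edit_distance characters."""
    # Stop when the depth budget is spent or only one character remains.
    if edit_distance <= 0 or len(word) <= 1:
        return delete_words
    for i in range(len(word)):
        delete = word[:i] + word[i + 1:]
        if delete not in delete_words:
            delete_words.add(delete)
            # Each recursion level removes one more character.
            edits(delete, edit_distance - 1, delete_words)
    return delete_words


print(sorted(edits("abc", 1, set())))  # ['ab', 'ac', 'bc']
print(sorted(edits("abc", 2, set())))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

Passing an empty set on the first call, as the argument description asks, lets the recursion share one accumulator and skip deletes it has already expanded.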
_edits_prefix(self, key)
_hash(self, s)
_parse_words(self, text, filters='!"#$%&()*+, -./:;<=>?@[\]^_`{|}~\t\n', lower=True, split=' ')
class lookout.style.typos.symspell.EditDistance(base_string, distance_algorithm)
compare(self, string_2, max_distance)
damerau_levenshtein_distance(self, string_2, max_distance)
class lookout.style.typos.symspell.SuggestionItem(term, distance, count)
count
distance
term
__eq__(self, other)
__lt__(self, other)
__str__(self)