lookout.style.typos.research.baseline

Module Contents

lookout.style.typos.research.baseline.MAX_DISTANCE = 2

class lookout.style.typos.research.baseline.Baseline(frequencies_file)

    Typo correction model based on the SymSpell lookup algorithm (https://github.com/wolfgarbe/SymSpell) and a simple Random Forest classifier that uses token frequencies and the edit distance between the typo and each candidate.

    Requires a file containing token frequencies in the format “token, frequency”.

    Training data: a dataframe indexed by “id” with the columns “identifier” and “typo”. Testing data: a dataframe indexed by “id” with the column “typo”.
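
    The input formats described above can be sketched as follows. This is only an illustration under stated assumptions: the sample tokens, the comma separator in the frequencies file, and the helper `read_frequencies` are hypothetical and not part of this module; only the index name “id” and the column names “identifier” and “typo” come from the description.

```python
import io

import pandas as pd

# Hypothetical frequencies file: one "token, frequency" pair per line.
FREQUENCIES_TEXT = "get, 1500\nvalue, 900\nindex, 400\n"

# Hypothetical training data: indexed by "id", with the correct token in
# "identifier" and its misspelled form in "typo".
TRAIN_TEXT = "id,identifier,typo\n0,value,valeu\n1,index,indx\n"


def read_frequencies(handle):
    """Parse "token, frequency" lines into a dict (assumed file layout)."""
    frequencies = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        token, count = line.rsplit(",", 1)
        frequencies[token.strip()] = int(count)
    return frequencies


frequencies = read_frequencies(io.StringIO(FREQUENCIES_TEXT))
train = pd.read_csv(io.StringIO(TRAIN_TEXT), index_col="id")

# Testing data keeps the same "id" index but only needs the "typo" column.
test = train[["typo"]]
```

    In real use the two inputs would presumably be files on disk whose paths are passed to `Baseline(frequencies_file)` and `fit(train_file)`; the in-memory strings here only keep the sketch self-contained.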

    fit(self, train_file, cand_train_file=None)

    dump(self, dump_file)

    suggest(self, test_file, cand_test_file=None)

    correct(self, test_file, cand_file=None)

    _freq(self, token)

    _lookup_corrections(self, typo_info)

    _create_candidates(self, data, cand_file)

    _create_labels(self)

    _create_matrix(self, candidates)
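
    The class description says the classifier works from token frequencies and the edit distance between the typo and each candidate. As a rough sketch of what such per-candidate features could look like (this is not the actual `_create_matrix` or `_lookup_corrections` implementation): the functions `edit_distance` and `candidate_features` are hypothetical, the candidate search is a brute-force scan rather than SymSpell's delete-based index, and plain Levenshtein distance is assumed.

```python
MAX_DISTANCE = 2  # same value as the module constant MAX_DISTANCE


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def candidate_features(typo, frequencies, max_distance=MAX_DISTANCE):
    """Return (candidate, [candidate_freq, typo_freq, distance]) rows.

    Brute-force scan over the vocabulary; keeps only candidates within
    max_distance edits of the typo.
    """
    rows = []
    for token, freq in frequencies.items():
        dist = edit_distance(typo, token)
        if dist <= max_distance:
            rows.append((token, [freq, frequencies.get(typo, 0), dist]))
    return rows


# Hypothetical vocabulary with frequencies.
frequencies = {"value": 900, "valve": 40, "index": 400}
rows = candidate_features("valeu", frequencies)
```

    Each feature row could then be fed to a classifier that scores how likely the candidate is to be the intended token; the frequent token “value” and the rare token “valve” are both two edits away from “valeu”, so frequency is what separates them here.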

lookout.style.typos.research.baseline.baseline(args)