:mod:`lookout.style.typos.research.baseline`
============================================

.. py:module:: lookout.style.typos.research.baseline


Module Contents
---------------

.. data:: MAX_DISTANCE
   :annotation: = 2

.. py:class:: Baseline(frequencies_file)

   Typo correction model based on the SymSpell lookup algorithm
   (https://github.com/wolfgarbe/SymSpell) and a simple Random Forest
   classifier that uses token frequencies and the edit distance between
   the typo and each candidate.

   Requires a file containing token frequencies in the format
   "token, frequency".

   Training data: dataframe indexed by "id" and containing the columns
   "identifier" and "typo".

   Testing data: dataframe indexed by "id" and containing the column "typo".

   .. method:: fit(self, train_file, cand_train_file=None)

   .. method:: dump(self, dump_file)

   .. method:: suggest(self, test_file, cand_test_file=None)

   .. method:: correct(self, test_file, cand_file=None)

   .. method:: _freq(self, token)

   .. method:: _lookup_corrections(self, typo_info)

   .. method:: _create_candidates(self, data, cand_file)

   .. method:: _create_labels(self)

   .. method:: _create_matrix(self, candidates)

.. function:: baseline(args)
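
The approach described for ``Baseline`` (SymSpell-style candidate lookup, then
ranking on token frequency and edit distance) can be illustrated with a small
standalone sketch. It is not the module's implementation: the helper names
``deletes``, ``edit_distance``, ``build_index`` and ``rank_candidates`` and the
toy frequency table are hypothetical, and a plain sort on (edit distance,
frequency) stands in for the Random Forest classifier so that the example needs
only the standard library.

.. code-block:: python

   from itertools import combinations

   MAX_DISTANCE = 2  # same value as the module-level constant documented above


   def deletes(token, max_distance=MAX_DISTANCE):
       """Return the token plus every string made by deleting up to max_distance characters."""
       variants = {token}
       for k in range(1, min(max_distance, len(token)) + 1):
           for kept in combinations(range(len(token)), len(token) - k):
               variants.add("".join(token[i] for i in kept))
       return variants


   def edit_distance(a, b):
       """Plain Levenshtein distance (insert, delete, substitute), row by row."""
       previous = list(range(len(b) + 1))
       for i, ca in enumerate(a, start=1):
           current = [i]
           for j, cb in enumerate(b, start=1):
               current.append(min(previous[j] + 1,                 # delete ca
                                  current[j - 1] + 1,              # insert cb
                                  previous[j - 1] + (ca != cb)))   # substitute
           previous = current
       return previous[-1]


   def build_index(frequencies):
       """Map every delete-variant of every known token back to that token."""
       index = {}
       for token in frequencies:
           for variant in deletes(token):
               index.setdefault(variant, set()).add(token)
       return index


   def rank_candidates(typo, frequencies, index):
       """Collect tokens sharing a delete-variant with the typo, then rank them.

       Assumption: sorting by (edit distance, descending frequency) replaces
       the Random Forest ranking used by the documented class.
       """
       candidates = set()
       for variant in deletes(typo):
           candidates |= index.get(variant, set())
       scored = [(edit_distance(typo, c), -frequencies[c], c) for c in candidates]
       return [c for d, _, c in sorted(scored) if d <= MAX_DISTANCE]


   # Hypothetical frequency table, as if parsed from "token, frequency" lines.
   frequencies = {"value": 120, "values": 40, "valve": 5, "index": 90}
   index = build_index(frequencies)
   suggestions = rank_candidates("vaue", frequencies, index)

Here ``suggestions`` lists the closest tokens first; tokens at the same edit
distance are ordered by frequency. Precomputing the delete variants trades
memory for lookup speed, which is the core idea behind symmetric-delete
spelling correction.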