lookout.style.typos.research.dev_utils

Module Contents

lookout.style.typos.research.dev_utils.extract_embeddings_from_fasttext(fasttext:FastText, tokens:Iterable[str])

Convert the embeddings from FastText to a dense matrix.

Parameters:
  • fasttext – trained embeddings.
  • tokens – tokens whose embeddings are extracted; they index axis Y of the returned matrix.
Returns:
  matrix with the extracted embeddings.
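
The conversion described above can be sketched as follows. This is a minimal illustration and not the module's implementation: `embeddings_to_matrix` and `toy_vector` are hypothetical names, the trained model is replaced by a plain lookup callable, and "axis Y" is assumed here to mean one row per token.

```python
from typing import Callable, Iterable

import numpy as np


def embeddings_to_matrix(get_vector: Callable[[str], np.ndarray],
                         tokens: Iterable[str]) -> np.ndarray:
    """Stack one embedding per token into a dense 2-D array (hypothetical helper)."""
    # Assumption: get_vector stands in for whatever lookup the trained
    # FastText model exposes; each call returns a 1-D vector.
    rows = [np.asarray(get_vector(token), dtype=np.float32) for token in tokens]
    return np.vstack(rows)


def toy_vector(token: str) -> np.ndarray:
    """Toy stand-in for a trained model: every component equals the token length."""
    return np.full(4, len(token), dtype=np.float32)


matrix = embeddings_to_matrix(toy_vector, ["foo", "quux"])
print(matrix.shape)
```

With the toy lookup, the result has one row per token and one column per embedding dimension.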

lookout.style.typos.research.dev_utils.rand_bool(true_prob)

Returns True with probability true_prob.
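
One common way to get this behavior is to compare a uniform random draw against the threshold. The sketch below assumes true_prob lies in [0, 1]; `rand_bool_sketch` is a hypothetical name and the module's own implementation may differ.

```python
import random


def rand_bool_sketch(true_prob: float) -> bool:
    """Return True with probability true_prob (assumed to lie in [0, 1])."""
    # random.random() is uniform on [0.0, 1.0), so the comparison below
    # holds with probability true_prob.
    return random.random() < true_prob


random.seed(0)  # fixed seed so the run is reproducible
draws = [rand_bool_sketch(0.3) for _ in range(10_000)]
share_true = sum(draws) / len(draws)
print(share_true)
```

Over many draws the share of True values should be close to true_prob.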

lookout.style.typos.research.dev_utils.detection_score(typos, suggestions)

Calculates the score of a solution to the typo detection problem.

typos: DataFrame indexed by “id” with columns “typo” and “corrupted”.

suggestions: {id : [(candidate, correct_prob)]}, where the candidates for each id are sorted by correct_prob in descending order.

lookout.style.typos.research.dev_utils.first_k_set(corrections, k)
lookout.style.typos.research.dev_utils.score_at_k(typos, suggestions, k)

Calculates the score of a solution to the typo correction problem. The suggestions for a typo are considered correct if the right correction is among the first k candidates.

typos: DataFrame indexed by “id” with columns “typo” and “corrupted”.

suggestions: {id : [(candidate, correct_prob)]}, where the candidates in each suggestion list are sorted by correct_prob in descending order.

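The "right one among the first k" rule can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the module's code: `first_k_candidates`, `hit_at_k`, and the `correct_tokens` mapping (id to the expected correction) are hypothetical, a dict stands in for the typos DataFrame, and the structure of the real score object is not shown here.

```python
from typing import Dict, List, Set, Tuple

# Same shape as the documented suggestions argument:
# {id: [(candidate, correct_prob)]}, sorted by correct_prob descending.
Suggestions = Dict[int, List[Tuple[str, float]]]


def first_k_candidates(candidates: List[Tuple[str, float]], k: int) -> Set[str]:
    """Return the set of the first k candidate strings (hypothetical helper)."""
    return {candidate for candidate, _prob in candidates[:k]}


def hit_at_k(correct_tokens: Dict[int, str],
             suggestions: Suggestions, k: int) -> Dict[int, bool]:
    """For each id, check whether the expected token is among the first k candidates."""
    return {
        id_: correct in first_k_candidates(suggestions.get(id_, []), k)
        for id_, correct in correct_tokens.items()
    }


# Assumption: correct_tokens replaces the typos DataFrame for illustration only.
correct_tokens = {0: "value", 1: "token"}
suggestions = {
    0: [("value", 0.9), ("valve", 0.1)],
    1: [("taken", 0.6), ("token", 0.4)],
}

hits_at_1 = hit_at_k(correct_tokens, suggestions, 1)
hits_at_2 = hit_at_k(correct_tokens, suggestions, 2)
print(hits_at_1, hits_at_2)
```

With k = 1 only the top candidate counts, which is the case that correction_score covers; raising k to 2 also accepts the second-ranked candidate.
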
lookout.style.typos.research.dev_utils.correction_score(typos, corrections)

Equal to score_at_k(typos, corrections, 1).

lookout.style.typos.research.dev_utils.accuracy(score)
lookout.style.typos.research.dev_utils.precision(score)
lookout.style.typos.research.dev_utils.recall(score)
lookout.style.typos.research.dev_utils.f1(score)
lookout.style.typos.research.dev_utils.print_score_metrics(score, file=None)
lookout.style.typos.research.dev_utils.print_suggestion_results(typos, suggestions, file=None)