lookout.style.typos.ranking

Ranking typo correction candidates using gradient boosted trees (GBT).

Module Contents

class lookout.style.typos.ranking.CandidatesRanker(config:Optional[Mapping[str, Any]]=None, **kwargs)

Bases: modelforge.Model

Rank typo correction candidates based on the given features using an XGBoost classifier.

_log
NAME = candidates_ranks
VENDOR = source{d}
DESCRIPTION = Model that ranks candidates according to their probability to fix the typo.
LICENSE
set_config(self, config:Optional[Mapping[str, Any]]=None)

Update ranking configuration.

Parameters:
  • config – Ranking configuration. Supported options:
      • train_rounds – Number of training rounds (int).
      • early_stopping – Early stopping parameter (int).
      • boost_param – Boosting parameters (dict).
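A configuration with the three documented options might look like the sketch below. The default values and the merge behaviour (user options overlaid on defaults, with boost_param merged one level deep) are assumptions made for illustration only; the actual defaults and update logic of set_config are not documented here. DEFAULT_CONFIG and merge_config are hypothetical names. The boost_param keys are standard XGBoost parameters.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical defaults, for illustration only; not the documented defaults of CandidatesRanker.
DEFAULT_CONFIG: Dict[str, Any] = {
    "train_rounds": 1000,
    "early_stopping": 100,
    "boost_param": {"objective": "binary:logistic", "max_depth": 6, "eta": 0.03},
}


def merge_config(config: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: overlay user options on the defaults without mutating them."""
    merged = dict(DEFAULT_CONFIG)
    # Copy the nested dict so that updating it leaves DEFAULT_CONFIG untouched.
    merged["boost_param"] = dict(DEFAULT_CONFIG["boost_param"])
    for key, value in (config or {}).items():
        if key == "boost_param":
            # Assumption: boosting parameters are merged key by key rather than replaced.
            merged["boost_param"].update(value)
        else:
            merged[key] = value
    return merged


# Usage: fewer training rounds and deeper trees, everything else from the defaults.
config = merge_config({"train_rounds": 500, "boost_param": {"max_depth": 8}})
```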
fit(self, identifiers:pandas.Series, candidates:pandas.DataFrame, features:numpy.ndarray, val_part:float=0.1)

Train the booster on the given data.

Parameters:
  • identifiers – Series containing the right corrections, indexed in correspondence with the typos from which the candidates were generated.
  • candidates – DataFrame containing information about candidates for correction. Columns are [Columns.Id, Columns.Token, Columns.Candidate].
  • features – Matrix of features for candidates.
  • val_part – Fraction of the data used for validation.
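Training a binary classifier requires a label per candidate row. The sketch below assumes a candidate is labelled positive when it equals the right correction of its typo, and then holds out a val_part fraction of rows at random for validation. Both steps are assumptions about what fit does internally; create_labels and split_train_val are hypothetical helpers, and the plain column names "id", "token" and "candidate" stand in for Columns.Id, Columns.Token and Columns.Candidate.

```python
import numpy as np
import pandas as pd


def create_labels(identifiers: pd.Series, candidates: pd.DataFrame) -> np.ndarray:
    """Hypothetical: label a candidate 1.0 if it equals its typo's right correction, else 0.0."""
    # identifiers is indexed by typo id, so look up the right correction for every candidate row.
    right = identifiers.loc[candidates["id"]].to_numpy()
    return (candidates["candidate"].to_numpy() == right).astype(np.float32)


def split_train_val(features: np.ndarray, labels: np.ndarray, val_part: float = 0.1, seed: int = 0):
    """Hypothetical: randomly hold out a val_part fraction of the rows for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    n_val = int(len(labels) * val_part)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return (features[train_idx], labels[train_idx]), (features[val_idx], labels[val_idx])


# Usage: two typos with two candidates each.
identifiers = pd.Series(["hello", "world"], index=[0, 1])
candidates = pd.DataFrame({
    "id": [0, 0, 1, 1],
    "token": ["helo", "helo", "wrld", "wrld"],
    "candidate": ["hello", "help", "word", "world"],
})
labels = create_labels(identifiers, candidates)  # -> [1., 0., 0., 1.]
features = np.arange(8, dtype=np.float32).reshape(4, 2)
(train_x, train_y), (val_x, val_y) = split_train_val(features, labels, val_part=0.25)
```

The sketch splits individual rows; splitting by typo id instead would keep all candidates of one typo on the same side, which is a reasonable alternative design.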
rank(self, candidates:pandas.DataFrame, features:numpy.ndarray, n_candidates:int=3, return_all:bool=True)

Assign a correctness probability to each candidate.

Parameters:
  • candidates – DataFrame containing information about candidates for correction.
  • features – Matrix of features for candidates.
  • n_candidates – Number of candidates with the highest correctness probability to return for each typo.
  • return_all – If False, return corrections only for the typos whose top-ranked candidate differs from the original token.
Returns:

Dictionary {id: [(candidate, correctness_proba), …]}, where the candidates of each typo are sorted by correctness probability in descending order.
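The shape of the returned dictionary can be illustrated by grouping per-candidate probabilities by typo id. This is a minimal sketch, assuming the probabilities have already been computed (for example by a booster trained with a binary:logistic objective, whose predictions are probabilities) and are passed in as an array; group_suggestions is a hypothetical helper, and "id", "token" and "candidate" stand in for the Columns constants.

```python
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd


def group_suggestions(candidates: pd.DataFrame, probas: np.ndarray,
                      n_candidates: int = 3, return_all: bool = True) -> Dict[int, List[Tuple[str, float]]]:
    """Hypothetical: build {id: [(candidate, proba), ...]} sorted by descending probability."""
    suggestions = {}
    frame = candidates.assign(proba=probas)
    for typo_id, group in frame.groupby("id"):
        # Keep the n_candidates most probable corrections of this typo.
        top = group.sort_values("proba", ascending=False).head(n_candidates)
        pairs = [(cand, float(p)) for cand, p in zip(top["candidate"], top["proba"])]
        # With return_all=False, skip typos whose best candidate is the original token itself.
        if not return_all and pairs[0][0] == group["token"].iloc[0]:
            continue
        suggestions[typo_id] = pairs
    return suggestions


# Usage: the second typo's best candidate is the token itself.
candidates = pd.DataFrame({
    "id": [0, 0, 0, 1, 1],
    "token": ["helo", "helo", "helo", "wrld", "wrld"],
    "candidate": ["hello", "help", "helo", "world", "wrld"],
})
probas = np.array([0.7, 0.2, 0.1, 0.4, 0.6])
all_suggestions = group_suggestions(candidates, probas, n_candidates=2)
corrected_only = group_suggestions(candidates, probas, n_candidates=2, return_all=False)
```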

dump(self)

Describe the model for introspection.

__eq__(self, other:'CandidatesRanker')
static _create_labels(identifiers:pandas.Series, candidates:pandas.DataFrame)
_generate_tree(self)
_load_tree(self, tree:dict)