`lookout.style.typos.ranking`

Ranking of typo correction candidates using gradient boosted trees (GBT).

Module Contents

class lookout.style.typos.ranking.CandidatesRanker(config: Optional[Mapping[str, Any]] = None, **kwargs)

Bases: modelforge.Model

Rank typo correction candidates based on the given features. An XGBoost classifier is used.

DESCRIPTION = Model that ranks candidates according to their probability to fix the typo.

set_config(self, config: Optional[Mapping[str, Any]] = None)

Update ranking configuration.

Parameters:	config – Ranking configuration. Options:
	train_rounds: Number of training rounds (int).
	early_stopping: Early stopping parameter (int).
	boost_param: Boosting parameters (dict).
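The three option names above can be illustrated with a small, library-independent sketch. The default values, the `DEFAULT_CONFIG` name, and the `merge_config` helper are hypothetical and chosen only for illustration; the actual defaults and merging behavior of `set_config` may differ. The keys inside `boost_param` are ordinary XGBoost booster parameters.

```python
from typing import Any, Mapping, Optional

# Hypothetical defaults, for illustration only; the library's real defaults may differ.
DEFAULT_CONFIG = {
    "train_rounds": 100,
    "early_stopping": 10,
    "boost_param": {"max_depth": 6, "eta": 0.1, "objective": "binary:logistic"},
}


def merge_config(config: Optional[Mapping[str, Any]] = None) -> dict:
    """Overlay user-supplied options on the defaults, merging boost_param key by key."""
    # Copy nested dicts so that the defaults are never mutated.
    merged = {key: (dict(value) if isinstance(value, dict) else value)
              for key, value in DEFAULT_CONFIG.items()}
    for key, value in (config or {}).items():
        if key not in merged:
            raise KeyError("Unknown ranking option: %s" % key)
        if isinstance(merged[key], dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


config = merge_config({"train_rounds": 500, "boost_param": {"max_depth": 8}})
print(config)
```

Merging `boost_param` key by key, rather than replacing the whole dictionary, is a design choice of this sketch: it lets a caller override a single booster parameter without restating the rest.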

fit(self, identifiers: pandas.Series, candidates: pandas.DataFrame, features: numpy.ndarray, val_part: float = 0.1)

Train the booster on the given data.

Parameters:	identifiers – Series containing the right corrections, indexed in correspondence with the typos from which candidates were generated.
	candidates – DataFrame containing information about candidates for correction. Columns are [Columns.Id, Columns.Token, Columns.Candidate].
	features – Matrix of features for candidates.
	val_part – Part of data used for validation.
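To make the relationship between `identifiers` and `candidates` concrete, here is a minimal sketch of how binary training labels could be derived from them. It is an assumption about the intent, not the library's implementation: plain dictionaries stand in for the pandas objects, the keys `"id"`, `"token"` and `"candidate"` are hypothetical stand-ins for `Columns.Id`, `Columns.Token` and `Columns.Candidate`, and `create_labels` is an illustrative name.

```python
from typing import Dict, List


def create_labels(corrections: Dict[int, str], candidates: List[dict]) -> List[int]:
    """Label a candidate row 1 if it equals the right correction for its typo, else 0."""
    return [int(row["candidate"] == corrections[row["id"]]) for row in candidates]


# Right corrections keyed by typo id (stand-in for the `identifiers` Series).
corrections = {0: "value", 1: "index"}

# One row per (typo, candidate) pair (stand-in for the `candidates` DataFrame).
candidates = [
    {"id": 0, "token": "valeu", "candidate": "value"},
    {"id": 0, "token": "valeu", "candidate": "valet"},
    {"id": 1, "token": "indx", "candidate": "index"},
    {"id": 1, "token": "indx", "candidate": "inbox"},
]

labels = create_labels(corrections, candidates)
print(labels)
```

Each label lines up with one row of the feature matrix, which is why the candidates and the features have to be given in the same order.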

rank(self, candidates: pandas.DataFrame, features: numpy.ndarray, n_candidates: int = 3, return_all: bool = True)

Assign a correctness probability to each of the candidates.

Parameters:	candidates – DataFrame containing information about candidates for correction.
	features – Matrix of features for candidates.
	n_candidates – Number of most probable candidates to return for each typo.
	return_all – Set to False to return corrections only for typos corrected in the first candidate.
Returns:	Dictionary {id: [(candidate, correctness_proba), …]}; the candidates for each id are sorted by correctness probability in descending order.
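The shape of the returned dictionary can be sketched without the library. The example below assumes that a probability has already been predicted for every candidate row; the `group_and_sort` helper and the sample values are hypothetical, and the `return_all` filtering is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_and_sort(ids: Sequence[int], candidates: Sequence[str],
                   probas: Sequence[float],
                   n_candidates: int = 3) -> Dict[int, List[Tuple[str, float]]]:
    """Group (candidate, probability) pairs by typo id and keep the top n_candidates."""
    grouped = defaultdict(list)
    for typo_id, candidate, proba in zip(ids, candidates, probas):
        grouped[typo_id].append((candidate, proba))
    # Sort each group by probability in descending order, then truncate it.
    return {typo_id: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n_candidates]
            for typo_id, pairs in grouped.items()}


result = group_and_sort(
    ids=[0, 0, 0, 1, 1],
    candidates=["value", "valet", "valve", "index", "inbox"],
    probas=[0.91, 0.05, 0.30, 0.40, 0.55],
    n_candidates=2,
)
print(result)
```

With `n_candidates=2`, the lowest-scoring candidate for typo 0 is dropped and the two candidates for typo 1 are reordered so that the more probable one comes first.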

static _create_labels(identifiers: pandas.Series, candidates: pandas.DataFrame)