lookout.style.typos.ranking¶
Ranking typo correction candidates using gradient boosted trees (GBT).
Module Contents¶
-
class lookout.style.typos.ranking.CandidatesRanker(config:Optional[Mapping[str, Any]]=None, **kwargs)¶
Bases: modelforge.Model
Rank typo correction candidates based on the given features. An XGBoost classifier is used.
-
_log¶
-
NAME= candidates_ranks¶
-
VENDOR= source{d}¶
-
DESCRIPTION= Model that ranks candidates according to their probability to fix the typo.¶
-
LICENSE¶
-
set_config(self, config:Optional[Mapping[str, Any]]=None)¶ Update ranking configuration.
Parameters: config – Ranking configuration. Options:
- train_rounds – Number of training rounds (int).
- early_stopping – Early stopping parameter (int).
- boost_param – Boosting parameters (dict).
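As an illustration of the options listed above, a configuration can be written as a plain mapping and passed as the `config` argument. The sketch below is not taken from the library: the values are hypothetical rather than the actual defaults, the `boost_param` keys are ordinary XGBoost booster parameters chosen for illustration, and `merge_config` is a hypothetical helper showing one way a partial override might be overlaid on defaults. Whether `set_config` merges options this way is an assumption.

```python
from typing import Any, Mapping, Optional

# Hypothetical values for illustration only; these are not the library's defaults.
example_config = {
    "train_rounds": 1000,     # number of training rounds (int)
    "early_stopping": 100,    # early stopping parameter (int)
    "boost_param": {          # boosting parameters (dict)
        "max_depth": 6,
        "eta": 0.03,
        "objective": "binary:logistic",
    },
}


def merge_config(defaults: Mapping[str, Any],
                 override: Optional[Mapping[str, Any]] = None) -> dict:
    """Hypothetical helper: overlay user options on defaults, one level deep."""
    # Copy nested dicts so that the defaults are never mutated.
    merged = {key: (dict(value) if isinstance(value, dict) else value)
              for key, value in defaults.items()}
    for key, value in (override or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)   # merge nested options such as boost_param
        else:
            merged[key] = value         # replace scalar options
    return merged
```

With this helper, overriding only `train_rounds` or a single key inside `boost_param` leaves the remaining options untouched.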
-
fit(self, identifiers:pandas.Series, candidates:pandas.DataFrame, features:numpy.ndarray, val_part:float=0.1)¶ Train booster on the given data.
Parameters: - identifiers – Series containing the right corrections, indexed in correspondence with the typos from which the candidates were generated.
- candidates – DataFrame containing information about candidates for correction. Columns are [Columns.Id, Columns.Token, Columns.Candidate].
- features – Matrix of features for candidates.
- val_part – Part of data used for validation.
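The inputs described above suggest how training data for a binary classifier could be prepared. The following is a minimal sketch in plain Python rather than pandas, under two assumptions that the documentation does not confirm: that a candidate is labeled 1 when it equals the right correction for its typo, and that the last `val_part` share of rows is held out for validation. `make_labels` and `split_train_val` are hypothetical names, not part of the library.

```python
from typing import Dict, List, Tuple

# Each candidate row is (typo_id, typo_token, candidate), mirroring the
# [Columns.Id, Columns.Token, Columns.Candidate] layout described above.
Row = Tuple[int, str, str]


def make_labels(right: Dict[int, str], rows: List[Row]) -> List[int]:
    """Label a candidate 1 if it equals the right correction of its typo (assumed)."""
    return [int(candidate == right[typo_id]) for typo_id, _, candidate in rows]


def split_train_val(n_rows: int, val_part: float = 0.1) -> Tuple[range, range]:
    """Hold out the last val_part share of rows for validation (assumed strategy)."""
    n_val = int(n_rows * val_part)
    return range(n_rows - n_val), range(n_rows - n_val, n_rows)


# Hypothetical data: two typos with two candidates each.
right_corrections = {0: "value", 1: "index"}
candidate_rows = [
    (0, "vlaue", "value"),
    (0, "vlaue", "valve"),
    (1, "indx", "index"),
    (1, "indx", "indy"),
]
labels = make_labels(right_corrections, candidate_rows)
```

The labels and the feature matrix rows would then be divided with the same index ranges, so that each candidate keeps its features and label together.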
-
rank(self, candidates:pandas.DataFrame, features:numpy.ndarray, n_candidates:int=3, return_all:bool=True)¶ Assign the correctness probability value for each of the candidates.
Parameters: - candidates – DataFrame containing information about candidates for correction.
- features – Matrix of features for candidates.
- n_candidates – Number of most probably correct candidates to return for each typo.
- return_all – If False, return corrections only for typos that are corrected by the first candidate.
Returns: Dictionary {id : [(candidate, correctness_proba), …]}; candidates are sorted by correctness probability in descending order.
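The shape of the returned dictionary can be illustrated with a small sketch that groups scored candidates by typo id, sorts them by probability, and keeps the top `n_candidates`. This is not the library's implementation: `rank_sketch` is a hypothetical function, the probabilities are supplied directly instead of coming from a trained booster, and the handling of `return_all=False` (skipping typos whose best candidate equals the original token) is one assumed reading of the parameter description.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str, str]  # (typo_id, typo_token, candidate)


def rank_sketch(rows: Sequence[Row], probas: Sequence[float],
                n_candidates: int = 3,
                return_all: bool = True) -> Dict[int, List[Tuple[str, float]]]:
    """Group candidates by typo id and sort them by correctness probability."""
    grouped = defaultdict(list)
    tokens = {}
    for (typo_id, token, candidate), proba in zip(rows, probas):
        grouped[typo_id].append((candidate, proba))
        tokens[typo_id] = token

    result = {}
    for typo_id, scored in grouped.items():
        # Highest probability first, truncated to the requested number.
        top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:n_candidates]
        # Assumed reading of return_all=False: keep a typo only if its best
        # candidate actually changes the token.
        changed = bool(top) and top[0][0] != tokens[typo_id]
        if return_all or changed:
            result[typo_id] = top
    return result


# Hypothetical candidates and probabilities.
rows = [
    (0, "vlaue", "valve"),
    (0, "vlaue", "value"),
    (0, "vlaue", "vlaue"),
    (1, "index", "indx"),
    (1, "index", "index"),
]
probas = [0.2, 0.7, 0.1, 0.3, 0.6]
ranked = rank_sketch(rows, probas, n_candidates=2)
```

Here `ranked` maps each typo id to a list of `(candidate, correctness_proba)` pairs, matching the documented return format.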
-
dump(self)¶ Describe the model for introspection.
-
__eq__(self, other:'CandidatesRanker')¶
-
static
_create_labels(identifiers:pandas.Series, candidates:pandas.DataFrame)¶
-
_generate_tree(self)¶
-
_load_tree(self, tree:dict)¶
-