lookout.style.typos.ranking
Ranking of typo-correction candidates using gradient-boosted trees (GBT).
Module Contents
-
class lookout.style.typos.ranking.CandidatesRanker(config:Optional[Mapping[str, Any]]=None, **kwargs)
Bases: modelforge.Model
Rank typo-correction candidates based on the given features. An XGBoost classifier is used.
-
_log
-
NAME = candidates_ranks
-
VENDOR = source{d}
-
DESCRIPTION = Model that ranks candidates according to their probability to fix the typo.
-
LICENSE
-
set_config(self, config:Optional[Mapping[str, Any]]=None)
Update ranking configuration.
Parameters: config – Ranking configuration. Options:
- train_rounds: Number of training rounds (int).
- early_stopping: Early stopping parameter (int).
- boost_param: Boosting parameters (dict).
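The three documented options can be collected in a plain mapping. The sketch below is only an illustration: the values are made up and are not the library defaults, the keys inside `boost_param` are ordinary XGBoost parameter names chosen as an example, and `check_ranking_config` is a hypothetical helper that is not part of `lookout`.

```python
from typing import Any, List, Mapping

# Types of the three options as listed in the documentation above.
EXPECTED_TYPES = {"train_rounds": int, "early_stopping": int, "boost_param": dict}


def check_ranking_config(config: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems found in `config`."""
    problems = []
    for key, value in config.items():
        if key not in EXPECTED_TYPES:
            problems.append("unknown option: %s" % key)
        elif not isinstance(value, EXPECTED_TYPES[key]):
            problems.append("%s must be %s" % (key, EXPECTED_TYPES[key].__name__))
    return problems


# Illustrative values only, not the library defaults.
config = {
    "train_rounds": 1000,
    "early_stopping": 100,
    "boost_param": {"max_depth": 6, "eta": 0.1, "objective": "binary:logistic"},
}
problems = check_ranking_config(config)
```

A mapping like this would be passed either to the constructor as `config` or to `set_config`; whether unspecified options fall back to defaults is not stated on this page.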
-
fit(self, identifiers:pandas.Series, candidates:pandas.DataFrame, features:numpy.ndarray, val_part:float=0.1)
Train the booster on the given data.
Parameters:
- identifiers – Series containing the right corrections, indexed in correspondence with the typos from which the candidates were generated.
- candidates – DataFrame containing information about candidates for correction. Columns are [Columns.Id, Columns.Token, Columns.Candidate].
- features – Matrix of features for the candidates.
- val_part – Fraction of the data used for validation.
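To make the relationship between the `fit` inputs concrete, here is a rough sketch of how binary training labels and a validation hold-out could be derived from them. It uses plain dicts and tuples in place of the pandas objects, and both helpers (`make_labels`, `split_ids`) are hypothetical. In particular, holding out whole typo ids is an assumption; the page does not say how `val_part` is applied.

```python
import random

# Right correction for each typo id (stand-in for the `identifiers` Series).
right = {0: "value", 1: "index"}

# One row per candidate: (typo id, typo token, candidate).
candidates = [
    (0, "vlaue", "value"),
    (0, "vlaue", "valve"),
    (1, "indxe", "index"),
    (1, "indxe", "indie"),
]


def make_labels(right, candidates):
    """Label a candidate 1 when it equals the right correction of its typo."""
    return [int(cand == right[typo_id]) for typo_id, _, cand in candidates]


def split_ids(ids, val_part=0.1, seed=0):
    """Hold out roughly `val_part` of the typo ids for validation."""
    ids = sorted(set(ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_part))
    return ids[n_val:], ids[:n_val]


labels = make_labels(right, candidates)
train_ids, val_ids = split_ids(right.keys(), val_part=0.5)
```

Splitting by typo id keeps all candidates of one typo on the same side of the split, which avoids leaking a typo between training and validation.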
-
rank(self, candidates:pandas.DataFrame, features:numpy.ndarray, n_candidates:int=3, return_all:bool=True)
Assign a correctness probability to each candidate.
Parameters:
- candidates – DataFrame containing information about candidates for correction.
- features – Matrix of features for the candidates.
- n_candidates – Number of most probable candidates to return for each typo.
- return_all – False to return corrections only for typos corrected in the first candidate.
Returns: Dictionary {id : [(candidate, correctness_proba), …]}; candidates are sorted by correctness probability in descending order.
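The shape of the returned dictionary can be illustrated with a small standalone sketch. It assumes the per-candidate probabilities are already known and only shows the grouping, descending sort and `n_candidates` cut-off. `top_candidates` is a hypothetical helper, the probabilities are made up, and the `return_all` filtering is not modelled.

```python
from collections import defaultdict


def top_candidates(rows, n_candidates=3):
    """Group (typo_id, candidate, proba) rows by id and keep the best n."""
    grouped = defaultdict(list)
    for typo_id, candidate, proba in rows:
        grouped[typo_id].append((candidate, proba))
    return {
        typo_id: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n_candidates]
        for typo_id, pairs in grouped.items()
    }


# Made-up probabilities for two typos.
rows = [
    (0, "valve", 0.15),
    (0, "value", 0.80),
    (0, "vague", 0.05),
    (1, "index", 0.90),
]
result = top_candidates(rows, n_candidates=2)
# result == {0: [("value", 0.8), ("valve", 0.15)], 1: [("index", 0.9)]}
```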
-
dump(self)
Describe the model for introspection.
-
__eq__(self, other:'CandidatesRanker')
-
static _create_labels(identifiers:pandas.Series, candidates:pandas.DataFrame)
-
_generate_tree(self)
-
_load_tree(self, tree:dict)