:mod:`lookout.style.typos.ranking`
==================================

.. py:module:: lookout.style.typos.ranking

.. autoapi-nested-parse::

   Ranking typo correction candidates using a GBT.


Module Contents
---------------

.. py:class:: CandidatesRanker(config:Optional[Mapping[str, Any]]=None, **kwargs)

   Bases: :class:`modelforge.Model`

   Rank typo correction candidates based on the given features.

   An XGBoost classifier is used.

   .. attribute:: _log

   .. attribute:: NAME
      :annotation: = candidates_ranks

   .. attribute:: VENDOR
      :annotation: = source{d}

   .. attribute:: DESCRIPTION
      :annotation: = Model that ranks candidates according to their probability to fix the typo.

   .. attribute:: LICENSE

   .. method:: set_config(self, config:Optional[Mapping[str, Any]]=None)

      Update ranking configuration.

      :param config: Ranking configuration, options:
                     train_rounds: Number of training rounds (int).
                     early_stopping: Early stopping parameter (int).
                     boost_param: Boosting parameters (dict).

   .. method:: fit(self, identifiers:pandas.Series, candidates:pandas.DataFrame, features:numpy.ndarray, val_part:float=0.1)

      Train booster on the given data.

      :param identifiers: Series of the right corrections, indexed in correspondence with the typos from which candidates were generated.
      :param candidates: DataFrame containing information about candidates for correction. Columns are [Columns.Id, Columns.Token, Columns.Candidate].
      :param features: Matrix of features for candidates.
      :param val_part: Part of data used for validation.

   .. method:: rank(self, candidates:pandas.DataFrame, features:numpy.ndarray, n_candidates:int=3, return_all:bool=True)

      Assign a correctness probability to each of the candidates.

      :param candidates: DataFrame containing information about candidates for correction.
      :param features: Matrix of features for candidates.
      :param n_candidates: Number of most probably correct candidates to return for each typo.
      :param return_all: False to return corrections only for typos corrected in the first candidate.
      :return: Dictionary `{id : [(candidate, correctness_proba), ...]}`, candidates are sorted by correctness probability in descending order.

   .. method:: dump(self)

      Describe the model for introspection.

   .. method:: __eq__(self, other:'CandidatesRanker')

   .. staticmethod:: _create_labels(identifiers:pandas.Series, candidates:pandas.DataFrame)

   .. method:: _generate_tree(self)

   .. method:: _load_tree(self, tree:dict)
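
The shape of the dictionary returned by ``rank`` can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: ``rank_sketch``, the ``Row`` tuple layout, and the sample probabilities are hypothetical, and the handling of ``return_all=False`` (dropping typos whose best candidate equals the original token) is an assumed reading of the parameter description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (typo_id, original_token, candidate, probability).
Row = Tuple[int, str, str, float]


def rank_sketch(rows: Iterable[Row], n_candidates: int = 3,
                return_all: bool = True) -> Dict[int, List[Tuple[str, float]]]:
    """Group candidates by typo id and keep the n most probable ones."""
    grouped = defaultdict(list)
    tokens = {}
    for typo_id, token, candidate, proba in rows:
        grouped[typo_id].append((candidate, proba))
        tokens[typo_id] = token

    result = {}
    for typo_id, suggestions in grouped.items():
        # Sort by probability, highest first, then truncate.
        suggestions.sort(key=lambda pair: pair[1], reverse=True)
        top = suggestions[:n_candidates]
        # Assumption: with return_all=False, a typo is skipped when its best
        # candidate is the original token, i.e. nothing was corrected.
        if return_all or top[0][0] != tokens[typo_id]:
            result[typo_id] = top
    return result


# Made-up probabilities standing in for the classifier's predictions.
rows = [
    (0, "tipo", "typo", 0.9),
    (0, "tipo", "tipo", 0.05),
    (0, "tipo", "type", 0.3),
    (1, "value", "value", 0.8),
    (1, "value", "valve", 0.1),
]
ranked = rank_sketch(rows, n_candidates=2)
filtered = rank_sketch(rows, n_candidates=2, return_all=False)
```

Here ``ranked`` holds two suggestions for each id, while ``filtered`` keeps only id ``0``, because the top candidate for id ``1`` is the unchanged token.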