lookout.style.typos

Typo corrector for source code identifiers. Uses SymSpell, FastText trained over a dataset of identifiers, etc.

Package Contents

class lookout.style.typos.IdTyposAnalyzer(model:IdTyposModel, url:str, config:Mapping[str, Any])

Bases:lookout.core.analyzer.Analyzer

Identifier typos analyzer.

_log
model_type
name = lookout.style.typos
vendor = source{d}
version = 1
description = Corrector of typos in source code identifiers.
corrector_manager
default_config
static create_token_parser()

Create instance of TokenParser that should be used by IdTyposAnalyzer.

Returns:TokenParser.
analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, changes:Iterable[Change], **data)

Return the list of Comments describing the typo corrections found.

Parameters:
  • ptr_from – The Git revision of the fork point. Exists in both the original and the forked repositories.
  • ptr_to – The Git revision to analyze. Exists only in the forked repository.
  • data_service – The channel to the data service in Lookout server to query for UASTs, file contents, etc.
  • changes – Iterator of changes from the data service.
  • data – Extra data passed into the method. Used by the decorators to simplify the data retrieval.
Returns:

List of found review suggestions. Refer to lookout/core/server/sdk/service_analyzer.proto.

generate_typos_fixes(self, changes:Sequence[Change])

Generate all the data about typo fixes that is required for any kind of further processing.

The processing can be comment generation or performance report generation.

Parameters:changes – The list of changes in the pointed state.
Returns:Iterator with unrendered data per comment.
static _get_identifiers(uast, lines)
_find_new_lines(self, prev_content:str, content:str)
render_comment_text(self, typo_fix:TypoFix)

Generate the text of the comment for the specified typo fix.

Parameters:typo_fix – Information about the typo fix that is required to render the comment text.
Returns:string with the generated comment.
static _normalize_confidences(confidences:Sequence[float])
generate_identifier_suggestions(self, suggestions:Mapping[str, Iterable[Candidate]], identifier:str)

Generate suggestions for the identifier and compute the probability of each suggestion.

Parameters:
  • suggestions – Mapping from a token to the list of its candidates.
  • identifier – Initial identifier.
Returns:

A generator of tuples, each holding a suggestion for the identifier and its probability.
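
One way to picture how per-token candidates could turn into identifier-level suggestions is sketched below. This is an illustration under stated assumptions, not the analyzer's actual code: the `Candidate` tuple with `token` and `confidence` fields is a hypothetical stand-in, the identifier is split on underscores only, and the probability of a suggestion is assumed to be the product of the chosen candidates' confidences.

```python
from itertools import product
from typing import Iterable, Iterator, Mapping, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for the library's Candidate type."""
    token: str
    confidence: float


def identifier_suggestions(
    suggestions: Mapping[str, Iterable[Candidate]], identifier: str,
) -> Iterator[Tuple[str, float]]:
    """Yield (suggested identifier, probability) pairs."""
    tokens = identifier.split("_")  # simplification: snake_case only
    # Tokens without candidates are kept unchanged with probability 1.
    options = [
        list(suggestions.get(token, [])) or [Candidate(token, 1.0)]
        for token in tokens
    ]
    for combo in product(*options):
        proba = 1.0
        for candidate in combo:
            proba *= candidate.confidence  # assumed independence of tokens
        yield "_".join(c.token for c in combo), proba


demo = list(identifier_suggestions(
    {"lenght": [Candidate("length", 0.9), Candidate("lenght", 0.1)]},
    "get_lenght",
))
```

Here `demo` holds one entry per combination of candidates, so an identifier with a single typoed token yields as many suggestions as that token has candidates.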

static _proba(candidates:Iterable[Candidate])
static reconstruct_identifier(tokenizer:TokenParser, pred_tokens:List[str], identifier:str)

Reconstruct identifier given predicted tokens and initial identifier.

Parameters:
  • tokenizer – Instance of TokenParser.
  • pred_tokens – List of predicted tokens.
  • identifier – Initial identifier.
Returns:

reconstructed identifier based on predicted tokens.
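
The idea of reconstruction can be sketched as follows: keep the separators and the casing of the initial identifier, and substitute each token with its predicted counterpart. This is a minimal sketch, not the library's implementation. It leaves out the `tokenizer` argument and uses a hypothetical regular expression to find tokens; the helper names `_match_case` and `reconstruct` are invented for illustration.

```python
import re
from typing import List

# Assumed token pattern: an upper-case run, or an optionally
# capitalized lower-case word. Digits and "_" act as separators.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def _match_case(original: str, replacement: str) -> str:
    """Give the replacement the same casing style as the original token."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement.lower()


def reconstruct(pred_tokens: List[str], identifier: str) -> str:
    """Substitute predicted tokens into the identifier, keeping its layout."""
    found = _TOKEN_RE.findall(identifier)
    if len(found) != len(pred_tokens):
        raise ValueError("number of predicted tokens does not match identifier")
    tokens = iter(pred_tokens)
    return _TOKEN_RE.sub(
        lambda match: _match_case(match.group(0), next(tokens)), identifier)


camel = reconstruct(["get", "length"], "getLenght")
upper = reconstruct(["max", "value"], "MAX_VALEU")
```

Preserving the original casing and separators means the suggested identifier follows the same naming convention as the code under review.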

classmethod train(cls, ptr:ReferencePointer, config:Mapping[str, Any], data_service:DataService, files:Iterator[File], **data)

Generate a new model on top of the specified source code.

Parameters:
  • ptr – Git repository state pointer.
  • config – Training configuration. Its structure is not specified.
  • data_service – The channel to the data service in Lookout server to query for UASTs, file contents, etc.
  • files – Iterator of File records from the data service.
  • data – Extra data passed into the method. Used by the decorators to simplify the data retrieval.
Returns:

Instance of AnalyzerModel (model_type, to be precise).

check_identifiers(self, identifiers:List[str])

Check tokens from identifiers for typos.

Parameters:identifiers – List of identifiers to check.
Returns:Dictionary of corrections, grouped by the index of the corresponding identifier in ‘identifiers’ and by the typoed tokens that have correction suggestions.
filter_suggestions(self, test_df:pandas.DataFrame, suggestions:Dict[int, List[Candidate]])

Filter suggestions based on the repo specifics and confidence threshold.

Parameters:
  • test_df – DataFrame with info about tested tokens.
  • suggestions – Dictionary of correction suggestions grouped by typoed token index in test_df.
Returns:

Dictionary of filtered suggestions grouped by checked token’s index in test_df.
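
The confidence-threshold part of the filtering can be illustrated with the sketch below. It is an assumption-laden simplification: the repository-specific checks and the `test_df` argument are omitted, and `Candidate`, its `confidence` field, and the `threshold` parameter are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for the library's Candidate type."""
    token: str
    confidence: float


def filter_by_confidence(
    suggestions: Dict[int, List[Candidate]], threshold: float,
) -> Dict[int, List[Candidate]]:
    """Keep only candidates at or above the threshold; drop empty entries."""
    filtered = {}
    for index, candidates in suggestions.items():
        kept = [c for c in candidates if c.confidence >= threshold]
        if kept:
            filtered[index] = kept
    return filtered


kept = filter_by_confidence(
    {0: [Candidate("length", 0.9), Candidate("lenght", 0.1)],
     1: [Candidate("valeu", 0.2)]},
    threshold=0.5,
)
```

Token indices whose candidates all fall below the threshold disappear from the result, so no comment would be produced for them.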

classmethod _load_config(cls, config:Mapping[str, Any])

Merge provided config with the default values.

Parameters:config – User-defined config.
Returns:Full config.
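
Merging a user-defined config with defaults commonly looks like the sketch below. Whether the real method merges nested mappings recursively is not stated here, so the recursion is an assumption, and `merge_config` and the config keys are hypothetical names used only for illustration.

```python
from typing import Any, Mapping


def merge_config(defaults: Mapping[str, Any], user: Mapping[str, Any]) -> dict:
    """Overlay user-defined values on top of the defaults."""
    merged = dict(defaults)  # copy, so the defaults stay untouched
    for key, value in user.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Assumed behaviour: nested sections are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
defaults = {"confidence_threshold": 0.5, "n_candidates": 3}
full = merge_config(defaults, {"confidence_threshold": 0.8})
```

Values missing from the user config fall back to the defaults, which is what makes the returned config "full".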
lookout.style.typos.main()

Entry point of the utility.

lookout.style.typos.analyzer_class
lookout.style.typos.run_cmdline_tool