lookout.style.typos
Typo corrector. Uses SymSpell and FastText over a dataset of identifiers, among other techniques.
Subpackages
lookout.style.typos.benchmarks
lookout.style.typos.research
lookout.style.typos.research.__main__
lookout.style.typos.research.baseline
lookout.style.typos.research.create_typos
lookout.style.typos.research.dev_utils
lookout.style.typos.research.filter_identifiers
lookout.style.typos.research.get_frequencies
lookout.style.typos.research.nn_prediction
lookout.style.typos.research.pick_subset
lookout.style.typos.research.preprocessing
lookout.style.typos.tests
lookout.style.typos.tests.test_analyzer
lookout.style.typos.tests.test_corrector
lookout.style.typos.tests.test_corrector_utils
lookout.style.typos.tests.test_corruption
lookout.style.typos.tests.test_evaluate_typos
lookout.style.typos.tests.test_generation
lookout.style.typos.tests.test_metrics
lookout.style.typos.tests.test_preparation
lookout.style.typos.tests.test_ranking
lookout.style.typos.tests.test_symspell
lookout.style.typos.tests.test_typo_commits_report
Submodules
lookout.style.typos.__main__
lookout.style.typos.analyzer
lookout.style.typos.cmdline_tools
lookout.style.typos.config
lookout.style.typos.corrector
lookout.style.typos.corrector_manager
lookout.style.typos.corruption
lookout.style.typos.generation
lookout.style.typos.metrics
lookout.style.typos.model
lookout.style.typos.modelforgecfg
lookout.style.typos.preparation
lookout.style.typos.ranking
lookout.style.typos.symspell
lookout.style.typos.utils
Package Contents
-
class lookout.style.typos.IdTyposAnalyzer(model: IdTyposModel, url: str, config: Mapping[str, Any])
Bases: lookout.core.analyzer.Analyzer
Identifier typos analyzer.
-
_log
-
model_type
-
name = lookout.style.typos
-
vendor = source{d}
-
version = 1
-
description = Corrector of typos in source code identifiers.
-
corrector_manager
-
default_config
-
static create_token_parser()
Create the TokenParser instance that IdTyposAnalyzer should use.
Returns: TokenParser.
-
analyze(self, ptr_from: ReferencePointer, ptr_to: ReferencePointer, data_service: DataService, changes: Iterable[Change], **data)
Return the list of Comments with the typo corrections found.
Parameters:
- ptr_from – The Git revision of the fork point. Exists in both the original and the forked repositories.
- ptr_to – The Git revision to analyze. Exists only in the forked repository.
- data_service – The channel to the data service in Lookout server to query for UASTs, file contents, etc.
- changes – Iterator of changes from the data service.
- data – Extra data passed into the method. Used by the decorators to simplify the data retrieval.
Returns: List of found review suggestions. Refer to lookout/core/server/sdk/service_analyzer.proto.
-
generate_typos_fixes(self, changes: Sequence[Change])
Generate all the typo fix data required for any kind of further processing.
The processing can be comment generation or performance report generation.
Parameters: changes – The list of changes in the pointed state.
Returns: Iterator with unrendered data per comment.
-
static _get_identifiers(uast, lines)
-
_find_new_lines(self, prev_content: str, content: str)
-
render_comment_text(self, typo_fix: TypoFix)
Generate the text of the comment for the specified typo fix.
Parameters: typo_fix – Information about the typo fix required to render the comment text.
Returns: String with the generated comment.
-
static _normalize_confidences(confidences: Sequence[float])
-
generate_identifier_suggestions(self, suggestions: Mapping[str, Iterable[Candidate]], identifier: str)
Generate suggestions for the identifier and compute the probability of each suggestion.
Parameters:
- suggestions – Mapping from a token to the list of its candidates.
- identifier – Initial identifier.
Returns: A generator of tuples, each holding a suggestion for the identifier and its probability.
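As a rough illustration of the idea only, the sketch below combines per-token candidates into whole-identifier suggestions. It does not reproduce the analyzer's code: the `TokenCandidate` tuple stands in for the real Candidate class, `combine_suggestions` is a hypothetical name, and both the plain string replacement and the product of per-token probabilities are assumptions.

```python
from itertools import product
from typing import Iterable, Iterator, Mapping, Tuple

# Hypothetical stand-in for Candidate: (corrected token, probability).
TokenCandidate = Tuple[str, float]


def combine_suggestions(
    suggestions: Mapping[str, Iterable[TokenCandidate]],
    identifier: str,
) -> Iterator[Tuple[str, float]]:
    """Yield (identifier suggestion, probability) for every combination of token fixes."""
    tokens = list(suggestions)
    per_token = [list(suggestions[token]) for token in tokens]
    for combination in product(*per_token):
        fixed = identifier
        probability = 1.0
        for token, (replacement, token_probability) in zip(tokens, combination):
            # Assumption: each typoed token occurs verbatim in the identifier.
            fixed = fixed.replace(token, replacement, 1)
            # Assumption: token corrections are treated as independent.
            probability *= token_probability
        yield fixed, probability


results = list(combine_suggestions(
    {"tipo": [("typo", 0.5), ("type", 0.25)], "corector": [("corrector", 0.5)]},
    "tipo_corector",
))
```

Here `results` holds one entry per combination of token candidates, so two candidates for one token and one for the other give two identifier suggestions.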
-
static _proba(candidates: Iterable[Candidate])
-
static reconstruct_identifier(tokenizer: TokenParser, pred_tokens: List[str], identifier: str)
Reconstruct the identifier from the predicted tokens and the initial identifier.
Parameters:
- tokenizer – Instance of TokenParser.
- pred_tokens – List of predicted tokens.
- identifier – Initial identifier.
Returns: Reconstructed identifier based on the predicted tokens.
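One plausible way to do such a reconstruction is sketched below; it is not the library's implementation. The regular expression is a hypothetical substitute for TokenParser, `rebuild_identifier` and `match_case` are illustrative names, and the sketch assumes exactly one predicted token for each token found in the identifier.

```python
import re
from typing import List

# Hypothetical splitter (not TokenParser): an all-caps run, or an optionally
# capitalized lowercase word. Separators such as "_" are left untouched.
TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def match_case(source: str, replacement: str) -> str:
    """Give the replacement the same casing pattern as the source token."""
    if source.isupper():
        return replacement.upper()
    if source[0].isupper():
        return replacement.capitalize()
    return replacement.lower()


def rebuild_identifier(pred_tokens: List[str], identifier: str) -> str:
    """Substitute each token of the identifier with the corresponding prediction."""
    # Assumption: pred_tokens has one entry per token matched in the identifier.
    corrected = iter(pred_tokens)
    return TOKEN_RE.sub(
        lambda match: match_case(match.group(0), next(corrected)),
        identifier,
    )
```

Substituting in place keeps the original separators and casing, so only the misspelled letters change in the suggested identifier.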
-
classmethod train(cls, ptr: ReferencePointer, config: Mapping[str, Any], data_service: DataService, files: Iterator[File], **data)
Generate a new model on top of the specified source code.
Parameters:
- ptr – Git repository state pointer.
- config – Training configuration; its structure is unspecified.
- data_service – The channel to the data service in Lookout server to query for UASTs, file contents, etc.
- files – Iterator of File records from the data service.
- data – Extra data passed into the method. Used by the decorators to simplify the data retrieval.
Returns: Instance of AnalyzerModel (model_type, to be precise).
-
check_identifiers(self, identifiers: List[str])
Check tokens from identifiers for typos.
Parameters: identifiers – List of identifiers to check.
Returns: Dictionary of corrections, grouped by the index of the corresponding identifier in ‘identifiers’ and by the typoed tokens that have correction suggestions.
-
filter_suggestions(self, test_df: pandas.DataFrame, suggestions: Dict[int, List[Candidate]])
Filter suggestions based on the repo specifics and confidence threshold.
Parameters:
- test_df – DataFrame with info about tested tokens.
- suggestions – Dictionary of correction suggestions grouped by typoed token index in test_df.
Returns: Dictionary of filtered suggestions grouped by checked token’s index in test_df.
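The confidence-threshold part of such filtering might look like the minimal sketch below. The `ScoredCandidate` tuple is a hypothetical stand-in for Candidate, `filter_by_confidence` is an illustrative name, and the repo-specific checks are left out because the source does not describe them.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for Candidate: (corrected token, confidence).
ScoredCandidate = Tuple[str, float]


def filter_by_confidence(
    suggestions: Dict[int, List[ScoredCandidate]],
    threshold: float,
) -> Dict[int, List[ScoredCandidate]]:
    """Keep candidates at or above the threshold and drop tokens left with none."""
    filtered = {}
    for index, candidates in suggestions.items():
        kept = [candidate for candidate in candidates if candidate[1] >= threshold]
        if kept:
            filtered[index] = kept
    return filtered


filtered = filter_by_confidence(
    {0: [("typo", 0.9), ("type", 0.2)], 3: [("corrector", 0.1)]},
    threshold=0.5,
)
```

Dropping the keys whose candidate list becomes empty means later steps only see tokens that still have at least one suggestion.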
-
classmethod _load_config(cls, config: Mapping[str, Any])
Merge provided config with the default values.
Parameters: config – User-defined config.
Returns: Full config.
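A merge of this kind can be sketched as follows, assuming nested mappings are merged recursively and user values win over defaults. Whether the actual method merges recursively is not stated in the source; `merge_config` and the keys in the example are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Dict


def merge_config(defaults: Mapping, user: Mapping) -> Dict[str, Any]:
    """Return a new dict with the user values overriding the defaults."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Assumption: nested sections are merged key by key, not replaced whole.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
hypothetical_defaults = {"confidence_threshold": 0.8, "ranking": {"n_candidates": 3}}
full_config = merge_config(hypothetical_defaults, {"ranking": {"n_candidates": 5}})
```

The function builds new dictionaries at every level it merges, so the default values are not mutated by a user override.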
-
lookout.style.typos.main()
Entry point of the utility.
-
lookout.style.typos.analyzer_class
-
lookout.style.typos.run_cmdline_tool