lookout.style.typos.benchmarks.typo_commits_report

Facilities to report the quality of a given model on a given dataset.

Module Contents

class lookout.style.typos.benchmarks.typo_commits_report.IdTyposAnalyzerSpy(model:IdTyposModel, url:str, config:Mapping[str, Any])

Bases: lookout.style.typos.IdTyposAnalyzer

The Analyzer which returns fixes found by IdTyposAnalyzer as JSON structures.

Note that all lines in the head revision (ptr_to) are analyzed, not only the changed lines. Thus the result does not depend on the base revision (ptr_from).
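The difference between analyzing only changed lines and analyzing every line can be illustrated with a small sketch. How the private _find_new_lines helper is actually implemented is not documented here, so the difflib-based find_new_lines below and the all_lines helper are hypothetical stand-ins, not the library's code.

```python
import difflib
from typing import List


def find_new_lines(prev_content: str, content: str) -> List[int]:
    """Return 1-based numbers of lines in `content` that are new or changed.

    Hypothetical helper: an assumption about what a "changed lines only"
    analysis would look at, not the library's implementation.
    """
    prev_lines = prev_content.splitlines()
    lines = content.splitlines()
    matcher = difflib.SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
    new_lines = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" blocks are the lines that differ in the new content.
        if tag in ("insert", "replace"):
            new_lines.extend(range(j1 + 1, j2 + 1))
    return new_lines


def all_lines(content: str) -> List[int]:
    """Return every 1-based line number, i.e. what a spy-style analysis covers."""
    return list(range(1, len(content.splitlines()) + 1))


base = "def foo():\n    retrun_value = 1\n    return retrun_value\n"
head = base + "\ncount = 0\n"

changed_only = find_new_lines(base, head)  # depends on the base revision
everything = all_lines(head)               # independent of the base revision
```

In this sketch the misspelled identifier on lines 2 and 3 is outside changed_only, but inside everything, which is why a report over all lines does not depend on the base revision.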

run(self, ptr:ReferencePointer, data_service:DataService)

Run generate_typos_fixes for all lines and all files in the ptr_from revision.

Parameters:
  • ptr – Git repository state pointer to the revision that should be analyzed.
  • data_service – Connection to the Lookout data retrieval service to get the files.
Returns:

Generator of fixes for each file.

analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, **data)

Extract the list of TypoFix-es as Comment-s.

TypoFix-es are generated in run().

Parameters:
  • ptr_from – The Git revision to analyze.
  • ptr_to – Not used. ptr_from is used for both model training and analysis.
  • data_service – The channel to the data service in Lookout server to query for UASTs, file contents, etc.
  • data – Extra data passed into the method. Used by the decorators to simplify the data retrieval.
Returns:

List of Comment-s with TypoFix in JSON format.

classmethod train(cls, ptr:ReferencePointer, config:Mapping[str, Any], data_service:DataService, **data)

Return an empty model if check_all_identifiers is True.

This helps to speed up report generation. Such an approach is not correct for the original IdTyposAnalyzer, but it is acceptable for the Spy class.

_find_new_lines(self, prev_content:str, content:str)
class lookout.style.typos.benchmarks.typo_commits_report.TypoCommitsReporter(config:Optional[dict]=None, bblfsh:Optional[str]=None, database:Optional[str]=None, fs:Optional[str]=None, checkpoint_dir:Optional[str]=None, force:bool=False)

Bases: lookout.style.reporter.Reporter

Report system for Typos Analyser.

inspected_analyzer_type
report_template_path
classmethod get_report_names(cls)

Get all the available report names.

Returns: Tuple with report names.
_generate_reports(self, dataset_row:Dict[str, Any], fixes:Sequence[TypoFix])

Generate reports for a dataset row.

Parameters:
  • dataset_row – Dataset row which triggered the analyze method of the analyzer.
  • fixes – List of TypoFix-es provided by the IdTyposAnalyzerSpy.analyze() method.
Returns:

Dictionary with report names as keys and report strings as values.
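The checkpoint_dir and force arguments are documented as saving the intermediate reports produced by _generate_reports and reusing them unless force is set. A minimal sketch of that compute-or-reuse pattern follows. The load_or_generate function, the JSON file layout, and the "row0" key are assumptions made for illustration; the real on-disk format is not described in this reference.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def load_or_generate(checkpoint_dir: Path, key: str,
                     generate: Callable[[], Dict[str, str]],
                     force: bool = False) -> Dict[str, str]:
    """Reuse a saved report dictionary unless `force` is set (hypothetical helper)."""
    path = checkpoint_dir / (key + ".json")
    if path.exists() and not force:
        # A checkpoint exists and recalculation was not requested: reuse it.
        return json.loads(path.read_text(encoding="utf-8"))
    reports = generate()
    path.write_text(json.dumps(reports), encoding="utf-8")
    return reports


calls = []


def fake_generate() -> Dict[str, str]:
    """Stand-in for an expensive report generation step."""
    calls.append(1)
    return {"example_report": "report body"}


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    first = load_or_generate(directory, "row0", fake_generate)
    second = load_or_generate(directory, "row0", fake_generate)              # reused
    third = load_or_generate(directory, "row0", fake_generate, force=True)   # recalculated
```

Here the generator runs for the first call and again for the forced call, while the second call is served from the saved file.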

generate_commit_dataset_report(self, dataset_row:Dict[str, Any], fixes:Sequence[TypoFix])

Generate the report for a dataset row.

Parameters:
  • dataset_row – Dataset row which triggered the analyze method of the analyzer.
  • fixes – List of TypoFix-es provided by the IdTyposAnalyzerSpy.analyze() method.
Returns:

Dictionary with report names as keys and report strings as values.

_trigger_review_event(self, dataset_row:Dict[str, Any])
_finalize(self, reports:Iterable[Dict[str, str]])

Summarize all individual reports.

Parameters: reports – Reports generated by TypoCommitsReporter.generate_commit_dataset_report().
Returns: Summarized final report.
static get_metrics_stub()

Generate a pandas Series with TypoCommitsReporter’s metrics.

The detection_ prefix marks metrics for typo detection, and the fix_ prefix marks metrics for fixing the typos that were found. Support is the number of analyzed identifiers.

static _get_row_repr(dataset_row:Dict[str, Any])

Convert dataset row to its representation for logging purposes.

lookout.style.typos.benchmarks.typo_commits_report.generate_typos_report_entry(dataset:str, output:str, bblfsh:str, config:dict, database:Optional[str]=None, fs:Optional[str]=None, repos_cache:Optional[str]=None, checkpoint_dir:Optional[str]=None, force:bool=False)

Entry point for the command line interface that generates the typos quality report.

Parameters:
  • dataset – CSV file with commits. Must contain the columns wrong_id, correct_id, file, line, commit_fix, repo, commit_typo.
  • output – Directory where to save the report.
  • bblfsh – bblfsh address to use for lookout-sdk.
  • config – Config for IdTyposAnalyzer.
  • database – sqlite3 database path to store the models. A temporary file is used if not set.
  • fs – Model repository file system root. Temporary directory is used if not set.
  • repos_cache – Directory where to download repositories from the dataset. It is strongly recommended to set this parameter if there are more than 20 repositories in the dataset. Temporary directory is used if not set.
  • checkpoint_dir – Directory where to save intermediate reports generated by _generate_reports. If intermediate reports are found in the directory, _generate_reports is not called unless the force flag is set.
  • force – Force to recalculate checkpoints in checkpoint_dir.
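Because the dataset parameter requires a fixed set of columns, it can be useful to check a CSV file before starting a long report run. The sketch below is a hypothetical pre-flight check and not part of lookout: missing_columns is an invented helper, and the sample row uses placeholder values (FIX_SHA, REPO_URL, TYPO_SHA) rather than real data.

```python
import csv
import io
from typing import Iterable, List

# Column names taken from the description of the `dataset` parameter.
REQUIRED_COLUMNS = ("wrong_id", "correct_id", "file", "line",
                    "commit_fix", "repo", "commit_typo")


def missing_columns(header: Iterable[str]) -> List[str]:
    """Return the required columns that are absent from a CSV header."""
    present = set(header)
    return [column for column in REQUIRED_COLUMNS if column not in present]


# An in-memory stand-in for a dataset file; all values are placeholders.
sample = io.StringIO(
    "wrong_id,correct_id,file,line,commit_fix,repo,commit_typo\n"
    "recieve,receive,src/net.py,10,FIX_SHA,REPO_URL,TYPO_SHA\n"
)
reader = csv.DictReader(sample)
missing = missing_columns(reader.fieldnames or [])
rows = list(reader)

if missing:
    raise ValueError("dataset is missing columns: " + ", ".join(missing))
```

A file that passes this check could then be given as the dataset argument of generate_typos_report_entry.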