lookout.style.format.benchmarks.quality_report

Measure quality on several top repositories.

Module Contents

lookout.style.format.benchmarks.quality_report.FLOAT_PRECISION = .3f
lookout.style.format.benchmarks.quality_report.get_repo_name(url:str)

Extract name of repository from URL.

Parameters:
  • url – URL of the repository.
Returns:

Name of the repository.
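
The extraction described above can be sketched as follows. This is a minimal illustration, not the module's actual code: repo_name_from_url is a hypothetical stand-in for get_repo_name, and taking the last path component and stripping a trailing ".git" is an assumption about what "name of repository" means.

```python
from urllib.parse import urlparse


def repo_name_from_url(url: str) -> str:
    """Hypothetical stand-in for get_repo_name (assumed behaviour)."""
    # Keep only the path part of the URL and drop a trailing slash, if any.
    path = urlparse(url).path.rstrip("/")
    # The last path component is taken as the repository name.
    name = path.rsplit("/", 1)[-1]
    # Assumption: a ".git" suffix is not part of the name.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


print(repo_name_from_url("https://github.com/src-d/style-analyzer.git"))
```
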
lookout.style.format.benchmarks.quality_report.ensure_repo(repository:str, storage_dir:str)

Clones the repository if it is a URL and returns the repository path.

Parameters:
  • repository – Repository URL or directory in the file system.
  • storage_dir – Directory to clone the repository into if it is a URL.
Returns:

Repository path.

exception lookout.style.format.benchmarks.quality_report.RestartReport

Bases: ValueError

Exception raised if report collection should be restarted.

lookout.style.format.benchmarks.quality_report.measure_quality(repository:str, from_commit:str, to_commit:str, context:AnalyzerContextManager, config:dict, bblfsh:Optional[str], vnodes_expected_number:Optional[int], restarts:int=3)

Generate QualityReport reports for a repository. Returns empty reports on failure.

Parameters:
  • repository – URL of repository.
  • from_commit – Hash of the base commit.
  • to_commit – Hash of the head commit.
  • context – LookoutSDK instance to query analyzer.
  • config – Configuration for FormatAnalyzer.
  • bblfsh – Babelfish server address to use. Specify None to use the default value.
  • vnodes_expected_number – Expected number of vnodes, if known. Report collection is restarted if the number of extracted vnodes does not match.
  • restarts – Number of restarts allowed if the number of extracted vnodes does not match.
Returns:

Dictionary with all QualityReport reports.
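
The restart behaviour described by vnodes_expected_number and restarts can be sketched roughly as below. This is not the module's implementation: collect_with_restarts and the collect callable are hypothetical, RestartReport is redefined locally so the snippet is self-contained, and the sketch assumes that restarts is the total number of attempts and that "empty reports" can be modelled as an empty dictionary.

```python
from typing import Callable, Optional, Tuple


class RestartReport(ValueError):
    """Local stand-in for the module's RestartReport exception."""


def collect_with_restarts(
    collect: Callable[[], Tuple[dict, int]],
    vnodes_expected_number: Optional[int],
    restarts: int = 3,
) -> dict:
    """Call `collect` until the vnode count matches or the attempts run out."""
    for _ in range(restarts):
        try:
            reports, vnodes = collect()
            # Only check the count when the caller knows what to expect.
            if vnodes_expected_number is not None and vnodes != vnodes_expected_number:
                raise RestartReport(
                    "expected %d vnodes, got %d" % (vnodes_expected_number, vnodes)
                )
            return reports
        except RestartReport:
            continue  # collect the report again from scratch
    return {}  # assumption: empty reports are modelled as an empty dict
```

Raising and catching a dedicated exception keeps the mismatch check separate from the retry policy, which is presumably why the module exposes RestartReport as its own type.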

lookout.style.format.benchmarks.quality_report.calc_weighted_avg(arr:Union[Sequence[Sequence], numpy.ndarray], col:int, weight_col:int=5)

Calculate average value in col weighted by column weight_col.

lookout.style.format.benchmarks.quality_report.calc_avg(arr:Union[Sequence[Sequence], numpy.ndarray], col:int)

Calculate average value in col.
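
As a worked illustration of the two averages, the sketch below uses NumPy on a small numeric table. The helper names weighted_avg and plain_avg are hypothetical, and the sketch assumes every column is numeric; the real functions may handle their input differently.

```python
import numpy as np


def weighted_avg(arr, col, weight_col=5):
    """Average of column `col`, weighted by column `weight_col`."""
    arr = np.asarray(arr, dtype=float)
    return float(np.average(arr[:, col], weights=arr[:, weight_col]))


def plain_avg(arr, col):
    """Unweighted average of column `col`."""
    arr = np.asarray(arr, dtype=float)
    return float(arr[:, col].mean())


# Two rows: column 0 holds a score, column 5 holds the weight (e.g. support).
rows = [
    [0.5, 0, 0, 0, 0, 10],
    [1.0, 0, 0, 0, 0, 30],
]
print(weighted_avg(rows, col=0))  # (0.5 * 10 + 1.0 * 30) / 40 = 0.875
print(plain_avg(rows, col=0))     # (0.5 + 1.0) / 2 = 0.75
```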

lookout.style.format.benchmarks.quality_report.Metrics

Metrics for the quality report. Metrics are calculated on the subset of samples where predictions were made. The full_ prefix means that the metric was calculated on all available samples; without full_, the metric was calculated only on samples for which the model made a prediction. ppcr stands for predicted positive condition rate and shows the ratio of samples for which the model was able to predict.
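
The difference between the full_ and non-full_ variants can be illustrated with a toy example. The functions below are hypothetical and use plain accuracy instead of the report's precision, recall and F1; representing a missing prediction as None and counting it as incorrect in the full_ variant are assumptions made for this sketch only.

```python
from typing import Optional, Sequence


def ppcr(y_pred: Sequence[Optional[str]]) -> float:
    """Share of samples for which the model produced a prediction."""
    return sum(p is not None for p in y_pred) / len(y_pred)


def accuracy_on_predicted(y_true: Sequence[str], y_pred: Sequence[Optional[str]]) -> float:
    """Accuracy restricted to samples that have a prediction."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    return sum(t == p for t, p in pairs) / len(pairs)


def full_accuracy(y_true: Sequence[str], y_pred: Sequence[Optional[str]]) -> float:
    """Accuracy over all samples; a missing prediction counts as wrong (assumption)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["a", "b", "a", "b", "a"]
y_pred = ["a", None, "b", "b", None]  # None marks "no prediction"

print(ppcr(y_pred))                           # 3 of 5 samples predicted
print(accuracy_on_predicted(y_true, y_pred))  # 2 of 3 predicted samples correct
print(full_accuracy(y_true, y_pred))          # 2 of 5 samples correct overall
```
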
lookout.style.format.benchmarks.quality_report._get_metrics(report:str)

Extract avg / total precision, recall, f1 score, support from report.

lookout.style.format.benchmarks.quality_report._get_model_summary(report:str)

Extract the model summary: number of rules and average length.

lookout.style.format.benchmarks.quality_report._get_json_data(report:str)
lookout.style.format.benchmarks.quality_report.handle_input_arg(input_arg:str, log:Optional[logging.Logger]=None)

Process input argument and return an iterator over input data.

Parameters:
  • input_arg – File to process, or - to read data from stdin.
  • log – Logger to use if the handling process should be logged.
Returns:

An iterator over input files.
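
One plausible reading of this contract is sketched below. It is an assumption, not the documented behaviour: iter_input_paths is a hypothetical helper that yields the argument itself when it names a file, and yields one path per non-empty line of stdin when the argument is "-". The stdin parameter exists only to make the sketch easy to test.

```python
import logging
import sys
from typing import Iterator, Optional, TextIO


def iter_input_paths(
    input_arg: str,
    log: Optional[logging.Logger] = None,
    stdin: Optional[TextIO] = None,
) -> Iterator[str]:
    """Yield input file paths from a single argument or from stdin."""
    if input_arg == "-":
        stream = sys.stdin if stdin is None else stdin
        if log is not None:
            log.info("Reading file paths from stdin.")
        for line in stream:
            path = line.strip()
            if path:  # skip blank lines
                yield path
    else:
        if log is not None:
            log.info("Reading from %s.", input_arg)
        yield input_arg
```

Returning a generator in both branches lets callers loop over the result without caring whether the input came from a file argument or from a pipe.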

lookout.style.format.benchmarks.quality_report._generate_report_summary(reports:Iterable[Mapping[str, str]], report_name:str)
lookout.style.format.benchmarks.quality_report.generate_quality_report(input:str, output:str, force:bool, bblfsh:str, config:dict, database:Optional[str]=None, fs:Optional[str]=None)

Generate quality report for the given data. Entry point for command line interface.

Parameters:
  • input – CSV file listing the repositories to report on. It should contain url, to and from columns.
  • output – Directory where the results are saved.
  • force – If True, overwrite the results stored in the output directory. If False, reuse the stored results.
  • bblfsh – Babelfish server address to use.
  • config – Configuration for FormatAnalyzer.
  • database – Path to the sqlite3 database that stores the models. A temporary file is used if not set.
  • fs – Model repository file system root. A temporary directory is used if not set.
Returns: