lookout.style.format.benchmarks.quality_report
Measure quality on several top repositories.
Module Contents¶
-
lookout.style.format.benchmarks.quality_report.
FLOAT_PRECISION
= .3f
-
lookout.style.format.benchmarks.quality_report.
get_repo_name
(url:str) Extract the name of a repository from its URL.
Parameters: url – URL of the repository. Returns: Name of the repository.
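The extraction described above can be sketched as follows. This is a minimal illustration, not the implementation of get_repo_name: the helper name repo_name_from_url, the example URL, and the choice to strip a trailing ".git" are all assumptions.

```python
from urllib.parse import urlparse


def repo_name_from_url(url: str) -> str:
    """Return the last path component of a repository URL (hypothetical helper)."""
    # Drop a trailing slash so "…/project/" and "…/project" behave the same.
    path = urlparse(url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    # Assumption: a ".git" suffix is not considered part of the name.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


print(repo_name_from_url("https://example.com/org/project.git"))
```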
-
lookout.style.format.benchmarks.quality_report.
ensure_repo
(repository:str, storage_dir:str) Clone the repository if it is a URL and return the repository path.
Parameters: - repository – Repository URL or directory in the file system.
- storage_dir – Directory to clone the repository into if it is a URL.
Returns: Repository path.
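A rough sketch of the clone-or-reuse behaviour, under stated assumptions: anything that is not an existing directory is treated as a URL, the clone target is named after the last URL component, and the git command line tool is available. The helper name ensure_local_repo is hypothetical and this is not the code of ensure_repo.

```python
import os
import subprocess


def ensure_local_repo(repository: str, storage_dir: str) -> str:
    """Return a local path for `repository`, cloning it first if needed."""
    # Assumption: an existing directory is already a local repository.
    if os.path.isdir(repository):
        return repository
    # Assumption: the clone directory is named after the last URL component.
    name = repository.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    target = os.path.join(storage_dir, name)
    if not os.path.isdir(target):
        os.makedirs(storage_dir, exist_ok=True)
        # check=True raises CalledProcessError if the clone fails.
        subprocess.run(["git", "clone", repository, target], check=True)
    return target
```

Reusing an existing clone directory keeps repeated benchmark runs from downloading the same repository again.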
-
exception
lookout.style.format.benchmarks.quality_report.
RestartReport
Bases: ValueError
Exception raised if report collection should be restarted.
-
lookout.style.format.benchmarks.quality_report.
measure_quality
(repository:str, from_commit:str, to_commit:str, context:AnalyzerContextManager, config:dict, bblfsh:Optional[str], vnodes_expected_number:Optional[int], restarts:int=3) Generate a QualityReport for a repository. On failure, return empty reports.
Parameters: - repository – URL of repository.
- from_commit – Hash of the base commit.
- to_commit – Hash of the head commit.
- context – LookoutSDK instance to query analyzer.
- config – config for FormatAnalyzer.
- bblfsh – Babelfish server address to use. Specify None to use the default value.
- vnodes_expected_number – Expected number of vnodes, if known. Report collection is restarted if the number of extracted vnodes does not match.
- restarts – Number of restarts allowed if the number of extracted vnodes does not match.
Returns: Dictionary with all QualityReport reports.
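The restart behaviour described by vnodes_expected_number and restarts can be sketched as a bounded retry loop. Everything here is illustrative: RestartReportSketch stands in for RestartReport, collect and count_vnodes are hypothetical callables, and treating restarts as the total number of attempts is an assumption.

```python
from typing import Callable, Dict, Optional


class RestartReportSketch(ValueError):
    """Stand-in for RestartReport: signals that collection should be retried."""


def collect_with_restarts(
    collect: Callable[[], Dict[str, str]],
    count_vnodes: Callable[[Dict[str, str]], int],
    vnodes_expected_number: Optional[int],
    restarts: int = 3,
) -> Dict[str, str]:
    """Run `collect` until the vnode count matches or attempts run out."""
    # Assumption: `restarts` bounds the total number of attempts.
    for _ in range(restarts):
        try:
            reports = collect()
            if vnodes_expected_number is not None:
                if count_vnodes(reports) != vnodes_expected_number:
                    raise RestartReportSketch("unexpected number of vnodes")
            return reports
        except RestartReportSketch:
            continue
    # Mirrors the documented fallback: empty reports on failure.
    return {}
```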
-
lookout.style.format.benchmarks.quality_report.
calc_weighted_avg
(arr:Union[Sequence[Sequence], numpy.ndarray], col:int, weight_col:int=5) Calculate the average value of column col, weighted by column weight_col.
-
lookout.style.format.benchmarks.quality_report.
calc_avg
(arr:Union[Sequence[Sequence], numpy.ndarray], col:int) Calculate the average value of column col.
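The difference between the weighted and the plain column average can be illustrated with a pure-Python sketch. The function names weighted_avg and plain_avg and the sample rows are made up for illustration; the library functions may be implemented differently, for example with numpy.

```python
from typing import Sequence


def weighted_avg(arr: Sequence[Sequence[float]], col: int, weight_col: int = 5) -> float:
    """Average of column `col`, each row weighted by its `weight_col` value."""
    total_weight = sum(row[weight_col] for row in arr)
    return sum(row[col] * row[weight_col] for row in arr) / total_weight


def plain_avg(arr: Sequence[Sequence[float]], col: int) -> float:
    """Unweighted average of column `col`."""
    return sum(row[col] for row in arr) / len(arr)


# Hypothetical rows: column 0 is a score, column 1 is the number of samples.
rows = [
    [0.5, 10],
    [1.0, 30],
]
print(weighted_avg(rows, col=0, weight_col=1))  # 0.875
print(plain_avg(rows, col=0))  # 0.75
```

The weighted result is closer to 1.0 because the second row contributes three times as many samples as the first.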
-
lookout.style.format.benchmarks.quality_report.
Metrics
-
lookout.style.format.benchmarks.quality_report.
__doc__
= Metrics for the quality report. Metrics are calculated on the subset of samples where predictions were made. The `full_` prefix means that the metric was calculated on all available samples. Without `full_`, the metric was calculated only on samples for which the model made a prediction. `ppcr` means predicted positive condition rate and shows the ratio of samples where the model was able to predict.
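The relationship between the subset metrics, the `full_` metrics and `ppcr` can be shown with a small sketch. It uses a simplified accuracy-style score rather than the precision, recall and f1 values of the actual report, and it assumes that a missing prediction is represented by None; the function name subset_vs_full is hypothetical.

```python
from typing import Dict, Optional, Sequence


def subset_vs_full(
    y_true: Sequence[str], y_pred: Sequence[Optional[str]]
) -> Dict[str, float]:
    """Compare a score on predicted samples with the same score on all samples."""
    # Keep only the samples where the model produced a prediction.
    predicted = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    correct = sum(1 for t, p in predicted if t == p)
    return {
        # Share of samples for which the model was able to predict.
        "ppcr": len(predicted) / len(y_true),
        # Score on the predicted subset only.
        "accuracy": correct / len(predicted) if predicted else 0.0,
        # Same score over all samples; samples without a prediction count as misses.
        "full_accuracy": correct / len(y_true),
    }


metrics = subset_vs_full(["a", "b", "a", "b"], ["a", None, "b", "b"])
print(metrics)
```

Here three of four samples receive a prediction and two of those are correct, so the subset score is higher than the `full_` score.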
-
lookout.style.format.benchmarks.quality_report.
_get_metrics
(report:str) Extract avg / total precision, recall, f1 score, support from report.
-
lookout.style.format.benchmarks.quality_report.
_get_model_summary
(report:str) Extract the model summary: number of rules and average length.
-
lookout.style.format.benchmarks.quality_report.
_get_json_data
(report:str)
-
lookout.style.format.benchmarks.quality_report.
handle_input_arg
(input_arg:str, log:Optional[logging.Logger]=None) Process the input argument and return an iterator over input data.
Parameters: - input_arg – File to process, or `-` to read data from stdin.
- log – Logger to use if the handling process should be logged.
Returns: An iterator over input files.
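The convention of passing `-` to read from stdin can be sketched as below. This is only an illustration of the pattern: the helper name iter_input_lines is hypothetical, and yielding stripped lines is an assumption, since the reference does not say what the returned iterator contains.

```python
import sys
from typing import Iterator


def iter_input_lines(input_arg: str) -> Iterator[str]:
    """Yield lines from a file, or from stdin when `input_arg` is "-"."""
    if input_arg == "-":
        # Read lazily from standard input.
        for line in sys.stdin:
            yield line.rstrip("\n")
    else:
        with open(input_arg, encoding="utf-8") as fh:
            for line in fh:
                yield line.rstrip("\n")
```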
-
lookout.style.format.benchmarks.quality_report.
_generate_report_summary
(reports:Iterable[Mapping[str, str]], report_name:str)
-
lookout.style.format.benchmarks.quality_report.
generate_quality_report
(input:str, output:str, force:bool, bblfsh:str, config:dict, database:Optional[str]=None, fs:Optional[str]=None) Generate a quality report for the given data. Entry point for the command line interface.
Parameters: - input – CSV file listing the repositories to report on. It should contain url, to and from columns.
- output – Directory in which to save results.
- force – If True, overwrite results stored in the output directory. If False, reuse the stored results.
- bblfsh – bblfsh address to use.
- config – config for FormatAnalyzer.
- database – sqlite3 database path to store the models. Temporary file is used if not set.
- fs – Model repository file system root. Temporary directory is used if not set.
Returns:
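The input file layout described for generate_quality_report, a CSV with url, to and from columns, can be read with the standard csv module. The sketch below is an assumption about how such a file might be validated and iterated, not the function's actual parsing code; read_repositories, the sample URL and the placeholder commit values are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

REQUIRED_COLUMNS = ("url", "to", "from")


def read_repositories(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per repository row, keeping only the required columns."""
    reader = csv.DictReader(stream)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    for row in reader:
        yield {c: row[c] for c in REQUIRED_COLUMNS}


# Placeholder data: "head" and "base" stand in for real commit hashes.
sample = io.StringIO("url,to,from\nhttps://example.com/org/project,head,base\n")
for repo in read_repositories(sample):
    print(repo["url"], repo["from"], repo["to"])
```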