:mod:`lookout.style.format.benchmarks.quality_report`
=====================================================

.. py:module:: lookout.style.format.benchmarks.quality_report

.. autoapi-nested-parse::

   Measure quality on several top repositories.


Module Contents
---------------

.. data:: FLOAT_PRECISION
   :annotation: = .3f


.. function:: get_repo_name(url:str)

   Extract the name of a repository from its URL.

   :param url: URL of the repository.
   :return: Name of the repository.


.. function:: ensure_repo(repository:str, storage_dir:str)

   Clone the repository if it is a URL and return the repository path.

   :param repository: Repository URL or directory in the file system.
   :param storage_dir: Directory to clone the repository into if it is a URL.
   :return: Repository path.


.. py:exception:: RestartReport

   Bases: :class:`ValueError`

   Exception raised when report collection should be restarted.


.. function:: measure_quality(repository:str, from_commit:str, to_commit:str, context:AnalyzerContextManager, config:dict, bblfsh:Optional[str], vnodes_expected_number:Optional[int], restarts:int=3)

   Generate `QualityReport` for a repository. Empty reports are returned on failure.

   :param repository: URL of the repository.
   :param from_commit: Hash of the base commit.
   :param to_commit: Hash of the head commit.
   :param context: LookoutSDK instance to query the analyzer.
   :param config: Config for FormatAnalyzer.
   :param bblfsh: Babelfish server address to use. Specify None to use the default value.
   :param vnodes_expected_number: Expected number of vnodes, if known. Report collection is restarted if the number of extracted vnodes does not match.
   :param restarts: Number of restarts if the number of extracted vnodes does not match.
   :return: Dictionary with all QualityReport reports.


.. function:: calc_weighted_avg(arr:Union[Sequence[Sequence], numpy.ndarray], col:int, weight_col:int=5)

   Calculate the average value in `col` weighted by column `weight_col`.


.. function:: calc_avg(arr:Union[Sequence[Sequence], numpy.ndarray], col:int)

   Calculate the average value in `col`.


.. data:: Metrics


.. data:: __doc__
   :annotation: = Metrics for the quality report.

   Metrics are calculated on the subset of samples where predictions were made.
   The `full_` prefix means that the metric was calculated on all available samples.
   Without `full_`, the metric was calculated only on samples for which the model made a prediction.
   `ppcr` means predicted positive condition rate and shows the ratio of samples for which the model was able to make a prediction.


.. function:: _get_metrics(report:str)

   Extract avg / total precision, recall, f1 score, support from the report.


.. function:: _get_model_summary(report:str)

   Extract the model summary: number of rules and avg. len.


.. function:: _get_json_data(report:str)


.. function:: handle_input_arg(input_arg:str, log:Optional[logging.Logger]=None)

   Process the input argument and return an iterator over input data.

   :param input_arg: File to process or `-` to get data from stdin.
   :param log: Logger to use if the handling process should be logged.
   :return: An iterator over input files.


.. function:: _generate_report_summary(reports:Iterable[Mapping[str, str]], report_name:str)


.. function:: generate_quality_report(input:str, output:str, force:bool, bblfsh:str, config:dict, database:Optional[str]=None, fs:Optional[str]=None)

   Generate a quality report for the given data. Entry point for the command line interface.

   :param input: CSV file with the repositories to report on. Should contain url, to and from columns.
   :param output: Directory where the results are saved.
   :param force: Overwrite results stored in the output directory if True. Stored results are used if False.
   :param bblfsh: bblfsh address to use.
   :param config: Config for FormatAnalyzer.
   :param database: sqlite3 database path to store the models. A temporary file is used if not set.
   :param fs: Model repository file system root. A temporary directory is used if not set.
   :return:
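
The two averaging helpers documented above, ``calc_weighted_avg`` and ``calc_avg``, are described only by their one-line summaries. The following is a minimal sketch of the arithmetic those summaries describe. It is not the module's actual implementation. The names ``weighted_avg_sketch``, ``avg_sketch`` and ``rows`` are hypothetical, and the two-column sample table is made-up illustration data, so ``weight_col`` is passed explicitly instead of relying on the documented default of 5.

.. code-block:: python

   from typing import Sequence

   # Assumption: mirrors the documented ``FLOAT_PRECISION`` annotation (.3f).
   FLOAT_PRECISION = ".3f"


   def weighted_avg_sketch(rows: Sequence[Sequence[float]], col: int,
                           weight_col: int = 5) -> float:
       """Average of column `col`, each row weighted by its `weight_col` value."""
       total_weight = sum(row[weight_col] for row in rows)
       return sum(row[col] * row[weight_col] for row in rows) / total_weight


   def avg_sketch(rows: Sequence[Sequence[float]], col: int) -> float:
       """Plain (unweighted) average of column `col`."""
       return sum(row[col] for row in rows) / len(rows)


   # Hypothetical table: column 0 is a metric, column 1 is the support (weight).
   rows = [
       [1.0, 100],
       [0.5, 300],
   ]

   weighted = weighted_avg_sketch(rows, col=0, weight_col=1)
   plain = avg_sketch(rows, col=0)
   print(format(weighted, FLOAT_PRECISION), format(plain, FLOAT_PRECISION))

The weighted variant pulls the result toward rows with larger weights, which is why a per-repository metric weighted by support can differ noticeably from the plain average over repositories.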