:mod:`lookout.style.format.analyzer`
====================================

.. py:module:: lookout.style.format.analyzer

.. autoapi-nested-parse::

   Analyzer that detects bad formatting by learning on the existing code in the repository.


Module Contents
---------------

.. data:: LineFix

.. data:: FileFix

.. py:class:: FormatAnalyzer(model:FormatModel, url:str, config:MutableMapping[str, Any])

   Bases: :class:`lookout.core.analyzer.Analyzer`

   Detect bad formatting by training on existing code and analyzing pull requests.

   .. attribute:: model_type

   .. attribute:: name
      :annotation: = style.format.analyzer.FormatAnalyzer

   .. attribute:: vendor
      :annotation: = source{d}

   .. attribute:: version
      :annotation: = 1

   .. attribute:: description
      :annotation: = Source code formatting: whitespace, new lines, quotes, braces.

   .. attribute:: default_config

   .. method:: analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, changes:Iterator[Change], **data)

      Analyze a set of changes from one revision to another.

      :param ptr_from: Git repository state pointer to the base revision.
      :param ptr_to: Git repository state pointer to the head revision.
      :param data_service: Connection to the Lookout data retrieval service.
      :param data: Contains "changes" - the list of changes in the pointed state.
      :param changes: Iterator of changes from the data service.
      :return: List of comments.

   .. classmethod:: check_training_required(cls, old_model:FormatModel, ptr:ReferencePointer, config:Mapping[str, Any], data_service:'lookout.core.data_requests.DataService', **data)

      Return True if the format model needs to be refreshed; otherwise, False.

      We calculate the ratio of the number of changed lines to the overall number of lines.
      If it is bigger than `lines_ratio_train_trigger`, we need to train.

      :param old_model: Current FormatModel.
      :param ptr: Git repository state pointer.
      :param config: configuration dict.
      :param data: contains "files" - the list of files in the pointed state.
      :param data_service: connection to the Lookout data retrieval service.
      :return: True or False

   .. classmethod:: train(cls, ptr:ReferencePointer, config:Mapping[str, Any], data_service:DataService, files:Iterator[File], **data)

      Train a model given the files available.

      :param ptr: Git repository state pointer.
      :param config: configuration dict.
      :param data: contains "files" - the list of files in the pointed state.
      :param data_service: connection to the Lookout data retrieval service.
      :param files: iterator of File records from the data service.
      :return: AnalyzerModel containing the learned rules, per language.

   .. method:: generate_file_fixes(self, data_service:DataService, changes:Sequence[Change])

      Generate all data required for any type of further processing.

      The subsequent processing can be either comment generation or performance report generation.

      :param data_service: Connection to the Lookout data retrieval service.
      :param changes: The list of changes in the pointed state.
      :return: Iterator with unrendered data per comment.

   .. method:: render_comment_text(self, file_fix:FileFix, fix_index:int)

      Generate the text of the comment for the specified line fix.

      :param file_fix: Information about the file fix required to render the comment text.
      :param fix_index: Index into `file_fix.line_fixes`. The comment is generated for this line fix.
      :return: string with the generated comment.

   .. method:: _generate_token_fixes(self, file:File, fe:FeatureExtractor, feature_extractor_output, bblfsh_stub:'bblfsh.aliases.ProtocolServiceStub', rules:Rules)

   .. staticmethod:: split_train_test(files:Sequence[File], test_dataset_ratio:float, random_state:int)

      Create a train-test split of the files collection.

      File size is estimated by its length. If there are at least two files, at least one is guaranteed to be in the test dataset.

      :param files: The list of `File`-s (see service_data.proto) of the same language.
      :param test_dataset_ratio: The fraction of data that should be taken for test dataset.
      :param random_state: Random state.
      :return: Train files and test files.

   .. staticmethod:: _get_comment_confidence(line_ys:Sequence[int], line_ys_pred:Sequence[int], line_winners:Sequence[int], rules:Rules)

   .. staticmethod:: _split_vnodes_by_lines(vnodes:List[VirtualNode])

      Split a VirtualNode into several one-line VirtualNodes if it spans several lines.

      The new line character is concatenated to the next line. It is applied to vnodes with y=None only.

   .. staticmethod:: _group_line_nodes(y:Sequence[int], y_pred:Sequence[int], vnodes_y:Sequence[VirtualNode], new_vnodes:List[VirtualNode], rule_winners:Sequence[int])

      Group virtual nodes and related lists from feature extractor by line number.

      It yields the line number and the sublists of corresponding items from all input sequences.
      Line sublists are skipped if the predicted and original labels do not differ.
      Line sublists are merged if the newline at the end was replaced by a target without a newline.
      It is a helper function for `FormatAnalyzer._generate_file_comments()`.

      :param y: Sequence of original labels.
      :param y_pred: Sequence of predicted labels by the model.
      :param vnodes_y: Sequence of the labeled `VirtualNode`-s corresponding to labeled samples.
      :param new_vnodes: Sequence of all the `VirtualNode`-s corresponding to the input with applied predictions. `CodeGenerator.apply_predicted_y()` is used for that.
      :param rule_winners: List of rule winners.
      :return: 1-based line number and sublists of corresponding items from all input sequences.

   .. classmethod:: _load_config(cls, config:Mapping[str, Any])

      Merge provided config with the default values.

      :param config: User-defined config.
      :return: Full config.

   .. classmethod:: _check_language_version(cls, language:str, data_service:DataService, log:logging.Logger)

      Return the value indicating whether the Babelfish driver version for the specified language is supported.
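
The retraining trigger described for ``check_training_required`` and the size-aware split described for ``split_train_test`` can be illustrated with a small standalone sketch. It does not reproduce the library's implementation: the function names ``needs_training`` and ``split_by_size`` are hypothetical, plain strings stand in for ``File`` records, and the handling of empty repositories, single-file inputs and shuffling order are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def needs_training(changed_lines: int, total_lines: int,
                   lines_ratio_train_trigger: float) -> bool:
    """Return True when the share of changed lines exceeds the trigger."""
    if total_lines <= 0:
        # Assumption: with nothing to compare against, do not request retraining.
        return False
    return changed_lines / total_lines > lines_ratio_train_trigger


def split_by_size(contents: Sequence[str], test_dataset_ratio: float,
                  random_state: int) -> Tuple[List[str], List[str]]:
    """Split file contents into (train, test), weighting files by their length."""
    shuffled = list(contents)
    if len(shuffled) < 2:
        # Assumption: a single file (or none) is kept entirely for training.
        return shuffled, []
    random.Random(random_state).shuffle(shuffled)
    target = test_dataset_ratio * sum(len(c) for c in shuffled)
    train: List[str] = []
    test: List[str] = []
    test_size = 0
    for content in shuffled:
        # Fill the test set until it holds the requested share of characters.
        if test_size < target:
            test.append(content)
            test_size += len(content)
        else:
            train.append(content)
    if not test:
        # Guarantee at least one test file when two or more files exist.
        test.append(train.pop())
    if not train:
        # Assumption: also keep at least one file for training.
        train.append(test.pop())
    return train, test
```

For instance, ``needs_training(30, 100, 0.2)`` compares a ratio of 0.3 against the trigger 0.2 and therefore requests retraining, while ``split_by_size`` returns the same partition for the same ``random_state``.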