lookout.style.format.analyzer

Analyzer that detects bad formatting by learning from the existing code in the repository.

Module Contents

lookout.style.format.analyzer.LineFix
lookout.style.format.analyzer.FileFix
class lookout.style.format.analyzer.FormatAnalyzer(model:FormatModel, url:str, config:MutableMapping[str, Any])

Bases: lookout.core.analyzer.Analyzer

Detect bad formatting by training on existing code and analyzing pull requests.

model_type
name = style.format.analyzer.FormatAnalyzer
vendor = source{d}
version = 1
description = Source code formatting: whitespace, new lines, quotes, braces.
default_config
analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, changes:Iterator[Change], **data)

Analyze a set of changes from one revision to another.

Parameters:
  • ptr_from – Git repository state pointer to the base revision.
  • ptr_to – Git repository state pointer to the head revision.
  • data_service – Connection to the Lookout data retrieval service.
  • data – Contains “changes”: the list of changes in the pointed state.
  • changes – Iterator of changes from the data service.
Returns:

List of comments.

classmethod check_training_required(cls, old_model:FormatModel, ptr:ReferencePointer, config:Mapping[str, Any], data_service:'lookout.core.data_requests.DataService', **data)

Return True if the format model needs to be refreshed; otherwise, False.

We compute the ratio of the number of changed lines to the overall number of lines; if it exceeds lines_ratio_train_trigger, retraining is required.

Parameters:
  • old_model – Current FormatModel.
  • ptr – Git repository state pointer.
  • config – Configuration dict.
  • data – Contains “files”: the list of files in the pointed state.
  • data_service – Connection to the Lookout data retrieval service.
Returns:

True or False
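
The retraining trigger above boils down to a single ratio check. Here is a minimal sketch of that logic; the helper name needs_retraining and the 0.2 default for lines_ratio_train_trigger are assumptions for illustration, not the analyzer's actual API or default value.

```python
def needs_retraining(changed_lines: int, total_lines: int,
                     lines_ratio_train_trigger: float = 0.2) -> bool:
    """Return True when the changed-to-total line ratio exceeds the trigger."""
    if total_lines == 0:
        return False  # an empty repository gives nothing to retrain on
    # retrain once the fraction of changed lines passes the threshold
    return changed_lines / total_lines > lines_ratio_train_trigger
```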

classmethod train(cls, ptr:ReferencePointer, config:Mapping[str, Any], data_service:DataService, files:Iterator[File], **data)

Train a model given the files available.

Parameters:
  • ptr – Git repository state pointer.
  • config – Configuration dict.
  • data – Contains “files”: the list of files in the pointed state.
  • data_service – Connection to the Lookout data retrieval service.
  • files – Iterator of File records from the data service.
Returns:

AnalyzerModel containing the learned rules, per language.

generate_file_fixes(self, data_service:DataService, changes:Sequence[Change])

Generate all the data required for further processing.

Such processing can be comment generation or performance report generation.

Parameters:
  • data_service – Connection to the Lookout data retrieval service.
  • changes – The list of changes in the pointed state.
Returns:

Iterator with unrendered data per comment.

render_comment_text(self, file_fix:FileFix, fix_index:int)

Generate the text of the comment for the specified line fix.

Parameters:
  • file_fix – Information about file fix required to render a comment text.
  • fix_index – Index into file_fix.line_fixes; the comment is generated for this line fix.
Returns:

String with the generated comment.

_generate_token_fixes(self, file:File, fe:FeatureExtractor, feature_extractor_output, bblfsh_stub:'bblfsh.aliases.ProtocolServiceStub', rules:Rules)
static split_train_test(files:Sequence[File], test_dataset_ratio:float, random_state:int)

Create a train/test split of the files collection.

File size is estimated by its length. If there are at least two files, at least one is guaranteed to land in the test dataset.

Parameters:
  • files – The list of File-s (see service_data.proto) of the same language.
  • test_dataset_ratio – The fraction of the data to reserve for the test dataset.
  • random_state – Random state.
Returns:

Train files and test files.
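
The split described above can be sketched as follows. This is a hypothetical re-implementation for illustration: files are represented as plain strings standing in for the File protobuf records, sizes are estimated by string length, and the "at least one test file when there are two or more files" guarantee is enforced explicitly.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(files: Sequence[str], test_dataset_ratio: float,
                     random_state: int) -> Tuple[List[str], List[str]]:
    """Shuffle files deterministically and split them by estimated size."""
    rng = random.Random(random_state)
    shuffled = list(files)
    rng.shuffle(shuffled)
    total_size = sum(len(f) for f in shuffled)
    train, test, test_size = [], [], 0
    for f in shuffled:
        # fill the test set until it reaches the requested size fraction;
        # force at least one test file when two or more files are available
        if test_size < test_dataset_ratio * total_size or \
                (not test and len(shuffled) >= 2):
            test.append(f)
            test_size += len(f)
        else:
            train.append(f)
    return train, test
```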

static _get_comment_confidence(line_ys:Sequence[int], line_ys_pred:Sequence[int], line_winners:Sequence[int], rules:Rules)
static _split_vnodes_by_lines(vnodes:List[VirtualNode])

Split a VirtualNode into several one-line VirtualNodes if it spans several lines.

The newline character is concatenated to the next line. This is applied only to vnodes with y=None.
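
A simplified sketch of this splitting rule, assuming a virtual node is modeled as a (value, line) tuple; the real VirtualNode carries positions and labels, so this only illustrates how a multi-line value is cut into one piece per line with each newline attached to the line that follows it.

```python
def split_by_lines(value, start_line):
    """Split a multi-line token value into one (piece, line) tuple per line."""
    parts = value.split("\n")
    nodes = []
    for i, part in enumerate(parts):
        # per the docstring above, each newline character is
        # concatenated to the line that follows it
        if i > 0:
            part = "\n" + part
        if part:  # drop empty pieces
            nodes.append((part, start_line + i))
    return nodes
```

Joining the pieces back together reproduces the original value, so the split loses no text.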

static _group_line_nodes(y:Sequence[int], y_pred:Sequence[int], vnodes_y:Sequence[VirtualNode], new_vnodes:List[VirtualNode], rule_winners:Sequence[int])

Group virtual nodes and the related lists from the feature extractor by line number.

It yields the line number and sublists of the corresponding items from all input sequences. Line sublists are skipped when the predicted labels do not differ from the original ones, and merged when a trailing newline was replaced by a target without a newline. This is a helper function for FormatAnalyzer._generate_file_comments()

Parameters:
  • y – Sequence of original labels.
  • y_pred – Sequence of predicted labels by the model.
  • vnodes_y – Sequence of the labeled VirtualNode-s corresponding to labeled samples.
  • new_vnodes – Sequence of all the VirtualNode-s corresponding to the input with the predictions applied; CodeGenerator.apply_predicted_y() is used for that.
  • rule_winners – List of rule winners.
Returns:

1-based line number and sublists of corresponding items from all input sequences.
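
The "skip unchanged lines" part of this grouping can be sketched with itertools.groupby. This is an illustrative simplification, not the analyzer's code: each sample is a hypothetical (line, y, y_pred) tuple, and a line is yielded only when at least one prediction differs from the original label.

```python
from itertools import groupby


def group_changed_lines(samples):
    """Yield (line, group) only for lines where some prediction differs."""
    # groupby requires the samples to be contiguous by line number,
    # which holds for feature-extractor output ordered by position
    for line, group in groupby(samples, key=lambda s: s[0]):
        group = list(group)
        # skip the line when every predicted label matches the original
        if any(y != y_pred for _, y, y_pred in group):
            yield line, group
```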

classmethod _load_config(cls, config:Mapping[str, Any])

Merge the provided config with the default values.

Parameters:
  • config – User-defined config.
Returns:

Full config.
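
A plausible sketch of this merge, assuming the defaults form a nested mapping that user values overlay recursively; the config keys shown in the usage below (train, random_state, lines_ratio_train_trigger) are illustrative, not the analyzer's documented schema.

```python
from collections.abc import Mapping


def merge_configs(default, user):
    """Overlay the user config on the defaults, recursing into nested sections."""
    merged = dict(default)
    for key, value in user.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # both sides are mappings: deep-merge the nested section
            merged[key] = merge_configs(merged[key], value)
        else:
            # user value (or a non-mapping) replaces the default outright
            merged[key] = value
    return merged
```

Unspecified keys keep their default values, so a user config only needs to name the settings it overrides.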
classmethod _check_language_version(cls, language:str, data_service:DataService, log:logging.Logger)

Return the value indicating whether the Babelfish driver version for the specified language is supported.