:mod:`lookout.style.format.benchmarks.evaluate_smoke`
=====================================================

.. py:module:: lookout.style.format.benchmarks.evaluate_smoke

.. autoapi-nested-parse::

   Module for Smoke dataset evaluation.



Module Contents
---------------

.. data:: EMPTY
   :annotation: = ␣


.. function:: align2(seq1:Sequence, seq2:Sequence, seq2_ghost:Sequence=None)

   Align two sequences using Levenshtein distance.

   For example::

      In[1]: align2("aabc", "abbcc")
      Out[1]: ("aab␣c␣",
               "␣abbcc")

   :param seq1: First sequence to align.
   :param seq2: Second sequence to align.
   :param seq2_ghost: All changes made to the second sequence are also applied to seq2_ghost.
                      Used by the align3 function. Example::

                         In[1]: align2("aabbbc", "abbcc", "xxxxx")
                         Out[1]: ("aabbbc␣",
                                  "␣abb␣cc",
                                  "␣xxx␣xx")

   :return: Aligned sequences and the modified seq2_ghost if it was specified.


.. function:: align3(seq1:Sequence, seq2:Sequence, seq3:Sequence)

   Align three sequences using Levenshtein distance.

   For example::

      In[1]: align3("aabc", "abbcc", "ccdd")
      Out[1]: ("aab␣c␣␣␣",
               "␣abbcc␣␣",
               "␣␣␣␣ccdd")

   The result can be suboptimal because a heuristic is used. An exact computation requires
   ~ len(seq1) * len(seq2) * len(seq3) time.

   :param seq1: First sequence to align.
   :param seq2: Second sequence to align.
   :param seq3: Third sequence to align.
   :return: Aligned sequences.


.. function:: calc_aligned_metrics(bad_style_code:str, correct_style_code:str, generated_code:str)

   Calculate model quality metrics for aligned sequences.

   Metrics description:

   1. Number of characters misdetected by the model as a style mistake.
      That is, nothing needed to be changed, but the model changed it.
   2. Number of characters undetected by the model.
      That is, the character had to be changed, but the model did not change it.
   3. Number of characters detected by the model as a style mistake, but the fix was wrong.
      That is, the character had to be changed and the model changed it, but incorrectly.
   4. Number of characters detected by the model as a style mistake, and the fix was correct.
      That is, the character had to be changed and the model changed it correctly :tada:.

   In scientific words:

   - 1: False positive.
   - 2 + 3: False negative. We have two types of false negatives. The first is when the error
     was missed and there is no fix. The second is when the error was found but wrongly fixed.
   - 4: True positive.

   :param bad_style_code: The file with style violations. These are files from the head revision in the smoke dataset.
   :param correct_style_code: The file with correct style. These are files from the base revision in the smoke dataset.
   :param generated_code: Format Analyzer model output. The code with fixed style.
   :return: Tuple with 4 metric values.


.. function:: calc_metrics(bad_style_code:str, correct_style_code:str, fe:FeatureExtractor, vnodes:Sequence[VirtualNode], url:str, commit:str)

   Calculate metrics for model output.

   Algorithm description:

   1. For the given model predictions `y_pred`, we generate a new file.
      Now we have 3 files to compare:

      1. `bad_style_code`. The file from the head revision where style mistakes were applied.
         We inspect this file to find them.
      2. `correct_style_code`. The file from the base revision. We use this file to train the
         repo format model. In the ideal case, we should be able to restore this file.
      3. `predicted_code`. The file we get as the format model output.

   2. We compare the files at the character level. To do so, we have to align them first.
      The `align3` function is used for that. Here is an example::

         >>> bad_style_code = "import   abcd"
         >>> correct_style_code = "import abcd"
         >>> predicted_code = "import  abcd,"
         >>> print(align3(bad_style_code, correct_style_code, predicted_code))
         ("import   abcd␣",
          "import ␣␣abcd␣",
          "import  ␣abcd,")

   3. Now we are able to compare the sequences character by character.
      The `calc_aligned_metrics` function is used for that. There are 5 possible cases.
      Let's consider them using the same example::

         ("import   abcd␣",  # aligned bad_style_code
          "import ␣␣abcd␣",  # aligned correct_style_code
          "import  ␣abcd,")  # aligned predicted_code
           ^      ^^    ^
           1      23    4

      1. All characters are equal. Everything is fine.
      2. The characters in the bad style code and the predicted code are equal, but the
         character in the correct code is different. So the style mistake is undetected.
      3. The characters in the correct style code and the predicted code are equal, but the
         character in the bad style code is different. So the style mistake is detected and
         correctly fixed.
      4. The characters in the bad style code and the correct style code are equal, but the
         character in the predicted code is different. So a new style mistake is introduced.
         We call this situation misdetection and want to avoid it as much as possible.
      5. All characters are different. There is no such case in the example, but it means
         that the style mistake is detected but wrongly fixed.

   Thus, as output we have 4 numbers:

   1. style mistake misdetection,
   2. undetected style mistake,
   3. detected style mistake with the wrong fix,
   4. detected style mistake with the correct fix.

   In scientific words:

   - 1: False positive.
   - 2 + 3: False negative. We have two types of false negatives. The first is when the error
     was missed and there is no fix. The second is when the error was found but wrongly fixed.
   - 4: True positive.

   :param bad_style_code: The file from the head revision where style mistakes were applied.
   :param correct_style_code: The file from the base revision. In the ideal case, we should be able to restore it.
   :param fe: Feature extraction class that was used to generate the corresponding data. Set to None if no changes were introduced for `bad_style_code`.
   :param vnodes: Sequence of all the `VirtualNode`-s corresponding to the input code file. Should be ordered by position. New y values should be applied.
   :param url: Repository URL if applicable. Useful for more informative warning messages.
   :param commit: Commit hash if applicable. Useful for more informative warning messages.
   :return: A dictionary with losses and predicted code.


.. py:class:: SmokeEvalFormatAnalyzer

   Bases: :class:`lookout.style.format.analyzer.FormatAnalyzer`

   Analyzer for Smoke dataset evaluation.

   .. attribute:: REPORT_COLNAMES
      :annotation: = ['repo', 'filepath', 'style', 'misdetection', 'undetected', 'detected_wrong_fix', 'detected_correct_fix', 'bad_style_file', 'correct_style_file', 'predicted_file']


   .. method:: _dump_report(self, report:List[dict], outputpath:Path)


   .. method:: analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, **data)

      Analyze a set of changes from one revision to another.

      :param ptr_from: Git repository state pointer to the base revision.
      :param ptr_to: Git repository state pointer to the head revision.
      :param data_service: Connection to the Lookout data retrieval service.
      :param data: Contains "changes" - the list of changes in the pointed state.
      :return: List of comments.


.. function:: evaluate_smoke_entry(inputpath:str, reportdir:str, database:str, bblfsh:str, config:dict)

   CLI entry point.
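
The five per-character cases described for `calc_metrics` can be sketched as a small classifier over strings that are already aligned. This is only an illustration under stated assumptions, not the module's implementation: the function name `count_aligned_cases` is hypothetical, the inputs are assumed to have equal length, and gaps are assumed to be marked with the `EMPTY` character.

```python
from typing import Tuple

EMPTY = "␣"  # gap marker, mirroring the module-level EMPTY constant


def count_aligned_cases(bad: str, correct: str, predicted: str) -> Tuple[int, int, int, int]:
    """Hypothetical helper: count the four metrics over aligned strings.

    Returns (misdetection, undetected, detected_wrong_fix, detected_correct_fix).
    """
    if not (len(bad) == len(correct) == len(predicted)):
        raise ValueError("all three sequences must be aligned to the same length")

    misdetection = undetected = wrong_fix = correct_fix = 0
    for b, c, p in zip(bad, correct, predicted):
        if b == c == p:
            continue  # case 1: nothing to fix and nothing changed
        if b == p:
            undetected += 1  # case 2: mistake left as it was
        elif c == p:
            correct_fix += 1  # case 3: mistake fixed correctly
        elif b == c:
            misdetection += 1  # case 4: model changed a correct character
        else:
            wrong_fix += 1  # case 5: mistake changed, but to the wrong character
    return misdetection, undetected, wrong_fix, correct_fix


# The aligned example from the calc_metrics description, with spaces made explicit.
bad = "import" + " " * 3 + "abcd" + EMPTY
correct = "import" + " " + EMPTY * 2 + "abcd" + EMPTY
predicted = "import" + " " * 2 + EMPTY + "abcd" + ","

print(count_aligned_cases(bad, correct, predicted))  # (1, 1, 0, 1)
```

In terms of the metrics above, the first value is the false positive count, the second and third together are the false negatives, and the fourth is the true positive count.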