lookout.style.format.benchmarks.evaluate_smoke

Module for Smoke dataset evaluation.

Module Contents

lookout.style.format.benchmarks.evaluate_smoke.EMPTY = ␣
lookout.style.format.benchmarks.evaluate_smoke.align2(seq1:Sequence, seq2:Sequence, seq2_ghost:Sequence=None)

Align two sequences using Levenshtein distance.

For example:

In[1]: align2("aabc", "abbcc")
Out[1]: ("aab␣c␣", "␣abbcc")
Parameters:
  • seq1 – First sequence to align.
  • seq2 – Second sequence to align.
  • seq2_ghost – All changes applied to the second sequence are also applied to seq2_ghost. Used by the align3 function. Example: In[1]: align2("aabbbc", "abbcc", "xxxxx") Out[1]: ("aabbbc␣", "␣abb␣cc", "␣xxx␣xx")
Returns:

Aligned sequences, plus the modified seq2_ghost if it was specified.
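For illustration, such a pairwise alignment can be sketched as the standard Levenshtein dynamic program plus a traceback that emits the gap character. This is a hypothetical reimplementation, not the module's code: `align_pair` is an invented name, and tie-breaking may place gaps differently than the real `align2`.

```python
EMPTY = "␣"  # gap character, mirroring evaluate_smoke.EMPTY

def align_pair(seq1, seq2, seq2_ghost=None):
    """Align seq1 and seq2 by Levenshtein distance, padding gaps with EMPTY.

    Gaps inserted into seq2 are mirrored into seq2_ghost, as described
    for align2's seq2_ghost parameter.
    """
    n, m = len(seq1), len(seq2)
    # dist[i][j] = edit distance between seq1[:i] and seq2[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # gap in seq2
                             dist[i][j - 1] + 1,         # gap in seq1
                             dist[i - 1][j - 1] + cost)  # match/substitute
    # Trace back from the bottom-right corner, emitting EMPTY for gaps.
    a1, a2, ghost = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (
                0 if seq1[i - 1] == seq2[j - 1] else 1):
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1])
            ghost.append(seq2_ghost[j - 1] if seq2_ghost else EMPTY)
            i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            a1.append(seq1[i - 1]); a2.append(EMPTY); ghost.append(EMPTY)
            i -= 1
        else:
            a1.append(EMPTY); a2.append(seq2[j - 1])
            ghost.append(seq2_ghost[j - 1] if seq2_ghost else EMPTY)
            j -= 1
    a1, a2, ghost = ["".join(reversed(s)) for s in (a1, a2, ghost)]
    return (a1, a2, ghost) if seq2_ghost is not None else (a1, a2)
```

Whatever the tie-breaking, the result always satisfies the alignment invariants: equal lengths, and stripping EMPTY recovers the inputs.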

lookout.style.format.benchmarks.evaluate_smoke.align3(seq1:Sequence, seq2:Sequence, seq3:Sequence)

Align three sequences using Levenshtein distance.

For example:

In[1]: align3("aabc", "abbcc", "ccdd")
Out[1]: ("aab␣c␣␣␣", "␣abbcc␣␣", "␣␣␣␣ccdd")

The result can be suboptimal because a heuristic is used. An exact computation requires ~ len(seq1) * len(seq2) * len(seq3) time.

Parameters:
  • seq1 – First sequence to align.
  • seq2 – Second sequence to align.
  • seq3 – Third sequence to align.
Returns:

Aligned sequences.
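The heuristic can be sketched as two pairwise alignments: align seq2 with seq3 first, then align seq1 against the gapped seq2, replaying the new gaps onto the gapped seq3 (the seq2_ghost mechanism of align2). The sketch below is hypothetical and uses difflib.SequenceMatcher instead of Levenshtein distance, so gap placement may differ from the real align3; `_align2` and `align3_sketch` are invented names.

```python
import difflib

EMPTY = "␣"  # gap character, mirroring evaluate_smoke.EMPTY

def _align2(seq1, seq2, seq2_ghost=None):
    # Pairwise alignment; gaps inserted into seq2 are mirrored into
    # seq2_ghost, which is how three sequences can be threaded through
    # two pairwise calls.
    out1, out2, ghost = [], [], []
    sm = difflib.SequenceMatcher(a=seq1, b=seq2, autojunk=False)
    for _tag, i1, i2, j1, j2 in sm.get_opcodes():
        width = max(i2 - i1, j2 - j1)  # pad the shorter span with gaps
        out1.append(seq1[i1:i2].ljust(width, EMPTY))
        out2.append(seq2[j1:j2].ljust(width, EMPTY))
        if seq2_ghost is not None:
            ghost.append(seq2_ghost[j1:j2].ljust(width, EMPTY))
    if seq2_ghost is not None:
        return "".join(out1), "".join(out2), "".join(ghost)
    return "".join(out1), "".join(out2)

def align3_sketch(seq1, seq2, seq3):
    # Heuristic: two pairwise alignments instead of a cubic-time
    # three-way dynamic program.
    mid, last = _align2(seq2, seq3)
    first, mid, last = _align2(seq1, mid, seq2_ghost=last)
    return first, mid, last
```

The output keeps the alignment invariants: all three strings have equal length and stripping EMPTY recovers the original sequences.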

lookout.style.format.benchmarks.evaluate_smoke.calc_aligned_metrics(bad_style_code:str, correct_style_code:str, generated_code:str)

Calculate model quality metrics for aligned sequences.

Metrics description:
  1. Amount of characters misdetected by the model as a style mistake. That is, nothing needed to be changed, but the model changed it.
  2. Amount of characters undetected by the model. That is, the character had to be changed, but the model did not change it.
  3. Amount of characters detected by the model as a style mistake, but the fix was wrong. That is, the character had to be changed and the model changed it, but incorrectly.
  4. Amount of characters detected by the model as a style mistake with a correct fix. That is, the character had to be changed and the model changed it correctly :tada:.

In scientific terms:
  1. False positive.
  2 + 3. False negatives. There are two kinds: either the error was missed and there is no fix, or the error was found but wrongly fixed.
  4. True positive.
Parameters:
  • bad_style_code – The file with style violations, taken from the head revision in the smoke dataset.
  • correct_style_code – The file with correct style, taken from the base revision in the smoke dataset.
  • generated_code – The FormatAnalyzer model output: the code with fixed style.
Returns:

Tuple with 4 metric values.
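The counting over pre-aligned sequences can be sketched by classifying each column of the three strings according to the rules above. `aligned_metrics_sketch` is a hypothetical illustration, not the module's implementation:

```python
EMPTY = "␣"  # gap character, mirroring evaluate_smoke.EMPTY

def aligned_metrics_sketch(bad, correct, predicted):
    """Classify each aligned column into one of the four metrics."""
    misdetected = undetected = wrong_fix = correct_fix = 0
    for b, c, p in zip(bad, correct, predicted):
        if b == c == p:
            continue                 # nothing to fix, nothing changed
        if b == p != c:
            undetected += 1          # mistake left in place
        elif c == p != b:
            correct_fix += 1         # mistake fixed correctly
        elif b == c != p:
            misdetected += 1         # new mistake introduced
        else:                        # all three characters differ
            wrong_fix += 1           # mistake found, but fix is wrong
    return misdetected, undetected, wrong_fix, correct_fix
```

Running it on the "import abcd" alignment from the calc_metrics description yields one misdetection, one undetected mistake, and one correct fix.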

lookout.style.format.benchmarks.evaluate_smoke.calc_metrics(bad_style_code:str, correct_style_code:str, fe:FeatureExtractor, vnodes:Sequence[VirtualNode], url:str, commit:str)

Calculate metrics for model output.

Algorithm description:
  1. For the given model predictions y_pred we generate a new file.
  2. Now we have 3 files to compare:
     1. bad_style_code. The file from the head revision where style mistakes were applied. We inspect this file to find them.
     2. correct_style_code. The file from the base revision. We use it to train the repository format model. In the ideal case, we should be able to restore it.
     3. predicted_style. The file we get as the format model output.
  3. We compare the files on the character level. To do so we have to align them first; the align3 function is used for that. An example:
>>> bad_style_code = "import   abcd"
>>> correct_style_code = "import abcd"
>>> predicted_code = "import  abcd,"
>>> align3(bad_style_code, correct_style_code, predicted_code)
("import   abcd␣",
 "import ␣␣abcd␣",
 "import  ␣abcd,")
4. Now we are able to compare sequences character by character. `calc_aligned_metrics` function
   is used for that. We can have 5 cases here. Let's consider them in the same example:
   ("import   abcd␣",  # aligned bad_style_code
    "import ␣␣abcd␣",  # aligned correct_style_code
    "import  ␣abcd,")  # aligned predicted_code
     ^      ^^    ^
     1      23    4
  1. All characters are equal. Everything is fine.
  2. Characters in the bad style and predicted code are equal, but differ from the correct code. So, a style mistake went undetected.
  3. Characters in the correct style and predicted code are equal, but differ from the bad style code. So, a style mistake was detected and correctly fixed.
  4. Characters in the bad style and correct style code are equal, but differ from the predicted code. So, a new style mistake was introduced. We call this situation a misdetection and want to avoid it as much as possible.
  5. All three characters are different. There is no such case in the example, but it means a style mistake was detected but wrongly fixed.

Thus, as output we have 4 numbers:
  1. style mistake misdetection
  2. undetected style mistake
  3. detected style mistake with the wrong fix
  4. detected style mistake with the correct fix

In scientific terms:
  1. False positive.
  2 + 3. False negatives. There are two kinds: either the error was missed and there is no fix, or the error was found but wrongly fixed.
  4. True positive.
Parameters:
  • bad_style_code – The file from the head revision where style mistakes were applied.
  • correct_style_code – The file from the base revision. In the ideal case, we should be able to restore it.
  • fe – Feature extraction class that was used to generate corresponding data. Set a value to None if no changes were introduced for bad_style_code.
  • vnodes – Sequence of all the VirtualNode-s corresponding to the input code file. Should be ordered by position. New y values should be applied.
  • url – Repository url if applicable. Useful for more informative warning messages.
  • commit – Commit hash if applicable. Useful for more informative warning messages.
Returns:

A dictionary with losses and predicted code.
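Given the false positive / false negative / true positive mapping of the four counts, precision and recall follow directly. A small illustrative helper, not part of the module API:

```python
def precision_recall(misdetected, undetected, wrong_fix, correct_fix):
    """Map the four counts onto precision and recall.

    Per the "in scientific terms" mapping: FP = misdetected,
    FN = undetected + wrong_fix, TP = correct_fix.
    """
    tp = correct_fix
    fp = misdetected
    fn = undetected + wrong_fix
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, one misdetection, one undetected mistake, and one correct fix gives precision 0.5 and recall 0.5.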

class lookout.style.format.benchmarks.evaluate_smoke.SmokeEvalFormatAnalyzer

Bases: lookout.style.format.analyzer.FormatAnalyzer

Analyzer for Smoke dataset evaluation.

REPORT_COLNAMES = ['repo', 'filepath', 'style', 'misdetection', 'undetected', 'detected_wrong_fix', 'detected_correct_fix', 'bad_style_file', 'correct_style_file', 'predicted_file']
_dump_report(self, report:List[dict], outputpath:Path)
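Rows keyed by REPORT_COLNAMES could be serialized, for instance, with csv.DictWriter. This is a hypothetical sketch of such a dump, not the actual body of `_dump_report`:

```python
import csv
import io

REPORT_COLNAMES = ["repo", "filepath", "style", "misdetection", "undetected",
                   "detected_wrong_fix", "detected_correct_fix",
                   "bad_style_file", "correct_style_file", "predicted_file"]

def dump_report_sketch(report, stream):
    # Write one CSV row per analyzed file, columns in REPORT_COLNAMES order.
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLNAMES)
    writer.writeheader()
    writer.writerows(report)

# Usage with an in-memory stream; a real caller would open outputpath.
buf = io.StringIO()
dump_report_sketch([dict.fromkeys(REPORT_COLNAMES, "x")], buf)
```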
analyze(self, ptr_from:ReferencePointer, ptr_to:ReferencePointer, data_service:DataService, **data)

Analyze a set of changes from one revision to another.

Parameters:
  • ptr_from – Git repository state pointer to the base revision.
  • ptr_to – Git repository state pointer to the head revision.
  • data_service – Connection to the Lookout data retrieval service.
  • data – Contains "changes" – the list of changes in the pointed state.
Returns:

List of comments.

lookout.style.format.benchmarks.evaluate_smoke.evaluate_smoke_entry(inputpath:str, reportdir:str, database:str, bblfsh:str, config:dict)

CLI entry point.