prepare_dataset
Filter and prepare a dataset for evaluation. Run it on a dataset that was prepared by typos_preprocessing.ipynb.
Module Contents
- prepare_dataset.Changes
- prepare_dataset.COLUMNS = ['identifier', 'correct_id', 'filename', 'line', 'commit', 'repository']
- prepare_dataset.NEW_COLUMNS
- prepare_dataset.COL2IND
- prepare_dataset.NEW_COL2IND
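The values of `COL2IND` and `NEW_COL2IND` are not shown on this page. Judging only by the names, they probably map each column name to its position in the corresponding column list. The sketch below illustrates that reading; the mapping construction, the extra `first_commit` column, and the sample row are all assumptions made for illustration, not the module's actual definitions.

```python
# Documented on this page.
COLUMNS = ["identifier", "correct_id", "filename", "line", "commit", "repository"]

# Assumption: COL2IND maps each column name to its index in COLUMNS.
COL2IND = {name: index for index, name in enumerate(COLUMNS)}

# Hypothetical extended schema; the real NEW_COLUMNS value is not documented here.
NEW_COLUMNS = COLUMNS + ["first_commit"]
NEW_COL2IND = {name: index for index, name in enumerate(NEW_COLUMNS)}

# Made-up sample row, used only to show index-based field access.
row = ["recieve", "receive", "src/net.py", "42", "abc1234", "https://example.com/repo.git"]
filename = row[COL2IND["filename"]]
commit = row[COL2IND["commit"]]
```

A name-to-index mapping of this kind lets code read fields from plain list rows without repeating `COLUMNS.index(...)` lookups.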
- class prepare_dataset.IdentifierFileCommitRanger(*, filename: str, repository: str, identifier: str, commit: str, directory: Optional[str] = None)
  Find the first commit in which the identifier was added to the file.
  - _log
  - _run_cmd(self, cmd, step, cwd=None, env=None)
  - _clone(self)
  - _checkout(self)
  - _blame(self, filename=None)
  - static _validate_date(text)
  - _get_full_hash(self, short_hash)
  - _get_diff(self)
  - _to_changes(self, line)
  - _pipeline(self)
  - __call__(self)
  - static _find_deleted_file(text, filename=None)
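The method names suggest that the class clones the repository, runs blame and diff commands, and converts diff lines into changes, but their bodies are not documented here. As one illustration of the diff-inspection step, the sketch below scans `git diff` style output for added lines that contain an identifier. The function name `added_lines_with_identifier`, the regular expressions, and the sample diff are hypothetical and do not describe the class's actual implementation.

```python
import re
from typing import List, Tuple

# Matches a unified-diff hunk header such as "@@ -10,3 +12,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines_with_identifier(diff_text: str, identifier: str) -> List[Tuple[int, str]]:
    """Return (new_line_number, text) for added lines containing the identifier as a whole word."""
    word = re.compile(r"\b" + re.escape(identifier) + r"\b")
    found = []
    new_line = None  # line number in the new file; None while outside a hunk
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git"):
            new_line = None  # a new file section starts; skip its header lines
            continue
        header = HUNK_HEADER.match(raw)
        if header:
            new_line = int(header.group(1))
            continue
        if new_line is None:
            continue  # file header lines such as "--- a/x" and "+++ b/x"
        if raw.startswith("+"):
            if word.search(raw[1:]):
                found.append((new_line, raw[1:]))
            new_line += 1
        elif raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline" markers do not advance the new file
        else:
            new_line += 1  # context line
    return found


# Made-up diff used only to exercise the function.
sample_diff = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 import os
-count = 0
+counter = 0
+print(counter)
 def main():
"""
hits = added_lines_with_identifier(sample_diff, "counter")
```

Whole-word matching matters here: searching the sample for `count` returns nothing, because `counter` only contains it as a substring and the removed line is ignored.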
- prepare_dataset._parallel_comp(args)
- prepare_dataset.pipeline(input_csv, output_csv, n_cores=1, cache='/tmp')
  Find the hash of the first commit in which the identifier appears in the file.
  Parameters:
  - input_csv – Path to the input CSV.
  - output_csv – Path where the resulting CSV is stored.
  - n_cores – Number of cores to use.
  - cache – Cache location. If empty, caching is disabled.
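The parameters above describe a read, process, write flow that can be spread over several cores. The sketch below shows one common way to structure such a flow with the standard library. The names `process_row`, `run_rows`, and `sketch_pipeline`, the placeholder result, the `first_commit` column, and the assumption that the input CSV has a header row are all hypothetical. Caching is left out, and the real worker would inspect git history rather than return a constant.

```python
import csv
from multiprocessing import Pool


def process_row(row):
    """Placeholder worker (hypothetical): the real one would look up the first commit."""
    return row + ["UNKNOWN"]


def run_rows(rows, n_cores=1):
    """Apply process_row to every row, using a process pool when n_cores > 1."""
    if n_cores <= 1:
        return [process_row(row) for row in rows]
    with Pool(processes=n_cores) as pool:
        return pool.map(process_row, rows)  # map preserves the input order


def sketch_pipeline(input_csv, output_csv, n_cores=1):
    """Read rows, process them, and write the results with one extra column."""
    with open(input_csv, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumption: the first row is a header
        rows = list(reader)
    results = run_rows(rows, n_cores=n_cores)
    with open(output_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header + ["first_commit"])
        writer.writerows(results)
    return len(results)
```

The worker takes a single argument so that it can be passed directly to `Pool.map`, which may be why the module's own helper has the signature `_parallel_comp(args)`. On platforms that start worker processes with spawn, a call with `n_cores > 1` should sit under an `if __name__ == "__main__":` guard.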
- prepare_dataset.parse_args()
- prepare_dataset.args