prepare_dataset¶
Filter and prepare the dataset for evaluation. Run it on a dataset produced by typos_preprocessing.ipynb.
Module Contents¶
-
prepare_dataset.Changes¶
-
prepare_dataset.COLUMNS= ['identifier', 'correct_id', 'filename', 'line', 'commit', 'repository']¶
-
prepare_dataset.NEW_COLUMNS¶
-
prepare_dataset.COL2IND¶
-
prepare_dataset.NEW_COL2IND¶
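The names above suggest that COL2IND maps each column name in COLUMNS to its position in a row. The page does not show how the mapping is built, so the following is only a minimal sketch under that assumption. The sample row values are hypothetical and not taken from the dataset, and the contents of NEW_COLUMNS are not documented here, so they are left out.

```python
# Column list as documented for prepare_dataset.COLUMNS.
COLUMNS = ['identifier', 'correct_id', 'filename', 'line', 'commit', 'repository']

# Assumption: COL2IND is a name -> position lookup derived from COLUMNS.
COL2IND = {name: index for index, name in enumerate(COLUMNS)}

# Hypothetical row, used only to illustrate positional access by column name.
row = ['tset', 'test', 'src/app.py', '12', 'abc1234', 'example/repo']

filename = row[COL2IND['filename']]
commit = row[COL2IND['commit']]
print(filename, commit)
```

A lookup like this lets code address row fields by name when rows are handled as plain lists or tuples.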
-
class prepare_dataset.IdentifierFileCommitRanger(*, filename:str, repository:str, identifier:str, commit:str, directory:Optional[str]=None)¶
Find the first commit in which the identifier was added to the file.
-
_log¶
-
_run_cmd(self, cmd, step, cwd=None, env=None)¶
-
_clone(self)¶
-
_checkout(self)¶
-
_blame(self, filename=None)¶
-
static _validate_date(text)¶
-
_get_full_hash(self, short_hash)¶
-
_get_diff(self)¶
-
_to_changes(self, line)¶
-
_pipeline(self)¶
-
__call__(self)¶
-
static _find_deleted_file(text, filename=None)¶
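The method names above (_clone, _checkout, _blame, _get_diff) indicate that the class drives git through subprocess calls, but their bodies are not shown on this page. As an illustration of the underlying task only, here is a sketch that uses a different technique, git's pickaxe search (`git log -S`), to list the commits that changed the number of occurrences of an identifier in a file, oldest first. The function names `build_pickaxe_cmd`, `first_hash`, and `first_commit_with_identifier` are hypothetical and are not part of prepare_dataset.

```python
import subprocess
from typing import List, Optional


def build_pickaxe_cmd(identifier: str, filename: str) -> List[str]:
    """Build a git command listing commits that add or remove `identifier`."""
    # --reverse prints the oldest matching commit first; %H is the full hash.
    return ['git', 'log', '--reverse', '--format=%H',
            f'-S{identifier}', '--', filename]


def first_hash(log_output: str) -> Optional[str]:
    """Return the first non-empty line of `git log` output, or None."""
    for line in log_output.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def first_commit_with_identifier(repo_dir: str, identifier: str,
                                 filename: str) -> Optional[str]:
    """Run the pickaxe search inside an already cloned repository."""
    result = subprocess.run(build_pickaxe_cmd(identifier, filename),
                            cwd=repo_dir, capture_output=True,
                            text=True, check=True)
    return first_hash(result.stdout)
```

The pickaxe search matches plain substrings and does not follow renames, so a blame-based approach such as the one the method names hint at may behave differently on files that were moved.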
-
prepare_dataset._parallel_comp(args)¶
-
prepare_dataset.pipeline(input_csv, output_csv, n_cores=1, cache='/tmp')¶
Find the hash of the first commit in which the identifier appears in the file.
Parameters:
- input_csv – Path to the input CSV file.
- output_csv – Path where the resulting CSV file is stored.
- n_cores – Number of CPU cores to use.
- cache – Cache location. If empty, caching is disabled.
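The n_cores parameter and the single-argument helper _parallel_comp(args) suggest that rows are distributed across workers, each receiving one packed argument. The actual implementation is not shown here. The sketch below illustrates that pattern with a thread pool from the standard library, which is a swapped-in choice rather than the module's confirmed mechanism. The names `_worker`, `process_rows`, and the `first_commit` output field are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def _worker(args: Tuple[Row, Callable[[Row], str]]) -> Row:
    """Unpack one task and return a copy of the row with the result added."""
    row, find_commit = args
    out = dict(row)
    out['first_commit'] = find_commit(row)  # hypothetical output field
    return out


def process_rows(rows: List[Row], find_commit: Callable[[Row], str],
                 n_cores: int = 1) -> List[Row]:
    """Apply `find_commit` to every row, optionally with several workers."""
    tasks = [(row, find_commit) for row in rows]
    if n_cores <= 1:
        # Sequential path: no pool is created.
        return [_worker(task) for task in tasks]
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(_worker, tasks))
```

Packing everything a worker needs into one tuple keeps the worker compatible with map-style pool interfaces, which pass a single item per call.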
-
prepare_dataset.parse_args()¶
-
prepare_dataset.args¶