filter_dataset
Module Contents
filter_dataset.remove_non_typos(dataset: str, filtered_dataset: str)
Remove non-typo-ed identifiers from the dataset.
- Remove examples where the token splits of the wrong and the correct identifiers are equal (that is, the identifiers differ only in non-alphabetic characters or casing).
- Remove examples where the wrong and the correct identifiers are equal at the lemma level.
Parameters:
- dataset – Path to the dataset.
- filtered_dataset – Path to save the filtered dataset to.
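The two removal rules can be illustrated with a minimal sketch. This is not the module's actual implementation: the helper names `split_tokens` and `is_typo`, the splitting regex, and the pluggable `lemmatize` callable are all assumptions made for illustration, and the real function may split and lemmatize identifiers differently.

```python
import re
from typing import Callable, List

# Assumed splitting rule: break on non-alphabetic characters and on
# camelCase boundaries, keeping runs of capitals (e.g. "HTTP") together.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_tokens(identifier: str) -> List[str]:
    """Split an identifier into lowercase alphabetic tokens."""
    return [token.lower() for token in _TOKEN_RE.findall(identifier)]


def is_typo(
    wrong: str,
    correct: str,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> bool:
    """Return True if the pair should be kept as a genuine typo example.

    `lemmatize` is a hypothetical hook; the default leaves tokens unchanged.
    """
    wrong_tokens = split_tokens(wrong)
    correct_tokens = split_tokens(correct)
    # Rule 1: identical token splits mean the identifiers differ only in
    # non-alphabetic characters or casing.
    if wrong_tokens == correct_tokens:
        return False
    # Rule 2: identical lemma sequences mean the identifiers differ only
    # in word form.
    wrong_lemmas = [lemmatize(token) for token in wrong_tokens]
    correct_lemmas = [lemmatize(token) for token in correct_tokens]
    return wrong_lemmas != correct_lemmas
```

Under these assumptions, `is_typo("get_value", "getValue")` is rejected by the first rule, while `is_typo("get_vlaue", "getValue")` is kept. The second rule only has an effect when a real lemmatizer is passed in place of the identity default.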