filter_dataset
Module Contents
filter_dataset.remove_non_typos(dataset: str, filtered_dataset: str)
Remove non-typo identifiers from the dataset.
- Remove examples where the token splits of the wrong and the correct identifiers are equal (that is, the identifiers differ only in non-alphabetic characters or casing).
- Remove examples where the wrong and the correct identifiers are equal at the lemma level.
Parameters:
- dataset – Path to the dataset.
- filtered_dataset – Path to save the filtered dataset to.
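The two filtering rules can be illustrated with a minimal sketch. This is not the module's actual implementation: the helper names `split_tokens`, `naive_lemma`, and `is_typo` are hypothetical, the tokenization regex is an assumption, and `naive_lemma` is a crude stand-in for whatever lemmatizer the module really uses. The on-disk dataset format is not documented here, so the sketch works on a single (wrong, correct) identifier pair rather than on files.

```python
import re
from typing import Callable, List


def split_tokens(identifier: str) -> List[str]:
    """Split an identifier into lowercase alphabetic tokens (assumed scheme)."""
    # Acronym runs (e.g. "HTTP") or capitalized/lowercase words; digits,
    # underscores and other non-alphabetic characters are dropped.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [part.lower() for part in parts]


def naive_lemma(token: str) -> str:
    """Placeholder lemmatizer: strips a trailing 's' from longer tokens."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def is_typo(
    wrong: str,
    correct: str,
    lemmatize: Callable[[str], str] = naive_lemma,
) -> bool:
    """Return True if the pair survives both filtering rules."""
    wrong_tokens = split_tokens(wrong)
    correct_tokens = split_tokens(correct)
    # Rule 1: equal token splits mean only casing or non-alpha chars differ.
    if wrong_tokens == correct_tokens:
        return False
    # Rule 2: equal lemmas mean the pair differs only in word form.
    if [lemmatize(t) for t in wrong_tokens] == [lemmatize(t) for t in correct_tokens]:
        return False
    return True


if __name__ == "__main__":
    for pair in [("getValue", "get_value"), ("getValues", "getValue"), ("getVlaue", "getValue")]:
        print(pair, is_typo(*pair))
```

Under these assumptions, a pair such as `getValue` / `get_value` is dropped by the first rule, `getValues` / `getValue` by the second, and `getVlaue` / `getValue` is kept as a genuine typo. Passing a real lemmatizer through the `lemmatize` argument would replace the placeholder without changing the rest of the logic.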