`lookout.style.typos.corruption`

Module Contents

lookout.style.typos.corruption.letters

lookout.style.typos.corruption.rand_insert(token: str): Insert a random letter into the token.

lookout.style.typos.corruption.rand_delete(token: str): Delete a random symbol from the token.

lookout.style.typos.corruption.rand_substitution(token: str): Replace a random symbol in the token with a random letter.

lookout.style.typos.corruption.rand_swap(token: str): Swap two random adjacent symbols in the token.
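The four operations above can be illustrated with a minimal sketch. This is not the module's implementation: the `sketch_*` names, the explicit `rng` argument, the lowercase-ASCII `LETTERS` alphabet, and the handling of very short tokens are all assumptions made for illustration.

```python
import random
import string

# Assumption: the alphabet is lowercase ASCII; the module's `letters` may differ.
LETTERS = string.ascii_lowercase


def sketch_insert(token: str, rng: random.Random) -> str:
    """Insert a random letter at a random position (edges included here)."""
    pos = rng.randrange(len(token) + 1)
    return token[:pos] + rng.choice(LETTERS) + token[pos:]


def sketch_delete(token: str, rng: random.Random) -> str:
    """Delete one random symbol; an empty token is returned unchanged."""
    if not token:
        return token
    pos = rng.randrange(len(token))
    return token[:pos] + token[pos + 1:]


def sketch_substitution(token: str, rng: random.Random) -> str:
    """Replace one random symbol with a random letter."""
    if not token:
        return token
    pos = rng.randrange(len(token))
    return token[:pos] + rng.choice(LETTERS) + token[pos + 1:]


def sketch_swap(token: str, rng: random.Random) -> str:
    """Swap two adjacent symbols; tokens shorter than 2 are returned unchanged."""
    if len(token) < 2:
        return token
    pos = rng.randrange(len(token) - 1)
    return token[:pos] + token[pos + 1] + token[pos] + token[pos + 2:]
```

Each operation changes the token length by at most one symbol, and a swap or substitution can return the original token when the affected symbols happen to coincide.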

lookout.style.typos.corruption._rand_typo(token_split: Tuple[str, str, bool], add_typo_probability: float)

lookout.style.typos.corruption.corrupt_tokens_in_df(data: pandas.DataFrame, typo_probability: float, add_typo_probability: float, processes_number: Optional[int] = None, log_level: int = logging.DEBUG)

Create artificial typos in tokens (identifiers) in a pandas DataFrame. Each identifier in the dataframe is corrupted with probability typo_probability; each subsequent typo in the same token is added with probability add_typo_probability. The operation is out-of-place: the input dataframe is not modified.

Parameters:

data – Dataframe which contains columns Columns.Token and Columns.Split.
typo_probability – Probability with which a token is corrupted.
add_typo_probability – Probability with which one more corruption is applied to an already corrupted token.
processes_number – Number of processes for multiprocessing. If not set, the number of CPUs in the system is used.
log_level – Level of logging.

Returns:

New dataframe with added columns Columns.CorrectToken and Columns.CorrectSplit, which hold the original tokens and corresponding splits from data. Columns.Token and Columns.Split in the new dataframe contain the partially corrupted tokens and their corresponding splits.
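The way the two probabilities interact can be sketched for a single token as follows. This is a hedged illustration of the scheme described above, not the library's code: `corrupt_token`, `drop_random_char`, and the `rng` and `ops` arguments are hypothetical names, and the real function works on dataframe rows rather than bare strings.

```python
import random
from typing import Callable, Sequence

# Hypothetical type for a corruption operation: (token, rng) -> corrupted token.
TypoOp = Callable[[str, random.Random], str]


def drop_random_char(token: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one typo operation: delete a random symbol."""
    if not token:
        return token
    pos = rng.randrange(len(token))
    return token[:pos] + token[pos + 1:]


def corrupt_token(token: str, typo_probability: float,
                  add_typo_probability: float, rng: random.Random,
                  ops: Sequence[TypoOp]) -> str:
    """Corrupt a token with typo_probability, then keep adding typos.

    Assumes add_typo_probability < 1, otherwise the loop never terminates.
    """
    # With probability 1 - typo_probability the token is left untouched.
    if rng.random() >= typo_probability:
        return token
    # The first typo is always applied once the token is selected.
    token = rng.choice(ops)(token, rng)
    # Each further typo is applied with add_typo_probability.
    while rng.random() < add_typo_probability:
        token = rng.choice(ops)(token, rng)
    return token


rng = random.Random(42)
untouched = corrupt_token("identifier", 0.0, 0.0, rng, [drop_random_char])
single_typo = corrupt_token("identifier", 1.0, 0.0, rng, [drop_random_char])
```

Under this sketch, a token that is selected for corruption receives 1 / (1 - add_typo_probability) typos on average, so values of add_typo_probability close to 1 produce heavily mangled tokens.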
