`lookout.style.typos.research.nn_prediction`

Module Contents

lookout.style.typos.research.nn_prediction.extract_embeddings_from_fasttext(fasttext:FastText, tokens:Iterable[str])

Convert the embeddings from FastText to a dense matrix.

Parameters:

fasttext – Trained embeddings.
tokens – Tokens to extract; they form axis Y of the returned matrix.

Returns:

Matrix with extracted embeddings.
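The conversion described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `extract_embeddings`, `toy_vectors`, and the `dim` argument are hypothetical names, a plain dictionary stands in for the trained FastText model, and one row per token is assumed for "axis Y".

```python
import numpy as np


def extract_embeddings(lookup, tokens, dim):
    """Stack one embedding per token into a (len(tokens), dim) matrix."""
    tokens = list(tokens)  # the iterable may be a generator; fix its order
    matrix = np.zeros((len(tokens), dim), dtype=np.float32)
    for row, token in enumerate(tokens):
        # Assumption: lookup[token] returns a vector of length dim.
        matrix[row] = lookup[token]
    return matrix


# Hypothetical stand-in for a trained FastText model: token -> vector.
toy_vectors = {
    "get": np.array([1.0, 0.0, 0.0]),
    "value": np.array([0.0, 1.0, 0.0]),
}
embeddings = extract_embeddings(toy_vectors, ["get", "value", "get"], dim=3)
print(embeddings.shape)  # (3, 3)
```

Repeated tokens simply produce repeated rows, so the matrix stays aligned with the input token order.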

lookout.style.typos.research.nn_prediction.get_features(fasttext:FastText, typos:Sequence[str])

lookout.style.typos.research.nn_prediction.get_target(fasttext:FastText, identifiers:Iterable[str])

lookout.style.typos.research.nn_prediction.generator(features:numpy.ndarray, target:numpy.ndarray, batch_size:numpy.ndarray)

Pumps the data for keras.Model.fit_generator().

Parameters:

features – Inputs.
target – Labels.
batch_size – Batch size.

Returns:

Another batch for fit_generator().
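A batch generator of this kind can be sketched as below. This is a minimal sketch under stated assumptions: `batch_generator` is a hypothetical name, `batch_size` is treated as a plain integer, and the data is cycled in order without shuffling, which may differ from what the module does.

```python
import numpy as np


def batch_generator(features, target, batch_size):
    """Yield (inputs, labels) batches forever, cycling over the data."""
    num_samples = features.shape[0]
    while True:  # the consumer decides how many batches to pull
        for start in range(0, num_samples, batch_size):
            end = start + batch_size
            # The last slice of a pass may be shorter than batch_size.
            yield features[start:end], target[start:end]


features = np.arange(10, dtype=np.float32).reshape(5, 2)
target = np.arange(5, dtype=np.float32).reshape(5, 1)
batches = batch_generator(features, target, batch_size=2)
first_inputs, first_labels = next(batches)
print(first_inputs.shape, first_labels.shape)  # (2, 2) (2, 1)
```

Because the generator never terminates, the caller has to state how many batches make up one epoch; with 5 samples and a batch size of 2 that is 3 steps.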

lookout.style.typos.research.nn_prediction.create_model(num_neurons:int, input_len:int, output_len:int)

Builds the fully-connected NN.

Parameters:

num_neurons – Number of neurons in each hidden layer.
input_len – Input size.
output_len – Output size.

Returns:

Built model.

lookout.style.typos.research.nn_prediction.train_model(model:keras.models.Sequential, features:numpy.ndarray, target:numpy.ndarray, save_model_file:str=None, batch_size:int=64, lr:float=0.1, decay:float=1e-07, num_epochs:int=100)

lookout.style.typos.research.nn_prediction.DEFAULT_NUM_NEURONS = 256

lookout.style.typos.research.nn_prediction.DEFAULT_BATCH_SIZE = 64

lookout.style.typos.research.nn_prediction.DEFAULT_LR = 0.1

lookout.style.typos.research.nn_prediction.DEFAULT_DECAY = 0.9

lookout.style.typos.research.nn_prediction.DEFAULT_NUM_EPOCHS = 10

lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction(fasttext:FastText, data:pandas.DataFrame, saved_model_file:str, num_neurons:int=DEFAULT_NUM_NEURONS, batch_size:int=DEFAULT_BATCH_SIZE, lr:float=DEFAULT_LR, decay:float=DEFAULT_DECAY, num_epochs:int=DEFAULT_NUM_EPOCHS)

Train NN model for correction embedding prediction.

Parameters:

fasttext – gensim.models.FastText model.
data – DataFrame containing columns [Columns.CorrectToken, Columns.Token].
saved_model_file – Path to the file where the trained NN model is dumped.
num_neurons – Number of neurons in each hidden layer.
batch_size – Batch size for training.
lr – Learning rate.
decay – Learning rate exponential decay per epoch.
num_epochs – Number of passes over the training dataset.

Returns:

Trained Keras model.
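The `decay` parameter is described as an exponential learning-rate decay per epoch. One common reading of that description is that the rate for epoch e equals lr * decay ** e. The sketch below illustrates that reading only; `decayed_lr` is a hypothetical helper, and the schedule the module actually applies is not shown in this reference.

```python
DEFAULT_LR = 0.1
DEFAULT_DECAY = 0.9


def decayed_lr(initial_lr, decay, epoch):
    """Learning rate for a zero-based epoch under per-epoch exponential decay."""
    # Assumption: the rate is multiplied by `decay` once per completed epoch.
    return initial_lr * decay ** epoch


# Rates for the first three epochs with the module's documented defaults.
schedule = [decayed_lr(DEFAULT_LR, DEFAULT_DECAY, epoch) for epoch in range(3)]
print(schedule)
```

Under this assumption the default settings shrink the rate by 10% each epoch, so it stays within an order of magnitude of `DEFAULT_LR` over the default 10 epochs.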

lookout.style.typos.research.nn_prediction.get_predictions(fasttext:FastText, model:keras.models.Sequential, typos:Iterable[str])

Get predicted correction embeddings for tokens from typos.

Parameters:

fasttext – gensim.models.FastText model.
model – Trained NN model.
typos – Iterable with tokens to check.

Returns:

Array of predicted correction embeddings.

lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction_from_file(fasttext:str, data:str, dump:str=None, num_neurons:int=DEFAULT_NUM_NEURONS, batch_size:int=DEFAULT_BATCH_SIZE, lr:float=DEFAULT_LR, decay:float=DEFAULT_DECAY, num_epochs:int=DEFAULT_NUM_EPOCHS)

Train NN model for correction embedding prediction from files.

Parameters:

fasttext – Path to the binary dump of a FastText model.
data – Path to a CSV dump of pandas.DataFrame containing columns [Columns.CorrectToken, Columns.Token].
dump – Path to the file where the trained NN model is dumped.
num_neurons – Number of neurons in each hidden layer.
batch_size – Batch size for training.
lr – Learning rate.
decay – Learning rate exponential decay per epoch.
num_epochs – Number of training passes over the dataset.

Returns:

Trained Keras model.
