lookout.style.typos.research.nn_prediction
Module Contents
-
lookout.style.typos.research.nn_prediction.extract_embeddings_from_fasttext(fasttext: FastText, tokens: Iterable[str])
Convert the embeddings from FastText to a dense matrix.
Parameters:
- fasttext – Trained embeddings.
- tokens – List of tokens; they form axis Y of the returned matrix.
Returns: Matrix with the extracted embeddings.
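As a rough illustration of the matrix layout, the sketch below stacks one vector per token, in input order. It is not the module's implementation: `vectors` is a hypothetical mapping standing in for the trained FastText model, `embeddings_matrix` is an illustrative name, and plain lists are used in place of a NumPy array to keep the sketch dependency-free.

```python
from typing import Iterable, List, Mapping, Sequence


def embeddings_matrix(vectors: Mapping[str, Sequence[float]],
                      tokens: Iterable[str]) -> List[List[float]]:
    """Return one row per token, in the order the tokens were given.

    `vectors` is a hypothetical stand-in for a trained embedding model.
    """
    return [list(vectors[token]) for token in tokens]


# Toy 2-dimensional embeddings, invented for this example only.
toy_vectors = {"get": [0.1, 0.2], "value": [0.3, 0.4]}
matrix = embeddings_matrix(toy_vectors, ["value", "get"])
# Row i of `matrix` holds the embedding of the i-th token.
```

Under this assumption, the number of rows equals the number of tokens and each row has the embedding dimensionality.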
-
lookout.style.typos.research.nn_prediction.get_features(fasttext: FastText, typos: Sequence[str])
-
lookout.style.typos.research.nn_prediction.get_target(fasttext: FastText, identifiers: Iterable[str])
-
lookout.style.typos.research.nn_prediction.generator(features: numpy.ndarray, target: numpy.ndarray, batch_size: numpy.ndarray)
Pumps the data for keras.Model.fit_generator().
Parameters:
- features – Inputs.
- target – Labels.
- batch_size – Batch size.
Returns: Another batch for fit_generator().
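Keras expects the generator passed to fit_generator() to loop over its data indefinitely and to yield (inputs, targets) tuples. The following is a minimal sketch of such a batch generator, not the module's actual code: `batch_generator` is a hypothetical name, `batch_size` is assumed to be an integer, and the real function may shuffle or handle the last partial batch differently. Plain lists are used here, but the slicing works the same way on NumPy arrays.

```python
from itertools import islice
from typing import Iterator, Sequence, Tuple


def batch_generator(features: Sequence, target: Sequence,
                    batch_size: int) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield (inputs, labels) batches forever, restarting after each full pass."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if len(features) == 0 or batch_size < 1:
        raise ValueError("need at least one sample and a positive batch size")
    while True:
        for start in range(0, len(features), batch_size):
            end = start + batch_size
            yield features[start:end], target[start:end]


# Take the first four batches of a five-sample toy dataset.
first_batches = list(islice(
    batch_generator([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2), 4))
```

In this sketch the third batch is shorter than the others because five samples do not divide evenly by two, and the fourth batch starts the next pass over the data.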
-
lookout.style.typos.research.nn_prediction.create_model(num_neurons: int, input_len: int, output_len: int)
Builds the fully-connected NN.
Parameters:
- num_neurons – Number of neurons in each hidden layer.
- input_len – Input size.
- output_len – Output size.
Returns: Built model.
-
lookout.style.typos.research.nn_prediction.train_model(model: keras.models.Sequential, features: numpy.ndarray, target: numpy.ndarray, save_model_file: str = None, batch_size: int = 64, lr: float = 0.1, decay: float = 1e-07, num_epochs: int = 100)
-
lookout.style.typos.research.nn_prediction.DEFAULT_NUM_NEURONS = 256
-
lookout.style.typos.research.nn_prediction.DEFAULT_BATCH_SIZE = 64
-
lookout.style.typos.research.nn_prediction.DEFAULT_LR = 0.1
-
lookout.style.typos.research.nn_prediction.DEFAULT_DECAY = 0.9
-
lookout.style.typos.research.nn_prediction.DEFAULT_NUM_EPOCHS = 10
-
lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction(fasttext: FastText, data: pandas.DataFrame, saved_model_file: str, num_neurons: int = DEFAULT_NUM_NEURONS, batch_size: int = DEFAULT_BATCH_SIZE, lr: float = DEFAULT_LR, decay: float = DEFAULT_DECAY, num_epochs: int = DEFAULT_NUM_EPOCHS)
Train NN model for correction embedding prediction.
Parameters:
- fasttext – gensim.models.FastText model.
- data – DataFrame containing columns [Columns.CorrectToken, Columns.Token].
- saved_model_file – Path to the file where the trained NN model is dumped.
- num_neurons – Number of neurons in each hidden layer.
- batch_size – Batch size for training.
- lr – Learning rate.
- decay – Learning rate exponential decay per epoch.
- num_epochs – Number of passes over the train dataset.
Returns: Trained Keras model.
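One possible reading of "exponential decay per epoch" is that the learning rate is multiplied by `decay` once after every epoch. The sketch below computes the resulting schedule for the documented defaults (lr 0.1, decay 0.9, 10 epochs). This interpretation is an assumption and is not confirmed by this page; `exponential_lr_schedule` is a hypothetical helper, not part of the module.

```python
from typing import List


def exponential_lr_schedule(lr: float, decay: float,
                            num_epochs: int) -> List[float]:
    """Learning rate for each epoch, assuming lr is scaled by `decay` per epoch."""
    return [lr * decay ** epoch for epoch in range(num_epochs)]


# Values mirror DEFAULT_LR, DEFAULT_DECAY and DEFAULT_NUM_EPOCHS listed above.
schedule = exponential_lr_schedule(lr=0.1, decay=0.9, num_epochs=10)
```

Under this assumption the rate starts at 0.1 and shrinks by 10% each epoch, so a decay close to 1 keeps the rate nearly constant while a smaller value cools training down faster.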
-
lookout.style.typos.research.nn_prediction.get_predictions(fasttext: FastText, model: keras.models.Sequential, typos: Iterable[str])
Get predicted correction embeddings for tokens from typos.
Parameters:
- fasttext – gensim.models.FastText model.
- model – Trained NN model.
- typos – Iterable with tokens to check.
Returns: Array of predicted correction embeddings.
-
lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction_from_file(fasttext: str, data: str, dump: str = None, num_neurons: int = DEFAULT_NUM_NEURONS, batch_size: int = DEFAULT_BATCH_SIZE, lr: float = DEFAULT_LR, decay: float = DEFAULT_DECAY, num_epochs: int = DEFAULT_NUM_EPOCHS)
Train NN model for correction embedding prediction from files.
Parameters:
- fasttext – Path to the binary dump of a FastText model.
- data – Path to a CSV dump of pandas.DataFrame containing columns [Columns.CorrectToken, Columns.Token].
- dump – Path to the file where the trained NN model is dumped.
- num_neurons – Number of neurons in each hidden layer.
- batch_size – Batch size for training.
- lr – Learning rate.
- decay – Learning rate exponential decay per epoch.
- num_epochs – Number of training passes over the dataset.
Returns: Trained Keras model.