lookout.style.typos.research.nn_prediction¶
Module Contents¶
-
lookout.style.typos.research.nn_prediction.extract_embeddings_from_fasttext(fasttext:FastText, tokens:Iterable[str])¶ Convert the embeddings from FastText to a dense matrix.
Parameters: - fasttext – Trained FastText embeddings.
- tokens – Tokens to extract; each one indexes axis Y (a row) of the returned matrix.
Returns: Dense matrix with the extracted embeddings.
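The conversion described above can be sketched without a trained model. This is a minimal illustration, not the module's implementation: `embeddings_to_matrix` and `toy_vectors` are hypothetical names, and a plain dict stands in for the FastText lookup.

```python
from typing import Callable, Iterable

import numpy


def embeddings_to_matrix(lookup: Callable[[str], numpy.ndarray],
                         tokens: Iterable[str]) -> numpy.ndarray:
    """Stack one embedding per token into a dense (n_tokens, dim) matrix."""
    # Each token contributes one row, in the order the tokens are given.
    return numpy.vstack([lookup(token) for token in tokens])


# Hypothetical stand-in for trained embeddings: 2-dimensional toy vectors.
toy_vectors = {
    "get": numpy.array([1.0, 0.0]),
    "set": numpy.array([0.0, 1.0]),
}
matrix = embeddings_to_matrix(toy_vectors.__getitem__, ["get", "set", "get"])
print(matrix.shape)  # (3, 2): three tokens, two embedding dimensions
```

Repeated tokens simply produce repeated rows, so the row order always matches the order of `tokens`.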
-
lookout.style.typos.research.nn_prediction.get_features(fasttext:FastText, typos:Sequence[str])¶
-
lookout.style.typos.research.nn_prediction.get_target(fasttext:FastText, identifiers:Iterable[str])¶
-
lookout.style.typos.research.nn_prediction.generator(features:numpy.ndarray, target:numpy.ndarray, batch_size:numpy.ndarray)¶ Pumps the data for keras.Model.fit_generator().
Parameters: - features – Inputs.
- target – Labels.
- batch_size – Batch size.
Returns: Another batch for fit_generator().
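A generator of this kind is expected to yield batches indefinitely, since the caller decides how many steps make up an epoch. The sketch below shows one plausible shape for it. It assumes sequential, unshuffled batches that wrap around at the end of the data, which the documentation above does not confirm, and `batch_generator` is a hypothetical name.

```python
from typing import Iterator, Tuple

import numpy


def batch_generator(features: numpy.ndarray, target: numpy.ndarray,
                    batch_size: int) -> Iterator[Tuple[numpy.ndarray, numpy.ndarray]]:
    """Yield (features, target) batches forever, wrapping around at the end."""
    n_samples = features.shape[0]
    while True:  # never stops on its own; the consumer decides when to stop
        for start in range(0, n_samples, batch_size):
            stop = start + batch_size
            # The last batch of a pass may be smaller than batch_size.
            yield features[start:stop], target[start:stop]


features = numpy.arange(10).reshape(5, 2)  # 5 samples, 2 features each
target = numpy.arange(5)                   # one label per sample
batches = batch_generator(features, target, batch_size=2)
first_x, first_y = next(batches)
```

With 5 samples and a batch size of 2, one full pass consists of three batches (2, 2, and 1 samples) before the generator starts again from the first sample.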
-
lookout.style.typos.research.nn_prediction.create_model(num_neurons:int, input_len:int, output_len:int)¶ Builds the fully-connected NN.
Parameters: - num_neurons – Number of neurons in each hidden layer.
- input_len – Input size.
- output_len – Output size.
Returns: Built model.
-
lookout.style.typos.research.nn_prediction.train_model(model:keras.models.Sequential, features:numpy.ndarray, target:numpy.ndarray, save_model_file:str=None, batch_size:int=64, lr:float=0.1, decay:float=1e-07, num_epochs:int=100)¶
-
lookout.style.typos.research.nn_prediction.DEFAULT_NUM_NEURONS= 256¶
-
lookout.style.typos.research.nn_prediction.DEFAULT_BATCH_SIZE= 64¶
-
lookout.style.typos.research.nn_prediction.DEFAULT_LR= 0.1¶
-
lookout.style.typos.research.nn_prediction.DEFAULT_DECAY= 0.9¶
-
lookout.style.typos.research.nn_prediction.DEFAULT_NUM_EPOCHS= 10¶
-
lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction(fasttext:FastText, data:pandas.DataFrame, saved_model_file:str, num_neurons:int=DEFAULT_NUM_NEURONS, batch_size:int=DEFAULT_BATCH_SIZE, lr:float=DEFAULT_LR, decay:float=DEFAULT_DECAY, num_epochs:int=DEFAULT_NUM_EPOCHS)¶ Train NN model for correction embedding prediction.
Parameters: - fasttext – gensim.models.FastText model.
- data – DataFrame containing columns [Columns.CorrectToken, Columns.Token].
- saved_model_file – Path to file to dump trained NN model.
- num_neurons – Number of neurons in each hidden layer.
- batch_size – Batch size for training.
- lr – Learning rate.
- decay – Learning rate exponential decay per epoch.
- num_epochs – Number of passes over the training dataset.
Returns: Trained Keras model.
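To make the lr and decay parameters concrete, here is one common reading of "exponential decay per epoch": the learning rate is multiplied by decay after every epoch. This is an assumption about the schedule, not a statement of what the module does internally, and `exponential_lr` is a hypothetical helper.

```python
def exponential_lr(lr: float, decay: float, epoch: int) -> float:
    """Learning rate at a given epoch (0-based) under exponential decay."""
    return lr * decay ** epoch


# Values taken from DEFAULT_LR, DEFAULT_DECAY and DEFAULT_NUM_EPOCHS above.
schedule = [exponential_lr(0.1, 0.9, epoch) for epoch in range(10)]
for epoch, rate in enumerate(schedule):
    print(f"epoch {epoch}: lr = {rate:.4f}")
```

Under this reading, a decay of 0.9 shrinks the learning rate by 10% each epoch, so it falls from 0.1 to roughly 0.039 by the tenth epoch.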
-
lookout.style.typos.research.nn_prediction.get_predictions(fasttext:FastText, model:keras.models.Sequential, typos:Iterable[str])¶ Get predicted correction embeddings for tokens from typos.
Parameters: - fasttext – gensim.models.FastText model.
- model – Trained NN model.
- typos – Iterable with tokens to check.
Returns: Array of predicted correction embeddings.
-
lookout.style.typos.research.nn_prediction.create_and_train_nn_prediction_from_file(fasttext:str, data:str, dump:str=None, num_neurons:int=DEFAULT_NUM_NEURONS, batch_size:int=DEFAULT_BATCH_SIZE, lr:float=DEFAULT_LR, decay:float=DEFAULT_DECAY, num_epochs:int=DEFAULT_NUM_EPOCHS)¶ Train NN model for correction embedding prediction from files.
Parameters: - fasttext – Path to the binary dump of a FastText model.
- data – Path to a CSV dump of pandas.DataFrame containing columns [Columns.CorrectToken, Columns.Token].
- dump – Path to the file where the trained NN model is saved.
- num_neurons – Number of neurons in each hidden layer.
- batch_size – Batch size for training.
- lr – Learning rate.
- decay – Learning rate exponential decay per epoch.
- num_epochs – Number of training passes over the dataset.
Returns: Trained Keras model.