Seq2Seq prediction speed is slow - tensorflow

I'm trying to implement a Seq2Seq model using LSTMs in TensorFlow (from scratch, without the built-in RNN cells). The model works fine, but prediction time is slow for me: about 2-6 seconds per sentence. Is that normal?
My model:
2 LSTM layers for encoding
2 LSTM layers for decoding
Attention mechanism
Vocabulary: 400k
Word vector dimension: 300
The prediction code runs entirely on the CPU.
I read some papers, but they don't report prediction speed. Thank you!
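
For reference, a rough tf.keras equivalent of the architecture described above could look like the sketch below. This is not the asker's from-scratch implementation; the hidden size and the use of the built-in Attention layer are assumptions.

import tensorflow as tf

VOCAB = 400_000   # vocabulary size from the question
DIM = 300         # word-vector dimension from the question
UNITS = 300       # hidden size is an assumption; the question does not state it

# Encoder: embedding + 2 stacked LSTMs
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB, DIM)(enc_in)
x = tf.keras.layers.LSTM(UNITS, return_sequences=True)(x)
enc_out, h, c = tf.keras.layers.LSTM(UNITS, return_sequences=True, return_state=True)(x)

# Decoder: embedding + 2 stacked LSTMs + dot-product attention over encoder outputs
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
y = tf.keras.layers.Embedding(VOCAB, DIM)(dec_in)
y = tf.keras.layers.LSTM(UNITS, return_sequences=True)(y, initial_state=[h, c])
y = tf.keras.layers.LSTM(UNITS, return_sequences=True)(y)
context = tf.keras.layers.Attention()([y, enc_out])   # query = decoder states, value = encoder outputs
y = tf.keras.layers.Concatenate()([y, context])

# The projection onto a 400k-word vocabulary runs at every decoding step and is
# typically one of the most expensive operations at inference time on a CPU.
logits = tf.keras.layers.Dense(VOCAB)(y)

model = tf.keras.Model([enc_in, dec_in], logits)
model.summary()

Note that with a softmax over 400k words evaluated once per generated token, the output projection is often the dominant per-sentence cost when decoding on a CPU.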

Related

train keras applications from scratch in tensorflow 2

For benchmarking different frameworks, I want to train an Inception V3 network from scratch.
Here is the code snippet to build the model:
import tensorflow as tf

IMAGE_RES = 229
NUM_CLASSES = 102
channels = 3   # not defined in the original snippet; assuming RGB input
model = tf.keras.applications.InceptionV3(include_top=True, weights=None, classes=NUM_CLASSES)
model.build(input_shape=(None, IMAGE_RES, IMAGE_RES, channels))
According to the official Keras website, the argument weights=None means random initialization. Does this mean that I am training my network from scratch? If not, how is it possible to train the network from scratch?
Yes, it means that you are training your model from scratch.
Weights and biases in deep learning models are randomly initialized following specific schemes (see the Xavier Glorot scheme, for example). Those schemes generally help the network converge faster and achieve better results, by preventing the gradients from either vanishing or exploding, and by maintaining a low variance in the gradients across all layers.
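
Nothing extra needs to be passed for this to happen; the snippet below just makes explicit the default initializers that weights=None falls back to (Keras Dense layers default to Glorot uniform kernels and zero biases).

import tensorflow as tf

# Making the defaults explicit: most Keras layers already use Glorot (Xavier)
# uniform for kernels and zeros for biases when no pretrained weights are loaded.
layer = tf.keras.layers.Dense(
    64,
    kernel_initializer=tf.keras.initializers.GlorotUniform(),
    bias_initializer=tf.keras.initializers.Zeros(),
)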

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to get this model to converge on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I expect it to take no less than a month or so to converge.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
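
The actual Inception (v2) implementation is linked in the question and not reproduced here; as a stand-in, the training setup described (static 0.01 learning rate, plain SGD, inputs in [0, 1]) would look roughly like this in tf.keras, with a placeholder model.

import tensorflow as tf

# Placeholder model; the real Inception (v2) implementation is not shown here.
inputs = tf.keras.Input(shape=(224, 224, 3))          # 224x224x3 images in [0, 1]
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1000, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Static learning rate of 0.01 with plain stochastic gradient descent, as described above.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)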
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would suggest the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained for 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

when to stop training object detection tensorflow

I am training a Faster R-CNN model on a fruit dataset using a pretrained model provided in the Google API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration: number of classes: 12, fine_tune_checkpoint: path to the pretrained checkpoint, and from_detection_checkpoint: true. The total number of annotated images I have is around 12,000.
After training for 9,000 steps, the accuracy I got is below 1%, though I was expecting it to be at least 50% (in evaluation nothing is getting detected, as accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train it for? I read an article which says to run around 800k steps, but is that the number of steps when training from scratch?
The FC layers of the model are changed because of the different number of classes, but that shouldn't affect classes which are already present in the pretrained model, like 'apple', should it?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.
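
The answer doesn't come with code; a minimal, framework-agnostic sketch of that stopping rule (the function name, patience value, and mAP numbers below are made up for illustration, and this is not the Object Detection API itself) could look like this:

def should_stop(map_history, patience=5, min_delta=1e-3):
    """True when the last `patience` evaluations brought no meaningful mAP gain."""
    if len(map_history) <= patience:
        return False
    best_before = max(map_history[:-patience])
    recent_best = max(map_history[-patience:])
    return recent_best < best_before + min_delta

# Illustrative per-evaluation mAP curve: improving at first, then flat.
history = [0.05, 0.18, 0.31, 0.40, 0.44, 0.45, 0.45, 0.44, 0.45, 0.45, 0.44]
for i in range(1, len(history) + 1):
    if should_stop(history[:i]):
        print(f"stop after evaluation {i}")   # -> stop after evaluation 11
        break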

More epochs or more layers?

What is the difference in training if one uses more epochs or more layers?
Should these train equally, assuming consistent hyperparams?
for epoch in range(20):
    LSTM

and

for epoch in range(5):
    LSTM -> LSTM -> LSTM -> LSTM
I understand that there would be a difference after training: in the first case, you would send any test batch through one trained LSTM cell, while in the second case it would go through 4 trained cells. My question pertains to training.
It seems like they should be identical.
I think you are confusing very different concepts, so let's go back to basics. Very simply, in a supervised machine learning experiment you have some training data X and a model. A model is like a function with internal parameters: you give it some data and it gives you back a prediction. Here, let us say our model has one layer, which is an LSTM. That means the parameters of our model are the parameters of the LSTM (I won't go into what they are; if you don't know them, you should read the paper introducing LSTMs).
What is an epoch? Very roughly, "training for n epochs" means looping n times over the training data: you show each example to the model n times for updates. The more epochs, the more your network becomes accustomed to your training data. (I'm being very simplistic here.)
I hope it is clearer now that epochs and layers are not related. The layers are what your model is made of, while the epochs determine how many times you show your examples to the model.
If you stack 5 LSTM layers, you will simply have about 5 times more parameters. But in any case, each of your training examples will go through the 1 or the 5 stacked LSTM layers...
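
As a concrete illustration of that distinction, here is a small tf.keras sketch; the data, layer width, and optimizer are arbitrary assumptions, and only the contrast between stacking LSTM layers and looping over epochs mirrors the question.

import numpy as np
import tensorflow as tf

# Toy data, just to make the comparison concrete (shapes are assumptions).
x = np.random.rand(256, 10, 8).astype("float32")   # (samples, timesteps, features)
y = np.random.rand(256, 1).astype("float32")

def build(num_lstm_layers):
    """num_lstm_layers changes what the model *is* (its parameters)."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(10, 8)))
    for i in range(num_lstm_layers):
        # all but the last LSTM must return sequences so the layers can be stacked
        model.add(tf.keras.layers.LSTM(16, return_sequences=(i < num_lstm_layers - 1)))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model

# Epochs change how many passes are made over the data, not the model itself.
build(1).fit(x, y, epochs=20, verbose=0)   # one LSTM layer, 20 passes over x
build(4).fit(x, y, epochs=5, verbose=0)    # four stacked LSTM layers, 5 passes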

Convolutional neural network doesn't classify test set keras

I have a 3D convolutional neural network [Keras, TensorFlow] and 3D brain images of people with advanced Alzheimer's, early Alzheimer's, and healthy people (3 classes). I have a training set of 324 images and a test set of 74 images. When I trained my CNN, I had about 65-70% accuracy, but on the test set I had only 30-40%. When I used the test data as validation data, I got no more than 37% accuracy on the training set as well, and the loss stayed at the same level the whole time. No matter which parameters I change, the result is the same. I load my prepared and normalized data from an .h5 file into Python, and the input has shape (None, 90, 120, 80, 1). I have no idea what may be wrong; I checked the code many times and everything seems to be correct.
My CNN has 4 Conv3D layers, 3 max-pooling layers, ReLU activations and batch normalization, 3 dense layers with dropout, and a softmax output.
I appreciate any help or ideas.
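
For concreteness, a rough Keras sketch of the architecture described above might look like the following; the layer widths, kernel sizes, and dropout rate are assumptions, and only the overall structure follows the description.

import tensorflow as tf

# 4 Conv3D layers, 3 max-pooling layers, ReLU + batch norm, 3 dense layers
# with dropout, and a softmax over 3 classes, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(90, 120, 80, 1)),
    tf.keras.layers.Conv3D(8, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling3D(2),
    tf.keras.layers.Conv3D(16, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling3D(2),
    tf.keras.layers.Conv3D(32, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling3D(2),
    tf.keras.layers.Conv3D(64, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])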
If you only have 65-70% accuracy on your training data, that is really poor and indicates your neural network is not converging properly. Your network should be capable of at least overfitting the training data if the structure is complex enough, by effectively learning to hard-code the outputs for the small input sample. By the sound of it, your structure is complex enough.
The first thing to try is to reduce the learning rate by a factor of 10, and turn off validation/early stopping/normalisation/regularisation and any other ways of preventing overfitting. Then rinse and repeat - more iterations, each time reducing the LR by a further factor of 10 - until you can overfit the training data to the point where it gets close to 100% training accuracy.
You can then work on putting back proper early stopping, dropout, normalisation, regularisation etc. to prevent overfitting, with a learning rate you know works.
If dropping the LR never leads to overfitting, however small the LR, then you have some issue with your NN structure.
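
A sketch of that debugging recipe follows; this is not the asker's actual code, and the stripped-down model, the Adam optimizer, and the dummy data standing in for the real .h5 arrays are all assumptions.

import numpy as np
import tensorflow as tf

# Dummy stand-in data with the input shape from the question.
x_train = np.random.rand(8, 90, 120, 80, 1).astype("float32")
y_train = np.random.randint(0, 3, size=(8,))

def build_plain_cnn():
    # No batch norm, dropout, or other regularisation: the goal here is only
    # to check whether the network can overfit the training data at all.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(90, 120, 80, 1)),
        tf.keras.layers.Conv3D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling3D(4),
        tf.keras.layers.Conv3D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

for lr in [1e-2, 1e-3, 1e-4, 1e-5]:        # each step drops the LR by a factor of 10
    model = build_plain_cnn()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5, verbose=0)
    print(f"lr={lr}: final training accuracy {history.history['accuracy'][-1]:.2f}")
    # Stop once the model can essentially memorize the training set; if no
    # learning rate ever gets close to 100%, suspect the model or the data.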