5-layer DNN in Keras trains slower using GPU - tensorflow

I've written a 5-layer dense network in Keras 1.2 using tensorflow-gpu as the backend, and I train it on my MacBook Pro (CPU) and on a P2.xlarge instance in AWS (K80, CUDA enabled). Surprisingly, my MacBook Pro trains the model faster than the P2 instance. I've checked that the model is trained on the GPU on the P2, so I wonder: why does it run slower?
Here is the network:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import metrics

model = Sequential()
model.add(Dense(250, input_dim=input_dim, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(130, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(50, init='normal', activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, init='normal', activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, init='normal'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[metrics.mae])
model.fit(x=X, y=Y, batch_size=batch, nb_epoch=epochs, shuffle=True,
          validation_data=(X_test, Y_test), verbose=2)
Thanks,
Alex.

I ran into a similar problem with a small network - and discovered that the wall clock time was largely due to CPU computations and data transfer between the CPU and the GPU, and specifically that the data transfer time was larger than the gains seen from doing GPU computations instead of CPU.
Without your data to test on, my assumption is that your network is similarly too small to exploit the true power of the GPU, and that the reason you're seeing longer training times on the GPU is that transferring the data between CPU and GPU costs more time than the GPU computations save.
Have you tried a noticeably larger network?
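For example, something along these lines (the layer widths and batch size are arbitrary, just large enough that the matrix multiplications dominate the transfer overhead) should start to show a clear GPU advantage:
# Rough sketch of a much wider network, in the same Keras 1.x style as the question.
from keras.models import Sequential
from keras.layers import Dense, Dropout

big_model = Sequential()
big_model.add(Dense(4096, input_dim=input_dim, init='normal', activation='relu'))
big_model.add(Dropout(0.2))
big_model.add(Dense(4096, init='normal', activation='relu'))
big_model.add(Dropout(0.2))
big_model.add(Dense(2048, init='normal', activation='relu'))
big_model.add(Dense(1, init='normal'))
big_model.compile(loss='mean_squared_error', optimizer='adam')

# A larger batch size also amortizes the per-batch CPU<->GPU transfer cost.
big_model.fit(X, Y, batch_size=512, nb_epoch=10, verbose=2)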

Tflite 200mb big

I am building a model that should classify flowers, so I created a model with TensorFlow:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(128, (3, 3), activation='relu',
                        input_shape=(imageShape[0], imageShape[1], 3)),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Dropout(0.5),
    keras.layers.Conv2D(256, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(512, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(280, activation='relu'),
    keras.layers.Dense(4, activation='softmax')
])

opt = tf.keras.optimizers.RMSprop()
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
While training, I save checkpoints as .h5 files:
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint("preSaved" + str(time.time()) + ".h5", monitor='val_loss', verbose=1,
                             save_best_only=True, save_weights_only=False, mode='auto', period=1)
Now I have an epoch with a pretty low loss and want to convert the model to .tflite so I can upload it to Firebase (and use it in an Android Studio app).
import tensorflow as tf

new_model = tf.keras.models.load_model(filepath="model.h5")
tflite_converter = tf.lite.TFLiteConverter.from_keras_model(new_model)
tflite_converter.inference_type = tf.uint8
tflite_converter.default_ranges_stats = [min_value, max_value]
tflite_converter.quantized_input_stats = {"conv2d_6_input_6:0": (mean, std)}
tflite_converter.post_training_quantize = True
tflite_model = tflite_converter.convert()
open("tf_lite_model.tflite", "wb").write(tflite_model)
The .h5 is about 335 MB and the final .tflite is 160 MB. But Firebase only allows .tflite files up to 60 MB, and if I use a local model it takes minutes to load. I read that .tflite files are usually smaller.
Is there a problem in my model, or in how I convert it to .tflite?
The model size is largely determined by your model architecture (the different layers that make up the model and the number of parameters in each layer). You can experiment with changing those to get a smaller model.
Here is a much simpler architecture for an image classification model. Keep in mind, of course, that a smaller model may have lower accuracy than a more sophisticated one.
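For instance, something along these lines (the layer sizes here are only indicative, not a definitive recommendation):
# A much smaller convolutional stack: fewer and narrower layers mean far fewer parameters.
model = keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), activation='relu',
                        input_shape=(imageShape[0], imageShape[1], 3)),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D(2, 2),
    # Global average pooling instead of Flatten avoids a huge first Dense layer,
    # which is usually where most of the parameters (and file size) come from.
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(4, activation='softmax')
])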

Deep LSTM accuracy not crossing 50%

I am working on a classification problem on the SemEval 2017 Task 4A dataset (can be found here), and I am using a deep LSTM network for it. In pre-processing, I have done lowercasing -> tokenization -> lemmatization -> stop-word removal -> punctuation removal. For word embeddings, I have used a Word2Vec model. There are 18,000 samples in my training set and 2,000 samples in my test set.
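For reference, a minimal sketch of such a pre-processing pipeline (using NLTK here; this is illustrative rather than the exact code):
import string
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess(text):
    tokens = word_tokenize(text.lower())                         # lower casing + tokenization
    tokens = [lemmatizer.lemmatize(t) for t in tokens]           # lemmatization
    tokens = [t for t in tokens if t not in stop_words]          # remove stop words
    tokens = [t for t in tokens if t not in string.punctuation]  # remove punctuation
    return tokens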
The code for my model is:
from keras.models import Sequential
from keras.layers import Embedding, BatchNormalization, Activation, Dropout, Bidirectional, LSTM, Dense
from keras_self_attention import SeqSelfAttention  # from the keras-self-attention package

model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True), input_shape=(128, 1, 64)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
The value of max_words is 2000 and max_len is 300.
But even after this, my testing accuracy is not crossing 50%. I can't figure out the problem.
PS: I am using validation as well. The loss function is binary cross-entropy and the optimizer is Adam.
Training "LSTM" is very different with other common deep learning model.
I recommend a higher dropout rate like 0.7,0.8. and Adam optimizer is particularly unstable in LSTM with real world data. So, i recommend SGD scheduled for a momentum of 0.9 and ReduceLROnPlateau. You have to do very long training, and if spark loss is observed, the training is going very well. (Spark Loss is a word used by NVIDIA researchers. It refers to a phenomenon in which the value of Loss that appears to converge increases significantly.)
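A minimal sketch of that setup (the learning rate, schedule parameters, and variable names like X_train/y_val are assumptions, not part of the original advice):
from keras.optimizers import SGD
from keras.callbacks import ReduceLROnPlateau

# SGD with momentum 0.9, as suggested above; the initial learning rate is an assumption.
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

# Reduce the learning rate when the validation loss stops improving.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-5)

# Long training run, per the advice above.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=200, batch_size=64,
          callbacks=[reduce_lr])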

How to stabilize loss when using keras for image classification

I am using Keras to perform image classification. I have 10 classes with ~900 images each. I used VGG16 and built this small network on top of it:
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
I am training for 50 epochs:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
I get the accuracy and loss below:
[INFO] accuracy: 94.72%
[INFO] Loss: 0.45841544931342115
Yet I am not sure how to stabilize the loss. Should I increase the number of epochs, or are there other parameters I need to change?
Since the validation loss fluctuates from the first epochs, I think you forgot to freeze the main VGG model and train only the Dense stack you added on top.
Also, it's better to use 2D global average pooling instead of flattening.
If that doesn't solve the problem, try more efficient pre-trained CNN architectures such as MobileNetV2 or Xception.
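A minimal sketch of that setup, assuming the VGG16 base comes from keras.applications and the input size is 224x224 (both assumptions; adapt to your data):
from keras.applications import VGG16
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

# Load the convolutional base and freeze it so only the new head is trained.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Global average pooling instead of Flatten, then the small classification head.
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])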

A neural network that can't overfit?

I am fitting a model to some noisy satellite data. The labels are measurements of rock on the bars of a river. There is a noisy but significant relationship. I only have 250 points, but the method would expand and eventually run on much bigger datasets. I'm looking at a mix of models (RANSAC, Huber, SVM regression) and DNNs. My DNN results seem too good to be true. The network looks like:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

def build_model():
    model = Sequential()
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), input_dim=NetworkDims,
                    kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001),
                    kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001),
                    kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001),
                    kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(32, kernel_regularizer=regularizers.l2(0.001),
                    kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
And when I save the history and plot training loss (green dots) and validation loss (cyan line) vs epoch I get this:
Training and validation loss just creep down. With a small dataset, I was expecting the validation loss to go its own way. In fact, if I run a 10-fold cross val score with this network, the error reported by cross val score does creep down. This just looks too good to be true. It implies that I could train this thing for 1000 epochs and still improve results. If it looks too good to be true, it usually is, but why?
EDIT: More results.
So I tried cutting dropout to 0.1 at each layer and removing the L2. Interesting: with the toned-down dropout, I get even better results:
10% dropout rate
Without the L2, there is overfitting:
No L2 reg
My guess would be that you have such high dropout on every layer that the network is having trouble even overfitting the training data. My prediction is that if you lower that dropout and regularization, it'll learn the training data much faster.
I'm not sure whether the results are too good to be true, because it's hard to judge how good a model is from the loss alone. But it should be the dropout and regularization that are preventing it from overfitting within a few epochs.
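For example, a rough sketch of the same architecture with dropout cut to 0.1 and the L2 penalties removed (along the lines of your edit; the exact values are only illustrative):
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model_low_reg():
    # Same layer sizes, but dropout lowered to 0.1 and no L2 regularization.
    model = Sequential()
    model.add(Dense(128, input_dim=NetworkDims, kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(128, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(64, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(64, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(32, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model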

Why does increasing features lead to worse neural network performance?

I have a regression problem and configured a multi-layered neural network using Keras. The original dataset had 286 features, and using 20 epochs, the NN converged to an MSE loss of ~0.0009. This is using the Adam optimizer.
I then added three more features, and using the same configuration, the NN won't converge. After 1 epoch, it gets stuck at a loss of 0.003, so significantly worse.
After checking that the new features are represented correctly, I have tried the following with no success:
adjusting number of layers
adjusting number of neurons in each layer
including dropout layers
adjusting the learning rate
Here is my original configuration:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(300, activation='relu', input_dim=training_set.shape[1]))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='Adam', loss='mse')
Anybody have any ideas?