I am trying to run an LSTM in an autoencoder configuration in Google Colab with GPU support, but I get the following warning:
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
The generic GPU kernel is significantly slower than the cuDNN kernel, so I searched for a solution and found the cuDNN kernel requirements for the LSTM layer here (https://keras.io/api/layers/recurrent_layers/lstm/). Specifically:
activation == tanh
recurrent_activation == sigmoid
recurrent_dropout == 0
unroll is False
use_bias is True
Inputs, if use masking, are strictly right-padded.
Eager execution is enabled in the outermost context.
I thought I had found my problem in the activation function, which was 'relu'; the other cuDNN requirements are TF defaults. Here is the updated code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(200, activation='tanh', input_shape=(n_timesteps, n_features)))
model.add(RepeatVector(n_outputs))
model.add(LSTM(200, activation='tanh', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
But upon execution I get the same warning that the cuDNN requirements are not met. What's wrong?
TF: 2.3.0
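As a sanity check, here is a minimal standalone LSTM with every argument left at its default (the defaults already satisfy the criteria listed above), run on random dummy data. This is only a sketch with made-up shapes, but if it prints the same warning, the problem is presumably the environment rather than the layer configuration:

import numpy as np
import tensorflow as tf

# Dummy data: 32 samples, 10 timesteps, 8 features (placeholder shapes)
x = np.random.rand(32, 10, 8).astype('float32')
y = np.random.rand(32, 1).astype('float32')

# All arguments at their defaults: tanh activation, sigmoid recurrent_activation,
# recurrent_dropout=0, unroll=False, use_bias=True
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(200, input_shape=(10, 8)),
    tf.keras.layers.Dense(1),
])
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, epochs=1, batch_size=16, verbose=1)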
Related
I want to plot my model using the keras.utils.plot_model function. My problem is that when I plot the model, the input and output shapes are not placed on top of each other but instead end up side by side (like figure 1).
Here is the code to plot this model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model

model = tf.keras.models.Sequential()
model.add(layers.Embedding(100, 128, input_length=45,
                           input_shape=(45,), name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=False)
But I would like the model plot to look like figure 2, which is the typical figure found on the internet and which I have created many times before.
I couldn't find any figsize or fontsize option in plot_model to try changing. I am using a Google Colaboratory notebook.
Any help is appreciated.
I also had the same issue, and I finally found this GitHub link:
github
This problem seems to happen because we're using tensorflow 2.8.0.
As mentioned in the link, one valid solution is to change the tensorflow version, for example to tf-nightly.
[tensorflow ver2.8.0]
import tensorflow as tf
tf.__version__   # '2.8.0'

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[1], name="input_layer")
], name="model_1")
model.compile(...)
[tensorflow nightly]
!pip --quiet install tf-nightly  # try not to use tf ver2.8
import tensorflow as tf
tf.__version__   # '2.10.0-dev20220403'

# just do the same thing as above
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[1], name="input_layer")
], name="model_1")
model.compile(...)
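The plotting call itself is unchanged in both cases; something like the following (a sketch, with a hypothetical output filename) should then stack the input and output shapes on top of each other again:

from tensorflow.keras.utils import plot_model

# 'model_plot.png' is just an example filename
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=False)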
I hope you solve this problem.
It is easy, but a Sequential model is more easily managed.
What are the embedding layers and dataset buffers? They are batches of input, and you manage the combination or the number of batches.
(Drawing the graphs in MS Word or another drawing tool is faster; I use a free office suite when studying.)
[ Codes ]:
import tensorflow as tf
from tensorflow.keras.utils import plot_model
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(100,), dtype='int32', name='input'),
    tf.keras.layers.Embedding(output_dim=512, input_dim=100, input_length=100),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid', name='output'),
])
dot_img_file = 'F:\\temp\\Python\\img\\001.png'
tf.keras.utils.plot_model(model, to_file=dot_img_file, show_shapes=True)
# <IPython.core.display.Image object>
input('...')
[ Output ]:
F:\temp\Python>python test_tf_plotgraph.py
2022-03-28 14:21:26.043715: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-28 14:21:26.645113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4565 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
...
...
I implemented a local version of an RNN and a Colab TPU version of an RNN (code below). When I execute the Colab TPU version, the training speed is very slow, much like my local version running on my laptop's CPU.
Does Colab TPU support RNN networks?
Am I missing something here?
import tensorflow as tf
import os
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = Sequential()
    model.add(SimpleRNN(units=32, input_shape=(1, step), activation="relu"))
    model.add(Dense(16, activation="relu"))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='rmsprop')

model.fit(X, y, epochs=50, batch_size=16, verbose=0)
Ctrl-F this page for "RNN". It seems like it should work if you can make the RNN static enough.
In general, dynamic operations don't work well with TPUs, since the model graph has to be recompiled for each new input shape it encounters.
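For example, one way to keep the shapes static (a sketch only, reusing the X, y and model from the question) is to feed the model through tf.data with drop_remainder=True, so every batch the TPU sees has exactly the same shape:

import tensorflow as tf

# drop_remainder=True discards the final partial batch, so the graph
# compiled for the TPU never has to handle a different batch size
dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(1024)
    .batch(16, drop_remainder=True)
)

model.fit(dataset, epochs=50, verbose=0)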
I'm very new to Machine Learning.
I found this example on GitHub: Code
I implemented the q_learning_keras function and changed the Keras model to a tf.keras model.
With the Keras model, 1000 episodes take about 30 minutes.
With the tf.keras model, 1000 episodes take about 110 minutes.
The tf.keras variant also needs about 5-7% less CPU power.
I'm using Tensorflow version 1.13.1 and Keras version 2.2.4.
I'm running the code on a virtual machine with 4 cores @ 2.6 GHz.
Keras model:
from keras.models import Sequential
from keras.layers import InputLayer, Dense

model = Sequential()
model.add(InputLayer(batch_input_shape=(1, 5)))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(2, activation='linear'))
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
tf.keras model:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(batch_input_shape=(1, 5)),
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(2, activation='linear')
])
The rest of the code is exactly the same in both, as in the GitHub example linked above.
In most other posts I have found exactly the opposite problem, namely that Keras was slower than tf.keras. So I had hoped to get an increase in speed, or at least for it to stay the same.
So I think I may have made a mistake, but I can't find it.
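To narrow it down, I could time a single predict call on each model (a sketch below, using a dummy all-zero state of the same 1x5 shape), since the q_learning_keras loop calls predict and fit once per step and any per-call overhead would accumulate over 1000 episodes:

import time
import numpy as np

state = np.zeros((1, 5))  # dummy input matching batch_input_shape=(1, 5)

start = time.time()
for _ in range(1000):
    model.predict(state)
print('seconds for 1000 predict calls:', time.time() - start)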
Edit:
I did some more research with TensorBoard to see what the models look like.
The models look very similar in most respects, but the tf.keras model has some overhead. I don't know whether this is normal for a tf.keras model or whether it comes from the way I build the model.
tf.keras model
keras model
I've written a 5-layer dense network in Keras 1.2, using tensorflow-gpu as the backend, and I train it on my MacBook Pro (CPU) and on a p2.xlarge instance on AWS (K80, CUDA-enabled). Surprisingly, my MacBook Pro trains the model faster than the P2 instance. I've checked that the model is trained using the GPU on the P2 instance, so I wonder: why does it run slower?
Here is the network:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import metrics

model = Sequential()
model.add(Dense(250, input_dim=input_dim, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(130, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(50, init='normal', activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, init='normal', activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, init='normal'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[metrics.mae])
model.fit(x=X, y=Y, batch_size=batch, nb_epoch=epochs, shuffle=True, validation_data=(X_test, Y_test), verbose=2)
Thanks,
Alex.
I ran into a similar problem with a small network, and discovered that the wall-clock time was largely due to CPU computation and data transfer between the CPU and the GPU; specifically, the data transfer time was larger than the gain from doing the computation on the GPU instead of the CPU.
Without data to test on, my assumption is that, similarly, your network is too small to exploit the true power of the GPU, and the reason you're seeing longer training times on the GPU is that the network spends more time transferring data between the GPU and the CPU than it gains from doing the computation on the GPU.
Have you tried a noticeably larger network?
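As an illustration only (a sketch in the question's Keras 1.2 style, with made-up layer widths), a much wider variant like the one below should give the K80 enough arithmetic per batch to start amortising the CPU-GPU transfers; increasing the batch size has a similar effect, since a few large transfers are cheaper than many small ones:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Same shape of network as in the question, but roughly 10x wider
model = Sequential()
model.add(Dense(2500, input_dim=input_dim, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1300, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(500, init='normal', activation='relu'))
model.add(Dense(1, init='normal'))
model.compile(loss='mean_squared_error', optimizer='adam')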
I am using the CudnnGRU class from tensorflow.contrib.cudnn_rnn, and the training speed is much faster. However, after training I need to move the model to a system which is not CUDA-based. So how can I convert the CudnnGRU params to normal weights and biases, and then load them into tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell?
In TensorFlow 2, both CuDNNGRU and the normal TensorFlow-based GRU have been brought together into a single layer, tf.keras.layers.GRU.
Based on the available runtime hardware and constraints, the layer will choose either the cuDNN or the TensorFlow-based implementation.
If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use the fast cuDNN implementation; a minimal sketch follows the list below.
The requirements to use the cuDNN implementation are:
activation == tanh
recurrent_activation == sigmoid
recurrent_dropout == 0
unroll is False
use_bias is True
reset_after is True
Inputs, if use masking, are strictly right-padded.
Eager execution is enabled in the outermost context.
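Putting this together, a minimal sketch (with made-up shapes and a hypothetical weights filename) could look like the following; since the defaults of tf.keras.layers.GRU already meet the criteria above, the same architecture can be trained with the cuDNN kernel on a GPU machine and its saved weights loaded on a CUDA-free machine, where the layer falls back to the generic implementation:

import tensorflow as tf

# Defaults (tanh, sigmoid, recurrent_dropout=0, unroll=False,
# use_bias=True, reset_after=True) satisfy the cuDNN requirements
model = tf.keras.Sequential([
    tf.keras.layers.GRU(64, input_shape=(None, 32)),
    tf.keras.layers.Dense(1),
])
model.compile(loss='mse', optimizer='adam')

# After training on the GPU machine:
model.save_weights('gru_weights.h5')

# On the non-CUDA machine, rebuild the same model and:
# model.load_weights('gru_weights.h5')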