Converting GRU layer from PyTorch to TensorFlow

I am trying to convert the following GRU layer from PyTorch (1.9.1) to TensorFlow (2.6.0):
# GRU layer
self.gru = nn.GRU(64, 32, bidirectional=True, num_layers=2, dropout=0.25, batch_first=True)
I am unsure about my current implementation, especially regarding the conversion of the parameters bidirectional and num_layers. My current reconstruction is the following:
# GRU Layer
model.add(Bidirectional(GRU(32, return_sequences=True, dropout=0.25, time_major=False)))
model.add(Bidirectional(GRU(32, return_sequences=True, dropout=0.25, time_major=False)))
Am I missing something? Thanks for your help in advance!

Yes, these two models are equivalent, at least in terms of the number of parameters and the output shape:
In PyTorch:
import torch
model = torch.nn.Sequential(torch.nn.GRU(64, 32, bidirectional=True, num_layers=2, dropout=0.25, batch_first=True))
from torchinfo import summary
batch_size = 16
summary(model, input_size=(batch_size, 100, 64))
> ==========================================================================================
> Layer (type:depth-idx)                   Output Shape              Param #
> ==========================================================================================
> Sequential                               --                        --
> ├─GRU: 1-1                               [16, 100, 64]             37,632
> ==========================================================================================
> Total params: 37,632
> Trainable params: 37,632
> Non-trainable params: 0
> Total mult-adds (M): 60.21
> ==========================================================================================
> Input size (MB): 0.41
> Forward/backward pass size (MB): 0.82
> Params size (MB): 0.15
> Estimated Total Size (MB): 1.38
> ==========================================================================================
In TensorFlow:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Bidirectional, GRU
# GRU Layer
model = Sequential()
model.add(Bidirectional(GRU(32, return_sequences=True, dropout=0.25, time_major=False)))
model.add(Bidirectional(GRU(32, return_sequences=True, dropout=0.25, time_major=False)))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse')
# run one forward pass so the model is built before calling summary()
a = model(tf.random.normal(shape=(16, 100, 64)))
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_8 (Bidirection (16, 100, 64) 18816
_________________________________________________________________
bidirectional_9 (Bidirection (16, 100, 64) 18816
=================================================================
Total params: 37,632
Trainable params: 37,632
Non-trainable params: 0
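The matching totals can also be checked by hand. The following is a minimal sketch, assuming each gate keeps separate input and recurrent bias vectors (the PyTorch layout, and what Keras uses with its default reset_after=True); the helper name gru_params is hypothetical and not part of either library:

```python
def gru_params(input_size, hidden_size, bidirectional=False):
    """Parameter count of one GRU layer with two bias vectors per gate."""
    gates = 3  # update, reset and candidate gates
    per_direction = gates * (
        input_size * hidden_size     # input-to-hidden weights
        + hidden_size * hidden_size  # hidden-to-hidden weights
        + 2 * hidden_size            # input bias + recurrent bias
    )
    return per_direction * (2 if bidirectional else 1)

# Layer 1 sees the 64 input features; layer 2 sees the concatenated
# forward/backward outputs of layer 1 (2 * 32 = 64 features).
layer1 = gru_params(64, 32, bidirectional=True)
layer2 = gru_params(2 * 32, 32, bidirectional=True)
print(layer1, layer2, layer1 + layer2)  # 18816 18816 37632
```

Equal parameter counts do not make the two models numerically identical. One difference worth checking is dropout placement: PyTorch's dropout argument acts on the outputs of every GRU layer except the last, whereas the Keras dropout argument acts on the inputs of the layer it is set on, so the first Keras layer here also drops input features.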

Related

Why does TensorFlow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(2, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2 dimensional matrix. But when I see the model summary, I see:
Model: "sequential"
_________________________________________________________________
2021-08-23 13:32:18.716788: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-TLG9US3
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3 Dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1). I cannot reshape the array with -1 in the first dimension.
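The input_shape argument describes a single sample and does not include the batch size; Keras prepends the batch dimension itself, which is shown as None in the summary. With input_shape=(10, 10) in your first layer, the model therefore expects batches of 10x10 samples, i.e. arrays of shape (None, 10, 10). Note that you only need input_shape on the FIRST layer of your model, so remove input_shape=(10, 10) from your second layer.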
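To see where the shapes and parameter counts in the summary come from, here is a small sketch in plain Python. It assumes the usual Dense behaviour (the layer acts on the last axis and leaves the other axes untouched); dense_summary is a hypothetical helper, not a Keras function:

```python
def dense_summary(input_shape, units):
    """Return (output_shape, param_count) for a Dense layer.

    input_shape describes one sample, without the batch dimension.
    """
    in_features = input_shape[-1]
    # The batch dimension is prepended and shown as None in the summary.
    output_shape = (None,) + tuple(input_shape[:-1]) + (units,)
    params = in_features * units + units  # kernel + bias
    return output_shape, params

print(dense_summary((10, 10), 1))  # ((None, 10, 1), 11)
print(dense_summary((10,), 1))     # ((None, 1), 11)
```

If each row of the (10, 10) matrix is meant to be one sample with 10 features, input_shape=(10,) is probably what was intended; the array of shape (10, 10) can then be passed as is, with the first axis acting as the batch.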

Convert keras model to pytorch

Is there an easy way to convert a model like this from Keras to PyTorch?
I have the following Keras code:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
state_dim = 10
architecture = (256, 256) # units per layer
learning_rate = 0.0001 # learning rate
l2_reg = 0.00000001 # L2 regularization
trainable = True
num_actions = 3
layers = []
n = len(architecture) # n = 2
for i, units in enumerate(architecture, 1):
    layers.append(Dense(units=units,
                        input_dim=state_dim if i == 1 else None,
                        activation='relu',
                        kernel_regularizer=l2(l2_reg),
                        name=f'Dense_{i}',
                        trainable=trainable))
layers.append(Dropout(.1))
layers.append(Dense(units=num_actions,
                    trainable=trainable,
                    name='Output'))
model = Sequential(layers)
model.compile(loss='mean_squared_error',
              optimizer=Adam(lr=learning_rate))
This produces the following summary:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Dense_1 (Dense) (None, 256) 2816
_________________________________________________________________
Dense_2 (Dense) (None, 256) 65792
_________________________________________________________________
dropout_3 (Dropout) (None, 256) 0
_________________________________________________________________
Output (Dense) (None, 3) 771
=================================================================
Total params: 69,379
Trainable params: 69,379
Non-trainable params: 0
_________________________________________________________________
None
I must admit, I'm a little out of my depth so any advice is appreciated. I'm trying to read through the pytorch docs and will update my question with a possible answer if I manage.
Here is my best attempt:
state_dim = 10
architecture = (256, 256) # units per layer
learning_rate = 0.0001 # learning rate
l2_reg = 0.00000001 # L2 regularization
trainable = True
num_actions = 3
import torch
from torch import nn
class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, architecture[0]),
            nn.ReLU(),
            nn.Linear(architecture[0], architecture[1]),
            nn.ReLU(),
            nn.Dropout(0.1),  # same rate as Dropout(.1) in the Keras model
            nn.Linear(architecture[1], num_actions),
        )
    def forward(self, x):
        return self.layers(x)
model = CustomModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
It prints a promising-looking summary:
CustomModel(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Dropout(p=0.1, inplace=False)
    (5): Linear(in_features=256, out_features=3, bias=True)
  )
)
However, a few questions remain unanswered:
Are the activations in the right place?
How do we add kernel_regularizer=l2(l2_reg) to the first two Linear/Dense layers?
And how do we make the layers trainable?
Any input is appreciated.

Transfer learning for video classification

How can I use pre-trained models to train a video classification model? My dataset has shape (4000, 10, 150, 150, 1), and I am trying to do human action recognition with Conv2D layers wrapped in TimeDistributed.
I can train without transfer learning, but I get poor accuracy.
What I have tried:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
include_top=False,
input_shape=(150, 150, 3))
model = models.Sequential()
model.add(conv_base)
model.add(TimeDistributed(Conv2D(96, (3, 3), padding='same',
input_shape=x_train.shape[1:])))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(128, (3, 3))))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.35)))
.
.
.
.
But I got ValueError: strides should be of length 1, 1 or 3 but was 2
Does anyone have any idea?
I'm assuming you have 10 frames for each video. Below is a simple model that extracts VGG16 features (followed by GlobalAveragePooling2D) for each frame and uses an LSTM to classify the frame sequence.
You can experiment by adding a few more layers or changing hyperparameters.
N.B.: There are several inconsistencies in your model, including passing 5-D data directly to VGG16, which expects 4-D data.
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import VGG16
IMG_SIZE = (150, 150, 3)
num_class = 3
def create_base():
    # VGG16 without its classifier head, followed by global average pooling,
    # so that each frame is mapped to a 512-dimensional feature vector
    conv_base = VGG16(weights='imagenet',
                      include_top=False,
                      input_shape=IMG_SIZE)
    x = GlobalAveragePooling2D()(conv_base.output)
    base_model = Model(conv_base.input, x)
    return base_model
conv_base = create_base()
ip = Input(shape=(10,150,150,3))  # 10 frames per video
t_conv = TimeDistributed(conv_base)(ip) # vgg16 feature extractor
t_lstm = LSTM(10, return_sequences=False)(t_conv)
f_softmax = Dense(num_class, activation='softmax')(t_lstm)
model = Model(ip, f_softmax)
model.summary()
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_32 (InputLayer) [(None, 10, 150, 150, 3)] 0
_________________________________________________________________
time_distributed_4 (TimeDist (None, 10, 512) 14714688
_________________________________________________________________
lstm_1 (LSTM) (None, 10) 20920
_________________________________________________________________
dense (Dense) (None, 3) 33
=================================================================
Total params: 14,735,641
Trainable params: 14,735,641
Non-trainable params: 0
________________________
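The numbers in this summary can be reproduced by hand, which also shows how the tensors flow: TimeDistributed applies the feature extractor to each of the 10 frames, giving (None, 10, 512), and the LSTM reduces that sequence to a single 10-dimensional vector. This is a rough sketch only; lstm_params and dense_params are hypothetical helper names, and the VGG16 count is copied from the summary rather than derived:

```python
def lstm_params(input_size, units):
    """Parameter count of an LSTM layer with one bias vector per gate."""
    gates = 4  # input, forget, cell candidate and output gates
    return gates * (input_size * units + units * units + units)

def dense_params(num_inputs, num_units):
    """Parameter count of a Dense layer: kernel plus bias."""
    return num_inputs * num_units + num_units

vgg16_params = 14_714_688      # copied from the summary above
lstm = lstm_params(512, 10)    # 512 VGG16 features per frame
head = dense_params(10, 3)     # 3-way softmax on the final LSTM output
print(lstm, head, vgg16_params + lstm + head)  # 20920 33 14735641
```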

How to compute number of weights of CNN?

How can we compute the number of weights of the following convolutional neural network, which is used to classify images into two classes?
INPUT: 100x100 gray-scale images.
LAYER 1: Convolutional layer with 60 7x7 convolutional filters (stride=1, valid padding).
LAYER 2: Convolutional layer with 100 5x5 convolutional filters (stride=1, valid padding).
LAYER 3: A max pooling layer that down-samples Layer 2 by a factor of 4 (e.g., from 500x500 to 250x250)
LAYER 4: Dense layer with 250 units
LAYER 5: Dense layer with 200 units
LAYER 6: Single output unit
Assume the existence of biases in each layer. Moreover, assume the pooling layer has a weight (similar to AlexNet).
How many weights does this network have?
Some Keras code
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
# Layer 1
model.add(Conv2D(60, (7, 7), input_shape = (100, 100, 1), padding="valid", activation="relu"))
# Layer 2
model.add(Conv2D(100, (5, 5), padding="valid", activation="relu"))
# Layer 3
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the feature maps before the dense layers
model.add(Flatten())
# Layer 4
model.add(Dense(250))
# Layer 5
model.add(Dense(200))
# Layer 6
model.add(Dense(1))
model.summary()
TL;DR - For TensorFlow + Keras
Use Sequential.summary, which is described in the Keras documentation.
Example usage:
from tensorflow.keras.models import Sequential
model = Sequential([
    # Your architecture here
])
model.summary()
The output for your architecture is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 94, 94, 60) 3000
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 100) 150100
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 45, 45, 100) 0
_________________________________________________________________
flatten (Flatten) (None, 202500) 0
_________________________________________________________________
dense (Dense) (None, 250) 50625250
_________________________________________________________________
dense_1 (Dense) (None, 200) 50200
_________________________________________________________________
dense_2 (Dense) (None, 1) 201
=================================================================
Total params: 50,828,751
Trainable params: 50,828,751
Non-trainable params: 0
_________________________________________________________________
That's 50,828,751 parameters.
Explanation
Number of weights in a 2D Convolutional layer
For a 2D Convolutional layer having
num_filters filters,
a filter size of filter_size * filter_size * num_channels,
and a bias parameter per filter
The number of weights is: (num_filters * filter_size * filter_size * num_channels) + num_filters
E.g.: LAYER 1 in your neural network has
60 filters
and a filter size of 7 * 7 * 1. (Notice that the number of channels (1) comes from the input image.)
The number of weights in it is: (60 * 7 * 7 * 1) + 60, which is 3000.
Number of weights in a Dense layer
For a Dense layer having
num_units neurons,
num_inputs neurons in the layer prior to it,
and a bias parameter per neuron
The number of weights is: (num_units * num_inputs) + num_units
E.g. LAYER 5 in your neural network has
200 neurons
and the layer prior to it - LAYER 4 - has 250 neurons.
The number of weights in it is (200 * 250) + 200, which is 50200.
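The two formulas can be applied layer by layer to reproduce the total reported by the summary. This is a sketch under the same assumptions as that summary: stride 1, valid padding, a 2x2 max pooling layer without parameters, and a flatten step before the first dense layer. The helper names are illustrative only:

```python
def conv2d_params(num_filters, filter_size, num_channels):
    """Weights plus one bias per filter."""
    return num_filters * filter_size * filter_size * num_channels + num_filters

def dense_params(num_units, num_inputs):
    """Weights plus one bias per neuron."""
    return num_units * num_inputs + num_units

def valid_conv_size(size, filter_size):
    """Spatial size after a stride-1 convolution with valid padding."""
    return size - filter_size + 1

size = 100                            # 100x100 gray-scale input
layer1 = conv2d_params(60, 7, 1)      # 3000
size = valid_conv_size(size, 7)       # 94
layer2 = conv2d_params(100, 5, 60)    # 150100
size = valid_conv_size(size, 5)       # 90
size //= 2                            # 2x2 max pooling -> 45, no parameters
flat = size * size * 100              # 202500 inputs to the first dense layer
layer4 = dense_params(250, flat)      # 50625250
layer5 = dense_params(200, 250)       # 50200
layer6 = dense_params(1, 200)         # 201
total = layer1 + layer2 + layer4 + layer5 + layer6
print(total)  # 50828751
```

Almost all of the parameters sit in the first dense layer, because it is connected to every one of the 202,500 flattened activations.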

Tensorflow with Keras: ValueError - expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)

I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D
# add the desktop to our path so we can access the data
os.path.join("C:\\Users\\user\\Desktop")
# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
delimiter=',')
# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2) # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]
# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]
# set parameters
batch_size = 20;
# define model
model = Sequential()
model.add(Conv1D(filters=20,
kernel_size=5,
input_shape=(9, 1),
padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
optimizer='sgd',
metrics=['accuracy'])
# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)
# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_43 (Conv1D) (None, 9, 20) 120
_________________________________________________________________
flatten_31 (Flatten) (None, 180) 0
_________________________________________________________________
dropout_14 (Dropout) (None, 180) 0
_________________________________________________________________
dense_83 (Dense) (None, 10) 1810
_________________________________________________________________
activation_29 (Activation) (None, 10) 0
_________________________________________________________________
dense_84 (Dense) (None, 1) 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to be formatting the data, or maybe a proper way to stack these layers, I would love to know.
I suspect you need to squeeze the channel dimension from the output, i.e. the labels have shape (batch_size, 9) and you're comparing them against the output of a dense layer with 1 channel, which has shape (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
optimizer='sgd',
metrics=['accuracy'])
A note on squeeze vs. Flatten: in this case, the result of squeezing (removing an axis of size 1) and flattening (turning something of shape (batch_size, n, m, ...) into shape (batch_size, n*m*...)) will be the same. Squeeze might be slightly more appropriate here, since if you accidentally squeeze an axis whose size is not 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use Keras much, though, and couldn't find a 'Squeeze' layer, just a squeeze function, and I'm not entirely sure how to integrate it.
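The difference between the two operations can be illustrated at the level of shapes alone. The sketch below uses plain Python tuples instead of tensors; squeeze_shape and flatten_shape are hypothetical helpers that only mimic the shape behaviour described above:

```python
from math import prod

def squeeze_shape(shape, axis):
    """Shape after removing `axis`; fails unless that axis has size 1."""
    if shape[axis] != 1:
        raise ValueError(f"cannot squeeze axis {axis} of size {shape[axis]}")
    axis = axis % len(shape)  # normalise negative axes
    return shape[:axis] + shape[axis + 1:]

def flatten_shape(shape):
    """Shape after flattening everything except the batch axis."""
    return (shape[0], prod(shape[1:]))

print(squeeze_shape((20, 9, 1), -1))  # (20, 9)
print(flatten_shape((20, 9, 1)))      # (20, 9)
print(flatten_shape((20, 9, 2)))      # (20, 18): runs, but mixes two axes
# squeeze_shape((20, 9, 2), -1) raises ValueError instead
```

In Keras, one possible way to use the squeeze function inside a Sequential model is to wrap it in a Lambda layer, although whether that is preferable to Flatten here is a judgment call.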