I am wondering whether I succeeded in translating the following PyTorch definition to Keras.
In PyTorch, the following multi-layer perceptron was defined:
from torch import nn

hidden = 128

def mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(
        nn.Linear(size_in, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, size_out),
    )
My translation is:
from tensorflow import keras
from keras import layers

hidden = 128

def mlp(size_in, size_out, act=keras.layers.ReLU):
    return keras.Sequential(
        [
            layers.Dense(hidden, activation=None, name="layer1", input_shape=(size_in, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer2", input_shape=(hidden, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer3", input_shape=(hidden, 1)),
            act(),
            layers.Dense(size_out, activation=None, name="layer4", input_shape=(hidden, 1)),
        ])
I am particularly confused about the input/output arguments, because that seems to be where tensorflow and PyTorch differ.
From the documentation:
When a popular kwarg input_shape is passed, then keras will create an
input layer to insert before the current layer. This can be treated
equivalent to explicitly defining an InputLayer.
So, did I get it right?
In Keras, you can provide an input_shape for the first layer, or alternatively use a tf.keras.layers.Input layer. If you provide neither, the model gets built the first time you call fit, evaluate, or predict, or the first time you call the model on some input data, so the input shape is inferred at that point. See the docs for more details. PyTorch, in contrast, generally infers input shapes at runtime.
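For instance, a minimal sketch (assuming TF 2.x) of this deferred build:

import tensorflow as tf
from tensorflow.keras import layers

m = tf.keras.Sequential([layers.Dense(4)])
# no weights exist yet; the first call builds the kernel from the input's last axis
m(tf.ones((2, 7)))
print(m.layers[0].kernel.shape)  # (7, 4)

With that in mind, here is a direct translation that uses an explicit Input layer: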
def keras_mlp(size_in, size_out, act=layers.ReLU):
    return keras.Sequential([layers.Input(shape=(size_in,)),
                             layers.Dense(hidden, name='layer1'),
                             act(),
                             layers.Dense(hidden, name='layer2'),
                             act(),
                             layers.Dense(hidden, name='layer3'),
                             act(),
                             layers.Dense(size_out, name='layer4')])
def pytorch_mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(nn.Linear(size_in, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, size_out))
You can compare their summaries.
For Keras:
>>> keras_mlp(10, 5).summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
layer1 (Dense)               (None, 128)               1408
re_lu_6 (ReLU)               (None, 128)               0
layer2 (Dense)               (None, 128)               16512
re_lu_7 (ReLU)               (None, 128)               0
layer3 (Dense)               (None, 128)               16512
re_lu_8 (ReLU)               (None, 128)               0
layer4 (Dense)               (None, 5)                 645
=================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
_________________________________________________________________
For PyTorch (the summary here comes from the torchinfo package):
>>> from torchinfo import summary
>>> summary(pytorch_mlp(10, 5), (1, 10))
============================================================================
Layer (type:depth-idx)                   Output Shape              Param #
============================================================================
Sequential                               [1, 5]                    --
├─Linear: 1-1                            [1, 128]                  1,408
├─ReLU: 1-2                              [1, 128]                  --
├─Linear: 1-3                            [1, 128]                  16,512
├─ReLU: 1-4                              [1, 128]                  --
├─Linear: 1-5                            [1, 128]                  16,512
├─ReLU: 1-6                              [1, 128]                  --
├─Linear: 1-7                            [1, 5]                    645
============================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
Total mult-adds (M): 0.04
============================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.14
Estimated Total Size (MB): 0.14
============================================================================
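Both summaries report the same 35,077 parameters: (10*128 + 128) + 2*(128*128 + 128) + (128*5 + 5) = 1408 + 16512 + 16512 + 645 = 35,077, confirming the two definitions match layer for layer.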
Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras

TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10, 10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10, 10)), dtype=TFDataType)

model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))

print(model.summary())
I am feeding it a 2-dimensional matrix, but when I look at the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 10, 1)             11
_________________________________________________________________
dense_1 (Dense)              (None, 10, 2)             4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1), and I cannot reshape the array with -1 in the first dimension.
When you give the first layer input_shape=(10, 10), Keras treats each sample as a 10x10 matrix and prepends the batch dimension, so the inputs have shape (None, 10, 10); a Dense layer then acts only on the last axis, which is why the output is (None, 10, 1). If each sample is really a flat vector of 10 features, use input_shape=(10,) instead. Also note that you only need this argument for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.
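A minimal sketch of that fix (assuming each of your 10 rows is meant to be one sample with 10 features, so the targets also become one value per sample):

import tensorflow as tf
from tensorflow.keras import layers

TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10, 10)), dtype=TFDataType)  # 10 samples, 10 features each
YTrain = tf.cast(tf.ones((10, 1)), dtype=TFDataType)

model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10,)))  # per-sample shape
print(model.summary())  # output shape is now (None, 1)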
I don't know if it is feasible but I'm asking just in case. Here is the (simplified) architecture of my model.
Layer (type)          Output Shape          Param #   Connected to
===================================================================
input_1 (InputLayer)  [(None, 7, 7, 1024)]  0
conv (Conv2D)         (None, 7, 7, 10)      10240     input_1[0][0]
where each of the 10 filters in "conv" is a 1x1x1024 convolutional filter (with no bias but it's irrelevant for this particular issue).
I am currently using a custom regularization function on "conv" to make sure that the (1x1)x1024x10 matrix of filter weights has a nice property (basically that all vectors are pairwise orthogonal) and so far, everything is working as expected.
Now I also want the ability to disable training on some of these 10 filters. The only way I know how to do that would be to implement 10 filters independently as follows
Layer (type)          Output Shape          Param #   Connected to
===================================================================
input_1 (InputLayer)  [(None, 7, 7, 1024)]  0
conv_1 (Conv2D)       (None, 7, 7, 1)       1024      input_1[0][0]
conv_2 (Conv2D)       (None, 7, 7, 1)       1024      input_1[0][0]
conv_3 (Conv2D)       (None, 7, 7, 1)       1024      input_1[0][0]
...
conv_10 (Conv2D)      (None, 7, 7, 1)       1024      input_1[0][0]
followed by a Concatenate layer, and then setting the "trainable" parameter to True/False on each conv_i layer as I see fit. However, now I don't know how to implement my regularization function, which must be computed on the weights of all the conv_i layers simultaneously rather than independently.
Is there a trick that I can use to implement such function? Or conversely, is there a way to freeze only part of the weights of a convolutional layer?
Thanks!
Solution
For those interested, here is the working code for my problem following the advice provided by @LaplaceRicky.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

class SpecialRegularization(tf.keras.Model):
    """In order to avoid a warning message when saving the model,
    I use the solution indicated here
    https://github.com/tensorflow/tensorflow/issues/44541
    and now inherit from tf.keras.Model instead of Layer.
    """
    def __init__(self, nfilters, **kwargs):
        super().__init__(**kwargs)
        self.inner_layers = [Conv2D(1, (1, 1)) for _ in range(nfilters)]

    def call(self, inputs):
        outputs = [l(inputs) for l in self.inner_layers]
        self.add_loss(self.define_your_regularization_here())
        return tf.concat(outputs, -1)

    def set_trainable_parts(self, trainables):
        """Set the trainable attribute independently on each filter."""
        for l, t in zip(self.inner_layers, trainables):
            l.trainable = t

    def define_your_regularization_here(self):
        # reconstruct the original kernel
        large_kernel = tf.concat([l.kernel for l in self.inner_layers], -1)
        return tf.reduce_sum(large_kernel * large_kernel[:, :, :, ::-1])
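A hypothetical usage sketch (continuing from the class above; the zeros tensor is only there to build the inner layers):

block = SpecialRegularization(nfilters=10)
_ = block(tf.zeros((1, 7, 7, 1024)))                 # build the ten inner Conv2D layers
block.set_trainable_parts([True] * 5 + [False] * 5)  # freeze the last five filters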
One way to achieve this is to have a custom keras layer that wraps all of the small conv layers and is responsible for computing the regularization loss.
Example code:
import tensorflow as tf

def _get_losses(model, x):
    model(x)
    return model.losses

def _get_grads(model, x):
    with tf.GradientTape() as t:
        model(x)
        reg_loss = tf.math.add_n(model.losses)
    return t.gradient(reg_loss, model.trainable_weights)

class SpecialRegularization(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        self.inner_layers = [tf.keras.layers.Conv2D(1, (1, 1)) for i in range(10)]
        super().__init__(**kwargs)

    def call(self, inputs, training=None):
        outputs = [l(inputs, training=training) for l in self.inner_layers]
        self.add_loss(self.define_your_regularization_here())
        return tf.concat(outputs, -1)

    def define_your_regularization_here(self):
        # reconstruct the original kernel
        large_kernel = tf.concat([l.kernel for l in self.inner_layers], -1)
        # just giving an example here;
        # you should define your own regularization using the entire kernel
        return tf.reduce_sum(large_kernel * large_kernel[:, :, :, ::-1])

tf.random.set_seed(123)
inputs = tf.keras.Input(shape=(7, 7, 1024))
outputs = SpecialRegularization()(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# get_losses and get_grads are for demonstration purposes
get_losses = tf.function(_get_losses)
get_grads = tf.function(_get_grads)
data = tf.random.normal((64, 7, 7, 1024))
print(get_losses(model, data))
print(get_grads(model, data)[0])
print(model.layers[1].inner_layers[-1].kernel * 2)
model.summary()
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=-0.20446025>]
tf.Tensor(
[[[[ 0.02072023]
[ 0.12973154]
[ 0.11631528]
...
[ 0.00804012]
[-0.07299817]
[ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
tf.Tensor(
[[[[ 0.02072023]
[ 0.12973154]
[ 0.11631528]
...
[ 0.00804012]
[-0.07299817]
[ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 7, 7, 1024)]      0
_________________________________________________________________
special_regularization (Spec (None, 7, 7, 10)          10250
=================================================================
Total params: 10,250
Trainable params: 10,250
Non-trainable params: 0
_________________________________________________________________
'''
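A side note (my addition, not part of the original answer): because the penalty is registered with add_loss, it is collected into model.losses and is therefore also included automatically when training through compile/fit, e.g.:

# sketch: the add_loss penalty is added on top of the compiled loss during fit
model.compile(optimizer='adam', loss='mse')
targets = tf.zeros((64, 7, 7, 10))  # dummy targets, for illustration only
model.fit(data, targets, epochs=1, verbose=0)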
I am working on an autoencoder and I have an issue with reproducing the input at the same size. If I use a transposed convolution / deconvolution operation with the same parameters, I get a different output size than the original input. To illustrate the problem, assume the model consists of just one convolution (to encode the input) and one deconvolution (to decode it). However, I do not get the same size as my input: the second and third dimensions (axes 1 and 2) are 16, not 15 as one would expect. Here is the code:
import tensorflow as tf

input = tf.keras.Input(shape=(15, 15, 3), name="Input0")
conv2d_layer2 = tf.keras.layers.Conv2D(filters=32, strides=[2, 2], kernel_size=[3, 3],
                                       padding='same',
                                       activation='selu', name="Conv1")
conv2d_trans_layer2 = tf.keras.layers.Conv2DTranspose(filters=32, strides=[2, 2],
                                                      kernel_size=[3, 3], padding='same',
                                                      activation='selu', name="DeConv1")
x_endcoded_1 = conv2d_layer2(input)
x_reconstructed = conv2d_trans_layer2(x_endcoded_1)
model = tf.keras.Model(inputs=input, outputs=x_reconstructed)
This results in the following model:
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
Input0 (InputLayer)          [(None, 15, 15, 3)]       0
_________________________________________________________________
Conv1 (Conv2D)               (None, 8, 8, 32)          896
_________________________________________________________________
DeConv1 (Conv2DTranspose)    (None, 16, 16, 32)        9248
=================================================================
Total params: 10,144
Trainable params: 10,144
How can I reproduce my original input using just this transposed convolution? Is this possible?
By deleting the padding from both layers, you can reproduce the mapping:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Conv2DTranspose

input = Input(shape=(15, 15, 3), name="Input0")
conv2d_layer2 = Conv2D(filters=32, strides=[2, 2], kernel_size=[3, 3],
                       activation='selu', name="Conv1")(input)
conv2d_trans_layer2 = Conv2DTranspose(filters=32, strides=[2, 2],
                                      kernel_size=[3, 3],
                                      activation='selu', name="DeConv1")(conv2d_layer2)
model = Model(inputs=input, outputs=conv2d_trans_layer2)
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
Input0 (InputLayer)          [(None, 15, 15, 3)]       0
_________________________________________________________________
Conv1 (Conv2D)               (None, 7, 7, 32)          896
_________________________________________________________________
DeConv1 (Conv2DTranspose)    (None, 15, 15, 32)        9248
=================================================================
In general, to do this in deeper structures you have to play with padding, strides, and pooling; a sketch of the shape arithmetic is given after the list below.
There are a lot of good online resources that explain how these operations work and their application in Keras:
Padding and Stride for Convolutional Neural Networks
Pooling Layers for Convolutional Neural Networks
How to use the UpSampling2D and Conv2DTranspose
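For reference, a small sketch of the standard output-shape arithmetic behind the two summaries above (my addition, not part of the original answer):

import math

def conv_out(n, k, s, same):
    # Conv2D output length per spatial axis
    return math.ceil(n / s) if same else (n - k) // s + 1

def conv_transpose_out(n, k, s, same):
    # Conv2DTranspose output length per spatial axis
    return n * s if same else (n - 1) * s + k

print(conv_transpose_out(conv_out(15, 3, 2, same=True), 3, 2, same=True))    # 16 (15 -> 8 -> 16)
print(conv_transpose_out(conv_out(15, 3, 2, same=False), 3, 2, same=False))  # 15 (15 -> 7 -> 15)

With padding='same' and stride 2, the odd size 15 rounds up to 8 and can only come back as 16, which is exactly the mismatch in the question.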
I have created a CNN model using Keras and I am training it on the MNIST dataset. I get a reasonable accuracy of around 98%, which is what I expected:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(64, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(data.x_train, data.y_train,
          batch_size=256, validation_data=(data.x_test, data.y_test))
Now I want to build the same model, but using vanilla Tensorflow, here is how I did that:
import tensorflow as tf

X = tf.placeholder(shape=[None, 784], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")
net = tf.reshape(X, [-1, 28, 28, 1])
net = tf.layers.conv2d(
    net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
    net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dense(net, name="dense1", units=256, activation=tf.nn.relu)
model = tf.layers.dense(net, name="output", units=10)
And here is how I train/test it:
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for batch in range(data.get_number_of_train_batches(batch_size)):
        x, y = data.get_next_train_batch(batch_size)
        sess.run([loss, opt], feed_dict={X: x, Y: y})
    for batch in range(data.get_number_of_test_batches(batch_size)):
        x, y = data.get_next_test_batch(batch_size)
        sess.run(accuracy, feed_dict={X: x, Y: y})
But the resulting accuracy of the model dropped to ~80%. What are the principal differences between my implementations of this model in Keras and in TensorFlow? Why does the accuracy vary so much?
I don't see any mistakes in your code. Note that your current model is heavily parameterized for such a simple problem because of the Dense layers, which introduce over 260k trainable parameters:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 24, 24, 64)        1664
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          102464
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
flatten_2 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               262400
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570
=================================================================
Total params: 369,098
Trainable params: 369,098
Non-trainable params: 0
_________________________________________________________________
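Of those, the 256-node Dense layer alone accounts for 1024 * 256 + 256 = 262,400 parameters.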
Below, I will run your code with:
- minor adaptations to make the code work with the MNIST dataset in keras.datasets
- a simplified model: basically, I remove the 256-node Dense layer, drastically reducing the number of trainable parameters, and introduce some dropout for regularization
With these changes, both models achieve 90%+ validation set accuracy after the first epoch. So it seems the problem you encountered has to do with an ill-posed optimization problem which leads to highly variable outcomes, and not with a bug in your code.
# Import the datasets
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add a channel dimension: (60000, 28, 28) -> (60000, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=None)
y_test = to_categorical(y_test, num_classes=None)

batch_size = 64

# Fit model using Keras
import keras
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.models import Sequential

model = Sequential()
model.add(Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(32, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dropout(0.25))  # the dropout layer shown in the summary below
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=32, validation_data=(x_test, y_test), epochs=1)
Result:
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 35s 583us/step - loss: 1.5217 - acc: 0.8736 - val_loss: 0.0850 - val_acc: 0.9742
Note that the number of trainable parameters is now just a fraction of the amount in your model:
model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 24, 24, 32)        832
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 32)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 32)          25632
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 32)          0
_________________________________________________________________
flatten_2 (Flatten)          (None, 512)               0
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130
=================================================================
Total params: 31,594
Trainable params: 31,594
Non-trainable params: 0
Now, doing the same with TensorFlow:
# Fit model using TensorFlow
import tensorflow as tf

X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")

net = tf.layers.conv2d(
    X, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
    net, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dropout(net, rate=0.25)
model = tf.layers.dense(net, name="output", units=10)

loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()

    L = []
    l_ = 0
    for i in range(x_train.shape[0] // batch_size):
        x, y = x_train[i*batch_size:(i+1)*batch_size], \
               y_train[i*batch_size:(i+1)*batch_size]
        l, _ = sess.run([loss, opt], feed_dict={X: x, Y: y})
        l_ += np.mean(l)
    L.append(l_ / (x_train.shape[0] // batch_size))
    print('Training loss: {:.3f}'.format(L[-1]))

    acc = []
    for j in range(x_test.shape[0] // batch_size):
        x, y = x_test[j*batch_size:(j+1)*batch_size], \
               y_test[j*batch_size:(j+1)*batch_size]
        acc.append(sess.run(accuracy, feed_dict={X: x, Y: y}))
    print('Test set accuracy: {:.3f}'.format(np.mean(acc)))
Result:
Training loss: 0.519
Test set accuracy: 0.968
Possible improvements for your models:
I have used CNNs on different problems and always got good effectiveness improvements with regularization techniques, the best of them being dropout.
I suggest using Dropout on the Dense layers, and possibly with a lower rate on the convolutional ones; a short sketch follows below.
Data augmentation on the input data is also very important, but its applicability depends on the problem domain.
P.S.: In one case I had to change the optimization from Adam to SGD with Momentum, so playing with the optimizer makes sense. Gradient clipping can also be considered when your network stalls and stops improving; it may be a numerical issue.
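For example, a minimal sketch of the kind of dropout placement I mean (the rates are illustrative only):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, 5, activation='relu', input_shape=(28, 28, 1)),
    MaxPool2D(),
    Dropout(0.25),                 # lower rate on the convolutional part
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),                  # heavier dropout on the dense head
    Dense(10, activation='softmax'),
])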
I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D

# add the desktop to our path so we can access the data
os.path.join("C:\\Users\\user\\Desktop")

# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
                     delimiter=',')

# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2)  # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]

# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]

# set parameters
batch_size = 20

# define model
model = Sequential()
model.add(Conv1D(filters=20,
                 kernel_size=5,
                 input_shape=(9, 1),
                 padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])

# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)

# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_43 (Conv1D)           (None, 9, 20)             120
_________________________________________________________________
flatten_31 (Flatten)         (None, 180)               0
_________________________________________________________________
dropout_14 (Dropout)         (None, 180)               0
_________________________________________________________________
dense_83 (Dense)             (None, 10)                1810
_________________________________________________________________
activation_29 (Activation)   (None, 10)                0
_________________________________________________________________
dense_84 (Dense)             (None, 1)                 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to format the data, or a proper way to stack these layers, I would love to know.
I suspect you need to squeeze the channel dimension from the output, i.e. the labels have shape (batch_size, 9) and you're comparing that against the output of a dense layer with 1 channel, which has shape (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
A note on squeeze vs Flatten: in this case, the result of squeezing (removing an axis of dimension 1) and flattening (collapsing something of shape (batch_size, n, m, ...) into shape (batch_size, n*m*...)) will be the same. Squeeze might be slightly more appropriate here, since if you accidentally squeeze an axis whose dimension is not 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use Keras much, though, and couldn't find a 'Squeeze' layer - just a squeeze function - and I'm not entirely sure how to integrate it.
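If it helps, one possible way to integrate it (my assumption: wrapping the backend squeeze function in a Lambda layer, using the same model as above):

from keras.layers import Lambda
from keras import backend as K

# instead of Flatten() after the final Dense(units=1):
model.add(Lambda(lambda t: K.squeeze(t, axis=-1)))  # (batch_size, 9, 1) -> (batch_size, 9)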