I'm trying to do a CNN 1D for time series.
First issue:
When trying to use an input shape of [1,1] I get an error:
Error: Negative dimension size caused by adding layer average_pooling1d_AveragePooling1D1 with input shape [,0,128]
2nd issue
I have two separate 1-D arrays for my data: the first is the input data containing the time series, and the second contains the output data with the closing values for a stock.
Something that got me to a few more results was to set the input shape to [6,1].
Model summary:
_________________________________________________________________
Layer (type) Output shape Param #
=================================================================
conv1d_Conv1D1 (Conv1D) [null,5,128] 384
_________________________________________________________________
average_pooling1d_AveragePoo [null,4,128] 0
_________________________________________________________________
conv1d_Conv1D2 (Conv1D) [null,3,64] 16448
_________________________________________________________________
average_pooling1d_AveragePoo [null,2,64] 0
_________________________________________________________________
conv1d_Conv1D3 (Conv1D) [null,1,16] 2064
_________________________________________________________________
average_pooling1d_AveragePoo [null,0,16] 0
_________________________________________________________________
flatten_Flatten1 (Flatten) [null,0] 0
_________________________________________________________________
dense_Dense1 (Dense) [null,1] 1
=================================================================
Here training the model got me into issues:
const trainX = tf.tensor1d(data.inTime).reshape([100, 6, 1])
100 - size of my array
6 - features
1 - 1 unit as output
Error: Size(100) must match the product of shape 100,6,1
I'm stuck at the training step because I don't know how to train it.
I would prefer to have a [1,1] input shape, to give only 1 time series and to have 1 output from it.
The model
async function buildModel() {
const model = tf.sequential()
// settings
const kernelSize = 2
const poolSize = [2]
// tf layers
model.add(tf.layers.conv1d({
inputShape: [6, 1],
kernelSize: kernelSize,
filters: 128,
strides: 1,
useBias: true,
activation: 'relu',
kernelInitializer: 'varianceScaling'
}))
model.add(tf.layers.averagePooling1d({poolSize: poolSize, strides: [1]}))
// 2nd layer
model.add(tf.layers.conv1d({
kernelSize: kernelSize,
filters: 64,
strides: 1,
useBias: true,
activation: 'relu',
kernelInitializer: 'varianceScaling'
}))
model.add(tf.layers.averagePooling1d({poolSize: poolSize, strides: [1]}))
model.add(tf.layers.conv1d({
kernelSize: kernelSize,
filters: 16,
strides: 1,
useBias: true,
activation: 'relu',
kernelInitializer: 'varianceScaling'
}))
model.add(tf.layers.averagePooling1d({poolSize: poolSize, strides: [1]}))
model.add(tf.layers.flatten())
model.add(tf.layers.dense({
units: 1,
kernelInitializer: 'VarianceScaling',
activation: 'linear'
}))
// optimizer + learning rate
const optimizer = tf.train.adam(0.0001)
model.compile({
optimizer: optimizer,
loss: 'meanSquaredError',
metrics: ['accuracy'],
})
return model
}
Training where the error is occurring
async function train(model, data) {
console.log(`MODEL SUMMARY:`)
model.summary()
// Train the model
const epochs = 2
// train data size, 28, 28, 1
const trainX = tf.tensor1d(data.inTime).reshape([100, 6, 1])
const trainY = tf.tensor([data.outClosed], [1, data.size, 1])
let result = await model.fit(trainX, trainY, {
epochs: epochs
})
print("Loss after last Epoch (" + result.epoch.length + ") is: " + result.history.loss[result.epoch.length-1])
return result
}
Any ideas into how to fix it will be much appreciated!
According to Wikipedia, a time series is a sequence taken at successive, equally spaced points in time. The goal of a neural network (NN) applied to a time series is to find the pattern across the series of data. Convolutional neural networks (CNN) are used less often on this kind of data than recurrent networks such as RNN and LSTM. If we are interested in finding a pattern in a series of data, the inputShape can't be [1, 1]; that would mean looking for a pattern in a single point. It can be done in theory, but it does not capture the essence of a time series.
The model used here is a CNN with average pooling layers. A pooling layer cannot be applied to an input that is shorter than its pool size, which is what throws the error:
Error: Negative dimension size caused by adding layer average_pooling1d_AveragePooling1D1 with input shape [,0,128]
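The shape arithmetic behind this error can be reproduced by hand. The sketch below is plain Python, not TensorFlow.js. It assumes 'valid' padding (which matches the shapes in the model summary) and the usual output-length formula floor((n - k) / s) + 1; the helper names valid_out_len and trace are made up for illustration.

```python
def valid_out_len(n, kernel, stride=1):
    """Output length of a conv/pool layer with 'valid' padding."""
    return (n - kernel) // stride + 1


def trace(n, layers):
    """Return the sequence length after each (kernel, stride) layer."""
    lengths = []
    for kernel, stride in layers:
        n = valid_out_len(n, kernel, stride)
        lengths.append(n)
    return lengths


# conv1d (kernelSize 2) followed by averagePooling1d (poolSize 2, strides 1), three times
LAYERS = [(2, 1)] * 6

print(trace(6, LAYERS))      # [5, 4, 3, 2, 1, 0]
print(trace(1, LAYERS[:2]))  # [0, -1]
```

With an input length of 6, the last pooling layer is left with length 0. With an input length of 1, the first convolution already outputs length 0, so the pooling layer that follows would need a negative length.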
The last error:
Error: Size(100) must match the product of shape 100,6,1
indicates a mismatch in tensor sizes.
The shape [100, 6, 1] requires 100 * 6 * 1 = 600 elements, whereas the input tensor has only 100 elements, hence the error.
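One way to obtain a [samples, 6, 1] tensor from a flat series is to cut it into overlapping windows. The following is a plain-Python sketch of that idea, not something the question's code does. The helper make_windows, the stand-in data, and the choice of using the value right after each window as the target are all assumptions for illustration.

```python
def make_windows(series, window):
    """Split a flat series into overlapping windows and next-step targets."""
    xs, ys = [], []
    for start in range(len(series) - window):
        # each window has shape [window, 1]: one feature per time step
        xs.append([[v] for v in series[start:start + window]])
        # the target is the value right after the window, shape [1]
        ys.append([series[start + window]])
    return xs, ys


series = list(range(100))  # stand-in for 100 time-series values
xs, ys = make_windows(series, 6)

print(len(xs), len(xs[0]), len(xs[0][0]))  # 94 6 1
print(len(ys), len(ys[0]))                 # 94 1
```

The 94 windows hold 94 * 6 * 1 = 564 values, so the nested list matches the shape [94, 6, 1], and the targets match [94, 1], which is the shape the final dense layer produces.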
I found the parameters used for the MNIST dataset, which are as follows:
# Parameters Based on Paper
epsilon = 1e-7
m_plus = 0.9
m_minus = 0.1
lambda_ = 0.5
alpha = 0.0005
epochs = 3
no_of_secondary_capsules = 10
params = {
"no_of_conv_kernels": 256,
"no_of_primary_capsules": 64,
"no_of_secondary_capsules": 128,
"primary_capsule_vector": 16,
"secondary_capsule_vector": 32,
"r":3,
}
The input shape for MNIST is 28,28,1.
I want to change these parameters for my input data, which has shape 13,9,1,
because when I use the MNIST parameters for the capsule network it throws an error about the shape:
ValueError: Exception encountered when calling layer "primary_caps" (type PrimaryCaps).
in user code:
File "/content/Efficient-CapsNet/utils/layers_hinton.py", line 69, in call *
x = tf.nn.conv2d(inputs, self.kernel, self.s, 'VALID')
ValueError: Negative dimension size caused by subtracting 9 from 5 for '{{node primary_caps/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true](Placeholder, primary_caps/Conv2D/ReadVariableOp)' with input shapes: [?,5,11,256], [9,9,256,256].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 5, 11, 256), dtype=float32)
Can someone suggest parameters for the capsule network?
The data was audio (13,9,1), so converting it to a spectrogram image and then reading it with target size (28,28) helped me work around the issue of using a capsule network for the audio dataset.
This workaround can be used if you want to keep the original hyperparameters and network design from the capsule network paper on dynamic routing.
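To see why a small input fails while 28x28 works, the valid-convolution arithmetic can be applied to the two convolutions in front of the capsules. This is a sketch in plain Python. The 9x9 kernel and stride 2 of the primary-capsule layer come from the error message above; the first 9x9, stride-1 convolution is assumed from the dynamic routing paper and may differ in the repository's code. The function names are made up.

```python
def valid_out_len(n, kernel, stride=1):
    """Output length of a convolution with 'VALID' padding."""
    return (n - kernel) // stride + 1


def after_both_convs(n):
    conv1 = valid_out_len(n, kernel=9, stride=1)        # assumed first conv layer
    primary = valid_out_len(conv1, kernel=9, stride=2)  # primary_caps conv from the error
    return conv1, primary


print(after_both_convs(28))  # (20, 6)
print(after_both_convs(13))  # (5, -1)

# smallest side length that still leaves at least one position after both layers
smallest = next(n for n in range(1, 100) if after_both_convs(n)[1] >= 1)
print(smallest)  # 17
```

A side of 13 leaves only 5 rows after the first convolution, which is the 5 in "subtracting 9 from 5". Under these assumptions each spatial side needs to be at least 17 unless the kernel sizes are reduced, which is why resizing the input to 28x28 avoids the error.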
I'm trying to train an autoencoder, with constraints that force one or more of the hidden/encoded nodes/neurons to have an interpretable value. My training approach uses paired images (though after training the model should operate on a single image) and utilizes a joint loss function that includes (1) the reconstruction loss for each of the images and (2) a comparison between values of the hidden/encoded vector, from each of the two images.
I've created an analogous simple toy problem and model to make this clearer. In the toy problem, the autoencoder is given a vector of length 3 as input. The encoding uses one dense layer to compute the mean (a scalar) and another dense layer to compute some other representation of the vector (given my construction, it will likely just learn an identity matrix, i.e., copy the input vector). See the figure below. The lowest node of the hidden layer is intended to compute the mean of the input vector. The rest of the hidden nodes are unconstrained aside from having to accommodate a reconstruction that matches the input.
The figure below exhibits how I wish to train the model, using paired images. "MSE" is mean-squared-error, although the identity of the actual function is not important for the question I'm asking here. The loss function is the sum of the reconstruction loss and the mean-estimation loss.
I've tried creating (1) a tf.data.Dataset to generate paired vectors, (2) a Keras model, and (3) a custom loss function. However, I'm failing to understand how to do this correctly for this particular situation.
I can't get the Model.fit() to run correctly, and to associate the model outputs with the Dataset targets as intended. See code and errors below. Can anyone help? I've done many Google and stackoverflow searches and still don't understand how I can implement this.
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
DTYPE = tf.dtypes.float32
N_VEC = 3
def my_generator(n):
    while True:
        # Create two identical vectors of length, except with different means.
        # An internal layer (single neuron) of the model should predict the
        # mean of the input vector. To train it to do so, with paired
        # vector inputs, use a loss function that penalizes incorrect
        # predictions of the difference of the means of two input vectors.
        input_vec1 = tf.random.normal((n,), dtype=DTYPE)
        target_mean_diff = tf.random.normal((1,), dtype=DTYPE)
        input_vec2 = input_vec1 + target_mean_diff
        # Model is a constrained autoencoder. Output targets are
        # identical to the input vectors. Including them as explicit
        # targets in this generator, for generalization.
        target_vec1 = tf.identity(input_vec1)
        target_vec2 = tf.identity(input_vec2)
        yield ({'input_vec1':input_vec1,
                'input_vec2':input_vec2},
               {'target_vec1':target_vec1,
                'target_vec2':target_vec2,
                'target_mean_diff':target_mean_diff})
def my_dataset(n, batch_size=4):
    ds = tf.data.Dataset.from_generator(my_generator,
        output_signature=({'input_vec1':tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'input_vec2':tf.TensorSpec(shape=(n,), dtype=DTYPE)},
                          {'target_vec1':tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_vec2':tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_mean_diff':tf.TensorSpec(shape=(1,), dtype=DTYPE)}),
        args=(n,))
    ds = ds.batch(batch_size)
    return ds
## Do a brief test using the Dataset
ds = my_dataset(N_VEC, batch_size=4)
ds_iter = iter(ds)
dict_inputs, dict_targets = next(ds_iter)
print(dict_inputs)
print(dict_targets)
## Define the Model
layer_encode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='encode_vec')
layer_decode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_vec')
layer_encode_mean = tf.keras.layers.Dense(1, activation=None, name='encode_mean')
layer_decode_mean = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_mean')
input1 = tf.keras.Input(shape=(N_VEC,), name='input_vec1')
input2 = tf.keras.Input(shape=(N_VEC,), name='input_vec2')
vec_encoded1 = layer_encode_vec(input1)
vec_encoded2 = layer_encode_vec(input2)
mean_encoded1 = layer_encode_mean(input1)
mean_encoded2 = layer_encode_mean(input2)
mean_diff = mean_encoded2 - mean_encoded1
pred_vec1 = layer_decode_vec(vec_encoded1) + layer_decode_mean(mean_encoded1)
pred_vec2 = layer_decode_vec(vec_encoded2) + layer_decode_mean(mean_encoded2)
model = tf.keras.Model(inputs=[input1, input2], outputs=[pred_vec1, pred_vec2, mean_diff])
print(model.summary())
## Define the joint loss function
def loss_total(y_true, y_pred):
    loss_reconstruct = tf.reduce_mean(tf.keras.MSE(y_true[0], y_pred[0]))/2 + \
                       tf.reduce_mean(tf.keras.MSE(y_true[1], y_pred[1]))/2
    loss_mean = tf.reduce_mean(tf.keras.MSE(y_true[2], y_pred[2]))
    return loss_reconstruct + loss_mean
## Compile model
optimizer = tf.keras.optimizers.Adam(lr=0.01)
model.compile(optimizer=optimizer, loss=loss_total)
## Train model
history = model.fit(x=ds, epochs=10, steps_per_epoch=10)
Output: Example batch from the Dataset:
{'input_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'input_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>}
{'target_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'target_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>, 'target_mean_diff': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[0.29607493],
[0.04156734],
[1.1948274 ],
[0.7257133 ]], dtype=float32)>}
Output: The model summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_vec1 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
input_vec2 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
encode_vec (Dense) (None, 3) 12 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
encode_mean (Dense) (None, 1) 4 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
decode_vec (Dense) (None, 3) 12 encode_vec[0][0]
encode_vec[1][0]
__________________________________________________________________________________________________
decode_mean (Dense) (None, 3) 6 encode_mean[0][0]
encode_mean[1][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 3) 0 decode_vec[0][0]
decode_mean[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 3) 0 decode_vec[1][0]
decode_mean[1][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda) (None, 1) 0 encode_mean[1][0]
encode_mean[0][0]
==================================================================================================
Total params: 34
Trainable params: 34
Non-trainable params: 0
__________________________________________________________________________________________________
Output: The error message when calling model.fit():
Epoch 1/10
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: Found unexpected keys that do not correspond to any
Model output: dict_keys(['target_vec1', 'target_vec2', 'target_mean_diff']).
Expected: ['tf.__operators__.add', 'tf.__operators__.add_1', 'tf.math.subtract']
You can pass a dict to Model for both inputs and outputs like so:
model = tf.keras.Model(
inputs={"input_vec1": input1, "input_vec2": input2},
outputs={
"target_vec1": pred_vec1,
"target_vec2": pred_vec2,
"target_mean_diff": mean_diff,
},
)
which avoids having to name the output layers.
For the losses, it's currently applying loss_total to each of the 3 outputs individually and summing to get the final loss, which is not what you want. So you can either break out each of the losses individually:
model.compile(
optimizer=optimizer,
loss={"target_vec1": "mse", "target_vec2": "mse", "target_mean_diff": "mse"},
loss_weights={"target_vec1": 0.5, "target_vec2": 0.5, "target_mean_diff": 1},
)
or you can manually train the model using a modified loss function that takes dict input. Something like:
def loss_total(y_true, y_pred):
    loss_reconstruct = (
        tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec1"], y_pred["target_vec1"])) / 2
        + tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec2"], y_pred["target_vec2"])) / 2
    )
    loss_mean = tf.reduce_mean(tf.keras.losses.MSE(y_true["target_mean_diff"], y_pred["target_mean_diff"]))
    return loss_reconstruct + loss_mean
for epoch in range(10):
    for batch, (x, y) in zip(range(10), ds):
        with tf.GradientTape() as tape:
            outputs = model(x, training=True)
            loss = loss_total(y, outputs)
        trainable_vars = model.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        optimizer.apply_gradients(zip(gradients, trainable_vars))
        print(f"Batch: {batch}, loss: {loss.numpy()}")
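As a sanity check on the first option, the weighted sum built from the per-output "mse" losses and loss_weights is the same quantity as the hand-written joint loss. The sketch below uses plain Python lists instead of tensors; the numbers are made up, and the helpers mse and weighted_sum are illustrative stand-ins, not Keras functions.

```python
def mse(y_true, y_pred):
    """Mean squared error of two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def loss_total(y_true, y_pred):
    """Joint loss: averaged reconstruction losses plus the mean-difference loss."""
    loss_reconstruct = (
        mse(y_true["target_vec1"], y_pred["target_vec1"]) / 2
        + mse(y_true["target_vec2"], y_pred["target_vec2"]) / 2
    )
    loss_mean = mse(y_true["target_mean_diff"], y_pred["target_mean_diff"])
    return loss_reconstruct + loss_mean


def weighted_sum(y_true, y_pred, weights):
    """One loss per named output, combined with per-output weights."""
    return sum(w * mse(y_true[k], y_pred[k]) for k, w in weights.items())


y_true = {"target_vec1": [1.0, 2.0, 3.0], "target_vec2": [2.0, 3.0, 4.0], "target_mean_diff": [1.0]}
y_pred = {"target_vec1": [1.0, 2.0, 6.0], "target_vec2": [2.0, 0.0, 4.0], "target_mean_diff": [0.5]}
weights = {"target_vec1": 0.5, "target_vec2": 0.5, "target_mean_diff": 1.0}

print(loss_total(y_true, y_pred))             # 3.25
print(weighted_sum(y_true, y_pred, weights))  # 3.25
```

Each output is scored only against its own target here, which is what the per-output dict form gives you, in contrast to applying one combined function to every output separately.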
My question is more about how the algorithm works. I have successfully integrated EfficientNet and built a model for grayscale images, and now I want to understand why it works.
Here the most important aspect is the grayscale and its 1 channel. When I set channels=1, the algorithm doesn't work because, if I understood correctly, it was built for 3-channel images. When I set channels=3, it works perfectly.
So my question is: when I set channels = 3 and feed the model preprocessed images with channels=1, why does it continue to work?
Code for EfficientNetB5
# Variable assignments
num_classes = 9
img_height = 84
img_width = 112
channels = 3
batch_size = 32
# Make the input layer
new_input = Input(shape=(img_height, img_width, channels),
name='image_input')
# Download and use EfficientNetB5
tmp = tf.keras.applications.EfficientNetB5(include_top=False,
weights='imagenet',
input_tensor=new_input,
pooling='max')
model = Sequential()
model.add(tmp) # adding EfficientNetB5
model.add(Flatten())
...
Code of preprocessing into grayscale
data_generator = ImageDataGenerator(
validation_split=0.2)
train_generator = data_generator.flow_from_directory(
train_path,
target_size=(img_height, img_width),
batch_size=batch_size,
color_mode="grayscale", ###################################
class_mode="categorical",
subset="training")
I dug into what happens when you give grayscale images to efficient net models with three-channel inputs.
Here are the first layers of Efficient Net B5 whose input_shape is (128,128,3)
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_7 (InputLayer) [(None, 128, 128, 3 0 []
)]
rescaling_7 (Rescaling) (None, 128, 128, 3) 0 ['input_7[0][0]']
normalization_13 (Normalizatio (None, 128, 128, 3) 7 ['rescaling_7[0][0]']
n)
tf.math.truediv_4 (TFOpLambda) (None, 128, 128, 3) 0 ['normalization_13[0][0]']
stem_conv_pad (ZeroPadding2D) (None, 129, 129, 3) 0 ['tf.math.truediv_4[0][0]']
And here is the shape of the output of each of these layers when the model has as input a grayscale image:
input_7 (128, 128, 1)
rescaling_7 (128, 128, 1)
normalization_13 (128, 128, 3)
tf.math.truediv_4 (128, 128, 3)
stem_conv_pad (129, 129, 3)
As you can see, the number of channels of the output tensor switches from 1 to 3 when proceeding to the normalization_13 layer, so let's see what this layer is actually doing.
The Normalization layer is performing this operation on the input tensor:
(input_tensor - self.mean) / sqrt(self.var) // see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization
The number of channels changes after the subtraction. As a matter of fact, self.mean looks like this :
<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[0.485, 0.456, 0.406]]]], dtype=float32)>
So self.mean has three channels. When a three-channel tensor is subtracted from a one-channel tensor, broadcasting reuses the single channel for every channel of the mean, and the output looks like this: [firstTensor - secondTensorFirstChannel, firstTensor - secondTensorSecondChannel, firstTensor - secondTensorThirdChannel]
And this is how the magic happens and this is why the model can take as input grayscale images!
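The broadcasting step can be imitated without TensorFlow. This is a simplified sketch for a single pixel in plain Python. The mean values are the ones shown above; the variance values and the pixel value are placeholders chosen for illustration, and the function name normalize_pixel is made up.

```python
import math

MEAN = [0.485, 0.456, 0.406]  # per-channel mean shown above
VAR = [1.0, 1.0, 1.0]         # placeholder variances, not the model's real values


def normalize_pixel(pixel):
    """Normalize a pixel with 1 or 3 channels against the 3-channel statistics."""
    if len(pixel) == 1:
        # broadcasting: the single channel is reused for every channel of MEAN
        pixel = pixel * 3
    return [(p - m) / math.sqrt(v) for p, m, v in zip(pixel, MEAN, VAR)]


gray = [0.5]  # one grayscale channel, assumed already rescaled to [0, 1]
out = normalize_pixel(gray)
print(len(out))  # 3
```

In this sketch a grayscale pixel gives the same result as an RGB pixel whose three channels are identical copies of the gray value.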
I have checked this with efficient net B5 and with efficient net B2V2. Even if they have differences in the way the Normalization layer is declared, the process is the same. I suppose that is also the case for the other efficient net models.
I hope it was clear enough!
This is interesting. If training still works with channels = 3 even though the input is grayscale, I would check the batch shape of the train_generator (maybe print a couple of batches to get a feel for it). Here is a code snippet to quickly check the batch shape (plotImages() is available in the TensorFlow docs):
imgs,labels = next(train_generator)
print('Batch shape: ',imgs.shape)
plotImages(imgs,labels)
I want to create a machine learning model for audio files. I converted the audio files into a (spectrogram) tensor. My feature tensor (the audio files) has the following shape [119, 241, 125] (119 files, 241 samples/file, 125 frequencies/sample). By sample, I define the samples I took in a timespan e.g. 16ms. My output shape will be [119, numOptions].
I followed this tutorial from Tensorflow.js on audio recognition. They build this model:
I reshape my features tensor to be 4D for the conv2d layers:
this.features = this.features.reshape([this.features.shape[0],this.features.shape[1],this.features.shape[2],1])
buildModel() {
const inputShape1 = [this.features.shape[1], this.features.shape[2],this.features.shape[3]];
this.model = tfNode.sequential();
// filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
this.model.add(tfNode.layers.conv2d(
{filters: 8, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape1}
));
// see the image at a higher level, generalize it more, prevent overfit
this.model.add(tfNode.layers.maxPooling2d(
{poolSize: [2, 2], strides: [2, 2]}
));
// filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
const inputShape2 = [119,62,8];
this.model.add(tfNode.layers.conv2d(
{filters: 32, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape2}
));
// see the image at a higher level, generalize it more, prevent overfit
this.model.add(tfNode.layers.maxPooling2d(
{poolSize: [2, 2], strides: [2, 2]}
));
// filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
const inputShape3 = [58,30,32];
this.model.add(tfNode.layers.conv2d(
{filters: 32, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape3}
));
// see the image at a higher level, generalize it more, prevent overfit
this.model.add(tfNode.layers.maxPooling2d(
{poolSize: [2, 2], strides: [2, 2]}
));
// 1D output, => final output score of labels
this.model.add(tfNode.layers.flatten({}));
// prevents overfitting, randomly set 0
this.model.add(tfNode.layers.dropout({rate: 0.25}));
// learn anything linear, non linear comb. from conv. and soft pool
this.model.add(tfNode.layers.dense({units: 2000, activation: 'relu'}));
this.model.add(tfNode.layers.dropout({rate: 0.25}));
// give probability for each label
this.model.add(tfNode.layers.dense({units: this.labels.shape[1], activation: 'softmax'}));
this.model.summary();
// compile the model
this.model.compile({loss: 'meanSquaredError', optimizer: 'adam'});
this.model.summary()
};
Model summary:
_________________________________________________________________
Layer (type) Output shape Param #
=================================================================
conv2d_Conv2D1 (Conv2D) [null,238,124,8] 72
_________________________________________________________________
max_pooling2d_MaxPooling2D1 [null,119,62,8] 0
_________________________________________________________________
conv2d_Conv2D2 (Conv2D) [null,116,61,32] 2080
_________________________________________________________________
max_pooling2d_MaxPooling2D2 [null,58,30,32] 0
_________________________________________________________________
conv2d_Conv2D3 (Conv2D) [null,55,29,32] 8224
_________________________________________________________________
max_pooling2d_MaxPooling2D3 [null,27,14,32] 0
_________________________________________________________________
flatten_Flatten1 (Flatten) [null,12096] 0
_________________________________________________________________
dropout_Dropout1 (Dropout) [null,12096] 0
_________________________________________________________________
dense_Dense1 (Dense) [null,2000] 24194000
_________________________________________________________________
dropout_Dropout2 (Dropout) [null,2000] 0
_________________________________________________________________
dense_Dense2 (Dense) [null,2] 4002
=================================================================
Total params: 24208378
Trainable params: 24208378
Non-trainable params: 0
_________________________________________________________________
Epoch 1 / 10
eta=0.0 ======================================>----------------------------------------------------------------------------- loss=0.515 0.51476
eta=0.8 ============================================================================>--------------------------------------- loss=0.442 0.44186
eta=0.0 ===================================================================================================================>
3449ms 32236us/step - loss=0.485 val_loss=0.958
Epoch 2 / 10
eta=0.0 ======================================>----------------------------------------------------------------------------- loss=0.422 0.42188
eta=0.9 ============================================================================>--------------------------------------- loss=0.395 0.39535
eta=0.0 ===================================================================================================================>
3643ms 34043us/step - loss=0.411 val_loss=0.958
Epoch 3 / 10
1) The first input shape is my features tensor shape. The other two input shapes (inputShape2, inputShape3) were defined by the error message I got. How can I determine these two input shapes in advance?
How is the inputShape calculated?
The inputShape is not calculated. Rather, the dataset passed to the model has to match the inputShape. When defining the model, the inputShape is 3D. But the model summary shows a fourth dimension with value null: that is the batch size. As a result, the training data should be 4D. The first dimension (the batch size) can be anything; what matters is that the features and the labels have the same batch size. Only the first layer needs an inputShape; later layers infer theirs from the previous layer's output, so inputShape2 and inputShape3 can be omitted. There is a more detailed answer here
How are the layer shapes calculated?
It depends on the layers used. Layers such as dropout and activation don't change the input shape.
Depending on the kernel size, stride and padding, a convolution layer will change the input shape. This answer details how it is calculated.
A flatten layer simply reshapes its input to one dimension. In the model summary, the flatten layer receives the shape [null,27,14,32] and outputs the shape [null,12096] (12096 = 27 * 14 * 32).
A dense layer also changes the shape: its output shape depends on the number of units of that layer.
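The shapes and parameter counts in the model summary above can be recomputed by hand. This is a sketch in plain Python rather than TensorFlow.js; it assumes 'valid' padding and a convolution stride of 1 (which matches the summary), and the helper names are made up for illustration.

```python
def valid_out(size, kernel, stride=1):
    """Output size along one axis with 'valid' padding."""
    return (size - kernel) // stride + 1


def conv2d(shape, filters, kernel):
    """Return the output shape and parameter count of a stride-1 conv2d layer."""
    h, w, c = shape
    params = kernel[0] * kernel[1] * c * filters + filters  # weights + biases
    return (valid_out(h, kernel[0]), valid_out(w, kernel[1]), filters), params


def max_pool2d(shape, pool, stride):
    """Return the output shape of a max pooling layer."""
    h, w, c = shape
    return (valid_out(h, pool[0], stride[0]), valid_out(w, pool[1], stride[1]), c)


shape = (241, 125, 1)  # one spectrogram: 241 samples, 125 frequencies, 1 channel
for filters in (8, 32, 32):
    shape, params = conv2d(shape, filters, (4, 2))
    print("conv2d", shape, params)
    shape = max_pool2d(shape, (2, 2), (2, 2))
    print("max_pool2d", shape)

flat = shape[0] * shape[1] * shape[2]
dense_params = flat * 2000 + 2000  # weights + biases of the 2000-unit dense layer
print("flatten", flat, "dense params", dense_params)  # flatten 12096 dense params 24194000
```

The intermediate shapes printed by the loop are the ones that had to be supplied as inputShape2 and inputShape3, which shows they can be derived from the first input shape alone.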
This seems like a trivial question, but I've been unable to find the answer.
I have batched sequences of images of shape:
[batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
and I would like to pass each frame through a few convolutional and pooling layers. However, TensorFlow's conv2d layer accepts 4D inputs of shape:
[batch_size, frame_height, frame_width, number_of_channels]
My first attempt was to use tf.map_fn over axis=1, but I discovered that this function does not propagate gradients.
My second attempt was to use tf.unstack over the first dimension and then use tf.while_loop. However, my batch_size and number_of_frames are dynamically determined (i.e. both are None), and tf.unstack raises {ValueError} Cannot infer num from shape (?, ?, 30, 30, 3) if num is unspecified. I tried specifying num=tf.shape(self.observations)[1], but this raises {TypeError} Expected int for argument 'num' not <tf.Tensor 'A2C/infer/strided_slice:0' shape=() dtype=int32>.
Since all the frames (number_of_frames) are passed through the same convolutional model, you can merge the batch and frame dimensions and apply a normal convolution. This can be achieved with tf.reshape, as shown below:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers, tf.Session)
# input with size [batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
x = tf.placeholder(tf.float32, [None, None, 32, 32, 3])
# merge the batch and frame dimensions for the conv input
x_reshaped = tf.reshape(x, [-1, 32, 32, 3])
# for a (10, 5, 32, 32, 3) input, x_reshaped has shape (50, 32, 32, 3)
# define your conv network (here a single layer with 5 filters)
y = tf.layers.conv2d(x_reshaped, 5, kernel_size=(3, 3), padding='SAME')
# y has shape (50, 32, 32, 5)
# restore the batch and frame dimensions on the conv output
out = tf.reshape(y, [-1, tf.shape(x)[1], 32, 32, 5])
# out keeps the input's leading dimensions; the channel count becomes the number of filters
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out, {x: np.random.normal(size=(10, 5, 32, 32, 3))}).shape)
    # (10, 5, 32, 32, 5)
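The same fold-and-unfold idea, stripped of TensorFlow, is sketched below with nested Python lists standing in for tensors. The names fold, unfold and per_frame_model are hypothetical and used only for illustration; like the reshape, the sketch assumes every sequence in the batch has the same number of frames.

```python
def fold(batch):
    """[batch, frames, ...] -> [batch * frames, ...], plus the frame count."""
    frames = len(batch[0])
    return [frame for sequence in batch for frame in sequence], frames


def unfold(flat, frames):
    """[batch * frames, ...] -> [batch, frames, ...]."""
    return [flat[i:i + frames] for i in range(0, len(flat), frames)]


def per_frame_model(frame):
    # stand-in for the convolutional layers: here it just sums the frame's values
    return sum(frame)


# 10 sequences of 5 tiny "frames"; frame f of sequence b holds the values [b, f]
batch = [[[b, f] for f in range(5)] for b in range(10)]
flat, frames = fold(batch)
outputs = unfold([per_frame_model(fr) for fr in flat], frames)

print(len(flat))                      # 50
print(len(outputs), len(outputs[0]))  # 10 5
```

Because the per-frame function is applied to the folded list in order, each output lands back at the same (sequence, frame) position after unfolding.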