Help, please!
I have a TensorFlow time series problem with the constraint that the model's input should have shape (batch, before = 5, features = 1) and the model's output should have shape (batch, after = 5, features = 1).
The final dense layer should also have (features = 1) neurons, since the model predicts only 1 feature.
How do I go about shaping the input/output layers in the RNN model?
Related
Keras pre-trained models (VGG, ResNet, DenseNet, etc.) have weights established after training on ImageNet with input shape (224, 224, 3). However, Keras allows us to specify any other input shape (width and height should be no smaller than 32). How does Keras determine the initial weights of the first hidden layer when the input shape is other than (224, 224, 3)?
It depends on the include_top parameter.
Example:
import tensorflow as tf
model = tf.keras.applications.VGG16(include_top = True, input_shape=(299, 299, 3))
model.summary()
This will throw an error because, when you pass include_top = True, the whole VGG16 architecture is loaded, including the Dense layers.
Dense layers care about the shape: their weight matrices have a fixed size, so the flattened feature map that reaches them must match the size they were trained with.
Second Example:
import tensorflow as tf
model = tf.keras.applications.VGG16(include_top = False, input_shape=(299, 299, 3))
model.summary()
This time, the model only has convolutional layers because include_top = False. Convolutional layers are just filters sliding over the image, so the input shape is not a problem for normal convolutions.
When you pass an input_shape, Keras creates an Input layer for that shape, then builds the model, and after that loads the weights.
The only constraint here is that, since these models are trained on RGB images, the new images should also have 3 channels.
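To make the shape argument concrete, here is a minimal sketch in plain Python (no Keras needed). It assumes the standard VGG16 layout: 3x3 convolutions with 'same' padding, five 2x2 max-pooling layers with stride 2, and 512 channels at the end of the convolutional base. The helper names are made up for illustration.

```python
def vgg16_feature_map_size(side):
    """Spatial side length after VGG16's convolutional base.

    The 'same'-padded 3x3 convolutions keep the spatial size, so only the
    five 2x2 max-pooling layers (stride 2) shrink it, halving each time.
    """
    for _ in range(5):
        side = side // 2
    return side


def flatten_size(side, channels=512):
    """Number of values the first Dense layer would receive after Flatten."""
    s = vgg16_feature_map_size(side)
    return s * s * channels


# A conv kernel has shape (kernel_h, kernel_w, in_channels, filters).
# Nothing in it depends on the image height or width, only on the channels.
first_conv_kernel_shape = (3, 3, 3, 64)

print(vgg16_feature_map_size(224), flatten_size(224))  # 7 25088
print(vgg16_feature_map_size(299), flatten_size(299))  # 9 41472
```

The pre-trained first Dense layer expects 25088 inputs, so a 299x299 image (41472 values after Flatten) cannot reuse it, while every convolutional kernel fits unchanged.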
I am learning TensorFlow and Keras to implement an LSTM many-to-many model where the length of the input sequence is equal to the length of the output sequence.
Sample Code:
Inputs:
voc_size = 10000
embed_dim = 64
lstm_units = 75
size_batch = 30
count_classes = 5
Model:
from tensorflow.keras.layers import (Bidirectional, LSTM,
                                     Dense, Embedding, TimeDistributed)
from tensorflow.keras import Sequential

def sample_build(embed_dim, voc_size, batch_size, lstm_units, count_classes):
    model = Sequential()
    model.add(Embedding(input_dim=voc_size,
                        output_dim=embed_dim, input_length=50))
    model.add(Bidirectional(LSTM(units=lstm_units, return_sequences=True),
                            merge_mode="ave"))
    model.add(Dense(200))
    model.add(TimeDistributed(Dense(count_classes+1)))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model.summary()
    return model

sample_model = sample_build(embed_dim, voc_size,
                            size_batch, lstm_units,
                            count_classes)
I am having trouble understanding the input and output shapes of each layer. For example, the shape of the output of the Embedding layer is (BATCH_SIZE, time_steps, length_of_input), and in this case it is (30, 50, 64).
Similarly, the output shape of the Bidirectional LSTM layer is (30, 50, 75). This will be the input for the next Dense layer with 200 units. But the shape of the weight matrix of the Dense layer is (number of units in the current layer, number of units in the previous layer), which is (200, 75) in this case. So how does the matrix calculation happen between the 2D shape of the Dense layer and the 3D shape of the Bidirectional layer? Any explanation of the shapes would be helpful.
Dense can operate on 3D input: it effectively flattens the input to shape (batch_size * time_steps, features), applies the dense layer, and reshapes the result back to (batch_size, time_steps, units). The Keras documentation for the Dense layer says:
Note: If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
Another point regarding the output of the Embedding layer: as you said, it is a 3D output, but more precisely its shape corresponds to (batch_size, input_length, output_dim), which is (30, 50, 64) here. input_dim is the vocabulary size and does not appear in the output shape.
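As a rough check of the "flatten, multiply, reshape back" description, here is a small NumPy sketch (NumPy stands in for tf.tensordot; it is not the Keras implementation). It uses the shapes from the question: a (30, 50, 75) input and a Dense layer with 200 units, so the kernel is (75, 200). The random values and variable names are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, time_steps, features, units = 30, 50, 75, 200

x = rng.normal(size=(batch_size, time_steps, features))  # like the Bidirectional LSTM output
kernel = rng.normal(size=(features, units))               # Dense kernel: (d1, units)
bias = np.zeros(units)

# View 1: contract the last axis of x with the d1 axis of the kernel.
out_tensordot = np.tensordot(x, kernel, axes=([2], [0])) + bias

# View 2: flatten batch and time, do an ordinary 2D matmul, reshape back.
flat = x.reshape(batch_size * time_steps, features)
out_flat = (flat @ kernel + bias).reshape(batch_size, time_steps, units)

print(out_tensordot.shape)                   # (30, 50, 200)
print(np.allclose(out_tensordot, out_flat))  # True
```

Both views give the same result, which is why the same (75, 200) kernel is shared across all 50 time steps.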
I made the following neural network model for sound recognition. The flowchart looks like the following:
cnn-lstm-dense-hybrid (image)
The idea is the following:
I have 2 different input layers, called A and B.
(i) Input A has 100 time steps, each step has a 64-dimensional feature vector
(ii) A 1D CNN layer (time distributed) will extract features from each time step. The CNN layer contains 64 filters, each with a length of 16 taps. Then, a max-pooling layer will extract the single maximum value of each convolutional output, so a total of 64 features will be extracted at each time step.
(iii) The output of the CNN layer will be fed into an LSTM layer with 64 neurons. The number of recurrent steps is the same as the number of input time steps, which is 100. The LSTM layer should return a sequence of 64-dimensional outputs (the length of the sequence == number of time steps == 100, so there should be 100*64=6400 numbers).
(iv) Meanwhile, input B also has 100 time steps, each step has a 65-dimensional feature vector, but they are treated differently from input A.
(v) Input B is fed into a dense layer (time distributed) of 65 neurons, so it should produce a 65-dimensional output at each time step.
Now, at each time step, we have the output from the LSTM layer (64 neurons) and the Dense layer (65 neurons), and we concatenate them in a merge layer. This gives a 129-dimensional vector at each time step.
We feed this vector into another dense layer, which produces the output (a single neuron, which represents the probability of "is target sound").
A hand drawn illustration
However, I am stuck at the very beginning trying to make 1(i) work. The code of network building is below:
mfcc_input = Input(shape=(100,64), dtype='float', name='mfcc_input')
print(mfcc_input)
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)
CNN_out = Dropout(0.4)(CNN_out)
LSTM_out = LSTM(64,return_sequences=True)(CNN_out)
## Auxiliary branch
delta_input = Input(shape=(100,64), dtype='float', name='delta_input')
zcr_input = Input(shape=(100,1), dtype='float', name='zcr_input')
aux_input = concatenate([delta_input, zcr_input])
aux_out = TimeDistributed(Dense(64+1))(aux_input)
### Merge branches
merged_layer = concatenate([LSTM_out, aux_out])
## Output layer
output = TimeDistributed(Dense(1))(merged_layer)
model = Model(inputs=[mfcc_input, delta_input, zcr_input], outputs=[output])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
loss_weights=[1., 0.2])
...(other code here) ...
The error at "CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)" is: IndexError: list index out of range
Could anyone help? Greatly appreciated!
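The shapes described in steps (i) to (v) can be traced with plain arithmetic. The sketch below assumes that each 64-dimensional feature vector is treated as a 1D signal with a single channel, i.e. shape (64, 1) per time step; the helper names are made up for illustration. If that assumption holds, one possible reading of the error is that Conv1D inside TimeDistributed needs a 2D (steps, channels) slice per time step, which an input of shape (100, 64) does not provide, whereas (100, 64, 1) would.

```python
def conv1d_valid_length(steps, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (steps - kernel_size) // stride + 1


def maxpool1d_valid_length(steps, pool_size, stride=None):
    """Output length of 1D max pooling with 'valid' padding (stride defaults to pool_size)."""
    stride = pool_size if stride is None else stride
    return (steps - pool_size) // stride + 1


time_steps, feat_a, feat_b = 100, 64, 65
filters, taps, lstm_units = 64, 16, 64

# Branch A, per time step: (64, 1) -> Conv1D -> (49, 64) -> MaxPooling1D(49) -> (1, 64)
conv_len = conv1d_valid_length(feat_a, taps)
pool_len = maxpool1d_valid_length(conv_len, pool_size=conv_len)
cnn_features_per_step = pool_len * filters

lstm_out = (time_steps, lstm_units)         # return_sequences=True
aux_out = (time_steps, feat_b)              # time-distributed Dense(65) on input B
merged = (time_steps, lstm_units + feat_b)  # concatenation on the last axis
output = (time_steps, 1)                    # time-distributed Dense(1)

print(conv_len, pool_len, cnn_features_per_step)  # 49 1 64
print(lstm_out, aux_out, merged, output)
```

The pool size of 64-16+1 = 49 in the posted code matches the convolution output length, which is what collapses each filter's output to a single maximum.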
I have a CNN model in keras (used for signal classification):
cnn = Sequential()
cnn.add(Conv1D(10,kernel_size=8,strides=4, padding="same",activation="relu",input_shape=(Dimension_of_input,1)))
cnn.add(MaxPooling1D(pool_size=3))
cnn.add(Conv1D(10,kernel_size=8,strides=4, padding="same",activation="relu"))
cnn.add(MaxPooling1D(2))
cnn.add(Flatten())
cnn.add(Dense(2, activation="softmax"))
Using the method 'model.summary()', I can get the shape of the output of each layer. In my model, the output of the last max pooling layer is (None, 1, 30) and of flatten layer is (None, 30).
For each train and test sample: is it possible in Keras to get the output of the flatten layer as a feature vector with the 30 features (numbers), before it is given as input to the dense layer?
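Select the Flatten layer, which is the second-to-last layer (cnn.layers[-1] is the final Dense layer):
flatten = cnn.layers[-2]
then create a new model that reuses the trained layers up to that point:
from keras.models import Model  # or tensorflow.keras.models, matching how cnn was built
features = Model(inputs=cnn.input, outputs=flatten.output)
So,
feature_vec = features.predict(x_train)
gives you the output of the Flatten layer as a feature vector for each train sample; call features.predict on the test data to get the vectors for the test samples.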
Recently, I tried to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.
I feed batched spectrogram data (shape: (10, 120, 155, 3), batch_size is 10) into 10 convolution layers and 3 fully connected layers. So the output before connecting the CTC layer is 2D data (shape: (10, 1024)).
Here is my problem: I want to use the tf.nn.ctc_loss function in the TensorFlow library, but it generates the ValueError: Dimension must be 2 but is 3 for 'transpose'(op:'Transpose') with input shapes:[?,1024],[3].
I guess the error is related to the dimension of my 2D input data. The description of the ctc_loss function on the official TensorFlow site requires a 3D input with the shape (batch_size x max_time x num_classes).
So, what is the extra dimension 'num_classes'? How should I change the shape of my CNN+FC output data?
The fully connected layer should be applied per time step.
It's like applying the same dense layer at every time step in a recurrent neural network.
For the output of a convolution layer, the time axis is the width.
So, for example, the shapes would be:
convolution: (10, 120, 155, 3) = (batch, height, width, channels)
flatten: (10, 155, 120*3) = (batch, max_time, features), after moving width ahead of height
fully connected: (10, 155, 1024) (same dense layer applied per time step)
output: (10, 155, num_classes)
This is the shape ctc_loss expects in TensorFlow when it is called in batch-major mode (time_major=False in TF 1.x); the default is time-major, (max_time, batch, num_classes).
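The reshaping steps above can be sketched with NumPy (used here only to check shapes; it is not the TensorFlow graph from the question). The value num_classes = 29 is a hypothetical placeholder: for CTC it would be the number of labels plus one for the blank symbol. The weights are random and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, height, width, channels = 10, 120, 155, 3
hidden, num_classes = 1024, 29  # num_classes is a hypothetical value (labels + 1 blank)

conv_out = rng.normal(size=(batch, height, width, channels))

# Move width to axis 1 so it acts as the time axis, then merge height and channels.
seq = conv_out.transpose(0, 2, 1, 3).reshape(batch, width, height * channels)

# The same dense weights are applied at every time step (matmul over the last axis).
w_fc = rng.normal(size=(height * channels, hidden)) * 0.01
w_out = rng.normal(size=(hidden, num_classes)) * 0.01

fc = np.maximum(seq @ w_fc, 0.0)  # (10, 155, 1024), ReLU chosen for illustration
logits = fc @ w_out               # (10, 155, num_classes), batch-major

# Time-major layout, for a loss that expects (max_time, batch, num_classes).
logits_time_major = logits.transpose(1, 0, 2)

print(seq.shape, fc.shape, logits.shape, logits_time_major.shape)
```

Note that a plain reshape without the transpose would mix height and width values, so the axis swap has to come first.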