I'm trying to use LSTM networks to model a simple dataset that has multiple different sequences of numbers representing musical data. The data is just a set of numpy arrays of floating point numbers, with each song being one array. The data looks like this:
Song 1: [0.00013487907, 0.0002517006, 0.00021654845, ...]
Song 2: [-0.007279772, -0.011207076, -0.010082608, ...]
Song 3: [-0.00060827745, -0.00082834775, -0.0006534484, ...]
..and so on
I have done this before for MIDI files, but those require embeddings of the different characters. However, this is continuous rather than discrete data, so I'm not sure what the input to the model should look like, or how the data should be loaded for this particular task. For example, for the MIDI file project the model's input went through an embedding layer:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, TimeDistributed, Dense, Activation
batch_size = 16
seq_length = 64
num_epochs = 100
optimizer_ = tf.keras.optimizers.Adam()
model = Sequential()
model.add(Embedding(input_dim = num_unique_chars, output_dim = 512, batch_input_shape = (batch_size, seq_length)))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_unique_chars)))
model.add(Activation("softmax"))
model.compile(loss = "categorical_crossentropy", optimizer = optimizer_, metrics = ["accuracy"])
I want to know how to do the same without tokenization/embedding, feed each song into the model separately, and then be able to generate samples from it.
I've tried looking for examples of this but everything related to LSTM networks seems to be text-based. Would appreciate any help/guidance with this!
Thanks
Since you already have continuous values, you do not need an Embedding layer. Either pass the data directly into the LSTMs or use a Dense layer in between. Additionally, you can add a Masking layer (depending on your data).
You also have to reshape your data to (batch_size, seq_len, 1): you only have one feature per timestep, but the time dimension still has to be explicit.
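For example, one of your song arrays could be cut into windows of that shape like this (just a sketch; song stands for one of your 1-D numpy arrays):
import numpy as np
seq_length = 64
song = np.random.randn(10000).astype(np.float32)  # placeholder for one of your song arrays
# Slice the song into non-overlapping windows and add a feature axis of size 1
num_windows = len(song) // seq_length
windows = song[:num_windows * seq_length].reshape(num_windows, seq_length, 1)
print(windows.shape)  # (156, 64, 1) for this placeholder song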
Here is a minimal working example with a Dense layer instead of the Embedding layer (which would not work for continuous input):
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import Sequential
batch_size = 16
seq_length = 64
num_epochs = 100
num_unique_chars = 55 # I just picked any number
optimizer_ = tf.keras.optimizers.Adam()
model = Sequential()
model.add(layers.Dense(256, use_bias=False))  # projects the single input feature to 256 dimensions (replaces the Embedding)
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.TimeDistributed(layers.Dense(num_unique_chars)))
model.add(layers.Activation("softmax"))
model.compile(loss = "categorical_crossentropy", optimizer = optimizer_, metrics = ["accuracy"])
test_data = tf.random.normal(shape=(batch_size, seq_length, 1))
test_out = model(test_data)
print(test_out.shape)
Output: (16, 64, 55)
P.S.: With Dense layers the TimeDistributed wrapper is optional; a Dense layer only operates on the last dimension of its input tensor.
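For instance, a quick shape check with dummy data:
import tensorflow as tf
from tensorflow.keras import layers
x = tf.random.normal(shape=(16, 64, 256))
print(layers.Dense(55)(x).shape)                          # (16, 64, 55)
print(layers.TimeDistributed(layers.Dense(55))(x).shape)  # (16, 64, 55)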
P.P.S.: Given your single input feature, three LSTM layers of width 256 could easily lead to overfitting or other unpleasant effects, so it may be worth reducing the number of layers and their size. (Of course, this does not address your initial question.)
Related
I am building a model for human face segmentation into skin and non-skin areas. As a model, I am using the model/method shown here as a starting point and adding a dense layer at the end with sigmoid activation. The model works very well for my purpose, giving a good Dice metric score. The model uses 2 pre-trained layers from ResNet50 as a backbone for feature detection. I have read several articles, books and code, but couldn't find any information on how to determine which layer to choose for feature extraction.
I compared the ResNet50 architecture with Xception, picked two similar layers, replaced the layers in the original network (here) and ran the training. I got similar results, neither better nor worse.
I have the following questions:
How to determine which layer is responsible for low-level/high-level features?
Is using only some pre-trained layers any better than using the full pre-trained network in terms of training time and the number of trainable parameters?
Where can I find more information about using only selected layers from pre-trained networks?
Here is the code for a quick overview:
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    resnet50 = keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_tensor=model_input)
    x = resnet50.get_layer("conv4_block6_2_relu").output
    x = DilatedSpatialPyramidPooling(x)
    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    input_b = resnet50.get_layer("conv2_block3_2_relu").output
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    return keras.Model(inputs=model_input, outputs=model_output)
And here is my modified code using Xception layers as the backbone
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    Xception_model = keras.applications.Xception(
        weights="imagenet", include_top=False, input_tensor=model_input)
    xception_x1 = Xception_model.get_layer("block9_sepconv3_act").output
    x = DilatedSpatialPyramidPooling(xception_x1)
    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    input_a = layers.AveragePooling2D(pool_size=(2, 2))(input_a)
    xception_x2 = Xception_model.get_layer("block4_sepconv1_act").output
    input_b = convolution_block(xception_x2, num_filters=256, kernel_size=1)
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    x = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    model_output = layers.Dense(x.shape[2], activation='sigmoid')(x)
    return keras.Model(inputs=model_input, outputs=model_output)
Thanks in advance!
In general, the first layers (the ones closer to the input) are the ones responsible for learning low-level, general features, whereas the last layers learn higher-level, dataset/task-specific features. This is the reason why, when doing transfer learning, you usually want to remove only the last few layers and replace them with others that can deal with your specific problem.
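One practical way to decide where to cut a backbone is simply to list its layers and their output shapes, and pick a block boundary at the spatial resolution you need. A minimal sketch with the same ResNet50 backbone (the 256x256 input size is only an assumption for illustration):
from tensorflow import keras
resnet50 = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(256, 256, 3))
# Print each layer's name and output shape to see where the resolution and depth change
for layer in resnet50.layers:
    print(layer.name, layer.output.shape)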
As for your second question: it depends. Transferring the whole network, without deleting or adding any layers, basically means that the network won't learn anything new (unless you leave the layers unfrozen, in which case you are fine-tuning). On the other hand, if you delete some layers and add a few more, then the number of trainable parameters depends only on the new layers you just added.
What I suggest you do is the following (a minimal sketch follows the list):
Delete a few layers from the pre-trained network, freeze the remaining layers, and add a few new layers (even just one)
Train the new network with a certain learning rate (usually this learning rate is not very low)
Fine-tune: unfreeze all the layers, lower the learning rate, and re-train the whole network
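A minimal sketch of that workflow in Keras (the backbone, head size, class count and learning rates here are placeholders, not values taken from your code):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# 1) Take a pre-trained backbone without its head and freeze it
base = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False
# Add a small new head (a plain 2-class classifier here, just for illustration)
model = keras.Sequential([base,
                          layers.Dense(128, activation="relu"),
                          layers.Dense(2, activation="softmax")])
# 2) Train only the new head with a "normal" learning rate
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)
# 3) Fine-tune: unfreeze everything, lower the learning rate, re-train the whole network
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)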
I am trying to solve the Spoken Digit Recognition task using an LSTM model, where the audio files are converted into spectrograms and fed into an LSTM, followed by Global Average Pooling. Here is the architecture:
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, GlobalAveragePooling1D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

tf.keras.backend.clear_session()
# input layer
input_ = Input(shape=(64, 35))
lstm = LSTM(100, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.000001),
            recurrent_initializer='glorot_uniform')(input_)
lstm = GlobalAveragePooling1D(data_format='channels_first')(lstm)
dense = Dense(20, activation='relu', kernel_regularizer=l2(0.000001), kernel_initializer='glorot_uniform')(lstm)
drop = Dropout(0.8)(dense)
dense1 = Dense(25, activation='relu', kernel_regularizer=l2(0.000001), kernel_initializer='he_uniform')(drop)
drop = Dropout(0.95)(dense1)
output = Dense(10, activation='softmax', kernel_regularizer=l2(0.000001), kernel_initializer='glorot_uniform')(drop)
model_2 = Model(inputs=[input_], outputs=output)
model_2.summary()
Its summary is as follows (not reproduced here).
I need to calculate the F1 score to check the performance of the model. I have implemented a custom callback and also used the TensorFlow Addons F1 score. However, I don't get the correct result: for every epoch I get a constant F1 score value.
On further digging, I found that my model predicts the same class label for the entire epoch, whereas it is supposed to distinguish between 10 classes, as there are 10 class label values present.
Here are my model.compile and model.fit commands. I have used TensorFlow Addons here:
from tensorflow import keras
opt = keras.optimizers.Adam(0.001, clipnorm=0.8)
# 'metric' holds the F1 score metric from TensorFlow Addons (defined elsewhere)
model_2.compile(loss='categorical_crossentropy', optimizer=opt, metrics=metric)
hist = model_2.fit([X_train_spectrogram],
                   [y_train_converted],
                   validation_data=([X_test_spectrogram], [y_test_converted]),
                   epochs=10,
                   verbose=1,
                   callbacks=[tensorBoard_callbk2, ClearMemory()],
                   # steps_per_epoch=3,
                   batch_size=32)
Here is what I mean by getting the same prediction: the entire array is filled with the same predicted value.
Why is the model predicting the same class label, and how can I rectify it?
I have tried increasing the number of trainable parameters and increasing/decreasing the batch size, but it doesn't help. If anyone knows, can you please help me out?
I am trying to perform sentiment classification using Keras, with a basic neural network (no RNN or other more complex type). However, when I run the script I see no increase in accuracy during training/evaluation. I am guessing I am setting up the output layer incorrectly, but I am not sure of that. y_train is a list [1,2,3,1,2,4,5] (5 different labels) containing the targets belonging to the features in X_train_seq_padded. The setup is as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
padding_len = 24 # len of each tokenized sentence
neurons = 16 # 2/3 the length of the text that is padded
model = Sequential()
model.add(Dense(neurons, input_dim = padding_len, activation = 'relu', name = 'hidden-1'))
model.add(Dense(neurons, activation = 'relu', name = 'hidden-2'))
model.add(Dense(neurons, activation = 'relu', name = 'hidden-3'))
model.add(Dense(1, activation = 'sigmoid', name = 'output_layer'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])
callbacks = [EarlyStopping(monitor = 'accuracy', patience = 5, mode = 'max')]
history = model.fit(X_train_seq_padded, y_train, epochs = 100, batch_size = 64, callbacks = callbacks)
First of all, in your setup above, if you choose sigmoid as the last layer's activation function (which is generally used for binary or multi-label classification), then the loss function should be binary_crossentropy.
But if your labels are multi-class and transformed into one-hot encodings, then your last layer should be Dense(num_classes, activation='softmax') and the loss function should be categorical_crossentropy.
But if you keep your multi-class labels as integers instead, then your last layer and loss function should be:
Dense(num_classes)  # with logits
SparseCategoricalCrossentropy(from_logits=True)
Or (credit: @Frightera):
Dense(num_classes, activation='softmax')  # with probabilities
SparseCategoricalCrossentropy(from_logits=False)
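Applied to your setup (integer labels and 5 classes), a minimal sketch of the corrected head and compile step could look like this (the shift of the labels from 1..5 down to 0..4 is an assumption, since sparse categorical losses expect labels starting at 0):
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense
padding_len = 24
num_classes = 5
# Sparse categorical losses expect integer labels in [0, num_classes), so 1..5 becomes 0..4
y_train = np.array([1, 2, 3, 1, 2, 4, 5]) - 1
model = Sequential([
    Input(shape=(padding_len,)),
    Dense(16, activation='relu'),
    Dense(16, activation='relu'),
    Dense(num_classes, activation='softmax'),  # one output unit per class
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
# model.fit(X_train_seq_padded, y_train, epochs=100, batch_size=64)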
I have created four 3D-CNN models and each was trained on a different (but related) set of images, such that each set has images of a different perspective of the same objects (i.e. n objects have images from 4 different perspectives, and each model is associated with a single perspective).
from tensorflow import keras
from tensorflow.keras.layers import (Input, Conv3D, MaxPool3D, BatchNormalization,
                                     GlobalAveragePooling3D, Dense, Dropout)

def get_model(width=128, height=128, depth=4):
    inputs = Input((width, height, depth, 3))
    x = Conv3D(filters=64, kernel_size=8, padding='same', activation="relu")(inputs)
    x = MaxPool3D(pool_size=2, data_format="channels_first", padding='same')(x)
    x = BatchNormalization()(x)
    x = Conv3D(filters=256, kernel_size=3, padding='same', activation="relu")(x)
    x = MaxPool3D(pool_size=2, data_format="channels_first", padding='same')(x)
    x = BatchNormalization()(x)
    x = GlobalAveragePooling3D()(x)
    x = Dense(units=512, activation="relu")(x)
    x = Dropout(0.3)(x)
    outputs = Dense(units=2, activation="sigmoid")(x)
    # Define the model.
    model = keras.Model(inputs, outputs)
    return model
I now have four pre-trained models, and I would like to combine them by removing the last dense (sigmoid) layer and instead concatenating the dense layers of all four models, followed by an activation function (i.e. sigmoid). I would like to keep four input layers such that each takes an image of an object from one perspective. I have seen examples of concatenating the output layer of model_1 to the input layer of model_2; however, I am not sure how to deal with four separate input layers and concatenating towards the end of the model.
Let's assume that you have your pretrained model files named "A.h5" and "B.h5". You can simply load them in TensorFlow, access the layers that interest you via the layers attribute, and merge them with the Functional API. One example could be the following:
import tensorflow as tf
pretrainedmodel_files = ["A.h5", "B.h5"]
A,B = [tf.keras.models.load_model(filename) for filename in pretrainedmodel_files]
# Skipping the last dense layer and the dropout means accessing the layer at the index -3
concat = tf.keras.layers.Concatenate()([A.layers[-3].output, B.layers[-3].output])
out = tf.keras.layers.Dense(2,activation="sigmoid")(concat)
model = tf.keras.Model(inputs=[A.input, B.input], outputs=out)
I've created two simple models with the following code:
tf.keras.Sequential(
    [
        tf.keras.layers.Dense(10, activation="relu", input_shape=(5,)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(2, activation="sigmoid")
    ]
)
And then merged them together with my sample code.
A and B have the following architecture (visualized with Netron; images not reproduced here), and the merged network keeps both input branches and joins them just before the final Dense layer.
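For your four-perspective setup, the same pattern extends directly. A sketch, assuming four saved models "A.h5" through "D.h5" built with the get_model function above (the renaming loop is only needed if the loaded models share layer names):
import tensorflow as tf
files = ["A.h5", "B.h5", "C.h5", "D.h5"]  # hypothetical file names, one per perspective
models = [tf.keras.models.load_model(f) for f in files]
# Models saved from the same get_model code usually have clashing layer names;
# renaming via the private _name attribute is a common workaround
for i, m in enumerate(models):
    for layer in m.layers:
        layer._name = f"{layer.name}_p{i}"
# Index -3 skips the Dropout and the final sigmoid Dense, giving the 512-unit Dense output
features = [m.layers[-3].output for m in models]
concat = tf.keras.layers.Concatenate()(features)
out = tf.keras.layers.Dense(2, activation="sigmoid")(concat)
combined = tf.keras.Model(inputs=[m.input for m in models], outputs=out)  # four separate inputs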
I am experimenting with recurrent neural network layers in tensorflow & keras and I am having a look at the recurrent_initializer. I wanted to know more about its influence on the layer, so I created a SimpleRNN layer as follows:
rnn_layer = keras.layers.SimpleRNN(1, return_sequences=True, kernel_initializer = keras.initializers.ones, recurrent_initializer=keras.initializers.zeros, activation="linear")
Running this code makes the addition in the recurrent net visible:
import numpy as np
inp = np.zeros(shape=(1, 1, 20), dtype=np.float32)
for i in range(20):
    inp[0][0][:i] = 5
    # inp[0][0][i:] = 0
    print(f"i:{i} {rnn_layer(inp)}")
output:
i:0 [[[0.]]]
i:1 [[[5.]]]
i:2 [[[10.]]]
i:3 [[[15.]]]
i:4 [[[20.]]]
i:5 [[[25.]]]
i:6 [[[30.]]]
i:7 [[[35.]]]
i:8 [[[40.]]]
i:9 [[[45.]]]
i:10 [[[50.]]]
i:11 [[[55.]]]
i:12 [[[60.]]]
i:13 [[[65.]]]
i:14 [[[70.]]]
i:15 [[[75.]]]
i:16 [[[80.]]]
i:17 [[[85.]]]
i:18 [[[90.]]]
i:19 [[[95.]]]
Now I change the recurrent_initializer to something different, like a glorot_normal distribution:
rnn_layer = keras.layers.SimpleRNN(1, return_sequences=True, kernel_initializer = keras.initializers.ones, recurrent_initializer=keras.initializers.glorot_normal(seed=0), activation="linear")
But I still get the same results. I thought it might depend on some logic which an RNN is missing but an LSTM has, so I tried it with an LSTM, but I still get the same results. I guess there is something about the recurrent logic that I am still missing. Can someone explain to me what the recurrent_initializer's purpose is and how it affects the recurrent layer?
Thanks a lot!
Your input to the RNN layer has shape (1, 1, 20), which means one timestep per batch. The default behavior of an RNN is to reset its state between batches, so you can't see the effect of the recurrent ops (and hence of the recurrent_initializer).
You have to increase the sequence length of your input:
import numpy as np
import tensorflow as tf
inp = np.ones(shape=(5, 4, 1), dtype=np.float32)  # sequence length == 4
rnn_layer1 = tf.keras.layers.LSTM(1, return_state=True, return_sequences=False,
                                  kernel_initializer=tf.keras.initializers.ones,
                                  recurrent_initializer=tf.keras.initializers.zeros,
                                  activation="linear")
rnn_layer2 = tf.keras.layers.LSTM(1, return_state=True, return_sequences=False,
                                  kernel_initializer=tf.keras.initializers.ones,
                                  recurrent_initializer=tf.keras.initializers.glorot_normal(seed=0),
                                  activation="linear")
first_sample = inp[0:1, :, :]  # shape (1, 4, 1)
print(rnn_layer1(first_sample))
print(rnn_layer2(first_sample))