Why the training loss does not change at all - tensorflow

I have built the following model to perform sequence prediction:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense

model = Sequential()
model.add(Conv1D(64, 12, activation='relu', input_shape=(64, 1), padding='causal'))
model.add(Conv1D(64, 12, activation='relu', padding='causal'))
model.add(MaxPooling1D(2))
model.add(Conv1D(128, 12, activation='relu', padding='causal'))
model.add(Conv1D(128, 12, activation='relu', padding='causal'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(dense_expansion, activation='relu'))  # dense_expansion, loss_function, optimizer etc. are defined elsewhere
model.add(Dense(1, activation='relu'))
model.compile(loss=loss_function, optimizer=optimizer, metrics=['mse', 'mae'])
model.fit(X_train, Y_train, batch_size=batch_size,
          validation_data=(X_val, Y_val), epochs=nr_of_epochs, verbose=2)
However, the training loss shows no change at all. What are the possible reasons? The training data has shape (1496000, 64, 1) with targets of shape (1496000, 1); the validation data has shape (374000, 64, 1) with targets of shape (374000, 1).
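A quick sanity check for a frozen loss (a sketch using the model and X_train above; the 1000-sample slice is arbitrary): if the relu on the output layer has saturated at zero, every prediction and every gradient is zero, and the loss cannot move.
preds = model.predict(X_train[:1000])
# All-zero predictions would point at the final relu: a negative pre-activation
# on the output unit yields 0 everywhere and blocks any gradient flow.
print(preds.min(), preds.max())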

Related

What layers, nodes and epoch amounts would you set for 84 columns and 14 million rows of data?

I'm creating an incident intrusion detection model and I've had trouble getting rid of false positives, or of other classes being thrown into the prediction. The classes might be slightly related, but it's just not acceptable to me, and I feel like I'm doing something incorrect. I'm using online datasets with pre-placed labels identifying each class and what an instance (row) of data corresponds to, so it's a multiclass model.
What would you change in the code below? I was thinking of changing the input to the number of features I have (84), and the hidden layer to the average of the input and output sizes. Thanks a lot! Also, when is the right time to stop training? I was thinking of using early stopping (see the sketch after the code), but my 6 GB RTX 3060 GPU runs out of memory after the first epoch, and only does this when I include early-stopping validation.
# 14 million rows of 84 columns
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(tf.keras.Input(shape=(len(X.columns),)))  # input layer: one node per feature (84); shape must be a tuple
model.add(Dense(32, activation='relu'))  # hidden
model.add(Dense(16, activation='relu'))  # hidden
model.add(Dense(15, activation='softmax'))  # output: 15 classes
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=231, batch_size=512)
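A minimal early-stopping sketch (the patience value is an arbitrary choice of mine, and validation_batch_size assumes TF >= 2.2): it monitors a held-out slice and evaluates it in chunks, which avoids the out-of-memory hit from validating in one giant batch.
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 5 epochs, then restore the best weights.
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(x_train, y_train,
          epochs=231,
          batch_size=512,
          validation_split=0.1,       # hold out 10% of the training rows
          validation_batch_size=512,  # evaluate the validation set in chunks
          callbacks=[early_stopping])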

The validation loss of my LSTM model is very volatile

I'm creating an LSTM model to predict the closing price of bitcoin. However, once training starts, my validation loss becomes very volatile and my test predictions turn inaccurate.
Here's my model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(80, input_shape=(1, look_back), return_sequences=True))  # return the full sequence so the next LSTM gets 3D input
model.add(LSTM(60))
model.add(Dense(1))  # single regression output for the closing price
model.compile(optimizer='adam', loss='mean_squared_error')
Fitting the model:
from keras.callbacks import ModelCheckpoint
callbacks = [ModelCheckpoint(filepath='btc_close_prediction.h5', save_best_only=True)]
history = model.fit(xTrain, yTrain, batch_size=10, epochs=30, callbacks=callbacks, validation_split=0.2)
(loss graph and prediction plot omitted)
Please advise how I can adjust my model for a better val_loss and better prediction accuracy.
Your validation dataset should comprise around 5000 samples for your validation loss to become smooth.
Try a Transformer model instead; it requires less training data.
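A sketch of the first suggestion (the 5000-sample figure comes from the answer above; shuffle=False is my assumption, to keep the price series in time order): carve out an explicit validation set so its size is known, instead of relying on a blind split.
from sklearn.model_selection import train_test_split

# Hold out the last 5000 samples as a fixed validation set.
xTr, xVal, yTr, yVal = train_test_split(xTrain, yTrain, test_size=5000, shuffle=False)
history = model.fit(xTr, yTr, batch_size=10, epochs=30,
                    callbacks=callbacks, validation_data=(xVal, yVal))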

How to mask padding in an LSTM model for speech emotion recognition

Given a few directories of .wav audio files, I have extracted their features as a 3D array (batch, step, features).
In my case, the training dataset is (1883, 100, 136).
Basically, each audio file has been analyzed 100 times (imagine that as 1 fps), and each time 136 features were extracted. However, the audio files differ in length, so some of them cannot be analyzed 100 times.
For instance, one of the audio files has only 50 sets of 136 features as effective values, so the remaining 50 sets were padded with zeros.
Here is my model.
def LSTM_model_building(units=200, learning_rate=0.005, epochs=20, dropout=0.19, recurrent_dropout=0.2):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, input_shape=(X_train.shape[0], 100, 136))))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    opt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    # opt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0., nesterov=False)

    model.compile(loss='categorical_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1)
    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)
    return history
I wish to mask the padding; however, the instructions shown on the Keras website use an Embedding layer, which I believe is usually used for NLP. I have no idea how to use an Embedding layer for my model.
Can anyone show me how to apply masking to my LSTM model?
The Embedding layer is not for your case; consider the Masking layer instead. It integrates easily into your model structure, as shown below.
I would also remind you that the input shape must be specified in the first layer of a Sequential model, and that you don't need to pass the sample dimension. In your case, the input shape is (100, 136), which corresponds to (timesteps, n_features).
units, learning_rate, dropout, recurrent_dropout = 200, 0.005, 0.19, 0.2
num_classes = 3

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(100, 136)))
model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
model.add(Dense(num_classes, activation='softmax'))

adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
opt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)  # alternative optimizer, unused

model.compile(loss='categorical_crossentropy',
              optimizer=adamopt,
              metrics=['accuracy'])
model.summary()
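To see what the Masking layer actually does, here is a tiny standalone check (the toy shapes are mine): any timestep whose features all equal mask_value is flagged False in the mask that Keras propagates to downstream layers, so the LSTM skips it.
import numpy as np
import tensorflow as tf

# Two sequences of 3 timesteps x 2 features; the last timestep of the
# first sequence is all-zero padding.
x = np.array([[[1., 2.], [3., 4.], [0., 0.]],
              [[5., 6.], [7., 8.], [9., 1.]]])
masked = tf.keras.layers.Masking(mask_value=0.0)(x)
print(masked._keras_mask)  # [[ True  True False]
                           #  [ True  True  True]]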

RNN Not Generalizing on Text Classification

I am using Keras and an RNN to classify Slack text data on whether the text is reaction-worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and URLs from the text, and dropped duplicates that had different target variables.
I am not able to get the model to generalize to unseen data. The losses of the train/val sets look good and continually decrease, but the accuracy of the val set only decreases.
I am using pretrained GloVe word embeddings, since my training set is only about 25,000 sentences.
I have added additional layers, changed my regularization value, and increased dropout, but I get similar results. Is my model not complex enough to generalize the data? The times I added additional layers, they were much smaller but deeper, because the training time was about 2 min per epoch.
Any insight would be appreciated.
embedding_layer = Embedding(len(word_index) + 1,
                            100,
                            weights=[embeddings_matrix],
                            input_length=max_message_length,
                            embeddings_regularizer=l2(0.001),
                            trainable=True)

# Creating the Model
model = Sequential()
model.add(embedding_layer)
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.7))
model.add(layers.GRU(128))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))

# Compiling the model with our given Optimizer
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
print(model.summary())
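For context, a minimal sketch of how the embeddings_matrix used above is typically built from a GloVe file (the file name and the 100-dimension size are assumptions; word_index comes from the tokenizer):
import numpy as np

# Parse the GloVe text file: one word followed by its vector per line.
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Row i holds the vector for the word with tokenizer index i;
# words missing from GloVe keep an all-zero row.
embeddings_matrix = np.zeros((len(word_index) + 1, 100))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embeddings_matrix[i] = vector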

Why did the Keras Sequential model give a different result compared to the functional Model?

I've tried a simple LSTM model in Keras to do simple sentiment analysis on the IMDB dataset, using both the Sequential model and the functional Model API, and it turns out the latter gives a worse result. Here's my code:
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
It reaches an accuracy of around 0.6 in the first epoch, while the other code, which uses Model:
_input = Input(shape=[max_review_length], dtype='int32')
embedded = Embedding(input_dim=top_words,
                     output_dim=embedding_size,
                     input_length=max_review_length,
                     trainable=False,
                     mask_zero=False)(_input)
lstm = LSTM(100, return_sequences=True)(embedded)
probabilities = Dense(2, activation='softmax')(lstm)
model = Model(_input, probabilities)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
gives an accuracy of 0.5 in the first epoch and never changes afterwards.
Is there any reason for that, or am I doing something wrong? Thanks in advance.
I see two main differences between your two models:
- You have set the embeddings of the second model to trainable=False, so you probably have far fewer parameters to optimize in the second model than in the first one.
- The LSTM returns the whole sequence in the second model, so the output shapes will be different. I don't see how you can compare the two models; they are not doing the same thing.
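A sketch of the functional model adjusted to match the Sequential one on both points (the only changes from the question's snippet are trainable=True and dropping return_sequences; variable names are reused from the question):
_input = Input(shape=[max_review_length], dtype='int32')
embedded = Embedding(input_dim=top_words,
                     output_dim=embedding_size,
                     input_length=max_review_length,
                     trainable=True)(_input)  # point 1: let the embeddings train
lstm = LSTM(100)(embedded)  # point 2: return_sequences defaults to False, so only the last output is used
probabilities = Dense(2, activation='softmax')(lstm)
model = Model(_input, probabilities)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])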