How to improve accuracy of multilabel sequential text classification in TensorFlow

I have a dataset that consists of patient complaints as the input and multiple diagnoses as the output.
I tokenized every word of the input text and padded the sequences, so an input looks like [22, 10, 4, 5, 0, 0, 0, 0],
and the output is the set of diagnoses as a multi-hot encoded vector [1, 0, 0, 0, 0, 1, ...] (850 diagnoses).
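For reference, a minimal sketch of that tokenize-and-pad step (assuming Keras' Tokenizer and pad_sequences; the texts and max_len values are placeholders):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["cough fever nausea", "headache dizziness"]  # placeholder complaint texts

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)  # e.g. [[22, 10, 4], ...]

max_len = 8  # assumed maximum sequence length
X = pad_sequences(sequences, maxlen=max_len, padding='post')  # -> [22, 10, 4, 0, 0, 0, 0, 0]
vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0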
I am trying to train the following model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

model = Sequential()
# Configuring the parameters; input_length is the padded sequence length
model.add(Embedding(vocab_size, output_dim=858, input_length=X_train.shape[1]))
model.add(LSTM(256, return_sequences=True))
# Adding a dropout layer
model.add(Dropout(0.5))
model.add(LSTM(128))
model.add(Dropout(0.5))
# Adding a dense output layer with sigmoid activation, one unit per label
model.add(Dense(858, activation='sigmoid'))
model.summary()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
But the accuracy won't rise above 0.30 (30%) over 50 epochs.
Dataset:
Text: "cough, fever, nausea." - Target: X1.0 Y.10 C.2.5 C3.5 (diagnosis codes; each code represents one diagnosis, so I converted the targets to a multi-hot encoding across all distinct diagnoses)
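A minimal sketch of that target encoding, assuming scikit-learn's MultiLabelBinarizer (the codes are placeholders):

from sklearn.preprocessing import MultiLabelBinarizer

# each sample's target is the set of diagnosis codes assigned to it
targets = [["X1.0", "Y.10"], ["C3.5"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(targets)  # multi-hot matrix, one column per distinct code
print(Y.shape)                  # (n_samples, n_distinct_codes)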

Related

What layer, node, and epoch counts would you set for 84 columns and 14 million rows of data?

I'm creating an incident intrusion detection model, and I've had trouble getting rid of false positives, as well as other classes being thrown into the prediction. The classes might be slightly related, but it's just not acceptable to me, and I feel like I'm doing something incorrect. I'm using online datasets with pre-placed labels identifying each class and what an instance (row) of data corresponds to, so it's a multiclass model.
What would you change in the code below? I was thinking of changing the input to the number of features I have (84) and sizing the hidden layer as the average of the input and output. Thanks a lot! Also, when is the perfect time to stop training? I was thinking of using early stopping (a minimal sketch follows the code below), but my 6 GB RTX 3060 GPU runs out of memory after the first epoch, and it only does this when I include early-stopping validation.
The data is 14 million rows of 84 columns:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(tf.keras.Input(shape=(len(x.columns),)))  # input layer: one unit per feature
model.add(Dense(32, activation='relu'))    # hidden
model.add(Dense(16, activation='relu'))    # hidden
model.add(Dense(15, activation='softmax')) # output: 15 classes
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=231, batch_size=512)
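Regarding early stopping, here is a minimal sketch of the standard Keras hook, assuming the same model as above (the patience value and the 10% validation split are assumptions):

from tensorflow.keras.callbacks import EarlyStopping

# stop once validation loss hasn't improved for 5 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model.fit(x_train, y_train,
          epochs=231,
          batch_size=512,
          validation_split=0.1,  # hold out 10% of the training data for monitoring
          callbacks=[early_stop])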

Keras binary classification model's AUC score doesn't increase

I have an imbalanced dataset with 57,000 zeros and 2,500 ones. I gave class weights as an input to my model, tried changing optimizers, and tried resizing the number of layers and neurons. Finally I stuck with the class weights shown below, because they were the only choice that seemed systematic. I also tried changing the layer weight regularization rules, but nothing has helped yet. And I am not talking only about my validation AUC score; even the training performance doesn't rise satisfyingly.
Here is how I declared my model; never mind if you think the problem is the layer and node sizes. I think I have tried everything that sounds sensible.
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# per-class weights applied to the loss
class_weight = {0: 23.59,
                1: 1.}

model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(33, activation='relu',
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dense(28, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(9, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='sigmoid',
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))

opt = keras.optimizers.SGD(learning_rate=0.1)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['AUC'])
model.fit(x_train, y_train, epochs=600, verbose=1,
          validation_data=(x_test, y_test), class_weight=class_weight)
After approximately 100 epochs, it gets stuck at 0.73-0.75 AUC and doesn't rise any further. I couldn't even overfit my model.
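For reference, a minimal sketch of how class weights are commonly derived with scikit-learn's 'balanced' heuristic (note it will not reproduce the 23.59 above):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_train holds the 0/1 labels (57,000 zeros, 2,500 ones)
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))  # roughly {0: 0.52, 1: 11.9} for these counts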

How to mask paddings in LSTM model for speech emotion recognition

Given a few directories of .wav audio files, I have extracted their features into a 3D array of shape (batch, step, features).
In my case, the training dataset is (1883, 100, 136).
Basically, each audio file has been analyzed 100 times (imagine that as 1 fps), and each time 136 features were extracted. However, the audio files differ in length, so some of them could not be analyzed 100 times.
For instance, one of the audio files has 50 sets of 136 features as effective values, so the remaining 50 sets were padded with zeros.
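A minimal sketch of that zero-padding step (the array contents are placeholders):

import numpy as np

features = np.random.rand(50, 136)           # placeholder: 50 effective frames of 136 features
padded = np.zeros((100, 136), dtype=features.dtype)
padded[:len(features)] = features            # frames 50..99 stay all-zero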
Here is my model.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def LSTM_model_building(units=200, learning_rate=0.005, epochs=20,
                        dropout=0.19, recurrent_dropout=0.2, batch_size=32):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Bidirectional(
        LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout,
             input_shape=(X_train.shape[0], 100, 136))))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(learning_rate=learning_rate,
                                       beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    opt = tf.keras.optimizers.RMSprop(learning_rate=learning_rate, rho=0.9, epsilon=1e-6)  # alternative (unused)
    # opt = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9, nesterov=False)

    model.compile(loss='categorical_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1)
    score, acc = model.evaluate(X_test, y_test, batch_size=batch_size)
    return history
I wish to mask the padding; however, the instructions shown on the Keras website use an embedding layer, which I believe is usually used for NLP, and I have no idea how to use an embedding layer in my model.
Can anyone show me how to apply masking in my LSTM model?
The Embedding layer is not appropriate for your case; consider the Masking layer instead. It integrates easily into your model structure, as shown below.
I also remind you that the input shape must be specified in the first layer of a sequential model, and that you don't need to pass the sample dimension. In your case, the input shape is (100, 136), which corresponds to (timesteps, n_features).
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

units, learning_rate, dropout, recurrent_dropout = 200, 0.005, 0.19, 0.2
num_classes = 3

model = tf.keras.models.Sequential()
# Masking skips every timestep whose features are all equal to mask_value;
# the mask is propagated automatically to the downstream Bidirectional LSTM
model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(100, 136)))
model.add(tf.keras.layers.Bidirectional(
    LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
model.add(Dense(num_classes, activation='softmax'))

adamopt = tf.keras.optimizers.Adam(learning_rate=learning_rate,
                                   beta_1=0.9, beta_2=0.999, epsilon=1e-8)
model.compile(loss='categorical_crossentropy',
              optimizer=adamopt,
              metrics=['accuracy'])
model.summary()

Multivariate Binary Classification Prediction Tensorflow 2 LSTM

I am currently working on the implementation of an LSTM to predict a binary outcome (either 0 or 1) for a given set of normalized, scaled features.
# self._regressor is a tf.keras Sequential model created earlier in the class
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True,
                         input_shape=(data.x_train.shape[1], data.x_train.shape[2])))
self._regressor.add(Dropout(0.2))
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.3))
self._regressor.add(LSTM(units=80, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.4))
self._regressor.add(LSTM(units=120, activation='relu'))
self._regressor.add(Dropout(0.5))
# this is the output layer
self._regressor.add(Dense(units=1, activation='sigmoid'))
self._logger.info("TensorFlow Summary\n {}".format(self._regressor.summary()))
# compile and run the regressor
self._regressor.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
self._regressor.fit(data.x_train, data.y_train, epochs=1, batch_size=32)
data.y_pred_scaled = self._regressor.predict(data.x_test)
data.y_pred = self._scaler_target.inverse_transform(data.y_pred_scaled)
scores = self._regressor.evaluate(data.x_test, data.y_test, verbose=0)
My issue here is that my predictions span only from a minimum of 0.518052 to a maximum of 0.5188445, implying to me that all of my classifications are positive (which is definitely incorrect). I even tried predict_classes, and it yielded an array of 1's.
I am struggling to find where my issue is, despite numerous searches online. I have ensured that my final output layer uses a sigmoid activation and that the loss is binary_crossentropy. My data has been scaled using sklearn's MinMaxScaler with feature_range=(0, 1). I am running my code through a debugger, and everything up to self._regressor.fit looks good so far. I am just struggling with quantifying the output of the predictions.
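For reference, a minimal sketch of the usual way to turn sigmoid outputs into hard 0/1 labels, assuming the conventional 0.5 cutoff (predict_classes is deprecated in recent Keras versions):

# sigmoid outputs are already probabilities in [0, 1], so no scaler inverse-transform is needed
y_prob = self._regressor.predict(data.x_test)
y_pred = (y_prob > 0.5).astype(int)  # hard labels via the 0.5 threshold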
Any help would be greatly appreciated.

RNN Not Generalizing on Text Classification

I am using Keras and an RNN to classify Slack text data on whether the text is reaction worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and URLs from the text, and dropped duplicates that had different target variables.
I am not able to get the model to generalize to unseen data. The loss of the train/validation sets looks good and continually decreases, but the accuracy of the validation set only decreases.
I am using a pretrained GloVe word embedding, since my training set is only about 25,000 sentences.
I have added additional layers, changed my regularization value, and increased dropout, but I get similar results. Is my model not complex enough to generalize the data? The times I added additional layers, they were much smaller but deeper, because the training time was about 2 minutes per epoch.
Any insight would be appreciated.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Convolution1D, MaxPooling1D, Dropout, GRU, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

embedding_layer = Embedding(len(word_index) + 1,
                            100,
                            weights=[embeddings_matrix],
                            input_length=max_message_length,
                            embeddings_regularizer=l2(0.001),
                            trainable=True)

# Creating the Model
model = Sequential()
model.add(embedding_layer)
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.7))
model.add(GRU(128))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))

# Compiling the model with our given Optimizer
# note: 'decay' is the legacy per-step learning-rate decay; in TF >= 2.11 use tf.keras.optimizers.legacy.Adam
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
print(model.summary())
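For completeness, a minimal sketch of how an embeddings_matrix like the one above is typically built from a GloVe file (the file name and the handling of out-of-vocabulary words are assumptions; the 100 dimensions match the Embedding layer above):

import numpy as np

embedding_dim = 100
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:  # assumed GloVe file
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# rows follow the tokenizer's word_index; out-of-vocabulary words stay all-zero
embeddings_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embeddings_matrix[i] = vector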