Interpreting the values of a confusion matrix in machine learning - tensorflow

I used confusion_matrix() to evaluate a model that has been trained to detect DDoS attacks.
The confusion matrix computed on my test data set is shown below.
I believe the false-negative value should not be 0 if the model correctly detected the traffic that is not a DDoS attack.
Below is the code with which I implemented my ML model. Could you please give me a suggestion to make the model correctly classify the benign traffic?
model.add(Dense(units=64, activation='relu', input_dim=7)) # Input Layer
model.add(Dropout(0.5))
model.add(Dense(units=128, activation='relu')) # hidden Layer
model.add(Dropout(0.2))
model.add(Dense(units=64, activation='relu')) # hidden Layer
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid')) # Last Layer for output
model.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy'])
CSV_FILE = "ddos.csv"
df = pd.read_csv(CSV_FILE)
df.loc[(df.Label == "ddos"), "Label"] = 1.0
df.loc[(df.Label == "Benign"), "Label"] = 0.0
# Data set
x_train = np.array(df[["Flow Duration", "Tot Fwd Pkts", "TotLen Fwd Pkts",
                       "Flow IAT Mean", "Flow IAT Std", "Flow IAT Max", "Flow IAT Min"]])
x_train = x_train.astype(float)
normalized_x = preprocessing.normalize(x_train)
y_train = np.array(df[["Label"]])
y_train = np.array(y_train, dtype = 'float')
normalized_y = preprocessing.normalize(y_train)
hist = model.fit(normalized_x, normalized_y, epochs=3, batch_size=128)
y_pred = model.predict(x_train)
y_pred = preprocessing.normalize(y_pred)
cf_matrix = confusion_matrix(y_test, np.rint(y_pred))
Notice that my dataset is not imbalanced, i.e. it contains exactly 50% DDoS and 50% normal traffic.

Although such questions cannot actually be answered with any degree of certainty, there are indeed some serious issues with your code.
First, you should not normalize your labels y_train; this is a binary classification problem, and the labels are expected to be exactly 0/1. Remove the following lines:
normalized_y = preprocessing.normalize(y_train)
y_pred = preprocessing.normalize(y_pred)
Change the labels to integers (not floats), i.e.:
df.loc[(df.Label == "ddos"), "Label"] = 1
df.loc[(df.Label == "Benign"), "Label"] = 0
and change your model fit call to:
hist = model.fit(normalized_x, y_train, epochs=3, batch_size=128)
Second, although you train with normalized_x, you subsequently request predictions with x_train, which is again wrong; your predictions should be:
y_pred = model.predict(normalized_x)
Third, dropout should not be used by default; it is meant for cases where we see signs of overfitting - and before worrying about overfitting, the model first has to be able to start learning something, which is not the case here. Comment out all dropout layers, and start putting them back into the model only if overfitting appears.
Last, you should start with the default settings of Adam, which usually (and reportedly) work well out of the box, i.e.:
optimizer=Adam()
And of course you should consider running the model for more than epochs=3.
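Putting the above suggestions together, here is a minimal sketch of how the corrected pipeline might look (file and column names taken from the question; the epoch count is an arbitrary example, and for simplicity the confusion matrix is computed on the training data as in the original snippet - in practice you would use a held-out test set):

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

df = pd.read_csv("ddos.csv")
df.loc[(df.Label == "ddos"), "Label"] = 1      # integer labels, not floats
df.loc[(df.Label == "Benign"), "Label"] = 0

features = ["Flow Duration", "Tot Fwd Pkts", "TotLen Fwd Pkts",
            "Flow IAT Mean", "Flow IAT Std", "Flow IAT Max", "Flow IAT Min"]
x = preprocessing.normalize(np.array(df[features]).astype(float))
y = np.array(df["Label"]).astype(int)          # labels are NOT normalized

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=7))
model.add(Dense(units=128, activation='relu'))   # dropout layers removed for now
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])  # default Adam

hist = model.fit(x, y, epochs=30, batch_size=128)   # more than 3 epochs; 30 is only an example

y_pred = model.predict(x)                           # predict on the same normalized features
cf_matrix = confusion_matrix(y, np.rint(y_pred).astype(int).ravel())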

Related

Question about how to improve my intrusion detection model and decrease false positives?

I have a machine learning model that I feel is still producing false positives. It can largely detect attacks that I produce separately from the training / test set, maybe at an 80% rate, but for me that is not enough. I also tried to drop columns with high correlation. My biggest problem is understanding whether to use one-hot encoding or not: I can switch between one-hot and sparse and I don't notice any difference at all on my dataset.
The dataset is like this:
column 1 - column 2 - column 3 - etc., all containing things like packet properties, and then, at the end, the class (class 1, class 2, or class 3). Any one row can only belong to one class; it can't be two attack types. The model has to distinguish between all attack types and assign each row the best-matching attack type. This is different from one-hot encoding, where, if I understand right, a row can belong to multiple attack types. However, I notice that nobody ever uses sparse_categorical_crossentropy, even for the iris dataset, which is very similar to mine, as it has multiple classes.
I'll paste my code here in case somebody can see where I am going wrong! :Z
label_encoder = preprocessing.LabelEncoder()
y = ConcatenateAttackList['Label']
encoded_y = label_encoder.fit_transform(y)
y = np_utils.to_categorical(encoded_y)
x = ConcatenateAttackList.drop(['Label', ], axis = 1).astype(float)
sc = MinMaxScaler()
print('x_train, y_train, fitting and transforming.')
x = sc.fit_transform(x)
x,y = oversample.fit_resample(x,y)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42,
                                                    stratify=y, shuffle=True)
len(x_train)
len(y_train)
X = pd.DataFrame(x_train)
print('x_train, y_train, fitted and transformed.')
with tf.device("CPU"):
    train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(4*256).batch(256)
    validate = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(256)
model = Sequential()
print('Model initialized.')
model.add(Dense(64,input_dim=len(X.columns),activation='relu')) # input layer
model.add(tf.keras.layers.BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(Dense(6, activation='softmax'))
print('Nodes added to layers.')
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics='categorical_accuracy')
print('Compiled.')
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='auto', patience=50,
                                            min_delta=0, restore_best_weights=True, verbose=2)
print('EarlyStopping CallBack executed.')
print('Beginning fitting...')
model_hist = model.fit(x_train, y_train, epochs=231, batch_size=256, verbose=1,
                       callbacks=[callback], validation_data=validate)
print('Fitting completed.')
model.save("sets/mymodel5.h5")
dump(sc, 'sets/scaler_transformTCPDCV5.joblib')
print('Model saved.')
# loss history
plt.plot(model_hist.history['loss'], label="Training Loss")
plt.plot(model_hist.history['val_loss'], label="Validation Loss")
plt.legend()
#------------PREDICTION
tester = pd.read_csv('AttackTestFile.csv', sep=r'\s*,\s*', engine='python')
ColumnsForWindowsCIC = pd.read_csv('ColumnsForWindowsCIC.csv')
tester.columns = ColumnsForWindowsCIC.columns
tester = deleteRedudancy(tester)
x = tester.drop(['Label', ], axis = 1)
fit_new_input = sc.transform(x)
predict_y=model.predict(fit_new_input)
predict_y
classes_y=np.argmax(predict_y,axis=1)
classes_y
predict = label_encoder.inverse_transform(classes_y)
predict
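Regarding the one-hot vs. sparse confusion above: as long as every row belongs to exactly one class, to_categorical labels with categorical_crossentropy and plain integer labels with sparse_categorical_crossentropy compute exactly the same loss, which is why switching between them makes no visible difference. A small sketch with hypothetical labels and predictions:

import numpy as np
import tensorflow as tf

# hypothetical integer class labels for a 3-class problem (exactly one class per row)
y_int = np.array([0, 2, 1])
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)

# hypothetical softmax outputs for the same 3 samples
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])

cce = tf.keras.losses.CategoricalCrossentropy()(y_onehot, probs)
scce = tf.keras.losses.SparseCategoricalCrossentropy()(y_int, probs)
print(float(cce), float(scce))   # identical values - only the label format differs

# Note: one-hot targets here still mean one class per row; the multi-label case
# (a row belonging to several attack types) would instead use sigmoid outputs
# with binary_crossentropy.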

Keras accuracy not increasing

I am trying to perform sentiment classification using Keras with a basic neural network (no RNN or other more complex type). However, when I run the script I see no increase in accuracy during training/evaluation. I am guessing I am setting up the output layer incorrectly, but I am not sure of that. y_train is a list like [1,2,3,1,2,4,5] (5 different labels) containing the targets that belong to the features in X_train_seq_padded. The setup is as follows:
padding_len = 24 # len of each tokenized sentence
neurons = 16 # 2/3 the length of the text that is padded
model = Sequential()
model.add(Dense(neurons, input_dim = padding_len, activation = 'relu', name = 'hidden-1'))
model.add(Dense(neurons, activation = 'relu', name = 'hidden-2'))
model.add(Dense(neurons, activation = 'relu', name = 'hidden-3'))
model.add(Dense(1, activation = 'sigmoid', name = 'output_layer'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])
callbacks = [EarlyStopping(monitor = 'accuracy', patience = 5, mode = 'max')]
history = model.fit(X_train_seq_padded, y_train, epochs = 100, batch_size = 64, callbacks = callbacks)
First of all, in your setup above, if you choose sigmoid as the last-layer activation function (which is generally used for binary or multi-label classification), then the loss function should be binary_crossentropy.
But if your labels are multi-class and transformed into one-hot encoding, then your last layer should be Dense(num_classes, activation='softmax') and the loss function should be categorical_crossentropy.
But if you keep your multi-class labels as integers (no one-hot transform), then your last layer and loss function should be
Dense(num_classes) # with logits
SparseCategoricalCrossentropy(from_logits= True)
Or, as noted by Frightera:
Dense(num_classes, activation='softmax') # with probabilities
SparseCategoricalCrossentropy(from_logits=False)
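As a concrete illustration of the integer-label option, a minimal sketch under the question's setup (padding_len = 24, 5 classes) might look like the following; it assumes the labels 1..5 have been shifted to 0..4, since sparse categorical cross-entropy expects class indices starting at 0:

import tensorflow as tf

padding_len = 24
num_classes = 5

model = tf.keras.Sequential([
    tf.keras.Input(shape=(padding_len,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(num_classes)          # logits, no activation
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# y_train must then hold integers in [0, num_classes), e.g. y_train = np.array(y_train) - 1
# if the original labels are 1..5.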

Getting constant accuracies for training and validation sets despite their losses changing during CNN training?

As the title describes, the issue I've been experiencing during the training of my CNN model is that the accuracies of the training and validation sets stay constant even though their losses are changing. I have included the details of the model and its training setup below. What may be causing this issue?
Here is how the data used for the training (X_train & y_train), validation, and test (X_test and y_test) sets was prepared:
df = pd.read_csv(CSV_PATH, sep=',', header=None)
print(f'Shape of all data: {df.shape}')
y = df.iloc[:, -1].values
X = df.iloc[:, :-1].values
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
dummy_y = to_categorical(encoded_Y)
X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.3, random_state=RANDOM_STATE)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
Here are the shapes of training and test sets:
Shape of X_train: (1322, 10800, 1)
Shape of Y_train: (1322, 3)
Shape of X_test: (567, 10800, 1)
Shape of y_test: (567, 3)
Here is my CNN model:
# Model hyper-parameters
activation_fn = 'relu'
n_lr = 1e-4
weight_decay = 1e-4
batch_size = 64
num_epochs = 200*10*10
num_classes = 3
n_dropout = 0.6
n_momentum = 0.5
n_kernel = 5
n_reg = 1e-5
# the sequential model
model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(GlobalAveragePooling1D()) # have tried model.add(Flatten()) as well
model.add(Dense(256, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(64, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(num_classes, activation='softmax'))
adam = Adam(lr=n_lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=weight_decay)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
Here is how I have evaluated the model:
Y_pred = model.predict(X_test, verbose=0)
y_pred = np.argmax(Y_pred, axis=1)
y_test_int = np.argmax(y_test, axis=1)
And my model always predicts the same one of the three classes during evaluation, as you can see from the classification result below (via the classification_result(y_test_int, y_pred) function):
              precision    recall  f1-score   support
      normal      0.743     1.000     0.852       421
         apb      0.000     0.000     0.000        45
         pvc      0.000     0.000     0.000       101
The model was trained using Keras's EarlyStopping callback, so training continued for 4,173 epochs. Here are the losses obtained during training for the training and validation sets:
Here are the accuracies obtained during training for the training and validation sets:
The model was implemented using Keras and hosted on Google Colab.
Although such issues are difficult to resolve without the data, there are a couple of general rules applicable.
The very first thing we do when the model does not seem to learn anything, like here (despite the mild drop in the loss), is to remove all dropout.
In fact, dropout is not supposed to be used by default; its nominal function is to guard against overfitting - but of course, before starting to worry about overfitting, you must first have some success with fitting, something that is clearly not happening here. It also does not help that, with a dropout rate of n_dropout = 0.6, you seem to be using it rather too aggressively.
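Concretely, the sketch below shows the same architecture with all Dropout layers removed and everything else kept as in the question; dropout can be re-introduced later, probably at a rate well below 0.6, only if overfitting actually shows up:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, BatchNormalization, Activation,
                                     MaxPooling1D, GlobalAveragePooling1D, Dense)
from tensorflow.keras.optimizers import Adam

# same hyper-parameters as in the question
activation_fn = 'relu'
n_kernel = 5
n_lr = 1e-4
num_classes = 3

model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(GlobalAveragePooling1D())
model.add(Dense(256, activation=activation_fn))
model.add(Dense(64, activation=activation_fn))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=n_lr),   # the decay argument is omitted here, as it varies by Keras version
              metrics=['acc'])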

Getting higher accuracy with softmax + categorical_crossentropy compared to sigmoid + binary_crossentropy in LSTM

I am using Word2Vec encodings and training an LSTM model. My data has only two labels and about 10k instances with 45k features. My embedding's shape is (58137, 100); I trained it myself. I am keeping all the parameters the same except for softmax + categorical_crossentropy vs. sigmoid + binary_crossentropy. Since I have two labels, shouldn't I be getting better accuracy with sigmoid + binary_crossentropy? Here are my models.
model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False)) # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
#opt = SGD(lr=0.05)
model.compile(loss='categorical_crossentropy', optimizer="Nadam", metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,validation_split=0.2,callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])
model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False)) # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
#opt = SGD(lr=0.05)
model.compile(loss='binary_crossentropy', optimizer="Nadam", metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,validation_split=0.2,callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])
My accuracy and other evaluation scores (precision, recall, and F1) on the test set are higher with the first model, which uses softmax + categorical_crossentropy. Can someone explain to me why this is the case?
And if there is something wrong with the models I created, please let me know.
Thank you.
The accuracies should be the same (or very similar, considering that you do not set seeds for exact reproducibility), but in your comparison you made a mistake on this line:
model.add(Dense(2, activation='sigmoid'))
Here, for the binary_crossentropy and the sigmoid, you need 1 instead of 2 neurons.
Therefore,
model.add(Dense(1, activation='sigmoid'))
Of course, you also need to make sure you provide the labels in the right format: for sigmoid + BCE they look like [0,1,1,1,...] instead of the softmax + CCE format [[0,1],[1,0],[1,0],[1,0],...].
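In code, the sigmoid + binary_crossentropy variant would then look roughly like this (a sketch reusing the embeddings, X_train and y_train_binary variables from the question; the one-hot labels are collapsed back to a flat 0/1 vector):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# collapse one-hot labels [[0,1],[1,0],...] back to a flat 0/1 vector
y_train_flat = np.argmax(y_train_binary, axis=1)

model = Sequential()
model.add(Embedding(58137, 100, weights=[embeddings],
                    input_length=X_train.shape[1], trainable=False))
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))        # a single unit for binary classification
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])

model.fit(X_train, y_train_flat, epochs=4, batch_size=100, validation_split=0.2)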

NaN loss in CNN-LSTM on Keras for Time Series forecasting

I have to predict the time dependence of soil moisture from rainfall and several other time series. For all of them I have forecasts, and the only thing left to do is the prediction of soil moisture.
Following a guide, I built a CNN model, because ARIMA models can't take external stochastic influences into account.
The model works, but not as it should.
If you have a look at the picture, you'll find that the forecast series (yellow, smsfu_sum) doesn't depend on rain (the aprec series) the way it does in the training set. I want a sharp peak in the forecast, but changing the kernel and pooling sizes doesn't help.
So I tried to train a CNN-LSTM model based on this guide.
Here's the code for the model architecture:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 1, 20, 32
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=3, activation='softmax', input_shape=(n_timesteps, n_features)))
    model.add(Conv1D(filters=64, kernel_size=3, activation='softmax'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='softmax')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
I used batch size = 32 and split the data with this function:
def to_supervised(train, n_input, n_out=300):
    # flatten data
    data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 2])
        # move along one time step
        in_start += 1
    return array(X), array(y)
I am using n_input = 1000 and n_out = 480 (this is the horizon I have to predict).
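(As a quick sanity check of this windowing, here is what the function yields on a small hypothetical array; the toy shapes below are made up purely for illustration and assume `from numpy import array` is in scope, as in the guide the code is based on.)

import numpy as np

toy_train = np.random.rand(6, 10, 3)        # 6 blocks of 10 time steps with 3 features
X, y = to_supervised(toy_train, n_input=20, n_out=5)
print(X.shape)   # (36, 20, 3): 60 flattened steps -> 60 - 20 - 5 + 1 = 36 windows
print(y.shape)   # (36, 5): the next 5 values of feature column 2 for each window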
On the very first iteration, this network's loss function goes to NaN.
How should I fix it? There are no missing values in my data; I dropped every NaN.