Binary CNN spam classifier using ResNet50 produces train/validation metrics > 0.99 but the test metric falls under 0.6. Faulty code or approach?

I am working on a binary spam classifier project where I have to classify whether an image is spam or not. To be precise, let me first define what is not spam:
Any image in the world that is not a picture of a question on paper is spam, be it a bike, a black screen, or anything else.
I tried both a custom dummy model and ResNet50, with and without fine-tuning. I have lots of images of questions, but I used 47,000 question images and the COCO 2017 test dataset as spam. The model did well and produced great results, but when I evaluated it on ImageNet images it performed very poorly. I want to ask whether I am doing something wrong here, and if not, what I should do to make my model generalise.
train_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input, validation_split=0.20)

train_set = train_data_gen.flow_from_directory(TRAIN, batch_size=BATCH_SIZE, target_size=(224, 224),
                                               subset='training', class_mode='categorical')
val_set = train_data_gen.flow_from_directory(TRAIN, batch_size=128, target_size=(224, 224),
                                             subset='validation', class_mode='categorical')

res_net = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3), pooling='avg')

fc1 = Dense(1024, activation='relu')(res_net.output)
d1 = Dropout(0.79)(fc1)
out_ = Dense(1, activation='sigmoid', name='output_layer')(d1)

model = Model(inputs=res_net.input, outputs=out_)

for layer in res_net.layers:
    layer.trainable = False

model.compile(loss='binary_crossentropy', optimizer=adam,
              metrics=['accuracy'])

history = model.fit(train_set, epochs=3, validation_data=val_set,
                    steps_per_epoch=len(train_set) // BATCH_SIZE, callbacks=callbacks)
I have used both softmax with categorical_crossentropy (2 neurons) and sigmoid with binary_crossentropy (1 neuron). The model performs well on train and validation in every case, so why does it not generalise? Is there something wrong with the way I train my model, is the code faulty? Do I need more data, or should I train the ResNet from scratch?
I think I need more data, but how much data should I be aiming for so that it tends to LEARN WHAT A QUESTION IMAGE LOOKS LIKE, given that spam could be anything?
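For reference, here is a minimal sketch of the sigmoid/binary_crossentropy variant I mean (it reuses TRAIN and BATCH_SIZE from above and assumes the generators are switched to class_mode='binary' so the labels match the single output unit):
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model

# Generators with class_mode='binary' so the labels are 0/1, matching the
# single sigmoid output unit and binary_crossentropy below.
train_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input, validation_split=0.20)
train_set = train_data_gen.flow_from_directory(TRAIN, batch_size=BATCH_SIZE, target_size=(224, 224),
                                               subset='training', class_mode='binary')
val_set = train_data_gen.flow_from_directory(TRAIN, batch_size=BATCH_SIZE, target_size=(224, 224),
                                             subset='validation', class_mode='binary')

res_net = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3), pooling='avg')
for layer in res_net.layers:
    layer.trainable = False

x = Dense(1024, activation='relu')(res_net.output)
x = Dropout(0.79)(x)
out_ = Dense(1, activation='sigmoid', name='output_layer')(x)

model = Model(inputs=res_net.input, outputs=out_)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# steps_per_epoch is omitted here; Keras infers it from the generator's length.
history = model.fit(train_set, epochs=3, validation_data=val_set)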


Why doesn't my CNN validation accuracy increase?

I am attempting to create a simple CNN to be able to distinguish eye (retinal) scans of different severities. It is a multi-class classification problem, 5 classes. This by now is probably a fairly standard, textbook case for CNNs. I am using the Kaggle EyePACs dataset. The photos are very big, so I'm using a dataset that has rescaled them.
My issue is, when I'm training the model, I expect to see the usual learning curves where both training and validation curves increase together like this example from google:
However my curves look like this:
I haven't done any image pre-processing on the data, I was hoping that there would be some rudimentary learning going on which I can then improve upon using CLAHE and what have you. I've changed the classes so that instead of trying to predict the grades from 0 to 4, I've removed the middle classes so that we just have the extremes: 0 and 4 (and thus it became a binary classification problem, where class 4 was relabelled 1 and so it's 0 and 1). However the curve didn't change much and still looks like this:
What could be the issue? I thought that as the model gets better on the training data, it should also improve on the validation data. Yes, this is overfitting, but I assumed that kicks in after some positive learning, not straight away. The validation set doesn't seem to be learning at all. Also, shouldn't these models start with random parameters, so that the initial accuracy would be random? Instead it's around 0.75 from the get-go, and it just doesn't learn after that. What's going on? What should I look at changing? Is this a data problem or a hyperparameter problem? Shall I include the code here? Many thanks.
=============================Edit=============================
Here's the code I used. I know it's rudimentary, it's a mishmash of both the 'image classification from scratch' Keras tutorial as well as some standard MNIST tutorials you get around the web. Grateful for any pointers.
Creating the image-label dataset objects for train (+validation split) and test:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    labels="inferred",
    label_mode="binary",
    validation_split=0.2,
    seed=1337,
    subset="training",
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    labels="inferred",
    label_mode="binary",
    validation_split=0.2,
    seed=1337,
    subset="validation",
)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized test 15/0-4/",
    labels="inferred",
    label_mode="binary",
)
Found 26518 files belonging to 2 classes.
Using 21215 files for training.
Found 26518 files belonging to 2 classes.
Using 5303 files for validation.
Found 36759 files belonging to 2 classes.
#To make it run faster (I think?):
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
test_ds = test_ds.prefetch(buffer_size=32)
#The architecture:
from keras.models import Sequential
from keras.layers import Dense, Rescaling, Conv2D, MaxPool2D, Flatten
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=2, activation='sigmoid'))
#Compile it:
from keras import optimizers
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#And then finally train (run) it:
history = model.fit(
    x=train_ds,
    epochs=30,
    validation_data=val_ds,
)
#I think this is how I evaluate the trained model against the test data:
loss, acc = model.evaluate(test_ds)
print("Accuracy", acc)
#It prints out the following output:
1149/1149 [==============================] - 278s 238ms/step - loss: 0.1408 - accuracy: 0.9672
Accuracy 0.9671916961669922
And then of course I end it with model.save('Binary CNN 0-4').
I think I have already spotted one thing I can change: switching the loss function to binary_crossentropy and adjusting the number of units in the final dense layer to 1 (instead of 2). But surely that little change alone won't address why the validation set isn't learning.
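In other words, I believe the final layer and the compile call would become something like this (a sketch; the rest of the model stays the same):
# Binary head: a single sigmoid unit paired with binary_crossentropy,
# matching label_mode="binary" used when building the datasets above.
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss='binary_crossentropy',
              metrics=['accuracy'])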
You've not included code, so I hope it's OK to give a couple of tentative general answers.
Q) Initial accuracy: how can it be as high as 0.75?
A) TensorFlow reports the average training accuracy over the epoch, and if there are many batches then it learns during epoch 0. The first accuracy reported is the average over epoch 0 and can be much better than random. If, for example, the input data is unbalanced and has 75% of labels in one category, the model may learn very quickly that it can achieve 75% accuracy by allocating 100% of training data to that category.
Q) Can overfitting start at the beginning?
A) It can start very close to the beginning. A network may in effect just be memorising the training set.
There are standard approaches to overfitting, which include:
i) Try a simpler network. It makes sense anyway to start simple and add complexity as required.
ii) Regularization of layers - add (e.g.) L2 regularizers to your layers.
iii) Add dropout layers between hidden layers.
iv) Batch normalisation between hidden layers.
v) Image augmentation (randomly add some rotation, shift, flipping if appropriate).
vi) Get more training data.
vii) Use transfer learning, as another answer has suggested. This is most likely appropriate if you don't have much training data. You can then just add a layer or two to the pre-built model (probably removing its last layer or two), and train only the new layers.
Only trial and error will show what works. A minimal sketch illustrating ii), iii) and v) follows below.
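For instance (illustrative only: the layer sizes, the L2 factor and the augmentation ranges are assumptions rather than tuned values, and the augmentation layers assume a recent TensorFlow where they live under keras.layers):
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # v) light augmentation, active only during training
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.Rescaling(1.0 / 255),
    # ii) L2 regularization on the convolution/dense kernels
    layers.Conv2D(16, 3, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    # iii) dropout between the hidden layers
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])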

How to freeze batch-norm layers during Transfer-learning

I am following the Transfer learning and fine-tuning guide on the official TensorFlow website. It points out that during fine-tuning, batch normalization layers should be in inference mode:
Important notes about BatchNormalization layer
Many image models contain BatchNormalization layers. That layer is a special case on every imaginable count. Here are a few things to keep in mind.
BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables tracking the mean and variance of the inputs.
When you set bn_layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean & variance statistics. This is not the case for other layers in general, as weight trainability & inference/training modes are two orthogonal concepts. But the two are tied in the case of the BatchNormalization layer.
When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training=False when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.
You'll see this pattern in action in the end-to-end example at the end of this guide.
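In code, the pattern the guide describes looks roughly like this (a sketch, assuming a functional-API wrapper around a pre-trained base; the base model and head here are placeholders, not my actual model):
from tensorflow import keras

base_model = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3), pooling="avg")
base_model.trainable = False  # freeze everything for the initial classifier stage

inputs = keras.Input(shape=(224, 224, 3))
# training=False keeps BatchNormalization layers in inference mode even after
# the base model is unfrozen later for fine-tuning.
x = base_model(inputs, training=False)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

# Later, for fine-tuning: unfreeze the base but keep the training=False call above.
base_model.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])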
Even though some other sources, for example this article (titled Transfer Learning with ResNet), say something completely different:
for layer in resnet_model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True
    else:
        layer.trainable = False
ANYWAY, I know that there is a difference between the training and trainable parameters in TensorFlow.
I am loading my model from file, like so:
model = tf.keras.models.load_model(path)
And I am unfreezing (or actually freezing the rest) some of the top layers in this way:
model.trainable = True
for layer in model.layers:
    if layer not in model.layers[idx:]:
        layer.trainable = False
NOW about batch normalization layers: I can either do:
for layer in model.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False
or
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)
Which one should I do? And, finally, is it better to freeze the batch norm layers or not?
Not sure about the training vs trainable difference, but personally I've gotten good results setting trainable = False.
Now as to whether to freeze them in the first place: I've had good results with not freezing them. The reasoning is simple: the batch norm layer learns the moving averages of the initial training data. This may be cats, dogs, humans, cars, etc. But when you're transfer learning, you could be moving to a completely different domain. The moving averages of this new domain of images are far different from those of the prior dataset.
By unfreezing those layers and freezing the CNN layers, my model saw a 6-7% increase in accuracy (82 -> 89% ish). My dataset was far different from the initial ImageNet dataset that EfficientNet was trained on.
P.S. Depending on how you plan on running the model post training, I would advise you to freeze the batch norm layers once the model is trained. For some reason, if you run the model online (1 image at a time), the batch norm gets all funky and gives irregular results. Freezing them post training fixed the issue for me.
Use the code below to see whether the batch norm layers are frozen or not. It will print not only the layer names but also whether they are trainable.
def print_layer_trainable(conv_model):
    for layer in conv_model.layers:
        print("{0}:\t{1}".format(layer.trainable, layer.name))
In my case I tested your method, but it did not freeze my model's batch norm layers.
for layer in model.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False
The code below worked nicely for me. In my case the model is a ResNetV2 and the batch norm layers are named with the suffix "preact_bn". By using the code above to print the layers you can see how the batch norm layers are named, and configure them as you want.
for layer in new_model.layers[:]:
    if ('preact_bn' in layer.name):
        trainable = False
    else:
        trainable = True
    layer.trainable = trainable
Just to add to @luciano-dourado's answer:
In my case, I started by following the Transfer Learning guide as is, that is, freezing BN layers throughout the entire training (classifier + fine-tuning).
What I saw is that training the classifier worked without problems but as soon as I started fine-tuning, the loss went to NaN after a few batches.
After running the usual checks (input data without NaNs, loss functions yielding correct values, etc.), I checked whether the BN layers were running in inference mode (trainable = False).
But in my case, the dataset was so different from ImageNet that I needed to do the contrary and set all trainable BN attributes to True. I found this empirically, just as @zwang commented. Just remember to freeze them after training, before you deploy the model for inference.
By the way, just as an informative note, ResNet50V2, for example, has a total of 49 BN layers, of which only 16 are pre-activation BNs. This means the remaining 33 layers were updating their mean and variance values.
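A quick way to verify those counts (a sketch; it simply instantiates a stock Keras ResNet50V2 and counts BatchNormalization layers by name):
from tensorflow import keras

base = keras.applications.ResNet50V2(include_top=False, weights=None)
bn_layers = [l for l in base.layers if isinstance(l, keras.layers.BatchNormalization)]
preact = [l for l in bn_layers if 'preact_bn' in l.name]
print(len(bn_layers), "BN layers in total,", len(preact), "of them pre-activation")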
Yet another case where one has to run several empirical tests to find out why the "standard" approach does not work in his/her case. I guess this further reinforces the importance of data in Deep Learning :)

CNN + LSTM model for images performs poorly on validation data set

My training and loss curves look like the ones below and yes, similar graphs have received comments like "classic overfitting", and I get it.
My model looks like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, LSTM, Dense

input_shape_0 = keras.Input(shape=(3, 100, 100, 1), name="img3")

model = tf.keras.layers.TimeDistributed(Conv2D(8, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(16, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(Flatten())(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.4))(model)
model = LSTM(16, kernel_regularizer=tf.keras.regularizers.l2(0.007))(model)
# model = Dense(100, activation="relu")(model)
# model = Dense(200, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.001))(model)
model = Dense(60, activation="relu")(model)
# model = Flatten()(model)
model = Dropout(0.15)(model)
out = Dense(30, activation='softmax')(model)
model = keras.Model(inputs=input_shape_0, outputs=out, name="mergedModel")

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr

opt = tf.keras.optimizers.RMSprop()
lr_metric = get_lr_metric(opt)

# merged.compile(loss='sparse_categorical_crossentropy',
#                optimizer='adam', metrics=['accuracy'])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt, metrics=['accuracy', lr_metric])
model.summary()
In the above model building code, please consider the commented lines as some of the approaches I have tried so far.
I have followed the suggestions given as answers and comments to this kind of question and none seems to be working for me. Maybe I am missing something really important?
Things that I have tried:
Dropouts at different places and in different amounts.
Played with the inclusion and exclusion of dense layers and their number of units.
Tried different numbers of units on the LSTM layer (started from as low as 1; at 16 I now have the best performance).
Came across weight regularization techniques and tried to implement them as shown in the code above, placing them at different layers (I need to know the proper technique for deciding where to apply it, instead of the simple trial and error I have been doing, which seems wrong).
Implemented a learning rate scheduler which reduces the learning rate as the epochs progress, after a certain number of epochs.
Tried two LSTM layers with the first one having return_sequences = True.
After all these, I still cannot overcome the overfitting problem.
My data set is properly shuffled and divided in a train/val ratio of 80/20.
Data augmentation is one more thing that I found commonly suggested and have yet to try, but I want to see whether I am making some mistake so far which I can correct, and avoid diving into data augmentation steps for now. My data set has the sizes below:
Training images: 6780
Validation images: 1484
The numbers shown are samples, and each sample has 3 images. So basically I input 3 images at once as one sample to my time-distributed CNN, which is then followed by the other layers as shown in the model description. That means my training images number 6780 * 3 and my validation images 1484 * 3. Each image is 100 * 100 with a single channel.
I am using RMSprop as the optimizer, which performed better than Adam in my testing.
UPDATE
I tried some different architectures and some regularizations and dropouts at different places, and I am now able to achieve a val_acc of 59%. Below is the new model.
# kernel_regularizer=tf.keras.regularizers.l2(0.004)
# kernel_constraint=max_norm(3)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(64, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(128, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(GlobalAveragePooling2D())(model)
model = LSTM(128, return_sequences=True,kernel_regularizer=tf.keras.regularizers.l2(0.040))(model)
model = Dropout(0.60)(model)
model = LSTM(128, return_sequences=False)(model)
model = Dropout(0.50)(model)
out = Dense(30, activation='softmax')(model)
Try to perform Data Augmentation as a preprocessing step. Lack of data samples can lead to such curves. You can also try using k-fold Cross Validation.
There are many ways to prevent overfitting, according to the papers below:
Dropout layers (disabling neurons randomly): https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Input noise (e.g. random Gaussian noise on the images): https://arxiv.org/pdf/2010.07532.pdf
Random data augmentations (e.g. rotating, shifting, scaling, etc.; see the sketch after this list): https://arxiv.org/pdf/1906.11052.pdf
Adjusting the number of layers & units: https://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Regularization functions (e.g. L1, L2, etc.): https://www.researchgate.net/publication/329150256_A_Comparison_of_Regularization_Techniques_in_Deep_Neural_Networks
Early stopping: if you notice that for N successive epochs your model's training loss is decreasing but the model performs poorly on the validation data set, then it is a good sign to stop the training.
Shuffling the training data or K-fold cross validation is also a common way of dealing with overfitting.
I found this great repository, which contains examples of how to implement data augmentations:
https://github.com/kochlisGit/random-data-augmentations
Also, this repository here seems to have examples of CNNs that implement most of the above methods:
https://github.com/kochlisGit/Tensorflow-State-of-the-Art-Neural-Networks
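A minimal sketch of the random-augmentation idea with Keras preprocessing layers (illustrative only; the flip/rotation/shift factors are assumed starting values, and this assumes a recent TensorFlow where these layers live under keras.layers):
from tensorflow import keras
from tensorflow.keras import layers

# Random augmentations are active only during training, so each epoch sees
# slightly different versions of the same images.
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomTranslation(0.1, 0.1),
    layers.RandomZoom(0.1),
], name="augmentation")

inputs = keras.Input(shape=(100, 100, 1))
x = augment(inputs)
# ... rest of the model as before
For the 3-frame samples in the question, one could either augment each frame before batching or wrap the block in TimeDistributed.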
The goal should be to get the model to predict correctly irrespective of the order in which the 3 images in the sample are arranged.
If the order of the images in each sample is not important for training, I think your model does the inverse: the TimeDistributed layers followed by an LSTM take the order of the three images into account. As a solution, primarily, you can add images by reordering the images of each sample (i.e. augmented data). Secondly, try to treat the three images as one three-channel image and remove the TimeDistributed layers, as sketched below (I'm not sure the three-channel approach is more efficient, but you can give it a try).
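A rough sketch of that second suggestion (assumptions: the (3, 100, 100, 1) samples are simply reshaped so the three frames become channels, and the convolution stack only roughly mirrors the one in the question):
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(3, 100, 100, 1), name="img3")
# Move the 3 frames into the channel dimension: (3, 100, 100, 1) -> (100, 100, 3)
x = layers.Reshape((3, 100, 100))(inputs)
x = layers.Permute((2, 3, 1))(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(30, activation="softmax")(x)
model = keras.Model(inputs, outputs)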

Is my training data set too complex for my neural network?

I am new to machine learning and Stack Overflow, and I am trying to interpret two graphs from my regression model.
Training error and Validation error from my machine learning model
My case is similar to this one: Very large loss values when training multiple regression model in Keras, but my MSE and RMSE are very high.
Is my model underfitting? If yes, what can I do to solve this problem?
Here is the neural network I used for solving a regression problem:
def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
And my data set:
I have 500 samples, 10 features and 1 target.
Quite the opposite: it looks like your model is over-fitting. When you have low error rates on your training set, it means that your model has learned from the data well and can infer results accurately. If your validation error is high afterwards, however, that means the information learned from your training data is not being applied successfully to new data. This is because your model has 'fit' your training data too closely, and has only learned how to predict well when based on that data.
To solve this, we can introduce common solutions to reduce over-fitting. A very common technique is to use Dropout layers. This will randomly remove some of the nodes so that the model cannot correlate with them too heavily, therefore reducing dependency on those nodes and 'learning' more using the other nodes too. I've included an example that you can test below; try playing with the value and other techniques to see what works best. And as a side note: are you sure you need that many nodes within your dense layers? It seems like quite a lot for your data set, and that may be contributing to the over-fitting as well.
def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
        layers.Dropout(0.2),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
Well, I think your model is overfitting.
There are several ways that can help you:
1- Reduce the network's capacity, which you can do by removing layers or reducing the number of elements in the hidden layers
2- Dropout layers, which will randomly remove certain features by setting them to zero
3- Regularization
If I want to give a brief explanation of these:
- Reduce the network's capacity:
Some models have a large number of trainable parameters. The higher this number, the more easily the model can memorize the target class for each training sample. Obviously, this is not ideal for generalizing to new data. By lowering the capacity of the network, it is going to learn the patterns that matter or that minimize the loss. But remember, reducing the network's capacity too much will lead to underfitting.
- Regularization:
This page can help you a lot (a brief sketch also follows at the end of this answer):
https://towardsdatascience.com/handling-overfitting-in-deep-learning-models-c760ee047c6e
- Dropout layer:
You can use a layer like this:
model.add(layers.Dropout(0.5))
This is a dropout layer with a 50% chance of setting inputs to zero.
For more details you can see this page:
https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/
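As a brief sketch of the regularization point above (illustrative only; it reuses keras, layers, tf and train_dataset from the question's code, and the 0.001 L2 factor is just an assumed starting value):
from tensorflow.keras import regularizers

def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation=tf.nn.relu,
                     kernel_regularizer=regularizers.l2(0.001),
                     input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation=tf.nn.relu,
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(1)
    ])
    model.compile(loss='mean_squared_error',
                  optimizer=tf.keras.optimizers.RMSprop(0.001),
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model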
As mentioned in the existing answer by @omoshiroiii, your model in fact seems to be overfitting; that's why the RMSE and MSE are so high.
Your model has learned the detail and noise in the training data to the extent that it now negatively impacts its performance on new data.
The solution is therefore to randomly remove some of the nodes so that the model cannot correlate with them too heavily.

DeepLearning Anomaly Detection for images

I am still relatively new to the world of deep learning. I wanted to create a deep learning model (preferably using TensorFlow/Keras) for image anomaly detection. By anomaly detection I mean, essentially, something like a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the images. I was wondering whether there is an example of how to do this with deep learning. I looked around but couldn't find a single code example that handles this case.
The way of doing this in Keras is with the KerasRegressor wrapper module (which wraps scikit-learn's regressor interface). Useful information can also be found in the source code of that module. Basically, you first have to define your network model, for example:
from keras.layers import Input, Dense
from keras.models import Model

def simple_model():
    # Input layer
    data_in = Input(shape=(13,))
    # First layer, fully connected, ReLU activation
    layer_1 = Dense(13, activation='relu', kernel_initializer='normal')(data_in)
    # Second layer... etc.
    layer_2 = Dense(6, activation='relu', kernel_initializer='normal')(layer_1)
    # Output, single node without activation
    data_out = Dense(1, kernel_initializer='normal')(layer_2)
    # Build and compile the model
    model = Model(inputs=data_in, outputs=data_out)
    # You may choose any loss or optimizer function; be careful which you choose
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor

# Choose your epochs and batches
regressor = KerasRegressor(build_fn=simple_model, nb_epoch=100, batch_size=64)

# Fit with your data
regressor.fit(data, labels, epochs=100)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, as you need to distinguish anomalous images from the ones that are OK, one approach you can take is to train your regressor by passing anomalous images labelled 1 and images that are OK labelled 0.
This will make your model return a value closer to 1 when the input is an anomalous image, enabling you to threshold for the desired results. You can think of this output as its R^2 coefficient with respect to the "anomalous model" you trained, where 1 is a perfect match.
Also, as you mentioned, autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras blog post Building Autoencoders in Keras, where they explain in detail how to implement them with the Keras library.
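For the autoencoder route, a minimal sketch of the usual pattern (train on normal images only, then flag inputs whose reconstruction error exceeds a threshold; the architecture, image size and threshold below are assumptions rather than tuned values):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A small convolutional autoencoder, trained only on "normal" images.
inputs = keras.Input(shape=(64, 64, 1))
x = layers.Conv2D(16, 3, strides=2, activation="relu", padding="same")(inputs)
x = layers.Conv2D(8, 3, strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(8, 3, strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(x)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# autoencoder.fit(normal_images, normal_images, epochs=50, batch_size=64)

# At inference time, a high reconstruction error suggests an anomaly.
def is_anomaly(images, threshold=0.01):
    recon = autoencoder.predict(images)
    errors = np.mean(np.square(images - recon), axis=(1, 2, 3))
    return errors > threshold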
It is worth noting that single-class classification is another way of saying regression.
Classification tries to find a probability distribution among the N possible classes, and you usually pick the most probable class as the output (that is why most Classification Networks use Sigmoid activation on their output labels, as it has range [0, 1]). Its output is discrete/categorical.
Similarly, Regression tries to find the best model that represents your data, by minimizing the error or some other metric (like the well-known R^2 metric, or Coefficient of Determination). Its output is a real number/continuous (and the reason why most Regression Networks don't use activations on their outputs). I hope this helps, good luck with your coding.