TensorFlow loss converging but model fails to predict even on train data

I am using an ANN in TensorFlow to fit a simple known function, Y = sin(X) (or Y = cos(X)). My loss function converges properly.
(Loss function convergence graph.) If the loss function converges, it means the model has fit my training dataset well.
However, when I predict on the training set itself, the model fails to predict even the training data, which is strange.
In the prediction plot it can be seen that after roughly the 200th value the model shows no sign of fitting at all.
If the loss has converged, the model should fit the training data well, but that is not happening here. What is wrong with my code?
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

X = np.linspace(0, 10 * np.pi, 1000)
Y = np.sin(X)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(500, input_shape=(1,), activation='relu'))
model.add(tf.keras.layers.Dense(1))

opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X.reshape(-1, 1), Y, epochs=100)

plt.plot(r.history['loss'])

Yhat = model.predict(X.reshape(-1, 1)).flatten()
plt.plot(Y)
plt.plot(Yhat)

It is the nature of your data.
It reminded me of the old result showing that a single-layer perceptron cannot even compute XOR.
Anyway, the reason here is that your model is shallow, and shallow networks are much less efficient than deep ones. To put it in perspective, a model like the one below
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(20,input_shape=(1,),activation='relu'))
model.add(tf.keras.layers.Dense(20,activation='relu'))
model.add(tf.keras.layers.Dense(1))
will likely perform better, even though it has only about a third of the parameters of the original model. That is because the deeper you go, the more complex the representations the model can create. The core thing to remember is:
Deep learning models don't build non-linear decision boundaries directly: each individual unit is fundamentally designed to create a linear decision boundary. What the network does is stack those linear boundaries to build a representation of the data that is linearly separable.
Also, the most important thing is to know your data. In this case, probabilistic models would give almost perfect results; you can implement those easily with TensorFlow Probability.
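As a quick sketch (same data as in the question, just swapping in the deeper model; how well it fits all the periods will still depend on training):
import numpy as np
import tensorflow as tf

X = np.linspace(0, 10 * np.pi, 1000)
Y = np.sin(X)

# Two small hidden layers instead of one very wide one
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(20, input_shape=(1,), activation='relu'),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss='mse')
model.fit(X.reshape(-1, 1), Y, epochs=100, verbose=0)

Yhat = model.predict(X.reshape(-1, 1)).flatten()  # compare against Y as before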

Related

The importance of combining training and inference in Deep Learning

I thought that training and inference in Deep Learning were separate processes; however, I saw this snippet, executed the code, and found that my understanding might not be correct.
network.train()
batch_losses = []
for step, (imgs, label_imgs) in enumerate(train_loader):
    imgs = Variable(imgs).cuda()
    label_imgs = Variable(label_imgs.type(torch.LongTensor)).cuda()
    outputs = network(imgs)
    loss = loss_fn(outputs, label_imgs)
    loss_value = loss.data.cpu().numpy()
    batch_losses.append(loss_value)
I am pointing out outputs = network(imgs) to understand this procedure. Is this typically used for evaluating the model? I know evaluating the model is useful to see whether we are going in the right direction, but I don't understand why it is necessary here. If I want to speed up my training, could I remove it?
The code snippet you shared is actually just the training code for a deep learning model. Here, outputs = network(imgs) takes in imgs, i.e. the training data, and gives the predicted outputs after passing them through the network (the model you want to train on your data).
Then, based on those outputs, the loss is calculated in loss = loss_fn(outputs, label_imgs), taking the predicted labels (outputs) and the actual labels (label_imgs).
Note: at inference time you don't generally calculate the loss; you calculate the accuracy of the predicted labels. Loss is calculated only on the training data, to check whether training is proceeding properly and the loss is decreasing.
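For contrast, a minimal inference-only sketch (assuming the same network and a hypothetical test_loader) would skip the loss entirely:
import torch

network.eval()                     # switch layers like dropout/batch-norm to eval mode
predictions = []
with torch.no_grad():              # no gradients are needed at inference time
    for imgs, _ in test_loader:    # labels are ignored here
        imgs = imgs.cuda()
        outputs = network(imgs)    # forward pass only
        predictions.append(outputs.argmax(dim=1).cpu())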
If you still have some doubts, I suggest going through some basic deep learning tutorials to get better insight.

Is my training data set too complex for my neural network?

I am new to machine learning and Stack Overflow. I am trying to interpret two graphs from my regression model:
(Training error and validation error plots from my machine learning model.)
My case is similar to this question: Very large loss values when training multiple regression model in Keras, but my MSE and RMSE are very high.
Is my model underfitting? If yes, what can I do to solve this problem?
Here is the neural network I used for solving the regression problem:
def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
And my data set: I have 500 samples, 10 features, and 1 target.
Quite the opposite: it looks like your model is over-fitting. When you have low error rates on your training set, it means your model has learned the data well and can infer results on it accurately. If your validation error is high afterwards, however, the information learned from the training data is not being applied successfully to new data. This is because your model has 'fit' your training data too closely, and has only learned how to predict well when based on that data.
To solve this, we can apply common techniques for reducing over-fitting. A very common one is to use Dropout layers. This will randomly remove some of the nodes so that the model cannot correlate with them too heavily, thereby reducing dependency on those nodes and 'learning' more through the other nodes as well. I've included an example you can test below; try playing with the value and other techniques to see what works best. As a side note: are you sure you need that many nodes in your dense layer? That seems like quite a lot for your data set, and it may be contributing to the over-fitting as well.
def build_model():
    model = keras.Sequential([
        layers.Dense(128, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
        layers.Dropout(0.2),  # randomly zero 20% of the activations during training
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
Well, I think your model is overfitting.
There are several approaches that can help you:
1. Reduce the network's capacity, which you can do by removing layers or reducing the number of units in the hidden layers.
2. Dropout layers, which will randomly remove certain features by setting them to zero.
3. Regularization.
To give a brief explanation of these:
Reduce the network's capacity:
Some models have a large number of trainable parameters. The higher this number, the more easily the model can memorize the target for each training sample, which is obviously not ideal for generalizing to new data. By lowering the capacity of the network, it is forced to learn the patterns that matter, i.e. the ones that actually minimize the loss. But remember: reducing the network's capacity too much will lead to underfitting.
Regularization:
This page can help you a lot:
https://towardsdatascience.com/handling-overfitting-in-deep-learning-models-c760ee047c6e
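As a concrete illustration of weight regularization in Keras (a minimal sketch; the 0.001 factor is just a starting point), it is only an extra argument on the layer:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,),   # 10 features, as in your data set
                 kernel_regularizer=regularizers.l2(0.001)),  # penalize large weights
    layers.Dense(1)
])
model.compile(optimizer='rmsprop', loss='mean_squared_error')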
Dropout layers:
You can add a layer like this:
model.add(layers.Dropout(0.5))
This is a dropout layer with a 50% chance of setting each input to zero.
For more details, see this page:
https://machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-keras/
As mentioned in the existing answer by @omoshiroiii, your model in fact seems to be overfitting; that is why the RMSE and MSE are so high.
Your model learned the detail and noise in the training data to the extent that it now negatively impacts the model's performance on new data.
One solution is therefore to randomly remove some of the nodes (e.g. with dropout) so that the model cannot rely on them too heavily.

Approximating multidimensional functions with neural networks

Is it possible to fit or approximate multidimensional functions with neural networks?
Let's say I want to model the function f(x,y) = sin(x) + y from some given measurement data (f(x,y) is considered the ground truth and is not known). If possible, some code examples written in TensorFlow or Keras would also be great.
As said by @AndreHolzner, theoretically you can approximate any continuous function with a neural network as well as you want, on any compact subset of R^n, even with only one hidden layer.
However, in practice, the neural net may have to be very large for some functions, and may sometimes be untrainable (the optimal weights may be hard to find without getting stuck in a local minimum). So here are a few practical suggestions (unfortunately vague, because the details depend too much on your data and are hard to predict without multiple tries):
Don't make the network too big (it's hard to define what that means exactly, unfortunately), or you'll just overfit. You'll probably need a LOT of training samples.
A large number of reasonably-sized layers is usually better than a reasonable number of large layers.
If you have some priors about the function, use them: for instance, if you believe there is some kind of periodicity in f (like in your example, but it could be more complicated), you could apply the sin() function to some of the outputs of the first layer (not all, as that would give you a truly periodic output). If you suspect a polynomial of degree n, just augment your input x with x², ..., x^n and use a linear regression on that input, etc. It will be much easier than learning the weights.
The universal approximation theorem holds on any compact subset of R^n, not on the entire space. In particular, you'll never be able to predict values for inputs far outside the range of the training samples (say you trained on numbers from 0 to 100; don't test on 200, it will fail).
For an example of regression, you can look here for instance. To regress a more complicated function, you'd need to use more complicated transformations to get pred from x, for instance like this:
n_layers = 3
# Placeholder for the input; None allows a variable batch size
x = tf.placeholder(shape=[None, n_dimensions], dtype=tf.float32)
last_layer = x
# Add n_layers dense hidden layers
for i in range(n_layers):
    last_layer = tf.layers.dense(inputs=last_layer, units=128, activation=tf.nn.relu)
# Get the output prediction
pred = tf.layers.dense(inputs=last_layer, units=1, activation=None)
# Get the cost, training op, etc., just like in the linear regression example
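If you prefer to avoid the old placeholder/session API, here is a minimal Keras sketch of the same idea (the synthetic data generation below is just a stand-in for your real measurement data):
import numpy as np
import tensorflow as tf

# Synthetic samples of f(x, y) = sin(x) + y, standing in for real measurements
rng = np.random.default_rng(0)
xy = rng.uniform(-3.0, 3.0, size=(10000, 2)).astype(np.float32)
z = np.sin(xy[:, 0]) + xy[:, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(xy, z, epochs=20, batch_size=64, verbose=0)

print(model.predict(np.array([[1.0, 2.0]], dtype=np.float32)))  # should be close to sin(1) + 2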

Optimizer and Estimator in Neural Networks

When I started with neural networks, it seemed that I understood optimizers and estimators well.
Estimators: a Classifier classifies values based on the sample set, and a Regressor predicts values based on the sample set.
Optimizer: using different optimizers (Adam, GradientDescentOptimizer) to minimize the loss function, which can be complex.
I understand that every estimator comes with a default optimizer internally to minimize the loss.
Now my question is: how do they fit together to optimize the training of the model?
Short answer: the loss function links them together.
For example, if you are doing classification, your classifier takes an input and outputs a prediction. You can then calculate your loss from the predicted class and the ground-truth class. The task of your optimizer is to minimize that loss by modifying the parameters of your classifier.
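As a rough sketch of how that fits together (a Keras model standing in for the "estimator"; the names and sizes are only illustrative):
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])  # the "estimator"
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch)           # classifier output
        loss = loss_fn(y_batch, predictions)   # compare prediction with ground truth
    grads = tape.gradient(loss, model.trainable_variables)
    # The optimizer changes the classifier's parameters to reduce the loss
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss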

DeepLearning Anomaly Detection for images

I am still relatively new to the world of Deep Learning. I want to create a Deep Learning model (preferably using TensorFlow/Keras) for image anomaly detection. By anomaly detection I mean essentially what a OneClassSVM does.
I have already tried sklearn's OneClassSVM on HOG features from the images. I was wondering if there is an example of how to do this with deep learning. I looked around but couldn't find a single code example that handles this case.
One way of doing this in Keras is with the KerasRegressor wrapper (it wraps scikit-learn's regressor interface). Useful information can also be found in the source code of that module. Basically, you first have to define your network model, for example:
from keras.layers import Input, Dense
from keras.models import Model

def simple_model():
    # Input layer
    data_in = Input(shape=(13,))
    # First layer, fully connected, ReLU activation
    layer_1 = Dense(13, activation='relu', kernel_initializer='normal')(data_in)
    # Second layer, etc.
    layer_2 = Dense(6, activation='relu', kernel_initializer='normal')(layer_1)
    # Output, single node without activation
    data_out = Dense(1, kernel_initializer='normal')(layer_2)
    # Build and compile the model
    model = Model(inputs=data_in, outputs=data_out)
    # You may choose any loss or optimizer function; be careful which you choose
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Then, pass it to the KerasRegressor builder and fit it with your data:
from keras.wrappers.scikit_learn import KerasRegressor
# Choose your epochs and batch size
regressor = KerasRegressor(build_fn=simple_model, epochs=100, batch_size=64)
# Fit with your data
regressor.fit(data, labels)
With this, you can now make predictions or obtain the model's score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, since you need to separate anomalous images from ones that are OK, one approach you can take is to train the regressor by passing anomalous images labeled 1 and normal images labeled 0.
This will make your model return a value closer to 1 when the input is an anomalous image, so you can threshold the results as desired. You can think of this output as an R^2-like coefficient against the "anomalous" class you trained as 1 (a perfect match).
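For example (the 0.5 cut-off is only illustrative; pick a threshold on held-out data):
scores = regressor.predict(data_test)  # values close to 1 suggest "anomalous"
is_anomaly = scores > 0.5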
Also, as you mentioned, autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras blog post Building Autoencoders in Keras, which explains their implementation with the Keras library in detail.
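If you go the autoencoder route, a minimal sketch (assuming images flattened to 784-dimensional vectors and hypothetical arrays x_train_normal / x_test) is to train on normal images only and flag inputs whose reconstruction error is unusually high:
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))                        # flattened image
encoded = Dense(32, activation='relu')(inp)      # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal (non-anomalous) images only
autoencoder.fit(x_train_normal, x_train_normal, epochs=50, batch_size=256)

# Use the reconstruction error as an anomaly score
recon = autoencoder.predict(x_test)
errors = np.mean((x_test - recon) ** 2, axis=1)
is_anomaly = errors > np.percentile(errors, 95)  # illustrative threshold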
It is worth noting that single-class classification is another way of saying regression.
Classification tries to find a probability distribution over the N possible classes, and you usually pick the most probable class as the output (that is why most classification networks use a sigmoid or softmax activation on their outputs, as these map into the range [0, 1]). The output is discrete/categorical.
Regression, on the other hand, tries to find the best model that represents your data by minimizing the error or some other metric (like the well-known R^2 metric, or coefficient of determination). Its output is a real, continuous number (which is why most regression networks don't use an activation on their output). I hope this helps; good luck with your coding.