Predicting new values in logistic regression - tensorflow

I am building a logistic regression model in TensorFlow to approximate a function.
When I randomly select training and testing data from the complete dataset, I get a good result like so (blue are training points; red are testing points; the black line is the predicted curve):
But when I select spatially separate testing data, I get a terrible predicted curve, like so:
I understand why this is happening. But shouldn't a machine learning model learn these patterns and predict new values?
A similar thing happens with a periodic function too:
Am I missing something trivial here?
P.S. I did Google this query for quite some time but was not able to find a good answer.
Thanks in advance.

What you are trying to do here is not related to logistic regression. Logistic regression is a classifier, and what you are doing is regression.
No, machine learning systems aren't smart enough to learn to extrapolate functions like you have here. When you fit the model, you are telling it to find an explanation for the training data. It doesn't care what the model does outside the range of that data. If you want it to be able to extrapolate, then you need to give it extra information: you could set it up to assume that the input belongs to a sine wave or a quadratic polynomial and have it find the best-fitting one. But with no assumptions about the form of the function, you won't be able to extrapolate.
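To make that concrete, here is a minimal sketch of the "assume a sine wave and fit it" idea, using scipy.optimize.curve_fit on made-up data rather than TensorFlow. Once the functional form is fixed, extrapolating into the unseen region becomes possible:

    import numpy as np
    from scipy.optimize import curve_fit

    # The assumed functional form is the extra information that enables extrapolation.
    def sine_model(x, amplitude, frequency, phase, offset):
        return amplitude * np.sin(frequency * x + phase) + offset

    x_train = np.linspace(0, 10, 200)                  # training region only
    y_train = 2.0 * np.sin(1.5 * x_train + 0.3) + np.random.normal(0, 0.1, x_train.shape)

    # Fitting a sine frequency is nonconvex, so a reasonable initial guess matters.
    params, _ = curve_fit(sine_model, x_train, y_train, p0=[1.0, 1.0, 0.0, 0.0])

    x_test = np.linspace(10, 20, 200)                  # spatially separate region
    y_pred = sine_model(x_test, *params)               # now extrapolates sensibly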

Related

Is this overfitting

I'm running a machine learning algorithm to answer True/False questions.
Assume I use a classification algorithm.
After running it on 1,200 data points, I got 30% accuracy.
But then I made a second algorithm that always negates the first algorithm's answer, so its accuracy is 70%.
Is this overfitting for the second algorithm? Assume my first algorithm consistently predicts with 30% accuracy.
To your questions:
This kind of depends on the machine learning model you choose and on the training set. In your case, the first algorithm is consistently wrong on a binary task, so negating its answers is right 70% of the time by construction; the second "algorithm" is simply exploiting how reliably wrong the first one is, which is not what overfitting means (overfitting is a gap between training and test performance). Even though the negation trick may hold up at first, negating an ML answer is a bad idea. The better idea is to prepare your data correctly and train on a dataset that is the best fit for your model.
Most machine learning models make mistakes; it is bound to happen. But the training set and all that data help you choose the right model, and data preparation is key to building your training set correctly. I know I'm bouncing all over the place; I apologize for that.
For instance, we might have a logistic regression model and we want to identify the individuals who have a certain condition versus those who don't. The first thing we do is properly prepare our data and then train the model (this is the short version). My point is that training a model properly is very important: it is what allows your ML model to make accurate predictions.
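To spell out the arithmetic behind the question (with made-up data, not the actual algorithms): for binary labels, negating a classifier's predictions always flips its accuracy to 1 - accuracy.

    import numpy as np

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1200)         # hypothetical ground truth

    # Hypothetical first algorithm: correct on roughly 30% of examples.
    correct = rng.random(1200) < 0.3
    preds = np.where(correct, labels, 1 - labels)

    acc_first = (preds == labels).mean()           # ~0.30
    acc_negated = ((1 - preds) == labels).mean()   # ~0.70, exactly 1 - acc_first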
I should say I really enjoy machine learning and deep learning, but I am by no means an expert. I highly recommend this class though; it's how I started off understanding the fundamentals.
Coursera Andrew Ng course

Do I need every class in a training image for object detection?

I'm just trying to dive into TensorFlow's Object Detection. I have a very small training set of about 40 images so far. Each image can have up to 3 classes. But now a question came to mind: does every training image need every class? Is that important for efficient training? Or is it okay if an image has only one of the object classes?
I get a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples sounds very small for an image task like this. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore the absolute loss value; it doesn't mean anything by itself. Look at the curve to see that the loss is decreasing, and stop the training when the curve flattens out. Compare the loss value on a test dataset to check that the values are sufficiently similar (i.e. you are not overfitting). You might also compare it to another training run of the exact same system (to check that the training is stable, for example).
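In Keras-style TensorFlow code that monitoring can be automated with an early-stopping callback. Here is a minimal sketch on made-up stand-in data (plain classification, not the Object Detection API; all shapes and sizes here are hypothetical):

    import numpy as np
    import tensorflow as tf

    # Stand-in data: 40 images, 3 classes, purely for illustration.
    x = np.random.rand(40, 64, 64, 3).astype("float32")
    y = np.random.randint(0, 3, size=40)

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Stop once the validation loss flattens out; afterwards, compare the
    # final training and validation losses to check for overfitting.
    stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                            restore_best_weights=True)
    history = model.fit(x, y, validation_split=0.25, epochs=100,
                        callbacks=[stop], verbose=0)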

How to predict using Tensorflow?

This is a newbie question for the TensorFlow experts:
I am reading a lot of data from a power transformer connected to an array of solar panels using Arduinos. My question is: can I use TensorFlow to predict future power generation?
I am completely new to TensorFlow. If you can point me to something similar I can start with, or any GitHub repo which is doing similar predictive modeling, that would help.
Edit: Kyle pointed me to the MNIST data, which I believe is an image dataset. Again, I'm not sure if TensorFlow is the right computation library for this problem, or does it only work on image datasets?
Thanks, Rajesh
You can certainly use TensorFlow to solve your problem.
TensorFlow™ is an open source software library for numerical computation using data flow graphs.
So it works not only on image datasets but on others as well. Don't worry about this.
As for prediction: first you need to train a model (such as linear regression) on your dataset, then predict. The tutorial code can be found on the TensorFlow homepage.
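For instance, a minimal linear-regression sketch in TensorFlow/Keras, with made-up sensor readings standing in for your transformer data (the variable names and the linear relationship are hypothetical):

    import numpy as np
    import tensorflow as tf

    # Hypothetical readings: irradiance sensor value -> generated power.
    irradiance = np.random.rand(500, 1).astype("float32")
    power = (3.0 * irradiance + 0.5
             + np.random.normal(0, 0.1, irradiance.shape)).astype("float32")

    # A single Dense unit with no activation is exactly linear regression.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(irradiance, power, epochs=50, verbose=0)

    # Predict power for a new reading.
    print(model.predict(np.array([[0.8]], dtype="float32")))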
Get your hands dirty, and you will find it works on your dataset.
Good luck.
You can absolutely use TensorFlow to predict time series. There are plenty of examples out there, like this one. And this is a really interesting one on using an RNN to predict basketball trajectories.
In general, TF is a very flexible platform for solving problems with machine learning. You can create any kind of network you can think of in it, and train that network to act as a model for your process. Depending on what kind of costs you define and how you train it, you can build a network to classify data into categories, predict a time series forward a number of steps, and do other cool stuff.
There is, sadly, no short answer for how to do this, but that's just because the possibilities are endless! Have fun!
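To make the time-series idea concrete, here is a small sketch of a recurrent network (an LSTM) that predicts the next reading from a window of past readings; the data, window size, and layer sizes are all made up:

    import numpy as np
    import tensorflow as tf

    # Stand-in sensor log: a noiseless sine wave for illustration.
    series = np.sin(np.linspace(0, 100, 2000)).astype("float32")

    # Turn the series into (window of past readings) -> (next reading) pairs.
    window = 24
    x = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    x = x[..., np.newaxis]                 # shape: (samples, timesteps, features)

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(16, input_shape=(window, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=5, verbose=0)

    next_value = model.predict(x[-1:])     # one step ahead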

One class classification - interpreting the model's accuracy

I am using LIBSVM for classification of data. I am mainly doing one-class classification.
My training set consists of data from only one class, and my testing data consists of data from two classes (one which belongs to the target class and the other which doesn't).
After applying svmtrain and svmpredict on both the training and testing datasets, the accuracy is 48% on the training set and 34.72% on the testing set.
Is it good? How can I know whether LIBSVM is classifying the datasets correctly?
Whether it is good or not depends entirely on the data you are trying to classify. You should look up the state-of-the-art accuracy of SVM models for your kind of classification problem; then you will be able to tell whether your model is good or not.
What I can say from your results is that the testing accuracy is worse than the training accuracy, which is normal, as a classifier usually performs better on data it has already seen.
What you can try now is to play with the regularization parameter (C if you are using a linear kernel) and see if the performance improves on the testing set.
You can also plot learning curves to see whether your classifier overfits, which will help you decide whether to increase or decrease the regularization.
For your case, you might also want to apply class weighting, as the data is often skewed in favor of negative examples.
To know whether LIBSVM is classifying the dataset correctly, look at which examples it predicted correctly and which ones it predicted incorrectly. Then you can try to change your features to improve the results.
If you are worried about your code being correct, you can code up a toy example and play with it, or take an example from someone on the web and replicate their results.
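A toy example along those lines, using scikit-learn's OneClassSVM (which wraps LIBSVM) on made-up 2-D data; the nu and gamma values are arbitrary and would need tuning on real data:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    train = rng.normal(0, 1, size=(200, 2))       # target class only
    test_in = rng.normal(0, 1, size=(50, 2))      # target class
    test_out = rng.normal(4, 1, size=(50, 2))     # non-target class

    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(train)

    # OneClassSVM predicts +1 for the target class and -1 for outliers.
    print((clf.predict(test_in) == 1).mean())     # fraction of targets accepted
    print((clf.predict(test_out) == -1).mean())   # fraction of outliers rejected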

Neural network weights explode in linear unit

I am currently implementing a simple neural network and the backprop algorithm in Python with numpy. I have already tested my backprop method against central differences, and the resulting gradients agree.
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with tanh activation functions and an output layer with a linear activation function. Each unit also has a bias input. The training is done by simple gradient descent with a learning rate of 0.2.
The problem is that the gradient gets larger with every epoch, but I don't know why. Furthermore, the problem is unchanged if I decrease the learning rate.
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
There are two things you can try, maybe in combination:
Use a smaller learning rate. If it is too high, you may be overshooting the minimum in the current direction by a lot, and so your weights will keep getting larger.
Use smaller initial weights. This is related to the first item. A smaller learning rate would fix this as well.
I had a similar problem (with a different library, DL4J), even in the case of extremely simple target functions. In my case, the issue turned out to be the cost function. When I changed from negative log likelihood to Poisson or L2, I started to get decent results. (And my results got MUCH better once I added exponential learning rate decay.)
Looks like you don't use regularization. If you train your network long enough, it will start to learn the exact data rather than the abstract pattern.
There are a couple of methods to regularize your network, such as early stopping, putting a high cost on large weights, or more complex ones like dropout. If you search the web or books you will probably find many options for this.
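As a sketch of the "high cost on large weights" option: L2 regularization just adds a penalty term to each gradient step. In plain numpy (the weight_decay value is arbitrary here):

    import numpy as np

    def sgd_step(weights, grad, lr=0.2, weight_decay=1e-3):
        # weight_decay * weights is the gradient of the L2 penalty
        # 0.5 * weight_decay * ||weights||^2; it shrinks weights toward zero.
        return weights - lr * (grad + weight_decay * weights)

    w = np.ones(3)
    w = sgd_step(w, grad=np.zeros(3))   # even with zero data gradient, weights shrink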
A learning rate that is too big can fail to converge, and can even diverge; that is the point.
The gradient descent can diverge for this reason: when an update overshoots the position of the minimum, the resulting point may end up not just a bit past it, but at a greater distance from the minimum than the starting point, on the other side. Repeat the process and it will keep diverging. In other words, the function's rate of variation around the optimal position can be just too big compared to the learning rate.
Source: my understanding of the following video (watch near 7:30).
https://www.youtube.com/watch?v=Fn8qXpIcdnI&list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4&index=10
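You can watch this happen in a tiny made-up experiment: gradient descent on f(w) = w², where any learning rate above 1.0 diverges:

    def descend(lr, steps=10, w=1.0):
        for _ in range(steps):
            w -= lr * 2 * w        # the gradient of w**2 is 2*w
        return w

    print(descend(lr=0.1))   # shrinks toward 0: converges
    print(descend(lr=1.1))   # overshoots further each step: diverges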