Accuracy of solutions of differential equations with DeepXDE - tensorflow

We used DeepXDE to solve differential equations. (DeepXDE is a framework for solving differential equations, built on TensorFlow.) It works fine, but the accuracy of the solution is limited, and tuning the hyperparameters did not help. Is this limitation a well-known problem? How can the accuracy of the solutions be increased? We used the Adam optimizer; are there optimizers better suited to numerical problems where high precision is needed?
(I think the problem is not specific to any particular equation, but I can add an example if needed.)

There are actually some methods that could increase the accuracy of the model:
Random Resampling
Residual Adaptive Refinement (RAR): https://arxiv.org/pdf/1907.04502.pdf
The DeepXDE GitHub repository even includes an implemented example:
https://github.com/lululxvi/deepxde/blob/master/examples/Burgers_RAR.py
Also, you could try using a different architecture, such as Multi-Scale Fourier NNs. They seem to outperform PINNs in cases where the solution contains lots of "spikes".
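The idea behind residual-based adaptive refinement can be sketched without DeepXDE: evaluate the equation residual on a pool of random candidate points and add the worst ones to the training set. The following is only a minimal NumPy sketch of that selection step, not the DeepXDE implementation; `rar_select`, `residual_fn` and `toy_residual` are hypothetical names, and the toy residual stands in for the residual of a trained network.

```python
import numpy as np

def rar_select(residual_fn, candidates, k):
    """Return the k candidate points with the largest absolute residual."""
    residuals = np.abs(residual_fn(candidates))
    worst = np.argsort(residuals)[-k:]  # indices of the k largest residuals
    return candidates[worst]

# Hypothetical stand-in for a trained model's residual: sharply peaked near x = 0.5.
def toy_residual(x):
    return np.exp(-200.0 * (x - 0.5) ** 2)

rng = np.random.default_rng(0)
train_points = rng.uniform(0.0, 1.0, size=20)   # current collocation points
candidates = rng.uniform(0.0, 1.0, size=1000)   # random pool to search

new_points = rar_select(toy_residual, candidates, k=10)
train_points = np.concatenate([train_points, new_points])
```

In an actual training loop this step would alternate with further training, so that new points keep landing where the current residual is largest.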

Related

Object Detection model stuck at low mAP

I am trying to reproduce the results of the SSDLite model reported in the MobileNetV2 paper (arXiv:1801.04381), which should achieve about 22.1% mAP on the COCO detection challenge. However, I am stuck at 9% mAP. This is strange behavior because the model does work somewhat, but is still far off from the reported result. Can this much of a gap be caused by hyperparameters/optimizer choices (I am using Adam instead of SGD), or is it almost certain that there is a bug in my implementation?
It is also worth mentioning that the model successfully overfits a small subset of the training set, but on the whole training set the loss seems to reach a plateau fairly quickly.
Has anyone encountered a problem similar to this?
"Can this much of a gap be caused by hyperparameters/optimizer choices (I am using Adam instead of SGD), or is it almost certain that there is a bug in my implementation?"
Even small changes in the hyperparameters or a different choice of optimizer can have a large impact on training and on the resulting precision of the model. So your low mAP is not necessarily due to a bug; it could also be due to poorly chosen hyperparameters.
"It is also worth mentioning that the model successfully overfits a small subset of the training set, but on the whole training set the loss seems to reach a plateau fairly quickly."
It seems you have run into a local optimum that only works for a subset of your data, which could also point to suboptimal hyperparameters.
As @Matias Valdenegro also mentioned, to reproduce the exact result you might have to use the same parameters as the original implementation.

Is it possible to train Neural Network with low amount of instances?

I ran into a problem when I needed to solve a regression task using as few instances as possible. With XGBoost I had to feed in 4 instances to get a reasonable result. But a multilayer perceptron tuned for regression needs 20 instances; I tried changing the number of neurons and layers, but the answer is still 20. Is it possible to make a neural network solve regression tasks with 2 to 4 instances? If so, please explain what I should do to achieve that. Maybe there is some correlation between how many instances a perceptron needs to reach reasonable results and how informative the features in the dataset are?
Thanks in advance for any help.
With small numbers of samples, there are likely better methods to apply; XGBoost definitely comes to mind as a method that does quite well at avoiding overfitting.
Neural networks tend to work well with larger numbers of samples. They often overfit small datasets and underperform other algorithms.
There is, however, an active area of research in semi-supervised techniques using neural networks with large datasets of unlabeled data and small datasets of labeled samples.
Here's a paper to start you down that path; search for 'semi-supervised learning'.
http://vdel.me.cmu.edu/publications/2011cgev/paper.pdf
Another area of interest for reducing overfitting on smaller datasets is multi-task learning.
http://ruder.io/multi-task/
Multi-task learning requires the network to achieve multiple target goals for a given input. Adding more requirements tends to reduce the space of solutions that the network can converge on and often achieves better results because of it. To say that differently: when multiple objectives are defined, the parameters necessary to do well at one task are often beneficial for the other task and vice versa.
Lastly, another area of open research is GANs and how they might be used in semi-supervised learning. No papers pop to the forefront of my mind on the subject just now, so I'll leave this mention as a footnote.

Tensorflow: how to find good neural network architectures/hyperparameters?

I've been using tensorflow on and off for various things that I guess are considered rather easy these days. Captcha cracking, basic OCR, things I remember from my AI education at university. They are problems that are reasonably large and therefore don't really lend themselves to experimenting efficiently with different NN architectures.
As you probably know, Joel Grus came out with FizzBuzz in TensorFlow. TL;DR: learning a mapping from a binary representation of a number (i.e. 12 bits encoding the number) to 4 outputs (none_of_the_others, divisible by 3, divisible by 5, divisible by 15). For this toy problem, you can quickly compare different networks.
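For readers who have not seen the exercise, the encoding described above can be sketched as follows. This is a minimal NumPy sketch; the function names and the choice to train on 101 and above (leaving 1 to 100 for testing) are assumptions for illustration, not something stated in this question.

```python
import numpy as np

NUM_BITS = 12  # input width mentioned in the description above

def binary_encode(n, num_bits=NUM_BITS):
    """Little-endian bit vector of n, e.g. 5 -> [1, 0, 1, 0, ...]."""
    return np.array([(n >> b) & 1 for b in range(num_bits)], dtype=np.float32)

def fizz_buzz_encode(n):
    """One-hot label: [none of the others, divisible by 3, by 5, by 15]."""
    if n % 15 == 0:
        return np.array([0, 0, 0, 1], dtype=np.float32)
    if n % 5 == 0:
        return np.array([0, 0, 1, 0], dtype=np.float32)
    if n % 3 == 0:
        return np.array([0, 1, 0, 0], dtype=np.float32)
    return np.array([1, 0, 0, 0], dtype=np.float32)

# Assumed split: train on 101 .. 2**NUM_BITS - 1, keep 1 .. 100 for testing.
numbers = range(101, 2 ** NUM_BITS)
X = np.stack([binary_encode(n) for n in numbers])
Y = np.stack([fizz_buzz_encode(n) for n in numbers])
```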
So I've been trying a simple feedforward network and wrote a program to compare various architectures. Things like a 2-hidden-layer feedforward network, then 3 layers, different activation functions, ... Most architectures, well, suck. They get somewhere near a 50-60% success rate and remain there, independent of how much training you do.
A few perform really well. For instance, a sigmoid-activated double hidden layer with 23 neurons each works really well (89-90% correct after 2000 training epochs). Unfortunately anything close to it is rather disastrously bad. Take one neuron out of either hidden layer and it drops to 30% correct. A single hidden layer of 20 tanh-activated neurons does pretty well too. But most have a little over half this performance.
Now given that for real problems I can't realistically do these sorts of studies of different architectures, are there ways to get good architectures guaranteed to work?
You might find the paper by Yoshua Bengio on Practical Recommendations for Gradient-Based Training of Deep Architectures helpful to learn more about hyperparameters and their settings.
If you're asking specifically for settings that have more guaranteed success, I advise you to read up on Batch Normalization. I find that it decreases the failure rate for bad picks of the learning rate and weight initialization.
Some people also discourage the use of non-linearities like sigmoid() and tanh() as they suffer from the vanishing gradient problem.
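To see why sigmoid is singled out, note that its derivative never exceeds 0.25, so the activation-derivative factor picked up through a stack of sigmoid layers shrinks geometrically. The snippet below is a rough numeric illustration that ignores the weight matrices; the depth of 10 is an arbitrary choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-10.0, 10.0, 2001)
max_grad = sigmoid_grad(z).max()   # largest slope, reached at z = 0

depth = 10                         # arbitrary number of stacked sigmoid layers
upper_bound = max_grad ** depth    # best-case product of activation derivatives
print(max_grad, upper_bound)
```

With tanh the maximum slope is 1 rather than 0.25, so the effect is milder, but the slope still decays towards zero once units saturate.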

Neural network weights explode in linear unit

I am currently implementing a simple neural network and the backprop algorithm in Python with numpy. I have already tested my backprop method using central differences, and the resulting gradient matches.
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with tanh activation functions and an output layer with a linear activation function. Each unit also has a bias input. The training is done by simple gradient descent with a learning rate of 0.2.
The problem arises from the gradient, which gets larger with every epoch, but I don't know why. Furthermore, the problem persists if I decrease the learning rate.
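For reference, a central-difference gradient check of the kind mentioned above could look like this. It is a minimal sketch on a single tanh unit with squared-error loss, not the code from the question; all names and values are made up for illustration.

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference estimate of df/dw for a scalar function f."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        grad.flat[i] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return grad

# Toy model: one tanh unit, squared-error loss, hand-derived gradient.
x = np.array([0.5, -1.0, 2.0])
target = 0.3

def loss(w):
    return 0.5 * (np.tanh(w @ x) - target) ** 2

def analytic_grad(w):
    out = np.tanh(w @ x)
    return (out - target) * (1.0 - out ** 2) * x

w = np.array([0.1, 0.2, -0.3])
rel_error = (np.linalg.norm(numerical_grad(loss, w) - analytic_grad(w))
             / np.linalg.norm(analytic_grad(w)))
print(rel_error)
```

A small relative error only shows that the analytic gradient matches the loss; it says nothing about whether the step size makes the updates stable.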
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
There are two things you can try, maybe in combination:
Use a smaller learning rate. If it is too high, you may be overshooting the minimum in the current direction by a lot, and so your weights will keep getting larger.
Use smaller initial weights. This is related to the first item. A smaller learning rate would fix this as well.
I had a similar problem (with a different library, DL4J), even in the case of extremely simple target functions. In my case, the issue turned out to be the cost function. When I changed from negative log likelihood to Poisson or L2, I started to get decent results. (And my results got MUCH better once I added exponential learning rate decay.)
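Exponential learning rate decay, as mentioned above, simply multiplies the rate by a constant factor at a fixed interval. A minimal sketch follows; the starting rate of 0.2 is taken from the question, while the decay factor and interval are assumed values, not the settings used in DL4J.

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Halve the rate every 100 updates (assumed values).
schedule = [exponential_decay(0.2, 0.5, step, 100) for step in (0, 100, 200)]
print(schedule)
```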
It looks like you don't use regularization. If you train your network long enough, it will start to learn the exact data rather than the abstract pattern.
There are a couple of methods to regularize your network, such as early stopping, putting a high cost on large gradients, or more complex ones such as dropout. If you search the web or books you will probably find many options for this.
A learning rate that is too large can fail to converge, and can even DIVERGE; that is the point.
The gradient could diverge for this reason: when a step overshoots the minimum, the resulting point could be not just a bit past it, but even farther from it than the starting point, on the other side. Repeat the process, and it will continue to diverge. In other words, the rate of variation around the optimum could simply be too large compared to the learning rate.
Source: my understanding of the following video (watch near 7:30).
https://www.youtube.com/watch?v=Fn8qXpIcdnI&list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4&index=10
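The overshooting argument can be checked on a one-dimensional quadratic, where each gradient step multiplies the current point by (1 - lr * curvature). The following is a toy sketch with made-up values, not the network from the question.

```python
def gradient_descent(lr, curvature=10.0, x0=1.0, steps=20):
    """Minimize f(x) = 0.5 * curvature * x**2 and return the visited points."""
    xs = [x0]
    for _ in range(steps):
        grad = curvature * xs[-1]
        xs.append(xs[-1] - lr * grad)
    return xs

converging = gradient_descent(lr=0.05)  # factor per step: 1 - 0.05 * 10 = 0.5
diverging = gradient_descent(lr=0.25)   # factor per step: 1 - 0.25 * 10 = -1.5
print(converging[-1], diverging[-1])
```

For this function the iteration converges only when lr < 2 / curvature; above that threshold every step lands on the other side of the minimum and farther away than before.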

Training complexity of Linear SVM

What is the actual computational complexity of the learning phase of an SVM (say, as implemented in LIBSVM)?
Thank you
Training complexity of nonlinear SVM is generally between O(n^2) and O(n^3), where n is the number of training instances. The following papers are good references:
Support Vector Machine Solvers by Bottou and Lin
SVM-optimization and steepest-descent line search by List and Simon
PS: If you want to use linear kernel, do not use LIBSVM. LIBSVM is a general purpose (nonlinear) SVM solver. It is not an ideal implementation for linear SVM. Instead, you should consider things like LIBLINEAR (by the same authors as LIBSVM), Pegasos or SVM^perf. These have much better training complexity for linear SVM. Training speed can be orders of magnitude better than using LIBSVM.
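To give a feel for why dedicated linear solvers scale better, here is a minimal NumPy sketch of a Pegasos-style stochastic subgradient update for a linear SVM. It is a simplified illustration (no bias term, no projection step) on made-up data, not the reference implementation; each iteration touches a single example, so its cost depends on the number of features rather than on n.

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, n_iters=2000, seed=0):
    """Linear SVM (no bias term) trained by stochastic subgradient descent."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(X.shape[0])        # pick one random training example
        eta = 1.0 / (lam * t)               # decreasing step size 1 / (lambda * t)
        violated = y[i] * (w @ X[i]) < 1.0  # is the hinge loss active at current w?
        w *= 1.0 - eta * lam                # shrinkage from the L2 regularizer
        if violated:
            w += eta * y[i] * X[i]
    return w

# Made-up, linearly separable toy data: two Gaussian blobs with labels +1 / -1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, size=(100, 2)),
               rng.normal(-2.0, 0.5, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w = pegasos_train(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)
```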
This is going to be heavily dependent on the SVM type and kernel. There is a rather technical discussion in http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
For a quick answer, the same paper says to expect it to be about n^2.