Optimisation problem with a stochastic integral

I am reading the following paper about forecasting interest rates: https://onlinelibrary.wiley.com/doi/full/10.1002/for.2783.
In Section 3.2.3 (the Hull-White model), it mentions that the parameters can be found by solving an optimization problem that minimizes the difference between the actual and model interest rates:
[equation image: optimisation problem]
The model interest rate is given by the solution of the Hull-White equation, which involves a stochastic integral:
[equation image: interest rate solution]
Is there a well-known method to deal with this kind of problem? Thank you!
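Not an authoritative answer, but a common practical route: since the stochastic integral in the Hull-White solution has zero expectation, the expected short rate is deterministic, so the drift parameters can be fitted by ordinary nonlinear least squares. Below is a minimal sketch assuming a constant theta (the paper fits a time-dependent theta(t) to the initial term structure, which this ignores); the data here are synthetic and all names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

# Simplified Hull-White: dr = (theta - a*r) dt + sigma dW.  The stochastic
# integral has zero mean, so the *expected* short rate is deterministic:
#   E[r(t)] = r0*exp(-a*t) + (theta/a)*(1 - exp(-a*t))
# We fit (a, theta) by least squares on observed rates.

def model_rate(params, t, r0):
    a, theta = params
    return r0 * np.exp(-a * t) + (theta / a) * (1.0 - np.exp(-a * t))

def residuals(params, t, r_obs, r0):
    return model_rate(params, t, r0) - r_obs

# synthetic "observed" rates generated from known parameters
t = np.linspace(0.1, 10.0, 50)
r0 = 0.02
true_params = (0.8, 0.03)
r_obs = model_rate(true_params, t, r0)

fit = least_squares(residuals, x0=[0.5, 0.02], args=(t, r_obs, r0),
                    bounds=([1e-6, -1.0], [10.0, 1.0]))
print(fit.x)  # recovers approximately (0.8, 0.03)
```

For the full model you would replace model_rate with the expression involving theta(t), and calibrate sigma separately (e.g. from derivative prices or the residual variance), since it does not appear in the mean.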

Related

Accuracy of solutions of differential equations with DeepXDE

We used DeepXDE for solving differential equations. (DeepXDE is a framework for solving differential equations, based on TensorFlow.) It works fine, but the accuracy of the solution is limited, and optimizing the meta-parameters did not help. Is this limitation a well-known problem? How can the accuracy of solutions be increased? We used the Adam optimizer; are there optimizers that are more suitable for numerical problems when high precision is needed?
(I think the problem is not specific for some concrete equation, but if needed I add an example.)
There are actually some methods that could increase the accuracy of the model:
Random Resampling
Residual-based Adaptive Refinement (RAR): https://arxiv.org/pdf/1907.04502.pdf
They even have an implemented example in their GitHub repository:
https://github.com/lululxvi/deepxde/blob/master/examples/Burgers_RAR.py
Also, you could try using a different architecture, such as multi-scale Fourier feature networks. They seem to outperform plain PINNs in cases where the solution contains a lot of "spikes".
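Another well-known trick for squeezing more precision out of a PINN is to refine with a quasi-Newton optimizer after Adam (DeepXDE supports compiling a second time with L-BFGS). Here is a framework-free sketch of why this helps, using plain gradient descent as a stand-in for Adam on an ill-conditioned quadratic; the setup is purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# First-order optimizers like Adam typically stall at moderate precision,
# while a quasi-Newton method such as L-BFGS can drive the loss much lower.
# A common PINN recipe is therefore: Adam first, then L-BFGS refinement.

A = np.diag([1.0, 100.0])  # ill-conditioned curvature, as in stiff PDE losses

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

# plain gradient descent (stand-in for a first-order optimizer)
w = np.array([1.0, 1.0])
for _ in range(500):
    w -= 1e-2 * grad(w)
gd_loss = loss(w)

# L-BFGS refinement from the same starting point
res = minimize(loss, np.array([1.0, 1.0]), jac=grad, method="L-BFGS-B",
               options={"ftol": 1e-15, "gtol": 1e-12})
print(gd_loss, res.fun)  # L-BFGS ends orders of magnitude lower
```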

Dynamically find the right number of timesteps for an LSTM RNN predicting time series

I'm fairly new to the world of NNs, so my question may be extremely stupid; sorry in advance.
I'm working on a predictive autoscaler, trying to predict the workload of different, unknown applications using only historic workload data about each application.
One of my predictors is an LSTM RNN. To make the RNN predict the values in question, I have to define the timestep, that is, the number of lags I feed into the RNN to predict the future value (I hope I used the right terms here). A lot of tutorials and literature seem to set the timestep to a value that looks pretty arbitrary to me. My question can be divided into two subquestions:
1. Given that I don't know the time series during implementation: is there any way to compute this value other than trying different values and comparing the confidence of the predictions?
2. How does the value influence the assumptions the RNN learns about the time series?
I sadly lack any intuition about what this value influences. To give an example of my confusion:
Suppose I have a time series with yearly seasonality, but I decide to feed in only a week of data to make the next prediction. Is the network able to learn this yearly seasonality? Part of me says no, because it can't learn that the partial correlation between the timestamp in question and the lag 365 days earlier is very high; it simply does not have that data, right? Or does it learn it anyway, because it saw the data from a year ago during training, learned that fairly similar pattern, and simply applies it now (which I guess is more likely to be right)?
Is my assumption right that taking too many timesteps into the equation overfits the network?
Can you please help me get a vague understanding of what this parameter influences in the grand scheme of things and what properties of a time series should influence my choice of that value?
Thank you so much and stay healthy :)
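Not a complete answer, but one data-driven heuristic for subquestion 1: inspect the autocorrelation of the history you do have, and make the window at least as long as the strongest seasonal lag. A hypothetical sketch on a synthetic series with weekly seasonality (all names are illustrative):

```python
import numpy as np

# Heuristic: choose the LSTM window so it covers the strongest seasonal lag,
# estimated from the sample autocorrelation of the series.

def autocorr(x, max_lag):
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# synthetic daily series with weekly (lag-7) seasonality plus noise
rng = np.random.default_rng(0)
t = np.arange(365)
series = np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(365)

ac = autocorr(series, 30)
best_lag = int(np.argmax(ac)) + 1
print(best_lag)  # 7: the window should be at least this long
```

The flip side of your example is exactly the limitation this exposes: if the dominant lag (say, 365) is longer than the window, the network cannot condition on it directly, and can at best reproduce similar patterns it memorized during training.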

Deep learning basic thoughts

I am trying to understand the basics of deep learning, lately reading a bit through Deeplearning4j. However, I can't really find an answer to the question: how does training performance scale with the amount of training data?
Apparently, the cost function always depends on all the training data, since it just sums the squared error per input. Thus, I guess at each optimization step all data points have to be taken into account. Deeplearning4j has the dataset iterator and the INDArray, where the data can live anywhere and thus (I think) doesn't limit the amount of training data. Still, doesn't that mean that the amount of training data is directly related to the calculation time per step of the gradient descent?
DL4J uses an iterator; Keras uses a generator. It is still the same idea: your data comes in batches that are used for SGD. So the minibatch size matters, not the whole amount of data you have.
Fundamentally speaking, it doesn't (though your mileage may vary). You must find the right architecture for your problem. Adding new data records may introduce new features, which may be hard to capture with your current architecture. I'd always question my net's capacity: retrain your model and check whether the metrics drop.
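The point about minibatches can be made concrete: with SGD each update touches only the current batch, so the per-step cost depends on the batch size, not on the dataset size. A minimal sketch with a hypothetical linear-regression setup:

```python
import numpy as np

# Minibatch SGD: one update costs O(batch), independent of the total
# number of training examples n.

rng = np.random.default_rng(42)
n, d, batch = 100_000, 10, 32
X = rng.standard_normal((n, d))
true_w = rng.standard_normal(d)
y = X @ true_w + 0.01 * rng.standard_normal(n)

w = np.zeros(d)
lr = 0.1
for step in range(2000):
    idx = rng.integers(0, n, size=batch)   # sample a minibatch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch    # gradient on the batch only
    w -= lr * grad

print(np.linalg.norm(w - true_w))  # small: SGD recovers the weights
```

More data still costs more per *epoch* (one pass over the data), but the time per gradient step stays constant.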

Gauss-Newton products in Tensorflow

I would like to use the Gauss-Newton approximation to the Hessian as a metric for an optimization problem, such as the method used to fit the value function in GAE (https://arxiv.org/abs/1506.02438). However, does anyone know how to compute these products efficiently? The issue is that I cannot compute per-example Jacobians off the shelf in TensorFlow, which makes it hard to do the per-example rank-one computations. One solution is given in this technical report (https://arxiv.org/pdf/1510.01799.pdf), but it puts some constraints on the network architectures that can be used. Does a more general solution exist?
As of April 2017 there is no general-purpose efficient way to compute per-example gradients in TensorFlow; calling tf.gradients per example is probably the best option for now.
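For readers landing here later: the Gauss-Newton vector product Gv = JᵀJv itself never needs a full Jacobian; it can be assembled from one forward-mode product (Jv) and one reverse-mode product (Jᵀu), which is the mechanism autodiff frameworks expose. A framework-free sketch with the JVP/VJP of a toy model written by hand (the model and names are illustrative only):

```python
import numpy as np

# Gauss-Newton vector product G v = J^T (J v), matrix-free:
# a JVP (forward mode) followed by a VJP (reverse mode).

def f(w):                      # toy model output, R^2 -> R^3
    return np.array([w[0]**2, w[0]*w[1], np.sin(w[1])])

def jvp(w, v):                 # J(w) @ v, forward-mode derivative
    return np.array([2*w[0]*v[0],
                     w[1]*v[0] + w[0]*v[1],
                     np.cos(w[1])*v[1]])

def vjp(w, u):                 # J(w)^T @ u, reverse-mode derivative
    return np.array([2*w[0]*u[0] + w[1]*u[1],
                     w[0]*u[1] + np.cos(w[1])*u[2]])

def gauss_newton_vp(w, v):
    return vjp(w, jvp(w, v))   # never materializes J

w = np.array([0.3, 1.2])
v = np.array([1.0, -0.5])

# cross-check against the explicit Jacobian
J = np.array([[2*w[0], 0.0],
              [w[1],   w[0]],
              [0.0,    np.cos(w[1])]])
print(np.allclose(gauss_newton_vp(w, v), J.T @ J @ v))  # True
```

In a framework, jvp/vjp come from autodiff instead of being hand-written; the per-example difficulty in the question concerns forming the individual rank-one terms, not the product as a whole.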

Neural network weights explode in linear unit

I am currently implementing a simple neural network and the backprop algorithm in Python with numpy. I have already tested my backprop method using central differences, and the resulting gradients agree.
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with tanh activation functions and an output layer with a linear activation function. Each unit also has a bias input. Training is done by simple gradient descent with a learning rate of 0.2.
The problem is the gradient, which grows larger with every epoch, but I don't know why. Furthermore, the problem is unchanged if I decrease the learning rate.
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
There are two things you can try, maybe in combination:
Use a smaller learning rate. If it is too high, you may be overshooting the minimum in the current direction by a lot, and so your weights will keep getting larger.
Use smaller initial weights. This is related to the first item. A smaller learning rate would fix this as well.
I had a similar problem (with a different library, DL4J), even in the case of extremely simple target functions. In my case, the issue turned out to be the cost function. When I changed from negative log likelihood to Poisson or L2, I started to get decent results. (And my results got MUCH better once I added exponential learning rate decay.)
Looks like you don't use regularization. If you train your network long enough, it will start to learn the exact data rather than the abstract pattern.
There are a couple of methods to regularize your network, such as early stopping, putting a high cost on large weights, or more complex ones like dropout. If you search the web or books you will probably find many options.
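The "high cost on large weights" option can be sketched in a few lines: an L2 penalty lambda*||w||^2 adds 2*lambda*w to the gradient, which pulls the weights toward zero at each step. A hypothetical minimal example showing the decay term acting alone (names and values are illustrative):

```python
import numpy as np

# L2 regularization ("weight decay"): the penalty lam*||w||^2 contributes
# 2*lam*w to the gradient, shrinking the weights every update.

def grad_step(w, grad_loss, lr=0.1, lam=0.05):
    return w - lr * (grad_loss + 2 * lam * w)  # decay term shrinks w

w = np.array([5.0, -3.0])
for _ in range(200):
    w = grad_step(w, grad_loss=np.zeros_like(w))  # zero data gradient
print(w)  # weights decay toward zero under the penalty alone
```

In real training the data gradient and the decay term balance out, keeping the weights bounded instead of exploding.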
Too big a learning rate can fail to converge, and can even DIVERGE; that is the point.
The gradient can diverge for this reason: when a step overshoots the position of the minimum, the resulting point may end up not just a bit past it, but at a greater distance from the minimum than before, on the other side. Repeat the process and it keeps diverging. In other words, the slope around the optimal position can simply be too steep for the chosen learning rate.
Source: my understanding of the following video (watch near 7:30).
https://www.youtube.com/watch?v=Fn8qXpIcdnI&list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4&index=10
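The overshooting mechanism described above can be reproduced in a few lines of illustrative code: on f(w) = w², each gradient step multiplies w by (1 − 2·lr), so any learning rate above 1 flips the sign and grows the magnitude every step:

```python
# Gradient descent on f(w) = w^2: update is w -= lr * 2w, i.e.
# w_new = (1 - 2*lr) * w.  |1 - 2*lr| > 1 (lr > 1 here) means divergence.

def run(lr, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w        # gradient of w^2 is 2w
    return abs(w)

print(run(0.2))   # shrinks toward 0 (factor 0.6 per step)
print(run(1.2))   # explodes: each step multiplies w by -1.4
```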