Can someone explain to me why Verlet integration is better than Euler integration? And why RK4 is better than Verlet? I don't understand why it is a better method.
The Verlet method is good at simulating systems with energy conservation, and the reason is that it is symplectic. To understand this statement, you have to describe a time step of your simulation as a function, f, that maps the state space into itself. In other words, each time step can be written in the following form.
(x(t+dt), v(t+dt)) = f(x(t),v(t))
The time step function, f, of the Verlet method has the special property that it conserves state-space volume. We can write this in mathematical terms. If you have a set A of states in the state space, then you can define f(A) by
f(A) = {f(x) : x in A}
Now let us assume that the sets A and f(A) are smooth and nice, so we can define their volume. A symplectic map, f, will always satisfy that the volume of f(A) is the same as the volume of A, and this holds for all nice and smooth choices of A. The time step function of the Verlet method is symplectic, and volume preservation is one consequence of that; this is why the Verlet method is called a symplectic method.
Now the final question is: why is a symplectic method good for simulating systems with energy conservation? I am afraid you will have to read a book to understand this fully, but the sketch below illustrates the practical effect.
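To see what symplecticity buys you in practice, here is a minimal sketch (my own illustration, not from the answer above) comparing explicit Euler with velocity Verlet on the harmonic oscillator x'' = -x, whose exact energy is constant at 0.5:

    # Sketch: integrate x'' = -x with explicit Euler and with velocity Verlet,
    # then compare the energy E = (x^2 + v^2) / 2.

    def euler_step(x, v, dt):
        return x + dt * v, v - dt * x            # explicit Euler

    def verlet_step(x, v, dt):
        a = -x                                   # acceleration at current position
        x_new = x + dt * v + 0.5 * dt**2 * a
        a_new = -x_new                           # acceleration at new position
        v_new = v + 0.5 * dt * (a + a_new)
        return x_new, v_new                      # velocity Verlet

    dt, steps = 0.1, 10_000
    for name, step in (("Euler", euler_step), ("Verlet", verlet_step)):
        x, v = 1.0, 0.0
        for _ in range(steps):
            x, v = step(x, v, dt)
        print(name, "final energy:", 0.5 * (x**2 + v**2))

The Euler energy grows without bound, while the Verlet energy merely oscillates around the true value; that bounded oscillation is the hallmark of a symplectic integrator.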
The Euler method is a first-order integration scheme, i.e. the total error is proportional to the step size. However, it can be numerically unstable; in other words, the accumulated error can overwhelm the calculation, giving you nonsense. Please note, this instability can occur regardless of how small you make the step size and regardless of whether the system is linear or not. I am not familiar with Verlet integration, so I cannot speak to its efficacy. But the Runge-Kutta methods differ from the Euler method in more than just step size.
In essence, they are based on a better way of numerically approximating the derivative. The precise details escape me at the moment. In general, the fourth-order Runge-Kutta method is considered the workhorse of the integration schemes, but it does have some disadvantages. It is slightly dissipative, i.e. a small first-derivative-dependent term is added to your calculation, which resembles an added friction. Also, it has a fixed step size, which can make it difficult to achieve the accuracy you desire. Alternatively, you can use an adaptive step-size scheme, like the Runge-Kutta-Fehlberg method, which gives fifth-order accuracy at the cost of six function evaluations per step. This can greatly reduce the time necessary to perform your calculation while improving accuracy, as shown here.
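For concreteness, a single step of the classical fourth-order Runge-Kutta method looks like this (a generic sketch for a first-order ODE dy/dt = f(t, y); four evaluations of f per step, versus Euler's one):

    def rk4_step(f, t, y, dt):
        # four slope estimates across the interval [t, t + dt]
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        # their weighted average gives a local error of order dt^5
        return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)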
If everything just coasts along in a linear way, it wouldn't matter what method you used, but when something interesting (i.e. non-linear) happens, you need to look more carefully, either by considering the non-linearity directly (Verlet) or by taking smaller time steps (RK4).
In GNU radio I am trying to use the frequency of one signal to generate another signal of a different frequency. Here is the flow diagram that I am using:
I generate a 50 kHz signal with a signal source block and feed this into a Log Power FFT block. I used the Argmax block to find the FFT bin with the most power and multiply that with a constant. I want to use this result as the input to the complex vco block to generate another signal with a different frequency. All vectors have a length of 4096.
However, looking at the output of the complex QT Gui Time Sink block, the output of the vco is always zero. This is strange to me because using a float QT Gui Time Sink to look at the output of the multiply block (which is also going to the input of the vco block), the result is 50,000 as expected. Why am I only getting zero out of the vco?
Also, my sample rate is set to 1M. I am assuming because of the vector length of 4096 that the sample rate out of the Argmax block will be 1M/4096 = 244. Is this correct?
I am running gnu radio companion on windows 10.
The proposed solution is not a solution. Please don't abuse the signal probe, which is really just that: a probe for slow, debugging, or purely visual purposes. Every time I use it myself, I see how architecturally bad it is, and I personally think the project should remove it from the block library altogether.
Now, instead of just saying "probe is bad, do something else", let's analyse where your flow graph falls short:
Your frequency estimation depends on the argmax of a block that was meant for pure visualization purposes. No, the output rate is not (sampling rate / FFT length); the output rate is roughly the frame rate (but not exactly: that block is terrible and mixes "sample times" with "wall clock times"). Don't do that. If you need something like that, use the FFT block followed by a "Complex to Mag Squared" block. You don't even want the logarithm; you're just looking for a maximum.
Instead of looking for a maximum absolute value in an FFT, which is inherently a quantizing frequency estimator, use something that actually gives you an oscillation. There are multiple ways you can do that with a PLL!
Your VCO solution probably does what it's programmed to do. You're just using an inadequate sensitivity!
The sampling rate you assume in your time sinks is totally off, which is probably why you have the impression of a constant output – it just changes so slowly that you'll not notice.
So instead, I propose one of the following:
Use the PLL Freq Det block. Feed its output into the VCO. Don't scale with a constant; simply apply the proper sensitivity. Sensitivity is the factor between input amplitude and phase advance per sample on the output, in radians (see the sketch after this list).
Use the PLL Carrier Recovery block. Use a resampler, or some other mathematical method, to generate the new frequency. You haven't told us how that other frequency relates to the input frequency, so I can't give you concrete advice.
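For what it's worth, here is a rough sketch of the first option in the GNU Radio Python API (untested; the loop bandwidth and the +/-100 kHz pull-in range are placeholder assumptions, not values from your flow graph):

    import math
    from gnuradio import gr, analog, blocks

    # Untested sketch of option 1: PLL frequency detector feeding a VCO.
    class freq_copy(gr.top_block):
        def __init__(self, samp_rate=1_000_000):
            gr.top_block.__init__(self)
            src = analog.sig_source_c(samp_rate, analog.GR_COS_WAVE, 50e3, 1.0)
            # pll_freqdet_cf(loop_bw, max_freq, min_freq) outputs the
            # instantaneous frequency estimate in radians per sample
            w_max = 2 * math.pi * 100e3 / samp_rate
            pll = analog.pll_freqdet_cf(0.05, w_max, -w_max)
            # vco_c advances its phase by input * sensitivity / samp_rate
            # radians per sample, so sensitivity = samp_rate reproduces the
            # PLL's estimate directly; scale the input to shift the frequency
            vco = blocks.vco_c(samp_rate, samp_rate, 1.0)
            self.connect(src, pll, vco, blocks.null_sink(gr.sizeof_gr_complex))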
Also notice that this very much suggests a case of "I'm trying to recreate an analog approach in digital"; that might be a good approach, but in many cases it's not.
If I may be so brazen: describe why you need to generate that other frequency, and for which purpose, in a post on https://dsp.stackexchange.com or to the GNU Radio mailing list discuss-gnuradio@gnu.org (sign up here). This is only barely a programming problem; it is really a signal processing problem. And there are a lot of people out there eager to help you find an appropriate solution that actually tackles your problem!
It looks like a better solution was to probe the output of the multiplier using a Probe Signal block along with a Function Probe block to create a new variable. This variable could then be used as the frequency value in a separate Signal Source block that generates the new signal. This flow diagram seems to satisfy the original intended purpose: new flow diagram
I'm trying to figure out how to correct drift errors introduced by a SLAM method using GPS measurements. I have two point sets in Euclidean 3D space taken at fixed moments in time:
The red dataset comes from GPS and contains no drift errors, while the blue dataset is based on the SLAM algorithm and drifts over time.
The idea is that SLAM is accurate over short distances but eventually drifts, while GPS is accurate over long distances and inaccurate over short ones. So I would like to figure out how to fuse the SLAM data with GPS in a way that takes the best accuracy from both measurements. At least, how should I approach this problem?
Since your GPS looks like it is very locally biased, I'm assuming it is low-cost and doesn't use any correction techniques, e.g. that it is not differential. As you are probably aware, GPS errors are not Gaussian. The authors of this paper show that a good way to model GPS noise is as v + eps, where v is a locally constant "bias" vector (it is usually constant for a few meters, and then changes more or less smoothly or abruptly) and eps is Gaussian noise.
Given this information, one option would be to use Kalman-based fusion: you add the GPS noise and bias to the state vector, define your transition equations appropriately, and proceed as you would with an ordinary EKF. Note that if we ignore the prediction step of the Kalman filter, this is roughly equivalent to minimizing an error function of the form
measurement_constraints + some_weight * GPS_constraints
and that gives you a more straightforward second option. For example, if your SLAM is visual, you can just use the sum of squared reprojection errors (i.e. the bundle adjustment error) as the measurement constraints, and define your GPS constraints as ||x - x_{gps}||, where the x are 2D or 3D GPS positions (you might want to ignore the altitude with low-cost GPS).
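As a sketch of that second option (my own illustration; slam_residuals stands in for whatever produces your reprojection errors):

    import numpy as np
    from scipy.optimize import least_squares

    # Stack the SLAM measurement residuals with a weighted GPS term and let a
    # least-squares solver minimize both jointly.
    def residuals(x, slam_residuals, x_gps, weight):
        return np.concatenate([slam_residuals(x),
                               np.sqrt(weight) * (x - x_gps)])

    # usage sketch: x0 is the current SLAM estimate of the positions
    # result = least_squares(residuals, x0, args=(slam_residuals, x_gps, 0.1))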
If your SLAM is visual and feature-point based (you didn't say what type of SLAM you are using, so I assume the most widespread type), then fusion with any of the methods above can lead to "inlier loss": you make a sudden, violent correction that augments the reprojection errors, which means you lose inliers in SLAM's tracking, have to re-triangulate points, and so on. Plus, note that even though the paper I linked to above presents a model of the GPS errors, it is not a very accurate model, and assuming that the distribution of GPS errors is unimodal (necessary for the EKF) seems a bit adventurous to me.
So I think a good option is to use barrier-term optimization. Basically, the idea is this: since you don't really know how to model GPS errors, assume that you have more confidence in SLAM locally, and minimize a function S(x) that captures the quality of your SLAM reconstruction. Denote by x_opt the minimizer of S. Then fuse with the GPS data only as long as doing so does not deteriorate S(x_opt) by more than a given threshold. Mathematically, you'd want to minimize
some_coef / (thresh - S(x)) + ||x - x_{gps}||
and you'd initialize the minimization with x_opt. A good choice for S is the bundle adjustment error, since by not degrading it you prevent inlier loss. There are other choices of S in the literature, but they are usually meant to reduce computational time and add little in terms of accuracy.
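A sketch of what that minimization could look like (all names illustrative; S is your SLAM cost and x_opt its minimizer):

    import numpy as np
    from scipy.optimize import minimize

    def barrier_objective(x, S, x_gps, thresh, coef):
        s = S(x)
        if s >= thresh:                 # outside the barrier: reject
            return np.inf
        return coef / (thresh - s) + np.sum((x - x_gps) ** 2)

    # usage sketch: start strictly inside the barrier, e.g. at x_opt, with
    # thresh set slightly above S(x_opt)
    # result = minimize(barrier_objective, x_opt,
    #                   args=(S, x_gps, 1.1 * S(x_opt), 1e-3),
    #                   method="Nelder-Mead")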
This, unlike the EKF, does not have a nice probabilistic interpretation, but it produces very nice results in practice (I have used it for fusion with things other than GPS too, and it works well). You can, for example, see this excellent paper, which explains thoroughly how to implement this, how to set the threshold, etc.
Hope this helps. Please don't hesitate to tell me if you find inaccuracies/errors in my answer.
I want to optimize KNN. There is a lot written about tuning SVM, RF, and XGBoost, but very little for KNN.
As far as I know the number of neighbors is one parameter to tune.
But what other parameters to test? Is there any good article?
Thank you
KNN is such a simple method that there is pretty much nothing to tune besides K. The whole method is literally:
for a given test sample x:
- find K most similar samples from training set, according to similarity measure s
- return the majority vote of the class from the above set
Consequently, the only thing used to define KNN besides K is the similarity measure s, and that's all. There is literally nothing else in this algorithm (it has three lines of pseudocode). On the other hand, finding "the best similarity measure" is as hard a problem as learning a classifier itself, so there is no real method for doing so; people usually end up either using simple things (Euclidean distance) or using their domain knowledge to adapt s to the problem at hand.
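In scikit-learn terms, K and the choice of similarity measure (plus how the votes are weighted) translate into a small grid; a sketch, assuming a generic classification dataset X, y:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    grid = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={
            "n_neighbors": [1, 3, 5, 11, 21],
            "weights": ["uniform", "distance"],  # plain vs distance-weighted vote
            "p": [1, 2],                         # Manhattan vs Euclidean distance
        },
        cv=5,
    )
    # grid.fit(X, y); print(grid.best_params_)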
lejlot pretty much summed it all up. K-NN is so simple because it's an instance-based, nonparametric algorithm; that's what makes it so beautiful, and it works really well for certain specific examples. Most K-NN research is not in K-NN itself but in the computation and hardware that go into it. If you'd like some reading on K-NN and machine learning algorithms, try Christopher Bishop's Pattern Recognition and Machine Learning. Warning: it is heavy on the mathematics, but machine learning and real computer science is all math.
If by optimizing you are also focusing on reducing prediction time (you should), then there are other aspects you can implement to make the algorithm more efficient (though these are not parameter tuning). The major drawback of KNN is that prediction time grows with the number of training examples, so performance drops.
To optimize, you can look at KNN with k-d trees, KNN with inverted lists (indexes), and KNN with locality-sensitive hashing (KNN with LSH).
These reduce the search space at prediction time, thus optimizing the algorithm.
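For example, in scikit-learn the same queries can be backed by a k-d tree instead of brute force (a sketch, assuming training data X_train and queries X_test):

    from sklearn.neighbors import NearestNeighbors

    # identical neighbors, different search strategy: the k-d tree answers
    # queries in roughly O(log n) per sample in low dimensions, versus O(n)
    # for brute force
    brute = NearestNeighbors(n_neighbors=5, algorithm="brute")
    kdtree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree")
    # brute.fit(X_train); kdtree.fit(X_train)
    # distances, indices = kdtree.kneighbors(X_test)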
I am trying to implement a function approximator (aggregation) using a rule-based fuzzy control system. To simplify my implementation (and gain a better understanding), I am trying to approximate y = x^2 (the simplest non-linear function). As far as I understand, I have to map my input (e.g. uniform samples over [-1,1]) to fuzzy sets (fuzzification) and then use a defuzzification method to obtain crisp values. Is there any simple explanation of this procedure? The fuzzy control system literature is a bit of a mess.
This is sort of a broad question, but I'll give it a go since it has sat unanswered for so long.
First, I believe you need to refine your objective (at least as stated here). I would hesitate to use the term "function approximation" in this context. If I follow your question correctly, the objective is to map a non-linear function into another domain via fuzzy methods.
To do so, you first need to define your fuzzy set membership functions (this link is a good example of the process). Without additional information, I recommend the triangular function due to its ease of implementation. The number of fuzzy sets, their placement and width (or support), and their degree of overlap are application specific. You've indicated that your input domain is [-1,1], so you might find that three fuzzy sets do the trick, i.e. Negative, Zero, and Positive.
From there, you need to craft a set of rules, i.e. if x is Negative then...
With the rules in place, you can then define the defuzzification process. In short, this step weights the activation of each rule according to the needs of the application.
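Putting the three steps together for your y = x^2 example, here is a minimal sketch (the five set centers and the rule outputs are my own illustrative choices) using triangular membership functions and weighted-average defuzzification:

    import numpy as np

    # Fuzzification, rules, and defuzzification for y = x^2 on [-1, 1].
    centers = np.linspace(-1, 1, 5)            # one fuzzy set per center
    width = centers[1] - centers[0]

    def membership(x, c):
        # triangular membership function centered at c
        return np.maximum(0.0, 1.0 - np.abs(x - c) / width)

    rule_outputs = centers ** 2                # "if x is near c, then y is c^2"

    def approximate(x):
        mu = membership(x, centers)            # fuzzification
        return np.sum(mu * rule_outputs) / np.sum(mu)   # defuzzification

    for x in (-1.0, -0.3, 0.0, 0.7):
        print(x, approximate(x), x ** 2)

More sets (finer centers) give a closer fit, which is the usual accuracy/complexity trade-off in these systems.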
I don't believe I can contribute more fully until the output is better defined. You state "use a defuzzification method to take crisp values" - what does this set of crisp values mean? What is its range? Furthermore, you'll get more of a response if you can identify the areas where you are stuck (i.e. ask more specific questions).
I would like to maximize a function with one parameter.
So I run gradient descent (or ascent, actually): I start with an initial parameter and keep adding the gradient (times some learning rate factor that gets smaller and smaller), re-evaluate the gradient at the new parameter, and so on until convergence.
But there is one problem: my parameter must stay positive; it is not supposed to become <= 0, because my function would be undefined there. My gradient search sometimes goes into such regions, though (when the parameter was positive, the gradient told it to go a bit lower, and it overshoots).
And to make things worse, the gradient at such a point might be negative, driving the search toward even more negative parameter values. (The reason is that the objective function contains logs, which are undefined for non-positive arguments, but the gradient itself is still defined there, so nothing blows up to warn the search.)
What are some good (simple) algorithms that deal with this constrained optimization problem? I'm hoping for just a simple fix to my algorithm. Or maybe ignore the gradient and do some kind of line search for the optimal parameter?
Each time you update your parameter, check to see if it's negative, and if it is, clamp it to zero.
If clamping to zero is not acceptable, try adding a "log barrier" (Google it). Basically, it adds a smooth "soft" wall to your objective function (and modifies your gradient) to keep it away from regions you don't want it to go to. You then repeatedly rerun the optimization while hardening the wall to make it ever more vertical, each time starting from the previously found solution. In the limit (in practice only a few iterations are needed), the problem you are solving is identical to the original problem with a hard constraint.
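A sketch of that idea for the constraint x > 0 (the f below, with maximum at x = 1, is an illustrative placeholder): add mu * log(x) to the objective, maximize, shrink mu, and restart from the previous solution.

    def maximize_with_barrier(f_grad, x0, mu=1.0, lr=0.01, outer=8, inner=2000):
        x = x0
        for _ in range(outer):
            for _ in range(inner):
                g = f_grad(x) + mu / x      # gradient of f(x) + mu * log(x)
                step = lr * g
                while x + step <= 0:        # never leave the feasible region
                    step /= 2
                x += step
            mu /= 10                        # harden the wall, keep the iterate
        return x

    # example: f(x) = log(x) - x has its maximum at x = 1
    print(maximize_with_barrier(lambda x: 1 / x - 1, x0=5.0))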
Without knowing more about your problem, it's hard to give specific advice. Your gradient ascent algorithm may not be particularly suitable for your function space. However, given that's what you've got, here's one tweak that would help.
You're following what you believe is an ascending gradient. But when you move forward in the direction of the gradient, you discover you have fallen into a pit of negative value. This implies that there was a nearby local maximum, but also a very sharp negative gradient cliff. The obvious fix is to backtrack to your previous position and take a smaller step (e.g. half the size). If you still fall in, repeat with a still smaller step. This iterates until you find the local maximum at the edge of the cliff.
The problem is, there is no guarantee that your local maximum is actually global (unless you know more about your function than you are sharing). This is the main limitation of naive gradient ascent: it always fixes on the first local maximum it finds and converges to it. If you don't want to switch to a more robust algorithm, one simple approach that could help is to run n iterations of your code, starting each time from a random position in the function space, and keeping the best maximum found overall. This Monte Carlo approach increases the odds that you will stumble on the global maximum, at the cost of increasing your run time by a factor of n. How effective this is will depend on the 'bumpiness' of your objective function.
A simple trick to restrict a parameter to be positive is to re-parametrize the problem in terms of its logarithm (make sure to change the gradient appropriately). Of course, it is possible that the optimum moves to -infinity with this transformation and the search does not converge.
At each step, constrain the parameter to be positive. This is (in short) the projected gradient method, which you may want to google.
I have three suggestions, in order of how much thinking and work you want to do.
First, in gradient descent/ascent, you move each time by the gradient times some factor, which you refer to as a "learning rate factor." If, as you describe, this move causes x to become negative, there are two natural interpretations: either the gradient was too big, or the learning rate factor was too big. Since you can't control the gradient, take the second interpretation. Check whether your move will cause x to become negative, and if so, cut the learning rate factor in half and try again.
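A sketch of that check (names illustrative):

    def safe_ascent_step(x, gradient, lr):
        # halve the learning rate until the update keeps x positive
        while x + lr * gradient <= 0:
            lr /= 2
        return x + lr * gradient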
Second, to elaborate on Aniko's answer, let x be your parameter and f(x) your function. Then define a new function g(x) = f(e^x), and note that although the domain of f is (0, infinity), the domain of g is (-infinity, infinity), so g cannot suffer the problems that f suffers. Use gradient ascent to find the value x_0 that maximizes g; then e^(x_0), which is positive, maximizes f. To apply gradient ascent to g, you need its derivative, which is f'(e^x)*e^x by the chain rule.
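A sketch of this reparameterization (the f below, with maximum at x = 1, is an illustrative placeholder):

    import math

    f_prime = lambda x: 1 / x - 1          # derivative of f(x) = log(x) - x

    u, lr = math.log(5.0), 0.1             # start at x = e^u = 5
    for _ in range(2000):
        x = math.exp(u)
        u += lr * f_prime(x) * x           # chain rule: g'(u) = f'(e^u) * e^u
    print("maximizer:", math.exp(u))       # converges to ~1.0, always positive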
Third, it sounds like you're trying to maximize just one function, not write a general maximization routine. You could consider shelving gradient descent and tailoring the optimization method to the idiosyncrasies of your specific function. We would have to know a lot more about the expected behavior of f to help you do that.
Just use Brent's method for minimization. It is stable and fast, and it is the right thing to do if you have only one parameter. It's what the R function optimize uses. The link also contains a simple C++ implementation. And yes, you can give it MIN and MAX parameter limits.
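If you're in Python rather than R, SciPy's bounded scalar minimizer is also built on Brent's method (a sketch; the f here is an illustrative placeholder, negated because the routine minimizes):

    import numpy as np
    from scipy.optimize import minimize_scalar

    # maximize f(x) = log(x) - x on (0, 100] by minimizing -f
    res = minimize_scalar(lambda x: -(np.log(x) - x),
                          bounds=(1e-9, 100.0), method="bounded")
    print(res.x)   # ~1.0

The bounds keep the search away from the undefined region x <= 0 entirely, which sidesteps the original overshoot problem.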
You're getting good answers here. Reparameterizing is what I would recommend. Also, have you considered another search method, like Metropolis-Hastings? It's actually quite simple once you bull through the scary-looking math, and it gives you standard errors as well as an optimum.