Support Vector Machine Primal Form Implementation - optimization

I am currently working on a support vector machine (SVM) project. The version of SVM that I am working on is the linear SVM in primal form, and I am having a hard time understanding where to start.
In general, I think I understand the theory; basically I need to minimize the norm of w under certain constraints, and the Lagrangian will be my objective function to minimize (after the Lagrange multipliers are applied).
What I don't understand is that my professor told me we will be using a quasi-Newton method along with the BFGS update. I have tried the 2D and 3D cases for Newton's method and I think I have a good grasp of the algorithm, but I don't see how a quasi-Newton method is applied to find the coefficients alpha. Also, much of the literature I have read so far says to apply quadratic programming to find the coefficients.
How is the iterative quasi-Newton algorithm related to finding the coefficients of w? And how is quadratic programming related to quasi-Newton? Can anyone please walk me through what is going on?

You are confusing many things here:
"alpha coefficients" exist only in the dual form, so you do not find them in your case.
"apply quadratic programming": quadratic programming is a problem, not a solution. You cannot "apply QP"; you can only solve a QP, which in your case will be solved using a quasi-Newton method.
"how is (...) related to finding coefficients of w": exactly the same way this optimization technique is related to finding the optimal coefficients of any function. You are going to minimize a function of w, so applying any optimization technique (in particular quasi-Newton) will lead to a solution expressed as the coefficients of w.

Related

How does the Constrained Nonlinear Optimization VI work? (Theory)

I am trying to understand the theory behind LabVIEW's Constrained Nonlinear Optimization VI. The description explains how to use it, but not which optimization algorithm works behind it.
Here is an overview of the optimization algorithms but it simply states
Solves a general nonlinear optimization problem with nonlinear equality constraint and nonlinear inequality constraint bounds using a sequential quadratic programming method.
I suspect that it is a wrapper for multiple algorithms depending on the inputs... I want to know if it uses Levenberg-Marquardt, Downhill-Simplex, or some other method. It is not even stated whether it is trust-region or line search, or how the bounds are enforced (e.g. by reflection)... In other languages, the documentation often refers to a paper from which I can take the original theory. This is what I am looking for. Can anyone help (or do I have to contact NI support)? Thanks
(using LabVIEW 2017 and 2018 32bit)

What parameters to optimize in KNN?

I want to optimize KNN. There is a lot of material about SVM, RF and XGBoost, but very little about KNN.
As far as I know, the number of neighbors is one parameter to tune.
But what other parameters should I test? Is there any good article?
Thank you
KNN is such a simple method that there is pretty much nothing to tune besides K. The whole method is literally:
for a given test sample x:
- find K most similar samples from training set, according to similarity measure s
- return the majority vote of the class from the above set
Consequently, the only thing used to define KNN besides K is the similarity measure s, and that's all. There is literally nothing else in this algorithm (it has 3 lines of pseudocode). On the other hand, finding "the best similarity measure" is as hard a problem as learning a classifier itself, thus there is no real method for doing so, and people usually end up using either something simple (Euclidean distance) or their domain knowledge to adapt s to the problem at hand.
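If you do want to cross-validate those two knobs systematically, here is a small sketch, assuming scikit-learn is acceptable (the dataset and grid values are just placeholders):

```python
# Sketch: grid-search the only real KNN knobs: K (n_neighbors), the
# distance metric, and optionally the vote weighting.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 11],
    "metric": ["euclidean", "manhattan", "chebyshev"],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```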
Lejlot pretty much summed it all up. K-NN is so simple because it is an instance-based, nonparametric algorithm; that's what makes it so beautiful, and it works really well for certain specific problems. Most K-NN research is not about K-NN itself but about the computation and hardware that go into it. If you'd like some reading on K-NN and machine learning algorithms, try Christopher Bishop - Pattern Recognition and Machine Learning. Warning: it is heavy on the mathematics, but machine learning and real computer science are all math.
If by optimizing you also mean reducing prediction time (you should), then there are other things you can implement to make the algorithm more efficient (but these are not parameter tuning). The major drawback of KNN is that as the number of training examples increases, the prediction time also grows, so performance drops.
To optimize, you can look into KNN with KD-trees, KNN with inverted lists (indexes), and KNN with locality-sensitive hashing (KNN with LSH).
These reduce the search space at prediction time and thus speed up the algorithm.
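As a rough illustration of the indexing idea (scikit-learn here; LSH would need a separate library), the tree structure is built once at fit time so a query no longer scans every training sample:

```python
# Sketch: swap the brute-force neighbor search for a KD-tree index.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)
print(knn.predict(X[:3]))
```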

Routh-Hurwitz useful when I can just calculate eigenvalues?

This is for self-study of an N-dimensional system of linear homogeneous ordinary differential equations of the form:
dx/dt=Ax
where A is the coefficient matrix of the system.
I have learned that you can check for stability by determining if the real parts of all the eigenvalues of A are negative. You can check for oscillations if there are any purely imaginary eigenvalues of A.
The author in the book I'm reading then introduces the Routh-Hurwitz criterion for detecting stability and oscillations of the system. This seems to be a more efficient computational short-cut than calculating eigenvalues.
What are the advantages of using Routh-Hurwitz criteria for stability and oscillations, when you can just find the eigenvalues quickly nowadays? For instance, will it be useful when I start to study nonlinear dynamics? Is there some additional use that I am completely missing?
Wikipedia entry on RH stability analysis has stuff about control systems, and ends up with a lot of equations in the s-domain (Laplace transforms), but for my applications I will be staying in the time-domain for the most part, and just focusing fairly narrowly on stability and oscillations in linear (or linearized) systems.
My motivation: it seems easy to calculate eigenvalues on my computer, and the Routh-Hurwitz criterion comes off as sort of anachronistic, the sort of thing that might save me a lot of time if I were doing this by hand, but not very helpful for doing analysis of small-fry systems via Matlab.
Edit: I've asked this at Math Exchange, which seems more appropriate:
https://math.stackexchange.com/questions/690634/use-of-routh-hurwitz-if-you-have-the-eigenvalues
There is an accepted answer there.
This is just legacy educational curriculum which fell way behind the actual computational age. Routh-Hurwitz gives a very nice theoretical basis for parametrizing root positions and is linked to much more abstract math.
However, for control purposes it is just a nice trick with no practical value, except maybe for simple transfer functions with one or two unknown parameters. It had real value when computing the roots of polynomials was expensive or even manual. Today, even polynomial root finding is based on forming the companion matrix and computing its eigenvalues. In fact, you can basically form a meshgrid and check the stability surface by plotting the largest real part in a few minutes.
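As a hedged sketch of that meshgrid idea (the parametrised 2x2 matrix below is just a made-up example), you sweep the free parameters, take the largest real part of the eigenvalues at each grid point, and read stability off the sign:

```python
# Sketch: stability surface of dx/dt = A(a, b) x over a parameter grid,
# using the largest real part of the eigenvalues as the stability indicator.
import numpy as np

def max_real_eig(a, b):
    A = np.array([[-1.0, a],
                  [b, -2.0]])
    return np.max(np.linalg.eigvals(A).real)

a_vals = np.linspace(-3, 3, 61)
b_vals = np.linspace(-3, 3, 61)
stability = np.array([[max_real_eig(a, b) for a in a_vals] for b in b_vals])
print("stable fraction of the grid:", np.mean(stability < 0))
```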

Finding the best path through a strongly connected component

I have a directed graph which is strongly connected, and every node has some price (positive or negative). I would like to find the best (highest-score) path from node A to node B. My solution is some kind of brute force, so it takes ages to find that path. Is there an algorithm for this, or any idea how I can do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not too difficult to implement, but there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms that can outperform it, but they usually require graph pre-processing. If the problem is not too complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User templatetypedef is right when he says the Bellman-Ford algorithm may not work here.
Bellman-Ford works with graphs that have edges of negative weight. However, the algorithm stops upon finding a negative cycle. I believe that is useful behavior: optimizing the shortest path in a graph that contains a negative-weight cycle would be like going down a Penrose staircase.
What should happen when a path with "minus infinity cost" is reachable depends on the problem.
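For what it's worth, here is a hedged sketch of the Bellman-Ford route for this particular problem: node prices are turned into edge weights (the weight of u->v is -price[v], so "highest total price" becomes "shortest path"), and the extra relaxation pass flags the unbounded case. In a strongly connected graph any positive-price cycle makes the "best" path unbounded, which the sketch reports by returning None; the function and variable names are my own.

```python
# Sketch: highest-score path via Bellman-Ford on negated node prices,
# with negative-cycle (i.e. unbounded-score) detection.
def best_price_path(n, edges, price, src, dst):
    # edges: list of (u, v) pairs; price: price of each node
    weighted = [(u, v, -price[v]) for (u, v) in edges]
    INF = float("inf")
    dist = [INF] * n
    dist[src] = -price[src]            # count the source node's price too
    parent = [None] * n
    for _ in range(n - 1):             # standard V-1 relaxation rounds
        for u, v, c in weighted:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
                parent[v] = u
    for u, v, c in weighted:           # one more pass: any improvement => cycle
        if dist[u] + c < dist[v]:
            return None                # unbounded score, no finite best path
    path, node = [], dst
    while node is not None:            # walk the parent links back to src
        path.append(node)
        node = parent[node]
    return list(reversed(path)), -dist[dst]

# toy usage: three nodes with prices 5, -2, 7 and edges 0->1, 1->2, 0->2
print(best_price_path(3, [(0, 1), (1, 2), (0, 2)], [5, -2, 7], 0, 2))
```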

Objective C / Solving math systems

I've been trying to develop a simple app that calculates the equation of a curve using information given by the user.
For example, let's suppose the user has the equation of a circle and the equation of an ellipse, and he wants to know the intersection points.
Mathematically speaking, this is quite a simple problem, but I can't figure out how to tell Xcode to solve that system.
I've looked into the Accelerate framework, and I found the "dgesv" function of LAPACK. This would be a perfect solution for a system of lines, but what about more complex systems like the one I've stated before?
I was even wondering how to calculate the tangent line of a curve, and other similar geometry problems.
For example, let's suppose the user has the equation of a circle and the equation of an ellipse, and he wants to know the intersection points. Mathematically speaking, this is quite a simple problem, but I can't figure out how to tell Xcode to solve that system.
The problem of intersecting a circle and an ellipse in general involves solving a fourth-degree polynomial equation. You can break it down to third degree, but square roots alone won't suffice. So I'd say that's not really a "quite simple problem". That being said, some numeric libraries do support algorithms for finding the real or complex roots of a univariate polynomial. So you'd eliminate one of your variables (e.g. using resultants or Gröbner bases) and then feed the resulting polynomial to that. I'm not sure which libraries would be most suitable in the Objective-C world, but a question asking for a specific library recommendation would be deemed off-topic anyway, so use the keywords I mentioned to find something that suits your needs.
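As a hedged sketch of that elimination-then-root-finding route, done in SymPy/Python purely for illustration (an Objective-C app would call an equivalent numeric library instead), with a made-up circle and ellipse:

```python
# Sketch: intersect a circle and an ellipse by eliminating y with a
# resultant and numerically finding the roots of the remaining polynomial.
import sympy as sp

x, y = sp.symbols("x y")
circle = x**2 + y**2 - 4                              # x^2 + y^2 = 4
ellipse = x**2 / 9 + (y - sp.Rational(1, 2))**2 - 1   # shifted ellipse

# The resultant w.r.t. y is a univariate polynomial in x whose real roots
# are the x-coordinates of the intersection points.
res_x = sp.resultant(circle, ellipse, y)

points = []
for xr in sp.nroots(res_x):
    if abs(sp.im(xr)) > 1e-9:
        continue                                      # skip complex roots
    xr = sp.re(xr)
    for yr in sp.nroots(circle.subs(x, xr)):          # back-substitute for y
        if abs(sp.im(yr)) > 1e-9:
            continue
        yr = sp.re(yr)
        if abs(ellipse.subs({x: xr, y: yr})) < 1e-6:  # keep the y that fits both
            points.append((float(xr), float(yr)))
print(points)
```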
I was even wondering how to calculate the tangent line of a curve, and other similar geometry problems.
The tangent direction is simply orthogonal to the gradient of the implicit form. In other words, describe your curve not parametrically as x = f(t) and y = g(t) but instead implicitly as f(x, y) = 0. In the case of a circle that would look like x^2 + y^2 + ax + by + c = 0. Then compute the partial derivatives with respect to x and y. That's 2x + a and 2y + b. So (2x + a, 2y + b) is perpendicular to the circle, and (2y + b, -2x - a) is tangential. Not sure what you'd consider similar to this.
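A tiny sketch of that gradient/tangent computation (SymPy again, with made-up coefficients and sample point):

```python
# Sketch: for an implicit curve f(x, y) = 0, the gradient (df/dx, df/dy) is
# normal to the curve and (df/dy, -df/dx) points along the tangent.
import sympy as sp

x, y, a, b, c = sp.symbols("x y a b c")
f = x**2 + y**2 + a*x + b*y + c               # the implicit circle from above

grad = (sp.diff(f, x), sp.diff(f, y))         # (2x + a, 2y + b): normal
tangent = (grad[1], -grad[0])                 # (2y + b, -(2x + a)): tangent
print(grad, tangent)

# e.g. the unit circle (a = b = 0, c = -1) at the point (1, 0):
subs = {a: 0, b: 0, c: -1, x: 1, y: 0}
print([t.subs(subs) for t in tangent])        # -> [0, -2], a vertical tangent
```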