Training complexity of Linear SVM - time-complexity

What is the actual computational complexity of the training phase of an SVM (say, the one implemented in LibSVM)?
Thank you

Training complexity of a nonlinear SVM is generally between O(n^2) and O(n^3), where n is the number of training instances. The following papers are good references:
Support Vector Machine Solvers by Bottou and Lin
SVM-optimization and steepest-descent line search by List and Simon
PS: If you want to use a linear kernel, do not use LIBSVM. LIBSVM is a general-purpose (nonlinear) SVM solver and not an ideal implementation for linear SVMs. Instead, consider tools like LIBLINEAR (by the same authors as LIBSVM), Pegasos or SVM^perf. These have much better training complexity for linear SVMs; training speed can be orders of magnitude better than with LIBSVM.
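As a rough illustration, here is a minimal sketch using scikit-learn, whose SVC wraps LIBSVM and whose LinearSVC wraps LIBLINEAR; the data is synthetic and the exact timings will of course vary with your machine and dataset:

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic data just to illustrate the timing difference
X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

t0 = time.time()
SVC(kernel="linear").fit(X, y)    # LIBSVM: general-purpose kernelized solver
t_libsvm = time.time() - t0

t0 = time.time()
LinearSVC().fit(X, y)             # LIBLINEAR: specialized linear solver
t_liblinear = time.time() - t0

print(f"SVC(kernel='linear')  : {t_libsvm:.1f} s")
print(f"LinearSVC (LIBLINEAR) : {t_liblinear:.1f} s")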

This is going to be heavily dependent on the SVM type and kernel. There is a rather technical discussion in the LIBSVM paper: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf. For a quick answer, expect it to be roughly O(n^2).
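If you want a rough empirical check of that scaling, here is a minimal sketch that times scikit-learn's SVC (built on LIBSVM) on synthetic data of increasing size and estimates the exponent; the measured value will vary with the kernel, C, and the data itself:

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

sizes = [1000, 2000, 4000, 8000]
times = []
for n in sizes:
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    t0 = time.time()
    SVC(kernel="rbf").fit(X, y)
    times.append(time.time() - t0)

# Estimate the exponent k in time ~ n^k from consecutive size doublings
for (n1, t1), (n2, t2) in zip(zip(sizes, times), zip(sizes[1:], times[1:])):
    print(f"n={n1}->{n2}: k ~ {np.log(t2 / t1) / np.log(n2 / n1):.2f}")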


Accuracy of solutions of differential equations with DeepXDE

We used DeepXDE for solving differential equations. (DeepXDE is a framework for solving differential equations, based on TensorFlow.) It works fine, but the accuracy of the solution is limited, and optimizing the meta-parameters did not help. Is this limitation a well-known problem? How can the accuracy of the solutions be increased? We used the Adam optimizer; are there optimizers that are more suitable for numerical problems where high precision is needed?
(I think the problem is not specific to one concrete equation, but if needed I can add an example.)
There are actually some methods that could increase the accuracy of the model:
Random Resampling
Residual Adaptive Refinement (RAR): https://arxiv.org/pdf/1907.04502.pdf
They even have an implemented example in their github repository:
https://github.com/lululxvi/deepxde/blob/master/examples/Burgers_RAR.py
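To make the RAR idea concrete, here is a framework-agnostic sketch of the point-selection step described in the paper above; compute_pde_residual and the outer training loop are hypothetical placeholders that depend on your model, not a specific DeepXDE API:

import numpy as np

def rar_select(candidate_points, compute_pde_residual, k=10):
    # Residual Adaptive Refinement: pick the k candidate collocation points
    # where the PDE residual of the current network is largest.
    residuals = np.abs(compute_pde_residual(candidate_points))  # shape (n_candidates,)
    worst = np.argsort(residuals)[-k:]
    return candidate_points[worst]

# Sketch of the outer loop (train, sample_uniformly_in_domain and the residual
# function are placeholders for your own DeepXDE model and geometry):
# train_points = initial_collocation_points
# for _ in range(n_refinements):
#     model = train(train_points)
#     candidates = sample_uniformly_in_domain(10000)
#     new_points = rar_select(candidates, model.pde_residual, k=10)
#     train_points = np.concatenate([train_points, new_points])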
Also, you could try a different architecture such as multi-scale Fourier networks; they seem to outperform standard PINNs in cases where the solution contains a lot of "spikes".

Is it meaningless to use ReduceLROnPlateau with Adam optimizer?

This question is basically about the inner workings of Keras / tf.keras, for people who have very deep knowledge of the framework.
To my knowledge, tf.keras.optimizers.Adam is an optimizer that already has an adaptive learning-rate scheme. So if we use keras.callbacks.ReduceLROnPlateau with the Adam optimizer (or any other), isn't it meaningless to do so? I don't know the inner workings of Keras optimizers, but it seems natural to ask: if we are already using an adaptive optimizer, why use this callback at all, and if we do use it, what effect does it have on training?
Conceptually, consider the gradient a fixed, mathematical value from automatic differentiation.
What every optimizer other than pure SGD does is to take the gradient and apply some statistical analysis to create a better gradient. In the simplest case, momentum, the gradient is averaged with previous gradients. In RMSProp, the variance of the gradient across batches is measured - the noisier it is, the less RMSProp "trusts" the gradient and so the gradient is reduced (divided by the stdev of the gradient for that weight). Adam does both.
Then, all optimizers multiply the statistically adjusted gradient by a learning rate.
So although one colloquial description of Adam is that it automatically tunes a learning rate... a more informative description is that Adam statistically adjusts gradients to be more reliable, but you still need to decide on a learning rate and how it changes during training (i.e. an LR policy). ReduceLROnPlateau, cosine decay, warmup, etc. are examples of LR policies.
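For reference, the update rule from the original Adam paper makes this split explicit: the raw gradient g_t is first smoothed and rescaled by its running statistics, and only then multiplied by the learning rate \alpha, which is exactly the quantity an LR policy such as ReduceLROnPlateau changes:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\,\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}.
\end{aligned}
$$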
Whether you program in TF or PyTorch, the pseudocode in PyTorch's optimizer documentation is my go-to for understanding the optimizer algorithms. It looks like a wall of Greek letters at first, but you'll grok it if you stare at it for a few minutes.
https://pytorch.org/docs/stable/optim.html
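Concretely, in Keras the two mechanisms compose rather than conflict: Adam adapts the per-parameter steps, while the callback shrinks the base learning rate they are all multiplied by. A minimal sketch (the model architecture, X_train and y_train are hypothetical placeholders):

import tensorflow as tf

# Hypothetical model; the point is only the optimizer/callback interaction
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Adam adapts the per-parameter step using gradient statistics,
# but everything is still multiplied by this base learning rate.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# ReduceLROnPlateau changes that base learning rate (the LR "policy"):
# when val_loss stops improving for `patience` epochs, it is scaled by `factor`.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6)

# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[reduce_lr])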

Best case to use tensorflow

I followed all the steps mentioned in the article:
https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
Then I compared the results with linear regression and found that its error (68) is lower than that of the TensorFlow model (84).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

logreg_clf = LinearRegression()   # ordinary least-squares linear regression
logreg_clf.fit(X_train, y_train)
pred = logreg_clf.predict(X_test)
print(np.sqrt(mean_squared_error(y_test, pred)))   # RMSE on the test set
Does this mean that if I have a large dataset, I will get better results than with linear regression?
What is the best situation in which I should be using TensorFlow?
Answering your first question: neural networks are notorious for overfitting on smaller datasets, and here you are comparing a simple linear regression model with a neural network that has two hidden layers. So it is not very surprising to see the MLP fall behind the linear regression model on the test set (assuming you are working with a relatively small dataset). Larger datasets will definitely help a neural network learn more accurate parameters and generalize better.
Now coming to your second question: TensorFlow is basically a library for building deep learning models. Whenever you work on a deep learning problem such as image recognition or natural language processing, you need massive computational power and have to process a lot of data to train your models. This is where TensorFlow becomes handy: it offers GPU support that significantly speeds up training which would otherwise be practically impossible. Moreover, if you are building a product that has to be deployed in a production environment, you can make use of TensorFlow Serving, which helps you bring your models much closer to your customers.
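If it helps, here is a minimal sketch of a like-for-like comparison (same split, same RMSE metric) with a small Keras MLP and early stopping to limit overfitting; X and y stand in for the tutorial's feature matrix and target and are not defined here:

import numpy as np
import tensorflow as tf
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# X, y: the same data used in the question (placeholders here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: ordinary least squares
linreg = LinearRegression().fit(X_train, y_train)
rmse_lin = np.sqrt(mean_squared_error(y_test, linreg.predict(X_test)))

# Small MLP with two hidden layers, same data, same metric
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
mlp.compile(optimizer="adam", loss="mse")
# Early stopping limits overfitting on small datasets
es = tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
mlp.fit(X_train, y_train, validation_split=0.2, epochs=500, batch_size=32,
        callbacks=[es], verbose=0)
rmse_mlp = np.sqrt(mean_squared_error(y_test, mlp.predict(X_test).ravel()))

print(f"Linear regression RMSE: {rmse_lin:.2f}")
print(f"MLP RMSE:               {rmse_mlp:.2f}")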

How to compute the complexity of machine learning models

I am working on a comparison of deep learning models with an application in vehicular network communication security. I want to know how I can compute the complexity of these models in order to assess the performance of my proposed ones. I am using TensorFlow.
You can compare the complexity of two deep networks with respect to space and time.
Regarding space complexity:
Number of parameters in your model -> this is directly proportional to the amount of memory consumed by your model.
Regarding time complexity:
Amount of time it takes to train a single batch for a given batch size.
Amount of time it takes for training to converge
Amount of time it takes to perform inference on a single sample
Some papers also discuss the architecture complexity. For example, if GoogLeNet accuracy is only marginally higher than VGG-net, some people might prefer VGG-net as it is a lot easier to implement.
You can also discuss some analysis on tolerance of your network to hyperparameter tuning i.e. how your performance varies when you change the hyperparameters.
If your model is in a distributed setting, there are other things to mention such as the communication interval as it is the bottleneck sometimes.
In summary, you can discuss pretty much anything you feel is implemented differently in another network and contributes additional complexity without much improvement in accuracy relative to your network.
You may not need it, but there is also an open-source project called DeepBench for benchmarking different deep network models.
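A quick way to collect the space and time numbers mentioned above in TensorFlow/Keras is sketched below; model, train_dataset, and sample are placeholders for your own (already compiled) model, tf.data pipeline, and a single input batch:

import time
import tensorflow as tf

def report_complexity(model, train_dataset, sample, n_batches=50):
    # Space: total number of parameters (proportional to memory used for weights)
    print("Parameters:", model.count_params())

    # Time: average wall-clock time per training batch
    # (the first batch includes graph tracing overhead)
    t0 = time.time()
    model.fit(train_dataset.take(n_batches), epochs=1, verbose=0)
    print("Seconds per training batch:", (time.time() - t0) / n_batches)

    # Time: inference latency on a single sample
    model(sample)  # warm-up call
    t0 = time.time()
    model(sample)
    print("Single-sample inference (s):", time.time() - t0)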

Test Matrix for pre-computed kernel in libsvm (scikit learn)

I am using Weisfeiler-Lehman graph kernels from here to get the precomputed kernel for the scikit-learn SVM (see description).
At test time, what should the format of my data be? I'm really confused about that (see dimension requirements).
Thanks very much.
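For what it's worth, scikit-learn's documented convention for kernel='precomputed' is: fit on the (n_train, n_train) Gram matrix K(X_train, X_train) and predict on an (n_test, n_train) matrix K(X_test, X_train), i.e. each test row holds the kernel values between one test graph and all training graphs. A minimal sketch, where train_graphs, test_graphs, y_train, and the kernel function k are placeholders for your own data and Weisfeiler-Lehman kernel:

import numpy as np
from sklearn.svm import SVC

# k(a, b) stands in for the Weisfeiler-Lehman kernel between two graphs
def gram_matrix(A, B, k):
    return np.array([[k(a, b) for b in B] for a in A])

K_train = gram_matrix(train_graphs, train_graphs, k)   # shape (n_train, n_train)
K_test  = gram_matrix(test_graphs,  train_graphs, k)   # shape (n_test,  n_train)

clf = SVC(kernel="precomputed")
clf.fit(K_train, y_train)
y_pred = clf.predict(K_test)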