About the difference between XGBoost and GBRT - xgboost

I'm trying to understand the XGBoost algorithm, and I have several questions:
What are the fundamental differences between XGBoost and the gradient boosting classifier from scikit-learn?
I learned that XGBoost uses Newton's method to optimize the loss function, but I don't understand what happens when the Hessian is not positive definite.
I would be grateful for any help.
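To make the Newton step in the question concrete, here is a minimal sketch for logistic loss. In the XGBoost paper's formulation the "Hessian" is a per-sample scalar h_i (the second derivative of the loss with respect to the current prediction), and the optimal weight of a leaf is w* = -G / (H + lambda), where G and H are the sums of gradients and Hessians in that leaf. The function names below are made up for illustration; this is not XGBoost's internal code.

```python
import numpy as np

def logistic_grad_hess(margin, y):
    """Per-sample first and second derivatives of logistic loss w.r.t. the raw margin."""
    p = 1.0 / (1.0 + np.exp(-margin))   # predicted probability
    grad = p - y                        # g_i
    hess = p * (1.0 - p)                # h_i, never negative for this loss
    return grad, hess

def newton_leaf_weight(grad, hess, reg_lambda=1.0):
    """Leaf weight w* = -G / (H + lambda) for the samples falling in one leaf."""
    return -grad.sum() / (hess.sum() + reg_lambda)

# Four samples in one leaf, all starting from a margin of 0 (p = 0.5).
margin = np.zeros(4)
y = np.array([1.0, 1.0, 1.0, 0.0])
grad, hess = logistic_grad_hess(margin, y)
w = newton_leaf_weight(grad, hess, reg_lambda=1.0)
print(w)  # 0.5
```

For convex losses such as this one, every h_i is non-negative, so with lambda > 0 the denominator stays positive. The `min_child_weight` parameter additionally puts a lower bound on the Hessian sum in a child. With a custom non-convex objective the h_i could become negative, and handling that case would be up to the objective's author.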


How to integrate a PyTorch model into a dynamic optimization, for example in Pyomo or GEKKO

Let's say I have a PyTorch model describing the evolution of some multidimensional system based on its own state x and an external actuator u, so x_(t+1) = f(x_t, u_t), with f being the artificial neural network from PyTorch.
Now I want to solve a dynamic optimization problem to find an optimal sequence of u-values that minimizes an objective depending on x. Something like this:
min sum over all timesteps phi(x_t)
s.t.: x_(t+1) = f(x_t, u_t)
Additionally, I have upper and lower bounds on some of the variables in x.
Is there an easy way to do this using a dynamic optimization toolbox like Pyomo or GEKKO?
I already wrote some code that transforms a feedforward neural network into a NumPy function, which can then be passed as a constraint to Pyomo. The problem with this approach is that it requires significant reprogramming effort every time the structure of the neural network changes, so quick testing becomes difficult. Integrating recurrent neural networks is also difficult, because the hidden cell states would have to be added as additional variables to the optimization problem.
I think a good solution could be to do the function evaluations and gradient calculations in torch and somehow pass the results to the dynamic optimizer. I'm just not sure how to do this.
Thanks a lot for your help!
TensorFlow or PyTorch models can't be directly integrated into GEKKO at the moment. However, I believe you can retrieve the derivatives from TensorFlow or PyTorch, which would allow you to pass them to GEKKO.
There is a GEKKO Brain module with examples in the links below. You can also find an example that uses a GEKKO feedforward neural network for dynamic optimization.
GEKKO Brain Feedforward neural network examples
MIMO MPC example with GEKKO neural network model
A recurrent neural network library for the GEKKO Brain module is currently being developed; it should make it easy to use all of GEKKO's dynamic optimization functions with such models.
In the meantime, you can use a sequential method by wrapping the TensorFlow or PyTorch model in an available optimization solver such as the scipy.optimize module.
Check out the link below for a dynamic optimization example with a Keras LSTM model and scipy.optimize.
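A rough sketch of the sequential (single-shooting) idea described above: the optimizer only sees the actuator sequence u, and each objective evaluation rolls the model forward in time. To keep the example self-contained, `f` here is a small stand-in network with random fixed weights written in NumPy, not a real PyTorch model; the weights, horizon, reference and bounds are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for the trained network: fixed random weights, 2 states + 1 actuator.
rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((8, 3))
W2 = 0.5 * rng.standard_normal((2, 8))

def f(x, u):
    """Hypothetical one-step model x_(t+1) = f(x_t, u_t)."""
    z = np.concatenate([x, [u]])
    return x + 0.1 * (W2 @ np.tanh(W1 @ z))

def rollout(u_seq, x0):
    """Simulate the model forward for a given actuator sequence."""
    xs = [x0]
    for u in u_seq:
        xs.append(f(xs[-1], u))
    return np.array(xs)

def objective(u_seq, x0, x_ref):
    """Sum over all timesteps of phi(x_t), here a squared distance to x_ref."""
    xs = rollout(u_seq, x0)
    return np.sum((xs[1:] - x_ref) ** 2)

x0 = np.array([1.0, -1.0])
x_ref = np.zeros(2)
horizon = 10
u_init = np.zeros(horizon)

# Bounds on u are handled by the solver; gradients come from finite differences.
res = minimize(objective, u_init, args=(x0, x_ref),
               method="L-BFGS-B", bounds=[(-1.0, 1.0)] * horizon)
u_opt = res.x
```

With a real PyTorch model, `f` would call the network, and the `jac` argument of `minimize` could be filled with gradients obtained from autograd instead of finite differences. Bounds on the states x are not covered by this sketch; they would need a penalty term in the objective or a constrained method such as SLSQP.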

Can anybody suggest a package for neural ODEs in TensorFlow?

I tried this : https://github.com/titu1994/tfdiffeq,
but had an issue and cannot proceed further - https://github.com/titu1994/tfdiffeq/issues/10
TensorFlow Probability has differentiable ODE solvers here.
You will get used to the TFP solvers quickly, because the interface is very similar to tfdiffeq.
(But it also has some issues, and I'm having trouble too 😥)

Why use cross-entropy to calculate the loss in a variational autoencoder

I am learning the theory and implementation of the variational autoencoder by reading this.
The documentation says to optimize the following function: log{p(x|z)} + log{p(z)} - log{q(z|x)}. However, I am not able to understand why the implementation in the code uses cross-entropy to calculate log{p(x|z)}. Can someone please explain how cross-entropy is linked to log{p(x|z)}?
Thanks in advance.
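One way to see the link numerically, assuming the decoder outputs an independent Bernoulli probability per pixel and the pixels are binary (an assumption about the tutorial's setup, which is not reproduced here): the log-likelihood log p(x|z) is then exactly the negative of the binary cross-entropy between x and the decoder output. The function names and the sample values below are illustrative only.

```python
import numpy as np

def bernoulli_log_likelihood(x, p):
    """log p(x|z) when pixel i is Bernoulli with probability p[i] of being 1."""
    # Probability assigned to the value that was actually observed.
    per_pixel_prob = np.where(x == 1, p, 1.0 - p)
    return np.sum(np.log(per_pixel_prob))

def binary_cross_entropy(x, p):
    """Summed binary cross-entropy between targets x and predictions p."""
    return -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

x = np.array([1.0, 0.0, 1.0, 1.0])   # binary pixels
p = np.array([0.9, 0.2, 0.6, 0.7])   # decoder output probabilities

log_px_z = bernoulli_log_likelihood(x, p)
bce = binary_cross_entropy(x, p)
print(np.isclose(log_px_z, -bce))  # True
```

So under this assumption, minimizing the cross-entropy term is the same as maximizing log{p(x|z)}.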

Tensorflow: How to perform binary classification as pre-processing and perform linear regression training

In TensorFlow, you can train either a classifier or a linear regression model on your inputs and labels. Is it possible to first classify the inputs (as pre-processing, not necessarily in TensorFlow) and use the result to decide whether to run the TensorFlow linear regression on them?
For example, in an image denoising task, suppose you have found that your linear regression algorithm gives a good smoothing effect around edges but also removes detail from texture objects. You would therefore like to run a binary classification to determine whether an input is a texture object, apply the linear regression only if it is not, and leave texture objects untouched.
I understand that TensorFlow supports transfer learning, so I guess one possible solution is to perform the binary classification in TensorFlow and transfer the "texture classification" knowledge so that the linear regression is applied only when the input is not a texture object. Please correct me if I am wrong, as I am not sure whether this is doable in TensorFlow (if it is, a detailed description of how to do it would be great :-) ).
I guess an alternative solution is to use a binary classifier outside TensorFlow and filter out (remove) the texture inputs before passing the rest to TensorFlow.
Which of these solutions (or any other) is better for this scenario, if they are doable at all? Any suggestions are welcome.
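To illustrate the filtering alternative, here is a minimal sketch in plain NumPy. Both `is_texture` (a variance threshold) and `smooth` (replacing a patch by its mean) are hypothetical stand-ins for a real texture classifier and a trained regression model; the threshold value is arbitrary.

```python
import numpy as np

def is_texture(patch, threshold=0.05):
    """Hypothetical stand-in classifier: high-variance patches count as texture."""
    return patch.var() > threshold

def smooth(patch):
    """Hypothetical stand-in for the trained regression: flatten the patch to its mean."""
    return np.full_like(patch, patch.mean())

def denoise(patches):
    """Smooth only non-texture patches; texture patches pass through unchanged."""
    return [p if is_texture(p) else smooth(p) for p in patches]

flat = np.array([[0.50, 0.52], [0.48, 0.50]])   # nearly uniform, slightly noisy
texture = np.array([[0.0, 1.0], [1.0, 0.0]])    # strong local contrast

result = denoise([flat, texture])
```

Here the flat patch is smoothed to its mean of 0.5, while the texture patch is returned as it was.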

Training complexity of Linear SVM

What is the actual computational complexity of the training phase of an SVM (say, as implemented in LIBSVM)?
Thank you
The training complexity of a nonlinear SVM is generally between O(n^2) and O(n^3), where n is the number of training instances. The following papers are good references:
Support Vector Machine Solvers by Bottou and Lin
SVM-optimization and steepest-descent line search by List and Simon
PS: If you want to use linear kernel, do not use LIBSVM. LIBSVM is a general purpose (nonlinear) SVM solver. It is not an ideal implementation for linear SVM. Instead, you should consider things like LIBLINEAR (by the same authors as LIBSVM), Pegasos or SVM^perf. These have much better training complexity for linear SVM. Training speed can be orders of magnitude better than using LIBSVM.
This is going to depend heavily on the SVM type and kernel. There is a rather technical discussion in http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
For a quick answer, the same paper says to expect it to be about n^2.