When using XGBoost for binary classification, do we need to min-max scale the input features?

Given the retweet dynamics of a tweet over a certain period T, we want to predict whether the tweet will spread widely.
The model we selected is XGBoost.
I have about 10 features. If I apply min-max scaling (or another standard scaling method), the model's performance improves. If I skip that step, it does not.
However, many people have told me that scaling is unnecessary for XGBoost.
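The usual argument behind "no scaling needed" is that a tree split depends only on the ordering of a feature's values, and min-max scaling preserves that ordering. Here is a minimal sketch of that argument in plain Python, using a single hand-written decision stump rather than XGBoost itself. The data and the helper names (`min_max_scale`, `best_split_partition`) are made up for illustration. This only covers tree-based boosters; a linear booster such as `gblinear` is a different case.

```python
def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def side_errors(indices, labels):
    """Mistakes made when one side of a split predicts its majority label."""
    ones = sum(labels[i] for i in indices)
    return min(ones, len(indices) - ones)


def best_split_partition(values, labels):
    """Return the set of sample indices sent left by the best threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for k in range(1, len(order)):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold can separate two equal values
        left, right = order[:k], order[k:]
        errors = side_errors(left, labels) + side_errors(right, labels)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left


# Hypothetical feature: retweet counts in the first hour, with a viral flag.
retweets = [3, 150, 12, 980, 45, 7, 600, 30]
went_viral = [0, 1, 0, 1, 0, 0, 1, 0]

raw_partition = best_split_partition(retweets, went_viral)
scaled_partition = best_split_partition(min_max_scale(retweets), went_viral)
print(raw_partition == scaled_partition)
```

The threshold value changes after scaling, but the samples on each side of the split do not. If scaling still changes your scores, it is worth checking whether something else differs between the two runs, for example the train/test split, the random seed, or the booster type.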


Does knowledge distillation have an ensemble effect?

I don't know much about knowledge distillation.
I have one question.
I have a model that reaches 99% performance on a 10-class image classification task, but I can't switch to a bigger model because I have to keep the inference time down.
Would I get an ensemble effect if I train it with knowledge distillation from another, bigger model?
Or let me know if there is any other way to improve performance.
The technical answer is no. KD is a different technique from ensembling.
But they are related in the sense that KD was originally proposed to distill larger models, and the authors specifically cite ensemble models as the type of larger model they experimented on.
Net net, give KD a try on your big model to see if you can keep a lot of the performance of the bigger model at the size of the smaller model. I have found empirically that you can retain 75%-80% of the power of a 5x larger model after distilling it down to the smaller model.
From the abstract of the KD paper:
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
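
To make the technique concrete, here is a minimal sketch of the per-example distillation loss in plain Python, assuming the common formulation: a temperature-softened cross-entropy against the teacher's outputs, combined with the ordinary cross-entropy against the true label. The `temperature` and `alpha` defaults and the example logits are arbitrary assumptions, not recommended settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term."""
    # Soft term: cross-entropy between the softened teacher and student outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_soft, student_soft))
    # Hard term: ordinary cross-entropy against the true class at temperature 1.
    hard = -math.log(softmax(student_logits)[true_label])
    # The soft term is multiplied by T^2 to keep its gradient scale comparable.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


teacher = [5.0, 1.0, 0.5]          # hypothetical teacher logits, 3 classes
agreeing_student = [5.0, 1.0, 0.5]
disagreeing_student = [0.5, 1.0, 5.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_label=0)
print(loss_agree < loss_disagree)
```

In a real setup the teacher logits come from the big model (or an averaged ensemble) and the student is trained by minimizing this loss over the training set. The student's inference cost stays the same, which is the constraint in the question.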

AutoML select the model manually

I have been looking for a while for the best pipeline to do classification using AutoML. I want to know whether it is possible to select the model manually and then optimize only its hyperparameters. For example, I want to optimize only an SVM's hyperparameters and don't care about other models.
You can optimize only the selected model in MLJAR AutoML. It is an open-source AutoML package with code available on GitHub: https://github.com/mljar/mljar-supervised
The example code will look like:
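from supervised.automl import AutoML  # provided by the mljar-supervised package
# Restrict the search to one algorithm; X and y are your features and labels
automl = AutoML(algorithms=["Xgboost"], mode="Compete")
automl.fit(X, y)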
The above code will tune only the Xgboost algorithm. The mode Compete is needed because the MLJAR AutoML can work in three modes: Explain, Perform, and Compete. Algorithms available in MLJAR AutoML: Baseline, Linear, Random Forest, Extra Trees, Decision Tree, Neural Networks, Nearest Neighbors, Xgboost, LightGBM, CatBoost.
I'm the author of MLJAR AutoML, and I'll be happy to help you set it up and run it.

How to extract influential features from ANN?

I am working on an ANN in Keras with 12 inputs (input layer), two hidden layers of 6 ReLU units each, and 1 sigmoid output unit, and I want to know which input feature most influences the output.
How to measure the importance of inputs is discussed clearly at the link below:
You can always remove different factors from the input, then train and test the neural network. Removing the most significant features will result in the biggest decline in classification accuracy. Of course, this method is not precise because removing inputs will change the NN architecture, and thus its properties.
Link to extract influential features
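
The remove-and-retrain idea can be sketched as follows. To keep it dependency-free, this uses a tiny logistic regression written in plain Python as a stand-in for the Keras network, plus synthetic data in which feature 0 is built to matter most. Both are assumptions made for illustration. With a real network you would rebuild and retrain the model for each reduced input set.

```python
import math
import random


def train_logistic(X, y, epochs=300, lr=0.5):
    """Fit logistic regression by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose predicted class matches the label."""
    hits = 0
    for row, target in zip(X, y):
        z = b + sum(wi * xi for wi, xi in zip(w, row))
        hits += int((z > 0) == (target == 1))
    return hits / len(X)


# Synthetic data: the label depends mostly on feature 0, weakly on feature 1.
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
y = [1 if row[0] + 0.2 * row[1] > 0 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]


def fit_and_score(columns):
    """Train on the given feature columns only; return held-out accuracy."""
    def keep(rows):
        return [[row[j] for j in columns] for row in rows]
    w, b = train_logistic(keep(X_train), y_train)
    return accuracy(w, b, keep(X_test), y_test)


baseline = fit_and_score([0, 1, 2])
drops = {}
for removed in range(3):
    kept = [j for j in range(3) if j != removed]
    drops[removed] = baseline - fit_and_score(kept)

most_influential = max(drops, key=drops.get)
print(baseline, drops, most_influential)
```

The feature whose removal costs the most held-out accuracy is ranked as the most influential. As noted above, this is only a rough measure, because each reduced model is a different model.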

Can Tensorflow Wide and Deep model train to continuous values

I am working with the TensorFlow Wide and Deep model. It currently trains on a binary classification task (>50K or not).
Can this model be coerced to train directly against numeric values to produce more precise (if less accurate) predictions?
I have seen an example of using LSTM RNNs to make such predictions using TensorFlowEstimator directly here, but DNNLinearCombinedClassifier will not accept n_classes=0.
I like the structure of the Wide and Deep model, especially the ability to run the linear regression and the DNN separately to determine how learnable the data is, but my application involves data that clusters in an overlapping, input-dependent fashion.
Use DNNLinearCombinedRegressor for regression problems.

One-class classification - interpreting the model's accuracy

I am using LIBSVM for classification of data. I am mainly doing One Class Classification.
My training set consists of data from only one class, and my testing data consists of data from two classes (one that belongs to the target class and one that does not).
After applying svmtrain and svmpredict to both the training and testing datasets, the accuracy is 48% on the training set and 34.72% on the testing set.
Is it good? How can I know whether LIBSVM is classifying the datasets correctly?
Whether it is good or not depends entirely on the data you are trying to classify. You should look up the state-of-the-art accuracy of SVM models for your kind of classification problem, and then you will be able to tell whether your model is good.
What I can say from your results is that the testing accuracy is worse than the training accuracy, which is normal, as a classifier usually performs better on data it has already seen.
What you can try now is to play with the regularization parameter (C for a standard C-SVC; note that LIBSVM's one-class SVM is controlled by nu instead) and see if the performance improves on the testing set.
You can also plot learning curves to see whether your classifier overfits, which will help you decide whether to increase or decrease the regularization.
In your case, you might want to apply class weights, as such data is often imbalanced in favor of negative examples.
To find out whether LIBSVM is classifying the dataset correctly, look at which examples it predicted correctly and which ones it predicted incorrectly. Then you can try changing your features to improve its results.
If you are worried about whether your code is correct, you can write a toy example and play with it, or take an example someone has published on the web and replicate their results.
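One way to look at the correct and incorrect predictions is to break the single accuracy number into per-class counts. Below is a minimal sketch in plain Python, assuming labels of +1 for the target class and -1 for everything else (the convention LIBSVM's one-class SVM uses for its predictions). The example label lists are made up for illustration and are not taken from the question.

```python
def one_class_report(y_true, y_pred):
    """Confusion counts and rates for labels +1 (target) and -1 (outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # targets accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # targets rejected
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # outliers accepted
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)  # outliers rejected
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "target_recall": tp / (tp + fn) if tp + fn else 0.0,
        "outlier_rejection": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical test set: 4 target examples followed by 6 non-target examples.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, 1, 1, 1, -1, -1, 1]

report = one_class_report(y_true, y_pred)
print(report)
```

A low overall accuracy can hide very different failure modes. Rejecting most target examples points at a decision boundary that is too tight, while accepting most outliers points at one that is too loose, so the two rates are more informative than the single percentage.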