XGBoost feature importance in R

There are 8 input columns and binary classification labels.
I have built an XGBoost classification model with them, and its optimized colsample_bytree hyperparameter was 0.75,
so only 6 of the 8 inputs should have been used in this model.
But when I check the model's feature importance, all 8 inputs appear and every one of them has a frequency score.
I don't understand this: if only 6 inputs are used, why do all 8 columns affect the model?
Here's the code:
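The question is about R, but the mechanism is the same in any binding; the following is a hedged Python sketch with synthetic data (not the asker's code) that reproduces the behavior:

import numpy as np
import xgboost as xgb

# Synthetic stand-in for the described data: 8 input columns, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(8)])
params = {"objective": "binary:logistic", "colsample_bytree": 0.75}
booster = xgb.train(params, dtrain, num_boost_round=50)

# colsample_bytree=0.75 samples 6 of the 8 columns anew for each tree,
# not once for the whole model, so across 50 trees every column can be
# picked at least once and ends up with a nonzero frequency score.
print(booster.get_score(importance_type="weight"))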

Related

How to prioritise certain output in MultiOutput LSTM Tensorflow?

Basically, I am creating an LSTM model with TensorFlow, and the shape of my input data is something like
(10000 users, 6 timesteps, 20 feature columns) => (10000, 6, 20)
The model does binary classification using LSTM with 20 output columns, giving an output shape of (10000, 20).
PS: I'm not doing classification with 20 classes; I'm doing a classification that gives 20 binary outputs for each person.
Is it possible to prioritise certain output columns, e.g. by giving some columns more weight or importance than others, so that during training the model punishes incorrect predictions on these more important output columns more heavily? Or would it make more sense to create separate models for these important columns?
It's easy to use class weights with TensorFlow for this purpose. See the class_weight parameter for model.fit(): https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
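If the 20 binary outputs come from a single output tensor, another option is to weight the loss per column directly. A minimal sketch (the column indices and weight values below are made up, not from the question):

import tensorflow as tf

# Hypothetical weights for the 20 binary output columns: columns 3 and 7
# are treated as twice as important as the rest.
column_weights = tf.constant([2.0 if i in (3, 7) else 1.0 for i in range(20)])

def weighted_bce(y_true, y_pred):
    # Expanding the last axis keeps Keras from averaging over the 20
    # columns before we can weight them; bce has shape (batch, 20).
    bce = tf.keras.losses.binary_crossentropy(
        tf.expand_dims(y_true, -1), tf.expand_dims(y_pred, -1))
    return tf.reduce_mean(bce * column_weights, axis=-1)

# model.compile(optimizer="adam", loss=weighted_bce, metrics=["binary_accuracy"])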

Which loss function & metrics are more suitable for multi-label classification: binary or categorical cross-entropy, and why?

To my knowledge (please correct me if I'm wrong):
Multi-label classification (mutually inclusive): samples may have more than one correct label (for example, movie genres, disease detection, etc.).
Multi-class classification (mutually exclusive): samples always have exactly one correct label (for example, cat vs. dog, object detection, etc.); this includes binary classification.
Assume the output is one-hot encoded.
What loss function and metrics should one use for these two types?
                  loss function         metrics
1. multi-label    binary, categorical   binary_accuracy, TopKCategoricalAccuracy, categorical_accuracy, AUC
2. multi-class    binary                binary_accuracy, f1, recall, precision
Please tell me which entries in the table above are suitable, which are wrong, and why.
If you are doing multi-class classification and the labels (y) are one-hot encoded, use categorical cross-entropy as the loss function and the Adam optimizer (it is suitable for most cases). Also, in multi-class classification the number of output nodes should be the same as the number of classes (labels). Say your model is going to classify the input into 4 classes; you can configure the output layer as follows:
from tensorflow.keras.layers import Dense
model.add(Dense(4, activation="softmax"))
Note that softmax activation should be used in the output layer for multi-class classification problems.
In case your y is not one-hot encoded, I would advise choosing sparse categorical cross-entropy as the loss function; no other changes are necessary.
Also, I usually split the data into train and test sets and feed them to the model like this to get the accuracy at each epoch:
history = model.fit(train_data, validation_data=test_data, epochs=10)
Hope this solves your problem.
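For the multi-label case, which the answer above does not cover, the standard setup is independent sigmoid outputs trained with binary cross-entropy. A minimal sketch (the label count, layer sizes, and input dimension are made-up values):

import tensorflow as tf
from tensorflow.keras import layers

num_labels = 5  # made-up number of non-exclusive labels

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),  # 20 input features, assumed
    # One independent sigmoid per label, since labels are not mutually exclusive.
    layers.Dense(num_labels, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy", tf.keras.metrics.AUC()])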

Seq2Seq prediction speed is slow

I'm trying to implement a Seq2Seq model using LSTMs in TensorFlow (from scratch, without the built-in RNN cells), and the model works fine, but prediction takes about 2 to 6 seconds per sentence, which is slow for me. Is that normal?
My model:
2 LSTM layers for encoding
2 LSTM layers for decoding
Attention mechanism
Vocabulary: 400k
Word vector dimension: 300
The prediction code runs entirely on the CPU.
I have read some papers, but they don't report prediction speed. Thank you!

Adjust the number of classifications in the existing model provided by TensorFlow

I have tried Denny Britz's code for CNN text classification in TensorFlow. Currently, the number of text classes is fixed at whatever the existing model provides; for example, the model has 8 classification types, but I wish to increase or decrease that number. Is there a way to make this work?
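In general, the class count is only baked into the final layer: its unit count and the width of the one-hot label matrix both have to match the new number of classes. A generic hedged sketch in plain Keras (not Denny Britz's actual code; all sizes are made up):

import tensorflow as tf
from tensorflow.keras import layers

num_classes = 5  # change this to the desired number of text categories

model = tf.keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=128),  # vocab size assumed
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    # The only layer that has to change with the class count:
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

The training labels then need num_classes columns as well (e.g. one-hot encoded).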

Tensorflow Loss for Non-Independent Classes

I am using a TensorFlow network for classification between classes that are similar to their neighboring classes, i.e. not independent. For example, say we want to predict among 10 classes, but the predictions are not merely "correct" or "incorrect": if the correct class is 7 and the network predicts 6, the loss should be smaller than if the network predicted 5, because 6 is closer to the correct answer than 5 is. My understanding is that cross entropy with one-hot vectors provides an "all or nothing" loss rather than a "continuous" loss that reflects the magnitude of the error. If that is correct, how does one implement such a continuous loss in TensorFlow?
---- Update June 13, 2016 ----
An example application might be color recognition. If the network predicts "green" but the true color is yellow-green, then the loss should be less than if the network predicted blue because green is a better prediction than blue.
You can choose to implement a continuous quantity (e.g. hue from HSV) as a single output, and construct your own loss calculation that reflects what you want to optimize. In that case you'd just have a single output value that ranges between 0.0 and 1.0, and the loss would be evaluated based on the distance from the labeled value.
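A minimal sketch of that suggestion (the network sizes and the RGB-style input are assumptions, not from the answer):

import tensorflow as tf

def distance_loss(y_true, y_pred):
    # Penalize by how far the prediction is from the label, so predicting
    # 0.6 when the truth is 0.7 costs less than predicting 0.3.
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(3,)),  # e.g. an RGB pixel
    tf.keras.layers.Dense(1, activation="sigmoid"),  # hue-like value in [0, 1]
])
model.compile(optimizer="adam", loss=distance_loss)

As written this is just mean absolute error (loss="mae" would behave the same); spelling it out shows where a task-specific distance, such as a circular distance for hue, would plug in.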