Why does the best parameter combination found by grid search perform worse than the default, even though the default was included in the search space - xgboost

I have used the steps given in Analytics Vidhya. In one of the steps it has the following:
param_test4 = {
    'subsample': [i/10.0 for i in range(6, 10)],
    'colsample_bytree': [i/10.0 for i in range(6, 10)]
}
gsearch4 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1,
                            n_estimators=177,
                            max_depth=4,
                            min_child_weight=6,
                            gamma=0,
                            subsample=0.8,
                            colsample_bytree=0.8,
                            objective='binary:logistic',
                            nthread=4,
                            scale_pos_weight=1,
                            seed=27),
    param_grid=param_test4,
    scoring='roc_auc',
    n_jobs=4,
    iid=False,
    cv=5)
gsearch4.fit(train[predictors], train[target])
gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_
I have tried this code (on my own data). Before running the grid search over param_test4, I fit the model directly with subsample=0.8 and colsample_bytree=0.8. Even though these two values are among the candidates listed in param_test4, the configuration returned by the grid search performs worse than simply using subsample=0.8 and colsample_bytree=0.8. I was wondering if anyone has seen this problem/situation before.
Thanks
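In case it helps with the comparison: one way to make it apples-to-apples is to score the fixed subsample=0.8 / colsample_bytree=0.8 configuration with the same cross-validation and metric that the grid search uses. A minimal sketch (train, predictors and target are the variables from the question; the rest simply mirrors the estimator settings above and is an assumption about your setup):

from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Same fixed settings as before the grid search, evaluated with the same
# 5-fold CV and ROC AUC metric that GridSearchCV uses above.
baseline = XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                         min_child_weight=6, gamma=0, subsample=0.8,
                         colsample_bytree=0.8, objective='binary:logistic',
                         nthread=4, scale_pos_weight=1, seed=27)
scores = cross_val_score(baseline, train[predictors], train[target],
                         scoring='roc_auc', cv=5)
print(scores.mean())  # compare against gsearch4.best_score_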

Related

Reload Keras-Tuner Trials from the directory

I'm trying to reload or access the Keras-Tuner Trials after the Tuner's search has completed, so I can inspect the results. I'm not able to find any documentation or answers related to this issue.
For example, I set up BayesianOptimization to search for the best hyper-parameters as follows:
## Build Hyper Parameter Search
tuner = kt.BayesianOptimization(build_model,
                                objective='val_categorical_accuracy',
                                max_trials=10,
                                directory='kt_dir',
                                project_name='lstm_dense_bo')
tuner.search((X_train_seq, X_train_num), y_train_cat,
             epochs=30,
             batch_size=64,
             validation_data=((X_val_seq, X_val_num), y_val_cat),
             callbacks=[callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                                restore_best_weights=True)])
I see this creates trial files in the directory kt_dir under the project name lstm_dense_bo.
Now, if I restart my Jupyter kernel, how can I reload these trials into a Tuner object and subsequently inspect the best model or the best hyperparameters or the best trial?
I'd very much appreciate your help. Thank you
I was trying to do the same thing. I was looking into the keras docs for an easier way than this but could not find one - so if any other SO-ers have a better idea, please let us know!
Load the previous tuner. Make sure overwrite=False or else you'll delete your trials.
workdir = "mlp_202202151345"
obj = "val_recall"
tuner = kt.Hyperband(
    hypermodel=build_model,
    metrics=metrics,
    objective=kt.Objective(obj, direction="max"),
    executions_per_trial=1,
    overwrite=False,
    directory=workdir,
    project_name="keras_tuner",
)
Look for a trial you want to load. Note that TensorBoard works really well for this. In this example, I'm loading 1a38ebaba07b77501999cb1c4ab9413e.
Here's the part that I could not find in Keras docs. This might be dependent on the tuner you use (I am using Hyperband):
trial = tuner.oracle.get_trial('1a38ebaba07b77501999cb1c4ab9413e')
This returns a Trial object (also something I could not find in the docs). The Trial object has a hyperparameters attribute that holds that trial's hyperparameters. Now:
tuner.hypermodel.build(trial.hyperparameters)
Gives you the trial's model for training, evaluation, predictions, etc.
NOTE This seems convoluted and hacky; would love to see a better way.
j7skov has correctly mentioned that you need to reload the previous tuner and set the parameter overwrite=False (so that the tuner will not overwrite the already generated trials).
Further, if you want to load the K best models, you can use the tuner's get_best_models method as below:
# This will load 10 best hyper tuned models with the weights
# corresponding to their best checkpoint (at the end of the best epoch of best trial).
best_model_count = 10
bo_tuner_best_models = tuner.get_best_models(num_models=best_model_count)
Then you can access a specific best model as below
best_model_id = 7
model = bo_tuner_best_models[best_model_id]
This method is for querying the models trained during the search. For best performance, it is recommended to retrain your model on the full dataset using the best hyperparameters found during the search, which can be obtained using tuner.get_best_hyperparameters().
tuner_best_hyperparameters = tuner.get_best_hyperparameters(num_trials=best_model_count)
best_hp = tuner_best_hyperparameters[best_model_id]
model = tuner.hypermodel.build(best_hp)
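As a quick follow-up, a minimal retraining sketch (reusing the training arrays from the question; the epoch and batch settings here are just illustrative placeholders, not from the original answer):

# Retrain the rebuilt model on your full dataset with the best hyperparameters
# (illustrative settings; adjust to your own data and training regime).
model = tuner.hypermodel.build(best_hp)
model.fit((X_train_seq, X_train_num), y_train_cat, epochs=30, batch_size=64)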
If you just want to display the hyperparameters for the K best models, use the tuner's results_summary method as below:
tuner.results_summary(num_trials=best_model_count)
For further reference visit this page.
Inspired by j7skov, I found that the models can be reloaded by manipulating tuner.oracle.trials and tuner.load_model.
By assigning tuner.oracle.trials to a variable, we can see that it is a dict object containing all relevant trials in the tuning process. The keys of the dictionary are the trial_id values, and the values of the dictionary are instances of the Trial object.
Alternatively, we can return the best few trials by using tuner.oracle.get_best_trials.
To inspect the hyperparameters of the trial, we can use the summary method of the instance.
To load the model, we can pass the trial instance to tuner.load_model.
Beware that different versions can lead to incompatibilities.
For example the directory structure is a little different between keras-tuner==1.0 and keras-tuner==1.1 as far as I know.
Using your example, the workflow may be summarized as follows.
# Recreate the tuner object
tuner = kt.BayesianOptimization(build_model,
                                objective='val_categorical_accuracy',
                                max_trials=10,
                                directory='kt_dir',
                                project_name='lstm_dense_bo',
                                overwrite=False)

# Return all trials from the oracle
trials = tuner.oracle.trials

# Print out the ID and the score of all trials
for trial_id, trial in trials.items():
    print(trial_id, trial.score)

# Return best 5 trials
best_trials = tuner.oracle.get_best_trials(num_trials=5)

for trial in best_trials:
    trial.summary()
    model = tuner.load_model(trial)
    # Do some stuff to the model
Using
tuner = kt.BayesianOptimization(build_model,
                                objective='val_categorical_accuracy',
                                max_trials=10,
                                directory='kt_dir',
                                project_name='lstm_dense_bo')
will load the tuner again.

In Tensorflow-Serving, is it possible to get only the top-k prediction results?

When using the code in https://www.tensorflow.org/serving, but with a DNNClassifier Estimator model, the curl/query request returns all the possible label classes and their associated scores.
Using a model with 100,000+ possible output/label classes, the response becomes too large. Is there any way to limit the number of outputs to the top-k results? (Similar to how it can be done in keras).
The only possibility I could think of is feeding some parameter into the predict API through the signatures, but I haven't found any parameters that would give this functionality. I've read through a ton of documentation + code and googled a ton, but to no avail.
Any help would be greatly appreciated. Thanks in advance for any responses. <3
AFAIC, there are 2 ways to support your need.
You could add some lines in tensorflow-serving source code referring to this
You could do something like this while training/retraining your model.
Hope this will help.
Putting this up here in case it helps anyone. It's possible to override the classification_output() function in head.py (which is used by dnn.py) in order to filter the top-k results. You can insert this snippet into your main.py / train.py file, and whenever you save a DNNClassifier model, that model will always output at most num_top_k_results when doing inference/serving. The vast majority of the method is copied from the original classification_output() function. (Note this may or may not work with 1.13 / 2.0 as it hasn't been tested on those.)
import tensorflow as tf
from tensorflow.python.estimator.canned import head as head_lib
from tensorflow.python.ops import array_ops, math_ops, string_ops
from tensorflow.python.saved_model import export_output

num_top_k_results = 5

def override_classification_output(scores, n_classes, label_vocabulary=None):
    batch_size = array_ops.shape(scores)[0]
    if label_vocabulary:
        export_class_list = label_vocabulary
    else:
        export_class_list = string_ops.as_string(math_ops.range(n_classes))
    # Get the top_k results
    top_k_scores, top_k_indices = tf.nn.top_k(scores, num_top_k_results)
    # Using the top_k_indices, get the associated class names (from the vocabulary)
    top_k_classes = tf.gather(tf.convert_to_tensor(value=export_class_list),
                              tf.squeeze(top_k_indices))
    export_output_classes = array_ops.tile(
        input=array_ops.expand_dims(input=top_k_classes, axis=0),
        multiples=[batch_size, 1])
    return export_output.ClassificationOutput(
        scores=top_k_scores,
        # `ClassificationOutput` requires string classes.
        classes=export_output_classes)

# Override the original method with our custom one.
head_lib._classification_output = override_classification_output

binary classification target specifically on false positive

I got a little confused when using models from sklearn: how do I set a specific optimization function? For example, when RandomForestClassifier is used, how do I let the model 'know' that I want to maximize 'recall', 'F1 score', or 'AUC' instead of 'accuracy'?
Any suggestions? Thank you.
What you are looking for is Parameter Tuning. Basically, first you select an estimator, then you define a hyper-parameter space (i.e. all possible parameters and their respective values that you want to tune), a cross-validation scheme and a scoring function. Depending on how you want to search the parameter space, you can choose one of the following:
Exhaustive Grid Search
In this approach, sklearn creates a grid of all possible combinations of hyper-parameter values defined by the user using the GridSearchCV method. For instance:
my_clf = DecisionTreeClassifier(random_state=0, class_weight='balanced')
# Note: the `classifier__` prefix assumes the estimator is wrapped in a
# Pipeline step named 'classifier'; drop the prefix for a bare estimator.
param_grid = dict(
    classifier__min_samples_split=[5, 7, 9, 11],
    classifier__max_leaf_nodes=[50, 60, 70, 80],
    classifier__max_depth=[1, 3, 5, 7, 9]
)
In this case, the grid specified is a cross-product of values of classifier__min_samples_split, classifier__max_leaf_nodes and classifier__max_depth. The documentation states that:
The GridSearchCV instance implements the usual estimator API: when “fitting” it on a dataset all the possible combinations of parameter values are evaluated and the best combination is retained.
An example of using GridSearchCV:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.metrics import make_scorer, f1_score

# Create a classifier
clf = LogisticRegression(random_state=0)

# Cross-validate the dataset
cv = StratifiedKFold(n_splits=n_splits).split(features, labels)

# Declare the hyper-parameter grid (no `classifier__` prefix needed here,
# since clf is not wrapped in a Pipeline)
param_grid = dict(
    tol=[1.0, 0.1, 0.01, 0.001],
    C=np.power(10.0, np.arange(-3, 2)).tolist(),
    solver=['newton-cg', 'lbfgs', 'liblinear', 'sag'],
)

# Perform grid search using the classifier, parameter grid, scoring function
# and the cross-validated dataset
grid_search = GridSearchCV(clf, param_grid=param_grid, verbose=10,
                           scoring=make_scorer(f1_score), cv=list(cv))
grid_search.fit(features.values, labels.values)

# To get the best score using the specified scoring function
print(grid_search.best_score_)

# Similarly, to get the best estimator
best_clf = grid_search.best_estimator_
print(best_clf)
You can read more in its documentation here to learn about the various internal methods for retrieving the best parameters, etc.
Randomized Search
Instead of exhaustively checking the hyper-parameter space, sklearn implements RandomizedSearchCV to do a randomized search over the parameters. The documentation states that:
RandomizedSearchCV implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values.
You can read more about it from here.
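For illustration, a minimal sketch of a randomized search (the distributions, n_iter and scoring choices here are assumptions rather than part of the original answer; features and labels are the same variables as in the grid search example above):

from scipy.stats import uniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

clf = LogisticRegression(random_state=0)
# Sample C from a continuous distribution instead of a fixed grid.
param_distributions = dict(
    C=uniform(loc=0.001, scale=10),
    solver=['newton-cg', 'lbfgs', 'liblinear', 'sag'],
)
random_search = RandomizedSearchCV(clf, param_distributions=param_distributions,
                                   n_iter=20, scoring='f1', cv=5, random_state=0)
random_search.fit(features.values, labels.values)
print(random_search.best_score_, random_search.best_params_)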
You can read more about other approaches here.
Alternative link for reference:
How to Tune Algorithm Parameters with Scikit-Learn
What is hyperparameter optimization in machine learning in formal terms?
Grid Search for hyperparameter and feature selection
Edit: In your case, if you want to maximize the recall of the model, you simply specify recall_score from sklearn.metrics (wrapped with make_scorer), or equivalently scoring='recall', as the scoring function.
If you wish to optimize with respect to 'False Positives' as stated in your question, you can refer to this answer to extract the 'False Positives' from the confusion matrix. Then use the make_scorer function and pass the resulting scorer to the GridSearchCV object for tuning.
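A minimal sketch of that idea (the particular scoring function below, which rewards fewer false positives, and the RandomForestClassifier grid are illustrative choices, not from the original answer):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV

def fewer_false_positives(y_true, y_pred):
    # Extract false positives from the confusion matrix and return a score
    # that is higher when there are fewer of them.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return -fp

fp_scorer = make_scorer(fewer_false_positives, greater_is_better=True)

param_grid = {'max_depth': [3, 5, 7], 'n_estimators': [100, 200]}
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid=param_grid, scoring=fp_scorer, cv=5)
# grid.fit(X, y)  # X, y stand in for your own training data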
I would suggest you grab a cup of coffee and read (and understand) the following
http://scikit-learn.org/stable/modules/model_evaluation.html
You need to use something along the lines of
cross_val_score(model, X, y, scoring='f1')
possible choices are (check the docs)
['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score',
'average_precision', 'completeness_score', 'explained_variance',
'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted',
'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score',
'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error',
'neg_mean_squared_log_error', 'neg_median_absolute_error',
'normalized_mutual_info_score', 'precision', 'precision_macro',
'precision_micro', 'precision_samples', 'precision_weighted', 'r2',
'recall', 'recall_macro', 'recall_micro', 'recall_samples',
'recall_weighted', 'roc_auc', 'v_measure_score']
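For instance, tying this back to the RandomForestClassifier from the question, a minimal sketch (X and y stand in for your own data; they are not defined in the original answer):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

model = RandomForestClassifier(random_state=0)
# Score the model by F1 instead of the default accuracy.
scores = cross_val_score(model, X, y, scoring='f1', cv=5)
print(scores.mean())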
Have fun
Umberto

Reinforcement learning a3c with multiple independent outputs

I am attempting to modify and implement Google's pattern of the Asynchronous Advantage Actor Critic (A3C) model. There are plenty of examples online that have gotten me started, but I am running into issues attempting to expand the samples.
All of the examples I can find focus on Pong, which has a state-based output of left, right, or stay still. What I am trying to expand this to is a system that also has a separate on/off output. In the context of Pong, that would be a boost to your speed.
The code I am basing my code on can be found here. It is playing doom, but it still has the same left and right but also a fire button instead of stay still. I am looking at how I could modify this code such that fire was an independent action from movement.
I know I can easily add another separate output from the model so that the outputs would look something like this:
self.output = slim.fully_connected(rnn_out, a_size,
                                   activation_fn=tf.nn.softmax,
                                   weights_initializer=normalized_columns_initializer(0.01),
                                   biases_initializer=None)
self.output2 = slim.fully_connected(rnn_out, 1,
                                    activation_fn=tf.nn.sigmoid,
                                    weights_initializer=normalized_columns_initializer(0.01),
                                    biases_initializer=None)
The thing I am struggling with is how I then have to modify the value output and redefine the loss function. Is the value still tied to the combination of the two outputs, or is there a separate value output for each independent output? I feel like it should still only be one value output, but I am unsure how I then use that one value and modify the loss function to take this into account.
I was thinking of adding a separate term to the loss function so that the calculation would look something like this:
self.actions_1 = tf.placeholder(shape=[None], dtype=tf.int32)
self.actions_2 = tf.placeholder(shape=[None], dtype=tf.float32)
self.actions_onehot = tf.one_hot(self.actions_1, a_size, dtype=tf.float32)
self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)

self.responsible_outputs = tf.reduce_sum(self.output1 * self.actions_onehot, [1])
self.responsible_outputs_2 = tf.reduce_sum(self.output2 * self.actions_2, [1])

# Loss functions
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy))
self.policy_loss = (-tf.reduce_sum(tf.log(self.responsible_outputs) * self.advantages)
                    - tf.reduce_sum(tf.log(self.responsible_outputs_2) * self.advantages))
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01
I am looking to know if I am on the right track here, or if there are resources or examples that I can expand off of.
First of all, the example you are mentioning doesn't need two output nodes; one output node with a continuous output value is enough. Also, you shouldn't use a placeholder for the advantage; rather, you should use one for the discounted reward and derive the advantage from it.
self.discounted_reward = tf.placeholder(shape=[None],dtype=tf.float32)
self.advantages = self.discounted_reward - self.value
Also, while calculating the policy loss, you have to use tf.stop_gradient to prevent the value node's gradients from feeding back into the policy learning.
self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs)*tf.stop_gradient(self.advantages))
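Putting both suggestions together with the two-headed loss from the question, a rough TF1-style sketch (untested, and only one of several ways to wire it up) might look like:

# Advantage derived from the discounted reward and the single value head.
self.discounted_reward = tf.placeholder(shape=[None], dtype=tf.float32)
self.advantages = self.discounted_reward - tf.reshape(self.value, [-1])

# Value loss is the squared advantage; the policy terms treat the advantage
# as a constant via tf.stop_gradient, so value gradients don't leak in.
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.advantages))
self.policy_loss = (-tf.reduce_sum(tf.log(self.responsible_outputs)
                                   * tf.stop_gradient(self.advantages))
                    - tf.reduce_sum(tf.log(self.responsible_outputs_2)
                                    * tf.stop_gradient(self.advantages)))
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01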

Selective patterns with Matplotlib imshow without using patches

Is there a way to place patterns into selected areas on an imshow graph? To be precise, I need to make it so that, in addition to the numerical-data-carrying colored squares, I also have different patterns in other squares indicating different failure modes of the experiment (and also generate a key explaining the meaning of these different patterns). An example of a pattern that would be useful would be various types of crosshatching. I need to be able to do this without disrupting the main color-numerical data relationship on the graph.
Due to the back-end I am working within for the GUI containing the graph, I cannot utilize patches (they fail to pickle and make it from the back-end to the front-end via the multiprocessing package). I was wondering if anyone knew of another way to do this.
grid = np.ma.array(grid, mask=np.isnan(grid))
ax.imshow(grid, interpolation='nearest', aspect='equal',
          vmax=private.vmax, vmin=private.vmin)
# Up to here works fine and draws the graph showing only the data,
# with white spaces for any point that failed
if show_fail and faildat != []:
    faildat = faildat[np.lexsort((faildat[:, yind], faildat[:, xind]))]
    fails = []
    for i in range(len(faildat)):  # gives coordinates with failures as (x, y)
        fails.append((faildat[i, 1], faildat[i, 0]))
    for F in fails:
        ax.FUNCTION NEEDED HERE
ax.minorticks_off()
ax.set_xticks(range(len(placex)))
ax.set_yticks(range(len(placey)))
ax.set_xticklabels(placex)
ax.set_yticklabels(placey, rotation=0)
ax.colorbar()
ax.show()
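For what it's worth, one patch-free possibility (an assumption on my part, not something confirmed to work with this particular GUI back-end) is to overlay line-marker glyphs at the failure coordinates, since Line2D objects avoid the patch machinery entirely. A minimal sketch for the loop above:

# Hypothetical replacement for "ax.FUNCTION NEEDED HERE":
# draw an 'x' glyph centred on each failed cell; the marker style could
# encode the failure mode, and a legend could serve as the key.
for F in fails:
    ax.plot(F[0], F[1], marker='x', markersize=12,
            markeredgewidth=2, color='black', linestyle='none')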