Hyperopt best param score interpretation and usage - xgboost

I have a question about interpreting the Hyperopt score output and the best-params output. After running fmin (as shown in the example below), it reported best loss: -0.0921409214092141, while the last printed SCORE was 0.06775067750677506. However, when I used the generated best params to fit the model, it also gave a score of 0.06775067750677506.
Shouldn't I be expecting 0.0921409214092141, or am I interpreting it wrongly? I passed the parameter dictionary as XGBClassifier(**param_dict) when fitting with the params generated by Hyperopt.
best_params = fmin(fn=objective,
                   space=space,
                   algo=tpe.suggest,
                   max_evals=100,
                   trials=trials,
                   rstate=np.random.default_rng(0))
...
SCORE:
0.07046070460704607
SCORE:
0.08130081300813008
SCORE:
0.06775067750677506
100%|██████| 100/100 [04:42<00:00, 2.83s/trial, best loss: -0.0921409214092141]
print(best_params)
{'colsample_bytree': 0.8945730049917332, 'eta': 0.05817033918345017, 'gamma': 4.954333064013377, 'max_depth': 6.0, 'min_child_weight': 5.0, 'n_estimators': 258.0, 'reg_alpha': 5.0, 'reg_lambda': 8.0, 'subsample': 0.9}
Thanks in advance.
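As an aside (not part of the original post): the printed best_params contains floats for integer-valued parameters such as max_depth and n_estimators, which is typical when they are sampled with hp.quniform, so a hedged sketch of preparing them before refitting might look like the following. The casts and the X_train / y_train names are assumptions for illustration.

# Hedged sketch (not from the original post): hp.quniform samples floats, so
# integer-valued parameters are usually cast back to int before being passed
# to XGBClassifier. The key names come from the printed best_params above.
from xgboost import XGBClassifier

int_keys = {'max_depth', 'n_estimators'}
param_dict = {k: int(v) if k in int_keys else v for k, v in best_params.items()}

clf = XGBClassifier(**param_dict)
# clf.fit(X_train, y_train)  # X_train / y_train are assumed names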

Related

Hyperparameter tuning with XGBRanker

I am trying to figure out how to tune my hyperparameters through RandomizedSearchCV with an XGBRanker model.
I can split the data into groups, feed it into the model, and make predictions. However, I am not sure how to set up the search object, namely two specific things: how to inform it about the groups, and what kind of score I need to supply.
model = xg.XGBRanker(
    tree_method='exact',
    booster='gbtree',
    objective='rank:pairwise',
    random_state=42,
    learning_rate=0.06,
    max_depth=5,
    n_estimators=700,
    subsample=0.75,
    # colsample_bytree=0.9,
    # subsample=0.75
    min_child_weight=0.06
)
model.fit(x_train, y_train, group=train_groups, verbose=True)
This works fine.
This is where I need some help
param_dist = {'n_estimators': stats.randint(40, 1000),
              'learning_rate': stats.uniform(0.01, 0.59),
              'subsample': stats.uniform(0.3, 0.6),
              'max_depth': [3, 4, 5, 6, 7, 8, 9],
              'colsample_bytree': stats.uniform(0.5, 0.4),
              'min_child_weight': [0.05, 0.1, 0.02]
              }
clf = RandomizedSearchCV(model,
                         param_distributions=param_dist,
                         cv=5,
                         n_iter=5,
                         scoring=???,  #
                         error_score=0,
                         verbose=3,
                         n_jobs=-1)
#also what about the groups?
I had tried something similar. For scoring, however, I used the NDCG scorer from sklearn. I added
scoring = sklearn.metrics.make_scorer(sklearn.metrics.ndcg_score, greater_is_better=True)
For the groups, you can add them to the fit_params of RandomizedSearchCV.
fit_params = {"model__groups": group}
clf = RandomizedSearchCV(model,
                         param_distributions=param_dist,
                         cv=5,
                         n_iter=5,
                         scoring=scoring,
                         error_score=0,
                         verbose=3,
                         n_jobs=-1,
                         fit_params=fit_params)
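A hedged aside on the scorer (not from the original answer): sklearn.metrics.ndcg_score expects 2-D arrays of shape (n_queries, n_documents), true relevances first and predicted scores second, which is worth keeping in mind when wrapping it with make_scorer. A tiny illustration with made-up numbers:

# Hedged illustration of ndcg_score's input convention; the values are made up.
import numpy as np
from sklearn.metrics import ndcg_score

true_relevance = np.asarray([[3, 2, 0, 1]])            # one query, four documents
predicted_scores = np.asarray([[0.9, 0.7, 0.1, 0.3]])  # ranking matches the ideal order
print(ndcg_score(true_relevance, predicted_scores))    # -> 1.0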

XGBoost gives nondeterministic predictions in version <1.0 for INT input

I've trained a classification model in 0.9:
param = {
    'objective': 'multi:softprob',
    'num_class': 9,
    'booster': 'dart',
    'eta': 0.3,
    'gamma': 0,
    'max_depth': 6,
    'alpha': 0,
    'lambda': 1,
    'colsample_bylevel': 0.8,
    'colsample_bynode': 0.8,
    'colsample_bytree': 0.8,
    'normalize_type': 'tree',
    'rate_drop': 1.0,
    'min_child_weight': 5,
    'subsample': 0.5,
    'num_parallel_tree': 1,
    'tree_method': 'approx'
}
model = xgb.train(param, D_train, num_boost_round=1)
model.save_model('./model.bst')
Here is the sample training data:
{f1:1, f2:1, label:"1"}
Both f1 and f2 are integers.
During prediction, the results are nondeterministic with the same input. Sometimes (~1 out of 10 times) it gives equal probability for every output class.
The issue is gone when switching to XGBoost 1.0, i.e. using XGBoost 1.0 to make predictions on a model trained in 0.9:
imported_model = xgb.Booster(model_file='./model.bst')
encoded = [[[0, 0]]]
prediction_input = xgb.DMatrix(
    np.array(encoded).reshape(2, -1), missing=None)
for i in range(1000):
    outputs = imported_model.predict(prediction_input)
    print(outputs)
Does anyone know the root cause?
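A hedged aside, not from the original thread: XGBoost's documentation for the DART booster in the pre-1.0 era notes that predict() itself performs dropout unless ntree_limit is set to a nonzero value, which would match the symptom described here (with rate_drop of 1.0 and a single boosting round, every tree can be dropped, giving equal class probabilities). A sketch of that workaround, under the assumption that this is indeed the cause:

# Hedged sketch: with booster='dart', older XGBoost versions apply dropout
# inside predict() unless ntree_limit is nonzero; num_boost_round was 1 above.
outputs = imported_model.predict(prediction_input, ntree_limit=1)
print(outputs)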

Run prediction from saved model in tensorflow 2.0

I have a saved model (a directory with a saved_model.pb file and a variables folder) and want to run predictions on a pandas DataFrame.
I've unsuccessfully tried a few ways to do this:
Attempt 1: Restore the estimator from the saved model
estimator = tf.estimator.LinearClassifier(
    feature_columns=create_feature_cols(),
    model_dir=path,
    warm_start_from=path)
where path is the directory that has the saved_model.pb file and the variables folder. I got an error:
ValueError: Tensor linear/linear_model/dummy_feature1/weights is not found in
gs://bucket/Trainer/output/2013/20191008T170504.583379-63adee0eaee0/serving_model_dir/export/1570554483/variables/variables
checkpoint {'linear/linear_model/dummy_feature1/weights': [1, 1], 'linear/linear_model/dummy_feature2/weights': [1, 1]
}
Attempt 2: Run prediction directly from the saved model by running
imported = tf.saved_model.load(path)  # path is the directory that has a `saved_model.pb` and variables folder
imported.signatures["predict"](example)
But I have not successfully passed the argument; it looks like the function expects a tf.Example and I am not sure how to convert a DataFrame to tf.Example.
My attempt to convert is below, but I got an error that df[f] is not a tensor:
for f in features:
    example.features.feature[f].float_list.value.extend(df[f])
I've seen solutions on StackOverflow, but they are all for TensorFlow 1.14. I'd greatly appreciate it if someone could help with TensorFlow 2.0.
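One hedged sketch for Attempt 2 (an assumption, not a confirmed fix): the "predict" signature of an estimator export usually takes serialized tf.train.Example protos, so each DataFrame row can be serialized and fed to it. The input key (often "examples") and the float-only features are assumptions; the signature's structured_input_signature shows the actual key and dtype.

# Hedged sketch: serialize DataFrame rows as tf.train.Example protos and feed
# them to the "predict" signature. `df`, `features` and `path` are the names
# used in the question; the input key 'examples' is an assumption - print the
# signature's structured_input_signature to confirm it.
import tensorflow as tf

def row_to_example(row):
    feature = {
        name: tf.train.Feature(float_list=tf.train.FloatList(value=[float(row[name])]))
        for name in features
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

serialized = tf.constant([row_to_example(row) for _, row in df.iterrows()])

imported = tf.saved_model.load(path)
predict_fn = imported.signatures["predict"]
print(predict_fn.structured_input_signature)   # shows the expected input key
outputs = predict_fn(examples=serialized)      # adjust the keyword if the key differs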
Considering you have your saved model present like this:
my_model
assets saved_model.pb variables
You can load your saved model using:
new_model = tf.keras.models.load_model('saved_model/my_model')
# Check its architecture
new_model.summary()
To perform prediction on a DataFrame you need to:
Wrap scalars into a list so as to have a batch dimension (models only process batches of data, not single samples)
Call convert_to_tensor on each feature
Example 1:
If you have values for the first test row as
sample = {
    'Type': 'Cat',
    'Age': 3,
    'Breed1': 'Tabby',
    'Gender': 'Male',
    'Color1': 'Black',
    'Color2': 'White',
    'MaturitySize': 'Small',
    'FurLength': 'Short',
    'Vaccinated': 'No',
    'Sterilized': 'No',
    'Health': 'Healthy',
    'Fee': 100,
    'PhotoAmt': 2,
}
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = new_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
    "This particular pet had a %.1f percent probability "
    "of getting adopted." % (100 * prob)
)
Example 2:
Or if you have multiple rows present in the same order as the train data
predict_dataset = tf.convert_to_tensor([
    [5.1, 3.3, 1.7, 0.5],
    [5.9, 3.0, 4.2, 1.5],
    [6.9, 3.1, 5.4, 2.1]
])
# training=False is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
predictions = new_model(predict_dataset, training=False)
for i, logits in enumerate(predictions):
    class_idx = tf.argmax(logits).numpy()
    p = tf.nn.softmax(logits)[class_idx]
    name = class_names[class_idx]
    print("Example {} prediction: {} ({:4.1f}%)".format(i, name, 100 * p))

How to implement Gaussian Mixture for VAE?

I feel like I don't really know what I'm doing, so I will describe what I think I'm doing, what I want to do, and where that fails.
Given a normal variational autoencoder:
...
net = tf.layers.dense(net, units=code_size * 2, activation=None)
mean = net[:, :code_size]
std = net[:, code_size:]
posterior = tfd.MultivariateNormalDiagWithSoftplusScale(mean, std)
net = posterior.sample()
net = tf.layers.dense(net, units=input_size, ...)
...
What I think I'm doing: Let the neural network find a "mean" and "std" value and use it to create a Normal distribution (Gaussian).
Sample from that distribution and use that for the decoder.
In other words: learn a Gaussian distribution of the encoding
Now I would like to do the same for a mixture of Gaussians.
...
net = tf.layers.dense(net, units=code_size * 2 * code_size, activation=None)
means, stds = tf.split(net, 2, axis=-1)
means = tf.split(means, code_size, axis=-1)
stds = tf.split(stds, code_size, axis=-1)
components = [tfd.MultivariateNormalDiagWithSoftplusScale(means[i], stds[i]) for i in range(code_size)]
probs = [1.0 / code_size] * code_size
gauss_mix = tfd.Mixture(cat=tfd.Categorical(probs=probs), components=components)
net = gauss_mix.sample()
net = tf.layers.dense(net, units=input_size, ...)
...
That seemed relatively straightforward to me, except that it fails with the following error:
Shapes () and (?,) are not compatible
This seems to come from probs not having the batch dimension (I didn't think it would need that).
I thought that probs defines the probability between the components.
If I define probs with a batch dimension as well, I get the following cryptic error, which I don't know how to interpret:
Dimension -1796453376 must be >= 0
Do I generally misunderstand some concepts?
Or what do I need to do differently?
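A hedged sketch of one alternative (not a confirmed answer): let the encoder also predict per-example mixture logits and use MixtureSameFamily, which keeps the batch dimension on the mixture weights. The number of components k and the reshapes below are assumptions for illustration; net, code_size and tfd are the names from the question.

# Hedged sketch: predict per-example mixture logits so the weights carry the
# batch dimension, and use MixtureSameFamily to batch the components.
# k (number of mixture components) is an assumed hyperparameter.
k = code_size

params = tf.layers.dense(net, units=k + 2 * k * code_size, activation=None)
logits = params[:, :k]                                               # [batch, k] mixture weights
loc = tf.reshape(params[:, k:k + k * code_size], [-1, k, code_size])
raw_scale = tf.reshape(params[:, k + k * code_size:], [-1, k, code_size])

gauss_mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(logits=logits),
    components_distribution=tfd.MultivariateNormalDiag(
        loc=loc, scale_diag=tf.nn.softplus(raw_scale)))

net = gauss_mix.sample()                                             # [batch, code_size]
# then continue with the decoder as in the original code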

TPU slower than GPU?

I just tried using a TPU in Google Colab and wanted to see how much faster the TPU is than a GPU. Surprisingly, I got the opposite result.
The following is the NN.
random_image = tf.random_normal((100, 100, 100, 3))
result = tf.layers.conv2d(random_image, 32, 7)
result = tf.reduce_sum(result)
Performance results:
CPU: 8s
GPU: 0.18s
TPU: 0.50s
I wonder why.... The complete code for TPU is as follows:
def calc():
    random_image = tf.random_normal((100, 100, 100, 3))
    result = tf.layers.conv2d(random_image, 32, 7)
    result = tf.reduce_sum(result)
    return result

tpu_ops = tf.contrib.tpu.batch_parallel(calc, [], num_shards=8)
session = tf.Session(tpu_address)
try:
    print('Initializing global variables...')
    session.run(tf.global_variables_initializer())
    print('Warming up...')
    session.run(tf.contrib.tpu.initialize_system())
    print('Profiling')
    start = time.time()
    session.run(tpu_ops)
    end = time.time()
    elapsed = end - start
    print(elapsed)
finally:
    session.run(tf.contrib.tpu.shutdown_system())
    session.close()
Benchmarking devices properly is hard, so please take everything you learn from these examples with a grain of salt. It's better in general to compare specific models you are interested in (e.g. running an ImageNet network) to understand performance differences. That said, I understand it's fun to do this, so...
Larger models will illustrate the TPU and GPU performance difference better. Your example is also including the compilation time in the cost of the TPU call: every call after the first for a given program and shape is cached, so you will want to run tpu_ops once before starting the timer unless you want to capture the compilation time.
Currently, each call to a TPU function copies the weights to the TPU before it can start running, which affects small operations more significantly. Here's an example that runs a loop on the TPU before returning to the CPU, with the following outputs:
1 0.010800600051879883
10 0.09931182861328125
100 0.5581905841827393
500 2.7688047885894775
So you can actually run 100 iterations of this function in 0.55s.
import os
import time
import tensorflow as tf

def calc(n):
    img = tf.random_normal((128, 100, 100, 3))
    def body(_):
        result = tf.layers.conv2d(img, 32, 7)
        result = tf.reduce_sum(result)
        return result
    return tf.contrib.tpu.repeat(n[0], body, [0.0])

session = tf.Session('grpc://' + os.environ['COLAB_TPU_ADDR'])
try:
    print('Initializing TPU...')
    session.run(tf.contrib.tpu.initialize_system())
    for i in [1, 10, 100, 500]:
        tpu_ops = tf.contrib.tpu.batch_parallel(calc, [[i] * 8], num_shards=8)
        print('Warming up...')
        session.run(tf.global_variables_initializer())
        session.run(tpu_ops)
        print('Profiling')
        start = time.time()
        session.run(tpu_ops)
        end = time.time()
        elapsed = end - start
        print(i, elapsed)
finally:
    session.run(tf.contrib.tpu.shutdown_system())
    session.close()