How do I use CNTK Trainer function without eval (just loss criteria)? - cntk

I normally call the Trainer with criteria that include both a loss and an evaluation metric, e.g.,
my_trainer = Trainer(out, (loss, label_error), [learner])
However, when I tried to call it with loss (without evaluation criteria):
my_trainer = Trainer(out, loss, [learner])
I got an error:
ValueError: not enough values to unpack (expected 2, got 1)
Is there a way to train without defining evaluation criteria?

You can do:
my_trainer = Trainer(out, (loss, None), [learner])
If you check out the GAN tutorial, you will find an example:
https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_206_Basic_GAN.ipynb
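For context, a minimal loss-only training step might look like the following sketch, which reuses the names from the question (features, labels, x_batch, and y_batch are assumed to be defined elsewhere):

from cntk import Trainer

# Pass None in place of the metric so only the loss drives training
my_trainer = Trainer(out, (loss, None), [learner])
my_trainer.train_minibatch({features: x_batch, labels: y_batch})
print(my_trainer.previous_minibatch_loss_average)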

Related

Higher loss penalty for true non-zero predictions

I am building a deep regression network (CNN) to predict a (1000,1) target vector from images (7,11). The target usually consists of about 90% zeros and only 10% non-zero values. The distribution of (non-)zero values in the targets varies from sample to sample (i.e. there is no global class imbalance).
Using mean squared error loss, this led to the network predicting only zeros, which I don't find surprising.
My best guess is to write a custom loss function that penalizes errors regarding non-zero values more than the prediction of zero-values.
I have tried the loss function below with the intent of implementing what I guessed could work above. It is a mean squared error loss in which the predictions for zero targets are penalized less (w=0.1).
def my_loss(y_true, y_pred):
    # weights true zero predictions less than true nonzero predictions
    w = 0.1
    y_pred_of_nonzeros = tf.where(tf.equal(y_true, 0), y_pred - y_pred, y_pred)
    return K.mean(K.square(y_true - y_pred_of_nonzeros)) + K.mean(K.square(y_true - y_pred)) * w
The network is able to learn without getting stuck with only-zero predictions. However, this solution seems quite unclean. Is there a better way to deal with this type of problem? Any advice on improving the custom loss function?
Any suggestions are welcome, thank you in advance!
Best,
Lukas
Not sure there is anything better than a custom loss just like you did, but there is a cleaner way:
def weightedLoss(w):
    def loss(true, pred):
        error = K.square(true - pred)
        error = K.switch(K.equal(true, 0), w * error, error)
        return error
    return loss
You may also return K.mean(error), but without the mean you can still benefit from other Keras options, such as sample weights.
Select the weight when compiling:
model.compile(loss = weightedLoss(0.1), ...)
If you have the entire data in an array, you can do:
w = K.mean(y_train)
w = w / (1 - w)  # this line compensates for the lack of the 90% weights for class 1
Another solution that can avoid using a custom loss, but requires changes in the data and the model is:
Transform your y into a 2-class problem for each output. Shape = (batch, originalClasses, 2).
For the zero values, make the first of the two classes = 1
For the one values, make the second of the two classes = 1
newY = np.stack([1-oldY, oldY], axis=-1)
Adjust the model to output this new shape.
...
model.add(Dense(2*classes))
model.add(Reshape((classes,2)))
model.add(Activation('softmax'))
Make sure you are using a softmax and a categorical_crossentropy as loss.
Then use the argument class_weight={0: w, 1: 1} in fit.
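Putting those pieces together, a rough sketch of the two-class reformulation (assuming oldY, x_train, classes, and w are already defined as above; the optimizer choice is arbitrary) could be:

import numpy as np

# channel 0 marks zero targets, channel 1 marks non-zero targets
newY = np.stack([1 - oldY, oldY], axis=-1)   # shape (batch, classes, 2)

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(x_train, newY, class_weight={0: w, 1: 1}, epochs=10)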

How to evaluate the value of a tensor, from inside the model function of a custom tf.estimator

I am implementing an NLP model based on BERT, using tf.TPUEstimator(). I want to implement layer-wise training, where I need to select only one layer of the model to train for each epoch. In order to do this I wanted to change my model_fn and get the value of current_epoch.
I know how to compute the value of current_epoch as a tensor using tf.train.get_or_create_global_step() inside the model_fn, BUT I need to evaluate the value of this tensor in order to select which layer to train and return the correct train_op to the tf.estimator (the train_op pertaining to the single layer chosen according to the value of current_epoch).
I am unable to evaluate this tensor (current_epoch / global_step) from inside the model_fn. I tried the following, but the training hangs at the step my_sess.run(global_step.initializer):
global_step = tf.train.get_or_create_global_step()
graph = tf.get_default_graph()
my_sess = tf.Session(graph=graph)
current_epoch = (global_step * full_bs) // train_size
my_sess.run(global_step.initializer)
current_epoch = my_sess.run(current_epoch)
# My program hangs at the initialising step: my_sess.run(global_step.initializer)
Is there any way to evaluate a tensor using the tf.Estimators default session? How do I get the default session/ Graph?
Most importantly what is wrong in my code and why does the training hang when using tpu's and TPUEstimator?
This is not a direct answer to the OP's second question; it is an answer to the title.
I managed to print a variable's value with get_variable_value, but I am not sure whether this is the optimal way.
With
estimator = tf.contrib.tpu.TPUEstimator(
    # ...
)
out = estimator.get_variable_value('output_bias')
print(type(out))
print(out)
I got
<class 'numpy.ndarray'>
[-0.00107745 0.00107744]
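Building on that, one way to derive the current epoch without creating a session inside model_fn is to read the global step between estimator.train() calls. A hedged sketch (num_epochs, steps_per_epoch, full_bs, train_size, and train_input_fn are assumed, as in the question):

for epoch in range(num_epochs):
    estimator.train(input_fn=train_input_fn, steps=steps_per_epoch)
    # 'global_step' is the name of the variable created by tf.train.get_or_create_global_step()
    global_step = estimator.get_variable_value('global_step')
    current_epoch = (global_step * full_bs) // train_size
    print('global step:', global_step, 'current epoch:', current_epoch)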

tf.Estimator.predict() issue when using a Tensorflow Hub module as the basis of a custom tf.Estimator

I am trying to create a custom tensorflow tf.Estimator. In the model_fn passed to the tf.Estimator, I am importing the Inception_V3 module from Tensorflow Hub.
Problem: After fine-tuning the model (using tf.Estimator.train), the results obtained using tf.Estimator.predict are not as good as expected based on tf.Estimator.evaluate (This is for a regression problem.)
I am new to Tensorflow and Tensorflow Hub, so I could be making lots of rookie mistakes.
When I run tf.Estimator.evaluate() on my validation data, the reported loss is in the same ball park as the loss after tf.Estimator.train() was used to train the model. The problem comes in when I try to use tf.Estimator.predict() on the same validation data.
tf.Estimator.predict() returns predictions which I then use to calculate the same loss metric (mean_squared_error) which is computed by tf.Estimator.evaluate(). I am using the same set of data to feed to the predict function as the evaluate function. But I do not get the same result for the mean_squared_error -- not remotely close! (The mse I calculate from predict is much worse.)
Here is what I have done (edited out some details)...
Define a model_fn with Tensorflow Hub module. Then call the tf.Estimator functions to train, evaluate and predict.
def my_model_fun(features, labels, mode, params):
    # Load InceptionV3 module from Tensorflow Hub
    iv3_module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",
                            trainable=True, tags={'train'})
    # Gather the variables for fine-tuning
    var_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='CustomeLayer')
    var_list.extend(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='module/InceptionV3/Mixed_5b'))

    predictions = {"the_prediction": final_output}

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Define loss, optimizer, and evaluation metrics
    loss = tf.losses.mean_squared_error(labels=labels, predictions=final_output)
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=learn_rate).minimize(
        loss, var_list=var_list, global_step=tf.train.get_global_step())
    rms_error = tf.metrics.root_mean_squared_error(labels=labels, predictions=predictions["the_prediction"])
    eval_metric_ops = {"rms_error": rms_error}

    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=optimizer)

    if mode == tf.estimator.ModeKeys.EVAL:
        tf.summary.scalar('rms_error', rms_error)
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
iv3_estimator = tf.estimator.Estimator(model_fn=my_model_fun)
iv3_estimator.train(input_fn=train_input_fn, steps=TRAIN_STEPS)
iv3_estimator.evaluate(input_fn=val_input_fn)
ii = 0
for ans in iv3_estimator.predict(input_fn=test_input_fn):
    sqErr = np.square(label[ii] - ans['the_prediction'][0])
    totalSqErr += sqErr
    ii += 1
mse = totalSqErr / ii
I expect that the mse loss reported by tf.Estimator.evaluate() should be the same as when I calculate the mse from the known labels and the output of tf.Estimator.predict().
Do I need to import the Tensorflow Hub model differently when I use predict? (e.g., use trainable=False in the call to hub.Module()?)
Are the weights obtained from training being used when tf.Estimator.evaluate() runs, but not when tf.Estimator.predict() runs?
other?
There are a few things that seem to be missing from the code snippet. How is final_output computed from iv3_module? Also, mean squared error is an unusual choice of loss function for a classification problem; the common approach is to pass the image features from the module into a linear output layer with scores for each class ("logits") and a "softmax cross-entropy loss". For an explanation of these terms, you can review online tutorials like https://developers.google.com/machine-learning/crash-course/ (all the way to multi-class neural nets).
Regarding TF-Hub technicalities:
The variables of a Hub module are automatically added to the GLOBAL_VARIABLES and TRAINABLE_VARIABLES collections (if trainable=True, as you already do). No manual extension of those collections should be needed.
hub.Module(..., tags=...) should be set to {"train"} for mode==TRAIN and set to None or the empty set otherwise.
In general, it's useful to get a solution working end-to-end for your problem without fine-tuning as a baseline, and then add fine-tuning.
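To illustrate the point about tags, here is a hedged sketch of how the module could be loaded inside model_fn (the URL is the one from the question; this alone is not guaranteed to close the evaluate/predict gap):

# use the "train" graph variant only in TRAIN mode; default (inference) variant otherwise
iv3_module = hub.Module(
    "https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",
    trainable=True,
    tags={"train"} if mode == tf.estimator.ModeKeys.TRAIN else None)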

Get values of tensors in loss function

I would like to get the values of the y_pred and y_true tensors of this Keras backend function. I need this to be able to perform some custom calculations and change the loss; these calculations are only possible with the real array values.
def mean_squared_error(y_true, y_pred):
    # some code here
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is there a way to do this in Keras? Or in any other ML framework (TensorFlow, PyTorch, Theano)?
No, in general you can't compute the loss that way, because Keras is based on frameworks that do automatic differentiation (like Theano or TensorFlow), and they need to know which operations you are performing in between in order to compute the gradients of the loss.
You need to implement your loss computations using keras.backend functions; otherwise there is no way to compute gradients, and optimization won't be possible.
Try including this within the loss function:
y_true = keras.backend.print_tensor(y_true, message='y_true')
Following is an excerpt from the Keras documentation (https://keras.io/backend/):
print_tensor
keras.backend.print_tensor(x, message='')
Prints message and the tensor value when evaluated.
Note that print_tensor returns a new tensor identical to x which should be used in the later parts of the code. Otherwise, the print operation is not taken into account during evaluation.
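As a concrete (hedged) sketch of that note, reassigning the returned tensors inside the loss would look roughly like this:

from keras import backend as K

def mean_squared_error_debug(y_true, y_pred):
    # print_tensor returns a new tensor that must be used downstream,
    # otherwise the print op is dropped during evaluation
    y_true = K.print_tensor(y_true, message='y_true = ')
    y_pred = K.print_tensor(y_pred, message='y_pred = ')
    return K.mean(K.square(y_pred - y_true), axis=-1)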

Tensorflow: train and test in separate functions

I am trying to use a Tensorflow model in two separate functions: one that trains it, and one used to test it. For example, the training function looks something like this:
graph = tf.Graph()
with graph.as_default():
    tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims))
    ...
    weights = tf.Variable(tf.truncated_normal([num_dims, num_labels]))
    ...
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    prediction = tf.nn.softmax(logits)
    ...
session = tf.Session(graph=graph)
...
The other, evaluation function would just use prediction with the test data, like so:
session.run(prediction, feed_dict={tf_dataset: test_data})
The problem is, of course, that tf_dataset is not in the scope of the other function. I am fine with returning session and prediction from the training function, but having to share every single placeholder with the evaluation code seems a bit lame.
Is there a way to get the references somehow, from the session or the graph? Also, are there any good practices on how to separate training and evaluation code in Tensorflow?
You could give your placeholders unique names and use those, i.e.,
tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims), name="datainput")
...
sess.run(..., feed_dict={"datainput:0": mydata})
You can also get name/type pairs for all ops in your graph, so you could recover all the placeholder tensor names that way:
[(op.name+":0", op.op_def.name) for op in graph.get_operations()]