How do I get value function/critic values from RLlib's PPO algorithm for a range of observations?

Goal: I want to train a PPO agent on a problem and determine its optimal value function for a range of observations. Later I plan to work with this value function (economic inequality research). The problem is sufficiently complex so that dynamic programming techniques no longer work.
Approach: To check whether I get correct outputs for the value function, I trained PPO on a simple problem whose analytical solution is known. However, the resulting value function is rubbish, which is why I suspect that I have done something wrong.
The code:
from keras import backend as k_util
parser = argparse.ArgumentParser()
# Define framework to use
choices=["tf", "tf2", "tfe", "torch"],
help="The DL framework specifier.",
def get_rllib_config(seeds, debug=False, framework="tf") -> Dict:
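def get_value_function(agent, min_state, max_state):
    policy = agent.get_policy()
    value_function = []
    for i in np.arange(min_state, max_state, 1):
        # Forward pass for a single observation; value_function() refers to this pass.
        model_out, _ = policy.model({"obs": np.array([[i]], dtype=np.float32)})
        value = k_util.eval(policy.model.value_function())[0]
        # Collect the value so that the returned list is not empty.
        value_function.append(value)
        print(i, value)
    return value_function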
def train_schedule(config, reporter):
rllib_config = config["config"]
iterations = rllib_config.pop("training_iteration", 10)
agent = PPOTrainer(env=rllib_config["env"], config=rllib_config)
for _ in range(iterations):
result = agent.train()
values = get_value_function(agent, 0, 100)
resources = PPO.default_resource_request(exp_config)
tune_analysis = tune.Tuner(tune.with_resources(train_schedule, resources=resources), param_space=exp_config).fit()
So first I get the policy (policy = agent.get_policy()) and run a forward pass with each of the 100 values (model_out, _ = policy.model({"obs": np.array([[i]], dtype=np.float32)})). Then, after each forward pass I use the value_function() method to get the output of the critic network and evaluate the tensor via keras backend.
The results:
[Figure: true VF (analytical solution)]
[Figure: VF output of RLlib]
Unfortunately you can see that the results are not that promising. Maybe I have missed a pre- or postprocessing step? Does the value_function() method even return the last layer of the critic network?
I am very grateful for any help!

It's not part of your script, but I assume that you have trained the policy before you attempt to get useful values out of it.
You are correct in assuming that the value_function() returns the output of the last layer of the critic network in RLlib's implementations.
Have a look at the value function metrics to see if it's actually learning anything (RLlib logs .../learner_stats/vf_loss and .../learner_stats/vf_explained_var)!
After training the model, I'd also try to query the model directly. If that looks better, something is likely off with the code you posted here.
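As a rough illustration of what an explained-variance statistic such as vf_explained_var measures, here is a minimal sketch in plain Python. It assumes the usual definition 1 - Var(target - prediction) / Var(target); the exact computation inside RLlib may differ in details, and the function name and the numbers below are made up for the example.

```python
from statistics import pvariance


def explained_variance(targets, predictions):
    """Return 1 - Var(targets - predictions) / Var(targets)."""
    residuals = [t - p for t, p in zip(targets, predictions)]
    target_var = pvariance(targets)
    if target_var == 0:
        return float("nan")  # undefined when the targets are constant
    return 1.0 - pvariance(residuals) / target_var


# Hypothetical value targets and two hypothetical critics.
true_values = [1.0, 2.0, 3.0, 4.0]
good_critic = [1.1, 1.9, 3.2, 3.8]          # tracks the targets closely
constant_critic = [2.5, 2.5, 2.5, 2.5]      # ignores the observation entirely

print(explained_variance(true_values, good_critic))      # close to 1
print(explained_variance(true_values, constant_critic))  # 0.0
```

A value near 1 means the critic explains most of the variation in the targets; a value near 0 (or below) suggests the critic has not learned anything useful yet, in which case querying it over a range of observations cannot be expected to reproduce the analytical solution.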


TF object detection: return subset of inference payload

I'm working on training and deploying an instance segmentation model using TF's object detection API. I'm able to successfully train the model, package it into a TF Serving Docker image (latest tag as of Oct 2020), and process inference requests via the REST interface. However, the amount of data returned from an inference request is very large (hundreds of MB). This is a big problem when the inference request and processing don't happen on the same machine because all that returned data has to go over the network.
Is there a way to trim down the number of outputs (either during model export or within the TF Serving image) to allow faster round-trip times during inference?
I'm using TF OD API (with TF2) to train a Mask RCNN model, which is a modified version of this config. I believe the full list of outputs is described in code here. The list of items I get during inference is also pasted below. For a model with 100 object proposals, that information is ~270 MB if I just write the returned inference as JSON to disk.
dict_keys(['detection_masks', 'rpn_features_to_crop', 'detection_anchor_indices', 'refined_box_encodings', 'final_anchors', 'mask_predictions', 'detection_classes', 'num_detections', 'rpn_box_predictor_features', 'class_predictions_with_background', 'proposal_boxes', 'raw_detection_boxes', 'rpn_box_encodings', 'box_classifier_features', 'raw_detection_scores', 'proposal_boxes_normalized', 'detection_multiclass_scores', 'anchors', 'num_proposals', 'detection_boxes', 'image_shape', 'rpn_objectness_predictions_with_background', 'detection_scores'])
I already encode the images within my inference requests as base64, so the request payload is not too large when going over the network. It's just that the inference response is gigantic in comparison. I only need 4 or 5 of the items out of this response, so it'd be great to exclude the rest and avoid passing such a large package of bits over the network.
Things I've tried
I've tried setting the score_threshold to a higher value during the export (code example here) to reduce the number of outputs. However, this seems to just threshold the detection_scores. All the extraneous inference information is still returned.
I also tried just manually excluding some of these inference outputs by adding the names of keys to remove here. That also didn't seem to have any effect, and I'm worried this is a bad idea because some of those keys might be needed during scoring/evaluation.
I also searched here and on tensorflow/models repo, but I wasn't able to find anything.
I was able to find a hacky workaround. In the export process (here), some of the components of the prediction dict are deleted. I added additional items to the non_tensor_predictions list, which contains all keys that will get removed during the postprocess step. Augmenting this list cut down my inference outputs from ~200MB to ~12MB.
Full code for the if self._number_of_stages == 3 block:
if self._number_of_stages == 3:
    non_tensor_predictions = [
        k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
    # Add additional keys to delete during postprocessing
    non_tensor_predictions = non_tensor_predictions + ['raw_detection_scores', 'detection_multiclass_scores', 'anchors', 'rpn_objectness_predictions_with_background', 'detection_anchor_indices', 'refined_box_encodings', 'class_predictions_with_background', 'raw_detection_boxes', 'final_anchors', 'rpn_box_encodings', 'box_classifier_features']
    for k in non_tensor_predictions:
        tf.logging.info('Removing {0} from prediction_dict'.format(k))
        prediction_dict.pop(k)
    return prediction_dict
I think there's a more "proper" way to deal with this using signature definitions during the creation of the TF Serving image, but this worked for a quick and dirty fix.
I've run into the same problem. In the exporter_main_v2 code it is stated that the outputs should be:
and the following output nodes returned by the model.postprocess(..):
* `num_detections`: Outputs float32 tensors of the form [batch]
that specifies the number of valid boxes per image in the batch.
* `detection_boxes`: Outputs float32 tensors of the form
[batch, num_boxes, 4] containing detected boxes.
* `detection_scores`: Outputs float32 tensors of the form
[batch, num_boxes] containing class scores for the detections.
* `detection_classes`: Outputs float32 tensors of the form
[batch, num_boxes] containing classes for the detections.
I've submitted an issue on the TensorFlow object detection GitHub repo; I hope we will get feedback from the TensorFlow dev team.
The GitHub issue can be found here
If you are using file to export your model, you can try this hack to work around the problem.
Just add the following code to the function _run_inference_on_images of file:
detections[classes_field] = (
    tf.cast(detections[classes_field], tf.float32) + label_id_offset)
############# START ##########
ignored_model_output_names = ["raw_detection_boxes", "raw_detection_scores"]
for key in ignored_model_output_names:
    if key in detections.keys(): del detections[key]
############# END ##########
for key, val in detections.items():
    detections[key] = tf.cast(val, tf.float32)
Therefore, the generated model will not output the values of ignored_model_output_names.
Please let me know if this can solve your problem.
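The hack above uses a deny-list of keys. Since the question says only 4 or 5 outputs are actually needed, the same idea can also be written as an allow-list. The following is only a sketch in plain Python: keep_outputs is a hypothetical helper, and the dictionary stands in for the detections dict (whose values would be tensors in the real export code).

```python
def keep_outputs(detections, wanted_keys):
    """Return a new dict that contains only the requested output keys."""
    return {key: detections[key] for key in wanted_keys if key in detections}


# Hypothetical stand-in for the model outputs; real values would be tensors.
detections = {
    "detection_boxes": [[0.1, 0.2, 0.5, 0.6]],
    "detection_scores": [0.9],
    "raw_detection_boxes": [[0.0, 0.0, 1.0, 1.0]] * 3,
    "raw_detection_scores": [[0.3, 0.7]] * 3,
}

trimmed = keep_outputs(detections, ["detection_boxes", "detection_scores"])
print(sorted(trimmed))
```

An allow-list has the advantage that outputs added by a later version of the model are dropped by default. Note that this only reduces network traffic if it is applied before export (so the served model never emits the extra outputs), not if it is applied on the client after the response has arrived.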
Another approach would be to alter the signatures of the saved model:
model = tf.saved_model.load(path.join("models", "efficientdet_d7_coco17_tpu-32", "saved_model"))
infer = model.signatures["serving_default"]
outputs = infer.structured_outputs
for o in ["raw_detection_boxes", "raw_detection_scores"]:
signatures={"serving_default" : infer},

model.evaluate() returns different value for same metric depending on if it is returned as the loss or as a metric

I compiled and trained a model like so:
model.compile(optimizer=opt, loss=pixelwise_weighted_binary_crossentropy, metrics=[pixelwise_weighted_binary_crossentropy, dice_coef, dice_loss])
Now during evaluation I get different values for loss_weighted_cross_entropy_value_1 and weighted_cross_entropy_value_2, when running:
(loss_weighted_cross_entropy_value_1, weighted_cross_entropy_value_2, dice_value, dice_loss_value) = model.evaluate(data_generator)
Here, weighted_cross_entropy_value_2 returns the value I expect (same value as during training, when running on the validation dataset), but loss_weighted_cross_entropy_value_1 seems to randomly fluctuate around that value, depending on batch-size.
If I had to wager a guess, it seems as if loss_weighted_cross_entropy_value_1 is the value for only the last batch of the evaluation data. Whereas weighted_cross_entropy_value_2 is the averaged value across all batches of the evaluation data.
Is this correct, or what else is going on here?
I now ran the evaluation on each batch individually by getting them from the generator first and feeding them to model.evaluate(...) as numpy arrays (see code below). Averaging over the batch-results of loss_weighted_cross_entropy_val_1 and weighted_cross_entropy_val_2 gives the same result in this case:
Averaged loss_weighted_cross_entropy_val_1 - per-sample pass: 0.08109399276593375; std: 0.005511607824946092
Averaged weighted_cross_entropy_val_2 - per-sample pass: 0.08109399271862848; std: 0.005511607193872294
I see this as further indication for my interpretation above.
nr_of_samples = len(data_generator)
result = nr_of_samples * [None]
loss_weighted_cross_entropy_val_1 = np.zeros(nr_of_samples)
weighted_cross_entropy_val_2 = np.zeros(nr_of_samples)
dice_val = np.zeros(nr_of_samples)
dice_loss_val = np.zeros(nr_of_samples)
for index, sample in enumerate(data_generator):
    image = sample[0]
    mask_weight = sample[1]
    (loss_weighted_cross_entropy_val_1[index], weighted_cross_entropy_val_2[index], dice_val[index], dice_loss_val[index]) = model.evaluate(image, mask_weight)
    print(f"Sample {index}/{nr_of_samples}")
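The hypothesis above (one reported number is the last batch, the other is the average over all batches) can be checked mechanically once the per-batch values are available. This is a minimal sketch with made-up numbers; closest_explanation is a hypothetical helper, not part of Keras.

```python
import math


def closest_explanation(reported, per_batch_values, rel_tol=1e-6):
    """Say whether a reported value matches the batch mean or the last batch."""
    mean_value = sum(per_batch_values) / len(per_batch_values)
    if math.isclose(reported, mean_value, rel_tol=rel_tol):
        return "mean over batches"
    if math.isclose(reported, per_batch_values[-1], rel_tol=rel_tol):
        return "last batch only"
    return "neither"


# Hypothetical per-batch losses collected as in the loop above.
per_batch = [0.075, 0.082, 0.090, 0.077]

print(closest_explanation(0.081, per_batch))  # matches the mean
print(closest_explanation(0.077, per_batch))  # matches the last batch
```

If the value returned as the loss consistently lands in the "last batch only" case while the metric lands in "mean over batches", that would support the interpretation in the question; if both land on the mean up to a small tolerance, floating-point differences are the more likely explanation.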
If you use the same function as both the loss and a metric, you will usually see minor differences in the results due to floating-point precision errors.
Please refer to this SO answer, which explains this case in detail.

In Tensorflow-Serving, is it possible to get only the top-k prediction results?

When using the code in, but with a DNNClassifier Estimator model, the curl/query request returns all the possible label classes and their associated scores.
Using a model with 100,000+ possible output/label classes, the response becomes too large. Is there any way to limit the number of outputs to the top-k results? (Similar to how it can be done in keras).
The only possibility I could think of is feeding some parameter into the predict API through the signatures, but I haven't found any parameters that would give this functionality. I've read through a ton of documentation + code and googled a ton, but to no avail.
Any help would be greatly appreciated. Thanks in advance for any responses. <3
AFAIK, there are two ways to support your need.
You could add some lines in tensorflow-serving source code referring to this
You could do something like this while training/retraining your model.
Hope this will help.
Putting this up here in case it helps anyone. It's possible to override the classification_output() function in (which is used by in order to filter the top-k results. You can insert this snippet into your / file, and whenever you save a DNNClassifier model, that model will always output at most num_top_k_results when doing inference/serving. The vast majority of the method is copied from the original classification_output() function. (Note that this may or may not work with 1.13 / 2.0, as it hasn't been tested on those.)
import tensorflow as tf
from tensorflow.python.estimator.canned import head as head_lib
# The import paths below are assumed for the TF 1.x releases this answer targets.
from tensorflow.python.estimator.export import export_output
from tensorflow.python.ops import array_ops, math_ops, string_ops
num_top_k_results = 5
def override_classification_output(scores, n_classes, label_vocabulary=None):
    batch_size = array_ops.shape(scores)[0]
    if label_vocabulary:
        export_class_list = label_vocabulary
    else:
        export_class_list = string_ops.as_string(math_ops.range(n_classes))
    # Get the top_k results
    top_k_scores, top_k_indices = tf.nn.top_k(scores, num_top_k_results)
    # Using the top_k_indices, get the associated class names (from the vocabulary)
    top_k_classes = tf.gather(tf.convert_to_tensor(value=export_class_list), tf.squeeze(top_k_indices))
    export_output_classes = array_ops.tile(
        input=array_ops.expand_dims(input=top_k_classes, axis=0),
        multiples=[batch_size, 1])
    return export_output.ClassificationOutput(
        scores=top_k_scores,
        # `ClassificationOutput` requires string classes.
        classes=export_output_classes)
# Override the original method with our custom one.
head_lib._classification_output = override_classification_output
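To make the intended result concrete without TensorFlow, the top-k selection for a single example can be sketched in plain Python. The helper top_k and the labels and scores are hypothetical; the snippet only illustrates which (label, score) pairs should survive, not how the serving graph computes them.

```python
import heapq


def top_k(scores, labels, k):
    """Return the k (label, score) pairs with the highest scores, best first."""
    pairs = zip(labels, scores)
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


# Hypothetical class labels and scores for one example.
labels = ["cat", "dog", "bird", "fish"]
scores = [0.10, 0.60, 0.25, 0.05]

print(top_k(scores, labels, 2))  # [('dog', 0.6), ('bird', 0.25)]
```

With 100,000+ classes, returning only these k pairs per example is what keeps the response small; doing the selection inside the exported model, as in the override above, avoids sending the full score vector over the network at all.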

How to make tensorflow assignment op part of computational graph without explicitly running its output?

I am trying to create a custom gradient in TensorFlow to implement the exponentially smoothed (unbiased) gradient of a logarithm that is suggested in this paper. What I need to do is create a new variable that stores an exponentially smoothed value, which is updated and used in a custom gradient function. Additionally, I need a flag that tells me when the first gradient calculation is being done, so I can initialize the exponentially smoothed value to the appropriate (data-dependent) value. Furthermore, the output of the custom gradient function must be just the gradient, so it will be a pain in the butt to access the output of a tf.assign from inside the custom gradient. Lastly, I do not want to create a second operation that 'manually' initializes the exponential smoothing by running it separately in my training loop. Anyway, this is all too complicated, so I have outlined an abstract but simple problem below, the solution to which would solve my problem:
What I need to be able to do is update one variable in a manner which is conditional upon a second, and furthermore I need to update the second variable without providing it as explicit output by my function. Example code demonstrating my problem is below:
import tensorflow as tf
a = tf.get_variable(name="test", initializer=True)
b = tf.get_variable(name="testval", initializer=10.)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
def make_function(inp):
    with tf.variable_scope("", reuse=True):
        a = tf.get_variable(name="test", dtype=tf.bool)
        b = tf.get_variable(name="testval")
    # Each branch returns [assignment to b, assignment to a].
    iftrue = lambda: [tf.assign(b, inp), tf.assign(a, False)]
    iffalse = lambda: [tf.assign(b, (b + inp) / 2), tf.assign(a, False)]
    acond, bcond = tf.cond(a, iftrue, iffalse)
    return acond
I = tf.placeholder(tf.float32)
tcond = make_function(I)
print("{}\tThe initial values of a and b".format(sess.run([a, b])))
print("{}\t\tRun, tcond1. output is the updated value of b.".format(sess.run(tcond, {I: 1})))
print("{}\tNow we see that b has been updated, but a has not.".format(sess.run([a, b])))
print("{}\t\tSo now the value is 2 instead of 1.5 like it should be.".format(sess.run(tcond, {I: 2})))
The output is:
[True, 10.0] The initial values of a and b
1.0 Run, tcond1. output is the updated value of b.
[True, 1.0] Now we see that b has been updated, but a has not.
2.0 So now the value is 2 instead of 1.5 like it should be.
Now, I understand that I need a line that explicitly runs acond, where acond is the output of the conditional within make_function, but I can't return that because my function needs to return only the value of b (not a), and I don't want to have to carry around an extra op that I need to remember to run on the first training iteration, but not on the others.
So, is there a way to add the assignment op acond to the computational graph without explicitly returning it and running it?
Add this operation to a custom collection and, then, create a dependency between your final op (e.g. the train_op) and your acond.
Inside the method:
tf.add_to_collection("to_run", acond)
In the definition of the final op:
to_run = tf.get_collection("to_run")
with tf.control_dependencies(to_run):
    final_op = <something>
When you run final_op, you are assured that your acond has already been executed.
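For reference, the update rule the question is after can be written down in plain Python, independent of the graph mechanics. This is only a sketch of the desired semantics (first call initializes, later calls average); the class name SmoothedValue is made up, and it does not show how to wire the dependency in TensorFlow.

```python
class SmoothedValue:
    """Holds a value that is initialized on the first update and averaged afterwards."""

    def __init__(self, initial=10.0):
        self.value = initial      # plays the role of variable b
        self.first_call = True    # plays the role of the boolean flag a

    def update(self, inp):
        if self.first_call:
            # First update: take the input as-is and clear the flag.
            self.value = inp
            self.first_call = False
        else:
            # Later updates: average the stored value with the new input.
            self.value = (self.value + inp) / 2
        return self.value


smoothed = SmoothedValue()
first = smoothed.update(1.0)
second = smoothed.update(2.0)
print(first, second)  # 1.0 1.5
```

The second result is 1.5, which is the value the question expects once the flag is updated together with b. With the control dependency described above, running the final op should give the graph version the same behaviour, because the flag assignment is forced to execute as well.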

Unsure whether function breaks backpropagation

I have been tinkering around a lot with TensorFlow in the past few days; however, I am quite unsure whether a function I wrote would break backpropagation in a neural network. I thought I'd ask here before I try to integrate this function into an NN. The basic setup is that I want to add two matrices with
op = tf.add(tfObject, tfImageBackground)
where tfImageBackground is some constant image (i.e. an RGBA image of size 800 x 800 with R = G = B = A = 0) and tfObject is a matrix with the same dimensions, which we get from the function I am unsure about:
def getObject(vector):
    objectId = vector[0]
    x = vector[1]
    y = vector[2]
    xEnd = baseImageSize - (x + objectSize)
    yStart = baseImageSize - (y + objectSize)
    padding = tf.convert_to_tensor([[x, xEnd], [yStart, y], [0, 0]])
    RTensor = tfObjectMatrix[objectId, :, :, 0:1]
    GTensor = tfObjectMatrix[objectId, :, :, 1:2]
    BTensor = tfObjectMatrix[objectId, :, :, 2:3]
    ATensor = tfObjectMatrix[objectId, :, :, 3:4]
    # Any further tf.pad arguments are elided in the original post.
    paddedR = tf.pad(tensor=RTensor,
                     paddings=padding)
    # ... generates padding for every channel (paddedG, paddedB, paddedA) ...
    finalTensor = tf.concat([paddedR, paddedG, paddedB, paddedA], 2)
    return finalTensor
The tfObjectMatrix is a list of images which never change.
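The padding arithmetic in getObject can be checked on its own with plain integers. The sketch below mirrors the formulas from the function above; compute_padding is a hypothetical helper, and the sizes (an 800-pixel base image and a 50-pixel object) are example values, of which only the 800 comes from the question.

```python
def compute_padding(x, y, object_size, base_image_size):
    """Return [[before, after], ...] paddings that place the object in the base image."""
    x_end = base_image_size - (x + object_size)
    y_start = base_image_size - (y + object_size)
    if min(x, y, x_end, y_start) < 0:
        raise ValueError("object does not fit inside the base image")
    # Same layout as in getObject: first axis, second axis, channel axis.
    return [[x, x_end], [y_start, y], [0, 0]]


padding = compute_padding(x=100, y=200, object_size=50, base_image_size=800)
print(padding)  # [[100, 650], [550, 200], [0, 0]]

# Each spatial axis must add up to the base image size after padding.
for before, after in padding[:2]:
    assert before + 50 + after == 800
```

Because x and y end up as integer padding amounts, they decide where the object is placed; a check like this only confirms that the padded tensor has the same shape as the background image, which tf.add needs.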
I did check whether I was able to generate a tf.gradient from the op, which turned out to work. I am unsure whether that is sufficient for backpropagation to work, though.
Thanks for your time and effort. Any input at all would be greatly appreciated.
TensorFlow will backpropagate to everything by default. As per your code, everything will receive gradients with a training operation from an optimizer. So to answer your question, backpropagation will work.
The only thing to consider is that you say tfObjectMatrix is a list of images that will not change, so you might not want it to receive any gradients. In that case, look into tf.stop_gradient(): for example, use OM = tf.stop_gradient(tfObjectMatrix) and work with OM in your function.