Inexplicable behaviour when using numpy.T as init for PyTorch weights

I use numpy to init the weights of my PyTorch MLP. It's a really small network, 2 layers, 21 neurons per layer. The network's output is BRDF values that are then rendered by Mitsuba 0.6.0.
The strange issue appears when I transpose the NumPy arrays during initialization. Version A gives me a network that renders perfectly in Mitsuba (what I would expect). Version B, which should be equivalent, gives me a network that reaches the same loss in PyTorch but renders different values in Mitsuba.
# Version A:
w = np.random.uniform(low=-0.05, high=0.05, size=(6, 21)).astype(np.float32)
model.fc1.weight = torch.nn.Parameter(torch.from_numpy(w.T), requires_grad=True)
# Version B:
w = np.random.uniform(low=-0.05, high=0.05, size=(21, 6)).astype(np.float32)
model.fc1.weight = torch.nn.Parameter(torch.from_numpy(w), requires_grad=True)
Note that in Version B the only changes are the swapped dimensions and the dropped call to transpose. The shapes are therefore identical to Version A, and the contents should be statistically equivalent as well, as both are sampled from the same distribution.
I cannot share a MWE, as this is proprietary research, but I assure you that the ONLY thing I changed between these two runs is the two lines in the above code snippets. I do not think Mitsuba is at fault either, because the first network (version A) renders fine, and the second network is equivalent to that, but for the init. I tried mimicking the numpy-inits with the respective PyTorch-equivalents, and the issue persists.
Any help is greatly appreciated!!
[Images: VersionA, VersionB]
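One difference between the two versions can be checked directly: torch.from_numpy keeps the strides of the NumPy array, so the tensor built from w.T in Version A is a non-contiguous view, while the tensor in Version B is contiguous. Whether this explains the rendering difference is an assumption. It would only matter if the weights reach Mitsuba through an export path that reads the raw buffer or ignores strides. The names w_a, w_b, t_a, t_b and t_a_fixed are made up for this sketch.

```python
import numpy as np
import torch

# Version A: transpose of a (6, 21) array, which is a strided view, not a copy.
w_a = np.random.uniform(low=-0.05, high=0.05, size=(6, 21)).astype(np.float32)
t_a = torch.from_numpy(w_a.T)

# Version B: array allocated directly with shape (21, 6).
w_b = np.random.uniform(low=-0.05, high=0.05, size=(21, 6)).astype(np.float32)
t_b = torch.from_numpy(w_b)

# Same logical shape, different memory layout.
print(tuple(t_a.shape), t_a.stride(), t_a.is_contiguous())
print(tuple(t_b.shape), t_b.stride(), t_b.is_contiguous())

# Forcing a contiguous copy before wrapping removes the layout difference.
t_a_fixed = torch.from_numpy(np.ascontiguousarray(w_a.T))
print(t_a_fixed.stride(), t_a_fixed.is_contiguous())
```

If the layout is the cause, initializing with np.ascontiguousarray(w.T) in Version A (or calling .contiguous() on the tensor) should make both versions behave the same downstream.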

TF object detection: return subset of inference payload

Problem
I'm working on training and deploying an instance segmentation model using TF's object detection API. I'm able to successfully train the model, package it into a TF Serving Docker image (latest tag as of Oct 2020), and process inference requests via the REST interface. However, the amount of data returned from an inference request is very large (hundreds of MB). This is a big problem when the inference request and the processing don't happen on the same machine, because all of the returned data has to go over the network.
Is there a way to trim down the number of outputs (either during model export or within the TF Serving image) to allow faster round-trip times during inference?
Details
I'm using TF OD API (with TF2) to train a Mask RCNN model, which is a modified version of this config. I believe the full list of outputs is described in code here. The list of items I get during inference is also pasted below. For a model with 100 object proposals, that information is ~270 MB if I just write the returned inference as JSON to disk.
inference_payload['outputs'].keys()
dict_keys(['detection_masks', 'rpn_features_to_crop', 'detection_anchor_indices', 'refined_box_encodings', 'final_anchors', 'mask_predictions', 'detection_classes', 'num_detections', 'rpn_box_predictor_features', 'class_predictions_with_background', 'proposal_boxes', 'raw_detection_boxes', 'rpn_box_encodings', 'box_classifier_features', 'raw_detection_scores', 'proposal_boxes_normalized', 'detection_multiclass_scores', 'anchors', 'num_proposals', 'detection_boxes', 'image_shape', 'rpn_objectness_predictions_with_background', 'detection_scores'])
I already encode the images within my inference requests as base64, so the request payload is not too large when going over the network. It's just that the inference response is gigantic in comparison. I only need 4 or 5 of the items out of this response, so it'd be great to exclude the rest and avoid passing such a large package of bits over the network.
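To see which keys account for most of the payload before deciding what to drop, the serialized size of each output can be measured on the client. The sketch below assumes the response has already been parsed into a dict like inference_payload['outputs']. The helpers output_sizes_mb and keep_only and the toy payload are made up for illustration. Filtering on the client does not reduce what goes over the network; it only helps pick the keys to remove at export time.

```python
import json


def output_sizes_mb(outputs):
    """Return {key: JSON-serialized size in MB}, largest first."""
    sizes = {
        key: len(json.dumps(value).encode("utf-8")) / 1e6
        for key, value in outputs.items()
    }
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


def keep_only(outputs, wanted):
    """Return a copy of outputs restricted to the wanted keys."""
    return {key: outputs[key] for key in wanted if key in outputs}


# Toy stand-in for inference_payload['outputs'] (not real model output).
toy_outputs = {
    "detection_scores": [[0.9, 0.8]],
    "num_detections": [2.0],
    "raw_detection_boxes": [[[0.0, 0.0, 1.0, 1.0]] * 1000],
}

sizes = output_sizes_mb(toy_outputs)
for key, size in sizes.items():
    print(f"{key}: {size:.4f} MB")

slim = keep_only(toy_outputs, ["detection_scores", "num_detections"])
```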
Things I've tried
I've tried setting the score_threshold to a higher value during the export (code example here) to reduce the number of outputs. However, this seems to just threshold the detection_scores. All the extraneous inference information is still returned.
I also tried just manually excluding some of these inference outputs by adding the names of keys to remove here. That also didn't seem to have any effect, and I'm worried this is a bad idea because some of those keys might be needed during scoring/evaluation.
I also searched here and on tensorflow/models repo, but I wasn't able to find anything.
I was able to find a hacky workaround. In the export process (here), some of the components of the prediction dict are deleted. I added additional items to the non_tensor_predictions list, which contains all keys that will get removed during the postprocess step. Augmenting this list cut down my inference outputs from ~200MB to ~12MB.
Full code for the if self._number_of_stages == 3 block:
if self._number_of_stages == 3:
  non_tensor_predictions = [
      k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
  # Add additional keys to delete during postprocessing
  non_tensor_predictions = non_tensor_predictions + [
      'raw_detection_scores', 'detection_multiclass_scores', 'anchors',
      'rpn_objectness_predictions_with_background',
      'detection_anchor_indices', 'refined_box_encodings',
      'class_predictions_with_background', 'raw_detection_boxes',
      'final_anchors', 'rpn_box_encodings', 'box_classifier_features']
  for k in non_tensor_predictions:
    tf.logging.info('Removing {0} from prediction_dict'.format(k))
    prediction_dict.pop(k)
  return prediction_dict
I think there's a more "proper" way to deal with this using signature definitions during the creation of the TF Serving image, but this worked for a quick and dirty fix.
I've run into the same problem. The exporter_main_v2 code states that the outputs should be:
and the following output nodes returned by the model.postprocess(..):
* `num_detections`: Outputs float32 tensors of the form [batch]
that specifies the number of valid boxes per image in the batch.
* `detection_boxes`: Outputs float32 tensors of the form
[batch, num_boxes, 4] containing detected boxes.
* `detection_scores`: Outputs float32 tensors of the form
[batch, num_boxes] containing class scores for the detections.
* `detection_classes`: Outputs float32 tensors of the form
[batch, num_boxes] containing classes for the detections.
I've submitted an issue on the TensorFlow object detection GitHub repo; I hope we will get feedback from the TensorFlow dev team.
The GitHub issue can be found here
If you are using the exporter_main_v2.py file to export your model, you can try this hack to work around the problem.
Just add the following code to the function _run_inference_on_images in exporter_lib_v2.py (the lines between the START and END markers are the addition):
detections[classes_field] = (
    tf.cast(detections[classes_field], tf.float32) + label_id_offset)
############# START ##########
ignored_model_output_names = ["raw_detection_boxes", "raw_detection_scores"]
for key in ignored_model_output_names:
  if key in detections.keys(): del detections[key]
############# END ##########
for key, val in detections.items():
  detections[key] = tf.cast(val, tf.float32)
Therefore, the generated model will not output the values of ignored_model_output_names.
Please let me know if this can solve your problem.
Another approach would be to alter the signatures of the saved model:
from os import path

import tensorflow as tf

model = tf.saved_model.load(path.join("models", "efficientdet_d7_coco17_tpu-32", "saved_model"))
infer = model.signatures["serving_default"]
outputs = infer.structured_outputs
# Drop the outputs that should not be served
for o in ["raw_detection_boxes", "raw_detection_scores"]:
    outputs.pop(o)
tf.saved_model.save(
    model,
    export_dir="export",
    signatures={"serving_default": infer},
    options=None
)

Different optimization behavior using np.random.normal instead of tf.random_normal

I'm looking into the code from https://github.com/AshishBora/csgm and see some strange behavior when using np.random.normal instead of tf.random_normal to initialize a tf.Variable. More concretely:
Instead of
z = tf.Variable(tf.random_normal((batch_size, hparams.n_z)), name='z')
I have
# in mnist_vae/src/model_def.py, line 74
z = tf.Variable(np.random.normal(size=(batch_size,
                                       hparams.n_z)).astype('float32'), name='z')
z is the variable that is optimized with the Adam optimizer with respect to an objective.
For a bit of background: there is a pre-trained neural network G, whose input z is drawn from a standard normal distribution using tf.random_normal. For a given z*, one wants to solve ẑ = argmin_z ||AG(z)-AG(z*)|| and check the reconstruction error ||G(ẑ)-G(z*)||. The resulting minimal value c(z*)=||G(ẑ)-G(z*)|| is, for several different z*, quite stable around a value c1. Now, I wasn't sure whether the optimization (Adam optimizer) might use the information that z comes from a standard normal distribution. So I replaced tf.random_normal with np.random.normal in the hope that the optimizer can't use that information (see the code above).
Unfortunately, the results are indeed different with np.random.normal: c(z*)=||G(ẑ)-G(z*)|| is, for several different z*, stable around a different value c2 (not c1). How can this be explained? Does the optimizer really use the information about the normal distribution (e.g. as a log-likelihood prior) in the optimization? My feeling says no, since it's only the initialization.
The code is given in https://github.com/AshishBora/csgm
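One verifiable difference between the two initializations, independent of the optimizer: in graph mode, a variable initialized from tf.random_normal draws fresh values every time its initializer is run, whereas a variable initialized from a NumPy array is baked into the graph as a constant and gets the same values on every re-initialization. The sketch below demonstrates only that. That the csgm code re-runs the initializer for z between restarts or batches is an assumption here, not something shown in the question. The names z_tf, z_np and the small shape are made up.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graph and session

shape = (2, 3)  # small stand-in for (batch_size, hparams.n_z)
z_tf = tf.Variable(tf.random_normal(shape), name="z_tf")
z_np = tf.Variable(np.random.normal(size=shape).astype("float32"), name="z_np")

draws_tf, draws_np = [], []
with tf.Session() as sess:
    for _ in range(2):
        # Re-running the initializers mimics a restart of the optimization.
        sess.run([z_tf.initializer, z_np.initializer])
        tf_val, np_val = sess.run([z_tf, z_np])
        draws_tf.append(tf_val)
        draws_np.append(np_val)

tf_changes = not np.allclose(draws_tf[0], draws_tf[1])
np_changes = not np.allclose(draws_np[0], draws_np[1])
print("tf.random_normal init changes on re-init:", tf_changes)
print("np.random.normal init changes on re-init:", np_changes)
```

If the surrounding code does rely on re-initialization to get different starting points, the NumPy version would start every run from the same z, which could shift the typical reconstruction error without the optimizer using any distributional information.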

RETURNN Custom Layer Search Mode Assertion Error

I've implemented a custom RETURNN layer (HMM Factorization), which works as intended during training, but throws an assertion error when used in search mode. The output of the layer is identical to that of a softmax layer.
Here's the config that was used : transformer + HMM Factorization
This was tested using the latest version of RETURNN.
The exact line that fails is (code link):
assert fixed_seq_len is not None
Here's the full error log (too large to paste here)
Here's the training initialisation
Does anybody have any ideas what the error could be?
Thanks!
This is actually a bug in RETURNN. I created a pull request here which should fix it, and it is merged now.
The problem was not with your custom layer, but rather with a layer inside your RecLayer, which was actually totally independent, i.e. this one:
'encoder_int': {'activation': None,
                'class': 'linear',
                'from': ['base:encoder'],
                'n_out': 1000,
                'with_bias': False}
It just depends on one base layer ("base:encoder"), nothing else. So RETURNN (correctly) optimizes this layer out of the recurrent loop, because it is independent.
However, it then sees that you are accessing this layer inside the loop, and as this is a loop over time, it assumes that the loop is over the time dimension of "base:encoder". It then tries to unroll "base:encoder" (TensorArray.unroll) given the seq len of the rec layer, but this fails because at that point it does not know the seq len of the rec layer.
My fix now does a more advanced check of whether this assumption is correct, i.e. that the loop is really over the same time dimension. The check is a bit fragile though, and I'm not sure it works correctly in all cases. However, I created a test case which reproduces your problem, and this is fixed now.

Can I change Inv operation into Reciprocal in an existing graph in Tensorflow?

I am working on an image classification problem with tensorflow. I have 2 different CNNs trained separately (in fact 3 in total, but I will deal with the third later), for different tasks and on an AWS (Amazon) machine. One tells if there is text in the image and the other one tells if the image is safe for work or not. Now I want to use them in a single script on my computer, so that I can put an image as input and get the results of both networks as output.
I load the two graphs in a single tensorflow Session, using the import_meta_graph API and the import_scope argument and putting each subgraph in a separate scope. Then I just use the restore method of the created saver, giving it the common Session as argument.
Then, in order to run inference, I retrieve the placeholders and final output with graph=tf.get_default_graph() and my_var=graph.get_operation_by_name('name').outputs[0] before using it in sess.run (I think I could just have put 'name' in sess.run instead of fetching the output tensor and putting it in a variable, but this is not my problem).
My problem is that the text CNN works perfectly fine, but the nsfw detector always gives me the same output, no matter the input (even with np.zeros()). I have tried both separately and it is the same story: text works but nsfw does not. So I don't think the problem comes from using two networks simultaneously.
I also tried on the original AWS machine I trained it on, and this time the nsfw CNN worked perfectly.
Both networks are very similar. I checked on Tensorboard whether everything was fine and I think it is. The differences are in the number of hidden units and the fact that I use batch normalization in the nsfw model and not in the text one. Now, why this title? I observed a warning when running the nsfw model that I didn't get when using only the text model:
W tensorflow/core/framework/op_def_util.cc:332] Op Inv is deprecated. It will cease to work in GraphDef version 17. Use Reciprocal.
So I thought maybe this was the reason, everything else being equal. I checked my GraphDef version, which seems to be 11, so Inv should still work in theory. By the way, the AWS machine uses tensorflow version 0.10 and I use version 0.12.
I noticed that the text network only had one Inv operation (found by filtering the names of the operations given by graph.get_operations()), and that the nsfw model had the same operation plus multiple Inv operations due to the batch normalization layers. As stated in the release notes, tf.inv has simply been renamed to tf.reciprocal, so I tried to change the names of the operations to Reciprocal with tf.group(), as proposed here, but it didn't work. I have seen that using tf.identity() and changing the name could also work, but from what I understand, tensorflow graphs are an append-only structure, so we can't really modify their operations (which seem to be immutable anyway).
The thing is:
as I said, the Inv operation should still work in my GraphDef version;
this is only a warning;
the Inv operations only appear under name scopes that begin with 'gradients' so, from my understanding, this shouldn't be used for inference;
the text model also has an Inv operation.
For these reasons, I have serious doubts about my diagnosis. So my final questions are:
do you have another diagnosis?
if mine is correct, is it possible to replace Inv operations with Reciprocal operations, or do you have any other solution?
After a thorough examination of the output of relevant nodes, with the help of Tensorboard, I am now pretty certain that the renaming of Inv to Reciprocal has nothing to do with my problem.
It appears that the last batch normalization layer removes almost all variance from its output as the inputs vary. I will ask why elsewhere.
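One situation in which a batch normalization layer produces the same output for every input can be reproduced in a few lines of NumPy: if the layer normalizes with the statistics of the current batch (training mode) and the batch holds a single example, the normalized value is always zero and the output collapses to the offset beta. This is only a hypothetical cause, not a diagnosis of the graph above. The batch_norm function and the values of gamma, beta and the moving statistics are made up for illustration.

```python
import numpy as np


def batch_norm(x, gamma, beta, mean, var, eps=1e-3):
    """Plain batch normalization formula for a fully connected layer."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


gamma, beta = 1.5, 0.25
inputs = (0.0, 3.0, -7.0)

# Training mode: statistics computed over the current batch (here, one example).
train_mode = []
for value in inputs:
    x = np.array([[value]], dtype=np.float32)  # shape (batch=1, features=1)
    out = batch_norm(x, gamma, beta, x.mean(axis=0), x.var(axis=0))
    train_mode.append(float(out[0, 0]))

# Inference mode: fixed moving statistics (made-up values: mean 0, variance 1).
infer_mode = []
for value in inputs:
    x = np.array([[value]], dtype=np.float32)
    out = batch_norm(x, gamma, beta, 0.0, 1.0)
    infer_mode.append(float(out[0, 0]))

print("training-mode outputs: ", train_mode)
print("inference-mode outputs:", infer_mode)
```

In the training-mode loop every output equals beta regardless of the input, while the inference-mode outputs follow the input.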

Force copy of tensor when enqueuing

First, I'm not sure if the title is very good, but it was the best I could come up with given my understanding of the situation.
The background is that I'm trying to understand how queues work in tensorflow and ran into the following issue which puzzled me.
I have a variable n, which I enqueue to a tf.FIFOQueue, and then I increment the variable. This is repeated several times, and one would expect a result similar to 0, 1, 2, ... However, when emptying the queue all values are the same.
More precisely, the code is as follows:
from __future__ import print_function
import tensorflow as tf
q = tf.FIFOQueue(10, tf.float32)
n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n+1)
enqueue = q.enqueue(n)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
Which I expect would print:
0.0
1.0
2.0
Instead I get the following result:
3.0
3.0
3.0
It seems like I'm pushing some pointer to n to the queue, instead of the actual value, which is what I want. However, I don't really have any actual understanding of tensorflow internals, so maybe something else is going on?
I tried changing
enqueue = q.enqueue(n)
to
enqueue = q.enqueue(tf.identity(n))
since the answers to How can I copy a variable in tensorflow and In TensorFlow, what is tf.identity used for? give me the impression that it might help, but it does not change the result. I also tried adding a tf.control_dependencies(), but again, all values are the same when dequeueing.
Edit: The output above is from running the code on a computer with a single CPU. When checking whether different versions of tensorflow behaved differently, I noticed that running the code on a computer with a CPU and a GPU gives the "expected" result. Indeed, if I run with CUDA_VISIBLE_DEVICES="" I get the result above, and with CUDA_VISIBLE_DEVICES="0" I get the "expected" result.
To force a non-caching read you can do
q.enqueue(tf.add(n, 0))
This is what's currently done by the batch-normalization layer to force a copy.
Semantics of how variables get read vs. referenced are in the process of getting revamped, so they are temporarily non-intuitive. In particular, I expected q.enqueue(v.read_value()) to force a non-caching read, but it doesn't fix your example on TF 0.12rc1.
Using a GPU machine puts the variable on the GPU, while the queue is CPU-only, so the enqueue op forces a GPU->CPU copy.
In case it helps, I've found that the other answers, although correct, do not work for all dtypes.
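The reference-versus-copy effect described in this thread can be illustrated without TensorFlow. The sketch below is only an analogy in NumPy, with a deque standing in for the queue, and says nothing about how TensorFlow implements variables or queues. Appending the array itself stores a reference to one buffer that keeps changing, while appending the result of an arithmetic op (adding zero) stores a fresh array each time. The names by_reference and by_copy are made up.

```python
from collections import deque

import numpy as np

n = np.zeros(1, dtype=np.float32)  # stand-in for the variable n
by_reference = deque()
by_copy = deque()

for _ in range(3):
    by_reference.append(n)   # stores a reference to the same buffer
    by_copy.append(n + 0)    # the addition allocates a new array (a copy)
    n += 1                   # in-place increment, like n.assign(n + 1)

ref_values = [float(item[0]) for item in by_reference]
copy_values = [float(item[0]) for item in by_copy]
print(ref_values)   # every entry reflects the final value of n
print(copy_values)  # each entry kept the value at enqueue time
```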
For example, this works fine with floats or ints but fails when n is a string tensor:
q.enqueue(tf.add(n, 0))
This one fails when the queue uses tuples with heterogeneous types (e.g., ints and floats):
q.enqueue_many([[n]])
So, if you find yourself in either of these situations, try this instead:
q.enqueue(tf.add(n, tf.zeros_like(n)))
Or, to enqueue a tuple t:
q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])
That works even for string tensors and heterogeneous tuple types.
Hope it helps!
--
Update: it looks like tf.bool types do not work with tf.zeros_like(). For those, an explicit cast to an integer type might be needed.