Tensorflow Warning while saving SparseTensor - tensorflow

I'm working on a TensorFlow project where 'targets' is defined as:
targets = tf.sparse_placeholder(tf.int32, name='targets')
Now saving my model with saver.save(sess, model_path, meta_graph_suffix='meta', write_meta_graph=True) gives me the following error:
WARNING:tensorflow:Error encountered when serializing targets.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'SparseTensor' object has no attribute 'name'
I believe the warning is printed in the following lines of code: https://github.com/tensorflow/tensorflow/blob/f974e8d0c2420c6f7e2a2791febb4781a266823f/tensorflow/python/training/saver.py#L1452
Reloading the model with saver.restore(session, save_path) seems to work though.
Has anyone seen this issue before? Why would serializing a SparseTensor give that warning? Is there any way to avoid this warning?
I'm using TensorFlow version 0.10.0rc0 python 2.7 GPU version. I can't provide a minimal example, it doesn't happen all the time, only in certain configurations. And I can't share the model I currently have this issue with.

The component placeholders (for indices, values, and possibly shape) somehow get added to some collections. If you trace through the code in saver.py, you can see ops.get_all_collection_keys() being used.
This should be a benign warning. I will forward to the team to see if something can be done to improve this handling.

The warning means that a SparseTensor type of operation has been added to a collection whose to_proto() implementation expects a "name" field.
I'd consider this a bug if you intend to restore the complete graph from meta_graph, including all the Python objects, and you should find out which operation added that SparseTensor into a collection.
If you never intend to restore from meta_graph, then you can ignore this error.
Hope that helps.
Sherry

Related

GluonCV inference with finetuned model - “Please make sure source and target networks have the same prefix” error

I used GluonCV to finetune an object detection model in order to recognize some custom classes, mostly following the related tutorial.
I tried using both “ssd_512_resnet50_v1_coco” and “ssd_512_mobilenet1.0_coco” as base models, and the training process ended successfully (the accuracy on the validation dataset is reasonably high).
The problem is, I tried running inference with the newly trained model, by using for example:
classes = ["CML_mug", "person"]
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
classes=classes,
pretrained_base=False,
ctx=ctx)
net.load_params("saved_weights/-0070.params", ctx=ctx)
but I get the error:
AssertionError: Parameter 'mobilenet0_conv0_weight' is missing in file: saved_weights/CML_mobilenet_00/-0000.params, which contains parameters: 'ssd0_ssd0_mobilenet0_conv0_weight', 'ssd0_ssd0_mobilenet0_batchnorm0_gamma', 'ssd0_ssd0_mobilenet0_batchnorm0_beta', ..., 'ssd0_ssd0_ssdanchorgenerator2_anchor_2', 'ssd0_ssd0_ssdanchorgenerator3_anchor_3', 'ssd0_ssd0_ssdanchorgenerator4_anchor_4', 'ssd0_ssd0_ssdanchorgenerator5_anchor_5'. Please make sure source and target networks have the same prefix.
So, it seems the network parameters are named differently in the .params file and in the model I’m using for inference. Specifically, in the .params file, the name of the network weights is prefixed by the string “ssd0_ssd0_”, which lead to the error when invoking net.load_parameters.
I did this whole procedure a few times in the past without having problems, did anything change? I’m running it on Ubuntu 18.04, with mxnet-mkl (1.6.0) and gluoncv (0.7.0).
I tried loading the .params file by:
from mxnet import nd
model = nd.load(0070.param)
and I wanted to modify it and remove the “ssd0_ssd0_” string that is causing the problem.
I’m trying to navigate the dictionary, but between the keys I only found a:
ssd0_resnetv10_conv0_weight
so, slightly different than indicated in the error.
Anyway, this way of fixing the issue would be a little cumbersome, I’d prefer a more direct way.
Ok, fixed it. Basically, during training I was saving the .params file by using:
net.export(param_file)
and, as I said, loading them during inference by:
net.load_parameters(param_file)
However, it doesn’t work this way, but it does if instead of export I use:
net.save_parameters(param_file)

Tensorflow: slicing PartitionedVariable

I have created a TensorFlow PartitionedVariable object. Unfortunately I need to slice it at some other point of my program (not according to how the variable is partitioned). Unfortunately when I try the obvious (X[count:]), I get the error TypeError: PartitionedVariable object has no attribute getitem. Is it a bug, or is there any workaround of how to slice PartitionedVariable?
I'm afraid you'll have to use tf.slice().
tf.slice(X, [count], [-1])
in your case.

How is tf.summary.tensor_summary meant to be used?

TensorFlow provides a tf.summary.tensor_summary() function that appears to be a multidimensional variant of tf.summary.scalar():
tf.summary.tensor_summary(name, tensor, summary_description=None, collections=None)
I thought it could be useful for summarizing inferred probabilities per class ... somewhat like
op_summary = tf.summary.tensor_summary('classes', some_tensor)
# ...
summary = sess.run(op_summary)
writer.add_summary(summary)
However it appears that TensorBoard doesn't provide a way to display these summaries at all. How are they meant to be used?
I cannot get it to work either. It seems like that feature is still under development. See this video from the TensorFlow Dev Summit that states that the tensor_summary is still under development (starting at 9:17): https://youtu.be/eBbEDRsCmv4?t=9m17s. It will probably be better defined and examples should be provided in the future.

Can I change Inv operation into Reciprocal in an existing graph in Tensorflow?

I am working on an image classification problem with tensorflow. I have 2 different CNNs trained separately (in fact 3 in total but I will deal with the third later), for different tasks and on a AWS (Amazon) machine. One tells if there is text in the image and the other one tells if the image is safe for work or not. Now I want to use them in a single script on my computer, so that I can put an image as input and get the results of both networks as output.
I load the two graphs in a single tensorflow Session, using the import_meta_graph API and the import_scope argument and putting each subgraph in a separate scope. Then I just use the restore method of the created saver, giving it the common Session as argument.
Then, in order to run inference, I retrieve the placeholders and final output with graph=tf.get_default_graph() and my_var=graph.get_operation_by_name('name').outputs[0] before using it in sess.run (I think I could just have put 'name' in sess.run instead of fetching the output tensor and putting it in a variable, but this is not my problem).
My problem is the text CNN works perfectly fine, but the nsfw detector always gives me the same output, no matter the input (even with np.zeros()). I have tried both separately and same story: text works but not nsfw. So I don't think the problem comes from using two networks simultaneaously.
I also tried on the original AWS machine I trained it on, and this time the nsfw CNN worked perfectly.
Both networks are very similar. I checked on Tensorboard if everything was fine and I think it is ok. The differences are in the number of hidden units and the fact that I use batch normalization in the nsfw model and not in the text one. Now why this title ? I observed that I had a warning when running the nsfw model that I didn't have when using only the text model:
W tensorflow/core/framework/op_def_util.cc:332] Op Inv is deprecated. It will cease to work in GraphDef version 17. Use Reciprocal.
So I thougt maybe this was the reason, everything else being equal. I checked my GraphDef version, which seems to be 11, so Inv should still work in theory. By the way the AWS machine use tensroflow version 0.10 and I use version 0.12.
I noticed that the text network only had one Inv operation (via a filtering on the names of the operations given by graph.get_operations()), and that the nsfw model had the same operation plus multiple Inv operations due to the batch normalization layers. As precised in the release notes, tf.inv has simply been renamed to tf.reciprocal, so I tried to change the names of the operations to Reciprocal with tf.group(), as proposed here, but it didn't work. I have seen that using tf.identity() and changing the name could also work, but from what I understand, tensorflow graphs are an append-only structure, so we can't really modify its operations (which seems to be immutable anyway).
The thing is:
as I said, the Inv operation should still work in my GraphDef version;
this is only a warning;
the Inv operations only appear under name scopes that begin with 'gradients' so, from my understanding, this shouldn't be used for inference;
the text model also have an Inv operation.
For these reasons, I have a big doubt on my diagnosis. So my final questions are:
do you have another diagnosis?
if mine is correct, is it possible to replace Inv operations with Reciprocal operations, or do you have any other solution?
After a thorough examination of the output of relevant nodes, with the help of Tensorboard, I am now pretty certain that the renaming of Inv to Reciprocal has nothing to do with my problem.
It appears that the last batch normalization layer eliminates almost any variance of its output when the inputs varies. I will ask why elsewhere.

MXNET build model error on r

When I try to use mxnet to build a feedforward model it appeared the following error:
Error in mx.io.internal.arrayiter(as.array(data), as.array(label), unif.rnds, :
basic_string::_M_replace_aux
I follow the R regression example on mxnet website but I change the data into my own data which contains 109 examples and 1876 variables. The first several steps can run without error until ran the model building step. I just can't understand the error information mean. I wonder that it is because of my dataset or the way I deal with the data.
Can you provide the code snippet you are using? That gives more details on the issue. Also, any stacktrace will be useful.
You get this error message mainly due to invalid column/row access and shape (dimension) mismatch. Can you verify if you are using correct "index" values in creating matrix. Let me know if this fixes the issue.
However, MXNet can be better at printing details about error in the stacktrace. I have created a issue to follow up on this - https://github.com/dmlc/mxnet/issues/4206