UnicodeDecodeError from tf.train.import_meta_graph - tensorflow

I serialized a Tensorflow model with the following code ...
save_path = self.saver.save(self.session, os.path.join(self.logdir, "model.ckpt"), global_step)
logging.info("Model saved in file: %s" % save_path)
... and I'm now trying to restore it from scratch in a separate file using the following code:
saver = tf.train.import_meta_graph(PROJ_DIR + '/logs/default/model.ckpt-54.meta')
session = tf.Session()
saver.restore(session, PROJ_DIR + '/logs/default/model.ckpt-54')
print('Model restored')
When tf.train.import_meta_graph is called, the following exception is thrown:
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Traceback (most recent call last):
File "/home/reid/projects/research/ccg/taggerflow_modified/test/tf_restore.py", line 4, in <module>
saver = tf.train.import_meta_graph(PROJ_DIR + '/logs/default/model.ckpt-54.meta')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1711, in import_meta_graph
read_meta_graph_file(meta_graph_or_file), clear_devices)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1563, in read_meta_graph_file
text_format.Merge(file_content.decode("utf-8"), meta_graph_def)
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 1: invalid start byte
For reference, here are the first few lines of <PROJ_DIR>/logs/default/model.ckpt-54.meta:
<A7>:^R<A4>:
9
^CAdd^R^F
^Ax"^AT^R^F
^Ay"^AT^Z^F
^Az"^AT"^Z
^AT^R^Dtype:^O
^M2^K^S^A^B^D^F^E^C ^R^G
I think that Tensorflow is using a different encoding when serializing vs when deserializing. How do we specify the encoding that Tensorflow uses when serializing/deserializing? Or is the solution something different?

I was facing the same issue. Have you made sure that, in addition to the
.meta, .data-00000-of-00001, and .index files,
the file named 'checkpoint' is also present in the directory you're loading the model from?
My issue was resolved once I made sure of this. Hope this helps!
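As a quick sanity check before restoring, you can verify the expected files and let tf.train.latest_checkpoint (which reads the 'checkpoint' file) resolve the prefix for you. A minimal sketch, assuming the TF 1.x API from the question and that PROJ_DIR is defined:

import os
import tensorflow as tf

ckpt_dir = os.path.join(PROJ_DIR, 'logs/default')

# Verify the companion files are present alongside the .meta file.
for name in ['checkpoint', 'model.ckpt-54.meta',
             'model.ckpt-54.index', 'model.ckpt-54.data-00000-of-00001']:
    print(name, os.path.exists(os.path.join(ckpt_dir, name)))

# latest_checkpoint reads the 'checkpoint' file and returns the prefix
# (e.g. .../model.ckpt-54), or None if that file is missing.
ckpt_prefix = tf.train.latest_checkpoint(ckpt_dir)
saver = tf.train.import_meta_graph(ckpt_prefix + '.meta')
session = tf.Session()
saver.restore(session, ckpt_prefix)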

Related

onnxruntime: Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed

I followed the example code from the library linked below, but it didn't work.
[Library] https://github.com/notAI-tech/NudeNet/
Code
from nudenet import NudeClassifier
import onnxruntime
classifier = NudeClassifier()
classifier.classify('/home/coremax/Downloads/DETECTOR_AUTO_GENERATED_DATA/IMAGES/3FEF7B75-3823-4153-8490-87483AAC6ABC.jpg')
I have also tried a previous solution from Stack Overflow, but it didn't work either:
Error on running Super Resolution Model from ONNX
Traceback (most recent call last):
File "/snap/pycharm-community/276/plugins/python-ce/helpers/pydev/pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/276/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/coremax/Documents/NudeNet/main.py", line 3, in <module>
classifier = NudeClassifier()
File "/home/coremax/Documents/NudeNet/nudenet/classifier.py", line 37, in __init__
self.nsfw_model = onnxruntime.InferenceSession(model_path)
File "/home/coremax/anaconda3/envs/AdultNET/lib/python3.6/site-packages/onnxruntime/capi/session.py", line 158, in __init__
self._load_model(providers or [])
File "/home/coremax/anaconda3/envs/AdultNET/lib/python3.6/site-packages/onnxruntime/capi/session.py", line 166, in _load_model
True)
RuntimeError: /onnxruntime_src/onnxruntime/core/session/inference_session.cc:238 onnxruntime::InferenceSession::InferenceSession(const onnxruntime::SessionOptions&, const onnxruntime::Environment&, const string&) status.IsOK() was false. Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.
I know it's late, but I hope this helps someone build a very useful piece of software.
Why it fails
The error occurs because, for NudeClassifier to work, it has to download the ONNX model from this link, but GitHub now requires you to be logged in to download files, so the NudeClassifier constructor fails when it tries to download the model.
Solution
1. Create a folder named .NudeNet/ in your user's home folder.
2. Download the model from this link.
3. Save the model in the folder you created in step one.
You should now have the model at ~/.NudeNet/classifier_model.onnx, and you're ready to go. Good luck!
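A minimal Python sketch of those steps; the download URL is a placeholder that you should replace with the actual link for classifier_model.onnx referenced above:

import os
import urllib.request

# Placeholder: substitute the real download link for classifier_model.onnx.
MODEL_URL = 'https://example.com/classifier_model.onnx'

model_dir = os.path.join(os.path.expanduser('~'), '.NudeNet')
model_path = os.path.join(model_dir, 'classifier_model.onnx')

os.makedirs(model_dir, exist_ok=True)   # step 1: create ~/.NudeNet/
if not os.path.exists(model_path):      # steps 2-3: download and save the model
    urllib.request.urlretrieve(MODEL_URL, model_path)

# With the model already on disk, the constructor no longer needs to download it.
from nudenet import NudeClassifier
classifier = NudeClassifier()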

TFX Transform Rank Mismatch While Loading/Applying TFX Beam Transform Graph

I've already successfully fit a TFTransformOutput to some data (in this case, the Census dataset from UCI, common among the TF and TFX examples). I try to apply the transformer with the method transform_raw_features(raw_features), but I keep getting the error:
ValueError: Node 'transform/transform/inputs/workclass_copy' has an _output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
Digging into the source code, it seems the error originates in saved_transform_io in the method _partially_apply_saved_transform_impl while doing:
saver = tf_saver.import_meta_graph(meta_graph_def, import_scope=import_scope,
input_map=input_map)
I examined the meta_graph_def produced by TFX TFTransform and Beam and noticed that the graph indeed has a series of copied variables with input/output rank differences. However, that is nothing I have control over.
The column in the error message is "workclass", which is a simple categorical column. What might I be doing incorrectly? What is the best way to debug this? At this point, I've already dug deep into the TF source code, but the error seems to originate with how the TFTransform graph was written, and I'm not sure what levers I have to change or fix that.
This is using TF Transform v0.9 and the corresponding TF v1.9.
Traceback (most recent call last):
  File "/home/sahmed/workspace/ml_playground/TFX-TFT/trainers.py", line 449, in parse_csv
    transformed_stuff=xformer.transform_raw_features(raw_features)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/output_wrapper.py", line 122, in transform_raw_features
    self.transform_savedmodel_dir, raw_features))
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 360, in partially_apply_saved_transform_internal
    saved_model_dir, logical_input_map, tensor_replacement_map)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 218, in _partially_apply_saved_transform_impl
    input_map=input_map)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
    **kwargs)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 422, in import_graph_def
    raise ValueError(str(e))
ValueError: Node 'transform/transform/inputs/workclass_copy' has an _output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
The issue is likely that the shape of the workclass tensor is incompatible with what transform_raw_features expects.
TFTransformOutput.transform_raw_features() expects these features to have the same characteristics as described in the metadata given to tft.AnalyzeDataset(), similar to how it's done in this example:
https://github.com/tensorflow/transform/blob/master/examples/simple_example.py#L63
Could you take a look at the metadata used in your pipeline and verify that it is compatible with the data fed into TFTransformOutput.transform_raw_features()?
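A minimal sketch of that pattern, assuming a hypothetical schema in the tf.Transform 0.9-era API (tf_transform_output stands for the TFTransformOutput from the question). The key point for this error is rank: a feature declared as a scalar must be fed with shape [batch_size], not [batch_size, 1] and not a bare scalar:

import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema

# Hypothetical metadata: 'workclass' declared as a scalar string feature.
raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.from_feature_spec({
    'workclass': tf.FixedLenFeature([], tf.string),
}))

# When applying the transform, the raw tensor must match that declaration:
# rank 1 overall (a batch of scalars), not rank 2 and not rank 0.
raw_features = {
    'workclass': tf.placeholder(tf.string, shape=[None]),
}
transformed = tf_transform_output.transform_raw_features(raw_features)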

pix2pixHD error with own dataset

I am trying to generate my own images using the pix2pixHD pre-trained model. GitHub repo found here.
The images in the dataset have to be grayscale with no alpha channel. The images in the repo have a depth of 16 bits per sample, and I have images at both 8 and 16 bits per sample.
When I check both my images and the images in the repo using sips -g all, this is the output I get:
pixelWidth: 2048
pixelHeight: 1024
typeIdentifier: public.png
format: png
formatOptions: default
dpiWidth: 72.000
dpiHeight: 72.000
samplesPerPixel: 1
bitsPerSample: 16
hasAlpha: no
space: Gray
The strange thing is that it works with the images that have 8 bits per sample.
This is the output I get:
(Images: grayscale input, converted label map, final output.)
When I run test.py with the 16-bits-per-sample images, it doesn't work.
This is the error it gives me:
model [Pix2PixHDModel] was created
Traceback (most recent call last):
File "test.py", line 26, in <module>
for i, data in enumerate(dataset):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 210, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/paperspace/Documents/pix2pixHD/data/aligned_dataset.py", line 41, in __getitem__
label_tensor = transform_label(label) * 255.0
File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 309, in __mul__
return self.mul(other)
TypeError: mul received an invalid combination of arguments - got (float), but expected one of:
* (int value)
didn't match because some of the arguments have invalid types: (float)
* (torch.IntTensor other)
didn't match because some of the arguments have invalid types: (float)
I am fairly new to TensorFlow and I have never used PyTorch before.
Any idea what this error means and how I can resolve it?
Yes, I think I can help you.
I haven't checked the repository, but from the error trace the problem appears to be the following:
You are performing a multiplication between the output of transform_label(label) (presumably a tensor) and the scalar 255.0. This is fine as long as the scalar and the tensor have the same data type. From the error trace, however, it looks as if the output of transform_label() is of type Int/Long, while 255.0 is a float.
I suggest you try 255 or int(255.0) instead of 255.0.
If this does not resolve your problem, let me know what data type the output of transform_label() is.
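A minimal sketch of both options, assuming transform_label(label) returns an integer tensor as the trace suggests (the error reproduces on the old PyTorch versions used by the repo, where an IntTensor could not be multiplied by a Python float):

import torch

label_tensor = torch.IntTensor([1, 2, 3])  # stand-in for transform_label(label)

# Option 1 (the suggestion above): multiply by an integer scalar.
scaled = label_tensor * 255

# Option 2: cast the tensor to float first, then scale by the float.
scaled_f = label_tensor.float() * 255.0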

Using graph_metrics.py with a saved graph

I want to view statistics of my model by saving my graph to a file and then running graph_metrics.py.
I have tried a few different things to write the file; my best effort is:
tf.train.write_graph(session.graph_def, ".", "my_graph", as_text=True)
But here's what happens:
$ python ./util/graph_metrics.py --noinput_binary --graph my_graph
Traceback (most recent call last):
File "./util/graph_metrics.py", line 137, in <module>
tf.app.run()
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "./util/graph_metrics.py", line 85, in main
FLAGS.batch_size)
File "./util/graph_metrics.py", line 109, in calculate_graph_metrics
input_tensor = sess.graph.get_tensor_by_name(input_layer)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2531, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2385, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2427, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'Mul:0' refers to a Tensor which does not exist. The operation, 'Mul', does not exist in the graph."
Is there a complete working example of saving a graph, then analyzing it with graph_metrics.py?
This process seems to involve a magic incantation that I haven't yet discovered.
The error you're hitting is because you need to specify the name of your own input node with --input_layer= (it just defaults to Mul:0 because that's what we use in one of our Inception models):
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/graph_metrics.py#L51
The graph_metrics script is still very much a work in progress unfortunately, and you may hit problems with shape inference, but hopefully this should get you past the initial hurdle.
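For example, the invocation from the question would become something like this, where input_tensor_name:0 is a placeholder for whatever your graph actually calls its input tensor (you can find the name by printing the node names in session.graph_def):

$ python ./util/graph_metrics.py --noinput_binary --graph my_graph \
    --input_layer=input_tensor_name:0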

TensorFlow distributed master worker save fails silently; the checkpoint file isn't created but no exception is raised

In a distributed TensorFlow environment, the master worker fails to save a checkpoint.
saver.save returns successfully (it does not raise an exception and it returns the checkpoint file path), but the returned checkpoint file does not exist.
This does not match the description in the TensorFlow API docs.
Why? How do I fix it?
=============
The related code is below:
def def_ps(self):
    self.saver = tf.train.Saver(max_to_keep=100, keep_checkpoint_every_n_hours=3)

def save(self, idx):
    ret = self.saver.save(self.sess, self.save_model_path, global_step=None,
                          write_meta_graph=False)
    if not os.path.exists(ret):
        msg = "save model for %u path %s not exists." % (idx, ret)
        lg.error(msg)
        raise Exception(msg)
=============
The log is below:
2016-06-02 21:33:52,323 root ERROR save model for 2 path model_path/rl_model_2 not exists.
2016-06-02 21:33:52,323 root ERROR has error:save model for 2 path model_path/rl_model_2 not exists.
Traceback (most recent call last):
File "d_rl_main_model_dist_0.py", line 755, in run_worker
model_a.save(next_model_idx)
File "d_rl_main_model_dist_0.py", line 360, in save
Trainer.save(self,save_idx)
File "d_rl_main_model_dist_0.py", line 289, in save
raise Exception(msg);
Exception: save model for 2 path model_path/rl_model_2 not exists.
===========
This does not match the TensorFlow API, which defines Saver.save as below:
https://www.tensorflow.org/versions/master/api_docs/python/state_ops.html#Saver
tf.train.Saver.save(sess, save_path, global_step=None, latest_filename=None, meta_graph_suffix='meta', write_meta_graph=True)
Returns:
A string: path at which the variables were saved. If the saver is sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn' is the number of shards created.
Raises:
TypeError: If sess is not a Session.
ValueError: If latest_filename contains path components.
The tf.train.Saver.save() method is a little... surprising when you run in distributed mode. The actual file is written by the process that holds the tf.Variable op, which is typically a process in "/job:ps" if you've used the example code to set things up. This means that you need to look in save_path on each of the remote machines that have variables to find the checkpoint files.
Why is this the case? The Saver API implicitly assumes that all processes have the same view of a shared file system, like an NFS mount, because that is the typical setup we use at Google. We've added support for Google Cloud Storage in the latest nightly versions of TensorFlow, and are investigating HDFS support as well.
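A minimal sketch of the implication, assuming a hypothetical mount point that every job in the cluster can see; on such a shared filesystem the path returned by save() is also where the files actually appear, whereas with job-local disks they land on the /job:ps hosts:

import os
import tensorflow as tf

# Hypothetical shared mount (e.g. NFS) visible to all jobs in the cluster.
SHARED_DIR = '/mnt/shared/model_path'

v = tf.Variable(0, name='step')  # in distributed mode this lives on /job:ps
saver = tf.train.Saver(max_to_keep=100, keep_checkpoint_every_n_hours=3)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ckpt_path = saver.save(sess, os.path.join(SHARED_DIR, 'rl_model'),
                           write_meta_graph=False)
    # checkpoint_exists handles both the V1 (file == prefix) and V2
    # (prefix.index / prefix.data-*) checkpoint formats.
    print('visible to this worker:', tf.train.checkpoint_exists(ckpt_path))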