RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for Tensorflow/workspace/models/my_ssd_mobnet/./model.ckpt-5 - object-detection

I am training an automatic number plate recognition model on Google Colab, and during the process I run these lines:
import os

import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import config_util

# `files` and `paths` are dicts defined earlier in the notebook.
# Load pipeline config and build a detection model
configs = config_util.get_configs_from_pipeline_file(files['PIPELINE_CONFIG'])
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(paths['CHECKPOINT_PATH'], 'ckpt-5')).expect_partial()

@tf.function
def detect_fn(image):
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    detections = detection_model.postprocess(prediction_dict, shapes)
    return detections
and I got the following error:
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for Tensorflow/workspace/models/my_ssd_mobnet/./model.ckpt-5
NotFoundError: Error when restoring from checkpoint or SavedModel at Tensorflow/workspace/models/my_ssd_mobnet/./model.ckpt-5: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for Tensorflow/workspace/models/my_ssd_mobnet/./model.ckpt-5
Please double-check that the path is correct. You may be missing the checkpoint suffix (e.g. the '-1' in 'path/to/ckpt-1').
I tried changing the path, but I got the same error. Any suggestions?
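The error reports that nothing matches model.ckpt-5, and the hint asks to double-check the "-N" suffix, so a first step is to see which checkpoint prefixes actually exist in the directory. The stdlib-only sketch below is an assumption-laden helper, not part of the Object Detection API: list_checkpoint_prefixes and resolve_checkpoint are hypothetical names, and the code assumes TF2 object-based checkpoints are stored as '<prefix>.index' plus '<prefix>.data-XXXXX-of-YYYYY' shards, so the value to restore is the prefix without an extension. Because the path is relative, it also prints the absolute path it resolved against the current working directory.

```python
import os
import re


def list_checkpoint_prefixes(ckpt_dir):
    """Return the checkpoint prefixes (e.g. 'ckpt-5') in ckpt_dir, lowest step first.

    Each checkpoint is assumed to be '<prefix>.index' plus data shards, so the
    prefix to restore is the .index file name without its extension.
    """
    if not os.path.isdir(ckpt_dir):
        # Relative paths resolve against the current working directory.
        raise FileNotFoundError(
            f"checkpoint directory not found: {os.path.abspath(ckpt_dir)}")
    prefixes = [name[:-len(".index")]
                for name in os.listdir(ckpt_dir) if name.endswith(".index")]

    def step(prefix):
        # Sort numerically on the trailing '-<step>' so ckpt-12 comes after ckpt-5.
        match = re.search(r"-(\d+)$", prefix)
        return int(match.group(1)) if match else -1

    return sorted(prefixes, key=step)


def resolve_checkpoint(ckpt_dir, prefix=None):
    """Return '<ckpt_dir>/<prefix>', defaulting to the highest-numbered checkpoint."""
    available = list_checkpoint_prefixes(ckpt_dir)
    if not available:
        raise FileNotFoundError(
            f"no '*.index' checkpoint files in {os.path.abspath(ckpt_dir)}")
    if prefix is None:
        prefix = available[-1]
    elif prefix not in available:
        raise FileNotFoundError(f"{prefix!r} not found; available: {available}")
    return os.path.join(ckpt_dir, prefix)
```

In the notebook above this could be used as ckpt.restore(resolve_checkpoint(paths['CHECKPOINT_PATH'], 'ckpt-5')).expect_partial(); if the requested step was never written, the error lists the prefixes that do exist. TensorFlow's own tf.train.latest_checkpoint(paths['CHECKPOINT_PATH']) returns the newest prefix recorded in the directory's 'checkpoint' state file.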

Related

Deploy a Tensorflow model built using TF2 in TF1 format (no SavedModel bundles found!)

I used the Recommenders library (https://github.com/microsoft/recommenders) to train an NCF recommendation model. I am now running into issues deploying it with the Amazon SageMaker TensorFlowModel class.
The model is saved using the following code:
def save(self, dir_name):
    """Save model parameters in `dir_name`

    Args:
        dir_name (str): directory name, which should be a folder name instead of file name
            we will create a new directory if not existing.
    """
    # save trained model
    if not os.path.exists(dir_name):
        os.makedirs(dir_name)
    saver = tf.compat.v1.train.Saver()
    saver.save(self.sess, os.path.join(dir_name, MODEL_CHECKPOINT))
The files exported in the process are 'checkpoint', 'model.ckpt.data-00000-of-00001', 'model.ckpt.index' and 'model.ckpt.meta'.
They are packaged with the following structure:
- model.tar.gz
  - 00000000
    - checkpoint
    - model.ckpt.data-00000-of-00001
    - model.ckpt.index
    - model.ckpt.meta
I have tried various deployment approaches, but they all give the same error. Here is the latest one, which I implemented following this example: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/pytorch_bert/code/inference_code.py
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(
    entry_point="tf_inference.py",
    model_data=zipped_model_path,
    role=role,
    model_version='1',
    framework_version="2.7"
)
predictor = model.deploy(
    initial_instance_count=1, instance_type="ml.g4dn.2xlarge", endpoint_name='endpoint-name3'
)
Every attempt ends with the same error:
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 502, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 482, in start
    self._create_tfs_config()
  File "/sagemaker/serve.py", line 153, in _create_tfs_config
    raise ValueError("no SavedModel bundles found!")
These two links helped me resolve the issue:
https://github.com/aws/sagemaker-python-sdk/issues/599
https://www.tensorflow.org/guide/migrate/saved_model#1_save_the_graph_as_a_savedmodel_with_savedmodelbuilder
SageMaker expects a particular directory structure that you need to follow strictly. The first link describes the expected top-level directories, and the second shows how to save the model as a SavedModel in TF1 and TF2.
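The error means the serving script found no SavedModel bundle in the archive, and the archive listed in the question contains only tf.compat.v1.train.Saver checkpoint files. Assuming the model has first been exported as a SavedModel (for example with tf.compat.v1.saved_model.Builder in TF1 or tf.saved_model.save in TF2, as the second link describes), the stdlib-only sketch below packs that export under a numeric version folder and lists which folders in a model.tar.gz actually contain saved_model.pb. package_saved_model and find_saved_model_bundles are hypothetical helpers, and the '<version>/saved_model.pb' layout is an assumption based on the structure discussed in the first link; check it against the SageMaker documentation for your container version.

```python
import os
import posixpath
import tarfile


def package_saved_model(saved_model_dir, tar_path, version="1"):
    """Pack an exported SavedModel folder into tar_path under '<version>/'."""
    pb_path = os.path.join(saved_model_dir, "saved_model.pb")
    if not os.path.isfile(pb_path):
        # Checkpoint files (.index/.data/.meta) alone are not a SavedModel.
        raise FileNotFoundError(f"{pb_path} not found; export a SavedModel first")
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname places saved_model.pb and variables/ under the version folder.
        tar.add(saved_model_dir, arcname=version)
    return tar_path


def find_saved_model_bundles(tar_path):
    """Return the folders inside tar_path that contain a saved_model.pb."""
    with tarfile.open(tar_path, "r:gz") as tar:
        names = tar.getnames()
    # Tar member names always use '/', so posixpath is used on every OS.
    return sorted(posixpath.dirname(n) for n in names
                  if posixpath.basename(n) == "saved_model.pb")
```

Running find_saved_model_bundles on the checkpoint-only archive from the question would return an empty list, which matches the "no SavedModel bundles found!" error; the packaged archive path would then be passed as model_data to TensorFlowModel.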

SavedModel file does not exist at saved_model/{saved_model.pbtxt|saved_model.pb}

I'm trying to run the TensorFlow Object Detection API on TensorFlow 2 and got the error below. Does anyone have a solution?
The code:
Loader
import pathlib

import tensorflow as tf

def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)

    model_dir = pathlib.Path(model_dir)/"saved_model"

    model = tf.saved_model.load(str(model_dir))
    model = model.signatures['serving_default']

    return model
Loading label map
Label maps map indices to category names, so that when our convolutional network predicts 5, we know that this corresponds to airplane. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
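Since any dictionary mapping integers to string labels would do, here is a hedged, stdlib-only sketch that builds one from a simple label map written in the item { name / id / display_name } text format, mimicking the {id: {'id': ..., 'name': ...}} shape of category_index. parse_label_map is a hypothetical helper, not part of the Object Detection API, and it assumes a simple file: it does not handle the full protobuf text syntax (nested fields such as keypoints, or braces and escaped quotes inside strings).

```python
import re

# One 'item { ... }' block; assumes no nested braces inside an item.
_ITEM_RE = re.compile(r"item\s*\{(.*?)\}", re.DOTALL)
# Either 'key: "string"' or 'key: 123'.
_FIELD_RE = re.compile(r'(\w+)\s*:\s*(?:"([^"]*)"|(\d+))')


def parse_label_map(text, use_display_name=True):
    """Build {id: {'id': id, 'name': label}} from a simple label-map string."""
    category_index = {}
    for body in _ITEM_RE.findall(text):
        fields = {}
        for key, str_val, int_val in _FIELD_RE.findall(body):
            # Exactly one of the two value groups matched.
            fields[key] = int(int_val) if int_val else str_val
        item_id = fields["id"]
        if use_display_name and "display_name" in fields:
            name = fields["display_name"]
        else:
            name = fields.get("name", str(item_id))
        category_index[item_id] = {"id": item_id, "name": name}
    return category_index
```

With use_display_name=True the human-readable display_name is preferred over the name field, which mirrors the flag passed to create_category_index_from_labelmap above.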
For the sake of simplicity, we will test on two images:
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
TEST_IMAGE_PATHS
Detection
Load an object detection model:
model_name = 'ssd_mobilenet_v1_coco_11_06_2017'
detection_model = load_model(model_name)
and I got this error:
OSError Traceback (most recent call last)
<ipython-input-7-e89d9e690495> in <module>
1 model_name = 'ssd_mobilenet_v1_coco_11_06_2017'
----> 2 detection_model = load_model(model_name)
<ipython-input-4-f8a3c92a04a4> in load_model(model_name)
9 model_dir = pathlib.Path(model_dir)/"saved_model"
10
---> 11 model = tf.saved_model.load(str(model_dir))
12 model = model.signatures['serving_default']
13
D:\Anaconda\lib\site-packages\tensorflow_core\python\saved_model\load.py in load(export_dir, tags)
515 ValueError: If `tags` don't match a MetaGraph in the SavedModel.
516 """
--> 517 return load_internal(export_dir, tags)
518
519
D:\Anaconda\lib\site-packages\tensorflow_core\python\saved_model\load.py in load_internal(export_dir, tags, loader_cls)
524 # sequences for nest.flatten, so we put those through as-is.
525 tags = nest.flatten(tags)
--> 526 saved_model_proto = loader_impl.parse_saved_model(export_dir)
527 if (len(saved_model_proto.meta_graphs) == 1
528 and saved_model_proto.meta_graphs[0].HasField("object_graph_def")):
D:\Anaconda\lib\site-packages\tensorflow_core\python\saved_model\loader_impl.py in parse_saved_model(export_dir)
81 (export_dir,
82 constants.SAVED_MODEL_FILENAME_PBTXT,
---> 83 constants.SAVED_MODEL_FILENAME_PB))
84
85
OSError: SavedModel file does not exist at: C:\Users\Asus\.keras\datasets\ssd_mobilenet_v1_coco_11_06_2017\saved_model/{saved_model.pbtxt|saved_model.pb}
I assume you are running the detection_model_zoo tutorial. Try changing the model name from ssd_mobilenet_v1_coco_11_06_2017 to ssd_mobilenet_v1_coco_2017_11_17; that solved the problem in my test.
The contents of the two archives are shown below; only ssd_mobilenet_v1_coco_2017_11_17 contains a saved_model folder:
# ssd_mobilenet_v1_coco_11_06_2017
frozen_inference_graph.pb model.ckpt.data-00000-of-00001 model.ckpt.meta
graph.pbtxt model.ckpt.index
# ssd_mobilenet_v1_coco_2017_11_17
checkpoint model.ckpt.data-00000-of-00001 model.ckpt.meta
frozen_inference_graph.pb model.ckpt.index saved_model
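The listings show why loading fails: the 11_06_2017 archive has no saved_model folder, so tf.saved_model.load has nothing to open. A hedged, stdlib-only check like the hypothetical find_saved_model_dir below can fail early and print the folder's contents instead. It also accepts a path to saved_model.pb itself and returns the containing folder, since tf.saved_model.load expects a directory. In load_model above it could replace the pathlib.Path(model_dir)/"saved_model" line, assuming the archive follows one of the two layouts listed.

```python
import pathlib

MARKERS = ("saved_model.pb", "saved_model.pbtxt")


def find_saved_model_dir(path):
    """Return the folder to pass to tf.saved_model.load(), or raise with a hint."""
    path = pathlib.Path(path)
    if path.is_file():
        # A path to the .pb file itself: the loader wants the containing folder.
        path = path.parent
    for candidate in (path, path / "saved_model"):
        if any((candidate / marker).is_file() for marker in MARKERS):
            return candidate
    found = sorted(p.name for p in path.iterdir()) if path.is_dir() else []
    raise FileNotFoundError(
        f"no saved_model.pb or saved_model.pbtxt in {path} or "
        f"{path / 'saved_model'}; contents: {found}")
```

For the old archive the error message would list frozen_inference_graph.pb and the model.ckpt.* files, which points to the model-name change suggested above rather than to a path typo.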
Reference:
Where to find tensorflow pretrained models (list or download link)
detect_model_zoo
Using the SavedModel format official blog
Do not point the path all the way to the model file. Use the path of the folder containing the model.
In my case, this code worked. I passed the path of the folder containing the .pb file created by the model checkpoint module:
import tensorflow as tf

if __name__ == '__main__':
    # Update the input name and path for your Keras model.
    # The folder contains: assets/ (folder), variables/ (folder), keras_metadata.pb, saved_model.pb
    input_keras_model = 'my path/weights/my_trained_model'
    model = tf.keras.models.load_model(input_keras_model)
I was getting exactly this error when trying to use a saved_model.pb file.
I had obtained the .pb file along with a pre-trained model by following a tutorial.
It was happening for the following reasons:
First, your existing saved_model.pb file might be corrupt.
Second, as @Mark Silla mentioned, you may be giving the wrong path: pass the path of the folder containing the .pb file, excluding the file name.
Third, it might be due to TensorFlow version issues.
I had to follow all of the above steps, and after upgrading TensorFlow from v2.3 to v2.3 it finally created a new saved_model.pb that was not corrupt and that I could run.

TensorFlow: Are saved variables from tf.saved_model and tf.train.Saver not compatible?

I saved a TensorFlow model using tf.saved_model and now I'm trying to load only the variables from that model using a tf.train.Saver, but I get one of the following two errors depending on the path I give it:
DataLossError: Unable to open table file saved_model/variables:
Failed precondition: saved_model/variables: perhaps your file is in a
different file format and you need to use a different restore operator?
or
InvalidArgumentError: Unsuccessful TensorSliceReader constructor:
Failed to get matching files on saved_model/variables/variables:
Not found: saved_model/variables
[[Node: save/RestoreV2_34 = RestoreV2[dtypes=[DT_FLOAT],
_device="/job:localhost/replica:0/task:0/cpu:0"]
(_arg_save/Const_1_0_0,
save/RestoreV2_34/tensor_names, save/RestoreV2_34/shape_and_slices)]]
tf.saved_model, when saving a model, creates a saved_model.pb protocol buffer and a folder named variables that contains two files:
variables.data-00000-of-00001
variables.index
tf.train.Saver.save() creates the following files:
some_name.data-00000-of-00001
some_name.index
some_name.meta
checkpoint
I have always assumed that the two output files ending in .data-00000-of-00001 and .index are compatible between both savers.
Is that not the case?
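One detail that matters here: TF1's SavedModel loader restores the variables from the checkpoint prefix <export_dir>/variables/variables, so a tf.train.Saver is normally given that prefix rather than the variables folder. The first error above is consistent with a folder being passed where a prefix was expected. The hypothetical helper below is a stdlib-only sketch, assuming the standard layout listed above: it maps a SavedModel folder, its variables/ subfolder, or an explicit prefix to that prefix, and checks that the .index file exists relative to the current working directory.

```python
import os


def saver_prefix(path):
    """Map a SavedModel folder, its variables/ folder, or a prefix to a Saver prefix."""
    candidates = [
        path,                                          # already '.../variables/variables'
        os.path.join(path, "variables"),               # path is the variables/ folder
        os.path.join(path, "variables", "variables"),  # path is the SavedModel folder
    ]
    for prefix in candidates:
        # A V2 checkpoint prefix always has a '<prefix>.index' file next to its data.
        if os.path.isfile(prefix + ".index"):
            return prefix
    raise FileNotFoundError(
        f"no '<prefix>.index' found for {os.path.abspath(path)}")

# Usage in TF1-style code (not executed here):
#   saver = tf.compat.v1.train.Saver()
#   saver.restore(sess, saver_prefix("saved_model"))
```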

Tensorflow: Unable to open table file error

With TensorFlow 1.2.0, I am trying to restore a saved model, but I receive the following error:
DataLossError (see above for traceback): Unable to open table file checkpoints/saved_2/saved_2_model_1.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_185 = RestoreV2[dtypes=[DT_INT32], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_185/tensor_names, save/RestoreV2_185/shape_and_slices)]]
I am using the same tensorflow version for saving and restoring.
For saving:
import os
import tensorflow as tf  # TensorFlow 1.x API

saver = tf.train.Saver()
ckpt_dir = os.path.join(params['CHK_PATH'], folder)
if not os.path.exists(ckpt_dir):
    os.makedirs(ckpt_dir)
ckpt_file = os.path.join(ckpt_dir, '{}'.format(name))
path = saver.save(sess, ckpt_file)
For restoring:
saver.restore(sess, ckpt_file)
I tried: model_saver = tf.train.Saver(write_version = saver_pb2.SaverDef.V1)
But the same problem remains.
This works:
saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))
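The error message shows the restore trying to open saved_2_model_1.meta as a table file. The .meta file holds the graph (a MetaGraphDef), while the variable values live in the .index and .data-* files, and saver.restore expects their shared prefix with no suffix. tf.train.latest_checkpoint avoids the problem by returning that prefix from the 'checkpoint' state file. The stdlib-only sketch below illustrates both ideas; checkpoint_prefix and latest_prefix_from_state are hypothetical helpers, and the second is a simplified illustration of that lookup, not TensorFlow's implementation (unlike TensorFlow, for example, it does not check that the checkpoint files exist).

```python
import os
import re

# Suffixes that tf.train.Saver appends to a checkpoint prefix.
_SUFFIX_RE = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")
# The line in the 'checkpoint' state file naming the newest checkpoint.
_STATE_RE = re.compile(r'^model_checkpoint_path:\s*"(.*)"\s*$', re.MULTILINE)


def checkpoint_prefix(filename):
    """Strip a Saver file suffix so the result can be passed to saver.restore()."""
    return _SUFFIX_RE.sub("", filename)


def latest_prefix_from_state(ckpt_dir):
    """Read the newest prefix from the 'checkpoint' state file, or return None."""
    state_path = os.path.join(ckpt_dir, "checkpoint")
    if not os.path.isfile(state_path):
        return None
    with open(state_path) as f:
        match = _STATE_RE.search(f.read())
    if match is None:
        return None
    prefix = match.group(1)
    # Relative entries are interpreted relative to the checkpoint directory.
    return prefix if os.path.isabs(prefix) else os.path.join(ckpt_dir, prefix)
```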

Tensorflow import_meta_graph returns 'tensor does not exist' error

I am trying to import a saved neural network in Tensorflow. I saved it after training with:
saver = tf.train.Saver()
saver.save(sess, filename)
and in the script I use for inference, I restore it with:
sess = tf.Session()
saver = tf.train.import_meta_graph(filename + '.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
But during the import_meta_graph line, I get this error:
KeyError: "The name 'dropout1/cond/dropout/Shape/Switch:1' refers to a Tensor which does not exist. The operation, 'dropout1/cond/dropout/Shape/Switch', does not exist in the graph."
I looked at the names of the tensors and operations in the original notebook in which I trained the model, and the names mentioned in the error message do exist. Moreover, I used the same code for saving and importing other models, and it works. The only difference is that I trained those on an AWS machine with an older version of TensorFlow, while I trained the problematic one on my computer.