I am trying to deploy my model and am encountering the following problem:
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://a603e930-4fda-4105-8554-7af5e5fc02f5/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as '/job:localhost'
This happened when I stored an NLP model in a pickle. Having seen that this does not work, I then tried saving the model as .h5, but the problem persists and the same error appears.
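For what it's worth, the usual fix is to avoid pickling the model object entirely and to use the Keras save/load APIs instead; here is a minimal sketch (the model and paths are placeholders, and the options argument simply follows the hint in the error message):

import tensorflow as tf

# Placeholder model; substitute your actual NLP model.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

# Save with the Keras API instead of pickle, so the weights are written
# to disk rather than referenced through an in-memory ram:// location.
model.save("my_nlp_model")  # SavedModel directory; use "model.h5" for HDF5

# Reload; the options argument mirrors the suggestion in the error message.
restored = tf.keras.models.load_model(
    "my_nlp_model",
    options=tf.saved_model.LoadOptions(experimental_io_device="/job:localhost"),
)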
Related
My original Python code restores the ckpt model file frequently, and reading the checkpoints again and again takes too much time. So I decided to keep the model in memory. A simple way is to create a RAMDisk and save the model on that disk. However, something unexpected happens.
I set up a 1 GB RAMDisk following the tutorial How to Create RAM Disk in Windows 10 for Super-Fast Read and Write Speeds. My system is Windows 11.
I made two attempts. In the first, I copied my code to the RAMDisk E: and used tf.train.Saver().save(self.sess,'./') to save the model, but it reports UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 114: invalid start byte. However, if I put the code in any ordinary folder, it runs successfully.
In the second attempt, I put the code under D: and changed the line to tf.train.Saver().save(self.sess,'E:\\'), and it reports cannot create directory E: Permission Denied. Obviously, E:\ already exists and should not need to be created, so I don't know how to handle this.
Your Jupyter/Python environment cannot go beyond the directory from which Jupyter/Python was started, and that's why you get a permission-denied error.
However, you can run shell commands from the Jupyter notebook. If your user has write access to the destination, you can do the following.
model.save("my_model") # This will save the model to the current directory.
!mv "my_model" "E:\my_model" # This will move the model from the current directory to your required directory.
On a side note, when searching for tf.train.Saver().save(), I get this page as the only relevant result, which says it is used for saving checkpoints, not models. It also recommends switching to the newer tf.train.Checkpoint or tf.keras.Model.save_weights. Nonetheless, the method above should work as expected.
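For reference, a minimal sketch of the recommended tf.train.Checkpoint API (the paths and model here are illustrative only):

import os
import tensorflow as tf

os.makedirs("E:/ckpt", exist_ok=True)  # make sure the target directory exists

# Illustrative model; substitute your own.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])
ckpt = tf.train.Checkpoint(model=model)

# Note: save() takes a checkpoint prefix, not a bare drive root like 'E:\\'.
save_path = ckpt.save("E:/ckpt/model")
ckpt.restore(save_path)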
I have a Colab Pro+ subscription, and I ran Python code that trains a network using PyTorch.
Until now, I have succeeded in training my network and using my Google Drive to save checkpoints and the like.
But now I had a run that lasted around 16 hours, and no checkpoint or any other data was saved, even though the logs clearly show that I saved the data and even evaluated metrics on the saved data.
Maybe the data was somehow saved to a different folder?
I looked in the Drive activity and could not see any data being saved.
Has anyone run into this before?
Any help would be appreciated.
Keras "SavedModel file does not exist at..." error occurs for a model retrieved from an online URL and never manually saved at any local directory.
The code ran just fine for as long as I've been working on it before but I reopened the project today and without changing anything it now gives me this error.
(Code snippet and error screenshot were attached as an image.)
Managed to solve it myself. Simply visit the directory the error mentions and delete the folder whose name is a random string of numbers and letters. Rerun the program and it will properly regenerate the files it needs.
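If you would rather do this programmatically, something like the sketch below should work; the path is a placeholder for whatever directory your error message points at:

import shutil

# Placeholder: replace with the folder named in the error message,
# i.e. the one whose name is a random string of numbers and letters.
stale_model_dir = "/path/from/error/message/a1b2c3d4"

# Delete the stale cached download; rerunning the program will
# re-download the model and regenerate the files it needs.
shutil.rmtree(stale_model_dir, ignore_errors=True)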
I've been trying to run the object_detection_tutorial located at https://github.com/tensorflow/models/tree/master/research/object_detection but keep getting an error when trying to load the model.
I get:
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
when I try to do:
tf.saved_model.load(model_dir)
I've gone through the installation instructions and done all of that, but I still can't load the model. Does anyone have any idea? Thanks.
I've created a TensorFlow session whose export.meta file is 553.17 MB. Whenever I try to load the exported graph into Google ML, it crashes with the error below:
gcloud beta ml models versions create --origin=${TRAIN_PATH}/model/ --model=${MODEL_NAME} v1
ERROR: (gcloud.beta.ml.models.versions.create) Error Response: [3] Create Version failed.Error accessing the model location gs://experimentation-1323-ml/face/model/. Please make sure that service account cloud-ml-service#experimentation-1323-10cd8.iam.gserviceaccount.com has read access to the bucket and the objects.
The graph is a static version of a VGG16 face-recognition network, so the export is empty except for a dummy variable, while all the "weights" are constants in export.meta. Could that affect things? How do I go about debugging this?
Update (11/18/2017)
The service currently expects deployed models to have checkpoint files. Some models, such as Inception, have folded their variables into constants and therefore have no checkpoint files. We will work on addressing this limitation in the service. In the meantime, as a workaround, you can create a dummy variable, e.g.,
import os
import tensorflow as tf

output_dir = 'my/output/dir'
os.makedirs(output_dir, exist_ok=True)  # Saver does not create missing directories

dummy = tf.Variable([0])  # dummy variable so checkpoint files are produced
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize_all_variables is deprecated
    saver.save(sess, os.path.join(output_dir, 'export'))
Update (11/17/2017)
A previous version of this post noted that the root cause of the problem was that the training service was producing V2 checkpoints but the prediction service was unable to consume them. This has now been fixed, so it is no longer necessary to force training to write V1 checkpoints; by default, V2 checkpoints are written.
Please retry.
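For anyone still on an older setup, forcing V1 checkpoints looked roughly like this (per the update above, this is no longer necessary); the variable and output path are placeholders:

import os
import tensorflow as tf

os.makedirs('my/output/dir', exist_ok=True)
dummy = tf.Variable([0])  # Saver requires at least one variable
# write_version selects the legacy V1 checkpoint format instead of the default V2.
saver = tf.train.Saver(write_version=tf.train.SaverDef.V1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, os.path.join('my/output/dir', 'export'))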
Previous Answer
For posterity, the following was the original answer; it may still apply to some users in some cases, so it is left here:
The error indicates that this is a permissions problem, not one related to the size of the model. The getting-started instructions recommend running:
gcloud beta ml init-project
That generally sets up the permissions properly, as long as the bucket containing the model ('experimentation-1323-ml') is in the same project you are using to deploy the model (the normal situation).
If things still aren't working, you'll need to follow these instructions for manually setting the correct permissions.