I want to use yolov4-tiny with the TensorFlow Lite framework to count objects that cross a virtual line in a video.
I tried to convert my Darknet weights, trained with AlexeyAB's repo, using these commands:
python save_model.py --weights yolov4-tiny.weights --output ./checkpoints/yolov4-tiny-608-tf --input_size 608 --model yolov4 --tiny --framework tflite
python convert_tflite.py --weights ./checkpoints/yolov4-tiny-608-tf --output ./checkpoints/yolov4-tiny-608.tflite
You can find the convert_tflite.py here
The first command succeeds with numpy==1.19.0. However, the second one fails with these errors:
loc("batch_normalization/moving_mean"): error: is not immutable, try running tf-saved-model-optimize-global-tensors to prove tensors are immutable
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\tensorflow\lite\python\convert.py", line 213, in toco_convert_protos
enable_mlir_converter)
File "C:\Python37\lib\site-packages\tensorflow\lite\python\wrap_toco.py", line 38, in wrapped_toco_convert
enable_mlir_converter)
Exception: <unknown>:0: error: loc("batch_normalization/moving_mean"): is not immutable, try running tf-saved-model-optimize-global-tensors to prove tensors are immutable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "convert_tflite.py", line 76, in <module>
app.run(main)
File "C:\Python37\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Python37\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "convert_tflite.py", line 71, in main
save_tflite()
File "convert_tflite.py", line 45, in save_tflite
tflite_model = converter.convert()
File "C:\Python37\lib\site-packages\tensorflow\lite\python\lite.py", line 762, in convert
result = _convert_saved_model(**converter_kwargs)
File "C:\Python37\lib\site-packages\tensorflow\lite\python\convert.py", line 648, in convert_saved_model
enable_mlir_converter=True)
File "C:\Python37\lib\site-packages\tensorflow\lite\python\convert.py", line 216, in toco_convert_protos
raise ConverterError(str(e))
tensorflow.lite.python.convert.ConverterError: <unknown>:0: error: loc("batch_normalization/moving_mean"): is not immutable, try running tf-saved-model-optimize-global-tensors to prove tensors are immutable
I have tried other versions of TensorFlow (2.2, 2.3, 2.4), but had no luck. What should I do?
There is a similar issue raised here: Tensorflow Issue 44790
Here are my system details:
Windows 10, x64
GeForce GTX 1060
NVIDIA Driver 460.89
CUDA 11.0.3
CuDNN 8.0.5.39
Python 3.7.2
pip install tensorflow==2.3.0rc0
Then restart the runtime before starting the conversion.
I resolved the problem by following a thread in the GitHub issues.
In Google Colab, I had this issue when I used the default TF version, which was 2.4.0 or above.
Running !pip install tensorflow==2.3.0, restarting the runtime, and then converting fixed the issue.
This solved the problem for me:
import tensorflow as tf
if tf.__version__ != '2.3.0-rc0':
    !pip uninstall -y tensorflow
    !pip install tensorflow-gpu==2.3.0rc0
Then restart the runtime so that the newly installed version is used.
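If you want to check the installed version before starting the conversion, something like the sketch below could help. It only encodes what the answers above report (2.3.0 and 2.3.0rc0 worked, the Colab default of 2.4.0 or above did not); the helper names major_minor and is_working_version are made up for illustration and are not part of TensorFlow or of the conversion scripts.

```python
import re

# Assumption taken from the answers above: the conversion worked on the
# 2.3 release line (2.3.0 and 2.3.0rc0) and failed on 2.4.0 or above.
WORKING_SERIES = (2, 3)


def major_minor(version):
    """Return (major, minor) from a version string such as '2.3.0-rc0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def is_working_version(version):
    """True if the version belongs to the release line reported to work."""
    return major_minor(version) == WORKING_SERIES
```

In a notebook you would call is_working_version(tf.__version__) after importing tensorflow as tf, and reinstall and restart the runtime if it returns False.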
I've been trying to train my own deeplab model from https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/pascal.md.
I'm running everything on Google Colab.
I've been able to train the model fine:
%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 train.py \
--logtostderr \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=200,200 \
--train_batch_size=12 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--fine_tune_batch_norm=true \
--tf_initial_checkpoint="/content/deeplabv3_pascal_train_aug/model.ckpt.index" \
--train_logdir="/content/output" \
--dataset_dir="/content/drive/My Drive/Colab Notebooks/Background Removal/tfrecord"
And create visualizations fine:
%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
python3 vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size=200,200 \
--checkpoint_dir=/content/output \
--vis_logdir=/content/output/vis \
--dataset_dir="/content/drive/My Drive/Colab Notebooks/Background Removal/tfrecord" \
--max_number_of_iterations=1
But running export_model.py does not work. I thought it might have been an issue with the model I have trained, so I tried exporting the initial checkpoint I am training off of - it doesn't work either.
%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 export_model.py \
--logtostderr \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--crop_size=200 \
--crop_size=200 \
--checkpoint_path='/content/output/model.ckpt-50.index' \
--export_path='/content/output'
Full output from running export_model.py:
WARNING:tensorflow:From /content/models/research/deeplab/core/conv2d_ws.py:40: The name tf.layers.Layer is deprecated. Please use tf.compat.v1.layers.Layer instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From export_model.py:201: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From export_model.py:117: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
W0329 17:24:00.753659 139709292058496 module_wrapper.py:139] From export_model.py:117: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From export_model.py:117: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
W0329 17:24:00.753914 139709292058496 module_wrapper.py:139] From export_model.py:117: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From export_model.py:118: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
W0329 17:24:00.754124 139709292058496 module_wrapper.py:139] From export_model.py:118: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:Prepare to export model to: /content/output
I0329 17:24:00.754279 139709292058496 export_model.py:118] Prepare to export model to: /content/output
WARNING:tensorflow:From export_model.py:91: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0329 17:24:00.755340 139709292058496 module_wrapper.py:139] From export_model.py:91: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
INFO:tensorflow:Exported model performs single-scale inference.
I0329 17:24:00.817728 139709292058496 export_model.py:130] Exported model performs single-scale inference.
WARNING:tensorflow:From /content/models/research/deeplab/model.py:320: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
W0329 17:24:00.818036 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/model.py:320: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /content/models/research/deeplab/core/feature_extractor.py:461: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0329 17:24:00.818522 139709292058496 deprecation.py:323] From /content/models/research/deeplab/core/feature_extractor.py:461: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /content/models/research/deeplab/core/feature_extractor.py:75: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0329 17:24:00.821603 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/core/feature_extractor.py:75: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0329 17:24:00.825009 139709292058496 deprecation.py:323] From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /content/models/research/deeplab/core/utils.py:41: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.
W0329 17:24:02.636440 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/core/utils.py:41: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.
WARNING:tensorflow:From export_model.py:162: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
W0329 17:24:02.986706 139709292058496 module_wrapper.py:139] From export_model.py:162: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From export_model.py:178: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
W0329 17:24:02.991279 139709292058496 module_wrapper.py:139] From export_model.py:178: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From export_model.py:178: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
W0329 17:24:02.991502 139709292058496 deprecation.py:323] From export_model.py:178: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
WARNING:tensorflow:From export_model.py:181: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.
W0329 17:24:03.295938 139709292058496 module_wrapper.py:139] From export_model.py:181: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.
WARNING:tensorflow:From export_model.py:182: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
W0329 17:24:03.296255 139709292058496 module_wrapper.py:139] From export_model.py:182: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0329 17:24:03.419735 139709292058496 deprecation.py:323] From /tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-03-29 17:24:03.901045: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-29 17:24:03.919472: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.920276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-03-29 17:24:03.920544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.922225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-29 17:24:03.923832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-29 17:24:03.924132: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-29 17:24:03.926131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-29 17:24:03.927020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-29 17:24:03.930883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-29 17:24:03.931017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.931838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.932481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-29 17:24:03.937940: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-03-29 17:24:03.938159: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1a83480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-29 17:24:03.938192: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-03-29 17:24:03.993090: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.993934: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1a83640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-29 17:24:03.993966: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2020-03-29 17:24:03.994138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.994819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-03-29 17:24:03.994883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.994912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-29 17:24:03.994937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-29 17:24:03.994960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-29 17:24:03.994984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-29 17:24:03.995007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-29 17:24:03.995031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-29 17:24:03.995121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.995850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.996477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-29 17:24:03.996539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.998097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-29 17:24:03.998127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-03-29 17:24:03.998140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-03-29 17:24:03.998307: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.999000: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.999707: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-03-29 17:24:03.999752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /content/output/model.ckpt-50.index
I0329 17:24:04.002565 139709292058496 saver.py:1284] Restoring parameters from /content/output/model.ckpt-50.index
Traceback (most recent call last):
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[{{node save/RestoreV2}}]]
(1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1290, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "export_model.py", line 201, in <module>
tf.app.run()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "export_model.py", line 178, in main
saver = tf.train.Saver(tf.all_variables())
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1300, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "export_model.py", line 201, in <module>
tf.app.run()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "export_model.py", line 192, in main
initializer_nodes=None)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py", line 151, in freeze_graph_with_def_protos
saver.restore(sess, input_checkpoint)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1306, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
[[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "export_model.py", line 201, in <module>
tf.app.run()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "export_model.py", line 178, in main
saver = tf.train.Saver(tf.all_variables())
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-14-46a5ede3bd50> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"\nNUM_ITERATIONS=50\npython3 export_model.py \\\n --logtostderr \\\n --atrous_rates=6 \\\n --atrous_rates=12 \\\n --atrous_rates=18 \\\n --output_stride=16 \\\n --crop_size=200 \\\n --crop_size=200 \\\n --checkpoint_path=\'/content/output/model.ckpt-50.index\' \\\n --export_path=\'/content/output\'')
2 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
136 if self.returncode:
137 raise subprocess.CalledProcessError(
--> 138 returncode=self.returncode, cmd=self.args, output=self.output)
139
140 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument
CalledProcessError: Command 'export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 export_model.py \
--logtostderr \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--crop_size=200 \
--crop_size=200 \
--checkpoint_path='/content/output/model.ckpt-50.index' \
--export_path='/content/output'' returned non-zero exit status 1.
I'm aware of similar GitHub issues (https://github.com/tensorflow/models/issues/6212 and https://github.com/tensorflow/models/issues/3992), but it doesn't look like any were resolved. I also tried poking around in the export_model.py code in deeplab, but I don't understand the TF code enough to know where to look.
By default, export_model.py builds the graph for a MobileNet-v2 backbone, so it looks for MobileNet-v2 variables (such as MobilenetV2/Conv/BatchNorm/beta) in your checkpoint. Since you trained your model with the Xception backbone, add the --model_variant="xception_65" argument to your export_model.py command.
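To make the change concrete, here is a small sketch that assembles the export command from the question with the extra --model_variant flag. The function build_export_command is a hypothetical helper written for this answer, not part of DeepLab; the paths and the other flag values are copied unchanged from the question, so adjust them to your setup.

```python
import shlex


def build_export_command(checkpoint_path, export_path,
                         model_variant="xception_65",
                         atrous_rates=(6, 12, 18),
                         output_stride=16,
                         crop_size=(200, 200)):
    """Build the argument list for export_model.py.

    The only difference from the command in the question is the
    --model_variant flag, which has to match the backbone used in training.
    """
    args = ["python3", "export_model.py", "--logtostderr",
            "--model_variant=%s" % model_variant]
    # Repeated flags, one per value, exactly as in the question.
    args += ["--atrous_rates=%d" % rate for rate in atrous_rates]
    args.append("--output_stride=%d" % output_stride)
    args += ["--crop_size=%d" % size for size in crop_size]
    args.append("--checkpoint_path=%s" % checkpoint_path)
    args.append("--export_path=%s" % export_path)
    return args


# Paths copied from the question.
command = build_export_command("/content/output/model.ckpt-50.index",
                               "/content/output")
print(" ".join(shlex.quote(arg) for arg in command))
```

The printed line can be pasted into a %%shell cell after the export PYTHONPATH line, or the list can be passed to subprocess.run.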
When I execute
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
I get this error:
WARNING:tensorflow: The TensorFlow contrib module will not be included
in TensorFlow 2.0. For more information, please see: *
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons * https://github.com/tensorflow/io (for I/O related ops) If you depend
on functionality not listed there, please file an issue.
WARNING:tensorflow:From train.py:55: The name tf.logging.set_verbosity
is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From train.py:55: The name tf.logging.INFO is
deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From train.py:167: The name tf.app.run is
deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From train.py:89: The name tf.gfile.MakeDirs is
deprecated. Please use tf.io.gfile.makedirs instead.
W1212 22:01:57.353342 3060 deprecation_wrapper.py:119] From
train.py:89: The name tf.gfile.MakeDirs is deprecated. Please use
tf.io.gfile.makedirs instead.
WARNING:tensorflow:From
c:\users\aamir\desktop\models\research\object_detection\utils\config_util.py:86:
The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile
instead.
W1212 22:01:57.354341 3060 deprecation_wrapper.py:119] From
c:\users\aamir\desktop\models\research\object_detection\utils\config_util.py:86:
The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile
instead.
WARNING:tensorflow:From train.py:94: The name tf.gfile.Copy is
deprecated. Please use tf.io.gfile.copy instead.
W1212 22:01:57.358338 3060 deprecation_wrapper.py:119] From
train.py:94: The name tf.gfile.Copy is deprecated. Please use
tf.io.gfile.copy instead.
WARNING:tensorflow:From
c:\users\aamir\desktop\models\research\object_detection\anchor_generators\grid_anchor_generator.py:59:
to_float (from tensorflow.python.ops.math_ops) is deprecated and will
be removed in a future version. Instructions for updating: Use
`tf.cast` instead.
W1212 22:01:57.401396 3060 deprecation.py:323] From
c:\users\aamir\desktop\models\research\object_detection\anchor_generators\grid_anchor_generator.py:59:
to_float (from tensorflow.python.ops.math_ops) is deprecated and will
be removed in a future version. Instructions for updating: Use
`tf.cast` instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I1212 22:01:57.406377 3060 regularizers.py:98] Scale of 0 disables
regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I1212 22:01:57.406377 3060 regularizers.py:98] Scale of 0 disables
regularizer.
WARNING:tensorflow:From
c:\users\aamir\desktop\models\research\object_detection\trainer.py:228:
create_global_step (from
tensorflow.contrib.framework.python.ops.variables) is deprecated and
will be removed in a future version. Instructions for updating: Please
switch to tf.train.create_global_step
W1212 22:01:57.408376 3060 deprecation.py:323] From
c:\users\aamir\desktop\models\research\object_detection\trainer.py:228:
create_global_step (from
tensorflow.contrib.framework.python.ops.variables) is deprecated and
will be removed in a future version. Instructions for updating: Please
switch to tf.train.create_global_step
WARNING:tensorflow:From
c:\users\aamir\desktop\models\research\object_detection\data_decoders\tf_example_decoder.py:104:
The name tf.FixedLenFeature is deprecated. Please use
tf.io.FixedLenFeature instead.
W1212 22:01:57.413390 3060 deprecation_wrapper.py:119] From
c:\users\aamir\desktop\models\research\object_detection\data_decoders\tf_example_decoder.py:104:
The name tf.FixedLenFeature is deprecated. Please use
tf.io.FixedLenFeature instead.
WARNING:tensorflow:From
c:\users\aamir\desktop\models\research\object_detection\data_decoders\tf_example_decoder.py:119:
The name tf.VarLenFeature is deprecated. Please use
tf.io.VarLenFeature instead.
W1212 22:01:57.414372 3060 deprecation_wrapper.py:119] From
c:\users\aamir\desktop\models\research\object_detection\data_decoders\tf_example_decoder.py:119:
The name tf.VarLenFeature is deprecated. Please use
tf.io.VarLenFeature instead.
Traceback (most recent call last):
File "train.py", line 167, in <module>
tf.app.run()
File "C:\Users\Aamir\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\Aamir\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\Aamir\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "train.py", line 163, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "c:\users\aamir\desktop\models\research\object_detection\trainer.py", line 235, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "c:\users\aamir\desktop\models\research\object_detection\trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 120, in get_next
dataset_builder.build(config)).get_next()
File "c:\users\aamir\desktop\models\research\object_detection\builders\dataset_builder.py", line 138, in build
label_map_proto_file=label_map_proto_file)
File "c:\users\aamir\desktop\models\research\object_detection\data_decoders\tf_example_decoder.py", line 195, in __init__
use_display_name)
File "c:\users\aamir\desktop\models\research\object_detection\utils\label_map_util.py", line 149, in get_label_map_dict
label_map = load_labelmap(label_map_path)
File "c:\users\aamir\desktop\models\research\object_detection\utils\label_map_util.py", line 129, in load_labelmap
label_map_string = fid.read()
File "C:\Users\Aamir\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 122, in read
self._preread_check()
File "C:\Users\Aamir\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 84, in _preread_check
compat.as_bytes(self.__name), 1024 * 512)
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: C:/Users/Aamir/Desktop/models/research/object_detection/training/object-detection.pbtxt : The system cannot find the file specified. ; No such file or directory
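The last line of the traceback is the actual problem: somewhere in your code or configuration you reference the file C:/Users/Aamir/Desktop/models/research/object_detection/training/object-detection.pbtxt,
but it does not exist at that location. Either the file is in a different folder or the path contains a typo.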
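One way to catch this before starting a training run is to check the label map path up front. The following is only a sketch: it assumes the path is set in a text-format pipeline config through a `label_map_path: "..."` entry, and the helper name `find_missing_label_maps` is made up for illustration.

```python
import re
from pathlib import Path

# Matches entries such as: label_map_path: "C:/some/dir/object-detection.pbtxt"
# Assumption: the path is written as a plain double-quoted string.
_LABEL_MAP_FIELD = re.compile(r'label_map_path\s*:\s*"([^"]+)"')


def find_missing_label_maps(config_text):
    """Return every label_map_path in the config text that is not an existing file."""
    missing = []
    for path in _LABEL_MAP_FIELD.findall(config_text):
        if not Path(path).is_file():
            missing.append(path)
    return missing
```

Calling `find_missing_label_maps(Path(config_file).read_text())` on your own config file should return an empty list when every referenced label map exists; any path it returns is one that training will fail on with the same NotFoundError.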
When I run a CNN in TensorFlow 2.0, I get CUDNN_STATUS_INTERNAL_ERROR.
libcublas.so.10.0 and libcudnn.so.7 appear to load fine.
The versions should be compatible:
TensorFlow 2.0
Ubuntu 18.04
GeForce GTX 1650
NVIDIA driver 430
cuDNN: 7.4.2.24 (also tried with 7.3.0.29 and 7.6.4.38)
(ref)
I tried the following, but neither fixed the problem:
Removed ~/.nv (ref)
Changed #include "driver_types.h" to #include <driver_types.h> in /usr/include/cudnn.h, and the mnistCUDNN test passed (ref)
Questions:
Does passing the mnistCUDNN test mean that required packages are installed correctly?
How can I fix the error below?
Here is the error message:
Using TensorFlow backend.
2019-10-16 14:48:16.226892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-16 14:48:16.255123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
...
2019-10-16 14:48:16.370703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3253 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5)
Train on 48000 samples, validate on 12000 samples
Epoch 1/12
2019-10-16 14:48:17.357747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-10-16 14:48:17.525865: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
--error here--
2019-10-16 14:48:17.873127: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-16 14:48:17.879412: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
--error here--
2019-10-16 14:48:17.879516: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
File "lenet.py", line 96, in <module>
x_train, y_train, batch_size=128, epochs=12, validation_split=0.2
File "lenet.py", line 83, in train
verbose=self.verbose
File "/home/yuyu/venv/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/home/yuyu/venv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3740, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv2d_1/convolution (defined at /home/yuyu/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_1220]
Function call stack:
keras_scratch_graph
I encountered this error on my Ubuntu 20.04 / RTX 2070 system. I found this:
https://gist.github.com/mikaelhg/cae5b7938aa3dfdf3d06a40739f2f3f4#file-cuda-install-md
where it suggests exporting an environment variable like this:
export TF_FORCE_GPU_ALLOW_GROWTH=true
That fixed it for me. Happy days.
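If you would rather not rely on the shell environment, the same setting can be applied from Python. This is a sketch assuming TensorFlow 2.x; the helper name `enable_memory_growth` is made up for illustration, and both steps have to run before TensorFlow touches the GPU.

```python
import os

# Equivalent to `export TF_FORCE_GPU_ALLOW_GROWTH=true`; it must be set
# before TensorFlow initializes the GPU, so do it at the top of the script.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Returns the list of GPUs found (empty if TensorFlow is not installed
    or no GPU is visible).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Raises RuntimeError if the GPU has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Either the environment variable or the `set_memory_growth` call should be enough on its own; call `enable_memory_growth()` before building the model.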
I trained a model following TensorFlow's image_retraining guide (https://www.tensorflow.org/hub/tutorials/image_retraining). Then I tried to convert the pb model with tensorflowjs_converter, but I get an error about the metagraph.
My environment is Ubuntu 18.04. I'm using tensorflow-gpu (https://www.tensorflow.org/install/gpu) and the latest version of tensorflowjs_converter (1.0.1).
Command executed for training the model:
python retrain.py --image_dir ./flower_photos --saved_model_dir=/tmp/saved_models/$(date +%s)/
Command executed for converting the model:
tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model /tmp/saved_models/1555066703 /tmp/web_models
2019-04-12 15:45:06.797479: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-12 15:45:06.818525: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2019-04-12 15:45:06.819292: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55637fb624e0 executing computations on platform Host. Devices:
2019-04-12 15:45:06.819327: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-04-12 15:45:10.845000: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1364] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING: Logging before flag parsing goes to stderr.
W0412 15:45:11.737798 139737592477504 meta_graph.py:447] Issue encountered when serializing variables.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
to_proto not supported in EAGER mode.
W0412 15:45:11.738872 139737592477504 meta_graph.py:447] Issue encountered when serializing model_variables.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
to_proto not supported in EAGER mode.
2019-04-12 15:45:11.743861: I tensorflow/core/grappler/devices.cc:61] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2019-04-12 15:45:11.743944: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-04-12 15:45:11.762060: E tensorflow/core/grappler/grappler_item_builder.cc:636] Init node final_retrain_ops/weights/final_weights/Assign doesn't exist in graph
Traceback (most recent call last):
File "/home/davide/.local/bin/tensorflowjs_converter", line 11, in <module>
sys.exit(main())
File "/home/davide/.local/lib/python2.7/site-packages/tensorflowjs/converters/converter.py", line 358, in main
strip_debug_ops=FLAGS.strip_debug_ops)
File "/home/davide/.local/lib/python2.7/site-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 271, in convert_tf_saved_model
concrete_func)
File "/home/davide/.local/lib/python2.7/site-packages/tensorflow/python/framework/convert_to_constants.py", line 99, in convert_variables_to_constants_v2
graph_def = _run_inline_graph_optimization(func)
File "/home/davide/.local/lib/python2.7/site-packages/tensorflow/python/framework/convert_to_constants.py", line 57, in _run_inline_graph_optimization
return tf_optimizer.OptimizeGraph(config, meta_graph)
File "/home/davide/.local/lib/python2.7/site-packages/tensorflow/python/grappler/tf_optimizer.py", line 43, in OptimizeGraph
verbose, graph_id, status)
File "/home/davide/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 548, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Failed to import metagraph, check error log for more info.
I expected to get a tfjs model, but I get the error above instead.