I'm completely new to CNTK. I recently installed CNTK 2.7 (GPU version) on my PC (Windows 10, i5-7200U CPU) with a GeForce 940MX GPU. I'm trying to set up CNTK and the Faster R-CNN object-detection example described at the link below:
https://learn.microsoft.com/en-us/cognitive-toolkit/object-detection-using-faster-r-cnn
I'm trying to run the toy example.
After running install_data_and_model.py in the Examples/Image/Detection/FastRCNN folder,
I run run_faster_rcnn.py in the Examples/Image/Detection/FasterRCNN folder.
I get the following error:
Selected GPU[0] GeForce 940MX as the process wide default device.
About to throw exception 'Failed to parse Dictionary from the input stream.'
Traceback (most recent call last):
File "run_faster_rcnn.py", line 34, in <module>
trained_model = train_faster_rcnn(cfg)
File "C:\Users\HP-PC\Anaconda3\Lib\site-packages\cntk\Examples\Image\Detection\FasterRCNN\FasterRCNN_train.py", line 291, in train_faster_rcnn
eval_model = train_faster_rcnn_e2e(cfg)
File "C:\Users\HP-PC\Anaconda3\Lib\site-packages\cntk\Examples\Image\Detection\FasterRCNN\FasterRCNN_train.py", line 314, in train_faster_rcnn_e2e
loss, pred_error = create_faster_rcnn_model(image_input, roi_input, dims_node, cfg)
File "C:\Users\HP-PC\Anaconda3\Lib\site-packages\cntk\Examples\Image\Detection\FasterRCNN\FasterRCNN_train.py", line 177, in create_faster_rcnn_model
base_model = load_model(cfg['BASE_MODEL_PATH'])
File "C:\Users\HP-PC\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
result = f(*args, **kwds)
File "C:\Users\HP-PC\Anaconda3\lib\site-packages\cntk\ops\functions.py", line 1721, in load_model
return Function.load(model, device, format)
File "C:\Users\HP-PC\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
result = f(*args, **kwds)
File "C:\Users\HP-PC\Anaconda3\lib\site-packages\cntk\ops\functions.py", line 1635, in load
return cntk_py.Function.load(str(model), device, format.value)
RuntimeError: Failed to parse Dictionary from the input stream.
[CALL STACK]
> CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD
- CNTK::operator>>
- CNTK::Function:: Load
- PyInit__cntk_py (x2)
- PyCFunction_Call
- PyEval_GetFuncDesc
- PyEval_EvalFrameEx (x2)
- PyFunction_SetAnnotations
- PyObject_Call
- PyEval_GetFuncDesc
- PyEval_EvalFrameEx (x2)
- PyEval_GetFuncDesc (x2)
Can someone help me understand what the issue is?
This error typically occurs when CNTK is shut down while a model is still being saved, which leaves the model file corrupted.
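One way to rule out a missing or obviously truncated base model before calling `load_model` is to look at the file on disk first. The sketch below is only an illustration and not part of the CNTK example: `describe_model_file` is a hypothetical helper, the path is a placeholder, and a non-zero size does not prove the file is intact; it only catches the obvious cases.

```python
import os


def describe_model_file(path):
    """Return a short status string for a model file (hypothetical helper)."""
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    return "present ({} bytes)".format(size)


# Placeholder path: substitute the value of cfg['BASE_MODEL_PATH'].
print(describe_model_file("path/to/base.model"))
```

If the file turns out to be missing, empty, or much smaller than expected, fetching it again (for example by re-running install_data_and_model.py) may be worth trying.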
I'm having trouble loading MNIST with tensorflow_datasets.
Python: 3.7
TensorFlow: 2.1.0
I upgraded tensorflow_datasets to the latest version, 4.6, because the version I had alongside my TensorFlow installation had no attribute 'load'.
But now the data cannot be downloaded and extracted successfully
with the following command:
datasets = tfds.load(name="mnist")
The error message is:
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to ~\tensorflow_datasets\mnist\3.0.1...
Extraction completed...: 0 file [00:00, ? file/s]██████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 138.37 url/s]
Dl Size...: 100%|██████████████████████████████████████████████████████████████████████████| 11594722/11594722 [00:00<00:00, 373172106.07 MiB/s]
Dl Completed...: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 122.03 url/s]
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\load.py", line 327, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 483, in download_and_prepare
download_config=download_config,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1222, in _download_and_prepare
disable_shuffling=self.info.disable_shuffling,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\split_builder.py", line 310, in submit_split_generation
return self._build_from_generator(**build_kwargs)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\split_builder.py", line 376, in _build_from_generator
leave=False,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tqdm\std.py", line 1195, in iter
for obj in iterable:
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\image_classification\mnist.py", line 151, in _generate_examples
images = _extract_mnist_images(data_path, num_examples)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\image_classification\mnist.py", line 350, in _extract_mnist_images
f.read(16) # header
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 122, in read
self._preread_check()
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 84, in _preread_check
compat.as_bytes(self.__name), 1024 * 512)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\util\compat.py", line 87, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got WindowsGPath('C:\Users\Wilso\tensorflow_datasets\downloads\extracted\GZIP.cvdf-datasets_mnist_train-images-idx3-ubyteRA_Kv3PMVG-iFHXoHqNwJlYF9WviEKQCTSyo8gNSNgk.gz')
Try:
import tensorflow_datasets as tfds

(ds_train, ds_test), ds_info = tfds.load(
    "mnist",
    split=["train", "test"],
    shuffle_files=True,
    as_supervised=True,  # return (image, label) tuples instead of dicts
    with_info=True,  # also return the DatasetInfo object
)
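The `TypeError` in the traceback above is raised by TensorFlow's `compat.as_bytes`, which accepts only `bytes` or `str`, while the path it received is a path object. The sketch below reproduces that kind of mismatch with the standard library only. `as_bytes_strict` is a hypothetical stand-in for the TensorFlow helper, not its actual code, and the file name is shortened for illustration. If this reading is right, using `tensorflow` and `tensorflow_datasets` versions that were released around the same time is probably a more robust fix than changing the `tfds.load` arguments, but that is an assumption rather than something confirmed in this thread.

```python
import os
from pathlib import PurePosixPath


def as_bytes_strict(value):
    """Accept only bytes or str, like the check that fails in the traceback."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError("Expected binary or unicode string, got %r" % (value,))


path = PurePosixPath("downloads/extracted/train-images-idx3-ubyte.gz")

try:
    as_bytes_strict(path)  # a path object is rejected
    rejected = False
except TypeError:
    rejected = True

# os.fspath turns any path-like object into a plain str.
converted = as_bytes_strict(os.fspath(path))
```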
I am using the Keras API in TensorFlow 2.4.0 to save my model at different steps with this line of code:
tf.keras.callbacks.ModelCheckpoint(
    os.path.join(log_dir, "checkpoint_{epoch:02d}.tf"),
    save_freq=train_steps_per_epoch * cfg.log.checkpoint_save_every_epochs,
    save_weights_only=False)
When the model is saved I encounter some warnings:
[2021-03-17 12:11:09,974][absl][WARNING] - Found untraced functions such as nl_0_layer_call_fn, nl_0_layer_call_and_return_conditional_losses, nl_1_layer_call_fn, nl_1_layer_call_and_return_conditional_losses, conv2d_layer_call_fn while saving (showing 5 of 280). These functions will not be directly callable after loading.
And when I load the model using this line of code:
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
I have this error:
Traceback (most recent call last):
File "run/linear_classifier_evaluation.py", line 51, in run
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/save.py", line 212, in load_model
return saved_model_load.load(filepath, compile, options)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 138, in load
keras_loader.load_layers(compile=compile)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 376, in load_layers
node_metadata.metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 417, in _load_layer
obj, setter = self._revive_from_config(identifier, metadata, node_id)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 434, in _revive_from_config
self._revive_graph_network(metadata, node_id) or
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 471, in _revive_graph_network
inputs=[], outputs=[], name=config['name'])
KeyError: 'name'
I searched online for this error and found people suggesting that I implement get_config and from_config methods in my model class. However, I am not using the H5 format, so I shouldn't have to do that if I understood the Keras tutorial correctly, and the other solutions I found were for TF 1.x.
I would gladly welcome any help or suggestions on where to look.
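For readers unfamiliar with the get_config / from_config suggestion mentioned above, the contract is a round trip: `get_config` returns the constructor arguments as a plain dict, and `from_config` rebuilds the object from that dict. The sketch below shows the pattern on a plain Python class rather than a real Keras layer, so `ScaledBlock` and its arguments are hypothetical. Whether adding these methods resolves the `KeyError: 'name'` in this case is not established by the post.

```python
class ScaledBlock:
    """Plain-Python stand-in for a custom layer (not a Keras class)."""

    def __init__(self, units, scale=1.0, name="scaled_block"):
        self.units = units
        self.scale = scale
        self.name = name

    def get_config(self):
        # Everything needed to call __init__ again, including the name.
        return {"units": self.units, "scale": self.scale, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # Rebuild the object from the dict produced by get_config.
        return cls(**config)


original = ScaledBlock(units=64, scale=0.5, name="nl_0")
restored = ScaledBlock.from_config(original.get_config())
```

The point of the pattern is that the config dict must carry every constructor argument; a key missing from it cannot be recovered at load time.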
When trying to reload the official TensorFlow ResNet-50 checkpoint from here:
http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp32_20181001.tar.gz
...using this code:
import os
import tensorflow as tf
print(tf.__version__)
saver = tf.train.import_meta_graph(os.path.join(
'resnet_imagenet_v1_fp32_20181001',
'model.ckpt-225207.meta'))
I get this error:
1.13.1
Traceback (most recent call last):
File "chehckpoint_to_savedmodel.py", line 11, in <module>
'model.ckpt-225207.meta'))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
**kwargs))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 399, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 159, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'ExperimentalFunctionBufferingResource'
Funny that googling "KeyError: 'ExperimentalFunctionBufferingResource'" returns zero hits. That's a first.
Ideas?
Not sure how else to reload this model. I also tried this:
path = os.path.join(
'resnet_imagenet_v1_fp32_20181001',
'model.ckpt-225207')
checkpoint = tf.train.Checkpoint()
status = checkpoint.restore(path)
print(status)
status.assert_consumed()
But it fails the assertion with no other information.
Thanks in advance.
P
This seems to be an issue with TF versions >= 1.13. Try downgrading to 1.12; it should work.
The issue to track is #29751.
I am trying to create a version under Google Cloud ML models for a successfully trained TensorFlow estimator model. I believe I am providing the correct URI (in Google Cloud Storage), which includes saved_model.pb.
Framework: Tensorflow,
Framework Version: 1.13.1,
Runtime Version: 1.13,
Python: 3.5
Here is the traceback of the error:
Traceback (most recent call last):
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 985, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 795, in Run
resources = command_instance.Run(args)
File "/google/google-cloud-sdk/lib/surface/ml_engine/versions/create.py", line 119, in Run
python_version=args.python_version)
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 114, in Create
message='Creating version (this might take a few minutes)...')
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 75, in WaitForOpMaybe
return operations_client.WaitForOperation(op, message=message).response
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/ml_engine/operations.py", line 114, in WaitForOperation
sleep_ms=5000)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
sleep_ms, _StatusUpdate)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
sleep_ms=sleep_ms)
File "/google/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
if not should_retry(result, state):
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
return not poller.IsDone(operation)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 122, in IsDone
raise OperationError(operation.error.message)
OperationError: Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
Do you have any idea what might be the problem?
EDIT
I am using:
tf.estimator.LatestExporter('exporter', model.serving_input_fn)
as an estimator exporter.
serving_input_fn:
def serving_input_fn():
    inputs = {'string1': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH]),
              'string2': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
PS: my model takes two inputs and returns one binary output.
Hello
I'm using TensorFlow v1.4.0, and when I try to start a TensorBoard session with the following command:
tensorboard --logdir="folder_path"
I get an error:
2018-04-11 17:18:44.422839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 11,91GiB freeMemory: 11,74GiB
2018-04-11 17:18:44.467559: E tensorflow/core/common_runtime/direct_session.cc:167] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
File "/usr/local/bin/tensorboard", line 11, in <module>
sys.exit(run_main())
File "/usr/local/lib/python3.5/dist-packages/tensorboard/main.py", line 36, in run_main
tf.app.run(main)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/usr/local/lib/python3.5/dist-packages/tensorboard/main.py", line 45, in main
default.get_assets_zip_provider())
File "/usr/local/lib/python3.5/dist-packages/tensorboard/program.py", line 166, in main
tb = create_tb_app(plugins, assets_zip_provider)
File "/usr/local/lib/python3.5/dist-packages/tensorboard/program.py", line 200, in create_tb_app
window_title=FLAGS.window_title)
File "/usr/local/lib/python3.5/dist-packages/tensorboard/backend/application.py", line 124, in standard_tensorboard_wsgi
plugin_instances = [constructor(context) for constructor in plugins]
File "/usr/local/lib/python3.5/dist-packages/tensorboard/backend/application.py", line 124, in <listcomp>
plugin_instances = [constructor(context) for constructor in plugins]
File "/usr/local/lib/python3.5/dist-packages/tensorboard/plugins/beholder/beholder_plugin.py", line 47, in __init__
self.most_recent_frame = im_util.get_image_relative_to_script('no-data.png')
File "/usr/local/lib/python3.5/dist-packages/tensorboard/plugins/beholder/im_util.py", line 277, in get_image_relative_to_script
return read_image(filename)
File "/usr/local/lib/python3.5/dist-packages/tensorboard/plugins/beholder/im_util.py", line 265, in read_image
return np.array(decode_png(image_file.read()))
File "/usr/local/lib/python3.5/dist-packages/tensorboard/plugins/beholder/im_util.py", line 182, in __call__
self._lazily_initialize()
File "/usr/local/lib/python3.5/dist-packages/tensorboard/plugins/beholder/im_util.py", line 160, in _lazily_initialize
self._session = tf.Session(graph=graph, config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1509, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 638, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
TensorBoard worked when I used TensorFlow 1.6, but I don't think the version is the problem, because I tried going back to 1.6 today and it still does not work.
My folder contains a file "event.out.po"; I checked.
Do you know where the problem is?
Thank you
I found the problem. Before starting TensorBoard, run this command in the same shell so that only GPU 0 is visible to it:
export CUDA_VISIBLE_DEVICES=0
If that does not work, you can try hiding all GPUs instead:
export CUDA_VISIBLE_DEVICES=''
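The same setting can be applied from Python when TensorBoard is launched as a subprocess. This is a minimal sketch under assumptions: `tensorboard_env` is a hypothetical helper, and the commented-out launch line assumes `tensorboard` is on the `PATH` and that `folder_path` is the log directory from the question.

```python
import os
import subprocess


def tensorboard_env(visible_devices):
    """Copy the current environment and restrict the GPUs CUDA may see."""
    env = dict(os.environ)
    # "0" exposes only the first GPU; "" hides every GPU from the process.
    env["CUDA_VISIBLE_DEVICES"] = visible_devices
    return env


env = tensorboard_env("0")
# Uncomment to start TensorBoard with the restricted environment:
# subprocess.run(["tensorboard", "--logdir", "folder_path"], env=env)
```

Because the helper works on a copy, the parent process keeps its own view of the GPUs; only the child started with `env=env` is restricted.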