I am a noob and I need help. If I use this simple try/except statement and I don't have a pretrained model to load yet, I always get an error.
But if I already have a model to load, it works. It also works when I remove the try/except and train my model every time I run my program.
Thanks.
try:
    model.load("model.tflearn")
except:
    model.fit(training, output, n_epoch=300, batch_size=8, show_metric=True)
    model.save("model.tflearn")
Traceback (most recent call last):
File "C:/Users/Eric/PycharmProjects/ai_chatbot/nlp.py", line 82, in <module>
model.fit(training, output, n_epoch=300, batch_size=8, show_metric=True)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tflearn\models\dnn.py", line 206, in fit
callbacks=callbacks)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tflearn\helpers\trainer.py", line 344, in fit
show_metric)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tflearn\helpers\trainer.py", line 826, in _train
tflearn.is_training(True, session=self.session)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tflearn\config.py", line 95, in is_training
tf.get_collection('is_training_ops')[0].eval(session=session)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tensorflow\python\framework\ops.py", line 921, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tensorflow\python\framework\ops.py", line 5512, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
File "C:\Users\Eric\anaconda3\envs\chatbot\lib\site-packages\tensorflow\python\client\session.py", line 1114, in _run
raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.
PS: I use Anaconda with a venv and tflearn for my model.
I'm not sure how tflearn works, but I think your error comes from this line: model.fit(training, output, n_epoch=300, batch_size=8, show_metric=True), which is outside the try: block.
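Another way to sidestep the problem, sketched here under assumptions, is to not use an exception for control flow at all: check whether a saved checkpoint exists before calling model.load, so a failed load never happens and whatever state it might leave behind never comes into play. This assumes tflearn's save writes TensorFlow checkpoint files next to the given prefix, including a "model.tflearn.index" file; checkpoint_exists and load_or_train are hypothetical helper names, and model is anything with load/fit/save methods like the tflearn DNN in the question.

```python
import os


def checkpoint_exists(prefix):
    """Return True if a TensorFlow-style checkpoint with this prefix is on disk.

    Assumption: the saver writes '<prefix>.index' alongside the data files,
    as TensorFlow's V2 checkpoint format does.
    """
    return os.path.exists(prefix + ".index")


def load_or_train(model, training, output, prefix="model.tflearn",
                  exists=checkpoint_exists):
    """Load the model if a checkpoint exists, otherwise train and save it.

    `model` is assumed to expose load/fit/save like tflearn.DNN.
    `exists` is injectable so the decision can be tested without files.
    Returns True if the model was loaded, False if it was trained.
    """
    if exists(prefix):
        model.load(prefix)
        return True
    model.fit(training, output, n_epoch=300, batch_size=8, show_metric=True)
    model.save(prefix)
    return False
```

In the question's script this would replace the whole try/except with a single call: load_or_train(model, training, output).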
I get an AssertionError when passing my tf.data.Dataset into the tf.keras Model's fit() method.
I am using tensorflow==2.0.0.
I checked that my dataset works with:
for x, y in dataset:
    print(x.shape, y.shape)
which yields the correct shapes for the model's input data.
The full trace is:
Traceback (most recent call last):
File "/anaconda3/envs/ml36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda3/envs/ml36/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/me/train.py", line 102, in <module>
start_training(**arguments)
File "/me/train.py", line 66, in start_training
steps_per_epoch=TRAIN_STEPS_PER_EPOCH,
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 789, in fit
*args, **kwargs)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 776, in wrapper
mode=dc.CoordinatorMode.INDEPENDENT_WORKER)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 782, in run_distribute_coordinator
rpc_layer)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 344, in _run_single_worker
assert strategy
AssertionError
I had the same error when running gcloud ai-platform local train on the final release of TensorFlow 2.0.0, although it worked on earlier releases. Try downgrading to 2.0.0b1:
pip install tensorflow==2.0.0b1
--
I also found that you don't get this error if you run the script directly with Python or if you run it in the cloud.
If you are training locally without using any distribution strategy, you can add the following lines to your code to solve this issue:
import os

TF_CONFIG = os.environ.get('TF_CONFIG')
if TF_CONFIG:
    os.environ.pop('TF_CONFIG')
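A slightly more defensive variant of that snippet, as a sketch: TF_CONFIG is a JSON string with "cluster" and "task" keys, so the hypothetical helper clear_tf_config below reports which task was configured before removing the variable, and it uses pop with a default so it is a no-op when the variable is unset or empty. Like the answer, it assumes you are training locally and not relying on any distribution strategy; call it early in the script, before model.fit.

```python
import json
import os


def clear_tf_config(environ=os.environ):
    """Remove TF_CONFIG from `environ` and return its parsed contents.

    Returns None if TF_CONFIG was unset or empty. Intended only for local,
    non-distributed training.
    """
    raw = environ.pop("TF_CONFIG", None)
    if not raw:
        return None
    config = json.loads(raw)
    task = config.get("task", {})
    print("Removed TF_CONFIG (task type: %s, index: %s)"
          % (task.get("type"), task.get("index")))
    return config
```

Calling clear_tf_config() at the top of train.py would then cover both the set and unset cases.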
Hello, I am using Ubuntu 16.04, ROS Kinetic and TensorFlow 1.13.1.
My aim is to combine an Ensenso N35 camera, via its ROS driver, with the Mask R-CNN node created for ROS. I have altered the original code of the Mask R-CNN node so that it takes a grayscale input and stacks it onto itself. I have already verified that this works using a virtual version of the Ensenso camera; the SDK contains an app that sets this up. The virtual camera outputs a white image, but that should not be an issue for testing functionality. The problem arises when I attach the actual camera to the system, which gives the following error:
2019-03-28 13:30:43.113919: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-28 13:30:43.872243: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-28 13:30:43.874466: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
None
None
Traceback (most recent call last):
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 179, in main
node.run()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 104, in run
results = self._model.detect([np_image], verbose=0)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 2340, in detect
self.keras_model.predict([molded_images, image_metas], verbose=0)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1790, in predict
verbose=verbose, steps=steps)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1299, in _predict_loop
batch_outs = f(ins_batch)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2357, in __call__
**self.session_kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1156, in _run
feed_dict_tensor, options, run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_run
run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1354, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
Caused by op u'conv1/convolution', defined at:
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 178, in main
node = MaskRCNNNode()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 65, in __init__
config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1735, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1791, in build
_, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 152, in resnet_graph
x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 603, in __call__
output = self.call(inputs, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3195, in conv2d
data_format=tf_data_format)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
return op(input, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
return self.conv_op(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
return self.call(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
name=self.name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
I can't, for the life of me, figure out where this goes wrong or why. I was assured that the virtual camera outputs the same data as the actual one would, but the error only occurs when using the actual camera.
What I have found so far is that the following statement should be added somewhere in the code, but I can't think of, or find, the proper place for it:
config_pb2.GPUOptions(allow_growth=True)
Help would be much appreciated! Also if anyone thinks this question is better asked elsewhere I will move it there.
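For reference, in TensorFlow 1.x the allow_growth option is normally set through a tf.ConfigProto rather than by constructing config_pb2.GPUOptions on its own. A minimal sketch, assuming TensorFlow 1.x with standalone Keras as in the traceback: create a session with allow_growth enabled and register it as the Keras session before the model is built. In this node that would mean calling it in main() before MaskRCNNNode() is created, since the traceback shows the model being built in MaskRCNNNode.__init__. use_gpu_memory_growth is a hypothetical name, TensorFlow and Keras are imported inside the function so the module loads without them, and it is not confirmed here that this resolves the CUDNN_STATUS_INTERNAL_ERROR with the real camera.

```python
def use_gpu_memory_growth():
    """Install a TF1 session with allow_growth as the Keras default session.

    Call this once, before the Mask R-CNN model is constructed.
    Assumes TensorFlow 1.x and standalone Keras 2.x.
    """
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving most of it up front.
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    K.set_session(session)
    return session
```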
I see that you are using Python 2.7, but the Mask R-CNN package requires:
python_requires='>=3.4',
Another thing you should consider:
If you're trying to use your GPU, you should use tensorflow-gpu.
$ pip install tensorflow-gpu
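To make the Python version requirement fail fast instead of surfacing as an obscure error later, a small guard can run at the top of the node script. This is a sketch; check_python_version and MIN_PYTHON are hypothetical names, with the minimum taken from the python_requires line above.

```python
import sys

# Minimum taken from Mask R-CNN's python_requires='>=3.4'.
MIN_PYTHON = (3, 4)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    if tuple(version_info[:2]) < minimum:
        raise RuntimeError(
            "Python %d.%d found, but at least %d.%d is required"
            % (version_info[0], version_info[1], minimum[0], minimum[1]))
```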
I am trying to load the meta graph of a trained network, "name.ckpt-1.meta", using tf.train.import_meta_graph("./name.ckpt-1.meta"),
but the following error appears:
Traceback (most recent call last):
File "/home/rapsodo/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-634d5d15ac05>", line 1, in <module>
saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=False)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 391, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 158, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'ImageProjectiveTransformV2'
I don't understand the reason and haven't seen the same thing anywhere else. I'm not sure whether it is because of the TensorFlow version or something else.
I found the solution: it is a version mismatch. Newer versions of TensorFlow are not always compatible with older ones when saving and loading graphs; here the loading version has no registered op named ImageProjectiveTransformV2, hence the KeyError.
If the checkpoints were saved with an older version, use a matching version (preferably the same one) to load the meta graph or frozen graph.
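To spot such a mismatch before importing, one option is to read the TensorFlow version recorded in the .meta file itself. A sketch under assumptions: the MetaGraphDef's meta_info_def.tensorflow_version field may be empty for files written by old releases, TensorFlow is imported lazily inside saved_tensorflow_version, and comparing only major.minor in the hypothetical same_minor_version helper is a heuristic, not a compatibility guarantee.

```python
def saved_tensorflow_version(meta_path):
    """Return the TensorFlow version recorded in a .meta file ('' if absent)."""
    from tensorflow.core.protobuf import meta_graph_pb2

    meta_graph = meta_graph_pb2.MetaGraphDef()
    with open(meta_path, "rb") as f:
        meta_graph.ParseFromString(f.read())
    return meta_graph.meta_info_def.tensorflow_version


def same_minor_version(saved, current):
    """True if two 'X.Y.Z' version strings agree on major and minor."""
    return saved.split(".")[:2] == current.split(".")[:2]
```

For example, with saved = saved_tensorflow_version("./name.ckpt-1.meta"), a warning could be printed when saved is non-empty and same_minor_version(saved, tf.__version__) is False.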
I am trying to run the convolutional neural network tutorial at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py It is well explained, but I get this error:
Traceback (most recent call last):
File "Convolution_Neural_Network.py", line 161, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "Convolution_Neural_Network.py", line 129, in main
model_fn=cnn_model_fn, model_dir="/mnist_convnet_model/")
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 171, in __init__
_verify_model_fn_args(model_fn, params)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 742, in _verify_model_fn_args
raise ValueError('model_fn (%s) must include features argument.' % model_fn)
ValueError: model_fn (<function cnn_model_fn at 0x53790c8>) must include features argument.
The error is from this line:
model_fn=cnn_model_fn, model_dir="/mnist_convnet_model/")
I would be very grateful for any help.
You probably renamed the parameter called 'features' in the definition of cnn_model_fn. I had a similar problem because I had renamed it to 'inputs'.
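To illustrate the answer: Estimator inspects the model_fn's parameter names, and the function must include a parameter literally named features (the other names it understands are labels, mode, params and config). The sketch below mimics that check with inspect.signature, which needs Python 3; has_features_arg and broken_model_fn are hypothetical, and this is not TensorFlow's own implementation.

```python
import inspect


def cnn_model_fn(features, labels, mode):
    """Skeleton with the parameter names tf.estimator.Estimator looks for.

    The body is omitted; only the signature matters for this check.
    """
    raise NotImplementedError


def broken_model_fn(inputs, labels, mode):
    """Same skeleton with 'features' renamed, which Estimator rejects."""
    raise NotImplementedError


def has_features_arg(model_fn):
    """Return True if model_fn declares a parameter named 'features'."""
    return "features" in inspect.signature(model_fn).parameters
```

Renaming the first parameter back to features, as in cnn_model_fn, is all the fix requires.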
I'm running a seq2seq model with TensorFlow. The inference program runs well when loading parameters from a checkpoint file using tf.train.Saver. But after exporting the graph with freeze_graph.py (using tf.framework.graph_util.convert_variables_to_constants()) and importing it with tf.import_graph_def in the inference program, it runs into an OOM (out-of-memory) problem.
Here is a part of error log:
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 4.0KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:983] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "inference.py", line 88, in console_main
result = list(inference(source_sentence))
File "inference.py", line 54, in inference
for sequence in result:
File "/data/experiment/decoder.py", line 115, in search_best_sequence
State.batch_predict(self.session, self.model, self.context, beam)
File "/data/experiment/decoder.py", line 82, in batch_predict
state_list[0].depth)
File "/data/experiment/seq2seq_model.py", line 452, in batch_feed_decoder
log_softmax, attns, state = session.run(output_fetch, input_feed)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 966, in _run
feed_dict_string, options, run_metadata)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1016, in _do_run
target_list, options, run_metadata)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1036, in _do_call
raise type(e)(node_def, op, message)
InternalError: Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0', defined at:
File "inference.py", line 169, in <module>
tf.app.run()
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "inference.py", line 165, in main
console_main(session)
File "inference.py", line 66, in console_main
model = create_model(session, False)
File "/data/experiment/model.py", line 145, in create_model
tensor_name_pickle=tensor_name_pickle)
File "/data/experiment/seq2seq_model.py", line 106, in __init__
tf.import_graph_def(graph_def, name="")
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
op_def=op_def)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
I thought it might be caused by the memory usage of tf.constant. Does anyone have experience with this problem?
I had the same issue, but when trying to load and run inference from a C++ application using the C API. After a lot of twiddling and testing, it appeared the culprit was the frozen graph and freeze_graph.py itself. It's probably a bug of some kind. There are actually multiple issue reports on the TF GitHub repo, but they were closed due to lack of activity, e.g. here and here. I guess apparent bugs in model freezing aren't a priority.
In my case the model .pb file was around 500 MB, and it took around 10 GB of RAM while running a session. Not only did it occupy an insane amount of RAM, it was also orders of magnitude slower that way.
When I switched to loading a SavedModel directory instead, everything went back to normal. I'm not sure how to achieve that in Python, but in C code I replaced a TF_GraphImportGraphDef() call with TF_LoadSessionFromSavedModel().
I used TF v1.14.0, with the library built myself using Bazel rather than the stock version. I could provide more details if anybody is interested; I'm just not sure where to start, as there was a lot of trial and error.
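For the Python side, the closest counterpart to TF_LoadSessionFromSavedModel() in TensorFlow 1.x is tf.saved_model.loader.load. A sketch, assuming the model has already been exported as a SavedModel with the standard "serve" tag (for example via tf.saved_model.simple_save or SavedModelBuilder); load_saved_model is a hypothetical helper, TensorFlow is imported inside it, and it is not confirmed here that this avoids the OOM from the question.

```python
def load_saved_model(export_dir):
    """Load a SavedModel tagged 'serve' into a fresh TF1 graph and session.

    Assumes TensorFlow 1.x. Returns (session, meta_graph_def).
    """
    import tensorflow as tf

    graph = tf.Graph()
    session = tf.Session(graph=graph)
    # Import into our own graph rather than the global default graph.
    with graph.as_default():
        meta_graph_def = tf.saved_model.loader.load(
            session, [tf.saved_model.tag_constants.SERVING], export_dir)
    return session, meta_graph_def
```

The input and output tensor names can then be looked up in meta_graph_def.signature_def (simple_save uses the "serving_default" key) and passed to session.run.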