Does Google Cloud ML support GPU? - tensorflow

I'm testing Google Cloud ML for speeding up my ML model using Tensorflow.
Unfortunately, it seems like Google Cloud ML is extremely slow. My Mainstream-Level PC is at least 10x faster than Google Cloud ML.
I doubt it uses GPU, so I did a test. I modified a sample code to force using GPU.
diff --git a/mnist/trainable/trainer/task.py b/mnist/trainable/trainer/task.py
index 9acb349..a64a11d 100644
--- a/mnist/trainable/trainer/task.py
+++ b/mnist/trainable/trainer/task.py
## -131,11 +131,12 ## def run_training():
images_placeholder, labels_placeholder = placeholder_inputs(
FLAGS.batch_size)
- # Build a Graph that computes predictions from the inference model.
- logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
+ with tf.device("/gpu:0"):
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
- # Add to the Graph the Ops for loss calculation.
- loss = mnist.loss(logits, labels_placeholder)
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels_placeholder)
# Add to the Graph the Ops that calculate and apply gradients.
train_op = mnist.training(loss, FLAGS.learning_rate)
This training code works at my PC (gcloud beta ml local train ...) but not in cloud. It gives errors like this:
"Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 239, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 235, in main
run_training()
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 177, in run_training
sess.run(init)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: Cannot assign a device to node 'softmax_linear/biases': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices:
ApplyGradientDescent: CPU
Identity: CPU
Assign: CPU
Variable: CPU
[[Node: softmax_linear/biases = Variable[container="", dtype=DT_FLOAT, shape=[10], shared_name="", _device="/device:GPU:0"]()]]
Does Google Cloud ML support GPU?

GPUs are now in Beta and all Cloud ML customers have access.
Here are the docs for using GPUs with Cloud ML.

Related

How can I run mobiledet model successfully with the pretrained model in TF1 model zoo from TensorFlow object detection api?

I want to test the mobiledet model provided in the TF1 model zoo from TensorFlow object detection api. tf1 object detection model zoo
since the pretrained files contain both the pb file and the ckpt files the Screenshot of ckpt files.
So, I have tried two methods to load the pretrained model to do inference.
Firstly, I tried to load the tflite_graph.pb directly.I encountered the following problem, I tried to change the tf version, but it still did not solve.
The code is like this:
MODEL_DIR = '/tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
MODEL_CHECK_FILE = os.path.join(MODEL_DIR, 'tflite_graph.pb')
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(MODEL_CHECK_FILE,'rb') as f:
graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/inference_demo.py", line 41, in <module>
tf.import_graph_def(graph_def, name='')
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 505, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node FeatureExtractor/MobileDetCPU/Conv/BatchNorm/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
Then, I tried to load the ckpt files to run the model.
mobiledet = 'tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
meta_path = mobiledet+'model.ckpt-400000.meta'
ckpt_path = mobiledet+'model.ckpt-400000'
with tf.Session() as sess:
saver=tf.train.import_meta_graph(meta_path)
saver.restore(sess, ckpt_path)
graph = tf.get_default_graph()
The error like this:
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/tf_load.py", line 15, in <module>
saver=tf.train.import_meta_graph(meta_path)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'LegacyParallelInterleaveDatasetV2' in binary running on localhost.localdomain. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
It seems that the loading errors of the above two methds are caused by the inconsistency of the tf version, but I have tried many tf versions and failed to solve it. Has anyone successfully run the mobiledet model in TF1 object detection model zoo?
OS: linux
TF version: tf 1.15
#Shane Zhao - are you planning on training with custom dataset or are you using the pretrained graph as is? The version of Tensorflow should only matter during training to the best of my knowledge. Anyways please refer this demo from Google in Colab - https://colab.research.google.com/github/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_Object_Detection_Demo_Training.ipynb#scrollTo=JDddx2rPfex9

cudnn handle not created, solve is clear, how to implement

Hello I am using ubuntu 16.04, ROS kinetic, tensorflow 1.13.1.
My aim to combine an ensenso n35 camera with its rosdriver to the mask rcnn node created for ROS. I have altered the original code for the mask rcnn node so that it takes a grayscale input an stacks it onto itself. I have actually already verified this to work by using a virtual version of the ensenso camera.The sdk contains an app that sets this up. It outputs a white image, however, this should not be an issue for testing functionality. The problem arrises when I attacht the actual camera to the system. This gives the following error:
2019-03-28 13:30:43.113919: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-28 13:30:43.872243: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-28 13:30:43.874466: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
None
None
Traceback (most recent call last):
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 179, in main
node.run()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 104, in run
results = self._model.detect([np_image], verbose=0)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 2340, in detect
self.keras_model.predict([molded_images, image_metas], verbose=0)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1790, in predict
verbose=verbose, steps=steps)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1299, in _predict_loop
batch_outs = f(ins_batch)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2357, in __call__
**self.session_kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1156, in _run
feed_dict_tensor, options, run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_run
run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1354, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
Caused by op u'conv1/convolution', defined at:
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 178, in main
node = MaskRCNNNode()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 65, in __init__
config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1735, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1791, in build
_, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 152, in resnet_graph
x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 603, in __call__
output = self.call(inputs, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3195, in conv2d
data_format=tf_data_format)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
return op(input, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
return self.conv_op(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
return self.call(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
name=self.name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
I can't, for the life of me, figure out where this goes into error nor why. I was ensured that the virtual camera outputs the same data as the actual would, but the error only occurs when using the actual camera.
What i have found so far is that the following statement should be added somewhere in the code but I can not think of, or find, the proper placement for it:
config_pb2.GPUOptions(allow_growth=True)
Help would be much appreciated! Also if anyone thinks this question is better asked elsewhere I will move it there.
I have seen that you are using python=2.7, in the Mask-Rcnn documentation requires.
python_requires='>=3.4',
Other things you should consider.
If you're trying to use your gpu you shloud use tensorflow-gpu.
$ pip install tensorflow-gpu

Tensorflow OOM after freeze graph

I'm running a seq2seq model with tf, the inference program runs well when loading parameters from checkpoint file using tf.train.Saver. But after exporting the graph with freeze_graph.py (using tf.framework.graph_util.convert_variables_to_constants()), and import with tf.import_graph_def in the inference program, it got OOM problem.
Here is a part of error log:
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 4.0KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:983] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "inference.py", line 88, in console_main
result = list(inference(source_sentence))
File "inference.py", line 54, in inference
for sequence in result:
File "/data/experiment/decoder.py", line 115, in search_best_sequence
State.batch_predict(self.session, self.model, self.context, beam)
File "/data/experiment/decoder.py", line 82, in batch_predict
state_list[0].depth)
File "/data/experiment/seq2seq_model.py", line 452, in batch_feed_decoder
log_softmax, attns, state = session.run(output_fetch, input_feed)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 966, in _run
feed_dict_string, options, run_metadata)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1016, in _do_run
target_list, options, run_metadata)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1036, in _do_call
raise type(e)(node_def, op, message)
InternalError: Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0', defined at:
File "inference.py", line 169, in <module>
tf.app.run()
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "inference.py", line 165, in main
console_main(session)
File "inference.py", line 66, in console_main
model = create_model(session, False)
File "/data/experiment/model.py", line 145, in create_model
tensor_name_pickle=tensor_name_pickle)
File "/data/experiment/seq2seq_model.py", line 106, in __init__
tf.import_graph_def(graph_def, name="")
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
op_def=op_def)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/.conda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/AttnV_0 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1024] values: -0.016628871 -0.2054652 -0.045054652...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
I thought it might cause by the memory issue of tf.Constant. Does someone have experience with this problem?
I had the same issue but when trying to load and run the inference from a C++ application using the C API. After a lot of twiddling and testing it appeared the culprit was the frozen graph and freeze_graph.py itself. It's probably a bug of some kind. There are actually multiple issue reports on github's TF repo, but they were just closed due to lack of activity, e.g. here and here. I guess apparent bugs of model freezing aren't of any priority.
In my case the model .pb file was around 500mb and it took around 10Gb of RAM while running a session. Not only did it occupy an insane amount of RAM, it was actually orders of magnitudes slower that way.
When I switched to loading just a SavedModel directory everything went to normal. I'm not sure how to achieve that in python, but for C code I replaced a TF_GraphImportGraphDef() call with TF_LoadSessionFromSavedModel().
I used TF v1.14.0. The library is built with Bazel by me, not the stock version. I could provide some details here and there if anybody was interested. Just not sure where to start, I had many trials and errors.

TensorFlow: InternalError: Blas SGEMM launch failed

When I run sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) I get InternalError: Blas SGEMM launch failed. Here is the full error and stack trace:
InternalErrorTraceback (most recent call last)
<ipython-input-9-a3261a02bdce> in <module>()
1 batch_xs, batch_ys = mnist.train.next_batch(100)
----> 2 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
338 try:
339 result = self._run(None, fetches, feed_dict, options_ptr,
--> 340 run_metadata_ptr)
341 if run_metadata:
342 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
562 try:
563 results = self._do_run(handle, target_list, unique_fetches,
--> 564 feed_dict_string, options, run_metadata)
565 finally:
566 # The movers are no longer used. Delete them.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
635 if handle is None:
636 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637 target_list, options, run_metadata)
638 else:
639 return self._do_call(_prun_fn, self._session, handle, feed_dict,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
657 # pylint: disable=protected-access
658 raise errors._make_specific_exception(node_def, op, error_message,
--> 659 e.code)
660 # pylint: enable=protected-access
661
InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]
Caused by op u'MatMul', defined at:
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py", line 3, in <module>
app.launch_new_instance()
File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 596, in launch_instance
app.start()
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 442, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 883, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 391, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 199, in do_execute
shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2723, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2825, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-d7414c4b6213>", line 4, in <module>
y = tf.nn.softmax(tf.matmul(x, W) + b)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1036, in matmul
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 911, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
Stack: EC2 g2.8xlarge machine, Ubuntu 14.04
Old question, but may help others.
Try to close interactive sessions active in other processes (if IPython Notebook - just restart kernels). This helped me!
Additionally, I use this code to close local sessions in this kernel during experiments:
if 'session' in locals() and session is not None:
print('Close interactive session')
session.close()
I encountered this problem and solved it by setting allow_soft_placement=True and gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3), which specifically define the fraction of memory of GPU been used. I guess this has helped to avoid two tensorflow processes competing for the GPU memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True))
I got this error when running Tensorflow Distributed. Did you check if any of the workers were reporting CUDA_OUT_OF_MEMORY errors? If this is the case it may have to do with where you place your weight and bias variables. E.g.
with tf.device("/job:paramserver/task:0/cpu:0"):
W = weight_variable([input_units, num_hidden_units])
b = bias_variable([num_hidden_units])
My environment is Python 3.5, Tensorflow 0.12 and Windows 10 (no Docker). I am training neural networks in both CPU and GPU. I came across the same error InternalError: Blas SGEMM launch failed whenever training in the GPU.
I could not find the reason why this error happens but I managed to run my code in the GPU by avoiding the tensorflow function tensorflow.contrib.slim.one_hot_encoding(). Instead, I do the one-hot-encoding operation in numpy (input and output variables).
The following code reproduces the error and the fix. It is a minimal setup to learn the y = x ** 2 function using gradient descent.
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
def test_one_hot_encoding_using_tf():
# This function raises the "InternalError: Blas SGEMM launch failed" when run in the GPU
# Initialize
tf.reset_default_graph()
input_size = 10
output_size = 100
input_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='input')
output_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='output')
# Define network
input_oh = slim.one_hot_encoding(input_holder, input_size)
output_oh = slim.one_hot_encoding(output_holder, output_size)
W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
output_v = tf.matmul(input_oh, W1)
output_v = tf.reshape(output_v, [-1])
# Define updates
loss = tf.reduce_sum(tf.square(output_oh - output_v))
trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
update_model = trainer.minimize(loss)
# Optimize
init = tf.initialize_all_variables()
steps = 1000
# Force CPU/GPU
config = tf.ConfigProto(
# device_count={'GPU': 0} # uncomment this line to force CPU
)
# Launch the tensorflow graph
with tf.Session(config=config) as sess:
sess.run(init)
for step_i in range(steps):
# Get sample
x = np.random.randint(0, 10)
y = np.power(x, 2).astype('int32')
# Update
_, l = sess.run([update_model, loss], feed_dict={input_holder: [x], output_holder: [y]})
# Check model
print('Final loss: %f' % l)
def test_one_hot_encoding_no_tf():
# This function does not raise the "InternalError: Blas SGEMM launch failed" when run in the GPU
def oh_encoding(label, num_classes):
return np.identity(num_classes)[label:label + 1].astype('int32')
# Initialize
tf.reset_default_graph()
input_size = 10
output_size = 100
input_holder = tf.placeholder(shape=[1, input_size], dtype=tf.float32, name='input')
output_holder = tf.placeholder(shape=[1, output_size], dtype=tf.float32, name='output')
# Define network
W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
output_v = tf.matmul(input_holder, W1)
output_v = tf.reshape(output_v, [-1])
# Define updates
loss = tf.reduce_sum(tf.square(output_holder - output_v))
trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
update_model = trainer.minimize(loss)
# Optimize
init = tf.initialize_all_variables()
steps = 1000
# Force CPU/GPU
config = tf.ConfigProto(
# device_count={'GPU': 0} # uncomment this line to force CPU
)
# Launch the tensorflow graph
with tf.Session(config=config) as sess:
sess.run(init)
for step_i in range(steps):
# Get sample
x = np.random.randint(0, 10)
y = np.power(x, 2).astype('int32')
# One hot encoding
x = oh_encoding(x, 10)
y = oh_encoding(y, 100)
# Update
_, l = sess.run([update_model, loss], feed_dict={input_holder: x, output_holder: y})
# Check model
print('Final loss: %f' % l)
maybe you not free your gpu rigthly , if you are using linux,try "ps -ef | grep python" to see what jobs are using GPU. then kill them
In my case, I had 2 python consoles open, both using keras/tensorflow.
As I closed the old console (forgotten from previous day),
everything started to work correctly.
So it is good to check, if you do not have multiple consoles / processes occupying GPU.
I closed all other Jupyter Sessions running and this solved the problem. I think It was GPU memory issue.
In my case,
First, I run
conda clean --all
to clean up tarballs and unused packages.
Then, I restart IDE (Pycharm in this case) and it works well. Environment: anaconda python 3.6, windows 10 64bit. I install tensorflow-gpu by a command provided on the anaconda website.
For me, I got this problem when I tried to run multiple tensorflow processes (e.g. 2) and both of them require to access GPU resources.
A simple solution is to make sure there has to be only one tensorflow process running at a single time.
For more details, you can see here.
To be clear, tensorflow will try (by default) to consume all available
GPUs. It cannot be run with other programs also active. Closing. Feel
free to reopen if this is actually another problem.
2.0 Compatible Answer: Providing 2.0 Code for erko's answer for the benefit of the Community.
session = tf.compat.v1.Session()
if 'session' in locals() and session is not None:
print('Close interactive session')
session.close()
In my case, the network filesystem under which libcublas.so was located simply died. The node was rebooted and everything was fine. Just to add another point to the dataset.
I encountered this error when running Keras CuDNN tests in parallel with pytest-xdist. The solution was to run them serially.
For me, I got this error when using Keras, and Tensorflow was the the backend. It was because the deep learning environment in Anaconda was not activated properly, as a result, Tensorflow didn't kick in properly either. I noticed this since the last time I activated my deep learning environment (which is called dl), the prompt changed in my Anaconda Prompt to this:
(dl) C:\Users\georg\Anaconda3\envs\dl\etc\conda\activate.d>set "KERAS_BACKEND=tensorflow"
While it only had the dl before then. Therefore, what I did to get rid of the above error was to close my jupyter notebook and Anaconda prompt, then relaunch, for several times.
I encountered this error after changing OS to Windows 10 recently, and I never encountered this before when using windows 7.
The error occurs if I load my GPU Tensorflow model when an another GPU program is running; it's my JCuda model loaded as socket server, which is not large. If I close my other GPU program(s), this Tensorflow model can be loaded very successfully.
This JCuda program is not large at all, just around 70M, and in comparison this Tensorflow model is more than 500M and much larger. But I am using 1080 ti, which has much memory. So it would be probably not an out-of-memory progblem, and it would perhaps be some tricky internal issue of Tensorflow regarding OS or Cuda. (PS: I am using Cuda version 8.0.44 and haven't downloaded a newer version.)
Restarting my Jupyter processes wasn't enough; I had to reboot my computer.
In my case, it is enough to open the Jupyter Notebooks in separate servers.
This error only occurs with me if I try using more than one tensorflow/keras model in the same server. It doesn't matter if open one notebook, execute it, than close and try opening another. If they are being loaded in the same Jupyter server the error always happens.

TensorFlow CIFAR10 cifar10_eval.py throws error: Compute status: Invalid argument: Assign requires shapes of both tensors to match

I am running the SVHN data set on the CIFAR10 example provided in the TensorFlow packages. All I did was just to change the source directories for the data, and modify a few lines of code here and there. I can successfully train the network.
However, when I run svhn_eval.py (the equivalent of cifar10_eval.py, names changed so I know how to organize my files), I get this error of assign requires shape of both tensors to match. I guess that the problem could be due to
saver.restore(sess, ckpt.model_checkpoint_path)
as the trace ends there and goes deep into the other files of TensorFlow. Does anyone know how to solve this?
W tensorflow/core/common_runtime/executor.cc:1076] 0x1a5bad0 Compute status: Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [2304,384] rhs shape= [4096,384]
[[Node: save/Assign_5 = Assign[T=DT_FLOAT, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](local3/weights, save/restore_slice_5)]]
Traceback (most recent call last):
File "/home/samuelchin/svhn/svhn_eval.py", line 161, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/samuelchin/svhn/svhn_eval.py", line 157, in main
evaluate()
File "/home/samuelchin/svhn/svhn_eval.py", line 147, in evaluate
eval_once(saver, summary_writer, top_k_op, summary_op)
File "/home/samuelchin/svhn/svhn_eval.py", line 78, in eval_once
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 891, in restore
sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 373, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 449, in _do_run
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2304,384] rhs shape= [4096,384]
[[Node: save/Assign_5 = Assign[T=DT_FLOAT, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](local3/weights, save/restore_slice_5)]]
Caused by op u'save/Assign_5', defined at:
File "/home/samuelchin/svhn/svhn_eval.py", line 161, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/samuelchin/svhn/svhn_eval.py", line 157, in main
evaluate()
File "/home/samuelchin/svhn/svhn_eval.py", line 137, in evaluate
saver = tf.train.Saver(variables_to_restore)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 713, in __init__
restore_sequentially=restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 432, in build
filename_tensor, vars_to_save, restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 202, in _AddRestoreOps
validate_shape=not reshape))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 40, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 660, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1850, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1049, in __init__
self._traceback = _extract_stack()
EDIT 1: The lines of code that I changed are in distorted_inputs. In the the original CIFAR10, there was random crop from a 32x32 to a 24x24 picture. However, in the SVHN implementation, I input 32x32 images. Based on the output error, we can sort of figure out what's wrong.
lhs shape= [2304,384] rhs shape= [4096,384]
2304 = 24 * 24 * 4
4096 = 32 * 32 * 4
The question we have to ask ourselves now is, why multiply by 4?
The solution is that cifar10.py has a variable called IMAGE_SIZE. I left it as 24, because I thought it would not affect anything. However, what happens is that when you try and run the test set, the inputs are cropped to a size of IMAGE_SIZE x IMAGE_SIZE.
Therefore, when that wasn't changed, the tensor dimensions do not match. Changing that variable to 32 will do the trick.