tensorflow 2.5x slower than pytorch on vgg16 architecture

So I'm trying to get into tensorflow and liking it so far.
Today I upgraded to cuda 8, cudnn 5.1 and tensorflow 0.12.1. Using a Maxwell Titan X GPU.
Using the following short code to load the pretrained VGG16:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim import nets

tf.reset_default_graph()
input_images = tf.placeholder(tf.float32, [None, 224, 224, 3], 'image')
preds = nets.vgg.vgg_16(input_images, is_training=False)[0]
saver = tf.train.Saver()
config = tf.ConfigProto(log_device_placement=True,
                        gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))
sess = tf.InteractiveSession(config=config)
saver.restore(sess, './vgg_16.ckpt')
_in = np.random.randn(16, 224, 224, 3).astype(np.float32)
I then time the forward pass:
%timeit sess.run(preds, feed_dict={input_images: _in})
I get 160 ms per batch (forward pass only), which is about 2.5x slower than the equivalent configuration in Torch according to this benchmark (and also slower than MatConvNet).
The operations seem correctly assigned to the GPU and the CUDA libraries are properly found. What else am I missing?
Edit: cuDNN and CUDA properly found:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB
Also, feeding does not seem to be the problem, since replacing input_images with tf.random_uniform((16, 224, 224, 3), maxval=255) does not change the timing.
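For reference, that check is essentially this one-line change (a sketch reusing the names from the snippet above, rebuilt in a fresh graph):

# Build the batch inside the graph so the timing excludes any
# placeholder/feed_dict overhead.
input_images = tf.random_uniform((16, 224, 224, 3), maxval=255)
preds = nets.vgg.vgg_16(input_images, is_training=False)[0]
# then: %timeit sess.run(preds)   # no feed_dict needed now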
Edit 2: I compared against the PyTorch version running on the same machine, with batches of 16x224x224x3, and I get:
Resnet-50 : pytorch 48ms vs tf 58 ms (OK)
VGG16 : pytorch 65ms vs tf 160ms (not OK)

Tested recently with CUDA 9.0, TensorFlow 1.9 and PyTorch 0.4.1; the differences are now negligible for the same operations.
See the proper timing here.

Related

Do I need to add tf.compat.v1.disable_eager_execution() to export_inference_graph.py to convert tf.train.Checkpoint to SavedModel?

I found a question about this error (in a different scenario), along with many GitHub issues and articles, but it always seems to involve people upgrading from TF 1.x to TF 2.x. I'm not doing that.
Here are my versions:
tensorflow 2.5.0
tensorflow-addons 0.13.0
tensorflow-datasets 4.3.0
tensorflow-estimator 2.5.0
tensorflow-gpu 2.5.0
I'm trying to use TF object detection, converting a model trained in TF 2.5 via Python to a tensorflow.js-compatible model, and I asked a question about it. The answer given was to start by running:
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
So my command ended up being:
py Tensorflow\models\research\object_detection\export_inference_graph.py
--input_type image_tensor
--pipeline_config_path Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config
--trained_checkpoint_prefix Tensorflow\workspace\pre-trained-models\ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8\checkpoint\ckpt-0.data-00000-of-00001
--output_directory Tensorflow\workspace\models\my_ssd_mobnet\export
Which resulted in the error:
RuntimeError: tf.placeholder() is not compatible with eager execution
I do see in the logs a common cause of this error, so I know where it's coming from:
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
input_tensor = tf.placeholder(
But I don't understand how to deal with this, since I'm not writing any of these TensorFlow modules; I'm just trying to do something basic with existing modules, like converting a tf.train.Checkpoint to a SavedModel.
Normally the answer seems to be to call tf.compat.v1.disable_eager_execution(), but the weird thing is that it's not my code: I don't know what else I might break in this conversion script by disabling a feature, and I'm not yet good enough with the TensorFlow API to really understand the script.
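For reference, the suggested fix would presumably be a two-line patch at the very top of the script, before any graph code runs (untested sketch):

import tensorflow as tf

# Hypothetical patch at the top of export_inference_graph.py: switch the
# process back to TF1-style graph mode so tf.placeholder() is legal again.
tf.compat.v1.disable_eager_execution()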
Full logs and trace:
2021-07-15 09:40:24.482953: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.835151: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-15 09:40:26.856379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.856487: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.861810: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-15 09:40:26.861891: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-15 09:40:26.864685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-15 09:40:26.865561: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-15 09:40:26.872246: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-15 09:40:26.874465: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-15 09:40:26.874979: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-15 09:40:26.875238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:26.876220: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-15 09:40:26.877353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.877556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:27.285985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 09:40:27.286153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-15 09:40:27.286917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-15 09:40:27.287164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5957 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Traceback (most recent call last):
  File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 206, in <module>
    tf.app.run()
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 194, in main
    exporter.export_inference_graph(
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 611, in export_inference_graph
    _export_inference_graph(
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 503, in _export_inference_graph
    outputs, placeholder_tensor_dict = build_detection_graph(
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 457, in build_detection_graph
    placeholder_tensor, input_tensors = input_placeholder_fn_map[input_type](
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
    input_tensor = tf.placeholder(
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3268, in placeholder
    raise RuntimeError("tf.placeholder() is not compatible with "
RuntimeError: tf.placeholder() is not compatible with eager execution.
What could I be doing here that would cause this error? Did I install the wrong version of the conversion script? I checked that I have the latest Tensorflow files from the official repo, and that's where export_inference_graph.py is found. Does the conversion script just not work with Tensorflow 2.x? Do I need to modify the conversion script with tf.compat.v1.disable_eager_execution()? Will this cause other problems in the script since I'm disabling a feature?
Edit:
I know some models in the object detection API were built for TF 1.x (model zoo) and others for TF 2.x (model zoo). I verified that I have a 2.x model, so that's not the cause.
TensorFlow allows you to save a model in multiple formats (checkpoint or SavedModel). A checkpoint saves only the weights for every layer, so when loading the model you must first define the network architecture and then load the weights. A SavedModel saves the complete model, i.e. architecture, weights and training configuration (including the optimizer state). This link has more details on the available formats:
https://www.tensorflow.org/tutorials/keras/save_and_load
In your case, since tfjs requires a SavedModel as input, you can save the TensorFlow model directly in the SavedModel format rather than saving it first as a checkpoint and then trying to convert it, as sketched below.
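A minimal sketch of the difference, using a toy tf.keras model (the model and paths here are illustrative, not taken from the question):

import tensorflow as tf

# Toy model; any tf.keras model behaves the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Checkpoint: weights only. To restore, the architecture must be rebuilt first.
model.save_weights('ckpt/weights')
clone = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
clone.load_weights('ckpt/weights')

# SavedModel: architecture + weights + training config in one directory.
# This is the format the tfjs converter expects as input.
model.save('exported_model')
reloaded = tf.keras.models.load_model('exported_model')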

Tensorflow 2.0 can't use GPU, something wrong in cuDNN?: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

I am trying to understand and debug my code. I am trying to predict with a CNN model developed under TF 2.0/tf.keras on GPU, but I get the error messages below.
Could someone help me fix this?
Here is my environment configuration:
python 3.6.8
tensorflow-gpu 2.0.0-rc0
nvidia 418.x
CUDA 10.0
cuDNN 7.6+
and the log file:
2019-09-28 13:10:59.833892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-28 13:11:00.228025: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-28 13:11:00.957534: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963310: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963416: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node mobilenetv2_1.00_192/Conv1/Conv2D}}]]
=====>GPU Available: True
=====> 4 Physical GPUs, 1 Logical GPUs
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
  File "NSFW_Server.py", line 162, in <module>
    model.predict(initial_tensor)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 915, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 722, in predict
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 393, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3625, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node mobilenetv2_1.00_192/Conv1/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_10727]
Function call stack:
keras_scratch_graph
The code:
import numpy as np
import tensorflow as tf

# INPUT_SHAPE is defined elsewhere in the script.

if __name__ == "__main__":
    print("=====>GPU Available: ", tf.test.is_gpu_available())
    tf.debugging.set_log_device_placement(True)

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
            tf.config.experimental.set_memory_growth(gpus[0], True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print("=====>", len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

    paras_path = "./paras/{}".format(int(2011))
    model = tf.keras.experimental.load_from_saved_model(paras_path)
    initial_tensor = np.zeros((1, INPUT_SHAPE, INPUT_SHAPE, 3))
    model.predict(initial_tensor)
You have to check that you have the right combination of CUDA + cuDNN + TensorFlow (and ensure everything is actually installed).
A couple of known-working configurations are listed below (updated for recent versions of TensorFlow):
Cuda 11.3.1 + CuDNN 8.2.1.32 + TensorFlow 2.7.0
Cuda 11.0 + CuDNN 8.0.4 + TensorFlow 2.4.0
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.2.0/2.3.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.1.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.0 + CuDNN 7.6.3 + TensorFlow 1.13/1.14/2.0
Cuda 9.0 + CuDNN 7.0.5 + TensorFlow 1.10
Usually this error appears when you have an incompatible version of TensorFlow/cuDNN installed. In my case, it appeared when I tried using an older TensorFlow with a newer version of cuDNN. A quick way to check what your TensorFlow binary was built against is sketched below.
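A sanity-check sketch for TF >= 2.3 (tf.sysconfig.get_build_info is not available on older versions):

import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow binary was built against,
# plus the GPUs it can actually see.
print("TF:", tf.__version__)
build = tf.sysconfig.get_build_info()          # TF >= 2.3 only
print("built against CUDA:", build.get("cuda_version"))
print("built against cuDNN:", build.get("cudnn_version"))
print("visible GPUs:", tf.config.list_physical_devices("GPU"))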
If for some reason you get an error message like the following (and nothing happens afterwards):
Relying on the driver to perform ptx compilation
Solution: install the latest NVIDIA driver.
[SEEMS TO BE SOLVED IN TF >= 2.5.0] (see below):
Only for Windows users: some late combinations of CUDA, cuDNN and TF may not work, due to a bug (a .dll extension named improperly). To handle that specific case, please consult this link: Tensorflow GPU Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
For those facing the above error on Windows, I sorted it out just by installing the cuDNN version compatible with the CUDA already installed on the system.
The suitable version can be downloaded from NVIDIA's developer portal ("Download cuDNN"). You might need an NVIDIA account; one is easily created by providing an email address and filling in a questionnaire.
To check the CUDA version, run nvcc --version.
Once the suitable version is downloaded, extract the zip file.
Go to the bin folder of the extracted archive, copy cudnn64_7.dll and paste it into CUDA's bin folder. In my case, CUDA is installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin.
This will most probably solve the problem.
My system details:
Windows 10
CUDA 10.0
TensorFlow 2.0
GPU- Nvidia GTX 1060
I also found the blog "Installing TensorFlow with CUDA and GPU support on Windows 10" very useful.
Check the instructions on the TensorFlow GPU instruction page for your OS. It resolved the issue for me on Ubuntu 16.04.6 LTS with TensorFlow 2.0.

How to run TensorFlow on AMD/ATI GPU?

After reading the tutorial https://www.tensorflow.org/guide/using_gpu I checked device placement for a GPU session with this simple code:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    x = sess.run(c)
print(x)
The output was:
2018-08-07 18:44:59.019144: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping: no known devices.
2018-08-07 18:44:59.019536: I tensorflow/core/common_runtime/direct_session.cc:288] Device mapping:
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019902: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019926: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019934: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
[[ 22. 28.]
 [ 49. 64.]]
As you can see, no calculation was done by the GPU.
When I changed the code to use a GPU configuration with a per-process memory fraction:
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.4

with tf.Session(config=conf) as sess:
    x = sess.run(c)
print(x)
The output was:
2018-08-07 18:52:22.681221: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[[ 22. 28.]
 [ 49. 64.]]
What can I do to run the session on the GPU card? Thank you.
It is most certainly possible to run TensorFlow on AMD GPUs. About two years back ROCm was released, which gets things done. However, there is a caveat: owing to its open-source origins, it currently runs only on Linux. So if you are willing to use Linux, you can most certainly train your DL models using AMD GPUs. That said, the amount of support you will get is low, as the community is still not large enough. Search for ROCm and you can find instructions on how to get it set up and running on a Linux machine. It may work with WSL2 on Windows, but I have not tried it yet and so cannot comment on that.
here is a link to ROCm installation docs
You can use TensorFlow.js, the JavaScript version of TensorFlow.
TensorFlow.js has no such hardware limitation and can run on any GPU that supports WebGL.
The API is quite similar to TensorFlow's Python API, and the project provides scripts to convert your models from Python to JS.
I believe TensorFlow-GPU only supports NVIDIA GPU cards with CUDA Compute Capability >= 3.0.
The following TensorFlow variants are available for installation:
TensorFlow with CPU support only. If your system does not have a NVIDIA® GPU, you must install this version. This version of TensorFlow is usually easier to install, so even if you have an NVIDIA GPU, we recommend installing this version first.
TensorFlow with GPU support. TensorFlow programs usually run much faster on a GPU instead of a CPU. If you run performance-critical applications and your system has an NVIDIA® GPU that meets the prerequisites, you should install this version. See TensorFlow GPU support for details.
https://www.tensorflow.org/install/install_linux

how to use my own trained model with facenet implemented in tensorflow?

I train the model with the shell command:
python src/facenet_train.py \
--batch_size 15 \
--gpu_memory_fraction 0.25 \
--models_base_dir trained_model_2017_05_15_10_24 \
--pretrained_model trained_model_2017_05_15_10_24/20170515-121856/model-20170515-121856.ckpt-182784 \
--model_def models.nn2 \
--logs_base_dir logs \
--data_dir /data/user_set/training/2017_05_15_10_24 \
--lfw_pairs /data/user_set/lfw_pairs.txt \
--image_size 224 \
--lfw_dir /data/user_set/lfw \
--optimizer ADAM \
--max_nrof_epochs 1000 \
--learning_rate 0.00001
but I get error information like this when using my own trained model:
2017-05-17 14:23:05.448285: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-17 14:23:05.448318: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-17 14:23:05.448324: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-17 14:23:05.448329: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-17 14:23:05.448334: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-17 14:23:05.674872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:03:00.0
Total memory: 7.93GiB
Free memory: 2.89GiB
2017-05-17 14:23:05.674917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-05-17 14:23:05.674935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-05-17 14:23:05.674957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:03:00.0)
Traceback (most recent call last):
  File "forward.py", line 21, in <module>
    images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
  File "/home/chen/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2563, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/chen/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2414, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/chen/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2456, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'input:0' refers to a Tensor which does not exist. The operation, 'input', does not exist in the graph."
The feature extraction code:
import tensorflow as tf
import facenet

w_MODEL_PATH_ = '/home/chen/demo_dir/facenet_tensorflow_train/trained_model_2017_05_15_10_24/20170515-121856'

with tf.Graph().as_default():
    with tf.Session() as sess:
        # load the model
        meta_file, ckpt_file = facenet.get_model_filenames(w_MODEL_PATH_)
        facenet.load_model(w_MODEL_PATH_, meta_file, ckpt_file)

        # Get input and output tensors
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")

        image_size = images_placeholder.get_shape()[1]
        embedding_size = embeddings.get_shape()[1]

        paths = ['one.png', 'two.png']
        # Run forward pass to calculate embeddings
        images = facenet.load_data(paths, do_random_crop=False, do_random_flip=False,
                                   image_size=image_size, do_prewhiten=True)
        feed_dict = {images_placeholder: images, phase_train_placeholder: False}
        emb_array = sess.run(embeddings, feed_dict=feed_dict)
        print(emb_array)
I don't know how to use my own trained model. Please help.
If you are talking about the last part, use this code to see which operations your model's graph actually contains:
for op in tf.get_default_graph().get_operations():
    print(op.name)
If you are talking about the optimizations: you are getting those warnings because the prebuilt binary was not compiled for your CPU, and removing them means compiling TensorFlow on your own machine. It is fairly easy to do.
You can read the documentation for the full list of options, but essentially you need to do a few steps (a sketch follows the list below):
https://www.tensorflow.org/install/install_sources
git clone the tensorflow repo
install bazel, tensorflow's build system
configure tensorflow
build tensorflow
install tensorflow into your environment, whether that is anaconda or a virtualenv
That is it; of course, other required libraries will need to be installed. It is pretty easy to do on Ubuntu.
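A rough sketch of that sequence from the TF 1.x era (exact flags and paths vary by version; follow the install_sources page for yours):
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl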
Alternatively, you could try the conda-forge build of tensorflow-gpu if you are using anaconda, but I cannot verify that it is compiled with optimizations for your CPU:
https://conda-forge.org/
install anaconda
add the conda forge repo url
update conda
install tensorflow-gpu

How does one have TensorFlow not run the script unless the GPU was loaded successfully?

I have been trying to run some TensorFlow training on a machine with GPUs; however, whenever I try to do so I get some type of error that seems to say it wasn't able to use the GPU for some reason (usually a memory issue, a CUDA issue, cuDNN, etc.). However, since TensorFlow automatically falls back to the CPU if it can't use the GPU, it's been hard for me to tell whether it was actually able to leverage the GPU or not. Thus, I want my script to simply fail/halt unless the GPU is being used. How do I do that?
For the sake of an example, currently I have the message:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P100-SXM2-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.4805
pciBusID 0000:85:00.0
Total memory: 15.93GiB
Free memory: 15.63GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P100-SXM2-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.4805
pciBusID 0000:85:00.0
Total memory: 15.93GiB
Free memory: 522.25MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
It seems to load all the CUDA libraries fine but then complains at the end. The complaining lines are:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
We could try to debug this specific bug, but for the moment training proceeds and I have no idea whether it is using the CPU or the GPU. Can we just have it refuse to train if any weird CUDA/cuDNN/GPU bug comes up?
Use with tf.device('/gpu:0'):. This will kill your program if /gpu:0 doesn't exist.
For example, see https://github.com/hughperkins/tensorflow-cl/blob/tensorflow-cl/tensorflow/stream_executor/cl/test/test_binary_ops.py#L52
with tf.Graph().as_default():
    with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
        with tf.device('/gpu:0'):
            tf_a = tf.placeholder(tf_dtype, [None, None], 'a')
            tf_b = tf.placeholder(tf_dtype, [None, None], 'b')
            tf_c = tf.__dict__[tf_func](tf_a, tf_b, name="c")
You can list all available devices in TensorFlow: How to get current available GPUs in tensorflow?. If no GPU is in the list, you can make the program throw an exception, as in the sketch below.
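A minimal sketch of that guard for TF 1.x (device_lib is the internal helper the linked question uses; run this before building the graph):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Enumerate local devices and halt unless at least one GPU is visible.
gpu_names = [d.name for d in device_lib.list_local_devices()
             if d.device_type == 'GPU']
if not gpu_names:
    raise RuntimeError("No GPU detected; refusing to silently fall back to CPU.")
print("Found GPU(s):", gpu_names)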