I am loading the TensorFlow 2 version of EfficientDet D2 (http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz) on a Jetson AGX Xavier.
I run the following script:
#!/usr/bin/python3
import tensorflow as tf
import time
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
PATH_TO_SAVED_MODEL = "./efficientdet_d2_coco17_tpu-32/saved_model/"
print('Loading model...')
start_time = time.time()
# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)
end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))
However, loading the model takes more than 13 minutes.
This is the output after the command has been executed:
./test.py
2021-07-04 10:58:58.074413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 10:59:05.375568: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
Loading model...
2021-07-04 11:00:54.337115: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-04 11:00:54.342226: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-04 11:00:54.347726: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.347959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.348037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.353788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.358471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.359514: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.364904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.369140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.369861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.370262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.370843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.371060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:00:54.375404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.375623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.375714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.375823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.375908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.376011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.376090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.376167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.376287: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.376369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.376673: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.376972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.377093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:05:01.847060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 11:05:01.847174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-07-04 11:05:01.847226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-07-04 11:05:01.847710: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.849096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19271 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-07-04 11:05:01.850298: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Done! Took 793.8719098567963 seconds
Given the computing power of the Xavier, I would expect much better performance. Does anybody know what could be causing this?
Thanks for any help or input!
The time is spent not only loading the model but also initializing the device; the problem may be in the driver.
To verify this, try initializing a smaller model, or a toy example like a + b = c, as sketched below. I expect it will take a similar amount of time.
Also, raw computing power has little to do with loading the model. Loading depends more on the memory management of the driver and of TF, and the actual building of the model in memory may be done on the CPU even when a GPU or other accelerator is used (just guessing).
In my experience with CUDA and TF, initialization took 5 minutes with one combination of CUDA, TF, and GPU driver versions, and less than 30 seconds with another combination of CUDA and TF on the same hardware (8x 1080 Ti GPUs).
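A minimal timing sketch for such a toy example, assuming the same TF 2 environment as in the question:

import time
import tensorflow as tf

start = time.time()
# The first GPU op forces CUDA context creation and library loading,
# so this measures device initialization rather than model loading.
with tf.device('/GPU:0'):
    a = tf.constant(1.0)
    b = tf.constant(2.0)
    c = (a + b).numpy()
print('a + b =', c)
print('Initialization plus first op took {:.1f} seconds'.format(time.time() - start))

If this also takes several minutes, the bottleneck is driver/device initialization rather than the EfficientDet SavedModel itself.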
Related
Environment: TensorFlow 2.3.0 (Anaconda) with CUDA Toolkit 10.1, cuDNN 7.6.0 (Windows 10)
When I run my code with multiprocessing, it raises this error:
Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: initialization error
but the problem goes away when I run the code through the else branch (calling Train directly):
This is the related code:
if runInProcess:
    p = multiprocessing.Process(target=Train, args=(yml, train_subs, loadData, channDic, eventTypeDic, getData, folderIndex, modelPath))
    p.start()
    p.join()
else:
    Train(yml, train_subs, loadData, channDic, eventTypeDic, getData, folderIndex, modelPath)
My GPU: RTX 2080 Ti 12G
I already ran nvcc -V and it returns:
meya#admin:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
This is the full output:
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-05-30 09:54:42.815682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-05-30 09:54:42.815732: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2022-05-30 09:54:42.815769: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2022-05-30 09:54:42.815804: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2022-05-30 09:54:42.815838: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2022-05-30 09:54:42.815872: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2022-05-30 09:54:42.815907: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2022-05-30 09:54:42.818666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-05-30 09:54:42.818744: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-05-30 09:54:43.491362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-30 09:54:43.491412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2022-05-30 09:54:43.491419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2022-05-30 09:54:43.492693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:af:00.0, compute capability: 7.5)
../model/1S_FB1-100_S100_chanSeq.yml
Will Return
folder Index: 1
--------------------- Fetching 1/1 Data from suject 1 ---------------------
Extracting EDF parameters from
/data0/meya/MI/RaiseLower/Action/session_miltiprocess/s01/evt.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
/data0/meya/code/code_test/MI/interSub/../../MI/loadData.py:204: RuntimeWarning: Omitted
136 annotation(s) that were outside data range.
annotationData = mne.io.read_raw_bdf(eventFile)
mne version > 0.20
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 4 - 1e+02 Hz
FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 4.00
- Lower transition bandwidth: 2.00 Hz (-6 dB cutoff frequency: 3.00 Hz)
- Upper passband edge: 100.00 Hz
- Upper transition bandwidth: 25.00 Hz (-6 dB cutoff frequency: 112.50 Hz)
- Filter length: 1651 samples (1.651 sec)
----------------------Build Model----------------------
Model Type: EEGNet
2022-05-30 09:55:53.385452: F ./tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: initialization error
What is causing this, and how can I fix it?
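One commonly suggested workaround for CUDA initialization errors in forked child processes, sketched here with the same Train call as in the snippet above, is to use the 'spawn' start method so the child builds its own CUDA context instead of inheriting a forked one:

import multiprocessing

if runInProcess:
    # 'spawn' starts a clean interpreter, so TensorFlow/CUDA is initialized
    # only inside the child process rather than copied across fork().
    ctx = multiprocessing.get_context('spawn')
    p = ctx.Process(target=Train, args=(yml, train_subs, loadData, channDic,
                                        eventTypeDic, getData, folderIndex, modelPath))
    p.start()
    p.join()
else:
    Train(yml, train_subs, loadData, channDic, eventTypeDic, getData, folderIndex, modelPath)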
I have NVIDIA driver v396, and I can't update it because another application running inside Docker depends on it.
So I used this repo https://github.com/SmileTM/Tensorflow2.X-GPU-CUDA9.0 to install TF2 inside the Docker container nvidia/cuda:9.0-cudnn7-devel.
But when I install TensorFlow and run tf.test.is_gpu_available(), I get the following output:
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-05-02 17:55:39.369149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-05-02 17:55:40.073345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:05:00.0 name: Quadro P4000 computeCapability: 6.1
coreClock: 1.48GHz coreCount: 14 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 226.62GiB/s
2021-05-02 17:55:40.073783: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.9.0
2021-05-02 17:55:40.075887: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.9.0
2021-05-02 17:55:40.077755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.9.0
2021-05-02 17:55:40.107478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.9.0
2021-05-02 17:55:40.108647: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.9.0'; dlerror: /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusolver.so.9.0: undefined symbol: GOMP_critical_end; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-05-02 17:55:40.110743: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.9.0
2021-05-02 17:55:40.116016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-05-02 17:55:40.116046: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-02 17:55:40.384507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-02 17:55:40.384578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-05-02 17:55:40.384595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
False
The libcusolver error above complains about the undefined OpenMP symbol GOMP_critical_end, so preloading libgomp globally resolves it:
import ctypes
ctypes.CDLL("libgomp.so.1", mode=ctypes.RTLD_GLOBAL)
Now, tf.test.is_gpu_available() returns True.
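A minimal end-to-end sketch, assuming the preload runs before TensorFlow opens its CUDA libraries:

import ctypes

# Make the GNU OpenMP runtime's symbols (e.g. GOMP_critical_end) globally
# visible so libcusolver.so.9.0 can resolve them when TensorFlow dlopen()s it.
ctypes.CDLL("libgomp.so.1", mode=ctypes.RTLD_GLOBAL)

import tensorflow as tf
print(tf.test.is_gpu_available())  # should now print True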
System information:
OS Platform and Distribution: CentOS Linux release 7.7.1908
TensorFlow version: 2.3.0
I am trying to convert the official TensorFlow image captioning model to a TFLite model.
I have successfully converted the model using tf.lite.TFLiteConverter.from_concrete_functions, as follows:
@tf.function
def evaluate(img_tensor_val):
    temp_input = tf.expand_dims(img_tensor_val, 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))
    hidden = decoder.reset_states(batch_size=1)
    features = encoder(img_tensor_val)
    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []
    for i in range(max_length):
        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)
        print(predictions.shape)
        # result.append(predictions)
        predicted_id = tf.random.categorical(predictions, 1)[0][0]
        result.append(predicted_id)
        # if predicted_id == 3:
        #     return result
        # result.append(tf.gather(tokenizer.index_word, predicted_id))
        # if tf.gather(tokenizer.index_word, predicted_id) == '<end>':
        #     return result
        dec_input = tf.expand_dims([predicted_id], 0)
    return result
export_dir = "./"
tflite_enc_input = ''
ckpt.f = evaluate
to_save = evaluate.get_concrete_function(tf.TensorSpec(shape=(299, 299, 3),dtype=tf.dtypes.float32))
converter = tf.lite.TFLiteConverter.from_concrete_functions([to_save])
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
I visualized converted_model.tflite with Netron.
But when I invoke the interpreter, this problem appears:
LOG:
2020-10-03 12:11:24.049222: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-03 12:11:30.184705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-03 12:11:30.213363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.72GiB deviceMemoryBandwidth: 836.37GiB/s
2020-10-03 12:11:30.213414: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-03 12:11:30.219666: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-03 12:11:30.223018: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-03 12:11:30.224419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-03 12:11:30.227861: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-03 12:11:30.230195: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-03 12:11:30.236320: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-03 12:11:30.239374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-03 12:11:30.239829: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-03 12:11:30.248265: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2600000000 Hz
2020-10-03 12:11:30.249524: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5615faa7fa90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-03 12:11:30.249552: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-03 12:11:30.381691: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5615faaec0c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-03 12:11:30.381734: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-10-03 12:11:30.383860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.72GiB deviceMemoryBandwidth: 836.37GiB/s
2020-10-03 12:11:30.383900: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-03 12:11:30.383930: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-03 12:11:30.383944: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-03 12:11:30.383959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-03 12:11:30.383973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-03 12:11:30.383987: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-03 12:11:30.384002: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-03 12:11:30.387738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-03 12:11:30.387786: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-03 12:11:31.156790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-03 12:11:31.156840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-10-03 12:11:31.156853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-10-03 12:11:31.160006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30098 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:af:00.0, compute capability: 7.0)
(299, 299, 3)
INFO: Created TensorFlow Lite delegate for select TF ops.
2020-10-03 12:11:31.760387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.72GiB deviceMemoryBandwidth: 836.37GiB/s
2020-10-03 12:11:31.760470: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-03 12:11:31.760523: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-03 12:11:31.760551: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-03 12:11:31.760577: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-03 12:11:31.760601: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-03 12:11:31.760625: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-03 12:11:31.760647: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-03 12:11:31.763282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-03 12:11:31.763329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-03 12:11:31.763346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-10-03 12:11:31.763360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-10-03 12:11:31.766083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30098 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:af:00.0, compute capability: 7.0)
INFO: TfLiteFlexDelegate delegate: 51 nodes delegated out of 2014 nodes with 51 partitions.
Segmentation fault (core dumped)
The interpreter invocation code:
def load_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path

image = load_image('./test.jpg')[0]
print(image.shape)
interpreter = tf.lite.Interpreter(model_path='./converted_model.tflite')
input_details = interpreter.get_input_details()
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
raw_prediction = interpreter.tensor(
    interpreter.get_output_details()[0]['index'])()
print(raw_prediction)
Please tell me what the problem with the program is. What does 'Segmentation fault (core dumped)' mean?
After I switched from the GPU to the CPU, the program runs correctly. But I don't understand why raw_prediction = 291 when the evaluate function returns a list. How can this be?
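For debugging, using the same converted_model.tflite path as above, listing every declared output shows how the list returned by evaluate() was traced into the TFLite graph; it often becomes several fixed output tensors, while the code above reads only the first one:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./converted_model.tflite')
interpreter.allocate_tensors()
# Print every output tensor the converted model exposes, not just index 0.
for out in interpreter.get_output_details():
    print(out['index'], out['name'], out['shape'], out['dtype'])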
I want to run the YOLOv4 code in this repo: https://github.com/hunglc007/tensorflow-yolov4-tflite
I installed Python 3.7, all the requirements, CUDA, and cuDNN.
According to the log, cuDNN and CUDA are installed correctly, but there is an error: "no kernel image is available for execution on the device". What is this error? Is it related to the CUDA or cuDNN version?
Python: 3.7.9, CUDA: 10.1, TensorFlow: 2.3.0rc0, TensorFlow-GPU: not installed, cuDNN: 7.5.0, OS: Windows 10 (x64)
py -3.7 save_model.py --weights ./data/yolov4.weights --output ./checkpoints/yolov4-416-tflite --input_size 416 --model yolov4 --framework tflite
2020-09-03 11:02:05.897607: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-03 11:02:09.504648: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-03 11:02:09.997508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 13.41GiB/s
2020-09-03 11:02:10.017273: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-03 11:02:10.036505: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-03 11:02:10.059534: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-03 11:02:10.074749: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-03 11:02:10.094710: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-03 11:02:10.115167: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-03 11:02:10.140633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-03 11:02:10.148636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-03 11:02:10.155846: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-03 11:02:10.188413: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x295adc030a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-03 11:02:10.199421: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-03 11:02:10.207675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 13.41GiB/s
2020-09-03 11:02:10.222939: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-03 11:02:10.231890: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-03 11:02:10.241896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-03 11:02:10.250393: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-03 11:02:10.260177: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-03 11:02:10.268644: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-03 11:02:10.278132: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-03 11:02:10.286635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-03 11:02:10.380510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-03 11:02:10.388703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-03 11:02:10.394562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-03 11:02:10.402323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1464 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-09-03 11:02:10.429701: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x295ae120140 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-03 11:02:10.441631: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce 940MX, Compute Capability 5.0
2020-09-03 11:02:10.619742: F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device
Fatal Python error: Aborted
The error indicates that the pre-built TensorFlow binary does not support the SM version (compute capability) of your actual hardware.
You can refer to the link below for the supported combinations:
https://www.tensorflow.org/install/source_windows#gpu
Based on this, both 2.1.0 and 2.3.0 require cuDNN 7.6 and CUDA 10.1. You should try one of these supported combinations.
[2.3.0 release/rc2/rc0 specific] From https://github.com/tensorflow/tensorflow/releases/tag/v2.3.0: TF 2.3 includes PTX kernels only for compute capability 7.0, to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.
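As a quick check, not specific to this repo, you can print the compute capability that TensorFlow actually detects for your GPU:

from tensorflow.python.client import device_lib

# physical_device_desc looks like:
# "device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0"
for dev in device_lib.list_local_devices():
    if dev.device_type == 'GPU':
        print(dev.physical_device_desc)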
I'm using TensorFlow-GPU 1.14 on Ubuntu 16.04.
As I'm not familiar with TensorFlow, I wonder whether I'm actually using the GPU or not.
I have
GeForce GTX 1060
Nvidia-driver 418
CUDA 10.0
cuDNN v7.6.5
And when I execute my code, I always get these messages:
WARNING:tensorflow:From /home/mine/catkin_ws/src/PROJECT/project6_3/src/ddpg.py:26: The name tf.InteractiveSession is deprecated. Please use tf.compat.v1.InteractiveSession instead.
2020-06-24 20:29:13.827441: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-24 20:29:13.834067: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-06-24 20:29:13.930412: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:13.931260: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6a40a50 executing computations on platform CUDA. Devices:
2020-06-24 20:29:13.931277: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2020-06-24 20:29:13.959129: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-06-24 20:29:13.959392: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6ab1d70 executing computations on platform Host. Devices:
2020-06-24 20:29:13.959409: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-06-24 20:29:13.959576: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:13.960326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
pciBusID: 0000:01:00.0
2020-06-24 20:29:13.961867: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-24 20:29:13.988711: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-24 20:29:14.002012: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-24 20:29:14.006381: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-24 20:29:14.038179: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-24 20:29:14.057922: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-24 20:29:14.114149: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-24 20:29:14.114248: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:14.115060: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:14.115765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-24 20:29:14.116472: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-24 20:29:14.118350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-24 20:29:14.118378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-24 20:29:14.118386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-24 20:29:14.119144: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:14.119963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-24 20:29:14.120610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4889 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/mine/catkin_ws/src/PROJECT/project6_3/src/actor_network_bn.py:75: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /home/mine/catkin_ws/src/PROJECT/project6_3/src/actor_network_bn.py:177: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /home/mine/catkin_ws/src/PROJECT/project6_3/src/actor_network_bn.py:178: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
Does anyone know what these messages mean?
Am I using the GPU properly?
(When I checked how much of my GPU was being used with the following command,
$ watch -d -n 0.5 nvidia-smi
it always shows 1407 MiB / 6000 MiB of usage.)
Additionally, should I modify my code to follow the WARNING messages?
(It currently works reasonably well.)
Thanks in advance. :)
Am I using TensorFlow GPU?
If you execute the code below and it returns device_type='GPU', there is no issue with your TensorFlow GPU installation and you are good to go. (On the TF 1.14 in the question, the equivalent call is tf.config.experimental.list_physical_devices('GPU').)
import tensorflow as tf
tf.config.list_physical_devices('GPU')
Output:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2020-06-24 20:29:14.120610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4889 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
If you check the log line above from your stack trace, Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4889 MB memory), that means you are using the GPU.
2020-06-24 20:29:13.961867: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-24 20:29:13.988711: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-24 20:29:14.002012: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-24 20:29:14.006381: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-24 20:29:14.038179: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
These lines are just information, as they are prefixed with I. Warnings would be prefixed with W, and errors with E.
As for the WARNING:tensorflow messages, they are telling you to replace the deprecated modules with their newer equivalents (i.e., tf.compat.v1) so that the same code will also run under TF 2.x.
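For example, the specific warnings quoted above map to these drop-in replacements, which also run on the TF 1.14 setup from the question:

import tensorflow as tf

sess = tf.compat.v1.InteractiveSession()              # instead of tf.InteractiveSession()
x = tf.compat.v1.placeholder(tf.float32, [None, 3])   # instead of tf.placeholder(...)
u = tf.random.uniform([3, 3])                         # instead of tf.random_uniform(...)
w = tf.compat.v1.get_variable('w', shape=[3, 3])      # instead of tf.get_variable(...)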