Time-consuming TensorFlow CUDA driver check in AWS Lambda - tensorflow

I've been running an AWS Lambda with a mounted EFS on which I've installed TensorFlow 2.4. When I run the Lambda (and every Lambda that uses TensorFlow 2.4), it wastes a lot of time (about 4 minutes, sometimes more) on some of TensorFlow's startup checks, so I have to set a very long timeout to work around the issue.
These are the messages that the Lambda prints:
2022-05-17 06:33:21.917336: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:21.921992: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
2022-05-17 06:33:21.922025: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-17 06:33:21.922048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (169.254.137.137): /proc/driver/nvidia/version does not exist
2022-05-17 06:33:21.922460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:22.339905: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 06:33:22.340468: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500010000 Hz
[WARNING] 2022-05-17T06:33:22.436Z c4500036-5b77-4808-a062-f8ae820b0317 AutoGraph could not transform <function Model.make_predict_function.<locals>.predict_function at 0x7f65bfb37280> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: unsupported operand type(s) for -: 'NoneType' and 'int'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
What I need is to avoid this wasted time and run a clean execution.
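One commonly suggested mitigation is to hide the GPU from TensorFlow before importing it, so the CUDA driver probe is skipped on the CPU-only Lambda workers. This is a sketch; whether it removes the whole delay on Lambda is an assumption, not something verified here. Both environment variables are real TensorFlow knobs:
# Sketch, not a verified fix: set the environment before TensorFlow is imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # tell TF no GPU is available to probe
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"   # suppress the INFO/WARNING messages
import tensorflow as tf  # import only after the environment is set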

Related

model optimizer in Intel OpenVINO

I used
import tensorflow as tf
model = tf.keras.models.load_model('model.h5')
tf.saved_model.save(model,'model')
for saving my image classification model (TensorFlow version on Google Colab = 2.9.2, Intel OpenVINO version [Development Tools] = 2021.4.2 LTS).
---------------------------------------------------------------------------------------
C:\Program Files (x86)\Intel\openvino_2021.4.752\deployment_tools\model_optimizer>python mo_tf.py --saved_model_dir C:\Users\dchoi\CNNProejct_Only_saved_English\saved_model --input_shape [1,32,320,240,3] --output_dir C:\Users\dchoi\CNNproject_only_output_English\output_model
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: None
- Path for generated IR: C:\Users\dchoi\CNNproject_only_output_English\output_model
- IR output name: saved_model
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: [1,32,320,240,3]
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: None
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Use the config file: None
- Inference Engine found in: C:\Users\dchoi\AppData\Local\Programs\Python\Python38\lib\site-packages\openvino
Inference Engine version: 2021.4.0-3839-cd81789d294-releases/2021/4
Model Optimizer version: 2021.4.2-3974-e2a469a3450-releases/2021/4
[ WARNING ] Model Optimizer and Inference Engine versions do no match.
[ WARNING ] Consider building the Inference Engine Python API from sources or reinstall OpenVINO (TM) toolkit using "pip install openvino==2021.4"
2022-11-19 01:34:44.207311: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-11-19 01:34:44.207542: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\dchoi\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\autograph\impl\api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
2022-11-19 01:34:46.961002: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-19 01:34:46.961949: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-11-19 01:34:46.962904: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-19 01:34:46.969471: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-SCBPOUA
2022-11-19 01:34:46.969727: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-SCBPOUA
2022-11-19 01:34:46.970663: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-19 01:34:46.971135: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[ FRAMEWORK ERROR ] Cannot load input model: SavedModel format load failure: NodeDef mentions attr 'validate_shape' not in Op<name=AssignVariableOp; signature=resource:resource, value:dtype -> ; attr=dtype:type; is_stateful=true>; NodeDef: {{node AssignNewValue}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
------------------------------------------------------------------------------------------
I am getting this kind of error even after I downloaded install_prerequisites/install_prerequisites_tf2.bat. I need help.
Your error seems to indicate a mismatch in the TensorFlow version used to load the GraphDef file. In my replication, I was able to generate the Intermediate Representation (IR) files using TensorFlow 2.5.3. Here is the full Model Optimizer command used:
mo_tf.py --saved_model_dir <path_to_model\IMGC.h5_to_saved_model.pb> --input_shape [1,320,240,3] --output_dir <path_for_output_files>
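If the SavedModel itself was produced by a newer TensorFlow, re-exporting it under a matching version may help. A minimal sketch, assuming the same 'model.h5' from the question and tensorflow==2.5.3 installed:
# Sketch: re-export the Keras model under TF 2.5.3 so the SavedModel matches
# what this Model Optimizer release can parse.
import tensorflow as tf
model = tf.keras.models.load_model('model.h5')
tf.saved_model.save(model, 'saved_model')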

tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found

I was installing TensorFlow on my CPU when I got these 2 errors:
2022-03-13 17:59:56.171741: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-03-13 17:59:56.171872: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Can anybody help me out here? I was following a tutorial from a few years ago.
These are just warning and information messages saying that the CUDA libraries cannot be found.
The I (info) message on the second line tells you to ignore the W (warning) message above it if no CUDA GPU is set up on your machine.
The only effect of this is that training will happen on the CPU only.
If you are using an NVIDIA GPU, you can refer to NVIDIA's installation guide for the missing files.
If you don't use an NVIDIA GPU, or simply want to silence the I and W messages, you can add the two lines below at the beginning of your code:
# Set before importing tensorflow so the level takes effect
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # 2 = hide INFO and WARNING messages
You can read more about TF_CPP_MIN_LOG_LEVEL in the TensorFlow logging documentation.
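As a quick sanity check (a sketch, not part of the original answer), you can ask TensorFlow which GPUs it actually sees:
# Prints an empty list on a CPU-only machine, confirming the warnings are benign.
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))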

Running drums_rnn_train hangs on calling checkpoint listeners on first checkpoint

I'm trying to train a Magenta model on a set of hi-hat MIDI patterns, and upon running
drums_rnn_train --config='one_drum' --run_dir=/tmp/drums_rnn/logdir/run2 --sequence_example_file=/tmp/drums_rnn/sequence_examples/training_drum_tracks.tfrecord --hparams="batch_size=32,rnn_layer_sizes=[32,32]" --num_training_steps=1000
I'm seeing the logs below after a bunch of deprecation warnings.
I1003 13:21:29.452953 4436757952 events_rnn_train.py:103] Starting training loop...
I1003 13:21:29.453077 4436757952 basic_session_run_hooks.py:546] Create CheckpointSaverHook.
W1003 13:21:29.549679 4436757952 deprecation.py:323] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I1003 13:21:29.589996 4436757952 monitored_session.py:246] Graph was finalized.
2020-10-03 13:21:29.590419: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-03 13:21:29.609557: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd08e7ae700 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-03 13:21:29.609573: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
I1003 13:21:29.672456 4436757952 session_manager.py:505] Running local_init_op.
I1003 13:21:29.678084 4436757952 session_manager.py:508] Done running local_init_op.
W1003 13:21:29.695948 4436757952 deprecation.py:323] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py:906: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
I1003 13:21:30.106312 4436757952 basic_session_run_hooks.py:614] Calling checkpoint listeners before saving checkpoint 0...
I1003 13:21:30.106546 4436757952 basic_session_run_hooks.py:618] Saving checkpoints for 0 into ./tmp/drums_rnn/logdir/run5/train/model.ckpt.
I1003 13:21:30.187100 4436757952 basic_session_run_hooks.py:626] Calling checkpoint listeners after saving checkpoint 0...
The model remains stuck on this first "Calling checkpoint listeners after saving" line. I've verified it's not a performance issue, as I can easily train models with larger batch sizes for polyphonic melodies. Has anyone seen an issue like this? Could it be due to Magenta relying on an older version of TensorFlow?

Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

I got this when using keras with Tensorflow backend:
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Relevant code:
import tensorflow as tf
from keras import backend as K

tfconfig = tf.ConfigProto()
# Turn on XLA JIT compilation for the whole session
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True  # allocate GPU memory on demand
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))
tensorflow version: 1.14.0
Chairman Guo's code:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must run before TensorFlow initializes CUDA
solved my problem of the Jupyter notebook kernel crashing at:
tf.keras.models.load_model('path/to/my/model')
The fatal message was:
2020-01-26 11:31:58.727326: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
My TF version is 2.2.0-dev20200123. There are 2 GPUs on this system.
This could be because your TF-default (i.e., first) GPU is running out of memory. If you have multiple GPUs, divert your Python program to run on one of the others. In TF (assuming TF-2.0-rc1), set the following:
import os
import tensorflow as tf

# Specify which GPU(s) to use
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Or 2, 3, etc. other than 0

# Control CPU/GPU placement
config = tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config)
# Note that ConfigProto moved under tf.compat.v1 in TF 2.0
If, however, your environment has only one GPU, then perhaps you have no choice but to ask your buddy to stop his program, and then treat him to a cup of coffee.
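For reference, in TF 2.x the same allow_growth behavior can also be requested through the tf.config API instead of ConfigProto. A minimal sketch:
# Sketch: TF 2.x-native equivalent of gpu_options.allow_growth.
# Must run before any GPU has been initialized.
import tensorflow as tf
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)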

Can't use tensorflow.keras.layers.CuDNNLSTM or keras.layers.CuDNNLSTM in my Colab hosted runtime

When I tried to use either tensorflow.keras.layers.CuDNNLSTM or keras.layers.CuDNNLSTM, I got the following error:
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by {{node cu_dnnlstm/CudnnRNN}}with these attrs: [dropout=0, seed=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", is_training=true, seed2=0]
Registered devices: [CPU, XLA_CPU]
I am using the hosted runtime, and I presumed it supports GPU as well, but the error message above shows there is no GPU. I'm not sure what the problem is, but any clue will be appreciated.
You need to explicitly request a GPU-enabled runtime.
From the Runtime menu, select "Change runtime type", then select GPU under "Hardware accelerator".
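Once the runtime has restarted, a quick check (a sketch, not part of the original answer) confirms that TensorFlow can see the GPU:
# Prints something like '/device:GPU:0' on a GPU runtime, an empty string otherwise.
import tensorflow as tf
print(tf.test.gpu_device_name())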