How to enable mixed precision training - TensorFlow

I'm trying to train a deep learning model in VS Code and would like to use the GPU for that. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, tensorflow-gpu==2.5.0 and pip 21.2.3 on Windows 10. The problem is that whenever I run this part of the code I get this error: Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,
    # output_dir="dev/",
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    # dataloader_num_workers=1,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)
I've also tested whether TensorFlow can access a GPU and whether TensorFlow was built with CUDA GPU support using tf.config.list_physical_devices('GPU') and tf.test.is_built_with_cuda(), and both of them return True. How do I solve this issue, and why am I getting this error? Any ideas?

The above error suggests that fp16=True/bf16=True is not accepted in non-GPU mode. CUDA 11.6 might be an issue here, as it has known stability issues with this setup.
Test with CUDA 11.2 and cuDNN 8.1. If that does not work, you can fall back to fp16=False.
Ref - https://www.tensorflow.org/install/source#gpu
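One quick check worth adding here (a minimal sketch, and it assumes the Hugging Face Trainer is running on its default PyTorch backend rather than on TensorFlow): verify that PyTorch itself can see a CUDA device, and only enable fp16 when it can. If torch.cuda.is_available() returns False even though TensorFlow reports a GPU, the installed PyTorch build has no CUDA support, which would explain the error.

import torch
from transformers import TrainingArguments

# fp16 mixed precision only works when PyTorch (not TensorFlow) can see a CUDA device
use_fp16 = torch.cuda.is_available()
print("CUDA visible to PyTorch:", use_fp16)

training_args = TrainingArguments(
    output_dir="dev/",   # placeholder path, adjust to your setup
    fp16=use_fp16,       # fall back to full precision when no CUDA device is visible
)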

Related

GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`

model_nbeats = NBEATSModel(
    input_chunk_length=30,
    output_chunk_length=7,
    generic_architecture=True,
    num_stacks=10,
    num_blocks=1,
    num_layers=4,
    layer_widths=512,
    n_epochs=100,
    nr_epochs_val_period=1,
    batch_size=800,
    model_name="nbeats_run",
)
model_nbeats.fit(train, val_series=val, verbose=True)
Hi,
While running the code above, "PossibleUserWarning: GPU available but not used. Set accelerator and devices using Trainer(accelerator='gpu', devices=1)." appears in the output cell of Google Colab. How can I fix it?
I have tried darts.nbeats to fit the training data over 100 epochs. I selected a GPU to accelerate the training process; however, the training was not accelerated and the aforementioned warning appeared.
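There is no answer in this thread, but here is a minimal sketch of the usual fix, assuming a darts version recent enough that its Torch-based models accept a pl_trainer_kwargs dict which is forwarded to the underlying PyTorch Lightning Trainer:

from darts.models import NBEATSModel

# pl_trainer_kwargs is assumed to exist in the installed darts version;
# it passes the accelerator settings on to the PyTorch Lightning Trainer.
model_nbeats = NBEATSModel(
    input_chunk_length=30,
    output_chunk_length=7,
    n_epochs=100,
    batch_size=800,
    model_name="nbeats_run",
    pl_trainer_kwargs={"accelerator": "gpu", "devices": 1},
)

# train and val are the same series used in the question's snippet
model_nbeats.fit(train, val_series=val, verbose=True)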

Order of CUDA devices [duplicate]

This question already has answers here:
How does CUDA assign device IDs to GPUs?
(closed as a duplicate)
I saw this solution, but it doesn't quite answer my question; it's also quite old so I'm not sure how relevant it is.
I keep getting conflicting outputs for the order of the GPU units. There are two of them: a Tesla K40 and an NVS 315 (a legacy device that is never used). When I run deviceQuery, I get
Device 0: "Tesla K40m"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Device 1: "NVS 315"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
On the other hand, nvidia-smi produces a different order:
0 NVS 315
1 Tesla K40m
Which I find very confusing. The solution I found for Tensorflow (and a similar one for Pytorch) is to use
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
The PCI bus ID is 4 for the Tesla and 3 for the NVS, so this should select the card on bus 3 (the NVS), is that right?
In PyTorch I set
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
...
device = torch.cuda.device(0)
print(torch.cuda.get_device_name(0))
and get Tesla K40m,
whereas when I set
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
device = torch.cuda.device(1)
print(torch.cuda.get_device_name(0))
I get
UserWarning:
Found GPU0 NVS 315 which is of cuda capability 2.1.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
NVS 315
So I'm quite confused: what's the true order of GPU devices that tf and pytorch use?
By default, CUDA orders the GPUs by computing power. GPU:0 will be the fastest GPU on your host, in your case the K40m.
If you set CUDA_DEVICE_ORDER='PCI_BUS_ID', then CUDA orders your GPUs by how they are physically set up in your machine, meaning that GPU:0 will be the GPU on the lowest PCI bus ID.
Both TensorFlow and PyTorch use the CUDA GPU order. That is consistent with what you showed:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
...
device = torch.cuda.device(0)
print(torch.cuda.get_device_name(0))
Default order, so GPU:0 is the K40m since it is the most powerful card on your host.
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES']='0'
...
device = torch.cuda.device(0)
print torch.cuda.get_device_name(0)
PCI-E lane order, so GPU:0 is the card with the lowest bus ID, in your case the NVS 315.
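For completeness, here is a small sketch showing how to compare the two orderings directly; it assumes both environment variables are set before torch is imported, since CUDA reads them when the context is initialized:

import os

# Must be set before CUDA is initialized, i.e. before importing torch
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # or "FASTEST_FIRST" (the default)
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"       # expose both cards

import torch

# Print every visible device in the order CUDA presents them
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))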

keras with tensorflow : a CUDA runtime call was likely performed without using a StreamExecutor context

I am using Keras with the TensorFlow backend, and the following is the problem. Can anyone help solve this problem? Thanks!
The error is caused by an illegal value of CNMEM. According to the Theano docs, CNMEM can only be assigned a float:
0: not enabled.
0 < N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).
> 1: use this number in megabytes (MB) of memory.
You can also refer to here.
The warning is due to a change in Theano (Keras's backend): it is moving from the old CUDA backend to the new GpuArray backend. You can refer to here for a solution.
Actually, if you fix the warning, the error will disappear as well, according to:
This value allocates GPU memory ONLY when using (CUDA backend) and has no effect when the GPU backend is (GpuArray Backend). For the new backend, please see config.gpuarray.preallocate
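As an illustration (a sketch only, assuming the new GpuArray backend of a recent Theano), memory preallocation would then be configured through gpuarray.preallocate instead of lib.cnmem, for example in .theanorc:

[global]
floatX = float32
device = cuda0

[gpuarray]
# Replaces lib.cnmem under the GpuArray backend; fraction of GPU memory to preallocate
preallocate = 0.8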

Using multiple gpus on windows using theano,keras

I am a beginner in deep learning/Theano/Keras. I'm trying to figure out how to use multiple GPUs on Windows 7. I've had success installing Theano and Keras (as described in this post: How do I install Keras and Theano in Anaconda Python on Windows?) and using one GPU. I want to use both my GPUs.
Following are the details of the configs and versions:
Python - 2.7 (Anaconda 4.3.14, Windows 64-bit)
CUDA - 7.5.17
Theano - 0.9.0rc3
Keras - 1.2.2
pycuda - 2016.1.2+cuda7518
GPU - GeForce GTX 480 (2 of them)
Theano configuration is as below
.theanorc.txt
[global]
floatX = float32
device = gpu
[nvcc]
flags=-LC:\ProgramData\Anaconda2\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
[lib]
cnmem=0.8
Currently I'm able to use only one GPU, and I get the memory error below when I try to fit the model:
MemoryError: ('Error allocating 411041792 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).', "you might consider using 'theano.shared(..., borrow=True)'")
Would using 2 GPUs solve the problem (if yes, how do I enable the second one?), or is my model too big?
Thank you.
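No answer is reproduced for this one, so just a hedged sketch (an assumption about Theano 0.9 with the legacy backend shown above, not anything confirmed in this thread): Theano drives a single GPU per process, so the usual way to use the second GTX 480 is to point a separate process at it via device = gpu1 rather than splitting one fit() call across both cards. Also, CNMEM_STATUS_OUT_OF_MEMORY usually just means the model plus batch does not fit in the roughly 1.5 GB of a GTX 480, so reducing the batch size is often the more direct fix than adding a second GPU.

[global]
floatX = float32
# gpu0 is the first GTX 480, gpu1 the second; a Theano process
# uses only one device with this legacy backend
device = gpu1

[lib]
cnmem = 0.8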

Tensorflow: dynamically call GPUs with enough free memory

My desktop has two GPUs which can run TensorFlow with the specification /gpu:0 or /gpu:1. However, if I don't specify which GPU to run the code on, TensorFlow will by default call /gpu:0, as we all know.
Now I would like to set up the system so that it assigns a GPU dynamically according to the free memory of each GPU. For example, if a script doesn't specify which GPU to run the code on, the system first assigns /gpu:0 to it; then if another script runs, it checks whether /gpu:0 has enough free memory. If yes, it continues to assign /gpu:0 to it, otherwise it assigns /gpu:1 to it. How can I achieve this?
Follow-up:
I believe the question above may be related to GPU virtualization. That is to say, if I could virtualize multiple GPUs in a desktop into one GPU, I would get what I want. So besides any setup methods for TensorFlow, any ideas about virtualization are also welcome.
TensorFlow generally assumes it is not sharing the GPU with anyone, so I don't see a way of doing this from inside TensorFlow. However, you could do it from outside: a shell script that calls nvidia-smi, parses out the GPU k with the most free memory, sets "CUDA_VISIBLE_DEVICES=k", and then calls the TensorFlow script.
Inspired by:
How to set specific gpu in tensorflow?
import os
import subprocess as sp

def leave_gpu_with_most_free_ram():
    try:
        # Query the free memory of every GPU; drop the CSV header and trailing blank line
        command = "nvidia-smi --query-gpu=memory.free --format=csv"
        memory_free_info = sp.check_output(command.split()).decode('ascii').split('\n')[1:-1]
        memory_free_values = [int(x.split()[0]) for x in memory_free_info]
        # Pick the GPU with the most free memory
        least_busy_idx = memory_free_values.index(max(memory_free_values))
        # Mask all other GPUs via the CUDA environment variable
        setting = str(least_busy_idx)
        os.environ["CUDA_VISIBLE_DEVICES"] = setting
        print('Left GPU [%s] unmasked (free memory per GPU: %s MiB)'
              % (setting, memory_free_values))
    except FileNotFoundError as e:
        print('"nvidia-smi" is probably not installed. GPUs are not masked')
        print(e)
    except sp.CalledProcessError as e:
        print("Error on GPU masking:\n", e.output)
Add a call to this function before importing TensorFlow.
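For example (a minimal usage sketch; leave_gpu_with_most_free_ram is the helper defined above, not a library function, and the TF 2.x API is shown):

leave_gpu_with_most_free_ram()   # sets CUDA_VISIBLE_DEVICES before CUDA is initialized

import tensorflow as tf          # TensorFlow now sees only the least busy GPU
print(tf.config.list_physical_devices('GPU'))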