i'm trying to train a deep learning model on vs code so i would like to use the GPU for that. I have cuda 11.6 , nvidia GeForce GTX 1650, TensorFlow-gpu==2.5.0 and pip version 21.2.3 for windows 10. The problem is whenever i run this part of code i've got this error : Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices.
from transformers import TrainingArguments
training_args = TrainingArguments(
output_dir=new_output_models_dir,
#output_dir="dev/",
group_by_length=True,
per_device_train_batch_size=16,
gradient_accumulation_steps=2,
#dataloader_num_workers = 1,
dataloader_num_workers = 0,
evaluation_strategy="steps",
num_train_epochs=40,
fp16=True,
save_steps=400,
eval_steps=400,
logging_steps=400,
learning_rate=1e-4,
warmup_steps=500,
save_total_limit=2,
)
I've also tested whether tensorflow can access a gpu and whether tensorflow was built with cuda gpu support using tf.config.list_physical_devices('GPU') and tf.test.is_built_with_cuda() and both of them return TRUE . How to slove this issue ? and why i'm getting this error ? Any ideas !
The above error suggests that it does not accept fp16=True/bf16=True in non-GPU mode. Perhaps Cuda 11.6 might be an issue here which has stability issues.
Test with Cuda 11.2 and CudNN 8.1 . If that does not work you can go with fp16=False parametre.
Ref - https://www.tensorflow.org/install/source#gpu
Related
model_nbeats = NBEATSModel(
input_chunk_length=30,
output_chunk_length=7,
generic_architecture=True,
num_stacks=10,
num_blocks=1,
num_layers=4,
layer_widths=512,
n_epochs=100,
nr_epochs_val_period=1,
batch_size=800,
model_name="nbeats_run",
)
model_nbeats.fit(train, val_series=val, verbose=True)
Hi,
While running the code above, "PossibleUserWarning: GPU available but not used. Set accelerator and devices using Trainer(accelerator='gpu', devices=1)." is appearing in the output cell of Google Colab. How can I fix it?
I have tried darts.nbeats to fit training data in 100 epochs. I select GPU to accelerate the training process. However, the training was not accelerated and the afferomentioned possiblewarning was appeared.
This question already has answers here:
How does CUDA assign device IDs to GPUs?
(4 answers)
Closed 4 years ago.
I saw this solution, but it doesn't quite answer my question; it's also quite old so I'm not sure how relevant it is.
I keep getting conflicting outputs for the order of GPU units. There are two of them: Tesla K40 and NVS315 (legacy device that is never used). When I run deviceQuery, I get
Device 0: "Tesla K40m"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Device 1: "NVS 315"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
On the other hand, nvidia-smi produces a different order:
0 NVS 315
1 Tesla K40m
Which I find very confusing. The solution I found for Tensorflow (and a similar one for Pytorch) is to use
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
PCI Bus ID is 4 for Tesla and 3 for NVS, so it should set it to 3 (NVS), is that right?
In pytorch I set
os.environ['CUDA_VISIBLE_DEVICES']='0'
...
device = torch.cuda.device(0)
print torch.cuda.get_device_name(0)
to get Tesla K40m
when I set instead
os.environ['CUDA_VISIBLE_DEVICES']='1'
device = torch.cuda.device(1)
print torch.cuda.get_device_name(0)
to get
UserWarning:
Found GPU0 NVS 315 which is of cuda capability 2.1.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
NVS 315
So I'm quite confused: what's the true order of GPU devices that tf and pytorch use?
By default, CUDA orders the GPUs by computing power. GPU:0 will be the fastest GPU on your host, in your case the K40m.
If you set CUDA_DEVICE_ORDER='PCI_BUS_ID' then CUDA orders your GPU depending on how you set up your machine meaning that GPU:0 will be the GPU on your first PCI-E lane.
Both Tensorflow and PyTorch use the CUDA GPU order. That is consistent with what you showed:
os.environ['CUDA_VISIBLE_DEVICES']='0'
...
device = torch.cuda.device(0)
print torch.cuda.get_device_name(0)
Default order so GPU:0 is the K40m since it is the most powerful card on your host.
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES']='0'
...
device = torch.cuda.device(0)
print torch.cuda.get_device_name(0)
PCI-E lane order, so GPU:0 is the card with the lowest bus-id in your case the NVS.
I am using kears with tensorflow backend, and following is the problem. Is there any can solve this problem, thanks!
The error is caused by a illegal value of CNMEM. According to theano doc, CNMEM can only be assigned as a float.
0: not enabled.
0 < N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).
1: use this number in megabytes (MB) of memory.
You can also refer to here.
The warning is due to a change in Theano (Kera's backend). It will change from CUDA to GpuArray. You can refer to here for a solution.
Actually if you fix the warning, the error will disappear as well according to:
This value allocates GPU memory ONLY when using (CUDA backend) and has no effect when the GPU backend is (GpuArray Backend). For the new backend, please see config.gpuarray.preallocate
I am a beginner in deep learning/theano/keras.I'm trying to figure out how to use multiple gpus on windows 7. I've had success installing Theano,keras(as described in this post How do I install Keras and Theano in Anaconda Python on Windows?) and using one gpu. I want to use both my gpus
Following are the details of configs and versions
Python - 2.7(Anaconda-4.3.14,Windows-64bit)
,CUDA - 7.5.17
,Theano - 0.9.0rc3
,keras - 1.2.2
,pycuda - 2016.1.2+cuda7518
,gpu - Geforce GTX 480(2 of them)
Theano configuration is as below
.theanorc.txt
[global]
floatX = float32
device = gpu
[nvcc]
flags=-LC:\ProgramData\Anaconda2\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
[lib]
cnmem=0.8
Currently I'm able to use only one GPU and I am getting memory error as below when I try to fit the model
MemoryError: ('Error allocating 411041792 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).', "you might consider using 'theano.shared(..., borrow=True)'")
Does using 2 gpus solve the problem(if yes, how do I enable the second one?)
or is my model too big ?
Thank You
My desktop has two gpus which can run Tensorflow with specification /gpu:0 or /gpu:1. However, if I don't specify which gpu to run the code, Tensorflow will by default to call /gpu:0, as we all know.
Now I would like to setup the system such that it can assign gpu dynamically according to the free memory of each gpu. For example, if a script doesn't specify which gpu to run the code, the system first assigns /gpu:0 for it; then if another script runs now, it will check whether /gpu:0 has enough free memory. If yes, it will continue assign /gpu:0 to it, otherwise it will assign /gpu:1 to it. How can I achieve it?
Follow-ups:
I believe the question above may be related to the virtualization problem of GPU. That is to say, if I can virtualize multi-gpu in a desktop into one GPU, I can get what I want. So beside any setup methods for Tensorflow, any ideas about virtualization is also welcome.
TensorFlow generally assumes it's not sharing GPU with anyone, so I don't see a way of doing it from inside TensorFlow. However, you could do it from outside as follows -- shell script that calls nvidia-smi, parses out GPU k with more memory, then sets "CUDA_VISIBLE_DEVICES=k" and calls TensorFlow script
Inspired by:
How to set specific gpu in tensorflow?
def leave_gpu_with_most_free_ram():
try:
command = "nvidia-smi --query-gpu=memory.free --format=csv"
memory_free_info = _output_to_list(sp.check_output(command.split()))[1:]
memory_free_values = [int(x.split()[0]) for i, x in enumerate(memory_free_info)]
least_busy_idx = memory_free_values.index(max(memory_free_values))
# update CUDA variable
gpus =[least_busy_idx]
setting = ','.join(map(str, gpus))
os.environ["CUDA_VISIBLE_DEVICES"] = setting
print('Left next %d GPU(s) unmasked: [%s] (from %s available)'
% (leave_unmasked, setting, str(available_gpus)))
except FileNotFoundError as e:
print('"nvidia-smi" is probably not installed. GPUs are not masked')
print(e)
except sp.CalledProcessError as e:
print("Error on GPU masking:\n", e.output)
Add a call to this function before importing tensorflow