multiple GPUs keras weird speedup - tensorflow

I implemented code similar to the multi-GPU code from Keras (multiGPU tutorial). When running it on a server with 2 GPUs, I get the following training times per epoch:
showing Keras only one GPU and setting the variable gpus = 1 (use only one GPU): one epoch = 32 s
showing Keras two GPUs, gpus = 1: one epoch = 31 s
showing Keras two GPUs, gpus = 2: one epoch = 37 s
The output also looks a bit strange: while initializing, the code seems to create multiple TensorFlow devices per GPU. I'm not sure whether this is the correct behavior, but most other examples I have seen had just one such line per GPU.
first test (one GPU shown, gpus = 1):
2017-12-04 14:54:04.071549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:82:00.0
Total memory: 15.93GiB
Free memory: 15.64GiB
2017-12-04 14:54:04.071597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-04 14:54:04.071605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-04 14:54:04.071619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:54:21.531654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
second test (2 GPUs shown, gpus = 1):
2017-12-04 14:48:24.881733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
...(same as earlier)
2017-12-04 14:48:24.882924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:48:24.882931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:83:00.0)
2017-12-04 14:48:42.353807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:48:42.353851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:83:00.0)
and, weirdly, for the third test (2 GPUs shown, gpus = 2):
2017-12-04 14:41:35.906828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
...(same as earlier)
2017-12-04 14:41:35.907996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:41:35.908002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:83:00.0)
2017-12-04 14:41:52.944335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:41:52.944377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:83:00.0)
2017-12-04 14:41:53.709812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0)
2017-12-04 14:41:53.709838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:83:00.0)
the code:
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

LSTM = keras.layers.CuDNNLSTM

model = Sequential()
model.add(LSTM(knots, input_shape=(timesteps, X_train.shape[-1]), return_sequences=True))
model.add(LSTM(knots))
model.add(Dense(3, activation='softmax'))

if gpus >= 2:
    model_basic = model
    with tf.device("/cpu:0"):
        model = model_basic
    parallel_model = multi_gpu_model(model, gpus=gpus)
    model = parallel_model

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
hist = model.fit(myParameter)  # myParameter stands in for the actual fit() arguments
Is this typical behavior? What is wrong with my code that multiple devices are created per GPU? Thanks in advance.

I tried the exact code of the multiGPU tutorial.
It looks like this is indeed the expected output, but to see the expected speed difference I had to increase the number of samples to 20000 and set the height and width to 100 (due to RAM limits).
I'm not completely sure why I didn't see a speedup with two GPUs in my case. I suspect it is limited by memory speed: my batch size is rather small and each sample is also small, so managing the data takes more time than the actual computation.
Distributing the data becomes even more time consuming with 2 GPUs, while the actual runtime on each GPU decreases.
This could be confirmed by checking the utilization of the graphics cards; sadly, I don't know how to do this.
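(For what it's worth, one common way to check this is NVIDIA's nvidia-smi utility. A minimal sketch, assuming nvidia-smi is installed and on the PATH:)

import subprocess

# Ask nvidia-smi for per-GPU utilization and memory use; run this in a loop
# (or `watch nvidia-smi` in a shell) while training to see how busy the GPUs are.
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,utilization.gpu,memory.used",
     "--format=csv"])
print(out.decode())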
If anyone has other ideas on this, let me know. Thanks

Related

Converting TensorFlow session-based code to tf.distribute.MirroredStrategy

I'm pretty new to TensorFlow and I'll be the first to admit I'm a bit confused and turned around and might very well be barking up the wrong tree.
First: This is NOT a question about getting my GPUs working and seen by TensorFlow (TF); I have verified from inside the container that the GPUs are detected by TF (using tensorflow/tensorflow:1.13.1-gpu-py3).
2020-02-20 22:24:25.233916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 22:24:25.233933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 4 5
2020-02-20 22:24:25.233939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y Y Y Y Y
2020-02-20 22:24:25.233943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N Y Y Y Y
2020-02-20 22:24:25.233947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: Y Y N Y Y Y
2020-02-20 22:24:25.233950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: Y Y Y N Y Y
2020-02-20 22:24:25.233954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 4: Y Y Y Y N Y
2020-02-20 22:24:25.233958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 5: Y Y Y Y Y N
2020-02-20 22:24:25.234135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7623 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7624 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1070, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 7624 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1070, pci bus id: 0000:04:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 7624 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1070, pci bus id: 0000:05:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 7624 MB memory) -> physical GPU (device: 4, name: GeForce GTX 1070, pci bus id: 0000:07:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 7624 MB memory) -> physical GPU (device: 5, name: GeForce GTX 1070, pci bus id: 0000:08:00.0, compute capability: 6.1)
Second: The code as-is does work; I have successfully run the training, but of course it only used the first GPU.
The code I'm using is a GAN project and uses a 'with' block for training:
with tf.Session() as session:
    # Time stamp
    localtime = time.asctime(time.localtime(time.time()))
    print("Starting TensorFlow session...")
    print("Local current time :", localtime)
    # Start TensorFlow session...
    session.run(tf.global_variables_initializer())
.
.
.
I've been going in circles (and crazy) trying to figure out how to use the recommended tf.distribute.MirroredStrategy() to do parallel training across my GPUs. Everything I've come across so far leads in circles or stops short of applicable examples.
Is there a straightforward way to modify the session code to use the mirrored strategy? Is there just a more basic way to get the session calls to train across multiple GPUs?
It's doable, but the short answer is no: there's no straightforward way.
Everything I've been able to find requires a good understanding of TensorFlow and a fair amount of work to port a 'stock' TensorFlow session over to multiple GPUs. It requires running multiple sessions with assigned GPUs and figuring out how to coordinate the training data between them.
Switching to the newer strategy paradigm (or Keras multi-GPU) requires figuring out how to express the learning model in layers rather than sessions; again, something that requires a pretty solid handle on TF.
If you're starting from scratch, I'd say look into Keras or the mirrored strategy from the beginning.
Managing sessions and coordinating data is a bit of a headache (that's why there are nice wrappers now), but it has been done a lot.
If you're a beginner like me, let me save you some frustration: it's a big task, so either go a different way or buckle up. There's no easy answer.
https://jhui.github.io/2017/03/07/TensorFlow-GPU/
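For reference, here is a minimal sketch of the strategy-based route recommended above, using tf.distribute.MirroredStrategy with a Keras model. This assumes TF 1.14+/2.x and uses a toy model with dummy data, not the GAN from the question:

import numpy as np
import tensorflow as tf

# Replicate the model on every visible GPU; gradients are aggregated automatically.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Any Keras model built inside the scope is mirrored across the GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Dummy data just to make the sketch runnable; fit() splits each batch across replicas.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, epochs=2, batch_size=64)

The key point is that the model has to be expressed as Keras layers (or an Estimator) rather than a hand-written session loop, which is exactly the porting work described above.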

(tensorflow) Am I using two GPUs in parallel correctly?

(I'm sorry if this question is too novice, but since I don't quite understand and want to double-check whether I am using two GPUs in parallel correctly, I ask the following question.)
Two GPUs (of the same model) are installed in the PC I am using. In one PyCharm project, I run a TensorFlow script that sets
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
which then starts with the following run log:
Using TensorFlow backend.
2018-09-15 03:36:36.727152: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-15 03:36:37.080157: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:17:00.0
totalMemory: 11.00GiB freeMemory: 9.08GiB
2018-09-15 03:36:37.080671: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-15 03:36:37.796088: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-15 03:36:37.796320: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0
2018-09-15 03:36:37.796469: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0: N
2018-09-15 03:36:37.796723: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8783 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
Then, in another PyCharm project, I run a TensorFlow script that sets
os.environ["CUDA_VISIBLE_DEVICES"] = '1'
which then shows the following run log:
Using TensorFlow backend.
2018-09-15 03:37:00.119630: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-15 03:37:00.468546: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:65:00.0
totalMemory: 11.00GiB freeMemory: 9.08GiB
2018-09-15 03:37:00.468930: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-15 03:37:01.199726: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-15 03:37:01.199950: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0
2018-09-15 03:37:01.200096: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0: N
2018-09-15 03:37:01.200349: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8783 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
What worries me is that both of them show device 0, although their pciBusIDs are different.
So my simple question is: am I using two GPUs in parallel correctly?
Since I am using Windows 10, I monitored GPU usage with Device Manager, and it looks correct to me, but I just want to hear it from experts.
And if you don't mind answering: what is a PCI bus ID, roughly, and why do both of them show device 0?
No, you have to add
with tf.device("your_device_name"):
Just follow this tutorial, section "Using Multiple GPUs": https://www.tensorflow.org/guide/using_gpu
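As an illustration of what that looks like in practice, a minimal sketch of explicit device placement in TF 1.x graph mode (the constants are just dummy values; allow_soft_placement is set in case one of the devices is missing):

import tensorflow as tf

# Pin one part of the graph to each GPU explicitly.
with tf.device("/device:GPU:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

with tf.device("/device:GPU:1"):
    c = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    d = tf.matmul(c, c)

# log_device_placement prints which device each op actually ran on;
# allow_soft_placement falls back to another device if one is unavailable.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run([b, d]))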

Which GPU is used when 2 GPUs are available but neither is specifically selected?

I have two GPUs installed in my PC, to be used in parallel (without SLI or the like). Suppose I run a simple TensorFlow program, like the linear regression in this. Which GPU is used? Are both of them used? Here is the run log.
2018-09-15 02:55:36.314345: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-15 02:55:36.675657: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:17:00.0
totalMemory: 11.00GiB freeMemory: 9.08GiB
2018-09-15 02:55:36.798520: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:65:00.0
totalMemory: 11.00GiB freeMemory: 9.08GiB
2018-09-15 02:55:36.799044: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0, 1
2018-09-15 02:55:38.234984: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-15 02:55:38.235236: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0 1
2018-09-15 02:55:38.235392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0: N N
2018-09-15 02:55:38.235559: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 1: N N
2018-09-15 02:55:38.235849: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8783 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2018-09-15 02:55:38.601267: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8783 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
TensorFlow's default is to consume the memory of all visible GPUs, but unless you explicitly code for multiple GPUs, only the first of the two will be used for computation.
You would typically set the environment variable export CUDA_VISIBLE_DEVICES=0 prior to running Python to limit TensorFlow to only seeing gpu0, for example (0 = gpu0, 1 = gpu1, etc., -1 = CPU only).
Using both GPUs for computation requires that you code for multiple GPUs (and make decisions about what that means for your model). There are many tutorials on the topic; here's one quick one I pulled up: http://blog.s-schoener.com/2017-12-15-parallel-tensorflow-intro/
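As a small illustration of the visibility point, a sketch assuming the variable is set from Python before TensorFlow is imported:

import os

# Restrict TensorFlow to GPU 0 only; this must happen before TensorFlow is imported.
# "1" would expose only the second card, "-1" hides all GPUs (CPU only).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# Without explicit tf.device() placement, this op lands on the first visible GPU.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))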

How can I make TensorFlow's train.py use all the available GPUs?

I am running TensorFlow 1.7 on my local machine, which contains 2 GPUs of around 8 GB each.
Training object detection (train.py) works when I use the model 'faster_rcnn_resnet101_coco', but when I try to run 'faster_rcnn_nas_coco' it shows a 'Resource exhausted' error:
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py:736: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-05-02 16:14:53.963966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0, 1
2018-05-02 16:14:53.964071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-02 16:14:53.964083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 1
2018-05-02 16:14:53.964091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N Y
2018-05-02 16:14:53.964097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 1: Y N
2018-05-02 16:14:53.964566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7385 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-05-02 16:14:53.966360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7552 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1070, pci bus id: 0000:03:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from training/model.ckpt-0
INFO:tensorflow:Restoring parameters from training/model.ckpt-0
Limit: 7744048333
InUse: 7699536896
MaxInUse: 7699551744
NumAllocs: 10260
MaxAllocSize: 4076716032
2018-05-02 16:16:52.223943: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ***********************************************************************************x****************
2018-05-02 16:16:52.223967: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at depthwise_conv_op.cc:358 : Resource exhausted: OOM when allocating tensor with shape[64,672,9,9] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
I am not sure whether it is using both GPUs, because the memory in use shows as '7699536896'. After going through train.py, I also tried:
python train.py \
--logtostderr \
--worker_replicas=2 \
--pipeline_config_path=training/faster_rcnn_resnet101_coco.config \
--train_dir=training
If 2 GPUs are available, does TensorFlow choose both of them by default, or does it need any arguments?
We use the number of GPUs specified by worker_replicas. For the NASNet case, try decreasing the batch size to make the network fit on the GPU.
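(For reference, the batch size for the Object Detection API is set in the train_config block of the pipeline config file; a sketch of the relevant fragment, with an illustrative value:)

train_config {
  batch_size: 1   # lower this if training stops with "Resource exhausted" / OOM errors
  ...
}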

GPU->CPU Memcpy failed in TensorFlow word2vec on GPU

I am studying TensorFlow's word2vec.
We bought two 1080 Ti cards for parallel GPU processing.
Installation was successful and P2P was successful.
However, when I try to assign work to a GPU using with tf.device('/gpu:0'),
the following error occurs:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:66:00.0
Total memory: 10.91GiB
Free memory: 10.21GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:66:00.0)
I word2vec_kernels.cc:246] Data file: data/spouse_freebase/input2.nt contains 34966827 bytes, 2620786 words, 11769 unique words, 11769 unique frequent words.
E tensorflow/stream_executor/cuda/cuda_driver.cc:1276] failed to enqueue async memcpy from device to host: CUDA_ERROR_INVALID_VALUE; host dst: 0x104d5000000; GPU src: 0x7f12c800cbc0; size: 8=0x8
I tensorflow/stream_executor/stream.cc:1338] stream 0x39c2160 did not wait for stream: 0x39bf9a0
I tensorflow/stream_executor/stream.cc:3775] stream 0x39c2160 did not memcpy device-to-host; source: 0x3bd0d00
F tensorflow/core/common_runtime/gpu/gpu_util.cc:296] GPU->CPU Memcpy failed
I think this error is caused by the GPU running out of memory.
I await your help.
Thank you.
I had the same issue. I've just switched off G-SYNC support in Nvidia settings and it helped.