How to disable GPUs in H2O AutoML

When I run an experiment with H2O AutoML, I get the error: "terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: invalid resource handle". This error message comes from XGBoost and is caused by exceeding the GPU limit.
When I use regular XGBoost, I set the CUDA visible devices variable to blank to disable GPUs. However, this setting seems to be ignored by the XGBoost implementation in H2O AutoML.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Currently, XGBoost is the only algorithm in H2O AutoML that runs on the GPU.
Does anybody know how to disable GPUs in H2O AutoML?
As a workaround, I have excluded the XGBoost algorithm so my experiment can run for now. The problem goes away when I exclude XGBoost, but I do not want to give up the power of XGBoost.
from h2o.automl import H2OAutoML
model = H2OAutoML(max_runtime_secs = 60*60*2, exclude_algos = ["XGBoost"])

That's definitely an oversight and we will need to add the ability to turn on/off and/or specify the GPU. I opened a ticket for this. I wonder if there's a way to temporarily disable the GPU at the system level (outside of H2O/Python) in the meantime? Thanks for the report!
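In the meantime, here is a hedged sketch of the system-level idea: hide the GPUs from the whole process before the H2O cluster is started, so the backend JVM inherits an empty CUDA device list. This assumes h2o.init() launches a local cluster from the same Python process, and that H2O's XGBoost falls back to CPU when no CUDA device is visible, which is not guaranteed.
import os
# Hide all CUDA devices before the H2O JVM starts; child processes inherit
# this environment. (Assumption: H2O's XGBoost then stays on the CPU.)
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts the local H2O cluster with the modified environment
model = H2OAutoML(max_runtime_secs=60 * 60 * 2)  # XGBoost no longer excluded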

Related

Google Colab - Your session crashed for an unknown reason

I get "Your session crashed for an unknown reason" when I run the following cell in Google Colab:
from keras import backend as K

if 'tensorflow' == K.backend():
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = "0"
    set_session(tf.Session(config=config))
I have been receiving this message since I uploaded two data sets to Google Drive.
Does anyone recognize this message and can give me some advice?
Many thanks for every hint.
Update: I still always receive the message.
Second update: I have removed the data sets from Google Drive, but the session is still crashing.
Google Colab is crashing because you are trying to run GPU-related code while the runtime is set to CPU.
The execution succeeds if you change the runtime to GPU. The steps are:
Runtime -> Change runtime type -> GPU (select from the dropdown).
Please find the working code in the GitHub Gist.
Just a side note: sometimes you may want to reinstall a slightly older version of the related module (check the error log to see which one). That worked for me in one case.
This error happens when the expected device and the actual device are different.
For example, if you run code written with torch_xla (which is for TPU training) on a GPU (CUDA) runtime, Colab will return this error.
It is really tricky because it does not give you an actual debugging message, which makes it hard to find the real problem.
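A quick, hedged way to check whether the Colab runtime actually exposes the device you expect (TF 1.x API, matching the snippet in the question):
import tensorflow as tf

# Returns something like '/device:GPU:0' on a GPU runtime, '' otherwise.
gpu_name = tf.test.gpu_device_name()
if gpu_name:
    print("GPU runtime detected:", gpu_name)
else:
    print("No GPU visible - change the runtime type or use the CPU code path.")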

Can we run training and validation on separate GPUs using tensorflow object detection API running on tensorflow 1.12?

I have two Nvidia Titan X cards on my machine and want to fine-tune a COCO-pretrained Inception V2 model on a single specific class. I have created the train/val tfrecords and changed the config to run the TensorFlow object detection training pipeline.
I am able to start the training, but it hangs (without any OOM) whenever it tries to evaluate a checkpoint. Currently it is using only GPU 0, with the other resources (RAM, CPU, I/O, etc.) in the normal range, so I am guessing that the GPU is the bottleneck. I want to try splitting training and validation across separate GPUs and see if that works.
I looked for a place where I could set "CUDA_VISIBLE_DEVICES" differently for the two processes, but unfortunately the latest TensorFlow object detection API code (using TensorFlow 1.12) makes that very difficult. I am also unable to verify my assumption that training and validation run in the same process, as my machine hangs. Could someone please suggest where to look to solve this?
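Not an answer from the Object Detection API itself, but a hedged sketch of the per-process CUDA_VISIBLE_DEVICES idea: launch training and evaluation as two separate processes, each seeing only one GPU. The script name and flags (model_main.py, --pipeline_config_path, --model_dir, --checkpoint_dir) follow the TF 1.x object detection conventions and may differ in your checkout.
import os
import subprocess

def launch(gpu_id, extra_args):
    # Each child process sees only the GPU named in CUDA_VISIBLE_DEVICES.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = ["python", "model_main.py",
           "--pipeline_config_path=pipeline.config",
           "--model_dir=train_dir"] + extra_args
    return subprocess.Popen(cmd, env=env)

train_proc = launch(0, [])                             # training on GPU 0
eval_proc = launch(1, ["--checkpoint_dir=train_dir"])  # evaluation on GPU 1
train_proc.wait()
eval_proc.wait()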

Data generator with TensorFlow on the GPU

I am building a neural network with TensorFlow and ran into a problem using a generator to split my data up: basically, it's too slow.
My training data consists of 52x52 numpy arrays. I need to turn each array into a 52x52x3 array before feeding it into my network. As mentioned, I have a working generator that does this, but I noticed that even though my network runs on the GPU, GPU usage is very low (usually under 10%). I think this might be because the generator runs on the CPU.
Is there any way of running my generator on the GPU?
What I tried:
- I thought about using PyCUDA to run the generator on the GPU, but found that TensorFlow and PyCUDA don't support each other.
- I tried the from_generator function from the Dataset API, as described here:
https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset
While running into issues with it, I found this GitHub thread, which mentions that this function isn't supported on the GPU anyway:
https://github.com/tensorflow/tensorflow/issues/13610
Any help would be greatly appreciated.
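One direction to try, as a hedged sketch: drop the Python generator entirely and express the per-sample transform with tf.data ops, so TensorFlow can parallelize and prefetch the input pipeline instead of waiting on Python. This assumes the 52x52x3 array is simply the 52x52 array stacked across 3 channels, which the question does not actually specify.
import numpy as np
import tensorflow as tf

# Placeholder training data standing in for the real 52x52 arrays.
train_arrays = np.random.rand(1000, 52, 52).astype(np.float32)

def to_three_channels(x):
    # (52, 52) -> (52, 52, 3): assumption - same plane repeated on 3 channels.
    return tf.stack([x, x, x], axis=-1)

dataset = (tf.data.Dataset.from_tensor_slices(train_arrays)
           .map(to_three_channels, num_parallel_calls=4)
           .batch(32)
           .prefetch(1))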

Keras + scikit-learn wrapper appears to hang when GridSearchCV runs with n_jobs > 1

UPDATE: I had to rewrite this question because, after some investigation, I realised this is a different problem.
Context: running Keras in a grid search setting using the KerasClassifier wrapper with scikit-learn. System: Ubuntu 16.04; libraries: Anaconda distribution 5.1, Keras 2.0.9, scikit-learn 0.19.1, TensorFlow 1.3.0 or Theano 0.9.0, using CPUs only.
Code:
I simply used the code here for testing: https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/, the second example 'Grid Search Deep Learning Model Parameters'. Pay attention to line 35, which reads:
grid = GridSearchCV(estimator=model, param_grid=param_grid)
Symptoms: when grid search uses more than 1 job (meaning CPUs?), e.g. setting 'n_jobs' on the line above to '2', like the line below:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2)
will cause the code to hang indefinitely, with either TensorFlow or Theano, and with no CPU usage (see the attached screenshot, where 5 Python processes were created but none is using the CPU).
Debugging suggests that the following line in 'sklearn.model_selection._search' is what causes the problem:
line 648: for parameters, (train, test) in product(candidate_params, cv.split(X, y, groups)))
The program hangs on this line and cannot continue.
I would really appreciate some insights as to what this means and why this could happen.
Thanks in advance
Are you using a GPU? If so, you can't have multiple threads running each variation of the params because they won't be able to share the GPU.
Here's a full example on how to use keras, sklearn wrappers in a Pipeline with GridsearchCV: Pipeline with a Keras Model
If you really want to have multiple jobs in the GridSearchCV, you can try to limit the GPU fraction used by each job (e.g. if each job only allocates 0.5 of the available GPU memory, you can run 2 jobs simultaneously)
See these issues:
Limit the resource usage for tensorflow backend
GPU memory fraction does not work in keras 2.0.9 but it works in 2.0.8
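A hedged sketch of the "limit the GPU fraction per job" idea, using the TF 1.x / Keras 2.x APIs that match the versions in the question; call it inside the model creation function you pass to KerasClassifier so each worker only claims part of the GPU memory:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

def limit_gpu_memory(fraction=0.5):
    # Each process only allocates `fraction` of the GPU memory, so two
    # GridSearchCV jobs can share one card.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    set_session(tf.Session(config=config))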
I dealt with this problem too, and it really slowed me down to not be able to run what is essentially trivially parallelizable code. The issue is indeed with the TensorFlow session: if a session is created in the parent process before GridSearchCV.fit(), it will hang!
The solution for me was to keep all session/graph creation code restricted to the KerasClassifier class and the model creation function I passed to it.
Also, what Felipe said about the memory is true: you will want to restrict TF's memory usage in either the model creation function or a subclass of KerasClassifier.
Related info:
Session hang issue with python multiprocessing
Keras + Tensorflow and Multiprocessing in Python
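As a hedged illustration of keeping all session/graph creation inside the function passed to KerasClassifier (loosely following the tutorial linked in the question; the layer sizes and parameter grid are just placeholders): nothing touches TensorFlow in the parent process before GridSearchCV.fit(), so the workers do not inherit a half-initialized session.
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(optimizer='adam'):
    # All graph/session work happens here, inside each worker process.
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, epochs=10, batch_size=10, verbose=0)
grid = GridSearchCV(estimator=model,
                    param_grid={'optimizer': ['adam', 'rmsprop']},
                    n_jobs=2)
# grid.fit(X, y)  # X, y: your training data (e.g. 8-feature tabular data)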
TL;DR Answer: You can't because your Keras model can't be serialized, and serialization is needed for parallelizing in Python with joblib.
This problem is much detailed here: https://www.neuraxle.org/stable/scikit-learn_problems_solutions.html#problem-you-can-t-parallelize-nor-save-pipelines-using-steps-that-can-t-be-serialized-as-is-by-joblib
The solution to parallelize your code is to make your Keras estimator serializable. This can be done using savers as described at the link above.
If you're lucky enough to be using TensorFlow v2's prebuilt Keras module, the following practical code sample will prove useful to you, as you'd practically just need to take the code and adapt it to yours:
https://github.com/guillaume-chevalier/seq2seq-signal-prediction
In this example, all the saving and loading code is pre-written for you using Neuraxle-TensorFlow, which makes it parallelizable if you use Neuraxle's AutoML methods (e.g. Neuraxle's grid search and its own parallelism utilities).

Does tensorflow automatically detect GPU or do I have to specify it manually?

I have code written in TensorFlow that I run on CPUs, and it runs fine.
I am moving to a new machine that has GPUs. I ran the code on the new machine, but the training speed did not improve as expected (it takes almost the same time).
I understand that TensorFlow automatically detects GPUs and runs the operations on them (https://www.quora.com/How-do-I-automatically-put-all-my-computation-in-a-GPU-in-TensorFlow) & (https://www.tensorflow.org/tutorials/using_gpu).
Do I have to change the code to manually run the operations on GPUs (for now I have a single GPU)? And what would be gained by doing that manually?
Thanks
If the GPU version of TensorFlow is installed and if you don't assign all your tensors to CPU, some of them should be assigned to GPU.
To find out which devices (CPU, GPU) are available to TensorFlow, you can use this:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Regarding the performance question, it's quite a broad subject that really depends on your model, your data, and so on. Here are a few general remarks on TensorFlow performance.
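For completeness, a hedged sketch of manual device placement in the TF 1.x graph style (matching the snippet above). It is usually unnecessary, since TensorFlow places ops on a visible GPU automatically, but it lets you pin specific operations and log where they actually run:
import tensorflow as tf

# Pin these ops to the first GPU explicitly.
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

# log_device_placement prints which device each op was assigned to.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))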