I'm running some projects with H2O AutoML using SageMaker notebook instances, and I would like to know whether H2O AutoML can benefit from a GPU SageMaker instance and, if so, how I should configure the notebook.
H2O AutoML contains a handful of algorithms, one of which is XGBoost; it has been part of H2O AutoML since H2O version 3.22.0.1. XGBoost is the only GPU-capable algorithm inside H2O AutoML; however, a lot of the models trained in AutoML are XGBoost models, so it can still be useful to utilize a GPU. Keep in mind that you must use H2O 3.22 or above to use this feature.
My suggestion is to test it on a GPU-enabled instance, compare the results to a non-GPU instance, and see if it's worth the extra cost.
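If you want a quick way to check, here is a minimal sketch of launching AutoML from the notebook (the dataset, columns, and model count are placeholders); whether the XGBoost runs actually use the GPU depends on H2O detecting a usable CUDA device on the instance, which you can verify with nvidia-smi or gpustat while AutoML is training:

import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start (or connect to) the local H2O cluster

# placeholder dataset; replace with your own frame and columns
train = h2o.import_file("train.csv")
x = train.columns[:-1]
y = train.columns[-1]

aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)
print(aml.leaderboard)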
I am running the following scenario:
Single Node Kubernetes Cluster (1x i7-8700K, 1x RTX 2070, 32GB RAM)
1 Tensorflow Serving Pod
4 Inference Client Pods
Each inference client grabs images from one of 4 separate cameras (1 camera each) and passes them to TF-Serving for inference, in order to understand what is seen on the video feeds.
I previously did inference inside the Inference Client Pods individually by calling TensorFlow directly, but that was hard on the graphics card's RAM. TensorFlow Serving was introduced to the mix quite recently so that we no longer load duplicate models onto the graphics card.
However, the performance is not looking good; for 1080p images it looks like this:
Direct TF: 20ms for input tensor creation, 70ms for inference.
TF-Serving: 80ms for gRPC serialization, 700-800ms for inference.
The TF-Serving pod is the only one that has access to the GPU and it is bound exclusively. Everything else operates on CPU.
Are there any performance tweaks I could do?
The model I'm running is Faster R-CNN Inception V2 from the TF Model Zoo.
Many thanks in advance!
This is from TF Serving documentation:
Please note, while the average latency of performing inference with TensorFlow Serving is usually not lower than using TensorFlow directly, where TensorFlow Serving shines is keeping the tail latency down for many clients querying many different models, all while efficiently utilizing the underlying hardware to maximize throughput.
From my own experience, I've found TF Serving useful in providing a consistent abstraction over model serving that does not require implementing custom serving functionality. Model versioning and multi-model serving, which come out of the box, save you lots of time and are great additions.
Additionally, I would recommend batching your requests if you haven't already (see the sketch below). I would also suggest playing around with the TENSORFLOW_INTER_OP_PARALLELISM, TENSORFLOW_INTRA_OP_PARALLELISM, and OMP_NUM_THREADS arguments to TF Serving. Here is an explanation of what they are.
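If the clients currently send one frame per call, one option worth trying is batching several frames into a single request on the client side. A rough sketch with the gRPC API follows; the model name "faster_rcnn" and the "inputs" tensor key are assumptions based on typical TF Object Detection API exports, so adjust them to your SavedModel's signature (which you can inspect with saved_model_cli):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the TF-Serving pod (host and port are placeholders).
channel = grpc.insecure_channel("tf-serving:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Stack N frames into one [N, H, W, 3] uint8 tensor instead of sending N separate requests.
frames = np.stack([np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(4)])

request = predict_pb2.PredictRequest()
request.model_spec.name = "faster_rcnn"  # assumed model name
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(frames))  # assumed input key

response = stub.Predict(request, timeout=10.0)

TF Serving also supports server-side batching, which can achieve a similar effect without changing the clients, at the cost of some added queueing latency.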
Maybe you could try OpenVINO? It's a heavily optimized toolkit for inference. You could utilize your i7-8700K and run some frames in parallel. Here are some performance benchmarks for the very similar i7-8700T.
There is even OpenVINO Model Server which is very similar to Tensorflow Serving.
Disclaimer: I work on OpenVINO.
I have recently become interested in incorporating distributed training into my Tensorflow projects. I am using Google Colab and Python 3 to implement a Neural Network with customized, distributed, training loops, as described in this guide:
https://www.tensorflow.org/tutorials/distribute/training_loops
In that guide, under the section 'Create a strategy to distribute the variables and the graph', there is a picture of some code that basically sets up a 'MirroredStrategy' and then prints the number of generated replicas of the model; see below.
[Console output screenshot]
From what I can understand, the output indicates that the MirroredStrategy has only created one replica of the model, and therefore only one GPU will be used to train the model. My question: is Google Colab limited to training on a single GPU?
I have tried to call MirroredStrategy() both with and without GPU acceleration, but I only get one model replica every time. This is a bit surprising because when I use the multiprocessing package in Python, I get four threads. I therefore expected that it would be possible to train four models in parallel in Google Colab. Are there issues with TensorFlow's implementation of distributed training?
On Google Colab you can only use one GPU; that is the limit from Google. However, you can run different programs on different GPU instances by creating separate Colab notebooks, each connected to its own GPU, but you cannot place the same model on many GPU instances in parallel.
There are no problems with MirroredStrategy; speaking from personal experience, it works fine if you have more than one GPU.
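If you want to confirm what the runtime actually sees, a minimal check (assuming a TF 2.x runtime) is:

import tensorflow as tf

# List the accelerators the Colab runtime exposes; a standard Colab runtime
# provides at most one GPU, so MirroredStrategy reports a single replica.
print(tf.config.list_physical_devices("GPU"))

strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)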
I am looking into using H2O to create a client-facing application into which clients will be able to import data and run ML models. As H2O only offers a limited number of models at the moment, is there any way to build custom models (an LSTM in TensorFlow, for example) and import them into H2O, where they can then be run just like any of H2O's included models?
It seems as though H2O's Deep Water was the nearest solution to this, but they have now discontinued its development.
In other words, is there any way to accommodate types of models that H2O does not support (SVM, RNN, CNN, GAN, etc.)?
Sorry, deploying non-H2O-3 models within H2O-3 is unsupported.
For training speed, it would be nice to be able to train an H2O model with GPUs, take the model file, and then predict on a machine without GPUs.
It seems like that should be possible in theory, but with H2O release 3.13.0.341 that doesn't seem to happen, except for the XGBoost model.
When I run gpustat -cup I can see the GPUs kick in when I train H2O's XGBoost model. This doesn't happen with DL, DRF, GLM, or GBM.
I wouldn't be surprised if a difference in floating-point precision (16, 32, 64 bit) could cause some inconsistency, not to mention the vagaries of multiprocessor modeling, but I think I could live with that.
(This is related to my question here, but now that I understand the environment better I can see that the GPUs aren't used all the time.)
How can I tell if H2O 3.11.0.266 is running with GPUs?
The new XGBoost integration in H2O is the only GPU-capable algorithm in H2O (proper) at this time. So you can train an XGBoost model on GPUs and score on CPUs, but that's not true for the other H2O algorithms.
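As a rough sketch of that workflow (the backend and gpu_id parameters come from H2O's XGBoost options; the dataset and paths are placeholders), you could train with the GPU backend and export a MOJO that is later scored on a CPU-only machine:

import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()
train = h2o.import_file("train.csv")  # placeholder dataset
x, y = train.columns[:-1], train.columns[-1]

# Ask H2O's XGBoost to use the GPU backend explicitly.
model = H2OXGBoostEstimator(backend="gpu", gpu_id=0, ntrees=100)
model.train(x=x, y=y, training_frame=train)

# Export a MOJO that can be scored later on a machine without a GPU.
mojo_path = model.download_mojo(path="/tmp")
print(mojo_path)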
There is also the H2O Deep Water project, which provides integration between H2O and three third-party deep learning backends (MXNet, Caffe and TensorFlow), all of which are GPU-capable. So you can train those models using a GPU and score on a CPU as well. You can download the H2O Deep Water jar file (or R package, or Python module) at the Deep Water link above, and you can find out more info in the Deep Water GitHub repo README.
Yes, you do the heavy job of training on a GPU, save the weights, and then your CPU will only do the matrix multiplication for predictions.
In Keras you can train your model and save the neural network weights:
# after training on the GPU machine:
model.save_weights('your_model_weights.h5')
# on the CPU-only machine, rebuild the same architecture and then:
model.load_weights('your_model_weights.h5')
The Word2vec and Doc2vec methods provided by Gensim have a distributed version that uses BLAS, ATLAS, etc. for speedup (details here). However, do they support a GPU mode? Is it possible to get a GPU working when using Gensim?
Thank you for your question. GPU support is on the Gensim roadmap; we will appreciate any input that you have about it.
There is a version of word2vec that runs on Keras, by @niitsuma, called word2veckeras.
The code that runs on the latest Keras version is in this fork and branch: https://github.com/SimonPavlik/word2vec-keras-in-gensim/tree/keras106
@SimonPavlik has run performance tests on this code. He found that a single GPU is slower than multiple CPUs for word2vec.
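In the meantime, running on multiple CPU cores is the practical option. A minimal sketch using Gensim's workers parameter (Gensim 4.x argument names; older releases use size instead of vector_size, and the corpus here is a placeholder):

from gensim.models import Word2Vec

# Placeholder corpus; replace with your own tokenized sentences.
sentences = [["hello", "world"], ["gensim", "runs", "on", "cpu"]]

# workers controls CPU-thread parallelism; the BLAS-backed routines mentioned
# in the question are used automatically when a fast BLAS is installed.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv.most_similar("hello", topn=1))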
Regards
Lev