Counting FLOPS in TensorFlow

Is there a way to count FLOPS for the training and prediction of TensorFlow models?
The models are running on a CPU with TensorFlow 2.8.0, and I would prefer not to use an external (e.g. command-line) tool.
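One in-process option is the TF1 profiler that still ships inside TF 2.x, applied to a frozen concrete function of the model. A minimal sketch (the MobileNetV2 model is only a stand-in, and the profiler counts just the forward pass for ops it knows how to cost; a training step is roughly two to three times that once the backward pass is included):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

def count_flops(model, input_shape):
    # Build a concrete function with a fixed batch size of 1.
    concrete = tf.function(lambda x: model(x)).get_concrete_function(
        tf.TensorSpec([1] + list(input_shape), tf.float32))
    # Freeze the variables so the profiler sees a static graph.
    frozen = convert_variables_to_constants_v2(concrete)

    with tf.Graph().as_default() as graph:
        tf.compat.v1.import_graph_def(frozen.graph.as_graph_def(), name="")
        run_meta = tf.compat.v1.RunMetadata()
        opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
        info = tf.compat.v1.profiler.profile(graph=graph, run_meta=run_meta,
                                             cmd='op', options=opts)
    return info.total_float_ops

model = tf.keras.applications.MobileNetV2()
print("Forward-pass FLOPs:", count_flops(model, [224, 224, 3]))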

Related

How to do inference with TensorFlow 2 on multiple GPUs

I have a large dataset to run inference on. There are 10 GPUs in my machine, but when I do inference only one GPU is used. The framework I use is TensorFlow 2.6. I used to use PyTorch, but for some reasons I now have to use TensorFlow, which I am not familiar with.
I want to know how to use all GPUs during inference while keeping the order of the dataset at the same time.
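A minimal sketch of one way to do this with tf.distribute.MirroredStrategy (the tiny Dense model and random inputs are only placeholders for your own model and dataset):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU

with strategy.scope():
    # Stand-in model; in practice load your own, e.g. with tf.keras.models.load_model(...)
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])

inputs = tf.random.uniform([1000, 32])  # placeholder data
dataset = tf.data.Dataset.from_tensor_slices(inputs).batch(64)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def predict_step(batch):
    return model(batch, training=False)

outputs = []
for dist_batch in dist_dataset:
    per_replica = strategy.run(predict_step, args=(dist_batch,))
    # Each global batch is sharded across the replicas in order, so concatenating
    # the per-replica results in replica order preserves the dataset order.
    outputs.append(tf.concat(strategy.experimental_local_results(per_replica), axis=0))

predictions = tf.concat(outputs, axis=0)  # same order as `inputs`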

Difference between `train.py` and `model_main.py` in Tensorflow Object Detection API

I usually just use train.py to train with the TensorFlow Object Detection API. However, I read from https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/68581 that you can also use model_main.py to train your model and see real-time plots and images on TensorBoard.
How exactly do you use model_main.py with TensorBoard?
What is the difference between train.py and model_main.py?
On TensorBoard, model_main.py outputs graphs similar to train.py's, but with model_main.py the performance of the model on the evaluation dataset is measured too.
model_main.py is the newer version in the TensorFlow Object Detection API. It is used for training and also for evaluating the model. With train.py we have to run a separate program for evaluation (eval.py), while model_main.py executes both: training runs for a certain time (for example 5 minutes or every 2000 steps), then training is paused and an evaluation pass is run; after the evaluation has finished, training continues, and the cycle repeats.
The newer version of the TensorFlow Object Detection API offers model_main.py, which trains as well as evaluates the model using the various pre-conditions and preprocessing, whereas the older versions of the API use train.py for training and eval.py for evaluation.
Reference: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
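For what it's worth, a typical invocation looks roughly like the following; the paths are placeholders and the exact flags can differ between versions of the API, so treat this as a sketch rather than the canonical command:

python model_main.py \
    --pipeline_config_path=training/pipeline.config \
    --model_dir=training/ \
    --alsologtostderr

# In a second terminal, point TensorBoard at the same directory to watch the
# training and evaluation curves produced by model_main.py:
tensorboard --logdir=training/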

Is it possible to train an H2O model with a GPU and predict with a CPU?

For training speed, it would be nice to be able to train an H2O model with GPUs, take the model file, and then predict on a machine without GPUs.
It seems like that should be possible in theory, but with H2O release 3.13.0.341 that doesn't seem to happen, except for the XGBoost model.
When I run gpustat -cup I can see the GPUs kick in when I train H2O's XGBoost model. This doesn't happen with DL, DRF, GLM, or GBM.
I wouldn't be surprised if a difference in floating-point precision (16-, 32-, or 64-bit) could cause some inconsistency, not to mention the vagaries of multiprocessor modeling, but I think I could live with that.
(This is related to my earlier question, "How can I tell if H2O 3.11.0.266 is running with GPUs?", but now that I understand the environment better I can see that the GPUs aren't used all the time.)
The new XGBoost integration in H2O is the only GPU-capable algorithm in H2O (proper) at this time. So you can train an XGBoost model on GPUs and score on CPUs, but that's not true for the other H2O algorithms.
There is also the H2O Deep Water project, which provides integration between H2O and three third-party deep learning backends (MXNet, Caffe and TensorFlow), all of which are GPU-capable. So you can train those models using a GPU and score on a CPU as well. You can download the H2O Deep Water jar file (or R package, or Python module) at the Deep Water link above, and you can find out more info in the Deep Water GitHub repo README.
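As a rough illustration of that GPU-train / CPU-score workflow with H2O's XGBoost integration (the file paths and column choices below are hypothetical, and the backend parameter is, as far as I know, how the Python estimator requests GPU training):

import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

# On the GPU machine: train XGBoost on the GPU and save the binary model.
train = h2o.import_file("train.csv")            # hypothetical dataset
x, y = train.columns[:-1], train.columns[-1]
model = H2OXGBoostEstimator(ntrees=100, backend="gpu")
model.train(x=x, y=y, training_frame=train)
path = h2o.save_model(model, path="./models", force=True)

# On the CPU-only machine: loading and predicting needs no GPU.
loaded = h2o.load_model(path)
preds = loaded.predict(h2o.import_file("test.csv"))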
Yes: you do the heavy job of training on a GPU, save the weights, and then your CPU only has to do the matrix multiplications for prediction.
In Keras you can train your model and save the network weights:
model.save_weights('your_model_weights.h5')  # on the GPU machine, after training
model.load_weights('your_model_weights.h5')  # on the CPU machine, into the same architecture
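A minimal end-to-end sketch of that workflow (the architecture and file name are illustrative); the important detail is that the same architecture has to be rebuilt on the CPU machine before load_weights() can restore the trained parameters:

import tensorflow as tf

def build_model():
    # The CPU machine must rebuild this exact architecture before loading weights.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# On the GPU machine, after model.fit(...):
model = build_model()
model.save_weights('your_model_weights.h5')

# On the CPU machine:
cpu_model = build_model()
cpu_model.load_weights('your_model_weights.h5')
predictions = cpu_model.predict(tf.random.uniform([4, 10]))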

why is multi GPU tensorflow retraining not working

I have been running my TensorFlow retraining using a single GTX Titan and it works just fine, but when I try to use multiple GPUs in the flowers retraining example it does not work and appears to utilize only one GPU when I check nvidia-smi.
Why is this happening, given that multiple GPUs do work when training an Inception model from scratch, but not during retraining?
TensorFlow's flower retraining example does not work with multiple GPUs at all, even if you set --num_gpus > 1. It should support a single GPU as you noted.
The model needs to be modified to utilize multiple GPUs in parallel. Unfortunately, a TensorFlow program written for a single GPU, like the flower retraining example, can't automatically be split across multiple GPUs at this time.
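To make the "modify the model" point concrete, here is a rough sketch of manual data parallelism using current TF 2.x APIs rather than the original retrain.py script: each GPU gets a shard of the batch, gradients are computed per shard, averaged, and applied once. The tiny Dense model and random data are placeholders.

import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back gracefully if a GPU index is missing

num_gpus = max(1, len(tf.config.list_physical_devices('GPU')))

model = tf.keras.Sequential([tf.keras.layers.Dense(5)])
optimizer = tf.keras.optimizers.SGD(0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x = tf.random.uniform([num_gpus * 32, 10])                       # dummy global batch
y = tf.random.uniform([num_gpus * 32], maxval=5, dtype=tf.int32)

# One manual data-parallel step: each GPU computes gradients on its shard,
# the per-GPU gradients are averaged, and a single update is applied.
grads_per_gpu = []
for i, (xs, ys) in enumerate(zip(tf.split(x, num_gpus), tf.split(y, num_gpus))):
    with tf.device('/GPU:%d' % i):
        with tf.GradientTape() as tape:
            loss = loss_fn(ys, model(xs, training=True))
        grads_per_gpu.append(tape.gradient(loss, model.trainable_variables))

avg_grads = [tf.add_n(list(g)) / num_gpus for g in zip(*grads_per_gpu)]
optimizer.apply_gradients(zip(avg_grads, model.trainable_variables))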

How can I use a pretrained TensorFlow model (i.e. Inception v3) in batch mode using a GPU?

I want to use the Inception v3 model in TensorFlow for feature extraction, but the number of images I am using is large, so it takes a long time to run. So I am going to use a GPU. I have installed CUDA 7.5 and cuDNN correctly.
I am using the following code in CPU mode for one image:
with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('pool_3:0')  # 2048-d pooling features
    feat_vect = numpy.squeeze(sess.run(softmax_tensor, {'DecodeJpeg:0': in_image}))
So, my question is: how should I change my code so that I can run it in batches on the GPU?
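If upgrading TensorFlow is an option, one way to sidestep the single-image DecodeJpeg feed entirely is the Keras InceptionV3 application, which accepts batched input and runs on the GPU automatically when one is visible. A rough sketch (the pooling='avg' output plays the role of the 2048-dimensional pool_3 features; the random images are placeholders):

import numpy as np
import tensorflow as tf

base = tf.keras.applications.InceptionV3(weights='imagenet', include_top=False, pooling='avg')

def extract_features(images, batch_size=32):
    # images: float array of shape [N, 299, 299, 3], RGB values in [0, 255]
    x = tf.keras.applications.inception_v3.preprocess_input(np.asarray(images, dtype=np.float32))
    return base.predict(x, batch_size=batch_size)  # shape [N, 2048]

features = extract_features(np.random.uniform(0, 255, size=(8, 299, 299, 3)))
print(features.shape)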