How to compile TensorFlow binary to use AVX2, AVX512F, FMA?

When I call tf.Session() in TensorFlow, I obtain the following warning message:
I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: AVX2
AVX512F FMA
My questions are:
How can I solve this? In particular, I wish to keep my current TensorFlow version (1.12.0).
Will I obtain a considerable gain, considering that I work on a GPU?
I use Ubuntu 18.04.1 LTS.
Thank you ;)

I do not know how to keep 1.12.0; however, the TensorFlow page has a good build guide: https://www.tensorflow.org/install/source#setup_for_linux_and_macos
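For reference, a from-source build along the lines of that guide looks roughly like this. This is a minimal sketch assuming a Linux x86-64 host with Bazel installed and the TensorFlow 1.12 source tree checked out; the exact flags and the wheel filename may differ on your system:
./configure  # answer the interactive prompts (CPU-only defaults are fine)
# Enable the instruction sets the warning mentioned
bazel build -c opt --copt=-mavx2 --copt=-mavx512f --copt=-mfma \
    //tensorflow/tools/pip_package:build_pip_package
# Package the wheel and install it
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl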
According to comments from this thread at the TensorFlow GitHub project, no. Quote:
From my experiments I found CPU-optimized GPU TF doesn't boost the performance significantly, but it can make the CPU cooler.

TensorFlow and oneDNN

After installing tensorflow==2.9.1 and running it for the first time I got the following message:
2022-08-19 11:51:23.381523: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations:
AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate
compiler flags.
This message is a bit confusing. At first glance, it seems like a bad thing. But if you read it carefully, you realize it's actually a good thing: it's using all those nice extensions in "performance-critical operations". But then the last sentence makes it sound not so good, because they are not enabled in "other operations" (whatever those are).
Searching for the above message on the interwebs, I came across the Intel® Optimization for TensorFlow* Installation Guide, which said:
If your machine has AVX512 instruction set supported please use the below packages for better performance.
pip install intel-tensorflow-avx512==2.9.1 # linux only
Since my box supports AVX512, I've installed intel-tensorflow-avx512==2.9.1 and now got this message:
2022-08-19 11:43:00.187298: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations:
AVX512_VNNI
To enable them in other operations, rebuild TensorFlow with the appropriate
compiler flags.
Hmm...
So, my questions are:
Since Intel "optimized" version of TensorFlow only "complains" of using AVX512_VNNI in "performance-critical sections", does that mean it's using AVX2, AVX512F and FMA everywhere, including all "other operations"? Or does it mean it's not using them at all?
If it's not using them at all, does it mean it's "inferior" to the official version of TensorFlow and there is no point in using it?
BONUS QUESTION: Why are those cool AVX2/512F/512_VNNI and FMA instructions only enabled in "performance-critical sections" and not for all "other operations"?
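For what it's worth, one way to probe the first two questions empirically is to time the same heavy op under each installed wheel (official vs. Intel). A rough sketch; the matrix size and iteration count are arbitrary, and this measures only matmul throughput, not a full model:
import time
import tensorflow as tf

x = tf.random.normal([4096, 4096])
tf.matmul(x, x)  # warm-up so one-time dispatch/initialization isn't timed
start = time.perf_counter()
for _ in range(10):
    y = tf.matmul(x, x)
_ = y.numpy()  # force execution to finish before reading the clock
print(f"10 matmuls took {time.perf_counter() - start:.3f}s")
Run the same script once per installed package and compare the wall times.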

AVX512 not showing on Intel Tensorflow

I have a Windows 11 computer with an 11th Gen Intel Core i7-1185G7, which supports SSE4.1, SSE4.2, AVX, AVX2 and AVX512. The computer has no GPU.
I created a conda environment with Python 3.10, and ran pip install intel-tensorflow. According to the documentation, the command pip install intel-tensorflow-avx512 should only be used on Linux platforms. It mentions that AVX512 is automatically used and enabled on PIP wheels:
All Intel TensorFlow binaries are optimized with oneAPI Deep Neural Network Library (oneDNN), which will use the AVX2 or AVX512F FMA etc CPU instructions automatically in performance-critical operations based on the supported Instruction sets on your machine for both Windows and Linux OS.
However, when I start a new project that uses TensorFlow, the following message is shown:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Therefore, I am not sure that TensorFlow is using AVX512 as the default instruction set.
Questions
How can I check that TensorFlow is indeed using AVX512?
If TensorFlow is not using AVX512, how can I force it to? Is it a bug that should be reported to Intel?
Is AVX512 really worth it in comparison with AVX and AVX2 when training a model in TensorFlow on a CPU?
This may not be ideal, but you could try WSL and run TF through there using the intel-tensorflow-avx512 package as a test.
It is supposed to be the default in the TF Windows package as well (no need to use the avx512 pip package), but I'm confirming that now. Will get back to you asap.
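In the meantime, one way to see which instruction set the oneDNN kernels actually dispatch to is oneDNN's verbose mode. A minimal sketch; the environment variable must be set before the first op runs, and older oneDNN builds use DNNL_VERBOSE instead:
import os
os.environ["ONEDNN_VERBOSE"] = "1"  # must be set before the first op executes

import tensorflow as tf

x = tf.random.normal([256, 256])
y = tf.matmul(x, x)
# Each oneDNN primitive now logs a line naming its JIT kernel,
# e.g. one containing "avx512_core" if AVX512 code paths are in use.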

How to Build a Keras (TF) model using GPU?

The question is pretty straightforward, but nothing has really been answered.
Pretty simple: how do I know that when I build a Sequential() model in TensorFlow via Keras, it's going to use my GPU?
Normally, in Torch, it's easy: just use the 'device' parameter, and you can verify via the nvidia-smi volatile GPU-util metric. I tried it while building a model in TF, but nvidia-smi shows 0% usage across all GPU devices.
TensorFlow uses the GPU for most operations by default when:
It detects at least one GPU.
Its GPU support is installed and configured properly. For information on how to install and configure it for GPU support, see: https://www.tensorflow.org/install/gpu
One requirement to emphasize is that a specific version of the CUDA library has to be installed; e.g., TensorFlow 2.5 requires CUDA 11.2. Check here for the CUDA version required by each version of TF.
To know whether it detects GPU devices:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
It will also print out debug messages by default to stderr to indicate whether the GPU support is configured properly and whether it detects GPU devices.
To validate using nvidia-smi that it is really using GPU:
You have to define a sufficiently deep and complex neural network model such that the bottleneck is on the GPU side. This can be achieved by increasing the number of layers and the number of channels in each layer.
When doing training or inference of the model, like model.fit() and model.evaluate(), the GPU utilization in the logging from nvidia-smi should be high.
To know exactly where each operation will be executed, you can add the following line at the beginning of your code:
tf.debugging.set_log_device_placement(True)
For more information: https://www.tensorflow.org/guide/gpu
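Putting the above together, a minimal end-to-end check might look like this. This is a sketch; the model is deliberately tiny, so expect low nvidia-smi utilization unless you scale it up as described above:
import tensorflow as tf

# Report detected GPUs and log where each op is placed
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
tf.debugging.set_log_device_placement(True)

# A toy Sequential model; the placement log should show ops on GPU:0
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

x = tf.random.normal([1024, 32])
y = tf.random.normal([1024, 1])
model.fit(x, y, epochs=1, batch_size=128)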

Tensorflow quantization

I would like to optimize a graph using TensorFlow's transform_graph tool. I tried optimizing the graph from MultiNet (and others with similar encoder-decoder architectures). However, the optimized graph is actually slower when using quantize_weights, and even much slower when using quantize_nodes. According to TensorFlow's documentation, quantizing may yield no improvement, or may even be slower. Any idea if this is normal with the graph/software/hardware below?
Here is my system information for your reference:
OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow installed from: TF source code (CPU) for graph conversion, binary Python package (GPU) for inference
TensorFlow version: both using r1.3
Python version: 2.7
Bazel version: 0.6.1
CUDA/cuDNN version: 8.0/6.0 (inference only)
GPU model and memory: GeForce GTX 1080 Ti
I can post all the scripts used to reproduce if necessary.
It seems like quantization in TensorFlow only happens on CPUs. See: https://github.com/tensorflow/tensorflow/issues/2807
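For context, a typical transform_graph invocation of the kind described in the question looks roughly like this. A sketch only: the graph filenames and the input/output node names (input, output) are placeholders for your model's actual ones:
# Run from the TensorFlow 1.x source tree
bazel run //tensorflow/tools/graph_transforms:transform_graph -- \
  --in_graph=model.pb \
  --out_graph=model_quantized.pb \
  --inputs='input' \
  --outputs='output' \
  --transforms='quantize_weights'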
I got the same problem in a PC environment: my model was 9 times slower than the non-quantized one.
But when I ported my quantized model into an Android application, it did speed up.
It seems quantization currently only helps on CPU, and specifically on ARM-based CPUs such as Android phones.

Tensorflow Compilation Speeding up CPU

When I run the following command after importing tensorflow in Python 2.7:
sess = tf.Session()
Warnings/errors:
tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow
library wasn't compiled to use SSE4.2 instructions, but these are
available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616602: W
tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow
library wasn't compiled to use AVX instructions, but these are
available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616614: W
tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow
library wasn't compiled to use AVX2 instructions, but these are
available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616624: W
tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow
library wasn't compiled to use FMA instructions, but these are
available on your machine and could speed up CPU computations.
Please help me fix this so I may use my machine at its optimal power.
Those warnings are just saying that if you build TensorFlow from source, it can run faster on your machine. There is no fix, as it's not an issue but intended behavior: the message simply provides this information to users.
These CPU instructions are not enabled by default in order to provide broader compatibility with most machines.
As the docs say:
TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.
For all details on that see the Performance Guide.
The warnings you see are telling you that the compiled code does not use these instructions, which your CPU supports but not all CPUs out there do. When maintainers compile packages for repositories, they need to compile them such that they support the majority of CPUs, which means they tell the compiler not to use architecture-specific instructions.
If you want the package to use all the instructions your CPU has, you need to compile it yourself, or, as it's called, install from source. You can find documentation about how to do that here, and once you're comfortable compiling TensorFlow from source, you should go and read the performance-specific instructions.
However, at the end of the day, for real-world applications you might really need a GPU. It is true that these CPU instructions give you a bit of a performance boost, but that's not comparable to using a GPU.
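As a side note rather than a fix: if you decide the warnings are harmless and just want to silence them, the usual approach is to raise TensorFlow's C++ log level before importing it:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # 2 filters out INFO and WARNING messages

import tensorflow as tf  # the cpu_feature_guard warnings are now suppressed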