How to use SSE4.1 instructions without installing TensorFlow from source? - tensorflow

I have tried to install TensorFlow from source according to the guide on its official site, but the experience was very unpleasant.
The immediate consequence of not being able to install from source is the following warnings:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I wonder whether there is a way to 'use SSE4.1 instructions' and the other instructions mentioned above without installing TensorFlow from source.
Thanks!

There is no way to use these SIMD instructions without building TensorFlow from source.
The TensorFlow binaries are built without these optimizations by default, to stay compatible with as wide a range of CPU architectures as possible.
If you want to silence the warnings, though, you can set TF_CPP_MIN_LOG_LEVEL to 2 before importing TensorFlow:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # must be set before importing tensorflow
import tensorflow as tf
This TF environment variable defaults to 0, showing all logs.
Setting it to 1 will filter out INFO logs and 2 will additionally silence WARNING logs.
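If you prefer not to change the code, the same variable can be set in the shell before launching Python. A minimal sketch for a bash-like shell (the script name is just a placeholder):
export TF_CPP_MIN_LOG_LEVEL=2   # suppress INFO and WARNING logs
python my_script.py             # placeholder for your own entry point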

Related

AVX512 not showing on Intel Tensorflow

I have a Windows 11 computer with an 11th Gen Intel Core i7-1185G7, which supports SSE4.1, SSE4.2, AVX, AVX2 and AVX512. The computer has no GPU.
I created a conda environment with Python 3.10, and ran pip install intel-tensorflow. According to the documentation, the command pip install intel-tensorflow-avx512 should only be used on Linux platforms. It mentions that AVX512 is automatically used and enabled on PIP wheels:
All Intel TensorFlow binaries are optimized with oneAPI Deep Neural Network Library (oneDNN), which will use the AVX2 or AVX512F FMA etc CPU instructions automatically in performance-critical operations based on the supported Instruction sets on your machine for both Windows and Linux OS.
However, when I start a new project that uses TensorFlow, the following message is shown:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Therefore, I am not sure that TensorFlow is using AVX512 as the default instruction set.
Questions
How can I check that TensorFlow is indeed using AVX512?
If TensorFlow is not using AVX512, how can I force it to? Is it a bug that should be reported to Intel?
Is AVX512 really worth it in comparison with AVX and AVX2 when training a model in TensorFlow on a CPU?
This may not be ideal, but as a test you could try WSL and run TF there using the intel-tensorflow-avx512 package.
AVX512 is supposed to be enabled by default in the TF Windows package as well (no need to use the avx512 pip package), but I'm confirming that now. Will get back to you asap.
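In the meantime, one way to see which instruction set oneDNN actually dispatches to is its verbose mode. A minimal sketch, assuming this intel-tensorflow build honours the standard ONEDNN_VERBOSE environment variable (older builds used DNNL_VERBOSE instead):
import os
os.environ['ONEDNN_VERBOSE'] = '1'  # ask oneDNN to log which ISA it uses (assumption: this build reads ONEDNN_VERBOSE)

import tensorflow as tf

# Run one oneDNN-backed op so the verbose header is printed
a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
tf.linalg.matmul(a, b)

# The output should contain a line like "onednn_verbose,info,cpu,isa:..."
# naming the highest instruction set in use; look for an AVX-512 variant there.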

How to compile TensorFlow binary to use AVX2, AVX512F, FMA?

When I call tf.Session() in TensorFlow, I obtain the following warning message:
I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
My questions are:
How can I solve this? In particular, I wish to keep my current TensorFlow version (1.12.0).
Will I obtain a considerable gain, considering that I work on a GPU?
I use Ubuntu 18.04.1 LTS.
Thank you ;)
I do not know how to keep 1.12.0; however, the TensorFlow site has a good build guide: https://www.tensorflow.org/install/source#setup_for_linux_and_macos
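For reference, a source build that enables those instructions generally just passes the matching compiler flags to Bazel, roughly like this (a sketch only; the exact flags and target depend on the TensorFlow version, so follow the guide above):
bazel build -c opt --copt=-mavx2 --copt=-mavx512f --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
# or simply --copt=-march=native to enable everything the build machine's CPU supports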
As for whether you will see a considerable gain on a GPU: according to comments from this thread at the TensorFlow GitHub project, no. Quote:
From my experiments I found CPU-optimized GPU TF doesn't boost the performance significantly, but it can make the CPU cooler.

Custom tensorflow contradictory AVX2 / FMA messages?

In the official Julia 0.6 release, if I Pkg.add the TensorFlow package and run Pkg.test, shortly after the test starts I get a message about how my CPU supports various instruction sets such as AVX, AVX2, FMA, SSE and so on. Then, later in the test process, I get another message restating that AVX2 and FMA are not available. The AVX issue itself is broadly addressed in other Stack Overflow questions.
After recompiling a custom version of tensorflow to include AVX / FMA and copying the resulting tensorflow.so files into the Julia TensorFlow deps/usr/bin, running the same Pkg.test() no longer produces the first message, which seems to confirm that AVX2 and FMA are now in the binary; but the second message repeats, informing me again that AVX2 and FMA are not compiled in.
Test Summary: | Pass Total
shape_inference | 255 255
2018-06-08 09:55:41.794208: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TensorBoard 1.8.0 at http://linux-k18k.suse:6006 (Press CTRL+C to quit)
Test Summary: |
show | No tests
This may or may not be a contradiction in messages from tensorflow. Given a tensorflow.so library file, is there a way to confirm independently whether the AVX / FMA components were successfully compiled in?
Edit1: OK, so I found objdump and verified that some opcodes for AVX2 are in fact included in the .so library. This issue seems to involve tensorboard rather than tensorflow, but I don't qualify to add a tag for tensorboard (can someone help?). I'm wondering whether the standalone tensorboard is pointed at the right libtensorflow? If it is getting its information from another version, this might explain why it thinks that the AVX2 codes are missing.
This is now resolved. For me the confusing thing was that it was tensorboard generating the message, not (as I thought) tensorflow itself. Tensorflow was quiet because it saw a valid binary capable of AVX2 and FMA, but tensorboard was doing a separate check which failed, at least in version 1.8. Tensorboard in fact does not do anything requiring AVX2 or FMA so the issue can be safely ignored. Version 1.9 of tensorflow/tensorboard now assesses AVX2 and FMA capability correctly and does not generate the warning message.
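For reference, the objdump check mentioned in the edit can be done roughly like this (a sketch; the library path is a placeholder and the mnemonics are just representative FMA and AVX2 instructions):
objdump -d /path/to/libtensorflow.so | grep -c vfmadd       # FMA instructions, e.g. vfmadd231ps
objdump -d /path/to/libtensorflow.so | grep -c vpbroadcast  # AVX2-only instructions
# non-zero counts suggest the corresponding instruction sets were compiled in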

Re-build Tensorflow with desired optimization flags

Thanks in advance for your consideration.
I just installed TensorFlow (on a new machine with Ubuntu 16.04 and CUDA 8.0 already installed) using a build procedure from NVIDIA.
Initially, I used --copt=-march=native and received the following messages:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
So, in an attempt to fix this, I searched for solutions and followed the answer to the question
How to compile Tensorflow with SSE4.2 and AVX instructions?
again using the above procedure from NVIDIA, this time starting from
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
but I am still receiving the same messages as above. I feel as though I'm making a very simple error; any ideas?
Thanks!
By following the NVIDIA instructions you are resetting the TensorFlow repository to an older commit, from before the SIMD instruction optimizations were made available (in r1.0):
git reset --hard 70de76e
This commit dates back to a release in which this feature was not yet implemented, so the build is actually working as it is supposed to.
The solution is to follow the official TensorFlow documentation.
For future situations, it is always recommended to use official resources before reaching for third-party solutions; however helpful those may be, the official ones are more reliable and better maintained.
Notice that, because of the reset described above, during configure you are never prompted for which CPU instructions you want to build TF with (as you should be), and therefore you are unable to build with them. In a current checkout, configure asks:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Follow the official docs accordingly and it will work. If you have any follow-up questions feel free to ask, or if you face any problems open an issue on GitHub :)
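For reference, with a current checkout the official flow looks roughly like this (a sketch based on the docs; prompts and targets vary by version, and /tmp/tensorflow_pkg is just the docs' example output directory):
./configure            # accept the default -march=native at the optimization-flags prompt
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl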

Tensorflow Compilation Speeding up CPU

When I run the following command after importing tensorflow in Python 2.7:
sess = tf.Session()
I get the following warnings:
tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616602: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616614: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-02 00:41:48.616624: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Please help me fix this so that I can use my machine at its full potential.
Those warnings are just saying that if you build TensorFlow from source, it can run faster on your machine. There is no fix, because this is not an issue but intended behavior: the message simply provides this information to users.
These CPU instructions are not enabled by default in order to provide broader compatibility with most machines.
As the docs say:
TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.
For all details on that see the Performance Guide.
The warnings you see are telling you that the compiled code does not use some instructions that your CPU has but that not all CPUs out there have. When maintainers compile packages for distribution, they need to build them so that they support the majority of CPUs, which means they tell the compiler not to use architecture-specific instructions.
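If you are curious which of these instruction sets your own CPU actually reports, a quick check on Linux looks something like this (a minimal sketch; it reads the kernel's flags line rather than anything TensorFlow-specific):
# list the SIMD-related flags the kernel reports for this CPU (Linux only)
with open('/proc/cpuinfo') as f:
    flags = next(line for line in f if line.startswith('flags')).split()
print([isa for isa in ('sse4_1', 'sse4_2', 'avx', 'avx2', 'fma', 'avx512f') if isa in flags])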
If you want the package to use all the instructions your CPU has, you need to compile it yourself, or, as it is called, install from source. You can find documentation about how to do that here, and once you are comfortable compiling TensorFlow from source, you should also read the performance-specific instructions.
However, at the end of the day, for real-world applications you might really need a GPU. It is true that these CPU instructions give you a bit of a performance boost, but that is not comparable to using a GPU.