I have a Windows 11 computer with an 11th Gen Intel Core i7-1185G7, which supports SSE4.1, SSE4.2, AVX, AVX2 and AVX512. The computer has no GPU.
I created a conda environment with Python 3.10 and ran pip install intel-tensorflow. According to the documentation, the command pip install intel-tensorflow-avx512 should only be used on Linux platforms; it also states that AVX512 is automatically used and enabled in the pip wheels:
All Intel TensorFlow binaries are optimized with oneAPI Deep Neural Network Library (oneDNN), which will use the AVX2 or AVX512F FMA etc CPU instructions automatically in performance-critical operations based on the supported Instruction sets on your machine for both Windows and Linux OS.
However, when I start a new project that uses TensorFlow, the following message is shown:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Therefore, I am not sure that TensorFlow is using AVX512 as the default instruction set.
Questions
How can I check that TensorFlow is indeed using AVX512?
If TensorFlow is not using AVX512, how can I force it to? Is it a bug that should be reported to Intel?
Is AVX512 really worth it in comparison with AVX and AVX2 when training a model in TensorFlow on a CPU?
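One way to approach the first question is to compare the SIMD flags your CPU advertises with the instructions the cpu_feature_guard banner lists. Below is a minimal sketch; the /proc/cpuinfo path is Linux-only (on Windows you would need something like Sysinternals Coreinfo instead), and the banner text is whatever TensorFlow printed at startup:

```python
import re

def cpu_simd_flags(cpuinfo_text):
    """Extract SIMD-related flags from /proc/cpuinfo contents (Linux)."""
    match = re.search(r"^flags\s*:\s*(.+)$", cpuinfo_text, re.MULTILINE)
    flags = set(match.group(1).split()) if match else set()
    return {f for f in flags if f.startswith(("sse4", "avx", "fma"))}

def banner_instructions(banner_text):
    """Pull the instruction names out of the cpu_feature_guard message."""
    match = re.search(r"performance-critical operations:\s*(.+)", banner_text)
    return set(match.group(1).split()) if match else set()

# Usage: flags your CPU has that the banner does not mention
# with open("/proc/cpuinfo") as f:
#     cpu = cpu_simd_flags(f.read())
# banner = banner_instructions("... performance-critical operations: AVX AVX2")
# print({f.upper() for f in cpu} - banner)
```

If AVX512F is in the CPU set but never appears in the banner, that is at least consistent with the binary not dispatching to AVX512 kernels.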
This may not be ideal but you could try WSL and run TF through there using the intel-tensorflow-avx512 package as a test.
It is supposed to be the default in the TF Windows package as well (no need to use the avx512 pip package), but I'm confirming that now. Will get back to you asap.
Related
After installing tensorflow==2.9.1 and running it for the first time I got the following message:
2022-08-19 11:51:23.381523: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations:
AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate
compiler flags.
This message is a bit confusing. At first glance it seems like a bad thing, but if you read it carefully you realize it's actually good: it is using all those nice extensions in "performance-critical operations". But then the last sentence makes it sound not so good, because they are not enabled in "other operations" (whatever those are).
Searching the above message on the interwebs I came across Intel® Optimization for TensorFlow* Installation Guide which said:
If your machine has AVX512 instruction set supported please use the below packages for better performance.
pip install intel-tensorflow-avx512==2.9.1 # linux only
Since my box supports AVX512, I've installed intel-tensorflow-avx512==2.9.1 and now got this message:
2022-08-19 11:43:00.187298: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations:
AVX512_VNNI
To enable them in other operations, rebuild TensorFlow with the appropriate
compiler flags.
Hmm...
So, my questions are:
Since Intel "optimized" version of TensorFlow only "complains" of using AVX512_VNNI in "performance-critical sections", does that mean it's using AVX2, AVX512F and FMA everywhere, including all "other operations"? Or does it mean it's not using them at all?
If it's not using them at all, does it mean it's "inferior" to the official version of TensorFlow and there is no point in using it?
BONUS QUESTION: Why are those cool AVX2/512F/512_VNNI and FMA instructions only enabled in "performance-critical sections" and not for all "other operations"?
When I call tf.Session() in TensorFlow, I obtain the following warning message:
I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: AVX2
AVX512F FMA
My questions are:
How can I solve this? In particular, I wish to be able to keep the current TensorFlow version (1.12.0)
Will I obtain a considerable gain considering that I work on GPU?
I use Ubuntu 18.04.1 LTS.
Thank you ;)
I do not know how to keep 1.12.0; however, the TensorFlow page has a good build guide: https://www.tensorflow.org/install/source#setup_for_linux_and_macos
According to comments from this thread at the Tensorflow github project, no. Quote:
From my experiments I found CPU-optimized GPU TF doesn't boost the performance significantly, but it can make the CPU cooler.
In the Julia 0.6 official release, if I Pkg.add tensorflow and run Pkg.test, shortly after the test starts I get a message about how my CPU supports various instruction sets such as AVX/AVX2, FMA, SSE, and so on. Then later in the test process I get another message stating that AVX2 and FMA are not available. The AVX issue is broadly addressed in other Stack Overflow questions.
After recompiling a custom version of tensorflow to include AVX/FMA and copying the resulting tensorflow.so files to the Julia tensorflow deps/usr/bin, running the same Pkg.test() no longer produces the first message, which seems to confirm that AVX2 and FMA are now in the binary; but the second message repeats, informing me again that AVX2 and FMA are not compiled in.
Test Summary: | Pass Total
shape_inference | 255 255
2018-06-08 09:55:41.794208: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TensorBoard 1.8.0 at http://linux-k18k.suse:6006 (Press CTRL+C to quit)
Test Summary: |
show | No tests
This may or may not be a contradiction in messages from tensorflow. Given a tensorflow.so library file, is there a way to confirm independently whether the AVX / FMA components were successfully compiled in?
Edit1: Ok so I found objdump and verified that some opcodes for AVX2 are in fact included in the .so library. This issue seems to involve tensorboard rather than tensorflow, but I don't qualify to add a tag for tensorboard (can someone help?). I'm wondering if the standalone tensorboard is pointed at the right libtensorflow? If it is getting information from another version, this might explain why it thinks that the codes for AVX2 are missing.
This is now resolved. For me the confusing thing was that it was tensorboard generating the message, not (as I thought) tensorflow itself. Tensorflow was quiet because it saw a valid binary capable of AVX2 and FMA, but tensorboard was doing a separate check which failed, at least in version 1.8. Tensorboard in fact does not do anything requiring AVX2 or FMA so the issue can be safely ignored. Version 1.9 of tensorflow/tensorboard now assesses AVX2 and FMA capability correctly and does not generate the warning message.
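The objdump check from Edit1 can be scripted. Here is a minimal sketch that disassembles a shared library and looks for a few representative AVX2 and FMA mnemonics; the mnemonic lists are illustrative rather than exhaustive, the library path is an example, and objdump (binutils) is assumed to be installed:

```python
import subprocess

# A handful of representative mnemonics; real binaries use many more.
AVX2_MNEMONICS = {"vpbroadcastd", "vpermd", "vpsllvd"}
FMA_MNEMONICS = {"vfmadd231ps", "vfmadd132sd", "vfnmadd213ps"}

def found_mnemonics(disassembly, mnemonics):
    """Return which of the given mnemonics appear in objdump -d output."""
    words = set(disassembly.split())
    return mnemonics & words

def check_library(path):
    """Disassemble a shared library and report whether AVX2/FMA show up."""
    out = subprocess.run(["objdump", "-d", path],
                         capture_output=True, text=True, check=True).stdout
    return {
        "avx2": bool(found_mnemonics(out, AVX2_MNEMONICS)),
        "fma": bool(found_mnemonics(out, FMA_MNEMONICS)),
    }

# Usage (path is an example):
# print(check_library("libtensorflow_framework.so"))
```

A hit on any of these mnemonics confirms the instructions were compiled in, independently of what any startup banner claims.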
I have installed CUDA and cuDNN, but the last was not working, giving a lot of error messages in theano. Now I am training moderate sized deep conv nets in Keras/Tensorflow, without getting any cuDNN error messages. How can I check if cuDNN is now being used?
tl;dr: If tensorflow-gpu works, then CuDNN is used.
The prebuilt binaries of TensorFlow (at least since version 1.3) link to the CuDNN library. If CuDNN is missing, an error message will tell you ImportError: Could not find 'cudnn64_7.dll'. TensorFlow requires that this DLL be installed....
According to the TensorFlow install documentation for version 1.5, CuDNN must be installed for GPU support even if you build it from source. There are still a lot of fallbacks in the TensorFlow code for the case of CuDNN not being available -- as far as I can tell it used to be optional in prior versions.
Here are two lines from the TensorFlow source that explicitly state and enforce that CuDNN is required for GPU acceleration.
There is a special GPU version of TensorFlow that needs to be installed in order to use the GPU (and CuDNN). Make sure the installed python package is tensorflow-gpu and not just tensorflow.
You can list the packages containing "tensorflow" with conda list tensorflow (or just pip list, if you do not use anaconda), but make sure you have the right environment activated.
When you run your scripts with GPU support, they will start like this:
Using TensorFlow backend.
2018- ... C:\tf_jenkins\...\gpu\gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
To test it, just type into the console:
import tensorflow as tf
tf.Session()
To check whether you can "see" the CuDNN DLL from your python environment, and thereby validate that your PATH variable is correct, you can try this:
import ctypes
ctypes.WinDLL("cudnn64_7.dll") # use the file name of your cudnn version here.
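The same ctypes idea also works on Linux or macOS if you swap in the platform's library name. A small sketch follows; the naming conventions shown match cuDNN 7-era releases and may differ for other versions:

```python
import ctypes
import platform

def cudnn_library_name(major=7, system=None):
    """Build the cuDNN library file name for the given OS.

    Naming is version-dependent; these patterns match cuDNN 7-era releases.
    """
    system = system or platform.system()
    if system == "Windows":
        return f"cudnn64_{major}.dll"
    if system == "Darwin":
        return f"libcudnn.{major}.dylib"
    return f"libcudnn.so.{major}"

def cudnn_visible(major=7):
    """True if the dynamic loader can find cuDNN on the search path."""
    try:
        ctypes.CDLL(cudnn_library_name(major))
        return True
    except OSError:
        return False
```

If cudnn_visible() returns False on a machine where cuDNN is installed, the library directory is most likely missing from PATH (Windows) or LD_LIBRARY_PATH (Linux).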
You might also want to look into the GPU optimized Keras Layers.
CuDNNLSTM
CuDNNGRU
They are significantly faster:
https://keras.io/layers/recurrent/#cudnnlstm
We saw a 10x improvement going from the LSTM to CuDNNLSTM Keras layers.
Note:
We also saw a 10x increase in VMS (virtual memory) usage on the machine. So there are tradeoffs to consider.
Hello, and thanks in advance for your consideration.
I just installed tensorflow (on a new machine with Ubuntu 16.04 and CUDA 8.0 already installed) using the following procedure:
Initially, I used --copt=-march=native.
I received the message
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
So in an attempt to fix this, I searched for solutions and used the answer to the following
How to compile Tensorflow with SSE4.2 and AVX instructions?
by using the above procedure from nVidia, starting from
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
but am still receiving the same messages as above. I feel as though I'm making a very simple error, any ideas?
Thanks!
By following the NVIDIA instructions you are resetting the TensorFlow repository to an older commit, from before the SIMD instruction optimizations were made available (r1.0):
git reset --hard 70de76e
This commit dates back to a previous release when this feature was not yet implemented, so it is actually working as it's supposed to.
The solution is to follow the official TensorFlow documentation.
For future situations, it is always recommended to consult official resources before reaching for third-party solutions; however helpful those may be, the official ones are more reliable and better maintained.
Notice that during configure you are not prompted for the CPU instructions you want to build TF with, as you should be, due to the reason above; therefore, you are unable to build with them.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Follow the official docs accordingly and it will work. If you have any follow up questions feel free to ask or if you face any problems open an issue on Github :)