Inside virtualenv: How to get tensorflow to support sse 4.2 and avx - tensorflow

Just to say it upfront, I'm aware of all the answers that require bazel and they didn't work for me. I'm using virtualenv as the tensorflow website recommends to.
(tensorflow27)name#computersname:~$ bazel build --linkopt='-lrt' -c opt --copt=-mavx --copt=-msse4.2 --copt=-msse4.1 --copt=-msse3-k //tensorflow/tools/pip_package:build_pip_package
will output
ERROR: The 'build' command is only supported from within a workspace.
Basically I followed all steps from here
But when I run this validation I get
2017-09-02 11:46:52.613368: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 11:46:52.613396: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 11:46:52.613416: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I DO NOT want to surpress the warnings, I actually want to use SSE 4.2 and AVX (my processor supports both) However I wasn't able to find any instructions anywhere how to compile tensorflow inside a virutal environment such that the support for SSE and AVX is enabled from scratch. It's not even listed in their common installation problems section.
Btw. the system I use for this is Ubuntu 14.04 and I don't have an nvidia graphics card (so no cuda for now), but I plan to get one in the future.
I'm a little bit disappointed that tensorflow doesn't detect CPU capabilities before compilation.
Edit: Untrue, later I figured out it actually does
PS: I have setup two virtual environments, one for python 2.7 and the other one for python 3.0. Ideally I would hope that the solution works for both, as I didn't decide yet which python version I will end up using.

OK, so it turned out my problems were pretty much independent of which virtual environment I choose. The bazel build failed simply because I was in the wrong directory. I needed to pull tensorflow from git and then cd into it. Then I could build from there, using the default build option -march=native, which already detects my CPU capabilities.
Setting up two different virtual environments for the two different python versions in advance was useful. I used those environments to compile directly inside of them. This results in automatic python version detection. So building in the 2.7 environment will result in a build for python 2.7 etc.
To be somewhat more detailed:
First I installed bazel. Then I installed the dependencies mentioned on the tensorflow page like this:
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
sudo apt-get install python-numpy python-dev python-pip python-wheel
In order to use OpenCL in the build later, I had to download (and complile if I remember correctly) ComputeCpp-CE-0.3.1-Ubuntu.14.04-64bit.tar.gz (For ubuntu 16 it would be ComputeCpp-CE-0.3.1-Ubuntu.16.04-64bit.tar.gz). After compilation I had to move the build result to /usr/local/computecpp:
sudo mkdir /usr/local/computecpp
sudo cp -r ./Downloads/ComputeCpp*/* /usr/local/computecpp
If I remember correctly at this point I then activated my virtual environment with the python version I want to compile against, so the desired python version would be recognized by ./configure.
Then I pulled tensorflow from git and configured it:
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure
During the configuration routine I only answered yes to jemalloc and OpenCL as I don't have a CUDA card right now. When saying yes to OpenCL I was prompted for the ComputeCpp path which was at that location I created under /usr/local/computecpp
Then, while staying in the tensorflow directory I did
bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package
People who activated cuda during ./configure, should also add a "--config=cuda" to this bazel build command. If you have gcc > 5 you will also need to add '--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0'.
The '--config=mkl' part activates some additional libraries provided by intel specifically to speed up some calculations on their processors, so tensorflow can make use of them. If you don't have an Intel processor, it's probably wise to remove that option.
Btw, at first I also compiled mkl by hand, but that's not necessary it turns out. The bazel build will automatically pull mkl from an online source if it's missing.
Then I created the final package at /tmp/tensorflow_pkg:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Depending on the python version in your environment you will see a file name reflecting it:
/tmp/tensorflow_pkg/tensorflow-1.3.0-cp27-cp27mu-linux_x86_64.whl
/tmp/tensorflow_pkg/tensorflow-1.3.0-cp36-cp36m-linux_x86_64.whl
Now I could go to the appropriate environment and install it using pip. If you are using conda, the tensorflow website still recommends to use "pip install" and not "conda install". So if I were in a virtual environment with python 2.7 (doesn't matter if conda or not) I would type:
pip install --ignore-installed --upgrade /tmp/tensorflow_pkg/tensorflow-1.3.0-cp27-cp27mu-linux_x86_64.whl
And if I were under python 3.6 I would type:
pip install --ignore-installed --upgrade /tmp/tensorflow_pkg/tensorflow-1.3.0-cp36-cp36m-linux_x86_64.whl
There was one last hoop I needed to jump over: If you stay in the tensorflow directory (which you pulled from git) and then launch python, you won't be able to import tensorflow. I think it's some kind of a bug. So it's important to escape the tensorflow directory before starting python.
cd ..
Now you can start your python inside your virtual environment and import tensorflow. Hopefully without further errors or any warnings about not using the SSE or AVX capabilities of your processor. At least in my case it worked.

Related

The kernel appears to have died. It will restart automatically. Jupyter notebook [duplicate]

I am using a MacBook Pro with M1 processor, macOS version 11.0.1, Python 3.8 in PyCharm, Tensorflow version 2.4.0rc4 (also tried 2.3.0, 2.3.1, 2.4.0rc0). I am trying to run the following code:
import tensorflow
This causes the error message:
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
The code runs fine on my Windows and Linux machines.
What does the error message mean and how can I fix it?
Seems that this problem happens when you have multiple python interpreters installed, and some of them are for differente architectuers (x86_64 vs arm64). You need to make sure that the correct python interpreter is being used, if you installed Apple's version of tensorflow, then that probably requires an arm64 interpreter.
If you use rosetta (Apple's x86_64 emulator) then you need to use a x86_64 python interpreter, if you somehow load the arm64 python interpreter, you will get the illegal instruction error (which totally makes sense).
If you use any script that installs new python interpreters, then you need to make sure the correct interpreter for the architecture is installed (most likely arm64).
Overalll I think this problem happens because the python environment setup is not made for systems that can run multiple instruction sets/architectures, pip does check the architecture of packages and the host system but seems you can run a x86_64 interpreter to load a package meant for arm64 and this produces the problem.
For reference there is an issue in tensorflow_macos that people can check.
For M1 Macs, From Apple developer page the following worked:
First, download Conda Env from here and then follow these instructions (assuming the script is downloaded to ~/Downloads folder)
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
reload the shell and do
python -m pip uninstall tensorflow-macos
python -m pip uninstall tensorflow-metal
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
If the above doesn't work for some reason, there are some edge cases and additional information provided at the Apple developer page
Installing Tensorflow version 1.15 fixed this for me.
$ conda install tensorflow==1.15
I have been able to resolve this issue by using Miniforge instead of Anaconda as the Python environment. Anaconda doesn't support the arm64 architecture, yet.
I had the same issue
This is because of M1 chip. Now there is a pre-release that delivers hardware-accelerated TensorFlow and TensorFlow Addons for macOS 11.0+. Native hardware acceleration is supported on M1 Macs and Intel-based Macs through Apple’s ML Compute framework.
You need to install the TensorFlow that supports M1 chip Simply pull this tensorflow macos repository and run the ./scripts/download_and_install.sh

Tensorflow will not run on GPU

I'm a newbie when it comes to AWS and Tensorflow and I've been learning about CNNs over the last week via Udacity's Machine Learning course.
Now I've a need to use an AWS instance of a GPU. I launched a p2.xlarge instance of Deep Learning AMI with Source Code (CUDA 8, Ubuntu) (that's what they recommended)
But now, it seems that tensorflow is not using the GPU at all. It's still training using the CPU. I did some searching and I found some answers to this problem and none of them seemed to work.
When I run the Jupyter notebook, it still uses the CPU
What do I do to get it to run on the GPU and not the CPU?
The problem of tensorflow not detecting GPU can possibly be due to one of the following reasons.
Only the tensorflow CPU version is installed in the system.
Both tensorflow CPU and GPU versions are installed in the system, but the Python environment is preferring CPU version over GPU version.
Before proceeding to solve the issue, we assume that the installed environment is an AWS Deep Learning AMI having CUDA 8.0 and tensorflow version 1.4.1 installed. This assumption is derived from the discussion in comments.
To solve the problem, we proceed as follows:
Check the installed version of tensorflow by executing the following command from the OS terminal.
pip freeze | grep tensorflow
If only the CPU version is installed, then remove it and install the GPU version by executing the following commands.
pip uninstall tensorflow
pip install tensorflow-gpu==1.4.1
If both CPU and GPU versions are installed, then remove both of them, and install the GPU version only.
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.4.1
At this point, if all the dependencies of tensorflow are installed correctly, tensorflow GPU version should work fine. A common error at this stage (as encountered by OP) is the missing cuDNN library which can result in following error while importing tensorflow into a python module
ImportError: libcudnn.so.6: cannot open shared object file: No such
file or directory
It can be fixed by installing the correct version of NVIDIA's cuDNN library. Tensorflow version 1.4.1 depends upon cuDNN version 6.0 and CUDA 8, so we download the corresponding version from cuDNN archive page (Download Link). We have to login to the NVIDIA developer account to be able to download the file, therefore it is not possible to download it using command line tools such as wget or curl. A possible solution is to download the file on host system and use scp to copy it onto AWS.
Once copied to AWS, extract the file using the following command:
tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz
The extracted directory should have structure similar to the CUDA toolkit installation directory. Assuming that CUDA toolkit is installed in the directory /usr/local/cuda, we can install cuDNN by copying the files from the downloaded archive into corresponding folders of CUDA Toolkit installation directory followed by linker update command ldconfig as follows:
cp cuda/include/* /usr/local/cuda/include
cp cuda/lib64/* /usr/local/cuda/lib64
ldconfig
After this, we should be able to import tensorflow GPU version into our python modules.
A few considerations:
If we are using Python3, pip should be replaced with pip3.
Depending upon user privileges, the commands pip, cp and ldconfig may require to be run as sudo.

Tensorflow compilation on Odroid XU4

I am trying to compile Tensorflow (tried both: full & lite) on Odroid XU4 (16GB eMMc, Ubuntu 16) but I am getting errors shown in figures: https://www.dropbox.com/sh/j86ysncze1q0eka/AAB8RZtUTkaytqfEGivbev_Ga?dl=0
I am using FlytOS as OS(http://docs.flytbase.com/docs/FlytOS/GettingStarted/OdroidGuide.html). Its customized Ubuntu 16 with OpenCV and ROS setup, makes 11GB after installation. So, I got only 2.4GB free. Therefore, I added 16GB USB as swap memory.
I have installed Bazel without using swap memory. Tried tensorflow full version and lite but failed to compile. However, I downloaded compiled tensorflow lite for Pi and successfully installed on Odroid. Since, Odroid is Octacore, therefore, to make best use of available processing power I need to compile tensorflow on Odroid.
Please let me know if any one has tensorflow compiled on Odroid XU4.
Regards,
Check this guide out. Build Tensorflow on Odroid
IT gives a detailed step by step guide and also has some troubleshooting procedures.
Summarizing the steps here:
Install prerequisites including g++, gcc-4.8, python-pip, python-dev, numpy and Oracle Java (not OpenJDK)
Use a USB/ Flash drive and add some swap memory
Build Bazel. In the compile.sh shell script, modify the run line to add memory flags
run “${JAVAC}” -J-Xms256m -J-Xmx384m -classpath “${classpath}” -sourcepath “${sourcepath}”
Get Tensorflow v1.4 specifically and run ./configure and select relevant options. Disable XLA as it's causing some problems.
Finally run Bazel command.
bazel build -c opt --copt="-funsafe-math-optimizations" --copt="-ftree-vectorize" --copt="-fomit-frame-pointer" --local_resources 8192,8.0,1.0 --verbose_failures tensorflow/tools/pip_package:build_pip_package
Now install it.
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip2 install /tmp/tensorflow_pkg/tensorflow-1.4.0-cp27-cp27mu-linux_armv7l.whl --upgrade --ignore-installed
Test the install
python
import tensorflow
print(tensorflow.__version__)
1.4.0
I was able to compile it successfully by following the steps given there.

Suppressing warnings about use of SSE4.1, SSE4.2 and AVX instructions in TensorFlow [duplicate]

This is the message received from running a script to check if Tensorflow is working:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I noticed that it has mentioned SSE4.2 and AVX,
What are SSE4.2 and AVX?
How do these SSE4.2 and AVX improve CPU computations for Tensorflow tasks.
How to make Tensorflow compile using the two libraries?
I just ran into this same problem, it seems like Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support, adding --copt=-msse4.2 would suffice. In the end, I successfully built with
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
without getting any warning or errors.
Probably the best choice for any system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
(Update: the build scripts may be eating -march=native, possibly because it contains an =.)
-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)
I'm not sure what TensorFlow's default for -O2 or -O3 is. gcc -O3 enables full optimization including auto-vectorization, but that sometimes can make code slower.
What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.
-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)
Let's start with the explanation of why do you see these warnings in the first place.
Most probably you have not installed TF from source and instead of it used something like pip install tensorflow. That means that you installed pre-built (by someone else) binaries which were not optimized for your architecture. And these warnings tell you exactly this: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the part from documentation.
TensorFlow checks on startup whether it has been compiled with the
optimizations available on the CPU. If the optimizations are not
included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA
instructions not included.
Good thing is that most probably you just want to learn/experiment with TF so everything will work properly and you should not worry about it
What are SSE4.2 and AVX?
Wikipedia has a good explanation about SSE4.2 and AVX. This knowledge is not required to be good at machine-learning. You may think about them as a set of some additional instructions for a computer to use multiple data points against a single instruction to perform operations which may be naturally parallelized (for example adding two arrays).
Both SSE and AVX are implementation of an abstract idea of SIMD (Single instruction, multiple data), which is
a class of parallel computers in Flynn's taxonomy. It describes
computers with multiple processing elements that perform the same
operation on multiple data points simultaneously. Thus, such machines
exploit data level parallelism, but not concurrency: there are
simultaneous (parallel) computations, but only a single process
(instruction) at a given moment
This is enough to answer your next question.
How do these SSE4.2 and AVX improve CPU computations for TF tasks
They allow a more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides
How to make Tensorflow compile using the two libraries?
You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
Let me answer your 3rd question first:
If you want to run a self-compiled version within a conda-env, you can. These are the general instructions I run to get tensorflow to install on my system with additional instructions. Note: This build was for an AMD A10-7850 build (check your CPU for what instructions are supported...it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda-env. Credit goes to the tensorflow source install page and the answers provided above.
git clone https://github.com/tensorflow/tensorflow
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel, packaging, appdir
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build your build like below. Note: Check what instructions your CPU
# support. Also. If resources are limited consider adding the following
# tag --local_resources 2048,.5,1.0 . This will limit how much ram many
# local resources are used but will increase time to compile.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter etc. etc.
As to your 2nd question:
A self-compiled version with optimizations are well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now only take about 300 seconds! Although the exact numbers will vary, I think you can expect about a 35-50% speed increase in general on your particular setup.
Lastly your 1st question:
A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2, MFA are different kinds of extended instruction sets on X86 CPUs. Many contain optimized instructions for processing matrix or vector operations.
I will highlight my own misconception to hopefully save you some time: It's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).
In the context of tensorflow compilation, if you computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put those optimizing flags in for all. Don't do like I did and just go with SSE4.2 thinking that it's newer and should superseed SSE4.1. That's clearly WRONG! I had to recompile because of that which cost me a good 40 minutes.
These are SIMD vector processing instruction sets.
Using vector instructions is faster for many tasks; machine learning is such a task.
Quoting the tensorflow installation docs:
To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.
Thanks to all this replies + some trial and errors, I managed to install it on a Mac with clang. So just sharing my solution in case it is useful to someone.
Follow the instructions on Documentation - Installing TensorFlow from Sources
When prompted for
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
then copy-paste this string:
-mavx -mavx2 -mfma -msse4.2
(The default option caused errors, so did some of the other flags. I got no errors with the above flags. BTW I replied n to all the other questions)
After installing, I verify a ~2x to 2.5x speedup when training deep models with respect to another installation based on the default wheels - Installing TensorFlow on macOS
Hope it helps
I have recently installed it from source and bellow are all the steps needed to install it from source with the mentioned instructions available.
Other answers already describe why those messages are shown. My answer gives a step-by-step on how to isnstall, which may help people struglling on the actual installation as I did.
Install Bazel
Download it from one of their available releases, for example 0.5.2.
Extract it, go into the directory and configure it: bash ./compile.sh.
Copy the executable to /usr/local/bin: sudo cp ./output/bazel /usr/local/bin
Install Tensorflow
Clone tensorflow: git clone https://github.com/tensorflow/tensorflow.git
Go to the cloned directory to configure it: ./configure
It will prompt you with several questions, bellow I have suggested the response to each of the questions, you can, of course, choose your own responses upon as you prefer:
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
The pip package. To build it you have to describe which instructions you want (you know, those Tensorflow informed you are missing).
Build pip script: bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
Build pip package: bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install Tensorflow pip package you just built: sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl
Now next time you start up Tensorflow it will not complain anymore about missing instructions.
This is the simplest method. Only one step.
It has significant impact on speed. In my case, time taken for a training step almost halved.
Refer
custom builds of tensorflow
I compiled a small Bash script for Mac (easily can be ported to Linux) to retrieve all CPU features and apply some of them to build TF. Im on TF master and use kinda often (couple times in a month).
https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f
To compile TensorFlow with SSE4.2 and AVX, you can use directly
bazel build --config=mkl
--config="opt"
--copt="-march=broadwell"
--copt="-O3"
//tensorflow/tools/pip_package:build_pip_package
Source:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl
2.0 COMPATIBLE SOLUTION:
Execute the below commands in Terminal (Linux/MacOS) or in Command Prompt (Windows) to install Tensorflow 2.0 using Bazel:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
#The repo defaults to the master development branch. You can also checkout a release branch to build:
git checkout r2.0
#Configure the Build => Use the Below line for Windows Machine
python ./configure.py
#Configure the Build => Use the Below line for Linux/MacOS Machine
./configure
#This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options.
#Build Tensorflow package
#CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
#GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:
If you are building TensorFlow on the same type of CPU type as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.
If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc
documentation.
After configuring TensorFlow as described in the preceding bulleted list, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.
To hide those warnings, you could do this before your actual code.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I noticed that it has mentioned SSE4.2 and AVX,
What are SSE4.2 and AVX?
How do these SSE4.2 and AVX improve CPU computations for Tensorflow tasks.
How to make Tensorflow compile using the two libraries?
I just ran into this same problem, it seems like Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support, adding --copt=-msse4.2 would suffice. In the end, I successfully built with
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
without getting any warning or errors.
Probably the best choice for any system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
(Update: the build scripts may be eating -march=native, possibly because it contains an =.)
-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)
I'm not sure what TensorFlow's default for -O2 or -O3 is. gcc -O3 enables full optimization including auto-vectorization, but that sometimes can make code slower.
What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.
-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)
Let's start with the explanation of why do you see these warnings in the first place.
Most probably you have not installed TF from source and instead of it used something like pip install tensorflow. That means that you installed pre-built (by someone else) binaries which were not optimized for your architecture. And these warnings tell you exactly this: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the part from documentation.
TensorFlow checks on startup whether it has been compiled with the
optimizations available on the CPU. If the optimizations are not
included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA
instructions not included.
Good thing is that most probably you just want to learn/experiment with TF so everything will work properly and you should not worry about it
What are SSE4.2 and AVX?
Wikipedia has a good explanation about SSE4.2 and AVX. This knowledge is not required to be good at machine-learning. You may think about them as a set of some additional instructions for a computer to use multiple data points against a single instruction to perform operations which may be naturally parallelized (for example adding two arrays).
Both SSE and AVX are implementation of an abstract idea of SIMD (Single instruction, multiple data), which is
a class of parallel computers in Flynn's taxonomy. It describes
computers with multiple processing elements that perform the same
operation on multiple data points simultaneously. Thus, such machines
exploit data level parallelism, but not concurrency: there are
simultaneous (parallel) computations, but only a single process
(instruction) at a given moment
This is enough to answer your next question.
How do these SSE4.2 and AVX improve CPU computations for TF tasks
They allow a more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides
How to make Tensorflow compile using the two libraries?
You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
Let me answer your 3rd question first:
If you want to run a self-compiled version within a conda-env, you can. These are the general instructions I run to get tensorflow to install on my system with additional instructions. Note: This build was for an AMD A10-7850 build (check your CPU for what instructions are supported...it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda-env. Credit goes to the tensorflow source install page and the answers provided above.
git clone https://github.com/tensorflow/tensorflow
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel, packaging, appdir
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build your build like below. Note: Check what instructions your CPU
# support. Also. If resources are limited consider adding the following
# tag --local_resources 2048,.5,1.0 . This will limit how much ram many
# local resources are used but will increase time to compile.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter etc. etc.
As to your 2nd question:
A self-compiled version with optimizations are well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now only take about 300 seconds! Although the exact numbers will vary, I think you can expect about a 35-50% speed increase in general on your particular setup.
Lastly your 1st question:
A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2, MFA are different kinds of extended instruction sets on X86 CPUs. Many contain optimized instructions for processing matrix or vector operations.
I will highlight my own misconception to hopefully save you some time: It's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).
In the context of tensorflow compilation, if you computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put those optimizing flags in for all. Don't do like I did and just go with SSE4.2 thinking that it's newer and should superseed SSE4.1. That's clearly WRONG! I had to recompile because of that which cost me a good 40 minutes.
These are SIMD vector processing instruction sets.
Using vector instructions is faster for many tasks; machine learning is such a task.
Quoting the tensorflow installation docs:
To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.
Thanks to all this replies + some trial and errors, I managed to install it on a Mac with clang. So just sharing my solution in case it is useful to someone.
Follow the instructions on Documentation - Installing TensorFlow from Sources
When prompted for
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
then copy-paste this string:
-mavx -mavx2 -mfma -msse4.2
(The default option caused errors, so did some of the other flags. I got no errors with the above flags. BTW I replied n to all the other questions)
After installing, I verify a ~2x to 2.5x speedup when training deep models with respect to another installation based on the default wheels - Installing TensorFlow on macOS
Hope it helps
I have recently installed it from source and bellow are all the steps needed to install it from source with the mentioned instructions available.
Other answers already describe why those messages are shown. My answer gives a step-by-step on how to isnstall, which may help people struglling on the actual installation as I did.
Install Bazel
Download it from one of their available releases, for example 0.5.2.
Extract it, go into the directory and configure it: bash ./compile.sh.
Copy the executable to /usr/local/bin: sudo cp ./output/bazel /usr/local/bin
Install Tensorflow
Clone tensorflow: git clone https://github.com/tensorflow/tensorflow.git
Go to the cloned directory to configure it: ./configure
It will prompt you with several questions, bellow I have suggested the response to each of the questions, you can, of course, choose your own responses upon as you prefer:
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
The pip package. To build it you have to describe which instructions you want (you know, those Tensorflow informed you are missing).
Build pip script: bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
Build pip package: bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install Tensorflow pip package you just built: sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl
Now next time you start up Tensorflow it will not complain anymore about missing instructions.
This is the simplest method. Only one step.
It has significant impact on speed. In my case, time taken for a training step almost halved.
Refer
custom builds of tensorflow
I compiled a small Bash script for Mac (easily can be ported to Linux) to retrieve all CPU features and apply some of them to build TF. Im on TF master and use kinda often (couple times in a month).
https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f
To compile TensorFlow with SSE4.2 and AVX, you can use directly
bazel build --config=mkl
--config="opt"
--copt="-march=broadwell"
--copt="-O3"
//tensorflow/tools/pip_package:build_pip_package
Source:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl
2.0 COMPATIBLE SOLUTION:
Execute the below commands in Terminal (Linux/MacOS) or in Command Prompt (Windows) to install Tensorflow 2.0 using Bazel:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
#The repo defaults to the master development branch. You can also checkout a release branch to build:
git checkout r2.0
#Configure the Build => Use the Below line for Windows Machine
python ./configure.py
#Configure the Build => Use the Below line for Linux/MacOS Machine
./configure
#This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options.
#Build Tensorflow package
#CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
#GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:
If you are building TensorFlow on the same type of CPU type as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.
If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc
documentation.
After configuring TensorFlow as described in the preceding bulleted list, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.
To hide those warnings, you could do this before your actual code.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf