How to control which TensorFlow CPU features are compiled into the binary? - tensorflow

I'm having runtime errors when running TensorFlow programs. I tracked down the problem, and the source seems to be that the shared object I have compiled includes CPU features that are not supported by my processor. That causes some pointers to point to invalid addresses. Right now there are 37 CPU features defined in tensorflow/core/platform/cpu_info.h. My question is how to exclude some of those features when compiling TensorFlow from source.

There is a ./configure option for passing optimization flags to the compiler (CC_OPT_FLAGS). By default this is -march=native, which enables every instruction-set extension supported by the machine you are compiling on. You can turn individual extensions off manually, e.g. -mno-avx to disable AVX.
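For instance, a minimal sketch of disabling AVX while keeping the rest of -march=native, assuming your TensorFlow version's ./configure reads CC_OPT_FLAGS from the environment (otherwise type the same flags at the interactive prompt); the bazel target is the standard pip-package target:

CC_OPT_FLAGS="-march=native -mno-avx" ./configure
bazel build -c opt --copt=-march=native --copt=-mno-avx //tensorflow/tools/pip_package:build_pip_package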

Related

Still getting "Your CPU supports instructions that this TensorFlow binary was not compiled to use: " while using -march=native

I have tried to compile TensorFlow 2.0 to get the benefit of extra CPU instructions like AVX, but to no avail. I have read through How to compile Tensorflow with SSE4.2 and AVX instructions? but I am still confused, as unless you are building for another PC, surely -march=native should just work. I have tried building twice with different instructions and am still getting the warning message.
I think I used the commands below, and I still have the build logs saved if someone wants to help.
"bazel build //tensorflow/tools/pip_package:build_pip_package
d_pip_package --config=mkl"
"bazel build -c opt --copt=-march=native --config=mkl //tensorflow/tools/pip_package:build_pip_package
This is only for the satisfaction of understanding what is going on. I currently don't need the benefit the optimisation would bring, but I do not understand why the method I used isn't working, as I followed it exactly.
As noted by my edit in the top answer on the question you linked, it seems bazel and/or TensorFlow's build scripts are buggy: they mishandle -march=native and fail to pass it on to the compiler. I'm guessing something goes wrong with args that contain an =, because args like -mfma work.
You are correct: if they were passing -march=native to the compiler correctly, there would be no problem and no need for any of this complication.
I don't know why nobody has fixed this huge inconvenience yet, instead leaving lots of users who aren't experts on x86 CPU features to stumble around trying to figure out which features their CPU has and how to enable them for gcc/clang. That is exactly what -march=native is for, along with its other important job of setting tuning options appropriately for the machine you're compiling on.
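If you do end up having to pass feature flags by hand, a hedged way to see which ones -march=native would imply on your machine (GCC syntax; clang's output differs):

gcc -march=native -Q --help=target | grep enabled     # lists the -m<feature> flags GCC enables for this CPU
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1  # shows the fully expanded cc1 command line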
I had a look once, but I don't actually use TensorFlow and don't know bazel, so I got bogged down in the maze of build machinery between that command line and the actual invocation of g++ ... foo.cpp

GCC link error, relocation truncated to fit: GPREL16 against symbol error

I'm cross-compiling TensorFlow r1.9 at present. The host system is Ubuntu 18.04; the target system is the sw26010 (a Chinese CPU whose instruction set is based on Alpha). The cross compiler is based on GCC 5.3.
Due to an OS restriction, I must statically link all libraries into TensorFlow; libstdc++.a and libpthread.a are included.
I can compile all object files successfully after some configuration (adding "//conditions:default": []," to the nsync BUILD file, and adding the sw2 CPU macro to the double-conversion BUILD file). However, I cannot link the library files and object files successfully.
Here is the error message.
/home/qh5/swgcc530/gcc-5.3.0/libstdc++-v3/src/c++98/ios_init.cc:140:(.text._ZNSt8ios_base4InitD2Ev+0xf4): relocation truncated to fit: GPREL16 against symbol `std::wcerr' defined in .bss._ZSt5wcerr section in /usr/sw-mpp/swcc/swgcc530-tools/usr/sw_64sw2-unknown-linux-gnu/lib/libstdc++.a(globals_io.o)
Here is the CROSSTOOL file for the TensorFlow bazel build.
CROSSTOOL on hastebin
I tried adding compiler_flag: "-msmall-data" and compiler_flag: "-fpic" to fix the error, but that failed.
Finally, this error was solved by contacting the compiler team. If you have the same problem, please seek help from the Chinese compiler team and update your compiler.

How to know the lowest CUDA toolkit version that supports a specific GPU like the GTX 1080

For a specific GPU like the GTX 1080, I want to know which CUDA toolkit versions support it. I have scanned NVIDIA's official website, but found no specific answer.
You could first check the Compute Capability (CC) of the GTX 1080 on the following site. It is CC 6.1.
https://developer.nvidia.com/cuda-gpus
Then check the CUDA documentation below to see whether it is in the supported CC list of the current release, CUDA 7.5. It is not.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
So it should be supported by a future release. Currently the upcoming release is CUDA 8; you will find it in that version's documentation.
If you are not sure which version a document belongs to, you can find the documents shipped with a specific CUDA installation, such as
/usr/local/cuda-7.5/doc
The help message of the CUDA compiler also gives the supported CC list.
$ nvcc --help
--gpu-code <code>,... (-code)
Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
nvcc embeds a compiled code image in the resulting executable for each specified
<code> architecture, which is a true binary load image for each 'real' architecture
(such as sm_20), and PTX code for the 'virtual' architecture (such as compute_20).
During runtime, such embedded PTX code is dynamically compiled by the CUDA
runtime system if no binary load image is found for the 'current' GPU.
Architectures specified for options '--gpu-architecture' and '--gpu-code'
may be 'virtual' as well as 'real', but the <code> architectures must be
compatible with the <arch> architecture. When the '--gpu-code' option is
used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
architecture.
For instance, '--gpu-architecture=compute_35' is not compatible with '--gpu-code=sm_30',
because the earlier compilation stages will assume the availability of 'compute_35'
features that are not present on 'sm_30'.
Allowed values for this option: 'compute_20','compute_30','compute_32',
'compute_35','compute_37','compute_50','compute_52','compute_53','sm_20',
'sm_21','sm_30','sm_32','sm_35','sm_37','sm_50','sm_52','sm_53'.
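A quick way to pull just that list out of your installed compiler's help text (a hedged sketch; the grep pattern is taken from the output above and the exact wording may differ between CUDA versions):

$ nvcc --help | grep -A 3 "Allowed values for this option"

Newer CUDA toolkits also provide nvcc --list-gpu-arch, which prints the supported compute_XX values directly.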

Is there a way to show where LLVM is auto vectorising?

Context: I have several loops in an Objective-C library I am writing which deal with processing large text arrays. I can see that right now they run in a single-threaded manner.
I understand that LLVM is now capable of auto-vectorising loops, as described in Apple's session at WWDC. It is, however, very cautious in the way it does this; one reason given is the possibility of variables being modified due to CPU pipelining.
My question: how can I see where LLVM has vectorised my code and, more usefully, how can I receive debug messages that explain why it can't vectorise my code? I'm sure that if it can see why it can't auto-vectorise a loop, it could point that out to me, and I could make the necessary manual adjustments to make the code vectorisable.
I would be remiss if I didn't point out that this question has been more or less asked already, but quite obtusely, here.
Identifies loops that were successfully vectorized:
clang -Rpass=loop-vectorize
Identifies loops that failed vectorization and indicates if vectorization was specified:
clang -Rpass-missed=loop-vectorize
Identifies the statements that caused vectorization to fail:
clang -Rpass-analysis=loop-vectorize
Source: http://llvm.org/docs/Vectorizers.html#diagnostics
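A hedged example combining all three remarks in one compile (the file name and the -O2 level are assumptions; optimization must be enabled, or the vectorizer never runs):

clang -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c mylib.c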
The standard llvm toolchain provided by Xcode doesn't seem to support getting debug info from the optimizer. However, if you roll your own llvm and use that, you should be able to pass flags as mishr suggested above. Here's the workflow I used:
1. Using homebrew, install llvm
brew tap homebrew/versions
brew install llvm33 --with-clang --with-asan
This should install the full and relatively current llvm toolchain. It's linked into /usr/local/bin/*-3.3 (i.e. clang++-3.3). The actual on-disk location is available via brew info llvm33 - probably /usr/local/Cellar/llvm33/3.3/bin.
2. Build the single file you're optimizing, with homebrew llvm and flags
If you've built in Xcode, you can easily copy-paste the build parameters, and use your clang++-3.3 instead of Xcode’s own clang.
Appending -mllvm -debug-only=loop-vectorize will get you the auto-vectorization report. Note: this will likely NOT work with any remotely complex build, e.g. if you've got PCHs, but it is a simple way to tweak a single cpp file to make sure it's vectorizing correctly.
3. Create a compiler plugin from the new llvm
I was able to build my entire project with homebrew llvm by:
Grabbing this Xcode compiler plugin: http://trac.seqan.de/browser/trunk/util/xcode/Clang%20LLVM%20MacPorts.xcplugin.zip?order=name
Modifying the clang-related paths to point to my homebrew llvm and clang bin names (by appending '-3.3')
Placing it in /Library/Application Support/Developer/5.0/Xcode/Plug-ins/
Relaunching Xcode should show this plugin in the list of available compilers. At this point, the -mllvm -debug-only=loop-vectorize flag will show the auto-vectorization report.
I have no idea why this isn't exposed in the Apple builds.
UPDATE: This is exposed in current (8.x) versions of Xcode. The only thing required is to enable one or more of the loop-vectorize flags.
Assuming you are using opt and you have a debug build of llvm, you can do it as follows:
opt -O1 -loop-vectorize -debug-only=loop-vectorize code.ll
where code.ll is the IR you want to vectorize.
If you are using clang, you will need to pass the -debug-only=loop-vectorize flag via the -mllvm option.
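For example, a hedged sketch of the clang invocation (the source file name is an assumption, and as noted above, -debug-only only produces output with an assertions/debug build of LLVM):

clang -O2 -mllvm -debug-only=loop-vectorize -c code.c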

Chromium Ninja build fails (Illegal Instruction output)

I followed the Linux build instructions and when I try running "ninja -C out/Debug chrome", I just get the output "Illegal Instruction (core dumped)". Now, I wish I could actually find where the core dump is located to see if there is more specific information in there...
For reference, I am trying to run Ninja on Ubuntu 13.10.
Has anyone else experienced this while building Chromium or while trying to build anything else using Ninja? Also, where could I find the core dump?
The error message "Illegal Instruction (core dumped)" indicates that the current binary is using an instruction that is not supported by your CPU.
Please check whether software used for compilation (compiler, linker, ar, ninja-build etc.) is matching your CPU architecture. Unless you have no fancy system like ARM or POWER, you mixed up 32 bit (e.g. i586) and 64 bit (x86-64).
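A hedged way to check for such a mismatch (the ninja path is an assumption; substitute whichever binary crashes):

uname -m            # your machine architecture, e.g. x86_64
file $(which ninja) # reports whether the binary is 32-bit or 64-bit and for which architecture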
Or you are compiling for the wrong target. Do your compiler flags include flags beginning with -m, like "-march="? That could lead to the same error, but only once the compiled code is executed.
Have you built gyp or ninja-build yourself? That would be another place to make such a mistake.
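As for locating the core dump itself, a hedged sketch (on Ubuntu, apport may intercept cores and write reports to /var/crash instead; the crashing binary is whichever one the shell reported):

cat /proc/sys/kernel/core_pattern   # shows where core files go, or that apport handles them
ulimit -c unlimited                 # allow core files in the current shell, then reproduce the crash
gdb <crashing-binary> core          # then 'bt' shows the stack and 'x/i $pc' the offending instruction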