The TFLite team recently announced XNNPACK support in TF v2.3 (https://blog.tensorflow.org/2020/07/accelerating-tensorflow-lite-xnnpack-integration.html). This should provide significant speedups for float operations on ARMv8 cores.
Does anyone know how to enable XNNPACK for ARM64 builds of TFLite? The benchmarking application in particular would be a good place to test this new functionality on target hardware. iOS and Android support is enabled by passing a flag to Bazel when compiling, but no guidance is given for building for ARM64 boards: the build instructions (see below) haven't been updated, and inspecting download_dependencies.sh doesn't show XNNPACK being downloaded from anywhere.
https://www.tensorflow.org/lite/guide/build_arm64
XNNPACK is not yet supported via Makefile-based builds. We have recently added experimental support for cross-compilation to ARM64 (via --config=elinux_aarch64 in the bazel build command), which should allow build-time opt-in to XNNPACK by also adding --define tflite_with_xnnpack=true in your build command. Expect some improvements in documentation for cross-compilation to ARM64 in the next TF 2.4 release, where we'll also be looking into enabling XNNPACK by default for as many platforms as possible.
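For example, building the benchmark tool for ARM64 with XNNPACK enabled should look roughly like this (the Bazel target below is where the benchmark tool lives in the TF source tree):
bazel build -c opt --config=elinux_aarch64 --define tflite_with_xnnpack=true //tensorflow/lite/tools/benchmark:benchmark_model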
Related
I want to develop a GStreamer plugin that can use the acceleration provided by the GPU of the graphics card (NVIDIA RTX2xxx). The objective is to have a fast GStreamer pipeline that processes a video stream and applies a custom filter to it.
After two days of googling, I cannot find any example or hint.
One of the best alternatives I have found is to use "nvivafilter", passing a CUDA module as an argument. However, nothing explains how to install this plugin or provides an example. Worse, it seems it may be specific to NVIDIA Jetson hardware.
Another alternative seems to be using GStreamer inside an OpenCV Python script, but that is a mix whose performance impact I do not know.
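If I go that route, I would at least first check that my OpenCV build was compiled with GStreamer support (a quick sanity check; the command assumes a Python 3 OpenCV install):
python3 -c "import cv2; print(cv2.getBuildInformation())" | grep -i gstreamer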
This GStreamer tutorial talks about several libraries, but it seems outdated and does not provide details.
RidgeRun seems to have something similar to "nvivafilter", but it is not FOSS.
Does anyone have an example or a suggestion on how to proceed?
I suggest you start by installing DeepStream (DS) 5.0 and exploring the examples and apps provided. It's built on GStreamer: Deepstream Installation guide
The installation is straightforward. You will find prebuilt custom parsers.
You will need to install the following: Ubuntu 18.04, GStreamer 1.14.1, NVIDIA driver 440 or later, CUDA 10.2, TensorRT 7.0 or later.
Here is an example of running an app with 4 streams:
deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
The advantage of DS is that the whole video pipeline is optimized on the GPU, including decoding and preprocessing. You can always run GStreamer alongside OpenCV only; in my experience that is not an efficient implementation.
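As a rough illustration, a minimal gst-launch pipeline built from DS elements looks something like the following (a sketch only; the sample stream and config-file paths are assumptions based on the default DS 5.0 samples layout and may differ on your install):
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nvvideoconvert ! nvdsosd ! nveglglessink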
Building custom parsers:
The parsers are required to convert the raw Tensor data from the inference to (x,y) location of bounding boxes around the detected object. This post-processing algorithm will vary based on the detection architecture.
If using Deepstream 4.0, Transfer Learning Toolkit 1.0 and TensorRT 6.0: follow the instructions in the repository https://github.com/NVIDIA-AI-IOT/deepstream_4.x_apps
If using Deepstream 5.0, Transfer Learning Toolkit 2.0 and TensorRT 7.0: follow the instructions at https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps
Resources:
Starting page: https://developer.nvidia.com/deepstream-sdk
Deepstream download and resources: https://developer.nvidia.com/deepstream-getting-started
Quick start manual: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html
Integrate TLT model with Deepstream SDK: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps
Deepstream Devblog: https://devblogs.nvidia.com/building-iva-apps-using-deepstream-5.0/
Plugin manual: https://docs.nvidia.com/metropolis/deepstream/plugin-manual/index.html
Deepstream 5.0 release notes: https://docs.nvidia.com/metropolis/deepstream/DeepStream_5.0_Release_Notes.pdf
Transfer Learning Toolkit v2.0 Release Notes: https://docs.nvidia.com/metropolis/TLT/tlt-release-notes/index.html
Transfer Learning Toolkit v2.0 Getting Started Guide: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html
Metropolis documentation: https://docs.nvidia.com/metropolis/
TensorRT: https://developer.nvidia.com/tensorrt
TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html
TensorRT Devblog: https://devblogs.nvidia.com/speeding-up-deep-learning-inference-using-tensorrt/
TensorRT Open Source Software: https://github.com/NVIDIA/TensorRT
GstBaseTransform documentation: https://gstreamer.freedesktop.org/documentation/base/gstbasetransform.html?gi-language=c
Good luck.
I have installed TensorFlow-GPU version 1.9.0, and a simple tensorflow import statement crashes with "Illegal instruction (core dumped)". If I downgrade TensorFlow to version 1.5.0, it works fine. How can I fix this issue for the higher version, as I need to work with it?
Thanks
Starting with v1.5.1 on Linux and v1.6.0 on other platforms, the official TensorFlow distribution is compiled with AVX instructions, meaning that older CPUs will not work with it (you can look up model compatibility, but it does not have to be an ancient CPU, it happened to me on an old Core i7).
If you want to use official releases, the only solution is to switch to different hardware or to stick with the older version. There have been requests for support for older CPUs (and some people have uploaded their own builds for particular configurations, which you can use if one works for you and you trust it), but the general answer is that if you need specific support for your platform you can always build TensorFlow yourself, disabling AVX optimizations (see the installation guide).
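On Linux you can check whether your CPU advertises AVX at all before installing an official binary; if the following prints nothing, the prebuilt wheels from the versions above will crash with "Illegal instruction":
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u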
Whenever I use tensorflow, it displays the message "The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations" and 2 more similar messages.
So I decided to build tensorflow from source to get rid of these messages. I'm using python 3.5 on Debian, and followed the instructions at https://www.tensorflow.org/install/install_sources (CPU only, no GPU).
During configuration it asked whether the build should target the machine doing the build; I selected that, and it added -march=native to a compiler option.
Everything seemed to work, but when I ran python3 to test the build, it still gives the messages about "The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available ..." etc. How do I make the build use the hardware that it's running on?
There have been similar questions, and most of the answers to them are wrong. They say it's necessary to specify options like "--copt=-msse4.1 --copt=-msse4.2" in the build; it isn't. With the default option "-march=native", the GNU compiler will use SSE4.1 and SSE4.2 instructions if they are available.
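You can verify this yourself (assuming you compile with GCC): ask the compiler which target options -march=native turns on, and the SSE4 options show up as [enabled] if the CPU supports them:
gcc -march=native -Q --help=target | grep sse4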
The real problem is that if you have already installed the default build with pip and then build TensorFlow from source, pip won't replace the old build with the new one. Everything will seem to work, but the old build remains in place in a directory under ~/.local.
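A quick way to see which copy Python is actually importing (and whether it is the stale one under ~/.local) is:
python3 -c "import tensorflow as tf; print(tf.__file__)"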
The solution is simply to uninstall the old tensorflow with pip ('pip uninstall tensorflow' or 'pip3 uninstall tensorflow'), and then rebuild from source. If you have already done a build, and wondered why nothing seemed to change, you needn't repeat the build but can just execute the last couple of steps (https://www.tensorflow.org/install/install_sources), namely bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg, followed by the pip install.
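Put together, the sequence looks roughly like this (the exact wheel filename under /tmp/tensorflow_pkg depends on your TensorFlow and Python versions):
pip3 uninstall tensorflow
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl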
I have many big deep learning tasks in Python 3.6 ahead and wanted to build TensorFlow (CPU only) from source, as my MacBook Pro with Touch Bar 13" noted that TensorFlow would run faster if it were built with SSE4.1, SSE4.2, AVX, AVX2 and FMA support. There are quite a lot of questions on StackOverflow and GitHub regarding that topic and I read them all. None of them addresses why it is not working for me.
I strictly followed the instructions provided by https://www.tensorflow.org/install/install_sources
My configure looks like this:
./configure
Please specify the location of python. [Default is /anaconda/bin/python]: /anaconda/python.app/Contents/MacOS/python
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] n
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] n
No VERBS support will be enabled for TensorFlow
Found possible Python library paths:
/anaconda/python.app/Contents/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/anaconda/python.app/Contents/lib/python3.6/site-packages]
Using python library path: /anaconda/python.app/Contents/lib/python3.6/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] n
No CUDA support will be enabled for TensorFlow
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished
With bazel 0.4.5 I then try to do the build as in the instructions:
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
This executes without error, but it gives literally hundreds of warnings. I could provide some as examples, but there are hardly any snippets that compile without a warning.
I appreciate every bit of help; thank you all very much.
Unfortunately compiler warnings are a fact of life. However, many of these come from external libraries which are pulled into the build. These can be filtered out with the "output_filter" argument to Bazel:
bazel build --config=opt --output_filter='^//tensorflow' //tensorflow/tools/pip_package:build_pip_package
This limits output to warnings generated by TensorFlow code (you can also turn warnings off entirely this way, but that takes all the fun out of compiling). Since the tooling used to build matches what TensorFlow is developed with more closely, there are fewer warnings (I get some about multi-line comment continuations, a bunch of signed/unsigned integer comparisons, and some about variables which "may" be uninitialized).
None of these indicate definite bugs, just patterns of code which are sometimes bug-prone. If the compiler knew something was wrong, it would emit an error instead. Which is a long way of saying there's nothing to worry about.
So the documentation mentions that it is better to install from source, then build a pip package. Why is this recommended over doing a direct pip install using the wheel file provided on the downloads page (here)?
I tried the direct pip install and then ran some scripts in the inception folder. This results in errors with bazel not finding some of the dependencies. I am guessing this is related to not building TensorFlow from source, but I can't figure out why this is the case. Any pointers? Thanks!
Installing from pip is supported; can you provide more details on your OS and the specific errors you saw?
The main reason to build from source is simply performance.
Building and installing from source
The default TensorFlow binaries target the broadest range of hardware to make TensorFlow accessible to everyone. If using CPUs for training or inference, it is recommended to compile TensorFlow with all of the optimizations available for the CPU in use. Speedups for training and inference on CPU are documented in the guide's Comparing compiler optimizations section.
To install the most optimized version of TensorFlow, build and install from source. If there is a need to build TensorFlow on a platform that has different hardware than the target, then cross-compile with the highest optimizations for the target platform. The following command is an example of using bazel to compile for a specific platform:
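Such a command looks roughly like this; the -march value ("broadwell" here) is only an illustrative placeholder and should be set to the CPU you are actually targeting:
bazel build -c opt --copt=-march="broadwell" //tensorflow/tools/pip_package:build_pip_package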
ref: https://www.tensorflow.org/performance/performance_guide