I am trying to run to get R + deepwater + tensorflow to work on a MBP.
The following have been installed.
Python 3.6.1
TensorFlow 1.1
The Hello, TensorFlow example on the TensorFlow website is working fine.
R version 3.4.0
curl -O http://h2o-release.s3.amazonaws.com/h2o/master/3904/R/src/contrib/h2o_3.11.0.3904.tar.gz
R CMD INSTALL h2o_3.11.0.3904.tar.gz
curl -O http://s3.amazonaws.com/h2o-deepwater/public/nightly/latest/h2o_3.11.0.tar.gz
R CMD INSTALL h2o_3.11.0.tar.gz
I am trying run the following example provided on the h2o website.
require(h2o)
h2o.init()
train <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/mnist/train.csv.gz")
target <- "C785"
features <- setdiff(names(train), target)
train[target] <- as.factor(train[target])
model <- h2o.deepwater(x=features, y=target, training_frame=train, epochs=100, activation="Rectifier",
hidden=c(200,200), ignore_const_cols=FALSE, mini_batch_size=256, input_dropout_ratio=0.1,
hidden_dropout_ratios=c(0.5,0.5), stopping_rounds=3, stopping_tolerance=0.05,
stopping_metric="misclassification", score_interval=2, score_duty_cycle=0.5, score_training_samples=1000,
score_validation_samples=1000, nfolds=5, gpu=FALSE, seed=1234, backend="tensorflow")
The error I get is Error: java.lang.RuntimeException: Unable to initialize backend: Cannot find TensorFlow native library for OS: darwin, architecture: x86_64. Based on what I read on SO and the git page, I was under the impression that one does not need to build for the Mac platform.
One other thing that I tried was to use the info from https://github.com/rstudio/tensorflow. When I run install_tensorflow() I get Error: Prerequisites for installing TensorFlow not available. Please help!
We don't provide H2O+deepwater builds for MacOS. The one you downloaded is built for Linux machines (tested on Ubuntu), it's mentioned on the download page.
If you want to run it on MacOS you'd have to build both DeepWater and then H2O yourself.
Related
I got the following error when trying to train my tensorflow model on sagemaker ml.p2.xlarge instance. I use tensorflow==2.3.0. I wonder whether this is because of the tensorflow version incompatibility with cuda. sagemaker ml.p2.xlarge seems to use cuda 10.0
GPU error:
2020-08-31 08:46:46.429756: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.170819: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.764874: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
This question is probably old, but it falls back on an open issue found at the beginning of choosing which versions of frameworks to use.
The problem does not depend on the type of instance that you specified (which has NVidia GPU).
From the official documentation "Available Deep Learning Containers Images", to date 20/10/2022, precompiled versions higher than 2.2 do not seem to be usable:
Framework
Job Type
Horovod Options
CPU/GPU
Python Version Options
Example URL
TensorFlow 2.2 (Cuda 10.2)
training
Yes
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu102-ubuntu18.04
TensorFlow 2.2
inference
No
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.2.0-gpu-py37-cu102-ubuntu18.04
Within the dockerfile that is used to use the container is the instruction to install the libraries that your custom version is missing:
RUN apt-get update && apt-get install -y --no-install-recommends --allow-unauthenticated \
python3-dev \
python3-pip \
python3-setuptools \
ca-certificates \
cuda-command-line-tools-10-1 \
cuda-cudart-dev-10-1 \
cuda-cufft-dev-10-1 \
cuda-curand-dev-10-1 \
cuda-cusolver-dev-10-1 \
cuda-cusparse-dev-10-1 \
curl \
libcudnn7=7.6.2.24-1+cuda10.1 \
# TensorFlow doesn't require libnccl anymore but Open MPI still depends on it
libnccl2=2.4.7-1+cuda10.1 \
libgomp1 \
libnccl-dev=2.4.7-1+cuda10.1 \
....
Then you can install the required libraries from your custom version directly with a requirements.txt file or run the install command directly in the training script.
If there are no special project requirements, I recommend using the precompiled versions of sagemaker. Otherwise, build a docker image from scratch instead of installing libraries this way..
I'm looking the TF Lite Android App
Which can be found on GIT: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo
How can I compile the tensorflow lite framework to use the optimized "atom" cpu type?
Is it possible to compile it on a MAC os with the CPU optimizations for the "atom" cpu?
I want to run the app on an Android device (SDK 22) with an "Intel Atom" Processor.
When I run the application without any changes through Android Studio the rate was about 1200ms per frame.
Compering the same APK installed on my Galaxy S9 (arm - snapdragon processor) was about 30ms per frame.
In the "build.gradle" there is this section:
dependencies {
...
compile 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
...
}
So it's seems that it's downloading the framework,
How can I compile it locally with the CPU optimization and set the app to use it instead of downloading the non optimized nightly version?
I tried to run this tutorial :
Installing TensorFlow from Sources with the cpu flags but not sure exactly how it's helping me with the Android scenario..
Assuming that your Atom device is x86, use the --fat_apk_cpu flag to specify the x86 ABI:
$ bazel build -c opt --cxxopt='--std=c++11' \
--fat_apk_cpu=x86 \
//tensorflow/contrib/lite/java/demo/app/src/main:TfLiteCameraDemo
Switch x86 with x86_64 if you're building for a 64-bit device.
The built APK, available at bazel-bin/tensorflow/contrib/lite/java/demo/app/src/main/TfLiteCameraDemo.apk, will contain the x86 .so file:
$ zipinfo bazel-bin/tensorflow/contrib/lite/java/demo/app/src/main/TfLiteCameraDemo.apk | grep lib
-rw---- 2.0 fat 1434712 b- defN 80-Jan-01 00:00 lib/x86/libtensorflowlite_jni.so
If your device is connected, you can use bazel mobile-install instead of bazel build to directly install the app:
$ bazel mobile-install -c opt --cxxopt='--std=c++11' \
--fat_apk_cpu=x86 \
--start_app \
//tensorflow/contrib/lite/java/demo/app/src/main:TfLiteCameraDemo
The objective of my experiment is to build tensorflow on Jetson TK1 arm based embedded board. Since pre-builts of tensorflow for arm architecture are not given by the official releases, I was forced to the option of building it from source.
To build tensorflow, we need Bazel which should be also build from source. Now I got stuck here, not able to build bazel at all.
I have referred various blogs and github projects and tried to follow the instructions everyone said it worked for them.
1) Tensorflow on Raspberry-pi
2) Jetson Hacks building Tensorflow from source
3) Official Documentation
Steps Followed:
$ sudo apt-get install build-essential openjdk-8-jdk python zip
$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
$ unzip -d bazel bazel-0.4.5-dist.zip
$ cd bazel
$ sudo ./compile.sh
Error Log:
ERROR: /build/bazel/src/main/protobuf/BUILD:25:2: Java compilation in rule '//src/main/protobuf:extra_actions_base_java_proto' failed: Worker process sent response with exit code: 1.
java.lang.InternalError: Cannot find requested resource bundle for locale en_US
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:128)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:147)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:140)
at com.sun.tools.javac.util.Log.localize(Log.java:673)
at com.sun.tools.javac.util.Log.printLines(Log.java:485)
at com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:156)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:93)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:87)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:104)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$1.invokeJavac(SimpleJavaLibraryBuilder.java:163)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:52)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:166)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:178)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:90)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:44)
Caused by: java.util.MissingResourceException: Can't find bundle for base name com.google.errorprone.errors, locale en_US
at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:854)
at com.sun.tools.javac.util.JavacMessages.lambda$add$0(JavacMessages.java:106)
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:125)
... 15 more
Target //src:bazel failed to build
INFO: Elapsed time: 291.995s, Critical Path: 258.92s
ERROR: Could not build Bazel
To make sure the error is independent of the architecture, I have tried to build Bazel in x86_64 PC. Even there I am getting the same error. I have seen people created the similar issue in bazel github group, none solved.
Version 0.4.5 is very old. We just released 0.12.0, could you try that one?
I am learning tesnorflow from this blog:
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
The code i am running is :
https://github.com/dennybritz/cnn-text-classification-tf/blob/master/train.py
I have installed tensorflow from sourcse in a virtual enviroment,in CPU only enviroment using followinbg bazel build command: bazel build --config=mkl ...
here is the exact error:
"2018-01-16 03:15:27.783040: F tensorflow/core/kernels/mkl_maxpooling_op.cc:157] Check failed: dnnPoolingCreateForward_F32( &prim_pooling_fwd, primAttr, algorithm, lt_user_input, params.kernel_size, params.kernel_stride, params.in_offset, dnnBorderZerosAsymm) == E_SUCCESS (-127 vs. 0)
Aborted
"
I have debugged error to the line where sess.run is written, i have beleived it has something to do it mkl_maxpooling, as i had installed tensorflow with mkl optimization of INTEL cpu's
Given below are the steps that I followed:
Build tensorflow 1.4 from source with mkl as mentioned in the question
Cloned the git repo "https://github.com/dennybritz/cnn-text-classification-tf.git"
Ran "python train.py" from "cnn-text-classification-tf" directory(created from git clone)
Code ran without any errors. So it seems like the tensorflow was not properly built from the source. Please confirm that there were no errors while building tensorflow from source.
I successfully build bazel and tensorflow from the source code, but when using the tensorflow module I am getting the following error:
./new_python/bin/python
>>>import tensorflow as tf
Error MSG: File "/home/niraj/Ansible/new_python/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module> _pywrap_tensorflow = swig_import_helper()
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/niraj/Ansible/new_python/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)
I am using RHEL6 machine. Any idea how to fix this ?
I found two bug reports on github regarding this very problem
https://github.com/tensorflow/tensorflow/issues/110
https://github.com/bazelbuild/bazel/issues/760
At least I get the impression that getting tensorflow to work on RHEL 6 is at least 'difficult' - as some claim in those two bugreports that they got it to work, with some limitations - if not, at least for now, impossible.
At least for Ubuntu 12.04 and CentOS 6.7 there are solutions. The 2nd answer (mentions CentOS) should work on RHEL 6 as well.
Old/First answer:
According to the link I gathered from this answer, RHEL 6 ships with libc 2.12, not 2.14.
You would have to compile the tensorflow stuff again and link it to an existing libc 2.14 on your system. I'm not quite sure how you were able to compile it without already having libc 2.14 somewhere on your system.
What made the trick for me was updating glibc (in my case to 2.17 version) by:
wget http://copr-be.cloud.fedoraproject.org/results/mosquito/myrepo-el6/epel-6-x86_64/glibc-2.17-55.fc20/glibc-2.17-55.el6.x86_64.rpm
wget http://copr-be.cloud.fedoraproject.org/results/mosquito/myrepo-el6/epel-6-x86_64/glibc-2.17-55.fc20/glibc-common-2.17-55.el6.x86_64.rpm
wget http://copr-be.cloud.fedoraproject.org/results/mosquito/myrepo-el6/epel-6-x86_64/glibc-2.17-55.fc20/glibc-devel-2.17-55.el6.x86_64.rpm
wget http://copr-be.cloud.fedoraproject.org/results/mosquito/myrepo-el6/epel-6-x86_64/glibc-2.17-55.fc20/glibc-headers-2.17-55.el6.x86_64.rpm
sudo rpm -Uvh glibc-2.17-55.el6.x86_64.rpm \
glibc-common-2.17-55.el6.x86_64.rpm \
glibc-devel-2.17-55.el6.x86_64.rpm \
glibc-headers-2.17-55.el6.x86_64.rpm --force --nodeps
I link original answer