I built a Python TensorFlow package and uploaded it to run on ML Engine.
My setup.py requires "tensorflow-gpu==1.8.0" (and not plain "tensorflow").
The ML Engine run fails at "import tensorflow as tf" with "No module named tensorflow".
The run works fine when I require only "tensorflow==1.8.0", but I believe tensorflow-gpu is needed to use the GPU.
Any ideas how to solve this issue?
Thanks
You need to set --runtime-version=1.8 when submitting the job. Consequently, you don't need to manually specify TF in setup.py. In fact, if that's the only package you are requiring, you can omit setup.py altogether.
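For reference, the submit command could look roughly like this (the job name, package path, bucket and region are placeholders; --scale-tier BASIC_GPU requests a GPU machine):

gcloud ml-engine jobs submit training my_training_job \
    --runtime-version 1.8 \
    --scale-tier BASIC_GPU \
    --module-name trainer.task \
    --package-path ./trainer \
    --job-dir gs://my-bucket/output \
    --region us-central1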
Update 2018/06/29:
Explanation: different versions of TensorFlow require different versions of NVIDIA's drivers and software stack. The --runtime-version is guaranteed to have the right version of the drivers for that particular version of TensorFlow. You can technically set the version of tensorflow-gpu in your setup.py, but that version must be compatible with the NVIDIA stack present in the --runtime-version you've selected (defaults to the very old TF 1.0).
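If you do keep a setup.py for other dependencies, a minimal sketch could look like the following (package name, version and the extra dependency are placeholders; remember that any tensorflow-gpu pin you add must be compatible with the --runtime-version you submit with):

from setuptools import find_packages, setup

setup(
    name="trainer",            # placeholder package name
    version="0.1",
    packages=find_packages(),
    # No TensorFlow entry here: the --runtime-version=1.8 image already provides
    # TF together with a matching NVIDIA driver stack. List only extra dependencies.
    install_requires=["pandas"],   # example extra dependency, adjust as needed
)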
This also happens when you have multiple Python versions. In that case you have to specify the relevant Python version for the TF installation, for example "python3 setup.py" instead of "python setup.py".
I'm currently working on a program to play a game similar to Atari games, using Keras (Python 3). I've finished writing the code and I want to test it, and I have a few questions about the process:
First of all, I have trouble importing tensorflow for some reason. I installed it using pip and made sure to create a new environment before the installation (which finished successfully), but when I try to run my program it says:
ModuleNotFoundError: No module named 'tensorflow'
I also tried to install the package from within PyCharm, but then I get this error:
Could not find a version that satisfies the requirement tensorflow (from versions: )
No matching distribution found for tensorflow
I've checked the program requirements (such as the pip, Python, virtualenv and setuptools versions) and everything seems up to date. Perhaps someone could point out what else might be the problem?
Is there any other way I can test the performance of my program?
Thank you very much for your time and attention.
Anaconda is a complete time-saver. I suggest creating an environment with Anaconda and installing TensorFlow via conda install tensorflow. If you would like to use the GPU version, conda automatically installs CUDA and cuDNN for you too.
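For example (the environment name and Python version are just placeholders):

conda create -n tf-env python=3.6
conda activate tf-env
conda install tensorflow          # or: conda install tensorflow-gpu for the GPU build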
I am trying to use the TensorFlow GPU version to train and test my deep learning model. But here comes the problem: when I train my model in one Python file, things go well and tensorflow-gpu works properly. Then I save my model as a pretrained graph.pb file and try to reuse it in another Python file.
Then I get the following error message:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN
library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major
and minor version needs to match or have higher minor version in case of
CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN
library. If building from sources, make sure the library loaded at runtime
is compatible with the version specified during compile configuration.
I checked my cuDNN version; in fact it is 7.4.2. I also checked my environment path settings: /cuda/v9.0/bin, /cuda/v9.0/lib/x64 and /cuda/v9.0/include are all in there.
So why does this happen, and how can I solve it?
--
cuda:v9.0
cudnn: 7.4.2 (I think; I copied those cudnn files manually)
windows 10
python: 3.5
If you have multiple cuDNN versions installed through various means (e.g. an Anaconda module and a Windows installation), you need to remove the older version so that your code detects the latest one, and then reinstall tensorflow-gpu.
You can follow this guide for installation based on OS.
I am wondering if there is a programmatic way to find out against which CUDA and CUDNN version an installed tensorflow version was built?
For example, a compiled Tensorflow installation can return which CXX11_ABI_FLAG was used while it was built:
python -c "import tensorflow as tf; print(tf.CXX11_ABI_FLAG)" ->0
The background is the following:
I'm building TensorFlow ops following "adding an op" with a binary TensorFlow installation. This uses the pre-compiled TF to retrieve the required include paths and compile flags, to make sure that TensorFlow and the new op are compatible.
But since our systems have multiple CUDA & cuDNN versions, I also need to pass the path of the desired versions to the compiler, e.g. for CUDA 8.0 the flag -L /usr/local/cuda-8.0/lib64/ to specify its path.
But this also opens up a source of errors, as the op can now be built successfully against a different CUDA/cuDNN version, which only leads to errors much later at run time.
So I want to create a safety check to ensure that the CUDA/cuDNN paths lead to the same versions as the ones TensorFlow was built against.
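I don't know of a single call that works across every release, but a sketch of such a check could look like this (which attributes exist varies by TensorFlow version, so both paths are attempted):

import tensorflow as tf

try:
    # Recent releases expose the build configuration directly.
    info = tf.sysconfig.get_build_info()
    print("CUDA:", info.get("cuda_version"), "cuDNN:", info.get("cudnn_version"))
except AttributeError:
    # Older GPU builds ship a build_info module instead; field names vary.
    from tensorflow.python.platform import build_info
    print(getattr(build_info, "build_info", vars(build_info)))

The reported strings can then be compared against the versions behind the -L paths passed to the compiler before building the op.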
1. Create a new environment through conda create --name tftest. (You can replace tftest with e.g. the name of your current project.)
2. Activate that new environment through activate tftest.
3. Install TF into this environment through conda install tensorflow.
4. Ensure that you're in the right environment through where python (which should produce a path containing "tftest").
5. Run Python through python.
6. import tensorflow as tf in a shell in that environment.
Thanks to this great community; I found this via another post!
Starting with version 1.6.0, prebuilt binaries require AVX instructions.
There are some bug reports from people who tried to use the precompiled binaries on CPUs that don't support AVX instructions and got the same error you posted:
https://github.com/tensorflow/tensorflow/issues/17761
https://github.com/tensorflow/tensorflow/issues/17386
Maybe you have this problem? If so, you may have to build TensorFlow from source or downgrade to TensorFlow 1.5.1.
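If you want to verify this, a quick Linux-only check of the CPU flags (reading /proc/cpuinfo, so it won't work on Windows/macOS) might be:

# Linux-only sketch: does the CPU advertise AVX support?
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print("AVX supported" if "avx" in cpuinfo else "No AVX: build from source or use TF <= 1.5.1")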
While trying to install TensorFlow in a Conda environment, I encountered the following error message and could make no further progress:
tensorflow-1.1.0-cp35-cp35m-win_amd64.whl is not a supported wheel on this platform
Have you tried uninstalling and re-installing TensorFlow using pip within your Conda environment? I.e.:
pip uninstall tensorflow
Followed by:
pip install tensorflow
If it doesn't work, the issue may be with your Python installation. TensorFlow only supports 64-bit Python 3.5+ on Windows (see more info here).
Perhaps you have Python's default installation, which comes in a 32-bit version. If that's the case, you can download 64-bit Python 3.5 or later from here to run in your Conda environment, and then you should be able to install and run TensorFlow without any issues.
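To quickly confirm which interpreter you are actually running inside the Conda environment (standard library only, no extra assumptions):

import platform, struct
print(platform.python_version())        # should be 3.5 or later
print(struct.calcsize("P") * 8, "bit")  # should print "64 bit" for a 64-bit Python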
Make sure that the Python version installed in the environment is 3.5, not 3.6. Since 3.6 was released, Conda automatically sets that version as the default for Python 3. However, it is still not supported by TensorFlow.
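For example, you can pin the interpreter when creating the environment (the environment name is just a placeholder):

conda create -n tf-env python=3.5
activate tf-env                  # "conda activate tf-env" on newer conda versions
pip install tensorflow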
You can work with the TensorFlow library along with other essential libraries using a Dockerfile. Using Docker for the environment is a good way to run experiments in a reproducible manner, as in this blog.
You can also try using datmo in order to set up the environment and track machine learning projects, making them reproducible with the datmo CLI tool.