Problem
I am trying to create a conda environment and it seems that conda/mamba can't resolve tensorflow_probability dependencies with respect to tensorflow.
More context
I am trying to create a conda environment with the following requirements using mamba:
numpy
pandas
mitosheet
jupyter
tensorflow-gpu
tensorflow-probability
scikit-learn
matplotlib
scikit-survival
openpyxl
wandb
Everything works fine during the installation, and I end up with the following installed versions:
$ conda list tensorflow
# packages in environment at /home/luca/miniconda3/envs/liver:
#
# Name                    Version    Build                 Channel
tensorflow                2.4.1      gpu_py39h8236f22_0
tensorflow-base           2.4.1      gpu_py39h29c2da4_0
tensorflow-estimator      2.6.0      py39he80948d_0        conda-forge
tensorflow-gpu            2.4.1      h30adc30_0
tensorflow-probability    0.15.0     pyhd8ed1ab_0          conda-forge
However, when I try to use it, I get a version incompatibility error. Specifically, when I run:
import tensorflow as tf
import tensorflow_probability as tfp
I get this error:
2022-12-21 11:32:29.516053: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [1], in <cell line: 2>()
1 import tensorflow as tf
----> 2 import tensorflow_probability as tfp
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/__init__.py:20, in <module>
15 """Tools for probabilistic reasoning in TensorFlow."""
17 # Contributors to the `python/` dir should not alter this file; instead update
18 # `python/__init__.py` as necessary.
---> 20 from tensorflow_probability import substrates
21 # from tensorflow_probability.google import staging # DisableOnExport
22 # from tensorflow_probability.google import tfp_google # DisableOnExport
23 from tensorflow_probability.python import * # pylint: disable=wildcard-import
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/substrates/__init__.py:17, in <module>
1 # Copyright 2019 The TensorFlow Probability Authors.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
(...)
13 # limitations under the License.
14 # ============================================================================
15 """TensorFlow Probability alternative substrates."""
---> 17 from tensorflow_probability.python.internal import all_util
18 from tensorflow_probability.python.internal import lazy_loader # pylint: disable=g-direct-tensorflow-import
21 jax = lazy_loader.LazyLoader(
22 'jax', globals(),
23 'tensorflow_probability.substrates.jax')
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/python/__init__.py:138, in <module>
135 if _tf_loaded():
136 # Non-lazy load of packages that register with tensorflow or keras.
137 for pkg_name in _maybe_nonlazy_load:
--> 138 dir(globals()[pkg_name]) # Forces loading the package from its lazy loader.
141 all_util.remove_undocumented(__name__, _lazy_load + _maybe_nonlazy_load)
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/python/internal/lazy_loader.py:57, in LazyLoader.__dir__(self)
56 def __dir__(self):
---> 57 module = self._load()
58 return dir(module)
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/python/internal/lazy_loader.py:37, in LazyLoader._load(self)
35 """Load the module and insert it into the parent's globals."""
36 if callable(self._on_first_access):
---> 37 self._on_first_access()
38 self._on_first_access = None
39 # Import the target module and insert it into the parent's namespace
File ~/miniconda3/envs/liver/lib/python3.9/site-packages/tensorflow_probability/python/__init__.py:59, in _validate_tf_environment(package)
55 # required_tensorflow_version = '1.15' # Needed internally -- DisableOnExport
57 if (distutils.version.LooseVersion(tf.__version__) <
58 distutils.version.LooseVersion(required_tensorflow_version)):
---> 59 raise ImportError(
60 'This version of TensorFlow Probability requires TensorFlow '
61 'version >= {required}; Detected an installation of version {present}. '
62 'Please upgrade TensorFlow to proceed.'.format(
63 required=required_tensorflow_version,
64 present=tf.__version__))
66 if (package == 'mcmc' and
67 tf.config.experimental.tensor_float_32_execution_enabled()):
68 # Must import here, because symbols get pruned to __all__.
69 import warnings
ImportError: This version of TensorFlow Probability requires TensorFlow version >= 2.7; Detected an installation of version 2.4.1. Please upgrade TensorFlow to proceed.
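The check that raises this error is a plain version comparison, so the mismatch can be reproduced without importing either library. The following is a minimal sketch, not TFP's actual code: `parse_version` and `meets_minimum` are hypothetical helper names, and the only requirement it encodes is the one stated in the error message above (this TFP build needs TensorFlow >= 2.7).

```python
import re


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if `installed` is at least `required` (hypothetical helper)."""
    a, b = parse_version(installed), parse_version(required)
    # Pad the shorter tuple with zeros so that '2.7' and '2.7.0' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Versions taken from the error message and from the revision log in this post.
print(meets_minimum("2.4.1", "2.7"))   # False
print(meets_minimum("2.10.0", "2.7"))  # True
```

Comparing integer tuples rather than strings matters here: as plain strings, "2.10.0" would sort before "2.7".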
Solving attempts
Looking online, I found some related discussions in [1] and [2], which seem to point to a problem in how the dependencies of the tensorflow and tensorflow_probability packages are resolved in the conda repositories (unfortunately, I am not competent enough to fully understand the whole thread).
The conda-forge documentation seems to suggest the same conclusion, with an additional reference to cudatoolkit.
From here, I tried to:
upgrade tensorflow:
$ CONDA_OVERRIDE_CUDA="11.2" mamba install -c conda-forge tensorflow-gpu==2.7=cuda112*
Looking for: ['tensorflow-gpu==2.7[build=cuda112*]']
pkgs/r/linux-64 No change
pkgs/main/noarch No change
pkgs/main/linux-64 No change
pkgs/r/noarch No change
conda-forge/noarch # 3.4MB/s 3.5s
conda-forge/linux-64 # 3.3MB/s 9.8s
Pinned packages:
- python 3.9.*
Could not solve for environment specs
Encountered problems while solving:
- package libxml2-2.10.3-h7463322_0 requires icu >=70.1,<71.0a0, but none of the providers can be installed
The environment can't be solved, aborting the operation
I also tried with conda directly but this is what I get:
$ CONDA_OVERRIDE_CUDA="11.2" conda install -c conda-forge tensorflow-gpu==2.7=cuda112*
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: - /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
The following specifications were found to be incompatible with your system:
- feature:/linux-64::__cuda==11.2=0
- feature:|#/linux-64::__cuda==11.2=0
- tensorflow-gpu==2.7[build=cuda112*] -> tensorflow==2.7.0=cuda112py37h01c6645_0 -> __cuda
Your installed version is: 11.2
Additional info
I installed the packages one by one manually using Miniconda with conda==4.12.0 and running mamba install (mamba=1.1.0). I followed the order in the above requirements.
If I check the evolution of versions at each step with conda list --revisions, I can see that tensorflow and tensorflow_probability were first resolved correctly: tensorflow==2.10.0 and tensorflow_probability=0.19.0 were installed (see rev 1 and rev 2 in the output below).
However, I guess the environment was broken when installing matplotlib (see rev 4), where tensorflow was downgraded to 2.4.1 and tensorflow_probability to 0.15.0.
Here's the full output, where rev 0 comes from cloning another environment:
2022-12-20 17:16:58 (rev 0)
+_libgcc_mutex-0.1 (conda-forge/linux-64)
+_openmp_mutex-4.5 (conda-forge/linux-64)
+alsa-lib-1.2.8 (conda-forge/linux-64)
+anyio-3.5.0 (anaconda/linux-64)
+argon2-cffi-21.3.0 (anaconda/noarch)
+argon2-cffi-bindings-21.2.0 (anaconda/linux-64)
+asttokens-2.0.5 (anaconda/noarch)
+attr-2.5.1 (conda-forge/linux-64)
+attrs-21.4.0 (anaconda/noarch)
+autopep8-2.0.1 (conda-forge/noarch)
+babel-2.9.1 (anaconda/noarch)
+backcall-0.2.0 (anaconda/noarch)
+beautifulsoup4-4.11.1 (anaconda/linux-64)
+blas-1.0 (defaults/linux-64)
+bleach-4.1.0 (anaconda/noarch)
+bottleneck-1.3.5 (defaults/linux-64)
+brotlipy-0.7.0 (defaults/linux-64)
+bzip2-1.0.8 (defaults/linux-64)
+c-ares-1.18.1 (defaults/linux-64)
+ca-certificates-2022.12.7 (conda-forge/linux-64)
+certifi-2022.12.7 (conda-forge/noarch)
+cffi-1.15.1 (defaults/linux-64)
+charset-normalizer-2.0.4 (defaults/noarch)
+conda-package-handling-1.9.0 (defaults/linux-64)
+cryptography-38.0.1 (defaults/linux-64)
+dbus-1.13.18 (anaconda/linux-64)
+debugpy-1.5.1 (anaconda/linux-64)
+decorator-5.1.1 (anaconda/noarch)
+defusedxml-0.7.1 (anaconda/noarch)
+entrypoints-0.4 (anaconda/linux-64)
+executing-0.8.3 (anaconda/noarch)
+expat-2.5.0 (conda-forge/linux-64)
+fftw-3.3.10 (conda-forge/linux-64)
+fmt-9.1.0 (conda-forge/linux-64)
+font-ttf-dejavu-sans-mono-2.37 (conda-forge/noarch)
+font-ttf-inconsolata-3.000 (conda-forge/noarch)
+font-ttf-source-code-pro-2.038 (conda-forge/noarch)
+font-ttf-ubuntu-0.83 (conda-forge/noarch)
+fontconfig-2.14.1 (conda-forge/linux-64)
+fonts-conda-ecosystem-1 (conda-forge/noarch)
+fonts-conda-forge-1 (conda-forge/noarch)
+freetype-2.12.1 (conda-forge/linux-64)
+gettext-0.21.1 (conda-forge/linux-64)
+giflib-5.2.1 (anaconda/linux-64)
+glib-2.74.1 (conda-forge/linux-64)
+glib-tools-2.74.1 (conda-forge/linux-64)
+gst-plugins-base-1.21.2 (conda-forge/linux-64)
+gstreamer-1.21.2 (conda-forge/linux-64)
+gstreamer-orc-0.4.33 (conda-forge/linux-64)
+icu-70.1 (conda-forge/linux-64)
+idna-3.4 (defaults/linux-64)
+intel-openmp-2021.4.0 (defaults/linux-64)
+ipykernel-6.9.1 (anaconda/linux-64)
+ipython-8.4.0 (anaconda/linux-64)
+ipython_genutils-0.2.0 (anaconda/noarch)
+ipywidgets-7.6.5 (anaconda/noarch)
+jack-1.9.21 (conda-forge/linux-64)
+jedi-0.18.1 (anaconda/linux-64)
+jinja2-3.0.3 (anaconda/noarch)
+jpeg-9e (anaconda/linux-64)
+json5-0.9.6 (anaconda/noarch)
+jsonschema-4.4.0 (anaconda/linux-64)
+jupyter-1.0.0 (anaconda/linux-64)
+jupyter_client-7.2.2 (anaconda/linux-64)
+jupyter_console-6.4.3 (anaconda/noarch)
+jupyter_contrib_core-0.4.0 (conda-forge/noarch)
+jupyter_contrib_nbextensions-0.7.0 (conda-forge/noarch)
+jupyter_core-4.10.0 (anaconda/linux-64)
+jupyter_highlight_selected_word-0.2.0 (conda-forge/linux-64)
+jupyter_latex_envs-1.4.6 (conda-forge/linux-64)
+jupyter_nbextensions_configurator-0.6.1 (conda-forge/noarch)
+jupyter_server-1.18.1 (anaconda/linux-64)
+jupyterlab-3.4.4 (anaconda/linux-64)
+jupyterlab_pygments-0.1.2 (anaconda/noarch)
+jupyterlab_server-2.12.0 (anaconda/linux-64)
+jupyterlab_widgets-1.0.0 (anaconda/noarch)
+keyutils-1.6.1 (conda-forge/linux-64)
+krb5-1.20.1 (conda-forge/linux-64)
+lame-3.100 (conda-forge/linux-64)
+ld_impl_linux-64-2.38 (defaults/linux-64)
+lerc-4.0.0 (conda-forge/linux-64)
+libarchive-3.5.2 (conda-forge/linux-64)
+libcap-2.66 (conda-forge/linux-64)
+libclang-15.0.6 (conda-forge/linux-64)
+libclang13-15.0.6 (conda-forge/linux-64)
+libcups-2.3.3 (conda-forge/linux-64)
+libcurl-7.86.0 (conda-forge/linux-64)
+libdb-6.2.32 (conda-forge/linux-64)
+libdeflate-1.14 (conda-forge/linux-64)
+libedit-3.1.20221030 (defaults/linux-64)
+libev-4.33 (defaults/linux-64)
+libevent-2.1.10 (conda-forge/linux-64)
+libffi-3.4.2 (conda-forge/linux-64)
+libflac-1.4.2 (conda-forge/linux-64)
+libgcc-ng-12.2.0 (conda-forge/linux-64)
+libgcrypt-1.10.1 (conda-forge/linux-64)
+libgfortran-ng-12.2.0 (conda-forge/linux-64)
+libgfortran5-12.2.0 (conda-forge/linux-64)
+libglib-2.74.1 (conda-forge/linux-64)
+libgomp-12.2.0 (conda-forge/linux-64)
+libgpg-error-1.45 (conda-forge/linux-64)
+libiconv-1.17 (conda-forge/linux-64)
+libllvm10-10.0.1 (anaconda/linux-64)
+libllvm15-15.0.6 (conda-forge/linux-64)
+libmamba-1.1.0 (conda-forge/linux-64)
+libmambapy-1.1.0 (conda-forge/linux-64)
+libnghttp2-1.47.0 (conda-forge/linux-64)
+libnsl-2.0.0 (conda-forge/linux-64)
+libogg-1.3.4 (conda-forge/linux-64)
+libopus-1.3.1 (conda-forge/linux-64)
+libpng-1.6.39 (conda-forge/linux-64)
+libpq-15.1 (conda-forge/linux-64)
+libsndfile-1.1.0 (conda-forge/linux-64)
+libsodium-1.0.18 (anaconda/linux-64)
+libsolv-0.7.22 (defaults/linux-64)
+libsqlite-3.40.0 (conda-forge/linux-64)
+libssh2-1.10.0 (conda-forge/linux-64)
+libstdcxx-ng-12.2.0 (conda-forge/linux-64)
+libsystemd0-252 (conda-forge/linux-64)
+libtiff-4.5.0 (conda-forge/linux-64)
+libtool-2.4.6 (conda-forge/linux-64)
+libudev1-252 (conda-forge/linux-64)
+libuuid-2.32.1 (conda-forge/linux-64)
+libvorbis-1.3.7 (conda-forge/linux-64)
+libwebp-1.2.4 (conda-forge/linux-64)
+libwebp-base-1.2.4 (conda-forge/linux-64)
+libxcb-1.13 (conda-forge/linux-64)
+libxkbcommon-1.0.3 (conda-forge/linux-64)
+libxml2-2.10.3 (conda-forge/linux-64)
+libxslt-1.1.37 (conda-forge/linux-64)
+libzlib-1.2.13 (conda-forge/linux-64)
+lxml-4.9.2 (conda-forge/linux-64)
+lz4-c-1.9.4 (defaults/linux-64)
+lzo-2.10 (conda-forge/linux-64)
+markupsafe-2.1.1 (anaconda/linux-64)
+matplotlib-inline-0.1.2 (anaconda/noarch)
+mistune-0.8.4 (anaconda/linux-64)
+mkl-2021.4.0 (defaults/linux-64)
+mkl-service-2.4.0 (defaults/linux-64)
+mkl_fft-1.3.1 (defaults/linux-64)
+mkl_random-1.2.2 (defaults/linux-64)
+mpg123-1.31.1 (conda-forge/linux-64)
+mysql-common-8.0.31 (conda-forge/linux-64)
+mysql-libs-8.0.31 (conda-forge/linux-64)
+nb_conda_kernels-2.3.1 (conda-forge/linux-64)
+nbclassic-0.3.5 (anaconda/noarch)
+nbclient-0.5.13 (anaconda/linux-64)
+nbconvert-6.4.4 (anaconda/linux-64)
+nbformat-5.3.0 (anaconda/linux-64)
+ncurses-6.3 (defaults/linux-64)
+nest-asyncio-1.5.5 (anaconda/linux-64)
+notebook-6.4.12 (anaconda/linux-64)
+nspr-4.35 (conda-forge/linux-64)
+nss-3.82 (conda-forge/linux-64)
+numexpr-2.8.4 (defaults/linux-64)
+numpy-1.23.4 (defaults/linux-64)
+numpy-base-1.23.4 (defaults/linux-64)
+openssl-3.0.7 (conda-forge/linux-64)
+packaging-21.3 (anaconda/noarch)
+pandas-1.5.2 (defaults/linux-64)
+pandocfilters-1.5.0 (anaconda/noarch)
+parso-0.8.3 (anaconda/noarch)
+pcre-8.45 (anaconda/linux-64)
+pcre2-10.37 (defaults/linux-64)
+pexpect-4.8.0 (anaconda/noarch)
+pickleshare-0.7.5 (anaconda/noarch)
+pip-22.3.1 (conda-forge/noarch)
+ply-3.11 (anaconda/linux-64)
+prometheus_client-0.14.1 (anaconda/linux-64)
+prompt-toolkit-3.0.20 (anaconda/noarch)
+prompt_toolkit-3.0.20 (anaconda/noarch)
+pthread-stubs-0.4 (conda-forge/linux-64)
+ptyprocess-0.7.0 (anaconda/noarch)
+pulseaudio-16.1 (conda-forge/linux-64)
+pure_eval-0.2.2 (anaconda/noarch)
+pybind11-abi-4 (conda-forge/noarch)
+pycodestyle-2.10.0 (conda-forge/noarch)
+pycosat-0.6.4 (defaults/linux-64)
+pycparser-2.21 (defaults/noarch)
+pygments-2.11.2 (anaconda/noarch)
+pyopenssl-22.0.0 (defaults/noarch)
+pyparsing-3.0.4 (anaconda/noarch)
+pyqt-5.15.7 (conda-forge/linux-64)
+pyqt5-sip-12.11.0 (conda-forge/linux-64)
+pyrsistent-0.18.0 (anaconda/linux-64)
+pysocks-1.7.1 (defaults/linux-64)
+python-3.9.15 (conda-forge/linux-64)
+python-dateutil-2.8.2 (anaconda/noarch)
+python-fastjsonschema-2.15.1 (anaconda/noarch)
+python_abi-3.9 (conda-forge/linux-64)
+pytz-2022.1 (anaconda/linux-64)
+pyyaml-6.0 (conda-forge/linux-64)
+pyzmq-23.2.0 (anaconda/linux-64)
+qt-main-5.15.6 (conda-forge/linux-64)
+qt-webengine-5.15.4 (conda-forge/linux-64)
+qtconsole-5.3.1 (anaconda/linux-64)
+qtpy-2.0.1 (anaconda/noarch)
+qtwebkit-5.212 (conda-forge/linux-64)
+readline-8.2 (defaults/linux-64)
+reproc-14.2.4 (defaults/linux-64)
+reproc-cpp-14.2.4 (defaults/linux-64)
+requests-2.28.1 (defaults/linux-64)
+ruamel_yaml-0.15.100 (defaults/linux-64)
+send2trash-1.8.0 (anaconda/noarch)
+setuptools-65.5.0 (defaults/linux-64)
+sip-6.7.5 (conda-forge/linux-64)
+six-1.16.0 (anaconda/noarch)
+sniffio-1.2.0 (anaconda/linux-64)
+soupsieve-2.3.1 (anaconda/noarch)
+sqlite-3.40.0 (defaults/linux-64)
+stack_data-0.2.0 (anaconda/noarch)
+terminado-0.13.1 (anaconda/linux-64)
+testpath-0.6.0 (anaconda/linux-64)
+tk-8.6.12 (defaults/linux-64)
+toml-0.10.2 (anaconda/noarch)
+tomli-2.0.1 (conda-forge/noarch)
+tornado-6.1 (anaconda/linux-64)
+tqdm-4.64.1 (defaults/linux-64)
+traitlets-5.1.1 (anaconda/noarch)
+typing-extensions-4.3.0 (anaconda/linux-64)
+typing_extensions-4.3.0 (anaconda/linux-64)
+tzdata-2022g (defaults/noarch)
+urllib3-1.26.13 (defaults/linux-64)
+wcwidth-0.2.5 (anaconda/noarch)
+webencodings-0.5.1 (anaconda/linux-64)
+websocket-client-0.58.0 (anaconda/linux-64)
+wheel-0.38.4 (conda-forge/noarch)
+widgetsnbextension-3.5.2 (anaconda/linux-64)
+xcb-util-0.4.0 (conda-forge/linux-64)
+xcb-util-image-0.4.0 (conda-forge/linux-64)
+xcb-util-keysyms-0.4.0 (conda-forge/linux-64)
+xcb-util-renderutil-0.3.9 (conda-forge/linux-64)
+xcb-util-wm-0.4.1 (conda-forge/linux-64)
+xorg-libxau-1.0.9 (conda-forge/linux-64)
+xorg-libxdmcp-1.1.3 (conda-forge/linux-64)
+xz-5.2.8 (defaults/linux-64)
+yaml-0.2.5 (defaults/linux-64)
+yaml-cpp-0.7.0 (conda-forge/linux-64)
+yapf-0.32.0 (conda-forge/noarch)
+zeromq-4.3.4 (anaconda/linux-64)
+zlib-1.2.13 (conda-forge/linux-64)
+zstd-1.5.2 (defaults/linux-64)
2022-12-20 17:18:38 (rev 1)
krb5 {1.20.1 (conda-forge/linux-64) -> 1.20.1 (conda-forge/linux-64)}
libarchive {3.5.2 (conda-forge/linux-64) -> 3.5.2 (conda-forge/linux-64)}
libcurl {7.86.0 (conda-forge/linux-64) -> 7.86.0 (conda-forge/linux-64)}
libevent {2.1.10 (conda-forge/linux-64) -> 2.1.10 (conda-forge/linux-64)}
libmamba {1.1.0 (conda-forge/linux-64) -> 1.1.0 (conda-forge/linux-64)}
libmambapy {1.1.0 (conda-forge/linux-64) -> 1.1.0 (conda-forge/linux-64)}
libnghttp2 {1.47.0 (conda-forge/linux-64) -> 1.47.0 (conda-forge/linux-64)}
libpq {15.1 (conda-forge/linux-64) -> 15.1 (conda-forge/linux-64)}
libssh2 {1.10.0 (conda-forge/linux-64) -> 1.10.0 (conda-forge/linux-64)}
mysql-common {8.0.31 (conda-forge/linux-64) -> 8.0.31 (conda-forge/linux-64)}
mysql-libs {8.0.31 (conda-forge/linux-64) -> 8.0.31 (conda-forge/linux-64)}
openssl {3.0.7 (conda-forge/linux-64) -> 1.1.1s (conda-forge/linux-64)}
pulseaudio {16.1 (conda-forge/linux-64) -> 16.1 (conda-forge/linux-64)}
python {3.9.15 (conda-forge/linux-64) -> 3.9.15 (conda-forge/linux-64)}
qt-main {5.15.6 (conda-forge/linux-64) -> 5.15.6 (conda-forge/linux-64)}
+absl-py-1.3.0 (conda-forge/noarch)
+aiohttp-3.8.3 (conda-forge/linux-64)
+aiosignal-1.3.1 (conda-forge/noarch)
+astunparse-1.6.3 (conda-forge/noarch)
+async-timeout-4.0.2 (conda-forge/noarch)
+blinker-1.5 (conda-forge/noarch)
+cached-property-1.5.2 (conda-forge/noarch)
+cached_property-1.5.2 (conda-forge/noarch)
+cachetools-5.2.0 (conda-forge/noarch)
+click-8.1.3 (conda-forge/noarch)
+cudatoolkit-11.8.0 (conda-forge/linux-64)
+cudnn-8.4.1.50 (conda-forge/linux-64)
+flatbuffers-2.0.8 (conda-forge/linux-64)
+frozenlist-1.3.3 (conda-forge/linux-64)
+gast-0.4.0 (conda-forge/noarch)
+google-auth-2.15.0 (conda-forge/noarch)
+google-auth-oauthlib-0.4.6 (conda-forge/noarch)
+google-pasta-0.2.0 (conda-forge/noarch)
+grpc-cpp-1.47.1 (conda-forge/linux-64)
+grpcio-1.47.1 (conda-forge/linux-64)
+h5py-3.7.0 (conda-forge/linux-64)
+hdf5-1.12.2 (conda-forge/linux-64)
+importlib-metadata-5.2.0 (conda-forge/noarch)
+keras-2.10.0 (conda-forge/noarch)
+keras-preprocessing-1.1.2 (conda-forge/noarch)
+libabseil-20220623.0 (conda-forge/linux-64)
+libblas-3.9.0 (conda-forge/linux-64)
+libcblas-3.9.0 (conda-forge/linux-64)
+liblapack-3.9.0 (conda-forge/linux-64)
+libprotobuf-3.21.11 (conda-forge/linux-64)
+markdown-3.4.1 (conda-forge/noarch)
+multidict-6.0.2 (conda-forge/linux-64)
+nccl-2.14.3.1 (conda-forge/linux-64)
+oauthlib-3.2.2 (conda-forge/noarch)
+opt_einsum-3.3.0 (conda-forge/noarch)
+protobuf-4.21.11 (conda-forge/linux-64)
+pyasn1-0.4.8 (conda-forge/noarch)
+pyasn1-modules-0.2.7 (conda-forge/noarch)
+pyjwt-2.6.0 (conda-forge/noarch)
+python-flatbuffers-22.12.6 (conda-forge/noarch)
+pyu2f-0.1.5 (conda-forge/noarch)
+re2-2022.06.01 (conda-forge/linux-64)
+requests-oauthlib-1.3.1 (conda-forge/noarch)
+rsa-4.9 (conda-forge/noarch)
+scipy-1.9.3 (conda-forge/linux-64)
+snappy-1.1.9 (conda-forge/linux-64)
+tensorboard-2.10.1 (conda-forge/noarch)
+tensorboard-data-server-0.6.1 (conda-forge/linux-64)
+tensorboard-plugin-wit-1.8.1 (conda-forge/noarch)
+tensorflow-2.10.0 (conda-forge/linux-64)
+tensorflow-base-2.10.0 (conda-forge/linux-64)
+tensorflow-estimator-2.10.0 (conda-forge/linux-64)
+tensorflow-gpu-2.10.0 (conda-forge/linux-64)
+termcolor-2.1.1 (conda-forge/noarch)
+werkzeug-2.2.2 (conda-forge/noarch)
+wrapt-1.14.1 (conda-forge/linux-64)
+yarl-1.8.1 (conda-forge/linux-64)
+zipp-3.11.0 (conda-forge/noarch)
2022-12-20 17:20:43 (rev 2)
+cloudpickle-2.2.0 (conda-forge/noarch)
+dm-tree-0.1.7 (conda-forge/linux-64)
+etils-0.9.0 (conda-forge/noarch)
+importlib_resources-5.10.1 (conda-forge/noarch)
+jax-0.3.17 (conda-forge/noarch)
+jaxlib-0.3.15 (conda-forge/linux-64)
+tensorflow-probability-0.19.0 (conda-forge/noarch)
2022-12-20 17:21:31 (rev 3)
+joblib-1.1.0 (anaconda/noarch)
+scikit-learn-1.1.1 (anaconda/linux-64)
+threadpoolctl-2.2.0 (anaconda/noarch)
2022-12-20 17:24:10 (rev 4)
cudatoolkit {11.8.0 (conda-forge/linux-64) -> 10.1.243 (conda-forge/linux-64)}
cudnn {8.4.1.50 (conda-forge/linux-64) -> 7.6.5.32 (conda-forge/linux-64)}
grpc-cpp {1.47.1 (conda-forge/linux-64) -> 1.51.1 (conda-forge/linux-64)}
grpcio {1.47.1 (conda-forge/linux-64) -> 1.51.1 (conda-forge/linux-64)}
h5py {3.7.0 (conda-forge/linux-64) -> 2.10.0 (conda-forge/linux-64)}
hdf5 {1.12.2 (conda-forge/linux-64) -> 1.10.6 (defaults/linux-64)}
jax {0.3.17 (conda-forge/noarch) -> 0.4.1 (conda-forge/noarch)}
jaxlib {0.3.15 (conda-forge/linux-64) -> 0.4.1 (conda-forge/linux-64)}
keras {2.10.0 (conda-forge/noarch) -> 2.4.3 (conda-forge/noarch)}
krb5 {1.20.1 (conda-forge/linux-64) -> 1.20.1 (conda-forge/linux-64)}
libarchive {3.5.2 (conda-forge/linux-64) -> 3.5.2 (conda-forge/linux-64)}
libcurl {7.86.0 (conda-forge/linux-64) -> 7.86.0 (conda-forge/linux-64)}
libevent {2.1.10 (conda-forge/linux-64) -> 2.1.10 (conda-forge/linux-64)}
libmamba {1.1.0 (conda-forge/linux-64) -> 1.1.0 (conda-forge/linux-64)}
libmambapy {1.1.0 (conda-forge/linux-64) -> 1.1.0 (conda-forge/linux-64)}
libnghttp2 {1.47.0 (conda-forge/linux-64) -> 1.47.0 (conda-forge/linux-64)}
libpq {15.1 (conda-forge/linux-64) -> 15.1 (conda-forge/linux-64)}
libssh2 {1.10.0 (conda-forge/linux-64) -> 1.10.0 (conda-forge/linux-64)}
mysql-common {8.0.31 (conda-forge/linux-64) -> 8.0.31 (conda-forge/linux-64)}
mysql-libs {8.0.31 (conda-forge/linux-64) -> 8.0.31 (conda-forge/linux-64)}
nccl {2.14.3.1 (conda-forge/linux-64) -> 2.11.4.1 (conda-forge/linux-64)}
openssl {1.1.1s (conda-forge/linux-64) -> 3.0.7 (conda-forge/linux-64)}
pulseaudio {16.1 (conda-forge/linux-64) -> 16.1 (conda-forge/linux-64)}
python {3.9.15 (conda-forge/linux-64) -> 3.9.15 (conda-forge/linux-64)}
qt-main {5.15.6 (conda-forge/linux-64) -> 5.15.6 (conda-forge/linux-64)}
tensorboard-data-server {0.6.1 (conda-forge/linux-64) -> 0.6.1 (conda-forge/linux-64)}
tensorflow {2.10.0 (conda-forge/linux-64) -> 2.4.1 (defaults/linux-64)}
tensorflow-base {2.10.0 (conda-forge/linux-64) -> 2.4.1 (defaults/linux-64)}
tensorflow-estimator {2.10.0 (conda-forge/linux-64) -> 2.6.0 (conda-forge/linux-64)}
tensorflow-gpu {2.10.0 (conda-forge/linux-64) -> 2.4.1 (defaults/linux-64)}
tensorflow-probability {0.19.0 (conda-forge/noarch) -> 0.15.0 (conda-forge/noarch)}
+_tflow_select-2.1.0 (defaults/linux-64)
+astor-0.8.1 (conda-forge/noarch)
+brotli-1.0.9 (conda-forge/linux-64)
+brotli-bin-1.0.9 (conda-forge/linux-64)
+contourpy-1.0.6 (conda-forge/linux-64)
+cupti-10.1.168 (defaults/linux-64)
+cycler-0.11.0 (conda-forge/noarch)
+fonttools-4.38.0 (conda-forge/linux-64)
+kiwisolver-1.4.4 (conda-forge/linux-64)
+lcms2-2.14 (conda-forge/linux-64)
+libbrotlicommon-1.0.9 (conda-forge/linux-64)
+libbrotlidec-1.0.9 (conda-forge/linux-64)
+libbrotlienc-1.0.9 (conda-forge/linux-64)
+libgrpc-1.51.1 (conda-forge/linux-64)
+matplotlib-3.6.2 (conda-forge/linux-64)
+matplotlib-base-3.6.2 (conda-forge/linux-64)
+munkres-1.1.4 (conda-forge/noarch)
+openjpeg-2.5.0 (conda-forge/linux-64)
+pillow-9.2.0 (conda-forge/linux-64)
+unicodedata2-15.0.0 (conda-forge/linux-64)
2022-12-20 17:25:29 (rev 5)
scikit-learn {1.1.1 (anaconda/linux-64) -> 1.1.3 (conda-forge/linux-64)}
+ecos-2.0.11 (conda-forge/linux-64)
+libqdldl-0.1.5 (conda-forge/linux-64)
+osqp-0.6.2.post0 (conda-forge/linux-64)
+qdldl-python-0.1.5.post2 (conda-forge/linux-64)
+scikit-survival-0.19.0.post1 (conda-forge/linux-64)
2022-12-20 17:34:12 (rev 6)
+docker-pycreds-0.4.0 (conda-forge/noarch)
+gitdb-4.0.10 (conda-forge/noarch)
+gitpython-3.1.29 (conda-forge/noarch)
+pathtools-0.1.2 (conda-forge/noarch)
+promise-2.3 (conda-forge/linux-64)
+psutil-5.9.4 (conda-forge/linux-64)
+sentry-sdk-1.12.1 (conda-forge/noarch)
+setproctitle-1.3.2 (conda-forge/linux-64)
+shortuuid-1.0.11 (conda-forge/noarch)
+smmap-3.0.5 (conda-forge/noarch)
+wandb-0.13.7 (conda-forge/noarch)
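A revision log of this length is easier to inspect with a small script. The sketch below is only an illustration under the assumption that the log has the line format quoted in this post (a `(rev N)` header followed by `name {old (channel) -> new (channel)}` lines); `package_changes` is a hypothetical helper, not part of conda. Applied to the rev 4 lines, it shows that tensorflow changed both version and channel in the same step.

```python
import re

# Header such as "2022-12-20 17:24:10 (rev 4)".
REV_HEADER = re.compile(r"\(rev (\d+)\)\s*$")
# Change line such as "tensorflow {2.10.0 (conda-forge/linux-64) -> 2.4.1 (defaults/linux-64)}".
CHANGE = re.compile(
    r"^\s*(?P<name>\S+)\s+\{(?P<old>\S+) \((?P<old_src>[^)]+)\) -> "
    r"(?P<new>\S+) \((?P<new_src>[^)]+)\)\}\s*$"
)


def package_changes(log_text, package):
    """Return (rev, old_version, old_channel, new_version, new_channel) tuples."""
    rev = None
    changes = []
    for line in log_text.splitlines():
        header = REV_HEADER.search(line)
        if header:
            rev = int(header.group(1))
            continue
        match = CHANGE.match(line)
        if match and match.group("name") == package:
            changes.append((rev, match.group("old"), match.group("old_src"),
                            match.group("new"), match.group("new_src")))
    return changes


# A few lines copied from the log above.
sample = """\
2022-12-20 17:24:10 (rev 4)
tensorflow {2.10.0 (conda-forge/linux-64) -> 2.4.1 (defaults/linux-64)}
tensorflow-probability {0.19.0 (conda-forge/noarch) -> 0.15.0 (conda-forge/noarch)}
+matplotlib-3.6.2 (conda-forge/linux-64)
"""

for change in package_changes(sample, "tensorflow"):
    print(change)
# (4, '2.10.0', 'conda-forge/linux-64', '2.4.1', 'defaults/linux-64')
```

Lines where only the build changed (same version on both sides) are also returned, so filtering on `old != new` may be useful when scanning the full log.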
Edit: the imports apparently work when the order is reversed:
import tensorflow_probability as tfp
import tensorflow as tf
print(f"{tf.__version__=}, {tfp.__version__=}")
gives tf.__version__='2.4.1', tfp.__version__='0.15.0'
In the end, I solved it by creating the conda environment and then installing all packages directly via pip, as per the TensorFlow documentation.
IMPORTANT: as far as I understand, the general advice is to install as many packages as possible using conda and then use pip for the remaining unresolved dependencies. After that, the environment should be frozen, and a new environment should be created when new packages need to be installed.
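One way to record the state of a working environment before freezing it is to snapshot the installed versions from inside Python. This is a minimal sketch using only the standard library; `snapshot_environment` is a hypothetical helper name. It only sees Python distributions that ship metadata, so non-Python conda packages such as cudatoolkit are not listed; `conda env export` covers those on the conda side.

```python
from importlib import metadata


def snapshot_environment():
    """Return sorted 'name==version' lines for the installed Python distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print one pin per line, in a requirements-style format.
    print("\n".join(snapshot_environment()))
```

Saving this output next to the project makes it possible to compare a later, possibly broken, environment against the last known-good one.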
Related
I have one problem: my Jupyter notebook does not run on the GPU. I updated my driver (Nvidia GTX 1660 Ti), installed CUDA 11, copied the cuDNN files into the folders, and set the correct path in the environment variables.
After doing that, I added a new environment to Anaconda including a GPU kernel and installed tensorflow-gpu (version 2.4, because CUDA 11 needs version >= 2.4.0), as explained in this video.
After that, I opened Jupyter Notebook with the new kernel. My code runs up to a certain step, but GPU utilization in the Task Manager is below 1% and my RAM is at 60%-99%, so I think my code isn't running on the GPU. I ran a few tests:
import tensorflow.keras
import tensorflow as tf
print(tf.__version__)
print(tensorflow.keras.__version__)
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_gpu_available())
This prints the following (which I think is correct):
2.4.0
2.4.0
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True
The next test is:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Which gives:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9334837591848971536
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4837251481
locality {
bus_id: 1
links {
}
}
incarnation: 2660164806064353779
physical_device_desc: "device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
So both the CPU and the GPU are visible in that kernel, aren't they?
What can I do to make my neural network run on the GPU and not on the CPU?
My code runs until I try to train my neural network. This is the code and the error that occurs:
model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
          y_train, epochs=10, batch_size=10)
In case you are wondering about the input: it is a concatenated neural network, and it works fine with normal tensorflow (not tensorflow-gpu), but training takes a very long time.
Epoch 1/10
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-27-10813edc74c8> in <module>
3
4 model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
----> 5 y_train, epochs=10, batch_size=10)#,
6 #validation_data=[[X_test, X_test_zusatz], y_test], class_weight=class_weight)
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1098 _r=1):
1099 callbacks.on_train_batch_begin(step)
-> 1100 tmp_logs = self.train_function(iterator)
1101 if data_handler.should_sync:
1102 context.async_wait()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
886 # Lifting succeeded, so variables are initialized and we can run the
887 # stateless function.
--> 888 return self._stateless_fn(*args, **kwds)
889 else:
890 _, _, _, filtered_flat_args = \
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
2941 filtered_flat_args) = self._maybe_define_function(args, kwargs)
2942 return graph_function._call_flat(
-> 2943 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
2944
2945 @property
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1917 # No tape is watching; skip to running the function.
1918 return self._build_call_outputs(self._inference_function.call(
-> 1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
1921 args,
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
558 inputs=args,
559 attrs=attrs,
--> 560 ctx=ctx)
561 else:
562 outputs = execute.execute_with_cancellation(
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[gradient_tape/model/embedding/embedding_lookup/Reshape/_74]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_4691]
Function call stack:
train_function -> train_function
Why is this error occurring?
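A quick size check on the numbers in the message may help narrow this down. The sketch below is plain arithmetic under two assumptions: that "type float" means 4-byte float32, and that the `memory_limit` printed by `list_local_devices()` above is the budget TensorFlow has for this GPU. `tensor_bytes` is a hypothetical helper. The result suggests, without proving it, that the failing tensor is tiny relative to that budget, so the memory was probably already almost full from earlier allocations when this request was made.

```python
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


failed_allocation = tensor_bytes((300, 300))  # shape from the OOM message
gpu_memory_limit = 4837251481                 # memory_limit from list_local_devices()

print(failed_allocation)  # 360000
# Share of the reported GPU budget that this single tensor would need.
print(f"{failed_allocation / gpu_memory_limit:.6%}")
```

If that reading is right, the places to look are whatever is occupying the rest of the memory, for example another process or kernel holding the GPU, or the size of the model and its inputs.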
-Update-
This is what my "nvidia-smi" output looks like while training my model (after about 20 seconds of training).
Thank you and best regards,
Daniel
I recently finished the Image super-resolution using Autoencoders project on Coursera, and when I try to run the same code on my laptop using Spyder and Jupyter Notebook, I keep getting this error.
I am using Nvidia GeForce 1650Ti along with Tensorflow-gpu=2.3.0, CUDA=10.1, cuDNN=7.6.5 and python=3.8.5. I have used the same configurations for running many deep neural network problems and none of them gave this error.
Code:
# Image Super Resolution using Autoencoder
# Loading the Images
x_train_n = []
x_train_down = []
x_train_n2 = []
x_train_down2 = []
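import os
import numpy as np
import tensorflow as tf
# Assumed sources for `image` and `rescale`, which this snippet uses but does not import:
from tensorflow.keras.preprocessing import image
from skimage.transform import rescale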
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction = 0.95)
session = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))
path = 'D:/GPU testing/Image Super Resolution/data/cars_train/'
images = os.listdir(path)
size = 0
for a in images:
try:
img = image.load_img(str(path+a), target_size=(64,64,3))
img_1 = image.img_to_array(img)
img_1 = img_1/255.
x_train_n.append(img_1)
dwn2 = rescale(rescale(img_1, 0.5, multichannel=True),
2.0, multichannel=True)
img_2 = image.img_to_array(dwn2)
x_train_down.append(img_2)
size+= 1
except:
print("Error loading image")
size += 1
if size >= 64:
break
x_train_n2 = np.array(x_train_n)
print(x_train_n2.shape)
x_train_down2 = np.array(x_train_down)
print(x_train_down2.shape)
# Building a Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, Dropout, Conv2DTranspose, UpSampling2D, add
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
# Building the Encoder
input_img = Input(shape=(64, 64, 3))
l1 = Conv2D(64, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(input_img)
l2 = Conv2D(64, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(l1)
l3 = MaxPooling2D(padding='same')(l2)
l3 = Dropout(0.3)(l3)
l4 = Conv2D(128, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(l3)
l5 = Conv2D(128, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(l4)
l6 = MaxPooling2D(padding='same')(l5)
l7 = Conv2D(256, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(l6)
# Building the Decoder
l8 = UpSampling2D()(l7)
l9 = Conv2D(128, (3, 3), padding='same', activation='relu',
            activity_regularizer=regularizers.l1(10e-10))(l8)
l10 = Conv2D(128, (3, 3), padding='same', activation='relu',
             activity_regularizer=regularizers.l1(10e-10))(l9)
l11 = add([l5, l10])
l12 = UpSampling2D()(l11)
l13 = Conv2D(64, (3, 3), padding='same', activation='relu',
             activity_regularizer=regularizers.l1(10e-10))(l12)
l14 = Conv2D(64, (3, 3), padding='same', activation='relu',
             activity_regularizer=regularizers.l1(10e-10))(l13)
l15 = add([l14, l2])
# chan = 3, for RGB
decoded = Conv2D(3, (3, 3), padding='same', activation='relu',
                 activity_regularizer=regularizers.l1(10e-10))(l15)
# Create our network
autoencoder = Model(input_img, decoded)
autoencoder_hfenn = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.summary()
# Training the Model
history = autoencoder.fit(x_train_down2, x_train_n2,
                          epochs=20,
                          batch_size=16,
                          validation_steps=100,
                          shuffle=True,
                          validation_split=0.15)
# Saving the Model
autoencoder.save('ISR_model_weight.h5')
# Representing Model as JSON String
autoencoder_json = autoencoder.to_json()
with open('ISR_model.json', 'w') as json_file:
    json_file.write(autoencoder_json)
Error:
2020-09-18 20:44:23.655077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3891 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-18 20:44:23.658359: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 3.80G (4080218880 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 20:44:23.659070: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 3.42G (3672196864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 20:44:25.560185: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
Traceback (most recent call last):
File "D:\GPU testing\Image Super Resolution\Image Super Resolution using Autoencoders.py", line 126, in <module>
history = autoencoder.fit(x_train_down2, x_train_n2,
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "D:\anaconda3\envs\tensorflow_gpu\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node functional_1/conv2d/Relu (defined at D:\GPU testing\Image Super Resolution\Image Super Resolution using Autoencoders.py:126) ]] [Op:__inference_train_function_2246]
Function call stack:
train_function
2020-09-18 20:44:19.489732: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-18 20:44:21.291233: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-18 20:44:21.306618: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x22a29eaa6b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-18 20:44:21.308804: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-18 20:44:21.310433: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-18 20:44:22.424648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2020-09-18 20:44:22.425736: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-18 20:44:22.468696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-18 20:44:23.161235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-18 20:44:23.161847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-18 20:44:23.162188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-18 20:44:23.162708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3891 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-18 20:44:23.167626: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x22a52959fb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-18 20:44:23.168513: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1650 Ti, Compute Capability 7.5
2020-09-18 20:44:23.642458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2020-09-18 20:44:23.643553: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-18 20:44:23.647378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-18 20:44:23.648372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2020-09-18 20:44:23.649458: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-18 20:44:23.653267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-18 20:44:23.653735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-18 20:44:23.654291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-18 20:44:23.654631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-18 20:44:23.655077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3891 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-18 20:44:23.658359: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 3.80G (4080218880 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 20:44:23.659070: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 3.42G (3672196864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-09-18 20:44:25.560185: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-18 20:44:26.855418: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-09-18 20:44:26.856558: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-09-18 20:44:26.857303: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops_fused_impl.h:642 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
I have tried GPU growth:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
and also limiting GPU usage:
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction = 0.95)
session = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))
but they didn't resolve the issue.
I recently came across the article "What is Autoencoder? Enhance blurred images using autoencoders" by Analytics Vidhya. I tried the code it provides and got the same error.
Can someone help me resolve this issue?
The conv2d op raised an error message:
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Looking above, we see
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3891 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
failed to allocate 3.80G (4080218880 bytes) from device:
CUDA_ERROR_OUT_OF_MEMORY: out of memory
failed to allocate 3.42G (3672196864 bytes) from device:
CUDA_ERROR_OUT_OF_MEMORY: out of memory
So this graph would need more memory than there is available on your GeForce GTX 1650 Ti (3891 MB). Try using a smaller input image size and/or a smaller batch size.
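To get a feel for how batch size and input size scale memory use, here is a rough back-of-the-envelope sketch. It counts only the float32 output of a single layer (the first Conv2D in the question, 64 filters on 64x64 inputs). The helper name activation_bytes is made up for illustration, and the real footprint also includes weights, gradients and cuDNN workspace, so this is not an estimate of the whole graph.

```python
def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed for one float32 tensor of shape
    (batch_size, height, width, channels)."""
    return batch_size * height * width * channels * bytes_per_value


# Output of a Conv2D layer with 64 filters and padding='same' on 64x64 inputs.
full = activation_bytes(batch_size=16, height=64, width=64, channels=64)
smaller_batch = activation_bytes(batch_size=8, height=64, width=64, channels=64)
smaller_image = activation_bytes(batch_size=16, height=32, width=32, channels=64)

for label, n_bytes in [("batch 16, 64x64", full),
                       ("batch 8, 64x64", smaller_batch),
                       ("batch 16, 32x32", smaller_image)]:
    print("%s: %.1f MiB" % (label, n_bytes / 2.0 ** 20))
```

Halving the batch size halves this number, while halving the image side length cuts it to a quarter.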
The problem was in how I set GPU memory growth for TensorFlow 2.3.0.
After setting it properly I could get rid of the error. The version that worked also registers the session with Keras through set_session:
import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True
sess = tf.compat.v1.Session(config=config)
set_session(sess)
Source: https://stackoverflow.com/a/59007505/14301371
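For reference, TensorFlow 2.x also exposes memory growth through tf.config, without going through a compat.v1 session. A minimal sketch, assuming TF 2.1 or later (where tf.config.list_physical_devices is available) and assuming it runs before any op has touched the GPU. The helper name enable_memory_growth is only illustrative.

```python
import tensorflow as tf


def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand for every visible GPU.

    Returns the list of GPUs that were configured (empty on a CPU-only machine).
    """
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Has to happen before the GPU is initialized; TensorFlow raises
        # RuntimeError if the device is already in use.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


configured = enable_memory_growth()
print("Memory growth enabled on %d GPU(s)" % len(configured))
```

Calling this at the very top of the script, before building the model, should have the same effect as allow_growth = True in the session config.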
Here is the output of:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8087623604945614369
]
Here is the output of pip list | grep tensorflow:
tensorflow-gpu (1.4.0)
tensorflow-tensorboard (0.4.0rc3)
I can confirm that I have installed CUDA 8.0 and cuDNN on my machine, and the output of nvidia-smi shows the GPU along with other details. Can someone please help me understand why the output of print(device_lib.list_local_devices()) doesn't show the GPU?
I also tried this simple TensorFlow example:
import tensorflow as tf
# Pin the ops to the first GPU
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Log which device each op is placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
Error:
Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]
I followed the TF Retrain Inception example to retrain Inception, but I get an error when I try to classify an image. I used the following code. Is the classification code wrong, or is there a problem with my memory allocation?
import tensorflow as tf
import sys
# change this as you see fit
image_path = 'C:/tmp/test.jpg'
# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("C:/tmp/output_labels.txt")]
# Unpersists graph from file
with tf.gfile.FastGFile("C:/tmp/output_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')
with tf.Session() as sess:
    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
Error message:
C:\Users\Murph\Documents\Python Scripts\RETRAIN_INCEPTION>python classify.py
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135]
successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135]
successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135]
successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135]
successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135]
successfully opened CUDA library curand64_80.dll locally
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown
op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op:
FinishedNodes
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown
op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op:
SampleInputs
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op:
ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op:
TreePredictions
E c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op:
UpdateFertileSlots
I c:\tf_jenkins\home\workspace\release-
win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 760
major: 3 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760, pci bus id: 0000:01:00.0)
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 1.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
call/nput/n (score = 0.61942)
Traceback (most recent call last):
File "classify_v2.py", line 33, in <module>
human_string = label_lines[node_id]
IndexError: list index out of range
The problem is with human_string = label_lines[node_id]; I think you're indexing into the wrong array. Can you print the values of top_k and the size of label_lines to make sure there's not an indexing mistake in the call to argsort?
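The check suggested above can be sketched in plain Python. The helper rank_labels and the sample data are hypothetical, not part of the retraining example. The sketch compares the number of scores with the number of labels up front, so a mismatch is reported with both sizes instead of surfacing later as an IndexError. As a guess only: the printed label call/nput/n may mean the label file contains literal "/n" text rather than newlines, which would leave label_lines with a single entry.

```python
def rank_labels(scores, label_lines):
    """Return (label, score) pairs sorted by descending score.

    Raises ValueError that reports both sizes when the model outputs a
    different number of classes than there are labels.
    """
    if len(scores) != len(label_lines):
        raise ValueError(
            "model returned %d scores but the label file has %d lines"
            % (len(scores), len(label_lines))
        )
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(label_lines[i], scores[i]) for i in order]


# Hypothetical labels and scores, standing in for label_lines and predictions[0].
labels = ["cat", "dog", "bird"]
ranked = rank_labels([0.1, 0.7, 0.2], labels)
for label, score in ranked:
    print("%s (score = %.5f)" % (label, score))

# One score too many: the mismatch is reported with both sizes.
try:
    rank_labels([0.1, 0.7, 0.2, 0.0], labels)
except ValueError as err:
    mismatch_message = str(err)
    print("mismatch: " + mismatch_message)
```

In the original script the equivalent quick check would be printing len(label_lines) and len(predictions[0]) just before the loop.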
I'm running a distributed TensorFlow script. When I create the cluster server, messages like the following appear in the console:
E0805 20:51:03.294260965 3387 ev_epoll1_linux.c:1051] grpc epoll fd: 3
2017-08-05 20:51:03.299766: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:2222}
2017-08-05 20:51:03.299790: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}
2017-08-05 20:51:03.305220: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:2223
When training, I see the same message and get no other response:
E0805 20:52:45.889979901 3387 ev_epoll1_linux.c:1051] grpc epoll fd: 3
The message is printed from the line with tf.Session("grpc://localhost:2223") as sess:
TensorFlow version: 1.3.0-rc0, compiled with Bazel. It works well on a single machine.
Linux version:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
The active Internet connections are:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN 8321/python
tcp 0 0 0.0.0.0:2223 0.0.0.0:* LISTEN 8883/python
Here is sample code for creating the cluster server:
def main(_):
    server = tf.train.Server(cluster,
                             job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)
    server.join()
if __name__ == "__main__":
    tf.app.run()
and the training code:
import numpy as np
import tensorflow as tf
train_X = np.random.rand(100).astype(np.float32)
train_Y = train_X * 0.1 + 0.3
# Build the graph on the worker task
with tf.device("/job:worker/task:0"):
    X = tf.placeholder(tf.float32)
    Y = tf.placeholder(tf.float32)
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    y = w * X + b
    loss = tf.reduce_mean(tf.square(y - Y))
    init_op = tf.global_variables_initializer()
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# Connect to the worker's gRPC server and train
with tf.Session("grpc://localhost:2223") as sess:
    sess.run(init_op)
    for i in range(500):
        sess.run(train_op, feed_dict={X: train_X, Y: train_Y})
        print("after sess.run train")
        if i % 50 == 0:
            print i, sess.run(w), sess.run(b)
    print sess.run(w)
    print sess.run(b)
Does anyone know how to fix it? Thanks.