wheel, pyzmq, bad file descriptor problems in anaconda for deep learning - tensorflow

I'm a beginner with TensorFlow.
I want to install and use Jupyter Notebook.
But it doesn't work in my notebook; I get a "bad file descriptor" error.
So I checked, uninstalled Anaconda, and tried to downgrade pyzmq to version 19.0.2.
But that fails with:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyzmq
What should I do?
I have installed Visual Studio.
Here is my laptop's information:
processor 11th Gen Intel(R) Core(TM) i7-11370H # 3.30GHz 3.30 GHz
RAM 16.0GB(15.8GB available)
system 64bit, x64 processor
Thank you for reading.
How do I solve "Failed building wheel for pyzmq" for version 19.0.2?

Related

The kernel appears to have died. It will restart automatically. Jupyter notebook [duplicate]

I am using a MacBook Pro with M1 processor, macOS version 11.0.1, Python 3.8 in PyCharm, Tensorflow version 2.4.0rc4 (also tried 2.3.0, 2.3.1, 2.4.0rc0). I am trying to run the following code:
import tensorflow
This causes the error message:
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
The code runs fine on my Windows and Linux machines.
What does the error message mean and how can I fix it?
It seems that this problem happens when you have multiple Python interpreters installed and some of them target different architectures (x86_64 vs. arm64). You need to make sure that the correct Python interpreter is being used. If you installed Apple's version of tensorflow, it probably requires an arm64 interpreter.
If you use Rosetta (Apple's x86_64 emulator), you need to use an x86_64 Python interpreter; if you somehow load the arm64 Python interpreter instead, you will get the illegal instruction error (which makes sense).
If you use any script that installs new Python interpreters, make sure it installs the correct interpreter for the architecture (most likely arm64).
Overall, I think this problem happens because the Python environment setup is not designed for systems that can run multiple instruction sets. pip does check the architecture of packages against the host system, but it seems you can still run an x86_64 interpreter that loads a package meant for arm64, and this produces the problem.
For reference there is an issue in tensorflow_macos that people can check.
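To check which architecture the running interpreter was built for, and to decode the exit code from the question, something like the following may help. This is a minimal sketch using only the standard library; the function names are my own, not part of any tool mentioned above.

```python
import platform
import signal
import sys


def describe_interpreter():
    """Return the path and CPU architecture of the running interpreter."""
    return {
        "executable": sys.executable,
        # Typically 'arm64' for a native Apple Silicon build and
        # 'x86_64' for an Intel build running under Rosetta.
        "machine": platform.machine(),
    }


def signal_from_exit_code(exit_code):
    """Decode a shell-style exit code above 128 into a signal name."""
    # Shells report "killed by signal N" as exit code 128 + N.
    return signal.Signals(exit_code - 128).name


print(describe_interpreter())
print(signal_from_exit_code(132))  # 132 - 128 = 4, which is SIGILL
```

Run it with the same interpreter PyCharm uses for the project; if "machine" does not match the architecture of the tensorflow build you installed, that would be consistent with the explanation above.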
For M1 Macs, the following steps from the Apple developer page worked:
First, download the Conda environment installer from here, then follow these instructions (assuming the script was downloaded to the ~/Downloads folder):
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate
Reload the shell and run:
python -m pip uninstall tensorflow-macos
python -m pip uninstall tensorflow-metal
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
If the above doesn't work for some reason, the Apple developer page covers some edge cases and provides additional information.
Installing Tensorflow version 1.15 fixed this for me.
$ conda install tensorflow==1.15
I was able to resolve this issue by using Miniforge instead of Anaconda as the Python environment. Anaconda doesn't support the arm64 architecture yet.
I had the same issue.
This is because of the M1 chip. There is now a pre-release that delivers hardware-accelerated TensorFlow and TensorFlow Addons for macOS 11.0+. Native hardware acceleration is supported on M1 Macs and Intel-based Macs through Apple's ML Compute framework.
You need to install the TensorFlow build that supports the M1 chip. Simply pull this tensorflow macos repository and run ./scripts/download_and_install.sh.

Install Tensorflow-GPU on WSL2

Has anyone successfully installed Tensorflow-GPU on WSL2 with NVIDIA GPUs? I have Ubuntu 18.04 on WSL2, but am struggling to get NVIDIA drivers installed. Any help would be appreciated as I'm lost.
So I have just got this running.
The steps you need to follow are here. To summarise them:
Sign up for the Windows Insider Program and get the development builds of Windows so that you have the latest version
Install WSL 2
Install Ubuntu from the Windows Store
Install the WSL 2 CUDA driver on Windows
Install the CUDA toolkit
Install cuDNN (you can download the Linux version from Windows and then copy the file to Linux)
If you are getting memory errors like 'cannot allocate memory', you might need to increase the amount of memory WSL can use
Then install tensorflow-gpu
Pray it works
Bugs I hit along the way:
If you get an error when you open Ubuntu for the first time, you need to enable virtualisation in the BIOS
If you cannot run the ./Blackscholes example in the installation instructions, you might not have the right build of Windows! You must have the right version
If you are getting 'cannot allocate memory' errors when running tf, you need to give WSL more RAM. It only accesses half your RAM by default
Create a .wslconfig file under your user directory in Windows with the amount of memory you want. Mine looks like:
[wsl2]
memory=16GB
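To confirm from inside WSL that the new limit took effect, you can read MemTotal from /proc/meminfo. The snippet below is a small sketch of that check; the helper names are hypothetical, and the reported total may come out slightly below the configured value.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and convert it to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def read_mem_total_gib(path="/proc/meminfo"):
    """Read the real value on a Linux system (including a WSL 2 distro)."""
    with open(path) as f:
        return mem_total_gib(f.read())


# Demonstration on sample text, so the snippet also runs outside Linux.
sample = "MemTotal:       16777216 kB\nMemFree:         1234567 kB\n"
print(round(mem_total_gib(sample), 1))  # 16.0
```

Inside the Ubuntu shell, call read_mem_total_gib() after restarting WSL so that the .wslconfig change is picked up.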
Edit after running some code
This is much slower than when I was running on Windows directly. I went from 1 minute per epoch to 5 minutes. I'm just going to dual-boot.
These are the steps I had to follow for Ubuntu 20.04. I am no longer on the dev channel; the beta channel works fine for this use case and is much more stable.
Install WSL2
Install Ubuntu 20.04 from Windows Store
Install Nvidia Drivers for Windows from: https://developer.nvidia.com/cuda/wsl/download
Install nvcc inside of WSL with:
sudo apt install nvidia-cuda-toolkit
Check that it is there with:
nvcc --version
For my use case, I do data science and already had anaconda installed. I created an environment with:
conda create --name tensorflow
conda activate tensorflow
conda install tensorflow-gpu
Then just test it with this little python program with the environment activated:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
sys_details = tf.sysconfig.get_build_info()
cuda = sys_details["cuda_version"]
cudnn = sys_details["cudnn_version"]
print(cuda, cudnn)
For reasons I do not understand, my machine was unable to find the GPU without nvcc installed, and actually gave an error message saying it could not find nvcc.
Online tutorials I had found had you downloading CUDA and cuDNN separately, but I think NVCC includes cuDNN since it is . . . there somehow.
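The nvcc check from the steps above can also be done from Python, which is handy inside a notebook. This is only a sketch with a helper name of my own; it looks the tool up on PATH and returns whatever the tool prints for --version.

```python
import shutil
import subprocess


def tool_version(name, flag="--version"):
    """Return a tool's version output, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Some tools print version information to stderr instead of stdout.
    return result.stdout.strip() or result.stderr.strip()


print(tool_version("nvcc") or "nvcc not found on PATH")
```

If this reports that nvcc is missing while nvcc --version works in your shell, the notebook kernel is probably running with a different PATH.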
I can confirm I am able to get this working without the need for Docker on WSL2 thanks to the following article:
https://qiita.com/Navier/items/cf551908bae707db4258
Be sure to update to driver version 460.15, not 455.41 as listed in the CUDA documentation.
Note, this does not work with the card in TCC mode (only WDDM). Also, be sure to place your files on the Linux file system (i.e. not on a mount drive, like /mnt/c/). Performance is significantly faster on the Linux file system (this has to do with the difference in implementation of WSL 1 vs. WSL 2; see 1, 2, and 3).
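A quick way to catch data that still lives on the Windows side is to test the path before training. The sketch below assumes WSL's default automount location (/mnt/<drive letter>); if you changed the automount root, the check needs adjusting. The function name is hypothetical.

```python
import re
from pathlib import PurePosixPath


def is_windows_mount(path):
    """True if an absolute path lies under /mnt/<drive-letter>."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "mnt"
        and re.fullmatch(r"[a-z]", parts[2]) is not None
    )


print(is_windows_mount("/mnt/c/Users/me/data"))  # True
print(is_windows_mount("/home/me/data"))         # False
```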
NOTE: See also Is the class generator (inheriting Sequence) thread safe in Keras/Tensorflow?
I just want to point out that using Anaconda to install cudatoolkit and cudnn does not seem to work in WSL.
Maybe there is some problem with paths that makes TF look for the needed files only in the system paths instead of the conda environments.

Keras with Tensorflow backend on GPU. MKL ERROR: Parameter 4 was incorrect on entry to DLASCL

I installed TensorFlow with GPU support and Keras into an Anaconda (v1.6.5) environment using the following commands:
conda install -n EnvName tensorflow-gpu
conda install -n EnvName -c conda-forge keras-gpu
I have an NVIDIA Quadro 2200K on my machine with driver v384.66, cuda-8.0, cudnn 7.0.
When I run Python code with Keras, I get the following at the training stage:
Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.
and later
File "/home/User/anaconda3/envs/keras_gpu/lib/python3.6/site-packages/numpy/linalg/linalg.py", line 99, in _raise_linalgerror_svd_nonconvergence
raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge
Other relevant sources suggest checking the data for NaNs and Infs, but my data is definitely clean. By the way, the CPU version of the installation works fine; the issue occurs only when running on the GPU.
I tried reinstalling Anaconda, CUDA, and numpy, but it didn't help.
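For anyone who lands here and has not yet ruled out bad data, the NaN/Inf check mentioned above can be sketched like this. It uses only the standard library and works on a flat iterable of numbers; for NumPy arrays the usual one-liner is np.isfinite(x).all(). The function name is my own.

```python
import math


def count_non_finite(values):
    """Count NaN and +/-Inf entries in a flat iterable of numbers."""
    return sum(1 for v in values if not math.isfinite(v))


sample = [0.5, float("nan"), 2.0, float("inf"), -1.0]
print(count_non_finite(sample))  # 2
```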
The problem was the mkl package (2018.0.0): it seems to have been released recently and conflicts with the versions of some packages supplied with Tensorflow (1.3.0) and Keras (2.0.5) via conda.
So I manually downgraded mkl to v11.3.3 using Anaconda Navigator, which automatically downgraded other packages, and everything is working well now.

Installing tensorflow-gpu 1.3.0 on windows 10

I have been trying to install tensorflow-gpu on windows 10, via
pip3 install --upgrade tensorflow-gpu
When I do this, I break the current installation of ordinary tensorflow and get this error: On Windows, running "import tensorflow" generates No module named "_pywrap_tensorflow" error.
Somehow I managed to fix this by reinstalling ordinary tensorflow, but then when I import tensorflow in Python 3.5.2 and try to identify my GPU, no device is found!
I have CUDA 9.0 installed, with the cudnn64_6 DLL in CUDA/v9.0/bin. I can run the nbody test program without problems, and I can see the GPU being used for that demo application.
Is there any known issue with tensorflow-gpu 1.3.0?
Really struggling on this. Why does it have to be so problematic installing this library!
Please help
mg
TensorFlow 1.3 (and 1.4) require CUDA 8.0 and do not support later versions. You will either need to downgrade CUDA to 8.0 or make a custom build from source.
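The rule in this answer can be written down as a tiny lookup, which makes it easy to check a combination before installing. The table below contains only the two releases named above; entries for other releases would have to be taken from the official tested-build table. The names are hypothetical.

```python
# Requirements taken from the answer above; extend as needed.
REQUIRED_CUDA = {
    "1.3": "8.0",
    "1.4": "8.0",
}


def cuda_matches(tf_version, installed_cuda):
    """True/False if the requirement is known, None if the release is not listed."""
    major_minor = ".".join(tf_version.split(".")[:2])
    required = REQUIRED_CUDA.get(major_minor)
    if required is None:
        return None
    return installed_cuda == required


print(cuda_matches("1.3.0", "9.0"))  # the combination from the question: False
```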

tensorflow on GPU: no known devices, despite cuda's deviceQuery returning a "PASS" result

Note: this question was initially asked on GitHub, but I was asked to post it here instead.
I'm having trouble running tensorflow on GPU, and it does not seem to be the usual CUDA configuration problem, because everything seems to indicate CUDA is properly set up.
The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).
What differs from usual issues is that cuda seems properly installed and running ./deviceQuery from cuda samples is successful (output).
I have two graphics cards:
an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
a GTX 1060 that I want to dedicate to tensorflow
I use:
tensorflow-1.0.0
cuda-8.0 (ls -l /usr/local/cuda/lib64/libcud*)
cudnn-5.1.10
python-2.7.12
nvidia-drivers-375.26 (this was installed by cuda and replaced my distro driver package)
I've tried:
adding /usr/local/cuda/bin/ to $PATH
forcing gpu placement in tensorflow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
whitelisting the gpu I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card caused problems
running the script with sudo (because why not)
Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.
At this point, I feel like I have followed all the breadcrumbs and have no idea what else I could try. I'm not even sure whether I'm looking at a bug or a configuration problem. Any advice about how to debug this would be greatly appreciated. Thanks!
Update: with the help of Yaroslav on github, I gathered more debugging info by raising log level, but it doesn't seem to say much about the device selection : https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb
Update 2: Theano detects the gpu correctly, but interestingly it complains about cuDNN being too recent, then falls back to cpu (code ran, output). Maybe that could be the problem with tensorflow as well?
From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.
If you run the following commands, you should be able to use the GPU in subsequent runs:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
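Before uninstalling anything, you can confirm which of the two PyPI packages is actually installed in the current environment. This is a sketch using importlib.metadata, so it needs Python 3.8 or later (it will not run on the Python 2.7 setup from the question); the function name is my own.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names=("tensorflow", "tensorflow-gpu")):
    """Map each installed distribution in `names` to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


print(installed_versions())
```

If both names show up, or only tensorflow does, that matches the diagnosis above for these older releases.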
None of the other answers here worked for me. After a bit of tinkering I found that this fixed my issues when dealing with Tensorflow built from binary:
Step 0: Uninstall protobuf
pip uninstall protobuf
Step 1: Uninstall tensorflow
pip uninstall tensorflow
pip uninstall tensorflow-gpu
Step 2: Force reinstall Tensorflow with GPU support
pip install --upgrade --force-reinstall tensorflow-gpu
Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES
So for me with 2 GPUs it would be
export CUDA_VISIBLE_DEVICES=0,1
In my case:
pip3 uninstall tensorflow
was not enough, because when I reinstalled with:
pip3 install tensorflow-gpu
it still installed the CPU version of tensorflow, not the GPU one.
So, before installing tensorflow-gpu, I removed all related tensor folders in site-packages and uninstalled protobuf, and it worked!
In conclusion:
pip3 uninstall tensorflow
Remove all tensor folders in ~\Python35\Lib\site-packages
pip3 uninstall protobuf
pip3 install tensorflow-gpu
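To see which folders the "remove all tensor folders" step would affect, you can list them first. The sketch below only lists and never deletes; the "tensor" prefix is my reading of the step above and will also match packages such as tensorboard, so review the output before removing anything.

```python
import sysconfig
from pathlib import Path


def leftover_dirs(site_dirs, prefix="tensor"):
    """List directories whose names start with `prefix` in the given folders."""
    hits = []
    for d in site_dirs:
        root = Path(d)
        if not root.is_dir():
            continue
        hits.extend(
            sorted(
                p for p in root.iterdir()
                if p.is_dir() and p.name.lower().startswith(prefix)
            )
        )
    return hits


# 'purelib' is the site-packages folder of the running interpreter.
for path in leftover_dirs([sysconfig.get_paths()["purelib"]]):
    print(path)
```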
Might seem dumb but a sudo reboot has fixed the exact same problem for me and a couple others.
The answer that saved my day came from Mark Sonn. Simply add this to .bashrc and
source ~/.bashrc if you are on Linux:
export CUDA_VISIBLE_DEVICES=0,1
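If you would rather not touch .bashrc, the same variable can be set from Python, as long as it happens before tensorflow is imported. The snippet is a sketch with a helper name of my own; the "0,1" value is just the one used above.

```python
import os


def visible_gpu_ids(env=os.environ):
    """Parse CUDA_VISIBLE_DEVICES: None if unset, else the list of ids."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset means every GPU is visible
    return [item.strip() for item in raw.split(",") if item.strip()]


# Set the variable only if the shell has not already set it, and do so
# before importing tensorflow so the CUDA runtime sees it.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")
print(visible_gpu_ids())
```

An empty value hides all GPUs, which is worth checking if tensorflow suddenly reports no devices.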
Previously I had to use this workaround to get tensorflow to recognize my GPU:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)
Even though the code still worked, adding these lines every time is clearly not something I would want.
My version of tensorflow was built from source according to the documentation, so that v2.3 supports CUDA 10.2 and cuDNN 7.6.5.
If anyone is having trouble with that, I suggest a quick skim over the docs. It took 1.5 hours to build with bazel. Make sure you have gcc7 and bazel installed.
This error may be caused by your GPU's compute capability. The prebuilt TensorFlow binaries enforce a minimum compute capability (3.5 in the message below); you can look up your card's value here: https://en.wikipedia.org/wiki/CUDA
In my case, the error was like this:
Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
For now, the only way around the 3.5 minimum is to compile from source on Linux (or macOS).
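If you have a long log and want to spot this condition quickly, the capability can be pulled out of the message with a regular expression. This is a sketch that assumes the log line has the same wording as the one quoted above; the minimum is the value quoted in that message, and the names are my own.

```python
import re

MIN_CAPABILITY = (3, 5)  # minimum quoted in the log message above


def parse_capability(log_line):
    """Return (major, minor) from a 'compute capability: X.Y' fragment, or None."""
    m = re.search(r"compute capability:?\s*(\d+)\.(\d+)", log_line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


line = (
    "Ignoring visible gpu device (device: 0, name: GeForce GT 640M, "
    "pci bus id: 0000:01:00.0, compute capability: 3.0) "
    "with Cuda compute capability 3.0. "
    "The minimum required Cuda capability is 3.5."
)
cap = parse_capability(line)
print(cap, cap >= MIN_CAPABILITY)  # (3, 0) False
```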
There are various kinds of system incompatibility problems.
The library requirements can vary with the version of TensorFlow.
When using Python in interactive mode, a lot of useful information is printed to stderr. For TensorFlow 2.0 or later, I suggest running:
python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"
After running this command, you will see which libraries (or library versions) needed for GPU support are missing, in addition to the requirements listed at:
https://www.tensorflow.org/install/gpu#software_requirements
https://www.tensorflow.org/install/gpu#hardware_requirements
P.S. CUDA_VISIBLE_DEVICES is not specific to TensorFlow; more generally, it is a way to restrict which GPUs are available to all launched processes.
For Anaconda users: I installed tensorflow-gpu via the GUI using Anaconda Navigator and configured the NVIDIA GPU as in the tensorflow guide, but tensorflow couldn't find the GPU anyway. Then I uninstalled tensorflow, again via the GUI (see here), and reinstalled it from the command line in an Anaconda prompt with:
conda install -c anaconda tensorflow-gpu
and then tensorflow could find the GPU correctly.