Running Train a GPT-2 (or GPT Neo) Text-Generating Model w/ GPU on Colab

When I start "Running Train a GPT-2 (or GPT Neo) Text-Generating Model w/ GPU on Colab" in my Colab, the following errors come up:
ERROR: tensorflow 2.5.0 has requirement tensorboard~=2.5, but you'll have tensorboard 2.4.1 which is incompatible.
ERROR: pytorch-lightning 1.3.8 has requirement PyYAML<=5.4.1,>=5.1, but you'll have pyyaml 3.13 which is incompatible.
What should I do? Is it because of my Mac, or would upgrading my Colab account help?

The problem comes from the default packages installed in the Colab environment. It does not depend on the platform you use to access Colab or on your subscription type.
You have to upgrade the Python packages using pip.
In general, you can run shell commands such as pip in Colab by prepending a ! character, so in your case the following lines should be sufficient to fix the problem:
!pip install tensorboard==2.5
!pip install pyyaml==5.4.1
If you need to run more shell commands, you can use more user-friendly methods (see the answers to this question).
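To confirm that the upgrade took effect, you can compare the installed versions against the two requirements quoted in the error message. The following is a minimal sketch, not a full requirement checker: the helper names (version_tuple, installed_version, tensorboard_ok, pyyaml_ok) are made up for illustration, and the version parsing is deliberately simplified to the leading numeric components.

```python
import itertools
from importlib import metadata


def version_tuple(text):
    """Turn '2.4.1' into (2, 4, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def tensorboard_ok(version):
    # tensorboard~=2.5 means: at least 2.5, and still in the 2.x series.
    parsed = version_tuple(version)
    return parsed >= (2, 5) and parsed[:1] == (2,)


def pyyaml_ok(version):
    # PyYAML<=5.4.1,>=5.1
    return (5, 1) <= version_tuple(version) <= (5, 4, 1)


for dist, check in (("tensorboard", tensorboard_ok), ("PyYAML", pyyaml_ok)):
    found = installed_version(dist)
    if found is None:
        print(f"{dist}: not installed")
    else:
        print(f"{dist} {found}: {'ok' if check(found) else 'incompatible'}")
```

Run it in a new Colab cell after the two pip commands; if a package still shows as incompatible, restarting the runtime so the new versions are picked up is worth trying.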

Related

Can anyone give me a comprehensive guide to installing tensorflow-federated on M1 Mac?

I followed the instructions in the official TensorFlow documentation, but I cannot resolve the various problems I ran into.
Has anyone installed TFF on an M1 Mac and can share the overall process?
conda create -n federated python=3.8
conda activate federated
pip install --upgrade tensorflow_federated
Everything seems fine according to the terminal output. However, after
import tensorflow_federated as tff
I get a RuntimeError:
RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able work around this issue by building jaxlib from source.
How can I resolve this?

Using Object Detection API on local GPU but not last version (v2.5.0)

I am trying to use my local GPU to train an EfficientDet D0 model. I already have a working pipeline (it runs on Google Colab, for example) and modified it slightly to run locally, but one problem occurs every time I launch the training.
I use conda to install tensorflow-gpu with CUDA and cuDNN, which creates a TensorFlow v2.4.1 environment, but when I launch the training the Object Detection API automatically installs TensorFlow v2.5.0. As a result, my environment does not use the GPU for training, because the installed CUDA and cuDNN expect TensorFlow v2.4.1, not v2.5.0.
Is there a way to make the Object Detection API use v2.4.1 instead of v2.5.0?
I tried many things, but none worked (training either fails or falls back to the CPU).
Here is the code that installs the dependencies and overwrites the TensorFlow version with v2.5.0:
os.system("cp object_detection/packages/tf2/setup.py .")
os.system("python -m pip install .")
SYSTEM:
GPU: Nvidia RTX 3070
OS: Ubuntu 20.04 LTS
tensorflow: 2.4.1
P.S.: I used conda install -c conda-forge tensorflow-gpu to install TensorFlow, CUDA and cuDNN in my training environment because installing them manually caused a dependency problem, so I took the easy way.
EDIT: solution found; it is explained in the comments.

Follow these steps to install a specific version of TensorFlow with GPU support:
1. Set up an Anaconda environment
conda create -n tf_gpu cudatoolkit=11.0
2. Activate the new environment
source activate tf_gpu
3. Install TensorFlow 2.4.1 (in TensorFlow 2.x the tensorflow package includes GPU support)
pip install tensorflow==2.4.1

Try to run object_detection without "installing" it. Don't run setup.py; just set up the necessary paths and packages manually.
Alternatively, edit setup.py so that it skips installing that specific version of TF. I guess that this version is a requirement of one of the packages installed by setup.py.
I use object_detection without running setup.py or doing any "installation", and I have had no problems.
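Setting up the paths manually can be done from Python before the first import. The sketch below assumes you cloned the tensorflow/models repository, where the object_detection package lives under research/ and the slim package under research/slim/. The clone location models/ and the helper name add_to_sys_path are placeholders for illustration.

```python
import sys
from pathlib import Path


def add_to_sys_path(*directories):
    """Prepend each directory to sys.path once; return the entries that were added."""
    added = []
    for directory in directories:
        resolved = str(Path(directory).expanduser().resolve())
        if resolved not in sys.path:
            sys.path.insert(0, resolved)
            added.append(resolved)
    return added


# Hypothetical clone location; adjust to wherever the repository lives.
MODELS_ROOT = Path("models")
add_to_sys_path(MODELS_ROOT / "research", MODELS_ROOT / "research" / "slim")

# After this, "import object_detection" resolves against the checkout,
# and no pip install step runs that could replace the installed TensorFlow.
```

With this approach the other dependencies of the API still have to be installed separately, but the TensorFlow version in the environment stays whatever you installed yourself.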

Jupyter Notebook kernel dies when importing tensorflow 1.5.0

I have read a lot of posts about this, but they all involved higher tensorflow versions and were solved by downgrading to 1.5.0. I also had a higher version and followed the advice to downgrade, but I still have the problem.
Does anyone know what to try next?

pip install h5py==2.8.0
worked for me

When I tried from the command prompt, I got an error message that (I think) is not related to the tensorflow issue:
"Warning! HDF5 library version mismatched error"
The key information in the message body was "Headers are 1.10.1, library is 1.10.2", so I downgraded the HDF5 library with "conda install -c anaconda hdf5=1.10.1". Now the error message is gone and the kernel no longer dies when importing tensorflow.

I had similar problems: tensorflow or any tensorflow-related package (e.g. keras) made my kernel die on import, from any interface (jupyter, spyder, console...).
If you have this kind of problem, try running python from the console in verbose mode (python -v), then import tensorflow and look for errors.
I spotted errors related to h5py, similar to the reply of @DBSE. I upgraded the h5py package and everything was solved!
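Because a crashing import takes the whole interpreter down, it can help to run the import in a child process and inspect how it exited. This is a rough sketch of that idea. The helper name probe_import is invented here, and the verbose flag simply passes -v to the child so that the import trace ends up in the captured stderr.

```python
import subprocess
import sys


def probe_import(module_name, verbose=False):
    """Import a module in a child interpreter; return (exit code, stderr tail)."""
    command = [sys.executable]
    if verbose:
        command.append("-v")  # same as running "python -v" by hand
    command += ["-c", f"import {module_name}"]
    result = subprocess.run(command, capture_output=True, text=True)
    tail = "\n".join(result.stderr.splitlines()[-15:])
    return result.returncode, tail


code, stderr_tail = probe_import("tensorflow", verbose=True)
if code == 0:
    print("import succeeded")
else:
    # On POSIX systems a negative code means the child was killed by a signal,
    # e.g. -4 for SIGILL (illegal instruction) or -11 for SIGSEGV.
    print(f"import failed with exit code {code}")
    print(stderr_tail)
```

The last lines of the verbose trace usually name the module that was being loaded when the process died, which is how an h5py problem like the one above would show up.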

If you are using a conda environment, the easiest way to fix this issue is to create a new environment and install tensorflow with a single command. I had the same issue and tried many combinations of Python and tensorflow versions, but in the end I got it working in a single step.
Run this command to install the GPU version:
conda create --name tf_gpu tensorflow-gpu
This command automatically installs versions of Python and tensorflow that are compatible with your system.
For CPU, run this command:
conda create --name tf_env tensorflow
Both commands work on my system for GPU and CPU access and download the latest versions that are compatible with the system. This also fixed the "Illegal instruction (core dumped)" error.

pip install h5py==3.1.0
This is the most recent version that worked for me.

Try using import numpy before Keras and Tensorflow.

Install Keras on Anaconda OSX

I am trying to install keras in an Anaconda environment (OSX) because I want to use it with Spyder/IPython. To do that, I just ran pip install keras (I already have tensorflow). After the installation, keras works fine when I start python 2.7 from the terminal. But when I start python 3.5 or Spyder and try to import keras, I get:
No module named 'keras'
I assume the issue might be with the PATHS on my MacBook, because which python returns
/usr/local/bin/python2.7
while which python3.5 (or spyder) returns
/Users/georgiospapadopoulos/anaconda/bin/python3.5
/Users/georgiospapadopoulos/anaconda/bin/spyder
Also, pip install keras reports that
Requirement already satisfied: keras in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
My ~/.bash_profile contains
# added by Anaconda3 2.4.0 installer
export PATH="/Users/georgiospapadopoulos/anaconda/bin:$PATH"
# added by Anaconda3 4.2.0 installer
export PATH="/Users/georgiospapadopoulos/anaconda/bin:$PATH"
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$CUDA_HOME/lib"
export PATH="$CUDA_HOME/bin:$PATH"
# Setting PATH for Python 2.7
# The original version is saved in .bash_profile.pysave
#PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
PATH="/usr/local/bin/python:$PATH"
export PATH

You are probably mixing up environments. The best way to handle this is to create a virtual environment in Anaconda (say, neural-net-venv), open a terminal for that environment, and install keras and the other related modules there. Then go back to the Anaconda dashboard, select that environment as the active one, and launch Jupyter or Spyder from it to run your imports.
Note that you should also not mix Python versions: if you must work with both Python 2 and Python 3, create a separate virtual environment for each and install keras and theano/tensorflow separately in each one.
I have run this setup on macOS and it works like a charm.
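A quick way to see whether two environments are being mixed up is to print which interpreter is running and where it would load keras from. This is a small sketch using only the standard library (Python 3); the helper name locate is an illustrative choice, not part of any tool mentioned above.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("keras found at:", locate("keras"))
```

Running it once from the terminal's python3.5 and once from the Spyder console shows whether both use the same interpreter and whether keras is visible to it. Installing with that interpreter's own pip (for example, by calling it with -m pip install keras) targets the matching site-packages directory.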

The easiest, most hassle-free way to install keras in Anaconda is to open the Anaconda prompt and type:
conda install keras
Keras runs on either the tensorflow or the theano backend. Once the keras install is complete, open the Python shell and type
>>> import keras
If an error is thrown, there is probably a problem with the backend. In that case, open the Anaconda prompt and type
conda install tensorflow
theano can also be used, but tensorflow is the default backend.

I wanted to install keras on Anaconda and tried the above approach, but it still did not work. Specifically, I started Anaconda Navigator and then opened a macOS terminal in the base environment. Then I followed the conda install commands for keras and tensorflow. It worked fine for keras, but with tensorflow I got the following error message:
Downloading and Extracting Packages
_tflow_select-2.3.0 | 3 KB | ######################################################### | 100%
ChecksumMismatchError: Conda detected a mismatch between the expected content and downloaded content
for url 'https://conda.anaconda.org/Anaconda/osx-64/_tflow_select-2.3.0-mkl.tar.bz2'.
download saved to: /Users/dlin/opt/anaconda3/pkgs/_tflow_select-2.3.0-mkl.tar.bz2
expected sha256: cc155b27e7bf91ec5370ce1fd2d5fceccbf13ac19706229674ba971fa3751446
actual sha256: aad248699de112a7a5ead1695dfdf51b5693c2927303844b29dd7d9138dc95b9

tensorflow on GPU: no known devices, despite cuda's deviceQuery returning a "PASS" result

Note: this question was initially asked on GitHub, but I was asked to post it here instead.
I'm having trouble running tensorflow on the GPU, and it does not seem to be the usual CUDA configuration problem, because everything indicates that CUDA is properly set up.
The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).
What differs from usual issues is that cuda seems properly installed and running ./deviceQuery from cuda samples is successful (output).
I have two graphical cards:
an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
a GTX 1060 that I want to dedicate to tensorflow
I use:
tensorflow-1.0.0
cuda-8.0 (ls -l /usr/local/cuda/lib64/libcud*)
cudnn-5.1.10
python-2.7.12
nvidia-drivers-375.26 (this was installed by cuda and replaced my distro driver package)
I've tried:
adding /usr/local/cuda/bin/ to $PATH
forcing gpu placement in tensorflow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
whitelisting the gpu I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card did cause problems
running the script with sudo (because why not)
Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.
At this point, I feel like I have followed all the breadcrumbs and have no idea what else I could try. I'm not even sure whether I'm looking at a bug or a configuration problem. Any advice on how to debug this would be greatly appreciated. Thanks!
Update: with the help of Yaroslav on GitHub, I gathered more debugging info by raising the log level, but it doesn't seem to say much about device selection: https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb
Update 2: theano detects the GPU correctly, but interestingly it complains about cuDNN being too recent and then falls back to the CPU (code ran, output). Maybe that could be the problem with tensorflow as well?

From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.
If you run the following commands, you should be able to use the GPU in subsequent runs:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
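Before uninstalling anything, it may be worth confirming which of the two PyPI distributions is actually present. The following is a sketch under the assumption that the packages were installed with pip under the names given in this answer; the helper installed_distributions is hypothetical.

```python
from importlib import metadata

CANDIDATES = ("tensorflow", "tensorflow-gpu")


def installed_distributions(names=CANDIDATES):
    """Map each distribution name that is installed to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


found = installed_distributions()
print(found or "neither tensorflow nor tensorflow-gpu is installed")
if len(found) == 2:
    print("both packages are installed; they share the same import name")
```

If both show up, removing both and reinstalling only the one you want avoids one package shadowing the other.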

None of the other answers here worked for me. After a bit of tinkering, I found that the following fixed my issues with a Tensorflow installed from the binary package:
Step 0: Uninstall protobuf
pip uninstall protobuf
Step 1: Uninstall tensorflow
pip uninstall tensorflow
pip uninstall tensorflow-gpu
Step 2: Force reinstall Tensorflow with GPU support
pip install --upgrade --force-reinstall tensorflow-gpu
Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES
So for me with 2 GPUs it would be
export CUDA_VISIBLE_DEVICES=0,1
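The same variable can also be set from inside a script instead of the shell. A minimal sketch, assuming the GPU indices 0 and 1 from the example above; the helper name set_visible_gpus is made up for illustration.

```python
import os


def set_visible_gpus(*indices):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices; return the new value."""
    value = ",".join(str(index) for index in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before the first "import tensorflow" in the process, because the
# CUDA runtime reads the variable when it initializes.
set_visible_gpus(0, 1)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Setting it in the script keeps the choice local to that one program, whereas the export line affects every process started from the shell.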

In my case,
pip3 uninstall tensorflow
was not enough, because reinstalling with
pip3 install tensorflow-gpu
still gave me the CPU version of tensorflow, not the GPU one.
So, before installing tensorflow-gpu, I removed all tensor-related folders in site-packages and uninstalled protobuf, and then it worked!
In summary:
pip3 uninstall tensorflow
Remove all tensor folders in ~\Python35\Lib\site-packages
pip3 uninstall protobuf
pip3 install tensorflow-gpu

Might seem dumb but a sudo reboot has fixed the exact same problem for me and a couple others.

The answer that saved my day came from Mark Sonn. If you are on Linux, simply add this line to ~/.bashrc and then run source ~/.bashrc:
export CUDA_VISIBLE_DEVICES=0,1
Previously I had to use this workaround to get tensorflow to recognize my GPU:
import tensorflow as tf
# List the GPUs TensorFlow can see; the list is empty if none are detected.
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
if gpus:
    # Expose only the first GPU and let its memory allocation grow on demand.
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
    tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)
Even though the code still worked, adding these lines every time is clearly not something I would want.
I built tensorflow from source, following the documentation, so that v2.3 would support CUDA 10.2 and cuDNN 7.6.5.
If anyone is having trouble with that, I suggest a quick skim over the docs. The build took 1.5 hours with bazel. Make sure you have gcc7 and bazel installed.

This error may be caused by your GPU's compute capability. The prebuilt binaries only support a limited range of compute capabilities (3.5 ~ 5.0 as far as I know); you can look up your GPU here: https://en.wikipedia.org/wiki/CUDA
In my case, the error was:
Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
For now, the only way around the '3.5 ~ 5.0' limit is to compile from source on Linux (or macOS).
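If you want to spot this condition in a log automatically, the two numbers can be pulled out of the message with a regular expression. This is only a sketch that assumes the message keeps the exact wording shown above; the pattern names and the helper parse_capabilities are illustrative.

```python
import re

DEVICE_PATTERN = re.compile(r"compute capability: (\d+)\.(\d+)")
MINIMUM_PATTERN = re.compile(r"minimum required Cuda capability is (\d+)\.(\d+)")


def parse_capabilities(log_line):
    """Return ((device_major, device_minor), (min_major, min_minor)) or None."""
    device = DEVICE_PATTERN.search(log_line)
    minimum = MINIMUM_PATTERN.search(log_line)
    if device is None or minimum is None:
        return None

    def to_pair(match):
        return int(match.group(1)), int(match.group(2))

    return to_pair(device), to_pair(minimum)


LOG_LINE = (
    "Ignoring visible gpu device (device: 0, name: GeForce GT 640M, "
    "pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute "
    "capability 3.0. The minimum required Cuda capability is 3.5."
)

device, minimum = parse_capabilities(LOG_LINE)
print("device below minimum:", device < minimum)
```

Comparing the pairs as tuples orders them by major version first and minor version second, which matches how compute capabilities are ranked.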

There are various kinds of system incompatibility problems.
The required libraries vary with the version of TensorFlow.
When you use Python in interactive mode, a lot of useful information is printed to stderr. For TensorFlow 2.0 or later, I suggest running:
python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"
The output of this command shows which libraries (or library versions) needed for GPU support are missing. Compare it with the requirements:
https://www.tensorflow.org/install/gpu#software_requirements
https://www.tensorflow.org/install/gpu#hardware_requirements
P.S. CUDA_VISIBLE_DEVICES is not specific to TensorFlow; more generally, it is a way to restrict which GPUs are visible to all launched processes.

For Anaconda users: I installed tensorflow-gpu through the Anaconda Navigator GUI and configured the NVIDIA GPU as described in the tensorflow guide, but tensorflow still could not find the GPU. I then uninstalled tensorflow, again through the GUI (see here), and reinstalled it from the command line in an Anaconda prompt with:
conda install -c anaconda tensorflow-gpu
After that, tensorflow found the GPU correctly.
