I'm very new to TensorFlow.
I want to run my code on my CUDA GPU, so I installed tensorflow-gpu after having already installed the normal tensorflow package.
How can I tell Python to use the GPU-based TensorFlow?
If you have tensorflow-gpu installed, there really isn't any reason to also have tensorflow: without a GPU present, it will just run on the CPU anyway.
To specify which GPU to use:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
In place of "0" you can list several GPU indices (e.g. "0,1" if you have multiple GPUs), or use "" if you want it to run on the CPU. Note that this environment variable must be set before TensorFlow initializes the GPU.
Alternatively, specify it in the session configuration:
sess = tf.Session(config=tf.ConfigProto(device_count={'GPU': 0}))
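To confirm where operations actually end up, one option (TF 1.x style, matching the tf.Session usage above) is to enable device placement logging; a minimal sketch:

import tensorflow as tf

# Log which device (CPU or GPU) each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
a = tf.constant([1.0, 2.0, 3.0], name='a')
print(sess.run(a))   # the placement of 'a' is printed to the console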
Furthermore, you can check which installation your environment prioritizes by opening a Python console and typing:
>>> import tensorflow
>>> tensorflow
<module 'tensorflow' from '/home/.../python3.6/site-packages/tensorflow/__init__.py'>
The path at the end tells you which installation is actually being loaded.
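You can also ask TensorFlow directly which devices it sees; a small sketch using the TF 1.x device_lib helper:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Lists every device TensorFlow can use; a GPU build with working CUDA
# drivers should show at least one entry with device_type == 'GPU'.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)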
Stylegan2 uses network pickle files to store ML models. I transfer-trained one model, which I am able to open on cloud servers. I have been generating images from this model just fine with the following setup:
Google Colab: Python 3.6.9, CUDA 10.1, tensorflow-gpu 1.15, CuDNN 7.6.5
However, I cannot open the network pickle file on my local machine, even though I've been trying to replicate that cloud setup as closely as I can. (I have the right GPU hardware, drivers, etc.)
Local (Windows 10): Python 3.6.9, CUDA 10.1, tensorflow-gpu 1.15, CuDNN 7.6.5
It requires the library 'dnnlib' to be on the PYTHONPATH and a tf.Session() to be initialized.
I get an assertion error about the pickle:
**Assertion error**: `assert state["version"] in [2,3]`
I find this error very odd because the network pickle works on the cloud, so it was saved properly. Additionally, my local setup can open other network pickles (i.e. ones downloaded from the internet through GET requests), which makes me think I have properly set up my PYTHONPATH and initialized a tf.Session. These are the prerequisites listed in the Stylegan repo:
"You can import the networks in your own Python code using pickle.load(). For this to work, you need to include the dnnlib source directory in PYTHONPATH and create a default TensorFlow session by calling dnnlib.tflib.init_tf()"
I'm not sure why I cannot open up this pickle in one environment, but can in another. Does anyone have any suggestions as to where I might start looking?
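For reference, the loading code follows the instructions quoted above and looks roughly like this (the pickle filename is a placeholder, and the (G, D, Gs) unpacking is the usual StyleGAN convention rather than something stated in the post):

import pickle
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()                       # create a default TensorFlow session
with open('network.pkl', 'rb') as f:  # placeholder path to the network pickle
    G, D, Gs = pickle.load(f)         # StyleGAN pickles typically hold (G, D, Gs)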
Actually, I figured it out by printing out which version was throwing the error. The version printed was '4'. I realized that this matched pickle's HIGHEST_PROTOCOL and that what I needed was the newest pull of the Stylegan2 repository, which includes format version 4 in its allowed versions.
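If you ever need to check the pickle protocol of a file itself (which is distinct from the format version stored in state["version"]), a quick sketch, with 'network.pkl' as a placeholder path:

with open('network.pkl', 'rb') as f:
    header = f.read(2)
# Pickles written with protocol 2 or higher start with the PROTO opcode
# (byte 0x80) followed by a byte giving the protocol number.
if header[:1] == b'\x80':
    print('pickle protocol:', header[1])
else:
    print('pickle protocol 0 or 1 (no PROTO opcode)')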
I prepare the dataset and save it as an HDF5 file. I have a custom data generator that subclasses Sequence from Keras and generates batches from the HDF5 file.
Now, when I call model.fit_generator with the training generator, the model uses the GPU and trains fast for the first 2 epochs (GPU memory is full and volatile GPU utilization fluctuates nicely around 50%). However, after the 3rd epoch, volatile GPU utilization is 0% and an epoch takes 20x as long.
What's going on here?
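For context, the kind of generator described above looks roughly like this (the file path, dataset keys and batch size are illustrative assumptions, and the Sequence import may be keras.utils.Sequence instead, depending on the setup):

import h5py
import numpy as np
from tensorflow.keras.utils import Sequence

class HDF5Sequence(Sequence):
    """Yields (x, y) batches read lazily from an HDF5 file."""
    def __init__(self, path, batch_size=32):
        self.h5 = h5py.File(path, 'r')
        self.x = self.h5['x']          # assumed dataset names
        self.y = self.h5['y']
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return np.array(self.x[sl]), np.array(self.y[sl])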
Can you try configuring the GPU as described in this guide: https://www.tensorflow.org/guide/gpu
Here is how I have done it in my program:
print("Runnning Jupyter Notebook using python version: {}".format(python_version()))
print("Running tensorflow version: {}".format(tf.keras.__version__))
print("Running tensorflow.keras version: {}".format(tf.__version__))
print("Running keras version: {}".format(keras.__version__))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.config.experimental.list_physical_devices('GPU')
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 2GB of memory on the first GPU
try:
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Virtual devices must be set before GPUs have been initialized
print(e)
Here is the output of the above code:
Running Jupyter Notebook using python version: 3.7.7
Running tensorflow version: 2.1.0
Running tensorflow.keras version: 2.2.4-tf
Running keras version: 2.3.1
Num GPUs Available: 1
1 Physical GPUs, 1 Logical GPUs
The value might differ; memory_limit=2048 is the amount of memory (in MB) allocated to the GPU device.
To confirm the allocation, use nvidia-smi. If you run with this config, Keras won't increase memory usage. You said that after 2 epochs it becomes very slow; can you tell us whether the kernel dies, the system hangs, or it restarts? The issue I have faced without this config is that the system just hangs. If you are running Ubuntu 18.04 LTS, open the System Monitor tool before executing all the cells in the notebook (it shows visually how many cores are being used; a steady periodic increase means something is wrong), and watch its Resources tab once you start.
Do:
A fresh run
Or Restart & Run All
Suspected Issue: How to prevent tensorflow from allocating the totality of a GPU memory?
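If a hard memory_limit cap isn't what you want, the usual remedy discussed in that linked question is to enable memory growth instead; a minimal TF 2.x sketch:

import tensorflow as tf

# Alternative to a fixed memory_limit: let TensorFlow grow its GPU memory
# usage on demand instead of grabbing everything up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)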
I hit the same error!
When you install tensorflow-gpu along with the NVIDIA toolkit, it provides only a limited amount of GPU memory (2GB in my case). Due to the memory leak, it eventually releases the GPU and falls back to using the CPU.
If you want to avoid this situation, use Google Colab, which provides about 36.7GB of GPU memory.
Recently I wrote a custom operator (and its gradient) in Python, following this post:
Tensorflow: Custom operation used in two networks simultaneously produces nan
TensorFlow runs with no errors and the prediction gives the expected accuracy. However, when I want to visualize the graph with TensorBoard, I find that I cannot open the subgraph to see its structure, yet its gradient subgraph can be opened and inspected. Does anyone have any idea about this problem?
Fig. 1: the fc1 subgraph cannot be opened, but gradient/fc1 can be.
I can open the fc1 metanode on TensorBoard 0.1.8.
Which version of TensorBoard are you using? You can find out by running:
python -c 'from tensorboard import version; print(version.VERSION)'
After that, could you try upgrading tensorboard via
pip install --upgrade tensorflow-tensorboard
and let me know if the issue persists?
I'm using Ubuntu 14.04 without a GPU and I want to run this code with the CPU only (not the GPU): https://github.com/smallcorgi/Faster-RCNN_TF. What should I do?
The GitHub repository you are referring to is a TensorFlow implementation of Faster-RCNN, not Caffe.
If you want to use the Caffe implementation, you have to use this repository: https://github.com/rbgirshick/py-faster-rcnn
You have to edit the Python scripts used to train and test the model, e.g. train_faster_rcnn_alt_opt.py, so that the line caffe.set_mode_gpu() is replaced by caffe.set_mode_cpu(). You might also have to recompile Caffe after editing the Makefile.config file to remove CUDNN and CUDA support.
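The change itself is tiny; a rough sketch of what it looks like inside those scripts (the exact placement varies from script to script):

import caffe

# In the original scripts the GPU backend is selected like this:
#   caffe.set_device(gpu_id)
#   caffe.set_mode_gpu()
# Replace those calls with the CPU backend:
caffe.set_mode_cpu()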
Note that Caffe on CPU will be very slow compared to GPU computing.
First of all, I'm a bit unsure whether I should have asked this on GitHub or here, but since I wasn't sure I opted to go with Stack Overflow.
I recently got an Nvidia GTX 1070 and wanted to try out TensorFlow with it. I'm using a fresh install of Ubuntu 16.04, the nvidia-367 driver from the "Graphics Drivers Team" PPA, nvidia-cuda-toolkit 7.5.18-0ubuntu1 and cuDNN v4 (Feb 10, 2016).
TensorFlow was installed according to https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html following the "Virtualenv installation", using this TF_BINARY_URL:
# Ubuntu/Linux 64-bit, GPU enabled, Python 2.7
# Requires CUDA toolkit 7.5 and CuDNN v4. For other versions, see "Install from sources" below.
(tensorflow)$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl
The first tutorial seems to work fine, and I've run a few other example models that also seem to work fine, but for some reason I'm getting an accuracy of about 9.5% in the "Deep MNIST for Experts" tutorial.
At first I thought I had made some error copy-pasting the code and spent some time trying to debug it, to no avail. Then I found this issue on GitHub, https://github.com/tensorflow/tensorflow/issues/2781, tried downloading that code, and didn't get anywhere close to 90% accuracy. I also tried fixing the bug in the code so the train step runs on every iteration, with no luck.
This is the output I get from running tut.py from the above-mentioned issue, modified to run train_step on each iteration of the loop:
$ python -i tut.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> conv_net()
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.46GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
step 0, training accuracy 0.14
step 100, training accuracy 0.1
step 200, training accuracy 0.16
step 300, training accuracy 0.12
step 400, training accuracy 0.1
step 500, training accuracy 0.08
[....]
step 19500, training accuracy 0.18
step 19600, training accuracy 0.06
step 19700, training accuracy 0.1
step 19800, training accuracy 0.12
step 19900, training accuracy 0.08
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 5.84GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
test accuracy 0.0954
I might also add that I'm fairly sure I ran this tutorial a while back using an older GPU without any issues, so I get the feeling that something about the Pascal architecture isn't supported properly. What's even stranger is that some of the more complex models, like the CNN and RNN tutorials/examples, seem to run fine.
Edit:
I installed the CPU version using
# Ubuntu/Linux 64-bit, CPU only, Python 2.7
(tensorflow)$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl
and running 1000 iterations (instead of 20000) gives this result:
$ python -i tut.py
>>> conv_net()
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.14
step 100, training accuracy 0.88
step 200, training accuracy 0.88
step 300, training accuracy 0.82
step 400, training accuracy 0.94
step 500, training accuracy 0.92
step 600, training accuracy 0.98
step 700, training accuracy 0.94
step 800, training accuracy 0.9
step 900, training accuracy 1
test accuracy 0.9648
Guess I'll try reinstalling from source with newer versions of "everything".
Installing newer versions of CUDA and cuDNN seems to have solved the issue. (I saw that the download page for cuDNN explicitly states that version 4 doesn't work with the GTX 1070/1080.)
What worked for me was:
Use the "Graphics Drivers Team" Ubuntu PPA to install the nvidia-367 drivers.
Install CUDA 8.0RC using the runfile, without installing the bundled driver. I tried the deb file, but there were issues with it wanting to install the bundled nvidia-361 driver. I never tried the third option (some tar.gz file, IIRC).
Install Bazel from source; again I had some issues with the custom apt repo due to a dependency on Java.
I used HEAD from TensorFlow's git repo, for no particular reason.
I ran into this issue (or something very similar). It was solved by switching to gcc-4.9 instead of the default (I only changed the path in TensorFlow's configure script). I have no idea why this works; it was something of a lucky guess.
I think I needed to install the zlib1g-dev package due to missing header files, but if so, the error message made it very clear that this was the issue.
Sorry for my mistake (I deleted my previous answer), but I've found the solution.
Check the following link: you have to join the NVIDIA developer program and download CUDA 8.0 (after installing CUDA 8.0, it is necessary to reinstall the NVIDIA driver!).
https://developer.nvidia.com/cuda-release-candidate-download