Does Tensorflow calculate these partial derivatives/gradient correctly? - tensorflow

I'm just starting to learn TensorFlow and came across an example that doesn't make sense to me:
>>> import tensorflow as tf
>>> a=tf.Variable(1.)
>>> b=2*a
>>> c=a+b
>>> g=tf.gradients(c, [a,b])
>>> sess=tf.Session()
2018-09-20 13:50:59.616341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-20 13:50:59.616400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
>>> print sess.run(g)
[3.0, 1.0]
Since c=3a, I expected the first partial (partial c with respect to a) to be 3.0. But, it is also true that c=1.5b, so I expected the second partial derivative to be 1.5, not 1.0.
On the other hand, if I do the following:
>>> b = tf.Variable(2.)
>>> a = 0.5*b
>>> c = a+b
>>> g = tf.gradients(c,[a,b])
I get this result:
>>> print sess.run(g)
[1.0, 1.5]
I have similar problems with this answer.
Additionally, I would think I'm looking for the same information about the same function at the same point with the same constraint in these two cases. I would expect the same answers.
Have I forgotten something truly embarrassing about partial derivatives or algebra? Or am I fundamentally misunderstanding something about what I can expect from TensorFlow gradients?
Is there something to do with the graph construction that ends up creating a situation where b depends on a, but a is independent of b? Or, is the true problem that gradients should only be taken with respect to variables that are strictly independent of each other?
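One way to reason about the two results, offered as a sketch and not as TensorFlow's actual implementation: a graph-based gradient sums the derivative along every path that leads from a given node to c, and each node in the list is treated as its own starting point. In the first graph b is built from a, so a reaches c both directly and through b, while b reaches c only directly. The Node class and the gradients helper below are hypothetical stand-ins written in plain Python; they are not part of TensorFlow.

```python
class Node:
    """A scalar node in a tiny computation graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local derivative of this node w.r.t. parent).
        self.parents = list(parents)

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def scale(self, k):
        return Node(k * self.value, [(self, k)])


def gradients(y, xs):
    """For each x in xs, sum d(y)/d(x) over every graph path from x to y."""
    def grad(node, target):
        if node is target:
            return 1.0
        # A leaf that is not the target contributes nothing (empty sum).
        return sum(local * grad(parent, target) for parent, local in node.parents)
    return [grad(y, x) for x in xs]


# First graph from the question: b is built from a.
a = Node(1.0)
b = a.scale(2.0)
c = a + b
first = gradients(c, [a, b])

# Second graph from the question: a is built from b.
b2 = Node(2.0)
a2 = b2.scale(0.5)
c2 = a2 + b2
second = gradients(c2, [a2, b2])

print(first, second)  # [3.0, 1.0] [1.0, 1.5]
```

Under this reading the two cases are different graphs, so the same function can yield different gradient lists: the dependency between a and b only exists in the direction in which it was constructed.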

Related

Converting tensorflow session-based code to distribute.Mirroredstrategy

I'm pretty new to Tensorflow and I'll be the first to admit I'm a bit confused and turned around and might very well be barking up the wrong tree.
First: This is NOT a question about getting my GPUs working and seen by TensorFlow (TF); I have verified from inside the container that the GPUs are detected by TF (using tensorflow/tensorflow:1.13.1-gpu-py3).
2020-02-20 22:24:25.233916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 22:24:25.233933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 4 5
2020-02-20 22:24:25.233939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y Y Y Y Y
2020-02-20 22:24:25.233943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N Y Y Y Y
2020-02-20 22:24:25.233947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: Y Y N Y Y Y
2020-02-20 22:24:25.233950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: Y Y Y N Y Y
2020-02-20 22:24:25.233954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 4: Y Y Y Y N Y
2020-02-20 22:24:25.233958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 5: Y Y Y Y Y N
2020-02-20 22:24:25.234135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7623 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7624 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1070, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 7624 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1070, pci bus id: 0000:04:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 7624 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1070, pci bus id: 0000:05:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 7624 MB memory) -> physical GPU (device: 4, name: GeForce GTX 1070, pci bus id: 0000:07:00.0, compute capability: 6.1)
2020-02-20 22:24:25.234949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 7624 MB memory) -> physical GPU (device: 5, name: GeForce GTX 1070, pci bus id: 0000:08:00.0, compute capability: 6.1)
Second: The code as-is does work; I have successfully run the training model, but of course it only used the first GPU.
The code I'm using is a GAN project and uses a 'with' block for training:
with tf.Session() as session:
    # Time stamp
    localtime = time.asctime( time.localtime(time.time()) )
    print("Starting TensorFlow session...")
    print("Local current time :", localtime)
    # Start TensorFlow session...
    session.run(tf.global_variables_initializer())
.
.
.
I've been going in circles (and crazy) trying to figure out how to use the recommended tf.distribute.MirroredStrategy() to do parallel training across my GPUs. Everything I've come across so far leads in circles or stops short of applicable examples.
Is there a straightforward way to modify the session code to use the mirrored strategy? Is there just a more basic way to get the session calls to train across multiple GPUs?
It's doable, but the short answer to this is no. There's no straightforward way.
Everything I've been able to find requires a good understanding of tensorflow and a fair amount of work to port a 'stock' tensorflow session over to multi GPU. It requires running multiple sessions with assigned GPUs and figuring out how to coordinate the training data between everything.
Switching to the newer strategy paradigm (or keras multigpu) requires figuring out how to express the learning model in layers vs sessions. Again, something that requires a pretty solid handle on TF.
If you're starting from scratch, I'd say look into keras or the mirrored strategy from the beginning.
Managing sessions and coordinating data is a bit of a headache (that's why there are nice wrappers now), but it's been done a lot.
If you're a beginner like me, let me save you some frustration: it's a big task. Either go a different way or buckle up, because there is no easy answer.
https://jhui.github.io/2017/03/07/TensorFlow-GPU/
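To make the "coordinating the training data" part concrete, here is a conceptual sketch of one synchronous data-parallel step: the batch is split into equal shards, each replica computes a gradient on its shard, and the averaged gradient updates a single shared weight. This is plain Python and not TensorFlow code; the toy model y = w * x, the function names, and the learning rate are all assumptions chosen for illustration, and the averaging line only stands in for the cross-GPU reduction a real setup would perform.

```python
def replica_gradient(w, shard):
    """Gradient of mean squared error for the toy model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, batch, num_replicas, lr=0.01):
    """One synchronous step: shard the batch, then average per-replica gradients."""
    shard_size = len(batch) // num_replicas  # assumes the batch divides evenly
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_replicas)]
    grads = [replica_gradient(w, shard) for shard in shards]  # one per "GPU"
    avg_grad = sum(grads) / num_replicas  # stands in for the cross-replica reduction
    return w - lr * avg_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # points on y = 2x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, num_replicas=2)
print(w)  # approaches 2.0
```

With equally sized shards, the averaged gradient equals the gradient on the whole batch, which is why every replica can keep an identical copy of the weights.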

Tensorflow low GPU utilization

I am running Windows 10 with a Core i7-8700 CPU and a GeForce GTX 1660 Ti GPU.
When training models, GPU utilization is very low (5-10% at most, sometimes lower).
This happens even when the network has five layers. CPU utilization, on the other hand, is 30% and above.
Please check the following:
1. The CUDA and cuDNN versions match. Judging by the utilization figures, TensorFlow may very well be using the CPU instead of the GPU while training. The code below item 2 lets you check whether your GPU is available.
2. If the first point is resolved, you may want to increase the batch_size, in case it is very small. It may be that TensorFlow allocates only a small share of the GPU to your training.
For step 1, to verify that the video card is both available and used, run the following lines of code:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
The output should contain, along with the result, the following information:
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

SageMaker fails when using Multi-GPU with keras.utils.multi_gpu_model

Running AWS SageMaker with a custom model, the TrainingJob fails with an Algorithm Error when using Keras plus a Tensorflow backend in multi-gpu configuration:
from keras.utils import multi_gpu_model
parallel_model = multi_gpu_model(model, gpus=K)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')
parallel_model.fit(x, y, epochs=20, batch_size=256)
This simple parallel model loading will fail. There is no further error or exception from CloudWatch logging. This configuration works properly on local machine with 2x NVIDIA GTX 1080, same Keras Tensorflow backend.
According to SageMaker documentation and tutorials the multi_gpu_model utility will work ok when Keras backend is MXNet, but I did not find any mention when the backend is Tensorflow with the same multi gpu configuration.
[UPDATE]
I have updated the code with the suggested answer below, and I'm adding some logging before the TrainingJob hangs
This logging repeats twice
2018-11-27 10:02:49.878414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-11-27 10:02:49.878462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 10:02:49.878471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2018-11-27 10:02:49.878477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y Y
2018-11-27 10:02:49.878481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y Y
2018-11-27 10:02:49.878486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N Y
2018-11-27 10:02:49.878492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: Y Y Y N
2018-11-27 10:02:49.879340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14874 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0)
2018-11-27 10:02:49.879486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 14874 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0)
2018-11-27 10:02:49.879694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:2 with 14874 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0)
2018-11-27 10:02:49.879872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:3 with 14874 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Before that, there is some logging info about each GPU, which repeats 4 times:
2018-11-27 10:02:46.447639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
According to the logging all the 4 GPUs are visible and loaded in the Tensorflow Keras backend. After that no application logging follows, the TrainingJob status is inProgress for a while, after that it becomes Failed with the same Algorithm Error.
Looking at CloudWatch logging I can see some metrics at work. Specifically GPU Memory Utilization, CPU Utilization are ok, while GPU utilization is 0%.
[UPDATE]
Due to a known Keras bug in saving a multi-GPU model, I'm using this override of the multi_gpu_model utility from keras.utils:
from keras.layers import Lambda, concatenate
from keras import Model
import tensorflow as tf
def multi_gpu_model(model, gpus):
    #source: https://github.com/keras-team/keras/issues/8123#issuecomment-354857044
    if isinstance(gpus, (list, tuple)):
        num_gpus = len(gpus)
        target_gpu_ids = gpus
    else:
        num_gpus = gpus
        target_gpu_ids = range(num_gpus)
    def get_slice(data, i, parts):
        shape = tf.shape(data)
        batch_size = shape[:1]
        input_shape = shape[1:]
        step = batch_size // parts
        if i == num_gpus - 1:
            size = batch_size - step * i
        else:
            size = step
        size = tf.concat([size, input_shape], axis=0)
        stride = tf.concat([step, input_shape * 0], axis=0)
        start = stride * i
        return tf.slice(data, start, size)
    all_outputs = []
    for i in range(len(model.outputs)):
        all_outputs.append([])
    # Place a copy of the model on each GPU,
    # each getting a slice of the inputs.
    for i, gpu_id in enumerate(target_gpu_ids):
        with tf.device('/gpu:%d' % gpu_id):
            with tf.name_scope('replica_%d' % gpu_id):
                inputs = []
                # Retrieve a slice of the input.
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_i = Lambda(get_slice,
                                     output_shape=input_shape,
                                     arguments={'i': i,
                                                'parts': num_gpus})(x)
                    inputs.append(slice_i)
                # Apply model on slice
                # (creating a model replica on the target device).
                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]
                # Save the outputs for merging back together later.
                for o in range(len(outputs)):
                    all_outputs[o].append(outputs[o])
    # Merge outputs on CPU.
    with tf.device('/cpu:0'):
        merged = []
        for name, outputs in zip(model.output_names, all_outputs):
            merged.append(concatenate(outputs,
                                      axis=0, name=name))
        return Model(model.inputs, merged)
This works OK on a local machine with 2x NVIDIA GTX 1080 / Intel Xeon / Ubuntu 16.04. It fails in a SageMaker Training Job.
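The batch-splitting arithmetic inside get_slice can be checked in isolation. The sketch below restates it with plain integers in place of tensors; slice_bounds is a hypothetical helper name, not part of Keras, and it assumes a one-dimensional view of the batch axis only.

```python
def slice_bounds(batch_size, i, parts):
    """Return (start, size) of the i-th of `parts` slices along the batch axis."""
    step = batch_size // parts
    # The last slice absorbs the remainder, as in get_slice.
    size = batch_size - step * i if i == parts - 1 else step
    return step * i, size


bounds = [slice_bounds(10, i, 4) for i in range(4)]
print(bounds)  # [(0, 2), (2, 2), (4, 2), (6, 4)]
```

Note that when the batch size is not divisible by the number of GPUs, the last replica receives a larger slice than the others, so its share of the work is uneven.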
I have posted this issue on the AWS SageMaker forum in:
TrainingJob custom algorithm with Keras backend and multi GPU
SageMaker Fails when using Multi-GPU with keras.utils.multi_gpu_model
[UPDATE]
I have slightly modified the tf.Session code, adding some initializers:
with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
and now at least I can see that one GPU (I assume device gpu:0) is used from the instance metrics. The multi-gpu does not work anyways.
This might not be the best answer for your problem, but this is what I am using for a multi-GPU model with the TensorFlow backend. First I initialize using:
def setup_multi_gpus():
    """
    Setup multi GPU usage
    Example usage:
    model = Sequential()
    ...
    multi_model = multi_gpu_model(model, gpus=num_gpu)
    multi_model.fit()
    About memory usage:
    https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
    """
    import tensorflow as tf
    from keras.utils.training_utils import multi_gpu_model
    from tensorflow.python.client import device_lib
    # IMPORTANT: Tells tf to not occupy a specific amount of memory
    from keras.backend.tensorflow_backend import set_session
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
    sess = tf.Session(config=config)
    set_session(sess)  # set this TensorFlow session as the default session for Keras.
    # getting the number of GPUs
    def get_available_gpus():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos if x.device_type == 'GPU']
    num_gpu = len(get_available_gpus())
    print('Amount of GPUs available: %s' % num_gpu)
    return num_gpu
Then I call
# Setup multi GPU usage
num_gpu = setup_multi_gpus()
and create a model.
...
After which you're able to make it a multi GPU model.
multi_model = multi_gpu_model(model, gpus=num_gpu)
multi_model.compile...
multi_model.fit...
The only thing here that is different from what you are doing is the way TensorFlow initializes the GPUs. I can't imagine it being the problem, but it might be worth trying out.
Good luck!
Edit: I noticed sequence to sequence not being able to work with multi GPU. Is that the type of model you are trying to train?
I apologize for the slow response.
It seems there are a lot of threads that are running in parallel, and I want to link them together, so that other individuals who have the same issue can see the progress and discussion going on.
https://forums.aws.amazon.com/thread.jspa?messageID=881541
https://forums.aws.amazon.com/thread.jspa?messageID=881540
https://github.com/aws/sagemaker-python-sdk/issues/512
There are a few questions regarding this.
What versions of TensorFlow and Keras are you using?
I am not too sure what is causing this problem. Does your container have all of the needed dependencies, such as CUDA? https://www.tensorflow.org/install/gpu
Were you able to train using single GPU with Keras?

Tensorflow can not run integer matrix multiplication on GPU

Take a look at this example, where I attempt to multiply two tf.int32 matrices using my GPU.
import tensorflow as tf
matrix1 = tf.constant([[3,3]])
matrix2 = tf.constant([[2],[2]])
with tf.device("/gpu:0"):
    product = tf.matmul(matrix1,matrix2)
with tf.Session() as sess:
    result = sess.run(product)
    print(result)
It is similar to the example found on https://www.tensorflow.org/versions/r0.10/get_started/basic_usage.html
I get the output:
...
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 213.62MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: MatMul = MatMul[T=DT_INT32, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](Const, Const_1)]]
Why can't I perform a matrix multiplication on the GPU? I can fix this by using allow_soft_placement = True, but I would like to do this on the GPU.
Integer multiplication is currently not implemented for the GPU in TensorFlow, and your matrices matrix1 and matrix2 have type tf.int32. (It turns out that it is easy to implement but, for various reasons discussed in this answer, TensorFlow doesn't include op registrations for tf.int32 on GPU devices.)
Assuming you are actually interested in multiplying (much larger) floating-point matrices, you can change your program to:
import tensorflow as tf
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])
with tf.device("/gpu:0"):
    product = tf.matmul(matrix1,matrix2)
with tf.Session() as sess:
    result = sess.run(product)
    print(result)
...and the multiplication will execute on your GPU.

Tensorflow issue with GPU on matmul. GPU isn't recognized

I installed TensorFlow with GPU support, CUDA 7.0 and cuDNN 6.5. When I import tensorflow it works well.
I am trying to run a simple matrix multiplication in TensorFlow, and it doesn't want to use my GPU even though it seems to recognize it. I have this issue on my computer with an NVIDIA GeForce 970M and on a cluster with two Titan Z cards.
My first code is:
import tensorflow as tf
import numpy as np
size=100
#I create 2 matrix
mat1 = np.random.random_sample([size, size])*100
mat2 = np.random.random_sample([size, size])*100
a = tf.constant(mat1)
b = tf.constant(mat2)
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(c)
This code works and the result is :
Const_1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] Const_1: /job:localhost/replica:0/task:0/gpu:0
Const: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] Const: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul: /job:localhost/replica:0/task:0/cpu:0
So, as far as I can tell, TensorFlow uses my GPU to create the constants but not for the matmul, which is weird. Then I force the GPU like this:
with tf.device("/gpu:0"):
    a = tf.constant(mat1)
    b = tf.constant(mat2)
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(c)
And Tensorflow returns :
InvalidArgumentError: Cannot assign a device to node 'MatMul': Could not satisfy explicit device specification '/gpu:0'
If someone has the same problem or an idea, I will be glad to read your answer!
I do not have enough reputation to comment. I have come across a similar issue; my question is here:
TensorFlow: critical graph operations assigned to cpu rather than gpu