Using tf.custom_gradient in tensorflow r1.8 - tensorflow

System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Y
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): r1.8
Python version: 2.7.14
GCC/Compiler version (if compiling from source): 5.4
CUDA/cuDNN version: 8.0/7.0
GPU model and memory: GTX1080, 8G
Bazel version: N/A
Exact command to reproduce: python test_script.py
Describe the problem
Hello, I'm trying to build an op with a custom gradient using tf.custom_gradient. I wrote my test code based on the online API documentation. However, it seems there is a problem with the custom_gradient function. Thanks!
Source code / logs
import tensorflow as tf
import numpy as np

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad

x = tf.constant(100.)
f = tf.custom_gradient(log1pexp)
y, dy = f(x)
sess = tf.Session()
print(y.eval(session=sess), y.eval(session=sess).shape)
File "/home/local/home/research/DL/unit_tests/tf_test_custom_grad.py", line 14, in <module>
y, dy = f(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
return _graph_mode_decorator(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/custom_gradient.py", line 132, in _graph_mode_decorator
result, grad_fn = f(*args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 439, in __iter__
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.

If you just want to test the code from the documentation, here is the way.
The following code gives the unstable result [nan]:
import tensorflow as tf

def log1pexp(x):
    return tf.log(1 + tf.exp(x))

x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x)

with tf.Session() as sess:
    print(sess.run(dy))
And the following code will give the correct result [1.0]:
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad

x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x)

with tf.Session() as sess:
    print(sess.run(dy))
Details:
The main problem here is that you are trying to decorate log1pexp twice in your code: once with @tf.custom_gradient and once with f = tf.custom_gradient(log1pexp). In Python, @tf.custom_gradient is equivalent to log1pexp = tf.custom_gradient(log1pexp). You should do this only once, especially here, for the following reason.
tf.custom_gradient needs to call the function passed to it to get both the function output and the gradient, i.e. it expects two return values. During decoration, everything works as expected because log1pexp returns tf.log(1 + e) and grad. After decoration, log1pexp (the function returned by tf.custom_gradient) is a new function that returns only one tensor, tf.log(1 + e). When you then call f = tf.custom_gradient(log1pexp) on the already-decorated function, tf.custom_gradient gets only one return value, the single tensor tf.log(1 + e). It tries to split that tensor into two by iterating over it, which is not allowed, as the error message states:
Tensor objects are not iterable when eager execution is not enabled.
You should not decorate log1pexp twice anyway, but this is why you got this error. One more thing to mention: your code would trigger another error for the same reason even if you removed @tf.custom_gradient. After removing @tf.custom_gradient, the line f = tf.custom_gradient(log1pexp) works as expected, but f is a function returning only one tensor, so y, dy = f(x) is wrong and will not work.
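For completeness, here is a minimal sketch of the non-decorator variant described above (decorate exactly once and get the gradient from tf.gradients rather than by unpacking):
import tensorflow as tf

def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad

f = tf.custom_gradient(log1pexp)  # decorate exactly once
x = tf.constant(100.)
y = f(x)                          # f returns a single tensor, not (y, dy)
dy = tf.gradients(y, x)           # the gradient still comes from tf.gradients

with tf.Session() as sess:
    print(sess.run([y, dy]))      # expected: [100.0, [1.0]]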

Related

Converting a fully connected neural network with variable number of hidden layers from tensorflow to pytorch

I recently started learning PyTorch and I am trying to convert part of a large script, including an MLP with a variable number of hidden layers, from TensorFlow to PyTorch.
import tensorflow as tf

### Base neural network
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w': [], 'b': []}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(tf.Variable(tf.random_normal([n_in, n_out], stddev=std)))
        params['b'].append(tf.Variable(tf.mul(bias_init, tf.ones([n_out,]))))
    return params

def mlp(X, params):
    h = [X]
    for w, b in zip(params['w'][:-1], params['b'][:-1]):
        h.append(tf.nn.relu(tf.matmul(h[-1], w) + b))
        #h.append( tf.nn.tanh( tf.matmul(h[-1], w) + b ) )
    return tf.matmul(h[-1], params['w'][-1]) + params['b'][-1]

def compute_nll(x, x_recon_linear):
    return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)

def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
    d = (mean_post - mean_prior)
    d = tf.mul(d, d)
    return tf.reduce_sum(-tf.div(d + tf.mul(std_post, std_post), (2.*std_prior*std_prior)) - tf.log(std_prior*2.506628), reduction_indices=1, keep_dims=True)
How could I similarly define the weight and bias variables and attach them to each hidden layer in PyTorch?
How could I convert the gauss_cross_entropy and compute_nll functions as well (finding the equivalent syntax)?
Are these two pieces of code compatible?
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as func
from torch.distributions import Normal, Categorical, Independent
from copy import

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)
net.to(device)

def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = {'w': [], 'b': []}
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params['w'].append(torch.tensor(Normal([n_in, n_out], torch.tensor([std])), requires_grad=True))
        params['b'].append(torch.tensor(torch.mul(bias_init, torch.ones([n_out,])), requires_grad=True))
    return params

def mlp(X, params):
    h = [X]
    for w, b in zip(params['w'][:-1], params['b'][:-1]):
        h.append(torch.nn.ReLU(tf.matmul(h[-1], w) + b))
    return torch.matmul(h[-1], params['w'][-1]) + params['b'][-1]

def compute_nll(x, x_recon_linear):
    return torch.sum(func.binary_cross_entropy_with_logits(x_recon_linear, x), reduction_indices=1, keep_dims=True)

def gauss_cross_entropy(mu_post, sigma_post, mu_prior, sigma_prior):
    d = (mu_post - mu_prior)
    d = torch.mul(d, d)
    return torch.sum(-torch.div(d + torch.mul(sigma_post, sigma_post), (2.*sigma_prior*sigma_prior)) - torch.log(sigma_prior*2.506628), reduction_indices=1, keep_dims=True)
What is the substitute function for tf.placeholder in pytorch? For instance here:
class VAE(object):
    def __init__(self, hyperParams):
        self.X = tf.placeholder("float", [None, hyperParams['input_d']])
        self.prior = hyperParams['prior']
        self.K = hyperParams['K']
        self.encoder_params = self.init_encoder(hyperParams)
        self.decoder_params = self.init_decoder(hyperParams)
And also, how should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))?
How could I define similar weight and bias variables and attach them to each hidden layer in PyTorch?
An easier way to define those is to create a list containing the params as (weight, bias) tuples:
def init_mlp(layer_sizes, std=.01, bias_init=0.):
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        params.append([
            nn.init.normal_(torch.empty(n_in, n_out)).requires_grad_(True),
            torch.empty(n_out).fill_(bias_init).requires_grad_(True)])
    return params
Above, I define my parameters as 'empty' (uninitialized) tensors with torch.empty. I used in-place functions such as nn.init.normal_ (many others are available) and torch.Tensor.fill_ to fill the tensors with an arbitrary value (maybe it is .mul_(bias_init) you are looking for, based on your TensorFlow sample?).
For the inference code, you don't actually need to store the intermediate layer results:
def mlp(x, params):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = torch.relu(x)
    return x
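For illustration only, a quick usage sketch of the two helpers above; the layer sizes and batch size here are arbitrary assumptions, not values from the question:
params = init_mlp([784, 256, 128, 10])   # hypothetical layer sizes
x = torch.randn(32, 784)                 # a fake batch of 32 flattened inputs
out = mlp(x, params)                     # -> shape (32, 10)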
How could I convert gauss_cross_entropy and compute_nll functions as well (finding equivalent syntax)?
You can use PyTorch functions and mathematical operators to define your logic. For compute_loss you were using the built-in loss function, which actually does not require a summation after it; by default the losses of the batch elements are averaged:
import torch.nn.functional as F

def compute_loss(y_pred, y_true):
    return F.binary_cross_entropy_with_logits(y_pred, y_true)
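The gauss_cross_entropy conversion is not covered by a built-in loss; a hedged, line-for-line translation of the TensorFlow version might look like this (dim and keepdim replace reduction_indices and keep_dims):
def gauss_cross_entropy(mean_post, std_post, mean_prior, std_prior):
    d = mean_post - mean_prior
    d = d * d
    # dim=1 / keepdim=True mirror reduction_indices=1 / keep_dims=True
    return torch.sum(
        -(d + std_post * std_post) / (2. * std_prior * std_prior)
        - torch.log(std_prior * 2.506628),
        dim=1, keepdim=True)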
What is the substitute function for tf.placeholder in PyTorch?
You don't have placeholders in PyTorch: you compute your outputs explicitly using PyTorch operators, and then you can backpropagate through those operators and get the gradients for each parameter.
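As a rough sketch only (not the original VAE, and with the encoder/decoder construction elided), the class from the question could be reorganized so the data tensor is passed to forward() instead of being declared up front:
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, hyperParams):
        super().__init__()
        self.prior = hyperParams['prior']
        self.K = hyperParams['K']
        # build encoder/decoder parameters here (init_encoder/init_decoder elided)

    def forward(self, x):
        # x plays the role of the old self.X placeholder;
        # encoding/decoding is omitted in this sketch, x is returned unchanged
        return x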
How should I change tf.shape in this line: tf.random_normal(tf.shape(self.sigma[-1]))?
The function tf.shape returns the shape of a tensor; in PyTorch you access torch.Tensor.shape or call torch.Tensor.size(), i.e. self.sigma[-1].shape or self.sigma[-1].size().
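For example, a small sketch of that particular line (assuming sigma is a torch.Tensor standing in for self.sigma[-1]):
noise = torch.randn(sigma.shape)   # mirrors tf.random_normal(tf.shape(...))
# or, more idiomatically:
noise = torch.randn_like(sigma)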

How can I update tensor (weight value) trying to use two separate network?

I've been trying to make an AI for blackjack using RL. Now I'm trying to use two separate networks, which is one approach to DQN. I've searched the web, found an approach, and tried to use it, but it failed.
This error occurred:
TypeError: Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Code:
import gym
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

def one_hot(x):
    s = np.identity(600)
    b = s[x[0] * 20 + x[1] * 2 + x[2]]
    return b.reshape(1, 600)

def boolstr_to_floatstr(v):
    if v == True:
        return 1
    elif v == False:
        return 0

env = gym.make('Blackjack-v0')
learning_rate = 0.5
state_number = 600
action_number = 2

#######################################
X = tf.placeholder(tf.float32, shape=[1, state_number], name='input_data')
W1 = tf.Variable(tf.random_uniform([state_number, 128], 0, 0.01))  # network for update
layer1 = tf.nn.tanh(tf.matmul(X, W1))
W2 = tf.Variable(tf.random_uniform([128, 256], 0, 0.01))
layer2 = tf.nn.tanh(tf.matmul(layer1, W2))
W3 = tf.Variable(tf.random_uniform([256, action_number], 0, 0.01))
Qpred = tf.matmul(layer2, W3)  # Q prediction

#####################################################################
X1 = tf.placeholder(shape=[1, state_number], dtype=tf.float32)
W4 = tf.Variable(tf.random_uniform([state_number, 128], 0, 0.01))  # network for target
layer3 = tf.nn.tanh(tf.matmul(X1, W4))
W5 = tf.Variable(tf.random_uniform([128, 256], 0, 0.01))
layer4 = tf.nn.tanh(tf.matmul(layer3, W5))
W6 = tf.Variable(tf.random_uniform([256, action_number], 0, 0.01))
target = tf.matmul(layer4, W6)  # target

#################################################################
update1 = W4.assign(W1)
update2 = W5.assign(W2)
update3 = W6.assign(W3)

Y = tf.placeholder(shape=[1, action_number], dtype=tf.float32)
loss = tf.reduce_sum(tf.square(Y - Qpred))  # cost(W) = (Ws - y)^2
train = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

num_episodes = 1000
dis = 0.99  # discount factor
rList = []  # record the reward
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for i in range(num_episodes):  # run the episodes
        s = env.reset()
        rALL = 0
        done = False
        e = 1. / ((i / 100) + 1)  # constant for the exploit/explore trade-off
        total_loss = []
        while not done:
            s = np.asarray(s)
            s[2] = boolstr_to_floatstr(s[2])
            #print(np.shape(one_hot(s)))
            #print(one_hot(s))
            Qs = sess.run(Qpred, feed_dict={X: one_hot(s).astype(np.float32)})
            if np.random.rand(1) < e:  # explore a new action
                a = env.action_space.sample()
            else:
                a = np.argmax(Qs)  # otherwise pick the action with the highest known value
            s1, reward, done, _ = env.step(a)
            s1 = np.asarray(s1)
            s1[2] = boolstr_to_floatstr(s1[2])
            if done:
                Qs[0, a] = reward
            else:
                Qs1 = sess.run(target, feed_dict={X1: one_hot(s1)})
                Qs[0, a] = reward + dis * np.max(Qs1)  # optimal Q
            sess.run(train, feed_dict={X: one_hot(s), Y: Qs})
            if i % 10 == 0:  # update the target network from the Q-prediction network
                sess.run(update1, update2, update3)
            if reward == 1:
                rALL += reward
            else:
                rALL += 0
            s = s1
        rList.append(rALL)

print('success rate: ' + str(sum(rList) / num_episodes))
print("Final Q-table values")
I need to print the success rate at the end. Before DQN it was around 38%. If there is something wrong in my code with respect to the DQN algorithm, please tell me.
If you want to share the weights between different networks, then simply create the layers with the same name, using the scope with tf.variable_scope(self.name, reuse=tf.AUTO_REUSE):, and the weights between the networks will be shared automatically.
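A minimal sketch of that pattern, not the asker's exact architecture (the layer sizes here are illustrative):
import tensorflow as tf

def q_network(x, name):
    # both calls below use the same scope name, so tf.AUTO_REUSE makes the
    # second call reuse the variables created by the first one
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        h1 = tf.layers.dense(x, 128, activation=tf.nn.tanh, name='h1')
        h2 = tf.layers.dense(h1, 256, activation=tf.nn.tanh, name='h2')
        return tf.layers.dense(h2, 2, name='q')

X = tf.placeholder(tf.float32, [1, 600])
q_a = q_network(X, 'qnet')   # creates the variables
q_b = q_network(X, 'qnet')   # reuses the same variables instead of making new ones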

Tensorflow AssertionError "gradients list should have been aggregated by now"

I have a function f that internally uses some tf.while_loops and tf.gradients to compute the value y = f(x). Something like this:
def f( x ):
    ...
    def body( g, x ):
        # Compute the gradient here
        grad = tf.gradients( g, x )[0]
        ...
        return ...
    return tf.while_loop( cond, body, parallel_iterations=1 )
There are a few hundred lines of code. But I believe that those are the important points...
Now when I evaluate f(x), I get exactly the value I expect:
y = known output of f(x)

with tf.Session() as sess:
    fx = f(x)
    print("Error = ", y - sess.run(fx, feed_dict))  # Prints 0
However, when I try to evaluate the gradient of f(x) with respect to x, that is,
grads = tf.gradients( fx, x )[0]
I get the error
AssertionError: gradients list should have been aggregated by now.
Here is the full trace:
File "C:/Dropbox/bob/tester.py", line 174, in <module>
grads = tf.gradients(y, x)[0]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 649, in gradients
return [_GetGrad(grads, x) for x in xs]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 649, in <listcomp>
return [_GetGrad(grads, x) for x in xs]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 727, in _GetGrad
"gradients list should have been aggregated by now.")
AssertionError: gradients list should have been aggregated by now.
Could somebody please outline likely causes for this error? I have no idea where to even start looking for the issue...
Some observations:
Note that I have set the parallel iterations for the while loop to 1. This should mean that there are no errors due to reading and writing from multiple threads.
If I discard the while loop, and just have f return body(), then the code runs:
# The following does not crash, but we removed the while_loop, so the output is incorrect
def f( x ):
    ...
    def body( g, x ):
        # Compute the gradient here
        grad = tf.gradients( g, x )[0]
        ...
        return ...
    return body(...)
Obviously, the output is incorrect, but at least the gradients are computed.
I came across a similar issue. Some patterns I noted:
If the x used in tf.gradients was used in a manner that required dimension broadcasting in body, I got this error. If I changed it to one that didn't require broadcasting, tf.gradients returned [None]. I didn't test this extensively, so this pattern may not be consistent across all examples.
Both cases (returning [None] and raising this assertion error) can be resolved by differentiating tf.identity(y) rather than just y, i.e. grads = tf.gradients(tf.identity(y), xs). I have absolutely no idea why this works.
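For illustration, a minimal sketch of that workaround applied to the setup above (f, x and feed_dict are the question's own objects):
fx = f(x)
grads = tf.gradients(tf.identity(fx), [x])   # instead of tf.gradients(fx, [x])

with tf.Session() as sess:
    print(sess.run(grads, feed_dict))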

Tensorflow - running total

How can I add the number 5 after every iteration of the loop?
I want to do something like this:
weight = 0.225

for i in range(10):
    weight += 5
    print (weight)
Here is how I am trying to do it in TensorFlow, but it never updates the weight:
import tensorflow as tf

def dummy(x):
    weights['h0'] = tf.add(weights['h0'], 5)
    res = tf.add(weights['h0'], x)
    return res

# build computational graph
a = tf.placeholder('float', None)

weights = {
    'h0': tf.Variable(tf.random_normal([1]))
}

d = dummy(a)

# initialize variables
init = tf.global_variables_initializer()

# create session and run the graph
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        print (sess.run(d, feed_dict={a: [2]}))

# close session
sess.close()
There's an operation explicitly created for adding a value and assigning the result back to the input node: tf.assign_add.
You should use it instead of tf.assign + tf.add.
More importantly, you should understand why your previous code doesn't work.
weights['h0'] = tf.add(weights['h0'], 5)
res = tf.add(weights['h0'], x)
At the first line, you're defining an add node whose inputs are weights['h0'] and 5, and you're assigning this node to the Python variable weights['h0'].
Thus, weights['h0'] is now a Python variable holding a TensorFlow node.
In the next line, you're defining another add node, between the previous node and x, and you return this node.
When the graph is evaluated, you evaluate the node pointed to by res, which forces the evaluation of the previous node (because res is a function of the node held by weights['h0']).
The problem is that your assignment at the first line is a Python assignment and not a TensorFlow assignment.
That assignment is executed only in the Python environment; it does not define an assign node in the TensorFlow graph.
P.S.: when you use with, you're defining a context manager that handles the closing operations for you. You can thus remove sess.close(), because it is executed automatically when you exit that context.
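A minimal sketch of the tf.assign_add suggestion applied to the code above (the surrounding graph structure is assumed, not the asker's exact script):
import tensorflow as tf

a = tf.placeholder(tf.float32, None)
weights = {'h0': tf.Variable(tf.random_normal([1]))}

# tf.assign_add updates the variable in place and returns the updated value,
# so every sess.run(d, ...) really advances the running total by 5
updated_h0 = tf.assign_add(weights['h0'], [5.0])
d = tf.add(updated_h0, a)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10):
        print(sess.run(d, feed_dict={a: [2]}))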
Apparently there is an assign operator
https://www.tensorflow.org/api_docs/python/tf/assign
weights['h0'] = tf.assign(weights['h0'], tf.add(weights['h0'], 5))

How can I implement a Binarizer Layer in TensorFlow?

I'm trying to implement the binarizer on page 4 of this paper. It's not too difficult a function. It's simply this:
No gradients are to be backpropagated for this function. I'm trying to do it in TensorFlow. There are two ways to go about it:
Implementing it in C++ using TensorFlow. However, the instructions are quite unclear to me. It would be great if someone could walk me through it. One thing I was unclear about was why the gradient for ZeroOutOp is implemented in Python.
I decided to go with the pure Python approach.
Here's the code:
import tensorflow as tf
import numpy as np

def py_func(func, inp, out_type, grad):
    grad_name = "BinarizerGradients_Schin"
    tf.RegisterGradient(grad_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": grad_name}):
        return tf.py_func(func, inp, out_type)

'''
This is a hackish implementation to speed things up. Doesn't directly follow the formula.
'''
def _binarizer(x):
    probability_matrix = (x + 1) / float(2)
    probability_matrix = np.matrix.round(probability_matrix, decimals=0)
    np.putmask(probability_matrix, probability_matrix == 0.0, -1.0)
    return probability_matrix

def binarizer(x):
    return py_func(_binarizer, [x], [tf.float32], _BinarizerNoOp)

def _BinarizerNoOp(op, grad):
    return grad
The problem happens here. Inputs are 32x32x3 CIFAR images and they get reduced to 4x4x64 in the last layer. My last layer has a shape of (?, 4, 4, 64), where ? is the batch size. After putting it through this by calling:
binarized = binarizer.binarizer(h_pool3)
h_deconv1 = tf.nn.conv2d_transpose(h_pool3, W_deconv1, output_shape=[batch_size, img_height/4, img_width/4, 64], strides=[1,2,2,1], padding='SAME') + b_deconv1
The following error occurs:
ValueError: Shapes (4, 4, 64) and (?, 4, 4, 64) are not compatible
I can kinda guess why this happens. The ? represents the batch size and after putting the last layer through the binarizer, the ? dimension seems to disappear.
I think you can proceed as described in this answer. Applied to our problem:
def binarizer(input):
    prob = tf.truediv(tf.add(1.0, input), 2.0)
    bernoulli = tf.contrib.distributions.Bernoulli(p=prob, dtype=tf.float32)
    return 2 * bernoulli.sample() - 1
Then, where you setup your network:
W_h1, bias_h1 = ...
h1_before_bin = tf.nn.tanh(tf.matmul(x, W_h1) + bias_h1)
# The interesting bits:
t = tf.identity(h1_before_bin)
h1 = t + tf.stop_gradient(binarizer(h1_before_bin) - t)
However, I'm not sure how to verify that this works...
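One way to sanity-check it, sketched here with a small dense layer instead of the question's conv net (the shapes are assumptions, and binarizer is the function defined above): the gradient through the binarized activation should be defined and should match the gradient of the un-binarized path, since the binarizer's contribution is wrapped in tf.stop_gradient.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 8])        # toy shapes, not the CIFAR setup
W_h1 = tf.Variable(tf.random_normal([8, 4]))
bias_h1 = tf.Variable(tf.zeros([4]))

h1_before_bin = tf.nn.tanh(tf.matmul(x, W_h1) + bias_h1)
t = tf.identity(h1_before_bin)
h1 = t + tf.stop_gradient(binarizer(h1_before_bin) - t)

# gradient through the binarized path vs. the raw tanh path
grad_bin = tf.gradients(tf.reduce_sum(h1), W_h1)[0]
grad_raw = tf.gradients(tf.reduce_sum(h1_before_bin), W_h1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g1, g2 = sess.run([grad_bin, grad_raw], {x: np.random.randn(5, 8)})
    print(np.allclose(g1, g2))   # True => gradients pass straight through the binarizer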