Apologies beforehand if these are basic questions - I've done some digging around and can't find straightforward answers. If there are links to any resources that would help, that would be great too!
I'm currently looking at the piece of code below, from a class that implements optimizer.Optimizer. I'm having trouble understanding the following things:
def _create_slots(self, var_list):
    # Create slots for the first and second moments.
    for v in var_list:
        self._zeros_slot(v, "m", self._name)

def _apply_dense(self, grad, var):
    lr_t = math_ops.cast(self._lr_t, var.dtype.base_dtype)
    beta_t = math_ops.cast(self._beta_t, var.dtype.base_dtype)
    alpha_t = math_ops.cast(self._alpha_t, var.dtype.base_dtype)
    eps = 1e-7  # cap for the moving average
    m = self.get_slot(var, "m")
    m_t = m.assign(tf.maximum(beta_t * m + eps, tf.abs(grad)))
    var_update = state_ops.assign_sub(var, lr_t * grad * (1.0 + alpha_t * tf.sign(grad) * tf.sign(m_t)))
    # Create an op that groups multiple operations.
    # When this op finishes, all ops in its inputs have finished.
    return control_flow_ops.group(*[var_update, m_t])
In _create_slots, is it allocating a variable "m" for each weight? And if I needed more variables, would I just add another line inside the for loop, like self._zeros_slot(v, "another_variable", self._name), and so on? (See the sketch after these questions.)
Are the inputs grad and var in _apply_dense the gradient and the variable for each weight? What if I needed other gradients, e.g. if I wanted to do a global update based on the whole gradient matrix?
How is m being updated with new weights? It seems like on every iteration it would just be multiplied by beta.
In the line with var_update, it seems like var is being updated with the weight update; can I also read my variables back from var?
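To make the slot question concrete, this is roughly the pattern I am imagining (the second slot name "n" is just a made-up placeholder):

def _create_slots(self, var_list):
    # My guess: one "m" slot is created per trainable variable, and any extra
    # per-variable accumulator just needs one more _zeros_slot call.
    for v in var_list:
        self._zeros_slot(v, "m", self._name)
        self._zeros_slot(v, "n", self._name)  # hypothetical second slot

def _apply_dense(self, grad, var):
    m = self.get_slot(var, "m")
    n = self.get_slot(var, "n")  # retrieved per variable, same as "m"
    # ... then build the update from m and n as in the code above ...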
I am trying to achieve the following: compute the losses of the previous 25 predictions and sum them before computing the gradient. I have tried this:
loss_summation = tf.Variable(0, dtype=tf.dtypes.float32, name="loss")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=next_element[1], logits=logits2, name="xentropy")
loss = tf.math.reduce_sum(tf.reduce_mean(xentropy, name="loss"))
loss_summation = tf.assign(loss_summation, loss_summation + loss)
optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
gvs = optimizer.compute_gradients(loss_summation, [vars])

with tf.Session() as sess:
    for i in range(25):
        b = sess.run([loss_summation])
However, optimizer.compute_gradients() complains that "None values not supported". How can I get around this?
I am actually trying to implement the following function (the feedforward pass of an LSTM) in TensorFlow to predict the next word given the previous ones:
def feedforward(self, x_s, hpre, targets, p_s):
    fts, its, gts, css, ots, output, inputs = [], [], [], [], [], [], []
    losses = []
    hprev = hpre
    hts = [hprev]
    loss = 0
    losses = []
    previous_state = p_s
    css.append(previous_state)
    for x, y in zip(x_s, targets):
        k = np.zeros((self.vocab_size, 1))
        k[x] = 1
        M_c = np.row_stack((hprev, k))
        ft = self.sigmoid(np.dot(self.W1, M_c) + self.b1)
        fts.append(ft)
        it = self.sigmoid(np.dot(self.W2, M_c) + self.b2)
        its.append(it)
        gt = np.tanh(np.dot(self.W3, M_c) + self.b3)
        gts.append(gt)
        cs = (ft * previous_state) + (it * gt)
        previous_state = cs
        css.append(cs)
        ot = self.sigmoid(np.dot(self.W4, M_c) + self.b4)
        ots.append(ot)
        ht = ot * np.tanh(cs)
        hts.append(ht)
        yt = self.softmax(np.dot(self.W5, ht) + self.b5)
        hprev = ht
        output.append(yt)
        inputs.append(M_c)
        loss += -np.log(yt[y])
        losses.append(loss)
    return fts, its, gts, css, ots, output, hts, loss, hts[-1], css[-1], inputs
x_s is a list of integers representing words.
x_s=[0,1,2,3,4,5,6,7,8....,24]
targets is the list of expected integers, i.e. if the current element of x_s is 0 then the next word is 1
targets=[1,2,3,4,5,6,7,8,9...,25]
The loss, which is a summation of the 25 per-step losses, is what will be minimized.
There are a few things you need to address here:
Is there a good reason not to just use larger batches? Are you trying to implement the lookahead optimizer or something?
You look like you're getting started with TensorFlow. Consider turning on eager execution with tf.enable_eager_execution(). TensorFlow 2.0 is coming soon; don't waste your time messing with tf.Session.
Variables are not differentiable, so accumulating the losses in a variable doesn't make sense.
I would make a copy of all the model's variables and accumulate new values there. Then, after N iterations, assign those values back to the model. Something like:
model = tf.keras.Sequential(...)
vars = model.trainable_variables

# Accumulators: one copy of each trainable variable.
weight_acc = [tf.Variable(var) for var in vars]

for n, (batch, label) in enumerate(dataset):
    with tf.GradientTape() as tape:
        pred = model(batch)
        loss = cal_loss(pred, label)
    grads = tape.gradient(loss, vars)

    # Accumulate the update into the copies.
    for g, a in zip(grads, weight_acc):
        a.assign_add(learning_rate * g)

    # Every 25 steps, pull the model's variables toward the accumulated values.
    if n % 25 == 0:
        for a, v in zip(weight_acc, vars):
            v.assign_add(lookahead_fraction * (a - v))
I am trying to create a custom gradient in TensorFlow to implement the exponentially smoothed (unbiased) gradient of a logarithm that is suggested in this paper (https://arxiv.org/pdf/1801.04062.pdf). What I need to do is create a new variable that stores an exponentially smoothed value, which is updated and used in a custom gradient function. Additionally, I need a flag that tells me when the first gradient calculation is being done, so I can initialize the exponentially smoothed value to the appropriate (data-dependent) value. Furthermore, the output of the custom gradient function must be just the gradient, so it will be a pain in the butt to access the output of a tf.assign from inside the custom gradient. Lastly, I do not want to create a second operation that 'manually' initializes the exponential smoothing by running it separately in my training loop. Anyway, this is all too complicated, so I have an abstract but simple problem outlined below, the solution to which would solve my problem:
What I need to be able to do is update one variable in a manner that is conditional on a second one, and furthermore I need to update the second variable without providing it as an explicit output of my function. Example code demonstrating my problem is below:
import tensorflow as tf

a = tf.get_variable(name="test", initializer=True)
b = tf.get_variable(name="testval", initializer=10.)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

def make_function(inp):
    with tf.variable_scope("", reuse=True):
        a = tf.get_variable(name="test", dtype=tf.bool)
        b = tf.get_variable(name="testval")
        iftrue = lambda: [tf.assign(b, inp), tf.assign(a, False)]
        iffalse = lambda: [tf.assign(b, (b + inp) / 2), tf.assign(a, False)]
        acond, bcond = tf.cond(a, iftrue, iffalse)
    return acond

I = tf.placeholder(tf.float32)
tcond = make_function(I)

print("{}\tThe initial values of a and b".format(sess.run([a, b])))
print("{}\t\tRun, tcond1. output is the updated value of b.".format(sess.run(tcond, {I: 1})))
print("{}\tNow we see that b has been updated, but a has not.".format(sess.run([a, b])))
print("{}\t\tSo now the value is 2 instead of 1.5 like it should be.".format(sess.run(tcond, {I: 2})))
The output is:
[True, 10.0] The initial values of a and b
1.0 Run, tcond1. output is the updated value of b.
[True, 1.0] Now we see that b has been updated, but a has not.
2.0 So now the value is 2 instead of 1.5 like it should be.
Now, I understand that I need a line like sess.run(acond), where acond is the output of the conditional within make_function, but I can't return that, because my function needs to return only the value of b (not a), and I don't want to carry around an extra op that I need to remember to run on the first training iteration but not on the others.
So, is there a way to add the assignment op acond to the computational graph without explicitly returning it and running it with sess.run?
Add this operation to a custom collection and then create a dependency between your final op (e.g. the train_op) and your acond.
Inside the method:
tf.add_to_collection("to_run", acond)
In the definition of the final op:
to_run = tf.get_collection("to_run")
with tf.control_dependencies(to_run):
    final_op = <something>
When you run final_op, you are assured that your acond has already been executed.
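To make that concrete with the code from the question, a rough (untested) sketch could look like this; the train_op here is just a placeholder for whatever final op you actually run:

def make_function(inp):
    with tf.variable_scope("", reuse=True):
        a = tf.get_variable(name="test", dtype=tf.bool)
        b = tf.get_variable(name="testval")
        iftrue = lambda: [tf.assign(b, inp), tf.assign(a, False)]
        iffalse = lambda: [tf.assign(b, (b + inp) / 2), tf.assign(a, False)]
        bcond, acond = tf.cond(a, iftrue, iffalse)  # bcond: updated b, acond: updated a
    tf.add_to_collection("to_run", acond)           # register the flag update as a side effect
    return bcond                                    # the function still returns only b

I = tf.placeholder(tf.float32)
b_updated = make_function(I)

with tf.control_dependencies(tf.get_collection("to_run")):
    train_op = tf.identity(b_updated)               # placeholder for the real final op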
I have been tinkering around a lot with TensorFlow in the past few days; however, I am quite unsure whether a function I wrote would break backpropagation in a neural network. I thought I'd ask here before I try to integrate this function into a NN. The basic setup is that I want to add two matrices with
op = tf.add(tfObject, tfImageBackground)
where tfImageBackground is some constant image (i.e. an RGBA image of size 800 x 800 with R = G = B = A = 0) and tfObject is again a matrix with the same dimensions; however, we get that with the function I am unsure about:
def getObject(vector):
    objectId = vector[0]
    x = vector[1]
    y = vector[2]
    xEnd = baseImageSize - (x + objectSize)
    yStart = baseImageSize - (y + objectSize)
    padding = tf.convert_to_tensor([[x, xEnd], [yStart, y], [0, 0]])
    RTensor = tfObjectMatrix[objectId, :, :, 0:1]
    GTensor = tfObjectMatrix[objectId, :, :, 1:2]
    BTensor = tfObjectMatrix[objectId, :, :, 2:3]
    ATensor = tfObjectMatrix[objectId, :, :, 3:4]
    paddedR = tf.pad(tensor=RTensor,
                     paddings=padding,
                     mode='CONSTANT',
                     name='padAverageRed',
                     constant_values=255)
    # ... the same padding is generated for the G, B and A channels ...
    finalTensor = tf.concat([paddedR, paddedG, paddedB, paddedA], 2)
    return finalTensor
The tfObjectMatrix is a list of images which never change.
I did check whether I was able to compute a gradient (via tf.gradients) from the op, which turned out to work. I am unsure if that is sufficient for backpropagation to work, though.
Thanks for your time and effort. Any input at all would be greatly appreciated.
TensorFlow will backpropagate to everything by default. As per your code, everything will receive gradients with a training operation from an optimizer. So to answer your question, backpropagation will work.
The only thing to consider is that you say tfObjectMatrix is a list of images that will not change, so you might not want it to receive any gradients. Therefore you might want to look into tf.stop_gradient(), perhaps using it like OM = tf.stop_gradient(tfObjectMatrix) and working with that OM in your function.
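A minimal sketch of what that could look like inside getObject (OM is just a local name here):

OM = tf.stop_gradient(tfObjectMatrix)   # treat the image bank as a constant; no gradients flow into it
RTensor = OM[objectId, :, :, 0:1]       # gradients still flow through the rest of getObject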
I am attempting to modify and implement Google's pattern of the Asynchronous Advantage Actor Critic (A3C) model. There are plenty of examples online that have gotten me started, but I am running into issues attempting to expand the samples.
All of the examples I can find focus on Pong, which has an action output of left, right, or stay still. What I am trying to expand this to is a system that also has a separate on/off output. In the context of Pong, it would be a boost to your speed.
The code I am basing my code on can be found here. It plays Doom, but it still has the same left and right, plus a fire button instead of stay still. I am looking at how I could modify this code so that fire is an action independent from movement.
I know I can easily add another separate output from the model so that the outputs would look something like this:
self.output = slim.fully_connected(rnn_out, a_size,
                                   activation_fn=tf.nn.softmax,
                                   weights_initializer=normalized_columns_initializer(0.01),
                                   biases_initializer=None)
self.output2 = slim.fully_connected(rnn_out, 1,
                                    activation_fn=tf.nn.sigmoid,
                                    weights_initializer=normalized_columns_initializer(0.01),
                                    biases_initializer=None)
The thing I am struggling with is how I then have to modify the value output and redefine the loss function. Is the value still tied to the combination of the two outputs, or is there a separate value output for each of the independent outputs? I feel like there should still be only one value output, but I am unsure how I then use that one value and modify the loss function to take this into account.
I was thinking of adding a separate term to the loss function, so that the calculation would look something like this:
self.actions_1 = tf.placeholder(shape=[None], dtype=tf.int32)
self.actions_2 = tf.placeholder(shape=[None], dtype=tf.float32)
self.actions_onehot = tf.one_hot(self.actions_1, a_size, dtype=tf.float32)
self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)

self.responsible_outputs = tf.reduce_sum(self.output1 * self.actions_onehot, [1])
self.responsible_outputs_2 = tf.reduce_sum(self.output2 * self.actions_2, [1])

# Loss functions
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy))
self.policy_loss = (-tf.reduce_sum(tf.log(self.responsible_outputs) * self.advantages)
                    - tf.reduce_sum(tf.log(self.responsible_outputs_2) * self.advantages))
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01
I would like to know if I am on the right track here, or if there are resources or examples that I can build on.
First of all, the example you mention doesn't need two output nodes; one output node with a continuous value is enough. Also, you shouldn't use a placeholder for the advantage; rather, use a placeholder for the discounted reward and compute the advantage from it:
self.discounted_reward = tf.placeholder(shape=[None],dtype=tf.float32)
self.advantages = self.discounted_reward - self.value
Also, while calculating the policy loss, you have to use tf.stop_gradient to prevent the value node's gradients from feeding back into policy learning.
self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs)*tf.stop_gradient(self.advantages))
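Putting that together with the loss terms from your code, a rough sketch of the loss block might look like this (untested, keeping only the single policy output as suggested above):

self.discounted_reward = tf.placeholder(shape=[None], dtype=tf.float32)
self.advantages = self.discounted_reward - tf.reshape(self.value, [-1])

# The critic is still trained against the discounted reward.
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.advantages))

# stop_gradient keeps the critic's gradients out of the policy term.
self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs) *
                                  tf.stop_gradient(self.advantages))

self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy))
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01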
I'm implementing an RNN cell around BasicLSTMCell where I want to be able to look back on past hidden states (across batch boundaries). I'm using dynamic_rnn(), and the basic pattern I use is:
def __call__(self, inputs, old_state, scope=None):
    mem = old_state[2]
    # [do something with mem]

    cell_out, new_state = self.cell(inputs,
                                    (old_state[0],
                                     old_state[1]))
    h_state = new_state.h
    c_state = new_state.c

    # Control dependency required because of self.buf_index.
    with tf.get_default_graph().control_dependencies([cell_out]):
        new_mem = write_to_buf(self.out_buf,
                               cell_out,
                               self.buf_index)

    # Update the buffer index.
    with tf.get_default_graph().control_dependencies([new_mem]):
        inc_step = tf.assign(self.buf_index, (self.buf_index + 1) %
                             self.buf_size)

    with tf.get_default_graph().control_dependencies([inc_step]):
        h_state = tf.identity(h_state)

    t = [c_state, h_state, new_mem]
    return cell_out, tuple(t)
self.out_buf and self.buf_index are variables. write_to_buf() is a function that uses scatter_update() to write the new hidden states to the buffer and returns the result.
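In essence it does something like this (a simplified sketch of the idea, not the exact code):

def write_to_buf(buf, cell_out, buf_index):
    # Write the current hidden state into row `buf_index` of the buffer and
    # return the scatter_update result so that callers can depend on it.
    return tf.scatter_update(buf, buf_index, cell_out)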
I rely on the assumption that the return value of a scatter update guarantees that the new variable value is used (similar to this), so that caching of variables does not mess things up.
From debug prints it seems to work, but it would be nice to get some confirmation or suggestions on alternatives.