TensorFlow convolution layers have strange artefacts

Could anyone explain to me what I'm doing wrong, such that my TensorBoard graphs have additional groups when I use tf.layers.conv1d?
For the sake of simplicity I've created one tf.name_scope, 'conv_block1', that contains: conv1d -> max_pool -> batch_norm, yet my graph has odd additional blocks (see attached screenshot). Basically, a superfluous block `conv1d` was added with the weights for the `conv_block1/conv1d` layer, and it is placed in the graph's groups. This makes networks with multiple convolution blocks completely unreadable. Am I doing something wrong, or is this some kind of bug/performance feature in TensorFlow 1.4? Oddly enough, the dense layers are fine and their weights are properly scoped.
Here is the code if anyone wants to recreate the graph:
def cnn_model(inputs, mode):
    x = tf.placeholder_with_default(inputs['wav'], shape=[None, SAMPLE_RATE, 1], name='input_placeholder')
    with tf.name_scope("conv_block1"):
        x = tf.layers.conv1d(x, filters=80, kernel_size=5, strides=1, padding='same', activation=tf.nn.relu)
        x = tf.layers.max_pooling1d(x, pool_size=3, strides=3)
        x = tf.layers.batch_normalization(x, training=(mode == tf.estimator.ModeKeys.TRAIN))
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, units=12)
    return x
UPDATE 1
I've added an even simpler example that can be executed directly to see the issue:
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(name='input', dtype=tf.float32, shape=[None, 16000, 1])
    with tf.name_scope('group1'):
        x = tf.layers.conv1d(x, 80, 5, name='conv1')
    x = tf.layers.dense(x, 10, name="dense1")
[n.name for n in g.as_graph_def().node]
outputs:
['input',
'conv1/kernel/Initializer/random_uniform/shape',
'conv1/kernel/Initializer/random_uniform/min',
'conv1/kernel/Initializer/random_uniform/max',
'conv1/kernel/Initializer/random_uniform/RandomUniform',
'conv1/kernel/Initializer/random_uniform/sub',
'conv1/kernel/Initializer/random_uniform/mul',
'conv1/kernel/Initializer/random_uniform',
'conv1/kernel',
'conv1/kernel/Assign',
'conv1/kernel/read',
'conv1/bias/Initializer/zeros',
'conv1/bias',
'conv1/bias/Assign',
'conv1/bias/read',
'group1/conv1/dilation_rate',
'group1/conv1/conv1d/ExpandDims/dim',
'group1/conv1/conv1d/ExpandDims',
'group1/conv1/conv1d/ExpandDims_1/dim',
'group1/conv1/conv1d/ExpandDims_1',
'group1/conv1/conv1d/Conv2D',
'group1/conv1/conv1d/Squeeze',
'group1/conv1/BiasAdd',
'dense1/kernel/Initializer/random_uniform/shape',
'dense1/kernel/Initializer/random_uniform/min',
'dense1/kernel/Initializer/random_uniform/max',
'dense1/kernel/Initializer/random_uniform/RandomUniform',
'dense1/kernel/Initializer/random_uniform/sub',
'dense1/kernel/Initializer/random_uniform/mul',
'dense1/kernel/Initializer/random_uniform',
'dense1/kernel',
'dense1/kernel/Assign',
'dense1/kernel/read',
'dense1/bias/Initializer/zeros',
'dense1/bias',
'dense1/bias/Assign',
'dense1/bias/read',
'dense1/Tensordot/Shape',
'dense1/Tensordot/Rank',
'dense1/Tensordot/axes',
'dense1/Tensordot/GreaterEqual/y',
'dense1/Tensordot/GreaterEqual',
'dense1/Tensordot/Cast',
'dense1/Tensordot/mul',
'dense1/Tensordot/Less/y',
'dense1/Tensordot/Less',
'dense1/Tensordot/Cast_1',
'dense1/Tensordot/add',
'dense1/Tensordot/mul_1',
'dense1/Tensordot/add_1',
'dense1/Tensordot/range/start',
'dense1/Tensordot/range/delta',
'dense1/Tensordot/range',
'dense1/Tensordot/ListDiff',
'dense1/Tensordot/Gather',
'dense1/Tensordot/Gather_1',
'dense1/Tensordot/Const',
'dense1/Tensordot/Prod',
'dense1/Tensordot/Const_1',
'dense1/Tensordot/Prod_1',
'dense1/Tensordot/concat/axis',
'dense1/Tensordot/concat',
'dense1/Tensordot/concat_1/axis',
'dense1/Tensordot/concat_1',
'dense1/Tensordot/stack',
'dense1/Tensordot/transpose',
'dense1/Tensordot/Reshape',
'dense1/Tensordot/transpose_1/perm',
'dense1/Tensordot/transpose_1',
'dense1/Tensordot/Reshape_1/shape',
'dense1/Tensordot/Reshape_1',
'dense1/Tensordot/MatMul',
'dense1/Tensordot/Const_2',
'dense1/Tensordot/concat_2/axis',
'dense1/Tensordot/concat_2',
'dense1/Tensordot',
'dense1/BiasAdd']

OK, I've found the issue: apparently tf.name_scope applies only to operations, while tf.variable_scope works for both operations and variables (as per this TF issue).
Here is a Stack Overflow question that explains the difference between name_scope and variable_scope:
What's the difference of name scope and a variable scope in tensorflow?
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(name='input', dtype=tf.float32, shape=[None, 16000, 1])
    with tf.variable_scope('v_scope1'):
        x = tf.layers.conv1d(x, 80, 5, name='conv1')
[n.name for n in g.as_graph_def().node]
gives:
['input',
'v_scope1/conv1/kernel/Initializer/random_uniform/shape',
'v_scope1/conv1/kernel/Initializer/random_uniform/min',
'v_scope1/conv1/kernel/Initializer/random_uniform/max',
'v_scope1/conv1/kernel/Initializer/random_uniform/RandomUniform',
'v_scope1/conv1/kernel/Initializer/random_uniform/sub',
'v_scope1/conv1/kernel/Initializer/random_uniform/mul',
'v_scope1/conv1/kernel/Initializer/random_uniform',
'v_scope1/conv1/kernel',
'v_scope1/conv1/kernel/Assign',
'v_scope1/conv1/kernel/read',
'v_scope1/conv1/bias/Initializer/zeros',
'v_scope1/conv1/bias',
'v_scope1/conv1/bias/Assign',
'v_scope1/conv1/bias/read',
'v_scope1/conv1/dilation_rate',
'v_scope1/conv1/conv1d/ExpandDims/dim',
'v_scope1/conv1/conv1d/ExpandDims',
'v_scope1/conv1/conv1d/ExpandDims_1/dim',
'v_scope1/conv1/conv1d/ExpandDims_1',
'v_scope1/conv1/conv1d/Conv2D',
'v_scope1/conv1/conv1d/Squeeze',
'v_scope1/conv1/BiasAdd']
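For completeness, here is the model from the question with tf.name_scope swapped for tf.variable_scope (just a sketch; it assumes SAMPLE_RATE is defined elsewhere, as in the question). With this change the conv1d kernel and bias should end up under conv_block1 in TensorBoard instead of in a separate group:
def cnn_model(inputs, mode):
    x = tf.placeholder_with_default(inputs['wav'], shape=[None, SAMPLE_RATE, 1], name='input_placeholder')
    # variable_scope prefixes both ops and variables, so the weights are
    # created as conv_block1/conv1d/kernel and conv_block1/conv1d/bias
    with tf.variable_scope("conv_block1"):
        x = tf.layers.conv1d(x, filters=80, kernel_size=5, strides=1, padding='same', activation=tf.nn.relu)
        x = tf.layers.max_pooling1d(x, pool_size=3, strides=3)
        x = tf.layers.batch_normalization(x, training=(mode == tf.estimator.ModeKeys.TRAIN))
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, units=12)
    return x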

Related

Tensorflow: Saving/importing checkpoint works without error, but all imported variables have value 'none'

I am training a deep CNN for image augmentation and have run into a very odd issue.
My network architecture is fully convolutional and implements several small "u-shaped" components, wherein feature maps are down-/up-sampled in order to be processed throughout a "top layer." In the top layer, there are several nodes where the network "guesses" the output image and then adds the output of the lower layers to the features derived from the guess. The loss function I have penalizes error in the final prediction as well as in these guesses.
The network is defined thusly:
def convNet(x, weights, biases):
    #TOP LAYER
    conv0_1 = conv3dWrap(x, weights['wConv0_1'], biases['bConv0_1'],[1,1,1,1,1])
    conv0_2 = conv3dWrap(conv0_1, weights['wConv0_2'], biases['bConv0_2'],[1,1,1,1,1])
    #MID LAYER DOWN SAMPLE
    conv1_1 = conv3dWrap(conv0_2, weights['wConv1_1'], biases['bConv1_1'],[1,2,2,2,1])
    conv1_2 = conv3dWrap(conv1_1, weights['wConv1_2'], biases['bConv1_2'],[1,1,1,1,1])
    #BOTTOM LAYER DOWN SAMPLE
    conv2_1 = conv3dWrap(conv1_2, weights['wConv2_1'], biases['bConv2_1'],[1,2,2,2,1])
    conv2_2 = conv3dWrap(conv2_1, weights['wConv2_2'], biases['bConv2_2'],[1,1,1,1,1])
    conv2_3 = conv3dWrap(conv2_2, weights['wConv2_3'], biases['bConv2_3'],[1,1,1,1,1])
    convTrans2_1 = conv3dTransWrap(conv2_3, weights['wTConv2_1'], biases['bTConv2_1'], [4,2,32,32,64],[1,2,2,2,1])
    #MID LAYER UPSAMPLE
    conv1_3 = conv3dWrap(tf.add(convTrans2_1,conv1_2), weights['wConv1_3'], biases['bConv1_3'],[1,1,1,1,1])
    conv1_4 = conv3dWrap(conv1_3, weights['wConv1_4'], biases['bConv1_4'],[1,1,1,1,1])
    convTrans1_1 = conv3dTransWrap(conv1_4, weights['wTConv1_1'], biases['bTConv1_1'], [4,4,64,64,32],[1,2,2,2,1])
    #TOP LAYER AGAIN
    conv0_3 = conv3dWrap(tf.add(conv0_2,convTrans1_1), weights['wConv0_3'], biases['bConv0_3'],[1,1,1,1,1])
    conv0_4 = conv3dWrap(conv0_3, weights['wConv0_4'], biases['bConv0_4'],[1,1,1,1,1])
    recon0_1 = reconWrap(conv0_3, weights['wReconDS0_1'], biases['bReconDS0_1'],[1,1,1,1,1])
    print(recon0_1.shape)
    catRecon0_1 = tf.add(conv0_4, tf.contrib.keras.backend.repeat_elements(recon0_1,32,4))
    conv0_5 = conv3dWrap(catRecon0_1, weights['wConv0_5'], biases['bConv0_5'],[1,1,1,1,1])
    #MID LAYER AGAIN
    conv1_5 = conv3dWrap(conv0_5, weights['wConv1_5'], biases['bConv1_5'],[1,2,2,2,1])
    conv1_6 = conv3dWrap(conv1_5, weights['wConv1_6'], biases['bConv1_6'],[1,1,1,1,1])
    #BOTTOM LAYER
    conv2_4 = conv3dWrap(conv1_6, weights['wConv2_4'], biases['bConv2_4'],[1,2,2,2,1])
    conv2_5 = conv3dWrap(conv2_4, weights['wConv2_5'], biases['bConv2_5'],[1,1,1,1,1])
    conv2_6 = conv3dWrap(conv2_5, weights['wConv2_6'], biases['bConv2_6'],[1,1,1,1,1])
    convTrans2_2 = conv3dTransWrap(conv2_6, weights['wTConv2_2'], biases['bTConv2_2'], [4,2,32,32,64],[1,2,2,2,1])
    #MID LAYER UPSAMPLE
    conv1_7 = conv3dWrap(tf.add(convTrans2_2,conv1_6), weights['wConv1_7'], biases['bConv1_7'],[1,1,1,1,1])
    conv1_8 = conv3dWrap(conv1_7, weights['wConv1_8'], biases['bConv1_8'],[1,1,1,1,1])
    convTrans1_2 = conv3dTransWrap(conv1_8, weights['wTConv1_2'], biases['bTConv1_2'], [4,4,64,64,32],[1,2,2,2,1])
    #TOP LAYER
    conv0_6 = conv3dWrap(tf.add(conv0_5,convTrans1_2), weights['wConv0_6'], biases['bConv0_6'],[1,1,1,1,1])
    recon0_2 = reconWrap(conv0_6, weights['wReconDS0_2'], biases['bReconDS0_2'],[1,1,1,1,1])
    catRecon0_2 = tf.add(conv0_6, tf.contrib.keras.backend.repeat_elements(recon0_2,32,4))
    conv0_7 = conv3dWrap(catRecon0_2, weights['wConv0_7'], biases['bConv0_7'],[1,1,1,1,1])
    #MID LAYER
    conv1_9 = conv3dWrap(conv0_7, weights['wConv1_9'], biases['bConv1_9'],[1,2,2,2,1])
    conv1_10 = conv3dWrap(conv1_9, weights['wConv1_10'], biases['bConv1_10'],[1,1,1,1,1])
    #BOTTOM LAYER
    conv2_7 = conv3dWrap(conv1_10, weights['wConv2_7'], biases['bConv2_7'],[1,2,2,2,1])
    conv2_8 = conv3dWrap(conv2_7, weights['wConv2_8'], biases['bConv2_8'],[1,1,1,1,1])
    conv2_9 = conv3dWrap(conv2_8, weights['wConv2_9'], biases['bConv2_9'],[1,1,1,1,1])
    convTrans2_3 = conv3dTransWrap(conv2_9, weights['wTConv2_3'], biases['bTConv2_3'], [4,2,32,32,64],[1,2,2,2,1])
    #MID LAYER UPSAMPLE
    conv1_11 = conv3dWrap(tf.add(convTrans2_3,conv1_10), weights['wConv1_11'], biases['bConv1_11'],[1,1,1,1,1])
    conv1_12 = conv3dWrap(conv1_11, weights['wConv1_12'], biases['bConv1_12'],[1,1,1,1,1])
    convTrans1_3 = conv3dTransWrap(conv1_12, weights['wTConv1_3'], biases['bTConv1_3'], [4,4,64,64,32],[1,2,2,2,1])
    #TOP LAYER
    conv0_8 = conv3dWrap(tf.add(conv0_7,convTrans1_3), weights['wConv0_8'], biases['bConv0_8'],[1,1,1,1,1])
    recon0_3 = reconWrap(conv0_8, weights['wReconDS0_3'], biases['bReconDS0_3'],[1,1,1,1,1])
    catRecon0_3 = tf.add(conv0_8, tf.contrib.keras.backend.repeat_elements(recon0_3,32,4))
    conv0_9 = conv3dWrap(catRecon0_3, weights['wConv0_9'], biases['bConv0_9'],[1,1,1,1,1])
    print(recon0_3.shape)
    #MID LAYER
    conv1_13 = conv3dWrap(conv0_9, weights['wConv1_13'], biases['bConv1_13'],[1,2,2,2,1])
    conv1_14 = conv3dWrap(conv1_13, weights['wConv1_14'], biases['bConv1_14'],[1,1,1,1,1])
    #BOTTOM LAYER
    conv2_10 = conv3dWrap(conv1_14, weights['wConv2_10'], biases['bConv2_10'],[1,2,2,2,1])
    conv2_11 = conv3dWrap(conv2_10, weights['wConv2_11'], biases['bConv2_11'],[1,1,1,1,1])
    conv2_12 = conv3dWrap(conv2_11, weights['wConv2_12'], biases['bConv2_12'],[1,1,1,1,1])
    convTrans2_4 = conv3dTransWrap(conv2_12, weights['wTConv2_4'], biases['bTConv2_4'], [4,2,32,32,64],[1,2,2,2,1])
    #MID LAYER UPSAMPLE
    conv1_15 = conv3dWrap(tf.add(convTrans2_4,conv1_14), weights['wConv1_15'], biases['bConv1_15'],[1,1,1,1,1])
    conv1_16 = conv3dWrap(conv1_15, weights['wConv1_16'], biases['bConv1_16'],[1,1,1,1,1])
    convTrans1_4 = conv3dTransWrap(conv1_16, weights['wTConv1_4'], biases['bTConv1_4'], [4,4,64,64,32],[1,2,2,2,1])
    #TOP LAYER
    conv0_10 = conv3dWrap(tf.add(conv0_9,convTrans1_4), weights['wConv0_10'], biases['bConv0_10'],[1,1,1,1,1])
    #OUTPUT
    convOUT = reconWrap(conv0_10, weights['wConvOUT'], biases['bConvOUT'],[1,1,1,1,1])
    print(convOUT.shape)
    return recon0_1, recon0_2, recon0_3, convOUT
Where all of the "wrappers" are as follows:
def conv3dWrap(x, W, b, strides):
    x = tf.nn.conv3d(x, W, strides, padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def reconWrap(x, W, b, strides):
    x = tf.nn.conv3d(x, W, strides, padding='SAME')
    x = tf.nn.bias_add(x, b)
    return x

def conv3dTransWrap(x, W, b, shape, strides):
    x = tf.nn.conv3d_transpose(x, W, shape, strides, padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
My weights and biases are stored in dictionaries that are defined before starting the training:
weights = {
    #TOP LAYER
    'wConv0_1': tf.Variable(tf.random_normal([4, 3, 3, 1, 5]), name='wC0_1'),
    'wConv0_2': tf.Variable(tf.random_normal([4, 3, 3, 5, 32]), name='wC0_2'),
    'wConv0_3': tf.Variable(tf.random_normal([4, 3, 3, 32, 32]), name='wC0_3'),
    'wConv0_4': tf.Variable(tf.random_normal([4, 3, 3, 32, 32]), name='wC0_4'),
    'wReconDS0_1': tf.Variable(tf.random_normal([1, 1, 1, 32, 1]), name='wR0_1'), ...... #THIS CONTINUES FOR QUITE AWHILE
Then, I begin the training like this:
def train_cnn(x):
    epochLosses=[]
    print('Beginning Training!')
    print(NUM_EPOCHS)
    r1,r2,r3,pred = convNet(x, weights, biases)
    cost = (tf.losses.mean_squared_error(y,pred)
            + 0.25* ((tf.losses.mean_squared_error(y,r1))
            + (tf.losses.mean_squared_error(y,r2))
            + (tf.losses.mean_squared_error(y,r3))))
    regularizer = (0.01*tf.nn.l2_loss(weights['wConv0_1'])+
        0.01*tf.nn.l2_loss(weights['wConv0_2'])+
        0.01*tf.nn.l2_loss(weights['wConv0_3'])+
        0.01*tf.nn.l2_loss(weights['wConv0_4'])+
        0.01*tf.nn.l2_loss(weights['wReconDS0_1'])+
        0.01*tf.nn.l2_loss(weights['wConv0_5'])+
        0.01*tf.nn.l2_loss(weights['wConv0_6'])+
        0.01*tf.nn.l2_loss(weights['wReconDS0_2'])+
        0.01*tf.nn.l2_loss(weights['wReconDS0_3'])+
        0.01*tf.nn.l2_loss(weights['wConv0_7'])+
        0.01*tf.nn.l2_loss(weights['wConv0_8'])+
        0.01*tf.nn.l2_loss(weights['wConv0_9'])+
        0.01*tf.nn.l2_loss(weights['wConv0_10'])+
        0.01*tf.nn.l2_loss(weights['wConvOUT'])+
        0.01*tf.nn.l2_loss(weights['wConv1_1'])+
        0.01*tf.nn.l2_loss(weights['wConv1_2'])+
        0.01*tf.nn.l2_loss(weights['wConv1_3'])+
        0.01*tf.nn.l2_loss(weights['wConv1_4'])+
        0.01*tf.nn.l2_loss(weights['wConv1_5'])+
        0.01*tf.nn.l2_loss(weights['wConv1_6'])+
        0.01*tf.nn.l2_loss(weights['wConv1_7'])+
        0.01*tf.nn.l2_loss(weights['wConv1_8'])+
        0.01*tf.nn.l2_loss(weights['wConv1_9'])+
        0.01*tf.nn.l2_loss(weights['wConv1_10'])+
        0.01*tf.nn.l2_loss(weights['wConv1_11'])+
        0.01*tf.nn.l2_loss(weights['wConv1_12'])+
        0.01*tf.nn.l2_loss(weights['wConv1_13'])+
        0.01*tf.nn.l2_loss(weights['wConv1_14'])+
        0.01*tf.nn.l2_loss(weights['wConv1_15'])+
        0.01*tf.nn.l2_loss(weights['wConv1_16'])+
        0.01*tf.nn.l2_loss(weights['wTConv1_1'])+
        0.01*tf.nn.l2_loss(weights['wTConv1_2'])+
        0.01*tf.nn.l2_loss(weights['wTConv1_3'])+
        0.01*tf.nn.l2_loss(weights['wTConv1_4'])+
        0.01*tf.nn.l2_loss(weights['wConv2_1'])+
        0.01*tf.nn.l2_loss(weights['wConv2_2'])+
        0.01*tf.nn.l2_loss(weights['wConv2_3'])+
        0.01*tf.nn.l2_loss(weights['wConv2_4'])+
        0.01*tf.nn.l2_loss(weights['wConv2_5'])+
        0.01*tf.nn.l2_loss(weights['wConv2_6'])+
        0.01*tf.nn.l2_loss(weights['wConv2_7'])+
        0.01*tf.nn.l2_loss(weights['wConv2_8'])+
        0.01*tf.nn.l2_loss(weights['wConv2_9'])+
        0.01*tf.nn.l2_loss(weights['wConv2_10'])+
        0.01*tf.nn.l2_loss(weights['wConv2_11'])+
        0.01*tf.nn.l2_loss(weights['wConv2_12'])+
        0.01*tf.nn.l2_loss(weights['wTConv2_1'])+
        0.01*tf.nn.l2_loss(weights['wTConv2_2'])+
        0.01*tf.nn.l2_loss(weights['wTConv2_3'])+
        0.01*tf.nn.l2_loss(weights['wTConv2_4']))
    cost = cost + regularizer
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cost)
    saver = tf.train.Saver()
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    valLosses=[]
    epochLosses=[]
    print('Beginning Session!')
    writer = tf.summary.FileWriter('./GRAPH', sess.graph)
    sess.run(tf.global_variables_initializer())
Finally, I go ahead and do some stuff for loading the batches and, once they're ready, I do the following for each pass (I won't do the saving on every pass once I have the weight importing working):
_, c = sess.run([optimizer, cost], feed_dict = {x: inBatch,y: gsBatch})
epoch_loss += c
save_path = saver.save(sess, "./CHKPT/model.cpkt")
So when I go ahead and import this model
sess = tf.Session()
x = tf.placeholder(dtype=tf.float32)
new_saver = tf.train.import_meta_graph('./CHKPT/model.cpkt.meta')
sess.run(tf.global_variables_initializer())
a,b,c,pred = convNet(x, weights, biases)
I am met with the following error:
ValueError: Tried to convert 'filter' to a tensor and failed. Error: None values not supported.
When I look at the imported weights and biases, each of them has the value 'None'. Not only is this odd, but the network 'runs' incredibly quickly during training, far more quickly than I'd expect. I am worried that no legitimate computations are occurring.
This must not be the case, but I am almost positive I am following, verbatim, the saving/loading process I've used for many other networks. Can anyone shed some light on what might be happening here?
Edit: I'm also very new to TF, and it's likely there are non-idealities in my code. If you see anything outside of the saving/importing that isn't kosher please let me know.
Running sess.run(tf.global_variables_initializer()) reinitializes every variable and discards the loaded values. Skip calling tf.global_variables_initializer() when you load a model; the initialization is done by the saver.
You are also missing the restore call (import_meta_graph() only loads the saver object).
new_saver = tf.train.import_meta_graph('./CHKPT/model.cpkt.meta')
new_saver.restore(sess, './CHKPT/model.cpkt')
Thereafter when you run:
a,b,c,pred = convNet(x, weights, biases)
you create an entirely new network and never use the loaded one.
Instead, you have to find the tensors you need inside tf.global_variables() after restoring the model, for example by searching for them by name.
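Putting that together, a minimal sketch of the loading side could look like the following; the variable name 'wC0_1' is taken from the weights dict above, so adjust it to whatever names you actually used:
import tensorflow as tf

sess = tf.Session()
new_saver = tf.train.import_meta_graph('./CHKPT/model.cpkt.meta')
new_saver.restore(sess, './CHKPT/model.cpkt')   # no global_variables_initializer() here

# Option 1: look the restored variables up by name.
restored = {v.name: v for v in tf.global_variables()}
wConv0_1 = restored['wC0_1:0']

# Option 2: fetch tensors directly from the restored graph.
graph = tf.get_default_graph()
wConv0_1_tensor = graph.get_tensor_by_name('wC0_1:0')

# Should print trained values rather than None.
print(sess.run(wConv0_1)[0, 0, 0, 0, :])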

consistent forward / backward pass with tensorflow dropout

In reinforcement learning one usually applies a forward pass of the neural network for each step of the episode in order to compute the policy. Afterwards one can compute the parameter gradients using backpropagation. A simplified implementation of my network looks like this:
class AC_Network(object):
    def __init__(self, s_size, a_size, scope, trainer, parameters_net):
        with tf.variable_scope(scope):
            self.is_training = tf.placeholder(shape=[], dtype=tf.bool)
            self.inputs = tf.placeholder(shape=[None, s_size], dtype=tf.float32)
            # (...)
            layer = slim.fully_connected(self.inputs,
                                         layer_size,
                                         activation_fn=tf.nn.relu,
                                         biases_initializer=None)
            layer = tf.contrib.layers.dropout(inputs=layer, keep_prob=parameters_net["dropout_keep_prob"],
                                              is_training=self.is_training)
            self.policy = slim.fully_connected(layer, a_size,
                                               activation_fn=tf.nn.softmax,
                                               biases_initializer=None)
            self.actions = tf.placeholder(shape=[None], dtype=tf.int32)
            self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)
            actions_onehot = tf.one_hot(self.actions, a_size, dtype=tf.float32)
            responsible_outputs = tf.reduce_sum(self.policy * actions_onehot, [1])
            self.policy_loss = - policy_loss_multiplier * tf.reduce_mean(tf.log(responsible_outputs) * self.advantages)
            local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
            self.gradients = tf.gradients(self.policy_loss, local_vars)
Now during training I will first roll out the episode with consecutive forward passes (again, a simplified version):
s = self.local_env.reset() # list of input variables for the first step
while done == False:
    a_dist = sess.run([self.policy],
                      feed_dict = {self.local_AC.inputs: [s],
                                   self.is_training: True})
    a = np.argmax(a_dist)
    s, r, done, extra_stat = self.local_env.step(a)
    # (...)
and in the end I will calculate the gradients with a backward pass:
p_l, grad = sess.run([self.policy_loss,
                      self.gradients],
                     feed_dict={self.inputs: np.vstack(comb_observations),
                                self.is_training: True,
                                self.actions: np.hstack(comb_actions)})
(Please note that I could have made a mistake somewhere above while trying to strip out as much of the original code as possible that is irrelevant to the issue in question.)
So finally the question: is there a way of ensuring that all the consecutive calls to sess.run() will generate the same dropout structure? Ideally I would like to have exactly the same dropout structure within each episode and only change it between episodes. Things seem to work well as they are, but I continue to wonder.
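No definitive answer here, but to make the goal concrete, one illustrative way to pin the dropout structure for a whole episode would be to sample the mask once per episode in NumPy and feed it in, instead of letting tf.contrib.layers.dropout resample it on every sess.run(). None of the names below come from the network above; this is only a sketch of the idea:
import numpy as np
import tensorflow as tf

layer_size, keep_prob = 128, 0.8                  # made-up sizes for the sketch
inputs = tf.placeholder(tf.float32, [None, layer_size])
mask = tf.placeholder(tf.float32, [layer_size])   # held fixed for one episode
layer = inputs * mask / keep_prob                 # inverted dropout with an external mask

# At the start of each episode, sample a fresh mask once:
episode_mask = (np.random.rand(layer_size) < keep_prob).astype(np.float32)
# Every sess.run() within the episode then passes feed_dict={mask: episode_mask, ...},
# so the forward passes and the final backward pass all see the same dropout pattern.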

RNN Slow-down phenomenon of Tensorflow

I found a peculiar property of TensorFlow's LSTM cell (not limited to LSTM, but I only examined it with this) which, as far as I know, has not been reported.
I don't know whether it actually has, so I left this post on SO. Below is toy code for this problem:
import tensorflow as tf
import numpy as np
import time

def network(input_list):
    input, init_hidden_c, init_hidden_m = input_list
    cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True)
    init_hidden = tf.nn.rnn_cell.LSTMStateTuple(init_hidden_c, init_hidden_m)
    states, hidden_cm = tf.nn.dynamic_rnn(cell, input, dtype=tf.float32, initial_state=init_hidden)
    net = [v for v in tf.trainable_variables()]
    return states, hidden_cm, net

def action(x, h_c, h_m):
    t0 = time.time()
    outputs, output_h = sess.run([rnn_states[:,-1:,:], rnn_hidden_cm], feed_dict={
        rnn_input: x,
        rnn_init_hidden_c: h_c,
        rnn_init_hidden_m: h_m
    })
    dt = time.time() - t0
    return outputs, output_h, dt

rnn_input = tf.placeholder("float", [None, None, 512])
rnn_init_hidden_c = tf.placeholder("float", [None,256])
rnn_init_hidden_m = tf.placeholder("float", [None,256])
rnn_input_list = [rnn_input, rnn_init_hidden_c, rnn_init_hidden_m]
rnn_states, rnn_hidden_cm, rnn_net = network(rnn_input_list)

feed_input = np.random.uniform(low=-1., high=1., size=(1,1,512))
feed_init_hidden_c = np.zeros(shape=(1,256))
feed_init_hidden_m = np.zeros(shape=(1,256))
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    _, output_hidden_cm, deltat = action(feed_input, feed_init_hidden_c, feed_init_hidden_m)
    if i % 10 == 0:
        print 'Running time: ' + str(deltat)
    (feed_init_hidden_c, feed_init_hidden_m) = output_hidden_cm
    feed_input = np.random.uniform(low=-1., high=1., size=(1,1,512))
[Not important] What this code does is generate an output from the 'network()' function containing an LSTM, where the input's temporal dimension is 1 (so the output's is also 1), and pass the initial state in and out for each step of running.
[Important] Look at the 'sess.run()' part. For some reason in my real code, I happened to put [:,-1:,:] for 'rnn_states'. What happens then is that the time spent on each 'sess.run()' increases. From some inspection of my own, I found this slowdown stems from that [:,-1:,:]. I just wanted to get the output at the last time step. If you do 'outputs, output_h = sess.run([rnn_states, rnn_hidden_cm], feed_dict={~' without the [:,-1:,:] and take 'last_output = outputs[:,-1:,:]' after the 'sess.run()', then the slowdown does not occur.
I do not know why this exponential increase in time happens with that [:,-1:,:]. Is this an undocumented behaviour of TensorFlow that particularly slows things down (maybe by adding more nodes to the graph on its own)?
Thank you, and I hope this post keeps other users from making the same mistake.
I encountered the same problem, with TensorFlow slowing down with each iteration I ran, and found this question while trying to debug it. Here's a short description of my situation and how I solved it, for future reference. Hopefully it can point someone in the right direction and save them some time.
In my case the problem was mainly that I didn't make use of feed_dict to supply the network state when executing sess.run(). Instead I redeclared outputs, final_state and prediction every iteration. The answer at https://github.com/tensorflow/tensorflow/issues/1439#issuecomment-194405649 made me realize how stupid that was... I was constantly creating new graph nodes in every iteration, making it all slower and slower. The problematic code looked something like this:
# defining the network
lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1)
outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)

for input_data in data_seq:
    # redeclaring, stupid stupid...
    outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
    prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)
    p, rnn_state = sess.run((prediction, final_state), feed_dict={x: input_data})
The solution was of course to declare the nodes only once at the beginning, and supply the new data with feed_dict. The code went from being somewhat slow (> 15 ms in the beginning) and becoming slower with every iteration, to executing every iteration in around 1 ms. My new code looks something like this:
out_weights = tf.Variable(tf.random_normal([num_units, n_classes]), name="out_weights")
out_bias = tf.Variable(tf.random_normal([n_classes]), name="out_bias")

# placeholder for the network state
state_placeholder = tf.placeholder(tf.float32, [2, 1, num_units])
rnn_state = tf.nn.rnn_cell.LSTMStateTuple(state_placeholder[0], state_placeholder[1])
x = tf.placeholder('float', [None, 1, n_input])
input = tf.unstack(x, 1, 1)

# defining the network
lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1)
outputs, final_state = rnn.static_rnn(lstm_layer, input, initial_state=rnn_state, dtype='float32')
prediction = tf.nn.softmax(tf.matmul(outputs[-1], out_weights)+out_bias)

# actual network state, which we input with feed_dict
_rnn_state = tf.nn.rnn_cell.LSTMStateTuple(np.zeros((1, num_units), dtype='float32'), np.zeros((1, num_units), dtype='float32'))

it = 0
for input_data in data_seq:
    encl_input = [[input_data]]
    p, _rnn_state = sess.run((prediction, final_state), feed_dict={x: encl_input, rnn_state: _rnn_state})
    print("{} - {}".format(it, p))
    it += 1
Moving the declaration out of the for loop also got rid of the problem which the OP sdr2002 had, doing a slice outputs[-1] in sess.run() inside the for loop.
As mentioned above, not slicing the output inside 'sess.run()' is much preferred for this case:
def action(x, h_c, h_m):
    t0 = time.time()
    outputs, output_h = sess.run([rnn_states, rnn_hidden_cm], feed_dict={
        rnn_input: x,
        rnn_init_hidden_c: h_c,
        rnn_init_hidden_m: h_m
    })
    outputs = outputs[:,-1:,:]
    dt = time.time() - t0
    return outputs, output_h, dt
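An alternative along the same lines (just a sketch, not code from the original post) is to keep the slice but build it once as a graph node next to network(), so that sess.run() never adds new ops:
# Defined once, at graph-construction time:
rnn_last_state = rnn_states[:, -1:, :]

def action(x, h_c, h_m):
    t0 = time.time()
    outputs, output_h = sess.run([rnn_last_state, rnn_hidden_cm], feed_dict={
        rnn_input: x,
        rnn_init_hidden_c: h_c,
        rnn_init_hidden_m: h_m
    })
    dt = time.time() - t0
    return outputs, output_h, dt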

Reevaluate dependencies of a while loop

I am trying to understand how while loops work in TensorFlow. In particular I have a variable, x say, that I update in the while loop, and then I have some values that depend on x, but when running the while loop the values do not seem to be updated when x changes.
The following code, where I have tried to implement a simple gradient descent optimizer, might illustrate what I mean:
import tensorflow as tf

x = tf.Variable(initial_value=4, dtype=tf.float32, trainable=False)
y = tf.multiply(x,x)
grad = tf.gradients(y, x)

def update_g():
    with tf.control_dependencies(grad):
        return tf.identity(grad[0])

iterations = tf.placeholder(tf.int32)
i = tf.constant(0, dtype=tf.int32)
g = tf.Variable(initial_value=grad[0], dtype=tf.float32, trainable=False)

c = lambda i_loop, x_loop, g_loop: i_loop < iterations
b = lambda i_loop, x_loop, g_loop: [i_loop+1, tf.assign(x, x_loop - 10*g_loop), update_g()]
l = tf.while_loop(c, b, [i, x, g], back_prop=False, parallel_iterations=1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res_g = sess.run(grad)
    res_l = sess.run(l, feed_dict={iterations: 10})
    res_x = sess.run(x)
    print(res_g)
    print(res_l)
    print(res_x)
Running this on tensorflow 1.0 gives this result for me:
[8.0]
[10, -796.0, 8.0]
-796.0
and the issue is that the value of the gradient is not updated as x changes.
I have tried various variations on the above code, but cannot seem to find a version that works. Basically my question is whether the above can be made to work, or whether I need to rethink the approach.
(Maybe I should add that I am not interested in writing a gradient descent optimizer; I just built this to have something simple and understandable to work with.)
With some help from the other answer I managed to get this working. Posting the complete code here as a second answer:
x = tf.constant(4, dtype=tf.float32)
y = tf.multiply(x,x)
grad = tf.gradients(y, x)

def loop_grad(x_loop):
    y2 = tf.multiply(x_loop, x_loop)
    return tf.gradients(y2, x_loop)[0]

iterations = tf.placeholder(tf.int32)
i = tf.constant(0, dtype=tf.int32)

c = lambda i_loop, x_loop: i_loop < iterations
b = lambda i_loop, x_loop: [i_loop+1, x_loop - 0.1*loop_grad(x_loop)]
l = tf.while_loop(c, b, [i, x], back_prop=False, parallel_iterations=1)

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.05)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(tf.global_variables_initializer())
    res_g = sess.run(grad)
    res_l = sess.run(l, feed_dict={iterations: 100000})
    res_x = sess.run(x)
    print(res_g)
    print(res_l)
    print(res_x)
changing the learning rate from the code in the question and increasing the number of iterations gives the output:
[8.0]
[100000, 5.1315068e-38]
4.0
This seems to be working. It runs reasonably fast even with a high iteration count, so there does not seem to be anything really horrible going on with updating the graph in each iteration of the while loop; a fear of that was probably one reason why I didn't opt for this approach from the start.
Having tf.Variable objects as loop variables for while loops is not supported, and will behave in weird nondeterministic ways. Always use tf.assign and friends to update the value of a tf.Variable.
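As a minimal sketch of that advice (assuming TF 1.x, and using a plain Python loop rather than tf.while_loop): drive the update through a tf.assign op and re-run it, so the gradient is re-evaluated against the current value of x each time:
import tensorflow as tf

x = tf.Variable(4.0, dtype=tf.float32, trainable=False)
y = x * x
grad = tf.gradients(y, x)[0]

# One gradient-descent step as an assign op; each sess.run(step) recomputes
# the gradient with the current value of x.
step = tf.assign(x, x - 0.1 * grad)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(step)
    print(sess.run(x))   # approaches 0.0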

How can I implement a Binarizer Layer in TensorFlow?

I'm trying to implement the binarizer on page 4 of this paper. It's not too difficult a function: roughly, it maps an activation x in [-1, 1] to +1 with probability (1 + x)/2 and to -1 otherwise.
No gradients are to be backpropagated through this function. I'm trying to do it in TensorFlow. There are two ways to go about it:
Implementing it in C++ as a custom TensorFlow op. However, the instructions are quite unclear to me. It would be great if someone could walk me through it. One thing I was unclear about was why the gradient for ZeroOutOp is implemented in Python.
Implementing it in pure Python, e.g. via tf.py_func.
I decided to go with the pure Python approach.
Here's the code:
import tensorflow as tf
import numpy as np

def py_func(func, inp, out_type, grad):
    grad_name = "BinarizerGradients_Schin"
    tf.RegisterGradient(grad_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": grad_name}):
        return tf.py_func(func, inp, out_type)

'''
This is a hackish implementation to speed things up. Doesn't directly follow the formula.
'''
def _binarizer(x):
    probability_matrix = (x + 1) / float(2)
    probability_matrix = np.matrix.round(probability_matrix, decimals=0)
    np.putmask(probability_matrix, probability_matrix==0.0, -1.0)
    return probability_matrix

def binarizer(x):
    return py_func(_binarizer, [x], [tf.float32], _BinarizerNoOp)

def _BinarizerNoOp(op, grad):
    return grad
The problem happens here. Inputs are 32x32x3 CIFAR images and they get reduced to 4x4x64 in the last layer. My last layer has a shape of (?, 4, 4, 64), where ? is the batch size. After putting it through this by calling:
binarized = binarizer.binarizer(h_pool3)
h_deconv1 = tf.nn.conv2d_transpose(h_pool3, W_deconv1, output_shape=[batch_size, img_height/4, img_width/4, 64], strides=[1,2,2,1], padding='SAME') + b_deconv1
The following error occurs:
ValueError: Shapes (4, 4, 64) and (?, 4, 4, 64) are not compatible
I can kinda guess why this happens. The ? represents the batch size and after putting the last layer through the binarizer, the ? dimension seems to disappear.
I think you can proceed as described in this answer. Applied to our problem:
def binarizer(input):
    prob = tf.truediv(tf.add(1.0, input), 2.0)
    bernoulli = tf.contrib.distributions.Bernoulli(p=prob, dtype=tf.float32)
    return 2 * bernoulli.sample() - 1
Then, where you set up your network:
W_h1, bias_h1 = ...
h1_before_bin = tf.nn.tanh(tf.matmul(x, W_h1) + bias_h1)
# The interesting bits:
t = tf.identity(h1_before_bin)
h1 = t + tf.stop_gradient(binarizer(h1_before_bin) - t)
However, I'm not sure how to verify that this works...
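One quick sanity check (just a sketch, assuming TF 1.x and the binarizer defined above): the forward value should contain only -1 and +1, while the gradient with respect to the pre-binarization activation should be all ones, i.e. it passes straight through:
h1_before_bin = tf.constant([[0.9, -0.3, 0.1]])
t = tf.identity(h1_before_bin)
h1 = t + tf.stop_gradient(binarizer(h1_before_bin) - t)

loss = tf.reduce_sum(h1)
g = tf.gradients(loss, h1_before_bin)[0]

with tf.Session() as sess:
    print(sess.run(h1))  # entries are -1.0 or 1.0 (stochastic)
    print(sess.run(g))   # [[1. 1. 1.]] -- the gradient ignores the binarization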