Tensorflow: train and test in separate functions

I am trying to use a Tensorflow model in two separate functions: one that trains it, and one used to test it. For example, the training function looks something like this:
graph = tf.Graph()
with graph.as_default():
    tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims))
    ...
    weights = tf.Variable(tf.truncated_normal([num_dims, num_labels]))
    ...
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    prediction = tf.nn.softmax(logits)
    ...
session = tf.Session(graph=graph)
...
The other, evaluation function would just use prediction with the test data, like so:
session.run(prediction, feed_dict={tf_dataset: test_data})
The problem is, of course, that tf_dataset is not in the scope of the other function. I am fine with returning session and prediction from the training function, but having to share every single placeholder with the evaluation code seems a bit lame.
Is there a way to get the references somehow, from the session or the graph? Also, are there any good practices on how to separate training and evaluation code in Tensorflow?

You could give your placeholders unique names and use those names when feeding. I.e.,
tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims), name="datainput")
...
sess.run(..., feed_dict={"datainput:0": mydata})
You can also get name/type pairs for all the ops in your graph, so you could recover all the placeholder tensor names that way:
[(op.name+":0", op.op_def.name) for op in graph.get_operations()]
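Putting the two together, a minimal sketch of what the evaluation function could look like, assuming you also give the softmax op a name inside the training function (the names "datainput" and "prediction" are just illustrative):
prediction = tf.nn.softmax(logits, name="prediction")  # name it in the training function

def evaluate(session, graph, test_data):
    # Recover the tensors by name instead of passing every placeholder around.
    tf_dataset = graph.get_tensor_by_name("datainput:0")
    prediction = graph.get_tensor_by_name("prediction:0")
    return session.run(prediction, feed_dict={tf_dataset: test_data})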

Related

tf.gradients(model.output, model.input) computes a different value each time I run it

I'm trying to compute the gradient of the output layer with respect to the input layer. My neural network is relatively small (the input layer is composed of 9 activation units and the output layer of 1), and training went fine, as the test provided very good accuracy. I made the NN model using Keras.
In order to solve my problem, I need to compute the gradient of the output with respect to the input. That is, I need to obtain the Jacobian, which has dimension [1x9]. The gradients function in tensorflow should provide me with everything I need, but when I run the code below I obtain a different solution every time.
output_v = model.output
input_v = model.input
gradients = tf.gradients(output_v, input_v)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(sess.run(model.input,feed_dict={model.input:x_test_N[0:1,:]}))
evaluated_gradients = sess.run(gradients, feed_dict={model.input: x_test_N[0:1,:]})
print(evaluated_gradients)
sess.close()
The first print command shows this value every time I run it (just to make sure that the input values are not modified):
[[-1.4306372 -0.1272892 0.7145787 1.338818 -1.2957293 -0.5402862 -0.7771702 -0.5787912 -0.9157122]]
But the second print shows different ones:
[[ 0.00175761, -0.0490326 , -0.05413761, 0.09952173, 0.06112418, -0.04772799, 0.06557006, -0.02473242, 0.05542536]]
[[-0.00416433, 0.08235116, -0.00930298, 0.04440641, 0.03752216, 0.06378302, 0.03508484, -0.01903783, -0.0538374 ]]
Using finite differences, evaluated_gradients[0,0] = 0.03565103, which isn't close to any of the first values previously printed.
Thanks for your time!
Alberto
Solved by creating a specific session just before training my model:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
K.set_session(sess)
history = model.fit(x_train_N, y_train_N, epochs=n_epochs,
                    validation_split=split, verbose=1, batch_size=n_batch_size,
                    shuffle=True, callbacks=[early_stop, tensorboard])
And evaluating the gradient after training, while the tf.Session is still open:
evaluated_gradients = sess.run(K.gradients(model.output, model.input), feed_dict={model.input: x_test_N})
Presumably your network is set up to initialize weights to random values. When you run sess.run(tf.initialize_all_variables()), you are initializing your variables to new random values. Therefore you get different values for output_v in every run, and hence different gradients. If you want to use a model you trained before, you should replace the call to initialize_all_variables() with a restore command. I am not familiar with how this is done in Keras since I usually work directly with tensorflow, but I would try this.
Also note that initialize_all_variables is deprecated and you should use global_variables_initializer instead.
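In plain TensorFlow the restore would look roughly like this (a sketch, assuming the trained variables were saved with tf.train.Saver to a hypothetical ./model.ckpt during training):
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore the trained values instead of re-initializing them to random values.
    saver.restore(sess, "./model.ckpt")
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: x_test_N[0:1, :]})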

How to create two graphs for train and validation?

When I read the tensorflow guidance about graphs and sessions (Graphs and Sessions), I found that they suggest creating two graphs for training and validation.
I think this is reasonable and I want to use it, because my training and validation models are different (for encoder-decoder mode or dropout). However, I don't know how to make the variables in the trained graph available to the test graph without using tf.train.Saver().
When I create two graphs and create variables inside each graph, I found that the two sets of variables are totally different, as they belong to different graphs.
I have googled a lot and I know there are questions about this problem, such as question1. But there is still no useful answer. Is there any code example, or does anyone know how to create two graphs for train and validation separately, such as:
def train_model():
    g_train = tf.Graph()
    with g_train.as_default():
        train_models  # build the training model here

def validation_model():
    g_test = tf.Graph()
    with g_test.as_default():
        test_models  # build the test model here
One easy way of doing that is to create a 'forward function' that defines the model and changes behaviour based on extra parameters.
Here is an example:
def forward_pass(x, is_training, reuse=tf.AUTO_REUSE, name='model_forward_pass'):
    # Note the reuse argument: it tells the getter to either create the weights or reuse them
    with tf.variable_scope(name, reuse=reuse):
        x = tf.layers.conv2d(x, ...)
        ...
        x = tf.layers.dense(x, ...)
        x = tf.layers.dropout(x, rate, training=is_training)  # Note the is_training argument
        ...
    return x
Now you can call the 'forward_pass' function anywhere in your code. You simply need to provide the is_training argument to use the correct mode for dropout, for example. The 'reuse' argument will automatically get the correct values for your weights as long as the 'name' of the 'variable_scope' is the same.
For example:
train_logits_model1 = forward_pass(x_train, is_training=True, name='model1')
# Graph is defined and dropout is used in training mode
test_logits_model1 = forward_pass(x_test, is_training=False, name='model1')
# Graph is reused but the dropout behaviour change to inference mode
train_logits_model2 = forward_pass(x_train2, is_training=True, name='model2')
# Name changed, model2 is added to the graph and dropout is used in training mode
To add to this answer, since you stated that you want to have 2 separate graphs, you could do that using assign ops:
train_graph = forward_pass(x, is_training=True, reuse=False, name='train_graph')
...
test_graph = forward_pass(x, is_training=False, reuse=False, name='test_graph')
...
train_vars = tf.get_collection('variables', 'train_graph/.*')
test_vars = tf.get_collection('variables', 'test_graph/.*')
test_assign_ops = []
for test, train in zip(test_vars, train_vars):
    test_assign_ops += [tf.assign(test, train)]
assign_op = tf.group(*test_assign_ops)
sess.run(assign_op)  # Replace the vars in test_graph by the ones in train_graph
I'm a big advocate of the first method, as it is way cleaner and reduces memory usage.

How to get weights in tf.layers.dense?

I want to draw the weights of tf.layers.dense in a TensorBoard histogram, but they do not show up in the parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
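Since the original question was about TensorBoard, here is a minimal sketch of how the recovered kernel could be attached to a histogram summary (TF 1.x tf.summary API; the input shape, layer size, log directory and summary name are just examples):
import os
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(None, 4))
x = tf.layers.dense(inputs, units=10)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')

tf.summary.histogram('dense_kernel', weights)  # shows up under the Histograms tab
summary_op = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())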
I came across this problem and just solved it. The name of tf.layers.dense does not necessarily match the prefix of its kernel's name. My tensor is "dense_2/xxx" but its kernel is "dense_1/kernel:0". To make tf.get_variable work, you'd better set name=xxx in the tf.layers.dense call so that the two names share the same prefix. It works as in the demo below:
l = tf.layers.dense(input_tf_xxx, 300, name='ip1')
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
    # Do not specify the shape here, or it will confuse tensorflow into creating a new variable.
    weights = tf.get_variable("kernel")
[Edit]: The new version of Tensorflow now has both functional and object-oriented interfaces to the layers API. If you need the layers only for computational purposes, the functional API is a good choice; its function names start with lowercase letters, for instance tf.layers.dense(...). Layer objects are created with a capital first letter, e.g. tf.layers.Dense(...). Once you have a handle to such a layer object, you can use all of its functionality. For obtaining the weights, just use obj.trainable_weights, which returns a list of all the trainable variables found in that layer's scope.
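A small sketch of that object-oriented variant (TF 1.x; the sizes and the layer name are only illustrative):
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(None, 784))
dense_layer = tf.layers.Dense(units=50, name='dense_1')  # Layer object, capital D
outputs = dense_layer(inputs)  # the variables are created on this first call
kernel, bias = dense_layer.trainable_weights  # [kernel, bias] for a plain Dense layer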
I am going crazy with tensorflow.
I run this:
sess.run(x.kernel)
after training, and I get the weights.
It comes from the properties described here.
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, and that fragments the tutorials.
Is there anything wrong with
model.get_weights()
After I create a model, compile it and run fit, this function returns a list of numpy arrays with the weights for me.
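For example, a small Keras sketch (the layer sizes are arbitrary):
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(50, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])
weights = model.get_weights()  # [kernel_1, bias_1, kernel_2, bias_2] as numpy arrays
print(weights[0].shape)        # (784, 50) -- kernel of the first Dense layer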
In TF 2, if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (the default in TF2 except in @tf.function decorated functions):
weights = optimizer.get_weights()
In TF2, layer.weights will output a list of length 2:
weights_out[0] = kernel weights
weights_out[1] = bias weights
For example, the weights of the second layer (layers[0] is the input layer, which has no weights) in a model with hidden size 50 and input size 784:
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)
kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight)) # 2
print(kernel_weight.shape) # (784,50)
print(bias_weight.shape) # (50,)
Try making a loop to get the weights of each layer in your sequential network. Print the layer names first, which you can get from:
model.summary()
Then you can get the weights of each layer by running this code:
for layer in model.layers:
    print(layer.name)
    print(layer.get_weights())

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I first have to define w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what is said in #2486(entron), these commands do not return a Keras tensor with all the needed metadata, and so they produce the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to close the loop. But this means that I would have to use the fit() function to train the model online with just the target as the final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I would also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.

Add validation summary

How can I add validation to tensorboard? I have written wrappers for layers, like:
def convolution(input_data, kernel_shape, strides, activation, name=None):
    with tf.name_scope(name):
        # stddev is assumed to be defined elsewhere in the original code
        kernel = tf.Variable(tf.truncated_normal(kernel_shape, stddev=stddev), name="weights")
        bias = tf.Variable(tf.zeros([kernel_shape[-1]]), name="biases")
        conv = tf.nn.conv2d(input=input_data, filter=kernel, strides=strides, padding="SAME", name="convolutions")
        result = activation(tf.nn.bias_add(conv, bias), name="activations")
        tf.scalar_summary(name + "/mean", tf.reduce_mean(kernel))
        return result
and I use summary_op = tf.merge_all_summaries() in main. I have also implemented train_op and valid_op, which both call the inference function. However, I get an error about duplicate tags for scalar_summary: inference is used in both train_op and valid_op, which leads to duplication of, say, the conv1/mean summary.
How can I make this work? What I need is to run training and validation using the same inference function.
As the error suggests, you cannot have two summaries with the same tag. This happens in your case because you are calling tf.scalar_summary twice with the same tag, once when constructing the train_op and once when constructing the valid_op. Here is a possible solution:
You can add a flag to your inference function, say is_training, to indicate that the code is being called to construct part of a training graph. You would have to thread that flag through to all your layer functions. In convolution, for instance, you should do the following:
if is_training:
    tf.scalar_summary(name + "/mean", tf.reduce_mean(kernel))
return result
When constructing the train_op, you pass is_training=True, and when constructing the valid_op, you pass is_training=False. There is an example of such a programming pattern here in the Inception model.
Another way is to use different name scopes for the summaries and then merge only the summaries created under a given scope, instead of using merge_all_summaries.
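A sketch of that second approach, using the newer tf.summary API from later 1.x releases (tf.scalar_summary became tf.summary.scalar, whose tag picks up the enclosing name scope, so the two calls below no longer collide; the toy inference function and scope names are only illustrative, and variable sharing between the two calls is omitted to keep the sketch short):
import tensorflow as tf

def inference(images):
    # Toy stand-in for the real inference function; tf.summary.scalar derives its
    # tag from the surrounding name scope, so there are no duplicate tags.
    kernel = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1), name="weights")
    conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding="SAME")
    tf.summary.scalar("kernel_mean", tf.reduce_mean(kernel))
    return tf.nn.relu(conv)

train_images = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
valid_images = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))

with tf.name_scope("train"):
    train_out = inference(train_images)
with tf.name_scope("valid"):
    valid_out = inference(valid_images)

# Merge each scope's summaries separately instead of calling merge_all_summaries().
train_summary_op = tf.summary.merge(
    tf.get_collection(tf.GraphKeys.SUMMARIES, scope="train"))
valid_summary_op = tf.summary.merge(
    tf.get_collection(tf.GraphKeys.SUMMARIES, scope="valid"))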