Does K.function method of Keras with Tensorflow backend work with network layers? - tensorflow

I recently started using Keras to build neural networks. I built a simple CNN to classify the MNIST dataset. Before training the model I used K.set_image_dim_ordering('th') in order to plot the weights of a convolutional layer. Right now I am trying to visualize a convolutional layer's output with the K.function method, but I keep getting an error.
Here is what I want to do for now:
input_image = X_train[2:3,:,:,:]
output_layer = model.layers[1].output
input_layer = model.layers[0].input
output_fn = K.function(input_layer, output_layer)
output_image = output_fn.predict(input_image)
print(output_image.shape)
output_image = np.rollaxis(np.rollaxis(output_image, 3, 1), 3, 1)
print(output_image.shape)
fig = plt.figure()
for i in range(32):
    ax = fig.add_subplot(4, 8, i+1)
    im = ax.imshow(output_image[0, :, :, i], cmap="Greys")
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([1, 0.1, 0.05, 0.8])
fig.colorbar(im, cax = cbar_ax)
plt.tight_layout()
plt.show()
And this is what I get:
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1621, in function
return Function(inputs, outputs, updates=updates)
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1569, in __init__
raise TypeError('`inputs` to a TensorFlow backend function '
TypeError: `inputs` to a TensorFlow backend function should be a list or tuple.

You should do the following changes:
output_fn = K.function([input_layer], [output_layer])
output_image = output_fn([input_image])
K.function takes the input and output tensors as lists, so that you can create a function from many inputs to many outputs. In your case it is one input to one output, but you need to pass them as lists nonetheless.
Next, K.function returns a tensor function, not a model object, so it has no predict(). The correct way to use it is simply to call it as a function.
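Putting it together, a minimal sketch of the corrected visualization code, assuming the same model and input_image as in the question:

from keras import backend as K

# build a backend function mapping the model input to the output of the
# first convolutional layer; both arguments must be lists
output_fn = K.function([model.layers[0].input], [model.layers[1].output])

# calling it returns a list with one array per output tensor
output_image = output_fn([input_image])[0]
print(output_image.shape)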

I think you can also use K.function to get gradients.
self.action_gradients = K.gradients(Q_values, actions)
self.get_action_gradients = K.function(
    inputs=[*self.model.input, K.learning_phase()],
    outputs=self.action_gradients)
which basically runs the graph to obtain the Q-value and calculate the gradient of the Q-value w.r.t. the action vector in DDPG. Source code here (lines 64 to 70): https://github.com/nyck33/autonomous_quadcopter/blob/master/criticSolution.py#L65
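For completeness, a sketch of how that function might then be called (the critic model is assumed to take states and actions as its two inputs; 0 selects test mode for the learning phase):

# hypothetical call site: states and actions are numpy batches
action_gradients = self.get_action_gradients([states, actions, 0])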
In light of the accepted answer and this usage (originally from project 5, the autonomous quadcopter, in the Udacity Deep Learning nanodegree), a question remains in my mind: is K.function() something that can be used fairly flexibly to run the graph, designating as its outputs, for example, the outputs of a particular layer, gradients, or even the weights themselves?
Lines 64 to 67 here: https://github.com/nyck33/autonomous_quadcopter/blob/master/actorSolution.py
It is being used as a custom training function for the actor network in DDPG:
# caller
self.actor_local.train_fn([states, action_gradients, 1])

# called
self.train_fn = K.function(
    inputs=[self.model.input, action_gradients, K.learning_phase()],
    outputs=[], updates=updates_op)
outputs is given an empty list because we merely want to train the actor network with the action_gradients from the critic network.
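The updates_op itself is not shown above. A sketch of how it might be constructed, following the linked actor code (the loss expression and self.action_size here are assumptions, not verbatim source):

from keras import backend as K
from keras import optimizers

# hypothetical: placeholder for gradients arriving from the critic
action_gradients = K.placeholder(shape=(None, self.action_size))
# ascend the Q-value by descending along its negative gradient w.r.t. the actions
loss = K.mean(-action_gradients * self.model.output)
optimizer = optimizers.Adam()
updates_op = optimizer.get_updates(params=self.model.trainable_weights, loss=loss)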

Related

Easy way to clamp Neural Network outputs between 0 and 1?

So I'm working on a GAN neural network, and I want to set my network's outputs to 0 if they are less than 0 and to 1 if they are greater than 1, leaving them unchanged otherwise. I'm pretty new to TensorFlow, and I don't know of any TensorFlow function or activation that does this without unwanted side effects. So I wrote my loss function so that it calculates the loss as if the output were clamped, with this code:
def discriminator_loss(real_output, fake_output):
    real_output_clipped = min(max(real_output.numpy()[0], 0), 1)
    fake_output_clipped = min(max(fake_output.numpy()[0], 0), 1)
    real_clipped_tensor = tf.Variable([[real_output_clipped]], dtype="float32")
    fake_clipped_tensor = tf.Variable([[fake_output_clipped]], dtype="float32")
    real_loss = cross_entropy(tf.ones_like(real_output), real_clipped_tensor)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_clipped_tensor)
    total_loss = real_loss + fake_loss
    return total_loss
but I get this error:
ValueError: No gradients provided for any variable: ['dense_50/kernel:0', 'dense_50/bias:0', 'dense_51/kernel:0', 'dense_51/bias:0', 'dense_52/kernel:0', 'dense_52/bias:0', 'dense_53/kernel:0', 'dense_53/bias:0'].
Does anyone know a better way to do this, or a way to fix this error?
Thanks!
You can apply a ReLU layer from Keras as your final layer and set max_value=1.0. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(16,)))
model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.ReLU(max_value=1.0))
You can read more about it here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU
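As a quick sanity check, a sketch assuming the 16-feature model above: the clamped outputs should now stay within [0, 1].

import numpy as np

out = model.predict(np.random.randn(4, 16).astype("float32"))
assert out.min() >= 0.0 and out.max() <= 1.0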
TF probably does not know how to update your network weights based on this loss. The inputs to the cross entropy are tensors (variables) that are assigned directly from numpy arrays and are not connected to your actual network outputs.
If you want to perform operations on tensors that will remain within the graph and (hopefully) be differentiable, use the available TF operations. There's a "clip_by_value" operation described here: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/clip_by_value.
E.g. real_output_clipped = tf.clip_by_value(real_output, clip_value_min=0, clip_value_max=1)
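Putting it together, a sketch of the loss rewritten with tf.clip_by_value (assuming cross_entropy is the same tf.keras.losses.BinaryCrossentropy instance used in the question):

import tensorflow as tf

def discriminator_loss(real_output, fake_output):
    # clip inside the graph so the loss stays connected to the network outputs
    real_clipped = tf.clip_by_value(real_output, clip_value_min=0.0, clip_value_max=1.0)
    fake_clipped = tf.clip_by_value(fake_output, clip_value_min=0.0, clip_value_max=1.0)
    real_loss = cross_entropy(tf.ones_like(real_output), real_clipped)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_clipped)
    return real_loss + fake_loss

Note that the gradient of tf.clip_by_value is zero wherever a value is actually clipped, which matches the behavior of the ReLU(max_value=1.0) suggestion above.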

What would be the output from tensorflow dense layer if we assign itself as input and output while making a neural network?

I have been going through the implementation of the neural network in OpenAI's code for Vanilla Policy Gradient (as a matter of fact, this part is used nearly everywhere). The code looks something like this:
def mlp_categorical_policy(x, a, hidden_sizes, activation, output_activation, action_space):
    act_dim = action_space.n
    logits = mlp(x, list(hidden_sizes) + [act_dim], activation, None)
    logp_all = tf.nn.log_softmax(logits)
    pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
    logp = tf.reduce_sum(tf.one_hot(a, depth=act_dim) * logp_all, axis=1)
    logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
    return pi, logp, logp_pi
and this multi-layer perceptron network is defined as follows:
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(inputs=x, units=h, activation=activation)
    return tf.layers.dense(inputs=x, units=hidden_sizes[-1], activation=output_activation)
My question is: what is the return from this mlp function? I mean the structure or shape. Is it an N-dimensional tensor? If so, how is it given as an input to tf.random.categorical? If not, and it just has the shape [hidden_layer2, output], then what happened to the other layers? As per the documentation of random_categorical, it only takes a 2-D input. The complete code of OpenAI's VPG algorithm can be found here. The mlp is implemented here. I would be highly grateful if someone would just tell me what this mlp_categorical_policy() is doing?
Note: The hidden size is [64, 64], the action dimension is 3
Thanks and cheers
Note that this is a discrete action space - there are action_space.n different possible actions at every step, and the agent chooses one.
To do this, the MLP returns the logits (which are a function of the probabilities) of the different actions. This is specified in the code by + [act_dim], which appends the action count (action_space.n) as the size of the final MLP layer. Note that the last layer of an MLP is the output layer; the input layer is not specified in TensorFlow, it is inferred from the inputs.
tf.random.categorical takes the logits and samples a policy action pi from them, which is returned as a number.
mlp_categorical_policy also returns logp, the log probability of the action a (used to assign credit), and logp_pi, the log probability of the policy action pi.
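As a small illustration (not from the original code) of how the one-hot mask extracts the log probability of a chosen action:

import tensorflow as tf

# with act_dim = 3: one_hot(a) * logp_all zeroes every entry except the
# chosen action's log probability, and reduce_sum extracts it
logp_all = tf.math.log(tf.constant([[0.2, 0.5, 0.3]]))
a = tf.constant([1])
logp = tf.reduce_sum(tf.one_hot(a, depth=3) * logp_all, axis=1)
print(logp.numpy())  # approximately [log(0.5)] = [-0.693]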
It seems your question is more about the return from the mlp.
The mlp creates a series of fully connected layers in a loop. In each iteration of the loop, the mlp creates a new layer using the previous layer x as an input and assigns its output to overwrite x, with this line: x = tf.layers.dense(inputs=x, units=h, activation=activation).
So the output is not the same as the input; on each iteration x is overwritten with the value of the new layer. This is the same kind of coding trick as x = x + 1, which increments x by 1. It effectively chains the layers together.
The output of tf.layers.dense is a tensor of size [:, h] where : is the batch dimension (and can usually be ignored). The creation of the last layer happens outside the loop; it can be seen that the number of nodes in this layer is act_dim (so the shape is [:, 3]). You can check the shape by doing this:
import tensorflow.compat.v1 as tf
import numpy as np

def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(x, units=h, activation=activation)
    return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)

obs = np.array([[1.0, 2.0]])
logits = mlp(obs, [64, 64, 3], tf.nn.relu, None)
print(logits.shape)
result: TensorShape([1, 3])
Note that the observation in this case is [1., 2.]; it is nested inside a batch of size 1.

Using custom error function in tensorflow

I have a trained convolutional neural network A that outputs the probability that a given picture contains a square or a circle.
Another network, B, takes images of random noise. My idea is to have a bunch of convolutional layers so that the output is a newly generated square.
As an error function I would like to feed the generated image into A and learn the filters of B from the softmax tensor of A. To my understanding this is a sort of generative adversarial network, except that A does not learn. While trying to implement this I have encountered two problems.
I have imported the layers of A that I want to use in B as follows:
with gfile.FastGFile("shape-classifier.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
image_input_layer, extern_softmax_tensor = tf.import_graph_def(
    graph_def, name="", return_elements=["image_input", "Softmax"])
I would like to avoid having to call sess.run() three times (generating the random image, getting the softmax values from A, adjusting the weights of B).
Is there a way to directly connect the tensors so that I only have one graph?
Calling:
logits = extern_softmax_tensor(my_generated_image_tensor)
throws:
TypeError: 'Operation' object is not callable
The "Graph-Connected" and the "Feed-Connected" approach confuse me a bit.
logits = extern_softmax_tensor(my_generated_image_tensor)  # however you would call it
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_input,
                                                        logits=logits)
cross_entropy_mean = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
learning_step = optimizer.minimize(cross_entropy_mean)
With that logic, the error will first be passed back through A.
Is there a way to use the softmax calculated by A to directly adjust Layers of B?
Leaving aside whether my idea actually works, is it actually possible to build this in TensorFlow? I hope I could make my problems clear.
Thank you very much
Yes, it is possible to connect two models like that.
# Generator network
my_generated_image_tensor = ...

# Read classifier
with gfile.FastGFile("shape-classifier.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Map generator output to classifier input
extern_softmax_tensor, = tf.import_graph_def(
    graph_def, name="", input_map={"image_input": my_generated_image_tensor},
    return_elements=["Softmax:0"])

# Define loss and training step
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=label_input, logits=extern_softmax_tensor)
cross_entropy_mean = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
learning_step = optimizer.minimize(cross_entropy_mean)
As side notes: 1) tf.nn.softmax_cross_entropy_with_logits expects logits as input, that is, the values before being passed through the softmax function; so if the Softmax tensor in the loaded model is the output of a softmax operation, you should probably change it to use the input logits instead. 2) Note that tf.nn.softmax_cross_entropy_with_logits is now deprecated anyway; see tf.nn.softmax_cross_entropy_with_logits_v2.
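A sketch of the v2 call (note that the v2 op also backpropagates into the labels unless you wrap them in tf.stop_gradient):

cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.stop_gradient(label_input), logits=extern_softmax_tensor)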

How to get weights in tf.layers.dense?

I want to draw the weights of tf.layers.dense in a TensorBoard histogram, but they do not show up in the parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
import os

x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
I came across this problem and just solved it. The name of tf.layers.dense is not necessarily the same as the prefix of the kernel's name. My tensor is "dense_2/xxx" but its kernel is "dense_1/kernel:0". To ensure that tf.get_variable works, you'd better set name=xxx in the tf.layers.dense call so that the two names share the same prefix. It works as in the demo below:
l = tf.layers.dense(input_tf_xxx, 300, name='ip1')
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
    weights = tf.get_variable("kernel")  # do not specify the shape here
    # or it will confuse tensorflow into creating a new one
[Edit]: The newer versions of TensorFlow have both functional and object-oriented interfaces to the layers API. If you need the layers only for computational purposes, the functional API is a good choice; its function names start with lowercase letters, for instance tf.layers.dense(...). Layer objects are created with capitalized names, e.g. tf.layers.Dense(...). Once you have a handle to such a layer object, you can use all of its functionality. To obtain the weights, just use obj.trainable_weights; this returns a list of all the trainable variables found in that layer's scope.
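A short sketch of the object-oriented interface (the layer name and shapes here are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 16])
dense = tf.layers.Dense(units=32, name="my_dense")
y = dense(x)                    # the variables are created on this first call
print(dense.trainable_weights)  # [kernel variable, bias variable]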
I am going crazy with tensorflow.
I run this:
sess.run(x.kernel)
after training, and I get the weights.
Comes from the properties described here.
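For this to work, x must be the layer object itself (tf.layers.Dense), not the tensor returned by the functional tf.layers.dense call; a sketch under that assumption:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 16])
x = tf.layers.Dense(units=32)  # the layer object exposes .kernel and .bias
y = x(inputs)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    kernel_value = sess.run(x.kernel)  # the weight matrix as a numpy array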
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, and that fragments the tutorials around.
Is there anything wrong with
model.get_weights()
After I create a model, compile it and run fit, this function returns a numpy array of the weights for me.
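For example (a sketch), the returned list alternates kernel and bias arrays in layer order:

for w in model.get_weights():
    print(w.shape)  # e.g. (16, 32), (32,), (32, 1), (1,) for a two-layer net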
In TF 2, if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (the default in TF2, except inside @tf.function-decorated functions):
weights = optimizer.get_weights()
In TF2, get_weights() outputs a list of length 2:
weights_out[0] = kernel weight
weights_out[1] = bias weight
Below are the second layer's weights (layers[0] is the input layer, which has no weights) in a model with layer size 50 and input size 784:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)

kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight))       # 2
print(kernel_weight.shape)   # (784, 50)
print(bias_weight.shape)     # (50,)
Try making a loop to get the weights of each layer in your sequential network, printing the layer names first, which you can get from:
model.summary()
Then you can get the weights of each layer by running this code:
for layer in model.layers:
    print(layer.name)
    print(layer.get_weights())

How do I share weights across different RNN cells that feed in different inputs in Tensorflow?

I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.
The graph that I am trying to build is like this:
where there are three LSTM Cells in orange which operate in parallel and between which I would like to share the weights.
I've managed to implement something similar to what I want using a placeholder (see below for code). However, using a placeholder breaks the gradient calculations of the optimizer and doesn't train anything past the point where I use the placeholder. Is it possible to do this a better way in Tensorflow?
I'm using Tensorflow 1.2 and python 3.5 in an Anaconda environment on Windows 7.
Code:
def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1, cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],
                                      [cls.batch_size, cls.n_rnn_inputs])
        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)
        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell, data_tens,
                                                    dtype=tf.float32)
        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))
        # Encode the output of the RNN into one estimate per entry in
        # the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i], out_weights) + out_biases)
    return predict_list

def create_graph(cls, sess):
    # Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])
    # Define placeholder to provide variable input into the
    # RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])
    # Global step used in optimizer
    global_step = tf.Variable(0, trainable=False)
    # Create ANN
    ann_output = cls.ann_model(cls.c)
    # Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)
    # Create 'unrolled' RNN by creating sequence_length many RNN cells that
    # share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        # Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2
When training, mini-batches are calculated in two steps:
RNNinput = sess.run(cls.rnn_input, feed_dict={cls.x: batch_x,
                                              cls.y: batch_y,
                                              cls.c: batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place: RNNinput,
                                       cls.y: batch_y,
                                       cls.x: batch_x,
                                       cls.c: batch_c})
Thanks for your help. Any ideas would be appreciated.
You have 3 different inputs: input_1, input_2, input_3. Feed each of them into an LSTM model whose parameters are shared, then concatenate the outputs of the 3 LSTMs and pass the result to a final LSTM layer. The code should look something like this:
# Create input placeholders for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)

# Create a shared rnn layer
def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)

# Generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables()  # the variables will be reused
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

# Verify whether the variables are reused
for v in tf.global_variables():
    print(v.name)

# Concat the three outputs
output = tf.concat(...)

# Pass it to the final_lstm layer and output the logits
logits = final_layer(output, ...)

train_op = ...

# Train
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
I ended up rethinking my architecture a little and came up with a more workable solution.
Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The results of each run were stored in a 'buffer'-like tf.Variable, and that whole variable was then used as an input into the final LSTM layer.
I drew a diagram here
Implementing it this way allowed for valid outputs after 3 time steps and didn't break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN could still train).
The only tricky thing was to make sure that the buffer was in the correct sequential order for the final RNN.
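A rough sketch of that idea in TF 1.x (names and shapes are illustrative, not the original code): run one cell three times under a reused variable scope, collect the outputs, and stack them as the input sequence for the final layer.

import tensorflow as tf

batch_size, n_steps, n_features = 32, 5, 8
# hypothetical inputs: three sequences, each a list of [batch, features] tensors
inputs = [
    [tf.placeholder(tf.float32, [batch_size, n_features]) for _ in range(n_steps)]
    for _ in range(3)
]

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(64)
buffer = []
with tf.variable_scope('shared_lstm') as scope:
    for i, seq in enumerate(inputs):
        if i > 0:
            scope.reuse_variables()  # reuse the same weights for every run
        outputs, _ = tf.contrib.rnn.static_rnn(lstm_cell, seq, dtype=tf.float32)
        buffer.append(outputs[-1])

# stack the three results as a length-3 sequence for the final LSTM layer
final_input = tf.stack(buffer, axis=1)  # shape [batch, 3, 64]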