Using custom error function in tensorflow - tensorflow

I have a trained convolutional neural network A that outputs the propability that a given picture contains a square or a circle.
Another Network B takes images of random noise. My idea is to have a bunch of convolutional layers so that the output is a newly generated square.
As an error function I would like to feed the generated image into A and learn filters of B from the softmax tensor of A. To my understanding this is sort of a generative adversarial network, except for that A does not learn. While trying to implement this I have encountered two problems.
I have imported the Layers of A that I want to use in B as followed:
with gfile.FastGFile("shape-classifier.pb", 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
image_input_layer, extern_softmax_tensor = tf.import_graph_def(
graph_def, name="", return_elements=["image_input", "Softmax"])
I would like to avoid using two sess.run() three times. (Generating the random image, getting the softmax values from A, adjusting weights of B).
Is there a way to directly connect the tensors so that I only have one graph?
Calling:
logits = extern_softmax_tensor(my_generated_image_tensor)
throws:
TypeError: 'Operation' object is not callable
The "Graph-Connected" and the "Feed-Connected" approach confuse me a bit.
logits = extern_softmax_tensor(my_generated_image_tensor) # however you would call it
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_input,
logits=logits)
cross_entropy_mean = tf.reduce_mean(cross_entropy_tensor)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
learning_step = optimizer.minimize(cross_entropy_mean)
With that Logic the error will be first passed back through A.
Is there a way to use the softmax calculated by A to directly adjust Layers of B?
Leaving aside if my idea actually works, is it actually possible to build it in tensorflow? I hope I could make my problems clear.
Thank you very much

Yes, it is possible to connect two models like that.
# Generator networ
my_generated_image_tensor = ...
# Read classifier
with gfile.FastGFile("shape-classifier.pb", 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
# Map generator output to classifier input
extern_softmax_tensor, = tf.import_graph_def(
graph_def, name="", input_map={"image_input": my_generated_image_tensor},
return_elements=["Softmax:0"])
# Define loss and training step
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
labels=label_input, logits=extern_softmax_tensor)
cross_entropy_mean = tf.reduce_mean(cross_entropy_tensor)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
learning_step = optimizer.minimize(cross_entropy_mean)
As side notes: 1) tf.nn.softmax_cross_entropy_with_logits expects logits as input, that is, the values before being passed through the softmax function, so if the Softmax tensor in the loaded model is the output of a softmax operation you should probably change it to use the input logits instead. 2) Note that tf.nn.softmax_cross_entropy_with_logits is now deprecated anyway, see tf.nn.softmax_cross_entropy_with_logits_v2.

Related

Inference / Prediction from frozen model: Obtain value after activation through the graph

I have a simple frozen tensorflow model (frozen in Keras) that I load and then try to use for prediction. I do this first in python (code below), and then using C and libtensorflow (and get the same results). The examples I have found provide the the logits (before activation) as the final output, rather than the class label after activation. Is there a way to obtain the label through the graph itself?
I understand I can operate the sigmoid / softmax operators on the logits, but that's not what I want to do. (I'm porting the code to use the libtensorflow C api, and would prefer to let the graph do the math.)
My understanding is that the session runs the graph to the operation / tensor, and stops before that operation. Is there a way to get the operation after the activation?
Keras Model:
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(21,)))
model.add(Dense(1, activation='sigmoid'))
Tensorflow code to load frozen model and predict:
from tensorflow.python.platform import gfile
with tf.Session() as sess:
with gfile.FastGFile('slopemodel/slopemodel.pb', 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
sess.graph.as_default()
g_in = tf.import_graph_def(graph_def)
tensor_output = sess.graph.get_tensor_by_name('import/dense_2/Sigmoid:0')
tensor_input = sess.graph.get_tensor_by_name('import/dense_1_input:0')
predictions = sess.run(tensor_output, {tensor_input:sample})
print(predictions)
Truncated list of important nodes in the graph:
['import/dense_1_input',
'import/dense_1/kernel',
'import/dense_1/kernel/read',
'import/dense_1/bias',
'import/dense_1/bias/read',
'import/dense_1/MatMul',
'import/dense_1/BiasAdd',
'import/dense_1/Relu',
'import/dense_2/kernel',
'import/dense_2/kernel/read',
'import/dense_2/bias',
'import/dense_2/bias/read',
'import/dense_2/MatMul',
'import/dense_2/BiasAdd',
'import/dense_2/Sigmoid',
'import/Adam/iterations',
.
.
.]
Yes, you simply need to change the tensor_output to the one you want to obtain. Note that you will not receive the labels themselves but a one-hot vector from which you'll need to find the corresponding label yourself.

How can I use TensorFlow's sampled softmax loss function in a Keras model?

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:
import keras.backend as K
def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:
model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequence=True))
model.add(Dense(output_dim=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or do this need to be written as a layer that comes after the final fully-connected layer. Any guidance here would be greatly appreciated. Thanks.
The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.
First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax function to the original dense layer which has the default activation function applied.
There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.
from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K
class SampledSoftmax(Layer):
def __init__(self, **kwargs):
super(SampledSoftmax, self).__init__(**kwargs)
def call(self, inputs):
"""
The first input should be the model as it were, and the second the
target (i.e., a repeat of the training data) to compute the labels
argument
"""
# the labels input to this function is batch size by 1, where the
# value at position (i, 1) is the index that is true (not zero)
# e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
return K.tf.nn.sampled_softmax_loss(weights=inputs[0]._keras_history[0].weights[0],
biases=inputs[0]._keras_history[0].bias,
inputs=inputs[0],
labels=K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1]),
num_sampled=1000,
num_classes=200000)
def custom_loss(y_true, y_pred):
return K.tf.reduce_mean(y_pred)
num_classes = 200000
input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))
dense = Dense(num_classes)
outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])
model = Model([input, target_input], outputs)
model.compile(optimizer=u'adam', loss=custom_loss)
# train as desired

How can I get a tensor output by a tensorflow.layer

I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved, everything is good so far. Next, I want to load this saved model, make a change to the learning rate, and continue to train (I know tensorflow provides exponential_decay() function to allow a decay learning rate, here i just want to be in full control of learning rate, and change it manually). To do this, my idea is like:
saver = tf.train.import_meta_grah(...)
saver.restore(sess, tf.train.latest_chechpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...) # this is place_holder in model
labels_ = graph.get_tensor_by_name(...) # place_holder in model
logits = graphget_tensor_by_name(...) # output of dense layer
loss = grah.get_tensor_by_name(...) # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss) # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I got a problem. the code above can successfully obtain inputmg_, labels_, because I named them when I defined them. But I cannot obtain logits because logits = tf.layers.dense(name='logits') the name is actually given to the dense layer instead of the output tensor logits. That means, I cannot obtain the tensor conv1, conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for the answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weights parameters, when you name it, you are naming the layer, not the output tensor.
The tf.identity returns a tensor with the same shape and contents as input.
So just leave the dense layer unamed and use it as input to the tf.identity
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output
output = graph.get_tensor_by_name('output:0')

Tensorflow: calculate gradient for tf.multiply

I'm building a neural network that has the following two layers
pseudo_inputs = tf.Variable(a_numpy_ndarray)
weights = tf.Variable(tf.truncated_normal(...))
I then want to multiply them using tf.multiply (which, unlike tf.matmul multiplies corresponding indices, i.e. c_ij = a_ij * b_ij)
input = tf.multiply(pseudo_inputs, weights)
My goal is to learn weights. So I run
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
But it doesn't work. The network doesn't change at all.
Looking at tensorboard, I could see that 'input' has no gradient, so I'm assuming that's the problem. Any ideas how to solve this?
From reading tensorflow docs it seems like I might have to write a gradient op for tf.multiply, but I find it hard to believe no one needed to do this before.
I thought the pseudo_inputs should be set as Placeholders in the first line.
And in this line:
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
Since weights are to be trained in the graph by minimizing loss then it should not passed as a parameter here.
train = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Then you should first run the train using the samples(you don't have labels) you have.
for x_train, y_train in samples:
sess.run(train, {pseudo_inputs:x_train, y:y_train})
And after that you can get weights by:
W_c, loss_c = sess.run([W, loss], {pseudo_inputs=x_train, y:y_train})

Does K.function method of Keras with Tensorflow backend work with network layers?

I recently have started using Keras to build neural networks. I built a simple CNN to classify MNIST dataset. Before learning the model I used K.set_image_dim_ordering('th') in order to plot a convolutional layer weights. Right now I am trying to visualize convolutional layer output with K.function method, but I keep getting error.
Here is what I want to do for now:
input_image = X_train[2:3,:,:,:]
output_layer = model.layers[1].output
input_layer = model.layers[0].input
output_fn = K.function(input_layer, output_layer)
output_image = output_fn.predict(input_image)
print(output_image.shape)
output_image = np.rollaxis(np.rollaxis(output_image, 3, 1), 3, 1)
print(output_image.shape)
fig = plt.figure()
for i in range(32):
ax = fig.add_subplot(4,8,i+1)
im = ax.imshow(output_image[0,:,:,i], cmap="Greys")
plt.xticks(np.array([]))
plt.yticks(np.array([]))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([1, 0.1, 0.05 ,0.8])
fig.colorbar(im, cax = cbar_ax)
plt.tight_layout()
plt.show()
And this is what I get:
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1621, in function
return Function(inputs, outputs, updates=updates)
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1569, in __init__
raise TypeError('`inputs` to a TensorFlow backend function '
TypeError: `inputs` to a TensorFlow backend function should be a list or tuple.
You should do the following changes:
output_fn = K.function([input_layer], [output_layer])
output_image = output_fn([input_image])
K.function takes the input and output tensors as list so that you can create a function from many input to many output. In your case one input to one output.. but you need to pass them as a list none the less.
Next K.function returns a tensor function and not a model object where you can use predict(). The correct way of using is just to call as a function
I think you can also use K.function to get gradients.
self.action_gradients = K.gradients(Q_values, actions)
self.get_action_gradients=K.function[*self.model.input, K.learning_phase()], outputs=action_gradients)
which basically runs the graph to obtain the Q-value to calculate the gradient of the Q-value w.r.t. action vector in DDPG. Source code here (lines 64 to 70): https://github.com/nyck33/autonomous_quadcopter/blob/master/criticSolution.py#L65
In light of the accepted answer and this usage here (originally from project 5 autonomous quadcopter in the Udacity Deep Learning nanodegree), a question remains in my mind, ie. is K.function() something that can be used fairly flexibly to run the graph and to designate as outputs of K.function() for example outputs of a particular layer, gradients or even weights themselves?
Lines 64 to 67 here: https://github.com/nyck33/autonomous_quadcopter/blob/master/actorSolution.py
It is being used as a custom training function for the actor network in DDPG:
#caller
self.actor_local.train_fn([states, action_gradients, 1])
#called
self.train_fn = K.function(inputs=[self.model.input, action_gradients, K.learning_phase()], \
outputs=[], updates=updates_op)
outputs is given a value of an empty list because we merely want to train the actor network with the action_gradients from the critic network.