Easy way to clamp Neural Network outputs between 0 and 1? - tensorflow

So I'm working on writing a GAN neural network and I want to set my network's output to 0 if it is less than 0 and 1 if it is greater than 1, and leave it unchanged otherwise. I'm pretty new to tensorflow, and I don't know of any tensorflow function or activation to do this without unwanted side effects. So I made my loss function so it calculates the loss as if the output was clamped, with this code:
def discriminator_loss(real_output, fake_output):
    real_output_clipped = min(max(real_output.numpy()[0], 0), 1)
    fake_output_clipped = min(max(fake_output.numpy()[0], 0), 1)
    real_clipped_tensor = tf.Variable([[real_output_clipped]], dtype="float32")
    fake_clipped_tensor = tf.Variable([[fake_output_clipped]], dtype="float32")
    real_loss = cross_entropy(tf.ones_like(real_output), real_clipped_tensor)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_clipped_tensor)
    total_loss = real_loss + fake_loss
    return total_loss
but I get this error:
ValueError: No gradients provided for any variable: ['dense_50/kernel:0', 'dense_50/bias:0', 'dense_51/kernel:0', 'dense_51/bias:0', 'dense_52/kernel:0', 'dense_52/bias:0', 'dense_53/kernel:0', 'dense_53/bias:0'].
Does anyone know a better way to do this, or a way to fix this error?
Thanks!

You can apply a ReLU layer from Keras as your final layer and set max_value=1.0. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(32, input_shape=(16,)))
model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.ReLU(max_value=1.0))
You can read more about it here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU
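As a quick check of what this layer does, here is a minimal sketch that applies the same ReLU(max_value=1.0) layer to a few hand-picked values. The input values are made up for illustration, and TensorFlow 2.x with eager execution is assumed.

```python
import numpy as np
import tensorflow as tf

# Capped ReLU: 0 below 0, identity between 0 and 1, 1 above 1.
clamp = tf.keras.layers.ReLU(max_value=1.0)

# Illustrative inputs, not taken from the question.
values = tf.constant([[-0.5, 0.3, 1.7]])
clamped = np.asarray(clamp(values))
print(clamped)
```

Values below 0 and above 1 are clamped, and values in between pass through unchanged.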

TF cannot work out how to update your network weights from this loss. The inputs to the cross entropy are new tensors (variables) built directly from numpy values, so they are not connected to your actual network outputs.
If you want to perform operations on tensors that will remain within the graph and (hopefully) be differentiable, use the available TF operations. There's a "clip_by_value" operation described here: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/clip_by_value.
E.g. real_output_clipped = tf.clip_by_value(real_output, clip_value_min=0, clip_value_max=1)
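To illustrate the difference, here is a minimal sketch (assuming TensorFlow 2.x with eager execution) that clips a stand-in tensor with tf.clip_by_value under a GradientTape and checks that a gradient comes back. The variable x and its values are hypothetical and only stand in for a network output.

```python
import tensorflow as tf

# Hypothetical stand-in for a network output (not from the question).
x = tf.Variable([[-0.5], [0.3], [1.7]])

with tf.GradientTape() as tape:
    # Clipping with a TF op keeps the result connected to x.
    clipped = tf.clip_by_value(x, clip_value_min=0.0, clip_value_max=1.0)
    loss = tf.reduce_sum(clipped)

# A gradient exists, unlike with the numpy round trip in the question.
grad = tape.gradient(loss, x)
print(clipped.numpy().ravel())
print(grad.numpy().ravel())
```

Note that the gradient is zero for entries that were clipped, so an output stuck outside [0, 1] receives no learning signal through this op. If that becomes a problem, a sigmoid on the final layer is a common alternative.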

Related

Neural network only converges when data cloud is close to 0

I am new to tensorflow and am learning the basics at the moment so please bear with me.
My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2),...,(x_100, y_100)}, where x_i and y_i are real numbers.
I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import math

def neural_network_constructor(arch_list = [1,3,3,1],
                               act_func = tf.nn.sigmoid,
                               w_initializer = tf.contrib.layers.xavier_initializer(),
                               b_initializer = tf.zeros_initializer(),
                               loss_function = tf.losses.mean_squared_error,
                               training_method = tf.train.GradientDescentOptimizer(0.5)):
    n_input = arch_list[0]
    n_output = arch_list[-1]
    X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])
    layer = tf.contrib.layers.fully_connected(
        inputs = X,
        num_outputs = arch_list[1],
        activation_fn = act_func,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    for N in arch_list[2:-1]:
        layer = tf.contrib.layers.fully_connected(
            inputs = layer,
            num_outputs = N,
            activation_fn = act_func,
            weights_initializer = w_initializer,
            biases_initializer = b_initializer)
    Phi = tf.contrib.layers.fully_connected(
        inputs = layer,
        num_outputs = n_output,
        activation_fn = tf.identity,
        weights_initializer = w_initializer,
        biases_initializer = b_initializer)
    Y = tf.placeholder(tf.float32, [None, n_output])
    loss = loss_function(Y, Phi)
    train_step = training_method.minimize(loss)
    return [X, Phi, Y, train_step]
With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron. The activation function is per default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.
So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sinewave, strange things happen:
Before training, the network seems to be a flat line. After 100,000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.
This gets even stranger when I take the exact same data set, but shift all x-values by adding 500:
Here, the network completely refuses to learn. I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the easier the network can learn. After a certain distance to the origin, learning stops completely. Changing the activation function from sigmoid to ReLU has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.
Is there something wrong with my implementation of the neural-network-constructor? Or does this have something to do with initialization values? I have tried to get a deeper understanding of this problem for quite a while now and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!
Thanks,
Joker

stop_gradient in tensorflow

I am wondering if tf.stop_gradient stops the gradient computation of just a given op, or stops the update of its input tf.Variable? I have the following problem: during the forward pass computation in MNIST, I would like to perform a set of operations on the weights (let's say W to W*) and then do a matmul with inputs. However, I would like to exclude these operations from the backward pass. I want only dE/dW computed during training with back propagation. The code I wrote prevents W from getting updated. Could you please help me understand why? If these were variables, I understand I should set their trainable property to false, but these are operations on weights. If stop_gradient cannot be used for this purpose, then how do I build two graphs, one for the forward pass and the other for back propagation?
def build_layer(inputs, fmap, nscope,layer_size1,layer_size2, faulty_training):
    with tf.name_scope(nscope):
        if (faulty_training):
            ## trainable weight
            weights_i = tf.Variable(tf.truncated_normal([layer_size1, layer_size2],stddev=1.0 / math.sqrt(float(layer_size1))),name='weights_i')
            ## Operations on weight whose gradient should not be computed during backpropagation
            weights_fx_t = tf.multiply(268435456.0,weights_i)
            weight_fx_t = tf.stop_gradient(weights_fx_t)
            weights_fx = tf.cast(weights_fx_t,tf.int32)
            weight_fx = tf.stop_gradient(weights_fx)
            weights_fx_fault = tf.bitwise.bitwise_xor(weights_fx,fmap)
            weight_fx_fault = tf.stop_gradient(weights_fx_fault)
            weights_fl = tf.cast(weights_fx_fault, tf.float32)
            weight_fl = tf.stop_gradient(weights_fl)
            weights = tf.stop_gradient(tf.multiply((1.0/268435456.0),weights_fl))
            ##### end transformation
        else:
            weights = tf.Variable(tf.truncated_normal([layer_size1, layer_size2],stddev=1.0 / math.sqrt(float(layer_size1))),name='weights')
        biases = tf.Variable(tf.zeros([layer_size2]), name='biases')
        hidden = tf.nn.relu(tf.matmul(inputs, weights) + biases)
    return weights,hidden
I am using the tensorflow gradient descent optimizer to do the training.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
Stop gradient will prevent the backpropagation from continuing past that node in the graph. Your code doesn't have any path from weights_i to the loss except the one that goes through weights_fx_t, where the gradient is stopped. This is what is causing weights_i not to be updated during training. You don't need to put stop_gradient after every step. Using it just once will stop the backpropagation there.
If stop_gradient doesn't do what you want, then you can get the gradients with tf.gradients and write your own update op using tf.assign. This will allow you to alter the gradients however you want.
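A different technique from the tf.gradients plus tf.assign route suggested above, often called a straight-through estimator, keeps a gradient path to the original weights while still using the transformed weights in the forward pass. The following is only a sketch under assumptions: it uses TensorFlow 2.x eager execution, and transform is a hypothetical stand-in (simple rounding) for the fixed-point and bit-flip operations in the question.

```python
import tensorflow as tf

# Trainable weights and a fixed input (illustrative values only).
w = tf.Variable([[0.5, -1.25], [2.0, 0.75]])
x = tf.constant([[1.0, 2.0]])

def transform(weights):
    # Hypothetical stand-in for the non-differentiable W -> W* step.
    return tf.round(weights * 2.0) / 2.0

with tf.GradientTape() as tape:
    # Forward value equals transform(w); backward treats it as w itself,
    # because only the difference is wrapped in stop_gradient.
    w_star = w + tf.stop_gradient(transform(w) - w)
    loss = tf.reduce_sum(tf.matmul(x, w_star))

grad = tape.gradient(loss, w)

with tf.GradientTape() as tape_blocked:
    # Wrapping the whole expression blocks every path back to w.
    blocked = tf.stop_gradient(transform(w))
    loss_blocked = tf.reduce_sum(tf.matmul(x, blocked))

grad_blocked = tape_blocked.gradient(loss_blocked, w)
print(grad.numpy())
print(grad_blocked)
```

The first gradient is what dE/dW would be if the matmul had used w directly, while the second is None, which matches the "weights never update" symptom described in the question.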

How can I get a tensor output by a tensorflow.layer

I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved, everything is good so far. Next, I want to load this saved model, make a change to the learning rate, and continue to train (I know tensorflow provides the exponential_decay() function to allow a decaying learning rate; here I just want to be in full control of the learning rate and change it manually). To do this, my idea is like:
saver = tf.train.import_meta_graph(...)
saver.restore(sess, tf.train.latest_checkpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...) # this is place_holder in model
labels_ = graph.get_tensor_by_name(...) # place_holder in model
logits = graph.get_tensor_by_name(...) # output of dense layer
loss = graph.get_tensor_by_name(...) # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss) # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I have a problem. The code above can successfully obtain inputImg_ and labels_, because I named them when I defined them. But I cannot obtain logits, because with logits = tf.layers.dense(name='logits') the name is actually given to the dense layer instead of the output tensor logits. That means I cannot obtain the tensors conv1 and conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for the answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weights parameters, when you name it, you are naming the layer, not the output tensor.
tf.identity returns a tensor with the same shape and contents as its input.
So just leave the dense layer unnamed and pass its output to tf.identity:
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output
output = graph.get_tensor_by_name('output:0')
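The same idea as a self-contained sketch, assuming TensorFlow 2.x with the tf.compat.v1 graph API. The placeholder name "x", the layer sizes, and the hand-written matmul standing in for tf.layers.dense are all illustrative assumptions, not part of the original model.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Named placeholder, as in the question.
    x = tf.compat.v1.placeholder(tf.float32, [None, 4], name="x")

    # Stand-in for a dense layer (weights and bias are illustrative).
    w = tf.Variable(tf.zeros([4, 2]))
    b = tf.Variable(tf.zeros([2]))
    dense_out = tf.matmul(x, w) + b

    # Give the output tensor itself a stable, known name.
    out = tf.identity(dense_out, name="output")

# Later (for example after restoring a checkpoint) fetch it by name.
fetched = graph.get_tensor_by_name("output:0")
print(fetched.name, fetched.shape)
```

The ":0" suffix selects the first output of the op named "output".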

Does K.function method of Keras with Tensorflow backend work with network layers?

I have recently started using Keras to build neural networks. I built a simple CNN to classify the MNIST dataset. Before training the model I used K.set_image_dim_ordering('th') in order to plot the weights of a convolutional layer. Right now I am trying to visualize a convolutional layer's output with the K.function method, but I keep getting an error.
Here is what I want to do for now:
input_image = X_train[2:3,:,:,:]
output_layer = model.layers[1].output
input_layer = model.layers[0].input
output_fn = K.function(input_layer, output_layer)
output_image = output_fn.predict(input_image)
print(output_image.shape)
output_image = np.rollaxis(np.rollaxis(output_image, 3, 1), 3, 1)
print(output_image.shape)
fig = plt.figure()
for i in range(32):
    ax = fig.add_subplot(4,8,i+1)
    im = ax.imshow(output_image[0,:,:,i], cmap="Greys")
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([1, 0.1, 0.05 ,0.8])
fig.colorbar(im, cax = cbar_ax)
plt.tight_layout()
plt.show()
And this is what I get:
  File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1621, in function
    return Function(inputs, outputs, updates=updates)
  File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1569, in __init__
    raise TypeError('`inputs` to a TensorFlow backend function '
TypeError: `inputs` to a TensorFlow backend function should be a list or tuple.
You should make the following changes:
output_fn = K.function([input_layer], [output_layer])
output_image = output_fn([input_image])
K.function takes the input and output tensors as lists, so that you can create a function from many inputs to many outputs. In your case there is one input and one output, but you need to pass them as lists nonetheless.
Also, K.function returns a callable backend function, not a model object on which you can call predict(). The correct way to use it is simply to call it as a function.
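If K.function is not available in your Keras version, a different route to the same intermediate output is to build a second Model that ends at the layer of interest. This is a swapped-in alternative, not the K.function approach above, and the small network, the layer name "conv1", and the input size below are assumptions made up for the sketch (TensorFlow 2.x assumed).

```python
import numpy as np
import tensorflow as tf

# Small illustrative CNN (not the model from the question).
inputs = tf.keras.Input(shape=(8, 8, 1))
conv = tf.keras.layers.Conv2D(4, 3, padding="same", name="conv1")(inputs)
pooled = tf.keras.layers.GlobalAveragePooling2D()(conv)
outputs = tf.keras.layers.Dense(10)(pooled)
model = tf.keras.Model(inputs, outputs)

# Second model sharing the same layers, ending at the conv output.
feature_model = tf.keras.Model(
    inputs=model.input, outputs=model.get_layer("conv1").output
)

# One dummy image; call the model like a function to get the feature maps.
image = np.zeros((1, 8, 8, 1), dtype="float32")
features = feature_model(image)
print(features.shape)
```

Both models share weights, so training the full model also changes what feature_model returns.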
I think you can also use K.function to get gradients.
self.action_gradients = K.gradients(Q_values, actions)
self.get_action_gradients = K.function(inputs=[*self.model.input, K.learning_phase()], outputs=self.action_gradients)
which basically runs the graph to obtain the Q-value to calculate the gradient of the Q-value w.r.t. action vector in DDPG. Source code here (lines 64 to 70): https://github.com/nyck33/autonomous_quadcopter/blob/master/criticSolution.py#L65
In light of the accepted answer and this usage here (originally from project 5, the autonomous quadcopter, in the Udacity Deep Learning nanodegree), a question remains in my mind: can K.function() be used fairly flexibly to run the graph, designating as its outputs, for example, the outputs of a particular layer, gradients, or even the weights themselves?
Lines 64 to 67 here: https://github.com/nyck33/autonomous_quadcopter/blob/master/actorSolution.py
It is being used as a custom training function for the actor network in DDPG:
#caller
self.actor_local.train_fn([states, action_gradients, 1])
#called
self.train_fn = K.function(inputs=[self.model.input, action_gradients, K.learning_phase()], \
outputs=[], updates=updates_op)
outputs is given a value of an empty list because we merely want to train the actor network with the action_gradients from the critic network.

Why isn't this Conv2d_Transpose / deconv2d returning the original input in tensorflow?

weights = tf.placeholder("float",[5,5,1,1])
imagein = tf.placeholder("float",[1,32,32,1])
conv = tf.nn.conv2d(imagein,weights,strides=[1,1,1,1],padding="SAME")
deconv = tf.nn.conv2d_transpose(conv, weights, [1,32,32,1], [1,1,1,1],padding="SAME")
dw = np.random.rand(5,5,1,1)
noise = np.random.rand(1,32,32,1)
sess = tf.InteractiveSession()
convolved = conv.eval(feed_dict={imagein: noise, weights: dw})
deconvolved = deconv.eval(feed_dict={imagein: noise, weights: dw})
I've been trying to figure out conv2d_transpose in order to reverse a convolution in Tensorflow. My understanding is that "deconvolved" should contain the same data as "noise" after applying a normal convolution and then its transpose, but "deconvolved" just contains some completely different image. Is there something wrong with my code, or is the theory incorrect?
There's a reason it's called conv2d_transpose rather than deconv2d: it isn't deconvolution. Convolution isn't an orthogonal transformation, so its inverse (deconvolution) isn't the same as its transpose (conv2d_transpose).
Your confusion is understandable: calling the transpose of convolution "deconvolution" has been standard neural network practice for years. I am happy that we were able to fix the name to be mathematically correct in TensorFlow; more details here:
https://github.com/tensorflow/tensorflow/issues/256
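The point about transpose versus inverse can be checked numerically. The sketch below is a 1-D NumPy analogue rather than the 2-D TensorFlow ops from the question: it writes a "same"-padded convolution as a matrix A, then compares applying the transpose of A with actually solving for the input. The kernel values and the helper conv_matrix are assumptions chosen for illustration (the kernel is picked so that A is invertible).

```python
import numpy as np

def conv_matrix(kernel, n):
    """Matrix A such that A @ x is a zero-padded 'same' 1-D correlation."""
    pad = len(kernel) // 2
    A = np.zeros((n, n))
    for i in range(n):
        for j, weight in enumerate(kernel):
            col = i + j - pad
            if 0 <= col < n:
                A[i, col] = weight
    return A

# Illustrative kernel with a dominant centre tap, so A is invertible.
kernel = np.array([0.1, 0.2, 1.0, 0.3, 0.05])
rng = np.random.default_rng(0)
x = rng.random(32)

A = conv_matrix(kernel, 32)
y = A @ x                         # "convolution"
transposed = A.T @ y              # what a transposed convolution computes
inverted = np.linalg.solve(A, y)  # a true inverse (deconvolution)

print(np.allclose(transposed, x))
print(np.allclose(inverted, x))
print(np.allclose(A.T @ A, np.eye(32)))
```

The transpose only recovers x when A.T @ A is the identity, that is, when the convolution matrix is orthogonal, which is not the case for this kernel.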