tf.gradients(model.output, model.input) computes a different value each time I run it - tensorflow

I'm trying to compute the gradient of the output layer with respect to the input layer. My neural network is relatively small (input layer composed of 9 activation units and the output layer of 1) and the training went fine as the test provided a very good accuracy. I made the NN model using Keras.
In order to solve my problem, I need to compute the gradient of the output with respect to the input. This is, I need to obtain the Jacobian which as dimension [1x9]. The gradients function in tensorflow should provide me with everything I need, but when I run the code below I obtain a different solution every time.
output_v = model.output
input_v = model.input
gradients = tf.gradients(output_v, input_v)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(sess.run(model.input,feed_dict={model.input:x_test_N[0:1,:]}))
evaluated_gradients = sess.run(gradients,feed_dict{model.input:x_test_N[0:1,:]})
print(evaluated_gradients)
sess.close()
The first print command shows this value every time I run it (just to make sure that the input values are not modified):
[[-1.4306372 -0.1272892 0.7145787 1.338818 -1.2957293 -0.5402862-0.7771702 -0.5787912 -0.9157122]]
But the second print shows different ones:
[[ 0.00175761, -0.0490326 , -0.05413761, 0.09952173, 0.06112418, -0.04772799, 0.06557006, -0.02473242, 0.05542536]]
[[-0.00416433, 0.08235116, -0.00930298, 0.04440641, 0.03752216, 0.06378302, 0.03508484, -0.01903783, -0.0538374 ]]
Using finite differences, evaluated_gradients[0,0] = 0.03565103, which isn't close to any of the first values previously printed.
Thanks for your time!
Alberto
Solved by creating a specific session just before training my model:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
K.set_session(sess)
history = model.fit(x_train_N, y_train_N, epochs=n_epochs,
validation_split=split, verbose=1, batch_size=n_batch_size,
shuffle='true', callbacks=[early_stop, tensorboard])
And evaluating the gradient after training, while tf.session is still open:
evaluated_gradients = sess.run(K.gradients(model.output, model.input), feed_dict={model.input: x_test_N})

Presumably your network is set up to initialize weights to random values. When you run sess.run(tf.initialize_all_variables()), you are initializing your variables to new random values. Therefore you get different values for output_v in every run, and hence different gradients. If you want to use a model you trained before, you should replace the initialization with initialize_all_variables() with a restore command. I am not familiar with how this is done in Keras since I usually work directly with tensorflow, but I would try this.
Also note that initialize_all_variables is deprecated and you should use global_variables_initializer instead.

Related

Training RNN with error evaluation at every time step

I have a simpleRNN / LSTM that I'm trying to train on a sequential classification task using tensorflow. There is a sequence of data (300 time steps) that predicts a label at t=300. For my task I would like for the RNN to evaluate the error at every timestep (not just at the final time point) and propagate it backwards (as figure below).
After some responses below it seems I need to do a few things: use return_sequences flag; use the TimeDistributed layer to access the output from the LSTM/RNN; and also defined a custom loss function.
model = Sequential()
layer1 = LSTM(n_neurons, input_shape=(length, 1), return_sequences=True)
model.add(layer1)
layer2 = TimeDistributed(Dense(1))
model.add(layer2)
# Define custom loss
def custom_loss(layer1):
# Create a loss function
def loss(y_true,y_pred):
# access layer1 at every time point and compute mean error
# UNCLEAR HOW TO RUN AT EVERY TIME STEP
err = K.mean(layer1(X) - y_true, axis=-1)
return err
# Return a function
return loss
# Compile the model
model.compile(optimizer='adam', loss=custom_loss(layer), metrics=['accuracy'])
For now I'm a bit confused of the custom_loss function as it's not clear that how I can pass in layer1 and compute the error inside the inner most loss function.
Anyone has a suggestion or can point me to a more detailed answer?
The question is not easy to answer since it is not clear what you're trying to achieve (it shouldn't be the same using a FFNN or a RNN, and what works best depends definitely on the application).
Anyway, you might be confusing the training steps (say, the forward- and back- propagation over a minibatch of sequences) with the "internal" steps of the RNN. A single sequence (or a single minibatch) will always "unroll" entirely through time during the forward pass before any output is made available: only after (thus, at the end of the training step), you can use the predictions and compute the losses to backpropagate.
What you can do is return sequences of outputs (one y_predicted for every internal time step) including the argument return_sequences=True inside SimpleRNN(...). This will give you a sequence of 300 predictions, each of which depends only on the past inputs with respect to the considered internal time step. You can then use the outputs that you need to compute the loss, possibly in a custom loss function.
I hope I've been clear enough. Otherwise, let me know if I can help further.

Many to Many LSTM in TensorFlow : Training error not decreasing

I am trying to use train an LSTM to behave like a controller. Essential this is a many to many problem. I have 7 input features and with each feature being a sequence of 40 values. My output has two features, also being a sequence of 40 values.
I have 2 layers. First layer has four LSTM cells, and second has two LSTM cells. The code is given below.
The code runs and produces output as expected but I am unable to reduced the training error (Mean square error). The error just stops improving after the first 1000 epochs.
I tried using different batch sizes. But I am getting high error even if it the batch size is one. I tried the same network with a simple sine function, and it is working properly i.e. the error is decreasing. Is this because my sequence length is too large, due to which the vanishing gradient problem is occurring. What can I do to improve training error?
#Specify input and ouput features
Xfeatures = 7 #Number of input features
Yfeatures = 2 #Number of input features
num_steps = 40
# reset everything to rerun in jupyter
tf.reset_default_graph()
# Placeholder for the inputs in a given iteration.
u = tf.placeholder(tf.float32, [train_batch_size,num_steps,Xfeatures])
u_NN = tf.placeholder(tf.float32, [train_batch_size,num_steps,Yfeatures])
with tf.name_scope('Normalization'):
#L2 normalization for input data
Xnorm = tf.nn.l2_normalize(u_opt, 0, epsilon=1e-12, name='Normalize')
lstm1= tf.contrib.rnn.BasicLSTMCell(lstm1_size)
lstm2 = tf.contrib.rnn.BasicLSTMCell(lstm2_size)
stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm1, lstm2])
print(lstm1.output_size)
print(stacked_lstm.output_size)
LSTM_outputs, states = tf.nn.dynamic_rnn(stacked_lstm, Xnorm, dtype=tf.float32)
#Loss
mean_square_error = tf.losses.mean_squared_error(u_NN,LSTM_outputs)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)
#Initialization and training session
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
#print(sess.run([LSTM_outputs],feed_dict={u_opt:InputX1}))
print(sess.run([mean_square_error],feed_dict={u_opt:InputX1,u_NN:InputY1}))
for i in range(training_epochs):
sess.run([train_step],feed_dict={u_opt:InputX1,u_NN:InputY1})
if i%display_epoch ==0:
print("Training loss is:",sess.run([mean_square_error],feed_dict={u_opt:InputX1,u_NN:InputY1}),"at itertion:",i)
print(sess.run([mean_square_error],feed_dict={u_opt:InputX1,u_NN:InputY1}))
print(sess.run([LSTM_outputs],feed_dict={u_opt:InputX1}))
What do you mean with: "First layer has four LSTM cells, and second has two LSTM cells. The code is given below"? Probably you intend the states of the cells.
Your code is not complete but I can try give you some advices.
If your training error is not going down, a possibility is that your net is not well dimensioned. Probably your lstm1_size and lstm2_size are not enough large to capture the characteristics of your data.
LSTMs help you in accumulating the past of a given sequences in a state vector. Usually, the state vector is not used itself as the predictor but it is projected to the output space using a standard feedforward layer. Probably you can just keep a single layer of recursion (a single LSTM layer) and than project the outputs of the layer using a feedforward layer (i.e. g(W*LSTM_outputs+b), where g is a non-linear activation).

Tensorflow: calculate gradient for tf.multiply

I'm building a neural network that has the following two layers
pseudo_inputs = tf.Variable(a_numpy_ndarray)
weights = tf.Variable(tf.truncated_normal(...))
I then want to multiply them using tf.multiply (which, unlike tf.matmul multiplies corresponding indices, i.e. c_ij = a_ij * b_ij)
input = tf.multiply(pseudo_inputs, weights)
My goal is to learn weights. So I run
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
But it doesn't work. The network doesn't change at all.
Looking at tensorboard, I could see that 'input' has no gradient, so I'm assuming that's the problem. Any ideas how to solve this?
From reading tensorflow docs it seems like I might have to write a gradient op for tf.multiply, but I find it hard to believe no one needed to do this before.
I thought the pseudo_inputs should be set as Placeholders in the first line.
And in this line:
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
Since weights are to be trained in the graph by minimizing loss then it should not passed as a parameter here.
train = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Then you should first run the train using the samples(you don't have labels) you have.
for x_train, y_train in samples:
sess.run(train, {pseudo_inputs:x_train, y:y_train})
And after that you can get weights by:
W_c, loss_c = sess.run([W, loss], {pseudo_inputs=x_train, y:y_train})

Tensorflow RNN sequence training

I'm making my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two layers LSTM + dense layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
so the network should 1 to 1 map float(audio data sequence) to float(pre-chosen frequency volume)
I've got this to work on Keras and seen a similar TFLearn solution but would like to implement this on bare Tensorflow in a relatively efficient way.
what i've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE,state_is_tuple=True,forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2,state_is_tuple=True)
outputs, states = rnn.dynamic_rnn(stacked_lstm, in, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network= tf.matmul(last, W) + b
# cost function, optimizer etc...
during training I fed this with (BATCH_SIZE, SEQUENCE_LEN,1) batches and it seems like the loss converged correctly but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
how do i make this network return a sequence right from Tensorflow without going back to python for each sample(feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
after some research:
how do i make this network return a sequence right from Tensorflow
without going back to python for each sample(feed a sequence and
predict a sequence of the same size)?
you can use state_saving_rnn
class Saver():
def __init__(self):
self.d = {}
def state(self, name):
if not name in self.d:
return tf.zeros([1,LSTM_SIZE],tf.float32)
return self.d[name]
def save_state(self, name, val):
self.d[name] = val
return tf.identity('save_state_name') #<-important for control_dependencies
outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),sequence_length=[EVAL_SEQ_LEN])
#4 states are for two layers of lstm each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
one problem is that state_saving_rnn is using rnn() and not dynamic_rnn() therefore unroll at compile time EVAL_SEQ_LEN steps you might want to re-implement state_saving_rnn with dynamic_rnn if you want to input long sequences
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
you can use dynamic_rnn and supply initial_state. this is probably just as efficient as state_saving_rnn. look at state_saving_rnn implementations for reference
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
dynamic_rnn does do unrolling at runtime similarly to compile time rnn(). I guess it returns all the steps for you to branch the graph in some other places - after less time steps. in a network that use [one time step input * current state -> one output, new state] like the one described above it's not needed in testing but could be used for training truncated time back propagation

Tensorflow: Saving and restoring the model parameters

I am a beginner in TensorFlow, currently training a CNN.
I am using Saver in order to save the parameters used by the model, but I am having concerns whether this would itself store all the Variables used by the model, and is sufficient to restore the values to re-run the program for performing classification/testing on the trained network.
Let us look at the famous example MNIST given by TensorFlow.
In the example, we have bunch of Convolutional blocks, all of which have weight, and bias variables that gets initialised when the program is run.
W_conv1 = init_weight([5,5,1,32])
b_conv1 = init_bias([32])
After having processed several layers, we create a session, and initialise all the variables added to the graph.
sess = tf.Session()
sess.run(tf.initialize_all_variables())
saver = tf.train.Saver()
Here, is it possible to comment the saver.save code, and replace it by saver.restore(sess,file_path) after the training, in order to restore the weight, bias, etc., parameters back to the graph? Is this how it should be ?
for i in range(1000):
...
if i%500 == 0:
saver.save(sess,"model%d.cpkt"%(i))
I am currently training on large dataset, so terminating, and restarting the training is a waste of time, and resources so I request someone to please clarify before the I start the training.
If you want to save the final result only once, you can do this:
with tf.Session() as sess:
for i in range(1000):
...
path = saver.save(sess, "model.ckpt") # out of the loop
print "Saved:", path
In other programs, you can load the model using the path returned from saver.save for prediction or something. You can see some examples at https://github.com/sugyan/tensorflow-mnist.
Based on the explanation in here and Sung Kim solution I wrote a very simple model exactly for this problem. Basically in this way you need to create an object from the same class and restore its variables from the saver. You can find an example of this solution here.