How can I feed last output y(t-1) as input for generating y(t) in tensorflow RNN? - tensorflow

I want to design a single layer RNN in Tensorflow such that last output (y(t-1)) is participated in updating the hidden state.
h(t) = tanh(W_{ih} * x(t) + W_{hh} * h(t) + **W_{oh}y(t - 1)**)
y(t) = W_{ho}*h(t)
How can I feed last input y(t - 1) as input for updating the hidden state?

Is y(t-1) the last input or output? In both cases it is not a straight fit with the TensorFlow RNN cell abstraction. If your RNN is simple you can just write the loop on your own, then you have full control. Another way that I would use is to pre-process your RNN input, e.g., do something like:
processed_input[t] = tf.concat(input[t], input[t-1])
Then call the RNN cell with processed_input and split there.

One possibility is to use tf.nn.raw_rnn which I found in this article. Check my answer to this related post.

I would call what you described an "autoregressive RNN". Here's an (incomplete) code snippet that shows how you can create one using tf.nn.raw_rnn:
import tensorflow as tf
LSTM_SIZE = 128
BATCH_SIZE = 64
HORIZON = 10
lstm_cell = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE, use_peepholes=True)
class RnnLoop:
def __init__(self, initial_state, cell):
self.initial_state = initial_state
self.cell = cell
def __call__(self, time, cell_output, cell_state, loop_state):
emit_output = cell_output # == None for time == 0
if cell_output is None: # time == 0
initial_input = tf.fill([BATCH_SIZE, LSTM_SIZE], 0.0)
next_input = initial_input
next_cell_state = self.initial_state
else:
next_input = cell_output
next_cell_state = cell_state
elements_finished = (time >= HORIZON)
next_loop_state = None
return elements_finished, next_input, next_cell_state, emit_output, next_loop_state
rnn_loop = RnnLoop(initial_state=initial_state_tensor, cell=lstm_cell)
rnn_outputs_tensor_array, _, _ = tf.nn.raw_rnn(lstm_cell, rnn_loop)
rnn_outputs_tensor = rnn_outputs_tensor_array.stack()
Here we initialize internal state of LSTM with some vector initial_state_tensor, and feed zero array as input at t=0. After that, the output of the current timestep is the input for the next timestep.

Related

How to perform custom operations in between keras layers?

I have one input and one output neural network and in between I need to perform small operation. I have two inputs (from the same distribution of either mean 0 or mean 1) which I need to fed to the neural network one at a time and compare the output of each input. After the comparison, I am finally generating the prediction of the model. The implementation is as follows:
from tensorflow import keras
import tensorflow as tf
import numpy as np
#define network
x1 = keras.Input(shape=(1), name="x1")
x2 = keras.Input(shape=(1), name="x2")
model = keras.layers.Dense(20)
model1 = keras.layers.Dense(1)
x11 = model1(model(x1))
x22 = model1(model(x2))
After this I need to perform following operations:
if x11>=x22:
Vm=x1
else:
Vm=x2
Finally I need to do:
out = Vm - 0.5
out= keras.activations.sigmoid(out)
model = keras.Model([x1,x2], out)
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=tf.keras.losses.binary_crossentropy,
metrics=['accuracy']
)
model.summary()
tf.keras.utils.plot_model(model) #visualize model
I have normally distributed pair of data with same mean (mean 0 and mean 1 as generated below:
#Generating training dataset
from scipy.stats import skewnorm
n=1000 #sample each
s = 1 # scale to change o/p range
X1_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X1_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X2_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X2_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X1_train = list(X1_0) + list(X1_1) #append both data
X2_train = list(X2_0) + list(X2_1) #append both data
y_train = [x for x in (0,1) for i in range(0, n)] #make Y for above conditions
#reshape to proper format
X1_train = np.array(X1_train).reshape(-1,1)
X2_train = np.array(X2_train).reshape(-1,1)
y_train = np.array(y_train)
#train model
model.fit([X1_train, X2_train], y_train, epochs=10)
I am not been able to run the program if I include operation
if x11>=x22:
Vm=x1
else:
Vm=x2
in between layers. If I directly work with maximum of outputs as:
Vm = keras.layers.Maximum()([x11,x22])
The program is working fine. But I need to select either x1 or x2 based on the value of x11 and x22.
The problem might be due to the inclusion of the comparison operation while defining structure of the model where there is no value for x11 and x22 (I guess). I am totally new to all these stuffs and so I could not resolve this. I would greatly appreciate any help/suggestions. Thank you.
You can add this functionality via a Lambda layer.
Vm = tf.keras.layers.Lambda(lambda x: tf.where(x[0]>=x[1], x[2], x[3]))([x11, x22, x1, x2])

Tensorflow, Keras: In a multi-class classification, accuracy is high, but precision, recall, and f1-score is zero for most classes

General Explanation:
My codes work fine, but the results are wired. I don't know the problem is with
the network structure,
or the way I feed the data to the network,
or anything else.
I am struggling with this error several weeks and so far I have changed the loss function, optimizer, data generator, etc., but I could not solve it. I appreciate any help.
If the following information is not enough, let me know, please.
Field of study:
I am using tensorflow, keras for multiclass classification. The dataset has 36 binary human attributes. I have used resnet50, then for each part of the body (head, upper body, lower body, shoes, accessories), I have added a separated branch to the network. The network has 1 input image with 36 labels and 36 output nodes (36 denes layers with sigmoid activation).
Problem:
The problem is that the accuracy that keras is reporting is high, but f1-score is very low or zero for most of the outputs (even when I use f1-score as a metric when compiling the network, the f1-socre for validation is very bad).
aAfter train, when I use the network in prediction mode, it returns always one/zero for some classes. It means that the network is not able to learn (even when I use weighted loss function or focal loss function.)
Why it is weird? Because, state-of-the-art methods report heigh f1 score even after the first epoch (e.g. https://github.com/chufengt/iccv19_attribute, that I have run it in my PC and got good results after one epoch).
Parts of the Codes:
print("setup model ...")
input_image = KL.Input(args.img_input_shape, name= "input_1")
C1, C2, C3, C4, C5 = resnet_graph(input_image, architecture="resnet50", stage5=False, train_bn=True)
output_layers = merged_model (input_features=C4)
model = Model(inputs=input_image, outputs=output_layers, name='SoftBiometrics_Model')
...
print("model compiling ...")
OPTIM = optimizers.Adadelta(lr=args.learning_rate, rho=0.95)
model.compile(optimizer=OPTIM, loss=binary_focal_loss(alpha=.25, gamma=2), metrics=['acc',get_f1])
plot_model(model, to_file='model.png')
...
img_datagen = ImageDataGenerator(rotation_range=6, width_shift_range=0.03, height_shift_range=0.03, brightness_range=[0.85,1.15], shear_range=0.06, zoom_range=0.09, horizontal_flip=True, preprocessing_function=preprocess_input_resnet, rescale=1/255.)
img_datagen_test = ImageDataGenerator(preprocessing_function=preprocess_input_resnet, rescale=1/255.)
def multiple_outputs(generator, dataframe, batch_size, x_col):
Gen = generator.flow_from_dataframe(dataframe=dataframe,
directory=None,
x_col = x_col,
y_col = args.Categories,
target_size = (args.img_input_shape[0],args.img_input_shape[1]),
class_mode = "multi_output",
classes=None,
batch_size = batch_size,
shuffle = True)
while True:
gnext = Gen.next()
# return image batch and 36 sets of lables
labels = gnext[1]
output_dict = {"{}_output".format(Category): np.array(labels[index]) for index, Category in enumerate(args.Categories)}
yield {'input_1':gnext[0]}, output_dict
trainGen = multiple_outputs (generator = img_datagen, dataframe=Train_df_img, batch_size=args.BATCH_SIZE, x_col="Train_Filenames")
testGen = multiple_outputs (generator = img_datagen_test, dataframe=Test_df_img, batch_size=args.BATCH_SIZE, x_col="Test_Filenames")
STEP_SIZE_TRAIN = len(Train_df_img["Train_Filenames"]) // args.BATCH_SIZE
STEP_SIZE_VALID = len(Test_df_img["Test_Filenames"]) // args.BATCH_SIZE
...
print("Fitting the model to the data ...")
history = model.fit_generator(generator=trainGen,
epochs=args.Number_of_epochs,
steps_per_epoch=STEP_SIZE_TRAIN,
validation_data=testGen,
validation_steps=STEP_SIZE_VALID,
callbacks= [chekpont],
verbose=1)
There is a possibility that you are passing binary f1-score to compile function. This should fix the problem -
pip install tensorflow-addons
...
import tensorflow_addons as tfa
f1 = tfa.metrics.F1Score(36,'micro' or 'macro')
model.compile(...,metrics=[f1])
You can read more about how f1-micro and f1-macro is calculated and which can be useful here.
Somehow, the predict_generator() of Keras' model does not work as expected. I would rather loop through all test images one-by-one and get the prediction for each image in each iteration. I am using Plaid-ML Keras as my backend and to get prediction I am using the following code.
import os
from PIL import Image
import keras
import numpy
print("Prediction result:")
dir = "/path/to/test/images"
files = os.listdir(dir)
correct = 0
total = 0
#dictionary to label all traffic signs class.
classes = {
0:'This is Cat',
1:'This is Dog',
}
for file_name in files:
total += 1
image = Image.open(dir + "/" + file_name).convert('RGB')
image = image.resize((100,100))
image = numpy.expand_dims(image, axis=0)
image = numpy.array(image)
image = image/255
pred = model.predict_classes([image])[0]
sign = classes[pred]
if ("cat" in file_name) and ("cat" in sign):
print(correct,". ", file_name, sign)
correct+=1
elif ("dog" in file_name) and ("dog" in sign):
print(correct,". ", file_name, sign)
correct+=1
print("accuracy: ", (correct/total))

How to use previous output and hidden states from LSTM for the attention mechanism?

I am currently trying to code the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score).
However, I am unsure on how to input the hidden and output states from the lstm decode. The issue is that the input of the lstm decoder at time t depends on quantities that I need to compute using the output and hidden states from t-1.
Here is the relevant part of the code:
with tf.variable_scope('data'):
prob = tf.placeholder_with_default(1.0, shape=())
X_or = tf.placeholder(shape = [batch_size, timesteps_1, num_input], dtype = tf.float32, name = "input")
X = tf.unstack(X_or, timesteps_1, 1)
y = tf.placeholder(shape = [window_size,1], dtype = tf.float32, name = "label_annotation")
logits = tf.zeros((1,1), tf.float32)
with tf.variable_scope('lstm_cell_encoder'):
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [hidden_size, hidden_size]]
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
lstm_outputs, lstm_state = tf.contrib.rnn.static_rnn(cell=multi_rnn_cell,inputs=X,dtype=tf.float32)
concat_lstm_outputs = tf.stack(tf.squeeze(lstm_outputs))
last_encoder_state = lstm_state[-1]
with tf.variable_scope('lstm_cell_decoder'):
initial_input = tf.unstack(tf.zeros(shape=(1,1,hidden_size2)))
rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple = True)
# Compute the hidden and output of h_1
for index in range(window_size):
output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input, initial_state=last_encoder_state, dtype=tf.float32)
# Compute the score for source output vector
scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1],(hidden_size,1)))
attention_coef = tf.nn.softmax(scores)
context_vector = tf.reduce_sum(tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))),0)
context_vector = tf.reshape(context_vector, (1,hidden_size))
# compute the tilda hidden state \tilde{h}_t=tanh(W[c_t, h_t]+b_t)
concat_context = tf.concat([context_vector, output_decoder[-1]], axis = 1)
W_tilde = tf.Variable(tf.random_normal(shape = [hidden_size*2, hidden_size2], stddev = 0.1), name = "weights_tilde", trainable = True)
b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable = True)
hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde)+b_tilde) # hidden_tilde is [1*64]
# update for next time step
initial_input = tf.unstack(tf.reshape(hidden_tilde, (1,1,hidden_size2)))
last_encoder_state = state_decoder
# predict the target
W_target = tf.Variable(tf.random_normal(shape = [hidden_size2, 1], stddev = 0.1), name = "weights_target", trainable = True)
logit = tf.matmul(hidden_tilde, W_target)
logits = tf.concat([logits, logit], axis = 0)
logits = logits[1:]
The part inside the loop is what I am unsure of. Does tensorflow remember the computational graph when I overwrite the variable "initial_input" and "last_encoder_state"?
I think your model will be much simplified if you use tf.contrib.seq2seq.AttentionWrapper with one of implementations: BahdanauAttention or LuongAttention.
This way it'll be possible to wire the attention vector on a cell level, so that cell output is already after attention applied. Example from the seq2seq tutorial:
cell = LSTMCell(512)
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_size=256)
Note that this way you won't need a loop of window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will instantiate the cells wrapped with attention.
Regarding your question: you should distinguish python variables and tensorflow graph nodes: you can assign last_encoder_state to a different tensor, the original graph node won't change because of this. This is flexible, but can be also misleading in the result network - you might think that you connect an LSTM to one tensor, but it's actually the other. In general, you shouldn't do that.

(De-)Convutional lstm autoencoder - error jumps

I'm trying to build a convolutional lstm autoencoder (that also predicts future and past) with Tensorflow, and it works to a certain degree, but the error sometimes jumps back up, so essentially, it never converges.
The model is as follows:
The encoder starts with a 64x64 frame from a 20 frame bouncing mnist video for each time step of the lstm. Every stacking layer of LSTM halfs it and increases the depth via 2x2 convolutions with a stride of 2. (so -->32x32x3 -->...--> 1x1x96)
On the other hand, the lstm performs 3x3 convolutions with a stride of 1 on its state. Both results are concatenated to form the new state. In the same way, the decoder uses transposed convolutions to go back to the original format. Then the squared error is calculated.
The error starts at around 2700 and it takes around 20 hours (geforce1060) to get down to ~1700. At which point the jumping back up (and it sometimes jumps back up to 2300 or even ridiculous values like 440300) happens often enough that I can't really get any lower. Also at that point, it can usually pinpoint where the number should be, but its too fuzzy to actually make out the digit...
I tried different learning rates and optimizers, so if anybody knows why that jumping happens, that'd make me happy :)
Here is a graph of the loss with epochs:
import tensorflow as tf
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
#based on code by loliverhennigh (Github)
class ConvCell(tf.contrib.rnn.RNNCell):
count = 0 #exists only to remove issues with variable scope
def __init__(self, shape, num_features, transpose = False):
self.shape = shape
self.num_features = num_features
self._state_is_tuple = True
self._transpose = transpose
ConvCell.count+=1
self.count = ConvCell.count
#property
def state_size(self):
return (tf.contrib.rnn.LSTMStateTuple(self.shape[0:4],self.shape[0:4]))
#property
def output_size(self):
return tf.TensorShape(self.shape[1:4])
#here comes to the actual conv lstm implementation, if transpose = true, it performs a deconvolution on the input
def __call__(self, inputs, state, scope=None):
with tf.variable_scope(scope or type(self).__name__+str(self.count)):
c, h = state
state_shape = h.shape
input_shape = inputs.shape
#filter variables and convolutions on data coming from the same cell, a time step previous
h_filters = tf.get_variable("h_filters",[3,3,state_shape[3],self.num_features])
h_filters_gates = tf.get_variable("h_filters_gates",[3,3,state_shape[3],3])
h_partial = tf.nn.conv2d(h,h_filters,[1,1,1,1],'SAME')
h_partial_gates = tf.nn.conv2d(h,h_filters_gates,[1,1,1,1],'SAME')
c_filters = tf.get_variable("c_filters",[3,3,state_shape[3],3])
c_partial = tf.nn.conv2d(c,c_filters,[1,1,1,1],'SAME')
#filters and convolutions/deconvolutions on data coming fromthe cell input
if self._transpose:
x_filters = tf.get_variable("x_filters",[2,2,self.num_features,input_shape[3]])
x_filters_gates = tf.get_variable("x_filters_gates",[2,2,3,input_shape[3]])
x_partial = tf.nn.conv2d_transpose(inputs,x_filters,[int(state_shape[0]),int(state_shape[1]),int(state_shape[2]),self.num_features],[1,2,2,1],'VALID')
x_partial_gates = tf.nn.conv2d_transpose(inputs,x_filters_gates,[int(state_shape[0]),int(state_shape[1]),int(state_shape[2]),3],[1,2,2,1],'VALID')
else:
x_filters = tf.get_variable("x_filters",[2,2,input_shape[3],self.num_features])
x_filters_gates = tf.get_variable("x_filters_gates",[2,2,input_shape[3],3])
x_partial = tf.nn.conv2d(inputs,x_filters,[1,2,2,1],'VALID')
x_partial_gates = tf.nn.conv2d(inputs,x_filters_gates,[1,2,2,1],'VALID')
#some more lstm gate business
gate_bias = tf.get_variable("gate_bias",[1,1,1,3])
h_bias = tf.get_variable("h_bias",[1,1,1,self.num_features*2])
gates = h_partial_gates + x_partial_gates + c_partial + gate_bias
i,f,o = tf.split(gates,3,axis=3)
#concatenate the units coming from the spacial and the temporal dimension to build a unified state
concat = tf.concat([h_partial,x_partial],3) + h_bias
new_c = tf.nn.relu(concat)*tf.sigmoid(i)+c*tf.sigmoid(f)
new_h = new_c * tf.sigmoid(o)
new_state = tf.contrib.rnn.LSTMStateTuple(new_c,new_h)
return new_h, new_state #its redundant, but thats how tensorflow likes it, apparently
#global variables
LEARNING_RATE = 0.005
ITERATIONS_PER_EPOCH = 80
BATCH_SIZE = 75
TEST = False #manual switch to go from training to testing
if TEST:
BATCH_SIZE = 1
inputs = tf.placeholder(tf.float32, (20, BATCH_SIZE, 64, 64,1))
shape0 = [BATCH_SIZE,64,64,2]
shape1 = [BATCH_SIZE,32,32,6]
shape2 = [BATCH_SIZE,16,16,12]
shape3 = [BATCH_SIZE,8,8,24]
shape4 = [BATCH_SIZE,4,4,48]
shape5 = [BATCH_SIZE,2,2,96]
shape6 = [BATCH_SIZE,1,1,192]
#apparently tf.multirnncell has very specific requirements for the initial states oO
initial_state1 = (tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape1),tf.zeros(shape1)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape2),tf.zeros(shape2)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape3),tf.zeros(shape3)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape4),tf.zeros(shape4)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape5),tf.zeros(shape5)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape6),tf.zeros(shape6)))
initial_state2 = (tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape5),tf.zeros(shape5)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape4),tf.zeros(shape4)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape3),tf.zeros(shape3)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape2),tf.zeros(shape2)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape1),tf.zeros(shape1)),tf.contrib.rnn.LSTMStateTuple(tf.zeros(shape0),tf.zeros(shape0)))
#encoding part of the autoencoder graph
cell1 = ConvCell(shape1,3)
cell2 = ConvCell(shape2,6)
cell3 = ConvCell(shape3,12)
cell4 = ConvCell(shape4,24)
cell5 = ConvCell(shape5,48)
cell6 = ConvCell(shape6,96)
mcell = tf.contrib.rnn.MultiRNNCell([cell1,cell2,cell3,cell4,cell5,cell6])
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(mcell, inputs[0:20,:,:,:],initial_state=initial_state1,dtype=tf.float32, time_major=True)
#decoding part of the autoencoder graph, forward block and backwards block
cell9a = ConvCell(shape5,48,transpose = True)
cell10a = ConvCell(shape4,24,transpose = True)
cell11a = ConvCell(shape3,12,transpose = True)
cell12a = ConvCell(shape2,6,transpose = True)
cell13a = ConvCell(shape1,3,transpose = True)
cell14a = ConvCell(shape0,1,transpose = True)
mcella = tf.contrib.rnn.MultiRNNCell([cell9a,cell10a,cell11a,cell12a,cell13a,cell14a])
cell9b = ConvCell(shape5,48,transpose = True)
cell10b = ConvCell(shape4,24,transpose = True)
cell11b= ConvCell(shape3,12,transpose = True)
cell12b = ConvCell(shape2,6,transpose = True)
cell13b = ConvCell(shape1,3,transpose = True)
cell14b = ConvCell(shape0,1,transpose = True)
mcellb = tf.contrib.rnn.MultiRNNCell([cell9b,cell10b,cell11b,cell12b,cell13b,cell14b])
def PredictionLayer(rnn_outputs,viewPoint = 11, reverse = False):
predLength = viewPoint-2 if reverse else 20-viewPoint #vision is the input for the decoder
vision = tf.concat([rnn_outputs[viewPoint-1:viewPoint,:,:,:],tf.zeros([predLength,BATCH_SIZE,1,1,192])],0)
if reverse:
rnn_outputs2, rnn_states = tf.nn.dynamic_rnn(mcellb, vision, initial_state = initial_state2, time_major=True)
else:
rnn_outputs2, rnn_states = tf.nn.dynamic_rnn(mcella, vision, initial_state = initial_state2, time_major=True)
mean = tf.reduce_mean(rnn_outputs2,4)
if TEST:
return mean
if reverse:
return tf.reduce_sum(tf.square(mean-inputs[viewPoint-2::-1,:,:,:,0]))
else:
return tf.reduce_sum(tf.square(mean-inputs[viewPoint-1:20,:,:,:,0]))
if TEST:
mean = tf.concat([PredictionLayer(rnn_outputs,11,True)[::-1,:,:,:],createPredictionLayer(rnn_outputs,11)],0)
else: #training part of the graph
error = tf.zeros([1])
for i in range(8,15): #range size of 7 or less works, 9 or more does not, no idea why
error += PredictionLayer(rnn_outputs, i)
error += PredictionLayer(rnn_outputs, i, True)
train_fn = tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE).minimize(error)
################################################################################
## TRAINING LOOP ##
################################################################################
#code based on siemanko/tf_lstm.py (Github)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
saver = tf.train.Saver(restore_sequentially=True, allow_empty=True,)
session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
session.run(tf.global_variables_initializer())
vids = np.load("mnist_test_seq.npy") #20/10000/64/64 , moving mnist dataset from http://www.cs.toronto.edu/~nitish/unsupervised_video/
vids = vids[:,0:6000,:,:] #training set
saver.restore(session,tf.train.latest_checkpoint('./conv_lstm_multiples_v2/'))
#saver.restore(session,'.\conv_lstm_multiples\iteration-74')
for epoch in range(1000):
if TEST:
break
epoch_error = 0
#randomize batches each epoch
vids = np.swapaxes(vids,0,1)
np.random.shuffle(vids)
vids = np.swapaxes(vids,0,1)
for i in range(ITERATIONS_PER_EPOCH):
#running the graph and feeding data
err,_ = session.run([error, train_fn], {inputs: np.expand_dims(vids[:,i*BATCH_SIZE:(i+1)*BATCH_SIZE,:,:],axis=4)})
print(err)
epoch_error += err
#training error each epoch and regular saving
epoch_error /= (ITERATIONS_PER_EPOCH*BATCH_SIZE*4096*20*7)
if (epoch+1) % 5 == 0:
saver.save(session,'.\conv_lstm_multiples_v2\iteration',global_step=epoch)
print("saved")
print("Epoch %d, train error: %f" % (epoch, epoch_error))
#testing
plt.ion()
f, axarr = plt.subplots(2)
vids = np.load("mnist_test_seq.npy")
for i in range(6000,10000):
img = session.run([mean], {inputs: np.expand_dims(vids[:,i:i+1,:,:],axis=4)})
for j in range(20):
axarr[0].imshow(img[0][j,0,:,:])
axarr[1].imshow(vids[j,i,:,:])
plt.show()
plt.pause(0.1)
Usually this happens when gradients' magnitude is very high at some point and causes your network parameters to change a lot. To verify that it is indeed the case, you can produce the same plot of gradient magnitudes and see if they jump right before the loss jump. Assuming this is the case, the classic approach is to use gradient clipping (or go all the way to natural gradient).

How to use maxout activation function in tensorflow?

I want to use maxout activation function in tensorflow, but I don't know which function should use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:
I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.