TensorFlow: How to embed float sequences to fixed size vectors?

I am looking for methods to embed variable-length sequences of float values into fixed-size vectors. The input has the following format:
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
...
[f1,f2,f3,f4]-> ... -> ->[f1,f2,f3,f4]
Each line is a variable-length sequence with a maximum length of 60. Each unit in a sequence is a tuple of 4 float values. I have already padded all sequences with zeros to the same length.
The following architecture seems to solve my problem if I use the input itself as the target output; I need the thought vector in the center as the embedding for the sequences.
In TensorFlow, I have found two candidate methods: tf.contrib.legacy_seq2seq.basic_rnn_seq2seq and tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq.
However, these two methods seem to be meant for NLP problems, where the input must be discrete values for words.
So, is there another function that solves my problem?

All you need is an RNN, not a seq2seq model, since seq2seq comes with an additional decoder that is unnecessary in your case.
Example code:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
input_size = 4
max_length = 60
hidden_size=64
output_size = 4
x = tf.placeholder(tf.float32, shape=[None, max_length, input_size], name='x')
seqlen = tf.placeholder(tf.int64, shape=[None], name='seqlen')
lstm_cell = rnn.BasicLSTMCell(hidden_size, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, sequence_length=seqlen, dtype=tf.float32)
encoded_states = states[-1]
W = tf.get_variable(
    name='W',
    shape=[hidden_size, output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())
b = tf.get_variable(
    name='b',
    shape=[output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())
z = tf.matmul(encoded_states, W) + b
results = tf.sigmoid(z)
###########################
## cost computing and training components goes here
# e.g.
# targets = tf.placeholder(tf.float32, shape=[None, input_size], name='targets')
# cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=z))
# optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(cost)
###############################
init = tf.global_variables_initializer()
batch_size = 4
data_in = np.zeros((batch_size, max_length, input_size), dtype='float32')
data_in[0, :4, :] = np.random.rand(4, input_size)
data_in[1, :6, :] = np.random.rand(6, input_size)
data_in[2, :20, :] = np.random.rand(20, input_size)
data_in[3, :, :] = np.random.rand(60, input_size)
data_len = np.asarray([4, 6, 20, 60], dtype='int64')
with tf.Session() as sess:
    sess.run(init)
    #########################
    # training process goes here
    #########################
    res = sess.run(results,
                   feed_dict={
                       x: data_in,
                       seqlen: data_len})
    print(res)

To encode a sequence into a fixed-length vector you typically use recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
If you use a recurrent neural network, you can use the output at the last time step (the last element in your sequence). This corresponds to the thought vector in your question. Have a look at tf.nn.dynamic_rnn. dynamic_rnn requires you to specify the type of RNN cell you want to use; tf.contrib.rnn.LSTMCell and tf.contrib.rnn.GRUCell are the most common.
If you want to use CNNs, you need 1-dimensional convolutions. To build such CNNs, use tf.layers.conv1d and tf.layers.max_pooling1d.
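To make the "output at the last time step" idea concrete for zero-padded input, here is a minimal pure-Python sketch (no TensorFlow). It assumes a plain tanh RNN with made-up fixed weights; `rnn_step`, `encode`, `w_in` and `w_rec` are names invented for this illustration. It only mirrors the effect of passing sequence_length to dynamic_rnn: the state stops updating after the real steps, so the padding does not change the embedding.

```python
import math


def rnn_step(x, h, w_in, w_rec):
    """One tanh RNN step: combine the input vector x with the previous state h."""
    hidden_size = len(h)
    new_h = []
    for j in range(hidden_size):
        total = sum(x[i] * w_in[i][j] for i in range(len(x)))
        total += sum(h[k] * w_rec[k][j] for k in range(hidden_size))
        new_h.append(math.tanh(total))
    return new_h


def encode(sequence, length, w_in, w_rec):
    """Return the hidden state after the first `length` steps, ignoring padding."""
    h = [0.0] * len(w_rec)
    for x in sequence[:length]:
        h = rnn_step(x, h, w_in, w_rec)
    return h


# Hypothetical fixed weights: 4 input features, 3 hidden units.
w_in = [[0.1 * (i + j + 1) for j in range(3)] for i in range(4)]
w_rec = [[0.05 * (k - j) for j in range(3)] for k in range(3)]

real_steps = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5, 0.7]]
padded = real_steps + [[0.0] * 4 for _ in range(58)]  # zero-padded to length 60

embedding = encode(padded, 2, w_in, w_rec)
print(len(embedding), embedding)
```

The vector has one entry per hidden unit regardless of how long the sequence is, which is what makes it usable as a fixed-size embedding.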

I have found a solution to my problem, using the following architecture.
The lower LSTM layers encode the series x1,x2,...,xn. The last output (the green one in the diagram) is duplicated once per input step and fed to the decoding LSTM layers above. The TensorFlow code is as follows:
series_input = tf.placeholder(tf.float32, [None, conf.max_series, conf.series_feature_num])
print("Encode input Shape", series_input.get_shape())
# encoding layer
encode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.rnn_hidden_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
encode_output, _ = tf.nn.dynamic_rnn(encode_cell, series_input, dtype=tf.float32, scope='encode')
print("Encode output Shape", encode_output.get_shape())
# last output
encode_output = tf.transpose(encode_output, [1, 0, 2])
last = tf.gather(encode_output, int(encode_output.get_shape()[0]) - 1)
# duplicate the last output of the encoding layer
decoder_input = tf.stack([last for _ in range(conf.max_series)], axis=1)
print("Decoder input shape", decoder_input.get_shape())
# decoding layer
decode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.series_feature_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
decode_output, _ = tf.nn.dynamic_rnn(decode_cell, decoder_input, dtype=tf.float32, scope='decode')
print("Decode output", decode_output.get_shape())
# Loss Function
loss = tf.losses.mean_squared_error(labels=series_input, predictions=decode_output)
print("Loss", loss)

Related

RNN sequence learning

I am new to RNN prediction with TensorFlow.
I am trying to use an RNN with BasicLSTMCell to predict the next value of a sequence, such as
1,2,3,4,5 ->6
3,4,5,6,7 ->8
35,36,37,38,39 ->40
My code doesn't report any error, but the outputs for every batch seem to be the same, and the cost does not seem to decrease during training.
When I divide all the training data by 100:
0.01,0.02,0.03,0.04,0.05 ->0.06
0.03,0.04,0.05,0.06,0.07 ->0.08
0.35,0.36,0.37,0.38,0.39 ->0.40
the result is pretty good: the correlation between the predicted and real values is very high (0.9998).
I suspect the problem has to do with integers versus floats, but I cannot explain the reason. Can anyone help? Many thanks!
Here is the code:
library(tensorflow)
start=sample(1:1000, 100000, T)
start1= start+1
start2=start1+1
start3= start2+1
start4=start3+1
start5= start4+1
start6=start5+1
label=start6+1
data=data.frame(start, start1, start2, start3, start4, start5, start6, label)
data=as.matrix(data)
n = nrow(data)
trainIndex = sample(1:n, size = round(0.7*n), replace=FALSE)
train = data[trainIndex ,]
test = data[-trainIndex ,]
train_data= train[,1:7]
train_label= train[,8]
means=apply(train_data, 2, mean)
sds= apply(train_data, 2, sd)
train_data=(train_data-means)/sds
test_data=test[,1:7]
test_data=(test_data-means)/sds
test_label=test[,8]
batch_size = 50L
n_inputs = 1L # one input value per time step
n_steps = 7L # time steps
n_hidden_units = 10L # neurons in hidden layer
n_outputs = 1L # one regression output
x = tf$placeholder(tf$float32, shape(NULL, n_steps, n_inputs))
y = tf$placeholder(tf$float32, shape(NULL, 1L))
weights_in= tf$Variable(tf$random_normal(shape(n_inputs, n_hidden_units)))
weights_out= tf$Variable(tf$random_normal(shape(n_hidden_units, 1L)))
biases_in=tf$Variable(tf$constant(0.1, shape= shape(n_hidden_units )))
biases_out = tf$Variable(tf$constant(0.1, shape=shape(1L)))
RNN=function(X, weights_in, weights_out, biases_in, biases_out)
{
  X = tf$reshape(X, shape=shape(-1, n_inputs))
  X_in = tf$sigmoid (tf$matmul(X, weights_in) + biases_in)
  X_in = tf$reshape(X_in, shape=shape(-1, n_steps, n_hidden_units))
  lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=T)
  init_state = lstm_cell$zero_state(batch_size, dtype=tf$float32)
  outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=F)
  outputs= tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1,0,2)))
  results = tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
  return(results)
}
pred = RNN(x, weights_in, weights_out, biases_in, biases_out)
cost = tf$losses$mean_squared_error(pred, y)
train_op = tf$contrib$layers$optimize_loss(loss=cost, global_step=tf$contrib$framework$get_global_step(), learning_rate=0.05, optimizer="SGD")
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)
step = 0
while (step < 1000)
{
  train_data2= train_data[(step*batch_size+1) : (step*batch_size+batch_size) , ]
  train_label2=train_label[(step*batch_size+1):(step*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(train_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
  batch_ys= matrix(train_label2, ncol=1)
  sess$run(train_op, feed_dict = dict(x = batch_xs, y= batch_ys))
  mycost <- sess$run(cost, feed_dict = dict(x = batch_xs, y= batch_ys))
  print (mycost)
  test_data2= test_data[(0*batch_size+1) : (0*batch_size+batch_size) , ]
  test_label2=test_label[(0*batch_size+1):(0*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(test_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
  batch_ys= matrix(test_label2, ncol=1)
  step=step+1
}
First, it is useful to always normalize your network inputs (there are different approaches: dividing by a maximum value, subtracting the mean and dividing by the standard deviation, and many more). This will help your optimizer a lot.
Second, and most important in your case, you are applying a sigmoid function after the RNN output. If you check the plot of the sigmoid function, you will see that it squashes every input into the range (0,1). So no matter how big your inputs are, your output will always be less than 1. Thus you should not use any activation function at the output layer in regression problems.
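A quick way to see the second point is to evaluate the sigmoid directly. This is a small pure-Python sketch, assuming the network's last layer ends in a sigmoid as described above; the variable names are made up for the illustration. A target such as 40 can never be reached, while a rescaled target such as 0.40 can.

```python
import math


def sigmoid(z):
    """Logistic function; its output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for z in [-10.0, 0.0, 10.0, 40.0]:
    print(z, sigmoid(z))

# Best case for an unscaled target of 40: the output saturates near 1.
error_for_target_40 = abs(40.0 - sigmoid(40.0))

# A target of 0.40 is reachable: feed the matching logit, log(0.4 / 0.6).
error_for_target_040 = abs(0.40 - sigmoid(math.log(0.4 / 0.6)))

print(error_for_target_40, error_for_target_040)
```

This matches what was observed in the question: dividing the data by 100 moves the targets into the range that a bounded output can represent.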
Hope it helps.

Updating the Initial state of a recurrent neural network in tensorflow

Currently I have the following code:
init_state = tf.Variable(tf.zeros([batch_partition_length, state_size])) # -> [16, 1024].
final_state = tf.Variable(tf.zeros([batch_partition_length, state_size]))
And inside my inference method, which is responsible for producing the output, I have the following:
def inference(frames):
    # Note that I declare final_state as a global variable to avoid the shadowing issue, since it is referenced at the dynamic_rnn line.
    global final_state
    # .... Here we have some conv layers and so on...
    # Now the RNN cell
    with tf.variable_scope('local1') as scope:
        # Move everything into depth so we can perform a single matrix multiply.
        shape_d = pool3.get_shape()
        shape = shape_d[1] * shape_d[2] * shape_d[3]
        # tf_shape = tf.stack(shape)
        tf_shape = 1024
        print("shape:", shape, shape_d[1], shape_d[2], shape_d[3])
        # So note that tf_shape = 1024, this means that we have 1024 features are fed into the network. And
        # the batch size = 1024. Therefore, the aim is to divide the batch_size into num_steps so that
        reshape = tf.reshape(pool3, [-1, tf_shape])
        # Now we need to reshape/divide the batch_size into num_steps so that we would be feeding a sequence
        rnn_inputs = tf.reshape(reshape, [batch_partition_length, step_size, tf_shape])
        print('RNN inputs shape: ', rnn_inputs.get_shape())  # -> (16, 64, 1024).
        cell = tf.contrib.rnn.BasicRNNCell(state_size)
        # note that rnn_outputs are the outputs but not multiplied by W.
        rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)
    # linear Wx + b
    with tf.variable_scope('softmax_linear') as scope:
        weight_softmax = \
            tf.Variable(
                tf.truncated_normal([state_size, n_classes], stddev=1 / state_size, dtype=tf.float32, name='weight_softmax'))
        bias_softmax = tf.constant(0.0, tf.float32, [n_classes], name='bias_softmax')
        softmax_linear = tf.reshape(
            tf.matmul(tf.reshape(rnn_outputs, [-1, state_size]), weight_softmax) + bias_softmax,
            [batch_size, n_classes])
        print('Output shape:', softmax_linear.get_shape())
    return softmax_linear
# Here we define the loss, accuracy and the optimizer.
# now run the graph:
with tf.Session() as sess:
    _, accuracy_train, loss_train, summary = \
        sess.run([optimizer, accuracy, cost_scalar, merged], feed_dict={x: image_batch,
                                                                        y_valence: valences,
                                                                        confidence_holder: confidences})
    ....
Problem: How can I assign to the initial state (init_state) the value stored in final_state? That is, how do I update one Variable's value from another?
I have used the following:
tf.assign(init_state, final_state.eval())
inside the session, after the sess.run call. But this throws an error:
You must feed a value for placeholder tensor 'inputs' with dtype float
where the placeholder 'inputs' is declared as follows:
x = tf.placeholder(tf.float32, [None, 112, 112, 3], name='inputs')
And the feeding is done after reading the images from the tfRecords through the following command:
example = tf.train.Example()
example.ParseFromString(string_record)
height = int(example.features.feature['height']
             .int64_list
             .value[0])
width = int(example.features.feature['width']
            .int64_list
            .value[0])
img_string = (example.features.feature['image_raw']
              .bytes_list
              .value[0])
img_1d = np.fromstring(img_string, dtype=np.uint8)
reconstructed_img = img_1d.reshape((height, width, -1))  # This is added to the image_batch list, which is fed into the placeholder.
And if I try the following:
img_1d = np.fromstring(img_string, dtype=np.float32)
This will produce the following error:
ValueError: cannot reshape array of size 9408 into shape (112,112,newaxis)
Any help is much appreciated!!
So here are the mistakes I had made. After some revision I figured out the following:
I shouldn't create final_state as a tf.Variable. Since tf.nn.dynamic_rnn already returns final_state as a tensor, I should not instantiate final_state at the beginning, and I should not use global final_state inside the function definition.
In order to assign the final state to the initial state, I used:
tf.assign(init_state, final_state)
And things worked out.
Note: in TensorFlow, running an operation returns the data as a numpy array in Python and as a tensorflow::Tensor in C and C++.
Have a look at https://www.tensorflow.org/versions/r0.10/get_started/basic_usage for more information.
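To illustrate what carrying the final state into the next initial state buys you, here is a toy pure-Python sketch (no TensorFlow). It assumes a scalar tanh RNN with made-up weights; `run_rnn`, `w_in` and `w_rec` are invented names. As far as I understand, in the TensorFlow graph the tf.assign call only creates an op, so it still has to be executed with sess.run (with the same feed_dict, since final_state depends on the input placeholder).

```python
import math


def run_rnn(inputs, init_state, w_in=0.5, w_rec=0.9):
    """Scalar tanh RNN over `inputs`; returns (outputs, final_state)."""
    state = init_state
    outputs = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        outputs.append(state)
    return outputs, state


sequence = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Whole sequence in one pass.
full_outputs, full_final = run_rnn(sequence, 0.0)

# Same sequence split into two batches, carrying the state across them.
state = 0.0
chunked_outputs = []
for batch in (sequence[:3], sequence[3:]):
    outputs, state = run_rnn(batch, state)  # final state becomes the next initial state
    chunked_outputs.extend(outputs)

# Without carrying the state, the second batch starts from zero again.
_, reset_final = run_rnn(sequence[3:], 0.0)

print(state == full_final, reset_final == full_final)
```

Carrying the state makes the two-batch run equivalent to processing the whole sequence at once, which is the behaviour the assignment is meant to achieve.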

Tensorflow: stacked bidirectional LSTMs

I want to stack two LSTMs without using the MultiRNN wrapper. However, the following code results in ValueError: Shapes (3,) and (2,) are not compatible because of inputs=states_fw_1 in the second LSTM. How can I pass the hidden state of the first LSTM as input to the second?
LSTM 1
with tf.name_scope("BiLSTM_1"):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=self.embedded_input_layer,
        scope='BiLSTM_1')
The state is a tuple:
states_fw_1, states_bw_1 = states_1
LSTM 2
with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=states_fw_1,
        scope="BiLSTM_extraction")
I have been learning TF for 2 days (so I'm no pro) and I found this problem interesting to solve.
Here are my findings:
What you want to do is not possible with the 'LSTMCell' implementation. Here is why:
You want to feed 'states_fw_1' to the next Bi-LSTM. So the first question should be: what are the dimensions of 'states_fw_1'? Any RNN implementation needs input of shape [batch_size, seq_len, input_size]. For 'states_fw_1' it is [batch_size, hidden_size] (I checked the size of 'states_fw_1' by running the code below). So you can see that it does not fit the RNN requirements. This is because the model returns only the last state of the LSTM cell, not the whole history (see the documentation). And you are not interested in the last state, because you want to feed the state at each time step to the layer above. 'states_fw_1' is useful when you want to classify the whole sequence (not each element in the sequence).
Edit: 'states_fw_1' contains the last hidden state and the last memory cell. For classification only the hidden state will be useful, I think.
So you just need to use the merged output (from the forward and backward passes). The merged output has size [batch_size, seq_len, hidden_size*2] (*2 for forward and backward), so it is the right shape for the next stacked RNN (the output comes from every time step, unlike the state).
Here is the code which I was testing:
import tensorflow as tf
import numpy as np
hidden_size = 21
seq_len = tf.placeholder(tf.int32, [None])
inputs = tf.placeholder(tf.float32, [None, None, 32])
with tf.variable_scope('BiLSTM_1'):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=inputs,
        scope='BiLSTM_1')
# Merge the output tensors from the forward and backward passes. The result has size [batch_size, seq_len, 2*hidden_size]
outputs_1 = tf.concat(outputs_1, 2)
with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=outputs_1,
        scope="BiLSTM_2")
# Initialize the weights and biases
init = tf.global_variables_initializer()
batch_size = 5
seq_len_val = 10
train_inputs = np.zeros((batch_size, seq_len_val, 32))
train_seq_len = np.ones(batch_size) * seq_len_val
with tf.Session() as session:
    session.run(init)
    feed = {inputs: train_inputs, seq_len: train_seq_len}
    out, state, state_1 = session.run([outputs, states, states_1], feed)
    print("State size: ", state_1[0].c.shape, " Out Size: ", out[0][0].shape)
    print("Batch_size: ", batch_size, " Sequence Len: ", seq_len_val, " Hidden Size: ", hidden_size)
'outputs_1' returned by LSTM 1 is a tuple containing 'outputs_fw' and 'outputs_bw'.
'outputs_fw' and 'outputs_bw' each have dimensions [batch_size, sequence_length, hidden_size].
You have to concatenate the 'outputs_fw' and 'outputs_bw' hidden states (use tf.concat with axis=2) and pass the result as input to LSTM 2, instead of passing 'states_fw_1'.
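The shape argument can be checked without TensorFlow. Below is a small pure-Python sketch using nested lists in place of tensors; `shape` and `concat_last_axis` are helper names made up for this illustration, and the sizes are the ones from the test code above. It shows why the merged outputs are a valid RNN input (rank 3) while a final hidden state is not (rank 2).

```python
def shape(nested):
    """Shape of a rectangular nested list, e.g. [batch, time, features]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


def concat_last_axis(a, b):
    """Concatenate two [batch, time, features] nested lists along the features axis."""
    return [[step_a + step_b for step_a, step_b in zip(seq_a, seq_b)]
            for seq_a, seq_b in zip(a, b)]


batch_size, seq_len, hidden_size = 5, 10, 21

# Stand-ins for the per-time-step outputs of the forward and backward passes.
outputs_fw = [[[0.0] * hidden_size for _ in range(seq_len)] for _ in range(batch_size)]
outputs_bw = [[[1.0] * hidden_size for _ in range(seq_len)] for _ in range(batch_size)]

# Stand-in for a final hidden state: one vector per sequence, no time axis.
final_h_fw = [[0.0] * hidden_size for _ in range(batch_size)]

merged = concat_last_axis(outputs_fw, outputs_bw)
print(shape(merged))
print(shape(final_h_fw))
```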

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around to find a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers(with 512 neurons in each),
time step(sequence length): 10
Could anyone guide me to build this using TensorFlow? ( from defining weights, building input shape, training, predicting, use of optimizer or cost function, etc), any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)  # tf.pack was renamed tf.stack in TensorFlow 1.0
First you need some placeholders for your training data (one batch):
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state (there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf). For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)  # tf.unpack was renamed tf.unstack in TensorFlow 1.0
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
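# Build one cell object per layer; reusing a single cell object for every layer can fail in newer TensorFlow 1.x releases.
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)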
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate logits and then a loss with respect to y_output.
Then you run each batch with sess.run, using truncated backpropagation (good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
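The packing and unpacking of the state can be sketched in plain Python to show the layout [num_layers, 2, batch_size, state_size]. This is only an illustration under assumptions: `StateTuple` is a hypothetical stand-in for TensorFlow's LSTMStateTuple, nested lists stand in for arrays, and `pack_states` and `unpack_states` are invented helper names.

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple (not the real class).
StateTuple = namedtuple("StateTuple", ["c", "h"])


def zeros(batch_size, state_size):
    """A [batch_size, state_size] nested list of zeros."""
    return [[0.0] * state_size for _ in range(batch_size)]


def pack_states(states):
    """Tuple of per-layer (c, h) pairs -> nested list [num_layers][2][batch][state]."""
    return [[layer.c, layer.h] for layer in states]


def unpack_states(packed):
    """Inverse of pack_states: rebuild one (c, h) tuple per layer."""
    return tuple(StateTuple(c=layer[0], h=layer[1]) for layer in packed)


num_layers, batch_size, state_size = 2, 3, 4
states = tuple(
    StateTuple(zeros(batch_size, state_size), zeros(batch_size, state_size))
    for _ in range(num_layers)
)

packed = pack_states(states)       # what would be fed through the placeholder
restored = unpack_states(packed)   # what the stacked cell expects as initial state

print(len(packed), len(packed[0]), len(packed[0][0]), len(packed[0][0][0]))
```

Index 0 on the second axis is the cell state and index 1 is the hidden state, matching the order used when building rnn_tuple_state above.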
Perhaps it is better to use a library like Tflearn or Keras instead?

Minimal RNN example in tensorflow

Trying to implement a minimal toy RNN example in tensorflow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: We're getting there. The only part remaining is to make it converge (and less convoluted). Could someone help to turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1
with tf.Graph().as_default(), tf.Session() as session:
    # Placeholder for the inputs and target of the net
    # inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    input1 = tf.placeholder(tf.float32, [batch_size, 1])
    inputs = [input1 for _ in range(num_steps)]
    outputs = tf.placeholder(tf.float32, [batch_size, num_steps])
    gru = rnn_cell.GRUCell(num_units)
    initial_state = state = tf.zeros([batch_size, num_units])
    loss = tf.constant(0.0)
    # setup model: unroll
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        step_ = inputs[time_step]
        output, state = gru(step_, state)
        loss += tf.reduce_sum(abs(output - target))  # all norms work equally well? NO!
    final_state = state
    optimizer = tf.train.AdamOptimizer(0.1)  # CONVERGEs sooo much better
    train = optimizer.minimize(loss)  # let the optimizer train
    numpy_state = initial_state.eval()
    session.run(tf.initialize_all_variables())
    for epoch in range(10):  # now
        for i in range(7):  # feed fake 2D matrix of 1 byte at a time ;)
            feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]}  # no
            numpy_state, current_loss, _ = session.run([final_state, loss, train], feed_dict=feed_dict)
            print(current_loss)  # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in xrange(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but an RNN cell works on float vectors; that's what embedding_lookup is for.
And finally you'll need to adapt your feed to put in the input list.
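For the int32-to-float step, an embedding lookup is conceptually just row selection from a table of float vectors. Here is a minimal pure-Python sketch of that idea; the table values are random, and `vocab_size`, `embedding_dim` and this `embedding_lookup` function are assumptions made for the illustration, not the TensorFlow implementation.

```python
import random


def embedding_lookup(table, ids):
    """Return one float vector (a row of `table`) for each integer id."""
    return [table[i] for i in ids]


random.seed(0)
vocab_size, embedding_dim = 8, 3

# One row of floats per possible integer input; in a real model these rows are trained.
table = [[random.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
         for _ in range(vocab_size)]

input_data = [1, 2, 3, 4, 5, 6, 7]
vectors = embedding_lookup(table, input_data)

print(len(vectors), len(vectors[0]))
```

Each integer time step becomes a float vector of length embedding_dim, which is the kind of input an RNN cell expects.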
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py#L164