RNN & Batches in TensorFlow

The batch approach for RNNs in TensorFlow is not clear to me. For example, tf.nn.rnn takes as input a list of tensors of shape [BATCH_SIZE x INPUT_SIZE]. We normally feed the session one batch of data at a time, so why does it take a list of batches rather than a single batch?
This leads to next confusion for me:
data = []
for _ in range(0, len(train_input)):
    data.append(tf.placeholder(tf.float32, [CONST_BATCH_SIZE, CONST_INPUT_SIZE]))

lstm = tf.nn.rnn_cell.BasicLSTMCell(CONST_NUM_OF_HIDDEN_STATES)
val, state = tf.nn.rnn(lstm, data, dtype=tf.float32)
I pass a list of tensors of shape [CONST_BATCH_SIZE x CONST_INPUT_SIZE] to tf.nn.rnn and get back an output value that is a list of tensors of shape [CONST_BATCH_SIZE x CONST_NUM_OF_HIDDEN_STATES]. Now I want to apply softmax to all of the hidden-state outputs, which means I need to compute the logits with matmul + bias.
Should I use a for loop for the matmul:
weight = tf.Variable(tf.zeros([CONST_NUM_OF_HIDDEN_STATES, CONST_OTPUT_SIZE]))
bias = tf.Variable(tf.zeros([CONST_OTPUT_SIZE]))
for i in val:
    mult = tf.matmul(i, weight)
    prediction = tf.nn.softmax(mult + bias)
Or should I build a single 2D tensor from val and then use tf.matmul without the for loop?

This should work. Here output is the batched data coming out of the RNN, and probs will contain the probabilities for every example in the batch.
logits = tf.matmul(output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)
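If you would rather avoid the Python for loop from the question, one option is to merge the list of per-step outputs into a single 2-D tensor and do a single matmul. A minimal sketch, assuming the TensorFlow 1.x argument order for tf.concat and reusing weight and bias from the question:
# val is a list of len(train_input) tensors, each [CONST_BATCH_SIZE, CONST_NUM_OF_HIDDEN_STATES]
stacked = tf.concat(val, axis=0)            # [len(val) * CONST_BATCH_SIZE, CONST_NUM_OF_HIDDEN_STATES]
logits = tf.matmul(stacked, weight) + bias  # one matmul for all time steps at once
predictions = tf.nn.softmax(logits)         # [len(val) * CONST_BATCH_SIZE, CONST_OTPUT_SIZE]
Both variants compute the same per-step softmax; the concatenated form just builds fewer ops in the graph.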

Related

Time-Series Transformer Model Prediction Accuracy

I have created a transformer model for multivariate time series predictions for a linear regression problem.
Details about the Dataset
I have hourly data with a single feature (lagged energy use). The model could in principle be improved by increasing the number of lagged energy-use values, which provides more information for predicting the target sequence (the energy consumption of a building). My input has the shape X.shape = (8783, 168, 1): 8783 time sequences, each containing one week of lagged energy use, i.e. 24*7 = 168 hourly entries, with each entry holding one lagged energy-use value. My output has the shape Y.shape = (8783, 1): 8783 sequences, each containing a single output value (the building's energy consumption for the next hour).
Model Details
I took as a model an example from the official Keras site. It was created for a classification problem, and I modified it for my regression problem by changing the activation of the last output layer from sigmoid to relu. Input shape (train_f) = (8783, 168, 1), output shape (train_p) = (8783, 1). When I trained the model for 100 epochs it converged very well, in fewer epochs than my reference models (LSTMs and LSTMs with self-attention). After training, when the model is asked to make predictions on the test data, the prediction performance is also good compared to the reference models.
Since the same model predicts well, in order to improve its performance I am now feeding in one month of lagged energy-use data, i.e. 168*4 = 672 hourly entries per sequence, with each entry containing one lagged energy-use value. So the input going into the model now has the shape X.shape = (8783, 672, 1). Both the training and prediction accuracy drop compared to the weekly input data, as shown below.
**Lagged energy use data for 1 week, X.shape = (8783, 168, 1)**

|               | MSE    | RMSE   | MAE    | R-Score |
|---------------|--------|--------|--------|---------|
| Training data | 1.0489 | 1.0242 | 0.6395 | 0.9707  |
| Testing data  | 0.6221 | 0.7887 | 0.5648 | 0.9171  |

**Lagged energy use data for 1 month, X.shape = (8783, 672, 1)**

|               | MSE    | RMSE   | MAE    | R-Score |
|---------------|--------|--------|--------|---------|
| Training data | 1.6424 | 1.2816 | 0.7326 | 0.9567  |
| Testing data  | 1.4991 | 1.2244 | 0.9233 | 0.6903  |
I believe that providing more information to the model should result in better predictions. Any suggestions on how to improve the model's prediction/test accuracy? Is there something wrong with the model?
from math import sqrt

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df_energy = pd.read_excel("/content/drive/MyDrive/Architecture Topology/Building_energy_consumption_record.xlsx")
extract_for_normalization = list(df_energy)[1]
df_data_float = df_energy[extract_for_normalization].astype(float)
df_data_array = df_data_float.to_numpy()
df_data_array_1 = df_data_array.reshape(-1, 1)

train_X, test_X = train_test_split(df_data_array_1, train_size=0.7, shuffle=False)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_X = scaler.fit_transform(train_X)
**Converting train_X into the required shape (inputs, sequences, features)**
train_f = []  # feature inputs from the training data
train_p = []  # prediction targets
n_future = 1  # number of future steps we want to predict
n_past = 672  # number of past time steps used as input for training
for val in range(n_past, len(scaled_train_X) - n_future + 1):
    train_f.append(scaled_train_X[val - n_past:val, 0:scaled_train_X.shape[1]])
    train_p.append(scaled_train_X[val + n_future - 1:val + n_future, -1])
train_f, train_p = np.array(train_f), np.array(train_p)
**Transformer Model**
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Normalization and attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs

    # Feed-forward part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res

def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(train_p.shape[1])(x)
    return keras.Model(inputs, outputs)

input_shape = (train_f.shape[1], train_f.shape[2])
model = build_model(
    input_shape,
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
)
model.compile(loss=tf.keras.losses.mean_absolute_error,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=["mse"])
model.summary()
history = model.fit(train_f, train_p, epochs=100, batch_size=32, validation_split=0.25, verbose=1)
trainYPredict = model.predict(train_f)
**Inverse transform the prediction and keep the last value (output)**
trainYPredict1 = np.repeat(trainYPredict, scaled_train_X.shape[1], axis=-1)
trainYPredict_actual = scaler.inverse_transform(trainYPredict1)[:, -1]
train_p_actual = np.repeat(train_p, scaled_train_X.shape[1], axis=-1)
train_p_actual1 = scaler.inverse_transform(train_p_actual)[:, -1]

Prediction_mse = mean_squared_error(train_p_actual1, trainYPredict_actual)
print("Mean Squared Error of prediction is:", str(Prediction_mse))
Prediction_rmse = sqrt(Prediction_mse)
print("Root Mean Squared Error of prediction is:", str(Prediction_rmse))
prediction_r2 = r2_score(train_p_actual1, trainYPredict_actual)
print("R2 score of predictions is:", str(prediction_r2))
prediction_mae = mean_absolute_error(train_p_actual1, trainYPredict_actual)
print("Mean absolute error of prediction is:", prediction_mae)
**Testing of model**
scaled_test_X = scaler.transform(test_X)
test_q = []
test_r = []
for val in range(n_past, len(scaled_test_X) - n_future + 1):
    test_q.append(scaled_test_X[val - n_past:val, 0:scaled_test_X.shape[1]])
    test_r.append(scaled_test_X[val + n_future - 1:val + n_future, -1])
test_q, test_r = np.array(test_q), np.array(test_r)
testPredict = model.predict(test_q)
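The reported test metrics can then be obtained by repeating the same inverse-transform and scoring steps used for the training data; a minimal sketch following the code above (the test* variable names below are introduced here only for illustration):
# Inverse transform the test predictions and targets, then score them
testPredict1 = np.repeat(testPredict, scaled_test_X.shape[1], axis=-1)
testPredict_actual = scaler.inverse_transform(testPredict1)[:, -1]
test_r1 = np.repeat(test_r, scaled_test_X.shape[1], axis=-1)
test_r_actual = scaler.inverse_transform(test_r1)[:, -1]

test_mse = mean_squared_error(test_r_actual, testPredict_actual)
test_rmse = sqrt(test_mse)
test_mae = mean_absolute_error(test_r_actual, testPredict_actual)
test_r2 = r2_score(test_r_actual, testPredict_actual)
print("Test MSE:", test_mse, "RMSE:", test_rmse, "MAE:", test_mae, "R2:", test_r2)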

Custom TensorFlow loss function with batch size > 1?

I have a neural network with the following code snippets; note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape=[batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape=[batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)

# Some matmuls, activations, dropouts, normalizations...

net_out = tf.tanh(output_before_activation)

def loss_fn(output, input):
    # input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim,])  # shape them into 1d vectors
    input = tf.reshape(input, [input_dim,])
    return my_fn_that_only_takes_in_vectors(output, input)

# Create session, preprocess data ...
for epoch in epoch_num:
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict={input_placeholder: some_appropriate_numpy_array})
        sess.run(optimizer.minimize(loss_fn(net_out, net_in)))
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accommodate matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How would I write my new custom loss_fn now that the input and output are N x input_dim where N > 1? In Keras this would not have been an issue because keras somehow takes the average of the gradients of each example in the batch. For my TensorFlow function, should I take each row as a vector individually, pass them to my_fn_that_only_takes_in_vectors, then take the average of the results?
You can use a function that computes the loss on the whole batch and works independently of the batch size. Basically the operations are applied along the whole first dimension of the input (the first dimension indexes the elements of the batch). Here is an example; I hope it helps to show how the operations are carried out:
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2])  # shape: (BatchSize,)
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3])  # shape: (BatchSize,)
    denominator = dx2 + dy2                                       # shape: (BatchSize,)

    dst_vec = tf.math.squared_difference(y_true, y_pred)          # shape: (BatchSize, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1)                   # shape: (BatchSize,)

    loss_vector = tf.cast(numerator / denominator, dtype="float32")  # per-element loss, shape: (BatchSize,)
    loss = tf.reduce_sum(loss_vector)  # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the average of the losses for the batch.
If you sum, make sure to use a validation dataset with the same batch size, otherwise the losses are not comparable.
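If my_fn_that_only_takes_in_vectors really cannot be vectorized over the batch, another option is to map it over the first dimension and average the per-row results, which mirrors what Keras does. A minimal sketch, assuming the function is built from TensorFlow ops so gradients can flow through it:
def batched_loss_fn(output, input):
    # output, input: [batch_size, input_dim]
    per_example = tf.map_fn(
        lambda pair: my_fn_that_only_takes_in_vectors(pair[0], pair[1]),
        (output, input),
        dtype=tf.float32,  # each per-row loss is a scalar float
    )
    return tf.reduce_mean(per_example)  # average over the batch
Vectorizing the loss directly, as in my_loss above, is still preferable when possible, since tf.map_fn builds a per-element loop in the graph.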

How to extract all weights from LSTM cell in vanila Tensorflow?

I am training an LSTM network:
cell_fw = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
cell_bw = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
rnn_outputs, final_state_fw, final_state_bw = tf.contrib.rnn.static_bidirectional_rnn(
    cell_fw=cell_fw,
    cell_bw=cell_bw,
    inputs=rnn_inputs,
    dtype=tf.float32
)
Further, I try to save its coefficients:
d = {}
with tf.Session() as sess:
    # train code ...
    variables_names = [v.name for v in tf.global_variables()]
    values = sess.run(variables_names)
    for k, v in zip(variables_names, values):
        d[k] = v
The dictionary d has only 2 objects for each LSTM cell:
[(k,v.shape) for (k,v) in sorted(d.items(), key=lambda x:x[0])]
[('bidirectional_rnn/bw/basic_lstm_cell/biases:0', (1024,)),
('bidirectional_rnn/bw/basic_lstm_cell/weights:0', (272, 1024)),
('bidirectional_rnn/fw/basic_lstm_cell/biases:0', (1024,)),
('bidirectional_rnn/fw/basic_lstm_cell/weights:0', (272, 1024)),
('char_embedding:0', (70, 16)),
('softmax_biases:0', (5068,)),
('softmax_weights:0', (5068, 512))]
I'm puzzled. Shouldn't each LSTM cell contain 4 sets of trainable weights? If so, how do I get all of the weights out of the LSTM cell?
The 4 weight matrices (and biases) of an LSTM cell are stored as a single fused tensor, where slices along the second axis correspond to the different kinds of weights (input gate, forget gate, etc.).
For instance, I guess that in your case the value of HIDDEN_SIZE is 256.
To access the different parts, you should slice the tensors along the axis of length 1024 (but I don't know in which order the different kinds of weights are stored...).
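A minimal sketch of that slicing, assuming HIDDEN_SIZE = 256 and the [input gate, cell candidate, forget gate, output gate] ordering that BasicLSTMCell used internally; treat the ordering as an assumption and check it against your TensorFlow version:
import numpy as np

HIDDEN_SIZE = 256  # inferred from the 1024-wide tensors (4 * 256)

w = d['bidirectional_rnn/fw/basic_lstm_cell/weights:0']  # (272, 1024): 272 rows = input dim + HIDDEN_SIZE
b = d['bidirectional_rnn/fw/basic_lstm_cell/biases:0']   # (1024,)

# Split the fused tensors into the 4 gate blocks along the last axis.
w_i, w_j, w_f, w_o = np.split(w, 4, axis=1)  # each (272, 256)
b_i, b_j, b_f, b_o = np.split(b, 4)          # each (256,)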

Tensorflow 1.0 LSTM Cell in dynamic_rnn throws dimension error

I am trying to implement an LSTM model as a model_fn input to an Estimator. My X is just a .txt file containing a time series of prices. Before going into my first hidden layer, I try to define the LSTM cell as:
def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(
        size, forget_bias=0.0, state_is_tuple=True)

attn_cell = lstm_cell
if is_training and keep_prob < 1:
    def attn_cell():
        return tf.contrib.rnn.DropoutWrapper(
            lstm_cell(), output_keep_prob=keep_prob)

cell = tf.contrib.rnn.MultiRNNCell([attn_cell() for _ in range(num_layers)], state_is_tuple=True)
initial_state = cell.zero_state(batch_size, data_type())

inputs = tf.unstack(X, num=num_steps, axis=0)
outputs = []
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   initial_state=initial_state)
This then is supposed to go into:
first_hidden_layer = tf.contrib.layers.relu(outputs, 1000)
Unfortunately, it throws an error indicating "ValueError: Dimension must be 1 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [1], [3]."
I gather that my problem is the "inputs" tensor. According to its documentation, the inputs argument is supposed to be a tensor of shape [batch_size, max_time, ...], but I have no idea how to translate this into the structure above since, through the Estimator, only input values X and target values y are fed to the system. So my question is how to create a tensor that can serve as the inputs argument to dynamic_rnn.
Thanks a lot.
I believe you don't need the line:
inputs = tf.unstack(X, num=num_steps, axis=0)
You can supply X directly to dynamic_rnn, since dynamic_rnn doesn't take a list of tensors; it takes a single tensor whose time axis is dimension 0 (if time_major == True) or dimension 1 (if time_major == False).
Actually, it seems that X has only 2 dimensions, since inputs is a list of 1-dimensional tensors (as indicated by the error message), so you should replace the unstack line with:
inputs = tf.expand_dims(X, axis=2)
This will add a third dimension of size 1, which is needed by dynamic_rnn.
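Putting the answer together, a minimal sketch of the shape handling, assuming X arrives from the Estimator as a [batch_size, num_steps] tensor of prices (names taken from the question):
# X: [batch_size, num_steps] -- one price per time step
inputs = tf.expand_dims(X, axis=2)  # -> [batch_size, num_steps, 1]
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   initial_state=initial_state)
# outputs: [batch_size, num_steps, size] -- one hidden vector per time step
first_hidden_layer = tf.contrib.layers.relu(outputs, 1000)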

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step (sequence length): 10
Could anyone guide me on how to build this using TensorFlow (defining the weights, building the input shape, training, predicting, use of an optimizer or cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
First you need some placeholders to put your training data in (one batch):
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed through a placeholder. So you need to store it in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in TensorFlow API to create the stacked LSTM layers:
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate logits and then a loss with respect to the y_output targets.
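A minimal sketch of that step for a one-value-per-time-step target like y_output above (W_out and b_out are illustrative names, not part of the original answer):
# outputs: [batch_size, truncated_series_length, state_size]
W_out = tf.Variable(tf.random_normal([state_size, 1]))
b_out = tf.Variable(tf.zeros([1]))

outputs_flat = tf.reshape(outputs, [-1, state_size])       # [batch * time, state_size]
logits = tf.matmul(outputs_flat, W_out) + b_out            # [batch * time, 1]
predictions = tf.reshape(logits, [batch_size, truncated_series_length, 1])

loss = tf.reduce_mean(tf.square(predictions - y_output))   # mean squared error
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)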
Then you run each batch with the sess.run command, using truncated backpropagation (a good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a library like TFLearn or Keras instead?
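For comparison, a minimal Keras sketch of the two-layer stacked LSTM asked for in the question (layer sizes follow the question's n_steps=10, n_input=13, n_hidden=512, n_classes=13; treat it as an illustration rather than a drop-in replacement):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.LSTM(512, return_sequences=True, input_shape=(10, 13)),  # first hidden LSTM layer
    layers.LSTM(512),                                               # second hidden LSTM layer
    layers.Dense(13),                                               # one output per class/target
])
model.compile(optimizer="adam", loss="mse")
model.summary()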