Difference equation in LSTM network on TensorFlow

I'd like to use an LSTM network in TensorFlow to implement a difference equation. I searched on the internet but didn't find anything about this topic.
The equation is:
y(k) = K * ( b0*x(k) + b1*x(k-1) + b2*x(k-2) ) - a1*y(k-1) - a2*y(k-2)
in which K = 0.0436, b = [b0, b1, b2] = [1, 2, 1] and a = [a0, a1, a2] = [1, -1.6641, 0.8387].
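For reference, here is a minimal sketch (not part of the original question) of the recursion this filter implements, assuming the standard scipy.signal.lfilter convention with a[0] = 1; it can serve as a sanity check against the filtered signal used further below:
import numpy as np
from scipy import signal

K = 0.0436
b = np.array([1, 2, 1])
a = np.array([1, -1.6641, 0.8387])
x = np.random.uniform(0, 1, 100)

# explicit recursion: y(k) = K*(b0*x(k) + b1*x(k-1) + b2*x(k-2)) - a1*y(k-1) - a2*y(k-2)
y = np.zeros_like(x)
for k in range(len(x)):
    y[k] = (K * (b[0] * x[k]
                 + (b[1] * x[k-1] if k >= 1 else 0)
                 + (b[2] * x[k-2] if k >= 2 else 0))
            - (a[1] * y[k-1] if k >= 1 else 0)
            - (a[2] * y[k-2] if k >= 2 else 0))

# should match the scaled lfilter output used to build the dataset
assert np.allclose(y, K * signal.lfilter(b, a, x))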
My aim is to use a neural network to learn the relationship between input and output. Since the output at instant k also depends on the previous inputs and outputs, my idea is to implement an LSTM network (many-to-one structure).
If we suppose we have an input vector of 500 samples and use a window size of 5, the input of the LSTM network is a tensor of shape (500, 5, 1) while the output is (500, 1, 1).
The IN/OUT of the first iteration is:
[0; x(k-4), x(k-3), x(k-2), x(k-1), x(k); 1] -> [1; y(k); 1]
in the second iteration:
[0; x(k-3), x(k-2), x(k-1), x(k), x(k+1); 1] -> [1; y(k+1); 1]
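As a side note (not part of the original question), windows like these can also be built directly with numpy; make_windows below is a hypothetical helper, assuming numpy >= 1.20 for sliding_window_view:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(sig, window_size):
    """Return sliding windows of shape (len(sig), window_size, 1), zero-padded at the start."""
    padded = np.concatenate([np.zeros(window_size - 1), sig])
    windows = sliding_window_view(padded, window_size)   # (len(sig), window_size)
    return windows[..., np.newaxis]                      # (len(sig), window_size, 1)

sig = np.random.uniform(0, 1, 500)
X = make_windows(sig, 5)
print(X.shape)   # (500, 5, 1)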
So I used an LSTM network with stateful=True to allow the network to remember past states, but it doesn't converge.
It seems to me that the idea is correct, but I cannot see where I am going wrong. Could someone help me find the problem? The code is copied below; the network is built with TensorFlow.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

# Difference equation
K = 0.0436
b = np.array([1, 2, 1])
a = np.array([1, -1.6641, 0.8387])
x = np.random.uniform(0, 1, 100)
y = K*(signal.lfilter(b, a, x))

# Generate Dataset
X_train = np.random.uniform(0, 1, 100)
y_train = K*(signal.lfilter(b, a, X_train))
X_val = np.ones(100)
y_val = K*(signal.lfilter(b, a, X_val))
X_test = np.random.uniform(0.5, 0.8, 100)
y_test = K*(signal.lfilter(b, a, X_test))
def get_x_split(data, windows_size):
    """ Return sliding window dataset (zero-padded at the start). """
    x_temp = np.zeros([1, windows_size-1])
    x = np.array([])
    for i in range(0, len(data)):
        # drop the oldest sample and append the newest one
        x_temp = np.append(x_temp[-windows_size+1:], data[i]).T
        x = np.append(x, x_temp, axis=0)
    x = np.reshape(x, (int(len(x)/windows_size), windows_size))
    return x
windows_size = 10
X_train = get_x_split(X_train, windows_size)
X_val = get_x_split(X_val, windows_size)
X_test = get_x_split(X_test, windows_size)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_val = np.reshape(X_val, (X_val.shape[0], X_val.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Model Definition
activation_function = 'tanh'

def build_model():
    input_layer = Input(shape=(X_train.shape[1], 1), batch_size=1)
    HL_1 = LSTM(1, activation=activation_function, return_sequences=True, stateful=True)(input_layer)
    HL_2 = LSTM(1, activation=activation_function, return_sequences=False, stateful=True)(HL_1)
    output_layer = Dense(1, activation='relu', name='Output')(HL_2)
    model = Model(inputs=input_layer, outputs=output_layer)
    return model

model = build_model()
model.compile(optimizer=RMSprop(),
              loss={'Output': 'mse'},
              metrics={'Output': tf.keras.metrics.RootMeanSquaredError()})
# Training
history = model.fit(x=X_train,
                    y=y_train,
                    batch_size=1,
                    validation_data=(X_val, y_val),
                    epochs=5000,
                    verbose=1,
                    shuffle=False)
# Test
y_pred = model.predict(X_test, batch_size=1)  # batch_size must match the stateful model's fixed batch size
plt.figure(dpi=1200)
plt.plot(y_test, label='true', linewidth=0.8, alpha=0.5)  # y_test is 1-D with 100 samples
plt.plot(y_pred[:, 0], label='pred')                      # y_pred has shape (100, 1)
plt.legend()
plt.grid()
plt.title("Test")
plt.show()

Related

Time-Series Transformer Model trains well but performs worse on Test Data

I have created a transformer model for multivariate time series predictions (many-to-one classification model).
Details about the Dataset
I have hourly data, i.e. 8 different features (hour, month, temperature, humidity, wind speed, solar radiation, etc.), and with them I am trying to predict a building's energy consumption. So my input has the shape X.shape = (8783, 168, 8), i.e. 8783 sequences, each containing 168 hourly entries/vectors, and each vector containing 8 features. My output has the shape Y.shape = (8783, 1), i.e. 8783 sequences, each containing 1 output value (the building's energy consumption for the following hour).
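As an illustration only (not from the post), a rough sketch of how arrays with these shapes can be built from a 2-D array of hourly feature rows, assuming numpy >= 1.20 and that the last column is the consumption value to predict:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

hours, n_features, window = 8783 + 168, 8, 168
data = np.random.rand(hours, n_features)        # hypothetical hourly records

X = sliding_window_view(data, window, axis=0)   # (hours - 167, 8, 168)
X = X.transpose(0, 2, 1)[:hours - window]       # (8783, 168, 8)
Y = data[window:, -1:]                          # (8783, 1): next hour's consumption
print(X.shape, Y.shape)                         # (8783, 168, 8) (8783, 1)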
Model Details
I took as a starting point an example from the official Keras site. It was created for classification problems; I modified it for my regression problem by changing the activation of the last output layer from sigmoid to relu.
Input shape (train_f) = (8783, 168, 8)
Output shape (train_P) = (8783,1)
When I train the model for 100 epochs, it converges in fewer epochs than my reference models (LSTMs and LSTMs with self-attention). After training, when the model makes predictions on the test data, the prediction performance is worse than that of the reference models.
I would be grateful if you could have a look at the code and let me know potential steps to improve the prediction/test accuracy.
Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import missingno as msna
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from math import sqrt

df_weather = pd.read_excel(r"Downloads\WeatherData.xlsx")
df_energy = pd.read_excel(r"Downloads\Building_energy_consumption_record.xlsx")
visa = pd.concat([df_weather, df_energy], axis=1)
df_data = visa.loc[:, ~visa.columns.isin(["Time1", "TD", "U", "DR", "FX"])]
msna.bar(df_data)
plt.figure(figsize=(16, 6))
sb.heatmap(df_data.corr(), annot=True, linewidths=1, fmt=".2g", cmap='coolwarm')
plt.xticks(rotation='horizontal')  # orientation of the tick labels
extract_for_normalization = list(df_data)[1:9]
df_data_float = df_data[extract_for_normalization].astype(float)
from sklearn.model_selection import train_test_split
train_X, test_X = train_test_split(df_data_float, train_size=0.7, shuffle=False)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_X = scaler.fit_transform(train_X)
**Converting train_X into the required shape (samples, sequences, features)**
train_f = []  # feature inputs from training data
train_p = []  # prediction values
n_future = 1    # number of hours we want to predict into the future
n_past = 168    # number of past time steps used as input for training
for val in range(n_past, len(scaled_train_X) - n_future + 1):
    train_f.append(scaled_train_X[val - n_past:val, 0:scaled_train_X.shape[1]])
    train_p.append(scaled_train_X[val + n_future - 1:val + n_future, -1])
train_f, train_p = np.array(train_f), np.array(train_p)
**Transformer Model**
def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Normalization and Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    # Feed Forward Part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res
def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(train_p.shape[1])(x)
    return keras.Model(inputs, outputs)
input_shape = (train_f.shape[1], train_f.shape[2])
model = build_model(
    input_shape,
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
)
model.compile(loss=tf.keras.losses.mean_absolute_error,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=["mse"])
model.summary()
history = model.fit(train_f, train_p, epochs=100, batch_size=32, validation_split=0.15, verbose=1)
trainYPredict = model.predict(train_f)
**Inverse transform the prediction and keep the last value(output)**
trainYPredict1 = np.repeat(trainYPredict, scaled_train_X.shape[1], axis = -1)
trainYPredict_actual = scaler.inverse_transform(trainYPredict1)[:, -1]
train_p_actual = np.repeat(train_p, scaled_train_X.shape[1], axis = -1)
train_p_actual1 = scaler.inverse_transform(train_p_actual)[:, -1]
Prediction_mse=mean_squared_error(train_p_actual1 ,trainYPredict_actual)
print("Mean Squared Error of prediction is:", str(Prediction_mse))
Prediction_rmse =sqrt(Prediction_mse)
print("Root Mean Squared Error of prediction is:", str(Prediction_rmse))
prediction_r2=r2_score(train_p_actual1 ,trainYPredict_actual)
print("R2 score of predictions is:", str(prediction_r2))
prediction_mae=mean_absolute_error(train_p_actual1 ,trainYPredict_actual)
print("Mean absolute error of prediction is:", prediction_mae)
**Testing of the model**
scaled_test_X = scaler.transform(test_X)
test_q = []
test_r = []
for val in range(n_past, len(scaled_test_X) - n_future + 1):
    test_q.append(scaled_test_X[val - n_past:val, 0:scaled_test_X.shape[1]])
    test_r.append(scaled_test_X[val + n_future - 1:val + n_future, -1])
test_q, test_r = np.array(test_q), np.array(test_r)
testPredict = model.predict(test_q)
The training and validation loss plot ("Training and validation Loss") is also attached.

Creating several weight tensors for each object in Multi-Object Tracking (MOT) using TensorFlow

I am using TensorFlow v1.10.0 and developing a Multi-Object Tracker based on MDNet. I need to assign a separate weight matrix to each detected object in the fully connected layers in order to get a different embedding for each object during online training. I am using tf.map_fn to generate a higher-order weight tensor of shape (n_objects, flattened_layer, hidden_units):
def dense_fc4(n_objects):
    initializer = lambda: tf.contrib.layers.xavier_initializer()(shape=(1024, 512))
    return tf.Variable(initial_value=initializer, name='fc4/kernel',
                       shape=(n_objects.shape[0], 1024, 512))

W4 = tf.map_fn(dense_fc4, samples_flat)
b4 = tf.get_variable('fc4/bias', shape=512, initializer=tf.zeros_initializer())
fc4 = tf.add(tf.matmul(samples_flat, W4), b4)
fc4 = tf.nn.relu(fc4)
However, during execution, when I run the session for W4 I get weight matrices that all have the same values. Any help?
TIA
Here is a workaround: I was able to generate the multiple kernels outside the graph in a for loop and then feed them into the graph:
def build_branches(fc5, w6):
    y = tf.placeholder(tf.int64, [None, None])
    b6 = tf.get_variable('fc6/bias', shape=2, initializer=tf.zeros_initializer())
    fc6 = tf.add(tf.matmul(fc5, w6), b6)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                                         logits=fc6))
    train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="fc6")
    with tf.variable_scope("", reuse=tf.AUTO_REUSE):
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name='adam')
        train_op = optimizer.minimize(loss, var_list=train_vars)
    initialize_vars = train_vars
    initialize_vars += [optimizer.get_slot(var, name)
                        for name in optimizer.get_slot_names()
                        for var in train_vars]
    if isinstance(optimizer, tf.train.AdamOptimizer):
        initialize_vars += optimizer._get_beta_accumulators()
    prob = tf.nn.softmax(fc6)
    pred = tf.argmax(prob, 2)
    correct_pred = tf.equal(pred, y)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    return prob, train_op, accuracy, loss, pred, initialize_vars, y, fc6

w6 = []
for n_obj in range(pos_data.shape[0]):
    w6.append(tf.get_variable("fc6/kernel-" + str(n_obj), shape=(512, 2),
                              initializer=tf.contrib.layers.xavier_initializer()))

print("modeling fc6 branches...")
prob, train_op, accuracy, loss, pred, initialize_vars, y, fc6 = build_branches(fc5, w6)
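For comparison, here is a minimal sketch (not from the original post) of the single higher-order kernel described in the question, assuming TF 1.x and that samples_flat has shape (n_objects, batch, 1024); a batched tf.matmul then applies a separate (1024, 512) matrix to each object's samples:
n_objects = 4  # hypothetical number of tracked objects
W4 = tf.get_variable('fc4/kernel', shape=(n_objects, 1024, 512),
                     initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.get_variable('fc4/bias', shape=(512,), initializer=tf.zeros_initializer())
# samples_flat: (n_objects, batch, 1024) -> fc4: (n_objects, batch, 512)
fc4 = tf.nn.relu(tf.matmul(samples_flat, W4) + b4)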

NN on tensorflow doesn't train

I am trying to train a simple TensorFlow network on a simple model, but for some reason it doesn't learn anything. Am I making a mistake?
import numpy as np
import tensorflow as tf

X, Y = read_data(file_name)

# CONSTRUCT GRAPH
x_t = tf.placeholder(shape=[None, X.shape[1]], dtype=tf.float32)
y_t = tf.placeholder(shape=[None,], dtype=tf.float32)
hidden_1 = tf.layers.dense(x_t, 50, activation=tf.nn.sigmoid)
hidden_2 = tf.layers.dense(hidden_1, 50, activation=tf.nn.sigmoid)
output = tf.layers.dense(hidden_2, 1, activation=tf.nn.sigmoid)

# DEFINE LOSS AND OPTIMIZER
loss = tf.reduce_mean(tf.square(output - y_t))
GD_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_step = GD_optimizer.minimize(loss)

# BATCH SIZE
BATCH_SIZE = 20

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(15000):
        rand_indices = np.random.choice(X.shape[0], size=BATCH_SIZE)
        x_batch = X[rand_indices, :]
        y_batch = Y[rand_indices]
        _, temp_loss = sess.run([train_step, loss], feed_dict={x_t: x_batch, y_t: y_batch})
        print(temp_loss)
According to my understanding of your dataset description, the target column Y is a float (real-valued number) that can be in any range, not necessarily within the [0, 1] interval.
On the other hand, because you use a sigmoid activation in the last layer of your model, the predicted values will always be in the [0, 1] range.
I would suggest not using a sigmoid activation in the last layer, unless your Y values are also in the [0, 1] range.
So, modify your code such that it becomes:
output = tf.layers.dense(hidden_2, 1, activation=None)

Re-use speechT example with LSTM network

Starting from the example at https://github.com/timediv/speechT, I'm trying to adapt the code to use an LSTM network, but so far I have failed. I tried many combinations, but I always get an error such as "Input must be a sequence". I need to add an LSTM network to the example for speech recognition purposes, and after trying for a couple of weeks I am still stuck on the coding problem. If anyone could provide an example of using an LSTM network with this sample, that would be great.
class InputBatchLoader(BaseInputLoader):

    def __init__(self, input_size, batch_size, data_generator_creator, max_steps=None):
        super().__init__(input_size)
        self.batch_size = batch_size
        self.data_generator_creator = data_generator_creator
        self.steps_left = max_steps

        with tf.device("/cpu:0"):
            # Define input and label placeholders
            self.inputs = tf.placeholder(tf.float32, [batch_size, None, input_size], name='inputs')
            self.sequence_lengths = tf.placeholder(tf.int32, [batch_size], name='sequence_lengths')
            self.labels = tf.sparse_placeholder(tf.int32, name='labels')

            # Queue for inputs and labels
            self.queue = tf.FIFOQueue(dtypes=[tf.float32, tf.int32, tf.string],
                                      capacity=100)

            # queues do not support sparse tensors yet, we need to serialize...
            serialized_labels = tf.serialize_many_sparse(self.labels)
            self.enqueue_op = self.queue.enqueue([self.inputs,
                                                  self.sequence_lengths,
                                                  serialized_labels])
class Wav2LetterLSTMModel(SpeechModel):  # Added Sep 14, 2017 to create the LSTM model

    def __init__(self, input_loader: BaseInputLoader, input_size: int, num_classes: int):
        super().__init__(input_loader, input_size, num_classes)

    def _create_network(self, num_classes):
        cellsize = 256
        num_layers = 3
        inputs = self.inputs
        lstm_cell = rnn.BasicLSTMCell(cellsize, forget_bias=1.0)
        outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs, dtype=tf.float32)
        return tf.transpose(outputs, (1, 0, 2))


def create_default_model(flags, input_size: int, speech_input: BaseInputLoader) -> SpeechModel:
    model = Wav2LetterLSTMModel(input_loader=speech_input,
                                input_size=input_size,
                                num_classes=speecht.vocabulary.SIZE + 1)  # Added Sep 14, 2017, to use the LSTM model

    # TODO how can we restore only selected variables so we do not need to always create the full network?
    if flags.command == 'train':
        model.add_training_ops(learning_rate=flags.learning_rate,
                               learning_rate_decay_factor=flags.learning_rate_decay_factor,
                               max_gradient_norm=flags.max_gradient_norm,
                               momentum=flags.momentum)
        model.add_decoding_ops()
    elif flags.command == 'export':
        model.add_training_ops()
        model.add_decoding_ops()
    else:
        model.add_training_ops()
        model.add_decoding_ops(language_model=flags.language_model,
                               lm_weight=flags.lm_weight,
                               word_count_weight=flags.word_count_weight,
                               valid_word_count_weight=flags.valid_word_count_weight)

    model.finalize(log_dir=flags.log_dir,
                   run_name=flags.run_name,
                   run_type=flags.run_type)

    return model
In the end I used this:
XT = tf.transpose(inputs, [1, 0, 2])
XR = tf.reshape(XT, [-1, self.input_size])
X_split = tf.split(XR, cellsize, 0)
lstm = rnn.BasicLSTMCell(cellsize, forget_bias=1.0, state_is_tuple=True)
outputs, _states = rnn.static_rnn(lstm, X_split, dtype=tf.float32)

LSTM model error is percent of one output class

I'm having a rough time trying to figure out what's wrong with my LSTM model. I have 11 inputs, and 2 output classes (one-hot encoded) and very quickly, like within 1 batch or so, the error just goes to the % of one of the output classes and stays there.
I tried printing the weights and biases, but they all seem to be full of NaNs.
If I decrease the learning rate or mess around with the layers/units, I can get it to arrive at that one-class error percentage more slowly, but it always seems to end up at that point.
Here's the code:
import tensorflow as tf

num_units = 30
num_layers = 50
dropout_rate = 0.80
learning_rate = 0.0001
batch_size = 180
epoch = 1
input_classes = len(train_input[0])
output_classes = len(train_output[0])
data = tf.placeholder(tf.float32, [None, input_classes, 1]) #Number of examples, number of input, dimension of each input
target = tf.placeholder(tf.float32, [None, output_classes]) #one-hot encoded: [1,0] = bad, [0,1] = good
dropout = tf.placeholder(tf.float32)
cell = tf.contrib.rnn.LSTMCell(num_units, state_is_tuple=True)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
#Input shape [batch_size, max_time, depth], output shape: [batch_size, max_time, cell.output_size]
val, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
val = tf.transpose(val, [1, 0, 2]) #reshapes it to [sequence_size, batch_size, depth]
#get last entry as it includes previous results
last = tf.gather(val, int(val.get_shape()[0]) - 1)
weight = tf.get_variable("W", shape=[num_units, output_classes], initializer=tf.contrib.layers.xavier_initializer())
bias = tf.get_variable("B", shape=[output_classes], initializer=tf.contrib.layers.xavier_initializer())
logits = tf.matmul(last, weight) + bias
prediction = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=target)
prediction = tf.clip_by_value(prediction, 1e-10,100.0)
cost = tf.reduce_mean(prediction)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
minimize = optimizer.minimize(cost)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(logits, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init_op)
no_of_batches = int((len(train_input)) / batch_size)
for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):
        inp, out = train_input[ptr:ptr+batch_size], train_output[ptr:ptr+batch_size]
        ptr += batch_size
        sess.run(minimize, {data: inp, target: out, dropout: dropout_rate})
sess.close()
Since your labels are one-hot encoded, convert them to integer class indices (e.g. with tf.argmax) and use tf.nn.sparse_softmax_cross_entropy_with_logits instead of tf.nn.softmax_cross_entropy_with_logits (the latter is the variant that expects one-hot labels).
Refer to this Stack Overflow answer to understand the difference between the two functions.
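A minimal sketch of that suggestion (using the target and logits names from the question above); the one-hot target rows are first converted to integer class ids, which is what the sparse variant expects:
class_ids = tf.argmax(target, axis=1)   # one-hot [batch, 2] -> integer ids [batch]
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=class_ids,
                                                               logits=logits)
cost = tf.reduce_mean(cross_entropy)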