LSTM model has lower than expected accuracy - tensorflow

Hello, I am working on a time series problem.
I am plotting y = sin(x) with 10000 values.
Then, to each value y, I associate an index between 0 and 1 calculated from the next values:
If the next 150 values are lower than the current one, then this index will be set to 1.
If the next 150 values are higher than the current one, then this index will be set to 0.
Then I'm trying to set up an LSTM network using tensorflow/keras in order to predict this index based on the last 150 values, which should be pretty trivial for a sine function.
Here is the code and the explanation:
I make an array with 10000 values of sin(x)
import numpy as np
import math
from matplotlib import pyplot as plt
n = 10000
array = np.array([math.sin(i*0.02) for i in range(1, n)])
fig, ax = plt.subplots()
ax.plot([(i*0.02) for i in range(1, n)], array, linewidth=0.75)
plt.show()
Calculate the associated index, here SELL_INDEX
SELL_INDEX = np.zeros((len(array), 1))
for index, row in enumerate(array):
    if index > len(array) - 150:
        continue
    max_price = np.amax(array[index:index + 150])
    min_price = np.amin(array[index:index + 150])
    current_sell_index = (row - min_price) / (max_price - min_price)
    SELL_INDEX[index][0] = current_sell_index
data_with_sell_index = np.hstack((array.reshape(-1,1), SELL_INDEX))
data_final = np.hstack( (data_with_sell_index, np.arange(len(data_with_sell_index)).reshape(-1, 1)) )
fig, ax = plt.subplots()
ax.scatter(data_final[:, 2], data_final[:, 0], c=data_final[:, 1], s=.5)
plt.show()
Here is the plot (sin(x) colored by SELL_INDEX: 1 being yellow, 0 being purple)
Here is the creation of the model
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.python.keras import models, Input, Model
from tensorflow.python.keras.layers import LSTM, Dense, Dropout
# from neural_intelligence.batches_generator import generate_smart_lstm_batch, get_smart_lstm_data
class LearningRateReducerCb(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        old_lr = self.model.optimizer.lr.read_value()
        new_lr = old_lr * 0.99
        print("\nEpoch: {}. Reducing Learning Rate from {} to {}".format(epoch, old_lr, new_lr))
        self.model.optimizer.lr.assign(new_lr)
# Model creation
input_layer = Input(shape=(150, 1))
layer_1_lstm = LSTM(100, return_sequences=True)(input_layer)
dropout_1 = Dropout(0.0)(layer_1_lstm)
layer_2_lstm = LSTM(200, return_sequences=True)(dropout_1)
dropout_2 = Dropout(0.0)(layer_2_lstm)
layer_3_lstm = LSTM(100)(dropout_2)
output_sell_index_proba = Dense(1, activation='sigmoid')(layer_3_lstm)
model = Model(inputs=input_layer, outputs=output_sell_index_proba)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
Training the model
def generate_batch(dataset_x, dataset_y, sequence_length):
    x_data, y_data = [], []
    for i in range(len(list(zip(dataset_x, dataset_y))) - sequence_length - 1):
        x_data.append(dataset_x[i:i + sequence_length])
        y_data.append(dataset_y[i + sequence_length])
    return np.array(x_data), np.array(y_data)
x, y = generate_batch(data_final[:,0], data_final[:,1], sequence_length=150)
x = x.reshape(x.shape[0], x.shape[1], 1)
y = y.reshape(x.shape[0], 1, 1)
print(x.shape, y.shape)
model.fit(x, y, callbacks=[LearningRateReducerCb()], epochs=2,
          validation_split=0.1, batch_size=64, verbose=1)
Here is my issue: the accuracy never goes above 0.52, and I don't understand why; everything seems fine to me.
This should be very simple for such a powerful tool as an LSTM, but it can't figure out what the index should be.
If you could help me in any way, you're welcome. Thank you.
EDIT: To plot the result, use
data = np.array(data_final[:,0])
results = np.array([])
for i in range(150, 1000):
    result = model.predict(data[i - 150 : i].reshape(1, 150, 1))
    results = np.append(result, results)
data = data[150:1000]
fig, ax = plt.subplots()
ax.scatter(range(len(data)), data.flatten(), c=results.flatten(), s=1)
plt.show()
It seems to be working; the issue is: why does the accuracy never go up while training?
This led me to investigate what the problem was instead of just trying to predict.

This may be simplistic, but to my mind you are only accurately predicting half your curve.
This is where the blue and yellow lines overlap in your fit chart. The accuracy measure will be computed over all of the rows unless you tell it otherwise.
This intuitively explains why your accuracy is c. 50%. You should be able to confirm this by splitting your data into these two portions and calculating the accuracy on each.
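A minimal sketch of one way to check this, assuming the x and y arrays built by generate_batch above and splitting on the target value (this is not from the original post):
low_mask = y.reshape(-1) < 0.5   # windows whose index is closer to 0
high_mask = ~low_mask            # windows whose index is closer to 1
_, acc_low = model.evaluate(x[low_mask], y[low_mask], verbose=0)
_, acc_high = model.evaluate(x[high_mask], y[high_mask], verbose=0)
print("accuracy on low-index rows: {:.3f}, on high-index rows: {:.3f}".format(acc_low, acc_high))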
I suggest playing around with your features and transformations to understand which type of shapes predict your sine curve with a higher accuracy (and give a fuller overlap between the lines).

Related

Multivariate LSTM cross feature dependencies

I was working my way through handson-ml2, chapter 15 in particular.
I want to generalize the multiple-steps-ahead approach to multiple features and one target. To test my understanding, I create some series that follow either a sin wave or a cos wave with some random frequency.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
start = -10
stop = 10
n_steps = 2*(stop-start) #40
forecast_horizon = 10
num_features = 4
x_axis = np.linspace(start, stop, n_steps+forecast_horizon)
series_sin = np.stack([np.sin(np.random.rand() * x_axis) for i in range(10000)])
series_cos = np.stack([np.cos(np.random.rand() * x_axis) for i in range(10000)])
rand = np.random.rand(10000, n_steps + forecast_horizon).round()
target = np.where(rand, series_sin, series_cos)
series = np.stack([target, rand, series_sin, series_cos], axis = 2)
X_train = series[:7000,:n_steps]
X_valid = series[7000:9000,:n_steps]
X_test = series[9000:,:n_steps]
Y = np.empty([10000, n_steps, forecast_horizon])
for step_ahead in range(1, n_steps + 1):
    Y[:, step_ahead - 1, :] = \
        target[:, step_ahead:step_ahead + forecast_horizon].reshape(10000, forecast_horizon)
y_train = Y[:7000]
y_valid = Y[7000:9000]
y_test = Y[9000:]
When plotting the target, sin and cos waves for some sample, one sees that, according to the rand array, the target is either the sin wave or the cos wave.
time series plot
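For reference, a minimal plotting sketch that could produce this kind of figure, using the arrays defined above (sample index 0 is arbitrary):
i = 0
plt.plot(x_axis, target[i], label="target")
plt.plot(x_axis, series_sin[i], "--", label="sin")
plt.plot(x_axis, series_cos[i], "--", label="cos")
plt.legend()
plt.show()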
Now I want to train a neural network forecasting the time series.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation="linear"),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            128,
            input_shape=(n_steps, num_features),
            return_sequences=True,
            activation="sigmoid")),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            64,
            input_shape=(n_steps, num_features),
            return_sequences=True,
            activation="sigmoid")),
    tf.keras.layers.Dense(256, activation="linear"),
    tf.keras.layers.Dense(
        forecast_horizon,
        activation="linear")
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss=tf.keras.losses.MeanSquaredError()
)
model.fit(X_train, y_train,
          validation_data=(X_valid, y_valid),
          epochs=2)
My assumption would be that it is very easy for the model to learn the task, since features 2, 3 and 4 are perfect predictors of the target. But as one can see in the plot below, the model does not learn any cross-feature dependencies.
time series plot with forecast
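A hedged sketch of how the forecast could be plotted against the ground truth, using the model and arrays above (last time step's 10-step-ahead prediction, test sample index 0 is arbitrary):
y_pred = model.predict(X_test)
i = 0
plt.plot(range(forecast_horizon), y_test[i, -1], label="ground truth")
plt.plot(range(forecast_horizon), y_pred[i, -1], "--", label="forecast")
plt.legend()
plt.show()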
Any ideas?
cheers,
Felix

Visualizing self attention weights for sequence addition problem with LSTM?

I am using the Self Attention layer from here for a simple problem of adding all the numbers in a sequence that come before a delimiter. With training, I expect the neural network to learn which numbers to add, and using the Self Attention layer, I expect to visualize where the model is focusing. The code to reproduce the results is the following:
import os
import sys
import matplotlib.pyplot as plt
import numpy
import numpy as np
from keract import get_activations
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import Dense, Dropout, LSTM
from attention import Attention # https://github.com/philipperemy/keras-attention-mechanism
def add_numbers_before_delimiter(n: int, seq_length: int, delimiter: float = 0.0,
                                 index_1: int = None) -> (np.array, np.array):
    """
    Task: Add all the numbers that come before the delimiter.
    x = [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]. Result is y = 6.
    :param n: number of samples in (x, y).
    :param seq_length: length of the sequence of x.
    :param delimiter: value of the delimiter. Default is 0.0
    :param index_1: index of the number that comes after the first 0.
    :return: returns two numpy.array x and y of shape (n, seq_length, 1) and (n, 1).
    """
    x = np.random.uniform(0, 1, (n, seq_length))
    y = np.zeros(shape=(n, 1))
    for i in range(len(x)):
        if index_1 is None:
            a = np.random.choice(range(1, len(x[i])), size=1, replace=False)
        else:
            a = index_1
        y[i] = np.sum(x[i, 0:a])
        x[i, a] = delimiter
    x = np.expand_dims(x, axis=-1)
    return x, y
def main():
    numpy.random.seed(7)
    # data. definition of the problem.
    seq_length = 20
    x_train, y_train = add_numbers_before_delimiter(20_000, seq_length)
    x_val, y_val = add_numbers_before_delimiter(4_000, seq_length)
    # just arbitrary values. it's for visual purposes. easier to see than random values.
    test_index_1 = 4
    x_test, _ = add_numbers_before_delimiter(10, seq_length, 0, test_index_1)
    # x_test_mask is just a mask that, if applied to x_test, would still contain the information to solve the problem.
    # we expect the attention map to look like this mask.
    x_test_mask = np.zeros_like(x_test[..., 0])
    x_test_mask[:, test_index_1:test_index_1 + 1] = 1
    model = Sequential([
        LSTM(100, input_shape=(seq_length, 1), return_sequences=True),
        Attention(name='attention_weight'),
        Dropout(0.2),
        Dense(1, activation='linear')
    ])
    model.compile(loss='mse', optimizer='adam')
    print(model.summary())
    output_dir = 'task_add_two_numbers'
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    max_epoch = int(sys.argv[1]) if len(sys.argv) > 1 else 200
    class VisualiseAttentionMap(Callback):
        def on_epoch_end(self, epoch, logs=None):
            attention_map = get_activations(model, x_test, layer_names='attention_weight')['attention_weight']
            # top is attention map.
            # bottom is ground truth.
            plt.imshow(np.concatenate([attention_map, x_test_mask]), cmap='hot')
            iteration_no = str(epoch).zfill(3)
            plt.axis('off')
            plt.title(f'Iteration {iteration_no} / {max_epoch}')
            plt.savefig(f'{output_dir}/epoch_{iteration_no}.png')
            plt.close()
            plt.clf()
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=max_epoch,
              batch_size=64, callbacks=[VisualiseAttentionMap()])
if __name__ == '__main__':
    main()
However, I get the following results (attention weights):
Please click the link to view the weights during training.
I expect the attention to focus on all the values before the delimiter. In the linked image, the lower (white) part represents the ground truth, while the upper half represents the attention weights for the 10 test samples.
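For comparison, a hypothetical sketch (not part of the original script, assuming x_test and test_index_1 from main() above) of the mask I would expect the attention map to resemble, i.e. 1 at every position before the delimiter:
# hypothetical expected attention: focus on every position before the delimiter
expected_attention = np.zeros_like(x_test[..., 0])
expected_attention[:, :test_index_1] = 1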

keras - `sample_weight` results in NaN when zero passed - also not efficient for unbalanced data

I am designing a model with two outputs, y and dy, where I have much more training data for y than for dy, while the locations (x) of those data points are the same (please check the image below).
I am handling this issue with sample_weight in keras.model.fit. There are two concerns:
If I pass zero for a sample weight, it results in NaN after the first training step. I instead have to pass a very small number, and I am not sure how that affects the training.
This is inefficient if I have multiple outputs, many of which have training data available at only a few locations, because all the training data will be included in the updates. Is there any other way to handle this case?
Note that Keras trains the model fine; however, I am looking for a more efficient way that also allows passing zero for unwanted weights.
Please check the code below:
import numpy as np
import keras as k
import tensorflow as tf
from matplotlib.pyplot import plot, show, legend
# Note this is needed to handle lambda layers as Keras' gradient does not work in this setup.
def custom_grad(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero', colocate_gradients_with_ops=True)
# Setting up keras model.
x = k.Input((1,), name='x', dtype='float32')
lay = k.layers.Dense(10, activation='tanh')(x)
lay = k.layers.Dense(10, activation='tanh')(lay)
y = k.layers.Dense(1, name='y')(lay)
dy = k.layers.Lambda(lambda f: custom_grad(f, x), name='dy')(y)
model = k.Model(x, [y, dy])
# Preparing training data.
num_samples = 10000
x_true = np.linspace(0.0, np.pi, num_samples)
y_true = np.sin(x_true)
dy_true = np.zeros_like(y_true)
# for dy, we only have values at certain points -
# say 10% of what is available for y, at the beginning and the end.
percentage = 0.1
dy_ids = np.concatenate((np.arange(0, num_samples*percentage, dtype=int),
                         np.arange(num_samples*(1-percentage), 10000, dtype=int)))
dy_true[dy_ids] = np.cos(x_true[dy_ids])
# I use sample weight to circumvent unbalanced available data.
y_sample_weight = np.ones_like(y_true)
dy_sample_weight = np.zeros_like(y_true) + 1.0e-8
dy_sample_weight[dy_ids] = num_samples/dy_ids.size
assert abs(dy_sample_weight.sum() - num_samples) <= 1.0e-3
# training the model.
model.compile("adam", loss="mse")
model.fit(x_true, [y_true, dy_true],
          sample_weight=[y_sample_weight, dy_sample_weight],
          epochs=50, shuffle=True)
[y_pred, dy_pred] = model.predict(x_true)
# expected outputs.
plot(x_true, y_true, '.k', label='y true')
plot(x_true[dy_ids], dy_true[dy_ids], '.r', label='dy true')
plot(x_true, y_pred, '--b', label='y pred')
plot(x_true, dy_pred, '--r', label='dy pred')
legend()
show()

TensorFlow code not giving intended results

The following code has the irritating trait of making every row of "out" the same. I am trying to classify k time series in Xtrain as [1,0,0,0], [0,1,0,0], [0,0,1,0], or [0,0,0,1], according to the way they were generated (by one of four random algorithms). Anyone know why? Thanks!
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import copy
n = 100
m = 10
k = 1000
hidden_layers = 50
learning_rate = .01
training_epochs = 10000
Xtrain = []
Xtest = []
Ytrain = []
Ytest = []
# ... fill variables with data ..
x = tf.placeholder(tf.float64,shape = (k,1,n,1))
y = tf.placeholder(tf.float64,shape = (k,1,4))
conv1_weights = 0.1*tf.Variable(tf.truncated_normal([1,m,1,hidden_layers],dtype = tf.float64))
conv1_biases = tf.Variable(tf.zeros([hidden_layers],tf.float64))
conv = tf.nn.conv2d(x,conv1_weights,strides = [1,1,1,1],padding = 'VALID')
sigmoid1 = tf.nn.sigmoid(conv + conv1_biases)
s = sigmoid1.get_shape()
sigmoid1_reshape = tf.reshape(sigmoid1,(s[0],s[1]*s[2]*s[3]))
sigmoid2 = tf.nn.sigmoid(tf.layers.dense(sigmoid1_reshape,hidden_layers))
sigmoid3 = tf.nn.sigmoid(tf.layers.dense(sigmoid2,4))
penalty = tf.reduce_sum((sigmoid3 - y)**2)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    for i in range(0, training_epochs):
        sess.run(train_op, {x: Xtrain, y: Ytrain})
    out = sigmoid3.eval(feed_dict={x: Xtest})
Likely because your loss function is mean squared error. If you're doing classification, you should be using a cross-entropy loss.
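As a hedged sketch (not the original code), the swap to cross-entropy with the same TF1-style API could look roughly like this, using fresh pre-activation logits and y reshaped to match:
# sketch only: logits + softmax cross-entropy instead of sigmoid + squared error
logits = tf.layers.dense(sigmoid2, 4)
labels = tf.reshape(y, (k, 4))
penalty = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)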
Your loss is penalty = tf.reduce_sum((sigmoid3 - y)**2), which is the element-wise squared difference between a batch of predictions and a batch of labels.
Your network output (sigmoid3) is a tensor with shape [?, 4] and y (I guess) is a tensor with shape [?, 4] too.
The squared difference thus has shape [?, 4].
This means that tf.reduce_sum is computing, in order:
The sum over the second dimension of the squared difference, producing a tensor with shape [?]
The sum over the first dimension (the batch size, here indicated with ?) producing a scalar value (shape ()) that's your loss value.
Probably you don't want this behavior (the sum over the batch dimension) and you're looking for the mean squared error over the batch:
penalty = tf.reduce_mean(tf.squared_difference(sigmoid3, y))
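A quick NumPy illustration of the difference between the two reductions (hypothetical all-ones values):
import numpy as np
diff2 = np.ones((8, 4))     # element-wise squared difference for a batch of 8
print(diff2.sum())          # reduce_sum  -> 32.0, grows with the batch size
print(diff2.mean())         # reduce_mean -> 1.0, independent of the batch size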

When is a random number generated in a Keras Lambda layer?

I would like to apply simple data augmentation (multiplication of the input vector by a random scalar) to a fully connected neural network implemented in Keras. Keras has nice functionality for image augmentation, but trying to use this seemed awkward and slow for my input (1-tensors), whose training data set fits in my computer's memory.
Instead, I imagined that I could achieve this using a Lambda layer, e.g. something like this:
import random
from keras.layers import Input, Dense, Lambda
from keras.models import Model
x = Input(shape=(10,))
y = x
y = Lambda(lambda z: random.uniform(0.5, 1.0)*z)(y)
y = Dense(units=5, activation='relu')(y)
y = Dense(units=1, activation='sigmoid')(y)
model = Model(x, y)
My question concerns when this random number will be generated. Will this fix a single random number for:
the entire training process?
each batch?
each training data point?
Using this will create a constant that will not change at all, because random.uniform is not a Keras function. You defined this operation in the graph as constant * tensor, so the factor will be constant.
You need random functions "from keras" or "from tensorflow". For instance, you can use K.random_uniform((1,), 0.5, 1.).
This will change per batch. You can test it by training this code for a lot of epochs and watching the loss change.
from keras.layers import *
from keras.models import Model
from keras import backend as K
import numpy as np
ins = Input((1,))
outs = Lambda(lambda x: K.random_uniform((1,))*x)(ins)
model = Model(ins,outs)
print(model.predict(np.ones((1,1))))
print(model.predict(np.ones((1,1))))
print(model.predict(np.ones((1,1))))
model.compile('adam','mae')
model.fit(np.ones((100000,1)), np.ones((100000,1)))
If you want it to change for each training sample, then get a fixed batch size and generate a tensor with random numbers for each sample: K.random_uniform((batch_size,), .5, 1.).
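A hedged sketch of that per-sample variant (names and batch_size are illustrative, not from the original answer):
from keras import backend as K
from keras.layers import Input, Lambda
from keras.models import Model
batch_size = 32
ins = Input(batch_shape=(batch_size, 10))
# one random factor per sample, broadcast across the feature dimension
outs = Lambda(lambda x: K.random_uniform((batch_size, 1), 0.5, 1.) * x)(ins)
per_sample_model = Model(ins, outs)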
You should probably get better performance if you do it in your own generator and model.fit_generator(), though:
import random
import numpy as np
import keras
class MyGenerator(keras.utils.Sequence):
    def __init__(self, inputs, outputs, batchSize, minRand, maxRand):
        self.inputs = inputs
        self.outputs = outputs
        self.batchSize = batchSize
        self.minRand = minRand
        self.maxRand = maxRand
    # if you want shuffling
    def on_epoch_end(self):
        indices = np.array(range(len(self.inputs)))
        np.random.shuffle(indices)
        self.inputs = self.inputs[indices]
        self.outputs = self.outputs[indices]
    def __len__(self):
        leng, rem = divmod(len(self.inputs), self.batchSize)
        return leng + (1 if rem > 0 else 0)
    def __getitem__(self, i):
        start = i * self.batchSize
        end = start + self.batchSize
        x = self.inputs[start:end] * random.uniform(self.minRand, self.maxRand)
        y = self.outputs[start:end]
        return x, y