Segmentation of German Asphalt Pavement Distress Dataset (GAPs) using U-Net - tensorflow

I'm trying to train a U-Net like model to segment the German Asphalt Pavement Distress Dataset.
Mask images are stored as grey value images.
Coding of the grey values:
0 = VOID, 1 = intact road, 2 = applied patch, 3 = pothole, 4 = inlaid patch, 5 = open joint, 6 = crack, 7 = street inventory
I found the following Colab notebook, which implements U-Net segmentation on the Oxford Pets dataset:
https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/oxford_pets_image_segmentation.ipynb
I modified the notebook to fit my problem of GAPs segmentation, and this is a link to my modified notebook:
https://colab.research.google.com/drive/1YfM4lC78QNdfbkgz-1LGSKaBG4-65dkC?usp=sharing
The training runs, but while the loss decreases, the accuracy never rises above 0.05. I have been stuck on this issue for days and need help getting the model to train properly.
The following is a link to the dataset images and masks:
https://drive.google.com/drive/folders/1-JvLSa9b1falqEake2KVaYYtyVh-dgKY?usp=sharing

In your Sequence class you never shuffle the content of the batches; only the batch order is shuffled by the fit method. You have to shuffle the order of all the data at each epoch.
Here is one way to do it in a Sequence subclass:
import random

import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img


class OxfordPets(keras.utils.Sequence):
    """Helper to iterate over the data (as Numpy arrays)."""

    def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.input_img_paths = input_img_paths
        self.target_img_paths = target_img_paths
        self.set_len = len(self.target_img_paths) // self.batch_size
        # Random permutation of every sample index that fits into a full batch,
        # rebuilt at the end of each epoch (see on_epoch_end).
        self.indices = random.sample(range(self.set_len * self.batch_size),
                                     k=self.set_len * self.batch_size)

    def __len__(self):
        return self.set_len

    def __getitem__(self, idx):
        """Returns tuple (input, target) corresponding to batch #idx."""
        i = idx * self.batch_size
        indices = self.indices[i : i + self.batch_size]
        batch_input_img_paths = [self.input_img_paths[k] for k in indices]
        batch_target_img_paths = [self.target_img_paths[k] for k in indices]
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
        for j, path in enumerate(batch_input_img_paths):
            img = load_img(path, target_size=self.img_size)
            x[j] = img
        y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
        for j, path in enumerate(batch_target_img_paths):
            img = load_img(path, target_size=self.img_size, color_mode="grayscale")
            y[j] = np.expand_dims(img, 2)
            # Ground truth labels of the Oxford pets example are 1, 2, 3 and are shifted to 0, 1, 2.
            # y[j] -= 1  # commented out: the GAPs masks are already coded 0, 1, 2, 3, 4, 5, 6, 7
        return x, y

    def on_epoch_end(self):
        self.indices = random.sample(range(self.set_len * self.batch_size),
                                     k=self.set_len * self.batch_size)
self.indices is a random permutation of all the sample indices; it is built in the constructor and rebuilt at the end of each epoch, so the order of all the data is shuffled, not just the order of the batches.
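For reference, a minimal usage sketch (not from the original answer; the directory names, image size and batch size are placeholders for whatever the notebook actually uses):

import os

# Hypothetical locations of the GAPs images and grey-value masks.
input_img_paths = sorted(os.path.join("images", f) for f in os.listdir("images"))
target_img_paths = sorted(os.path.join("masks", f) for f in os.listdir("masks"))

train_gen = OxfordPets(batch_size=32, img_size=(160, 160),
                       input_img_paths=input_img_paths,
                       target_img_paths=target_img_paths)

# The Sequence can be passed straight to fit(); the shuffling happens inside
# the Sequence itself, which is what fixes the stalled accuracy.
# model.fit(train_gen, epochs=15, validation_data=val_gen)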
With the rmsprop optimizer, it then trains:
Epoch 1/15
88/88 [==============================] - 96s 1s/step - loss: 1.9617 - categorical_accuracy: 0.9156 - val_loss: 5.8705 - val_categorical_accuracy: 0.9375
Epoch 2/15
88/88 [==============================] - 93s 1s/step - loss: 0.4754 - categorical_accuracy: 0.9369 - val_loss: 1.9207 - val_categorical_accuracy: 0.9375
Epoch 3/15
88/88 [==============================] - 94s 1s/step - loss: 0.4497 - categorical_accuracy: 0.9447 - val_loss: 9.3833 - val_categorical_accuracy: 0.9375
Epoch 4/15
88/88 [==============================] - 94s 1s/step - loss: 0.3173 - categorical_accuracy: 0.9423 - val_loss: 14.2518 - val_categorical_accuracy: 0.9369
Epoch 5/15
88/88 [==============================] - 94s 1s/step - loss: 0.0645 - categorical_accuracy: 0.9400 - val_loss: 110.9821 - val_categorical_accuracy: 0.8963
Note that there is very quickly some overfitting.
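Since the overfitting shows up so quickly, a common remedy (not part of the answer above, just a standard Keras pattern) is to stop on the validation loss and keep the best weights:

from tensorflow import keras

callbacks = [
    # Stop once val_loss stops improving and roll back to the best epoch.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                  restore_best_weights=True),
    # Also keep the best weights on disk (the file name is just an example).
    keras.callbacks.ModelCheckpoint("gaps_unet_best.h5", monitor="val_loss",
                                    save_best_only=True),
]
# model.fit(train_gen, epochs=15, validation_data=val_gen, callbacks=callbacks)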

Related

Is there anyway to show the training progress from tf.estimator.LinearClassifier().train()

Is there any way to show the training progress from the TensorFlow linear estimator tf.estimator.LinearClassifier().train(), similar to the per-epoch progress output you get with model.fit()?
tensorflow==2.9.2
Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.4964 - accuracy: 0.8270
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3751 - accuracy: 0.8652
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3382 - accuracy: 0.8762
Here is a sample of my code:
# input function
def make_input_fn(data_df, label_df, num_epochs=1000, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of 32 and repeat process for number of epochs
        return ds  # return a batch of the dataset
    return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
pre_input_fn = make_input_fn(dfpre, y_pre, num_epochs=1, shuffle=False)

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)
What I've been doing (and I'm sure this isn't recommended, but I've not seen another method) is to run linear_est.train multiple times and read the return value of linear_est.evaluate(), like so:
loss_by_epoch = list()
current_loss = 1.0
acceptable_loss = 0.4

while current_loss > acceptable_loss:
    linear_est.train(train_input_fn)
    result = linear_est.evaluate(eval_input_fn)
    current_loss = result['loss']
    loss_by_epoch.append(current_loss)

print(loss_by_epoch)
P.S. If anyone else wants to answer this question, feel free; this answer seems like the only way, and I hope it isn't.
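Another partial option, hedged because it only surfaces step-level loss logs rather than a per-epoch bar like model.fit(): Estimators write their loss and global_step/sec to the TensorFlow logger at INFO level, so raising the verbosity (and optionally lowering log_step_count_steps) makes progress visible while train() runs.

import tensorflow as tf

# Estimator training logs (loss, global_step/sec) are emitted at INFO level,
# which is silenced by default.
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)

# Optionally log the step counter more often than the default 100 steps.
run_config = tf.estimator.RunConfig(log_step_count_steps=50)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns,
                                           config=run_config)
linear_est.train(train_input_fn)  # now prints loss/step progress while training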

Different Results between Sequential and Model in Tensorflow despite same Build up

I am trying to move a sequential neural network from the time series tutorial on the Tensorflow website to a functional API one (https://www.tensorflow.org/tutorials/structured_data/time_series#single-shot_models).
The tutorial code is as follows:
multi_dense_model = tf.keras.Sequential()
multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
Where I get the following result:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2391 - mean_absolute_error: 0.3012 - val_loss: 0.2272 - val_mean_absolute_error: 0.2895
Epoch 2/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2226 - mean_absolute_error: 0.2850 - val_loss: 0.2283 - val_mean_absolute_error: 0.2908
Epoch 3/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2192 - mean_absolute_error: 0.2820 - val_loss: 0.2230 - val_mean_absolute_error: 0.2847
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2166 - mean_absolute_error: 0.2798 - val_loss: 0.2212 - val_mean_absolute_error: 0.2836
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2144 - mean_absolute_error: 0.2780 - val_loss: 0.2189 - val_mean_absolute_error: 0.2809
Epoch 6/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2131 - mean_absolute_error: 0.2768 - val_loss: 0.2196 - val_mean_absolute_error: 0.2812
Epoch 7/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2118 - mean_absolute_error: 0.2759 - val_loss: 0.2193 - val_mean_absolute_error: 0.2827
437/437 [==============================] - 2s 4ms/step - loss: 0.2193 - mean_absolute_error: 0.2827
Now I changed the code to the functional API:
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
And get this:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 2/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 3/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 6/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 7/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Any idea why this might be? I tried a lot of things but can't get the two to match. Also, model.summary() prints pretty much the same thing for both (Sequential always omits the Input layer, but I don't think that makes a difference, since you have to specify the input for Model anyway).
This is the complete code I am using, in case you want to copy and paste it:
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0
max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0
# The above inplace edits are reflected in the DataFrame.
df['wv (m/s)'].min()
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180
# Calculate the wind x and y components.
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)
# Calculate the max wind x and y components.
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)
timestamp_s = date_time.map(pd.Timestamp.timestamp)
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
num_features = df.shape[1]
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data.
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # Work out the label column indices.
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift
        self.total_window_size = input_width + shift
        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]
        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])
def make_dataset(self, data):
    data = np.array(data, dtype=np.float32)
    ds = tf.keras.utils.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,)
    ds = ds.map(self.split_window)
    return ds

WindowGenerator.make_dataset = make_dataset
@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting."""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset
        result = next(iter(self.train))
        # And cache it for next time
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
def split_window(self, features):
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]] for name in self.label_columns],
            axis=-1)

    # Slicing doesn't preserve static shape information, so set the shapes
    # manually. This way the `tf.data.Datasets` are easier to inspect.
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

WindowGenerator.split_window = split_window
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)
MAX_EPOCHS = 20
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# multi_dense_model = tf.keras.Sequential()
# multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
# multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
# multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
# multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
# multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()], run_eagerly=True)
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
Most likely because you are applying two non-linearities:
#Sequential
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
#Functional
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
# scroll right -------------------------------> ^^^^^^^^^^^^^^^^^
And by definition, a Dense layer with no activation is a linear layer, so the two models are not equivalent.
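Concretely, a sketch of the functional definition with the extra relu removed so it mirrors the Sequential version above:

input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
# No activation here, exactly as in the Sequential model:
dense2 = tf.keras.layers.Dense(OUT_STEPS * num_features,
                               kernel_initializer=tf.initializers.zeros())(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)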

How to reduce false positives and false negatives on train set in deep learning

I am training a deep neural network for a classification task on my machine learning dataset.
On the train as well as the test set, these are the observations:
For every true positive there are approximately 3 false positives.
For approximately 4 true negatives there is 1 false negative.
The data is scaled with StandardScaler and then clipped between -5 and 5.
Below are the observations while training:
382/382 [==============================] - 3s 9ms/step - loss: 0.6897 - tp: 84096.0000 - fp: 244779.0000 - tn: 355888.0000 - fn: 97448.0000 - accuracy: 0.5625 - precision: 0.2557 - recall: 0.4632 - auc: 0.5407 - prc: 0.2722
val_loss: 0.6838 - val_tp: 19065.0000 - val_fp: 56533.0000 - val_tn: 91902.0000 - val_fn: 23829.0000 - val_accuracy: 0.5800 - val_precision: 0.2522 - val_recall: 0.4445 - val_auc: 0.5468 - val_prc: 0.2722
Can someone please help me understand what I can do to minimise the misclassifications on the train as well as the test set?
I am using an imbalanced dataset with class_weight, as shown in the code below:
METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
    keras.metrics.AUC(name='prc', curve='PR'),  # precision-recall curve
]

pos = sum(y_train)
neg = y_train.shape[0] - pos
total = y_train.shape[0]
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)
class_weight = {0: weight_for_0, 1: weight_for_1}

def make_model(size, layers, metrics=METRICS, output_bias=None):
    if output_bias is not None:
        output_bias = tf.keras.initializers.Constant(output_bias)
    model = keras.Sequential()
    model.add(keras.layers.Dense(size, input_shape=(window_length*indicators,)))
    model.add(keras.layers.Dropout(0.5))
    for i in range(layers-1):
        model.add(keras.layers.Dense(size))
        model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(1, activation="sigmoid", bias_initializer=output_bias))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=metrics)
    return model

EPOCHS = 100
BATCH_SIZE = 2048

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_prc',
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)

model = make_model(size=size, layers=layers, output_bias=np.log(pos/neg))

history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
                    callbacks=[early_stopping],
                    validation_data=(X_test, y_test_pos),
                    class_weight=class_weight)
Can somebody please help?
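Not part of the original post, but since the model ends in a sigmoid, one common lever for trading false positives against false negatives is to tune the decision threshold on held-out data instead of using the default 0.5; a rough sketch:

import numpy as np
from sklearn.metrics import precision_recall_curve

# Scores from the already trained model (here on the training set; a separate
# validation split would be the safer choice).
y_scores = model.predict(X_train).ravel()
precision, recall, thresholds = precision_recall_curve(y_train, y_scores)

# Pick the threshold that maximises F1 (any other criterion could be used).
f1 = 2 * precision * recall / (precision + recall + 1e-9)
best_threshold = thresholds[np.argmax(f1[:-1])]

# Classify with the tuned threshold instead of 0.5.
y_pred = (y_scores >= best_threshold).astype(int)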

Tensorflow LSTM, Predict 5 values each time single value comes

I have a parser for some web page; every 20-60 seconds I get an array Xt=[x0, x1, x2, x3, x4]. I am only interested in predicting the next five values of x0 from each array, but I want to use x1,...,x4 as well. So, for Xt, predict [Xt+1[0], Xt+2[0],..., Xt+5[0]]. I am not sure how to approach this problem at all.
First I tried to break all the data into sequences with time_step=24: X=[[X0,...,X23], [X1,...,X24], [X2,...,X25], ...]; Y = [[X24[0],...,X28[0]], [X25[0],...,X29[0]],...]. I used a bunch of keras.LSTM layers, but the model returned the same values in each column.
Now I did this instead (time_step=1; for Xt predict [Xt+1[0],..., Xt+5[0]]):
import pandas as pd
import numpy as np
df1 = pd.read_csv('data1.csv', index_col=0)
data1 = df1.to_numpy()
trX = []
trY = []
for i in range(0, len(data1)-6):
    trX.append(data1[i])
    trY.append(data1[i+1:i+6][:, 0])
trX = np.array(trX, dtype=np.float32)
trY = np.array(trY, dtype=np.float32)
trX = trX.reshape(-1, 1, 5)
BS = 600
trX = trX[:trX.shape[0] - trX.shape[0] % BS]
trY = trY[:trY.shape[0] - trY.shape[0] % BS]
valX = trX[-BS:]
trX = trX[:-BS]
valY = trY[-BS:]
trY = trY[:-BS]
import tensorflow as tf
import numpy as np
model1 = tf.keras.Sequential()
model1.add(tf.keras.layers.LSTM(256, return_sequences=True, batch_input_shape=(600, 1, 5), stateful=True))
model1.add(tf.keras.layers.LSTM(128, return_sequences=True, stateful=True))
model1.add(tf.keras.layers.LSTM(64, return_sequences=True, stateful=True))
model1.add(tf.keras.layers.Flatten())
model1.add(tf.keras.layers.Dense(64, activation=tf.keras.activations.elu))
model1.add(tf.keras.layers.Dense(32, activation=tf.keras.activations.elu))
model1.add(tf.keras.layers.Dense(5, activation=tf.keras.activations.elu))
model1.compile(optimizer='adam',
               loss='mse',
               metrics=['acc'])
model1.fit(trX, trY, 600, 5, 1, validation_data=(valX, valY), shuffle=False)
res1 = model1(valX)
res1
Epoch 1/5
21/21 [==============================] - 6s 66ms/step - loss: 5.6772e-04 - acc: 0.2037 - val_loss: 3.7310e-04 - val_acc: 0.1917
Epoch 2/5
21/21 [==============================] - 0s 11ms/step - loss: 8.0731e-04 - acc: 0.1967 - val_loss: 3.2517e-04 - val_acc: 0.2083
Epoch 3/5
21/21 [==============================] - 0s 12ms/step - loss: 6.7266e-04 - acc: 0.2015 - val_loss: 4.2750e-04 - val_acc: 0.2083
Epoch 4/5
21/21 [==============================] - 0s 12ms/step - loss: 8.3055e-04 - acc: 0.2023 - val_loss: 7.4263e-05 - val_acc: 0.1917
Epoch 5/5
21/21 [==============================] - 0s 11ms/step - loss: 6.4451e-04 - acc: 0.1983 - val_loss: 2.0734e-04 - val_acc: 0.1917
<tf.Tensor: shape=(600, 5), dtype=float32, numpy=
array([[ 0.01462946, -0.0035404 , -0.01471442, 0.01326532, -0.0222075 ],
[ 0.01454796, -0.00362718, -0.01483804, 0.01332456, -0.02220327],
[ 0.01449167, -0.0035699 , -0.01502049, 0.01351681, -0.02212006],
...,
[ 0.01451699, -0.00386065, -0.01463401, 0.01302508, -0.02228123],
[ 0.01449066, -0.00371438, -0.0148297 , 0.01326665, -0.02216893],
[ 0.01450208, -0.0035758 , -0.01488554, 0.01341164, -0.02206981]],
dtype=float32)>
data1.csv
What approach should I use?
This answer is based on my understanding of your problem: you want to take all 5 attributes for x timesteps and predict only one attribute for the next 5 timesteps. Suppose that for x=16 timesteps you want to predict the next 5.
Use timeseries_dataset_from_array from keras.preprocessing:
X = tf.keras.preprocessing.timeseries_dataset_from_array(
    data1, None, 16, sequence_stride=1, sampling_rate=1, batch_size=128)
Y = tf.keras.preprocessing.timeseries_dataset_from_array(
    data1.reshape((len(data1), 5, 1))[:, 0], targets=None, sequence_length=5,
    sequence_stride=1, sampling_rate=1, batch_size=128, start_index=16)
Here, we use data1.reshape((len(data1), 5, 1)) so that there is one feature per timestep (shape (5, 1)); otherwise it would be treated as 5 features in a single timestep (shape (1, 5)).
You can verify this by looking at the first example of one batch:
for y in Y.take(1):
    print(y[0])

tf.Tensor(
[[5.2513130e-05]
 [6.7516880e-05]
 [2.0505126e-04]
 [4.9012253e-04]
 [2.6181545e-03]], shape=(5, 1), dtype=float64)
for x in X.take(1):
    print(x[0])

tf.Tensor(
[[7.15178800e-04 4.77345650e-01 2.95000000e-01 6.57851550e-02 2.55877470e-02]
 [4.15103770e-04 4.77803350e-01 2.61000000e-01 4.81817540e-02 1.53348090e-02]
 [0.00000000e+00 4.77858450e-01 2.56750000e-01 5.49672660e-02 0.00000000e+00]
 [1.17529380e-04 4.78104230e-01 2.30000000e-01 4.48042680e-02 1.49495100e-03]
 [7.50187540e-05 4.78617040e-01 2.83500000e-01 5.70335300e-02 1.69152800e-03]
 [9.75243800e-05 4.78693340e-01 2.48750000e-01 5.27507600e-02 2.45307600e-03]
 [1.55038750e-04 4.78943380e-01 3.12500000e-01 7.79491600e-02 7.45544300e-03]
 [6.50162500e-04 4.79070500e-01 3.22500000e-01 8.41833000e-02 3.43781560e-02]
 [7.72693200e-04 4.79375660e-01 3.42250000e-01 8.51799300e-02 3.39922500e-02]
 [2.25056260e-05 4.79435000e-01 3.20500000e-01 6.69510960e-02 1.01650000e-05]
 [1.19584896e-01 4.79981700e-01 2.73750000e-01 5.85157000e-02 1.56834650e-01]
 [3.91847970e-03 4.80363100e-01 2.93000000e-01 6.89749400e-02 5.97691870e-02]
 [1.57539380e-04 4.80617400e-01 2.72000000e-01 5.25309100e-02 3.83557300e-03]
 [2.17554390e-04 4.80706400e-01 2.51500000e-01 5.18024450e-02 7.34595600e-03]
 [1.69292330e-03 4.81036960e-01 2.79000000e-01 5.94664920e-02 3.83583500e-02]
 [4.00100030e-05 4.81113260e-01 3.16500000e-01 6.70532600e-02 8.07160000e-04]], shape=(16, 5), dtype=float64)
You will then need to zip them together to pass them to the fit() method:
ds = tf.data.Dataset.zip((X, Y))
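For completeness, a minimal sketch of a model that could consume this zipped dataset (not from the original answer; the layer sizes are arbitrary):

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16, 5)),   # 16 timesteps, 5 features per step
    tf.keras.layers.LSTM(64),                # last hidden state only
    tf.keras.layers.Dense(5),                # next 5 values of x0
    tf.keras.layers.Reshape((5, 1)),         # match the (5, 1) target windows
])
model.compile(optimizer='adam', loss='mse')
model.fit(ds, epochs=5)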

Tensorflow DataSet Shuffle Impact the validation training accuracy and ambiguous behavior

I am struggling with training a neural network that uses tf.data.Dataset as input.
What I find is that if I call .shuffle() before splitting the entire dataset into train, val and test sets, the accuracy on val (during training) and test (during evaluate) is 91%, but when I run .evaluate() on the test set several times, the accuracy and loss metrics change every time. The same behaviour occurs with .predict() on the test set, with the predicted classes changing every time.
This is the output of the training, evaluate and predict process:
total_record: 93166 - trainin_size: 74534 - val_size: 9316 - test_size: 9316
Epoch 1/5
145/145 [==============================] - 42s 273ms/step - loss: 1.7143 - sparse_categorical_accuracy: 0.4051 - val_loss: 1.4997 - val_sparse_categorical_accuracy: 0.4885
Epoch 2/5
145/145 [==============================] - 40s 277ms/step - loss: 0.7571 - sparse_categorical_accuracy: 0.7505 - val_loss: 1.1634 - val_sparse_categorical_accuracy: 0.6050
Epoch 3/5
145/145 [==============================] - 41s 281ms/step - loss: 0.4894 - sparse_categorical_accuracy: 0.8223 - val_loss: 0.7628 - val_sparse_categorical_accuracy: 0.7444
Epoch 4/5
145/145 [==============================] - 38s 258ms/step - loss: 0.3417 - sparse_categorical_accuracy: 0.8656 - val_loss: 0.4236 - val_sparse_categorical_accuracy: 0.8579
Epoch 5/5
145/145 [==============================] - 40s 271ms/step - loss: 0.2660 - sparse_categorical_accuracy: 0.8926 - val_loss: 0.2807 - val_sparse_categorical_accuracy: 0.9105
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 39ms/step - loss: 0.2622 - sparse_categorical_accuracy: 0.9153
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2649 - sparse_categorical_accuracy: 0.9170
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2726 - sparse_categorical_accuracy: 0.9141
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2692 - sparse_categorical_accuracy: 0.9166
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[41]: array([0, 1, 5, ..., 2, 0, 1])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[42]: array([2, 3, 1, ..., 1, 2, 0])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[43]: array([1, 2, 4, ..., 1, 3, 0])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[44]: array([0, 3, 1, ..., 0, 5, 4])
So, I tried applying .shuffle() after the split, and only on the training and validation sets (commenting out the main .shuffle() and uncommenting the shuffle for train_set and val_set).
But in this case, I find that the network starts overfitting after just 5 epochs (with the previous training process the callbacks stopped training around the 30th epoch with 94% val accuracy), with an accuracy of 75% on the validation set from the 2nd epoch onward.
However, in this case, if I run .evaluate() and .predict() on the test set, to which .shuffle() has not been applied, the metrics and classes remain unchanged on each call.
Why this behaviour?
But above all, which is the correct approach, and what is the real accuracy of the model?
Thanks
This is the code of the process
""" ### Make tf.data.Dataset """
dataset = tf.data.Dataset.from_tensor_slices(({"features_emb_subj": features_emb_subj,
                                                "features_emb_snip": features_emb_snip,
                                                "features_emb_fromcat": features_emb_fromcat,
                                                "features_dense": features_dense,
                                                "features_emb_user": features_emb_user}, cat_labels))

dataset = dataset.shuffle(int(len(features_dense)), reshuffle_each_iteration=True)

""" ### Split in train, val, test """

train_size = int(0.8 * len(features_dense))
val_size = int(0.10 * len(features_dense))
test_size = int(0.10 * len(features_dense))

test_set = dataset.take(test_size)
validation_set = dataset.skip(test_size).take(val_size)
training_set = dataset.skip(test_size + val_size)

test_set = test_set.batch(BATCH_SIZE, drop_remainder=False)

#validation_set = validation_set.shuffle(val_size, reshuffle_each_iteration=True)
validation_set = validation_set.batch(BATCH_SIZE, drop_remainder=False)

#training_set = training_set.shuffle(train_size, reshuffle_each_iteration=True)
training_set = training_set.batch(BATCH_SIZE, drop_remainder=True)

"""### Train model """

callbacks = [EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001, restore_best_weights=True)]

history = model.fit(training_set,
                    epochs=5,
                    validation_data=validation_set,
                    callbacks=callbacks,
                    class_weight=setClassWeight(cat_labels),
                    verbose=1)
"""### Evaluate model """
accr = model.evaluate(test_set)
"""### Predict test_test """
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
In the comments on this question you can see that shuffle() is applied to the base dataset, and that shuffling propagates to the train, test and validation sets derived from it.
I would recommend creating 3 distinct datasets, using (e.g.) sklearn.model_selection.train_test_split on the original data before calling tf.data.Dataset.from_tensor_slices on the split tensor slices, so that you can apply the shuffle to the training dataset only.
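A rough sketch of that suggestion, assuming the feature arrays and cat_labels are NumPy arrays (names reused from the code above):

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Split indices once, so every feature array and the labels stay aligned.
idx = np.arange(len(cat_labels))
idx_train, idx_tmp = train_test_split(idx, test_size=0.2, random_state=42)
idx_val, idx_test = train_test_split(idx_tmp, test_size=0.5, random_state=42)

def make_ds(indices, shuffle=False):
    features = {"features_emb_subj": features_emb_subj[indices],
                "features_emb_snip": features_emb_snip[indices],
                "features_emb_fromcat": features_emb_fromcat[indices],
                "features_dense": features_dense[indices],
                "features_emb_user": features_emb_user[indices]}
    ds = tf.data.Dataset.from_tensor_slices((features, cat_labels[indices]))
    if shuffle:  # only the training set is reshuffled each epoch
        ds = ds.shuffle(len(indices), reshuffle_each_iteration=True)
    return ds.batch(BATCH_SIZE)

training_set = make_ds(idx_train, shuffle=True)
validation_set = make_ds(idx_val)
test_set = make_ds(idx_test)

With this split, repeated calls to model.evaluate(test_set) and model.predict(test_set) see the samples in a fixed order, so the metrics and predicted classes no longer change between calls.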