I want to train my time series prediction model (an LSTM model, using TensorFlow 2.2) with the timeseries generator in order to generate the data on the fly.
What I observed is that the training loss decreases nicely in the first steps of training, but then it increases over the last 12 steps. I then found out that the model.fit method runs the TimeseriesGenerator 11 times before training starts. That is why my model doesn't train on the first samples of data and instead starts in the middle of my dataset. When I reach the end of an epoch, there is no data left, so the generator returns an empty array, and I think this "destroys" the training progress, which is why the training loss starts to increase.
Here are the important parts of my code to understand my problem:
Model:
model = keras.Sequential()
model.add(keras.layers.Bidirectional(k.layers.LSTM(20, activation='relu'), input_shape=(seq_length, 1)))
model.add(keras.layers.Dense(1))
Training:
training_data = DataGenerator(data=train_data,
                              seq_length=1024,
                              batch_size=2048,
                              shuffle=False)

training_process = model.fit(x=training_data,
                             epochs=8,
                             verbose=True,
                             validation_data=validation_data)
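(For reference, and assuming TF 2.2's fit() signature: when x is a keras.utils.Sequence, batches are prefetched through a queue whose size is set by max_queue_size, 10 by default, so several __getitem__ calls can happen before the first gradient update. A minimal way to observe this is to shrink the queue:)

training_process = model.fit(x=training_data,
                             epochs=8,
                             verbose=True,
                             validation_data=validation_data,
                             max_queue_size=1)  # default is 10; controls how many batches are prefetched ahead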
The DataGenerator class contains the data generator, with the real values after the #:
class DataGenerator(k.utils.Sequence):
    def __init__(self, data, seq_length, batch_size, shuffle=True):
        self.data = data
        self.n_sequences = np.shape(self.data)[0]  # 2
        self.data_length = np.shape(self.data)[1]  # 60000
        self.seq_length = seq_length  # 1024
        self.batch_size = batch_size  # 2048
        self.shuffle = shuffle  # False
        self.seq_index = 0
        self.own_index = 0
        self.n_batches = int(np.floor((self.data_length-self.seq_length)/self.batch_size))  # 28
        self.on_epoch_end()

    def __len__(self):
        return int(np.floor((self.data_length-self.seq_length)/self.batch_size)*self.n_sequences)  # 56

    def __getitem__(self, index):
        print('own index=' + str(self.own_index))  # for testing generator
        index = int(self.own_index/self.n_batches)  # 0
        start = (self.own_index % self.n_batches) * self.batch_size  # 0
        end = start + self.batch_size  # 2048
        gen = TimeseriesGenerator(data=self.data[index],
                                  targets=self.data[index],
                                  length=self.seq_length,
                                  sampling_rate=1, stride=1, start_index=start, end_index=end,
                                  shuffle=False, reverse=False, batch_size=self.batch_size)
        self.own_index += 1
        if self.own_index > (self.n_sequences*self.n_batches-1):
            self.own_index = 0
        x, y = gen[0]
        x = np.expand_dims(x, axis=2)
        return x, y

    def on_epoch_end(self):
        self.own_index = 0
        if self.shuffle is True:
            perm = np.random.permutation(self.data.shape[0])
            self.data = self.data[perm]
When I run this code, I get the following output:
own index=0 own index=1 own index=2 own index=3 own index=4 own
index=5 own index=6 own index=7 own index=8 own index=9 own index=10
own index=11
1/56 [..............................] - ETA: 0s - loss: 0.3705own
index=12
So the training starts at index 12, which makes the start value 24576 instead of 0, and all the data before this start index is never used for training.
Can someone help me find out what causes the fit method to run the generator 11 times before training starts?
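For comparison, here is a minimal sketch (assuming the attributes set in __init__ above stay as they are, and keeping the slicing scheme from the question unchanged) of a __getitem__ that derives everything from the index argument Keras passes in, instead of from an internal counter, so the returned batch does not depend on how often or in which order the generator is called:

    def __getitem__(self, index):
        # derive the sequence and the batch position purely from the requested index
        seq = index // self.n_batches                        # which of the n_sequences rows to use
        start = (index % self.n_batches) * self.batch_size   # same start/end scheme as in the question
        end = start + self.batch_size
        gen = TimeseriesGenerator(data=self.data[seq],
                                  targets=self.data[seq],
                                  length=self.seq_length,
                                  sampling_rate=1, stride=1, start_index=start, end_index=end,
                                  shuffle=False, reverse=False, batch_size=self.batch_size)
        x, y = gen[0]
        return np.expand_dims(x, axis=2), y

With a stateless __getitem__ like this, the own_index bookkeeping and its reset in on_epoch_end are no longer needed.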
First, I'm sorry, but it's not possible to reproduce this problem in a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.repeat(nb_epochs).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    yy = iterator.get_next()
    return tf.cast(yy, tf.float32)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:

    y_pred = complex_model.autoencode(train)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)

    nb_epochs = 10
    batch_size = 64

    y_real = return_iterator(train, nb_epochs, batch_size)
    y_pred = return_iterator(y_pred, nb_epochs, batch_size)

    res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1, 2, 3])
    loss = 1 - tf.reduce_sum(res_equal, axis=0)

    opt = tf.train.AdamOptimizer().minimize(loss)

    tf.global_variables_initializer().run()

    for epoch in range(0, nb_epochs):
        _, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum, and these operations only accept Tensors as input.
My question is: with this code, will the complex_model autoencoder actually be trained during training? (Even though here it is only used to output the predictions that are used to compute the loss.)
Thank you
p.s: I am using TF1.15 (and I cannot use another version)
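One diagnostic that may help answer this (a hedged sketch for TF 1.15, using only the names that appear in the snippet above): tf.gradients returns None for variables that are not connected to the loss in the graph, so you can check, before running the optimizer, whether the autoencoder's variables would receive any gradient at all:

# hypothetical check, to be placed after `loss` is defined inside the session block
trainable = tf.trainable_variables()
grads = tf.gradients(loss, trainable)
unconnected = [v.name for v, g in zip(trainable, grads) if g is None]
print('variables that receive no gradient from the loss:', unconnected)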
I am trying to convert a TensorFlow object localization code into PyTorch. In the original code, the author uses model.compile / model.fit to train the model, so I don't understand how the losses for the classification of the MNIST digits and for the box regressions work together. Still, I'm trying to implement my own training loop in PyTorch.
The goal here is, after some preprocessing, to paste the MNIST digits randomly into a black square image and then classify and localize (bounding boxes) the digit.
I set two losses: nn.CrossEntropyLoss and nn.MSELoss, and I do (loss_1+loss_2).backward() to compute the gradients. I know it's the right way to compute gradients with two losses from here and here.
But still, my loss doesn't decrease, whereas it collapses almost immediately with the TensorFlow code. I checked the model with torchinfo.summary and it seems to behave the same as the TensorFlow implementation.
EDIT:
I looked at the predicted labels of my model and they don't seem to change at all.
This line of code, label_preds, bbox_coords_preds = model(digits), always returns the same values:
label_preds[0] = tensor([[0.0156, 0.0156, 0.0156, 0.0156, 0.0156, 0.0156, 0.0156, 0.0156, 0.0156, 0.0156]], device='cuda:0', grad_fn=<SliceBackward0>)
Here are my questions:
Is my custom network set up correctly?
Are my losses set up correctly?
Why don't my label predictions change?
Does my training loop work as well as the .compile and .fit TensorFlow methods?
Thanks a lot !
PYTORCH CODE
class ConvNetwork(nn.Module):
    def __init__(self):
        super(ConvNetwork, self).__init__()
        self.conv2d_1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.conv2d_2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.conv2d_3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.avgPooling2D = nn.AvgPool2d((2,2))
        self.dense_1 = nn.Linear(in_features=3136, out_features=128)
        self.dense_classifier = nn.Linear(in_features=128, out_features=10)
        self.softmax = nn.Softmax(dim=0)
        self.dense_regression = nn.Linear(in_features=128, out_features=4)

    def forward(self, input):
        x = self.avgPooling2D(F.relu(self.conv2d_1(input)))
        x = self.avgPooling2D(F.relu(self.conv2d_2(x)))
        x = self.avgPooling2D(F.relu(self.conv2d_3(x)))
        x = nn.Flatten()(x)
        x = F.relu(self.dense_1(x))
        output_classifier = self.softmax(self.dense_classifier(x))
        output_regression = self.dense_regression(x)
        return [output_classifier, output_regression]
######################################################
learning_rate = 0.1
EPOCHS = 1
BATCH_SIZE = 64
model = ConvNetwork()
model = model.to(device)
optimizer = torch.optim.Adam(params=model.parameters(), lr=learning_rate)
classification_loss = nn.CrossEntropyLoss()
regression_loss = nn.MSELoss()
######################################################
begin_time = time.time()

for epoch in range(EPOCHS):
    tot_loss = 0
    train_start = time.time()
    training_losses = []
    print("-"*20)
    print(" "*5 + f"EPOCH {epoch+1}/{EPOCHS}")
    print("-"*20)
    model.train()
    for batch, (digits, labels, bbox_coords) in enumerate(training_dataset):
        digits, labels, bbox_coords = digits.to(device), labels.to(device), bbox_coords.to(device)
        optimizer.zero_grad()

        [label_preds, bbox_coords_preds] = model(digits)

        class_loss = classification_loss(label_preds, labels)
        box_loss = regression_loss(bbox_coords_preds, bbox_coords)
        training_loss = class_loss + box_loss

        training_loss.backward()
        optimizer.step()

        ######### print part #######################
        training_losses.append(training_loss.item())
        if batch+1 <= len_training_ds//BATCH_SIZE:
            current_training_sample = (batch+1)*BATCH_SIZE
        else:
            current_training_sample = (batch)*BATCH_SIZE + len_training_ds%BATCH_SIZE
        if (batch+1) == 1 or (batch+1)%100 == 0 or (batch+1) == len_training_ds//BATCH_SIZE + 1:
            print(f"Elapsed time : {(time.time()-train_start)/60:.3f}",
                  f" --- Digit : {current_training_sample}/{len_training_ds}",
                  f" : loss = {training_loss:.5f}")
        if batch+1 == (len_training_ds//BATCH_SIZE)+1:
            print(f"Total elapsed time for training : {(time.time()-begin_time)/60:.3f}")
ORIGINAL TENSORFLOW CODE
def feature_extractor(inputs):
    x = tf.keras.layers.Conv2D(16, activation='relu', kernel_size=3, input_shape=(75, 75, 1))(inputs)
    x = tf.keras.layers.AveragePooling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu')(x)
    x = tf.keras.layers.AveragePooling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu')(x)
    x = tf.keras.layers.AveragePooling2D((2, 2))(x)
    return x

def dense_layers(inputs):
    x = tf.keras.layers.Flatten()(inputs)
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    return x

def classifier(inputs):
    classification_output = tf.keras.layers.Dense(10, activation='softmax', name='classification')(inputs)
    return classification_output

def bounding_box_regression(inputs):
    bounding_box_regression_output = tf.keras.layers.Dense(units='4', name='bounding_box')(inputs)
    return bounding_box_regression_output

def final_model(inputs):
    feature_cnn = feature_extractor(inputs)
    dense_output = dense_layers(feature_cnn)
    classification_output = classifier(dense_output)
    bounding_box_output = bounding_box_regression(dense_output)
    model = tf.keras.Model(inputs=inputs, outputs=[classification_output, bounding_box_output])
    return model

def define_and_compile_model(inputs):
    model = final_model(inputs)
    model.compile(optimizer='adam',
                  loss={'classification': 'categorical_crossentropy',
                        'bounding_box': 'mse'},
                  metrics={'classification': 'accuracy',
                           'bounding_box': 'mse'})
    return model

inputs = tf.keras.layers.Input(shape=(75, 75, 1,))
model = define_and_compile_model(inputs)

EPOCHS = 10  # 45
steps_per_epoch = 60000//BATCH_SIZE  # 60,000 items in this dataset
validation_steps = 1

history = model.fit(training_dataset,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=validation_dataset,
                    validation_steps=validation_steps, epochs=EPOCHS)

loss, classification_loss, bounding_box_loss, classification_accuracy, bounding_box_mse = model.evaluate(validation_dataset, steps=1)
print("Validation accuracy: ", classification_accuracy)
Answering my own question about this bug:
What I found:
I apply a Softmax layer in my model while also using nn.CrossEntropyLoss() as the loss.
Why this is a problem:
This loss already applies a softmax internally (doc).
Applying a softmax twice adds noise to the loss and prevents convergence.
What I did:
One option is to leave the classification output as a plain linear layer (no softmax) and keep nn.CrossEntropyLoss.
Another option is to use NLLLoss (doc) instead and keep the normalisation inside the model (strictly, NLLLoss expects log-probabilities, so nn.LogSoftmax); see the sketch below.
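A minimal sketch of both options (only dense_classifier is taken from the model above; everything else is standard PyTorch, and targets are assumed to be class indices):

# Option 1: raw logits + nn.CrossEntropyLoss (log-softmax is applied inside the loss)
output_classifier = self.dense_classifier(x)            # no softmax in forward()
classification_loss = nn.CrossEntropyLoss()

# Option 2: keep the normalisation in the model, but as log-probabilities + nn.NLLLoss
self.log_softmax = nn.LogSoftmax(dim=1)                 # dim=1: normalise over the 10 classes, not the batch
output_classifier = self.log_softmax(self.dense_classifier(x))
classification_loss = nn.NLLLoss()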
Also:
I don't fully understand how the .compile() and .fit() TensorFlow methods work, but I think they tune the training one way or another (probably the learning rate), since I had to decrease the learning rate to 0.001 in PyTorch to "unstick" the loss and make it decrease. (For what it's worth, Keras' optimizer='adam' defaults to a learning rate of 0.001, while the PyTorch loop above uses 0.1.)
I am trying to run the VoxelMorph network to perform image registration (VoxelMorph Repository).
I have added validation code to check the Dice score and mutual information, but it is slow compared to the training.
I need help optimizing the validation code so that the computation time for validation goes down.
I am using a callback to run the validation.
class MetricCallback(tf.keras.callbacks.Callback):
    def __init__(self, val_generator, val_steps=10):
        super(tf.keras.callbacks.Callback, self).__init__()
        self.validation_gen = val_generator
        self.validation_steps = val_steps

    def on_train_batch_end(self, batch, logs=None):
        if (batch % args.checkpoint == 0):
            model_name = os.path.join(model_dir, '{batch:04d}.h5')
            model.save(model_name.format(batch=batch))

    def on_epoch_end(self, epoch, logs={}):
        eval_df = pd.DataFrame()
        for val_step, [inputs, outputs] in enumerate(self.validation_gen):
            if (self.validation_steps == val_step):
                break
            # tensorflow device handling
            # device, nb_devices = vxm.tf.utils.setup_device(0)
            # with tf.device(device):
            moving_img = inputs[0]
            fixed_img = inputs[1]
            moving_seg = inputs[2]
            fixed_seg = outputs[2]

            warp = self.model.register(moving_img, fixed_img)

            # apply transform
            warped_seg = transform_nearest.predict([moving_seg, warp]).squeeze()
            warpped_moving_image = transform_model.predict([moving_img, warp]).squeeze()

            overlap_results_df, surface_distance_results_df = evaluate(fixed_img.squeeze(), warpped_moving_image,
                                                                       sitk.GetImageFromArray(fixed_seg.squeeze().transpose(1,0)),
                                                                       sitk.GetImageFromArray(warped_seg.transpose(1,0)))
            result = pd.concat([overlap_results_df, surface_distance_results_df], axis=1)
            eval_df = eval_df.append(result, ignore_index=True)

            sys.stdout.write("\r" + 'Validation Pair "{0}"/"{1}"'.format(val_step, self.validation_steps))
            sys.stdout.flush()

        tf.summary.scalar("Validation Avg Dice/step ", np.mean(eval_df['dice']), epoch)  # dice/validation epoch
        tf.summary.scalar("Validation Avg Mi/step ", np.mean(eval_df['mutual_information']), epoch)  # Mutual Information/validation epoch
        tf.summary.scalar("Validation Avg volume_similarity/step ", np.mean(eval_df['volume_similarity']), epoch)  # Volume Similarity/validation epoch
# Training time
start_t = datetime.datetime.now()

# Train model
history = model.fit(train_generator,
                    initial_epoch=args.initial_epoch,
                    epochs=args.epochs,
                    steps_per_epoch=args.steps_per_epoch - 2,  # Generator starts from 0 and one sample is already taken on line 127
                    callbacks=[save_callback, MetricCallback(val_generator, args.val_steps)],
                    verbose=1
                    )
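Not a full answer to the speed question, but one hedged thing to try (assuming transform_nearest and transform_model are ordinary Keras models, as the .predict calls above suggest): Model.predict builds and runs a whole prediction loop on every call, which is expensive when it is invoked once per validation pair on a single small batch; calling the models directly avoids that overhead:

# inside on_epoch_end, per validation pair: direct calls instead of .predict()
warped_seg = transform_nearest([moving_seg, warp], training=False).numpy().squeeze()
warpped_moving_image = transform_model([moving_img, warp], training=False).numpy().squeeze()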
I am new to Deep Learning with PyTorch. I am more experienced with TensorFlow, so I should say I am not new to Deep Learning itself.
Currently, I am working on a simple ANN classification task. There are only 2 classes, so quite naturally I am using a Softmax/BCELoss combination.
The dataset is like this:
shape of X_train (891, 7)
Shape of Y_train (891,)
Shape of x_test (418, 7)
I converted X_train and the others to torch tensors (train_data and so on). The next step is:
train_ds = TensorDataset(train_data, train_label)
# Define data loader
batch_size = 32
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
I made the model class like:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(7, 32)
        self.bc1 = nn.BatchNorm1d(32)
        self.fc2 = nn.Linear(32, 64)
        self.bc2 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64, 128)
        self.bc3 = nn.BatchNorm1d(128)
        self.fc4 = nn.Linear(128, 32)
        self.bc4 = nn.BatchNorm1d(32)
        self.fc5 = nn.Linear(32, 10)
        self.bc5 = nn.BatchNorm1d(10)
        self.fc6 = nn.Linear(10, 1)
        self.bc6 = nn.BatchNorm1d(1)
        self.drop = nn.Dropout2d(p=0.5)

    def forward(self, x):
        torch.nn.init.xavier_uniform(self.fc1.weight)
        x = self.fc1(x)
        x = self.bc1(x)
        x = F.relu(x)
        x = self.drop(x)

        x = self.fc2(x)
        x = self.bc2(x)
        x = F.relu(x)
        #x = self.drop(x)

        x = self.fc3(x)
        x = self.bc3(x)
        x = F.relu(x)
        x = self.drop(x)

        x = self.fc4(x)
        x = self.bc4(x)
        x = F.relu(x)
        #x = self.drop(x)

        x = self.fc5(x)
        x = self.bc5(x)
        x = F.relu(x)
        x = self.drop(x)

        x = self.fc6(x)
        x = self.bc6(x)
        x = torch.sigmoid(x)
        return x
model = Net()
The loss function and the optimizer are defined:
loss = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
Finally, the task is to run the forward pass over the epochs:
num_epochs = 1000

# Repeat for given number of epochs
for epoch in range(num_epochs):
    # Train with batches of data
    for xb, yb in train_dl:
        pred = model(xb)
        yb = torch.unsqueeze(yb, 1)
        #print(pred, yb)
        print('grad', model.fc1.weight.grad)

        l = loss(pred, yb)
        #print('loss', l)

        # 3. Compute gradients
        l.backward()

        # 4. Update parameters using gradients
        optimizer.step()

        # 5. Reset the gradients to zero
        optimizer.zero_grad()

    # Print the progress
    if (epoch+1) % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, l.item()))
I can see in the output that after each batch the gradients of the weights are non-zero, and only after that is zero_grad applied.
However, the model is pretty bad: I only get an F1 score of around 50%, and the model is just as bad when I use it to predict on train_dl itself!
I am wondering what the reason is. Are the weight gradients non-zero but not applied properly? Is the optimizer not optimizing the weights? Or something else?
Can someone please have a look?
I already tried different loss functions and optimizers. I tried with smaller datasets, bigger batches, different hyperparameters.
Thanks! :)
First of all, you don't use a softmax activation with BCE loss, unless you have 2 output nodes, which is not the case here. In PyTorch, BCE loss doesn't apply any activation function before calculating the loss, unlike nn.CrossEntropyLoss, which has a built-in (log-)softmax. So, if you want to use BCE, you have to apply a sigmoid (or any function f: R -> [0, 1]) at the output layer.
Moreover, you should ideally call optimizer.zero_grad() for each batch if you want to do SGD (which is the default). If you don't do that, you will just be doing full-batch gradient descent, which is quite slow and gets stuck in local minima easily.
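To make the pairing explicit, a minimal hedged sketch (the layer sizes are placeholders, not the network above): either feed raw logits into nn.BCEWithLogitsLoss, or put a sigmoid on the output and use nn.BCELoss; what you must not do is feed unbounded outputs into nn.BCELoss.

import torch
import torch.nn as nn

logits = nn.Linear(32, 1)(torch.randn(4, 32))   # raw, unbounded outputs
targets = torch.randint(0, 2, (4, 1)).float()

# Option A: logits + BCEWithLogitsLoss (sigmoid is applied inside the loss, numerically more stable)
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# Option B: explicit sigmoid + BCELoss
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_a.item(), loss_b.item())             # the two values agree up to numerical precision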
I have implemented a seq2seq translation model in TensorFlow 2.0.
But during training I get the following error:
ValueError: Shapes (2056, 10, 10000) and (1776, 10, 10000) are incompatible
I have 10000 records in my dataset. From the first record up to record 8224 the dimensions match, but for the last 1776 records I get the error above, because my batch_size is larger than the number of remaining records. Here is my code:
max_seq_len_output = 10
n_words = 10000
batch_size = 2056

model = Model_translation(batch_size = batch_size,embed_size = embed_size,total_words = n_words , dropout_rate = dropout_rate,num_classes = n_words,embedding_matrix = embedding_matrix)

dataset_train = tf.data.Dataset.from_tensor_slices((encoder_input,decoder_input,decoder_output))
dataset_train = dataset_train.shuffle(buffer_size = 1024).batch(batch_size)

loss_object = tf.keras.losses.CategoricalCrossentropy()  # used in backprop
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
train_loss = tf.keras.metrics.Mean(name='train_loss')  # mean of the losses per observation
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')

##### no #tf.function here
def training(X_1, X_2, y):
    # creation of the one-hot encoding; doing it outside the loop would cause RAM problems
    y_numpy = y.numpy()
    Y = np.zeros((batch_size, max_seq_len_output, n_words), dtype='float32')
    for i, d in enumerate(y_numpy):
        for t, word in enumerate(d):
            if word != 0:
                Y[i, t, word] = 1
    Y = tf.convert_to_tensor(Y)

    # predictions
    with tf.GradientTape() as tape:  # trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched
        predictions = model(X_1, X_2)
        loss = loss_object(Y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(Y, predictions)
    del Y
    del y_numpy

EPOCHS = 70

for epoch in range(EPOCHS):
    for X_1, X_2, y in dataset_train:
        training(X_1, X_2, y)

    template = 'Epoch {}, Loss: {}, Accuracy: {}'
    print(template.format(epoch+1, train_loss.result(), train_accuracy.result()*100))

    # Reset the metrics for the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
How can I fix this problem?
One solution would be to drop the remainder during batching with
dataset_train = dataset_train.shuffle(buffer_size = 1024).batch(batch_size, drop_remainder=True)
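An alternative, if you want to keep the last partial batch (a hedged sketch that only changes the start of the training() function shown above): size the one-hot array from the batch that actually arrived rather than from the global batch_size, so its shape always matches the predictions:

def training(X_1, X_2, y):
    y_numpy = y.numpy()
    current_batch = y_numpy.shape[0]  # 2056 for full batches, 1776 for the last one
    Y = np.zeros((current_batch, max_seq_len_output, n_words), dtype='float32')
    for i, d in enumerate(y_numpy):
        for t, word in enumerate(d):
            if word != 0:
                Y[i, t, word] = 1
    # ... rest unchanged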