Error with FNN model during testing: RuntimeError: The size of tensor a (N) must match the size of tensor b (M) at non-singleton dimension 0

I trained an FNN model for a linear regression problem in PyTorch and saved it, and now I am trying to check the model's performance on test data inside with torch.no_grad():. Y_Test has size [64000, 2], X_Test has size [64000, 2], and batch_size = 100. The accuracy I compute comes out as zero every single time, and I can't work out why.
```python
import torch
import torch.utils.data as Data
from torch.utils.data import DataLoader

# input
X_Test = torch.tensor(PA_DATA[:64000, 0:2], dtype=torch.float32)
# output
Y_Test = torch.tensor(PA_DATA[:64000, 2:4], dtype=torch.float32)
test_dataset = Data.TensorDataset(X_Test, Y_Test)
testloader = DataLoader(test_dataset, batch_size=100, shuffle=False)
loss_func1 = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
running_accuracy = 0.0
running_test_loss = 0.0
total = 0
with torch.no_grad():
    model.eval()
    for X_Test, T_Val in testloader:
        pred = model(X_Test)
        test_loss = loss_func1(pred, Y_Test)
        _, predicted = torch.max(pred.data, 1)
        running_test_loss += test_loss.item()
        total += Y_Test.size(0)
        running_accuracy += (predicted == Y_Test).sum().item()
Test_loss_value = running_test_loss / len(testloader)
accuracy = (100 * running_accuracy / total)
print(accuracy)
```
Sometimes it raises an error saying that the size of tensor a (100) does not match the size of tensor b (2). I have tried various things, but I still can't calculate the accuracy. The output is:

```
accuracy = 0.0
Test Loss: 0.001215781958308071
RuntimeError: The size of tensor a (100) must match the size of tensor b (64000) at non-singleton dimension 0
```
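A minimal sketch of an evaluation loop that avoids both shape errors, assuming the trained model maps a (batch, 2) input to a (batch, 2) output. The loss and the sample count are taken from the batch targets yielded by the loader rather than from the full 64000-row Y_Test. Because the targets are continuous, exact equality (and torch.max over the outputs, which is a classification idiom) will essentially never hold, so the sketch reports the share of samples whose predictions fall within a tolerance instead. The evaluate name and the tol threshold are hypothetical choices, not part of the original code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, tol=0.05):
    """Return (mean per-batch MSE, % of samples whose outputs are all within tol)."""
    loss_func = torch.nn.MSELoss()
    running_loss, within_tol, total = 0.0, 0, 0
    model.eval()
    with torch.no_grad():
        for xb, yb in loader:  # compare against this batch's targets, not the full Y_Test
            pred = model(xb)
            running_loss += loss_func(pred, yb).item()
            # a sample counts as "correct" if every output is within tol of its target
            close = (pred - yb).abs().le(tol).all(dim=1)
            within_tol += close.sum().item()
            total += yb.size(0)
    return running_loss / len(loader), 100.0 * within_tol / total


# Usage on toy data: an identity "model" whose outputs equal the targets exactly
X = torch.randn(250, 2)
toy_loader = DataLoader(TensorDataset(X, X.clone()), batch_size=100, shuffle=False)
print(evaluate(torch.nn.Identity(), toy_loader))  # (0.0, 100.0)
```

With the question's objects this would be called as test_loss, test_acc = evaluate(model, testloader), with tol chosen to suit the scale of the targets.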

Related

Time-Series Transformer Model Prediction Accuracy

I have created a transformer model for time-series prediction (a regression problem).
**Details about the dataset**
I have hourly data with a single feature (lagged energy use). The model could be improved by increasing the number of lagged energy-use values, which gives the model more information for predicting the time sequence (the energy consumption of a building). My input has shape X.shape = (8783, 168, 1): 8783 sequences, each containing one week of lagged energy-use data, i.e. 24*7 = 168 hourly entries, where each entry holds one lagged energy-use value. My output has shape Y.shape = (8783, 1): for each of the 8783 sequences, one output value (the building's energy consumption for the next hour).
**Model details**
I took the model from an example on the official Keras site. That example was built for classification, so I adapted it to my regression problem by changing the activation of the last output layer from sigmoid to relu. Input shape (train_f) = (8783, 168, 1); output shape (train_p) = (8783, 1). When I trained the model for 100 epochs, it converged in fewer epochs than my reference models (LSTMs and LSTMs with self-attention). After training, its predictions on the test data were also better than those of the reference models.
Since the model predicts well, I tried to improve it further by feeding in one month of lagged energy-use data, i.e. 168*4 = 672 hourly entries, each holding one lagged energy-use value. The input to the model now has shape X.shape = (8783, 672, 1). Both training and testing accuracy drop compared with the weekly input, as shown below.
**Lagged energy use data for 1 week, i.e. X.shape = (8783, 168, 1)**

| | MSE | RMSE | MAE | R2 score |
|---|---|---|---|---|
| Training data | 1.0489 | 1.0242 | 0.6395 | 0.9707 |
| Testing data | 0.6221 | 0.7887 | 0.5648 | 0.9171 |

**Lagged energy use data for 1 month (4 weeks), i.e. X.shape = (8783, 672, 1)**

| | MSE | RMSE | MAE | R2 score |
|---|---|---|---|---|
| Training data | 1.6424 | 1.2816 | 0.7326 | 0.9567 |
| Testing data | 1.4991 | 1.2244 | 0.9233 | 0.6903 |
I believe that giving the model more information should result in better predictions. Any suggestions on how to improve the model's prediction (test) accuracy? Is there something wrong with the model?
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df_energy = pd.read_excel("/content/drive/MyDrive/Architecture Topology/Building_energy_consumption_record.xlsx")
extract_for_normalization = list(df_energy)[1]  # name of the second column
df_data_float = df_energy[extract_for_normalization].astype(float)
df_data_array = df_data_float.to_numpy()
df_data_array_1 = df_data_array.reshape(-1, 1)
# chronological 70/30 split (no shuffling for a time series)
train_X, test_X = train_test_split(df_data_array_1, train_size=0.7, shuffle=False)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_X = scaler.fit_transform(train_X)
```
**Converting train_X into the required shape (inputs, sequences, features)**
```python
train_f = []  # input windows from the training data
train_p = []  # target values
n_future = 1  # number of time steps (hours) to predict into the future
n_past = 672  # number of past time steps used as input
for val in range(n_past, len(scaled_train_X) - n_future + 1):
    train_f.append(scaled_train_X[val - n_past:val, 0:scaled_train_X.shape[1]])
    train_p.append(scaled_train_X[val + n_future - 1:val + n_future, -1])
train_f, train_p = np.array(train_f), np.array(train_p)
```
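The same windowing loop is written twice, once for training and once for testing. A small helper, sketched below under the assumption that the series is a 2-D (time steps, features) array like scaled_train_X, makes the sample count explicit: a series of length T gives T - n_past - n_future + 1 windows. So from the same series, 672-hour windows produce 504 fewer samples than 168-hour windows, and the first n_past hours of the test split are only ever used as context, never as targets. The make_windows name is hypothetical.

```python
import numpy as np


def make_windows(series, n_past, n_future=1):
    """Turn a (T, n_features) array into (samples, n_past, n_features) inputs and (samples, 1) targets.

    Same indexing as the loop above: window val covers series[val - n_past:val] and its
    target is the last feature n_future steps later, series[val + n_future - 1, -1].
    """
    X, y = [], []
    for val in range(n_past, len(series) - n_future + 1):
        X.append(series[val - n_past:val, :])
        y.append(series[val + n_future - 1:val + n_future, -1])
    return np.array(X), np.array(y)


series = np.arange(10, dtype=float).reshape(-1, 1)  # toy hourly series, one feature
X, y = make_windows(series, n_past=3)
print(X.shape, y.shape)  # (7, 3, 1) (7, 1): 10 - 3 - 1 + 1 = 7 samples
```

With the question's variables, train_f, train_p = make_windows(scaled_train_X, n_past, n_future) would reproduce the training loop, and the same call on scaled_test_X would give test_q, test_r.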
**Transformer Model**
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Normalization and attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    # Feed-forward part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res


def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(train_p.shape[1])(x)
    return keras.Model(inputs, outputs)


input_shape = (train_f.shape[1], train_f.shape[2])
model = build_model(
    input_shape,
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
)
model.compile(loss=tf.keras.losses.mean_absolute_error,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=["mse"])
model.summary()
history = model.fit(train_f, train_p, epochs=100, batch_size=32, validation_split=0.25, verbose=1)
trainYPredict = model.predict(train_f)
```
**Inverse transform the prediction and keep the last value (output)**
```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

trainYPredict1 = np.repeat(trainYPredict, scaled_train_X.shape[1], axis=-1)
trainYPredict_actual = scaler.inverse_transform(trainYPredict1)[:, -1]
train_p_actual = np.repeat(train_p, scaled_train_X.shape[1], axis=-1)
train_p_actual1 = scaler.inverse_transform(train_p_actual)[:, -1]
Prediction_mse = mean_squared_error(train_p_actual1, trainYPredict_actual)
print("Mean Squared Error of prediction is:", str(Prediction_mse))
Prediction_rmse = sqrt(Prediction_mse)
print("Root Mean Squared Error of prediction is:", str(Prediction_rmse))
prediction_r2 = r2_score(train_p_actual1, trainYPredict_actual)
print("R2 score of predictions is:", str(prediction_r2))
prediction_mae = mean_absolute_error(train_p_actual1, trainYPredict_actual)
print("Mean absolute error of prediction is:", prediction_mae)
```
**Testing of model**
```python
scaled_test_X = scaler.transform(test_X)
test_q = []
test_r = []
for val in range(n_past, len(scaled_test_X) - n_future + 1):
    test_q.append(scaled_test_X[val - n_past:val, 0:scaled_test_X.shape[1]])
    test_r.append(scaled_test_X[val + n_future - 1:val + n_future, -1])
test_q, test_r = np.array(test_q), np.array(test_r)
testPredict = model.predict(test_q)
```
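The code computes MSE, RMSE, MAE and R2 for the training predictions but stops at testPredict for the test set. A small helper, sketched below with the same scikit-learn functions used for the training metrics, scores any split the same way; the regression_report name is hypothetical, and the inputs are assumed to be 1-D arrays already inverse-transformed to the original units.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """MSE, RMSE, MAE and R2 for one split, matching the columns of the results table."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


# Every prediction is off by 1, so MSE, RMSE and MAE are 1.0 and R2 is -0.5
print(regression_report([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

For the test split, test_r and testPredict would first be inverse-transformed exactly as train_p and trainYPredict are, then passed to regression_report.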

Shapes are incompatible at the last records of tf.data.Dataset.from_tensor_slices

I have implemented a seq2seq translation model in TensorFlow 2.0.
But during training I get the following error:
ValueError: Shapes (2056, 10, 10000) and (1776, 10, 10000) are incompatible
I have 10000 records in my dataset. The dimensions match for the first 8224 records (four full batches), but for the last 1776 records I get the error above, because my batch_size is larger than the number of remaining records. Here is my code:
```python
import numpy as np
import tensorflow as tf

max_seq_len_output = 10
n_words = 10000
batch_size = 2056
model = Model_translation(batch_size=batch_size, embed_size=embed_size, total_words=n_words,
                          dropout_rate=dropout_rate, num_classes=n_words,
                          embedding_matrix=embedding_matrix)
dataset_train = tf.data.Dataset.from_tensor_slices((encoder_input, decoder_input, decoder_output))
dataset_train = dataset_train.shuffle(buffer_size=1024).batch(batch_size)
loss_object = tf.keras.losses.CategoricalCrossentropy()  # used in backprop
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
train_loss = tf.keras.metrics.Mean(name='train_loss')  # mean of the losses per observation
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')


# no @tf.function here
def training(X_1, X_2, y):
    # Create the one-hot encoding here; doing it outside the loop would cause RAM problems
    y_numpy = y.numpy()
    Y = np.zeros((batch_size, max_seq_len_output, n_words), dtype='float32')
    for i, d in enumerate(y_numpy):
        for t, word in enumerate(d):
            if word != 0:
                Y[i, t, word] = 1
    Y = tf.convert_to_tensor(Y)
    # predictions
    # Trainable variables (created by tf.Variable or tf.compat.v1.get_variable,
    # where trainable=True is the default in both cases) are watched automatically.
    with tf.GradientTape() as tape:
        predictions = model(X_1, X_2)
        loss = loss_object(Y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(Y, predictions)
    del Y
    del y_numpy


EPOCHS = 70
for epoch in range(EPOCHS):
    for X_1, X_2, y in dataset_train:
        training(X_1, X_2, y)
    template = 'Epoch {}, Loss: {}, Accuracy: {}'
    print(template.format(epoch + 1, train_loss.result(), train_accuracy.result() * 100))
    # Reset the metrics for the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
```
How can I fix this problem?
One solution is to drop the remainder when batching:

```python
dataset_train = dataset_train.shuffle(buffer_size=1024).batch(batch_size, drop_remainder=True)
```
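Dropping the remainder discards 1776 of the 10000 records in every epoch (10000 = 4 * 2056 + 1776), although with reshuffling the dropped records can change from epoch to epoch. An alternative is to take the batch dimension from the incoming batch, instead of the global batch_size, when building Y; that global size is where the (2056, 10, 10000) shape in the error comes from. Below is a minimal NumPy sketch of that, keeping the question's rule that word id 0 (padding) gets no one-hot entry. The one_hot_targets name is hypothetical, and this only helps if Model_translation itself does not assume a fixed batch_size.

```python
import numpy as np


def one_hot_targets(y_numpy, n_words):
    """One-hot encode a (batch, seq_len) array of word ids; padding id 0 stays all-zero.

    The batch dimension comes from the data, so a final partial batch also works.
    """
    batch, seq_len = y_numpy.shape
    Y = np.zeros((batch, seq_len, n_words), dtype="float32")
    b_idx, t_idx = np.nonzero(y_numpy)  # positions of non-padding tokens
    Y[b_idx, t_idx, y_numpy[b_idx, t_idx]] = 1.0
    return Y


y_batch = np.array([[1, 0, 3], [2, 2, 0]])  # 2 sequences, 0 = padding
print(one_hot_targets(y_batch, n_words=5).shape)  # (2, 3, 5)
```

Inside training, Y = tf.convert_to_tensor(one_hot_targets(y.numpy(), n_words)) would replace the nested loops.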

Weighted Pixel Wise Categorical Cross Entropy for Semantic Segmentation

I have recently started learning about Semantic Segmentation. I am trying to train a UNet for the same. My input is RGB 128x128x3 images. My masks are made up of 4 classes 0, 1, 2, 3 and are One-Hot Encoded with dimension 128x128x4.
```python
import tensorflow as tf
from tensorflow.keras import backend as K


def weighted_cce(y_true, y_pred):
    weights = []
    t_inf = tf.convert_to_tensor(1e9, dtype='float32')
    t_zero = tf.convert_to_tensor(0, dtype='int64')
    for i in range(0, 4):
        l = tf.argmax(y_true, axis=-1) == i
        n = tf.cast(tf.math.count_nonzero(l), 'float32') + K.epsilon()
        weights.append(n)
    weights = [batch_size / j for j in weights]
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
    # clip to prevent NaNs and Infs
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    # calc
    loss = y_true * K.log(y_pred) * weights
    loss = -K.sum(loss, -1)
    return loss
```
This is the loss function that I am using but it classifies every pixel as 2. What am I doing wrong?
You should base the weights on your entire dataset (unless your batch size is large enough to give reasonably stable weights).
If a class is underrepresented and the batch size is small, its weight will be close to infinity.
If your target data is a NumPy array:
```python
shp = y_train.shape
totalPixels = shp[0] * shp[1] * shp[2]
weights = np.sum(y_train, axis=(0, 1, 2))  # final shape (4,)
weights = totalPixels / weights
```
If your data is in a Sequence generator:
```python
totalPixels = 0
counts = np.zeros((4,))
for i in range(len(generator)):
    x, y = generator[i]
    shp = y.shape
    totalPixels += shp[0] * shp[1] * shp[2]
    counts = counts + np.sum(y, axis=(0, 1, 2))
weights = totalPixels / counts
```
If your data is in a yield generator (you must know how many batches you have in an epoch):
```python
for i in range(batches_per_epoch):
    x, y = next(generator)
    # the rest is the same as in the Sequence example above
```
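If a class never appears in the data, the totalPixels / counts division produces inf (NumPy warns rather than raising). A small helper that computes the same total-pixels / class-count weights from one-hot masks, but assigns weight 0 to absent classes, could look like the sketch below; class_weights_from_masks is a hypothetical name, and whether 0 or some cap is the right value for an absent class depends on the loss the weights feed.

```python
import numpy as np


def class_weights_from_masks(y):
    """Inverse-frequency weights (total pixels / pixels of class c) from one-hot masks.

    y has shape (N, H, W, C). Classes with no pixels get weight 0 instead of inf.
    """
    total_pixels = y.shape[0] * y.shape[1] * y.shape[2]
    counts = y.sum(axis=(0, 1, 2)).astype("float64")  # pixels per class, shape (C,)
    safe_counts = np.where(counts > 0, counts, 1.0)   # avoid dividing by zero
    return np.where(counts > 0, total_pixels / safe_counts, 0.0)


# Toy batch of two 2x2 masks over 4 classes; class 3 never appears
labels = np.array([[[0, 0], [1, 2]], [[2, 2], [2, 1]]])
y_toy = np.eye(4)[labels]  # one-hot, shape (2, 2, 2, 4)
print(class_weights_from_masks(y_toy))  # [4. 4. 2. 0.]
```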
Attempt 1
I don't know if newer versions of Keras are able to handle this, but you can try the simplest approach first: simply call fit or fit_generator with the class_weight argument:
```python
model.fit(...., class_weight={0: weights[0], 1: weights[1], 2: weights[2], 3: weights[3]})
```
Attempt 2
Make a healthier loss function:
```python
weights = weights.reshape((1, 1, 1, 4))
kWeights = K.constant(weights)


def weighted_cce(y_true, y_pred):
    yWeights = kWeights * y_pred  # shape (batch, 128, 128, 4)
    yWeights = K.sum(yWeights, axis=-1)  # shape (batch, 128, 128)
    loss = K.categorical_crossentropy(y_true, y_pred)  # shape (batch, 128, 128)
    wLoss = yWeights * loss
    return K.sum(wLoss, axis=(1, 2))
```
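To check the shapes and values of Attempt 2 outside the graph, here is a NumPy mirror of it. It is a sketch that assumes K.categorical_crossentropy renormalizes y_pred over the class axis and clips it to [eps, 1 - eps], with eps taken as 1e-7 (the usual K.epsilon() default); weighted_cce_np is a hypothetical name. Each pixel's cross-entropy is scaled by the class weights averaged under the predicted distribution, as in the answer, then summed over height and width, giving one loss value per image. With all weights equal to 1 it reduces to the plain summed pixel-wise cross-entropy, which makes a convenient sanity check.

```python
import numpy as np

EPS = 1e-7  # assumed to match the default K.epsilon()


def weighted_cce_np(y_true, y_pred, class_weights):
    """NumPy mirror of the Attempt 2 loss; inputs have shape (batch, H, W, C)."""
    w = np.asarray(class_weights, dtype="float64").reshape((1, 1, 1, -1))
    y_weights = np.sum(w * y_pred, axis=-1)              # (batch, H, W)
    p = y_pred / np.sum(y_pred, axis=-1, keepdims=True)  # renormalize over classes
    p = np.clip(p, EPS, 1 - EPS)
    cce = -np.sum(y_true * np.log(p), axis=-1)           # (batch, H, W)
    return np.sum(y_weights * cce, axis=(1, 2))          # one value per image


rng = np.random.default_rng(0)
y_true = np.eye(4)[rng.integers(0, 4, size=(2, 3, 3))]  # one-hot masks, (2, 3, 3, 4)
y_pred = y_true * 0.6 + 0.1                             # 0.7 on the true class, 0.1 elsewhere
print(weighted_cce_np(y_true, y_pred, [1, 1, 1, 1]))    # 9 * -log(0.7) for each image
```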

Backpropagation Using Tensorflow and Numpy MSE not Dropping

I am trying to implement backpropagation without TensorFlow's GradientDescentOptimizer; I want to update the weights and biases myself. The problem is that the mean squared error (the cost) does not approach zero: it stays at around 0.2. Is that because of my inputs, which are 520x1600 (yes, each input has 1600 units, and there are 520 of them), or is the number of neurons in the hidden layer the problem? When I use GradientDescentOptimizer and minimize(cost) instead, it works fine (the cost drops close to zero as training goes on), so maybe there is a problem in my code for updating the weights and biases.
Here's my code:
```python
import tensorflow as tf
import numpy as np
from BPInputs40 import pattern, desired

# get the inputs and desired outputs: 520 inputs, each with 1600 units
train_in = pattern
train_out = desired
learning_rate = tf.constant(0.5)
num_input_neurons = len(train_in[0])
num_output_neurons = len(train_out[0])
num_hidden_neurons = 20
# weight matrix initialization with random values
w_h = tf.Variable(tf.random_normal([num_input_neurons, num_hidden_neurons]), dtype=tf.float32)
w_o = tf.Variable(tf.random_normal([num_hidden_neurons, num_output_neurons]), dtype=tf.float32)
b_h = tf.Variable(tf.random_normal([1, num_hidden_neurons]), dtype=tf.float32)
b_o = tf.Variable(tf.random_normal([1, num_output_neurons]), dtype=tf.float32)
# Model input and output
x = tf.placeholder("float")
y = tf.placeholder("float")


def sigmoid(v):
    return tf.div(tf.constant(1.0), tf.add(tf.constant(1.0), tf.exp(tf.negative(v * 0.001))))


def derivative(v):
    return tf.multiply(sigmoid(v), tf.subtract(tf.constant(1.0), sigmoid(v)))


output_h = tf.sigmoid(tf.add(tf.matmul(x, w_h), b_h))
output_o = tf.sigmoid(tf.add(tf.matmul(output_h, w_o), b_o))
error = tf.subtract(output_o, y)  # (1x35)
mse = tf.reduce_mean(tf.square(error))
delta_o = tf.multiply(error, derivative(output_o))
delta_b_o = delta_o
delta_w_o = tf.matmul(tf.transpose(output_h), delta_o)
delta_backprop = tf.matmul(delta_o, tf.transpose(w_o))
delta_h = tf.multiply(delta_backprop, derivative(output_h))
delta_b_h = delta_h
delta_w_h = tf.matmul(tf.transpose(x), delta_h)
# updating the weights
train = [
    tf.assign(w_h, tf.subtract(w_h, tf.multiply(learning_rate, delta_w_h))),
    tf.assign(b_h, tf.subtract(b_h, tf.multiply(learning_rate, tf.reduce_mean(delta_b_h, 0)))),
    tf.assign(w_o, tf.subtract(w_o, tf.multiply(learning_rate, delta_w_o))),
    tf.assign(b_o, tf.subtract(b_o, tf.multiply(learning_rate, tf.reduce_mean(delta_b_o, 0))))
]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
err, target = 1, 0.005
epoch, max_epochs = 0, 2000000
while epoch < max_epochs:
    epoch += 1
    err, _ = sess.run([mse, train], {x: train_in, y: train_out})
    if epoch % 1000 == 0:
        print('Epoch:', epoch, '\nMSE:', err)
answer = tf.equal(tf.floor(output_o + 0.5), y)
accuracy = tf.reduce_mean(tf.cast(answer, "float"))
print(sess.run([output_o], feed_dict={x: train_in, y: train_out}))
print("Accuracy: ", (1 - err) * 100, "%")
```
Update: I got it working now. The MSE dropped to almost zero once I increased the number of neurons in the hidden layer. With 5200 and 6400 hidden neurons and just 5000 epochs, the accuracy was almost 99%. The largest learning rate I used is 0.1, because above that the MSE does not get close to zero.
I'm not an expert in this field, but your weights appear to be updated correctly, and the fact that your MSE decreases from higher values down to about 0.2 is a strong indicator of that. I would definitely try running this problem with many more hidden neurons (e.g. 500).
By the way, are your inputs normalized? If not, that could be the reason.
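For comparison, here is a minimal NumPy sketch of the same one-hidden-layer sigmoid network trained with hand-written updates. One detail worth checking in the question's graph: output_h and output_o are already sigmoid activations, and for an activation a = sigmoid(z) the local derivative is a * (1 - a). The question's derivative() instead applies its own 0.001-scaled sigmoid to those activations, which gives roughly 0.25 for any value in (0, 1), so each delta is scaled by a near-constant rather than the true slope. The sketch also averages weight and bias gradients over the batch so both use the same scale. It is not the asker's code: the XOR data, layer size, learning rate, zero bias initialization and the train_manual_backprop name are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_manual_backprop(X, Y, n_hidden=20, lr=0.5, epochs=5000, seed=0):
    """Full-batch gradient descent for a sigmoid -> sigmoid network; returns the MSE per epoch."""
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    n_out = Y.shape[1]
    w_h = rng.normal(size=(n_in, n_hidden))
    b_h = np.zeros((1, n_hidden))  # biases start at zero here (the question uses random normal)
    w_o = rng.normal(size=(n_hidden, n_out))
    b_o = np.zeros((1, n_out))
    history = []
    for _ in range(epochs):
        a_h = sigmoid(X @ w_h + b_h)    # hidden activations, shape (m, n_hidden)
        a_o = sigmoid(a_h @ w_o + b_o)  # output activations, shape (m, n_out)
        error = a_o - Y
        history.append(float(np.mean(error ** 2)))
        # sigmoid'(z) expressed through the activation a = sigmoid(z) is a * (1 - a)
        delta_o = error * a_o * (1.0 - a_o)
        delta_h = (delta_o @ w_o.T) * a_h * (1.0 - a_h)  # uses w_o before it is updated
        # average over the batch so weights and biases are updated on the same scale
        w_o -= lr * (a_h.T @ delta_o) / m
        b_o -= lr * delta_o.mean(axis=0, keepdims=True)
        w_h -= lr * (X.T @ delta_h) / m
        b_h -= lr * delta_h.mean(axis=0, keepdims=True)
    return history


# Toy usage on XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
history = train_manual_backprop(X, Y)
print(history[0], history[-1])  # the MSE should fall over training
```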

Tensorflow Dropout implementation, test accuracy = train accuracy and low, why?

I have tried dropout implementation in Tensorflow.
I have tried implementing dropout in TensorFlow.
I know that the dropout keep probability should be declared as a placeholder and that its value should differ between training and testing. Still, I have nearly broken my brain trying to find out why the accuracy is so low with dropout. With keep_prob = 1, train accuracy is 99% and test accuracy is 85%; with keep_prob = 0.5, both train and test accuracy are 16%. Any ideas where to look? Thank you!
```python
import tensorflow as tf


def forward_propagation(X, parameters, keep_prob):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    Z1 = tf.add(tf.matmul(W1, X), b1)   # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                 # A1 = relu(Z1)
    A1 = tf.nn.dropout(A1, keep_prob)   # apply dropout
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                 # A2 = relu(Z2)
    A2 = tf.nn.dropout(A2, keep_prob)   # apply dropout
    Z3 = tf.add(tf.matmul(W3, A2), b3)  # Z3 = np.dot(W3, A2) + b3
    return Z3
```
```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops


def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001, lambd=0.03, train_keep_prob=0.5,
          num_epochs=800, minibatch_size=32, print_cost=True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    lambd -- L2 regularization hyperparameter
    train_keep_prob -- probability of keeping a neuron in hidden layer for dropout implementation
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)      # to keep consistent results
    seed = 3                   # to keep consistent results
    (n_x, m) = X_train.shape   # (n_x: input size, m: number of examples in the train set)
    n_y = Y_train.shape[0]     # n_y: output size
    costs = []                 # to keep track of the cost
    # Create placeholders of shape (n_x, n_y)
    X, Y = create_placeholders(n_x, n_y)
    keep_prob = tf.placeholder(tf.float32)
    # Initialize parameters
    parameters = initialize_parameters()
    # Forward propagation: build the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters, keep_prob)
    # Cost function: add cost function to tensorflow graph
    cost = compute_cost(Z3, Y, parameters, lambd)
    # Backpropagation
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.  # cost accumulated over one epoch
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: the line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost"; the feed_dict should contain a minibatch for (X, Y).
                _, minibatch_cost = sess.run([optimizer, cost],
                                             feed_dict={X: minibatch_X, Y: minibatch_Y, keep_prob: train_keep_prob})
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every 100 epochs
            if print_cost == True and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        # save the parameters in a variable
        parameters = sess.run(parameters)
        print("Parameters have been trained!")
        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, keep_prob: 1.0}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, keep_prob: 1.0}))
        return parameters
```
The algorithm is correct; keep_prob = 0.5 is simply too low.
I managed to get 87% accuracy on the test set with the following hyperparameters:
learning_rate = 0.00002, lambd = 0.03, train_keep_prob = 0.90, num_epochs = 1500, minibatch_size = 32
In the first case, your model was overfitting the data, hence the large gap between train and test accuracy. Dropout is a regularization technique that reduces the variance of the model by reducing the influence of individual nodes, and so helps prevent overfitting. But setting keep_prob = 0.5 (too low) weakens the model so much that it underfits the data severely, giving an accuracy as low as 16%. You should iterate by gradually decreasing keep_prob until you find a suitable value.
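To see what keep_prob controls, here is a minimal NumPy sketch of inverted dropout, which is what the TF1 call tf.nn.dropout(x, keep_prob) in forward_propagation computes: each unit is kept with probability keep_prob and scaled by 1 / keep_prob, so its expected value is unchanged, and keep_prob = 1 is the identity. That is why evaluation feeds keep_prob: 1.0. The inverted_dropout name and the seeded generator are illustrative choices.

```python
import numpy as np


def inverted_dropout(a, keep_prob, rng):
    """Zero each entry with probability 1 - keep_prob and scale survivors by 1 / keep_prob."""
    if keep_prob >= 1.0:
        return a  # nothing is dropped at evaluation time
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob


rng = np.random.default_rng(0)
a = np.ones((1000, 100))
out = inverted_dropout(a, 0.5, rng)
print(np.unique(out))  # [0. 2.]
print(out.mean())      # close to 1.0, the mean of the input
```

With keep_prob = 0.5, half of every hidden layer is zeroed on each training step, which is the strong regularization the answer blames for the underfitting.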