CNN sudden drop in accuracy after steady increase over ~24 epochs - pandas

I'm attempting to create a CNN to classify emotions based on facial expressions in an image. I'm using this dataset I found on Kaggle, in CSV format, with each row containing grayscale image data and an emotion label. It has ~30,000 data points randomly split into 80% training, 10% validation and 10% testing.
While adjusting the settings and structure of my CNN I've experienced at least one of these problems at a time:
The CNN only (or mainly) predicting one or two outputs.
Validation accuracy that does not change at all.
Accuracy fluctuating around the equivalent of random guessing (16.66%), +/- 10%.
Constant accuracy and loss for both training and validation.
I've tried changing the learning rate, optimizer, batch size, number of filters, and number of epochs. I've used a different dataset and tried class weights for the slight imbalance, as well as varying numbers of hidden layers and complexities.
I've also tried training on one sample but got this for accuracy and this for loss.
The following graphs are for the configuration currently in the code, though changing what I've listed above just results in any of the problems I mentioned.
Train/Val Accuracy during training
Train/Val Loss during training
I suspected it might be due to labels being mismatched with the data, so I manually checked whether labels and images matched in train_set, which they did.
However, using the following code to check them in all_sets (after conversion) gave several mismatched samples out of 16.
for num, row in enumerate(all_sets[0]):
    if num < 16:
        newimage = tensorflow.keras.preprocessing.image.array_to_img(row)
        newimage.save("filename.png") # takes type from filename extension
        display(Image('filename.png'))
        print(classes[all_labels[0][num]])
        !rm -rf filename.png
Conversion code in question:
tempSets = [train_set, test_set, val_set] # store all sets in a list to iterate through
# pandas reads the image from the CSV as a string rather than an array of floats.
# Convert from pandas DataFrame to NumPy arrays, as they are easier to feed into image augmentation.
set_sizes = [train_set.shape[0], test_set.shape[0], val_set.shape[0]] # number of records in each set
all_sets = [np.empty([set_sizes[0], 48, 48, 1], dtype=float),  # train
            np.empty([set_sizes[1], 48, 48, 1], dtype=float),  # test
            np.empty([set_sizes[2], 48, 48, 1], dtype=float)]  # validate
all_labels = [np.empty(set_sizes[0], dtype=int),  # train
              np.empty(set_sizes[1], dtype=int),  # test
              np.empty(set_sizes[2], dtype=int)]  # validate
for count, val in enumerate(tempSets): # for each set
    for num, row in enumerate(val.itertuples()): # each row in a set, as a named tuple, keeping the index num
        num_str_list = row._2.split() # split the long string into a list of strings separated by whitespace (_2 is the ' pixels' column; the leading space makes itertuples rename it)
        for convertI in range(len(num_str_list)): # for each element in the string list
            num_str_list[convertI] = float(num_str_list[convertI]) # convert to float in place
        for xPixel in range(48):
            for yPixel in range(48):
                # match indexes to map the 1d list contents into 2d
                all_sets[count][num, xPixel, yPixel, 0] = num_str_list[xPixel*48 + yPixel]
        all_labels[count][num] = data['emotion'][num] # assign labels with matching index
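(For reference, the two inner pixel loops can also be written in a vectorized form; this is just a sketch assuming the same 48x48 row-major pixel string, not the code I actually ran.)
for count, val in enumerate(tempSets):
    for num, row in enumerate(val.itertuples()):
        pixels = np.array(row._2.split(), dtype=float) # 2304 floats from the whitespace-separated string
        all_sets[count][num] = pixels.reshape(48, 48, 1) # same row-major layout as xPixel*48 + yPixel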
I'm afraid I'm lost as to where the issue is, as I can't find problems in my conversion or in displaying the images. I would appreciate any help. Thank you.
My full code:
drive.mount('/content/drive', force_remount=False)
driveContent='/content/drive/My Drive/EmotionClassification'
#Set base_dir as current working directory
base_dir = os.getcwd()
#Check whether training data is present in current working directory
dataSetIsInColab = False
for fname in os.listdir(base_dir): # get all names in the current directory
    if fname == 'icml_face_data.csv':
        dataSetIsInColab = True
if dataSetIsInColab == False:
    # dataset.zip is a zip file containing a CSV file which holds the dataset.
    zip_path = os.path.join(driveContent, 'dataset.zip') # grab content from Drive
    !cp "{zip_path}" . # copy zip file to current working directory
    !unzip -q dataset.zip # unzip (-q prevents printing all files)
    !rm dataset.zip # remove zip file since we now have the unzipped data
data = panda.read_csv('icml_face_data.csv') # import data as panda dataframe object
data.pop(' Usage') #remove unnecessary column
data.rename(columns={" pixels":"pixels"}) # remove space from column name as it can cause issues
data[data['emotion']!=1] #remove disgust emotion due to large imbalance
data.loc[data['emotion'] > 1, 'emotion'] = data['emotion'] - 1 #shift emotion values to fill disgust slot
classes = ['anger', 'fear', 'happiness', 'sadness', 'surprise', 'neutral']
#Use panda dataframe methods to randomly split dataset
train_set = data.sample(frac=0.8) # Train set is 80%
temp_set = data.drop(train_set.index) # Temp set is 20%
test_set = temp_set.sample(frac=0.5) #Test set is half of 20%
val_set = temp_set.drop(test_set.index) #Val set is the other half of 20%
#train_set.reset_index(drop=True, inplace=True)
#test_set.reset_index(drop=True, inplace=True)
#val_set.reset_index(drop=True, inplace=True)
tempSets = [train_set, test_set, val_set] # store all sets in a list to iterate through
# pandas reads the image from the CSV as a string rather than an array of floats.
# Convert from pandas DataFrame to NumPy arrays, as they are easier to feed into image augmentation.
set_sizes = [train_set.shape[0], test_set.shape[0], val_set.shape[0]] # number of records in each set
all_sets = [np.empty([set_sizes[0], 48, 48, 1], dtype=float),  # train
            np.empty([set_sizes[1], 48, 48, 1], dtype=float),  # test
            np.empty([set_sizes[2], 48, 48, 1], dtype=float)]  # validate
all_labels = [np.empty(set_sizes[0], dtype=int),  # train
              np.empty(set_sizes[1], dtype=int),  # test
              np.empty(set_sizes[2], dtype=int)]  # validate
for count, val in enumerate(tempSets): # for each set
    for num, row in enumerate(val.itertuples()): # each row in a set, as a named tuple, keeping the index num
        num_str_list = row._2.split() # split the long string into a list of strings separated by whitespace (_2 is the ' pixels' column; the leading space makes itertuples rename it)
        for convertI in range(len(num_str_list)): # for each element in the string list
            num_str_list[convertI] = float(num_str_list[convertI]) # convert to float in place
        for xPixel in range(48):
            for yPixel in range(48):
                # match indexes to map the 1d list contents into 2d
                all_sets[count][num, xPixel, yPixel, 0] = num_str_list[xPixel*48 + yPixel]
        all_labels[count][num] = data['emotion'][num] # assign labels with matching index
#Check to use GPU if possible
gpu_name = tensorflow.test.gpu_device_name()
if gpu_name != '/device:GPU:0':
    print('GPU device not found')
else:
    print('Found GPU at: {}'.format(gpu_name))
#SET UP IMAGE AUGMENTATION AND INPUT
imgShape=(48,48,1) # input image dimensions (greyscale)
batchSize = 64
#this datagen is applied to training sets
dataGen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)
# Set up data generator batches, shapes and class modes
trainingDataGen = dataGen.flow(
    x=all_sets[0],
    y=tensorflow.keras.utils.to_categorical(all_labels[0]),
    batch_size=batchSize)
testGen = ImageDataGenerator(rescale=1./255).flow(
    x=all_sets[1],
    y=tensorflow.keras.utils.to_categorical(all_labels[1]))
validGen = ImageDataGenerator(rescale=1./255).flow(
    x=all_sets[2],
    y=tensorflow.keras.utils.to_categorical(all_labels[2]),
    batch_size=batchSize)
# BUILD THE MODEL
model = Sequential()
model.add(Conv2D(256, (3,3), activation='relu', input_shape = imgShape))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.1))
model.add(Conv2D(512, (3,3), activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(6, activation='softmax'))
model.summary() # Show the structure of the network
opt = tensorflow.keras.optimizers.Adam(learning_rate = 0.0001) # initialise optimizer
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
weights = sklearn.utils.class_weight.compute_class_weight('balanced', np.unique(all_labels[0]), all_labels[0]) # use class weights for imbalanced classes
weights = {i : weights[i] for i in range(6)}
history = model.fit(
trainingDataGen,
batch_size = batchSize,
epochs = 32,
class_weight = weights,
validation_data = validGen
)
EDIT: I found the issue in my conversion: I was assigning labels based on the unsplit data instead of each split set individually. I changed
all_labels[count][num] = data['emotion'][num] to
all_labels[count][num] = row.emotion
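For context, here is a minimal sketch of why the old line mismatched (using a hypothetical toy DataFrame): DataFrame.sample() keeps the original index, so the positional counter from enumerate() does not line up with data['emotion'][num].
demo = panda.DataFrame({'emotion': [0, 1, 2, 3, 4]}) # toy stand-in for the full dataset
subset = demo.sample(frac=0.8) # keeps the original index, e.g. [3, 0, 4, 2]
for num, row in enumerate(subset.itertuples()):
    print(num, row.emotion, demo['emotion'][num]) # row.emotion and demo['emotion'][num] often differ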
Labels are matching now. Training accuracy/loss steadily improves, but validation accuracy and loss are fluctuating a lot. I'm going to do some more tweaking, but is this something I should worry about?
EDIT 2: Made title more fitting
So I left it training for more epochs, and here are the results.
accuracy and loss.
Peaking at 35% isn't ideal
EDIT 3: It's back to fluctuating wildly.
accuracy and loss.
I used some code to test the predictions of my network after training. Here are the results:
Total tests performed: 4068
Total accuracy: 0.17649950835791545
anger : {'Successes': 0, 'Occured': 567} Accuracy: 0.0
fear : {'Successes': 0, 'Occured': 633} Accuracy: 0.0
happiness : {'Successes': 0, 'Occured': 1023} Accuracy: 0.0
sadness : {'Successes': 0, 'Occured': 685} Accuracy: 0.0
surprise : {'Successes': 0, 'Occured': 442} Accuracy: 0.0
neutral : {'Successes': 718, 'Occured': 718} Accuracy: 1.0
Code below:
#Test our model's performance
truth = []
predict = []
#Run 128 batches from the test generator through the model
for i in range(128):
    a, b = next(testGen)
    predict.append(model.predict(a))
    truth.append(b)
predict = np.concatenate(predict) # array of predictions the model has made
truth = np.concatenate(truth) # array of true labels
successes = 0
items = []
for i in range(len(classes)):
    items.append({"Successes" : 0, "Occured" : 0})
for i in range(0, len(predict)):
    items[np.argmax(truth[i])]["Occured"] += 1
    if (np.argmax(predict[i]) == np.argmax(truth[i])): # if the model's prediction matches the true value
        successes += 1
        items[np.argmax(truth[i])]["Successes"] += 1
print("Total tests performed: ", len(predict))
print("Total accuracy: ", successes/len(predict))
print()
for i in range(len(classes)):
    print(classes[i], ':', items[i], "Accuracy: ", (items[i]["Successes"]/items[i]["Occured"]))

Related

How to split mnist dataset into smaller size and adding augmentation to it?

I have a problem with splitting the MNIST dataset and adding augmented data. I want to take only 22,000 samples in total (training + test set) from the MNIST dataset, which has 70,000. MNIST has 10 labels. I'm only using shear, rotation, width-shift and height-shift as augmentation methods.
training set --> 20000 (total) --> 20 images + 1980 augmented images (per label)
test set --> 2000 (total) --> 200 images (per label)
I also want to make sure that the class distribution is preserved in the split.
I'm really confused about how to split this data. I would be glad if anyone could provide the code.
I have tried this code:
# Load the MNIST dataset
(x_train_full, y_train_full), (x_test_full, y_test_full) = keras.datasets.mnist.load_data()
# Normalize the data
x_train_full = x_train_full / 255.0
x_test_full = x_test_full / 255.0
# Create a data generator for data augmentation
data_gen = ImageDataGenerator(shear_range=0.2, rotation_range=20,
width_shift_range=0.2, height_shift_range=0.2)
# Initialize empty lists for the training and test sets
x_train, y_train, x_test, y_test = [], [], [], []
# Loop through each class/label
for class_n in range(10):
    # Get the indices of the images for this class
    class_indices = np.where(y_train_full == class_n)[0]
    # Select 20 images for training
    train_indices = np.random.choice(class_indices, 20, replace=False)
    # Append the training images and labels to the respective lists
    x_train.append(x_train_full[train_indices])
    y_train.append(y_train_full[train_indices])
    # Select 200 images for test
    test_indices = np.random.choice(class_indices, 200, replace=False)
    # Append the test images and labels to the respective lists
    x_test.append(x_test_full[test_indices])
    y_test.append(y_test_full[test_indices])
    # Generate 100 augmented images for training
    x_augmented = data_gen.flow(x_train_full[train_indices], y_train_full[train_indices], batch_size=100)
    # Append the augmented images and labels to the respective lists
    x_train.append(x_augmented[0])
    y_train.append(x_augmented[1])
# Concatenate the list of images and labels to form the final training and test sets
x_train = np.concatenate(x_train)
y_train = np.concatenate(y_train)
x_test = np.concatenate(x_test)
y_test = np.concatenate(y_test)
print("training set shape: ", x_train.shape)
print("training label shape: ", y_train.shape)
print("test set shape: ", x_test.shape)
print("test label shape: ", y_test.shape)
but it keeps giving an error like this:
IndexError: index 15753 is out of bounds for axis 0 with size 10000
You are mixing the train and test sets. In the loop, you are getting class_indices from the train set:
# Get the indices of the images for this class
class_indices = np.where(y_train_full == class_n)[0]
but then you are using these train indices (which might be numbers above 10000!) to index into the test set (which only has 10000 samples) a few lines further down:
# Select 200 images for test
test_indices = np.random.choice(class_indices, 200, replace=False)
So you will need to do the same index selection on the test labels inside the loop, and it should work out.
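For example, the test selection could look roughly like this (a sketch of the corrected lines, keeping the rest of the loop as-is):
# Get the indices of the images for this class from the *test* labels
test_class_indices = np.where(y_test_full == class_n)[0]
# Select 200 test images using indices that are valid for the test arrays
test_indices = np.random.choice(test_class_indices, 200, replace=False)
x_test.append(x_test_full[test_indices])
y_test.append(y_test_full[test_indices])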

XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

When training XGBoost iteratively on data too large to fit in memory, one may want to use "batches". The problem, however, is that each batch may not contain all 0,...,C labels. This leads to the error ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1].
Is there a way to train XGBoost where we just have some subset of the labels, which may not contain zero?
The code has structure similar to this:
train = module.trainloader
test = module.valloader
# Train on one minibatch to get started
sample = next(iter(loader))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())
params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
}
# Get initial model training
model = xgb.train(params, dtrain=X)
for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample
    X_train = xgb.DMatrix(X_train, labels=y_train)
    X_test = xgb.DMatrix(X_test)
    model = xgb.train(params, dtrain=X_train, xgb_model=model)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)

pytorch isn't running on gpu while true

I want to train on my local GPU, but training only runs on the CPU, even though torch.cuda.is_available() is true and I can see my GPU. How do I fix it?
My CNN model:
import torch.nn as nn
import torch.nn.functional as F
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 28 * 28 -> 500)
        self.fc1 = nn.Linear(64 * 28 * 28, 500)
        # linear layer (500 -> 133)
        self.fc2 = nn.Linear(500, 133)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)
    def forward(self, x):
        ## Define forward behavior
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 28 * 28)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add 2nd hidden layer (output layer)
        x = self.fc2(x)
        return x
#-#-# You do NOT have to modify the code below this line. #-#-#
# instantiate the CNN
model_scratch = Net()
# move tensors to GPU if CUDA is available
if use_cuda:
    print("TRUE")
    model_scratch = model_scratch.cuda()
My train function:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf
    loaders_scratch = {'train': train_loader, 'valid': valid_loader, 'test': test_loader}
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()*data.size(0)
        ######################
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update average validation loss
            valid_loss += loss.item()*data.size(0)
        # calculate average losses
        train_loss = train_loss/len(train_loader.dataset)
        valid_loss = valid_loss/len(valid_loader.dataset)
        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch,
            train_loss,
            valid_loss
        ))
        ## TODO: save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min,
                valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
    # return trained model
    return model
# train the model
loaders_scratch = {'train': train_loader, 'valid': valid_loader, 'test': test_loader}
model_scratch = train(100, loaders_scratch, model_scratch, optimizer_scratch,
                      criterion_scratch, use_cuda, 'model_scratch.pt')
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
torch.cuda.is_available() returns True (the code prints "TRUE"), but training still does not run on the GPU.
It only runs on the CPU.
The picture below shows the CPU at 62% utilization while training.
To utilize CUDA in PyTorch you have to specify that you want to run your code on the GPU device.
A line of code like:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
will determine whether CUDA is available, and if so, you will have it as your device.
Later in the code you have to pass your tensors and model to this device:
net = net.to(device)
and do the same for your other tensors that need to go to the GPU, like the test and training values.
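Inside a typical training loop this could look roughly like the sketch below (the variable names follow the ones in the question, but treat it as an illustration rather than a drop-in fix):
import torch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model_scratch = model_scratch.to(device) # move the model's parameters to the GPU once
for data, target in loaders_scratch['train']:
    data, target = data.to(device), target.to(device) # move each batch to the same device
    output = model_scratch(data)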
If your model is only using the CPU during training even though your GPU is available, the bottleneck is likely the data loading and transformation process. When loading images from a local directory and applying transforms on the fly, the majority of training time is spent loading and transforming data, which happens on the CPU.
To resolve this, apply your custom transforms once, save the preprocessed results, and load those during training. The GPU then spends far less time waiting on the CPU, which can significantly reduce training time.
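A rough sketch of that idea (the folder path, image size, and file name here are placeholders, not values from the question):
import torch
from torchvision import datasets, transforms
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder('data/train', transform=transform) # placeholder path
# apply the transforms once and cache the results as tensors
images = torch.stack([img for img, _ in dataset])
labels = torch.tensor([label for _, label in dataset])
torch.save((images, labels), 'train_preprocessed.pt')
# later: load the preprocessed tensors and build a fast in-memory loader
images, labels = torch.load('train_preprocessed.pt')
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(images, labels), batch_size=64, shuffle=True)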

Custom TensorFlow loss function with batch size > 1?

I have a neural network with the following code snippets; note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape=[batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape=[batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)
# Some matmuls, activations, dropouts, normalizations...
net_out = tf.tanh(output_before_activation)
def loss_fn(output, input):
    # input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim,]) # shape them into 1d vectors
    input = tf.reshape(input, [input_dim,])
    return my_fn_that_only_takes_in_vectors(output, input)
# Create session, preprocess data ...
for epoch in epoch_num:
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict={input_placeholder: some_appropriate_numpy_array})
        sess.run(optimizer.minimize(loss_fn(net_out, net_in)))
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accommodate matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How would I write my new custom loss_fn now that the input and output are N x input_dim where N > 1? In Keras this would not have been an issue because keras somehow takes the average of the gradients of each example in the batch. For my TensorFlow function, should I take each row as a vector individually, pass them to my_fn_that_only_takes_in_vectors, then take the average of the results?
You can use a function that computes the loss on the whole batch, and works independently on the batch size. Basically the operations are applied to the whole first dimension of the input (the first dimension represents the element number in the batch). Here is an example, I hope this helps to see how the operations are carried out:
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2]) # shape: (BatchSize,)
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3]) # shape: (BatchSize,)
    denominator = dx2 + dy2 # shape: (BatchSize,)
    dst_vec = tf.math.squared_difference(y_true, y_pred) # shape: (BatchSize, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1) # shape: (BatchSize,)
    loss_vector = tf.cast(numerator / denominator, dtype="float32") # shape: (BatchSize,) - the loss of each element of the batch
    loss = tf.reduce_sum(loss_vector) # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the average of the losses for the batch.
If you sum, make sure to use a validation dataset with the same batch size, otherwise the loss is not comparable.
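As a quick sanity check (hypothetical values, assuming TF 2.x eager execution and four labels per sample as in the sketch above), the same function works unchanged for any batch size:
import tensorflow as tf
y_true = tf.random.uniform((8, 4)) # batch of 8 samples, 4 labels each
y_pred = tf.random.uniform((8, 4))
print(my_loss(y_true, y_pred)) # a single scalar loss for the whole batch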

Good training accuracy but bad evaluation

I trained a DNN model and got good training accuracy but bad evaluation accuracy.
def DNN_Metrix(shape, dropout):
    model = tf.keras.Sequential()
    print(shape)
    model.add(tf.keras.layers.Flatten(input_shape=shape))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
    for i in range(0, 2):
        model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dense(8, activation=tf.nn.tanh))
    model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model
model_dnn = DNN_Metrix(shape=(28,20,1), dropout=0.1)
model_dnn.fit(
    train_dataset,
    steps_per_epoch=1000,
    epochs=10,
    verbose=2
)
Here is my training process, and result:
Epoch 10/10
- 55s - loss: 0.4763 - acc: 0.7807
But when I evaluated on the test dataset, I got:
result = model_dnn.evaluate(np.array(X_test), np.array(y_test), batch_size=len(X_test))
loss, accuracy = [0.9485417604446411, 0.3649936616420746]
It's a binary classification; the positive : negative label ratio is about
0.37 : 0.63
I don't think this is a result of overfitting: I have 700k instances for training, with shape 28 * 20, and my DNN model is simple and has few parameters.
Here is my code when generating the test data and training data:
def parse_function(example_proto):
    dics = {
        'feature': tf.FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
        'label': tf.FixedLenFeature(shape=(2), dtype=tf.float32),
        'shape': tf.FixedLenFeature(shape=(2), dtype=tf.int64)
    }
    parsed_example = tf.parse_single_example(example_proto, dics)
    parsed_example['feature'] = tf.decode_raw(parsed_example['feature'], tf.float64)
    parsed_example['feature'] = tf.reshape(parsed_example['feature'], [28,20,1])
    label_t = tf.cast(parsed_example['label'], tf.int32)
    parsed_example['label'] = parsed_example['label'][1]
    return parsed_example['feature'], parsed_example['label']
def read_tfrecord(train_tfrecord):
    dataset = tf.data.TFRecordDataset(train_tfrecord)
    dataset = dataset.map(parse_function)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(100)
    dataset = dataset.batch(670)
    return dataset
def read_tfrecord_test(test_tfrecord):
    dataset = tf.data.TFRecordDataset(test_tfrecord)
    dataset = dataset.map(parse_function)
    return dataset
# tf_record_target = 'train_csv_temp_norm_vx.tfrecords'
train_files = 'train_baseline.tfrecords'
test_files = 'test_baseline.tfrecords'
train_dataset = read_tfrecord(train_files)
test_dataset = read_tfrecord_test(test_files)
it_test_dts = test_dataset.make_one_shot_iterator()
it_train_dts = train_dataset.make_one_shot_iterator()
X_test = []
y_test = []
el = it_test_dts.get_next()
count = 1
with tf.Session() as sess:
    while True:
        try:
            x_t, y_t = sess.run(el)
            X_test.append(x_t)
            y_test.append(y_t)
        except tf.errors.OutOfRangeError:
            break
Judging from the fact that the class distribution in your test set is [37%-63%] and your final accuracy is 0.365, I would first check the labels predicted on the test set.
Most probably, all your predictions are of class 0, given that class 0 amounts to 37% of your dataset. In that case, it means that your neural network is not able to learn anything useful from the training set, and you have a massive overfitting scenario.
I recommend that you always use a validation set, so that at the end of each epoch you can check whether your neural network has learnt anything. In a situation like yours, you would spot the overfitting issue very quickly.
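A quick way to check the predicted labels (a sketch, assuming the model_dnn, X_test and y_test objects from your code):
import numpy as np
probs = model_dnn.predict(np.array(X_test)) # sigmoid outputs, shape (n_samples, 1)
pred_labels = (probs > 0.5).astype(int).ravel() # threshold at 0.5 for the binary case
print(np.unique(pred_labels, return_counts=True)) # if only one class appears, the model has collapsed to a single prediction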
Training accuracy doesn't mean much. A NN can fit any random set of inputs and outputs, even if they're unrelated. That's why you want to use validation data.
After training, look at your loss curves; this will give you a better idea of where things are going wrong.
NNs default to just guessing the most popular class seen in the training data for classification problems. This is usually what happens when you haven't set up your experiment correctly.
And since you're dealing with binary classification, you might want to look at something like StratifiedKFold, which will provide folds of train/test data where the class percentages are preserved.
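Something along these lines (a sketch; X and y stand in for your full feature array and binary label array):
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    X_tr, X_te = X[train_idx], X[test_idx] # each fold keeps roughly the 37/63 class ratio
    y_tr, y_te = y[train_idx], y[test_idx]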