Why am I obtaining high MAE values when constructing my deep neural network - tensorflow

I am trying to do hyperparameter optimization for a regression problem using a deep neural network.
I am getting slightly high MAE values (5-7) when in the literature this is around 0.9-5 (using the same dataset to train the NN).
Any idea of what I could improve? I can provide the dataset if needed, but it's very large.
X looks like this:
Y looks like this:
X is composed of 865432 rows=features and 134 columns=samples where rows are ordered by decreasing pearson correlation with variable Y which is the age of each sample (a float variable).
Each entry in X is a number between 0-1.
Since there are too many features for the number of samples I decided to take only the 40000 most important features. (Also because I don't know how to include feature selection within the training of the model).
Here is the loss vs epochs:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasRegressor
import keras.losses
from keras import backend as K
import keras_tuner as kt
from keras_tuner.tuners import RandomSearch
from sklearn.model_selection import cross_val_score
def dynamic_model(hp):
# hyperparameters (independent of number of layers)
# Number of hidden layers: 1 - 50
hp_num_layers = hp.Int("num_layers", 2, 10)
hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
# Initialize sequential API and start building model.
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=(CpG_num,)))
# Tune the number of hidden layers and units in each
for i in range(1, hp_num_layers):
# hyperparameters (dependent on num_layers):
hp_units = hp.Int(f"units_{i}", min_value=100, max_value=400, step=100)
hp_dropout_rate = hp.Float(f"dropout_{i}", 0, 0.5, step=0.1)
hp_activation = hp.Choice(f"act_{i}", ['LeakyReLU', 'PReLU'])
model.add(
keras.layers.Dense(units=hp_units, kernel_initializer='normal', activation=hp_activation)
)
model.add(keras.layers.Dropout(hp_dropout_rate))
# Add output layer
model.add(keras.layers.Dense(units=1, kernel_initializer='normal'))
# Define optimizer, loss, and metrics
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
loss=loss_function,
metrics=metrics
)
return model
def correlation_coefficient_loss(y_true, y_pred):
x = y_true
y = y_pred
mx = K.mean(x)
my = K.mean(y)
xm, ym = x-mx, y-my
print(type(mx))
r_num = K.sum(tf.multiply(xm,ym))
r_den = K.sqrt(tf.multiply(K.sum(K.square(xm)), K.sum(K.square(ym))))
r = r_num / r_den
r = K.maximum(K.minimum(r, 1.0), -1.0)
return (1 - K.square(r))
loss_function = 'mae'
metrics = ['mae', correlation_coefficient_loss]
scoring_function = 'neg_mean_absolute_error'
tuner_kind = 'random'
results_dir = '../tmp_files/Grid_Search_results/'+name+tuner_kind
hp = kt.HyperParameters()
if tuner_kind == 'random':
tuner = kt.RandomSearch(
dynamic_model,
objective = 'mae',
overwrite=True,
directory = results_dir,
project_name = 'random_tuner_trials',
max_trials = 500
)
tuner.search(
X1[name],Y1[name],
epochs=50,
validation_data=(X2[name],Y2[name]),
)
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
best_hps_model = dynamic_model(best_hps)
best_hps_model.fit(X1_train[name],y1_train[name],validation_data=(X2_train[name],y2_train[name]), epochs=500, batch_size=10)

Related

my CNN model claims it has 70+% accurcy but when i test it its actually 6%

im having an odd bug that i cant seem to debug
#import all the necessary libraries and be specific so as to avoid wasting time importing everything
import sys
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras import optimizers
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import sklearn.metrics as metrics
from sklearn.metrics import confusion_matrix
import seaborn as sns
model_type = 'vgg16'
# Loading the VGG Model
vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(200,200,3))
vgg_model.trainable = False
model = tf.keras.Sequential([vgg_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dropout(0.1),
tf.keras.layers.Dense(512, activation= "relu"),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.1),
tf.keras.layers.Dense(16, activation="softmax")
])
datagen = ImageDataGenerator(featurewise_center=True)
#retrieve the trian data using imagedatagen
train = datagen.flow_from_directory('/content/16_flowers/Train/',
class_mode='categorical', color_mode= 'rgb',batch_size=64, target_size=(200, 200))
test = datagen.flow_from_directory('/content/16_flowers/Test/',
class_mode='categorical', color_mode= 'rgb',batch_size=10, target_size=(200, 200))
base_learning_rate = 0.00005
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),loss = tf.keras.metrics.categorical_crossentropy,metrics=['accuracy'])
history = model.fit(train,epochs = 100 , validation_data = test)
#summarise_diagnostics(history)
model.save("vgg16CNN.model")
_, acc = model.evaluate(test, steps=len(test), verbose=0)
print('> %.3f' % (acc * 100.0))
plot_cm(model)
my dataset is a dataset of 16 flowers, the training data is 70 images per class and my validation/ test data is 10 images per class. when i run the code i get approx 99% training accuracy and 76% validation accuracy with 0.07 training loss and 0.7 validation loss.
but when i take the model which is produced and i use it to make predictions on the test dataset and then compare its predictions to the true classes i get around 6.25% accuracy, can anyone tell me why this might be?
below is my code to make predictions and retrun the accuracy of those predictions
datagen = ImageDataGenerator(featurewise_center=True)
test = datagen.flow_from_directory('/content/16_flowers/Test/',
class_mode='categorical', color_mode= 'rgb',batch_size=10, target_size=(200, 200))
predictions = model.predict(test)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test.classes
count = 0
for i in range(len(predicted_classes)):
print(true_classes[i],predicted_classes[i])
if true_classes[i] == predicted_classes[i]:
count +=1
print(count/160*100)
For test in flow_from_directory set shuffle=False to preserve the order of the predictions with respect to the files. Then use
classes=list(train_gen.class_indices.keys())
count=0
predictions = model.predict(test)
for i,p in enumerate(predictions):
index=np.argmax(p)
predicted_class=classes[index]
true_index=test.labels[i]
true_class= classes[true_index]
print('True class is ', true_class, . ' Predicted class is ', predicted class)
if index == true_index:
count +=1
accuracy = count * 100/len(predictions)
print('Accuracy om Test set is ', accuracy)

Keras/TensorFlow MNIST DCGAN: why does generator has almost zero loss from start?

I have constructed a DCGAN (deep convolutional generative adversarial network) inspired by this github repository. It is written in a more low level Tensorflow code that I tried transforming into Keras syntax instead.
Now, the network is quite heavy I think (around 4 million parameters), and I get this problem that during training that the generator network beats the discriminator network by a lot. I have not found any similar posts about this problem, since most of the time it is the discriminator that beats the generator (when in fact being fooled), or that we have mode collapse. So I am thinking there might be something wrong in the code (maybe the discriminator network is training when it shouldn't, or the loss function is wrong etc.). I have tried spotting the mistake but failed.
My code follows below:
from keras.models import Sequential
from keras.layers import Dense, Reshape, ReLU, LeakyReLU, BatchNormalization as BN#, tanh, sigmoid
from keras.layers.core import Activation, Flatten
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import UpSampling2D, Conv2D, MaxPooling2D, Conv2DTranspose
from keras.optimizers import SGD, Adam
from keras.datasets import mnist
import time
import numpy as np
import math
from utils import load_mnist, load_lines, load_celebA
class dcgan(object):
def __init__(self, config):
"""
Args:
batch_size: The size of batch. Should be specified before training.
y_dim: (optional) Dimension of dim for y. [None]
z_dim: (optional) Dimension of dim for Z. [100]
gf_dim: (optional) Dimension of G filters in first conv layer. [64]
df_dim: (optional) Dimension of D filters in first conv layer. [64]
gfc_dim: (optional) Dimension of G units for for fully connected layer. [1024]
dfc_dim: (optional) Dimension of D units for fully connected layer. [1024]
c_dim: (optional) Dimension of image color. For grayscale input, set to 1. [3]
"""
self.build_model(config)
def build_model(self,config):
self.D = self.discriminator(config)
self.G = self.generator(config)
self.GAN = Sequential()
self.GAN.add(self.G)
self.D.trainable = False
self.GAN.add(self.D)
def discriminator(self,config):
input_shape = (config.x_h,config.x_w,config.x_d)
D = Sequential()
D.add(Conv2D(filters=config.df_dim,strides=2,padding='same',kernel_size=5,input_shape=input_shape))
D.add(LeakyReLU(alpha=0.2))
D.add(Conv2D(filters=config.df_dim*2,strides=2,padding='same',kernel_size=5))
D.add(BN(momentum=0.9,epsilon=1e-5))
D.add(LeakyReLU(alpha=0.2))
D.add(Conv2D(filters=config.df_dim*4,strides=2,padding='same',kernel_size=5))
D.add(BN(momentum=0.9,epsilon=1e-5))
D.add(LeakyReLU(alpha=0.2))
D.add(Conv2D(filters=config.df_dim*8,strides=2,padding='same',kernel_size=5))
D.add(BN(momentum=0.9,epsilon=1e-5))
D.add(LeakyReLU(alpha=0.2))
D.add(Flatten())
D.add(Dense(1))
D.add(Activation('sigmoid'))
print('D:')
D.summary()
return D
def generator(self,config):
G = Sequential()
G.add(Dense(input_dim=config.z_dim, units=config.gf_dim*8*4*4))
G.add(Reshape((4,4,config.gf_dim*8)))
G.add(BN(momentum=0.9,epsilon=1e-5))
G.add(ReLU())
G.add(Conv2DTranspose(filters=config.gf_dim*4,strides=2,padding='same',kernel_size=5))
G.add(BN(momentum=0.9,epsilon=1e-5))
G.add(ReLU())
G.add(Conv2DTranspose(filters=config.gf_dim*2,strides=2,padding='same',kernel_size=5))
G.add(BN(momentum=0.9,epsilon=1e-5))
G.add(ReLU())
if config.dataset not in ['mnist','lines']:
#more layers could (and should) be added in order to get correct output size of G
G.add(Conv2DTranspose(filters=config.gf_dim,strides=2,padding='same',kernel_size=5))
G.add(BN(momentum=0.9,epsilon=1e-5))
G.add(ReLU())
G.add(Conv2DTranspose(filters=config.c_dim,strides=2,padding='same',kernel_size=5))
G.add(Activation('tanh'))
print('G:')
G.summary()
return G
def train(self,config):
if config.dataset == 'mnist':
(X_train, y_train), (X_test, y_test) = load_mnist()
X_train = (X_train.astype(np.float32) - 127.5)/127.5
elif config.dataset == 'lines':
(X_train, y_train), (X_test, y_test) = load_lines()
elif config.dataset == 'celebA':
(X_train, y_train), (X_test, y_test) = load_celebA()
D_optim = Adam(learning_rate=config.learning_rate, beta_1=config.beta_1)
G_optim = Adam(learning_rate=config.learning_rate, beta_1=config.beta_1)
loss_f = 'binary_crossentropy'
#Compile models
self.D.compile(loss=loss_f,optimizer=D_optim)
self.D.trainable = True
self.G.compile(loss=loss_f,optimizer=G_optim)
self.GAN.compile(loss=loss_f,optimizer=G_optim)
batches = int(len(X_train)/config.batch_size) #int always rounds down --> no problem with running out of data
counter = 1
print('\n' * 1)
print('='*42)
print('-'*10,'Training initialized.','-'*10)
print('='*42)
print('\n' * 2)
start_time = time.time()
for epoch in range(config.epochs):
for batch in range(batches):
batch_X_real = X_train[int(batch*config.batch_size/2):int((batch+1)*config.batch_size/2)][np.newaxis].transpose(1,2,3,0)
batch_z = np.random.normal(0,1,size=(config.batch_size,config.z_dim))
batch_X_fake = self.G.predict(batch_z[0:int(config.batch_size/2)])
batch_X = np.concatenate((batch_X_real,batch_X_fake),axis=0)
batch_yd = np.concatenate((np.ones(int(config.batch_size/2)),np.zeros((int(config.batch_size/2)))))
batch_yg = np.ones((config.batch_size))
#maybe normalize values in X?
#Update D network
self.D.trainable = True
D_loss = self.D.train_on_batch(batch_X, batch_yd)
#Update G network
self.D.trainable = False
G_loss = self.GAN.train_on_batch(batch_z, batch_yg)
#Update G network again according to https://github.com/carpedm20/DCGAN-tensorflow.git
#G_loss = self.GAN.train_on_batch(batch_z, batch_yg)
# Ta tid på körningen
# print("[%8d Epoch:[%2d/%2d] [%4d/%4d] time: %4.4f, d_loss: %.8f, g_loss: %.8f" \
# % (counter, epoch, config.epoch, idx, batch_idxs,
# time.time() - start_time, errD_fake+errD_real, errG))
#Save losses to vectors in order to plot
#Print status and save images for each config.sample_freq iterations
if np.mod(counter,config.sample_freq) == 0:
print('Epoch: {}/{} | Batch: {}/{} | D-loss {} | G-loss {} | Time: {}'.format(epoch+1,config.epochs,batch+1,batches,D_loss,G_loss,time.time() - start_time))
counter += 1
print('\n' * 2)
print('='*38)
print('-'*10,'Training complete.','-'*10)
print('='*38)
The program runs slow, but if you try running it using this chunk of code:
#import model
from setup import model_config
#create configuration object
config = model_config(dataset='mnist',loadmodel=False, interpolation=False,epochs=20,batch_size=64,
z_dim=100,gf_dim=64,df_dim=64,gfc_dim=1024,dfc_dim=1024,
c_dim=1,sample_freq=10) # >> model=None << ny parameter!
if config.loadmodel:
# Pass model to model parameter in config, vet inte hur man gör
# model1 = LoadModel('Generator')
# model2 = LoadModel('Discriminator')
# model3 = LoadModel('DG')
#load existing model
pass
else:
dcgan = dcgan(config)
dcgan.train(config)
if config.interpolation:
#do interpolation
pass
it will start printing out progress and losses. I am certain there is some obvious error somewhere! If I have missed something, let me know what I can add in order to make this a better post!

Time-Series LSTM Model wrong prediction

I am practicing how to create an LSTM model on a univariate series using this dataset from Kaggle: https://www.kaggle.com/sumanthvrao/daily-climate-time-series-data
My issue is that I am unable to get an accurate prediction of the temperature and my loss seems to be going all over the place. I have tried multiple methods including
Ensuring that time series data is stationary
Changing the time steps
Changing the hyperparameters
Using a stacked LSTM model
I am really curious as to what is wrong with my code although I do have a few hypothesis:
I made an error when preprocessing the data
I introduced stationarity wrongly
This dataset requires a multivariate approach
%tensorflow_version 2.x # this line is not required unless you are in a notebook
import tensorflow as tf
from numpy import array
from numpy import argmax
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
# preparing independent and dependent features
def prepare_data(timeseries_data, n_features):
X, y =[],[]
for i in range(len(timeseries_data)):
# find the end of this pattern
end_ix = i + n_features
# check if we are beyond the sequence
if end_ix > len(timeseries_data)-1:
break
# gather input and output parts of the pattern
seq_x, seq_y = timeseries_data[i:end_ix], timeseries_data[end_ix]
X.append(seq_x)
y.append(seq_y)
return np.array(X), np.array(y)
# preparing independent and dependent features
def prepare_x_input(timeseries_data, n_features):
x = []
for i in range(len(timeseries_data)):
# find the end of this pattern
end_ix = i + n_features
# check if we are beyond the sequence
if end_ix > len(timeseries_data):
break
# gather input and output parts of the pattern
seq_x = timeseries_data[i:end_ix]
x.append(seq_x)
x = x[-1:]
#remove non-stationerity
#x = np.log(x)
return np.array(x)
#read data and filter temperature column
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Weather Parameter/DailyDelhiClimateTrain.csv')
df.head()
temp_df = df.pop('meantemp')
plt.plot(temp_df)
#make data stationery
sta_temp_df = np.log(temp_df).diff()
plt.figure(figsize=(15,5))
plt.plot(sta_temp_df)
print(sta_temp_df)
time_step = 7
x, y = prepare_data(sta_temp_df, time_step)
n_features = 1
x = x.reshape((x.shape[0], x.shape[1], n_features))
model = Sequential()
model.add(LSTM(10, return_sequences=True, input_shape=(time_step, n_features)))
model.add(LSTM(10))
model.add(Dense(16, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
result = model.fit(x, y, epochs=800)
n_days = 113
pred_temp_df = list(temp_df)
test = sta_temp_df.copy()
sta_temp_df = list(sta_temp_df)
i = 0
while(i<n_days):
x_input = prepare_x_input(sta_temp_df, time_step)
print(x_input)
x_input = x_input.reshape((1, time_step, n_features))
#pass data into model
yhat = model.predict(x_input, verbose=0)
yhat.flatten
print(yhat[0][0])
sta_temp_df.append(yhat[0][0])
i = i+1
sta_temp_df[0] = np.log(temp_df[0])
cum_temp_df = np.exp(np.cumsum(sta_temp_df))
print(cum_temp_df)
My code is shown above. Would really appreciate if someone can identify what I did wrong here!

Loss function with derivative in TensorFlow 2

I am using TF2 (2.3.0) NN to approximate the function y which solves the ODE: y'+3y=0
I have defined cutsom loss class and function in which I am trying to differentiate the single output with respect to the single input so the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf
class CustomLossOde(Loss):
def __init__(self, x, model, name='ode_loss'):
super().__init__(name=name)
self.x = x
self.model = model
def call(self, y_true, y_pred):
with tf.GradientTape() as tape:
tape.watch(self.x)
y_p = self.model(self.x)
dy_dx = tape.gradient(y_p, self.x)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
for now I am getting 0 loss from the fisrt epoch, which doesn't make any sense.
I have printed both y_true and y_test from within the function and they seem OK so I suspect that the problem is in the gradien which I didn't succeed to print.
Apprecitate any help
Defining a custom loss with the high level Keras API is a bit difficult in that case. I would instead write the training loop from scracth, as it allows a finer grained control over what you can do.
I took inspiration from those two guides :
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tape can interact seamlessly. I use one to compute the loss function, the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
# using the high level tf.data API for data handling
x_train = tf.reshape(x_train,(-1,1))
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(1)
opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x,y_true) in enumerate(dataset):
# we need to convert x to a variable if we want the tape to be
# able to compute the gradient according to x
x_variable = tf.Variable(x)
with tf.GradientTape() as model_tape:
with tf.GradientTape() as loss_tape:
loss_tape.watch(x_variable)
y_pred = model(x_variable)
dy_dx = loss_tape.gradient(y_pred, x_variable)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
grad = model_tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grad, model.trainable_variables))
if step%20==0:
print(f"Step {step}: loss={loss.numpy()}")

Convolutional neural network outputting equal probabilities for all labels

I am currently training a CNN on MNIST, and the output probabilities (softmax) are giving [0.1,0.1,...,0.1] as training goes on. The initial values aren't uniform, so I can't figure out if I'm doing something stupid here?
I'm only training for 15 steps, just to see how training progresses; even though that's a low number, I don't think that should result in uniform predictions?
import numpy as np
import tensorflow as tf
import imageio
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
# Getting data
from sklearn.model_selection import train_test_split
def one_hot_encode(data):
new_ = []
for i in range(len(data)):
_ = np.zeros([10],dtype=np.float32)
_[int(data[i])] = 1.0
new_.append(np.asarray(_))
return new_
data = np.asarray(mnist["data"],dtype=np.float32)
labels = np.asarray(mnist["target"],dtype=np.float32)
labels = one_hot_encode(labels)
tr_data,test_data,tr_labels,test_labels = train_test_split(data,labels,test_size = 0.1)
tr_data = np.asarray(tr_data)
tr_data = np.reshape(tr_data,[len(tr_data),28,28,1])
test_data = np.asarray(test_data)
test_data = np.reshape(test_data,[len(test_data),28,28,1])
tr_labels = np.asarray(tr_labels)
test_labels = np.asarray(test_labels)
def get_conv(x,shape):
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
biases = tf.Variable(tf.random_normal([shape[-1]],stddev=0.05))
conv = tf.nn.conv2d(x,weights,[1,1,1,1],padding="SAME")
return tf.nn.relu(tf.nn.bias_add(conv,biases))
def get_pool(x,shape):
return tf.nn.max_pool(x,ksize=shape,strides=shape,padding="SAME")
def get_fc(x,shape):
sh = x.get_shape().as_list()
dim = 1
for i in sh[1:]:
dim *= i
x = tf.reshape(x,[-1,dim])
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
return tf.nn.relu(tf.matmul(x,weights) + tf.Variable(tf.random_normal([shape[1]],stddev=0.05)))
#Creating model
x = tf.placeholder(tf.float32,shape=[None,28,28,1])
y = tf.placeholder(tf.float32,shape=[None,10])
conv1_1 = get_conv(x,[3,3,1,128])
conv1_2 = get_conv(conv1_1,[3,3,128,128])
pool1 = get_pool(conv1_2,[1,2,2,1])
conv2_1 = get_conv(pool1,[3,3,128,512])
conv2_2 = get_conv(conv2_1,[3,3,512,512])
pool2 = get_pool(conv2_2,[1,2,2,1])
conv3_1 = get_conv(pool2,[3,3,512,1024])
conv3_2 = get_conv(conv3_1,[3,3,1024,1024])
conv3_3 = get_conv(conv3_2,[3,3,1024,1024])
conv3_4 = get_conv(conv3_3,[3,3,1024,1024])
pool3 = get_pool(conv3_4,[1,3,3,1])
fc1 = get_fc(pool3,[9216,1024])
fc2 = get_fc(fc1,[1024,10])
softmax = tf.nn.softmax(fc2)
loss = tf.losses.softmax_cross_entropy(logits=fc2,onehot_labels=y)
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(15):
print(i)
indices = np.random.randint(len(tr_data),size=[200])
batch_data = tr_data[indices]
batch_labels = tr_labels[indices]
sess.run(train_step,feed_dict={x:batch_data,y:batch_labels})
Thank you so much.
There are several issues with your code, including elementary ones. I strongly suggest you first go through the Tensorflow step-by-step tutorials for MNIST, MNIST For ML Beginners and Deep MNIST for Experts.
In short, regarding your code:
First, your final layer fc2 should not have a ReLU activation.
Second, the way you build your batches, i.e.
indices = np.random.randint(len(tr_data),size=[200])
is by just grabbing random samples in each iteration, which is far from the correct way of doing so...
Third, the data you feed into the network are not normalized in [0, 1], as they should be:
np.max(tr_data[0]) # get the max value of your first training sample
# 255.0
The third point was initially puzzling for me, too, since in the aforementioned Tensorflow tutorials they don't seem to normalize the data either. But close inspection revealed the reason: if you import the MNIST data through the Tensorflow-provided utility functions (instead of the scikit-learn ones, as you do here), they come already normalized in [0, 1], something that is nowhere hinted at:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
np.max(mnist.train.images[0])
# 0.99607849
This is an admittedly strange design decision - as far as I am aware of, in all other similar cases/tutorials normalizing the input data is an explicit part of the pipeline (see e.g. the Keras example), and with good reason (it is something you will be certainly expected to do yourself later, when using your own data).