I have same piece of code written first in eager execution mode and then in graph mode. Now, I am not quite able to figure out why the GRU state is not retained in the graph mode while it's working fine in eager mode.
Here's the eager mode code:
import tensorflow as tf
import xxhash
import numpy as np
tf.enable_eager_execution()
rnn_units = 1024
def hash_code(arr):
return xxhash.xxh64(arr).hexdigest()
model = tf.keras.Sequential([tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform', batch_input_shape=[1, None, 256])])
lstm_wt = np.load('lstm_wt.npy', allow_pickle=True) # fixed weights for comparison
lstm_re_wt = np.load('lstm_re_wt.npy', allow_pickle=True)
lstm_bias = np.load('lstm_bias.npy', allow_pickle=True)
model.layers[0].set_weights([lstm_wt, lstm_re_wt, lstm_bias])
op_embed = np.load('op_embed.npy', allow_pickle=True) # fixed input
op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))
op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))
model.layers[0].reset_states() # now reset the state, you'll get back the initial output.
op_lstm = model(op_embed)
print(hash_code(op_lstm.numpy()))
Output of this code:
d092fdb4739588a3
cdfdf8b8e292c6e8
d092fdb4739588a3
Now, the graph model code:
import tensorflow as tf
import xxhas
import numpy as np
# checking lstm
op_embed = np.load('op_embed.npy', allow_pickle=True)
# load op_embed, lstm weights
lstm_wt = np.load('lstm_wt.npy', allow_pickle=True)
lstm_re_wt = np.load('lstm_re_wt.npy', allow_pickle=True)
lstm_bias = np.load('lstm_bias.npy', allow_pickle=True)
rnn_units = 1024
layers = tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform')
x_placeholder = tf.placeholder(shape=op_embed.shape, dtype=tf.float32)
op_lstm = layers(x_placeholder)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
layers.set_weights([lstm_wt, lstm_re_wt, lstm_bias])
tf.assign(layers.weights[0],lstm_wt ).eval(sess)
tf.assign(layers.weights[1], lstm_re_wt).eval(sess)
tf.assign(layers.weights[2], lstm_bias).eval(sess)
print('keras op hash',xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder:op_embed})).hexdigest())
print('keras op hash',xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder:op_embed})).hexdigest())
output:
keras op hash d092fdb4739588a3
keras op hash d092fdb4739588a3
Any insights on how to fix this ambiguity and retain states in case of graph mode?
There's a similar question asked before but unanswered. Statefulness in eager mode vs non-eager mode
Specifying the Solution here (Answer Section) even though it is present in the Link provided in the Question, for the Benefit of the Community.
The Recurrent Neural Network (RNN or GRU or LSTM) loses its state while executing it in Non-Eager-Mode/Graph-Mode by default.
If we want to retain the state, we need to pass the Initial State during RNN call, as shown below:
current_state = np.zeros((1,1))
state_placeholder = tf.placeholder(tf.float32, shape=[1, 1])
output, state = rnn(x, initial_state=state_placeholder)
Then, while executing the Output, we need to pass the State as well, in addition to the Input for feed_dict.
So, the code,
print('keras op hash',xxhash.xxh64(sess.run(op_lstm, feed_dict={x_placeholder:op_embed})).hexdigest())
can be replaced with
for _ in range(No_Of_TimeSteps):
op_val, state_val = sess.run([op_lstm, state], feed_dict={x_placeholder:op_embed})).hexdigest(),
state_placeholder: current_state.astype(np.float32)})
current_state = state_val
print('keras op hash',xxhash.xxh64(op_val))
Hope this helps. Happy Learning!
Related
I made a script in tensorflow 2.x but I had to downconvert it to tensorflow 1.x (tested in 1.14 and 1.15). However, the tf1 version performs very differently (10% accuracy lower on the test set). See also the plot for train and validation performance (diagram is attached below).
Looking at the operations needed for the migration from tf1 to tf2 it seems that only the Adam learning rate may be a problem but I'm defining it explicitly tensorflow migration
I've reproduced the same behavior both locally on GPU and CPU and on colab. The keras used was the one built-in in tensorflow (tf.keras). I've used the following functions (both for train,validation and test), using a sparse categorization (integers):
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
horizontal_flip=horizontal_flip,
#rescale=None, #not needed for resnet50
preprocessing_function=None,
validation_split=None)
train_dataset = train_datagen.flow_from_directory(
directory=train_dir,
target_size=image_size,
class_mode='sparse',
batch_size=batch_size,
shuffle=True)
And the model is a simple resnet50 with a new layer on top:
IMG_SHAPE = img_size+(3,)
inputs = Input(shape=IMG_SHAPE, name='image_input',dtype = tf.uint8)
x = tf.cast(inputs, tf.float32)
# not working in this version of keras. inserted in imageGenerator
x = preprocess_input_resnet50(x)
base_model = tf.keras.applications.ResNet50(
include_top=False,
input_shape = IMG_SHAPE,
pooling=None,
weights='imagenet')
# Freeze the pretrained weights
base_model.trainable = False
x=base_model(x)
# Rebuild top
x = GlobalAveragePooling2D(data_format='channels_last',name="avg_pool")(x)
top_dropout_rate = 0.2
x = Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = Dense(num_classes,activation="softmax", name="pred_out")(x)
model = Model(inputs=inputs, outputs=outputs,name="ResNet50_comp")
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
loss="sparse_categorical_crossentropy",
metrics=['accuracy'])
And then I'm calling the fit function:
history = model.fit_generator(train_dataset,
steps_per_epoch=n_train_batches,
validation_data=validation_dataset,
validation_steps=n_val_batches,
epochs=initial_epochs,
verbose=1,
callbacks=[stopping])
I've reproduced the same behavior for example with the following full script (applied to my dataset and changed to adam and removed intermediate final dense layer):
deep learning sandbox
The easiest way to replicate this behavior was to enable or disable the following line on a tf2 environment with the same script and add the following line to it. However, I've tested also on tf1 environments (1.14 and 1.15):
tf.compat.v1.disable_v2_behavior()
Sadly I cannot provide the dataset.
Update 26/11/2020
For full reproducibility I've obtained a similar behaviour by means of the food101 (101 categories) dataset enabling tf1 behaviour with 'tf.compat.v1.disable_v2_behavior()'. The following is the script executed with tensorflow-gpu 2.2.0:
#%% ref https://medium.com/deeplearningsandbox/how-to-use-transfer-learning-and-fine-tuning-in-keras-and-tensorflow-to-build-an-image-recognition-94b0b02444f2
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
import tensorflow as tf
# enable and disable this to obtain tf1 behaviour
tf.compat.v1.disable_v2_behavior()
from tensorflow.keras import __version__
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
# since i'm using resnet50 weights from imagenet, i'm using food101 for
# similar but different categorization tasks
# pip install tensorflow-datasets if tensorflow_dataset not found
import tensorflow_datasets as tfds
(train_ds,validation_ds),info= tfds.load('food101', split=['train','validation'], shuffle_files=True, with_info=True)
assert isinstance(train_ds, tf.data.Dataset)
print(train_ds)
#%%
IM_WIDTH, IM_HEIGHT = 224, 224
NB_EPOCHS = 10
BAT_SIZE = 32
def get_nb_files(directory):
"""Get number of files by searching directory recursively"""
if not os.path.exists(directory):
return 0
cnt = 0
for r, dirs, files in os.walk(directory):
for dr in dirs:
cnt += len(glob.glob(os.path.join(r, dr + "/*")))
return cnt
def setup_to_transfer_learn(model, base_model):
"""Freeze all layers and compile the model"""
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
def add_new_last_layer(base_model, nb_classes):
"""Add last layer to the convnet
Args:
base_model: keras model excluding top
nb_classes: # of classes
Returns:
new keras model with last layer
"""
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
predictions = Dense(nb_classes, activation='softmax')(x) #new softmax layer
model = Model(inputs=base_model.input, outputs=predictions)
return model
def train(nb_epoch, batch_size):
"""Use transfer learning and fine-tuning to train a network on a new dataset"""
#nb_train_samples = train_ds.cardinality().numpy()
nb_train_samples=info.splits['train'].num_examples
nb_classes = info.features['label'].num_classes
classes_names = info.features['label'].names
#nb_val_samples = validation_ds.cardinality().numpy()
nb_val_samples = info.splits['validation'].num_examples
#nb_epoch = int(args.nb_epoch)
#batch_size = int(args.batch_size)
def preprocess(features):
#print(features['image'], features['label'])
image = tf.image.resize(features['image'], [224,224])
#image = tf.divide(image, 255)
#print(image)
# data augmentation
image=tf.image.random_flip_left_right(image)
image = preprocess_input(image)
label = features['label']
# for categorical crossentropy
#label = tf.one_hot(label,101,axis=-1)
#return image, tf.cast(label, tf.float32)
return image, label
#pre-processing the dataset to fit a specific image size and 2D labelling
train_generator = train_ds.map(preprocess).batch(batch_size).repeat()
validation_generator = validation_ds.map(preprocess).batch(batch_size).repeat()
#train_generator=train_ds
#validation_generator=validation_ds
#fig = tfds.show_examples(validation_generator, info)
# setup model
base_model = ResNet50(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
model = add_new_last_layer(base_model, nb_classes)
# transfer learning
setup_to_transfer_learn(model, base_model)
history = model.fit(
train_generator,
epochs=nb_epoch,
steps_per_epoch=nb_train_samples//BAT_SIZE,
validation_data=validation_generator,
validation_steps=nb_val_samples//BAT_SIZE)
#class_weight='auto')
#execute
history = train(nb_epoch=NB_EPOCHS, batch_size=BAT_SIZE)
And the performance on food101 dataset:
update 27/11/2020
It's possible to see the discrepancy also in the way smaller oxford_flowers102 dataset:
(train_ds,validation_ds,test_ds),info= tfds.load('oxford_flowers102', split=['train','validation','test'], shuffle_files=True, with_info=True)
Nb: the above plot shows confidences given by running the same training multiple times and evaluatind mean and std to check for the effects on random weights initialization and data augmentation.
Moreover I've tried some hyperparameter tuning on tf2 resulting in the following picture:
changing optimizer (adam and rmsprop)
not applying horizontal flipping aumgentation
deactivating keras resnet50 preprocess_input
Thanks in advance for every suggestion. Here are the accuracy and validation performance on tf1 and tf2 on my dataset:
Update 14/12/2020
I'm sharing the colab for reproducibility on oxford_flowers at the clic of a button:
colab script
I came across something similar, when doing the opposite migration (from TF1+Keras to TF2).
Running this code below:
# using TF2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 212.3205274187726
# using TF1+Keras
import numpy as np
from keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 187.23898954353717
you can see the same model from the same library on different versions does not return the same value (using sum as a quick check-up). I found the answer to this mysterious behavior in this other SO answer: ResNet model in keras and tf.keras give different output for the same image
Another recommendation I'd give you is, try using pooling from inside applications.resnet50.ResNet50 class, instead of the additional layer in your function, for simplicity, and to remove possible problem-generators :)
I'm using Tensorflow 2.2.0-gpu, and I have a simple Keras model that's composed of a few dense layers and a linear output (reference the code below). I'm training the model on variable-length samples, and when I run the code I get warnings about tf.function retracing. From what I've read, function tracing is expensive, and consequently the performance is poor. Here's the code, which takes about 330 seconds to run on my machine.
#import tensorflow as tf
#tf.compat.v1.disable_eager_execution()
import numpy as np
import timeit
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
def main():
state_input = keras.Input((2,))
hidden1 = layers.Dense(units = 64, activation = "relu")(state_input)
hidden2 = layers.Dense(units = 128, activation = "relu")(hidden1)
hidden3 = layers.Dense(units = 128, activation = "relu")(hidden2)
output = layers.Dense(units = 2, activation = "linear")(hidden3)
model = keras.Model(inputs = state_input, outputs = output)
opt = optimizers.Adam(lr = 1e-4)
model.compile(optimizer = opt, loss = "mean_squared_error")
np.random.seed(0)
def train():
for i in range(2000):
print(i)
num_samples = np.random.randint(int(1e4), int(1e5))
x = np.random.rand(num_samples, 2)
y = np.random.rand(num_samples, 2)
model.train_on_batch(x, y)
print(timeit.timeit(train, number=1))
if __name__ == "__main__":
main()
If I disable eager execution using tf.compat.v1.disable_eager_execution() (line 2 in the code), then the same code runs in about 30 seconds. This is similar to the performance I was seeing under Tensorflow 1.
Is there a way I can change my model such that I get similar performance to that attained with eager execution diabled? Namely, can the model be changed such that the function retracing isn't incurred on each call?
For reference, this is the warning that's generated when train_on_batch is called:
WARNING:tensorflow:10 out of the last 11 calls to <function Model.make_train_function.<locals>.train_function at 0x7f68f3724158> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
I was able to improve the performance without disabling eager mode by using a tf.function with a signature, and applying gradients manually. (Reference Tensorflow's Better performance with tf.function article.) This improves the performance significantly, but the performance is still better when eager execution is outright disabled.
import tensorflow as tf
import numpy as np
import timeit
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import losses
def main():
state_input = keras.Input((2,))
hidden1 = layers.Dense(units = 64, activation = "relu")(state_input)
hidden2 = layers.Dense(units = 128, activation = "relu")(hidden1)
hidden3 = layers.Dense(units = 128, activation = "relu")(hidden2)
output = layers.Dense(units = 2, activation = "linear")(hidden3)
model = keras.Model(inputs = state_input, outputs = output)
opt = optimizers.Adam(lr = 1e-4)
loss = losses.MeanSquaredError()
np.random.seed(0)
#tf.function(input_signature=[
tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
tf.TensorSpec(shape=(None, 2), dtype=tf.float32)
])
def fit(x, y):
with tf.GradientTape() as tape:
preds = model(x)
losses = loss(preds, y)
grad = tape.gradient(losses, model.trainable_variables)
opt.apply_gradients(zip(grad, model.trainable_variables))
def train():
for i in range(2000):
print(i)
num_samples = np.random.randint(int(1e4), int(1e5))
x = np.random.rand(num_samples, 2)
y = x * 2
fit(x, y)
print(timeit.timeit(train, number=1))
print('test')
print(model.predict(np.array([[.2, .4], [.6, .8]])))
if __name__ == "__main__":
main()
But honestly, that's pretty ugly.
Here's a great question about why TF2 is so slow as compared to TF1: Why is TensorFlow 2 much slower than TensorFlow 1? That gives some benchmarks.
My actual code, which is markedly more complex than the simple snippet provided in the question, is about 1/10th of the speed with eager execution enabled (the default). While using tf.function with a signature does speed the code up, it's still not nearly as fast as simply disabling eager execution (plus, again, using tf.function and GradientTape is pretty atrocious).
In the end I just disable eager execution. If someone comes along with a better answer then I'll gladly accept it.
Thanks for reading my question.
I was using keras to develop my reinforcement learning agent based on keres-rl. But I want to upgrade my agent so that I get some update from open ai base line code for better action exploration. But the code used tensorflow only. It is my first time to use tensorflow. I am so confused. I build keras deep learninng model using its "Model API". I have never concerned about inside of model. But the code I referenced was full of the code that kick in inside of deep learning model and give some change to its weight and get immediate layer output using tf.Session(). The framework is so flexible. Like below, using tf.Session() the tensor, which is recognized tensor and is not callable, can get result feeding feed_dict data. In keras, it is impossible as far as I know.
Once I allow using tf.Session(), my architecture will be complex and nobody wants to understand and use it except that I can adapt reference code more easily.
On the other side, if I don't allow that, I needs to break down my existing model and use tons of K.function to get middle layer's output or something that I can't get from keras model.
import numpy as np
from keras.layers import Dense, Input, BatchNormalization
from keras.models import Model
import tensorflow as tf
import keras.backend as K
import rl2.tf_util as U
def normalize(x, stats):
if stats is None:
return x
return (x - stats.mean) / (stats.std + 1e-8)
class RunningMeanStd(object):
# https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
def __init__(self, my, epsilon=1e-2, shape=()):
self._sum = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+"_runningsum")
self._sumsq = K.variable(value=np.zeros(shape) + epsilon, dtype=tf.float32, name=my+"_runningsumsq")
self._count = K.variable(value=np.zeros(()) + epsilon, dtype=tf.float32, name=my+"_count")
self.mean = self._sum / self._count
self.std = K.sqrt(K.maximum((self._sumsq / self._count) - K.square(self.mean), epsilon))
newsum = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+'_sum')
newsumsq = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+'_var')
newcount = K.variable(value=np.zeros(()), dtype=tf.float32, name=my+'_count')
self.incfiltparams = K.function([newsum, newsumsq, newcount], [],
updates=[K.update_add(self._sum, newsum),
K.update(self._sumsq, newsumsq),
K.update(self._count, newcount)])
def update(self, x):
x = x.astype('float64')
n = int(np.prod(self.shape))
totalvec = np.zeros(n*2+1, 'float64')
addvec = np.concatenate([x.sum(axis=0).ravel(), np.square(x).sum(axis=0).ravel(), np.array([len(x)],dtype='float64')])
self.incfiltparams(totalvec[0:n].reshape(self.shape),
totalvec[n:2*n].reshape(self.shape),
totalvec[2*n])
i = Input(shape=(1,))
# h = BatchNormalization()(i)
h = Dense(4, activation='relu', kernel_initializer='he_uniform')(i)
h = Dense(10, activation='relu', kernel_initializer='he_uniform')(h)
o = Dense(1, activation='linear', kernel_initializer='he_uniform')(h)
model = Model(i, o)
obs_rms = RunningMeanStd(my='obs', shape=(1,))
normalized_obs0 = K.clip(normalize(i, obs_rms), 0, 100)
tf2 = model(normalized_obs0)
# print(model.predict(np.asarray([2,2,2,2,2]).reshape(5,)))
# print(tf(np.asarray([2,2,2,2,2]).reshape(5,)))
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(sess.run([tf2], feed_dict={i : U.adjust_shape(i, [np.asarray([2,]).reshape(1,)])}))
I recently found a paper where they used a CNN with complex 2D-feature-maps as an input. However, there Network also outputs a complex output vector. They used Keras with tensorflow backend.
Here is the link: https://arxiv.org/pdf/1802.04479.pdf
I asked myself if it is possible to build complex Deep Neural Networks like CNNs with tensorflow. As far as i know it is not possible. Did i missed something?
There are other related questions which adresses the same problem with no answer: Complex convolution in tensorflow
when building a realy senseless model with real number in and output all works correct:
import tensorflow as tf
from numpy import random, empty
n = 10
feature_vec_real = random.rand(1,n)
X_real = tf.placeholder(tf.float64,feature_vec_real.shape)
def model(x):
out = tf.layers.dense(
inputs=x,
units=2
)
return out
model_output = model(X_real)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
output = sess.run(model_output,feed_dict={X_real:feature_vec_real})
but when using complex inputs:
import tensorflow as tf
from numpy import random, empty
n = 10
feature_vec_complex = empty(shape=(1,n),dtype=complex)
feature_vec_complex.real = random.rand(1,n)
feature_vec_complex.imag = random.rand(1,n)
X_complex = tf.placeholder(tf.complex128,feature_vec_complex.shape)
def complex_model(x):
out = tf.layers.dense(
inputs=x,
units=2
)
return out
model_output = complex_model(X_complex)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
output = sess.run(model_output,feed_dict={X_complex:feature_vec_complex})
i get the following error:
ValueError: An initializer for variable dense_7/kernel of is required
So what is the correct way to initialize the weights of the dense kernel when having complex inputs?
I know there is the possibility to handle complex numbers as two different layers in the network. But this is not what i want.
Thanks for your help!
I am currently training a CNN on MNIST, and the output probabilities (softmax) are giving [0.1,0.1,...,0.1] as training goes on. The initial values aren't uniform, so I can't figure out if I'm doing something stupid here?
I'm only training for 15 steps, just to see how training progresses; even though that's a low number, I don't think that should result in uniform predictions?
import numpy as np
import tensorflow as tf
import imageio
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
# Getting data
from sklearn.model_selection import train_test_split
def one_hot_encode(data):
new_ = []
for i in range(len(data)):
_ = np.zeros([10],dtype=np.float32)
_[int(data[i])] = 1.0
new_.append(np.asarray(_))
return new_
data = np.asarray(mnist["data"],dtype=np.float32)
labels = np.asarray(mnist["target"],dtype=np.float32)
labels = one_hot_encode(labels)
tr_data,test_data,tr_labels,test_labels = train_test_split(data,labels,test_size = 0.1)
tr_data = np.asarray(tr_data)
tr_data = np.reshape(tr_data,[len(tr_data),28,28,1])
test_data = np.asarray(test_data)
test_data = np.reshape(test_data,[len(test_data),28,28,1])
tr_labels = np.asarray(tr_labels)
test_labels = np.asarray(test_labels)
def get_conv(x,shape):
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
biases = tf.Variable(tf.random_normal([shape[-1]],stddev=0.05))
conv = tf.nn.conv2d(x,weights,[1,1,1,1],padding="SAME")
return tf.nn.relu(tf.nn.bias_add(conv,biases))
def get_pool(x,shape):
return tf.nn.max_pool(x,ksize=shape,strides=shape,padding="SAME")
def get_fc(x,shape):
sh = x.get_shape().as_list()
dim = 1
for i in sh[1:]:
dim *= i
x = tf.reshape(x,[-1,dim])
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
return tf.nn.relu(tf.matmul(x,weights) + tf.Variable(tf.random_normal([shape[1]],stddev=0.05)))
#Creating model
x = tf.placeholder(tf.float32,shape=[None,28,28,1])
y = tf.placeholder(tf.float32,shape=[None,10])
conv1_1 = get_conv(x,[3,3,1,128])
conv1_2 = get_conv(conv1_1,[3,3,128,128])
pool1 = get_pool(conv1_2,[1,2,2,1])
conv2_1 = get_conv(pool1,[3,3,128,512])
conv2_2 = get_conv(conv2_1,[3,3,512,512])
pool2 = get_pool(conv2_2,[1,2,2,1])
conv3_1 = get_conv(pool2,[3,3,512,1024])
conv3_2 = get_conv(conv3_1,[3,3,1024,1024])
conv3_3 = get_conv(conv3_2,[3,3,1024,1024])
conv3_4 = get_conv(conv3_3,[3,3,1024,1024])
pool3 = get_pool(conv3_4,[1,3,3,1])
fc1 = get_fc(pool3,[9216,1024])
fc2 = get_fc(fc1,[1024,10])
softmax = tf.nn.softmax(fc2)
loss = tf.losses.softmax_cross_entropy(logits=fc2,onehot_labels=y)
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(15):
print(i)
indices = np.random.randint(len(tr_data),size=[200])
batch_data = tr_data[indices]
batch_labels = tr_labels[indices]
sess.run(train_step,feed_dict={x:batch_data,y:batch_labels})
Thank you so much.
There are several issues with your code, including elementary ones. I strongly suggest you first go through the Tensorflow step-by-step tutorials for MNIST, MNIST For ML Beginners and Deep MNIST for Experts.
In short, regarding your code:
First, your final layer fc2 should not have a ReLU activation.
Second, the way you build your batches, i.e.
indices = np.random.randint(len(tr_data),size=[200])
is by just grabbing random samples in each iteration, which is far from the correct way of doing so...
Third, the data you feed into the network are not normalized in [0, 1], as they should be:
np.max(tr_data[0]) # get the max value of your first training sample
# 255.0
The third point was initially puzzling for me, too, since in the aforementioned Tensorflow tutorials they don't seem to normalize the data either. But close inspection revealed the reason: if you import the MNIST data through the Tensorflow-provided utility functions (instead of the scikit-learn ones, as you do here), they come already normalized in [0, 1], something that is nowhere hinted at:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
np.max(mnist.train.images[0])
# 0.99607849
This is an admittedly strange design decision - as far as I am aware of, in all other similar cases/tutorials normalizing the input data is an explicit part of the pipeline (see e.g. the Keras example), and with good reason (it is something you will be certainly expected to do yourself later, when using your own data).