I use batch = 5 in tensorflow during the training phase, so why can't I use batch = 1 at test time in tensorflowjs? - tensorflow

I train a GAN model in tensorflow with batch size = 5, so the generator input size is [5, imagesize, imagesize, 3]. After training, I convert the tensorflow model into a tensorflowjs model.
I then load the model with loadFrozenModel and use model.predict to predict a single image. However, I get the error: the shape of dict['concat'] provided in model.execute(dict) must be [5,512,512,12], but was [1,512,512,12].
How can I solve this problem? I use mini-batches in the training phase in tensorflow, but in tensorflowjs I only want to predict one image, not 5.
Figure 1. The error

It sounds like you set the batch size explicitly as part of the input shape in your training job, e.g.
x = tf.placeholder("float", shape=[5, 512, 512, 12])
Instead you should leave the batch size unspecified, like this:
x = tf.placeholder("float", shape=[None, 512, 512, 12])
That way the graph will work with whatever batch size you give it, both at training and at inference time.
If you have code that needs to know the batch size explicitly, see here for some tips.
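A minimal TF1-style sketch of this (using a single conv layer as a stand-in for the real generator, purely for illustration): leaving the batch dimension as None lets the same graph run with batch 5 during training and batch 1 at inference.
import numpy as np
import tensorflow as tf

# Batch dimension left as None so the same graph accepts any batch size.
x = tf.placeholder(tf.float32, shape=[None, 512, 512, 12], name="input")
# Stand-in for the real generator: a single conv layer.
output = tf.layers.conv2d(x, filters=3, kernel_size=3, padding="same")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Works with the training batch size of 5...
    batch5 = np.zeros([5, 512, 512, 12], dtype=np.float32)
    sess.run(output, feed_dict={x: batch5})
    # ...and with a single image at inference time.
    batch1 = np.zeros([1, 512, 512, 12], dtype=np.float32)
    sess.run(output, feed_dict={x: batch1})
A frozen graph exported with a None batch dimension should then accept a [1, 512, 512, 12] input from model.predict in tensorflowjs.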

Related

Tensorflow model Inference Accuracy Dropping with Batch Size

I trained a DenseNet121 based model on my data and achieved the desired accuracy in training. But during prediction with BATCH=1 the accuracy drops badly. I have found that the prediction output depends on the BATCH SIZE. I get the same accuracy if I keep the BATCH size the same as during training, but for any other batch size the accuracy is lower. The lower the BATCH size, the lower the accuracy. Please help, as I need to do predictions on a single image at a time. Below is the model:
def make_model():
    base_model = DenseNet121(include_top=False, weights="imagenet", input_shape=(128, 128, 3), pooling="max")
    inputs = keras.Input(shape=(128, 128, 3))
    output = base_model(inputs, training=True)
    output = tf.keras.layers.Dropout(0.2)(output)
    output = keras.layers.Dense(units=max_seq_length * TOTAL_SYMBOLS)(output)
    output = keras.layers.Reshape((max_seq_length, TOTAL_SYMBOLS))(output)
    model = keras.Model(inputs, output)
    return model

model = make_model()
This is not an answer, but I found a way to solve the problem. I created the DenseNet121 network from scratch and used that to train my model, and everything worked fine. I suspect there are some optimizations in keras.applications.YOUR_MODEL, or in the preprocessing provided by keras.applications for that model, which are creating this problem. The optimizations seem to be batch dependent.
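As a side note (an assumption on my part, not something confirmed in the answer above): batch-dependent predictions like this are what you would see if BatchNorm layers run in training mode at inference time, since base_model(inputs, training=True) makes them normalize with per-batch statistics. A hedged sketch of calling the base model with training=False so inference uses the stored moving statistics (the default values for max_seq_length and TOTAL_SYMBOLS below are placeholders):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import DenseNet121

def make_model(max_seq_length=8, TOTAL_SYMBOLS=36):
    base_model = DenseNet121(include_top=False, weights="imagenet",
                             input_shape=(128, 128, 3), pooling="max")
    inputs = keras.Input(shape=(128, 128, 3))
    # training=False keeps BatchNorm in inference mode, so predictions
    # no longer depend on the composition of the batch.
    output = base_model(inputs, training=False)
    output = keras.layers.Dropout(0.2)(output)
    output = keras.layers.Dense(units=max_seq_length * TOTAL_SYMBOLS)(output)
    output = keras.layers.Reshape((max_seq_length, TOTAL_SYMBOLS))(output)
    return keras.Model(inputs, output)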

Stateful LSTM fails to predict due to batch_size issue

I am able to successfully train my stateful LSTM using keras. My batch size is 60 and every input I am sending into the network is divisible by batch_size.
Following is my snippet :
model = Sequential()
model.add(LSTM(80, input_shape=trainx.shape[1:],
               batch_input_shape=(60, trainx.shape[1], trainx.shape[2]),
               stateful=True, return_sequences=True))
model.add(Dropout(0.15))
model.add(LSTM(40, return_sequences=False))
model.add(Dense(40))
model.add(Dropout(0.3))
model.add(Dense(output_dim=1))
model.add(Activation("linear"))
keras.optimizers.RMSprop(lr=0.005, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss="mse", optimizer="rmsprop")
My training line which runs successfully:
model.fit(trainx[:3000,:],trainy[:3000],validation_split=0.1,shuffle=False,nb_epoch=9,batch_size=60)
Now I try to predict on the test set, which is again divisible by 60, but I get the error:
ValueError: In a stateful network, you should only pass inputs with a
number of samples that can be divided by the batch size. Found: 240
samples. Batch size: 32.
Can anyone tell me what is wrong above? I am confused; I have tried so many things but nothing helps.
I suspect that the reason for the error is that you did not specify the batch size in model.predict. As you can see in the documentation in the "predict" section, the default parameters are
model.predict(self, x, batch_size=32, verbose=0)
which is why 32 appears in your error message. So you need to specify batch_size=60 in model.predict.
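For example (assuming testx holds the 240 test samples mentioned in the error):
predictions = model.predict(testx, batch_size=60, verbose=0)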

Tensorflow: calculate gradient for tf.multiply

I'm building a neural network that has the following two layers
pseudo_inputs = tf.Variable(a_numpy_ndarray)
weights = tf.Variable(tf.truncated_normal(...))
I then want to multiply them using tf.multiply (which, unlike tf.matmul, multiplies corresponding entries, i.e. c_ij = a_ij * b_ij):
input = tf.multiply(pseudo_inputs, weights)
My goal is to learn weights. So I run
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
But it doesn't work. The network doesn't change at all.
Looking at tensorboard, I could see that 'input' has no gradient, so I'm assuming that's the problem. Any ideas how to solve this?
From reading the tensorflow docs it seems like I might have to write a gradient op for tf.multiply, but I find it hard to believe no one has needed to do this before.
I think pseudo_inputs should be defined as a placeholder in the first line.
And in this line:
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
Since weights is to be trained in the graph by minimizing loss, it does not need to be passed as a parameter here:
train = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Then you should first run the training step using the samples you have:
for x_train, y_train in samples:
    sess.run(train, {pseudo_inputs: x_train, y: y_train})
And after that you can get the trained weights and loss with:
W_c, loss_c = sess.run([weights, loss], {pseudo_inputs: x_train, y: y_train})
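For reference, gradients do flow through tf.multiply out of the box. A minimal TF1-style sketch (with a made-up squared-error loss and random data, purely for illustration) in which weights is successfully updated:
import numpy as np
import tensorflow as tf

pseudo_inputs = tf.placeholder(tf.float32, shape=[4, 4])
target = tf.placeholder(tf.float32, shape=[4, 4])
weights = tf.Variable(tf.truncated_normal([4, 4]))   # the only trainable variable
output = tf.multiply(pseudo_inputs, weights)          # element-wise product

loss = tf.reduce_mean(tf.square(output - target))
train_step = tf.train.AdamOptimizer(0.01).minimize(loss)

x_train = np.random.rand(4, 4).astype(np.float32)
y_train = np.random.rand(4, 4).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        _, l = sess.run([train_step, loss],
                        {pseudo_inputs: x_train, target: y_train})
    print("final loss:", l)  # decreases as weights are learned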

Tensorflow VGG16 (converted from caffe) got low evaluation accuracy

I didn't convert the weights myself; instead I used vgg16_weights.npz from www(dot)cs(dot)toronto(dot)edu/~frossard/post/vgg16/. There, it is mentioned:
We convert the Caffe weights publicly available in the author’s GitHub profile (gist(dot)github(dot)com/ksimonyan/211839e770f7b538e2d8#file-readme-md) using a specialized tool (github(dot)com/ethereon/caffe-tensorflow).
But that page has no validation code, so I wrote my own, referring to the tensorflow MNIST and inception code.
How I create TFRecords of Imagenet
I use build_imagenet_data.py from inception. I changed the
label_index = 0 #originally label_index = 1
because inception uses label_index 0 as the background class (so in total there are 1001 classes). The Caffe format doesn't use it, as the number of outputs is 1000. I prefer to use the TFRecord format as I will later process the weights and retrain.
How I load the weights
The inference function taken from MNIST's mnist.py was modified so that the Variables are taken from vgg16_weights.npz.
How I load the weights:
weights = np.load('/the_path/vgg16_weights.npz')
How I put the variable in conv1_1:
with tf.name_scope('conv1_1') as scope:
    kernel = tf.Variable(tf.constant(weights['conv1_1_W']), name='weights')
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(weights['conv1_1_b']), name='biases')
    out = tf.nn.bias_add(conv, biases)
    conv1_1 = tf.nn.relu(out, name=scope)
sess.run(conv1_1)
How I read the TFRecords
I took inception's image_processing.py, dataset.py, and ImagenetData.py with no changes. Then, I ran inception's inception_eval.py evaluate function, changing the inference code and deleting the restoring of moving-average variables from the checkpoint (as I already restore them manually at variable initialization). However, the accuracy is not the same as the VGG-16 in caffe. Top-5 accuracy is around 9%.
Closing
What is the problem with this method? There are several parts of the code that I still don't understand though:
How does the TFReader move to the next batch of images after processing 1 batch of images? The output of inception's image_processing.py is only a single batch. To be complete, this is the output based on the documentation:
images: Images. 4-D tensor of size [batch_size, FLAGS.image_size, image_size, 3].
labels: 1-D integer Tensor of [FLAGS.batch_size].
Do I need to softmax the logits before tf.in_top_k? (Well, I don't think it matters, as the ordering of the values is the same.)
Thank you for the help. Sorry if the links are messy, as I can only post 2 links in 1 post because of my reputation.
UPDATE
I tried it myself by changing the caffe weights: I reversed the channel input dimension of conv1_1 (because caffe receives BGR, so the weights are for BGR instead of RGB as in tensorflow) and got the same accuracy as with the weights from the website: around 9% top-5.
I found out that there is no mean-image subtraction in tensorflow inception's image_processing.py. I added mean subtraction (in the eval_image function) with tf.reduce_mean and got 11% accuracy.
Then I tried to change the eval_image function to:
# source: https://github.com/ethereon/caffe-tensorflow/blob/master/examples/imagenet/dataset.py
img_shape = tf.to_float(tf.shape(image)[:2])
min_length = tf.minimum(img_shape[0], img_shape[1])
new_shape = tf.to_int32((256 / min_length) * img_shape)  # isotropic case
# new_shape = tf.pack([256, 256])  # non-isotropic case
image = tf.image.resize_images(image, [new_shape[0], new_shape[1]])
offset = tf.to_int32((new_shape - 224) / 2)
image = tf.slice(image, begin=tf.pack([offset[0], offset[1], 0]), size=tf.pack([224, 224, -1]))
mean_subs_image = tf.reduce_mean(image, axis=[0, 1], keep_dims=True)
return image - mean_subs_image
and I got 13%. It increased, but is still far too low. It seems this is one of the problems; I am not sure what the others are.
In general, porting whole-model weights across libraries will be hard. You pointed out some differences from caffe, but there could be others. It might be easier to retrain the model in TensorFlow.
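One thing worth double-checking (an assumption on my part, not something verified in the answer above) is whether the evaluation pipeline reproduces the original caffe VGG preprocessing: channels in BGR order and the fixed per-channel ImageNet mean subtracted, rather than a per-image tf.reduce_mean. A sketch, written against the TF 1.x API (unlike the tf.pack-era code above):
import tensorflow as tf

# Per-channel ImageNet means used by the original caffe VGG models (BGR order).
VGG_MEAN_BGR = [103.939, 116.779, 123.68]

def caffe_style_preprocess(image_rgb):
    """image_rgb: float32 tensor [H, W, 3] in RGB, values in [0, 255]."""
    # Reorder RGB -> BGR to match the layout the caffe weights expect.
    image_bgr = tf.reverse(image_rgb, axis=[-1])
    # Subtract the fixed dataset mean instead of a per-image mean.
    return image_bgr - tf.constant(VGG_MEAN_BGR, dtype=tf.float32)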

Oversampling images during inference

It is a common practice in convolutional neural networks to oversample a given image during inference, i.e. to create a batch from different transformations of the same image (most commonly different crops and mirroring), pass the entire batch through the network, and average (or apply another reducing function) over the results to get a single prediction (as in the caffe example).
How can this approach be implemented in tensorflow?
You can take a look at the TF cnn tutorial. In particular, the function distorted_inputs does the image preprocessing step.
In short, there are a couple of TF functions in the tf.image package that help with distorting the images. You can use either them or regular numpy functions to add an extra distortions dimension, over which you can then average the results:
Before:
input_place = tf.placeholder(tf.float32, [None, 256, 256, 3])
prediction = some_model(input_place) # size: [None]
sess.run(prediction, feed_dict={input_place: batch_of_images})
After:
input_place = tf.placeholder(tf.float32, [None, NUM_OF_DISTORTIONS, 256, 256, 3])
prediction = some_model(input_place) # make sure it is of size [None, NUM_DISTORTIONS]
new_prediction = tf.reduce_mean(prediction, axis=1)
new_batch = np.zeros((batch_size, NUM_OF_DISTORTIONS, 256, 256, 3))
for i in xrange(len(batch_of_images)):
    for f in xrange(len(distortion_functions)):
        new_batch[i, f, :, :, :] = distortion_functions[f](batch_of_images[i])
sess.run(new_prediction, feed_dict={input_place: new_batch})
Take a look at TF's image-related functions. You could apply those transformations at test time to some input image, and stack all of them together to make a batch.
I imagine you could also do this using OpenCV or some other image processing tool. I don't see a need to do it in the computation graph. You could create the batches beforehand and pass them through in feed_dict.
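A minimal sketch of this feed_dict approach (the pooling-plus-dense "model" below is just a stand-in, and the crop/flip choices are arbitrary): build the augmented batch in numpy, run it through the network once, and average the outputs.
import numpy as np
import tensorflow as tf

# Stand-in for the real model: global average pool + a dense layer.
input_place = tf.placeholder(tf.float32, [None, 224, 224, 3])
features = tf.reduce_mean(input_place, axis=[1, 2])
prediction = tf.layers.dense(features, units=10)           # shape [None, 10]

def tta_batch(image):
    """Build a batch of simple test-time augmentations of one 256x256x3 image."""
    crops = [
        image[:224, :224],        # top-left crop
        image[-224:, -224:],      # bottom-right crop
        image[16:240, 16:240],    # centre crop
    ]
    flips = [c[:, ::-1] for c in crops]                     # horizontal mirrors
    return np.stack(crops + flips, axis=0)                  # [6, 224, 224, 3]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    image = np.random.rand(256, 256, 3).astype(np.float32)
    logits = sess.run(prediction, feed_dict={input_place: tta_batch(image)})
    averaged = logits.mean(axis=0)                          # single prediction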