TensorFlow weight initialization

Regarding the MNIST tutorial on the TensorFlow website, I ran an experiment (gist) to see what effect different weight initializations have on learning. I noticed that, contrary to what I read in the well-known Glorot & Bengio (2010) paper on 'Xavier' initialization, learning is just fine regardless of weight initialization.
The different curves show different values of w used to initialize the weights of the convolutional and fully connected layers. Note that all values of w work fine, even though 0.3 and 1.0 end up at lower performance and some values train faster - in particular, 0.03 and 0.1 are fastest. Nevertheless, the plot shows a rather large range of w that works, suggesting 'robustness' with respect to weight initialization.
def weight_variable(shape, w=0.1):
    initial = tf.truncated_normal(shape, stddev=w)
    return tf.Variable(initial)

def bias_variable(shape, w=0.1):
    initial = tf.constant(w, shape=shape)
    return tf.Variable(initial)
Question: Why does this network not suffer from the vanishing or exploding gradient problem?
I would suggest you read the gist for implementation details, but here's the code for reference. It took approximately an hour on my Nvidia 960m, although I imagine it could also run on a CPU within reasonable time.
import time
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
from tensorflow.python.client import device_lib
import numpy
import matplotlib.pyplot as pyplot
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# Weight initialization
def weight_variable(shape, w=0.1):
    initial = tf.truncated_normal(shape, stddev=w)
    return tf.Variable(initial)

def bias_variable(shape, w=0.1):
    initial = tf.constant(w, shape=shape)
    return tf.Variable(initial)

# Network architecture
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
def build_network_for_weight_initialization(w):
    """ Builds a CNN for the MNIST problem:
    - 32 5x5 kernels convolutional layer with bias and ReLU activations
    - 2x2 maxpooling
    - 64 5x5 kernels convolutional layer with bias and ReLU activations
    - 2x2 maxpooling
    - Fully connected layer with 1024 nodes + bias and ReLU activations
    - dropout
    - Fully connected softmax layer for classification (of 10 classes)

    Returns the x and y placeholders for the train data, the output
    of the network, and the dropout placeholder as a tuple of 4 elements.
    """
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    W_conv1 = weight_variable([5, 5, 1, 32], w)
    b_conv1 = bias_variable([32], w)
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    W_conv2 = weight_variable([5, 5, 32, 64], w)
    b_conv2 = bias_variable([64], w)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    W_fc1 = weight_variable([7 * 7 * 64, 1024], w)
    b_fc1 = bias_variable([1024], w)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fc2 = weight_variable([1024, 10], w)
    b_fc2 = bias_variable([10], w)
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    return (x, y_, y_conv, keep_prob)
# Experiment
def evaluate_for_weight_init(w):
    """ Returns an accuracy learning curve for a network trained on
    10000 batches of 50 samples. The learning curve has one item
    every 100 batches."""
    with tf.Session() as sess:
        x, y_, y_conv, keep_prob = build_network_for_weight_initialization(w)
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
        train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
        correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        sess.run(tf.global_variables_initializer())
        lr = []
        for _ in range(100):
            for i in range(100):
                batch = mnist.train.next_batch(50)
                train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
            assert mnist.test.images.shape[0] == 10000
            # This way the accuracy-evaluation fits in my 2GB laptop GPU.
            a = sum(
                accuracy.eval(feed_dict={
                    x: mnist.test.images[2000*i:2000*(i+1)],
                    y_: mnist.test.labels[2000*i:2000*(i+1)],
                    keep_prob: 1.0})
                for i in range(5)) / 5
            lr.append(a)
        return lr
ws = [0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
accuracies = [
[evaluate_for_weight_init(w) for w in ws]
for _ in range(3)
]
# Plotting results
pyplot.plot(numpy.array(accuracies).mean(0).T)
pyplot.ylim(0.9, 1)
pyplot.xlim(0,140)
pyplot.xlabel('batch (x 100)')
pyplot.ylabel('test accuracy')
pyplot.legend(ws)

Weight initialization strategies can be an important and often overlooked step in improving your model, and since this is now the top result on Google I thought it could warrant a more detailed answer.
In general, the product of each layer's activation-function gradient, number of incoming/outgoing connections (fan_in/fan_out), and weight variance should stay close to one. That way, as you backpropagate through the network, the variance of the gradients stays roughly constant between input and output, and you won't suffer from exploding or vanishing gradients. Even though ReLU is more resistant to exploding/vanishing gradients, you might still have problems.
tf.truncated_normal as used by the OP does a random initialization that breaks symmetry (so the weights update "differently"), but it does not take this scaling into account. On smaller networks this might not be a problem, but if you want deeper networks or faster training times, you are best off trying a weight initialization strategy based on recent research.
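For illustration, here is a minimal sketch of what a fan-in-scaled variant of the weight_variable helper above might look like (assuming TensorFlow 1.x; he_weight_variable is just a hypothetical name, not part of the original code):
import numpy as np
import tensorflow as tf

def he_weight_variable(shape):
    # He et al. (2015) suggest Var(W) = 2 / fan_in for ReLU layers.
    # For conv kernels [h, w, in_ch, out_ch] the fan-in is h * w * in_ch;
    # for dense layers [n_in, n_out] it is simply n_in.
    fan_in = int(np.prod(shape[:-1]))
    stddev = np.sqrt(2.0 / fan_in)
    return tf.Variable(tf.truncated_normal(shape, stddev=stddev))
With this, the stddev automatically shrinks for layers with many inputs (e.g. the 7*7*64 fully connected layer), instead of being fixed at 0.1.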
For weights preceding a ReLU function you could use the default settings of:
tf.contrib.layers.variance_scaling_initializer
for tanh/sigmoid activated layers "xavier" might be more appropriate:
tf.contrib.layers.xavier_initializer
More details on both these functions and associated papers can be found at:
https://www.tensorflow.org/versions/r0.12/api_docs/python/contrib.layers/initializers
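For example, a minimal usage sketch (assuming TensorFlow 1.x, where tf.contrib is still available; the variable names and shapes are arbitrary):
import tensorflow as tf

# ReLU layers: He-style scaling (the initializer's defaults are
# factor=2.0, mode='FAN_IN', matching He et al. 2015).
W_relu = tf.get_variable(
    'W_relu', shape=[5, 5, 1, 32],
    initializer=tf.contrib.layers.variance_scaling_initializer())

# tanh/sigmoid layers: Xavier/Glorot scaling.
W_tanh = tf.get_variable(
    'W_tanh', shape=[1024, 10],
    initializer=tf.contrib.layers.xavier_initializer())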
Beyond weight initialization strategies, further optimization could explore batch normalization: https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization
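For reference, a rough training-time sketch of how that op can be used (my own illustration, assuming TensorFlow 1.x; a complete implementation would also maintain moving averages of mean and variance for use at test time):
import tensorflow as tf

def batch_norm_relu(x, name):
    # Normalize over the batch (and spatial dims for conv activations),
    # then apply a learned scale (gamma) and offset (beta).
    depth = int(x.get_shape()[-1])
    axes = list(range(len(x.get_shape()) - 1))
    mean, variance = tf.nn.moments(x, axes)
    beta = tf.Variable(tf.zeros([depth]), name=name + '_beta')
    gamma = tf.Variable(tf.ones([depth]), name=name + '_gamma')
    normed = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-5)
    return tf.nn.relu(normed)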

Logistic (sigmoid) functions are more prone to vanishing gradients, because their gradients are all less than 1 (at most 0.25), so the more of them you multiply during back-propagation, the smaller your gradient becomes (and quite quickly), whereas ReLU has a gradient of 1 on its positive part, so it does not have this problem.
Also, your network is not nearly deep enough to suffer from that.
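To see how quickly this bites, a small back-of-the-envelope illustration (plain numpy, my own addition rather than part of the answer):
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)   # never exceeds 0.25 (maximum at x = 0)

# Chaining 10 sigmoid layers, each at its best case (x = 0):
print(sigmoid_grad(0.0) ** 10)   # ~9.5e-07 -- the gradient has all but vanished
# ReLU's gradient on the positive side is exactly 1, so the product stays 1:
print(1.0 ** 10)                 # 1.0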

Related

Fully Convolutional Network, Training Error

I apologize that I'm not good at English.
I'm trying to build my own Fully Convolutional Network using TensorFlow.
But I am having trouble training this model on my own image data, whereas the MNIST data worked properly.
Here is my FCN model code (not using a pre-trained or pre-built model):
import tensorflow as tf
import numpy as np
Loading MNIST Data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
images_flatten = tf.placeholder(tf.float32, shape=[None, 784])
images = tf.reshape(images_flatten, [-1,28,28,1]) # CNN deals with 3 dimensions
labels = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32) # Dropout Ratio
Convolutional Layers
# Conv. Layer #1
W1 = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev = 0.1))
b1 = tf.Variable(tf.truncated_normal([4], stddev = 0.1))
FMA = tf.nn.conv2d(images, W1, strides=[1,1,1,1], padding='SAME')
# FMA stands for Fused Multiply Add, which means convolution
RELU = tf.nn.relu(tf.add(FMA, b1))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
# Conv. Layer #2
W2 = tf.Variable(tf.truncated_normal([3, 3, 4, 8], stddev = 0.1))
b2 = tf.Variable(tf.truncated_normal([8], stddev = 0.1))
FMA = tf.nn.conv2d(POOL, W2, strides=[1,1,1,1], padding='SAME')
RELU = tf.nn.relu(tf.add(FMA, b2))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
# Conv. Layer #3
W3 = tf.Variable(tf.truncated_normal([7, 7, 8, 16], stddev = 0.1))
b3 = tf.Variable(tf.truncated_normal([16], stddev = 0.1))
FMA = tf.nn.conv2d(POOL, W3, strides=[1,1,1,1], padding='VALID')
RELU = tf.nn.relu(tf.add(FMA, b3))
# Dropout
Dropout = tf.nn.dropout(RELU, keep_prob)
# Conv. Layer #4
W4 = tf.Variable(tf.truncated_normal([1, 1, 16, 10], stddev = 0.1))
b4 = tf.Variable(tf.truncated_normal([10], stddev = 0.1))
FMA = tf.nn.conv2d(Dropout, W4, strides=[1,1,1,1], padding='SAME')
LAST_RELU = tf.nn.relu(tf.add(FMA, b4))
Summary: [Conv-ReLU-Pool] - [Conv-ReLU-Pool] - [Conv-ReLU] - [Dropout] - [Conv-ReLU]
Define Loss, Accuracy
prediction = tf.squeeze(LAST_RELU)
# Because FCN returns (1 x 1 x class_num) in training
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, labels))
# First arg is 'logits=' and the other one is 'labels='
optimizer = tf.train.AdamOptimizer(0.001)
train = optimizer.minimize(loss)
label_max = tf.argmax(labels, 1)
pred_max = tf.argmax(prediction, 1)
correct_pred = tf.equal(pred_max, label_max)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Training Model
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    image_batch, label_batch = mnist.train.next_batch(100)
    sess.run(train, feed_dict={images: image_batch, labels: label_batch, keep_prob: 0.8})
    if i % 10 == 0:
        tr = sess.run([loss, accuracy], feed_dict={images: image_batch, labels: label_batch, keep_prob: 1.0})
        print("Step %d, Loss %g, Accuracy %g" % (i, tr[0], tr[1]))
Loss: 0.784 (Approximately)
Accuracy: 94.8% (Approximately)
The problem is that training this model on the MNIST data worked very well, but with my own data the loss is always the same (0.6319) and the output layer is always 0.
There is almost no difference in the code, except for the third convolutional layer's filter size. That filter must have the same width and height as its input, which has been shrunk by the previous pooling layers; that's why the filter size in this layer is [7, 7].
What is wrong with my model?
The only different code between two cases (MNIST, my own data) is:
Placeholder
My own data is (128 x 64 x 1) and the labels are 'eyes' and 'not_eyes'
images = tf.placeholder(tf.float32, [None, 128, 64, 1])
labels = tf.placeholder(tf.int32, [None, 2])
3rd Convolutional Layer
W3 = tf.Variable(tf.truncated_normal([32, 16, 8, 16], stddev = 0.1))
Feeding (Batch)
image_data, label_data = input_data.get_batch(TRAINING_FILE, 10)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
for i in range(10000):
    image_batch, label_batch = sess.run([image_data, label_data])
    sess.run(train, feed_dict={images: image_batch, labels: label_batch, keep_prob: 0.8})
    if i % 10 == 0: ...  # Validation part is almost same, too...
coord.request_stop()
coord.join(threads)
Here "input_data" is an another python file in the same directory, and "get_batch(TRAINING_FILE, 10)" is the function that returns batch data. The code is:
import re
import tensorflow as tf

def get_input_queue(txtfile_name):
    images = []
    labels = []
    for line in open(txtfile_name, 'r'):  # Here the txt file has the data's path, label, label number
        cols = re.split(',|\n', line)
        labels.append(int(cols[2]))
        images.append(tf.image.decode_jpeg(tf.read_file(cols[0]), channels=1))
    input_queue = tf.train.slice_input_producer([images, labels], shuffle=True)
    return input_queue

def get_batch(txtfile_name, batch_size):
    input_queue = get_input_queue(txtfile_name)
    image = input_queue[0]
    label = input_queue[1]
    image = tf.reshape(image, [128, 64, 1])
    batch_image, batch_label = tf.train.batch([image, label], batch_size)
    batch_label_one_hot = tf.one_hot(tf.to_int64(batch_label), 2, on_value=1.0, off_value=0.0)
    return batch_image, batch_label_one_hot
It doesn't seem to have any problem... :( Please help me!
Are your inputs scaled appropriately? The JPEGs are in the [0, 255] range and need to be scaled to [-1, 1]. You can try:
image = tf.reshape(image, [128, 64, 1])
image = tf.scalar_mul((1.0/255), image)
image = tf.subtract(image, 0.5)
image = tf.multiply(image, 2.0)
What accuracy are you getting with your model on MNIST? It would be helpful if you post the code. Are you using the trained model to evaluate the output for your own data?
A general suggestion on setting up the convolution model is provided here.
Here is the model suggestion according to the article:
INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC
Having more than one CONV->RELU pair before pooling improves the learning of complex features. Try N=2 instead of 1; see the sketch below.
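A minimal sketch of what that could look like for the first block, in the same TF 1.x style as the question (conv_relu is just a hypothetical helper, and the kernel sizes are only an example):
import tensorflow as tf

def conv_relu(x, kernel_shape, name):
    # kernel_shape: [height, width, in_channels, out_channels]
    W = tf.Variable(tf.truncated_normal(kernel_shape, stddev=0.1), name=name + '_W')
    b = tf.Variable(tf.constant(0.1, shape=[kernel_shape[-1]]), name=name + '_b')
    return tf.nn.relu(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b)

images = tf.placeholder(tf.float32, [None, 28, 28, 1])

# [CONV -> RELU]*2 -> POOL, i.e. N = 2 in the pattern above
h = conv_relu(images, [3, 3, 1, 4], 'conv1a')
h = conv_relu(h, [3, 3, 4, 4], 'conv1b')
h = tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')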
Some other suggestions:
While you are preparing your data, reduce it to a smaller size than 128x64. Try the same size as the MNIST data:
image = tf.reshape(image, [28, 28, 1])
If your eye/not-eye images are in color, convert them to grayscale and normalize the values. You can do this using numpy or TF; here is how using numpy:
grayscale-->
img = np.dot(np.array(img, dtype='float32'), [[0.2989],[0.5870],[0.1140]])
normalize-->
mean = np.mean(img, dtype='float32')
std = np.std(img, dtype='float32', ddof=1)
if std < 1e-4: std = 1.
img = (img - mean) / std

TensorFlow: Why do parameters not update when GradientDescentOptimizer train step is run?

When I run the following code, it prints a constant loss at every training step; I also tried printing the parameters, which also do not change.
I can't seem to figure out why train_step, which uses a GradientDescentOptimizer, doesn't change the weights in W_fc1, b_fc1, W_fc2, and b_fc2.
I'm a beginner to machine learning, so I might be missing something obvious.
(An answer to a similar question was that weights should not be initialized at zero, but the weights here are initialized with a truncated normal, so that can't be the problem.)
import tensorflow as tf
import numpy as np
import csv
import random
with open('wine_data.csv', 'rb') as csvfile:
    input_arr = list(csv.reader(csvfile, delimiter=','))

for i in range(len(input_arr)):
    input_arr[i][0] = int(input_arr[i][0]) - 1  # 0 index for one hot
    for j in range(1, len(input_arr[i])):
        input_arr[i][j] = float(input_arr[i][j])

random.shuffle(input_arr)
training_data = np.array(input_arr[:2*len(input_arr)/3])  # train on first two thirds of data
testing_data = np.array(input_arr[2*len(input_arr)/3:])  # test on last third of data
x_train = training_data[0:, 1:]
y_train = training_data[0:, 0]
x_test = testing_data[0:, 1:]
y_test = testing_data[0:, 0]
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
x = tf.placeholder(tf.float32, shape=[None, 13], name='x')
y_ = tf.placeholder(tf.float32, shape=[None], name='y_')
y_one_hot = tf.one_hot(tf.cast(y_, tf.int32), 3) # actual y values
W_fc1 = weight_variable([13, 128])
b_fc1 = bias_variable([128])
fc1 = tf.matmul(x, W_fc1)+b_fc1
W_fc2 = weight_variable([128, 3])
b_fc2 = bias_variable([3])
y = tf.nn.softmax(tf.matmul(fc1, W_fc2)+b_fc2)
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_one_hot, logits=y))
train_step = tf.train.GradientDescentOptimizer(1e-17).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_one_hot,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    train_step.run(feed_dict={x: x_train, y_: y_train})
    if _ % 10 == 0:
        loss = cross_entropy.eval(feed_dict={x: x_train, y_: y_train})
        print('step', _, 'loss', loss)
Thanks in advance.
From the official tensorflow documentation:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
Remove the softmax on y before feeding it into tf.nn.softmax_cross_entropy_with_logits
Also set your learning rate to something higher (like 3e-4)
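Put together, the relevant part of the question's code would become something like this (a sketch reusing the variables defined in the question; switching tf.reduce_sum to tf.reduce_mean is optional but keeps the loss scale independent of the dataset size):
# y_logits: raw, unscaled output -- no softmax here.
y_logits = tf.matmul(fc1, W_fc2) + b_fc2

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_one_hot, logits=y_logits))
train_step = tf.train.GradientDescentOptimizer(3e-4).minimize(cross_entropy)

# If you need probabilities for reporting, apply softmax separately;
# argmax-based accuracy is the same either way.
y = tf.nn.softmax(y_logits)
correct_prediction = tf.equal(tf.argmax(y_logits, 1), tf.argmax(y_one_hot, 1))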

Massive difference between training and testing accuracy: just overfitting, or am I missing something obvious?

I am using Tensorflow to adopt the Convolution and Pooling techniques to a deep learning project that is unrelated to images and uses a simple numeric dataset as input. I realize that it's a bit apples and oranges, but I quite like the approach of 'sliding a window' over a set of observations and extracting a set of features out of it, even though the observations aren't exactly pixels, so I wanted to give it a try.
So far I am getting very disappointing Testing Accuracy, and I would have given up by now, except that the Training Accuracy and Loss seem to be performing reasonably well and respond in a sensible way to various adjustments in hyperparameters.
I am well aware of the possibility of overfitting, but could it really be so bad that I would achieve 90% accuracy on Training and remain stuck at 25% accuracy on Testing? My data has 4 classes, so 25% accuracy in testing is basically purely random outcomes. I am wondering if I'm just missing something completely obvious here?
I'm trying to analyze my graph in TensorBoard, and as far as I can tell I don't see anything wrong with the computation of training and testing accuracies. The only thing I don't understand is why the training and testing input queues are listed on the side and don't seem to be connected to anything, but I can see from running the code and logging from inside it that TF reads the appropriate training and testing data in batches.
My network is quite simple -- 1 convolution layer + 1 fully connected layer + 1 readout layer. Each input row has 480 columns and the idea is to keep it that way, instead of forming an AxB matrix. I then 'slide' a 30x1 window across this row with the given stride. I extract 50 features from the convolution layer and 100 features from the fully connected layer. I split the original dataset 80/20 for training and testing via sklearn.cross_validation.train_test_split.
Do I have too many degrees of freedom that the network simply overfits and memorizes the training data and remains useless for the testing data? Or am I not evaluating the Test Accuracy properly?
Testing Accuracy is calculated as follows:
dateLbl_batch, feature_batch, label_batch = sess.run([dateLblTest, featuresTest, labelsTest])
acc, summary = sess.run([accuracyTest, summary_test], feed_dict={X: feature_batch, Y_: label_batch})
i += 1
summary_writer.add_summary(summary, i)
Where accuracyTest is defined like this:
with tf.name_scope('AccuracyTest'):
    accuracyTest = tf.reduce_mean(tf.cast(tf.equal(
        tf.argmax(Y, 1),
        tf.argmax(Y_, 1)), tf.float32))
and Y_ are the labels loaded from the Test dataset and Y is the output of the Readout Layer:
Y = tf.matmul(h_fc1, W_fc2, name='ReadOut_Layer') + b_fc2
Here's the relevant part of my code that has it all together:
TS = 480
TL = 4
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W, sX, sY):
    return tf.nn.conv2d(x, W, strides=[1, sX, sY, 1], padding='SAME')

def pool_fn(x, kX, kY, sX, sY):
    return tf.nn.max_pool(x, ksize=[1, kX, kY, 1], strides=[1, sX, sY, 1], padding='SAME')
Y_ = tf.placeholder(tf.float32, [None, TL], name='pl_labels')
X = tf.placeholder(tf.float32, [None, TS], name='pl_x')
# Convolution Parameters
frame_x = 480
frame_y = 1
wnd_x = 30
wnd_y = 1
features_l1 = 50
features_lFC = 100
conv_stride_x = 1
conv_stride_y = 1
pool_krn_x = 2
pool_krn_y = 1
pool_stride_x = 2
pool_stride_y = 1
fc_x = int(frame_x / pool_krn_x)
fc_y = int(frame_y / pool_krn_y)
# 1st Layer
x_conv = tf.reshape(X, [-1,frame_x,frame_y,1])
W_conv1 = weight_variable([wnd_x, wnd_y, 1, features_l1])
b_conv1 = bias_variable([features_l1])
h_conv1 = tf.nn.relu(conv2d(x_conv, W_conv1, conv_stride_x, conv_stride_y) + b_conv1)
h_pool1 = pool_fn(h_conv1, pool_krn_x, pool_krn_y, pool_stride_x, pool_stride_y)
# Fully Connected Layer
W_fc1 = weight_variable([fc_x * fc_y * features_l1, features_lFC])
b_fc1 = bias_variable([features_lFC])
h_pool2_flat = tf.reshape(h_pool1, [-1, fc_x*fc_y*features_l1])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# Readout Layer
W_fc2 = weight_variable([features_lFC, TL])
b_fc2 = bias_variable([TL])
Y = tf.matmul(h_fc1, W_fc2, name='ReadOut_Layer') + b_fc2
dateLbl, features, labels = input_pipeline(fileNameTrain, batch_size, try_epochs)
dateLblTest, featuresTest, labelsTest = input_pipeline(fileNameTest, batch_size, 1)
with tf.name_scope('SoftMaxModel'):
    myModel = tf.nn.softmax_cross_entropy_with_logits(labels=Y_, logits=Y, name='SoftMaxModel')
with tf.name_scope('LossFn'):
    lossFn = tf.reduce_mean(myModel, name='LossFn')
with tf.name_scope('Optimizer'):
    train_step = tf.train.AdamOptimizer(1e-4, name='AdamConst').minimize(lossFn, name='MinimizeLossFn')
with tf.name_scope('AccuracyTrain'):
    accuracyTrain = tf.reduce_mean(tf.cast(tf.equal(
        tf.argmax(Y, 1),
        tf.argmax(Y_, 1)), tf.float32))
with tf.name_scope('AccuracyTest'):
    accuracyTest = tf.reduce_mean(tf.cast(tf.equal(
        tf.argmax(Y, 1),
        tf.argmax(Y_, 1)), tf.float32))
a1 = tf.summary.histogram("Model", myModel)
a2 = tf.summary.scalar("Loss", lossFn)
a3 = tf.summary.scalar("AccuracyTrain", accuracyTrain)
a4 = tf.summary.scalar("AccuracyTest", accuracyTest)
summary_train = tf.summary.merge([a1, a2, a3])
summary_test = tf.summary.merge([a4])
with tf.Session() as sess:
    summary_writer = tf.summary.FileWriter(logs_path, sess.graph)
    gInit = tf.global_variables_initializer().run()
    lInit = tf.local_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    ############################ TRAINING ############################
    try:
        i = 0
        acCumTrain = 0
        while not coord.should_stop():
            dateLbl_batch, feature_batch, label_batch = sess.run([dateLbl, features, labels])
            _, acc, summary = sess.run([train_step, accuracyTrain, summary_train], feed_dict={X: feature_batch, Y_: label_batch})
            i += 1
            summary_writer.add_summary(summary, i)
            acCumTrain += acc
    except tf.errors.OutOfRangeError:
        acCumTrain /= i
        print('-------------- Finished Training ---------------')
    finally:
        coord.request_stop()

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    ############################ TESTING ############################
    try:
        i = 0
        acCumTest = 0
        while not coord.should_stop():
            dateLbl_batch, feature_batch, label_batch = sess.run([dateLblTest, featuresTest, labelsTest])
            acc, summary = sess.run([accuracyTest, summary_test], feed_dict={X: feature_batch, Y_: label_batch})
            i += 1
            summary_writer.add_summary(summary, i)
            acCumTest += acc
    except tf.errors.OutOfRangeError:
        acCumTest /= i
        print('-------------- Finished Testing ---------------')
    finally:
        coord.request_stop()

    print('Training Accuracy: {:.2f} Testing Accuracy: {:.2f}'.format(acCumTrain, acCumTest))
    coord.join(threads)
Here are the screenshots of training and testing accuracy from TensorBoard -- looks promising in Training, but just random noise in Testing!
Here is the overall Graph:
And here is the zoom of the Graph that shows how Training and Testing Accuracy are calculated:
Looks like overfitting to me. Try adding dropout and weight decay and see what changes. Maybe even increase the capacity, while applying regularization, by adding more layers. If this does not help you should take a look at the data. Maybe your training/test set is too different. Then augmentations might help.
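For instance, a rough sketch of what adding dropout and L2 weight decay might look like on top of the code in the question (reusing its variables; l2_lambda and the keep_prob value are just example hyperparameters, not tuned values):
# Dropout on the fully connected layer; feed keep_prob = 1.0 when testing.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
Y = tf.matmul(h_fc1_drop, W_fc2, name='ReadOut_Layer') + b_fc2

# L2 weight decay added to the existing cross-entropy loss.
l2_lambda = 1e-4
l2_penalty = l2_lambda * (tf.nn.l2_loss(W_fc1) + tf.nn.l2_loss(W_fc2))
lossFn = tf.reduce_mean(myModel, name='LossFn') + l2_penalty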
The numbers you give in the comments make me think that it's definitely overfitting. For example, MNIST has twice as many features as you but 35 times as many training examples.
I think it's also useful to get a sense of how many weights (parameters) you have -- that surprised me when I started using CNNs. Your first conv layer has 30*50 = 1500 weights, which isn't so bad. But then you have 50*480/2 = 12000 inputs to your fully connected layer, and with 100 output features that is a 12000x100 weight matrix -- about 1.2 million parameters. That's a pretty big matrix for the size of your problem.
You could actually make your parameter space a lot smaller by adding a conv-layer or two, along with pooling layers.
Other ways to make your parameter-space smaller: decrease the size of your conv-filter from 50 to 10, or use dimension-reduction techniques like PCA.
Finally, don't forget that there's actually a conv1d function (tf.nn.conv1d) that suits your use case much better.
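A rough sketch of what the tf.nn.conv1d variant of the first layer might look like, assuming TF 1.x and the same 480-column rows and 30-wide window (variable names mirror the question's but this is only an illustration):
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 480], name='pl_x')
x_seq = tf.reshape(X, [-1, 480, 1])              # [batch, width, channels]

# A 30-wide filter sliding over the 480-long row, producing 50 feature
# channels -- the 1-D analogue of the 30x1 window / 50 features above.
W_conv1 = tf.Variable(tf.truncated_normal([30, 1, 50], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[50]))
h_conv1 = tf.nn.relu(tf.nn.conv1d(x_seq, W_conv1, stride=1, padding='SAME') + b_conv1)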

'numpy.ndarray' object has no attribute 'train'

How can I solve this? This is my first time using TensorFlow. I tried to copy the 'Train and Evaluate the Model' part of the TensorFlow tutorial, but it does not seem to work. Can someone help me solve my problem? Thanks!
http://pastebin.com/NCQKNyKy
import tensorflow as tf
sess = tf.InteractiveSession()
import numpy as np
from numpy import genfromtxt
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 3*3, 1], padding='VALID')
data = genfromtxt('circle_deeplearn_data_small.txt',delimiter=',')
out = genfromtxt('circle_deeplearn_output_small.txt',delimiter=',')
x = tf.placeholder(tf.float32, shape =[None, 3*3*15]) # size of x
y_ = tf.placeholder(tf.float32, shape =[None, 1]) # size of output
W_conv1 = weight_variable([1,3*3,1,15])
b_conv1 = bias_variable([15])
x_image = tf.reshape(x,[-1,1,3*3*15,1])
h_conv1 = tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1)
W_fc1 = weight_variable([1 * 1 * 15 , 1])
b_fc1 = bias_variable([1])
h_conv1_flat = tf.reshape(h_conv1 , [-1,1 * 1 * 15])
h_fc1 = tf.nn.relu(tf.matmul(h_conv1_flat , W_fc1) + b_fc1)
y_conv = h_fc1
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
#Adam
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
#sess.run(tf.global_variables_initializer())
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = data.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: data, y_: out, keep_prob: 1.0}))
This is result:
AttributeError: 'numpy.ndarray' object has no attribute 'train'
Here data is just a numpy array. You may need to write your own training data iterator.
It is not quite clear what you are trying to do. The problem occurs because data is a numpy array generated in this line
data = genfromtxt('circle_deeplearn_data_small.txt',delimiter=',')
The error occurs when you try to use the attribute train of data, which does not exist, in the following line:
batch = data.train.next_batch(50)
Instead you need to feed data to tensorflow.
I have faced the same problem. Actually, it's not really a problem; I just didn't know the structure of the data, which is why I ran into it. The datasets that come with the TensorFlow library are packaged in a single object and split into train, test, and validation sets; that's why dataset.train.next_batch() works for them. Your own dataset is not packaged that way, so you have to set up the batching and looping yourself.
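For example, a minimal batching helper over plain numpy arrays might look like this (next_batch is a hypothetical helper; data and out are the arrays loaded by genfromtxt in the question):
import numpy as np

def next_batch(features, labels, batch_size):
    # Mimics mnist.train.next_batch() for your own numpy arrays by
    # sampling a random minibatch of rows.
    idx = np.random.choice(len(features), batch_size, replace=False)
    return features[idx], labels[idx]

# In the training loop:
# batch_x, batch_y = next_batch(data, out, 50)
# train_step.run(feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})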
You may try to use numpy.reshape to turn your data from 2 dimensions into 3 dimensions.
For example, if you had 20 samples and 100 features, i.e. a (20, 100) data matrix, and used a minibatch size of 5, then you could reshape using np.reshape(data, [10, 5, -1]) to get a (10, 5, 40) matrix.
*The "-1" means that you let numpy compute that dimension for you; the total number of elements is 2,000.
Thus, in this example: 10*5*40 = 2,000.

Convolutional Neural Network in Tensorflow with Own Data for Prediction

I am a beginner with CNNs and TensorFlow. I am trying to implement a convolutional neural network in TensorFlow with my own data for prediction, but I am having some problems. I adapted the Deep MNIST for Experts tutorial into the code below. Deep MNIST for Experts is classification, but I am trying to do regression. Another problem is that this code gives me accuracy = 1 for each step.
What is the cause of the error? How can I convert this code for regression?
Data set:
Year_Month_Day,Hour_Minute,Temperature,Relative_humidity,Pressure,Total_Precipitation,Snowfall_amount,Total_cloud_cover,High_cloud_cover,Medium_cloud_cover,Low_cloud_cover,Shortwave_Radiation,Wind_speed_10m,Wind_direction_10m,Wind_speed_80m,Wind_direction_80m,Wind_speed_900m,Wind_direction_900m,Wind_Gust_10m,Difference
2016-10-24,23.00,15.47,76.00,1015.40,0.00,0.00,100.00,26.00,100.00,100.00,0.00,6.88,186.01,12.26,220.24,27.60,262.50,14.04,2.1
2016-10-24,22.00,16.14,73.00,1014.70,0.00,0.00,10.20,34.00,0.00,2.00,0.00,6.49,176.82,11.97,201.16,24.27,249.15,7.92,0.669999
.....
.....
.....
2016-10-24,18.00,20.93,56.00,1012.20,0.00,0.00,100.00,48.00,15.00,100.00,91.67,6.49,146.31,12.10,149.62,17.65,163.41,8.64,1.65
2016-10-24,17.00,21.69,50.00,1012.10,0.00,0.00,100.00,42.00,10.00,100.00,243.86,9.50,142.70,12.77,139.57,19.08,144.21,32.40,0.76
Code:
import tensorflow as tf
import pandas as pandas
from sklearn import cross_validation
from sklearn import preprocessing
from sklearn import metrics
sess = tf.InteractiveSession()
data = pandas.read_csv("tuna.csv")
print(data[-2:])
#X=data.copy(deep=True)
X=data[['Relative_humidity','Pressure','Total_Precipitation','Snowfall_amount','Total_cloud_cover','High_cloud_cover','Medium_cloud_cover','Low_cloud_cover','Shortwave_Radiation','Wind_speed_10m','Wind_direction_10m','Wind_speed_80m','Wind_direction_80m','Wind_speed_900m','Wind_direction_900m','Wind_Gust_10m']].fillna(0)
Y=data[['Temperature']]
number_of_samples=X.shape[0]
elements_of_one_sample=X.shape[1]
print("number of samples", number_of_samples)
print("elements_of_one_sample", elements_of_one_sample)
train_x, test_x, train_y, test_y = cross_validation.train_test_split(X, Y, test_size=0.1, random_state=42)
print("train_x.shape=", train_x.shape)
print("train_y.shape=", train_y.shape)
print("test_x.shape=", test_x.shape)
print("test_y.shape=", test_y.shape)
epoch = 0 # counter for number of rounds training network
last_cost = 0 # keep track of last cost to measure difference
max_epochs = 2000 # total number of training sessions
tolerance = 1e-6 # we stop when diff in costs less than that
batch_size = 50 # we batch the data in groups of this size
num_samples = train_y.shape[0] # number of samples in training set
num_batches = int( num_samples / batch_size ) # compute number of batches, given
print("############################## num_samples", num_samples)
print("############################## num_batches", num_batches)
x = tf.placeholder(tf.float32, shape=[None, 16])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
# xW + b
W = tf.Variable(tf.zeros([16,1]))
b = tf.Variable(tf.zeros([1]))
sess.run(tf.initialize_all_variables())
# y = softmax(xW + b)
y = tf.nn.softmax(tf.matmul(x,W) + b)
# loss is cross entropy
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
for n in range(num_batches):
    batch_x = train_x[n*batch_size : (n+1)*batch_size]
    batch_y = train_y[n*batch_size : (n+1)*batch_size]
    train_step.run(feed_dict={x: batch_x, y_: batch_y})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: test_x, y_: test_y}))
# To create this model, we're going to need to create a lot of weights and biases.
# One should generally initialize weights with a small amount of noise for symmetry
# breaking, and to prevent 0 gradients
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

# Since we're using ReLU neurons, it is also good practice to initialize them
# with a slightly positive initial bias to avoid "dead neurons." Instead of doing
# this repeatedly while we build the model, let's create two handy functions
# to do it for us.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# https://www.tensorflow.org/versions/master/api_docs/python/nn.html#conv2d
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# https://www.tensorflow.org/versions/master/api_docs/python/nn.html#max_pool
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
W_conv1 = weight_variable([2, 2, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,4,4,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([2, 2, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([1 * 1 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 1*1*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 1])
b_fc2 = bias_variable([1])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# accuracy
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# train
sess.run(tf.initialize_all_variables())
for i in range(20000):
    if i % 100 == 0:
        batch_x = train_x[n*batch_size : (n+1)*batch_size]
        batch_y = train_y[n*batch_size : (n+1)*batch_size]
        train_accuracy = accuracy.eval(feed_dict={x: batch_x, y_: batch_y, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})

# result
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: test_x, y_: test_y, keep_prob: 1.0}))
Output:
number of samples 1250
elements_of_one_sample 16
train_x.shape= (1125, 16)
train_y.shape= (1125, 1)
test_x.shape= (125, 16)
test_y.shape= (125, 1)
############################## num_samples 1125
############################## num_batches 22
1.0
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
....
....
....
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
I am quite new to neural nets and machine learning so pardon me for any mistakes, thanks in advance.
You've got a loss function of cross entropy, which is a loss function specifically designed for classification. If you want to do regression, you need to start with a loss function that penalizes prediction error (L2 error is a great place to start).
For prediction, the rightmost layer of the network needs to have linear units (no activation function). The number of neurons in the rightmost layer should correspond to the number of values you're predicting (if it's a simple regression problem where you're predicting a single value of y given a vector of inputs x, then you just need a single neuron in the right-most layer). Right now, you've got a softmax layer on the back end of the network, which is also specifically used for classification tasks.
Basically - you need to swap your softmax for a linear neuron and change your loss function to something like L2 error (aka mean-squared error).
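Concretely, the last part of the question's code could be changed along these lines (a sketch reusing h_fc1_drop, W_fc2, b_fc2 and y_ from the question, where W_fc2/b_fc2 are already sized for a single output):
# Linear output: a single unit, no softmax on the last layer.
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# L2 / mean-squared-error loss instead of cross entropy.
mse = tf.reduce_mean(tf.square(y_conv - y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(mse)

# "Accuracy" is not meaningful for regression; track the error instead,
# e.g. root-mean-squared error on the test set.
rmse = tf.sqrt(mse)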