Error in simple Network - tensorflow

What is wrong with this TensorFlow code? I can't seem to spot the mistake. The network doesn't converge; the loss gets stuck at around 2.30.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
W2 = tf.Variable(tf.zeros([100, 20]))
b2 = tf.Variable(tf.zeros([20]))
W3 = tf.Variable(tf.zeros([20, 10]))
b3 = tf.Variable(tf.zeros([10]))
y1 = tf.nn.relu(tf.add(tf.matmul(x, W1), b1))
y2 = tf.nn.relu(tf.add(tf.matmul(y1, W2), b2))
y3 = tf.nn.softmax(tf.add(tf.matmul(y2, W3), b3))
sess = tf.InteractiveSession()
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y3), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
tf.global_variables_initializer().run()
init = tf.global_variables_initializer()
sess.run(init)
for _ in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    print(sess.run(cross_entropy, feed_dict={x: batch_xs, y_: batch_ys}))
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Thank you!

I can see a couple of things that should be addressed:
The learning rate of 0.5 is quite large for stochastic gradient descent. If a network isn't training, it is always worth trying different values, typically in the range [1e-2, 1e-5].
Networks initialized with zeros (tf.zeros) fail to learn, for two reasons:
- With no differences between parameter values, the gradient is shared evenly across all of them, so they all learn the same value.
- During back-propagation the gradients are multiplied by the weights, so with all-zero weights the gradient reaching the earlier layers is always zero, meaning those weight values never change.
I would also recommend using the built-in tf.losses.softmax_cross_entropy instead of computing the loss yourself. It's generally a good idea, as it minimizes the chance of making a mistake along the way, and it is more numerically stable. :)
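Putting those points together, here is a minimal sketch of the fix (the 0.1 stddev and the 0.01 learning rate are reasonable starting guesses, not tuned values):

# Small random weights break the symmetry that all-zero initialisation creates.
W1 = tf.Variable(tf.random_normal([784, 100], stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
# ...initialise W2/b2 and W3/b3 the same way...

# Keep the final layer as raw logits (no softmax); the built-in loss applies
# the softmax internally in a numerically stable way.
logits = tf.add(tf.matmul(y2, W3), b3)
cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_, logits=logits)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)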

Related

2 Layer Neural Network Does not Converge

Background
I am a newbie to TensorFlow and I am trying to understand the basics of deep learning. I started by writing a two-layer neural network from scratch, which achieved 89% accuracy on the MNIST dataset, and now I am trying to implement the same network in TensorFlow and compare the two.
Problem
I am not sure whether I am missing something basic, but the following implementation seems unable to update its weights, and therefore never outputs anything meaningful.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.zeros((784, num_hidden)))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.zeros((num_hidden, 10)))
b2 = tf.Variable(tf.zeros((1, 10)))
# z -> (batch_size, num_hidden)
z = tf.nn.relu(tf.matmul(x, W1) + b1)
# y -> (batch_size, 10)
y = tf.nn.softmax(tf.matmul(z, W2) + b2)
# y_ -> (batch_size, 10)
y_ = tf.placeholder(tf.float32, [None, 10])
# y_ * tf.log(y) -> (batch_size, 10)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y+1e-10))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# tf.argmax(y, axis=1) returns the index of the maximum value in each row
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(1000):
    # batch_xs -> (100, 784)
    # batch_ys -> (100, 10), one-hot encoded
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()
What I Have Done
I checked the official docs and many other implementations, but I feel totally confused, since they may use different versions and the APIs vary greatly. Could someone help me? Thank you in advance.
There are two problems with what you have done so far. First, you have initialised all of the weights to zero, which prevents the network from learning. Second, the learning rate was too high. The code below got me 0.9665 accuracy. For why you should not initialise all the weights to zero, see the explanation in the answer to the first question above.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
label_place = tf.placeholder(tf.float32, [None, 10])
# WON'T WORK as EVERYTHING IS ZERO!
# Gets accuracy at chance, approx. 0.1:
# W1 = tf.Variable(tf.zeros((784, num_hidden)))
# b1 = tf.Variable(tf.zeros((1, num_hidden)))
# W2 = tf.Variable(tf.zeros((num_hidden, 10)))
# b2 = tf.Variable(tf.zeros((1, 10)))
# Will work, though you will need to train for a bit more than 1000 steps
W1 = tf.Variable(tf.random_normal((784, num_hidden), 0., 0.1))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.random_normal((num_hidden, 10), 0, 0.1))
b2 = tf.Variable(tf.zeros((1, 10)))
# network, we only go as far as the linear output after the hidden layer
# so we can feed it into the tf.nn.softmax_cross_entropy_with_logits below
# this is more numerically stable
z = tf.nn.relu(tf.matmul(x, W1) + b1)
logits = tf.matmul(z, W2) + b2
# define our loss etc. as before; note, however, that the learning rate is
# lower, as with a higher learning rate it wasn't really working
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_place, logits=logits)
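# Note: softmax_cross_entropy_with_logits returns one loss value per example;
# minimize() differentiates the sum of those values, so this behaves like a
# summed (rather than averaged) loss - part of why the learning rate is small.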
train_step = tf.train.GradientDescentOptimizer(.001).minimize(cross_entropy)
# continue as before
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits), 1), tf.argmax(label_place, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(5000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, label_place: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, label_place: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()

How does TensorFlow know which variables to change for optimization?

Code taken from: http://adventuresinmachinelearning.com/python-tensorflow-tutorial/
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Python optimisation variables
learning_rate = 0.5
epochs = 10
batch_size = 100
# declare the training data placeholders
# input x - for 28 x 28 pixels = 784
x = tf.placeholder(tf.float32, [None, 784])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
# calculate the output of the hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)
# now calculate the output of the network - in this case, let's use a softmax
# activated output layer
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                                              + (1 - y) * tf.log(1 - y_clipped), axis=1))
# add an optimiser
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
# finally setup the initialisation operator
init_op = tf.global_variables_initializer()
# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# start the session
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
I wanted to ask: how does TensorFlow recognize the parameters it needs to optimize? In the above code we need to optimize W1, W2, b1 and b2, but we never specified that anywhere. We did ask GradientDescentOptimizer to minimize cross_entropy, but we never told it that it would have to change the values of W1, W2, b1 and b2 to do so. So how did it know which parameters cross_entropy depended on?
The answer by Cory Nezin is only partially correct, and could lead to wrong assumptions!
You actually do specify which parameters are optimized (=trainable), namely by doing this:
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
In short, TensorFlow will only update tf.Variables. If you used something like tf.Variable(..., trainable=False), you would not get any updates, regardless of what the network depends on. You would still have specified the variable, and the network would still propagate through that part, but you would never receive any updates for it.
Cory's answer is correct in that the network does automatically recognize which values to update, but you specify what is trainable in the first place!
TensorFlow works on the premise of something called a computational graph. Essentially, whenever you say something like:
hidden_out = tf.add(tf.matmul(x, W1), b1)
TensorFlow says: OK, that output clearly depends on W1, so I'll connect an edge from hidden_out to W1. The same process happens for y_, y_clipped, and cross_entropy. In the end you have a graph that connects cross_entropy to W1. Pick your favorite graph-traversal algorithm, and TensorFlow finds the connection between cross_entropy and W1.
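As a small illustration (not from the original post), you can inspect which variables the optimizer will touch. compute_gradients only considers trainable variables, even when the loss depends on a frozen one:

import tensorflow as tf

w = tf.Variable(1.0, name='w')                             # trainable by default
frozen = tf.Variable(2.0, trainable=False, name='frozen')  # excluded from updates
loss = tf.square(w * frozen - 3.0)                         # depends on both

opt = tf.train.GradientDescentOptimizer(0.1)
# compute_gradients walks the graph back from `loss`, but only for variables
# in tf.trainable_variables() - so `frozen` never shows up here.
grads_and_vars = opt.compute_gradients(loss)
print([v.name for _, v in grads_and_vars])         # ['w:0']
print([v.name for v in tf.trainable_variables()])  # ['w:0']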

Tensorflow Neural Network for Regression

I am using the TensorFlow library to build a pretty simple two-layer artificial neural network to perform linear regression.
My problem is that the results seem far from expected. I have been trying to spot my mistake for hours, with no luck. I am new to TensorFlow and neural networks, so it could be a trivial mistake. Could anyone have an idea what I am doing wrong?
from __future__ import print_function
import tensorflow as tf
import numpy as np
# Python optimisation variables
learning_rate = 0.02
data_size=100000
data_length=100
train_input=10* np.random.rand(data_size,data_length);
train_label=train_input.sum(axis=1);
train_label=np.reshape(train_label,(data_size,1));
test_input= np.random.rand(data_size,data_length);
test_label=test_input.sum(axis=1);
test_label=np.reshape(test_label,(data_size,1));
x = tf.placeholder(tf.float32, [data_size, data_length])
y = tf.placeholder(tf.float32, [data_size, 1])
W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([data_size, 1]), name='b1')
y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y-y_))
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
init_op = tf.global_variables_initializer()
correct_prediction = tf.reduce_mean(tf.square(y-y_))
accuracy = tf.cast(correct_prediction, tf.float32)
with tf.Session() as sess:
    sess.run(init_op)
    _, c = sess.run([optimiser, cost],
                    feed_dict={x: train_input, y: train_label})
    k = sess.run(b1)
    print(k)
    print(sess.run(accuracy, feed_dict={x: test_input, y: test_label}))
Thanks for your help!
There are a number of changes you have to make in your code.
First of all, you have to train for a number of epochs and feed the optimizer the training data in batches. Your learning rate was very high. The bias should be a single value per output unit of each dense (fully connected) layer, not one value per training example. You can plot the cost (loss) values to see how your network is converging.
In order to feed data in batches, I have also made changes to the placeholders. Check the full modified code:
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Python optimisation variables
learning_rate = 0.001
data_size=1000 # Had to change these values to fit in my memory
data_length=10
train_input=10* np.random.rand(data_size,data_length);
train_label=train_input.sum(axis=1);
train_label=np.reshape(train_label,(data_size,1));
test_input= np.random.rand(data_size,data_length);
test_label=test_input.sum(axis=1);
test_label=np.reshape(test_label,(data_size,1));
tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, data_length])
y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([1, 1]), name='b1')
y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y-y_))
optimiser=tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
init_op = tf.global_variables_initializer()
EPOCHS = 500
BATCH_SIZE = 32
with tf.Session() as sess:
    sess.run(init_op)
    loss_history = []
    for epoch_no in range(EPOCHS):
        for offset in range(0, data_size, BATCH_SIZE):
            batch_x = train_input[offset: offset + BATCH_SIZE]
            batch_y = train_label[offset: offset + BATCH_SIZE]
            _, c = sess.run([optimiser, cost],
                            feed_dict={x: batch_x, y: batch_y})
            loss_history.append(c)
    plt.plot(range(len(loss_history)), loss_history)
    plt.show()
    # For running the test dataset
    results, test_cost = sess.run([y_, cost], feed_dict={x: test_input, y: test_label})
    print('test cost: {:.3f}'.format(test_cost))
    for t1, t2 in zip(results, test_label):
        print('Prediction: {:.3f}, actual: {:.3f}'.format(t1[0], t2[0]))

Simple model gets 0.0 accuracy

I am training a simple model on a dataset containing labels always equal to 0, and am getting a 0.0 accuracy.
The model is the following:
import csv
import numpy as np
import pandas as pd
import tensorflow as tf
labelsReader = pd.read_csv('data.csv',usecols = [12],header=None)
dataReader = pd.read_csv('data.csv',usecols = [1,2,3,4,5,6,7,8,9,10,11],header=None)
labels_ = labelsReader.values
data_ = dataReader.values
labels = np.float32(labels_)
data = np.float32(data_)
x = tf.placeholder(tf.float32, [None, 11])
W = tf.Variable(tf.truncated_normal([11, 1], stddev=1./11.))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, W) + b
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 1])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for i in range(0, 1000):
    train_step.run(feed_dict={x: data, y_: labels})
correct_prediction = tf.equal(y, y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: data, y_: labels}))
And here is the dataset:
444444,0,0,0.9993089149965446,0,0,0.000691085003455425,0,0,0,0,0,0
As the model trains, the y for the row shown above decreases, reaching -1000 after 1000 iterations.
What could be the cause of the failure to train the model ?
Your accuracy op checks whether the predicted float is exactly equal to the value you expect. With the network you made, this is a very difficult task (although you might have a chance, as you are also overfitting your data).
To get better results:
- Define accuracy as being above/below a threshold (closer to 1 or closer to 0), as in the sketch below.
- Normalise your input data. I don't know the range of your inputs, but 444444 is a ridiculous value to feed in, and it is difficult to train weights that can handle such values.
Also, try adding some sanity checks. For example: what output is your model actually predicting (y.eval())? And what is the cross-entropy during training (sess.run([accuracy, cross_entropy], feed_dict={x: data, y_: labels}))?
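As a rough sketch of the first two suggestions, built on the variables from the question (the 0.5 threshold is an arbitrary choice):

# Threshold the output instead of testing for exact float equality:
# count a prediction as class 1 when sigmoid(y) is above 0.5.
predicted = tf.cast(tf.greater(tf.sigmoid(y), 0.5), tf.float32)
correct_prediction = tf.equal(predicted, y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Standardise each input column before training, so a feature like 444444
# does not dominate the gradients (the 1e-8 guards against constant columns).
data = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-8)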
Good luck!

TensorFlow character recognition with softmax results in accuracy 1 due to [NaN...NaN] prediction

I am trying to use the softmax regression method discussed in https://www.tensorflow.org/get_started/mnist/beginners to recognize characters.
My code is as follows.
import numpy as np
import pandas as pd
import tensorflow as tf
train_data = pd.read_csv('CharDataSet/train.csv')
print(train_data.shape)
x = tf.placeholder(tf.float32, [None, 130])
W = tf.Variable(tf.zeros([130, 26]))
b = tf.Variable(tf.zeros([26]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 26])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(10):
    batch_xs = train_data.iloc[:, 2:]
    print(batch_xs)
    batch_ys = getencodedbatch(train_data.iloc[:, 1])
    print(batch_ys)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
However, I am getting an accuracy of 1, which shouldn't be the case.
The reason I am getting it is that my y tensor ends up as an array like
[nan, ..., nan]
Can anyone explain to me what is wrong in my code?
I converted each character to a one-hot encoding using the method below:
def getencodedbatch(param):
    s = (param.shape[0], 26)
    y_encode = np.zeros(s)
    row = 0
    for val in param:
        col = ord(val) - 97
        y_encode[row, col] = 1
        row += 1
    return pd.DataFrame(y_encode)
Here is the problem you are having:
- You set your initial weights and biases to 0. This is wrong, as your network does not learn properly.
- As training proceeds, the softmax saturates and some entries of y end up being exactly 0.
- You then take the log of y, and the log of 0 is not defined... hence the NaN.
Good luck!
Edit to tell you how to fix it: look for an example on classifying MNIST characters and see what they do. You probably want to initialise your weights to be random normals ;)
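For instance, a minimal sketch of that fix on the code from the question (the 0.1 stddev is an assumption; the fused loss op avoids taking an explicit log of the softmax output):

# Random-normal weights instead of zeros, so training can get started.
W = tf.Variable(tf.random_normal([130, 26], stddev=0.1))
b = tf.Variable(tf.zeros([26]))
logits = tf.matmul(x, W) + b

# softmax and log fused in one numerically stable op: there is no explicit
# tf.log(y) left to produce NaN when a probability underflows to 0.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)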