2 Layer Neural Network Does not Converge - tensorflow

Background
I am a newbie to TensorFlow and I am trying to understand the basics of deep learning. I started by writing a two-layer neural network from scratch, which achieved 89% accuracy on the MNIST dataset, and now I am trying to implement the same network in TensorFlow and compare their performance.
Problem
I am not sure whether I am missing something basic in the code, but the following implementation seems unable to update its weights and therefore does not output anything meaningful.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.zeros((784, num_hidden)))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.zeros((num_hidden, 10)))
b2 = tf.Variable(tf.zeros((1, 10)))
# z -> (batch_size, num_hidden)
z = tf.nn.relu(tf.matmul(x, W1) + b1)
# y -> (batch_size, 10)
y = tf.nn.softmax(tf.matmul(z, W2) + b2)
# y_ -> (batch_size, 10)
y_ = tf.placeholder(tf.float32, [None, 10])
# y_ * tf.log(y) -> (batch_size, 10)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y+1e-10))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# tf.argmax(y, axis=1) returns the maximum index in each row
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(1000):
    # batch_xs -> (100, 784)
    # batch_ys -> (100, 10), one-hot encoded
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()
What I Have Done
I checked the official docs and many other implementations, but I feel totally confused since they may use different versions and the API varies greatly.
Could someone help me? Thank you in advance.

There are two problems with what you have done so far. First, you have initialised all of the weights to zero, which prevents the network from learning. Second, the learning rate was too high. The code below got me 0.9665 accuracy. For an explanation of why you should not set all the weights to zero, see here.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
label_place = tf.placeholder(tf.float32, [None, 10])
# WONT WORK as EVERYTHING IS ZERO!
# # Get accuracy at chance \approx 0.1
# W1 = tf.Variable(tf.zeros((784, num_hidden)))
# b1 = tf.Variable(tf.zeros((1, num_hidden)))
# W2 = tf.Variable(tf.zeros((num_hidden, 10)))
# b2 = tf.Variable(tf.zeros((1, 10)))
# Will work, you will need to train a bit more than 1000 steps
# though
W1 = tf.Variable(tf.random_normal((784, num_hidden), 0., 0.1))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.random_normal((num_hidden, 10), 0, 0.1))
b2 = tf.Variable(tf.zeros((1, 10)))
# network, we only go as far as the linear output after the hidden layer
# so we can feed it into the tf.nn.softmax_cross_entropy_with_logits below
# this is more numerically stable
z = tf.nn.relu(tf.matmul(x, W1) + b1)
logits = tf.matmul(z, W2) + b2
# define our loss etc. as before. However, note that the learning rate is lower,
# as with a higher learning rate it wasn't really working
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_place, logits=logits)
train_step = tf.train.GradientDescentOptimizer(.001).minimize(cross_entropy)
# continue as before
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits), 1), tf.argmax(label_place, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(5000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, label_place: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, label_place: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()
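As a side note (not part of the original answer), instead of hand-tuning the stddev of tf.random_normal you could use a variance-scaling (Xavier/Glorot) initializer. A minimal sketch, assuming a TF 1.x version where tf.glorot_uniform_initializer is available:
# Alternative weight initialization: Xavier/Glorot scales the random values
# by each layer's fan-in/fan-out, so no manual stddev tuning is needed.
init = tf.glorot_uniform_initializer()
W1 = tf.Variable(init((784, num_hidden)))
W2 = tf.Variable(init((num_hidden, 10)))
# Biases can stay at zero; only the weights need symmetry breaking.
b1 = tf.Variable(tf.zeros((1, num_hidden)))
b2 = tf.Variable(tf.zeros((1, 10)))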

Related

How does TensorFlow know which variables to change for optimization?

Code taken from: http://adventuresinmachinelearning.com/python-tensorflow-tutorial/
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Python optimisation variables
learning_rate = 0.5
epochs = 10
batch_size = 100
# declare the training data placeholders
# input x - for 28 x 28 pixels = 784
x = tf.placeholder(tf.float32, [None, 784])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
# calculate the output of the hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)
# now calculate the hidden layer output - in this case, let's use a softmax activated
# output layer
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
+ (1 - y) * tf.log(1 - y_clipped), axis=1))
# add an optimiser
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
# finally setup the initialisation operator
init_op = tf.global_variables_initializer()
# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# start the session
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
I wanted to ask: how does TensorFlow recognize the parameters it needs to optimize? In the above code we need to optimize W1, W2, b1 and b2, but we never specified that anywhere. We asked GradientDescentOptimizer to minimize cross_entropy, but we never told it that it would have to change the values of W1, W2, b1 and b2 in order to do so. So how did it know which parameters cross_entropy depended on?
The answer by Cory Nezin is only partially correct, and could lead to wrong assumptions!
You actually do specify which parameters are optimized (=trainable), namely by doing this:
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
In short, TensorFlow will only update trainable tf.Variables. If you used something like tf.Variable(..., trainable=False), that variable would not get any updates, regardless of what the network depends on. You would still have specified it, and the network would still propagate through that part, but you would never receive any updates for that specific variable.
Cory's answer is correct in that the network does automatically recognize which values to update, but you specify what is trainable (and therefore updatable) in the first place!
TensorFlow works on the premise of something called a computational graph. Essentially, whenever you say something like:
hidden_out = tf.add(tf.matmul(x, W1), b1)
TensorFlow says ok, so that output clearly depends on W1, I'll connect an edge from "hidden_out" to W1. This same process happens for y_, y_clipped, and cross_entropy. So in the end you have a graph which connects cross_entropy to W1. Pick your favorite graph traversal algorithm and TensorFlow finds the connection between cross entropy and W1.
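To make this concrete, here is a small sketch (not from the original answer) showing how to inspect which variables the optimizer will touch and how to restrict the update to a subset via the var_list argument of minimize; it assumes W1, b1, W2, b2 and cross_entropy are defined as in the code above:
# Every tf.Variable created with trainable=True (the default) is collected automatically
print(tf.trainable_variables())   # e.g. [W1, b1, W2, b2]
# A variable created with trainable=False is excluded from that collection,
# so minimize() will never update it:
W_frozen = tf.Variable(tf.random_normal([784, 300], stddev=0.03), trainable=False)
# You can also pass an explicit var_list to update only some of the parameters,
# here only the output layer:
optimiser = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy, var_list=[W2, b2])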

How to switch from GradientDescent Optimizer to Adam in Tensorflow

My code is running perfectly with Gradient Descent, but I want to compare the effectiveness of my algorithm using Adam Optimizer, so I tried to modify the following code:
# Import MNIST data
#import input_data
#mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
#fashion_mnist = input_data.read_data_sets('data/fashion')
import tensorflow as tf
# Set parameters
learning_rate = 0.01 #1e-4
training_iteration = 30
batch_size = 100
display_step = 2
# TF graph input
x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes
#regularizer = tf.reduce_sum(tf.square(y))
# Create a model
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
with tf.name_scope("Wx_b") as scope:
# Construct a linear model
model = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
# Add summary ops to collect data
w_h = tf.summary.histogram("weights", W)
b_h = tf.summary.histogram("biases", b)
# More name scopes will clean up graph representation
with tf.name_scope("cost_function") as scope:
# Minimize error using cross entropy
# Cross entropy
cost_function = -tf.reduce_sum(y*tf.log(model))
# Create a summary to monitor the cost function
tf.summary.scalar("cost_function", cost_function)
with tf.name_scope("train") as scope:
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function)
# Initializing the variables
#init = tf.initialize_all_variables()
init = tf.global_variables_initializer()
# Merge all summaries into a single operator
merged_summary_op = tf.summary.merge_all()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter('/home/raed/Tensorflow/tensorflow_demo', graph_def=sess.graph_def)
    #writer.add_graph(sess.graph_def)
    # Training cycle
    for iteration in range(training_iteration):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute the average loss
            avg_cost += sess.run(cost_function, feed_dict={x: batch_xs, y: batch_ys})/total_batch
            # Write logs for each iteration
            summary_str = sess.run(merged_summary_op, feed_dict={x: batch_xs, y: batch_ys})
            summary_writer.add_summary(summary_str, iteration*total_batch + i)
        # Display logs per iteration step
        if iteration % display_step == 0:
            print("Iteration:" "%04d" % (iteration + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Tuning completed!")
    # Test the model
    predictions = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(predictions, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
to use Adam Optimizer I tried to change the following line :
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function)
and replace it with the AdamOptimizer :
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost_function)
When I ran the code, it completed a few iterations and then stopped with the following error:
InvalidArgumentError (see above for traceback): Nan in summary histogram for: weights
[[Node: weights = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](weights/tag, Variable/read)]]
Could you please help me understand the problem? Thanks in advance.
The problem is that the weights are initialized to zero (W = tf.Variable(tf.zeros([784, 10]))); that is why you get NaN in the weight histogram.
You need to initialize them with some initializer, e.g. a normal distribution, as follows:
W = tf.Variable(tf.random_normal([784, 10], stddev=0.35),
                name="weights")

Error in simple Network

What is wrong with this TensorFlow code? I cannot seem to spot the mistake. It doesn't converge; the loss gets stuck at about 2.30.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
W2 = tf.Variable(tf.zeros([100, 20]))
b2 = tf.Variable(tf.zeros([20]))
W3 = tf.Variable(tf.zeros([20, 10]))
b3 = tf.Variable(tf.zeros([10]))
y1 = tf.nn.relu(tf.add(tf.matmul(x, W1), b1))
y2 = tf.nn.relu(tf.add(tf.matmul(y1, W2), b2))
y3 = tf.nn.softmax(tf.add(tf.matmul(y2, W3), b3))
sess = tf.InteractiveSession()
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y3), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
tf.global_variables_initializer().run()
init = tf.global_variables_initializer()
sess.run(init)
for _ in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    print(sess.run(cross_entropy, feed_dict={x: batch_xs, y_: batch_ys}))
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Thank you!
I can see a couple of things that should be addressed:
The learning rate of 0.5 is quite large for stochastic gradient descent. If a network isn't training, you can always try with different values, typically in the range [1e-2, 1e-5].
Networks initialized with zeros (tf.zeros) fail to learn for two reasons:
1. Without any difference between parameter values, the gradient is shared evenly across them all, meaning that they all learn the same value.
2. As the gradients are multiplied by weights during back-propagation, the resultant value will always equal zero, meaning no change in weight values.
I would also recommend using the built-in tf.losses.softmax_cross_entropy instead of doing it yourself. It's generally a good idea, as it minimizes the chance of making a mistake along the way. :)
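A minimal sketch of those suggestions applied to the network above (random initialization, the built-in loss and a smaller learning rate; the exact stddev and learning rate are assumptions, not tuned values):
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Break the symmetry with small random weights; biases can stay at zero
W1 = tf.Variable(tf.random_normal([784, 100], stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
W2 = tf.Variable(tf.random_normal([100, 20], stddev=0.1))
b2 = tf.Variable(tf.zeros([20]))
W3 = tf.Variable(tf.random_normal([20, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
y1 = tf.nn.relu(tf.matmul(x, W1) + b1)
y2 = tf.nn.relu(tf.matmul(y1, W2) + b2)
logits = tf.matmul(y2, W3) + b3   # no softmax here; the loss op applies it internally
# Built-in loss (expects one-hot labels and raw logits) and a smaller learning rate
cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_, logits=logits)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)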

TensorBoard shows No image data was found

I have implemented a NN for MNIST using TensorFlow. I want to show the results on TensorBoard. Following are screenshots of the TensorBoard pages from my run, but the IMAGES page shows "No image data was found".
What information should be shown there? Should I just ignore it?
CODE
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()
mnist = input_data.read_data_sets('data', one_hot=True)
batch_size = 100
learning_rate = 0.5
training_epochs = 5
logs_path = "C:/tmp/mlp"
with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x-input")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y-input")
with tf.name_scope("weights"):
    W = tf.Variable(tf.zeros([784, 10]))
with tf.name_scope("biases"):
    b = tf.Variable(tf.zeros([10]))
with tf.name_scope("softmax"):
    y = tf.nn.softmax(tf.matmul(x, W) + b)
with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
with tf.name_scope('train'):
    train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
with tf.name_scope('Accuracy'):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar("cost", cross_entropy)
tf.summary.scalar("accuracy", accuracy)
summary_op = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    summary_writer = tf.summary.FileWriter("C:/tmp/mlp", sess.graph)
    for epoch in range(training_epochs):
        batch_count = int(mnist.train.num_examples / batch_size)
        for i in range(batch_count):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            _, summary = sess.run([train_op, summary_op], feed_dict={x: batch_x, y_: batch_y})
            summary_writer.add_summary(summary, epoch * batch_count + i)
        if epoch % 5 == 0:
            print("Epoch: ", epoch)
    print("Accuracy: ", accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    print("done")
The only lines in your code that refer to a summary operation are:
tf.summary.scalar("cost", cross_entropy)
tf.summary.scalar("accuracy", accuracy)
These lines create two scalar summaries (and add them to a default collection that contains every defined summary).
You're not defining any image summary (with tf.summary.image), so that tab in TensorBoard will be empty.
Just ignore it. Because you did not save any tf.summary.image summary, TensorBoard won't show anything in that tab.
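If you do want something to appear on the IMAGES tab, a small sketch would be to log a few of the input digits with tf.summary.image. This is an addition to your code, not something it already does, and it reuses the placeholder x defined above:
with tf.name_scope('input_images'):
    # Reshape the flat 784-pixel vectors back to 28x28 grayscale images
    image_batch = tf.reshape(x, [-1, 28, 28, 1])
    tf.summary.image('mnist_digits', image_batch, max_outputs=3)
# tf.summary.merge_all() will then pick this summary up as well, and the IMAGES
# tab will show the first three digits of each logged batch.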

Convolutional Neural Network in Tensorflow with Own Data for Prediction

I am a beginner in CNNs and TensorFlow. I am trying to implement a convolutional neural network in TensorFlow with my own data for prediction, but I am having some problems. I converted the Deep MNIST for Experts tutorial to this. Deep MNIST for Experts is classification, but I am trying to do regression. Another problem is that this code gives me accuracy = 1 for every step.
What is the cause of the error? How can I convert this code to do regression?
Data set:
Year_Month_Day,Hour_Minute,Temperature,Relative_humidity,Pressure,Total_Precipitation,Snowfall_amount,Total_cloud_cover,High_cloud_cover,Medium_cloud_cover,Low_cloud_cover,Shortwave_Radiation,Wind_speed_10m,Wind_direction_10m,Wind_speed_80m,Wind_direction_80m,Wind_speed_900m,Wind_direction_900m,Wind_Gust_10m,Difference
2016-10-24,23.00,15.47,76.00,1015.40,0.00,0.00,100.00,26.00,100.00,100.00,0.00,6.88,186.01,12.26,220.24,27.60,262.50,14.04,2.1
2016-10-24,22.00,16.14,73.00,1014.70,0.00,0.00,10.20,34.00,0.00,2.00,0.00,6.49,176.82,11.97,201.16,24.27,249.15,7.92,0.669999
.....
.....
.....
2016-10-24,18.00,20.93,56.00,1012.20,0.00,0.00,100.00,48.00,15.00,100.00,91.67,6.49,146.31,12.10,149.62,17.65,163.41,8.64,1.65
2016-10-24,17.00,21.69,50.00,1012.10,0.00,0.00,100.00,42.00,10.00,100.00,243.86,9.50,142.70,12.77,139.57,19.08,144.21,32.40,0.76
Code:
import tensorflow as tf
import pandas as pandas
from sklearn import cross_validation
from sklearn import preprocessing
from sklearn import metrics
sess = tf.InteractiveSession()
data = pandas.read_csv("tuna.csv")
print(data[-2:])
#X=data.copy(deep=True)
X=data[['Relative_humidity','Pressure','Total_Precipitation','Snowfall_amount','Total_cloud_cover','High_cloud_cover','Medium_cloud_cover','Low_cloud_cover','Shortwave_Radiation','Wind_speed_10m','Wind_direction_10m','Wind_speed_80m','Wind_direction_80m','Wind_speed_900m','Wind_direction_900m','Wind_Gust_10m']].fillna(0)
Y=data[['Temperature']]
number_of_samples=X.shape[0]
elements_of_one_sample=X.shape[1]
print("number of samples", number_of_samples)
print("elements_of_one_sample", elements_of_one_sample)
train_x, test_x, train_y, test_y = cross_validation.train_test_split(X, Y, test_size=0.1, random_state=42)
print("train_x.shape=", train_x.shape)
print("train_y.shape=", train_y.shape)
print("test_x.shape=", test_x.shape)
print("test_y.shape=", test_y.shape)
epoch = 0 # counter for number of rounds training network
last_cost = 0 # keep track of last cost to measure difference
max_epochs = 2000 # total number of training sessions
tolerance = 1e-6 # we stop when diff in costs less than that
batch_size = 50 # we batch the data in groups of this size
num_samples = train_y.shape[0] # number of samples in training set
num_batches = int( num_samples / batch_size ) # compute number of batches, given
print("############################## num_samples", num_samples)
print("############################## num_batches", num_batches)
x = tf.placeholder(tf.float32, shape=[None, 16])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
# xW + b
W = tf.Variable(tf.zeros([16,1]))
b = tf.Variable(tf.zeros([1]))
sess.run(tf.initialize_all_variables())
# y = softmax(xW + b)
y = tf.nn.softmax(tf.matmul(x,W) + b)
# the loss is cross entropy
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
for n in range( num_batches ):
    batch_x = train_x[ n*batch_size : (n+1)*batch_size ]
    batch_y = train_y[ n*batch_size : (n+1)*batch_size ]
    train_step.run( feed_dict={x: batch_x, y_: batch_y} )
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: test_x, y_: test_y}))
# To create this model, we're going to need to create a lot of weights and biases.
# One should generally initialize weights with a small amount of noise for symmetry
# breaking, and to prevent 0 gradients
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
# Since we're using ReLU neurons, it is also good practice to initialize them
# with a slightly positive initial bias to avoid "dead neurons." Instead of doing
# this repeatedly while we build the model, let's create two handy functions
# to do it for us.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
# https://www.tensorflow.org/versions/master/api_docs/python/nn.html#conv2d
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
# https://www.tensorflow.org/versions/master/api_docs/python/nn.html#max_pool
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
W_conv1 = weight_variable([2, 2, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,4,4,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([2, 2, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([1 * 1 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 1*1*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 1])
b_fc2 = bias_variable([1])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# accuracy
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# train
sess.run(tf.initialize_all_variables())
for i in range(20000):
    if i%100 == 0:
        batch_x = train_x[ n*batch_size : (n+1)*batch_size ]
        batch_y = train_y[ n*batch_size : (n+1)*batch_size ]
        train_accuracy = accuracy.eval(feed_dict={x: batch_x, y_: batch_y, keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})
# result
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: test_x, y_: test_y, keep_prob: 1.0}))
Output:
number of samples 1250
elements_of_one_sample 16
train_x.shape= (1125, 16)
train_y.shape= (1125, 1)
test_x.shape= (125, 16)
test_y.shape= (125, 1)
############################## num_samples 1125
############################## num_batches 22
1.0
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
....
....
....
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
I am quite new to neural nets and machine learning, so pardon me for any mistakes. Thanks in advance.
You've got a loss function of cross entropy, which is a loss function specifically designed for classification. If you want to do regression, you need to start with a loss function that penalizes prediction error (L2 error is a great place to start).
For prediction, the rightmost layer of the network needs to have linear units (no activation function). The number of neurons in the rightmost layer should correspond to the number of values you're predicting (if it's a simple regression problem where you're predicting a single value of y given a vector of inputs x, then you just need a single neuron in the right-most layer). Right now, you've got a softmax layer on the back end of the network, which is also specifically used for classification tasks.
Basically, you need to swap your softmax for a linear neuron and change your loss function to something like L2 error (also known as mean-squared error).
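As a rough sketch of that change, reusing h_fc1_drop, W_fc2, b_fc2, y_, keep_prob and the test data from the code above (the learning rate is an assumption, not a tuned value):
# Linear output: a single neuron and no softmax, since we predict one continuous value
y_pred = tf.matmul(h_fc1_drop, W_fc2) + b_fc2   # shape (batch_size, 1)
# L2 / mean-squared error instead of cross entropy
mse = tf.reduce_mean(tf.square(y_pred - y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(mse)
# "Accuracy" is not meaningful for regression; report an error metric instead
rmse = tf.sqrt(mse)
print("test RMSE %g" % rmse.eval(feed_dict={x: test_x, y_: test_y, keep_prob: 1.0}))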