Tensorflow model always produces mean - tensorflow

I am having trouble with fitting a very simple model in tensorflow. If I have a column of input data which is constant, my output always converges to produce the same value for all rows, which is the mean of my output data, y_, even when there is another column in x_ which has enough information to reproduce y_ exactly. Here is a small example.
import tensorflow as tf
def weight_variable(shape):
"""Initialize the weights with random weights"""
initial = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float64)
return tf.Variable(initial)
#Initialize my data
x = tf.constant([[1.0,1.0],[1.0,2.0],[1.0,3.0]], dtype=tf.float64)
y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64)
w = weight_variable((2,1))
y = tf.matmul(x,w)
error = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.AdamOptimizer(1e-5).minimize(error)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
#Train the model and output every 1000 iterations
for i in range(1000000):
sess.run(train_step)
err = sess.run(error)
if i % 1000 == 0:
print "\nerr:", err
print "x: ", sess.run(x)
print "w: ", sess.run(w)
print "y_: ", sess.run(y_)
print "y: ", sess.run(y)
This example always converges to w=[2,0], and y = [2,2,2]. This is a smooth function with a minimum at w=[0,1] and y = [1,2,3], where the error function is zero. Why does it not converge to this? I have also tried using gradient descent and I have tried varying the training rate.

Your target is y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64) has the shape (1, 3). The output of tf.matmul(x, w) has the shape (3, 1). Thus y_ - y has the shape (3, 3) according to numpy broadcasting rules. So you are really not optimizing the function that you thought you were optimizing. Change your y_ to the following and give it a shot :
y_ = tf.constant([[1.0],[2.0],[3.0]], dtype=tf.float64)
This should converge pretty quickly to your expected answer, even with a large learning rate.

Related

How does Tensorflow's reduce_sum work in a loop?

I cannot understand the working of reduce_sum when the optimizer is run in a loop.
I have 30 samples in my train_x and train_y lists. I run my optimizer in a loop by feeding one sample from both at an iteration. My cost function computes the sum of the difference of predicted and actual values for all samples using the tensorflow's reduce_sum method. According to the graph the optimzer depends on the cost function and so the cost will be computed for every x and y. I need to know whether the reduce_sum will wait for all the 30 samples or take one sample (x, y) at a time. Here n_samples is 30. I also need to know whether the weights and bias will be updated for each epoch or for each x and y.
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(np.random.randn(), name='weights')
B = tf.Variable(np.random.randn(), name='bias')
pred = X * W + B
cost = tf.reduce_sum((pred - Y) ** 2) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sesh:
sesh.run(init)
for epoch in range(epochs):
for x, y in zip(train_x, train_y):
sesh.run(optimizer, feed_dict={X: x, Y: y})
if not epoch % 20:
c = sesh.run(cost, feed_dict={X: train_x, Y: train_y})
w = sesh.run(W)
b = sesh.run(B)
print(f'epoch: {epoch:04d} c={c:.4f} w={w:.4f} b={b:.4f}')
I need to know whether the reduce_sum will wait for all the 30 samples or take one sample (x, y) at a time.
tf.reduce_sum is an operation and as such it does not have any implicit mutable state. The result of tf.reduce_sum is fully defined by the model parameters (W and B) and the placeholder values explicitly provided in the feed_dict argument to the sess.run(cost, feed_dict={...}) call.
If you would like to aggregate the value of a metric across all batches check out tf.metrics:
y_pred = tf.placeholder(tf.float32)
y_true = tf.placeholder(tf.float32)
mse, update_op = tf.metrics.mean_squared_error(y_true, y_pred)
init = tf.local_variables_initializer() # MSE state is local!
sess = tf.Session()
sess.run(init)
# Update the metric and compute the value after the update.
sess.run(update_op, feed_dict={y_pred: [0.0], y_true: [42.0]}) # => 1764.0
# Get current value.
sess.run(mse) # => 1764.0
I also need to know whether the weights and bias will be updated for each epoch or for each x and y.
Each sess.run(optimizer, ...) call will compute the gradients of the trainable variables and apply these gradients to the variable values. See GradientDescentOptimizer.minimize.

RNN sequence learning

I am new to TensorFlow RNN prediction.
I am trying to use RNN with BasicLSTMCell to predict sequence, such as
1,2,3,4,5 ->6
3,4,5,6,7 ->8
35,36,37,38,39 ->40
My code doesn't report error, but outputs for every batch seem to be the same, and the cost seem to not reduce while training.
When I divided all training data by 100
0.01,0.02,0.03,0.04,0.05 ->0.06
0.03,0.04,0.05,0.06,0.07 ->0.08
0.35,0.36,0.37,0.38,0.39 ->0.40
The result is pretty good, the correlation between prediction and real values is very high (0.9998).
I suspect the problem is because integer and float? but I cannot explain the reason. Anyone can help? Many thanks!!
Here is the code
library(tensorflow)
start=sample(1:1000, 100000, T)
start1= start+1
start2=start1+1
start3= start2+1
start4=start3+1
start5= start4+1
start6=start5+1
label=start6+1
data=data.frame(start, start1, start2, start3, start4, start5, start6, label)
data=as.matrix(data)
n = nrow(data)
trainIndex = sample(1:n, size = round(0.7*n), replace=FALSE)
train = data[trainIndex ,]
test = data[-trainIndex ,]
train_data= train[,1:7]
train_label= train[,8]
means=apply(train_data, 2, mean)
sds= apply(train_data, 2, sd)
train_data=(train_data-means)/sds
test_data=test[,1:7]
test_data=(test_data-means)/sds
test_label=test[,8]
batch_size = 50L
n_inputs = 1L # MNIST data input (img shape: 28*28)
n_steps = 7L # time steps
n_hidden_units = 10L # neurons in hidden layer
n_outputs = 1L # MNIST classes (0-9 digits)
x = tf$placeholder(tf$float32, shape(NULL, n_steps, n_inputs))
y = tf$placeholder(tf$float32, shape(NULL, 1L))
weights_in= tf$Variable(tf$random_normal(shape(n_inputs, n_hidden_units)))
weights_out= tf$Variable(tf$random_normal(shape(n_hidden_units, 1L)))
biases_in=tf$Variable(tf$constant(0.1, shape= shape(n_hidden_units )))
biases_out = tf$Variable(tf$constant(0.1, shape=shape(1L)))
RNN=function(X, weights_in, weights_out, biases_in, biases_out)
{
X = tf$reshape(X, shape=shape(-1, n_inputs))
X_in = tf$sigmoid (tf$matmul(X, weights_in) + biases_in)
X_in = tf$reshape(X_in, shape=shape(-1, n_steps, n_hidden_units)
lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=T)
init_state = lstm_cell$zero_state(batch_size, dtype=tf$float32)
outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=F)
outputs= tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1,0,2)))
results = tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
return(results)
}
pred = RNN(x, weights_in, weights_out, biases_in, biases_out)
cost = tf$losses$mean_squared_error(pred, y)
train_op = tf$contrib$layers$optimize_loss(loss=cost, global_step=tf$contrib$framework$get_global_step(), learning_rate=0.05, optimizer="SGD")
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess.run(init)
step = 0
while (step < 1000)
{
train_data2= train_data[(step*batch_size+1) : (step*batch_size+batch_size) , ]
train_label2=train_label[(step*batch_size+1):(step*batch_size+batch_size)]
batch_xs <- sess$run(tf$reshape(train_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
batch_ys= matrix(train_label2, ncol=1)
sess$run(train_op, feed_dict = dict(x = batch_xs, y= batch_ys))
mycost <- sess$run(cost, feed_dict = dict(x = batch_xs, y= batch_ys))
print (mycost)
test_data2= test_data[(0*batch_size+1) : (0*batch_size+batch_size) , ]
test_label2=test_label[(0*batch_size+1):(0*batch_size+batch_size)]
batch_xs <- sess$run(tf$reshape(test_data2, shape(batch_size, n_steps, n_inputs))) # Reshape
batch_ys= matrix(test_label2, ncol=1)
step=step+1
}
First, it's quite useful to always normalize your network inputs (there are different approaches, divide by a maximum value, subtract mean and divide by std and many more). This will help your optimizer a lot.
Second, and actually most important in your case, after the RNN output you are applying sigmoid function. If you check the plot of the sigmoid function, you will see that it actually scales all inputs to the range (0,1). So basically no matter how big your inputs are your output will always be at most 1. Thus you should not use any activation functions at the output layer in regression problems.
Hope it helps.

Simple model gets 0.0 accuracy

I am training a simple model on a dataset containing labels always equal to 0, and am getting a 0.0 accuracy.
The model is the following:
import csv
import numpy as np
import pandas as pd
import tensorflow as tf
labelsReader = pd.read_csv('data.csv',usecols = [12],header=None)
dataReader = pd.read_csv('data.csv',usecols = [1,2,3,4,5,6,7,8,9,10,11],header=None)
labels_ = labelsReader.values
data_ = dataReader.values
labels = np.float32(labels_)
data = np.float32(data_)
x = tf.placeholder(tf.float32, [None, 11])
W = tf.Variable(tf.truncated_normal([11, 1], stddev=1./11.))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, W) + b
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 1])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for i in range(0, 1000):
train_step.run(feed_dict={x: data, y_: labels})
correct_prediction = tf.equal(y, y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: data, y_: labels}))
And here is the dataset:
444444,0,0,0.9993089149965446,0,0,0.000691085003455425,0,0,0,0,0,0
As the model trains, y of the data shown above decreases, and reaches -1000 after 1000 iterations.
What could be the cause of the failure to train the model ?
Your accuracy checks if the predicted float is exactly equal to the value you expect. With the network you made this is a very difficult task (although you might have a chance as you are also overfitting your data).
To get better results:
- Define accuracy to be higher/lower than a value (closer to 1 or closer to 0).
- Normalise your input data, I don't know the range of your input, but 444444 is a rediculous value to use as input, and it is difficult to train weights that can handle these values.
Also: try to add some sanity checks. For example: what is the output your model is predicting? (y.eval) And what is the cross entropy you have during training your network? (sess.run([accuracy,cross_entropy], feed_dict={x: data, y_: labels})
Good luck!

tensor flow character recognition with softmax results in accuracy 1 due to [NaN...NaN] prediction

I am trying to use the softmax regression method discussed in https://www.tensorflow.org/get_started/mnist/beginners to recognize characters.
My code is as follows.
train_data = pd.read_csv('CharDataSet/train.csv')
print(train_data.shape)
x = tf.placeholder(tf.float32, [None, 130])
W = tf.Variable(tf.zeros([130, 26]))
b = tf.Variable(tf.zeros([26]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 26])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(10):
batch_xs = train_data.iloc[:, 2:]
print(batch_xs)
batch_ys = getencodedbatch(train_data.iloc[:, 1])
print(batch_ys)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
However, I am getting an accuracy of 1, which shouldn't be the case.
The reason why I am getting it so is because my y tensor results with an array like
[nan, ..., nan]
Can anyone explain to me what is wrong in my code?
I converted each character to a one-hot encoding using the method below
def getencodedbatch(param):
s = (param.shape[0],26)
y_encode = np.zeros(s)
row=0
# print(y_encode)
for val in param:
col = ord(val)-97
y_encode[row, col] = 1
row += 1
return pd.DataFrame(y_encode)
Here is the problem you are having:
You set your initial weights and biases to 0 (this is wrong, as your
network does not learn).
The result is that y consists of all zeros
You take the log of y.. and a log of 0 is not defined... Hence the NaN.
Good luck!
Edit to tell you how to fix it: look for an example on classifying MNIST characters and see what they do. You probably want to initialise your weights to be random normals ;)

Loss not converging in Polynomial regression in Tensorflow

import numpy as np
import tensorflow as tf
#input data:
x_input=np.linspace(0,10,1000)
y_input=x_input+np.power(x_input,2)
#model parameters
W = tf.Variable(tf.random_normal([2,1]), name='weight')
#bias
b = tf.Variable(tf.random_normal([1]), name='bias')
#placeholders
#X=tf.placeholder(tf.float32,shape=(None,2))
X=tf.placeholder(tf.float32,shape=[None,2])
Y=tf.placeholder(tf.float32)
x_modified=np.zeros([1000,2])
x_modified[:,0]=x_input
x_modified[:,1]=np.power(x_input,2)
#model
#x_new=tf.constant([x_input,np.power(x_input,2)])
Y_pred=tf.add(tf.matmul(X,W),b)
#algortihm
loss = tf.reduce_mean(tf.square(Y_pred -Y ))
#training algorithm
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
#initializing the variables
init = tf.initialize_all_variables()
#starting the session session
sess = tf.Session()
sess.run(init)
epoch=100
for step in xrange(epoch):
# temp=x_input.reshape((1000,1))
#y_input=temp
_, c=sess.run([optimizer, loss], feed_dict={X: x_modified, Y: y_input})
if step%50==0 :
print c
print "Model paramters:"
print sess.run(W)
print "bias:%f" %sess.run(b)
I'm trying to implement Polynomial regression(quadratic) in Tensorflow. The loss isn't converging. Could anyone please help me out with this. The similar logic is working for linear regression though!
First there is a problem in your shapes, for Y_pred and Y:
Y has unknown shape, and is fed with an array of shape (1000,)
Y_pred has shape (1000, 1)
Y - Y_pred will then have shape (1000, 1000)
This small code will prove my point:
a = tf.zeros([1000]) # shape (1000,)
b = tf.zeros([1000, 1]) # shape (1000, 1)
print (a-b).get_shape() # prints (1000, 1000)
You should use consistent types:
y_input = y_input.reshape((1000, 1))
Y = tf.placeholder(tf.float32, shape=[None, 1])
Anyway, the loss is exploding because you have very high values (input between 0 and 100, you should normalize it) and thus very high loss (around 2000 at the beginning of training).
The gradient is very high and the parameters explode, and the loss gets to infinite.
The quickest fix is to lower your learning rate (1e-5 converges for me, albeit very slowly at the end). You can make it higher after the loss converges to around 1.