Softmax logistic regression: Different performance by scikit-learn and TensorFlow - tensorflow

I'm trying to learn a simple linear softmax model on some data. The LogisticRegression in scikit-learn seems to work fine, and now I am trying to port the code to TensorFlow, but I'm not getting the same performance, but quite a bit worse. I understand that the results will not be exactly equal (scikit learn has regularization params etc), but it's too far off.
total = pd.read_feather('testfile.feather')
labels = total['labels']
features = total[['f1', 'f2']]
print(labels.shape)
print(features.shape)
classifier = linear_model.LogisticRegression(C=1e5, solver='newton-cg', multi_class='multinomial')
classifier.fit(features, labels)
pred_labels = classifier.predict(features)
print("SCI-KITLEARN RESULTS: ")
print('\tAccuracy:', classifier.score(features, labels))
print('\tPrecision:', precision_score(labels, pred_labels, average='macro'))
print('\tRecall:', recall_score(labels, pred_labels, average='macro'))
print('\tF1:', f1_score(labels, pred_labels, average='macro'))
# now try softmax regression with tensorflow
print("\n\nTENSORFLOW RESULTS: ")
## By default, the OneHotEncoder class will return a more efficient sparse encoding.
## This may not be suitable for some applications, such as use with the Keras deep learning library.
## In this case, we disabled the sparse return type by setting the sparse=False argument.
enc = OneHotEncoder(sparse=False)
enc.fit(labels.values.reshape(len(labels), 1)) # Reshape is required as Encoder expect 2D data as input
labels_one_hot = enc.transform(labels.values.reshape(len(labels), 1))
# tf Graph Input
x = tf.placeholder(tf.float32, [None, 2]) # 2 input features
y = tf.placeholder(tf.float32, [None, 5]) # 5 output classes
# Set model weights
W = tf.Variable(tf.zeros([2, 5]))
b = tf.Variable(tf.zeros([5]))
# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
clas = tf.argmax(pred, axis=1)
# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
# Run the initializer
sess.run(init)
# Training cycle
for epoch in range(1000):
# Run optimization op (backprop) and cost op (to get loss value)
_, c = sess.run([optimizer, cost], feed_dict={x: features, y: labels_one_hot})
# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
class_out = clas.eval({x: features})
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\tAccuracy:", accuracy.eval({x: features, y: labels_one_hot}))
print('\tPrecision:', precision_score(labels, class_out, average='macro'))
print('\tRecall:', recall_score(labels, class_out, average='macro'))
print('\tF1:', f1_score(labels, class_out, average='macro'))
The output of this code is
(1681,)
(1681, 2)
SCI-KITLEARN RESULTS:
Accuracy: 0.822129684711
Precision: 0.837883361162
Recall: 0.784522522208
F1: 0.806251963817
TENSORFLOW RESULTS:
Accuracy: 0.694825
Precision: 0.735883666192
Recall: 0.649145125846
F1: 0.678045562185
I inspected the result of the one-hot-encoding, and the data, but I have no idea why the result in TF is much worse.
Any suggestion would be really appreciated..

The problem turned out to be silly, I just needed more epochs, a smaller learning rate (and for efficiency I turned to AdamOptimizer, results are now equal, although the TF implementation is much slower.
(1681,)
(1681, 2)
SCI-KITLEARN RESULTS:
Accuracy: 0.822129684711
Precision: 0.837883361162
Recall: 0.784522522208
F1: 0.806251963817
TENSORFLOW RESULTS:
Accuracy: 0.82213
Precision: 0.837883361162
Recall: 0.784522522208
F1: 0.806251963817

Related

In tensorflow 1, when the loss function is defined with operations on Tensors, is the model really trained?

First, I m sorry but it's not possible to reproduce this problem on a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.repeat(nb_epochs).batch(batch_size)
iterator = dataset.make_one_shot_iterator()
yy = iterator.get_next()
return tf.cast(yy, tf.float32)
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
y_pred = complex_model.autoencode(train)
y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
nb_epochs = 10
batch_size = 64
y_real = return_iterator(train, nb_epochs, batch_size)
y_pred = return_iterator(y_pred, nb_epochs, batch_size)
res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1,2,3])
loss = 1 - tf.reduce_sum(res_equal, axis=0)
opt = tf.train.AdamOptimizer().minimize(loss)
tf.global_variables_initializer().run()
for epoch in range(0, nb_epochs):
_, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum , and these operations only accept Tensors as input.
My question is: with this code, will the complex_model autoencoder be trained during the training ? (eventhough here, it's just used to output the predictions to compute the loss)
Thank you
p.s: I am using TF1.15 (and I cannot use another version)

Tensorflow reporting wrong AUC

I have an issue while using AUC from tensorflow library. I train my model (convolutional neural network) per batch ( i do not use a validation set) and after each epoch I use an independent test set to obtain my evaluations. The problem lies within AUC evaluation.
In each batch I calculate AUC/Accuracy/Loss/Precision/Recall/F1_score for the training set and then I aggregate the mean of these scores. When I try to do the same for the test set I again calculate the same scores. I notice that all scores except AUC have different values. I think it is not correct test's loss function to increase and AUC to increase as well. And the problem is that test's AUC is almost identical to training's AUC (even though their accuracy, loss error are completely different).
with tf.name_scope("output"):
W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
scores = tf.nn.xw_plus_b(h_drop, W, b, name="scores")
predictions = tf.argmax(scores, 1, name="predictions")
l2_loss += tf.nn.l2_loss(W, name="l2_loss")
l2_loss += tf.nn.l2_loss(b, name="l2_loss")
tf.summary.histogram("l2", l2_loss)
tf.summary.histogram("weigths", W)
tf.summary.histogram("biases", b)
with tf.name_scope("auc_score"):
# labelOut = tf.argmax(y_place_holder, 1)
probability = tf.nn.softmax(scores)
# auc_scoreTemp = streaming_auc(y_place_holder, probability, curve="PR")
auc_scoreTemp = tf.metrics.auc(y_place_holder, probability, curve="PR")
auc_score = tf.reduce_mean(tf.cast(auc_scoreTemp, tf.float32), name="auc_score")
tf.summary.scalar("auc_score", auc_score)
with tf.name_scope("accuracy"):
labelOut = tf.argmax(y_place_holder, 1)
correct_prediction = tf.equal(predictions, tf.argmax(y_place_holder, 1), name="correct_prediction")
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
tf.summary.scalar("accuracy", accuracy)
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
for batch in batches:
x_batch, y_batch = list(zip(*batch))
_, accuracy_train, auc_training, loss_train, prec_batch, recall_batch, f1_batch \
= sess.run([train_step, accuracy, auc_score, cross_entropy, precision_mini_batch,
recall_mini_batch, f1_score_min_batch], feed_dict={x_place_holder: x_batch,
y_place_holder: y_batch,
emb_place_holder: vocab_inv_emb_dset,
dropout_keep_prob: dropout_rate})
...
for test_batch in test_batches:
auc_test = None
x_test_batch, y_test_batch = list(zip(*test_batch))
accuracy_test, loss_test, auc_test = sess.run([accuracy, cross_entropy, auc_score],
feed_dict={x_place_holder: x_test_batch,
y_place_holder: y_test_batch,
emb_place_holder: vocab_inv_emb_dset_val,
dropout_keep_prob: 1.0})
I also tried to use streaming_auc which returns always 1.
EDIT
In the end of every epoch I reset the local variables by running:
sess.run(tf.local_variables_initializer())
But the first batch outputs really bad results. After the first batch I get normal results from test set which are not close to the training results. I don't know if this is the correct way to do it but results seem more realistic this way.
All of the tf.metrics return a value and an updating op (see here). So as described here you want to use the updating op to accumulate values and then evaluate auc_score to retrieve the accumulated value, something like this:
...
auc_score, auc_op = tf.metrics.auc(y_place_holder, probability, curve="PR")
...
for batch in batches:
sess.run([train_step, accuracy, auc_op, cross_entropy,...)
...
py_auc = sess.run(auc)
EDIT -- toy example showing tf.metrics.auc and tf.contrib.metrics.streaming_auc
import tensorflow as tf
from tensorflow.contrib import metrics
batch_sz = 100
noise_mag = 0.5
nloop = 10
tf.set_random_seed(0)
batch_x = tf.random_uniform([batch_sz, 1], 0, 2, dtype=tf.int32)
noise = noise_mag * tf.random_normal([batch_sz, 1])
batch_y = tf.sigmoid(tf.to_float(batch_x) + noise)
auc_val, auc_accum = tf.metrics.auc(batch_x, batch_y)
#note: contrib.metrics.streaming_auc reverses labels, predictions
auc_val2, auc_accum2 = metrics.streaming_auc(batch_y, batch_x)
with tf.Session() as sess:
sess.run(tf.local_variables_initializer())
for i in range(nloop):
_ = sess.run([auc_accum, auc_accum2])
auc, auc2 = sess.run([auc_val, auc_val2])
print('Accumulated AUC = ', sess.run(auc_val)) #0.9238014
print('Accumulated AUC2 = ', sess.run(auc_val)) #0.9238014

How to properly use tf.metrics.accuracy?

I have some trouble using the accuracy function from tf.metrics for a multiple classification problem with logits as input.
My model output looks like:
logits = [[0.1, 0.5, 0.4],
[0.8, 0.1, 0.1],
[0.6, 0.3, 0.2]]
And my labels are one hot encoded vectors:
labels = [[0, 1, 0],
[1, 0, 0],
[0, 0, 1]]
When I try to do something like tf.metrics.accuracy(labels, logits) it never gives the correct result. I am obviously doing something wrong but I can't figure what it is.
TL;DR
The accuracy function tf.metrics.accuracy calculates how often predictions matches labels based on two local variables it creates: total and count, that are used to compute the frequency with which logits matches labels.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
predictions=tf.argmax(logits,1))
print(sess.run([acc, acc_op]))
print(sess.run([acc]))
# Output
#[0.0, 0.66666669]
#[0.66666669]
acc (accuracy): simply returns the metrics using total and count, doesnt update the metrics.
acc_op (update up): updates the metrics.
To understand why the acc returns 0.0, go through the details below.
Details using a simple example:
logits = tf.placeholder(tf.int64, [2,3])
labels = tf.Variable([[0, 1, 0], [1, 0, 1]])
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
predictions=tf.argmax(logits,1))
Initialize the variables:
Since metrics.accuracy creates two local variables total and count, we need to call local_variables_initializer() to initialize them.
sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
#[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
# <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
Understanding update ops and accuracy calculation:
print('acc:',sess.run(acc, {logits:[[0,1,0],[1,0,1]]}))
#acc: 0.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [0.0, 0.0]
The above returns 0.0 for accuracy as total and count are zeros, inspite of giving matching inputs.
print('ops:', sess.run(acc_op, {logits:[[0,1,0],[1,0,1]]}))
#ops: 1.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [2.0, 2.0]
With the new inputs, the accuracy is calculated when the update op is called. Note: since all the logits and labels match, we get accuracy of 1.0 and the local variables total and count actually give total correctly predicted and the total comparisons made.
Now we call accuracy with the new inputs (not the update ops):
print('acc:', sess.run(acc,{logits:[[1,0,0],[0,1,0]]}))
#acc: 1.0
Accuracy call doesnt update the metrics with the new inputs, it just returns the value using the two local variables. Note: the logits and labels dont match in this case. Now calling update ops again:
print('op:',sess.run(acc_op,{logits:[[0,1,0],[0,1,0]]}))
#op: 0.75
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [3.0, 4.0]
The metrics are updated to new inputs
For more information on how to use the metrics during training and how to reset them during validation, can be found here.
On TF 2.0, if you are using the tf.keras API, you can define a custom class myAccuracy which inherits from tf.keras.metrics.Accuracy, and overrides the update method like this:
# imports
# ...
class myAccuracy(tf.keras.metrics.Accuracy):
def update_state(self, y_true, y_pred, sample_weight=None):
y_true = tf.argmax(y_true,1)
y_pred = tf.argmax(y_pred,1)
return super(myAccuracy,self).update_state(y_true,y_pred,sample_weight)
Then, when compiling the model you can add metrics in the usual way.
from my_awesome_models import discriminador
discriminador.compile(tf.keras.optimizers.Adam(),
loss=tf.nn.softmax_cross_entropy_with_logits,
metrics=[myAccuracy()])
from my_puzzling_datasets import train_dataset,test_dataset
discriminador.fit(train_dataset.shuffle(70000).repeat().batch(1000),
epochs=1,steps_per_epoch=1,
validation_data=test_dataset.shuffle(70000).batch(1000),
validation_steps=1)
# Train for 1 steps, validate for 1 steps
# 1/1 [==============================] - 3s 3s/step - loss: 0.1502 - accuracy: 0.9490 - val_loss: 0.1374 - val_accuracy: 0.9550
Or evaluate yout model over the whole dataset
discriminador.evaluate(test_dataset.batch(TST_DSET_LENGTH))
#> [0.131587415933609, 0.95354694]
Applied on a cnn you can write:
x_len=24*24
y_len=2
x = tf.placeholder(tf.float32, shape=[None, x_len], name='input')
fc1 = ... # cnn's fully connected layer
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
layer_fc_dropout = tf.nn.dropout(fc1, keep_prob, name='dropout')
y_pred = tf.nn.softmax(fc1, name='output')
logits = tf.argmax(y_pred, axis=1)
y_true = tf.placeholder(tf.float32, shape=[None, y_len], name='y_true')
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(y_true, axis=1), predictions=tf.argmax(y_pred, 1))
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
def print_accuracy(x_data, y_data, dropout=1.0):
accuracy = sess.run(acc_op, feed_dict = {y_true: y_data, x: x_data, keep_prob: dropout})
print('Accuracy: ', accuracy)
Extending the answer to TF2.0, the tutorial here explains clearly how to use tf.metrics for accuracy and loss.
https://www.tensorflow.org/beta/tutorials/quickstart/advanced
Notice that it mentions that the metrics are reset after each epoch :
train_loss.reset_states()
train_accuracy.reset_states()
test_loss.reset_states()
test_accuracy.reset_states()
When label and predictions are one-hot-coded
def train_step(features, labels):
with tf.GradientTape() as tape:
prediction = model(features)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))
gradients = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
train_loss(loss)
train_accuracy(tf.argmax(labels, 1), tf.argmax(predictions, 1))
Here how I use it:
test_accuracy = tf.keras.metrics.Accuracy()
# use dataset api or normal dataset from lists/np arrays
ds_test_batch = zip(x_test,y_test)
predicted_classes = np.array([])
for (x, y) in ds_test_batch:
# training=False is needed only if there are layers with different
# behaviour during training versus inference (e.g. Dropout).
#Ajust the input similar to your input during the training
logits = model(x.reshape(1,-1), training=False )
prediction = tf.argmax(logits, axis=1, output_type=tf.int64)
predicted_classes = np.concatenate([predicted_classes,prediction.numpy()])
test_accuracy(prediction, y)
print("Test set accuracy: {:.3%}".format(test_accuracy.result()))

Tensorflow Dropout implementation, test accuracy = train accuracy and low, why?

I have tried dropout implementation in Tensorflow.
I do know that dropout should be declared as a placeholder and keep_prob parameter during training and testing should be different. However still almost broke my brain trying to find why with dropout the accuracy is so low. When keep_drop = 1, the train accuracy 99%, test accuracy 85%, with keep_drop = 0.5, both train and test accuracy is 16% Any ideas where to look into, anyone? Thank you!
def forward_propagation(X, parameters, keep_prob):
"""
Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
Arguments:
X -- input dataset placeholder, of shape (input size, number of examples)
parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
the shapes are given in initialize_parameters
Returns:
Z3 -- the output of the last LINEAR unit
"""
# Retrieve the parameters from the dictionary "parameters"
W1 = parameters['W1']
b1 = parameters['b1']
W2 = parameters['W2']
b2 = parameters['b2']
W3 = parameters['W3']
b3 = parameters['b3']
Z1 = tf.add(tf.matmul(W1,X),b1) # Z1 = np.dot(W1, X) + b1
A1 = tf.nn.relu(Z1) # A1 = relu(Z1)
A1 = tf.nn.dropout(A1,keep_prob) # apply dropout
Z2 = tf.add(tf.matmul(W2,A1),b2) # Z2 = np.dot(W2, a1) + b2
A2 = tf.nn.relu(Z2) # A2 = relu(Z2)
A2 = tf.nn.dropout(A2,keep_prob) # apply dropout
Z3 = tf.add(tf.matmul(W3,A2),b3) # Z3 = np.dot(W3,A2) + b3
return Z3
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001, lambd = 0.03, train_keep_prob = 0.5,
num_epochs = 800, minibatch_size = 32, print_cost = True):
"""
Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
Arguments:
X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
Y_train -- test set, of shape (output size = 6, number of training examples = 1080)
X_test -- training set, of shape (input size = 12288, number of training examples = 120)
Y_test -- test set, of shape (output size = 6, number of test examples = 120)
learning_rate -- learning rate of the optimization
lambd -- L2 regularization hyperparameter
train_keep_prob -- probability of keeping a neuron in hidden layer for dropout implementation
num_epochs -- number of epochs of the optimization loop
minibatch_size -- size of a minibatch
print_cost -- True to print the cost every 100 epochs
Returns:
parameters -- parameters learnt by the model. They can then be used to predict.
"""
ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
tf.set_random_seed(1) # to keep consistent results
seed = 3 # to keep consistent results
(n_x, m) = X_train.shape # (n_x: input size, m : number of examples in the train set)
n_y = Y_train.shape[0] # n_y : output size
costs = [] # To keep track of the cost
# Create Placeholders of shape (n_x, n_y)
X, Y = create_placeholders(n_x, n_y)
keep_prob = tf.placeholder(tf.float32)
# Initialize parameters
parameters = initialize_parameters()
# Forward propagation: Build the forward propagation in the tensorflow graph
Z3 = forward_propagation(X, parameters, keep_prob)
# Cost function: Add cost function to tensorflow graph
cost = compute_cost(Z3, Y, parameters, lambd)
# Backpropagation.
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
# Initialize all the variables
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
# Run the initialization
sess.run(init)
# Do the training loop
for epoch in range(num_epochs):
epoch_cost = 0. # Defines a cost related to an epoch
num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
seed = seed + 1
minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
for minibatch in minibatches:
# Select a minibatch
(minibatch_X, minibatch_Y) = minibatch
# IMPORTANT: The line that runs the graph on a minibatch.
# Run the session to execute the "optimizer" and the "cost", the feedict should contain a minibatch for (X,Y).
_ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y, keep_prob: train_keep_prob})
epoch_cost += minibatch_cost / num_minibatches
# Print the cost every epoch
if print_cost == True and epoch % 100 == 0:
print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
if print_cost == True and epoch % 5 == 0:
costs.append(epoch_cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per tens)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
# lets save the parameters in a variable
parameters = sess.run(parameters)
print ("Parameters have been trained!")
# Calculate the correct predictions
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
# Calculate accuracy on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, keep_prob: 1.0}))
print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, keep_prob: 1.0}))
return parameters
The algo is correct. It is just the keep_prob = 0.5 is too low.
Managed to get 87% accuracy on the test set with the following hyperparameters:
learning_rate = 0.00002, lambd = 0.03, train_keep_prob = 0.90, num_epochs = 1500, minibatch_size = 32,
In the first case your model was overfitting to the data, hence the large difference between the train and test accuracy. Dropout is a regularization technique to reduce the variance of the model by reducing the effect of particular nodes and hence prevent overfitting. But keeping the keep_prob = 0.5(too low) weakens the model and hence it underfits severely to the data, giving an accuracy as low as 16%. You should iterate by gradually decreasing the keep_prob value untill you find a suitable value.

Tensorflow loss minimization is increasing loss

I implemented the linear regression model shown on Tensorflow's main page: https://www.tensorflow.org/get_started/get_started
import numpy as np
import tensorflow as tf
# Model parameters
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
sess.run(train, {x:x_train, y:y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
However, when I change the training data to x_train=[2,4,6,8] and y_train=[3,4,5,6],
the loss starts to increase over time until it reaches 'nan'
As suggested by Steven, you should probably use reduce_mean(), which seems to fix the problem of the increasing loss function. Note that I also increased the number of training steps since reduce_mean() appears to need a bit longer to converge. Be careful with increasing the learning rate, since this may reproduce the problem. Instead, if training time is not a critical factor, you might want to decrease the learning rate and increase the number of training iterations further.
With the reduce_sum() function it worked well for me after decreasing the learning rate from 0.01 to 0.001. Again, thanks to Steven for the suggestion.
import numpy as np
import tensorflow as tf
# Model parameters
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_mean(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [2,4,6,8]
y_train = [0,3,4,5]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(5000):
sess.run(train, {x:x_train, y:y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))