I am trying to implement backpropagation without using the GradientDescentOptimizer from TF; I want to update the weights and biases myself. The problem is that the mean squared error (cost) does not approach zero. It just stays at about 0.2xxx. Is it because of my inputs, which are 520x1600 (yes, each input has 1600 units and yes, there are 520 of them), or is the number of neurons in the hidden layer the problem? I have tried implementing this with GradientDescentOptimizer and minimize(cost), which works fine (the cost drops close to zero as training goes on), so maybe the issue is in my code for updating the weights and biases.
Here's my code:
import tensorflow as tf
import numpy as np
from BPInputs40 import pattern, desired;
#get the inputs and desired outputs, 520 inputs, each has 1600 units
train_in = pattern
train_out = desired
learning_rate=tf.constant(0.5)
num_input_neurons = len(train_in[0])
num_output_neurons = len(train_out[0])
num_hidden_neurons = 20
#weight matrix initialization with random values
w_h = tf.Variable(tf.random_normal([num_input_neurons, num_hidden_neurons]), dtype=tf.float32)
w_o = tf.Variable(tf.random_normal([num_hidden_neurons, num_output_neurons]), dtype=tf.float32)
b_h = tf.Variable(tf.random_normal([1, num_hidden_neurons]), dtype=tf.float32)
b_o = tf.Variable(tf.random_normal([1, num_output_neurons]), dtype=tf.float32)
# Model input and output
x = tf.placeholder("float")
y = tf.placeholder("float")
def sigmoid(v):
    return tf.div(tf.constant(1.0),tf.add(tf.constant(1.0),tf.exp(tf.negative(v*0.001))))
def derivative(v):
    return tf.multiply(sigmoid(v), tf.subtract(tf.constant(1.0), sigmoid(v)))
output_h = tf.sigmoid(tf.add(tf.matmul(x,w_h),b_h))
output_o = tf.sigmoid(tf.add(tf.matmul(output_h,w_o),b_o))
error = tf.subtract(output_o,y) #(1x35)
mse = tf.reduce_mean(tf.square(error))
delta_o=tf.multiply(error,derivative(output_o))
delta_b_o=delta_o
delta_w_o=tf.matmul(tf.transpose(output_h), delta_o)
delta_backprop=tf.matmul(delta_o,tf.transpose(w_o))
delta_h=tf.multiply(delta_backprop,derivative(output_h))
delta_b_h=delta_h
delta_w_h=tf.matmul(tf.transpose(x),delta_h)
#updating the weights
train = [
tf.assign(w_h, tf.subtract(w_h, tf.multiply(learning_rate, delta_w_h))),
tf.assign(b_h, tf.subtract(b_h, tf.multiply(learning_rate, tf.reduce_mean(delta_b_h, 0)))),
tf.assign(w_o, tf.subtract(w_o, tf.multiply(learning_rate, delta_w_o))),
tf.assign(b_o, tf.subtract(b_o, tf.multiply(learning_rate, tf.reduce_mean(delta_b_o, 0))))
]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
err,target=1, 0.005
epoch, max_epochs = 0, 2000000
while epoch < max_epochs:
    epoch += 1
    err, _ = sess.run([mse, train],{x:train_in,y:train_out})
    if (epoch%1000 == 0):
        print('Epoch:', epoch, '\nMSE:', err)
answer = tf.equal(tf.floor(output_o + 0.5), y)
accuracy = tf.reduce_mean(tf.cast(answer, "float"))
print(sess.run([output_o], feed_dict={x: train_in, y: train_out}));
print("Accuracy: ", (1-err) * 100 , "%");
Update: I got it working now. The MSE dropped to almost zero once I increased the number of neurons in the hidden layer. I tried 5200 and 6400 neurons for the hidden layer, and with just 5000 epochs the accuracy was almost 99%. Also, the largest learning rate I used was 0.1, because above that the MSE does not get close to zero.
I'm not an expert in this field, but it looks like your weights are updated correctly, and the fact that your MSE decreases from some higher value to 0.2xxx is a strong indicator of that. I would definitely try running this problem with far more hidden neurons (e.g. 500).
Btw, are your inputs normalized? If not, that obviously could be the reason
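A side note that may be worth checking in the question's code, offered as a guess rather than a diagnosis: derivative() is called on output_o and output_h, which are already sigmoid activations, and it passes them through the custom sigmoid() a second time (that helper also scales its argument by 0.001). For a plain logistic sigmoid, the derivative can be written directly in terms of the activation a as a * (1 - a). The sketch below is plain Python, not TensorFlow, it ignores the 0.001 scaling, and the helper names sigmoid_grad_from_output and numeric_grad are made up for illustration. It only compares the two expressions against a finite-difference estimate.

```python
import math

def sigmoid(z):
    """Plain logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad_from_output(a):
    """Derivative of the sigmoid, written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference estimate of df/dz (hypothetical helper)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

z = 2.0                      # arbitrary pre-activation value
a = sigmoid(z)               # what output_o / output_h hold in the question

analytic = sigmoid_grad_from_output(a)
numeric = numeric_grad(sigmoid, z)

# Re-applying the sigmoid to the activation before forming s * (1 - s),
# which is the pattern of derivative(output_o) in the question.
s = sigmoid(a)
reapplied = s * (1.0 - s)

print("a * (1 - a):        ", analytic)
print("finite difference:  ", numeric)
print("sigmoid re-applied: ", reapplied)
```

If the first two numbers agree and the third does not, the deltas in the manual update are scaled differently from the true gradient. That need not prevent learning, but it could explain different behaviour compared with minimize(cost).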
I'm trying to implement a Physics-Informed Neural Network (PINN). The differential part of the loss did bring some improvement (compared to the classical neural net) in the (supposedly) unknown area. This unknown area is actually known; I just removed it from the training and testing data sets to compare the performance of the PINN with other techniques. Here is the code I'm using:
model = tf.keras.Sequential([
    layers.Dense(units=64, activation='relu', input_shape=(2,)),
    layers.Dense(units=64, activation='relu'),
    layers.Dense(units=1,)
])
optimizer = tf.keras.optimizers.Adam()
objective = tf.keras.losses.Huber()
metric = tf.keras.metrics.MeanAbsoluteError()
w_phys = 0.5
w_loss = 1.0 - w_phys
with tf.device('gpu:0'):
    for epoch in range(epochs):
        cumulative_loss_train = 0.0
        metric.reset_states()
        for mini_batch, gdth in dataset:
            with tf.GradientTape(persistent=True) as tape:
                tape.watch(unknown_area_SOCP_tensor)
                tape.watch(mini_batch)
                # Physics loss
                predictions_unkwon = model(unknown_area_SOCP_tensor, training=True)
                d_f = tape.gradient(predictions_unkwon, unknown_area_SOCP_tensor)
                # Physics part with P #
                dp = tf.convert_to_tensor(1/((K*unknown_area_SOCP_tensor[:,0]+L)**2-4*R*unknown_area_SOCP_tensor[:,1]), dtype = np.float64)
                phys_loss_p = 10*tf.cast(tf.math.reduce_mean(tf.math.square(d_f[:,1]**2 - dp)), np.float32)
                # Traditionall loss #
                predictions = model(mini_batch, training=True)
                loss = objective(gdth, predictions)
            # Compute grads #
            grads = tape.gradient(w_loss*loss + w_phys*(phys_loss_p), model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            cumulative_loss_train += loss
            metric.update_state(gdth, predictions)
            del tape
So far so good. K, R and L were fixed parameters.
The next step was to assume they were unknown and see whether we could learn them.
I first tried focusing only on the R parameter. Here is the code I used:
with tf.device('gpu:0'):
    for epoch in range(epochs):
        cumulative_loss_train = 0.0
        metric.reset_states()
        for mini_batch, gdth in dataset:
            with tf.GradientTape(persistent=True) as tape:
                tape.watch(unknown_area_SOCP_tensor)
                tape.watch(mini_batch)
                tape.watch(R)
                # Physics loss
                predictions_unkwon = model(unknown_area_SOCP_tensor, training=True)
                d_f = tape.gradient(predictions_unkwon, unknown_area_SOCP_tensor)
                # Physics part with P #
                dp = tf.convert_to_tensor(1/((K*unknown_area_SOCP_tensor[:,0]+L)**2-4*R*unknown_area_SOCP_tensor[:,1]), dtype = np.float64)
                phys_loss_p = 10*tf.cast(tf.math.reduce_mean(tf.math.square(d_f[:,1]**2 - dp)), np.float32)
                # Traditionall loss #
                predictions = model(mini_batch, training=True)
                loss = objective(gdth, predictions)
            # Compute grads #
            grads = tape.gradient(w_loss*loss + w_phys*(phys_loss_p), model.trainable_variables + [R])
            optimizer.apply_gradients(zip(grads, model.trainable_variables + [R]))
            cumulative_loss_train += loss
            metric.update_state(gdth, predictions)
            del tape
But that led to terrible results (high loss and a poor metric). Worse, the value of R has to be positive, and at the end of training R was estimated as a negative value...
I'm quite confident in the equation since I have checked it many times, and it seems accurate compared to the simulation software I'm using. Moreover, the equation added value to the learning (predictions in the unknown area were much better).
Did I miss something here?
Thanks for your help !
I post my answer here in case this may help someone one day.
My issue came from gradient values that were too large. Clipping the gradients finally solved my issue.
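The answer does not say which kind of clipping was used. As one possible illustration, clipping by global norm rescales every gradient by a single common factor whenever their joint L2 norm exceeds a threshold, so the update direction is kept and only its length is limited. The sketch below is plain Python with made-up numbers; clip_by_global_norm here is a hypothetical stand-in written for this example, not the TensorFlow function of the same name.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradient vectors by one factor so their joint L2 norm is <= clip_norm.

    grads is a list of lists of floats, one inner list per variable.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= clip_norm:
        # Already small enough: return an unchanged copy.
        return [list(vec) for vec in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm

# Made-up gradients for two variables (for example network weights and R).
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

print("norm before:", norm_before)
print("norm after: ", norm_after)
print("clipped:    ", clipped)
```

In TensorFlow itself, tf.clip_by_global_norm can be applied to the list returned by tape.gradient before apply_gradients, and Keras optimizers accept clipnorm or clipvalue arguments; which of these matches what the answer did is not stated.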
I'm stepping through the code here: https://www.tensorflow.org/tutorials/text/nmt_with_attention
as a learning exercise, and I am confused about when the loss function is called and what is passed to it. I added two print statements in loss_function, and when the training loop runs, it only prints
(64,)
(64, 4935)
at the very start multiple times and then nothing again. I am confused on two fronts:
Why doesn't loss_function() get called repeatedly through the training loop and print the shapes? I expected that the loss function would get called at the end of each batch, which is of size 64.
I expected the shapes of the actuals to be (batch size, time steps) and the predictions to be (batch size, time steps, vocabulary size). It looks like the loss gets called separately for every time step (64 is the batch size and 4935 is the vocabulary size).
The relevant bits I believe are reproduced below.
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    print(real.shape)
    print(pred.shape)
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask #set padding entries to zero loss
    return tf.reduce_mean(loss_)
@tf.function
def train_step(inp, targ, enc_hidden):
    loss = 0
    with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden
        dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
        # Teacher forcing - feeding the target as the next input
        for t in range(1, targ.shape[1]):
            # passing enc_output to the decoder
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            print(targ[:, t])
            print(predictions)
            loss += loss_function(targ[:, t], predictions)
            # using teacher forcing
            dec_input = tf.expand_dims(targ[:, t], 1)
    batch_loss = (loss / int(targ.shape[1]))
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return batch_loss
EPOCHS = 10
for epoch in range(EPOCHS):
    start = time.time()
    enc_hidden = encoder.initialize_hidden_state()
    total_loss = 0
    for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        #print(batch)
        batch_loss = train_step(inp, targ, enc_hidden)
        total_loss += batch_loss
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.numpy()))
    # saving (checkpoint) the model every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)
    print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                        total_loss / steps_per_epoch))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
The loss is treated similarly to the rest of the graph. In TensorFlow, calls like tf.keras.layers.Dense and tf.nn.conv2d don't actually perform the operation; instead they define the graph for the operations. I have another post, "How do backpropagation works in tensorflow", that explains backprop and some of the motivation for this.
The loss function you have above is
def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    print(real.shape)
    print(pred.shape)
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask #set padding entries to zero loss
    result = tf.reduce_mean(loss_)
    return result
Think of this function as a generator that returns result. Result defines the graph to compute the loss. Perhaps a better name for this function would be loss_function_graph_creator ... but that's another story.
Result, which is a graph that contains the weights, the biases, and the information about how to do both the forward propagation and the back propagation, is all model.fit needs. It no longer needs this function, and it doesn't need to run the function on every loop iteration.
Truly, what is happening under the covers is that given your model (called my_model), the compile line
model.compile(loss=loss_function, optimizer='sgd')
is effectively the following lines
input = tf.keras.Input()
output = my_model(input)
loss = loss_function(input,output)
opt = tf.keras.optimizers.SGD()
gradient = opt.minimize(loss)
get_gradient_model = tf.keras.Model(input,gradient)
and there you have the gradient operation, which can be used in a loop to get the gradients; that is conceptually what model.fit does.
Q and A
Is the fact that this function: @tf.function def train_step(inp, targ, enc_hidden): has the tf.function decorator (and the loss function is called in it) what makes this code run as you describe and not as normal Python?
No. It is not 'normal' python. It only defines the flow of tensors through the graph of matrix operations that will (hopefully) run on your GPU. All the tensorflow operations just set up the operations on the GPU (or a simulated GPU if you don't have one).
How can I tell the actual shapes being passed into loss_function (the second part of my question)?
No problem at all... simply run this code
loss_function(y, y).shape
This will compute the loss function of your expected output compared exactly to the same output. The loss will (hopefully) be zero, but actually calculating the value of the loss wasn't the point. You want the shape and this will give it to you.
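On the shapes in the question: train_step calls loss_function once per time step inside its for loop, each time with targ[:, t] of shape (batch,) and predictions of shape (batch, vocabulary), which is consistent with the printed (64,) and (64, 4935). The following is a rough plain-Python sketch of that per-step masked loss with tiny made-up sizes. masked_step_loss is a hypothetical name, and the code mimics the arithmetic only, not the TensorFlow graph or its tracing behaviour.

```python
import math

def masked_step_loss(targets, logits, pad_id=0):
    """Cross-entropy for ONE time step.

    targets: list of length batch (token ids), like targ[:, t]
    logits:  batch x vocab nested list, like the decoder predictions
    Padded targets (id == pad_id) contribute zero loss, and the mean is
    taken over the whole batch, mirroring loss_ *= mask; reduce_mean(loss_).
    """
    losses = []
    for target, row in zip(targets, logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        cross_entropy = log_z - row[target]
        losses.append(0.0 if target == pad_id else cross_entropy)
    return sum(losses) / len(losses)

# Made-up batch: 2 sentences, 3 time steps, vocabulary of 4 tokens.
targ = [[1, 2, 0],   # last position is padding
        [1, 3, 2]]
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in targ]

total = 0.0
for t in range(1, len(targ[0])):          # skip position 0, as in train_step
    step_targets = [row[t] for row in targ]
    print("step", t, "targets shape:", (len(step_targets),),
          "logits shape:", (len(uniform_logits), len(uniform_logits[0])))
    total += masked_step_loss(step_targets, uniform_logits)

batch_loss = total / len(targ[0])
print("batch loss:", batch_loss)
```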
I am working on a triplet loss based model for this Kaggle competition.
Short description: in this competition, we have been challenged to build an algorithm to identify individual whales in images by analyzing a database containing more than 25,000 images, gathered from research institutions and public contributors.
https://www.kaggle.com/c/humpback-whale-identification?rvi=1
I have decided to use a Siamese network architecture and train it to give me encodings, which I can then use to calculate the distance between two pictures of whales. If this distance is below a particular threshold, the two pictures belong to the same whale; if it is greater, they aren't the same whale.
This is the triplet loss function I used (learnt from Andrew Ng's Deep Learning Specialization). I also normalized the encodings to make the loss more interpretable across different models (easier to choose the margin and the split point), if that makes sense. I first tried it without normalization, and when that didn't work I tried normalizing. I have also tried varying alpha (the margin) from 0.2 to 0.6.
from tensorflow.nn import l2_normalize as norm_l2
def triplet_loss(y_true, y_pred, alpha = 0.3):
    """
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    anchor, positive, negative = norm_l2(anchor), norm_l2(positive), norm_l2(negative)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,positive)), axis = -1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,negative)), axis = -1)
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss
This is an example of one of the model architectures I tried out. So far I have tried pretrained Facenet, ResNet, DenseNet and Xception, and I have tried freezing different numbers of layers in each.
R = tf.keras.applications.ResNet50(include_top=False, weights = 'imagenet', input_shape=(224,224,3))
lr = 0.0001
optimizer = Adam(learning_rate=lr)
R.compile(optimizer=optimizer, loss = triplet_loss)
for layer in R.layers[0:30]:
    layer.trainable = False
em_Rmodel = Sequential([
    R,
    GlobalAveragePooling2D(),
    #tf.keras.layers.GlobalMaxPooling2D(),
    Dense(512, activation='relu'),
    bn(),
    Dense(256, activation = 'sigmoid'),
    Dense(128, activation = 'sigmoid')
])
def make_tripletModel(model):
    #I was manually changing the input shape to fit the default shape of pretrained networks
    A = Input(shape = (224, 224, 3), name='anchor')
    P = Input(shape = (224, 224, 3), name = 'anchorPositive')
    N = Input(shape = (224, 224, 3), name = 'anchorNegative')
    enc_A = model(A)
    enc_P = model(P)
    enc_N = model(N)
    tripletModel = Model(inputs=[A, P, N], outputs=[enc_A, enc_P, enc_N])
    return tripletModel
tripletModel = make_tripletModel(em_Rmodel)
I have been training using semi-hard triplets and have also been augmenting the data to generate more training images.
This is the batch generator that I used for training. crop_batch is a function that crops images to show only the whale's tail, which is what one uses to identify whales. It uses a DenseNet trained on more than 1000 images of whale tails with their bounding boxes, and it does the job sufficiently well.
def batch_generator_RN(batch_size = batch_size, ishape = (256, 256, 3), model_input_shape = (224, 224, 3)):
    triplet_generator = get_triplets()
    y_val = np.zeros((batch_size, 2, 1))
    anchors = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    positives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    negatives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    while True:
        for i in range(batch_size):
            anchors[i], positives[i], negatives[i] = next(triplet_generator)
        anc = crop_batch(anchors, batch_size= batch_size, img_shape=model_input_shape)
        pos = crop_batch(positives, batch_size= batch_size, img_shape=model_input_shape)
        neg = crop_batch(negatives, batch_size= batch_size, img_shape=model_input_shape)
        x_data = {'anchor': anc,
                  'anchorPositive': pos,
                  'anchorNegative': neg
                  }
        yield (x_data, [y_val, y_val, y_val])
And finally, this is, in general, how I have been trying to train these models. I have tried reducing and increasing the learning rate, with batch_size = 16.
lr = 0.0001
optimizer = Adam(learning_rate=lr)
tripletModel.compile(optimizer = optimizer, loss = triplet_loss)
es = EarlyStopping(monitor='loss', patience=20, min_delta=0.05, restore_best_weights=True)
#mc = ModelCheckpoint('Rmodel.h5', monitor='loss', save_best_only=True, save_weights_only=True)
rlr = ReduceLROnPlateau(monitor='loss',min_delta=0.05,factor = 0.1,patience = 5, verbose = 1, min_lr = 0)
gen = batch_generator(batch_size)
tripletModel.fit(gen, steps_per_epoch=64, epochs = 40, callbacks=[es, rlr])
After training all these models, in some of them the triplet loss does go down for a while but then plateaus, and the model learns nothing meaningful (that is, just by looking at the distance between two embeddings I can't figure out whether they are the same whale or not). In other models, immediately after the first or second epoch the weights converge and don't change at all, and the model doesn't learn anything.
I have tried a very wide range of learning rates and I am pretty sure that isn't the problem.
Please tell me if I should add all the code files for you to understand the problem better. I haven't done it yet because I haven't cleaned them up, but I will gladly do so if required. Thanks.
When you say that it doesn't learn anything, do you mean that the loss reaches a plateau and stops decreasing, or that it does decrease significantly but, when you predict, the embeddings of both same and different whales are similar in value?
The triplet_loss() and batch_generator_RN() functions are correct; the problem is not related to the data generation.
However, I suspect that your learning rate is too high while you freeze a lot of layers, i.e. numerous trainable parameters are frozen, which may lead to your network being unable to converge.
My suggestion is to unfreeze all the layers and decrease the learning rate to 0.00001 and start training again, regardless of the architecture that you use (Xception/ResNet etc.)
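One way to tell the two cases in the question above apart, sketched here under assumptions: if the network maps every image to (nearly) the same embedding, both distances are zero and each triplet contributes exactly alpha, so a loss stuck near alpha times the number of triplets summed per batch would point to collapsed embeddings rather than a learning-rate issue. The plain-Python sketch below uses made-up 2-D embeddings and normalizes each embedding on its own; it is not the asker's model. As far as I know, tf.nn.l2_normalize without an axis argument normalizes over the whole tensor rather than per row, so that may be worth checking as well.

```python
import math

def l2_normalize(v):
    """Scale one embedding vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, alpha=0.3):
    """Triplet loss for a single (anchor, positive, negative) triple."""
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    return max(sq_dist(a, p) - sq_dist(a, n) + alpha, 0.0)

# Collapsed case: the network outputs the same embedding for every image.
collapsed = triplet_loss([1.0, 2.0], [1.0, 2.0], [1.0, 2.0])

# Well-separated case: positive equals the anchor, negative is orthogonal.
separated = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

print("collapsed embeddings ->", collapsed)   # equals alpha
print("separated embeddings ->", separated)   # margin satisfied, zero loss
```

With unit-length embeddings the squared distance lies between 0 and 4, which also bounds the per-triplet loss by 4 + alpha.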
I have an issue while using AUC from the TensorFlow library. I train my model (a convolutional neural network) per batch (I do not use a validation set), and after each epoch I use an independent test set to obtain my evaluations. The problem lies in the AUC evaluation.
In each batch I calculate AUC/Accuracy/Loss/Precision/Recall/F1_score for the training set and then I aggregate the mean of these scores. When I do the same for the test set, I again calculate the same scores. I notice that all scores except AUC have different values. I don't think it is right for the test loss to increase while the AUC increases as well. The problem is that the test AUC is almost identical to the training AUC (even though the accuracy and loss are completely different).
with tf.name_scope("output"):
    W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    scores = tf.nn.xw_plus_b(h_drop, W, b, name="scores")
    predictions = tf.argmax(scores, 1, name="predictions")
    l2_loss += tf.nn.l2_loss(W, name="l2_loss")
    l2_loss += tf.nn.l2_loss(b, name="l2_loss")
    tf.summary.histogram("l2", l2_loss)
    tf.summary.histogram("weigths", W)
    tf.summary.histogram("biases", b)
with tf.name_scope("auc_score"):
    # labelOut = tf.argmax(y_place_holder, 1)
    probability = tf.nn.softmax(scores)
    # auc_scoreTemp = streaming_auc(y_place_holder, probability, curve="PR")
    auc_scoreTemp = tf.metrics.auc(y_place_holder, probability, curve="PR")
    auc_score = tf.reduce_mean(tf.cast(auc_scoreTemp, tf.float32), name="auc_score")
    tf.summary.scalar("auc_score", auc_score)
with tf.name_scope("accuracy"):
    labelOut = tf.argmax(y_place_holder, 1)
    correct_prediction = tf.equal(predictions, tf.argmax(y_place_holder, 1), name="correct_prediction")
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
    tf.summary.scalar("accuracy", accuracy)
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
for batch in batches:
    x_batch, y_batch = list(zip(*batch))
    _, accuracy_train, auc_training, loss_train, prec_batch, recall_batch, f1_batch \
        = sess.run([train_step, accuracy, auc_score, cross_entropy, precision_mini_batch,
                    recall_mini_batch, f1_score_min_batch], feed_dict={x_place_holder: x_batch,
                                                                       y_place_holder: y_batch,
                                                                       emb_place_holder: vocab_inv_emb_dset,
                                                                       dropout_keep_prob: dropout_rate})
...
for test_batch in test_batches:
    auc_test = None
    x_test_batch, y_test_batch = list(zip(*test_batch))
    accuracy_test, loss_test, auc_test = sess.run([accuracy, cross_entropy, auc_score],
                                                  feed_dict={x_place_holder: x_test_batch,
                                                             y_place_holder: y_test_batch,
                                                             emb_place_holder: vocab_inv_emb_dset_val,
                                                             dropout_keep_prob: 1.0})
I also tried to use streaming_auc which returns always 1.
EDIT
In the end of every epoch I reset the local variables by running:
sess.run(tf.local_variables_initializer())
But the first batch outputs really bad results. After the first batch I get normal results from test set which are not close to the training results. I don't know if this is the correct way to do it but results seem more realistic this way.
All of the tf.metrics return a value and an updating op (see here). So as described here you want to use the updating op to accumulate values and then evaluate auc_score to retrieve the accumulated value, something like this:
...
auc_score, auc_op = tf.metrics.auc(y_place_holder, probability, curve="PR")
...
for batch in batches:
    sess.run([train_step, accuracy, auc_op, cross_entropy, ...])
...
py_auc = sess.run(auc_score)
EDIT -- toy example showing tf.metrics.auc and tf.contrib.metrics.streaming_auc
import tensorflow as tf
from tensorflow.contrib import metrics
batch_sz = 100
noise_mag = 0.5
nloop = 10
tf.set_random_seed(0)
batch_x = tf.random_uniform([batch_sz, 1], 0, 2, dtype=tf.int32)
noise = noise_mag * tf.random_normal([batch_sz, 1])
batch_y = tf.sigmoid(tf.to_float(batch_x) + noise)
auc_val, auc_accum = tf.metrics.auc(batch_x, batch_y)
#note: contrib.metrics.streaming_auc reverses labels, predictions
auc_val2, auc_accum2 = metrics.streaming_auc(batch_y, batch_x)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    for i in range(nloop):
        _ = sess.run([auc_accum, auc_accum2])
    auc, auc2 = sess.run([auc_val, auc_val2])
    print('Accumulated AUC = ', auc) #0.9238014
    print('Accumulated AUC2 = ', auc2)
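The value / update-op split described in this answer can be pictured with a small plain-Python accumulator. This is an analogy only: StreamingAccuracy is a made-up class, it tracks accuracy instead of AUC to stay short, and it is not how TensorFlow implements its metrics. It does show why a metric that is never reset between the training pass and the test pass reports a blend of both, which may be related to the nearly identical train and test AUC in the question.

```python
class StreamingAccuracy:
    """Toy stand-in for a streaming metric: state + update() + result()."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Plays the role of re-running tf.local_variables_initializer().
        self.correct = 0
        self.total = 0

    def update(self, labels, predictions):
        # Plays the role of the update op: accumulate counts, return nothing.
        self.correct += sum(1 for l, p in zip(labels, predictions) if l == p)
        self.total += len(labels)

    def result(self):
        # Plays the role of the value tensor: read the accumulated state.
        return self.correct / self.total if self.total else 0.0


metric = StreamingAccuracy()

# "Training" batch: 3 of 4 predictions correct.
metric.update([1, 0, 1, 1], [1, 0, 0, 1])
train_value = metric.result()

# "Test" batch WITHOUT a reset: the training counts are still included.
metric.update([0, 0], [1, 1])
mixed_value = metric.result()

# Same test batch AFTER a reset: only the test counts remain.
metric.reset()
metric.update([0, 0], [1, 1])
test_value = metric.result()

print(train_value, mixed_value, test_value)
```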
import numpy as np
import tensorflow as tf
#input data:
x_input=np.linspace(0,10,1000)
y_input=x_input+np.power(x_input,2)
#model parameters
W = tf.Variable(tf.random_normal([2,1]), name='weight')
#bias
b = tf.Variable(tf.random_normal([1]), name='bias')
#placeholders
#X=tf.placeholder(tf.float32,shape=(None,2))
X=tf.placeholder(tf.float32,shape=[None,2])
Y=tf.placeholder(tf.float32)
x_modified=np.zeros([1000,2])
x_modified[:,0]=x_input
x_modified[:,1]=np.power(x_input,2)
#model
#x_new=tf.constant([x_input,np.power(x_input,2)])
Y_pred=tf.add(tf.matmul(X,W),b)
#algortihm
loss = tf.reduce_mean(tf.square(Y_pred -Y ))
#training algorithm
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
#initializing the variables
init = tf.initialize_all_variables()
#starting the session session
sess = tf.Session()
sess.run(init)
epoch=100
for step in xrange(epoch):
    # temp=x_input.reshape((1000,1))
    #y_input=temp
    _, c=sess.run([optimizer, loss], feed_dict={X: x_modified, Y: y_input})
    if step%50==0 :
        print c
print "Model parameters:"
print sess.run(W)
print "bias:%f" %sess.run(b)
I'm trying to implement polynomial (quadratic) regression in TensorFlow. The loss isn't converging. Could anyone please help me out with this? Similar logic works for linear regression, though!
First there is a problem in your shapes, for Y_pred and Y:
Y has unknown shape, and is fed with an array of shape (1000,)
Y_pred has shape (1000, 1)
Y - Y_pred will then have shape (1000, 1000)
This small code will prove my point:
a = tf.zeros([1000]) # shape (1000,)
b = tf.zeros([1000, 1]) # shape (1000, 1)
print((a-b).get_shape()) # prints (1000, 1000)
You should use consistent shapes:
y_input = y_input.reshape((1000, 1))
Y = tf.placeholder(tf.float32, shape=[None, 1])
Anyway, the loss is exploding because you have very large input values (the features range between 0 and 100; you should normalize them) and thus a very high loss (around 2000 at the beginning of training).
The gradient is very large, so the parameters explode and the loss goes to infinity.
The quickest fix is to lower your learning rate (1e-5 converges for me, albeit very slowly at the end). You can raise it after the loss has converged to around 1.
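The learning-rate effect described above can be reproduced without TensorFlow. The sketch below is plain Python under stated assumptions: it starts from zero weights instead of random ones, runs only 20 full-batch steps, and fit is a hypothetical helper written for this example. It uses the same data as the question (x in [0, 10], features x and x^2, target x + x^2) and compares a step size of 0.01 with 1e-5.

```python
def fit(lr, steps, n=1000):
    """Full-batch gradient descent on y = w1*x + w2*x^2 + b with an MSE loss.

    Returns the loss recorded before each update.
    """
    xs = [10.0 * i / (n - 1) for i in range(n)]      # like np.linspace(0, 10, n)
    ys = [x + x * x for x in xs]                     # target from the question
    w1 = w2 = b = 0.0                                # assumption: zero init
    history = []
    for _ in range(steps):
        g1 = g2 = gb = loss = 0.0
        for x, y in zip(xs, ys):
            err = w1 * x + w2 * x * x + b - y
            loss += err * err
            g1 += 2.0 * err * x                      # d(err^2)/dw1
            g2 += 2.0 * err * x * x                  # d(err^2)/dw2
            gb += 2.0 * err                          # d(err^2)/db
        history.append(loss / n)
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return history

diverging = fit(lr=0.01, steps=20)
converging = fit(lr=1e-5, steps=20)

print("lr=0.01 : first loss %.3g, last loss %.3g" % (diverging[0], diverging[-1]))
print("lr=1e-5 : first loss %.3g, last loss %.3g" % (converging[0], converging[-1]))
```

The x^2 feature reaches 100, so the loss surface is very steep along that direction and only a small step size keeps the iteration stable. Rescaling the features to a comparable range would be expected to allow a much larger learning rate.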