SGD converges but batch learning does not, simple regression in tensorflow - tensorflow

I have run into an issue where batch learning in tensorflow fails to converge to the correct solution for a simple convex optimization problem, whereas SGD converges. A small example is found below, in the Julia and python programming languages, I have verified that the same exact behaviour results from using tensorflow from both Julia and python.
I'm trying to fit the linear model y = s*W + B with parameters W and B
The cost function is quadratic, so the problem is convex and should be easily solved using a small enough step size. If I feed all data at once, the end result is just a prediction of the mean of y. If, however, I feed one datapoint at the time (commented code in julia version), the optimization converges to the correct parameters very fast.
I have also verified that the gradients computed by tensorflow differs between the batch example and summing up the gradients for each datapoint individually.
Any ideas on where I have failed?
using TensorFlow
s = linspace(1,10,10)
s = [s reverse(s)]
y = s*[1,4] + 2
session = Session(Graph())
s_ = placeholder(Float32, shape=[-1,2])
y_ = placeholder(Float32, shape=[-1,1])
W = Variable(0.01randn(Float32, 2,1), name="weights1")
B = Variable(Float32(1), name="bias3")
q = s_*W + B
loss = reduce_mean((y_ - q).^2)
train_step = train.minimize(train.AdamOptimizer(0.01), loss)
function train_critic(s,targets)
for i = 1:1000
# for i = 1:length(y)
# run(session, train_step, Dict(s_ => s[i,:]', y_ => targets[i]))
# end
ts = run(session, [loss,train_step], Dict(s_ => s, y_ => targets))[1]
println(ts)
end
v = run(session, q, Dict(s_ => s, y_ => targets))
plot(s[:,1],v, lab="v (Predicted value)")
plot!(s[:,1],y, lab="y (Correct value)")
gui();
end
run(session, initialize_all_variables())
train_critic(s,y)
Same code in python (I'm not a python user so this might be ugly)
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
import tensorflow as tf
from tensorflow.python.framework.ops import reset_default_graph
s = np.linspace(1,10,50).reshape((50,1))
s = np.concatenate((s,s[::-1]),axis=1).astype('float32')
y = np.add(np.matmul(s,[1,4]), 2).astype('float32')
reset_default_graph()
rng = np.random
s_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None])
weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
with tf.variable_scope('model'):
W = tf.get_variable('W', [2, 1],
initializer=weight_initializer)
B = tf.get_variable('B', [1],
initializer=tf.constant_initializer(0.0))
q = tf.matmul(s_, W) + B
loss = tf.reduce_mean(tf.square(tf.sub(y_ , q)))
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)
num_epochs = 200
train_cost= []
with tf.Session() as sess:
init = tf.initialize_all_variables()
sess.run(init)
for e in range(num_epochs):
feed_dict_train = {s_: s, y_: y}
fetches_train = [train_op, loss]
res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)
train_cost = [res[1]]
print train_cost

The answer turned out to be that when I fed in the targets, I fed a vector and not an Nx1 matrix. The operation y_-q then turned into a broadcast operation and instead of returning the elementwise difference, it returned an NxN matrix with the desired difference along the diagonal. In Julia, I solved this by modifying the line
train_critic(s,y)
to
train_critic(s,reshape(y, length(y),1))
to ensure y being a matrix.
A subtle error that took me a very long time to find! Part of the confusion was that TensorFlow seems to treat vectors as row vectors and not as column vectors like Julia, hence the broadcast operation in y_-q

Related

How can I print the loss value obtained from model.add_loss() function in Tensorflow 2 (with Keras API)?

I have a classification model in Keras API of Tensorflow 2. The model has two losses in the example below, categorical cross entropy and KL-divergence. Categorical cross-entropy is being added while initializing the model and hence it can be printed while training. KL-divergence is being added separately using model.add_loss(). However, it does not display during training. Is there a way to print it while training? The sample code is shown below. Since it is on simulated data, loss value might come to be nan.
import tensorflow as tf
keras = tf.keras
import keras.backend as K
import numpy as np
def ohc(y):
y = np.random.randint(4, size=100)
b = np.zeros((y.size, y.max() + 1))
b[np.arange(y.size), y] = 1
return b
xtrain = np.random.rand(100,50)
y = np.random.randint(4, size=100)
y_one_hot = ohc(y)
def my_kld(y_true, y_pred):
y_pred2 = tf.keras.layers.Lambda(lambda x: x + 0.00001)(y_pred)
LR = y_true/y_pred2
logLR = K.log(LR)
kld = y_true*logLR
loss = K.mean(kld)
loss = tf.keras.layers.Lambda(lambda x: x * 0.2)(loss)
return loss
x_t = keras.layers.Input((50,))
y_t = keras.layers.Input((4,))
x1 = keras.layers.Dense(100, activation = 'relu')(x_t)
x2 = keras.layers.Dense(4, activation = 'softmax')(x1)
model = keras.models.Model([x_t,y_t], x2)
model.add_loss(my_kld(y_t, x2))
optim = keras.optimizers.Nadam(0.00006)
model.compile(loss=['categorical_crossentropy'], optimizer=optim, metrics=['accuracy'])
model.fit(x=[xtrain,y_one_hot], y = y_one_hot, epochs = 100)
The code that I have tried has been included in the question in an implementable way. Hwever, there does not seem to be any way to print the loss from model.add_loss().

TensorFlow code not giving intended results

The following code has the irritating trait of making every row of "out" the same. I am trying to classify k time series in Xtrain as [1,0,0,0], [0,1,0,0], [0,0,1,0], or [0,0,0,1], according to the way they were generated (by one of four random algorithms). Anyone know why? Thanks!
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import copy
n = 100
m = 10
k = 1000
hidden_layers = 50
learning_rate = .01
training_epochs = 10000
Xtrain = []
Xtest = []
Ytrain = []
Ytest = []
# ... fill variables with data ..
x = tf.placeholder(tf.float64,shape = (k,1,n,1))
y = tf.placeholder(tf.float64,shape = (k,1,4))
conv1_weights = 0.1*tf.Variable(tf.truncated_normal([1,m,1,hidden_layers],dtype = tf.float64))
conv1_biases = tf.Variable(tf.zeros([hidden_layers],tf.float64))
conv = tf.nn.conv2d(x,conv1_weights,strides = [1,1,1,1],padding = 'VALID')
sigmoid1 = tf.nn.sigmoid(conv + conv1_biases)
s = sigmoid1.get_shape()
sigmoid1_reshape = tf.reshape(sigmoid1,(s[0],s[1]*s[2]*s[3]))
sigmoid2 = tf.nn.sigmoid(tf.layers.dense(sigmoid1_reshape,hidden_layers))
sigmoid3 = tf.nn.sigmoid(tf.layers.dense(sigmoid2,4))
penalty = tf.reduce_sum((sigmoid3 - y)**2)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(model)
for i in range(0,training_epochs):
sess.run(train_op,{x: Xtrain,y: Ytrain})
out = sigmoid3.eval(feed_dict = {x: Xtest})
Likely because your loss function is mean squared error. If you're doing classification you should be using cross-entropy loss
Your loss is penalty = tf.reduce_sum((sigmoid3 - y)**2) that's the squared difference elementwise between a batch of predictions and a batch of values.
Your network output (sigmoid3) is a tensor with shape [?, 4] and y (I guess) is a tensor with shape [?, 4] too.
The squared difference has thus shape [?, 4].
This means that the tf.reduce_sum is computing in order:
The sum over the second dimension of the squared difference, producing a tensor with shape [?]
The sum over the first dimension (the batch size, here indicated with ?) producing a scalar value (shape ()) that's your loss value.
Probably you don't want this behavior (the sum over the batch dimension) and you're looking for the mean squared error over the batch:
penalty = tf.reduce_mean(tf.squared_difference(sigmoid3, y))

Sampling from tensor that depends on a random variable in tensorflow

Is it possible to get samples from a tensor that depends on a random variable in tensorflow? I need to get an approximate sample distribution to use in a loss function to be optimized. Specifically, in the example below, I want to be able to obtain samples of Y_output in order to be able to calculate the mean and variance of the output distribution and use these parameters in a loss function.
def sample_weight(mean, phi, seed=1):
P_epsilon = tf.distributions.Normal(loc=0., scale=1.0)
epsilon_s = P_epsilon.sample([1])
s = tf.multiply(epsilon_s, tf.log(1.0+tf.exp(phi)))
weight_sample = mean + s
return weight_sample
X = tf.placeholder(tf.float32, shape=[None, 1], name="X")
Y_labels = tf.placeholder(tf.float32, shape=[None, 1], name="Y_labels")
sw0 = sample_weight(u0,p0)
sw1 = sample_weight(u1,p1)
Y_output = sw0 + tf.multiply(sw1,X)
loss = tf.losses.mean_squared_error(labels=Y_labels, predictions=Y_output)
train_op = tf.train.AdamOptimizer(0.5e-1).minimize(loss)
init_op = tf.global_variables_initializer()
losses = []
predictions = []
Fx = lambda x: 0.5*x + 5.0
xrnge = 50
xs, ys = build_toy_data(funcx=Fx, stdev=2.0, num=xrnge)
with tf.Session() as sess:
sess.run(init_op)
iterations=1000
for i in range(iterations):
stat = sess.run(loss, feed_dict={X: xs, Y_labels: ys})
Not sure if this answers your question, but: when you have a Tensor downstream from a sampling Op (e.g., the Op created by your call to P_epsilon.sample([1]), anytime you call sess.run on the downstream Tensor, the sample op will be re-run, and produce a new random value. Example:
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample()
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> 0.13539088
print(sess.run(y))
# ==> 0.15465781
print(sess.run(y))
# ==> 4.7929106
If you want a bunch of samples of y, you could do
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample(100)
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> vector of 100 squared random normal values
We also have some cool tools in tensorflow_probability to do the kind of thing you're driving at here. Namely the Bijector API and, somewhat simpler, the trainable_distributions API.
(Another minor point: I'd suggest using tf.nn.softplus, or at a minimum tf.log1p(tf.exp(x)) instead of tf.log(1.0 + tf.exp(x)). The latter has poor numerical properties due to floating point imprecision, which the former are optimized for).
Hope this is some help!

How to resolve Tensorflow Fetch argument error while trying to pass arguments through feed_dict?

Below is the code that I am running where in I am implementing a paper. I take two matrices, multiply them and then perform clustering. What am I doing wrong?
import tensorflow as tf
from sklearn.cluster import KMeans
import numpy as np
a = np.random.rand(10,10)
b = np.random.rand(10,5)
F = tf.placeholder("float", [None, 10], name='F')
mask = tf.placeholder("float", [None, 5], name='mask')
def getZfeature(F,mask):
return tf.matmul(F,mask)
def cluster(zfeature):
#km = KMeans(n_clusters=3)
#km.fit(x)
#mu = km.cluster_centers_
return zfeature
def computeQ(zfeature,mu):
print "computing q matrix"
print type(zfeature), type(mu)
#construct model
zfeature = getZfeature(F,mask)
mu = cluster(zfeature)
q = computeQ(zfeature,mu)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
sess.run(q, feed_dict={F: a, mask: b})
Working code below. Your problem is that q and mu don't do anything. q is a reference to the function computeQ as it doesn't return anything. mu doesn't do anything so in this answer I have evaluated zfeature. You can do more tensor operations in these two functions but you need to return a tensor for it to work.
import tensorflow as tf
from sklearn.cluster import KMeans
import numpy as np
a = np.random.rand(10,10)
b = np.random.rand(10,5)
F = tf.placeholder("float", [None, 10], name='F')
mask = tf.placeholder("float", [None, 5], name='mask')
def getZfeature(F,mask):
return tf.matmul(F,mask)
def cluster(zfeature):
#km = KMeans(n_clusters=3)
#km.fit(x)
#mu = km.cluster_centers_
return zfeature
def computeQ(zfeature,mu):
print ("computing q matrix")
print (type(zfeature), type(mu))
#construct model
zfeature = getZfeature(F,mask)
mu = cluster(zfeature)
q = computeQ(zfeature,mu)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
result=sess.run(zfeature, feed_dict={F: a, mask: b})
print(result)
Here is the code for k-means clustering using tensorflow from https://codesachin.wordpress.com/2015/11/14/k-means-clustering-with-tensorflow/
import tensorflow as tf
from random import choice, shuffle
from numpy import array
def TFKMeansCluster(vectors, noofclusters):
"""
K-Means Clustering using TensorFlow.
'vectors' should be a n*k 2-D NumPy array, where n is the number
of vectors of dimensionality k.
'noofclusters' should be an integer.
"""
noofclusters = int(noofclusters)
assert noofclusters < len(vectors)
#Find out the dimensionality
dim = len(vectors[0])
#Will help select random centroids from among the available vectors
vector_indices = list(range(len(vectors)))
shuffle(vector_indices)
#GRAPH OF COMPUTATION
#We initialize a new graph and set it as the default during each run
#of this algorithm. This ensures that as this function is called
#multiple times, the default graph doesn't keep getting crowded with
#unused ops and Variables from previous function calls.
graph = tf.Graph()
with graph.as_default():
#SESSION OF COMPUTATION
sess = tf.Session()
##CONSTRUCTING THE ELEMENTS OF COMPUTATION
##First lets ensure we have a Variable vector for each centroid,
##initialized to one of the vectors from the available data points
centroids = [tf.Variable((vectors[vector_indices[i]]))
for i in range(noofclusters)]
##These nodes will assign the centroid Variables the appropriate
##values
centroid_value = tf.placeholder("float64", [dim])
cent_assigns = []
for centroid in centroids:
cent_assigns.append(tf.assign(centroid, centroid_value))
##Variables for cluster assignments of individual vectors(initialized
##to 0 at first)
assignments = [tf.Variable(0) for i in range(len(vectors))]
##These nodes will assign an assignment Variable the appropriate
##value
assignment_value = tf.placeholder("int32")
cluster_assigns = []
for assignment in assignments:
cluster_assigns.append(tf.assign(assignment,
assignment_value))
##Now lets construct the node that will compute the mean
#The placeholder for the input
mean_input = tf.placeholder("float", [None, dim])
#The Node/op takes the input and computes a mean along the 0th
#dimension, i.e. the list of input vectors
mean_op = tf.reduce_mean(mean_input, 0)
##Node for computing Euclidean distances
#Placeholders for input
v1 = tf.placeholder("float", [dim])
v2 = tf.placeholder("float", [dim])
euclid_dist = tf.sqrt(tf.reduce_sum(tf.pow(tf.sub(
v1, v2), 2)))
##This node will figure out which cluster to assign a vector to,
##based on Euclidean distances of the vector from the centroids.
#Placeholder for input
centroid_distances = tf.placeholder("float", [noofclusters])
cluster_assignment = tf.argmin(centroid_distances, 0)
##INITIALIZING STATE VARIABLES
##This will help initialization of all Variables defined with respect
##to the graph. The Variable-initializer should be defined after
##all the Variables have been constructed, so that each of them
##will be included in the initialization.
init_op = tf.initialize_all_variables()
#Initialize all variables
sess.run(init_op)
##CLUSTERING ITERATIONS
#Now perform the Expectation-Maximization steps of K-Means clustering
#iterations. To keep things simple, we will only do a set number of
#iterations, instead of using a Stopping Criterion.
noofiterations = 100
for iteration_n in range(noofiterations):
##EXPECTATION STEP
##Based on the centroid locations till last iteration, compute
##the _expected_ centroid assignments.
#Iterate over each vector
for vector_n in range(len(vectors)):
vect = vectors[vector_n]
#Compute Euclidean distance between this vector and each
#centroid. Remember that this list cannot be named
#'centroid_distances', since that is the input to the
#cluster assignment node.
distances = [sess.run(euclid_dist, feed_dict={
v1: vect, v2: sess.run(centroid)})
for centroid in centroids]
#Now use the cluster assignment node, with the distances
#as the input
assignment = sess.run(cluster_assignment, feed_dict = {
centroid_distances: distances})
#Now assign the value to the appropriate state variable
sess.run(cluster_assigns[vector_n], feed_dict={
assignment_value: assignment})
##MAXIMIZATION STEP
#Based on the expected state computed from the Expectation Step,
#compute the locations of the centroids so as to maximize the
#overall objective of minimizing within-cluster Sum-of-Squares
for cluster_n in range(noofclusters):
#Collect all the vectors assigned to this cluster
assigned_vects = [vectors[i] for i in range(len(vectors))
if sess.run(assignments[i]) == cluster_n]
#Compute new centroid location
new_location = sess.run(mean_op, feed_dict={
mean_input: array(assigned_vects)})
#Assign value to appropriate variable
sess.run(cent_assigns[cluster_n], feed_dict={
centroid_value: new_location})
#Return centroids and assignments
centroids = sess.run(centroids)
assignments = sess.run(assignments)
return centroids, assignments

Making simple rnn code with scan function in Tensorflow

I recently started to learn Tensorflow and try to make simple rnn code using scan function.
What I'm trying to do is to make The RNN predict sine function.
It gets input of 1 dim. and outputs also 1 dim in batch as follow.
import tensorflow as tf
from tensorflow.examples.tutorials import mnist
import numpy as np
import matplotlib.pyplot as plt
import os
import time
# FLAGS (options)
tf.flags.DEFINE_string("data_dir", "", "")
#tf.flags.DEFINE_boolean("read_attn", True, "enable attention for reader")
#tf.flags.DEFINE_boolean("write_attn",True, "enable attention for writer")
opt = tf.flags.FLAGS
#Parameters
time_step = 10
num_rnn_h = 16
batch_size = 2
max_epoch=10000
learning_rate=1e-3 # learning rate for optimizer
eps=1e-8 # epsilon for numerical stability
#temporary sinusoid data
x_tr = np.zeros([batch_size,time_step])
y_tr = np.zeros([batch_size,time_step])
ptrn = 0.7*np.sin(np.arange(time_step+1)/(2*np.pi))
x_tr[0] = ptrn[0:time_step]
y_tr[0] = ptrn[1:time_step+1]
x_tr[1] = ptrn[0:time_step]
y_tr[1] = ptrn[1:time_step+1]
#Build model
x = tf.placeholder(tf.float32,shape=[batch_size,time_step,1], name= 'input')
y = tf.placeholder(tf.float32,shape=[None,time_step,1], name= 'target')
cell = tf.nn.rnn_cell.BasicRNNCell(num_rnn_h)
#cell = tf.nn.rnn_cell.LSTMCell(num_h, state_is_tuple=True)
with tf.variable_scope('output'):
W_o = tf.get_variable('W_o', shape=[num_rnn_h, 1])
b_o = tf.get_variable('b_o', shape=[1], initializer=tf.constant_initializer(0.0))
init_state = cell.zero_state(batch_size, tf.float32)
#make graph
#rnn_outputs, final_states = tf.scan(cell, xx1, initializer= tf.zeros([num_rnn_h]))
scan_outputs = tf.scan(lambda a, xi: cell(xi, a), tf.transpose(x, perm=[1,0,2]), initializer= init_state)
rnn_outputs, rnn_states = tf.unpack(tf.transpose(scan_outputs,perm=[1,2,0,3]))
print rnn_outputs, rnn_states
with tf.variable_scope('predictions'):
weighted_sum = tf.reshape(tf.matmul(tf.reshape(rnn_outputs, [-1, num_rnn_h]), W_o), [batch_size, time_step, 1])
predictions = tf.add(weighted_sum, b_o, name='predictions')
with tf.variable_scope('loss'):
loss = tf.reduce_mean((y - predictions) ** 2, name='loss')
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
But It gives an error at the last line (optimizer) like ,
ValueError: Shapes (2, 16) and (2, 2, 16) are not compatible
Please someone knows the reason, tell me how to fix it...
I assume your error is not on the last line (the optimizer) but rather on some operation you are doing earlier. Perhaps in the reduce_mean with this y - prediction? I will not go over your code in details but I will tell you that this error comes when you do an operation between two tensors which require the same shape (usually math operations).