Basic neural network in TensorFlow

I am trying to implement a very basic neural network in TensorFlow, but I am having some problems. The network takes two values as input (hours of sleep and hours of study) and predicts the score on a test (I found this example on YouTube). There is a single hidden layer with three units, each applying a sigmoid activation; the cost function is the sum of squared errors, and I use gradient descent to minimize it. The problem is: when I train the net on the training data and then make predictions on that same training data, the results do not match the targets, and they also look strange because they are all nearly equal to each other.
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

# training data: hours of sleep and hours of study -> test score
trX = np.matrix(([3, 5], [5, 1], [10, 2]), dtype=float)
trY = np.matrix(([85], [82], [93]), dtype=float)  # 3x1 matrix
trX = trX / np.max(trX, axis=0)
trY = trY / 100  # 100 is the maximum score allowed

teX = np.matrix(([3, 5]), dtype=float)
teY = np.matrix(([85]), dtype=float)
teX = teX / np.amax(teX, axis=0)
teY = teY / 100

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w_h, w_o):
    z2 = tf.matmul(X, w_h)
    a2 = tf.nn.sigmoid(z2)  # this is a basic mlp, think 2 stacked logistic regressions
    z3 = tf.matmul(a2, w_o)
    yHat = tf.nn.sigmoid(z3)
    return yHat

X = tf.placeholder("float", [None, 2])
Y = tf.placeholder("float", [None, 1])

W1 = init_weights([2, 3])  # create symbolic variables
W2 = init_weights([3, 1])

sess.run(tf.initialize_all_variables())

py_x = model(X, W1, W2)
cost = tf.reduce_mean(tf.square(py_x - Y))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cost)  # construct an optimizer
predict_op = py_x

sess.run(train_op, feed_dict={X: trX, Y: trY})
print(sess.run(predict_op, feed_dict={X: trX}))
sess.close()
It yields:
[[ 0.51873487]
[ 0.51874501]
[ 0.51873082]]
I believe the predictions should be close to the training targets.
I am quite new to neural nets and machine learning so pardon me for any mistakes, thanks in advance.

The main reason that your network isn't training is that the statement:
sess.run(train_op, feed_dict={X: trX, Y: trY})
…only executes once. In TensorFlow, running train_op (or whatever operation is returned from Optimizer.minimize()) only causes the network to take a single gradient descent step. You should execute it in a loop to perform iterative training, and the weights will eventually converge.
Two other tips: (i) you might achieve faster convergence if you feed a subset of your training data in each step, rather than the entire dataset; and (ii) the learning rate of 0.5 is probably too high (although this depends on the data).
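For concreteness, here is a minimal sketch of the fix, reusing train_op, predict_op, trX, trY, X, Y and sess from the question; the number of steps (1000) is an arbitrary choice:

# Run the training op repeatedly instead of once; each call performs one
# gradient descent step on the whole (tiny) training set.
for i in range(1000):
    sess.run(train_op, feed_dict={X: trX, Y: trY})

# The predictions should now be much closer to trY.
print(sess.run(predict_op, feed_dict={X: trX}))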

Related

Meta-Gradients / Multi-Batch Backpropagation in tensorflow

I am trying to implement a meta-gradient based pruning-at-initialization method by Alizadeh et al. (2022) in TensorFlow. The method works roughly like this:
1. Take some batches from the dataset.
2. Mask all weights of the network with ones (e.g. tf.ones).
3. Perform one update of the weights, including the mask.
4. Unmask all weights and perform the rest of the updates through the other batches.
5. Compute the meta-gradient of the loss w.r.t. the mask, i.e. backpropagate through all batches and weight updates until the mask from the first iteration is "reached".
The authors implement this in PyTorch, which I typically do not use at work. I want to implement it in TensorFlow, yet I run into the following problem: TensorFlow is not designed to propagate gradients "through" assign operations. For example:
w = tf.Variable([4.])
c = tf.Variable([2.])
with tf.GradientTape() as tape:
    tape.watch(c)
    w.assign(w * c)
    output = 2. * w
print(output)
# >> tf.Tensor([16.], shape=(1,), dtype=float32)
print(tape.gradient(output, c))
# >> None
That being said, my "pruning loop" looks somewhat like this:
test_factor = tf.Variable(1., dtype=tf.float32)

with tf.GradientTape(persistent=True) as outer_tape:
    outer_tape.watch(masked_model.masks)
    outer_tape.watch(test_factor)

    ## First batch
    X_batch, y_batch = wrp.non_random_batch(X_train, y_train, 0, 256)
    with tf.GradientTape() as tape1:
        y_pred = masked_model(X_batch)
        loss = test_factor * loss_fn(y_batch, y_pred)
    gradients = tape1.gradient(loss, masked_model.proper_weights)

    ## Updating weights
    for w, g in zip(masked_model.proper_weights, gradients):
        w.assign(w - 0.05 * g)

    ## Unmasking
    masked_model.unmask_forward_passes()

    ## Second batch (and more)
    X_batch, y_batch = wrp.non_random_batch(X_train, y_train, 1, 256)
    with tf.GradientTape() as tape2:
        y_pred = masked_model(X_batch)
        loss = loss_fn(y_batch, y_pred)
    gradients = tape2.gradient(loss, masked_model.proper_weights)

print(outer_tape.gradient(loss, masked_model.masks))
# >> ListWrapper([None, None, ..., None])
print(outer_tape.gradient(loss, test_factor))
# >> None
After the second batch, more batches would follow.
I inserted test_factor to show that this is not a problem with my masks but with the general structure. Simply changing the line w.assign(w - 0.05*g) to w = w - 0.05*g makes the gradient available, but then the weights are not actually updated...
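To make that observation concrete, here is a minimal variation of the toy example above (only a sketch of the behaviour, not a solution): keeping the update as a plain tensor lets the tape trace it, but the variable w itself is never modified.

import tensorflow as tf

w = tf.Variable([4.])
c = tf.Variable([2.])
with tf.GradientTape() as tape:
    tape.watch(c)
    w_new = w * c        # plain tensor instead of w.assign(w * c)
    output = 2. * w_new
print(tape.gradient(output, c))
# >> tf.Tensor([8.], shape=(1,), dtype=float32), i.e. d(2*w*c)/dc = 2*w
print(w.numpy())
# >> [4.], the variable itself is unchanged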
For the authors of the paper mentioned, this does not seem to be a problem. Is PyTorch simply more powerful in such cases, or am I missing some kind of trick to get this to work in TensorFlow?

TensorFlow get updates computed by optimizer

In tf, the optimizer class only has two functions:
compute_gradients
apply_gradients
where apply_gradients returns an op that performs the update w <- w + Δw via tf.assign_add.
However, I need direct access to the Δw itself (or, equivalently, w' = w + Δw). I know that the optimizer adds nodes to the computational graph which compute this Δw for each variable. How can I access them? Or do I have to re-implement the optimizer myself?
The reason is that I need to compute gradients dw'/dw, as I am working on something related to gradient based hyperparameter optimization (cf. https://arxiv.org/abs/1703.01785)
The "delta" applied to each variable is not accessible through any common method or name. In fact, looking a bit into the source it seems rather difficult extract, as it varies from one optimizer to the other.
What you can do, at least, is to compute the differences between variable values and their updates. For example it could work like this:
import tensorflow as tf

with tf.Graph().as_default():
    # Setup example model
    x = tf.placeholder(tf.float32, [None, 1])
    y = tf.placeholder(tf.float32, [None, 2])
    w = tf.Variable([[1., 2.]], tf.float32)
    pred = tf.matmul(x, w)
    loss = (tf.reduce_sum(tf.squared_difference(pred, y))
            / tf.cast(tf.shape(x)[0], tf.float32))
    # Record variable values before training step
    # (tf.identity should work here but it does not, so we use
    # a trivial add operation to enforce the control dependency)
    w_old = w + 0
    # Train after having recorded variable values
    with tf.control_dependencies([w_old]):
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    # Compute deltas after training
    with tf.control_dependencies([train_op]):
        w_delta = w.read_value() - w_old
    init_op = tf.global_variables_initializer()
    # Test
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(w))
        # [[1. 2.]]
        _, w_delta_val = sess.run(
            [train_op, w_delta],
            feed_dict={x: [[1.], [2.]], y: [[3., 4.], [5., 6.]]})
        print(w_delta_val)
        # [[0.79999995 0.5999999 ]]
        print(sess.run(w))
        # [[1.8 2.6]]
To get the updated w', you can simply read w directly after you have executed optimizer.apply_gradients(); at that point, w holds the value of w'.
If you then want the update that was applied to w, just compute w' - w.
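A minimal sketch of that approach, assuming the same graph and a live session as in the snippet above (w, train_op, x, y):

# Read w before and after a training step; the difference is the applied update.
w_before = sess.run(w)
sess.run(train_op, feed_dict={x: [[1.], [2.]], y: [[3., 4.], [5., 6.]]})
w_after = sess.run(w)        # this is w' after apply_gradients
print(w_after - w_before)    # the delta the optimizer applied to w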

How to force Tensorflow to show a simple linear regression prediction result?

I have a simple linear regression question, as below.
My code is:
import tensorflow as tf
import numpy as np
batch_xs=np.array([[0,0,1],[1,1,1],[1,0,1],[0,1,1]])
batch_ys=np.array([[0],[1],[1],[0]])
x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.nn.sigmoid(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 1])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
learning_rate = 0.05
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Prediction:
x0=np.array([[1.,0.,0.]])
x0=np.float32(x0)
y0=tf.nn.softmax(tf.matmul(x0,W) + b)
print(y0)
However, print(y0) shows Tensor("Softmax_2:0", shape=(1, 1), dtype=float32) instead of a numeric value. I expect y0 to be around 0.99.
I tried y0.eval(), but I got ValueError: Cannot evaluate tensor using 'eval()': No default session is registered..
How can I make a change to obtain the result? Thanks!
There are a couple of ways to get things to print out while writing TensorFlow code. Of course, there’s the classic Python built-in print (or the function print(), if we’re being Python 3 about it). And then there’s TensorFlow’s print function, tf.Print (notice the capital P).
When working with TensorFlow, it’s important to remember that everything is ultimately a graph computation. This means that if you print a TensorFlow operation using Python’s print, it will simply show a description of what that operation is, since no values have been passed through it yet. It will also often show the dimensions that are expected to be in that node, if they’re known.
If you want to print the values that are ‘flowing’ through a particular part of the graph as it’s being executed, then you need to use tf.Print.
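For example, a small sketch reusing y0 and sess from the question (the message string is arbitrary):

# tf.Print returns the same tensor, but logs the listed values (to stderr)
# every time the tensor is actually evaluated in a session.
y0_printed = tf.Print(y0, [y0], message="y0 = ")
result = sess.run(y0_printed)  # evaluating triggers the log and returns the value
print(result)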

Can't learn parameters of tf.contrib.distributions.MultivariateNormalDiag via optimization

Working example:
import numpy as np
import tensorflow as tf
## construct data
np.random.seed(723888)
N,P = 50,3 # number and dimensionality of observations
Xbase = np.random.multivariate_normal(mean=np.zeros((P,)), cov=np.eye(P), size=N)
## construct model
X = tf.placeholder(dtype=tf.float32, shape=(None, P), name='X')
mu = tf.Variable(np.random.normal(loc=0.0, scale=0.1, size=(P,)), dtype=tf.float32, name='mu')
xDist = tf.contrib.distributions.MultivariateNormalDiag(loc=mu, scale_diag=tf.ones(shape=(P,), dtype=tf.float32), name='xDist')
xProbs = xDist.prob(X, name='xProbs')
## prepare optimizer
eta = 1e-3 # learning rate
loss = -tf.reduce_mean(tf.log(xProbs), name='loss')
optimizer = tf.train.AdamOptimizer(learning_rate=eta).minimize(loss)
## launch session
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(optimizer, feed_dict={X: Xbase})
I want to do optimization over the parameters of a multivariate gaussian distribution in tensorflow, as in my above example. I can successfully run commands like sess.run(loss, feed_dict={X: Xbase}), so I have implemented the distribution correctly. When I try to run the optimization op, I get an odd error message:
InvalidArgumentError: -1 is not between 0 and 3
[[Node: gradients_1/xDist_7/xProbs/Prod_grad/InvertPermutation = InvertPermutation[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients_1/xDist_7/xProbs/Prod_grad/concat)]]
Caused by op 'gradients_1/xDist_7/xProbs/Prod_grad/InvertPermutation'
I do not understand this error.
I get the same error message if I use tf.contrib.distributions.MultivariateNormalFullCovariance instead of tf.contrib.distributions.MultivariateNormalDiag. I do not get the error if scale_diag and not loc is the variable being optimized over.
I'm still looking into why this is failing, but for a short-term fix, does making the following change work?
xLogProbs = xDist.log_prob(X, name='xLogProbs')
loss = -tf.reduce_mean(xLogProbs, name='loss')
Note: this is actually preferable to tf.log(xProbs) because it is never less numerically precise, and is sometimes substantially more precise. (This is true of all tf.Distributions.)
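As a rough standalone illustration of the precision point (the numbers here are only for demonstration): for a point far from the mean, prob() underflows to zero in float32, so tf.log(prob) becomes -inf, while log_prob() stays finite.

import tensorflow as tf

# Standard 3-D diagonal Gaussian and a point far from its mean.
d = tf.contrib.distributions.MultivariateNormalDiag(
    loc=tf.zeros([3]), scale_diag=tf.ones([3]))
far = tf.constant([[40., 40., 40.]])

with tf.Session() as sess:
    print(sess.run(tf.log(d.prob(far))))  # [-inf]: prob() underflowed to 0
    print(sess.run(d.log_prob(far)))      # about [-2402.8], still finite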

Linear regression model in TensorFlow can't learn bias

I am trying to train a linear regression model in TensorFlow using some generated data. The model seems to learn the slope of the line, but is unable to learn the bias.
I have tried changing the number of epochs, the weight (slope) and the bias, but every time the bias learnt by the model comes out to be zero. I don't know where I am going wrong and some help would be appreciated.
Here is the code.
import numpy as np
import tensorflow as tf

# assume the linear model to be Y = W*X + b
X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])

# the weight and bias
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))

# the model
prediction = tf.matmul(X, W) + b

# the cost function
cost = tf.reduce_mean(tf.square(Y - prediction))

# Use gradient descent
learning_rate = 0.000001
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

steps = 1000
epochs = 10
Verbose = False

# In the end, the model should learn these values
test_w = 3
bias = 10

for _ in range(epochs):
    for i in range(steps):
        # make fake data for the model
        # feed one example at a time
        # stochastic gradient descent, because we only use one example at a time
        x_temp = np.array([[i]])
        y_temp = np.array([[test_w * i + bias]])
        # train the model using the data
        feed_dict = {X: x_temp, Y: y_temp}
        sess.run(train_step, feed_dict=feed_dict)
        if Verbose and i % 100 == 0:
            print("Iteration No: %d" % i)
            print("W = %f" % sess.run(W))
            print("b = %f" % sess.run(b))

print("Finally:")
print("W = %f" % sess.run(W))
print("b = %f" % sess.run(b))
# These values should be close to the values we used to generate data
# These values should be close to the values we used to generate data
https://github.com/HarshdeepGupta/tensorflow_notebooks/blob/master/Linear%20Regression.ipynb
Outputs are in the last line of code.
The model needs to learn test_w and bias (in the notebook link, it is in the 3rd cell, after the first comment), which are set to 3 and 10 respectively.
The model correctly learns the weight (slope), but is unable to learn the bias. Where is the error?
The main problem is that you are feeding just one sample at a time to the model. This makes your optimizer very unstable, which is why you have to use such a small learning rate. I suggest feeding more samples in each step.
If you insist on feeding one sample at a time, consider using an optimizer with momentum, like tf.train.AdamOptimizer(learning_rate). That way you can increase the learning rate and still reach convergence.
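A rough sketch combining both suggestions, reusing X, Y, W, b, cost, sess, test_w and bias from the question (the batch size, learning rate and step count are arbitrary choices):

# Feed a batch of samples per step and use a momentum-based optimizer,
# which tolerates a much larger learning rate than 1e-6.
train_step = tf.train.AdamOptimizer(0.1).minimize(cost)
sess.run(tf.global_variables_initializer())  # Adam adds its own slot variables

for step in range(2000):
    x_batch = np.random.uniform(0, 100, size=(100, 1))
    y_batch = test_w * x_batch + bias
    sess.run(train_step, feed_dict={X: x_batch, Y: y_batch})

print("W = %f" % sess.run(W))  # should end up close to 3
print("b = %f" % sess.run(b))  # should end up close to 10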