I just learned about TensorFlow. To get more familiar with the syntax, I built a toy model to perform polynomial regression.
The toy dataset that I created is
x_data = np.linspace(-1, 1, 300) + np.random.uniform(-0.05, 0.05, 300)
y_data = np.linspace(-1, 1, 300) ** 2 + np.random.uniform(-0.05, 0.05, 300)
The model that I built is
batch_size = 20
x = tf.placeholder(tf.float64, [1, batch_size])
y = tf.placeholder(tf.float64, [1, batch_size])
a0 = tf.Variable(np.random.rand(1))
a1 = tf.Variable(np.random.rand(1))
a2 = tf.Variable(np.random.rand(1))
a3 = tf.Variable(np.random.rand(1))
a4 = tf.Variable(np.random.rand(1))
a5 = tf.Variable(np.random.rand(1))
a6 = tf.Variable(np.random.rand(1))
op = a6 * x ** 6 + a5 * x ** 5 + a4 * x ** 4 + a3 * x ** 3 + a2 * x ** 2 + a1 * x ** 1 + a0
error = tf.reduce_sum(tf.square(op - y))
init = tf.global_variables_initializer()
optimizer = tf.train.GradientDescentOptimizer(0.0001)
train = optimizer.minimize(error)
sess = tf.Session()
steps = 100000
sess.run(init)
for i in range(steps):
    rand_int = np.random.randint(0, 300, batch_size)
    x_temp = x_data[rand_int].reshape(1, batch_size)
    y_temp = y_data[rand_int].reshape(1, batch_size)
    feed = {x: x_temp, y: y_temp}
    sess.run(train, feed)
a0, a1, a2, a3, a4, a5, a6 = sess.run([a0, a1, a2, a3, a4, a5, a6])
However, after running the model, the result I got is:
[a0, a1, a2, a3, a4, a5, a6] = [array([ nan]), array([ nan]), array([ nan]), array([ nan]), array([ nan]), array([ nan]), array([ nan])]
Why did the model learn nothing? I've changed the learning rate to be an order of magnitude smaller, yet the outcome is still the same.
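For reference, a common cause of NaNs in a setup like this is the loss or gradients growing without bound. Below is a small debugging sketch (my own, not something the original post confirms as the fix): it averages the squared error instead of summing it, uses an illustrative step size of 1e-3, and prints the loss periodically so you can see whether it diverges before turning into NaN.
# Debugging sketch: reuses op, x, y, x_data, y_data, batch_size, steps, sess from above.
error = tf.reduce_mean(tf.square(op - y))   # mean keeps the gradient scale independent of batch size
train = tf.train.GradientDescentOptimizer(1e-3).minimize(error)
sess.run(tf.global_variables_initializer())
for i in range(steps):
    rand_int = np.random.randint(0, 300, batch_size)
    feed = {x: x_data[rand_int].reshape(1, batch_size),
            y: y_data[rand_int].reshape(1, batch_size)}
    _, err = sess.run([train, error], feed)
    if i % 10000 == 0:
        print(i, err)   # watch whether the loss decreases, stalls, or blows up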
Is there a way, in tensorflow, to multiply each channel by a different matrix?
Imagine you have a 2D array A of dimensions (N, D1).
You can multiply it by an array B of size (D1, D2) to get output size (N, D2).
Now imagine you have a 3D array of dimensions (N, D1, 3).
Suppose you had B1, B2, B3, all of size (D1, D2). Combining the per-channel outputs A[:, :, 0] * B1, A[:, :, 1] * B2, A[:, :, 2] * B3, you could form an array of size (N, D2, 3).
But is there a way to get an output size of (N, D2, 3) by just doing multiplication once?
I looked into transpose and matmul, but they don't seem to work for this purpose.
Thank you!
tf.einsum() could be applied here.
To make the code below easier to understand, I renamed D1 = O and D2 = P.
import tensorflow as tf

N, O, P = 2, 4, 5  # example sizes: N samples, D1 = O, D2 = P
A = tf.random_normal([N, O, 3])
B = tf.random_normal([O, P, 3])  # equivalently, B = tf.stack([B1, B2, B3], axis=2)
res = tf.einsum("noi,opi->npi", A, B)  # shape (N, P, 3)
You could use tf.matmul here. It's just that you will have to transpose the dimensions.
Consider, N = 2, D1 = 4, D2 = 5. First create two matrices having shapes N x D1 x 3 and D1 x D2 x 3.
import numpy as np
import tensorflow as tf

a = tf.constant(np.arange(1, 25, dtype=np.int32), shape=[2, 4, 3])
b = tf.constant(np.arange(1, 61, dtype=np.int32), shape=[4, 5, 3])
Transpose the matrices so that the first dimension is the same.
a = tf.transpose(a, (2, 0, 1)) # a.shape = (3, 2, 4)
b = tf.transpose(b, (2, 0, 1)) # b.shape = (3, 4, 5)
Perform the multiplication as usual.
r = tf.matmul(a,b) # r.shape = (3, 2, 5)
r = tf.transpose(r, (1, 2, 0)) # r.shape = (2, 5, 3)
Hope this helps.
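As a quick sanity check (my own addition, with made-up sizes, not part of either answer), the einsum approach and the transpose-plus-matmul approach can be compared directly:
import numpy as np
import tensorflow as tf

N, D1, D2 = 2, 4, 5
A = tf.constant(np.random.rand(N, D1, 3))
B = tf.constant(np.random.rand(D1, D2, 3))

# einsum: multiply each channel of A by the matching channel of B.
res_einsum = tf.einsum("noi,opi->npi", A, B)                    # (N, D2, 3)
# transpose + batched matmul, then transpose back.
res_matmul = tf.transpose(tf.matmul(tf.transpose(A, (2, 0, 1)),
                                    tf.transpose(B, (2, 0, 1))),
                          (1, 2, 0))                            # (N, D2, 3)

with tf.Session() as sess:
    r1, r2 = sess.run([res_einsum, res_matmul])
    print(np.allclose(r1, r2))  # expected: True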
Below is a forward pass and a partly implemented backward pass (back-propagation) of a neural network:
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
X_train = np.asarray([[1,1], [0,0]]).T
Y_train = np.asarray([[1], [0]]).T
hidden_size = 2
output_size = 1
learning_rate = 0.1
# forward propagation
w1 = np.random.randn(hidden_size, 2) * 0.1
b1 = np.zeros((hidden_size, 1))
w2 = np.random.randn(output_size, hidden_size) * 0.1
b2 = np.zeros((output_size, 1))
Z1 = np.dot(w1, X_train) + b1
A1 = sigmoid(Z1)
Z2 = np.dot(w2, A1) + b2
A2 = sigmoid(Z2)
derivativeA2 = A2 * (1 - A2)
derivativeA1 = A1 * (1 - A1)
# first steps of back propagation
error = (A2 - Y_train)
dA2 = error / derivativeA2
dZ2 = np.multiply(dA2, derivativeA2)
What is the intuition behind:
error = (A2 - Y_train)
dA2 = error / derivativeA2
dZ2 = np.multiply(dA2, derivativeA2)
I understand error is the difference between the current prediction A2 and actual values Y_train.
But why divide this error by the derivative of A2, and then multiply the result of error / derivativeA2 by derivativeA2 again? What is the intuition behind this?
These expressions are indeed confusing:
derivativeA2 = A2 * (1 - A2)
error = (A2 - Y_train)
dA2 = error / derivativeA2
... because error doesn't have a meaning on its own. At this point, the goal is to compute the derivative of the cross-entropy loss, which has this formula:
dA2 = (A2 - Y_train) / (A2 * (1 - A2))
See these lecture notes (formula 6) for the derivation. It just happens that the previous operation is sigmoid and its derivative is A2 * (1 - A2). That's why this expression is used again to compute dZ2 (formula 7).
But if you had a different loss function (say, L2) or a different squashing (activation) function, then A2 * (1 - A2) wouldn't be reused. These are different nodes in the computational graph.
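To make the cancellation concrete, here is a small check (my own sketch, reusing the arrays computed in the question's code) that dZ2 = dA2 * derivativeA2 collapses back to A2 - Y_train:
# dA2 divides by A2 * (1 - A2) and dZ2 multiplies by it again,
# so for sigmoid + cross-entropy, dZ2 is simply A2 - Y_train.
print(np.allclose(dZ2, A2 - Y_train))   # expected: True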
I'm learning TensorFlow and tried to apply it to the MNIST database.
My question is (see the attached image):
What could cause such output for accuracy (improving and then degrading!) and loss (almost constant!)?
The accuracy isn't that great, just hovering around 10%, despite:
a 5-layer network (incl. the output layer), with 200/100/60/30/10 neurons respectively
Is the network not learning, despite a 0.1 learning rate (which is quite high, I believe)?
Full code: https://github.com/vibhorj/tf > mnist-2.py
1) here's how the layers are defined:
K, L, M, N = 200, 100, 60, 30

""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value=tf.truncated_normal([28*28, K], mean=0, stddev=0.1), name='w1')
    b1 = tf.Variable(initial_value=tf.truncated_normal([K], mean=0, stddev=0.1), name='b1')

""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value=tf.truncated_normal([K, L], mean=0, stddev=0.1), name='w2')
    b2 = tf.Variable(initial_value=tf.truncated_normal([L], mean=0, stddev=0.1), name='b2')

""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value=tf.truncated_normal([L, M], mean=0, stddev=0.1), name='w3')
    b3 = tf.Variable(initial_value=tf.truncated_normal([M], mean=0, stddev=0.1), name='b3')

""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value=tf.truncated_normal([M, N], mean=0, stddev=0.1), name='w4')
    b4 = tf.Variable(initial_value=tf.truncated_normal([N], mean=0, stddev=0.1), name='b4')

""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value=tf.truncated_normal([N, 10], mean=0, stddev=0.1), name='w_out')
    b_out = tf.Variable(initial_value=tf.truncated_normal([10], mean=0, stddev=0.1), name='b_out')
2) loss function
Y1 = tf.nn.sigmoid(tf.add(tf.matmul(X,w1),b1), name='Y1')
Y2 = tf.nn.sigmoid(tf.add(tf.matmul(Y1,w2),b2), name='Y2')
Y3 = tf.nn.sigmoid(tf.add(tf.matmul(Y2,w3),b3), name='Y3')
Y4 = tf.nn.sigmoid(tf.add(tf.matmul(Y3,w4),b4), name='Y4')
Y_pred_logits = tf.add(tf.matmul(Y4, w_out),b_out,name='logits')
Y_pred_prob = tf.nn.softmax(Y_pred_logits, name='probs')
error = -tf.matmul(Y, tf.reshape(tf.log(Y_pred_prob), [10, -1]), name='err')
loss = tf.reduce_mean(error, name='loss')
3) optimization function
opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr')
z = opt.apply_gradients(grads_and_vars, global_step=ctr)
4) TensorBoard code:
evt_file = tf.summary.FileWriter('/Users/vibhorj/python/-tf/g_mnist')
evt_file.add_graph(tf.get_default_graph())
s1 = tf.summary.scalar(name='accuracy', tensor=accuracy)
s2 = tf.summary.scalar(name='loss', tensor=loss)
m1 = tf.summary.merge([s1,s2])
5) run the session (test data is mnist.test.images & mnist.test.labels)
with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(300):
        """ calc. accuracy on test data - TENSORBOARD before iteration begins """
        summary = sess.run(m1, feed_dict=test_data)
        evt_file.add_summary(summary, sess.run(ctr))
        evt_file.flush()
        """ fetch train data """
        a_train, b_train = mnist.train.next_batch(batch_size=100)
        train_data = {X: a_train, Y: b_train}
        """ train """
        sess.run(z, feed_dict=train_data)
I'd appreciate any insight into this. I'm completely clueless how to proceed further (I even tried initializing w & b with random_normal, and played with learning rates [0.1, 0.01, 0.001]).
Cheers!
Please consider:
Initializing biases to zeros
Using ReLU units instead of sigmoid, to avoid saturation
Using the Adam optimizer, for faster learning (a short sketch of this swap follows the code below)
I also feel that your network is quite large; you could do with a smaller network.
K, L, M, N = 200, 100, 60, 30

""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value=tf.truncated_normal([28*28, K], mean=0, stddev=0.1), name='w1')
    b1 = tf.zeros([K])  # tf.Variable(initial_value=tf.truncated_normal([K], mean=0, stddev=0.01), name='b1')

""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value=tf.truncated_normal([K, L], mean=0, stddev=0.1), name='w2')
    b2 = tf.zeros([L])  # tf.Variable(initial_value=tf.truncated_normal([L], mean=0, stddev=0.01), name='b2')

""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value=tf.truncated_normal([L, M], mean=0, stddev=0.1), name='w3')
    b3 = tf.zeros([M])  # tf.Variable(initial_value=tf.truncated_normal([M], mean=0, stddev=0.01), name='b3')

""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value=tf.truncated_normal([M, N], mean=0, stddev=0.1), name='w4')
    b4 = tf.zeros([N])  # tf.Variable(initial_value=tf.truncated_normal([N], mean=0, stddev=0.1), name='b4')

""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value=tf.truncated_normal([N, 10], mean=0, stddev=0.1), name='w_out')
    b_out = tf.zeros([10])  # tf.Variable(initial_value=tf.truncated_normal([10], mean=0, stddev=0.1), name='b_out')

Y1 = tf.nn.relu(tf.add(tf.matmul(X, w1), b1), name='Y1')
Y2 = tf.nn.relu(tf.add(tf.matmul(Y1, w2), b2), name='Y2')
Y3 = tf.nn.relu(tf.add(tf.matmul(Y2, w3), b3), name='Y3')
Y4 = tf.nn.relu(tf.add(tf.matmul(Y3, w4), b4), name='Y4')
Y_pred_logits = tf.add(tf.matmul(Y4, w_out), b_out, name='logits')
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=Y_pred_logits, name='xentropy'))

opt = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr', trainable=False)
train_op = opt.minimize(loss, global_step=ctr)

for v in tf.trainable_variables():
    print(v.op.name)

with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(3000):
        """ calc. accuracy on test data - TENSORBOARD before iteration begins """
        # summary = sess.run(m1, feed_dict=test_data)
        # evt_file.add_summary(summary, sess.run(ctr))
        # evt_file.flush()
        """ fetch train data """
        a_train, b_train = mnist.train.next_batch(batch_size=100)
        train_data = {X: a_train, Y: b_train}
        """ train """
        l = sess.run(loss, feed_dict=train_data)
        print(l)
        sess.run(train_op, feed_dict=train_data)
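To act on the Adam suggestion above, one possible change (a sketch on my part; 0.001 is just Adam's usual default step size, not a value from the answer) is to swap out the gradient-descent optimizer:
# Hypothetical replacement for the GradientDescentOptimizer line above.
opt = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = opt.minimize(loss, global_step=ctr)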
I tried the following code
batch_size= 128
c1 = tf.zeros([128,32,32,16])
c2 = tf.zeros([128,32,32,16])
c3 = tf.zeros([128,32,32,16])
c = tf.stack([c1, c2, c3], 4)  # shape: [128, 32, 32, 16, 3]
alpha = tf.zeros([128,3,1])
M = tf.matmul(c,alpha)
And it raises an error at tf.matmul.
What I want is just the linear combination alpha[0]*c1 + alpha[1]*c2 + alpha[2]*c3 for each sample. When the batch size is 1 this code would be fine, but when it is not, how can I do it?
Should I reshape c1, c2, c3?
I think this code works; I verified it.
import tensorflow as tf
import numpy as np

batch_size = 128
c1 = tf.ones([128, 32, 32, 16])
c2 = tf.ones([128, 32, 32, 16])
c3 = tf.ones([128, 32, 32, 16])
c = tf.stack([c1, c2, c3], 4)  # shape: [128, 32, 32, 16, 3]

# Build alpha of shape [128, 3], where row j is [j, j, j].
alpha = tf.zeros([1, 3])
for j in range(127):
    z = alpha[j] + 1
    z = tf.expand_dims(z, 0)
    alpha = tf.concat([alpha, z], 0)

# Per-sample linear combination over the last axis of c.
M = tf.einsum('aijkl,al->aijk', c, alpha)
print('')

with tf.Session() as sess:
    _alpha = sess.run(alpha)
    _M = sess.run(M)
    print('')
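If alpha is already available as a tensor of shape [batch_size, 3] (the alpha in the question has shape [128, 3, 1], so it would first need a squeeze of the last axis), the same einsum works without the Python loop. A minimal sketch with made-up values:
import numpy as np
import tensorflow as tf

c = tf.ones([128, 32, 32, 16, 3])  # stacked c1, c2, c3
alpha = tf.constant(np.random.rand(128, 3).astype(np.float32))
# alpha[a,0]*c1 + alpha[a,1]*c2 + alpha[a,2]*c3 for each sample a.
M = tf.einsum('aijkl,al->aijk', c, alpha)

with tf.Session() as sess:
    print(sess.run(tf.shape(M)))  # expected: [128 32 32 16]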
I have 19 integer input features. The output labels are 1 or 0. I examined the MNIST example from the TensorFlow website.
My code is here :
validation_images, validation_labels, train_images, train_labels = ld.read_data_set()
print "\n"
print len(train_images[0])
print len(train_labels)
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, shape=[None, 19])
y_ = tf.placeholder(tf.float32, shape=[None, 2])
W = tf.Variable(tf.zeros([19,2]))
b = tf.Variable(tf.zeros([2]))
sess.run(tf.initialize_all_variables())
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
start = 0
batch_1 = 50
end = 100
for i in range(1000):
    # batch = mnist.train.next_batch(50)
    x1 = train_images[start:end]
    y1 = train_labels[start:end]
    start = start + batch_1
    end = end + batch_1
    x1 = np.reshape(x1, (-1, 19))
    y1 = np.reshape(y1, (-1, 2))
    train_step.run(feed_dict={x: x1[0], y_: y1[0]})
When I run the above code, I get an error:
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (19,) for Tensor u'Placeholder:0', which has shape '(?, 19)'
How can I handle this error?
Try
train_step.run(feed_dict={x: x1, y_: y1})
You can reshape your feed's value by the following code:
x1 = np.column_stack((x1))
x1 = np.transpose(x1) # if necessary
Thus, the shape of the input value will be (1, 19) instead of (19,)
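For completeness, a tiny sketch (my own example with dummy data; the train_step calls are left commented out since the rest of the graph isn't built here) showing both feedable shapes:
import numpy as np

x1 = np.random.rand(100, 19)  # dummy batch of 100 examples
y1 = np.random.rand(100, 2)

# Feed the whole batch, shape (100, 19), as in the first answer:
# train_step.run(feed_dict={x: x1, y_: y1})

# Or feed a single example, reshaped from (19,) to (1, 19):
single_x = x1[0].reshape(1, 19)
single_y = y1[0].reshape(1, 2)
# train_step.run(feed_dict={x: single_x, y_: single_y})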