Given two 2D masks m1, m2 (both of shape [m, m]), obtain a 3D mask m3 (shape [m, m, m]):
if m1[i][j] == True and m2[i][k] == True and i != j and i != k and j != k, then m3[i][j][k] = True.
Note that m1 and m2 are symmetric: m1[i][j] = m1[j][i] and m2[i][k] = m2[k][i]. But m3[i][k][j] is not necessarily True.
For example:
m1=[[0,1,0],[1,0,0],[0,0,0]]
m2=[[0,0,1],[0,0,0],[1,0,0]]
For m3 (shape (3,3,3)), the only True value is m3[0][1][2].
import tensorflow as tf

def _get_triplet_mask(mask1, mask2):
    # (i, j, k) is kept only if i, j and k are all distinct indices
    indices_equal = tf.cast(tf.eye(tf.shape(mask1)[0]), tf.bool)
    indices_not_equal = ~indices_equal
    i_not_equal_j = tf.expand_dims(indices_not_equal, 2)
    i_not_equal_k = tf.expand_dims(indices_not_equal, 1)
    j_not_equal_k = tf.expand_dims(indices_not_equal, 0)
    distinct_indices = (i_not_equal_j & i_not_equal_k) & j_not_equal_k

    # (i, j, k) is kept only if mask1[i][j] and mask2[i][k] are both True
    i_j = tf.expand_dims(mask1, 2)
    i_k = tf.expand_dims(mask2, 1)
    valid_labels = i_j & i_k

    return valid_labels & distinct_indices
Modified from sentence-transformers/sentence_transformers/losses/BatchHardTripletLoss.py, rewritten in TensorFlow.
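A quick check against the example above (a sketch; it assumes TensorFlow 2.x eager execution, so under TF 1.x the result would need a session run):

import numpy as np
import tensorflow as tf

m1 = tf.constant(np.array([[0,1,0],[1,0,0],[0,0,0]], dtype=bool))
m2 = tf.constant(np.array([[0,0,1],[0,0,0],[1,0,0]], dtype=bool))
m3 = _get_triplet_mask(m1, m2)
print(np.argwhere(m3.numpy()))  # expected: [[0 1 2]]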
From the deep learning course on Coursera, I've implemented logistic regression:
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

def initialize_with_zeros(dim):
    w = np.zeros(shape=(dim, 1))
    b = 0
    return w, b

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b) # compute activation
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A))) # compute cost
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw # need to broadcast
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities a[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        print(A)
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
        ### END CODE HERE ###
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(9.2) = " + str(sigmoid(9.2)))
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
w, b, X, Y = np.array([[1], [2]]), 2, np.array([[-1,-2], [3,4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
params, grads, costs = optimize(w, b, X, Y, num_iterations= 10000, learning_rate = 0.01, print_cost = False)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print("predictions = " + str(predict(w, b, X)))
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    w, b = initialize_with_zeros(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
I'm attempting to use a generic dataset which contains 5 samples, where each sample contains 4 elements:
train_set_x = np.array([[1,2,3,4],[4,3,2,1],[1,2,3,4],[4,3,2,1],[1,2,3,4]])
train_set_y = np.array([1,0,1,0,1])
test_set_x = np.array([[1,2,3,4],[4,3,2,1],[1,2,3,4],[4,3,2,1],[1,2,3,4]])
test_set_y = np.array([1,0,1,0,1])
train_set_x , train_set_y , test_set_x , test_set_y
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
But the following error is thrown:
<ipython-input-409-bd4e233a8f4e> in propagate(w, b, X, Y)
18
19 A = sigmoid(np.dot(w.T, X) + b) # compute activation
---> 20 cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A))) # compute cost
21
22 dw = (1 / m) * np.dot(X, (A - Y).T)
ValueError: operands could not be broadcast together with shapes (5,) (1,4)
Do I need to change the weight dimensions in order to compute the cost value?
Update:
Using the modification:
A = sigmoid(np.dot(X, w) + b) # compute activation
causes the error:
<ipython-input-546-7a7980550834> in propagate(w, b, X, Y)
20 m = X.shape[1]
21
---> 22 A = sigmoid(np.dot(X , w) + b) # compute activation
23 print('w.T' , w.T , 'w' , w, 'X' , X , 'Y' , Y , 'A' , A)
24 cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A))) # compute cost
ValueError: shapes (5,4) and (5,1) not aligned: 4 (dim 1) != 5 (dim 0)
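For reference, the propagate/optimize/predict code above assumes X has shape (n_features, m_examples), with examples as columns, and Y has shape (1, m_examples). A minimal sketch of reshaping the generic dataset to match that convention (the transposes and reshapes here are my assumption about the intended fix, not part of the original post):

train_set_x = np.array([[1,2,3,4],[4,3,2,1],[1,2,3,4],[4,3,2,1],[1,2,3,4]]).T  # shape (4, 5)
train_set_y = np.array([1,0,1,0,1]).reshape(1, -1)                             # shape (1, 5)
test_set_x = np.array([[1,2,3,4],[4,3,2,1],[1,2,3,4],[4,3,2,1],[1,2,3,4]]).T
test_set_y = np.array([1,0,1,0,1]).reshape(1, -1)
d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)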
I am trying to train a name-generation LSTM network. I am not using predefined TensorFlow cells (like tf.contrib.rnn.BasicLSTMCell, etc.); I have created the LSTM cell myself. But the loss does not decrease beyond a certain limit: it only drops about 30% from its initial value (with random weights used in forward propagation) and then starts increasing. Also, the gradients and weights become very small after a few thousand training steps.
I think the reason for non-convergence is one of two things:
1. The design of the TensorFlow graph I have created, or
2. The loss function I used.
I am feeding one-hot vectors of each character of the word at each time step of the network. The code I have used for graph generation and the loss function is as follows. Tx is the number of time steps in the RNN; n_x, n_a, n_y are the lengths of the input vector, LSTM cell state vector, and output vector respectively.
It would be great if someone could help me identify what I am doing wrong here.
n_x = vocab_size
n_y = vocab_size
n_a = 100
Tx = 50
Ty = Tx
with open("trainingnames_file.txt") as f:
examples = f.readlines()
examples = [x.lower().strip() for x in examples]
X0 = [[char_to_ix[x1] for x1 in list(x)] for x in examples]
X1 = np.array([np.concatenate([np.array(x), np.zeros([Tx-len(x)])]) for x in X0], dtype=np.int32).T
Y0 = [(x[1:] + [char_to_ix["\n"]]) for x in X0]
Y1 = np.array([np.concatenate([np.array(y), np.zeros([Ty-len(y)])]) for y in Y0], dtype=np.int32).T
m = len(X0)
Wf = tf.get_variable(name="Wf", shape = [n_a,(n_a+n_x)])
Wu = tf.get_variable(name="Wu", shape = [n_a,(n_a+n_x)])
Wc = tf.get_variable(name="Wc", shape = [n_a,(n_a+n_x)])
Wo = tf.get_variable(name="Wo", shape = [n_a,(n_a+n_x)])
Wy = tf.get_variable(name="Wy", shape = [n_y,n_a])
bf = tf.get_variable(name="bf", shape = [n_a,1])
bu = tf.get_variable(name="bu", shape = [n_a,1])
bc = tf.get_variable(name="bc", shape = [n_a,1])
bo = tf.get_variable(name="bo", shape = [n_a,1])
by = tf.get_variable(name="by", shape = [n_y,1])
X_input = tf.placeholder(dtype = tf.int32, shape = [Tx,None])
Y_input = tf.placeholder(dtype = tf.int32, shape = [Ty,None])
X = tf.one_hot(X_input, axis = 0, depth = n_x)
Y = tf.one_hot(Y_input, axis = 0, depth = n_y)
X.shape
a_prev = tf.zeros(shape = [n_a,m])
c_prev = tf.zeros(shape = [n_a,m])
a_all = []
c_all = []
for i in range(Tx):
    ac = tf.concat([a_prev,tf.squeeze(tf.slice(input_=X,begin=[0,i,0],size=[n_x,1,m]))], axis=0)
    ct = tf.tanh(tf.matmul(Wc,ac) + bc)
    tug = tf.sigmoid(tf.matmul(Wu,ac) + bu)
    tfg = tf.sigmoid(tf.matmul(Wf,ac) + bf)
    tog = tf.sigmoid(tf.matmul(Wo,ac) + bo)
    c = tf.multiply(tug,ct) + tf.multiply(tfg,c_prev)
    a = tf.multiply(tog,tf.tanh(c))
    y = tf.nn.softmax(tf.matmul(Wy,a) + by, axis = 0)
    a_all.append(a)
    c_all.append(c)
    a_prev = a
    c_prev = c
    y_ex = tf.expand_dims(y,axis=1)
    if i == 0:
        y_all = y_ex
    else:
        y_all = tf.concat([y_all,y_ex], axis=1)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y,logits=y_all,dim=0))
opt = tf.train.AdamOptimizer()
train = opt.minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    o = sess.run(loss, feed_dict = {X_input:X1,Y_input:Y1})
    print(o.shape)
    print(o)
    sess.run(train, feed_dict = {X_input:X1,Y_input:Y1})
    o = sess.run(loss, feed_dict = {X_input:X1,Y_input:Y1})
    print(o)
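One detail worth checking in the graph above: tf.nn.softmax_cross_entropy_with_logits_v2 expects unnormalized logits, but y_all has already been passed through tf.nn.softmax, so the softmax is effectively applied twice. A minimal sketch of accumulating the pre-softmax scores and feeding those to the loss instead (my assumption about a likely contributor to the plateau, not a verified fix; it slots into the existing time-step loop):

# Inside the time-step loop, keep the pre-softmax scores:
z = tf.matmul(Wy, a) + by              # unnormalized scores, shape [n_y, m]
y = tf.nn.softmax(z, axis=0)           # only needed for sampling/inspection
z_ex = tf.expand_dims(z, axis=1)
if i == 0:
    z_all = z_ex
else:
    z_all = tf.concat([z_all, z_ex], axis=1)

# After the loop, compute the loss from the raw scores:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=z_all, dim=0))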
I'm learning TensorFlow and tried to apply it to the MNIST database.
My question is (see attached image):
What could cause such output for accuracy (improving and then degrading!) and loss (almost constant!)?
The accuracy isn't that great, just hovering around 10%.
Despite:
a 5-layer network (incl. output layer), with 200/100/60/30/10 neurons respectively
Is the network not learning, despite a 0.1 learning rate (which is quite high, I believe)?
Full code: https://github.com/vibhorj/tf > mnist-2.py
1) here's how the layers are defined:
K,L,M,N=200,100,60,30

""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value = tf.truncated_normal([28*28,K],mean=0,stddev=0.1), name = 'w1')
    b1 = tf.Variable(initial_value = tf.truncated_normal([K],mean=0,stddev=0.1), name = 'b1')

""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value = tf.truncated_normal([K,L],mean=0,stddev=0.1), name = 'w2')
    b2 = tf.Variable(initial_value = tf.truncated_normal([L],mean=0,stddev=0.1), name = 'b2')

""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value = tf.truncated_normal([L,M],mean=0,stddev=0.1), name = 'w3')
    b3 = tf.Variable(initial_value = tf.truncated_normal([M],mean=0,stddev=0.1), name = 'b3')

""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value = tf.truncated_normal([M,N],mean=0,stddev=0.1), name = 'w4')
    b4 = tf.Variable(initial_value = tf.truncated_normal([N],mean=0,stddev=0.1), name = 'b4')

""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value = tf.truncated_normal([N,10],mean=0,stddev=0.1), name = 'w_out')
    b_out = tf.Variable(initial_value = tf.truncated_normal([10],mean=0,stddev=0.1), name = 'b_out')
2) loss function
Y1 = tf.nn.sigmoid(tf.add(tf.matmul(X,w1),b1), name='Y1')
Y2 = tf.nn.sigmoid(tf.add(tf.matmul(Y1,w2),b2), name='Y2')
Y3 = tf.nn.sigmoid(tf.add(tf.matmul(Y2,w3),b3), name='Y3')
Y4 = tf.nn.sigmoid(tf.add(tf.matmul(Y3,w4),b4), name='Y4')
Y_pred_logits = tf.add(tf.matmul(Y4, w_out),b_out,name='logits')
Y_pred_prob = tf.nn.softmax(Y_pred_logits, name='probs')
error = -tf.matmul(Y, tf.reshape(tf.log(Y_pred_prob),[10,-1]), name ='err')
loss = tf.reduce_mean(error, name = 'loss')
3) optimization function
opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr')
z = opt.apply_gradients(grads_and_vars, global_step=ctr)
4) Tensorboard code:
evt_file = tf.summary.FileWriter('/Users/vibhorj/python/-tf/g_mnist')
evt_file.add_graph(tf.get_default_graph())
s1 = tf.summary.scalar(name='accuracy', tensor=accuracy)
s2 = tf.summary.scalar(name='loss', tensor=loss)
m1 = tf.summary.merge([s1,s2])
5) Run the session (test data is mnist.test.images & mnist.test.labels):
with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(300):
        """ calc. accuracy on test data - TENSORBOARD before iteration begins """
        summary = sess.run(m1, feed_dict=test_data)
        evt_file.add_summary(summary, sess.run(ctr))
        evt_file.flush()
        """ fetch train data """
        a_train, b_train = mnist.train.next_batch(batch_size=100)
        train_data = {X: a_train , Y: b_train}
        """ train """
        sess.run(z, feed_dict = train_data)
Appreciate your time in providing any insight into it. I'm completely clueless how to proceed further (I even tried initializing w & b with random_normal, and played with learning rates [0.1, 0.01, 0.001]).
Cheers!
Please consider:
Initializing biases to zeros
Using ReLU units instead of sigmoid, to avoid saturation
Using the Adam optimizer, for faster learning
I also feel that your network is quite large; you could do with a smaller network.
K,L,M,N=200,100,60,30

""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value = tf.truncated_normal([28*28,K],mean=0,stddev=0.1), name = 'w1')
    b1 = tf.zeros([K])  # tf.Variable(initial_value = tf.truncated_normal([K],mean=0,stddev=0.01), name = 'b1')

""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value = tf.truncated_normal([K,L],mean=0,stddev=0.1), name = 'w2')
    b2 = tf.zeros([L])  # tf.Variable(initial_value = tf.truncated_normal([L],mean=0,stddev=0.01), name = 'b2')

""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value = tf.truncated_normal([L,M],mean=0,stddev=0.1), name = 'w3')
    b3 = tf.zeros([M])  # tf.Variable(initial_value = tf.truncated_normal([M],mean=0,stddev=0.01), name = 'b3')

""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value = tf.truncated_normal([M,N],mean=0,stddev=0.1), name = 'w4')
    b4 = tf.zeros([N])  # tf.Variable(initial_value = tf.truncated_normal([N],mean=0,stddev=0.1), name = 'b4')

""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value = tf.truncated_normal([N,10],mean=0,stddev=0.1), name = 'w_out')
    b_out = tf.zeros([10])  # tf.Variable(initial_value = tf.truncated_normal([10],mean=0,stddev=0.1), name = 'b_out')
Y1 = tf.nn.relu(tf.add(tf.matmul(X,w1),b1), name='Y1')
Y2 = tf.nn.relu(tf.add(tf.matmul(Y1,w2),b2), name='Y2')
Y3 = tf.nn.relu(tf.add(tf.matmul(Y2,w3),b3), name='Y3')
Y4 = tf.nn.relu(tf.add(tf.matmul(Y3,w4),b4), name='Y4')
Y_pred_logits = tf.add(tf.matmul(Y4, w_out),b_out,name='logits')
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=Y_pred_logits, name='xentropy'))
opt = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr', trainable=False)
train_op = opt.minimize(loss, global_step=ctr)
for v in tf.trainable_variables():
    print(v.op.name)
with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(3000):
        """ calc. accuracy on test data - TENSORBOARD before iteration begins """
        #summary = sess.run(m1, feed_dict=test_data)
        #evt_file.add_summary(summary, sess.run(ctr))
        #evt_file.flush()
        """ fetch train data """
        a_train, b_train = mnist.train.next_batch(batch_size=100)
        train_data = {X: a_train , Y: b_train}
        """ train """
        l = sess.run(loss, feed_dict = train_data)
        print(l)
        sess.run(train_op, feed_dict = train_data)
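The snippet above still builds a GradientDescentOptimizer even though Adam is in the suggestion list; if that suggestion is taken, the swap is a one-line change (a sketch using Adam's default learning rate of 0.001, which is my choice, not part of the original answer):

opt = tf.train.AdamOptimizer(learning_rate=0.001)  # replaces tf.train.GradientDescentOptimizer(0.01)
train_op = opt.minimize(loss, global_step=ctr)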
I tried the following code
batch_size= 128
c1 = tf.zeros([128,32,32,16])
c2 = tf.zeros([128,32,32,16])
c3 = tf.zeros([128,32,32,16])
c = tf.stack([c1, c2, c3], 4)  # shape: [128, 32, 32, 16, 3]
alpha = tf.zeros([128,3,1])
M = tf.matmul(c,alpha)
And it throws an error at tf.matmul.
What I want is just a linear combination alpha[0]*c1 + alpha[1]*c2 + alpha[2]*c3 for each sample. When the batch size is 1, this code is fine, but when it is not, how can I do it?
Should I reshape c1,c2,c3?
I think this code works; I verified it.
import tensorflow as tf
import numpy as np

batch_size = 128
c1 = tf.ones([128,32,32,16])
c2 = tf.ones([128,32,32,16])
c3 = tf.ones([128,32,32,16])
c = tf.stack([c1, c2, c3], 4)

alpha = tf.zeros([1,3])
for j in range(127):
    z = alpha[j] + 1
    z = tf.expand_dims(z, 0)
    alpha = tf.concat([alpha, z], 0)

M = tf.einsum('aijkl,al->aijk', c, alpha)

print('')
with tf.Session() as sess:
    _alpha = sess.run(alpha)
    _M = sess.run(M)
print('')
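Since the question also asks about reshaping: an alternative sketch that keeps alpha with shape [128, 3, 1] as in the original attempt and uses batched tf.matmul by flattening the spatial dimensions first (an assumption on my part, not from the original answer):

import tensorflow as tf

batch_size = 128
c1 = tf.ones([batch_size, 32, 32, 16])
c2 = tf.ones([batch_size, 32, 32, 16])
c3 = tf.ones([batch_size, 32, 32, 16])
alpha = tf.zeros([batch_size, 3, 1])

c = tf.stack([c1, c2, c3], axis=4)                      # [128, 32, 32, 16, 3]
c_flat = tf.reshape(c, [batch_size, 32 * 32 * 16, 3])   # flatten the spatial dims
M_flat = tf.matmul(c_flat, alpha)                       # batched matmul -> [128, 32*32*16, 1]
M = tf.reshape(M_flat, [batch_size, 32, 32, 16])        # back to the original shape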