Calculating gradients of a custom loss function with GradientTape - tensorflow

I am trying to do custom training of the network using the GradientTape method. The training is unsupervised.
The details of the network and the cost function are as follows.
My network is:
def CreateNetwork(inplayer, hidlayer, outlayer, seed):
    model = keras.Sequential()
    model.add(Dense(hidlayer, input_dim=inplayer, kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=1/np.sqrt(inplayer), seed=seed), bias_initializer=initializers.RandomNormal(mean=0.0, stddev=1/np.sqrt(inplayer), seed=seed), activation='tanh'))
    model.add(Dense(outlayer, kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=1/np.sqrt(hidlayer), seed=seed), bias_initializer=initializers.Zeros(), activation='linear'))
    return model
and my custom cost function is defined as,
def H_tilda(J, U, nsamples, nsites, configs, out_matrix):
    EigenValue = 0.0
    for k in range(nsamples):
        config = configs[k, :]
        out_n = out_matrix[k, :]
        exp = 0.0
        for i in range(nsamples):
            n = configs[i, :]
            out_nprime = out_matrix[i, :]
            #------------------------------------------------------------------------------------------------
            # Calculation of Hopping Term
            #------------------------------------------------------------------------------------------------
            hop = 0.0
            for j in range(nsites):
                # indices of the two neighbouring sites (with wrap-around at the edges)
                if j == 0:
                    k = [nsites - 1, j + 1]
                elif j == (nsites - 1):
                    k = [j - 1, 0]
                else:
                    k = [j - 1, j + 1]
                if n[k[0]] != 0:
                    annihiliate1 = np.sqrt(n[k[0]])
                    n1 = np.copy(n)
                    n1[k[0]] = n1[k[0]] - 1
                    n1[j] = n1[j] + 1
                    if (config == n1).all():
                        delta1 = 1
                    else:
                        delta1 = 0
                else:
                    annihiliate1 = 0
                    n1 = np.zeros(nsites)
                    delta1 = 0
                if n[k[1]] != 0:
                    annihiliate2 = np.sqrt(n[k[1]])
                    n2 = np.copy(n)
                    n2[k[1]] = n2[k[1]] - 1
                    n2[j] = n2[j] + 1
                    if (config == n2).all():
                        delta2 = 1
                    else:
                        delta2 = 0
                else:
                    annihiliate2 = 0
                    n2 = np.zeros(nsites)
                    delta2 = 0
                create = np.sqrt(n[j] + 1)
                hop = hop + create * (annihiliate1 * delta1 + annihiliate2 * delta2)
            #------------------------------------------------------------------------------------------------
            #------------------------------------------------------------------------------------------------
            # Calculation of Onsite Term
            #------------------------------------------------------------------------------------------------
            if (config == n).all():
                ons = np.sum(np.dot(np.square(n), n - 1))
            else:
                ons = 0.0
            #------------------------------------------------------------------------------------------------
            phi_value = phi(out_nprime.numpy())
            exp = exp + ((hop + ons) * phi_value)
        Phi_value = phi(out_n.numpy())
        EigenValue = EigenValue + exp / Phi_value
    return np.real(EigenValue / nsamples)
I want to do custom training using the GradientTape method, for which I used the following lines:
optimizer = optimizers.SGD(learning_rate=1e-3)

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(tf.convert_to_tensor(configs))
    out_matrix = model(configs)
    print(out_matrix)
    eival = H_tilda(J, U, nsamples, nsites, configs, out_matrix)
    print(eival)

gradients = tape.gradient(tf.convert_to_tensor(eival), model.trainable_weights)
print(gradients)
But the gradients I am getting are None:
output: [None, None, None, None]
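For what it's worth, two things in the snippet above would each be enough on their own to make tape.gradient return None: with watch_accessed_variables=False the tape only records the explicitly watched tensor configs, so the model's trainable weights are never traced; and H_tilda leaves TensorFlow entirely (it calls .numpy() and does its arithmetic in NumPy), so there is no recorded path from eival back to the weights, and tf.convert_to_tensor(eival) merely creates a fresh constant. A minimal sketch of the pattern that keeps gradients flowing; H_tilda_tf is a hypothetical version of H_tilda rewritten with tf operations only and is not part of the code above:

optimizer = optimizers.SGD(learning_rate=1e-3)

with tf.GradientTape() as tape:                  # trainable variables are watched by default
    out_matrix = model(configs)                  # stay inside TensorFlow from here on
    eival = H_tilda_tf(J, U, nsamples, nsites, configs, out_matrix)  # hypothetical tf-only loss, no .numpy()

gradients = tape.gradient(eival, model.trainable_weights)   # no tf.convert_to_tensor needed
optimizer.apply_gradients(zip(gradients, model.trainable_weights))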

Related

Used MLESAC but got a poor result

MLESAC improves on RANSAC by scoring each hypothesis with a likelihood rather than by counting inliers (Torr and Zisserman 2000), so in principle there is no reason to use RANSAC if MLESAC is available. But when I applied it to a plane-fitting problem, I got a worse result than with RANSAC. When I substituted the distance error of each datum into equation 19, the p_i values came out nearly identical, which leads to a wrong negative log likelihood.
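For reference, the quantity from equation 19 is the negative log likelihood of a Gaussian-inlier / uniform-outlier mixture, which the code below accumulates as the penalty after a few EM updates of gamma. A minimal NumPy sketch of that score (in Python rather than MATLAB, purely to make the structure explicit; e stands for the vector of point-to-plane distance errors):

import numpy as np

def mlesac_penalty(e, sigma, v, gamma=0.5, em_iters=3):
    # e: distance error of every point to the hypothesised plane
    # sigma: inlier noise standard deviation, v: extent of the uniform outlier model
    for _ in range(em_iters):
        # likelihood of each datum under the inlier (Gaussian) and outlier (uniform) models
        p_i = gamma * np.exp(-e**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
        p_o = (1 - gamma) / v
        z = p_i / (p_i + p_o)     # posterior probability that a datum is an inlier
        gamma = z.mean()          # EM update of the mixing coefficient
    return -np.sum(np.log(p_i + p_o))   # negative log likelihood, -L

If all p_i come out nearly equal, one common cause is that sigma is large relative to the residuals, so the Gaussian factor is almost flat over the observed errors.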
%% MLESAC (REF.PCL)
% data
clc;clear; close all;
f = @(a_hat,b_hat,c_hat,x,y)a_hat.*x+b_hat.*y+c_hat; % z
a = 1;
b = 1;
c = 20;
width = 10;
range = (-width:0.01:width)'; % different from linspace
x = -width+(width-(-width))*rand(length(range),1); % r = a + (b-a).*rand(N,1)
y = -width+(width-(-width))*rand(length(range),1);
X = (-width:0.5:width)';
Y = (-width:0.5:width)';
[X,Y] = meshgrid(X,Y); % for drawing surf
Z = f(a/c,b/c,c/c,X,Y);
z_n = f(a/c,b/c,c/c,x,y); % z/c
% add noise
r = 0.3;
noise = r*randn(size(x));
z_n = z_n + noise;
% add outliers
out_rng = find(y>=8,200);
out_udel = 5;
z_n(out_rng) = z_n(out_rng) + out_udel;
plot3(x,y,z_n,'b.');hold on;
surf(X,Y,Z);hold on;grid on ;axis equal;
p_n = [x y z_n];
num_pt = size(p_n,1);
% compute sigma = median(dist (x - median (x)))
threshold = 0.3; %%%%%%%%% user-defined
medianx = median(p_n(:,1));
mediany = median(p_n(:,2));
medianz = median(p_n(:,3));
medianp = [medianx mediany medianz];
mediadist = median(sqrt(sum((p_n - medianp).*(p_n - medianp),2)));
sigma = mediadist * threshold;
% compute the bounding box diagonal
maxx = max(p_n(:,1));
maxy = max(p_n(:,2));
maxz = max(p_n(:,3));
minx = min(p_n(:,1));
miny = min(p_n(:,2));
minz = min(p_n(:,3));
bound = [maxx maxy maxz]-[minx miny minz];
v = sqrt(sum(bound.*bound,2));
%% iteration
iteration = 0;
num_inlier = 0;
max_iteration = 10000;
max_num_inlier = 0;
k = 1;
s = 5; % number of sample point
probability = 0.99;
d_best_penalty = 100000;
dist_scaling_factor = -1 / (2.0*sigma*sigma);
normalization_factor = 1 / (sqrt(2*pi)*sigma);
Gaussian = @(gamma,disterr,sig)gamma * normalization_factor * exp(disterr.^2*dist_scaling_factor);
Uniform = @(gamma,v)(1-gamma)/v;
while(iteration < k)
    % get sample
    rand_var = randi([1 length(x)],s,1);
    % find coeff. & inlier
    A_rand = [p_n(rand_var,1:2) ones(size(rand_var,1),1)];
    y_est = p_n(rand_var,3);
    Xopt = pinv(A_rand)*y_est;
    disterr = abs(sum([p_n(:,1:2) ones(size(p_n,1),1)].*Xopt',2) - p_n(:,3))./sqrt(dot(Xopt',Xopt'));
    inlier = find(disterr <= threshold);
    outlier = find(disterr >= threshold);
    num_inlier = size(inlier,1);
    outlier_num = size(outlier,1);
    % EM
    gamma = 0.5;
    iterations_EM = 3;
    for i = 1:iterations_EM
        % Likelihood of a datum given that it is an inlier
        p_i = Gaussian(gamma,disterr,sigma);
        % Likelihood of a datum given that it is an outlier
        p_o = Uniform(gamma,v);
        zi = p_i./(p_i + p_o);
        gamma = sum(zi)/num_pt;
    end
    % Find the negative log likelihood of the model, -L
    d_cur_pentnalty = -sum(log(p_i+p_o));
    if(d_cur_pentnalty < d_best_penalty)
        d_best_penalty = d_cur_pentnalty;
        % record inlier
        best_inlier = p_n(inlier,:);
        max_num_inlier = num_inlier;
        best_model = Xopt;
        % Adapt k
        w = max_num_inlier / num_pt;
        p_no_outliers = 1 - w^s;
        k = log(1-probability)/log(p_no_outliers);
    end
    % RANSAC
    % if (num_inlier > max_num_inlier)
    %     max_num_inlier = num_inlier;
    %     best_model = Xopt;
    %
    %     % Adapt k
    %     w = max_num_inlier / num_pt;
    %     p_no_outliers = 1 - w^s;
    %     k = log(1-probability)/log(p_no_outliers);
    % end
    iteration = iteration + 1;
    if iteration > max_iteration
        break;
    end
end
a_est = best_model(1,:);
b_est = best_model(2,:);
c_est = best_model(3,:);
Z_opt = f(a_est,b_est,c_est,X,Y);
new_sur = mesh(X,Y,Z_opt,'edgecolor', 'r','FaceAlpha',0.5); % estimate
title('MLESAC',sprintf('original: a/c = %.2f, b/c = %.2f, c/c = %.2f\n new: a/c = %.2f, b/c = %.2f, c/c = %.2f',a/c,b/c,c/c,a_est,b_est,c_est));
My source code is based on the PCL MLESAC implementation, which I ported to MATLAB.

TensorFlow multi-slice, not reshape

I have a 3D (64,64,64) volume (a chair). When I reshape it to (8,32,32,32) with a tf operation, run my deep-learning model on it, and then reshape it back to (64,64,64) with tf.reshape, the result looks very bad; in fact there is no recognisable shape at all, just a strange unknown blob that certainly does not look like a chair.
But if I instead use a function I wrote myself to slice the volume into 32x32x32 blocks and stack them as (8,32,32,32), feed that to my DL model, and then recombine the (8,32,32,32) output with a combine function that reverses the slicing, I get a good-looking shape.
The issue is that both my slice and combine functions are NumPy, not tf. I have to train the model end to end, so I need equivalent TensorFlow functions for the slicing and the combining.
def slice(self, size, obj):
    oldi = 0
    newi = 0
    oldj = 0
    newj = 0
    oldk = 0
    newk = 0
    lst = []
    s = obj.shape[0]
    s += 1
    for i in range(size, s, size):
        if (newi == s - 1):
            oldi = 0
        else:
            oldi = newi
        for j in range(size, s, size):
            if (newj == s - 1):
                oldj = 0
            else:
                oldj = newj
            for k in range(size, s, size):
                newi = i
                newj = j
                newk = k
                slc = obj[oldi:newi, oldj:newj, oldk:newk]
                lst.append(slc)
                if (newk == s - 1):
                    oldk = 0
                else:
                    oldk = newk
    return lst
def combine(self, lst, shape, size):
    oldi = 0
    newi = 0
    oldj = 0
    newj = 0
    oldk = 0
    newk = 0
    obj = np.zeros((shape, shape, shape))
    s = shape
    s += 1
    counter = 0
    for i in range(size, s, size):
        if (newi == s - 1):
            oldi = 0
        else:
            oldi = newi
        for j in range(size, s, size):
            if (newj == s - 1):
                oldj = 0
            else:
                oldj = newj
            for k in range(size, s, size):
                newi = i
                newj = j
                newk = k
                obj[oldi:newi, oldj:newj, oldk:newk] = lst[counter]
                counter += 1
                if (newk == s - 1):
                    oldk = 0
                else:
                    oldk = newk
    return obj
In other words, I want TensorFlow operations that mimic the slice and combine functions above; a sketch of one possible TF equivalent follows.
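A minimal sketch of differentiable TensorFlow equivalents, assuming the volume has a known static shape and splits into contiguous 32x32x32 blocks ordered the same way as the loops above (i outermost, k innermost); the names tf_slice and tf_combine are just illustrative:

import tensorflow as tf

def tf_slice(obj, size):
    # obj: (D, D, D) tensor, size: block edge length, e.g. D = 64 and size = 32
    d = obj.shape[0] // size
    x = tf.reshape(obj, (d, size, d, size, d, size))
    x = tf.transpose(x, (0, 2, 4, 1, 3, 5))              # (d, d, d, size, size, size)
    return tf.reshape(x, (d * d * d, size, size, size))  # e.g. (8, 32, 32, 32)

def tf_combine(blocks, shape, size):
    # blocks: (d*d*d, size, size, size) tensor, shape: original edge length, e.g. 64
    d = shape // size
    x = tf.reshape(blocks, (d, d, d, size, size, size))
    x = tf.transpose(x, (0, 3, 1, 4, 2, 5))              # interleave block and within-block axes
    return tf.reshape(x, (shape, shape, shape))

Because only reshape and transpose are involved, gradients flow through both functions, so the model can be trained end to end.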

TensorFlow: How to update only a single variable at a time, out of many variables, based on conditions

k1 = tf.Variable(10.0)
k2 = tf.Variable(10.0)
pred = tf.pow(B, ?) / C
cost = tf.pow(pred_s1 - Y, 2)
optimizer = tf.train.AdamOptimizer(0.01).minimize(cost)
sess.run(optimizer, feed_dict={A: a, B: b, C: c})
Update: what I want is
pred = tf.pow(B, k1) / C    if A == 0
pred = tf.pow(B, k2) / C    if A == 1
i.e. a single prediction function that updates only one of the variables, depending on the value fed into placeholder 'A'.
s1 = tf.Variable(tf.random_normal([1]))
s2 = tf.Variable(tf.random_normal([1]))
s3 = tf.Variable(tf.random_normal([1]))
s4 = tf.Variable(tf.random_normal([1]))
s5 = tf.Variable(tf.random_normal([1]))
D = tf.placeholder("float")
s2_s = tf.where(tf.logical_and(1.9<D,D<2.1),x=s2,y=s1)
s3_s = tf.where(tf.logical_and(2.9<D,D<3.1),x=s3,y=s2_s)
s4_s = tf.where(tf.logical_and(3.9<D,D<4.1),x=s4,y=s3_s)
s5_s = tf.where(tf.logical_and(4.9<D,D<5.1),x=s5,y=s4_s)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([s1])[0], sess.run([s2])[0], sess.run([s3])[0], sess.run([s4])[0], sess.run([s5])[0])
print(sess.run(s5_s, feed_dict={D:5}))
sess.close()
Just use
pred = tf.pow(B, A*k2 + (1-A)*k1) / C
which implements the switch. An alternative would be tf.where.
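A small sketch of both variants, assuming the TF 1.x graph-mode API used in the question; the placeholders mirror the ones above:

import tensorflow as tf

A = tf.placeholder("float")   # selector fed as 0.0 or 1.0
B = tf.placeholder("float")
C = tf.placeholder("float")
Y = tf.placeholder("float")

k1 = tf.Variable(10.0)
k2 = tf.Variable(10.0)

# arithmetic switch: the exponent is k1 when A == 0 and k2 when A == 1,
# so only the selected variable receives a nonzero gradient from this loss
exponent = A * k2 + (1.0 - A) * k1
# alternative selection with tf.where:
# exponent = tf.where(tf.equal(A, 1.0), k2, k1)

pred = tf.pow(B, exponent) / C
cost = tf.pow(pred - Y, 2)
train_op = tf.train.AdamOptimizer(0.01).minimize(cost)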

How to avoid dying weights/gradients in a custom LSTM cell in TensorFlow? What would be an ideal loss function?

I am trying to train a name-generation LSTM network. I am not using the pre-defined TensorFlow cells (tf.contrib.rnn.BasicLSTMCell, etc.); I have written the LSTM cell myself. But the error does not decrease beyond a limit: it drops only about 30% from its initial value (with randomly initialised weights in the forward pass) and then starts increasing. Also, the gradients and weights become very small after a few thousand training steps.
I think the reason for the non-convergence is one of two things:
1. the design of the TensorFlow graph I have created, or
2. the loss function I used.
I feed one-hot vectors of each character of the word into the network, one per time step. The code I used for graph construction and the loss function is below. Tx is the number of time steps in the RNN; n_x, n_a and n_y are the lengths of the input vector, the LSTM cell vector and the output vector respectively.
It would be great if someone could help me identify what I am doing wrong here.
n_x = vocab_size
n_y = vocab_size
n_a = 100
Tx = 50
Ty = Tx
with open("trainingnames_file.txt") as f:
examples = f.readlines()
examples = [x.lower().strip() for x in examples]
X0 = [[char_to_ix[x1] for x1 in list(x)] for x in examples]
X1 = np.array([np.concatenate([np.array(x), np.zeros([Tx-len(x)])]) for x in X0], dtype=np.int32).T
Y0 = [(x[1:] + [char_to_ix["\n"]]) for x in X0]
Y1 = np.array([np.concatenate([np.array(y), np.zeros([Ty-len(y)])]) for y in Y0], dtype=np.int32).T
m = len(X0)
Wf = tf.get_variable(name="Wf", shape = [n_a,(n_a+n_x)])
Wu = tf.get_variable(name="Wu", shape = [n_a,(n_a+n_x)])
Wc = tf.get_variable(name="Wc", shape = [n_a,(n_a+n_x)])
Wo = tf.get_variable(name="Wo", shape = [n_a,(n_a+n_x)])
Wy = tf.get_variable(name="Wy", shape = [n_y,n_a])
bf = tf.get_variable(name="bf", shape = [n_a,1])
bu = tf.get_variable(name="bu", shape = [n_a,1])
bc = tf.get_variable(name="bc", shape = [n_a,1])
bo = tf.get_variable(name="bo", shape = [n_a,1])
by = tf.get_variable(name="by", shape = [n_y,1])
X_input = tf.placeholder(dtype = tf.int32, shape = [Tx,None])
Y_input = tf.placeholder(dtype = tf.int32, shape = [Ty,None])
X = tf.one_hot(X_input, axis = 0, depth = n_x)
Y = tf.one_hot(Y_input, axis = 0, depth = n_y)
X.shape
a_prev = tf.zeros(shape = [n_a,m])
c_prev = tf.zeros(shape = [n_a,m])
a_all = []
c_all = []
for i in range(Tx):
    ac = tf.concat([a_prev, tf.squeeze(tf.slice(input_=X, begin=[0,i,0], size=[n_x,1,m]))], axis=0)
    ct = tf.tanh(tf.matmul(Wc, ac) + bc)
    tug = tf.sigmoid(tf.matmul(Wu, ac) + bu)
    tfg = tf.sigmoid(tf.matmul(Wf, ac) + bf)
    tog = tf.sigmoid(tf.matmul(Wo, ac) + bo)
    c = tf.multiply(tug, ct) + tf.multiply(tfg, c_prev)
    a = tf.multiply(tog, tf.tanh(c))
    y = tf.nn.softmax(tf.matmul(Wy, a) + by, axis=0)
    a_all.append(a)
    c_all.append(c)
    a_prev = a
    c_prev = c
    y_ex = tf.expand_dims(y, axis=1)
    if i == 0:
        y_all = y_ex
    else:
        y_all = tf.concat([y_all, y_ex], axis=1)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y,logits=y_all,dim=0))
opt = tf.train.AdamOptimizer()
train = opt.minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    o = sess.run(loss, feed_dict={X_input: X1, Y_input: Y1})
    print(o.shape)
    print(o)
    sess.run(train, feed_dict={X_input: X1, Y_input: Y1})
    o = sess.run(loss, feed_dict={X_input: X1, Y_input: Y1})
    print(o)
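One detail worth checking, since it matches the symptoms described (the loss plateaus and the gradients shrink): tf.nn.softmax_cross_entropy_with_logits_v2 expects unnormalised logits and applies the softmax internally, but the graph above already applies tf.nn.softmax before collecting y_all, so the loss effectively contains a double softmax. A minimal sketch of the adjustment, keeping the rest of the graph above unchanged:

# inside the time-step loop: keep the raw scores, do not apply softmax here
logits = tf.matmul(Wy, a) + by
logits_ex = tf.expand_dims(logits, axis=1)
if i == 0:
    y_all = logits_ex
else:
    y_all = tf.concat([y_all, logits_ex], axis=1)

# the loss applies the softmax itself over dim=0, matching the one-hot layout
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=y_all, dim=0))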

NameError when running GMRES following FEniCS discretisation

I've discretised a diffusion equation with FEniCS as follows:
def DiscretiseEquation(h):
    mesh = UnitSquareMesh(h, h)
    V = FunctionSpace(mesh, 'Lagrange', 1)

    def on_boundary(x, on_boundary):
        return on_boundary

    bc_value = Constant(0.0)
    boundary_condition = DirichletBC(V, bc_value, on_boundary)

    class RandomDiffusionField(Expression):
        def __init__(self, m, n, element):
            self._rand_field = np.exp(-np.random.randn(m, n))
            self._m = m
            self._n = n
            self._ufl_element = element

        def eval(self, value, x):
            x_index = np.int(np.floor(self._m * x[0]))
            y_index = np.int(np.floor(self._n * x[1]))
            i = min(x_index, self._m - 1)
            j = min(y_index, self._n - 1)
            value[0] = self._rand_field[i, j]

        def value_shape(self):
            return (1, )

    class RandomRhs(Expression):
        def __init__(self, m, n, element):
            self._rand_field = np.random.randn(m, n)
            self._m = m
            self._n = n
            self._ufl_element = element

        def eval(self, value, x):
            x_index = np.int(np.floor(self._m * x[0]))
            y_index = np.int(np.floor(self._n * x[1]))
            i = min(x_index, self._m - 1)
            j = min(y_index, self._n - 1)
            value[0] = self._rand_field[i, j]

        def value_shape(self):
            return (1, )

    u = TrialFunction(V)
    v = TestFunction(V)
    random_field = RandomDiffusionField(100, 100, element=V.ufl_element())
    zero = Expression("0", element=V.ufl_element())
    one = Expression("1", element=V.ufl_element())
    diffusion = as_matrix(((random_field, zero), (zero, one)))
    a = inner(diffusion * grad(u), grad(v)) * dx
    L = RandomRhs(h, h, element=V.ufl_element()) * v * dx
    A = assemble(a)
    b = assemble(L)
    boundary_condition.apply(A, b)

    A = as_backend_type(A).mat()
    (indptr, indices, data) = A.getValuesCSR()
    mat = csr_matrix((data, indices, indptr), shape=A.size)
    rhs = b.array()

    # Solving
    x = spsolve(mat, rhs)

    # Conversion to a FEniCS function
    u = Function(V)
    u.vector()[:] = x
I am running the GMRES solver as normal. The callback argument is a separate iteration counter I've defined.
DiscretiseEquation(100)
A = mat
b = rhs
x, info = gmres(A, b, callback = IterCount())
The routine returns a NameError, stating that 'mat' is not defined:
NameError Traceback (most recent call last)
<ipython-input-18-e096b2eea097> in <module>()
1 DiscretiseEquation(200)
----> 2 A = mat
3 b = rhs
4 x_200, info_200 = gmres(A, b, callback = IterCount())
5 gmres_res = closure_variables["residuals"]
NameError: name 'mat' is not defined
As far as I'm aware, it should be defined when I call the DiscretiseEquation function?
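It is only defined inside the function: mat and rhs are local names in DiscretiseEquation, so they no longer exist once the call returns, hence the NameError at the top level. A minimal sketch of one way to expose them, assuming gmres here is scipy.sparse.linalg.gmres and IterCount() is the iteration-counting callback mentioned above:

from scipy.sparse.linalg import gmres

def DiscretiseEquation(h):
    # ... same assembly, boundary conditions and CSR conversion as above ...
    return mat, rhs, V          # hand the assembled system back to the caller

mat, rhs, V = DiscretiseEquation(100)
x, info = gmres(mat, rhs, callback=IterCount())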