Gradient error when calculating - pytorch - numpy

I am learning to use PyTorch (0.4.0) to automate gradient calculation, but I don't quite understand how to use backward() and grad. For an exercise I need to compute df/dW both with PyTorch and analytically, returning the results as auto_grad and user_grad respectively. In my code I call f.backward() and then read W.grad to get df/dW, but the two results do not match, so either I am misusing automatic differentiation or I got the derivative wrong. The graph I am using is f = sum(sigmoid(W x + b)), and this is the code I am trying to write:
import numpy as np
import torch
import torch.nn.functional as F
def graph2(W_np, x_np, b_np):
    W = torch.Tensor(W_np)
    W.requires_grad = True
    x = torch.Tensor(x_np)
    b = torch.Tensor(b_np)
    u = torch.matmul(W, x) + b
    g = F.sigmoid(u)
    f = torch.sum(g)
    user_grad = (sigmoid(W_np*x_np + b_np)*(1 - sigmoid(W_np*x_np + b_np))).T*x_np
    f.backward(retain_graph=True)
    auto_grad = W.grad
    print(auto_grad)
    print(user_grad)
    # raise NotImplementedError("graph2 still needs to be completed")
    # END YOUR CODE
    return f, auto_grad, user_grad
Test:
iterations = 1000
sizes = np.random.randint(2, 10, size=(iterations))
for i in range(iterations):
    size = sizes[i]
    W_np = np.random.rand(size, size)
    x_np = np.random.rand(size, 1)
    b_np = np.random.rand(size, 1)
    f, auto_grad, user_grad = graph2(W_np, x_np, b_np)
    manual_f = np.sum(sigmoid(np.matmul(W_np, x_np) + b_np))
    assert np.isclose(f.data.numpy(), manual_f, atol=1e-4), "f not correct"
    assert np.allclose(auto_grad.numpy(), user_grad), "Gradient not correct"

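I think you computed the gradients the wrong way. The analytic gradient has to use the matrix product np.matmul(W_np, x_np), not the elementwise W_np*x_np. And since f = sum(sigmoid(W x + b)), df/dW is the outer product of sigmoid'(u) (a column vector) with x, so the column vector should be multiplied by x_np.T, rather than transposing the result and multiplying by x_np. Try this: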
import numpy as np
import torch
from torch.autograd import Variable
import torch.nn.functional as F
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
def graph2(W_np, x_np, b_np):
    W = Variable(torch.Tensor(W_np), requires_grad=True)
    x = torch.tensor(x_np, requires_grad=True).type(torch.FloatTensor)
    b = torch.tensor(b_np, requires_grad=True).type(torch.FloatTensor)
    u = torch.matmul(W, x) + b
    g = F.sigmoid(u)
    f = torch.sum(g)
    user_grad = (sigmoid(np.matmul(W_np, x_np) + b_np)*(1 - sigmoid(np.matmul(W_np, x_np) + b_np)))*x_np.T
    f.backward(retain_graph=True)
    auto_grad = W.grad
    print("auto_grad", auto_grad)
    print("user_grad", user_grad)
    # END YOUR CODE
    return f, auto_grad, user_grad
iterations = 1000
sizes = np.random.randint(2, 10, size=(iterations))
for i in range(iterations):
    size = sizes[i]
    print("i, size", i, size)
    W_np = np.random.rand(size, size)
    x_np = np.random.rand(size, 1)
    b_np = np.random.rand(size, 1)
    f, auto_grad, user_grad = graph2(W_np, x_np, b_np)
    manual_f = np.sum(sigmoid(np.matmul(W_np, x_np) + b_np))
    assert np.isclose(f.data.numpy(), manual_f, atol=1e-4), "f not correct"
    assert np.allclose(auto_grad.numpy(), user_grad), "Gradient not correct"
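To double-check the analytic formula without PyTorch at all, here is a small NumPy sketch (the helper names f_value, analytic_grad and finite_diff_grad are illustrative, not from the original post). With u = W x + b and f = sum_i sigmoid(u_i), the derivative is df/dW_ij = sigmoid(u_i) * (1 - sigmoid(u_i)) * x_j, i.e. the outer product of sigmoid'(u) and x. The sketch compares that formula against central finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_value(W, x, b):
    # f = sum(sigmoid(W @ x + b)), the same graph as graph2
    return np.sum(sigmoid(W @ x + b))

def analytic_grad(W, x, b):
    # df/dW_ij = s_i * (1 - s_i) * x_j: column vector (n, 1) times row vector (1, n)
    s = sigmoid(W @ x + b)
    return (s * (1 - s)) * x.T

def finite_diff_grad(W, x, b, eps=1e-6):
    # Central differences, perturbing one entry of W at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f_value(W_plus, x, b) - f_value(W_minus, x, b)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.random((4, 4))
x = rng.random((4, 1))
b = rng.random((4, 1))
print(np.allclose(analytic_grad(W, x, b), finite_diff_grad(W, x, b), atol=1e-6))
```

The same comparison works for auto_grad: converting W.grad to NumPy and checking it against finite_diff_grad separates "autograd misuse" from "wrong derivative".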

Related

custom Keras Layer

I want to build this deep learning network with Keras. The network was recently proposed for video compression.
One layer of this model is a ConvLSTM.
ConvLSTM works well for compressing sequences of images.
I know Keras has the ConvLSTM2D layer, but I want to use this class:
import tensorflow as tf
class ConvLSTMCell(tf.nn.rnn_cell.RNNCell):
    """A LSTM cell with convolutions instead of multiplications.
    Reference:
    Xingjian, S. H. I., et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems. 2015.
    """
    def __init__(self, shape, filters, kernel, forget_bias=1.0, activation=tf.tanh, normalize=True, peephole=True, data_format='channels_last', reuse=None):
        super(ConvLSTMCell, self).__init__(_reuse=reuse)
        self._kernel = kernel
        self._filters = filters
        self._forget_bias = forget_bias
        self._activation = activation
        self._normalize = normalize
        self._peephole = peephole
        if data_format == 'channels_last':
            self._size = tf.TensorShape(shape + [self._filters])
            self._feature_axis = self._size.ndims
            self._data_format = None
        elif data_format == 'channels_first':
            self._size = tf.TensorShape([self._filters] + shape)
            self._feature_axis = 0
            self._data_format = 'NC'
        else:
            raise ValueError('Unknown data_format')
    @property
    def state_size(self):
        return tf.nn.rnn_cell.LSTMStateTuple(self._size, self._size)
    @property
    def output_size(self):
        return self._size
    def call(self, x, state):
        c, h = state
        x = tf.concat([x, h], axis=self._feature_axis)
        n = x.shape[-1].value
        m = 4 * self._filters if self._filters > 1 else 4
        W = tf.get_variable('kernel', self._kernel + [n, m])
        y = tf.nn.convolution(x, W, 'SAME', data_format=self._data_format)
        if not self._normalize:
            y += tf.get_variable('bias', [m], initializer=tf.zeros_initializer())
        j, i, f, o = tf.split(y, 4, axis=self._feature_axis)
        if self._peephole:
            i += tf.get_variable('W_ci', c.shape[1:]) * c
            f += tf.get_variable('W_cf', c.shape[1:]) * c
        if self._normalize:
            j = tf.contrib.layers.layer_norm(j)
            i = tf.contrib.layers.layer_norm(i)
            f = tf.contrib.layers.layer_norm(f)
        f = tf.sigmoid(f + self._forget_bias)
        i = tf.sigmoid(i)
        c = c * f + i * self._activation(j)
        if self._peephole:
            o += tf.get_variable('W_co', c.shape[1:]) * c
        if self._normalize:
            o = tf.contrib.layers.layer_norm(o)
            c = tf.contrib.layers.layer_norm(c)
        o = tf.sigmoid(o)
        h = o * self._activation(c)
        state = tf.nn.rnn_cell.LSTMStateTuple(c, h)
        return h, state
Now I don't know how to turn this class into a custom Keras layer. Can anyone help me?
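Some context that may help with the port: tf.keras.layers.RNN wraps a cell object that exposes a state_size attribute and a call(inputs, states) method returning (output, new_states), which is close to the contract this cell already follows. Whatever form the Keras version takes, the per-step state update in call() (with normalize=False, and after the convolution has produced the four gate pre-activations) reduces to the peephole LSTM equations. The NumPy sketch below reproduces only that arithmetic so a Keras port can be checked against it; the function name peephole_lstm_step and the assumption that j, i, f, o arrive precomputed are mine, not part of the original class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(j, i, f, o, c_prev, w_ci, w_cf, w_co,
                       forget_bias=1.0, activation=np.tanh):
    """One ConvLSTMCell state update with normalize=False (hypothetical helper).

    j, i, f, o: gate pre-activations obtained by splitting the convolution output.
    w_ci, w_cf, w_co: peephole weights; scalars or arrays that broadcast over the batch,
    like the c.shape[1:] variables in the original cell.
    """
    i = sigmoid(i + w_ci * c_prev)                # input gate, peephole on the old c
    f = sigmoid(f + w_cf * c_prev + forget_bias)  # forget gate, peephole on the old c
    c = c_prev * f + i * activation(j)            # new cell state
    o = sigmoid(o + w_co * c)                     # output gate, peephole on the new c
    h = o * activation(c)                         # new hidden state, also the output
    return h, c

rng = np.random.default_rng(1)
shape = (2, 8, 8, 3)                              # batch, height, width, filters (channels_last)
c_prev = rng.standard_normal(shape)
j = rng.standard_normal(shape)
# Input gate driven shut and forget gate driven open: the cell should keep c_prev
h, c = peephole_lstm_step(j, np.full(shape, -50.0), np.full(shape, 50.0),
                          np.zeros(shape), c_prev, w_ci=0.0, w_cf=0.0, w_co=0.0)
print(np.allclose(c, c_prev))
```

Note the ordering detail the sketch preserves: the input and forget peepholes see the previous cell state, while the output peephole sees the updated one.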

Deep neural-network with backpropagation implementation does not work - python

I want to implement a multilayer NN with backpropagation. I have been trying for days, but it simply does not work. It is extremely clear in my head how it is supposed to work, and I have streamlined my code to be as simple as possible, but I still can't get it to work. It's probably something stupid, but I cannot see it.
My implementation has an input layer of 784 (28x28), two (L) hidden layers of 300 units each, and an output layer of 10 classes. I have a bias in every layer (except the last...).
The output activation is softmax and the hidden activation is ReLU.
I use mini-batches of 600 examples over a dataset of 60k examples, with 50 to 500 epochs.
Here is the core of my code:
Preparation:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
L = 2
K = len(np.unique(train_labels))
lr = 0.001
nb_epochs = 50
node_per_hidden_layer = 300
nb_batches = 100
W = []
losses_test = []
X_train = np.reshape(train_images, (train_images.shape[0], train_images.shape[1]*train_images.shape[2]))
X_test = np.reshape(test_images, (test_images.shape[0], train_images.shape[1]*train_images.shape[2]))
Y_train = np.zeros((train_labels.shape[0], K))
Y_train[np.arange(Y_train.shape[0]), train_labels] = 1
Y_test = np.zeros((test_labels.shape[0], K))
Y_test[np.arange(Y_test.shape[0]), test_labels] = 1
W.append(np.random.normal(0, 0.01, (X_train.shape[1]+1, node_per_hidden_layer)))
for i in range(L-1):
    W.append(np.random.normal(0, 0.01, (node_per_hidden_layer+1, node_per_hidden_layer)))
W.append(np.random.normal(0, 0.01, (node_per_hidden_layer+1, K)))
Helper functions:
def softmax(z):
    exp = np.exp(z - z.max(1)[:,np.newaxis])
    return np.array(exp / exp.sum(1)[:,np.newaxis])
def softmax_derivative(z):
    sm = softmax(z)
    return sm * (1-sm)
def ReLU(z):
    return np.maximum(z, 0)
def ReLU_derivative(z):
    return (z >= 0).astype(int)
def get_loss(y, y_pred):
    return -np.sum(y * np.log(y_pred))
Fitting:
def fit():
    minibatch_size = len(X_train) // nb_batches
    for epoch in range(nb_epochs):
        permutaion = list(np.random.permutation(X_train.shape[0]))
        X_shuffle = X_train[permutaion]
        Y_shuffle = Y_train[permutaion]
        print("Epoch----------------", epoch)
        for batche in range(0, X_shuffle.shape[0], minibatch_size):
            Z = [None] * (L + 2)
            a = [None] * (L + 2)
            delta = [None] * (L + 2)
            X = X_train[batche:batche+minibatch_size]
            Y = Y_shuffle[batche:batche+minibatch_size]
            ### forward propagation
            a[0] = np.append(X, np.ones((minibatch_size, 1)), axis=1)
            for i in range(L):
                Z[i + 1] = a[i] @ W[i]
                a[i + 1] = np.append(ReLU(Z[i+1]), np.ones((minibatch_size, 1), dtype=int), axis=1)
            Z[-1] = a[L] @ W[L]
            a[-1] = softmax(Z[-1])
            ### back propagation
            delta[-1] = (Y - a[-1]) * softmax_derivative(Z[-1])
            for i in range(L, 0, -1):
                delta[i] = (delta[i+1] @ W[i].T)[:,:-1] * ReLU_derivative(Z[i])
            for i in range(len(W)):
                g = a[i].T @ delta[i+1] / minibatch_size
                W[i] = W[i] + lr * g
        get_loss_on_test()
Loss:
def get_loss_on_test():
    Z_test = [None] * (L + 2)
    a_test = [None] * (L + 2)
    a_test[0] = np.append(X_test, np.ones((len(X_test), 1)), axis=1)
    for i in range(L):
        Z_test[i + 1] = a_test[i] @ W[i]
        a_test[i + 1] = np.append(ReLU(Z_test[i+1]), np.ones((len(X_test), 1)), axis=1)
    Z_test[-1] = a_test[L] @ W[L]
    a_test[-1] = softmax(Z_test[-1])
    losses_test.append(get_loss(Y_test, a_test[-1]))
Main:
losses_test.clear()
fit()
plt.plot(losses_test)
plt.show()
If you want to see it in my notebook, together with an example loss graph, here is the link: https://github.com/beurnii/INF8225/blob/master/tp2/jpt.ipynb
If you want more details on my assignment, this is part 1b (page 2 for English):
https://github.com/beurnii/INF8225/blob/master/tp2/INF8225_TP2_2020.pdf
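One detail that may be worth checking in the code above: with a softmax output and the cross-entropy loss in get_loss, the gradient of the loss with respect to the output pre-activations Z[-1] is simply a[-1] - Y, because the softmax Jacobian cancels against the log. The code's sign convention (delta = Y - a, then W + lr * g) is consistent with that, but the extra softmax_derivative factor in delta[-1] would not be expected. Separately, fit() slices X from X_train but Y from Y_shuffle, so inputs and labels are no longer aligned after shuffling. The NumPy sketch below checks the a - Y formula against finite differences on made-up data; it is an illustration of that one gradient, not a fix for the whole network.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y, z):
    # Same loss as get_loss, written in terms of the pre-activations z
    return -np.sum(y * np.log(softmax(z)))

def numerical_grad(y, z, eps=1e-6):
    # Central differences, one entry of z at a time
    grad = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[idx] += eps
        z_minus[idx] -= eps
        grad[idx] = (cross_entropy(y, z_plus) - cross_entropy(y, z_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.standard_normal((5, 10))             # 5 examples, K = 10 classes
y = np.eye(10)[rng.integers(0, 10, size=5)]  # one-hot labels
analytic = softmax(z) - y                    # dLoss/dZ for softmax + cross-entropy
print(np.allclose(analytic, numerical_grad(y, z), atol=1e-5))
```

Multiplying analytic by softmax(z) * (1 - softmax(z)), as softmax_derivative does, no longer matches the numerical gradient.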

ValueError: No gradients provided for any variable tensorflow 2.0

I am using TensorFlow 2.0 and trying to write an actor-critic algorithm to play CartPole. I believe I have done everything correctly, but I get the following error: ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Please help me out.
Here is my code:
import gym
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
MAX_EPISODES = 2000
GAMMA = 0.9
LR_A = 0.001
LR_C = 0.01
env = gym.make("CartPole-v0")
N_ACTIONS = env.action_space.n
N_FEATURES = 4
def make_actor(n_features, n_actions):
    inputs = tf.keras.Input(shape=[n_features])
    hidden = tf.keras.layers.Dense(20, activation=tf.nn.relu)(inputs)
    dist = tf.keras.layers.Dense(n_actions, activation=tf.nn.softmax)(hidden)
    model = tf.keras.Model(inputs=inputs, outputs=dist)
    return model
def make_critic(n_features):
    inputs = tf.keras.Input(shape=[n_features])
    hidden = tf.keras.layers.Dense(20, activation=tf.nn.relu)(inputs)
    value = tf.keras.layers.Dense(1)(hidden)
    model = tf.keras.Model(inputs=inputs, outputs=value)
    return model
actor = make_actor(N_FEATURES, N_ACTIONS)
critic = make_critic(N_FEATURES)
actor.summary()
critic.summary()
actor_optimizer = tf.keras.optimizers.Adam(LR_A)
critic_optimizer = tf.keras.optimizers.Adam(LR_C)
def loss_actor(s, a, td_error):
    dist = actor(s.reshape(1, 4)).numpy()
    log_prob = np.log(dist[0, a])
    exp_v = np.mean(log_prob * td_error)
    return tf.multiply(exp_v, -1)
def loss_critic(s, s_, r, gamma):
    s, s_ = s[np.newaxis, :], s_[np.newaxis, :]
    v = critic(s)
    v_ = critic(s_)
    td_error = r + gamma * v_ - v
    return tf.multiply(td_error, 1)
def train(max_episodes):
    for episode in range(max_episodes):
        s = env.reset().astype(np.float32)
        t = 0
        track_r = []
        while True:
            dist = actor(s.reshape(1, 4)).numpy()
            a = np.random.choice(range(N_ACTIONS), p=dist.ravel())
            s_, r, done, info = env.step(a)
            s_ = s_.astype(np.float32)
            if done: r=-20
            track_r.append(r)
            with tf.GradientTape() as cri_tape, tf.GradientTape() as act_tape:
                td_error = loss_critic(s, s_, r, GAMMA)
            gradient = cri_tape.gradient(td_error, critic.trainable_variables)
            critic_optimizer.apply_gradients(zip(gradient,critic.trainable_variables))
            with tf.GradientTape() as act_tape:
                neg_exp_v = loss_actor(s, a, td_error.numpy())
            gradient = act_tape.gradient(neg_exp_v, critic.trainable_variables)
            actor_optimizer.apply_gradients(zip(gradient, actor.trainable_variables))
            s = s_
            t += 1
            if done:
                print("Episode:{} Steps:{}".format(episode+1, t))
train(MAX_EPISODES)
The error is raised on line 69: actor_optimizer.apply_gradients(zip(gradient, actor.trainable_variables))
When I printed out the gradients for the actor, the result was None.
I really don't see where the problem is.
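Two things in the code above are each enough to produce None gradients. A tf.GradientTape only records TensorFlow operations, so in loss_actor the .numpy() call followed by np.log and np.mean yields a value the tape cannot trace back to the actor's weights. On top of that, act_tape.gradient is asked for critic.trainable_variables, which neg_exp_v does not depend on. For reference, the gradient the actor tape should end up propagating is easy to write down: for the per-step loss -td_error * log pi(a|s) with pi = softmax(z), the derivative with respect to the logits z is td_error * (pi - onehot(a)). The NumPy sketch below (function names and logit values are illustrative) checks that closed form with finite differences; in the real fix the whole loss would have to stay in TensorFlow ops inside the tape.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def actor_loss(z, a, td_error):
    # Policy-gradient loss for one step: -td_error * log pi(a | s)
    return -td_error * np.log(softmax(z)[a])

def actor_loss_grad(z, a, td_error):
    # Closed form: d/dz [-td * log softmax(z)[a]] = td * (softmax(z) - onehot(a))
    onehot = np.zeros_like(z)
    onehot[a] = 1.0
    return td_error * (softmax(z) - onehot)

def finite_diff(z, a, td_error, eps=1e-6):
    # Central differences on each logit
    grad = np.zeros_like(z)
    for k in range(z.size):
        step = np.zeros_like(z)
        step[k] = eps
        grad[k] = (actor_loss(z + step, a, td_error) - actor_loss(z - step, a, td_error)) / (2 * eps)
    return grad

z = np.array([0.3, -1.2])    # made-up logits for CartPole's 2 actions
a, td_error = 1, 0.7
print(np.allclose(actor_loss_grad(z, a, td_error), finite_diff(z, a, td_error), atol=1e-6))
```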

Tensorflow: sparse_tensor_dense_matmul slower than regular matmul

I have 2 scenarios:
scenario 1:
op: sparse_tensor_dense_matmul
A: 1000x1000 sparsity = 90%
B: 1000x1000 sparsity = 0%
scenario 2:
op: matmul
A: 1000x1000 sparsity = 0%
B: 1000x1000 sparsity = 0%
I understand that GPUs do not compute sparse matrix multiplication well, but I would certainly expect them to perform it at least as well as non-sparse matrix multiplication. In my code, sparse_tensor_dense_matmul is 10x slower!
import tensorflow as tf
import numpy as np
import time
import itertools
rate = 0.1
N = 1000
itrs = 1000
num = int(rate * N * N)
combs = np.array(list(itertools.product(range(N), range(N))))
choices = range(len(combs))
_idxs = np.random.choice(a=choices, size=num, replace=False).tolist()
_idxs = combs[_idxs]
_idxs = _idxs.tolist()
_idxs = sorted(_idxs)
_vals = np.float32(np.random.rand(num))
_y = np.random.uniform(low=-1., high=1., size=(N, N))
_z = np.random.uniform(low=-1., high=1., size=(N, N))
################################################
x = tf.SparseTensor(indices=_idxs, values=_vals, dense_shape=(N, N))
y = tf.Variable(_y, dtype=tf.float32)
z = tf.Variable(_z, dtype=tf.float32)
sparse_dot = tf.sparse_tensor_dense_matmul(x, y)
dot = tf.matmul(z, y)
################################################
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()
start = time.time()
for i in range(itrs):
    [_sparse_dot] = sess.run([sparse_dot], feed_dict={})
total = time.time() - start
print(total)
start = time.time()
for i in range(itrs):
    [_dot] = sess.run([dot], feed_dict={})
total = time.time() - start
print(total)
################################################
Output (seconds; sparse first, then dense):
25.357680797576904
2.7684502601623535
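A rough sense of scale for scenario 1: with rate = 0.1 and N = 1000, A has num = 0.1 * 1000^2 = 100,000 nonzeros, so the sparse product needs num * N = 10^8 multiply-adds against N^3 = 10^9 for the dense matmul. A 10x reduction in arithmetic is often not enough to win, because the sparse kernel walks scattered (row, column, value) triples while dense matmul runs a highly tuned kernel over contiguous memory. The NumPy sketch below shows what sparse_tensor_dense_matmul computes for a COO-format A (indices plus values, as in the tf.SparseTensor above); it illustrates the access pattern on a smaller made-up matrix and is not TensorFlow's actual kernel.

```python
import numpy as np

def coo_dense_matmul(indices, values, n_rows, B):
    """Compute A @ B where A is given in COO form (indices[k] = (row, col))."""
    rows, cols = indices[:, 0], indices[:, 1]
    out = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    # Each nonzero A[r, c] adds values[k] * B[c, :] to out[r, :];
    # np.add.at accumulates correctly when the same row index repeats.
    np.add.at(out, rows, values[:, None] * B[cols])
    return out

rng = np.random.default_rng(0)
N, rate = 200, 0.1
num = int(rate * N * N)
flat = rng.choice(N * N, size=num, replace=False)   # distinct positions, like replace=False above
indices = np.stack([flat // N, flat % N], axis=1)
values = rng.random(num)
B = rng.uniform(-1.0, 1.0, size=(N, N))

A_dense = np.zeros((N, N))
A_dense[indices[:, 0], indices[:, 1]] = values
print(np.allclose(coo_dense_matmul(indices, values, N, B), A_dense @ B))
print("multiply-adds, sparse vs dense:", num * N, N ** 3)
```

Sparse formats usually only start to pay off at much lower densities than 10%, where the reduction in work outweighs the irregular memory access.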

`scipy.optimize` functions hang even with `maxiter=0`

I am trying to train a simple multi-class logistic regression on the MNIST data (which I downloaded from Kaggle), but the scipy.optimize functions hang.
Here's the code:
import csv
from math import exp
from numpy import *
from scipy.optimize import fmin, fmin_cg, fmin_powell, fmin_bfgs
# Prepare the data
def getIiter(ifname):
    """
    Get the iterator from a csv file with filename ifname
    """
    ifile = open(ifname, 'r')
    iiter = csv.reader(ifile)
    iiter.__next__()
    return iiter
def parseRow(s):
    y = [int(x) for x in s]
    lab = y[0]
    z = y[1:]
    return (lab, z)
def getAllRows(ifname):
    iiter = getIiter(ifname)
    x = []
    l = []
    for row in iiter:
        lab, z = parseRow(row)
        x.append(z)
        l.append(lab)
    return x, l
def cutData(x, y):
    """
    70% training
    30% testing
    """
    m = len(x)
    t = int(m * .7)
    return [(x[:t], y[:t]), (x[t:], y[t:])]
def num2IndMat(l):
    t = array(l)
    tt = [vectorize(int)((t == i)) for i in range(10)]
    return array(tt).T
def readData(ifname):
    x, l = getAllRows(ifname)
    t = [[1] + y for y in x]
    return array(t), num2IndMat(l)
#Calculate the cost function
def sigmoid(x):
    return 1 / (1 + exp(-x))
vSigmoid = vectorize(sigmoid)
vLog = vectorize(log)
def costFunction(theta, x, y):
    sigxt = vSigmoid(dot(x, theta))
    cm = (- y * vLog(sigxt) - (1 - y) * vLog(1 - sigxt)) / m / N
    return sum(cm)
def unflatten(flatTheta):
    return [flatTheta[i * N : (i + 1) * N] for i in range(n + 1)]
def costFunctionFlatTheta(flatTheta):
    return costFunction(unflatten(flatTheta), trainX, trainY)
def costFunctionFlatTheta1(flatTheta):
    return costFunction(flatTheta.reshape(785, 10), trainX, trainY)
x, y = readData('train.csv')
[(trainX, trainY), (testX, testY)] = cutData(x, y)
m = len(trainX)
n = len(trainX[0]) - 1
N = len(trainY[0])
initTheta = zeros(((n + 1), N))
flatInitTheta = ndarray.flatten(initTheta)
flatInitTheta1 = initTheta.reshape(1, -1)
In the last two lines I flatten initTheta because the fmin{,_cg,_bfgs,_powell} functions seem to accept only vectors as the initial value argument x0. I also flatten initTheta using reshape, in the hope that this answer might help.
Computing the cost function works fine and takes less than 2 seconds on my computer:
print(costFunctionFlatTheta(flatInitTheta), costFunctionFlatTheta1(flatInitTheta1))
# 0.69314718056 0.69314718056
But all the fmin functions hang, even if I set maxiter=0.
e.g.
newFlatTheta = fmin(costFunctionFlatTheta, flatInitTheta, maxiter=0)
or
newFlatTheta1 = fmin(costFunctionFlatTheta1, flatInitTheta1, maxiter=0)
When I interrupt the program, it always seems to be stuck at lines in optimize.py that call the cost function, such as:
return function(*(wrapper_args + args))
For example, if I use fmin_cg, this would be line 292 in optimize.py (Version 0.5).
How do I solve this problem?
OK, I found a way to stop fmin_cg from hanging.
Basically, I just need to write a function that computes the gradient of the cost function and pass it as the fprime parameter of fmin_cg.
def gradient(theta, x, y):
    return dot(x.T, vSigmoid(dot(x, theta)) - y) / m / N
def gradientFlatTheta(flatTheta):
    return ndarray.flatten(gradient(flatTheta.reshape(785, 10), trainX, trainY))
Then
newFlatTheta = fmin_cg(costFunctionFlatTheta, flatInitTheta, fprime=gradientFlatTheta, maxiter=0)
terminates within seconds, and with maxiter set to a higher number (say 100) the model trains in a reasonable amount of time.
The documentation of fmin_cg says the gradient would be numerically computed if no fprime is given, which is what I suspect caused the hanging.
Thanks to this notebook by zgo2016@Kaggle, which helped me find the solution.
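The suspicion about the numerical gradient is consistent with a quick count. With 785 * 10 = 7,850 parameters, a forward-difference gradient needs one cost evaluation per parameter plus one at the current point, so roughly 7,851 * 2 s, about 15,700 s or a bit over 4 hours, before the optimizer even starts iterating, which is why maxiter=0 does not help. (fmin, i.e. Nelder-Mead, likewise starts by evaluating the cost at every vertex of its initial simplex, 7,851 points here.) The NumPy sketch below counts those evaluations on a small made-up problem and checks the answer's gradient formula against the numerical one; it also replaces vectorize with plain array operations, which are usually much faster. The helper names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, Y):
    # Same cost as costFunction, vectorized: cross-entropy summed, divided by m and N
    m, N = Y.shape
    s = sigmoid(X @ theta)
    return np.sum(-Y * np.log(s) - (1 - Y) * np.log(1 - s)) / m / N

def gradient(theta, X, Y):
    # Same formula as the answer's gradient(): X^T (sigmoid(X theta) - Y) / m / N
    m, N = Y.shape
    return X.T @ (sigmoid(X @ theta) - Y) / m / N

def forward_diff_grad(f, flat_theta, eps=1e-6):
    # One call at flat_theta plus one per parameter: len(flat_theta) + 1 calls
    f0 = f(flat_theta)
    grad = np.empty_like(flat_theta)
    for k in range(flat_theta.size):
        step = np.zeros_like(flat_theta)
        step[k] = eps
        grad[k] = (f(flat_theta + step) - f0) / eps
    return grad

rng = np.random.default_rng(0)
m, n_feat, N = 50, 4, 3          # made-up sizes; the MNIST case is 785 x 10
X = np.hstack([np.ones((m, 1)), rng.random((m, n_feat - 1))])
Y = np.eye(N)[rng.integers(0, N, size=m)]

calls = [0]                      # mutable counter of cost evaluations
def flat_cost(flat_theta):
    calls[0] += 1
    return cost(flat_theta.reshape(n_feat, N), X, Y)

theta0 = rng.standard_normal(n_feat * N) * 0.1
numeric = forward_diff_grad(flat_cost, theta0)
analytic = gradient(theta0.reshape(n_feat, N), X, Y).ravel()
print(calls[0])                  # 13 = 4 * 3 parameters + 1
print(np.allclose(numeric, analytic, atol=1e-5))
```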