Gradient in PyTorch and TensorFlow

I am new to PyTorch and TensorFlow and I would like to use them for solving ODEs and PDEs. My question is how to take the gradient of a vector Y (say 3*1, with Y = [Y1(X1,X2,X3), Y2(X1,X2,X3), Y3(X1,X2,X3)]^T) with respect to the vector X = [X1,X2,X3]^T, with both PyTorch and TensorFlow (Keras), so as to get the following matrix:
F = [[Y11, Y12, Y13] , [Y21, Y22, Y23], [Y31, Y32, Y33]]
where
Yij = dYi/dXj
Thanks
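For what it's worth, a minimal sketch of computing this Jacobian in PyTorch, where the function Y below is a made-up stand-in for your own:
import torch

def Y(X):
    # Any differentiable R^3 -> R^3 map; this one is just for illustration.
    return torch.stack([X[0] * X[1], X[1] * X[2], X[0] * X[2]])

X = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
F = torch.autograd.functional.jacobian(Y, X)  # F[i, j] = dYi/dXj
print(F)
And the TensorFlow 2.x counterpart using tf.GradientTape:
import tensorflow as tf

X = tf.Variable([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    Y = tf.stack([X[0] * X[1], X[1] * X[2], X[0] * X[2]])
F = tape.jacobian(Y, X)  # F[i, j] = dYi/dXj
print(F)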

Related

How to perform custom operations in between keras layers?

I have a neural network with one input and one output, and in between I need to perform a small operation. I have two inputs (from the same distribution, with either mean 0 or mean 1) which I need to feed to the neural network one at a time and compare the output for each input. After the comparison, I finally generate the prediction of the model. The implementation is as follows:
from tensorflow import keras
import tensorflow as tf
import numpy as np
#define network
x1 = keras.Input(shape=(1,), name="x1")
x2 = keras.Input(shape=(1,), name="x2")
model = keras.layers.Dense(20)   # shared hidden layer
model1 = keras.layers.Dense(1)   # shared output layer
x11 = model1(model(x1))
x22 = model1(model(x2))
After this I need to perform following operations:
if x11 >= x22:
    Vm = x1
else:
    Vm = x2
Finally I need to do:
out = Vm - 0.5
out= keras.activations.sigmoid(out)
model = keras.Model([x1,x2], out)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.binary_crossentropy,
    metrics=['accuracy']
)
model.summary()
tf.keras.utils.plot_model(model) #visualize model
I have normally distributed pairs of data sharing the same mean (either mean 0 or mean 1), generated as below:
#Generating training dataset
from scipy.stats import skewnorm
n=1000 #sample each
s = 1 # scale to change o/p range
X1_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X1_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X2_0 = skewnorm.rvs(a = 0 ,loc=0, size=n)*s; X2_1 = skewnorm.rvs(a = 0 ,loc=1, size=n)*s #Skewnorm function
X1_train = list(X1_0) + list(X1_1) #append both data
X2_train = list(X2_0) + list(X2_1) #append both data
y_train = [x for x in (0,1) for i in range(0, n)] #make Y for above conditions
#reshape to proper format
X1_train = np.array(X1_train).reshape(-1,1)
X2_train = np.array(X2_train).reshape(-1,1)
y_train = np.array(y_train)
#train model
model.fit([X1_train, X2_train], y_train, epochs=10)
I am not able to run the program if I include the operation
if x11 >= x22:
    Vm = x1
else:
    Vm = x2
in between the layers. If I directly work with the maximum of the outputs, as in:
Vm = keras.layers.Maximum()([x11,x22])
the program works fine. But I need to select either x1 or x2 based on the values of x11 and x22.
The problem might be that the comparison is performed while defining the structure of the model, when x11 and x22 do not yet have values (I guess). I am totally new to all of this, so I could not resolve it myself. I would greatly appreciate any help/suggestions. Thank you.
You can add this functionality via a Lambda layer.
Vm = tf.keras.layers.Lambda(lambda x: tf.where(x[0] >= x[1], x[2], x[3]))([x11, x22, x1, x2])
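Note that tf.where selects element-wise, so the four tensors must have broadcast-compatible shapes, and gradients flow only through the branch actually selected for each sample. A quick standalone check with made-up values:
import tensorflow as tf

x11 = tf.constant([[0.3], [0.9]])
x22 = tf.constant([[0.5], [0.1]])
x1 = tf.constant([[1.0], [2.0]])
x2 = tf.constant([[3.0], [4.0]])

# Row 0: 0.3 < 0.5 so x2 is taken; row 1: 0.9 >= 0.1 so x1 is taken.
print(tf.where(x11 >= x22, x1, x2).numpy())  # [[3.], [2.]]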

Get logits of a trained Keras model [duplicate]

I am building a deconvolution network. I would like to add a layer to it which is the reverse of a softmax. I tried to write a basic Python function that returns the inverse of a softmax for a given matrix, put it in a TensorFlow Lambda layer, and add it to my model.
I get no error, but when I run a predict I only get zeros as output. When I don't add this layer to my network, the output is something other than zeros. This suggests the zeros are due to my inv_softmax function, which must be wrong.
Can you enlighten me on how to proceed?
I defined my function like this:
def inv_softmax(x):
    C = 0
    S = np.zeros((1,1,10))  # (1,1,10) is the shape of the data that my layer will receive
    try:
        for j in range(np.max(np.shape(x))):
            C += np.exp(x[0,0,j])
        for i in range(np.max(np.shape(x))):
            S[0,0,i] = np.log(x[0,0,i]) + C
    except ValueError:
        print("ValueError in inv_softmax")
        pass
    S = tf.convert_to_tensor(S, dtype=tf.float32)
    return S
I add it to my network as:
x = ...
x = layers.Lambda(lambda x: inv_softmax(x), name='inv_softmax', output_shape=[1,1,10])(x)
x = ...
If you need more of my code or other information, please ask.
Try this:
import tensorflow as tf
import math

def inv_softmax(x, C):
    return tf.math.log(x) + C

input = tf.keras.layers.Input(shape=(1,10))
x = tf.keras.layers.Lambda(lambda x: inv_softmax(x, math.log(10.)), name='inv_softmax')(input)
model = tf.keras.Model(inputs=input, outputs=x)

a = tf.zeros([1, 1, 10])
a = tf.nn.softmax(a)
a = model(a)
print(a.numpy())
Thanks, it works!
I put:
import keras.backend as K

def inv_softmax(x, C):
    return K.log(x) + K.log(C)
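For reference, the reason this works: softmax(z)_i = exp(z_i) / sum_j exp(z_j), so log(softmax(z)_i) = z_i - logsumexp(z), and adding back C = logsumexp(z) recovers the logits exactly (for the all-zeros example above, C = log(10)). In general C is not recoverable from the softmax output alone, so the logits are only determined up to an additive constant. A quick numerical check:
import tensorflow as tf

z = tf.random.normal([1, 1, 10])  # made-up logits, just for the check
s = tf.nn.softmax(z)
C = tf.reduce_logsumexp(z, axis=-1, keepdims=True)
z_rec = tf.math.log(s) + C
print(tf.reduce_max(tf.abs(z_rec - z)).numpy())  # ~0 up to float error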

LSTM from scratch in tensorflow 2

I'm trying to implement an LSTM in TensorFlow 2.1 from scratch, without using the one already supplied with Keras (tf.keras.layers.LSTM), just to learn and code something. To do so, I've defined a class "Model" that, when called (like model(input)), computes the matrix multiplications of the LSTM. I'm pasting part of my code here; the other parts are on GitHub (link)
class Model(object):
    [...]
    def __call__(self, inputs):
        assert inputs.shape == (vocab_size, T_steps)
        outputs = []
        for time_step in range(T_steps):
            x = inputs[:, time_step]
            x = tf.expand_dims(x, axis=1)
            z = tf.concat([self.h_prev, x], axis=0)
            f = tf.matmul(self.W_f, z) + self.b_f
            f = tf.sigmoid(f)
            i = tf.matmul(self.W_i, z) + self.b_i
            i = tf.sigmoid(i)
            o = tf.matmul(self.W_o, z) + self.b_o
            o = tf.sigmoid(o)
            C_bar = tf.matmul(self.W_C, z) + self.b_C
            C_bar = tf.tanh(C_bar)
            C = (f * self.C_prev) + (i * C_bar)
            h = o * tf.tanh(C)
            v = tf.matmul(self.W_v, h) + self.b_v
            v = tf.sigmoid(v)
            y = tf.math.softmax(v, axis=0)
            self.h_prev = h
            self.C_prev = C
            outputs.append(y)
        outputs = tf.squeeze(tf.stack(outputs, axis=1))
        return outputs
But this neural network has three problems:
1) It is very slow during training. In comparison, a model that uses tf.keras.layers.LSTM() trains more than 10 times faster. Why is this? Maybe because I didn't use minibatch training, but stochastic training?
2) The NN seems not to learn anything at all. After just a (very!) few training examples, the loss seems to settle down and won't decrease anymore; rather, it oscillates around the reached value. After training, I tested the NN by making it generate some text, but it just outputs nonsense gibberish. Why isn't it learning anything?
3) The loss function outputs very high values. I've coded a categorical cross-entropy loss function but, with a 100-character-long sequence, the value of the function is over 370 per training example. Shouldn't it be way lower than this?
I wrote the loss function like this:
def compute_loss(predictions, desired_outputs):
    l = 0
    for i in range(T_steps):
        l -= tf.math.log(predictions[desired_outputs[i], i])
    return l
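As a side note, this loop can be written in a vectorized form; a minimal sketch, assuming predictions has shape (vocab_size, T_steps) and desired_outputs is a length-T_steps vector of integer class indices:
import tensorflow as tf

def compute_loss_vectorized(predictions, desired_outputs):
    T = tf.shape(predictions)[1]
    # Pair each target index with its time step: [[t_0, 0], [t_1, 1], ...]
    idx = tf.stack([tf.cast(desired_outputs, tf.int32), tf.range(T)], axis=1)
    picked = tf.gather_nd(predictions, idx)  # probability of the target at each step
    return -tf.reduce_sum(tf.math.log(picked))
On the magnitude: a total of 370 over 100 steps is about 3.7 per step, which is roughly ln(40), i.e. what near-uniform predictions over a vocabulary of about 40 symbols would give, so the value itself is consistent with an untrained model.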
I know these are open questions, but unfortunately I can't make it work. So any answer, even a short answer that helps me solve the problem myself, is fine :)

backward, grad function in pytorch

I'm trying out the backward and grad functions in PyTorch.
But I don't know why this value is returned.
Here is my code.
x = Variable(torch.FloatTensor([[1,2],[3,4]]), requires_grad=True)
y = x + 2
z = y * y
gradient = torch.ones(2, 2)
z.backward(gradient)
print(x.grad)
I think the result should be [[6,8],[10,12]],
because dz/dx = 2*(x+2) and x = 1,2,3,4.
But the returned value is [[7,9],[11,13]].
Why is this happening? I want to know what the gradient and grad functions are doing.
Please help.
The below piece of code, on PyTorch v0.12.1,
import torch
from torch.autograd import Variable
x = Variable(torch.FloatTensor([[1,2],[3,4]]), requires_grad=True)
y = x + 2
z = y * y
gradient = torch.ones(2, 2)
z.backward(gradient)
print(x.grad)
returns
Variable containing:
6 8
10 12
[torch.FloatTensor of size 2x2]
Update your PyTorch installation. This explains the working of autograd, which handles gradient computation for PyTorch.
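For reference, on current PyTorch versions (1.0+) Variable is deprecated and requires_grad is set on the tensor directly; the equivalent code is:
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = x + 2
z = y * y
z.backward(torch.ones(2, 2))  # dz/dx = 2 * (x + 2)
print(x.grad)                 # tensor([[ 6.,  8.], [10., 12.]])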

Implementing a one-to-many rnn model in tensorflow

I would like to implement a "none-to-many" RNN of the following form, to reproduce the paper "Learning to learn without gradient descent by gradient descent" (https://arxiv.org/pdf/1611.03824.pdf).
The input ("training-data") to this model is the function f and not a sequence of data as usual. What I would like to do is something like
x_0 = tf.constant(..)
h_0 = tf.constant(..)
f_params = tf.placeholder(..)
h = h_0
x = x_0
cell = tf.contrib.rnn.LSTMCell(num_units)
for _ in range(seq_length):
    y = f(x, f_params)
    x, h = cell([x, y], h)
But I cannot find a way to get this to work. All the examples I can find online use tf.contrib.rnn.static_rnn() or tf.nn.dynamic_rnn() to implement "many-to-many" or "many-to-one" architectures.
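For what it's worth, a minimal sketch of this kind of manually unrolled loop using tf.keras.layers.LSTMCell (TF 2.x), where f, num_units, seq_length, and the Dense projection are made-up stand-ins for the question's own symbols:
import tensorflow as tf

num_units, x_dim, seq_length = 32, 4, 10

def f(x, f_params):
    # Made-up black-box function standing in for the function to optimize.
    return tf.reduce_sum(f_params * x, axis=-1, keepdims=True)

cell = tf.keras.layers.LSTMCell(num_units)
proj = tf.keras.layers.Dense(x_dim)  # maps the hidden state to the next x
x = tf.zeros([1, x_dim])             # x_0
state = [tf.zeros([1, num_units]), tf.zeros([1, num_units])]  # [h_0, c_0]
f_params = tf.random.normal([1, x_dim])

for _ in range(seq_length):
    y = f(x, f_params)                                # query the function at x
    h, state = cell(tf.concat([x, y], axis=-1), state)
    x = proj(h)                                       # the cell proposes the next x
Since the cell is called step by step, nothing here requires static_rnn or dynamic_rnn; the loop itself plays the role of the unrolled RNN.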