A gradient is a nested list of tensors. I want to get the total number of elements in the gradient, and record this number as an int. However, I don't know how to do this in tf.function.
import tensorflow as tf

grad = [tf.ones((500, 200)), tf.ones((200,)), tf.ones((200, 1))]

def test(grad):
    m = tf.cast(0, tf.int32)
    for i in grad:
        m = m + tf.math.reduce_prod(tf.shape(i))
    out = tf.zeros((m,))
    out.set_shape((m,))
    return out
The code works as intended in eager mode. If you apply tf.function, you will get the following error
TypeError: Dimension value must be integer or None or have an index method, got value '<tf.Tensor 'add_2:0' shape=() dtype=int32>' with type '<class 'tensorflow.python.framework.ops.Tensor'>'
The issue is that m should be a concrete value such as <tf.Tensor: shape=(), dtype=int32, numpy=100400>, but inside tf.function it is the symbolic tensor <tf.Tensor 'add_2:0' shape=() dtype=int32>.
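To make the failure mode concrete, here is a tiny illustrative sketch (not from the original post): inside tf.function, intermediate tensors are symbolic during tracing, so they cannot be used where Python integers are required.

import tensorflow as tf

@tf.function
def count_elements(x):
    m = tf.math.reduce_prod(tf.shape(x))
    # Using m as a Python int here (e.g. int(m) or out.set_shape((m,))) fails,
    # because during tracing m is only a graph node without a concrete value.
    return m

print(count_elements(tf.ones((2, 3))))  # prints a concrete tf.Tensor (6) once the graph runs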
Per Jirayu, here is the workaround:
import tensorflow as tf

grad = [tf.ones((500, 200)), tf.ones((200,)), tf.ones((200, 1))]

@tf.function
def test(grad):
    m = tf.cast(0, tf.int32)
    for i in grad:
        m = m + tf.math.reduce_prod(tf.shape(i))
    m = tf.zeros((m,)).shape[0]  # this m can now be used to define shapes
    out = tf.zeros((m,))
    out.set_shape((m,))
    return out
Calculate and pre-compute variables. Pre-computing a value once means it can be reused, which reduces the amount of computation for similar, repeated tasks.
[ Sample ]:
import tensorflow as tf

grad = [tf.ones((500, 200)), tf.ones((200,)), tf.ones((200, 1))]

@tf.function
def test(grad):
    m = tf.cast(0, tf.int32)
    m = tf.add(m, 0)
    for i in grad:
        m = m + tf.math.reduce_prod(tf.shape(i))
    # m now holds the total number of elements across the gradient tensors
    out = tf.zeros((int(m),))
    out.set_shape((out.shape[0],))
    return out

print(test(grad))
You don't have to create a zero tensor just to get the number of elements in the gradient.
grad = [tf.ones((500, 100)), tf.ones((200,)), tf.ones((100, 1, 3))]

@tf.function
def test(grad):
    m = tf.reduce_sum([tf.reduce_prod(i.shape) for i in grad])
    return m

test(grad)
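A slightly more direct variant of the same idea (a sketch, not part of the original answer) uses tf.size, which returns the element count of a tensor as an int32 scalar:

import tensorflow as tf

grad = [tf.ones((500, 100)), tf.ones((200,)), tf.ones((100, 1, 3))]

@tf.function
def total_elements(grad):
    # tf.size(g) is the number of elements in g; summing over the list gives the total
    return tf.reduce_sum([tf.size(g) for g in grad])

print(total_elements(grad))  # expected: 50000 + 200 + 300 = 50500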
As stated in https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer?hl=en#minimize, the first parameter of minimize should satisfy the requirement:
Tensor or callable. If a callable, loss should take no arguments and return the value to minimize. If a Tensor, the tape argument must be passed.
The first piece of code passes a Tensor to minimize(), which therefore requires the gradient tape, but I don't know how to provide it.
The second piece of code passes a callable to minimize(), which is easy:
import numpy as np
import tensorflow as tf
from tensorflow import keras

x_train = [1, 2, 3]
y_train = [1, 2, 3]

W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.random.normal([1]), name='bias')
hypothesis = W * x_train + b

@tf.function
def cost():
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error

optimizer = tf.optimizers.SGD(learning_rate=0.01)
cost_value = cost()
train = tf.keras.optimizers.Adam().minimize(cost_value, var_list=[W, b])
tf.print(W)
tf.print(b)
How do I add the gradient tape? I know the following code certainly works.
import numpy as np
import tensorflow as tf
from tensorflow import keras

x_train = [1, 2, 3]
y_train = [1, 2, 3]

W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.random.normal([1]), name='bias')
hypothesis = W * x_train + b

@tf.function
def cost():
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error

optimizer = tf.optimizers.SGD(learning_rate=0.01)
cost_value = cost()
train = tf.keras.optimizers.Adam().minimize(cost, var_list=[W, b])
tf.print(W)
tf.print(b)
Please help me revise the first piece of code so that it runs, thanks!
This occurs because .minimize() expects a callable. While cost_value, i.e. the result of cost(), is a tf.Tensor object, cost itself is a tf.function. You should pass your loss function directly into minimize, as in tf.keras.optimizers.Adam().minimize(cost, var_list=[W, b]).
Changed part for the gradient:
train = tf.keras.optimizers.Adam().minimize(cost(), var_list=[W, b], tape=tf.GradientTape())
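For completeness, here is a minimal sketch of the Tensor-plus-tape variant the question asks about, assuming a recent TF 2.x release where minimize() accepts a tape argument; the key point is that the loss tensor has to be computed while that same tape is recording:

import tensorflow as tf

x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]
W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.random.normal([1]), name='bias')
optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    # The loss is computed inside the tape context, so the tape records
    # the operations connecting it to W and b.
    y_model = W * x_train + b
    cost_value = tf.reduce_mean(tf.square(tf.constant(y_train) - y_model))

# Pass the Tensor loss together with the tape that watched its computation.
optimizer.minimize(cost_value, var_list=[W, b], tape=tape)
tf.print(W)
tf.print(b)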
This is a late answer (Hakan basically got it for you), but I write this in the hope that it will help people in the future who are stuck and googling this exact question (like I was). This is also an alternative implementation using tf.GradientTape() directly.
import numpy as np
import tensorflow as tf
from tensorflow import keras

x_train = [1, 2, 3]
y_train = [1, 2, 3]

W = tf.Variable(tf.random.normal([1]), trainable=True, name='weight')
b = tf.Variable(tf.random.normal([1]), trainable=True, name='bias')

@tf.function
def cost(W, b):
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error

optimizer = tf.optimizers.SGD(learning_rate=0.01)
trainable_vars = [W, b]
epochs = 100  # or however many iterations you want it to run

for _ in range(epochs):
    with tf.GradientTape() as tp:
        # your loss/cost function must always be computed within the gradient tape scope
        cost_fn = cost(W, b)
    gradients = tp.gradient(cost_fn, trainable_vars)
    optimizer.apply_gradients(zip(gradients, trainable_vars))

tf.print(W)
tf.print(b)
This should give you the value of your weights and biases after the number of epochs you ran.
You must compute the loss function every time a new gradient tape is invoked. Then you get the gradient of your loss function, and then call optimizer.apply_gradients to do your minimization, according to what the TensorFlow documentation says here: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer#apply_gradients.
I am trying to learn a similarity matrix (M) between two image embeddings. A single training instance is a pair of images (anchor, positive), so ideally the model will return zero distance for embeddings of similar images.
The problem is that when I declare the distance matrix (M) as a tf.Variable, the following line returns an error:
self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
TypeError: 'Variable' object is not iterable.
I think I should use a TensorFlow datatype for M that is iterable.
Please tell me how I can fix this issue.
import tensorflow as tf
from tensorflow import keras

# metric learning model
class MetricLearningModel:

    def __init__(self, lr):
        self.optimizer = keras.optimizers.Adam(lr=lr)
        self.lr = lr
        self.loss_object = keras.losses.MeanSquaredError()
        self.trainable_variables = tf.Variable(
            (tf.ones((2048, 2048), dtype=tf.float32)),
            trainable=True
        )

    def similarity_function(self, anchor_embeddings, positive_embeddings):
        M = self.trainable_variables
        X_i = anchor_embeddings
        X_j = positive_embeddings
        similarity_value = tf.matmul(X_j, M, name='Tensor')
        similarity_value = tf.matmul(similarity_value, tf.transpose(X_i), name='Tensor')
        # distance(x, y) = sqrt( (x-y) @ M @ (x-y).T )
        return similarity_value

    def train_step(self, anchor, positive):
        anchor_embeddings, positive_embeddings = anchor, positive
        # Calculate gradients
        with tf.GradientTape() as tape:
            # Calculate similarity between anchors and positives.
            similarities = self.similarity_function(anchor_embeddings, positive_embeddings)
            y_pred = similarities
            y_true = tf.zeros(1)
            print(y_true, y_pred)
            loss_value = self.loss_object(
                y_pred=y_true,
                y_true=y_pred,
            )
        gradients = tape.gradient(loss_value, self.trainable_variables)
        # Apply gradients via optimizer
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))


metric_model = MetricLearningModel(lr=1e-3)
anchor, positive = tf.ones((1, 2048), dtype=tf.float32), tf.ones((1, 2048), dtype=tf.float32)
metric_model.train_step(anchor, positive)
The Python zip function expects iterable objects, such as a list or a tuple.
In your calls to tape.gradient and optimizer.apply_gradients, you can put your Variable in a list to solve the issue:
with tf.GradientTape() as tape:
    ...  # compute loss_value as before

gradients = tape.gradient(loss_value, [self.trainable_variables])
# Apply gradients via optimizer
self.optimizer.apply_gradients(zip(gradients, [self.trainable_variables]))
tape.gradient respects the structure of the sources object passed to it, so if you feed it a list, you will get a list back. It is stated in the documentation:
Returns
a list or nested structure of Tensors (or IndexedSlices, or None), one for each element in sources. Returned structure is the same as the structure of sources.
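A small standalone sketch (illustrative values only) of that structure-mirroring behaviour: passing a single variable returns a single tensor, while passing a one-element list returns a one-element list.

import tensorflow as tf

v = tf.Variable(tf.ones((2, 2)))
with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(v * v)

g_single = tape.gradient(loss, v)    # a single Tensor with the same shape as v
g_list = tape.gradient(loss, [v])    # a list containing one Tensor
print(type(g_single), type(g_list))
del tape  # release the persistent tape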
The following code has the irritating trait of making every row of "out" the same. I am trying to classify k time series in Xtrain as [1,0,0,0], [0,1,0,0], [0,0,1,0], or [0,0,0,1], according to the way they were generated (by one of four random algorithms). Does anyone know why? Thanks!
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import copy

n = 100
m = 10
k = 1000
hidden_layers = 50
learning_rate = .01
training_epochs = 10000

Xtrain = []
Xtest = []
Ytrain = []
Ytest = []

# ... fill variables with data ...

x = tf.placeholder(tf.float64, shape=(k, 1, n, 1))
y = tf.placeholder(tf.float64, shape=(k, 1, 4))
conv1_weights = 0.1 * tf.Variable(tf.truncated_normal([1, m, 1, hidden_layers], dtype=tf.float64))
conv1_biases = tf.Variable(tf.zeros([hidden_layers], tf.float64))
conv = tf.nn.conv2d(x, conv1_weights, strides=[1, 1, 1, 1], padding='VALID')
sigmoid1 = tf.nn.sigmoid(conv + conv1_biases)
s = sigmoid1.get_shape()
sigmoid1_reshape = tf.reshape(sigmoid1, (s[0], s[1] * s[2] * s[3]))
sigmoid2 = tf.nn.sigmoid(tf.layers.dense(sigmoid1_reshape, hidden_layers))
sigmoid3 = tf.nn.sigmoid(tf.layers.dense(sigmoid2, 4))
penalty = tf.reduce_sum((sigmoid3 - y)**2)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(model)
    for i in range(0, training_epochs):
        sess.run(train_op, {x: Xtrain, y: Ytrain})
    out = sigmoid3.eval(feed_dict={x: Xtest})
Likely because your loss function is mean squared error. If you're doing classification you should be using cross-entropy loss
Your loss is penalty = tf.reduce_sum((sigmoid3 - y)**2), which is the elementwise squared difference between a batch of predictions and a batch of target values.
Your network output (sigmoid3) is a tensor with shape [?, 4] and y (I guess) is a tensor with shape [?, 4] too.
The squared difference thus has shape [?, 4].
This means that the tf.reduce_sum is computing in order:
The sum over the second dimension of the squared difference, producing a tensor with shape [?]
The sum over the first dimension (the batch size, here indicated with ?) producing a scalar value (shape ()) that's your loss value.
Probably you don't want this behavior (the sum over the batch dimension) and you're looking for the mean squared error over the batch:
penalty = tf.reduce_mean(tf.squared_difference(sigmoid3, y))
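For reference, a hedged sketch of what the two suggested changes could look like in the question's TF1-style graph code (sigmoid3, sigmoid2, y and learning_rate are the tensors and values defined above; the cross-entropy variant assumes the final sigmoid layer is replaced by a plain dense layer producing raw logits):

y_flat = tf.reshape(y, (-1, 4))  # collapse the (k, 1, 4) labels to (k, 4)

# Option 1: mean squared error over the batch instead of a summed penalty.
penalty_mse = tf.reduce_mean(tf.squared_difference(sigmoid3, y_flat))

# Option 2: softmax cross-entropy on raw logits (no sigmoid on the last layer).
logits = tf.layers.dense(sigmoid2, 4)  # replaces the sigmoid-activated output layer
penalty_xent = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_flat, logits=logits))

train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty_xent)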
I am trying to use some_model.predict(x) within a custom loss function.
I found this custom loss function:
from keras import backend as K  # assuming the standard Keras backend import for K

_EPSILON = K.epsilon()

def _loss_tensor(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(out, axis=-1)
But the problem is that model.predict() is expecting a numpy array.
So I looked for how to convert a tensor (y_pred) to a numpy array.
I found tmp = K.tf.round(y_true) but this returns a tensor.
I have also found: x = K.eval(y_true) which takes a Keras variable and returns a numpy array.
This produces the error: You must feed a value for placeholder tensor 'dense_78_target' with dtype float.....
Some people suggested setting the learning phase to true. I did that, but it did not help.
What I just want to do:
def _loss_tensor(y_true, y_pred):
    y_tmp_true = first_decoder.predict(y_true)
    y_tmp_pred = first_decoder.predict(y_pred)
    return keras.losses.binary_crossentropy(y_tmp_true, y_tmp_pred)
Any help would be appreciated.
This works:
sess = K.get_session()
with sess.as_default():
    tmp = K.tf.constant([1, 2, 3]).eval()
    print(tmp)
I also tried this now:
tmp = first_decoder(y_true)
This fails the assertion:
assert input_shape[-1]
Maybe someone knows how to resolve this?
Update:
I can now feed it through the model with:
y_t = first_decoder(K.reshape(y_true, (1,512)))
y_p = first_decoder(K.reshape(y_pred, (1,512)))
But when I try to return the binary cross-entropy, the shape is not right:
Input to reshape is a tensor with 131072 values, but the requested shape has 512
I figured out that 131072 was the product of my batch size and input size (256*512). I then adapted my code to reshape to size (256, 512). The first batch runs fine, but then I get another error saying that the passed size was (96, 512).
[SOLVED] Update:
It works now:
def _loss_tensor(y_true, y_pred):
    num_ex = K.shape(y_true)[0]
    y_t = first_decoder(K.reshape(y_true, (num_ex, 512)))
    y_p = first_decoder(K.reshape(y_pred, (num_ex, 512)))
    return keras.losses.binary_crossentropy(y_t, y_p)
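As a side note (an added clarification, not part of the original update), the reason K.shape works here while the hard-coded 256 did not is that it returns the runtime batch dimension, whereas the static shape of the batch dimension is typically unknown:

from keras import backend as K

def batch_dims(y_true):
    static_batch = K.int_shape(y_true)[0]  # usually None for the batch dimension of a placeholder
    dynamic_batch = K.shape(y_true)[0]     # a scalar tensor, resolved per batch (256, 96, ...)
    return static_batch, dynamic_batch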
I have a question concerning the implementation of a correlation-based loss function for a sequence labelling task in Keras (Tensorflow backend).
Suppose we have a sequence labelling problem where, e.g., the input is a tensor of shape (20,100,5) and the output is a tensor of shape (20,100,1).
In the documentation it is written that the loss function needs to return a "scalar for each data point". For the loss between tensors of shape (20,100,1), the default MSE loss returns a loss tensor of shape (20,100).
Now, if we use a loss function based on the correlation coefficient for each sequence, in theory, we will get only a single value for each sequence, i.e., a tensor of shape (20,).
However, when this is used as a loss function in Keras, fit() returns an error because a tensor of shape (20,100) is expected.
On the other hand, there is no error when I either
Return just the mean value of the tensor (a single scalar for the whole data), or
Repeat the tensor (using K.repeat_elements) ending up in a tensor of shape (20,100).
The framework does not return an error (TensorFlow backend), the loss decreases over epochs, and the performance on independent test data is also good.
My questions are:
Which dimensionality of the targets/losses does the "fit" function usually assume in case of sequences?
Is the TensorFlow backend able to derive the gradients properly when only the mean value is returned?
Please find below an executable example with my implementations of correlation-based loss functions.
my_loss_1 returns only the mean value of the correlation coefficients of all (20) sequences.
my_loss_2 returns only one loss for each sequence (does not work in a real training).
my_loss_3 repeats the loss for each sample within each sequence.
Many thanks and best wishes
from keras import backend as K
from keras.losses import mean_squared_error
import numpy as np
import tensorflow as tf

def my_loss_1(seq1, seq2):  # Correlation-based loss function - version 1 - return scalar
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    nominator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = nominator / (denominator + K.common.epsilon())
    corr_loss = K.constant(1.) - corr
    corr_loss = K.mean(corr_loss)
    return corr_loss

def my_loss_2(seq1, seq2):  # Correlation-based loss function - version 2 - return 1D array
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    nominator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = nominator / (denominator + K.common.epsilon())
    corr_loss = K.constant(1.) - corr
    return corr_loss

def my_loss_3(seq1, seq2):  # Correlation-based loss function - version 3 - return 2D array
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    nominator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = nominator / (denominator + K.common.epsilon())
    corr_loss = K.constant(1.) - corr
    corr_loss = K.reshape(corr_loss, (-1, 1))
    corr_loss = K.repeat_elements(corr_loss, K.int_shape(seq1)[1], 1)  # Does not work for fit(). It seems that NO dimension may be None in order to get a value != None from int_shape().
    return corr_loss

# Test
sess = tf.Session()

# input (20,100,1)
a1 = np.random.rand(20, 100, 1)
a2 = np.random.rand(20, 100, 1)
print('\nInput: ' + str(a1.shape))

p1 = K.placeholder(shape=a1.shape, dtype=tf.float32)
p2 = K.placeholder(shape=a1.shape, dtype=tf.float32)

loss0 = mean_squared_error(p1, p2)
print('\nMSE:')  # output: (20,100)
print(sess.run(loss0, feed_dict={p1: a1, p2: a2}))

loss1 = my_loss_1(p1, p2)
print('\nCorrelation coefficient:')  # output: ()
print(sess.run(loss1, feed_dict={p1: a1, p2: a2}))

loss2 = my_loss_2(p1, p2)
print('\nCorrelation coefficient:')  # output: (20,)
print(sess.run(loss2, feed_dict={p1: a1, p2: a2}))

loss3 = my_loss_3(p1, p2)
print('\nCorrelation coefficient:')  # output: (20,100)
print(sess.run(loss3, feed_dict={p1: a1, p2: a2}))
Now, if we use a loss function based on the correlation coefficient for each sequence, in theory, we will get only a single value for each sequence, i.e., a tensor of shape (20,).
That's not true. The coefficient is something like
average((avg_label - label_value) * (avg_prediction - prediction_value)) / (std(label_value) * std(prediction_value))
Remove the overall average and you are left with the components of the correlation coefficient, one per element of the sequence, which is the right shape.
You can plug in other correlation formulas as well; just stop before computing the single value.
Thanks a lot!
Well, I thought the coefficient was already the overall (averaged) metric over a sample sequence, but your solution makes sense, indeed.
Below is my running code (the summation in the denominator has also been changed to averaging, otherwise the result would get smaller the longer the sequence is, which should not happen since the overall loss is the mean over all losses). It works well when applied to real tasks (not shown here).
The only problem I still have is that the squeezing step at the beginning of the loss function is not so nice, but I was not able to find a nicer solution.
from keras import backend as K
from keras.losses import mean_squared_error
import numpy as np
import tensorflow as tf

def my_loss(seq1, seq2):  # Correlation-based loss function
    seq1 = K.squeeze(seq1, axis=-1)  # To remove the last dimension
    seq2 = K.squeeze(seq2, axis=-1)  # To remove the last dimension
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    nominator = (seq1 - seq1_mean) * (seq2 - seq2_mean)
    denominator = K.sqrt(K.mean(K.square(seq1 - seq1_mean), axis=-1, keepdims=True) * K.mean(K.square(seq2 - seq2_mean), axis=-1, keepdims=True))
    corr = nominator / (denominator + K.common.epsilon())
    corr_loss = K.constant(1.) - corr
    return corr_loss

# Test
sess = tf.Session()

# Input (20,100,1)
a1 = np.random.rand(20, 100, 1)
a2 = np.random.rand(20, 100, 1)
print('\nInput: ' + str(a1.shape))

p1 = K.placeholder(shape=a1.shape, dtype=tf.float32)
p2 = K.placeholder(shape=a1.shape, dtype=tf.float32)

loss0 = mean_squared_error(p1, p2)
print('\nMSE:')  # output: (20,100)
print(sess.run(loss0, feed_dict={p1: a1, p2: a2}))

loss1 = my_loss(p1, p2)
print('\nCorrelation coefficient-based loss:')  # output: (20,100)
print(sess.run(loss1, feed_dict={p1: a1, p2: a2}))