Implementing cosine similarity in TensorFlow

My question is about the equation below for the cosine similarity between two vectors x and y:

cos(x, y) = x^T y / (|x| |y|)

The equation above is for a single pair of vectors. But if I have batches of vectors, with my X and Y having shape (None, 32), there is an issue. Also remember that in a coding environment each example inside the batch is already stored in transposed shape (as a row). My problem is that when we need to transpose a [None, 32] tensor, the code will not accept a transpose over the None dimension. So I solved it in the following way:
def Cosine_similarity(X, Y, feature_dim):
    L = tf.compat.v1.initializers.glorot_normal()(shape=[feature_dim, feature_dim])
    out1 = tf.matmul(X, L)
    out2 = tf.matmul(Y, L)
    out_numerator = tf.reduce_sum(tf.multiply(out1, out2), axis=1)
    out3 = tf.reduce_sum(tf.multiply(out1, out1), axis=1)
    out3 = tf.sqrt(out3)
    out4 = tf.reduce_sum(tf.multiply(out2, out2), axis=1)
    out4 = tf.sqrt(out4)
    out_denominator = tf.multiply(out3, out4)
    final_out = tf.divide(out_numerator, out_denominator)
    return final_out
And this comes from the following:

<XA, YA> = (XA)^T (YA)
         = tf.reduce_sum(tf.multiply((X A), (Y A)), axis=1)
So I just want to know if this implementation is right, or you can correct me if I am missing something.
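As a sanity check, the batched form agrees with plain NumPy on random data:

import numpy as np

# reduce_sum(multiply(...), axis=1) gives the per-example inner product <XA, YA>
X = np.random.rand(4, 32)
Y = np.random.rand(4, 32)
A = np.random.rand(32, 32)
XA, YA = X @ A, Y @ A
row_dots = np.sum(XA * YA, axis=1)              # batched numerator
assert np.allclose(row_dots[0], XA[0] @ YA[0])  # matches the single-vector form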

Not sure I understand your concern for the (None) dimension.
If I understand correctly, the cosine similarity between two identically shaped matrices X and Y ([batch, target_dim]) is just the matrix multiplication X * Y^T with some L2 normalization. Note X would be your out1 and Y would be your out2.
def Cosine_similarity(x, y, A):
    """Pair-wise cosine similarity.

    First `x` and `y` are transformed by A.
    `X = xA^T` with shape [batch, target_dim],
    `Y = yA^T` with shape [batch, target_dim].

    Args:
      x: shaped [batch, feature_dim].
      y: shaped [batch, feature_dim].
      A: shaped [target_dim, feature_dim]. Transformation matrix to project
        from `feature_dim` to `target_dim`.

    Returns:
      A cosine similarity matrix shaped [batch, batch]. The entry
      at (i, j) is the cosine similarity value between vector `X[i, :]` and
      `Y[j, :]` where `X`, `Y` are the transformed `x` and `y` by `A`
      respectively. In other words, the entry at (i, j) is the pair-wise
      cosine similarity value between the i-th example of `x` and the j-th
      example of `y`.
    """
    x = tf.matmul(x, A, transpose_b=True)
    y = tf.matmul(y, A, transpose_b=True)
    x_norm = tf.nn.l2_normalize(x, axis=-1)
    y_norm = tf.nn.l2_normalize(y, axis=-1)
    y_norm_trans = tf.transpose(y_norm, [1, 0])
    sim = tf.matmul(x_norm, y_norm_trans)
    return sim
import numpy as np

feature_dim = 8
target_dim = 4
batch_size = 2

x = tf.placeholder(tf.float32, shape=(None, feature_dim))
y = tf.placeholder(tf.float32, shape=(None, feature_dim))
A = tf.placeholder(tf.float32, shape=(target_dim, feature_dim))
sim = Cosine_similarity(x, y, A)

with tf.Session() as sess:
    x_val, y_val, sim_val = sess.run([x, y, sim], feed_dict={
        x: np.ones((batch_size, feature_dim)),
        y: np.random.rand(batch_size, feature_dim),
        A: np.random.rand(target_dim, feature_dim)})
    print('x=\n', x_val)
    print('y=\n', y_val)
    print('sim=\n', sim_val)
Result:
x=
[[ 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1.]]
y=
[[ 0.01471654 0.76577073 0.97747731 0.06429122 0.91344446 0.47987637
0.09899797 0.773938 ]
[ 0.8555786 0.43403915 0.92445409 0.03393625 0.30154493 0.60895061
0.1233703 0.58597666]]
sim=
[[ 0.95917791 0.98181278]
[ 0.95917791 0.98181278]]
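If you instead want the element-wise similarity of your original formulation (one value per example rather than a [batch, batch] matrix), that is just the diagonal of sim, or equivalently a row-wise dot product:

def elementwise_cosine_similarity(x, y, A):
    x = tf.matmul(x, A, transpose_b=True)
    y = tf.matmul(y, A, transpose_b=True)
    x_norm = tf.nn.l2_normalize(x, axis=-1)
    y_norm = tf.nn.l2_normalize(y, axis=-1)
    # Row-wise dot product: same values as tf.linalg.diag_part(sim) above,
    # without building the full [batch, batch] matrix.
    return tf.reduce_sum(x_norm * y_norm, axis=-1)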

Related

Custom loss function leads to high MSE and an offset in the output (Keras)

I am training a neural network for time series regression. The model is:
####################################################################################################################
# Define ANN Model
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Bidirectional, LSTM, Flatten, Dense
from tensorflow.keras.models import Model

# define two sets of inputs
acc = layers.Input(shape=(3, 1,))
gyro = layers.Input(shape=(3, 1,))

# the first branch operates on the first input
x = Conv1D(256, 1, activation='relu')(acc)
x = Conv1D(128, 1, activation='relu')(x)
x = Conv1D(128, 1, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Model(inputs=acc, outputs=x)

# the second branch operates on the second input
y = Conv1D(256, 1, activation='relu')(gyro)
y = Conv1D(128, 1, activation='relu')(y)
y = Conv1D(128, 1, activation='relu')(y)
y = MaxPooling1D(pool_size=3)(y)
y = Model(inputs=gyro, outputs=y)

# combine the outputs of the two branches
combined = layers.concatenate([x.output, y.output])

# combined outputs
z = Bidirectional(LSTM(128, dropout=0.25, return_sequences=False, activation='tanh'))(combined)
#z = Dense(10, activation="relu")(z)
z = Flatten()(z)
z = Dense(4, activation="linear")(z)

model = Model(inputs=[x.input, y.input], outputs=z)
model.compile(loss=loss, optimizer=tf.keras.optimizers.Adam(), metrics=['mse'], run_eagerly=True)
I have tried to implement a custom loss function (based on different papers).
Math
The error is calculated as follows:
y_pred = [w x y z]
y_true = [w1 x1 y1 z1]
error = 2 * acos(w*w1 + x*x1 + y*y1 + z*z1)
Based on this formula I wrote the custom loss function:
import tensorflow.keras.backend as K

def loss(y_true, y_pred):
    z = y_true * y_pred
    wtot = tf.reduce_sum(z, axis=1)
    # clip the (absolute) dot product into acos's domain [-1, 1]
    error = 2 * tf.math.acos(K.clip(tf.math.sqrt(wtot * wtot), -1., 1.))
    return error
But while the loss value is decreasing, the MSE increases and I can see an offset in the output which grows with the number of epochs. I understand that we do not optimize this network for MSE, but based on the mathematics the MSE should still be reduced or converge to some value near 1.
[Plots comparing the network output (blue) against the target/reference (orange) after 1, 10, and 50 epochs; the offset grows with training.]
To solve this problem, I used the geometric distance equation to compute the loss value:
def QQuat_mult(y_true, y_pred):
    """Quaternion-product loss.

    The predicted quaternion is normalized first (the network output is not
    guaranteed to be unit-norm), then its conjugate is multiplied with the
    true quaternion. For a perfect prediction the product is the identity
    quaternion [1, 0, 0, 0], so `w` is shifted by -1 and the mean absolute
    value of all four components is returned as the loss.

    :param y_true: the ground truth quaternion
    :param y_pred: the predicted quaternion
    :return: mean absolute value of the components of the quaternion product
        of the (conjugated) predicted and true quaternions.
    """
    y_pred = tf.linalg.normalize(y_pred, ord='euclidean', axis=1)[0]
    # conjugate the prediction by negating the vector part
    w0, x0, y0, z0 = tf.split(
        tf.multiply(y_pred, [1., -1., -1., -1.]), num_or_size_splits=4, axis=-1)
    w1, x1, y1, z1 = tf.split(y_true, num_or_size_splits=4, axis=-1)
    # Hamilton product of conj(y_pred) and y_true
    w = w0*w1 - x0*x1 - y0*y1 - z0*z1
    w = tf.subtract(w, 1)
    x = w0*x1 + x0*w1 + y0*z1 - z0*y1
    y = w0*y1 - x0*z1 + y0*w1 + z0*x1
    z = w0*z1 + x0*y1 - y0*x1 + z0*w1
    loss = tf.abs(tf.concat(values=[w, x, y, z], axis=-1))
    return tf.reduce_mean(loss)
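This loss plugs into compile like any custom Keras loss, e.g. with the same model as above:

model.compile(loss=QQuat_mult,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['mse'],
              run_eagerly=True)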

Backpropagating gradients through nested tf.map_fn

I would like to map a TensorFlow function on each vector corresponding to the depth channel of every pixel in a matrix with dimension [batch_size, H, W, n_channels].
In other words, for every image of size H x W that I have in the batch:
I extract some feature maps F_k (whose number is n_channels) with the same size H x W (hence, the feature maps all together are a tensor of shape [H, W, n_channels]);
then, I wish to apply a custom function to the vector v_ij associated with the i-th row and j-th column of each feature map F_k, one that explores the depth channel in its entirety (e.g. v_ij has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.
A picture to explain the process can be found below. The only difference with the picture is that both input and output "receptive fields" have size 1x1 (apply the function to each pixel independently).
This would be similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel, rather than a simple sum operation.
I think tf.map_fn() could be an option, and I tried the following solution, where I recursively use tf.map_fn() to access the features associated with each pixel. However, this seems sub-optimal, and most importantly it raises an error when trying to backpropagate the gradients.
Do you have any idea of the reason why this happens and how I should structure my code to avoid the error?
This is my current implementation of the function:
import tensorflow as tf
from tensorflow import layers

def apply_function_on_pixel_features(incoming):
    # at first the input is [None, W, H, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that applies a transformation and returns a vector of the same size
        output = my_custom_fun(incoming)  # my_custom_fun() doesn't change the shape
        return output
and the body of my code:
H = 128
W = 132
n_channels = 8
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.nn.softmax(x3)
loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss) # <--- ERROR HERE!
Particularly, the error is the following:
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
self._AddOpInternal(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
op._add_control_input(self.GetControlPivot().op)
AttributeError: 'NoneType' object has no attribute 'op'
The whole error stack and the code can be found here.
Thanks for the help,
G.
Update:
Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work; any insight on this would still be very appreciated.

@gabriele, regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.
def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
The full code of what I tested is below:

import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy

def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
H = 32
W = 32

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')

loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)

x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0, 1], size=(10, 10))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(train_op, feed_dict={x1: x, labels: y})
Following @thushv89's suggestion, I reshaped the array, applied the function, and then reshaped it back (to avoid the tf.map_fn recursion). I still don't know exactly why the previous code didn't work, but the current implementation allowed the gradients to propagate back to the previous layers. I'll leave it below, for whoever might be interested:
def apply_function_on_pixel_features(incoming, batch_size):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])
    # apply function on every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)
    return out_matrix
Notice that I now needed to pass the batch size to reshape the tensor correctly, because TensorFlow would complain if I gave None or -1 as a dimension.
Any comments and insight on the above code would still be very appreciated.
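For reference, a batch-size-independent variant along the same lines should also work by reading the runtime shape with tf.shape (a sketch, with my_custom_fun as the same shape-preserving per-pixel function):

def apply_function_on_pixel_features_dynamic(incoming):
    # static channel count; batch and spatial sizes are read at run time
    C = incoming.get_shape().as_list()[-1]
    dynamic_shape = tf.shape(incoming)          # [batch, W, H, C] as a tensor
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    out_matrix = my_custom_fun(incoming_flat)   # dimension remains unchanged
    # restore the (possibly unknown) batch and spatial dimensions
    return tf.reshape(out_matrix, shape=dynamic_shape)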

Network diverges with NaN in simple TensorFlow example

I am trying to follow the example from the Stanford series on TF by implementing a quadratic regression:
Y = W*X*X + u*X + b
The dataset can be found in the Cengage dataset, and the code is the following:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

DATA = 'data\\slr05.xls'

# Read data
data = xlrd.open_workbook(DATA, encoding_override='utf-8')
sheet = data.sheet_by_index(0)
dataset = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

X = tf.placeholder('float', name='X')
Y = tf.placeholder('float', name='Y')
W = tf.Variable(0.0, name='weights')
b = tf.Variable(0.0, name='bias')
u = tf.Variable(0.0, name='u_weight')

Y_ = X*X*W + X*u + b
loss = tf.square(Y - Y_, name='loss')
optimizer = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
init = tf.global_variables_initializer()
loss_average = []

# Start the Session
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        for x, y in dataset:
            print(sess.run([optimizer, Y_, W, b, u, X, Y], feed_dict={X: x, Y: y}))
            loss_average.append(sess.run(loss, feed_dict={X: x, Y: y}))
The final W, b, and u values that I get are nan. I tried to check step by step why this is happening. In the output below I have included [optimizer, Y_, W, b, u, X, Y], and after a few row iterations I get:
[None, 3.9304674e+33, -1.0271335e+33, -7.7725354e+29, -2.8294217e+31, 36.2, 41.]
[None, -1.619979e+36, inf, 3.2321854e+32, 1.2834338e+34, 39.7, 147]
Apparently, during optimization W ends up at inf, which breaks the regression output.
Any idea what I have done wrong?
You have an exploding gradient problem here. That's because your X and Y, and consequently the difference values, are of magnitude 10^1, so the squared differences (your loss) are of magnitude 10^2. When you introduce X^2 into the regression, the difference values will be of magnitude 10^2 and their squares of magnitude 10^4, so the gradients are much larger and the network diverges violently.
To correct for this, you can reduce the learning rate by a factor of 10^-3 to put the gradients roughly back where they were, and lo and behold, this code (tested):
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

DATA = 'slr05.xls'

# Read data
data = xlrd.open_workbook(DATA, encoding_override='utf-8')
sheet = data.sheet_by_index(0)
dataset = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

X = tf.placeholder('float', name='X')
Y = tf.placeholder('float', name='Y')
W = tf.Variable(0.0, name='weights')
b = tf.Variable(0.0, name='bias')
u = tf.Variable(0.0, name='u_weight')

Y_ = X*X*W + X*u + b
#Y_ = X * u + b
loss = tf.square(Y - Y_, name='loss')
optimizer = tf.train.GradientDescentOptimizer(0.0000001).minimize(loss)
init = tf.global_variables_initializer()
loss_average = []

# Start the Session
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        for x, y in dataset:
            print(sess.run([optimizer, loss, Y_, W, b, u, X, Y], feed_dict={X: x, Y: y}))
            loss_average.append(sess.run(loss, feed_dict={X: x, Y: y}))
will obediently and orderly converge, as nice networks do, outputting (last 5 lines only):
[None, 1313.2705, 9.760924, 0.06911032, 0.0014081484, 0.010015297, array(11.9, dtype=float32), array(46., dtype=float32)]
[None, 1174.7083, 7.7259817, 0.06986606, 0.0014150032, 0.010087272, array(10.5, dtype=float32), array(42., dtype=float32)]
[None, 1217.4297, 8.1083145, 0.07066501, 0.0014219815, 0.01016194, array(10.7, dtype=float32), array(43., dtype=float32)]
[None, 657.74097, 8.353538, 0.07126329, 0.0014271108, 0.010217336, array(10.8, dtype=float32), array(34., dtype=float32)]
[None, 299.5538, 1.6923765, 0.07134304, 0.0014305722, 0.010233952, array(4.8, dtype=float32), array(19., dtype=float32)]
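As an aside, an alternative to shrinking the learning rate is to standardize the inputs so that X, X^2, and Y are all of order one, in which case the original learning rate remains usable. A minimal sketch on the same dataset array:

# Standardize both columns of the dataset so the squared terms stay O(1);
# training on dataset_scaled then converges with the original 0.0001 rate.
X_raw, Y_raw = dataset[:, 0], dataset[:, 1]
X_std = (X_raw - X_raw.mean()) / X_raw.std()
Y_std = (Y_raw - Y_raw.mean()) / Y_raw.std()
dataset_scaled = np.stack([X_std, Y_std], axis=1)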

How to optimize a variable with dynamic shape?

Based on the following code, m0 is a constant with shape (3,1), but its shape gets changed inside the while loop.
So after the while loop TensorFlow no longer knows its shape, and I use set_shape to set it to the correct shape.
However, when I run it through the optimization (taking gradients), it throws an error:
Incompatible shapes between op input and calculated input gradient. Forward operation: while_29/Enter_1. Input index: 0. Original input shape: (3, 1). Calculated input gradient shape: (15, 1)
It seems the gradient computation still treats the shape as (3,1), while our set_shape changed it to (15,1). Could anyone please tell me how to fix this?
sess = tf.Session()
i0 = tf.constant(0)
m0 = tf.ones([3, 1])
x = tf.get_variable('www', shape=(3, 1), initializer=tf.zeros_initializer)
loop = 5

def _cond(i0, m0):
    return tf.less(i0, loop - 1)

def _res(i0, m0):
    n = tf.ones([3, 1]) + x
    m0 = tf.concat([m0, n], axis=0)
    return i0 + 1, m0

i0, m0 = tf.while_loop(
    _cond, _res, loop_vars=[i0, m0],
    shape_invariants=[i0.get_shape(), tf.TensorShape([None, 1])])
m0.set_shape([loop * 3, 1])

opt = tf.train.AdagradOptimizer(1)
grad = opt.compute_gradients(m0)

sess.run(tf.global_variables_initializer())
print(sess.run(grad))
The short answer is that your problem can be efficiently solved by creating m0 like this:
m0 = 1 + tf.tile(x, (loop, 1))
The underlying problem, however, is that you are growing m0 in a loop even though you already know the size you want m0 to take. So if you really must use a while_loop, you should use a TensorArray instead. Something like this:
import numpy as np

def mystack(x, n):
    loop_vars = [
        tf.constant(0, tf.int32),
        tf.TensorArray(x.dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, 1 + x)),
        loop_vars
    )
    return tf.reshape(fx.stack(), (-1, 1))

x = tf.constant(np.random.randn(3, 1), tf.float32)
loop = 5
m = mystack(x, loop)

with tf.Session() as sess:
    print(sess.run(m).shape)
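Either way, you can check that gradients with respect to x are now well-defined, e.g. with the one-line tile version:

m0 = 1 + tf.tile(x, (loop, 1))           # shape (15, 1), gradients flow to x
g = tf.gradients(tf.reduce_sum(m0), x)   # each entry of x appears `loop` times
with tf.Session() as sess:
    print(sess.run(g))                   # entries of 5.0, shape (3, 1)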

Tensorflow: how to deal with dynamic shape trying to tile and concatenate two tensor?

I have two tensors, for example x = [1,2] and y = [3], and I want to replicate the latter along an axis of the former, obtaining z = [[1,3],[2,3]]. Ideally, in TensorFlow:
x = tf.placeholder(shape=[None, 2], dtype = tf.float32)
y = tf.placeholder(shape=[1], dtype = tf.float32)
z = tf.concat(x, tf.tile(y, [ x.shape[0] ]) , 1)
The problem is that the first dimension of the x placeholder is not determined; how can I fix this?
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

x = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y = tf.placeholder(shape=[1], dtype=tf.float32)

dim = tf.shape(x)[0]             # runtime batch size
y1 = tf.expand_dims(y, axis=1)   # shape [1, 1]
y1 = tf.tile(y1, [dim, 1])       # shape [batch, 1]
z = tf.concat((x, y1), axis=1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    z_val = sess.run(z, feed_dict={x: [[2, 5], [5, 7], [8, 9]], y: [3]})
    print(z_val)
Output:
[[ 2. 5. 3.]
[ 5. 7. 3.]
[ 8. 9. 3.]]
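A broadcasting variant that avoids the explicit tile should work as well:

# Multiply y (shape [1]) by a [batch, 1] column of ones; broadcasting
# produces the tiled column without looking up the batch size explicitly.
y1 = y * tf.ones_like(x[:, :1])
z = tf.concat((x, y1), axis=1)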