I'm trying to obtain a matrix, where each element is calculated as follows:
import torch
batch_size, dim = 4, 3  # example sizes; the question leaves these unspecified
X = torch.ones(batch_size, dim)
X_ = torch.ones(batch_size, dim)
Y = torch.ones(batch_size, dim)
M = torch.zeros(batch_size, batch_size)
for i in range(batch_size):
    for j in range(batch_size):
        M[i, j] = ((X[i] - X_[i] * Y[j])**2).sum()
Computing M element by element like this is very slow. Is there a way to replace the for loops with matrix multiplication?
If you want to sum() over dim, you can "lift" your 2D problem to 3D and sum there:
M = ((X[:, None, :] - X_[:, None, :] * Y[None, ...])**2).sum(dim=2)
How it works:
X[:, None, :] and X_[:, None, :] are 3D tensors of size (batch_size, 1, dim), and Y[None, ...] has size (1, batch_size, dim).
When computing X_[:, None, :] * Y[None, ...], PyTorch broadcasts the dimensions of size 1 to match the other operand, giving a result of size (batch_size, batch_size, dim); subtracting it from X[:, None, :] broadcasts the same way.
Finally, you sum() only over the last dimension (dim=2) to get an output M of size (batch_size, batch_size).
The trick here is done by taking advantage of broadcasting.
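Since the question asks about matrix multiplication specifically, expanding the square gives M[i, j] = sum_k X[i,k]^2 - 2 sum_k X[i,k] X_[i,k] Y[j,k] + sum_k X_[i,k]^2 Y[j,k]^2, so M can be assembled from two matrix products without ever building the (batch_size, batch_size, dim) intermediate. The sketch below is tested with NumPy; it only uses operators (`**`, `*`, `@`, `.T`, `.sum`, `[:, None]`) that PyTorch tensors also support, so the same function bodies should work on torch tensors, though that is an assumption not exercised here. The random inputs are arbitrary.

```python
import numpy as np

def pairwise_loop(X, X_, Y):
    """Reference: the original double loop."""
    n = X.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = ((X[i] - X_[i] * Y[j]) ** 2).sum()
    return M

def pairwise_broadcast(X, X_, Y):
    """Broadcasting version: builds a (n, n, dim) intermediate."""
    return ((X[:, None, :] - X_[:, None, :] * Y[None, ...]) ** 2).sum(2)

def pairwise_matmul(X, X_, Y):
    """Expanded-square version: only (n, n) matrices are formed."""
    term_x = (X ** 2).sum(1)[:, None]   # sum_k X[i,k]^2, shape (n, 1)
    cross = (X * X_) @ Y.T              # sum_k X[i,k] X_[i,k] Y[j,k]
    term_y = (X_ ** 2) @ (Y ** 2).T     # sum_k X_[i,k]^2 Y[j,k]^2
    return term_x - 2 * cross + term_y

rng = np.random.default_rng(0)
batch_size, dim = 5, 3
X, X_, Y = (rng.standard_normal((batch_size, dim)) for _ in range(3))
M_loop = pairwise_loop(X, X_, Y)
print(np.allclose(M_loop, pairwise_broadcast(X, X_, Y)))  # True
print(np.allclose(M_loop, pairwise_matmul(X, X_, Y)))     # True
```

The expanded form saves memory for large dim, at the cost of some floating-point cancellation, so it agrees with the broadcast version only up to rounding.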
I have recently started learning about semantic segmentation and am trying to train a UNet for it. My inputs are RGB 128x128x3 images. My masks contain 4 classes (0, 1, 2, 3) and are one-hot encoded with dimension 128x128x4.
import tensorflow as tf
from tensorflow.keras import backend as K
def weighted_cce(y_true, y_pred):
    weights = []
    t_inf = tf.convert_to_tensor(1e9, dtype = 'float32')
    t_zero = tf.convert_to_tensor(0, dtype = 'int64')
    for i in range(0, 4):
        l = tf.argmax(y_true, axis = -1) == i
        n = tf.cast(tf.math.count_nonzero(l), 'float32') + K.epsilon()
        weights.append(n)
    weights = [batch_size/j for j in weights]
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
    # clip to prevent NaN's and Inf's
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    # calc
    loss = y_true * K.log(y_pred) * weights
    loss = -K.sum(loss, -1)
    return loss
This is the loss function that I am using but it classifies every pixel as 2. What am I doing wrong?
You should compute the weights from your entire dataset (unless your batch size is large enough to give reasonably stable weights).
If a class is underrepresented and the batch size is small, that class will get near-infinite weights.
If your target data is a NumPy array:
import numpy as np
shp = y_train.shape
totalPixels = shp[0] * shp[1] * shp[2]
weights = np.sum(y_train, axis=(0, 1, 2)) #final shape (4,)
weights = totalPixels/weights
If your data is in a Sequence generator:
totalPixels = 0
counts = np.zeros((4,))
for i in range(len(generator)):
    x, y = generator[i]
    shp = y.shape
    totalPixels += shp[0] * shp[1] * shp[2]
    counts = counts + np.sum(y, axis=(0,1,2))
weights = totalPixels / counts
If your data comes from a yield-based generator (you must know how many batches there are in an epoch):
totalPixels = 0
counts = np.zeros((4,))
for i in range(batches_per_epoch):
    x, y = next(generator)
    # the rest of the loop body is the same as in the Sequence example above
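To check that the whole-array and batch-accumulated computations above yield the same weights, here is a small NumPy sketch. The random one-hot masks and the `batches` list (a stand-in for a Sequence or generator) are assumptions for illustration; only the `totalPixels / counts` formula comes from the answer. Note that a class that never appears would give a division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 4
# Hypothetical masks: 6 images of 8x8 pixels, with class 3 made deliberately rare.
labels = rng.choice(num_classes, size=(6, 8, 8), p=[0.4, 0.3, 0.25, 0.05])
y_train = np.eye(num_classes)[labels]            # one-hot, shape (6, 8, 8, 4)

# Whole-array version.
shp = y_train.shape
total_pixels = shp[0] * shp[1] * shp[2]
weights_full = total_pixels / np.sum(y_train, axis=(0, 1, 2))

# Batch-accumulated version (batches of 2 images), as with a generator.
batches = [y_train[i:i + 2] for i in range(0, len(y_train), 2)]
total, counts = 0, np.zeros((num_classes,))
for y in batches:
    total += y.shape[0] * y.shape[1] * y.shape[2]
    counts += np.sum(y, axis=(0, 1, 2))
weights_acc = total / counts

print(np.allclose(weights_full, weights_acc))  # True
```

The rarest class in the data always receives the largest weight, since each weight is the pixel total divided by that class's count.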
Attempt 1
I don't know whether newer versions of Keras can handle this, but you can try the simplest approach first: call fit or fit_generator with the class_weight argument:
model.fit(...., class_weight = {0: weights[0], 1: weights[1], 2: weights[2], 3: weights[3]})
Attempt 2
Make a healthier loss function:
from tensorflow.keras import backend as K
weights = weights.reshape((1,1,1,4))
kWeights = K.constant(weights)
def weighted_cce(y_true, y_pred):
    yWeights = kWeights * y_pred #shape (batch, 128, 128, 4)
    yWeights = K.sum(yWeights, axis=-1) #shape (batch, 128, 128)
    loss = K.categorical_crossentropy(y_true, y_pred) #shape (batch, 128, 128)
    wLoss = yWeights * loss
    return K.sum(wLoss, axis=(1,2))
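A NumPy reference can help sanity-check a Keras loss like this on small arrays. This sketch weights each pixel by the weight of its true class (the weights indexed through `y_true`), which is a common formulation of class-weighted cross-entropy. It differs from the answer's `kWeights * y_pred`, so treat it as a variant for comparison, not a transcription. The function name `weighted_cce_reference` is hypothetical, and the shapes follow the question's (batch, H, W, 4) layout.

```python
import numpy as np

def weighted_cce_reference(y_true, y_pred, weights, eps=1e-7):
    """Per-image sum of class-weighted pixel cross-entropy.

    y_true, y_pred: arrays of shape (batch, H, W, C); y_true is one-hot.
    weights: array of shape (C,), e.g. total_pixels / class_counts.
    """
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)  # renormalize
    y_pred = np.clip(y_pred, eps, 1 - eps)                # avoid log(0)
    pixel_ce = -(y_true * np.log(y_pred)).sum(axis=-1)    # (batch, H, W)
    pixel_w = (y_true * weights).sum(axis=-1)             # weight of the true class
    return (pixel_w * pixel_ce).sum(axis=(1, 2))          # (batch,)

rng = np.random.default_rng(1)
y_true = np.eye(4)[rng.integers(0, 4, size=(2, 3, 3))]   # (2, 3, 3, 4)
logits = rng.standard_normal((2, 3, 3, 4))
y_pred = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

uniform = weighted_cce_reference(y_true, y_pred, np.ones(4))
print(uniform.shape)  # (2,)
```

With all weights equal to 1 it reduces to the plain per-image cross-entropy sum, which is a convenient first check before plugging in dataset weights.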
I have a neural network with the following code snippets; note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape = [batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape = [batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)
# Some matmuls, activations, dropouts, normalizations...
net_out = tf.tanh(output_before_activation)
def loss_fn(output, input):
    #input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim,]) # shape them into 1d vectors
    input = tf.reshape(input, [input_dim,])
    return my_fn_that_only_takes_in_vectors(output, input)
train_op = optimizer.minimize(loss_fn(net_out, net_in))  # build the training op once, not inside the loop
# Create session, preprocess data ...
for epoch in range(epoch_num):
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict = {input_placeholder : some_appropriate_numpy_array})
        sess.run(train_op)
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accept matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How should I write my new custom loss_fn now that the input and output are N x input_dim with N > 1? In Keras this would not have been an issue, because Keras somehow averages the gradients of the examples in the batch. For my TensorFlow function, should I take each row as a separate vector, pass it to my_fn_that_only_takes_in_vectors, and then average the results?
You can use a function that computes the loss on the whole batch and works independently of the batch size. The operations are applied along the whole first dimension of the input (the first dimension indexes the elements of the batch). Here is an example; I hope it helps show how the operations are carried out:
import tensorflow as tf
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2]) # shape: (BatchSize, )
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3]) # shape: (BatchSize, )
    denominator = dx2 + dy2 # shape: (BatchSize, )
    dst_vec = tf.math.squared_difference(y_true, y_pred) # shape: (Batch, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1) # shape: (BatchSize,)
    loss_vector = tf.cast(numerator / denominator, dtype="float32") # shape: (BatchSize,) the loss of each element of the batch
    loss = tf.reduce_sum(loss_vector) # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the average of the losses for the batch.
If you sum, make sure to use a validation dataset with the same batch size; otherwise the losses are not comparable.
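To the question's last point: applying a vector-only loss row by row and averaging gives the same number as writing the loss with reductions over the last axis, as `my_loss` does. Averaging per-example losses is also what Keras does by default, and since the gradient of a mean is the mean of the gradients, that is the "average of the gradients" the question mentions. The NumPy sketch below uses a hypothetical `vector_loss` as a stand-in for `my_fn_that_only_takes_in_vectors`; in TensorFlow the row-by-row route could be written with `tf.map_fn` over the rows followed by `tf.reduce_mean`, though the vectorized form is typically faster.

```python
import numpy as np

def vector_loss(output, target):
    """Hypothetical stand-in for my_fn_that_only_takes_in_vectors:
    accepts two 1-D vectors of length input_dim and returns a scalar."""
    assert output.ndim == 1 and target.ndim == 1
    return np.sum((output - target) ** 2) / (np.sum(target ** 2) + 1e-8)

def batch_loss_loop(outputs, targets):
    """Apply the vector-only loss to each row, then average over the batch."""
    per_example = [vector_loss(o, t) for o, t in zip(outputs, targets)]
    return np.mean(per_example)

def batch_loss_vectorized(outputs, targets):
    """Same loss written with reductions over the last axis, so every
    operation acts on the whole batch at once."""
    num = np.sum((outputs - targets) ** 2, axis=-1)  # (batch,)
    den = np.sum(targets ** 2, axis=-1) + 1e-8       # (batch,)
    return np.mean(num / den)

rng = np.random.default_rng(2)
outputs = rng.standard_normal((8, 5))  # batch of 8, input_dim = 5
targets = rng.standard_normal((8, 5))
print(np.isclose(batch_loss_loop(outputs, targets),
                 batch_loss_vectorized(outputs, targets)))  # True
```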
I would like to map a TensorFlow function on each vector corresponding to the depth channel of every pixel in a matrix with dimension [batch_size, H, W, n_channels].
In other words, for every image of size H x W in the batch:
I extract some feature maps F_k (n_channels of them) with the same size H x W (hence, the feature maps together form a tensor of shape [H, W, n_channels]);
then, I wish to apply a custom function to the vector v_ij associated with the i-th row and j-th column of each feature map F_k, spanning the depth channel in its entirety (i.e. v_ij has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.
A picture to explain the process can be found below. The only difference with the picture is that both input and output "receptive fields" have size 1x1 (apply the function to each pixel independently).
This would be similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel, rather than a simple sum operation.
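The 1x1-convolution analogy can be made concrete: a 1x1 convolution without bias multiplies every pixel's channel vector by the same [n_channels, n_out] matrix, which is exactly "apply a function to each v_ij". The NumPy sketch below (array sizes and the matrix `kernel` are arbitrary assumptions) checks that a per-pixel loop, an `einsum` over the channel axis, and a reshape to [-1, n_channels] all agree; the reshape form is the one that generalizes to a non-linear per-pixel function.

```python
import numpy as np

rng = np.random.default_rng(3)
batch, H, W, C, C_out = 2, 4, 5, 3, 6
x = rng.standard_normal((batch, H, W, C))
kernel = rng.standard_normal((C, C_out))  # hypothetical 1x1 kernel, no bias

# 1) Explicit loop over pixels: apply the map to each channel vector v_ij.
loop = np.empty((batch, H, W, C_out))
for b in range(batch):
    for i in range(H):
        for j in range(W):
            loop[b, i, j] = x[b, i, j] @ kernel

# 2) einsum: contract only the channel axis, keep batch/height/width.
via_einsum = np.einsum('bhwc,cd->bhwd', x, kernel)

# 3) Flatten pixels into rows, apply once, restore the spatial shape.
via_reshape = (x.reshape(-1, C) @ kernel).reshape(batch, H, W, C_out)

print(np.allclose(loop, via_einsum), np.allclose(loop, via_reshape))  # True True
```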
I think tf.map_fn() could be an option, and I tried the following solution, where I recursively use tf.map_fn() to access the features associated with each pixel. However, this seems sub-optimal, and, most importantly, it raises an error when trying to backpropagate the gradients.
Do you have any idea why this happens and how I should structure my code to avoid the error?
This is my current implementation of the function:
import tensorflow as tf
from tensorflow import layers
def apply_function_on_pixel_features(incoming):
    # at first the input is [None, W, H, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    # here the input is [n_channels]
    # apply some function that applies a transformation and returns a vector of the same size
    output = my_custom_fun(incoming) # my_custom_fun() doesn't change the shape
    return output
and the body of my code:
H = 128
W = 132
n_channels = 8
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.nn.softmax(x3)
loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss) # <--- ERROR HERE!
Specifically, the error is the following:
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
AttributeError: 'NoneType' object has no attribute 'op'
The whole error stack and the code can be found here.
Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work; any insight on this would still be much appreciated.
@gabriele, regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.
def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply function on every vector of shape [C] (one row per pixel)
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat) # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
The full code of what I tested is below.
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers, tf.Session)
from tensorflow.keras.losses import categorical_crossentropy
def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply function on every vector of shape [C] (one row per pixel)
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat) # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
H = 32
W = 32
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')
loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)
x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0,1], size=(10, 10))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize conv/dense/Adam variables
    sess.run(train_op, feed_dict={x1: x, labels:y})
Following @thushv89's suggestion, I reshaped the array, applied the function and then reshaped it back (to avoid the tf.map_fn recursion). I still don't know exactly why the previous code didn't work, but the current implementation allows the gradients to propagate back to the previous layers. I'll leave it below for anyone who might be interested:
def apply_function_on_pixel_features(incoming, batch_size):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])
    # apply function on every vector of shape [C] (one row per pixel)
    out_matrix = my_custom_fun(incoming_flat) # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)
    return out_matrix
Note that I now had to pass the batch size to reshape the tensor correctly, because TensorFlow complained when I gave None or -1 as a dimension.
Any comments and insight on the above code would still be very appreciated.
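The reshape, apply, reshape-back pattern from the answers above can be sketched in NumPy to show that it does not need a hard-coded batch size: with `-1` in both reshapes, the batch dimension is inferred. `tf.reshape` likewise accepts a single `-1` dimension, which is what @gabriele's version relies on. Here `my_custom_fun` is a hypothetical per-pixel function (a row-wise softmax over channels) chosen only for illustration; any function mapping [N, C] to [N, C] row by row would do.

```python
import numpy as np

def my_custom_fun(rows):
    """Hypothetical per-pixel function: takes [N, C] (one channel vector
    per row) and returns [N, C]; here a numerically stable row softmax."""
    z = rows - rows.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def apply_function_on_pixel_features(incoming):
    """Reshape [batch, H, W, C] -> [-1, C], apply, reshape back.
    Using -1 lets the batch size be inferred instead of passed in."""
    _, H, W, C = incoming.shape
    flat = incoming.reshape(-1, C)
    out = my_custom_fun(flat)
    return out.reshape(-1, H, W, C)

rng = np.random.default_rng(4)
for batch in (1, 3):  # the same function works for any batch size
    x = rng.standard_normal((batch, 4, 5, 8))
    y = apply_function_on_pixel_features(x)
    # each pixel's channel vector was transformed independently
    print(y.shape, np.allclose(y.sum(axis=-1), 1.0))
```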
My question is about the equation below.
The equation above is for a single vector. But if I have batches of vectors, e.g. X and Y with shape (None, 32), there is an issue.
Also remember that in code, each example inside the batch is already stored in transposed (row) form. My problem is that transposing a tensor of shape [None, 32] does not work for the None dimension, so I solved it in the following way:
import tensorflow as tf
def Cosine_similarity(X, Y, feature_dim):
    L = tf.compat.v1.initializers.glorot_normal()(shape=[feature_dim, feature_dim])
    out1 = tf.matmul(X, L)
    out2 = tf.matmul(Y, L)
    out_numerator = tf.reduce_sum(tf.multiply(out1, out2), axis = 1)
    out3 = tf.reduce_sum(tf.multiply(out1, out1), axis = 1)
    out3 = tf.sqrt(out3)
    out4 = tf.reduce_sum(tf.multiply(out2, out2), axis = 1)
    out4 = tf.sqrt(out4)
    out_denominator = tf.multiply(out3, out4)
    final_out = tf.divide(out_numerator, out_denominator)
    return final_out
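This follows from the identity below, where a single example X is stored as a row vector:
<XA, YA> = (XA)(YA)^T
and, for a batch of row vectors, the per-example inner products are tf.reduce_sum(tf.multiply((X A), (Y A)), axis = 1), which is out_numerator in the code above.
So I just want to know whether this implementation is right, or whether I am missing something.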
I'm not sure I understand your concern about the None dimension.
If I understand correctly, the cosine similarity between two identically shaped matrices X and Y ([batch, target_dim]) is just the matrix product X * Y^T with some L2 normalization. Note that X would be your out1 and Y would be your out2.
import tensorflow as tf
def Cosine_similarity(x, y, A):
    """Pair-wise Cosine similarity.
    First `x` and `y` are transformed by A.
    `X = xA^T` with shape [batch, target_dim],
    `Y = yA^T` with shape [batch, target_dim].
    Args:
      x: shaped [batch, feature_dim].
      y: shaped [batch, feature_dim].
      A: shaped [target_dim, feature_dim]. Transformation matrix to project
        from `feature_dim` to `target_dim`.
    Returns:
      A cosine similarity matrix shaped [batch, batch]. The entry
      at (i, j) is the cosine similarity value between vector `X[i, :]` and
      `Y[j, :]` where `X`, `Y` are the transformed `x` and `y` by `A`
      respectively. In other words, the entry at (i, j) is the pair-wise
      cosine similarity value between the i-th example of `x` and the j-th
      example of `y`.
    """
    x = tf.matmul(x, A, transpose_b=True)
    y = tf.matmul(y, A, transpose_b=True)
    x_norm = tf.nn.l2_normalize(x, axis=-1)
    y_norm = tf.nn.l2_normalize(y, axis=-1)
    y_norm_trans = tf.transpose(y_norm, [1, 0])
    sim = tf.matmul(x_norm, y_norm_trans)
    return sim
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)
feature_dim = 8
target_dim = 4
batch_size = 2
x = tf.placeholder(tf.float32, shape=(None, feature_dim))
y = tf.placeholder(tf.float32, shape=(None, feature_dim))
A = tf.placeholder(tf.float32, shape=(target_dim, feature_dim))
sim = Cosine_similarity(x, y, A)
with tf.Session() as sess:
    x_val, y_val, sim_val = sess.run([x, y, sim], feed_dict={
        x: np.ones((batch_size, feature_dim)),
        y: np.random.rand(batch_size, feature_dim),
        A: np.random.rand(target_dim, feature_dim)})
    print('x=\n', x_val)
    print('y=\n', y_val)
    print('sim=\n', sim_val)
Example output:
x=
[[ 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1.]]
y=
[[ 0.01471654 0.76577073 0.97747731 0.06429122 0.91344446 0.47987637
0.09899797 0.773938 ]
[ 0.8555786 0.43403915 0.92445409 0.03393625 0.30154493 0.60895061
0.1233703 0.58597666]]
sim=
[[ 0.95917791 0.98181278]
[ 0.95917791 0.98181278]]
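The question's implementation returns one similarity per example pair (shape [batch]), while the answer returns all pairs (shape [batch, batch]); the former is the diagonal of the latter. A NumPy sketch of both, with arbitrary sizes and a random `A` of shape [target_dim, feature_dim] as in the answer's docstring (so the projection is `x @ A.T`). It uses a plain L2 norm; `tf.nn.l2_normalize` additionally guards against zero norms with a small epsilon.

```python
import numpy as np

def rowwise_cosine(x, y, A):
    """Question's approach: similarity of x[i] with y[i] only -> shape [batch]."""
    xa, ya = x @ A.T, y @ A.T
    num = np.sum(xa * ya, axis=1)
    den = np.sqrt(np.sum(xa * xa, axis=1)) * np.sqrt(np.sum(ya * ya, axis=1))
    return num / den

def pairwise_cosine(x, y, A):
    """Answer's approach: similarity of x[i] with y[j] -> shape [batch, batch]."""
    xa, ya = x @ A.T, y @ A.T
    xa = xa / np.linalg.norm(xa, axis=-1, keepdims=True)
    ya = ya / np.linalg.norm(ya, axis=-1, keepdims=True)
    return xa @ ya.T

rng = np.random.default_rng(5)
feature_dim, target_dim, batch_size = 8, 4, 3
x = rng.random((batch_size, feature_dim))
y = rng.random((batch_size, feature_dim))
A = rng.random((target_dim, feature_dim))

pair = pairwise_cosine(x, y, A)
print(pair.shape)                                          # (3, 3)
print(np.allclose(np.diag(pair), rowwise_cosine(x, y, A)))  # True
```

So if only matching pairs are needed, the question's row-wise reduction is sufficient and avoids the [batch, batch] matrix.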
I want to apply tf.nn.softmax() over two selected dimensions of a tensor with shape (batch_size=?, height, width, channel).
But tf.nn.softmax() does not seem to accept two axes at the same time: calling tf.nn.softmax(tensor, axis=[1, 2]) raises an axis error in TensorFlow.
How can I implement this elegantly, in a vectorized way?
Instead of passing two dimensions at a time, I would first reshape the input accordingly, e.g.:
import tensorflow as tf
array = tf.constant([[1., 2.], [3., 4.]])
tf.nn.softmax(array, axis=0) # Calculate for axis 0
tf.nn.softmax(array, axis=1) # Calculate for axis 1
tf.reshape(tf.nn.softmax(tf.reshape(array, [-1])), tf.shape(array)) # Calculate for both axes, then restore the [2, 2] shape
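For the 4-D case in the question, axes 1 and 2 are adjacent, so they can be merged into one axis of size H*W, softmaxed there, and split again; the batch and channel axes stay separate. A NumPy sketch (shapes are arbitrary) compared against a direct joint softmax over axes (1, 2); in TensorFlow the same steps would be `tf.reshape(x, [-1, H * W, C])`, `tf.nn.softmax(..., axis=1)` and a reshape back to `[-1, H, W, C]`.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax; `axis` may be an int or a tuple of ints."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_hw_via_reshape(x):
    """Softmax jointly over axes 1 and 2 of a [batch, H, W, C] array."""
    b, h, w, c = x.shape
    merged = x.reshape(b, h * w, c)  # H and W are adjacent, so they merge cleanly
    return softmax(merged, axis=1).reshape(b, h, w, c)

rng = np.random.default_rng(6)
logits = rng.standard_normal((2, 3, 4, 5))  # [batch, H, W, C]
prob = softmax_hw_via_reshape(logits)

print(np.allclose(prob, softmax(logits, axis=(1, 2))))  # True
print(np.allclose(prob.sum(axis=(1, 2)), 1.0))          # True: sums to 1 per (batch, channel)
```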
You can do:
import numpy as np
import tensorflow as tf
array = np.random.rand(1, 2, 2, 1)
s1 = tf.nn.softmax(array, axis=1)
s2 = tf.nn.softmax(array, axis=2)
rs = tf.reduce_sum([s1, s2], 0)
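This returns a tensor with the same shape as the initial array. Note, however, that it is the sum of two independent softmaxes (one over axis 1, one over axis 2), not a single softmax over both axes, so its values do not sum to 1 over axes 1 and 2 jointly.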
It can be done with the Keras softmax activation function, which accepts a list of axes:
import tensorflow as tf
# logits has shape [BS, H, W, CH]
prob = tf.keras.activations.softmax(logits, axis=[1, 2])
# prob has shape [BS, H, W, CH]
check = tf.reduce_sum(prob, axis=[1, 2])
# check is tensor of ones with shape [BS, CH]