Given a tensor (with numbers >= 0) in TensorFlow, I need to shift all zeros to the end of each row and remove columns that only include 0's.
E.g.
0 2 3 4
0 1 0 5
2 3 1 0
should be transformed to
2 3 4
1 5 0
2 3 1
Is there any nice way to do this in tensorflow? Btw, the order of the non-zero elements should be the same (no sorting).
Ragged tensor method
The best way
def rm_zeros(pred):
    pred = tf.cast(pred, tf.float32)
    # number of non-zero elements in every row
    num_non_zero = tf.math.count_nonzero(pred, -1)  # [3 2 3]
    # flatten the input and remove all zeros
    flat_pred = tf.reshape(pred, [-1])
    mask = tf.math.logical_not(tf.equal(flat_pred, tf.zeros_like(flat_pred)))
    flat_pred_without_zero = tf.boolean_mask(flat_pred, mask)  # [2. 3. 4. 1. 5. 2. 3. 1.]
    # create a ragged tensor and convert it back to a dense tensor;
    # rows will be padded to the max length
    ragged_pred = tf.RaggedTensor.from_row_lengths(values=flat_pred_without_zero, row_lengths=num_non_zero)
    padded_pred = ragged_pred.to_tensor(default_value=0.)
    return padded_pred
a = tf.constant([[0, 2, 3, 4],[0, 1, 0, 5],[2, 3, 1, 0]])
print(rm_zeros(a))
output
tf.Tensor(
[[2. 3. 4.]
 [1. 5. 0.]
 [2. 3. 1.]], shape=(3, 3), dtype=float32)
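By the way, if your TensorFlow version has tf.ragged.boolean_mask (TF 2.x), the same ragged-tensor idea can be written in one step. A minimal sketch:
import tensorflow as tf

def rm_zeros_ragged(pred):
    pred = tf.cast(pred, tf.float32)
    # Keep only the non-zero entries of each row as a ragged tensor, then pad
    # every row back to the length of the longest one.
    return tf.ragged.boolean_mask(pred, tf.not_equal(pred, 0.)).to_tensor(default_value=0.)

a = tf.constant([[0, 2, 3, 4], [0, 1, 0, 5], [2, 3, 1, 0]])
print(rm_zeros_ragged(a))
# tf.Tensor(
# [[2. 3. 4.]
#  [1. 5. 0.]
#  [2. 3. 1.]], shape=(3, 3), dtype=float32)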
Sorted method
If you don't mind the original data getting sorted, the code below might be helpful, although it's not the best solution.
The idea here is
1. change all zeros to infinity
2. sort the tensor
3. change all infinity back to zeros
4. slice the tensor to get minimal padding
def rm_zeros_sorted(input):
    input = tf.cast(input, tf.float32)
    # 1. change all zeros to infinity
    zero_to_inf = tf.where(tf.equal(input, tf.zeros_like(input)), float('inf') * tf.ones_like(input), input)
    # 2. sort the tensor
    input_sorted = tf.sort(zero_to_inf, axis=-1, direction='ASCENDING')
    # 3. change all infinity back to zeros
    inf_to_zero = tf.where(tf.math.is_inf(input_sorted), tf.zeros_like(input_sorted), input_sorted)
    # 4. slice the tensor to get minimal padding
    num_non_zero = tf.math.count_nonzero(inf_to_zero, -1)
    max_non_zero = tf.reduce_max(num_non_zero)
    remove_useless_zero = inf_to_zero[..., 0:max_non_zero]
    return remove_useless_zero
a = tf.constant([[0, 2, 3, 4],[0, 1, 0, 5],[2, 3, 1, 0]])
print(rm_zeros_sorted(a))
output
tf.Tensor(
[[2. 3. 4.]
[1. 5. 0.]
[1. 2. 3.]], shape=(3, 3), dtype=float32)
The code below gets the trick done, although I'm sure that more elegant solutions are possible and I'm curious to see them. The annoying part is that each row has a different number of zeros.
a = tf.constant([[0, 2, 3, 4],[0, 1, 0, 5],[2, 3, 1, 0]])
boolean_mask = tf.logical_not(tf.equal(a, tf.zeros_like(a)))
# all the non-zero values in a flat tensor
non_zero_values = tf.gather_nd(a, tf.where(boolean_mask))
# number of non-zero values in each row
n_non_zero = tf.reduce_sum(tf.cast(boolean_mask, tf.int64), axis=-1)
# max number of non-zeros -> this will be the padding length
max_non_zero = tf.reduce_max(n_non_zero).numpy()
(Here it gets ugly)
# Split the tensor into flat tensors with the non-zero values of each row
rows = tf.split(non_zero_values, n_non_zero)
# Pad with zeros wherever necessary and recombine into a single tensor
tf.stack([tf.pad(r, paddings=[[0, max_non_zero - r.get_shape().as_list()[0]]]) for r in rows])
Produces the desired result:
<tf.Tensor: id=49, shape=(3, 3), dtype=int32, numpy=
array([[2, 3, 4],
[1, 5, 0],
[2, 3, 1]], dtype=int32)>
def shift_zeros(data, mask):
    # Flatten all non-zero values in row-major order.
    data_flat = tf.boolean_mask(data, mask)
    # Number of non-zero entries per row.
    nonzero_lens = tf.reduce_sum(tf.cast(mask, dtype=tf.int32), axis=-1)
    # Mask selecting the first nonzero_lens[i] positions of each row.
    nonzero_mask = tf.sequence_mask(nonzero_lens, maxlen=tf.shape(mask)[-1])
    # Scatter the non-zero values into those leading positions; the rest stay 0.
    nonzero_data = tf.scatter_nd(tf.cast(tf.where(nonzero_mask), dtype=tf.int32), data_flat, shape=tf.shape(data))
    return nonzero_data
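A quick usage sketch in TF 2.x eager mode (assumed), trimming the all-zero columns afterwards as in the other answers:
import tensorflow as tf

a = tf.constant([[0, 2, 3, 4], [0, 1, 0, 5], [2, 3, 1, 0]])
mask = tf.not_equal(a, 0)
shifted = shift_zeros(a, mask)  # zeros moved to the end of each row
max_non_zero = tf.reduce_max(tf.reduce_sum(tf.cast(mask, tf.int32), axis=-1))
print(shifted[:, :max_non_zero])
# tf.Tensor(
# [[2 3 4]
#  [1 5 0]
#  [2 3 1]], shape=(3, 3), dtype=int32)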
Related
Can anyone help me understand masking a 3D input (technically 4D) in MultiHeadAttention?
My original dataset consists of timeseries in the form of:
Inputs: (samples, horizon, features) ~> (8, 4, 2) ~> K, V, Q during inference
Targets: (samples, horizon, features) ~> (8, 4, 2) ~> Q during training
Labels: (sample, horizon, features) ~> (1, 4, 2)
Essentially I'm taking 8 samples of timeseries data and ultimately outputting 1 sample in the same format. Targets are horizon-shifted values of Inputs and fed into an encoder-only Transformer model (Q, K, V as shown above).
In order to best approximate the single output sample (which is identical to the last sample in Targets), I need to run full attention on the horizons of each sample and causal attention between samples. Once the data has been run through the encoder, it is sent to an EinsumDense layer which reduces the (8, 4, 2) encoder output into (1, 4, 2). In order for all this to work, I need to inject a 4th dimension on my data, so Inputs and Targets are formatted as (1, 8, 4, 2).
So, getting to my actual question: how do I generate the masking for the encoder? After some digging through the errors, I noticed that the tensor MHA uses for masking the softmax has shape (1, 1, 8, 4, 8, 4), which makes me believe it's (B, H, TS, TH, SS, SH) where:
B=batch
H=heads
TS=target samples
TH=target horizon
SS=source samples
SH=source horizon
I gather this notion from the docs only because of the attention_output description:
...where T is for target sequence shapes
Assuming this to be the case, is the following a reasonable mask, or is there a more appropriate method:
sample_mask = tf.linalg.band_part(tf.ones((samples, samples)), -1, 0)
horizon_mask = tf.ones((horizon, horizon))
encoder_mask = (
sample_mask[:, tf.newaxis, :, tf.newaxis]
* horizon_mask[tf.newaxis, :, tf.newaxis, :]
)
It is masking; you can build it in several ways, since the data can be arranged in many fashions, and there is nothing wrong with your approach. Here I try the TensorFlow methods instead (the TensorFlow Masking layer); please see the result, the masks have the same dimensions.
Sample: the snippet below simply builds an input with your target shapes, constructs your encoder mask, and applies a Masking layer so you can compare the two.
import tensorflow as tf
import matplotlib.pyplot as plt
start = 3
limit = 25
delta = 3
sample = tf.range(start, limit, delta)
sample = tf.cast(sample, dtype=tf.int64)
sample = tf.reshape(sample, (8, 1))
horizon = tf.random.uniform(shape=[1, 4], minval=5, maxval=10, dtype=tf.int64)
features = tf.random.uniform(shape=[1, 1, 2], minval=-5, maxval=+5, dtype=tf.int64)
temp = tf.math.multiply(sample, horizon)
temp = tf.expand_dims(temp, axis=2)
input = tf.math.multiply(temp, features)
print( "input: " )
print( input )
n_samples = 8
n_horizon = 4
n_features = 2
sample_mask = tf.linalg.band_part(tf.ones((n_samples, n_samples)), -1, 0)
horizon_mask = tf.ones((n_horizon, n_horizon))
encoder_mask = (
sample_mask[:, tf.newaxis, :, tf.newaxis]
* horizon_mask[tf.newaxis, :, tf.newaxis, :]
)
print( encoder_mask )
masking_layer = tf.keras.layers.Masking(mask_value=50, input_shape=(n_horizon, n_features))
print( masking_layer(input) )
img_1 = tf.keras.preprocessing.image.array_to_img(
    tf.reshape(input[:, :, 1], (8, 4, 1)),
    data_format=None,
    scale=True
)
img_2 = tf.keras.preprocessing.image.array_to_img(
    tf.reshape(masking_layer(input)[:, :, 0], (8, 4, 1)),
    data_format=None,
    scale=True
)
plt.figure(figsize=(1, 2))
plt.title("🧸")
plt.subplot(1, 2, 1)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(img_1)
plt.xlabel("Input (8, 4, 2), left")
plt.subplot(1, 2, 2)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(img_2)
plt.xlabel("Masks (8, 4, 2), left")
plt.show()
Output: the input tensor built above (truncated to the last sample):
[[ -960 0]
[-1080 0]
[ -960 0]
[ -960 0]]], shape=(8, 4, 2), dtype=int64)
Output: the encoder_mask built as in the question (truncated):
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
...
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]]], shape=(8, 4, 8, 4), dtype=float32)
Output: the result of masking_layer = tf.keras.layers.Masking(mask_value=50, input_shape=(n_horizon, n_features)) (truncated):
[[ -840 0]
[ -945 0]
[ -840 0]
[ -840 0]]
[[ -960 0]
[-1080 0]
[ -960 0]
[ -960 0]]], shape=(8, 4, 2), dtype=int64)
I am doing image semantic segmentation with U-Net. I am confused by the last layers for pixel classification. The U-Net code is like this:
...
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9)
permute = Permute((2,1))(reshape)
activation = Activation('softmax')(permute)
model = Model(input = inputs, output = activation)
return model
...
Can I just reshape without using Permute like this?
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)
Updated:
I found the training result is not right when using the direct reshape:
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)  # the loss does not converge
My ground truth is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
mask = cv2.imread(spath, 0)
seg_labels[:, :, c] += mask  # c: the class index for this mask (assumed to be set elsewhere)
Y.append(seg_labels.reshape(width*height, n_classes))
Why does reshaping directly not work?
You clearly misunderstand the meaning of each operation and the final goal:
final goal: classification for each pixel, i.e. softmax along the semantic class axis
how to achieve this goal in the original code? Let's see the code line by line:
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9) # L1
permute = Permute((2,1))(reshape) # L2
activation = Activation('softmax')(permute) # L3
L1's output dim = n_class-by-n_pixs, (n_pixs=img_rows x img_cols)
L2's output dim = n_pixs-by-n_class
L3's output dim = n_pixs-by-n_class
Note the default softmax activation is applied to the last axis, i.e. the axis that n_class stands for, which is the semantic class axis.
Therefore, this original code fulfills the final goal of semantic segmentation.
Let's revisit the code that you want to change, which is
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9) # L4
L4's output dim = n_pixs-by-n_class
My guess is that you think L4's output dim matches L2's, and thus L4 is a short-cut that is equivalent to executing L1 and L2.
However, matching the shape does not necessarily mean matching the physical meaning of axes. Why? A simple example will explain.
Say you have 2 semantic classes and 3 pixels. To see the difference assume all three pixels belong to the same class.
In other words, a ground truth tensor will look like this
# cls#1 cls#2
[ [0, 1], # pixel #1
[0, 1], # pixel #2
[0, 1], # pixel #3
]
Assume you have a perfect network that generates the exact response for each pixel; your solution, however, will create a tensor like the one below
# cls#1 cls#2
[ [0, 0], # pixel #1
[0, 1], # pixel #2
[1, 1], # pixel #3
]
whose shape is the same as the ground truth's, but fails to match the physical meaning of axes.
This further makes the softmax operation meaningless, because it is supposed to apply to the class dimension, but this dimension does not physically exist. As a result, it leads to the following erroneous output after applying softmax,
# cls#1 cls#2
[ [0.5, 0.5], # pixel #1
[0, 1], # pixel #2
[0.5, 0.5], # pixel #3
]
which completely messes up the training, even under this ideal assumption.
Therefore, it is a good habit to write down the physical meaning of each axis of a tensor. When you do any tensor reshape operation, ask yourself whether the physical meaning of an axis is changed in your expected way.
For example, if you have a tensor T of shape batch_dim x img_rows x img_cols x feat_dim, you can do many things and not all of them make sense (due to the problematic physical meaning of axes)
(Wrong) reshape it to whatever x feat_dim, because the whatever dimension is meaningless at test time, where the batch_size might be different.
(Wrong) reshape it to batch_dim x feat_dim x img_rows x img_cols, because the 2nd dimension is NOT the feature dimension, and neither are the 3rd and 4th.
(Correct) permute axes (3,1,2); this gives you a tensor of shape batch_dim x feat_dim x img_rows x img_cols while keeping the physical meaning of each axis.
(Correct) reshape it to batch_dim x whatever x feat_dim. This is also valid, because the whatever=img_rows x img_cols is equivalent to the pixel location dimension, and both the meanings of batch_dim and feat_dim are unchanged.
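If it helps to see this concretely, here is a small eager-mode check of that axis bookkeeping (a sketch; it assumes a channels-first conv9 of shape (batch, n_classes, rows, cols), as the original Reshape((n_classes, rows*cols)) implies):
import tensorflow as tf
from tensorflow.keras.layers import Permute, Reshape

n_classes, rows, cols = 2, 2, 3
# Stand-in for conv9: class 0 holds the values 0..5, class 1 holds 6..11.
conv9 = tf.reshape(tf.range(n_classes * rows * cols, dtype=tf.float32),
                   (1, n_classes, rows, cols))
x = Reshape((n_classes, rows * cols))(conv9)  # (1, n_classes, n_pixs)
x = Permute((2, 1))(x)                        # (1, n_pixs, n_classes)
print(x[0].numpy())
# Each row now holds the n_classes values of one pixel, e.g. [0., 6.], so a
# softmax over the last axis is a per-pixel class softmax. A direct
# Reshape((rows * cols, n_classes)) would instead fill each row with values
# taken in memory order rather than with per-pixel class scores.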
Your code will still run since the shapes match, but the result (and the backprop) will be different because the values of the tensors will be different. For example:
import numpy as np

arr = np.array([[[1,1,1],[1,1,1]],[[2,2,2],[2,2,2]],[[3,3,3],[3,3,3]],[[4,4,4],[4,4,4]]])
arr.shape
>>>(4, 2, 3)
# do reshape, then permute
reshape_1 = arr.reshape((4, 2*3))
np.swapaxes(reshape_1, 1, 0)
>>>array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
#do reshape directly
reshape_2 = arr.reshape(2*3, 4)
reshape_2
>>>array([[1, 1, 1, 1],
[1, 1, 2, 2],
[2, 2, 2, 2],
[3, 3, 3, 3],
[3, 3, 4, 4],
[4, 4, 4, 4]])
The Reshape and Permute are done to take the softmax at each pixel location. Adding to @meowongac's answer, Reshape preserves the order of the elements. In this case, since the channel dimension has to be moved to the end, Reshape followed by Permute is appropriate.
Considering the case of (2,2) image with 3 values at each location,
arr = np.array([[[1,1],[1,1]],[[2,2],[2,2]],[[3,3],[3,3]]])
>>> arr.shape
(3, 2, 2)
>>> arr
array([[[1, 1],
[1, 1]],
[[2, 2],
[2, 2]],
[[3, 3],
[3, 3]]])
>>> arr[:,0,0]
array([1, 2, 3])
The channel values at each location are [1,2,3]. The goal is to move the channel axis (length 3) to the end.
>>> arr.reshape((2,2,3))[0,0]
array([1, 1, 1]) # incorrect
>>> arr.transpose((1,2,0))[0,0] # similar to what permute does.
array([1, 2, 3]) # correct
More examples at this link: https://discuss.pytorch.org/t/how-to-change-shape-of-a-matrix-without-dispositioning-the-elements/30708
I am trying to port some PyTorch code to TensorFlow. I came to know that torch.nn.functional.conv1d() corresponds to tf.nn.conv1d(), but I am afraid there are still some discrepancies between the two. Specifically, I cannot find the groups parameter in tf.nn.conv1d. For example, the following code snippets output two different results:
Pytorch:
inputs = torch.Tensor([[[1, 1, 1, 1],[2, 2, 2, 2],[3, 3, 3, 3]]]) # batch_size x seq_length x embed_dim
inputs = inputs.transpose(2,1) #batch_size x embed_dim x seq_length
batch_size, embed_dim, seq_length = inputs.size()
kernel_size = 3
in_channels = 2
out_channels = in_channels
weight = torch.ones(out_channels, 1, kernel_size)
inputs = inputs.contiguous().view(-1, in_channels, seq_length) #batch_size*embed_dim/in_channels x in_channels x seq_length
inputs = F.pad(inputs, (kernel_size-1,0), 'constant', 0)
output = F.conv1d(inputs, weight, padding=0, groups=in_channels)
output = output.contiguous().view(batch_size, embed_dim, seq_length).transpose(2,1)
Output:
tensor([[[1., 1., 1., 1.],
[3., 3., 3., 3.],
[6., 6., 6., 6.]]])
Tensorflow:
inputs = tf.constant([[[1, 1, 1, 1],[2, 2, 2, 2],[3, 3, 3, 3]]], dtype=tf.float32) # batch_size x seq_length x embed_dim
inputs = tf.transpose(inputs, perm=[0,2,1])
batch_size, embed_dim, seq_length = inputs.get_shape()
print(batch_size, seq_length, embed_dim)
kernel_size = 3
in_channels = 2
out_channels = in_channels
weight = tf.ones([kernel_size, in_channels, out_channels])
inputs = tf.reshape(inputs, [(batch_size*embed_dim)//in_channels, in_channels, seq_length], name='inputs')
inputs = tf.transpose(inputs, perm=[0, 2, 1])
padding = [[0, 0], [(kernel_size - 1), 0], [0, 0]]
padded = tf.pad(inputs, padding)
res = tf.nn.conv1d(padded, weight, 1, 'VALID')
res = tf.transpose(res, perm=[0, 2, 1])
res = tf.reshape(res, [batch_size, embed_dim, seq_length])
res = tf.transpose(res, perm=[0, 2, 1])
print(res)
Output:
tf.Tensor(
[[[ 2.  2.  2.  2.]
  [ 6.  6.  6.  6.]
  [12. 12. 12. 12.]]], shape=(1, 3, 4), dtype=float32)
Different results
There is no discrepancy between those versions; you are just setting up different things. To get exactly the same results as in TensorFlow, change the line specifying the weights to:
weight = torch.ones(out_channels, 2, kernel_size)
(and use groups=1 in the F.conv1d call, since each filter now spans both input channels), because your input has two input channels, as you have correctly declared in TF:
weight = tf.ones([kernel_size, in_channels, out_channels])
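Putting that together, a sketch of the question's PyTorch snippet with the adjusted weight (and groups=1) reproduces the TensorFlow output:
import torch
import torch.nn.functional as F

inputs = torch.tensor([[[1., 1., 1., 1.], [2., 2., 2., 2.], [3., 3., 3., 3.]]])  # (1, 3, 4)
inputs = inputs.transpose(2, 1)                # (1, 4, 3): batch x embed_dim x seq_length
inputs = inputs.contiguous().view(-1, 2, 3)    # (2, 2, 3): fold embed_dim into the batch
inputs = F.pad(inputs, (2, 0), 'constant', 0)  # left-pad the time axis by kernel_size - 1
weight = torch.ones(2, 2, 3)                   # each filter now spans both input channels
output = F.conv1d(inputs, weight, padding=0, groups=1)
print(output.contiguous().view(1, 4, 3).transpose(2, 1))
# tensor([[[ 2.,  2.,  2.,  2.],
#          [ 6.,  6.,  6.,  6.],
#          [12., 12., 12., 12.]]])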
Groups parameter
You have misunderstood what the groups parameter in PyTorch is responsible for. It restricts the number of input channels each filter uses (in this case only one, since the 2 input channels divided by groups=2 give one channel per filter).
See here for a more intuitive explanation for 2D convolution.
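If you actually want the grouped behaviour on the TensorFlow side, newer releases expose a groups argument on tf.keras.layers.Conv1D (around TF 2.3+, if I recall correctly); with plain tf.nn.conv1d you can emulate it by splitting the channels. A minimal sketch (the helper name and the toy shapes are illustrative, not taken from the question):
import tensorflow as tf

def grouped_conv1d(x, kernels):
    # x: (batch, length, in_channels), channels-last as tf.nn.conv1d expects.
    # kernels: one filter per group, each shaped
    #          (kernel_size, channels_per_group, out_channels_per_group).
    splits = tf.split(x, len(kernels), axis=-1)
    outs = [tf.nn.conv1d(s, k, stride=1, padding='VALID')
            for s, k in zip(splits, kernels)]
    return tf.concat(outs, axis=-1)

x = tf.constant([[[1., 1.], [2., 2.], [3., 3.]]])   # (1, 3, 2)
x = tf.pad(x, [[0, 0], [2, 0], [0, 0]])             # left-pad by kernel_size - 1
kernels = [tf.ones([3, 1, 1]), tf.ones([3, 1, 1])]  # groups=2, one channel per group
print(grouped_conv1d(x, kernels))                   # [[[1. 1.] [3. 3.] [6. 6.]]]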
I have 2 matrices with shapes:
pob.shape = (2,49,20)
rob = np.zeros((2,49,20))
and I want to get the indices of pob's elements which have value != 0. So in numpy I can do this:
x,y,z = np.where(pob!=0)
eg:
x = [2,4,7]
y = [3,5,5]
z = [3,5,6]
I want to change the values of rob:
rob[x, y, :] = np.ones(20)
How can I do this with tensorflow objects?
I tried to use tf.where but I can't get the index values out of the tensor object.
You could use tf.range() and tf.meshgrid() to create index matrices, then use tf.where() with your condition on them to obtain the indices which meet it. However, the tricky part would come next: you can't easily assign values to a tensor based on indices in TF (my_tensor[my_indices] = my_values).
A workaround for your problem ("for all (i,j,k), if pob[i,j,k] != 0 then rob[i,j] = 1") could be as follows:
import tensorflow as tf
# Example values for demonstration:
pob_val = [[[0, 0, 0], [1, 0, 0], [1, 0, 1]], [[1, 1, 1], [0, 0, 0], [0, 0, 0]]]
pob = tf.constant(pob_val)
pob_shape = tf.shape(pob)
rob = tf.zeros(pob_shape)
# Get the mask:
mask = tf.cast(tf.not_equal(pob, 0), tf.uint8)
# If there's at least one "True" in mask[i, j, :], make all mask[i, j, :] = True:
mask = tf.cast(tf.reduce_max(mask, axis=-1, keepdims=True), tf.bool)
mask = tf.tile(mask, [1, 1, pob_shape[-1]])
# Apply mask:
rob = tf.where(mask, tf.ones(pob_shape), rob)
with tf.Session() as sess:
    rob_eval = sess.run(rob)
    print(rob_eval)
# [[[0. 0. 0.]
# [1. 1. 1.]
# [1. 1. 1.]]
#
# [[1. 1. 1.]
# [0. 0. 0.]
# [0. 0. 0.]]]
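If you are on TF 2.x with eager execution, the same idea can be written a bit more compactly (a sketch reusing the example pob from above):
import tensorflow as tf

pob = tf.constant([[[0, 0, 0], [1, 0, 0], [1, 0, 1]],
                   [[1, 1, 1], [0, 0, 0], [0, 0, 0]]])
# True for every (i, j) row that contains at least one non-zero entry.
row_has_nonzero = tf.reduce_any(tf.not_equal(pob, 0), axis=-1, keepdims=True)
# Broadcast that flag over the last axis and cast it to 0/1 values.
rob = tf.cast(tf.broadcast_to(row_has_nonzero, tf.shape(pob)), tf.float32)
print(rob)  # same values as the session-based result above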
I have the following matrix:
0 1
2 3
and the following kernel:
1 2
If I do a convolution with no padding and slide by 1 row, I should get the following answer:
2
8
Because: 0*1 + 1*2 = 2 and 2*1 + 3*2 = 8.
Based on the documentation of tf.nn.conv2d, I thought this code expresses what I just described above:
import tensorflow as tf
input_batch = tf.constant([
[
[[.0], [1.0]],
[[2.], [3.]]
]
])
kernel = tf.constant([
[
[[1.0, 2.0]]
]
])
conv2d = tf.nn.conv2d(input_batch, kernel, strides=[1, 1, 1, 1], padding='VALID')
sess = tf.Session()
print(sess.run(conv2d))
But it produces this output:
[[[[ 0. 0.]
[ 1. 2.]]
[[ 2. 4.]
[ 3. 6.]]]]
And I have no clue how that is computed. I've tried experimenting with different values for the strides and padding parameters but still am not able to produce the result I expected.
You have not correctly read my explanation in the tutorial you linked. After a straightforward modification for no padding and strides=1, you are supposed to get the following code.
import tensorflow as tf
k = tf.constant([
[1, 2],
], dtype=tf.float32, name='k')
i = tf.constant([
[0, 1],
[2, 3],
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [1, 2, 1, 1], name='kernel')
image = tf.reshape(i, [1, 2, 2, 1], name='image')
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
    print(sess.run(res))
Which gives you the result you expected: [2., 8.]. Here I get a vector instead of a column because of the squeeze operator.
One problem I see with your code (there might be others) is that your kernel has shape (1, 1, 1, 2), but it is supposed to be (1, 2, 1, 1).
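For completeness, here is a sketch of the snippet from the question with the kernel reshaped to (1, 2, 1, 1), in the same TF 1.x session style:
import tensorflow as tf

input_batch = tf.constant([[[[0.], [1.]],
                            [[2.], [3.]]]])  # shape (1, 2, 2, 1)
kernel = tf.constant([[[[1.]], [[2.]]]])     # shape (1, 2, 1, 1): a 1x2 filter
conv2d = tf.nn.conv2d(input_batch, kernel, strides=[1, 1, 1, 1], padding='VALID')
sess = tf.Session()
print(sess.run(conv2d))
# [[[[2.]]
#   [[8.]]]]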