Tensorflow: Choosing a range of columns in each row from a Tensor

I would like to choose only particular columns in each row of a tensor, using it for an RNN
seq_len=[11,12,20,30] #This is the sequence length, assume 4 sequences
array=tf.ones([4,30]) #Assuming this is the array I want to index from
function(array,seq_len) #apply required function
Output = (first 11 elements from row 0, first 12 from row 1, first 20 from row 2, etc.), perhaps obtained as a flat tensor

You can use tf.sequence_mask and tf.boolean_mask to get them flattened:
mask = tf.sequence_mask(seq_len, MAX_LENGTH) # Replace MAX_LENGTH with the size of array on the right dimension, 30 in your case
output= tf.boolean_mask(array, mask=mask)
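As a small illustration of the masking approach above, here is a sketch that assumes TensorFlow 2.x with eager execution. The toy array and sequence lengths are made up so the result is easy to check by eye. The tf.RaggedTensor.from_tensor call is an optional extra for when the row boundaries should be kept.

```python
import tensorflow as tf

# Toy data (assumed for illustration): 3 rows of 4 values each
seq_len = [2, 3, 1]
array = tf.reshape(tf.range(12), [3, 4])  # [[0..3], [4..7], [8..11]]

# True for the first seq_len[i] positions of row i, False elsewhere
mask = tf.sequence_mask(seq_len, maxlen=array.shape[1])

# Keep only the masked positions, flattened in row-major order
flat = tf.boolean_mask(array, mask)
print(flat.numpy())  # [0 1 4 5 6 8]

# Optional: keep the per-row structure instead of flattening
ragged = tf.RaggedTensor.from_tensor(array, lengths=seq_len)
print(ragged.to_list())  # [[0, 1], [4, 5, 6], [8]]
```

The flat result matches ragged.flat_values, so either form can be used depending on whether the downstream code needs the row boundaries.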

A tensor in TensorFlow can be sliced just like a NumPy array, and the slices can then be concatenated into one tensor. This assumes you measure the sequence length from the first element.
Use [row_idx, column_idx] to slice the tensor: slice = array[0, :] would assign the first row to slice.
flat_slices = tf.concat([slice, slice], axis=0) will join them into one flat tensor (tf.concat requires the axis argument).
import tensorflow as tf
seq_len = [11,12,20,30]
array = tf.ones([4,30])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    # Start with the first seq_len[0] elements of row 0
    flatten = array[0,:seq_len[0]]
    for i in range(1,len(seq_len)):
        # Append the first seq_len[i] elements of row i
        row = array[i,:seq_len[i]]
        flatten = tf.concat([flatten, row], axis=0)
    print(sess.run(flatten))

Related

Is there an efficient way to select 5 regions of a tensor in Tensorflow?

For example, given a tensor m whose shape is [28, 28].
I want to randomly select five regions within the tensor; the shape of each region is [3, 3].
Then, I want to modify the values of these regions.
One solution would be random extraction inside a loop:
import random
import tensorflow as tf
tensor = tf.ones(shape=(28,28))
desired_shape = (3,3)
# Random top-left corner, chosen so the region stays inside the tensor
dim1 = random.randint(0,tensor.shape[0] - desired_shape[0])
dim2 = random.randint(0,tensor.shape[1] - desired_shape[1])
extracted_tensor = tensor[dim1:dim1+desired_shape[0], dim2:dim2+desired_shape[1]]
First import the random module and create a tensor (or use your own). Set your desired_shape.
Then draw two random offsets, one for each dimension, and extract the region via slicing.
But keep in mind that you cannot assign values to a tensor in place in TensorFlow.
To work around this, first convert the tensor to a NumPy array, change the values, and convert the result to a tensor again:
np_arr = tensor.numpy()
for i in range(5):
    dim1 = random.randint(0,tensor.shape[0] - desired_shape[0])
    dim2 = random.randint(0,tensor.shape[1] - desired_shape[1])
    # Any value that broadcasts to the (3, 3) region works here
    np_arr[dim1:dim1+desired_shape[0], dim2:dim2+desired_shape[1]] = [1,2,3]
new_tens = tf.convert_to_tensor(np_arr)
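If the round trip through NumPy is not wanted, one possible alternative is tf.tensor_scatter_nd_update, which returns a new tensor with the given positions replaced. The sketch below assumes TensorFlow 2.x. The helper name fill_region and its parameters are hypothetical, introduced only for this illustration, and the region position is fixed rather than random so the result can be checked.

```python
import tensorflow as tf

def fill_region(tensor, top, left, height, width, value):
    """Return a copy of a 2-D tensor with one rectangular region set to value.

    Hypothetical helper for illustration, not a TensorFlow API.
    """
    rows = tf.range(top, top + height)
    cols = tf.range(left, left + width)
    # All (row, col) pairs inside the region, shape [height * width, 2]
    rr, cc = tf.meshgrid(rows, cols, indexing="ij")
    indices = tf.stack([tf.reshape(rr, [-1]), tf.reshape(cc, [-1])], axis=1)
    updates = tf.fill([height * width], tf.cast(value, tensor.dtype))
    return tf.tensor_scatter_nd_update(tensor, indices, updates)

m = tf.ones([28, 28])
out = fill_region(m, top=2, left=5, height=3, width=3, value=0.0)
print(out[2:5, 5:8].numpy())  # the 3x3 region is now all zeros
```

To modify five random regions, the helper could be called in a loop with offsets drawn as in the answer above, feeding each result into the next call.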

Mask values in Tensorflow ragged tensor

I have a ragged tensor:
tf.ragged.constant([[[17712], [16753], [11850], [13028], [10155], [15734, 15938], [126], [10135], [17665]]], dtype=tf.int32)
I would like to set the value of elements in rows with a length greater than 1 to a particular value. For example:
tf.ragged.constant([[[17712], [16753], [11850], [13028], [10155], [15734, 0], [126], [10135], [17665]]], dtype=tf.int32)
How can I express such a transformation in Tensorflow?
Ragged tensors always make things trickier than they should be, but here is one possible implementation of that:
import tensorflow as tf
# Using an intermediate NumPy array avoids having the second dimension as ragged
a = tf.ragged.constant([[[17712], [16753], [11850], [13028], [10155], [15734, 15938],
[126], [10135], [17665]]], dtype=tf.int32)
# Index from which values are replaced
replace_from_idx = 1
# Replacement value
new_value = 0
# Get size of each element in the last dimension
s = a.row_lengths(axis=-1)
# Make ragged ranges
r = tf.ragged.range(s.flat_values)
# Un-flatten
r = tf.RaggedTensor.from_row_lengths(r, a.row_lengths(1))
# Replace values
m = tf.dtypes.cast(r < replace_from_idx, a.dtype)
out = a * m + new_value * (1 - m)
print(out.to_list())
# [[[17712], [16753], [11850], [13028], [10155], [15734, 0], [126], [10135], [17665]]]

tensorflow how to pad batched text like pytorch's 'collate_fn'?

I want to pad a batch of text to the same length, generate segment ids and a mask vector, and then feed them to a BERT model.
In PyTorch, I can use a collate_fn like the one below.
def collate_fn(self, batch):
    rows = self.df.iloc[batch] # take a batch of data
    ids, seg_ids = self.get_ids_segs(rows) # process data
    attention_mask = (ids > 0)
    return ids, seg_ids, attention_mask
But in TensorFlow, the data is passed as a tuple of matrices, so all the texts are padded to the max length of 512.
# ids.shape = seg_ids.shape = attention_mask.shape = (data_number, max_seq_len)
xs = (ids, seg_ids, attention_mask)
model.fit(xs, ys, batch_size=batch_size)
I found that tf.data.Dataset has a padded_batch method. But it seems it can only pad one input, and I have 3 inputs: ids, seg_ids, attention_mask.
Using the apply or map method of tf.data.Dataset after applying the batch method should probably solve the problem.
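A rough sketch of that idea, assuming TensorFlow 2.4 or later: padded_batch accepts a tuple of components and pads each one to the longest element in the batch, and a map call afterwards derives the attention mask from the padded ids. The toy sequences and the helper names gen and add_mask are made up for illustration, and 0 is assumed to be the padding id.

```python
import tensorflow as tf

# Toy token-id sequences of different lengths (assumed data, 0 = padding id)
sequences = [[101, 7, 8, 102], [101, 9, 102], [101, 5, 6, 7, 8, 102]]

def gen():
    # Yield (ids, seg_ids) per example; seg_ids are all zeros in this toy case
    for ids in sequences:
        yield ids, [0] * len(ids)

signature = (tf.TensorSpec(shape=[None], dtype=tf.int32),
             tf.TensorSpec(shape=[None], dtype=tf.int32))
ds = tf.data.Dataset.from_generator(gen, output_signature=signature)

# Pad both components to the longest sequence within each batch
ds = ds.padded_batch(2, padded_shapes=([None], [None]))

def add_mask(ids, seg_ids):
    # Build the attention mask after padding, like the PyTorch collate_fn
    return ids, seg_ids, tf.cast(ids > 0, tf.int32)

ds = ds.map(add_mask)
batches = list(ds.as_numpy_iterator())
print(batches[0][0].shape, batches[1][0].shape)  # (2, 4) (1, 6)
```

Each batch is padded only to its own longest sequence rather than to 512, which is the behavior collate_fn gives in PyTorch.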

what's the appropriate placeholder for my input

I have a dataframe with 1k rows and 14 columns containing numpy arrays, as shown below.
Here is a subset of 2 rows and 3 columns:
[5,4,74,-12] [ 78,1,2,-9] [5 ,1,1,2]
[10,4,4,-1] [ 8,15,21,-19] [1,1,0,0]
where each cell is a numpy array of shape (4,1).
I couldn't find the right placeholder to feed my whole dataframe, as it needs to be processed in batches of rows.
Does anyone have an idea?
I tried this to find the proper placeholder for my dataframe, but it's not correct:
x = tf.placeholder(tf.int32,[None,14],name='x')
with tf.Session() as sess:
    print(sess.run(x,feed_dict={x:Data}))
It gives ValueError: setting an array element with a sequence.
You did not specify in which format your data is available, so I assume it is a numpy array. In this case, you can do it like this:
n_columns = 14
n_elements_per_column = 4
x = tf.placeholder(tf.int32, [None, n_columns, n_elements_per_column], name='x')
with tf.Session() as sess:
    print(sess.run(x,feed_dict={x:Data}))
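The placeholder above expects a dense array of shape (rows, 14, 4), whereas a dataframe whose cells are arrays typically converts to a 2-D object array, which is a likely cause of the ValueError. A minimal NumPy sketch of the conversion follows. The object array cells stands in for something like the dataframe's values, and its contents and the 2 x 3 size are invented for illustration.

```python
import numpy as np

# Stand-in for a dataframe of arrays: a 2-D object array whose cells
# each hold an array of shape (4, 1) (toy values, assumed)
n_rows, n_cols = 2, 3
cells = np.empty((n_rows, n_cols), dtype=object)
for i in range(n_rows):
    for j in range(n_cols):
        cells[i, j] = np.arange(4).reshape(4, 1) + 10 * i + j

# Flatten every cell to length 4 and stack into one dense integer array
dense = np.array([[cell.reshape(-1) for cell in row] for row in cells],
                 dtype=np.int32)
print(dense.shape)  # (2, 3, 4)
```

An array built this way has a regular numeric dtype and could be passed as Data in the feed_dict.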

gather values from 2dim tensor in tensorflow

Hi, tensorflow beginner here... I'm trying to get the values of certain elements in a 2-dim tensor, in my case class scores from a probability matrix.
The probability matrix is (1000,81) with batchsize 1000 and number of classes 81. ClassIDs is (1000,) and contains the index for the highest class score for each sample. How do I get the corresponding class score from the probability matrix using tf.gather?
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
class_scores = tf.gather_nd(probs,class_ids)
class_scores should be a tensor of shape (1000,) containing the highest class_score for each sample.
Right now I'm using a workaround that looks like this:
class_score_count = []
for i in range(probs.shape[0]):
    prob = probs[i,:]
    class_score = prob[class_ids[i]]
    class_score_count.append(class_score)
class_scores = tf.stack(class_score_count, axis=0)
Thanks for the help!
You can do it with tf.gather_nd like this:
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
# If shape is not dynamic you can use probs.shape[0].value instead of tf.shape(probs)[0]
row_ids = tf.range(tf.shape(probs)[0], dtype=tf.int32)
idx = tf.stack([row_ids, class_ids], axis=1)
class_scores = tf.gather_nd(probs, idx)
You could also just use tf.reduce_max. It computes the maximum a second time, but it may not be much slower if your data is not too big:
class_scores = tf.reduce_max(probs, axis=1)
You need to run the class_ids tensor to get its values.
The values will be returned as a NumPy array.
You can access a NumPy array normally, for example in a loop.
You have to do something like this:
predictions = sess.run(tf.argmax(probs, 1), feed_dict={x: X_data})
The predictions variable has all the information you need.
TensorFlow only returns the values of those tensors which you run explicitly.
I think this is what the batch_dims argument for tf.gather is for.
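A short sketch of the batch_dims variant, assuming TensorFlow 2.x, where tf.gather accepts axis and batch_dims. With batch_dims=1, each row of probs is indexed by the matching entry of class_ids. The small probs matrix is toy data for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy probability matrix: 2 samples, 3 classes (assumed data)
probs = tf.constant([[0.1, 0.7, 0.2],
                     [0.5, 0.2, 0.3]])
class_ids = tf.argmax(probs, axis=1, output_type=tf.int32)  # [1, 0]

# Treat the first dimension as a batch: pick probs[i, class_ids[i]] per row
class_scores = tf.gather(probs, class_ids, axis=1, batch_dims=1)
print(class_scores.numpy())  # [0.7 0.5]

# Sanity check against the reduce_max approach mentioned above
assert np.allclose(class_scores.numpy(), tf.reduce_max(probs, axis=1).numpy())
```

Unlike tf.reduce_max, this form also works when class_ids comes from somewhere other than the argmax, for example ground-truth labels.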