Randomly select items from two equally sized tensors - indexing

Assume that we have two equally sized tensors of size batch_size * 1. For each index in the batch dimension we want to choose randomly between the two tensors. My solution was to create an indices tensor containing random 0s and 1s of size batch_size and use it to index_select from the concatenation of the two tensors. However, to do so I had to "view" that concatenated tensor, and the solution ended up being quite "ugly":
import torch
bs = 8
a = torch.zeros(bs, 1)
print("a size", a.size())
b = torch.ones(bs, 1)
c = torch.cat([a, b], dim=-1)
print(c)
print("c size", c.size())
# create bs number of random 0 and 1's
indices = torch.randint(0, 2, [bs])
print("idxs size", indices.size())
print("idxs", indices)
# use `indices` to pick one element per row from the flattened `cat`ted tensor;
# row i of `c` occupies flat positions 2*i and 2*i + 1
d = c.view(1, -1).index_select(-1, 2 * torch.arange(bs) + indices).view(-1, 1)
print("d size", d.size())
print(d)
I am wondering whether there is a prettier and, more importantly, more efficient solution.

Posting two answers that I got over at the PyTorch forums:
import torch
bs = 8
a = torch.zeros(bs, 1)
b = torch.ones(bs, 1)
c = torch.cat([a, b], dim=-1)
choices_flat = c.view(-1)
# sample with replacement:
# index = torch.randint(choices_flat.numel(), (bs,))
# or sample without replacement:
index = torch.randperm(choices_flat.numel())[:bs]
select = choices_flat[index]
print(select)
import torch
bs = 8
a = torch.zeros(bs, 1)
print("a size", a.size())
b = torch.ones(bs, 1)
idx = torch.randint(2 * bs, (bs,))
d = torch.cat([a, b])[idx] # [bs, 1]
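Note that both answers draw bs values from the flattened pool of all 2 * bs elements rather than choosing between a[i] and b[i] at each index i. If a strict per-index choice is the goal, a minimal sketch using torch.where with a random boolean mask would be:
import torch
bs = 8
a = torch.zeros(bs, 1)
b = torch.ones(bs, 1)
# for each row, True picks from `a` and False picks from `b`
mask = torch.rand(bs, 1) < 0.5
d = torch.where(mask, a, b) # [bs, 1]
This avoids the cat/view/index_select round trip entirely.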

Related

How to concatenate two tensors with intervals in tensorflow?

I want to concatenate two tensors "checkerboard-ly" in tensorflow2, as in the examples below:
example 1:
a = [[1,1],[1,1]]
b = [[0,0],[0,0]]
concated_a_and_b = [[1,0,1,0],[0,1,0,1]]
example 2:
a = [[1,1,1],[1,1,1],[1,1,1]]
b = [[0,0,0],[0,0,0],[0,0,0]]
concated_a_and_b = [[1,0,1,0,1,0],[0,1,0,1,0,1],[1,0,1,0,1,0]]
Is there a decent way in tensorflow2 to concatenate them like this?
A bit of background for this:
I first split a tensor c with a checkerboard mask into two halves a and b. After some transformation I have to concatenate them back into the original shape and order.
Step 1: Generate a matrix with alternated values
You can do this by first concatenating into [1, 0] pairs, and then by applying a final reshape.
Step 2: Reverse some rows
I split the matrix into two parts, reverse the second part, and then rebuild the full matrix by picking alternately from the first and second parts.
Code sample:
import math
import numpy as np
import tensorflow as tf
a = tf.ones(shape=(3, 4))
b = tf.zeros(shape=(3, 4))
x = tf.expand_dims(a, axis=-1)
y = tf.expand_dims(b, axis=-1)
paired_ones_zeros = tf.concat([x, y], axis=-1)
alternated_values = tf.reshape(paired_ones_zeros, [-1, a.shape[1] + b.shape[1]])
num_samples = alternated_values.shape[0]
middle = math.ceil(num_samples / 2)
is_num_samples_odd = middle * 2 != num_samples
# Gather first part of the matrix, don't do anything to it
first_elements = tf.gather_nd(alternated_values, [[index] for index in range(middle)])
# Gather second part of the matrix and reverse its elements
second_elements = tf.reverse(tf.gather_nd(alternated_values, [[index] for index in range(middle, num_samples)]), axis=[1])
# Pick alternately between the first and second parts of the matrix
indices = np.concatenate([[[index], [index + middle]] for index in range(middle)], axis=0)
if is_num_samples_odd:
    indices = indices[:-1]
output = tf.gather_nd(
    tf.concat([first_elements, second_elements], axis=0),
    indices,
)
print(output)
I know this is not a decent way, as it hurts time and space complexity, but it solves the above problem:
def concat(tf1, tf2):
    result = []
    for (index, (tf_item1, tf_item2)) in enumerate(zip(tf1, tf2)):
        item = []
        for (subitem1, subitem2) in zip(tf_item1, tf_item2):
            if index % 2 == 0:
                item.append(subitem1)
                item.append(subitem2)
            else:
                item.append(subitem2)
                item.append(subitem1)
        result.append(item)
    return result
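For a loop-free TF2 alternative, here is a minimal vectorized sketch, assuming a and b share the same 2-D shape: interleave the columns of a and b, build the same interleaving with the roles swapped, and pick one of the two per row with tf.where.
import tensorflow as tf
def checkerboard_concat(a, b):
    rows = tf.shape(a)[0]
    # even rows read a, b, a, b, ...; odd rows read b, a, b, a, ...
    even = tf.reshape(tf.stack([a, b], axis=-1), (rows, -1))
    odd = tf.reshape(tf.stack([b, a], axis=-1), (rows, -1))
    row_is_even = tf.range(rows) % 2 == 0
    return tf.where(row_is_even[:, None], even, odd)
a = tf.ones((3, 3))
b = tf.zeros((3, 3))
print(checkerboard_concat(a, b)) # rows alternate [1,0,1,0,1,0] and [0,1,0,1,0,1]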

How to show the class distribution in Dataset object in Tensorflow

I am working on a multi-class classification task using my own images.
filenames = [] # a list of filenames
labels = [] # a list of labels corresponding to the filenames
full_ds = tf.data.Dataset.from_tensor_slices((filenames, labels))
This full dataset will be shuffled and split into train, valid and test datasets:
full_ds_size = len(filenames)
full_ds = full_ds.shuffle(buffer_size=full_ds_size*2, seed=128) # seed is used for reproducibility
train_ds_size = int(0.64 * full_ds_size)
valid_ds_size = int(0.16 * full_ds_size)
train_ds = full_ds.take(train_ds_size)
remaining = full_ds.skip(train_ds_size)
valid_ds = remaining.take(valid_ds_size)
test_ds = remaining.skip(valid_ds_size)
Now I am struggling to understand how the classes are distributed in train_ds, valid_ds and test_ds. An ugly solution is to iterate over all the elements in the dataset and count the occurrences of each class. Is there any better way to solve it?
My ugly solution:
import collections

def get_class_distribution(dataset):
    class_distribution = {}
    for element in dataset.as_numpy_iterator():
        label = element[1]
        if label in class_distribution.keys():
            class_distribution[label] += 1
        else:
            class_distribution[label] = 1
    # sort dict by key
    class_distribution = collections.OrderedDict(sorted(class_distribution.items()))
    return class_distribution
train_ds_class_dist = get_class_distribution(train_ds)
valid_ds_class_dist = get_class_distribution(valid_ds)
test_ds_class_dist = get_class_distribution(test_ds)
print(train_ds_class_dist)
print(valid_ds_class_dist)
print(test_ds_class_dist)
The answer below assumes:
there are five classes.
labels are integers from 0 to 4.
It can be modified to suit your needs.
Define a counter function:
def count_class(counts, batch, num_classes=5):
    labels = batch['label']  # use batch[1] if the dataset yields (filename, label) tuples
    for i in range(num_classes):
        cc = tf.cast(labels == i, tf.int32)
        counts[i] += tf.reduce_sum(cc)
    return counts
Use the reduce operation:
initial_state = dict((i, 0) for i in range(5))
counts = train_ds.reduce(initial_state=initial_state,
                         reduce_func=count_class)
print([(k, v.numpy()) for k, v in counts.items()])
A solution inspired by user650654's answer, using only TensorFlow primitives (tf.unique_with_counts instead of a for loop). In theory, this should perform better and scale better to large datasets, batch sizes, or class counts:
num_classes = 5

@tf.function
def count_class(counts, batch):
    y, _, c = tf.unique_with_counts(batch[1])
    return tf.tensor_scatter_nd_add(counts, tf.expand_dims(y, axis=1), c)

counts = train_ds.reduce(
    initial_state=tf.zeros(num_classes, tf.int32),
    reduce_func=count_class)
print(counts.numpy())
A similar and simpler version with numpy that actually had better performance for my simple use case:
count = np.zeros(num_classes, dtype=np.int32)
for _, labels in train_ds:
    y, _, c = tf.unique_with_counts(labels)
    count[y.numpy()] += c.numpy()
print(count)
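If the labels fit in memory, np.bincount gives a one-line cross-check of any of the above; a minimal sketch, assuming integer labels in [0, num_classes):
import numpy as np
labels_np = np.fromiter((label for _, label in train_ds.as_numpy_iterator()), dtype=np.int64)
print(np.bincount(labels_np, minlength=num_classes))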

How to define (sparse) variable diagonal tensors

In a problem I want to solve using Tensorflow, I want to build an n-dimensional tensor that is 'diagonal' by blocks. That is, I want to generate a tensor object from a concatenation of lower-order tensors.
I have tried to define the whole tf.Variable tensor and then to impose the value 0 to some variables but Tensorflow does not allow assignments when working with variable tensors.
Moreover, I want the 'diagonal' blocks to share the same independent variables; for example, using a stacked 2D representation, with A a 2-dimensional tensor:
T = [A, 0; 0, A]
My current source code:
shape1 = [3,3,10,10]
shape2 = [3,3]
i1 = tf.truncated_normal(shape1, stddev=1.0, dtype = tf.float32)
i2 = tf.truncated_normal(shape2, stddev=1.0, dtype = tf.float32)
A = tf.Variable(i1)
V = tf.Variable(i2)
for i in range(10):
    for j in range(10):
        if i != j:
            A[:,:,i,j] = tf.zeros((3,3))
        else:
            A[:,:,i,j] = V
Of course, this code returns the error Variable object does not support item assignment.
What I want, at the end of the day, is to define a variable tensor such as:
T[:,:,i,j] = tf.zeros([D0,D1]), if i != j
and
T[:,:,i,j] = A, if i == j
with A a tf.Variable of shape [D0,D1].
Thank you very much in advance!
One way would be to use tf.stack, which converts a list of tensors of dimension n to a tensor of dimension n+1.
l = []
for i in range(10):
    li = [V * 0.0 if i != j else V for j in range(10)]
    Ai = tf.stack(li)
    l.append(Ai)
A = tf.stack(l)  # shape (10, 10, 3, 3), i.e. A[i, j, :, :]; transpose if you need A[:, :, i, j]
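Another option is to broadcast the block against an identity matrix, which avoids the Python loops altogether; a minimal sketch, assuming the A[:,:,i,j] block layout from the question:
import tensorflow as tf
V = tf.Variable(tf.truncated_normal([3, 3], stddev=1.0, dtype=tf.float32))
# tf.eye(10) is 1 on the diagonal and 0 elsewhere, so every off-diagonal block broadcasts to zero
A = tf.reshape(V, [3, 3, 1, 1]) * tf.eye(10) # shape (3, 3, 10, 10), with A[:,:,i,i] = V
Gradients still flow to the single shared variable V, so all diagonal blocks stay tied.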

matmul function for vector with tensor multiplication in tensorflow

In general, when we multiply a vector v of dimension 1*n by a tensor T of dimension m*n*k, we expect to get a tensor of dimension m*k (or m*1*k). That is, our tensor consists of m matrix slices of dimension n*k; v is multiplied by each matrix, and the resulting vectors are stacked together. In order to do this multiplication in tensorflow, I came up with the following formulation. I am just wondering if there is any built-in function that does this standard multiplication straightforwardly?
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
output = tf.matmul(c,T)
You can do it with:
tf.reduce_sum(tf.expand_dims(v,2)*T,1)
Code:
m, n, k = 2, 3, 4
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
out1 = tf.matmul(c,T)
out2 = tf.reduce_sum(tf.expand_dims(v,2)*T,1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    n_out1 = sess.run(out1)
    n_out2 = sess.run(out2)
    # n_out1 ([m,1,k]) and n_out2 ([m,k]) hold the same values
Not sure if there is a better way, but it sounds like you could use tf.map_fn like this:
output = tf.map_fn(lambda x: tf.matmul(v, x), T)
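tf.einsum also expresses this contraction directly, without tiling v or mapping over T; a minimal sketch, assuming v of shape (1, n) and T of shape (m, n, k) as above:
out3 = tf.einsum('bn,mnk->mbk', v, T) # shape (m, 1, k), same values as out1
out4 = tf.einsum('n,mnk->mk', v[0], T) # shape (m, k), same values as out2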

Compute value of variable for multiple input values

I have a tensorflow graph which is trained. After training, I want to sample one variable for multiple intermediate values. Simplified:
a = tf.placeholder(tf.float32, [1])
b = a + 10
c = b * 10
Now I want to query c for values of b. Currently, I am using an outer loop:
b_values = [0, 1, 2, 3, 4, 5]
samples = []
for b_value in b_values:
    samples += [sess.run(c, feed_dict={b: [b_value]})]
This loop takes quite a bit of time; I think it is because b_values contains 5000 values in my case. Is there a way of running sess.run only once, passing all b_values at once? I cannot really modify the graph a->b->c, but I could add something to it if that helps.
You could do it as follows:
import tensorflow as tf
import numpy as np
import time

a = tf.placeholder(tf.float32, [None, 1])
b = a + 10
c = b * 10
sess = tf.Session()
b_values = np.random.randint(500, size=(5000, 1))

# one sess.run call per value
samples = []
t = time.time()
for b_value in b_values:
    samples += [sess.run(c, feed_dict={b: [b_value]})]
print(time.time() - t)
# print(samples)

# a single sess.run call for all values (feed `b` directly)
t = time.time()
samples = sess.run(c, feed_dict={b: b_values})
print(time.time() - t)
# print(samples)
Output (time in seconds):
0.874449968338
0.000532150268555
The single batched sess.run amortizes the per-call overhead over all 5000 values, hence the large speedup. Hope this helps!