Using zip and a generator, how can I get a batch data iterator?

A simple example:
import numpy as np
x_train = np.array([[95, 50, 10, 5, 4],
[85, 5, 100, 40, 3],
[75, 50, 10, 30, 1],
[65, 50, 1, 20, 42],
[55, 500, 10, 10, 3],
[45, 50, 10, 110, 40]], dtype=np.float32) # training data
y_train = np.array([1,1,0,0,1,0]) # label
train_data = list(zip(x_train, y_train))  # zip data and labels together
def batch_iter(data):  # a simple generator
    for i in range(len(data)):
        yield data[i:i+1]
batches = batch_iter(train_data)
for i in range(len(x_train)):
    x, y = batches  # error: too many values to unpack (expected 2)
    x, y = zip(*batches)  # error: not enough values to unpack (expected 2, got 1)
How can I get each training sample and its label on each iteration?
thanks.

I changed the code like this, and it's working well.
I still need to study generators and NumPy, so please add your answer.
Thanks.
x_train = np.array([[95, 50, 10, 5, 4],
[85, 5, 100, 40, 3],
[75, 50, 10, 30, 1],
[65, 50, 1, 20, 42],
[55, 500, 10, 10, 3],
[45, 50, 10, 110, 40]], dtype=np.float32)
y_train = np.array([1,1,0,0,1,0])
train_data= list(zip(x_train, y_train))
def batch_iter(data):
    data = np.array(data)
    for i in range(len(data)):
        yield data[i:i+1]
batches = batch_iter(train_data)
x, y = zip(*next(batches))
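For comparison, here is a minimal sketch of a generator that slices the data and label arrays in parallel, so each iteration yields an (x, y) pair directly. The batch_size parameter and the two-argument signature are assumptions for illustration, not part of the original code. Skipping the zip step also avoids calling np.array on a list of (row, label) tuples, which newer NumPy versions may reject as a ragged array.

```python
import numpy as np

def batch_iter(x, y, batch_size=1):
    """Yield (x_batch, y_batch) pairs. batch_size is a hypothetical parameter."""
    for start in range(0, len(x), batch_size):
        # Slice both arrays with the same bounds so rows and labels stay aligned.
        yield x[start:start + batch_size], y[start:start + batch_size]

x_train = np.array([[95, 50, 10, 5, 4],
                    [85, 5, 100, 40, 3],
                    [75, 50, 10, 30, 1],
                    [65, 50, 1, 20, 42],
                    [55, 500, 10, 10, 3],
                    [45, 50, 10, 110, 40]], dtype=np.float32)
y_train = np.array([1, 1, 0, 0, 1, 0])

# Collect the batches so they can be inspected; a training loop would iterate directly.
batches = list(batch_iter(x_train, y_train, batch_size=2))
for x, y in batches:
    print(x.shape, y)
```

With batch_size=1 this gives one sample per step, like the original loop.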

Related

Is there an equivalent of PyTorch's "index_select" in TensorFlow?

I am translating PyTorch code to TensorFlow. Is there a TensorFlow equivalent of PyTorch's index_select?
I haven't found an API that does this directly, but it can be implemented with tf.slice.
def tf_index_select(input_, dim, indices):
    """
    input_(tensor): input tensor
    dim(int): dimension
    indices(list): selected indices list
    """
    shape = input_.get_shape().as_list()
    if dim == -1:
        dim = len(shape)-1
    shape[dim] = 1
    tmp = []
    for idx in indices:
        begin = [0]*len(shape)
        begin[dim] = idx
        tmp.append(tf.slice(input_, begin, shape))
    res = tf.concat(tmp, axis=dim)
    return res
Here is an example to show the equivalence.
import tensorflow as tf
import torch
import numpy as np
a = np.arange(2*3*4).reshape(2,3,4)
dim = 1
indices = [0,2]
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[12, 13, 14, 15],
# [16, 17, 18, 19],
# [20, 21, 22, 23]]])
# pytorch
res = torch.tensor(a).index_select(dim, torch.tensor(indices))
# tensor([[[ 0, 1, 2, 3],
# [ 8, 9, 10, 11]],
# [[12, 13, 14, 15],
# [20, 21, 22, 23]]])
# tensorflow
res = tf_index_select(tf.constant(a), dim, indices)
# tensor([[[ 0, 1, 2, 3],
# [ 8, 9, 10, 11]],
# [[12, 13, 14, 15],
# [20, 21, 22, 23]]])
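As a cross-check on the expected result, the same selection can be sketched in NumPy with np.take, which picks the given indices along one axis. This is only a NumPy analogue, not the answer's method. TensorFlow's tf.gather also takes an axis argument in recent versions and may make the slice-and-concat helper unnecessary, but treat that as something to verify against your installed version.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
dim = 1
indices = [0, 2]

# Keep rows 0 and 2 along axis 1, for every entry of axis 0.
res = np.take(a, indices, axis=dim)
print(res.shape)
print(res)
```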

Adding a row-dependent value to each row

I have a 2D array containing the following numbers:
A = [[1, 5, 9, 42],
[20, 2, 71, 0],
[2, 44, 4, 9]]
I want to add a different value to each row without using loops. The value is n*c, where n is the row index (starting at 0) and c is a constant. For example, with c=100:
B = [[1, 5, 9, 42],
[120, 102, 171, 100],
[202, 244, 204, 209]]
Any help would be greatly appreciated
You can do that as follows:
>>> import numpy as np
>>> A = [[1, 5, 9, 42],
... [20, 2, 71, 0],
... [2, 44, 4, 9]]
...
>>> a = np.array(A)
>>> c = 100
>>> addto = np.arange(len(a))[:, None] * c
>>> a + addto
array([[ 1, 5, 9, 42],
[120, 102, 171, 100],
[202, 244, 204, 209]])
np.arange(len(a)) gets you a 1-dimensional array of the indices, array([0, 1, 2]), which you can then multiply by c.
The hitch is that you then need to conform this to NumPy's broadcasting rules by expanding its dimensionality:
>>> np.arange(len(a)).shape
(3,)
>>> np.arange(len(a))[:, None].shape
(3, 1)
You could also do something like np.linspace(0, 100*(len(a)-1), num=len(a))[:, None], but that is probably overkill here, and it produces floats rather than integers.
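To see why the extra axis matters, here is a small sketch that tries the addition with the 1-D offsets first and then with the column-shaped version. The variable names offsets and failed are introduced only for this illustration.

```python
import numpy as np

a = np.array([[1, 5, 9, 42],
              [20, 2, 71, 0],
              [2, 44, 4, 9]])
c = 100
offsets = np.arange(len(a)) * c  # shape (3,)

# Shapes (3, 4) and (3,) are aligned from the right: 4 vs 3 do not match.
try:
    a + offsets
    failed = False
except ValueError:
    failed = True

# Shape (3, 1) broadcasts across the 4 columns, one offset per row.
result = a + offsets[:, None]
print(failed)
print(result)
```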

How to multiply each row of a matrix by a different scalar in TensorFlow [duplicate]

I have a 2D matrix M of shape [batch x dim] and a vector V of shape [batch]. How can I multiply each row of the matrix by the corresponding element of V? That is, result[i, j] = M[i, j] * V[i].
An inefficient NumPy implementation would look like this:
import numpy as np
M = np.random.uniform(size=(4, 10))
V = np.random.randint(4, size=4)  # one scalar per row of M
def tst(M, V):
    rows = []
    for i in range(len(M)):
        col = []
        for j in range(len(M[i])):
            col.append(M[i][j] * V[i])
        rows.append(col)
    return np.array(rows)
In tensorflow, given two tensors, what is the most efficient way to achieve this?
import tensorflow as tf
sess = tf.InteractiveSession()
M = tf.constant(np.random.normal(size=(4,10)), dtype=tf.float32)
V = tf.constant([1,2,3,4], dtype=tf.float32)
In NumPy, we would make V 2D and then let broadcasting do the element-wise multiplication (i.e. the Hadamard product). The same approach should work in TensorFlow. To add the extra dimension, we can use tf.newaxis (on newer versions), tf.expand_dims, or tf.reshape:
tf.multiply(M, V[:,tf.newaxis])
tf.multiply(M, tf.expand_dims(V,1))
tf.multiply(M, tf.reshape(V, (-1, 1)))
In addition to @Divakar's answer, note that the order of M and V doesn't matter: tf.multiply broadcasts either operand during the multiplication.
Example:
In [55]: M.eval()
Out[55]:
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]], dtype=int32)
In [56]: V.eval()
Out[56]: array([10, 20, 30], dtype=int32)
In [57]: tf.multiply(M, V[:,tf.newaxis]).eval()
Out[57]:
array([[ 10, 20, 30, 40],
[ 40, 60, 80, 100],
[ 90, 120, 150, 180]], dtype=int32)
In [58]: tf.multiply(V[:, tf.newaxis], M).eval()
Out[58]:
array([[ 10, 20, 30, 40],
[ 40, 60, 80, 100],
[ 90, 120, 150, 180]], dtype=int32)
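The same values can be checked in NumPy, where the broadcasting rule is the one the answers rely on. This is a sketch using the example matrix above. The names looped and broadcast are introduced here for illustration.

```python
import numpy as np

M = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5],
              [3, 4, 5, 6]])
V = np.array([10, 20, 30])

# Explicit double loop, equivalent to the inefficient version in the question.
looped = np.array([[M[i, j] * V[i] for j in range(M.shape[1])]
                   for i in range(M.shape[0])])

# V[:, np.newaxis] has shape (3, 1), so each row of M is scaled by one entry of V.
broadcast = M * V[:, np.newaxis]
print(broadcast)
```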

Convert 2d tensor to 3d in tensorflow

I need to convert a 2D tensor to a 3D tensor. How can I transform this in TensorFlow?
[[30, 29, 19, 17, 12, 11],
[30, 27, 20, 16, 5, 1],
[28, 25, 17, 14, 7, 2],
[28, 26, 21, 14, 6, 4]]
to this
[[[0,30], [0,29], [0,19], [0,17], [0,12], [0,11]],
[[1,30], [1,27], [1,20], [1,16],[1,5], [1,1]],
[[2,28], [2,25], [2,17], [2,14], [2,7], [2,2]],
[[3,28], [3,26], [3,21], [3,14], [3,6], [3,4]]]
Thanks! I am doing this to implement what was asked in "How to select rows from a 3-D Tensor in TensorFlow?" @kom
Here's a workaround to build the 3D tensor from the 2D tensor:
a = tf.constant([[30, 29, 19, 17, 12, 11],
[30, 27, 20, 16, 5, 1],
[28, 25, 17, 14, 7, 2],
[28, 26, 21, 14, 6, 4]], dtype=tf.int32)
a = tf.expand_dims(a, axis=2)
b = tf.constant(np.asarray([i*np.ones(a.shape[1]) for i in range(0, a.shape[0])], dtype=np.int32), dtype=tf.int32)
b = tf.expand_dims(b, axis=2)
final_ids = tf.concat([b, a], axis=2)
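The row-index pairing can also be sketched in NumPy, which makes the intended shapes easy to inspect. This is an illustrative analogue, not the TensorFlow answer itself. The names rows and out are assumptions introduced here.

```python
import numpy as np

a = np.array([[30, 29, 19, 17, 12, 11],
              [30, 27, 20, 16, 5, 1],
              [28, 25, 17, 14, 7, 2],
              [28, 26, 21, 14, 6, 4]], dtype=np.int32)

# Column of row indices, shape (4, 1), repeated across all 6 columns.
rows = np.broadcast_to(np.arange(a.shape[0], dtype=np.int32)[:, None], a.shape)

# Stack along a new last axis so each element becomes [row_index, value].
out = np.stack([rows, a], axis=-1)
print(out.shape)
print(out[1])
```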

Tensorflow multiplication broadcasting within batches

We know that tf.multiply can broadcast like this:
import tensorflow as tf
import numpy as np
a = tf.Variable(np.arange(12).reshape(3, 4))
b = tf.Variable(np.arange(4))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
sess.run(tf.multiply(a, b))
This will give us
[[0, 1, 4, 9],
[0, 5, 12, 21],
[0, 9, 20, 33]]
But my question is, what should I do if both a and b are in batches? That is,
a = tf.Variable(np.arange(24).reshape(2, 3, 4))
b = tf.Variable(np.arange(8).reshape(2, 4))
Then how can I get the result of multiplying (broadcasting) the vector onto the matrix in each batch? Like the following answer:
[[[0, 1, 4, 9],
[0, 5, 12, 21],
[0, 9, 20, 33]],
[[48, 65, 84, 105],
[64, 85, 108, 133],
[80, 105, 132, 161]]]
Thanks!
Broadcasting first adds singleton dimensions on the left until the ranks match. In the first case that adds the batch dimension. In the second case you already have a batch dimension, so you need to insert the singleton dimension manually in the second position:
a = tf.reshape(tf.range(24), (2, 3, 4))
b = tf.reshape(tf.range(8), (2, 4))
sess.run(tf.multiply(a, tf.expand_dims(b, 1)))
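The shape reasoning can be checked in NumPy, which follows the same right-aligned broadcasting rule. This is a sketch with the arrays from the question. The name result is introduced here for illustration.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.arange(8).reshape(2, 4)

# Insert a singleton axis in the second position: (2, 4) -> (2, 1, 4).
# Each batch's vector then broadcasts over the 3 rows of that batch's matrix.
b_expanded = b[:, np.newaxis, :]
result = a * b_expanded
print(b_expanded.shape)
print(result)
```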