Is there a numpy function that will return an array with different combinations of the original? - numpy

For example if my array was
(2,2)
array([[1, 0],
       [0, 1]])
I would want it to return:
(4,2,2)
array([[[0, 0],
        [0, 1]],

       [[1, 1],
        [0, 1]],

       [[1, 0],
        [1, 1]],

       [[1, 0],
        [0, 0]]])

You can flip one bit at a time using:
(np.identity(inp.size, int) ^ inp.ravel()).reshape(-1, *inp.shape)
or, more verbose but also more economical (it avoids allocating the identity matrix):
>>> out = np.empty(2*(inp.size,), inp.dtype)   # (size, size) working array
>>> out[...] = inp.ravel()                     # every row is a copy of the flat input
>>> np.einsum('ii->i', out)[...] ^= 1          # flip the diagonal in place
>>> out = out.reshape(-1, *inp.shape)
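For reference, a quick check of the one-liner against the example input (using the answer's inp name):

import numpy as np

inp = np.array([[1, 0],
                [0, 1]])

# XOR the flat input with each row of the identity matrix: row k flips only bit k
out = (np.identity(inp.size, int) ^ inp.ravel()).reshape(-1, *inp.shape)
print(out.shape)   # (4, 2, 2), matching the requested output above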

Related

Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch

I have a tensor, say
A = tensor([
    [0, 0],
    [0, 2],
    [0, 3],
    [0, 4],
    [0, 5],
    [0, 6],
    [1, 0],
    [1, 1],
    [1, 4],
    [1, 5],
    [1, 6]
])
and the other tensor
b = tensor([[0, 2], [1, 2]])
I would like to find an efficient way to index into A by b such that the result is
result = tensor([[0, 3], [1, 4]])
That is, group A's rows by the value in their first column (the [0, …, 1, …] column), match those values against b's first column ([0, 1]), and then use b's second column ([2, 2]) as a positional index into each matching group.
Thanks
You can work out a solution by converting it to a one-dimensional problem with torch.nonzero and an offset computed from the mask sum.
Instead of the original A, get a flattened version, like
A = tensor([[ 0], [ 2], [ 3], [ 4], [ 5], [ 7], [ 8], [11], [12]])
and also calculate the offsets along batch,
offset = tensor([[0], [5], [4]])
Similarly, get b
b = tensor([2, 2])
and
offset_b = b + offset.reshape(-1)[:-1]
Then
indices = A.reshape(-1)[offset_b]
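A different sketch that reaches the same result: when A is already sorted by its first column (as in the example), torch.searchsorted can locate the start of each group, and b's second column is then just an offset into that block. The names below are placeholders, not part of the original answer.

import torch

A = torch.tensor([[0, 0], [0, 2], [0, 3], [0, 4], [0, 5], [0, 6],
                  [1, 0], [1, 1], [1, 4], [1, 5], [1, 6]])
b = torch.tensor([[0, 2], [1, 2]])

# start row of each key's block in A (assumes A[:, 0] is sorted ascending)
starts = torch.searchsorted(A[:, 0].contiguous(), b[:, 0])
rows = starts + b[:, 1]          # positional offset within each block
result = A[rows]                 # tensor([[0, 3], [1, 4]])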

How to pad a list of NumPy arrays in a vectorized way?

I am trying to find a vectorized way (or at least, better than using a loop) to create a three-dimensional NumPy array from a list of 2D NumPy arrays. Right now, I have a list L that looks something like:
L = [ np.array([[1,2,3], [4,5,6]]), np.array([[8,9,10]]), ...]
Each NumPy array has the same size for the second dimension (in the above case, it is 3). But the first dimension has different sizes.
My goal is to create a 3D NumPy array M that incorporates the above data. I've been trying to use the np.pad() function, since I have a maximum size for the first dimension of each of my arrays, but it looks like it would only operate on the individual elements of my list. I could then do what I wanted using the function and looping over every array. However, I'd like to do this without a loop if possible, using a vectorized approach. Are there any techniques to do this?
This question is related to this one, though I'm hoping to do this over my whole list at once.
First let's look at the common task of padding 1d arrays to a common size.
In [441]: alist = [np.ones((2,),int),np.zeros((1,),int)+2, np.zeros((3,),int)+3]
In [442]: alist
Out[442]: [array([1, 1]), array([2]), array([3, 3, 3])]
The obvious iterative approach:
In [443]: [np.hstack((arr, np.zeros((3-arr.shape[0]),int))) for arr in alist]
Out[443]: [array([1, 1, 0]), array([2, 0, 0]), array([3, 3, 3])]
In [444]: np.stack(_)
Out[444]:
array([[1, 1, 0],
       [2, 0, 0],
       [3, 3, 3]])
A clever alternative. It still requires an iteration to determine sizes, but the rest is whole-array "vectorization":
In [445]: sizes = [arr.shape[0] for arr in alist]
In [446]: sizes
Out[446]: [2, 1, 3]
Make the output array with the pad values:
In [448]: res = np.zeros((3,3),int)
Make a clever mask (@Divakar first proposed this)
In [449]: np.array(sizes)[:,None]>np.arange(3)
Out[449]:
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True,  True]])
then map the 'flattened' inputs to res:
In [450]: res[_]=np.hstack(alist)
In [451]: res
Out[451]:
array([[1, 1, 0],
       [2, 0, 0],
       [3, 3, 3]])
I think this process can be extended to your 2d=>3d case. But it will take a bit of work. I tried doing it directly and found I was getting lost in applying the mask. That's why I decided to first layout the 1d=>2d case. There's enough thinking-outside-the-box that I have to work out the details fresh each time.
2d=>3d
In [457]: a2list = [np.ones((2,3),int),np.zeros((1,3),int)+2, np.zeros((3,3),int)+3]
In [458]: [np.vstack((arr, np.zeros((3-arr.shape[0],arr.shape[1]),int))) for arr in a2list]
Out[458]:
[array([[1, 1, 1],
        [1, 1, 1],
        [0, 0, 0]]),
 array([[2, 2, 2],
        [0, 0, 0],
        [0, 0, 0]]),
 array([[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]])]
In [459]: np.stack(_)
Out[459]:
array([[[1, 1, 1],
        [1, 1, 1],
        [0, 0, 0]],

       [[2, 2, 2],
        [0, 0, 0],
        [0, 0, 0]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
Now for the 'vectorized' approach:
In [460]: sizes = [arr.shape[0] for arr in a2list]
In [461]: sizes
Out[461]: [2, 1, 3]
In [462]: np.array(sizes)[:,None]>np.arange(3)
Out[462]:
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
In [463]: res = np.zeros((3,3,3),int)
and the corresponding indices from the mask:
In [464]: I, J = np.nonzero(Out[462])
In [465]: I
Out[465]: array([0, 0, 1, 2, 2, 2])
In [466]: J
Out[466]: array([0, 1, 0, 0, 1, 2])
In [467]: res[I,J,:] = np.vstack(a2list)
In [468]: res
Out[468]:
array([[[1, 1, 1],
        [1, 1, 1],
        [0, 0, 0]],

       [[2, 2, 2],
        [0, 0, 0],
        [0, 0, 0]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
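To make the 2d=>3d recipe reusable, the mask trick can be wrapped in a small helper. This is only a sketch: pad_stack is a made-up name, and it assumes every array in the list shares the same second dimension.

import numpy as np

def pad_stack(arrays, fill=0):
    # Stack 2D arrays with differing row counts into one 3D array, padding with `fill`
    sizes = np.array([a.shape[0] for a in arrays])
    n_rows, n_cols = sizes.max(), arrays[0].shape[1]
    mask = sizes[:, None] > np.arange(n_rows)        # True where a real row belongs
    out = np.full((len(arrays), n_rows, n_cols), fill, dtype=arrays[0].dtype)
    out[mask] = np.vstack(arrays)                    # rows fill in list order
    return out

a2list = [np.ones((2, 3), int), np.zeros((1, 3), int) + 2, np.zeros((3, 3), int) + 3]
print(pad_stack(a2list))        # same (3, 3, 3) result as res above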

Algorithms of Joining arrays in numpy

I'm new to numpy. I understand the methods of "Joining arrays" in lower shapes such as (n1, n2), because we can visualize them, like a matrix.
But I don't understand the logic in higher dimensions (n0, ...., n_{d-1}); of course I can't visualize that. To visualize, I usually imagine a multidimensional array as a tree, so (n0, ...., n_{d-1}) means that at level (axis) i of the tree every node has n_{i} children. So at level 0 (the root) we have n0 children, and so on.
In substance, what is the formal, exact definition of the "Joining arrays" algorithms?
https://numpy.org/doc/stable/reference/routines.array-manipulation.html
Let's see if I can illustrate some basic array operations.
First make a 2d array. Start with a 1d, [0,1,...5], and reshape it to (2,3):
In [1]: x = np.arange(6).reshape(2,3)
In [2]: x
Out[2]:
array([[0, 1, 2],
       [3, 4, 5]])
I can join 2 copies of x along the first dimension (vstack, v for vertical, also does this):
In [3]: np.concatenate([x,x], axis=0)
Out[3]:
array([[0, 1, 2],
       [3, 4, 5],
       [0, 1, 2],
       [3, 4, 5]])
Note that the result is (4,3); no new dimension.
Or join them 'horizontally':
In [4]: np.concatenate([x,x], axis=1)
Out[4]:
array([[0, 1, 2, 0, 1, 2],   # (2,6) shape
       [3, 4, 5, 3, 4, 5]])
But if I supply them to np.array I make a 3d array (2,2,3) shape:
In [5]: np.array([x,x])
Out[5]:
array([[[0, 1, 2],
        [3, 4, 5]],

       [[0, 1, 2],
        [3, 4, 5]]])
This action of np.array is really no different from making a 2d array from nested lists, np.array([[1,2],[3,4]]). We could just add a layer of nesting, just like Out[5] without the line breaks. I tend to think of this 3d array as having 2 blocks, each with 2 rows and 3 columns. But the names are just a convenience.
stack acts like np.array, making a 3d array. It actually changes the input arrays to (1,2,3) shape, and concatenates on the first axis.
In [6]: np.stack([x,x])
Out[6]:
array([[[0, 1, 2],
        [3, 4, 5]],

       [[0, 1, 2],
        [3, 4, 5]]])
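A quick check of that description, as a sketch (np.expand_dims just inserts the length-1 axis):

import numpy as np

x = np.arange(6).reshape(2, 3)

# stack inserts a new axis on each input, then concatenates along that axis
via_concat = np.concatenate([np.expand_dims(x, 0), np.expand_dims(x, 0)], axis=0)
print(np.array_equal(np.stack([x, x]), via_concat))   # True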
stack lets us join the arrays in other ways:
In [7]: np.stack([x,x], axis=1) # expand to (2,1,3) and concatenate
Out[7]:
array([[[0, 1, 2],
        [0, 1, 2]],

       [[3, 4, 5],
        [3, 4, 5]]])
In [8]: np.stack([x,x], axis=2) # expand to (2,3,1) and concatenate
Out[8]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4],
        [5, 5]]])
concatenate and the other stack functions don't add anything new to basic numpy arrays. They just provide ways of making a new array from existing ones. There aren't any special algorithms.
If it helps, you could think of these join functions as creating a new "blank" array and filling it with copies of the source arrays. For example, that last stack can be done with:
In [9]: res = np.zeros((2,3,2), int)
In [10]: res
Out[10]:
array([[[0, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [0, 0]]])
In [11]: res[:,:,0] = x
In [12]: res[:,:,1] = x
In [13]: res
Out[13]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4],
        [5, 5]]])
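And a quick sanity check that the filled array matches the stack call (a sketch):

import numpy as np

x = np.arange(6).reshape(2, 3)
res = np.zeros((2, 3, 2), int)
res[:, :, 0] = x                # copy x into the first slot of the last axis
res[:, :, 1] = x                # and into the second
print(np.array_equal(res, np.stack([x, x], axis=2)))   # True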

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid items from the array, including their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
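One caveat: np.random.choice samples with replacement by default, so the same position can be drawn more than once (the example above picks (3, 3) twice). Passing replace=False avoids that, assuming n does not exceed the number of valid entries:

import numpy as np

data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9],
                 [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5

y_val, x_val = np.where(~np.isnan(data))
pick = np.random.choice(y_val.size, n, replace=False)   # distinct positions
data_pick = data[y_val[pick], x_val[pick]]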
Find the valid (non-NaN) coordinates with np.argwhere:
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 3]])
Now pick a random choice:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0], a[idx][1]]
Out[45]: 2.0

tensorflow manipulate labels vector into "multiple hot encoder"

Is it possible (in a nice way, that is) in tensorflow to achieve the following functionality:
assume we have a dense vector of tags
labels = [0,3,1,2,0]
I need to make a "multiple hot encoder" of it, meaning for each label value k the corresponding row should contain k ones,
so the required result will be
[[0, 0, 0],
 [1, 1, 1],
 [0, 0, 1],
 [0, 1, 1],
 [0, 0, 0]]
thanks
You could do this using tf.nn.embedding_lookup as shown here:
embeddings = tf.constant([[0,0,0], [0,0,1], [0,1,1], [1,1,1]])
labels = [0,3,1,2,0]
encode_tensors = tf.nn.embedding_lookup(embeddings,labels)
Output of sess.run(encode_tensors):
array([[0, 0, 0],
       [1, 1, 1],
       [0, 0, 1],
       [0, 1, 1],
       [0, 0, 0]], dtype=int32)
Hope this helps!
For completeness, it's also possible to use:
In [397]: labels = np.array([1, 2, 0, 3, 0])
In [398]: sess.run(tf.sequence_mask(labels, 3, dtype=tf.int8))
Out[398]:
array([[1, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 1],
       [0, 0, 0]], dtype=int8)
The resulting matrix is reversed (its ones are left-aligned) compared to what I asked for, though.
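If the right-aligned form from the question is needed, one option, as a sketch, is to reverse the mask along its last axis with tf.reverse (using the question's labels rather than the [1, 2, 0, 3, 0] example above):

import tensorflow as tf

labels = [0, 3, 1, 2, 0]
mask = tf.sequence_mask(labels, 3, dtype=tf.int8)   # ones on the left
flipped = tf.reverse(mask, axis=[1])                # ones on the right, as requested
# in the TF1 session style used above: sess.run(flipped)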