Create a view containing subsets of numpy array - numpy

I have a NumPy array of shape (1000,100).
I would like to create a new array containing the first 100 rows and then all the rows between 200th and 299th (boundaries included). Is there a way to do it using only views, without copying all the data of the array?

Unfortunately, not.
Here is why: A NumPy array draws data from an underlying block of contiguous memory.
The dtype, shape, and strides of the array determine how the data in that block of memory is to be interpreted as values.
Since an array can have only one strides attribute (one fixed step per axis), the values have to be regularly spaced along each axis. Therefore, an array cannot be a view of another array which takes values from the original array at irregularly spaced intervals.
Note, however, that Divakar shows that by a clever reshaping to a 3D array, the desired values can be viewed as a slice with a regularly spaced stride. So if you are willing to add another dimension, it is possible to create a view with the desired values.
Building on Divakar's answer, you could also use a.reshape(10,-1,a.shape[1])[:3:2]. This breaks the array into 10 chunks, then slices off the first 3, and steps by 2 -- giving you only the first and third chunks.
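To make the strides argument concrete, here is a minimal sketch (the array contents and the names copied and view are just for illustration, and the dtype is fixed to int64 so the stride values are predictable). It compares selecting the two row blocks with an index array, which copies, against the 3D reshape-and-slice, which stays a view:

```python
import numpy as np

# Illustrative input: 1000 rows, 100 columns, 8-byte integers.
a = np.arange(1000 * 100, dtype=np.int64).reshape(1000, 100)

# Selecting rows 0-99 and 200-299 with an index array gives the 2D result,
# but index-array (fancy) indexing always copies the data.
rows = np.r_[0:100, 200:300]
copied = a[rows]

# Splitting the first 300 rows into 3 chunks and taking every second chunk
# is a regular stride pattern, so NumPy can express it as a view.
view = a[:300].reshape(3, -1, a.shape[1])[::2]

print(view.shape)                    # (2, 100, 100)
print(view.strides)                  # (160000, 800, 8)
print(np.shares_memory(a, copied))   # False
print(np.shares_memory(a, view))     # True
```

The first stride of 160000 bytes is the jump of 200 rows between the two blocks; a flat 2D array would need two different row steps (1 row inside a block, 101 rows between blocks), which a single strides tuple cannot describe.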

With some slicing and reshaping you could have a 3D array of shape (2,100,100), where the first element holds the first block (rows 0-99) and the second element holds the second block (rows 200-299) of the input array.
The implementation would be -
a[:300].reshape(3,-1,a.shape[1])[::2]
Sample run with an input array of shape (20,5), where we try to get rows 0-4 and 10-14 -
1) Input array :
In [364]: a
Out[364]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7],
[6, 3, 3, 3, 3],
[1, 6, 1, 3, 5],
[6, 8, 4, 7, 6],
[8, 4, 6, 8, 7],
[4, 8, 3, 5, 2],
[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5],
[2, 6, 8, 2, 4],
[5, 6, 2, 5, 0],
[6, 2, 4, 2, 7],
[3, 1, 6, 8, 4],
[0, 4, 3, 2, 0]])
2) Use proposed slicing and reshaping to get us a 3D array :
In [365]: a[:15].reshape(3,-1,a.shape[1])[::2]
Out[365]:
array([[[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7]],
[[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]]])
3) Verify output with manual slicing :
In [366]: a[:5]
Out[366]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7]])
In [367]: a[10:15]
Out[367]:
array([[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]])
4) Finally, the most important part, verifying that it is indeed a view :
In [368]: np.shares_memory(a, a[:15].reshape(3,-1,a.shape[1])[::2])
Out[368]: True
5) We could of course reshape it afterwards to get a 2D output, but that forces a copy there -
In [371]: a[:15].reshape(3,-1,a.shape[1])[::2].reshape(-1,a.shape[1])
Out[371]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7],
[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]])
In [372]: np.shares_memory(a, _)
Out[372]: False
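One practical consequence of steps 4 and 5, sketched below with an illustrative arange input (not the random array from the sample run): writes through the 3D result land in the original array, while writes to the 2D reshaped result do not, because that reshape had to copy.

```python
import numpy as np

# Illustrative (20, 5) input; values are simply 0..99.
a = np.arange(100).reshape(20, 5)

# 3D view holding rows 0-4 and rows 10-14.
blocks = a[:15].reshape(3, -1, a.shape[1])[::2]

# Writing through the view changes the original array:
# blocks[1, 0, 0] is the same memory as a[10, 0].
blocks[1, 0, 0] = -1
print(a[10, 0])   # -1

# Flattening to 2D cannot be expressed with a single row stride,
# so reshape returns a copy and the original is left untouched.
flat = blocks.reshape(-1, a.shape[1])
flat[0, 0] = 999
print(a[0, 0])    # 0
```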

Related

Efficient way to create a "rolling window" type grouping in TensorFlow

Imagine you have an n-dimensional tensor where one of those dimensions corresponds to time.
What I'd like to do is: given some integer window_size, I'd like to replace my time dimension with two new dimensions, [..., n_groups, window_size], where n_groups covers all possible groupings of size window_size across the time dimension. So if we started with a time dimension of size n_periods, then n_groups should end up being n_periods - window_size + 1.
All of this is very easy to accomplish using traditional "pythonic" looping and slicing, such as:
stacked = tf.stack([inputs[i:i+window_size] for i in range(len(inputs) - window_size + 1)], axis=0)
However, if the time dimension is very long, this produces a staggering number of graph operations. I am wondering if there isn't a built-in TensorFlow function that might help me accomplish this relatively simple task more efficiently...
So common is the idea of "rolling-window grouping" that the Pandas project has a very sophisticated and sizeable API to handle this particular case. I would have thought that TensorFlow would also include such a utility.
Considering the tf documentation about map_fn:
"map_fn will apply the operations used by fn to each element of elems, resulting in O(elems.shape[0]) total operations. This is somewhat mitigated by the fact that map_fn can process elements in parallel. However, a transform expressed using map_fn is still typically less efficient than an equivalent transform expressed using vectorized operations."
You can try the following approach, given an input tensor:
input_tensor = tf.range(10)
# <tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>
Convert it into a square matrix by repeating it along a new first axis:
res = tf.repeat(tf.expand_dims(input_tensor, 0), input_tensor.shape[0], axis = 0)
# array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]], dtype=int32)>
Then apply map_fn over this tensor, passing along a range vector of negative values so that row i is rolled by -i:
elements = tf.range(10, dtype=tf.int32) * -1
w,_ = tf.map_fn(lambda x: (tf.roll(x[0], x[1], axis=0), x[1]), (res, elements), dtype=(tf.int32, tf.int32))
This will roll the elements of each row to the left:
#<tf.Tensor: shape=(10, 10), dtype=int32, numpy=
#array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
# [2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
# [3, 4, 5, 6, 7, 8, 9, 0, 1, 2],
# [4, 5, 6, 7, 8, 9, 0, 1, 2, 3],
# [5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
# [6, 7, 8, 9, 0, 1, 2, 3, 4, 5],
# [7, 8, 9, 0, 1, 2, 3, 4, 5, 6],
# [8, 9, 0, 1, 2, 3, 4, 5, 6, 7],
# [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]], dtype=int32)>
Finally, take as many elements as you need using tensor slicing, like:
window = 8
tf.slice(w, [0, 0], [(w.shape[0] - window) + 1, window])
gives:
#<tf.Tensor: shape=(3, 8), dtype=int32, numpy=
#array([[0, 1, 2, 3, 4, 5, 6, 7],
# [1, 2, 3, 4, 5, 6, 7, 8],
# [2, 3, 4, 5, 6, 7, 8, 9]], dtype=int32)>
For window = 4:
window = 4
tf.slice(w, [0, 0], [(w.shape[0] - window) + 1, window])
gives:
#array([[0, 1, 2, 3],
# [1, 2, 3, 4],
# [2, 3, 4, 5],
# [3, 4, 5, 6],
# [4, 5, 6, 7],
# [5, 6, 7, 8],
# [6, 7, 8, 9]], dtype=int32)>
Try to convert this into a tf graph to see if it has better performance than the normal Python loop.
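A fully vectorized alternative to both the loop and map_fn is to build a matrix of indices by broadcasting and gather with it. The sketch below shows the idea in NumPy so it runs without TensorFlow; rolling_windows is a hypothetical helper name, and the assumption is that passing the same index matrix to tf.gather(inputs, idx) along the time axis would give the equivalent result in TensorFlow.

```python
import numpy as np

def rolling_windows(x, window_size):
    """Return every window of length `window_size` along axis 0 of `x`.

    Hypothetical helper for illustration; the result has shape
    (n_groups, window_size, ...) with n_groups = n_periods - window_size + 1.
    """
    n_periods = x.shape[0]
    n_groups = n_periods - window_size + 1
    # idx[g, k] = g + k, i.e. row g lists the positions of window g.
    idx = np.arange(n_groups)[:, None] + np.arange(window_size)[None, :]
    # Indexing with an integer matrix gathers all windows in one operation.
    return x[idx]

windows = rolling_windows(np.arange(10), 4)
print(windows.shape)   # (7, 4)
print(windows[0])      # [0 1 2 3]
print(windows[-1])     # [6 7 8 9]
```

The number of operations no longer grows with the length of the time dimension; only the size of the index matrix does.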

Selecting and Viewing Matrices

So I have this multi-dimensional array with the shape (2,3,4,5).
Here is what it looks like.
rand_5 =
array([[[[0, 2, 8, 9, 6],
[4, 9, 7, 3, 3],
[8, 3, 0, 1, 0],
[0, 6, 7, 7, 9]],
[[3, 0, 7, 7, 7],
[0, 5, 4, 3, 1],
[3, 1, 3, 4, 3],
[1, 9, 5, 9, 1]],
[[2, 3, 2, 2, 5],
[7, 3, 0, 9, 9],
[3, 4, 5, 3, 0],
[4, 8, 6, 7, 2]]],
[[[7, 3, 8, 6, 6],
[5, 6, 5, 7, 1],
[5, 4, 4, 9, 9],
[0, 6, 2, 6, 8]],
[[2, 4, 1, 6, 1],
[5, 1, 6, 9, 8],
[6, 5, 9, 7, 5],
[4, 9, 6, 8, 1]],
[[5, 5, 8, 3, 7],
[7, 9, 4, 7, 5],
[9, 6, 2, 0, 5],
[3, 0, 5, 7, 1]]]])
The third matrix in the second block (index 1) is shown below:
rand_5[1,2] =
array([[5, 5, 8, 3, 7],
[7, 9, 4, 7, 5],
[9, 6, 2, 0, 5],
[3, 0, 5, 7, 1]])
Question:
How can I select the rows at index 2 and 3 and the columns at index 0 and 1 from the matrix above, so that I get the result shown below?
[9,6]
[3,0]
With array slicing:
rand_5[1, 2, 2:4, 0:2]
outputs:
array([[9, 6],
[3, 0]])
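For reference, a small sketch of how that index expression breaks down, using an illustrative arange array of the same shape (the name arr and its values are made up, since rand_5 is random): an integer picks one entry along an axis and drops that axis, while a start:stop slice keeps a range of entries.

```python
import numpy as np

# Illustrative stand-in for rand_5 with the same shape (2, 3, 4, 5).
arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# 1    -> second block along axis 0 (axis dropped)
# 2    -> third matrix along axis 1 (axis dropped)
# 2:4  -> rows at index 2 and 3
# 0:2  -> columns at index 0 and 1
sub = arr[1, 2, 2:4, 0:2]

print(sub.shape)   # (2, 2)
print(sub)
# [[110 111]
#  [115 116]]

# Basic slicing like this returns a view, not a copy.
print(np.shares_memory(arr, sub))   # True
```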

Calculating distance between each element of an array

I have an array,
a = np.array([1, 3, 5, 10])
I would like to create a function that calculates the distance between each of its elements from every other element. There should be no for loop as speed is critical.
The expected result of the above would be:
array([[0, 2, 4, 9],
[2, 0, 2, 7],
[4, 2, 0, 5],
[9, 7, 5, 0]])
You can use numpy.subtract.outer:
np.abs(np.subtract.outer(a, a))
array([[0, 2, 4, 9],
[2, 0, 2, 7],
[4, 2, 0, 5],
[9, 7, 5, 0]])
Or, equivalently, use any of the following:
np.abs(a - a[:, np.newaxis])
np.abs(a - a[:, None])
np.abs(a - a.reshape((-1, 1)))
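The reason these are equivalent is broadcasting: a has shape (4,), the reshaped column has shape (4, 1), and subtracting them expands both to (4, 4). A minimal sketch of that step (the names col, diff and dist are only for illustration):

```python
import numpy as np

a = np.array([1, 3, 5, 10])

# Turn `a` into a column so it broadcasts against the original row.
col = a[:, np.newaxis]
print(col.shape)    # (4, 1)

# (4,) minus (4, 1) broadcasts to (4, 4): diff[i, j] = a[j] - a[i].
diff = a - col
print(diff.shape)   # (4, 4)

# The sign differs from subtract.outer (which gives a[i] - a[j]),
# but the absolute values are identical.
dist = np.abs(diff)
print(np.array_equal(dist, np.abs(np.subtract.outer(a, a))))   # True
```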

Mat2cell matlab equivalent in tensorflow or pytorch

I have a square matrix and would like to break it into some smaller matrices. For example, assume we have a matrix with the shape of [4,4] and would like to convert it into 4 smaller matrices of size [2,2].
input:
[9, 9, 9, 9,
8, 8, 8, 8,
7, 7, 7, 7,
6, 6, 6, 6]
output:
[[9, 9 | [9, 9,
8, 8] | 8, 8],
---------------
[7, 7 | [7, 7,
6, 6] | 6, 6]]
You can use repeated calls to torch.split for this.
>>> x
tensor([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
>>> [z for y in x.split(2) for z in y.split(2, dim=1)]
[tensor([[1, 2],
[5, 6]]), tensor([[3, 4],
[7, 8]]), tensor([[ 9, 10],
[13, 14]]), tensor([[11, 12],
[15, 16]])]
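Given a tensor with the shape of 4*4 or 1*16, another option is the view function or reshape. Note that view(2, 2, 2, 2) on its own groups the elements row by row (each 2x2 in the output is one row of the 4x4 matrix folded in two); getting the spatial 2x2 blocks from the question additionally requires swapping the two middle axes, e.g. .permute(0, 2, 1, 3), after the view shown here: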
a = torch.tensor([9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6])
# a = a.view(4,4)
a = a.view(2, 2, 2, 2)
# output:
tensor([[[[9, 9],
[9, 9]],
[[8, 8],
[8, 8]]],
[[[7, 7],
[7, 7]],
[[6, 6],
[6, 6]]]])
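To see the difference between a plain 4D reshape and a real block split, here is a sketch in NumPy (used only so it runs without PyTorch; reshape and transpose play the role of view and permute, and the names blocks and naive are made up). It uses the 1..16 matrix from the torch.split example, where the rows are not constant:

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)

# Plain reshape: naive[i, j] is row 2*i + j of x folded into 2x2,
# which is NOT a spatial block of the matrix.
naive = x.reshape(2, 2, 2, 2)
print(naive[0, 0])
# [[1 2]
#  [3 4]]

# Swapping the two middle axes gives blocks[i, j] = the 2x2 block
# in block-row i and block-column j.
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
print(blocks[0, 0])
# [[1 2]
#  [5 6]]
print(blocks[1, 1])
# [[11 12]
#  [15 16]]
```

With the matrix of 9s, 8s, 7s and 6s from the question, blocks[0, 0] comes out as [[9, 9], [8, 8]], matching the requested output.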

Python - numpy mgrid and reshape

Can someone explain to me what the second line of this code does?
objp = np.zeros((48,3), np.float32)
objp[:,:2] = np.mgrid[0:8,0:6].T.reshape(-1,2)
Can someone explain to me what exactly the np.mgrid[0:8,0:6] part of the code is doing and what exactly the T.reshape(-1,2) part of the code is doing?
Thanks and good job!
The easiest way to see these is to use smaller values for mgrid:
In [11]: np.mgrid[0:2,0:3]
Out[11]:
array([[[0, 0, 0],
[1, 1, 1]],
[[0, 1, 2],
[0, 1, 2]]])
In [12]: np.mgrid[0:2,0:3].T # transpose: reverses the axes, shape (2, 2, 3) -> (3, 2, 2)
Out[12]:
array([[[0, 0],
[1, 0]],
[[0, 1],
[1, 1]],
[[0, 2],
[1, 2]]])
In [13]: np.mgrid[0:2,0:3].T.reshape(-1, 2) # reshape to an Nx2 matrix
Out[13]:
array([[0, 0],
[1, 0],
[0, 1],
[1, 1],
[0, 2],
[1, 2]])
Then objp[:,:2] = sets the 0th and 1st columns of objp to this result.
The second line creates a multi-dimensional mesh grid, transposes it, reshapes it so that it represents two columns and inserts it into the first two columns of the objp array.
Breakdown:
np.mgrid[0:8,0:6] creates the following mgrid:
>>> np.mgrid[0:8,0:6]
array([[[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
The .T reverses the axes of the grid (shape (2,8,6) becomes (6,8,2)), and the .reshape(-1,2) then flattens it into a two-column array with 48 rows. That matches the shape of the first two columns of objp, so the result can be assigned to them.
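Putting the pieces together, the sketch below checks the shapes at each step and compares the result with an explicit list of (i, j) pairs (the names grid and expected are only for illustration). It suggests that the first column cycles through 0..7 fastest while the second column steps through 0..5:

```python
import numpy as np

grid = np.mgrid[0:8, 0:6]
print(grid.shape)                   # (2, 8, 6)
print(grid.T.shape)                 # (6, 8, 2)
print(grid.T.reshape(-1, 2).shape)  # (48, 2)

objp = np.zeros((48, 3), np.float32)
objp[:, :2] = grid.T.reshape(-1, 2)

# Same coordinates written as an explicit loop: i varies fastest.
expected = [(i, j) for j in range(6) for i in range(8)]
print(np.array_equal(objp[:, :2], np.array(expected)))   # True

print(objp[:3])
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [2. 0. 0.]]
```

The third column is never assigned, so it stays at the zeros that np.zeros put there.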