Fastest method to create a 2D numpy array whose elements are in range

I want to create a 2D numpy array that stores the coordinates of the pixels, so that the array looks like this:
[(0, 0), (0, 1), (0, 2), ...., (0, 510), (0, 511)
(1, 0), (1, 1), (1, 2), ...., (1, 510), (1, 511)
..
..
..
(511, 0), (511, 1), (511, 2), ...., (511, 510), (511, 511)]
This is a ridiculous question but I couldn't find anything yet.

You can use np.indices or np.meshgrid, which are built for more advanced indexing:
>>> data=np.indices((512,512)).swapaxes(0,2).swapaxes(0,1)
>>> data.shape
(512, 512, 2)
>>> data[5,0]
array([5, 0])
>>> data[5,25]
array([ 5, 25])
This may look odd because it's really made to do something like this:
>>> a=np.ones((3,3))
>>> ind=np.indices((2,1))
>>> a[ind[0],ind[1]]=0
>>> a
array([[ 0.,  1.,  1.],
       [ 0.,  1.,  1.],
       [ 1.,  1.,  1.]])
An mgrid example:
np.mgrid[0:512,0:512].swapaxes(0,2).swapaxes(0,1)
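For what it's worth, mgrid and indices produce identical index arrays here, which a quick check confirms:
>>> (np.mgrid[0:512, 0:512] == np.indices((512, 512))).all()
True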
A meshgrid example:
>>> a=np.arange(0,512)
>>> x,y=np.meshgrid(a,a)
>>> ind=np.dstack((y,x))
>>> ind.shape
(512, 512, 2)
>>> ind[5,0]
array([5, 0])
All are equivalent ways of doing this; however, meshgrid can be used to create non-uniform grids.
If you do not mind switching row/column indices you can drop the final swapaxes(0,1).
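For instance, here is a minimal sketch of a non-uniform grid (the coordinate values are just an illustration):
>>> xs = np.array([0.0, 0.5, 2.0])  # non-uniform spacing
>>> ys = np.array([0.0, 1.0])
>>> gx, gy = np.meshgrid(xs, ys)
>>> np.dstack((gy, gx)).shape
(2, 3, 2)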

You can use np.ogrid here. Instead of storing a tuple, store it in a 3D array.
>>> t_row, t_col = np.ogrid[0:512, 0:512]
>>> a = np.zeros((512, 512, 2), dtype=np.uint16)  # uint8 would overflow past 255
>>> a[t_row, t_col, 0] = t_row
>>> a[t_row, t_col, 1] = t_col
This should do the trick. Hopefully you can use this, instead of the tuple.
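For what it's worth, the same array can also be built with plain broadcasting, no ogrid needed (a minimal sketch, reusing a from above):
>>> b = np.zeros((512, 512, 2), dtype=np.uint16)
>>> b[..., 0] = np.arange(512)[:, None]  # row index, broadcast across columns
>>> b[..., 1] = np.arange(512)           # column index, broadcast across rows
>>> (b == a).all()
True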

The example in the question is not completely clear: either extra commas are missing, or extra brackets.
This one (example ranges 3, 4 for clarity) provides the solution for the first variant and in effect produces a 2D array (as the question title suggests), "listing" all coordinates:
>>> np.indices((3,4)).reshape(2,-1).T
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 2],
       [2, 3]])
The other variant was already shown in another answer using 2x .swapaxes() - but it could also be done with one np.rollaxis() (or the newer np.moveaxis()):
>>> np.rollaxis(np.indices((3,4)), 0, 2+1)
array([[[0, 0],
        [0, 1],
        [0, 2],
        [0, 3]],

       [[1, 0],
        [1, 1],
        [1, 2],
        [1, 3]],

       [[2, 0],
        [2, 1],
        [2, 2],
        [2, 3]]])
>>> _[0,1]
array([0, 1])
This method also works the same for N-dimensional indices, e.g.:
>>> np.rollaxis(np.indices((5,6,7)), 0, 3+1)
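A quick sanity check on the resulting shape (the index axis ends up last):
>>> np.rollaxis(np.indices((5, 6, 7)), 0, 3+1).shape
(5, 6, 7, 3)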
Note: np.indices is indeed fast (it runs at C speed) even for big ranges.
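A rough timing sketch, if you want to verify that on your own machine (the numbers are machine-dependent, so treat this as illustrative only):
import numpy as np
import timeit

t = timeit.timeit("np.indices((512, 512)).reshape(2, -1).T",
                  setup="import numpy as np", number=100)
print(t / 100)  # seconds per call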

Numpy: How to select row entries in a 2d array by column vector

How can I retrieve a column vector from a 2d array given an indicator column vector?
Suppose I have
X = np.array([[1, 4, 6],
[8, 2, 9],
[0, 3, 7],
[6, 5, 1]])
and
S = np.array([0, 2, 1, 2])
Is there an elegant way to get from X and S the result array([1, 9, 3, 1]), which is equivalent to
np.array([x[s] for x, s in zip(X, S)])
You can achieve this using np.take_along_axis:
>>> np.take_along_axis(X, S[..., None], axis=1)
array([[1],
       [9],
       [3],
       [1]])
You need to make sure both array arguments have the same number of dimensions (or that broadcasting applies), hence the S[..., None] to add a trailing axis.
Of course you can flatten the returned value with a [:, 0] slice.
>>> np.take_along_axis(X, S[..., None], axis=1)[:, 0]
array([1, 9, 3, 1])
Alternatively you can just use integer array indexing with an arange:
>>> X[np.arange(len(S)), S]
array([1, 9, 3, 1])
I believe this is also equivalent to np.diag(X[:, S]) but with unnecessary copying...
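A quick check of that equivalence (note X[:, S] materializes a full (4, 4) array first, hence the wasted copying):
>>> (np.diag(X[:, S]) == np.array([1, 9, 3, 1])).all()
True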
For 2d arrays:
# give the row numbers as one list and S, the column numbers, as the other
X[[0, 1, 2, 3], S]
# more general
X[np.arange(S.shape[0]), S]
See the numpy documentation on indexing basics: https://numpy.org/doc/stable/user/basics.indexing.html

Algorithms for joining arrays in numpy

I'm new to numpy. I understand the "joining arrays" methods in lower dimensions such as (n1, n2), because we can visualize them like a matrix.
But I don't understand the logic in higher dimensions (n0, ..., n_{d-1}), which of course I can't visualize. To visualize, I usually imagine a multidimensional array as a tree, so (n0, ..., n_{d-1}) means that at level (axis) i of the tree every node has n_i children. So at level 0 (the root) we have n0 children, and so on.
In substance, what is the formal, exact definition of the "joining arrays" algorithms?
https://numpy.org/doc/stable/reference/routines.array-manipulation.html
Let's see if I can illustrate some basic array operations.
First make a 2d array. Start with a 1d, [0,1,...5], and reshape it to (2,3):
In [1]: x = np.arange(6).reshape(2,3)
In [2]: x
Out[2]:
array([[0, 1, 2],
       [3, 4, 5]])
I can join 2 copies of x along the first dimension (np.vstack, v for vertical, also does this):
In [3]: np.concatenate([x,x], axis=0)
Out[3]:
array([[0, 1, 2],
       [3, 4, 5],
       [0, 1, 2],
       [3, 4, 5]])
Note that the result is (4,3); no new dimension.
Or join them 'horizontally':
In [4]: np.concatenate([x,x], axis=1)
Out[4]:
array([[0, 1, 2, 0, 1, 2],   # (2, 6) shape
       [3, 4, 5, 3, 4, 5]])
But if I supply them to np.array I make a 3d array of (2,2,3) shape:
In [5]: np.array([x,x])
Out[5]:
array([[[0, 1, 2],
        [3, 4, 5]],

       [[0, 1, 2],
        [3, 4, 5]]])
This action of np.array is really no different from making a 2d array from nested lists, np.array([[1,2],[3,4]]). We could just add a layer of nesting, just like Out[5] without the line breaks. I tend to think of this 3d array as having 2 blocks, each with 2 rows and 3 columns. But the names are just a convenience.
stack acts like np.array, making a 3d array. It actually changes the input arrays to (1,2,3) shape, and concatenates on the first axis.
In [6]: np.stack([x,x])
Out[6]:
array([[[0, 1, 2],
        [3, 4, 5]],

       [[0, 1, 2],
        [3, 4, 5]]])
stack lets us join the arrays in other ways:
In [7]: np.stack([x,x], axis=1) # expand to (2,1,3) and concatenate
Out[7]:
array([[[0, 1, 2],
        [0, 1, 2]],

       [[3, 4, 5],
        [3, 4, 5]]])
In [8]: np.stack([x,x], axis=2) # expand to (2,3,1) and concatenate
Out[8]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4],
        [5, 5]]])
concatenate and the other stack functions don't add anything new to basic numpy arrays. They just provide ways of making a new array from existing ones. There aren't any special algorithms.
If it helps, you could think of these join functions as creating a new "blank" array and filling it with copies of the source arrays. For example, that last stack can be done with:
In [9]: res = np.zeros((2,3,2), int)
In [10]: res
Out[10]:
array([[[0, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [0, 0]]])
In [11]: res[:,:,0] = x
In [12]: res[:,:,1] = x
In [13]: res
Out[13]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4],
        [5, 5]]])

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid items from the array, including their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
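Put together as one helper, this might look like the following sketch (pick_valid is a made-up name; note that np.random.choice samples with replacement by default, so pass replace=False if the n picks must be distinct):
import numpy as np

def pick_valid(data, n, seed=None):
    """Pick n distinct random non-NaN entries; return (rows, cols, values)."""
    rng = np.random.default_rng(seed)
    rows, cols = np.where(~np.isnan(data))
    sel = rng.choice(rows.size, size=n, replace=False)
    return rows[sel], cols[sel], data[rows[sel], cols[sel]]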
Find the valid (non-NaN) indices with np.argwhere (note that np.nonzero would treat NaN as nonzero and skip genuine zeros, and transposing rather than reshaping is what keeps the (row, col) pairs together):
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 3]])
Now pick a random choice:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0

How to do a multidimensional slice in tensorflow?

For example:
array = [[1, 2, 3], [4, 5, 6]]
slice = [[0, 0, 1], [0, 1, 2]]
output = [[1, 1, 2], [4, 5, 6]]
I've tried array[slice], but that didn't work. I also couldn't get tf.gather or tf.gather_nd to work, although these initially seemed like the correct functions to use. Note that these are all tensors in-graph.
How can I select these values in my array according to slice?
You need to add a dimension to your slice tensor, which you can do with tf.pack (renamed tf.stack in TensorFlow 1.0), and then we can use tf.gather_nd no problem.
import tensorflow as tf
tensor = tf.constant([[1, 2, 3], [4, 5, 6]])
old_slice = tf.constant([[0, 0, 1], [0, 1, 2]])
# We need to add a dimension - a tensor of shape (2, 3, 2) instead of (2, 3)
dims = tf.constant([[0, 0, 0], [1, 1, 1]])
new_slice = tf.pack([dims, old_slice], 2)
out = tf.gather_nd(tensor, new_slice)
If we run the following code:
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    run_tensor, run_slice, run_out = sess.run([tensor, new_slice, out])
    print 'Input tensor:'
    print run_tensor
    print 'Correct param for gather_nd:'
    print run_slice
    print 'Output:'
    print run_out
This should give the correct output:
Input tensor:
[[1 2 3]
 [4 5 6]]
Correct param for gather_nd:
[[[0 0]
  [0 0]
  [0 1]]

 [[1 0]
  [1 1]
  [1 2]]]
Output:
[[1 1 2]
 [4 5 6]]
An even easier way to compute the result, and one that is more general, is to directly leverage the batch_dims argument of tf.gather:
>>> array = tf.constant([[1,2,3], [4,5,6]])
>>> slice = tf.constant([[0,0,1], [0,1,2]])
>>> output = tf.constant([[1,1,2], [4,5,6]])
>>> tf.gather(array, slice, batch_dims=1, axis=1)
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 1, 2],
       [4, 5, 6]], dtype=int32)>
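As an aside, the same batched gather in plain numpy is np.take_along_axis:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
idx = np.array([[0, 0, 1], [0, 1, 2]])
np.take_along_axis(arr, idx, axis=1)
# array([[1, 1, 2],
#        [4, 5, 6]])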

Merge duplicate indices in a sparse tensor

Let's say I have a sparse tensor with duplicate indices, and where they are duplicated I want to merge the values (sum them up).
What is the best way to do this?
example:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, dense_shape=[10, 10])
result = tf.MAGIC(object)
result should be a sparse tensor (or concrete!) with the following values:
indicies = [[1, 1], [1, 2], [1, 3]]
values = [1, 5, 4]
The only thing I have thought of is to string-concat the indices together to create an index hash, apply it to a third dimension, and then reduce-sum on that third dimension.
indicies = [[1, 1, 11], [1, 2, 12], [1, 2, 12], [1, 3, 13]]
sparse_result = tf.sparse_reduce_sum(sparseTensor, reduction_axes=2, keep_dims=True)
But that feels very very ugly
Here is a solution using tf.segment_sum. The idea is to linearize the indices in to a 1-D space, get the unique indices with tf.unique, run tf.segment_sum, and convert the indices back to N-D space.
indices = tf.constant([[1, 1], [1, 2], [1, 2], [1, 3]])
values = tf.constant([1, 2, 3, 4])
# Linearize the indices. If the dimensions of original array are
# [N_{k}, N_{k-1}, ... N_0], then simply matrix multiply the indices
# by [..., N_1 * N_0, N_0, 1]^T. For example, if the sparse tensor
# has dimensions [10, 6, 4, 5], then multiply by [120, 20, 5, 1]^T
# In your case, the dimensions are [10, 10], so multiply by [10, 1]^T
linearized = tf.matmul(indices, [[10], [1]])
# Get the unique indices, and their positions in the array
y, idx = tf.unique(tf.squeeze(linearized))
# Use the positions of the unique values as the segment ids to
# get the unique values
values = tf.segment_sum(values, idx)
# Go back to N-D indices
y = tf.expand_dims(y, 1)
indices = tf.concat([y//10, y%10], axis=1)
tf.InteractiveSession()
print(indices.eval())
print(values.eval())
Maybe you can try:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, dense_shape=[10, 10])
tf.sparse.to_dense(object, validate_indices=False)
Using unsorted_segment_sum could be simpler:
def deduplicate(tensor):
    if not isinstance(tensor, tf.IndexedSlices):
        return tensor
    unique_indices, new_index_positions = tf.unique(tensor.indices)
    summed_values = tf.unsorted_segment_sum(
        tensor.values, new_index_positions, tf.shape(unique_indices)[0])
    return tf.IndexedSlices(indices=unique_indices, values=summed_values,
                            dense_shape=tensor.dense_shape)
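A hypothetical usage sketch (the values are made up; in TensorFlow 2.x the op is spelled tf.math.unsorted_segment_sum):
slices = tf.IndexedSlices(
    values=tf.constant([[1., 2.], [3., 4.], [5., 6.]]),
    indices=tf.constant([0, 2, 0]),
    dense_shape=tf.constant([4, 2]))
merged = deduplicate(slices)
# merged.indices -> [0, 2]; merged.values -> [[6., 8.], [3., 4.]]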
Another solution is to use tf.scatter_nd which will create a dense tensor and accumulate values on duplicate indices. This behavior is clearly stated in the documentation:
If indices contains duplicates, the duplicate values are accumulated (summed).
Then eventually we can convert it back to a sparse representation.
Here is a code sample for TensorFlow 2.x in eager mode:
import tensorflow as tf
indices = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
merged_dense = tf.scatter_nd(indices, values, shape=(10, 10))
merged_sparse = tf.sparse.from_dense(merged_dense)
print(merged_sparse)
Output
SparseTensor(
  indices=tf.Tensor(
    [[1 1]
     [1 2]
     [1 3]],
    shape=(3, 2), dtype=int64),
  values=tf.Tensor([1 5 4], shape=(3,), dtype=int32),
  dense_shape=tf.Tensor([10 10], shape=(2,), dtype=int64))
Following the tf.segment_sum solution above, here is another example.
For a sparse tensor of shape [12, 5], the lines to change in that code are:
linearized = tf.matmul(indices, [[5], [1]])
indices = tf.concat([y//5, y%5], axis=1)
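More generally, the multipliers can be computed from the shape instead of hard-coded; a sketch in plain numpy (linearize_multipliers is a made-up helper name):
import numpy as np

def linearize_multipliers(shape):
    """Row-major multipliers, e.g. [10, 6, 4, 5] -> [120, 20, 5, 1]."""
    return np.concatenate([np.cumprod(shape[::-1])[::-1][1:], [1]])

linearize_multipliers([12, 5])        # -> array([5, 1])
linearize_multipliers([10, 6, 4, 5])  # -> array([120, 20, 5, 1])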