Third parameter of np.r_? (numpy)

I'm looking over the docs and I still can't figure out how the third parameter operates.
np.r_['0,2,0', [1,2,3], [4,5,6]]
output:
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
np.r_['1,2,0', [1,2,3], [4,5,6]]
output:
array([[1, 4],
       [2, 5],
       [3, 6]])
The first number is the axis to concatenate along, the second is the minimum number of dimensions to force the entries to, and the third, according to the docs, specifies "which axis should contain the start of the arrays which are less than the specified number of dimensions".
Here are the docs:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html
Thank you.

Maybe a simple example can clear things up:
b = np.arange(3)
np.r_['0,2,0', b, b]
# array([[0],
#        [1],
#        [2],
#        [0],
#        [1],
#        [2]])
np.r_['0,2,1', b, b]
# array([[0, 1, 2],
#        [0, 1, 2]])
We are concatenating b, a 1d array, with itself. The second number specifies that it should be made 2d before it gets stacked on itself along the axis given by the first number. Now there are two ways to make a shape (3,) array 2d: either make it (3, 1) (first example) or make it (1, 3) (second example). The third number specifies where the original dimension (here, the one of length 3) goes in the 2d result.
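To sanity-check this (a sketch of my own, not from the original answer), the two upgrades should match inserting the new axis by hand with np.newaxis before concatenating:
import numpy as np

b = np.arange(3)
# '0,2,0': the original length-3 dimension goes to axis 0, so each b
# becomes a (3, 1) column before concatenation along axis 0.
col = np.concatenate([b[:, np.newaxis], b[:, np.newaxis]], axis=0)
assert (np.r_['0,2,0', b, b] == col).all()
# '0,2,1': the original dimension goes to axis 1, so each b becomes a
# (1, 3) row before concatenation along axis 0.
row = np.concatenate([b[np.newaxis, :], b[np.newaxis, :]], axis=0)
assert (np.r_['0,2,1', b, b] == row).all()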

I don't believe the existing answer is right. From my testing it seems that setting the third integer to 1 is the default and makes no change, while setting it to 0 makes NumPy go into every row of your array and turn each of its elements into an individual row. So if your array contains a row [1,2,3], it becomes [1],[2],[3].
np.r_['0,2,1', [1,2,3], [4,5,6]]
# array([[1, 2, 3],
#        [4, 5, 6]])
np.r_['0,2,0', [1,2,3], [4,5,6]]
# array([[1],
#        [2],
#        [3],
#        [4],
#        [5],
#        [6]])
It also seems that NumPy only splits up the elements of the outermost row into individual rows; an input that is already 2d already has the requested number of dimensions, so the third number changes nothing:
np.r_['0,2,1', [[1,2,3], [4,5,6]]]
# array([[1, 2, 3],
#        [4, 5, 6]])

https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html
Negative integers specify where in the new shape tuple the last dimension of upgraded arrays should be placed, so the default is ‘-1’.
What does this sentence mean?
np.r_['0,2,-5', [1,2,3], [4,5,6]]  # ValueError: all the input array dimensions except for the concatenation axis must match exactly
np.r_['0,2,-6', [1,2,3], [4,5,6]]  # array([[1],[2],[3],[4],[5],[6]])
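My reading of that sentence (a hedged illustration, not from the original thread): a negative third number places the last dimension of the upgraded array at that position of the new shape, counted from the end. The default -1 therefore puts the original length-3 dimension last, so each input becomes a (1, 3) row, the same as '0,2,1':
np.r_['0,2,-1', [1,2,3], [4,5,6]]
# array([[1, 2, 3],
#        [4, 5, 6]])
np.r_['0,2', [1,2,3], [4,5,6]]   # third number omitted: defaults to -1
# array([[1, 2, 3],
#        [4, 5, 6]])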


Numpy, how to retrieve sub-array of array (specific indices)?

I have an array:
>>> arr1 = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> arr1
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
I want to retrieve a list (or 1d-array) of elements of this array by giving a list of their indices, like so:
indices = [[0,0], [0,2], [2,0]]
print(arr1[indices])
# result
[1,6,7]
But it does not work. I have been looking for a solution for a while, but I only found ways to select per row and/or per column (not by specific indices).
Does anyone have any idea?
Cheers,
Aymeric
First make indices an array instead of a nested list:
indices = np.array([[0,0], [0,2], [2,0]])
Then, index the first dimension of arr1 using the first values of indices, likewise the second:
arr1[indices[:,0], indices[:,1]]
It gives array([1, 3, 7]) (which is correct, your [1, 6, 7] example output is probably a typo).
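As a side note (my addition, not part of the original answer), you can get the same result without splitting the columns by hand, by transposing the index array and converting it to a tuple:
import numpy as np

arr1 = np.array([[1,2,3], [4,5,6], [7,8,9]])
indices = np.array([[0,0], [0,2], [2,0]])
# tuple(indices.T) is (row_indices, col_indices), one array per dimension,
# the same thing as arr1[indices[:,0], indices[:,1]].
arr1[tuple(indices.T)]
# array([1, 3, 7])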

Argmax indexing in pytorch with 2 tensors of equal shape

Summarize the problem
I am working with high dimensional tensors in pytorch and I need to index one tensor with the argmax values from another tensor. So I need to index tensor y of dim [3,4] with the results from the argmax of tensor x with dim [3,4]. If the tensors are:
import torch as T
# Tensor to get argmax from
# expected argmax: [2, 0, 1]
x = T.tensor([[1, 2, 8, 3],
              [6, 3, 3, 5],
              [2, 8, 1, 7]])
# Tensor to index with argmax from previous
# expected tensor to retrieve: [2, 4, 9]
y = T.tensor([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
# argmax
x_max, x_argmax = T.max(x, dim=1)
I would like an operation that, given the argmax indexes of x (i.e. x_argmax), retrieves the values of tensor y at those same indexes.
Describe what you’ve tried
This is what I have tried:
# What I have tried
print(y[x_argmax])
print(y[:, x_argmax])
print(y[..., x_argmax])
print(y[x_argmax.unsqueeze(1)])
I have been reading a lot about numpy indexing: basic indexing, advanced indexing and combined indexing. I have been trying to use combined indexing (since I want a slice in the first dimension of the tensor and the index values on the second one), but I have not been able to come up with a solution for this use case.
You are looking for torch.gather:
idx = torch.argmax(x, dim=1, keepdim=True)  # get argmax directly, w/o max
out = torch.gather(y, 1, idx)
Resulting in:
tensor([[2],
        [4],
        [9]])
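If you want the flat [2, 4, 9] from the question rather than a column, squeezing the gathered dimension recovers it (my addition, not part of the original answer):
out.squeeze(1)  # tensor([2, 4, 9])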
How about y[T.arange(3), x_argmax]?
That does the job for me...
Explanation: You take dimensional information away when you invoke T.max(x, dim=1), so this information needs to be restored explicitly.
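A hedged generalization of that trick, so the batch size of 3 is not hard-coded (my addition):
y[T.arange(y.size(0)), x_argmax]  # tensor([2, 4, 9])
This is the same advanced-indexing pattern: for each row i, pick column x_argmax[i].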

Why is the output like this? I do not understand how the indexing is working

How is it indexing it? Why is the output [1,4,5]?
I am following the tutorial on http://cs231n.github.io/python-numpy-tutorial/#numpy
a = np.array([[1,2], [3, 4], [5, 6]])
# An example of integer array indexing.
# The returned array will have shape (3,) and
print(a[[0, 1, 2], [0, 1, 0]]) # Prints "[1 4 5]"
It's called fancy indexing in numpy.
You can imagine the first list as the row indices and the second list as the column indices. So a[[0,1,2],[0,1,0]] is like getting the three elements whose coordinates are (0,0), (1,1), (2,0) from a.
a[0,0] # 1
a[1,1] # 4
a[2,0] # 5
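In other words (my paraphrase of the rule, not from the original answer), integer array indexing pairs the two lists up element by element, which an explicit loop reproduces:
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
rows, cols = [0, 1, 2], [0, 1, 0]
# a[rows, cols] picks a[r, c] for each paired (r, c):
np.array([a[r, c] for r, c in zip(rows, cols)])  # array([1, 4, 5])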

How to get the rows having equal values, and their subscripts, from a [10,1] tensor?

I am new to TensorFlow. If there is a [10,1] tensor, I want to find all rows with the same value and their subscripts.
For example, there is a tensor like [[1],[2],[3],[4],[5],[1],[2],[3],[4],[6]].
By comparing each element in the matrix, it is easy to get a dictionary structure like
{'1': [0,5], '2': [1,6], '3': [2, 7], '4': [3, 8], '5': [4], '6': [9]} in python, which records the positions at which each element occurs in the matrix.
I expect to achieve this result in TensorFlow. Could someone please give me a hand? Thanks a lot.
I think this is a longer method, and the elements and indices are still not associated in a data structure. There must be shorter methods.
import tensorflow as tf

t = tf.constant([[1],[2],[3],[4],[5],[1],[2],[3],[4],[6]])
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
y, idx, count = tf.unique_with_counts(tf.squeeze(t))
y1, idx1, count1 = sess.run([y, idx, count])
for i in range(len(y1)):
    # Row subscripts where t equals the i-th unique value.
    print(sess.run(tf.where(tf.equal(t, y1[i]))[:2, -2]))
Output is
[0 5]
[1 6]
[2 7]
[3 8]
[4]
[9]
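To actually get the dictionary from the question, one more pass over the fetched results is enough (a sketch of my own, reusing y1 and idx1 from above): idx1[pos] is the index into y1 of the unique value at position pos.
# Build {value: [positions]} from y1 (unique values) and idx1
# (for each position, the index of its unique value in y1).
mapping = {}
for pos, group in enumerate(idx1):
    mapping.setdefault(int(y1[group]), []).append(pos)
print(mapping)  # {1: [0, 5], 2: [1, 6], 3: [2, 7], 4: [3, 8], 5: [4], 6: [9]}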

What is the use of the reduce command in tensorflow?

tensorflow.reduce_sum(..) computes the sum of elements across dimensions of a tensor; that part is OK.
But one thing is not clear to me: what is the purpose of saying reduce in the function name?
Is it related to the map_reduce of parallel computation?
Say, does it distribute the required computation to different cores, collect the results from the cores, and finally deliver the sum of the collected results?
Because you can compute the sum along a given dimension (and therefore reduce it). And no, it has nothing to do with map-reduce.
Quoting the documentation string of the method:
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
Example from the API:
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.reduce_sum(x) # 6
tf.reduce_sum(x, 0) # [2, 2, 2]
tf.reduce_sum(x, 1) # [3, 3]
tf.reduce_sum(x, 1, keepdims=True) # [[3], [3]]
tf.reduce_sum(x, [0, 1]) # 6
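For what it's worth (my addition, and a hedged reading of the name): reduce here most likely follows the functional-programming sense of reduce, a fold that repeatedly combines elements with a binary operation until a whole axis collapses to a single value.
from functools import reduce

# Functional-programming reduce: fold + over one row of x.
reduce(lambda acc, v: acc + v, [1, 1, 1], 0)  # 3, like one entry of tf.reduce_sum(x, 1)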