How to get index from value in 2D Numpy array - numpy

I have two arrays:
values_arr = [[100,1], [20,5], [40,50]...[50,30]]
images_arr = [img1, img2, img3,...imgn]
Both the arrays are numpy arrays.
The values_arr and images_arr are in the same order.
i.e
[100, 1] corresponds to img1
How do I get the image given the value of index?
index = [20,5]
In this case, I should get img2 given the value of index = [20,5].

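You can build a dict that maps each value pair to its image. Lists and arrays are not hashable, so convert each pair to a tuple first:
values_arr_tup = [tuple(i) for i in values_arr]
dict_ = {key:value for key,value in zip(values_arr_tup ,images_arr)}
Then dict_[tuple(index)] returns the image.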

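You can use np.where to find the row index of the item. Compare whole rows with .all(axis=1); a plain values_arr == [20,5] compares element by element and can match a row that shares only one of the two values:
images_arr[np.where((values_arr == [20,5]).all(axis=1))[0][0]]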
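
As a minimal sketch of the row lookup, the snippet below matches whole rows rather than single elements. The image entries are placeholder strings standing in for real image arrays, and the four value pairs are made up for illustration, since the original list is elided.

```python
import numpy as np

# Hypothetical data: placeholder strings stand in for the real images.
values_arr = np.array([[100, 1], [20, 5], [40, 50], [50, 30]])
images_arr = np.array(["img1", "img2", "img3", "img4"])

index = [20, 5]

# True only for rows where BOTH columns equal the requested pair.
row_matches = (values_arr == index).all(axis=1)

# Positions of all matching rows (empty if the pair is absent).
positions = np.flatnonzero(row_matches)

# Take the first match, or None when nothing matches.
image = images_arr[positions[0]] if positions.size else None
print(image)
```

If many lookups are needed, building the dict once is likely the cheaper option, since each np.where call scans the whole array.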

Related

randomly choose value between two numpy arrays

I have two numpy arrays:
left = np.array([2, 7])
right = np.array([4, 7])
right_p1 = right + 1
What I want to do is
rand = np.zeros(left.shape[0])
for i in range(left.shape[0]):
    rand[i] = np.random.randint(left[i], right_p1[i])
Is there a way I could do this without using a for loop?

You could try:
extremes = list(zip(left, right_p1))  # a list, so it can be iterated more than once
rand = map(lambda x: np.random.randint(x[0], x[1]), extremes)
This gives you a lazy map object. If you want to save memory you can keep it that way; otherwise convert it to a full np.array by going through a list:
rand = np.array(list(map(lambda x: np.random.randint(x[0], x[1]), extremes)))
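
A loop-free alternative, offered as a sketch rather than the answer's own method: np.random.randint accepts array-like low and high bounds (NumPy 1.11 or later, as far as I know) and draws one value per pair. The upper bound is exclusive, hence right + 1.

```python
import numpy as np

left = np.array([2, 7])
right = np.array([4, 7])

# One draw per element: low is inclusive, high is exclusive.
rand = np.random.randint(left, right + 1)
print(rand)
```

With the arrays above, the second element can only be 7, because its range is the single value [7, 8).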

Creating 2d array and filling first columns of each row in numpy

I have written the following code for creating a 2D array and filling the first element of each row. I am new to numpy. Is there a better way to do this?
y=np.zeros(N*T1).reshape(N,T1)
x = np.linspace(0,L,num = N)
for k in range(0,N):
    y[k][0] = np.sin(PI*x[k]/L)

Assign to the whole first column at once; np.sin works elementwise on x, so no loop is needed:
y[:, 0] = np.sin(PI*x/L)
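
Here is a small runnable sketch of the column assignment. The values of N, T1 and L are assumptions chosen only for illustration, and np.pi replaces the question's PI constant.

```python
import numpy as np

# Assumed sizes, not taken from the question.
N, T1, L = 5, 3, 1.0

# Allocate the N x T1 array directly instead of reshaping a flat one.
y = np.zeros((N, T1))
x = np.linspace(0, L, num=N)

# Fill column 0 of every row in one vectorized assignment.
y[:, 0] = np.sin(np.pi * x / L)
print(y)
```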

Extract array elements from another array indices

I have a numpy array, a:
a = np.array([[-21.78878256, 97.37484004, -11.54228119],
              [ -5.72592375, 99.04189958, 3.22814204],
              [-19.80795922, 95.99377136, -10.64537733]])
I have another array, b:
b = np.array([[ 54.64642121, 64.5172014, 44.39991983],
              [ 9.62420892, 95.14361441, 0.67014312],
              [ 49.55036427, 66.25136632, 40.38778238]])
I want to get the index of the minimum value in each row of b:
ixs = [[2],
       [2],
       [2]]
Then I want to extract the elements of a at those indices. The expected answer is:
result = [[-11.54228119],
          [3.22814204],
          [-10.64537733]]
I tried:
ixs = np.argmin(b, axis=1)
print(ixs)
[2,2,2]
result = np.take(a, ixs)
print(result)
Nope!
Any ideas are welcome.
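
You can use
result = a[np.arange(a.shape[0]), ixs]
np.arange generates the row index for each row and ixs supplies the matching column index, so result holds one element per row: a[0, ixs[0]], a[1, ixs[1]], and so on. This returns a 1-D array; add [:, None] if you need the column shape shown in the question.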

You can also try the code below, but note that it only gives the right answer when every entry of ixs is the same, as in this example:
np.take(a, ixs, axis = 1)[:,0]
The np.take call builds a 3 by 3 array from the columns listed in ixs, and [:,0] then slices out its first column:
>>> np.take(a, ixs, axis = 1)
array([[-11.54228119, -11.54228119, -11.54228119],
       [  3.22814204,   3.22814204,   3.22814204],
       [-10.64537733, -10.64537733, -10.64537733]])
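
To see how the approaches differ, here is a sketch with made-up arrays in which the minimum sits in a different column of each row. It also shows np.take_along_axis (available since NumPy 1.15), which is not mentioned in the answers but returns the column shape asked for in the question.

```python
import numpy as np

# Made-up data: the row minima of b fall in columns 1, 0 and 2.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
b = np.array([[5.0, 0.5, 9.0],
              [0.1, 7.0, 3.0],
              [4.0, 6.0, 0.2]])

ixs = np.argmin(b, axis=1)

# Pair each row index with its own column index.
picked = a[np.arange(a.shape[0]), ixs]

# Same selection, kept as an (n, 1) column.
picked_col = np.take_along_axis(a, ixs[:, None], axis=1)

# np.take with axis=1 reorders columns; slicing [:, 0] then returns
# column ixs[0] for every row, which is not a per-row selection.
not_per_row = np.take(a, ixs, axis=1)[:, 0]

print(picked, picked_col.ravel(), not_per_row)
```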

TypeError: unhashable type: 'numpy.ndarray' - How to get data from data frame by querying radius from ball tree?

How to get data by querying radius from ball tree? For example
from sklearn.neighbors import BallTree
import pandas as pd
bt = BallTree(df[['lat','lng']], metric="haversine")
for idx, row in df.iterrows():
    res = df[bt.query_radius(row[['lat','lng']],r=1)]
I want to get the rows of df that lie within radius r=1, but it throws a type error:
TypeError: unhashable type: 'numpy.ndarray'
Following the first answer I got index out of range when iterating over the rows
5183
(5219, 25)
5205
(5219, 25)
5205
(5219, 25)
5221
(5219, 25)
Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg2018/sa4.py", line 45, in <module>
    df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
IndexError: index 5221 is out of bounds for axis 0 with size 5219
And the code is
bag_of_words = ['beautiful','love','fun','sunrise','sunset','waterfall','relax']
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            df.loc[idx, word] = 1
        else:
            df.loc[idx, word] = 0
bt = BallTree(df[['lat','lng']], metric="haversine")
indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            print(idx)
            print(df.shape)
            df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])

The error is not in BallTree itself; the indices it returns are not being used correctly to index the DataFrame.
Do it this way:
for idx, row in df.iterrows():
    indices = bt.query_radius(row[['lat','lng']].values.reshape(1,-1), r=1)
    res = df.iloc[[x for b in indices for x in b]]
    # Do what you want to do with res
Since only a single point is queried each time, this also works:
res = df.iloc[indices[0]]
Explanation:
I'm using scikit-learn 0.20, and the code you wrote above,
df[bt.query_radius(row[['lat','lng']],r=1)]
did not work for me. I needed to turn the query point into a 2-D array with reshape().
bt.query_radius() returns an array of arrays of indices within the specified radius r, as described in the documentation:
ind : array of objects, shape = X.shape[:-1]
each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of
a k-neighbors query, the returned neighbors are not sorted by distance
by default.
So we need to iterate over two levels of arrays to reach the actual indices of the data.
Those indices are row positions, so once we have them, iloc is the way to select the matching rows of a pandas DataFrame.
Update:
You don't need to query the tree separately for each point. You can pass all the coordinates at once; the result is an array whose i-th entry holds the positions of the points within the radius of the i-th row.
indices = bt.query_radius(df[['lat','lng']], r=1)
for pos, (idx, row) in enumerate(df.iterrows()):
    # Index by row position: idx is a label and can exceed len(df) if rows were dropped
    nearest_points_index = indices[pos]
    res = df.iloc[nearest_points_index]
    # Do what you want to do with res
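
The following is a self-contained sketch of the batch query on toy data, not the asker's real DataFrame. The coordinates, the non-contiguous index labels, the 5 km radius and the Earth radius of 6371 km are all assumptions for illustration. It also converts degrees to radians, which scikit-learn's haversine metric expects, and uses row positions rather than index labels to read the results.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Toy data with gaps in the index labels, as after dropping rows.
df = pd.DataFrame(
    {"lat": [0.0, 0.0, 0.0, 45.0], "lng": [0.0, 0.01, 0.02, 90.0]},
    index=[10, 11, 13, 17],
)

# The haversine metric works on [lat, lng] pairs given in radians.
coords = np.radians(df[["lat", "lng"]].to_numpy())
bt = BallTree(coords, metric="haversine")

# Radius is an angle in radians: distance / Earth radius (assumed 6371 km).
radius = 5.0 / 6371.0

# One array of neighbour positions per query point.
indices = bt.query_radius(coords, r=radius)

neighbours = {}
for pos, label in enumerate(df.index):
    # indices[pos] holds row positions, so translate them back to labels.
    neighbours[label] = sorted(df.index[indices[pos]].tolist())

print(neighbours)
```

Each point counts as its own neighbour here, because the query set is the same as the fitted set.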

Sort Numpy array by subfield

I have a structured numpy array in which one of the fields has subfields:
import numpy, string, random
dtype = [('name', 'S10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype = dtype)
for idx in range(len(a)):
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              tuple(numpy.random.uniform(0, 1, size=2)))
I can easily sort it by any of the fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I sort by a structured field ('size' in this example), the array is effectively sorted by the first subfield ('length' in this example). However, I would like to sort my elements by 'height'. I tried the following, but it does not work:
a.sort(order = ['size[\'height\']'])
ValueError: unknown field name: size['height']
a.sort(order = ['size', 'height'])
ValueError: unknown field name: height
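Is there a way to accomplish this?

I believe this is what you want (note that the dtype above names the subfield 'width', not 'height'):
a[a["size"]["width"].argsort()]
This returns a sorted copy; assign it back to a if you want to replace the original.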
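
A short sketch of the argsort approach on a three-row array. The names and sizes are made up so the resulting order is easy to check by eye.

```python
import numpy as np

dtype = [("name", "S10"), ("id", "i4"),
         ("size", [("length", "f8"), ("width", "f8")])]

# Made-up rows; the nested tuple fills the 'size' subfields.
a = np.array(
    [(b"alpha", 0, (3.0, 0.5)),
     (b"beta", 1, (1.0, 0.9)),
     (b"gamma", 2, (2.0, 0.1))],
    dtype=dtype,
)

# Positions that would sort the nested 'width' subfield ascending.
order = a["size"]["width"].argsort()

# Fancy indexing with those positions yields a reordered copy.
by_width = a[order]
print(by_width["id"])
```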