Extract array elements from another array indices - numpy

I have a numpy array, a:
a = np.array([[-21.78878256, 97.37484004, -11.54228119],
[ -5.72592375, 99.04189958, 3.22814204],
[-19.80795922, 95.99377136, -10.64537733]])
I have another array, b:
b = np.array([[ 54.64642121, 64.5172014, 44.39991983],
[ 9.62420892, 95.14361441, 0.67014312],
[ 49.55036427, 66.25136632, 40.38778238]])
I want to extract the index of the minimum value in each row of the array b:
ixs = [[2],
[2],
[2]]
Then I want to extract the elements of the array a at those indices, ixs.
The expected answer is:
result = [[-11.54228119]
[3.22814204]
[-10.64537733]]
I tried:
ixs = np.argmin(b, axis=1)
print(ixs)
[2,2,2]
result = np.take(a, ixs)
print(result)
Nope, that does not give the expected result.
Any ideas are welcome.

You can use
result = a[np.arange(a.shape[0]), ixs]
np.arange generates the row index for each row, and ixs holds the column index to pick in that row. Indexing with both pairs them up element by element, so result contains the required values.
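A minimal sketch of the row/column pairing described above, using the arrays from the question. The names rows and result_col are only illustrative; the last line reshapes the 1-D result into the 3 by 1 column shown as the expected answer.

```python
import numpy as np

a = np.array([[-21.78878256, 97.37484004, -11.54228119],
              [ -5.72592375, 99.04189958,   3.22814204],
              [-19.80795922, 95.99377136, -10.64537733]])
b = np.array([[54.64642121, 64.5172014, 44.39991983],
              [ 9.62420892, 95.14361441,  0.67014312],
              [49.55036427, 66.25136632, 40.38778238]])

ixs = np.argmin(b, axis=1)          # column of the minimum in each row of b
rows = np.arange(a.shape[0])        # one row index per row: 0, 1, 2

result = a[rows, ixs]               # picks a[0, ixs[0]], a[1, ixs[1]], ...
result_col = result[:, np.newaxis]  # same values as a (3, 1) column
```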

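You can try the code below:
np.take(a, ixs, axis = 1)[:,0]
The np.take call creates a 3 by 3 array, and [:,0] slices its first column. This matches the expected values here only because every entry of ixs is the same (2); with differing indices, [:,0] would return column ixs[0] for every row.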
>>> np.take(a, ixs, axis = 1)
array([[-11.54228119, -11.54228119, -11.54228119],
[ 3.22814204, 3.22814204, 3.22814204],
[-10.64537733, -10.64537733, -10.64537733]])
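If the indices differ from row to row, np.take_along_axis (available in NumPy 1.15 and later) may be a closer fit. The sketch below uses a small made-up array, not the data from the question, to show how it differs from slicing the first column of the np.take result.

```python
import numpy as np

# Made-up data so the per-row indices are easy to follow.
a = np.array([[10, 11, 12],
              [20, 21, 22],
              [30, 31, 32]])
ixs = np.array([2, 0, 1])  # a different column for each row

# One element per row, taken at that row's own index; result has shape (3, 1).
picked = np.take_along_axis(a, ixs[:, np.newaxis], axis=1)

# For comparison: this always returns column ixs[0] (here column 2).
first_col = np.take(a, ixs, axis=1)[:, 0]
```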

Related

sort dataframe by string and set a new id

Is it possible to order the file names numerically, for example 1.wav, 2.wav, 3.wav, etc., and assign the IDs accordingly (1, 2, 3, etc.)?
I have already tried several sorting options. Does anyone have any ideas?
Thank you in advance.
dataframe output
def createSampleDF(audioPath):
    data = []
    for file in Path(audioPath).glob('*.wav'):
        print(file)
        data.append([os.path.basename(file), file])
    df_dataSet = pd.DataFrame(data, columns= ['audio_name', 'filePath'])
    df_dataSet['ID'] = df_dataSet.index+1
    df_dataSet = df_dataSet[['ID','audio_name','filePath']]
    df_dataSet.sort_values(by=['audio_name'],inplace=True)
    return df_dataSet
def createSamples(myAudioPath,savePath, sampleLength, overlap = 0):
    cutSamples(myAudioPath=myAudioPath,savePath=savePath,sampleLength=sampleLength)
    df_dataSet=createSampleDF(audioPath=savePath)
    return df_dataSet
You can split the string, convert the numeric part to an integer, and then sort on multiple columns. See pandas.DataFrame.sort_values for more info. If your file names are more complicated, you may need to design a regex that pulls out the integers you want to sort on, using pandas.Series.str.extract.
df = pd.DataFrame({
'ID':[1,2,3,4, 5],
'audio_name' : ['1.wav','10.wav','96.wav','3.wav','55.wav']})
(df
.assign(audio_name=lambda df_ : df_.audio_name.str.split('.', expand=True).iloc[:,0].astype('int'))
.sort_values(by=['audio_name','ID']))
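If the original file names should stay intact and the IDs should follow the sorted order, one possible variant is to sort with a key function and assign the IDs afterwards. This is only a sketch: it assumes pandas 1.1 or later (for the key argument of sort_values) and file names that each contain one run of digits; sorted_df is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'audio_name': ['1.wav', '10.wav', '96.wav', '3.wav', '55.wav']})

# Sort by the integer embedded in the file name, keeping the names unchanged.
sorted_df = (
    df.sort_values(
        by='audio_name',
        key=lambda s: s.str.extract(r'(\d+)', expand=False).astype(int))
    .reset_index(drop=True)
)

# Assign IDs only after sorting so they follow the numeric order.
sorted_df['ID'] = sorted_df.index + 1
sorted_df = sorted_df[['ID', 'audio_name']]
```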

How to get index from value in 2D Numpy array

I have two arrays:
values_arr = [[100,1], [20,5], [40,50]...[50,30]]
images_arr = [img1, img2, img3,...imgn]
Both the arrays are numpy arrays.
The values_arr and images_arr are in the same order.
i.e
[100, 1] corresponds to img1
How do I get the image given the value of index?
index = [20,5]
In this case, I should get img2 given the value of index = [20,5].
You can build a dict that maps each value pair to its image:
values_arr_tup = [tuple(i) for i in values_arr]
dict_ = {key:value for key,value in zip(values_arr_tup ,images_arr)}
Then look up dict_[tuple(index)] to get the image.
You can use np.where to find the position of the matching row. The comparison is elementwise, so require that all columns of a row match:
images_arr[np.where((values_arr == [20,5]).all(axis=1))[0][0]]
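A short sketch of the row lookup with made-up data; the strings stand in for the images, and the first row was chosen to share its first value with the target so the difference between a full-row match and an elementwise match is visible.

```python
import numpy as np

values_arr = np.array([[20, 1], [20, 5], [40, 50]])  # made-up values
images_arr = np.array(['img1', 'img2', 'img3'])      # placeholders for images
index = [20, 5]

# True only for rows in which every column equals the target.
row_matches = (values_arr == index).all(axis=1)
positions = np.flatnonzero(row_matches)
image = images_arr[positions[0]] if positions.size else None

# Without .all(axis=1), a row that matches in just one column is found first.
naive_pos = np.where(values_arr == index)[0][0]
```

Here image is the entry paired with [20, 5], while naive_pos points at the first row, which only shares the value 20.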

Scalar multiplication of arrays with another array in numpy

I have an array of arrays of size (4,1) like:
ar1 = np.array([[1,1],
[2,2],
[3,3],
[4,4]])
and a second array of size (1,2) like:
ar2 = np.array([2,3])
I'm trying to multiply the first item of each row of the first array by the first item of the second array, and the second item of each row by the second item of the second array, so that the result is:
ar_result = np.array([[2,3],
[4,6],
[6,9],
[8,12]])
Is there a way to do this in an easy and vectorized way?
When I try ar1*ar2, I get this error:
ValueError: operands could not be broadcast together with shapes (4,) (2,)
Thanks
EDIT: To clarify, in my case ar1 is a DataFrame df1 column or row, and ar2 is the content of a cell in another DataFrame df2 (df2.loc[x,y] = [2,3])
Have you tried ar1 * ar2?
ar_result = ar1 * ar2
#array([[ 2, 3],
# [ 4, 6],
# [ 6, 9],
# [ 8, 12]])
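The shapes in the error message, (4,) and (2,), suggest that in the DataFrame case the column is a 1-D object array whose elements are lists, rather than a (4, 2) numeric array. A minimal sketch under that assumption: the frame df1, the column name 'col', and the variable cell are hypothetical stand-ins for the data described in the EDIT.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which each cell holds a two-element list.
df1 = pd.DataFrame({'col': [[1, 1], [2, 2], [3, 3], [4, 4]]})
cell = [2, 3]  # stands in for the content of df2.loc[x, y]

# Turn the column of lists into a regular (4, 2) numeric array first.
ar1 = np.array(df1['col'].tolist())
ar2 = np.asarray(cell)

ar_result = ar1 * ar2  # (4, 2) * (2,) broadcasts row by row
```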

TypeError: unhashable type: 'numpy.ndarray' - How to get data from data frame by querying radius from ball tree?

How do I get data by querying a radius from a ball tree? For example:
from sklearn.neighbors import BallTree
import pandas as pd
bt = BallTree(df[['lat','lng']], metric="haversine")
for idx, row in df.iterrows():
    res = df[bt.query_radius(row[['lat','lng']],r=1)]
I want to get the rows in df that lie within radius r=1, but it throws a type error:
TypeError: unhashable type: 'numpy.ndarray'
Following the first answer I got index out of range when iterating over the rows
5183
(5219, 25)
5205
(5219, 25)
5205
(5219, 25)
5221
(5219, 25)
Traceback (most recent call last):
File "/Users/Chu/Documents/dssg2018/sa4.py", line 45, in <module>
df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
IndexError: index 5221 is out of bounds for axis 0 with size 5219
And the code is
bag_of_words = ['beautiful','love','fun','sunrise','sunset','waterfall','relax']
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            df.loc[idx, word] = 1
        else:
            df.loc[idx, word] = 0
bt = BallTree(df[['lat','lng']], metric="haversine")
indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            print(idx)
            print(df.shape)
            df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])
The error is not in the BallTree itself; the indices it returns are not being used correctly to index the DataFrame.
Do it this way:
for idx, row in df.iterrows():
    indices = bt.query_radius(row[['lat','lng']].values.reshape(1,-1), r=1)
    res = df.iloc[[x for b in indices for x in b]]
    # Do what you want to do with res
This will also do (since we are sending only a single point each time):
res = df.iloc[indices[0]]
Explanation:
I'm using scikit-learn 0.20, and the code you wrote above:
df[bt.query_radius(row[['lat','lng']],r=1)]
did not work for me. I needed to turn the query point into a 2-d array by using reshape().
bt.query_radius() returns an array of arrays of indices within the specified radius r, as mentioned in the documentation:
ind : array of objects, shape = X.shape[:-1]
each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of
a k-neighbors query, the returned neighbors are not sorted by distance
by default.
So we need to iterate over both levels of arrays to reach the actual indices of the data.
Once we have the indices, iloc is the way to access rows of a pandas DataFrame by position.
Update:
You don't need to query the tree separately for each point. You can pass all the coordinates at once; the result holds, for each point, an array with the positional indices of the points within the radius. Because these are positions rather than index labels, look them up by position (enumerate) instead of by the labels that iterrows() yields:
indices = bt.query_radius(df[['lat','lng']], r=1)
for pos, (idx, row) in enumerate(df.iterrows()):
    nearest_points_index = indices[pos]
    res = df.iloc[nearest_points_index]
    # Do what you want to do with res
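A self-contained sketch of the batched query, under a few assumptions that are not in the original post: the three coordinates are made up, the index labels are deliberately non-contiguous (as after dropping rows), and the 10 km radius is converted to radians using an Earth radius of 6371 km. scikit-learn's haversine metric expects [latitude, longitude] pairs in radians, so the coordinates are converted before the tree is built.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Made-up points: two close together, one far away; labels are not 0..n-1.
df = pd.DataFrame(
    {"lat": [52.5200, 52.5205, 48.8566], "lng": [13.4050, 13.4049, 2.3522]},
    index=[0, 5, 9],
)

coords = np.radians(df[["lat", "lng"]].to_numpy())  # haversine uses radians
bt = BallTree(coords, metric="haversine")

EARTH_RADIUS_KM = 6371.0            # assumed mean Earth radius
radius = 10 / EARTH_RADIUS_KM       # 10 km expressed as an angle in radians

indices = bt.query_radius(coords, r=radius)  # one index array per point

neighbours = {}
for pos, label in enumerate(df.index):
    # indices[pos] holds row positions, so use iloc, not loc.
    neighbours[label] = df.iloc[indices[pos]].index.tolist()
```

Looking results up by position avoids the out-of-bounds error that appears when index labels are used as if they were positions.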

Sort Numpy array by subfield

I have a structured numpy array in which one of the fields has subfields:
import numpy, string, random
dtype = [('name', 'a10'), ('id', 'i4'),
('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype = dtype)
for idx in range(len(a)):
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              numpy.random.uniform(0, 1, size=[1, 2]))
I can easily get it sorted by any of fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I sort by a structured field ('size' in this example), the array is effectively sorted by its first subfield ('length' in this example). However, I would like to sort the elements by the 'width' subfield. I tried the following, but neither works:
a.sort(order = ['size[\'width\']'])
ValueError: unknown field name: size['width']
a.sort(order = ['size', 'width'])
ValueError: unknown field name: width
Is there a way to accomplish this?
I believe this is what you want:
a[a["size"]["width"].argsort()]
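A small runnable sketch of the argsort approach, with hand-picked values in place of the random ones so the order is easy to check. The 'name' field uses a Unicode dtype ('U10') here, which is an assumption made for convenience, and order and sorted_a are illustrative names.

```python
import numpy as np

dtype = [('name', 'U10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = np.zeros(3, dtype=dtype)
a[0] = ('a', 0, (0.1, 0.9))
a[1] = ('b', 1, (0.5, 0.2))
a[2] = ('c', 2, (0.3, 0.6))

# argsort on the subfield gives the row order; indexing applies it to all fields.
order = a['size']['width'].argsort()
sorted_a = a[order]
```

Note that this returns a sorted copy rather than sorting a in place as a.sort() does.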