Speeding up Euclidean Distance in python [duplicate] - numpy

How do you optimize this code?
At the moment it is running too slowly for the amount of data that goes through this loop. This code runs 1-nearest neighbor: it predicts the label of training_element based on p_data_set.
import numpy as np
from scipy.spatial import distance

# [x] , [[x1],[x2],[x3]], [l1, l2, l3]
def prediction(training_element, p_data_set, p_label_set):
    temp = np.array([], dtype=float)
    for p in p_data_set:
        temp = np.append(temp, distance.euclidean(training_element, p))
    minIndex = np.argmin(temp)
    return p_label_set[minIndex]
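One note before vectorizing: np.append copies the whole array on every call, so the loop above does quadratic work just growing temp. Preallocating the buffer removes that overhead on its own; a minimal sketch of just that fix (function name is mine):

import numpy as np
from scipy.spatial import distance

def prediction_prealloc(training_element, p_data_set, p_label_set):
    # fill a preallocated buffer instead of re-copying via np.append
    temp = np.empty(len(p_data_set), dtype=float)
    for k, p in enumerate(p_data_set):
        temp[k] = distance.euclidean(training_element, p)
    return p_label_set[np.argmin(temp)]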

Use a k-D tree for fast nearest-neighbour lookups, e.g. scipy.spatial.cKDTree:
from scipy.spatial import cKDTree
# I assume that p_data_set is (nsamples, ndims)
tree = cKDTree(p_data_set)
# training_elements is also assumed to be (nsamples, ndims)
dist, idx = tree.query(training_elements, k=1)
predicted_labels = p_label_set[idx]

You could use distance.cdist to directly get all the distances temp and then use .argmin() to get the min-index, like so -
minIndex = distance.cdist(training_element[None],p_data_set).argmin()
Here's an alternative approach using np.einsum -
subs = p_data_set - training_element
minIndex = np.einsum('ij,ij->i',subs,subs).argmin()
(This computes squared distances, but since the square root is monotonic, the argmin is unchanged.)
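Either one-liner collapses the question's Python loop entirely; a sketch of a cdist-based replacement with the same signature as the original function (name is mine, and training_element is assumed to be a 1D NumPy array):

import numpy as np
from scipy.spatial import distance

def prediction_cdist(training_element, p_data_set, p_label_set):
    # one vectorized call computes all distances at once
    dists = distance.cdist(training_element[None], p_data_set)
    return p_label_set[dists.argmin()]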
Runtime test
Well, I was expecting cKDTree to easily beat cdist, but since training_element is a single 1D query, the tree's build cost never pays off, and cdist beats cKDTree by a good 10x+ margin! (The tree would likely win if the same tree were queried with many elements.)
Here's the timing results -
In [422]: # Setup arrays
...: p_data_set = np.random.randint(0,9,(40000,100))
...: training_element = np.random.randint(0,9,(100,))
...:
In [423]: def tree_based(p_data_set,training_element): ##ali_m's soln
     ...:     tree = cKDTree(p_data_set)
     ...:     dist, idx = tree.query(training_element, k=1)
     ...:     return idx
     ...:
     ...: def einsum_based(p_data_set,training_element):
     ...:     subs = p_data_set - training_element
     ...:     return np.einsum('ij,ij->i',subs,subs).argmin()
     ...:
In [424]: %timeit tree_based(p_data_set,training_element)
1 loops, best of 3: 210 ms per loop
In [425]: %timeit einsum_based(p_data_set,training_element)
100 loops, best of 3: 17.3 ms per loop
In [426]: %timeit distance.cdist(training_element[None],p_data_set).argmin()
100 loops, best of 3: 14.8 ms per loop

Python can be quite a fast programming language if used properly.
This is my suggestion (faster_prediction):
import numpy as np
import time
def euclidean(a, b):
    return np.linalg.norm(a - b)

def prediction(training_element, p_data_set, p_label_set):
    temp = np.array([], dtype=float)
    for p in p_data_set:
        temp = np.append(temp, euclidean(training_element, p))
    minIndex = np.argmin(temp)
    return p_label_set[minIndex]

def faster_prediction(training_element, p_data_set, p_label_set):
    # tile the query so it lines up against every row, then take row-wise distances
    temp = np.tile(training_element, (p_data_set.shape[0], 1))
    temp = np.sqrt(np.sum((temp - p_data_set)**2, 1))
    minIndex = np.argmin(temp)
    return p_label_set[minIndex]

training_element = [1, 2, 3]
p_data_set = np.random.rand(100000, 3) * 10
p_label_set = np.r_[0:p_data_set.shape[0]]

t1 = time.time()
result_1 = prediction(training_element, p_data_set, p_label_set)
t2 = time.time()

t3 = time.time()
result_2 = faster_prediction(training_element, p_data_set, p_label_set)
t4 = time.time()

print "Execution time 1:", t2-t1, "value: ", result_1
print "Execution time 2:", t4-t3, "value: ", result_2
print "Speed up: ", (t2-t1) / (t4-t3)  # slow time / fast time
I get the following result on a pretty old laptop:
Execution time 1: 21.6033108234 value: 9819
Execution time 2: 0.0176379680634 value: 9819
Speed up: 1224.81857013
which makes me think I must have made some stupid mistake :)
In the case of very large data, where memory might be an issue, I suggest using Cython or implementing the function in C++ and wrapping it in Python.
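As a side note, the np.tile in faster_prediction is not strictly necessary: NumPy broadcasting subtracts a 1D row from every row of a 2D array directly, which avoids materialising the tiled copy. A minimal sketch (function name is mine):

import numpy as np

def faster_prediction_bcast(training_element, p_data_set, p_label_set):
    # (nsamples, ndims) - (ndims,) broadcasts the subtraction over rows
    diff = p_data_set - np.asarray(training_element)
    # squared distances suffice for argmin, so the sqrt can be skipped too
    return p_label_set[np.argmin(np.sum(diff**2, axis=1))]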

Related

Is it possible, without using parallelization (Swifter, Parallel), to do the calculation directly, without looping over the index?

Is it possible, without using parallelization (Swifter, Parallel), to do the calculation directly without looping over the index, for example by using the apply function on the whole dataset?
%%time
import random
import pandas as pd

df = pd.DataFrame({'A': random.sample(range(200), 200)})
for j in range(200):
    for i in df.index:
        df.loc[i, 'A_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j), 'A'].mean()
%%time
import random
import pandas as pd

df = pd.DataFrame({'A': random.sample(range(200), 200)})

First calculate the sums.
df[1] = df['A'].shift()
for j in range(2, 200):
    df[j] = df[j-1].fillna(0) + df['A'].shift(j)
Then do the division for the means and take care of the formatting:
df = df.set_index('A')
df.divide(df.columns, axis=1)\
  .fillna(method='ffill', axis=1)\
  .rename(lambda x: f'A_last_{x}', axis=1)\
  .reset_index()
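For comparison, pandas' built-in rolling machinery expresses the same trailing means without the manual running sums. A sketch, assuming the all-NaN j=0 column produced by the double loop can be skipped:

import random
import pandas as pd

df = pd.DataFrame({'A': random.sample(range(200), 200)})
shifted = df['A'].shift()  # previous row's value, so row i never sees itself
for j in range(1, 200):
    # trailing mean of the (up to) j values strictly before row i
    df['A_last_{}'.format(j)] = shifted.rolling(j, min_periods=1).mean()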

call functions on a specific level of a numpy ndarray without for loops

Suppose a numpy ndarray arr has shape (100,100,5,5). The following code works:
result = np.zeros((arr.shape[0], arr.shape[1], 10))
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        v = arr[i, j].flatten()
        hist, bi = np.histogram(v, bins=10, range=(0, 3))
        result[i, j] = hist
but it's slow. Is there a more efficient way to write this, say avoiding the for loops?
Hmm, I thought apply_along_axis would help, but it doesn't seem to make much of a difference, at least at the problem sizes of interest to you. Maybe there's overhead in myhist.
See the code below.
import numpy as np
import time
low = 0.0
high = 3.0
bins = 10
arrshp = (100,100,5,5)
def myhist(xx):
    out = np.histogram(xx, bins=bins, range=(low, high))
    return out[0]
arr = np.random.uniform(low,high,arrshp)
time1 = time.time()
arr2 = arr.reshape(arrshp[0],arrshp[1],-1)
out_fast = np.apply_along_axis(myhist,-1,arr2)
time2 = time.time()
print('time (secs) fast = ',time2-time1)
time3 = time.time()
out_slow = np.zeros((arr.shape[0],arr.shape[1],bins),dtype='float64')
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        v = arr[i,j].flatten()
        _hh = np.histogram(v, bins=bins, range=(low,high))
        out_slow[i,j,:] = _hh[0]
time4 = time.time()
print('norm diff = ',np.linalg.norm(out_fast-out_slow))
print('time (secs) slow = ',time4-time3)
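Since the bins here are uniform, a genuinely loop-free alternative (not in the answer above) is to compute each value's bin index arithmetically and do a single np.bincount with per-cell offsets. A sketch, assuming all values lie within [low, high]:

import numpy as np

low, high, bins = 0.0, 3.0, 10
arr = np.random.uniform(low, high, (100, 100, 5, 5))

flat = arr.reshape(arr.shape[0] * arr.shape[1], -1)  # one row per (i, j) cell
# arithmetic bin index; np.minimum makes x == high land in the last bin
idx = np.minimum(((flat - low) * (bins / (high - low))).astype(np.intp), bins - 1)
# shift each row into its own block of `bins` slots, then count everything at once
offsets = np.arange(flat.shape[0])[:, None] * bins
out = np.bincount((idx + offsets).ravel(), minlength=flat.shape[0] * bins)
out = out.reshape(arr.shape[0], arr.shape[1], bins)

On in-range data this matches np.histogram's uniform binning, including the convention that the rightmost bin is closed.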

Add to items, with multiple occurrences [duplicate]

I have an unsorted array of indexes:
i = np.array([1,5,2,6,4,3,6,7,4,3,2])
I also have an array of values of the same length:
v = np.array([2,5,2,3,4,1,2,1,6,4,2])
I have an array of zeros of the desired size:
d = np.zeros(10)
Now I want to add the values of v into d at the positions given by i.
In plain Python I would do it like this:
for index, value in enumerate(v):
    idx = i[index]
    d[idx] += v[index]
It is ugly and inefficient. How can I change it?
np.add.at(d, i, v)
You'd think d[i] += v would work, but if you try to do multiple additions to the same cell that way, one of them overwrites the others. The ufunc.at method avoids that buffering problem.
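To see the overwriting concretely, here is a tiny sketch with a repeated index:

import numpy as np

d = np.zeros(3)
i = np.array([1, 1, 1])
v = np.array([1.0, 2.0, 3.0])

d[i] += v            # buffered: only one addition survives -> [0., 3., 0.]

d2 = np.zeros(3)
np.add.at(d2, i, v)  # unbuffered: all three accumulate -> [0., 6., 0.]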
We can use np.bincount which is supposedly pretty efficient for such accumulative weighted counting, so here's one with that -
counts = np.bincount(i,v)
d[:counts.size] = counts
Alternatively, use the minlength input argument for the generic case where d could be any array and we want to add into it -
d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)
Runtime tests
This section compares np.add.at based approach listed in the other post with the np.bincount based one listed earlier in this post.
In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...:
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:
In [62]: # Inputs (random numbers)
...: N = 10000
...: i = np.random.randint(0,1000,(N))
...: v = np.random.randint(0,1000,(N))
...:
...: # Setup output arrays for two approaches
...: M = 12000
...: d1 = np.zeros(M)
...: d2 = np.zeros(M)
...:
In [63]: bincount_based(d1,i,v) # Run approaches
...: add_at_based(d2,i,v)
...:
In [64]: np.allclose(d1,d2) # Verify outputs
Out[64]: True
In [67]: # Setup output arrays for two approaches again for timing
...: M = 12000
...: d1 = np.zeros(M)
...: d2 = np.zeros(M)
...:
In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop
In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop

how to use Apache Commons Math Optimization in Jython?

I want to port some Matlab code to Jython, and it looks like Matlab's fminsearch can be replaced by the Apache Commons Math optimization package.
I'm coding in the Mango medical image script manager, which uses Jython 2.5.3 as its scripting language. The Commons Math version is 3.6.1.
Here is my code:
def f(x, y):
    return x**2 + y**2  # ** is exponentiation; ^ would be bitwise XOR in Python
import sys
sys.path.append('/home/shujian/APPs/Mango/lib/commons-math3-3.6.1.jar')

sys.add_package('org.apache.commons.math3.analysis')
from org.apache.commons.math3.analysis import MultivariateFunction
sys.add_package('org.apache.commons.math3.optim.nonlinear.scalar.noderiv')
from org.apache.commons.math3.optim.nonlinear.scalar.noderiv import NelderMeadSimplex, SimplexOptimizer
sys.add_package('org.apache.commons.math3.optim.nonlinear.scalar')
from org.apache.commons.math3.optim.nonlinear.scalar import ObjectiveFunction
sys.add_package('org.apache.commons.math3.optim')
from org.apache.commons.math3.optim import MaxEval, InitialGuess
sys.add_package('org.apache.commons.math3.optimization')
from org.apache.commons.math3.optimization import GoalType

initialSolution = [2.0, 2.0]
simplex = NelderMeadSimplex([2.0, 2.0])
opt = SimplexOptimizer(2**(-6), 2**(-10))
solution = opt.optimize(MaxEval(300), ObjectiveFunction(f), simplex, GoalType.MINIMIZE, InitialGuess([2.0, 2.0]))
skewParameters2 = solution.getPointRef()
print skewParameters2
And I got the error below:
TypeError: optimize(): 1st arg can't be coerced to
I'm quite confused about how to use the optimizer from Jython, and the examples are all in Java.
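For what it's worth, the coercion error is most likely because ObjectiveFunction expects a MultivariateFunction, and a plain two-argument Python function cannot be coerced to that Java interface; in Jython you can subclass the interface instead. A sketch of the idea (class name is mine, untested in Mango). Note also that GoalType is available in org.apache.commons.math3.optim.nonlinear.scalar, which avoids mixing in the deprecated org.apache.commons.math3.optimization package:

from org.apache.commons.math3.analysis import MultivariateFunction

class Sphere(MultivariateFunction):
    # Commons Math calls value(double[] point) with the full point array
    def value(self, point):
        return point[0]**2 + point[1]**2

# then pass ObjectiveFunction(Sphere()) to opt.optimize(...)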
I've given up on this plan and found another way to perform fminsearch in Jython. Below is the Jython version:
import sys
sys.path.append('.../jnumeric-2.5.1_ra0.1.jar')  # add the jnumeric path
import Numeric as np

def nelder_mead(f, x_start,
                step=0.1, no_improve_thr=10e-6,
                no_improv_break=10, max_iter=0,
                alpha=1., gamma=2., rho=-0.5, sigma=0.5):
    '''
    #param f (function): function to optimize, must return a scalar score
        and operate over a numpy array of the same dimensions as x_start
    #param x_start (float list): initial position
    #param step (float): look-around radius in initial step
    #no_improv_thr, no_improv_break (float, int): break after no_improv_break iterations with
        an improvement lower than no_improv_thr
    #max_iter (int): always break after this number of iterations.
        Set it to 0 to loop indefinitely.
    #alpha, gamma, rho, sigma (floats): parameters of the algorithm
        (see the Wikipedia page for reference)
    return: tuple (best parameter array, best score)
    '''
    # init
    dim = len(x_start)
    prev_best = f(x_start)
    no_improv = 0
    res = [[np.array(x_start), prev_best]]
    for i in range(dim):
        x = np.array(x_start)
        x[i] = x[i] + step
        score = f(x)
        res.append([x, score])

    # simplex iter
    iters = 0
    while 1:
        # order
        res.sort(key=lambda x: x[1])
        best = res[0][1]

        # break after max_iter
        if max_iter and iters >= max_iter:
            return res[0]
        iters += 1

        # break after no_improv_break iterations with no improvement
        print '...best so far:', best
        if best < prev_best - no_improve_thr:
            no_improv = 0
            prev_best = best
        else:
            no_improv += 1
        if no_improv >= no_improv_break:
            return res[0]

        # centroid
        x0 = [0.] * dim
        for tup in res[:-1]:
            for i, c in enumerate(tup[0]):
                x0[i] += c / (len(res)-1)

        # reflection
        xr = x0 + alpha*(x0 - res[-1][0])
        rscore = f(xr)
        if res[0][1] <= rscore < res[-2][1]:
            del res[-1]
            res.append([xr, rscore])
            continue

        # expansion
        if rscore < res[0][1]:
            xe = x0 + gamma*(x0 - res[-1][0])
            escore = f(xe)
            if escore < rscore:
                del res[-1]
                res.append([xe, escore])
                continue
            else:
                del res[-1]
                res.append([xr, rscore])
                continue

        # contraction
        xc = x0 + rho*(x0 - res[-1][0])
        cscore = f(xc)
        if cscore < res[-1][1]:
            del res[-1]
            res.append([xc, cscore])
            continue

        # reduction
        x1 = res[0][0]
        nres = []
        for tup in res:
            redx = x1 + sigma*(tup[0] - x1)
            score = f(redx)
            nres.append([redx, score])
        res = nres
And here is a test example:
def f(x):
    return x[0]**2 + x[1]**2 + x[2]**2

print nelder_mead(f, [3.4, 2.3, 2.2])
Actually, the original version is written for CPython; the source is linked below:
https://github.com/fchollet/nelder-mead

Creating image from point list with Numpy, how to speed up?

I have the following code, which seems to be a performance bottleneck:
for x, y, intensity in myarr:
    target_map[x, y] = target_map[x, y] + intensity
There are multiple entries for the same (x, y) coordinate, each with its own intensity.
Datatypes:
> print myarr.shape, myarr.dtype
(219929, 3) uint32
> print target_map.shape, target_map.dtype
(150, 200) uint32
Is there any way to optimize this loop, other than writing it in C?
This seems to be a related question; however, I couldn't get its accepted answer working for me: How to convert python list of points to numpy image array?
I get the following error message:
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
image[coordinates] = 1
IndexError: too many indices for array
If you convert your 2D coordinates into flat indices into target_map using np.ravel_multi_index, you can use np.unique and np.bincount to speed things up quite a bit:
def vec_intensity(my_arr, target_map):
    flat_coords = np.ravel_multi_index((my_arr[:, 0], my_arr[:, 1]),
                                       dims=target_map.shape)
    unique_, idx = np.unique(flat_coords, return_inverse=True)
    sum_ = np.bincount(idx, weights=my_arr[:, 2])
    # bincount returns float64; cast back so += into the uint32 map is allowed
    target_map.ravel()[unique_] += sum_.astype(target_map.dtype)
    return target_map

def intensity(my_arr, target_map):
    for x, y, intensity in my_arr:
        target_map[x, y] += intensity
    return target_map
#sample data set
rows, cols = 150, 200
items = 219929
myarr = np.empty((items, 3), dtype=np.uint32)
myarr[:, 0] = np.random.randint(rows, size=(items,))
myarr[:, 1] = np.random.randint(cols, size=(items,))
myarr[:, 2] = np.random.randint(100, size=(items,))
And now:
In [6]: %timeit target_map_1 = np.zeros((rows, cols), dtype=np.uint32); target_map_1 = vec_intensity(myarr, target_map_1)
10 loops, best of 3: 53.1 ms per loop
In [7]: %timeit target_map_2 = np.zeros((rows, cols), dtype=np.uint32); target_map_2 = intensity(myarr, target_map_2)
1 loops, best of 3: 934 ms per loop
In [8]: np.all(target_map_1 == target_map_2)
Out[8]: True
That's almost a 20x speed increase.
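For completeness, the same accumulation can also be written in one line with np.add.at, the unbuffered ufunc method from the earlier question, though judging by the timings there it is typically slower than the bincount route above:

import numpy as np

# unbuffered equivalent of the loop: repeated (x, y) pairs accumulate correctly
np.add.at(target_map, (myarr[:, 0], myarr[:, 1]), myarr[:, 2])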