Parallelization of outer product on multiple vectors using cython - numpy

Suppose I have a numpy array m of shape (N, M) and I want to compute
res = np.zeros((M, M))
for i in range(N):
    res += np.outer(m[i], m[i])
This loop can be made more efficient using einsum, i.e.
res = np.sum(np.einsum('ij,ik->ijk', m, m), axis=0)
but this requires the storage of a N x M x M matrix, which can be (and in my case is) very demanding.
I thought of building this function in cython, using parallelization:
import numpy as np
cimport numpy as np
from cython.parallel import prange
def get_s(double[:,:] m):
    cdef Py_ssize_t i = 0
    cdef int n = m.shape[1]
    res = 0.
    for i in prange(n, nogil=True):
        res += np.outer(m[i], m[i])
    return res
The idea is to accumulate the outer products of the rows in parallel. Running this code, however, produces a lot of errors, since I am using Python objects and disallowed operations inside the nogil block, and I don't know how to properly initialize res.

You cannot use numpy functions (np.outer) in the nogil context, so you just spell it out with loops.
Furthermore, your res variable should be an array, so you'll need to declare and initialize one.
Last, you want the loops to compile to C, thus use typed memoryviews. It's easiest to use numpy arrays for memory management and take memoryviews of them. Putting it all together,
%%cython -a
cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def m_outer(double[:, ::1] a):
    n, m = a.shape[0], a.shape[1]
    cdef double[:, ::1] resm = np.zeros((m, m))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[1]):
                resm[j, k] += a[i, j] * a[i, k]
    return np.asarray(resm)
A way to write these things (maybe the way) is to write it in Python (never mind the speed), validate the output on a small example (I use 3-by-4), then cythonize.
When cythonizing, use %%cython -a and examine the generated C code.
Now, there are two obvious opportunities here: reorder the loops to lift loop-constants and use prange. Both are left as an exercise to the reader.
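For what it's worth, here is a minimal sketch of what that exercise could look like. This is my code, not the answer's: the name m_outer_par is made up, and it assumes the cell is compiled with OpenMP (e.g. %%cython -a --compile-args=-fopenmp --link-args=-fopenmp with gcc):
cimport cython
from cython.parallel import prange
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def m_outer_par(double[:, ::1] a):
    cdef Py_ssize_t i, j, k
    cdef Py_ssize_t n = a.shape[0]
    cdef Py_ssize_t m = a.shape[1]
    cdef double aij
    cdef double[:, ::1] resm = np.zeros((m, m))
    # parallelize over output rows j: each thread owns row j of resm,
    # so no two threads ever write to the same element
    for j in prange(m, nogil=True):
        for i in range(n):
            aij = a[i, j]  # loop constant lifted out of the innermost loop
            for k in range(m):
                resm[j, k] += aij * a[i, k]
    return np.asarray(resm)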
And the very last note. Unless it's an educational exercise, note that what you are really computing is a matrix product, A.T @ A.
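A quick sanity check of the Cython routine against that identity, on a small 3-by-4 example as suggested above (my snippet; it assumes m_outer from the cell above has been compiled):
import numpy as np

m = np.random.rand(3, 4)
assert np.allclose(m_outer(m), m.T @ m)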

Your iteration:
In [139]: res = np.zeros((4,4))
In [140]: for i in range(3): res += np.outer(m[i],m[i])
In [141]: res
Out[141]:
array([[ 80., 92., 104., 116.],
[ 92., 107., 122., 137.],
[104., 122., 140., 158.],
[116., 137., 158., 179.]])
We can do the same outer with broadcasting:
In [142]: np.sum(m[:,:,None]*m[:,None,:], axis=0)
Out[142]:
array([[ 80, 92, 104, 116],
[ 92, 107, 122, 137],
[104, 122, 140, 158],
[116, 137, 158, 179]])
(yes, this does make a temporary (N,M,M) array)
The suggested single step einsum:
In [143]: np.einsum('ij,ik->jk',m,m)
Out[143]:
array([[ 80, 92, 104, 116],
[ 92, 107, 122, 137],
[104, 122, 140, 158],
[116, 137, 158, 179]])
Which is just a simple dot product (with the appropriate transpose):
In [144]: m.T.dot(m)
Out[144]:
array([[ 80, 92, 104, 116],
[ 92, 107, 122, 137],
[104, 122, 140, 158],
[116, 137, 158, 179]])
In [145]: m.T @ m
Out[145]:
array([[ 80, 92, 104, 116],
[ 92, 107, 122, 137],
[104, 122, 140, 158],
[116, 137, 158, 179]])
Since numpy's dot uses fast BLAS code, I doubt you can improve on it with cython.


Populating numpy array with most performance

I have a few arrays a, b, c and d as shown below and would like to populate a matrix by evaluating a function f(...) which consumes a, b, c and d.
With a nested for loop this is obviously possible, but I'm looking for a more pythonic and faster way to do this.
So far I tried np.fromfunction, with no luck.
Thanks
PS: This function f has a conditional. I can still consider approaches which do not support conditionals, but if the solution supports conditionals that would be fantastic.
Example function, in case it is helpful:
def fun(a, b, c, d): return a + b + c + d if a == b else a * b * c * d
Also, why fromfunction failed is shown below:
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([10,20,30])
>>> def fun(i,j): return a[i] * b[j]
>>> np.fromfunction(fun, (3,5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1853, in fromfunction
    return function(*args, **kwargs)
  File "<stdin>", line 1, in fun
IndexError: arrays used as indices must be of integer (or boolean) type
The reason the function fails is that np.fromfunction passes floating-point values, which are not valid as indices. You can modify your function like this to make it work:
def fun(i, j):
    return a[j.astype(int)] * b[i.astype(int)]

print(np.fromfunction(fun, (3, 5)))
[[ 10 20 30 40 50]
[ 20 40 60 80 100]
[ 30 60 90 120 150]]
Jake has explained why your fromfunction approach fails. However, you don't need fromfunction for your example. You could simply add an axis to b and have numpy broadcast the shapes:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30])
def fun(i,j): return a[j.astype(int)] * b[i.astype(int)]
f1 = np.fromfunction(fun, (3, 5))
f2 = b[:, None] * a
(f1 == f2).all() # True
Extending this to the function you showed, which contains an if condition, you could just split the if into two operations in sequence: first create an array from the else expression, then overwrite the relevant parts (where the condition holds) with the if expression, as follows:
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])
# Calculate the values at all indices as the product
result = d[:, None] * (a * b * c)
# array([[ 0, 0, 0, 0, 0],
# [ 500, 1600, 2700, 3200, 2500],
# [1000, 3200, 5400, 6400, 5000],
# [1500, 4800, 8100, 9600, 7500]])
# Calculate sum
sum_arr = d[:, None] + (a + b + c)
# array([[106, 206, 306, 406, 506],
# [107, 207, 307, 407, 507],
# [108, 208, 308, 408, 508],
# [109, 209, 309, 409, 509]])
# Set diagonal elements (i==j) to sum:
np.fill_diagonal(result, np.diag(sum_arr))
which gives the following result:
array([[ 106, 0, 0, 0, 0],
[ 500, 207, 2700, 3200, 2500],
[1000, 3200, 308, 6400, 5000],
[1500, 4800, 8100, 409, 7500]])
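As a side note (this sketch is mine, not from the answer above): if the condition cannot be mapped onto the diagonal, a more general, if slightly more wasteful, pattern is to evaluate both branches on the full grid and pick per element with np.where. This assumes a == b is meant elementwise along the a/b/c axis:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])

cond = (a == b)                      # True only where a[i] == b[i]
prod = d[:, None] * (a * b * c)      # "else" branch on the full grid
sums = d[:, None] + (a + b + c)      # "if" branch on the full grid
result = np.where(cond, sums, prod)  # select per element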

Per channel normalization of RGB images

I would like to know how I can manually normalize an RGB image.
I tried:
img_name = 'example/abc/myfile.png'
img = np.asarray(Image.open(img_name))
img = np.transpose(img, (2,0,1))
(img/255.0 - mean)/std
mean and std have shape (3,).
When I run the code above I get this error:
ValueError: operands could not be broadcast together with shapes (3,512,512) (3,)
How can we normalize each channel individually?
Normalization means to transform to zero mean and unit variance. This is done by subtracting the mean and dividing the result by the standard deviation.
You can do it per channel by specifying the axes as x.mean((1,2)) instead of just x.mean(). In order to be able to broadcast you need to transpose the image first and then transpose back.
import numpy as np
np.random.seed(0)
c = 3 # number of channels
h = 2 # height
w = 4 # width
img = np.random.randint(0, 255, (c, h, w))
img_n = ((img.T - img.mean((1,2))) / img.std((1,2))).T
Original image:
array([[[172, 47, 117, 192],
[ 67, 251, 195, 103]],
[[ 9, 211, 21, 242],
[ 36, 87, 70, 216]],
[[ 88, 140, 58, 193],
[230, 39, 87, 174]]])
Normalized image:
array([[[ 0.43920493, -1.45391976, -0.39376994, 0.74210488],
[-1.15101981, 1.63565973, 0.78753987, -0.6057999 ]],
[[-1.14091546, 1.10752281, -1.00734487, 1.45258017],
[-0.84038163, -0.27270662, -0.46193163, 1.16317723]],
[[-0.59393963, 0.21615508, -1.06130196, 1.04182853],
[ 1.61824207, -1.35729811, -0.60951837, 0.74583239]]])
Verify zero mean and unit standard deviation per channel with print(img_n.mean((1,2)), img_n.std((1,2))):
[-1.38777878e-17 0.00000000e+00 0.00000000e+00]
[1. 1. 1.]
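A variant of the same idea (my sketch, not part of the answer above) avoids the double transpose by keeping the reduced axes, so the (c, 1, 1) statistics broadcast directly against the (c, h, w) image; the same trick, adding two trailing axes by hand, also fixes the broadcasting error in the question where mean and std have shape (3,):
import numpy as np

img = np.random.randint(0, 255, (3, 512, 512)).astype(float)

# per-channel statistics computed from the image itself
mean = img.mean(axis=(1, 2), keepdims=True)  # shape (3, 1, 1)
std = img.std(axis=(1, 2), keepdims=True)    # shape (3, 1, 1)
img_n = (img - mean) / std

# with externally given per-channel mean/std of shape (3,), as in the question
# (the numbers below are placeholders)
ext_mean = np.array([0.5, 0.5, 0.5])
ext_std = np.array([0.25, 0.25, 0.25])
img_n2 = (img / 255.0 - ext_mean[:, None, None]) / ext_std[:, None, None]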

Minimizing the peak difference of elements of two lists given some constraints

I want to minimize the peak difference of list1[i] - list2[i] using the scipy.optimize.minimize method.
The elements in list1 and list2 are floats.
For example:
list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
How do I minimize list1[i] - list2[i] given that I have two constraints:
1. list2[0] = list1[0]
2. list2[i+1] - list2[i] <= 1.5
Basically, two consecutive elements in list2 cannot be separated by more than 1.5, and the first element of list2 is the first element of list1.
Maybe there is another way other than scipy.optimize.minimize but I don't know how to do this.
I think the optimum values for list2 are maybe:
list2 = [50, 50.5, 52, 53, 54.5, 55.5, 56, 57, 58.5, 60]
In this case the peak difference is 1.5.
But maybe the algorithm finds a more optimum solution where there is less difference between the elements of list1 and list2.
Here is what I have tried, but it failed:
import numpy as np
from scipy.optimize import minimize

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
list2 = [list1[0]]

# Define objective
def peakDifference(*args):
    global list2
    peak_error = []
    for list1_i, list2_i in zip(list1, list2):
        peak_error.append(list1_i - list2_i)
    return max(peak_error)

peak_error = peakDifference()

# Define constraints
def constraint1(*args):
    for x in range(len(list2) - 1):
        return list2[x+1] - list2[x] - 1.5

con1 = {'type': 'ineq', 'fun': constraint1}

# Optimize
sol = minimize(peakDifference, list2, constraints=con1)
Traceback (most recent call last):
  File "C:/Users/JumpStart/PycharmProjects/anglesimulation/venv/asdfgh.py", line 27, in <module>
    sol = minimize(peakDifference,list2, constraints=con1)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 625, in minimize
    return _minimize_slsqp(fun, x0, args, jac, bounds,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 412, in _minimize_slsqp
    a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in _eval_con_normals
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in <listcomp>
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 284, in cjac
    return approx_derivative(fun, x, method='2-point',
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 426, in approx_derivative
    return _dense_difference(fun_wrapped, x0, f0, h,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 497, in _dense_difference
    df = fun(x) - f0
TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'

Process finished with exit code 1
NLP model
Here is a "working" version of the NLP model. The model cannot be solved reliably this way, as it is non-differentiable.
import numpy as np
from scipy.optimize import minimize
list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57,60, 61])
list1_0 = list1[0]
n = len(list1)
# our x variable will have one element less (first element is fixed)
list1x = np.delete(list1,0) # version with 1st element dropped
nx = len(list1x)
# objective function
# minimize the maximum difference
# Notes:
# - x excludes first element (they are the same by definition)
# - this is non-differentiable so likely to be non-optimal
def maxDifference(x):
    return np.max(np.abs(list1x - x))

# n-1 constraints
def constraint1(x):
    return 1.5 - np.diff(np.insert(x, 0, list1_0), 1)
con1 = {'type': 'ineq', 'fun': constraint1}
#Optimize
x = 55*np.ones(nx) # initial value
sol = minimize(maxDifference, x, constraints=con1)
sol
# optimal is: x = [51.25,51.25,52.75,54.25,54.75,56.25,57.75,59.25,60.25]
# obj = 0.75
The result is:
fun: 5.0
jac: array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
message: 'Optimization terminated successfully'
nfev: 20
nit: 2
njev: 2
status: 0
success: True
x: array([51.5, 53. , 54.5, 55. , 55. , 55. , 55. , 55. , 56. ])
This is non-optimal: the objective is 5 (instead of 0.75).
LP model
An LP model will find a proven optimal solution. That is much more reliable. E.g.:
import pulp as lp

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
n = len(list1)

model = lp.LpProblem("Min_difference", lp.LpMinimize)
x = lp.LpVariable.dicts("x", (i for i in range(n)))
z = lp.LpVariable("z")

# objective
model += z

# constraints
for i in range(n):
    model += z >= x[i] - list1[i]
    model += z >= list1[i] - x[i]
for i in range(n-1):
    model += x[i+1] - x[i] <= 1.5
model += x[0] == list1[0]

model.solve()
print(lp.LpStatus[model.status])
print("obj:", z.varValue)
print([x[i].varValue for i in range(n)])
This shows:
Optimal
obj: 0.75
[50.0, 51.25, 52.75, 53.75, 55.25, 56.25, 56.75, 57.75, 59.25, 60.25]
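For reference only (this is my sketch, not part of the answer above), the same LP can be written with scipy.optimize.linprog by stacking the decision variables as [x[0], ..., x[n-1], z] and expressing the constraints in matrix form:
import numpy as np
from scipy.optimize import linprog

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
n = len(list1)

# variables: x[0..n-1] followed by z; objective: minimize z
c = np.zeros(n + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
for i in range(n):
    # x[i] - list1[i] <= z   ->   x[i] - z <= list1[i]
    row = np.zeros(n + 1); row[i] = 1.0; row[-1] = -1.0
    A_ub.append(row); b_ub.append(list1[i])
    # list1[i] - x[i] <= z   ->  -x[i] - z <= -list1[i]
    row = np.zeros(n + 1); row[i] = -1.0; row[-1] = -1.0
    A_ub.append(row); b_ub.append(-list1[i])
for i in range(n - 1):
    # x[i+1] - x[i] <= 1.5
    row = np.zeros(n + 1); row[i + 1] = 1.0; row[i] = -1.0
    A_ub.append(row); b_ub.append(1.5)

# x[0] is fixed to list1[0]
A_eq = np.zeros((1, n + 1)); A_eq[0, 0] = 1.0
b_eq = [list1[0]]

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * (n + 1))
print(res.fun)     # peak difference (should match the pulp objective)
print(res.x[:-1])  # the optimal list2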

implementation of Hierarchial Agglomerative clustering

I am a newbie and just want to implement hierarchical agglomerative clustering for RGB images. For this I extract all the RGB values from an image and process the image. Next I compute the pairwise distances and then build the linkage. Now, from the linkage I want to extract my original data (i.e. RGB values) at the specified indices, given a cluster id. Here is the code I have done so far.
from PIL import Image
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent, fclusterdata, dendrogram

image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1, 3))
rgb = list(image.getdata())
X = pdist(im)
Y = linkage(X)
I = inconsistent(Y)
Based on the 4th column of the inconsistency matrix, I pick the minimum value of the cutoff in order to get the maximum number of clusters.
cutoff = 0.7
cluster_assignments = fclusterdata(Y, cutoff)
# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print "%d clusters" % num_clusters
indices = cluster_indices(cluster_assignments)
ind = np.array(enumerate(rgb))
for k, ind in enumerate(indices):
    print "cluster", k + 1, "is", ind
dendrogram(Y)
I got results like this
cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]
This means cluster 6 contains the indices of leaves 6 and 11. Now at this point I am stuck on how to map these indices back to the original data (i.e. RGB values), and on mapping the indices of the RGB values to the pixels in the image. And then I have to generate a codebook to implement agglomerative clustering. I have no idea how to approach this task. I have read a lot of stuff but nothing gave me a clue.
Here is my solution:
import numpy as np
from scipy.cluster import hierarchy
im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
[ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
[ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
[127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
[124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
[187,47,20],[127,170,52],[ 30,56,0]])
groups = hierarchy.fclusterdata(im, 0.7)
idx_sorted = np.argsort(groups)
group_sorted = groups[idx_sorted]
im_sorted = im[idx_sorted]
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1
np.split(im_sorted, split_idx)
output:
[array([[203, 11, 10],
[198, 25, 21]]),
array([[187, 47, 20]]),
array([[127, 171, 60],
[127, 170, 52]]),
array([[124, 173, 45]]),
array([[112, 159, 57]]),
array([[127, 145, 61]]),
array([[25, 62, 0],
[30, 56, 0]]),
array([[19, 57, 8]]),
array([[19, 46, 0]]),
array([[109, 137, 18],
[120, 133, 19]]),
array([[100, 120, 9],
[ 95, 110, 15]]),
array([[67, 89, 27],
[67, 85, 25]]),
array([[55, 78, 24]]),
array([[ 52, 108, 0],
[ 55, 106, 1]]),
array([[ 54, 101, 9]]),
array([[60, 85, 0]]),
array([[ 74, 128, 30],
[ 76, 127, 35]]),
array([[ 67, 118, 26]]),
array([[ 48, 112, 25]]),
array([[37, 0, 0]])]
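Not part of the answer above, but since the question also asks how to map the clusters back to the original pixels: the same argsort/split trick applied to the index order (reusing idx_sorted, split_idx and im_sorted from the snippet above) gives, for each cluster, the row indices into im:
# original row (pixel) indices belonging to each cluster, in the same
# order as the RGB groups shown above
pixel_idx_per_cluster = np.split(idx_sorted, split_idx)
rgb_per_cluster = np.split(im_sorted, split_idx)
for idx, rgb in zip(pixel_idx_per_cluster, rgb_per_cluster):
    print(idx, rgb.tolist())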

how to vectorize an operation on a 1 dimensional array to produce a 2 dimensional matrix in numpy

I have a 1d array of values
i = np.arange(0,7,1)
and a function
# Returns a column matrix
def fn(i):
    return np.matrix([[i*2, i*3]]).T
fnv = np.vectorize(fn)
then writing
fnv(i)
gives me an error
File "<stdin>", line 1, in <module>
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1872, in __call__
return self._vectorize_call(func=func, args=vargs)
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1942, in _vectorize_call
copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The result I am looking for is a matrix with two rows and as many columns as in the input array. What is the best notation in numpy to achieve this?
For example i would equal
[1,2,3,4,5,6]
and the output would equal
[[2,4,6,8,10,12],
[3,6,9,12,15,18]]
EDIT
You should try to avoid using vectorize, because it gives the illusion of numpy efficiency, but inside it's all python loops.
If you really have to deal with user-supplied functions that take ints and return a matrix of shape (2, 1), then there probably isn't much you can do. But that seems like a really weird use case. If you can replace that with a list of functions that take an int and return an int, and that use ufuncs when needed, e.g. np.sin instead of math.sin, you can do the following
def vectorize2(funcs):
    def fnv(arr):
        return np.vstack([f(arr) for f in funcs])
    return fnv

f2 = vectorize2((lambda x: 2 * x, lambda x: 3 * x))
>>> f2(np.arange(10))
array([[ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27]])
Just for your reference, I have timed this vectorization against your proposed one:
f = vectorize(fn)
>>> timeit.timeit('f(np.arange(10))', 'from __main__ import np, f', number=1000)
0.28073329263679625
>>> timeit.timeit('f2(np.arange(10))', 'from __main__ import np, f2', number=1000)
0.023139129945661807
>>> timeit.timeit('f(np.arange(10000))', 'from __main__ import np, f', number=10)
2.3620706288432984
>>> timeit.timeit('f2(np.arange(10000))', 'from __main__ import np, f2', number=10)
0.002757072593169596
So there is an order of magnitude in speed even for small arrays, that grows to a x1000 speed up, available almost for free, for larger arrays.
ORIGINAL ANSWER
Don't use vectorize unless there is no way around it; it's slow. See the following examples:
>>> a = np.array(range(7))
>>> a
array([0, 1, 2, 3, 4, 5, 6])
>>> np.vstack((a, a+1))
array([[0, 1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6, 7]])
>>> np.vstack((a, a**2))
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 1, 4, 9, 16, 25, 36]])
Whatever your function is, if it can be constructed with numpy's ufuncs, you can do something like np.vstack((a, f(a))) and get what you want
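For the specific example in the question (rows i*2 and i*3), the same result can also be written as a single outer product, with no function definitions at all:
i = np.array([1, 2, 3, 4, 5, 6])
np.outer([2, 3], i)
# array([[ 2,  4,  6,  8, 10, 12],
#        [ 3,  6,  9, 12, 15, 18]])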
A simple reimplementation of vectorize gives me what I want
def vectorize(fn):
    def do_it(array):
        return np.column_stack([fn(p) for p in array])
    return do_it
If this is not performant or there is a better way then let me know.