Multiply array indices with numbers - numpy

Is there a short numpy command for the operation performed by the for loops below?
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([10.0, 20.0, 30.0])
c = np.array([100.0, 200.0, 300.0, 900.0])
y = np.linspace(0, 2, 50)
m = np.array([0.2, 0.1, 0.3])

A, C, B, Y = np.meshgrid(a, c, b, y, indexing="ij")
print(Y)
for i in range(len(a)):
    for j in range(len(c)):
        for k in range(len(b)):
            Y[i][j][k] = Y[i][j][k] * m[k]
print("--------")
print(Y)
Abstractly, I have $Y_{ijkl}$ and I want to multiply $Y_{ij0l}$ by $m_0$, $Y_{ij1l}$ by $m_1$, and so on...
Many thanks in advance!

To remove the loop, you just need einsum here.
np.einsum('ijkl,k->ijkl', Y, m)
Or just broadcasted multiplication; m[:, None] has shape (3, 1), which lines up with the trailing (len(b), len(y)) axes of Y:
Y * m[:, None]
However, if you don't want to create the meshgrid in the first place, you can broadcast y directly, which is more memory efficient (np.broadcast_to returns a read-only view, not a copy):
np.einsum(
    "ijkl,k->ijkl",
    np.broadcast_to(y, a.shape + c.shape + b.shape + y.shape),
    m,
)
or:
np.broadcast_to(y, a.shape + c.shape + b.shape + y.shape) * m[:, None]
If you need A, C, B as well, you can continue with your current approach.
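As a quick sanity check (a minimal sketch using the arrays defined in the question), both vectorized forms match the loop result:
# verify that the broadcasted product and einsum reproduce the loop
A, C, B, Y = np.meshgrid(a, c, b, y, indexing="ij")
expected = Y.copy()
for k in range(len(b)):
    expected[:, :, k, :] *= m[k]
assert np.allclose(expected, Y * m[:, None])
assert np.allclose(expected, np.einsum('ijkl,k->ijkl', Y, m))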
Performance
In [44]: %%timeit
    ...: np.einsum(
    ...:     "ijkl,k->ijkl",
    ...:     np.broadcast_to(y, (a.shape[0], c.shape[0], b.shape[0], y.shape[0])),
    ...:     m,
    ...: )
    ...:
21.1 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [45]: %%timeit
    ...: A, C, B, Y = np.meshgrid(a, c, b, y, indexing="ij")
    ...: for i in range(len(a)):
    ...:     for j in range(len(c)):
    ...:         for k in range(len(b)):
    ...:             Y[i][j][k] = Y[i][j][k] * m[k]
    ...:
420 µs ± 1.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Related

apply calculation in pandas column with groupby

What could be wrong in the code below?
a) I need to group by the AREA column and apply a mathematical formula across the other columns.
b) Also, if I have another column, say a date, that needs to be added to the groupby, how would it fit into the command below?
df3 = dataset.groupby('AREA')(['col1']+['col2']).sum()
(The sample table was posted as an image in the original question.)
I think you can sum the columns before grouping for better performance:
dataset['new'] = dataset['col1']+dataset['col2']
df3 = dataset.groupby('AREA', as_index=False)['new'].sum()
But your approach is possible with a lambda function:
df3 = (dataset.groupby('AREA')
              .apply(lambda x: (x['col1']+x['col2']).sum())
              .reset_index(name='SUM'))
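For part b), adding another column to the groupby only requires passing a list of column names; a minimal sketch, where 'DATE' is a hypothetical name for the extra column:
# 'DATE' is a placeholder for the date column mentioned in the question
df3 = dataset.groupby(['AREA', 'DATE'], as_index=False)['new'].sum()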
Performance:
import numpy as np
import pandas as pd

np.random.seed(123)
N = 100000
dataset = pd.DataFrame({'AREA': np.random.randint(1000, size=N),
                        'col1': np.random.randint(10, size=N),
                        'col2': np.random.randint(10, size=N)})
#print (dataset)
In [24]: %%timeit
    ...: dataset['new'] = dataset['col1']+dataset['col2']
    ...: df3 = dataset.groupby('AREA', as_index=False)['new'].sum()
    ...:
7.64 ms ± 50.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [25]: %%timeit
    ...: df3 = (dataset.groupby('AREA')
    ...:               .apply(lambda x: (x['col1']+x['col2']).sum())
    ...:               .reset_index(name='SUM'))
    ...:
368 ms ± 5.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

How to raise a matrix to the power of elements in an array that is increasing in an ascending order?

Currently I have a C matrix generated by:
import numpy as np

def c_matrix(n):
    exp = np.exp(1j*np.pi/n)
    exp_n = np.array([[exp, 0], [0, exp.conj()]], dtype=complex)
    c_matrix = np.array([exp_n**i for i in range(1, n, 1)], dtype=complex)
    return c_matrix
What this does is basically generate the numbers 1 to n-1 with a list comprehension, then return an array of the matrix exp_n raised element-wise to each of those ascending powers, i.e.
exp_n**[1, 2, ..., n-1] = [exp_n**1, exp_n**2, ..., exp_n**(n-1)]
So I was wondering if there's a more numpythonic way of doing it (in order to make use of NumPy's broadcasting ability), like:
exp_n**np.arange(1, n, 1) = np.array([exp_n**1, exp_n**2, ..., exp_n**(n-1)])
You're speaking of a Vandermonde matrix. NumPy has numpy.vander:
def c_matrix_vander(n):
    exp = np.exp(1j*np.pi/n)
    exp_n = np.array([[exp, 0], [0, exp.conj()]], dtype=complex)
    return np.vander(exp_n.ravel(), n, increasing=True)[:, 1:].swapaxes(0, 1).reshape(n-1, 2, 2)
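To see why this works (a minimal sketch): np.vander(x, n, increasing=True) builds a matrix whose i-th row is x[i] raised to the powers 0 through n-1. Raveling exp_n gives its four entries, the [:, 1:] slice drops the power-0 column, and the swapaxes/reshape turns the (4, n-1) result back into a stack of n-1 matrices of shape (2, 2):
>>> np.vander(np.array([2.0, 3.0]), 4, increasing=True)
array([[ 1.,  2.,  4.,  8.],
       [ 1.,  3.,  9., 27.]])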
Performance
In [184]: %timeit c_matrix_vander(10_000)
849 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [185]: %timeit c_matrix(10_000)
41.5 ms ± 549 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Validation
>>> np.isclose(c_matrix(10_000), c_matrix_vander(10_000)).all()
True

What is maybe_convert_objects good for?

I'm profiling the timing of one of my functions and I see that a lot of time is spent on pandas DataFrame creation: about 2.5 seconds to construct a DataFrame with 1000 columns and 10k rows:
import numpy as np
from pandas import DataFrame

def test(size):
    samples = []
    for r in range(10000):
        # beta parameters must be positive, so draw from 1..99
        a, b = np.random.randint(1, 100, size=2)
        data = np.random.beta(a, b, size=size)
        samples.append(data)
    return DataFrame(samples, dtype=np.float64)
Running %prun -l 4 test(1000) shows that most of the time is spent in maybe_convert_objects. Is there any way I can avoid this check? It really doesn't seem necessary here. I tried to find out what this method does and how to bypass it, but didn't find anything online.
pandas must introspect each row because you are passing it a list of arrays. Here are some more efficient methods for this case.
In [27]: size = 1000
In [28]: samples = []
    ...: for r in range(10000):
    ...:     data = np.random.beta(1, 1, size=size)
    ...:     samples.append(data)
    ...:
In [29]: np.asarray(samples).shape
Out[29]: (10000, 1000)
# original
In [30]: %timeit DataFrame(samples)
2.29 s ± 91.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# numpy is less flexible on the conversion, but in this case
# it is fine
In [31]: %timeit DataFrame(np.asarray(samples))
30.9 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# you should probably just do this
In [32]: samples = np.random.beta(1,1, size=(10000, 1000))
In [33]: %timeit DataFrame(samples)
74.4 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
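As a quick sanity check (a minimal sketch; samples can be either the list of arrays or the 2-D array), the fast construction produces the same frame as the slow one:
# hypothetical check: both constructions should yield identical frames
DataFrame(samples).equals(DataFrame(np.asarray(samples)))  # expected: True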

numpy elementwise outer product

I want to do the element-wise outer product of two 2d arrays in numpy.
A.shape = (100, 3) # A numpy ndarray
B.shape = (100, 5) # A numpy ndarray
C = element_wise_outer_product(A, B) # A function that does the trick
C.shape = (100, 3, 5) # This should be the result
C[i] = np.outer(A[i], B[i]) # This should be the result
A naive implementation could be the following:
tmp = []
for i in range(len(A)):
    outer_product = np.outer(A[i], B[i])
    tmp.append(outer_product)
C = np.array(tmp)
A better solution, inspired by Stack Overflow:
big_outer = np.multiply.outer(A, B)
tmp = np.swapaxes(big_outer, 1, 2)
C_tmp = [tmp[i][i] for i in range(len(A))]
C = np.array(C_tmp)
I'm looking for a vectorized implementation that gets rid the for loop.
Does anyone have an idea?
Thank you!
Extend A and B to 3D keeping their first axis aligned and introducing new axes along the third and second ones respectively with None/np.newaxis and then multiply with each other. This would allow broadcasting to come into play for a vectorized solution.
Thus, an implementation would be -
A[:,:,None]*B[:,None,:]
We could shorten it a bit by using an ellipsis for A's :,: and skipping the leftover last axis of B, like so -
A[...,None]*B[:,None]
As another vectorized approach we could also use np.einsum, which might be more intuitive once we get past the string notation syntax and consider those notations being representatives of the iterators involved in a naive loopy implementation, like so -
np.einsum('ij,ik->ijk',A,B)
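To make the subscripts concrete, here is the naive loopy computation that 'ij,ik->ijk' describes (an illustration of the notation, not how einsum runs internally):
# C[i, j, k] = A[i, j] * B[i, k]; i is shared, j comes from A, k from B
C = np.empty((A.shape[0], A.shape[1], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        for k in range(B.shape[1]):
            C[i, j, k] = A[i, j] * B[i, k]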
Another solution using np.lib.stride_tricks.as_strided().
Here the strategy is, in essence, to build a (100, 3, 5) array As and a (100, 3, 5) array Bs such that the normal element-wise product of these arrays will produce the desired result. Of course, we don't actually build big memory-consuming arrays, thanks to as_strided(). (as_strided() is like a blueprint that tells NumPy how you'd map data from the original arrays to construct As and Bs.)
def outer_prod_stride(A, B):
    """Stride trick."""
    a = A.shape[-1]    # width of A's rows
    b = B.shape[-1]    # width of B's rows
    d = A.strides[-1]  # byte stride along the last axis
    new_shape = A.shape + (b,)
    As = np.lib.stride_tricks.as_strided(A, shape=new_shape, strides=(a*d, d, 0))
    Bs = np.lib.stride_tricks.as_strided(B, shape=new_shape, strides=(b*d, 0, d))
    return As * Bs
Timings
def outer_prod_broadcasting(A, B):
    """Broadcasting trick."""
    return A[..., None] * B[:, None]

def outer_prod_einsum(A, B):
    """einsum() trick."""
    return np.einsum('ij,ik->ijk', A, B)

def outer_prod_stride(A, B):
    """Stride trick."""
    a = A.shape[-1]
    b = B.shape[-1]
    d = A.strides[-1]
    new_shape = A.shape + (b,)
    As = np.lib.stride_tricks.as_strided(A, shape=new_shape, strides=(a*d, d, 0))
    Bs = np.lib.stride_tricks.as_strided(B, shape=new_shape, strides=(b*d, 0, d))
    return As * Bs
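Before timing, a quick consistency check (a minimal sketch with the shapes from the question) confirms all three variants agree:
A = np.random.rand(100, 3)
B = np.random.rand(100, 5)
assert np.allclose(outer_prod_broadcasting(A, B), outer_prod_einsum(A, B))
assert np.allclose(outer_prod_broadcasting(A, B), outer_prod_stride(A, B))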
%timeit op1 = outer_prod_broadcasting(A, B)
2.54 µs ± 436 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit op2 = outer_prod_einsum(A, B)
3.03 µs ± 637 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit op3 = outer_prod_stride(A, B)
16.6 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Seems my stride-trick solution is slower than both of @Divakar's solutions. Still an interesting method worth knowing, though.

Vectorized way of calculating row-wise dot product two matrices with Scipy

I want to calculate the row-wise dot product of two matrices of the same dimension as fast as possible. This is the way I am doing it:
import numpy as np
a = np.array([[1,2,3], [3,4,5]])
b = np.array([[1,2,3], [1,2,3]])
result = np.array([])
for row1, row2 in zip(a, b):
    result = np.append(result, np.dot(row1, row2))
print(result)
and of course the output is:
[ 14.  26.]
Straightforward way to do that is:
import numpy as np
a=np.array([[1,2,3],[3,4,5]])
b=np.array([[1,2,3],[1,2,3]])
np.sum(a*b, axis=1)
which avoids the python loop and is faster in cases like:
def npsumdot(x, y):
    return np.sum(x*y, axis=1)

def loopdot(x, y):
    result = np.empty((x.shape[0]))
    for i in range(x.shape[0]):
        result[i] = np.dot(x[i], y[i])
    return result
timeit npsumdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 861 ms per loop
timeit loopdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 1.58 s per loop
Check out numpy.einsum for another method:
In [52]: a
Out[52]:
array([[1, 2, 3],
       [3, 4, 5]])
In [53]: b
Out[53]:
array([[1, 2, 3],
       [1, 2, 3]])
In [54]: einsum('ij,ij->i', a, b)
Out[54]: array([14, 26])
Looks like einsum is a bit faster than inner1d:
In [94]: %timeit inner1d(a,b)
1000000 loops, best of 3: 1.8 us per loop
In [95]: %timeit einsum('ij,ij->i', a, b)
1000000 loops, best of 3: 1.6 us per loop
In [96]: a = random.randn(10, 100)
In [97]: b = random.randn(10, 100)
In [98]: %timeit inner1d(a,b)
100000 loops, best of 3: 2.89 us per loop
In [99]: %timeit einsum('ij,ij->i', a, b)
100000 loops, best of 3: 2.03 us per loop
Note: NumPy is constantly evolving and improving; the relative performance of the functions shown above has probably changed over the years. If performance is important to you, run your own tests with the version of NumPy that you will be using.
Played around with this and found inner1d the fastest. That function, however, is internal, so a more robust approach is to use
numpy.einsum("ij,ij->i", a, b)
Even better is to align your memory such that the summation happens in the first dimension, e.g.,
a = numpy.random.rand(3, n)
b = numpy.random.rand(3, n)
numpy.einsum("ij,ij->j", a, b)
For 10 ** 3 <= n <= 10 ** 6, this is the fastest method, and up to twice as fast as its untransposed equivalent. The maximum occurs when the level-2 cache is maxed out, at about 2 * 10 ** 4.
Note also that the transposed summation is much faster than its untransposed equivalent.
The plot was created with perfplot (a small project of mine)
import numpy
from numpy.core.umath_tests import inner1d
import perfplot

def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)
    return (a, b), (aT, bT)

b = perfplot.bench(
    setup=setup,
    n_range=[2 ** k for k in range(1, 25)],
    kernels=[
        lambda data: numpy.sum(data[0][0] * data[0][1], axis=1),
        lambda data: numpy.einsum("ij, ij->i", data[0][0], data[0][1]),
        lambda data: numpy.sum(data[1][0] * data[1][1], axis=0),
        lambda data: numpy.einsum("ij, ij->j", data[1][0], data[1][1]),
        lambda data: inner1d(data[0][0], data[0][1]),
    ],
    labels=["sum", "einsum", "sum.T", "einsum.T", "inner1d"],
    xlabel="len(a), len(b)",
)
b.save("out1.png")
b.save("out2.png", relative_to=3)
You'll do better avoiding the append, but I can't think of a way to avoid the python loop. A custom Ufunc perhaps? I don't think numpy.vectorize will help you here.
import numpy as np
a = np.array([[1,2,3],[3,4,5]])
b = np.array([[1,2,3],[1,2,3]])
result = np.empty((2,))
for i in range(2):
    result[i] = np.dot(a[i], b[i])
print(result)
EDIT
Based on this answer, it looks like inner1d might work if the vectors in your real-world problem are 1D.
from numpy.core.umath_tests import inner1d
inner1d(a,b) # array([14, 26])
I came across this answer and re-verified the results with NumPy 1.14.3 running in Python 3.5. For the most part the answers above hold true on my system, although I found that for very large matrices (see example below), all but one of the methods are so close to one another that the performance difference is meaningless.
For smaller matrices, I found that einsum was the fastest by a considerable margin, up to a factor of two in some cases.
My large matrix example:
import numpy as np
from numpy.core.umath_tests import inner1d
a = np.random.randn(100, 1000000) # 800 MB each
b = np.random.randn(100, 1000000) # pretty big.
def loop_dot(a, b):
    # one dot product per pair of rows
    result = np.empty(a.shape[0])
    for i, (row1, row2) in enumerate(zip(a, b)):
        result[i] = np.dot(row1, row2)
    return result
%timeit inner1d(a, b)
# 128 ms ± 523 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ij,ij->i', a, b)
# 121 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.sum(a*b, axis=1)
# 411 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit loop_dot(a, b) # note the function call took negligible time
# 123 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So einsum is still the fastest on very large matrices, but by a tiny amount. It appears to be a statistically significant (tiny) amount though!