How to np.roll() faster? - numpy

I'm using np.roll() to do nearest-neighbor-like averaging, but I have a feeling there are faster ways. Here is a simplified example, but imagine 3 dimensions and more complex averaging "stencils". Just for example see section 6 of this paper.
Here are a few lines from that simplified example:
for j in range(nper):
    phi2 = 0.25*(np.roll(phi, 1, axis=0) +
                 np.roll(phi, -1, axis=0) +
                 np.roll(phi, 1, axis=1) +
                 np.roll(phi, -1, axis=1))
    phi[do_me] = phi2[do_me]
So should I be looking for something that returns views instead of arrays (as it seems roll returns arrays)? In this case is roll initializing a new array each time it's called? I noticed the overhead is huge for small arrays.
In fact it's most efficient for arrays of [100,100] to [300,300] in size on my laptop. Possibly caching issues above that.
Would scipy.ndimage.interpolation.shift() perform better, the way it is implemented here, and if so, is it fixed? In the linked example above, I'm throwing away the wrapped parts anyway, but might not always.
note: in this question I'm looking only for what is available within NumPy / SciPy. Of course there are many good ways to speed up Python and even NumPy, but that's not what I'm looking for here, because I'm really trying to understand NumPy better. Thanks!

np.roll has to create a copy of the array each time, which is why it is (comparatively) slow. Convolution with something like scipy.ndimage.filters.convolve() will be a bit faster, but it may still create copies (depending on the implementation).
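For illustration, here is a minimal sketch of that convolution route (the kernel choice is mine, not part of the question): a four-neighbour stencil with mode='wrap' reproduces the roll-based average from the question.
import numpy as np
from scipy.ndimage import convolve

# four-neighbour averaging stencil; mode='wrap' mimics np.roll's periodic wrap-around
kernel = np.array([[0.0 , 0.25, 0.0 ],
                   [0.25, 0.0 , 0.25],
                   [0.0 , 0.25, 0.0 ]])

phi = np.random.rand(300, 300)
phi2 = convolve(phi, kernel, mode='wrap')

# same result as the roll-based average from the question
rolled = 0.25 * (np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0) +
                 np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1))
assert np.allclose(phi2, rolled)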
In this particular case, we can avoid copying altogether by using numpy views and padding the original array in the beginning.
import numpy as np

def no_copy_roll(nx, ny):
    phi_padded = np.zeros((ny+2, nx+2))

    # these are views into different sub-arrays of phi_padded;
    # if two sub-arrays overlap, they share memory
    phi_north = phi_padded[:-2, 1:-1]
    phi_east = phi_padded[1:-1, 2:]
    phi_south = phi_padded[2:, 1:-1]
    phi_west = phi_padded[1:-1, :-2]
    phi = phi_padded[1:-1, 1:-1]

    do_me = np.zeros_like(phi, dtype='bool')
    do_me[1:-1, 1:-1] = True

    x0, y0, r0 = 40, 65, 12
    x = np.arange(nx, dtype='float')[None, :]
    y = np.arange(ny, dtype='float')[:, None]
    rsq = (x-x0)**2 + (y-y0)**2
    circle = rsq <= r0**2
    phi[circle] = 1.0
    do_me[circle] = False

    n, nper = 100, 100
    phi_hold = np.zeros((n+1, ny, nx))
    phi_hold[0] = phi
    for i in range(n):
        for j in range(nper):
            phi2 = 0.25*(phi_south +
                         phi_north +
                         phi_east +
                         phi_west)
            phi[do_me] = phi2[do_me]
        phi_hold[i+1] = phi
    return phi_hold
This will cut about 35% off the time for a simple benchmark like
from original import original_roll
from mwe import no_copy_roll
import numpy as np
nx, ny = (301, 301)
arr1 = original_roll(nx, ny)
arr2 = no_copy_roll(nx, ny)
assert np.allclose(arr1, arr2)
here is my profiling result
37.685 <module> timing.py:1
├─ 22.413 original_roll original.py:4
│ ├─ 15.056 [self]
│ └─ 7.357 roll <__array_function__ internals>:2
│ └─ 7.243 roll numpy\core\numeric.py:1110
│ [10 frames hidden] numpy
├─ 14.709 no_copy_roll mwe.py:4
└─ 0.393 allclose <__array_function__ internals>:2
└─ 0.393 allclose numpy\core\numeric.py:2091
[2 frames hidden] numpy
0.391 isclose <__array_function__ internals>:2
└─ 0.387 isclose numpy\core\numeric.py:2167
[4 frames hidden] numpy
For more elaborate stencils this approach still works, but it can get a bit unwieldy. In this case you can take a look at skimage.util.view_as_windows, which uses a variation of this trick (numpy stride tricks) to return an array that gives you cheap access to a window around each element. You will, however, have to do your own padding, and will need to take care to not create copies of the resulting array, which can get expensive fast.
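Here is a minimal sketch of that idea (the padding and indexing are mine, assuming scikit-image is installed): view_as_windows on a wrap-padded array gives a strided, copy-free 3x3 window around each element, from which the same four-neighbour average falls out.
import numpy as np
from skimage.util import view_as_windows

phi = np.random.rand(300, 300)
padded = np.pad(phi, 1, mode='wrap')       # do your own (periodic) padding

# windows[i, j] is a cheap strided 3x3 view centred on phi[i, j]; no copy is made
windows = view_as_windows(padded, (3, 3))  # shape (300, 300, 3, 3)

phi2 = 0.25 * (windows[..., 0, 1] + windows[..., 2, 1] +
               windows[..., 1, 0] + windows[..., 1, 2])

rolled = 0.25 * (np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0) +
                 np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1))
assert np.allclose(phi2, rolled)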

The fastest implementation I could get so far is based on the low-level implementation of scipy.ndimage.interpolation.shift that you already mentioned:
from scipy.ndimage.interpolation import _nd_image, _ni_support
cval = 0.0 # unused for mode `wrap`
mode = _ni_support._extend_mode_to_code('wrap')
_nd_image.zoom_shift(data, None, shift, data, 0, mode, cval) # in-place update
Precomputing mode, cval and shift and calling the low-level zoom_shift method directly got me about a 5x speedup over calling shift, and about a 10x speedup over np.roll.


Show class probabilities from Numpy array

I've had a look through and I don't think Stack Overflow has an answer for this; I'm fairly new at this, though, so any help is appreciated.
I'm using an AWS SageMaker endpoint that returns a PNG mask, and I'm trying to display the overall probability of each class.
So my first stab does this:
np.set_printoptions(threshold=np.inf)
pred_map = np.argmax(mask, axis=0)
non_zero_mask = pred_map[pred_map != 0]  # get everything but background
# print(np.bincount(pred_map[pred_map != 0]).argmax())  # ignore this line: it just shows the most probable class
num_classes = 6
plt.imshow(pred_map, vmin=0, vmax=num_classes-1, cmap='jet')
plt.show()
As you can see, I'm removing the background pixels. Now I need to show that classes 1, 2, 3, 4 and 5 each have X probability based on the number of pixels they occupy. I'm unsure whether I'd be reinventing the wheel by simply taking the total number of elements from the original mask and then looping and counting each pixel/class number, etc. - are there built-in methods for this, please?
Update:
So after typing this out I had a little think, reworded some of my searches, and came across this:
unique_elements, counts_elements = np.unique(pred_map[pred_map != 0], return_counts=True)
print(np.asarray((unique_elements, counts_elements)))
#[[ 2 3]
#[87430 2131]]
So then I'd just calculate the percentage based on this, or is there a better way? For example I'd do
87430 / 89561 (the total number of non-background pixels) * 100
giving class 2 a roughly 97% probability in this case.
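For reference, a minimal sketch of that calculation vectorised over all classes at once (the pred_map below is a random stand-in for the real mask):
import numpy as np

# stand-in prediction map; in practice this is np.argmax(mask, axis=0)
pred_map = np.random.randint(0, 6, size=(256, 256))

labels, counts = np.unique(pred_map[pred_map != 0], return_counts=True)
percentages = 100.0 * counts / counts.sum()   # share of each non-background class
for label, pct in zip(labels, percentages):
    print(f"class {label}: {pct:.1f}%")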
Update for Joe's comment below:
rec = Record()
recordio = mx.recordio.MXRecordIO(results_file, 'r')
protobuf = rec.ParseFromString(recordio.read())
values = list(rec.features["target"].float32_tensor.values)
shape = list(rec.features["shape"].int32_tensor.values)
shape = np.squeeze(shape)
mask = np.reshape(np.array(values), shape)
mask = np.squeeze(mask, axis=0)
My first thought was to use np.digitize and write a nice solution.
But then I realized how you can hack it in 10 lines:
import numpy as np
import matplotlib.pyplot as plt
size = (10, 10)
x = np.random.randint(0, 7, size) # your classes, seven excluded.
# empty array, filled with mask and number of occurrences.
x_filled = np.zeros_like(x)
for i in range(1, 7):
    mask = x == i
    count_mask = np.count_nonzero(mask)
    x_filled[mask] = count_mask
print(x_filled)
plt.imshow(x_filled)
plt.colorbar()
plt.show()
I am not sure about the axis convention with imshow
at the moment, you might have to flip the y axis so up is up.
SageMaker does not provide in-built methods for this.

random.multivariate_normal on a dask array?

I've been struggling to find a way to do this calculation in a dask workflow.
I have code that uses the np.random.multivariate_normal function, and while many of these routines are available in dask.array, it seems this one is not. So I attempted to create my own based on an example provided in the dask documentation.
Here is my attempt, which is giving errors that I am having difficulty understanding. I have also provided random input variables to make it easy to replicate:
import numpy as np
from dask.distributed import Client
import dask.array as da

def mvn(mu, sigma, n, blocksize):
    chunks = ((blocksize,) * (n // blocksize),
              (blocksize,) * (n // blocksize))
    name = 'mvn'  # unique identifier
    dsk = {(name, i, j): (np.random.multivariate_normal(mu, sigma, blocksize))
                         if i == j else
                         (np.zeros, (blocksize, blocksize))
           for i in range(n // blocksize)
           for j in range(n // blocksize)}
    dtype = np.random.multivariate_normal(0).dtype  # take dtype default from numpy
    return da.Array(dsk, name, chunks, dtype)

n = 10000
A = da.random.normal(0, 1, size=(n, n), chunks=(1000, 1000))
sigma = da.dot(A, A.transpose())
mu = 4.0 * da.ones(n, chunks=1000)
R = da.numpy.random.mvn(mu, sigma, n, chunks=(100))
Any suggestions or am I so far off the mark here that I should abandon all hope? Thanks!
If you have a cluster to run this on, you can use my answer from this post, copied here for reference:
A workaround for now is to use a Cholesky decomposition. Note that any covariance matrix C can be expressed as C = G*G'. It then follows that x = G'*y is correlated as specified in C if y is standard normal (see this excellent post on Mathematics StackExchange). In code:
Numpy
n_dim = 4
size = 100000
A = np.random.randn(n_dim, n_dim)
covm = A.dot(A.T)
x = np.random.multivariate_normal(size=size, mean=np.zeros(len(covm)), cov=covm)
## verify numpy's covariance is correct
np.cov(x, rowvar=False)
covm
Dask
## create covariance matrix
A = da.random.standard_normal(size=(n_dim, n_dim), chunks=(2, 2))
covm = A.dot(A.T)
## get cholesky decomp
L = da.linalg.cholesky(covm, lower=True)
## draw standard normal
sn = da.random.standard_normal(size=(size, n_dim), chunks=(100, 100))
## correct for correlation
x = L.dot(sn.T)
x.shape
## verify
covm.compute()
da.cov(x, rowvar=True).compute()
This answer can be fleshed out, but I imagine you would have an easier time using dask's delayed, da.from_delayed and da.*stack.
One immediate problem I see with what you have: with np.random.multivariate_normal(mu,sigma, blocksize) you are directly calling the function, instead of making the spec. You probably wanted (np.random.multivariate_normal, mu,sigma, blocksize). This shows that working with raw dask dictionaries can be tricky!
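To make that concrete, here is a minimal sketch (my own, with made-up sizes) of the dask.delayed / da.from_delayed route, assuming mu and cov are small in-memory NumPy arrays:
import numpy as np
import dask
import dask.array as da

def mvn_blocks(mu, cov, n, blocksize):
    # draw n samples of a len(mu)-dimensional MVN in row blocks of `blocksize`
    d = len(mu)
    blocks = []
    for start in range(0, n, blocksize):
        rows = min(blocksize, n - start)
        sample = dask.delayed(np.random.multivariate_normal)(mu, cov, rows)
        blocks.append(da.from_delayed(sample, shape=(rows, d), dtype=float))
    return da.concatenate(blocks, axis=0)

mu = np.zeros(3)
cov = np.eye(3)
x = mvn_blocks(mu, cov, n=10000, blocksize=1000)
print(x.shape)                    # (10000, 3)
print(x.mean(axis=0).compute())   # roughly zero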

Scipy Butter bandpass is not producing the desired results

So I'm trying to bandpass filter a WAV PCM 24-bit 44.1 kHz file. What I would like to do is bandpass each frequency from 0 Hz to 22 kHz.
So far I have loaded the data and can display it on Matplot and it looks like the following.
But when I go to apply the bandpass filter which I got from here
http://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html
I get the following result:
So I'm trying to bandpass at 100-101Hz as a test, here is my code:
from WaveData import WaveData
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter, freqz
from scipy.io.wavfile import read
import numpy as np

class Filter:
    def __init__(self, wav):
        self.waveData = WaveData(wav)

    def butter_bandpass(self, lowcut, highcut, fs, order=5):
        nyq = 0.5 * fs
        low = lowcut / nyq
        high = highcut / nyq
        b, a = butter(order, [low, high], btype='band')
        return b, a

    def butter_bandpass_filter(self, data, lowcut, highcut, fs, order):
        b, a = self.butter_bandpass(lowcut, highcut, fs, order=order)
        y = lfilter(b, a, data)
        return y

    def getFilteredSignal(self, freq):
        return self.butter_bandpass_filter(data=self.waveData.file['Data'], lowcut=100, highcut=101, fs=44100, order=3)

    def getUnprocessedData(self):
        return self.waveData.file['Data']

    def plot(self, signalA, signalB=None):
        plt.plot(signalA)
        if signalB is not None:
            plt.plot(signalB)
        plt.show()

if __name__ == "__main__":
    # file = WaveData("kick.wav")
    # fileA = read("kick0.wav")
    f = Filter("kick.wav")
    a, b = f.butter_bandpass(lowcut=100, highcut=101, fs=44100)
    w, h = freqz(b, a, worN=22000)  # Filtered signal is not working?
    f.plot(h, w)
    print("break")
I don't understand where I have gone wrong.
Thanks
What @WoodyDev said is true: 1 Hz out of 44.1 kHz is far too narrow a passband for any kind of filter. Just look at the filter coefficients butter returns:
In [3]: butter(5, [100/(44.1e3/2), 101/(44.1e3/2)], btype='band')
Out[3]:
(array([ 1.83424060e-21, 0.00000000e+00, -9.17120299e-21, 0.00000000e+00,
1.83424060e-20, 0.00000000e+00, -1.83424060e-20, 0.00000000e+00,
9.17120299e-21, 0.00000000e+00, -1.83424060e-21]),
array([ 1. , -9.99851389, 44.98765092, -119.95470631,
209.90388506, -251.87018009, 209.88453023, -119.93258575,
44.9752074 , -9.99482662, 0.99953904]))
Look at the b coefficients (the first array): their values are around 1e-20, meaning the filter design totally failed to converge, and if you apply it to any signal the output will be zero, which is what you found.
You didn't mention your application but if you really really want to keep the signal's frequency content between 100 and 101 Hz, you could take a zero-padded FFT of the signal, zero out the portions of the spectrum outside that band, and IFFT (look at rfft, irfft, and rfftfreq in numpy.fft module).
Here's a function that applies a brick-wall bandpass filter in the Fourier domain using FFTs:
import numpy.fft as fft
import numpy as np

def fftBandpass(x, low, high, fs=1.0):
    """
    Apply a bandpass filter via FFTs.

    Parameters
    ----------
    x : array_like
        Input signal vector. Assumed to be real-only.
    low : float
        Lower bound of the passband in Hertz. (If less than or equal
        to zero, a high-pass filter is applied.)
    high : float
        Upper bound of the passband, Hertz.
    fs : float
        Sample rate in units of samples per second. If `high > fs / 2`,
        the output is low-pass filtered.

    Returns
    -------
    y : ndarray
        Output signal vector with all frequencies outside the `[low, high]`
        passband zeroed.

    Caveat
    ------
    Note that the energy in `y` will be lower than the energy in `x`, i.e.,
    `sum(abs(y)) < sum(abs(x))`.
    """
    xf = fft.rfft(x)
    f = fft.rfftfreq(len(x), d=1 / fs)
    xf[f < low] = 0
    xf[f > high] = 0
    return fft.irfft(xf, len(x))

if __name__ == '__main__':
    fs = 44.1e3
    N = int(fs)
    x = np.random.randn(N)
    t = np.arange(N) / fs

    import pylab as plt
    plt.figure()
    plt.plot(t, x, t, 100 * fftBandpass(x, 100, 101, fs=fs))
    plt.xlabel('time (seconds)')
    plt.ylabel('signal')
    plt.legend(['original', 'scaled bandpassed'])
    plt.show()
You can put this in a file, fftBandpass.py, and just run it with python fftBandpass.py to see it create the following plot:
Note I had to scale the 1 Hz bandpassed signal by 100 because, after bandpassing that much, there's very little energy in the signal. Also note that the signal living inside this small a passband is pretty much just a sinusoid at around 100 Hz.
If you put the following in your own code: from fftBandpass import fftBandpass, you can use the fftBandpass function.
Another thing you could try is to decimate the signal 100x, so convert it to a signal that was sampled at 441 Hz. 1 Hz out of 441 Hz is still a crazy-narrow passband but you might have better luck than trying to bandpass the original signal. See scipy.signal.decimate, but don't try and call it with q=100, instead recursively decimate the signal, by 2, then 2, then 5, then 5 (for total decimation of 100x).
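Here is a minimal sketch of that staged decimation (the input x is just placeholder noise; scipy.signal.decimate applies an anti-aliasing filter at each stage):
import numpy as np
from scipy.signal import decimate

fs = 44100
x = np.random.randn(10 * fs)    # placeholder: ten seconds of noise

# decimate by 100x in stages (2 * 2 * 5 * 5) rather than a single q=100 call
y = x
for q in (2, 2, 5, 5):
    y = decimate(y, q)

fs_new = fs / 100               # 441 Hz after total decimation of 100x
print(len(x), len(y), fs_new)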
So there are some problems with your code which mean you aren't plotting the results correctly, although I believe this isn't your main problem.
Check your code
In the example you linked, they show precisely the process for calculating, and plotting the filter at different orders:
for order in [3, 6, 9]:
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    w, h = freqz(b, a, worN=2000)
    plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)
You are currently not scaling your frequency axis correctly, nor taking the absolute value of h to get the magnitude information, as the correct code above does.
Check your theory
However, your main issue is such a steep bandpass (only 100 Hz to 101 Hz). It is very rare to see a filter this sharp, as it is very processing intensive (it requires a lot of filter coefficients), and because you are only keeping a 1 Hz range it will remove essentially all other frequency content.
So the graph you have shown with the gain at 0 may very well be correct. If you use their example and change the bandpass cutoff frequencies to 100 Hz -> 101 Hz, the output is an array of (almost if not completely) zeros. This is because it only keeps the energy of the signal in a 1 Hz range, which will be very, very small if you think about it.
If you are doing this for analysis, the frequency spacing tends to be much larger, i.e. octave bands (or smaller divisions of octave bands).
The Spectrogram
As I am not sure of your end purpose I cannot clarify exactly which route you should take to get there. However, using bandpass filters on every single frequency up to 20kHz seems kind of silly in this day and age.
If I remember correctly, some of the first spectrogram attempts with needles on paper used this technique with analog band pass filter banks to analyze the frequency content. So this makes me think you may be looking for something to do with a spectrogram? It lets you analyze the whole signal's frequency information vs time and still has all of the signal's amplitude information. Python already has spectrogram functionality included as part of scipy or Matplotlib.
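If a spectrogram is indeed what you're after, here is a minimal sketch using scipy.signal.spectrogram on a synthetic signal (a 100 Hz tone in noise; the parameters are only illustrative):
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 44100
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.random.randn(len(t))  # 100 Hz tone + noise

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=4096)
plt.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12))   # power in dB vs time and frequency
plt.xlabel('time (s)')
plt.ylabel('frequency (Hz)')
plt.show()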

NumPy vectorization with integration

I have a vector ws and wish to make another vector of the same length whose k-th component is the integral from -1 to 1 of f(x, ws[k]) * log( sum_j f(x, ws[j]) ) dx, with f as defined in the code below.
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad

ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)

def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w

def temp(x): return np.array([f(x, w) for w in ws]).sum()

def integrand(x, w): return f(x, w) * np.log(temp(x))

## Python for loop
for k in range(n):
    integrals[k] = quad(integrand, -1, 1, args=ws[k])[0]

## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args=ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These can be directly applied to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace). You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz is the simplest and least precise
scipy.integrate.simps is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too).
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data) but it requires the number of samples to be 2**n+1 for some integer n.
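For instance, here is a minimal sketch of this approach for the example above (the grid choice is mine; the question's f is rewritten in array form with np.where):
import numpy as np

np.random.seed(0)
ws = 2 * np.random.random(10) - 1

x = np.linspace(-1, 1, 2001)                       # fixed sample grid
fxw = np.where(ws[:, None] < 0,
               np.abs(x * ws[:, None]),            # w < 0 branch
               np.exp(x) * ws[:, None])            # w >= 0 branch
integrand = fxw * np.log(fxw.sum(axis=0))          # shape (len(ws), len(x))
integrals = np.trapz(integrand, x, axis=1)         # one value per w
print(integrals)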
@zaq's answer focusing on quad is spot on. So I'll look at some other aspects of the problem.
In a recent answer (https://stackoverflow.com/a/41205930/901925) I argue that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs. But you are only iterating on one array, ws. The x that is passed on to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
cython improves low level iteration, stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and on down) with cython, but first focus on getting the most speed from that with regular numpy code.
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
With if w<0 w must be scalar. Can it be written so it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
np.random.seed(0)
ws = 2 * np.random.random(10) - 1
def f(x):
    out = np.empty((len(ws), *x.shape))
    out0 = np.abs(np.multiply.outer(ws, x))
    out1 = np.multiply.outer(ws, np.exp(x))
    out[ws < 0] = out0[ws < 0]
    out[ws >= 0] = out1[ws >= 0]
    return out

def integrand(x):
    return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
print(val)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]

Inconsistent interface of Pandas Series; yielding access to underlying data

While the new Categorical Series support since pandas 0.15.0 is fantastic, I'm a bit annoyed with how they decided to make the underlying data inaccessible except through underscored variables. Consider the following code:
import numpy as np
import pandas as pd
x = np.empty(3, dtype=np.int64)
s = pd.DatetimeIndex(x, tz='UTC')
x
Out[17]: array([140556737562568, 55872352, 32])
s[0]
Out[18]: Timestamp('1970-01-02 15:02:36.737562568+0000', tz='UTC')
x[0] = 0
s[0]
Out[20]: Timestamp('1970-01-01 00:00:00+0000', tz='UTC')
y = s.values
y[0] = 5
x[0]
Out[23]: 5
s[0]
Out[24]: Timestamp('1970-01-01 00:00:00.000000005+0000', tz='UTC')
We can see that, both at construction time and when asked for its underlying values, this DatetimeIndex makes no deep copies of its underlying data. Not only is this potentially useful in terms of efficiency, it's great if you are using a DataFrame as a buffer: you can easily get the numpy primitive containing the underlying data, and from there a pointer to the raw data, which some low-level C routine can use to copy into from some block of memory.
Now let's look at the behavior of the new Categorical Series. The underlying data, of course, is not the levels but the codes.
x2 = np.zeros(3, dtype=np.int64)
s2 = pd.Categorical.from_codes(x2, ["hello", "bye"])
s2
Out[27]:
[hello, hello, hello]
Categories (2, object): [hello, bye]
x2[0] = 1
s2[0]
Out[29]: 'hello'
y2 = s2.codes
y2[0] = 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-0366d645c98d> in <module>()
----> 1 y2[0] = 1
ValueError: assignment destination is read-only
y2 = s2._codes
y2[0] = 1
s2[0]
Out[34]: 'bye'
The net effect of this behavior is that as a developer, efficient manipulation of the underlying data for Categoricals is not part of the interface. Also as a user, the from_codes constructor is slow as it deep copies the codes, which may often be unnecessary. There should at least be an option for this.
But the fact that codes is a read only variable and _codes needs to be used strikes me as worse. Why wouldn't .codes give the same behavior as .values? Is there some justification for this beyond the concept that the codes are "private"? I'm hoping some of the pandas gurus on stackoverflow can shed some light on this.
The Categorical type is different from almost all other types in that it is a compound type that comes with a certain guarantee about its data: namely, that the codes provide a factorization of the levels.
So the argument against mutability is that it would be easy to break the codes-categories mapping, and it could be non-performant. Of course these could possibly be mitigated with checking on the setitem instead (but with some added code complexity).
The vast majority of users are not going to manipulate the codes/categories directly (and will only use the exposed methods), so this is really a protection against accidentally breaking these guarantees.
If you need to efficiently manipulate the underlying data, best/easiest is simply to pull out the codes/categories. Mutate them, then create a new Categorical (which is cheap if codes/categories are already provided).
e.g.
In [3]: s2 = pd.Categorical.from_codes(x2, ["hello", "bye"])
In [4]: s2
Out[4]:
[hello, hello, hello]
Categories (2, object): [hello, bye]
In [5]: s2.codes
Out[5]: array([0, 0, 0], dtype=int8)
In [6]: pd.Categorical(s2.codes+1,s2.categories,fastpath=True)
Out[6]:
[bye, bye, bye]
Categories (2, object): [hello, bye]
Of course this is quite dangerous: if you added 2 instead, the expression would blow up, since code 2 is out of range for the two categories. Manipulating the codes directly is simply buyer-beware.
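For completeness, a minimal sketch of the same manipulation through the public from_codes constructor, which validates the codes (at the cost of a copy) instead of taking the fastpath:
import numpy as np
import pandas as pd

x2 = np.zeros(3, dtype=np.int64)
s2 = pd.Categorical.from_codes(x2, ["hello", "bye"])

# from_codes checks that every code lies within range of the categories,
# so an out-of-range shift raises instead of silently corrupting the data.
s3 = pd.Categorical.from_codes(s2.codes + 1, s2.categories)
print(s3)   # [bye, bye, bye], Categories (2, object): [hello, bye]

# pd.Categorical.from_codes(s2.codes + 2, s2.categories)  # -> ValueError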