Lets say I have a Python Numpy array a.
a = numpy.array([1,2,3,4,5,6,7,8,9,10,11])
I want to create a matrix of sub sequences from this array of length 5 with stride 3. The results matrix hence will look as follows:
numpy.array([[1,2,3,4,5],[4,5,6,7,8],[7,8,9,10,11]])
One possible way of implementing this would be using a for-loop.
result_matrix = np.zeros((3, 5))
for i in range(0, len(a), 3):
result_matrix[i] = a[i:i+5]
Is there a cleaner way to implement this in Numpy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S ): # Window len = L, Stride len/stepsize = S
nrows = ((a.size-L)//S)+1
return a[S*np.arange(nrows)[:,None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S ): # Window len = L, Stride len/stepsize = S
nrows = ((a.size-L)//S)+1
n = a.strides[0]
return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
Starting in Numpy 1.20, we can make use of the new sliding_window_view to slide/roll over windows of elements.
And coupled with a stepping [::3], it simply becomes:
from numpy.lib.stride_tricks import sliding_window_view
# values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape = 5)[::3]
# array([[ 1, 2, 3, 4, 5],
# [ 4, 5, 6, 7, 8],
# [ 7, 8, 9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape = 5)
# array([[ 1, 2, 3, 4, 5],
# [ 2, 3, 4, 5, 6],
# [ 3, 4, 5, 6, 7],
# [ 4, 5, 6, 7, 8],
# [ 5, 6, 7, 8, 9],
# [ 6, 7, 8, 9, 10],
# [ 7, 8, 9, 10, 11]])
Modified version of #Divakar's code with checking to ensure that memory is contiguous and that the returned array cannot be modified. (Variable names changed for my DSP application).
def frame(a, framelen, frameadv):
"""frame - Frame a 1D array
a - 1D array
framelen - Samples per frame
frameadv - Samples between starts of consecutive frames
Set to framelen for non-overlaping consecutive frames
Modified from Divakar's 10/17/16 11:20 solution:
https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize
CAVEATS:
Assumes array is contiguous
Output is not writable as there are multiple views on the same memory
"""
if not isinstance(a, np.ndarray) or \
not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
raise ValueError("Input array a must be a contiguous numpy array")
# Output
nrows = ((a.size-framelen)//frameadv)+1
oshape = (nrows, framelen)
# Size of each element in a
n = a.strides[0]
# Indexing in the new object will advance by frameadv * element size
ostrides = (frameadv*n, n)
return np.lib.stride_tricks.as_strided(a, shape=oshape,
strides=ostrides, writeable=False)
I have 3D numpy array, for example, like this:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]]])
Is there a way to index it in such a way that I select, for example, top right corner of 2x2 elements in the first plane, and a center 2x2 elements subarray from the second plane? So that I could then zero out the elements 2,3,6,7,21,22,25,26:
array([[[ 0, 1, 0, 0],
[ 4, 5, 0, 0],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 0, 0, 23],
[24, 0, 0, 27],
[28, 29, 30, 31]]])
I have a batch of images, and I need to zero out a small window of fixed size, but at different (random) locations for each image in the batch. The first dimension is number of images.
Something like this:
a[:, x: x+2, y: y+2] = 0
where x and y are vectors which have different values for each first dimension of a.
Approach #1 : Here'e one approach that's mostly based on linear-indexing -
def random_block_fill_lidx(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Get all possible starting linear indices for each 2D slice
possible_start_lidx = (np.arange(n-N+1)[:,None]*r + range(r-N+1)).ravel()
# Get random start indices from all possible ones for all 2D slices
start_lidx = np.random.choice(possible_start_lidx, m)
# Get linear indices for the block of (N,N)
offset_arr = (a.shape[-1]*np.arange(N)[:,None] + range(N)).ravel()
# Add in those random start indices with the offset array
idx = start_lidx[:,None] + offset_arr
# On a 2D view of the input array, use advance-indexing to set fillval.
a.reshape(m,-1)[np.arange(m)[:,None], idx] = fillval
return a
Approach #2 : Here's another and possibly more efficient one (for large 2D slices) using advanced-indexing -
def random_block_fill_adv(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Generate random start indices for second and third axes keeping proper
# distance from the boundaries for the block to be accomodated within.
idx0 = np.random.randint(0,n-N+1,m)
idx1 = np.random.randint(0,r-N+1,m)
# Setup indices for advanced-indexing.
# First axis indices would be simply the range array to select one per elem.
# We need to extend this to 3D so that the latter dim indices could be aligned.
dim0 = np.arange(m)[:,None,None]
# Second axis indices would idx0 with broadcasted additon of blocksized
# range array to cover all block indices along this axis. Repeat for third.
dim1 = idx0[:,None,None] + np.arange(N)[:,None]
dim2 = idx1[:,None,None] + range(N)
a[dim0, dim1, dim2] = fillval
return a
Approach #3 : With the old-trusty loop -
def random_block_fill_loopy(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Generate random start indices for second and third axes keeping proper
# distance from the boundaries for the block to be accomodated within.
idx0 = np.random.randint(0,n-N+1,m)
idx1 = np.random.randint(0,r-N+1,m)
# Iterate through first and use slicing to assign fillval.
for i in range(m):
a[i, idx0[i]:idx0[i]+N, idx1[i]:idx1[i]+N] = fillval
return a
Sample run -
In [357]: a = np.arange(2*4*7).reshape(2,4,7)
In [358]: a
Out[358]:
array([[[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27]],
[[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48],
[49, 50, 51, 52, 53, 54, 55]]])
In [359]: random_block_fill_adv(a, N=3, fillval=0)
Out[359]:
array([[[ 0, 0, 0, 0, 4, 5, 6],
[ 7, 0, 0, 0, 11, 12, 13],
[14, 0, 0, 0, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27]],
[[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 0, 0, 0],
[42, 43, 44, 45, 0, 0, 0],
[49, 50, 51, 52, 0, 0, 0]]])
Fun stuff : Being in-place filling, if we keep running random_block_fill_adv(a, N=3, fillval=0), we will eventually end up with all zeros a. Thus, also verifying the code.
Runtime test
In [579]: a = np.random.randint(0,9,(10000,4,4))
In [580]: %timeit random_block_fill_lidx(a, N=2, fillval=0)
...: %timeit random_block_fill_adv(a, N=2, fillval=0)
...: %timeit random_block_fill_loopy(a, N=2, fillval=0)
...:
1000 loops, best of 3: 545 µs per loop
1000 loops, best of 3: 891 µs per loop
100 loops, best of 3: 10.6 ms per loop
In [581]: a = np.random.randint(0,9,(1000,40,40))
In [582]: %timeit random_block_fill_lidx(a, N=10, fillval=0)
...: %timeit random_block_fill_adv(a, N=10, fillval=0)
...: %timeit random_block_fill_loopy(a, N=10, fillval=0)
...:
1000 loops, best of 3: 739 µs per loop
1000 loops, best of 3: 671 µs per loop
1000 loops, best of 3: 1.27 ms per loop
So, which one to choose depends on the first axis length and blocksize.
I have written the following function to calculate a check digit in R.
verhoeffCheck <- function(x)
{
## calculates check digit based on Verhoeff algorithm
## note that due to the way strsplit works, to call for vector x, use sapply(x,verhoeffCheck)
## check for string since leading zeros with numbers will be lost
if (class(x)!="character"){stop("Must enter a string")}
#split and convert to numbers
digs <- strsplit(x,"")[[1]]
digs <- as.numeric(digs)
digs <- rev(digs) ## right to left algorithm
## tables required for D_5 group
d5_mult <- matrix(c(
0:9,
c(1:4,0,6:9,5),
c(2:4,0:1,7:9,5:6),
c(3:4,0:2,8:9,5:7),
c(4,0:3,9,5:8),
c(5,9:6,0,4:1),
c(6:5,9:7,1:0,4:2),
c(7:5,9:8,2:0,4:3),
c(8:5,9,3:0,4),
9:0
),10,10,byrow=T)
d5_perm <- matrix(c(
0:9,
c(1,5,7,6,2,8,3,0,9,4),
c(5,8,0,3,7,9,6,1,4,2),
c(8,9,1,6,0,4,3,5,2,7),
c(9,4,5,3,1,2,6,8,7,0),
c(4,2,8,6,5,7,3,9,0,1),
c(2,7,9,3,8,0,6,4,1,5),
c(7,0,4,6,9,1,3,2,5,8)
),8,10,byrow=T)
d5_inv <- c(0,4:1,5:9)
## apply algoritm - note 1-based indexing in R
d <- 0
for (i in 1:length(digs)){
d <- d5_mult[d+1,(d5_perm[(i%%8)+1,digs[i]+1])+1]
}
d5_inv[d+1]
}
In order to run on a vector of strings, sapply must be used. This is in part because of the use of strsplit, which returns a list of vectors. This does impact on the performance even for only moderately sized inputs.
How could this function be vectorized?
I am also aware that some performance is lost in having to create the tables in each iteration. Would storing these in a new environment be a better solution?
We begin by defining the lookup matrices. I've laid them out in a way
that should make them easier to check against a reference, e.g.
http://en.wikipedia.org/wiki/Verhoeff_algorithm.
d5_mult <- matrix(as.integer(c(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
1, 2, 3, 4, 0, 6, 7, 8, 9, 5,
2, 3, 4, 0, 1, 7, 8, 9, 5, 6,
3, 4, 0, 1, 2, 8, 9, 5, 6, 7,
4, 0, 1, 2, 3, 9, 5, 6, 7, 8,
5, 9, 8, 7, 6, 0, 4, 3, 2, 1,
6, 5, 9, 8, 7, 1, 0, 4, 3, 2,
7, 6, 5, 9, 8, 2, 1, 0, 4, 3,
8, 7, 6, 5, 9, 3, 2, 1, 0, 4,
9, 8, 7, 6, 5, 4, 3, 2, 1, 0
)), ncol = 10, byrow = TRUE)
d5_perm <- matrix(as.integer(c(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
1, 5, 7, 6, 2, 8, 3, 0, 9, 4,
5, 8, 0, 3, 7, 9, 6, 1, 4, 2,
8, 9, 1, 6, 0, 4, 3, 5, 2, 7,
9, 4, 5, 3, 1, 2, 6, 8, 7, 0,
4, 2, 8, 6, 5, 7, 3, 9, 0, 1,
2, 7, 9, 3, 8, 0, 6, 4, 1, 5,
7, 0, 4, 6, 9, 1, 3, 2, 5, 8
)), ncol = 10, byrow = TRUE)
d5_inv <- as.integer(c(0, 4, 3, 2, 1, 5, 6, 7, 8, 9))
Next, we'll define the check function, and try it out with a test input.
I've followed the derivation in wikipedia as closely as possible.
p <- function(i, n_i) {
d5_perm[(i %% 8) + 1, n_i + 1] + 1
}
d <- function(c, p) {
d5_mult[c + 1, p]
}
verhoeff <- function(x) {
#split and convert to numbers
digs <- strsplit(as.character(x), "")[[1]]
digs <- as.numeric(digs)
digs <- rev(digs) ## right to left algorithm
## apply algoritm - note 1-based indexing in R
c <- 0
for (i in 1:length(digs)) {
c <- d(c, p(i, digs[i]))
}
d5_inv[c + 1]
}
verhoeff(142857)
## [1] 0
This function is fundamentally iterative, as each iteration depends on
the value of the previous. This means that we're unlikely to be able to
vectorise in R, so if we want to vectorise, we'll need to use Rcpp.
However, before we turn to that, it's worth exploring if we can do the
initial split faster. First we do a little microbenchmark to see if it's
worth bothering:
library(microbenchmark)
digits <- function(x) {
digs <- strsplit(as.character(x), "")[[1]]
digs <- as.numeric(digs)
rev(digs)
}
microbenchmark(
digits(142857),
verhoeff(142857)
)
## Unit: microseconds
## expr min lq median uq max neval
## digits(142857) 11.30 12.01 12.43 12.85 28.79 100
## verhoeff(142857) 32.24 33.81 34.66 35.47 95.85 100
It looks like it! On my computer, verhoeff_prepare() accounts for
about 50% of the run time. A little searching on stackoverflow reveals
another approach to turning a number into
digits:
digits2 <- function(x) {
n <- floor(log10(x))
x %/% 10^(0:n) %% 10
}
digits2(12345)
## [1] 5 4 3 2 1
microbenchmark(
digits(142857),
digits2(142857)
)
## Unit: microseconds
## expr min lq median uq max neval
## digits(142857) 11.495 12.102 12.468 12.834 79.60 100
## digits2(142857) 2.322 2.784 3.358 3.561 13.69 100
digits2() is a lot faster than digits() but it has limited impact on
the whole runtime.
verhoeff2 <- function(x) {
digs <- digits2(x)
c <- 0
for (i in 1:length(digs)) {
c <- d(c, p(i, digs[i]))
}
d5_inv[c + 1]
}
verhoeff2(142857)
## [1] 0
microbenchmark(
verhoeff(142857),
verhoeff2(142857)
)
## Unit: microseconds
## expr min lq median uq max neval
## verhoeff(142857) 33.06 34.49 35.19 35.92 73.38 100
## verhoeff2(142857) 20.98 22.58 24.05 25.28 48.69 100
To make it even faster we could try C++.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int verhoeff3_c(IntegerVector digits, IntegerMatrix mult, IntegerMatrix perm,
IntegerVector inv) {
int n = digits.size();
int c = 0;
for(int i = 0; i < n; ++i) {
int p = perm(i % 8, digits[i]);
c = mult(c, p);
}
return inv[c];
}
verhoeff3 <- function(x) {
verhoeff3_c(digits(x), d5_mult, d5_perm, d5_inv)
}
verhoeff3(142857)
## [1] 3
microbenchmark(
verhoeff2(142857),
verhoeff3(142857)
)
## Unit: microseconds
## expr min lq median uq max neval
## verhoeff2(142857) 21.00 22.85 25.53 27.11 63.71 100
## verhoeff3(142857) 16.75 17.99 18.87 19.64 79.54 100
That doesn't yield much of an improvement. Maybe we can do better if we
pass the number to C++ and process the digits in a loop:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int verhoeff4_c(int number, IntegerMatrix mult, IntegerMatrix perm,
IntegerVector inv) {
int c = 0;
int i = 0;
for (int i = 0; number > 0; ++i, number /= 10) {
int p = perm(i % 8, number % 10);
c = mult(c, p);
}
return inv[c];
}
verhoeff4 <- function(x) {
verhoeff4_c(x, d5_mult, d5_perm, d5_inv)
}
verhoeff4(142857)
## [1] 3
microbenchmark(
verhoeff2(142857),
verhoeff3(142857),
verhoeff4(142857)
)
## Unit: microseconds
## expr min lq median uq max neval
## verhoeff2(142857) 21.808 24.910 26.838 27.797 64.22 100
## verhoeff3(142857) 17.699 18.742 19.599 20.764 81.67 100
## verhoeff4(142857) 3.143 3.797 4.095 4.396 13.21 100
And we get a pay off: verhoeff4() is about 5 times faster than
verhoeff2().
If your input strings can contain different numbers of characters, then I don't see any way round lapply calls (or a plyr equivalent). The trick is to move them inside the function, so verhoeffCheck can accept vector inputs. This way you only need to create the matrices once.
verhoeffCheckNew <- function(x)
{
## calculates check digit based on Verhoeff algorithm
## check for string since leading zeros with numbers will be lost
if (!is.character(x)) stop("Must enter a string")
#split and convert to numbers
digs <- strsplit(x, "")
digs <- lapply(digs, function(x) rev(as.numeric(x)))
## tables required for D_5 group
d5_mult <- matrix(c(
0:9,
c(1:4,0,6:9,5),
c(2:4,0:1,7:9,5:6),
c(3:4,0:2,8:9,5:7),
c(4,0:3,9,5:8),
c(5,9:6,0,4:1),
c(6:5,9:7,1:0,4:2),
c(7:5,9:8,2:0,4:3),
c(8:5,9,3:0,4),
9:0
),10,10,byrow=T)
d5_perm <- matrix(c(
0:9,
c(1,5,7,6,2,8,3,0,9,4),
c(5,8,0,3,7,9,6,1,4,2),
c(8,9,1,6,0,4,3,5,2,7),
c(9,4,5,3,1,2,6,8,7,0),
c(4,2,8,6,5,7,3,9,0,1),
c(2,7,9,3,8,0,6,4,1,5),
c(7,0,4,6,9,1,3,2,5,8)
),8,10,byrow=T)
d5_inv <- c(0,4:1,5:9)
## apply algorithm - note 1-based indexing in R
sapply(digs, function(x)
{
d <- 0
for (i in 1:length(x)){
d <- d5_mult[d + 1, (d5_perm[(i %% 8) + 1, x[i] + 1]) + 1]
}
d5_inv[d+1]
})
}
Since d depends on what it was previously, the is no easy way to vectorise the for loop.
My version runs in about half the time for 1e5 strings.
rand_string <- function(n = 12)
{
paste(sample(as.character(0:9), sample(n), replace = TRUE), collapse = "")
}
big_test <- replicate(1e5, rand_string())
tic()
res1 <- unname(sapply(big_test, verhoeffCheck))
toc()
tic()
res2 <- verhoeffCheckNew(big_test)
toc()
identical(res1, res2) #hopefully TRUE!
See this question for tic and toc.
Further thoughts:
You may want additional input checking for "" and other strings that return NA when converted in numeric.
Since you are dealing exclusively with integers, you may get a slight performance benefit from using them rather than doubles. (Use as.integer rather than as.numeric and append L to the values in your matrices.)
Richie C answered the vectorisation question nicely; as for only creatig the tables once without cluttering the global name space, one quick solution that does not require a package is
verhoeffCheck <- local(function(x)
{
## calculates check digit based on Verhoeff algorithm
## note that due to the way strsplit works, to call for vector x, use sapply(x,verhoeffCheck)
## check for string since leading zeros with numbers will be lost
if (class(x)!="character"){stop("Must enter a string")}
#split and convert to numbers
digs <- strsplit(x,"")[[1]]
digs <- as.numeric(digs)
digs <- rev(digs) ## right to left algorithm
## apply algoritm - note 1-based indexing in R
d <- 0
for (i in 1:length(digs)){
d <- d5_mult[d+1,(d5_perm[(i%%8)+1,digs[i]+1])+1]
}
d5_inv[d+1]
})
assign("d5_mult", matrix(c(
0:9, c(1:4,0,6:9,5), c(2:4,0:1,7:9,5:6), c(3:4,0:2,8:9,5:7),
c(4,0:3,9,5:8), c(5,9:6,0,4:1), c(6:5,9:7,1:0,4:2), c(7:5,9:8,2:0,4:3),
c(8:5,9,3:0,4), 9:0), 10, 10, byrow = TRUE),
envir = environment(verhoeffCheck))
assign("d5_perm", matrix(c(
0:9, c(1,5,7,6,2,8,3,0,9,4), c(5,8,0,3,7,9,6,1,4,2),
c(8,9,1,6,0,4,3,5,2,7), c(9,4,5,3,1,2,6,8,7,0), c(4,2,8,6,5,7,3,9,0,1),
c(2,7,9,3,8,0,6,4,1,5), c(7,0,4,6,9,1,3,2,5,8)), 8, 10, byrow = TRUE),
envir = environment(verhoeffCheck))
assign("d5_inv", c(0,4:1,5:9), envir = environment(verhoeffCheck))
## Now just use the function
which keeps the data in the environment of the function. You can time it to see how much faster it is.
Hope this helps.
Allan