The `out` arguments in `numpy.einsum` can not work as expected - numpy

I have two piece codes. The first one is:
A = np.arange(3*4*3).reshape(3, 4, 3)
P = np.arange(1, 4)
A[:, 1:, :] = np.einsum('j, ijk->ijk', P, A[:, 1:, :])
and the result A is :
array([[[ 0, 1, 2],
[ 6, 8, 10],
[ 18, 21, 24],
[ 36, 40, 44]],
[[ 12, 13, 14],
[ 30, 32, 34],
[ 54, 57, 60],
[ 84, 88, 92]],
[[ 24, 25, 26],
[ 54, 56, 58],
[ 90, 93, 96],
[132, 136, 140]]])
The second one is:
A = np.arange(3*4*3).reshape(3, 4, 3)
P = np.arange(1, 4)
np.einsum('j, ijk->ijk', P, A[:, 1:, :], out=A[:,1:,:])
and the result A is :
array([[[ 0, 1, 2],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]],
[[12, 13, 14],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]],
[[24, 25, 26],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]]])
So the result is different. Here I want to use out to save memory. Is it a bug in numpy.einsum? Or I missed something?
By the way, my numpy version is 1.13.3.

I haven't used this new out parameter before, but have worked with einsum in the past, and have a general idea of how it works (or at least used to).
It looks to me like it initializes the out array to zero before the start of iteration. That would account for all the 0s in the A[:,1:,:] block. If instead I initial separate out array, the desired values are inserted
In [471]: B = np.ones((3,4,3),int)
In [472]: np.einsum('j, ijk->ijk', P, A[:, 1:, :], out=B[:,1:,:])
Out[472]:
array([[[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
[[ 15, 16, 17],
[ 36, 38, 40],
[ 63, 66, 69]],
[[ 27, 28, 29],
[ 60, 62, 64],
[ 99, 102, 105]]])
In [473]: B
Out[473]:
array([[[ 1, 1, 1],
[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
[[ 1, 1, 1],
[ 15, 16, 17],
[ 36, 38, 40],
[ 63, 66, 69]],
[[ 1, 1, 1],
[ 27, 28, 29],
[ 60, 62, 64],
[ 99, 102, 105]]])
The Python portion of einsum doesn't tell me much, except how it decides to pass the out array to the c portion, (as one of the list of tmp_operands):
c_einsum(einsum_str, *tmp_operands, **einsum_kwargs)
I know that it sets up a c-api equivalent of np.nditer, using the str to define the axes and iterations.
It iterates something like this section in the iteration tutorial:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.nditer.html#reduction-iteration
Notice in particular the it.reset() step. That sets the out buffer to 0 prior to iterating. It then iterates over the elements of input arrays and the output array, writing the calculation values to the output element. Since it is doing a sum of products (e.g. out[:] += ...), it has to start with a clean slate.
I'm guessing a bit as to what is actually going on, but it seems logical to me that it should zero out the output buffer to start with. If that array is the same as one of the inputs, that will end up messing with the calculation.
So I don't think this approach will work and save you memory. It needs a clean buffer to accumulate the results in. Once that's done it, or you, can write the values back into A. But given the nature of a dot like product, you can't use the same array for input and for output.
In [476]: A[:,1:,:] = np.einsum('j, ijk->ijk', P, A[:, 1:, :])
In [477]: A
Out[477]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
....)

In the C source code for einsum, there is a section that will take the array specified by out and do some zero-setting.
But in the Python source code for example, there are execution paths that call the tensordot function before ever descending the arguments to call c_einsum.
This means that some operations might be pre-computed (thus modifying your array A on some contraction passes) with tensordot, before any sub-array is ever set to zero by the zero-setter inside the C code for einsum.
Another way to put it is: on each pass at doing the next contraction operations, NumPy has many choices available to it. To use tensordot directly without getting into the C-level einsum code just yet? Or to prepare the arguments and pass to the C level (which will involve over-writing some sub-view of the output array with all zeros)? Or to re-order the operations and repeat the check?
Depending on the order it chooses for these optimizations, you can end up with unexpected all-zeros sub-arrays.
Best bet is to not try to be this clever and use the same array for the output. You say it is because you want to save memory. Yes, in some special cases an einsum operation might be do-able in-place. But it does not currently detect if this is the case and attempt to avoid the zero-setting.
And in a huge number of cases, over-writing into one of the input arrays during the middle of the overall operation would cause many problems, much like trying to append to a list you are directly looping over, etc.

Related

Moving from nested loops to NumPy iterating

I have a system of equations that I am trying to simulate and using very basic looping structures seems to rapidly slow down my computing speed. I have a mock example below to illustrate how I am running the simulation now:
import numpy as np
Imax, Jmax, Tmax = 4, 4, 3
Iset, Jset, Tset = range(0,Imax), range(0,Jmax), range(0,Tmax)
X = np.arange(0,48).reshape(3,4,4)
X[1], X[2] = 4, 2
Y = 2*X
for t in Tset:
if t == 2:
break
else:
for i in Iset:
for j in Jset:
Y[t+1,i,j] = Y[t,i,j] + X[t,i,j]
X[t+1,i,j] = X[t,i,j] + 1
# Output for Y...
array([[[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22],
[24, 26, 28, 30]],
[[ 0, 3, 6, 9],
[12, 15, 18, 21],
[24, 27, 30, 33],
[36, 39, 42, 45]],
[[ 1, 5, 9, 13],
[17, 21, 25, 29],
[33, 37, 41, 45],
[49, 53, 57, 61]]])
Intuitively this structure makes sense to me because I am accessing the individual elements of the Y array and updating it, but because I have this looping over very large values and have more going on in the loop, I am experiencing a drastic reduction in computational speed.
I came across nditer and I am hoping that I can use this in place of the multiple nested loops that I have so that I can still get the same result, but faster. How can I go about converting this nested for-loop style into a more efficient iteration scheme?

How to use the 'where' option in numpy.multiply?

I need to multiply an array (NIR) with a scalar (f) but leaving some values that meet a certain condition intact.
I tried the following:
NIR_f = np.multiply(NIR,f,where=NIR!=-28672.0)
To check I made:
i,j=1119,753
NIR[i][j],NIR_f[i][j]
and I got this:
(-28672.0, 10058.0)
It is assumed that both results should be the same! In that position the condition is not met, therefore the value should remain intact.
Am I using the "where" option wrongly?
Without your array, or a smaller substitute, I can't exactly replicate your problem. But there are potentially 2 issues
float testing is not exact, so it might be matching one -28672.0, and not another.
the remain intact assumption is tricky. leave the value in the output alone, but what was it originally, 0's or NIR values.
Using an integer array to avoid the float issue:
In [20]: arr = np.arange(12).reshape(3,4)
In [21]: arr
Out[21]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [22]: np.multiply(arr, 10, where=arr!=10)
Out[22]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, 481036337249, 110]])
In [24]: np.multiply(arr, 10, where=arr!=10)
Out[24]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, 0, 110]])
arr[2,2] is random. In effect it started with a np.empty array of the right shape and dtype, and filled all values but that one with the multiplication. To use where correctly we need to specify an out parameter as well.
In [25]: out = np.full(arr.shape,-1)
In [26]: out
Out[26]:
array([[-1, -1, -1, -1],
[-1, -1, -1, -1],
[-1, -1, -1, -1]])
In [27]: np.multiply(arr, 10, where=arr!=10, out=out)
Out[27]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, -1, 110]])
The issue of inexact floats comes up often enough that I won't try to illustrate that.

Filter sequence items in TensorFlow

I have a tensor of allowed items
index = tf.constant([61, 215, 23, 18, 241, 125])
and need to remove items from input sequence batches that are not in index.
seq = tf.constant(
[
[ 18, 241, 0, 0],
[125, 61, 23, 241],
[ 23, 92, 18, 0],
[ 5, 61, 215, 18],
]
)
After the calculation in this case I need
result_needed = tf.constant(
[
[ 18, 241, 0, 0],
[125, 61, 23, 241],
[ 23, 18, 0, 0],
[ 61, 215, 18, 0],
]
)
I cannot do this in Python because this calculation happens during predictions. Also note that while item IDs here are small, solution needs to deal with numbers from 1 to 2^40.
Answer
After some serious pondering time, I came up with the following:
idx_range = tf.reshape(tf.range(seq.shape[-2]), [-1, 1])
idx_tile = tf.tile(idx_range, [1, seq.shape[-2].value])
idx_flat = tf.reshape(idx_tile, [-1])
truth_value = tf.equal(index, tf.expand_dims(seq, -1))
one_hot = tf.to_float(truth_value)
ones = tf.nn.top_k(tf.reduce_sum(one_hot, -1), seq.shape[-1]).indices
ones_flat = tf.reshape(ones, [-1])
ones_idx = tf.reshape(
tf.stack([idx_flat, ones_flat], axis=1),
tf.concat([seq.shape, [2]], axis=0)
)
tf.gather_nd(seq, ones_idx)
This is not exactly what I said I needed, but actually got me close enough. Instead of the output replacing the blacklisted items with 0, it moves them to the end. If you needed them gone, I'm sure there's a method to remove them, but I'm not looking into it. Apologies.

Fill blocks at random places on each 2D slice of a 3D array

I have 3D numpy array, for example, like this:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]]])
Is there a way to index it in such a way that I select, for example, top right corner of 2x2 elements in the first plane, and a center 2x2 elements subarray from the second plane? So that I could then zero out the elements 2,3,6,7,21,22,25,26:
array([[[ 0, 1, 0, 0],
[ 4, 5, 0, 0],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 0, 0, 23],
[24, 0, 0, 27],
[28, 29, 30, 31]]])
I have a batch of images, and I need to zero out a small window of fixed size, but at different (random) locations for each image in the batch. The first dimension is number of images.
Something like this:
a[:, x: x+2, y: y+2] = 0
where x and y are vectors which have different values for each first dimension of a.
Approach #1 : Here'e one approach that's mostly based on linear-indexing -
def random_block_fill_lidx(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Get all possible starting linear indices for each 2D slice
possible_start_lidx = (np.arange(n-N+1)[:,None]*r + range(r-N+1)).ravel()
# Get random start indices from all possible ones for all 2D slices
start_lidx = np.random.choice(possible_start_lidx, m)
# Get linear indices for the block of (N,N)
offset_arr = (a.shape[-1]*np.arange(N)[:,None] + range(N)).ravel()
# Add in those random start indices with the offset array
idx = start_lidx[:,None] + offset_arr
# On a 2D view of the input array, use advance-indexing to set fillval.
a.reshape(m,-1)[np.arange(m)[:,None], idx] = fillval
return a
Approach #2 : Here's another and possibly more efficient one (for large 2D slices) using advanced-indexing -
def random_block_fill_adv(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Generate random start indices for second and third axes keeping proper
# distance from the boundaries for the block to be accomodated within.
idx0 = np.random.randint(0,n-N+1,m)
idx1 = np.random.randint(0,r-N+1,m)
# Setup indices for advanced-indexing.
# First axis indices would be simply the range array to select one per elem.
# We need to extend this to 3D so that the latter dim indices could be aligned.
dim0 = np.arange(m)[:,None,None]
# Second axis indices would idx0 with broadcasted additon of blocksized
# range array to cover all block indices along this axis. Repeat for third.
dim1 = idx0[:,None,None] + np.arange(N)[:,None]
dim2 = idx1[:,None,None] + range(N)
a[dim0, dim1, dim2] = fillval
return a
Approach #3 : With the old-trusty loop -
def random_block_fill_loopy(a, N, fillval=0):
# a is input array
# N is blocksize
# Store shape info
m,n,r = a.shape
# Generate random start indices for second and third axes keeping proper
# distance from the boundaries for the block to be accomodated within.
idx0 = np.random.randint(0,n-N+1,m)
idx1 = np.random.randint(0,r-N+1,m)
# Iterate through first and use slicing to assign fillval.
for i in range(m):
a[i, idx0[i]:idx0[i]+N, idx1[i]:idx1[i]+N] = fillval
return a
Sample run -
In [357]: a = np.arange(2*4*7).reshape(2,4,7)
In [358]: a
Out[358]:
array([[[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27]],
[[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48],
[49, 50, 51, 52, 53, 54, 55]]])
In [359]: random_block_fill_adv(a, N=3, fillval=0)
Out[359]:
array([[[ 0, 0, 0, 0, 4, 5, 6],
[ 7, 0, 0, 0, 11, 12, 13],
[14, 0, 0, 0, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27]],
[[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 0, 0, 0],
[42, 43, 44, 45, 0, 0, 0],
[49, 50, 51, 52, 0, 0, 0]]])
Fun stuff : Being in-place filling, if we keep running random_block_fill_adv(a, N=3, fillval=0), we will eventually end up with all zeros a. Thus, also verifying the code.
Runtime test
In [579]: a = np.random.randint(0,9,(10000,4,4))
In [580]: %timeit random_block_fill_lidx(a, N=2, fillval=0)
...: %timeit random_block_fill_adv(a, N=2, fillval=0)
...: %timeit random_block_fill_loopy(a, N=2, fillval=0)
...:
1000 loops, best of 3: 545 µs per loop
1000 loops, best of 3: 891 µs per loop
100 loops, best of 3: 10.6 ms per loop
In [581]: a = np.random.randint(0,9,(1000,40,40))
In [582]: %timeit random_block_fill_lidx(a, N=10, fillval=0)
...: %timeit random_block_fill_adv(a, N=10, fillval=0)
...: %timeit random_block_fill_loopy(a, N=10, fillval=0)
...:
1000 loops, best of 3: 739 µs per loop
1000 loops, best of 3: 671 µs per loop
1000 loops, best of 3: 1.27 ms per loop
So, which one to choose depends on the first axis length and blocksize.

Split last dimension of arrays in lower dimensional arrays

Assume we have an array with NxMxD shape. I want to get a list with D NxM arrays.
The correct way of doing it would be:
np.dsplit(myarray, D)
However, this returns D NxMx1 arrays.
I can achieve the desired result by doing something like:
[myarray[..., i] for i in range(D)]
Or:
[np.squeeze(subarray) for subarray in np.dsplit(myarray, D)]
However, I feel like it is a bit redundant to need to perform an additional operation. Am I missing any numpy function that returns the desired result?
Try D.swapaxes(1,2).swapaxes(1,0)
>>>import numpy as np
>>>a = np.arange(24).reshape(2,3,4)
>>>a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>>[a[:,:,i] for i in range(4)]
[array([[ 0, 4, 8],
[12, 16, 20]]),
array([[ 1, 5, 9],
[13, 17, 21]]),
array([[ 2, 6, 10],
[14, 18, 22]]),
array([[ 3, 7, 11],
[15, 19, 23]])]
>>>a.swapaxes(1,2).swapaxes(1,0)
array([[[ 0, 4, 8],
[12, 16, 20]],
[[ 1, 5, 9],
[13, 17, 21]],
[[ 2, 6, 10],
[14, 18, 22]],
[[ 3, 7, 11],
[15, 19, 23]]])
Edit: As pointed out by ajcr (thanks again), the transpose command is more convenient since the two swaps can be done in one step by using
D.transpose(2,0,1)
np.dsplit uses np.array_split, the core of which is:
sub_arys = []
sary = _nx.swapaxes(ary, axis, 0)
for i in range(Nsections):
st = div_points[i]; end = div_points[i+1]
sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))
with axis=-1, this is equivalent to:
[x[...,i:(i+1)] for i in np.arange(x.shape[-1])] # or
[x[...,[i]] for i in np.arange(x.shape[-1])]
which accounts for the singleton dimension.
So there's nothing wrong or inefficient about your
[x[...,i] for i in np.arange(x.shape[-1])]
Actually in quick time tests, any use of dsplit is slow. It's generality costs. So adding squeeze is relatively cheap.
But by accepting the other answer, it looks like you are really looking for an array of the correct shape, rather than a list of arrays. For many operations that makes sense. split is more useful when the subarrays have more than one 'row' along the split axis, or even an uneven number of 'rows'.