Transform a matrix made of binomial vectors to ranges for consecutive zeros - numpy

I am trying to figure out how to do this transformation symbolically in theano a matrix of undetermined size
From:
[[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1],
.
.
]
To:
[[1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 0, 1, 0, 1, 2, 3, 0],
[1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 0, 0, 0, 0, 0, 0],
.
.
]
So for every consecutive 0 I want an increasing range and whenever I stumble on a 1 the range resets.

Here's one way to do it, using inefficient scans:
import theano
import theano.tensor as tt
def inner_step(x_t_t, y_t_tm1):
return tt.switch(x_t_t, 0, y_t_tm1 + 1)
def outer_step(x_t):
return theano.scan(inner_step, sequences=[x_t], outputs_info=[0])[0]
def compile():
x = tt.bmatrix()
y = theano.scan(outer_step, sequences=[x])[0]
return theano.function([x], y)
def main():
f = compile()
data = [[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1]]
print f(data)
main()
When run, this prints:
[[1 2 3 0 1 2 3 4 5 0 0 1 0 1 2 3 0]
[1 2 3 4 5 6 7 8 0 1 2 0 0 0 0 0 0]]

Related

How to print a specific information from value_count()?

import pandas as pd
data = {'qtd': [0, 1, 4, 0, 1, 3, 1, 3, 0, 0,
3, 1, 3, 0, 1, 1, 0, 0, 1, 3,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 1, 3, 0, 3, 0, 0,
2, 0, 0, 2, 0, 0, 2, 0, 0, 2,
0, 2, 0, 0, 2, 0, 0, 2, 0, 0,
2, 0, 0, 2, 0, 0, 2, 0, 0, 1,
1, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1]
}
df = pd.DataFrame (data, columns = ['qtd'])
Counting
df['qtd'].value_counts()
0 43
1 34
2 10
3 7
4 1
Name: qtd, dtype: int64
What I want is to print a phrase: "The total with zero occurrencies is 43"
Tried with .head(1) but shows more than I want.
Does this solve your problem? The [0] indicates the index you wish to print, in this case the very first occurrence in your column of a data frame.
print('The total with zero occurences is:', df['qtd'].value_counts()[0])
The output of the code above will be:
The total with zero occurences is: 43
I am not sure if you want this but may be helpful:
import inflect
e = inflect.engine()
(df['qtd'].map(e.number_to_words).radd("The total with ").add(" occurances is ")
.value_counts().astype(str).reset_index().agg(':'.join,1))
0 The total with zero occurances is :43
1 The total with one occurances is :34
2 The total with two occurances is :10
3 The total with three occurances is :7
4 The total with four occurances is :1
dtype: object

Fill values in numpy array that are between a certain value

Let's say I have an array that looks like this:
a = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0])
I want to fill the values that are between 1's with 1's.
So this would be the desired output:
a = np.array([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0])
I have taken a look into this answer, which yields the following:
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1])
I am sure this answer is really close to the output I want. However, although tried countless times, I can't change this code into making it work the way I want, as I am not that proficient with numpy arrays.
Any help is much appreciated!
Try this
b = ((a == 1).cumsum() % 2) | a
Out[10]:
array([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0], dtype=int32)
From #Paul Panzer: use ufunc.accumulate with bitwise_xor
b = np.bitwise_xor.accumulate(a)|a
Try this:
import numpy as np
num_lst = np.array(
[0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0])
i = 0
while i < len(num_lst): # Iterate through the list
if num_lst[i]: # Check if element is 1 at i-th position
if not num_lst[i+1]: # Check if next element is 0
num_lst[i+1] = 1 # Change next element to 1
i += 1 # Continue through loop
else: # Check if next element is 1
i += 2 # Skip next element
else:
i += 1 # Continue through loop
print(num_lst)
This is probably not the most elegant way to execute this, but it should work. Basically, we loop through the list to find any 1s. When we find an element that is 1, we check if the next element is 0. If it is, then we change the next element to 1. If the next element is 1, that means we should stop changing 0s to 1s, so we jump over that element and proceed with the iteration.

Is there a way to slice out multiple 2D numpy arrays from one 2D numpy array in one batch operation?

I have a numpy array heatmap of shape (img_height, img_width) and another array bboxes of shape (K, 4), where K is a number of bounding boxes.
Each bounding box is defined
like so: [x_top_left, y_top_left, width, height].
Here's an example of such array:
bboxes = np.array([
[0, 0, 4, 7],
[3, 4, 3, 4],
[7, 2, 3, 7]
])
heatmap is initally filled with zeros.
What I need to do is to put value 1 for each bounding box in it's corresponding place.
The resulting heatmap should be:
heatmap = np.array([
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
[1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
])
Important things to note:
axis 0 corresponds to image height
axis 1 corresponds to image width
I've already solved this using python for loop, like so:
for bbox in bboxes:
# y_top_left:y_top_left + img_height, x_top_left:x_top_left + img_width
heatmap[bbox[1] : bbox[1] + bbox[3], bbox[0] : bbox[0] + bbox[2]] = 1
I would like to avoid using python for loops (if it's possible) and be able to do something like this:
heatmap[bboxes[:,1] : bboxes[:,1] + bboxes[:,3], bboxes[:,0]:bboxes[:,0] + bboxes[:,2]] = 1
Is there a way of doing such multiple slicing in numpy?
I am aware of numpy integer array indexing, but to generate such indices I am also unable to avoid python for loops.

Vectorize this for loop in numpy

I am trying to compute matrix z (defined below) in python with numpy.
Here's my current solution (using 1 for loop)
z = np.zeros((n, k))
for i in range(n):
v = pi * (1 / math.factorial(x[i])) * np.exp(-1 * lamb) * (lamb ** x[i])
numerator = np.sum(v)
c = v / numerator
z[i, :] = c
return z
Is it possible to completely vectorize this computation? I need to do this computation for thousands of iterations, and matrix operations in numpy is much faster than huge for loops.
Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:
def alt_E(x):
x = x[:, None]
z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
denom = z.sum(axis=1)[:, None]
z /= denom
return z
I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size,
alt_E is about 120x faster than E:
In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop
In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop
In [34]: 11500/94.7
Out[34]: 121.43611404435057
This is the setup I used for the benchmark:
import math
import numpy as np
import scipy.special as special
def alt_E(x):
x = x[:, None]
z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
denom = z.sum(axis=1)[:, None]
z /= denom
return z
def E(x):
z = np.zeros((n, k))
for i in range(n):
v = pi * (1 / math.factorial(x[i])) * \
np.exp(-1 * lamb) * (lamb ** x[i])
numerator = np.sum(v)
c = v / numerator
z[i, :] = c
return z
n = 576
k = 2
x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])
lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])
assert np.allclose(alt_E(x), E(x))
By the way, E could also be calculated using scipy.stats.poisson:
import scipy.stats as stats
pois = stats.poisson(mu=lamb)
def alt_E2(x):
z = pi * pois.pmf(x[:,None])
denom = z.sum(axis=1)[:, None]
z /= denom
return z
but this does not turn out to be faster, at least for arrays of this length:
In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop
In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop
For larger x, alt_E2 is faster:
In [104]: x = np.random.random(10000)
In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop
In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop

How can I keep the patch which contain all the elements 1

from sklearn.feature_extraction.image import extract_patches
import numpy as np
data = np.array([[1, 1 , 0 , 0 , 0 , 0 , 1 , 0],
[1, 1 , 1 , 0 , 0 , 1 , 1 , 0],
[1, 1 , 0 , 1 , 1 , 0 , 0 , 0],
[0, 0 , 0 , 1 , 1 , 0 , 0 , 0],
[0, 0 , 0 , 1 , 1 , 0 , 0 , 1],
[1, 1 , 0 , 0 , 0 , 0 , 1 , 0],
[1, 1 , 0 , 0 , 0 , 0 , 0 , 0]])
patches = extract_patches(data, patch_shape=(2, 2))
How can I keep the patch which contain all the elements 1?
From the corrections to your post, I believe you might be looking for a way to detect where submatrices of shape (2,2) are all ones. Anywhere where that condition isn't fulfilled should be zero, but priority should be given to the submatrices where that condition is fulfilled, because submatrices can be overlapping.
In that case, you're most likely interested in the staggered grid of that matrix that has a one in the center of each 2x2 submatrix whenever the 4 elements of that submatrix are all ones:
>>> import numpy as np
>>> from sklearn.feature_extraction.image import extract_patches # similar to numpy's stride_tricks
>>>
>>> data = np.array([[1, 1, 0, 0, 0, 0, 1, 0],
... [1, 1, 1, 0, 0, 1, 1, 0],
... [1, 1, 0, 1, 1, 0, 0, 0],
... [0, 0, 0, 1, 1, 0, 0, 0],
... [0, 0, 0, 1, 1, 0, 0, 1],
... [1, 1, 0, 0, 0, 0, 1, 0],
... [1, 1, 0, 0, 0, 0, 0, 0]])
>>>
>>> # to take boundary effects into account, append ones to the right and bottom
... # modify this to `np.zeros` if boundaries are to be set to zero
... data2 = np.ones((data.shape[0]+1, data.shape[1]+1))
>>> data2[:-1,:-1] = data
>>> vert = np.logical_and(data2[:-1,:], data2[1:,])
>>> dual = np.logical_and(vert[:,:-1], vert[:,1:]) # dual is now the "dual" graph/staggered grid of the data2 array
>>> patches = extract_patches(data2, patch_shape=(2, 2)) # could've used numpy stride_tricks too
>>> patches[dual==0] = 0
>>> patches[dual] = 1 # Give precedence to the dual positives
>>> data2[:-1, :-1].astype(np.uint8)
array([[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]], dtype=uint8)
For completeness, this staggered grid form of the matrix could also be obtained easily with a correlation with a np.ones((2,2)) kernel. However, that is computationally more heavy, because a lot more work has to be done (multiplications and summations) rather than simple bit-operations. The method above will outperform a correlation-based method in terms of speed.
The staggered grid dual above could also be generated in the following way:
patches = extract_patches(data, patch_shape=(2, 2))
dual = patches.all(axis=-1).all(axis=-1)
And you would obtain the final result with:
dual = patches.all(axis=-1).all(axis=-1)
patches[dual==False] = 0
patches[dual] = 1
It differs from the previous method in what happens at the boundaries though.
Here's an alternative method, using minimum_filter and maximum_filter from scipy.ndimage. (The description in the question is still too vague--for me, anyway--so this is based on the result shown in #OliverW.'s answer.)
In [138]: from scipy.ndimage import minimum_filter, maximum_filter
In [139]: data
Out[139]:
array([[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 0, 0]])
In [140]: m = minimum_filter(data, size=(2,2), mode='constant', origin=(-1,-1))
In [141]: result = maximum_filter(m, size=(2,2), mode='constant')
In [142]: result
Out[142]:
array([[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]])