A = [[2,2,4,2,2,2],
[2,6,2,2,2,2],
[2,2,2,2,8,2]]
I want matrix B to be equal to:
B = [[0,0,4,0,0,0],
[0,6,0,0,0,0],
[0,0,0,0,8,0]]
So I want to find the maximum value of each row and replace other values with 0. Is there any way to do this without using for loops?
Thanks in advance for your comments.
Instead of looking at the argmax, you could take the maximum of each row directly, then mask the elements that are lower and replace them with zeros.
In place, this would look like:
>>> A[A < A.max(axis=1, keepdims=True)] = 0
>>> A
array([[0, 0, 4, 0, 0, 0],
[0, 6, 0, 0, 0, 0],
[0, 0, 0, 0, 8, 0]])
An out-of-place alternative is to use np.where:
>>> np.where(A == A.max(axis=1, keepdims=True), A, 0)
array([[0, 0, 4, 0, 0, 0],
[0, 6, 0, 0, 0, 0],
[0, 0, 0, 0, 8, 0]])
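Both variants keep every element that ties for the row maximum. As a rough sketch (the function names keep_row_max and keep_first_row_max are made up for illustration), the snippet below reproduces the np.where result and adds an argmax-based variant that keeps only the first maximum of each row, which may matter when a row contains ties:

```python
import numpy as np

def keep_row_max(A):
    """Zero every entry that is not equal to its row maximum (all ties are kept)."""
    return np.where(A == A.max(axis=1, keepdims=True), A, 0)

def keep_first_row_max(A):
    """Keep only the first maximum of each row; everything else becomes 0."""
    B = np.zeros_like(A)
    rows = np.arange(A.shape[0])
    cols = A.argmax(axis=1)          # index of the first maximum in each row
    B[rows, cols] = A[rows, cols]
    return B

A = np.array([[2, 2, 4, 2, 2, 2],
              [2, 6, 2, 2, 2, 2],
              [2, 2, 2, 2, 8, 2]])
B_all = keep_row_max(A)
B_first = keep_first_row_max(A)
```

For the matrix in the question the two results coincide, because each row has a unique maximum.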
This function appears in François Chollet's Deep Learning with Python:
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
I understand what this function does. It has been asked about in several other questions and is mentioned in many other places. Despite being so widespread, this vectorization is, according to Chollet's book, done "manually for maximum clarity." I am interested in whether there is a standard, non-manual way of doing it.
Is there a standard Keras / Tensorflow / Scikit-learn / Pandas / Numpy implementation of a function which behaves very similarly to the function above?
Solution with MultiLabelBinarizer
Assuming sequences is an array of integers whose maximum possible value is dimension-1, we can use MultiLabelBinarizer from sklearn.preprocessing to replicate the behaviour of the function vectorize_sequences:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=range(dimension))
mlb.fit_transform(sequences)
Solution with Numpy broadcasting
Assuming sequences is an array of integers whose maximum possible value is dimension-1, and that all sequences have the same length (so that np.array(sequences) is 2-D):
(np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
Worked out example
>>> sequences
[[4, 1, 0],
[4, 0, 3],
[3, 4, 2]]
>>> dimension = 10
>>> mlb = MultiLabelBinarizer(classes=range(dimension))
>>> mlb.fit_transform(sequences)
array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 0, 0]])
>>> (np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 0, 0]], dtype=int8)
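The broadcasting one-liner needs sequences of equal length. For ragged input, one possible alternative is a single fancy-indexing assignment over flattened row and column indices. This is only a sketch: the function name vectorize_sequences_flat is made up here, and it assumes sequences is non-empty and every index is below dimension:

```python
import numpy as np

def vectorize_sequences_flat(sequences, dimension=10000):
    """Multi-hot encode ragged integer sequences with one indexed assignment."""
    results = np.zeros((len(sequences), dimension))
    lengths = [len(s) for s in sequences]
    # Row index of every token: sequence i contributes len(sequences[i]) copies of i.
    rows = np.repeat(np.arange(len(sequences)), lengths)
    # Column index of every token; an explicit integer dtype keeps empty sequences safe.
    cols = np.concatenate([np.asarray(s, dtype=np.intp) for s in sequences])
    results[rows, cols] = 1.0
    return results

encoded = vectorize_sequences_flat([[4, 1, 0], [4, 0], [3, 4, 2, 2]], dimension=10)
```

Repeated indices within a sequence are harmless, since the same cell is simply set to 1 more than once.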
(In any language) For a research project, I am stuck on how to convert a matrix P of probability values to a matrix A such that A_ij = 1 with probability P_ij and 0 otherwise. I have looked through the documentation of various random number generators, but have been unable to figure out how to do this.
If I understand correctly:
In [11]: p = np.random.uniform(size=(5,5))
In [12]: p
Out[12]:
array([[ 0.45481883, 0.21242567, 0.3124863 , 0.00485797, 0.31970718],
[ 0.91995847, 0.29907277, 0.59154085, 0.85847147, 0.13227595],
[ 0.91914631, 0.5495813 , 0.58648856, 0.08037582, 0.23005148],
[ 0.12464628, 0.70657028, 0.75975869, 0.77632964, 0.24587041],
[ 0.69259133, 0.183515 , 0.65500547, 0.19526148, 0.26975325]])
In [13]: a = (p.round(1)==0.7).astype(np.int8)
In [14]: a
Out[14]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[1, 0, 1, 0, 0]], dtype=int8)
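The snippet above marks the entries whose rounded value equals 0.7. If the intent is instead that each A_ij is an independent draw that equals 1 with probability P_ij, a common approach is to compare P against uniform random numbers of the same shape. A minimal sketch, assuming NumPy's Generator API (the seed and the helper name sample_bernoulli are arbitrary choices for illustration):

```python
import numpy as np

def sample_bernoulli(P, rng):
    """Return a 0/1 matrix A where A[i, j] is 1 with probability P[i, j]."""
    P = np.asarray(P, dtype=float)
    # rng.random draws uniformly from [0, 1), so U < p holds with probability p.
    return (rng.random(P.shape) < P).astype(np.int8)

rng = np.random.default_rng(0)       # seed chosen only for reproducibility
P = rng.uniform(size=(5, 5))
A = sample_bernoulli(P, rng)
```

rng.binomial(1, P) should give draws from the same distribution, if a library call is preferred over the explicit comparison.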
from sklearn.feature_extraction.image import extract_patches
import numpy as np
data = np.array([[1, 1 , 0 , 0 , 0 , 0 , 1 , 0],
[1, 1 , 1 , 0 , 0 , 1 , 1 , 0],
[1, 1 , 0 , 1 , 1 , 0 , 0 , 0],
[0, 0 , 0 , 1 , 1 , 0 , 0 , 0],
[0, 0 , 0 , 1 , 1 , 0 , 0 , 1],
[1, 1 , 0 , 0 , 0 , 0 , 1 , 0],
[1, 1 , 0 , 0 , 0 , 0 , 0 , 0]])
patches = extract_patches(data, patch_shape=(2, 2))
How can I keep only the patches in which all the elements are 1?
From the corrections to your post, I believe you might be looking for a way to detect where submatrices of shape (2,2) are all ones. Any position where that condition isn't fulfilled should be zero, but priority should be given to the submatrices where that condition is fulfilled, because submatrices can overlap.
In that case, you're most likely interested in the staggered grid of that matrix that has a one in the center of each 2x2 submatrix whenever the 4 elements of that submatrix are all ones:
>>> import numpy as np
>>> from sklearn.feature_extraction.image import extract_patches # similar to numpy's stride_tricks
>>>
>>> data = np.array([[1, 1, 0, 0, 0, 0, 1, 0],
... [1, 1, 1, 0, 0, 1, 1, 0],
... [1, 1, 0, 1, 1, 0, 0, 0],
... [0, 0, 0, 1, 1, 0, 0, 0],
... [0, 0, 0, 1, 1, 0, 0, 1],
... [1, 1, 0, 0, 0, 0, 1, 0],
... [1, 1, 0, 0, 0, 0, 0, 0]])
>>>
>>> # to take boundary effects into account, append ones to the right and bottom
... # modify this to `np.zeros` if boundaries are to be set to zero
... data2 = np.ones((data.shape[0]+1, data.shape[1]+1))
>>> data2[:-1,:-1] = data
>>> vert = np.logical_and(data2[:-1,:], data2[1:,])
>>> dual = np.logical_and(vert[:,:-1], vert[:,1:]) # dual is now the "dual" graph/staggered grid of the data2 array
>>> patches = extract_patches(data2, patch_shape=(2, 2)) # could've used numpy stride_tricks too
>>> patches[dual==0] = 0
>>> patches[dual] = 1 # Give precedence to the dual positives
>>> data2[:-1, :-1].astype(np.uint8)
array([[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]], dtype=uint8)
For completeness, this staggered grid form of the matrix could also be obtained easily by correlating with a np.ones((2,2)) kernel. However, that is computationally heavier, because it requires multiplications and summations rather than simple bit operations, so the method above should be faster than a correlation-based one.
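As a sketch of that correlation route, assuming scipy.signal.correlate2d is available: a 2x2 window is all ones exactly when the correlation with a kernel of ones equals 4. Note that mode="valid" only considers windows that lie fully inside the array, so the boundary handling differs from the padded version above.

```python
import numpy as np
from scipy.signal import correlate2d

data = np.array([[1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 1, 0, 0, 1, 1, 0],
                 [1, 1, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 1],
                 [1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 0, 0, 0, 0, 0, 0]])

kernel = np.ones((2, 2), dtype=int)
# counts[i, j] is the number of ones in the 2x2 window whose top-left corner is (i, j).
counts = correlate2d(data, kernel, mode="valid")
dual_corr = counts == kernel.size
```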
The staggered grid dual above could also be generated in the following way:
patches = extract_patches(data, patch_shape=(2, 2))
dual = patches.all(axis=-1).all(axis=-1)
And you would obtain the final result with:
patches[~dual] = 0
patches[dual] = 1
It differs from the previous method in what happens at the boundaries though.
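In case extract_patches is not importable in the installed scikit-learn, a similar result can be sketched with NumPy alone. The version below assumes NumPy 1.20 or later for sliding_window_view; the helper name keep_full_blocks is made up for illustration. Because the window view is read-only, the result is built in a separate array instead of writing through the patches:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def keep_full_blocks(data, block=(2, 2)):
    """Keep only the cells that belong to at least one all-ones block."""
    data = np.asarray(data)
    windows = sliding_window_view(data, block)    # read-only view of every block
    dual = windows.all(axis=(-2, -1))             # True where a block is all ones
    result = np.zeros(data.shape, dtype=bool)
    rows, cols = dual.shape
    # A block with top-left corner (i, j) covers the cells (i + di, j + dj).
    for di in range(block[0]):
        for dj in range(block[1]):
            result[di:di + rows, dj:dj + cols] |= dual
    return result.astype(data.dtype)

data = np.array([[1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 1, 0, 0, 1, 1, 0],
                 [1, 1, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 1],
                 [1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 0, 0, 0, 0, 0, 0]])
filtered = keep_full_blocks(data)
```

Only blocks that lie fully inside the array are considered, which corresponds to treating the boundary as zeros.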
Here's an alternative method, using minimum_filter and maximum_filter from scipy.ndimage. (The description in the question is still too vague, for me anyway, so this is based on the result shown in @OliverW.'s answer.)
In [138]: from scipy.ndimage import minimum_filter, maximum_filter
In [139]: data
Out[139]:
array([[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 0, 0]])
In [140]: m = minimum_filter(data, size=(2,2), mode='constant', origin=(-1,-1))
In [141]: result = maximum_filter(m, size=(2,2), mode='constant')
In [142]: result
Out[142]:
array([[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]])
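A minimum filter followed by a maximum filter over the same footprint is a morphological opening, so the same idea can probably be written with scipy.ndimage.binary_opening and a 2x2 structuring element. This is a sketch rather than a drop-in replacement; it relies on the default border_value=0, meaning cells outside the array count as zeros:

```python
import numpy as np
from scipy.ndimage import binary_opening

data = np.array([[1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 1, 0, 0, 1, 1, 0],
                 [1, 1, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0, 0, 1],
                 [1, 1, 0, 0, 0, 0, 1, 0],
                 [1, 1, 0, 0, 0, 0, 0, 0]])

# Erosion finds the 2x2 all-ones blocks; dilation paints those blocks back in.
opened = binary_opening(data, structure=np.ones((2, 2), dtype=bool))
result = opened.astype(data.dtype)
```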