Broadcasting and resizing an array in dask - numpy

I want to broadcast a 1D dask array and a 2D dask array.
To be specific using numpy it would be something like:
a = np.random.rand(20000, 3)
b = np.random.rand(16)
I want a 3D array of shape (20000, 16, 3) as the result; let's call it c. So for each value of b we will have (20000, 3) values from multiplying a*b[index], where index = 0, 1, ..., 15. In numpy this is pretty straightforward using the resize function. However, resize does not exist in dask. Does anyone have an idea of how to do this in dask? The array can also be an xarray object with a dask array inside, so if anyone knows how to do it with xarray, that would be appreciated as well.
Cheers

If I understand your question correctly, you want to achieve this result
import numpy as np
a = np.random.rand(20000, 3)
b = np.random.rand(16)
result = a[:, np.newaxis, :] * b[np.newaxis, :, np.newaxis]
result.shape # (20000, 16, 3)
That code directly works for a dask.array as well
import dask.array as da
dsk_a = da.from_array(a)
dsk_b = da.from_array(b)
result_dask = dsk_a[:, np.newaxis, :] * dsk_b[np.newaxis, :, np.newaxis]
result_dask.shape # (20000, 16, 3)
(result_dask.compute() == result).all() # True
Let me know if I misunderstood your question. If I have, it would help if you provided working numpy code that produces the desired result.
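As a quick sanity check of the broadcasting expression above, here is a minimal sketch in plain NumPy. The smaller array size and the fixed seed are assumptions chosen only to keep it fast and repeatable. It confirms that each slice c[:, i, :] equals a * b[i], which is what the question asks for. According to the answer, the same indexing applies unchanged to dask arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((200, 3))   # stand-in for the (20000, 3) array in the question
b = rng.random(16)

# Insert length-1 axes so the shapes line up as (200, 1, 3) * (1, 16, 1)
c = a[:, np.newaxis, :] * b[np.newaxis, :, np.newaxis]

# Every slice along the middle axis should be a scaled copy of a
slices_match = all(np.allclose(c[:, i, :], a * b[i]) for i in range(b.size))
print(c.shape, slices_match)
```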

Related

How to remove multiple axis from numpy array using np.squeeze?

I have a numpy array of shape (1,1,51200,50), I want to reduce it to (51200,50)
I did the below
arr1 = np.squeeze(arr, axis=0)
arr2 = np.squeeze(arr1, axis=0)
Now I get the desired shape
arr2.shape
(51200,50)
Is there a simpler way to do this in one step instead of two? Please help.
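One possible single-step version, sketched with a zero-filled placeholder array of the shape given in the question: np.squeeze accepts a tuple of axes, so both leading axes can be removed in one call.

```python
import numpy as np

arr = np.zeros((1, 1, 51200, 50))  # placeholder with the shape from the question

# Remove both leading length-1 axes in a single call
arr2 = np.squeeze(arr, axis=(0, 1))
print(arr2.shape)
```

Calling np.squeeze(arr) with no axis argument would also work here, since it removes every length-1 axis, but naming the axes guards against accidentally dropping an axis that only happens to have length 1.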

Create 2D hanning, hamming, blackman, gaussian window in NumPy

I am interested in creating 2D hanning, hamming, Blackman, etc windows in NumPy. I know that off-the-shelf functions exist in NumPy for 1D versions of it such as np.blackman(51), np.hamming(51), np.kaiser(51), np.hanning(51), etc.
How to create 2D versions of them? I am not sure if the following solution is the correct way.
window1d = np.blackman(51)
window2d = np.sqrt(np.outer(window1d,window1d))
---EDIT
The concern is that np.sqrt expects non-negative values, while np.outer(window1d, window1d) will have some negative values. One solution is to drop np.sqrt.
Any suggestions how to extend these 1d functions to 2d?
That looks reasonable to me. If you want to verify what you are doing is sensible, you can try plotting out what you are creating.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1.5, 51)
y = np.linspace(0, 1.5, 51)
window1d = np.abs(np.blackman(51))
window2d = np.sqrt(np.outer(window1d,window1d))
X, Y = np.meshgrid(x, y)
Z = window2d
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');
plt.show()
This gives a 3D contour plot of the 2D window (image not shown). It looks like the 2D generalization of the 1D window plot (image also not shown).
However, I had to use window1d = np.abs(np.blackman(51)) when creating the 1D window, because otherwise you end up with small negative values in the 2D array, and np.sqrt returns nan for those.
Disclaimer: I am not familiar with these functions or their usual use case, but the shapes of these plots seem to make sense. If they are used somewhere the actual values matter, this could be off.
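Following the question's own suggestion of dropping np.sqrt, a minimal sketch of the plain outer-product (separable) construction is shown below. Clipping the tiny negative rounding values to zero is an assumption made here for tidiness; whether the square root is wanted at all depends on the application, which the thread does not settle.

```python
import numpy as np

n = 51
# np.blackman can return values around -1e-17 at the endpoints due to rounding;
# clipping at zero removes them without visibly changing the window.
window1d = np.clip(np.blackman(n), 0.0, None)

# Separable 2D window: entry (i, j) is window1d[i] * window1d[j]
window2d = np.outer(window1d, window1d)
print(window2d.shape, window2d.min(), window2d.max())
```

The same pattern should carry over to np.hamming, np.hanning, or np.kaiser by swapping the 1D window function.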

Show class probabilities from Numpy array

I've had a look through and I don't think Stack Overflow has an answer for this. I am fairly new at this, so any help is appreciated.
I'm using an AWS Sagemaker endpoint to return a png mask and I'm trying to display the probability as a whole of each class.
So first stab does this:
np.set_printoptions(threshold=np.inf)
pred_map = np.argmax(mask, axis=0)
non_zero_mask = pred_map[pred_map != 0] # get everything but background
# print(np.bincount(pred_map[pred_map != 0]).argmax()) # Ignore this line as it just shows the most probable
num_classes = 6
plt.imshow(pred_map, vmin=0, vmax=num_classes-1, cmap='jet')
plt.show()
As you can see, I'm removing the background pixels. Now I need to show that classes 1, 2, 3, 4, and 5 each have X probability, based on the number of pixels they occupy. I could take the total number of elements from the original mask and then loop over it, counting pixels per class, but I suspect that would be reinventing the wheel. Are there built-in methods for this?
Update:
After typing this out, I had a little think, reworded some of my searches, and came across this.
unique_elements, counts_elements = np.unique(pred_map[pred_map != 0], return_counts=True)
print(np.asarray((unique_elements, counts_elements)))
#[[ 2 3]
#[87430 2131]]
So then I'd just calculate the % based on this or is there a better way? For example I'd do
87430 / 89561(total number of pixels in the mask) * 100
Giving 2 in this case a 97% probability.
Update for Joe's comment below:
rec = Record()
recordio = mx.recordio.MXRecordIO(results_file, 'r')
protobuf = rec.ParseFromString(recordio.read())
values = list(rec.features["target"].float32_tensor.values)
shape = list(rec.features["shape"].int32_tensor.values)
shape = np.squeeze(shape)
mask = np.reshape(np.array(values), shape)
mask = np.squeeze(mask, axis=0)
My first thought was to use np.digitize and write a nice solution.
But then I realized how you can hack it in 10 lines:
import numpy as np
import matplotlib.pyplot as plt
size = (10, 10)
x = np.random.randint(0, 7, size) # your classes, seven excluded.
# empty array, filled with mask and number of occurrences.
x_filled = np.zeros_like(x)
for i in range(1, 7):
    mask = x == i
    count_mask = np.count_nonzero(mask)
    x_filled[mask] = count_mask
print(x_filled)
plt.imshow(x_filled)
plt.colorbar()
plt.show()
I am not sure about the axis convention with imshow at the moment; you might have to flip the y axis so up is up.
SageMaker does not provide in-built methods for this.
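The percentage calculation from the question's update can also be done without a loop. The sketch below uses np.bincount on a small, made-up label image (pred_map here is a hypothetical stand-in, with 0 as background) and reports each class's share of the foreground pixels. Note that this is a pixel share, which the question treats as a probability; it is not the model's own confidence.

```python
import numpy as np

num_classes = 6
# Hypothetical stand-in for the question's pred_map; 0 is background
pred_map = np.array([[0, 2, 2, 2],
                     [0, 2, 3, 0],
                     [2, 2, 2, 3]])

# Pixel count per class label, including classes that never occur
counts = np.bincount(pred_map.ravel(), minlength=num_classes)

# Drop the background (label 0) and convert to percentages of the foreground
foreground = counts[1:]
share = foreground / foreground.sum() * 100

for label, pct in enumerate(share, start=1):
    print(f"class {label}: {pct:.1f}%")
```

Using minlength keeps one entry per class even when a class is absent, so the result always lines up with labels 1 to 5.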

Median filter nullifies the numpy array value

I have an image which I converted to a numpy array. Applying a median filter to it using the scipy library changes all elements of that ndarray to zero. I don't know why this is happening, and I assume it should not happen.
from PIL import Image
import numpy as np
from scipy.signal import medfilt
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208, 1)
s = np.sum(train, axis =0)
print(s)
train = medfilt(train, kernel_size= 3)
s1 = np.sum(train, axis =0)
print(s1)
Due to this issue I can't go for further image processing.
medfilt effectively zero-pads at the boundaries. Since you have one dimension of size 1, every pixel is sandwiched in that direction between two zeros, which outvote everything else in the median.
Try omitting the third dimension
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208)
and you should be fine.
You can add it after filtering if need be.
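To illustrate the "outvoting" effect without the image file, here is a small NumPy-only sketch. It does not call medfilt; it just mimics the zero padding along the length-1 axis on a made-up 2x2 image and takes the median over the resulting three values. With kernel_size=3 on a 3D array the real window holds 27 values, of which at least 18 come from the two padded slices, so the outcome is the same for non-negative pixel values.

```python
import numpy as np

image = np.array([[10.0, 20.0],
                  [30.0, 40.0]])      # hypothetical tiny image
volume = image[:, :, np.newaxis]      # shape (2, 2, 1), like the reshape in the question

# Zero-pad only the length-1 axis, one element on each side
padded = np.pad(volume, ((0, 0), (0, 0), (1, 1)), mode="constant")

# Median over the 3-element neighbourhood along that axis: [0, value, 0]
median_along_depth = np.median(padded, axis=2)
print(median_along_depth)
```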

Why does MinMaxScaler add lines to image?

I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).
Minimal Example
#!/usr/bin/env python
import numpy as np
import scipy
from sklearn import preprocessing
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)
transformed = np.zeros(original.shape, dtype=np.float64)
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
    transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)
What happens
Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg,
I get the following "normalized" result (image not shown):
As you can see there are lines from top to bottom at the right side. What happened there? It seems to me that the normalization went wrong. If so: How do I fix it?
In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.
MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.
For example,
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
transformed = t.reshape(original.shape)
With this, transformed looks like this (image not shown):
The image looks exactly like the original, because it turns out that in the array original, the minimum and maximum are 0 and 255, respectively, in each channel:
In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)
In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)
So all fit_transform does in this case is transform all the input values to the floating point range [0.0, 1.0] uniformly. If the minimum or maximum was different in one of the channels, the transformed image would look different.
By the way, it is not difficult to perform the transform using pure numpy. (I'm using Python 3, so in the following, the division automatically casts the result to floating point. If you are using Python 2, you'll need to convert one of the arguments to floating point, or use from __future__ import division.)
In [58]: omin = original.min(axis=(0, 1), keepdims=True)
In [59]: omax = original.max(axis=(0, 1), keepdims=True)
In [60]: xformed = (original - omin)/(omax - omin)
In [61]: np.allclose(xformed, transformed)
Out[61]: True
(One potential problem with that method is that it breaks if one of the channels is constant, because then one of the values in omax - omin will be 0; NumPy then emits a divide warning and fills that channel with nan.)
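The constant-channel caveat can be handled by guarding the divisor. The sketch below is one way to do it, not part of the original answer: scale_channels is a hypothetical helper, and mapping a constant channel to all zeros is an assumption made here.

```python
import numpy as np

def scale_channels(image):
    """Scale each channel of an (m, n, channels) image to [0, 1] independently."""
    image = image.astype(np.float64)
    cmin = image.min(axis=(0, 1), keepdims=True)
    cmax = image.max(axis=(0, 1), keepdims=True)
    span = cmax - cmin
    # Assumption: a constant channel (span == 0) is mapped to all zeros
    safe_span = np.where(span == 0, 1.0, span)
    return (image - cmin) / safe_span

# Made-up 2x2 RGB image whose third channel is constant
image = np.array([[[0, 10, 7], [50, 20, 7]],
                  [[100, 30, 7], [200, 40, 7]]], dtype=np.uint8)
scaled = scale_channels(image)
print(scaled[:, :, 0])
print(scaled[:, :, 2])
```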