How to remove multiple axis from numpy array using np.squeeze? - numpy

I have a numpy array of shape (1,1,51200,50), I want to reduce it to (51200,50)
I did the below
arr1 = np.squeeze(arr, axis=0)
arr2 = np.squeeze(arr1, axis=0)
Now I get the desired shape
$arr2.shape
(51200,50)
Is there a simple way to do it, instead of 2 steps here? Please help.

Related

Broadcasting and resizing an array in dask

I want to broadcast a 1D dask array and a 2D dask array.
To be specific using numpy it would be something like:
a = np.random.rand(20000, 3)
b = np.random.rand(16)
I want a 3D array of size (20000,16,3) as result, let's call it c. So for each value of b we will have (20000,3) values multiplying a*b[index], where index=0,1,...,15. In numpy it's pretty straightforward using function resize. However, resize does not exist in dask. Anyone has any idea of how I do this on dask? The array can also be xarray with dask array inside, so if anyone knows how to do it with xarray it will be appreciated as well.
Cheers
If I understand your question correctly, you want to achieve this result
import numpy as np
a = np.random.rand(20000, 3)
b = np.random.rand(16)
result = a[:, np.newaxis, :] * b[np.newaxis, :, np.newaxis]
result.shape # (20000, 16, 3)
That code directly works for a dask.array as well
import dask.array as da
dsk_a = da.from_array(a)
dsk_b = da.from_array(b)
result_dask = dsk_a[:, np.newaxis, :] * dsk_b[np.newaxis, :, np.newaxis]
result_dask.shape # (20000, 16, 3)
(result_dask.compute() == result).all() # True
Let me know if I misunderstood your question. If I have, then it would be helpful if you provide a working numpy code that provides the desired result.

Show class probabilities from Numpy array

I've had a look through and I don't think stack has an answer for this, I am fairly new at this though any help is appreciated.
I'm using an AWS Sagemaker endpoint to return a png mask and I'm trying to display the probability as a whole of each class.
So first stab does this:
np.set_printoptions(threshold=np.inf)
pred_map = np.argmax(mask, axis=0)
non_zero_mask = pred_map[pred_map != 0]) # get everything but background
# print(np.bincount(pred_map[pred_map != 0]).argmax()) # Ignore this line as it just shows the most probable
num_classes = 6
plt.imshow(pred_map, vmin=0, vmax=num_classes-1, cmap='jet')
plt.show()
As you can see I'm removing the background pixels, now I need to show class 1,2,3,4,5 have X probability based on the number of pixels they occupy - I'm unsure if I'll reinvent the wheel by simply taking the total number of elements from the original mask then looping and counting each pixel/class number etc - are there inbuilt methods for this please?
Update:
So after typing this out had a little think and reworded some of searches and came across this.
unique_elements, counts_elements = np.unique(pred_map[pred_map != 0], return_counts=True)
print(np.asarray((unique_elements, counts_elements)))
#[[ 2 3]
#[87430 2131]]
So then I'd just calculate the % based on this or is there a better way? For example I'd do
87430 / 89561(total number of pixels in the mask) * 100
Giving 2 in this case a 97% probability.
Update for Joe's comment below:
rec = Record()
recordio = mx.recordio.MXRecordIO(results_file, 'r')
protobuf = rec.ParseFromString(recordio.read())
values = list(rec.features["target"].float32_tensor.values)
shape = list(rec.features["shape"].int32_tensor.values)
shape = np.squeeze(shape)
mask = np.reshape(np.array(values), shape)
mask = np.squeeze(mask, axis=0)
My first thought was to use np.digitize and write a nice solution.
But then I realized how you can hack it in 10 lines:
import numpy as np
import matplotlib.pyplot as plt
size = (10, 10)
x = np.random.randint(0, 7, size) # your classes, seven excluded.
# empty array, filled with mask and number of occurrences.
x_filled = np.zeros_like(x)
for i in range(1, 7):
mask = x == i
count_mask = np.count_nonzero(mask)
x_filled[mask] = count_mask
print(x_filled)
plt.imshow(x_filled)
plt.colorbar()
plt.show()
I am not sure about the axis convention with imshow
at the moment, you might have to flip the y axis so up is up.
SageMaker does not provide in-built methods for this.

Visualize 1D numpy array as 2D array with matplotlib

I have a 2D array of all the numbers 1 to 100 split by 10. And boolean values for each number being prime or not prime. I'm struggling to figure out how to visualize it like in the image below.
Here is my code to help understand what I have better.
I want to visualize it like this pic online.
# excersize
is_prime = np.ones(100, dtype=bool) # array will be filled with Trues since 1 = True
# For each integer j starting from 2, cross out its higher multiples:
N_max = int(np.sqrt(len(is_prime) - 1))
for j in range(2, N_max + 1):
is_prime[2*j::j] = False
# split an array up into multiple sub arrays
split_primes = np.split(is_prime, 10);
# create overlay for numbers
num_overlay = np.arange(100)
split_overlay = np.split(num_overlay, 10)
plt.plot(split_overlay)
Creating 2D array of the numbers
Check out the documentation for numpy's reshape function. Here you can turn your array into a 2D array by doing:
data = is_prime.reshape(10,10)
we can also make an array of the first 100 integers to use for labeling in a similar fashion:
integers = np.arange(100).reshape(10,10)
Plotting the 2D array
When plotting in 2D you need to use one of the 2D functions that matplotlib provides: e.g. imshow, matshow, pcolormesh. You can either call these functions directly on your array, in which case they will use a colormap and each pixel's color will correspond to the value in associated spot in the array. Or you can explicitly make an RGB image which affords you a bit more control over the color of each box. For this case I think that that is a bit easier to do so the below solution uses that approach. However if you want to annotate heatmaps the matplolib documentation has a great resource for that here. For now we will create an array of RGB values (shape of 10 by 10 by 3) and change the colors of only the prime numbers using numpy's indexing abilities.
#create RGB array that we will fill in
rgb = np.ones((10,10,3)) #start with an array of white
rgb[data]=[1,1,0] # color the places where the data is prime to be white
plt.figure(figsize=(10,10))
plt.imshow(rgb)
# add number annotations
integers = np.arange(100).reshape(10,10)
#add annotations based on: https://stackoverflow.com/questions/20998083/show-the-values-in-the-grid-using-matplotlib
for (i, j), z in np.ndenumerate(integers):
plt.text(j, i, '{:d}'.format(z), ha='center', va='center',color='k',fontsize=15)
# remove axis and tick labels
plt.axis('off')
plt.show()
Resulting in this image:

Why does MinMaxScaler add lines to image?

I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).
Minimal Example
#!/usr/bin/env python
import numpy as np
import scipy
from sklearn import preprocessing
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)
transformed = np.zeros(original.shape, dtype=np.float64)
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)
What happens
Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg,
I get the following "normalized" result:
As you can see there are lines from top to bottom at the right side. What happened there? It seems to me that the normalization went wrong. If so: How do I fix it?
In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.
MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.
For example,
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
transformed = t.reshape(original.shape)
With this, transformed looks like this:
The image looks exactly like the original, because it turns out that in the array original, the minimum and maximum are 0 and 255, respectively, in each channel:
In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)
In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)
So all fit_transform does in this case is transform all the input values to the floating point range [0.0, 1.0] uniformly. If the minimum or maximum was different in one of the channels, the transformed image would look different.
By the way, it is not difficult to perform the transform using pure numpy. (I'm using Python 3, so in the following, the division automatically casts the result to floating point. If you are using Python 2, you'll need to convert one of the argument to floating point, or use from __future__ import division.)
In [58]: omin = original.min(axis=(0, 1), keepdims=True)
In [59]: omax = original.max(axis=(0, 1), keepdims=True)
In [60]: xformed = (original - omin)/(omax - omin)
In [61]: np.allclose(xformed, transformed)
Out[61]: True
(One potential problem with that method is that it will generate an error if one of the channels is constant, because then one of the values in omax - omin will be 0.)

Iterating over multidimensional arrays(images) with numpy array - python

Hy!
I have two images(same dimension) as numpy array imgA - imgB
i would like to iterate each row and column and get somenthing like that:
for i in range(0, h-1):
for j in range(0, w-1):
final[i][j]= imgA[i,j] - imgB[i-k[i],j]
where h and w are the height and the width of the image and k is and array with dimension[h*w].
i have seen this topic:
Iterating over a numpy array
but it doens't work with images, i get the error: too many values to unpack
Is there any way to do that with numpy and python 2.7?
thanks
edit
I try to explain better myself.
I have 2 images in LAB color space.
these images are (288,384,3).
Now I would like to make deltaE so I could do like that(spitting the 2 arrays):
imgLabL=np.dsplit(imgL,3)
imgLabR=np.dsplit(imgR,3)
imgLl=imgLabL[0]
imgLa=imgLabL[1]
imgLb=imgLabL[2]
imgRl=imgLabR[0]
imgRa=imgLabR[1]
imgRb=imgLabR[2]
delta=np.sqrt(((imgLl-imgRl)**2) + ((imgLa - imgRa)**2) + ((imgLb - imgRb)**2) )
Till now everything is fine.
But now i have this array k of size (288,384).
So now i need a new delta but with different x axis,like the pixel in imgRl(0,0) i want to add the pixel in imgLl(0+k,0)
do you get more my problems?
I'm pretty sure that whatever it is you are trying to do can be vectorized and run without any loops in it. But the way your code is written, it is no surprise that it doesn't work...
If k is an array of shape (h, w), then k[i] is an array of shape (w,). when you do i-k[i], numpy will do its broadcasting magic, and you will get an array of shape (w,). So you are indexing imgB with an array of shape (w,) and a single integer. Because one of the items in the indexing is an array, fancy indexing kicks in. So assuming imgB also has shape (h, w, 1), the return value of imgB[i-k[i], j] will not be an array of shape (1,), but an array of shape (w, 1). When you then try to substract that from imgA[i, j], which is an array of shape (1,), broadcasting magic works again, and so you get an array of shape (w, 1).
We do not know what is final. But if it is an array of shape (h, w, 1), as imgA and imgB, then final[i][j] is an array of shape (1,), and you are trying to assign to it an array of shape (w, 1), which does not fit. Hence the operand requires a reduction,but reduction is not enabled error message.
EDIT
You don't really need to split your arrays to compute DeltaE...
def deltaE(a, b) :
return np.sqrt(((a - b)**2).sum(axis=-1))
delta = deltaE(imgLabL, imgLabR)
I still don't understand what you want to do in the second case... If you want to compare the two images displaced along the x-axis, I would suggest using np.roll:
deltaE(imgLabL, np.roll(imgLabR, k, axis=0))
will have at position (r, c) the deltaE between the pixel (r, c) of imgLabL and the pixel (r - k, c) of imgLAbR. Is that what you want?
I usually use numpy.nditer, the docs for which are here and have many examples. Briefly:
import numpy as np
a = np.ones([4,4])
it = np.nditer(a)
for elem in a:
#do stuff
You can also use c style iteration, i.e.
while not it.finished:
#do stuff
it.iternext()
If you need to access the indices of your arrays. In your situation, I would zip your two images together to create an array of shape [2,h,w] and then iterate over this, filling an empty array with the results of the computation.