Weird behavior about numpy.mean()


I am using numpy 1.21.
import numpy as np
a = [(1, 1), (2,)]
np.mean(a)
It returns:
array([0.5, 0.5, 1. ])
It's not the mean of the flattened array. Can anyone explain why this is returned?
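One way to see what is likely happening, sketched here as an assumption rather than a confirmed diagnosis: the ragged list cannot form a regular numeric array, so NumPy 1.21 builds a 1-D object array that holds the two tuples (with a VisibleDeprecationWarning; NumPy 1.24 and later raise a ValueError unless dtype=object is passed). Summing an object array applies Python's + to the elements, and + on tuples concatenates them. The mean then divides that concatenated tuple by the element count 2, which treats it as a numeric array. The sketch builds the object array explicitly and reproduces the two steps separately:

```python
import numpy as np

# Build the ragged data explicitly as a 1-D object array of two tuples,
# which is what NumPy 1.21 appears to do implicitly (with a warning).
a = np.empty(2, dtype=object)
a[0] = (1, 1)
a[1] = (2,)

# Summing an object array applies Python's + to the elements;
# for tuples, + concatenates: (1, 1) + (2,) -> (1, 1, 2)
total = np.add.reduce(a)
print(total)

# Dividing that tuple by the element count converts it to a numeric array,
# which matches the result reported in the question.
result = np.true_divide(total, a.size)
print(result)
```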

Related

Speed up applying a transformation to each index value of a given array

I need to apply a function to the result of a transformation of all index values of a given numpy array. The following code does this:
import numpy as np
from matplotlib.transforms import IdentityTransform
# some 2D array
a = np.empty((2,3))
# some affine transformation, identity is just an example here
trans = IdentityTransform()
# some function taking a 2D index and returning some value depending
# on that index, again just an example
def f(idx):
    return (idx[0]+idx[1])/2
# apply f to the result of transforming each index of a
b=np.empty_like(a)
for idx in np.ndindex(a.shape):
    b[idx] = f(trans.transform(idx))
print(b)
This prints the following correct result:
[[0.  0.5 1. ]
 [0.5 1.  1.5]]
The problem is that the code is too slow when the shape of a gets larger, say 2000x3000. Is there a way to speed this up?
My idea is to create an array of indices of a idx = [[0,0], [0,1], ..., [1,2]], then transform this array in one go using something like tmp = trans.transform(idx), and lastly apply f to every element with np.vectorize(f)(tmp).
Is this a reasonable approach? If yes, what would it actually look like? If not, are there any alternatives?
Edit: I managed to get at tmp via the following code:
tmp=trans.transform(np.asarray([idx for idx in np.ndindex(a.shape)]))
So now I have an array containing the results of the affine transformation for every index value of a. But this seems to use an awful lot of memory.

I'll post an answer myself with what I figured out now. Maybe it is of use for someone.
To answer the first part of my question, I found a fast and efficient way to transform all the index values: take the result of np.indices() and massage it until it fits what t.transform() expects.
Given some array a = np.empty((2,3)), the indices of that array can be obtained via np.indices(a.shape). This returns two 2D arrays (one for each dimension of a, actually). What I failed to understand was how to turn these results into something transform() understands.
The key is to apply np.ravel() to each of the arrays that np.indices() returns:
>>> a=np.empty((2,3))
>>> list(map(np.ravel, np.indices(a.shape)))
[array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2])]
Now I have a list of arrays containing all the x and y indices, which just needs to be put together with np.vstack() and then transposed to get an array of all (x, y) indices, and this is the form transform() will accept.
>>> l=list(map(np.ravel, np.indices(a.shape)))
>>> np.vstack(l).transpose()
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])
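For a 2-D shape, the same (row, column) pairs can also be produced without the intermediate list, by reshaping the output of np.indices() directly. This is a sketch for illustration; index_pairs is a hypothetical helper name:

```python
import numpy as np

def index_pairs(shape):
    """Return an (N, ndim) array with one row per index of an array of `shape`.

    np.indices(shape) has shape (ndim, *shape); flattening the trailing axes
    and transposing gives the index tuples in row-major order.
    """
    return np.indices(shape).reshape(len(shape), -1).T

a = np.empty((2, 3))
pairs = index_pairs(a.shape)
print(pairs)

# Same result as the ravel/vstack version above
l = list(map(np.ravel, np.indices(a.shape)))
assert np.array_equal(pairs, np.vstack(l).transpose())
```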
And finally, for some arbitrary affine transformation:
>>> from matplotlib.transforms import Affine2D
>>> t = Affine2D().translate(10, 20).scale(0.5)
>>> t.transform(np.vstack(l).transpose())
array([[ 5. , 10. ],
       [ 5. , 10.5],
       [ 5. , 11. ],
       [ 5.5, 10. ],
       [ 5.5, 10.5],
       [ 5.5, 11. ]])
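To see what t.transform() computes here without matplotlib, the same affine map can be written as a 3x3 homogeneous matrix and applied to all index pairs with one matrix product. The matrix below is my own reconstruction of Affine2D().translate(10, 20).scale(0.5) (translate first, then scale), not code taken from matplotlib, and the helper names are hypothetical; it reproduces the output above:

```python
import numpy as np

def translate(tx, ty):
    """3x3 homogeneous translation matrix."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def scale(s):
    """3x3 homogeneous uniform scaling matrix."""
    return np.array([[s, 0.0, 0.0],
                     [0.0, s, 0.0],
                     [0.0, 0.0, 1.0]])

def apply_affine(mtx, points):
    """Apply a 3x3 affine matrix to an (N, 2) array of points at once."""
    # Use the 2x2 linear part and the translation column directly,
    # so no homogeneous coordinate has to be appended to the N points
    return points @ mtx[:2, :2].T + mtx[:2, 2]

# Translate first, then scale: the later step is multiplied on the left
mtx = scale(0.5) @ translate(10, 20)

points = np.indices((2, 3)).reshape(2, -1).T
out = apply_affine(mtx, points)
print(out)
```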
This is quite fast, even for larger array sizes. If the shape gets big enough (something like 20000x30000), I run out of memory, but for shapes like 10000x10000 it is still amazingly fast.
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((20, 10)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.0003051299718208611
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((200, 100)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.0026413939776830375
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((2000, 1000)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.35055489401565865
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((20000, 10000)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
43.62860555597581
Now for the second part: to apply the function to each of the transformed index values, I use the following code for now, which is fast enough in my case.
xxyy = t.transform(np.vstack(...).transpose())
np.fromiter((f(*xy) for xy in xxyy), dtype=np.short, count=len(xxyy))
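If f only uses element-wise arithmetic, as the example f in the question does, the Python-level generator can be skipped: pass the two coordinate columns of the transformed array to f in one call and reshape the result back to the shape of a. This is a sketch under that assumption; the two-argument f and the untransformed index pairs standing in for t.transform(...) are illustrative, not the poster's actual data:

```python
import numpy as np

def f(x, y):
    # Element-wise arithmetic works the same on scalars and on arrays
    return (x + y) / 2

a = np.empty((2, 3))
# Stand-in for t.transform(...): the index pairs under the identity map
xxyy = np.indices(a.shape).reshape(2, -1).T.astype(float)

# Loop version, as in the answer (one Python call per index pair)
looped = np.fromiter((f(*xy) for xy in xxyy), dtype=float, count=len(xxyy))

# Vectorized version: one call on whole columns, then back to a's shape
b = f(xxyy[:, 0], xxyy[:, 1]).reshape(a.shape)
print(b)
```

With the identity stand-in, b matches the output of the original loop in the question.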

Applying gaussian blur to images in a loop

I have a simple ndarray with shape as:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(trainImg[0]) #can display a sample image
print(trainImg.shape)  # (4750, 128, 128, 3), the shape of the dataset
I intend to apply Gaussian blur to all the images. The for loop I went with:
trainImg_New = np.empty((4750, 128, 128,3))
for idx, img in enumerate(trainImg):
    trainImg_New[idx] = cv2.GaussianBlur(img, (5, 5), 0)
I tried to display a sample blurred image as:
plt.imshow(trainImg_New[0]) #view a sample blurred image
but I get an error:
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
It just displays a blank image.

TL;DR:
The error is most likely caused by trainImg_New being of a float datatype while its values are larger than 1. So, as @Frightera mentioned, try using np.uint8 to convert the images' datatype.
I tested the snippets as below:
import numpy as np
import matplotlib.pyplot as plt
import cv2
trainImg_New = np.random.rand(4750, 128, 128,3) # all values are in the range [0, 1)
save = np.empty((4750, 128, 128,3))
for idx, img in enumerate(trainImg_New):
    save[idx] = cv2.GaussianBlur(img, (5, 5), 0)
plt.imshow(np.float32(save[0]+255)) # reproduces the message from the question
plt.imshow(np.float32(save[0]+10)) # reproduces the message from the question
plt.imshow(np.uint8(save[0]+10)) # displays without the message
First of all, cv2.GaussianBlur does not change the range of the array's values, and the values of the original image arrays are valid. So I believe the only cause is that the datatype of trainImg_New[0] does not match its value range.
The snippets above show that the datatype of trainImg_New[0] determines which range of values imshow accepts.
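Assuming trainImg holds uint8 pixels (0 to 255), which fits the message in the question, the root of the problem is the output buffer: np.empty(shape) defaults to float64, so the blurred 0..255 values end up as floats, which imshow expects in [0, 1]. cv2.GaussianBlur returns an image of the same size and type as its input, so preallocating with np.empty_like(trainImg) would keep uint8; alternatively, a float buffer can be converted back to uint8 before plotting. A minimal NumPy-only sketch of both; the random images and the helper name to_displayable are hypothetical:

```python
import numpy as np

def to_displayable(img):
    """Return a uint8 copy of `img` that imshow shows without clipping.

    Assumes pixel values are on a 0..255 scale, e.g. a uint8 image that was
    stored in (or blurred into) a float64 buffer.
    """
    # Round to the nearest integer, keep values inside 0..255, then cast
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)

# Hypothetical stand-in for trainImg: a few random uint8 RGB images
rng = np.random.default_rng(0)
train_img = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

# np.empty(shape) defaults to float64, as in the question's loop
float_buf = np.empty(train_img.shape)
# np.empty_like(train_img) keeps the source dtype (uint8 here)
uint8_buf = np.empty_like(train_img)

float_buf[:] = train_img  # 0..255 values now stored as floats
uint8_buf[:] = train_img  # 0..255 values stored as uint8

shown = to_displayable(float_buf[0])
print(float_buf.dtype, uint8_buf.dtype, shown.dtype)
```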

I suggest you use tfa.image.gaussian_filter2d from the tensorflow_addons package. I think you'll be able to pass all your images at once.
import tensorflow as tf
from skimage import data
import tensorflow_addons as tfa
import matplotlib.pyplot as plt
image = data.astronaut()
plt.imshow(image)
plt.show()
blurred = tfa.image.gaussian_filter2d(image,
                                      filter_shape=(25, 25),
                                      sigma=3.)
plt.imshow(blurred)
plt.show()

What's the best way to compute row-wise (or axis-wise) dot products with jax?

I have two numerical arrays of shape (N, M). I'd like to compute a row-wise dot product. I.e. produce an array of shape (N,) such that the nth row is the dot product of the nth row from each array.
I'm aware of numpy's inner1d method. What would the best way be to do this with jax? jax has jax.numpy.inner, but this does something else.

You can define your own jit-compiled version of inner1d in a few lines of jax code:
import jax
@jax.jit
def inner1d(X, Y):
    return (X * Y).sum(-1)
Testing it out:
import jax.numpy as jnp
import numpy as np
from numpy.core import umath_tests
X = np.random.rand(5, 10)
Y = np.random.rand(5, 10)
print(umath_tests.inner1d(X, Y))
print(inner1d(jnp.array(X), jnp.array(Y)))
# [2.23219571 2.1013316 2.70353783 2.14094973 2.62582531]
# [2.2321959 2.1013315 2.703538 2.1409497 2.6258256]

You can try jax.numpy.einsum. Here is the implementation using NumPy's einsum:
import numpy as np
from numpy.core.umath_tests import inner1d
arr1 = np.random.randint(0,10,[5,5])
arr2 = np.random.randint(0,10,[5,5])
arr = inner1d(arr1,arr2)
arr
array([ 87, 200, 229, 81, 53])
np.einsum('...i,...i->...',arr1,arr2)
array([ 87, 200, 229, 81, 53])
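Since the answer suggests jax.numpy.einsum but only shows the NumPy version, here is a sketch of the same subscript string in JAX, alongside jax.vmap over jnp.dot as another way to express a row-wise dot product. The function names are hypothetical, and JAX computes in float32 by default, so the results agree with NumPy only up to rounding:

```python
import jax
import jax.numpy as jnp
import numpy as np

@jax.jit
def rowwise_einsum(X, Y):
    # '...i,...i->...': multiply matching entries along the last axis, sum them
    return jnp.einsum('...i,...i->...', X, Y)

# vmap maps jnp.dot over axis 0 of both inputs, i.e. one row pair per call
rowwise_vmap = jax.jit(jax.vmap(jnp.dot))

X = np.random.rand(5, 10)
Y = np.random.rand(5, 10)

expected = (X * Y).sum(-1)  # NumPy reference in float64
r_einsum = rowwise_einsum(jnp.asarray(X), jnp.asarray(Y))
r_vmap = rowwise_vmap(jnp.asarray(X), jnp.asarray(Y))
print(r_einsum)
print(r_vmap)
```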

NumPy for dummies using Jupyter

I am learning how to use NumPy now, and after creating a 1-dim ndarray like this:
import numpy as np
x = np.array([1.5, 2.3, 4, 5.8], dtype = np.int64)
when printing x:
print(x)
The dtype = np.int64 does not work in the web-based Jupyter notebook (Anaconda). Can someone help me, please? Thanks a lot, guys!
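A likely explanation, offered as an assumption since the question does not say what was expected: dtype=np.int64 does take effect, but converting the float values to int64 truncates the fractional parts toward zero instead of raising an error. The sketch prints the array and its dtype, then shows two alternatives: keep float64 to preserve the decimals, or round explicitly before casting:

```python
import numpy as np

# Float inputs cast to int64: fractional parts are dropped (truncated)
x = np.array([1.5, 2.3, 4, 5.8], dtype=np.int64)
print(x, x.dtype)

# Keep the decimals by using a floating-point dtype instead
x_float = np.array([1.5, 2.3, 4, 5.8], dtype=np.float64)
print(x_float, x_float.dtype)

# Round to the nearest integer first (np.rint rounds halves to even), then cast
x_rounded = np.rint([1.5, 2.3, 4, 5.8]).astype(np.int64)
print(x_rounded)
```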

Turn 2D NumPy array into 1D array for plotting a histogram

I'm trying to plot a histogram with matplotlib.
I need to convert my one-row 2D array
[[1,2,3,4]] # shape is (1,4)
into a 1D Array
[1,2,3,4] # shape is (4,)
How can I do this?

Adding ravel as another alternative for future searchers. From the docs,
It is equivalent to reshape(-1, order=order).
Since the array is 1xN, all of the following are equivalent:
arr1d = np.ravel(arr2d)
arr1d = arr2d.ravel()
arr1d = arr2d.flatten()
arr1d = np.reshape(arr2d, -1)
arr1d = arr2d.reshape(-1)
arr1d = arr2d[0, :]
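The variants agree on values and shape, but not on one detail: some return a view of the original data and some return a copy. A quick check with np.shares_memory for a C-contiguous 1xN array like the one in the question:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4]])  # shape (1, 4)

variants = {
    "np.ravel(arr2d)": np.ravel(arr2d),
    "arr2d.ravel()": arr2d.ravel(),
    "arr2d.flatten()": arr2d.flatten(),
    "np.reshape(arr2d, -1)": np.reshape(arr2d, -1),
    "arr2d.reshape(-1)": arr2d.reshape(-1),
    "arr2d[0, :]": arr2d[0, :],
}

for name, arr1d in variants.items():
    # Every variant has the same values and shape (4,)...
    assert arr1d.shape == (4,) and arr1d.tolist() == [1, 2, 3, 4]
    # ...but flatten() always copies, while the others are views here
    print(f"{name:24} view={np.shares_memory(arr1d, arr2d)}")
```

This matters if the 1D result is modified afterwards: writing into a view also changes arr2d.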

You can directly index the row:
>>> import numpy as np
>>> x2 = np.array([[1,2,3,4]])
>>> x2.shape
(1, 4)
>>> x1 = x2[0,:]
>>> x1
array([1, 2, 3, 4])
>>> x1.shape
(4,)
Or you can use squeeze:
>>> xs = np.squeeze(x2)
>>> xs
array([1, 2, 3, 4])
>>> xs.shape
(4,)

reshape will do the trick.
There's also a more specific function, flatten, that appears to do exactly what you want.

The answer provided by mtrw does the trick for an array that only has one row, like this one. However, if you have a 2D array with values in two dimensions, you can convert it as follows:
a = np.array([[1,2,3],[4,5,6]])
From here you can find the shape of the array with np.shape and take the product of its entries with np.prod, which gives the number of elements. If you then use np.reshape() to reshape the array to a single axis of that length, you have a solution that always works.
>>> np.reshape(a, np.prod(a.shape))
array([1, 2, 3, 4, 5, 6])
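The element count is also available directly as a.size, and reshape accepts -1 to infer the length itself, so the three spellings below give the same 1D array for the example a:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# The number of elements, computed two ways
assert a.size == np.prod(a.shape) == 6

flat_prod = np.reshape(a, np.prod(a.shape))
flat_size = np.reshape(a, a.size)
flat_infer = np.reshape(a, -1)  # -1 lets NumPy infer the length
print(flat_infer)
```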

Use the flat attribute (ndarray.flat):
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[1,0,0,1],
              [2,0,1,0]])
plt.hist(a.flat, [0,1,2,3])
The flat property returns a 1D iterator over your 2D array. This method generalizes to any number of rows (or dimensions). For large arrays it can be much more efficient than making a flattened copy.
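To check what gets binned, here is a sketch that uses np.histogram (the NumPy counting function, chosen so the example runs without a plotting backend) on the same array and bin edges, after confirming that a.flat yields the same elements as a.ravel():

```python
import numpy as np

a = np.array([[1, 0, 0, 1],
              [2, 0, 1, 0]])
bins = [0, 1, 2, 3]

# a.flat iterates over all elements in row-major order, like a.ravel()
assert np.array_equal(np.fromiter(a.flat, dtype=a.dtype), a.ravel())

# Bins are [0, 1), [1, 2), [2, 3]; the last bin includes its right edge
counts, edges = np.histogram(a.ravel(), bins=bins)
print(counts)
```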
