Is there a numpy function like np.fill(), but for arrays as fill value?

I'm trying to build an array of some given shape in which all elements are given by another array. Is there a function in numpy which does that efficiently, similar to np.full(), or any other elegant way, without simply employing for loops?
Example: Let's say I want an array with shape
(dim1,dim2) filled with a given, constant scalar value. Numpy has np.full() for this:
my_array = np.full((dim1,dim2),value)
I'm looking for an analogous way of doing this, but I want the array to be filled with another array of shape (filldim1,filldim2). A brute-force way would be this:
my_array = np.array([])
for i in range(dim1):
    for j in range(dim2):
        my_array = np.append(my_array, fill_array)
my_array = my_array.reshape((dim1, dim2, filldim1, filldim2))
EDIT
I was being stupid: np.full() does take arrays as the fill value if the shape is extended accordingly:
my_array = np.full((dim1,dim2,filldim1,filldim2),fill_array)
Thanks for pointing that out, @Arne!

You can use np.tile:
>>> shape = (2, 3)
>>> fill_shape = (4, 5)
>>> fill_arr = np.random.randn(*fill_shape)
>>> arr = np.tile(fill_arr, [*shape, 1, 1])
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
Edit: better answer, as suggested by @Arne, directly using np.full:
>>> arr = np.full([*shape, *fill_shape], fill_arr)
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
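Both versions materialize dim1 * dim2 physical copies of the fill array. If a read-only view is enough, a hedged alternative (not part of either answer above) is np.broadcast_to, which yields the same shape without copying any data; note the result is read-only, so writing to it raises an error:
>>> view = np.broadcast_to(fill_arr, (*shape, *fill_shape))
>>> view.shape
(2, 3, 4, 5)
>>> np.all(view[0, 0] == fill_arr)
True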

Related

randomly choose value between two numpy arrays

I have two numpy arrays:
left = np.array([2, 7])
right = np.array([4, 7])
right_p1 = right + 1
What I want to do is
rand = np.zeros(left.shape[0])
for i in range(left.shape[0]):
    rand[i] = np.random.randint(left[i], right_p1[i])
Is there a way I could do this without using a for loop?
You could try with:
extremes = zip(left, right_p1)
rand = map(lambda x: np.random.randint(x[0], x[1]), extremes)
This way you will end up with a map object. If you need to save memory you can keep it that way; otherwise you can get the full np.array by passing it through a list conversion, like this:
rand = np.array(list(map(lambda x: np.random.randint(x[0], x[1]), extremes)))
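Alternatively, the bounds can be passed as arrays directly, avoiding zip/map altogether. A minimal sketch, assuming NumPy >= 1.17 so the Generator API is available (Generator.integers broadcasts array-valued low and high):
import numpy as np

left = np.array([2, 7])
right = np.array([4, 7])

rng = np.random.default_rng()
rand = rng.integers(left, right + 1)  # high is exclusive, so add 1 to include it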

Scalar multiplication of arrays with another array in numpy

I have an array of arrays of shape (4,2) like:
ar1 = np.array([[1, 1],
                [2, 2],
                [3, 3],
                [4, 4]])
and a second array of shape (2,) like:
ar2 = np.array([2,3])
I'm trying to multiply every first item of the first array times the first item of the second array, and every second item of the first array times the second item of the second array, such as the result is:
ar_result = np.array([[2, 3],
                      [4, 6],
                      [6, 9],
                      [8, 12]])
Is there a way to do this in an easy and vectorized way?
When I try ar1 * ar2, I get this error:
ValueError: operands could not be broadcast together with shapes (4,) (2,)
Thanks
EDIT: To clarify, in my case ar1 is a DataFrame df1 column or row, and ar2 is the content of a cell in another DataFrame df2 (df2.loc[x,y] = [2,3])
Have you tried ar1 * ar2?
ar_result = ar1 * ar2
# array([[ 2,  3],
#        [ 4,  6],
#        [ 6,  9],
#        [ 8, 12]])
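Given the EDIT, the (4,) vs (2,) shapes in the error suggest the operands arrived as object arrays of lists rather than numeric arrays. A hedged sketch of the conversion, using hypothetical frames in place of the question's df1/df2:
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's DataFrames.
df1 = pd.DataFrame({'col': [[1, 1], [2, 2], [3, 3], [4, 4]]})
df2 = pd.DataFrame({'y': [[2, 3]]})

ar1 = np.stack(df1['col'].to_numpy())  # column of lists -> numeric (4, 2) array
ar2 = np.asarray(df2.loc[0, 'y'])      # cell holding a list -> shape (2,)
ar_result = ar1 * ar2                  # now broadcasts to (4, 2)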

Taking a list of 2-d arrays and getting the non-zero values as ones in a single array with Numpy

I have a list of 2-d numpy arrays, and I wish to create a single array in which every position that is non-zero in any of the arrays (an element-wise OR) is set to 1. For example
arr1 = np.array([[1,0],[0,0]])
arr2 = np.array([[0,10],[0,0]])
arr3 = np.array([[0,0],[0,8]])
arrs = [arr1, arr2, arr3]
And so my op would yield
op(arrs) = [[1, 1], [0, 1]]
What is an efficient way to do this in numpy (for about 8 image arrays of 600 by 600)?
Took me a while to understand. Try just summing all the arrays, keeping their dimensions, and then replacing the non-zero values with 1, as follows:
def op(arrs):
    return np.where(np.add.reduce(arrs) != 0, 1, 0)
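A quick check on the example above (the element-wise sum is [[1, 10], [0, 8]], so every non-zero entry becomes 1):
>>> op([arr1, arr2, arr3])
array([[1, 1],
       [0, 1]])
Note that summing can cancel positive and negative values; for non-negative image data that cannot happen, but np.logical_or.reduce(arrs).astype(int) expresses the OR directly and avoids the issue.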

pyspark PandasUDFType.SCALAR conversion of Row arrays goes wrong

I want to use PandasUDFType.SCALAR to operate on Row arrays like below:
df = spark.createDataFrame([([1, 2, 3, 2],), ([4, 5, 5, 4],)], ['data'])

@pandas_udf(ArrayType(IntegerType()), PandasUDFType.SCALAR)
def s(x):
    z = x.apply(lambda xx: xx * 2)
    return z

df.select(s(df.data)).show()
but it fails with:
pyarrow.lib.ArrowInvalid: trying to convert NumPy type int32 but got int64
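A hedged guess at the cause: createDataFrame infers the integer lists as LongType (int64), while the UDF declares ArrayType(IntegerType()) (int32), so Arrow refuses the conversion on the way back. Aligning the declared type with the data is one possible fix, sketched here but untested:
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import ArrayType, LongType

# Declare LongType to match the int64 values pandas produces;
# alternatively, keep IntegerType and cast each result to int32.
@pandas_udf(ArrayType(LongType()), PandasUDFType.SCALAR)
def s(x):
    return x.apply(lambda xx: xx * 2)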

Finding those elements in an array which are "close"

I have an 1 dimensional sorted array and would like to find all pairs of elements whose difference is no larger than 5.
A naive approach would be to make N^2 comparisons, doing something like
diffs = np.tile(x, (x.size,1) ) - x[:, np.newaxis]
D = np.logical_and(diffs>0, diffs<5)
indices = np.argwhere(D)
Note here that the output of my example consists of indices into x. If I wanted the values of x which satisfy the criteria, I could do x[indices].
This works for smaller arrays, but not arrays of the size with which I work.
An idea I had was to find where there are gaps larger than 5 between consecutive elements. I would split the array into two pieces, and compare all the elements in each piece.
Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?
Here is a small example:
x = np.array([ 9, 12,
              21,
              36, 39, 44, 46, 47,
              58,
              64, 65])
the result should look like
array([[ 0,  1],
       [ 3,  4],
       [ 5,  6],
       [ 5,  7],
       [ 6,  7],
       [ 9, 10]], dtype=int64)
Here is a solution that iterates over offsets while shrinking the set of candidates until there are none left:
import numpy as np

def f_pp(A, maxgap):
    # d[i] tracks the gap A[i + k] - A[i]; start with consecutive gaps (k = 1)
    d0 = np.diff(A)
    d = d0.copy()
    IDX = []
    k = 1
    idx, = np.where(d <= maxgap)   # candidates whose offset-k gap is small enough
    vidx = idx[d[idx] > 0]         # keep strictly positive gaps, as in the question
    while vidx.size:
        IDX.append(vidx[:, None] + (0, k))   # record the pairs (i, i + k)
        if idx[-1] + k + 1 == A.size:        # i + k + 1 would run off the end
            idx = idx[:-1]
        d[idx] = d[idx] + d0[idx + k]        # extend each gap to offset k + 1
        k += 1
        idx = idx[d[idx] <= maxgap]          # shrink the candidate set
        vidx = idx[d[idx] > 0]
    return np.concatenate(IDX, axis=0)
data = np.cumsum(np.random.exponential(size=10000)).repeat(np.random.randint(1, 20, (10000,)))
pairs = f_pp(data, 1)
#pairs = set(map(tuple, pairs))
from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds)*10, 'ms')
Sample run:
99963 points 1020651 close pairs
pp 43.00256529124454 ms
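Applied to the question's example, maxgap=4 reproduces the expected strict diffs < 5 behaviour (f_pp itself uses <=); the pairs come back unsorted, so sort the rows to compare:
pairs = f_pp(x, 4)
pairs = pairs[np.lexsort((pairs[:, 1], pairs[:, 0]))]
# -> [[0 1] [3 4] [5 6] [5 7] [6 7] [9 10]], matching the expected result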
Your idea of slicing the array is a very efficient approach. Since your data are sorted you can just calculate the difference and split it:
d = np.diff(x)
ind = np.where(d > 5)[0] + 1   # + 1: split *after* each gap larger than 5
pieces = np.split(x, ind)
Here pieces is a list over which you can then loop, applying your own code to every element, as sketched below.
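A minimal sketch of that loop (my own, not the answer author's code), reusing the question's dense comparison within each small piece and shifting the local indices back into indices of x:
import numpy as np

def close_pairs_split(x, gap=5):
    d = np.diff(x)
    ind = np.where(d > gap)[0] + 1           # boundaries after each large gap
    pairs = []
    offset = 0
    for piece in np.split(x, ind):
        # the question's N^2 comparison, but only within one small piece
        diffs = piece[np.newaxis, :] - piece[:, np.newaxis]
        local = np.argwhere(np.logical_and(diffs > 0, diffs < gap))
        pairs.append(local + offset)         # back to global indices
        offset += piece.size
    return np.concatenate(pairs, axis=0)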
The best algorithm is highly dependent on the nature of your data, which I don't know. For example, another possibility is to write a nested loop:
pairs = []
for i in range(x.size):
    j = i + 1
    while j < x.size and x[j] - x[i] <= 5:   # bounds check must come first
        pairs.append([i, j])
        j += 1
If you want it to be more clever, you can edit the outer loop in a way to jump ahead when j hits a gap; one way to realize that is sketched below.
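A hedged sketch of that refinement (my own, under the same <= 5 convention as the loop above): a two-pointer scan. Because x is sorted, the upper bound hi never moves backward, so the scanning itself costs O(n) on top of the emitted pairs:
import numpy as np

def close_pairs_two_pointer(x, gap=5):
    pairs = []
    hi = 1
    for i in range(x.size):
        if hi <= i:                # hi may lag behind after a large gap
            hi = i + 1
        while hi < x.size and x[hi] - x[i] <= gap:
            hi += 1                # hi only ever advances: O(n) overall
        pairs.extend([i, j] for j in range(i + 1, hi))
    return np.array(pairs)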