Taking a list of 2-d arrays and getting the non-zero values as ones in a single array with Numpy - numpy

I have a list of 2-d numpy arrays, and I wish to create one array consisting of the non-zero values (or-wise) of each array set to 1. For example
arr1 = np.array([[1,0],[0,0]])
arr2 = np.array([[0,10],[0,0]])
arr3 = np.array([[0,0],[0,8]])
arrs = [arr1, arr2, arr3]
And so my op would yield
op(arrs) = [[1, 1], [0, 1]]
What is an efficient way to do this in numpy (for about 8 image arrays of 600 by 600)?

It took me a while to understand the question. Sum all the arrays element-wise, which keeps their shape, and then replace every non-zero value with 1:
import numpy as np
def op(arrs):
    return np.where(np.add.reduce(arrs) != 0, 1, 0)
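One caveat with summing is that values of opposite sign can cancel to zero. A minimal sketch of an alternative that ORs the non-zero masks instead is shown here; the name op_any is made up for illustration and is not part of the original answer.

```python
import numpy as np

def op_any(arrs):
    # Stack into shape (n_arrays, rows, cols), then OR the non-zero masks over axis 0.
    stacked = np.stack(arrs)
    return (stacked != 0).any(axis=0).astype(int)

arr1 = np.array([[1, 0], [0, 0]])
arr2 = np.array([[0, 10], [0, 0]])
arr3 = np.array([[0, 0], [0, 8]])
result = op_any([arr1, arr2, arr3])
print(result)

# Values that would cancel when summed (1 and -1) are still flagged as non-zero.
cancel = op_any([np.array([[1, 0]]), np.array([[-1, 0]])])
print(cancel)
```

For non-negative image data the two approaches should agree; the mask version simply does not depend on the sign of the values.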

Related

Scalar multiplication of arrays with another array in numpy

I have an array of arrays of size (4,1) like:
ar1 = np.array([[1,1],
                [2,2],
                [3,3],
                [4,4]])
and a second array of size (1,2) like:
ar2 = np.array([2,3])
I'm trying to multiply the first item of every row of the first array by the first item of the second array, and the second item of every row by the second item of the second array, so that the result is:
ar_result = np.array([[2,3],
                      [4,6],
                      [6,9],
                      [8,12]])
Is there a way to do this in an easy and vectorized way?
When I try ar1*ar2, I get this error:
ValueError: operands could not be broadcast together with shapes (4,) (2,)
Thanks
EDIT: To clarify, in my case ar1 is a DataFrame df1 column or row, and ar2 is the content of a cell in another DataFrame df2 (df2.loc[x,y] = [2,3])
Have you tried ar1 * ar2?
ar_result = ar1 * ar2
#array([[ 2, 3],
# [ 4, 6],
# [ 6, 9],
# [ 8, 12]])
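The shapes in the error message, (4,) and (2,), suggest that in the DataFrame case ar1 is a 1-D object array whose elements are lists rather than a (4, 2) numeric array. The sketch below shows both situations; the object array col is a hypothetical stand-in for the DataFrame column and is not taken from the question.

```python
import numpy as np

# Plain numeric case: shapes (4, 2) and (2,) broadcast row by row.
ar1 = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
ar2 = np.array([2, 3])
ar_result = ar1 * ar2

# Hypothetical stand-in for a DataFrame column whose cells are lists:
# a 1-D object array of shape (4,), which cannot broadcast against (2,).
col = np.empty(4, dtype=object)
for i, pair in enumerate([[1, 1], [2, 2], [3, 3], [4, 4]]):
    col[i] = pair

# Convert the column of lists into a regular (4, 2) array before multiplying.
stacked = np.array(col.tolist())
from_col = stacked * ar2
print(from_col)
```

With a pandas column, calling .tolist() on the column before np.array would play the same role, assuming every cell holds a list of the same length.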

Is there a numpy function like np.fill(), but for arrays as fill value?

I'm trying to build an array of some given shape in which all elements are given by another array. Is there a function in numpy which does that efficiently, similar to np.full(), or any other elegant way, without simply employing for loops?
Example: Let's say I want an array with shape
(dim1,dim2) filled with a given, constant scalar value. Numpy has np.full() for this:
my_array = np.full((dim1,dim2),value)
I'm looking for an analogous way of doing this, but I want the array to be filled with another array of shape (filldim1,filldim2). A brute-force way would be this:
my_array = np.array([])
for i in range(dim1):
    for j in range(dim2):
        my_array = np.append(my_array,fill_array)
my_array = my_array.reshape((dim1,dim2,filldim1,filldim2))
EDIT
I was being stupid, np.full() does take arrays as fill value if the shape is modified accordingly:
my_array = np.full((dim1,dim2,filldim1,filldim2),fill_array)
Thanks for pointing that out, @Arne!
You can use np.tile:
>>> shape = (2, 3)
>>> fill_shape = (4, 5)
>>> fill_arr = np.random.randn(*fill_shape)
>>> arr = np.tile(fill_arr, [*shape, 1, 1])
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
Edit: better answer, as suggested by @Arne, directly using np.full:
>>> arr = np.full([*shape, *fill_shape], fill_arr)
>>> arr.shape
(2, 3, 4, 5)
>>> np.all(arr[0, 0] == fill_arr)
True
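If a read-only result is acceptable, np.broadcast_to may be worth considering as well, since it returns a view rather than copying the fill array into every cell. This is a sketch with example shapes chosen for illustration, not something proposed in the answers above.

```python
import numpy as np

shape = (2, 3)
fill_arr = np.arange(20).reshape(4, 5)

# Broadcast the (4, 5) block to (2, 3, 4, 5) without copying its data.
view = np.broadcast_to(fill_arr, (*shape, *fill_arr.shape))
print(view.shape)
print(np.array_equal(view[1, 2], fill_arr))

# The view is read-only; call view.copy() if a writable array is needed.
print(view.flags.writeable)
```

np.full and np.tile allocate the whole array, which is what you want if the cells will later be modified independently.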

pyspark PandasUDFType.SCALAR fails when converting a Row array

I want to use PandasUDFType.SCALAR to operate on Row arrays, as shown below:
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import ArrayType, IntegerType
df = spark.createDataFrame([([1, 2, 3, 2],), ([4, 5, 5, 4],)], ['data'])
@pandas_udf(ArrayType(IntegerType()), PandasUDFType.SCALAR)
def s(x):
    z = x.apply(lambda xx: xx*2)
    return z
df.select(s(df.data)).show()
but it fails with:
pyarrow.lib.ArrowInvalid: trying to convert NumPy type int32 but got int64

Given a dataframe with N elements, how can I make m smaller dataframes such that the size of each is some fraction of N?

I have a dataset (call it Data) with ~25000 instances that I want to split into a train set, development set, and test set. I want it to be such that,
train set = 0.7*Data
development set = 0.1*Data
test set = 0.2*Data
When making the split, I want the instances to be randomly sampled and NOT REPEATED between the 3 sets. This is why I can't use something like,
train_set = Data.sample(frac=0.7)
dev_set = Data.sample(frac=0.1)
test_set = Data.sample(frac=0.2)
where instances from Data may be repeated in the sets. Is there a built-in function that I am missing, or could you help me write a function for doing this?
I will use an array to demonstrate an example of what I am looking for.
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
splits = [0.7, 0.1, 0.2]
def splitFunction(data, array_of_splits):
    pass  # I need your help here
splits = splitFunction(A, splits)
#output
[[1, 3, 8, 9, 6, 7, 2], [4], [5, 0]]
Thank you in advance!
from random import shuffle
def splitFunction(data, array_of_splits):
    data_copy = data[:]  # copy the data so the original list is not changed
    shuffle(data_copy)  # randomizes the copy in place
    splits = []
    startIndex = 0
    for val in array_of_splits:
        # slice bounds must be integers, so round the fractional size
        endIndex = startIndex + int(round(val * len(data)))
        splits.append(data_copy[startIndex:endIndex])
        startIndex = endIndex
    return splits
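Since the question is about a DataFrame, the same idea can be sketched with shuffled row positions instead of a Python list. The helper name split_indices and its seed parameter are assumptions made for this example; with pandas, each index array would then be passed to Data.iloc[...] to select non-overlapping rows.

```python
import numpy as np

def split_indices(n, fractions, seed=0):
    # Shuffle the row positions 0..n-1 once, so no position can appear twice.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    # Cumulative cut points; the last part takes whatever remains.
    cuts = np.round(np.cumsum(fractions)[:-1] * n).astype(int)
    return np.split(perm, cuts)

parts = split_indices(10, [0.7, 0.1, 0.2])
sizes = [len(p) for p in parts]
print(sizes)

# Every position appears in exactly one part.
combined = np.sort(np.concatenate(parts))
print(combined)
```

Rounding the cumulative cut points rather than each part's size means the parts always add up to n, even when the fractions do not divide it evenly.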

Extract array elements from another array indices

I have a numpy array, a:
a = np.array([[-21.78878256, 97.37484004, -11.54228119],
[ -5.72592375, 99.04189958, 3.22814204],
[-19.80795922, 95.99377136, -10.64537733]])
I have another array, b:
b = np.array([[ 54.64642121, 64.5172014, 44.39991983],
[ 9.62420892, 95.14361441, 0.67014312],
[ 49.55036427, 66.25136632, 40.38778238]])
I want to extract minimum value indices from the array, b.
ixs = [[2],
[2],
[2]]
Then I want to extract the elements of the array a at those indices, ixs.
The expected answer is:
result = [[-11.54228119]
[3.22814204]
[-10.64537733]]
I tried:
ixs = np.argmin(b, axis=1)
print(ixs)
[2 2 2]
result = np.take(a, ixs)
print(result)
This does not give the expected result.
Any ideas are welcomed
You can use
result = a[np.arange(a.shape[0]), ixs]
np.arange generates the row index for each row, and ixs supplies the matching column index, so result contains one element per row: a[i, ixs[i]].
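As a runnable sketch of the indexing described above, using the arrays from the question: the np.take_along_axis variant is an addition not mentioned in the answer, and it keeps the result as a column of shape (3, 1), like the expected output in the question.

```python
import numpy as np

a = np.array([[-21.78878256, 97.37484004, -11.54228119],
              [ -5.72592375, 99.04189958,   3.22814204],
              [-19.80795922, 95.99377136, -10.64537733]])
b = np.array([[54.64642121, 64.5172014,  44.39991983],
              [ 9.62420892, 95.14361441,  0.67014312],
              [49.55036427, 66.25136632, 40.38778238]])

ixs = np.argmin(b, axis=1)  # column index of the minimum in each row of b

# Pair row i with column ixs[i]; the result is 1-D with shape (3,).
flat = a[np.arange(a.shape[0]), ixs]
print(flat)

# Same selection, kept as a (3, 1) column.
column = np.take_along_axis(a, ixs[:, None], axis=1)
print(column)
```

Both forms pick a different column per row when the minima of b fall in different columns.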
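You can also try the code below:
np.take(a, ixs, axis = 1)[:,0]
np.take(a, ixs, axis = 1) builds a 3 by 3 array whose column j is a[:, ixs[j]], and [:,0] then slices its first column. Note that this returns a[:, ixs[0]] for every row, so it matches the expected result only because all entries of ixs are equal here.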
>>> np.take(a, ixs, axis = 1)
array([[-11.54228119, -11.54228119, -11.54228119],
[ 3.22814204, 3.22814204, 3.22814204],
[-10.64537733, -10.64537733, -10.64537733]])