How to compute how many elements in three arrays in Python are equal to some value at the same position between the arrays? - numpy

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements of a, b and c are equal to 3 at the same position; for this example the answer would be 3.

An elegant solution is to use Numpy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - builds an array with 3 rows, composed of your 3 source arrays.
np.count_nonzero(... == 3, axis=0) - counts how many 3s occur in each column. For your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - takes the greatest value, in your case 3.
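Put together, a minimal runnable version of the above, using the sample arrays from the question:
import numpy as np

a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]

# stack the three arrays into a (3, 5) matrix, count the 3s per column,
# then take the largest per-column count
stacked = np.vstack([a, b, c])                   # shape (3, 5)
counts = np.count_nonzero(stacked == 3, axis=0)  # array([0, 0, 1, 3, 0])
print(counts.max())                              # 3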

Related

pandas dataframe: how to remove values from a cell that is a list, based on another column

I have a dataframe where two of the columns contain lists:
a  b  vals         locs
1  2  [1,2,3,4,5]  [2,3]
5  1  [1,7,2,4,9]  [0,1]
8  2  [1,9,4,7,8]  [3]
I want, for each row, to exclude from the column vals all the positions that appear in locs,
so I will get:
a  b  vals         locs   new_vals
1  2  [1,2,3,4,5]  [2,3]  [1,2,5]
5  1  [1,7,2,4,9]  [0,1]  [2,4,9]
8  2  [1,9,4,7,8]  [3]    [1,9,4,8]
What is the best way to do so?
Thanks!
You can use a list comprehension with an internal filter based on enumerate:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in b]
                  for a, b in zip(df['vals'], df['locs'])]
However, this will quickly become inefficient when b gets large.
A much better approach is to use Python sets, which provide fast (O(1)) membership tests:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in S]
                  for a, b in zip(df['vals'], df['locs']) for S in [set(b)]]
output:
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
Use a list comprehension with enumerate, converting the locs values to sets:
df['new_vals'] = [[z for i, z in enumerate(x) if i not in y]
                  for x, y in zip(df['vals'], df['locs'].apply(set))]
print (df)
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
One way to do this is to create a function that works on a row:
def func(row):
    # keep the values whose position does not appear in row['locs'];
    # enumerate gives each position explicitly (robust to duplicate values)
    return [v for i, v in enumerate(row['vals']) if i not in row['locs']]
Then call this function for each row using apply:
df['new_value'] = df.apply(func, axis=1)
This works well if the lists are short.
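For reference, a minimal self-contained sketch of the set-based approach, with the DataFrame rebuilt from the sample data above:
import pandas as pd

# sample data from the question
df = pd.DataFrame({'a': [1, 5, 8],
                   'b': [2, 1, 2],
                   'vals': [[1, 2, 3, 4, 5], [1, 7, 2, 4, 9], [1, 9, 4, 7, 8]],
                   'locs': [[2, 3], [0, 1], [3]]})

# drop the positions listed in locs; the one-element [set(locs)] trick builds
# each set once so membership tests are O(1)
df['new_vals'] = [[v for i, v in enumerate(vals) if i not in locs_set]
                  for vals, locs in zip(df['vals'], df['locs'])
                  for locs_set in [set(locs)]]

print(df['new_vals'].tolist())
# [[1, 2, 5], [2, 4, 9], [1, 9, 4, 8]]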

How can we convert two pandas dataframe columns to a python list after merging the two columns vertically?

I have a dataframe...
print(df)
Name ae_rank adf de_rank
a 1 lk 4
b 2 lp 5
c 3 yi 6
How can I concatenate the ae_rank column and the de_rank column vertically and convert them into a python list?
Expectation...
my_list = [1, 2, 3, 4, 5, 6]
Simplest is to join the lists:
my_list = df['ae_rank'].tolist() + df['de_rank'].tolist()
If you need to reshape the DataFrame, use DataFrame.melt:
my_list = df.melt(['Name','adf'])['value'].tolist()
print(my_list)
[1, 2, 3, 4, 5, 6]
Another option is
my_list = df[['ae_rank', 'de_rank']].T.stack().tolist()
#[1, 2, 3, 4, 5, 6]
Another efficient option: use filter to select the columns whose names include "_rank", then flatten the underlying numpy array with ravel in 'F' order (column-major order):
my_list = df.filter(like='_rank').to_numpy().ravel('F').tolist()
output: [1, 2, 3, 4, 5, 6]
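For completeness, a minimal sketch that rebuilds the sample frame and runs the three approaches above:
import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c'],
                   'ae_rank': [1, 2, 3],
                   'adf': ['lk', 'lp', 'yi'],
                   'de_rank': [4, 5, 6]})

# 1. simple concatenation of the two column lists
print(df['ae_rank'].tolist() + df['de_rank'].tolist())         # [1, 2, 3, 4, 5, 6]

# 2. melt to long form and take the 'value' column
print(df.melt(['Name', 'adf'])['value'].tolist())              # [1, 2, 3, 4, 5, 6]

# 3. select the *_rank columns and flatten column-major
print(df.filter(like='_rank').to_numpy().ravel('F').tolist())  # [1, 2, 3, 4, 5, 6]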

Sort one list from another list in TensorFlow

I have two tf.Tensors A: [x0, x1, x2, x3, x4] and B: [2, 2, 1, 3, 2]. I would like to sort A using B.
Basically I would like to do the following, but using only TF operators:
list1, list2 = zip(*sorted(zip(list1, list2)))
I tried tf.sort() with tf.stack, but it seems to sort each dimension independently. I think I need to use tf.argsort similarly to this answer, Sort array's rows by another array in Python, but the indexing fails since that kind of tensor indexing does not seem to be supported.
I think I found the solution:
list1 = [2, 2, 1, 3, 2]
list2 = [0, 1, 2, 3, 4]
ids = tf.argsort(list1)
out = tf.gather(list2, ids) # [2, 0, 1, 4, 3]
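A sketch of the same idea applied to tensors directly (the numeric values for A are placeholders standing in for [x0, ..., x4]):
import tensorflow as tf

B = tf.constant([2, 2, 1, 3, 2])       # keys to sort by
A = tf.constant([10, 11, 12, 13, 14])  # values standing in for [x0, ..., x4]

# stable=True keeps the original order among equal keys
ids = tf.argsort(B, stable=True)       # [2, 0, 1, 4, 3]
A_sorted = tf.gather(A, ids)           # [12, 10, 11, 14, 13]
B_sorted = tf.gather(B, ids)           # [1, 2, 2, 2, 3]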

NumPy: generalize one-hot encoding to k-hot encoding

I'm using this code to one-hot encode values:
idxs = np.array([1, 3, 2])
vals = np.zeros((idxs.size, idxs.max()+1))
vals[np.arange(idxs.size), idxs] = 1
But I would like to generalize it to k-hot encoding (where the shape of vals would be the same, but each row can contain k ones).
Unfortunately, I can't figure out how to index multiple columns from each row. I tried vals[0:2, [[0, 1], [3]]] to select the first and second columns from the first row and the third column from the second row, but it does not work.
It's called advanced-indexing.
to select first and second column from first row and third column from second row
You just need to pass the respective rows and columns in separate iterables (tuple, list):
In [9]: a
Out[9]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [10]: a[[0, 0, 1],[0, 1, 3]]
Out[10]: array([0, 1, 8])
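Applied to the k-hot question, a short sketch (the idx_lists layout is an assumed way of storing the per-row column indices):
import numpy as np

idx_lists = [[0, 1], [3]]   # assumed input: row 0 gets ones at columns 0 and 1, row 1 at column 3
n_cols = max(max(cols) for cols in idx_lists) + 1
vals = np.zeros((len(idx_lists), n_cols))

# build matching row and column index arrays, then assign in one advanced-indexing step
rows = np.concatenate([[r] * len(cols) for r, cols in enumerate(idx_lists)])  # [0, 0, 1]
cols = np.concatenate(idx_lists)                                              # [0, 1, 3]
vals[rows, cols] = 1

print(vals)
# [[1. 1. 0. 0.]
#  [0. 0. 0. 1.]]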

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In the example I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices for A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but that isn't what happens here. How is : different from using range(sizeArray), which just creates an array from 0 to sizeArray-1, to use as indices?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
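For reference, this per-element pairing is exactly what the question's Test2 = A[range(A.shape[0]), B] does; a minimal sketch with the small X from this answer:
import numpy as np

X = np.arange(12).reshape(3, 4)
B = np.array([3, 2, 1])               # one column index per row

# pair row i with column B[i]: picks X[0, 3], X[1, 2], X[2, 1]
picked = X[np.arange(X.shape[0]), B]
print(picked)                         # [3 6 9]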
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html