How to compute how many elements in three arrays in Python are equal to some value at the same position between the arrays? - numpy

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements of a, b and c are equal to 3 at the same position; for this example the answer would be 3.

An elegant solution is to use Numpy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - builds an array with 3 rows, composed of your 3 source arrays.
np.count_nonzero(... == 3, axis=0) - counts how many 3s occur in each column. For your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - takes the greatest value, in your case 3.
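Put together, a minimal runnable version of the above, using the sample arrays from the question:
import numpy as np

a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]

# stack the three arrays into a (3, 5) matrix, count the 3s per column,
# then take the largest per-column count
stacked = np.vstack([a, b, c])                   # shape (3, 5)
counts = np.count_nonzero(stacked == 3, axis=0)  # array([0, 0, 1, 3, 0])
print(counts.max())                              # 3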

Related

pandas dataframe: how to remove values from a cell that is a list, based on another column

I have a dataframe where two of the columns contain lists:
a  b  vals         locs
1  2  [1,2,3,4,5]  [2,3]
5  1  [1,7,2,4,9]  [0,1]
8  2  [1,9,4,7,8]  [3]
I want, for each row, to exclude from the column vals all the positions that appear in locs,
so I will get:
a  b  vals         locs   new_vals
1  2  [1,2,3,4,5]  [2,3]  [1,2,5]
5  1  [1,7,2,4,9]  [0,1]  [2,4,9]
8  2  [1,9,4,7,8]  [3]    [1,9,4,8]
What is the best way to do so?
Thanks!
You can use a list comprehension with an internal filter based on enumerate:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in b]
                  for a, b in zip(df['vals'], df['locs'])]
However, this will quickly become inefficient when b gets large.
A much better approach is to use Python sets, which provide fast (O(1)) membership tests:
df['new_vals'] = [[v for i, v in enumerate(a) if i not in S]
                  for a, b in zip(df['vals'], df['locs']) for S in [set(b)]]
output:
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
Use a list comprehension with enumerate, converting the locs values to sets:
df['new_vals'] = [[z for i, z in enumerate(x) if i not in y]
                  for x, y in zip(df['vals'], df['locs'].apply(set))]
print (df)
a b vals locs new_vals
0 1 2 [1, 2, 3, 4, 5] [2, 3] [1, 2, 5]
1 5 1 [1, 7, 2, 4, 9] [0, 1] [2, 4, 9]
2 8 2 [1, 9, 4, 7, 8] [3] [1, 9, 4, 8]
One way to do this is to create a function that works on a row:
def func(row):
    # keep the values whose position does not appear in row['locs'];
    # enumerate gives each position explicitly (robust to duplicate values)
    return [v for i, v in enumerate(row['vals']) if i not in row['locs']]
Then call this function for each row using apply:
df['new_value'] = df.apply(func, axis=1)
This works well if the lists are short.
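For reference, a minimal self-contained sketch of the set-based approach, with the DataFrame rebuilt from the sample data above:
import pandas as pd

# sample data from the question
df = pd.DataFrame({'a': [1, 5, 8],
                   'b': [2, 1, 2],
                   'vals': [[1, 2, 3, 4, 5], [1, 7, 2, 4, 9], [1, 9, 4, 7, 8]],
                   'locs': [[2, 3], [0, 1], [3]]})

# drop the positions listed in locs; the one-element [set(locs)] trick builds
# each set once so membership tests are O(1)
df['new_vals'] = [[v for i, v in enumerate(vals) if i not in locs_set]
                  for vals, locs in zip(df['vals'], df['locs'])
                  for locs_set in [set(locs)]]

print(df['new_vals'].tolist())
# [[1, 2, 5], [2, 4, 9], [1, 9, 4, 8]]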

How can we convert two pandas dataframe columns to a python list after merging the two columns vertically?

I have a dataframe...
print(df)
Name ae_rank adf de_rank
a 1 lk 4
b 2 lp 5
c 3 yi 6
How can I concatenate the ae_rank column and the de_rank column vertically and convert them into a python list?
Expectation...
my_list = [1, 2, 3, 4, 5, 6]
Simplest is to join the lists:
my_list = df['ae_rank'].tolist() + df['de_rank'].tolist()
If you need to reshape the DataFrame, use DataFrame.melt:
my_list = df.melt(['Name','adf'])['value'].tolist()
print(my_list)
[1, 2, 3, 4, 5, 6]
Another option is
my_list = df[['ae_rank', 'de_rank']].T.stack().tolist()
#[1, 2, 3, 4, 5, 6]
Another efficient option: use filter to select the columns whose names include "_rank", then flatten the underlying numpy array with ravel in 'F' order (column-major order):
my_list = df.filter(like='_rank').to_numpy().ravel('F').tolist()
output: [1, 2, 3, 4, 5, 6]
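For completeness, a minimal sketch that rebuilds the sample frame and runs the three approaches above:
import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c'],
                   'ae_rank': [1, 2, 3],
                   'adf': ['lk', 'lp', 'yi'],
                   'de_rank': [4, 5, 6]})

# 1. simple concatenation of the two column lists
print(df['ae_rank'].tolist() + df['de_rank'].tolist())         # [1, 2, 3, 4, 5, 6]

# 2. melt to long form and take the 'value' column
print(df.melt(['Name', 'adf'])['value'].tolist())              # [1, 2, 3, 4, 5, 6]

# 3. select the *_rank columns and flatten column-major
print(df.filter(like='_rank').to_numpy().ravel('F').tolist())  # [1, 2, 3, 4, 5, 6]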

Sort one list from another list in TensorFlow

I have two tf.Tensors A: [x0, x1, x2, x3, x4] and B: [2, 2, 1, 3, 2]. I would like to sort A using B.
Basically I would like to do the following, but using only TF operators:
list1, list2 = zip(*sorted(zip(list1, list2)))
I tried tf.sort() with tf.stack, but it seems to sort each dimension independently. I think I need to use tf.argsort similarly to this answer, Sort array's rows by another array in Python, but the indexing fails since that kind of tensor indexing does not seem to be supported.
I think I found the solution:
list1 = [2, 2, 1, 3, 2]
list2 = [0, 1, 2, 3, 4]
ids = tf.argsort(list1)
out = tf.gather(list2, ids) # [2, 0, 1, 4, 3]
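A sketch of the same idea applied to tensors directly (the numeric values for A are placeholders standing in for [x0, ..., x4]):
import tensorflow as tf

B = tf.constant([2, 2, 1, 3, 2])       # keys to sort by
A = tf.constant([10, 11, 12, 13, 14])  # values standing in for [x0, ..., x4]

# stable=True keeps the original order among equal keys
ids = tf.argsort(B, stable=True)       # [2, 0, 1, 4, 3]
A_sorted = tf.gather(A, ids)           # [12, 10, 11, 14, 13]
B_sorted = tf.gather(B, ids)           # [1, 2, 2, 2, 3]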

NumPy: generalize one-hot encoding to k-hot encoding

I'm using this code to one-hot encode values:
idxs = np.array([1, 3, 2])
vals = np.zeros((idxs.size, idxs.max()+1))
vals[np.arange(idxs.size), idxs] = 1
But I would like to generalize it to k-hot encoding (where the shape of vals would be the same, but each row can contain k ones).
Unfortunately, I can't figure out how to index multiple columns from each row. I tried vals[0:2, [[0, 1], [3]]] to select the first and second columns from the first row and the third column from the second row, but it does not work.
It's called advanced-indexing.
to select first and second column from first row and third column from second row
You just need to pass the respective rows and columns in separate iterables (tuple, list):
In [9]: a
Out[9]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [10]: a[[0, 0, 1],[0, 1, 3]]
Out[10]: array([0, 1, 8])
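Applied to the k-hot question, a short sketch (the idx_lists layout is an assumed way of storing the per-row column indices):
import numpy as np

idx_lists = [[0, 1], [3]]   # assumed input: row 0 gets ones at columns 0 and 1, row 1 at column 3
n_cols = max(max(cols) for cols in idx_lists) + 1
vals = np.zeros((len(idx_lists), n_cols))

# build matching row and column index arrays, then assign in one advanced-indexing step
rows = np.concatenate([[r] * len(cols) for r, cols in enumerate(idx_lists)])  # [0, 0, 1]
cols = np.concatenate(idx_lists)                                              # [0, 1, 3]
vals[rows, cols] = 1

print(vals)
# [[1. 1. 0. 0.]
#  [0. 0. 0. 1.]]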

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In the example I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices for A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but that isn't what happens here. How is : different from using range(sizeArray), which just creates an array from 0 to sizeArray-1, to use as indices?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
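For reference, this per-element pairing is exactly what the question's Test2 = A[range(A.shape[0]), B] does; a minimal sketch with the small X from this answer:
import numpy as np

X = np.arange(12).reshape(3, 4)
B = np.array([3, 2, 1])               # one column index per row

# pair row i with column B[i]: picks X[0, 3], X[1, 2], X[2, 1]
picked = X[np.arange(X.shape[0]), B]
print(picked)                         # [3 6 9]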
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html