Passing np.array in np.array - numpy

This was the question, and we had to guess the output.
I didn't know we could even index an array with other arrays. I got the output but don't understand how.
I am passing two arrays, ind1 and ind2, as indices into arr.
Question:
import numpy as np
arr = np.arange(9, dtype = "float").reshape(3,3)
ind1 = np.array([[1,2],[0,1]])
ind2 = np.array([[0,2],[1,2]])
arr[ind1, ind2].sum()
Output of arr[ind1, ind2] (the .sum() then reduces it to 17.0):
array([[3., 8.],
       [1., 5.]])

The way it works (I think) is that it pairs up corresponding elements of ind1 and ind2 and uses each pair as a (row, column) index into arr, so the result has the same shape as ind1 and ind2. For example,
ind1 = [[1,2],[0,1]]
ind2 = [[0,2],[1,2]]
Pairing them element by element gives [[(1, 0), (2, 2)], [(0, 1), (1, 2)]], so
arr[1, 0] = 3
arr[2, 2] = 8
arr[0, 1] = 1
arr[1, 2] = 5
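A small sketch to check this reading: with two integer index arrays of the same shape, element (i, j) of the result is arr[ind1[i, j], ind2[i, j]]. The explicit loop is only there for comparison; it is not how NumPy implements integer-array indexing internally.

```python
import numpy as np

arr = np.arange(9, dtype="float").reshape(3, 3)
ind1 = np.array([[1, 2], [0, 1]])  # row indices
ind2 = np.array([[0, 2], [1, 2]])  # column indices

# Integer-array ("fancy") indexing: result[i, j] = arr[ind1[i, j], ind2[i, j]],
# so the result takes the shape of ind1 / ind2.
fancy = arr[ind1, ind2]

# The same selection written out with an explicit loop over positions.
looped = np.empty(ind1.shape)
for i in range(ind1.shape[0]):
    for j in range(ind1.shape[1]):
        looped[i, j] = arr[ind1[i, j], ind2[i, j]]

print(fancy)        # [[3. 8.]
                    #  [1. 5.]]
print(fancy.sum())  # 17.0
```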

Related

Equivalent of np.isin for TensorFlow

I have categories as a list of lists of integers, as shown below:
categories = [
[0,2,4,6,8],
[1,3,5,7,9]
]
I have a label tensor y with num_batches integers (as classes):
y = tf.constant([0, 1, 1, 2, 5, 4, 7, 9, 3, 3])
I want to replace each value in y with the index of the category list that contains it (here 0 for even, 1 for odd), so that the final result would be:
cat_labels = tf.constant([0, 1, 1, 0, 1, 0, 1, 1, 1, 1])
I can get it by iterating through each value in y like below:
cat_labels = tf.Variable(tf.identity(y))
for idx in range(len(categories)):
    for i, _y in enumerate(y):
        if _y in categories[idx]:  # if _y value is in categories[idx]
            cat_labels[i].assign(idx)  # replace all of them with idx
But apparently this kind of iteration is not allowed when the block is inside a parent function decorated with @tf.function.
Is there a way to apply this logic without iterating (or converting to NumPy and applying np.isin), while keeping the speedup of tf.function?
Edit: There seem to be workarounds for this, like here, but any help explaining them in the context of this use case would be appreciated.
You can try this:
import tensorflow as tf
y = tf.constant([0., 1., 1., 2., 5., 4., 7., 9., 3., 3.], dtype=tf.float32)
categories = [[0,2,4,6,8],[1,3,5,7,9]]
c = tf.convert_to_tensor(categories, dtype=tf.float32)
cat_labels = tf.map_fn( # apply an operation on all of the elements of Y
lambda x:tf.gather_nd( # get index of category: 0 or 1 or anything else
tf.cast( # cast dtype of the result of the inner function
tf.where( # get index of the element of Y in categories
tf.equal(c, x)), # search an element of Y within categories
dtype=tf.float32),[0,0]), y)
tf.print(cat_labels, summarize=-1)
# [0 1 1 0 1 0 1 1 1 1]
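The map_fn answer still applies a function once per element. A fully vectorized variant compares every label with every category member at once by broadcasting. The sketch below is written in NumPy so it runs anywhere; the assumption is that the same steps translate to tf.equal, tf.reduce_any and tf.argmax (possibly casting the boolean mask to an integer dtype before tf.argmax), which should be checked inside a tf.function.

```python
import numpy as np

categories = np.array([[0, 2, 4, 6, 8],
                       [1, 3, 5, 7, 9]])
y = np.array([0, 1, 1, 2, 5, 4, 7, 9, 3, 3])

# Broadcast y (shape (10, 1, 1)) against categories (shape (1, 2, 5)):
# matches[n, k, m] is True when y[n] == categories[k, m].
matches = y[:, None, None] == categories[None, :, :]

# in_category[n, k] is True when y[n] appears anywhere in row k (an isin per row).
in_category = matches.any(axis=2)

# argmax over the category axis returns the first True per label. A label found
# in no category would also get 0, so check in_category.any(axis=1) if that matters.
cat_labels = in_category.argmax(axis=1)
print(cat_labels)  # [0 1 1 0 1 0 1 1 1 1]
```

The cost is a (num_labels, num_categories, category_size) boolean intermediate, which is fine for small category lists like this one.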

How to index ndarray in tuple using boolean with Numpy Python?

I would like to index the ndarrays in a tuple using a boolean mask, as below:
import numpy as np
n_max = 5
list_no = np.arange(0, n_max)
lateral = np.tril_indices(n_max, -1)
mask = np.diff(lateral[0].astype(int))
mask[-1] = 1
Expected = lateral[mask != 0]
However, executing the line Expected = lateral[mask != 0]
raises an error:
TypeError: only integer scalar arrays can be converted to a scalar index
The expected result is a tuple of two arrays of shape (4,):
Expected[0] = [1 2 3 4]
Expected[1] = [0 1 2 3]
Where did I go wrong?
It seems the sizes of mask and lateral[0] differ. Because np.diff returns the differences between consecutive elements, mask has length n-1 when lateral[0] has length n, so you might want to append an element to mask instead of overwriting its last entry.
Also, since lateral is a tuple, you need to index into the tuple first (lateral[0], lateral[1]) and then apply the mask to each array.
You might need something like this:
import numpy as np
n_max = 5
list_no = np.arange(0, n_max)
lateral = np.tril_indices(n_max, -1)
mask = np.diff(lateral[0].astype(int))
mask = np.append(mask, 1)
expected_0 = lateral[0][mask != 0]
print(expected_0)
expected_1 = lateral[1][mask != 0]
print(expected_1)
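A compact version of the same fix applies one mask to every array in the tuple with a generator expression (the name keep is just illustrative):

```python
import numpy as np

n_max = 5
lateral = np.tril_indices(n_max, -1)  # tuple: (row indices, column indices)

# np.diff is one element shorter than lateral[0]; appending a 1 keeps the last
# index and makes the mask the same length as each array in the tuple.
keep = np.append(np.diff(lateral[0]), 1) != 0

# lateral is a tuple, so apply the mask to each array inside it.
expected = tuple(a[keep] for a in lateral)
print(expected)
```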

How to convert string which is a list of strings to list of floats pandas

I have the following dataframe (from a large csv file using pd.read_csv):
sal_vcf_to_df = pd.read_csv(sal_filepath, delimiter='\t', header = 0, index_col = False,
low_memory=False, usecols=['listA', 'Amino_Acid_Change', 'Gene_Name'])
sal_df_wo_na = sal_vcf_to_df.dropna(axis = 0, how = 'any')
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x : ast.literal_eval(x))
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))
The dataframe I got:
listA Amino_Acid_Change Gene_Name
0 "['133', '115', '3', '1']" Q637K ATM
1 "['114', '115', '2', '3']" I111 PIK3R1
2 "['51', '59', '1', '1']" T2491 KMT2C
I'd like to convert the 'listA' column to list of floats.
So far I've tried to do it in several steps:
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x : ast.literal_eval(x))
then:
sal_df_wo_na['listA'] = sal_df_wo_na['listA'].apply(lambda x: list(map(float, x)))
But I got the following warning after the first step:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Does anyone know how to fix the warning or have a better solution?
Option 1
pd.eval - Works for up to 100 rows
A really quick way of converting that horrendous-looking column is to get rid of all the quotes and then call pd.eval -
v = pd.eval(df.listA.str.replace("['\"]", '', regex=True)).astype(float)
v
array([[ 133., 115., 3., 1.],
[ 114., 115., 2., 3.],
[ 51., 59., 1., 1.]])
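Assign the result back as a list of rows (a 2-D array cannot be assigned to a single column) -
df['listA'] = v.tolist()
df
listA Amino_Acid_Change Gene_Name
0 [133.0, 115.0, 3.0, 1.0] Q637K ATM
1 [114.0, 115.0, 2.0, 3.0] I111 PIK3R1
2 [51.0, 59.0, 1.0, 1.0] T2491 KMT2C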
Option 2
ast.literal_eval - The reliable workhorse
Update: pd.eval only supports up to 100 rows, so the slower, more reliable fallback is ast.literal_eval -
from ast import literal_eval
df.listA = df.listA.str.replace("'", '').apply(literal_eval)
df
listA Amino_Acid_Change Gene_Name
0 [133, 115, 3, 1] Q637K ATM
1 [114, 115, 2, 3] I111 PIK3R1
2 [51, 59, 1, 1] T2491 KMT2C
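Option 2 returns lists of ints, while the question asks for floats. A minimal sketch that parses each cell and converts every item with float, assuming the cells look like the strings in the question. The surrounding double quotes in the printout may or may not be part of the data, so they are stripped if present; the toy frame and the helper name to_float_list are illustrative only.

```python
import ast
import pandas as pd

# Small stand-in for the question's data (the real frame comes from read_csv).
df = pd.DataFrame({
    "listA": ["['133', '115', '3', '1']",
              "['114', '115', '2', '3']",
              "['51', '59', '1', '1']"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

def to_float_list(text):
    # Strip stray double quotes (a no-op if there are none), parse the
    # Python-literal list of strings, then convert each item to float.
    items = ast.literal_eval(text.strip('"'))
    return [float(v) for v in items]

df["listA"] = df["listA"].apply(to_float_list)
print(df["listA"].tolist())
# [[133.0, 115.0, 3.0, 1.0], [114.0, 115.0, 2.0, 3.0], [51.0, 59.0, 1.0, 1.0]]
```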
As for the SettingWithCopyWarning, the best reference is the question
How to deal with SettingWithCopyWarning in Pandas?
In a nutshell, what you're doing is creating sal_df_wo_na by extracting a slice/view from a larger dataframe, something like this -
sal_df_wo_na = df[<some condition here>]
This could lead to chained indexing, which pandas warns against. Instead, you'd need to do something like
sal_df_wo_na = df[<some condition here>].copy()
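That is, create an explicit copy of the slice with the pd.DataFrame.copy function. Note that copy already defaults to deep=True, and even a deep copy does not recursively copy Python objects (such as lists) stored in object columns; only the references to them are copied.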
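A minimal sketch of the .copy() pattern on a toy frame (the filter condition is arbitrary): after the explicit copy, assigning to a column changes only the subset, and the original frame is left untouched.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "listA": ["['133', '115']", "['114', '115']", "['51', '59']"],
    "Gene_Name": ["ATM", "PIK3R1", "KMT2C"],
})

# Boolean filtering returns a new DataFrame that pandas may still treat as a
# possible view of df; .copy() makes its independence explicit.
subset = df[df["Gene_Name"] != "ATM"].copy()

# Assigning to a column of the copy modifies only subset, never df.
subset["listA"] = subset["listA"].apply(ast.literal_eval)
print(subset)
```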

Numpy compare values inside to return greater index

I have a numpy array and another array:
[array([-1.67397643, -2.77258872]), array([-1.67397643, -2.77258872]), array([-2.77258872, -1.67397643]), array([-2.77258872, -1.67397643])]
For each array, I want the index position of the larger value; e.g. in the first array -1.67397643 > -2.77258872, so the first result would be 0.
Final output of the numpy array would be [0, 0, 1, 1] (a list is fine too)
How can I do that?
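It seems you have a list of arrays, so I would start by making them a proper numpy array:
import numpy as np
from numpy import array  # lets the list below be pasted exactly as printed
a = [array([-1.67397643, -2.77258872]), array([-1.67397643, -2.77258872]), array([-2.77258872, -1.67397643]), array([-2.77258872, -1.67397643])]
b = np.array(a).T  # .T transposes it, so b[0] holds the first values and b[1] the second
c = b[0] < b[1]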
c is now an array([False, False, True, True], dtype=bool), and probably serves your purpose. If you must have [0,0,1,1] instead, then:
d = np.zeros(len(c))
d[c] = 1
d is now an array([ 0., 0., 1., 1.])
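If you want the integer indices directly, np.argmax along the second axis gives them in one step and also works when each array has more than two entries:

```python
import numpy as np

a = [np.array([-1.67397643, -2.77258872]),
     np.array([-1.67397643, -2.77258872]),
     np.array([-2.77258872, -1.67397643]),
     np.array([-2.77258872, -1.67397643])]

# Stack into a (4, 2) array; argmax along axis 1 gives, for each row,
# the position of its largest value (the first one on ties).
winners = np.argmax(np.array(a), axis=1)
print(winners)  # [0 0 1 1]
```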

How can I find a basis for the column space of a rectangular matrix?

Given a numpy ndarray with dimensions m by n (where n>m), how can I find the linearly independent columns?
One way is to use the LU decomposition. The factor U will be of the same size as your matrix, but will be upper-triangular. In each row of U, pick the first nonzero element: these are pivot elements, which belong to linearly independent columns. A self-contained example:
import numpy as np
from scipy.linalg import lu
A = np.array([[1, 2, 3], [2, 4, 2]]) # example for testing
U = lu(A)[2]
lin_indep_columns = [np.flatnonzero(U[i, :])[0] for i in range(U.shape[0])]
Output: [0, 2], which means the 0th and 2nd columns of A form a basis for its column space.
user6655984's answer inspired this code. Instead of the author's last line of code (finding the pivot columns of U), I wrote a function so that it can handle more diverse matrices A, e.g. ones where a row of U is entirely zero.
Here it is:
import numpy as np
from scipy import linalg as LA
np.set_printoptions(precision=1, suppress=True)
A = np.array([[1, 4, 1, -1],
[2, 5, 1, -2],
[3, 6, 1, -3]])
P, L, U = LA.lu(A)
print('P', P, '', 'L', L, '', 'U', U, sep='\n')
Output:
P
[[0. 1. 0.]
[0. 0. 1.]
[1. 0. 0.]]
L
[[1. 0. 0. ]
[0.3 1. 0. ]
[0.7 0.5 1. ]]
U
[[ 3. 6. 1. -3. ]
[ 0. 2. 0.7 -0. ]
[ 0. 0. -0. -0. ]]
I came up with this function:
def get_indices_for_linearly_independent_columns_of_A(U: np.ndarray) -> list:
    # I should first convert all "-0."s to "0." so that nonzero() can find them.
    U_copy = U.copy()
    U_copy[abs(U_copy) < 1.e-7] = 0
    # Because some rows in U may not have even one nonzero element,
    # I have to find the index for the first one in two steps.
    index_of_all_nonzero_cols_in_each_row = (
        [U_copy[i, :].nonzero()[0] for i in range(U_copy.shape[0])]
    )
    index_of_first_nonzero_col_in_each_row = (
        [indices[0] for indices in index_of_all_nonzero_cols_in_each_row
         if len(indices) > 0]
    )
    # Because two rows or more may have the same indices
    # for their first nonzero element, I should remove duplicates.
    unique_indices = sorted(list(set(index_of_first_nonzero_col_in_each_row)))
    return unique_indices
Finally:
col_sp_A = A[:, get_indices_for_linearly_independent_columns_of_A(U)]
print(col_sp_A)
Output:
[[1 4]
[2 5]
[3 6]]
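Try this one (note that LU_decomposition, as written, only accepts square matrices):
import numpy as np

def LU_decomposition(A):
    """
    Perform LU decomposition of a given square matrix
    Args:
        A: the given matrix (float dtype; it is modified in place)
    Returns: P, L and U, s.t. PA = LU
    """
    assert A.shape[0] == A.shape[1]
    N = A.shape[0]
    P_idx = np.arange(0, N, dtype=np.int16).reshape(-1, 1)
    for i in range(N - 1):
        # partial pivoting: move the largest remaining entry of column i to row i
        pivot_loc = np.argmax(np.abs(A[i:, [i]])) + i
        if pivot_loc != i:
            A[[i, pivot_loc], :] = A[[pivot_loc, i], :]
            P_idx[[i, pivot_loc], :] = P_idx[[pivot_loc, i], :]
        if abs(A[i, i]) < 1e-12:
            continue  # no usable pivot in column i; skip to avoid dividing by zero
        A[i + 1:, i] /= A[i, i]
        A[i + 1:, i + 1:] -= A[i + 1:, [i]] * A[[i], i + 1:]
    U, L, P = np.zeros_like(A), np.identity(N), np.zeros((N, N), dtype=np.int16)
    for i in range(N):
        L[i, :i] = A[i, :i]
        U[i, i:] = A[i, i:]
        P[i, P_idx[i][0]] = 1
    return P.astype(np.float64), L, U

def get_bases(A):
    assert A.ndim == 2
    # The original called an undefined gaussian_elimination(); the U factor from
    # LU_decomposition is used instead, on a float copy so A is not modified.
    Q = LU_decomposition(A.astype(np.float64))[2]
    M, N = Q.shape
    pivot_idxs = []
    for i in range(M):
        # first entry of row i (from the diagonal onward) that is not ~0
        j = i
        while j < N and abs(Q[i, j]) < 1e-5:
            j += 1
        if j < N:
            pivot_idxs.append(j)
    return A[:, sorted(set(pivot_idxs))]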
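The LU-based answers above rely on U being in row-echelon form, which LU with partial pivoting does not guarantee for rank-deficient matrices. For example, with B = [[0, 1, 1], [0, 1, 2], [0, 0, 0]] (rank 2) the first column has no nonzero pivot, and the first-nonzero-per-row rule can then report fewer columns than the rank. A commonly used rank-revealing alternative, swapped in here rather than taken from the answers, is QR with column pivoting via scipy.linalg.qr(..., pivoting=True). The function name independent_columns and the tolerance heuristic are my own choices, so treat this as a sketch:

```python
import numpy as np
from scipy.linalg import qr

def independent_columns(A, tol=None):
    """Indices of columns of A that form a basis of its column space,
    chosen by QR with column pivoting (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    # A[:, P] = Q @ R with R upper triangular and |diag(R)| non-increasing.
    _, R, P = qr(A, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    if tol is None:
        # heuristic threshold, similar in spirit to np.linalg.matrix_rank's default
        tol = diag.max(initial=0.0) * max(A.shape) * np.finfo(float).eps
    rank = int(np.sum(diag > tol))
    # The first `rank` pivoted columns are linearly independent.
    return sorted(P[:rank].tolist())

A = np.array([[1, 2, 3], [2, 4, 2]])
B = np.array([[0, 1, 1], [0, 1, 2], [0, 0, 0]])
print(independent_columns(A))
print(independent_columns(B))
```

Both calls should report two columns, and A[:, independent_columns(A)] then gives a basis for the column space of A. The chosen columns can differ from the LU answers (pivoted QR prefers columns with larger norms), which is fine since a column space generally has many bases.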