I am wondering whether there is a NumPy operator to compare two vectors of the same shape. Specifically, I want a matrix c with c[i, j] = 1 if a[i] == b[j] and 0 otherwise.
Can I get this result directly through a NumPy API, and if so, which one? Thanks very much!
If your arrays are a and b:
c = ((np.repeat(a, b.shape[0]).reshape(a.shape[0], b.shape[0]) - b) == 0).astype(int)
Or, as hpaulj and FBruzzesi said, with broadcasting:
c = (a[:, None] == b).astype(int)
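A small check, using arbitrary example vectors chosen here for illustration, that both answers produce the same 0/1 matrix, where c[i, j] is 1 exactly when a[i] == b[j]. The third form, np.equal.outer, is the outer method that every binary ufunc has; it is not from the answers above but gives the same result:

```python
import numpy as np

# Arbitrary example vectors (not from the question)
a = np.array([1, 2, 3, 2])
b = np.array([2, 5, 1, 2])

# Answer 1: repeat each a[i] across a row, subtract b, test for zero
c_repeat = ((np.repeat(a, b.shape[0]).reshape(a.shape[0], b.shape[0]) - b) == 0).astype(int)

# Answer 2: broadcasting; a[:, None] has shape (4, 1) and b has shape (4,),
# so the comparison yields a (4, 4) boolean matrix
c_broadcast = (a[:, None] == b).astype(int)

# Alternative: the ufunc outer method applies == to every pair (a[i], b[j])
c_outer = np.equal.outer(a, b).astype(int)

print(c_broadcast)
```

Broadcasting avoids building the repeated copy of a explicitly, which is why it is usually the preferred form.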
I am currently studying how to compute a 4th-order tensor in NumPy with the einsum function.
The tensor I am computing is written in Einstein notation, and einsum does the work perfectly! But I would like to know what it is doing in the following case:
import numpy as np
a=np.array([[2,0,3],[0,1,0],[0, 0, 4]])
b= np.eye(3)
r1=np.einsum("ij,kl->ijkl", a, b)
r2=np.einsum("ik,jl->ijkl", a, b)
In r1 I am basically computing the standard tensor product (equivalent to np.tensordot(a,b,axes=0)).
What about in r2?
I know I can get the value by doing a[:,None,:,None]*b[None,:,None,:] but I do not know what the indexing is doing. Does this operation have a name?
Sorry if this is too basic!
I tried to use the transpose definition to permute multiple axes.
It works for 'ij,kl->ijkl', 'ik,jl->ijkl' and 'kl,ij->ijkl',
but fails for 'il,jk->ijkl', 'jl,ik->ijkl' and 'jk,il->ijkl':
import numpy as np
a=np.eye(3)
a[0][0]=2
a[0][-1]=3
a[-1][-1]=4
b=np.eye(3)
def permutation(str_,Arr):
    Arr=np.reshape(Arr,[3,3,3,3])
    def splitString(str_):
        tmp1=str_.split(',')
        tmp2=tmp1[1].split('->')
        str_idx1=tmp1[0]
        str_idx2=tmp2[0]
        str_idx_out=tmp2[1]
        return str_idx1,str_idx2, str_idx_out
    idx_a, idx_b, idx_out=splitString(str_)
    dict_={'i':0,'j':1,'k':2,'l':3}
    def split(word):
        return [char for char in word]
    a,b=split(idx_a)
    c,d=split(idx_b)
    Arr=np.transpose(Arr,(dict_[a],dict_[b],dict_[c],dict_[d]))
    return Arr
str_='jk,il->ijkl'
d=np.outer(a,b)
f=np.einsum(str_, a,b)
check=permutation(str_,d)
if (np.count_nonzero(f-check)==0):
    print ('Code is working!')
else:
    print("Something is wrong...")
Appreciate your suggestions!
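One likely cause, offered as a suggestion: np.transpose(Arr, axes) reads axes[n] as "the input axis that becomes output axis n". The dictionary lookup instead records where each input axis should go, which is the inverse permutation. The two coincide only when the permutation is its own inverse, which holds for the three working cases ((0, 1, 2, 3), (0, 2, 1, 3), (2, 3, 0, 1)) but not for the three failing ones. A minimal sketch of a fix, assuming single-letter labels and no summed indices (the helper name permute_outer is hypothetical), looks up each output letter among the input labels:

```python
import numpy as np

def permute_outer(spec, a, b):
    """Hypothetical helper: reproduce np.einsum(spec, a, b) for two 2-D operands
    with no repeated (summed) indices, via an outer product plus a transpose."""
    inputs, out = spec.split('->')
    idx_a, idx_b = inputs.split(',')
    labels = idx_a + idx_b  # axis labels of the outer product, in order
    # Outer product: prod[p, q, r, s] = a[p, q] * b[r, s]
    prod = np.multiply.outer(a, b)
    # For output axis n, pick the axis of prod that carries the label out[n]
    axes = [labels.index(ch) for ch in out]
    return np.transpose(prod, axes)

a = np.array([[2, 0, 3], [0, 1, 0], [0, 0, 4]], dtype=float)
b = np.eye(3)

for spec in ['ij,kl->ijkl', 'ik,jl->ijkl', 'kl,ij->ijkl',
             'il,jk->ijkl', 'jl,ik->ijkl', 'jk,il->ijkl']:
    assert np.array_equal(permute_outer(spec, a, b), np.einsum(spec, a, b)), spec
```

If you prefer to keep the original dictionary-based tuple p, np.transpose(Arr, np.argsort(p)) applies its inverse and should give the same result.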
r2 is essentially the same tensor as r1, but with the indices rearranged. In particular, r2[i,j,k,l] is equal to a[i,k]*b[j,l].
For instance:
>>> r2[0,1,2,1]
3.0
This corresponds to the fact that a[0,2]*b[1,1] is 3 * 1, which is indeed 3.
Another way to think about this is to observe that r2[:,j,:,l] is equal to a whenever j == l and is a zero matrix otherwise (because b is the identity).
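These index relations can be checked numerically. In other words, r2 is the tensor (outer) product of a and b followed by a permutation of its axes; this sketch uses the a and b from the question and confirms that r2 is r1 with its middle two axes swapped, and that the question's broadcasting expression gives the same array:

```python
import numpy as np

a = np.array([[2, 0, 3], [0, 1, 0], [0, 0, 4]])
b = np.eye(3)

r1 = np.einsum("ij,kl->ijkl", a, b)
r2 = np.einsum("ik,jl->ijkl", a, b)

# r1 is the plain tensor product: r1[i, j, k, l] == a[i, j] * b[k, l]
assert np.array_equal(r1, np.tensordot(a, b, axes=0))

# Swapping axes 1 and 2 of r1 gives r1[i, k, j, l] == a[i, k] * b[j, l], i.e. r2
assert np.array_equal(r2, r1.transpose(0, 2, 1, 3))

# Broadcasting: shapes (3, 1, 3, 1) * (1, 3, 1, 3) -> (3, 3, 3, 3)
assert np.array_equal(r2, a[:, None, :, None] * b[None, :, None, :])

# Since b is the identity, r2[:, j, :, l] is a when j == l and zeros otherwise
for j in range(3):
    for l in range(3):
        expected = a if j == l else np.zeros_like(a)
        assert np.array_equal(r2[:, j, :, l], expected)
```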
I've got a column in a Pandas dataframe comprised of variable-length lists and I'm trying to find an efficient way of extracting elements conditional on list length. Consider this minimal reproducible example:
import numpy as np
import pandas as pd
t = pd.DataFrame({'a':[['1234','abc','444'],
                       ['5678'],
                       ['2468','def']]})
Say I want to extract the 2nd element (where relevant) into a new column, and use NaN otherwise. I was able to get it in a very inefficient way:
_ = []
for index,row in t.iterrows():
    if (len(row['a']) > 1):
        _.append(row['a'][1])
    else:
        _.append(np.nan)
t['element_two'] = _
I also tried np.where(), but I'm not specifying the 'if' argument correctly:
np.where(t['a'].str.len() > 1, lambda x: x['a'][1], np.nan)
Corrections and pointers to other solutions would be greatly appreciated! I'm coming from R, where I take vectorization for granted.
I'm on pandas 0.25.3 and numpy 1.18.1.
Use the .str accessor:
n = 2
t['second'] = t['a'].str[n-1]
print(t)
a second
0 [1234, abc, 444] abc
1 [5678] NaN
2 [2468, def] def
While not incredibly efficient, apply is at least clean:
t['a'].apply(lambda _: np.nan if len(_)<2 else _[1])
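A quick comparison on the question's example frame, showing that the .str accessor and the apply version produce the same column. Series.str.get(1) is the method form of .str[1]; both return NaN where a list is too short. The extra column names here are chosen only for illustration:

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({'a': [['1234', 'abc', '444'],
                        ['5678'],
                        ['2468', 'def']]})

# Element lookup through the .str accessor (it also works on list elements)
t['second'] = t['a'].str[1]
# Method form of the same lookup
t['second_get'] = t['a'].str.get(1)
# Row-wise apply, as in the answer above
t['second_apply'] = t['a'].apply(lambda lst: np.nan if len(lst) < 2 else lst[1])

print(t)
```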
I generated a new matrix B (50, 40) of random rows from a matrix A (100, 40):
B = A[np.random.randint(0,100,size=50)] # it works fine.
Now I want to take the rows of A that aren't in matrix B.
C = A not in B # pseudocode.
This should do the job:
import numpy as np
A=np.random.randint(5,size=[100,40])
l=np.random.choice(100, size=50, replace=False)
B = A[l]
C= A[np.setdiff1d(np.arange(0,100),l)]
l stores the indices of the selected rows, and C takes the complement of l. Then C is the required matrix.
Note that I set l=np.random.choice(100, size=50, replace=False) to sample without replacement. If you use np.random.randint(0,100,size=50), you may get repeated rows, because the same index can be drawn more than once.
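If the rows are drawn with np.random.randint as in the question (so indices may repeat), a boolean mask over the row indices still gives the complement. This is a sketch with an arbitrary seed, not part of the answer; like setdiff1d, it selects rows by position, not by value:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
A = rng.integers(5, size=(100, 40))

idx = rng.integers(0, 100, size=50)  # may contain repeated indices
B = A[idx]

# Mark every selected row index as excluded, then keep the rest
keep = np.ones(A.shape[0], dtype=bool)
keep[idx] = False
C = A[keep]

# Same rows as the setdiff1d approach, which also tolerates duplicates in idx
C_setdiff = A[np.setdiff1d(np.arange(A.shape[0]), idx)]
```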
Inspired by this question, Check whether each row of a matrix is in another matrix [Python]: first get the indices of the rows of A that exist in B, then take their difference from all the indices of A, and finally select rows using that difference.
index = np.argwhere((B[:,None,:] == A[:,:]).all(-1))[:, 1]
C = A[np.setdiff1d(np.arange(100), index)]
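The same broadcasting comparison can be turned directly into a boolean mask, skipping argwhere and setdiff1d. This sketch (arbitrary seed and data) compares row values, so any row of A that happens to equal a row of B is dropped; the intermediate array has shape (len(A), len(B), n_cols), so memory grows with all three:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
A = rng.integers(5, size=(100, 40))
B = A[rng.integers(0, 100, size=50)]

# in_B[q] is True when row A[q] equals at least one row of B
in_B = (A[:, None, :] == B[None, :, :]).all(-1).any(1)
C = A[~in_B]

# Same rows as the argwhere + setdiff1d answer
index = np.argwhere((B[:, None, :] == A[:, :]).all(-1))[:, 1]
C_answer = A[np.setdiff1d(np.arange(100), index)]
```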
The numpy_indexed package (disclaimer: I am its author) has efficient vectorized functionality for all these kinds of operations.
import numpy_indexed as npi
C = npi.difference(A, B)
I have a SciPy sparse matrix
import scipy.sparse as sps
a = sps.csc_matrix((z, (x, y)), shape=(N, N), dtype=int)
I have another 1D array z that I would like to compare to each column in a and count the matches.
count = 0
for i in range(N):
    # a[:, i] is a sparse (N, 1) column; densify and flatten before comparing
    count += (z == a[:, i].toarray().ravel()).sum()
This takes a VERY long time because the code is not vectorized. Is there a way to vectorize this comparison?
a == z
does not work. I want something analogous to how a*z on two NumPy arrays does a column-wise multiplication very fast, in contrast to explicitly looping over the columns of a and multiplying by z.
Does this give you what you want?
(a == z[:, None]).sum()
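A dense sketch with a made-up 4×4 matrix, checking that the broadcast comparison counts the same matches as the column loop. Here a is a plain NumPy array; for the SciPy sparse matrix in the question, a.toarray() gives an equivalent dense array, at O(N²) memory cost:

```python
import numpy as np

N = 4
a = np.array([[1, 0, 2, 1],
              [0, 3, 0, 3],
              [2, 2, 2, 0],
              [1, 0, 1, 1]])  # arbitrary example matrix
z = np.array([1, 3, 2, 0])     # compared against each column of a

# Slow reference: loop over the columns
count_loop = 0
for i in range(N):
    count_loop += (z == a[:, i]).sum()

# Broadcast: z[:, None] has shape (N, 1), so a[r, c] is compared with z[r]
# for every column c at once
count_broadcast = (a == z[:, None]).sum()
```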
The default matrix multiplication is computed as
c[i,j] = sum(a[i,k] * b[k,j])
I am trying to use a custom formula instead of the dot product to get
c[i,j] = sum(a[i,k] == b[k,j])
Is there an efficient way to do this in numpy?
You could use broadcasting:
c = np.sum(a[...,np.newaxis]*b[np.newaxis,...], axis=1) # == np.dot(a,b)
c = np.sum(a[...,np.newaxis]==b[np.newaxis,...], axis=1)
I included the newaxis in b just to make it clear how that array is expanded. There are other ways of adding dimensions to arrays (reshape, repeat, etc.), but the effect is the same: expand a and b to the same shape to do element-by-element multiplication (or ==), and then sum over the correct axis.
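A short check with small random integer matrices (shapes and seed chosen arbitrarily) that the broadcast sum reproduces a @ b and that the == variant counts matches for each (i, j). The intermediate array has shape (m, k, n), so memory grows with all three dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(4, 5))   # shape (m, k)
b = rng.integers(0, 3, size=(5, 6))   # shape (k, n)

# (m, k, 1) op (1, k, n) -> (m, k, n); summing over axis 1 collapses k
c_dot = np.sum(a[..., np.newaxis] * b[np.newaxis, ...], axis=1)
c_eq = np.sum(a[..., np.newaxis] == b[np.newaxis, ...], axis=1)

# Reference for the == variant, computed with explicit loops
m, k = a.shape
n = b.shape[1]
ref = np.zeros((m, n), dtype=int)
for i in range(m):
    for j in range(n):
        ref[i, j] = sum(a[i, t] == b[t, j] for t in range(k))
```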