Efficient way to calculate the pairwise matrix product between one tensor and all the rolling of another tensor - numpy

Suppose we have two tensors:
tensor A whose shape is (d,m,n)
tensor B whose shape is (d,n,l).
If we want the pairwise matrix products over the last two axes of A and B, I think we can use np.einsum('dmn,...nl->d...ml',A,B), whose result has shape (d,d,m,l). However, I do not want all of the pairs.
Given a parameter k, 1<=k<=d, I want only the following pairwise matrix products: A[i] @ B[(i+j) % d] for i in range(d) and j in range(k).
Note that B is indexed in a rolling (cyclic) way here, like numpy.roll.
The result is a tensor of shape (d,k,m,l).
What is the most efficient way to do this?
I know several ways like:
First get np.einsum('dmn,...nl->d...ml',A,B), then use a mask to extract the (d,k) pairs.
Tile B first, then use einsum in some way.
But I think there exists a better way.

I doubt you can do much better than a for loop. Here is, for example, a vectorized version using einsum and stride_tricks compared to a double for loop:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from numpy.lib.stride_tricks import as_strided
B = BenchmarkBuilder()
@B.add_function()
def loopy(A,B,k):
    d,m,n = A.shape
    l = B.shape[-1]
    out = np.empty((d,k,m,l),int)
    for i in range(d):
        for j in range(k):
            # pair A[i] with the j-th cyclic successor of B[i]
            out[i,j] = A[i]@B[(i+j)%d]
    return out
@B.add_function()
def vectory(A,B,k):
    d,m,n = A.shape
    l = B.shape[-1]
    # append the first k-1 matrices so that every window of length k is contiguous
    BB = np.concatenate([B,B[:k-1]],0)
    # overlapping (d,k,n,l) view in which BB[i,j] is B[(i+j)%d]
    BB = as_strided(BB,(d,k,n,l),np.repeat(BB.strides,(2,1,1)))
    return np.einsum("ikl,ijln->ijkn",A,BB)
@B.add_arguments('d x k x m x n x l')
def argument_provider():
    for exp in range(10):
        d,k,m,n,l = (np.r_[1.6,1.5,1.5,1.5,1.5]**exp*(4,2,2,2,2)).astype(int)
        A = np.random.randint(0,10,(d,m,n))
        B = np.random.randint(0,10,(d,n,l))
        yield k*d*m*n*l,MultiArgument([A,B,k])
r = B.run()
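As a further variant of the strided approach, here is a minimal sketch that builds the same overlapping view with numpy.lib.stride_tricks.sliding_window_view (assuming NumPy 1.20 or later), which avoids computing strides by hand. The function name rolling_pairs and the demo arrays are made up for illustration, and no claim is made about how it compares in speed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_pairs(A, B, k):
    """Return out[i, j] = A[i] @ B[(i + j) % d], with shape (d, k, m, l)."""
    # Append the first k-1 matrices so that wrap-around windows are contiguous.
    BB = np.concatenate([B, B[:k - 1]], axis=0)
    # Read-only view of shape (d, n, l, k); windows[i, :, :, j] is BB[i + j].
    windows = sliding_window_view(BB, k, axis=0)
    return np.einsum("imn,inlj->ijml", A, windows)

# Small demo inputs (hypothetical sizes: d=5, m=2, n=3, l=4, k=3).
rng = np.random.default_rng(0)
A_demo = rng.integers(0, 10, (5, 2, 3))
B_demo = rng.integers(0, 10, (5, 3, 4))
out_demo = rolling_pairs(A_demo, B_demo, 3)
print(out_demo.shape)
```

Because the window view shares memory with the concatenated array, only k-1 extra matrices are copied, rather than k full copies of B.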

Finding the inverse of a matrix A with LU decomposition

The task asks me to generate a matrix A with 50 rows and 50 columns, with random entries in the range [0,1], using the random seed 1007092020.
import numpy as np
A = np.random.randint(2, size=(3,3))
Then I have to find an inverse matrix of A with LU decomposition.
No idea how to do that.
If you need A to be a 50 x 50 matrix of random floating-point numbers, you can create it with the following code:
import numpy as np
A = np.random.random((50,50))
Instead, if you want integers in the range 0,1 (1 included), you can do this
A = np.random.randint(0,2,(50,50))
If you want to compute the inverse using LU decomposition, you can use SciPy. Note that since you are generating random matrices, it is possible that your matrix does not have an inverse. In that case, you cannot find the inverse.
Here's some code that will work in case A does have an inverse.
from scipy.linalg import lu
p,l,u = lu(A, permute_l = False)
Now that we have the permutation (p), lower triangular (l) and upper triangular (u) factors, with A = P L U, we can find the inverse of A from A^-1 = U^-1 (P L)^-1. The code below first folds the permutation into l:
l = np.dot(p,l)
l_inv = np.linalg.inv(l)
u_inv = np.linalg.inv(u)
A_inv = np.dot(u_inv,l_inv)
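As an alternative to inverting the factors separately, here is a minimal sketch that uses scipy.linalg.lu_factor and lu_solve to solve A X = I from a single LU factorization. It assumes the seed from the question is meant for NumPy's legacy global generator (np.random.seed), which is only a guess at what the task intends.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

np.random.seed(1007092020)             # seed taken from the question (assumed usage)
A = np.random.random((50, 50))         # uniform floats in [0, 1)

lu_piv = lu_factor(A)                  # packed LU factors plus pivot indices
A_inv = lu_solve(lu_piv, np.eye(50))   # solve A X = I, so X is the inverse

# Sanity check: A times its inverse should be close to the identity.
print(np.allclose(A @ A_inv, np.eye(50)))
```

Solving against the identity reuses the factorization for all 50 right-hand sides and keeps the pivoting inside SciPy, so the permutation never has to be handled by hand.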

Why do numpy and pytorch give different results after mean and variance normalization?

I am working on a problem in which a matrix has to be mean-var normalized row-wise. It is also required that the normalization is applied after splitting each row into tiny batches.
The code seems to work with NumPy, but fails with PyTorch (which is required for training).
The PyTorch and NumPy results differ. Any help will be greatly appreciated.
Example code:
import numpy as np
import torch
def normalize(x, bsize, eps=1e-6):
    nc = x.shape[1]
    if nc % bsize != 0:
        raise Exception('Number of columns must be a multiple of bsize')
    x = x.reshape(-1, bsize)
    m = x.mean(1).reshape(-1, 1)
    s = x.std(1).reshape(-1, 1)
    n = (x - m) / (eps + s)
    n = n.reshape(-1, nc)
    return n
# numpy
a = np.float32(np.random.randn(8, 8))
n1 = normalize(a, 4)
# torch
b = torch.tensor(a)
n2 = normalize(b, 4)
n2 = n2.numpy()
In the first example you are calling normalize with a, a numpy.ndarray, while in the second you call normalize with b, a torch.Tensor.
According to the documentation page of torch.std, Bessel’s correction is used by default to measure the standard deviation. As such the default behavior between numpy.ndarray.std and torch.Tensor.std is different.
If unbiased is True, Bessel’s correction will be used. Otherwise, the sample deviation is calculated, without any correction.
torch.std(input, dim, unbiased, keepdim=False, *, out=None) → Tensor
input (Tensor) – the input tensor.
unbiased (bool) – whether to use Bessel’s correction (δN = 1).
You can try yourself:
>>> a.std(), b.std(unbiased=True), b.std(unbiased=False)
(0.8364538, tensor(0.8942), tensor(0.8365))
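To make the size of the gap concrete, here is a small NumPy-only sketch. It assumes that ddof=1 in NumPy corresponds to Bessel's correction as used by torch.std by default; the array x and the variable names are made up for illustration. With batches of only 4 values, as in the question, the two estimates differ by a factor of sqrt(4/3).

```python
import numpy as np

# Hypothetical data: 8 tiny batches of 4 values each, as in the question.
x = np.random.default_rng(0).standard_normal((8, 4)).astype(np.float32)
n = x.shape[1]

std_population = x.std(axis=1)       # NumPy default: divide by N (ddof=0)
std_bessel = x.std(axis=1, ddof=1)   # divide by N - 1 (Bessel's correction)

# The two estimates differ by the constant factor sqrt(N / (N - 1)).
ratio = std_bessel / std_population
print(ratio, np.sqrt(n / (n - 1)))
```

So in the normalize function, one option is to request the uncorrected estimate on the torch side (unbiased=False), or equivalently to pass ddof=1 on the NumPy side, so that both code paths use the same divisor.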

KNN: give more weight to specific features in the distance

I'm using the Kobe Bryant Dataset.
I wish to predict the shot_made_flag with KnnRegressor.
I've used game_date to extract year and month features:
# convert season to years
kobe_data_encoded['season'] = kobe_data_encoded['season'].apply(lambda x: int(re.compile(r'(\d+)-').findall(x)[0]))
# add year and month using game_date
kobe_data_encoded['year'] = kobe_data_encoded['game_date'].apply(lambda x: int(re.compile(r'(\d{4})').findall(x)[0]))
kobe_data_encoded['month'] = kobe_data_encoded['game_date'].apply(lambda x: int(re.compile(r'-(\d+)-').findall(x)[0]))
kobe_data_encoded = kobe_data_encoded.drop(columns=['game_date'])
I want to give the season, year and month features more weight in the distance function, so that events closer in date to the current event are closer neighbors, while still keeping reasonable distances to other data points. For example, I don't want an event on the same day to be the closest neighbor just because of the date features; the other features, such as shot_range, should still be taken into account.
To give them more weight, I tried the metric argument with a custom distance function, but the arguments of that function are plain NumPy arrays without the pandas column information, so I'm not sure how to implement what I'm trying to do.
Using larger weights for the date features, I search for the optimal k with 10-fold CV, for k from 1 to 99:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
# scaling
min_max_scaler = preprocessing.MinMaxScaler()
scaled_features_df = kobe_data_encoded.copy()
column_names = ['loc_x', 'loc_y', 'minutes_remaining', 'period',
                'seconds_remaining', 'shot_distance', 'shot_type', 'shot_zone_range']
scaled_features = min_max_scaler.fit_transform(scaled_features_df[column_names])
scaled_features_df[column_names] = scaled_features
not_classified_df = scaled_features_df[scaled_features_df['shot_made_flag'].isnull()]
classified_df = scaled_features_df[scaled_features_df['shot_made_flag'].notnull()]
X = classified_df.drop(columns=['shot_made_flag'])
y = classified_df['shot_made_flag']
cv = StratifiedKFold(n_splits=10, shuffle=True)
neighbors = [x for x in range(1, 100)]
cv_scores = []
weight = np.ones((X.shape[1],))
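weight[[X.columns.get_loc(c) for c in ['season', 'year', 'month']]] = 5 # larger weight for the date features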
weight = weight/weight.sum() #Normalize weights
def my_distance(x, y):
    dist = ((x-y)**2)
    return np.dot(dist, weight)
for k in neighbors:
    print('k: ', k)
    knn = KNeighborsClassifier(n_neighbors=k, metric=my_distance)
    cv_scores.append(np.mean(cross_val_score(knn, X, y, cv=cv, scoring='roc_auc')))
#optimal K (a higher ROC AUC is better, so take the maximum)
optimal_k_index = cv_scores.index(max(cv_scores))
optimal_k = neighbors[optimal_k_index]
print('best k: ', optimal_k)
plt.plot(neighbors, cv_scores)
plt.xlabel('Number of Neighbors K')
plt.ylabel('ROC AUC')
This runs really slowly. Any idea how to make it faster?
The idea of the weighted features is to find neighbors that are closer in date to the data point, to avoid data leakage, and to use CV to find the optimal k.
First, you have to prepare a numpy 1D weight array, specifying weight for each feature. You could do something like:
weight = np.ones((M,)) # M is no of features
weight[[1,7,10]] = 2 # Increase weight of 1st,7th and 10th features
weight = weight/weight.sum() #Normalize weights
You can use kobe_data_encoded.columns to find indexes of season, year, month features in your dataframe to replace 2nd line above.
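The column lookup mentioned above can be sketched as follows. X_demo is a hypothetical stand-in for the real feature frame (only its column names matter), and the weight of 2 is the arbitrary value used in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the feature frame; only the column names matter here.
X_demo = pd.DataFrame(
    np.zeros((2, 5)),
    columns=["loc_x", "season", "shot_distance", "year", "month"],
)

date_cols = ["season", "year", "month"]
date_idx = X_demo.columns.get_indexer(date_cols)  # integer positions of the date columns

weight = np.ones(X_demo.shape[1])   # one weight per feature
weight[date_idx] = 2                # increase the weight of the date features
weight = weight / weight.sum()      # normalize the weights
print(date_idx, weight)
```

Looking positions up by name keeps the weights correct even if the column order of the dataframe changes later.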
Now define a distance function, which has to take two 1D NumPy arrays:
def my_dist(x,y):
    global weight #1D array, same shape as x or y
    dist = ((x-y)**2) #1D array, same shape as x or y
    return np.dot(dist,weight) # a scalar float
And initialize KNeighborsRegressor as:
knn = KNeighborsRegressor(metric=my_dist)
To make things efficient, you can precompute distance matrix, and reuse it in KNN. This should bring in significant speedup by reducing calls to my_dist, since this non-vectorized custom python distance function is quite slow. So now -
X_arr = X.to_numpy() # plain array, so that X_arr[i] is the i-th row
dist = np.zeros((len(X_arr),len(X_arr))) #Computing NXN distance matrix
for i in range(len(X_arr)): # You can halve this by using the fact that dist[i,j] = dist[j,i]
    for j in range(len(X_arr)):
        dist[i,j] = my_dist(X_arr[i],X_arr[j])
for k in neighbors:
    print('k: ', k)
    knn = KNeighborsClassifier(n_neighbors=k, metric='precomputed') #Note: metric='precomputed'
    cv_scores.append(np.mean(cross_val_score(knn, dist, y, cv=cv, scoring='roc_auc'))) #Note: passing dist instead of X
I couldn't test it, so let me know if something isn't alright.
Just adding to Shihab's answer regarding the distance computation: you can use SciPy's pdist, as suggested in this post, which is faster and more efficient.
from scipy.spatial.distance import pdist, squareform
# create the custom weight array
weight = ...
# calculate pairwise distances, using Minkowski norm with custom weights
# (with p=2 this is the square root of my_dist, which gives the same neighbor ranking)
distances = pdist(X, 'minkowski', p=2, w=weight)
# reformat the result as a square matrix
distances_as_2d_matrix = squareform(distances)
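Another way to see why this can be fast: the weighted squared distance sum(w * (x - y)**2) equals the ordinary squared Euclidean distance after each feature is multiplied by sqrt(w). The sketch below checks that identity on made-up data; X_demo, the weights and the feature indices are all hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X_demo = rng.random((6, 4))      # hypothetical data: 6 samples, 4 features
weight = np.ones(4)
weight[[1, 3]] = 5               # hypothetical choice of emphasized features
weight = weight / weight.sum()

def my_dist(x, y):
    # Same weighted squared distance as in the answer above.
    return np.dot((x - y) ** 2, weight)

# Scale each feature by sqrt(weight), then use plain squared Euclidean distance.
X_scaled = X_demo * np.sqrt(weight)
fast = squareform(pdist(X_scaled, "sqeuclidean"))

# Reference: the slow double loop over all pairs.
slow = np.array([[my_dist(a, b) for b in X_demo] for a in X_demo])
print(np.allclose(fast, slow))
```

If the identity holds for your data, fitting the KNN model on the scaled features with the default Euclidean metric should rank neighbors the same way, without any custom Python metric.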

How to create a new array of tensors from old one

I have a tensor [a, b, c, d, e, f, g, h, i] with dimension 9 X 1536. I need to create a new tensor which is like [(a,b), (a,c), (a,d), (a,e),(a,f),(a,g), (a,h), (a,i)] with dimension [8 x 2 x 1536]. How can I do it with tensorflow ?
I tried like this
x = tf.zeros((9, 1536))
x_new = tf.stack([(x[0],x[1]),
(x[0], x[2]),
(x[0], x[3]),
(x[0], x[4]),
(x[0], x[5]),
(x[0], x[6]),
(x[0], x[7]),
(x[0], x[8])])
This seems to work, but I would like to know if there is a better solution or approach that can be used instead.
You can obtain the desired output with a combination of tf.concat, tf.tile and tf.expand_dims:
import tensorflow as tf
import numpy as np
_in = tf.constant(np.random.randint(0,10,(9,1536)))
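# number of rows minus one (TF 1.x Dimension API; in TF 2.x, _in.shape[0]-1 is already an int)
tile_shape = [(_in.shape[0]-1).value] + [1]*len(_in.shape[1:].as_list())
_out = tf.concat([
    tf.expand_dims(tf.tile(_in[:1], tile_shape), 1), # first row repeated, shape (8, 1, 1536)
    tf.expand_dims(_in[1:], 1) # remaining rows, shape (8, 1, 1536)
], axis=1) # stitched result, shape (8, 2, 1536)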
tf.tile repeats the first element of _in creating a tensor of length len(_in)-1 (I compute separately the shape of the tile because we want to tile only on the first dimension).
tf.expand_dims adds a dimension we can then concat on
Finally, tf.concat stitches together the two tensors giving the desired result.
EDIT: Rewrote to fit the OP's actual use-case with multidimensional tensors.
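The same repeat-and-stitch idea can be checked quickly in NumPy. This is only an illustrative sketch on made-up data, not the TensorFlow code itself; tf.broadcast_to and tf.stack should allow the same pattern there.

```python
import numpy as np

# Hypothetical input: 9 rows of 1536 values, each row distinguishable.
x = np.arange(9 * 1536).reshape(9, 1536)

# Repeat the first row once per remaining row, without copying (shape (8, 1536)).
first = np.broadcast_to(x[0], x[1:].shape)

# Pair the repeated first row with every other row (shape (8, 2, 1536)).
pairs = np.stack([first, x[1:]], axis=1)
print(pairs.shape)
```

Here pairs[i, 0] is always the first row and pairs[i, 1] is row i+1, which matches the [(a,b), (a,c), ...] layout from the question.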

Convolution along one axis only

I have two 2-D arrays with the same first axis dimension. In Python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.
import numpy as np
import scipy.signal as sg
M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)
C = sg.convolve(A, B, 'full')[(2*M-1)//2]
Is there a fast way?
You can use np.apply_along_axis to apply np.convolve along the desired axis. Here is an example of applying a boxcar filter to a 2d array:
import numpy as np
a = np.arange(10)
a = np.vstack((a,a)).T
filt = np.ones(3)
np.apply_along_axis(lambda m: np.convolve(m, filt, mode='full'), axis=0, arr=a)
This is an easy way to generalize many functions that don't have an axis argument.
With ndimage.convolve1d, you can specify the axis...
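A minimal sketch of the ndimage.convolve1d route, reusing the boxcar example from the previous answer. Note the assumptions: a single 1-D filter is applied to every column, the output has the same length as the input (comparable to mode='same', not 'full'), and mode='constant' is chosen here so that the borders are zero padded.

```python
import numpy as np
from scipy import ndimage

a = np.arange(10, dtype=float)
a = np.vstack((a, a)).T     # shape (10, 2): two identical columns
filt = np.ones(3)           # boxcar filter

# Convolve along axis 0 only; zero padding at the borders.
smoothed = ndimage.convolve1d(a, filt, axis=0, mode="constant", cval=0.0)

# Reference: np.convolve applied column by column with mode='same'.
reference = np.apply_along_axis(
    lambda m: np.convolve(m, filt, mode="same"), axis=0, arr=a
)
print(np.allclose(smoothed, reference))
```

This covers the case of one shared filter; it does not by itself handle a different filter per row, which is what the question asks for.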
np.apply_along_axis won't really help you, because you're trying to iterate over two arrays. Effectively, you'd have to use a loop, as described here.
Now, loops are fine if your arrays are small, but if N and P are large, then you probably want to use FFT to convolve instead.
However, you need to appropriately zero pad your arrays first, so that your "full" convolution has the expected shape:
M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)
A_ = np.zeros((M, N+P-1), dtype=A.dtype)
A_[:, :N] = A
B_ = np.zeros((M, N+P-1), dtype=B.dtype)
B_[:, :P] = B
A_fft = np.fft.fft(A_, axis=1)
B_fft = np.fft.fft(B_, axis=1)
C_fft = A_fft * B_fft
C = np.real(np.fft.ifft(C_fft))
# Test
C_test = np.zeros((M, N+P-1))
for i in range(M):
    C_test[i, :] = np.convolve(A[i, :], B[i, :], 'full')
assert np.allclose(C, C_test)
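The zero padding and FFT steps above can also be delegated to scipy.signal.fftconvolve through its axes argument (assuming a SciPy version that has it, 1.1 or later as far as I know). A minimal sketch on fresh random data:

```python
import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))
B = rng.standard_normal((M, P))

# Convolve along axis 1 only; row i of A is paired with row i of B.
C_rows = sg.fftconvolve(A, B, mode="full", axes=1)

# Reference: explicit loop over the rows.
C_loop = np.vstack([np.convolve(a, b, "full") for a, b in zip(A, B)])
print(C_rows.shape, np.allclose(C_rows, C_loop))
```

With axes=1 the first axis is not convolved, so the shapes along it have to match (or broadcast), which is the case here since both arrays have M rows.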
For 2D arrays, the function scipy.signal.convolve2d is faster, and scipy.signal.fftconvolve can be even faster (depending on the dimensions of the arrays).
Here is the same code with N = 100000:
import time
import numpy as np
import scipy.signal as sg
M, N, P = 10, 100000, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)
T1 = time.time()
C = sg.convolve(A, B, 'full')
print(time.time() - T1) # elapsed seconds for convolve
T1 = time.time()
C_2d = sg.convolve2d(A, B, 'full')
print(time.time() - T1) # elapsed seconds for convolve2d
T1 = time.time()
C_fft = sg.fftconvolve(A, B, 'full')
print(time.time() - T1) # elapsed seconds for fftconvolve
>>> 12.3
>>> 2.1
>>> 0.6
The answers are all the same, with slight differences due to the different computation methods used (e.g., FFT vs. direct multiplication; I don't know exactly what convolve2d uses):
print(np.max(np.abs(C - C_2d)))
print(np.max(np.abs(C - C_fft)))
Late answer, but worth posting for reference. Quoting from comments of the OP:
Each row in A is being filtered by the corresponding row in B. I could
implement it like that, just thought there might be a faster way.
A is on the order of 10s of gigabytes in size and I use overlap-add.
Naive / Straightforward Approach
import numpy as np
import scipy.signal as sg
M, N, P = 4, 10, 20
A = np.random.randn(M, N) # (4, 10)
B = np.random.randn(M, P) # (4, 20)
C = np.vstack([sg.convolve(a, b, 'full') for a, b in zip(A, B)])
>>> C.shape
(4, 29)
Each row in A is convolved with each respective row in B, essentially convolving M 1D arrays/vectors.
No Loop + CUDA Supported Version
It is possible to replicate this operation by using PyTorch's F.conv1d. We have to imagine A as a 4-channel, 1D signal of length 10. We wish to convolve each channel in A with a specific kernel of length 20. This is a special case called a depthwise convolution, often used in deep learning.
Note that torch's conv is implemented as cross-correlation, so we need to flip B in advance to do actual convolution.
import torch
import torch.nn.functional as F
def torch_conv(A, B):
    M, N, P = A.shape[0], A.shape[1], B.shape[1]
    # groups=M applies one kernel per channel; padding=P-1 on both sides gives the 'full' length N+P-1
    C = F.conv1d(A, B[:, None, :], bias=None, stride=1, groups=M, padding=P-1)
    return C.numpy()
# Convert A and B to torch tensors + flip B
X = torch.from_numpy(A) # (4, 10)
W = torch.from_numpy(np.fliplr(B).copy()) # (4, 20)
# Do grouped conv and get np array
Y = torch_conv(X, W)
>>> Y.shape
(4, 29)
>>> np.allclose(C, Y)
Advantages of using a depthwise convolution with torch:
No loops!
The above solution can also run on CUDA/GPU, which can really speed things up if A and B are very large matrices. (From the OP's comment, this seems to be the case: A is tens of gigabytes in size.)
Disadvantages:
Overhead of converting from array to tensor (should be negligible)
Need to flip B once before the operation