How to write a CuPy user-defined kernel function to calculate a segmented sum - numpy

I currently use the following function. It works, but it is very memory-intensive and time-consuming on huge data, and I can't make sense of the CuPy kernel documentation well enough to write a better version.
def cupy_sum(self, bins):
    # index of the last element of each segment in the flattened data
    bidx = cupy.cumsum(bins) - 1
    # segment sums are differences of the running sum at segment ends
    return cupy.diff(cupy.r_[0, cupy.cumsum(self)[bidx]])
Referring to other examples, I wrote the following code, but I don't know whether there is a problem with it.
sum_section_kernel = cp.ElementwiseKernel(
    'raw T bins, raw T dats',
    'float32 out',
    '''
    T bin_f = bins[i];      // offset where segment i starts
    T bin_l = bins[i + 1];  // offset where segment i ends (exclusive)
    T biv = 0;
    for (size_t j = bin_f; j < bin_l; j++) {
        biv += dats[j];
    }
    out = biv;
    ''',
    'summe')
a = cp.array([4, 3, 5], dtype=cp.float32)
b = cp.array([1, 1, 1.1, 1, 2, 2, 2, 3, 3, 3, 3, 3], dtype=cp.float32)
y = cp.empty(3, dtype=cp.float32)
a = cp.r_[0, a.cumsum()]  # turn segment lengths into boundary offsets
out = sum_section_kernel(a, b, y)
print(out)
> [ 4.100 6.000 15.000]

The example is shown above. The speed has not improved over the cumsum version, but I think the kernel still has the advantage of saving memory.
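For comparison, a segmented sum can also be written without a custom kernel by scattering each element into its bin. This is only a sketch of an alternative, assuming cupyx.scatter_add is available (it is in recent CuPy releases); the seg_ids name is mine:

import cupy as cp
import cupyx

bins = cp.array([4, 3, 5])  # segment lengths
dats = cp.array([1, 1, 1.1, 1, 2, 2, 2, 3, 3, 3, 3, 3], dtype=cp.float32)

# For every element of dats, find which segment it falls into by
# searching the cumulative segment boundaries.
edges = cp.cumsum(bins)
seg_ids = cp.searchsorted(edges, cp.arange(dats.size), side='right')

out = cp.zeros(bins.size, dtype=cp.float32)
cupyx.scatter_add(out, seg_ids, dats)  # adds dats[j] into out[seg_ids[j]]
print(out)  # [ 4.1  6.  15.]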

Related

How can I speed up this function in Python?

I am trying to figure out a way to speed up this function. I am trying to do all pairwise comparisons between the rows and columns of a dataframe (pairwise_df) and store the result. The comparison requires two numpy arrays of continuous values taken from another dataframe (df).
pairwise_df = pd.DataFrame(index=['insert1', 'insert2', 'insert3'],
                           columns=['insert1', 'insert2', 'insert3'])
df = pd.DataFrame(data=[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
                        [2, 3, 4, 5, 7, 9, 10, 1, 2, 3]],
                  index=['insert1', 'insert2', 'insert3'],
                  columns=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for row in list(pairwise_df.index.values):
    for col in list(pairwise_df):
        pairwise_df.at[row, col] = cosine_sim(np.array(df.loc[row]), np.array(df.loc[col]))
This works, but it takes about 18 minutes to run on a 2000 x 2000 dataframe. I'm sure there are ways to speed this up, but my programming experience is minimal.
The cosine_sim function is here, but the function used will vary, so it doesn't matter too much:
def cosine_sim(x, y):
    dot = np.dot(x, y)
    norma = np.linalg.norm(x)
    normb = np.linalg.norm(y)
    cos = dot / (norma * normb)
    return cos
Thanks!
You can avoid the loops by building the array of all row combinations using np.tile and np.reshape. The trick here is to use np.einsum to compute all the dot products in one shot.
m = df.values
x = np.tile(m, m.shape[0]).reshape(-1, m.shape[1])  # each row repeated n times
y = np.tile(m.T, m.shape[0]).T                      # all rows cycled n times
c = np.einsum('ij,ij->i', x, y) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
>>> c.reshape(-1, m.shape[0])
array([[1.        , 0.57142857, 0.75283826],
       [0.57142857, 1.        , 0.74102903],
       [0.75283826, 0.74102903, 1.        ]])

Best way to get joint probability matrix from categorical data

My goal is to get a joint probability matrix (here we use counts for the example) from data samples. I can currently get the expected result, but I'm wondering how to optimize it. Here is my implementation:
def Fill2DCountTable(arraysList):
    '''
    :param arraysList: list of arrays, length 2;
        each array is of shape (k, sampleSize),
        k == 1 (or None; numpy will align it) if it's a single variable,
        else k for a set of variables of size k
    :return: xyJointCounts, xMarginalCounts, yMarginalCounts
    '''
    jointUniques, jointCounts = np.unique(np.vstack(arraysList), axis=1, return_counts=True)
    _, xReverseIndexs = np.unique(jointUniques[[0]], axis=1, return_inverse=True)  ###HIGHLIGHT###
    _, yReverseIndexs = np.unique(jointUniques[[1]], axis=1, return_inverse=True)
    xyJointCounts = np.zeros((xReverseIndexs.max() + 1, yReverseIndexs.max() + 1), dtype=np.int32)
    xyJointCounts[tuple(np.vstack([xReverseIndexs, yReverseIndexs]))] = jointCounts
    xMarginalCounts = np.sum(xyJointCounts, axis=1)  ###HIGHLIGHT###
    yMarginalCounts = np.sum(xyJointCounts, axis=0)
    return xyJointCounts, xMarginalCounts, yMarginalCounts
def Fill3DCountTable(arraysList):
    # :param arraysList: list of arrays, length 3
    jointUniques, jointCounts = np.unique(np.vstack(arraysList), axis=1, return_counts=True)
    _, xReverseIndexs = np.unique(jointUniques[[0]], axis=1, return_inverse=True)
    _, yReverseIndexs = np.unique(jointUniques[[1]], axis=1, return_inverse=True)
    _, SReverseIndexs = np.unique(jointUniques[2:], axis=1, return_inverse=True)
    SxyJointCounts = np.zeros((SReverseIndexs.max() + 1, xReverseIndexs.max() + 1,
                               yReverseIndexs.max() + 1), dtype=np.int32)
    SxyJointCounts[tuple(np.vstack([SReverseIndexs, xReverseIndexs, yReverseIndexs]))] = jointCounts
    SMarginalCounts = np.sum(SxyJointCounts, axis=(1, 2))
    SxJointCounts = np.sum(SxyJointCounts, axis=2)
    SyJointCounts = np.sum(SxyJointCounts, axis=1)
    return SxyJointCounts, SMarginalCounts, SxJointCounts, SyJointCounts
My use case is conditional independence testing over variables. The sample size is usually quite big (~10k) while each variable's categorical cardinality is relatively small (~10). I still find the speed unsatisfying.
How can I best optimize this code, or even the logic outside the code? Some thoughts I have:
The ###HIGHLIGHT### lines. For a single X I may calculate (X;Y1), (Y2;X), (X;Y3|S1), ... many times. What if I cache each variable's (and each conditioning set's) {uniqueValue: reverseIndex} dictionary together with its marginal counts, so I can fetch marginalCounts directly (no need to sum) and map samples to reverseIndexs directly (no need to call unique)? A sketch of this idea follows the test example below.
How can I further use matrix parallelization to run CI tests in batch, i.e. calculate (X;Y|S1), (X;Y|S2), (X;Y|S3), ... simultaneously?
Will torch be faster than numpy on the same CPU? Or on a GPU?
It's an open question. Thank you for any possible ideas. Big thanks for your help :)
================== A test example is as follows ==================
xs = np.array([2, 4, 2, 3, 3, 1, 3, 1, 2, 1])
ys = np.array([5, 5, 5, 4, 4, 4, 4, 4, 6, 5])
Ss = np.array([[1, 0, 0, 0, 1, 0, 0, 0, 1, 1],
               [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]])
xyJointCounts, xMarginalCounts, yMarginalCounts = Fill2DCountTable([xs, ys])
SxyJointCounts, SMarginalCounts, SxJointCounts, SyJointCounts = Fill3DCountTable([xs, ys, Ss])
From (X;Y) we get the 2D result: xMarginalCounts=[3 3 3 1], yMarginalCounts=[5 4 1], and xyJointCounts (axis names added FYI):
xy| 4 5 6
--|-------
1 | 2 1 0
2 | 0 2 1
3 | 3 0 0
4 | 0 1 0
From (X;Y|{Z1,Z2}) we get the 3D result: SxyJointCounts has shape 4x4x3, where the leading 4 is the cardinality of {Z1,Z2} (00, 01, 10, 11, with SMarginalCounts=[3 3 1 3]). SxJointCounts has shape 4x4 and SyJointCounts has shape 4x3.
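Here is a rough sketch of the first idea above (my own code, so treat the names as hypothetical): factorize each variable once with np.unique(..., return_inverse=True), cache the codes and cardinalities, and then count any (X;Y) pair by flattening the code pairs and calling np.bincount, avoiding a second np.unique over the stacked samples.

import numpy as np

def fill_2d_from_codes(x_codes, nx, y_codes, ny):
    # x_codes / y_codes are the cached inverse indices from np.unique;
    # nx / ny are the cached cardinalities. No unique() call per test.
    flat = x_codes * ny + y_codes  # encode each (x, y) pair as one integer
    xy = np.bincount(flat, minlength=nx * ny).reshape(nx, ny).astype(np.int32)
    return xy, xy.sum(axis=1), xy.sum(axis=0)

x_uniq, x_codes = np.unique(xs, return_inverse=True)  # cache once per variable
y_uniq, y_codes = np.unique(ys, return_inverse=True)
xy, xm, ym = fill_2d_from_codes(x_codes, x_uniq.size, y_codes, y_uniq.size)
# xy, xm, ym match xyJointCounts, xMarginalCounts, yMarginalCounts above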

Given a dataframe with N elements, how can I make m smaller dataframes such that the size of each is some fraction of N?

I have a dataset (call it Data) with ~25000 instances that I want to split into a train set, development set, and test set. I want it to be such that,
train set = 0.7*Data
development set = 0.1*Data
test set = 0.2*Data
When making the split, I want the instances to be randomly sampled and NOT REPEATED between the 3 sets. This is why I can't use something like,
train_set = Data.sample(frac=0.7)
dev_set = Data.sample(frac=0.1)
test_set = Data.sample(frac=0.2)
where instances from Data may be repeated across the sets. Is there a built-in function that I am missing, or could you help me write a function for doing this?
I will use an array to demonstrate an example of what I am looking for.
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
splits = [0.7, 0.1, 0.2]
def splitFunction(data, array_of_splits):
    # I need your help here
splits = splitFunction(A, splits)
#output
[[1, 3, 8, 9, 6, 7, 2], [4], [5, 0]]
Thank you in advance!
from random import shuffle

def splitFunction(data, array_of_splits):
    data_copy = data[:]  # copy the data if you don't want to change the original list
    shuffle(data_copy)   # randomizes the order
    splits = []
    startIndex = 0
    for val in array_of_splits:
        endIndex = startIndex + int(val * len(data))  # slice bounds must be integers
        splits.append(data_copy[startIndex:endIndex])
        startIndex = endIndex
    return splits
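A quick check on the example array (the exact output varies with the shuffle):

A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(splitFunction(A, [0.7, 0.1, 0.2]))
# e.g. [[1, 3, 8, 9, 6, 7, 2], [4], [5, 0]]

For a DataFrame, here is a sketch of the same idea using np.split on a shuffled frame (assuming Data is the DataFrame from the question):

import numpy as np

shuffled = Data.sample(frac=1)  # shuffle all rows once, so no row repeats
n = len(Data)
train_set, dev_set, test_set = np.split(shuffled, [int(0.7 * n), int(0.8 * n)])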

Finding those elements in an array which are "close"

I have a 1-dimensional sorted array and would like to find all pairs of elements whose difference is no larger than 5.
A naive approach would be to make N^2 comparisons, doing something like
diffs = np.tile(x, (x.size, 1)) - x[:, np.newaxis]
D = np.logical_and(diffs > 0, diffs < 5)
indices = np.argwhere(D)
Note here that the output of my example is indices into x. If I wanted the values of x which satisfy the criteria, I could do x[indices].
This works for smaller arrays, but not arrays of the size with which I work.
An idea I had was to find where there are gaps larger than 5 between consecutive elements, split the array into pieces at those gaps, and then compare all the elements within each piece.
Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?
Here is a small example:
x = np.array([ 9, 12,
              21,
              36, 39, 44, 46, 47,
              58,
              64, 65])
the result should look like
array([[ 0,  1],
       [ 3,  4],
       [ 5,  6],
       [ 5,  7],
       [ 6,  7],
       [ 9, 10]], dtype=int64)
Here is a solution that iterates over offsets while shrinking the set of candidates until there are none left:
import numpy as np

def f_pp(A, maxgap):
    d0 = np.diff(A)   # differences of consecutive elements
    d = d0.copy()     # d[i] holds A[i + k] - A[i] for the current k
    IDX = []
    k = 1
    idx, = np.where(d <= maxgap)  # candidate start positions
    vidx = idx[d[idx] > 0]        # keep strictly increasing pairs only
    while vidx.size:
        IDX.append(vidx[:, None] + (0, k))  # record pairs (i, i + k)
        if idx[-1] + k + 1 == A.size:       # drop a window that would run past the end
            idx = idx[:-1]
        d[idx] = d[idx] + d0[idx + k]       # extend: d[i] becomes A[i + k + 1] - A[i]
        k += 1
        idx = idx[d[idx] <= maxgap]         # shrink the candidate set
        vidx = idx[d[idx] > 0]
    return np.concatenate(IDX, axis=0)
data = np.cumsum(np.random.exponential(size=10000)).repeat(np.random.randint(1, 20, (10000,)))
pairs = f_pp(data, 1)
#pairs = set(map(tuple, pairs))

from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds) * 10, 'ms')
Sample run:
99963 points 1020651 close pairs
pp 43.00256529124454 ms
Your idea of slicing the array is a very efficient approach. Since your data are sorted, you can just calculate the differences and split at the large gaps (note the + 1: the split point is one past the gap):
d = np.diff(x)
ind = np.where(d > 5)[0] + 1
pieces = np.split(x, ind)
Here pieces is a list that you can iterate over, applying your own code to each piece.
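For instance, here is a sketch of the full pipeline (my addition): since the array is sorted, no qualifying pair can span a gap larger than 5, so the quadratic comparison can be run piece by piece and the piece-local indices shifted back into x:

pairs = []
start = 0
for p in pieces:
    diffs = p[None, :] - p[:, None]
    local = np.argwhere((diffs > 0) & (diffs < 5))  # same criterion as the question
    pairs.append(local + start)  # shift piece-local indices back into x
    start += p.size
pairs = np.concatenate(pairs)  # reproduces the expected output for the example x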
The best algorithm depends highly on the nature of your data, which I don't know. For example, another possibility is a nested loop:
pairs = []
for i in range(x.size):
    j = i + 1
    while j < x.size and x[j] - x[i] <= 5:  # bounds check must come before indexing
        pairs.append([i, j])
        j += 1
If you want it to be more clever, you can edit the outer loop so that it jumps ahead when j hits a gap; a sketch of that idea follows.
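Here is a hedged sketch of that jump idea (my own, assuming strictly increasing values as in the example): keep a second pointer that never moves backward, because for sorted data the window boundary is non-decreasing in i:

pairs = []
j = 0
for i in range(x.size):
    j = max(j, i + 1)  # the window boundary never moves backward
    while j < x.size and x[j] - x[i] < 5:  # grow the window for this i
        j += 1
    # every index strictly between i and j pairs with i
    pairs.extend([i, k] for k in range(i + 1, j))

This scans the array once with two pointers, so the work is linear in the input plus the number of pairs reported.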

NumPy/Pandas: convert array of "steps" into bool mask

I have an array like this:
arr = np.array([4, 6, 3, 9, 2, 100, 3, 1, 1, 1, 1])
I want to convert it to a bool array like this:
[ T, F, F, F, T, F, T, F, F, T, T]
# 4, 6, 3, 9, 2, 100, 3, 1, 1, 1, 1
I can do it with a loop like this:
mask = np.zeros(len(arr), dtype=bool)
ii = 0
while ii < len(arr):
    mask[ii] = True
    ii += arr[ii]
It's sort of an indirect indexing scheme, where each element in the input tells us how many subsequent elements are invalid.
How can I do it without using a Python loop, so that it will be fast if the input array is large? I'm happy to use Pandas too.
There may be some vectorization trick I'm not thinking of, but if you can use numba, it's well suited for problems like this - this loop should now be very fast.
import numpy as np
import numba

@numba.jit(nopython=True)
def jump_mask(arr):
    mask = np.zeros(len(arr), dtype=np.bool_)
    ii = 0
    while ii < len(arr):
        mask[ii] = True
        ii += arr[ii]
    return mask
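A quick check against the example input (note the first call includes JIT compilation time):

arr = np.array([4, 6, 3, 9, 2, 100, 3, 1, 1, 1, 1])
print(jump_mask(arr))
# [ True False False False  True False  True False False  True  True]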