How to create index combinations (k out of n) as sparse bitmasks for numpy

With numpy, how can I efficiently create
- an array/matrix representing a list of all combinations (k out of n) as lists of k indices. The shape would be (binomial(n, k), k).
- a sparse array/matrix representing these combinations as bitmasks of length n (i.e. expanding the indices above into bitmasks). The shape would be (binomial(n, k), n).
I need to do this with large n (and maybe small k), so the algorithm should be
- time efficient (e.g. maybe allocate the complete result space at once before filling it?)
- space efficient (e.g. sparse bitmasks)
Many thanks for your help.

Assuming the blowup is not that bad (as mentioned in the comment above), you might try this. It's pretty vectorized and should be fast (for cases which could be handled).
Edit: I assumed you are interested in an output based on scipy.sparse. Maybe you are not.
import itertools
import numpy as np
import scipy.sparse as sp
def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))).
    """
    a = np.asarray(a)
    # A structured dtype with r fields lets fromiter read each tuple as one record.
    dt = np.dtype([('', a.dtype)]*r)
    b = np.fromiter(itertools.combinations(a, r), dt)
    b_ = b.view(a.dtype).reshape(-1, r)
    return b_

def sparse_combs(k, n):
    combs_ = combs(np.arange(n), k)
    n_bin = combs_.shape[0]
    # One nonzero per (combination, index) pair: row = combination number, column = chosen index.
    spmat = sp.coo_matrix((np.ones(n_bin*k),
                           (np.repeat(np.arange(n_bin), k),
                            combs_.ravel())),
                          shape=(n_bin, n))
    return spmat

print(combs(range(4), 3))
print('sparse (dense for print)')
print(sparse_combs(3, 4).todense())
[[0 1 2]
[0 1 3]
[0 2 3]
[1 2 3]]
sparse (dense for print)
[[ 1. 1. 1. 0.]
[ 1. 1. 0. 1.]
[ 1. 0. 1. 1.]
[ 0. 1. 1. 1.]]
I took the helper function combs (probably) from this question some time ago.
Small (unscientific) timing:
from time import perf_counter as pc
start = pc()
spmat = sparse_combs(5, 50)
time_used = pc() - start
print('secs: ', time_used)
print('nnzs: ', spmat.nnz)
#secs: 0.5770790778094155
#nnzs: 10593800
# Same timing for sparse_combs(3, 500):
#secs: 3.4843752405405497
#nnzs: 62125500
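If scipy.sparse is not wanted, the two requested arrays can also be built with NumPy alone. The following is only a sketch: the function names index_combs and dense_bitmasks are made up for illustration, and the bitmask here is a dense boolean array (one byte per entry), so it trades the sparsity of the answer above for simplicity. It preallocates the index array by passing the known element count to np.fromiter.

```python
import itertools
import math
import numpy as np

def index_combs(n, k):
    """All k-out-of-n index combinations, shape (binomial(n, k), k)."""
    n_comb = math.comb(n, k)
    # With a known count, fromiter allocates the flat buffer once and fills it.
    flat = np.fromiter(
        itertools.chain.from_iterable(itertools.combinations(range(n), k)),
        dtype=np.int64,
        count=n_comb * k,
    )
    return flat.reshape(n_comb, k)

def dense_bitmasks(idx, n):
    """Expand an index array of shape (m, k) into a boolean mask of shape (m, n)."""
    mask = np.zeros((idx.shape[0], n), dtype=bool)
    # Row r gets True at every column listed in idx[r].
    mask[np.arange(idx.shape[0])[:, None], idx] = True
    return mask

idx = index_combs(4, 3)
mask = dense_bitmasks(idx, 4)
print(idx)
print(mask.astype(int))
```

For large n the dense mask grows as binomial(n, k) * n bytes, so the sparse construction above remains the better fit when k is small.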


Built-in index dependent weight for tensordot in numpy?

I would like to obtain a tensordot of two arrays with the same shape with index-dependent weight applied, without use of explicit loop. For example,
import numpy as np
# A and B are 1-D arrays of length 3
C = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        C[i,j]=A[i]*B[j]*(np.exp(i-j)if i>j else 0)
Can an array similar to C be obtained with a built-in tool (e.g., with some options for tensordot)?
Here's a vectorized solution:
N = 3
C = np.tril(A[:, None] * B * np.exp(np.arange(N)[:, None] - np.arange(N)), k=-1)
>>> C
array([[  0.        ,   0.        ,   0.        ],
       [-10.87312731,   0.        ,   0.        ],
       [-44.33433659,  48.92907291,   0.        ]])
An alternative uses np.einsum. Compared with broadcasting, it is slightly faster for some larger inputs and slower for others.
import numpy as np
idx = np.arange(len(A))  # the exponent must use the indices i, j, not the values of A
np.einsum('ij,i,j->ij', np.tril(np.exp(np.subtract.outer(idx, idx)), -1), A, B)
array([[ 0. , 0. , 0. ],
[-10.87312731, 0. , 0. ],
[-44.33433659, 48.92907291, 0. ]])
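A quick way to gain confidence in either vectorized form is to compare it against the explicit loop from the question. The sketch below does that; the values of A and B are hypothetical inputs chosen only for illustration, and the weight matrix is built from the index arrays so that it does not depend on the contents of A.

```python
import numpy as np

# Hypothetical inputs, chosen only for illustration.
A = np.array([1.0, 2.0, 3.0])
B = np.array([-2.0, 6.0, 9.0])
N = len(A)

# Reference result: the explicit double loop from the question.
C_loop = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        C_loop[i, j] = A[i] * B[j] * (np.exp(i - j) if i > j else 0)

# Index-dependent weight exp(i - j), kept only strictly below the diagonal (i > j).
idx = np.arange(N)
weight = np.tril(np.exp(idx[:, None] - idx), k=-1)

C_vec = weight * np.outer(A, B)                 # broadcasting form
C_ein = np.einsum('ij,i,j->ij', weight, A, B)   # einsum form

print(np.allclose(C_loop, C_vec), np.allclose(C_loop, C_ein))
```

Precomputing weight once is worthwhile when the same N is reused for many pairs of A and B.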

Cupy slower than numpy when doing a "for loop" for columns of an array as vectors

I'm trying to parallelize the following operation with cupy:
I have an array. For each column of that array, I'm generating 2 random vectors. I take that array column, add one of the vectors, subtract the other, and make that new vector the next column of the array. I continue on until I finish with the array.
I already asked the following question - Cupy slower than numpy when iterating through array. But this is different, in that I believe I followed the advice of parallelizing the operation and having one "for loop" instead of two, and iterating only through the array columns instead of both rows and columns.
import cupy as cp
import time
#import numpy as cp

def row_size(array):
    return array.shape[1]  # assumed body

def number_of_rows(array):
    return array.shape[0]  # assumed body

x = (cp.zeros((200,200), 'f'))
#x = cp.zeros((200,200))
x[:,1] = 500000
vector_one = x * 0
vector_two = x * 0
start = time.time()
for i in range(number_of_rows(x) - 1):
    if sum(x[ :, i])!=0:
        vector_one[ :, i + 1], vector_two[ :, i+ 1] = cp.random.poisson(.01*x[:,i],len(x[:,i])), cp.random.poisson(.01 * x[:,i],len(x[:,i]))
        x[ :, i+ 1] = x[ :, i] + vector_one[ :, i+ 1] - vector_two[ :, i+ 1]
t = time.time() - start  # do not rebind the name of the time module
When I run this in cupy, the time comes out to about .62 seconds.
When I switch to numpy (uncommenting #import numpy as cp and #x = cp.zeros((200,200)), and commenting out import cupy as cp and x = (cp.zeros((200,200), 'f'))), the time comes out to about .11 seconds.
I thought maybe if I increase the array size, for example from (200,200) to (2000,2000), then I'd see a difference in cupy being faster, but it's still slower.
I know this is working properly, in a sense, because if I change the coefficient in cp.random.poisson from .01 to .5, I can only do that in cupy because that lambda is too large for numpy.
But still, how do I make it actually faster with cupy?
In general, looping on the host (CPU) and iteratively processing small device (GPU) arrays isn't ideal due to the larger number of separate kernels you will have to launch than in a columnar-oriented approach. However, sometimes a columnar-oriented approach just isn't feasible.
You can speed up your CuPy code by using CuPy's sum instead of using Python's built-in sum operation, which is forcing a device to host transfer each time you call it. With that said, you can also speed up your NumPy code by switching to NumPy's sum.
import cupy as cp
import time
#import numpy as cp

def row_size(array):
    return array.shape[1]  # assumed body

def number_of_rows(array):
    return array.shape[0]  # assumed body

x = (cp.zeros((200,200), 'f'))
#x = cp.zeros((200,200))
x[:,1] = 500000
vector_one = x * 0
vector_two = x * 0
start = time.time()
for i in range(number_of_rows(x) - 1):
    # if sum(x[ :, i]) !=0:
    if x[ :, i].sum() !=0: # or you could do: if x[ :, i].sum().get() !=0:
        vector_one[ :, i + 1], vector_two[ :, i+ 1] = cp.random.poisson(.01*x[:,i],len(x[:,i])), cp.random.poisson(.01 * x[:,i],len(x[:,i]))
        x[ :, i+ 1] = x[ :, i] + vector_one[ :, i+ 1] - vector_two[ :, i+ 1]
cp.cuda.Device().synchronize() # CuPy is asynchronous, but this doesn't really affect the timing here.
t = time.time() - start
[[ 0. 500000. 500101. ... 498121. 497922. 497740.]
[ 0. 500000. 499894. ... 502050. 502174. 502112.]
[ 0. 500000. 499989. ... 501703. 501836. 502081.]
[ 0. 500000. 499804. ... 499600. 499526. 499371.]
[ 0. 500000. 499923. ... 500371. 500184. 500247.]
[ 0. 500000. 500007. ... 501172. 501113. 501254.]]
This small change should make your workflow much faster (0.06 vs 0.6 seconds originally on my T4 GPU). Note that the .get() method in the comment is used to explicitly transfer the result of the sum operation from the GPU to the CPU before the not equal comparison. This isn't necessary, as CuPy knows how to handle logical operations, but would give you a very tiny additional speedup.
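The cost of Python's built-in sum can be seen without a GPU. The sketch below uses NumPy only (an assumption made so that it runs anywhere): the built-in iterates over the array one element at a time in the interpreter, while the array's own sum method performs a single reduction. With CuPy, the per-element iteration additionally involves the device, which is why the effect is larger there. Timings are printed but will vary by machine.

```python
import time
import numpy as np

x = np.full(200_000, 2.0, dtype=np.float32)

start = time.perf_counter()
total_builtin = sum(x)    # Python-level loop: one element fetched per iteration
t_builtin = time.perf_counter() - start

start = time.perf_counter()
total_method = x.sum()    # one vectorised reduction inside the library
t_method = time.perf_counter() - start

print('built-in sum:', float(total_builtin), 'secs:', t_builtin)
print('array .sum():', float(total_method), 'secs:', t_method)
```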

Using numpy einsum to perform high dimensional subtraction broadcasting

I'm having trouble using a broadcasting subtraction. My problem is the following. I have an array x of shape [L,N], where L is an integer and N is the number of variables of my problem.
I need to compute an [L,N,N] array whose element [l,i,j] is x[l,i]-x[l,j].
If L=1 this is equivalent to broadcasting a subtraction: x.T-x
For example here with L=1 and N=3:
import numpy as np
x = np.array([[0,2,4]])
However, if one increases the dimension L things become more complicated and enter the realm of the np.einsum function.
So I tried to recreate my example, in the case L=2, where I've replicated the two rows. What I'd expect is to get a 2x3x3 array with two 3x3 matrices with equal elements.
x = np.array([[0,2,4],[0,2,4]])
n = 3
k = 2
X = np.zeros([k,n,n])
for l in range(k):
    for i in range(n):
        for j in range(n):
            X[l,i,j] = x[l,i]-x[l,j]
which returns
[[[ 0. -2. -4.]
  [ 2.  0. -2.]
  [ 4.  2.  0.]]

 [[ 0. -2. -4.]
  [ 2.  0. -2.]
  [ 4.  2.  0.]]]
But how can I do this with numpy einsum? I can only obtain the product:
Are there specific examples of numpy batched subtractions or additions with increased dimension?
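One possible approach, offered as a sketch rather than a definitive answer: np.einsum evaluates sums of products, so a difference is not directly expressible in a single call, but plain broadcasting handles the batched case. Inserting a new axis in two different positions makes x[l,i] and x[l,j] line up along separate dimensions. The second variant below shows that einsum can still be used if each term is broadcast separately against a vector of ones; the variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([[0, 2, 4], [0, 2, 4]])
k, n = x.shape

# Broadcasting: (k, n, 1) - (k, 1, n) -> (k, n, n), element [l, i, j] = x[l, i] - x[l, j].
X_broadcast = x[:, :, None] - x[:, None, :]

# Same result via einsum, expanding each term with a vector of ones.
ones = np.ones(n, dtype=x.dtype)
X_einsum = np.einsum('li,j->lij', x, ones) - np.einsum('lj,i->lij', x, ones)

# Reference: the explicit triple loop from the question.
X_loop = np.zeros((k, n, n))
for l in range(k):
    for i in range(n):
        for j in range(n):
            X_loop[l, i, j] = x[l, i] - x[l, j]

print(X_broadcast)
```

The same pattern works for batched additions by replacing the minus sign with a plus.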

How does tf.multinomial work?

How does tf.multinomial work? Here it is stated that it "Draws samples from a multinomial distribution". What does that mean?
If you perform an experiment n times that can have only two outcomes (either success or failure, head or tail, etc.), then the number of times you obtain one of the two outcomes (success) is a binomial random variable.
In the special case of a single experiment (n = 1) that can have only two outcomes (either success or failure, head or tail, etc.), a random variable that takes value 1 in case of success and value 0 in case of failure is a Bernoulli random variable.
If you perform an experiment n times that can have K outcomes (where K can be any natural number) and you denote by X_i the number of times that you obtain the i-th outcome, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a multinomial random vector.
Similarly, if you perform a single experiment that can have K outcomes and you denote by X_i a random variable that takes value 1 if you obtain the i-th outcome and 0 otherwise, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a Multinoulli random vector. In other words, when the i-th outcome is obtained, the i-th entry of the Multinoulli random vector X takes value 1, while all other entries take value 0.
So, a multinomial random vector can be seen as a sum of n mutually independent Multinoulli random vectors.
And the probabilities of the K possible outcomes will be denoted by
p_1, p_2, p_3, ..., p_K
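The relationship between the two can be illustrated with NumPy (used here instead of TensorFlow only to keep the sketch small and self-contained). Each call with a total count of 1 yields a one-hot Multinoulli vector, and summing n such vectors gives one multinomial count vector. The probabilities and n below are example values.

```python
import numpy as np

rng = np.random.default_rng(0)
p = [0.1, 0.2, 0.7]   # example probabilities p_1, ..., p_K with K = 3
n = 4                 # number of experiments

# n independent Multinoulli draws: each row is one-hot (a single 1, rest 0).
one_hot = rng.multinomial(1, p, size=n)

# Summing the one-hot rows gives one multinomial vector [X_1, ..., X_K].
counts = one_hot.sum(axis=0)

print(one_hot)
print(counts, 'sums to', counts.sum())
```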
An example in Tensorflow,
In [171]: isess = tf.InteractiveSession()
In [172]: prob = [[.1, .2, .7], [.3, .3, .4]] # Shape [2, 3]
...: dist = tf.distributions.Multinomial(total_count=[4., 5], probs=prob)
...: counts = [[2., 1, 1], [3, 1, 1]]
...: dist.prob(counts).eval()  # Shape [2]
Out[172]: array([ 0.0168 , 0.06479999], dtype=float32)
Note: The Multinomial is identical to the Binomial distribution when K = 2. For more detailed information, please refer to either tf.compat.v1.distributions.Multinomial or the latest docs of tensorflow_probability.distributions.Multinomial.
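The two probabilities in the TensorFlow output can be checked by hand with the multinomial probability mass function, n! / (x_1! ... x_K!) * p_1^x_1 * ... * p_K^x_K. The sketch below is plain Python and does not call TensorFlow; the helper name multinomial_pmf is made up for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` in sum(counts) trials."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)   # multinomial coefficient n! / (x_1! ... x_K!)
    return coeff * prod(p ** c for p, c in zip(probs, counts))

# Same inputs as the TensorFlow example: approximately 0.0168 and 0.0648.
print(multinomial_pmf([2, 1, 1], [0.1, 0.2, 0.7]))
print(multinomial_pmf([3, 1, 1], [0.3, 0.3, 0.4]))
```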

Bernoulli random number generator

I cannot understand how the Bernoulli random number generator used in numpy works and would like an explanation. For example:
np.random.binomial(size=3, n=1, p= 0.5)
[1 0 0]
n = number of trials
p = probability of occurrence
size = number of experiments
How are the generated results of "0" or "1" determined?
I created a Restricted Boltzmann Machine which always produces the same results across multiple code executions despite being "random". The random number generator is seeded using
import numpy as np
def sigmoid(u):
    return 1/(1+np.exp(-u))

def gibbs_vhv(W, hbias, vbias, x):
    # Visible -> hidden: activation probabilities, then Bernoulli samples.
    f_s = sigmoid(np.dot(x, W) + hbias)
    h_sample = np.random.binomial(size=f_s.shape, n=1, p=f_s)
    # Hidden -> visible, using the transposed weights.
    f_u = sigmoid(np.dot(h_sample, W.transpose())+vbias)
    v_sample = np.random.binomial(size=f_u.shape, n=1, p=f_u)
    return [f_s, h_sample, f_u, v_sample]

def reconstruction_error(f_u, x):
    cross_entropy = -np.mean(
        np.sum(
            x * np.log(sigmoid(f_u)) + (1 - x) * np.log(1 - sigmoid(f_u)),
            axis=1))
    return cross_entropy

X = np.array([[1, 0, 0, 0]])
#Weight to hidden
W = np.array([[-3.85, 10.14, 1.16],
[6.69, 2.84, -7.73],
[1.37, 10.76, -3.98],
[-6.18, -5.89, 8.29]])
hbias = np.array([1.04, -4.48, 2.50]) #<= 3 bias for 3 neuron in hidden
vbias = np.array([-6.33, -1.68, -1.25, 3.45]) #<= 4 bias for 4 neuron in input
k = 2
v_sample = X
for i in range(k):
    [f_s, h_sample, f_u, v_sample] = gibbs_vhv(W, hbias, vbias, v_sample)
    start = v_sample
    if i < 2:
        print('f_s:', f_s)
        print('h_sample:', h_sample)
        print('f_u:', f_u)
        print('v_sample:', v_sample)
    print('iter:', i, ' h:', h_sample, ' x:', v_sample, ' entropy:%.3f'%reconstruction_error(f_u, v_sample))
[[1 0 0 0]]
f_s: [[ 0.05678618 0.99652957 0.97491304]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 0 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
f_s: [[ 4.90301318e-04 9.99973278e-01 9.99654440e-01]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 1 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
I am asking on how the algorithm works to produce the numbers. – WhiteSolstice 35 mins ago
Non-technical explanation
If you pass n=1 to the Binomial distribution it is equivalent to the Bernoulli distribution. In this case the function can be thought of as simulating coin flips. size=3 tells it to flip the coin three times and p=0.5 makes it a fair coin with equal probability of head (1) or tail (0).
The result of [1 0 0] means the coin came up heads once and tails twice. This is random, so running it again would produce a different sequence like [1 1 0], [0 1 0], or maybe even [1 1 1]. Although three flips can never give the same number of 1s and 0s, on average over many flips you would get the same number of each.
Technical explanation
Numpy implements random number generation in C. The source code for the Binomial distribution can be found here. Actually two different algorithms are implemented.
If n * p <= 30 it uses inverse transform sampling.
If n * p > 30 the BTPE algorithm of (Kachitvichyanukul and Schmeiser 1988) is used. (The publication is not freely available.)
I think both methods, but certainly the inverse transform sampling, depend on a random number generator that produces uniformly distributed random numbers. The legacy np.random functions (such as np.random.binomial) internally use a Mersenne Twister pseudo random number generator. The uniform random numbers are then transformed into the desired distribution.
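For the Bernoulli case (n=1), inverse transform sampling reduces to comparing a uniform random number with p. The sketch below illustrates the idea in Python; it is not NumPy's actual C implementation, and the function name bernoulli_inverse_transform is made up for illustration. It also shows why a seeded generator returns the same "random" sequence on every run.

```python
import numpy as np

def bernoulli_inverse_transform(p, size, rng):
    """Draw 0/1 samples: 1 whenever a uniform number in [0, 1) falls below p."""
    u = rng.random(size)          # uniformly distributed in [0, 1)
    return (u < p).astype(int)    # happens with probability p

rng = np.random.default_rng(42)
samples = bernoulli_inverse_transform(0.5, 10_000, rng)
print(samples[:10], 'mean:', samples.mean())

# Same seed, same uniform numbers, hence the same 0/1 sequence.
a = bernoulli_inverse_transform(0.5, 20, np.random.default_rng(7))
b = bernoulli_inverse_transform(0.5, 20, np.random.default_rng(7))
print(np.array_equal(a, b))
```

When p is very close to 0 or 1, as with the sigmoid outputs in the question, almost every uniform number falls on the same side of p, so the samples barely change between runs even without a fixed seed.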
A Binomially distributed random variable has two parameters n and p, and can be thought of as the distribution of the number of heads obtained when flipping a biased coin n times, where the probability of getting a head at each flip is p. (More formally it is a sum of independent Bernoulli random variables with parameter p).
For instance, if n=10 and p=0.5, one could simulate a draw from Bin(10, 0.5) by flipping a fair coin 10 times and summing the number of times that the coin lands heads.
In addition to the n and p parameters described above, np.random.binomial has an additional size parameter. If size=1, np.random.binomial computes a single draw from the Binomial distribution. If size=k for some integer k, k independent draws from the same Binomial distribution will be computed. size can also be a tuple of integers giving a shape, in which case a whole np.array with the given shape will be filled with independent draws from the Binomial distribution.
Note that the Binomial distribution is a generalisation of the Bernoulli distribution - in the case that n=1, Bin(n,p) has the same distribution as Ber(p).
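The statement that a Binomial variable is a sum of independent Bernoulli variables can be checked empirically. The sketch below uses NumPy's Generator interface (rather than the legacy np.random.binomial from the question) with example values n=10 and p=0.5, and compares summed coin flips with direct Binomial draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, draws = 10, 0.5, 20_000

# Bernoulli draws are Binomial draws with n=1; a tuple for size gives a 2-D array.
flips = rng.binomial(n=1, p=p, size=(draws, n))

# Summing n coin flips per row simulates one draw from Bin(n, p).
sums = flips.sum(axis=1)

# Direct draws from Bin(n, p) for comparison.
direct = rng.binomial(n=n, p=p, size=draws)

print('mean of summed flips:', sums.mean())
print('mean of direct draws:', direct.mean())
print('expected n * p      :', n * p)
```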
For more information about the binomial distribution see: