Bernoulli random number generator - numpy

I cannot understand how the Bernoulli random number generator used in numpy works, and I would like some explanation of it. For example:
np.random.binomial(size=3, n=1, p= 0.5)
Results:
[1 0 0]
n = number of trials
p = probability of occurrence
size = number of experiments
How are the generated results of "0" or "1" determined?
Update
I created a Restricted Boltzmann Machine which always produces the same results across multiple code executions despite being "random". The randomness is seeded using
np.random.seed(10)
import numpy as np

np.random.seed(10)

def sigmoid(u):
    return 1 / (1 + np.exp(-u))

def gibbs_vhv(W, hbias, vbias, x):
    f_s = sigmoid(np.dot(x, W) + hbias)
    h_sample = np.random.binomial(size=f_s.shape, n=1, p=f_s)

    f_u = sigmoid(np.dot(h_sample, W.transpose()) + vbias)
    v_sample = np.random.binomial(size=f_u.shape, n=1, p=f_u)
    return [f_s, h_sample, f_u, v_sample]

def reconstruction_error(f_u, x):
    cross_entropy = -np.mean(
        np.sum(
            x * np.log(sigmoid(f_u)) + (1 - x) * np.log(1 - sigmoid(f_u)),
            axis=1))
    return cross_entropy
X = np.array([[1, 0, 0, 0]])

# Weights from visible to hidden layer
W = np.array([[-3.85, 10.14, 1.16],
              [6.69, 2.84, -7.73],
              [1.37, 10.76, -3.98],
              [-6.18, -5.89, 8.29]])

hbias = np.array([1.04, -4.48, 2.50])          # <= 3 biases for the 3 hidden neurons
vbias = np.array([-6.33, -1.68, -1.25, 3.45])  # <= 4 biases for the 4 visible (input) neurons

k = 2
v_sample = X
for i in range(k):
    [f_s, h_sample, f_u, v_sample] = gibbs_vhv(W, hbias, vbias, v_sample)
    start = v_sample

    if i < 2:
        print('f_s:', f_s)
        print('h_sample:', h_sample)
        print('f_u:', f_u)
        print('v_sample:', v_sample)
        print(v_sample)
        print('iter:', i, ' h:', h_sample, ' x:', v_sample,
              ' entropy:%.3f' % reconstruction_error(f_u, v_sample))
Results:
[[1 0 0 0]]
f_s: [[ 0.05678618 0.99652957 0.97491304]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 0 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
f_s: [[ 4.90301318e-04 9.99973278e-01 9.99654440e-01]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 1 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637

I am asking how the algorithm works to produce the numbers. – WhiteSolstice
Non-technical explanation
If you pass n=1 to the Binomial distribution it is equivalent to the Bernoulli distribution. In this case the function can be thought of as simulating coin flips. size=3 tells it to flip the coin three times, and p=0.5 makes it a fair coin with equal probability of heads (1) or tails (0).
The result [1 0 0] means the coin came up heads once and tails twice. This is random, so running it again could produce a different sequence such as [1 1 0], [0 1 0], or maybe even [1 1 1]. With three flips you cannot get an equal number of 1s and 0s, but over many flips you would get heads about half the time.
Technical explanation
NumPy implements random number generation in C. The source code for the Binomial distribution can be found here. Two different algorithms are implemented, depending on the parameters:
If n * p <= 30 it uses inverse transform sampling.
If n * p > 30 the BTPE algorithm of (Kachitvichyanukul and Schmeiser 1988) is used. (The publication is not freely available.)
I think both methods, but certainly the inverse transform sampling, depend on a random number generator to produce uniformly distributed random numbers. Numpy internally uses a Mersenne Twister pseudo random number generator. The uniform random numbers are then transformed into the desired distribution.
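For the n=1 case the idea can be sketched in a few lines of Python/NumPy. This is only an illustration of the inverse transform idea, not NumPy's actual C code, so the exact stream of 0s and 1s will generally differ from np.random.binomial even with the same seed:

import numpy as np

rng = np.random.RandomState(10)   # legacy generator backed by the Mersenne Twister

def bernoulli_inverse_transform(p, size):
    """Sketch: draw u ~ Uniform(0, 1) and return 1 where u < p, else 0."""
    u = rng.random_sample(size)   # uniform numbers from the Mersenne Twister
    return (u < p).astype(int)

print(bernoulli_inverse_transform(0.5, 3))   # three simulated fair-coin flips (0s and 1s)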

A Binomially distributed random variable has two parameters n and p, and can be thought of as the distribution of the number of heads obtained when flipping a biased coin n times, where the probability of getting a head at each flip is p. (More formally, it is a sum of n independent Bernoulli random variables with parameter p.)
For instance, if n=10 and p=0.5, one could simulate a draw from Bin(10, 0.5) by flipping a fair coin 10 times and summing the number of times that the coin lands heads.
In addition to the n and p parameters described above, np.random.binomial has an additional size parameter. If size=1, np.random.binomial computes a single draw from the Binomial distribution. If size=k for some integer k, k independent draws from the same Binomial distribution will be computed. size can also be a tuple of integers, in which case a whole np.array with that shape will be filled with independent draws from the Binomial distribution.
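For example (a small sketch; the printed values depend on the seed, and np.random.default_rng requires NumPy >= 1.17, while the older np.random.binomial accepts the same n, p and size arguments):

import numpy as np

rng = np.random.default_rng(0)

# A single draw: the number of heads in 10 flips of a fair coin.
print(rng.binomial(n=10, p=0.5))

# size=5 gives five independent draws from the same Bin(10, 0.5) distribution.
print(rng.binomial(n=10, p=0.5, size=5))

# size can also be a shape tuple; with n=1 each entry is a Bernoulli draw.
print(rng.binomial(n=1, p=0.5, size=(2, 3)))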
Note that the Binomial distribution is a generalisation of the Bernoulli distribution - in the case that n=1, Bin(n,p) has the same distribution as Ber(p).
For more information about the binomial distribution see: https://en.wikipedia.org/wiki/Binomial_distribution

Related

Cosine similarity and cosine distance formulas relation

Can someone explain these two formulas? Do they have any relationship?
def _cosine_distance(a, b, data_is_normalized=False):
    if not data_is_normalized:
        a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
        b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
    return 1. - np.dot(a, b.T)

def findCosineSimilarity(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))
Regarding your comment, the cosine distance of two matrices of shape 2 x 5 essentially consists of finding the pairwise cosine distance between the vectors in each array. Assuming you are working with row vectors (which you should when you use NumPy conventionally), the expected output should consist of 2 * 2 = 4 elements. If you are working with column vectors, then 5 * 5 = 25 elements makes sense.
_cosine_distance looks good
The function _cosine_distance is correctly named and its implementation works in general, for any a of shape n x l and b of shape m x l (i.e. matrices of row vectors with a common length l).
To use _cosine_distance for 1D arrays you can simply add a singleton dimension at axis 0, e.g. _cosine_distance(a[np.newaxis], b[np.newaxis]).
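A quick sketch illustrating both points, using the _cosine_distance function from the question (the random inputs are only there to check shapes):

import numpy as np

def _cosine_distance(a, b, data_is_normalized=False):
    if not data_is_normalized:
        a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
        b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
    return 1. - np.dot(a, b.T)

a = np.random.rand(2, 5)   # 2 row vectors of length 5
b = np.random.rand(2, 5)   # 2 row vectors of length 5
print(_cosine_distance(a, b).shape)   # (2, 2): one distance per pair of row vectors

v = np.random.rand(5)      # 1D vectors need a singleton dimension at axis 0
w = np.random.rand(5)
print(_cosine_distance(v[np.newaxis], w[np.newaxis]))   # 1x1 matrix holding the single distance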
findCosineSimilarity looks bad
findCosineSimilarity is incorrectly named (it calculates the cosine distance), and the implementation only works for one-dimensional arrays. Using it on anything other than 1D arrays computes something that does not match the definition of cosine distance. Also, the transposition of source_representation (the left matrix) hints that the function is meant for column vectors, which differs from _cosine_distance; not that findCosineSimilarity would work correctly for matrices anyway.
It is easy to create a column/row vector agnostic test case by using an n x n matrix:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
If we calculate the pairwise cosine distance for every vector in the matrix we should get all zeros as the vectors are the same.
import numpy as np

def findCosineSimilarity(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

def _cosine_distance(a, b, data_is_normalized=False):
    if not data_is_normalized:
        a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
        b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)
    return 1. - np.dot(a, b.T)

a = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1]
])

print(findCosineSimilarity(a, a))
print(_cosine_distance(a, a))
Output:
[[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]
[0.75 0.75 0.75 0.75]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
We see that findCosineSimilarity fails, and that _cosine_distance is correct.
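If you want a function in the spirit of findCosineSimilarity that also handles matrices of row vectors, one possible rewrite (a sketch with a hypothetical name, doing essentially the same normalisation as _cosine_distance) is:

import numpy as np

def cosine_distance_rowwise(a, b):
    """Pairwise cosine distance between row vectors; accepts 1D vectors or 2D arrays."""
    a = np.atleast_2d(a).astype(float)
    b = np.atleast_2d(b).astype(float)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

a = np.ones((4, 4))
print(cosine_distance_rowwise(a, a))            # all zeros, as expected
print(cosine_distance_rowwise([1, 0], [0, 1]))  # [[1.]] for orthogonal vectors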

Cupy slower than numpy when doing a "for loop" for columns of an array as vectors

I'm trying to parallelize the following operation with cupy:
I have an array. For each column of that array, I'm generating 2 random vectors. I take that array column, add one of the vectors, subtract the other, and make that new vector the next column of the array. I continue on until I finish with the array.
I already asked the following question - Cupy slower than numpy when iterating through array. But this is different, in that I believe I followed the advice of parallelizing the operation and having one "for loop" instead of two, and iterating only through the array columns instead of both rows and columns.
import cupy as cp
import time
#import numpy as cp

def row_size(array):
    return array.shape[1]

def number_of_rows(array):
    return array.shape[0]

x = cp.zeros((200, 200), 'f')
#x = cp.zeros((200, 200))
x[:, 1] = 500000

vector_one = x * 0
vector_two = x * 0

start = time.time()

for i in range(number_of_rows(x) - 1):
    if sum(x[:, i]) != 0:
        vector_one[:, i + 1], vector_two[:, i + 1] = cp.random.poisson(.01 * x[:, i], len(x[:, i])), cp.random.poisson(.01 * x[:, i], len(x[:, i]))
        x[:, i + 1] = x[:, i] + vector_one[:, i + 1] - vector_two[:, i + 1]

time = time.time() - start

print(x)
print(time)
When I run this in cupy, the time comes out to about .62 seconds.
When I switch to numpy, i.e. I 1) uncomment #import numpy as cp and #x = cp.zeros((200,200)), and 2) comment out import cupy as cp and x = cp.zeros((200,200), 'f'), the time comes out to about .11 seconds.
I thought maybe if I increase the array size, for example from (200,200) to (2000,2000), then I'd see a difference in cupy being faster, but it's still slower.
I know this is working properly, in a sense, because if I change the coefficient in cp.random.poisson from .01 to .5, I can only do that in cupy because that lambda is too large for numpy.
But still, how do I make it actually faster with cupy?
In general, looping on the host (CPU) and iteratively processing small device (GPU) arrays isn't ideal, because it launches far more separate kernels than a columnar-oriented approach would. However, sometimes a columnar-oriented approach just isn't feasible.
You can speed up your CuPy code by using CuPy's sum instead of using Python's built-in sum operation, which is forcing a device to host transfer each time you call it. With that said, you can also speed up your NumPy code by switching to NumPy's sum.
import cupy as cp
import time
#import numpy as cp

def row_size(array):
    return array.shape[1]

def number_of_rows(array):
    return array.shape[0]

x = cp.zeros((200, 200), 'f')
#x = cp.zeros((200, 200))
x[:, 1] = 500000

vector_one = x * 0
vector_two = x * 0

start = time.time()

for i in range(number_of_rows(x) - 1):
    # if sum(x[:, i]) != 0:
    if x[:, i].sum() != 0:  # or you could do: if x[:, i].sum().get() != 0:
        vector_one[:, i + 1], vector_two[:, i + 1] = cp.random.poisson(.01 * x[:, i], len(x[:, i])), cp.random.poisson(.01 * x[:, i], len(x[:, i]))
        x[:, i + 1] = x[:, i] + vector_one[:, i + 1] - vector_two[:, i + 1]

cp.cuda.Device().synchronize()  # CuPy is asynchronous, but this doesn't really affect the timing here.

t = time.time() - start
print(x)
print(t)
[[ 0. 500000. 500101. ... 498121. 497922. 497740.]
[ 0. 500000. 499894. ... 502050. 502174. 502112.]
[ 0. 500000. 499989. ... 501703. 501836. 502081.]
...
[ 0. 500000. 499804. ... 499600. 499526. 499371.]
[ 0. 500000. 499923. ... 500371. 500184. 500247.]
[ 0. 500000. 500007. ... 501172. 501113. 501254.]]
0.06389498710632324
This small change should make your workflow much faster (0.06 vs 0.6 seconds originally on my T4 GPU). Note that the .get() method in the comment is used to explicitly transfer the result of the sum operation from the GPU to the CPU before the not equal comparison. This isn't necessary, as CuPy knows how to handle logical operations, but would give you a very tiny additional speedup.
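As a minimal sketch of the difference (assuming a working CuPy installation; the point is only where the reduction runs):

import cupy as cp

x = cp.arange(1_000_000, dtype=cp.float32)

# Python's built-in sum(x) iterates element by element, pulling each value from
# the GPU back to the host, which is what makes the original loop slow.

# CuPy's own reduction runs entirely on the GPU and returns a 0-d device array.
total = x.sum()

# .get() (or cp.asnumpy) explicitly transfers the result to the host when needed.
print(total.get())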

How to create index combinations (k out of n) as sparse bitmasks for numpy

For numpy how can I efficiently create
an array/matrix representing a list of all combinations (k out of n) as lists of k indices. The shape would be (binomial(n, k), k).
a sparse array/matrix representing this combinations as bitmasks of length n. (So expanding aboves indices to bitmask.) The shape would be (binomial(n, k), n).
I need to do this with large n (and maybe small k). So the algorithm should be
time efficient (e.g. maybe allocate complete result space at once before filling it?)
space efficient (e.g. sparse bitmasks)
Many Thanks for your help.
Assuming the blowup is not that bad (as mentioned in the comment above), you might try this. It's pretty vectorized and should be fast (for cases which could be handled).
Edit: I somewhat assumed you are interested in output based on scipy.sparse. Maybe you are not.
Code
import itertools
import numpy as np
import scipy.sparse as sp

def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))), but
    faster.
    """
    a = np.asarray(a)
    dt = np.dtype([('', a.dtype)] * r)
    b = np.fromiter(itertools.combinations(a, r), dt)
    b_ = b.view(a.dtype).reshape(-1, r)
    return b_

def sparse_combs(k, n):
    combs_ = combs(np.arange(n), k)
    n_bin = combs_.shape[0]
    spmat = sp.coo_matrix((np.ones(n_bin * k),
                           (np.repeat(np.arange(n_bin), k),
                            combs_.ravel())),
                          shape=(n_bin, n))
    return spmat

print('dense')
print(combs(range(4), 3))
print('sparse (dense for print)')
print(sparse_combs(3, 4).todense())
Output
dense
[[0 1 2]
[0 1 3]
[0 2 3]
[1 2 3]]
sparse (dense for print)
[[ 1. 1. 1. 0.]
[ 1. 1. 0. 1.]
[ 1. 0. 1. 1.]
[ 0. 1. 1. 1.]]
The helper function combs I took (probably) from this question (sometime in the past).
Small (unscientific) timing:
from time import perf_counter as pc

start = pc()
spmat = sparse_combs(5, 50)
time_used = pc() - start
print('secs: ', time_used)
print('nnzs: ', spmat.nnz)

# sparse_combs(5, 50):
#   secs:  0.5770790778094155
#   nnzs:  10593800
# sparse_combs(3, 500):
#   secs:  3.4843752405405497
#   nnzs:  62125500
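If memory is the main concern, the nonzero entries can also be stored as booleans (one byte each) instead of float64, and the matrix converted to CSR for downstream row slicing. A small hedged variant along the lines of sparse_combs above:

import itertools
import numpy as np
import scipy.sparse as sp

def sparse_combs_bool(k, n):
    """Like sparse_combs above, but with boolean entries and CSR output."""
    idx = np.fromiter(itertools.combinations(range(n), k),
                      np.dtype([('', np.intp)] * k))
    idx = idx.view(np.intp).reshape(-1, k)          # shape (binomial(n, k), k)
    n_rows = idx.shape[0]
    spmat = sp.coo_matrix(
        (np.ones(n_rows * k, dtype=bool),           # one byte per stored entry
         (np.repeat(np.arange(n_rows), k), idx.ravel())),
        shape=(n_rows, n))
    return spmat.tocsr()

print(sparse_combs_bool(3, 4).toarray().astype(int))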

How does tf.multinomial work?

How does tf.multinomial work? Here it is stated that it "Draws samples from a multinomial distribution". What does that mean?
If you perform an experiment n times that can have only two outcomes (either success or failure, head or tail, etc.), then the number of times you obtain one of the two outcomes (success) is a binomial random variable.
In other words, if you perform an experiment that can have only two outcomes (either success or failure, head or tail, etc.), then a random variable that takes value 1 in case of success and value 0 in case of failure is a Bernoulli random variable.
If you perform an experiment n times that can have K outcomes (where K can be any natural number) and you denote by X_i the number of times that you obtain the i-th outcome, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a multinomial random vector.
In other words, if you perform an experiment that can have K outcomes and you denote by X_i a random variable that takes value 1 if you obtain the i-th outcome and 0 otherwise, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a Multinoulli random vector. In other words, when the i-th outcome is obtained, the i-th entry of the Multinoulli random vector X takes value 1, while all other entries take value 0.
So, a Multinomial random vector can be seen as a sum of mutually independent Multinoulli random vectors.
And the probabilities of the K possible outcomes will be denoted by
p_1, p_2, p_3, ..., p_K
An example in TensorFlow:
In [171]: isess = tf.InteractiveSession()
In [172]: prob = [[.1, .2, .7], [.3, .3, .4]] # Shape [2, 3]
...: dist = tf.distributions.Multinomial(total_count=[4., 5], probs=prob)
...:
...: counts = [[2., 1, 1], [3, 1, 1]]
...: isess.run(dist.prob(counts)) # Shape [2]
...:
Out[172]: array([ 0.0168 , 0.06479999], dtype=float32)
Note: the Multinomial is identical to the Binomial distribution when K = 2. For more detailed information please refer to either tf.compat.v1.distributions.Multinomial or the latest docs of tensorflow_probability.distributions.Multinomial.
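For comparison, here is a minimal NumPy sketch of the same distinction (the printed numbers depend on the seed):

import numpy as np

rng = np.random.default_rng(0)
p = [0.1, 0.2, 0.7]   # probabilities of the K = 3 outcomes

# One Multinoulli draw: exactly one entry is 1, all others are 0.
print(rng.multinomial(1, p))

# n = 10 trials: each entry counts how often that outcome occurred; the entries sum to 10.
print(rng.multinomial(10, p))

# size=4 gives four independent multinomial draws, one per row.
print(rng.multinomial(10, p, size=4))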

tensorflow reduce_mean with multidimension second argument

I came across a usage of reduce_mean with a vector as the second argument. I looked through the TensorFlow manual but can't find a corresponding example. The code is below:
tf.reduce_mean(train, [0,1,2])
where train has size batchsize x H x L x 2.
I also ran some experiments but can't figure out how this second vector argument is processed:
tensor = tf.constant([[[2,2,4],[2,2,0]],[[2,2,0],[2,2,0]]])
trainenergy = tf.reduce_mean(tensor, [0,1,2])
Output = 1
tensor = tf.constant([[[2,2,4],[2,2,0]],[[2,2,0],[2,2,0]]])
trainenergy = tf.reduce_mean(tensor, [0])
Output = [[2 2 2]
[2 2 0]]
tensor = tf.constant([[[2,2,4],[2,2,0]],[[2,2,0],[2,2,0]]])
trainenergy = tf.reduce_mean(tensor, [0,1])
Output = [2 2 1]
When the second argument is a vector (list) of axes, tf.reduce_mean reduces over every axis in that list at once. For example, [0,1,2] averages over axes 0, 1 and 2 simultaneously, so the three-dimensional example tensor collapses to a single scalar, while [0] averages only over the first axis and keeps the remaining (2, 3) shape, and [0,1] averages over the first two axes and leaves a length-3 vector.
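The same behaviour can be checked with NumPy, where the axis argument plays the role of the second argument of tf.reduce_mean. (Note that the mean over all elements is 20/12 ≈ 1.67; the TensorFlow example prints 1 because the constant has an integer dtype and tf.reduce_mean keeps the input dtype, while np.mean returns a float.)

import numpy as np

tensor = np.array([[[2, 2, 4], [2, 2, 0]],
                   [[2, 2, 0], [2, 2, 0]]])

print(np.mean(tensor, axis=(0, 1, 2)))  # 1.666..., averaged over all three axes
print(np.mean(tensor, axis=0))          # shape (2, 3): [[2. 2. 2.] [2. 2. 0.]]
print(np.mean(tensor, axis=(0, 1)))     # shape (3,):   [2. 2. 1.]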