How does tf.multinomial work? - tensorflow

How does tf.multinomial work? Here is stated that it "Draws samples from a multinomial distribution". What does that mean?

If you perform an experiment n times that can have only two outcomes (either success or failure, head or tail, etc.), then the number of times you obtain one of the two outcomes (success) is a binomial random variable.
In other words, If you perform an experiment that can have only two outcomes (either success or failure, head or tail, etc.), then a random variable that takes value 1 in case of success and value 0 in case of failure is a Bernoulli random variable.
If you perform an experiment n times that can have K outcomes (where K can be any natural number) and you denote by X_i the number of times that you obtain the i-th outcome, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a multinomial random vector.
In other words, if you perform an experiment that can have K outcomes and you denote by X_i a random variable that takes value 1 if you obtain the i-th outcome and 0 otherwise, then the random vector X defined as
X = [X_1, X_2, X_3, ..., X_K]
is a Multinoulli random vector. In other words, when the i-th outcome is obtained, the i-th entry of the Multinoulli random vector X takes value 1, while all other entries take value 0.
So, a multinomial distribution can be seen as a sum of mutually independent Multinoulli random variables.
And the probabilities of the K possible outcomes will be denoted by
p_1, p_2, p_3, ..., p_K
An example in Tensorflow,
In [171]: isess = tf.InteractiveSession()
In [172]: prob = [[.1, .2, .7], [.3, .3, .4]] # Shape [2, 3]
...: dist = tf.distributions.Multinomial(total_count=[4., 5], probs=prob)
...:
...: counts = [[2., 1, 1], [3, 1, 1]]
...: isess.run(dist.prob(counts)) # Shape [2]
...:
Out[172]: array([ 0.0168 , 0.06479999], dtype=float32)
Note: The Multinomial is identical to the
Binomial distribution when K = 2. For more detailed information please refer either tf.compat.v1.distributions.Multinomial or the latest docs of tensorflow_probability.distributions.Multinomial

Related

How can one utilize the indices provided by torch.topk()?

Suppose I have a pytorch tensor x of shape [N, N_g, 2]. It can be viewed as N * N_g 2d vectors. Specifically, x[i, j, :] is the 2d vector of the jth group in the ith batch.
Now I am trying to get the coordinates of vectors of top 5 length in each group. So I tried the following:
(i) First I used x_len = (x**2).sum(dim=2).sqrt() to compute their lengths, resulting in x_len.shape==[N, N_g].
(ii) Then I used tk = x_len.topk(5) to get the top 5 lengths in each group.
(iii) The desired output would be a tensor x_top5 of shape [N, 5, 2]. Naturally I thought of using tk.indices to index x so as to obtain x_top5. But I failed as it seems such indexing is not supported.
How can I do this?
A minimal example:
x = torch.randn(10,10,2) # N=10 is the batchsize, N_g=10 is the group size
x_len = (x**2).sum(dim=2).sqrt()
tk = x_len.topk(5)
x_top5 = x[tk.indices]
print(x_top5.shape)
# torch.Size([10, 5, 10, 2])
However, this gives x_top5 as a tensor of shape [10, 5, 10, 2], instead of [10, 5, 2] as desired.

Bernoulli random number generator

I cannot understand how Bernoulli Random Number generator used in numpy is calculated and would like some explanation on it. For example:
np.random.binomial(size=3, n=1, p= 0.5)
Results:
[1 0 0]
n = number of trails
p = probability of occurrence
size = number of experiments
With how do I determine the generated numbers/results of "0" or "1"?
=================================Update==================================
I created a Restricted Boltzmann Machine which always presents the same results despite being "random" on multiple code executions. The randomize is seeded using
np.random.seed(10)
import numpy as np
np.random.seed(10)
def sigmoid(u):
return 1/(1+np.exp(-u))
def gibbs_vhv(W, hbias, vbias, x):
f_s = sigmoid(np.dot(x, W) + hbias)
h_sample = np.random.binomial(size=f_s.shape, n=1, p=f_s)
f_u = sigmoid(np.dot(h_sample, W.transpose())+vbias)
v_sample = np.random.binomial(size=f_u.shape, n=1, p=f_u)
return [f_s, h_sample, f_u, v_sample]
def reconstruction_error(f_u, x):
cross_entropy = -np.mean(
np.sum(
x * np.log(sigmoid(f_u)) + (1 - x) * np.log(1 - sigmoid(f_u)),
axis=1))
return cross_entropy
X = np.array([[1, 0, 0, 0]])
#Weight to hidden
W = np.array([[-3.85, 10.14, 1.16],
[6.69, 2.84, -7.73],
[1.37, 10.76, -3.98],
[-6.18, -5.89, 8.29]])
hbias = np.array([1.04, -4.48, 2.50]) #<= 3 bias for 3 neuron in hidden
vbias = np.array([-6.33, -1.68, -1.25, 3.45]) #<= 4 bias for 4 neuron in input
k = 2
v_sample = X
for i in range(k):
[f_s, h_sample, f_u, v_sample] = gibbs_vhv(W, hbias, vbias, v_sample)
start = v_sample
if i < 2:
print('f_s:', f_s)
print('h_sample:', h_sample)
print('f_u:', f_u)
print('v_sample:', v_sample)
print(v_sample)
print('iter:', i, ' h:', h_sample, ' x:', v_sample, ' entropy:%.3f'%reconstruction_error(f_u, v_sample))
Results:
[[1 0 0 0]]
f_s: [[ 0.05678618 0.99652957 0.97491304]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 0 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
f_s: [[ 4.90301318e-04 9.99973278e-01 9.99654440e-01]]
h_sample: [[0 1 1]]
f_u: [[ 0.99310473 0.00139984 0.99604968 0.99712837]]
v_sample: [[1 0 1 1]]
[[1 0 1 1]]
iter: 1 h: [[0 1 1]] x: [[1 0 1 1]] entropy:1.637
I am asking on how the algorithm works to produce the numbers. – WhiteSolstice 35 mins ago
Non-technical explanation
If you pass n=1 to the Binomial distribution it is equivalent to the Bernoulli distribution. In this case the function could be thought of simulating coin flips. size=3 tells it to flip the coin three times and p=0.5 makes it a fair coin with equal probabilitiy of head (1) or tail (0).
The result of [1 0 0] means the coin came down once with head and twice with tail facing up. This is random, so running it again would result in a different sequence like [1 1 0], [0 1 0], or maybe even [1 1 1]. Although you cannot get the same number of 1s and 0s in three runs, on average you would get the same number.
Technical explanation
Numpy implements random number generation in C. The source code for the Binomial distribution can be found here. Actually two different algorithms are implemented.
If n * p <= 30 it uses inverse transform sampling.
If n * p > 30 the BTPE algorithm of (Kachitvichyanukul and Schmeiser 1988) is used. (The publication is not freely available.)
I think both methods, but certainly the inverse transform sampling, depend on a random number generator to produce uniformly distributed random numbers. Numpy internally uses a Mersenne Twister pseudo random number generator. The uniform random numbers are then transformed into the desired distribution.
A Binomially distributed random variable has two parameters n and p, and can be thought of as the distribution of the number of heads obtained when flipping a biased coin n times, where the probability of getting a head at each flip is p. (More formally it is a sum of independent Bernoulli random variables with parameter p).
For instance, if n=10 and p=0.5, one could simulate a draw from Bin(10, 0.5) by flipping a fair coin 10 times and summing the number of times that the coin lands heads.
In addition to the n and p parameters described above, np.random.binomial has an additional size parameter. If size=1, np.random.binomial computes a single draw from the Binomial distribution. If size=k for some integer k, k independent draws from the same Binomial distribution will be computed. size can also be an array of indices, in which case a whole np.array with the given size will be filled with independent draws from the Binomial distribution.
Note that the Binomial distribution is a generalisation of the Bernoulli distribution - in the case that n=1, Bin(n,p) has the same distribution as Ber(p).
For more information about the binomial distribution see: https://en.wikipedia.org/wiki/Binomial_distribution

How does tf.nn.moments calculate variance?

Look at the test example:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]])
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
m, v = sess.run([mean, variance])
print(m, v)
The output is:
[3 4]
[2 2]
We want to calculate variance along the axis 0, the first column is [1,3,5], and mean = (1+3+5)/3=3, it is right, the variance = [(1-3)^2+(3-3)^2+(5-3)^2]/3=2.6666, but the output is 2, who can tell me how tf.nn.moments calculates variance?
By the way, view the API DOC, what does shift do?
The problem is that x is an integer tensor and TensorFlow, instead of forcing a conversion, performs the computation as good as it can without changing the type (so the outputs are also integers). You can pass float numbers in the construction of x or specify the dtype parameter of tf.constant:
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Then you get the expected result:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
m, v = sess.run([mean, variance])
print(m, v)
>>> [ 3. 4.] [ 2.66666675 2.66666675]
About the shift parameter, it seems to allow you specify a value to, well, "shift" the input. By shift they mean subtract, so if your input is [1., 2., 4.] and you give a shift of, say, 2.5, TensorFlow would first subtract that amount and compute the moments from [-1.5, 0.5, 1.5]. In general, it seems safe to just leave it as None, which will perform a shift by the mean of the input, but I suppose there may be cases where giving a predetermined shift value (e.g. if you know or have an approximate idea of the mean of the input) may yield better numerical stability.
# Replace the following line with correct data dtype
x = tf.constant([[1,2],[3,4],[5,6]])
# suppose you don't want tensorflow to trim the decimal then use float data type.
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Results: array([ 2.66666675, 2.66666675], dtype=float32)
Note: from the original implementation shift is not used

How does tensorflow batch_matmul work?

Tensorflow has a function called batch_matmul which multiplies higher dimensional tensors. But I'm having a hard time understanding how it works, perhaps partially because I'm having a hard time visualizing it.
What I want to do is multiply a matrix by each slice of a 3D tensor, but I don't quite understand what the shape of tensor a is. Is z the innermost dimension? Which of the following is correct?
I would most prefer the first to be correct -- it's most intuitive to me and easy to see in the .eval() output. But I suspect the second is correct.
Tensorflow says that batch_matmul performs:
out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
What does that mean? What does that mean in the context of my example? What is being multiplied with with what? And why aren't I getting a 3D tensor the way I expected?
You can imagine it as doing a matmul over each training example in the batch.
For example, if you have two tensors with the following dimensions:
a.shape = [100, 2, 5]
b.shape = [100, 5, 2]
and you do a batch tf.matmul(a, b), your output will have the shape [100, 2, 2].
100 is your batch size, the other two dimensions are the dimensions of your data.
First of all tf.batch_matmul() was removed and no longer available. Now you suppose to use tf.matmul():
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
So let's assume you have the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of the shape (batch_size, n, k). Here is what is going on here. Assume you have batch_size of matrices nxm and batch_size of matrices mxk. Now for each pair of them you calculate nxm X mxk which gives you an nxk matrix. You will have batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)
You can now do it using tf.einsum, starting from Tensorflow 0.11.0rc0.
For example,
M1 = tf.Variable(tf.random_normal([2,3,4]))
M2 = tf.Variable(tf.random_normal([5,4]))
N = tf.einsum('ijk,lk->ijl',M1,M2)
It multiplies the matrix M2 with every frame (3 frames) in every batch (2 batches) in M1.
The output is:
[array([[[ 0.80474716, -1.38590837, -0.3379252 , -1.24965811],
[ 2.57852983, 0.05492432, 0.23039417, -0.74263287],
[-2.42627382, 1.70774114, 1.19503212, 0.43006262]],
[[-1.04652011, -0.32753903, -1.26430523, 0.8810069 ],
[-0.48935518, 0.12831448, -1.30816901, -0.01271309],
[ 2.33260512, -1.22395933, -0.92082584, 0.48991606]]], dtype=float32),
array([[ 1.71076882, 0.79229093, -0.58058828, -0.23246667],
[ 0.20446332, 1.30742455, -0.07969904, 0.9247328 ],
[-0.32047141, 0.66072595, -1.12330854, 0.80426538],
[-0.02781649, -0.29672042, 2.17819595, -0.73862702],
[-0.99663496, 1.3840003 , -1.39621222, 0.77119476]], dtype=float32),
array([[[ 0.76539308, 2.77609682, -1.79906654, 0.57580602, -3.21205115],
[ 4.49365759, -0.10607499, -1.64613271, 0.96234947, -3.38823152],
[-3.59156275, 2.03910899, 0.90939498, 1.84612727, 3.44476724]],
[[-1.52062428, 0.27325237, 2.24773455, -3.27834225, 3.03435063],
[ 0.02695178, 0.16020992, 1.70085776, -2.8645196 , 2.48197317],
[ 3.44154787, -0.59687197, -0.12784094, -2.06931567, -2.35522676]]], dtype=float32)]
I have verified, the arithmetic is correct.
tf.tensordot should solve this problem. It supports batch operations, e.g., if you want to contract a 2D tensor with a 3D tensor, with the latter having a batch dimension.
If a is shape [n,m] b is shape [?,m,l], then
y = tf.tensordot(b, a, axes=[1, 1]) will produce a tensor of shape [?,n,l]
https://www.tensorflow.org/api_docs/python/tf/tensordot
It is simply like splitting on the first dimension respectively, multiply and concat them back. If you want to do 3D by 2D, you can reshape, multiply, and reshape it back. I.e. [100, 2, 5] -> [200, 5] -> [200, 2] -> [100, 2, 2]
The answer to this particular answer is using tf.scan function.
If a = [5,3,2] #dimension of 5 batch, with 3X2 mat in each batch
and b = [2,3] # a constant matrix to be multiplied with each sample
then let def fn(a,x):
return tf.matmul(x,b)
initializer = tf.Variable(tf.random_number(3,3))
h = tf.scan(fn,outputs,initializer)
this h will store all the outputs.

numpy.fft.fft not computing dft at frequencies given by numpy.fft.fftfreq?

This is a mathematical question, but it is tied to the numpy implementation, so I decided to ask it at SO. Perhaps I'm hugely misunderstanding something, but if so I would like to be put straight.
numpy.ftt.ftt computes DFT according to equation:
numpy.ftt.fftfreq is supposed to return frequencies at which DFT was computed.
Say we have:
x = [0, 0, 1, 0, 0]
X = np.fft.fft(x)
freq = np.fft.fftfreq(5)
Then for signal x, its DFT transformation is X, and frequencies at which X is computed are given by freq. For example X[0] is DFT of x at frequency freq[0], X[1] is DFT of x at frequency freq[1], and so on.
But when I compute DFT of a simple signal by hand with the formula quoted above, my results indicate that X[1] is DFT of x at frequency 1, not at freq[1], X[2] is DFT of x at frequency 2, etc, not at freq[2], etc.
As an example:
In [32]: x
Out[32]: [0, 0, 1, 0, 0]
In [33]: X
Out[33]:
array([
1.00000000+0.j,
-0.80901699-0.58778525j,
0.30901699+0.95105652j, 0.30901699-0.95105652j,
-0.80901699+0.58778525j])
In [34]: freq
Out[34]: array([ 0. , 0.2, 0.4, -0.4, -0.2])
If I compute DFT of above signal for k = 0.2 (or freq[1]), I get
X at freq = 0.2: 0.876 - 0.482j, which isn't X[1].
If however I compute for k = 1 I get the same results as are in X[1] or -0.809 - 0.588j.
So what am I misunderstanding? If numpy.fft.fft(x)[n] is a DFT of x at frequency n, not at frequency numpy.fft.fttfreq(len(x))[n], what is the purpose of numpy.fft.fttfreq?
I think that because the values in the array returned by the numpy.fft.fttfreq are equal to the (k/n)*sampling frequency.
The frequencies of the dft result are equal to k/n divided by the time spacing, because the periodic function's period's amplitude will become the inverse of the original value after fft. You can consider the digital signal function is a periodic sampling function convoluted by the analog signal function. The convolution in time domain means multiplication in frequency domain, so that the time spacing of the input data will affect the frequency spacing of the dft result and the frequency spacing's value will become the original one divided by the time spacing. Originally, the frequency spacing of the dft result is equal to 1/n when the time spacing is equal to 1. So after the dft, the frequency spacing will become 1/n divided by the time spacing, which eqauls to 1/n multiplied by the sampling frequency.
To calculate that, the numpy.fft.fttfreq has two arguments, the length of the input and time spacing, which means the inverse of the sampling rate. The length of the input is equal to n, and the time spacing is equal to the value which the result k/n divided by (Default is 1.)
I have tried to let k = 2, and the result is equal to the X[2] in your example. In this situation, the k/n*1 is equal to the freq[2].
The DFT is a dimensionless basis transform or matrix multiplication. The output or result of a DFT has nothing to do with frequencies unless you know the sampling rate represented by the input vector (samples per second, per meter, per radian, etc.)
You can compute a Goertzel filter of the same length N with k=0.2, but that result isn't contained in an DFT or FFT result of length N. A DFT only contains complex Goertzel filter results for integer k values. And to get from k to the frequency represented by X[k], you need to know the sample rate.
Yours is not a SO question
You wrote
If I compute DFT of above signal for k = 0.2 .
and I reply "You shouldn't"... the DFT can be meaningfully computed only for integer values of k.
The relationship between an index k and a frequency is given by f_k = k Δf or, if you prefer circular frequencies, ω_k = k Δω where Δf = 1/T and Δω = 2πΔf, T being the period of the signal.
The arguments of fftfreq are a bit misleading... the required one is the number of samples n and the optional argument is the sampling interval, by default d=1.0, but at any rate T=n*d and Δf = 1/(n*d)
>>> fftfreq(5) # d=1
array([ 0. , 0.2, 0.4, -0.4, -0.2])
>>> fftfreq(5,2)
array([ 0. , 0.1, 0.2, -0.2, -0.1])
>>> fftfreq(5,10)
array([ 0. , 0.02, 0.04, -0.04, -0.02])
and the different T are 5,10,50 and the respective df are -.2,0.1,0.02 as (I) expected.
Why fftfreq doesn't simply require the signal's period? because it is mainly intended as an helper in demangling the Nyquist frequency issue.
As you know, the DFT is periodic, for a signal x of length N you have that
DFT(x,k) is equal to DFT(x,k+mN) where m is an integer.
This imply that there are only N/2 positive and N/2 negative distinct frequencies and that, when N/2<k<N, the frequency that must be associated with k in the most meaningful way is not k df but (k-N) df.
To perform this, fftfreq needs more information that the period T, hence the choice of requiring n and computing df from an assumption on sampling interval.