Numpy bivariate normal distribution with correlation = 1

Consider X and Y to be marginally standard normal with correlation 1.0.
When the correlation is 1.0, the bivariate normal distribution is undefined (it's technically the y = x line), but numpy still prints out values. Why does it do this?

Oh, but the distribution is defined! It just doesn't have a well-defined density function, at least not with respect to the Lebesgue measure on the 2D space. (See Mathematics Stack Exchange's discussion of broader classes of such distributions.) So numpy is doing nothing wrong.
What you're describing is the degenerate case of the bivariate (or more generally, multivariate) normal distribution. This occurs when the covariance matrix is not positive definite. However, the distribution is defined for any positive semi-definite covariance matrix.
As an example, the matrix [[1, 1], [1, 1]] is not positive definite, but it is positive semidefinite.
The distribution still has the host of other properties that distributions should have: a support (the line y = x, as you note; in general, μ + span(Σ)), moments, and more.
import numpy as np
np.random.multivariate_normal(mean=[0, 0], cov=[[1, 1], [1, 1]])
# array([0.61156886, 0.61156887])
In summary, numpy's behavior isn't broken. It's well-behaved by returning samples from a properly specified distribution.
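For illustration (a small check of my own, not part of the original answer): every draw from this degenerate distribution should land on the support μ + span(Σ), i.e. the line y = x, while each coordinate is still marginally standard normal.
import numpy as np
samples = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 1], [1, 1]], size=10000)
print(np.allclose(samples[:, 0], samples[:, 1]))  # True: the two coordinates agree
print(samples[:, 0].std())                        # close to 1.0, the marginal standard deviation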


Why is the second dimension in a Numpy array empty?

Why is the output of
array = np.arange(3)
array.shape
equal to
(3,)
and not
(1,3)?
What does the missing dimension mean, or what is it equal to?
In case there's confusion, (3,) doesn't mean there's a missing dimension. The comma is part of the standard Python notation for a single-element tuple. Shapes (1,3), (3,), and (3,1) are distinct.
While they can contain the same 3 elements, their use in calculations (broadcasting) is different, their print format is different, and their list equivalent is different:
In [21]: np.array([1,2,3])
Out[21]: array([1, 2, 3])
In [22]: np.array([1,2,3]).tolist()
Out[22]: [1, 2, 3]
In [23]: np.array([1,2,3]).reshape(1,3).tolist()
Out[23]: [[1, 2, 3]]
In [24]: np.array([1,2,3]).reshape(3,1).tolist()
Out[24]: [[1], [2], [3]]
And we don't have to stop at adding just one singleton dimension:
In [25]: np.array([1,2,3]).reshape(1,3,1).tolist()
Out[25]: [[[1], [2], [3]]]
In [26]: np.array([1,2,3]).reshape(1,3,1,1).tolist()
Out[26]: [[[[1]], [[2]], [[3]]]]
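The broadcasting difference mentioned above can be seen in a quick example (my own addition, continuing the session style): a (3,1) array combined with a (3,) array broadcasts out to (3,3), while two (3,) arrays stay elementwise:
In [27]: a = np.array([1,2,3])
In [28]: a + a                  # (3,) + (3,) -> (3,)
Out[28]: array([2, 4, 6])
In [29]: a.reshape(3,1) + a     # (3,1) + (1,3) -> (3,3)
Out[29]:
array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])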
In numpy an array can have 0, 1, 2 or more dimensions. 1 dimension is just as logical as 2.
In MATLAB a matrix always has 2 dimensions (or more), but it doesn't have to be that way. Strictly speaking MATLAB doesn't even have scalars. An array with shape (3,) is missing a dimension only if MATLAB is taken as the standard.
numpy is built on Python, which has scalars and lists (which can nest). How many dimensions does a Python list have?
If you want to get into history, MATLAB was developed as a front end to a set of Fortran linear algebra routines. Given the problems those routines solved, the concept of a 2-dimensional matrix, and of row vs column vectors, made sense. It wasn't until version 3.something (in the late 1990s) that MATLAB was generalized to allow more than 2 dimensions.
numpy is based on several earlier attempts to provide arrays in Python (e.g. Numeric). Those developers took a more general approach to arrays, one where 2d was an artificial constraint. That has precedent in computer languages and mathematics (and physics). APL was developed in the 1960s, first as a mathematical notation and then as a computer language. Like numpy, its arrays can be 0d or higher. (Since I used APL before I used MATLAB, the numpy approach feels quite natural.)
In APL there aren't separate lists or tuples. So the shape of an array, rho A, is itself an array, and rho rho A is the number of dimensions of A, also called the rank.
http://docs.dyalog.com/14.0/Dyalog%20APL%20Idioms.pdf

Compact and natural way to write matrix product of vectors in Numpy

In scientific computing I often want to do vector multiplications like
a x b^T
with a and b being column vectors, and b^T the transpose of b. So if a and b are of shape [n, 1] and [m, 1], the resulting matrix has shape [n, m].
Is there a good and straightforward way to write this multiplication in numpy?
Example:
a = np.array([1,2,3])
b = np.array([4,5,6,7])
Adding axes manually works:
a[:,np.newaxis] @ b[np.newaxis,:]
and gives the correct result:
[[ 4  5  6  7]
 [ 8 10 12 14]
 [12 15 18 21]]
Einstein notation would be another way, but still somewhat weird.
np.einsum('a,b->ab', a,b)
What I was hoping to work, but doesn't work, is the following:
a @ b.T
Any other approaches to do this?
In MATLAB matrix multiplication is the norm, using *. Elementwise multiplication uses the .* operator. Also, matrices are at least 2d.
In numpy, elementwise multiplication uses *. Matrix multiplication is done with np.dot (or its method), and more recently with the @ operator (np.matmul). numpy adds broadcasting, which gives elementwise multiplication a lot more expressiveness.
With your 2 example arrays, of shape (3,) and (4,), the options for making a (3,4) outer product (https://en.wikipedia.org/wiki/Outer_product) include:
np.outer(a,b)
np.einsum('i,j->ij', a, b) # matching Einstein index notation
a[:,None] * b # the most idiomatic numpy expression
This last works because of broadcasting. a[:, None], like a.reshape(-1,1) turns the (3,) array into a (3,1). b[None, :] turns a (4,) into (1,4). But broadcasting can perform this upgrade automatically (and unambiguously).
(3,1) * (4,) => (3,1) * (1,4) => (3,4)
Broadcasting does not work with np.dot. So we need
a[:, None].dot(b[None, :]) # (3,1) dot with (1,4)
The key with dot is that the last dim of a pairs with the 2nd to last of b. (np.dot also works with 2 matching 1d arrays, performing the conventional vector dot product).
@ (matmul) introduces an operator that works like dot, at least in the 2d with 2d case. With higher dimensional arrays they work differently.
a[:,None].dot(b[None,:])
np.dot(a[:,None], b[None,:])
a[:,None] @ b[None,:]
a[:,None] @ b[:,None].T
and the reshape equivalents all create the desired (3,4) array.
np.tensordot can handle other dimensions combinations, but it works by reshaping and transposing the inputs, so in the end it can pass them to dot. It then transforms the result back into desired shape.
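As a hedged aside, for these 1d inputs np.tensordot also spells the same thing, since axes=0 means "contract over nothing":
np.tensordot(a, b, axes=0)  # tensor (outer) product, shape (3,4)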
Quick time tests show that the np.dot versions tend to be fastest - because they delegate the action to fast BLAS-like libraries. For the other versions, the delegation is a bit more indirect, or they use numpy's own compiled code.
In the comments, multiple solutions were proposed, which I summarize here:
np.outer(a,b), which computes exactly this outer product (thanks to Brenlla)
a[:,np.newaxis]*b (thanks to Divakar)
a.reshape((-1,1)) @ b.reshape((-1,1)).T, or just as well a.reshape((-1,1)) @ b.reshape((1,-1)). It is a bit longer, but shows that these numpy matrix operations actually need matrices as inputs, not only vectors (thanks to Warren Weckesser and heltonbiker)
For completeness, my previous already working examples:
a[:,np.newaxis] @ b[np.newaxis,:]
np.einsum('a,b->ab', a,b)
Remark: To reduce the number of characters even more, one can use None instead of np.newaxis.
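As a final sanity check, here is a small self-contained script (my own, not from the answers above) confirming that the listed approaches all produce the same (3, 4) outer product:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6, 7])

results = [
    np.outer(a, b),                          # dedicated outer-product function
    a[:, None] * b,                          # broadcasting: (3,1) * (4,) -> (3,4)
    np.einsum('i,j->ij', a, b),              # Einstein summation
    a[:, None] @ b[None, :],                 # matmul on explicit (3,1) and (1,4)
    a.reshape(-1, 1).dot(b.reshape(1, -1)),  # np.dot on reshaped 2d arrays
]

assert all(np.array_equal(results[0], r) for r in results[1:])
print(results[0])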

How does the gradient of the sum trick work to get maxpooling positions in keras?

The keras examples directory contains a lightweight version of a stacked what-where autoencoder (SWWAE) which they train on MNIST data. (https://github.com/fchollet/keras/blob/master/examples/mnist_swwae.py)
In the original SWWAE paper, the authors compute the what and where using soft functions. However, in the keras implementation, they use a trick to get these locations. I would like to understand this trick.
Here is the code of the trick.
def getwhere(x):
    ''' Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.'''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool)  # How exactly does this line work?
Here y_prepool is an MxN matrix and y_postpool is an M/2 x N/2 matrix (let's assume canonical pooling with a 2-pixel window).
I have verified that the output of getwhere() is a bed of nails matrix where the nails indicate the position of the max (the local argmax if you will).
Can someone construct a small example demonstrating how getwhere works using this "Trick?"
Let's focus on the simplest example, without really talking about convolutions. Say we have a vector
x = [1 4 2]
which we max-pool over (with a single, big window), we get
mx = 4
mathematically speaking, it is:
mx = x[argmax(x)]
now, the "trick" to recover one hot mask used by pooling is to do
magic = d mx / dx
there is no gradient for argmax; however, it "passes" the corresponding gradient to the element of the vector at the location of the maximum element, so:
d mx / dx = [d mx / dx[1], d mx / dx[2], d mx / dx[3]] = [0, 1, 0]
as you can see, the gradient for all non-maximum elements is zero (due to argmax), and a "1" appears at the maximum value because d x[2] / dx[2] = 1.
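A quick illustration of this 1-D case (my own sketch, using tf.GradientTape from current TensorFlow rather than the Keras backend call in the original snippet):
import tensorflow as tf

x = tf.constant([1., 4., 2.])
with tf.GradientTape() as tape:
    tape.watch(x)            # treat the constant as a differentiable input
    mx = tf.reduce_max(x)    # max-pooling with one big window
print(tape.gradient(mx, x))  # [0., 1., 0.] - the one-hot mask of the argmax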
Now for "proper" maxpool you have many pooling regions, connected to many input locations, thus taking analogous gradient of sum of pooled values, will recover all the indices.
Note however that this trick will not work if you have heavily overlapping kernels - you might end up with values bigger than "1". Basically, if a pixel is max-pooled by K kernels, then it will have value K, not 1. For example:
x = [[ 1,  2, 3],
     [13,  3, 1],
     [ 4,  2, 9]]
if we max pool with a 2x2 window (stride 1, so the windows overlap) we get
mx = [[13, 3],
      [13, 9]]
and the gradient trick gives you
magic = [[0, 0, 1],
         [2, 0, 0],
         [0, 0, 1]]
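The same trick on the overlapping 2x2 example above, again as a hedged sketch with current TensorFlow instead of K.gradients (the exact routing of the gradient for tied maxima within a window can depend on the implementation):
import tensorflow as tf

x = tf.constant([[ 1.,  2., 3.],
                 [13.,  3., 1.],
                 [ 4.,  2., 9.]])
x4 = tf.reshape(x, (1, 3, 3, 1))      # NHWC layout expected by max_pool2d

with tf.GradientTape() as tape:
    tape.watch(x4)
    pooled = tf.nn.max_pool2d(x4, ksize=2, strides=1, padding='VALID')
    total = tf.reduce_sum(pooled)     # the "sum" in the gradient-of-the-sum trick

magic = tape.gradient(total, x4)
print(tf.reshape(magic, (3, 3)))      # roughly [[0, 0, 1], [2, 0, 0], [0, 0, 1]]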

Does Numpy have an inbuilt elementwise matrix modular exponentiation implementation

Does numpy have an inbuilt implementation for modular exponentiation of matrices?
(As pointed out by user2357112, I am actually looking for elementwise modular reduction.)
One way to do modular exponentiation on regular numbers is with exponentiation by squaring (https://en.wikipedia.org/wiki/Exponentiation_by_squaring), with a modular reduction taken at each step. I am wondering if there is a similar inbuilt solution for matrix multiplication. I am aware I can write code to emulate this easily, but I am wondering if there is an inbuilt solution.
Modular exponentiation is not currently built into NumPy (see the GitHub issue). The easiest/laziest way to achieve it is np.frompyfunc:
modexp = np.frompyfunc(pow, 3, 1)  # wrap Python's three-argument pow(base, exp, mod) as an elementwise ufunc
print(modexp(np.array([[1, 2], [3, 4]]), 2, 3).astype(int))
prints
[[1 1]
[0 1]]
This is of course slower than native NumPy would be, and we get an array with dtype=object (hence astype(int) is added).
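If what's wanted is true matrix modular exponentiation (repeated matrix products reduced mod m) rather than the elementwise version, there is no built-in for that either, but a minimal sketch of the squaring approach mentioned in the question could look like this (matrix_modexp is a hypothetical helper, not a NumPy function):
import numpy as np

def matrix_modexp(A, e, m):
    # Exponentiation by squaring, reducing every entry mod m after each product.
    A = np.asarray(A, dtype=object) % m        # object dtype avoids fixed-width overflow
    result = np.eye(A.shape[0], dtype=object)  # identity matrix
    while e > 0:
        if e & 1:
            result = result.dot(A) % m
        A = A.dot(A) % m
        e >>= 1
    return result

print(matrix_modexp([[1, 1], [1, 0]], 10, 5))  # 10th power of the Fibonacci matrix, mod 5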

How to properly sample from a numpy.random.multivariate_normal (positive-semidefinite covariance matrix issue)

I'm hoping to generate new "fake" data from the data I already have with numpy.random.multivariate_normal.
With n samples and d features in an n x d pandas DataFrame:
means = data.mean(axis=0)
covariances = data.cov()
variances = data.var()
means.shape, covariances.shape, variances.shape
>>> ((16349,), (16349, 16349), (16349,))
This looks fine, but the covariance matrix covariances isn't positive semidefinite, which is a requirement of numpy.random.multivariate_normal.
x = np.linalg.eigvals(covariances)
np.all(x >= 0)
>>> False
len([y for y in x if y < 0]) # negative eigenvalues
>>> 4396
len([y for y in x if y > 0]) # positive eigenvalues
>>> 4585
len([y for y in x if y == 0]) # zero eigenvalues.
>>> 7368
However, Wikipedia says
In addition, every covariance matrix is positive semi-definite.
Which leads me to wonder whether pandas.DataFrame.cov gets you a real covariance matrix. Here's the function's implementation. It seems to mostly defer to numpy.cov which also seems to promise a covariance matrix.
Can someone clear this up for me? Why is pandas.DataFrame.cov() not positive semidefinite?
Updated question:
From the first answer, it seems like all the negative eigenvalues are tiny. The author of that answer suggests clipping these eigenvalues, but it's still unclear to me how to sensibly generate a proper covariance matrix with this information.
I can imagine using pd.DataFrame.cov(), doing eigendecomposition to get eigenvectors and values, clipping the values, and then multiplying those matrices to get a new covariance matrix, but that seems quite precarious. Is that done in practice, or is there a better way?
Probably what's happening is that the result is positive-semidefinite, to within the accuracy of the computation. For example:
In [71]: np.linalg.eigvals(np.cov(np.random.random((5,5))))
Out[71]:
array([ 1.87557170e-01,  9.98250875e-02,  6.85211153e-02,
        1.01062281e-02, -5.99164839e-18])
has a negative eigenvalue, but the magnitude is small.
So in your shoes I'd verify that the magnitude of the violations was small, and then clip to zero.
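As a rough sketch of the eigenvalue-clipping repair discussed in the updated question (my own code, assuming the negative eigenvalues really are just round-off noise; clip_to_psd is a hypothetical helper, not a library function):
import numpy as np

def clip_to_psd(cov):
    # Symmetrize, clip the tiny negative eigenvalues to zero, and rebuild the matrix.
    cov = np.asarray(cov, dtype=float)
    cov = (cov + cov.T) / 2             # enforce exact symmetry first
    vals, vecs = np.linalg.eigh(cov)    # eigh is intended for symmetric matrices
    vals = np.clip(vals, 0, None)       # drop the small negative eigenvalues
    return (vecs * vals) @ vecs.T       # reassemble V diag(w) V^T

# cov_fixed = clip_to_psd(data.cov())
# fake = np.random.multivariate_normal(means, cov_fixed, size=100)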