How can I generate multivariate normal distribution? - numpy

I am trying to generate 1000 samples from a Gaussian distribution with the following code:
import numpy as np
mean = [2.2, 0]
cov = [[1, 1.5], [1.5, 1]]
x = np.random.multivariate_normal(mean, cov, 1000)
But the code above gives me a warning: RuntimeWarning: covariance is not positive-semidefinite.
How can I fix this?
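One way to see what the warning refers to, as a rough sketch: a covariance matrix must have no negative eigenvalues, and the matrix in the question has one, because the off-diagonal entry 1.5 is larger than the variances of 1. The replacement value 0.5 below is only a hypothetical example of a valid off-diagonal entry, not something from the question, and `default_rng` is used here in place of the `np.random.multivariate_normal` call in the question.

```python
import numpy as np

# Covariance matrix from the question.
cov = np.array([[1.0, 1.5], [1.5, 1.0]])

# eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(cov)
is_psd = bool(np.all(eigenvalues >= -1e-12))
print(eigenvalues, is_psd)

# Hypothetical valid alternative: with unit variances the off-diagonal
# entry must lie in [-1, 1]; 0.5 is an assumed value for illustration.
valid_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([2.2, 0.0], valid_cov, size=1000)
print(samples.shape)
```

With variances of 1, any off-diagonal value with absolute value above 1 would imply a correlation outside [-1, 1], which is why the original matrix is rejected.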

Related

Strange behaviour of numpy eigenvector: bug or no bug

NumPy's eigenvector solution differs from Wolfram Alpha and my personal calculation by hand.
>>> import numpy.linalg
>>> import numpy as np
>>> numpy.linalg.eig(np.array([[-2, 1], [2, -1]]))
(array([-3., 0.]), array([[-0.70710678, -0.4472136 ],
[ 0.70710678, -0.89442719]]))
Wolfram Alpha https://www.wolframalpha.com/input/?i=eigenvectors+%7B%7B-2,1%7D,%7B%2B2,-1%7D%7D and my personal calculation give the eigenvectors (-1, 1) and (2, 1). The NumPy solution however differs.
NumPy's calculated eigenvalues however are confirmed by Wolfram Alpha and my personal calculation.
So, is this a bug in NumPy, or is my understanding of the math too simple? A similar thread, "Numpy seems to produce incorrect eigenvectors", attributes the main difference to rounding/scaling of the eigenvectors, but here the deviation between the solutions seems massive.
Regards
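numpy.linalg.eig normalizes the eigenvectors to unit length and returns them as the columns of the result:
eig_vectors = np.linalg.eig(np.array([[-2, 1], [2, -1]]))[1]
vec_1 = eig_vectors[:, 0]  # eigenvector for eigenvalue -3
vec_2 = eig_vectors[:, 1]  # eigenvector for eigenvalue 0
These two vectors are just rescaled versions of the hand-calculated ones, i.e.
print(vec_1 * np.sqrt(2))  # approximately [-1, 1]; root 2 is the magnitude of [-1, 1]
print(vec_2 * np.sqrt(5))  # approximately [-1, -2]; root 5 is the magnitude of [1, 2]
Note that the eigenvector for eigenvalue 0 is proportional to (1, 2) rather than (2, 1), and an eigenvector is only defined up to a nonzero scale factor, including sign. So the bottom line is that both sets of calculations are equivalent; NumPy just normalizes the results.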
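A small check along these lines, as a sketch: it verifies that each returned column v satisfies A v = lambda v, that the columns have unit length, and that a rescaled copy is still an eigenvector. The scale factor 3.7 is an arbitrary value picked for illustration.

```python
import numpy as np

A = np.array([[-2.0, 1.0], [2.0, -1.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index.
for i, lam in enumerate(eigenvalues):
    v = eigenvectors[:, i]
    # Defining property of an eigenpair: A v = lambda v.
    print(lam, np.allclose(A @ v, lam * v))
    # Any nonzero multiple of v is an eigenvector too (3.7 is arbitrary).
    w = 3.7 * v
    print(np.allclose(A @ w, lam * w))

# The columns are normalized to unit Euclidean length.
norms = np.linalg.norm(eigenvectors, axis=0)
print(norms)
```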

Pandas 0.21.1 - DataFrame.replace recursion error

I used to run this code with no issue:
data_0 = data_0.replace([-1, 'NULL'], [None, None])
Now, after updating to Pandas 0.21.1, the very same line of code gives me:
RecursionError: maximum recursion depth exceeded
Has anybody experienced the same issue and knows how to solve it?
Note: rolling back to pandas 0.20.3 does the trick, but I think it's important to solve this with the latest version.
Thanks
I think this error message depends on what your input data is. Here's an example of input data where this works in the expected way:
import pandas as pd
data_0 = pd.DataFrame({'x': [-1, 1], 'y': ['NULL', 'foo']})
data_0.replace([-1, 'NULL'], [None, None])
This replaces the values -1 and 'NULL' with None:
     x     y
0  NaN  None
1  1.0   foo

How to understand this: `db = np.sum(dscores, axis=0, keepdims=True)`

In cs231n 2017 class, when we backpropagate the gradient we update the biases like this:
db = np.sum(dscores, axis=0, keepdims=True)
What's the basic idea behind the sum operation? Thanks
This is the formula for the derivative (more precisely, the gradient) of the loss function with respect to the bias (see this question and this post for derivation details).
The numpy.sum call computes the per-column sums along the 0 axis. Example:
dscores = np.array([[1, 2, 3],[2, 3, 4]]) # a 2D matrix
db = np.sum(dscores, axis=0, keepdims=True) # result: [[3 5 7]]
The result is exactly the element-wise sum [1, 2, 3] + [2, 3, 4] = [3 5 7]. In addition, keepdims=True preserves the rank of the original matrix, which is why the result is [[3 5 7]] instead of just [3 5 7].
By the way, if we were to compute np.sum(dscores, axis=1, keepdims=True), the result would be [[6] [9]].
[Update]
Apparently, the focus of this question is the formula itself. I don't want to go too far off-topic here, so I'll just give the main idea. The sum appears in the formula because the bias is broadcast over the mini-batch in the forward pass. If you take just one example at a time, the bias derivative is just the error signal, i.e. dscores (the links above explain this in detail). But for a batch of examples the gradients are added up due to linearity. That's why we take the sum along the batch axis, axis=0.
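The broadcasting argument can be checked numerically with a sketch like the following. The shapes, the random data, and the toy loss sum(scores * dscores) are all assumptions made for illustration; the toy loss is chosen only because its gradient with respect to scores is exactly dscores, and it is not the softmax loss from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 4, 3, 2                       # batch size, input dim, classes (assumed)
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
b = np.zeros((1, C))
dscores = rng.standard_normal((N, C))   # stand-in for the upstream gradient

def loss(bias):
    # Forward pass: the (1, C) bias is broadcast across all N rows.
    scores = X @ W + bias
    # Toy loss whose gradient with respect to scores is exactly dscores.
    return np.sum(scores * dscores)

# Analytic bias gradient: sum the per-example gradients over the batch axis.
db = np.sum(dscores, axis=0, keepdims=True)

# Central-difference estimate of the same gradient.
eps = 1e-6
db_numeric = np.zeros_like(b)
for j in range(C):
    step = np.zeros_like(b)
    step[0, j] = eps
    db_numeric[0, j] = (loss(b + step) - loss(b - step)) / (2 * eps)

print(db)
print(db_numeric)
```

Because each bias entry feeds into every row of scores, perturbing it changes all N rows at once, and the numerical estimate agrees with the summed gradient.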
[Figure: NumPy axis visual description]

Simplex algorithm in scipy package python

I am reading the documentation of the Simplex algorithm provided in the SciPy package for Python, but the example at the end of the documentation page solves a minimization problem, whereas I want to do a maximization. How would you alter the parameters to perform a maximization, if that is possible with this package?
Every maximization problem can be transformed into a minimization problem by multiplying the c-vector by -1. Say you have the 2-variable problem from the documentation, but want to maximize with c=[-1,4]:
from scipy.optimize import linprog
import numpy
c = numpy.array([-1, 4]) # your original c for maximization
c *= -1 # negate the objective coefficients
A = [[-3, 1], [1, 2]]
b = [6, 4]
x0_bnds = (None, None)
x1_bnds = (-3, None)
res = linprog(c, A, b, bounds=(x0_bnds, x1_bnds))
print("Objective = {}".format(res.get('fun') * -1)) # don't forget to retransform your objective back!
outputs
>>> Objective = 11.4285714286
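The same trick can be wrapped in a small helper, as a sketch. The function name `maximize` is hypothetical and not part of SciPy; it only negates the objective on the way in and negates the optimal value on the way out, using the keyword names `A_ub` and `b_ub` of `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def maximize(c, A_ub, b_ub, bounds):
    """Maximize c @ x subject to A_ub @ x <= b_ub (hypothetical helper)."""
    c = np.asarray(c, dtype=float)
    # linprog only minimizes, so minimize -c @ x instead.
    res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise ValueError(res.message)
    # Negate the optimal value to get back the maximum of c @ x.
    return res.x, -res.fun

x_opt, max_value = maximize(
    c=[-1, 4],
    A_ub=[[-3, 1], [1, 2]],
    b_ub=[6, 4],
    bounds=[(None, None), (-3, None)],
)
print(x_opt, max_value)
```

Copying c to a float array before negating also avoids modifying the caller's array in place, which the `c *= -1` version does.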

Specify the spherical covariance in numpy's multivariate_normal random sampling

The NumPy manual says:
Instead of specifying the full covariance matrix, popular approximations include:
Spherical covariance (cov is a multiple of the identity matrix)
Has anybody ever specified a spherical covariance? I am trying to make it work to avoid building the full covariance matrix, which consumes too much memory.
If you just have a diagonal covariance matrix, it is usually easier (and more efficient) to just scale standard normal variates yourself instead of using multivariate_normal().
>>> import numpy as np
>>> stdevs = np.array([3.0, 4.0, 5.0])
>>> x = np.random.standard_normal([100, 3])
>>> x.shape
(100, 3)
>>> x *= stdevs
>>> x.std(axis=0)
array([ 3.23973255, 3.40988788, 4.4843039 ])
While @RobertKern's approach is correct, you can let numpy handle all of that for you, as np.random.normal will do broadcasting on multiple means and standard deviations:
>>> np.random.normal(0, [1,2,3])
array([ 0.83227999, 3.40954682, -0.01883329])
To get more than a single random sample, you have to give it an appropriate size:
>>> x = np.random.normal(0, [1, 2, 3], size=(1000, 3))
>>> np.std(x, axis=0)
array([ 1.00034817, 2.07868385, 3.05475583])
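For the strictly spherical case from the question (covariance equal to sigma**2 times the identity), a single scalar standard deviation is enough and no matrix is ever built. This is a sketch only: the function name `sample_spherical_normal` and the example mean and sigma are made up for illustration, and it uses the newer `default_rng` generator rather than the legacy `np.random` functions shown above.

```python
import numpy as np

def sample_spherical_normal(mean, sigma, n_samples, rng):
    """Draw samples from N(mean, sigma**2 * I) without forming the matrix."""
    mean = np.asarray(mean, dtype=float)
    # Scale standard normal draws by the shared sigma, then shift by the mean.
    return mean + sigma * rng.standard_normal((n_samples, mean.size))

rng = np.random.default_rng(0)
samples = sample_spherical_normal([1.0, -2.0, 0.5], 2.0, 100_000, rng)

print(samples.shape)
print(samples.mean(axis=0))  # should be close to the requested mean
print(samples.std(axis=0))   # should be close to sigma in every dimension
```

Memory use grows with n_samples times the dimension, rather than with the dimension squared as it would for a full covariance matrix.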