The NumPy manual says:
Instead of specifying the full covariance matrix, popular approximations include:
Spherical covariance (cov is a multiple of the identity matrix)
Has anybody ever specified a spherical covariance? I am trying to make it work so that I can avoid building the full covariance matrix, which consumes too much memory.
If you just have a diagonal covariance matrix, it is usually easier (and more efficient) to just scale standard normal variates yourself instead of using multivariate_normal().
>>> import numpy as np
>>> stdevs = np.array([3.0, 4.0, 5.0])
>>> x = np.random.standard_normal([100, 3])
>>> x.shape
(100, 3)
>>> x *= stdevs
>>> x.std(axis=0)
array([ 3.23973255, 3.40988788, 4.4843039 ])
While @RobertKern's approach is correct, you can let numpy handle all of that for you, as np.random.normal broadcasts over multiple means and standard deviations:
>>> np.random.normal(0, [1,2,3])
array([ 0.83227999, 3.40954682, -0.01883329])
To get more than a single random sample, you have to give it an appropriate size:
>>> x = np.random.normal(0, [1, 2, 3], size=(1000, 3))
>>> np.std(x, axis=0)
array([ 1.00034817, 2.07868385, 3.05475583])
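For the spherical case from the question (cov is a multiple of the identity matrix), the same idea needs only a single scalar. A minimal sketch, assuming a hypothetical standard deviation sigma and illustrative sizes, and using the newer np.random.default_rng generator rather than the legacy functions shown above:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

n_samples, dim = 10_000, 5      # illustrative sizes, not from the question
mean = np.zeros(dim)
sigma = 2.0                     # spherical covariance: cov = sigma**2 * I

# Scale standard normal variates by one scalar; no (dim, dim) matrix is built.
samples = mean + sigma * rng.standard_normal((n_samples, dim))

# Empirical check only; this step does build a small (dim, dim) matrix.
sample_cov = np.cov(samples, rowvar=False)
print(np.round(sample_cov, 2))
```

The memory cost of sampling is then proportional to n_samples * dim, independent of the size of the covariance matrix that is never formed.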
If I know the number of axes of an array, can I specify it in the type hint npt.NDArray (from import numpy.typing as npt)?
That is, if I know it is a 3D array, how can I write something like npt.NDArray[3, np.float64]?
On Python 3.9 and 3.10 the following does the job for me:
from typing import Literal, Tuple
import numpy as np
data = [[1, 2, 3], [4, 5, 6]]
arr: np.ndarray[Tuple[Literal[2], Literal[3]], np.dtype[np.int_]] = np.array(data)
It is a bit cumbersome, but you might follow numpy issue #16544 for future development on easier specification.
In particular, for now you must declare the full shape and can't only declare the rank of the array.
In the future something like ndarray[Shape[:, :, :], dtype] should be available.
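Until rank-only annotations exist, one possible workaround is to annotate only the dtype with npt.NDArray and check the rank at run time. A sketch, assuming NumPy 1.21 or later; Array3D and as_3d are hypothetical names, and the alias only documents intent, it is not enforced by a static type checker:

```python
import numpy as np
import numpy.typing as npt

# Hypothetical alias: records the intended rank for readers, nothing more.
Array3D = npt.NDArray[np.float64]


def as_3d(a: npt.NDArray[np.float64]) -> Array3D:
    """Return `a` unchanged after verifying at run time that it is 3-D."""
    if a.ndim != 3:
        raise ValueError(f"expected a 3-D array, got {a.ndim} dimension(s)")
    return a


volume = as_3d(np.zeros((2, 3, 4)))
print(volume.shape)
```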
I have a series of one-hot encoded vectors, say
np.array([[1,0,0,0],[0,1,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
I want to convert it back to
np.array([0,1,1,2,3])
Is there an efficient way of doing this without a for loop?
As pointed out by @Divakar in the comments, NumPy's argmax is the easiest way to get the job done. Note that you need to pass the appropriate value for the axis parameter.
In [18]: import numpy as np
In [19]: x = np.array([[1,0,0,0],[0,1,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
In [20]: np.argmax(x, axis=-1)
Out[20]: array([0, 1, 1, 2, 3], dtype=int64)
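A round-trip sketch of the same idea: build the one-hot rows from integer labels by indexing an identity matrix, then recover the labels with argmax along the last axis. The variable names are illustrative only.

```python
import numpy as np

labels = np.array([0, 1, 1, 2, 3])
n_classes = 4

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=int)[labels]

# argmax along the last axis returns the position of the single 1 in each row.
recovered = one_hot.argmax(axis=-1)
print(recovered)
```

This relies on every row containing exactly one 1; for an all-zero row argmax would return 0, which is indistinguishable from class 0.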
NumPy's eigenvector solution differs from Wolfram Alpha and my personal calculation by hand.
>>> import numpy.linalg
>>> import numpy as np
>>> numpy.linalg.eig(np.array([[-2, 1], [2, -1]]))
(array([-3., 0.]), array([[-0.70710678, -0.4472136 ],
[ 0.70710678, -0.89442719]]))
Wolfram Alpha https://www.wolframalpha.com/input/?i=eigenvectors+%7B%7B-2,1%7D,%7B%2B2,-1%7D%7D and my personal calculation give the eigenvectors (-1, 1) and (2, 1). The NumPy solution however differs.
NumPy's calculated eigenvalues however are confirmed by Wolfram Alpha and my personal calculation.
So, is this a bug in NumPy, or is my understanding of the math too simple? A similar thread, "Numpy seems to produce incorrect eigenvectors", attributes the main difference to rounding/scaling of the eigenvectors, but the deviation between the solutions here would be massive.
Regards
numpy.linalg.eig normalizes the eigenvectors to unit length and returns them as the columns of the result:
eig_vectors = np.linalg.eig(np.array([[-2, 1], [2, -1]]))[1]
vec_1 = eig_vectors[:,0]
vec_2 = eig_vectors[:,1]
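Now these two vectors are just normalized versions of the vectors you calculated, i.e.
print(vec_1 * np.sqrt(2)) # sqrt(2) is the magnitude of [-1, 1]
print(vec_2 * np.sqrt(5)) # sqrt(5) is the magnitude of the second eigenvector; gives approximately [-1, -2]
So, bottom line: both sets of calculations are equivalent; NumPy just normalizes the results. An eigenvector is only defined up to a nonzero scalar factor (including sign), so [-1, -2] and [1, 2] describe the same eigenvector. Note that the eigenvector for the eigenvalue 0 is a multiple of [1, 2] (not [2, 1]), since the matrix maps [1, 2] to [0, 0].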
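The equivalence can be checked numerically. The sketch below verifies the defining relation A v = lambda v for NumPy's unit-length columns and for hand-computed, unnormalized vectors, then confirms that each pair is parallel. The by_hand dictionary and the parallel helper are illustrative names, not part of NumPy:

```python
import numpy as np

A = np.array([[-2.0, 1.0], [2.0, -1.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Hand-computed (unnormalized) eigenvectors, keyed by eigenvalue.
by_hand = {-3.0: np.array([-1.0, 1.0]), 0.0: np.array([1.0, 2.0])}


def parallel(u, w):
    """True if the 2-D vectors u and w are scalar multiples of each other."""
    return bool(np.isclose(u[0] * w[1] - u[1] * w[0], 0.0))


all_parallel = []
# Column i of `eigenvectors` belongs to eigenvalues[i].
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)         # A v = lambda v
    assert np.isclose(np.linalg.norm(v), 1.0)  # NumPy returns unit vectors
    nearest = min(by_hand, key=lambda k: abs(k - lam))
    assert np.allclose(A @ by_hand[nearest], nearest * by_hand[nearest])
    all_parallel.append(parallel(v, by_hand[nearest]))

print(all_parallel)
```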
I am reading the documentation of the simplex algorithm provided in Python's SciPy package, but the example at the end of that documentation page solves a minimization problem, whereas I want to do a maximization. If maximization is possible with this package, how would you alter the parameters to perform it?
Every maximization problem can be transformed into a minimization problem by multiplying the c-vector by -1. Say you have the 2-variable problem from the documentation, but want to maximize the objective with c = [-1, 4]:
from scipy.optimize import linprog
import numpy
c = numpy.array([-1, 4]) # your original c for maximization
c *= -1 # negate the objective coefficients
A = [[-3, 1], [1, 2]]
b = [6, 4]
x0_bnds = (None, None)
x1_bnds = (-3, None)
res = linprog(c, A, b, bounds=(x0_bnds, x1_bnds))
print("Objective = {}".format(res.get('fun') * -1)) # don't forget to retransform your objective back!
outputs
>>> Objective = 11.4285714286
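The reported optimum can be cross-checked without a solver. For a bounded two-variable problem like this one, the maximum lies at a vertex of the feasible region, so a sketch can intersect each pair of constraint boundaries and evaluate the original (un-negated) objective there. This is a brute-force check for this specific example, not how linprog works internally; G and h are hypothetical names for the stacked constraints:

```python
import itertools
import numpy as np

c = np.array([-1.0, 4.0])  # objective to maximize (not negated here)

# All constraints as rows of G @ x <= h; the bound x1 >= -3 becomes -x1 <= 3.
G = np.array([[-3.0, 1.0], [1.0, 2.0], [0.0, -1.0]])
h = np.array([6.0, 4.0, 3.0])

best_value, best_x = -np.inf, None
for i, j in itertools.combinations(range(len(h)), 2):
    # Intersection point of the two constraint boundaries i and j.
    x = np.linalg.solve(G[[i, j]], h[[i, j]])
    feasible = np.all(G @ x <= h + 1e-9)
    if feasible and c @ x > best_value:
        best_value, best_x = c @ x, x

print(best_x, best_value)
```

The best vertex is (-8/7, 18/7) with objective 80/7, about 11.4286, which agrees with the value printed above.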
I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).
Minimal Example
#!/usr/bin/env python
import numpy as np
import scipy
from sklearn import preprocessing
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)
transformed = np.zeros(original.shape, dtype=np.float64)
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
    transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)
What happens
Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg,
I get the following "normalized" result:
As you can see there are lines from top to bottom at the right side. What happened there? It seems to me that the normalization went wrong. If so: How do I fix it?
In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.
MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.
For example,
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
transformed = t.reshape(original.shape)
With this, transformed looks like this:
The image looks exactly like the original, because it turns out that in the array original, the minimum and maximum are 0 and 255, respectively, in each channel:
In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)
In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)
So all fit_transform does in this case is transform all the input values to the floating point range [0.0, 1.0] uniformly. If the minimum or maximum was different in one of the channels, the transformed image would look different.
By the way, it is not difficult to perform the transform using pure numpy. (I'm using Python 3, so in the following, the division automatically casts the result to floating point. If you are using Python 2, you'll need to convert one of the arguments to floating point, or use from __future__ import division.)
In [58]: omin = original.min(axis=(0, 1), keepdims=True)
In [59]: omax = original.max(axis=(0, 1), keepdims=True)
In [60]: xformed = (original - omin)/(omax - omin)
In [61]: np.allclose(xformed, transformed)
Out[61]: True
(One potential problem with that method is that if one of the channels is constant, the corresponding value in omax - omin will be 0, so the division produces nan values for that channel, along with a RuntimeWarning.)
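One way to guard against a constant channel, where omax - omin is 0, is to replace zero spans with 1 before dividing. A minimal sketch; minmax_per_channel is a hypothetical helper name, and mapping a constant channel to all zeros is a design choice of this sketch rather than something either library prescribes:

```python
import numpy as np


def minmax_per_channel(image):
    """Scale each channel of an (m, n, channels) image to [0, 1] independently."""
    image = np.asarray(image, dtype=np.float64)
    cmin = image.min(axis=(0, 1), keepdims=True)
    cmax = image.max(axis=(0, 1), keepdims=True)
    span = cmax - cmin
    # A constant channel has span 0; divide by 1 instead so it maps to 0.0.
    safe_span = np.where(span == 0, 1.0, span)
    return (image - cmin) / safe_span


# Small synthetic image: channel 1 is constant on purpose.
demo = np.zeros((4, 5, 3), dtype=np.uint8)
demo[..., 0] = np.arange(20).reshape(4, 5)
demo[..., 1] = 7
demo[..., 2] = np.arange(20).reshape(4, 5) * 10

scaled = minmax_per_channel(demo)
print(scaled.min(axis=(0, 1)), scaled.max(axis=(0, 1)))
```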