Coefficients of 2D Chebyshev series in numpy.polynomial.chebyshev - numpy

I understand that chebvander2d and chebval2d return the Vandermonde matrix and fitted values for 2D inputs, and chebfit returns the coefficients for 1D-input series, but how do I get the coefficients for 2D-input series?

Short answer: It looks to me like this is not yet implemented. The whole of 2D polynomials seems more like a draft with some stub functions (as of June 2020).
Long answer (I came looking for the same thing, so I dug a little deeper):
First of all, this applies to all of the polynomial classes, not only chebyshev, so you also cannot fit an "ordinary" polynomial (power series). In fact, you cannot even construct one.
To understand the programming problem, let me recapture what a 2D polynomial looks like as a math formula, at an example polynomial of degree 2:
p(x, y) = c_00 + c_10 x + c_01 y + c_20 x^2 + c11 xy + c02 y^2
here the indices of c refer to the powers of x and y (the sum of the exponents must be <= degree).
First thing to notice is that, for degree d, there are (d+1)(d+2)/2 coefficients.
They could be stored in the upper left part of a matrix or in a 1D array, e.g. aranged as in the formula above.
The documentation of functions like numpy.polynomial.polynomial.polyval2d implies that numpy expects the matrix variant: p(x, y) = sum_i,j c_i,j * x^i * y^j.
Side note: it may be confusing that the row index i ("y-coordinate") of the matrix is used as exponent of x, not y; maybe the role of i and j should be switched if this is eventually implementd, or at least there should be a note in the documentation.
This leads to the core problem: the data structure for the 2D coefficients is not defined anywhere; only indirectly, like above, it can be guessed that a matrix should be used. But compared to a 1D array this is a waste of space, and evaluation of the polynomial takes two nested loops instead of just one. Also: does the matrix have to be initialized with np.zeros or do the implemented functions make sure that the lower right part is never touched so that np.empty can be used?
If the whole (d+1)^2 matrix were used, as the polyval2d function doc suggests, the degree of the polynomial would actually be d*2 (if c_d,d != 0)
To test this, I wanted to construct a numpy.polynomial.polynomial.Polynomial (yes, three times polynomial) and check the degree attribute:
import numpy as np
import numpy.polynomial.polynomial as poly
coef = np.array([
[5.00, 5.01, 5.02],
[5.10, 5.11, 0. ],
[5.20, 0. , 0. ]
])
polyObj = poly.Polynomial(coef)
print(polyObj.degree)
This gave a ValueError: Coefficient array is not 1-d before the print statement was reached. So while polyval2d expects a 2D coefficient array, it is not (yet) possible to construct such a polynomial - not manually like this at least. With this insight, it is not surprising that there is no function (yet) that computes a fit for 2D polynomials.

Related

Automatic Differentiation with respect to rank-based computations

I'm new to automatic differentiation programming, so this maybe a naive question. Below is a simplified version of what I'm trying to solve.
I have two input arrays - a vector A of size N and a matrix B of shape (N, M), as well a parameter vector theta of size M. I define a new array C(theta) = B * theta to get a new vector of size N. I then obtain the indices of elements that fall in the upper and lower quartile of C, and use them to create a new array A_low(theta) = A[lower quartile indices of C] and A_high(theta) = A[upper quartile indices of C]. Clearly these two do depend on theta, but is it possible to differentiate A_low and A_high w.r.t theta?
My attempts so far seem to suggest no - I have using the python libraries of autograd, JAX and tensorflow, but they all return a gradient of zero. (The approaches I have tried so far involve using argsort or extracting the relevant sub-arrays using tf.top_k.)
What I'm seeking help with is either a proof that the derivative is not defined (or cannot be analytically computed) or if it does exist, a suggestion on how to estimate it. My eventual goal is to minimize some function f(A_low, A_high) wrt theta.
This is the JAX computation that I wrote based on your description:
import numpy as np
import jax.numpy as jnp
import jax
N = 10
M = 20
rng = np.random.default_rng(0)
A = jnp.array(rng.random((N,)))
B = jnp.array(rng.random((N, M)))
theta = jnp.array(rng.random(M))
def f(A, B, theta, k=3):
C = B # theta
_, i_upper = lax.top_k(C, k)
_, i_lower = lax.top_k(-C, k)
return A[i_lower], A[i_upper]
x, y = f(A, B, theta)
dx_dtheta, dy_dtheta = jax.jacobian(f, argnums=2)(A, B, theta)
The derivatives are all zero, and I believe this is correct, because the change in value of the outputs does not depend on the change in value of theta.
But, you might ask, how can this be? After all, theta enters into the computation, and if you put in a different value for theta, you get different outputs. How could the gradient be zero?
What you must keep in mind, though, is that differentiation doesn't measure whether an input affects an output. It measures the change in output given an infinitesimal change in input.
Let's use a slightly simpler function as an example:
import jax
import jax.numpy as jnp
A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])
def f(A, theta):
return A[jnp.argmax(theta)]
x = f(A, theta)
dx_dtheta = jax.grad(f, argnums=1)(A, theta)
Here the result of differentiating f with respect to theta is all zero, for the same reasons as above. Why? If you make an infinitesimal change to theta, it will in general not affect the sort order of theta. Thus, the entries you choose from A do not change given an infinitesimal change in theta, and thus the derivative with respect to theta is zero.
Now, you might argue that there are circumstances where this is not the case: for example, if two values in theta are very close together, then certainly perturbing one even infinitesimally could change their respective rank. This is true, but the gradient resulting from this procedure is undefined (the change in output is not smooth with respect to the change in input). The good news is this discontinuity is one-sided: if you perturb in the other direction, there is no change in rank and the gradient is well-defined. In order to avoid undefined gradients, most autodiff systems will implicitly use this safer definition of a derivative for rank-based computations.
The result is that the value of the output does not change when you infinitesimally perturb the input, which is another way of saying the gradient is zero. And this is not a failure of autodiff – it is the correct gradient given the definition of differentiation that autodiff is built on. Moreover, were you to try changing to a different definition of the derivative at these discontinuities, the best you could hope for would be undefined outputs, so the definition that results in zeros is arguably more useful and correct.

How to implement an equation with "diag" and circled-times symbol (⊗) in Python

The above equation is found in this paper (https://openreview.net/pdf?id=Sk_P2Q9sG) as equation #4.
I'm really confused as to how this gets converted into the following code:
epistemic = np.mean(p_hat**2, axis=0) - np.mean(p_hat, axis=0)**2
aleatoric = np.mean(p_hat*(1-p_hat), axis=0)
I think I may be confused due to the symbols in the equation though, for instance "diag" and the circle with a cross in it. How is the diag and circle represented as such in Python?
Thanks.
The linked article explains below eq. (2) the definition of this "outer square" operation:
v⊗2 = v vT.
In the field of machine learning, vectors are to be interpreted as column vectors. In numpy, if v is a vector of shape (n,), you would write:
v.reshape(-1, 1) * v
The result is a shape (n, n) array. If you want to stay closer to the notation with column and row vectors, you could also write:
v_column = v.reshape(-1, 1)
result = v_column # v_column.T
The function diag is the same as np.diag: you feed it a vector and you get a diagonal matrix.
How the Python snippet in your question implements the equations from the paper is hard to tell without information about what the shape of p_hat is and which axis represents t in the equation. In the most plausible definition, p_hat would have shape (T, n), in which case np.mean(p_hat**2, axis=0) would return shape (n,), which is not consistent with the equation that you quoted from the paper that should result in an (n, n) shape.
Given that the epistemic and aleatoric uncertainties in the paper seem to be scalars rather than vectors, I suspect that the authors made an error in the definition of the ⊗2 exponent: they should have written
v⊗2 = vT v
which translates to Python (on shape (n,) arrays) as
np.sum(v**2)

Implementation of Isotropic squared exponential kernel with numpy

I've come across a from scratch implementation for gaussian processes:
http://krasserm.github.io/2018/03/19/gaussian-processes/
There, the isotropic squared exponential kernel is implemented in numpy. It looks like:
The implementation is:
def kernel(X1, X2, l=1.0, sigma_f=1.0):
sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)
consistent with the implementation of Nando de Freitas: https://www.cs.ubc.ca/~nando/540-2013/lectures/gp.py
However, I am not quite sure how this implementation matches the provided formula, especially in the sqdist part. In my opinion, it is wrong but it works (and delivers the same results as scipy's cdist with squared euclidean distance). Why do I think it is wrong? If you multiply out the multiplication of the two matrices, you get
which equals to either a scalar or a nxn matrix for a vector x_i, depending on whether you define x_i to be a column vector or not. The implementation however gives back a nx1 vector with the squared values.
I hope that anyone can shed light on this.
I found out: The implementation is correct. I just was not aware of the fuzzy notation (in my opinion) which is sometimes used in ML contexts. What is to be achieved is a distance matrix and each row vectors of matrix A are to be compared with the row vectors of matrix B to infer the covariance matrix, not (as I somehow guessed) the direct distance between two matrices/vectors.

Using vectorize to apply function to each row in Numpy 2d array

I have a 1000x784 matrix of data (10000 examples and 784 features) called X_valid and I'd like to apply the following function to each row in this matrix and get the numerical result:
def predict_prob(x_valid, cov, mean, prior):
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(
np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(
prior)
(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:
v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])
(covariance[num], means[num], and priors[num] are just constants.)
However, I get the following error when running this:
File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'
That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).
How can I alter this to get the desired behavior?
vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.
I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
You describe x_valid as 2d that you want to evaluate row by row. And the other terms as 'constants' which you select with [num]. What shape are those constants?
You function treats a lot of these terms as 2d arrays:
x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) +
mean.T.dot(np.linalg.inv(cov)).dot(mean) +
np.linalg.slogdet(cov)[1]) + np.log(prior)
x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does noting.
np.linalg.inv(cov) only makes sense if cov is 2d.
mean.T.dot... assumes mean is 2d.
np.linalg.slogdet(cov)[1] assumes np.linalg.slogdet(cov) has 2 or more elements (or rows).
You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.
I suggest just using a for loop:
def v_predict_prob(X_valid, c, m, p):
out = []
for row in X_valid:
out.append(predict_prob(row, c, m, p))
return np.array(out)
Under the hood np.vectorize is doing the same thing: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html
I know this question is a bit outdated, but I thought I would provide an answer for 2020.
Since the release of numpy 1.12, there is a new optional argument, "signature", which should allow 2D array functionality in most cases. Additionally, you will want to "exclude" the constants since they will not be vectorized.
All you would need to change is:
v_predict_prob = np.vectorize(predict_prob, exclude=['cov', 'mean', 'prior'], signature='(n)->()')
This signifies that the function should expect an n-dim array and output a scalar, and cov, mean, and prior will not be vectorized.

Faster way to perform point-wise interplation of numpy array?

I have a 3D datacube, with two spatial dimensions and the third being a multi-band spectrum at each point of the 2D image.
H[x, y, bands]
Given a wavelength (or band number), I would like to extract the 2D image corresponding to that wavelength. This would be simply an array slice like H[:,:,bnd]. Similarly, given a spatial location (i,j) the spectrum at that location is H[i,j].
I would also like to 'smooth' the image spectrally, to counter low-light noise in the spectra. That is for band bnd, I choose a window of size wind and fit a n-degree polynomial to the spectrum in that window. With polyfit and polyval I can find the fitted spectral value at that point for band bnd.
Now, if I want the whole image of bnd from the fitted value, then I have to perform this windowed-fitting at each (i,j) of the image. I also want the 2nd-derivative image of bnd, that is, the value of the 2nd-derivative of the fitted spectrum at each point.
Running over the points, I could polyfit-polyval-polyder each of the x*y spectra. While this works, this is a point-wise operation. Is there some pytho-numponic way to do this faster?
If you do least-squares polynomial fitting to points (x+dx[i],y[i]) for a fixed set of dx and then evaluate the resulting polynomial at x, the result is a (fixed) linear combination of the y[i]. The same is true for the derivatives of the polynomial. So you just need a linear combination of the slices. Look up "Savitzky-Golay filters".
EDITED to add a brief example of how S-G filters work. I haven't checked any of the details and you should therefore not rely on it to be correct.
So, suppose you take a filter of width 5 and degree 2. That is, for each band (ignoring, for the moment, ones at the start and end) we'll take that one and the two on either side, fit a quadratic curve, and look at its value in the middle.
So, if f(x) ~= ax^2+bx+c and f(-2),f(-1),f(0),f(1),f(2) = p,q,r,s,t then we want 4a-2b+c ~= p, a-b+c ~= q, etc. Least-squares fitting means minimizing (4a-2b+c-p)^2 + (a-b+c-q)^2 + (c-r)^2 + (a+b+c-s)^2 + (4a+2b+c-t)^2, which means (taking partial derivatives w.r.t. a,b,c):
4(4a-2b+c-p)+(a-b+c-q)+(a+b+c-s)+4(4a+2b+c-t)=0
-2(4a-2b+c-p)-(a-b+c-q)+(a+b+c-s)+2(4a+2b+c-t)=0
(4a-2b+c-p)+(a-b+c-q)+(c-r)+(a+b+c-s)+(4a+2b+c-t)=0
or, simplifying,
22a+10c = 4p+q+s+4t
10b = -2p-q+s+2t
10a+5c = p+q+r+s+t
so a,b,c = p-q/2-r-s/2+t, (2(t-p)+(s-q))/10, (p+q+r+s+t)/5-(2p-q-2r-s+2t).
And of course c is the value of the fitted polynomial at 0, and therefore is the smoothed value we want. So for each spatial position, we have a vector of input spectral data, from which we compute the smoothed spectral data by multiplying by a matrix whose rows (apart from the first and last couple) look like [0 ... 0 -9/5 4/5 11/5 4/5 -9/5 0 ... 0], with the central 11/5 on the main diagonal of the matrix.
So you could do a matrix multiplication for each spatial position; but since it's the same matrix everywhere you can do it with a single call to tensordot. So if S contains the matrix I just described (er, wait, no, the transpose of the matrix I just described) and A is your 3-dimensional data cube, your spectrally-smoothed data cube would be numpy.tensordot(A,S).
This would be a good point at which to repeat my warning: I haven't checked any of the details in the few paragraphs above, which are just meant to give an indication of how it all works and why you can do the whole thing in a single linear-algebra operation.