R-Squared of alternative model - pandas

In order to reduce the influence of outliers and obtain a more robust regression, I've applied a winsorization technique to modify the values of a series ('x'). I then regress these values against series 'y'.
The R-squared of this model is naturally much higher, but I'm not making the right comparison.
How do I use scipy or statsmodels to obtain the R-squared of the original data using the beta estimates from the winsorized model?

You need to calculate it yourself, essentially by replicating the formula for rsquared.
For example:
>>> import numpy as np
>>> from statsmodels.api import OLS
>>> res_tmp = OLS(np.random.randn(100), np.column_stack((np.ones(100), np.random.randn(100, 2)))).fit()
>>> y_orig = res_tmp.model.endog
>>> res_tmp.rsquared
0.022009069788207714
>>> (1 - ((y_orig - res_tmp.fittedvalues)**2).sum() / ((y_orig - y_orig.mean())**2).sum())
0.022009069788207714
The last expression would apply to your case if res_tmp.fittedvalues are the predicted or fitted values of your winsorized model, and y_orig is your original unchanged response variable. This definition of R squared applies if there is a constant in the model.
Note: The most common naming for the linear model is y = X b, where y is the response variable and X holds the explanatory variables. If I understand correctly, you reversed the labeling in your question.
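A minimal sketch of the calculation described above, using NumPy only. The simulated data, the 5th/95th percentile clipping used as the winsorization step, and the helper name `r_squared` are all assumptions made for illustration; they are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
y[:5] += 25.0  # inject a few outliers into the response (illustrative)

# Winsorize the response at the 5th and 95th percentiles (assumed cutoffs).
lower, upper = np.percentile(y, [5, 95])
y_wins = np.clip(y, lower, upper)

# Fit OLS with a constant on the winsorized response.
X = np.column_stack((np.ones(n), x))
beta, *_ = np.linalg.lstsq(X, y_wins, rcond=None)
fitted = X @ beta

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, the definition used when the model has a constant."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

r2_wins = r_squared(y_wins, fitted)  # R-squared against the winsorized data
r2_orig = r_squared(y, fitted)       # same betas, evaluated on the original data
```

Because the betas were not estimated on the original data, `r2_orig` is not guaranteed to lie in [0, 1]; it can be negative if the winsorized fit predicts the original response worse than its mean does.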


Automatic Differentiation with respect to rank-based computations

I'm new to automatic differentiation programming, so this may be a naive question. Below is a simplified version of what I'm trying to solve.
I have two input arrays, a vector A of size N and a matrix B of shape (N, M), as well as a parameter vector theta of size M. I define a new array C(theta) = B * theta to get a new vector of size N. I then obtain the indices of elements that fall in the upper and lower quartile of C, and use them to create a new array A_low(theta) = A[lower quartile indices of C] and A_high(theta) = A[upper quartile indices of C]. Clearly these two do depend on theta, but is it possible to differentiate A_low and A_high w.r.t. theta?
My attempts so far seem to suggest no: I have tried the Python libraries autograd, JAX and TensorFlow, but they all return a gradient of zero. (The approaches I have tried so far involve using argsort or extracting the relevant sub-arrays using tf.top_k.)
What I'm seeking help with is either a proof that the derivative is not defined (or cannot be analytically computed) or if it does exist, a suggestion on how to estimate it. My eventual goal is to minimize some function f(A_low, A_high) wrt theta.
This is the JAX computation that I wrote based on your description:
import numpy as np
import jax.numpy as jnp
import jax
from jax import lax
N = 10
M = 20
rng = np.random.default_rng(0)
A = jnp.array(rng.random((N,)))
B = jnp.array(rng.random((N, M)))
theta = jnp.array(rng.random(M))
def f(A, B, theta, k=3):
    C = B @ theta  # matrix-vector product, shape (N,)
    _, i_upper = lax.top_k(C, k)   # indices of the k largest entries of C
    _, i_lower = lax.top_k(-C, k)  # indices of the k smallest entries of C
    return A[i_lower], A[i_upper]
x, y = f(A, B, theta)
dx_dtheta, dy_dtheta = jax.jacobian(f, argnums=2)(A, B, theta)
The derivatives are all zero, and I believe this is correct, because the change in value of the outputs does not depend on the change in value of theta.
But, you might ask, how can this be? After all, theta enters into the computation, and if you put in a different value for theta, you get different outputs. How could the gradient be zero?
What you must keep in mind, though, is that differentiation doesn't measure whether an input affects an output. It measures the change in output given an infinitesimal change in input.
Let's use a slightly simpler function as an example:
import jax
import jax.numpy as jnp
A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])
def f(A, theta):
    return A[jnp.argmax(theta)]
x = f(A, theta)
dx_dtheta = jax.grad(f, argnums=1)(A, theta)
Here the result of differentiating f with respect to theta is all zero, for the same reasons as above. Why? If you make an infinitesimal change to theta, it will in general not affect the sort order of theta. Thus, the entries you choose from A do not change given an infinitesimal change in theta, and thus the derivative with respect to theta is zero.
Now, you might argue that there are circumstances where this is not the case: for example, if two values in theta are very close together, then certainly perturbing one even infinitesimally could change their respective rank. This is true, but the gradient resulting from this procedure is undefined (the change in output is not smooth with respect to the change in input). The good news is this discontinuity is one-sided: if you perturb in the other direction, there is no change in rank and the gradient is well-defined. In order to avoid undefined gradients, most autodiff systems will implicitly use this safer definition of a derivative for rank-based computations.
The result is that the value of the output does not change when you infinitesimally perturb the input, which is another way of saying the gradient is zero. And this is not a failure of autodiff – it is the correct gradient given the definition of differentiation that autodiff is built on. Moreover, were you to try changing to a different definition of the derivative at these discontinuities, the best you could hope for would be undefined outputs, so the definition that results in zeros is arguably more useful and correct.
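The argument above can also be checked numerically without an autodiff library. The sketch below reuses the small argmax example in plain NumPy; the helper `finite_difference_grad` and the step size `eps` are illustrative assumptions, not part of any library.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
theta = np.array([5.0, 1.0, 3.0])

def f(A, theta):
    # Select the entry of A at the position of the largest theta.
    return A[np.argmax(theta)]

def finite_difference_grad(theta, eps=1e-6):
    """Central-difference estimate of df/dtheta (hypothetical helper)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(A, theta + step) - f(A, theta - step)) / (2 * eps)
    return grad

# A tiny perturbation does not change the argmax, so every estimate is zero.
fd_grad = finite_difference_grad(theta)

# A large change in theta does change the selected entry of A.
theta_big = theta.copy()
theta_big[2] = 10.0
before, after = f(A, theta), f(A, theta_big)
```

The output is piecewise constant in theta: it jumps when the ordering changes and is flat everywhere else, which is why the derivative is zero wherever it is defined.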

pymc python change point detection for small probabilities. ZeroProbability Error

I am trying to use pymc to find a change point in a time-series. The value I am looking at over time is probability to "convert" which is very small, 0.009 on average with a range of 0.001-0.016.
I give the two probabilities a uniform distribution as a prior between zero and the max observation.
alpha = df.cnvrs.max() # Set upper uniform
center_1_c = pm.Uniform("center_1_c", 0, alpha)
center_2_c = pm.Uniform("center_2_c", 0, alpha)
day_c = pm.DiscreteUniform("day_c", lower=1, upper=n_days)
@pm.deterministic
def lambda_(day_c=day_c, center_1_c=center_1_c, center_2_c=center_2_c):
    out = np.zeros(n_days)
    out[:day_c] = center_1_c
    out[day_c:] = center_2_c
    return out
observation = pm.Uniform("obs", lambda_, value=df.cnvrs.values, observed=True)
When I run this code I get:
ZeroProbability: Stochastic obs's value is outside its support,
or it forbids its parents' current values.
I'm pretty new to pymc so not sure if I'm missing something obvious. My guess is I might not have appropriate distributions for modelling small probabilities.
It's impossible to tell where you've introduced this bug without more of your output (and programming is off-topic here, in any case). But there is a statistical issue here: you've somehow constructed a model that cannot produce either the observed variables or the current sample of latent ones.
To give a simple example, say you have a dataset with negative values, and you've assumed it to be gamma distributed; this will produce an error, because the data has zero probability under a gamma. Similarly, an error will be thrown if an impossible value is sampled during an MCMC chain.
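To illustrate the point with a uniform likelihood instead of a gamma, here is a minimal NumPy sketch. The function `uniform_logpdf` and the sample values are assumptions made for illustration; this is not PyMC code and does not reproduce the model in the question.

```python
import numpy as np

def uniform_logpdf(value, lower, upper):
    """Log-density of Uniform(lower, upper); -inf outside the support."""
    value = np.asarray(value, dtype=float)
    inside = (value >= lower) & (value <= upper)
    return np.where(inside, -np.log(upper - lower), -np.inf)

# Illustrative conversion rates in the range mentioned in the question.
obs = np.array([0.001, 0.009, 0.016])

# If the upper bound is below the largest observation, that observation has
# zero probability and the total log-likelihood is -inf.
loglike_tight = uniform_logpdf(obs, 0.0, 0.010).sum()

# With a support that covers all observations the log-likelihood is finite.
loglike_wide = uniform_logpdf(obs, 0.0, 0.016).sum()
```

A sampler that encounters a log-likelihood of -inf for the observed data, or for a proposed latent value, has no valid state to work from, which is the situation the error message describes.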

Tensorflow: What exact formula is applied in `tf.nn.sparse_softmax_cross_entropy_with_logits`?

I tried to manually recompute the outputs of this function so I created a minimal example:
logits = tf.pack(np.array([[[[0,1,2]]]],dtype=np.float32)) # img of shape (1, 1, 1, 3)
labels = tf.pack(np.array([[[1]]],dtype=np.int32)) # gt of shape (1, 1, 1)
softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits,labels)
softmaxCrossEntropie.eval() # --> output is [1.41]
Now according to my own calculation I only get [1.23]
When manually calculating, I'm simply applying softmax
$$\sigma(x_j) = \frac{e^{x_j}}{\sum_k e^{x_k}}$$
and cross-entropy:
$$H(p, q) = -\sum_x p(x) \log q(x)$$
where q(x) = sigma(x_j) or (1 - sigma(x_j)), depending on whether j is the correct ground-truth class or not, and p(x) = labels, which are then one-hot encoded.
I'm not sure where the difference might originate from. I cannot really imagine that some epsilon causes such a big difference. Does someone know where I can lookup, which exact formula is used by tensorflow?
Is the source code of that exact part available?
I could only find nn_ops.py, but it only uses another function called gen_nn_ops._sparse_softmax_cross_entropy_with_logits which I couldn't find on github...
Well, usually p(x) in cross-entropy equation is true distribution, while q(x) is the distribution obtained from softmax. So, if p(x) is one-hot (and this is so, otherwise sparse cross-entropy could not be applied), cross entropy is just negative log for probability of true category.
In your example, softmax(logits) is a vector with values [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059 which is exactly what you got as output.
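The numbers above can be reproduced with a few lines of NumPy. This is a sketch of the formula only (softmax followed by the negative log-probability of the true class), not TensorFlow's actual implementation.

```python
import numpy as np

logits = np.array([0.0, 1.0, 2.0])
label = 1  # index of the true class

# Log-softmax, shifted by the maximum logit for numerical stability.
shifted = logits - logits.max()
log_probs = shifted - np.log(np.exp(shifted).sum())
probs = np.exp(log_probs)

# With a one-hot target, cross-entropy reduces to -log(probability of the true class).
loss = -log_probs[label]
```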

TensorFlow: Contracting a dimension of two tensors via dot product

I have two tensors, a of rank 4 and b of rank 1. I'd like to produce aprime, of rank 3, by "contracting" the last axis of a away, by replacing it with its dot product against b. In numpy, this is as easy as np.tensordot(a, b, 1). However, I can't figure out a way to do this in Tensorflow.
How can I replace the last axis of a tensor with a value equal to that axis's dot product against another tensor (of course, of the same shape)?
UPDATE:
I see in Wikipedia that this is called the "Tensor Inner Product" https://en.wikipedia.org/wiki/Dot_product#Tensors aka tensor contraction. It seems like this is a common operation, I'm surprised that there's no explicit support for it in Tensorflow.
I believe that this may be possible via tf.einsum; however, I have not been able to find a generalized way to do this that works for tensors of any rank (this is probably because I do not understand einsum and have been reduced to trial and error)
Aren't you just using tensor in the sense of a multidimensional array? Or in some disciplines a tensor is 3d (vector 1d, matrix 2d, etc). I haven't used tensorflow but I don't think it has much to do with tensors in that linear algebra sense. They talk about data flow graphs. I'm not sure where the tensor part of the name comes from.
I assume you are talking about an expression like:
In [293]: A=np.tensordot(np.ones((5,4,3,2)),np.arange(2),1)
resulting in a (5,4,3) shape array. The einsum equivalent is
In [294]: B=np.einsum('ijkl,l->ijk',np.ones((5,4,3,2)),np.arange(2))
np.einsum implements Einstein notation, as discussed here: https://en.wikipedia.org/wiki/Einstein_notation. I got this link from https://en.wikipedia.org/wiki/Tensor_contraction
You seem to be talking about straightforward numpy operations, not something special in tensorflow.
I would first add 3 dimensions of size 1 to b so that it can be broadcast along the 4th dimension of a.
b = tf.reshape(b, (1, 1, 1, -1))
Then you can multiply b and a and it will broadcast b along all of the other dimensions.
a_prime = a * b
Finally, sum along the 4th dimension to get rid of that dimension and replace it with the dot product.
a_prime = tf.reduce_sum(a_prime, [3])
This seems like it would work (for the first tensor being of any rank):
tf.einsum('...i,i->...', x, y)
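As a quick check that the three approaches in this thread compute the same contraction, here is a NumPy sketch. The shapes and random inputs are arbitrary choices for illustration, and NumPy stands in for TensorFlow here.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 4, 3, 2))  # rank-4 array
b = rng.random(2)             # rank-1 array matching the last axis of a

# 1. tensordot over one axis: last axis of a against the first axis of b.
via_tensordot = np.tensordot(a, b, 1)

# 2. einsum with an ellipsis, so it works for any rank of a.
via_einsum = np.einsum('...i,i->...', a, b)

# 3. Broadcast multiply, then sum over the last axis.
via_broadcast = (a * b.reshape(1, 1, 1, -1)).sum(axis=3)
```

All three results have shape (5, 4, 3). The ellipsis form is the only one that does not hard-code the rank of the first argument.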

Tensorflow Linear Regression: Getting values for Adjusted R Square, Coefficients, P-value

There are a few key statistics associated with linear regression, e.g. adjusted R-squared, coefficients, p-values, R-squared, multiple R, etc. When using the Google TensorFlow API to implement linear regression, how are these statistics mapped? Is there any way to get their values during or after model execution?
From my experience, if you want to have these values while your model runs then you have to hand code them using tensorflow functions. If you want them after the model has run you can use scipy or other implementations. Below are some examples of how you might go about coding R^2, MAPE, RMSE...
total_error = tf.reduce_sum(tf.square(tf.sub(y, tf.reduce_mean(y))))
unexplained_error = tf.reduce_sum(tf.square(tf.sub(y, prediction)))
R_squared = tf.sub(tf.div(total_error, unexplained_error),1.0)
R = tf.mul(tf.sign(R_squared),tf.sqrt(tf.abs(unexplained_error)))
MAPE = tf.reduce_mean(tf.abs(tf.div(tf.sub(y, prediction), y)))
RMSE = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(y, prediction))))
I believe the formula for R2 should be the following. Note that it would go negative when the network is so bad that it does a worse job than the mere average as a predictor:
total_error = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
unexplained_error = tf.reduce_sum(tf.square(tf.subtract(y, pred)))
R_squared = tf.subtract(1.0, tf.divide(unexplained_error, total_error))
Adjusted_R_squared = 1 - ((1 - R_squared) * (n - 1) / (n - k - 1))
where n is the number of observations and k is the number of features.
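A small NumPy sketch of the R-squared and adjusted R-squared formulas, with made-up numbers so the arithmetic can be followed by hand. The data, the function names and the choice k = 1 are assumptions for illustration.

```python
import numpy as np

def r_squared(y, pred):
    """R^2 = 1 - unexplained_error / total_error."""
    unexplained_error = np.sum((y - pred) ** 2)
    total_error = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - unexplained_error / total_error

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k features."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

r2 = r_squared(y, pred)                        # 1 - 0.10 / 10 = 0.99
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)

# Predicting the mean everywhere gives R^2 = 0; anything worse goes negative.
r2_mean = r_squared(y, np.full_like(y, y.mean()))
r2_bad = r_squared(y, y[::-1])
```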
You should not use a formula for R Squared. This exists in Tensorflow Addons. You will only need to extend it to Adjusted R Squared.
I would strongly recommend against using a recipe to calculate r-squared itself! The examples I've found do not produce consistent results, especially with just one target variable. This gave me enormous headaches!
The correct thing to do is to use tensorflow_addons.metrics.RSquare(). Tensorflow Add Ons is on PyPI here and the documentation is a part of Tensorflow here. All you have to do is set y_shape to the shape of your output; often it is (1,) for a single output variable.
Then you can use what RSquare() returns in your own metric that handles the adjustments.