Why does numpy and pytorch give different results after mean and variance normalization? - numpy

I am working on a problem in which a matrix has to be mean-var normalized row-wise. It is also required that the normalization is applied after splitting each row into tiny batches.
The code seem to work for Numpy, but fails with Pytorch (which is required for training).
It seems Pytorch and Numpy results differ. Any help will be greatly appreciated.
Example code:
import numpy as np
import torch
def normalize(x, bsize, eps=1e-6):
nc = x.shape[1]
if nc % bsize != 0:
raise Exception(f'Number of columns must be a multiple of bsize')
x = x.reshape(-1, bsize)
m = x.mean(1).reshape(-1, 1)
s = x.std(1).reshape(-1, 1)
n = (x - m) / (eps + s)
n = n.reshape(-1, nc)
return n
# numpy
a = np.float32(np.random.randn(8, 8))
n1 = normalize(a, 4)
# torch
b = torch.tensor(a)
n2 = normalize(b, 4)
n2 = n2.numpy()
print(abs(n1-n2).max())

In the first example you are calling normalize with a, a numpy.ndarray, while in the second you call normalize with b, a torch.Tensor.
According to the documentation page of torch.std, Bessel’s correction is used by default to measure the standard deviation. As such the default behavior between numpy.ndarray.std and torch.Tensor.std is different.
If unbiased is True, Bessel’s correction will be used. Otherwise, the sample deviation is calculated, without any correction.
torch.std(input, dim, unbiased, keepdim=False, *, out=None) → Tensor
Parameters
input (Tensor) – the input tensor.
unbiased (bool) – whether to use Bessel’s correction (δN = 1).
You can try yourself:
>>> a.std(), b.std(unbiased=True), b.std(unbiased=False)
(0.8364538, tensor(0.8942), tensor(0.8365))

Related

Vectorizing ARD (Automatic Relevance Determination) kernel implementation in Gaussian processes

I am trying to implement an ARD kernel with NumPy as given in the GPML book (M3 from Equation 5.2).
I am struggling in vectorizing this equation for NxM kernel computation. I have tried the following non-vectorized version. Can someone help in vectorizing this in NumPy/PyTorch?
import numpy as np
N = 30 # Number of data points in X1
M = 40 # Number of data points in X2
D = 6 # Number of features (ARD dimensions)
X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()
K = np.empty((N, M))
for n in range(N):
for m in range(M):
M3 = Lambda#Lambda.T + L_inv**2
d = (X1[n,:] - X2[m,:]).reshape(-1,1)
K[n, m] = sigma_f**2 * np.exp(-0.5 * d.T#M3#d)
We can use the rules of broadcasting and the neat NumPy function einsum to vectorize array operations. In few words, broadcasting allows us to operate with arrays in one-liners by adding new dimensions to the resulting array, while einsum allows us to perform operations with multiple arrays by explicitly working in the index notation (instead of matrices).
Luckily, no loops are necessary to calculate your kernel. Please see below the vectorized solution, ARD_kernel function, which is about 30x faster in my machine than the original loopy version. Now, einsum is usually as fast as it gets, but it's possible that there are faster methods though, I've not checked anything else (e.g. usual # operator instead of einsum).
Also, there is a missing term in the code (the Kronecker delta), I don't know if it was omitted in purpose (let me know if you have problems implementing it and I'll edit the answer).
import numpy as np
N = 300 # Number of data points in X1
M = 400 # Number of data points in X2
D = 6 # Number of features (ARD dimensions)
np.random.seed(1) # Fix random seed for reproducibility
X1 = np.random.rand(N, D)
X2 = np.random.rand(M, D)
Lambda = np.random.rand(D, 1)
L_inv = np.diag(np.random.rand(D))
sigma_f = np.random.rand()
# Loopy function
def ARD_kernel_loops(X1, X2, Lambda, L_inv, sigma_f):
K = np.empty((N, M))
M3 = Lambda#Lambda.T + L_inv**2
for n in range(N):
for m in range(M):
d = (X1[n,:] - X2[m,:]).reshape(-1,1)
K[n, m] = np.exp(-0.5 * d.T#M3#d)
return K * sigma_f**2
# Vectorized function
def ARD_kernel(X1, X2, Lambda, L_inv, sigma_f):
M3 = Lambda.squeeze()*Lambda + L_inv**2 # Use broadcasting to avoid transpose
d = X1[:,None] - X2[None,...] # Use broadcasting to avoid loops
# order=F for memory layout (as your arrays are (N,M,D) instead of (D,N,M))
return sigma_f**2 * np.exp(-0.5 * np.einsum("ijk,kl,ijl->ij", d, M3, d, order = 'F'))
There is perhaps an additional optimisation. The examples of the M matrices given are all positive definite. This means that the Cholesky decomposition can be applied, wo that we can find upper triangular U so that
M = U'*U
The point of this is that if we apply U to the xs, so
y[p] = U*x[p] p=1..
Then
(x[p]-x[q])'*M*(x[p]-x[q]) = (y[p]-y[q])'*(y[p]-y[q])
Thus if there are N vectors x each of dimension d,
we convert the N squared O(d squared) operations on the LHS to N squared O(d) operations on the RHS
This has cost an extra choleski decompositon (O(d cubed))
and N O( d squared) applications of U to the xs.

Efficient way to calculate the pairwise matrix product between one tensor and all the rolling of another tensor

Suppose we have two tensors:
tensor A whose shape is (d,m,n)
tensor B whose shape is (d,n,l).
If we want to get the pairwise matrix product of the right-most matrix of A and B, I think we can use np.einsum('dmn,...nl->d...ml',A,B) whose size is (d,d,m,l). However, I would like to get the pairwise product of not all the pairs.
Import a parameter k, 1<=k<=d, I want to get the following pairwise matrix product:
from
A(0,...)#B(0,...)
to
A(0,...)#B(k-1,...)
;
from
A(1,...)#B(1,...)
to
A(1,...)#B(k,...)
;
....
;
from
A(d-2,...)#B(d-2,...),
A(d-2,...)#B(d-1,...)
to
A(d-2,...)#B(k-3,...)
;
from
A(d-1,...)#B(d-1,...)
to
A(d-1,...)#B(k-2,...)
.
Note here we we use a rolling way to deal with tensor B. (like numpy.roll).
Finally, we actually get a tensor whose shape is (d,k,m,l).
What's the most efficient way to do this.
I know several ways like:
First get np.einsum('dmn,...nl->d...ml',A,B), then use a mask to extract the (d,k) pairs.
tile B first, then use einsum in some way.
But I think there exists a better way.
I doubt you can do much better than a for loop. Here is, for example, a vectorized version using einsum and stride_tricks compared to a double for loop:
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from numpy.lib.stride_tricks import as_strided
B = BenchmarkBuilder()
#B.add_function()
def loopy(A,B,k):
d,m,n = A.shape
l = B.shape[-1]
out = np.empty((d,k,m,l),int)
for i in range(d):
for j in range(k):
out[i,j] = A[i]#B[(i+j)%d]
return out
#B.add_function()
def vectory(A,B,k):
d,m,n = A.shape
l = B.shape[-1]
BB = np.concatenate([B,B[:k-1]],0)
BB = as_strided(BB,(d,k,n,l),np.repeat(BB.strides,(2,1,1)))
return np.einsum("ikl,ijln->ijkn",A,BB)
#B.add_arguments('d x k x m x n x l')
def argument_provider():
for exp in range(10):
d,k,m,n,l = (np.r_[1.6,1.5,1.5,1.5,1.5]**exp*(4,2,2,2,2)).astype(int)
print(d,k,m,n,l)
A = np.random.randint(0,10,(d,m,n))
B = np.random.randint(0,10,(d,n,l))
yield k*d*m*n*l,MultiArgument([A,B,k])
r = B.run()
r.plot()
import pylab
pylab.savefig('diagwa.png')

Minimizing negative log-likelihood of logistic regression, scipy returning warning: "Desired error not necessarily achieved due to precision loss."

I'm trying to sort out why scipy optimize isn't converging on a solution for the minimum negative-log-likelihood of the logistic regression function (as implemented below).
It seems to converge for smaller data sets, but for the larger data sets scipy returns the warning: "Desired error not necessarily achieved due to precision loss."
I thought this was a well-behaved optimization problem, so I'm anxious that I'm missing an obvious mistake.
Can anyone spot a mistake in my implementation or make a suggestion that I might try?
I'm using the default method, but I have had little luck with the other various methods that miminize allows.
Many thanks!
Quick summary of the implementation. I'm minimizing the following statement:
with the caveat that since b is a constant, I'm using the exponent -(w*x + b). I think I've implemented that function correct, but maybe I'm not seeing something. Since the data are constants with respect to the function being minimized, I just output a function definition that retains the data within it; thus, the function to be minimized only accepts the weights.
The data is a pandas dataframe of the format: rows == samples, columns == attributes, but LAST column == label (0 or 1). I've transformed all the data to make sure it is continuous, and I've normalized it to have a mean of 0 and a standard deviation of 1. I'm also starting with random weights between [0, 0.1], treating the first weight as 'b'.
def get_optimization_func_call(data, sheepda):
#
# Extract pos/neg data without label
pos_df = data[data[LABEL] == 1].as_matrix()[:, :-1]
neg_df = data[data[LABEL] == 0].as_matrix()[:, :-1]
#
# Def evaluation of positive terms by row
def eval_pos_row(pos_row, w, b):
cur_exponent = np.dot(w, pos_row) + b
cur_val = expit(cur_exponent)
if cur_val == 0:
print("pos", cur_exponent)
return (-1 * np.log(cur_val))
#
# Def evaluation of positive terms by row
def eval_neg_row(neg_row, w, b):
cur_exponent = np.dot(w, neg_row) + b
cur_val = 1.0 - expit(cur_exponent)
if cur_val == 0:
print("neg", cur_exponent)
return (-1 * np.log(cur_val))
#
# Define the function used for optimization
def log_likelihood(weights):
#
# Separate weights
w = weights[1:]
b = weights[0]
#
# Ge the norm of weights
w_norm = np.dot(w, w)
#
# Sum over positive examples
pos_sum = np.sum(
np.apply_along_axis(eval_pos_row, 1, pos_df, w, b)
)
neg_sum = np.sum(
np.apply_along_axis(eval_neg_row, 1, neg_df, w, b)
)
#
return (0.5 * w_norm) + sheepda * (pos_sum + neg_sum)
return log_likelihood
w = uniform.rvs(size=20) / 10.0
LL = get_optimization_func_call(clean_test_data, 0.5)
res = minimize(LL, w, options={"maxiter": 1e4, "disp": True})

Sampling from posterior using custom likelihood in pymc3

I'm trying to create a custom likelihood using pymc3. The distribution is called Generalized Maximum Likelihood (GEV) which has the location (loc), scale (scale) and shape (c) parameters.
The main ideia is to choose a beta distribution as a prior to the scale parameter and fix the location and scale parameters in the GEV likelihood.
The GEV distribuition is not contained in the pymc3 standard distributions, so I have to create a custom likelihood. I googled it and found out that I should use the densitydist method but I don't know why it is incorrect.
See the code below:
import pymc3 as pm
import numpy as np
from theano.tensor import exp
data=np.random.randn(20)
with pm.Model() as model:
c=pm.Beta('c',alpha=6,beta=9)
loc=1
scale=2
gev=pm.DensityDist('gev', lambda value: exp(-1+c*(((value-loc)/scale)^(1/c))), testval=1)
modelo=pm.gev(loc=loc, scale=scale, c=c, observed=data)
step = pm.Metropolis()
trace = pm.sample(1000, step)
pm.traceplot(trace)
I'm sorry in advance if this is a dumb question, but I could'nt figure it out.
I'm studying annual maximum flows and I'm trying to implement the methodology described in "Generalized maximum-likelihood generalized extreme-value
quantile estimators for hydrologic data" written by Martins and Stedinger.
If you mean the generalized extreme value distribution (https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution), then something like this should work (for c != 0):
import pymc3 as pm
import numpy as np
import theano.tensor as tt
from pymc3.distributions.dist_math import bound
data = np.random.randn(20)
with pm.Model() as model:
c = pm.Beta('c', alpha=6, beta=9)
loc = 1
scale = 2
def gev_logp(value):
scaled = (value - loc) / scale
logp = -(scale
+ ((c + 1) / c) * tt.log1p(c * scaled)
+ (1 + c * scaled) ** (-1/c))
alpha = loc - scale / c
bounds = tt.switch(value > 0, value > alpha, value < alpha)
return bound(logp, bounds, c != 0)
gev = pm.DensityDist('gev', gev_logp, observed=data)
trace = pm.sample(2000, tune=1000, njobs=4)
pm.traceplot(trace)
Your logp function was invalid. Exponentiation is ** in python, and part of the expression wasn't valid for negative values.

Represent a first order differential equation in numpy

I have an equation dy/dx = x + y/5 and an initial value, y(0) = -3.
I would like to know how to plot the exact graph of this function using pyplot.
I also have a x = np.linspace(0, interval, steps+1) which I would like to use as the x axis. So I'm only looking for the y axis values.
Thanks in advance.
Just for completeness, this kind of equation can easily be integrated numerically, using scipy.integrate.odeint.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
# function dy/dx = x + y/5.
func = lambda y,x : x + y/5.
# Initial condition
y0 = -3 # at x=0
# values at which to compute the solution (needs to start at x=0)
x = np.linspace(0, 4, 101)
# solution
y = odeint(func, y0, x)
# plot the solution, note that y is a column vector
plt.plot(x, y[:,0])
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Given that you need to solve the d.e. you might prefer doing this algebraically, with sympy. (Or you might not.)
Import the module and define the function and the dependent variable.
>>> from sympy import *
>>> f = Function('f')
>>> var('x')
x
Invoke the solver. Note that all terms of the d.e. must be transposed to the left of the equals sign, and that the y must be replaced by the designator for the function.
>>> dsolve(Derivative(f(x),x)-x-f(x)/5)
Eq(f(x), (C1 + 5*(-x - 5)*exp(-x/5))*exp(x/5))
As you would expect, the solution is given in terms of an arbitrary constant. We must solve for that using the initial value. We define it as a sympy variable.
>>> var('C1')
C1
Now we create an expression to represent this arbitrary constant as the left side of an equation that we can solve. We replace f(0) with its value in the initial condition. Then we substitute the value of x in that condition to get an equation in C1.
>>> expr = -3 - ( (C1 + 5*(-x - 5)*exp(-x/5))*exp(x/5) )
>>> expr.subs(x,0)
-C1 + 22
In other words, C1 = 22. Finally, we can use this value to obtain the particular solution of the differential equation.
>>> ((C1 + 5*(-x - 5)*exp(-x/5))*exp(x/5)).subs(C1,22)
((-5*x - 25)*exp(-x/5) + 22)*exp(x/5)
Because I'm absentminded and ever fearful of making egregious mistakes I check that this function satisfies the initial condition.
>>> (((-5*x - 25)*exp(-x/5) + 22)*exp(x/5)).subs(x,0)
-3
(Usually things are incorrect only when I forget to check them. Such is life.)
And I can plot this in sympy too.
>>> plot(((-5*x - 25)*exp(-x/5) + 22)*exp(x/5),(x,-1,5))
<sympy.plotting.plot.Plot object at 0x0000000008C2F780>