Minimizing negative log-likelihood of logistic regression, scipy returning warning: "Desired error not necessarily achieved due to precision loss." - numpy

I'm trying to sort out why scipy optimize isn't converging on a solution for the minimum negative-log-likelihood of the logistic regression function (as implemented below).
It seems to converge for smaller data sets, but for the larger data sets scipy returns the warning: "Desired error not necessarily achieved due to precision loss."
I thought this was a well-behaved optimization problem, so I'm anxious that I'm missing an obvious mistake.
Can anyone spot a mistake in my implementation or make a suggestion that I might try?
I'm using the default method, but I have had little luck with the various other methods that minimize allows.
Many thanks!
Quick summary of the implementation. I'm minimizing the following statement:

L(w, b) = 0.5 * ||w||^2 + lambda * [ sum_{i : y_i = 1} -log(sigmoid(w . x_i + b)) + sum_{i : y_i = 0} -log(1 - sigmoid(w . x_i + b)) ]

with the caveat that since b is a constant, I'm using the exponent -(w*x + b). I think I've implemented that function correctly, but maybe I'm not seeing something. Since the data are constants with respect to the function being minimized, I return a function definition that retains the data within it (a closure); thus, the function to be minimized only accepts the weights.
The data is a pandas DataFrame of the format: rows == samples, columns == attributes, with the LAST column being the label (0 or 1). I've transformed all the data to make sure it is continuous, and I've normalized it to have a mean of 0 and a standard deviation of 1. I'm also starting with random weights in [0, 0.1], treating the first weight as 'b'.
import numpy as np
from scipy.special import expit
from scipy.stats import uniform
from scipy.optimize import minimize

def get_optimization_func_call(data, sheepda):
    #
    # Extract pos/neg data without label
    # (note: .as_matrix() is deprecated in newer pandas; .to_numpy() is the replacement)
    pos_df = data[data[LABEL] == 1].as_matrix()[:, :-1]
    neg_df = data[data[LABEL] == 0].as_matrix()[:, :-1]
    #
    # Def evaluation of positive terms by row
    def eval_pos_row(pos_row, w, b):
        cur_exponent = np.dot(w, pos_row) + b
        cur_val = expit(cur_exponent)
        if cur_val == 0:
            print("pos", cur_exponent)
        return (-1 * np.log(cur_val))
    #
    # Def evaluation of negative terms by row
    def eval_neg_row(neg_row, w, b):
        cur_exponent = np.dot(w, neg_row) + b
        cur_val = 1.0 - expit(cur_exponent)
        if cur_val == 0:
            print("neg", cur_exponent)
        return (-1 * np.log(cur_val))
    #
    # Define the function used for optimization
    def log_likelihood(weights):
        #
        # Separate weights
        w = weights[1:]
        b = weights[0]
        #
        # Get the squared norm of the weights
        w_norm = np.dot(w, w)
        #
        # Sum over positive and negative examples
        pos_sum = np.sum(
            np.apply_along_axis(eval_pos_row, 1, pos_df, w, b)
        )
        neg_sum = np.sum(
            np.apply_along_axis(eval_neg_row, 1, neg_df, w, b)
        )
        #
        return (0.5 * w_norm) + sheepda * (pos_sum + neg_sum)
    return log_likelihood

w = uniform.rvs(size=20) / 10.0
LL = get_optimization_func_call(clean_test_data, 0.5)
res = minimize(LL, w, options={"maxiter": 1e4, "disp": True})
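For reference, the precision-loss warning here often comes from log(expit(z)) underflowing to log(0) once z is large and negative (the prints above fire in exactly that case). A numerically stable way to evaluate the same objective (a sketch, not the original post's code) is np.logaddexp, using -log(sigmoid(z)) = log(1 + e^(-z)) and -log(1 - sigmoid(z)) = log(1 + e^z):

import numpy as np

def stable_log_likelihood(weights, pos_df, neg_df, sheepda):
    # -log(sigmoid(z))     = log(1 + exp(-z)) = np.logaddexp(0, -z)
    # -log(1 - sigmoid(z)) = log(1 + exp(z))  = np.logaddexp(0, z)
    w, b = weights[1:], weights[0]
    pos_sum = np.sum(np.logaddexp(0.0, -(pos_df @ w + b)))
    neg_sum = np.sum(np.logaddexp(0.0, neg_df @ w + b))
    return 0.5 * np.dot(w, w) + sheepda * (pos_sum + neg_sum)

Because expit is never formed explicitly, the log never sees an exact 0 or 1, which removes one common source of this warning.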

Related

Is nx.eigenvector_centrality_numpy() using the Arnoldi iteration instead of the basic power method?

Since nx.eigenvector_centrality_numpy() uses ARPACK, does that mean nx.eigenvector_centrality_numpy() uses the Arnoldi iteration instead of the basic power method?
I ask because when I try to compute it manually using the basic power method, the result of my computation is different from the result of nx.eigenvector_centrality_numpy(). Can someone explain it to me?
To make it more clear, here is my code and the result that I got from the function and the result when I compute manually.
import networkx as nx
G = nx.DiGraph()
G.add_edge('a', 'b', weight=4)
G.add_edge('b', 'a', weight=2)
G.add_edge('b', 'c', weight=2)
G.add_edge('b','d', weight=2)
G.add_edge('c','b', weight=2)
G.add_edge('d','b', weight=2)
centrality = nx.eigenvector_centrality_numpy(G, weight='weight')
centrality
The result:
{'a': 0.37796447300922725,
'b': 0.7559289460184545,
'c': 0.3779644730092272,
'd': 0.3779644730092272}
Below is code from a Power Method Python Program, to which I made a few small modifications:
# Power Method to Find Largest Eigen Value and Eigen Vector
# Importing NumPy Library
import numpy as np
import sys

# Reading order of matrix
n = int(input('Enter order of matrix: '))

# Making numpy array of n x n size and initializing
# to zero for storing matrix
a = np.zeros((n, n))

# Reading matrix
print('Enter Matrix Coefficients:')
for i in range(n):
    for j in range(n):
        a[i][j] = float(input('a[' + str(i) + '][' + str(j) + ']='))

# Making numpy array n x 1 size and initializing to zero
# for storing initial guess vector
x = np.zeros((n))

# Reading initial guess vector
print('Enter initial guess vector: ')
for i in range(n):
    x[i] = float(input('x[' + str(i) + ']='))

# Reading tolerable error
tolerable_error = float(input('Enter tolerable error: '))

# Reading maximum number of steps
max_iteration = int(input('Enter maximum number of steps: '))

# Power Method Implementation
lambda_old = 1.0
condition = True
step = 1
while condition:
    # Multiplying a and x
    ax = np.matmul(a, x)
    # Finding new Eigen value and Eigen vector
    x = ax / np.linalg.norm(ax)
    lambda_new = np.vdot(ax, x)
    # Displaying Eigen value and Eigen Vector
    print('\nSTEP %d' % (step))
    print('----------')
    print('Eigen Value = %0.5f' % (lambda_new))
    print('Eigen Vector: ')
    for i in range(n):
        print('%0.5f\t' % (x[i]))
    # Checking maximum iteration
    step = step + 1
    if step > max_iteration:
        print('Not convergent in given maximum iteration!')
        break
    # Calculating error
    error = abs(lambda_new - lambda_old)
    print('error=' + str(error))
    lambda_old = lambda_new
    condition = error > tolerable_error
I used the same matrix, and this was the result:
STEP 99
----------
Eigen Value = 3.70328
Eigen Vector:
0.51640
0.77460
0.25820
0.25820
error=0.6172133998483682
STEP 100
----------
Eigen Value = 4.32049
Eigen Vector:
0.71714
0.47809
0.35857
0.35857
Not convergent in given maximum iteration!
I tried to compute it with my calculator too, and I know it's not convergent because |lambda1| = |lambda2| = 4. I need to understand the theory behind nx.eigenvector_centrality_numpy() properly so I can write it up correctly for my thesis. Help me, please.
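For what it's worth, as far as I know nx.eigenvector_centrality_numpy() delegates to ARPACK (the implicitly restarted Arnoldi method) via scipy.sparse.linalg.eigs, asking for the eigenvalue with the largest real part. Your graph is bipartite ({a, c, d} on one side, {b} on the other), so its spectrum is symmetric: +4 and -4 both appear with magnitude 4. The basic power method needs a strictly dominant eigenvalue in magnitude, which is why your iterates oscillate between two directions (compare your steps 99 and 100), whereas selecting by largest real part still isolates +4. A dense eigendecomposition (a sketch, not the networkx internals) reproduces the centrality values:

import numpy as np
import networkx as nx

G = nx.DiGraph()
G.add_edge('a', 'b', weight=4)
G.add_edge('b', 'a', weight=2)
G.add_edge('b', 'c', weight=2)
G.add_edge('b', 'd', weight=2)
G.add_edge('c', 'b', weight=2)
G.add_edge('d', 'b', weight=2)

A = nx.to_numpy_array(G)               # adjacency matrix, ordered as list(G)
eigvals, eigvecs = np.linalg.eig(A.T)  # transpose: centrality via in-edges
i = np.argmax(eigvals.real)            # largest real part picks +4, not -4
v = np.abs(eigvecs[:, i].real)
v /= np.linalg.norm(v)                 # unit Euclidean norm, as networkx reports
print(dict(zip(G, v)))
# {'a': 0.3779..., 'b': 0.7559..., 'c': 0.3779..., 'd': 0.3779...}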

Why does numpy and pytorch give different results after mean and variance normalization?

I am working on a problem in which a matrix has to be mean-variance normalized row-wise. It is also required that the normalization is applied after splitting each row into tiny batches.
The code seems to work with NumPy, but fails with PyTorch (which is required for training).
It seems the PyTorch and NumPy results differ. Any help will be greatly appreciated.
Example code:
import numpy as np
import torch
def normalize(x, bsize, eps=1e-6):
    nc = x.shape[1]
    if nc % bsize != 0:
        raise Exception(f'Number of columns must be a multiple of bsize')
    x = x.reshape(-1, bsize)
    m = x.mean(1).reshape(-1, 1)
    s = x.std(1).reshape(-1, 1)
    n = (x - m) / (eps + s)
    n = n.reshape(-1, nc)
    return n
# numpy
a = np.float32(np.random.randn(8, 8))
n1 = normalize(a, 4)
# torch
b = torch.tensor(a)
n2 = normalize(b, 4)
n2 = n2.numpy()
print(abs(n1-n2).max())
In the first example you are calling normalize with a, a numpy.ndarray, while in the second you call normalize with b, a torch.Tensor.
According to the documentation page of torch.std, Bessel’s correction is used by default to measure the standard deviation. As such the default behavior between numpy.ndarray.std and torch.Tensor.std is different.
If unbiased is True, Bessel’s correction will be used. Otherwise, the sample deviation is calculated, without any correction.
torch.std(input, dim, unbiased, keepdim=False, *, out=None) → Tensor
Parameters
input (Tensor) – the input tensor.
unbiased (bool) – whether to use Bessel’s correction (δN = 1).
You can try it yourself:
>>> a.std(), b.std(unbiased=True), b.std(unbiased=False)
(0.8364538, tensor(0.8942), tensor(0.8365))
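A minimal fix (a sketch, assuming the population standard deviation is what you want) is to switch off Bessel's correction on the PyTorch side, so both backends compute the same ddof=0 statistic:

import numpy as np
import torch

def normalize(x, bsize, eps=1e-6):
    nc = x.shape[1]
    if nc % bsize != 0:
        raise ValueError('Number of columns must be a multiple of bsize')
    x = x.reshape(-1, bsize)
    m = x.mean(1).reshape(-1, 1)
    if isinstance(x, torch.Tensor):
        s = x.std(1, unbiased=False).reshape(-1, 1)  # match NumPy's ddof=0
    else:
        s = x.std(1).reshape(-1, 1)                  # NumPy default is ddof=0
    return ((x - m) / (eps + s)).reshape(-1, nc)

a = np.float32(np.random.randn(8, 8))
print(abs(normalize(a, 4) - normalize(torch.tensor(a), 4).numpy()).max())
# now ~1e-7, i.e. float32 round-off rather than a systematic gap

Alternatively, pass ddof=1 to NumPy's std if the sample (unbiased) deviation is what's intended.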

How to: TensorFlow-Probability custom loss that ignores NA values (or otherwise masks loss)

I seek to implement in TensorFlow-Probability a masked loss function, that can ignore NAs in the labels.
This is a well-worn task for regular tensors, but I cannot find an example for distributions.
My distributions are sized (batch, time-steps, outputs) (512, 251 days, 1 to 8 time series)
The traditional loss function given in examples is this one, using the distribution's log probability:
neg_log_likelihood <- function (x, rv_x) {
  -1 * (rv_x %>% tfd_log_prob(x))
}
When I replace NAs with zeros, the model trains fine and converges. When I leave in NAs it produces NaN losses as expected.
I've experimented with many different permutations of tf$where to replace loss with 0, the label with 0, etc. In each of those cases the model stops training and loss stays near some constant. That's the case even when there's just a single NA in the labels.
neg_log_likelihood_missing <- function (x, rv_x) {
  loss = -1 * (rv_x %>% tfd_log_prob(x))
  loss_nonan = tf$where(tf$math$is_finite(x), loss, 0)
  return(loss_nonan)
}
My use of R here is incidental, and I can translate any examples in Python or otherwise. If there's a correct way to do this so that losses correctly back-propagate, I would greatly appreciate it.
If you are using gradient based inference, you may need the "double where" trick.
While this gets you a correct value of y:
y = computation(x)
tf.where(is_nan(y), 0, y)
...the derivative of the tf.where can still have a nan.
Instead write:
safe_x = tf.where(is_unsafe(x), some_safe_x, x)
y = computation(safe_x)
tf.where(is_unsafe(x), 0, y)
...to get both a safe y out and a safe dy/dx.
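Here is a tiny runnable illustration in Python/TF2 (a sketch, with sqrt standing in for any computation that misbehaves on masked inputs):

import tensorflow as tf

x = tf.constant([4.0, -1.0])

# Naive masking: the forward value is fine, but the gradient is not,
# because d/dx sqrt(x) is evaluated at x = -1 and 0 * nan = nan.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.where(x < 0.0, tf.zeros_like(x), tf.sqrt(x))
print(tape.gradient(y, x))  # [0.25, nan]

# Double where: feed a safe input to the computation, then mask the output.
with tf.GradientTape() as tape:
    tape.watch(x)
    safe_x = tf.where(x < 0.0, tf.ones_like(x), x)
    y = tf.where(x < 0.0, tf.zeros_like(x), tf.sqrt(safe_x))
print(tape.gradient(y, x))  # [0.25, 0.0]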
For the case you're considering, perhaps write:
class MyMaskedDist(tfd.Distribution):
    ...
    def _log_prob(self, x):
        safe_x = tf.where(tf.math.is_nan(x), self.mode(), x)
        lp = compute_log_prob(safe_x)
        lp = tf.where(tf.math.is_nan(x), tf.zeros([], lp.dtype), lp)
        return lp

Calculate weighted statistical moments in Python

I've been looking for a function or package that would allow me to calculate the skew and kurtosis of a distribution in a weighted way, as I have histogram data.
For instance I have the data
import numpy as np
np.array([[1, 2],
          [2, 5],
          [3, 6],
          [4, 12],
          [5, 1]])
where the first column [1,2,3,4,5] are the values and the second column [2,5,6,12,1] are the frequencies of the values.
I have found out how to do the first two moments (mean, standard deviation) in a weighted way using the weighted_avg_and_std function specified in this thread, but I was not quite sure how I could extend this to both the skew and kurtosis, or even the nth statistical moment.
I have found the definitions themselves here and could manually write functions to implement this from scratch, but before I go and do that I was wondering if there were any existing packages or functions that might be able to do this.
Thanks
EDIT:
I figured it out; the following code works (please note that this is for population moments):
skewness = np.average(((values - average) / np.sqrt(variance))**3, weights=weights)
and
kurtosis = np.average(((values - average) / np.sqrt(variance))**4 - 3, weights=weights)
I think you have already listed all the ingredients that you need, following the formulas in the link you provided:
import numpy as np

a = np.array([[1, 2], [2, 5], [3, 6], [4, 12], [5, 1]])
values, weights = a.T

def n_weighted_moment(values, weights, n):
    assert n > 0 and values.shape == weights.shape
    w_avg = np.average(values, weights=weights)
    w_var = np.sum(weights * (values - w_avg)**2) / np.sum(weights)
    if n == 1:
        return w_avg
    elif n == 2:
        return w_var
    else:
        w_std = np.sqrt(w_var)
        return np.sum(weights * ((values - w_avg) / w_std)**n) / np.sum(weights)
        # Same as np.average(((values - w_avg)/w_std)**n, weights=weights)
Which results in:
for n in range(1, 5):
    print(f'Moment {n} value is {n_weighted_moment(values, weights, n)}')
Moment 1 value is 3.1923076923076925
Moment 2 value is 1.0784023668639053
Moment 3 value is -0.5962505715592139
Moment 4 value is 2.384432138280637
Notice that while your edit computes the excess kurtosis (the -3 term), the formula implemented here for a generic n-th moment doesn't apply that correction.
Taken from here
Here is the code
def weighted_mean(var, wts):
    """Calculates the weighted mean"""
    return np.average(var, weights=wts)

def weighted_variance(var, wts):
    """Calculates the weighted variance"""
    return np.average((var - weighted_mean(var, wts))**2, weights=wts)

def weighted_skew(var, wts):
    """Calculates the weighted skewness"""
    return (np.average((var - weighted_mean(var, wts))**3, weights=wts) /
            weighted_variance(var, wts)**(1.5))

def weighted_kurtosis(var, wts):
    """Calculates the weighted kurtosis"""
    return (np.average((var - weighted_mean(var, wts))**4, weights=wts) /
            weighted_variance(var, wts)**(2))
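As a cross-check (a sketch, not part of either answer): because the weights here are integer frequencies, you can expand the histogram into raw samples and compare against scipy.stats, which computes the same population (biased) moments by default:

import numpy as np
from scipy.stats import skew, kurtosis

values = np.array([1, 2, 3, 4, 5])
weights = np.array([2, 5, 6, 12, 1])
raw = np.repeat(values, weights)    # expand frequencies into raw samples

print(skew(raw))                    # matches the weighted 3rd standardized moment
print(kurtosis(raw, fisher=False))  # matches the weighted 4th standardized moment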

Build a numpy array from a random distribution until the last column exceeds a threshold

I want to build a 2d numpy array from a random distribution so that each of the values in the last column of each row exceeds a threshold.
Here's the working code I have now. Is there a cleaner way to build numpy arrays with an arbitrary condition?
from typing import Callable

import numpy as np

def new_array(
        num_rows: int,
        dist: Callable[[int], np.ndarray],
        min_hours: int) -> np.ndarray:
    # Get the 40th percentile as a reasonable guess for how many samples we need.
    # Use a lower percentile to increase num_cols and avoid looping in most cases.
    p40_val = np.quantile(dist(20), 0.4)
    # Generate at least 10 columns each time.
    num_cols = max(int(min_hours / p40_val), 10)

    def create_starts() -> np.ndarray:
        return dist(num_rows * num_cols).reshape((num_rows, num_cols)).cumsum(axis=1)

    max_iters = 20
    starts = create_starts()
    for _ in range(max_iters):
        if np.min(starts[:, -1]) >= min_hours:
            # All the last columns exceed min_hours.
            break
        last_col_vals = starts[:, -1].repeat(num_cols).reshape(starts.shape)
        next_starts = create_starts() + last_col_vals
        starts = np.append(starts, next_starts, axis=1)
    else:
        # We didn't break out of the for loop, so we hit the max iterations.
        raise AssertionError('Failed to create enough samples to exceed '
                             'sim duration for all columns')
    # Only keep columns up to the column where each value > min_hours.
    mins_per_col = np.min(starts, axis=0)
    cols_exceeding_sim_duration = np.nonzero(mins_per_col > min_hours)[0]
    cols_to_keep = cols_exceeding_sim_duration[0]
    return np.delete(starts, np.s_[cols_to_keep:], axis=1)

new_array(5, lambda size: np.random.normal(3, size=size), 7)
# Example output
array([[1.47584632, 4.04034105, 7.19592256],
[3.10804306, 6.46487043, 9.74177227],
[1.03633165, 2.62430309, 6.92413189],
[3.46100139, 6.53068143, 7.37990547],
[2.70152742, 6.09488369, 9.58376664]])
I simplified several things and replaced them with NumPy's logical indexing. The for loop is now a while loop, and there is no need to handle the error case since it just runs until there are enough rows.
Is this still working as you expect?
import numpy as np

def new_array(num_rows, dist, min_hours):
    # Get the 40th percentile as a reasonable guess for how many samples we need.
    # Use a lower percentile to increase num_cols and avoid looping in most cases.
    p40_val = np.quantile(dist(20), 0.4)
    # Generate at least 10 columns each time.
    num_cols = max(int(min_hours / p40_val), 10)

    # No need to reshape here; size can be a shape tuple.
    def create_starts() -> np.ndarray:
        return dist((num_rows, num_cols)).cumsum(axis=1)

    # Append to a list and stack it into a Numpy array once at the end:
    # faster than numpy.append, which re-allocates and copies on every call.
    storage = []
    while True:
        starts = create_starts()
        # Boolean / logical array
        is_larger = starts[:, -1] >= min_hours
        # Use Numpy boolean indexing to find the rows fitting the condition
        good_rows = starts[is_larger, :]
        # Can also be an empty array if none are found, but it will
        # be skipped later due to its shape (0, x)
        storage.append(good_rows)
        # Count what is in storage so far; empty arrays contribute 0 rows
        number_of_good_rows = sum([_a.shape[0] for _a in storage])
        print('number_of_good_rows', number_of_good_rows)
        if number_of_good_rows >= num_rows:
            starts = np.vstack(storage)
            print(starts)
            break
    # Only keep columns up to the column where each value > min_hours,
    # also using logical indexing here.
    is_something = np.logical_not(np.all(starts > min_hours, axis=0))
    return starts[:, is_something]
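A usage sketch, assuming the same call as in the question; the lambda from the question still works because np.random.normal accepts a shape tuple for size:

import numpy as np

result = new_array(5, lambda size: np.random.normal(3, size=size), 7)
print(result)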