Negative values in log-likelihood of a bivariate Gaussian - TensorFlow

I am trying to implement a loss function that minimizes the negative log-likelihood of the ground-truth values (x, y) under a bivariate Gaussian whose parameters are predicted by the network. I am implementing this in TensorFlow.
Here is the code:
import numpy as np
import tensorflow as tf
def tf_2d_normal(self, x, y, mux, muy, sx, sy, rho):
    '''
    Function that returns the negative log of the PDF of a 2D normal distribution
    params:
    x : input x points
    y : input y points
    mux : mean of the distribution in x
    muy : mean of the distribution in y
    sx : std dev of the distribution in x
    sy : std dev of the distribution in y
    rho : Correlation factor of the distribution
    '''
    # eq 3 in the paper
    # and eq 24 & 25 in Graves (2013)
    # (tf.sub / tf.mul / tf.div were renamed tf.subtract / tf.multiply / tf.divide in TF 1.0)
    # Calculate (x - mux) and (y - muy)
    normx = tf.subtract(x, mux)
    normy = tf.subtract(y, muy)
    # Calculate sx*sy
    sxsy = tf.multiply(sx, sy)
    # Calculate the exponential factor
    z = tf.square(tf.divide(normx, sx)) + tf.square(tf.divide(normy, sy)) - 2*tf.divide(tf.multiply(rho, tf.multiply(normx, normy)), sxsy)
    negRho = 1 - tf.square(rho)
    # Numerator
    result = tf.exp(tf.divide(-z, 2*negRho))
    # Normalization constant
    denom = 2 * np.pi * tf.multiply(sxsy, tf.sqrt(negRho))
    # Negative log of the final PDF
    result = -tf.math.log(tf.divide(result, denom))
    return result
When I train, I can see the loss decreasing, but it goes well below 0. I assume this is because we are minimizing the *negative* log-likelihood. Even though the loss keeps decreasing, my results are not accurate. Can someone help me verify whether the loss function I have written is correct?
Also, is a loss that behaves like this desirable for training neural nets (specifically RNNs)?
Thanks.

I see you've found the sketch-rnn code from Magenta; I'm working on something similar. I found this piece of code not to be stable by itself. You'll need to stabilize it using constraints, so the tf_2d_normal code can't be used or interpreted in isolation. NaNs and Infs will start appearing all over the place if your data isn't normalized properly, either in advance or inside your loss function.
Below is a more stable version of the loss function that I'm building with Keras. There may be some redundancy in it and it may not fit your needs exactly, but it works for me and you can test/adapt it. I included some inline comments on how large negative log values can arise:
import numpy as np
from keras import backend as K  # Keras 2 / tf.keras backend API
from keras.backend import epsilon
def r3_bivariate_gaussian_loss(true, pred):
    """
    Rank 3 bivariate gaussian loss function
    Returns results of eq # 24 of http://arxiv.org/abs/1308.0850
    :param true: ground-truth tensor of shape (batch, time, >=2) holding [x, y] in the last axis
    :param pred: model output of shape (batch, time, >=5) holding raw [mu_x, mu_y, sigma_x, sigma_y, rho] in the last axis
    :return: the log of the summed negative log-likelihood
    """
    x_coord = true[:, :, 0]
    y_coord = true[:, :, 1]
    mu_x = pred[:, :, 0]
    mu_y = pred[:, :, 1]
    # exponentiate the sigmas and also make correlative rho between -1 and 1.
    # eq. # 21 and 22 of http://arxiv.org/abs/1308.0850
    # analogous to https://github.com/tensorflow/magenta/blob/master/magenta/models/sketch_rnn/model.py#L326
    sigma_x = K.exp(K.abs(pred[:, :, 2]))
    sigma_y = K.exp(K.abs(pred[:, :, 3]))
    rho = K.tanh(pred[:, :, 4]) * 0.1  # avoid drifting to -1 or 1 to prevent NaN, you will have to tweak this multiplier value to suit the shape of your data
    norm1 = K.log(1 + K.abs(x_coord - mu_x))
    norm2 = K.log(1 + K.abs(y_coord - mu_y))
    variance_x = K.softplus(K.square(sigma_x))
    variance_y = K.softplus(K.square(sigma_y))
    s1s2 = K.softplus(sigma_x * sigma_y)  # very large if sigma_x and/or sigma_y are very large
    # eq 25 of http://arxiv.org/abs/1308.0850
    z = ((K.square(norm1) / variance_x) +
         (K.square(norm2) / variance_y) -
         (2 * rho * norm1 * norm2 / s1s2))  # z → -∞ if rho * norm1 * norm2 → ∞ and/or s1s2 → 0
    neg_rho = 1 - K.square(rho)  # → 0 if rho → {1, -1}
    numerator = K.exp(-z / (2 * neg_rho))  # → ∞ if z → -∞ and/or neg_rho → 0
    denominator = (2 * np.pi * s1s2 * K.sqrt(neg_rho)) + epsilon()  # → 0 if s1s2 → 0 and/or neg_rho → 0
    pdf = numerator / denominator  # → ∞ if denominator → 0 and/or if numerator → ∞
    return K.log(K.sum(-K.log(pdf + epsilon())))  # -log(pdf) → -∞ if pdf → ∞; the outer log is NaN once the sum turns negative
Hope you find this of value.
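Both functions above build the PDF with `exp` and then take its `log`. Below is a sketch of the same negative log-likelihood (eqs. 24 and 25 of Graves 2013) evaluated directly in log space; it uses NumPy so it can be checked standalone, and each operation maps one-to-one onto `tf` or `K` ops. It also illustrates why a loss below zero is expected rather than a bug: a density is not bounded by 1, so with small standard deviations (for example sx = sy = 0.1, rho = 0, evaluated at the mean) the PDF is about 15.9 and its negative log is about -2.77. The function name is illustrative.

```python
import numpy as np

def bivariate_normal_nll(x, y, mux, muy, sx, sy, rho):
    """Negative log of the bivariate normal PDF, computed without an exp/log round trip."""
    # standardized residuals
    normx = (x - mux) / sx
    normy = (y - muy) / sy
    one_minus_rho2 = 1.0 - rho ** 2
    # z from eq. 25, written with the standardized residuals
    z = normx ** 2 + normy ** 2 - 2.0 * rho * normx * normy
    # -log N = z / (2 (1 - rho^2)) + log(2 pi sx sy) + 0.5 log(1 - rho^2)
    return (z / (2.0 * one_minus_rho2)
            + np.log(2.0 * np.pi * sx * sy)
            + 0.5 * np.log(one_minus_rho2))

# narrow distribution evaluated at its mean: the PDF exceeds 1, so the NLL is negative
nll_at_mean = bivariate_normal_nll(0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.0)
# a far-away point with a tiny sigma: exp() would underflow to 0, the log-space form stays finite
nll_far = bivariate_normal_nll(1e3, 0.0, 0.0, 0.0, 1e-3, 1.0, 0.0)
print(nll_at_mean, nll_far)
```

A negative training loss therefore just means the model assigns densities above 1 to the targets; it says nothing by itself about whether the predictions are accurate.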

Related

Inequality constraints of convex relaxation with McCormick envelope

I have a nonconvex optimization problem for which I am calculating a lower bound using the McCormick envelope. Each bilinear term is replaced with an auxiliary variable which has the following constraints defined:
w_{ij} >= x_i^L * x_j + x_i * x_j^L - x_i^L * x_j^L
w_{ij} >= x_i^U * x_j + x_i * x_j^U - x_i^U * x_j^U
w_{ij} <= x_i^U * x_j + x_i * x_j^L - x_i^U * x_j^L
w_{ij} <= x_i^L * x_j + x_i * x_j^U - x_i^L * x_j^U
where
x_i^L <= x_i <= x_i^U
I am given a function taking in several arguments:
import numpy as np
def convex_bounds(n,m,c,H,Q,A,b,lb,ub):
    # n is the number of optimization variables
    # m is the number of equality constraints
    # H = positive semidefinite matrix from the objective function (n x n)
    # Q is (mxn) x n
    # A is m x n
    # b is RHS of the nonlinear equality constraints (m x 1)
    # c,lb,ub are vectors of size (n x 1)
    ......................................
    # Create matrix B & b_ineq for inequality constraints
    # where B*x <= b_ineq
    B = np.eye(3)
    b_ineq = np.array((10,10,10))
    ## these values would work in a scenario with no bilinear terms
My problem is that I don't know how to specify the inequality constraint matrix B and the vector b_ineq. For this particular exercise my variables are x1, x2 and x3, with lower bound 0 (x_L) and upper bound 10 (x_U). My bilinear terms are x1*x2 and x2*x3 (which lead to the auxiliary variables w_12 and w_23). How can I encode both the known bounds (0 and 10) on x1, x2 and x3 and the McCormick constraints above in B and b_ineq?
I don't know how to proceed with this.
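A minimal sketch of one way to fill B and b_ineq, assuming the decision vector is extended to z = [x1, x2, x3, w12, w23] and every constraint is written as B @ z <= b_ineq (each ">=" constraint is multiplied by -1 and the terms are moved to the left). The helper `mccormick_rows` is hypothetical, not part of any library; it simply rearranges the four envelope inequalities above.

```python
import numpy as np

def mccormick_rows(i, j, k, n_vars, lo, up):
    """Rows/RHS of the 4 McCormick inequalities for w_k ~ x_i * x_j, as rows @ z <= rhs."""
    rows = np.zeros((4, n_vars))
    rhs = np.zeros(4)
    # w >= xiL*xj + xi*xjL - xiL*xjL   ->   xjL*xi + xiL*xj - w <= xiL*xjL
    rows[0, [i, j, k]] = [lo[j], lo[i], -1.0]
    rhs[0] = lo[i] * lo[j]
    # w >= xiU*xj + xi*xjU - xiU*xjU   ->   xjU*xi + xiU*xj - w <= xiU*xjU
    rows[1, [i, j, k]] = [up[j], up[i], -1.0]
    rhs[1] = up[i] * up[j]
    # w <= xiU*xj + xi*xjL - xiU*xjL   ->   -xjL*xi - xiU*xj + w <= -xiU*xjL
    rows[2, [i, j, k]] = [-lo[j], -up[i], 1.0]
    rhs[2] = -up[i] * lo[j]
    # w <= xiL*xj + xi*xjU - xiL*xjU   ->   -xjU*xi - xiL*xj + w <= -xiL*xjU
    rows[3, [i, j, k]] = [-up[j], -lo[i], 1.0]
    rhs[3] = -lo[i] * up[j]
    return rows, rhs

lo = np.zeros(3)        # x_L for x1, x2, x3
up = np.full(3, 10.0)   # x_U for x1, x2, x3
n_vars = 5              # z = [x1, x2, x3, w12, w23]

# simple bounds: x <= x_U and -x <= -x_L on the original variables only
E = np.hstack([np.eye(3), np.zeros((3, 2))])
blocks = [(E, up), (-E, -lo)]
blocks.append(mccormick_rows(0, 1, 3, n_vars, lo, up))  # w12 replaces x1*x2
blocks.append(mccormick_rows(1, 2, 4, n_vars, lo, up))  # w23 replaces x2*x3

B = np.vstack([rows for rows, _ in blocks])
b_ineq = np.concatenate([rhs for _, rhs in blocks])
print(B.shape, b_ineq.shape)
```

If the solver accepts variable bounds separately (for example through lb and ub), the first six rows can be dropped and only the eight McCormick rows kept; c, H and the other matrices then need two extra zero-padded entries for w12 and w23.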

Percentage weighting given two variables to equal a target

I have a target value of 11.82 and two values:
x = 9
y = 15
How do I find the percentage weighting that blends x and y to equal my target, i.e. something like 55% of x and 45% of y? What is the most efficient way to calculate a weighting that hits the target?
Looking at it again, what I think you want is really two equations:
9x + 15y = 11.82
x + y = 1
Solving that system of equations is pretty fast on pen and paper (just eliminate one variable, e.g. substitute y = 1 - x). Or you could use SymPy to solve the system of linear equations:
>>> from sympy import *
>>> from sympy.solvers.solveset import linsolve
>>> x, y = symbols('x, y')
>>> linsolve([x + y - 1, 9 * x + 15 * y - 11.82], (x, y)) # make 0 on right by subtraction
FiniteSet((0.53, 0.47))
We can confirm this by substitution:
>>> 9 * 0.53 + 15 * 0.47
11.82
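The pen-and-paper route gives a closed form: substituting y = 1 - x into 9x + 15y = 11.82 yields x = (15 - 11.82) / (15 - 9) = 3.18 / 6 = 0.53. A small sketch of that formula (the helper name `blend_weight` is illustrative):

```python
def blend_weight(target, a, b):
    """Weight w on a such that w*a + (1 - w)*b == target (requires a != b)."""
    if a == b:
        raise ValueError("a and b must differ")
    # from w*a + (1 - w)*b = target  ->  w = (b - target) / (b - a)
    return (b - target) / (b - a)

w = blend_weight(11.82, 9, 15)
print(w, 1 - w)  # roughly 0.53 and 0.47
```

If the target does not lie between the two values, the weight falls outside [0, 1], which signals that no blend with non-negative weights can reach it.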

How to apply bounds on a variable when performing optimisation in Pytorch?

I am trying to use PyTorch for non-convex optimisation, maximising my objective (i.e. minimising its negative with SGD). I would like to bound the variable I am optimising, x, so that x > 0, and also constrain the sum of the x values to be less than 1000.
I think I have implemented the penalty correctly as a ramp penalty, but I am struggling with bounding x. In PyTorch you can bound values using clamp, but it doesn't seem appropriate in this case; I think this is because optim needs the gradients to flow freely under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np
def objective(x, a, b, c):  # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * (x)))
    # Checking constraint
    # Note: .item() returns a plain Python float, so the penalty below carries no gradient
    exceeded_limit = constraint(x).item()
    #print(exceeded_limit)
    obj = torch.sum(d * (b * c - x))
    # If over the limit, add ramp penalty
    # Note: exceeded_limit < 0 here, so this line increases obj instead of lowering it
    if exceeded_limit < 0:
        obj = obj - (exceeded_limit * 10)
        print("Exceeded limit")
    return - obj
def constraint(x, limit = 1000):  # Must be > 0
    return limit - x.sum()
N = 1000
# x is the variable to optimise
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
# Would like to include the clamp
# x = torch.clamp(x, min=0)
# Non-convex method
opt = torch.optim.SGD([x], lr=.01)
for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()
    # Evaluating the objective
    obj = objective(x, a, b, c)
    # Calculate gradients
    obj.backward()
    opt.step()
    if i%1000==0: print("Objective: %.1f" % -obj.item())
print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x<0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions on how to impose the bounds, using clamp or otherwise, would be appreciated, as would general advice on non-convex optimisation with PyTorch. This is a much simpler, scaled-down version of the problem I'm working on, so I am looking for a lightweight solution if possible. I am considering a workaround such as transforming x with an exponential function, but then the function would have to be scaled to keep the positive values from becoming infinite, and I want some flexibility in setting the constraint.
I ran into the same problem.
I also wanted to apply bounds to a variable in PyTorch.
I solved it with Way3 below.
Your example is a little complex,
so here is a simpler one.
For example, suppose there is a trainable variable v whose bounds are (-1, 1):
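import torch
v = torch.tensor((0.5, ), requires_grad=True)
optimizer = torch.optim.SGD([v], lr=0.1)  # illustrative choice of optimizer
v_loss = xxxx  # placeholder: any loss computed from v
optimizer.zero_grad()
v_loss.backward()
optimizer.step()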
Way1. RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
v.clamp_(-1, 1)
Way2. RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.
v = torch.clamp(v, -1, +1) # equal to v = v.clamp(-1, +1)
Way3. No error. This is how I solved the problem.
with torch.no_grad():
    v[:] = v.clamp(-1, +1)  # You must use v[:]=xxx instead of v=xxx
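A self-contained version of Way3, i.e. projected gradient descent: take an ordinary optimizer step, then clamp the leaf tensor back into its bounds inside torch.no_grad(). The quadratic loss and the SGD settings are illustrative assumptions; the loss pulls v toward 3, outside the box, so the iterate should end up pinned at the upper bound 1.

```python
import torch

# toy problem: the unconstrained minimum (v = 3) lies outside the bounds (-1, 1)
v = torch.tensor([0.5], requires_grad=True)
optimizer = torch.optim.SGD([v], lr=0.1)

for step in range(50):
    optimizer.zero_grad()
    v_loss = ((v - 3.0) ** 2).sum()
    v_loss.backward()
    optimizer.step()
    # project back onto the box after every update; no_grad keeps this
    # in-place write out of the autograd graph, so v stays a leaf tensor
    with torch.no_grad():
        v[:] = v.clamp(-1.0, 1.0)

print(v)
```

For the question's x, the same pattern with `x[:] = x.clamp(min=0)` after `opt.step()` would enforce x >= 0; the sum constraint would still need its own projection or penalty.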

Beginner Finite Elemente Code does not solve equation properly

I am trying to write the code for solving the extremely difficult differential equation:
x' = 1
with the finite element method.
As far as I understand, I can obtain the solution u as
u(x) = sum_i u_i phi_i(x)
with the basis functions phi_i(x), while the coefficients u_i are the solution of the system of linear equations
sum_i u_i ∫ phi_j(x) D phi_i(x) dx = ∫ phi_j(x) f(x) dx, for every j,
with the differential operator D (here only the first derivative) and the source term f (here f = 1). As a basis I am using the tent function:
def tent(l, r, x):
    m = (l + r) / 2
    if x >= l and x <= m:
        return (x - l) / (m - l)
    elif x < r and x > m:
        return (r - x) / (r - m)
    else:
        return 0
def tent_half_down(l,r,x):
    if x >= l and x <= r:
        return (r - x) / (r - l)
    else:
        return 0
def tent_half_up(l,r,x):
    if x >= l and x <= r:
        return (x - l) / (r - l)
    else:
        return 0
def tent_prime(l, r, x):
    m = (l + r) / 2
    if x >= l and x <= m:
        return 1 / (m - l)
    elif x < r and x > m:
        return 1 / (m - r)
    else:
        return 0
def tent_half_prime_down(l,r,x):
    if x >= l and x <= r:
        return - 1 / (r - l)
    else:
        return 0
def tent_half_prime_up(l, r, x):
    if x >= l and x <= r:
        return 1 / (r - l)
    else:
        return 0
def sources(x):
    return 1
Discretizing my space:
import numpy as np
n_vertex = 30
n_points = (n_vertex-1) * 40
space = (0,5)
x_space = np.linspace(space[0],space[1],n_points)
vertx_list = np.linspace(space[0],space[1], n_vertex)
tent_list = np.zeros((n_vertex, n_points))
tent_prime_list = np.zeros((n_vertex, n_points))
tent_list[0,:] = [tent_half_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_list[-1,:] = [tent_half_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
tent_prime_list[0,:] = [tent_half_prime_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_prime_list[-1,:] = [tent_half_prime_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
for i in range(1,n_vertex-1):
    tent_list[i, :] = [tent(vertx_list[i-1],vertx_list[i+1],x) for x in x_space]
    tent_prime_list[i, :] = [tent_prime(vertx_list[i-1],vertx_list[i+1],x) for x in x_space]
Calculating the system of linear equations:
b = np.zeros((n_vertex))
A = np.zeros((n_vertex,n_vertex))
for i in range(n_vertex):
    b[i] = np.trapz(tent_list[i,:]*sources(x_space))
    for j in range(n_vertex):
        A[j, i] = np.trapz(tent_prime_list[j] * tent_list[i])
And then solving and reconstructing it
u = np.linalg.solve(A,b)
sol = tent_list.T.dot(u)
But it does not work, I am only getting some up and down pattern. What am I doing wrong?
First, a couple of comments on terminology and notation:
1) You are using the weak formulation, though you've done this implicitly. A formulation being "weak" has nothing to do with the order of derivatives involved. It is weak because you are not satisfying the differential equation exactly at every location. FE minimizes the weighted residual of the solution, integrated over the domain. The functions phi_j actually discretize the weighting function. The difference when you only have first-order derivatives is that you don't have to apply the Gauss divergence theorem (which simplifies to integration by parts for one dimension) to eliminate second-order derivatives. You can tell this wasn't done because phi_j is not differentiated in the LHS.
2) I would suggest not using "A" as the differential operator. You also use this symbol for the global system matrix, so your notation is inconsistent. People often use "D", since this fits better to the idea that it is used for differentiation.
Secondly, about your implementation:
3) You are using way more integration points than necessary. Your elements use linear interpolation functions, which means you only need one integration point located at the center of the element to evaluate the integral exactly. Look into the details of Gauss quadrature to see why. Also, you've specified the number of integration points as a multiple of the number of nodes. This should be done as a multiple of the number of elements instead (in your case, n_vertex-1), because the elements are the domains on which you're integrating.
4) You have built your system by simply removing the two end nodes from the formulation. This isn't the correct way to specify boundary conditions. I would suggest building the full system first and using one of the typical methods for applying Dirichlet boundary conditions. Also, think about what constraining two nodes would imply for the differential equation you're trying to solve. What function exists that satisfies x' = 1, x(0) = 0, x(5) = 0? You have overconstrained the system by trying to apply 2 boundary conditions to a first-order differential equation.
Unfortunately, there isn't a small tweak that can be made to get the code to work, but I hope the comments above help you rethink your approach.
EDIT to address your changes:
1) Assuming the matrix A is addressed with A[row,col], then your indices are backwards. You should be integrating with A[i,j] = ...
2) A simple way to apply a constraint is to replace one row with the constraint desired. If you want x(0) = 0, for example, set A[0,j] = 0 for all j, then set A[0,0] = 1 and set b[0] = 0. This substitutes one of the equations with u_0 = 0. Do this after integrating.
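A minimal sketch combining the suggestions above, assuming f = 1 on [0, 5] with the single condition u(0) = 0: assemble element by element with the test function as the row index (A[i, j] = ∫ phi_i phi_j' dx), integrate exactly (for linear tents this gives the same values as a one-point midpoint rule per element), and impose the boundary condition by replacing the first row. The function name `solve_first_order_fem` is illustrative. For this right-hand side the nodal values should reproduce the exact solution u(x) = x.

```python
import numpy as np

def solve_first_order_fem(n_vertex=30, length=5.0, f=1.0):
    """Galerkin sketch for u' = f (f constant) on [0, length] with u(0) = 0."""
    nodes = np.linspace(0.0, length, n_vertex)
    A = np.zeros((n_vertex, n_vertex))
    b = np.zeros(n_vertex)
    for e in range(n_vertex - 1):            # loop over elements, not nodes
        idx = [e, e + 1]
        h = nodes[e + 1] - nodes[e]
        # local K[p, q] = integral of N_p * dN_q/dx over the element:
        # dN/dx is constant (-1/h, +1/h) and each N_p integrates to h/2
        K = np.array([[-0.5, 0.5],
                      [-0.5, 0.5]])
        F = np.array([h / 2, h / 2]) * f     # integral of N_p * f
        A[np.ix_(idx, idx)] += K             # row = test function, column = basis function
        b[idx] += F
    # Dirichlet condition u(0) = 0: replace the first equation by u_0 = 0
    A[0, :] = 0.0
    A[0, 0] = 1.0
    b[0] = 0.0
    u = np.linalg.solve(A, b)
    return nodes, u

nodes, u = solve_first_order_fem()
print(np.max(np.abs(u - nodes)))
```

Only one boundary condition is applied, matching the first-order equation; adding a second one at x = 5 would overconstrain the system as described in point 4.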

Slew rate measuring

I have to measure slew rates in signals like the one in the image below. I need the slew rate of the part marked by the grey arrow.
At the moment I smooth the signal with a Hann window to get rid of any noise and to flatten the peaks. Then I search (starting from the right) for the 30% and 70% points and calculate the slew rate between these two points.
But my problem is that the signal gets flattened by the smoothing, so the calculated slew rate is lower than it should be. And if I reduce the smoothing, the peaks (visible on the right side of the image) get higher and the 30% point is sometimes found at the wrong position.
Is there a better/safer way to find the required slew rate?
If you know between what values your signal is transitioning, and your noise is not too large, you can simply compute the time differences between all crossings of 30% and all crossings of 70% and keep the smallest one:
import numpy as np
import matplotlib.pyplot as plt
s100, s0 = 5, 0
signal = np.concatenate((np.ones((25,)) * s100,
                         s100 + (np.random.rand(25) - 0.5) * (s100-s0),
                         np.linspace(s100, s0, 25),
                         s0 + (np.random.rand(25) - 0.5) * (s100-s0),
                         np.ones((25,)) * s0))
# Interpolate to find crossings with 30% and 70% of signal
# The general linear interpolation formula between (x0, y0) and (x1, y1) is:
# y = y0 + (x-x0) * (y1-y0) / (x1-x0)
# to find the x at which the crossing with y happens:
# x = x0 + (y-y0) * (x1-x0) / (y1-y0)
# Because we are using indices as time, x1-x0 == 1, and if the crossing
# happens within the interval, then 0 <= x <= 1.
# The following code is just a vectorized version of the above
delta_s = np.diff(signal)
t30 = (s0 + (s100-s0)*.3 - signal[:-1]) / delta_s
idx30 = np.where((t30 > 0) & (t30 < 1))[0]
t30 = idx30 + t30[idx30]
t70 = (s0 + (s100-s0)*.7 - signal[:-1]) / delta_s
idx70 = np.where((t70 > 0) & (t70 < 1))[0]
t70 = idx70 + t70[idx70]
# compute all possible transition times, keep the smallest
idx = np.unravel_index(np.argmin(t30[:, None] - t70),
                       (len(t30), len(t70),))
print(t30[idx[0]] - t70[idx[1]])
# 9.6
plt.plot(signal)
plt.plot(t30, [s0 + (s100-s0)*.3]*len(t30), 'go')
plt.plot(t30[idx[0]], [s0 + (s100-s0)*.3], 'o', mec='g', mfc='None', ms=10)
plt.plot(t70, [s0 + (s100-s0)*.7]*len(t70), 'ro')
plt.plot(t70[idx[1]], [s0 + (s100-s0)*.7], 'o', mec='r', mfc='None', ms=10 )
plt.show()
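The code above stops at the 30% to 70% transition time, measured in samples. Converting that into the slew rate the question asks for only needs the voltage swing between the two levels and the sample period. A sketch, where `dt` and the units (volts, a 1 µs sample period) are hypothetical assumptions; with indices as time, as above, `dt = 1` gives signal units per sample:

```python
def slew_rate(transition_samples, s100, s0, low=0.3, high=0.7, dt=1.0):
    """Slew rate between the `low` and `high` fractions of the step.

    transition_samples: time between the two level crossings, in samples
    dt: sample period (e.g. seconds per sample); result is signal units per time unit
    """
    if transition_samples <= 0:
        raise ValueError("transition time must be positive")
    # voltage swing between the two levels divided by the time it took
    return (high - low) * abs(s100 - s0) / (transition_samples * dt)

# hypothetical: the 9.6-sample transition from the example, a 5 V step, 1 µs sampling
rate = slew_rate(9.6, s100=5, s0=0, dt=1e-6)  # volts per second
print(rate)
```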