How to implement a weighted product of likelihoods for multiple random variables in pyMC3? - bayesian

I need to build a super-likelihood function from several random variables. The distribution of each variable is standard.
Using two random variables as an example, the target super-likelihood is like this:
S = F1^w1 * F2^w2 (s.t. w1 + w2 = 1)
or equivalently,
logS = w1 logF1 + w2 log F1 (s.t. w1 + w2 = 1).
Where F1 ~ Normal distribution and F2 ~ Bernoulli distribution
I use the following codes
data = <load my data>
[w1,w2] = [0.5,0.5]
with Model() as model:
mu = pm.Uniform('mu',lower=0,upper=1)
sd = pm.Uniform('sd',lower=0,upper=1)
p = pm.Uniform('p',lower=0,upper=1)
F1 = pm.Normal("F1", mu = mu, sigma = sd)
F1 = pm.Bernoulli("F2",p)
S = pm.Deterministic('S',F1**w1*F2**w2, observed=data)
step = Metropolis()
trace = pm.sample(2000, step=step)
But it does not work.
Please help to implement such a weighted likelihood model in pyMC3.

Seems like you are looking for the Mixture function: https://docs.pymc.io/api/distributions/mixture.html#pymc3.distributions.mixture.Mixture

Related

Julia DifferentialEquations.jl all variable output

I have the following example:
using DifferentialEquations
function test1(du,u,p,t)
a,b,c = p
d=a^0.1*(t+1)
e=u[1]/a
f=u[2]/d
du[1] = a*u[1]
du[2] = d*u[2]
du[3] = b*u[2] - c*u[3]
end
p = (2,0.75,0.8)
u0 = [1.0;1.0;1.0]
tspan = (0.0,3.0)
prob = ODEProblem(test1,u0,tspan,p)
sol = solve(prob,saveat=0.3)
The sol objects contain state outputs but, I need efficiently other variables ("d","e","f") as well.
The closest I can get is:
function test2(du,u,p,t)
global i
global Out_values
global sampletimes
a,b,c = p
d=a^0.1*(t+1)
e=u[1]/a
f=u[2]/d
if t in sampletimes
Out_values[1,i] = d
Out_values[2,i] = e
Out_values[3,i] = f
i=i+1
end
du[1] = a*u[1]
du[2] = d*u[2]
du[3] = b*u[2] - c*u[3]
end
sampletimes = tspan[1]:0.3:tspan[2]
Out_values = Array{Float64}(undef, 3, 2*11)
i=1
prob = ODEProblem(test2,u0,tspan,p)
sol = solve(prob,saveat=0.3,tstops=sampletimes)
However, this solution is not ideal because:
it duplicates saveat and I get two sets of slightly different outputs (not sure why), and
it can't expand if I decide not to use saveat and I want to output all solutions, i.e. sol = solve(prob).
Any help is appreciated.

Getting the charge of a single atom, per loop in MD Analysis

I have been trying to use the partial charge of one particular ion to go through a calculation within mdanalysis.
I have tried(This is just a snippet from the code that I know is throwing the error):
Cl = u.select_atoms('resname CLA and prop z <= 79.14')
Lz = 79.14 #Determined from system set-up
Q_sum = 0
COM = 38.42979431152344 #Determined from VMD
file_object1 = open(fors, 'a')
print(dcd, file = file_object1)
for ts in u.trajectory[200:]:
frame = u.trajectory.frame
time = u.trajectory.time
for coord in Cl.positions:
q= Cl.total_charge(Cl.position[coord][2])
coords = coord - (Lz/COM)
q_prof = q * (coords + (Lz / 2)) / Lz
Q_sum = Q_sum + q_prof
print(q)
But I keep getting an error associated with this.
How would I go about selecting this particular atom as it goes through the loop to get the charge of it in MD Analysis? Before I was setting q to equal a constant and the code ran fine so I know it is only this line that is throwing the error:
q = Cl.total_charge(Cl.position[coord][2])
Thanks for the help!
I figured it out with:
def Q_code(dcd, topo):
Lz = u.dimensions[2]
Q_sum = 0
count = 0
CLAs = u.select_atoms('segid IONS or segid PROA or segid PROB or segid MEMB')
ini_frames = -200
n_frames = len(u.trajectory[ini_frames:])
for ts in u.trajectory[ini_frames:]:
count += 1
membrane = u.select_atoms('segid PROA or segid PROB or segid MEMB')
COM = membrane.atoms.center_of_mass()[2]
q_prof = CLAs.atoms.charges * (CLAs.positions[:,2] + (Lz/2 - COM))/Lz
Q_instant = np.sum(q_prof)
Q_sum += Q_instant
Q_av = Q_sum / n_frames
with open('Q_av.txt', 'a') as f:
print('The Q_av for {} is {}'.format(s, Q_av), file = f)
return Q_av

Sequential sampling from conditional multivariate normal

I'm trying to sequentially sample from a Gaussian Process prior.
The problem is that the samples eventually converge to zero or diverge to infinity.
I'm using the basic conditionals described e.g. here
Note: the kernel(X,X) function returns the squared exponential kernel with isometric noise.
Here is my code:
n = 32
x_grid = np.linspace(-5,5,n)
x_all = []
y_all = []
for x in x_grid:
x_all = [x] + x_all
X = np.array(x_all).reshape(-1, 1)
# Mean and covariance of the prior
mu = np.zeros((X.shape), np.float)
cov = kernel(X, X)
if len(mu)==1: # first sample is not conditional
y = np.random.randn()*cov + mu
else:
# condition on all previous samples
u1 = mu[0]
u2 = mu[1:]
y2 = np.atleast_2d(np.array(y_all)).T
C11 = cov[:1,:1] # dependent sample
C12 = np.atleast_2d(cov[0,1:])
C21 = np.atleast_2d(cov[1:,0]).T
C22 = np.atleast_2d(cov[1:, 1:])
C22_ = la.inv(C22)
u = u1 + np.dot(C12, np.dot(C22_, (y2 - u2)))
C22_xC21 = np.dot(C22_, C21)
C_minus = np.dot(C12, C22_xC21) # this weirdly becomes larger than C!
C = C11 - C_minus
y = u + np.random.randn()*C
y_all = [y.flatten()[0]] + y_all
Here's an example with 32 samples, where it collapses:
enter image description here
Here's an example with 34 samples, where it explodes:
enter image description here
(for this particular kernel, 34 is the number of samples at which (or more) the samples start to diverge.

Probabilistic Record Linkage in Pandas

I have two dataframes (X & Y). I would like to link them together and to predict the probability that each potential match is correct.
X = pd.DataFrame({'A': ["One", "Two", "Three"]})
Y = pd.DataFrame({'A': ["One", "To", "Free"]})
Method A
I have not yet fully understood the theory but there is an approach presented in:
Sayers, A., Ben-Shlomo, Y., Blom, A.W. and Steele, F., 2015. Probabilistic record linkage. International journal of epidemiology, 45(3), pp.954-964.
Here is my attempt to implementat it in Pandas:
# Probability that Matches are True Matches
m = 0.95
# Probability that non-Matches are True non-Matches
u = min(len(X), len(Y)) / (len(X) * len(Y))
# Priors
M_Pr = u
U_Pr = 1 - M_Pr
O_Pr = M_Pr / U_Pr # Prior odds of a match
# Combine the dataframes
X['key'] = 1
Y['key'] = 1
Z = pd.merge(X, Y, on='key')
Z = Z.drop('key',axis=1)
X = X.drop('key',axis=1)
Y = Y.drop('key',axis=1)
# Levenshtein distance
def Levenshtein_distance(s1, s2):
if len(s1) > len(s2):
s1, s2 = s2, s1
distances = range(len(s1) + 1)
for i2, c2 in enumerate(s2):
distances_ = [i2+1]
for i1, c1 in enumerate(s1):
if c1 == c2:
distances_.append(distances[i1])
else:
distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
distances = distances_
return distances[-1]
L_D = np.vectorize(Levenshtein_distance, otypes=[float])
Z["D"] = L_D(Z['A_x'], Z['A_y'])
# Max string length
def Max_string_length(X, Y):
return max(len(X), len(Y))
M_L = np.vectorize(Max_string_length, otypes=[float])
Z["L"] = M_L(Z['A_x'], Z['A_y'])
# Agreement weight
def Agreement_weight(D, L):
return 1 - ( D / L )
A_W = np.vectorize(Agreement_weight, otypes=[float])
Z["C"] = A_W(Z['D'], Z['L'])
# Likelihood ratio
def Likelihood_ratio(C):
return (m/u) - ((m/u) - ((1-m) / (1-u))) * (1-C)
L_R = np.vectorize(Likelihood_ratio, otypes=[float])
Z["G"] = L_R(Z['C'])
# Match weight
def Match_weight(G):
return math.log(G) * math.log(2)
M_W = np.vectorize(Match_weight, otypes=[float])
Z["R"] = M_W(Z['G'])
# Posterior odds
def Posterior_odds(R):
return math.exp( R / math.log(2)) * O_Pr
P_O = np.vectorize(Posterior_odds, otypes=[float])
Z["O"] = P_O(Z['R'])
# Probability
def Probability(O):
return O / (1 + O)
Pro = np.vectorize(Probability, otypes=[float])
Z["P"] = Pro(Z['O'])
I have verified that this gives the same results as in the paper. Here is a sensitivity check on m, showing that it doesn't make a lot of difference:
Method B
These assumptions won't apply to all applications but in some cases each row of X should match a row of Y. In that case:
The probabilities should sum to 1
If there are many credible candidates to match to then that should reduce the probability of getting the right one
then:
X["I"] = X.index
# Combine the dataframes
X['key'] = 1
Y['key'] = 1
Z = pd.merge(X, Y, on='key')
Z = Z.drop('key',axis=1)
X = X.drop('key',axis=1)
Y = Y.drop('key',axis=1)
# Levenshtein distance
def Levenshtein_distance(s1, s2):
if len(s1) > len(s2):
s1, s2 = s2, s1
distances = range(len(s1) + 1)
for i2, c2 in enumerate(s2):
distances_ = [i2+1]
for i1, c1 in enumerate(s1):
if c1 == c2:
distances_.append(distances[i1])
else:
distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
distances = distances_
return distances[-1]
L_D = np.vectorize(Levenshtein_distance, otypes=[float])
Z["D"] = L_D(Z['A_x'], Z['A_y'])
# Max string length
def Max_string_length(X, Y):
return max(len(X), len(Y))
M_L = np.vectorize(Max_string_length, otypes=[float])
Z["L"] = M_L(Z['A_x'], Z['A_y'])
# Agreement weight
def Agreement_weight(D, L):
return 1 - ( D / L )
A_W = np.vectorize(Agreement_weight, otypes=[float])
Z["C"] = A_W(Z['D'], Z['L'])
# Normalised Agreement Weight
T = Z .groupby('I') .agg({'C' : sum})
D = pd.DataFrame(T)
D.columns = ['T']
J = Z.set_index('I').join(D)
J['P1'] = J['C'] / J['T']
Comparing it against Method A:
Method C
This combines method A with method B:
# Normalised Probability
U = Z .groupby('I') .agg({'P' : sum})
E = pd.DataFrame(U)
E.columns = ['U']
K = Z.set_index('I').join(E)
K['P1'] = J['P1']
K['P2'] = K['P'] / K['U']
We can see that method B (P1) doesn't take account of uncertainty whereas method C (P2) does.

How to avoid dying weights/gradients in custom LSTM cell in tensorflow. What shall be ideal loss function?

I am trying to train a name generation LSTM network. I am not using pre-defined tensorflow cells (like tf.contrib.rnn.BasicLSTMCell, etc). I have created LSTM cell myself. But the error is not reducing beyond a limit. It only decreases 30% from what it is initially (when random weights were used in forward propagation) and then it starts increasing. Also, the gradients and weights become very small after few thousand training steps.
I think the reason for non-convergence can be one of two:
1. The design of tensorflow graph i have created OR
2. The loss function i used.
I am feeding one hot vectors of each character of the word for each time-step of the network. The code i have used for graph generation and loss function is as follows. Tx is the number of time steps in RNN, n_x,n_a,n_y are length of the input vectors, LSTM cell vector and output vector respectively.
Will be great if someone can help me in identifying what i am doing wrong here.
n_x = vocab_size
n_y = vocab_size
n_a = 100
Tx = 50
Ty = Tx
with open("trainingnames_file.txt") as f:
examples = f.readlines()
examples = [x.lower().strip() for x in examples]
X0 = [[char_to_ix[x1] for x1 in list(x)] for x in examples]
X1 = np.array([np.concatenate([np.array(x), np.zeros([Tx-len(x)])]) for x in X0], dtype=np.int32).T
Y0 = [(x[1:] + [char_to_ix["\n"]]) for x in X0]
Y1 = np.array([np.concatenate([np.array(y), np.zeros([Ty-len(y)])]) for y in Y0], dtype=np.int32).T
m = len(X0)
Wf = tf.get_variable(name="Wf", shape = [n_a,(n_a+n_x)])
Wu = tf.get_variable(name="Wu", shape = [n_a,(n_a+n_x)])
Wc = tf.get_variable(name="Wc", shape = [n_a,(n_a+n_x)])
Wo = tf.get_variable(name="Wo", shape = [n_a,(n_a+n_x)])
Wy = tf.get_variable(name="Wy", shape = [n_y,n_a])
bf = tf.get_variable(name="bf", shape = [n_a,1])
bu = tf.get_variable(name="bu", shape = [n_a,1])
bc = tf.get_variable(name="bc", shape = [n_a,1])
bo = tf.get_variable(name="bo", shape = [n_a,1])
by = tf.get_variable(name="by", shape = [n_y,1])
X_input = tf.placeholder(dtype = tf.int32, shape = [Tx,None])
Y_input = tf.placeholder(dtype = tf.int32, shape = [Ty,None])
X = tf.one_hot(X_input, axis = 0, depth = n_x)
Y = tf.one_hot(Y_input, axis = 0, depth = n_y)
X.shape
a_prev = tf.zeros(shape = [n_a,m])
c_prev = tf.zeros(shape = [n_a,m])
a_all = []
c_all = []
for i in range(Tx):
ac = tf.concat([a_prev,tf.squeeze(tf.slice(input_=X,begin=[0,i,0],size=[n_x,1,m]))], axis=0)
ct = tf.tanh(tf.matmul(Wc,ac) + bc)
tug = tf.sigmoid(tf.matmul(Wu,ac) + bu)
tfg = tf.sigmoid(tf.matmul(Wf,ac) + bf)
tog = tf.sigmoid(tf.matmul(Wo,ac) + bo)
c = tf.multiply(tug,ct) + tf.multiply(tfg,c_prev)
a = tf.multiply(tog,tf.tanh(c))
y = tf.nn.softmax(tf.matmul(Wy,a) + by, axis = 0)
a_all.append(a)
c_all.append(c)
a_prev = a
c_prev = c
y_ex = tf.expand_dims(y,axis=1)
if i == 0:
y_all = y_ex
else:
y_all = tf.concat([y_all,y_ex], axis=1)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y,logits=y_all,dim=0))
opt = tf.train.AdamOptimizer()
train = opt.minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
o = sess.run(loss, feed_dict = {X_input:X1,Y_input:Y1})
print(o.shape)
print(o)
sess.run(train, feed_dict = {X_input:X1,Y_input:Y1})
o = sess.run(loss, feed_dict = {X_input:X1,Y_input:Y1})
print(o)