How to estimate parameters in a Bayes net using PyMC

I would like to estimate the parameters of a directed Bayes net using PyMC. I came across one particular example that implements the sprinkler network, which has 3 random variables and a conditional probability distribution (CPD) defined for each node.
However, this example has the CPD encoded using deterministic variables.
Is it possible to provide the joint or marginal distribution over 2 or 3 random variables as the observed data to a deterministic PyMC variable?
In other words, if my network is of the form X -> Z <- Y, is it possible to provide a set of tuples of the form (x1, y1, z1) as observed data, to learn the parameters of the CPD P(Z|X,Y)?

The sprinkler example is really setting "static" probability values, in this line:
p_G = mc.Lambda('p_G',
                lambda S=S, R=R: pl.where(S, pl.where(R, .99, .9), pl.where(R, .8, 0.)),
                doc='Pr[G|S,R]')
To my understanding, we would need to learn one parameter set for each combination of values of the parents. So if we want to learn P(Z|X,Y), we need one parameter set for Z per combination of values of X and Y.
Say X and Y take Boolean values and Z is a Bernoulli random variable.
For each value of (X,Y), i.e. (0,0), (0,1), (1,0), (1,1), we have a parameter: p1, p2, p3, p4. Z is then split into 4 PyMC observed variables: Z1 with parameter p1, Z2 with parameter p2, Z3 with parameter p3 and Z4 with parameter p4.
Thus:
P(Z=1|X=0,Y=0) is the MCMC-estimated mean of p1,
P(Z=0|X=0,Y=0) = 1 - p1,
P(Z=1|X=0,Y=1) = p2, and so on.
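A minimal PyMC2-style sketch of that setup (the data array and all names here are hypothetical, not from the sprinkler example):
import pymc as mc
import numpy as np

# hypothetical observed (x, y, z) tuples; replace with your data
data = np.array([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)] * 25)

p, Z = {}, {}
for x in (0, 1):
    for y in (0, 1):
        mask = (data[:, 0] == x) & (data[:, 1] == y)
        # one Beta prior per parent configuration (x, y)
        p[(x, y)] = mc.Beta('p_%d%d' % (x, y), alpha=1, beta=1)
        # observed Bernoulli variable for Z given this parent configuration
        Z[(x, y)] = mc.Bernoulli('Z_%d%d' % (x, y), p=p[(x, y)],
                                 value=data[mask, 2], observed=True)

M = mc.MCMC(list(p.values()) + list(Z.values()))
M.sample(iter=10000, burn=1000)
# the posterior mean of p_00 then estimates P(Z=1|X=0,Y=0), etc.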
I have a related question here:
How to use pymc to parameterize a probabilistic graphical model?

Related

Does a variational autoencoder make its distribution based only on the latent representation?

If the latent representation of my variational autoencoder (VAE) is r-dimensional and my dataset is x, is the VAE's latent distribution based on r or on x?
If r = 10, does that mean there are 10 means and variances (a multivariate Gaussian), with the distribution coming from the whole dataset x?
Or does r = 10 define one distribution per sample, with every sample trying to follow this distribution?
I'm confused about which one is correct.
A VAE constructs a mapping e(x) -> Z (encoder) and d(z) -> X (decoder). This means that every element of your input space X is mapped through the encoder e(x) into a single, r-dimensional Gaussian. It is not a "mixture"; it is just a single Gaussian with a diagonal covariance matrix.
I'll add my 2 cents to @lejlot's answer.
Your encoder in a VAE maps each sample to a distribution, which in your case has 10 dimensions. That distribution says: "my best estimate of this property of this sample is mu, but I'm not too sure, so consider that it might have variance sigma".
Therefore, you have a distribution for each sample.
However, in order to make sampling easier in a VAE, we ask the VAE to keep these per-sample distributions as close as possible to a known one, namely the standard normal distribution (that way we know "where the distributions are located"; if you check the latent space of a plain autoencoder you will see groups that are far from each other).
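As a rough illustration of the two pieces described above (one diagonal Gaussian per sample, plus the pull toward the standard normal), here is a hedged TensorFlow sketch; mu and log_var stand for hypothetical encoder outputs:
import tensorflow as tf

def sample_latent(mu, log_var):
    # reparameterization trick: draw from the per-sample diagonal Gaussian
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, I)): keeps each per-sample distribution
    # close to the standard normal, as described above
    return 0.5 * tf.reduce_sum(tf.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)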

STAN - Defining priors for dependent random variables

Background: I have a simulation model which has unobserved parameters. I created a metamodel using artificial neural networks (ANN) because the runtime was very long for the simulation model. I am trying to estimate the unobserved parameters using Bayesian calibration, where priors are based on current knowledge, and the likelihood of observing data is being estimated from the metamodel.
Query: I have two random variables X and Y for which I am trying to get the posterior distribution using STAN. The prior distribution of X is uniform, U(0,2). The prior for Y is also uniform, but it will always exceed X i.e., Y ~ U(X,2). Since Y is linked to X, how can I define the prior distribution for Y in STAN such that the constraint Y>X holds? I am new to STAN, so I would appreciate any suggestions or guidance on how to proceed. Thank you so much!
Stan's ordered vectors are what you need. Create an ordered vector of length 2 (I'll call it beta) in the parameters block, like this:
parameters {
  ordered<lower=0,upper=2>[2] beta;
}
Ordered vectors are constrained such that each element is greater than the previous element. So beta[1] will be your estimate of X and beta[2] will be your estimate of Y.
(To make sure I understand your model correctly: you have two parameters, X and Y, and your only prior knowledge about them is that they both lie in [0, 2] and Y > X. X and Y describe some aspect of the distribution of your data - for example, maybe X is the mean of some other random variable Z, for which you have observations. Do I have that right?)
I believe Stan's priors are uniform by default, but you can make sure of this by specifying a prior for beta in the model block:
model {
  beta ~ uniform(0, 2);
  ...
}
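If it helps, here is a hedged Python sketch of running such a model with cmdstanpy; the filename is hypothetical, and the ... in the model block above would still need your likelihood:
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file='ordered.stan')  # file containing the blocks above
fit = model.sample(chains=4, iter_sampling=1000)
print(fit.summary())  # beta[1] is the estimate of X, beta[2] of Y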

How to find the input that maximizes the Neural Network output in Tensorflow

I'm using Tensorflow (2.4) and Keras to build my neural network model. It takes two tensors as inputs and gives a scalar output. The network is already trained and, from now on, has fixed weights. Is it possible, given one of the two inputs, to find the value of the other input that maximizes the output value?
Thank you in advance
In theory, yes.
Let's call your network model f. It takes two inputs x and y and outputs f(x, y). Then, assuming x and f are fixed, you can find the value y* that maximizes f(x, y) as follows:
Calculate the gradient of f with respect to y. Then there are two possibilities:
There exist stationary points. Just set df/dy = 0 and solve for y. This gives the y* at which there is either a maximum or a minimum; compute f(x, y*) to check whether y* gives a maximum or a minimum.
There are no stationary points (or there is no maximum). Here, you need to study where f increases or decreases as y varies: look for df/dy > 0 (increasing) and df/dy < 0 (decreasing). If the function increases asymptotically, simply take y* = a, where a is the closest value to that asymptote you can take (given your data type's precision).
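In practice, for a trained Keras model you would rarely solve df/dy = 0 analytically; a common numerical alternative is gradient ascent on y with a GradientTape. A hedged sketch, where model, x and y_dim are assumptions standing in for your trained network, the fixed input and the free input's size:
import tensorflow as tf

# assumptions: `model` is the trained two-input Keras model, `x` is the fixed
# input tensor, and y_dim is the (hypothetical) size of the free input
y = tf.Variable(tf.random.normal([1, y_dim]))
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

for step in range(1000):
    with tf.GradientTape() as tape:
        loss = -model([x, y])  # minimizing -f(x, y) maximizes f(x, y)
    grads = tape.gradient(loss, [y])
    opt.apply_gradients(zip(grads, [y]))
# y now holds an approximate (and possibly only local) maximizer y*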

How to set bounds and constraints on Tensorflow Variables (tf.Variable)

I am using Tensorflow to minimize a function. The function takes about 10 parameters. Every single parameter has bounds, e.g. a minimum and a maximum value the parameter is allowed to take. For example, the parameter x1 needs to be between 1 and 10.
I also have a pair of parameters that need to have the following constraint x2 > x3. In other words, x2 must always be bigger than x3. (In addition to this, x2 and x3 also have bounds, similarly to the example of x1 above.)
I know that tf.Variable has a "constraint" argument, however I can't really find any examples or documentation on how to use this to achieve the bounds and constraints as mentioned above.
Thank you!
It seems to me (I may be mistaken) that constrained optimization (you can google it in the TensorFlow context) is not exactly the case TensorFlow was designed for. You may want to take a look at this repo; it may satisfy your needs, but as far as I understand it still doesn't solve arbitrary constrained optimization, just some classification problems with labels and features, compatible with precision/recall scores.
If you want to use constraints on a TensorFlow variable (i.e. some function applied after the gradient step, which you can also do manually by taking the variable's values, manipulating them, and reassigning them), it means you will be clipping the variables after each step taken along the gradient in the unconstrained space. It's an open question whether you will reach the right optimization goal this way, or whether your variables will get stuck at the boundaries because the unconstrained gradient points somewhere outside.
My approach 1
If your problem is simple enough, you can try to parametrize your x2 as x2 = x3 + t and then do the clipping in the graph:
x3 = tf.get_variable('x3',
                     dtype=tf.float32,
                     shape=(1,),
                     initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                     constraint=lambda z: tf.clip_by_value(z, 1, 10))
t = tf.get_variable('t',
                    dtype=tf.float32,
                    shape=(1,),
                    initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                    constraint=lambda z: tf.clip_by_value(z, 1, 10))
x2 = x3 + t
Then, on a separate call, additionally clip t so that x2 = x3 + t stays in range (note that x2 is a derived tensor rather than a variable, so it cannot be assigned to directly):
sess.run(tf.assign(t, tf.clip_by_value(x2, 1.0, 10.0) - x3))
But my opinion is that it won't work well.
My approach 2
I would also try to invent some loss terms to keep the variables within their constraints, which is more likely to work. For example, a penalty keeping x2 in the interval [1,10] would be:
loss += alpha * tf.abs(tf.math.tan(((x2 - 5.5) / 4.5) * np.pi / 2))
Here the expression under tan maps [1,10] to (-pi/2, pi/2), and the tan function then grows very rapidly as it approaches the boundaries. In this case I think you're more likely to find your optimum, but if the required value of x2 lies near a boundary, the loss weight alpha might be too big and training will get stuck somewhere nearby; in that case try a smaller alpha.
In addition to the answer by Slowpoke, reparameterization is another option. E.g., let's say you have a parameter p which should be bounded in [lower_bound, upper_bound]; you could write:
p_inner = tf.Variable(...) # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound
However, this will change the behavior of gradient descent.
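For completeness, a hedged TF 2.x sketch combining both ideas: box bounds via tf.Variable's constraint argument (which Keras optimizers apply after each update), and x2 > x3 via the parameterization x2 = x3 + t with t kept strictly positive. The toy objective and all bounds are assumptions:
import tensorflow as tf

box = lambda v: tf.clip_by_value(v, 1.0, 10.0)

x1 = tf.Variable(5.0, constraint=box)  # 1 <= x1 <= 10
x3 = tf.Variable(2.0, constraint=box)  # 1 <= x3 <= 10
t = tf.Variable(1.0, constraint=lambda v: tf.clip_by_value(v, 1e-6, 9.0))

def objective():
    x2 = x3 + t  # x2 > x3 holds by construction
    return (x1 - 3.0)**2 + (x2 - 7.0)**2  # hypothetical toy function

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
for _ in range(100):
    opt.minimize(objective, var_list=[x1, x3, t])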

Projection of fisheye camera model by Scaramuzza

I am trying to understand the fisheye model by Scaramuzza, which is implemented in Matlab, see https://de.mathworks.com/help/vision/ug/fisheye-calibration-basics.html#mw_8aca38cc-44de-4a26-a5bc-10fb312ae3c5
The backprojection (uv to xyz) seems fairly straightforward according to the following equation:
[x, y, z]^T = lambda * [u, v, a0 + a2*rho^2 + a3*rho^3 + a4*rho^4]^T, where rho = sqrt(u^2 + v^2)
However, how does the projection (from xyz to uv) work? In my understanding we get a rather complex set of equations, and unfortunately I can't find any details on that.
Okay, I believe I now fully understand it after analyzing the functions of the (Windows) calibration toolbox by Scaramuzza, see https://sites.google.com/site/scarabotix/ocamcalib-toolbox/ocamcalib-toolbox-download-page
Method 1 found in file "world2cam.m"
For the projection, use the same equation as above. In the projection case the equation has three known variables (x, y, z) and three unknown variables (u, v and lambda). We first substitute lambda with rho by realizing that
u = x/lambda
v = y/lambda
rho = sqrt(u^2 + v^2) = 1/lambda * sqrt(x^2 + y^2)  -->  lambda = sqrt(x^2 + y^2) / rho
After that, the unknown variables are (u, v and rho):
u = x/lambda = x / sqrt(x^2 + y^2) * rho
v = y/lambda = y / sqrt(x^2 + y^2) * rho
z/lambda = z / sqrt(x^2 + y^2) * rho = a0 + a2*rho^2 + a3*rho^3 + a4*rho^4
As you can see, the last equation now has only one unknown, namely rho. Thus we can solve it easily, e.g. using the roots function in Matlab. However, a real, positive solution does not always exist, nor is it necessarily unique. After solving for rho, calculating (u, v) is very simple using the equations above.
This procedure needs to be performed for each point (x,y,z) separately and is thus rather computationally expensive for an image.
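A hedged NumPy sketch of Method 1 for a single point; the coefficients a0, a2, a3, a4 are assumed to come from calibration, and picking the smallest positive real root is a simplification:
import numpy as np

def project(x, y, z, a0, a2, a3, a4):
    m = z / np.sqrt(x**2 + y**2)
    # z/lambda = m*rho = a0 + a2*rho^2 + a3*rho^3 + a4*rho^4
    roots = np.roots([a4, a3, a2, -m, a0])
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    if not real_pos:
        return None  # no valid projection for this point
    rho = min(real_pos)
    s = rho / np.sqrt(x**2 + y**2)
    return x * s, y * s  # (u, v)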
Method 2 found in file "world2cam_fast.m"
The last equation has the form rho(x, y, z). However, if we define m = z / sqrt(x^2 + y^2) = tan(90° - theta), it depends on only one variable, so we can write rho(m).
Instead of solving the equation rho(m) for every new m, the authors evaluate the function for several values of m and fit an 8th-order polynomial to these points. Using this polynomial, they can subsequently compute an approximate value of rho(m) much more quickly.
This becomes clear because "world2cam_fast.m" makes use of ocam_model.pol, which is calculated in "undistort.m"; "undistort.m" in turn makes use of "findinvpoly.m".
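A hedged NumPy sketch of Method 2 under the same assumptions (the coefficients and the range of m are hypothetical), fitting the inverse polynomial once in the spirit of findinvpoly.m:
import numpy as np

def solve_rho(m, a0, a2, a3, a4):
    # same root-solving step as in Method 1
    roots = np.roots([a4, a3, a2, -m, a0])
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    return min(real_pos) if real_pos else np.nan

a = (1.0, 0.001, 0.0, 0.0)        # hypothetical a0, a2, a3, a4
ms = np.linspace(0.1, 10.0, 500)  # hypothetical range of m values
rhos = np.array([solve_rho(m, *a) for m in ms])
ok = ~np.isnan(rhos)
inv_poly = np.polyfit(ms[ok], rhos[ok], deg=8)  # 8th-order fit of rho(m)
# later, per point: rho = np.polyval(inv_poly, m) instead of calling np.roots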