Defining prior for a mix of continuous and binary predictors to run Bayesian linear regression using Markov Chain Monte Carlo - bayesian

I am trying to run a Bayesian linear regression using Markov Chain Monte Carlo in PyMC3. My response is a continuous variable and I have 12 predictors (8 binary and 4 continuous). How do I define the priors for this problem?
I tried setting up the prior as 8 binomial distributions and 4 continuous distributions, but I am not able to frame the equation in the right way.
I checked the following code from pymc3
# Context for the model
with pm.Model() as normal_model:
    # The prior for the model parameters will be a normal distribution
    family = pm.glm.families.Normal()
    # Creating the model requires a formula and data (and optionally a family)
    pm.GLM.from_formula(formula, data=X_train, family=family)
    # Perform Markov Chain Monte Carlo sampling
    normal_trace = pm.sample(draws=2000, chains=2, tune=500, njobs=-1)
In the above code, does 'family = pm.glm.families.Normal()' set the prior for all parameters to a normal distribution with mean zero and sd = 1? How do we change this code to declare priors for the coefficients of the 8 binary and 4 continuous predictors?
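A minimal sketch of one way the priors could be declared, assuming GLM.from_formula accepts a priors dictionary keyed by the formula's coefficient names (the column names 'x1' and 'flag1' below are hypothetical). Note that the priors are placed on the regression coefficients, not on the predictors themselves, regardless of whether a predictor is binary or continuous:
import pymc3 as pm

with pm.Model() as normal_model:
    # Hypothetical per-coefficient priors; the keys must match the formula terms
    priors = {
        'Intercept': pm.Normal.dist(mu=0, sd=10),
        'x1': pm.Normal.dist(mu=0, sd=1),     # coefficient of a continuous predictor
        'flag1': pm.Normal.dist(mu=0, sd=5),  # coefficient of a binary (0/1) predictor
    }
    pm.GLM.from_formula(formula, data=X_train,
                        family=pm.glm.families.Normal(),
                        priors=priors)
    normal_trace = pm.sample(draws=2000, chains=2, tune=500)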

Related

TensorFlow / PyTorch: Gradient for loss which is measured externally

I am relatively new to Machine Learning and Python.
I have a system, which consists of a NN whose output is fed into an unknown nonlinear function F, e.g. some hardware. The idea is to train the NN to be an inverse F^(-1) of that unknown nonlinear function F. This means that a loss L is calculated at the output of F. However, backpropagation cannot be used in a straightforward manner for calculating the gradients and updating the NN weights because the gradient of F is not known either.
Is there any way to use a loss function L, which is not directly connected to the NN, for the calculation of the gradients in TensorFlow or PyTorch? Or to take a loss that was obtained with any other software (Matlab, C, etc.) and use it for backpropagation?
As far as I know, Keras' keras.backend.gradients only allows calculating gradients with respect to connected weights; otherwise the gradient is either zero or NoneType.
I read about the stop_gradient() function in TensorFlow, but I am not sure whether this is what I am looking for. It lets you skip computing the gradient with respect to some variables during backpropagation, but I think the operation F is not interpreted as a variable anyway.
Can I define any arbitrary loss function (including a hardware measurement) and use it for backpropagation in TensorFlow or is it required to be connected to the graph as well?
Please, let me know if my question is not specific enough.
AFAIK, all modern deep learning packages (pytorch, tensorflow, keras etc.) are relying on gradient descent (and its many variants) to train networks.
As the name suggests, you cannot do gradient descent without gradients.
However, you might circumvent the "non differentiability" of your "given" function F by looking at the problem from a slightly different perspective:
You are trying to learn a model M that "counters" the effect of F. So you have access to F (but not its gradients) and a set of representative inputs X={x_0, x_1, ... x_n}.
For each example x_i you can compute y_i = F(x_i) and your end goal is to have a model M that given y_i will output x_i.
Therefore, you can treat y_i as your model's input and compute a loss between M(y_i) and x_i that produced it. This way you do not need to compute gradients through the "black box" F.
In PyTorch-style code it would look something like this:
for x in examples:
    with torch.no_grad():
        y = F(x)                     # apply F to x - output only, no gradients tracked
    pred = M(y)                      # apply the trainable model M to the output of F
    loss = torch.norm(x - pred)      # gradients propagate through M and stop before F
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

How to optimize a simulation metric with deep learning without target values?

I am trying to use an RNN model whose input is a demand matrix and whose output is a set of bus routes. The bus routes are then used in a simulation which spits out a metric of how the routes performed. The question is: since there is no target value for the bus routes, how do I back-propagate the simulation result?
To explain the question with simple python code:
"""
The model is an RNN that takes 400,24,24 matrix as input
dimension 0 represents time, dimension 1 represents departure bus stop and dimension 2 represents the arrival bus stop. Each value is a count of the number of passengers who departed at a bus stop with an arrival bus stop in mind in a specific time
output is 64,24 matrix which will be reshaped to 8,8,24
dimension 0 is the sequence index, dimension 1 is the index of bus (there are 8 buses), dimension 2 is the softmaxed classifier dimension of 24 different bus stops. From the output, 8 bus stops are picked per bus with a sequence
These sequences are then used for path generations of buses and they are evaluated from a simulation
"""
model.train()
optimizer.zero_grad()
out = model(demand)  # out is 64,24; demand is 400,24,24
demand, performance = simulation(out)  # assume performance is a float
# here out has a grad_fn but performance does not
loss = SOME_NUMBER - performance
loss = torch.FloatTensor(loss)
# here I need to back-propagate, and this is the confusing part:
# simply doing loss.backward() does nothing because there is no grad_fn;
# out.backward() requires a 64,24 gradient computed somehow from one metric,
# which causes complete divergence within a few steps
optimizer.step()
How does the model output represent the bus routes? Maybe you could try a reinforcement learning approach. Take a look at Deep Q-Learning: it basically takes an input vector (the state of the system) and outputs an action (usually represented by an index in your output layer), then computes the reward of that action and uses it to train the model (without the need of target values), as sketched below.
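A minimal PyTorch sketch of that idea, under stated assumptions: the sizes, the state tensor and reward_from_simulation are hypothetical placeholders, and the update is a simplified one-step regression of the chosen Q-value towards the observed reward rather than the full DQN target:
import torch
import torch.nn as nn

state_dim, n_actions = 24 * 24, 24           # illustrative sizes only
q_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                      nn.Linear(128, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

state = torch.randn(8, state_dim)            # stand-in for the encoded demand state
q_values = q_net(state)                      # one Q-value per candidate action
action = q_values.argmax(dim=1)              # pick the greedy action per sample
reward = reward_from_simulation(action)      # hypothetical: simulation reward, float tensor of shape (8,)
# regress the chosen Q-value towards the observed reward
# (a one-step, no-bootstrap simplification of the usual DQN target)
chosen_q = q_values.gather(1, action.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(chosen_q, reward)
optimizer.zero_grad()
loss.backward()
optimizer.step()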
Here are some resources that might help you get started:
https://towardsdatascience.com/double-deep-q-networks-905dd8325412
https://arxiv.org/pdf/1802.09477.pdf
https://arxiv.org/pdf/1509.06461.pdf
Hope this was useful.
UPDATE
There is a second option: you could define a custom loss function. Generally these functions take only two arguments, the predicted_y and the target_y. In your case there is no target_y, so you could pass a dummy target_y and not use it inside the function (I assume that you could call your simulation process inside that function and return the metric as the "loss"). Here are examples in PyTorch and Keras.
Keras: Make a custom loss function in keras
PyTorch: PyTorch custom loss function
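A minimal Keras-style sketch of that dummy-target pattern, purely to illustrate the calling convention; metric_from_prediction is a hypothetical placeholder and would have to be built from differentiable TensorFlow ops, because a black-box simulation call inside the loss would still provide no gradients:
def simulation_loss(y_true, y_pred):
    # y_true is the dummy target and is deliberately ignored
    return SOME_NUMBER - metric_from_prediction(y_pred)

model.compile(optimizer='adam', loss=simulation_loss)
model.fit(x_train, dummy_targets, epochs=10)  # dummy_targets is never used by the loss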

Is it possible to use Keras to optimize the coefficients of a mathematical function?

I'm very new to Keras and neural networks in general, and I was wondering: if I had a list of points (x, y) that came from a quadratic function of the form ax^2 + bx + c, is it possible
to feed those points into a neural network and
get the coefficients a, b and c as an output from the network?
I know that I can simply use polynomial regression to achieve my goal; that is not the point.
If you are asking how to do polynomial regression using neural networks, here's the recipe.
Your dataset consists of points (x, y). Design your network to be a fully connected (dense) network with 1 input layer and 1 output layer. The input layer consists of 2 nodes and the output layer consists of 1 node. Then give your network the inputs x and x^2. The output will be computed as:
y = w * X + c
where w is a matrix of learnable parameters. Specifically, it has shape 1x2 since it contains parameters a and b. c is a bias. The input matrix X has shape 2xN, where N is the number of points in your dataset and for each point, the first component is x^2 and the second component is x.
As loss function, use the standard Mean Squared Error loss. As for the optimizer, a simple Stochastic Gradient Descent should work just fine. At convergence, w and c will be good enough to approximate the true quadratic function.
I don't know Keras, but I think it will not be tough to figure out for yourself how to implement this naive network.
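A minimal Keras sketch of that recipe, using an illustrative quadratic (a = 2, b = -1, c = 0.5) made up for the example; after training, the single Dense layer's kernel holds the estimates of b and a and its bias holds c:
import numpy as np
from tensorflow import keras

# illustrative data from y = 2x^2 - x + 0.5
x = np.linspace(-3, 3, 200)
y = 2.0 * x**2 - 1.0 * x + 0.5
X = np.stack([x, x**2], axis=1)          # inputs: [x, x^2]

model = keras.Sequential([keras.layers.Dense(1, input_shape=(2,))])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss='mse')
model.fit(X, y, epochs=500, verbose=0)

kernel, bias = model.layers[0].get_weights()
b_hat, a_hat = kernel[:, 0]              # weights for x and x^2
c_hat = bias[0]
print(a_hat, b_hat, c_hat)               # should approach 2, -1, 0.5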

How to make a selective back-propagation in a mini-batch in Tensorflow?

Recently, I have been working on a project: "predicting future trajectories of objects from their past trajectories by using LSTMs in Tensorflow."
(Here, a trajectory means a sequence of 2D positions.)
Input to the LSTM is, of course, 'past trajectories' and output is 'future trajectories'.
The size of a mini-batch is fixed during training. However, the number of past trajectories in a mini-batch can vary. For example, let the mini-batch size be 10. If I have only 4 past trajectories for the current training iteration, 6 out of the 10 slots in the mini-batch are padded with zeros.
When calculating the loss for back-propagation, I set the loss from those 6 to zero so that only the 4 real trajectories contribute to the back-propagation.
The problem I'm concerned about is that TensorFlow still seems to calculate gradients for the 6 even though their loss is zero. As a result, training becomes slower as I increase the mini-batch size, even though I use the same training data.
I also used the tf.where function when calculating the loss. However, the training time did not decrease.
How can I reduce the training time?
Here I attached my pseudo code for training.
# For each frame in a sequence
for f in range(pred_length):
    # For each element in a batch
    for b in range(batch_size):
        with tf.variable_scope("rnnlm") as scope:
            if (f > 0 or b > 0):
                scope.reuse_variables()
            # for each pedestrian in an element
            for p in range(MNP):
                # ground-truth position
                cur_gt_pose_dec = ...
                # loss mask
                loss_mask_ped = ...  # '1' or '0'
                # go through RNN decoder
                output_states_dec_list[b][p], zero_states_dec_list[b][p] = cell_dec(
                    cur_embed_frm_dec, zero_states_dec_list[b][p])
                # fully connected layer for output
                cur_pred_pose_dec = tf.nn.xw_plus_b(output_states_dec_list[b][p], output_wd, output_bd)
                # go through embedding function for the next input
                prev_embed_frms_dec_list[b][p] = tf.reshape(
                    tf.nn.relu(tf.nn.xw_plus_b(cur_pred_pose_dec, embedding_wd, embedding_bd)),
                    shape=(1, rnn_size))
                # calculate MSE loss
                mse_loss = tf.reduce_sum(tf.pow(tf.subtract(cur_pred_pose_dec, cur_gt_pose_dec), 2.0))
                # only a valid pedestrian's trajectory contributes to the loss
                self.loss += tf.multiply(mse_loss, loss_mask_ped)
I think you're looking for the function tf.stop_gradient. Using this, you could do something like tf.where(loss_mask, tensor, tf.stop_gradient(tensor)) to achieve the desired result, assuming that the dimensions are correct.
However, it looks like this is probably not your issue. It seems as though you are defining new graph nodes for each item in your dataset. This is not how TensorFlow is supposed to be used: you should have only one graph, built beforehand, that performs some fixed function regardless of the batch size. You should definitely not be defining new nodes for every element in the batch, since that cannot efficiently take advantage of parallelism.
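A minimal TF1-style sketch of the masking idea from the first paragraph, with purely illustrative shapes (a batch of 10 two-dimensional poses, of which only the entries flagged in loss_mask are real):
import tensorflow as tf

predictions = tf.placeholder(tf.float32, shape=[10, 2])  # batch of 10 predicted poses
targets = tf.placeholder(tf.float32, shape=[10, 2])
loss_mask = tf.placeholder(tf.bool, shape=[10])          # True for the real (non-padded) items

# padded entries keep their forward value but contribute no gradient
masked_preds = tf.where(loss_mask, predictions, tf.stop_gradient(predictions))
per_item_loss = tf.reduce_sum(tf.square(masked_preds - targets), axis=1)
# also zero out the padded losses so they do not change the loss value itself
loss = tf.reduce_sum(tf.where(loss_mask, per_item_loss, tf.zeros_like(per_item_loss)))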

Can TensorFlow support spiking neurons?

I looked around for tutorials/articles/examples/... to use spiking neurons (e.g. of the SRM/Spike Response Model type) in TensorFlow, but I could not find anything.
Is it possible to simulate these models in TensorFlow at all?
Can TensorFlow simulate models which explicitly depend on time?
Are there any plug-ins/extensions/data files which can add this capability?
Is the GPU supported?
I was also interested in this problem and have done exactly what Pietro mentioned, i.e. took a Matlab implementation of a simplified Hodgkin-Huxley model and converted it to TensorFlow.
Have a look at https://github.com/jotia1/spiking-net-tensorflow
and https://joshuaarnold.com.au/simulating-spiking-nets-in-tensorflow/ for the blog post with some of my thoughts on the whole process (link now broken).
Interested in hearing your thoughts on it.
Yes, TensorFlow can implement spiking neuron models; it is a general-purpose computation framework.
Is there an implementation available? I don't think so, but I have a friend who is interested in this project.
The GPU is supported for many/most of the TensorFlow operations. You'll have to check the docs to see which ones are not supported.
As pointed out by Steven, Tensorflow is a computation framework and as such allows implementing any algorithm.
The main difference between TensorFlow and other computation frameworks like Matlab or numpy/scipy is that it relies on computation graphs: you do not perform the operations directly, but instead build a graph of operations that is later evaluated inside a session.
I was also interested in spiking neurons and TensorFlow and found this question. Like joti, I implemented the same Matlab exercise in TensorFlow (link to my blog post).
Here are, for instance, two operations defining the membrane potential and recovery factor increments, assuming you provide u, v and i:
import tensorflow as tf
import numpy as np

n = 10
SPIKING_THRESHOLD = 35.0
A = tf.constant(0.02)  # recovery time scale (typical Izhikevich value, added so the snippet runs)
B = tf.constant(0.2)   # recovery sensitivity (typical Izhikevich value, added so the snippet runs)

v = tf.placeholder(tf.float32, shape=[n])
u = tf.placeholder(tf.float32, shape=[n])
i = tf.placeholder(tf.float32)

# Evaluate which neurons have reached the spiking threshold
has_fired_op = tf.greater_equal(v, tf.constant(SPIKING_THRESHOLD, shape=v.shape))

# Evaluate membrane potential increment for the considered time interval
# dv = 0 if the neuron fired, dv = 0.04*v*v + 5*v + 140 + I - u otherwise
dv_op = tf.where(has_fired_op,
                 tf.zeros(v.shape),
                 tf.add_n([tf.multiply(tf.square(v), 0.04),
                           tf.multiply(v, 5.0),
                           tf.constant(140.0, shape=v.shape)]) + i - u)

# Evaluate membrane recovery decrement for the considered time interval
# du = 0 if the neuron fired, du = a*(b*v - u) otherwise
du_op = tf.where(has_fired_op,
                 tf.zeros(v.shape),
                 tf.multiply(A, tf.subtract(tf.multiply(B, v), u)))
And you evaluate them like that:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {u: np.full((n), -13.0), v: np.full((n), -65.0), i: 7.0}
    dv, du = sess.run([dv_op, du_op], feed_dict=feed)
Note that this is just an example to illustrate how TensorFlow works, not an actual simulation of spiking neurons: usually you also want to update u and v based on synaptic input (in that case, the placeholders would be the synapse inputs).
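For completeness, a hedged sketch of how the increments above could drive a simple simulation loop (Euler-style updates with a spike reset; the reset values c = -65 and d = 8 are typical Izhikevich parameters added for illustration, not taken from the original post):
v_state = np.full((n), -65.0)
u_state = np.full((n), -13.0)
with tf.Session() as sess:
    for step in range(1000):
        fired, dv, du = sess.run([has_fired_op, dv_op, du_op],
                                 feed_dict={v: v_state, u: u_state, i: 7.0})
        v_state = np.where(fired, -65.0, v_state + dv)           # reset fired neurons to c
        u_state = np.where(fired, u_state + 8.0, u_state + du)   # bump recovery by d after a spike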