Update parameters of Bayesian Network with new data - bayesian

I have a bayesian network, and I know the CPTs by learning the probabilities from existing data.
Suppose I receive a new data instance. Ideally I don't want to use all the data again to update the probabilities.
Is there a way to incrementally update the CPTs of the existing network each time new data comes in?
I think there should be, and I feel like I'm missing something :)

It's easiest to maintain the joint probability table, and rebuild the CPT from that as needed. Along with the JPT, keep a count of how many examples were used to produce it. When adding the nth example, multiply all probabilities by 1 - 1/n, and then add probability 1/n to the new example's associated probability.
If you're going to do this a bunch, you should maintain a count of examples for each row in the JPT instead of a probability. That'll cut down on numerical drift.
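A minimal sketch of the count-based variant described above, assuming discrete variables and a fully observed example each time. The class name `JointCounts`, its methods, and the `rain`/`wet` variables are hypothetical and only illustrate the idea.

```python
from collections import Counter


class JointCounts:
    """Joint table stored as per-row counts rather than probabilities."""

    def __init__(self, variables):
        self.variables = tuple(variables)  # fixed column order of the table
        self.counts = Counter()            # row (tuple of values) -> count
        self.n = 0                         # total number of examples seen

    def add(self, example):
        """Incremental update: only the matching row's count changes."""
        row = tuple(example[v] for v in self.variables)
        self.counts[row] += 1
        self.n += 1

    def joint(self, row):
        """Joint probability of one full assignment."""
        return self.counts[tuple(row)] / self.n if self.n else 0.0

    def conditional(self, child, child_value, parents):
        """CPT entry P(child = child_value | parents), rebuilt from counts."""
        num = den = 0
        for row, count in self.counts.items():
            assignment = dict(zip(self.variables, row))
            if all(assignment[p] == v for p, v in parents.items()):
                den += count
                if assignment[child] == child_value:
                    num += count
        return num / den if den else None  # None: parent configuration unseen


table = JointCounts(["rain", "wet"])
for example in [{"rain": 1, "wet": 1}, {"rain": 1, "wet": 0},
                {"rain": 0, "wet": 0}, {"rain": 1, "wet": 1}]:
    table.add(example)

p_joint = table.joint((1, 1))
p_wet_given_rain = table.conditional("wet", 1, {"rain": 1})
```

Dividing a row's count by `n` gives the same value as repeatedly scaling by 1 - 1/n and adding 1/n, without the accumulated rounding error.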


Should I shift a dataset to use it for regression with LSTM?

Maybe this is a silly question, but I didn't find much about it when I googled it.
I have a dataset that I use for regression, but a normal regression with an FFNN didn't work, so I thought, why not try an LSTM? I think my data is time dependent because it was taken from a vehicle while driving, so the data is monotonic, and maybe I can use an LSTM in this case to do a regression to predict a continuous value (if this doesn't make sense, please tell me).
Now the first step is to prepare my data for the LSTM. Since I'll predict the future, I think my target (ground truth or labels) should be shifted up, am I right?
So if I have a pandas dataframe where each row holds the features and the target (at the end of the row), I assume that the features should stay where they are and the target should be shifted one step up, so that the features in the first row correspond to the target of the second row (am I wrong?).
This way the LSTM will be able to predict the future value from those features.
I didn't find much about this on the internet, so could you please show me how to do this with some code?
I also know that I can use pandas.DataFrame.shift to shift a dataset, but I think the last value will hold a NaN! How do I deal with this? It would be great if you could show me some examples or code.
We might need a bit more information regarding the data you are using. Also, I would suggest starting with a simpler recurrent neural network before you go for LSTMs. The way these networks work is that you feed in the first bit of information, then the next bit, then the next, and so on. Let's say the first bit of information is fed in at time t; then the second bit is fed in at time t+1, and so on, up until time t+n.
You can have the neural network output a value at each time step (so a value is outputted at time t, t+1, ..., t+n after each respective input has been fed in). This is a many-to-many network. Or you can have the neural network output a value after all inputs have been provided (i.e. the value is outputted at time t+n). This is called a many-to-one network. What you need depends on your use case.
For example, say you were recording vehicle behaviour every 100ms and after 10 seconds (i.e. the 100th time step), you wanted to predict the likelihood that the driver was under the influence of alcohol. In this case, you would use a many-to-one network where you put in subsequent vehicle behaviour recordings at subsequent time steps (the first recording at time t, then the next recording at time t+1 etc.) and then the final timestep has the probability value outputted.
If you want a value outputted after every time step, you use a many-to-many design. It's also possible to output a value every k timesteps.
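For the shifting step asked about in the question, a minimal pandas sketch could look like this. The column names and values are made up for illustration; the only assumption is that rows are ordered in time.

```python
import pandas as pd

# Hypothetical vehicle recordings, one row per time step.
df = pd.DataFrame({
    "speed": [10.0, 12.0, 15.0, 14.0],
    "rpm": [1000, 1100, 1300, 1250],
    "target": [0.1, 0.2, 0.3, 0.4],
})

# shift(-1) moves every target one row up, so row i now holds the
# target that was recorded at row i + 1 (the "future" value).
df["target_next"] = df["target"].shift(-1)

# The last row has no future target, so it becomes NaN; drop that row.
supervised = df.dropna(subset=["target_next"]).reset_index(drop=True)
```

Dropping the last row costs one sample per recording, which is usually negligible compared with filling the NaN with an invented value.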

Multiple trained models vs Multiple features and one model

I'm trying to build a regression based M/L model using tensorflow.
I am trying to estimate an object's ETA based on the following:
distance from target
distance from target (X component)
distance from target (Y component)
The object travels on specific journeys. This could be represented as from A->B or from A->C or from D->F (POINT 1 -> POINT 2). There are 500 specific journeys (between a set of points).
These journeys aren't completely straight lines, and every journey is different (ie. the shape of the route taken).
I have two ways of getting around this problem:
I can have 500 different models with 4 features and one label(the training ETA data).
I can have 1 model with 5 features and one label.
My dilemma is that if I use option 1, that's added complexity, but will be more accurate as every model will be specific to each journey.
If I use option 2, the model will be pretty simple, but I don't know if it would work properly. The new feature that I would add is originCode + destinationCode. Unfortunately these are not quantifiable in a way that makes any numerical sense or pattern. They're just text that defines the journey (for journey A->B, the feature would be 'AB').
Is there some way that I can use one model and categorize the features so that one feature is just a 'grouping' feature (in order to separate the training data with respect to the journey)?
In ML, I believe that option 2 is generally the better option. We prefer general models rather than tailoring many models to specific tasks, as that gets dangerously close to hardcoding, which is what we're trying to get away from by using ML!
I think that, depending on the training data you have available and the model size, a one-hot vector could be used to describe the start and end points for the model. E.g., say we have 5 points (ABCDE) and we are going from position B to position C; this could be represented by the vector:
[0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
as in, the first five values correspond to the origin spot whereas the second five are the destination. It is also possible to combine the two halves into a single, shorter vector if you want to reduce your input feature space.
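The origin-then-destination encoding described above can be sketched in a few lines. The helper names `one_hot` and `encode_journey` are hypothetical, and the five points A to E are the ones from the example.

```python
POINTS = ["A", "B", "C", "D", "E"]


def one_hot(point, points=POINTS):
    """Vector with a 1 at the position of `point` and 0 elsewhere."""
    return [1 if p == point else 0 for p in points]


def encode_journey(origin, destination, points=POINTS):
    """First len(points) values encode the origin, the rest the destination."""
    return one_hot(origin, points) + one_hot(destination, points)


journey_bc = encode_journey("B", "C")
print(journey_bc)
```

These values would be appended to the numeric distance features, so a single model sees which journey each row belongs to.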
There are other things to consider, as Scott has said in the comments:
How much data do you have? Maybe the feature space will be too big this way, I can't be sure. If you have enough data, then the model will intuitively learn the general distances (not actually, but intrinsically in the data) between datapoints.
If you have enough data, you might even be able to accurately predict between two points you don't have data for!
If it does come down to not having enough data, then finding representative features of the journey will come into use, ie. length of journey, shape of the journey, elevation travelled etc. Also a metric for distance travelled from the origin could be useful.
Best of luck!
I would be inclined to lean toward individual models. This is because, for a given position along a given route and a constant speed, the ETA is a deterministic function of time. If one moves monotonically closer to the target along the route, it is also a deterministic function of distance to target. Thus, there is no information to transfer from one route to the next, i.e. "lumping" their parameters offers no a priori benefit. This is assuming, of course, that you have several "trips" worth of data along each route (i.e. (distance, speed) collected once per minute, or some such). If you have only, say, one datum per route then lumping the parameters is a must. However, in such a low-data scenario, I believe that including a dummy variable for "which route" would ultimately be fruitless, since that would introduce a number of parameters that rivals the size of your dataset.
As a side note, NEITHER of the models you describe could handle new routes. I would be inclined to build an individual model per route, data quantity permitting, and a single model neglecting the route identity entirely just for handling new routes, until sufficient data is available to build a model for that route.

Should my seq2seq RNN idea work?

I want to predict stock price.
Normally, people would feed the input as a sequence of stock prices.
Then they would feed the output as the same sequence but shifted to the left.
When testing, they would feed the output of the prediction into the next input timestep.
I have another idea, which is to fix the sequence length, for example 50 timesteps.
The input and output are exactly the same sequence.
When training, I replace the last 3 elements of the input with zeros to let the model know that I have no input for those timesteps.
When testing, I would feed the model a sequence of 50 elements. The last 3 are zeros. The predictions I care about are the last 3 elements of the output.
Would this work or is there a flaw in this idea?
The main flaw of this idea is that it does not add anything to the model's learning, and it reduces its capacity, as you force your model to learn the identity mapping for the first 47 steps (50-3). Note that providing 0 as input is equivalent to not providing any input to an RNN: a zero input multiplied by a weight matrix is still zero, so the only sources of information are the bias and the output from the previous timestep, and both are already there in the original formulation. Now the second addition, where we have outputs for the first 47 steps: there is nothing to be gained by learning the identity mapping, yet the network will have to "pay the price" for it, because it needs to use weights to encode this mapping in order not to be penalised.
So in short: yes, your idea will work, but it is nearly impossible to get better results this way than with the original approach. You do not provide any new information and do not really modify the learning dynamics, yet you limit capacity by requesting that the identity mapping be learned per step. On top of that, the identity mapping is an extremely easy thing to learn, so gradient descent will discover this relation first, before even trying to "model the future".
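The point about zero inputs can be checked numerically. This is a sketch with a plain tanh RNN cell in NumPy, not the asker's model; the weight names, sizes, and random values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 1))   # input-to-hidden weights
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b = rng.normal(size=4)          # bias


def rnn_step(x, h):
    """One step of a vanilla RNN cell."""
    return np.tanh(W_x @ x + W_h @ h + b)


h_prev = rng.normal(size=4)

# Feeding a zero input...
with_zero_input = rnn_step(np.zeros(1), h_prev)
# ...gives the same state as a cell that has no input term at all.
without_input = np.tanh(W_h @ h_prev + b)

print(np.allclose(with_zero_input, without_input))
```

The new state depends only on the previous state and the bias, which is exactly the information the standard shifted-sequence setup already uses.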

Inference on several inputs in order to calculate the loss function

I am modeling a perceptual process in tensorflow. In the setup I am interested in, the modeled agent is playing a resource game: it has to choose 1 out of n resources, relying only on the label that a classifier gives to each resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but payoffs depend on the second. There is a function taking the first to the second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier gives labels to n resources.
The agent then gets the payoff of the resource corresponding to the highest label in some predetermined ranking (say, A > B > C > D), and randomly in case of draw.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources. I.e., (Payoff_max - Payoff) / Payoff_max
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in tensorflow? If I am tackling the problem in the wrong way feel free to say so, too.
I don't have much knowledge of the ML aspects of this, but from a programming point of view, I can see two ways of doing it. One is by copying your model n times. All the copies can share the same variables. The outputs of all of these copies would go into some function that determines the highest label. As long as this function is differentiable, variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I have heard about some fancy tricks one can do by using partial_run.
Another way is to use tf.while_loop. It is pretty clever - it stores activations from each run of the loop and can do backprop through them. The only tricky part should be to accumulate the inference results before feeding them to your loss. Take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values
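To make the target quantity concrete, here is a forward-only NumPy sketch of the loss described in the question, computed once all n labels are available. The function name `payoff_loss` and the integer label encoding (0 = A, 1 = B, ...) are assumptions, and payoffs are assumed positive. The hard selection here is not differentiable, so a trainable version would need a smooth replacement for it.

```python
import numpy as np


def payoff_loss(labels, payoffs, rng):
    """(Payoff_max - Payoff) / Payoff_max for the resource the agent picks.

    labels:  integer label per resource, lower = higher rank (0 is 'A').
    payoffs: positive payoff per resource.
    """
    labels = np.asarray(labels)
    payoffs = np.asarray(payoffs, dtype=float)

    best_label = labels.min()
    candidates = np.flatnonzero(labels == best_label)
    chosen = rng.choice(candidates)      # random pick in case of a draw

    max_payoff = payoffs.max()
    return (max_payoff - payoffs[chosen]) / max_payoff


rng = np.random.default_rng(0)
loss = payoff_loss(labels=[2, 0, 1], payoffs=[4.0, 8.0, 10.0], rng=rng)
```

In this example the second resource has the top-ranked label, so the agent receives 8 while the best available payoff is 10.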

Is multiple regression the best approach for optimization?

I am being asked to take a look at a scenario where a company has many projects that they wish to complete but, as with any company, budget comes into play. There is a Y value, a predefined score, with multiple X inputs. There are also 3 main constraints: Capital Costs, Expense Cost and Time for Completion in Months.
The ask is whether an algorithmic approach could be used to optimize which projects should be done for the year given the 3 constraints. The approach should also give different results if the constraint values change. The suggested method is multiple regression. Although I have looked into different approaches in detail, I would like to ask the wider community whether anyone has dealt with a similar problem and what approaches you have used.
First, we should understand that a conclusion is not based on a single argument.
This comes from communication theory: every person builds a frame of knowledge (an understanding or conclusion), and that frame is constructed from many pieces of knowledge or information.
The consequence is that we cannot use single-variable linear regression to build an ML / DL system.
We should use at least two different variables to reach a sub-conclusion. If we insist on a single variable with linear regression (y=mx+c), we are asking the computer to predict with low accuracy. Whatever optimization method you pick, the accuracy stays low, because single-variable linear regression applied to real life amounts to predicting a 'habit' from the data rather than calculating the real condition.
That means we should use multiple linear regression (y=m1x1+m2x2+ ... + c) so that the computer can form a conclusion, i.e. build a regression model. But it is not quite that simple: because the computer is trying to draw a conclusion from data with several characteristics / variances, you must classify both the data and the conclusions.
As an example, try to make the computer understand Pythagoras.
We know that the Pythagorean formula is c=((a^2)+(b^2))^(1/2), and we want the computer to predict the hypotenuse (c) from two input values (a and b). To do that, we build a model, i.e. a multiple linear regression formula, for Pythagoras.
Step 1 is, of course, to create a data set of Pythagorean values with varied characteristics.
This is an example:
a b c
3 4 5
8 6 10
3 14 etc. (try to enter 10 to 20 data points)
Fit a multiple regression to predict c from the a and b values.
You will find that the fit is highly accurate (above 98%) for some data points and not so accurate (under 90%) for others. For example, a=3 with b=14 or b=15 gives a low-accuracy result (under 90%).
So you need to optimize, but how?
I know many optimization methods, but I found that, done manually, if I exclude the data points that give low-accuracy results, put them in a separate group, and then recalculate the regression for that excluded group, I get a much better result. Repeat until you reach the accuracy target you want.
Each data group that gets its own regression is a new class.
This means I end up with several multiple regressions based on the input data (one regression per data group / class), and the accuracy is really high, 99% - 99.99%.
With several classes, each regression functions as the 'label' of its class; this is what happens in the background of the automated computation. With many modules, the user feels they are supplying a 'string' object as a label, but in truth the string object is bound to a regression that is constructed as the label.
With some conditional parameters you can get a good ML model with a minimal amount of training data.
Try it in Excel / LibreOffice before going any further.
Try to follow the tutorial from this video
and apply it to simple data that is easy to construct in Excel, like Pythagoras.
So the answer is yes: multiple regression is the best approach for optimization.
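The Pythagoras experiment described above can also be tried outside a spreadsheet. This is only a sketch of the first fitting step, using NumPy least squares; the (a, b) pairs beyond those listed in the answer are made up, and the grouping and refitting procedure is not implemented here.

```python
import numpy as np

# (a, b) pairs; c is computed from the Pythagorean formula.
data = np.array([[3, 4], [8, 6], [5, 12], [6, 8],
                 [9, 12], [8, 15], [3, 14]], dtype=float)
a, b = data[:, 0], data[:, 1]
c = np.sqrt(a ** 2 + b ** 2)

# Multiple linear regression c ~ m1*a + m2*b + k, solved by least squares.
X = np.column_stack([a, b, np.ones_like(a)])
coef, *_ = np.linalg.lstsq(X, c, rcond=None)

pred = X @ coef
rel_err = np.abs(pred - c) / c   # per-row relative error of the linear fit

for (ai, bi), err in zip(data, rel_err):
    print(f"a={ai:g} b={bi:g} relative error={err:.3%}")
```

Rows with a large relative error are the ones the answer suggests moving into a separate group and refitting.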