How to handle (discrete) time-series of boundary condition in Bayesian estimation of ODE? - bayesian

I want to estimate the parameter in an ordinary differential equation (ODE). However, I don’t know how to input the time series of the boundary condition. It is not a “function”, but a time series (i.e., discrete data points). For example, daily inflow of water when modelling water volume of a lake.
I checked the manual of WinBUGS Differential interface, and it seems that their "Worked Example 2: Population PK Model" offers a solution, which is by ode.block() and piecewise() function:
R31[i] <- piecewise(vec.R31[i, 1:n.block])
vec.R31[i, 1] <- 0
vec.R31[i, 2] <- 0
vec.R31[i, 3] <- dose[i] / TI[i]
vec.R31[i, 4] <- 0
...
list(
... n.block = 4, ...)
where R31[i] can be seen as a time-variable boundary condition, n.block means that there are four sub-period for this boundary condition.
However, this solution can not be applied in my model, since I have a boundary condition whose data can not be divided into only 4 (or several) periods. The boundary condition is a daily-scale time series. Thus, if the simulation is 10 years, then I have 3650 sub-periods.
Is there a way to handle the numeric (i.e., discrete) boundary condition with many data points?

Related

How to add sequential (time series) constraint to optimization problem using python PuLP?

A simple optimization problem: Find the optimal control sequence for a refrigerator based on the cost of energy. The only constraint is to stay below a temperature threshold, and the objective function tries to minimize the cost of energy used. This problem is simplified so the control is simply a binary array, ie. [0, 1, 0, 1, 0], where 1 means using electricity to cool the fridge, and 0 means to turn of the cooling mechanism (which means there is no cost for this period, but the temperature will increase). We can assume each period is fixed period of time, and has a constant temperature change based on it's on/off status.
Here are the example values:
Cost of energy (for our example 5 periods): [466, 426, 423, 442, 494]
Minimum cooling periods (just as a test): 3
Starting temperature: 0
Temperature threshold(must be less than or equal): 1
Temperature change per period of cooling: -1
Temperature change per period of warming (when control input is 0): 2
And here is the code in PuLP
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus, value
from itertools import accumulate
l = list(range(5))
costy = [466, 426, 423, 442, 494]
cost = dict(zip(l, costy))
min_cooling_periods = 3
prob = LpProblem("Fridge", LpMinimize)
si = LpVariable.dicts("time_step", l, lowBound=0, upBound=1, cat='Integer')
prob += lpSum([cost[i]*si[i] for i in l]) # cost function to minimize
prob += lpSum([si[i] for i in l]) >= min_cooling_periods # how many values must be positive
prob.solve()
The optimization seems to work before I try to account for the temperature threshold. With just the cost function, it returns an array of 0s, which does indeed minimize the cost (duh). With the first constraint (how many values must be positive) it picks the cheapest 3 cooling periods, and calculates the total cost correctly.
obj = value(prob.objective)
print(f'Solution is {LpStatus[prob.status]}\nThe total cost of this regime is: {obj}\n')
for v in prob.variables():
print(f'{v.name} = {v.varValue}')
output:
Solution is Optimal
The total cost of this regime is: 1291.0
time_step_0 = 0.0
time_step_1 = 1.0
time_step_2 = 1.0
time_step_3 = 1.0
time_step_4 = 0.0
So, if our control sequence is [0, 1, 1, 1, 0], the temperature will look like this at the end of each cooling/warming period: [2, 1, 0, -1, 1]. The temperature goes up 2 whenever the control input is 1, and down 1 whenever the control input is 1. This example sequence is a valid answer, but will have to change if we add a max temperature threshold of 1, which would mean the first value must be a 1, or else the fridge will warm to a temperature of 2.
However I get incorrect results when trying to specify the sequential constraint of staying within the temperature thresholds with the condition:
up_temp_thresh = 1
down = -1
up = 2
# here is where I try to ensure that the control sequence would never cause the temperature to
# surpass the threshold. In practice I would like a lower and upper threshold but for now
# let us focus only on the upper threshold.
prob += lpSum([e <= up_temp_thresh for e in accumulate([down if si[i] == 1. else up for i in l])]) >= len(l)
In this case the answer comes out the same as before, I am clearly not formulating it correctly as the sequence [0, 1, 1, 1, 0] would surpass the threshold.
I am trying to encode "the temperature at the end of each control sequence must be less than the threshold". I do this by turning the control sequence into an array of the temperature changes, so control sequence [0, 1, 1, 1, 0] gives us temperature changes [2, -1, -1, -1, 2]. Then using the accumulate function, it computes a cumulative sum, equal to the fridge temp after each step, which is [2, 1, 0, -1, 1]. I would like to just check if the max of this array is less than the threshold, but using lpSum I check that the sum of values in the array less than the threshold is equal to the length of the array, which should be the same thing.
However I'm clearly formulating this step incorrectly. As written this last constraint has no effect on the output, and small changes give other wrong answers. It seems the answer should be [1, 1, 1, 0, 0], which gives an acceptable temperature series of [-1, -2, -3, -1, 1]. How can I specify the sequential nature of the control input using PuLP, or another free python optimization library?
The easiest and least error-prone approach would be to create a new set of auxillary variables of your problem which track the temperature of the fridge in each interval. These are not 'primary decision variables' because you cannot directly choose them - rather the value of them is constrained by the on/off decision variables for the fridge.
You would then add constraints on these temperature state variables to represent the dynamics. So in untested code:
l_plus_1 = list(range(6))
fridge_temp = LpVariable.dicts("fridge_temp", l_plus_1, cat='Continuos')
fridge_temp[0] = init_temp # initial temperature of fridge - a known value
for i in l:
prob += fridge_temp[i+1] == fridge_temp[i] + 2 - 3*s[i]
You can then sent the min/max temperature constraints on these new fridge_temp variables.
Note that in the above I've assumed that the fridge temperature variables are defined at one more intervals than the on/off decisions for the fridge. The fridge temperature variables represent the temperature at the start of an interval - and having one extra one means we can ensure the final temperature of the fridge is acceptable.

How to find most similar numerical arrays to one array, using Numpy/Scipy?

Let's say I have a list of 5 words:
[this, is, a, short, list]
Furthermore, I can classify some text by counting the occurrences of the words from the list above and representing these counts as a vector:
N = [1,0,2,5,10] # 1x this, 0x is, 2x a, 5x short, 10x list found in the given text
In the same way, I classify many other texts (count the 5 words per text, and represent them as counts - each row represents a different text which we will be comparing to N):
M = [[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40],
...]
Now, I want to find the top 1 (2, 3 etc) rows from M that are most similar to N. Or on simple words, the most similar texts to my initial text.
The challenge is, just checking the distances between N and each row from M is not enough, since for example row M4 [4,0,8,20,40] is very different by distance from N, but still proportional (by a factor of 4) and therefore very similar. For example, the text in row M4 can be just 4x as long as the text represented by N, so naturally all counts will be 4x as high.
What is the best approach to solve this problem (of finding the most 1,2,3 etc similar texts from M to the text in N)?
Generally speaking, the most widely standard technique of bag of words (i.e. you arrays) for similarity is to check cosine similarity measure. This maps your bag of n (here 5) words to a n-dimensional space and each array is a point (which is essentially also a point vector) in that space. The most similar vectors(/points) would be ones that have the least angle to your text N in that space (this automatically takes care of proportional ones as they would be close in angle). Therefore, here is a code for it (assuming M and N are numpy arrays of the similar shape introduced in the question):
import numpy as np
cos_sim = M[np.argmax(np.dot(N, M.T)/(np.linalg.norm(M)*np.linalg.norm(N)))]
which gives output [ 4 0 8 20 40] for your inputs.
You can normalise your row counts to remove the length effect as you discussed. Row normalisation of M can be done as M / M.sum(axis=1)[:, np.newaxis]. The residual values can then be calculated as the sum of the square difference between N and M per row. The minimum difference (ignoring NaN or inf values obtained if the row sum is 0) is then the most similar.
Here is an example:
import numpy as np
N = np.array([1,0,2,5,10])
M = np.array([[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40]])
# sqrt of sum of normalised square differences
similarity = np.sqrt(np.sum((M / M.sum(axis=1)[:, np.newaxis] - N / np.sum(N))**2, axis=1))
# remove any Nan values obtained by dividing by 0 by making them larger than one element
similarity[np.isnan(similarity)] = similarity[0]+1
result = M[similarity.argmin()]
result
>>> array([ 4, 0, 8, 20, 40])
You could then use np.argsort(similarity)[:n] to get the n most similar rows.

How to find time-varying coefficients for a VAR model by using the Kalman Filter

I'm trying to write some code in R to reproduce the model i found in this article.
The idea is to model the signal as a VAR model, but fit the coefficients by a Kalman-filter model. This would essentially enable me to create a robust time-varying VAR(p) model and analyze non-stationary data to a degree.
The model to track the coefficients is:
X(t) = F(t) X(t− 1) +W(t)
Y(t) = H(t) X(t) + E(t),
where H(t) is the Kronecker product between lagged measurements in my time-series Y and a unit vector, and X(t) fills the role of regression-coefficients. F(t) is taken to be an identity matrix, as that should mean we assume coefficients to evolve as a random walk.
In the article, from W(T), the state noise covariance matrix Q(t) is chosen at 10^-3 at first and then fitted based on some iteration scheme. From E(t) the state noise covariance matrix is R(t) substituted by the covariance of the noise term unexplained by the model: Y(t) - H(t)Xhat(t)
I have the a priori covariance matrix of estimation error (denoted Σ in the article) written as P (based on other sources) and the a posteriori as Pmin, since it will be used in the next recursion as a priori, if that makes sense.
So far i've written the following, based on the articles Appendix A 1.2
Y <- *my timeseries, for test purposes two channels of 3000 points*
F <- diag(8) # F is (m^2*p by m^2 *p) where m=2 dimensions and p =2 lags
H <- diag(2) %x% t(vec(Y[,1:2])) #the Kronecker product of vectorized lags Y-1 and Y-2
Xhatminus <- matrix(1,8,1) # an arbitrary *a priori* coefficient matrix
Q <- diag(8)%x%(10**-7) #just a number a really low number diagonal matrix, found it used in some examples
R<- 1 #Didnt know what else to put here just yet
Pmin = diag(8) #*a priori* error estimate, just some 1-s...
Now should start the reccursion. To test i just took the first 3000 points of one trial of my data.
Xhatstorage <- matrix(0,8,3000)
for(j in 3:3000){
H <- diag(2) %x% t(vec(Y[,(j-2):(j-1)]))
K <- (Pmin %*% t(H)) %*% solve((H%*%Pmin%*%t(H) + R)) ##Solve gives inverse matrix ()^-1
P <- Pmin - K%*% H %*% Pmin
Xhatplus <- F%*%( Xhatminus + K%*%(Y[,j]-H%*%Xhatminus) )
Pplus <- (F%*% P %*% F) + Q
Xhatminus <- Xhatplus
Xhatstorage[,j] <- Xhatplus
Pmin <- Pplus
}
I extracted Xhatplus values into a storage matrix and used them to write this primitive VAR model with them:
Yhat<-array(0,3000)
for(t in 3:3000){
Yhat[t]<- t(vec(Y[,(t-2)])) %*% Xhatstorage[c(1,3),t] + t(vec(Y[,(t-1)])) %*% Xhatstorage[c(2,4),t]
}
The looks like this .
The blue line is VAR with Kalman filter found coefficients, Black is original data..
I'm having issue understanding how i can better evaluate my coefficients? Why is it so off?
How should i better choose the first a priori and a posteriori estimates to start the recursion? Currently, adding more lags to the VAR is not the issue i'm sure, it's that i don't know how to choose the initial values for Pmin and Xhatmin. Most places i pieced this together from start from arbitrary 0 assumptions in toy models, but in this case, choosing any of the said matrixes as 0 will just collapse the entire algorithm.
Lastly, is this recursion even a correct implementation of Oya et al describe in the article? I know im still missing the R evaluation based on previously unexplained errors (V(t) in Appendix A 1.2), but in general?

Flop count for variable initialization

Consider the following pseudo code:
a <- [0,0,0] (initializing a 3d vector to zeros)
b <- [0,0,0] (initializing a 3d vector to zeros)
c <- a . b (Dot product of two vectors)
In the above pseudo code, what is the flop count (i.e. number floating point operations)?
More generally, what I want to know is whether initialization of variables counts towards the total floating point operations or not, when looking at an algorithm's complexity.
In your case, both a and b vectors are zeros and I don't think that it is a good idea to use zeros to describe or explain the flops operation.
I would say that given vector a with entries a1,a2 and a3, and also given vector b with entries b1, b2, b3. The dot product of the two vectors is equal to aTb that gives
aTb = a1*b1+a2*b2+a3*b3
Here we have 3 multiplication operations
(i.e: a1*b1, a2*b2, a3*b3) and 2 addition operations. In total we have 5 operations or 5 flops.
If we want to generalize this example for n dimensional vectors a_n and b_n, we would have n times multiplication operations and n-1 times addition operations. In total we would end up with n+n-1 = 2n-1 operations or flops.
I hope the example I used above gives you the intuition.

Bayesian estimation of log-normal using JAGS

I try to find 95% credible interval of 50 sample means. Sample sizes range from 2 to 600, and the values in each sample are bounded between 1 and 5.
ex:
sample 1 = (1,3.5,2.8,5,4.6)
sample 2 = (1,5)
sample 3 = (4.1,1.1,5,3.5,2,2.4,...)
Samples with size of 10 or more have a lognormal distribution where i used JAGS for Bayesian estimation of log-normal parameters adapted from John K. Kruschke, with model specification as below:
modelstring = "
model {
for( i in 1 : N ) {
y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )
}
sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )
muOfLogY ~ dunif( 0.001*meanOfLogY , 1000*meanOfLogY )
muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)
modeOfY <- exp(muOfLogY-sigmaOfLogY^2)
sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))
}
"
The model works fine with sample size > 10. However, with 3 <= samples < 10 i got extreme values in upper limit (e.g., 3000) which exceeded the maximum possible value of the mean (e.g., 5).
In case of sample size = 2, i got the below error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
I am new to JAGS and can't figure out how to solve this issues. I think for smaples < 10 the distribution is no longer lognormal!
Any ideas?
Thank you
First a semantic note. You are not using JAGS to find sample means. You are using JAGS to find the means of the populations from which the samples arose. If you wanted to find the sample (log)means, you could just take the mean of the (logarithms of the) sample values.
Now, if the values in each sample are bounded between 1 and 5 (due to some external constraint), then the sample is NEVER drawn from a log-normal distribution, which inherently puts probability mass over values greater than five.
Let's imagine, for the sake of saying, that the samples do arise from lognormal sampling (and therefore aren't inherently bounded between 1 and 5). Then JAGS is simply telling you that there is not enough information contained in the sample to get a good estimate of the population mean from which it is drawn. I wouldn't worry about understanding the error when the sample size is two, because there is literally no way to get good inference about the population mean from two samples. This is true even if you know that the population is indeed log-normally distributed. And since your populations are not actually log-normally distributed (they are bounded between 1 and 5) the entire inferential procedure is invalid anyway.