I'm trying to find a better approach than my current one to manage the current state vector of a continuous-time Markov chain. The state vector stores pairs of (state, probability), where probability is a float.
The piece of the algorithm that needs to be optimized does the following operations:
- every iteration starts with a current state vector;
- for every state in the vector it computes the reachable states and stores all of them in a temporary list, together with the probability of going there;
- for every element in this new list it builds the new state vector by iterating over the possible transitions (mind that many transitions, found from different source states, can lead to the same state);
- this is done using hashtables that have the states as keys and the probabilities as values.
So basically, to build up the new vector: for every transition, the normalized value is calculated, the state's entry in the vector is retrieved with a get, the probability of the current transition is added, and the result is stored back.
This approach seemed to work so far, but I'm now trying to cope with systems that lead to very large state vectors (500k to 1M states), so although hashtables give constant-time store and retrieve operations, iterations slow down considerably.
This is because, for example, if we start from a vector with 100k states and compute the reachable states for each one (usually more than one per state), we obtain a transition list with 100k * (avg reachable states) entries. Every transition is then gone through to compute the new probability vector.
Unfortunately, I need to go through the whole reachable list to find the normalizing value before actually computing the next vector, but as I saw through profiling, this is not the bottleneck of the algorithm. The bottleneck appears when every transition is processed.
This is the function used to compute the transition list from the current vector (pi):
HTable.fold (fun s p l ->
    if check s f2 then (0., s, p, [s, 1.0]) :: l
    else if not (check s f1) then (0., s, p, [s, 1.0]) :: l
    else
      let ts = P.rnext s in
      if List.length ts = 0 then (0., s, p, [s, 1.0]) :: l
      else
        let lm = List.fold_left (fun a (s, f) -> f +. a) 0. ts in
        (lm, s, p, ts) :: l
  ) pi []
And this is the function that computes the new pi by going through the transition list and normalizing each transition:
let update_pi s v =
  try
    let t = HTable.find pi s in
    HTable.replace pi s (v +. t)
  with Not_found -> HTable.add pi s v
in
HTable.clear pi;
(* u is the transition list built above; lm is the global normalizing
   value found beforehand by scanning that whole list *)
List.iter (fun (llm, s, p, ts) ->
  if llm = 0. then
    update_pi s p
  else begin
    List.iter (fun (ss, pp) ->
      update_pi ss (p *. (pp /. lm))
    ) ts;
    if llm < lm then update_pi s (p *. (1. -. (llm /. lm)))
  end
) u;
I need to find a data structure better suited to the operations I'm doing. The problem is that with the current approach I have to do a get and a set for every single transition, and doing so many operations on hashtables kills performance, since the constant cost is quite expensive.
It cannot hurt to replace the linear-time test if List.length ts = 0 by the constant-time test if ts = [], although I doubt this is going to solve your performance problem.
Your algorithm sounds a little like multiplying a matrix by a vector to get a new vector. This is usually sped up by blocking, but here the representation in hashtables may be costing you locality. Would it be possible, once and for all, to number all the states, and then use arrays instead of hashtables? Note that with arbitrary transitions the destination states would still not be local, but it may be an improvement (if only because accessing an array is more direct than accessing a hashtable).
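A minimal sketch of that renumbering idea (in Python for brevity; next_states and n_states are placeholders, and for simplicity it normalizes per state, leaving out the global normalizing value from the original code):

import numpy as np

def step(pi, next_states, n_states):
    # pi: dense probability array indexed by state number
    # next_states(i) -> list of (j, rate) pairs, a placeholder for P.rnext
    new_pi = np.zeros(n_states)
    for i in np.nonzero(pi)[0]:               # only states carrying mass
        p = pi[i]
        ts = next_states(i)
        if not ts:
            new_pi[i] += p                    # no outgoing transitions: keep the mass
        else:
            lm = sum(rate for _, rate in ts)
            for j, rate in ts:
                new_pi[j] += p * (rate / lm)  # accumulate by plain array indexing
    return new_pi

Every get/replace pair on the hashtable becomes a single indexed addition, which is both cheaper and more cache-friendly.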
I am currently toying around with meshing voxel chunks, that is, N^3 grids of voxel values. Since in my use-case most of the voxels are of the same type, many neighbors share the same value, so run-length encoding (RLE) makes sense: it drastically cuts down the actual storage requirement at the cost of an O(log(n)) random lookup (n being the number of runs in a chunk). At the same time, I encode each voxel position along a z-order curve, specifically using Morton encoding. This all works in my use-case and gives the expected space reductions.
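For reference, Morton encoding just interleaves the coordinate bits. A naive, unoptimized sketch (in Python; bits=6 is an assumption, enough to cover N=48):

def morton_encode(x, y, z, bits=6):
    # Interleave the bits of x, y, z into one z-order index.
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (3 * i)      # x bits land at positions 0, 3, 6, ...
        idx |= ((y >> i) & 1) << (3 * i + 1)  # y bits at 1, 4, 7, ...
        idx |= ((z >> i) & 1) << (3 * i + 2)  # z bits at 2, 5, 8, ...
    return idx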
The issue is with meshing the individual chunks. Generating a mesh takes up to 500ms (for N=48), which is simply too long for fluid gameplay. My current algorithm heavily borrows from the ones described here and here.
Pseudocode algorithm:
- For each axis in [X, Y, Z]
- for each k in 0..N in axis
- Create a mask of size [N * N]
- for each u in 0..N of orthogonal axis:
- for each v in 0..N of second orthogonal axis:
- Check if (k, u, v) has a face between itself and (k+1, u, v)
- If yes, set mask[u * N + v] = true
- using the mask, make a greedy mesh and output that
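For concreteness, the mask-building step for one axis might look like the sketch below (a Python sketch only; voxel(k, u, v) is a placeholder for the RLE/Morton lookup, and solidity is reduced to zero/non-zero):

N = 48

def build_mask(voxel, k):
    # Faces between slice k and slice k+1 along one axis.
    mask = [False] * (N * N)
    for u in range(N):
        for v in range(N):
            a = voxel(k, u, v) != 0
            b = voxel(k + 1, u, v) != 0 if k + 1 < N else False
            mask[u * N + v] = a != b          # a face exactly where solid meets empty
    return mask

With an O(log n) lookup per voxel, this inner loop performs N^2 searches per slice, which is where the per-mask cost goes.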
The mesh generation itself is very fast (<<< 1ms) and wouldn't gain much from being optimized, but building the mask itself is very costly. As my tracing output shows, each mesh_mask_building takes on average ~4ms, and this happens 3*N times per mesh!
My first thought for optimizing this was to use the chunk's inherent runs: simply traverse them and build a mask from each. This did not work out, as Morton encoding winds a lot throughout the primitive and as such would not be much better for constructing a mesh. It would also be highly suboptimal for a chunk where only one layer is set to visible voxels: the current method generates a simple cuboid, since it does not care about runs, but with my suggested method each run would be separate and would generate more faces.
So my question is, how can I use a morton-run-length encoding to generate a mesh?
When we need to optimize a function on the positive real half-line and we only have unconstrained optimization routines, we substitute y = exp(x) or y = x^2 and optimize over x on the whole real line, i.e., over the log or the (signed) square root of the original variable.
Can we do something similar for linear constraints of the form Ax = b, where, for x a d-dimensional vector, A is an (N, d)-shaped matrix and b is a vector of length N defining the constraints?
While, as Ervin Kalvelaglan says, this is not always a good idea, here is one way to do it.
Suppose we take the SVD of A, getting
A = U*S*V'
where, if A is n x m:
U is n x n orthogonal,
S is n x m, zero off the main diagonal,
V is m x m orthogonal.
Computing the SVD is not a trivial computation.
We first zero out the elements of S which we think are non-zero just due to noise -- which can be a slightly delicate thing to do.
Then we can find one solution x~ to
A*x = b
as
x~ = V*pinv(S)*U'*b
(where pinv(S) is the pseudoinverse of S, i.e., replace the nonzero elements of the diagonal by their multiplicative inverses)
Note that x~ is a least-squares solution to the constraints, so we need to check that it is close enough to being a real solution, i.e., that A*x~ is close enough to b -- another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough you should give up: if the constraints have no solution, neither does the optimisation.
Any other solution to the constraints can be written
x = x~ + sum c[i]*V[i]
where the V[i] are the columns of V corresponding to entries of S that are (now) zero, and the c[i] are arbitrary constants. So we can change variables and optimise over the c[i], and the constraints will be automatically satisfied. However, this change of variables could be somewhat irksome!
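A sketch of this procedure in code (numpy; the tolerance used to zero out "noise" singular values is an arbitrary choice):

import numpy as np

def nullspace_parametrization(A, b, tol=1e-10):
    # Return (x_tilde, V0): every solution of A x = b is x_tilde + V0 @ c.
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol * s[0]))                  # numerical rank of A
    x_tilde = Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])  # V * pinv(S) * U' * b
    if not np.allclose(A @ x_tilde, b, atol=1e-8):   # least-squares solution check
        raise ValueError("constraints have no solution")
    V0 = Vt[r:].T        # columns of V for the (now) zero singular values
    return x_tilde, V0

The optimisation then runs over the unconstrained vector c, and x = x_tilde + V0 @ c satisfies A x = b by construction.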
This is a common problem when you want to introduce randomness but at the same time want your experiment to stick close to the intended probability distribution, and cannot / do not want to count on the law of large numbers.
Say you have programmed a coin with a 50-50 chance for heads/tails. If you simulate it 100 times, most likely you will get something close to the intended 50-50 (a binomial distribution centered at 50-50).
But what if you wanted similar certainty for any number of repeats of the experiment?
A client of ours asked us this ::
We may also need to add some restrictions on some of the randomizations (e.g. if spatial location of our stimuli is totally random, the program could present too many stimuli in some locations and not very many in others. Locations should be equally sampled, so more of an array that is shuffled instead of randomization with replacement).
So they wanted randomness they could control.
Implementation details aside (arrays, vs other methods), the wanted result for our client's problem was the following ::
Always have as close to 1 / N of the stimuli in each of the N potential locations, yet do so in a randomized (hard-to-predict) way.
This is commonly needed in games (when distributing objects, characters, stats, ...) and, I would imagine, in many other applications.
My preferred method for dealing with this is to dynamically weight the intended probabilities based on how the experiment has gone so far. This effectively moves us away from independently drawn variables.
Let p[i] be the wanted probability of outcome i
Let N[i] be the number of times outcome i has happened up to now
Let N be the sum of N[] for all outcomes i
Let w[i] be the correcting weight for i
Let W_Max be the maximum weight you want to assign (ie. when an outcome has occurred 0 times)
Let P[i] be the unnormalized probability for i
Let p_c[i] be the corrected probability for i
p[i] is fixed and provided by the design. N[i] is an accumulation - every time i happens, increment N[i] by 1.
w[i] is given by
w[i] = CalculateWeight(p[i], N[i], N, W_Max)
{
    if (N == 0) return 1;
    if (N[i] == 0) return W_Max;
    intended = p[i] * N
    current = N[i]
    return intended / current;
}
And P[i] is given by
P[i] = p[i] * w[i]
Then we calculate p_c[i] as
p_c[i] = P[i] / sum(P[i])
And we run the next iteration of our random experiment (sampling) with p_c[i] instead of p[i] for outcome i.
The main drawback is that you trade unpredictability for control: after 4 tails in a row, it is highly likely you will see a head.
Note 1 :: The described method will, at any step, provide a distribution close to the original if the experiment's results match the intended ones, or skewed towards (away from) outcomes that have happened less (more) often than intended.
Note 2 :: You can introduce a "control" parameter c and add an extra step.
p_c2[i] = c * p_c[i] + (1-c) * p[i]
For c = 1, this defaults to the described method; for c = 0, it defaults to the original probabilities (independently drawn variables).
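A sketch of the whole scheme (in Python; the function and variable names are mine, not from any library):

import random

def corrected_probs(p, counts, w_max=10.0, c=1.0):
    # Corrected distribution p_c as described above, including the Note 2 blend.
    n = sum(counts)
    w = [1.0 if n == 0 else (w_max if ni == 0 else (pi * n) / ni)  # intended / current
         for pi, ni in zip(p, counts)]
    P = [pi * wi for pi, wi in zip(p, w)]       # unnormalized probabilities
    total = sum(P)
    p_c = [x / total for x in P]
    return [c * pc + (1.0 - c) * pi for pc, pi in zip(p_c, p)]

# usage: sample with the corrected probabilities, then record the outcome
p = [0.25, 0.25, 0.25, 0.25]                    # four equally likely locations
counts = [0] * len(p)
for _ in range(20):
    i = random.choices(range(len(p)), weights=corrected_probs(p, counts))[0]
    counts[i] += 1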
I am trying to find a function h(r) that minimises a functional H(h) using a very simple gradient descent algorithm; the result of H(h) is a single number. (Basically, I have a field configuration in space and I am trying to minimise the energy due to this field.) The field is discretized in space, and the gradient used is the derivative of H with respect to the field's value at each discrete point. I am doing this in Mathematica, but I think the code is easy to follow for non-users of Mathematica.
The function Hamiltonian takes a vector of field values, the spacing of points d, and the number of points imax, and gives the value of energy. EderhSym is the function that gives a table of values for the derivatives at each point. I wrote the derivative function manually to save computation time. (The details of these two functions are probably irrelevant for the question).
Hamiltonian[hvect_, d_, imax_] :=
  Sum[(i^2/2)*d*(hvect[[i + 1]] - hvect[[i]])^2, {i, 1, imax - 1}] +
  Sum[i^2*d^3*Potential[hvect[[i]]], {i, 1, imax}]

EderhSym[hvect_, d_, imax_] :=
  Join[{0},
    Table[2 i^2 d hvect[[i]] - i (i + 1) d hvect[[i + 1]] -
      i (i - 1) d hvect[[i - 1]] + i^2 d^3 Vderh[hvect[[i]]],
      {i, 2, imax - 1}],
    {0}]
The code below shows a single iteration of gradient descent. hvect1 is some starting configuration that I have guessed using physical principles.
Ederh = EderhSym[hvect1, d, imax];
hvect1 = hvect1 - StepSize*Ederh;
The problem is that I am getting random spikes in the derivative table, and the spikes keep growing until there is an overflow. I have tried changing the step size, and I have tried moving averages, low-pass filters, Gaussian filters, etc.; I still get spikes that cause an overflow. Has anyone encountered this? Is it a problem with the way I am setting up gradient descent?
Aside: I am testing my gradient descent code because I will have to adapt it to a different multivariable Hamiltonian, where I do the following to find a saddle point instead (n is an appropriately chosen small number, 10 in my case):
For[c = 1, c <= n, c++,
  Ederh = EderhSym[hvect1, \[CapitalDelta], imax];
  hvect1 = hvect1 - \[CapitalDelta]\[Tau]*Ederh;
];
Ederh = EderhSym[hvect1, \[CapitalDelta], imax];
hvect1 = hvect1 + n \[CapitalDelta]\[Tau]*Ederh;
Edit: It seems to be working with a much smaller step size than I had previously tried (although convergence is slow). I believe that was the problem, but I do not see why the divergences are localised at particular points.
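For what it's worth, a standard guard against this kind of blow-up is to reject any step that increases the energy and shrink the step size; a minimal sketch (in Python, with H and grad_H standing in for Hamiltonian and EderhSym):

def descend(h, H, grad_H, step=1e-3, iters=1000):
    # Gradient descent with a crude backtracking rule on the step size.
    e = H(h)
    for _ in range(iters):
        trial = [hi - step * gi for hi, gi in zip(h, grad_H(h))]
        e_trial = H(trial)
        if e_trial < e:          # accept the step and grow it slightly
            h, e = trial, e_trial
            step *= 1.1
        else:                    # energy rose: step too large, halve it
            step *= 0.5
    return h

This automatically finds a step size small enough for the stiff points where the spikes appear.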
How do you impose a constraint that all values in a vector you are trying to optimize for are greater than zero, using fmincon()?
According to the documentation, I need parameters A and b, where A*x ≤ b, but I think that if I make A a vector of -1's and b = 0, I will have constrained the sum of x to be greater than zero, instead of each element of x (see the illustration below).
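A quick numpy illustration of the difference (made-up numbers): a single row of -1's only bounds the sum, whereas a negated identity matrix bounds every element:

import numpy as np

n = 4
x = np.array([-1.0, 2.0, 0.5, 0.5])             # contains a negative element

A_row = -np.ones((1, n)); b_row = np.zeros(1)   # encodes only sum(x) >= 0
A_eye = -np.eye(n);       b_eye = np.zeros(n)   # encodes x[i] >= 0 for all i

print(np.all(A_row @ x <= b_row))               # True: sum constraint holds anyway
print(np.all(A_eye @ x <= b_eye))               # False: catches the negative x[0]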
Just in case you need it, here is my code. I am trying to optimize over a vector (x) such that the (componentwise) product of x and a matrix (called multiplierMatrix) makes a matrix for which the sum of the columns is x.
function [sse] = myfun(x) % this is a nested function
    bigMatrix = repmat(x,1,120) .* multiplierMatrix;
    answer = sum(bigMatrix,1)';
    sse = sum((expectedAnswer - answer).^2);
end
xGuess = ones(120,1);
[sse xVals] = fmincon(@myfun,xGuess,???);
Let me know if I need to explain my problem better. Thanks for your help in advance!
You can use the lower bound:
xGuess = ones(120,1);
lb = zeros(120,1);
[sse xVals] = fmincon(@myfun,xGuess, [],[],[],[], lb);
Note that xVals and sse should probably be swapped (if their names mean anything).
The lower bound lb means that elements in your decision variable x will never fall below the corresponding element in lb, which is what you are after here.
The empties ([]) indicate you're not using linear constraints (e.g., A,b, Aeq,beq), only the lower bounds lb.
Some advice: fmincon is a pretty advanced function. You'd better memorize the documentation on it, and play with it for a few hours, using many different example problems.