How to model coin-flips with pymc (from Probabilistic Programming and Bayesian Methods for Hackers) - bayesian

Basically asking about https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/issues/150
At https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb the author discusses a model where students are asked if they cheated on a test or not. They flip coins and answer honestly if heads, otherwise answer randomly. Given the number of "yes" answers, how can we get the distribution of cheaters?
The author there basically models this arising from a probability of cheaters, which will give rise to some set of observations of students either cheating or not, which will then give rise to some set of answers via the coin flips, which will then yield some observed probability of answering, "yes, I cheated."
However, instead of letting that observed probability (or just the sum of "yes" answers) be the observation, he THEN models a binomial distribution on top of that, and the count recorded in the experiment is set as the observed value for that distribution.
My questions:
Is this the right thing to do? If so, why?
Assuming it's not, is there a better solution (short of the radically simplified version he presents)?
The general case of this is having an "observed" value for a sum of random variables. People online seem to suggest this is impossible, but I don't get why you couldn't just, e.g., "observe" a draw from a uniform distribution with the mean at your deterministic observation and bounds at +/- epsilon.

Prob/stat problems are subtle. If I am reading your question correctly, the choice between binomial and uniform distributions has to do with the latter part of the scenario setup: it says that if the students flip tails, they answer randomly. This means that an answer of "no" is not as likely as "yes", because while a coin flip is uniformly distributed, the answers that follow are not.
If the student absolutely HAD to answer 'no' on tails, you could definitely use another uniform distribution. Hope that helps!
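For concreteness, here is a minimal sketch of the kind of hierarchical model being discussed, written against the modern PyMC API rather than the PyMC2 code in the notebook. The counts (100 students, 35 "yes" answers) and the random initial values are illustrative assumptions, not taken from the question; the point is the final Binomial, which is exactly the "distribution on top of a deterministic proportion" step being asked about.

```python
# A sketch only: modern PyMC rather than the book's PyMC2; the student count
# and the observed number of "yes" answers are assumed for illustration.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
N = 100           # number of students (assumed)
yes_answers = 35  # observed "yes" count (assumed)

with pm.Model() as model:
    p_cheat = pm.Uniform("p_cheat", 0.0, 1.0)  # true frequency of cheaters
    # Per-student latent variables: did they cheat, plus the two coin flips.
    # Random initial values keep the observed Binomial away from p = 0 or 1
    # at the sampler's starting point.
    cheated = pm.Bernoulli("cheated", p=p_cheat, shape=N,
                           initval=rng.integers(0, 2, N))
    first_flip = pm.Bernoulli("first_flip", p=0.5, shape=N,
                              initval=rng.integers(0, 2, N))
    second_flip = pm.Bernoulli("second_flip", p=0.5, shape=N,
                               initval=rng.integers(0, 2, N))
    # Heads on the first flip: answer honestly; tails: answer the second flip.
    says_yes = first_flip * cheated + (1 - first_flip) * second_flip
    p_yes = pm.Deterministic("p_yes", pm.math.sum(says_yes) / N)
    # The recorded "yes" count is modelled as a Binomial draw at that
    # deterministic rate; this is the step the question asks about.
    obs = pm.Binomial("obs", n=N, p=p_yes, observed=yes_answers)
    idata = pm.sample(1000, tune=1000)
```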

Related

How to test for convergence in a BUGS model?

I want to check the convergence of a BUGS model with the command plot(). An example of the output is in the following figure.
I'm not sure that I can read this output well. Thanks to everyone :)
Unfortunately, it does not look as if you can confirm convergence from the figure that you are showing (EDIT: There is at least some information, see below). The left hand side of the figure is just a caterpillar plot, which effectively just shows the 95% intervals of the distribution for each parameter.
Assessing convergence is a much more nuanced process, as there are multiple ways to decide if your model has converged. What you will want to check is that your model has appropriately explored the parameter space for each parameter (through trace plots, the traceplot function in the coda library), the between- and within-chain variance (the Gelman-Rubin diagnostic, gelman.diag in the coda library), and the auto-correlation in your chains (autocorr.plot in coda). There are a variety of other measures that others have suggested to assess whether your model has converged, and looking through the rest of the coda package will illustrate this.
I highly suggest that you go through the WinBUGS tutorial in their user manual (link to pdf); it has a section that addresses checking model convergence. You want to ensure that the trace plots are well mixed (see the tutorial for what that means), that your Gelman-Rubin diagnostic is < 1.10 for each parameter (a general rule of thumb), and that your chains are not too autocorrelated (high autocorrelation reduces the effective sample size of your chains).
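If you happen to work in Python rather than R, roughly equivalent checks to the coda functions mentioned above are available in the ArviZ package. A minimal sketch, with made-up posterior draws standing in for the output of your sampler:

```python
# Rough Python/ArviZ analogues of the coda checks described above.
# The draws here are fabricated; in practice use your sampler's output.
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"theta": rng.normal(size=(4, 1000))})  # 4 chains x 1000 draws

az.plot_trace(idata)      # trace plots: chains should look well mixed
print(az.rhat(idata))     # Gelman-Rubin R-hat: want values below roughly 1.1
print(az.ess(idata))      # effective sample size, reduced by autocorrelation
az.plot_autocorr(idata)   # within-chain autocorrelation
```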
Good luck, and read up a bit on the subject, it will greatly benefit you if you are interested in Bayesian inference!
Edit
As jacobsocolar pointed out, and I completely missed, the plots available in this question do at least have some information indicating that the model converged. I did not notice the R-hat plot to the right of the caterpillar plot. These values should be less than 1.1 for each parameter if the model did indeed converge. Eyeballing the plot does hint that the model converged, but this would be far easier to see if there were a vertical line at the 1.1 mark on the plot, which there is not.
Your output figure is indeed enough to (begin to) assess convergence, contra M_Fidino's answer. Next to the caterpillar plot there is a plot of 'R-hat' values. These are the Gelman-Rubin statistic, which compares between-chain variance to within-chain variance, and they are all < 1.10.
This is an encouraging first sign that the model has converged, assuming that the initial values were chosen to be nicely overdispersed.
Otherwise, I agree with everything in M_Fidino's answer.

Can variance be replaced by absolute value in this objective function?

Initially I modeled my objective function as follows:
argmin var(f(x),g(x))+var(c(x),d(x))
where f,g,c,d are linear functions
In order to be able to use linear solvers, I modeled the problem as follows:
argmin abs(f(x),g(x))+abs(c(x),d(x))
Is it correct to change variance to absolute value in this context? I'm pretty sure they imply the same meaning, namely having the least difference between the two functions.
You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least-squares and least-absolute-deviations approaches to regression: the variance of two numbers a and b is proportional to (a - b)^2, so your first objective is effectively a squared-difference criterion and your second an absolute-difference one. If that term in your objective function is in any sense an error term, then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all, then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is, though they won't be equally small. If they are similar enough for your purposes, then the fact that absolute values can be linearized could be a good motivation to use them (see the sketch below). On the other hand, if the variance-based objective is really a better expression of what you are interested in, then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all, quadratic programming is not all that much harder than LP, at least below a certain scale.
To sum up: they don't imply the same meaning, but they do imply similar meanings, and whether they are similar enough depends upon your purposes.
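To illustrate the linearization mentioned above: minimizing a sum of absolute differences of linear functions can be rewritten as a linear program by introducing one auxiliary variable per absolute value. A small sketch with scipy.optimize.linprog, where the particular linear functions are made up for the example:

```python
# A sketch of the standard trick: minimize |a1@x - b1| + |a2@x - b2| as an LP.
# The coefficients below are illustrative; they are not from the question.
import numpy as np
from scipy.optimize import linprog

a1, b1 = np.array([1.0, -2.0]), 3.0
a2, b2 = np.array([4.0, 1.0]), -1.0

# Variables: [x1, x2, t1, t2]; minimize t1 + t2 subject to t_i >= |a_i @ x - b_i|.
cost = np.array([0.0, 0.0, 1.0, 1.0])
A_ub = np.array([
    [ a1[0],  a1[1], -1.0,  0.0],   #  (a1 @ x - b1) <= t1
    [-a1[0], -a1[1], -1.0,  0.0],   # -(a1 @ x - b1) <= t1
    [ a2[0],  a2[1],  0.0, -1.0],   #  (a2 @ x - b2) <= t2
    [-a2[0], -a2[1],  0.0, -1.0],   # -(a2 @ x - b2) <= t2
])
b_ub = np.array([b1, -b1, b2, -b2])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 2 + [(0, None)] * 2)
print(res.x[:2], res.fun)   # x minimizing the sum of absolute differences
```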

Finding the best path through a strongly connected component

I have a directed graph which is strongly connected, and every node has some price (positive or negative). I would like to find the best (highest score) path from node A to node B. My solution is some kind of brute force, so it takes ages to find that path. Is there an algorithm for this, or any idea how I can do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not too difficult to implement, and there are plenty of implementations available online.
Dijkstra's algorithm is a special case of A* (the case in which the heuristic function h(x) = 0).
There are other algorithms that can outperform it, but they usually require graph pre-processing. If the problem is not too complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman-Ford algorithm. Detecting negative cycles comes at the cost of performance, though (it is slower than A*). But it may still be better than what you're currently using.
EDIT 2:
User templatetypedef is right when he says the Bellman-Ford algorithm may not work here.
Bellman-Ford works with graphs that have edges with negative weights. However, the algorithm stops upon finding a negative cycle, and I believe that is useful behavior: optimizing the shortest path in a graph that contains a cycle of negative weights would be like going down a Penrose staircase.
What should happen when there is the possibility of a path with "minus infinity" cost depends on the problem.
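For reference, a minimal sketch of the Bellman-Ford behavior described here: it tolerates negative edge weights and reports whether a negative cycle is reachable from the source. The small graph is illustrative only, not the asker's graph:

```python
# Bellman-Ford with negative-cycle detection; the toy graph is illustrative.
def bellman_ford(num_nodes, edges, source):
    """edges: list of (u, v, weight). Returns (dist, has_negative_cycle)."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    # Relax every edge |V| - 1 times.
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more pass: any further improvement means a reachable negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return dist, True
    return dist, False

edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 4)]
print(bellman_ford(4, edges, 0))   # ([0, 4, 1, 5], False)
```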

Converting decision problems to optimization problems? (evolutionary algorithms)

Decision problems are not suited for use in evolutionary algorithms since a simple right/wrong fitness measure cannot be optimized/evolved. So, what are some methods/techniques for converting decision problems to optimization problems?
For instance, I'm currently working on a problem where the fitness of an individual depends very heavily on the output it produces. Depending on the ordering of genes, an individual either produces no output or perfect output - no "in between" (and therefore, no hills to climb). One small change in an individual's gene ordering can have a drastic effect on the fitness of an individual, so using an evolutionary algorithm essentially amounts to a random search.
Some literature references would be nice if you know of any.
Application to multiple inputs and examination of percentage of correct answers.
True, a right/wrong fitness measure cannot evolve towards more rightness, but an algorithm can nonetheless apply a mutable function to whatever input it takes to produce a decision which will be right or wrong. So, you keep mutating the algorithm, and for each mutated version of the algorithm you apply it to, say, 100 different inputs, and you check how many of them it got right. Then, you select those algorithms that gave more correct answers than others. Who knows, eventually you might see one which gets them all right.
There are no literature references, I just came up with it.
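A minimal sketch of that idea: score each candidate by the fraction of test inputs it decides correctly, which gives a graded fitness instead of a single right/wrong. The threshold rules below are illustrative stand-ins for evolved individuals:

```python
# Fitness as "fraction of decisions correct" over many inputs; the candidates
# here are made-up threshold rules, not real evolved individuals.
import random

def fitness(candidate, test_cases):
    """Fraction of (input, expected_decision) pairs the candidate gets right."""
    correct = sum(1 for x, expected in test_cases if candidate(x) == expected)
    return correct / len(test_cases)

# Illustrative decision task: is x above 0.5?
test_cases = [(x, x > 0.5) for x in (random.random() for _ in range(100))]

def candidate_a(x):
    return x > 0.4   # slightly wrong threshold

def candidate_b(x):
    return x > 0.9   # badly wrong threshold

print(fitness(candidate_a, test_cases), fitness(candidate_b, test_cases))
```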
Well, I think you must work on your fitness function.
When you say that some individuals are closer to a perfect solution, can you identify those solutions based on their genetic structure?
If you can do that, a program could do it too, and so you shouldn't rate an individual based on its output but on its structure.

Optimization algorithm question

This may be a simple question for those know-how guys. But I cannot figure it out by myself.
Suppose there is a large number of objects, and I need to select some of them. Each object has two known variables: cost and benefit. I have a budget, say $1000. How could I find out which objects I should buy to maximize the total benefit within the given budget? I want a numeric optimization solution. Thanks!
Your problem is called the "knapsack problem". You can read more on the wikipedia page. Translating the nomenclature from your original question into that of the wikipedia article, your problem's "cost" is the knapsack problem's "weight". Your problem's "benefit" is the knapsack problem's "value".
Finding an exact solution is an NP-complete problem, so be prepared for slow results if you have a lot of objects to choose from!
You might also look into Linear Programming. From MathWorld: "Simplistically, linear programming is the optimization of an outcome based on some set of constraints using a linear mathematical model."
Yes, as stated before, this is the knapsack problem, and I would choose to use dynamic programming.
The key to this problem is storing data so that you do not need to recompute things more than once (if enough memory is available). There are two general ways to go about dynamic programming: top-down and bottom-up. This one is a bottom-up problem.
In general: find base-case values, i.e. the best object to select for a very small budget, then build on this. If we allow ourselves to spend a little more money, what is the best combination of objects for that small increment in budget? Possibilities include keeping what you previously had, taking one new object in place of an old one, or adding another small object that still keeps you under budget.
Like I said, the main idea is not to recompute values. If you follow this pattern, you will work your way up to the full budget and find that the best way to spend X dollars is built from the best solutions for smaller budgets.
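A minimal sketch of the bottom-up table described above, assuming integer costs; the particular costs and benefits are made up for illustration:

```python
# Bottom-up 0/1 knapsack: best[b] holds the maximum benefit achievable with
# budget b, built up from smaller budgets. Costs and benefits are made up.
def knapsack(costs, benefits, budget):
    """Return the maximum total benefit achievable within the budget."""
    best = [0] * (budget + 1)
    for cost, benefit in zip(costs, benefits):
        # Iterate budgets downwards so each object is used at most once.
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + benefit)
    return best[budget]

costs    = [150, 300, 500, 200, 450]   # dollars (illustrative)
benefits = [20,  55,  80,  35,  70]
print(knapsack(costs, benefits, 1000))  # 170 within the $1000 budget
```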