Do forward not taken jumps pollute the branch history table - branch

I've read that one of the penalties for integer overflow checking
is pollution of the branch history table.
I was wondering if it is really necessary.
Assuming the CPU statically predicts a forward branch as not taken
and the branch is indeed not taken. Can't the CPU leave it out of
the branch history table?
This way the branch history table won't be polluted and the branch will be predicted correctly next time anyway.
Does anyone know if this is actually done by some CPUs?
And if not is there a reason why its a bad idea?

Related

Party people come an go: How to use Bayes to know who is in or out?

story/problem
Imagine there are N party guests and a bouncer guarding the only door. Before the party starts, all the guests are outside (that's for sure). However, once the party starts, people come and go. Each time such an event occurs, the bouncer makes a note for each potential guest of the likelihood that it could have been him or her. One could call this score the bouncer's classification confidence. For each event, this is a list of N candidates that adds up to one. All in all, T events have been observed until the next morning.
Unfortunately, some valuables were stolen that night. To narrow down the group of suspects, the host checked the bouncers notes. However, he soon found contradictory and therefore unreliable data: For example, according to the data, the same person entered the place of high confidences twice in a row, which we know is impossible. Therefore, he attempts by cleaning the data from contradictions to improve the quality of the classifications.
where I am stuck/what I tried
First I solved this issue by formulating a Linear Program and solved this in Python. However, as the number of guests increase, this soon becomes computationally infeasible. Therefore want use Bayes' theorem to compute the probability of the guests presence.
Since we have prior beliefs of the guests attending or not and repeated information updates, I felt that a bayesian approach would be the natural way to tackle this problem.
Even though this approach seemed intuitive to me, I faced some problems:
(1) The problem of being 100% sure of the first state
Being 100% sure of a person being outside at time t=0 seems to "break" the Bayesian formular. Let A be person A and data be the information available by the bouncer.
If the prior P(A present) is either 0 or 1 the formular collapses to 0 or 1 respectively. Am I wrong here?
(2) The problem of using all available information
I did not find a way not only to use data that happened before time step t but also information gathered at a later time. E.g. given three candidates the bouncer makes three observations with confidence ct:
cout->in1=(0.5, 0.4, 0.1)T,
cout->in2=(0.4, 0.4, 0.2)T,
cout->in3=(0.9, 0.05, 0.05)T.
Using only information that was available until t, the most plausible solution would be: (A entered, B entered, C entered) in. However, using all the information available this is a better solution: (B entered, C entered, A entered).
(3) The problem of noisy observations.
I was thinking of using a Bernoulli distribution as prior, however, I do not observe binary events but confidences.
Since I'm stuck, I'm looking forward to your help on Bayesian reasoning and ultimately finding the thief.

Using Optaplanner to solve VRPTW with large number of customers and sophisticated constraints

I'm developing a solver for a VRPTW problem using the OptaPlanner and I have faced a problem when large number of customers need to be serviced. By the large number I mean up to 10,000 customers. I have tried running a solver for about 48 hours but no feasible solution was ever reached.
I use a highly customized VRPTW domain model that introduces additional planning entity so-called "Workbreak". Workbreaks are like customers but they can have a location that is actually another planning value - because every day a worker can return home or go to the hotel. Workbreaks have fixed time of departure (usually next day morning), and a variable time of arrival (because it depends on the previous entity within a chain). A hard constraint cares about not allowing to "arrive" to the Workbreak after certain point of time. There are other hard constraints too, like:
multiple service time windows per customer
every week the last customer in chain must be a special customer "storage space visit" (workers need to gather materials before the next week)
long jobs management (when a customer needs to be serviced longer than specified time it should be serviced before specific hour of a day)
max number of jobs per workday
max total job duration per workday (as worker cannot work longer than specified time)
a workbreak cannot have a location of a hotel that is too close to worker's home.
jobs can not be serviced on Sundays
... and many more - there is a total number of 19 hard constrains that have to be applied. There are 3 soft constraints too.
All the aforementioned constraints were initially written as Drools rules, but because of many accumulation-based constraints (max jobs per day, max hours per day, overtime hours per week) the overall speed of the solver (benchmarks) was about 400 step/sec.
At first I thought that solver's speed is too slow to reach a feasible solution in a reasonable time, so I have rewritten all rules into easy score calculator, and it had a decent speed - about 4600 steps/sec. I knew that is will only perform best for a really small number of customers, but I wanted to know if the Drools was the cause of that poor performance. Then I have rewritten all these rules into incremental score calculator (and survived the pain of corrupted score bugs until all of them were successfully fixed). Surprisingly incremental score calculation is a bit slower for a small number of customers, comparing to easy score calculator, but it is not an issue, because overall speed is about 4000 steps/sec - no matter how many entities I have.
The thing that bugs me the most is that above a certain number of customers (problems start at 1000 customers) the solver cannot reach feasible solution. Currently I'm using Late Acceptance and Step Counting algorithms, because they perform really good for this kind of a problem (at least for a less number of customers). I used Simulated Annealing too, but without success, mostly because I could not find good values for algorithm specific parameters.
I have implemented some custom moves too:
Composite move that changes workbreak's location when sibling entities are changed using other moves like change/swap moves (it helps escaping many score traps, as improving step usually needs at least two moves to be performed in a single step)
Move factory for better long jobs assignment (it generates moves that tries to put customers with longer service time in the front of a workday chain)
Workbreak assignment move factory (it generates moves that helps putting workbreaks in proper sequence)
Now I'm scratching my head, and wondering what I should do to diagnose the source of my problem. I suspected that maybe it was hitting a score trap, but I have modified the solver so it saves snapshots of best score each minute. After reading these snapshots I realized that the score was still decreasing. Can the number of hard constraints play the role? I suspect that many moves need to be performed to find out a move that improves the score. The fact is that maybe 48 hours isn't that much for this kind of a problem, and it should make computations a whole week? Unfortunately I have nothing to compare with.
I would like to know how to find out if it is solely a performance problem, or a solver (algorithm, custom moves, hard/soft score) configuration problem.
I really apologize for my bad English.
TL;DR but FWIW:
To scale above 1k locations you need to use NearBy selection.
To scale above 10k locations, add Partitioned Search too.

Flat reservations optimization system

i have a booking system and i need to write a script that optimize the reservations.
When a customer book a flat, the system assigns the first flat available. The problem is that after a few reservations my "grid" become fragmented.
Grid example:
Grid example
In practice I need to minimize the white space so as I can accept the maximum number of reservations.
My question is: there are some know problem that fit my problem? I had thought to some knapsack problem variations.
I can provide more info if needed.
Thanks.
This is a scheduling problem. One very important question is: can you reassign a flat to a different number once the reservation has been made?
If the answer is yes, you will find a solution if and only if you have no day with more reservations than you have flats: simply take the first idle slot of the very first day d1, and if there is a conflict with a future reservation, reallocate the future reservation by taking the first idle slot of its very first day d2 (note that d2>d1), and your algorithm will converge because you will have a strictly increasing sequence of days where you need reallocation.
If the answer is no, we come into a tricky world where your algorithm has to guess what would be the future reservations. A good heurisic, I think, would be to score the placements. For instance, you can check how many iddle slots you leave before and after the reservation, and take the option that leaves a few empty slots as possible.

Optaplanner and real time replanning without simple backup planning, minimising changes

If I have the following situation - a kind of "Travelling Technician" problem modeled on the vehicle routing but instead of vehicles its is technicians traveling to sites.
We want to:
generate a plan for the week ahead
send that plan to each of the technicians and sites with who is visiting, why and when
So far all ok, we generate the plan for the week..
But on Tuesday a technician phones in ill (or at 11:30 the technicians car breaks down). Assume we do not have a backup (so simple backup planning will not work). How can I redo the plan minimising any changes? Basically keeping the original plan constraints but adding a constraint that rewards keeping as close to the original plan as possible and minimising the number of customers that we upset.
Yes, basically every Entity has an extra field which holds the original planning variable value. That extra field is NOT a planning variable itself. Then you add rules which says that if the plannign variable != original value, it inflicts a certain soft cost. The higher the soft cost, the less volatile your schedule is. The lower the soft cost, the more flexible your schedule is towards the new situation.
See the MachineReassignment example for an example implementation. That actually has 3 types of these soft costs.

Machine Learning challenge: technique for collect the coins

Suppose that there is a company that own a couple of vending machines that collect coins. When the coin safe is full, the machine can not sell any new item. To prevent this, the company must collect the coins before that. But if the company send the technician too early, the company loses money because he made an unnecessary trip. The challenge is to predict the right time to collect coins to minimize the cost of operation.
At each visit (to collect or other operations), a reading of the level of coins in the safe is performed. This data contains historical information regarding the safe filling for each machine.
What is the best ML technique, approach to this problem computationally?
This is the two parts to the problem I see:
1) vending machine model
I would probably build a model for each machine using the historic data. Since you said a linear approach is probably not good, you need to think about things that have influence on the filling of a machine, i.e. time related things like week-day dependency, holiday dependency, etc., other influences like the weather maybe? So you need to attach these factors to the historic data to make a good predictive model. Many machine learning techniques can help creating a model and finding real data correlations. Maybe you should create despriptors from your historical data and try to correlate these to the filling state of a machine. PLS can help reducing the descriptor space and find relevant ones. Neuronal Networks are great if you really have no clue about the underlying math of a correlation. Play around with it. But pretty much any machine learning technique should be able to come up with a decent model
2) money collection
Model the cost for a random trip of the technician to a machine. Take into account the filling grade of the machines and the cost of the trip. You can send the technician on virtual collecting tours and calculate the total cost of collecting the money and the revenues from the machine. Use again maybe a neuronal network with some evolutionary strategy to find an optimum of trips and times. you can use the model of the filling grade of the machines during the virtual optimization, since you probably need to estimate the filling grade of the machines in these virtual collection rounds.
Interesting problems you have...