Uncapacited Facility Location optimized brute force - optimization

How can I optimize a brute force method to find optimal solution for UFL problem? My solution is working very slowly.
If you already know the UFL problem, you can just jump the following description.
There is a graph G. We can divide G's vertices in 2 subsets C and F.
C is the subset of clients and F the subset of facilities.
Every client has a distance to every facility, that is, dij is the distance from client i to facility j.
Every facility i has a cost fi to open
Every client i needs ci objects (from some facility)
Every client i must be served by only one facility j, at the price of (dij * ci)
We want to minimize the overall cost (to serve all clients and open the necessary facilities)
My solution is as simple as possible: test all possibilities of associating clients and facilities, and this is very bad, given that, for example, if I had 10 clients and 5 facilities, there would be 5^10 possibilities.
How can I optimize this? I thought about some preprocessings but I got confused with it because of the fi and I still didn't come up with anything.

Related

Traveling salesman ampl

I am working on a Traveling salesman problem and can't figure how to solve it. The problem contains ten workers, ten workplaces where they should be driven to and one car driving them one by one. There is a cost of $1.5 per km. Also, all the nodes (both workers and workplaces) are positioned in a 10*10 matrix and the distance between each block in the matrix is 1 km.
The problem should be solved using AMPL.
I have already calculated the distances between each coordinate in excel and have copy pasted the matrix to the dat.file in AMPL.
This is my mod.file so far (without the constrains):
param D > 0;
param D > 0;
set A = 1..W cross 1..D;
var x{A}; # 1 if the route goes from person p to work d,
# 0 otherwise
param cost;
param distance;
minimize Total_Cost:
sum {(w,d) in A} cost * x[w,d];
OK, so your route looks like: start-worker 1-job 1-worker 2-job 2-worker 3-job-3-...-job 10-end (give or take start & end points, depending on how you formulate the problem.
That being the case, the "worker n-job n" parts of your route are predetermined. You don't need to include "worker n-job n" costs in the route optimisation, because there's no choice about those parts of the route (though you do need to remember them for calculating total cost, of course).
So what you have here is really a basic TSP with 10 "destinations" (each representing a single worker and their assigned job) but with an asymmetric cost matrix (because cost of travel from job i to worker j isn't the same as cost of travel from job j to worker i).
If you already have an implementation for the basic TSP, it should be easy to adapt. If not, then you need to write one and make that small change for an asymmetric cost matrix. I've seen two different approaches to this in AMPL.
2-D decision matrix with subtour elimination
Decision variable x{1..10,1..10} is defined as: x[i,j] = 1 if the route goes from job i to job j, and 0 otherwise. Constraints require that every row and column of this matrix has exactly one 1.
The challenging part with this approach is preventing subtours (i.e. the "route" produced is actually two or more separate cycles instead of one large cycle). It sounds like your current attempt is at this stage.
One solution to the problem of subtours is an iterative approach:
Write an implementation that includes all requirements except for subtour prevention.
Solve with this implementation.
Check the resulting solution for subtours.
If no subtours are found, return the solution and end.
If you do find subtours, add a constraint which prevents that particular subtour. (Identify the arcs involved in the subtour, and set a constraint which implies they can't all be selected.) Then go to #2.
For a small exercise you may be able to do the subtour elimination by hand. For a larger exercise, or if your lecturer doesn't like that approach, you can create a .run that automates it. See Bob Fourer's post of 31/7/2013 in this thread for an example of implementation.
3-D decision matrix with time dimension
Under this approach, you set up a decision variable x{1..10,1..10,1..10} where x[i,j,t] = 1 if the route goes from job i to worker j at time t, and 0 otherwise. Your constraints then require that the route goes to and from each job/worker combination exactly once, that if it goes to worker i at time t then it must go from job i at time t+1 (excepting first/last issues), that it's doing exactly one thing at time t, and that the endpoint at time 10 matches the startpoint at time 1 (assuming you want a circuit).
This prevents subtours, because it forces a route that starts at some point at time 1, returns to that point at time 10, and doesn't visit any other point more than once - meaning that it has to go through all of them exactly once.

Can proc sql embedded in sas macros dynamically merge to data-sets, simulating residential treatment placement decisions for trouble youth?

Good afternoon and happy Friday, folks
I’m trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as “not recidivating” within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, etc., LOS). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping qualifications.
Let’s take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have “predicted success rates” of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older; therefore, Johnny would not qualify for treatment here. Alternatively, for Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location-B beds are filled; so, we must now look to location D, as it is now Johnny’s “best available” option at 75%.
The score so far: We are matching youth to available beds in location for which they qualify and might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds different across locations. The qualifications of entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs 14-20 year-olds).
In order to simulate what placement decisions might look like based on success rates, I went through the scenario describe above for over 400 youth, by hand, in excel. It took me about a week. I’d like to use PROC SQL imbedded in a SAS MACRO to automate these placement scenarios with the ultimate goals of a) obtain the ability to bootstrap iterations in order to examine effect sizes across distributions, b) save time, and c) prevent further brain damage from banging my head again desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity—nay—the privilege of using SQL in my typical roll as a researcher, I believe that this time has now come to pass and I’m excited about it! Honestly. I believe it has the capacity I’m looking for. Unfortunately, it is beating the devil out of me!
Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that.
I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics, location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset (was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently place into treatment). Note that I will be “cleaning” the youth dataset prior to merging such that rank-column cells will only be populated for programs for which a respective youth qualifies. This should take the “does the youth even qualify for the program” problem out of the equation.
However, it still leaves the issue of availability left to be contended with in the scenario.
The second dataset containing the treatment facility beds, with each row corresponding to an available bed in one of the treatment location; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but locations will populate several cells.
Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, and so the merge/join/switch/thing should take place
on youth.Rank1= distinct TF.Location,
and if youth.Rank1≠ TF.location then
merge on youth.Rank2= TF.location,
if youth.Rank2≠ TF.location then merge at
youth.Rank3 = TF.location, etc.
Put plainly: “Merge on rank1 unless rank1 location is no longer available, then merge on rank2, unless rank2 location is no longer available, and on down the line, etc., etc., until all option are exhausted and foster care (i.e., alternative services). Is the only option.
I’ve had no success getting this to work. I haven’t even been successful getting the union function to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the “where” command:
proc sql; /Calls the SQL procedure*/;
create table x as /*Tells SAS to create a table called x*/
select /*Specifies the column(s) to be selected*/
from /*Specificies the tables(s) (data sets) to be queried*/
where /*Subjests the data based on a condition*/
group by /*Classifies the data into groups based on the specified
column(s)*/
order by /*Sorts the resulting rows observations) by the specified
column(s)*/
; quit; /*Ends the proc sql procedure*/
Frankly, I’m stuck and I could use some advice. This greenhorn in me is in way over his head.
I appreciate any help or guidance anyone might lend.
Cheers!
P
The process you describe (and to be honest I skiped to the end so I might of missed something) does not lend itself to SQL because each step could affect the results of the next one. However, you want to get the most best results for the most kids. (I think a lot of that text was to convince us how important it is to help out). You don't actually give us anything we can really use to help since you don't give any details of your data model, your data, or expected results. There really is no way to answer this question. But I don't care -- I'm going to go forward with some suggestions because it is a friday and I've never done a stream of consciousness answer to a stream of consciousness question before. I will suggest you don't formulate your solution just in sql, but instead use a higher level program and engage is a process like the one described below -- because this a DB questions I've noted the locations where the DB might be involved.
Generate a list kids (this can be in a table -- called NEEDY-KID)
Have a list of locations to assign (this can also be a table LOCATION)
Run your matching for best fit from KID to location -- at this point don't worry about assign more than one kid to a location -- there can be duplicates (put this in table called KID2LOC using a query)
Check KID2LOC for locations assigned twice -- use some method to remove the duplicate ones so each loc is only assigned once. (remove from the KID2LOC using a query)
Prune the LOCATION list to remove assigned locations (once again -- a query)
If kids exist without a location go to 3 with new pruned location list.
Done.

Recursive Hierarchy Ranking

I have no idea if I wrote that correctly. I want to start learning higher end data mining techniques and I'm currently using SQL server and Access 2016.
I have a system that tracks ID cards. Each ID is tagged to one particular level of a security hierarchy, which has many branches.
For example
Root
-Maintenance
- Management
- Supervisory
- Manager
- Executive
- Vendors
- Secure
- Per Diem
- Inside Trades
There are many other departments like Maintenance, some simple, some with much more convoluted, hierarchies.
Each ID card is tagged to a level so in the Maintenance example, - Per Diem:Vendors:Maintenance:Root. Others may be just tagged to Vendors, Some to general Maintenance itself (No one has root, thank god).
So lets say I have 20 ID Cards selected, these are available personnel I can task to a job but since they have different area's of security I want to find a commonalities they can all work on together as a 20 person group or whatever other groupings I can make.
So the intended output would be
CommonMatch = - Per Diem
CardID = 1
CardID = 3
CommonMatch = Vendors
CardID = 1
CardID = 3
CardID = 20
So in the example above, while I could have 2 people working on -Per Diem work, because that is their lowest common security similarity, there is also card holder #20 who has rights to the predecessor group (Vendors), that 1 and 3 share, so I could have three of them work at that level.
I'm not looking for anyone to do the work for me (Although examples always welcome), more to point me in the right direction on what I should be studying, what I'm trying to do is called, etc. I know CTE's are a way to go but that seems like only a tool in a much bigger process that needs to be done.
Thank you all in advance
Well, it is not so much a graph-theory or data-mining problem but rather a data-structure problem and one that has almost solved itself.
The objective is to be able to partition the set of card IDs into disjoint subsets given a security clearance level.
So, the main idea here would be to layout the hierarchy tree and then assign each card ID to the path implied by its security level clearance. For this purpose, each node of the hierarchy tree now becomes a container of card IDs (e.g. each node of the hierarchy tree holds a) its own name (as unique identification) b) pointers to other nodes c) a list of card IDs assigned to its "name".)
Then, retrieving the set of cards with clearance UP TO a specific security level is simply a case of traversing the tree from that specific level downwards until the tree's leafs, all along collecting the card IDs from the node containers as they are encountered.
Suppose that we have access tree:
A
+-B
+-C
D
+-E
And card ID assignments:
B:[1,2,3]
C:[4,8]
E:[10,12]
At the moment, B,C,E only make sense as tags, there is no structural information associated with them. We therefore need to first "build" the tree. The following example uses Networkx but the same thing can be achieved with a multitude of ways:
import networkx
G = networkx.DiGraph() #Establish a directed graph
G.add_edge("A","B")
G.add_edge("A","C")
G.add_edge("A","D")
G.add_edge("D","E")
Now, assign the card IDs to the node containers (in Networkx, nodes can be any valid Python object so I am going to go with a very simple list)
G.node["B"]=[1,2,3]
G.node["C"]=[4,8]
G.node["E"]=[10,12]
So, now, to get everybody working under "A" (the root of the tree), you can traverse the tree from that level downwards either via Depth First Search (DFS) or Breadth First Search (BFS) and collect the card IDs from the containers. I am going to use DFS here, purely because Networkx has a function that returns the visited nodes depending on visiting order, directly.
#dfs_preorder_nodes returns a generator, this is an efficient way of iterating very large collections in Python but I am casting it to a "list" here, so that we get the actual list of nodes back.
vis_nodes = list(networkx.dfs_preorder_nodes(G,"A")); #Start from node "A" and DFS downwards
cardIDs = []
#I could do the following with a one-line reduce but it might be clearer this way
for aNodeID in vis_nodes:
if G.node[aNodeID]:
cardIDs.extend(G.node[aNodeID])
In the end of the above iteration, cardIDs will contain all card IDs from branch "A" downwards in one convenient list.
Of course, this example is ultra simple, but since we are talking about trees, the tree can be as large as you like and you are still traversing it in the same way requiring only a single point of entry (the top level branch).
Finally, just as a note, the fact that you are using Access as your backend is not necessarily an impediment but relational databases do not handle graph type data with great ease. You might get away easily for something like a simple tree (like what you have here for example), but the hassle of supporting this probably justifies undertaking this process outside of the database (e.g, use the database just for retrieving the data and carry out the graph type data processing in a different environment. Doing a DFS on SQL is the sort of hassle I am referring to above.)
Hope this helps.

Creating a testing strategy to check data consistency between two systems

With a quick search over stackoverflow was not able to find anything so here is my question.
I am trying to write down the testing strategy for a application where two applications sync with each other every day to keep a huge amount of data in sync.
As its a huge amount of data I don't really want to cross check everything. But just want to do a random check every time a data sync happens. What should be the strategy here for such system?
I am thinking of this 2 approach.
1) Get a count of all data and cross check both are same
2) Choose a random 5 data entry and verify that their proprty are in sync.
Any suggestion would be great.
What you need is known as Risk Management, in Software Testing it is called Software Risk Management.
It seems your question is not about "how to test" what you are about to test but how to describe what you do and why you do that (based on the question I assume you need this explanation for yourself too...).
Adding SRM to your Test Strategy should describe:
The risks of not fully testing all and every data in the mirrored system
A table scaling down SRM vs amount of data tested (ie probability of error if only n% of data tested versus -e.g.- 2n% tested), in other words saying -e.g.!- 5% of lost data/invalid data/data corrupption/etc if x% of data was tested with a k minute/hour execution time
Based on previous point, a break down of resources used for the different options (e.g. HW load% for n hours, manhours used is y, costs of HW/SW/HR use are z USD)
Probability -and cost- of errors/issues with automation code (ie data comparison goes wrong and results in false positive or false negative, giving an overhead to DBA, dev and/or testing)
What happens if SRM option taken (!!e.g.!! 10% of data tested giving 3% of data corruption/loss risk and 0.75% overhead risk -false positive/negative results-) results in actual failure, ie reference to Business Continuity and effects of data, integrity, etc loss
Everything else comes to your mind and you feel it applies to your *current issue* in your *current system* with your *actual preferences*.

optimizing a function to find global and local peaks with R

Y
I have 6 parameters for which I know maxi and mini values. I have a complex function that includes the 6 parameters and return a 7th value (say Y). I say complex because Y is not directly related to the 6 parameters; there are many embeded functions in between.
I would like to find the combination of the 6 parameters which returns the highest Y value. I first tried to calculate Y for every combination by constructing an hypercube but I have not enough memory in my computer. So I am looking for kinds of markov chains which progress in the delimited parameter space, and are able to overpass local peaks.
when I give one combination of the 6 parameters, I would like to know the highest local Y value. I tried to write a code with an iterative chain like a markov's one, but I am not sure how to process when the chain reach an edge of the parameter space. Obviously, some algorythms should already exist for this.
Question: Does anybody know what are the best functions in R to do these two things? I read that optim() could be appropriate to find the global peak but I am not sure that it can deal with complex functions (I prefer asking before engaging in a long (for me) process of code writing). And fot he local peaks? optim() should not be able to do this
In advance, thank you for any lead
Julien from France
Take a look at the Optimization and Mathematical Programming Task View on CRAN. I've personally found the differential evolution algorithm to be very fast and robust. It's implemented in the DEoptim package. The rgenoud package is another good candidate.
I like to use the Metropolis-Hastings algorithm. Since you are limiting each parameter to a range, the simple thing to do is let your proposal distribution simply be uniform over the range. That way, you won't run off the edges. It won't be fast, but if you let it run long enough, it will do a good job of sampling your space. The samples will congregate at each peak, and will spread out around them in a way that reflects the local curvature.