Constraining on a Minimum of Distinct Values with PuLP - optimization

I've been using the PuLP library for a side project (daily fantasy sports) where I optimize the projected value of a lineup based on a series of constraints.
I've implemented most of them, but one constraint is that players must come from at least three separate teams.
This paper has an implementation (page 18, 4.2), which I've attached as an image:
It seems that they derive an indicator variable for each team that is one if that team has at least one player in the lineup, and then constrain the sum of those indicators to be greater than or equal to 3.
Does anybody know how this would be implemented in PuLP?
Similar examples would also be helpful.
Any assistance would be super appreciated!

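In this case you would define a binary variable t for each team that is bounded above by the sum of that team's x variables, so t can only be 1 if at least one player from the team is selected. In Python I don't like to name variables with a single letter, but as I have nothing else to go on, here is how I would do it in PuLP.
from pulp import LpVariable, LpBinary, lpSum
# Assume prob, lineups, players, players_by_team and teams are set somewhere else.
x_index = [(i, p) for i in lineups for p in players]
t_index = [(i, l) for i in lineups for l in teams]
# x[(i, p)] = 1 if player p is in lineup i; t[(i, l)] = 1 only if team l is used in lineup i.
x = LpVariable.dicts("x", x_index, cat=LpBinary)
t = LpVariable.dicts("t", t_index, cat=LpBinary)
for i in lineups:
    for l in teams:
        # The indicator cannot exceed the number of selected players from team l.
        prob += t[(i, l)] <= lpSum([x[(i, k)] for k in players_by_team[l]])
    # At least three distinct teams in lineup i.
    prob += lpSum([t[(i, l)] for l in teams]) >= 3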
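To see why the indicator constraints above enforce the team minimum, here is a small brute-force check in plain Python (no solver involved). The roster, team names and helper functions are made up for illustration. For a fixed lineup, the largest feasible value of each indicator is min(1, number of chosen players from that team), so the indicators can sum to 3 only when the lineup really spans at least three teams.

```python
from itertools import combinations

# Hypothetical roster: player -> team (illustrative data only).
team_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "C", "p6": "D"}
teams = sorted(set(team_of.values()))

def max_indicator_sum(lineup):
    """Largest achievable sum of t[team], given t[team] <= players chosen from team."""
    total = 0
    for team in teams:
        chosen_from_team = sum(1 for p in lineup if team_of[p] == team)
        total += min(1, chosen_from_team)  # a binary t can be 1 only if the bound is >= 1
    return total

def satisfies_team_constraint(lineup, min_teams=3):
    """True if some choice of indicators meets sum(t) >= min_teams."""
    return max_indicator_sum(lineup) >= min_teams

# Compare against a direct count of distinct teams for every 4-player lineup.
all_lineups = list(combinations(team_of, 4))
agree = all(
    satisfies_team_constraint(lu) == (len({team_of[p] for p in lu}) >= 3)
    for lu in all_lineups
)
```

Here `agree` confirms that the indicator formulation and the direct distinct-team count accept exactly the same lineups on this toy roster.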

Related

Taking the difference of 2 nodes in a decision problem while keeping the model as an MILP

To explain the question it's best to start with this picture.
I am modeling an optimization decision problem and a feature that I'm trying to implement is heat transfer between the process stages (a = 1, 2) taking into account which equipment type is chosen (j = 1, 2, 3) by the binary decision variable y.
The temperatures for the equipment are fixed values and my goal is to find (in the case of the picture) dT = 120 - 70 = 50 while keeping the temperature difference as a parameter (I want to keep the problem linear and need to multiply the temperature difference with a variable later on).
Things I have tried:
dT = T[a,j] - T[a-1,j]
(this obviously gives T = 80 for T[a-1,j] which is incorrect)
T[a-1] = sum(T[a-1,j] * y[a-1,j] for j in (1,2,3))
This will make the problem non-linear when I multiply with another variable.
I am using pyomo and the linear "glpk" solver. Thank you for reading my post and if someone could help me with this it is greatly appreciated!
If you only have 2 stages and 3 pieces of equipment at each stage, you could reformulate: let a binary decision variable Y[i] represent each of the 9 possible connections, and let delta_T[i] be a parameter holding the temperature difference for the same connection. Those 9 differences are easy to calculate up front and store as a model parameter.
If you want to keep it double-indexed, and assuming that exactly one piece of equipment is selected at each stage, you could take the sum-product of the selection variable and the temperatures at each stage and subtract them:
dT[a] = sum(T[a, j]*y[a, j] for j in J) - sum(T[a-1, j]*y[a-1, j] for j in J)
for a ∈ {2, 3, ..., N}
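Both suggestions can be sketched in plain Python with made-up numbers. The temperature table below is hypothetical, chosen only so that the example reproduces the 120 - 70 = 50 case from the question, and `y` is a fixed 0/1 selection standing in for the binary decision variables of a real model.

```python
from itertools import product

# Hypothetical fixed equipment temperatures T[stage, equipment type].
T = {(1, 1): 70, (1, 2): 80, (1, 3): 90,
     (2, 1): 100, (2, 2): 120, (2, 3): 140}
J = (1, 2, 3)

# Option 1: one precomputed parameter per connection (type j at stage 1 -> type k at stage 2).
delta_T = {(j, k): T[2, k] - T[1, j] for j, k in product(J, J)}

# Option 2: sum-product with a one-hot selection y at each stage.
def d_t(y, a):
    """Temperature difference between stage a and stage a-1 for a fixed selection y."""
    return (sum(T[a, j] * y[a, j] for j in J)
            - sum(T[a - 1, j] * y[a - 1, j] for j in J))

# Example selection: type 1 at stage 1, type 2 at stage 2.
y = {(a, j): 0 for a in (1, 2) for j in J}
y[1, 1] = 1
y[2, 2] = 1
```

Because the temperatures are constants, the expression in `d_t` is linear in `y`, which is what keeps this part of the model linear.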

why maximum number of corner points is m+nCn?

In Linear programming we have:
The maximum number of corner points for a problem with m constraints and n variables is C(n+m, n), that is, the number of ways to choose n items out of n+m (the number of constraints plus the number of variables).
Why is this the case? I have no idea why this is true.
Define:
m = number of rows = number of logical variables (slacks)
n = number of columns = number of structural variables
so the total number of variables is n+m
Further, we have:
number of basic variables = m (solved by linear algebra)
number of non-basic variables = n (temporarily fixed, usually at 0)
The number of corner points is at most the number of ways we can choose m basic variables out of n+m total variables, since every corner point corresponds to at least one such basis.
But we have:
n+m choose m = n+m choose n
Note that in general many of these bases are infeasible.
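The counting argument can be checked on a small case. This is only an illustration of the bound: the sizes m = 3 and n = 2 and the variable names are arbitrary, and no LP is actually solved, so nothing here says which bases are feasible.

```python
from itertools import combinations
from math import comb

m, n = 3, 2  # hypothetical sizes: 3 constraints (slacks), 2 structural variables

# Structural variables x1..xn followed by slack variables s1..sm.
variables = [f"x{j}" for j in range(1, n + 1)] + [f"s{i}" for i in range(1, m + 1)]

# Every candidate basis is a choice of m basic variables out of n + m.
bases = list(combinations(variables, m))
upper_bound = comb(n + m, m)
```

Choosing the m basic variables is the same as choosing the n non-basic ones, which is why `comb(n + m, m)` and `comb(n + m, n)` agree.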

Algorithm to find the k sets of values from N sets such that the number of unique values across the k sets is maximised

I have a database of N billboards giving the IDs of all the people that saw each billboard. I need to find the k billboards that have been seen by the largest number of unique people across the k billboards.
As an example:
I have N = 3 billboards: billboard 1 was seen by persons 'a', 'b', and 'c', billboard 2 was
seen by person 'b' and billboard 3 was seen by persons 'c' and 'd'
k = 2
The solution is billboards 1 & 3, which together were seen by four people ('a', 'b', 'c' and 'd')
So each billboard represents a set of values, and I need to find the k billboards from the N available that have the highest number of unique values.
I can't do this with brute force because of the huge number of potential combinations (>10K billboards in my database). Is there an algorithm that finds an optimal or near-optimal solution more quickly? Speed here is more important than getting the answer exactly right.
Preferably I would also like to be able to constrain the algorithm such that the sum of the costs of the selected billboards was below a certain value, this isn't strictly required though.
I'm thinking this is similar to some of the combinatorial optimisation problems described here, in particular the knapsack problem here, except that these problems are working with sets of numbers rather than sets of sets. My maths skills are sketchy so I haven't been able to work out whether I could modify these equations to suit my needs.
Thank you
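The problem as stated is usually called the maximum coverage problem, and a common near-optimal approach is a greedy heuristic: repeatedly take the billboard that adds the most people not yet covered. Greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal coverage. Below is a minimal sketch, not tuned for speed; the function name and data layout are assumptions, the cost budget is not handled, and the data is the three-billboard example from the question.

```python
def greedy_max_coverage(billboards, k):
    """Pick up to k billboards, each time taking the one that adds the most unseen people."""
    covered = set()
    chosen = []
    remaining = dict(billboards)
    for _ in range(k):
        # Billboard with the largest number of not-yet-covered viewers.
        best = max(remaining, key=lambda b: len(remaining[b] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing left that adds new people
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered

# Example from the question: N = 3 billboards, k = 2.
billboards = {1: {"a", "b", "c"}, 2: {"b"}, 3: {"c", "d"}}
chosen, covered = greedy_max_coverage(billboards, k=2)
```

On this example the greedy choice is billboard 1 followed by billboard 3, matching the solution given in the question.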

minimize/1 is not rearranging the order of solutions

For Colombia's Observatorio Fiscal[1], I am coding a simple tax minimization problem, using CLP(R) (in SWI-Prolog). I want to use minimize/1 to find the least solution first. It is instead listing the bigger solution first. Here is the code:
:- use_module(library(clpr)).
deduction(_,3). % Anyone can take the standard deduction.
deduction(Who,D) :- itemizedDeduction(Who,D). % Or they can itemize.
income(joe,10). % Joe makes $10 a year.
itemizedDeduction(joe,4). % He can deduct more if he itemizes.
taxableIncome(Who,TI) :-
deduction(Who,D),
income(Who,I),
TI is I - D,
minimize(TI).
Here is what an interactive session looks like:
?- taxableIncome(joe,N).
N = 7 ;
N = 6 ;
false.
If I switch the word "minimize" to "maximize" it behaves identically. If I include no minimize or maximize clause, it doesn't look for a third solution, but otherwise it behaves the same:
?- taxableIncome(joe,N).
N = 7 ;
N = 6.
[1] The Observatorio Fiscal is a new organization that aims to model the Colombian economy, in order to anticipate the effects of changes in the law, similar to what the Congressional Budget Office or the Tax Policy Center do in the United States.
First, let's add the following definition to the program:
:- op(950,fy, *).
*_.
Using (*)/1, we can generalize away individual goals in the program.
For example, let us generalize away the minimize/1 goal by placing * in front:
taxableIncome(Who,TI) :-
    deduction(Who,D),
    income(Who,I),
    TI is I - D,
    * minimize(TI).
We now get:
?- taxableIncome(X, Y).
X = joe,
Y = 7 ;
X = joe,
Y = 6.
This shows that CLP(R) in fact has nothing to do with this issue! These answers show that everything is already instantiated at the time minimize/1 is called, so there is nothing left to minimize.
To truly benefit from minimize/1, you must express the task in the form of CLP(R) (or better: CLP(Q)) constraints, then apply minimize/1 on a constrained expression.
Note also that in SWI-Prolog, both CLP(R) and CLP(Q) have elementary mistakes, and you cannot trust their results.
Per Mat's response, I rewrote the program expressing the constraints using CLP. The tricky bit was that I had to first collect all (both) possible values for deduction, then convert those values into a CLP domain. I couldn't get that conversion to work in CLP(R), but I could in CLP(FD):
:- use_module(library(clpfd)).
deduction(_,3). % Anyone can take the same standard deduction.
deduction(Who,D) :- % Or they can itemize.
itemizedDeduction(Who,D).
income(joe,10).
itemizedDeduction(joe,4).
listToDomain([Elt],Elt).
listToDomain([Elt|MoreElts],Elt \/ MoreDom) :-
MoreElts \= []
, listToDomain(MoreElts,MoreDom).
taxableIncome(Who,TI) :-
income(Who,I)
, findall(D,deduction(Who,D),DList)
, listToDomain(DList,DDomain)
% Next are the CLP constraints.
, DD in DDomain
, TI #= I-DD
, labeling([min(TI)],[TI]).
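For readers less familiar with Prolog, the same idea can be sketched in Python: first collect every available deduction, then minimize over the whole set rather than over one already-fixed value at a time. The function names are illustrative; the figures are the ones from the question.

```python
income = {"joe": 10}
standard_deduction = 3
itemized_deduction = {"joe": 4}

def deductions(who):
    """All deductions available to `who`, in the order the Prolog clauses are tried."""
    yield standard_deduction
    if who in itemized_deduction:
        yield itemized_deduction[who]

def taxable_incomes(who):
    # One answer per deduction choice, like backtracking over deduction/2.
    return [income[who] - d for d in deductions(who)]

def least_taxable_income(who):
    # Collect the alternatives first, then minimize over all of them.
    return min(taxable_incomes(who))
```

`taxable_incomes("joe")` mirrors the two answers of the original program in their original order, while `least_taxable_income("joe")` corresponds to the labeling step with the min(TI) option.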

Number of BST's given a linked list of numbers

Suppose I have a linked list of positive numbers. How many BSTs can be generated from them, provided all nodes are required to form the tree?
Conversely, how many BST's can be generated, provided any number of the linked list nodes can exist in these trees?
Bonus: how many balanced BST's can be formed? Any help or guidance is greatly appreciated.
You can use dynamic programming to compute that.
Just note that it doesn't matter what the numbers are, just how many there are. In other words, for any n distinct integers there is the same number of different BSTs. Let's call this number f(n).
Then if you know f(k) for k < n, you can get f(n):
f(n) = Sum ( f(i) * f(n-1-i), i = 0,1,2,...,n-1 ), with f(0) = 1
Each summand represents the number of trees for which the (1+i)-th smallest number is at the root (thus there are i numbers in the left subtree and n-1-i in the right subtree). The two subtrees can be chosen independently, so their counts are multiplied.
So DP solves this.
Now the total number of BSTs (with any nodes from the list) is just a sum:
Sum ( Binomial(n,k) * f(k), k=1,2,3,...,n )
This is because you can pick k of them in Binomial(n,k) ways and then you know that there are f(k) BSTs for them.
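A minimal sketch of this dynamic program, assuming the numbers in the list are distinct. It uses the product f(i) * f(n-1-i) for each root position, since the left and right subtrees are chosen independently, and takes f(0) = 1 for the empty tree. The function names are illustrative.

```python
from math import comb

def bst_counts(n):
    """Return [f(0), ..., f(n)], where f(k) is the number of BSTs on k distinct keys."""
    f = [0] * (n + 1)
    f[0] = 1  # the empty tree
    for size in range(1, n + 1):
        # Root is the (i+1)-th smallest key: i keys go left, size-1-i go right.
        f[size] = sum(f[i] * f[size - 1 - i] for i in range(size))
    return f

def bst_counts_on_subsets(n):
    """Number of BSTs over any non-empty subset of n distinct keys."""
    f = bst_counts(n)
    return sum(comb(n, k) * f[k] for k in range(1, n + 1))
```

The values f(n) produced this way are the Catalan numbers. The balanced-BST count from the bonus question is not covered by this sketch.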