Solving the multiple-choice multidimensional knapsack - optimization

I am trying to solve some (relatively easy) instances of the multiple-choice multidimensional knapsack problem (where there are groups of items where only one item per group can be obtained and the weights of the items are multi-dimensional as well as the knapsack capacity). I have two questions regarding the formulation and solution:
If two groups have different numbers of items, is it possible to pad the groups with fewer items using items of zero profit and weight equal to the capacity, so that the problem can be expressed in matrix form? Would this affect the solution? Specifically, assume I have optimization programs where the first group (item set) has three candidate items and the second group has two, i.e. they have the following form:
maximize (over x_ij)   v_11 x_11 + v_12 x_12 + v_13 x_13 + v_21 x_21 + v_22 x_22
subject to   w^i_11 x_11 + w^i_12 x_12 + w^i_13 x_13 + w^i_21 x_21 + w^i_22 x_22 <= W^i,  i = 1, 2
             x_11 + x_12 + x_13 = 1,  x_21 + x_22 = 1,  x_ij ∈ {0, 1} for all i and j.
Is it OK in this scenario to add an artificial item x_23 with value v_23 = 0 and weights w^1_23 = W^1, w^2_23 = W^2, so that the products v_ij x_ij are defined for all i = 1, 2 and j = 1, 2, 3?
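As a sanity check on the dummy-item idea, here is a small brute-force sketch (the instance data is made up): a zero-profit item whose weight equals the full capacity can never appear in a feasible solution alongside any strictly positively weighted item, so padding a group with it leaves the optimum unchanged.

```python
import itertools

# Made-up instance: 2 groups, 2-dimensional weights, capacity vector W.
values = [[5, 4, 3], [6, 2]]
weights = [[(3, 2), (2, 3), (1, 1)], [(2, 2), (1, 1)]]
W = (4, 4)

def best(values, weights, W):
    """Brute-force the multiple-choice multidimensional knapsack optimum."""
    best_v = None
    for choice in itertools.product(*[range(len(g)) for g in values]):
        total = [sum(weights[g][j][d] for g, j in enumerate(choice))
                 for d in range(len(W))]
        if all(t <= cap for t, cap in zip(total, W)):
            v = sum(values[g][j] for g, j in enumerate(choice))
            best_v = v if best_v is None else max(best_v, v)
    return best_v

# Pad group 2 with a dummy item: zero profit, weight equal to the full capacity.
values_p = [values[0], values[1] + [0]]
weights_p = [weights[0], weights[1] + [W]]

print(best(values, weights, W), best(values_p, weights_p, W))  # 9 9
```

The equality holds whenever all real item weights are strictly positive; with zero-weight items in other groups the dummy could become feasible, so padding should be checked case by case.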
Given that (1) is possible, has anyone tried to solve instances of this problem with an open-source optimization package such as CVX? I know about CPLEX, but it is difficult to obtain for a non-academic, and I am not sure whether GLPK supports groups of variables.

Related

Taking the difference of 2 nodes in a decision problem while keeping the model as an MILP

To explain the question, it's best to start with this picture:
[picture: two process stages, each with three candidate equipment types and their fixed temperatures; the chosen units give dT = 120 - 70 = 50]
I am modeling an optimization decision problem, and one feature I'm trying to implement is heat transfer between the process stages (a = 1, 2), taking into account which equipment type (j = 1, 2, 3) is chosen via the binary decision variable y.
The temperatures for the equipment are fixed values and my goal is to find (in the case of the picture) dT = 120 - 70 = 50 while keeping the temperature difference as a parameter (I want to keep the problem linear and need to multiply the temperature difference with a variable later on).
Things I have tried:
dT = T[a,j] - T[a-1,j]
(this obviously gives T = 80 for T[a-1,j] which is incorrect)
T[a-1] = sum(T[a-1,j] * y[a-1,j] for j in (1,2,3))
This will make the problem non-linear when I multiply with another variable.
I am using pyomo and the linear "glpk" solver. Thank you for reading my post and if someone could help me with this it is greatly appreciated!
If you only have 2 stages and 3 pieces of equipment at each stage, you could reformulate: let a binary decision variable Y[i] represent each of the 9 possible connections, and let delta_T[i] be a parameter for the temperature difference associated with the same 9 connections, which can easily be precomputed and stored as a model parameter.
If you want to keep it double-indexed, and assuming that exactly one piece of equipment is selected at each stage, you could take the sum-product of the selection variables and temperatures at each stage and subtract them:
dT[a] = sum(T[a, j]*y[a, j] for j in J) - sum(T[a-1, j]*y[a-1, j] for j in J)
for a ∈ {2, 3, ..., N}
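The first reformulation amounts to a precomputation step. The temperatures below are hypothetical (chosen so that one connection matches the 120 - 70 = 50 example in the picture):

```python
# Hypothetical fixed equipment temperatures per stage (a = 1, 2; j = 1, 2, 3).
T = {1: {1: 70, 2: 80, 3: 90},
     2: {1: 100, 2: 110, 3: 120}}

# One binary Y[(j1, j2)] per stage-1/stage-2 equipment pairing (9 connections).
# delta_T is a plain parameter, so dT = sum(delta_T[c] * Y[c]) is linear, and
# multiplying dT by another variable later only requires linearizing a
# binary-times-variable product (a standard big-M reformulation).
delta_T = {(j1, j2): T[2][j2] - T[1][j1]
           for j1 in (1, 2, 3) for j2 in (1, 2, 3)}

print(delta_T[(1, 3)])  # the example connection: 120 - 70 = 50
```

With sum(Y[c] for c in delta_T) == 1 enforced in the model, exactly one connection (and hence one temperature difference) is active.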

What does O(nm/8 * log(nm/8)) + O(nm/9 * log(nm/9)) + ... + O(nm/m * log(nm/m)) equal to?

I'm sorry for the question title but I can't find a simpler way to put it. Basically, my algorithm involves quicksort for O(nm/k) elements, where k ranges from 8 to m. I wonder what the total complexity for this is, and how to deduce it? Thank you!
Drop the divisions inside the logarithms and we get nm log(nm) * (1/8 + ... + 1/m) = O(nm log(nm) log(m)) = O(nm log(m)^2 + nm log(m) log(n)). (I used the fact that the partial sums of the harmonic series grow like ln(m).)
Note that because we dropped the divisions inside the logarithms, this gives an upper bound rather than a tight bound (but a better one than the naive approach of multiplying the largest term by m).
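A quick numerical check (with arbitrary n and m) confirms that the original sum is bounded by nm log(nm) times the harmonic tail:

```python
import math

n, m = 100, 64

# Exact cost: sum over k = 8..m of (nm/k) * log(nm/k).
exact = sum((n * m / k) * math.log(n * m / k) for k in range(8, m + 1))

# Upper bound after dropping the division inside the log:
# nm * log(nm) * (1/8 + ... + 1/m), where the harmonic tail is O(log m).
harmonic_tail = sum(1.0 / k for k in range(8, m + 1))
bound = n * m * math.log(n * m) * harmonic_tail

print(exact <= bound)  # True, since log(nm/k) <= log(nm) for every k >= 1
```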

What is meant by "unit" in IDEA code duplication analysis?

IntelliJ IDEA has an ability to find duplicated code.
One can tune the number of "units" (according to their documentation) required for a fragment to be considered a duplicate.
However, I can't find any explanation of what this "unit" actually is.
I'm looking for an answer that unambiguously defines such units.
The "units" measure is used in the option Do not show duplicates simpler than, which defines the minimal weight of the reported code fragments.
This weight is computed as the sum of the weights of all elements in the fragment.
And since different elements have different weights, their sum must be measured in abstract "units".
Element weight can be roughly approximated as:
it's a statement -> 2
it's an expression/literal/identifier -> 1
otherwise -> 0
For example, the weight of x = 42; can be approximated as w(x) + w(=) + w(42) + w(;) + w(statement(x=42;)), which is roughly 1 + 0 + 1 + 0 + 2 = 4.
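The approximation above can be written as a tiny scoring function; the element categories here are my own simplified encoding, not IDEA's actual implementation:

```python
# Rough per-element weights, per the approximation above (my own encoding,
# not IDEA's internal representation).
WEIGHTS = {"statement": 2, "expression": 1, "literal": 1, "identifier": 1}

def fragment_weight(element_kinds):
    """Sum the weights of a fragment's elements; unknown kinds weigh 0."""
    return sum(WEIGHTS.get(kind, 0) for kind in element_kinds)

# 'x = 42;' -> identifier x (1) + '=' (0) + literal 42 (1) + ';' (0)
# + the enclosing statement (2) = 4 "units".
print(fragment_weight(["identifier", "operator", "literal", "semicolon", "statement"]))  # 4
```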

Constraining on a Minimum of Distinct Values with PuLP

I've been using the PuLP library for a side project (daily fantasy sports) where I optimize the projected value of a lineup based on a series of constraints.
I've implemented most of them, but one constraint is that players must come from at least three separate teams.
This paper has an implementation (page 18, 4.2), which I've attached as an image:
It seems that they somehow derive an indicator variable for each team that's one if a given team has at least one player in the lineup, and then it constrains the sum of those indicators to be greater than or equal to 3.
Does anybody know how this would be implemented in PuLP?
Similar examples would also be helpful.
Any assistance would be super appreciated!
In this case you would define a binary variable t that is bounded above by the x variables for each team. In Python I don't like to name variables with a single letter, but as I have nothing else to go on, here is how I would do it in PuLP.
Assume that the variables lineups, players, players_by_team and teams are defined elsewhere.
x_index = [(i, p) for i in lineups for p in players]
t_index = [(i, l) for i in lineups for l in teams]
x = LpVariable.dicts("x", x_index, cat=LpBinary)  # player selection should be binary too
t = LpVariable.dicts("t", t_index, cat=LpBinary)
for i in lineups:
    for l in teams:
        # t[i, l] can be 1 only if lineup i uses at least one player from team l
        prob += t[i, l] <= lpSum([x[i, k] for k in players_by_team[l]])
    # each lineup must draw from at least 3 distinct teams
    prob += lpSum([t[i, l] for l in teams]) >= 3

Number of BST's given a linked list of numbers

Suppose I have a linked list of positive numbers; how many BSTs can be generated from them, provided all nodes are required to form the tree?
Conversely, how many BSTs can be generated if any number of the linked-list nodes may appear in the trees?
Bonus: how many balanced BST's can be formed? Any help or guidance is greatly appreciated.
You can use dynamic programming to compute that.
Just note that it doesn't matter what the numbers are, only how many there are. In other words, for any n distinct integers there is the same number of different BSTs. Let's call this number f(n).
Then if you know f(k) for k < n, you can get f(n):
f(n) = Sum ( f(i) * f(n-1-i), i = 0,1,2,...,n-1 )
Each summand counts the trees in which the (i+1)-th smallest number is at the root (so the left subtree contains i numbers and the right subtree contains n-1-i).
So DP solves this.
Now the total number of BSTs (with any nodes from the list) is just a sum:
Sum ( Binomial(n,k) * f(k), k=1,2,3,...,n )
This is because you can pick k of them in Binomial(n,k) ways and then you know that there are f(k) BSTs for them.
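Both the recurrence and the binomial sum can be checked with a short DP; the f values are the Catalan numbers:

```python
from math import comb

def bst_counts(n):
    """dp[k] = number of BSTs on k distinct keys (the Catalan numbers)."""
    dp = [1] + [0] * n          # the empty tree counts as 1
    for size in range(1, n + 1):
        # root is the (i+1)-th smallest key: i keys go left, size-1-i go right
        dp[size] = sum(dp[i] * dp[size - 1 - i] for i in range(size))
    return dp

dp = bst_counts(4)
print(dp[3], dp[4])  # 5 14

# BSTs built from any non-empty subset of n = 3 distinct keys:
total = sum(comb(3, k) * dp[k] for k in range(1, 4))
print(total)  # 3*1 + 3*2 + 1*5 = 14
```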