Algorithm to find the k sets of values from N sets such that the number of unique values across the k sets is maximised - optimization

I have a database of N billboards giving the IDs of all the people that saw each billboard. I need to find the k billboards that have been seen by the largest number of unique people across the k billboards.
As an example:
I have N = 3 billboards: billboard 1 was seen by persons 'a', 'b', and 'c', billboard 2 was seen by person 'b', and billboard 3 was seen by persons 'c' and 'd'
k = 2
The solution is billboards 1 & 3, which together were seen by four people ('a', 'b', 'c' and 'd').
So each billboard represents a set of values, and I need to find the k billboards from the N available that have the highest number of unique values.
I can't do this by brute force because of the huge number of possible combinations (there are more than 10K billboards in my database). Is there an algorithm that can quickly find an optimal or near-optimal solution? Speed is more important here than getting the answer exactly right.
Preferably, I would also like to be able to constrain the algorithm so that the total cost of the selected billboards stays below a certain value, although this isn't strictly required.
I'm thinking this is similar to some of the combinatorial optimisation problems described here, in particular the knapsack problem here, except that these problems are working with sets of numbers rather than sets of sets. My maths skills are sketchy so I haven't been able to work out whether I could modify these equations to suit my needs.
Thank you
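This is the maximum coverage problem. A common fast approach is the greedy algorithm: repeatedly pick the billboard that adds the most people not yet covered. For the version without costs, greedy is known to cover at least (1 - 1/e), about 63%, as many people as the best possible choice of k billboards. Below is a minimal Python sketch, assuming each billboard is given as a Python set of person IDs; the function name greedy_max_coverage is hypothetical. The optional costs/budget handling (choosing by new people per unit cost among billboards that still fit the budget) is only a heuristic for the cost constraint and does not carry the same guarantee on its own.

```python
def greedy_max_coverage(sets, k, costs=None, budget=None):
    """Greedily pick up to k set indices that maximise the number of covered values.

    sets:   list of Python sets (one per billboard, holding person IDs)
    costs:  optional list of positive costs, one per set (assumed > 0)
    budget: optional upper bound on the total cost of the chosen sets
    """
    covered = set()
    chosen = []
    spent = 0.0
    for _ in range(k):
        best, best_score = None, 0.0
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            cost = costs[i] if costs is not None else 1.0
            if budget is not None and spent + cost > budget:
                continue  # picking this set would exceed the budget
            gain = len(s - covered)  # people this set would add
            # Without costs rank by raw gain; with costs rank by gain per unit cost.
            score = gain / cost if costs is not None else gain
            if score > best_score:  # strict '>' breaks ties by lowest index
                best, best_score = i, score
        if best is None:  # nothing affordable adds any new person
            break
        chosen.append(best)
        covered |= sets[best]
        spent += costs[best] if costs is not None else 1.0
    return chosen, covered


# The example from the question (billboards are numbered from 0 here).
billboards = [{"a", "b", "c"}, {"b"}, {"c", "d"}]
chosen, covered = greedy_max_coverage(billboards, k=2)
print(chosen, len(covered))  # [0, 2] 4
```

Each round scans all N billboards, so the work is roughly k × N set differences, which is usually manageable for about 10K billboards and a moderate k.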

Related

Performing a sparse sum on Mathematica

I want to evaluate a sum in Mathematica of the form
g[[i,j,k,l,m,n]] x g[[o,p,q,r,s,t]] x ( complicated function of the indices )
But all these indices range from 0 to 3, so the total number of cases to sum over is 4^12, which will take an unforgiving amount of time. However, barely any elements of the array g[[i,j,k,l,m,n]] are nonzero -- there are probably around 8 nonzero entries -- so I would like to restrict the sum over {i,j,k,l,m,n,o,p,q,r,s,t} to precisely those combinations of indices for which both factors of g are nonzero.
I can't find a way to do this for summation over multiple indices, where the allowed index choices are particular combinations of {i,j,k,l,m,n} as opposed to specific values of each particular index. Any help appreciated!
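The usual trick is to list the nonzero entries of g once and loop over pairs of them, so the work is (number of nonzeros)^2 terms, about 64 here, instead of 4^12 = 16,777,216. Below is a minimal sketch of that idea in Python rather than Mathematica, assuming g is stored as a dict from 6-index tuples to nonzero values; sparse_pair_sum and weight are hypothetical names, with weight standing in for the complicated function of the 12 indices. In Mathematica, ArrayRules returns position -> value rules for the nonzero entries (plus a final default rule), which gives the same list to pair up.

```python
from itertools import product


def sparse_pair_sum(nonzero, weight):
    """Sum g[a] * g[b] * weight(a..., b...) over pairs of nonzero entries only.

    nonzero: dict mapping an index tuple (i, j, k, l, m, n) to its nonzero value of g
    weight:  function of all 12 indices (i, ..., n, o, ..., t)
    """
    total = 0
    # Every (a, b) pair of nonzero positions, i.e. {i..n} and {o..t}.
    for (a, ga), (b, gb) in product(nonzero.items(), repeat=2):
        total += ga * gb * weight(*a, *b)
    return total


g = {(0, 0, 0, 0, 0, 1): 2, (3, 0, 0, 0, 0, 0): 5}
print(sparse_pair_sum(g, lambda *idx: sum(idx)))  # 238
```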

Constraining on a Minimum of Distinct Values with PuLP

I've been using the PuLP library for a side project (daily fantasy sports) where I optimize the projected value of a lineup based on a series of constraints.
I've implemented most of them, but one constraint is that players must come from at least three separate teams.
This paper has an implementation (page 18, 4.2), which I've attached as an image:
It seems that they somehow derive an indicator variable for each team that's one if a given team has at least one player in the lineup, and then it constrains the sum of those indicators to be greater than or equal to 3.
Does anybody know how this would be implemented in PuLP?
Similar examples would also be helpful.
Any assistance would be super appreciated!
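In this case you would define, for each lineup and team, a binary variable t that is bounded above by the sum of that team's x variables, so t can only be 1 if the team has at least one player in the lineup. In Python I don't like to name variables with a single letter, but as I have nothing else to go on, here is how I would do it in PuLP: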
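# Assumes lineups, players, players_by_team, teams and the LpProblem prob are defined elsewhere
from pulp import LpVariable, LpBinary, lpSum

x_index = [(i, p) for i in lineups for p in players]
t_index = [(i, team) for i in lineups for team in teams]
# x[i, p] == 1 if player p is in lineup i
x = LpVariable.dicts("x", x_index, cat=LpBinary)
# t[i, l] can only be 1 if lineup i has at least one player from team l
t = LpVariable.dicts("t", t_index, cat=LpBinary)
for i in lineups:
    for l in teams:
        prob += t[i, l] <= lpSum(x[i, k] for k in players_by_team[l])
    # at least three teams must be represented in lineup i
    prob += lpSum(t[i, l] for l in teams) >= 3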
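To see why the two constraints enforce "at least three distinct teams", here is a small brute-force check in plain Python (not PuLP). For every possible lineup from a hypothetical toy roster, it asks whether some 0/1 vector t satisfies t[l] <= (players from team l) and sum(t) >= 3, and compares that with simply counting distinct teams. The roster and the function names are made up for illustration.

```python
from itertools import combinations, product


def feasible_with_indicators(lineup, team_of, teams, min_teams=3):
    """True if some 0/1 vector t satisfies both indicator constraints for this lineup."""
    # Players chosen from each team: plays the role of lpSum(x[i, k] for k in players_by_team[l]).
    count = {team: 0 for team in teams}
    for player in lineup:
        count[team_of[player]] += 1
    for t in product((0, 1), repeat=len(teams)):
        # Constraint 1: t[l] <= players from team l.  Constraint 2: sum(t) >= min_teams.
        if all(t_l <= count[team] for t_l, team in zip(t, teams)) and sum(t) >= min_teams:
            return True
    return False


def formulation_is_exact(team_of, lineup_size, min_teams=3):
    """Check that the indicator model agrees with a direct count of distinct teams."""
    teams = sorted(set(team_of.values()))
    for lineup in combinations(sorted(team_of), lineup_size):
        distinct = len({team_of[p] for p in lineup})
        if feasible_with_indicators(lineup, team_of, teams, min_teams) != (distinct >= min_teams):
            return False
    return True


# Hypothetical toy roster: player -> team.
team_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "C", "p6": "D"}
print(formulation_is_exact(team_of, lineup_size=4))  # True
```

The indicator for a team can only be switched on when that team actually has a player, so the sum of indicators can reach 3 only when three different teams are present.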

Knapsack (multi-criteria)

If I have a knapsack in which each item has a weight w and two values v1 and v2, and the capacity is m, how do I find the total values v1 and v2 for a selection of items whose weight does not exceed the capacity m?
OK, so your problem is defined as follows. First, some variable definitions with sample values:
#define N 4 // number of items to choose from
int m = 6; // maximum total weight in the knapsack
// weight of each item; the summed weight of the chosen items must not exceed m
int weight[N] = {5,2,4,3};
// two values for each item:
int values[2][N] = {
{1,3,5,2},
{6,3,2,4}
};
The knapsack is to be filled with items so that the total weight of all items in it does not exceed m, where each item has two values. We could picture the problem like this:
I want to go on vacation by plane with my girlfriend. We have one suitcase (= knapsack) and N items to choose from. Each item has a weight, and the total weight may not be too high (e.g. if the airline's weight limit is 25 kg and the suitcase itself weighs 1 kg, we have m = 24 kg as the limit for the items). Each item has two values: values[0][i] is the value to me of having item i in the suitcase on our trip, and values[1][i] is its value to my girlfriend, who has different preferences. We also assume that each item can be packed at most once and that the overall value of the suitcase is the sum of its values for me plus the sum of its values for her.
This problem can easily be converted to the standard knapsack problem by adding up the two value lists, so each item gets one overall value (for me and her together):
int value[N] = {(1+6),(3+3),(5+2),(2+4)};
Or just:
int value[N] = {7, 6, 7, 5};
Now each item has only one value, which is the ordinary 0/1 knapsack problem.
How to solve the usual knapsack problem optimally is described on Wikipedia; have a look at http://en.wikipedia.org/wiki/Knapsack_problem. If English is not your mother tongue, also take a look at the version in your language (choose the language from the menu there).
If you need further assistance, just ask.

Number of BST's given a linked list of numbers

Suppose I have a linked list of positive numbers. How many BSTs can be generated from them if all nodes are required to form the tree?
Conversely, how many BSTs can be generated if any number of the linked list's nodes can appear in the trees?
Bonus: how many balanced BSTs can be formed? Any help or guidance is greatly appreciated.
You can use dynamic programming to compute that.
Just note that the actual numbers don't matter, only how many there are: any n distinct integers give the same number of different BSTs. Let's call this number f(n).
Then if you know f(k) for k < n, you can get f(n):
f(n) = Sum ( f(i) * f(n-1-i), i = 0,1,2,...,n-1 ), with f(0) = 1 (the empty tree)
Each summand counts the trees in which the (i+1)-th smallest number is at the root: the left subtree then holds the i smaller numbers and the right subtree the n-1-i larger ones, and since the two subtrees are chosen independently, their counts multiply.
So DP solves this.
Now the total number of BSTs (with any nodes from the list) is just a sum:
Sum ( Binomial(n,k) * f(k), k=1,2,3,...,n )
This is because you can pick k of them in Binomial(n,k) ways and then you know that there are f(k) BSTs for them.
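A minimal Python sketch of both sums, assuming the list holds distinct numbers as the answer does; f[0] = 1 stands for the empty subtree. The values f(n) are the Catalan numbers. The function names are illustrative.

```python
from math import comb


def count_bsts(n):
    """Return [f(0), ..., f(n)], where f(m) is the number of BSTs on m distinct keys."""
    f = [1] + [0] * n
    for m in range(1, n + 1):
        # Root is the (i+1)-th smallest key: i keys go left, m-1-i keys go right.
        f[m] = sum(f[i] * f[m - 1 - i] for i in range(m))
    return f


def count_bsts_any_subset(n):
    """Number of non-empty BSTs built from any k of the n distinct keys."""
    f = count_bsts(n)
    return sum(comb(n, k) * f[k] for k in range(1, n + 1))


print(count_bsts(5))             # [1, 1, 2, 5, 14, 42]
print(count_bsts_any_subset(3))  # 14
```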

question about inversion

I read on this site that an inversion is a pair of indices i < j with A[i] > A[j], and the site has some exercises about this. I have a lot of questions, but I want to ask just one of them first; then I will try the other exercises myself if I can!
Exercise: Which permutation of (1, 2, ..., n) has the largest number of inversions? What are these inversions?
thanks
Clearly N, ..., 2, 1 has the highest number of inversions. Every pair is an inversion. For example for N = 6, we have 6 5 4 3 2 1. The inversions are 6-5, 6-4, 6-3, 6-2, 6-1, 5-4, 5-3 and so on. Their number is N * (N - 1) / 2.
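A small brute-force check of this answer in Python: count inversions directly from the definition and confirm that, for n = 6, the permutation with the most inversions is the reversed one, with N * (N - 1) / 2 = 15 of them. The O(n^2) counter is only meant for small inputs.

```python
from itertools import permutations


def count_inversions(a):
    """Count pairs i < j with a[i] > a[j] (O(n^2), fine for small inputs)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


n = 6
best = max(permutations(range(1, n + 1)), key=count_inversions)
print(best, count_inversions(best), n * (n - 1) // 2)  # (6, 5, 4, 3, 2, 1) 15 15
```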
Well, the identity permutation (1,2,...,n) has no inversions. Since an inversion is a pair of elements that are in the reverse order of their indices, the answer probably involves some reversal of that permutation.
I have never heard the term inversion used in this way.
A decreasing array of length N, for N>0, has 1/2*N*(N-1) pairs i<j with A[i]>A[j]. This is the maximum possible.