There are 4 items: A weighs 2 lb with a profit of $40, B weighs 5 lb with a profit of $30, C weighs 10 lb with a profit of $50, and D weighs 5 lb with a profit of $10. Compute the maximum total profit you can obtain from these 4 items with a knapsack of capacity 16 lb. You cannot take a portion of an item; you must take it whole.
Please show how the above problem can be solved using the knapsack approach.
A simple solution is to consider all subsets of the items and calculate the total weight and value of each subset. Keep only the subsets whose total weight is less than or equal to the capacity of your knapsack, and from those, pick the subset with the maximum value.
To consider all subsets of items, there can be two cases for every item:
the item is included in the optimal subset,
the item is not included in the optimal subset.
Let's say the capacity of your knapsack is W. Then the maximum value that can be obtained from n items is the maximum of the following two values:
the maximum value obtained from n-1 items and capacity W (excluding the nth item);
the value of the nth item plus the maximum value obtained from n-1 items and capacity W minus the weight of the nth item (including the nth item).
If the weight of the nth item is greater than W, then it can't be included and case 1 is the only option. This is the naive approach, and it takes O(2^n) time, since there are 2^n subsets to consider.
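As a rough Python sketch of this naive enumeration (the variable names are mine, not from the problem):

from itertools import combinations

weights = [2, 5, 10, 5]     # items A, B, C, D
profits = [40, 30, 50, 10]
W = 16

best = 0
for r in range(len(weights) + 1):                        # all 2^n subsets, grouped by size
    for subset in combinations(range(len(weights)), r):
        if sum(weights[i] for i in subset) <= W:
            best = max(best, sum(profits[i] for i in subset))

print(best)  # 90 (take A and C: weight 12, profit 40 + 50)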
Now for the overlapping subproblem:
weight = {2, 5, 10, 5}
Capacity = 16
The recursion tree would look like:
// Here n,k -> items remaining, capacity remaining
// Going to left child -> including the item at hand
// Going to right child -> excluding the item at hand
          ______________4,16______________
         /                                \
      3,14                                 3,16
     /     \                            /        \
 2,9         2,14                2,11              2,16
   \        /    \              /    \               /      \
  1,9     1,4     1,14       1,1      1,11        1,6        1,16
 /    \    \     /    \        \     /    \      /    \     /    \
0,4  0,9   0,4  0,9  0,14     0,1   0,6  0,11   0,1  0,6  0,11   0,16
Notice the overlapping subproblems among the leaves: (0,4), (0,9), (0,1), (0,6), and (0,11) each appear more than once. Since there are overlapping subproblems, we can solve this with dynamic programming: if you store the values, you can reuse them later instead of recomputing them. Here the matches occur at the leaf nodes; if you take other examples, you'll see that a match can occur well before the leaves.
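One way to exploit this is top-down memoisation. Here is a minimal Python sketch (the function and variable names are mine) that follows the same (items remaining, capacity remaining) convention as the tree above:

from functools import lru_cache

weight = (2, 5, 10, 5)     # items A, B, C, D
profit = (40, 30, 50, 10)

@lru_cache(maxsize=None)   # caches each (n, k) result so it is computed only once
def best(n, k):            # n = items remaining, k = capacity remaining
    if n == 0:
        return 0
    i = len(weight) - n                 # index of the item at hand (A first, as in the tree)
    skip = best(n - 1, k)               # right branch: exclude the item
    if weight[i] > k:
        return skip
    return max(skip, profit[i] + best(n - 1, k - weight[i]))  # left branch: include it

print(best(4, 16))  # 90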
The bottom-up (tabulation) pseudo-code would look like:
Procedure Knapsack(n, W):  // here, n = number of items, W = capacity of knapsack
    for i from 0 up to n
        for j from 0 up to W
            if i == 0 or j == 0
                table[i][j] := 0
            else if weight[i-1] <= j
                table[i][j] := max(profit[i-1] + table[i-1][j - weight[i-1]], table[i-1][j])
            else
                table[i][j] := table[i-1][j]
            end if
        end for
    end for
    Return table[n][W]
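A direct Python translation of the table version, applied to the instance in the question (a small sketch; the function name is mine):

def knapsack(weights, profits, W):
    n = len(weights)
    # table[i][j] = best profit using the first i items with capacity j
    table = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if weights[i - 1] <= j:
                table[i][j] = max(profits[i - 1] + table[i - 1][j - weights[i - 1]],
                                  table[i - 1][j])
            else:
                table[i][j] = table[i - 1][j]
    return table[n][W]

print(knapsack([2, 5, 10, 5], [40, 30, 50, 10], 16))  # 90: take A ($40) and C ($50)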
Let's say I have a list of 5 words:
[this, is, a, short, list]
Furthermore, I can classify some text by counting the occurrences of the words from the list above and representing these counts as a vector:
N = [1,0,2,5,10] # 1x this, 0x is, 2x a, 5x short, 10x list found in the given text
In the same way, I classify many other texts (count the 5 words per text, and represent them as counts - each row represents a different text which we will be comparing to N):
M = [[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40],
...]
Now, I want to find the top 1 (or 2, 3, etc.) rows of M that are most similar to N. In simple words: the texts most similar to my initial text.
The challenge is that just checking the distance between N and each row of M is not enough. For example, row M4 [4,0,8,20,40] is very far from N by distance, yet it is proportional to N (by a factor of 4) and therefore very similar: the text in row M4 may simply be 4x as long as the text represented by N, so naturally all its counts are 4x as high.
What is the best approach to this problem of finding the top 1, 2, 3, etc. most similar texts in M to the text N?
Generally speaking, the standard technique for comparing bags of words (i.e. your arrays) is the cosine similarity measure. This maps your bag of n (here 5) words to an n-dimensional space in which each array is a point (and, equivalently, a vector). The most similar vectors (points) are the ones that form the smallest angle with your text N in that space; this automatically takes care of the proportional rows, since they point in (nearly) the same direction. Here is code for it (assuming M and N are numpy arrays with the shapes introduced in the question):
import numpy as np
# norms are taken per row (axis=1); np.nan_to_num handles the all-zero row (0/0 -> NaN)
cos_sim = np.nan_to_num(np.dot(M, N) / (np.linalg.norm(M, axis=1) * np.linalg.norm(N)))
most_similar = M[np.argmax(cos_sim)]
which gives the output [ 4 0 8 20 40] for your inputs. For the top k rows, use M[np.argsort(-cos_sim)[:k]].
You can normalise your row counts to remove the length effect, as you discussed. Row normalisation of M can be done as M / M.sum(axis=1)[:, np.newaxis]. The residual values can then be calculated as the sum of the squared differences between N and each row of M. The minimum difference (ignoring the NaN or inf values obtained when a row sums to 0) then gives the most similar row.
Here is an example:
import numpy as np
N = np.array([1,0,2,5,10])
M = np.array([[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40]])
# sqrt of the sum of squared differences between the normalised rows and normalised N
similarity = np.sqrt(np.sum((M / M.sum(axis=1)[:, np.newaxis] - N / np.sum(N))**2, axis=1))
# rows that sum to 0 produce NaN; np.nanargmin ignores them
result = M[np.nanargmin(similarity)]
result
>>> array([ 4,  0,  8, 20, 40])
You could then use np.argsort(similarity)[:n] to get the n most similar rows.
I've been having trouble with the following question:
I have a given binary tree (not necessarily a BST) and two pointers (x, y), and I need to determine whether X is Y's predecessor (i.e. ancestor) in O(1) time. I can add as many fields as I want.
I was thinking of storing each predecessor in a field as I insert the next child into the tree, but in that way, how can I check whether X is Y's predecessor in O(1) time?
If you use nodes, add an unsigned int field, call it L, starting at 1 for the root.
When you recursively insert, take the L value of the parent node and multiply it by 2, then add 1 if you go right, or simply multiply by 2 if you go left.
You will get a tree of L values that looks like this:
                1
             /     \
        10              11
      /    \          /    \
    100     101     110     111
       \                   /   \
        1001            1110   1111
        /
     10010
An ancestor P has a value P.L such that P.L is a prefix of the binary representation of C.L, and the number of bits in P.L is strictly less than the number of bits in C.L.
The tree's L values in base 10 are:
              1
           /     \
       2             3
     /   \         /   \
    4     5       6     7
     \                 / \
      9              14  15
     /
   18
If you have both pointers, take floor(log_2(L)): this is one less than the number of bits in L, and, if you notice, it represents the level of the tree you are at (the root, with label 1, is at level 0).
So if:
// Parent (ancestor) has equal or more bits?
if (log(P.L) >= log(C.L)) {
// parent is not an ancestor because it
// is either lower in tree, or at same level
}
If that check passes, subtract bits(P) from bits(C); this tells you how many more bits C.L has than P.L, i.e. how many levels C is below P.
int D = log(C.L) - log(P.L)
Since C is lower, and all we did to calculate C.L was multiply its ancestors' L values by two (shift left) some number of times, shifting C.L back to the right (dividing by 2) D times should leave exactly P.L if P is an ancestor.
// Divide by 2, D times
int c = C.L >> D
// Is P.L a prefix of C.L?
if (c == P.L) {
    // P.L is a prefix of C.L, which
    // means P is an ancestor of C
}
// If we get here, C is below P in the tree, but C
// is not in a subtree of P, because the leading bits don't match
In essence, we use the integers as bit strings to record the path taken during insertion, and we use bit manipulation to check whether P.L is a prefix of C.L in constant time.
Note: if you use an array-based tree instead, then P.L and C.L are simply the (1-based) indexes of the nodes you would like to check.
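Here is a small Python sketch of the whole scheme (the helper names are mine; Python's bit_length() plays the role of floor(log_2(L)) + 1):

def left_label(l):
    return l * 2          # going left appends a 0 bit

def right_label(l):
    return l * 2 + 1      # going right appends a 1 bit

def is_ancestor(p_l, c_l):
    # True if the node labeled p_l is a proper ancestor of the node labeled c_l
    if p_l.bit_length() >= c_l.bit_length():
        return False                          # same level or deeper: not an ancestor
    d = c_l.bit_length() - p_l.bit_length()   # how many levels c is below p
    return (c_l >> d) == p_l                  # p_l must be a prefix of c_l's bits

# labels from the example tree: 100 (4) is an ancestor of 10010 (18)
print(is_ancestor(0b100, 0b10010))   # True
print(is_ancestor(0b101, 0b10010))   # False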
I have written an algorithm that, given a list of words, must check each unique combination of four words in that list (regardless of order).
The number of combinations to be checked, x, can be calculated using the binomial coefficient, i.e. x = n!/(r!(n-r)!), where n is the total number of words in the list and r is the number of words in each combination, which in my case is always 4; therefore the function is x = n!/(4!(n-4)!) = n!/(24(n-4)!). So as the total number of words, n, increases, the number of combinations to be checked, x, increases factorially, right?
What has thrown me is that WolframAlpha was able to rewrite this function as x = n^4/24 - n^3/4 + 11n^2/24 - n/4, so now it would appear to grow polynomially as n grows? So which is it?!
Here is a graph to visualise the growth of the function (the letter x is switched to an l)
For a fixed value of r, this function is O(n^r). In your case, r = 4, so it is O(n^4). This is because most of the factors in the numerator are canceled out by the denominator:
n!/(4!(n-4)!)

    n(n-1)(n-2)(n-3) * (n-4)(n-5)(n-6)...(3)(2)(1)
  = ----------------------------------------------
            4! * (n-4)(n-5)(n-6)...(3)(2)(1)

    n(n-1)(n-2)(n-3)
  = ----------------
           4!
This is a 4th degree polynomial in n.
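A quick Python sanity check (a sketch) that the binomial coefficient and WolframAlpha's expanded polynomial agree:

import math

for n in range(4, 10):
    poly = n**4/24 - n**3/4 + 11*n**2/24 - n/4
    assert math.comb(n, 4) == round(poly)   # identical for every n
    print(n, math.comb(n, 4))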
I'm attempting to solve the familiar mean-variance optimization problem using matrices in Mathematica, with a few added constraints on the solution vector "w". (The mean-variance optimization problem is basically choosing how to allocate a given budget over a number of assets according to their means and covariances in order to minimize the risk of the portfolio for a chosen level of mean return.)
My question: I'm not sure which function to use to perform the minimization of the objective function, which is quadratic:
obj = 0.5*w'* Sig * w
where w is the Nx1 vector of weights for each of the N assets and Sig is the NxN covariance matrix
From what I've been able to find (I'm fairly new to Mathematica), it seems that FindMinimum, NMinimize, etc. are meant to deal only with scalar inputs, while LinearProgramming is meant for an objective function that's linear (not quadratic) in the weight vector w. I could very well be wrong here--any help in steering me toward the correct function would be much appreciated!
If it helps, I've attached my sample code--I'm not sure how, but if there's a place to upload the sample .csv data, I can do that as well if someone could point me to it.
Thank you very much for any help you can provide.
-Dan
CODE
(* Goal: find an Nx1 vector of weights w that minimizes total portfolio risk
   w'*Sig*w (where Sig is the covariance matrix) subject to:
   - The portfolio expected return is equal to the desired level d: w'*M = d,
     where M is the Nx1 vector of means.
   - There are exactly two assets chosen from each of the refining,
     construction, hitech, and utility sectors, and exactly one asset chosen
     from the "other" sector.
     ^ The above two constraints are represented together as w'*R = k, where
     R is the matrix [M SEC] and k is the vector [d 2 2 2 2 1].
   - Each weight in w takes an integer value of either 0 or 1, representing
     buying or not buying that physical asset (ex. a plant) -- this
     constraint is achieved as a combination of an integer constraint and a
     boundary constraint.
   ** Note that for the T=41 days of observations in the data, not every
   asset generates a value for every day; by leaving the days when the asset
   is "off" as blanks, this shouldn't affect the mean or covariance matrices.
*)
Clear["Global`*"]
(* (1) Import the data for today *)
X = Import["X:\\testassets.csv", "Data"];
Dimensions[X];
(* (2) Create required vectors and matrices *)
P = Take[X, {2, 42}, {4}];
Dimensions[P]; (* Should be N assets x 1) *)
r = Take[X, {2, 42}, {10, 50}];
Dimensions[r]; (* Should be N x T *)
Sig = Covariance[r]; (* When there's more time, add block diagonal restriction here *)
Dimensions[Sig]; (* Should be N x N *)
M = Mean[r\[Transpose]];
Dimensions[M]; (* Should be N x 1 *)
SEC = Take[X, {2, 42}, {5, 9}];
Dimensions[SEC]; (* Should be N x 5 *)
(* (3) Set up constrained optimization *)
d = 200; (* desired level of return *)
b = 60000;(* budget constraint *)
R = Join[M, SEC];
Dimensions[R]; (* Should be N x 6 *)
k = {d, 2, 2, 2, 2, 1};
obj = 0.5*w.Sig.w; (* quadratic form: use Dot, not elementwise multiplication *)
constr = w.R;
budgetcap = w.P;
lb = ConstantArray[0, 41];
ub = ConstantArray[1, 41];
FindMinimum[{obj, constr == k, budgetcap <= b, Element[w, Integers],
  lb <= w <= ub}, w]
Let's say we have the set {1, 2, ..., n}.
How many R-element subsets {a_i1, a_i2, ..., a_iR} sum up to a certain number S? What is the recursion for this problem?
Just define a method that solves the original problem. The parameters it receives are:
the maximum number that may be used (n),
the subset size (R),
the subset sum (S),
and it returns the number of combinations.
To implement this method, we first have to check whether the request is satisfiable at all. It is not possible to fulfill the task if:
the subset size is larger than the number of available elements (R > n);
the maximal possible sum is smaller than S: n + (n-1) + ... + (n-R+1) < S, which simplifies to R*((n-R) + (R+1)/2) < S.
After that, it is enough to try every possibility for the largest element of the subset. In Python it can be implemented like this:
def combinations(n, R, S):
    # base case: the empty subset is the only size-0 subset, and its sum is 0
    if R == 0:
        return 1 if S == 0 else 0
    # impossible: not enough elements, or even the R largest elements sum to less than S
    if R > n or R*((n-R) + (R+1)/2) < S:
        return 0
    c = 0
    for i in range(R, n+1):  # try i as the maximal element; it can go from R to n
        # recursion n is i-1, since i is already used
        # recursion R is R-1, since we put i in the subset
        # recursion S is S-i, since i is in the subset and we need the remaining sum
        c += combinations(i-1, R-1, S-i)
    return c
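For example, there are exactly two 2-element subsets of {1, 2, 3, 4} that sum to 5:

print(combinations(4, 2, 5))  # -> 2, namely {1, 4} and {2, 3}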