Multi-capacity knapsack in CPLEX

I came across a knapsack problem where the maximum number of items from a set needs to be placed into one bin while minimizing the cost. I am able to solve that optimization problem in CPLEX.
However, I am having difficulty implementing it in CPLEX when the problem consists of two bins (with different capacities).
The problem:
Bin = [B1, B2]
Capacity = [7,5]
Item = [I1, I2, I3, I4]
Weight = [6,3,1,4]
Price = [2,8,2,4]
The objective is to place the maximum number of items while minimizing the total price.
How can I implement this in CPLEX?
Below is my code snippet:
// ITEMS
int n = 4;                    // number of items
range items = 1..n;           // range of items
int p[items] = [2,8,2,6];     // price
int w[items] = [6,3,1,4];     // weight
// BINS
int m = 2;                    // number of bins
range bins = 1..m;            // range of bins
int capacity[bins] = [7,5];   // capacity of each bin
dvar boolean x[items][bins];
// model: maximize the profit
maximize sum(i in items, j in bins) p[i]*x[i][j];
subject to {
  forall (j in bins)
    cons1: sum(i in items) w[i]*x[i][j] <= capacity[j];
  forall (i in items)
    cons2: sum(j in bins) x[i][j] == 1;
}
Thanks.

If you add
assert sum(i in items) w[i] <= sum(b in bins) capacity[b];
then this assert is violated, which explains why you do not get a solution: you do not have enough bin capacity.
But then if you turn:
int capacity[bins] = [7,5]; // capacity of each bin
into
int capacity[bins] = [7,7]; // capacity of each bin
then you'll get a solution.

You can find a knapsack example in CPLEX_Studio1271\opl\examples\opl\knapsack.
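If you also want the question's first goal (place as many items as possible) together with the price objective, one option is a lexicographic objective via OPL's staticLex, available in recent versions of CPLEX Optimization Studio. Here is a minimal sketch, assuming the data declarations above; note the assignment constraint is relaxed to <= 1 so that leaving an item out stays feasible:
// Sketch only: first maximize the number of items placed,
// then minimize the total price among such packings.
dexpr int nbPlaced = sum(i in items, j in bins) x[i][j];
dexpr int totalPrice = sum(i in items, j in bins) p[i]*x[i][j];
minimize staticLex(-nbPlaced, totalPrice); // minimizing -nbPlaced maximizes it
subject to {
  forall (j in bins)
    sum(i in items) w[i]*x[i][j] <= capacity[j];
  forall (i in items)
    sum(j in bins) x[i][j] <= 1; // an item may remain unplaced
}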

Related

Can Cplex prioritize some variables so that they're likely to be chosen together?

So my problem contains a vehicle that moves from one node to the next. I have a bunch of nodes that may or may not be related to each other. I want the nodes that are similar to each other to be visited by the same vehicle as much as possible.
Is there any way I can prioritize the related nodes so that they're more likely to be grouped together? I thought of creating sets or tuples that represent the different groups, and of having a variable X[i][j] = 1 if the vehicle moves from node i to node j, but I'm stuck at the "prioritize i and j if they come from the same set" part. Is it the boolean value that makes that impossible to express? Should I modify my formulation somehow?
This is my code for the problem so far; I still haven't worked out the priority part.
int nNode = 20;
range N = 1..nNode;       // set of locations to visit
range V = 0..nNode;       // set of locations plus the depot
range Vehicle = 1..6;     // there are six vehicles
range boxType = 1..3;     // three types of boxes to be transported
int demand[V][boxType] = ...;  // demand of a location for each type of box
int timeBox[boxType] = ...;    // time associated with the actions on a type of box
dvar int+ totalLoad[Vehicle];
dvar int+ load[Vehicle][boxType];  // load in terms of box type
dvar boolean X[V][V][Vehicle];     // 1 if vehicle k goes from node i to node j, 0 if not
dvar int+ t[Vehicle];              // total time a vehicle spends
dvar float+ time[Vehicle];         /* equals |t[k] - target cycle time|, to make each
                                      vehicle spend as close to the target cycle time
                                      as possible */
minimize sum (v in Vehicle) time[v];
subject to
{
  forall (i in V)
    sum (j in V, k in Vehicle) X[i][j][k] == 1;  /* each starting node has exactly one
                                                    destination node, i.e. it belongs
                                                    to exactly one route */
  forall (j in V)
    sum (i in V, k in Vehicle) X[i][j][k] == 1;  // similar, but for ending nodes
  forall (k in Vehicle)
    totalLoad[k] == sum(i in V, j in V) X[i][j][k] * (sum(b in boxType) demand[j][b]);
    /* total load of a vehicle equals the total boxes collected at each stop on its path */
  forall (b in boxType, k in Vehicle)
    load[k][b] == sum(i in V, j in V) X[i][j][k] * demand[j][b];
    // number of boxes of each type on each route
  forall (k in Vehicle)
  {
    time[k] >= t[k] - 1.5;
    time[k] >= 1.5 - t[k];  /* linearization of time[k] = |t[k] - 1.5|, 1.5 being the
                               target cycle time; the >= pair suffices when minimizing */
    t[k] == sum(b in boxType) load[k][b] * timeBox[b];  // total time involved in a route
  }
}
You could try adding a term to your objective that penalises giving different values to those sets of variables. That is easy enough if there are only two of them, but it gets more fiddly if there are bigger subsets and/or lots of subsets to coordinate.
I would do something along the lines of what Tim suggested. Here is a little bit more meat on the bones:
x[i,j,k] = 1 ⇒ L[g] ≤ k   ∀i ∈ g, ∀j, k   (a lower bound on the route k for group g)
x[i,j,k] = 1 ⇒ U[g] ≥ k   ∀i ∈ g, ∀j, k   (an upper bound on the route k for group g)
U[g] − L[g] ≥ 1 ⇒ δ[g] = 1   (δ[g] = 1 if group g is on different routes)
min sum(g, δ[g])   (objective)
δ[g] ∈ {0,1}   (δ[g] is a binary variable)
One way to implement the first three implications is:
L[g] ≤ k⋅x[i,j,k] + M⋅(1 − x[i,j,k])   ∀i ∈ g, ∀j, k
U[g] ≥ k⋅x[i,j,k]   ∀i ∈ g, ∀j, k
M⋅δ[g] ≥ U[g] − L[g]
here g indicates a group. This makes the problem a multi-objective problem, so you can choose from a few possible approaches for that.
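A sketch of those big-M constraints in OPL, meant to be merged into the model above; the names nGroups, group, and M are made up for illustration, and M must be at least the number of vehicles:
// To be merged into the model above. Hypothetical data: groups of related
// nodes (indices into V) and a big-M value.
int nGroups = ...;
range G = 1..nGroups;
{int} group[G] = ...;
int M = 6;             // >= number of vehicles
dvar int+ L[G];        // lowest route index used by group g
dvar int+ U[G];        // highest route index used by group g
dvar boolean delta[G]; // 1 if group g is split across routes
// the objective becomes, e.g.:
minimize sum (v in Vehicle) time[v] + sum (g in G) delta[g];
subject to {
  forall (g in G, i in group[g], j in V, k in Vehicle) {
    L[g] <= k*X[i][j][k] + M*(1 - X[i][j][k]);
    U[g] >= k*X[i][j][k];
  }
  forall (g in G)
    M*delta[g] >= U[g] - L[g];
}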
You could use priorities if you do not want to change the model from a logical point of view.
See https://github.com/AlexFleischerParis/zooopl/blob/master/zoopriorities.mod
int nbKids=300;
float costBus40=500;
float costBus30=400;
dvar int+ nbBus40;
dvar int+ nbBus30;
execute
{
nbBus40.priority=100;
nbBus30.priority=0;
}
minimize
costBus40*nbBus40 +nbBus30*costBus30;
subject to
{
40*nbBus40+nbBus30*30>=nbKids;
}
(from Making Optimization Simple)
If you want to change the model from a logical point of view, you can change the objective or add a second one:
int nbKids=350;
float costBus40=400;
float costBus30=300;
dvar int+ nbBus40;
dvar int+ nbBus30;
dexpr float absdistancebetweennumbers=abs(nbBus40-nbBus30);
minimize
staticLex(costBus40*nbBus40 +nbBus30*costBus30,absdistancebetweennumbers);
subject to
{
40*nbBus40+nbBus30*30>=nbKids;
}

Find nth int with 10 set bits

Find the nth int with 10 set bits.
n is an int in the range 0 <= n <= 30,045,014.
The 0th such int is 1023, the 1st is 1535, and so on.
snob() (same number of bits) returns the lowest integer bigger than n with the same number of set bits as n:
int snob(int n) {
    int a = n & -n;                  // isolate the lowest set bit
    int b = a + n;                   // ripple it up to the next higher unset bit
    return b | (n ^ b) / a >> 2;     // move the remaining bits back to the bottom
}
Calling snob n times works:
int nth(int n) {
    int o = 1023;  // the 0th integer with 10 set bits
    for (int i = 0; i < n; i++) o = snob(o);
    return o;
}
Example:
https://ideone.com/ikGNo7
Is there some way to find it faster?
I found one pattern, but I am not sure if it's useful.
Using factorials you can find the "indexes" where all 10 set bits are consecutive:
1023 << x is the ((x+10)! / (x! * 10!) − 1)-th integer, i.e. the (C(x+10, 10) − 1)-th:
1023 << 1 is the 10th
1023 << 2 is the 65th
1023 << 3 is the 285th
...
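A quick sanity check of that pattern against the snob-based nth() above; binom is a made-up helper computing C(n, k), and this is a sketch only:
#include <assert.h>

/* C(n, k); exact for the small values used here (hypothetical helper) */
long binom(int n, int k) {
    long r = 1;
    if (k < 0 || k > n) return 0;
    for (int i = 0; i < k; i++) r = r * (n - i) / (i + 1);
    return r;
}

/* 1023 << x should be the (C(x+10, 10) - 1)-th integer with 10 set bits */
void check_pattern(void) {
    for (int x = 1; x <= 3; x++)
        assert(nth((int)(binom(x + 10, 10) - 1)) == (1023 << x));
}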
Btw I'm not a student and this is not homework.
EDIT:
I found an alternative to snob():
https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
int lnbp(int v) {
    int t = (v | (v - 1)) + 1;                      // set the lowest bit of the next block
    return t | ((((t & -t) / (v & -v)) >> 1) - 1);  // move the trailing ones back down
}
I have built an implementation that should satisfy your needs.
/** A lookup table to see how many combinations preceded this one */
private static int[][] LOOKUP_TABLE_COMBINATION_POS;
/** The number of possible combinations with i bits */
private static int[] NBR_COMBINATIONS;

static {
    LOOKUP_TABLE_COMBINATION_POS = new int[Integer.SIZE][Integer.SIZE];
    for (int bit = 0; bit < Integer.SIZE; bit++) {
        // Ignore less significant bits, compute how many combinations have to be
        // visited to set this bit, i.e.
        // (bit = 4, pos = 5), before came 0b1XXX and 0b1XXXX, that's C(3, 3) + C(4, 3)
        int nbrBefore = 0;
        // The nth bit can only be encountered after pos n
        for (int pos = bit; pos < Integer.SIZE; pos++) {
            LOOKUP_TABLE_COMBINATION_POS[bit][pos] = nbrBefore;
            nbrBefore += nChooseK(pos, bit);
        }
    }
    NBR_COMBINATIONS = new int[Integer.SIZE + 1];
    for (int bits = 0; bits < NBR_COMBINATIONS.length; bits++) {
        NBR_COMBINATIONS[bits] = nChooseK(Integer.SIZE, bits);
        assert NBR_COMBINATIONS[bits] > 0; // important for the modulo check; otherwise we must use unsigned arithmetic
    }
}

private static int nChooseK(int n, int k) {
    assert k >= 0 && k <= n;
    if (k > n / 2) {
        k = n - k;
    }
    long nCk = 1; // (n choose 0)
    for (int i = 0; i < k; i++) {
        // (n choose i+1) = (n choose i) * (n-i) / (i+1)
        nCk *= (n - i);
        nCk /= (i + 1);
    }
    return (int) nCk;
}

public static int nextCombination(int w, int n) {
    // TODO: maybe for small n just advance naively
    // Get the position of the current pattern w
    int nbrBits = 0;
    int position = 0;
    while (w != 0) {
        final int currentBit = Integer.lowestOneBit(w); // w & -w
        final int bitPos = Integer.numberOfTrailingZeros(currentBit);
        position += LOOKUP_TABLE_COMBINATION_POS[nbrBits][bitPos];
        // toggle off bit
        w ^= currentBit;
        nbrBits++;
    }
    position += n;
    // Wrapping, optional
    position %= NBR_COMBINATIONS[nbrBits];
    // And reverse lookup
    int v = 0;
    int m = Integer.SIZE - 1;
    while (nbrBits-- > 0) {
        final int[] bitPositions = LOOKUP_TABLE_COMBINATION_POS[nbrBits];
        // Search for the largest bitPos such that position >= bitPositions[bitPos]
        while (Integer.compareUnsigned(position, bitPositions[m]) < 0)
            m--;
        position -= bitPositions[m];
        v ^= (0b1 << m--);
    }
    return v;
}
Now for some explanation. LOOKUP_TABLE_COMBINATION_POS[bit][pos] is the core of the algorithm that makes it as fast as it is. The table is designed so that a bit pattern with k bits at positions p_0 < p_1 < ... < p_{k-1} has a position of sum_{i = 0}^{k-1} LOOKUP_TABLE_COMBINATION_POS[i][p_i].
The intuition is that we try to move the bits back one by one until we reach the pattern where all bits are at the lowest possible positions. Moving the i-th bit from position k + 1 to k moves the index back by C(k-1, i-1) positions, provided that all lower bits are at the right-most positions (no moving bits into or through each other), since we skip over all possible combinations with the i-1 lower bits in the k-1 lower slots.
We can thus "decode" a bit pattern to a position, keeping track of the bits encountered. We then advance by n positions (rolling over in case we enumerated all possible positions for k bits) and encode this position again.
To encode a pattern, we reverse the process. For this, we move bits forward from their starting positions, as long as the position is smaller than what we're aiming for. We could, instead of a linear search through LOOKUP_TABLE_COMBINATION_POS, employ a binary search for our target index m, but it's hardly needed; the size of an int is not big. Nevertheless, we reuse our invariant that a smaller bit must also come at a less significant position, so that our algorithm is effectively O(n) where n = Integer.SIZE.
The following assertions demonstrate the resulting algorithm:
nextCombination(0b1111111111, 1) == 0b10111111111;
nextCombination(0b1111111111, 10) == 0b11111111110;
nextCombination(0x00FF , 4) == 0x01EF;
nextCombination(0x7FFFFFFF , 4) == 0xF7FFFFFF;
nextCombination(0x03FF , 10) == 0x07FE;
// Correct wrapping
nextCombination(0b1 , 32) == 0b1;
nextCombination(0x7FFFFFFF , 32) == 0x7FFFFFFF;
nextCombination(0xFFFFFFEF , 5) == 0x7FFFFFFF;
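Tying this back to the original question, the n-th integer with 10 set bits then falls out directly (the 0th being 0b1111111111 = 1023):
public static int nth(int n) {
    // advance the smallest 10-bit pattern by n combinations
    return nextCombination(0b1111111111, n);
}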
Let us consider the numbers with k=10 bits set.
The trick is to determine the rank of the most significant one, for a given n.
There is a single number of length k: C(k, k) = 1. There are C(k+1, k) = k + 1 numbers of length at most k + 1. ... There are C(m, k) numbers of length at most m.
For k = 10, the limits on n are the partial sums of 1 + 10 + 55 + 220 + 715 + 2002 + 5005 + 11440 + ...
For a given n, you easily find the corresponding m. Then the problem is reduced to finding the (n − C(m, k))-th number with k − 1 bits set. And so on recursively.
With precomputed tables, this can be very fast: 30,045,015 takes 30 lookups, so I guess the worst case is 29 × 30 / 2 = 435 lookups.
(This is based on linear lookups, to favor small values. By means of dichotomic search, you reduce this to fewer than 29 × lg(30) ≈ 145 lookups at worst.)
Update:
My previous estimates were pessimistic. Indeed, as we are looking for k bits, there are only 10 determinations of m. In the linear case, at worst 245 lookups; in the dichotomic case, fewer than 50.
(I don't exclude off-by-one errors in the estimates, but clearly this method is very efficient and requires no snob.)
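A minimal sketch of this recursive scheme, with 0-based ranks as in the question; binom is a made-up helper, and the table-driven version would precompute its values:
/* C(n, k), with C(n, k) = 0 when k > n; exact in 64 bits for these sizes */
long long binom(int n, int k) {
    long long r = 1;
    if (k < 0 || k > n) return 0;
    for (int i = 0; i < k; i++) r = r * (n - i) / (i + 1);
    return r;
}

/* n-th (0-based) integer with k set bits: peel off one most significant
   bit per step, as described above */
int nth_with_k_bits(int n, int k) {
    int result = 0;
    while (k > 0) {
        int m = k - 1;                       /* lowest possible MSB position */
        while (binom(m + 1, k) <= n) m++;    /* find m with C(m,k) <= n < C(m+1,k) */
        result |= 1 << m;                    /* the most significant one sits at m */
        n -= (int)binom(m, k);               /* rank among the remaining k-1 bits */
        k--;
    }
    return result;
}
/* e.g. nth_with_k_bits(0, 10) == 1023, nth_with_k_bits(1, 10) == 1535 */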

CPLEX max function over a range

I have a non-linear constraint of the form max_{k in V, j in F^o : o > d} (U_jk − U_ik) > 0 for all i in F^d. The set V denotes a fleet of vehicles, while F^o and F^d represent customers of certain types. How do I implement a max function that will be able to compute this in CPLEX? maxl() and IloMax() do not seem to work.
If I understood correctly, your max function is to return the maximum value, among the positive values, of U_jk − U_ik for all k in V and j in F^o such that o > d. If that is correct, all you need is a couple of loops, one over the k and another over the j. For each (k, j) pair you then need to verify both conditions:
exceeding the highest value obtained so far; while also
having a positive value of U_jk − U_ik.
I'll assume you check the o > d condition outside the max function, to simplify it. I propose you do it as follows:
typedef IloArray<IloIntArray> Ilo2IntArray; // bidimensional array of integers

IloInt myMaxFunction(IloIntArray V, IloIntArray Fo, Ilo2IntArray U, IloInt i) {
    IloInt jMax = -1;           // j index of the maximum value
    IloInt kMax = -1;           // k index of the maximum value
    IloInt maxVal = IloIntMin;  // maximum value found so far
    IloInt difference;
    for (IloInt k = 0; k < V.getSize(); k++) {
        for (IloInt j = 0; j < Fo.getSize(); j++) {
            difference = U[Fo[j]][V[k]] - U[i][V[k]];
            if ((difference > 0) && (difference > maxVal)) {
                jMax = j;
                kMax = k;
                maxVal = difference;
            }
        }
    }
    return maxVal;
}
You pass in two linear arrays, the first containing the set of vehicles, the second containing the customers farther than a distance d from the depot, I assume. The third parameter is a bidimensional array of integers you can define with typedef IloArray<IloIntArray> Ilo2IntArray (included at the top of the snippet). Finally, you also need the customer i as input.
For each (k, j) pair of elements such that k is a vehicle in V and j is a customer in the set F^o, the function computes the difference U_jk − U_ik and verifies both of the aforementioned conditions simultaneously. In that case, it updates the indexes and the maximal value and continues.
Notice that maxVal must be initialized to a value that will be improved the first time the conditions are verified. The best/highest value is returned by the function.
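A minimal usage sketch, with all data values made up for illustration:
#include <ilcplex/ilocplex.h>
ILOSTLBEGIN

int main() {
    IloEnv env;
    IloIntArray V(env, 2, 0, 1);   // two vehicles
    IloIntArray Fo(env, 2, 0, 1);  // two customers in F^o
    Ilo2IntArray U(env, 2);        // U[j][k] values
    U[0] = IloIntArray(env, 2, 5, 3);
    U[1] = IloIntArray(env, 2, 7, 2);
    IloInt best = myMaxFunction(V, Fo, U, 0); // i = 0
    env.out() << "max positive difference: " << best << std::endl;
    env.end();
    return 0;
}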
One is glad to be of service...

OpenCL Memory Optimization - Nearest Neighbour

I'm writing a program in OpenCL that receives two arrays of points, and calculates the nearest neighbour for each point.
I have two programs for this. One of them will calculate distance for 4 dimensions, and one for 6 dimensions. They are below:
4 dimensions:
kernel void BruteForce(
global read_only float4* m,
global float4* y,
global write_only ushort* i,
read_only uint mx)
{
int index = get_global_id(0);
float4 curY = y[index];
float minDist = MAXFLOAT;
ushort minIdx = -1;
int x = 0;
int mmx = mx;
for(x = 0; x < mmx; x++)
{
float dist = fast_distance(curY, m[x]);
if (dist < minDist)
{
minDist = dist;
minIdx = x;
}
}
i[index] = minIdx;
y[index] = minDist;
}
6 dimensions:
kernel void BruteForce(
global read_only float8* m,
global float8* y,
global write_only ushort* i,
read_only uint mx)
{
int index = get_global_id(0);
float8 curY = y[index];
float minDist = MAXFLOAT;
ushort minIdx = -1;
int x = 0;
int mmx = mx;
for(x = 0; x < mmx; x++)
{
float8 mx = m[x];
float d0 = mx.s0 - curY.s0;
float d1 = mx.s1 - curY.s1;
float d2 = mx.s2 - curY.s2;
float d3 = mx.s3 - curY.s3;
float d4 = mx.s4 - curY.s4;
float d5 = mx.s5 - curY.s5;
float dist = sqrt(d0 * d0 + d1 * d1 + d2 * d2 + d3 * d3 + d4 * d4 + d5 * d5);
if (dist < minDist)
{
minDist = dist;
minIdx = index;
}
}
i[index] = minIdx;
y[index] = minDist;
}
I'm looking for ways to optimize this program for GPGPU. I've read some articles (including http://www.macresearch.org/opencl_episode6, which comes with a source code) about GPGPU optimization by using local memory. I've tried applying it and came up with this code:
kernel void BruteForce(
global read_only float4* m,
global float4* y,
global write_only ushort* i,
__local float4 * shared)
{
int index = get_global_id(0);
int lsize = get_local_size(0);
int lid = get_local_id(0);
float4 curY = y[index];
float minDist = MAXFLOAT;
ushort minIdx = 64000;
int x = 0;
for(x = 0; x < {0}; x += lsize)
{
if((x+lsize) > {0})
lsize = {0} - x;
if ( (x + lid) < {0})
{
shared[lid] = m[x + lid];
}
barrier(CLK_LOCAL_MEM_FENCE);
for (int x1 = 0; x1 < lsize; x1++)
{
float dist = distance(curY, shared[x1]);
if (dist < minDist)
{
minDist = dist;
minIdx = x + x1;
}
}
barrier(CLK_LOCAL_MEM_FENCE);
}
i[index] = minIdx;
y[index] = minDist;
}
I'm getting garbage results for my 'i' output (e.g. many values that are the same). Can anyone point me in the right direction? I'd appreciate any answer that helps me improve this code, or finds the problem with the optimized version above.
Thank you very much
Cauê
One way to get a big speed up here is to use local data structures and compute entire blocks of data at a time. You should also only need a single read/write global vector (float4). The same idea can be applied to the 6d version using smaller blocks. Each work group is able to work freely through the block of data it is crunching. I will leave the exact implementation to you because you will know the specifics of your application.
some pseudo-ish code (4d):
// computeBlockSize is the size of the blocks to read from global memory and
// crunch. This value should be a multiple of your work group size. I like 64
// as a WG size; it tends to perform well on most platforms. We will be
// allocating 2 * float4 * computeBlockSize + uint * computeBlockSize of
// shared memory. (max value for ocl 1.0 ~448, ocl 1.1 ~896)
#define computeBlockSize 256
__local float4 blockA[computeBlockSize];
__local float4 blockB[computeBlockSize];
__local uint blockAnearestIndex[computeBlockSize];
Now blockA gets computed against all blockB combinations; this is the job of a single work group.
*important*: only blockA ever gets written to. blockB is stored in local memory, but never changed or copied back to global.
Steps:

load blockA into local memory with async_work_group_copy, then wait_group_events
  blockA is located at get_group_id(0) * computeBlockSize in the global vector
optional: set all blockA 'w' values to MAXFLOAT
optional: load blockAnearestIndex into local memory with async_work_group_copy if needed

// compute blockA against itself first, then go into the blockB's;
// be careful to only write to blockA[j], NOT blockA[k]; j is exclusive to this work item
for (j = get_local_id(0); j < computeBlockSize; j += get_local_size(0))
    for (k = 0; k < computeBlockSize; k++)
        if (j == k) continue; // no self-comparison
        calculate distance of blockA[j] vs blockA[k]
        store min distance in blockA[j].w
        store global index (= get_group_id(0) * computeBlockSize + k) of the nearest in blockAnearestIndex[j]
barrier(CLK_LOCAL_MEM_FENCE)

for (i = 0; i < get_num_groups(0); i++)
    if (i == get_group_id(0)) continue;
    load blockB into local memory: async_work_group_copy(...), then wait_group_events
    for (j = get_local_id(0); j < computeBlockSize; j += get_local_size(0))
        for (k = 0; k < computeBlockSize; k++)
            calculate distance of blockA[j] vs blockB[k]
            store min distance in blockA[j].w
            store global index (= i * computeBlockSize + k) of the nearest in blockAnearestIndex[j]
    barrier(CLK_LOCAL_MEM_FENCE)

write blockA and blockAnearestIndex back to global memory using two async_work_group_copy calls
There should be no problem reading a blockB while another work group writes to the same block (as its own blockA), because only the w values may have changed. If there happens to be trouble with this, or if you do require two different vectors of points, you could use two global vectors like you have above, one with the A's (writable) and the other with the B's (read only).
This algorithm works best when your global data size is a multiple of computeBlockSize. To handle the edges, two solutions come to mind. I recommend writing a second kernel for the non-square edge blocks that works in a similar manner to the one above. The new kernel can execute after the first, and you save the second pci-e transfer. Alternatively, you can use a distance of -1 to signify a skip in the comparison of two elements (i.e. if either blockA[j].w == -1 or blockB[k].w == -1, continue). This second solution would result in a lot more branching in your kernel though, which is why I recommend writing a new kernel. A very small percentage of your data points will actually fall in an edge block.
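For reference, here is a minimal sketch of the tiling idea in its simplest form, closer to the question's two-vector setup than to the single-vector scheme above. It assumes the work group size equals BLOCK and that the number of reference points is a multiple of BLOCK; all names are made up:
#define BLOCK 64  // work group size; assumed equal to get_local_size(0)

kernel void BruteForceTiled(
    global const float4* a,      // query points (w component assumed 0)
    global const float4* b,      // reference points (w component assumed 0)
    global ushort* nearestIdx,   // output: index of the nearest b for each a
    global float* nearestDist,   // output: distance to the nearest b
    uint nb)                     // number of b points, a multiple of BLOCK
{
    local float4 tile[BLOCK];
    const int gid = get_global_id(0);
    const int lid = get_local_id(0);
    const float4 p = a[gid];
    float best = MAXFLOAT;
    ushort bestIdx = 0;

    for (uint base = 0; base < nb; base += BLOCK) {
        tile[lid] = b[base + lid];     // cooperative load of one tile
        barrier(CLK_LOCAL_MEM_FENCE);  // tile fully loaded
        for (int k = 0; k < BLOCK; k++) {
            float dist = fast_distance(p, tile[k]);
            if (dist < best) {
                best = dist;
                bestIdx = (ushort)(base + k);
            }
        }
        barrier(CLK_LOCAL_MEM_FENCE);  // everyone done before the tile is overwritten
    }
    nearestIdx[gid] = bestIdx;
    nearestDist[gid] = best;
}
Keeping the tile size fixed for every iteration means all work items in a group reach each barrier the same number of times, and avoids having to adjust lsize mid-loop.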

Using CUDA to find the pixel-wise average value of a bunch of images

So I have a cube of images, 512x512x512. I want to sum the images pixel-wise and save the result to a final image. So if all the pixels had value 1, the final image would be all 512s. I am having trouble understanding the indexing to do this in CUDA. I figure one thread's job will be to sum all 512 values at its pixel, so the total number of threads will be 512x512. So I plan to do it with 512 blocks of 512 threads each. From here, I am having trouble coming up with the indexing to sum over the depth. Any help will be greatly appreciated.
One way to solve this problem is to imagine the cube as a set of Z slices. The X, Y coordinates refer to the width and height of the image, and the Z coordinate to each slice in the Z dimension. Each thread will iterate over the Z coordinate to accumulate the values.
With this in mind, configure a kernel to launch blocks of 16x16 threads and a grid of enough blocks to cover the width and height of the image (I'm assuming a grayscale image with 1 byte per pixel, and WIDTH and HEIGHT multiples of THREADS):
#define THREADS 16
// kernel configuration
dim3 dimBlock(THREADS, THREADS, 1);
dim3 dimGrid(WIDTH / THREADS, HEIGHT / THREADS);
// call the kernel
kernel<<<dimGrid, dimBlock>>>(i_data, o_data, WIDTH, HEIGHT, DEPTH);
If you are clear on how to index a 2D array, looping through the Z dimension will also be clear:
__global__ void kernel(unsigned char* i_data, unsigned int* o_data, int WIDTH, int HEIGHT, int DEPTH)
{
    // map from threadIdx/blockIdx to the pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // calculate the global index of a pixel into the image array;
    // this global index points into the first slice of the cube
    int idx = x + y * WIDTH;
    // partial result; note that both r and o_data must be wider than
    // unsigned char, since the sum can reach 512 * 255
    int r = 0;
    // iterate over the Z dimension
    for (int z = 0; z < DEPTH; ++z)
    {
        // WIDTH * HEIGHT is the offset of one slice
        int idx_z = z * WIDTH * HEIGHT + idx;
        r += i_data[ idx_z ];
    }
    // o_data is a 2D array, so you can use the global index idx
    // (for the average from the title, divide r by DEPTH here)
    o_data[ idx ] = r;
}
This is a naive implementation. In order to maximize memory throughput, the data should be properly aligned.
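To make the host side concrete, a minimal sketch of the allocation and launch, reusing THREADS from above; names and sizes are assumed from the question:
// inside some host function:
#define WIDTH  512
#define HEIGHT 512
#define DEPTH  512

unsigned char* i_data;  // the 512x512x512 cube, slice after slice
unsigned int*  o_data;  // the 512x512 result image
cudaMalloc(&i_data, (size_t)WIDTH * HEIGHT * DEPTH * sizeof(unsigned char));
cudaMalloc(&o_data, (size_t)WIDTH * HEIGHT * sizeof(unsigned int));
// ... cudaMemcpy the images into i_data ...
dim3 dimBlock(THREADS, THREADS, 1);
dim3 dimGrid(WIDTH / THREADS, HEIGHT / THREADS);
kernel<<<dimGrid, dimBlock>>>(i_data, o_data, WIDTH, HEIGHT, DEPTH);
cudaDeviceSynchronize();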
This can be done easily using the ArrayFire GPU library (free). In ArrayFire, you can construct 3D arrays like the following:
Two approaches:
// Method 1:
array data = rand(x, y, z);
// Just reshaping the array; this is a noop
data = newdims(data, x*y, z, 1);
// Sum of pixels
array res = sum(data);

// Method 2:
// Use ArrayFire "GFOR"
array data = rand(x, y, z);
array res = zeros(z, 1);
gfor (array i, z) {
    res(i) = sum(data(span, span, i));
}