Why does CPLEX solve this mixed integer linear program so incredibly fast? - optimization

I am working on an optimization problem where I want to find a resource-activity assignment based on skill restrictions (not all resources have all skills for the demands d), resource restrictions (resources have a limited presence p), and an assignment restriction l limiting the number of resources assigned to an activity. I want to maximize the sum of weights w of all selected activities. The model is depicted here:
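(The model image is not reproduced in this text. As an illustration only, a formulation matching the description reads roughly as follows, where y_j = 1 if activity j is selected, x_ij = 1 if resource i is assigned to activity j, and s_ik = 1 if resource i has skill k; this is reconstructed notation, not necessarily the exact model.)
maximize    sum_j w_j * y_j
subject to  sum_i s_ik * x_ij >= d_jk * y_j    for all j, k    (skill demands of selected activities)
            sum_j x_ij <= p_i                  for all i       (limited presence per resource)
            sum_i x_ij <= l_j                  for all j       (assignment limit per activity)
            x_ij, y_j in {0, 1}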
Now I fed this into CPLEX, and it solves gigantic instances in a very short amount of time as long as I allow heuristics (1000 activities, 50 resources, 5 skills within about 10 sec), even though there is a huge number of possible selections AND a huge number of possible assignments for each activity.
Why is this problem so easy for CPlex? Is there some kind of easily solvable underlying problem that I am missing?
edit: typical output of one of my solver runs:
Tried aggregator 1 time.
MIP Presolve eliminated 3051 rows and 64954 columns.
MIP Presolve modified 49950 coefficients.
Reduced MIP has 52001 rows, 236046 columns, and 662190 nonzeros.
Reduced MIP has 47952 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.61 sec. (276.60 ticks)
Probing time = 0.38 sec. (12.12 ticks)
Tried aggregator 1 time.
MIP Presolve eliminated 1323 rows and 62181 columns.
MIP Presolve modified 3366 coefficients.
Reduced MIP has 50678 rows, 173865 columns, and 474324 nonzeros.
Reduced MIP has 47952 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.78 sec. (334.49 ticks)
Probing time = 0.26 sec. (9.19 ticks)
Tried aggregator 1 time.
Reduced MIP has 50678 rows, 173865 columns, and 474324 nonzeros.
Reduced MIP has 47952 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.49 sec. (220.07 ticks)
Probing time = 0.36 sec. (10.46 ticks)
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 8 threads.
Root relaxation solution time = 2.86 sec. (1101.00 ticks)
        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

*     0+    0                            7.0000     5631.0000      3104     ---
      0     0     2265.4000     2        7.0000     2265.4000      3104     ---
*     0+    0                         2265.0000     2265.4000      3107    0.02%
*     0     0      integral     0     2265.0000     2265.4000      3107    0.02%
Elapsed time = 7.59 sec. (2779.25 ticks, tree = 0.00 MB, solutions = 2)
Cover cuts applied: 1
Root node processing (before b&c):
Real time = 7.61 sec. (2792.47 ticks)
Parallel b&c, 8 threads:
Real time = 0.00 sec. (0.00 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 7.61 sec. (2792.47 ticks)

I think that's because the form of your model is relatively simple (the objective function, for example) and the model scale is rather small. You can test CPLEX with more complicated examples. I once solved a MIP model with more than 800 thousand constraints and 300 thousand variables.
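For instance, here is a minimal sketch of an instance generator for a model of the kind described in the question, in Python with PuLP. The constraint forms follow the reconstructed formulation above, so treat them as an assumption; PuLP's default CBC solver can be swapped for CPLEX via pulp.CPLEX_CMD() if it is installed.
import random
import pulp

# Illustrative instance sizes (the question reports 1000/50/5).
n_act, n_res, n_skill = 100, 20, 3

random.seed(0)
w = {j: random.randint(1, 10) for j in range(n_act)}   # activity weights
d = {(j, k): random.randint(0, 2) for j in range(n_act) for k in range(n_skill)}   # skill demands
s = {(i, k): random.random() < 0.5 for i in range(n_res) for k in range(n_skill)}  # resource skills
p = {i: random.randint(5, 15) for i in range(n_res)}   # resource presence
l = {j: random.randint(1, 4) for j in range(n_act)}    # assignment limit

prob = pulp.LpProblem("activity_selection", pulp.LpMaximize)
y = pulp.LpVariable.dicts("y", range(n_act), cat="Binary")
x = pulp.LpVariable.dicts("x", [(i, j) for i in range(n_res) for j in range(n_act)], cat="Binary")

prob += pulp.lpSum(w[j] * y[j] for j in range(n_act))   # maximize weight of selected activities

for j in range(n_act):
    for k in range(n_skill):
        # Skill demand k of activity j must be covered if the activity is selected.
        prob += pulp.lpSum(s[i, k] * x[i, j] for i in range(n_res)) >= d[j, k] * y[j]
for i in range(n_res):
    prob += pulp.lpSum(x[i, j] for j in range(n_act)) <= p[i]   # limited presence
for j in range(n_act):
    prob += pulp.lpSum(x[i, j] for i in range(n_res)) <= l[j]   # assignment limit

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))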

Related

What is more costly for a computer to compute, (2*2) or (0.02*0.02)?

The title says it.
I was wondering if normalization has any effect on computing.
Does normalization affect computation?
It is costlier to compute 0.02 * 0.02 compared to 2 * 2, as 2 * 2 is a single math operation where only one multiplication is involved.
Floating point numbers are stored in a form like 2*10^-2 (scientific notation). Therefore, two operations are involved here:
2 * 2
(-2) + (-2)
Thus, the answer is computed as 4 * 10^(-4) or 0.0004.
Thus, 0.02 * 0.02 is costlier compared to 2 * 2.
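If in doubt, measure it. Here is a quick sketch with Python's timeit module; note that this times CPython's whole interpreter path, which dominates the raw hardware multiply, so take the numbers as indicative only.
import timeit

# Variables prevent the interpreter from constant-folding the expression.
int_time = timeit.timeit("a * b", setup="a = 2; b = 2", number=10_000_000)
flt_time = timeit.timeit("a * b", setup="a = 0.02; b = 0.02", number=10_000_000)

print(f"2 * 2:       {int_time:.3f} s")
print(f"0.02 * 0.02: {flt_time:.3f} s")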

Find longest segment

I have a dataframe (or a series) of measured voltage (in V) indexed by timestamps (in seconds). I want to know the duration of the longest segment (=consecutive values) of voltage greater than a threshold.
Example:
time voltage
0.0 1.2
0.1 1.8
0.2 2.2
0.3 2.3
0.4 1.9
0.5 1.5
0.6 2.1
0.7 2.3
0.8 2.2
0.9 1.9
1.0 1.6
In this example, the threshold is 2.0 V, and the desired answer is 0.3 seconds.
Real data is made of 10k or more samples, and the number of segments of values above the threshold is completely random; there is even the possibility of having only one segment with all values above the threshold.
I think the first step is to identify these segments and separate them, then compute the duration of each.
You can create a True and False sequence with boolean indexing. Then use value_counts and max to get the longest sequence:
s = df.voltage > 2
(~s).cumsum()[s].value_counts().max()
Output
3
IIUC
n=2
s=df.voltage.gt(n)
df.time[s].groupby((~s).cumsum()).diff().sum()
Out[1218]: 0.30000000000000004
And if you need the longest duration, notice that here it runs from 0.6 to 0.8, which should be 0.2 seconds:
df.time[s].groupby((~s).cumsum()).apply(lambda x : x.diff().sum()).max()
Out[1221]: 0.20000000000000007
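Putting both answers together, here is a self-contained sketch using the question's example data (pandas and the column names above are assumed):
import pandas as pd

df = pd.DataFrame({
    "time":    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "voltage": [1.2, 1.8, 2.2, 2.3, 1.9, 1.5, 2.1, 2.3, 2.2, 1.9, 1.6],
})

s = df.voltage > 2        # True where above threshold
groups = (~s).cumsum()    # label that is constant within each run of True values

# Longest segment by number of samples (first answer): 3
print(groups[s].value_counts().max())

# Longest segment by elapsed time (second answer): ~0.2 seconds
print(df.time[s].groupby(groups[s]).apply(lambda t: t.diff().sum()).max())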

Error in Gurobi for optimization?

sw: ampl
ampl: include Availability1.ampl
Gurobi 5.5.0: mipgap = 0.00000000000000000000000001
outlev = 1
Optimize a model with 68298 rows, 1934 columns and 28751 nonzeros
Presolve removed 1934 rows and 68298 columns
Presolve time: 0.02s
Presolve: All rows and columns removed
Iteration Objective Primal Inf. Dual Inf. Time
0 9.9948451e-01 0.000000e+00 0.000000e+00 0s
179 9.9948451e-01 0.000000e+00 0.000000e+00 0s
Solved in 179 iterations and 0.06 seconds
Optimal objective 9.994845101e-01
Gurobi 5.5.0: optimal solution; objective 0.9994845101
179 simplex iterations
Above is my output: no error, but it is not the optimal answer.
What's the matter? I can't get the right answer. Please help me.

Optimal mulps/addps instruction order for 3 SSE units on an Intel Core 2 Duo

It's known that the Intel Core 2 Duo has 3 SSE units. These 3 units allow 3 SSE instructions to run in parallel (1), for example:
rA0 = mulps(rB0, rC0); \
rA1 = mulps(rB1, rC1); > All 3 take 1 cycle to be scheduled (* - see Remarks).
rA2 = mulps(rB2, rC2); /
It's also known that each SSE unit consists of 2 modules: one for addition (subtraction) and one for multiplication (division). This allows mulps/addps instruction sequences to run in parallel (2), for example:
rA0 = mulps(rB0, rC0); \
> Both take 1 cycle to be scheduled on the 2 modules of 1 SSE unit.
rA1 = addps(rB1, rC1); /
The question is: how many cycles does each of the following 2 code snippets take to be scheduled?
Code listing A:
rA0 = mulps(rB0, rC0); \
rA1 = mulps(rB1, rC1); |
rA2 = mulps(rB2, rC2); \ Do all 6 execute in one step? (See paragraph (2))
rA3 = addps(rB3, rC3); /
rA4 = addps(rB4, rC4); |
rA5 = addps(rB5, rC5); /
Code listing B:
rA0 = mulps(rB0, rC0); \
rA1 = addps(rB1, rC1); |
rA2 = mulps(rB2, rC2); \ Do all 6 execute in one step? (See paragraph (1))
rA3 = addps(rB3, rC3); /
rA4 = mulps(rB4, rC4); |
rA5 = addps(rB5, rC5); /
Which way of instruction ordering should I prefer, A or B?
More specifically:
Is it possible to distribute 3 mulps to the 3 SSE multiplication units (1), and at the same time (2) distribute 3 addps to their respective SSE addition units, resulting in a total of 6 instructions per scheduling cycle?
If I run N mulps first and N addps after that, which N is optimal?
Remarks
By 'scheduled' I mean throughput rate.
See Agner Fog's instruction tables for which instructions can run on which execution units. And/or use Intel's code analyzer (IACA) to find throughput bottlenecks (dependency chains or port contention).
As the commenters say, not all the execution ports can handle FP MUL. They can all handle vector-int logicals (AND/OR/XOR), but only one or two ports have a vector shuffle unit, or a vector shift unit, etc. etc.

Arc4random modulo biased

According to this documentation,
arc4random_uniform() is recommended over constructions like arc4random() % upper_bound as it avoids "modulo bias" when the upper bound is not a power of two.
How bad is the bias? For example if I generate random numbers with an upper bound of 6, what's the difference between using arc4random with % and arc4random_uniform()?
arc4random() returns an unsigned 32-bit integer, meaning the values are between
0 and 2^32-1 = 4 294 967 295.
Now, the bias results from the fact that the multiple subintervals created with
modulo do not fit exactly into the random output range.
Let's imagine for clarity a random generator that creates numbers from 0 to 198
inclusive. You want numbers from 0 to 99, therefore you calculate random() % 100,
yielding 0 to 99:
0 % 100 = 0
99 % 100 = 99
100 % 100 = 0
198 % 100 = 98
You see that 99 is the only number which can occur only once, while all
the others can occur twice in a run. That means that the probability of 99
is exactly halved, which is also the worst case for a bias where at least
2 subintervals are involved.
As all powers of two smaller than the range interval fit evenly into the
2^32 interval, the bias disappears in those cases.
The implications are that the smaller the result set with modulo and the higher
the random output range, the smaller the bias. In your example, 6 is your upper
bound (I assume 0 is the lower bound), so you use % 7, with the result that 0-3
each occur 613 566 757 times while 4-6 each occur 613 566 756 times.
So 0-3 is 613 566 757 / 613 566 756 = 1.0000000016298 times more probable
than 4-6.
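You can verify these counts in Python: over the full 32-bit range, each residue of % m appears floor(2^32 / m) times, plus once more for the first 2^32 mod m residues.
# Count how often each residue r appears among x % m for x in [0, 2**32).
m = 7
q, rem = divmod(2**32, m)
counts = {r: q + (1 if r < rem else 0) for r in range(m)}
print(counts)        # residues 0-3 occur 613566757 times, 4-6 occur 613566756 times
print((q + 1) / q)   # relative bias, about 1.0000000016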
While it seems easy to dismiss, some experiments (especially Monte Carlo
experiments) were flawed exactly because these seemingly incredibly small
differences were pretty important.
Even worse is the bias if the desired output range is bigger than
the random generator's range. Please read the Fisher-Yates shuffle entry,
because many poker sites learned the hard way that ordinary linear
congruential random generators and bad shuffling algorithms resulted in
impossible or overly probable decks or, worse, predictable decks.
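For completeness, arc4random_uniform() avoids the bias with rejection sampling: raw draws that fall into the incomplete final subinterval are thrown away. A minimal Python illustration of the idea follows (not the actual libc implementation):
import random

def uniform_unbiased(upper_bound, rand32=lambda: random.getrandbits(32)):
    """Return an unbiased value in [0, upper_bound) from a 32-bit generator
    by rejecting draws from the incomplete final subinterval."""
    # Largest multiple of upper_bound that fits into 2**32.
    limit = (2**32 // upper_bound) * upper_bound
    while True:
        x = rand32()
        if x < limit:             # keep only draws inside complete subintervals
            return x % upper_bound

print([uniform_unbiased(6) for _ in range(10)])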