consecutive days constraint in linear programming - optimization

For a work shift optimization problem, I've defined a binary variable in PuLP as follows:
var = pulp.LpVariable.dicts('VAR', (range(D), range(N), range(T)), 0, 1, 'Binary')
where
D = number of days in each schedule we create (28, i.e. 4 weeks)
N = number of workers
T = number of work shift types (6)
For the 5th and 6th work shift types (indices 4 and 5), I need to add a constraint that any worker who works these shifts must do so for seven consecutive days, and not just any seven days but the seven days starting from Monday (i.e. a full week). I've tried defining the constraint as follows, but the problem becomes infeasible when I add this constraint and try to solve it (it worked before without it).
I know this constraint (along with the others from before) should theoretically be feasible because we manually schedule work shifts with the same set of constraints. Is there anything wrong with the way I've coded the constraint?
## loop over each worker
for j in range(N):
    ## loop over every Monday in the 28 days
    for i in range(0, D, 7):
        c = None
        ## only the 5th and 6th work shift types (indices 4 and 5)
        for k in range(4, T):
            c += (var[i][j][k] + var[i+1][j][k] + var[i+2][j][k] + var[i+3][j][k]
                  + var[i+4][j][k] + var[i+5][j][k] + var[i+6][j][k])
        problem += c == 7

If I understand correctly, your constraint requires every worker to work shift type 4 or 5 on all seven days of every week. This is because of c == 7, i.e. 7 of the binaries in c must be set to 1. This does not allow any worker to work only shifts 0 through 3 in a week, right?
You need to change the constraint so that c == 7 is only enforced if the worker works any shift in that range. A very simple way to do that would be something like
v = list()
for k in range(4, T):
    v.extend([var[i][j][k], var[i+1][j][k], var[i+2][j][k], var[i+3][j][k],
              var[i+4][j][k], var[i+5][j][k], var[i+6][j][k]])
c = sum(v)
problem += c <= 7          # we can pick at most 7 variables from v
for x in v:
    problem += 7 * x <= c  # if any variable in v is picked, then we must pick 7 of them
This is by no means the best way to model that (indicator variables would be much better), but it should give you an idea what to do.

Just to offer an alternative approach, assuming (as I read it) that in any given week a worker either works some combination of the shifts in [0:3] across the seven days, or works one of the shifts in [4:5] every day. We can model this by defining a new binary variable Y[w][n][t] which is 1 if in week w worker n does restricted shift t, and 0 otherwise. We then relate this variable to the existing variable X by adding constraints so that the values X can take depend on the values of Y.
# Define the sets of shifts
non_restricted_shifts = [0, 1, 2, 3]
restricted_shifts = [4, 5]

# Define a binary variable Y, 1 if in week w worker n works restricted shift t
Y = LpVariable.dicts('Y', (range(round(D/7)), range(N), restricted_shifts), cat=LpBinary)

# If sum(Y[week][n][:]) == 1, the total number of non-restricted shifts
# for that week and worker n must be 0
for week in range(round(D/7)):
    for n in range(N):
        prob += lpSum(X[d][n][t] for d in range(week*7, week*7 + 7)
                      for t in non_restricted_shifts) <= 1000*(1 - lpSum(Y[week][n][t] for t in restricted_shifts))

# If worker n works restricted shift t on all 7 days of week w, then Y[week][n][t] == 1,
# otherwise it is 0
for week in range(round(D/7)):
    for n in range(N):
        for t in restricted_shifts:
            prob += lpSum(X[d][n][t] for d in range(week*7, week*7 + 7)) <= 7*Y[week][n][t]
            prob += lpSum(X[d][n][t] for d in range(week*7, week*7 + 7)) >= 7*Y[week][n][t]
Some example output (D=28, N=5, T=6):
/ M T W T F S S / M T W T F S S / M T W T F S S / M T W T F S S
WORKER 0
Shifts: / 2 3 1 3 3 2 2 / 1 0 2 3 2 2 0 / 3 1 2 2 3 1 1 / 2 3 0 3 3 0 3
WORKER 1
Shifts: / 3 1 2 3 1 1 2 / 3 3 2 3 3 3 3 / 4 4 4 4 4 4 4 / 1 3 2 2 3 2 1
WORKER 2
Shifts: / 1 2 3 1 3 1 1 / 3 3 2 2 3 2 3 / 3 2 3 0 3 1 0 / 4 4 4 4 4 4 4
WORKER 3
Shifts: / 2 2 3 2 1 2 3 / 5 5 5 5 5 5 5 / 3 1 3 1 0 3 1 / 2 2 2 2 3 0 3
WORKER 4
Shifts: / 5 5 5 5 5 5 5 / 3 3 1 0 2 3 3 / 0 3 3 3 3 0 2 / 3 3 3 2 3 2 3


pandas: idxmax for k-th largest

Having a df of probability distributions, I get the most probable column for each row with df.idxmax(axis=1) like this:
df['1k-th'] = df.idxmax(axis=1)
and get the following result:
0 1 2 3 4 5 6 1k-th
0 0.114869 0.020708 0.025587 0.028741 0.031257 0.031619 0.747219 6
1 0.020206 0.012710 0.010341 0.012196 0.812495 0.113863 0.018190 4
2 0.023585 0.735475 0.091795 0.021683 0.027581 0.054217 0.045664 1
3 0.009834 0.009175 0.013165 0.016014 0.015507 0.899115 0.037190 5
4 0.023357 0.736059 0.088721 0.021626 0.027341 0.056289 0.046607 1
the question is how to get the 2nd, 3rd, etc. most probable columns, so that I get the following result:
0 1 2 3 4 5 6 1k-th 2-th
0 0.114869 0.020708 0.025587 0.028741 0.031257 0.031619 0.747219 6 0
1 0.020206 0.012710 0.010341 0.012196 0.812495 0.113863 0.018190 4 3
2 0.023585 0.735475 0.091795 0.021683 0.027581 0.054217 0.045664 1 4
3 0.009834 0.009175 0.013165 0.016014 0.015507 0.899115 0.037190 5 4
4 0.023357 0.736059 0.088721 0.021626 0.027341 0.056289 0.046607 1 2
Thank you!
My own solution is not the prettiest, but it does its job and runs fast:
for i in range(7):
    p[f'{i}k'] = p[[0,1,2,3,4,5,6]].idxmax(axis=1)
    p[f'{i}k_v'] = p[[0,1,2,3,4,5,6]].max(axis=1)
    for x in range(7):
        p[x] = np.where(p[x] == p[f'{i}k_v'], np.nan, p[x])
Each iteration of the loop:
finds the largest remaining value and its column index,
drops the found value (sets it to NaN),
so the next iteration finds the 2nd largest value and drops it,
and so on.
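As a side note, a single np.argsort can rank all columns at once, without the destructive NaN passes. A minimal sketch, assuming integer probability columns and reusing the question's '1k-th'-style column naming:

```python
import numpy as np
import pandas as pd

# Small example frame of per-row probabilities
p = pd.DataFrame([[0.1, 0.7, 0.2],
                  [0.5, 0.2, 0.3]])

cols = p.columns.to_numpy()
order = np.argsort(-p.to_numpy(), axis=1)  # column indices, largest first
for k in range(len(cols)):
    p[f'{k + 1}k-th'] = cols[order[:, k]]  # k-th most probable column per row

print(p['1k-th'].tolist())  # -> [1, 0]
```

This leaves the original probability columns intact, which the NaN-overwriting loop does not.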

Extract a summary of data using groupby and optimise the inspector utilisation - pandas and other optimisation packages in Python

I have accident record data across several places, as shown below:
Inspector_ID Place Date
0 1 A 1-09-2019
1 2 A 1-09-2019
2 1 A 1-09-2019
3 1 B 1-09-2019
4 3 A 1-09-2019
5 3 A 1-09-2019
6 1 A 2-09-2019
7 3 A 2-09-2019
8 2 B 2-09-2019
9 3 A 3-09-2019
10 1 C 3-09-2019
11 1 D 3-09-2019
12 1 A 3-09-2019
13 1 E 3-09-2019
14 1 A 3-09-2019
15 1 A 3-09-2019
16 3 A 4-09-2019
17 3 B 5-09-2019
18 4 B 5-09-2019
19 3 A 5-09-2019
20 3 C 5-09-2019
21 3 A 5-09-2019
22 3 D 5-09-2019
23 3 C 5-09-2019
From the above data, I want to optimise the inspector utilisation.
For that I tried the code below to get the inputs for the optimisation objective:
c = df.groupby('Place').Inspector_ID.agg(
    Total_Number_of_accidents='count',
    Number_unique_Inspector='nunique',
    Unique_Inspector='unique').reset_index().sort_values(
        ['Total_Number_of_accidents'], ascending=False)
Below is the output of above code
Place Total_Number_of_accidents Number_unique_Inspector Unique_Inspector
0 A 14 3 [1, 2, 3]
1 B 4 4 [1, 2, 3, 4]
2 C 3 2 [1, 3]
3 D 2 2 [1, 3]
4 E 1 1 [1]
And then
f = df.groupby('Inspector_ID').Place.agg(
    Total_Number_of_accidents='count',
    Number_unique_Place='nunique',
    Unique_Place='unique').reset_index().sort_values(
        ['Total_Number_of_accidents'], ascending=False)
Output:
Inspector_ID Total_Number_of_accidents Number_unique_Place Unique_Place
2 3 11 4 [A, B, C, D]
0 1 10 5 [A, B, C, D, E]
1 2 2 2 [A, B]
3 4 1 1 [B]
From the above we have 4 inspectors, 5 places and 24 accidents. I want to optimise the allocation of inspectors based on this data.
Condition 1 - there should be at least 1 inspector in each place.
Condition 2 - every inspector should be assigned at least one place.
Condition 3 - identify the places that are over-utilised relative to their number of accidents (e.g. place B has only 4 accidents but four inspectors, so some inspectors from place B can be reassigned to place A; the next question is which inspectors, and how many?).
Is it possible to do this in Python? If so, with which algorithm, and how?
It is an https://en.wikipedia.org/wiki/Assignment_problem. Maybe it should be reduced to a max-flow problem, but with optimisation of equality in flow (using a graph package like NetworkX).
How to create the di-graph:
- add a vertex s, the source of the flow (of accidents), and a sink vertex t;
- let the S-set be all places that have accidents, and X_s the set of all edges (s, x) where x is in S; set the capacity of each edge in X_s from the column Total_Number_of_accidents;
- analogously, let the T-set be the inspectors and X_t the set of edges into t; set the capacity of each edge in X_t to the maximum number of accidents an inspector can process (we will get back to this later);
- now make edges (x, y) from S to T and set their capacity to a high number (e.g. 1e6); call this set X_c. The flow on these edges tells us how much load inspector y takes from place x.
Now solve the max-flow problem. When some edge in X_t carries too much flow, you can decrease its capacity (to reduce the load on that particular inspector); when some edge in X_c carries very little flow, you can simply remove it, to reduce the complexity of the work organisation. After a few iterations you should have the desired solution.
You could code some super-clever algorithm, but if it's a real-life problem you want to avoid situations like assigning one inspector to all places to process 0.38234 accidents at each place...
Also, there should probably be some constraints on how many accidents an inspector can process in a given time, but you didn't mention any.
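A minimal sketch of the graph construction with NetworkX, using the accident counts from the aggregated table above; the per-inspector cap of 8 is an assumed starting value meant to be tuned between iterations:

```python
import networkx as nx

accidents = {'A': 14, 'B': 4, 'C': 3, 'D': 2, 'E': 1}  # Total_Number_of_accidents
inspectors = [1, 2, 3, 4]
cap = 8  # assumed max accidents per inspector; adjust and re-solve

G = nx.DiGraph()
for place, n in accidents.items():
    G.add_edge('s', ('place', place), capacity=n)  # X_s: accidents arise at places
    for ins in inspectors:
        # X_c: any place-inspector assignment allowed, effectively unbounded
        G.add_edge(('place', place), ('insp', ins), capacity=10**6)
for ins in inspectors:
    G.add_edge(('insp', ins), 't', capacity=cap)   # X_t: inspector workload cap

flow_value, flow = nx.maximum_flow(G, 's', 't')
print(flow_value)  # 24: every accident can be covered under this cap
```

The dict `flow[('place', p)][('insp', i)]` then tells you how many accidents at place p are handled by inspector i, which is the quantity to inspect and rebalance between iterations.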

Compute element overlap based on another column, pandas

If I have a dataframe of the form:
tag element_id
1 12
1 13
1 15
2 12
2 13
2 19
3 12
3 15
3 22
how can I compute the overlaps of the tags in terms of the element_id? The result, I guess, should be an overlap matrix of the form:
1 2 3
1 X 2 2
2 2 X 1
3 2 1 X
where I put X on the diagonal, since the overlap of a tag with itself is not relevant, and where the numbers in the matrix represent the total number of element_ids that the two tags share.
My attempts:
You can try and use a for loop like :
element_lst = []
for item in df.itertuples():
    element_lst += [item.element_id]
    element_tag = item.tag
# then intersect the element lists row by row.
# This is extremely costly for large datasets
The second thing I was thinking about was to use df.groupby('tag') and try to somehow intersect on element_id, but it is not clear to me how I can do that with grouped data.
merge + crosstab
# Find element overlap, remove same tag matches
res = df.merge(df, on='element_id').query('tag_x != tag_y')
pd.crosstab(res.tag_x, res.tag_y)
Output:
tag_y 1 2 3
tag_x
1 0 2 2
2 2 0 1
3 2 1 0
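For completeness, here is the merge + crosstab approach made self-contained on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'tag':        [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'element_id': [12, 13, 15, 12, 13, 19, 12, 15, 22]})

# The self-join on element_id pairs up every two tags sharing an element;
# crosstab then counts the shared elements per tag pair
res = df.merge(df, on='element_id').query('tag_x != tag_y')
overlap = pd.crosstab(res.tag_x, res.tag_y)
print(overlap)
```

The diagonal comes out as 0 rather than X because the query removes same-tag matches before counting.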

Step through a single permutation order based on a number provided

I have 5 items that can be placed in any unique order, I want to store the values (numbers) of a single unique order to a variable, one by one. For example:
User input: 7
Then i_Int = 7
should give me
v_Var = 1
wait 1 sec
v_Var = 3
wait 1 sec
v_Var = 2
wait 1 sec
v_Var = 4
wait 1 sec
v_Var = 5
The data below lists all possible permutations of 5 items, where the first column is the permutation #; to keep things simple, I will not have this table available.
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
4 1 2 4 5 3
5 1 2 5 3 4
6 1 2 5 4 3
7 1 3 2 4 5
8 1 3 2 5 4
9 1 3 4 2 5
10 1 3 4 5 2
...
111 5 3 2 1 4
112 5 3 2 4 1
113 5 3 4 1 2
114 5 3 4 2 1
115 5 4 1 2 3
116 5 4 1 3 2
117 5 4 2 1 3
118 5 4 2 3 1
119 5 4 3 1 2
120 5 4 3 2 1
Here is a function that returns the permutation of 1,...,n of rank i:
Function Unrank(ByVal n As Long, ByVal rank As Long, Optional lb As Long = 1) As Variant
    Dim Permutation As Variant
    Dim Items As Variant
    ReDim Permutation(lb To lb + n - 1)
    ReDim Items(0 To n - 1)
    Dim i As Long, j As Long, k As Long, q As Long
    Dim fact As Long

    For i = 0 To n - 1
        Items(i) = i + 1
    Next i
    rank = rank - 1
    j = lb
    For i = n - 1 To 1 Step -1
        fact = Application.WorksheetFunction.fact(i)
        q = Int(rank / fact)
        Permutation(j) = Items(q)
        'slide items above q 1 unit to the left
        For k = q + 1 To i
            Items(k - 1) = Items(k)
        Next k
        j = j + 1
        rank = rank Mod fact
    Next i
    'place the last item:
    Permutation(lb + n - 1) = Items(0)
    Unrank = Permutation
End Function
As a default, it returns the result as a 1-based array. To make it 0-based, use a call like Unrank(5,7,0). As a test:
Sub test()
    'fills A1:A120 with the permutations of 1,2,3,4,5
    Dim i As Long
    For i = 1 To 120
        Cells(i, 1).Value = Join(Unrank(5, i), " ")
    Next i
End Sub
13! is too large to hold in a Long variable, so the code throws an untrapped error when n=14. The algorithm that I use depends on the ability to do modular arithmetic with the relevant factorials, so there is no easy fix in VBA. Note that you could easily tweak the code so that you pass it an array of items to permute rather than always permuting 1-n. The algorithm destroys the array Items, so such a tweak would involve creating a 0-based (so that the modular arithmetic works out) copy of the passed array.
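For readers outside VBA, the same unranking idea (walking down the factorial number system) can be sketched in Python, where arbitrary-precision integers make the n = 14 overflow issue disappear; the function name and shape here are my own:

```python
from math import factorial

def unrank(n, rank):
    """Return the permutation of 1..n with the given 1-based lexicographic rank."""
    items = list(range(1, n + 1))
    rank -= 1
    perm = []
    for i in range(n - 1, 0, -1):
        q, rank = divmod(rank, factorial(i))  # which remaining item leads this block
        perm.append(items.pop(q))             # pop slides the later items left
    perm.append(items[0])                     # place the last remaining item
    return perm

print(unrank(5, 7))  # -> [1, 3, 2, 4, 5], matching permutation #7 in the table above
```

list.pop(q) plays the role of the inner "slide items left" loop in the VBA version.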

Write a number N as sum of K prime numbers

Is there any condition for writing a number N as a sum of K prime numbers (the primes not necessarily distinct)?
Example: if N=6 and K=2 then we can write 6=3+3, whereas if N=11 and K=2 then we cannot represent 11 as a sum of two primes.
My approach - I deduced that if K>=N then we cannot represent N as a sum of K primes. If K=1 then by primality testing we can check whether N is a prime number. Also, by Goldbach's conjecture, every even N (except 2) can be represented as a sum of two primes.
But the main problem is that I'm not able to decide it for K>=3.
1. First, list all the prime numbers less than or equal to N.
2. Brute force with backtracking.
Example 1:
N = 8, K = 2
2 2
2 3
2 5
2 7
3 3 (don't consider 3 2 again)
3 5
Done!
Example 2:
N = 12, K = 4
2 2 2 2
2 2 2 3
2 2 2 5
2 2 2 7
2 2 3 3 (don't check 2 2 3 2 again)
2 2 3 5
Done!
Example 3:
N = 11, K = 3
2 2 2
2 2 3
2 2 5
2 2 7
2 2 11 (sum > 11, abandon this branch)
2 3 3 (don't check 2 3 2 again)
2 3 5
2 3 7 (sum > 11, don't check 2 3 11)
3 3 3 (don't check the 3 2 ... series again)
3 3 5
Done!
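The enumeration above can be sketched as a small backtracking search; keeping the chosen primes non-decreasing is exactly what avoids re-checking e.g. 3 2 after 2 3. The function names are my own:

```python
def is_prime(m):
    """Trial-division primality test, fine for small N."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def k_prime_sum(n, k, smallest=2):
    """One representation of n as a non-decreasing sum of k primes, or None."""
    if k == 1:
        return [n] if n >= smallest and is_prime(n) else None
    p = smallest
    while p * k <= n:              # all k remaining primes are >= p, so prune here
        if is_prime(p):
            rest = k_prime_sum(n - p, k - 1, p)
            if rest is not None:
                return [p] + rest
        p += 1
    return None

print(k_prime_sum(11, 3))  # -> [2, 2, 7]
print(k_prime_sum(11, 2))  # -> None, matching the K=2 example in the question
```

The `p * k <= n` bound is the "sum > 11, abandon this branch" pruning from Example 3.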