Optimization between 2 dependent variables - optimization

I couldn't find a solution nowhere, so I'm asking here. Hope someone knows how to guide me thru.
So, I'm not quite sure if this is a optimization problem (if anyone know what kind of problem is this, let me know), but I need to find the quantity of clients that each attendant has to have so that each has the same amount of orders. I don't know if there is a function or a regression that could be made of this.
Column A has the clients name. Column B has the "difficulty" that each client is to assist - that is, "1" is normal difficulty, "2" is double the normal difficulty and so on - meaning that the orders of this clients will be multiplied by his difficulty. Column C has a spec that only attendant Y can assist. Column D has the quantity of orders that each client requested. And finally column E is the account attendant.
CLIENT
ATTENTION
SPEC
ORDERS
ATTENDANT
a1
3
0
6
y
a2
3
0
7
x
a3
1
0
1
y
a4
1
0
9
y
a5
2
0
6
y
a6
1
0
7
y
a7
3
0
2
y
a8
3
0
9
x
a9
3
0
9
y
a10
2
1
8
y
a11
2
0
8
x
a12
2
0
9
y
a13
1
1
2
y
a14
2
0
4
x
a15
3
0
10
y
a16
2
0
9
x
a17
2
0
8
y
a18
1
1
5
y
a19
3
0
8
x
a20
1
1
3
y
a21
2
0
10
x
a22
2
0
6
x
Summary tables:
ATTENDANT
TOTAL ORDERS
x
61
y
84
ATTENDANT
TOTAL CLIENTS
x
8
y
14
ATTENDANT
TOTAL ORDERS
x
61
y
84
y (spec 0)
66
y (spec 1)
18

Here is a linear program that solves this. Fun problem! This uses the pyomo environment and the separately installed glpk solver. Result: It's a tie! X and Y don't need to haggle! ;)
Code:
# client assignment
import csv
from collections import namedtuple
import pyomo.environ as pyo
data_file = 'data.csv'
Record = namedtuple('Record', ['attention', 'spec', 'orders'])
records = {}
with open(data_file, 'r') as src:
src.readline() # burn header row
reader = csv.reader(src, skipinitialspace=True)
for line in reader:
record = Record(int(line[1]), int(line[2]), int(line[3]))
records[line[0]] = record
#print(records)
# set up the ILP
m = pyo.ConcreteModel()
# SETS
m.C = pyo.Set(initialize=records.keys()) # the set of clients
m.A = pyo.Set(initialize=['X', 'Y']) # the set of attendants
# PARAMETERS
m.attention = pyo.Param(m.C, initialize={c: r.attention for (c, r) in records.items()})
m.y_required = pyo.Param(m.C, initialize={c: r.spec for (c, r) in records.items()})
m.orders = pyo.Param(m.C, initialize={c: r.orders for (c, r) in records.items()})
# VARIABLES
m.assign = pyo.Var(m.A, m.C, domain=pyo.Binary) # 1: assign attendant a to client c
m.work_delta = pyo.Var(domain=pyo.NonNegativeReals) # the abs(work difference)
# OBJECTIVE
# minimize the work delta...
m.obj = pyo.Objective(expr=m.work_delta)
# CONSTRAINTS
# each client must be serviced once and only once
def service(m, c):
return sum(m.assign[a, c] for a in m.A) == 1
m.C1 = pyo.Constraint(m.C, rule=service)
# y-specific customers must be serviced by attendant y, and optionally if not reqd.
def y_serves(m, c):
return m.assign['Y', c] >= m.y_required[c]
m.C2 = pyo.Constraint(m.C, rule=y_serves)
# some convenience expressions to capture work...
m.y_work = sum(m.attention[c] * m.orders[c] * m.assign['Y', c] for c in m.C)
m.x_work = sum(m.attention[c] * m.orders[c] * m.assign['X', c] for c in m.C)
# capture the ABS(y_work - x_work) with 2 constraints
m.C3a = pyo.Constraint(expr=m.work_delta >= m.y_work - m.x_work)
m.C3b = pyo.Constraint(expr=m.work_delta >= m.x_work - m.y_work)
# check the model
#m.pprint()
# SOLVE
solver = pyo.SolverFactory('glpk')
res = solver.solve(m)
# ensure the result is optimal
status = res.Solver()['Termination condition'].value
assert(status == 'optimal', f'error occurred, status: {status}. Check model!')
print(res)
print(f'x work: {pyo.value(m.x_work)} units')
print(f'y work: {pyo.value(m.y_work)} units')
# list assignments
for c in m.C:
for a in m.A:
if pyo.value(m.assign[a, c]):
print(f'Assign {a} to customer {c}')
Output:
Problem:
- Name: unknown
Lower bound: 0.0
Upper bound: 0.0
Number of objectives: 1
Number of constraints: 47
Number of variables: 46
Number of nonzeros: 157
Sense: minimize
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 53
Number of created subproblems: 53
Error rc: 0
Time: 0.008551836013793945
Solution:
- number of solutions: 0
number of solutions displayed: 0
x work: 158.0 units
y work: 158.0 units
Assign Y to customer a1
Assign Y to customer a2
Assign X to customer a3
Assign Y to customer a4
Assign Y to customer a5
Assign Y to customer a6
Assign Y to customer a7
Assign Y to customer a8
Assign X to customer a9
Assign Y to customer a10
Assign X to customer a11
Assign X to customer a12
Assign Y to customer a13
Assign X to customer a14
Assign X to customer a15
Assign X to customer a16
Assign X to customer a17
Assign Y to customer a18
Assign X to customer a19
Assign Y to customer a20
Assign Y to customer a21
Assign Y to customer a22
[Finished in 588ms]
In Excel Solver
Note: in the solver options, be sure to de-select "ignore integer constraints"
This seems to work OK. The green shaded areas are "locked" to Y and not in the solver's control.
I'm always suspicious of the solver in Excel, so check everything!

Related

Trying to mark first encounter in every group pandas

I have a df with 5 columns:
What i'm trying to do is mark value of first customer interaction after talking to human in every specific group.
Hopefully the outcome would be like this:
What I have tried is shifting type column to put previous row in front of type to check if its customer and prev row is human. However, I can't figure out a grouping option to get min index for each group for each occurrence.
This works:
k = pd.DataFrame(df.groupby('group').apply(lambda g: (g['type'].eq('customer') & g['type'].shift(1).eq('human')).pipe(lambda x: [x.idxmax(), x[::-1].idxmax()])).tolist())
df['First'] = ''
df['Last'] = ''
df.loc[k[1], 'First'] = 'F'
df.loc[k[1], 'Last'] = 'L'
Output:
>>> df
group type First Last
0 x bot
1 x customer
2 x bot
3 x customer
4 x human
5 x customer F
6 x human
7 x customer L
8 y bot
9 y customer
10 y bot
11 y customer
12 y human
13 y customer F
14 y human
15 y customer L
16 z bot
17 z customer
18 z bot
19 z customer
20 z human
21 z customer F
22 z human
23 z customer L
24 z customer
25 z customer

Guidance needed | Optimization challenge. Would love to get some inputs 🙏🏻

I need some guidance on how to approach for this problem. I've simplified a real life example and if you can help me crack this by giving me some guidance, it'll be awesome.
I've been looking at public optimization algorithms here (https://www.cvxpy.org/) but I'm a noob and I'm not able to figure out which algorithm would help me (or if I really need it).
Problem:
x1 to x4 are items with certain properties (a,b,c,y,z)
I have certain needs:
Parameter My Needs
a 150
b 800
c 80
My goal is get all optimal coefficient sets for x1 to x4 (can be
fractions) so as to get as much of a, b and c as possible to satisfy
needs from the smallest possible y.
These conditions must always be met:
1)Individual values of z should stay within threshold (between maximum and minimum for x1, x2, x3 and x4)
2)And Total y should be maintained within limits (y <=1000 & y>=2000)
To illustrate an example:
x1
Each x1 has the following properties
a 20 Minimum z 10 Maximum z 50
b 200
c 0
y 300
z 20
x2
Each x2 has the following properties
a 30 Minimum z 60 Maximum z 160
b 5
c 20
y 50
z 40
x3
Each x3 has the following properties
a 20 Minimum z 100 Maximum z 200
b 200
c 15
y 200
z 40
x4
Each x4 has the following properties
a 5 Minimum z 100 Maximum z 300
b 30
c 20
y 500
z 200
One possible arrangement can be (not the optimal solution as I'm trying to keep y as low as possible but above 1000 but to illustrate output)
2x1+2x2+1x3+0.5x4
In this instance:
Coeff x1 2
Coeff x2 2
Coeff x3 3
Coeff x4 0.5
This set of coefficients yields
Optimal?
total y 1550 Yes
total a 162.5 Yes
total b 1025 Yes
total c 95 Yes
z of x1 40 Yes
z of x2 80 Yes
z of x3 120 Yes
z of x4 100 Yes
Lowest y? No
Can anyone help me out?
Thanks!

how to apply a function that uses multiple columns as input in pandas?

I have a function relative_humidity(temperature, humidity_index) which takes two variables.
I also have a DataFrame with one column being temperature and the other humidity_index, and I am trying to use this function to create a new humidity column which is calculated using these rows.
I have tried using the df.apply() function but it hasn't worked for me since I am trying to use more than one column. I have also tried looping through every row and applying the function to each row, but this appears too slow. Any help appreciated.
EDIT: my function looks like this:
def relative_humidity_calculator(T, HI):
a = c_6 + c_8*T + c_9*T**2
b = c_3 + c_4*T + c_7*T**2
c = c_1 + c_2*T + c_5*T**2 -HI
solutions = []
#adding both solutions of quadratic to list
if b**2-4*a*c>=0:
solutions.append((-b+np.sqrt(b**2-4*a*c))/(2*a))
solutions.append((-b-np.sqrt(b**2-4*a*c))/(2*a))
#solution is the correct one if it is between 0 and 100
if solutions[0]>0 and solutions[0]<100:
return solutions[0]
else:
return solutions[1]
else:
return print('imaginary roots', T, HI, a, b, c)
Based on your updated question, you can do this without comprising the function:
# sample data:
c1,c2,c3,c4,c5,c6,c7,c8,c9 = range(9)
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,100,(10,2)), columns=['T','HI'])
# shorthand for Temp and Humidity-Index
T = df['T']
HI = df['HI']
# series arithmetic operations are allowed
a = c6 + c8*T + c9*T**2
b = c3 + c4*T + c7*T**2
c = c1 + c2*T + c5*T**2 - HI
# discriminant too
deltas = b**2-4*a*c
delta_roots = np.sqrt(b**2 - 4*a*c, where=deltas>0)
# two solutions of quadratic
s0 = (- b + delta_roots)/(2*a)
s1 = (- b - delta_roots)/(2*a)
df['rel_hum'] = np.select(((s0>0) & (s0<100), # condition on first solution
deltas>=0), # quadratic has solutions
(s0, s1), np.nan)
Output:
T HI rel_hum
0 37 12 NaN
1 72 9 0.129917
2 75 5 0.028714
3 79 64 -0.629721
4 16 1 NaN
5 76 71 -0.742304
6 6 25 NaN
7 50 20 NaN
8 18 84 NaN
9 11 28 NaN

Efficiently assigning values to multidimensional array based on indices list on one dimension

I have a matrix M of size [S1, S2, S3].
I have another matrix K that serves as the indices in the first dimension that I want to assign, with size [1, S2, S3].
And V is a [1, S2, S3] matrix which contains the values to be assigned correspondingly.
With for loops, this is how I did it:
for x2 = 1:S2
for x3 = 1:S3
M(K(1,x2,x3), x2, x3) = V(1, x2, x3)
endfor % x3
endfor % x2
Is there a more efficient way to do this?
Visualization for 2D case:
M =
1 4 7 10
2 5 8 11
3 6 9 12
K =
2 1 3 2
V =
50 80 70 60
Desired =
1 80 7 10
50 5 8 60
3 6 70 12
Test case:
M = reshape(1:24, [3,4,2])
K = reshape([2,1,3,2,3,3,1,2], [1,4,2])
V = reshape(10:10:80, [1,4,2])
s = size(M)
M = assign_values(M, K, V)
M =
ans(:,:,1) =
1 20 7 10
10 5 8 40
3 6 30 12
ans(:,:,2) =
13 16 70 22
14 17 20 80
50 60 21 24
I'm looking for an efficient way to implement assign_values there.
Running Gelliant's answer somehow gives me this:
key = sub2ind(s, K, [1:s(2)])
error: sub2ind: all subscripts must be of the same size
You can use sub2ind to use your individual subscripts to linear indices. These can then be used to replace them with the values in V.
M = [1 4 7 10 ;...
2 5 8 11 ;...
3 6 9 12];
s=size(M);
K = [2 1 3 2];
K = sub2ind(s,K,[1:s(2)])
V = [50 80 70 60];
M(K)=V;
You don't need reshape and M=M(:) for it to work in Matlab.
I found that this works:
K = K(:)'+(S1*(0:numel(K)-1));
M(K) = V;
Perhaps this is supposed to work the same way as Gelliant's answer, but I couldn't make his answer work, somehow =/

How to subtract one dataframe from another?

First, let me set the stage.
I start with a pandas dataframe klmn, that looks like this:
In [15]: klmn
Out[15]:
K L M N
0 0 a -1.374201 35
1 0 b 1.415697 29
2 0 a 0.233841 18
3 0 b 1.550599 30
4 0 a -0.178370 63
5 0 b -1.235956 42
6 0 a 0.088046 2
7 0 b 0.074238 84
8 1 a 0.469924 44
9 1 b 1.231064 68
10 2 a -0.979462 73
11 2 b 0.322454 97
Next I split klmn into two dataframes, klmn0 and klmn1, according to the value in the 'K' column:
In [16]: k0 = klmn.groupby(klmn['K'] == 0)
In [17]: klmn0, klmn1 = [klmn.ix[k0.indices[tf]] for tf in (True, False)]
In [18]: klmn0, klmn1
Out[18]:
( K L M N
0 0 a -1.374201 35
1 0 b 1.415697 29
2 0 a 0.233841 18
3 0 b 1.550599 30
4 0 a -0.178370 63
5 0 b -1.235956 42
6 0 a 0.088046 2
7 0 b 0.074238 84,
K L M N
8 1 a 0.469924 44
9 1 b 1.231064 68
10 2 a -0.979462 73
11 2 b 0.322454 97)
Finally, I compute the mean of the M column in klmn0, grouped by the value in the L column:
In [19]: m0 = klmn0.groupby('L')['M'].mean(); m0
Out[19]:
L
a -0.307671
b 0.451144
Name: M
Now, my question is, how can I subtract m0 from the M column of the klmn1 sub-dataframe, respecting the value in the L column? (By this I mean that m0['a'] gets subtracted from the M column of each row in klmn1 that has 'a' in the L column, and likewise for m0['b'].)
One could imagine doing this in a way that replaces the the values in the M column of klmn1 with the new values (after subtracting the value from m0). Alternatively, one could imagine doing this in a way that leaves klmn1 unchanged, and instead produces a new dataframe klmn11 with an updated M column. I'm interested in both approaches.
If you reset the index of your klmn1 dataframe to be that of the column L, then your dataframe will automatically align the indices with any series you subtract from it:
In [1]: klmn1.set_index('L')['M'] - m0
Out[1]:
L
a 0.777595
a -0.671791
b 0.779920
b -0.128690
Name: M
Option #1:
df1.subtract(df2, fill_value=0)
Option #2:
df1.subtract(df2, fill_value=None)