I am a newbie to Python and pandas. I have a huge data set, and instead of applying a function row by row I want to apply it to a batch of rows and have each result associated back to its corresponding row.
Example:
ID Values
a 2
b 3
c 4
d 5
e 6
f 7
def function(x):
    # make a call to an API and return the values related to x
    return response

df['squared_values'] = df['Values'].apply(function)
The code above applies the function row by row, which is time consuming.
I need a way to do batch operations on rows.
Example:
batch = 3
df['squared_values'] = df['Values'].apply(lambda batch: function(batch))
After the first pass the values should be:
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5
e 6
f 7
After the second pass:
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5 25
e 6 36
f 7 49
Is this operation really too slow?
df['squared_values'] = df['Values'] ** 2
You can always select a range of rows by position and assign only to those:
df.loc[df.index[0:4], 'squared_values'] = df['Values'].iloc[0:4] ** 2
But I can't imagine this being quicker.
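If the real function is an API call that accepts several values at once, one way to batch it is to walk the frame in fixed-size slices and write each batch of results back by position. This is only a minimal sketch: `process_batch` is a hypothetical stand-in for the API call (here it just squares the values), and the batch size of 3 is taken from the question.

```python
import numpy as np
import pandas as pd


def process_batch(values):
    """Hypothetical stand-in for one API call that handles a whole batch."""
    return [v ** 2 for v in values]


df = pd.DataFrame({"ID": list("abcdef"), "Values": [2, 3, 4, 5, 6, 7]})

batch_size = 3
df["squared_values"] = np.nan  # rows not processed yet stay empty
col = df.columns.get_loc("squared_values")

for start in range(0, len(df), batch_size):
    stop = start + batch_size
    batch = df["Values"].iloc[start:stop].tolist()
    # One call per batch; results are written back to the same row positions
    df.iloc[start:stop, col] = process_batch(batch)

print(df)
```

After the first loop iteration only rows a to c are filled, and after the second iteration rows d to f are filled too, which matches the two passes described in the question.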
I have accident record data as shown below across the places
Inspector_ID Place Date
0 1 A 1-09-2019
1 2 A 1-09-2019
2 1 A 1-09-2019
3 1 B 1-09-2019
4 3 A 1-09-2019
5 3 A 1-09-2019
6 1 A 2-09-2019
7 3 A 2-09-2019
8 2 B 2-09-2019
9 3 A 3-09-2019
10 1 C 3-09-2019
11 1 D 3-09-2019
12 1 A 3-09-2019
13 1 E 3-09-2019
14 1 A 3-09-2019
15 1 A 3-09-2019
16 3 A 4-09-2019
17 3 B 5-09-2019
18 4 B 5-09-2019
19 3 A 5-09-2019
20 3 C 5-09-2019
21 3 A 5-09-2019
22 3 D 5-09-2019
23 3 C 5-09-2019
From the above data, I want to optimize inspector utilisation.
For that I tried the code below to get the objective function of the optimisation.
c = df.groupby('Place').Inspector_ID.agg(
Total_Number_of_accidents='count',
Number_unique_Inspector='nunique',
Unique_Inspector='unique').reset_index().sort_values(['Total_Number_of_accidents'], ascending=False)
Below is the output of above code
Place Total_Number_of_accidents Number_unique_Inspector Unique_Inspector
0 A 14 3 [1, 2, 3]
1 B 4 4 [1, 2, 3, 4]
2 C 3 2 [1, 3]
3 D 2 2 [1, 3]
4 E 1 1 [1]
And then
f = df.groupby('Inspector_ID').Place.agg(
Total_Number_of_accidents='count',
Number_unique_Place='nunique',
Unique_Place='unique').reset_index().sort_values(['Total_Number_of_accidents'], ascending=False)
Output:
Inspector_ID Total_Number_of_accidents Number_unique_Place Unique_Place
2 3 11 4 [A, B, C, D]
0 1 10 5 [A, B, C, D, E]
1 2 2 2 [A, B]
3 4 1 1 [B]
From the above we have 4 Inspectors, 5 Places and 24 accidents. I want to optimize the allocation of inspectors based on the above data.
Condition 1 - There should be at least 1 inspector in each Place.
Condition 2 - Every inspector should be assigned at least one Place.
Condition 3 - Identify the Places that over-utilise inspectors relative to their number of accidents (e.g. Place B has only 4 accidents but four inspectors, so some inspectors from Place B could be assigned to Place A). The next questions are: which inspectors, and how many?
Is it possible to do this in Python? If so, with which algorithm, and how?
This is an assignment problem (https://en.wikipedia.org/wiki/Assignment_problem). It could probably be reduced to a max-flow problem, with some extra tuning to balance the flow (using a graph package like NetworkX).
How to create the digraph:
Vertex s is the source of the flow (of accidents).
S is the set of all places that have accidents.
X_s is the set of all edges (s, x) where x is in S. Similarly, t is the sink, and we have the analogous sets T (the inspectors) and X_t (the edges (y, t) where y is in T).
Set the capacity of each edge in X_s from the column Total_Number_of_accidents. For the edges in X_t, set the capacity to the maximum number of accidents an inspector can process; we will get back to that later on.
Now add edges (x, y) from S to T, where x is in S and y is in T, and give them a high capacity (e.g. 1e6). Call this set X_c; the flow on these edges tells us how much load inspector y gets from place x.
Now solve the max-flow problem. When some edges in X_t carry too much flow, decrease their capacity (to reduce the load on that particular inspector); when some edges in X_c carry very little flow, remove them to simplify the work organization. After a few iterations you should have the desired solution.
You can code some super algorithm, but if this is a real-life problem you would want to avoid situations like assigning one inspector to all places to process 0.38234 accidents at each place...
Also, there should probably be some constraint on how many accidents an inspector can process in a given time, but you didn't mention it...
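A minimal sketch of that graph in NetworkX, using the accident counts from the question. The per-inspector cap `max_load = 6` and the node names (`place_A`, `insp_1`, ...) are assumptions made for illustration; the question gives no such limit.

```python
import networkx as nx

# Accidents per place, taken from the Total_Number_of_accidents column
accidents = {"A": 14, "B": 4, "C": 3, "D": 2, "E": 1}
inspectors = [1, 2, 3, 4]
max_load = 6  # assumed cap on accidents per inspector (not given in the question)

G = nx.DiGraph()
for place, count in accidents.items():
    # X_s: source -> place, capacity = number of accidents at that place
    G.add_edge("s", f"place_{place}", capacity=count)
    for insp in inspectors:
        # X_c: place -> inspector, effectively unlimited capacity
        G.add_edge(f"place_{place}", f"insp_{insp}", capacity=10**6)
for insp in inspectors:
    # X_t: inspector -> sink, capacity = maximum load for that inspector
    G.add_edge(f"insp_{insp}", "t", capacity=max_load)

flow_value, flow = nx.maximum_flow(G, "s", "t")

# Load per inspector and the non-zero place -> inspector assignments
load = {insp: flow[f"insp_{insp}"]["t"] for insp in inspectors}
assignment = {
    (place, insp): flow[f"place_{place}"][f"insp_{insp}"]
    for place in accidents
    for insp in inspectors
    if flow[f"place_{place}"][f"insp_{insp}"] > 0
}
print(flow_value, load, assignment)
```

If `flow_value` comes out below the total number of accidents, the caps in X_t are too tight. Note that max-flow alone does not enforce conditions 1 and 2 from the question; those would have to be checked on the resulting assignment, or handled by adjusting capacities and edges as described above.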
Problem1:
We want to go from s to e. In each cell we can move right R or down D. The environment is fully known. The table has (4*5) 20 cells. The challenge is that we do not know what the reward of each cell is, but we will receive an overall reward as we pass and finish a path.
Example: a solution can be RRDDRDR and the overall reward is 16.
s 3 5 1 5
1 2 4 5 1
7 3 1 2 8
9 2 1 1 e
The target is to find a set of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward among actions?
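To make the setup concrete, here is a small sketch that treats the grid as a black box returning only the overall reward of a complete path. It assumes the s and e cells carry reward 0, which is consistent with the RRDDRDR example summing to 16. Since there are only 35 paths with 4 R and 3 D moves, it simply queries all of them; it does not answer the credit-assignment part of the question.

```python
from itertools import combinations

# Hidden rewards from the example; s and e are assumed to be worth 0
GRID = [
    [0, 3, 5, 1, 5],
    [1, 2, 4, 5, 1],
    [7, 3, 1, 2, 8],
    [9, 2, 1, 1, 0],
]


def overall_reward(path):
    """Return the total reward of a path; the caller never sees cell values."""
    row, col, total = 0, 0, 0
    for move in path:
        if move == "R":
            col += 1
        else:  # "D"
            row += 1
        total += GRID[row][col]
    return total


def all_paths(n_right=4, n_down=3):
    """Yield every ordering of n_right R moves and n_down D moves."""
    n = n_right + n_down
    for down_positions in combinations(range(n), n_down):
        yield "".join("D" if i in down_positions else "R" for i in range(n))


paths = list(all_paths())
best_path = max(paths, key=overall_reward)
best_reward = overall_reward(best_path)
print(len(paths), best_path, best_reward)
```

Exhaustive querying only works because the grid is tiny; on a larger grid the number of paths grows combinatorially and some learning method would be needed instead.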
Problem2:
This problem is the same as Problem1, but the rewards of the environment are dynamic: the way we reach a cell affects the rewards of the cells ahead.
Example: the two movements RRD and DRR both get us to the same cell, but since they follow different paths, the cells ahead will have different rewards.
s 3 5 1 5
1 2 4 9 -1
7 3 2 -5 18
9 2 9 7 e
(RRD path, selecting this path will result in changes of rewards of ahead cells)
s 3 5 1 5
1 2 4 3 1
7 3 30 7 -8
9 2 40 11 e
(DRR path, selecting this path will result in changes of rewards of ahead cells)
The target is to find a set of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward between actions? (After passing a path from Start to End and the overall reward is obtained)
Can you say more about the research you are doing? (The problem sounds a lot like the sort of thing someone might assign just to get you thinking about temporal credit assignment.)
I have an Excel file containing two sheets:
Teams (which contains details of teams and time schedule of their matches)
Results (contains calculation and number of matches and results, etc).
What Results does is pull in each game that has Launched status on Teams; the numbers and calculations are then entered on that sheet.
My problem is that when a game sitting between two launched games is itself launched, its name appears on Results and shifts the names below it down a row, but it keeps the details of the row it displaced for itself, while those details should stay next to their own team. Here is my example:
Sheet1 Teams
A B
1 **Names** **Status**
2 TEAM A Launched
3 TEAM B Pending
4 TEAM C Pending
5 TEAM D Launched
Sheet 2 Results
A B C D E
1 **Names** **1st Half goals** **2nd half** **total** **points**
2 TEAM A 1 2 3 13
3 TEAM D 3 1 4 10
So what happens here is that if I change the status of TEAM B on Sheet 1 Teams to Launched, it appears on Sheet 2 Results and takes over the row input of TEAM D.
It will be like this:
Sheet 2 Results
A B C D E
1 **Names** **1st Half goals** **2nd half** **total** **points**
2 TEAM A 1 2 3 13
3 TEAM B 3 1 4 10
4 TEAM D
Is there any solution? please let me know.
This is the formula on Sheet 2 Results Cell A2
=IFERROR(INDEX(TEAMS!A$2:A$550,SMALL(IF(TEAMS!B$2:B$550="Launched",ROW(TEAMS!B$2:B$550)-ROW(TEAMS!B$1)),ROW(TEAMS!B2))),"")
Cells B, C and E contain no formula, only manual input, and cell D contains =SUM(C2,B2)
I found an answer for my question, thanks to GraH - Guido & Vletm users on Chandoo Forum, who helped me find the answer.
I found 2 answers both helpful and working for my case, you can find them here and here
Thanks.
I am really having difficulty generating a round-robin tournament roster with the following conditions:
10 Teams (Teams 1 - 10)
5 Fields (Field A - E)
9 Rounds (Round 1 - 9)
Each team must play every other team exactly once.
Only two teams can play on a field at any one time. (i.e. all 5 fields always in use)
No team is allowed to play on any particular field more than twice. <- This is the problem!
I have been trying on and off for many years to solve this problem on paper without success. So once and for all, I would like to generate a function in Excel VBA to test every combination to prove it is impossible.
I started creating a very messy piece of code that generates an array using nested if/while loops, but I can already see it's just not going to work.
Is there anyone out there with a juicy piece of code that can solve?
Edit: Thanks to Brian Camire's method below, I've been able to include further desirable constraints and still get a solution:
No team plays the same field twice in a row
A team should play on all the fields once before repeating
The solution is below. I should have asked years ago! Thanks again Brian - you are a genius!
Round 1 2 3 4 5 6 7 8 9
Field A 5v10 1v9 2v4 6v8 3v7 4v10 3v9 7v8 1v2
Field B 1v7 8v10 3v6 2v9 4v5 6v7 1v8 9v10 3v5
Field C 2v6 3v4 1v10 5v7 8v9 1v3 2v5 4v6 7v10
Field D 4v9 2v7 5v8 3v10 1v6 2v8 4v7 1v5 6v9
Field E 3v8 5v6 7v9 1v4 2v10 5v9 6v10 2v3 4v8
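For anyone who wants to double-check a roster like the one above against the original conditions, here is a small Python sketch (not the VBA asked for in the question). The schedule text is copied from the table above; the function names are made up for illustration, and only the original conditions are checked, not the two extra ones added in the edit.

```python
from collections import Counter
from itertools import combinations

# Rows are fields A-E, columns are rounds 1-9, copied from the table above
SCHEDULE = """\
A 5v10 1v9 2v4 6v8 3v7 4v10 3v9 7v8 1v2
B 1v7 8v10 3v6 2v9 4v5 6v7 1v8 9v10 3v5
C 2v6 3v4 1v10 5v7 8v9 1v3 2v5 4v6 7v10
D 4v9 2v7 5v8 3v10 1v6 2v8 4v7 1v5 6v9
E 3v8 5v6 7v9 1v4 2v10 5v9 6v10 2v3 4v8
"""


def parse(text):
    """Turn the field-by-round grid into (round, field, team1, team2) tuples."""
    matches = []
    for line in text.splitlines():
        field, *games = line.split()
        for rnd, game in enumerate(games, start=1):
            a, b = (int(t) for t in game.split("v"))
            matches.append((rnd, field, a, b))
    return matches


def check(matches, n_teams=10, max_per_field=2):
    """Return True if the roster meets the original conditions."""
    teams = range(1, n_teams + 1)
    pairs = Counter(frozenset((a, b)) for _, _, a, b in matches)
    per_round, per_field, slots = Counter(), Counter(), Counter()
    for rnd, field, a, b in matches:
        slots[(rnd, field)] += 1
        for team in (a, b):
            per_round[(rnd, team)] += 1
            per_field[(field, team)] += 1
    every_pair_once = all(
        pairs[frozenset(p)] == 1 for p in combinations(teams, 2)
    )
    one_game_per_round = all(
        per_round[(rnd, team)] == 1
        for rnd in range(1, n_teams)
        for team in teams
    )
    one_match_per_slot = all(n == 1 for n in slots.values())
    field_limit = all(n <= max_per_field for n in per_field.values())
    return (every_pair_once and one_game_per_round
            and one_match_per_slot and field_limit)


matches = parse(SCHEDULE)
print(check(matches))
```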
I think I've found at least one solution to the problem:
Round Field Team 1 Team 2
1 A 3 10
1 B 7 8
1 C 1 9
1 D 2 4
1 E 5 6
2 A 8 10
2 B 1 5
2 C 2 6
2 D 3 7
2 E 4 9
3 A 1 4
3 B 2 3
3 C 8 9
3 D 5 7
3 E 6 10
4 A 6 7
4 B 4 10
4 C 2 8
4 D 5 9
4 E 1 3
5 A 2 9
5 B 3 8
5 C 4 7
5 D 1 6
5 E 5 10
6 A 3 9
6 B 4 5
6 C 7 10
6 D 6 8
6 E 1 2
7 A 5 8
7 B 6 9
7 C 1 10
7 D 3 4
7 E 2 7
8 A 4 6
8 B 2 10
8 C 3 5
8 D 1 8
8 E 7 9
9 A 2 5
9 B 1 7
9 C 3 6
9 D 9 10
9 E 4 8
I found it using the OpenSolver add-in for Excel (as the problem was too large for the built-in Solver feature). The steps were something like this:
1. Set up a table with 2025 rows representing the possible matches -- that is, possible combinations of round, field, and pair of teams (with columns like the table above), plus one extra column that will be a binary (0 or 1) decision variable indicating if the match is to be selected.
2. Set up formulas to use the decision variables to calculate: a) the number of matches at each field in each round, b) the number of matches between each pair of teams, c) the number of matches played by each team in each round, and, d) the number of matches played by each team at each field.
3. Set up a formula to use the decision variables to calculate the total number of matches.
4. Use OpenSolver to solve a model whose objective is to maximize the result of the formula from Step 3 by changing the decision variables from Step 1, subject to the constraints that the decision variables must be binary, the results of the formulas from Steps 2.a) through c) must equal 1, and the results of the formulas from Step 2.d) must be less than or equal to 2.
The details are as follows...
For Step 1, I set up my table so that columns A, B, C, and D represented the Round, Field, Team 1, and Team 2, respectively, and column E represented the decision variable. Row 1 contained the column headings, and rows 2 through 2026 each represented one possible match.
For Step 2.a), I set up a vertical list of rounds 1 through 9 in cells I2 through I10, a horizontal list of fields A through E in cells J1 through N1, and a series of formulas to calculate the number of matches in each field in each round in cells J2 through N10 by starting with =SUMIFS($E$2:$E$2026,$A$2:$A$2026,$I2,$B$2:$B$2026,J$1) in cell J2 and then copying and pasting.
For Step 2.b), I set up a vertical list of teams 1 through 9 in cells I13 through I21, a horizontal list of opposing teams 2 through 10 in cells J12 through R12, and a series of formulas to calculate the number of matches between each pair of teams in the "upper right triangular half" of cells J13 through R21 (including the diagonal) by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I13,$D$2:$D$2026,J$12) in cell J13 and then copying and pasting.
For Step 2.c), I set up a vertical list of teams 1 through 10 in cells I24 through I33, a horizontal list of rounds 1 through 9 in cells J23 through R23, and a series of formulas to calculate the number of matches played by each team in each round in cells J24 through R33 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I24,$A$2:$A$2026,J$23)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I24,$A$2:$A$2026,J$23) in cell J24 and then copying and pasting.
For Step 2.d), I set up a vertical list of teams 1 through 10 in cells I36 through I45, a horizontal list of fields A through E in cells J35 through N35, and a series of formulas to calculate the number of matches played by each team at each field in cells J36 through N45 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I36,$B$2:$B$2026,J$35)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I36,$B$2:$B$2026,J$35) in cell J36 and then copying and pasting.
For Step 3, I set up a formula to calculate the total number of matches in cell G2 as =SUM($E$2:$E$2026).
For Step 4, in the OpenSolver Model dialog (available from Data, OpenSolver, Model) I set the Objective Cell to $G$2, the Variable Cells to $E$2:$E$2026, and added constraints as described above and detailed below (sorry that the constraints are not listed in the order that I described them):
Note that, for the constraints described in Step 2.b), I needed to add the constraints separately for each row, since OpenSolver raised an error message if the constraints included the blank cells in the "lower left triangular half".
After setting up the model, OpenSolver highlighted the objective, variable, and constraint cells as shown below:
I then solved the problem using OpenSolver (via Data, OpenSolver, Solve). The selected matches are the ones with a 1 in column E. You might get a different solution than I did, as there might be many feasible ones.
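For readers without Excel, the layout of Steps 1 through 3 can be mirrored roughly in Python. This is only a sketch of the table and the SUMIFS-style sums, not of the solve itself: the function names are made up for illustration, and `selected` plays the role of column E.

```python
from itertools import combinations, product

ROUNDS = range(1, 10)
FIELDS = "ABCDE"
TEAMS = range(1, 11)

# Step 1: one row per (round, field, team 1, team 2) with team 1 < team 2
rows = [
    (rnd, field, a, b)
    for rnd, field, (a, b) in product(ROUNDS, FIELDS, combinations(TEAMS, 2))
]


def per_slot(selected, rnd, field):
    """Step 2.a: matches at a given field in a given round."""
    return sum(x for x, (r, f, a, b) in zip(selected, rows)
               if r == rnd and f == field)


def per_pair(selected, t1, t2):
    """Step 2.b: matches between a given pair of teams (t1 < t2)."""
    return sum(x for x, (r, f, a, b) in zip(selected, rows)
               if a == t1 and b == t2)


def per_team_round(selected, team, rnd):
    """Step 2.c: matches played by a team in a given round."""
    return sum(x for x, (r, f, a, b) in zip(selected, rows)
               if r == rnd and team in (a, b))


def per_team_field(selected, team, field):
    """Step 2.d: matches played by a team at a given field."""
    return sum(x for x, (r, f, a, b) in zip(selected, rows)
               if f == field and team in (a, b))


def is_feasible(selected):
    """Constraints of Step 4: 2.a to 2.c equal 1, 2.d at most 2."""
    return (
        all(per_slot(selected, r, f) == 1 for r in ROUNDS for f in FIELDS)
        and all(per_pair(selected, a, b) == 1
                for a, b in combinations(TEAMS, 2))
        and all(per_team_round(selected, t, r) == 1
                for t in TEAMS for r in ROUNDS)
        and all(per_team_field(selected, t, f) <= 2
                for t in TEAMS for f in FIELDS)
    )


# Step 3 is just sum(selected); here only the first row is switched on
selected = [0] * len(rows)
selected[0] = 1
print(len(rows), sum(selected), is_feasible(selected))
```

A binary programming solver would then search for a `selected` vector for which `is_feasible` holds, which is what OpenSolver does with the spreadsheet model.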
come on ... that's an easy one for manual solution ;-)
T1 T2 VE
1 2 A
1 3 A
1 4 B
1 5 B
1 6 C
1 7 C
1 8 D
1 9 D
1 10 E
2 3 A
2 4 B
2 5 B
2 6 C
2 7 C
2 8 D
2 9 D
2 10 E
3 4 C
3 5 C
3 6 D
3 7 D
3 8 E
3 9 E
3 10 B
4 5 C
4 6 D
4 7 D
4 8 E
4 9 E
4 10 A
5 6 E
5 7 E
5 8 A
5 9 A
5 10 D
6 7 E
6 8 A
6 9 A
6 10 B
7 8 B
7 9 B
7 10 A
8 9 B
8 10 C
9 10 C
As far as I have checked, no team plays more than twice at the same venue. Please double check.
Dividing it into rounds should be an easy one.
Edit: this time with only 5 venues :-)
Edit 2: now also with allocated rounds :-)
Edit 3: deleted the round allocation again because it was wrong.