How do I sum the output of a list?

List
L = [23, 91, 0, -11, 4, 23, 49]
Code
for i in L:
    if i > 10:
        num = i * 30
    else:
        num = i * 1
    if num % 2 == 0:
        num += 6
    if i > 50:
        num -= 10
        if i != -11:
            num += 10
    print(num)
Output
696
2736
6
-11
10
696
1476
I'm trying to sum the numbers in the output and then divide the total by 2.

Initialize a variable total outside the loop, then add the value of num to total before the print statement. Finally, print(total / 2) once, after the loop. (Naming the variable sum also works, but it shadows Python's built-in sum function.)
total = 0
for i in L:
    ...
    total += num
    print(num)
print(total / 2)
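For reference, here is a minimal runnable sketch of the same idea. It assumes the nesting of the if statements that reproduces the output listed in the question, and the helper name transform is made up for illustration. Because the per-item rule lives in a function, the built-in sum can do the accumulation.

```python
L = [23, 91, 0, -11, 4, 23, 49]

def transform(i):
    """Apply the per-item rule from the question (nesting inferred from its output)."""
    num = i * 30 if i > 10 else i
    if num % 2 == 0:
        num += 6
    if i > 50:
        num -= 10
        if i != -11:
            num += 10
    return num

outputs = [transform(i) for i in L]
total = sum(outputs)   # built-in sum, so no accumulator variable is needed
print(outputs)         # [696, 2736, 6, -11, 10, 696, 1476]
print(total / 2)       # 2804.5
```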

Related

Using recursion to iterate multiple times through rows in a dataframe - not returning the expected result

How to loop through a dataframe series multiple times using a recursive function?
I am trying to get a simple case to work and use it in a more complicated function.
I am using a simple dataframe:
df = pd.DataFrame({'numbers': [1,2,3,4,5]})
I want to iterate through the rows multiple times and sum the values. On each iteration, the starting index increases by 1.
def recursive_sum(df, mysum=0, count=0):
    df = df.iloc[count:]
    if len(df.index) < 2:
        return mysum
    else:
        for i in range(len(df.index)):
            mysum += df.iloc[i, 0]
        count += 1
        return recursive_sum(df, mysum, count)
I think I should get:
#Iteration 1: count = 0, len(df.index) = 5 (not < 2), mysum = 1 + 2 + 3 + 4 + 5 = 15
#Iteration 2: count = 1, len(df.index) = 4 (not < 2), mysum = 15 + 2 + 3 + 4 + 5 = 29
#Iteration 3: count = 2, len(df.index) = 3 (not < 2), mysum = 29 + 3 + 4 + 5 = 41
#Iteration 4: count = 3, len(df.index) = 2 (not < 2), mysum = 41 + 4 + 5 = 50
#Iteration 5: count = 4, len(df.index) = 1 < 2, return mysum = 50
But I am returning 38.
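Just fixed it. The original version re-sliced the already shortened df with a growing count on every call, so rows were skipped (the third call only saw 4 and 5, hence 38). The version below leaves df intact and starts the loop at count instead: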
def recursive_sum(df, mysum=0, count=0):
    if (len(df.index) - count) < 2:
        return mysum
    else:
        for i in range(count, len(df.index)):
            mysum += df.iloc[i, 0]  # row i, first column
        count += 1
        return recursive_sum(df, mysum, count)
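If recursion is not a hard requirement, the same total can be computed iteratively. The sketch below is an assumption about what the simple case is meant to do (add up every suffix of the column that has at least two rows); the function name suffix_sums_total is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 5]})

def suffix_sums_total(df, column='numbers'):
    """Sum the column from each start position, stopping when fewer than 2 rows remain."""
    values = df[column]
    # range(len - 1) mirrors the recursive base case "fewer than 2 rows left".
    return sum(values.iloc[start:].sum() for start in range(len(values) - 1))

print(suffix_sums_total(df))  # 15 + 14 + 12 + 9 = 50
```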

Python optimization of loop in data frame with max and min values

I have a question about how to optimize my code, specifically the loops. I use it to calculate the maximum of two values in a row, or sometimes the maximum of a value and a number.
I tried to rewrite the code using .loc and .clip, but when max or min appears several times in one expression I have trouble with the logical expressions.
This is how it looked at the beginning:
def Calc(row):
    if row['Forecast'] == 0:
        return max(row['Qty'],0)
    elif row['def'] == 1:
        return 0
    elif row['def'] == 0:
        return round(max(row['Qty'] - ( max(row['Forecast_total']*14,(row['Qty_12m_1']+row['Qty_12m_2'])) * max(1, (row['Total']/row['Forecast'])/54)),0 ))

df['Calc'] = df.apply(Calc, axis=1)
I managed to change it using the functions I mentioned, but I don't know how to write the nested max(max()):
df.loc[(combined_sf2['Forecast'] == 0),'Calc'] = df.clip(0,None)
df.loc[(combined_sf2['def'] == 1),'Calc'] = 0
df.loc[(combined_sf2['def'] == 0),'Calc'] = round(max(df['Qty']- (max(df['Forecast_total']
*14,(df['Qty_12m_1']+df['Qty_12m_2']))
*max(1, (df['Total']/df['Forecast'])/54)),0))
The first two statements work; the last one doesn't.
id     Forecast  def  Calc      Qty  Forecast_total  Qty_12m_1  Qty_12m_2  Total
31551  0         0    0         2    0               0          0          95
27412  0,1       0    1         3    0,1             11         0          7
23995  0,1       0    0         4    0               1          0          7
27411  5,527     1    0,036186  60   0,2             64         0          183
28902  5,527     0    0,963814  33   5,327           277        0          183
23954  5,527     0    0         6    0               6          0          183
23994  5,527     0    0         8    0               0          0          183
31549  5,527     0    0         6    0               1          0          183
31550  5,527     0    0         6    0               10         0          183
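Use numpy.select, and replace max with numpy.maximum, which compares element-wise (the built-in max cannot compare whole Series):
import numpy as np

# One boolean mask per branch of the original if/elif chain
m1 = df['Forecast'] == 0
m2 = df['def'] == 1
m3 = df['def'] == 0
# One result per branch; np.select takes the first mask that matches
s1 = df['Qty'].clip(lower=0)
s3 = round(np.maximum(df['Qty'] - (np.maximum(df['Forecast_total']*14,(df['Qty_12m_1']+df['Qty_12m_2'])) * np.maximum(1, (df['Total']/df['Forecast'])/54)),0 ))
df['Calc2'] = np.select([m1, m2, m3], [s1, 0, s3], default=None)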
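To check that the vectorized version agrees with the row-wise one, here is a small self-contained sketch. The rows are made up for illustration (with dot decimals), the nested expression is split into the hypothetical intermediate names demand and factor, and default=np.nan is used instead of None only to keep a float result.

```python
import numpy as np
import pandas as pd

# Made-up rows covering each branch of the rule.
df = pd.DataFrame({
    'Forecast':       [0.0, 0.1, 5.527, 5.527, 1.0],
    'def':            [0,   0,   1,     0,     0],
    'Qty':            [2,   3,   60,    500,   500],
    'Forecast_total': [0.0, 0.1, 0.2,   5.0,   2.0],
    'Qty_12m_1':      [0,   11,  64,    100,   10],
    'Qty_12m_2':      [0,   0,   0,     0,     10],
    'Total':          [95,  7,   183,   183,   108],
})

def calc_row(row):
    """Row-wise reference, same logic as Calc in the question."""
    if row['Forecast'] == 0:
        return max(row['Qty'], 0)
    elif row['def'] == 1:
        return 0
    elif row['def'] == 0:
        demand = max(row['Forecast_total'] * 14, row['Qty_12m_1'] + row['Qty_12m_2'])
        factor = max(1, (row['Total'] / row['Forecast']) / 54)
        return round(max(row['Qty'] - demand * factor, 0))

row_wise = df.apply(calc_row, axis=1).astype(float)

# Vectorized: np.maximum works element-wise on whole columns.
demand = np.maximum(df['Forecast_total'] * 14, df['Qty_12m_1'] + df['Qty_12m_2'])
factor = np.maximum(1, (df['Total'] / df['Forecast']) / 54)
s3 = np.maximum(df['Qty'] - demand * factor, 0).round()

df['Calc2'] = np.select(
    [df['Forecast'] == 0, df['def'] == 1, df['def'] == 0],
    [df['Qty'].clip(lower=0), 0, s3],
    default=np.nan,
)
print(df['Calc2'].tolist())  # [2.0, 0.0, 0.0, 400.0, 444.0]
```

Rows where Forecast is 0 produce inf or NaN inside s3, but np.select never picks s3 for them because the first mask matches first.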

Pandas Return Cell position containing string

I am new to data analysis. I want to find the position of the cell that contains an input string.
Example:
Price   Rate p/lot  Total Comm
947.2   1.25        CAD 1.25
129.3   2.1         CAD 1.25
161.69  0.8         CAD 2.00
How do I find the position of the string "CAD 2.00"?
The required output is (2,2).
In [353]: rows, cols = np.where(df == 'CAD 2.00')
In [354]: rows
Out[354]: array([2], dtype=int64)
In [355]: cols
Out[355]: array([2], dtype=int64)
Rename the columns to their numeric positions with range, reshape with stack, and use idxmax to get the first occurrence of the value:
d = dict(zip(df.columns, range(len(df.columns))))
s = df.rename(columns=d).stack()
a = (s == 'CAD 2.00').idxmax()
print (a)
(2, 2)
To find all occurrences, use boolean indexing and convert the MultiIndex to a list:
a = s[(s == 'CAD 1.25')].index.tolist()
print (a)
[(0, 2), (1, 2)]
Explanation:
Create a dict that maps the column names to their positions:
d = dict(zip(df.columns, range(len(df.columns))))
print (d)
{'Rate p/lot': 1, 'Price': 0, 'Total Comm': 2}
print (df.rename(columns=d))
        0     1         2
0  947.20  1.25  CAD 1.25
1  129.30  2.10  CAD 1.25
2  161.69  0.80  CAD 2.00
Then reshape with stack to get a MultiIndex of (row, column) positions:
s = df.rename(columns=d).stack()
print (s)
0  0       947.2
   1        1.25
   2    CAD 1.25
1  0       129.3
   1         2.1
   2    CAD 1.25
2  0      161.69
   1         0.8
   2    CAD 2.00
dtype: object
Compare with the string:
print (s == 'CAD 2.00')
0  0    False
   1    False
   2    False
1  0    False
   1    False
   2    False
2  0    False
   1    False
   2     True
dtype: bool
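And get the position of the first True, which is the corresponding value of the MultiIndex:
a = (s == 'CAD 2.00').idxmax()
print (a)
(2, 2)
Note that if no cell matches, idxmax still returns the first index, (0, 0), so check (s == 'CAD 2.00').any() first when the value may be absent.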
Another solution is to use numpy.nonzero to locate the matching values, zip the row and column indices together, and convert the result to a list:
i, j = (df.values == 'CAD 2.00').nonzero()
t = list(zip(i, j))
print (t)
[(2, 2)]
i, j = (df.values == 'CAD 1.25').nonzero()
t = list(zip(i, j))
print (t)
[(0, 2), (1, 2)]
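The NumPy approach can be wrapped into a small self-contained helper. This is only a sketch: the DataFrame is rebuilt from the example in the question, the function name find_positions is made up, and numpy.argwhere is used in place of nonzero plus zip (it returns the same index pairs, one per row).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Price': [947.2, 129.3, 161.69],
    'Rate p/lot': [1.25, 2.1, 0.8],
    'Total Comm': ['CAD 1.25', 'CAD 1.25', 'CAD 2.00'],
})

def find_positions(df, value):
    """Return every (row position, column position) whose cell equals value."""
    mask = (df == value).to_numpy()
    # argwhere yields one [row, col] pair per True cell, in row-major order.
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]

print(find_positions(df, 'CAD 2.00'))  # [(2, 2)]
print(find_positions(df, 'CAD 1.25'))  # [(0, 2), (1, 2)]
print(find_positions(df, 'missing'))   # []
```

Unlike idxmax, an absent value simply gives an empty list.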
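A simple alternative, which scans column by column and returns (row label, column position) for the first match, or None if the value is absent:
def value_loc(value, df):
    for col_pos, col in enumerate(df.columns):
        matches = df.index[df[col] == value]
        if len(matches) > 0:
            return (matches[0], col_pos)
    return None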

algorithm to deal with series of values

Given a series defined by a START, an INCREMENT, and a MAX:
START = 100
INCREMENT = 30
MAX = 315
e.g. 100, 130, 160, 190, 220, 250, 280, 310
Given an arbitrary number X, return:
the values remaining in the series, starting from the first value that is >= X
the offset Y (the catch-up amount needed to get from X to that first value).
Example
In:
START = 100
INCREMENT = 30
MAX = 315
X = 210
Out:
Y = 10
S = 220, 250, 280, 310
UPDATE -- From MBo's answer:
float max = 315.0;
float inc = 30.0;
float start = 100.0;
float x = 210.0;
float k0 = ceil( (x-start) / inc) ;
float k1 = floor( (max - start) / inc) ;
for (int i = (int)k0; i <= (int)k1; i++)
{
    NSLog(@" output: %d: %f", i, start + i * inc);
}
output: 4: 220.000000
output: 5: 250.000000
output: 6: 280.000000
output: 7: 310.000000
MBo's integer approach would be nicer.
School math:
Start + k0 * Inc >= X
k0 * Inc >= X - Start
k0 >= (X - Start) / Inc
Programming math:
k0 = Ceil(1.0 * (X - Start) / Inc)
k1 = Floor(1.0 * (Max - Start) / Inc)
for i = k0 to k1 (including both ends)
    output Start + i * Inc
Integer math:
k0 = (X - Start + Inc - 1) / Inc //integer division acts as a ceiling here (assuming X >= Start)
k1 = (Max - Start) / Inc //integer division acts as a floor
for i = k0 to k1 (including both ends)
    output Start + i * Inc
Example:
START = 100
INCREMENT = 30
MAX = 315
X = 210
k0 = Ceil((210 - 100) / 30) = Ceil(3.67) = 4
k1 = Floor((315 - 100) / 30) = Floor(7.17) = 7
first 100 + 4 * 30 = 220
last 100 + 7 * 30 = 310
Solve the inequality
X <= S + K*I <= M
This is equivalent to
K0 = Ceil((X - S) / I) <= K <= Floor((M - S) / I) = K1
and
Y = (S + K0*I) - X.
Note that K0 > K1 is possible, in which case there is no solution.
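The integer version of the formulas above can be sketched in Python as follows. The function name remaining_series is made up for illustration, a positive integer increment is assumed, and clamping k0 at 0 (so that an X below START yields the whole series) is an added assumption that the answers do not cover.

```python
def remaining_series(start, increment, maximum, x):
    """Return (offset Y, remaining values) for the first series value >= x."""
    # -(-a // b) is ceiling division for integers; clamp at 0 (an assumption)
    # so that x < start returns the full series instead of values below start.
    k0 = max(0, -(-(x - start) // increment))
    k1 = (maximum - start) // increment          # floor division
    values = [start + k * increment for k in range(k0, k1 + 1)]
    # No solution when k0 > k1: the list is empty and there is no offset.
    offset = values[0] - x if values else None
    return offset, values

print(remaining_series(100, 30, 315, 210))  # (10, [220, 250, 280, 310])
print(remaining_series(100, 30, 315, 320))  # (None, [])
```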

Avoid looping a pandas dataframe keeping track of remaining inventory

I currently loop through a pandas dataframe that contains orders so that I can remove the ordered items from inventory and keep track of which order may not get filled (this is part of a reservation system).
I'd love to avoid the loop and do this in a more pythonic/pandas-like way, but I haven't been able to come up with anything that lets me get to the level of granularity I'd like. Any ideas would be much appreciated!
Here's a much simplified version of this.
Examples of the input would look like this:
import pandas as pd
import random

def get_inventory():
    df_inv = pd.DataFrame([{'sku': 'A1', 'remaining': 1000},
                           {'sku': 'A2', 'remaining': 600},
                           {'sku': 'A3', 'remaining': 180},
                           {'sku': 'B1', 'remaining': 800},
                           {'sku': 'B2', 'remaining': 500},
                           ], columns=['sku', 'remaining']).set_index('sku')
    df_inv.loc[:, 'allocated'] = 0
    df_inv.loc[:, 'reserved'] = 0
    df_inv.loc[:, 'missed'] = 0
    return df_inv

def get_reservations():
    skus = ['A1', 'A2', 'A3', 'B1', 'B2']
    res = []
    for i in range(0, 1000, 1):
        res.append({'order_id': i,
                    'sku': random.choice(skus),
                    'number_of_items_reserved': 1})
    df_res = pd.DataFrame(res,
                          columns=['order_id', 'sku', 'number_of_items_reserved'])
    return df_res
Inventory:
df_inv = get_inventory()
print(df_inv)
     remaining  allocated  reserved  missed
sku
A1        1000          0         0       0
A2         600          0         0       0
A3         180          0         0       0
B1         800          0         0       0
B2         500          0         0       0
Reservations:
df_res = get_reservations()
print(df_res.head(10))
   order_id sku  number_of_items_reserved
0         0  A3                         1
1         1  B1                         1
2         2  A3                         1
3         3  A1                         1
4         4  B1                         1
5         5  B1                         1
6         6  B1                         1
7         7  B1                         1
8         8  A3                         1
9         9  B1                         1
The logic to allocate reservations to inventory looks roughly like this:
(this is the part I'd love to replace)
"""
df_inv: inventory grouped (indexed) by sku (style and size)
df_res: reservations by order id for a style and size
"""
df_inv = get_inventory()
df_res = get_reservations()
for i, res in df_res.iterrows():
    sku = res['sku']
    n_items = res['number_of_items_reserved']
    inv = df_inv[df_inv.index == sku]['remaining'].values[0]
    df_inv.loc[(df_inv.index == sku), 'reserved'] += n_items
    if (inv-n_items) >= 0:
        df_inv.loc[(df_inv.index == sku), 'allocated'] += n_items
        df_inv.loc[(df_inv.index == sku), 'remaining'] -= n_items
    else:
        df_inv.loc[(df_inv.index == sku), 'missed'] += n_items
Results:
     remaining  allocated  reserved  missed
sku
A1         817        183       183       0
A2         390        210       210       0
A3           0        180       210      30
B1         613        187       187       0
B2         290        210       210       0
You can get away without looping thanks to the intrinsic data alignment in pandas.
df_inv = get_inventory()
df_res = get_reservations()
Create a Series indexed by 'sku' with the total number of items reserved, then compare it with the inventory (shortage is negative where a sku is short):
n_items = df_res.groupby('sku')['number_of_items_reserved'].sum()
shortage = df_inv['remaining'] - n_items
enough_inv = shortage > 0
Because pandas aligns data on the index, and both df_inv and the Series created above are indexed by 'sku', these calculations are done per sku. Boolean indexing then selects the skus that have enough inventory (increment allocated and decrement remaining) and those that do not (increment missed, allocate whatever is left, and set remaining to 0).
df_inv['reserved'] += n_items
df_inv.loc[enough_inv,'allocated'] += n_items
df_inv.loc[enough_inv,'remaining'] -= n_items
df_inv.loc[~enough_inv,'missed'] -= shortage
df_inv.loc[~enough_inv,'allocated'] += n_items + shortage
df_inv.loc[~enough_inv,'remaining'] = 0
print(df_inv)
Output:
     remaining  allocated  reserved  missed
sku
A1       815.0      185.0       185     0.0
A2       410.0      190.0       190     0.0
A3         0.0      180.0       200    20.0
B1       586.0      214.0       214     0.0
B2       289.0      211.0       211     0.0
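The same per-sku arithmetic can also be written without boolean masks. The sketch below uses a tiny deterministic data set instead of the random one, and the function name allocate is made up for illustration. It assumes, as in the sample data, that every order reserves one item; with mixed order sizes the loop may skip a large order and still fill a later small one, which per-sku totals cannot reproduce. It also no longer tracks which individual orders were missed.

```python
import numpy as np
import pandas as pd

inventory = pd.DataFrame(
    {'remaining': [3, 2]}, index=pd.Index(['A1', 'A2'], name='sku')
)
reservations = pd.DataFrame({
    'order_id': [0, 1, 2, 3, 4, 5],
    'sku': ['A1', 'A2', 'A2', 'A1', 'A2', 'A2'],
    'number_of_items_reserved': [1, 1, 1, 1, 1, 1],
})

def allocate(inventory, reservations):
    """Allocate reserved items per sku, capped by what is in stock."""
    requested = (
        reservations.groupby('sku')['number_of_items_reserved'].sum()
        .reindex(inventory.index, fill_value=0)   # skus without any orders
    )
    out = inventory.copy()
    out['reserved'] = requested
    # Both Series share the 'sku' index, so np.minimum pairs them up per sku.
    out['allocated'] = np.minimum(requested, inventory['remaining'])
    out['missed'] = requested - out['allocated']
    out['remaining'] = inventory['remaining'] - out['allocated']
    return out

result = allocate(inventory, reservations)
print(result)
```

Here A1 has 2 of its 3 items allocated with none missed, while A2 allocates its 2 items and misses the other 2 requests.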