How to implement this formula into pandas dataframe's column? - pandas

How to implement a 3-day average Sale % formula in a pandas DataFrame column?
I have a dataframe:
No  Sale  3 Day Average Sale %
1   4786
2   7546
3   2578
4   6974  ((No4 - ((No3 + No2 + No1) / 3)) / ((No3 + No2 + No1) / 3)) * 100

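Try rolling over 4 elements at a time and applying a custom function. Pass raw=True so that x is a NumPy array and x[3] is a positional lookup rather than an index-label lookup:
def average_sale_percent(x):
    # x holds the three previous rows followed by the current row
    three_day_avg = sum(x[:3]) / 3
    return ((x[3] - three_day_avg) / three_day_avg) * 100

df.Sale.rolling(4).apply(average_sale_percent, raw=True)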
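For a quick check against the sample data in the question, here is a minimal self-contained sketch. The DataFrame construction and the name of the result column are assumptions taken from the table in the question; raw=True is used so that each window arrives as a NumPy array and positional indexing is safe.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"No": [1, 2, 3, 4], "Sale": [4786, 7546, 2578, 6974]})


def average_sale_percent(window):
    # window is a NumPy array of 4 values: three previous days, then today
    three_day_avg = window[:3].mean()
    return (window[3] - three_day_avg) / three_day_avg * 100


# The first three rows have no complete window, so they stay NaN
df["3 Day Average Sale %"] = (
    df["Sale"].rolling(4).apply(average_sale_percent, raw=True)
)
print(df)
```

For row 4 the three-day average is (4786 + 7546 + 2578) / 3 = 4970, so the result is (6974 - 4970) / 4970 * 100, roughly 40.3.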

Related

How to apply aggregate addition/arithmetic and multiplication/product based on condition in one column?

Let's assume we have this 'table' here:
Time_Order  Logic     Number  Accumulated
1           Add       20      20
2           Add       30      50
3           Add       50      100
4           Multiply  0.8     80
5           Multiply  0.5     40
6           Add       10      50
Accumulated is the result of adding or multiplying based on all the previous records. At Time_Order 3 the accumulated value is 20 + 30 + 50 = 100. At Time_Order 4 I want to multiply by 0.8, giving 100 * 0.8 = 80. At Time_Order 5 I multiply the 80 by 0.5 and get 40. At Time_Order 6 I go back to Add and get 40 + 10 = 50.
I have something like:
select a.*,
       case when Logic = 'Add' then sum(Number) over (order by Time_Order)
            when Logic = 'Multiply' then exp(sum(ln(Accumulated * (1 + Number))))
       end as Accumulated
from table a
The above won't work because 'Accumulated' refers to itself in the Multiply branch, and that is exactly the problem. With a conditional like this, how can I switch back and forth between 'Add' and 'Multiply' so that each row remembers the accumulated value from the previous row?
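No answer is included for this question. Purely to illustrate the row-by-row dependency, here is a minimal Python sketch using pandas rather than SQL. The helper name accumulate and the starting value of 0 are assumptions, and the Multiply step follows the worked description (multiply by Number) rather than the (1 + Number) term in the attempted query.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Time_Order": [1, 2, 3, 4, 5, 6],
    "Logic": ["Add", "Add", "Add", "Multiply", "Multiply", "Add"],
    "Number": [20, 30, 50, 0.8, 0.5, 10],
})


def accumulate(logic, number, start=0.0):
    # Hypothetical helper: carry one running total forward and apply
    # each row's operation to it in order.
    total = start
    results = []
    for op, value in zip(logic, number):
        if op == "Add":
            total = total + value
        else:  # "Multiply"
            total = total * value
        results.append(total)
    return results


df = df.sort_values("Time_Order")
df["Accumulated"] = accumulate(df["Logic"], df["Number"])
print(df)
```

Each row needs the previous row's result, which is why a plain windowed sum cannot express it; an explicit loop (or a recursive query on the SQL side) carries the state forward.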

pandas pivot table how to rearrange columns

I have a pandas df which I am looking to build a pivot table with.
Here is a sample table
Name  Week  Category  Amount
ABC   1     Clothing  50
ABC   1     Food      10
ABC   1     Food      10
ABC   1     Auto      20
DEF   1     Food      10
DEF   1     Services  20
The pivot table I am looking to create sums the amounts per Name, per Week, per Category.
Essentially, I want to end up with a table as follows:
Name  Week  Clothing  Food  Auto  Services  Total
ABC   1     50        20    20    0         90
DEF   1     0         10    0     20        30
If a user has no value for a category in a particular week, I take it as 0,
and the total is the row sum.
I tried some of the options mentioned at https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html but couldn't get it to work. Any thoughts on how I can achieve this? I used
df.pivot_table(values=['Amount'], index=['Name','Week','Category'], aggfunc=[np.sum]) followed by df.unstack(), but that did not yield the desired result because both Week and Category got unstacked.
Thanks!
# margins=True adds the 'Total' column, and also appends a 'Total' row at the bottom
df_pvt = pd.pivot_table(df, values='Amount', index=['Name', 'Week'], columns='Category',
                        aggfunc='sum', margins=True, margins_name='Total', fill_value=0)
df_pvt.columns.name = None
df_pvt = df_pvt.reset_index()
Let us try crosstab:
out = pd.crosstab(index=[df['Name'], df['Week']],
                  columns=df['Category'],
                  values=df['Amount'],
                  margins=True,
                  aggfunc='sum').fillna(0).iloc[:-1].reset_index()
Category Name  Week  Auto  Clothing  Food  Services  All
0         ABC     1  20.0      50.0  20.0       0.0   90
1         DEF     1   0.0       0.0  10.0      20.0   30
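Both outputs above list the categories alphabetically, while the question asks for Clothing, Food, Auto, Services. One way to get that order is to reindex the columns after pivoting. This is a minimal sketch, assuming the desired order is known up front; the list name wanted and the explicit Total column are illustrative choices, not part of either answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["ABC", "ABC", "ABC", "ABC", "DEF", "DEF"],
    "Week": [1, 1, 1, 1, 1, 1],
    "Category": ["Clothing", "Food", "Food", "Auto", "Food", "Services"],
    "Amount": [50, 10, 10, 20, 10, 20],
})

# Assumed column order, taken from the desired table in the question
wanted = ["Clothing", "Food", "Auto", "Services"]

out = (
    df.pivot_table(values="Amount", index=["Name", "Week"],
                   columns="Category", aggfunc="sum", fill_value=0)
      .reindex(columns=wanted, fill_value=0)  # rearrange the category columns
)
out["Total"] = out.sum(axis=1)  # row sum, computed after reordering
out.columns.name = None
out = out.reset_index()
print(out)
```

Computing Total as a plain row sum avoids the extra margins row, so nothing has to be sliced off afterwards.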

Pandas - Converting columns in percentage based on first columns value

There is a data frame with totals and counts:
df = pd.DataFrame({
    'categorie': ['a', 'b', 'c'],
    'total': [100, 1000, 500],
    'x': [10, 100, 5],
    'y': [100, 1000, 500]
})
categorie  total  x    y
a          100    10   100
b          1000   100  1000
c          500    5    500
I would like to convert the count columns into percentages of the totals:
categorie  total  x%  y%
a          100    10  100
b          1000   10  100
c          500    1   100
The following works for a single series:
(100 * df['x'] / df['total']).round(1)
How can I apply this to all the count columns in the data frame?
Try the div(), mul() and astype() methods:
df[['x%','y%']]=df[['x','y']].div(df['total'],axis=0).mul(100).astype(int)
Output of df:
  categorie  total    x     y  x%   y%
0         a    100   10   100  10  100
1         b   1000  100  1000  10  100
2         c    500    5   500   1  100
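If the count columns are not known by name, they can be selected as everything except the label and total columns. This is a minimal sketch, assuming that every column other than 'categorie' and 'total' is a count; the variable names count_cols and pct are illustrative. It rounds to one decimal, as in the question, instead of casting to int.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "categorie": ["a", "b", "c"],
    "total": [100, 1000, 500],
    "x": [10, 100, 5],
    "y": [100, 1000, 500],
})

# Assumption: every column other than these two holds counts
count_cols = df.columns.difference(["categorie", "total"])

# Divide each count column by 'total' row by row, then scale to percent
pct = df[count_cols].div(df["total"], axis=0).mul(100).round(1)

# Attach the results as new columns named 'x%', 'y%', ...
df = df.join(pct.add_suffix("%"))
print(df)
```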

Average difference between values SQL

I'm trying to find the average difference between consecutive values using SQL, where each value is always larger than the previous one.
Example Data:
Car_ID | Trip_ID | Mileage
1      | 1       | 10,000
1      | 2       | 11,000
1      | 3       | 11,500
2      | 1       | 5,000
2      | 2       | 7,000
2      | 3       | 8,000
Expected Calculation:
Car_ID: 1
(Trip 2 - Trip 1) = 1,000
(Trip 3 - Trip 2) = 500
Average Difference: 750
Car_ID: 2
(Trip 2 - Trip 1) = 2,000
(Trip 3 - Trip 2) = 1,000
Average Difference: 1,500
Expected Output:
Car_ID | Average_Difference
1      | 750
2      | 1,500
You can use aggregation:
select car_id,
       (max(mileage) - min(mileage)) / nullif(count(*) - 1, 0) as average_difference
from t
group by car_id;
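That is, the average as you have defined it is the maximum minus the minimum, divided by one less than the number of trips. The consecutive differences telescope: every intermediate mileage cancels out, leaving only the last value minus the first.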
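To see that the shortcut matches the step-by-step calculation, here is a small check in pandas rather than SQL. It is a sketch on the sample data only; the variable names avg_of_diffs and shortcut are illustrative, and it relies on mileage never decreasing within a car, as stated in the question.

```python
import pandas as pd

# Sample data copied from the question
trips = pd.DataFrame({
    "Car_ID": [1, 1, 1, 2, 2, 2],
    "Trip_ID": [1, 2, 3, 1, 2, 3],
    "Mileage": [10000, 11000, 11500, 5000, 7000, 8000],
}).sort_values(["Car_ID", "Trip_ID"])

# Step-by-step: difference between consecutive trips, then the mean per car
avg_of_diffs = trips.groupby("Car_ID")["Mileage"].apply(lambda s: s.diff().mean())

# Shortcut from the answer: (max - min) / (number of trips - 1)
grouped = trips.groupby("Car_ID")["Mileage"]
shortcut = (grouped.max() - grouped.min()) / (grouped.count() - 1)

print(avg_of_diffs)
print(shortcut)
```

Both series give 750 for car 1 and 1500 for car 2.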

How to split numbers in pandas column into deciles?

I have a column in a pandas dataset of random values ranging between 100 and 500.
I need to create a new column 'deciles' out of it, like a ranking with a total of 20 deciles. I need to assign a rank number out of 20 based on the value.
10 to 20 is the first decile, number 1
20 to 30 is the second decile, number 2
x = np.random.randint(100,501,size=(1000)) # column of 1000 rows with values ranging btw 100, 500.
df['credit_score'] = x
df['credit_decile_rank'] = df['credit_score'].map( lambda x: int(x/20) )
df.head()
Use integer division by 10:
df = pd.DataFrame({
    'credit_score': [4, 15, 24, 55, 77, 81],
})
df['credit_decile_rank'] = df['credit_score'] // 10
print(df)
   credit_score  credit_decile_rank
0             4                   0
1            15                   1
2            24                   2
3            55                   5
4            77                   7
5            81                   8
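For scores between 100 and 500 and a rank out of 20, one possible reading is 20 equal-width bins of width 20. This is a minimal sketch of that reading using pd.cut; the bin edges, the helper name rank_scores and the choice to close each bin on the right are all assumptions, since the question does not pin them down.

```python
import numpy as np
import pandas as pd


def rank_scores(scores, low=100, high=500, n_bins=20):
    # Hypothetical helper: n_bins + 1 evenly spaced edges give n_bins bins
    edges = np.linspace(low, high, n_bins + 1)
    labels = list(range(1, n_bins + 1))
    # include_lowest=True puts the value `low` itself into bin 1
    ranks = pd.cut(scores, bins=edges, labels=labels, include_lowest=True)
    return ranks.astype(int)


rng = np.random.default_rng(0)
df = pd.DataFrame({"credit_score": rng.integers(100, 501, size=1000)})
df["credit_rank"] = rank_scores(df["credit_score"])
print(df.head())
```

With these edges, scores from 100 to 120 fall in rank 1, 121 to 140 in rank 2, and so on up to rank 20. If bins with equal numbers of rows are wanted instead, pd.qcut is the usual alternative.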