How to implement a 3-day average sale % formula in a pandas DataFrame column
I have a dataframe:
No  Sale  3 Day Average Sale %
1   4786
2   7546
3   2578
4   6974  ((No4 - ((No3 + No2 + No1) / 3)) / ((No3 + No2 + No1) / 3)) * 100
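Try rolling over 4 elements at a time and applying a custom function:
def average_sale_percent(x):
    # x holds 4 consecutive sales: the 3 previous days, then the current day
    three_day_avg = x[:3].sum() / 3
    return ((x[3] - three_day_avg) / three_day_avg) * 100
# raw=True passes each window as a NumPy array, so x[3] is always the last element
df['3 Day Average Sale %'] = df['Sale'].rolling(4).apply(average_sale_percent, raw=True)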
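The same percentage can probably also be computed without a custom function, by shifting the column one row and taking a 3-row rolling mean. This is a minimal sketch that assumes the column names from the question; `prev_avg` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"No": [1, 2, 3, 4], "Sale": [4786, 7546, 2578, 6974]})

# Mean of the three rows *before* the current one (prev_avg is an illustrative name)
prev_avg = df["Sale"].shift(1).rolling(3).mean()

# Percentage difference between the current sale and that mean;
# the first three rows stay NaN because they lack three previous days
df["3 Day Average Sale %"] = (df["Sale"] - prev_avg) / prev_avg * 100
print(df)
```

For row 4 the previous three sales average 4970, so the result is (6974 - 4970) / 4970 * 100, roughly 40.3.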
Related
Let's assume we have this 'table' here:
Time_Order  Logic     Number  Accumulated
1           Add       20      20
2           Add       30      50
3           Add       50      100
4           Multiply  0.8     80
5           Multiply  0.5     40
6           Add       10      50
Accumulated is the result of adding or multiplying based on all the previous records. At Time_Order 3 the accumulated value is 20 + 30 + 50 = 100. At Time_Order 4 I want to multiply by 0.8, so I get 100 * 0.8 = 80. At Time_Order 5 I multiply the 80 by 0.5 and get 40. At Time_Order 6 I go back to Add and get 40 + 10 = 50.
I have something like:
select a.*,
       case when Logic = 'Add' then sum(Number) over (order by Time_Order)
            when Logic = 'Multiply' then exp(sum(ln(Accumulated * (1 + Number))))
       end as Accumulated
from table a
The above won't work because the Multiply branch refers to 'Accumulated' inside its own definition. This is the exact problem: when faced with a conditional like this, how can I switch back and forth between 'Add' and 'Multiply' so that the accumulated value from the previous row is carried forward?
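To make the recurrence concrete, here is a minimal Python sketch (not a SQL solution) of the row-by-row logic described above. Each row's result depends on the previous row's result, which is why a single window aggregate cannot express it. The names `rows` and `step` are introduced here for illustration.

```python
from itertools import accumulate

# (Logic, Number) pairs from the table, in Time_Order
rows = [
    ("Add", 20),
    ("Add", 30),
    ("Add", 50),
    ("Multiply", 0.8),
    ("Multiply", 0.5),
    ("Add", 10),
]

def step(acc, row):
    """Apply one row to the running value carried over from the previous row."""
    logic, number = row
    return acc + number if logic == "Add" else acc * number

# Start from 0 and drop the initial value so there is one result per row
accumulated = list(accumulate(rows, step, initial=0))[1:]
print(accumulated)
```

In SQL, this kind of carried state is typically expressed with a recursive query rather than a window function, since the running value has to be fed back into the next row.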
I have a pandas DataFrame from which I want to build a pivot table.
Here is a sample table:
Name Week Category Amount
ABC 1 Clothing 50
ABC 1 Food 10
ABC 1 Food 10
ABC 1 Auto 20
DEF 1 Food 10
DEF 1 Services 20
The pivot table I am looking to create is to sum up the amounts per Name, per week per category.
Essentially, I am looking to land up with a table as follows:
Name Week Clothing Food Auto Services Total
ABC 1 50 20 20 0 90
DEF 1 0 10 0 20 30
If a user has no category value in a particular week, I take it as 0
And the total is the row sum.
I tried some of the options mentioned at https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html but couldn't get it to work. Any thoughts on how I can achieve this? I used
df.pivot_table(values=['Amount'], index=['Name','Week','Category'], aggfunc=[np.sum]) followed by df.unstack(), but that did not yield the desired result because both Week and Category got unstacked.
Thanks!
df_pvt = pd.pivot_table(df, values='Amount', index=['Name', 'Week'], columns='Category',
                        aggfunc='sum', margins=True, margins_name='Total', fill_value=0)
df_pvt = df_pvt.iloc[:-1]  # margins=True also appends a 'Total' row; drop it
df_pvt.columns.name = None
df_pvt = df_pvt.reset_index()
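Since the question defines the total as the row sum, one alternative is to skip `margins` and add the Total column explicitly. This is a minimal sketch on the sample data from the question; `wide` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["ABC", "ABC", "ABC", "ABC", "DEF", "DEF"],
    "Week": [1, 1, 1, 1, 1, 1],
    "Category": ["Clothing", "Food", "Food", "Auto", "Food", "Services"],
    "Amount": [50, 10, 10, 20, 10, 20],
})

# One row per (Name, Week), one column per Category; missing combinations become 0
wide = df.pivot_table(values="Amount", index=["Name", "Week"],
                      columns="Category", aggfunc="sum", fill_value=0)

# Total is the sum across the category columns of each row
wide["Total"] = wide.sum(axis=1)

# Flatten back to ordinary columns
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

Computing the total this way avoids the extra summary row that `margins=True` appends. The category columns come out in alphabetical order, so reorder them afterwards if the order in the question matters.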
Let us try crosstab:
out = pd.crosstab(index=[df['Name'], df['Week']],
                  columns=df['Category'],
                  values=df['Amount'],
                  margins=True,
                  aggfunc='sum').fillna(0).iloc[:-1].reset_index()
Category Name Week Auto Clothing Food Services All
0 ABC 1 20.0 50.0 20.0 0.0 90
1 DEF 1 0.0 0.0 10.0 20.0 30
There is a data frame with totals and counts:
df = pd.DataFrame({
    'categorie': ['a', 'b', 'c'],
    'total': [100, 1000, 500],
    'x': [10, 100, 5],
    'y': [100, 1000, 500]
})
categorie  total  x    y
a          100    10   100
b          1000   100  1000
c          500    5    500
I would like to convert the count columns into percentages of the totals:
categorie  total  x%  y%
a          100    10  100
b          1000   10  100
c          500    1   100
The following works for a single series:
(100 * df['x'] / df['total']).round(1)
How can I apply this to all the count columns in the data frame?
Try the div(), mul() and astype() methods:
df[['x%','y%']]=df[['x','y']].div(df['total'],axis=0).mul(100).astype(int)
Output of df:
categorie total x y x% y%
0 a 100 10 100 10 100
1 b 1000 100 1000 10 100
2 c 500 5 500 1 100
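If there are many count columns, the list can be derived instead of typed out. The sketch below assumes that every column other than 'categorie' and 'total' is a count; it rounds to one decimal, as in the question's own expression, rather than truncating with astype(int). The names `count_cols`, `pct` and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "categorie": ["a", "b", "c"],
    "total": [100, 1000, 500],
    "x": [10, 100, 5],
    "y": [100, 1000, 500],
})

# Assumption: everything except these two columns is a count column
count_cols = df.columns.difference(["categorie", "total"])

# Divide each count column by 'total' row-wise, scale to percent, round, rename
pct = df[count_cols].div(df["total"], axis=0).mul(100).round(1).add_suffix("%")

# Keep the identifying columns and attach the percentage columns
result = pd.concat([df[["categorie", "total"]], pct], axis=1)
print(result)
```

Rounding before any integer conversion matters because floating-point results such as 28.999999999999996 would otherwise be truncated to 28.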
I'm trying to find the difference between values using SQL where the second value is always larger than the previous value.
Example Data:
Car_ID | Trip_ID | Mileage
1 1 10,000
1 2 11,000
1 3 11,500
2 1 5,000
2 2 7,000
2 3 8,000
Expected Calculation:
Car_ID: 1
(Trip 2 - Trip 1) = 1,000
(Trip 3 - Trip 2) = 500
Average Difference: 750
Car_ID: 2
(Trip 2 - Trip 1) = 2,000
(Trip 3 - Trip 2) = 1,000
Average Difference: 1,500
Expected Output:
Car_ID | Average_Difference
1 750
2 1,500
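You can use aggregation:
select car_id,
       (max(mileage) - min(mileage)) / nullif(count(*) - 1, 0) as average_difference
from t
group by car_id;
That is, because mileage only increases, the consecutive differences sum to the maximum minus the minimum, so the average as you have defined it is the maximum minus the minimum divided by one less than the number of trips.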
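As a quick check of that identity, here is a small pandas sketch on the example data that computes the average both ways: from the explicit trip-to-trip differences and from the max/min shortcut. The names `trips`, `avg_diff` and `shortcut` are introduced here for illustration.

```python
import pandas as pd

# Example data from the question
trips = pd.DataFrame({
    "Car_ID": [1, 1, 1, 2, 2, 2],
    "Trip_ID": [1, 2, 3, 1, 2, 3],
    "Mileage": [10000, 11000, 11500, 5000, 7000, 8000],
}).sort_values(["Car_ID", "Trip_ID"])

# Explicit version: difference to the previous trip of the same car, then the mean
# (the first trip of each car has no previous trip, so its NaN is skipped by mean)
diffs = trips.groupby("Car_ID")["Mileage"].diff()
avg_diff = diffs.groupby(trips["Car_ID"]).mean()

# Shortcut version: (max - min) / (number of trips - 1)
grouped = trips.groupby("Car_ID")["Mileage"]
shortcut = (grouped.max() - grouped.min()) / (grouped.count() - 1)

print(avg_diff)
print(shortcut)
```

Both give 750 for car 1 and 1500 for car 2. The shortcut relies on mileage never decreasing within a car.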
I have a column in a pandas dataset of random values ranging between 100 and 500.
I need to create a new column 'deciles' from it, like a ranking, with a total of 20 deciles. I need to assign a rank number out of 20 based on the value.
10 to 20 - is the first decile, number 1
20 to 30 - is the second decile, number 2
x = np.random.randint(100, 501, size=1000)  # 1000 values ranging between 100 and 500
df['credit_score'] = x
df['credit_decile_rank'] = df['credit_score'].map(lambda x: int(x / 20))
df.head()
Use integer division by 10:
df = pd.DataFrame({
    'credit_score': [4, 15, 24, 55, 77, 81],
})
df['credit_decile_rank'] = df['credit_score'] // 10
print(df)
credit_score credit_decile_rank
0 4 0
1 15 1
2 24 2
3 55 5
4 77 7
5 81 8
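For the 100 to 500 range in the question, one possible reading is 20 equal-width buckets of 20 points each, ranked 1 to 20. The sketch below uses pd.cut under that assumption; the bucket edges and the sample scores are chosen here for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative scores covering the edges of the assumed 100..500 range
df = pd.DataFrame({"credit_score": [100, 120, 121, 300, 500]})

# 21 edges (100, 120, ..., 500) define 20 buckets of width 20
edges = np.arange(100, 501, 20)

# Buckets are right-closed, e.g. (100, 120]; include_lowest puts 100 in bucket 1
df["credit_rank"] = pd.cut(
    df["credit_score"],
    bins=edges,
    labels=list(range(1, 21)),
    include_lowest=True,
).astype(int)
print(df)
```

With these edges a score of 120 still falls in bucket 1 and 121 starts bucket 2. If the buckets should instead hold equal numbers of rows, pd.qcut is the usual alternative.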