find sum of the count for the latest snapshot in pandas

I have the DataFrame below and want to sum count per group, using only the latest snapshot of each id:
product  desc  id  month_year  count
car      ford  1   2019-01     20
car      ford  1   2019-02     20
car      ford  1   2019-04     40
car      ford  2   2019-04     30
car      ford  2   2019-04     30
car      ford  2   2019-04     60
I got as far as this and don't know what comes next:
df.groupby(["product", "desc"]). ?
The expected output is:
product  desc  count_overall
car      ford  100
That is, for id 1 take the last count when ordered by month_year, which is 40; similarly, for id 2 it is 60, which makes the total 100.

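IIUC, you need to group by id as well to get the last value of count for each id, then sum over product and desc:
# groupby(level=...).sum() replaces the older sum(level=...), which was removed in pandas 2.0
s = (df.groupby(["product", "desc", "id"])["count"].last()
       .groupby(level=[0, 1]).sum()
       .to_frame("count_overall").reset_index())
s
Out[171]:
  product  desc  count_overall
0     car  ford            100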
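For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the question's sample rows; the names last_per_id and count_overall are just illustrative, and it assumes the rows are already in chronological order within each id.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "product": ["car"] * 6,
        "desc": ["ford"] * 6,
        "id": [1, 1, 1, 2, 2, 2],
        "month_year": ["2019-01", "2019-02", "2019-04", "2019-04", "2019-04", "2019-04"],
        "count": [20, 20, 40, 30, 30, 60],
    }
)

# Last row per (product, desc, id), relying on the existing row order.
last_per_id = df.groupby(["product", "desc", "id"])["count"].last()

# Sum those last values per (product, desc).
count_overall = (
    last_per_id.groupby(level=["product", "desc"])
    .sum()
    .to_frame("count_overall")
    .reset_index()
)
print(count_overall)
```

Splitting the two steps makes it easy to inspect last_per_id and check which row was picked for each id before summing.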

You can also use drop_duplicates given the data is sorted by date already:
(df.drop_duplicates(['product','desc','id'], keep='last')
.groupby(['product','desc'])['count'].sum()
)
Output:
product desc
car ford 100
Name: count, dtype: int64
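If the data is not guaranteed to be sorted by date, one option is to sort first. The sketch below shuffles the question's rows on purpose and assumes month_year is a YYYY-MM string (so string order matches date order); the variable names are made up for the example. A stable sort is used so that rows sharing the same month keep their original relative order.

```python
import pandas as pd

# Sample rows from the question, deliberately out of date order.
df = pd.DataFrame(
    {
        "product": ["car"] * 6,
        "desc": ["ford"] * 6,
        "id": [2, 1, 2, 1, 1, 2],
        "month_year": ["2019-04", "2019-04", "2019-04", "2019-01", "2019-02", "2019-04"],
        "count": [30, 40, 30, 20, 20, 60],
    }
)

# mergesort is stable: ties on month_year stay in their original order.
latest = (
    df.sort_values("month_year", kind="mergesort")
    .drop_duplicates(["product", "desc", "id"], keep="last")
)

total = latest.groupby(["product", "desc"])["count"].sum()
print(total)
```

Without the sort, keep='last' would pick the 2019-02 row for id 1 in this shuffled frame and the total would come out as 80 instead of 100.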

IIUC, we can sort by date and then use groupby with agg to get the last occurrence of count.
First, convert month_year to a proper datetime:
import pandas as pd
df['month_year'] = pd.to_datetime(df['month_year'], format='%Y-%m')
# Stable sort by date, so rows within the same month keep their original order
new_df = df.sort_values("month_year", kind="mergesort").groupby(["product", "desc", "id"]).agg(
    date_max=("month_year", "max"), count=("count", "last")
)
print(new_df)
                  date_max  count
product desc id
car     ford 1  2019-04-01     40
             2  2019-04-01     60
From here, a simple sum of count over the first two index levels gives the total:
print(new_df.groupby(level=[0, 1])[["count"]].sum())
              count
product desc
car     ford    100

Related

Calculating moving sum (or SUM OVER) for the last X months, but with irregular number of rows

I want to compute a window function (like SUM() OVER()), but there are two catches:
The moving sum should cover the last 3 months, but the number of rows per month is not consistent: some months have 3 entries, others may have 2, 4, 5, etc.;
There is also a "group" column, and the moving sum should only add up amounts from the same group.
In summary, I have a table with the following structure:
id  date     group    amount
1   2022-01  group A  1100
2   2022-01  group D  2500
3   2022-02  group A  3000
4   2022-02  group B  1000
5   2022-02  group C  2500
6   2022-03  group A  2000
7   2022-04  group C  1000
8   2022-05  group A  1500
9   2022-05  group D  2000
10  2022-06  group B  1000
So, I want to add a moving sum column containing the sum of the amount for each group over the last 3 months. The sum should not reset every 3 months; for each row it should consider only that row plus earlier rows of the same group that fall within the 3-month window.
The end result should look like:
id  date     group    amount  moving_sum_three_months
1   2022-01  group A  1100    1100
2   2022-01  group D  2500    2500
3   2022-02  group A  3000    4100
4   2022-02  group B  1000    1000
5   2022-02  group C  2500    2500
6   2022-03  group A  2000    6100
7   2022-04  group C  1000    3500
8   2022-05  group A  1500    3500
9   2022-05  group D  2000    2000
10  2022-06  group B  1200    1200
The best example of how the sum works is row 8 (id 8):
It considers only rows 8 and 6, because they are the only ones that meet the criteria;
Rows 1 and 3 do not meet the criteria, because they fall outside the 3-month window ending at row 8's date;
All the other rows are not from group A, so they are also excluded from the sum.
Any ideas? Thanks in advance for the help!

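Use SUM() as a window function, partitioning the window by group, with the frame in RANGE mode. Set the frame to reach back from the current row using an INTERVAL, e.g.
SELECT *, SUM(amount) OVER w AS moving_sum_three_months
FROM t
WINDOW w AS (PARTITION BY "group" ORDER BY "date"
             RANGE BETWEEN INTERVAL '3 months' PRECEDING AND CURRENT ROW)
ORDER BY id
RANGE frame bounds are inclusive, so if "date" holds first-of-month dates, INTERVAL '3 months' also picks up the row exactly three months back. INTERVAL '2 months' limits the frame to the current month and the two before it, which is what the expected value for id 8 implies.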
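Since the parent question is about pandas, here is a rough pandas sketch of the same moving sum. It is not the answer's SQL, just one way to get the expected column under these assumptions: the window is the current month plus the two months before it (that is what reproduces 3500 for id 8), date is a YYYY-MM string, and id 10 uses the amount from the input table (1000). The helper column month_no and the _prev suffix are names invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": list(range(1, 11)),
        "date": ["2022-01", "2022-01", "2022-02", "2022-02", "2022-02",
                 "2022-03", "2022-04", "2022-05", "2022-05", "2022-06"],
        "group": ["group A", "group D", "group A", "group B", "group C",
                  "group A", "group C", "group A", "group D", "group B"],
        "amount": [1100, 2500, 3000, 1000, 2500, 2000, 1000, 1500, 2000, 1000],
    }
)

# Turn each month into a running integer so month differences are simple subtraction.
parsed = pd.to_datetime(df["date"], format="%Y-%m")
df["month_no"] = parsed.dt.year * 12 + parsed.dt.month

# Pair every row with every row of the same group (including itself).
pairs = df.merge(df, on="group", suffixes=("", "_prev"))

# Keep pairs where the second row lies in the 3-month window ending at the first row.
window_months = 3
in_window = (pairs["month_no_prev"] <= pairs["month_no"]) & (
    pairs["month_no_prev"] > pairs["month_no"] - window_months
)

# Sum the windowed amounts per original row and attach them by id.
moving = pairs[in_window].groupby("id")["amount_prev"].sum()
df["moving_sum_three_months"] = df["id"].map(moving)
print(df.drop(columns="month_no"))
```

The self-merge is quadratic within each group, so it suits small tables like this one; for large data a per-group rolling approach would likely be a better fit.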

Finding Max Price and displaying multiple columns SQL

I have a table that looks like this:
customer_id item price cost
1 Shoe 120 36
1 Bag 180 50
1 Shirt 30 9
2 Shoe 150 40
3 Shirt 30 9
4 Shoe 120 36
5 Shorts 65 14
I am trying to find the most expensive item each customer bought, along with that item's name and cost.
I'm able to do the first part:
SELECT customer_id, max(price)
FROM sales
GROUP BY customer_id;
Which gives me:
customer_id price
1 180
2 150
3 30
4 120
5 65
How do I get this output to also show the item and its cost? The output should look like this:
customer_id price item cost
1 180 Bag 50
2 150 Shoe 40
3 30 Shirt 9
4 120 Shoe 36
5 65 Shorts 14
I'm assuming it's a SELECT statement within a SELECT? I would appreciate the help, as I'm fairly new to SQL.

One method that usually has good performance is a correlated subquery:
select s.*
from sales s
where s.price = (select max(s2.price)
                 from sales s2
                 where s2.customer_id = s.customer_id
                );
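For comparison with the pandas question this page is attached to, the same "rows matching the per-group maximum" idea can be sketched in pandas. This is only an illustration built from the question's sample table; max_price and top_items are made-up names. Like the correlated subquery, it returns every row that ties for the maximum price of a customer.

```python
import pandas as pd

# Sample table from the question.
sales = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 3, 4, 5],
        "item": ["Shoe", "Bag", "Shirt", "Shoe", "Shirt", "Shoe", "Shorts"],
        "price": [120, 180, 30, 150, 30, 120, 65],
        "cost": [36, 50, 9, 40, 9, 36, 14],
    }
)

# Broadcast each customer's maximum price back onto that customer's rows.
max_price = sales.groupby("customer_id")["price"].transform("max")

# Keep only the rows whose price equals that maximum.
top_items = sales[sales["price"] == max_price].reset_index(drop=True)
print(top_items[["customer_id", "price", "item", "cost"]])
```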

sql query not returning the correct MTD cumulative sum

I have the following table "card_txns":
user_sign_month month months_since_cust country txn_amt
2018-01 2018-01 1 DE 100
2018-01 2018-02 1 DE 100
2018-01 2018-03 1 DE 100
2019-01 2019-01 1 IN 100
2019-02 2019-02 1 US 1,000
2019-03 2019-03 1 DE 1,000
2019-04 2019-04 1 US 1,000
I want to see the total and the cumulative sum by month for 2019, but the following query does not return that:
SELECT month AS Tx_MONTH,
SUM(txn_amt) AS Total_transactions_2019,
(SELECT SUM(txn_amt)
FROM card_txns AS b
WHERE a.month >= b.month
AND a.country = b.country) AS CUM_MTD_Total
FROM card_txns AS a AND substring(month, 1, 4) = '2019'
GROUP BY month
ORDER BY month
The output should look like this:
Tx_MONTH Total_transactions_2019 CUM_MTD_Total
2019-01 100 100
2019-02 100 1100
2019-03 100 2100
2019-04 100 3100
I want the cumulative sum by month sorted as shown above, so 2019-01 should appear first, and so on.

You would use window functions. Because the query also groups by month, the running total is a window sum over the monthly aggregate:
SELECT month AS Tx_MONTH,
       SUM(txn_amt) AS Total_transactions_2019,
       SUM(SUM(txn_amt)) OVER (ORDER BY month) AS MTD_Total
FROM card_txns ct
WHERE month LIKE '2019-%'
GROUP BY month
ORDER BY month;
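A pandas sketch of the same aggregate-then-accumulate pattern, in case it helps to see it next to the SQL. It assumes txn_amt has already been parsed to integers (the table shows values like 1,000 with a thousands separator) and that month is a YYYY-MM string; monthly is a name chosen for the example.

```python
import pandas as pd

# Sample rows from the question, with txn_amt as plain integers.
card_txns = pd.DataFrame(
    {
        "month": ["2018-01", "2018-02", "2018-03", "2019-01", "2019-02", "2019-03", "2019-04"],
        "country": ["DE", "DE", "DE", "IN", "US", "DE", "US"],
        "txn_amt": [100, 100, 100, 100, 1000, 1000, 1000],
    }
)

# Filter to 2019, then total per month (groupby sorts the months).
monthly = (
    card_txns[card_txns["month"].str.startswith("2019-")]
    .groupby("month", as_index=False)["txn_amt"]
    .sum()
    .rename(columns={"month": "Tx_MONTH", "txn_amt": "Total_transactions_2019"})
)

# Running total over the monthly totals, in month order.
monthly["MTD_Total"] = monthly["Total_transactions_2019"].cumsum()
print(monthly)
```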

SSAS MDX calculation

I need a calculated measure (SSAS MD) to calculate the percentage of count values.
I tried the expression below but did not get the desired output. Let me know if I am missing anything. I want to calculate each age's percentage of its car's total:
( [DimCar].[Car], [DimAge].[Age], [Measure].[Count])/
sum([DimCar].[Car].[All].children), ([DimAge].[Age].[All], [Meaures].[Count])}*100
Below are the sample data values in the cube:
Car    Age  Count
-----  ---  -----
Benz   1    2
Camry       37
Honda  1    18
Honda  6    10
Expected output:
Car    Age  Count  Percent  TotalCount
-----  ---  -----  -------  ----------
Benz   1    2      100%     2
Camry       37     100%     37
Honda  1    18     64.28%   28
Honda  6    10     35.71%   28
Formula to calculate the percentage:
18/28*100 = 64.28%
10/28*100 = 35.71%

Instead of the "Age" attribute I used a "Model" dimension.
The code below produces exactly the expected output:
with member [Measures].[Total Sales Count] as
    iif(isempty([Measures].[Sales]), null, sum([Model].[Modelname].[All].children, [Measures].[Sales]))
member [Measures].[Total Sales%] as
    ([Measures].[Sales] / [Measures].[Total Sales Count]), FORMAT_STRING = "Percent"
select {[Measures].[Sales], [Measures].[Total Sales Count], [Measures].[Total Sales%]} on 0,
    non empty {[Car].[Carname].[Carname] * [Model].[Modelname].[Modelname]} on 1
from [Cube]
Output:
Car    Model      Sales  Total Sales Count  Total Sales%
Benz   New Model  2      2                  100.00%
Camry  Old Model  37     37                 100.00%
Honda  New Model  18     28                 64.29%
Honda  Top Model  10     28                 35.71%

My understanding is that for a particular car, for example Honda, you want to divide by the total for all Hondas irrespective of their Age, in this case 28. So for the Age 6 Honda you use 10/28, whereas for Benz, since all Benz rows are Age 1, you divide by 2.
Use the following code:
Round(
    (
        ([DimCar].[Car].currentmember, [DimAge].[Age].currentmember, [Measure].[Count])
        /
        ([DimCar].[Car].currentmember, root([DimAge]), [Measure].[Count])
    ) * 100
, 2)
Below is a similar example on Adventure Works:
with member measures.t as
    (
        ([Product].[Category].currentmember, [Delivery Date].[Calendar Year].currentmember, [Measures].[Internet Order Quantity])
        /
        ([Product].[Category].currentmember, root([Delivery Date]), [Measures].[Internet Order Quantity])
    ) * 100
select {[Measures].[Internet Order Quantity], measures.t} on columns,
    non empty ([Product].[Category].[Category], [Delivery Date].[Calendar Year].[Calendar Year]) on rows
from [Adventure Works]
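The arithmetic behind both answers (each row's count divided by the total for its car) can be checked outside the cube with a few lines of pandas. This is only a sanity check of the numbers, not MDX; the frame mirrors the question's sample data, with Camry's missing Age left empty, and cars is an illustrative name.

```python
import pandas as pd

# Sample cube values from the question; Camry has no Age in the sample.
cars = pd.DataFrame(
    {
        "Car": ["Benz", "Camry", "Honda", "Honda"],
        "Age": [1, None, 1, 6],
        "Count": [2, 37, 18, 10],
    }
)

# Total count per car, repeated on every row of that car.
cars["TotalCount"] = cars.groupby("Car")["Count"].transform("sum")

# Share of the car total, as a percentage rounded to two decimals.
cars["Percent"] = (cars["Count"] / cars["TotalCount"] * 100).round(2)
print(cars)
```

Rounding gives 64.29 for the Age 1 Honda, matching the first answer's output; the 64.28 in the question looks like a truncated value.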

Calculating Cumulative Totals and Date that a Certain Threshold is Reached in SQL

I have a simple table with Person, Date and Quantity:
Person Date Qty
Jim 08/01/16 1
Jim 08/02/16 3
Jim 08/03/16 2
Jim 08/04/16 1
Jim 08/05/16 1
Jim 08/06/16 6
Sheila 08/01/16 1
Sheila 08/02/16 1
Sheila 08/03/16 1
Sheila 08/04/16 1
Sheila 08/05/16 1
Sheila 08/06/16 1
I'd like to calculate two columns: Cumulative Total and Percentage of Total, resulting in the following table:
Person Date Qty cum tot pct of tot
Jim 08/01/16 1 1 7%
Jim 08/02/16 3 4 29%
Jim 08/03/16 2 6 43%
Jim 08/04/16 1 7 50%
Jim 08/05/16 1 8 57%
Jim 08/06/16 6 14 100%
Sheila 08/01/16 1 1 17%
Sheila 08/02/16 1 2 33%
Sheila 08/03/16 1 3 50%
Sheila 08/04/16 1 4 67%
Sheila 08/05/16 1 5 83%
Sheila 08/06/16 1 6 100%
And with this dataset, I would like to identify the date for each person where their pct of tot reaches 50% (or any other percentage I supply).
So the output for the 50% threshold would be:
Jim 08/04/16
Sheila 08/03/16
Any suggestions on how I can calculate the two columns and determine the appropriate dates?

You can use the ANSI standard cumulative sum function to calculate the cumulative sum. The rest is really just arithmetic:
select t.*
from (select t.*,
             sum(qty) over (partition by person order by date) as running_qty,
             sum(qty) over (partition by person) as tot_qty,
             (sum(qty) over (partition by person order by date) * 1.0 /
              sum(qty) over (partition by person)
             ) as running_percent
      from sales t
     ) t
where running_percent >= 0.5 and
      running_percent - (qty * 1.0 / tot_qty) < 0.5;
The reason the where clause has two conditions is to return a single row. The first will return all rows greater than or equal to 0.5, but you only want the first -- where the percent crosses the threshold.
The * 1.0 is because some databases do integer division.
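The same running total, running share, and first-crossing logic can be sketched in pandas. Assumptions here: the dates in the question are MM/DD/YY (so 08/01/16 is 1 August 2016), and the column names cum_tot and pct_of_tot plus the threshold variable are chosen for the example. Instead of the two-condition WHERE clause, this takes the first qualifying row per person after sorting by date.

```python
import pandas as pd

# Six consecutive days per person, as in the question's table.
dates = pd.date_range("2016-08-01", periods=6, freq="D")
sales = pd.DataFrame(
    {
        "Person": ["Jim"] * 6 + ["Sheila"] * 6,
        "Date": list(dates) * 2,
        "Qty": [1, 3, 2, 1, 1, 6] + [1] * 6,
    }
)

# Running total and share of each person's overall total, in date order.
sales = sales.sort_values(["Person", "Date"])
sales["cum_tot"] = sales.groupby("Person")["Qty"].cumsum()
sales["pct_of_tot"] = sales["cum_tot"] / sales.groupby("Person")["Qty"].transform("sum")

# First date per person on which the running share reaches the threshold.
threshold = 0.5
first_hit = sales[sales["pct_of_tot"] >= threshold].groupby("Person")["Date"].first()
print(first_hit)
```

Changing threshold to another fraction (for example 0.8) gives the crossing date for that percentage instead.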
