I have the df below and want to calculate the sum per group, taking only the last snapshot of each id:
product desc id month_year count
car ford 1 2019-01 20
car ford 1 2019-02 20
car ford 1 2019-04 40
car ford 2 2019-04 30
car ford 2 2019-04 30
car ford 2 2019-04 60
and find output as
df.groupby(["product", "desc"]. ?
product desc count_overall
car ford 100
That is, for id 1 take the last count when ordered by month_year, which is 40, and similarly for id 2 it is 60, which makes the total 100.
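IIUC, you need to group by id as well to get the last value of count (sum(level=...) was removed in pandas 2.0, so group by the index levels instead):
s = df.groupby(["product", "desc", "id"])['count'].last().groupby(level=[0, 1]).sum().to_frame('count_overall').reset_index()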
Out[171]:
product desc count_overall
0 car ford 100
You can also use drop_duplicates given the data is sorted by date already:
(df.drop_duplicates(['product','desc','id'], keep='last')
.groupby(['product','desc'])['count'].sum()
)
Output:
product desc
car ford 100
Name: count, dtype: int64
IIUC,
we can use groupby with agg after sort_values to get the last occurrence of count.
First, convert the date into a proper datetime:
df['month_year'] = pd.to_datetime(df['month_year'], format='%Y-%m')
# sort by date (stable, so ties keep their original order) so that "last" is the latest snapshot
new_df = df.sort_values("month_year", kind="stable").groupby(["product", "desc", "id"]).agg(
date_max=("month_year", "max"), count=("count", "last")
)
print(new_df)
date_max count
product desc id
car ford 1 2019-04-01 40
2 2019-04-01 60
From here you can just do a simple sum of the count column:
print(new_df.groupby(level=[0,1])[["count"]].sum())
count
product desc
car ford 100
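All three answers follow the same two steps: keep the last row per (product, desc, id), then add those counts up per (product, desc). A minimal plain-Python sketch of that logic, assuming the rows are already in month_year order as in the question (the function name is made up for illustration):

```python
from collections import defaultdict

# Rows from the question: (product, desc, id, month_year, count),
# assumed to be sorted by month_year already.
rows = [
    ("car", "ford", 1, "2019-01", 20),
    ("car", "ford", 1, "2019-02", 20),
    ("car", "ford", 1, "2019-04", 40),
    ("car", "ford", 2, "2019-04", 30),
    ("car", "ford", 2, "2019-04", 30),
    ("car", "ford", 2, "2019-04", 60),
]


def sum_of_last_snapshots(rows):
    # Step 1: a later row overwrites an earlier one, so each
    # (product, desc, id) key ends up holding its last count.
    last_count = {}
    for product, desc, id_, _month_year, count in rows:
        last_count[(product, desc, id_)] = count

    # Step 2: add the surviving counts up per (product, desc).
    totals = defaultdict(int)
    for (product, desc, _id), count in last_count.items():
        totals[(product, desc)] += count
    return dict(totals)


print(sum_of_last_snapshots(rows))  # {('car', 'ford'): 100}
```

If the rows are not sorted by date, sort them first; otherwise "last" just means "last in file order".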
I want to do a window function (like the SUM() OVER() function), but there are two catches:
I want to consider the last 3 months in my moving sum, but the number of rows per month is not consistent. Some months have 3 entries, others may have 2, 4, 5, etc.;
There is also a "group" column, and the moving sum should only add up amounts from the same group.
In summary, I have a table with the following structure:
id  date     group    amount
1   2022-01  group A  1100
2   2022-01  group D  2500
3   2022-02  group A  3000
4   2022-02  group B  1000
5   2022-02  group C  2500
6   2022-03  group A  2000
7   2022-04  group C  1000
8   2022-05  group A  1500
9   2022-05  group D  2000
10  2022-06  group B  1000
So, I want to add a moving sum column containing the sum of the amount for each group over the last 3 months. The sum should not reset every 3 months; for each row it should consider only the rows from the preceding months inside that window, and only those of the same group.
The end result should look like:
id  date     group    amount  moving_sum_three_months
1   2022-01  group A  1100    1100
2   2022-01  group D  2500    2500
3   2022-02  group A  3000    4100
4   2022-02  group B  1000    1000
5   2022-02  group C  2500    2500
6   2022-03  group A  2000    6100
7   2022-04  group C  1000    3500
8   2022-05  group A  1500    3500
9   2022-05  group D  2000    2000
10  2022-06  group B  1200    1200
The best example of how the sum works is line 8.
It considers only lines 8 and 6 for the sum, because they are the only ones that meet the criteria;
Lines 1 and 3 do not meet the criteria, because they are too old relative to the date of line 8;
All the other lines are not from group A, so they are also excluded from the sum.
Any ideas? Thanks in advance for the help!
Use SUM() as a window function, partitioning the window by group and using RANGE mode. Set the frame to reach back 3 months before the current row using INTERVAL '3 months', e.g.
SELECT *, SUM(amount) OVER w AS moving_sum_three_months
FROM t
WINDOW w AS (PARTITION BY "group" ORDER BY "date"
RANGE BETWEEN INTERVAL '3 months' PRECEDING AND CURRENT ROW)
ORDER BY id
Demo: db<>fiddle
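The expected table in the question implies a window of the current month plus the two months before it: row 8 (2022-05) adds only row 6 (2022-03) and leaves out row 3 (2022-02). Below is a minimal plain-Python sketch of that per-group moving sum, which can be handy for checking a query's output against the table. The helper names are made up for illustration, and it uses the amounts from the input table. If "date" is stored as a first-of-month value, a frame of INTERVAL '3 months' PRECEDING would also include the row exactly three months back, so INTERVAL '2 months' may be what matches the table; that depends on how the column is stored and is an assumption here.

```python
# Rows from the question's input table: (id, "YYYY-MM", group, amount).
rows = [
    (1, "2022-01", "group A", 1100),
    (2, "2022-01", "group D", 2500),
    (3, "2022-02", "group A", 3000),
    (4, "2022-02", "group B", 1000),
    (5, "2022-02", "group C", 2500),
    (6, "2022-03", "group A", 2000),
    (7, "2022-04", "group C", 1000),
    (8, "2022-05", "group A", 1500),
    (9, "2022-05", "group D", 2000),
    (10, "2022-06", "group B", 1000),
]


def month_index(ym):
    # Turn "YYYY-MM" into a running month number so months can be subtracted.
    year, month = ym.split("-")
    return int(year) * 12 + int(month)


def moving_sums(rows, window_months=3):
    # For each row, add the amounts of rows in the same group whose month
    # is the current month or one of the (window_months - 1) months before it.
    result = {}
    for id_, ym, group, _amount in rows:
        current = month_index(ym)
        result[id_] = sum(
            amount
            for _id, other_ym, other_group, amount in rows
            if other_group == group
            and 0 <= current - month_index(other_ym) < window_months
        )
    return result


sums = moving_sums(rows)
print(sums[6], sums[7], sums[8])  # 6100 3500 3500
```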
I have a table that looks like this:
customer_id item price cost
1 Shoe 120 36
1 Bag 180 50
1 Shirt 30 9
2 Shoe 150 40
3 Shirt 30 9
4 Shoe 120 36
5 Shorts 65 14
I am trying to find the most expensive item each customer bought along with the cost of item and the item name.
I'm able to do the first part:
SELECT customer_id, max(price)
FROM sales
GROUP BY customer_id;
Which gives me:
customer_id price
1 180
2 150
3 30
4 120
5 65
How do I get this output to also show the item and its cost? The output should look like this:
customer_id price item cost
1 180 Bag 50
2 150 Shoe 40
3 30 Shirt 9
4 120 Shoe 36
5 65 Shorts 14
I'm assuming it's a SELECT statement within a SELECT? I would appreciate the help as I'm fairly new to SQL.
One method that usually has good performance is a correlated subquery:
select s.*
from sales s
where s.price = (select max(s2.price)
from sales s2
where s2.customer_id = s.customer_id
);
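To try the correlated subquery against the sample data, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The column types are assumptions, and the explicit column list and ORDER BY are added only to match the layout of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, item TEXT, price INTEGER, cost INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "Shoe", 120, 36),
        (1, "Bag", 180, 50),
        (1, "Shirt", 30, 9),
        (2, "Shoe", 150, 40),
        (3, "Shirt", 30, 9),
        (4, "Shoe", 120, 36),
        (5, "Shorts", 65, 14),
    ],
)

# For every row, the subquery looks up that customer's highest price;
# only rows whose price equals it are kept.
query = """
SELECT s.customer_id, s.price, s.item, s.cost
FROM sales s
WHERE s.price = (SELECT MAX(s2.price)
                 FROM sales s2
                 WHERE s2.customer_id = s.customer_id)
ORDER BY s.customer_id
"""
top_items = conn.execute(query).fetchall()
for row in top_items:
    print(row)
```

Note that if a customer bought two items at the same top price, this approach returns both rows.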
I have the following table "card_txns":
user_sign_month month months_since_cust country txn_amt
2018-01 2018-01 1 DE 100
2018-01 2018-02 1 DE 100
2018-01 2018-03 1 DE 100
2019-01 2019-01 1 IN 100
2019-02 2019-02 1 US 1,000
2019-03 2019-03 1 DE 1,000
2019-04 2019-04 1 US 1,000
I want to see the total and the cumulative sum of txn_amt by month for 2019, but the following query does not return that:
SELECT month AS Tx_MONTH,
SUM(txn_amt) AS Total_transactions_2019,
(SELECT SUM(txn_amt)
FROM card_txns AS b
WHERE a.month >= b.month
AND a.country = b.country) AS CUM_MTD_Total
FROM card_txns AS a AND substring(month, 1, 4) = '2019'
GROUP BY month
ORDER BY month
The output should look like this:
Tx_MONTH Total_transactions_2019 CUM_MTD_Total
2019-01 100 100
2019-02 100 1100
2019-03 100 2100
2019-04 100 3100
I want the cumulative sum by month sorted as shown above, so 2019-01 should appear first and so on.
You would use window functions. Because the query aggregates by month, the window sum has to run over the monthly totals:
SELECT month AS Tx_MONTH, SUM(txn_amt) AS Total_transactions_2019,
SUM(SUM(txn_amt)) OVER (ORDER BY month) AS MTD_Total
FROM card_txns ct
WHERE month LIKE '2019-%'
GROUP BY month
ORDER BY month;
Here is a db<>fiddle.
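A quick way to check the running total locally is Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). This is only a sketch: the column types are assumptions, and the amounts shown as 1,000 in the sample are entered as the integer 1000. Since the query groups by month, the window function is applied to the per-month aggregate, i.e. SUM(SUM(txn_amt)) OVER (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_txns (user_sign_month TEXT, month TEXT, "
    "months_since_cust INTEGER, country TEXT, txn_amt INTEGER)"
)
conn.executemany(
    "INSERT INTO card_txns VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-01", "2018-01", 1, "DE", 100),
        ("2018-01", "2018-02", 1, "DE", 100),
        ("2018-01", "2018-03", 1, "DE", 100),
        ("2019-01", "2019-01", 1, "IN", 100),
        ("2019-02", "2019-02", 1, "US", 1000),
        ("2019-03", "2019-03", 1, "DE", 1000),
        ("2019-04", "2019-04", 1, "US", 1000),
    ],
)

# The WHERE clause drops the 2018 rows before grouping, so the running
# total starts at the first 2019 month.
query = """
SELECT month AS tx_month,
       SUM(txn_amt) AS total_transactions_2019,
       SUM(SUM(txn_amt)) OVER (ORDER BY month) AS cum_total
FROM card_txns
WHERE month LIKE '2019-%'
GROUP BY month
ORDER BY month
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```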
I need a calculated measure(SSAS MD) to calculate the percentage of count values.
I have tried the expression below but did not get the desired output. Let me know if I am missing anything. I want to calculate, for each age, its percentage of the car's total count:
( [DimCar].[Car], [DimAge].[Age], [Measure].[Count])/
sum([DimCar].[Car].[All].children), ([DimAge].[Age].[All], [Meaures].[Count])}*100
Below are the sample data values in the cube:
Car Age Count
----- ----- -----
Benz 1 2
Camry 37
Honda 1 18
Honda 6 10
Expected output:
Car Age Count Percent TotalCount
----- ----- ----- ------ ----------
Benz 1 2 100% 2
Camry 37 100% 37
Honda 1 18 64.28% 28
Honda 6 10 35.71% 28
Formula to calculate the percentage:
18/28*100 =64.28%
10/28*100 =35.71%
Honda 1 18 64.28% 28
Honda 6 10 35.71% 28
with Member [Measures].[Total Sales Count]
as iif (isempty([Measures].[Sales]),NUll, sum([Model].[Modelname].[All].children ,[Measures].[Sales]))
Member [Measures].[Total Sales%]
as ([Measures].[Sales]/[Measures].[Total Sales Count]),FORMAT_STRING = "Percent"
select {[Measures].[Sales],[Measures].[Total Sales Count],[Measures].[Total Sales%]
}on 0
,non empty{[Car].[Carname].[Carname]*[Model].[Modelname].[Modelname]} on 1
from [Cube]
Output :
Car Model Sales Total Sales Count Total Sales%
Benz New Model 2 2 100.00%
Camry Old Model 37 37 100.00%
Honda New Model 18 28 64.29%
Honda Top Model 10 28 35.71%
Instead of the "Age" attribute I have used a "Model" dimension.
The code below produces exactly the expected output.
My understanding is that for a particular car, for example Honda, you want to divide by the total count for Honda irrespective of Age, in this case 28. So for Honda with Age 6 you use 10/28, whereas for Benz, since all Benz are Age 1, you divide by 2.
Use the following code:
Round(
(
( [DimCar].[Car].currentmember, [DimAge].[Age].currentmember, [Measure].[Count])
/
([DimCar].[Car].currentmember,root([DimAge]),[Measure].[Count])
)*100
,2)
Below is a similar example on Adventure Works:
with member
measures.t as
(
( [Product].[Category].currentmember, [Delivery Date].[Calendar Year].currentmember, [Measures].[Internet Order Quantity])
/
([Product].[Category].currentmember,root([Delivery Date]),[Measures].[Internet Order Quantity])
)*100
select {[Measures].[Internet Order Quantity],measures.t}
on columns ,
non empty
([Product].[Category].[Category],[Delivery Date].[Calendar Year].[Calendar Year])
on rows
from [Adventure Works]
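Both MDX answers divide each cell's count by the total for the same car across all ages (or models). The arithmetic can be sketched outside the cube in a few lines of plain Python; the variable names are made up for illustration, and Camry's missing age in the sample is represented as None.

```python
from collections import defaultdict

# Sample cube values from the question: (car, age, count).
# Camry has no age value in the sample, represented here as None.
cells = [
    ("Benz", 1, 2),
    ("Camry", None, 37),
    ("Honda", 1, 18),
    ("Honda", 6, 10),
]

# Denominator: total count per car, irrespective of age.
car_totals = defaultdict(int)
for car, _age, count in cells:
    car_totals[car] += count

# Each row: (car, age, count, percent of the car total, car total).
report = [
    (car, age, count, round(count / car_totals[car] * 100, 2), car_totals[car])
    for car, age, count in cells
]
for row in report:
    print(row)
```

Rounding to two decimals gives 64.29 for the Honda age 1 row; the 64.28 in the question is the same value truncated.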
I have a simple table with Person, Date and Quantity:
Person Date Qty
Jim 08/01/16 1
Jim 08/02/16 3
Jim 08/03/16 2
Jim 08/04/16 1
Jim 08/05/16 1
Jim 08/06/16 6
Sheila 08/01/16 1
Sheila 08/02/16 1
Sheila 08/03/16 1
Sheila 08/04/16 1
Sheila 08/05/16 1
Sheila 08/06/16 1
I'd like to calculate two columns: Cumulative Total and Percentage of Total, resulting in the following table:
Person Date Qty cum tot pct of tot
Jim 08/01/16 1 1 7%
Jim 08/02/16 3 4 29%
Jim 08/03/16 2 6 43%
Jim 08/04/16 1 7 50%
Jim 08/05/16 1 8 57%
Jim 08/06/16 6 14 100%
Sheila 08/01/16 1 1 17%
Sheila 08/02/16 1 2 33%
Sheila 08/03/16 1 3 50%
Sheila 08/04/16 1 4 67%
Sheila 08/05/16 1 5 83%
Sheila 08/06/16 1 6 100%
And with this dataset, I would like to identify the date for each person where their pct of tot reaches 50% (or any other percentage I supply).
So the output for the 50% threshold would be:
Jim 08/04/16
Sheila 08/03/16
Any suggestions on how I can calculate the two columns and determine the appropriate dates?
You can use the ANSI standard cumulative sum function to calculate the cumulative sum. The rest is really just arithmetic:
select t.*
from (select t.*,
sum(qty) over (partition by person order by date) as running_qty,
sum(qty) over (partition by person) as tot_qty,
(sum(qty) over (partition by person order by date) * 1.0 /
sum(qty) over (partition by person)
) as running_percent
from sales t
) t
where running_percent >= 0.5 and
running_percent - (qty * 1.0 / tot_qty) < 0.5;
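The reason the where clause has two conditions is to return a single row per person. The first condition alone would return every row whose running percent is greater than or equal to 0.5, but you only want the first one, where the percent crosses the threshold; the second condition checks that the running percent before the current row was still below 0.5.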
The * 1.0 is because some databases do integer division.
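The "first row that crosses the threshold" logic can also be sketched in plain Python, which makes it easy to try other percentages. This is an illustration rather than a translation of the query: the function name is made up, the rows are assumed to be sorted by person and then date, and the comparison is done on integers to sidestep floating-point rounding at the boundary.

```python
from itertools import groupby

# Rows from the question: (person, date, qty), sorted by person then date.
rows = [
    ("Jim", "08/01/16", 1),
    ("Jim", "08/02/16", 3),
    ("Jim", "08/03/16", 2),
    ("Jim", "08/04/16", 1),
    ("Jim", "08/05/16", 1),
    ("Jim", "08/06/16", 6),
    ("Sheila", "08/01/16", 1),
    ("Sheila", "08/02/16", 1),
    ("Sheila", "08/03/16", 1),
    ("Sheila", "08/04/16", 1),
    ("Sheila", "08/05/16", 1),
    ("Sheila", "08/06/16", 1),
]


def threshold_dates(rows, percent=50):
    # Return, per person, the first date on which the cumulative quantity
    # reaches `percent` percent of that person's total.
    result = {}
    for person, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        total = sum(qty for _p, _d, qty in group)
        running = 0
        for _person, date, qty in group:
            running += qty
            # Same test as running / total >= percent / 100, kept in integers.
            if running * 100 >= percent * total:
                result[person] = date
                break
    return result


print(threshold_dates(rows))      # {'Jim': '08/04/16', 'Sheila': '08/03/16'}
print(threshold_dates(rows, 75))  # {'Jim': '08/06/16', 'Sheila': '08/05/16'}
```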