Sum of Top N in SQL

I have a SALES table with Person, Date and Qty:
Person Date Qty
Jim 2016-08-01 1
Jim 2016-08-02 3
Jim 2016-08-03 2
Sheila 2016-08-01 1
Sheila 2016-08-02 1
Sheila 2016-08-03 1
Bob 2016-08-03 6
Bob 2016-08-02 2
Bob 2016-08-01 5
I can rank the top 2 by Date with the following code:
/****** Top 2 Salespersons ******/
SELECT *
FROM(
SELECT * ,
ROW_NUMBER() OVER( PARTITION BY [Date]
ORDER BY Qty DESC) N'Rank'
FROM [Coinmarketcap].[dbo].[sales]
GROUP BY [Date], Person, Qty
) AS NewTable
WHERE NewTable.Rank < 3
Person Date Qty Rank
Bob 2016-08-01 5 1
Jim 2016-08-01 1 2
Jim 2016-08-02 3 1
Bob 2016-08-02 2 2
Bob 2016-08-03 6 1
Jim 2016-08-03 2 2
My two questions are:
1) How can I just see the total qty for the top 2 for each date, such as:
Date Total Qty
2016-08-01 6
2016-08-02 5
2016-08-03 8
2) How can I get the total Qty each day for different ranking groups, such as:
Date Ranking Group Total Qty
2016-08-01 1-2 6
2016-08-01 3-4 1
2016-08-01 5-6 0
2016-08-02 1-2 5
2016-08-02 3-4 1
2016-08-02 5-6 0
2016-08-03 1-2 8
2016-08-03 3-4 1
2016-08-03 5-6 0

First:
SELECT NewTable.Date, Sum(NewTable.Qty)
FROM(
SELECT * ,
ROW_NUMBER() OVER( PARTITION BY [Date]
ORDER BY Qty DESC) N'Rank'
FROM [Coinmarketcap].[dbo].[sales]
GROUP BY [Date], Person, Qty
) AS NewTable
WHERE NewTable.Rank < 3
group by NewTable.Date
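Second, try this. Rank is an integer, so the division below is already integer division in SQL Server and no TRUNC call is needed (T-SQL has no TRUNC function):
SELECT NewTable.Date,
(NewTable.Rank - 1) / 2 * 2 + 1 AS LowerRank, -- lower rank
(NewTable.Rank - 1) / 2 * 2 + 2 AS UpperRank, -- upper rank
Sum(NewTable.Qty) AS TotalQty
FROM(
SELECT * ,
ROW_NUMBER() OVER( PARTITION BY [Date]
ORDER BY Qty DESC) N'Rank'
FROM [Coinmarketcap].[dbo].[sales]
GROUP BY [Date], Person, Qty
) AS NewTable
group by NewTable.Date,
(NewTable.Rank - 1) / 2 * 2 + 1,
(NewTable.Rank - 1) / 2 * 2 + 2
Note that this only returns ranking groups that contain at least one row, so an empty group such as 5-6 does not appear with a total of 0.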
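To make the bucket arithmetic concrete, here is a minimal Python sketch of the same idea (it is not part of the original answer). It ranks quantities per date in descending order and maps each rank to its bucket with integer division. The names rank_bucket_totals and bucket_size are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (person, date, qty)
SALES = [
    ("Jim", "2016-08-01", 1), ("Jim", "2016-08-02", 3), ("Jim", "2016-08-03", 2),
    ("Sheila", "2016-08-01", 1), ("Sheila", "2016-08-02", 1), ("Sheila", "2016-08-03", 1),
    ("Bob", "2016-08-03", 6), ("Bob", "2016-08-02", 2), ("Bob", "2016-08-01", 5),
]

def rank_bucket_totals(rows, bucket_size=2):
    """Return {(date, lower_rank, upper_rank): total_qty}, ranking qty descending per date."""
    qtys_by_date = defaultdict(list)
    for _person, date, qty in rows:
        qtys_by_date[date].append(qty)

    totals = {}
    for date, qtys in qtys_by_date.items():
        # enumerate(..., start=1) plays the role of ROW_NUMBER() within one date
        for rank, qty in enumerate(sorted(qtys, reverse=True), start=1):
            # Integer division maps ranks 1-2 to 1, ranks 3-4 to 3, and so on
            lower = (rank - 1) // bucket_size * bucket_size + 1
            key = (date, lower, lower + bucket_size - 1)
            totals[key] = totals.get(key, 0) + qty
    return totals

totals = rank_bucket_totals(SALES)
for key in sorted(totals):
    print(key, totals[key])
```

As with the SQL, a bucket with no rows (5-6 here) simply does not appear; producing explicit zero rows would need a separate list of buckets to join against.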

Related

Using the earliest date of a partition to determine what other dates belong to that partition

Assume this is my table:
ID DATE
--------------
1 2018-11-12
2 2018-11-13
3 2018-11-14
4 2018-11-15
5 2018-11-16
6 2019-03-05
7 2019-05-07
8 2019-05-08
9 2019-05-08
I need to have partitions be determined by the first date in the partition. Where, any date that is within 2 days of the first date, belongs in the same partition.
The table would end up looking like this if each partition was ranked
PARTITION ID DATE
------------------------
1 1 2018-11-12
1 2 2018-11-13
1 3 2018-11-14
2 4 2018-11-15
2 5 2018-11-16
3 6 2019-03-05
4 7 2019-05-07
4 8 2019-05-08
4 9 2019-05-08
I've tried using datediff with lag to compare to the previous date but that would allow a partition to be inappropriately sized based on spacing, for example all of these dates would be included in the same partition:
ID DATE
--------------
1 2018-11-12
2 2018-11-14
3 2018-11-16
4 2018-11-18
5 2018-11-20
6 2018-11-22
Previous flawed attempt:
Mark when a date is more than 2 days past the previous date:
(case when datediff(day, lag(event_time, 1) over (partition by user_id, stage order by event_time), event_time) > 2 then 1 else 0 end)

You need to use a recursive CTE for this, so the operation is expensive.
with t as (
-- add an incrementing column with no gaps
select t.*, row_number() over (order by date) as seqnum
from t
),
cte as (
select id, date, date as mindate, seqnum
from t
where seqnum = 1
union all
select t.id, t.date,
(case when t.date <= dateadd(day, 2, cte.mindate)
then cte.mindate else t.date
end) as mindate,
t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select cte.*, dense_rank() over (order by mindate) as partition_num
from cte;
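The recursion above walks the rows in date order and carries the first date of the current partition forward. As a rough illustration of that logic outside SQL (not part of the original answer), the following Python sketch does the same walk with a loop. The function name assign_partitions and the window_days parameter are hypothetical.

```python
from datetime import date, timedelta

def assign_partitions(dates, window_days=2):
    """Return one partition number per date, with dates processed in ascending order."""
    partitions = []
    anchor = None   # first date of the current partition (mindate in the CTE)
    current = 0
    for d in sorted(dates):
        # A date more than window_days past the anchor starts a new partition
        if anchor is None or d > anchor + timedelta(days=window_days):
            anchor = d
            current += 1
        partitions.append(current)
    return partitions

sample = [
    date(2018, 11, 12), date(2018, 11, 13), date(2018, 11, 14),
    date(2018, 11, 15), date(2018, 11, 16), date(2019, 3, 5),
    date(2019, 5, 7), date(2019, 5, 8), date(2019, 5, 8),
]
print(assign_partitions(sample))

# Evenly spaced dates no longer chain into a single partition
spaced = [date(2018, 11, day) for day in (12, 14, 16, 18, 20, 22)]
print(assign_partitions(spaced))
```

Because each row is compared with the partition's first date rather than with the previous row, the evenly spaced dates from the question split into several partitions instead of one.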

Subtract subsequent row from previous row based on User

I have the following data and I want to subtract the previous row's date from the current row's date for each UserID. The code below does not give me what I want:
DECLARE @DATETBLE TABLE (UserID INT, Dates DATE)
INSERT INTO @DATETBLE VALUES
(1,'2018-01-01'), (1,'2018-01-02'), (1,'2018-01-03'),(1,'2018-01-13'),
(2,'2018-01-15'),(2,'2018-01-16'),(2,'2018-01-17'), (5,'2018-02-04'),
(5,'2018-02-05'),(5,'2018-02-06'),(5,'2018-02-11'), (5,'2018-02-17')
;with cte as (
select UserID,Dates, row_number() over (order by UserID) as seqnum
from @DATETBLE t
)
select t.UserID,t.Dates, datediff(day,tprev.Dates,t.Dates)as diff
from cte t left outer join
cte tprev
on t.seqnum = tprev.seqnum + 1;
Current Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 2
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 18
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
My Expected Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 NULL
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 NULL
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6

Your tag (sql-server-2008) suggests using APPLY:
select t.userid, t.dates, datediff(day, t1.dates, t.dates) as diff
from @DATETBLE t outer apply
( select top (1) t1.*
from @DATETBLE t1
where t1.userid = t.userid and
t1.dates < t.dates
order by t1.dates desc
) t1;

If you have SQL Server version 2012 or higher, you could use LAG() with a partition by UserID:
SELECT UserID
, Dates
, DATEDIFF(dd,COALESCE(LAG_DATES, Dates), Dates) as diff
FROM
(
SELECT UserID
, Dates
, LAG(Dates) OVER (PARTITION BY UserID ORDER BY Dates) as LAG_DATES
FROM @DATETBLE
) exp
This will give you a 0 value instead of a NULL value for the first date in the sequence though.
Since you tagged the post with SQL Server 2008, however, you may need to use a method that doesn't rely on this windowed function.
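For readers who want to see what "previous row within the same user" means step by step, here is a small Python sketch of the per-user difference (an illustration only, not part of the original answers). The function name diffs_per_user is hypothetical; it returns None for each user's first row, matching the expected output in the question.

```python
from datetime import date

def diffs_per_user(rows):
    """rows: iterable of (user_id, date). Returns [(user_id, date, diff_days or None)]."""
    result = []
    previous = {}  # user_id -> most recent date already seen for that user
    for user_id, d in sorted(rows):
        prev = previous.get(user_id)
        # No earlier row for this user means there is nothing to subtract
        result.append((user_id, d, None if prev is None else (d - prev).days))
        previous[user_id] = d
    return result

rows = [
    (1, date(2018, 1, 1)), (1, date(2018, 1, 2)), (1, date(2018, 1, 3)), (1, date(2018, 1, 13)),
    (2, date(2018, 1, 15)), (2, date(2018, 1, 16)), (2, date(2018, 1, 17)),
    (5, date(2018, 2, 4)), (5, date(2018, 2, 5)), (5, date(2018, 2, 6)),
    (5, date(2018, 2, 11)), (5, date(2018, 2, 17)),
]
for user_id, d, diff in diffs_per_user(rows):
    print(user_id, d, diff)
```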

Rank by Date and Qty in SQL

I have a table called SALES, with Person, Date and Qty.
Person Date Qty
Jim 2016-08-01 1
Jim 2016-08-02 3
Jim 2016-08-03 1
Bob 2016-08-01 5
Bob 2016-08-02 1
Bob 2016-08-03 6
Sheila 2016-08-01 4
Sheila 2016-08-02 0
Sheila 2016-08-03 2
I'm looking to rank by qty for each date, with the following output:
Person Date Qty Rank
Bob 2016-08-01 5 1
Sheila 2016-08-01 4 2
Jim 2016-08-01 1 3
Jim 2016-08-02 3 1
Bob 2016-08-02 1 2
Sheila 2016-08-02 0 3
Bob 2016-08-03 6 1
Sheila 2016-08-03 2 2
Jim 2016-08-03 1 3
How do I use the Rank Function here?

Try this
DECLARE @SALES TABLE(
Person NVARCHAR(50),
[Date] NVARCHAR(50),
Qty INT
)
INSERT INTO @SALES VALUES('Jim','2016-08-01',1)
INSERT INTO @SALES VALUES('Jim','2016-08-02',3)
INSERT INTO @SALES VALUES('Jim','2016-08-03',1)
INSERT INTO @SALES VALUES('Bob','2016-08-01',5)
INSERT INTO @SALES VALUES('Bob','2016-08-02',1)
INSERT INTO @SALES VALUES('Bob','2016-08-03',6)
INSERT INTO @SALES VALUES('Sheila','2016-08-01',4)
INSERT INTO @SALES VALUES('Sheila','2016-08-02',0)
INSERT INTO @SALES VALUES('Sheila','2016-08-03',2)
SELECT * ,
ROW_NUMBER() OVER( PARTITION BY [Date]
ORDER BY Qty DESC) N'Rank'
FROM @SALES
GROUP BY [Date], Person, Qty
ORDER BY [Date] ASC,Qty DESC
Update: select top 2 for each day
SELECT *
FROM(
SELECT * ,
ROW_NUMBER() OVER( PARTITION BY [Date]
ORDER BY Qty DESC) N'Rank'
FROM @SALES
GROUP BY [Date], Person, Qty
) AS NewTable
WHERE NewTable.Rank < 3
NOTE:
You can't filter on Rank < 3 in the same query that defines it: the WHERE clause is evaluated before the SELECT list, so it can't see the Rank alias, and window functions such as ROW_NUMBER() are not allowed in WHERE. You have to use a subquery (or a CTE).

Sql group by latest repeated field

I don't even know what's a good title for this question.
But I'm having a table:
create table trans
(
[transid] INT IDENTITY (1, 1) NOT NULL,
[customerid] int not null,
[points] decimal(10,2) not null,
[date] datetime not null
)
and records:
--cus1
INSERT INTO trans ( customerid , points , date )
VALUES ( 1, 10, '2016-01-01' ) , ( 1, 20, '2017-02-01' ) , ( 1, 22, '2017-03-01' ) ,
( 1, 24, '2018-02-01' ) , ( 1, 50, '2018-02-25' ) , ( 2, 44, '2016-02-01' ) ,
( 2, 20, '2017-02-01' ) , ( 2, 32, '2017-03-01' ) , ( 2, 15, '2018-02-01' ) ,
( 2, 10, '2018-02-25' ) , ( 3, 10, '2018-02-25' ) , ( 4, 44, '2015-02-01' ) ,
( 4, 20, '2015-03-01' ) , ( 4, 32, '2016-04-01' ) , ( 4, 15, '2016-05-01' ) ,
( 4, 10, '2017-02-25' ) , ( 4, 10, '2018-02-27' ) ,( 4, 20, '2018-02-28' ) ,
( 5, 44, '2015-02-01' ) , ( 5, 20, '2015-03-01' ) , ( 5, 32, '2016-04-01' ) ,
( 5, 15, '2016-05-01' ) ,( 5, 10, '2017-02-25' );
-- selecting the data
select * from trans
Produces:
transid customerid points date
----------- ----------- --------------------------------------- -----------------------
1 1 10.00 2016-01-01 00:00:00.000
2 1 20.00 2017-02-01 00:00:00.000
3 1 22.00 2017-03-01 00:00:00.000
4 1 24.00 2018-02-01 00:00:00.000
5 1 50.00 2018-02-25 00:00:00.000
6 2 44.00 2016-02-01 00:00:00.000
7 2 20.00 2017-02-01 00:00:00.000
8 2 32.00 2017-03-01 00:00:00.000
9 2 15.00 2018-02-01 00:00:00.000
10 2 10.00 2018-02-25 00:00:00.000
11 3 10.00 2018-02-25 00:00:00.000
12 4 44.00 2015-02-01 00:00:00.000
13 4 20.00 2015-03-01 00:00:00.000
14 4 32.00 2016-04-01 00:00:00.000
15 4 15.00 2016-05-01 00:00:00.000
16 4 10.00 2017-02-25 00:00:00.000
17 4 10.00 2018-02-27 00:00:00.000
18 4 20.00 2018-02-28 00:00:00.000
19 5 44.00 2015-02-01 00:00:00.000
20 5 20.00 2015-03-01 00:00:00.000
21 5 32.00 2016-04-01 00:00:00.000
22 5 15.00 2016-05-01 00:00:00.000
23 5 10.00 2017-02-25 00:00:00.000
I'm trying to group by customerid and sum each customer's points. But here's the catch: if a customer is not active for 1 year (the next transaction is 1 year or more later), the points accumulated so far expire.
For this case:
Points for each customers should be:
Customer1 20+22+24+50
Customer2 20+32+15+10
Customer3 10
Customer4 10+20
Customer5 0
Here's what I have so far:
select
t1.transid as transid1,
t1.customerid as customerid1,
t1.date as date1,
t1.points as points1,
t1.rank1 as rank1,
t2.transid as transid2,
t2.customerid as customerid2,
t2.points as points2,
isnull(t2.date,getUTCDate()) as date2,
isnull(t2.rank2,t1.rank1+1) as rank2,
cast(case when(t1.date > dateadd(year,-1,isnull(t2.date,getUTCDate()))) Then 0 ELSE 1 END as bit) as ShouldExpire
from
(
select transid,CustomerID,Date,points,
RANK() OVER(PARTITION BY CustomerID ORDER BY date ASC) AS RANK1
from trans
)t1
left join
(
select transid,CustomerID,Date,points,
RANK() OVER(PARTITION BY CustomerID ORDER BY date ASC) AS RANK2
from trans
)t2 on t1.RANK1=t2.RANK2-1
and t1.customerid=t2.customerid
From the result of the above query, how do I check the ShouldExpire field on the row with max(rank1) for each customer? If it's 1, the total points should be 0; otherwise, I want to sum the points of all the consecutive 0 rows until there are no more records or a 1 is met.
Or is there a better approach to this problem?

The following query uses LEAD to get the date of the next record within the same CustomerID slice:
;WITH CTE AS (
SELECT transid, CustomerID, [Date], points,
LEAD([Date]) OVER (PARTITION BY CustomerID
ORDER BY date ASC) AS nextDate,
CASE
WHEN [date] > DATEADD(YEAR,
-1,
-- same LEAD() here as above
ISNULL(LEAD([Date]) OVER (PARTITION BY CustomerID
ORDER BY date ASC),
getUTCDate()))
THEN 0
ELSE 1
END AS ShouldExpire
FROM trans
)
SELECT transid, CustomerID, [Date], points, nextDate, ShouldExpire
FROM CTE
ORDER BY CustomerID, [Date]
Output:
transid CustomerID Date points nextDate ShouldExpire
-------------------------------------------------------------
1 1 2016-01-01 10.00 2017-02-01 1 <-- last exp. for 1
2 1 2017-02-01 20.00 2017-03-01 0
3 1 2017-03-01 22.00 2018-02-01 0
4 1 2018-02-01 24.00 2018-02-25 0
5 1 2018-02-25 50.00 NULL 0
6 2 2016-02-01 44.00 2017-02-01 1 <-- last exp. for 2
7 2 2017-02-01 20.00 2017-03-01 0
8 2 2017-03-01 32.00 2018-02-01 0
9 2 2018-02-01 15.00 2018-02-25 0
10 2 2018-02-25 10.00 NULL 0
11 3 2018-02-25 10.00 NULL 0 <-- no exp. for 3
12 4 2015-02-01 44.00 2015-03-01 0
13 4 2015-03-01 20.00 2016-04-01 1
14 4 2016-04-01 32.00 2016-05-01 0
15 4 2016-05-01 15.00 2017-02-25 0
16 4 2017-02-25 10.00 2018-02-27 1 <-- last exp. for 4
17 4 2018-02-27 10.00 2018-02-28 0
18 4 2018-02-28 20.00 NULL 0
19 5 2015-02-01 44.00 2015-03-01 0
20 5 2015-03-01 20.00 2016-04-01 1
21 5 2016-04-01 32.00 2016-05-01 0
22 5 2016-05-01 15.00 2017-02-25 0
23 5 2017-02-25 10.00 NULL 1 <-- last exp. for 5
Now, you seem to want to calculate the sum of points after the last expiration.
Using the above CTE as a basis you can achieve the required result with:
;WITH CTE AS (
... above query here ...
)
SELECT CustomerID,
SUM(CASE WHEN rnk = 0 THEN points ELSE 0 END) AS sumOfPoints
FROM (
SELECT transid, CustomerID, [Date], points, nextDate, ShouldExpire,
SUM(ShouldExpire) OVER (PARTITION BY CustomerID ORDER BY [Date] DESC) AS rnk
FROM CTE
) AS t
GROUP BY CustomerID
Output:
CustomerID sumOfPoints
-----------------------
1 116.00
2 77.00
3 10.00
4 30.00
5 0.00
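The same "sum everything after the last expiration" rule can be sketched in Python to check the numbers by hand (an illustration only, not part of the original answer). Two assumptions are made here: the reference date as_of stands in for getUTCDate() and is fixed at 2018-03-01 so the result is reproducible, and the helper names minus_one_year and active_points are hypothetical.

```python
from collections import defaultdict
from datetime import date

def minus_one_year(d):
    """Mimic DATEADD(YEAR, -1, d); Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def active_points(transactions, as_of):
    """transactions: iterable of (customer_id, points, date). Returns {customer_id: points}."""
    by_customer = defaultdict(list)
    for customer_id, points, d in transactions:
        by_customer[customer_id].append((d, points))

    totals = {}
    for customer_id, rows in by_customer.items():
        rows.sort()
        total = 0
        next_date = as_of  # the newest row is compared with the reference date
        # Walk from newest to oldest and stop at the first expired row
        for d, points in reversed(rows):
            if not d > minus_one_year(next_date):
                break  # ShouldExpire = 1: this row and everything older is dropped
            total += points
            next_date = d
        totals[customer_id] = total
    return totals

TRANS = [
    (1, 10, date(2016, 1, 1)), (1, 20, date(2017, 2, 1)), (1, 22, date(2017, 3, 1)),
    (1, 24, date(2018, 2, 1)), (1, 50, date(2018, 2, 25)), (2, 44, date(2016, 2, 1)),
    (2, 20, date(2017, 2, 1)), (2, 32, date(2017, 3, 1)), (2, 15, date(2018, 2, 1)),
    (2, 10, date(2018, 2, 25)), (3, 10, date(2018, 2, 25)), (4, 44, date(2015, 2, 1)),
    (4, 20, date(2015, 3, 1)), (4, 32, date(2016, 4, 1)), (4, 15, date(2016, 5, 1)),
    (4, 10, date(2017, 2, 25)), (4, 10, date(2018, 2, 27)), (4, 20, date(2018, 2, 28)),
    (5, 44, date(2015, 2, 1)), (5, 20, date(2015, 3, 1)), (5, 32, date(2016, 4, 1)),
    (5, 15, date(2016, 5, 1)), (5, 10, date(2017, 2, 25)),
]
# as_of is an assumed reference date, roughly when the question was asked
print(active_points(TRANS, as_of=date(2018, 3, 1)))
```

Passing the reference date as a parameter keeps the calculation deterministic; with the real current date, customers whose last transaction is more than a year old would all drop to 0.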

The tricky part here is to dump all points when they expire and start accumulating them again. I assumed that if there is only one transaction, the points don't expire until there's a new transaction, even if that first transaction was over a year ago.
I also get a different answer for customer #5, as they do appear to have a "transaction chain" that hasn't expired.
Here's my query:
WITH ordered AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY [date]) AS order_id
FROM
trans),
max_transid AS (
SELECT
customerid,
MAX(transid) AS max_transid
FROM
trans
GROUP BY
customerid),
not_expired AS (
SELECT
t1.customerid,
t1.points,
t1.[date] AS t1_date,
CASE
WHEN m.customerid IS NOT NULL THEN GETDATE()
ELSE t2.[date]
END AS t2_date
FROM
ordered t1
LEFT JOIN ordered t2 ON t2.customerid = t1.customerid AND t1.transid != t2.transid AND t2.order_id = t1.order_id + 1 AND t1.[date] > DATEADD(YEAR, -1, t2.[date])
LEFT JOIN max_transid m ON m.customerid = t1.customerid AND m.max_transid = t1.transid
),
max_not_expired AS (
SELECT
customerid,
MAX(t1_date) AS max_expired
FROM
not_expired
WHERE
t2_date IS NULL
GROUP BY
customerid)
SELECT
n.customerid,
SUM(n.points) AS points
FROM
not_expired n
LEFT JOIN max_not_expired m ON m.customerid = n.customerid
WHERE
ISNULL(m.max_expired, '19000101') < n.t1_date
GROUP BY
n.customerid;
It could be refactored to be simpler, but I wanted to show the steps to get to the final answer:
customerid points
1 116.00
2 77.00
3 10.00
4 30.00
5 57.00

can you try this:
SELECT customerid,
Sum(t1.points)
FROM trans t1
WHERE NOT EXISTS (SELECT 1
FROM trans t2
WHERE Datediff(year, t1.date, t2.date) >= 1)
GROUP BY t1.customerid
Hope it helps!

try this:
select customerid,Sum(points)
from trans where Datediff(year, date, GETDATE()) < 1
group by customerid
output:
customerid Points
1 - 74.00
2 - 25.00
3 - 10.00
4 - 30.00

Query and Partition By clause group by window

I've the following code
declare @test table (id int, [Status] int, [Date] date)
insert into @test (Id,[Status],[Date]) VALUES
(1,1,'2018-01-01'),
(2,1,'2018-01-01'),
(1,1,'2017-11-01'),
(1,2,'2017-10-01'),
(1,1,'2017-09-01'),
(2,2,'2017-01-01'),
(1,1,'2017-08-01'),
(1,1,'2017-07-01'),
(1,1,'2017-06-01'),
(1,2,'2017-05-01'),
(1,1,'2017-04-01'),
(1,1,'2017-03-01'),
(1,1,'2017-01-01')
SELECT
id,
[Status],
MIN([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as WindowStart,
max([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status]) as WindowEnd,
COUNT(*) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as total
from @test
But the result is this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-01-01 1
1 1 2017-01-01 2017-03-01 2
1 1 2017-01-01 2017-04-01 3
1 1 2017-01-01 2017-06-01 4
1 1 2017-01-01 2017-07-01 5
1 1 2017-01-01 2017-08-01 6
1 1 2017-01-01 2017-09-01 7
1 1 2017-01-01 2017-11-01 8
1 1 2017-01-01 2018-01-01 9
1 2 2017-05-01 2017-05-01 1
1 2 2017-05-01 2017-10-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
And I need to be grouped by window like this.
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-04-01 3
1 2 2017-05-01 2017-05-01 1
1 1 2017-06-01 2017-09-01 4
1 2 2017-10-01 2017-10-01 1
1 1 2017-11-01 2018-01-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
The first group for the id= 1 Status = 1 should end at the first row with Status = 2 (2017-05-01) so the total is 3 and then start again from the 2017-06-01 to 2017-09-01 with a total of 4 rows.
How can get this done?

This is a "classic" Gaps and Islands problem. There are probably thousands of answers for these on the Internet.
This works for what you're after; however, try doing a bit more research beforehand. :)
WITH Groups AS(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY id, [status] ORDER BY [Date]) AS Grp
FROM @test t)
SELECT G.id,
G.[Status],
MIN([Date]) AS WindowStart,
MAX([Date]) AS WindowEnd,
COUNT(*) AS Total
FROM Groups G
GROUP BY G.id,
G.[Status],
G.Grp
ORDER BY G.id, WindowStart;
Note that the ordering of your last 2 lines is the other way round in this solution; your expected results appear to be ordered ASCENDING by date for id 1 but DESCENDING for id 2.
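The difference of the two row numbers stays constant for as long as the status does not change, which is what makes it usable as a group key. The following Python sketch (an illustration, not part of the original answer) reproduces that arithmetic with two counters; the function name islands_by_row_number_diff is hypothetical.

```python
from collections import defaultdict
from datetime import date

def islands_by_row_number_diff(rows):
    """rows: iterable of (id, status, date). Returns [(id, status, start, end, total)]."""
    seq_by_id = defaultdict(int)         # ROW_NUMBER() OVER (PARTITION BY id ORDER BY date)
    seq_by_id_status = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY id, status ORDER BY date)
    groups = defaultdict(list)
    for id_, status, d in sorted(rows, key=lambda r: (r[0], r[2])):
        seq_by_id[id_] += 1
        seq_by_id_status[(id_, status)] += 1
        # Constant within one unbroken run of the same status
        grp = seq_by_id[id_] - seq_by_id_status[(id_, status)]
        groups[(id_, status, grp)].append(d)
    result = [(i, s, min(ds), max(ds), len(ds)) for (i, s, _grp), ds in groups.items()]
    return sorted(result, key=lambda r: (r[0], r[2]))

TEST = [
    (1, 1, date(2018, 1, 1)), (2, 1, date(2018, 1, 1)), (1, 1, date(2017, 11, 1)),
    (1, 2, date(2017, 10, 1)), (1, 1, date(2017, 9, 1)), (2, 2, date(2017, 1, 1)),
    (1, 1, date(2017, 8, 1)), (1, 1, date(2017, 7, 1)), (1, 1, date(2017, 6, 1)),
    (1, 2, date(2017, 5, 1)), (1, 1, date(2017, 4, 1)), (1, 1, date(2017, 3, 1)),
    (1, 1, date(2017, 1, 1)),
]
for island in islands_by_row_number_diff(TEST):
    print(island)
```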

Here is one way using LAG function
;WITH cte
AS (SELECT *,
grp = Sum(CASE WHEN prev_val = Status THEN 0 ELSE 1 END)
OVER(partition BY id ORDER BY Date)
FROM (SELECT *,
prev_val = Lag(Status)OVER(partition BY id ORDER BY Date)
FROM @test) a)
SELECT id,
Status,
WindowStart = Min(date),
WindowEnd = Max(date),
Total = Count(*)
FROM cte
GROUP BY id, Status, grp
Using the LAG function, first find the previous status for each row; then use SUM() OVER() to build a group number that increments only when the status changes.
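A rough Python equivalent of this "compare with the previous row, start a new group on change" approach is sketched below (an illustration, not part of the original answer). The function name islands_by_status_change is hypothetical, and the input is the sample data from the question.

```python
from datetime import date

def islands_by_status_change(rows):
    """rows: iterable of (id, status, date). Returns [(id, status, start, end, total)]."""
    islands = []
    prev_id = prev_status = None
    for id_, status, d in sorted(rows, key=lambda r: (r[0], r[2])):
        # Like LAG(Status): a new id or a changed status opens a new island
        if id_ != prev_id or status != prev_status:
            islands.append([id_, status, d, d, 0])
        current = islands[-1]
        current[3] = d      # rows arrive in date order, so this is the latest end
        current[4] += 1
        prev_id, prev_status = id_, status
    return [tuple(island) for island in islands]

TEST = [
    (1, 1, date(2018, 1, 1)), (2, 1, date(2018, 1, 1)), (1, 1, date(2017, 11, 1)),
    (1, 2, date(2017, 10, 1)), (1, 1, date(2017, 9, 1)), (2, 2, date(2017, 1, 1)),
    (1, 1, date(2017, 8, 1)), (1, 1, date(2017, 7, 1)), (1, 1, date(2017, 6, 1)),
    (1, 2, date(2017, 5, 1)), (1, 1, date(2017, 4, 1)), (1, 1, date(2017, 3, 1)),
    (1, 1, date(2017, 1, 1)),
]
for island in islands_by_status_change(TEST):
    print(island)
```

A single pass is enough here because the rows are sorted by id and date first, which is the same ordering the window functions rely on.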
