MSSQL: How to display TOP 10 items from GROUP BY query?

I have a query which displays this as result:
Year-Month SN_NAME Raised Incidents
2015-11 A 14494
2015-11 B 8432
2015-11 D 5496
2015-11 G 4778
2015-11 H 4554
2015-11 C 4203
2015-11 X 3477
.......+ thousands more rows for 2015-11
2015-12 A 3373
2015-12 B 3322
2015-12 H 2814
2015-12 D 2745
......+ thousands more rows for 2015-12
......+ thousands more rows for 2016-01 - 2016-10
2016-11 B 2645
2016-11 C 2571
2016-11 E 2475
2016-11 D 2466
....+ thousands more rows for 2016-11
I need to select the TOP 10 SN_NAME values by Raised Incidents count for the last month, and then show their counts for the previous 12 months.
The query I use to display above result is this one:
DECLARE @startOfCurrentMonth DATETIME
SET @startOfCurrentMonth = DATEADD(month, DATEDIFF(month, 0, CURRENT_TIMESTAMP), 0)
SELECT
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121) as "Year-Month"
,CI.SN_NAME
,COUNT(IM.SN_NUMBER) as "Raised Incidents"
FROM [dbo].[tab_IM_Incident] IM
LEFT JOIN [dbo].[tab_SNOW_CMDB_CI] CI on IM.SN_CMDB_CI = CI.SN_SYS_ID
WHERE
IM.SN_SYS_CREATED_ON >= DATEADD(month, -13, @startOfCurrentMonth) AND IM.SN_SYS_CREATED_ON < @startOfCurrentMonth
AND (IM.SN_U_SUB_STATE <> 'Cancelled' OR IM.SN_U_SUB_STATE IS NULL)
GROUP BY
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121)
, CI.SN_NAME
ORDER BY
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121)
, COUNT(IM.SN_NUMBER) DESC
The problem is that I don't know how to limit each month's values to the TOP 10 only: the query returns around 200,000 rows in total, while it should return 13 x 10 = 130 rows.
The expected output has the same shape as the result at the top of the question, but limited to the top 10 rows per month for the last 13 months.
Please advise.

If I understand correctly, you want the 10 SN_NAME values with the most incidents in the most recent month, and then to see their incident counts for all months in the data.
Here is one method:
with t as (
-- your query here, without its ORDER BY and with the columns
-- aliased as YearMonth and RaisedIncidents
)
select t.*
from (select top 10 t.*
from t
order by YearMonth desc, RaisedIncidents desc
) top10 left join
t
on t.sn_name = top10.sn_name
order by YearMonth desc, RaisedIncidents desc;
Note that the top 10 is not filtering on the latest month. Instead, it orders by the latest month and then by RaisedIncidents. This assumes that at least 10 different SN_NAME values have incidents in the most recent month.
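A minimal sketch of the answer's approach, run on SQLite through Python's standard sqlite3 module rather than on SQL Server, so LIMIT stands in for TOP. The table name monthly, the columns YearMonth, SN_NAME and RaisedIncidents, and the sample counts are hypothetical stand-ins for the question's grouped result. top_n_per_month covers the other reading of the question (the top N within each month) with ROW_NUMBER() and PARTITION BY; it assumes SQLite 3.25 or later, which added window functions.

```python
import sqlite3

# Hypothetical pre-aggregated counts standing in for the question's
# GROUP BY result (Year-Month, SN_NAME, Raised Incidents).
SAMPLE = [
    ("2016-10", "A", 50), ("2016-10", "B", 40), ("2016-10", "C", 30),
    ("2016-11", "A", 10), ("2016-11", "B", 25), ("2016-11", "C", 20),
    ("2016-11", "D", 5),
]


def make_conn(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly (YearMonth TEXT, SN_NAME TEXT, RaisedIncidents INTEGER)"
    )
    conn.executemany("INSERT INTO monthly VALUES (?, ?, ?)", rows)
    return conn


def top_n_from_latest_month(conn, n):
    # Same shape as the answer: the first n rows ordered by month DESC, then
    # count DESC (LIMIT plays the role of TOP), joined back on the name so
    # every month is returned for those names.
    sql = """
        WITH t AS (SELECT * FROM monthly)
        SELECT t.YearMonth, t.SN_NAME, t.RaisedIncidents
        FROM (SELECT * FROM t
              ORDER BY YearMonth DESC, RaisedIncidents DESC
              LIMIT ?) AS top_n
        JOIN t ON t.SN_NAME = top_n.SN_NAME
        ORDER BY t.YearMonth DESC, t.RaisedIncidents DESC
    """
    return conn.execute(sql, (n,)).fetchall()


def top_n_per_month(conn, n):
    # Alternative reading: the n largest counts within each month separately.
    sql = """
        SELECT YearMonth, SN_NAME, RaisedIncidents
        FROM (SELECT *, ROW_NUMBER() OVER (
                  PARTITION BY YearMonth ORDER BY RaisedIncidents DESC) AS rn
              FROM monthly)
        WHERE rn <= ?
        ORDER BY YearMonth, RaisedIncidents DESC
    """
    return conn.execute(sql, (n,)).fetchall()
```

With n = 2 the first function keeps B and C (the top two of 2016-11) and shows them for both months; with n = 10 it corresponds to the TOP 10 case. If counts tie, ROW_NUMBER() orders the tied rows arbitrarily.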

Related

Oracle PL/SQL group by with date field produces confusing results

In a query, I did a GROUP BY on a date field, which produced these summary results. The query was like this:
SELECT
(CASE
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'
ELSE 'OTHER' END) AS BILLING_SOURCE, std.created_on, COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM onebiller.t_std_in_detail_his std
INNER JOIN onebiller.t_std_in_header h
ON h.job_id = std.job_id
WHERE std.invoice_number IS NOT NULL
GROUP BY (CASE
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'
ELSE 'OTHER' END), std.created_on
ORDER BY std.created_on ASC
Note these results for 3 datetimes on 19-Mar-2021 from the created_on field.
I then used TRUNC(created_on) to try to group all the records from a single day. This was the updated query:
SELECT
(CASE
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'
ELSE 'OTHER' END) AS BILLING_SOURCE, TRUNC(std.created_on), COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM onebiller.t_std_in_detail_his std
INNER JOIN onebiller.t_std_in_header h
ON h.job_id = std.job_id
WHERE std.invoice_number IS NOT NULL
GROUP BY (CASE
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'
ELSE 'OTHER' END), TRUNC(std.created_on)
ORDER BY TRUNC(std.created_on) ASC
I was expecting a result that would sum the 3 highlighted rows from the first query (2+165+164); instead, I received a count of 166 for 19-Mar-2021. Why didn't I get the sum of (2+165+164)?
Because you have the same invoice number repeated at different times on the same day. As a simpler example, say you have:
CREATED_ON           INVOICE_NUMBER
2021-03-19 09:00:00  1
2021-03-19 09:00:00  2
2021-03-19 12:00:00  2
2021-03-19 15:00:00  1
2021-03-19 15:00:00  2
2021-03-19 15:00:00  3
That shows 6 rows, but only three distinct invoice numbers - 1, 2 and 3.
A simplified version of your first query gives:
SELECT std.created_on,
COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM std
GROUP BY std.created_on
ORDER BY std.created_on ASC
CREATED_ON           COUNT_OF_INVOICES
2021-03-19 09:00:00  2
2021-03-19 12:00:00  1
2021-03-19 15:00:00  3
The sum of those counts currently matches the number of rows in the table, 6. (I haven't included any duplicates at the same time, so the distinct isn't doing anything at the moment, again to keep it simple.) The row for 09:00 counts invoice numbers 1 and 2; the row for 12:00 only counts 2; and the row for 15:00 counts 1, 2 and 3. The counts in all three rows include a count for invoice number 2, the first and third include a count for invoice number 1 - so the same invoice numbers are being counted multiple times.
SELECT TRUNC(std.created_on) as created_on,
COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM std
GROUP BY TRUNC(std.created_on)
ORDER BY TRUNC(std.created_on) ASC
CREATED_ON           COUNT_OF_INVOICES
2021-03-19 00:00:00  3
Now the single result is three, because that's how many distinct invoice numbers there are that day: it's now counting 1, 2 and 3 once each, rather than counting invoice 1 twice, invoice 2 three times and invoice 3 once.
If you didn't have the DISTINCT, then the second query would also get 6, because it wouldn't be eliminating the duplicates seen in the first query.
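The effect can be reproduced with the answer's six toy rows. This sketch runs on SQLite via Python's sqlite3 module instead of Oracle, with SQLite's date() standing in for TRUNC(); the table and function names are illustrative only.

```python
import sqlite3

# The toy rows from the answer: six rows, three distinct invoice numbers.
ROWS = [
    ("2021-03-19 09:00:00", 1),
    ("2021-03-19 09:00:00", 2),
    ("2021-03-19 12:00:00", 2),
    ("2021-03-19 15:00:00", 1),
    ("2021-03-19 15:00:00", 2),
    ("2021-03-19 15:00:00", 3),
]


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE std (created_on TEXT, invoice_number INTEGER)")
    conn.executemany("INSERT INTO std VALUES (?, ?)", ROWS)
    return conn


def per_timestamp(conn):
    # One group per exact timestamp: an invoice seen at several times is
    # counted once in each of those groups.
    return conn.execute(
        "SELECT created_on, COUNT(DISTINCT invoice_number) FROM std "
        "GROUP BY created_on ORDER BY created_on"
    ).fetchall()


def per_day(conn, distinct=True):
    # date() drops the time part, playing the role of TRUNC(created_on).
    agg = "COUNT(DISTINCT invoice_number)" if distinct else "COUNT(invoice_number)"
    return conn.execute(
        f"SELECT date(created_on), {agg} FROM std "
        "GROUP BY date(created_on) ORDER BY date(created_on)"
    ).fetchall()
```

per_timestamp returns counts of 2, 1 and 3 (summing to 6), per_day returns 3, and per_day(conn, distinct=False) returns 6, matching the numbers in the explanation.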

SQL Query: CREATE a table with rows divided by month/year and COUNT the number of values WHERE '01/month/year' IS BETWEEN two date-columns

This is my first question here.
I have a problem creating a complex query that groups values based on whether the first day of each month falls between two date columns.
Here is an example of the table I have:
USER_ID  START_DATE  END_DATE
A        03/07/2020  31/07/2020
A        05/06/2020  03/07/2020
A        08/05/2020  05/06/2020
A        10/04/2020  08/05/2020
B        13/02/2020  12/03/2020
B        16/01/2020  13/02/2020
C        22/05/2020  19/06/2020
C        24/04/2020  22/05/2020
D        25/09/2020  23/10/2020
D        28/08/2020  25/09/2020
D        31/07/2020  28/08/2020
D        03/07/2020  31/07/2020
D        05/06/2020  03/07/2020
E        25/11/2020  23/12/2020
E        28/10/2020  25/11/2020
E        30/09/2020  28/10/2020
F        14/2/2020   13/3/2020
F        17/1/2020   14/2/2020
F        20/12/2019  17/1/2020
F        22/11/2019  20/12/2019
G        7/11/2020   5/12/2020
G        10/10/2020  7/11/2020
And I would like a result like this:
YEAR  MONTH  COUNT(DISTINCT USER_ID)
2019  11     0
2019  12     1
2020  1      1
2020  2      2
2020  3      2
2020  4      0
2020  5      2
2020  6      2
2020  7      2
2020  8      1
2020  9      1
2020  10     2
2020  11     2
2020  12     2
For instance, in Feb 2020 users "B" and "F" each had a range of dates that included 01/Feb/2020. The condition is true for:
USER_ID  START_DATE  END_DATE
B        16/01/2020  13/02/2020
and for:
USER_ID  START_DATE  END_DATE
F        17/1/2020   14/2/2020
...so the count will be 2.
Do you know any way to do it in SQL (or Ruby)?
Thanks a lot!
Try this :
WITH m AS
( SELECT generate_series(min(date_trunc('month', start_date)), max(end_date), '1 month') :: date AS month
FROM my_table AS t
)
SELECT to_char(m.month, 'YYYY') AS year
, to_char(m.month, 'MM') AS month
, count(DISTINCT t.user_id) AS "count(distinct user_id)"
FROM my_table AS t
RIGHT JOIN m
ON daterange(t.start_date, t.end_date) @> m.month
GROUP BY m.month
ORDER BY m.month
The first query "m" calculates the list of months that cover the start_date and end_date of my_table.
The second query joins my_table with the resulting table "m" in order to select all the users whose interval daterange(start_date, end_date) contains the 1st day of the month (see the manual). By default, daterange(start_date, end_date) includes start_date and excludes end_date.
Then the rows are grouped by m.month and the number of distinct user_id values per month is calculated with the count(DISTINCT user_id) aggregate function (see the manual).
Finally, the RIGHT JOIN clause keeps the months with no corresponding user_id in my_table, which then get a count of 0 (see the manual).
See the test result in dbfiddle.
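For readers without PostgreSQL, here is a sketch of the same idea on SQLite (via Python's sqlite3), which has no generate_series() or daterange by default. A recursive CTE builds the month starts, and the join tests start_date <= month_start < end_date, the same half-open range daterange() uses by default. The months CTE sits on the left of a LEFT JOIN, which is equivalent to the answer's RIGHT JOIN. Dates are converted from dd/mm/yyyy to ISO strings so they compare correctly as text; the table, column and function names are assumptions for the example.

```python
import sqlite3
from datetime import datetime

# The question's rows (dd/mm/yyyy, some without zero padding).
RAW = [
    ("A", "03/07/2020", "31/07/2020"), ("A", "05/06/2020", "03/07/2020"),
    ("A", "08/05/2020", "05/06/2020"), ("A", "10/04/2020", "08/05/2020"),
    ("B", "13/02/2020", "12/03/2020"), ("B", "16/01/2020", "13/02/2020"),
    ("C", "22/05/2020", "19/06/2020"), ("C", "24/04/2020", "22/05/2020"),
    ("D", "25/09/2020", "23/10/2020"), ("D", "28/08/2020", "25/09/2020"),
    ("D", "31/07/2020", "28/08/2020"), ("D", "03/07/2020", "31/07/2020"),
    ("D", "05/06/2020", "03/07/2020"), ("E", "25/11/2020", "23/12/2020"),
    ("E", "28/10/2020", "25/11/2020"), ("E", "30/09/2020", "28/10/2020"),
    ("F", "14/2/2020", "13/3/2020"), ("F", "17/1/2020", "14/2/2020"),
    ("F", "20/12/2019", "17/1/2020"), ("F", "22/11/2019", "20/12/2019"),
    ("G", "7/11/2020", "5/12/2020"), ("G", "10/10/2020", "7/11/2020"),
]

SQL = """
WITH RECURSIVE months(month_start) AS (
    -- first day of every month from MIN(start_date) up to MAX(end_date)
    SELECT date(MIN(start_date), 'start of month') FROM my_table
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE date(month_start, '+1 month') <= (SELECT MAX(end_date) FROM my_table)
)
SELECT CAST(strftime('%Y', m.month_start) AS INTEGER) AS year,
       CAST(strftime('%m', m.month_start) AS INTEGER) AS month,
       COUNT(DISTINCT t.user_id) AS users
FROM months AS m
LEFT JOIN my_table AS t   -- months with no matching user still appear, with 0
  ON t.start_date <= m.month_start AND m.month_start < t.end_date
GROUP BY m.month_start
ORDER BY m.month_start
"""


def iso(d):
    return datetime.strptime(d, "%d/%m/%Y").date().isoformat()


def users_per_month(raw):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user_id TEXT, start_date TEXT, end_date TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [(u, iso(s), iso(e)) for u, s, e in raw],
    )
    return conn.execute(SQL).fetchall()
```

users_per_month(RAW) returns one (year, month, count) tuple per month from 2019-11 to 2020-12, with the same counts as the expected table in the question.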

Inner Join - special time conditions

Given an hourly table A with full heart_rate records, e.g.:
User Hour Heart_rate
Joe 1 60
Joe 2 70
Joe 3 72
Joe 4 75
Joe 5 68
Joe 6 71
Joe 7 78
Joe 8 83
Joe 9 85
Joe 10 80
And a table B with the subset of hours in which a purchase happened, e.g.:
User Hour Purchase
Joe 3 'Soda'
Joe 9 'Coke'
Joe 10 'Doughnut'
I want to keep only those records from A whose hour is in B or at most 2 hours before an hour in B, without duplication, preserving both the heart_rate from A and the item purchased from B, so the outcome is:
User Hour Heart_rate Purchase
Joe 1 60 null
Joe 2 70 null
Joe 3 72 'Soda'
Joe 7 78 null
Joe 8 83 null
Joe 9 85 'Coke'
Joe 10 80 'Doughnut'
How can this result be achieved with an inner join, without duplication (in this case of hours 8 and 9, which each fall within 2 hours of two purchases)? (This is an MWE; assume multiple users and timestamps instead of hours.)
The obvious solution is to combine
Inner Join + deduplication
Left join
Can this be achieved in a more elegant way?
You could use an INNER join of the tables and conditional aggregation for the deduplication:
SELECT a.User, a.Hour, a.Heart_rate,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
WHERE a.User = 'Joe' -- remove this line if you want results for all users
GROUP BY a.User, a.Hour, a.Heart_rate;
Or with MAX() window function:
SELECT DISTINCT a.*,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) OVER (PARTITION BY a.User, a.Hour) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour;
See the demo (for MySql but it is standard SQL).
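The first query can be checked against the question's sample data. This sketch runs it on SQLite via Python's sqlite3 module (not MySQL), unchanged apart from an added ORDER BY and without the Joe-only filter; the helper function name is hypothetical.

```python
import sqlite3

HEART = [("Joe", h, r) for h, r in enumerate([60, 70, 72, 75, 68, 71, 78, 83, 85, 80], start=1)]
PURCHASES = [("Joe", 3, "Soda"), ("Joe", 9, "Coke"), ("Joe", 10, "Doughnut")]

SQL = """
SELECT a.User, a.Hour, a.Heart_rate,
       -- only a purchase made in the same hour is kept; other matches give NULL
       MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) AS Purchase
FROM a INNER JOIN b
  ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
GROUP BY a.User, a.Hour, a.Heart_rate   -- collapses hours matched by several purchases
ORDER BY a.User, a.Hour
"""


def join_near_purchases(heart, purchases):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (User TEXT, Hour INTEGER, Heart_rate INTEGER)")
    conn.execute("CREATE TABLE b (User TEXT, Hour INTEGER, Purchase TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", heart)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", purchases)
    return conn.execute(SQL).fetchall()
```

Hours 8 and 9 each match two rows of b (hours 9 and 10) before grouping; the GROUP BY leaves one row per hour, which gives exactly the seven rows of the expected outcome.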
Your solutions should work and sound good.
There is another way, using three nested SELECT statements.
The inner SELECT combines both tables with UNION ALL. Because only tables with the same columns can be combined, fields that exist in only one table have to be defined in the other as well and set to null. The column hour_eat is added to record when the purchase occurred. When this combined table is sorted by hour (with purchase rows first within the same hour), each row from table B is directly followed by the next row of table A.
In the middle SELECT statement, lag(Purchase) gets the Purchase of the previous row. If we only look at the rows from the first table, the Purchase value from the second table is now in the right place. This comes in handy if timestamps are used instead of fixed hours. The last_value(hour_eat ignore nulls) column calculates the time between the heart-rate measurement and the next purchase (0 or negative).
The outer SELECT filters the rows of interest: only rows from the first table, measured at most 2 hours before a purchase.
With
# WITH OFFSET returns each element's position, so Hour follows the array order
heart_tbl as (SELECT "Joe" as User, pos + 1 as Hour, Heart_rate from unnest([60,70,72,75,68,71,78,83,85,80]) Heart_rate with offset pos),
eat_tbl as (Select "Joe" as User ,3 Hour , 'Soda' as Purchase UNION ALL SELECT "Joe", 9, 'Coke' UNION ALL SELECT "Joe", 10, 'Doughnut' )
SELECT user, hour,heart_rate,Purchase_,hours_till_Purchase
from
(
SELECT *,
lag(Purchase) over (partition by user order by hour, heart_rate is not null) as Purchase_,
hour-last_value(hour_eat ignore nulls) over (partition by user order by hour desc,heart_rate is not null) as hours_till_Purchase
From # combine both tables to one table (ordered by hours)
(
SELECT user, hour,heart_rate, null as Purchase, null as hour_eat from heart_tbl
UNION ALL
Select user, hour, null as heart_rate, Purchase, hour from eat_tbl
)
)
Where heart_rate is not null and hours_till_Purchase >= -2
order by user, hour
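To make the three steps easier to follow, here is a plain-Python sketch that mirrors them row by row; it illustrates the logic and is not a replacement for the BigQuery query. It assumes one heart-rate row per user and hour, and the function and variable names are hypothetical.

```python
from itertools import groupby

HEART = [("Joe", h, r) for h, r in enumerate([60, 70, 72, 75, 68, 71, 78, 83, 85, 80], start=1)]
PURCHASES = [("Joe", 3, "Soda"), ("Joe", 9, "Coke"), ("Joe", 10, "Doughnut")]


def near_purchases(heart, purchases, window=2):
    # Inner SELECT: UNION ALL of both tables. kind 0 = purchase row (table B),
    # kind 1 = heart-rate row (table A); at equal hours purchases sort first,
    # mirroring "heart_rate is not null" in the window ORDER BY.
    events = [(user, hour, 0, item) for user, hour, item in purchases]
    events += [(user, hour, 1, rate) for user, hour, rate in heart]
    events.sort(key=lambda e: e[0])

    result = []
    for user, group in groupby(events, key=lambda e: e[0]):  # PARTITION BY user
        group = list(group)
        asc = sorted(group, key=lambda e: (e[1], e[2]))
        desc = sorted(group, key=lambda e: (-e[1], e[2]))

        # Middle SELECT, LAG(Purchase): a heart-rate row takes the item of the
        # event just before it, if that event is a purchase.
        item_at = {}
        for prev, cur in zip([None] + asc[:-1], asc):
            if cur[2] == 1:
                item_at[cur[1]] = prev[3] if prev is not None and prev[2] == 0 else None

        # Middle SELECT, LAST_VALUE(hour_eat IGNORE NULLS) over hour DESC:
        # the hour of the next purchase at or after each heart-rate row.
        next_purchase, seen = {}, None
        for _, hour, kind, _ in desc:
            if kind == 0:
                seen = hour
            else:
                next_purchase[hour] = seen

        # Outer SELECT: heart-rate rows at most `window` hours before a purchase.
        for _, hour, kind, rate in asc:
            nxt = next_purchase.get(hour)
            if kind == 1 and nxt is not None and hour - nxt >= -window:
                result.append((user, hour, rate, item_at[hour]))
    return result
```

For the sample data, hours 4 to 6 are dropped because the next purchase (hour 9) is 3 to 5 hours away, and the remaining rows match the expected outcome in the question.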

I want to group data from the following table by week in SQL

Date Sale Product Name
+---------------------------+------------+-----------+--------------+
2018-10-25 05:27:35.9070422 1000 P1
2018-10-18 05:27:35.9070422 2000 P2
2018-10-2 05:27:35.9070422 3050 P3
2018-10-10 05:27:35.9070422 1000 P4
2018-10-5 05:27:35.9070422 1000 P5
Let's suppose today is 26-05-18.
So my result should look like this.
Week Sales
+--------------------+------------+
1 4050
2 1000
3 2000
4 1000
For the week number you can use the DATEPART() function,
so your query should be:
select datepart(wk, r.date) numweek, sum(r.sale) totsales
from yourtable r
group by datepart(wk, r.date)
Please take a look at "Get week number from dates in T-SQL".
You can also subtract a numeric value N from datepart(wk, r.date) so that your week numbers start at 1; if you do, make sure the GROUP BY clause uses the same expression.
Hope this helps.
You need row_number():
select row_number() over (order by datepart(week, t.date)) as week, sum(t.sale) as sales
from yourtable t
group by datepart(week, t.date);
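The grouping both answers rely on can be sketched in Python. It groups by the Sunday that starts each week, assuming SQL Server's us_english default of SET DATEFIRST 7 (Sunday-start weeks), and then numbers the weeks from 1 the way ROW_NUMBER() does; the time of day is dropped. Grouping by week start rather than by DATEPART(week, ...) differs only for a week that spans New Year, where DATEPART starts a new week number on 1 January. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_sales(rows):
    """rows: iterable of (sale_date, amount) pairs; returns [(week_no, total), ...]."""
    totals = defaultdict(int)
    for day, amount in rows:
        # Sunday that starts this row's week (assumes Sunday-start weeks).
        # date.weekday() is 0 for Monday ... 6 for Sunday.
        week_start = day - timedelta(days=(day.weekday() + 1) % 7)
        totals[week_start] += amount
    # Like ROW_NUMBER() OVER (ORDER BY week): number the weeks 1, 2, 3, ...
    return [(n, totals[start]) for n, start in enumerate(sorted(totals), start=1)]


SALES = [
    (date(2018, 10, 25), 1000), (date(2018, 10, 18), 2000),
    (date(2018, 10, 2), 3050), (date(2018, 10, 10), 1000),
    (date(2018, 10, 5), 1000),
]
```

weekly_sales(SALES) puts 2 and 5 October in the same week (3050 + 1000 = 4050), which reproduces the expected Week/Sales table.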

Fill rows for missing data by last day of month

I have a table that looks like
UserID LastDayofMonth Count
1234 2015-09-30 00:00:00 12
1237 2015-09-30 00:00:00 5
3233 2015-09-30 00:00:00 3
8336 2015-09-30 00:00:00 22
1234 2015-10-31 00:00:00 8
1237 2015-10-31 00:00:00 5
3233 2015-10-31 00:00:00 7
8336 2015-11-30 00:00:00 52
1234 2015-11-30 00:00:00 8
1237 2015-11-30 00:00:00 5
3233 2015-11-30 00:00:00 7
(with around 10,000 rows). As you can see in the example, UserID 8336 has no record for October 31st (dates are monthly, but always the last day of the month, which I want to keep). How do I return a table that fills in the missing records over a period of four months, so that users like 8336 get records like
8336 2015-10-31 00:00:00 0
I do have a calendar table with all days that I can use.
If I understand correctly, you want a record for each user and for each end of month. And, if the record does not currently exist, then you want the value of 0.
This is a two-step process. First, generate all the rows using a cross join. Then use a left join to get the values.
So:
select u.userId, l.LastDayofMonth, coalesce(t.[Count], 0) as [Count]
from (select distinct userId from t) u cross join
(select distinct LastDayofMonth from t) l left join
t
on t.userId = u.userId and t.LastDayofMonth = l.LastDayofMonth;
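The cross join plus left join can be tried on the question's sample rows with SQLite through Python's sqlite3 module. The Count column is named cnt here, and the function name is hypothetical.

```python
import sqlite3

# The question's sample rows; the Count column is named cnt here.
ROWS = [
    (1234, "2015-09-30 00:00:00", 12), (1237, "2015-09-30 00:00:00", 5),
    (3233, "2015-09-30 00:00:00", 3), (8336, "2015-09-30 00:00:00", 22),
    (1234, "2015-10-31 00:00:00", 8), (1237, "2015-10-31 00:00:00", 5),
    (3233, "2015-10-31 00:00:00", 7), (8336, "2015-11-30 00:00:00", 52),
    (1234, "2015-11-30 00:00:00", 8), (1237, "2015-11-30 00:00:00", 5),
    (3233, "2015-11-30 00:00:00", 7),
]

FILL_SQL = """
SELECT u.userId, l.LastDayofMonth, COALESCE(t.cnt, 0) AS cnt
FROM (SELECT DISTINCT userId FROM t) AS u
CROSS JOIN (SELECT DISTINCT LastDayofMonth FROM t) AS l  -- every user x every month
LEFT JOIN t                                               -- NULL where a row is missing
  ON t.userId = u.userId AND t.LastDayofMonth = l.LastDayofMonth
ORDER BY l.LastDayofMonth, u.userId
"""


def filled(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (userId INTEGER, LastDayofMonth TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn.execute(FILL_SQL).fetchall()
```

The 11 input rows become 12 (4 users x 3 months), and the missing 8336 / 2015-10-31 combination appears with a count of 0.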
This solution uses a couple of CTEs, since I don't know your calendar table's layout. The only advantage this solution has over Gordon Linoff's is that it doesn't assume at least one user per possible month. I've provided test data per your example with an extra record for the month of July, skipping August entirely.
/************** TEST DATA ******************/
IF OBJECT_ID('MonthlyUserCount','U') IS NULL
BEGIN
CREATE TABLE MonthlyUserCount
(
UserID INT
, LastDayofMonth DATETIME
, [Count] INT
)
INSERT MonthlyUserCount
VALUES (1234,'2015-07-31 00:00:00',12),--extra record
(1234,'2015-09-30 00:00:00',12),
(1237,'2015-09-30 00:00:00',5),
(3233,'2015-09-30 00:00:00',3),
(8336,'2015-09-30 00:00:00',22),
(1234,'2015-10-31 00:00:00',8),
(1237,'2015-10-31 00:00:00',5),
(3233,'2015-10-31 00:00:00',7),
(8336,'2015-11-30 00:00:00',52),
(1234,'2015-11-30 00:00:00',8),
(1237,'2015-11-30 00:00:00',5),
(3233,'2015-11-30 00:00:00',7)
END
/************ END TEST DATA ***************/
DECLARE @Start DATETIME;
DECLARE @End DATETIME;
--establish a date range
SELECT @Start = MIN(LastDayofMonth) FROM MonthlyUserCount;
SELECT @End = MAX(LastDayofMonth) FROM MonthlyUserCount;
--create a custom calendar of days using the date range above and identify the last day of the month
--if your calendar table does this already, modify the next cte to mimic this functionality
WITH cteAllDays AS
(
SELECT @Start AS [Date], CASE WHEN DATEPART(mm, @Start) <> DATEPART(mm, @Start+1) THEN 1 ELSE 0 END [Last]
UNION ALL
SELECT [Date]+1, CASE WHEN DATEPART(mm,[Date]+1) <> DatePart(mm, [Date]+2) THEN 1 ELSE 0 END
FROM cteAllDays
WHERE [Date]< @End
),
--cte using calendar of days to associate every user with every end of month
cteUserAllDays AS
(
SELECT DISTINCT m.UserID, c.[Date] LastDayofMonth
FROM MonthlyUserCount m, cteAllDays c
WHERE [Last]=1
)
--left join the cte to evaluate the NULL and present a 0 count for that month
SELECT c.UserID, c.LastDayofMonth, ISNULL(m.[Count],0) [Count]
FROM cteUserAllDays c
LEFT JOIN MonthlyUserCount m ON m.UserID = c.UserID
AND m.LastDayofMonth =c.LastDayofMonth
ORDER BY c.LastDayofMonth, c.UserID
OPTION ( MAXRECURSION 0 )
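The same calendar approach can be sketched on SQLite (via Python's sqlite3): the recursive CTE walks day by day from the first to the last LastDayofMonth, and a day counts as a month end when the following day falls in a different month. The time part of the dates is dropped so SQLite's date() arithmetic applies directly, Count is renamed cnt, and the table and function names are assumptions for the example.

```python
import sqlite3

# The answer's test data without the time part; Count is named cnt here.
ROWS = [
    (1234, "2015-07-31", 12),  # extra July record; August has no rows at all
    (1234, "2015-09-30", 12), (1237, "2015-09-30", 5),
    (3233, "2015-09-30", 3), (8336, "2015-09-30", 22),
    (1234, "2015-10-31", 8), (1237, "2015-10-31", 5), (3233, "2015-10-31", 7),
    (8336, "2015-11-30", 52), (1234, "2015-11-30", 8),
    (1237, "2015-11-30", 5), (3233, "2015-11-30", 7),
]

SQL = """
WITH RECURSIVE all_days(d) AS (           -- one row per day from MIN to MAX
    SELECT MIN(LastDayofMonth) FROM monthly_user_count
    UNION ALL
    SELECT date(d, '+1 day') FROM all_days
    WHERE d < (SELECT MAX(LastDayofMonth) FROM monthly_user_count)
),
month_ends(d) AS (                        -- days whose next day is in another month
    SELECT d FROM all_days
    WHERE strftime('%m', d) <> strftime('%m', date(d, '+1 day'))
),
user_months AS (                          -- every user paired with every month end
    SELECT DISTINCT m.UserID, e.d AS LastDayofMonth
    FROM monthly_user_count AS m CROSS JOIN month_ends AS e
)
SELECT c.UserID, c.LastDayofMonth, COALESCE(m.cnt, 0) AS cnt
FROM user_months AS c
LEFT JOIN monthly_user_count AS m
  ON m.UserID = c.UserID AND m.LastDayofMonth = c.LastDayofMonth
ORDER BY c.LastDayofMonth, c.UserID
"""


def filled_with_calendar(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_user_count (UserID INTEGER, LastDayofMonth TEXT, cnt INTEGER)"
    )
    conn.executemany("INSERT INTO monthly_user_count VALUES (?, ?, ?)", rows)
    return conn.execute(SQL).fetchall()
```

Because the month ends come from the calendar rather than from the data, August 2015 appears for every user with a count of 0, giving 4 users x 5 months = 20 rows.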