I have members, the groups to which they belong, and the datetimes at which they were active. I want to find out which members had a gap of more than 3 months between dates, and I need to rank them.
header 1  header 2  Create Date                    Rank
--------  --------  -----------------------------  ----
11111     EAM       2022-01-27 12:23:28.474000000  1
11111     EAM       2022-08-25 10:41:15.500000000  2
11111     EAM       2022-09-01 18:15:07.362000000  2
11111     EAM       2022-09-08 13:03:38.859000000  2
11111     EAM       2022-10-06 18:15:07.245000000  2
11111     PEM       2022-07-25 10:41:15.500000000  1
11111     PEM       2022-08-25 10:41:15.500000000  1
11111     PEM       2022-09-26 13:03:38.859000000  1
The desired result is above with the rank; the table contains the data without the Rank column.
One method is to use LAG to get the prior date, compare the two dates and return 1 if they are more than 3 months apart, and then SUM those values in a windowed aggregate:
WITH CTE AS(
SELECT header1,
header2,
CreateDate,
CASE WHEN DATEDIFF(MONTH,LAG(CreateDate) OVER (PARTITION BY header2 ORDER BY CreateDate),CreateDate) > 3 THEN 1 ELSE 0 END AS Counter
FROM (VALUES(11111,'EAM',CONVERT(datetime2(7),'2022-01-27 12:23:28.474000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-08-25 10:41:15.500000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-09-01 18:15:07.362000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-09-08 13:03:38.859000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-10-06 18:15:07.245000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-07-25 10:41:15.500000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-08-25 10:41:15.500000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-09-26 13:03:38.859000000')))V(header1,header2,CreateDate))
SELECT header1,
header2,
CreateDate,
SUM(Counter) OVER (PARTITION BY header2 ORDER BY CreateDate) + 1 AS Rank
FROM CTE;
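To see what the flag-and-running-sum query computes, the same logic can be traced outside the database. This is a minimal Python sketch, not part of the original answer: `month_boundaries` and `rank_islands` are hypothetical helper names, and the sample rows drop the time of day for brevity. It assumes the T-SQL behaviour that `DATEDIFF(MONTH, a, b)` counts month boundaries crossed rather than full elapsed months.

```python
from datetime import datetime
from itertools import groupby

def month_boundaries(a, b):
    """Mimic DATEDIFF(MONTH, a, b): count month boundaries crossed."""
    return (b.year - a.year) * 12 + (b.month - a.month)

def rank_islands(rows, max_gap_months=3):
    """rows: iterable of (group, datetime). Returns (group, datetime, rank)."""
    out = []
    for group, items in groupby(sorted(rows), key=lambda r: r[0]):
        rank, prev = 1, None
        for _, ts in items:
            # Same test as the CASE expression over LAG(CreateDate);
            # bumping rank here plays the role of the running SUM.
            if prev is not None and month_boundaries(prev, ts) > max_gap_months:
                rank += 1
            out.append((group, ts, rank))
            prev = ts
    return out

rows = [
    ("EAM", datetime(2022, 1, 27)), ("EAM", datetime(2022, 8, 25)),
    ("EAM", datetime(2022, 9, 1)), ("EAM", datetime(2022, 9, 8)),
    ("EAM", datetime(2022, 10, 6)),
    ("PEM", datetime(2022, 7, 25)), ("PEM", datetime(2022, 8, 25)),
    ("PEM", datetime(2022, 9, 26)),
]
ranks = [rank for _, _, rank in rank_islands(rows)]
```

Because only boundaries are counted, 31 January to 1 February already counts as one month, so a "more than 3 months" test of this kind can be off by almost a month at either end.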
select header1
,header2
,Create_Date
,dense_rank() over(partition by header1, header2 order by flg) as Rank
from
(
select *
,case when datediff(month, Create_Date, lead(Create_Date) over(partition by header1, header2 order by Create_Date)) >= 3 then 0 else 1 end as flg
from t
) t
header1  header2  Create_Date              Rank
-------  -------  -----------------------  ----
11111    EAM      2022-01-27 12:23:28.000  1
11111    EAM      2022-08-25 10:41:15.000  2
11111    EAM      2022-09-01 18:15:07.000  2
11111    EAM      2022-09-08 13:03:38.000  2
11111    EAM      2022-10-06 18:15:07.000  2
11111    PEM      2022-07-25 10:41:15.000  1
11111    PEM      2022-08-25 10:41:15.000  1
11111    PEM      2022-09-26 13:03:38.000  1
I have members, the groups to which they belong, a file type, and the datetimes at which they were active. I want to rank each member's rows by group and file type, so that the rank increases whenever there is a gap of 1 month or more.
MID   Group  File Type  Create_Date
----  -----  ---------  -----------------------------
123A  EAM    Partial    2022-01-16 12:23:28.474000000
123A  EAM    Full       2022-03-01 10:41:15.500000000
123A  EAM    Full       2022-04-15 10:41:15.500000000
123A  EAM    Full       2022-05-26 10:41:15.500000000
123A  EAM    Full       2022-09-20 10:41:15.500000000
123A  EAM    Full       2022-10-05 10:41:15.500000000
This is the outcome I am looking for:
MID   Group  File Type  Create_Date                    Rank
----  -----  ---------  -----------------------------  ----
123A  EAM    Partial    2022-01-16 12:23:28.474000000  1
123A  EAM    Full       2022-03-01 10:41:15.500000000  2
123A  EAM    Full       2022-04-15 10:41:15.500000000  2
123A  EAM    Full       2022-05-26 10:41:15.500000000  2
123A  EAM    Full       2022-09-20 10:41:15.500000000  3
123A  EAM    Full       2022-10-05 10:41:15.500000000  3
The CTE computes a flag for each row: 1 when DATEDIFF reports more than 1 month since the previous row, or when the row is the first in its group (LAG returns NULL), and 0 otherwise, since no change of rank is needed.
The outer SELECT then only needs to keep a running sum of those flags.
WITH CTE As (SELECT
[MID], [Group], [File Type], [Create_Date]
,CASE WHEN DATEDIFF(MONTH, LAG([Create_Date]) OVER(PARTITION BY [Group] ORDER BY [Create_Date]), [Create_Date]) > 1
OR DATEDIFF(MONTH, LAG([Create_Date]) OVER(PARTITION BY [Group] ORDER BY [Create_Date]), [Create_Date]) IS NULL THEN 1 ELSE 0 END AS new_number
FROM tab1)
SELECT
[MID], [Group], [File Type], [Create_Date],
SUM(new_number) OVER(PARTITION BY [Group] ORDER BY [Create_Date] ) [Rank]
FROM CTE
MID   Group  File Type  Create_Date              Rank
----  -----  ---------  -----------------------  ----
123A  EAM    Partial    2022-01-16 12:23:28.473  1
123A  EAM    Full       2022-03-01 10:41:15.500  2
123A  EAM    Full       2022-04-15 10:41:15.500  2
123A  EAM    Full       2022-05-26 10:41:15.500  2
123A  EAM    Full       2022-09-20 10:41:15.500  3
123A  EAM    Full       2022-10-05 10:41:15.500  3
I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. However, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments.
Therefore, I would like to join T1 to T2 on ID and on T1Date plus or minus 7 days. This is a poorly designed database, so there is no way to match and align the records of the two tables other than the date range. I would appreciate some help with this, as I am stumped.
Here is some sample data:
Table #1:
ID  RAssessmentDate
--  ---------------
1   2020-01-03
1   2020-03-03
1   2020-05-03
2   2020-01-09
2   2020-04-09
3   2022-07-21
4   2020-06-30
4   2020-12-30
4   2021-06-30
4   2021-12-30
Table #2:
ID  NAssessmentDate
--  ---------------
1   2020-01-07
1   2020-03-02
1   2020-05-03
2   2020-01-09
2   2020-07-06
2   2020-04-10
3   2022-07-21
4   2021-01-03
4   2021-06-28
4   2022-01-02
4   2022-06-26
I would like my end result table to look like this:
ID  RAssessmentDate  NAssessmentDate
--  ---------------  ---------------
1   2020-01-03       2020-01-07
1   2020-03-03       2020-03-02
1   2020-05-03       2020-05-03
2   2020-01-09       2020-01-09
2   2020-04-09       2020-04-10
2   NULL             2020-07-06
3   2022-07-21       2022-07-21
4   2020-06-30       NULL
4   2020-12-30       2021-01-03
4   2021-06-30       2021-06-28
4   2021-12-30       2022-01-02
4   NULL             2022-01-02
Try this:
SELECT
COALESCE(a.ID, b.ID) ID,
a.RAssessmentDate,
b.NAssessmentDate
FROM (
SELECT
-- Order by the date so the row numbering within each ID is deterministic
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RAssessmentDate) RowId, *
FROM table1
) a
FULL OUTER JOIN (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAssessmentDate) RowId, *
FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
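Note that the query above pairs rows by their position within each ID rather than by how close the dates are. As an illustration of the plus-or-minus 7 days rule the question asks for, here is a minimal Python sketch. It is not the answerer's method: `match_within` is a hypothetical helper that greedily walks both sorted date lists for a single ID, and it assumes at most one counterpart per assessment.

```python
from datetime import date, timedelta

def match_within(r_dates, n_dates, tolerance_days=7):
    """Pair dates that lie within the tolerance; unmatched dates get None."""
    r_dates, n_dates = sorted(r_dates), sorted(n_dates)
    tol = timedelta(days=tolerance_days)
    pairs, i, j = [], 0, 0
    while i < len(r_dates) and j < len(n_dates):
        r, n = r_dates[i], n_dates[j]
        if abs(r - n) <= tol:
            pairs.append((r, n))      # close enough: one complete review
            i += 1
            j += 1
        elif r < n:
            pairs.append((r, None))   # RAssessment with no NAssessment
            i += 1
        else:
            pairs.append((None, n))   # NAssessment with no RAssessment
            j += 1
    pairs += [(r, None) for r in r_dates[i:]]
    pairs += [(None, n) for n in n_dates[j:]]
    return pairs

# Sample data for ID 2 from the question.
id2 = match_within(
    [date(2020, 1, 9), date(2020, 4, 9)],
    [date(2020, 1, 9), date(2020, 7, 6), date(2020, 4, 10)],
)
```

In SQL the equivalent idea would be a FULL OUTER JOIN on the ID plus a date-range condition, but that can produce duplicate pairings when two assessments fall inside the same window, which the greedy walk avoids.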
I am trying to collapse data that is in a sequence sorted by date, while grouping on the person and the type.
The data is stored in SQL Server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
ROW_NUMBER() OVER (ORDER BY date) AS seq,
person,
date,
type
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem, so you can take the difference of two row_number() values and use it in the grouping:
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.
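The difference-of-row-numbers trick can be hard to picture, so here is a minimal Python sketch of it. It is an illustration rather than the answerers' code: `collapse` is a hypothetical helper, the rows are a reduced sample in the shape of the question's data, and dates are kept as ISO strings so that plain comparison orders them correctly.

```python
from collections import defaultdict

def collapse(rows):
    """rows: list of (seq, person, date, type). Returns (person, start, type)."""
    per_person = defaultdict(int)        # row_number() per person
    per_person_type = defaultdict(int)   # row_number() per (person, type)
    islands = {}
    for seq, person, day, typ in sorted(rows):
        per_person[person] += 1
        per_person_type[(person, typ)] += 1
        diff = per_person[person] - per_person_type[(person, typ)]
        # The difference is constant within a run of equal types, but two
        # runs of different types can share it, so type is part of the key.
        key = (person, typ, diff)
        islands[key] = min(islands.get(key, day), day)
    return sorted((p, start, t) for (p, t, _), start in islands.items())

rows = [
    (1, 1, "2018-02-10", 1), (2, 1, "2018-02-11", 1),
    (3, 1, "2018-02-20", 2), (4, 1, "2018-02-21", 2),
    (5, 1, "2018-02-23", 1), (6, 1, "2018-02-25", 2),
    (7, 2, "2018-02-10", 1),
]
starts = collapse(rows)
```

In this sample the run starting at seq 3 (type 2) and the single row at seq 5 (type 1) both have a difference of 2, which shows why grouping on the difference alone would merge unrelated rows.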
I'm struggling trying to get DENSE_RANK to do what I want it to do.
It is basically to create a unique invoice number based on a unique identifier, but it needs to go up in order based on the date/time of the invoice.
For example I need:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
But what I get when using DENSE_RANK OVER (Order by TxnId) is:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
If I do DENSE_RANK OVER(TxnId,TxnDate), it is a complete mess and doesn't do what I want either.
Any ideas guys? Am I even using the right function to do this? Any help appreciated :)
I think you want:
select dense_rank() over (order by txnid, txndate)
Everything with the same transaction id and date will have the same value.
EDIT:
If you need to extract the date, then that depends on the database. It would look something like this. For Oracle:
select dense_rank() over (order by txnid, trunc(txndate))
For Postgres:
select dense_rank() over (order by txnid, date_trunc('day', txndate))
For SQL Server:
select dense_rank() over (order by txnid, cast(txndate as date))
EDIT II:
You want the transactions ordered by the earliest date. Get the earliest date and then do the dense_rank():
select dense_rank() over (order by txnmindate, txnid)
from (select t.*, min(txndate) over (partition by txnid) as txnmindate
from table t
) t
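The idea in the last query, ranking each transaction by its earliest date, can be sketched in a few lines of Python. This is only an illustration: `invoice_numbers` is a hypothetical helper, and the rows are a shortened version of the question's sample.

```python
from datetime import datetime

def invoice_numbers(rows):
    """rows: list of (txn_id, txn_date). Returns {txn_id: invoice_no}."""
    earliest = {}
    for txn_id, txn_date in rows:
        # Equivalent of MIN(txndate) OVER (PARTITION BY txnid).
        if txn_id not in earliest or txn_date < earliest[txn_id]:
            earliest[txn_id] = txn_date
    # Each id has exactly one (earliest date, id) pair, so its position in
    # the sorted order equals DENSE_RANK() OVER (ORDER BY txnmindate, txnid).
    ordered = sorted(earliest, key=lambda t: (earliest[t], t))
    return {txn_id: n for n, txn_id in enumerate(ordered, start=1)}

rows = [
    ("6C952E91-B888-4244-9079-14FBECAE0BA2", datetime(2014, 1, 1, 0, 2)),
    ("6C952E91-B888-4244-9079-14FBECAE0BA2", datetime(2014, 1, 1, 0, 1)),
    ("83168B53-1647-4EB9-AF17-0B285EAA69B4", datetime(2014, 3, 3, 0, 10)),
    ("8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F", datetime(2014, 2, 2, 0, 9)),
]
numbers = invoice_numbers(rows)
```

Including the id as a tie-breaker keeps two transactions that start at the same instant from sharing an invoice number.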
I have two SQL Server tables containing the following information:
Table t_venues:
venue_id is unique
venue_id | start_date | end_date
1 | 01/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 15/01/2014
4 | 20/01/2014 | 30/01/2014
Table t_venueuser:
venue_id is not unique
venue_id | start_date | end_date
1 | 02/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 10/01/2014
4 | 23/01/2014 | 25/01/2014
From these two tables I need to find the dates that haven't been selected for each range, so the output would look like this:
venue_id | start_date | end_date
1 | 01/01/2014 | 01/01/2014
3 | 11/01/2014 | 15/01/2014
4 | 20/01/2014 | 22/01/2014
4 | 26/01/2014 | 30/01/2014
I can compare the two tables and get the date ranges from t_venues to appear in my query using 'except' but I can't get the query to produce the non-selected dates. Any help would be appreciated.
Calendar Table!
Another perfect candidate for a calendar table. If you can't be bothered to search for one, here's one I made earlier.
Setup Data
CREATE TABLE #t_venues (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venues (venue_id, start_date, end_date)
VALUES (1, '2014-01-01', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-15')
, (4, '2014-01-20', '2014-01-30')
;
CREATE TABLE #t_venueuser (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venueuser (venue_id, start_date, end_date)
VALUES (1, '2014-01-02', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-10')
, (4, '2014-01-23', '2014-01-25')
;
The Query
SELECT t_venues.venue_id
, calendar.the_date
, CASE WHEN t_venueuser.venue_id IS NULL THEN 1 ELSE 0 END As is_available
FROM dbo.calendar /* see: http://gvee.co.uk/files/sql/dbo.numbers%20&%20dbo.calendar.sql for an example */
INNER
JOIN #t_venues As t_venues
ON t_venues.start_date <= calendar.the_date
AND t_venues.end_date >= calendar.the_date
LEFT
JOIN #t_venueuser As t_venueuser
ON t_venueuser.venue_id = t_venues.venue_id
AND t_venueuser.start_date <= calendar.the_date
AND t_venueuser.end_date >= calendar.the_date
ORDER
BY t_venues.venue_id
, calendar.the_date
;
The Result
venue_id the_date is_available
----------- ----------------------- ------------
1 2014-01-01 00:00:00.000 1
1 2014-01-02 00:00:00.000 0
2 2014-01-05 00:00:00.000 0
3 2014-01-09 00:00:00.000 0
3 2014-01-10 00:00:00.000 0
3 2014-01-11 00:00:00.000 1
3 2014-01-12 00:00:00.000 1
3 2014-01-13 00:00:00.000 1
3 2014-01-14 00:00:00.000 1
3 2014-01-15 00:00:00.000 1
4 2014-01-20 00:00:00.000 1
4 2014-01-21 00:00:00.000 1
4 2014-01-22 00:00:00.000 1
4 2014-01-23 00:00:00.000 0
4 2014-01-24 00:00:00.000 0
4 2014-01-25 00:00:00.000 0
4 2014-01-26 00:00:00.000 1
4 2014-01-27 00:00:00.000 1
4 2014-01-28 00:00:00.000 1
4 2014-01-29 00:00:00.000 1
4 2014-01-30 00:00:00.000 1
(21 row(s) affected)
The Explanation
Our calendar table contains an entry for every date.
We join our t_venues (as an aside, if you have the choice, lose the t_ prefix!) to return every day between our start_date and end_date. Example output for venue_id=4 for just this join:
venue_id the_date
----------- -----------------------
4 2014-01-20 00:00:00.000
4 2014-01-21 00:00:00.000
4 2014-01-22 00:00:00.000
4 2014-01-23 00:00:00.000
4 2014-01-24 00:00:00.000
4 2014-01-25 00:00:00.000
4 2014-01-26 00:00:00.000
4 2014-01-27 00:00:00.000
4 2014-01-28 00:00:00.000
4 2014-01-29 00:00:00.000
4 2014-01-30 00:00:00.000
(11 row(s) affected)
Now we have one row per day, we [outer] join our t_venueuser table. We join this in much the same manner as before, but with one added twist: we need to join based on the venue_id too!
Running this for venue_id=4 gives this result:
venue_id the_date t_venueuser_venue_id
----------- ----------------------- --------------------
4 2014-01-20 00:00:00.000 NULL
4 2014-01-21 00:00:00.000 NULL
4 2014-01-22 00:00:00.000 NULL
4 2014-01-23 00:00:00.000 4
4 2014-01-24 00:00:00.000 4
4 2014-01-25 00:00:00.000 4
4 2014-01-26 00:00:00.000 NULL
4 2014-01-27 00:00:00.000 NULL
4 2014-01-28 00:00:00.000 NULL
4 2014-01-29 00:00:00.000 NULL
4 2014-01-30 00:00:00.000 NULL
(11 row(s) affected)
See how we have a NULL value for rows where there is no t_venueuser record. Genius, no? ;-)
So in my first query I gave you a quick CASE statement that shows availability (1=available, 0=not available). This is for illustration only, but could be useful to you.
You can then either wrap the query up and then apply an extra filter on this calculated column or simply add a where clause in: WHERE t_venueuser.venue_id IS NULL and that will do the same trick.
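The per-day availability still has to be collapsed into start and end ranges to match the output the question asks for. The following Python sketch shows one way that final step could work. It is an assumption-laden illustration, not part of the answer: `free_ranges` is a hypothetical helper, and it walks the days directly instead of joining to a calendar table.

```python
from datetime import date, timedelta

def free_ranges(venues, bookings):
    """venues: {venue_id: (start, end)}; bookings: list of (venue_id, start, end)."""
    one_day = timedelta(days=1)
    result = []
    for venue_id, (start, end) in sorted(venues.items()):
        day, run_start = start, None
        while day <= end:
            booked = any(v == venue_id and s <= day <= e for v, s, e in bookings)
            if not booked and run_start is None:
                run_start = day            # a free run begins today
            if booked and run_start is not None:
                result.append((venue_id, run_start, day - one_day))
                run_start = None           # the free run ended yesterday
            day += one_day
        if run_start is not None:
            result.append((venue_id, run_start, end))
    return result

venues = {
    1: (date(2014, 1, 1), date(2014, 1, 2)),
    2: (date(2014, 1, 5), date(2014, 1, 5)),
    3: (date(2014, 1, 9), date(2014, 1, 15)),
    4: (date(2014, 1, 20), date(2014, 1, 30)),
}
bookings = [
    (1, date(2014, 1, 2), date(2014, 1, 2)),
    (2, date(2014, 1, 5), date(2014, 1, 5)),
    (3, date(2014, 1, 9), date(2014, 1, 10)),
    (4, date(2014, 1, 23), date(2014, 1, 25)),
]
gaps = free_ranges(venues, bookings)
```

In SQL the same collapse is usually done with a gaps-and-islands grouping over the rows where is_available = 1.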
This is a complete hack, but it gives the results you require. I've only tested it on the data you provided, so there may well be gotchas with larger sets.
In general, what you are solving here is a variation of the gaps-and-islands problem: briefly, a sequence in which some items are missing. The missing items are referred to as gaps and the existing items as islands. If you would like to understand the problem in general, check a few of these articles:
Simple talk article
blogs.MSDN article
SO answers tagged gaps-and-islands
Code:
;with dates as
(
SELECT vdates.venue_id,
vdates.vdate
FROM ( SELECT DATEADD(d,sv.number,v.start_date) vdate
, v.venue_id
FROM t_venues v
INNER JOIN master..spt_values sv
ON sv.type='P'
AND sv.number BETWEEN 0 AND datediff(d, v.start_date, v.end_date)) vdates
LEFT JOIN t_venueuser vu
ON vdates.vdate >= vu.start_date
AND vdates.vdate <= vu.end_date
AND vdates.venue_id = vu.venue_id
WHERE ISNULL(vu.venue_id,-1) = -1
)
SELECT venue_id, ISNULL([1],[2]) StartDate, [2] EndDate
FROM (SELECT venue_id, rDate, ROW_NUMBER() OVER (PARTITION BY venue_id, DateType ORDER BY rDate) AS rType, DateType as dType
FROM( SELECT d1.venue_id
,d1.vdate AS rDate
,'1' AS DateType
FROM dates AS d1
LEFT JOIN dates AS d0
ON DATEADD(d,-1,d1.vdate) = d0.vdate
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 1
AND ISNULL(d0.vdate, '01 Jan 1753') = '01 Jan 1753'
UNION
SELECT d1.venue_id
,ISNULL(d2.vdate,d1.vdate)
,'2'
FROM dates AS d1
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 2
) res
) src
PIVOT (MIN (rDate)
FOR dType IN
( [1], [2] )
) AS pvt
Results:
venue_id StartDate EndDate
1 2014-01-01 2014-01-01
3 2014-01-11 2014-01-15
4 2014-01-20 2014-01-22
4 2014-01-26 2014-01-30