Find the start and end date (set-based) in T-SQL

I have the data below:
Name Date
A 2011-01-01 01:00:00.000
A 2011-02-01 02:00:00.000
A 2011-03-01 03:00:00.000
B 2011-04-01 04:00:00.000
A 2011-05-01 07:00:00.000
The desired output is
Name StartDate EndDate
-------------------------------------------------------------------
A 2011-01-01 01:00:00.000 2011-04-01 04:00:00.000
B 2011-04-01 04:00:00.000 2011-05-01 07:00:00.000
A 2011-05-01 07:00:00.000 NULL
How can I achieve this in T-SQL using a set-based approach?
The DDL is as follows:
CREATE TABLE #t(PersonName VARCHAR(32), [Date] DATETIME)
-- ISO 8601 literals (yyyy-mm-ddThh:mm:ss) are read the same way under any DATEFORMAT/language setting
INSERT INTO #t VALUES('A', '2011-01-01T01:00:00')
INSERT INTO #t VALUES('A', '2011-02-01T02:00:00')
INSERT INTO #t VALUES('A', '2011-03-01T03:00:00')
INSERT INTO #t VALUES('B', '2011-04-01T04:00:00')
INSERT INTO #t VALUES('A', '2011-05-01T07:00:00')
Select * from #t

;WITH cte1
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY PersonName
ORDER BY Date) AS G
FROM #t),
cte2
AS (SELECT PersonName,
MIN([Date]) StartDate,
ROW_NUMBER() OVER (ORDER BY MIN([Date])) AS rn
FROM cte1
GROUP BY PersonName,
G)
SELECT a.PersonName,
a.StartDate,
b.StartDate AS EndDate
FROM cte2 a
LEFT JOIN cte2 b
ON a.rn + 1 = b.rn
ORDER BY a.rn
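A minimal sketch for checking the query above outside SQL Server, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or later (the first release with window functions). SQLite is only a stand-in for T-SQL here: the table is a hypothetical t with ISO-8601 text dates (which sort chronologically), and the row numbering of the runs is moved into its own CTE for portability.

```python
import sqlite3

# The question's intended sample: 1 Jan, 1 Feb, 1 Mar, 1 Apr, 1 May 2011.
ROWS = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-02-01 02:00:00"),
    ("A", "2011-03-01 03:00:00"),
    ("B", "2011-04-01 04:00:00"),
    ("A", "2011-05-01 07:00:00"),
]

QUERY = """
WITH cte1 AS (
    -- Global row number minus per-person row number is constant within a run.
    SELECT PersonName, [Date],
           ROW_NUMBER() OVER (ORDER BY [Date])
         - ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Date]) AS G
    FROM t
),
cte2 AS (
    -- One row per run, holding the run's first date.
    SELECT PersonName, MIN([Date]) AS StartDate
    FROM cte1
    GROUP BY PersonName, G
),
cte3 AS (
    -- Number the runs chronologically so each can be joined to the next one.
    SELECT *, ROW_NUMBER() OVER (ORDER BY StartDate) AS rn
    FROM cte2
)
SELECT a.PersonName, a.StartDate, b.StartDate AS EndDate
FROM cte3 a
LEFT JOIN cte3 b ON a.rn + 1 = b.rn
ORDER BY a.rn
"""


def runs_with_end_dates(rows):
    """Return (PersonName, StartDate, EndDate) tuples; EndDate is None for the last run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (PersonName TEXT, [Date] TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in runs_with_end_dates(ROWS):
    print(row)
```

The three rows printed correspond to the desired output in the question.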
CTE results are generally not materialised, however, so cte2 may be evaluated once for each reference in the self-join.
You may well find you get better performance if you materialise the
intermediate result yourself, as below.
CREATE TABLE #t2 (
rn INT IDENTITY(1, 1) PRIMARY KEY,
PersonName VARCHAR(32),
StartDate DATETIME );
INSERT INTO #t2 (PersonName, StartDate)
SELECT PersonName,
MIN([Date]) StartDate
FROM (SELECT *,
ROW_NUMBER() OVER (ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY PersonName
ORDER BY Date) AS G
FROM #t) t
GROUP BY PersonName,
G
ORDER BY StartDate
SELECT a.PersonName,
a.StartDate,
b.StartDate AS EndDate
FROM #t2 a
LEFT JOIN #t2 b
ON a.rn + 1 = b.rn
ORDER BY a.rn

SELECT
PersonName,
StartDate = MIN(Date),
EndDate
FROM (
SELECT
PersonName,
Date,
EndDate = (
/* get the earliest date after current date
associated with a different person */
SELECT MIN(t1.Date)
FROM #t AS t1
WHERE t1.Date > t.Date
AND t1.PersonName <> t.PersonName
)
FROM #t AS t
) s
GROUP BY PersonName, EndDate
ORDER BY 2
Basically, for every Date we find the nearest later date that is associated with a different PersonName. That gives us EndDate, which now distinguishes consecutive groups of dates for the same person.
Now we only need to group the data by PersonName and EndDate and take the minimum Date in each group as StartDate. And yes, sort the data by StartDate, of course.
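The grouping logic can be checked with a small sketch. It assumes Python's sqlite3 module stands in for SQL Server and uses a hypothetical table t with ISO-8601 text dates. No window functions are involved, so any reasonably recent SQLite should run it.

```python
import sqlite3

ROWS = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-02-01 02:00:00"),
    ("A", "2011-03-01 03:00:00"),
    ("B", "2011-04-01 04:00:00"),
    ("A", "2011-05-01 07:00:00"),
]

QUERY = """
SELECT PersonName, MIN([Date]) AS StartDate, EndDate
FROM (
    SELECT PersonName, [Date],
           -- earliest later date that belongs to a different person
           (SELECT MIN(t1.[Date])
              FROM t AS t1
             WHERE t1.[Date] > t.[Date]
               AND t1.PersonName <> t.PersonName) AS EndDate
    FROM t
) s
GROUP BY PersonName, EndDate
ORDER BY StartDate
"""


def runs_by_next_other_person(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (PersonName TEXT, [Date] TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in runs_by_next_other_person(ROWS):
    print(row)
```

The last run has no later date belonging to another person, so its EndDate is NULL (None in Python). All of that run's rows share the NULL, which is why they still form a single group.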

Number the rows so you know where the previous record is. Then join each record to the record after it; when the state changes, we have a candidate row. (This answer uses a table test(state, timestamp_) rather than the question's #t.)
select
state,
min(start_timestamp),
max(end_timestamp)
from
(
select
first.state,
first.timestamp_ as start_timestamp,
second.timestamp_ as end_timestamp
from
(
select
*, row_number() over (order by timestamp_) as id
from test
) as first
left outer join
(
select
*, row_number() over (order by timestamp_) as id
from test
) as second
on
first.id = second.id - 1
and first.state != second.state
) as agg
group by state
having max(end_timestamp) is not null
union
-- the last row won't have an ending row
-- PostgreSQL: (select state, timestamp_, null from test order by timestamp_ desc limit 1)
-- SQL Server equivalent:
select state, timestamp_, null from (select top (1) state, timestamp_ from test order by timestamp_ desc) as last_row
order by 2
;
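Tested with PostgreSQL, but it should work with SQL Server as well. Note that grouping by state alone merges non-adjacent runs of the same state: with states A, B, A, B, for example, it would return a single A row spanning both A runs.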

The other answer with the CTE is a good one. Another option is to iterate over the rows. It's not set-based, but it is another way to do it.
You would need to iterate either (A) to assign each record a unique id for the run it belongs to, or (B) to produce your output directly.
T-SQL is not ideal for iterating over records, especially if you have a lot of them, so I would recommend doing it some other way, such as a small .NET program or something else that is better at iteration.
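For comparison, here is a rough sketch of the row-by-row idea in a general-purpose language. Python stands in for the ".NET program" mentioned above. The code sorts the rows by date and starts a new run whenever the name changes. When a new run starts, the previous run is closed at the new run's start date, which matches the desired output.

```python
def runs_by_iteration(rows):
    """rows: iterable of (name, date) pairs whose dates sort chronologically.

    Returns a list of (name, start, end) tuples, where end is the start of the
    following run, or None for the final run.
    """
    runs = []
    for name, when in sorted(rows, key=lambda r: r[1]):
        if runs and runs[-1][0] == name:
            continue  # same person as the current run: nothing changes
        if runs:
            prev_name, prev_start, _ = runs[-1]
            runs[-1] = (prev_name, prev_start, when)  # close the previous run
        runs.append((name, when, None))  # open a new run
    return runs


ROWS = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-02-01 02:00:00"),
    ("A", "2011-03-01 03:00:00"),
    ("B", "2011-04-01 04:00:00"),
    ("A", "2011-05-01 07:00:00"),
]

for run in runs_by_iteration(ROWS):
    print(run)
```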

There's a very quick way to do this using a bit of Gaps and Islands theory:
WITH CTE as (SELECT PersonName, [Date]
, Row_Number() over (ORDER BY [Date])
- Row_Number() over (ORDER BY PersonName, [Date]) as Island
FROM #t)
Select PersonName, Min([Date]), Max([Date])
from CTE
GROUP BY Island, PersonName
ORDER BY Min([Date])
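The easiest way to see what Island holds is to run the query. This sketch assumes Python's sqlite3 module with SQLite 3.25+ standing in for SQL Server, plus a hypothetical table t. Max([Date]) gives the last date inside each run, not the start of the next run as in the question's desired output. Different people can end up with the same Island value, which is why PersonName is part of the GROUP BY as well.

```python
import sqlite3

ROWS = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-02-01 02:00:00"),
    ("A", "2011-03-01 03:00:00"),
    ("B", "2011-04-01 04:00:00"),
    ("A", "2011-05-01 07:00:00"),
]

QUERY = """
WITH CTE AS (
    SELECT PersonName, [Date],
           ROW_NUMBER() OVER (ORDER BY [Date])
         - ROW_NUMBER() OVER (ORDER BY PersonName, [Date]) AS Island
    FROM t
)
SELECT PersonName, MIN([Date]) AS StartDate, MAX([Date]) AS LastDate
FROM CTE
GROUP BY Island, PersonName
ORDER BY MIN([Date])
"""


def islands(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (PersonName TEXT, [Date] TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in islands(ROWS):
    print(row)
```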


How to get date ranges matching a criterion

I can get the desired output using a WHILE loop, but since the original table has thousands of records, performance is very slow.
How can I get the desired results using a common table expression?
Thank you.
This will produce the desired results. It's not as elegant as Gordon's, but it does allow for gaps in dates and duplicate dates.
If you have a Calendar/Tally Table, the cte logic can be removed.
Example
CREATE TABLE #YourTable ([AsOfDate] Date,[SecurityID] varchar(50),[IsHeld] bit)
Insert Into #YourTable Values
('2017-05-19','S1',1)
,('2017-05-20','S1',1)
,('2017-05-21','S1',1)
,('2017-05-22','S1',1)
,('2017-05-23','S1',0)
,('2017-05-24','S1',0)
,('2017-05-25','S1',0)
,('2017-05-26','S1',1)
,('2017-05-27','S1',1)
,('2017-05-28','S1',1)
,('2017-05-29','S1',0)
,('2017-05-30','S1',0)
,('2017-05-31','S1',1)
;with cte1 as ( Select D1=min(AsOfDate),D2=max(AsOfDate) From #YourTable )
,cte2 as (
Select Top (DateDiff(DAY,(Select D1 from cte1),(Select D2 from cte1))+1)
D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),(Select D1 from cte1))
,R=Row_Number() over (Order By (Select Null))
From master..spt_values n1,master..spt_values n2
)
Select [SecurityID]
,[StartDate] = min(D)
,[EndDate] = max(D)
From (
Select *,Grp = dense_rank() over (partition by securityId order by asofdate )-R
From #YourTable A
Join cte2 B on AsOfDate=B.D
Where IsHeld=1
) A
Group By [SecurityID],Grp
Order By min(D)
Returns
SecurityID StartDate EndDate
S1 2017-05-19 2017-05-22
S1 2017-05-26 2017-05-28
S1 2017-05-31 2017-05-31
This is a variant of the gaps-and-islands problem. In this case, you can use date arithmetic to calculate the rows with adjacent dates:
select securityId, isheld, min(asofdate), max(asofdate)
from (select t.*,
dateadd(day,
- row_number() over (partition by securityId, isheld
order by asofdate
),
asofdate) as grp
from t
) t
group by grp, securityId, isheld;
Note: This assumes that the dates are contiguous and have no duplicates. The query can be modified to take those factors into account.
The basic idea is that if you have a sequence of days increasing one at a time, then subtracting an increasing sequence of values (the row number, in days) gives a constant. That constant is grp. The rest is just aggregation.
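You can check the arithmetic behind grp by hand. This small Python sketch uses only the standard library and is restricted to the IsHeld = 1 dates from the example above. It subtracts each date's row number, in days, and groups on the result:

```python
from datetime import date, timedelta
from itertools import groupby

# IsHeld = 1 dates from the example data (May 2017).
HELD = [date(2017, 5, d) for d in (19, 20, 21, 22, 26, 27, 28, 31)]


def date_islands(dates):
    """Return (start, end) pairs for runs of consecutive days.

    Mirrors grp = asofdate - row_number() days: within a run of consecutive
    days both the date and the row number grow by one, so the key stays
    constant. Assumes no duplicate dates.
    """
    ordered = sorted(dates)
    keyed = [(d - timedelta(days=rn), d) for rn, d in enumerate(ordered, start=1)]
    islands = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in members]
        islands.append((days[0], days[-1]))
    return islands


for start, end in date_islands(HELD):
    print(start, end)
```

The three ranges match the IsHeld = 1 rows returned by the calendar-table answer above.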

Day after max date in data

I am loading data into a table. I don't have any information on how frequently or when the source data is loaded; all I know is that I need data from the source to run my script.
Here's the issue: if I run max(date) I get the latest date from the source, but I don't know whether that data is still loading. I've run into cases where I've only gotten a percentage of the data. Thus, I need the latest business day before max(date).
I want to know whether there is a way to get the second-latest date in the system. I know I can use max(date) - 1, but that gives me literally the previous calendar day, which is not what I need.
For example, if I run the script on Tuesday, max(date) will be Monday, but since weekends are not in the source system, I need to get Friday instead of Monday.
DATE
---------
2017-04-29
2017-04-25
2017-04-21
2017-04-19
2017-04-18
2017-04-15
2017-04-10
max(date) = 2017-04-29
How do I get 2017-04-25?
Depending on your version of SQL Server, you can use a windowing function like row_number:
select [Date]
from
(
select [Date],
rn = row_number() over(order by [Date] desc)
from #yourtable
) d
where rn = 2
If the same date can appear more than once, you can apply DISTINCT first:
;with cte as
(
select distinct [date]
from #yourtable
)
select [date]
from
(
select [date],
rn = row_number() over(order by [date] desc)
from cte
) x
where rn = 2;
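To see why DISTINCT matters, here is a sketch that assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server. It uses a hypothetical yourtable in which the latest date is duplicated. Without DISTINCT, row 2 is simply the second copy of the latest date.

```python
import sqlite3

# Sample dates from the question, plus a duplicate of the latest one.
DATES = ["2017-04-29", "2017-04-29", "2017-04-25", "2017-04-21",
         "2017-04-19", "2017-04-18", "2017-04-15", "2017-04-10"]

WITH_DISTINCT = """
WITH cte AS (SELECT DISTINCT [date] FROM yourtable)
SELECT [date]
FROM (SELECT [date], ROW_NUMBER() OVER (ORDER BY [date] DESC) AS rn FROM cte) x
WHERE rn = 2
"""

WITHOUT_DISTINCT = """
SELECT [date]
FROM (SELECT [date], ROW_NUMBER() OVER (ORDER BY [date] DESC) AS rn FROM yourtable) x
WHERE rn = 2
"""


def run(query, dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE yourtable ([date] TEXT)")
        con.executemany("INSERT INTO yourtable VALUES (?)", [(d,) for d in dates])
        return con.execute(query).fetchone()[0]
    finally:
        con.close()


print(run(WITHOUT_DISTINCT, DATES))  # prints 2017-04-29
print(run(WITH_DISTINCT, DATES))     # prints 2017-04-25
```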
You can use row_number and take the second row, as below:
select * from ( select *, RowN = row_number() over (order by date desc) from yourtable ) a
where a.RowN = 2
SQL Server 2012 and later support OFFSET ... FETCH:
select date
from tablename
order by date desc
offset 1 row fetch first 1 row only
OFFSET 1 means skip one row. (The 2017-04-29 row.)
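SQLite has no OFFSET ... FETCH syntax. A stand-in sketch therefore uses SQLite's LIMIT ... OFFSET, which also skips the first row of the sorted result. It assumes Python's sqlite3 module and a hypothetical table named tablename.

```python
import sqlite3

DATES = ["2017-04-29", "2017-04-25", "2017-04-21", "2017-04-19",
         "2017-04-18", "2017-04-15", "2017-04-10"]


def second_latest(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename ([date] TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?)", [(d,) for d in dates])
        # LIMIT 1 OFFSET 1: skip the latest row and return the next one.
        row = con.execute(
            "SELECT [date] FROM tablename ORDER BY [date] DESC LIMIT 1 OFFSET 1"
        ).fetchone()
        return row[0] if row else None
    finally:
        con.close()


print(second_latest(DATES))  # prints 2017-04-25
```

If the table has only one row, nothing is left after the offset, so the sketch returns None.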
;With cte([DATE])
AS
(
SELECT '2017-04-29' union all
SELECT '2017-04-25' union all
SELECT '2017-04-21' union all
SELECT '2017-04-19' union all
SELECT '2017-04-18' union all
SELECT '2017-04-15' union all
SELECT '2017-04-10'
)
SELECT [DATE] FROM
(
-- order by the date itself; ordering by a constant would number the rows arbitrarily
SELECT *, ROW_NUMBER() OVER (ORDER BY [DATE] DESC) - 1 AS Rno FROM cte
) Final
WHERE Final.Rno = 1
Output
DATE
-----
2017-04-25
You can also use FIRST_VALUE with a dynamic cutoff date, something like DATEADD(DD, -1, GETDATE()). The example below hard-codes the cutoff as the latest date in the sample (2017-04-29), so it returns 2017-04-25.
SELECT DISTINCT
FIRST_VALUE([date]) OVER(ORDER BY [date] DESC) AS FirstDate
FROM CTE
WHERE [date] < '2017-04-29'
Another way
CREATE TABLE #T ([DATE] DATE)
INSERT INTO #T VALUES
('2017-04-29'),
('2017-04-25'),
('2017-04-21'),
('2017-04-19'),
('2017-04-18'),
('2017-04-15'),
('2017-04-10');
SELECT
MAX([DATE]) AS [DATE]
FROM #T
WHERE DATENAME(DW,[DATE]) NOT IN ('Saturday','Sunday')
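This query returns the latest weekday date. That happens to be 2017-04-25 here only because 2017-04-29 falls on a Saturday. DATENAME also returns day names in the session's language, so the English names above assume an English setting. Here is a quick standard-library check in Python; date.weekday() returns 0 for Monday through 6 for Sunday.

```python
from datetime import date

DATES = [date(2017, 4, d) for d in (29, 25, 21, 19, 18, 15, 10)]


def latest_weekday(dates):
    """Latest date that is not a Saturday (5) or Sunday (6)."""
    return max(d for d in dates if d.weekday() < 5)


print(date(2017, 4, 29).strftime("%A"))  # prints Saturday
print(latest_weekday(DATES))             # prints 2017-04-25
```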
Another way of doing it, just for example's sake:
SELECT MIN(A.date)
FROM
(
SELECT DISTINCT TOP 2 date
FROM YourTable AS C
ORDER BY date DESC
) AS A

Convert a list of dates to date ranges in SQL Server

I have the following query:
SELECT [Date] FROM [TableX] ORDER BY [Date]
The result is:
2016-06-01
2016-06-03
2016-06-10
2016-06-11
How can I get following pairs?
From To
2016-06-01 2016-06-03
2016-06-03 2016-06-10
2016-06-10 2016-06-11
If you're using SQL Server 2012 or later, you can use the LEAD function. From the documentation:
Accesses data from a subsequent row in the same result set without the use of a self-join starting with SQL Server 2012. LEAD provides access to a row at a given physical offset that follows the current row.
I think it would look like this for you:
SELECT [Date] AS [From], LEAD([Date], 1) OVER (ORDER BY [Date]) AS [To]
FROM TableX
ORDER BY [Date]
Note that on the last row, the [To] field will be NULL. If you wanted to remove that row, you could put it in an inner query:
SELECT *
FROM
(
SELECT [Date] AS [From], LEAD([Date], 1) OVER (ORDER BY [Date]) AS [To]
FROM TableX
) x
WHERE [To] IS NOT NULL
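Here is a sketch of the filtered LEAD query that assumes Python's sqlite3 module (SQLite 3.25+, which has LEAD) as a stand-in for SQL Server. The table is a hypothetical TableX holding the four sample dates. An outer ORDER BY is added so that the output order is guaranteed.

```python
import sqlite3

DATES = ["2016-06-01", "2016-06-03", "2016-06-10", "2016-06-11"]

QUERY = """
SELECT *
FROM (
    SELECT [Date] AS [From],
           LEAD([Date], 1) OVER (ORDER BY [Date]) AS [To]  -- next row's date
    FROM TableX
) x
WHERE [To] IS NOT NULL  -- drop the last row, which has no successor
ORDER BY [From]
"""


def date_pairs(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE TableX ([Date] TEXT)")
        con.executemany("INSERT INTO TableX VALUES (?)", [(d,) for d in dates])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for pair in date_pairs(DATES):
    print(pair)
```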
All you need to do is add a row number for each date.
Then pair each row with the next row (except the last row):
WITH cteDates AS
(
SELECT [Date],
ROW_NUMBER() OVER (ORDER BY [Date]) As RowNum
FROM TableX
)
SELECT TOP(SELECT COUNT(*) - 1 FROM cteDates)
[Date] [From],
(SELECT [Date] FROM cteDates WHERE RowNum = d.RowNum + 1) [To]
FROM cteDates d
ORDER BY d.RowNum -- without ORDER BY, TOP may not skip the last row
A slightly tricky solution for SQL Server 2008, which does not have LEAD:
create table #tbl (dt datetime)
insert #tbl values
('2016-06-01'),
('2016-06-03'),
('2016-06-10'),
('2016-06-11')
;with cte as (
select dt, ROW_NUMBER() over(order by dt) rn --add number
from #tbl
),
newTbl as (
select t1.dt start, t2.dt [end]
from cte t1 inner join cte t2 on t1.rn+1=t2.rn
)
select *
from newTbl
The result is what you wish.
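The ROW_NUMBER self-join does the same pairing without LEAD. Here is a hedged check that assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server 2008. It compares the query's result against zipping the sorted list with itself shifted by one.

```python
import sqlite3

DATES = ["2016-06-01", "2016-06-03", "2016-06-10", "2016-06-11"]

QUERY = """
WITH cte AS (
    SELECT dt, ROW_NUMBER() OVER (ORDER BY dt) AS rn  -- number the rows
    FROM tbl
)
SELECT t1.dt AS [start], t2.dt AS [end]
FROM cte t1
INNER JOIN cte t2 ON t1.rn + 1 = t2.rn  -- pair each row with the next one
ORDER BY t1.rn
"""


def pairs_by_self_join(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (dt TEXT)")
        con.executemany("INSERT INTO tbl VALUES (?)", [(d,) for d in dates])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Row n paired with row n + 1 is exactly zip(sorted, sorted[1:]).
ordered = sorted(DATES)
print(pairs_by_self_join(DATES) == list(zip(ordered, ordered[1:])))  # prints True
```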
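If, as you stated, there are never any gaps between the dates, you can just use DATEADD(). (When there are gaps, as in the sample data above, this does not produce the pairs shown.)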
SELECT DISTINCT
[Date] as [FROM],
DATEADD(DAY,1,[Date]) as [TO]
FROM TableX
ORDER BY [Date] DESC

Select the current date; if it doesn't exist, look for a future one, else take the highest from the past

I have some data with multiple dates across several rows in a table.
What I want to do is get the date that is currently active. If there is no active date, I want to take a future one, and if that doesn't exist either, I'll take one from the past.
table: date_from datetime, date_to datetime, userid varchar
2016-01-01 2016-03-25 Bob
2016-03-26 2016-05-01 Bob
2016-05-02 2016-04-25 Bob
2016-01-01 2016-03-25 Larry
2016-05-02 2016-04-25 Larry
2016-01-01 2016-03-25 Todd
For Bob I want to get the date_from value 2016-03-26,
while for Larry I want 2016-05-02,
and for Todd 2016-01-01.
Here is my SQL so far (it also gets the most recent date_from where the datediff from the previous row's date_to is greater than a variable):
insert into table1 (date_from, resource_id)
select date_from, resource_id
from
(select t.*, row_number() over
(partition by resource_id order by
(case when datediff(day, prev_date_to, date_from) > $days
then 1 else 2 end), date_from ASC ) as seqnum
from
(select t.*, lag(date_to) over
(partition by resource_id order by date_from) as prev_date_to
from table2 t where user = '$user' and date_from <= getdate()
) t
) t where seqnum = 1
I know how to check for the current one and, if it doesn't exist, get a past one, or I can make it get a future one, but I don't understand how to make it check for future dates and then go backwards if there are none.
I think you are looking for something like this?
CREATE TABLE #table (
date_from DATE,
date_to DATE,
userid VARCHAR(50));
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Bob';
INSERT INTO #table SELECT '2016-03-26', '2016-05-01', 'Bob';
INSERT INTO #table SELECT '2016-05-02', '2016-04-25', 'Bob';
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Larry';
INSERT INTO #table SELECT '2016-05-02', '2016-04-25', 'Larry';
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Todd';
WITH Active AS (
SELECT * FROM #table WHERE GETDATE() BETWEEN date_from AND date_to),
Latest AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY date_from DESC) AS row_id FROM #table)
SELECT
l.userid,
ISNULL(a.date_from, l.date_from) AS date_from
FROM
Latest l
LEFT JOIN Active a ON a.userid = l.userid
WHERE
l.row_id = 1;
Results are:
userid date_from
Bob 2016-03-26
Larry 2016-05-02
Todd 2016-01-01
How does it work? I am just breaking down the problem into two steps, one to find any current records and one to find the latest record for each user. The logic is then:
if there is a current record for this user then return the start date of this record;
if there isn't a current record for this user then return the latest start date for the user. This will either be in the future or the past. Because we take the latest date we will prioritise future over past.
This won't work if a user has multiple current records, and I am not sure I fully grasped your meaning of a "current record".
Obviously, if you don't like common table expressions, you could convert this to use subqueries (as in your example).
Anyway, it might give you a starting point?
EDIT
Okay, I just noticed your logic is that you want (in priority order):
a current record;
the earliest future record;
the latest historical record.
This means a few tweaks, so here goes:
CREATE TABLE #table (
date_from DATE,
date_to DATE,
userid VARCHAR(50));
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Bob';
INSERT INTO #table SELECT '2016-03-26', '2016-05-01', 'Bob';
INSERT INTO #table SELECT '2016-05-02', '2016-04-25', 'Bob';
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Larry';
INSERT INTO #table SELECT '2016-05-02', '2016-04-25', 'Larry';
INSERT INTO #table SELECT '2016-01-01', '2016-03-25', 'Todd';
WITH Active AS (
SELECT * FROM #table WHERE GETDATE() BETWEEN date_from AND date_to),
LatestPast AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY date_from DESC) AS row_id FROM #table WHERE date_to < GETDATE()),
EarliestFuture AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY date_from ASC) AS row_id FROM #table WHERE date_from > GETDATE()),
Users AS (
SELECT DISTINCT userid FROM #table)
SELECT
u.userid,
COALESCE(a.date_from, f.date_from, p.date_from) AS date_from
FROM
Users u
LEFT JOIN LatestPast p ON p.userid = u.userid AND p.row_id = 1
LEFT JOIN EarliestFuture f ON f.userid = u.userid AND f.row_id = 1
LEFT JOIN Active a ON a.userid = u.userid;
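The priority order (current, then earliest future, then latest past) can also be sketched in plain Python. This is a hedged illustration and not the answer's SQL. It takes a fixed reference date instead of GETDATE() so that the result is reproducible; 2016-04-01 is an assumed "today". When a user has several current periods, the sketch picks the earliest one.

```python
from datetime import date

PERIODS = [
    (date(2016, 1, 1), date(2016, 3, 25), "Bob"),
    (date(2016, 3, 26), date(2016, 5, 1), "Bob"),
    (date(2016, 5, 2), date(2016, 4, 25), "Bob"),
    (date(2016, 1, 1), date(2016, 3, 25), "Larry"),
    (date(2016, 5, 2), date(2016, 4, 25), "Larry"),
    (date(2016, 1, 1), date(2016, 3, 25), "Todd"),
]


def pick_date_from(periods, today):
    """Return {userid: date_from}: current, else earliest future, else latest past."""
    by_user = {}
    for date_from, date_to, userid in periods:
        by_user.setdefault(userid, []).append((date_from, date_to))

    chosen = {}
    for userid, rows in by_user.items():
        active = [f for f, t in rows if f <= today <= t]
        future = [f for f, _ in rows if f > today]
        past = [f for f, t in rows if t < today]
        if active:
            chosen[userid] = min(active)   # current record
        elif future:
            chosen[userid] = min(future)   # earliest future start
        elif past:
            chosen[userid] = max(past)     # latest historical start
        else:
            chosen[userid] = None
    return chosen


print(pick_date_from(PERIODS, today=date(2016, 4, 1)))
```

With that reference date, Bob, Larry, and Todd get 2016-03-26, 2016-05-02, and 2016-01-01, as the question asks.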

Filling a table with datetimes incremented by one second each

I have a column MyDatabase.MyTable.DateCol in a table with a few thousand rows, which I want to fill with datetime values. I want each value to be one second later than the previous one. How can I do that?
Sample Table
CREATE Table DateTable
(ID INT IDENTITY(1,1),Name NVARCHAR(300), Data Datetime)
GO
Test Data
INSERT INTO DateTable (Name)
VALUES ('John'),('Mark'),('Phil'),('Simon'),('Sam'),('Pete'),('Josh')
GO
Query
;WITH CTE
AS
(
SELECT *, rn = ROW_NUMBER() OVER (ORDER BY ID ASC) FROM DateTable
)
UPDATE CTE
SET Data = DATEADD(SECOND, CTE.rn, GETDATE())
Result Set
SELECT * FROM DateTable
ID Name Data
1 John 2013-11-06 20:34:59.310
2 Mark 2013-11-06 20:35:00.310
3 Phil 2013-11-06 20:35:01.310
4 Simon 2013-11-06 20:35:02.310
5 Sam 2013-11-06 20:35:03.310
6 Pete 2013-11-06 20:35:04.310
7 Josh 2013-11-06 20:35:05.310
Not quite sure of your ordering criteria, but you can use:
MIN(DateCol) OVER()
To get the first date, and
ROW_NUMBER() OVER(ORDER BY DateCol, ID)
To get the number of seconds to add (your ordering criteria may be different). Then combine the two to update a common table expression:
WITH CTE AS
( SELECT ID,
DateCol,
NewDate = DATEADD(SECOND,
ROW_NUMBER() OVER(ORDER BY DateCol, ID), -- number of seconds to add
MIN(DateCol) OVER()) -- starting date
FROM MyDatabase.MyTable
)
UPDATE CTE
SET DateCol = NewDate;
If you have no dates in your column then you can just enter a start date (GETDATE() below):
WITH CTE AS
( SELECT ID,
DateCol,
NewDate = DATEADD(SECOND,
ROW_NUMBER() OVER(ORDER BY DateCol, ID), -- number of seconds to add
GETDATE()) -- starting date
FROM MyDatabase.MyTable
)
UPDATE CTE
SET DateCol = NewDate;
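Both answers come down to start + row_number seconds. The Python sketch below shows that arithmetic using hypothetical (ID, DateCol) pairs ordered by (DateCol, ID), as in the query. Rows with no date sort first, the way NULLs do in SQL Server's ascending order. The start value in the usage example is an assumption chosen to line up with the result set shown earlier.

```python
from datetime import datetime, timedelta


def one_second_apart(rows, start=None):
    """rows: list of (id, datecol-or-None) pairs. Returns {id: new datetime}.

    start defaults to the smallest existing date, like MIN(DateCol) OVER ().
    Row n in (DateCol, ID) order gets start + n seconds, like
    DATEADD(SECOND, ROW_NUMBER() OVER (ORDER BY DateCol, ID), start).
    """
    if start is None:
        existing = [d for _, d in rows if d is not None]
        if not existing:
            raise ValueError("no existing dates; pass an explicit start")
        start = min(existing)
    # NULL dates first, then by date, then by id.
    ordered = sorted(rows, key=lambda r: (r[1] is not None, r[1] or datetime.min, r[0]))
    return {row_id: start + timedelta(seconds=rn)
            for rn, (row_id, _) in enumerate(ordered, start=1)}


names = [(i, None) for i in range(1, 8)]  # seven rows with no dates yet
new = one_second_apart(names, start=datetime(2013, 11, 6, 20, 34, 58))
for row_id in sorted(new):
    print(row_id, new[row_id])
```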