SQL query to get number of days between dates - sql

I have a table in SQL where one column contains lots of IDs, some of which are duplicates, and a second column contains lots of dates, all different.
I want to create a table with a unique list of IDs down the left-hand side and a column called number of days, holding the number of days between the last date and the second-to-last date in the table for each ID.

Using Row_Number() in concert with a conditional aggregation.
Example
Declare @YourTable Table ([ID] varchar(50),[DateCol] date)
Insert Into @YourTable Values
(1,'2017-05-01')
,(1,'2017-05-05')
,(1,'2017-05-10') -- 2nd to last
,(1,'2017-05-31') -- Last: 21 days
,(2,'2017-05-01')
,(2,'2017-06-05')
,(2,'2017-06-10') -- 2nd to last
,(2,'2017-07-31') -- Last: 51 days
Select ID
,Days = DateDiff(DAY,max(case when RN=2 then DateCol end),max(case when RN=1 then DateCol end))
From (
Select *
,RN = Row_Number() over (Partition By ID Order by DateCol Desc)
from @YourTable
) A
Group By ID
Returns
ID Days
1 21
2 51

Here I take the MAX(DateCol) that is less than the subqueried MAX(DateCol) for the same ID; this is used in the DateDiff.
SELECT dT.ID
,(SELECT MAX(DateCol)
FROM #YourTable Y
WHERE Y.DateCol < dT.MaxDateCol --Date less than MAX
AND Y.ID = dT.ID
) AS [Date less than MAX] --added by request
,dT.MaxDateCol AS [Max Date] --added by request
,DATEDIFF(DAY, (SELECT MAX(DateCol)
FROM #YourTable Y
WHERE Y.DateCol < dT.MaxDateCol --Date less than MAX
AND Y.ID = dT.ID
)
, dT.MaxDateCol
) AS [Days Difference] --datediff between max and date less than MAX
FROM (
SELECT ID
,MAX([DateCol]) [MaxDateCol] --MAX, removes duplicate ID's.
FROM #YourTable
GROUP BY ID
) AS dT
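To check either query's output outside the database, the same logic can be sketched in plain Python. This is only an illustration, not part of either answer: the function name `days_between_last_two` is hypothetical, and the sketch works on distinct dates per ID (as the second query does), returning None where an ID has fewer than two dates.

```python
from collections import defaultdict
from datetime import date


def days_between_last_two(rows):
    """Map each ID to the day gap between its latest and second-latest date.

    IDs with fewer than two distinct dates map to None, similar to the NULL
    that DATEDIFF returns when one of its arguments is NULL.
    """
    dates_by_id = defaultdict(set)
    for row_id, day in rows:
        dates_by_id[row_id].add(day)

    result = {}
    for row_id, days in dates_by_id.items():
        latest = sorted(days, reverse=True)[:2]  # newest first
        result[row_id] = (latest[0] - latest[1]).days if len(latest) == 2 else None
    return result


# Same sample data as the example above.
rows = [
    (1, date(2017, 5, 1)), (1, date(2017, 5, 5)),
    (1, date(2017, 5, 10)), (1, date(2017, 5, 31)),
    (2, date(2017, 5, 1)), (2, date(2017, 6, 5)),
    (2, date(2017, 6, 10)), (2, date(2017, 7, 31)),
]
print(days_between_last_two(rows))  # {1: 21, 2: 51}
```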

Related

How to insert values based on another column value

Below is a subset of my table (for the first id)
date        id  value
01/01/2022  1   5
08/01/2022  1   2
For each id, the dates are not consecutive (e.g., for id 1, the min date is 01/01/2022 and the max date is 08/01/2022)--there are 7 days in between both dates. I want to insert rows to make the dates for each id consecutive and contiguous - the value for the value field/column to be filled with 0s so that the updated table looks like:
date        id  value
01/01/2022  1   5
02/01/2022  1   0
03/01/2022  1   0
04/01/2022  1   0
05/01/2022  1   0
06/01/2022  1   0
07/01/2022  1   0
08/01/2022  1   2
Any SQL code on how to implement this would be highly appreciated. I have a calendar table but am unsure how to join it with the above table so that I fill in missing dates dynamically for each id with 0s.
My calendar table looks like:
date
01/01/2022
02/01/2022
03/01/2022
04/01/2022
Since you say you have a calendar table, it seems that what you need to do is JOIN to it using the MIN and MAX dates from your other table, and then LEFT JOIN back to your table:
WITH MinMax AS(
SELECT ID,
MIN(date) AS MinDate,
MAX(date) AS MaxDate
FROM dbo.YourTable
GROUP BY ID),
Dates AS(
SELECT MM.ID,
C.CalendarDate AS [Date]
FROM MinMax MM
JOIN dbo.CalendarTable C ON MM.MinDate <= C.CalendarDate
AND MM.MaxDate >= C.CalendarDate)
SELECT D.ID,
D.[Date],
ISNULL(YT.[Value],0) AS [Value]
FROM Dates D
LEFT JOIN dbo.YourTable YT ON D.ID = YT.ID
AND D.[Date] = YT.[Date];
SET DATEFORMAT DMY
-- CREATE A TABLE WITH OUR INPUT DATA
DROP TABLE IF EXISTS #TheData
GO
CREATE TABLE #TheData
(TheDate DATE, id INT, TheValue INT)
INSERT INTO #TheData
(TheDate,id,Thevalue)
VALUES
('01/01/2022',1,5),
('08/01/2022',1,2),
('17/01/2022',2,7),
('25/01/2022',2,7),
('15/02/2022',2,7)
-- CREATE A CALENDAR CTE
DECLARE @StartDate date = '20210101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 2, @StartDate));
;WITH DateSeq(TheDate) AS
(
SELECT @StartDate
UNION ALL
SELECT DATEADD(dd,1,TheDate) FROM DateSeq
WHERE TheDate < @CutoffDate
)
-- CROSS JOIN OUR CALENDAR CTE TO OUR SOURCE DATA. DERIVED TABLE TO GET FIRST AND LAST OF EACH RANGE TO USE FOR JOIN
SELECT
ds.*
,SourceDataRangesByID.ID
,ISNULL(td.TheValue,0) AS TheValue
FROM
DateSeq ds
CROSS JOIN
(
SELECT
d.ID
,MIN(d.TheDate) AS MinDatePerID
,MAX(d.TheDate) AS MaxDatePerID
FROM #TheData d
GROUP BY d.ID
) SourceDataRangesByID
LEFT JOIN #TheData td ON td.id = SourceDataRangesByID.ID AND td.TheDate = ds.TheDate
WHERE ds.TheDate >= SourceDataRangesByID.MinDatePerID
AND ds.TheDate <= SourceDataRangesByID.MaxDatePerID
OPTION (MAXRECURSION 0);
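The idea behind the calendar join (take each id's first and last date, walk every day in between, and default missing values to 0) can be sketched in Python for checking results on small samples. The function name `fill_missing_dates` and the tuple layout are assumptions made for this illustration only.

```python
from datetime import date, timedelta


def fill_missing_dates(rows):
    """Return (day, id, value) rows covering every day between each id's
    first and last date; days absent from the input get value 0."""
    by_id = {}
    for day, row_id, value in rows:
        by_id.setdefault(row_id, {})[day] = value

    filled = []
    for row_id in sorted(by_id):
        known = by_id[row_id]
        day, last = min(known), max(known)
        while day <= last:
            # known.get plays the role of ISNULL(value, 0) after the LEFT JOIN
            filled.append((day, row_id, known.get(day, 0)))
            day += timedelta(days=1)
    return filled


# Sample rows from the question (dates are day/month/year).
rows = [(date(2022, 1, 1), 1, 5), (date(2022, 1, 8), 1, 2)]
for day, row_id, value in fill_missing_dates(rows):
    print(day.strftime("%d/%m/%Y"), row_id, value)
```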
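Try generate_series (PostgreSQL syntax shown here) to create a date table, then right join to it and use COALESCE to replace the NULL values:
SELECT daily_series.day, COALESCE(t.value, 0) AS value
FROM mytable t
RIGHT JOIN (SELECT generate_series('2016-01-01', -- series start date
'2018-06-30', -- series end date
'1 day'::interval)::date AS day) AS daily_series
ON t.date = daily_series.day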
See Generate_Series for TSQL
https://dba.stackexchange.com/questions/255165/does-ms-sql-server-have-generate-series-function
(Sql server 2022)
https://learn.microsoft.com/en-us/sql/t-sql/functions/generate-series-transact-sql?view=sql-server-ver16

Finding the maximum value in a 24h period SQL

In need of some help :)
So I have a table of records with the following columns:
Key (PK, FK, int)
DT (smalldatetime)
Value (real)
The DT is a datetime for every half hour of the day with an associated value
E.g.
Key DT VALUE
1000 2010-01-01 08:00:00 80
1000 2010-01-01 08:30:00 75
1000 2010-01-01 09:00:00 100
I need to find the max value and associated DT for every 24 hour period, for a particular key and date range.
Currently I have:
SELECT CAST(LEFT([DT],11) as smalldatetime) as 'DATE'
,max([VALUE]) as 'MAX_HH'
FROM TABLE 1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST(LEFT([DT],11) as smalldatetime)
ORDER BY 'DATE'
But this returns the max values for the date e.g.
Key DT VALUE
1000 2010-01-01 00:00:00 100
Any ideas on how to pull the full DT ?
Thanks guys!
Assuming you're using a database with support for windowed functions, we can use ROW_NUMBER() (or RANK if you want to support/pull in values that are tied for first place):
declare @t table ([Key] int not null , DT smalldatetime not null, Value int not null)
insert into @t([Key],DT,VALUE) values
(1000,'2010-01-01T08:00:00',80 ),
(1000,'2010-01-01T08:30:00',75 ),
(1000,'2010-01-01T09:00:00',100)
;With Numbered as (
select *,
ROW_NUMBER() OVER (PARTITION BY [Key],CAST(DT as date) ORDER BY Value desc) as rn
from @t
)
select * from Numbered
where rn=1
Damien's answer is very good. If you can't (or don't want to) use window functions, try this:
SELECT T1.*
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT
By the way, it's best not to use reserved keywords as object names (e.g. date).
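As a cross-check for either query, here is a minimal Python sketch of "keep the top row per key and calendar day". The function name `daily_peaks` is hypothetical, and on ties it keeps the earliest timestamp, which is an arbitrary choice for this sketch (ROW_NUMBER picks one tied row without guaranteeing which; the join version returns all tied rows).

```python
from datetime import datetime


def daily_peaks(rows):
    """For each (key, calendar day) keep the row with the highest value.

    rows are (key, dt, value) tuples. Ties keep the earliest timestamp.
    """
    best = {}
    for key, dt, value in sorted(rows, key=lambda r: (r[0], r[1])):
        group = (key, dt.date())
        # Strict '>' means a later row only wins with a larger value.
        if group not in best or value > best[group][2]:
            best[group] = (key, dt, value)
    return sorted(best.values(), key=lambda r: (r[0], r[1]))


rows = [
    (1000, datetime(2010, 1, 1, 8, 0), 80),
    (1000, datetime(2010, 1, 1, 8, 30), 75),
    (1000, datetime(2010, 1, 1, 9, 0), 100),
]
print(daily_peaks(rows))
```

With the sample rows from the question, the only row returned is the 09:00 reading with value 100, including its full timestamp.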

SELECT DateTime not in SQL

I have the following table:
oDateTime pvalue
2017-06-01 00:00:00 70
2017-06-01 01:00:00 65
2017-06-01 02:00:00 90
ff.
2017-08-01 08:00:00 98
The oDateTime field is an hourly data which is impossible to have a duplicate value.
My question is, how can I know if the oDateTime data is correct? I meant, I need to make sure the data is not jump? It should be always 'hourly' base.
Am I missing the date? Am I missing the time?
Please advice. Thank you.
Based on this answer, you can get the missing times from your table MyLogTable like this:
DECLARE @StartDate DATETIME = '20170601', @EndDate DATETIME = '20170801'
SELECT DATEADD(hour, nbr - 1, @StartDate)
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS Nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(hour, @StartDate, @EndDate) AND
NOT EXISTS (SELECT 1 FROM MyLogTable WHERE DATEADD(hour, nbr - 1, @StartDate)= oDateTime )
If you need to check longer period, you can just add CROSS JOIN like this
FROM sys.columns c
CROSS JOIN sys.columns c1
It lets you check many more than roughly a thousand records (the row count of the sys.columns table) in one query.
Since your table does not have a unique id number, use row_number() to number the rows in a CTE, then perform a self inner join between each row id and the next one and take the difference of oDateTime. This shows exactly which rows are not followed by a row one hour later:
;with cte(oDateTime,pValue,Rid)
As
(
select *,row_number() over(order by oDateTime) from [YourTableName] t1
)
select *,datediff(HH,c1.oDateTime,c2.oDateTime) as HourDiff from cte c1
inner join cte c2
on c1.Rid=c2.Rid-1 where datediff(HH,c1.oDateTime,c2.oDateTime) >1
You could use DENSE_RANK() for numbering the hours in a day from 1 to 24. Then all you have to do is to check whether the max rank is 24 or not for a day. if there is at least one entry for each hour, then dense ranking will have max value of 24.
Use the following query to find the date when you have a oDateTime missing.
SELECT [date]
FROM
(
SELECT *
, CAST(oDateTime AS DATE) AS [date]
, DENSE_RANK() OVER(PARTITION BY CAST(oDateTime AS DATE) ORDER BY DATEPART(HOUR, oDateTime)) AS rank_num
FROM Test
) AS t
GROUP BY [date]
HAVING(MAX(rank_num) != 24);
If you need validation for each row of oDateTime, you could do self join based on rank and get the missing hour for each oDateTime.
Perhaps you are looking for this? This will return dates having count < 24 - which indicates a "jump"
;WITH datecount
AS ( SELECT CAST(oDateTime AS DATE) AS [date] ,
COUNT(CAST(oDateTime AS DATE)) AS [count]
FROM #temp
GROUP BY ( CAST(oDateTime AS DATE) )
)
SELECT *
FROM datecount
WHERE [count] < 24;
EDIT: Since you changed the requirement from "How to know if there is missing" to "What is the missing", here's an updated query.
DECLARE @calendar AS TABLE ( oDateTime DATETIME )
DECLARE @min DATETIME = (SELECT MIN([oDateTime]) FROM #yourTable)
DECLARE @max DATETIME = (SELECT MAX([oDateTime]) FROM #yourTable)
WHILE ( @min <= @max )
BEGIN
INSERT INTO @calendar
VALUES ( @min );
SET @min = DATEADD(hh, 1, @min);
END;
SELECT t1.[oDateTime]
FROM @calendar t1
LEFT JOIN #yourTable t2 ON t1.[oDateTime] = t2.[oDateTime]
GROUP BY t1.[oDateTime]
HAVING COUNT(t2.[oDateTime]) = 0;
I first created an hourly calendar based on your MAX and MIN datetime, then compared your actual table to the calendar to find out if there is a "jump".
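The calendar comparison can also be sketched in Python: build every hour between the earliest and latest timestamp and report the ones that are absent. The function name `missing_hours` is hypothetical, and the sketch assumes every timestamp falls exactly on the hour, as in the sample data.

```python
from datetime import datetime, timedelta


def missing_hours(timestamps):
    """Return the hourly timestamps between min and max that are absent."""
    present = set(timestamps)
    if not present:
        return []
    current, last = min(present), max(present)
    missing = []
    while current <= last:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)  # step through the hourly calendar
    return missing


# Illustrative data with one gap at 02:00.
sample = [
    datetime(2017, 6, 1, 0),
    datetime(2017, 6, 1, 1),
    datetime(2017, 6, 1, 3),
]
print(missing_hours(sample))
```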

How to get value by a range of dates?

I have a table like so
And With this code I get the 5 latest values for each domainId
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
Well I would like to know how many tickets were sold for each of these 5 latest rows but with the following condition:
Each of these rows come with their own date which differs.
For each date I want to check how many were sold in the last 15 minutes AND how many were sold in the last 30 minutes.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The date column contains all the dates I need and for each of these dates I want to go back 15 min and go back 30min to and get how many tickets were sold
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can i accomplish this?
I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date])
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date])
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
If I understood your question correctly, you want the tickets sold in the 15 minutes and in the 30 minutes before one of your dates (in the Date column). Something like the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') AND '2016-04-12 12:10:28.2270000'
The BETWEEN operator returns rows between two date parameters, inclusive, and the lower bound must come first. No GROUP BY is needed here, because MAX is the only column selected; you would add one only to get a maximum per group, for example per DomainId.
The SQL above gives you the rows between the date and 15 minutes back. You could do something similar for 30 minutes back.

To club the rows for week days

I have data like below:
StartDate EndDate Duration
----------
41890 41892 3
41898 41900 3
41906 41907 2
41910 41910 1
StartDate and EndDate are respective ID values for any dates from calendar. I want to calculate the sum of duration for consecutive days. Here I want to include the days which are weekends. E.g. in the above data, let's say 41908 and 41909 are weekends, then my required result set should look like below.
I already have another proc that can return me the next working day, i.e. if I pass 41907 or 41908 or 41909 as DateID in that proc, it will return 41910 as the next working day. Basically I want to check if the DateID returned by my proc when I pass the above EndDateID is same as the next StartDateID from above data, then both the rows should be clubbed. Below is the data I want to get.
ID StartDate EndDate Duration
----------
278457 41890 41892 3
278457 41898 41900 3
278457 41906 41910 3
Please let me know in case the requirement is not clear, I can explain further.
My Date Table is like below:
DateId Date Day
----------
41906 09-04-2014 Thursday
41907 09-05-2014 Friday
41908 09-06-2014 Saturday
41909 09-07-2014 Sunday
41910 09-08-2014 Monday
Here is the SQL Code for setup:
CREATE TABLE Table1
(
StartDate INT,
EndDate INT,
LeaveDuration INT
)
INSERT INTO Table1
VALUES(41890, 41892, 3),
(41898, 41900, 3),
(41906, 41907, 2),
(41910, 41910, 1)
CREATE TABLE DateTable
(
DateID INT,
Date DATETIME,
Day VARCHAR(20)
)
INSERT INTO DateTable
VALUES(41907, '09-05-2014', 'Friday'),
(41908, '09-06-2014', 'Saturday'),
(41909, '09-07-2014', 'Sunday'),
(41910, '09-08-2014', 'Monday'),
(41911, '09-09-2014', 'Tuesday')
This is rather complicated. Here is an approach using window functions.
First, use the date table to enumerate the dates without weekends (you can also take out holidays if you want). Then, expand the periods into one day per row, by using a non-equijoin.
You can then use a trick to identify sequential days. This trick is to generate a sequential number for each id and subtract it from the sequential number for the dates. This is a constant for sequential days. The final step is simply an aggregation.
The resulting query is something like this:
with d as (
select d.*, row_number() over (order by date) as seqnum
from dates d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from table t join
d ds
on ds.date between t.startdate and t.enddate
) t
group by t.id, grp;
EDIT:
The following is the version on this SQL Fiddle:
with d as (
select d.*, row_number() over (order by date) as seqnum
from datetable d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from (select t.*, 'abc' as id from table1 t) t join
d ds
on ds.dateid between t.startdate and t.enddate
) t
group by grp;
I believe this is working, but the date table doesn't have all the dates in it.
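The sequence-number trick described above can be sketched in Python to see why it groups consecutive working days. This is an illustration under stated assumptions, not the answer's query: the function name `merge_consecutive` is hypothetical, weekends are derived from the fact that 41908 and 41909 are a Saturday and Sunday, the duration of the 41906 to 41907 period is taken as 2 (as in the question's data listing), and each period's duration is counted once per merged group.

```python
def merge_consecutive(periods, working_day_ids):
    """Merge periods that cover consecutive working days.

    periods: (start_id, end_id, duration) tuples using calendar DateIds.
    working_day_ids: DateIds that are working days.
    """
    # Sequential number over working days only, like seqnum in the CTE.
    seqnum = {d: n for n, d in enumerate(sorted(working_day_ids), start=1)}
    # One entry per working day covered by any period, in date order.
    covered = sorted({d for start, end, _ in periods
                      for d in seqnum if start <= d <= end})

    islands = {}
    for row_number, day_id in enumerate(covered, start=1):
        grp = seqnum[day_id] - row_number  # constant within a consecutive run
        islands.setdefault(grp, []).append(day_id)

    merged = []
    for days in islands.values():
        first, last = days[0], days[-1]
        total = sum(dur for start, end, dur in periods
                    if first <= start and end <= last)
        merged.append((first, last, total))
    return merged


periods = [(41890, 41892, 3), (41898, 41900, 3), (41906, 41907, 2), (41910, 41910, 1)]
# Assumption: DateIds are consecutive days and 41908/41909 are Sat/Sun.
working = [d for d in range(41890, 41911) if (d - 41908) % 7 not in (0, 1)]
print(merge_consecutive(periods, working))
# [(41890, 41892, 3), (41898, 41900, 3), (41906, 41910, 3)]
```

Because the weekend ids never receive a sequence number, Friday 41907 and Monday 41910 get adjacent numbers and fall into the same group.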