I am working on a query in SQL Server that is giving me a result set that looks something like this:
ID
DaysInState
DaysInState2
DaysInState3
DaysInState4
1
2022-04-01
2022-04-07
NULL
NULL
2
NULL
2022-04-09
NULL
NULL
3
2022-04-11
2022-04-15
NULL
2022-04-18
4
2022-04-11
NULL
NULL
2022-04-18
I need to calculate the number of days that a given item spent in a given state. The challenge I am facing is 'looking ahead' in the row. Using row 1 as an example these would be the following values:
DaysInState: 6 (DATEDIFF(day, '2022-04-11', '2022-04-07'))
DaysInState2: 12 (DATEDIFF(day, '2022-04-07', GETDATE()))
DaysInState3: NULL
DaysInState4: NULL
The challenging part here is that for each column in each row, I have to look at all the columns to the right of the reference column to see if a date exists to use in DATEDIFF. If a date is not found to the right of the reference column then GETDATE() is used. The table below shows what the result set should look like:
ID
DaysInState
DaysInState2
DaysInState3
DaysInState4
1
6
12
NULL
NULL
2
NULL
10
NULL
NULL
3
4
3
NULL
1
4
7
NULL
NULL
1
I can write fairly convoluted CASE...WHEN statements for each column such that
SELECT
CASE
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState2)
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NULL AND DaysInState3 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState3)
...
END
...
However this isn't very maintainable when states are added / removed. Is there a more dynamic approach to solving this problem that doesn't involve lengthy CASE statements or just generally a "better" approach that maybe I am not seeing?
The COALESCE function allows multiple parameters, evaluating them from left to right, returning the first non-null value, eliminating the need for nesting:
Daysinstate1=
datediff(day,
Daysinstate1,
Coalesce(daysinstate2
,Daysinstate3
,Daysinstate4
,Getdate())
)
If it is possible to adjust the query generating your result set, I'd like to suggest a new approach. One advantage is that it can handle additional DayInState variables (5,6,7,...).
Rewrite your query so that your results have three columns: one for ID, one for "DayInState" number, and one for the date. That is, no NULL values returned. Union the result set with the distinct IDs, an exceedingly large "DayInState" number, and the result of GETDATE(). Then you can use DATEDIFF() with LAG() to look at the next dates.
Here's a working example in SQL Server using your data:
begin
declare #temp table (id int,state_num int,dt date)
insert into #temp values
(1,1,'2022-04-01'),
(1,2,'2022-04-07'),
(2,2,'2022-04-09'),
(3,1,'2022-04-11'),
(3,2,'2022-04-15'),
(3,3,'2022-04-18'),
(4,1,'2022-04-11'),
(4,4,'2022-04-18')
select t.id,t.state_num,DATEDIFF(day,t.dt,LAG(t.dt,1,GETDATE()) over(partition by t.id order by t.state_num desc))
from
(select * from #temp
union (select distinct id,999 as state_num, GETDATE() as dt from #temp) ) t
where t.state_num!=999
order by t.id,t.state_num
end
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DateTest]') AND type in (N'U'))
DROP TABLE [dbo].[DateTest]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DateTest](
[Id] [int] IDENTITY(1,1) NOT NULL,
[DaysInState] [date] NULL,
[DaysInState2] [date] NULL,
[DaysInState3] [date] NULL,
[DaysInState4] [date] NULL,
CONSTRAINT [PK_DateTest] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[DateTest] ([DaysInState],[DaysInState2],[DaysInState3],[DaysInState4]) VALUES
('2022-04-01','2022-04-07',NULL,NULL),
(NULL,'2022-04-09',NULL,NULL),
('2022-04-11','2022-04-15',NULL,'2022-04-18'),
('2022-04-11',NULL,NULL,'2022-04-18');
GO
SELECT [ID],[DaysInState],[DaysInState2],[DaysInState3],[DaysInState4] FROM dbo.DateTest
Use nested ISNULL to check the next column or pass GETDATE().
Using the variable means you can alter the date if needed.
DECLARE #theDate date = GETDATE()
SELECT
[DaysInState] =DATEDIFF(day,[DaysInState], ISNULL([DaysInState2],ISNULL([DaysInState3],ISNULL([DaysInState4],#theDate))))
,[DaysInState2] =DATEDIFF(day,[DaysInState2],ISNULL([DaysInState3],ISNULL([DaysInState4],#theDate)))
,[DaysInState3] =DATEDIFF(day,[DaysInState3],ISNULL([DaysInState4],#theDate))
,[DaysInState4] =DATEDIFF(day,[DaysInState4],#theDate)
FROM dbo.DateTest
Your original query
SELECT
[DaysInState]=CASE
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState2)
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NULL AND DaysInState3 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState3)
END
FROM dbo.DateTest
Related
I have a table as such (with thousand of rows):
ogr_fid (PK, int, not null)
90 (varchar(157), null)
80 (varchar(157), null)
70 (varchar(157), null)
1
some_text
NULL
NULL
2
NULL
NULL
other_text
3
NULL
more_text
NULL
4
even_more_text
NULL
NULL
5
NULL
NULL
NULL
I would like to merge the table to read as follows;
ogr_fid (PK, int, not null)
m_agl (int, null)
1
90
2
70
3
80
4
90
5
NULL
As you can see I will use the text as simply a IS NOT NULL sort of test. The merged column will be populated with the name of the IS NOT NULL column.
I have been trying to understand IIF and CASE but it is far beyond my current level of understanding without a working example to learn from.
My latest attempt:
ALTER TABLE [dbo].[my_table] ADD [m_agl] AS SELECT
CASE WHEN 90 IS NOT NULL THEN '90'
CASE WHEN 80 IS NOT NULL THEN '80'
CASE WHEN 70 IS NOT NULL THEN '70'
GO
ALTER TABLE [dbo].[my_table] DROP COLUMN (90, 80, 70)
GO
Many thanks.
Note, a Column name cannot be naked integer.
Solution using IIF
ALTER TABLE [dbo].[my_table] ADD [column_merged];
GO
SELECT ogr_fid, IIF(column_1 IS NOT NULL, column_1, IIF(column_2 IS NOT NULL, column_2, IIF(column_3 IS NOT NULL, column_3, NULL)))
FROM dbo.my_table;
GO
ALTER TABLE [dbo].[my_table] DROP COLUMN (column_1, column_2, column_3);
GO
Learn more about CASE at Microsoft Docs
Learn more about IIF at Microsoft Docs
I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So data could look like this:
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
Those ranges are not overlapping, so on every date I have a list of unique products and their properties. So to ease the use of it I've created the function:
create function dbo.f_Products
(
#date date
)
returns table
as
return (
select
from dbo.Products as p
where
#date >= p.begin_date and
#date <= p.end_date
)
This is how I'm going to use it:
select
*
from <some table with product codes> as t
left join dbo.f_Products(#date) as p on
p.code = t.product_code
This is all fine, but how I can let optimizer know that those rows are unique to have better execution plan?
I did some googling, and found a couple of really nice articles for DDL which prevents storing overlapping ranges in the table:
Self-maintaining, Contiguous Effective Dates in Temporal Tables
Storing intervals of time with no overlaps
But even if I try those constraint I see that optimizer cannot understand that resulting recordset will return unique codes.
What I'd like to have is certain approach which gives me basically the same performance as if I stored those products list on certain date and selected it with date = #date.
I know that some RDMBS (like PostgreSQL) have special data types for this (Range Types). But SQL Server doesn't have anything like this.
Am I missing something or there're no way to do this properly in SQL Server?
You can create an indexed view that contains a row for each code/date in the range.
ProductDate (indexed view)
code value date
10905 13 2005-01-01
10905 13 2005-01-02
10905 13 ...
10905 13 2016-12-31
10905 11 2017-01-01
10905 11 2017-01-02
10905 11 ...
10905 11 Today
Like this:
create schema digits
go
create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go
create schema info
go
create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)
create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go
create view info.ProductDate with schemabinding
as
select
p.code,
p.value,
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
info.DateRange as dr
cross join
digits.Ones as ones
cross join
digits.Tens as tens
cross join
digits.Hundreds as huns
cross join
digits.Thousands as thos
cross join
digits.TenThousands as tthos
join
info.Products as p on
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go
create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go
select *
from info.ProductDate with (noexpand)
where
date = '2014-01-01'
drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go
A solution without gaps might be this:
DECLARE #tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO #tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});
SELECT * FROM #tbl;
DECLARE #DateFilter DATE={d'2016-08-13'};
SELECT TOP 1 *
FROM #tbl
WHERE [start_date]<=#DateFilter
ORDER BY [start_date] DESC
Important: Be sure that there is an (unique) index on start_date
UPDATE: for different products
DECLARE #tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO #tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 1
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});
DECLARE #DateFilter DATE={d'2016-08-13'};
WITH PartitionedCount AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
,*
FROM #tbl
WHERE [start_date]<=#DateFilter
)
SELECT *
FROM PartitionedCount
WHERE Nr=1
First you need to create a unique clustered index for (begin_date, end_date, code)
Then SQL engine will be able to do INDEX SEEK.
Additionally, you can also try to create a view for dbo.Products table to join that table with pre-populated dbo.Dates table.
select p.code, p.val, p.begin_date, p.end_date, d.[date]
from dbo.Product as p
inner join dbo.dates d on p.begin_date <= d.[date] and d.[date] <= p.end_date
Then in your function, you use that view as "where #date = view.date". The result can be either better or slightly worse... it depends on the actual data.
You also can try to make that view indexed (depends on how often it is being updated).
Alternatively, you can have better performance if you populate dbo.Products table for every date in the [begin_date] .. [end_date] range.
Approach with ROW_NUMBER scans the whole Products table once. It is the best method if you have a lot of product codes in the Products table and few validity ranges for each code.
WITH
CTE_rn
AS
(
SELECT
code
,value
,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
FROM Products
WHERE begin_date <= #date
)
SELECT *
FROM
<some table with product codes> as t
LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;
If you have few product codes and a lot of validity ranges for each code in the Products table, then it is better to seek the Products table for each code using OUTER APPLY.
SELECT *
FROM
<some table with product codes> as t
OUTER APPLY
(
SELECT TOP(1)
Products.value
FROM Products
WHERE
Products.code = t.product_code
AND Products.begin_date <= #date
ORDER BY Products.begin_date DESC
) AS A
;
Both variants need unique index on (code, begin_date DESC) include (value).
Note how the queries don't even look at end_date, because they assume that intervals don't have gaps. They will work in SQL Server 2008.
EDIT: My original answer was using an INNER JOIN, but the questioner wanted a LEFT JOIN.
CREATE TABLE Products
(
[Code] INT NOT NULL
, [Value] VARCHAR(30) NOT NULL
, Begin_Date DATETIME NOT NULL
, End_Date DATETIME NULL
)
/*
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)
CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])
CREATE TABLE SomeTableWithProductCodes
(
[CODE] INT NOT NULL
)
INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)
Here is a prototypical query, with a date predicate. Note that there are more optimal ways to do this in a bulletproof fashion, using a "less than" operator on the upper bound, but that's a different discussion.
SELECT
P.[Code]
, P.[Value]
, P.[Begin_Date]
, P.[End_Date]
FROM
SomeTableWithProductCodes ST
LEFT JOIN Products AS P ON
ST.[Code] = P.[Code]
AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')
This query will perform an Index Seek on the Product table.
Here is a SQL Fiddle: SQL Fiddle - Products and Dates
I need to calculate something by getting the corresponding value from a field (let's say abc256) to which i must subtract the value from the same column but 249 rows above (let's say abc7) and then multiply it by 100 into a new column in a temporary table (only for displayin the output).
How to count from the current value 249 fields above?
I already ordered the list as it should be by 2 columns asc.
So the query which orders my list looks like:
select [rN] ,[rD],[r],[rId]
from [someName].[dbo].[some_table]
where rN like '%bla%'
and rD >= 'yyy-mm-dd'
EDIT: order by rD asc, rID asc
a pseudocode of what i need is:
[(case when rN like 'something' then newSomething = (r.value - r.count(249).value))*100) as newSomething)]
FROM [someName].[dbo].[some_table]
then i tried
select [rN] ,[rD],[r],[rId]
from select (ROW_NUMBER() over (order by key ASC) AS rownumber,
r)
from
[someName].[dbo].[some_table]
where rownumber = r -249
where rN like '%bla%'
and rD >= 'yyy-mm-dd'
I should mention that i need to to a repetitive process of this (after each 249 rows i am calculating using the current value - the value from 249 rows above). And i will have 12 cases for rN like 'something1' ...'something12'
How to get this to work?
Thanks
You will have to adapt this to your specific needs, but whenever I am posed with the "Compare a column to the one nth one before it" problem, I revert to CTE expressions and then do a self-join. I've thrown together a quick example to get you started. I hope you find it useful.
-- DROP TABLE IF EXISTS
IF OBJECT_ID('Table_1') IS NOT NULL
DROP TABLE Table_1
GO
-- CREATE A TABLE FOR TESTING
CREATE TABLE [dbo].[Table_1](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Num1] [int] NULL,
[Num2] [int] NULL
) ON [PRIMARY]
GO
-- FILL THE TABLE WITH VALUES
DECLARE #cnt INT; SET #cnt = 0
WHILE #cnt <=10000
BEGIN
SET #cnt = #cnt + 1
INSERT INTO Table_1 (Num1, Num2) VALUES (#cnt, #cnt * 1000)
END
GO
-- DO THE SELECT
; With RowedTables AS (
SELECT
ROW_NUMBER() OVER (ORDER BY Id ASC) As R,
T1.*
FROM Table_1 T1)
SELECT
RT1.Num1 - RT2.Num1 AS SomeMath,
RT1.*,
RT2.*
FROM RowedTables RT1 JOIN RowedTables RT2 ON RT1.R = RT2.R - 10
I'm having difficulties writing a SQL query. This is the structure of 3 tables, table Race_ClassificationType is many-to-many table.
Table Race
----------------------------
RaceID
Name
Table Race_ClassificationType
----------------------------
Race_ClassificationTypeID
RaceID
RaceClassificationID
Table RaceClassificationType
----------------------------
RaceClassificationTypeID
Name
What I'm trying to do is get the races with certain classifications. The results are returned by a store procedure that has a table-value parameter which holds the desired classifications:
CREATE TYPE [dbo].[RaceClassificationTypeTable]
AS TABLE
(
RaceClassificationTypeID INT NULL
);
GO
CREATE PROCEDURE USP_GetRaceList
(#RaceClassificationTypeTable AS [RaceClassificationTypeTable] READONLY,
#RaceTypeID INT = NULL,
#IsCompleted BIT = NULL,
#MinDateTime DATETIME = NULL,
#MaxDateTime DATETIME = NULL,
#MaxRaces INT = NULL)
WITH RECOMPILE
AS
BEGIN
SET NOCOUNT ON;
SELECT DISTINCT
R.[RaceID]
,R.[RaceTypeID]
,R.[Name]
,R.[Abbreviation]
,R.[DateTime]
,R.[IsCompleted]
FROM [Race] R,[Race_ClassificationType] R_CT, [RaceClassificationType] RCT
WHERE (R.[RaceTypeID] = #RaceTypeID OR #RaceTypeID IS NULL)
AND (R.[IsCompleted] = #IsCompleted OR #IsCompleted IS NULL)
AND (R.[DateTime] >= #MinDateTime OR #MinDateTime IS NULL)
AND (R.[DateTime] <= #MaxDateTime OR #MaxDateTime IS NULL)
AND (R.RaceID = R_CT.RaceID)
AND (R_CT.RaceClassificationTypeID = RCT.RaceClassificationTypeID)
AND (RCT.RaceClassificationTypeID IN (SELECT DISTINCT T.RaceClassificationTypeID FROM #RaceClassificationTypeTable T))
ORDER BY [DateTime] DESC
OFFSET 0 ROWS FETCH NEXT #MaxRaces ROWS ONLY
END
GO
As it is this stored procedure doesnt work correctly because it returns all races that have at least one classification type ID in the table-value parameter of classification type IDs (because of the IN clause). I want that the store procedure returns only races that have all the classifications supplied in the table-valued parameter.
Example:
RaceClassificationTypeID RaceID
3 92728
3 92729
8 92729
29 92729
12 92729
2 92729
3 92730
8 92730
8 92731
1 92731
RaceClassificationTypeIDs in RaceClassificationTypeTable parameter: 3 and 8
OUTPUT: all the races with RaceClassificationID 3 and 8 and optionally any other (2, 29, 12)
That means only races 92729 and 92730 should be returned, as it is all the races in the example are returned.
I've set up two tables, one stores your result set and the other represents the values in the table valued parameter of your stored procedure. See below.
CREATE TABLE ABC
(
RCTID INT,
RID INT
)
INSERT INTO ABC VALUES (3,92728)
INSERT INTO ABC VALUES (3,92729)
INSERT INTO ABC VALUES (8,92729)
INSERT INTO ABC VALUES (29,92729)
INSERT INTO ABC VALUES (12,92729)
INSERT INTO ABC VALUES (2,92729)
INSERT INTO ABC VALUES (3,92730)
INSERT INTO ABC VALUES (8,92730)
INSERT INTO ABC VALUES (8,92731)
INSERT INTO ABC VALUES (1,92731)
GO
CREATE TABLE TABLEVALUEPARAMETER
(
VID INT
)
INSERT INTO TABLEVALUEPARAMETER VALUES (3)
INSERT INTO TABLEVALUEPARAMETER VALUES (8)
GO
SELECT RID FROM ABC WHERE RCTID IN (SELECT VID FROM TABLEVALUEPARAMETER) GROUP BY
RID HAVING COUNT(RID) = (SELECT COUNT(VID) FROM TABLEVALUEPARAMETER)
GO
If you run this on your machine you'll notice it produces the two IDs that you're after.
Because you have a stored procedure with a lot of columns selected it would be necessary to use a CTE (Common Table Expression). This is because if you were to try to group all the columns in the current select statement you would have to group by all the columns and you would then get duplication.
If the first CTE delivers the result set and then you uses a version of the select above you should be able to produce only the IDs you want.
If you don't know CTE's let me know!
This is an example of a "set-within-sets" subquery. One way to solve this is with aggregation and a having clause. Here is how you get the RaceIds:
select RaceID
from RaceClassification rc
group by RaceID
having sum(case when RaceClassificationTypeId = 3 then 1 else 0 end) > 0 and
sum(case when RaceClassificationTypeId = 8 then 1 else 0 end) > 0;
Each condition in the having clause is counts how many rows have each type. Only races with each (because of the > 0) are kept.
You can get all the race information by using this as a subquery:
select r.*
from Races r join
(select RaceID
from RaceClassification rc
group by RaceID
having sum(case when RaceClassificationTypeId = 3 then 1 else 0 end) > 0 and
sum(case when RaceClassificationTypeId = 8 then 1 else 0 end) > 0
) rc
on r.RaceID = rc.RaceId;
Your stored procedure seems to have other conditions. These can also be added in.
Suppose I have following table in Sql Server 2008:
ItemId StartDate EndDate
1 NULL 2011-01-15
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
As you can see, this table has StartDate and EndDate columns. I want to validate data in these columns. Intervals cannot conflict with each other. So, the table above is valid, but the next table is invalid, becase first row has End Date greater than StartDate in the second row.
ItemId StartDate EndDate
1 NULL 2011-01-17
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
NULL means infinity here.
Could you help me to write a script for data validation?
[The second task]
Thanks for the answers.
I have a complication. Let's assume, I have such table:
ItemId IntervalId StartDate EndDate
1 1 NULL 2011-01-15
2 1 2011-01-16 2011-01-25
3 1 2011-01-26 NULL
4 2 NULL 2011-01-17
5 2 2011-01-16 2011-01-25
6 2 2011-01-26 NULL
Here I want to validate intervals within a groups of IntervalId, but not within the whole table. So, Interval 1 will be valid, but Interval 2 will be invalid.
And also. Is it possible to add a constraint to the table in order to avoid such invalid records?
[Final Solution]
I created function to check if interval is conflicted:
CREATE FUNCTION [dbo].[fnIntervalConflict]
(
#intervalId INT,
#originalItemId INT,
#startDate DATETIME,
#endDate DATETIME
)
RETURNS BIT
AS
BEGIN
SET #startDate = ISNULL(#startDate,'1/1/1753 12:00:00 AM')
SET #endDate = ISNULL(#endDate,'12/31/9999 11:59:59 PM')
DECLARE #conflict BIT = 0
SELECT TOP 1 #conflict = 1
FROM Items
WHERE IntervalId = #intervalId
AND ItemId <> #originalItemId
AND (
(ISNULL(StartDate,'1/1/1753 12:00:00 AM') >= #startDate
AND ISNULL(StartDate,'1/1/1753 12:00:00 AM') <= #endDate)
OR (ISNULL(EndDate,'12/31/9999 11:59:59 PM') >= #startDate
AND ISNULL(EndDate,'12/31/9999 11:59:59 PM') <= #endDate)
)
RETURN #conflict
END
And then I added 2 constraints to my table:
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_Dates CHECK (StartDate IS NULL OR EndDate IS NULL OR StartDate <= EndDate)
GO
and
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_ValidInterval CHECK (([dbo].[fnIntervalConflict]([IntervalId], ItemId,[StartDate],[EndDate])=(0)))
GO
I know, the second constraint slows insert and update operations, but it is not very important for my application.
And also, now I can call function fnIntervalConflict from my application code before inserts and updates of data in the table.
Something like this should give you all overlaping periods
SELECT
*
FROM
mytable t1
JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate
Edited for Adrians comment bellow
This will give you the rows that are incorrect.
Added ROW_NUMBER() as I didnt know if all entries where in order.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate)
as
(
SELECT 1, NULL, #date
UNION ALL
SELECT 2, dateadd(day, -1, #date), DATEADD(day, 10, #date)
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
where t1.endDate > t2.startDate
EDIT:
As for the updated question:
Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate, intervalID)
as
(
SELECT 1, NULL, #date, 1
UNION ALL
SELECT 2, dateadd(day, 1, #date), DATEADD(day, 10, #date),1
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL, 1
UNION ALL
SELECT 4, NULL, #date, 2
UNION ALL
SELECT 5, dateadd(day, -1, #date), DATEADD(day, 10, #date),2
UNION ALL
SELECT 6, DATEADD(day, 60, #date), NULL, 2
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
and t1.intervalID = t2.intervalID
where t1.endDate > t2.startDate
declare #T table (ItemId int, IntervalID int, StartDate datetime, EndDate datetime)
insert into #T
select 1, 1, NULL, '2011-01-15' union all
select 2, 1, '2011-01-16', '2011-01-25' union all
select 3, 1, '2011-01-26', NULL union all
select 4, 2, NULL, '2011-01-17' union all
select 5, 2, '2011-01-16', '2011-01-25' union all
select 6, 2, '2011-01-26', NULL
select T1.*
from #T as T1
inner join #T as T2
on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
T1.IntervalID = T2.IntervalID and
T1.ItemId <> T2.ItemId
Result:
ItemId IntervalID StartDate EndDate
----------- ----------- ----------------------- -----------------------
5 2 2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
4 2 NULL 2011-01-17 00:00:00.000
Not directly related to the OP, but since Adrian's expressed an interest. Here's a table than SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merge's directly, rather than maintaining through triggers).
In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:
create table dbo.Clients (
ClientID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_Clients PRIMARY KEY (ClientID)
)
go
create table dbo.DataItems (
DataItemID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_DataItems PRIMARY KEY (DataItemID),
constraint UQ_DataItem_Names UNIQUE (Name)
)
go
Now, if we were building a normal table, we'd have the following (Don't run this one):
create table dbo.ClientAnswers (
ClientID int not null,
DataItemID int not null,
IntValue int not null,
Comment varchar(max) null,
constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
)
But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:
create table dbo.ClientAnswerHistories (
ClientID int not null,
DataItemID int not null,
IntValue int null,
Comment varchar(max) null,
/* Temporal columns */
Deleted bit not null,
ValidFrom datetime2 null,
ValidTo datetime2 null,
constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
constraint CK_ClientAnswerHistory_DeletionNull CHECK (
Deleted = 0 or
(
IntValue is null and
Comment is null
)),
constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
)
go
That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:
create view dbo.ClientAnswers
with schemabinding
as
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
ValidTo is null
go
create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
go
And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:
create function dbo.ClientAnswers_At (
#At datetime2
)
returns table
with schemabinding
as
return (
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
(ValidFrom is null or ValidFrom <= #At) and
(ValidTo is null or ValidTo > #At)
)
go
And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.
create trigger T_ClientAnswers_I
on dbo.ClientAnswers
instead of insert
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
from
inserted i
left join
dbo.ClientAnswerHistories cah
on
i.ClientID = cah.ClientID and
i.DataItemID = cah.DataItemID and
cah.ValidTo is null and
cah.Deleted = 1
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
when matched then update set ValidTo = SYSDATETIME()
when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
go
create trigger T_ClientAnswers_U
on dbo.ClientAnswers
instead of update
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
from
inserted i
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
go
create trigger T_ClientAnswers_D
on dbo.ClientAnswers
instead of delete
as
set nocount on
;with Dup as (
select d.ClientID,d.DataItemID,t.Dupl
from
deleted d
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
go
Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that, the base table, ClientAnswerHistories is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.
Things get more complex when you need to deal with temporal foreign keys.
Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.
I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.
CREATE TABLE intervalStarts
(
ItemId int,
IntervalId int,
StartDate datetime
)
CREATE VIEW intervals
AS
with cte as (
select ItemId, IntervalId, StartDate,
row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
from intervalStarts
)
select c1.ItemId, c1.IntervalId, c1.StartDate,
dateadd(dd,-1,c2.StartDate) as 'EndDate'
from cte c1
left join cte c2 on c1.IntervalId=c2.IntervalId
and c1.row=c2.row-1
So, sample data might look like:
INSERT INTO intervalStarts
select 1, 1, null union
select 2, 1, '2011-01-16' union
select 3, 1, '2011-01-26' union
select 4, 2, null union
select 5, 2, '2011-01-26' union
select 6, 2, '2011-01-14'
and a simple SELECT * FROM intervals yields:
ItemId | IntervalId | StartDate | EndDate
1 | 1 | null | 2011-01-15
2 | 1 | 2011-01-16 | 2011-01-25
3 | 1 | 2011-01-26 | null
4 | 2 | null | 2011-01-13
6 | 2 | 2011-01-14 | 2011-01-25
5 | 2 | 2011-01-26 | null