For example, I have three tables, say DataTable1, DataTable2 and DataTable3, and I need to filter each of them against a DateRange table. Every time, I have used NOT EXISTS as shown below.
Is there a better way to write this?
Temp table to hold some date ranges, which is used for the filter:
Declare @DateRangeTable as Table(
StartDate datetime,
EndDate datetime
)
Sample date ranges, followed by the data tables on which we need to apply the date range filter:
INSERT INTO @DateRangeTable values
('07/01/2020','07/04/2020'),
('07/06/2020','07/08/2020');
/*Table 1 which will hold some data*/
Declare @DataTable1 as Table(
Id numeric,
Date datetime
)
INSERT INTO @DataTable1 values
(1,'07/09/2020'),
(2,'07/06/2020');
Declare @DataTable2 as Table(
Id numeric,
Date datetime
)
INSERT INTO @DataTable2 values
(1,'07/10/2020'),
(2,'07/06/2020');
Declare @DataTable3 as Table(
Id numeric,
Date datetime
)
INSERT INTO @DataTable3 values
(1,'07/11/2020'),
(2,'07/06/2020');
Now I want to filter the data based on the DateRange table. I need an optimized way to do this so that I don't have to use NOT EXISTS multiple times. In the real scenario, I have multiple tables that I have to filter based on the date range table.
Select * from @DataTable1
where NOT EXISTS(
Select 1 from @DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from @DataTable2
where NOT EXISTS(
Select 1 from @DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from @DataTable3
where NOT EXISTS(
Select 1 from @DateRangeTable
where [Date] between StartDate and EndDate
)
Instead of using NOT EXISTS you could join the date range table:
SELECT dt.*
FROM @DataTable1 dt
LEFT JOIN @DateRangeTable dr ON dt.[Date] BETWEEN dr.StartDate AND dr.EndDate
WHERE dr.StartDate IS NULL
It may perform better on large tables but you would have to compare the execution plans and make sure you have indexes on the date columns.
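As a quick sanity check that the two forms select the same rows, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names are the ones from the question without the variable prefix, and the dates are rewritten in ISO format so that text comparison works in SQLite; both are assumptions made for the illustration. It says nothing about performance on SQL Server, only about the result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DateRangeTable (StartDate TEXT, EndDate TEXT);
    CREATE TABLE DataTable1 (Id INTEGER, Date TEXT);
    INSERT INTO DateRangeTable VALUES ('2020-07-01', '2020-07-04'),
                                      ('2020-07-06', '2020-07-08');
    INSERT INTO DataTable1 VALUES (1, '2020-07-09'), (2, '2020-07-06');
""")

# Anti-join written with NOT EXISTS, as in the question.
not_exists = conn.execute("""
    SELECT dt.Id FROM DataTable1 dt
    WHERE NOT EXISTS (
        SELECT 1 FROM DateRangeTable dr
        WHERE dt.Date BETWEEN dr.StartDate AND dr.EndDate)
    ORDER BY dt.Id
""").fetchall()

# Same anti-join written as LEFT JOIN ... IS NULL.
left_join = conn.execute("""
    SELECT dt.Id FROM DataTable1 dt
    LEFT JOIN DateRangeTable dr
           ON dt.Date BETWEEN dr.StartDate AND dr.EndDate
    WHERE dr.StartDate IS NULL
    ORDER BY dt.Id
""").fetchall()

print(not_exists, left_join)  # both contain only Id 1
```

Only Id 1 survives in both cases, because 2020-07-06 falls inside the second range.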
I would write the same query, but if you can change the table structure I would try to improve performance by adding two columns that store the month as an integer (I assume the first two digits of your dates are the month).
Obviously you have to test with your data and compare the timings.
Declare @DateRangeTable as Table(
StartDate datetime,
EndDate datetime,
StartMonth tinyint,
EndMonth tinyint
)
INSERT INTO @DateRangeTable values
('07/01/2020','07/04/2020', 7, 7),
('07/06/2020','07/08/2020', 7, 7),
('07/25/2020','08/02/2020', 7, 8); -- (another record with different months)
Now your queries can use the new columns to try to reduce comparisons (they are tinyint, and SQL Server may be able to narrow the candidate rows if you define a secondary index on StartMonth and EndMonth). Note that this month pre-filter only works for ranges that do not wrap around a year end, such as December to January:
Select * from @DataTable1
where NOT EXISTS(
Select 1 from @DateRangeTable
where (DATEPART(month, [Date]) between StartMonth and EndMonth)
and ([Date] between StartDate and EndDate)
)
I want to set an EndDate when the MainAccountNum already exists. The EndDate should be applied to the record for that MainAccountNum with the earliest StartDate.
So if I have a create table statement like this:
Create Table ods.CustomerId(
ScreenDate INT NOT NULL,
CustomerNum nvarchar(40) NOT NULL,
MainAccountNum nvarchar(40) not null,
ServiceNum nvarchar(40) not null,
StartDate datetime not null,
EndDate datetime not null,
UpdatedBy nvarchar(50) not null);
and say I encounter values like the following in CustomerNum, MainAccountNum, StartDate, and EndDate:
1467823,47382906,2019-08-26 00:00:00.000, Null
1467833,47382906,2019-09-06 00:00:00.000, null
When the second record is inserted with the same MainAccountNum, the first record's EndDate should be set to the StartDate of the new record. StartDate has a default constraint of GETDATE(), so in the end it should look like:
1467823,47382906,2019-08-26 00:00:00.000,2019-09-06 00:00:00.000
1467833,47382906,2019-09-06 00:00:00.000, null
Please provide code examples of how this can be accomplished.
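In the stored procedure used to insert the new record, have something like:
begin tran
-- assumed: the new record's start date, matching the GETDATE() default on StartDate
declare @startDate datetime = getdate()
declare @oldStartDate datetime
-- find the earliest existing record for this account
select top 1 @oldStartDate = StartDate
from ods.CustomerId
where MainAccountNum = @mainAccountNum
order by StartDate asc
-- if one exists, end-date it with the new record's start date
if @@rowcount > 0
update ods.CustomerId set EndDate = @startDate
where MainAccountNum = @mainAccountNum and StartDate = @oldStartDate
insert ... <your new record here>
commit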
I am assuming that the (MainAccountNum, StartDate) tuple is unique and can be used as a key. If not, you have to use whatever is unique for your update statement.
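To make the "close the old record, then insert the new one, in one transaction" idea concrete, here is a small sketch using Python's built-in sqlite3 module instead of a SQL Server stored procedure. It makes two assumptions that are not in the original: EndDate is nullable (the sample data stores NULL there), and only the earliest record that is still open is closed. The trimmed-down table and the `insert_customer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified version of ods.CustomerId with a nullable EndDate.
conn.execute("""CREATE TABLE CustomerId (
    CustomerNum TEXT NOT NULL, MainAccountNum TEXT NOT NULL,
    StartDate TEXT NOT NULL, EndDate TEXT)""")

def insert_customer(conn, customer_num, main_account_num, start_date):
    """Close the earliest open row for the account, then insert the new row."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("""
            UPDATE CustomerId SET EndDate = ?
            WHERE MainAccountNum = ? AND EndDate IS NULL
              AND StartDate = (SELECT MIN(StartDate) FROM CustomerId
                               WHERE MainAccountNum = ? AND EndDate IS NULL)""",
            (start_date, main_account_num, main_account_num))
        conn.execute(
            "INSERT INTO CustomerId VALUES (?, ?, ?, NULL)",
            (customer_num, main_account_num, start_date))

insert_customer(conn, "1467823", "47382906", "2019-08-26 00:00:00.000")
insert_customer(conn, "1467833", "47382906", "2019-09-06 00:00:00.000")
rows = conn.execute(
    "SELECT CustomerNum, StartDate, EndDate FROM CustomerId ORDER BY StartDate"
).fetchall()
for row in rows:
    print(row)
```

After the second call, the first row carries the second row's StartDate as its EndDate and the second row is still open, which is the shape asked for in the question.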
How can I check whether a DATE being inserted or updated in a table is between two other dates from another table?
Additional info:
I have 2 tables:
Activity:
StartDate date NOT NULL
EndDate date NULLABLE
SubActivity:
SubActivityDate date NOT NULL
When EndDate IS NOT NULL I check if: StartDate ≤ SubActivityDate ≤ EndDate
When EndDate IS NULL I check if: StartDate ≤ SubActivityDate
I was trying to write a BEFORE INSERT trigger, but I found out that it doesn't exist.
So what could I do?
AFTER INSERT?
INSTEAD OF INSERT? This looks better than the first option.
Is it possible with just CHECK constraints?
How do I solve this problem?
EDIT
I just went with the CHECK constraint + function:
constraint:
ALTER TABLE SubActivity
ADD CONSTRAINT CK_SubActivity_Date CHECK (dbo.ufnIsSubactivityDateValid(ActivityID, SubActivityDate) = 1);
function:
CREATE FUNCTION ufnIsSubactivityDateValid(@ActivityID [int], @SubActivityDate [date])
RETURNS [bit]
AS
BEGIN
DECLARE @StartDate date, @EndDate date;
SELECT @StartDate = StartDate , @EndDate = EndDate
FROM Activity
WHERE ActivityID = @ActivityID;
IF (@SubActivityDate < @StartDate )
RETURN 0; -- out of range date
IF (@EndDate IS NULL)
RETURN 1; -- good date
ELSE
IF (@SubActivityDate > @EndDate)
RETURN 0; -- out of range date
RETURN 1; -- good date
END
Which is best depends on the situation. A constraint guarantees proper values but rolls back the entire statement over one wrong value. Triggers allow you more control but are a little more complex because of it.
Create and Populate your Table
IF OBJECT_ID('dbo.yourTable') IS NOT NULL
DROP TABLE yourTable;
CREATE TABLE yourTable
(
ID INT IDENTITY(1,1) PRIMARY KEY,
StartDate DATE NOT NULL,
SubActivityDate DATE NULL,
EndDate DATE NULL
);
INSERT INTO yourTable(StartDate,SubActivityDate,EndDate)
VALUES ('20150101',NULL,NULL),
('20150101',NULL,NULL),
('20150101',NULL,'20150201'),
('20150101',NULL,'20150201');
Constraint Method:
ALTER TABLE yourTable
ADD CONSTRAINT chk_date CHECK (StartDate <= SubActivityDate AND SubActivityDate <= EndDate);
UPDATE yourTable
SET SubActivityDate = CASE
WHEN ID = 1 THEN '20140101' --bad
WHEN ID = 2 THEN '20150102' --good
WHEN ID = 3 THEN '20140101' --bad
WHEN ID = 4 THEN '20150102' --good
END
SELECT *
FROM yourTable;
Since there is at least one value that does not fit the constraint, the whole UPDATE is rolled back and the result is that SubActivityDate stays NULL.
Results:
ID StartDate SubActivityDate EndDate
----------- ---------- --------------- ----------
1 2015-01-01 NULL NULL
2 2015-01-01 NULL NULL
3 2015-01-01 NULL 2015-02-01
4 2015-01-01 NULL 2015-02-01
Trigger Method (My Preferred Method):
CREATE TRIGGER trg_check_date ON yourTable
INSTEAD OF UPDATE
AS
BEGIN
UPDATE yourTable
SET SubActivityDate = CASE
WHEN inserted.SubActivityDate >= inserted.StartDate AND ((Inserted.EndDate IS NULL) OR Inserted.SubActivityDate <= Inserted.EndDate) THEN inserted.SubActivityDate
ELSE NULL
END
FROM yourTable
INNER JOIN inserted
ON yourTable.ID = inserted.ID
END;
GO
UPDATE yourTable
SET SubActivityDate = CASE
WHEN ID = 1 THEN '20140101' --bad
WHEN ID = 2 THEN '20150102' --good
WHEN ID = 3 THEN '20140101' --bad
WHEN ID = 4 THEN '20150102' --good
END
SELECT *
FROM yourTable
This method allows the proper values and simply returns null for improper ones. If you wanted, you could even export the incorrect values from the inserted table into a log table so you know which ones didn't work. Or raise an error message and list the values that didn't work. In short, you have total control of the situation.
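The rule that the trigger's CASE expression applies can be sketched outside the database as well, which may help when deciding what to log or report. The snippet below is only an illustration in Python: the `is_valid` helper and the `attempts` list are hypothetical names, and the rows mirror the four values used in the UPDATE above.

```python
from datetime import date

def is_valid(start, sub, end):
    """True when start <= sub and, if an end date is set, sub <= end."""
    return sub >= start and (end is None or sub <= end)

# (ID, StartDate, attempted SubActivityDate, EndDate), as in the UPDATE above
attempts = [
    (1, date(2015, 1, 1), date(2014, 1, 1), None),
    (2, date(2015, 1, 1), date(2015, 1, 2), None),
    (3, date(2015, 1, 1), date(2014, 1, 1), date(2015, 2, 1)),
    (4, date(2015, 1, 1), date(2015, 1, 2), date(2015, 2, 1)),
]

# Accepted IDs keep their new value; rejected IDs are the ones you might log.
accepted = [row[0] for row in attempts if is_valid(*row[1:])]
rejected = [row[0] for row in attempts if not is_valid(*row[1:])]
print(accepted, rejected)  # [2, 4] [1, 3]
```

The rejected list corresponds to the rows that end up NULL in the results below.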
Results:
ID StartDate SubActivityDate EndDate
----------- ---------- --------------- ----------
1 2015-01-01 NULL NULL
2 2015-01-01 2015-01-02 NULL
3 2015-01-01 NULL 2015-02-01
4 2015-01-01 2015-01-02 2015-02-01
Suppose I have the following table in SQL Server 2008:
ItemId StartDate EndDate
1 NULL 2011-01-15
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
As you can see, this table has StartDate and EndDate columns. I want to validate the data in these columns. Intervals cannot conflict with each other. So, the table above is valid, but the next table is invalid, because the first row has an EndDate greater than the StartDate in the second row.
ItemId StartDate EndDate
1 NULL 2011-01-17
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
NULL means infinity here.
Could you help me to write a script for data validation?
[The second task]
Thanks for the answers.
I have a complication. Let's assume, I have such table:
ItemId IntervalId StartDate EndDate
1 1 NULL 2011-01-15
2 1 2011-01-16 2011-01-25
3 1 2011-01-26 NULL
4 2 NULL 2011-01-17
5 2 2011-01-16 2011-01-25
6 2 2011-01-26 NULL
Here I want to validate intervals within each IntervalId group, not across the whole table. So, Interval 1 will be valid, but Interval 2 will be invalid.
Also, is it possible to add a constraint to the table in order to prevent such invalid records?
[Final Solution]
I created a function to check whether an interval conflicts with an existing one:
CREATE FUNCTION [dbo].[fnIntervalConflict]
(
@intervalId INT,
@originalItemId INT,
@startDate DATETIME,
@endDate DATETIME
)
RETURNS BIT
AS
BEGIN
SET @startDate = ISNULL(@startDate,'1/1/1753 12:00:00 AM')
SET @endDate = ISNULL(@endDate,'12/31/9999 11:59:59 PM')
DECLARE @conflict BIT = 0
SELECT TOP 1 @conflict = 1
FROM Items
WHERE IntervalId = @intervalId
AND ItemId <> @originalItemId
-- Two intervals overlap when each one starts on or before the other ends.
-- This also catches an existing interval that fully contains the new one.
AND ISNULL(StartDate,'1/1/1753 12:00:00 AM') <= @endDate
AND ISNULL(EndDate,'12/31/9999 11:59:59 PM') >= @startDate
RETURN @conflict
END
And then I added 2 constraints to my table:
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_Dates CHECK (StartDate IS NULL OR EndDate IS NULL OR StartDate <= EndDate)
GO
and
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_ValidInterval CHECK (([dbo].[fnIntervalConflict]([IntervalId], ItemId,[StartDate],[EndDate])=(0)))
GO
I know, the second constraint slows insert and update operations, but it is not very important for my application.
And also, now I can call function fnIntervalConflict from my application code before inserts and updates of data in the table.
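For the application-side check mentioned above, one common way to phrase the overlap test is "each interval starts on or before the other one ends", with NULL treated as infinity. Here is a minimal Python sketch of that test. The `conflicts` helper is a hypothetical name, bounds are treated as inclusive, and `date.min` / `date.max` stand in for the 1753 and 9999 sentinels.

```python
from datetime import date

def _lo(d):
    """Treat a missing start date as minus infinity."""
    return date.min if d is None else d

def _hi(d):
    """Treat a missing end date as plus infinity."""
    return date.max if d is None else d

def conflicts(new_start, new_end, existing):
    """True if [new_start, new_end] overlaps any (start, end) pair in existing."""
    return any(_lo(s) <= _hi(new_end) and _hi(e) >= _lo(new_start)
               for s, e in existing)

# First two rows of Interval 1 from the question.
interval_1 = [(None, date(2011, 1, 15)), (date(2011, 1, 16), date(2011, 1, 25))]

print(conflicts(date(2011, 1, 26), None, interval_1))               # False
print(conflicts(date(2011, 1, 20), date(2011, 1, 22), interval_1))  # True
```

Written this way, the test also reports a conflict when an existing interval completely contains the new one, for example an existing (NULL, NULL) row.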
Something like this should give you all overlapping periods:
SELECT
*
FROM
mytable t1
JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate
Edited for Adrian's comment below.
This will give you the rows that are incorrect.
I added ROW_NUMBER() as I didn't know whether all entries were in order.
-- Testdata
declare @date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate)
as
(
SELECT 1, NULL, @date
UNION ALL
SELECT 2, dateadd(day, -1, @date), DATEADD(day, 10, @date)
UNION ALL
SELECT 3, DATEADD(day, 60, @date), NULL
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
where t1.endDate > t2.startDate
EDIT:
As for the updated question:
Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.
-- Testdata
declare @date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate, intervalID)
as
(
SELECT 1, NULL, @date, 1
UNION ALL
SELECT 2, dateadd(day, 1, @date), DATEADD(day, 10, @date),1
UNION ALL
SELECT 3, DATEADD(day, 60, @date), NULL, 1
UNION ALL
SELECT 4, NULL, @date, 2
UNION ALL
SELECT 5, dateadd(day, -1, @date), DATEADD(day, 10, @date),2
UNION ALL
SELECT 6, DATEADD(day, 60, @date), NULL, 2
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
and t1.intervalID = t2.intervalID
where t1.endDate > t2.startDate
declare @T table (ItemId int, IntervalID int, StartDate datetime, EndDate datetime)
insert into @T
select 1, 1, NULL, '2011-01-15' union all
select 2, 1, '2011-01-16', '2011-01-25' union all
select 3, 1, '2011-01-26', NULL union all
select 4, 2, NULL, '2011-01-17' union all
select 5, 2, '2011-01-16', '2011-01-25' union all
select 6, 2, '2011-01-26', NULL
select T1.*
from @T as T1
inner join @T as T2
on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
T1.IntervalID = T2.IntervalID and
T1.ItemId <> T2.ItemId
Result:
ItemId IntervalID StartDate EndDate
----------- ----------- ----------------------- -----------------------
5 2 2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
4 2 NULL 2011-01-17 00:00:00.000
Not directly related to the OP, but since Adrian's expressed an interest: here's a table that SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merges directly, rather than maintaining through triggers).
In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:
create table dbo.Clients (
ClientID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_Clients PRIMARY KEY (ClientID)
)
go
create table dbo.DataItems (
DataItemID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_DataItems PRIMARY KEY (DataItemID),
constraint UQ_DataItem_Names UNIQUE (Name)
)
go
Now, if we were building a normal table, we'd have the following (Don't run this one):
create table dbo.ClientAnswers (
ClientID int not null,
DataItemID int not null,
IntValue int not null,
Comment varchar(max) null,
constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
)
But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:
create table dbo.ClientAnswerHistories (
ClientID int not null,
DataItemID int not null,
IntValue int null,
Comment varchar(max) null,
/* Temporal columns */
Deleted bit not null,
ValidFrom datetime2 null,
ValidTo datetime2 null,
constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
constraint CK_ClientAnswerHistory_DeletionNull CHECK (
Deleted = 0 or
(
IntValue is null and
Comment is null
)),
constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
)
go
That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:
create view dbo.ClientAnswers
with schemabinding
as
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
ValidTo is null
go
create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
go
And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:
create function dbo.ClientAnswers_At (
@At datetime2
)
returns table
with schemabinding
as
return (
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
(ValidFrom is null or ValidFrom <= @At) and
(ValidTo is null or ValidTo > @At)
)
go
And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.
create trigger T_ClientAnswers_I
on dbo.ClientAnswers
instead of insert
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
from
inserted i
left join
dbo.ClientAnswerHistories cah
on
i.ClientID = cah.ClientID and
i.DataItemID = cah.DataItemID and
cah.ValidTo is null and
cah.Deleted = 1
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
when matched then update set ValidTo = SYSDATETIME()
when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
go
create trigger T_ClientAnswers_U
on dbo.ClientAnswers
instead of update
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
from
inserted i
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
go
create trigger T_ClientAnswers_D
on dbo.ClientAnswers
instead of delete
as
set nocount on
;with Dup as (
select d.ClientID,d.DataItemID,t.Dupl
from
deleted d
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
go
Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that the base table, ClientAnswerHistories, is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.
Things get more complex when you need to deal with temporal foreign keys.
Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.
I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.
CREATE TABLE intervalStarts
(
ItemId int,
IntervalId int,
StartDate datetime
)
GO
CREATE VIEW intervals
AS
with cte as (
select ItemId, IntervalId, StartDate,
row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
from intervalStarts
)
select c1.ItemId, c1.IntervalId, c1.StartDate,
dateadd(dd,-1,c2.StartDate) as 'EndDate'
from cte c1
left join cte c2 on c1.IntervalId=c2.IntervalId
and c1.row=c2.row-1
So, sample data might look like:
INSERT INTO intervalStarts
select 1, 1, null union
select 2, 1, '2011-01-16' union
select 3, 1, '2011-01-26' union
select 4, 2, null union
select 5, 2, '2011-01-26' union
select 6, 2, '2011-01-14'
and a simple SELECT * FROM intervals yields:
ItemId | IntervalId | StartDate | EndDate
1 | 1 | null | 2011-01-15
2 | 1 | 2011-01-16 | 2011-01-25
3 | 1 | 2011-01-26 | null
4 | 2 | null | 2011-01-13
6 | 2 | 2011-01-14 | 2011-01-25
5 | 2 | 2011-01-26 | null
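The idea behind the view (sort the starts within each IntervalId and end every row the day before the next one begins) can also be sketched in a few lines of Python, which may make the row_number() self-join easier to follow. The `derive_intervals` function is a hypothetical helper, not part of the answer, and it assumes at most one NULL start per IntervalId.

```python
from datetime import date, timedelta
from itertools import groupby

starts = [  # (ItemId, IntervalId, StartDate); None means an open start
    (1, 1, None), (2, 1, date(2011, 1, 16)), (3, 1, date(2011, 1, 26)),
    (4, 2, None), (5, 2, date(2011, 1, 26)), (6, 2, date(2011, 1, 14)),
]

def derive_intervals(rows):
    """Order rows per IntervalId by start; each ends the day before the next starts."""
    def sort_key(r):
        # A missing start sorts first, like isnull(StartDate, '1753-01-01').
        return (r[1], date.min if r[2] is None else r[2])

    result = []
    for _, group in groupby(sorted(rows, key=sort_key), key=lambda r: r[1]):
        group = list(group)
        for cur, nxt in zip(group, group[1:] + [None]):
            end = nxt[2] - timedelta(days=1) if nxt is not None else None
            result.append((cur[0], cur[1], cur[2], end))
    return result

intervals = derive_intervals(starts)
for row in intervals:
    print(row)
```

Because the end dates are derived rather than stored, two rows in the same IntervalId cannot overlap, which is the point of the design.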