Related
I have a merge statement that builds my SCD type 2 table each night. This table must house all historical changes made in the source system and create a new row with the date from/date to columns populated along with the "islatest" flag. I have come across an issue today that I am not really sure how to handle.
There looks to have been multiple changes to the source table within a 24 hour period.
ID Code PAN EnterDate Cost Created
16155 1012401593331 ENRD 2015-11-05 7706.3 2021-08-17 14:34
16155 1012401593331 ENRD 2015-11-05 8584.4 2021-08-17 16:33
I use a basic merge statement to identify my changes however what would be the best approach to ensure all changes get picked up correctly? The above is giving me an error as it's trying to insert/update multiple rows with the same value
DECLARE #DateNow DATETIME = Getdate()
IF Object_id('tempdb..#meteridinsert') IS NOT NULL
DROP TABLE #meteridinsert;
CREATE TABLE #meteridinsert
(
meterid INT,
change VARCHAR(10)
);
MERGE
INTO [DIM].[Meters] AS target
using stg_meters AS source
ON target.[ID] = source.[ID]
AND target.latest=1
WHEN matched THEN
UPDATE
SET target.islatest = 0,
target.todate = #Datenow
WHEN NOT matched BY target THEN
INSERT
(
id,
code,
pan,
enterdate,
cost,
created,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
VALUES
(
source.id,
source.code ,
source.pan ,
source.enterdate ,
source.cost ,
source.created ,
#Datenow ,
NULL ,
1
)
output source.id,
$action
INTO #meteridinsert;INSERT INTO [DIM].[Meters]
(
[id] ,
[code] ,
[pan] ,
[enterdate] ,
[cost] ,
[created] ,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
SELECT ([id] ,[code] ,[pan] ,[enterdate] ,[cost] ,[created] , #DateNow ,NULL ,1 FROM stg_meters a
INNER JOIN #meteridinsert cid
ON a.id = cid.meterid
AND cid.change = 'UPDATE'
Maybe you can do it using merge statement, but I would prefer to use typicall update and insert approach in order to make it easier to understand (also I am not sure that merge allows you to use the same source record for update and insert...)
First of all I create the table dimscd2 to represent your dimension table
create table dimscd2
(naturalkey int, descr varchar(100), startdate datetime, enddate datetime)
And then I insert some records...
insert into dimscd2 values
(1,'A','2019-01-12 00:00:00.000', '2020-01-01 00:00:00.000'),
(1,'B','2020-01-01 00:00:00.000', NULL)
As you can see, the "current" is the one with descr='B' because it has an enddate NULL (I do recommend you to use surrogate keys for each record... This is just an incremental key for each record of your dimension, and the fact table must be linked with this surrogate key in order to reflect the status of the fact in the moment when happened).
Then, I have created some dummy data to represent the source data with the changes for the same natural key
-- new data (src_data)
select 1 as naturalkey,'C' as descr, cast('2020-01-02 00:00:00.000' as datetime) as dt into src_data
union all
select 1 as naturalkey,'D' as descr, cast('2020-01-03 00:00:00.000' as datetime) as dt
After that, I have created a temp table (##tmp) with this query to set the enddate for each record:
-- tmp table
select naturalkey, descr, dt,
lead(dt,1,0) over (partition by naturalkey order by dt) enddate,
row_number() over (partition by naturalkey order by dt) rn
into ##tmp
from src_data
The LEAD function takes the next start date for the same natural key, ordered by date (dt).
The ROW_NUMBER marks with 1 the oldest record in the source data for the natural key in the dimension.
Then, I proceed to close the "current" record using update
update d
set enddate = t.dt
from dimscd2 d
join ##tmp t
on d.naturalkey = t.naturalkey
and d.enddate is null
and t.rn = 1
And finally I add the new source data to the dimension with insert
insert into dimscd2
select naturalkey, descr, dt,
case enddate when '1900-00-00' then null else enddate end
from ##tmp
Final result is obtained with the query:
select * from dimscd2
You can test on this db<>fiddle
I'm working on a script to populate a very simple date dimension table whose granularity is down to the minute level. This table should ultimately contain a smalldatetime representing every minute from 1/1/2000 to 12/31/2015 23:59.
Here is the definition for the table:
CREATE TABLE [dbo].[REF_MinuteDimension] (
[TimeStamp] SMALLDATETIME NOT NULL,
CONSTRAINT [PK_REF_MinuteDimension] PRIMARY KEY CLUSTERED ([TimeStamp] ASC) WITH (FILLFACTOR = 100)
);
Here is the latest revision of the script:
DECLARE #CurrentTimeStamp AS SMALLDATETIME;
SELECT TOP(1) #CurrentTimeStamp = MAX([TimeStamp]) FROM [dbo].[REF_MinuteDimension];
IF #CurrentTimeStamp IS NOT NULL
SET #CurrentTimeStamp = DATEADD(MINUTE, 1, #CurrentTimeStamp);
ELSE
SET #CurrentTimeStamp = '1/1/2000 00:00';
ALTER TABLE [dbo].[REF_MinuteDimension] DROP CONSTRAINT [PK_REF_MinuteDimension];
WHILE #CurrentTimeStamp < '12/31/2050 23:59'
BEGIN
;WITH DateIndex ([TimeStamp]) AS
(
SELECT #CurrentTimeStamp
UNION ALL
SELECT DATEADD(MINUTE, 1, [TimeStamp]) FROM DateIndex di WHERE di.[TimeStamp] < dbo.fGetYearEnd(#CurrentTimeStamp)
)
INSERT INTO [dbo].[REF_MinuteDimension] ([TimeStamp])
SELECT di.[TimeStamp] FROM DateIndex di
OPTION (MAXRECURSION 0);
SET #CurrentTimeStamp = DATEADD(YEAR, 1, dbo.fGetYearBegin(#CurrentTimeStamp))
END
ALTER TABLE [dbo].[REF_MinuteDimension] ADD CONSTRAINT [PK_REF_MinuteDimension] PRIMARY KEY CLUSTERED ([TimeStamp] ASC) WITH (FILLFACTOR = 100);
A couple of things to point out:
I've added logic to drop and subsequently re-add the primary key constraint on the table, hoping to boost the performance.
I've added logic to chunk the INSERTS into yearly batches to minimize the impact on the transaction log. On a side note, we're using the SIMPLE recovery model.
Performance is so-so and takes around 15-20 minutes to complete. Any hints/suggestions on how this script could be "tuned up" or improved?
Also, for completeness here are fGetYearBegin and fGetYearEnd:
CREATE FUNCTION dbo.fGetYearBegin
(
#dtConvertDate datetime
)
RETURNS smalldatetime
AS
BEGIN
RETURN DATEADD(YEAR, DATEDIFF(YEAR, 0, #dtConvertDate), 0)
END
CREATE FUNCTION dbo.fGetYearEnd
(
#dtConvertDate datetime
)
RETURNS smalldatetime
AS
BEGIN
RETURN DATEADD(MINUTE, -1, DATEADD(YEAR, 1, dbo.fGetYearBegin(#dtConvertDate)))
END
This takes 11 seconds on my server...
If Object_ID('tempdb..#someNumbers') Is Not Null Drop Table #someNumbers;
Create Table #someNumbers (id Int);
Declare #minutes Int,
#days Int;
Select #minutes = DateDiff(Minute,'1/1/2000 00:00','1/2/2000 00:00'),
#days = DateDiff(Day,'1/1/2000 00:00','1/1/2051 00:00');
With Base As
(
Select 1 As seedID
Union All
Select 1
), Build As
(
Select seedID
From Base
Union All
Select b.seedID + 1
From Build b
Cross Join Base b2
Where b.SeedID < 14
)
Insert #someNumbers
Select Row_Number() Over (Order By seedID) As id
From Build
Option (MaxRecursion 0);
If Object_ID('tempdb..#values') Is Not Null Drop Table #values;
Create Table #values ([TimeStamp] SmallDateTime NOT NULL);
With Dates As
(
Select DateAdd(Day,id-1,'1/1/2000 00:00') As [TimeStamp]
From #someNumbers
Where id <= #days
)
Insert #values
Select Convert(SmallDateTime,DateAdd(Minute,id-1,[TimeStamp]))
From Dates d
Join #someNumbers sn
On sn.id <= #minutes
Order By 1
CREATE TABLE #AvailableDate (
CustomKey INT IDENTITY (1,1),
SelectedFaceID INT,
FromDate DATETIME,
ToDate DATETIME,
TempDate DATETIME,
Diff INT)
INSERT INTO #AvailableDate(SelectedFaceID, FromDate, ToDate, TempDate, Diff)
SELECT
SelectedFaceID,
FromDate,
ToDate,
(SELECT TOP 1 ToDate FROM #AvailableDate WITH(NOLOCK) ORDER BY #AvailableDate.CustomKey DESC),
(SELECT DATEDIFF(
d,
ToDate,
(SELECT TOP 1 ToDate FROM #AvailableDate ORDER BY CustomKey DESC)
)
)
FROM
SelectedFace WITH(NOLOCK)
Here I haven't been getting the value of SELECT TOP 1 ToDate FROM #AvailableDate WITH(NOLOCK) ORDER BY #AvailableDate.CustomKey DESC in above query or any value associated with #AvailableDate
You are creating a brand-new temp table and expect it to have some values because you are doing a select on that same table you just created. Values will be there only after you populate it.
You can still use temp table, though you will have to re-arrange your batch and first make an insert and later update records with TempDate and Diff fields.
I have a table that looks like this:
ID UserID DateTime TypeID
1 1 1/1/2010 10:00:00 1
2 2 1/1/2010 10:01:50 1
3 1 1/1/2010 10:02:50 1
4 1 1/1/2010 10:03:50 1
5 1 1/1/2010 11:00:00 1
6 2 1/1/2010 11:00:50 1
I need to query all users where their typeID is 1, but have only one row per 15 mins
For example, the result should be:
1 1 1/1/2010 10:00:00 1
2 2 1/1/2010 10:01:50 1
5 1 1/1/2010 11:00:00 1
6 2 1/1/2010 11:00:50 1
IDs 3 & 4 are not shown because 15 min haven't been passed since the last record for the specific userID.
IDs 1 & 5 are shown because 15 minutes has been passed for this specific userID
Same as for IDs 2 & 6.
How can I do it?
Thanks
Try this:
select * from
(
select ID, UserID,
Max(DateTime) as UpperBound,
Min(DateTime) as LowerBound,
TypeID
from the_table
where TypeID=1
group by ID,UserID,TypeID
) t
where datediff(mi,LowerBound,UpperBound)>=15
EDIT: SINCE MY ABOVE ATTEMPT WAS WRONG, I'm adding one more approach using a Sql table-valued Function that does not require recursion, since, understandable, it's a big concern.
Step 1: Create a table-type as follows (LoginDate is the DateTime column in Shay's example - DateTime name conflicts with a SQL data type and I think it's wise to avoid these conflicts)
CREATE TYPE [dbo].[TVP] AS TABLE(
[ID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoginDate] [datetime] NOT NULL,
[TypeID] [int] NOT NULL
)
GO
Step 2: Create the following Function:
CREATE FUNCTION [dbo].[fnGetLoginFreq]
(
-- notice: TVP is the type (declared above)
#TVP TVP readonly
)
RETURNS
#Table_Var TABLE
(
-- This will be our result set
ID int,
UserId int,
LoginTime datetime,
TypeID int,
RowNumber int
)
AS
BEGIN
--We will insert records in this table as we go through the rows in the
--table passed in as parameter and decide that we should add an entry because
--15' had elapsed between logins
DECLARE #temp table
(
ID int,
UserId int,
LoginTime datetime,
TypeID int
)
-- seems silly, but is not because we need to add a row_number column to help
-- in our iteration and table-valued paramters cannot be modified inside the function
insert into #Table_var
select ID,UserID,Logindate,TypeID,row_number() OVER(ORDER BY UserID,LoginDate) AS [RowNumber]
from #TVP order by UserID asc,LoginDate desc
declare #Index int,#End int,#CurrentLoginTime datetime, #NextLoginTime datetime, #CurrentUserID int , #NextUserID int
select #Index=1,#End=count(*) from #Table_var
while(#Index<=#End)
begin
select #CurrentLoginTime=LoginTime,#CurrentUserID=UserID from #Table_var where RowNumber=#Index
select #NextLoginTime=LoginTime,#NextUserID=UserID from #Table_var where RowNumber=(#Index+1)
if(#CurrentUserID=#NextUserID)
begin
if( abs(DateDiff(mi,#CurrentLoginTime,#NextLoginTime))>=15)
begin
insert into #temp
select ID,UserID,LoginTime,TypeID
from #Table_var
where RowNumber=#Index
end
END
else
bEGIN
insert into #temp
select ID,UserID,LoginTime,TypeID
from #Table_var
where RowNumber=#Index and UserID=#CurrentUserID
END
if(#Index=#End)--last element?
begin
insert into #temp
select ID,UserID,LoginTime,TypeID
from #Table_var
where RowNumber=#Index and not
abs((select datediff(mi,#CurrentLoginTime,max(LoginTime)) from #temp where UserID=#CurrentUserID))<=14
end
select #Index=#Index+1
end
delete from #Table_var
insert into #Table_var
select ID, UserID ,LoginTime ,TypeID ,row_number() OVER(ORDER BY UserID,LoginTime) AS 'RowNumber'
from #temp
return
END
Step 3: Give it a spin
declare #TVP TVP
INSERT INTO #TVP
select ID,UserId,[DateType],TypeID from Shays_table where TypeID=1 --AND any other date restriction you want to add
select * from fnGetLoginFreq(#TVP) order by LoginTime asc
My tests returned this:
ID UserId LoginTime TypeID RowNumber
2 2 2010-01-01 10:01:50.000 1 3
4 1 2010-01-01 10:03:50.000 1 1
5 1 2010-01-01 11:00:00.000 1 2
6 2 2010-01-01 11:00:50.000 1 4
How about this, it's fairly straight forward and gives you the result you need:
SELECT ID, UserID, [DateTime], TypeID
FROM Users
WHERE Users.TypeID = 1
AND NOT EXISTS (
SELECT TOP 1 1
FROM Users AS U2
WHERE U2.ID <> Users.ID
AND U2.UserID = Users.UserID
AND U2.[DateTime] BETWEEN DATEADD(MI, -15, Users.[DateTime]) AND Users.[DateTime]
AND U2.TypeID = 1)
The NOT EXISTS restricts to only show records that have no record within 15minutes before them, so you will see the first record in a block rather than one every 15mins.
Edit: Since you want to see one every 15mins this should do without using recursion:
SELECT Users.ID, Users.UserID, Users.[DateTime], Users.TypeID
FROM
(
SELECT MIN(ID) AS ID, UserID,
DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0) AS [DateTime]
FROM Users
GROUP BY UserID, DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0)
) AS Dates
INNER JOIN Users AS Users ON Users.ID = Dates.ID
WHERE Users.TypeID = 1
AND NOT EXISTS (
SELECT TOP 1 1
FROM
(
SELECT MIN(ID) AS ID, UserID,
DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0) AS [DateTime]
FROM Users
GROUP BY UserID, DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0)
) AS Dates2
INNER JOIN Users AS U2 ON U2.ID = Dates2.ID
WHERE U2.ID <> Users.ID
AND U2.UserID = Users.UserID
AND U2.[DateTime] BETWEEN DATEADD(MI, -15, Users.[DateTime]) AND Users.[DateTime]
AND U2.TypeID = 1
)
ORDER BY Users.DateTime
If this doesn't work please post more sample data so that I can see what is missing.
Edit2 same as directly above but just using CTE now instead for improved readability and help improve maintainability, also I improved it to highlighted where you would also restrict the Dates table by whatever DateTime range that you would be restricting to the main query:
WITH Dates(ID, UserID, [DateTime])
AS
(
SELECT MIN(ID) AS ID, UserID,
DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0) AS [DateTime]
FROM Users
WHERE Users.TypeID = 1
--AND Users.[DateTime] BETWEEN #StartDateTime AND #EndDateTime
GROUP BY UserID, DATEADD(minute, DATEDIFF(minute,0,[DateTime]) / 15 * 15, 0)
)
SELECT Users.ID, Users.UserID, Users.[DateTime], Users.TypeID
FROM Dates
INNER JOIN Users ON Users.ID = Dates.ID
WHERE Users.TypeID = 1
--AND Users.[DateTime] BETWEEN #StartDateTime AND #EndDateTime
AND NOT EXISTS (
SELECT TOP 1 1
FROM Dates AS Dates2
INNER JOIN Users AS U2 ON U2.ID = Dates2.ID
WHERE U2.ID <> Users.ID
AND U2.UserID = Users.UserID
AND U2.[DateTime] BETWEEN DATEADD(MI, -15, Users.[DateTime]) AND Users.[DateTime]
AND U2.TypeID = 1
)
ORDER BY Users.DateTime
Also as a performance note, whenever dealing with something that might end up being recursive like this potentially could be (from other answers), you should straight away be considering if you are able to restrict the main query by a date range in general even if it's a whole year or longer range
You can use a recursive CTE for this though I would also evaluate a cursor if the result set is at all large as it may work out more efficient.
I've left out the ID column in my answer. If you really need it it would be possible to add it. It just makes the anchor part of the recursive CTE a bit more unwieldy.
DECLARE #T TABLE
(
ID INT PRIMARY KEY,
UserID INT,
[DateTime] DateTime,
TypeID INT
)
INSERT INTO #T
SELECT 1,1,'20100101 10:00:00', 1 union all
SELECT 2,2,'20100101 10:01:50', 1 union all
SELECT 3,1,'20100101 10:02:50', 1 union all
SELECT 4,1,'20100101 10:03:50', 1 union all
SELECT 5,1,'20100101 11:00:00', 1 union all
SELECT 6,2,'20100101 11:00:50', 1;
WITH RecursiveCTE
AS (SELECT UserID,
MIN([DateTime]) As [DateTime],
1 AS TypeID
FROM #T
WHERE TypeID = 1
GROUP BY UserID
UNION ALL
SELECT UserID,
[DateTime],
TypeID
FROM (
--Can't use TOP directly
SELECT T.*,
rn = ROW_NUMBER() OVER (PARTITION BY T.UserID ORDER BY
T.[DateTime])
FROM #T T
JOIN RecursiveCTE R
ON R.UserID = T.UserID
AND T.[DateTime] >=
DATEADD(MINUTE, 15, R.[DateTime])) R
WHERE R.rn = 1)
Suppose I have following table in Sql Server 2008:
ItemId StartDate EndDate
1 NULL 2011-01-15
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
As you can see, this table has StartDate and EndDate columns. I want to validate data in these columns. Intervals cannot conflict with each other. So, the table above is valid, but the next table is invalid, becase first row has End Date greater than StartDate in the second row.
ItemId StartDate EndDate
1 NULL 2011-01-17
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
NULL means infinity here.
Could you help me to write a script for data validation?
[The second task]
Thanks for the answers.
I have a complication. Let's assume, I have such table:
ItemId IntervalId StartDate EndDate
1 1 NULL 2011-01-15
2 1 2011-01-16 2011-01-25
3 1 2011-01-26 NULL
4 2 NULL 2011-01-17
5 2 2011-01-16 2011-01-25
6 2 2011-01-26 NULL
Here I want to validate intervals within a groups of IntervalId, but not within the whole table. So, Interval 1 will be valid, but Interval 2 will be invalid.
And also. Is it possible to add a constraint to the table in order to avoid such invalid records?
[Final Solution]
I created function to check if interval is conflicted:
CREATE FUNCTION [dbo].[fnIntervalConflict]
(
#intervalId INT,
#originalItemId INT,
#startDate DATETIME,
#endDate DATETIME
)
RETURNS BIT
AS
BEGIN
SET #startDate = ISNULL(#startDate,'1/1/1753 12:00:00 AM')
SET #endDate = ISNULL(#endDate,'12/31/9999 11:59:59 PM')
DECLARE #conflict BIT = 0
SELECT TOP 1 #conflict = 1
FROM Items
WHERE IntervalId = #intervalId
AND ItemId <> #originalItemId
AND (
(ISNULL(StartDate,'1/1/1753 12:00:00 AM') >= #startDate
AND ISNULL(StartDate,'1/1/1753 12:00:00 AM') <= #endDate)
OR (ISNULL(EndDate,'12/31/9999 11:59:59 PM') >= #startDate
AND ISNULL(EndDate,'12/31/9999 11:59:59 PM') <= #endDate)
)
RETURN #conflict
END
And then I added 2 constraints to my table:
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_Dates CHECK (StartDate IS NULL OR EndDate IS NULL OR StartDate <= EndDate)
GO
and
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_ValidInterval CHECK (([dbo].[fnIntervalConflict]([IntervalId], ItemId,[StartDate],[EndDate])=(0)))
GO
I know, the second constraint slows insert and update operations, but it is not very important for my application.
And also, now I can call function fnIntervalConflict from my application code before inserts and updates of data in the table.
Something like this should give you all overlaping periods
SELECT
*
FROM
mytable t1
JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate
Edited for Adrians comment bellow
This will give you the rows that are incorrect.
Added ROW_NUMBER() as I didnt know if all entries where in order.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate)
as
(
SELECT 1, NULL, #date
UNION ALL
SELECT 2, dateadd(day, -1, #date), DATEADD(day, 10, #date)
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
where t1.endDate > t2.startDate
EDIT:
As for the updated question:
Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate, intervalID)
as
(
SELECT 1, NULL, #date, 1
UNION ALL
SELECT 2, dateadd(day, 1, #date), DATEADD(day, 10, #date),1
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL, 1
UNION ALL
SELECT 4, NULL, #date, 2
UNION ALL
SELECT 5, dateadd(day, -1, #date), DATEADD(day, 10, #date),2
UNION ALL
SELECT 6, DATEADD(day, 60, #date), NULL, 2
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
and t1.intervalID = t2.intervalID
where t1.endDate > t2.startDate
declare #T table (ItemId int, IntervalID int, StartDate datetime, EndDate datetime)
insert into #T
select 1, 1, NULL, '2011-01-15' union all
select 2, 1, '2011-01-16', '2011-01-25' union all
select 3, 1, '2011-01-26', NULL union all
select 4, 2, NULL, '2011-01-17' union all
select 5, 2, '2011-01-16', '2011-01-25' union all
select 6, 2, '2011-01-26', NULL
select T1.*
from #T as T1
inner join #T as T2
on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
T1.IntervalID = T2.IntervalID and
T1.ItemId <> T2.ItemId
Result:
ItemId IntervalID StartDate EndDate
----------- ----------- ----------------------- -----------------------
5 2 2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
4 2 NULL 2011-01-17 00:00:00.000
Not directly related to the OP, but since Adrian's expressed an interest. Here's a table than SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merge's directly, rather than maintaining through triggers).
In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:
create table dbo.Clients (
ClientID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_Clients PRIMARY KEY (ClientID)
)
go
create table dbo.DataItems (
DataItemID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_DataItems PRIMARY KEY (DataItemID),
constraint UQ_DataItem_Names UNIQUE (Name)
)
go
Now, if we were building a normal table, we'd have the following (Don't run this one):
create table dbo.ClientAnswers (
ClientID int not null,
DataItemID int not null,
IntValue int not null,
Comment varchar(max) null,
constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
)
But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:
create table dbo.ClientAnswerHistories (
ClientID int not null,
DataItemID int not null,
IntValue int null,
Comment varchar(max) null,
/* Temporal columns */
Deleted bit not null,
ValidFrom datetime2 null,
ValidTo datetime2 null,
constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
constraint CK_ClientAnswerHistory_DeletionNull CHECK (
Deleted = 0 or
(
IntValue is null and
Comment is null
)),
constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
)
go
That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:
create view dbo.ClientAnswers
with schemabinding
as
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
ValidTo is null
go
create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
go
And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:
create function dbo.ClientAnswers_At (
#At datetime2
)
returns table
with schemabinding
as
return (
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
(ValidFrom is null or ValidFrom <= #At) and
(ValidTo is null or ValidTo > #At)
)
go
And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.
create trigger T_ClientAnswers_I
on dbo.ClientAnswers
instead of insert
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
from
inserted i
left join
dbo.ClientAnswerHistories cah
on
i.ClientID = cah.ClientID and
i.DataItemID = cah.DataItemID and
cah.ValidTo is null and
cah.Deleted = 1
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
when matched then update set ValidTo = SYSDATETIME()
when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
go
create trigger T_ClientAnswers_U
on dbo.ClientAnswers
instead of update
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
from
inserted i
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
go
create trigger T_ClientAnswers_D
on dbo.ClientAnswers
instead of delete
as
set nocount on
;with Dup as (
select d.ClientID,d.DataItemID,t.Dupl
from
deleted d
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
go
Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that, the base table, ClientAnswerHistories is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.
Things get more complex when you need to deal with temporal foreign keys.
Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.
I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.
CREATE TABLE intervalStarts
(
ItemId int,
IntervalId int,
StartDate datetime
)
CREATE VIEW intervals
AS
with cte as (
select ItemId, IntervalId, StartDate,
row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
from intervalStarts
)
select c1.ItemId, c1.IntervalId, c1.StartDate,
dateadd(dd,-1,c2.StartDate) as 'EndDate'
from cte c1
left join cte c2 on c1.IntervalId=c2.IntervalId
and c1.row=c2.row-1
So, sample data might look like:
INSERT INTO intervalStarts
select 1, 1, null union
select 2, 1, '2011-01-16' union
select 3, 1, '2011-01-26' union
select 4, 2, null union
select 5, 2, '2011-01-26' union
select 6, 2, '2011-01-14'
and a simple SELECT * FROM intervals yields:
ItemId | IntervalId | StartDate | EndDate
1 | 1 | null | 2011-01-15
2 | 1 | 2011-01-16 | 2011-01-25
3 | 1 | 2011-01-26 | null
4 | 2 | null | 2011-01-13
6 | 2 | 2011-01-14 | 2011-01-25
5 | 2 | 2011-01-26 | null