SCD Type 2 - Handling Intraday changes?

SCD Type 2 - Handling Intraday changes? - sql

I have a merge statement that builds my SCD type 2 table each night. This table must house all historical changes made in the source system and create a new row with the date from/date to columns populated along with the "islatest" flag. I have come across an issue today that I am not really sure how to handle.
There looks to have been multiple changes to the source table within a 24 hour period.
ID Code PAN EnterDate Cost Created
16155 1012401593331 ENRD 2015-11-05 7706.3 2021-08-17 14:34
16155 1012401593331 ENRD 2015-11-05 8584.4 2021-08-17 16:33
I use a basic merge statement to identify my changes however what would be the best approach to ensure all changes get picked up correctly? The above is giving me an error as it's trying to insert/update multiple rows with the same value
DECLARE #DateNow DATETIME = Getdate()
IF Object_id('tempdb..#meteridinsert') IS NOT NULL
DROP TABLE #meteridinsert;
CREATE TABLE #meteridinsert
(
meterid INT,
change VARCHAR(10)
);
MERGE
INTO [DIM].[Meters] AS target
using stg_meters AS source
ON target.[ID] = source.[ID]
AND target.latest=1
WHEN matched THEN
UPDATE
SET target.islatest = 0,
target.todate = #Datenow
WHEN NOT matched BY target THEN
INSERT
(
id,
code,
pan,
enterdate,
cost,
created,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
VALUES
(
source.id,
source.code ,
source.pan ,
source.enterdate ,
source.cost ,
source.created ,
#Datenow ,
NULL ,
1
)
output source.id,
$action
INTO #meteridinsert;INSERT INTO [DIM].[Meters]
(
[id] ,
[code] ,
[pan] ,
[enterdate] ,
[cost] ,
[created] ,
[FromDate] ,
[ToDate] ,
[IsLatest]
)
SELECT ([id] ,[code] ,[pan] ,[enterdate] ,[cost] ,[created] , #DateNow ,NULL ,1 FROM stg_meters a
INNER JOIN #meteridinsert cid
ON a.id = cid.meterid
AND cid.change = 'UPDATE'

Maybe you can do it using merge statement, but I would prefer to use typicall update and insert approach in order to make it easier to understand (also I am not sure that merge allows you to use the same source record for update and insert...)
First of all I create the table dimscd2 to represent your dimension table
create table dimscd2
(naturalkey int, descr varchar(100), startdate datetime, enddate datetime)
And then I insert some records...
insert into dimscd2 values
(1,'A','2019-01-12 00:00:00.000', '2020-01-01 00:00:00.000'),
(1,'B','2020-01-01 00:00:00.000', NULL)
As you can see, the "current" is the one with descr='B' because it has an enddate NULL (I do recommend you to use surrogate keys for each record... This is just an incremental key for each record of your dimension, and the fact table must be linked with this surrogate key in order to reflect the status of the fact in the moment when happened).
Then, I have created some dummy data to represent the source data with the changes for the same natural key
-- new data (src_data)
select 1 as naturalkey,'C' as descr, cast('2020-01-02 00:00:00.000' as datetime) as dt into src_data
union all
select 1 as naturalkey,'D' as descr, cast('2020-01-03 00:00:00.000' as datetime) as dt
After that, I have created a temp table (##tmp) with this query to set the enddate for each record:
-- tmp table
select naturalkey, descr, dt,
lead(dt,1,0) over (partition by naturalkey order by dt) enddate,
row_number() over (partition by naturalkey order by dt) rn
into ##tmp
from src_data
The LEAD function takes the next start date for the same natural key, ordered by date (dt).
The ROW_NUMBER marks with 1 the oldest record in the source data for the natural key in the dimension.
Then, I proceed to close the "current" record using update
update d
set enddate = t.dt
from dimscd2 d
join ##tmp t
on d.naturalkey = t.naturalkey
and d.enddate is null
and t.rn = 1
And finally I add the new source data to the dimension with insert
insert into dimscd2
select naturalkey, descr, dt,
case enddate when '1900-00-00' then null else enddate end
from ##tmp
Final result is obtained with the query:
select * from dimscd2
You can test on this db<>fiddle

Related

How to rebuild a record from a change log

I have a table of changes to an entity. I am trying to rebuild the data from the changes.
This is my Changes table:
CREATE TABLE Changes
(
Id IDENTITY,
RecordId INT,
Field INT,
Val VARCHAR(MAX),
DateOfChange DATETIME
);
Field column is a reference to what field changed, Val is the new value, RecordId is the Id of the record that changed. Ideally the Record table would contain the latest values but I am not that lucky. There are 10 different fields that changes are tracked, mostly dates but some other types are thrown in there.
This is my Record table:
CREATE TABLE Records
(
Id IDENTITY,
AUserGeneratedIdentifer VARCHAR(12)
)
I'd like to have a view to query by the rolled up values.
SELECT
AUserGeneratedIdentitfier, DateOpened, DateClosed, etc
FROM
RecordView
WHERE
AUserGeneratedIdentitfier = 'something'
I am trying to implemented it with CTEs but I am wondering if this is the correct way. I am using a CTE per field I am trying to get to.
WITH DateOpened AS
(
SELECT
RecordId, Val,
ROW_NUMBER() OVER (PARTITION BY RecordId ORDER BY DateOfChanged DESC) Rank
FROM
Changes
WHERE
FieldId = #DateOpenedId
) --- ... Repeat for every field
SELECT (my fields)
FROM Records
INNER JOIN <all ctes> on Record Id
But this method feels wrong to me, possibly due to my lack of SQL experience. Is there a better way that I am missing here? What are the performance implications of having multiple CTEs on the same table and joining with them?
Please excuse the hastily thrown together pseudo code, I hope it illustrates my problem accurately

I think you should be able to use a single ROW_NUMBER and pivot it as below.
DECLARE #Records TABLE (Id INT IDENTITY, AUserGeneratedIdentifer VARCHAR(12))
DECLARE #Changes TABLE (Id INT IDENTITY, RecordId INT, Field INT, Val VARCHAR(MAX), DateOfChange DATETIME);
INSERT INTO #Records (AUserGeneratedIdentifer)
VALUES ('qwer'), ('asdf')
INSERT INTO #Changes (RecordId, Field, Val, DateOfChange)
VALUES (1, 1, 'foo', '2021-01-01' ),
(2, 1, 'fooz', '2021-01-01' ),
(2, 2, 'barz', '2021-01-01' ),
(1, 2, 'bar', '2021-01-01' ),
(1, 1, 'foo2', '2021-01-02' ),
(2, 2, 'barz2', '2021-01-02' )
SELECT piv.RecordId, piv.[1], piv.[2]
FROM (
SELECT C.RecordId, C.Field, C.Val, ROW_NUMBER() OVER (PARTITION BY C.RecordId, C.Field ORDER BY C.DateOfChange DESC) RowNum
FROM #Changes C
) sub
PIVOT (
MAX(Val)
FOR Field IN ([1], [2])
) piv
WHERE piv.RowNum = 1

Optimize this query without using not exist repeatably, is there a better way to write this query?

For example I have three table where say DataTable1, DataTable2 and DataTable3
and need to filter it from DataRange table, every time I have used NOT exist as shown below,
Is there a better way to write this.
Temp table to hold some daterange which is used for fiter:
Declare #DateRangeTable as Table(
StartDate datetime,
EndDate datetime
)
Some temp table which will hold data on which we need to apply date range filter
INSERT INTO #DateRangeTable values
('07/01/2020','07/04/2020'),
('07/06/2020','07/08/2020');
/*Table 1 which will hold some data*/
Declare #DataTable1 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable1 values
(1,'07/09/2020'),
(2,'07/06/2020');
Declare #DataTable2 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable2 values
(1,'07/10/2020'),
(2,'07/06/2020');
Declare #DataTable3 as Table(
Id numeric,
Date datetime
)
INSERT INTO #DataTable3 values
(1,'07/11/2020'),
(2,'07/06/2020');
Now I want to filter data based on DateRange table, here I need some optimized way so that i don't have to use not exists mutiple times, In real senario, I have mutiple tables where I have to filter based on the daterange table.
Select * from #DataTable1
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from #DataTable2
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)
Select * from #DataTable3
where NOT EXISTS(
Select 1 from #DateRangeTable
where [Date] between StartDate and EndDate
)

Instead of using NOT EXISTS you could join the date range table:
SELECT dt.*
FROM #DataTable1 dt
LEFT JOIN #DateRangeTable dr ON dt.[Date] BETWEEN dr.StartDate and dr.EndDate
WHERE dr.StartDate IS NULL
It may perform better on large tables but you would have to compare the execution plans and make sure you have indexes on the date columns.

I would write the same query... but if you can change table structure I would try to improve performance adding two columns to specify the month as an integer (I suppose is the first couple of figures).
Obviously you have to test with your data and compare the timings.
Declare #DateRangeTable as Table(
StartDate datetime,
EndDate datetime,
StartMonth tinyint,
EndMonth tinyint
)
INSERT INTO #DateRangeTable values
('07/01/2020','07/04/2020', 7, 7),
('07/06/2020','07/08/2020', 7, 7),
('07/25/2020','08/02/2020', 7, 8); // (another record with different months)
Now your queries can use the new column to try to reduce comparisons (is a tinyint, sql server can partition records if you define a secondary index for StartMonth and EndMonth):
Select * from #DataTable1
where NOT EXISTS(
Select 1 from #DateRangeTable
where (DATEPART('month', [Date]) between StartMonth and EndMonth)
and ([Date] between StartDate and EndDate)
)

Efficient way of storing date ranges

I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So data could look like this:
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
Those ranges are not overlapping, so on every date I have a list of unique products and their properties. So to ease the use of it I've created the function:
create function dbo.f_Products
(
#date date
)
returns table
as
return (
select
from dbo.Products as p
where
#date >= p.begin_date and
#date <= p.end_date
)
This is how I'm going to use it:
select
*
from <some table with product codes> as t
left join dbo.f_Products(#date) as p on
p.code = t.product_code
This is all fine, but how I can let optimizer know that those rows are unique to have better execution plan?
I did some googling, and found a couple of really nice articles for DDL which prevents storing overlapping ranges in the table:
Self-maintaining, Contiguous Effective Dates in Temporal Tables
Storing intervals of time with no overlaps
But even if I try those constraint I see that optimizer cannot understand that resulting recordset will return unique codes.
What I'd like to have is certain approach which gives me basically the same performance as if I stored those products list on certain date and selected it with date = #date.
I know that some RDMBS (like PostgreSQL) have special data types for this (Range Types). But SQL Server doesn't have anything like this.
Am I missing something or there're no way to do this properly in SQL Server?

You can create an indexed view that contains a row for each code/date in the range.
ProductDate (indexed view)
code value date
10905 13 2005-01-01
10905 13 2005-01-02
10905 13 ...
10905 13 2016-12-31
10905 11 2017-01-01
10905 11 2017-01-02
10905 11 ...
10905 11 Today
Like this:
create schema digits
go
create table digits.Ones (digit tinyint not null primary key)
insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Tens (digit tinyint not null primary key)
insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Hundreds (digit tinyint not null primary key)
insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.Thousands (digit tinyint not null primary key)
insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
create table digits.TenThousands (digit tinyint not null primary key)
insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
go
create schema info
go
create table info.Products (code int not null, [value] int not null, begin_date date not null, end_date date null, primary key (code, begin_date))
insert into info.Products (code, [value], begin_date, end_date) values
(10905, 13, '2005-01-01', '2016-12-31'),
(10905, 11, '2017-01-01', null)
create table info.DateRange ([begin] date not null, [end] date not null, [singleton] bit not null default(1) check ([singleton] = 1))
insert into info.DateRange ([begin], [end]) values ((select min(begin_date) from info.Products), getdate())
go
create view info.ProductDate with schemabinding
as
select
p.code,
p.value,
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
from
info.DateRange as dr
cross join
digits.Ones as ones
cross join
digits.Tens as tens
cross join
digits.Hundreds as huns
cross join
digits.Thousands as thos
cross join
digits.TenThousands as tthos
join
info.Products as p on
dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
go
create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
go
select *
from info.ProductDate with (noexpand)
where
date = '2014-01-01'
drop view info.ProductDate
drop table info.Products
drop table info.DateRange
drop table digits.Ones
drop table digits.Tens
drop table digits.Hundreds
drop table digits.Thousands
drop table digits.TenThousands
drop schema digits
drop schema info
go

A solution without gaps might be this:
DECLARE #tbl TABLE(ID INT IDENTITY,[start_date] DATE);
INSERT INTO #tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});
SELECT * FROM #tbl;
DECLARE #DateFilter DATE={d'2016-08-13'};
SELECT TOP 1 *
FROM #tbl
WHERE [start_date]<=#DateFilter
ORDER BY [start_date] DESC
Important: Be sure that there is an (unique) index on start_date
UPDATE: for different products
DECLARE #tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
INSERT INTO #tbl VALUES
--product 1
(1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
--product 1
,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});
DECLARE #DateFilter DATE={d'2016-08-13'};
WITH PartitionedCount AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
,*
FROM #tbl
WHERE [start_date]<=#DateFilter
)
SELECT *
FROM PartitionedCount
WHERE Nr=1

First you need to create a unique clustered index for (begin_date, end_date, code)
Then SQL engine will be able to do INDEX SEEK.
Additionally, you can also try to create a view for dbo.Products table to join that table with pre-populated dbo.Dates table.
select p.code, p.val, p.begin_date, p.end_date, d.[date]
from dbo.Product as p
inner join dbo.dates d on p.begin_date <= d.[date] and d.[date] <= p.end_date
Then in your function, you use that view as "where #date = view.date". The result can be either better or slightly worse... it depends on the actual data.
You also can try to make that view indexed (depends on how often it is being updated).
Alternatively, you can have better performance if you populate dbo.Products table for every date in the [begin_date] .. [end_date] range.

Approach with ROW_NUMBER scans the whole Products table once. It is the best method if you have a lot of product codes in the Products table and few validity ranges for each code.
WITH
CTE_rn
AS
(
SELECT
code
,value
,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
FROM Products
WHERE begin_date <= #date
)
SELECT *
FROM
<some table with product codes> as t
LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
;
If you have few product codes and a lot of validity ranges for each code in the Products table, then it is better to seek the Products table for each code using OUTER APPLY.
SELECT *
FROM
<some table with product codes> as t
OUTER APPLY
(
SELECT TOP(1)
Products.value
FROM Products
WHERE
Products.code = t.product_code
AND Products.begin_date <= #date
ORDER BY Products.begin_date DESC
) AS A
;
Both variants need unique index on (code, begin_date DESC) include (value).
Note how the queries don't even look at end_date, because they assume that intervals don't have gaps. They will work in SQL Server 2008.

EDIT: My original answer was using an INNER JOIN, but the questioner wanted a LEFT JOIN.
CREATE TABLE Products
(
[Code] INT NOT NULL
, [Value] VARCHAR(30) NOT NULL
, Begin_Date DATETIME NOT NULL
, End_Date DATETIME NULL
)
/*
Products
code value begin_date end_date
10905 13 2005-01-01 2016-12-31
10905 11 2017-01-01 null
*/
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)
CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])
CREATE TABLE SomeTableWithProductCodes
(
[CODE] INT NOT NULL
)
INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)
Here is a prototypical query, with a date predicate. Note that there are more optimal ways to do this in a bulletproof fashion, using a "less than" operator on the upper bound, but that's a different discussion.
SELECT
P.[Code]
, P.[Value]
, P.[Begin_Date]
, P.[End_Date]
FROM
SomeTableWithProductCodes ST
LEFT JOIN Products AS P ON
ST.[Code] = P.[Code]
AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')
This query will perform an Index Seek on the Product table.
Here is a SQL Fiddle: SQL Fiddle - Products and Dates

Find conflicted date intervals using SQL

Suppose I have following table in Sql Server 2008:
ItemId StartDate EndDate
1 NULL 2011-01-15
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
As you can see, this table has StartDate and EndDate columns. I want to validate data in these columns. Intervals cannot conflict with each other. So, the table above is valid, but the next table is invalid, becase first row has End Date greater than StartDate in the second row.
ItemId StartDate EndDate
1 NULL 2011-01-17
2 2011-01-16 2011-01-25
3 2011-01-26 NULL
NULL means infinity here.
Could you help me to write a script for data validation?
[The second task]
Thanks for the answers.
I have a complication. Let's assume, I have such table:
ItemId IntervalId StartDate EndDate
1 1 NULL 2011-01-15
2 1 2011-01-16 2011-01-25
3 1 2011-01-26 NULL
4 2 NULL 2011-01-17
5 2 2011-01-16 2011-01-25
6 2 2011-01-26 NULL
Here I want to validate intervals within a groups of IntervalId, but not within the whole table. So, Interval 1 will be valid, but Interval 2 will be invalid.
And also. Is it possible to add a constraint to the table in order to avoid such invalid records?
[Final Solution]
I created function to check if interval is conflicted:
CREATE FUNCTION [dbo].[fnIntervalConflict]
(
#intervalId INT,
#originalItemId INT,
#startDate DATETIME,
#endDate DATETIME
)
RETURNS BIT
AS
BEGIN
SET #startDate = ISNULL(#startDate,'1/1/1753 12:00:00 AM')
SET #endDate = ISNULL(#endDate,'12/31/9999 11:59:59 PM')
DECLARE #conflict BIT = 0
SELECT TOP 1 #conflict = 1
FROM Items
WHERE IntervalId = #intervalId
AND ItemId <> #originalItemId
AND (
(ISNULL(StartDate,'1/1/1753 12:00:00 AM') >= #startDate
AND ISNULL(StartDate,'1/1/1753 12:00:00 AM') <= #endDate)
OR (ISNULL(EndDate,'12/31/9999 11:59:59 PM') >= #startDate
AND ISNULL(EndDate,'12/31/9999 11:59:59 PM') <= #endDate)
)
RETURN #conflict
END
And then I added 2 constraints to my table:
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_Dates CHECK (StartDate IS NULL OR EndDate IS NULL OR StartDate <= EndDate)
GO
and
ALTER TABLE dbo.Items ADD CONSTRAINT
CK_Items_ValidInterval CHECK (([dbo].[fnIntervalConflict]([IntervalId], ItemId,[StartDate],[EndDate])=(0)))
GO
I know, the second constraint slows insert and update operations, but it is not very important for my application.
And also, now I can call function fnIntervalConflict from my application code before inserts and updates of data in the table.

Something like this should give you all overlaping periods
SELECT
*
FROM
mytable t1
JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate
Edited for Adrians comment bellow

This will give you the rows that are incorrect.
Added ROW_NUMBER() as I didnt know if all entries where in order.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate)
as
(
SELECT 1, NULL, #date
UNION ALL
SELECT 2, dateadd(day, -1, #date), DATEADD(day, 10, #date)
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
where t1.endDate > t2.startDate
EDIT:
As for the updated question:
Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.
-- Testdata
declare #date datetime = '2011-01-17'
;with yourTable(itemID, startDate, endDate, intervalID)
as
(
SELECT 1, NULL, #date, 1
UNION ALL
SELECT 2, dateadd(day, 1, #date), DATEADD(day, 10, #date),1
UNION ALL
SELECT 3, DATEADD(day, 60, #date), NULL, 1
UNION ALL
SELECT 4, NULL, #date, 2
UNION ALL
SELECT 5, dateadd(day, -1, #date), DATEADD(day, 10, #date),2
UNION ALL
SELECT 6, DATEADD(day, 60, #date), NULL, 2
)
-- End testdata
,tmp
as
(
select *
,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno
from yourTable
)
select *
from tmp t1
left join tmp t2
on t1.rowno = t2.rowno - 1
and t1.intervalID = t2.intervalID
where t1.endDate > t2.startDate

declare #T table (ItemId int, IntervalID int, StartDate datetime, EndDate datetime)
insert into #T
select 1, 1, NULL, '2011-01-15' union all
select 2, 1, '2011-01-16', '2011-01-25' union all
select 3, 1, '2011-01-26', NULL union all
select 4, 2, NULL, '2011-01-17' union all
select 5, 2, '2011-01-16', '2011-01-25' union all
select 6, 2, '2011-01-26', NULL
select T1.*
from #T as T1
inner join #T as T2
on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
T1.IntervalID = T2.IntervalID and
T1.ItemId <> T2.ItemId
Result:
ItemId IntervalID StartDate EndDate
----------- ----------- ----------------------- -----------------------
5 2 2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
4 2 NULL 2011-01-17 00:00:00.000

Not directly related to the OP, but since Adrian's expressed an interest. Here's a table than SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merge's directly, rather than maintaining through triggers).
In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:
create table dbo.Clients (
ClientID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_Clients PRIMARY KEY (ClientID)
)
go
create table dbo.DataItems (
DataItemID int IDENTITY(1,1) not null,
Name varchar(50) not null,
/* Other columns */
constraint PK_DataItems PRIMARY KEY (DataItemID),
constraint UQ_DataItem_Names UNIQUE (Name)
)
go
Now, if we were building a normal table, we'd have the following (Don't run this one):
create table dbo.ClientAnswers (
ClientID int not null,
DataItemID int not null,
IntValue int not null,
Comment varchar(max) null,
constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
)
But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:
create table dbo.ClientAnswerHistories (
ClientID int not null,
DataItemID int not null,
IntValue int null,
Comment varchar(max) null,
/* Temporal columns */
Deleted bit not null,
ValidFrom datetime2 null,
ValidTo datetime2 null,
constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
constraint CK_ClientAnswerHistory_DeletionNull CHECK (
Deleted = 0 or
(
IntValue is null and
Comment is null
)),
constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
)
go
That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:
create view dbo.ClientAnswers
with schemabinding
as
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
ValidTo is null
go
create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
go
And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:
create function dbo.ClientAnswers_At (
#At datetime2
)
returns table
with schemabinding
as
return (
select
ClientID,
DataItemID,
ISNULL(IntValue,0) as IntValue,
Comment
from
dbo.ClientAnswerHistories
where
Deleted = 0 and
(ValidFrom is null or ValidFrom <= #At) and
(ValidTo is null or ValidTo > #At)
)
go
And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.
create trigger T_ClientAnswers_I
on dbo.ClientAnswers
instead of insert
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
from
inserted i
left join
dbo.ClientAnswerHistories cah
on
i.ClientID = cah.ClientID and
i.DataItemID = cah.DataItemID and
cah.ValidTo is null and
cah.Deleted = 1
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
when matched then update set ValidTo = SYSDATETIME()
when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
go
create trigger T_ClientAnswers_U
on dbo.ClientAnswers
instead of update
as
set nocount on
;with Dup as (
select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
from
inserted i
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
go
create trigger T_ClientAnswers_D
on dbo.ClientAnswers
instead of delete
as
set nocount on
;with Dup as (
select d.ClientID,d.DataItemID,t.Dupl
from
deleted d
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo.ClientAnswerHistories cah
using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
go
Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that, the base table, ClientAnswerHistories is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.
Things get more complex when you need to deal with temporal foreign keys.
Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.

I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.
CREATE TABLE intervalStarts
(
ItemId int,
IntervalId int,
StartDate datetime
)
CREATE VIEW intervals
AS
with cte as (
select ItemId, IntervalId, StartDate,
row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
from intervalStarts
)
select c1.ItemId, c1.IntervalId, c1.StartDate,
dateadd(dd,-1,c2.StartDate) as 'EndDate'
from cte c1
left join cte c2 on c1.IntervalId=c2.IntervalId
and c1.row=c2.row-1
So, sample data might look like:
INSERT INTO intervalStarts
select 1, 1, null union
select 2, 1, '2011-01-16' union
select 3, 1, '2011-01-26' union
select 4, 2, null union
select 5, 2, '2011-01-26' union
select 6, 2, '2011-01-14'
and a simple SELECT * FROM intervals yields:
ItemId | IntervalId | StartDate | EndDate
1 | 1 | null | 2011-01-15
2 | 1 | 2011-01-16 | 2011-01-25
3 | 1 | 2011-01-26 | null
4 | 2 | null | 2011-01-13
6 | 2 | 2011-01-14 | 2011-01-25
5 | 2 | 2011-01-26 | null

Need multiple copies of one resultset in sql without using loop

Following is the sample data. I need to make 3 copies of this data in t sql without using loop and return as one resultset. This is sample data not real.
42 South Yorkshire
43 Lancashire
44 Norfolk
Edit: I need multiple copies and I have no idea in advance that how many copies I need I have to decide this on the basis of dates. Date might be 1st jan to 3rd Jan OR 1st jan to 8th Jan.
Thanks.

Don't know about better but this is definatley more creative! you can use a CROSS JOIN.
EDIT: put some code in to generate a date range, you can change the date range, the rows in the #date are your multiplier.
declare #startdate datetime
, #enddate datetime
create table #data1 ([id] int , [name] nvarchar(100))
create table #dates ([date] datetime)
INSERT #data1 SELECT 42, 'South Yorkshire'
INSERT #data1 SELECT 43, 'Lancashire'
INSERT #data1 SELECT 44, 'Norfolk'
set #startdate = '1Jan2010'
set #enddate = '3Jan2010'
WHILE (#startdate <= #enddate)
BEGIN
INSERT #dates SELECT #startdate
set #startdate=#startdate+1
END
SELECT [id] , [name] from #data1 cross join #dates
drop table #data1
drop table #dates

You could always use a CTE to do the dirty work
Replace the WHERE Counter < 4 with the amount of duplicates you need.
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name VARCHAR(32))
INSERT INTO City VALUES (42, 'South Yorkshire')
INSERT INTO City VALUES (43, 'Lancashire')
INSERT INTO City VALUES (44, 'Norfolk')
/*
The CTE duplicates every row from CTE for the amount
specified by Counter
*/
;WITH CityCTE (ID, Name, Counter) AS
(
SELECT c.ID, c.Name, 0 AS Counter
FROM City c
UNION ALL
SELECT c.ID, c.Name, Counter + 1
FROM City c
INNER JOIN CityCTE cte ON cte.ID = c.ID
WHERE Counter < 4
)
SELECT ID, Name
FROM CityCTE
ORDER BY 1, 2
DROP TABLE City

This may not be the most efficient way of doing it, but it should work.
(select ....)
union all
(select ....)
union all
(select ....)

Assume the table is named CountyPopulation:
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
UNION ALL
SELECT * FROM CountyPopulation
Share and enjoy.

There is no need to use a cursor. The set-based approach would be to use a Calendar table. So first we make our calendar table which need only be done once and be somewhat permanent:
Create Table dbo.Calendar ( Date datetime not null Primary Key Clustered )
GO
; With Numbers As
(
Select ROW_NUMBER() OVER( ORDER BY S1.object_id ) As [Counter]
From sys.columns As s1
Cross Join sys.columns As s2
)
Insert dbo.Calendar([Date])
Select DateAdd(d, [Counter], '19000101')
From Numbers
Where [Counter] <= 100000
GO
I populated it with a 100K dates which goes into 2300. Obviously you can always expand it. Next we generate our test data:
Create Table dbo.Data(Id int not null, [Name] nvarchar(20) not null)
GO
Insert dbo.Data(Id, [Name]) Values(42,'South Yorkshire')
Insert dbo.Data(Id, [Name]) Values(43, 'Lancashire')
Insert dbo.Data(Id, [Name]) Values(44, 'Norfolk')
GO
Now the problem becomes trivial:
Declare #Start datetime
Declare #End datetime
Set #Start = '2010-01-01'
Set #End = '2010-01-03'
Select Dates.[Date], Id, [Name]
From dbo.Data
Cross Join (
Select [Date]
From dbo.Calendar
Where [Date] >= #Start
And [Date] <= #End
) As Dates

By far the best solution is CROSS JOIN. Most natural.
See my answer here: How to retrieve rows multiple times in SQL Server?
If you have a Numbers table lying around, it's even easier. You can DATEDIFF the dates to give you the filter on the Numbers table

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SCD Type 2 - Handling Intraday changes? - sql

Related

How to rebuild a record from a change log

Optimize this query without using not exist repeatably, is there a better way to write this query?

Efficient way of storing date ranges

Find conflicted date intervals using SQL

Need multiple copies of one resultset in sql without using loop

Categories

Resources