SQL Server 2005 - writing an insert and update trigger for validation - sql

I'm pretty bad at SQL, so I need someone to check my trigger query and tell me if it solves the problem and how acceptable it is. The requirements are a bit convoluted, so please bear with me.
Suppose I have a table declared like this:
CREATE TABLE Documents
(
id int identity primary key,
number1 nvarchar(32),
date1 datetime,
number2 nvarchar(32),
date2 datetime
);
For this table, the following constraints must be observed:
At least one of the number-date pairs should be filled (both the number and the date field not null).
If both number1 and date1 are not null, a record is uniquely identified by this pair. There cannot be two records with the same number1 and date1 if both fields are not null.
If either number1 or date1 is null, a record is uniquely identified by the number2-date2 pair.
Yes, there is a problem of poor normalization, but I cannot do anything about that.
As far as I know, I cannot write unique indexes on the number-date pairs that check whether some of the values are null in SQL Server 2005. Thus, I tried validating the constraints with a trigger.
One last requirement - the trigger should have no inserts of its own, only validation checks. Here's what I came up with:
CREATE TRIGGER validate_requisite_uniqueness
ON [Documents]
FOR INSERT, UPDATE
AS
BEGIN
DECLARE @NUMBER1 NVARCHAR (32)
DECLARE @DATE1 DATETIME
DECLARE @NUMBER2 NVARCHAR (32)
DECLARE @DATE2 DATETIME
DECLARE @DATETEXT VARCHAR(10)
DECLARE inserted_cursor CURSOR FAST_FORWARD FOR SELECT number1, date1, number2, date2 FROM Inserted
IF NOT EXISTS (SELECT * FROM INSERTED)
RETURN
OPEN inserted_cursor
FETCH NEXT FROM inserted_cursor into @NUMBER1, @DATE1, @NUMBER2, @DATE2
WHILE @@FETCH_STATUS = 0
BEGIN
IF (@NUMBER1 IS NULL OR @DATE1 IS NULL)
BEGIN
IF (@NUMBER2 IS NULL OR @DATE2 IS NULL)
BEGIN
ROLLBACK TRANSACTION
RAISERROR ('Either the first or the second number-date pair should be filled.', 10, 1)
END
END
IF (@NUMBER1 IS NOT NULL AND @DATE1 IS NOT NULL)
BEGIN
IF ((SELECT COUNT(*) FROM Documents WHERE number1 = @NUMBER1 AND date1 = @DATE1) > 1)
BEGIN
ROLLBACK TRANSACTION
SET @DATETEXT = CONVERT(VARCHAR(10), @DATE1, 104)
RAISERROR ('A document with the number1 ''%s'' and date1 ''%s'' already exists.', 10, 1, @NUMBER1, @DATETEXT)
END
END
ELSE IF (@NUMBER2 IS NOT NULL AND @DATE2 IS NOT NULL) /*the DATE2 check is redundant*/
BEGIN
IF ((SELECT COUNT(*) FROM Documents WHERE number2 = @NUMBER2 AND date2 = @DATE2) > 1)
BEGIN
ROLLBACK TRANSACTION
SET @DATETEXT = CONVERT(VARCHAR(10), @DATE2, 104)
RAISERROR ('A document with the number2 ''%s'' and date2 ''%s'' already exists.', 10, 1, @NUMBER2, @DATETEXT)
END
END
FETCH NEXT FROM inserted_cursor
END
CLOSE inserted_cursor
DEALLOCATE inserted_cursor
END
Please tell me how well-written and efficient this solution is.
A couple of questions I can come up with:
Will this trigger validate correctly against existing rows and newly inserted/updated rows in case of bulk modifications? It should, because the modifications are already applied to the table in the scope of this transaction, right?
Are the constraint violations handled correctly? Meaning, was I right to use the rollback transaction and raiserror pair?
Is the "IF NOT EXISTS (SELECT * FROM INSERTED) RETURN" statement used correctly?
Is the use of COUNT to check the constraints acceptable, or should I use some other way of checking the uniqueness of number-date pairs?
Can this solution be optimized in terms of execution speed? Should I add non-unique indexes on both number-date pairs?
Thanks.
EDIT:
A solution using a check constraint and an indexed view, based on Damien_The_Unbeliever's answer:
CREATE TABLE dbo.Documents
(
id int identity primary key,
number1 nvarchar(32),
date1 datetime,
number2 nvarchar(32),
date2 datetime,
constraint CK_Documents_AtLestOneNotNull CHECK (
(number1 is not null and date1 is not null) or
(number2 is not null and date2 is not null)
)
);
go
create view dbo.UniqueDocuments
with schemabinding
as
select
CASE WHEN (number1 is not null and date1 is not null)
THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END as first_pair_filled,
CASE WHEN (number1 is not null and date1 is not null)
THEN number1
ELSE number2
END as number,
CASE WHEN (number1 is not null and date1 is not null)
THEN date1
ELSE date2
END as [date]
from
dbo.Documents
go
create unique clustered index IX_UniqueDocuments on dbo.UniqueDocuments(first_pair_filled,number,[date])
go

I would avoid the trigger, and use a check constraint and an indexed view:
CREATE TABLE dbo.Documents
(
id int identity primary key,
number1 nvarchar(32),
date1 datetime,
number2 nvarchar(32),
date2 datetime,
constraint CK_Documents_AtLestOneNotNull CHECK (
(number1 is not null and date1 is not null) or
(number2 is not null and date2 is not null)
)
);
go
create view dbo.UniqueDocuments
with schemabinding
as
select
COALESCE(number1,number2) as number,
COALESCE(date1,date2) as [date]
from
dbo.Documents
go
create unique clustered index IX_UniqueDocuments on dbo.UniqueDocuments(number,[date])
go
Which has the advantage that, although there is some "trigger-like" behaviour because of the indexed view, it's well-tested code that's already been deeply integrated into SQL Server.
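One thing worth checking with the COALESCE-based view: COALESCE picks each column independently, so a row with number1 filled but date1 NULL ends up keyed on number1 together with date2, whereas the rules in the question say such a row is identified by the number2-date2 pair as a whole. Below is a minimal sketch of the two keying rules, written in Python rather than T-SQL only so the logic can be run in isolation; the function names and the sample row are made up for illustration.

```python
from typing import Optional, Tuple

# (number1, date1, number2, date2); dates are kept as ISO strings for brevity
Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]

def coalesce_key(row: Row):
    """Key built like COALESCE(number1, number2), COALESCE(date1, date2)."""
    n1, d1, n2, d2 = row
    return (n1 if n1 is not None else n2, d1 if d1 is not None else d2)

def case_key(row: Row):
    """Key that uses the first pair only when it is complete, else the second pair."""
    n1, d1, n2, d2 = row
    if n1 is not None and d1 is not None:
        return (True, n1, d1)
    return (False, n2, d2)

# number1 is filled but date1 is not, so the second pair should identify the row.
half_filled: Row = ("A-1", None, "B-7", "2010-05-01")

print(coalesce_key(half_filled))  # mixes number1 with date2
print(case_key(half_filled))      # uses number2 and date2 together
```

The CASE-based view shown in the question's EDIT corresponds to the second function. Which rule is right depends on whether half-filled first pairs can actually occur in the data.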

I would use this logic instead (I didn't type it all as it takes ages), and definitely use SELECT 1 FROM ... in the IF EXISTS() statement as it helps performance. Also remove the cursors like marc_s said.
CREATE TRIGGER trg_validate_requisite_uniqueness
ON dbo.[Documents]
AFTER INSERT, UPDATE
AS
DECLARE @Number1 NVARCHAR(100) = (SELECT TOP 1 number1 FROM dbo.Documents ORDER BY Id DESC)
DECLARE @Date1 DATETIME = (SELECT TOP 1 date1 FROM dbo.Documents ORDER BY Id DESC)
DECLARE @Number2 NVARCHAR(100) = (SELECT TOP 1 number2 FROM dbo.Documents ORDER BY Id DESC)
DECLARE @Date2 DATETIME = (SELECT TOP 1 date2 FROM dbo.Documents ORDER BY Id DESC)
DECLARE @DateText NVARCHAR(100)
IF EXISTS (SELECT 1 FROM dbo.Documents AS D
INNER JOIN INSERTED AS I ON D.id = I.id WHERE I.Number1 IS NULL AND I.number2 IS NULL)
BEGIN
ROLLBACK TRANSACTION
RAISERROR ('Either the first or the second number pair should be filled.', 10, 1)
END
ELSE IF EXISTS (SELECT 1 FROM dbo.Documents AS D
INNER JOIN INSERTED AS I ON D.id = I.id WHERE I.Date1 IS NULL AND I.Date2 IS NULL)
BEGIN
ROLLBACK TRANSACTION
RAISERROR ('Either the first or the second date pair should be filled.', 10, 1)
END
ELSE IF EXISTS (SELECT 1 FROM dbo.Documents AS D
GROUP BY D.number1, D.date1 HAVING COUNT(*) >1
)
BEGIN
ROLLBACK TRANSACTION
SET @DateText = (SELECT CONVERT(VARCHAR(10), @Date1, 104))
RAISERROR ('Cannot have duplicate values', 10, 1, @Number1, @DateText )
END

Related

Database design with Loan and LoanLines

I have not been able to come up with a viable solution for days.
I am developing a system to maintain items and lending out these items.
Loan contains IEnumerable<LoanLine> which each points at an Item:
So far so good.
The tricky part comes to light when each item can't be lent out more than once in the same period. That period runs from LoanLine.PickedUp ?? Loan.DateFrom to LoanLine.Returned ?? Loan.DateTo. This means that if LoanLine.PickedUp is null, then Loan.DateFrom should be used to compare, and if LoanLine.Returned is null, then Loan.DateTo should be used.
An item can be picked up and returned outside the loan's boundaries, so these scenarios can occur:
It should also be possible to "go back", ie. set LoanLine.Returned to null, in which case Loan.DateTo is used to compare again. The same goes with LoanLine.PickedUp.
It should also be possible to update both Loan.DateFrom and Loan.DateTo, with the aforementioned constraints still in effect. That means that if an update to Loan causes one of its lines (one with either DateTime set to null) to overlap, the constraint should throw an error.
This is the create-script:
create table loan
(
id int primary key identity(1, 1),
datefrom date not null,
dateto date not null,
employee_id int references employee(id) not null,
recipient_id int references employee(id) null,
note nvarchar(max) not null,
constraint c_loan_chkdates check (datefrom <= dateto)
);
create table loanlineitem
(
id int primary key identity(1, 1),
loan_id int references loan(id) on delete cascade not null,
item_id int references item(id) not null,
pickedup datetime null,
returned datetime null,
constraint uq_loanlineitem unique (loan_id, item_id),
constraint c_loanlineitem_chkdates check (returned is null or pickedup <= returned)
);
And this is the constraint:
create function checkLoanLineItem(@itemId int, @loanId int, @pickedup datetime, @returned datetime)
returns bit
as
begin
declare @result bit = 0;
declare @from date = @pickedup;
declare @to date = @returned;
--If either @from or @to is null, fill the ones with null from loan-table
if (isnull(@from, @to) is null)
begin
select @from = isnull(@from, datefrom),
@to = isnull(@to, dateadd(d, 1, dateto))
from loan
where id = @loanId;
end
if not exists (select top 1 lli.id from loanlineitem lli
inner join loan l on lli.loan_id = l.id
where l.id <> @loanId
and lli.item_id = @itemId
and ((isnull(lli.pickedup, l.datefrom) >= @from and isnull(lli.pickedup, l.datefrom) < @to)
--When comparing datetime with date, the date's time is 00:00:00
--so one day is added to account for this
or (isnull(lli.returned, dateadd(d, 1, l.dateto)) >= @from and isnull(lli.returned, dateadd(d, 1, l.dateto)) < @to))
)
begin
set @result = 1;
end
return @result;
end;
go
alter table loanlineitem
add constraint c_loanlineitem_checkoverlap check (dbo.checkLoanLineItem(item_id, loan_id, pickedup, returned) = 1)
go
I could make a similar constraint on Loan-table but then I would have similar code two places, which I would prefer to avoid, if possible.
So what I'm asking is; should I rethink my schema to accomplish this, or is it possible with some constraints which I'm not familiar with?
For this we will need two things:
A way to track the status of an item with respect to a loan
Only allow one active loan at a point in time
The first item can be addressed through the data model (see below), but the second requires that all changes to the database occur through stored procedures, and those stored procedures will have to contain logic to keep the database in a consistent state. Otherwise you'll have a real mess on your hands (or rely on triggers, which is another headache).
We'll track the physical state of the item through an item status based on a timestamp, and, if desired, reservations through another mechanism based on a future date.
This query will return the current status and loan of all items, as well as the next reservation. From this you can also determine which items are past due.
SELECT
Item.ItemId
,ItemStatus.UpdateDtm
,ItemStatus.StatusCd
,ItemStatus.LoanNumber
,Loan.StartDt
,Loan.EndDt
,Reservation.StartDt
,Reservation.EndDt
FROM
Item Item
LEFT JOIN
LoanItemStatus ItemStatus
ON ItemStatus.ItemId = Item.ItemId
AND ItemStatus.UpdateDtm =
(
SELECT
MAX(UpdateDtm)
FROM
LoanItemStatus
WHERE
ItemId = Item.ItemId
)
LEFT JOIN
Loan Loan
ON Loan.LoanNumber = ItemStatus.LoanNumber
LEFT JOIN
ItemReservation Reservation
ON Reservation.ItemId = Item.ItemId
AND Reservation.StartDt =
(
SELECT
MIN(StartDt)
FROM
ItemReservation
WHERE
ItemId = Item.ItemId
AND StartDt >= GetDate()
)
It will probably make sense to harden this logic into a view.
To see if an item is reserved during a given timeframe:
SELECT
Item.ItemId
,CASE
WHEN COALESCE(PriorReservation.EndDt,GETDATE()) <= @ReservationStartDt AND @ReservationEndDt <= COALESCE(NextReservation.StartDt,'9999-12-31') THEN 'Y'
ELSE 'N'
END AS ReservationAvailableInd
FROM
Item Item
LEFT JOIN
ItemReservation PriorReservation
ON PriorReservation.ItemId = Item.ItemId
AND PriorReservation.StartDt =
(
SELECT
MAX(StartDt)
FROM
ItemReservation
WHERE
ItemId = Item.ItemId
AND StartDt <= @ReservationStartDt
)
LEFT JOIN
ItemReservation NextReservation
ON NextReservation.ItemId = Item.ItemId
AND NextReservation.StartDt =
(
SELECT
MIN(StartDt)
FROM
ItemReservation
WHERE
ItemId = Item.ItemId
AND StartDt > @ReservationStartDt
)
So you'll need to roll all of this into your stored procedures so:
When an item is loaned, it is available for the time period specified
When the loan date range is changed it does not conflict with the existing items or future reservations
When new reservations are made they do not conflict with existing reservations
State transitions make sense (Not loaned/Returned -> Awaiting pickup -> Picked Up -> Returned/Lost)
You cannot delete loans containing items that have been picked up, nor the picked-up items themselves
Alright, I found the solution, although it isn't the most elegant or DRY one.
First a view with occupations (thanks bbaird for the suggestion, which made it easier to figure out the logic):
create view vw_loanlineitem_occupations
as
select lli.id, loan_id, item_id, isnull(lli.pickedup, l.datefrom) as [from], isnull(lli.returned, dateadd(d, 1, l.dateto)) as [to] from loanlineitem lli inner join loan l on lli.loan_id = l.id
then a general check overlap udf:
create function udf_isOverlapping(@span1Start datetime, @span1End datetime, @span2Start datetime, @span2End datetime)
returns bit
as
begin
return iif((@span1Start <= @span2End and @span1End >= @span2Start), 1, 0);
end
then a udf and constraint on loan:
create function udf_isLoanValid(@loanId int, @dateFrom date, @dateTo date)
returns bit
as
begin
declare @result bit = 0;
--When type 'date' is compared to 'datetime' the time-part is 00:00:00, so add one day
set @dateTo = dateadd(d, 1, @dateTo)
if not exists (
select top 1 lli.id from loanlineitem lli
inner join loan l on lli.loan_id = l.id
--Only check items that are in this loan
where lli.item_id in (select item_id from loanlineitem where loan_id = @loanId)
--Check if this span is overlapping with other lines/loans
--When type 'date' is compared to 'datetime' the time-part is 00:00:00, so add one day
and (dbo.udf_isOverlapping(
@dateFrom,
@dateTo,
isnull(lli.pickedup, iif(l.id = @loanId, @dateFrom, l.datefrom)),
isnull(lli.returned, iif(l.id = @loanId, @dateTo, dateadd(d, 1, l.dateto)))
) = 1
)
)
begin
set @result = 1
end
return @result;
end;
go
alter table loan
add constraint c_loan_datecheck check (dbo.udf_isLoanValid(id, dateFrom, dateTo) = 1);
and a separate constraint on loanlineitem, which unfortunately repeats some of the code from loan's constraint:
create function udf_isLineValid(@itemId int, @loanId int, @pickedup datetime, @returned datetime)
returns bit
as
begin
declare @result bit = 0;
declare @from date = @pickedup;
declare @to date = @returned;
--If either @from or @to is null, fill the ones with null from loan-table
if (@from is null or @to is null)
begin
select @from = isnull(@from, datefrom),
@to = isnull(@to, dateadd(d, 1, dateto))
from loan
where id = @loanId;
end
--If no lines with overlap exists, this line is valid, so set result to 1
if not exists (
select top 1 id from vw_loanlineitem_occupations
where item_id = @itemId
and loan_id <> @loanId
and dbo.udf_isOverlapping(@from, @to, [from], [to]) = 1
)
begin
set @result = 1;
end
return @result;
end;
go
alter table loanlineitem
add constraint c_loanlineitem_checkoverlap check (dbo.udf_isLineValid(item_id, loan_id, pickedup, returned) = 1)
It works, which is the most important part. I am not sure about how performance is, but data integrity is more important.

New to SQL - Why is my Insert into trying to insert NULL into primary key?

What I want to do is insert a range of dates into multiple rows for customerID=1. I have an insert for dbo.Customer(Dates), specifying that I want to insert a record into the Dates column of my Customer table, right? I am getting this error:
Cannot insert the value NULL into column 'CustomerId', table 'dbo.Customers'
Sorry if I am way off track here. I have looked at similar threads to find out what I am missing, but I'm not piecing this together. I am thinking it wants to overwrite the existing customer ID as NULL, but I am unsure why exactly since I'm specifying dbo.Customer(Dates) and not the existing customerID for that record.
declare @date_Start datetime = '03/01/2011'
declare @date_End datetime = '10/30/2011'
declare @date datetime = @date_Start
while @date <= @date_End
begin
insert into dbo.Customer(Dates) select @date
if DATEPART(dd,@date) = 0
set @date = DATEADD(dd, -1, DATEADD(mm,1,@date))
else
set @date = DATEADD(dd,1,@date)
end
select * from dbo.Customer
The primary key is customerId, but you are not inserting a value.
My guess is that you declared it as a primary key with something like this:
customerId int primary key,
You want it to be an identity column, so the database assigns a value:
customerId int identity(1, 1) primary key
Then, you don't need to assign a value into the column when you insert a new row -- the database does it for you.
Your Customer table has a column named CustomerId which is NOT nullable, so you have to provide a value for that column as well. If your column type is int, try the code below:
declare @date_Start datetime = '03/01/2011'
declare @date_End datetime = '10/30/2011'
declare @date datetime = @date_Start
DECLARE @cusId INT
SET @cusId = 1
while @date <= @date_End
begin
insert into dbo.Customer(CustomerId, Dates) select @cusId, @date
if DATEPART(dd,@date) = 0
set @date = DATEADD(dd, -1, DATEADD(mm,1,@date))
else
set @date = DATEADD(dd,1,@date)
SET @cusId = @cusId + 1;
end
select * from dbo.Customer
Thank you for the feedback. I think I'm scrapping this and going with a separate table to JOIN. Not sure why I didn't start with that.

How can this SQL check constraint for time ranges fail, when table is empty?

I have implemented a time range validation, as a check constraint, using a function in SQL, using this guide, almost to the letter.
Creating the function first:
create function dbo.ValidateStatusPeriodInfoTimeRange
(
@btf_id VARCHAR(32),
@start_time BIGINT,
@end_time BIGINT
)
returns bit
as
begin
declare @Valid bit = 1;
if exists( select *
from dbo.StatusPeriodInfoOccurrence o
where o.btf_id = @btf_id
and @start_time <= o.end_time and o.start_time <= @end_time )
set @Valid = 0;
return @Valid;
end
And then the constraint, using the function:
alter table dbo.StatusPeriodInfoOccurrence with nocheck add constraint
CK_StatusPeriodInfoOccurrence_ValidateTimeRange
check (dbo.ValidateStatusPeriodInfoTimeRange(btf_id, start_time, end_time) = 1);
When I try to insert an element into a completely empty table, I get:
The INSERT statement conflicted with the CHECK constraint
"CK_StatusPeriodInfoOccurrence_ValidateTimeRange". The conflict occurred in
database "D600600TD01_BSM_Surveillance", table "dbo.StatusPeriodInfoOccurrence".
I tried to figure out if I did something wrong in the function itself, and created this query to check its return value:
DECLARE @ReturnValue INT
EXEC @ReturnValue = ValidateStatusPeriodInfoTimeRange
@btf_id = 'a596933eff9143bceda5fc5d269827cd',
@start_time = 2432432,
@end_time = 432432423
SELECT @ReturnValue
But this returns 1, as it should.
I am at a loss on how to continue debugging this. All parts seem to work, but the whole does not. Any ideas on how the insert statement can conflict with the check constraint?
Edit: Here is my insert statement for completion:
INSERT INTO StatusPeriodInfoOccurrence (btf_id, start_time, end_time) VALUES ('a596933eff9143bceda5fc5d269827cd',2432432,432432423);
There is an additional primary key column with identity auto increment.
CHECK constraints happen after the row is inserted, so in its current form, the constraint fails because the very row that was inserted matches the constraint. In order for this to work as a constraint (not a trigger) there must be a way to distinguish the row we're checking from all other rows. Michał's answer shows how to do this without relying on an IDENTITY, but if you do have one, explicitly excluding the row may be simpler:
create function dbo.ValidateStatusPeriodInfoTimeRange
(
@id INT,
@btf_id VARCHAR(32),
@start_time BIGINT,
@end_time BIGINT
)
returns bit
as
begin
declare @Valid bit = 1;
if exists( select *
from dbo.StatusPeriodInfoOccurrence o
where o.id <> @id AND o.btf_id = @btf_id
and @start_time <= o.end_time and o.start_time <= @end_time )
set @Valid = 0;
return @Valid;
end;
with the constraint defined as
check (dbo.ValidateStatusPeriodInfoTimeRange(id, btf_id, start_time, end_time) = 1)
Regardless of the approach, indexes on (btf_id, start_time) and (btf_id, end_time) are a good idea to keep this scalable, otherwise a full table scan is necessary on every insert.
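To make the "row sees itself" effect concrete, here is a small simulation of what the constraint function observes. It is only a sketch in Python, not how SQL Server evaluates constraints internally; the function name and the tuple layout (id, btf_id, start_time, end_time) are assumptions for illustration, and the sample values are the ones from the question's INSERT.

```python
def overlaps_existing(rows, new_row, exclude_self):
    """Mimic the EXISTS query, evaluated when new_row is already part of rows."""
    new_id, btf_id, start, end = new_row
    for row_id, other_btf, other_start, other_end in rows:
        if exclude_self and row_id == new_id:
            continue  # corresponds to "o.id <> @id"
        if other_btf == btf_id and start <= other_end and other_start <= end:
            return True
    return False

new_row = (1, "a596933eff9143bceda5fc5d269827cd", 2432432, 432432423)
table = [new_row]  # the table was empty before; the check runs with the new row present

rejected_without_id_filter = overlaps_existing(table, new_row, exclude_self=False)
rejected_with_id_filter = overlaps_existing(table, new_row, exclude_self=True)
print(rejected_without_id_filter, rejected_with_id_filter)  # True False
```

Calling the function by hand, as in the question, returns 1 because at that moment the row is not in the table yet, which is why the function and the constraint appear to disagree.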
As was mentioned in the comments, the constraint is checked after the record is inserted into the table; the statement is then committed or rolled back depending on the result of the check. In your example the check will always fail, because the query:
select *
from dbo.StatusPeriodInfoOccurrence o
where o.btf_id = @btf_id
and @start_time <= o.end_time and o.start_time <= @end_time
will always return at least one row (the one being inserted).
So, knowing that, you should check whether the query returns more than one record, so the condition in the if statement should become:
if (select count(*)
from dbo.StatusPeriodInfoOccurrence o
where o.btf_id = @btf_id
and @start_time <= o.end_time and o.start_time <= @end_time ) > 1
This solution works fine (tested on my DB).

check constraint not working as per the condition defined in sql server

I have a table aa(id int, sdate date, edate date, constraint chk check(sdate <= edate)). For a particular id I have to check for overlapping dates. That is, I do not want anyone to insert data for a particular id which has overlapping dates. So I need to check the conditions below:
if @id = id and (@sdate >= edate or @edate <= sdate) then allow insert
if @id = id and (@sdate < edate or @edate > sdate) then do not allow insert
if @id <> id then allow inserts
I have encapsulated the above logic in a function and used that function in a check constraint. The function is working fine, but the check constraint is not allowing me to enter any records. I do not know why. My function and constraint are shown below:
alter function fn_aa(@id int,@sdate date,@edate date)
returns int
as
begin
declare @i int
if exists (select * from aa where id = @id and (@sdate >= edate or @edate <= sdate)) or not exists(select * from aa where id = @id)
begin
set @i = 1
end
if exists(select * from aa where id = @id and (@sdate < edate or @edate < sdate))
begin
set @i = 0
end
return @i
end
go
alter table aa
add constraint aa_ck check(dbo.fn_aa(id,sdate,edate) = 1)
Now when I try to insert any value in the table aa I get the following error -
"Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "aa_ck". The conflict occurred in database "tempdb", table "dbo.aa".
The statement has been terminated."
The function returns 1, but the constraint does not allow the insert. Can someone help me here? I have been trying for the last 2 hours but cannot understand what I am doing wrong.
I think your logic is wrong. Two rows overlap when each one starts before the other ends:
alter function fn_aa(@id int,@sdate date,@edate date)
returns int
as
begin
if exists (select *
from aa
where id = @id and
@sdate < edate and @edate > sdate
)
begin
return 0;
end;
return 1;
end;
Your version would return 1 when either of these conditions is true: @sdate >= edate or @edate <= sdate. However, checking for an overlap depends on both end points.
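The difference between the two predicates is easy to see on a few sample ranges. The sketch below is Python rather than T-SQL so it can be run on its own; the function names and the sample dates are made up, and the second function copies the OR condition from the question's second EXISTS.

```python
from datetime import date

def overlaps(s1, e1, s2, e2):
    """Two ranges overlap when each one starts before the other ends."""
    return s1 < e2 and e1 > s2

def askers_reject_test(s1, e1, s2, e2):
    """The question's rejection test: either comparison alone is enough."""
    return s1 < e2 or e1 < s2

existing = (date(2020, 1, 1), date(2020, 1, 10))
earlier = (date(2019, 12, 1), date(2019, 12, 5))   # ends before the existing range starts
touching = (date(2020, 1, 10), date(2020, 1, 20))  # starts exactly where the existing range ends
inside = (date(2020, 1, 3), date(2020, 1, 4))      # lies within the existing range

for name, new in (("earlier", earlier), ("touching", touching), ("inside", inside)):
    print(name, overlaps(*new, *existing), askers_reject_test(*new, *existing))
```

The "earlier" range does not overlap anything, yet the OR-based test rejects it, which is the point made above about needing both end points.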

SQL query with start and end dates - what is the best option?

I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering to figure out what the best answer might be for ways to set up schema, queries, stuff.
So... I need to know the start and end dates for something and a value that is associated with an ID during that time frame. So... we can set the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define better? I filled the first table with about 100,000 rows of data and it takes about 10-12 seconds to set up in the format of the second table depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
this is what i used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int
while (@rowcount<100000)
begin
set @i = 1
set @dt = getdate()
set @numrows = Cast(((5 + 1) - 1) *
Rand() + 1 As tinyint)
while @i<=@numrows
begin
insert into #x values (@id, dateadd(d,@i,@dt), @i)
set @i = @i + 1
end
set @rowcount = @rowcount + @numrows
set @id = @id + 1
print @rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where getdate() between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
For anyone who can use the LEAD analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the first table (the one with only one date column) would be much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x
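For readers who want to see what the LEAD query computes, here is a rough equivalent in plain Python: within each groupid, rows are ordered by dt and each row's end is simply the next row's dt, or nothing for the latest row. This is only an illustration of the logic, not of how the database executes it; the function name and the sample rows are made up.

```python
from collections import defaultdict

def with_end_dates(rows):
    """rows are (groupid, dt, i); returns (groupid, start, end, latest, i) per row."""
    by_group = defaultdict(list)
    for groupid, dt, i in rows:
        by_group[groupid].append((dt, i))
    result = []
    for groupid in sorted(by_group):
        ordered = sorted(by_group[groupid])  # order by dt within the group
        for pos, (dt, i) in enumerate(ordered):
            # the next row's dt, like lead(dt) over (partition by groupid order by dt)
            end = ordered[pos + 1][0] if pos + 1 < len(ordered) else None
            result.append((groupid, dt, end, 1 if end is None else 0, i))
    return result

sample = [(1, "2011-01-01", 10), (1, "2011-01-05", 20), (2, "2011-01-03", 30)]
derived = with_end_dates(sample)
for row in derived:
    print(row)
```

With the single-date design the end date is always derivable this way, so storing it in a second column trades extra work on insert for simpler reads.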