After days of searching the internet for an answer and trying to improve this myself I have finally decided to ask for help.
I receive a flat file each day from a client that contains about 1.1 million rows of data. I import this data into a staging database with SSIS (SQL Server 2012). This takes only a few seconds. The data is basically appointment information.
There are several fields in the flat file but the ones I have to use to synchronize the reporting table are called:
UpdateType - contains either INSERT, UPDATE or DELETE.
ChangeDate - Date time-stamp of when the row changed.
UniqueKey - UniqueKey + ChangeDate create a unique key for the row
The client requires that I INSERT, UPDATE or DELETE each row in the reporting database in ChangeDate order for each UniqueKey. I could not figure out how to do this as a set-based operation, so I created a WHILE loop. It takes over 20 hours to run, which is far too long.
Here is an example of the flat file data I receive:
UpdateType UniqueKey ChangeDate MoreDate
INSERT 27244595 2013-09-24 08:51:48.367 synchronize data follows
DELETE 27244595 2013-09-25 10:15:08.433 synchronize data follows
INSERT 27244595 2013-09-25 10:15:09.990 synchronize data follows
DELETE 27244595 2013-09-25 15:02:36.287 synchronize data follows
INSERT 27244595 2013-09-25 15:02:36.610 synchronize data follows
As you can see, the same record was inserted and then deleted several times, but this isn't always the case. In this example data, only the last record should appear in the reporting database table: 1 appointment is scheduled.
Here is another example from the same flat file:
UpdateType UniqueKey ChangeDate MoreDate
INSERT 28243572 2013-09-25 10:15:08.610 synchronize data follows
INSERT 28243572 2013-09-25 10:15:09.880 synchronize data follows
DELETE 28243572 2013-09-25 14:01:36.210 synchronize data follows
INSERT 28243572 2013-09-25 14:02:37.287 synchronize data follows
In this example the first and last records should appear in the reporting database table. There are 2 appointments scheduled. There are other times when an update is in the mix.
I don't create the reports and have no idea what they look like.
Here is the code I wrote to synchronize the reporting database from the staging database. If you have any suggestions on how to improve this process I welcome them and appreciate the help.
DECLARE --DECLARE SOME VARIABLES TO USE IN THE LOOP
    @UPDATETYPE VARCHAR(6) --THIS WILL BE INSERT, DELETE OR UPDATE
    ,@KEY INTEGER --THIS IS THE UNIQUEKEY
    ,@CHANGEDATE DATETIME --THIS IS THE CHANGE DATE FROM THE FLATFILE
--START A WHILE LOOP TO GO ROW BY ROW
WHILE (SELECT COUNT(*) FROM STAGEDB.DBO.APPIONTMENTCHANGE) > 0
BEGIN
    SELECT @UPDATETYPE = (SELECT TOP 1 [UPDATETYPE] FROM STAGEDB.DBO.APPIONTMENTCHANGE
        ORDER BY [UNIQUEKEY], [CHANGEDATE]) --GET THE UPDATE TYPE FOR THE IF STATEMENTS
    SELECT @KEY = (SELECT TOP 1 [UNIQUEKEY] FROM STAGEDB.DBO.APPIONTMENTCHANGE
        ORDER BY [UNIQUEKEY], [CHANGEDATE]) --GET THE KEY
    SELECT @CHANGEDATE = (SELECT TOP 1 [CHANGEDATE] FROM STAGEDB.DBO.APPIONTMENTCHANGE
        ORDER BY [UNIQUEKEY], [CHANGEDATE]) --GET THE CHANGEDATE
    --IF THIS ROW IS AN INSERT THEN COMPLETE THIS ON THE REPORT DATABASE
    IF @UPDATETYPE = 'INSERT'
    BEGIN
        INSERT INTO [REPORTDB].[DBO].[APPOINTMENT]
            ([REPORTDB].[DBO].[APPOINTMENT].[UNIQUEKEY]
            ,[REPORTDB].[DBO].[APPOINTMENT].[APPOINTMENT]
            ,[REPORTDB].[DBO].[APPOINTMENT].[CLIENTLEADBK]
            ,[REPORTDB].[DBO].[APPOINTMENT].[FIRSTNAME]
            ,[REPORTDB].[DBO].[APPOINTMENT].[LASTNAME]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER2]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER3]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER4]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSSTREET]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSCITY]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSSTATE]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSZIP]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSCOUNTRY])
        SELECT TOP 1 [UNIQUEKEY]
            ,[APPOINTMENT]
            ,[CLIENTLEADBK]
            ,[FIRSTNAME]
            ,[LASTNAME]
            ,[PHONENUMBER]
            ,[PHONENUMBER2]
            ,[PHONENUMBER3]
            ,[PHONENUMBER4]
            ,[ADDRESSSTREET]
            ,[ADDRESSCITY]
            ,[ADDRESSSTATE]
            ,[ADDRESSZIP]
            ,[ADDRESSCOUNTRY]
        FROM [STAGEDB].[DBO].[APPIONTMENTCHANGE]
        ORDER BY [UNIQUEKEY], [CHANGEDATE];
        --ONCE THE INSERT IS COMPLETED THEN DELETE THE ALREADY WORKED RECORD FROM THE STAGING DATABASE
        DELETE FROM [STAGEDB].[DBO].[APPIONTMENTCHANGE]
        WHERE [UNIQUEKEY] = @KEY AND [CHANGEDATE] = @CHANGEDATE;
    END
    --IF THE ROW IS A DELETE REQUEST THEN COMPLETE THIS ON THE REPORT DATABASE
    IF @UPDATETYPE = 'DELETE'
    BEGIN
        DELETE FROM [REPORTDB].[DBO].[APPOINTMENT]
        WHERE [UNIQUEKEY] = @KEY AND [CHANGEDATE] = @CHANGEDATE;
        --ONCE THE DELETE IS COMPLETED THEN DELETE THE ALREADY WORKED RECORD FROM THE STAGING DATABASE
        DELETE FROM [STAGEDB].[DBO].[APPIONTMENTCHANGE]
        WHERE [UNIQUEKEY] = @KEY AND [CHANGEDATE] = @CHANGEDATE;
    END
    --IF THE ROW IS A UPDATE REQUEST DO THAT
    IF @UPDATETYPE = 'UPDATE'
    BEGIN
        UPDATE [REPORTDB].[DBO].[APPOINTMENT]
        SET [REPORTDB].[DBO].[APPOINTMENT].[APPOINTMENT] = B.[APPOINTMENT]
            ,[REPORTDB].[DBO].[APPOINTMENT].[CLIENTLEADBK] = B.[CLIENTLEADBK]
            ,[REPORTDB].[DBO].[APPOINTMENT].[FIRSTNAME] = B.[FIRSTNAME]
            ,[REPORTDB].[DBO].[APPOINTMENT].[LASTNAME] = B.[LASTNAME]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER] = B.[PHONENUMBER]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER2] = B.[PHONENUMBER2]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER3] = B.[PHONENUMBER3]
            ,[REPORTDB].[DBO].[APPOINTMENT].[PHONENUMBER4] = B.[PHONENUMBER4]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSSTREET] = B.[ADDRESSSTREET]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSCITY] = B.[ADDRESSCITY]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSSTATE] = B.[ADDRESSSTATE]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSZIP] = B.[ADDRESSZIP]
            ,[REPORTDB].[DBO].[APPOINTMENT].[ADDRESSCOUNTRY] = B.[ADDRESSCOUNTRY]
        FROM [REPORTDB].[DBO].[APPOINTMENT]
        INNER JOIN [STAGEDB].[DBO].[APPIONTMENTCHANGE] B
            ON [REPORTDB].[DBO].[APPOINTMENT].[UNIQUEKEY] = B.[UNIQUEKEY]
            AND B.[CHANGEDATE] = @CHANGEDATE --MATCH ONLY THE STAGING ROW CURRENTLY BEING PROCESSED
        WHERE [REPORTDB].[DBO].[APPOINTMENT].[UNIQUEKEY] = @KEY;
        --ONCE THE UPDATE IS COMPLETED THEN DELETE THE ALREADY WORKED RECORD FROM THE STAGING DATABASE
        DELETE FROM [STAGEDB].[DBO].[APPIONTMENTCHANGE]
        WHERE [UNIQUEKEY] = @KEY AND [CHANGEDATE] = @CHANGEDATE;
    END
END
I would much rather receive only the rows that need to be inserted, deleted or updated, instead of every record change that occurred, but this is what I have to work with right now.
All serious ideas for improvements are appreciated. Please provide as much explanation as you can.
Use normal sql commands. First the insert
insert into realtable
(field1, field2, etc)
select field1, field2, etc
from stagingtable
where idfield in
(select idfield
from stagingtable
except
select idfield
from realtable)
updates
update r
set field1 = s.field1
, etc
from realtable r join stagingtable s on something
where whatever
deletions
delete from realtable
where idfield in
(select idfield
from stagingtable
where you want the record deleted from the real table)
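If the changes really must be replayed one row at a time in ChangeDate order, the loop from the question can still be made much cheaper without changing what it does. The following is only a sketch under assumptions that the question does not state: the staging table has no clustered index yet, and nothing else writes to it while the loop runs. The index name is hypothetical. With an index on (UNIQUEKEY, CHANGEDATE), each TOP (1) lookup should become an index seek rather than a sort of the remaining rows, and one SELECT fills all three variables instead of three.

```sql
-- Assumption: STAGEDB.DBO.APPIONTMENTCHANGE has no clustered index yet.
-- The index name below is hypothetical.
CREATE CLUSTERED INDEX IX_APPIONTMENTCHANGE_KEY_DATE
    ON STAGEDB.DBO.APPIONTMENTCHANGE ([UNIQUEKEY], [CHANGEDATE]);

DECLARE @UPDATETYPE VARCHAR(6),
        @KEY        INTEGER,
        @CHANGEDATE DATETIME;

WHILE 1 = 1
BEGIN
    -- Fetch the next change in (UNIQUEKEY, CHANGEDATE) order with one query.
    SELECT TOP (1)
           @UPDATETYPE = [UPDATETYPE],
           @KEY        = [UNIQUEKEY],
           @CHANGEDATE = [CHANGEDATE]
    FROM STAGEDB.DBO.APPIONTMENTCHANGE
    ORDER BY [UNIQUEKEY], [CHANGEDATE];

    -- No row assigned means the staging table is empty, so stop.
    IF @@ROWCOUNT = 0 BREAK;

    -- The INSERT / DELETE / UPDATE branches from the question go here,
    -- unchanged. They are left out of this sketch, so running it as written
    -- only drains the staging table without touching the reporting table.

    -- Remove the processed row from staging.
    DELETE FROM STAGEDB.DBO.APPIONTMENTCHANGE
    WHERE [UNIQUEKEY] = @KEY AND [CHANGEDATE] = @CHANGEDATE;
END
```

This keeps the row-by-row semantics, so it is a lower-risk step than the set-based statements above, but it will still be slower than a true set-based rewrite.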
I have a simple thing to do but somehow can't figure out how to do it.
I have to modify two tables (insert or update) based on the existence of a row in the first table.
There is a possibility that some other process will insert the row with id = 1
between getting the flag value and "if" statement that examines its value.
The catch is - I have to change TWO tables based on the flag value.
Question: How can I ensure the atomicity of this operation?
I could lock both tables by "select with TABLOCKX", modify them and release the lock by committing the transaction but ... won't it be overkill?
declare @flag int = 0
begin tran
select @flag = id from table1 where id = 1
if @flag = 0
begin
    insert table1(id, ...) values(1, ...)
    insert table2(id, ...) values(1, ...)
end
else
begin
    update table1 set colX = ... where id = 1
    update table2 set colX = ... where id = 1
end
commit tran
To summarize our conversation and generalize it to other cases:
If your column [id] is either a PRIMARY KEY or UNIQUE, you can put a lock on that row. No other process will be able to change the value of [id].
If not, in my opinion you have no choice but to lock the table with TABLOCKX. That prevents any other process from running an UPDATE, DELETE or INSERT against the table.
Even with that lock, another process may still be able to SELECT from the table, depending on your isolation level.
If your database has read_committed_snapshot turned on, the other process would read the "old" value of the same [id].
To check your isolation level you can run
SELECT name, is_read_committed_snapshot_on FROM sys.databases
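The row-lock option described above can be sketched with table hints. This is a minimal sketch, not a definitive implementation: it assumes [id] is the primary key (or has a unique index) on table1, and it uses a hypothetical column colX and variable @newValue in place of the column lists that the question elides. UPDLOCK combined with HOLDLOCK on the existence check keeps the key, or the gap where the key would go, locked until the transaction ends, so a second session running the same code should wait rather than insert id = 1 in between.

```sql
SET XACT_ABORT ON;  -- roll back the whole transaction on run-time errors

DECLARE @newValue INT = 42;  -- hypothetical value to store in colX

BEGIN TRAN;

-- UPDLOCK + HOLDLOCK: take and keep a lock on id = 1 (or on the key range
-- where it would be inserted) for the rest of the transaction.
IF EXISTS (SELECT 1 FROM table1 WITH (UPDLOCK, HOLDLOCK) WHERE id = 1)
BEGIN
    UPDATE table1 SET colX = @newValue WHERE id = 1;
    UPDATE table2 SET colX = @newValue WHERE id = 1;
END
ELSE
BEGIN
    INSERT table1 (id, colX) VALUES (1, @newValue);
    INSERT table2 (id, colX) VALUES (1, @newValue);
END

COMMIT TRAN;
```

Because both tables are changed inside the same transaction, either both changes are committed or neither is, and no table-wide TABLOCKX is needed as long as the index on [id] exists.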
I have a "main" table with a single primary key, from which a huge number of rows (perhaps about 200M) need to be deleted. In addition, about 30 "related" tables reference the main table, and the related rows must also be deleted from each of them. I expect that an equally huge number of rows (or more) would need to be deleted from each of the related tables.
Of course it's possible to change the condition to partition the amount of data to be deleted, and run it several times, but in any case, I need an efficient solution to do this.
John Rees suggests a way to do massive deletes in a single table in "Delete Large Number of Rows Is Very Slow - SQL Server", but the problem with that is that it performs several transactional deletes in a single table. This could potentially leave the db in an inconsistent state.
John Gibb suggests a way to delete from several related tables in "How do I delete from multiple tables using INNER JOIN in SQL server", but it does not consider the possibility that the amount of data to be deleted from each of these tables is large.
How can these two solutions be combined into an efficient way to delete a large number of rows from several related tables? (I'm new to SQL)
Perhaps it's important to note that, in the scope of this problem, each "related" table is related only to the "main" table.
I think this is what you're after...
This will delete 4000 rows from the tables with the foreign key references (assuming 1:1) before deleting the same 4000 rows from the main table.
It will loop until done, or it hits the stop time (if enabled).
DECLARE @BATCHSIZE INT, @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS
IF OBJECT_ID('TEMPDB..#TMPLIST') IS NOT NULL DROP TABLE #TMPLIST
CREATE TABLE #TMPLIST (ID BIGINT)
WHILE @BATCHSIZE>0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8),GETDATE(),108) >= @MAXRUNTIME AND @BSTOPATMAXTIME=1
    BEGIN
        RETURN
    END
    -- COLLECT THE KEYS FOR THIS BATCH
    TRUNCATE TABLE #TMPLIST
    INSERT INTO #TMPLIST (ID)
    SELECT TOP(@BATCHSIZE) ID
    FROM MAINTABLE
    WHERE X=Y -- DELETE CRITERIA HERE...
    -- WHEN NO ROWS ARE LEFT, @BATCHSIZE BECOMES 0 AND THE LOOP ENDS
    SET @BATCHSIZE=@@ROWCOUNT
    -- DELETE FROM THE RELATED TABLES FIRST...
    DELETE T1
    FROM SOMETABLE1 T1
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T1.MAINID=T.ID)
    DELETE T2
    FROM SOMETABLE2 T2
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T2.MAINID=T.ID)
    DELETE T3
    FROM SOMETABLE3 T3
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T3.MAINID=T.ID)
    -- ...THEN FROM THE MAIN TABLE
    DELETE M
    FROM MAINTABLE M
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T.ID=M.ID)
    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
END
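Assuming the default autocommit mode, each DELETE in the loop above commits on its own, so a failure part-way through a batch could leave related rows removed while the matching main rows remain. One possible way to address the consistency concern from the question is to wrap each batch in a transaction. The following is a sketch of a single batch only, using the same placeholder names as above (MAINTABLE, SOMETABLE1, MAINID, and X = Y for the delete criteria); the temp table name #BATCH is hypothetical, and THROW requires SQL Server 2012 or later.

```sql
IF OBJECT_ID('TEMPDB..#BATCH') IS NOT NULL DROP TABLE #BATCH;
CREATE TABLE #BATCH (ID BIGINT PRIMARY KEY);

-- Pick the keys for one batch (placeholder criteria, as in the loop above).
INSERT INTO #BATCH (ID)
SELECT TOP (4000) ID
FROM MAINTABLE
WHERE X = Y;

BEGIN TRY
    BEGIN TRAN;

    -- Related rows first, then the main rows, all in one transaction.
    DELETE T1
    FROM SOMETABLE1 T1
    WHERE EXISTS (SELECT 1 FROM #BATCH B WHERE B.ID = T1.MAINID);

    DELETE M
    FROM MAINTABLE M
    WHERE EXISTS (SELECT 1 FROM #BATCH B WHERE B.ID = M.ID);

    COMMIT TRAN;
END TRY
BEGIN CATCH
    -- Undo the partial batch so the tables stay consistent, then re-raise.
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;
END CATCH;
```

Keeping the batch small keeps each transaction short, which limits both lock duration and transaction log growth; the remaining related tables would be added as further DELETE statements inside the same TRY block.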
I use this SQL Server trigger to look for insert/update of multiple records from a specific table and put it into another queue table (for processing later).
ALTER TRIGGER [dbo].[IC_ProductUpdate] ON [dbo].[StockItem]
AFTER INSERT, UPDATE
AS
BEGIN
    SELECT RowNum = ROW_NUMBER() OVER(ORDER BY ItemID) , ItemID
    INTO #ProductUpdates
    FROM INSERTED;
    DECLARE @MaxRownum INT;
    SET @MaxRownum = (SELECT MAX(RowNum) FROM #ProductUpdates);
    DECLARE @Iter INT;
    SET @Iter = (SELECT MIN(RowNum) FROM #ProductUpdates);
    WHILE @Iter <= @MaxRownum
    BEGIN
        -- Get Product Id
        DECLARE @StockItemID INT = (SELECT ItemID FROM #ProductUpdates WHERE RowNum = @Iter);
        -- Proceed If This Product Is Sync-able
        IF (dbo.IC_CanSyncProduct(@StockItemID) = 1)
        BEGIN
            -- Check If There Is A [ProductUpdate] Queue Entry Already Exist For This Product
            IF ((SELECT COUNT(*) FROM IC_ProductUpdateQueue WHERE StockItemID = @StockItemID) > 0)
            BEGIN
                -- Reset [ProductUpdate] Queue Entry
                UPDATE IC_ProductUpdateQueue
                SET Synced = 0
                WHERE StockItemID = @StockItemID
            END
            ELSE
            BEGIN
                -- Insert [ProductUpdate] Queue Entry
                INSERT INTO IC_ProductUpdateQueue (StockItemID, Synced)
                VALUES (@StockItemID, 0)
            END
        END
        SET @Iter = @Iter + 1;
    END
    DROP TABLE #ProductUpdates;
END
This works fine, however I only want the above trigger to react if certain columns were updated.
The columns I am interested in are:
Name
Description
I know I can use the following T-SQL syntax to check if a column really updated (during update event) like this:
IF (UPDATE(Name) OR UPDATE(Description))
BEGIN
    -- do something...
END
But, I am not sure how to incorporate this into the above trigger, since my trigger handles multiple rows being updated at same time also.
Any ideas? At which point in the trigger could I use IF (UPDATE(colX))?
First, I would suggest having one separate trigger for each operation - one for INSERT, and another for UPDATE. This keeps the code cleaner (fewer messy IF statements and so forth).
The INSERT trigger is pretty simple, since there's nothing to check for updating - and there's absolutely no need for a temporary table and a slow WHILE loop - just two simple, set-based statements and you're done:
CREATE TRIGGER [dbo].[IC_ProductInsert] ON [dbo].[StockItem]
AFTER INSERT
AS
BEGIN
-- update the queue for those entries that already exist
-- those rows that *DO NOT* exist yet are not being touched
UPDATE puq
SET Synced = 0
FROM dbo.IC_ProductUpdateQueue puq
INNER JOIN Inserted i ON puq.StockItemID = i.StockItemID
-- for those rows that don't exist yet - insert the values
INSERT INTO dbo.IC_ProductUpdateQueue (StockItemID, Synced)
SELECT
i.StockItemID, 0
FROM
Inserted i
WHERE
NOT EXISTS (SELECT * FROM dbo.IC_ProductUpdateQueue puq
WHERE puq.StockItemID = i.StockItemID)
END
The UPDATE trigger needs one extra check - to see whether or not one of the two columns of interest has changed. This can be handled quite easily by combining the Inserted pseudo table with the new values (after the UPDATE), and the Deleted pseudo table with the "old" values (before the UPDATE):
ALTER TRIGGER [dbo].[IC_ProductUpdate] ON [dbo].[StockItem]
AFTER UPDATE
AS
BEGIN
-- update the queue for those entries that already exist
-- those rows that *DO NOT* exist yet are not being touched
UPDATE puq
SET Synced = 0
FROM dbo.IC_ProductUpdateQueue puq
INNER JOIN Inserted i ON puq.StockItemID = i.StockItemID
INNER JOIN Deleted d ON d.StockItemID = i.StockItemID
WHERE
i.Name <> d.Name OR i.Description <> d.Description
-- for those rows that don't exist yet - insert the values
INSERT INTO dbo.IC_ProductUpdateQueue (StockItemID, Synced)
SELECT
i.StockItemID, 0
FROM
Inserted i
INNER JOIN
Deleted d ON d.StockItemID = i.StockItemID
WHERE
-- parentheses are required: AND binds tighter than OR
(i.Name <> d.Name OR i.Description <> d.Description)
AND NOT EXISTS (SELECT * FROM dbo.IC_ProductUpdateQueue puq
WHERE puq.StockItemID = i.StockItemID)
END
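To connect this back to the original question about IF (UPDATE(colX)): one place it could go is at the very top of the UPDATE trigger, as a cheap early exit. The sketch below is an illustration under assumptions, not the answer's own code: the trigger name is hypothetical (it would replace the UPDATE trigger above, not run alongside it), StockItemID is assumed to be an INT key as in the answer, and Name and Description are assumed to be varchar/nvarchar columns. It also swaps the `<>` comparison for an EXISTS ... EXCEPT test, which treats a change to or from NULL as a change.

```sql
CREATE TRIGGER [dbo].[IC_ProductUpdate_Sketch] ON [dbo].[StockItem]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- UPDATE(col) is true whenever the column appears in the SET list,
    -- even if no value actually changed, so it is only a quick early exit.
    IF NOT (UPDATE(Name) OR UPDATE(Description))
        RETURN;

    -- Collect the rows whose Name or Description really changed.
    DECLARE @Changed TABLE (StockItemID INT);

    INSERT INTO @Changed (StockItemID)
    SELECT i.StockItemID
    FROM Inserted i
    INNER JOIN Deleted d ON d.StockItemID = i.StockItemID
    WHERE EXISTS (SELECT i.Name, i.Description
                  EXCEPT
                  SELECT d.Name, d.Description);  -- NULL-safe comparison

    -- Reset queue entries that already exist.
    UPDATE puq
    SET Synced = 0
    FROM dbo.IC_ProductUpdateQueue puq
    INNER JOIN @Changed c ON c.StockItemID = puq.StockItemID;

    -- Add queue entries for rows that are not queued yet.
    INSERT INTO dbo.IC_ProductUpdateQueue (StockItemID, Synced)
    SELECT c.StockItemID, 0
    FROM @Changed c
    WHERE NOT EXISTS (SELECT * FROM dbo.IC_ProductUpdateQueue puq
                      WHERE puq.StockItemID = c.StockItemID);
END
```

The per-row comparison against Deleted is still needed after the UPDATE() check, because UPDATE() says nothing about which rows changed or whether the new value differs from the old one.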
You can join to deleted and use where I.Name <> D.Name...
https://www.mssqltips.com/sqlservertip/2342/understanding-sql-server-inserted-and-deleted-tables-for-dml-triggers/
I recently ran an update query on all the rows (Approx. 25k)...
My update query was a simple update as shown here:
Update om_Challans set LockType = 'Locked', LockActionDate = getdate(), LockActionBy = 'Query'
I have not updated any other column at all.
I also have a trigger that keeps the history in 'om_challans_history' table, so any row that changes is moved to the history table.
I recently noticed that the data in about 26 rows was changed to ZERO automatically by the query. Here is a sample of what I mean:
Any idea how that is possible would be greatly appreciated.
Update
Here is the trigger on om_Challans Table
ALTER TRIGGER [dbo].[T_CreateUpdateHistory1]
on [dbo].[OM_Challans]
after update
as
set nocount on
if ((select [Status] from deleted) LIKE '%FieldData%')
BEGIN
return
END
insert into om_challans_history (
[Rec_Ltr1] ,
[challan_id] ,
[LockType],
[LockActionDate],
[LockActionBy],
[SamplesSent]
) /* Columns in OM_Challans Table */
select
i.Rec_Ltr1 ,
i.challan_id ,
i.LockType,
i.LockActionDate,
i.LockActionBy,
i.SamplesSent /* Old Values of this table */
from
OM_Challans a
inner join
deleted i
on
a.sno = i.sno /* Primary Key columns from table A */
I have two tables, one main table and one history table. They have the same schema but hold different records, apart from the unique ID.
I want to create a query that tells me which column was updated, what the before and after values were, who made the update, and when.
Please see below. Can anyone help me to get this done using SQL?
UniqueID | Field Modified   | Before Value | After Value     | Updated By | Change Date
111      | Company Name     | Exxon Mobile | ExxonMobileTest | Dev        | 1/13/2014
122      | Account Category | Focused      | Pursuit         | Jeff       | 1/13/2014
Make an audit table that mirrors the one you want to log. Every new row you insert into the audit table then signifies a change in the data. Use this trigger:
CREATE TRIGGER [dbo].[mytrigger]
ON [dbo].[mytable]
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare @ERR_TXT char(255),@ER int,@RC int
if (select COUNT(*) from inserted) <> 0 -- validate the new data
begin
-- do stuff to check new data if needed
end
if (select count(*) from inserted) = (select count(*) from deleted) -- update
begin
INSERT INTO [DB].[dbo].[myAudit]
... (all columns)
SELECT ... (all columns)
from inserted
end
if (select count(*) from inserted) = 0 and (select count(*) from deleted) <> 0 -- delete
begin
INSERT INTO [DB].[dbo].[myAudit]
... (all columns)
SELECT ... (all columns)
from deleted
end
if (select count(*) from inserted) <> 0 and (select count(*) from deleted) = 0 -- insert
begin
INSERT INTO [DB].[dbo].[myAudit]
... (all columns)
SELECT ... (all columns)
from inserted
end
END
This works well for me. You can use some of the server functions to see who did the transactions, like SUSER_ID() & GETDATE().
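The mirror-table trigger above records whole rows. To get output shaped like the table in the question (one row per changed column, with before and after values), one option is to unpivot the columns inside an UPDATE trigger. The following is only a sketch: the audit table dbo.myColumnAudit, the trigger name, and the columns UniqueID, CompanyName and AccountCategory are all hypothetical names chosen to match the sample data, and UniqueID is assumed to be a key that updates do not change.

```sql
-- Hypothetical audit table, one row per changed column.
CREATE TABLE dbo.myColumnAudit (
    UniqueID      INT           NOT NULL,
    FieldModified VARCHAR(128)  NOT NULL,
    BeforeValue   NVARCHAR(255) NULL,
    AfterValue    NVARCHAR(255) NULL,
    UpdatedBy     NVARCHAR(128) NOT NULL,
    ChangeDate    DATETIME      NOT NULL
);
GO

CREATE TRIGGER [dbo].[mytable_ColumnAudit] ON [dbo].[mytable]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.myColumnAudit
        (UniqueID, FieldModified, BeforeValue, AfterValue, UpdatedBy, ChangeDate)
    SELECT i.UniqueID, c.FieldModified, c.BeforeValue, c.AfterValue,
           SUSER_SNAME(), GETDATE()
    FROM inserted i
    INNER JOIN deleted d ON d.UniqueID = i.UniqueID
    -- Turn each audited column into its own (name, before, after) row.
    CROSS APPLY (VALUES
        ('Company Name',
            CAST(d.CompanyName AS NVARCHAR(255)),
            CAST(i.CompanyName AS NVARCHAR(255))),
        ('Account Category',
            CAST(d.AccountCategory AS NVARCHAR(255)),
            CAST(i.AccountCategory AS NVARCHAR(255)))
    ) AS c (FieldModified, BeforeValue, AfterValue)
    -- Keep only columns whose value actually changed (NULL-safe).
    WHERE EXISTS (SELECT c.BeforeValue EXCEPT SELECT c.AfterValue);
END
```

A plain SELECT from dbo.myColumnAudit then returns the requested report directly; SUSER_SNAME() is used here instead of SUSER_ID() so that the login name rather than a numeric ID is stored.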