Get ROWCOUNT for all actions performed in MERGE Statement - sql
I amazed myself with this MERGE statement. The company isn't truly doing a Type 2 Slowly Changing Dimension, but it's close. Oddly, it's not even analytical data, but let's ignore that horrendous decision. I have this working, using HashBytes to indicate changed rows. Unfortunately, to address all scenarios I ended up with an additional INSERT at the end from the temp table, which actually holds the updated rows.
Alas, it's functional, but if you have a more effective design, please do share. I would appreciate it.
However, I am attempting to get row counts not only for the INSERT from the temp table, but also for the updates AND the new INSERTs. These are all distinct, separate actions, each with its own row count, that I need to document and account for.
How can I do this, please?
DECLARE @dtNow AS DATETIME = GetDate()
DECLARE @dtPast AS DATETIME = DATEADD(day,-1,GetDate())
DECLARE @dtFuture AS DATETIME = '22991231'
SET NOCOUNT ON;
-- Temp Table is JUST Updating Rows reflecting
--Historical Marker on existing row No content change to row's columnar content data
IF OBJECT_ID('tempdb..#TheTempTableName') IS NOT NULL DROP TABLE #TheTempTableName
CREATE TABLE #TheTempTableName
(
ABunchOfColumns,
RowCreatedDate datetime NULL,
RowEffectiveDate datetime NULL,
RowTerminationDate datetime NULL,
RowIsCurrent bit NULL,
RowHash varchar(max) NULL
)
INSERT INTO #TheTempTableName
(
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
)
SELECT
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
FROM
(
MERGE tblDim WITH (HOLDLOCK) AS target
USING
(
SELECT
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
FROM dbo.tblStaging
)
AS source
ON target.PKID = source.PKID
WHEN MATCHED
AND target.RowIsCurrent = 1
AND target.RowHash != source.RowHash
------- PROCESS ONE -- UPDATE --- HISTORICALLY MARK EXISTING ROWS
THEN UPDATE SET
RowEffectiveDate = @dtPast
,RowTerminationDate = @dtPast
,RowIsCurrent = 0
----- PROCESS TWO -- INSERT ---INSERT NEW ROWS
WHEN NOT MATCHED
THEN INSERT --- THIS INSERT Goes directly into Target ( DIM ) Table (New Rows not matched with PK = PK )
(
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
)
VALUES
(
source.ABunchOfColumns
,@dtNow --source.RowCreatedDate,
,@dtFuture ---source.RowEffectiveDate,
,@dtFuture ---source.RowTerminationDate,
,1 ---source.RowIsCurrent,
,source.RowHash
)
-------PROCESS THREE a -- INSERT ---OUTPUT MATCHED ROWS FROM PROCESS ONE THAT CAUSED HISTORICAL MARK (CHANGES) "INSERT"
OUTPUT
$action Action_Out,
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
)
AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';
----------PROCESS THREE b -- INSERT FROM Temp Tbl to final
--Now we flush the data in the temp table into dim table
INSERT INTO tblDim
(
ABunchOfColumns
,RowCreatedDate
,RowEffectiveDate
,RowTerminationDate
,RowIsCurrent
,RowHash
)
SELECT
ABunchOfColumns
,@dtNow AS RowCreatedDate
,@dtFuture AS RowEffectiveDate
,@dtFuture AS RowTerminationDate
,1 AS RowIsCurrent
,RowHash
FROM #TheTempTableName
END
There are two types of deletes: (1) real deletes and (2) primary key updates.
So you can also say there are two types of inserts: (1) real inserts and (2) primary key updates.
The updates are always updates.
The dilemma, then, is deciding when an insert/delete combination is really an update.
Usually, if you don't care about that distinction, a simple merge like this is sufficient:
MERGE esqlProductTarget T
USING esqlProductSource S
ON (S.ProductID = T.ProductID)
WHEN MATCHED
THEN UPDATE
SET T.Name = S.Name,
T.ProductNumber = S.ProductNumber,
T.Color = S.Color
WHEN NOT MATCHED BY TARGET
THEN INSERT (ProductID, Name, ProductNumber, Color)
VALUES (S.ProductID, S.Name, S.ProductNumber, S.Color)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT S.ProductID, $action into #MergeLog;
SELECT MergeAction, Cnt=count(*)
FROM #MergeLog
GROUP BY MergeAction
The output will look like:
+-------------+-----+
| MergeAction | Cnt |
+-------------+-----+
| DELETE | 100 |
| UPDATE | 60 |
| INSERT | 70 |
+-------------+-----+
Refer to https://www.essentialsql.com/introduction-merge-statement/
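The snippet above assumes that #MergeLog already exists. As a minimal, self-contained sketch of the same idea (the #Target and #Source tables, their columns, and the sample rows are hypothetical and only serve as a demo), the per-action counts can be captured into variables like this:

```sql
-- Hypothetical demo tables; names and data are illustrative only.
DROP TABLE IF EXISTS #Target;
DROP TABLE IF EXISTS #Source;
DROP TABLE IF EXISTS #MergeLog;

CREATE TABLE #Target (ProductID int NOT NULL PRIMARY KEY, Name varchar(50) NOT NULL);
CREATE TABLE #Source (ProductID int NOT NULL PRIMARY KEY, Name varchar(50) NOT NULL);
-- $action is nvarchar(10); the key is NULL-able only for safety.
CREATE TABLE #MergeLog (ProductID int NULL, MergeAction nvarchar(10) NOT NULL);

INSERT INTO #Target (ProductID, Name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO #Source (ProductID, Name) VALUES (2, 'B2'), (3, 'C'), (4, 'D');

MERGE #Target AS T
USING #Source AS S
   ON S.ProductID = T.ProductID
WHEN MATCHED AND T.Name <> S.Name
    THEN UPDATE SET T.Name = S.Name
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ProductID, Name) VALUES (S.ProductID, S.Name)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
-- For DELETE rows only the deleted pseudo-table carries the key.
OUTPUT COALESCE(inserted.ProductID, deleted.ProductID), $action
  INTO #MergeLog (ProductID, MergeAction);

DECLARE @Inserted int, @Updated int, @Deleted int;

-- One pass over the log; ISNULL covers the case of an empty log.
SELECT @Inserted = ISNULL(SUM(CASE WHEN MergeAction = 'INSERT' THEN 1 ELSE 0 END), 0),
       @Updated  = ISNULL(SUM(CASE WHEN MergeAction = 'UPDATE' THEN 1 ELSE 0 END), 0),
       @Deleted  = ISNULL(SUM(CASE WHEN MergeAction = 'DELETE' THEN 1 ELSE 0 END), 0)
FROM #MergeLog;

SELECT @Inserted AS InsertedRows, @Updated AS UpdatedRows, @Deleted AS DeletedRows;
```

@@ROWCOUNT immediately after a MERGE only reports the total number of affected rows, which is why the breakdown comes from the OUTPUT log rather than from @@ROWCOUNT.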
I am not sure why you have "WHERE MERGE_OUT.Action_Out = 'UPDATE'". But if you remove that, you can get your row count, unless I have misunderstood your query.
Based on your further comments, I think the main issue is how you handle the Type 2 updates. The quick answer is that an UPDATE needs two operations (an insert and an update), and DELETEs are not really DELETEs but UPDATEs of the timestamp.
I have put together a sample query below showing how to handle Type 2 updates; the results should be self-explanatory. I tried doing both operations in the WHEN MATCHED branch of the MERGE, and interestingly it is not allowed and gives an error: "An action of type 'INSERT' is not allowed in the 'WHEN MATCHED' clause of a MERGE statement." So I think there is no choice but to split the update and the insert of an UPDATE into separate statements.
The last consideration is the DELETE that manifests as an update. The code below also shows how to determine when an UPDATE action is really a DELETE.
DROP TABLE IF EXISTS _a
CREATE TABLE _a (
id int
,val int
,fromdate datetime
,todate datetime
,isactive bit
)
INSERT INTO _a
select 1,100,'2015-Jan-1',NULL,1
UNION ALL select 2,200,'2015-Feb-1',NULL,1
UNION ALL select 3,300,'2015-Mar-1',NULL,1
DROP TABLE IF EXISTS #data
DROP TABLE IF EXISTS #outputdata
select * INTO #data from _a
select TOP 0 action=CAST('' as varchar(10)),* INTO #outputdata from _a
DELETE #data where id = 3
UPDATE #data set val = 2000 where id = 2
INSERT INTO #data
select 4,400,GETDATE(),NULL,1
--select * from #data
-- _a is your data warehouse table using type2
BEGIN TRAN
select Note='OLD STATE OF _a',* from _a
select Note='NEW SET OF DATA',* from #data
MERGE dbo._a T
USING (
select id,val from #data
) S
ON (S.id = T.id)
WHEN MATCHED
AND ((S.val <> T.val OR (S.val IS NOT NULL AND T.val IS NULL) OR (S.val IS NULL AND T.val IS NOT NULL)))
THEN UPDATE SET
todate = GETDATE()
,isactive = 0
WHEN NOT MATCHED BY TARGET
THEN INSERT (id,val,fromdate,todate,isactive)
VALUES (id,val,GETDATE(),NULL,1)
WHEN NOT MATCHED BY SOURCE --AND T.id IN (SELECT id FROM #data)
--THEN DELETE TYPE2
THEN UPDATE SET /*NO-PK*/
todate = GETDATE()
,isactive = 0
OUTPUT $action as Action
,ISNULL(inserted.id,deleted.id) as id
,ISNULL(inserted.val,deleted.val) as val
,ISNULL(inserted.fromdate,deleted.fromdate) as fromdate
,ISNULL(inserted.todate,deleted.todate) as todate
,ISNULL(inserted.isactive,deleted.isactive) as isactive
INTO #outputdata;
select Note='Logs Output',* from #outputdata
-- FIND THE NEW RECORD
INSERT INTO _a (id,val,fromdate,todate,isactive)
SELECT a.id,a.val,GETDATE()+.000001,a.todate,a.isactive
FROM #data a
INNER JOIN #outputdata b
on a.id = b.id
WHERE b.action ='UPDATE'
select Note='NEW STATE OF _a',* from _a
SELECT Note='Real Action',d1.id,action=CASE WHEN action='UPDATE' AND d2.id is null then 'DELETE' ELSE action END
FROM #outputdata d1
LEFT JOIN _a d2
on d1.action ='UPDATE' and d1.id = d2.id and d2.isactive =1
ROLLBACK TRAN
The results will be:
+-----------------+----+-----+-------------------------+--------+----------+
| Note | id | val | fromdate | todate | isactive |
+-----------------+----+-----+-------------------------+--------+----------+
| OLD STATE OF _a | 1 | 100 | 2015-01-01 00:00:00.000 | NULL | 1 |
| OLD STATE OF _a | 2 | 200 | 2015-02-01 00:00:00.000 | NULL | 1 |
| OLD STATE OF _a | 3 | 300 | 2015-03-01 00:00:00.000 | NULL | 1 |
+-----------------+----+-----+-------------------------+--------+----------+
+-----------------+----+------+-------------------------+--------+----------+
| Note | id | val | fromdate | todate | isactive |
+-----------------+----+------+-------------------------+--------+----------+
| NEW SET OF DATA | 1 | 100 | 2015-01-01 00:00:00.000 | NULL | 1 |
| NEW SET OF DATA | 2 | 2000 | 2015-02-01 00:00:00.000 | NULL | 1 |
| NEW SET OF DATA | 4 | 400 | 2019-01-31 09:49:45.943 | NULL | 1 |
+-----------------+----+------+-------------------------+--------+----------+
+-------------+--------+----+-----+-------------------------+-------------------------+----------+
| Note | action | id | val | fromdate | todate | isactive |
+-------------+--------+----+-----+-------------------------+-------------------------+----------+
| Logs Output | INSERT | 4 | 400 | 2019-01-31 09:51:13.647 | NULL | 1 |
| Logs Output | UPDATE | 2 | 200 | 2015-02-01 00:00:00.000 | 2019-01-31 09:51:13.647 | 0 |
| Logs Output | UPDATE | 3 | 300 | 2015-03-01 00:00:00.000 | 2019-01-31 09:51:13.647 | 0 |
+-------------+--------+----+-----+-------------------------+-------------------------+----------+
-- OPERATIONS 1 INSERT 1 UPDATE 1 DELETE
DELETE #data where id = 3
UPDATE #data set val = 2000 where id = 2
INSERT INTO #data
select 4,400,GETDATE(),NULL,1
+-----------------+----+------+-------------------------+-------------------------+----------+
| Note | id | val | fromdate | todate | isactive |
+-----------------+----+------+-------------------------+-------------------------+----------+
| NEW STATE OF _a | 1 | 100 | 2015-01-01 00:00:00.000 | NULL | 1 |
| NEW STATE OF _a | 2 | 200 | 2015-02-01 00:00:00.000 | 2019-01-31 09:51:13.647 | 0 |
| NEW STATE OF _a | 3 | 300 | 2015-03-01 00:00:00.000 | 2019-01-31 09:51:13.647 | 0 |
| NEW STATE OF _a | 4 | 400 | 2019-01-31 09:51:13.647 | NULL | 1 |
| NEW STATE OF _a | 2 | 2000 | 2019-01-31 09:51:13.733 | NULL | 1 |
+-----------------+----+------+-------------------------+-------------------------+----------+
+-------------+----+--------+
| Note | id | action |
+-------------+----+--------+
| Real Action | 4 | INSERT |
| Real Action | 2 | UPDATE |
| Real Action | 3 | DELETE |
+-------------+----+--------+
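Applied to the shape of the query in the question, the three row counts could be collected roughly as follows. This is only a sketch under stated assumptions: #Dim, #Staging, and #Changed are hypothetical stand-ins for tblDim, dbo.tblStaging, and the temp table, the column list is reduced to a few illustrative columns, and the date columns are omitted. The idea is to keep every action in the temp table (no WHERE filter on the nested MERGE), count by action, and read @@ROWCOUNT right after the final INSERT.

```sql
-- Hypothetical stand-ins for the dimension, staging, and temp tables.
DROP TABLE IF EXISTS #Dim;
DROP TABLE IF EXISTS #Staging;
DROP TABLE IF EXISTS #Changed;

CREATE TABLE #Dim     (PKID int NOT NULL, Val int NULL, RowIsCurrent bit NOT NULL, RowHash varchar(10) NULL);
CREATE TABLE #Staging (PKID int NOT NULL, Val int NULL, RowHash varchar(10) NULL);
CREATE TABLE #Changed (Action_Out nvarchar(10) NOT NULL, PKID int NOT NULL, Val int NULL, RowHash varchar(10) NULL);

INSERT INTO #Dim     (PKID, Val, RowIsCurrent, RowHash) VALUES (1, 10, 1, 'h1'), (2, 20, 1, 'h2');
INSERT INTO #Staging (PKID, Val, RowHash)               VALUES (1, 10, 'h1'), (2, 25, 'h2x'), (3, 30, 'h3');

-- Nested MERGE: log EVERY action instead of filtering to 'UPDATE'.
INSERT INTO #Changed (Action_Out, PKID, Val, RowHash)
SELECT Action_Out, PKID, Val, RowHash
FROM
(
    MERGE #Dim AS tgt
    USING #Staging AS src
       ON tgt.PKID = src.PKID
    WHEN MATCHED AND tgt.RowIsCurrent = 1 AND tgt.RowHash <> src.RowHash
        THEN UPDATE SET RowIsCurrent = 0            -- historically mark the old row
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (PKID, Val, RowIsCurrent, RowHash)
             VALUES (src.PKID, src.Val, 1, src.RowHash)
    OUTPUT $action, src.PKID, src.Val, src.RowHash
) AS MERGE_OUT (Action_Out, PKID, Val, RowHash);

DECLARE @MergeInserted int, @MergeUpdated int, @NewVersions int;

-- Counts for the two actions performed inside the MERGE.
SELECT @MergeInserted = ISNULL(SUM(CASE WHEN Action_Out = 'INSERT' THEN 1 ELSE 0 END), 0),
       @MergeUpdated  = ISNULL(SUM(CASE WHEN Action_Out = 'UPDATE' THEN 1 ELSE 0 END), 0)
FROM #Changed;

-- Only the changed rows get a new current version; the filter moves here.
INSERT INTO #Dim (PKID, Val, RowIsCurrent, RowHash)
SELECT PKID, Val, 1, RowHash
FROM #Changed
WHERE Action_Out = 'UPDATE';

SET @NewVersions = @@ROWCOUNT;   -- must be read immediately after the INSERT

SELECT @MergeInserted AS NewRowsInserted,
       @MergeUpdated  AS RowsHistoricallyMarked,
       @NewVersions   AS NewVersionsInserted;
```

Moving the 'UPDATE' filter from the nested MERGE to the final INSERT keeps the data flow the same while leaving the full action log available for counting.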
Related
How to add items from another table based on a string aggregated column
I have 2 tables like this.
[Table 1]:
| cust_id | tran | item |
| ------- | ---- | ------- |
| id1 | 123 | a,b,c |
| id2 | 234 | b,b |
| id3 | 345 | c,d,a,b |
[Table 2]:
| item | value |
| ---- | ----- |
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
I want to create a target value by doing a lookup from table 2 in table 1 using BigQuery.
| cust_id | tran | item | target |
| ------- | ---- | ------- | ------ |
| id1 | 123 | a,b,c | 6 |
| id2 | 234 | b,b | 4 |
| id3 | 345 | c,d,a,b | 10 |
What can I try next?
Consider the simple approach below:
select *, (
  select sum(value)
  from unnest(split(item)) item
  join table2 using (item)
) target
from table1
If applied to the sample data in your question, the output is
Try the following:
select t1.cust_id
  , t1.tran
  , t1.item
  , sum(t2.value) as target
from table_1 t1
  , UNNEST(split(t1.item, ',')) as item_unnested
LEFT JOIN table_2 t2
  on item_unnested = t2.item
group by t1.cust_id
  , t1.tran
  , t1.item
With your data it gives the following:
Create an intermediate table that splits the item column values into rows, then join that table with table2. Try the following:
--Cursor is used to split the item data row by row
--#temp is a temporary table
create table #temp (id varchar(10), trans varchar(10), item varchar(10), item1 varchar(10));
DECLARE @item varchar(10);
DECLARE @id varchar(10);
DECLARE @trans varchar(10);
DECLARE item_cusor CURSOR FOR
SELECT * FROM table1;
OPEN item_cusor
FETCH NEXT FROM item_cusor INTO @id,@trans,@item
WHILE @@FETCH_STATUS = 0
BEGIN
  insert into #temp
  SELECT @id,@trans,@item,* FROM STRING_SPLIT (@item, ',')
  FETCH NEXT FROM item_cusor INTO @id,@trans,@item
END
CLOSE item_cusor;
DEALLOCATE item_cusor;
--select * from temp
select t.id as cust_id, t.trans, t.item, sum(cast(t2.value as int)) as target
from #temp t
JOIN table2 t2 on t.item1 = t2.item
group by t.id, t.trans, t.item;
Cursors: https://www.c-sharpcorner.com/article/cursors-in-sql-server/
Temporary tables: https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/
String split function: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql
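If the data does live in SQL Server (as the cursor answer assumes, although the question asks about BigQuery), the same split-and-join can probably be done without a cursor. A minimal sketch, assuming SQL Server 2016 or later for STRING_SPLIT and using hypothetical temp tables #table1 and #table2 loaded with the sample data from the question:

```sql
-- Hypothetical temp copies of the question's tables.
DROP TABLE IF EXISTS #table1;
DROP TABLE IF EXISTS #table2;

-- [tran] is bracketed because TRAN is a reserved word in T-SQL.
CREATE TABLE #table1 (cust_id varchar(10), [tran] varchar(10), item varchar(50));
CREATE TABLE #table2 (item varchar(10), value int);

INSERT INTO #table1 (cust_id, [tran], item)
VALUES ('id1', '123', 'a,b,c'), ('id2', '234', 'b,b'), ('id3', '345', 'c,d,a,b');

INSERT INTO #table2 (item, value)
VALUES ('a', 1), ('b', 2), ('c', 3), ('d', 4);

-- CROSS APPLY produces one row per list element; duplicates such as 'b,b'
-- are kept, so each occurrence contributes to the sum.
SELECT t1.cust_id, t1.[tran], t1.item, SUM(t2.value) AS target
FROM #table1 AS t1
CROSS APPLY STRING_SPLIT(t1.item, ',') AS s
JOIN #table2 AS t2
  ON t2.item = s.value
GROUP BY t1.cust_id, t1.[tran], t1.item;
```

With the sample data this should yield targets of 6, 4, and 10, matching the expected result in the question.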
Query to revert db changes to a specific time by checking audit data
I have a module which handles some business rules in my application; there are a few tables where these rules are stored.
Original Business Rule table: br_tbl_1
br_id | col_1 | col_2
------+-------+-------
    1 | a     | myk
    2 | b     | abc
Related Tables: br_tbl_2
id | br_id | col_1
---+-------+----------------
 1 |     1 | something
 2 |     1 | something_else
 3 |     2 | Another thing
and so on...
Now, to track the changes made to the business rules, I have an audit table for each of the above tables, like so.
Business Rule Audit Table: br_tbl_1_audit
id | br_id | col_1 | col_2 | audit_dtme          | operation
---+-------+-------+-------+---------------------+-----------
 1 |     1 | a     | xyz   | 01-01-2001 12:30:10 | INSERT
 2 |     1 | a     | myk   | 02-01-2001 01:00:00 | UPDATE
 3 |     2 | b     | abc   | 02-01-2001 01:10:30 | INSERT
By looking at the data from the br_tbl_1_audit table, we can see that the value of col_2 for br_id = 1 has changed from "xyz" to "myk". Similarly, we have an audit table for the other business rules tables.
Related Table's Audit Table: br_tbl_2_audit
id | br_id | col_1          | audit_dtme          | operation
---+-------+----------------+---------------------+-----------
 1 |     1 | something      | 01-01-2001 12:30:10 | INSERT
 2 |     1 | something_else | 01-01-2001 12:30:10 | INSERT
 3 |     2 | Another thing  | 02-01-2001 01:10:30 | INSERT
I need a query which takes in a br_id and an audit_date_time and rolls back all the data for that br_id in all tables to that audit_dtme. I can do this with a script; however, I am not very good with SQL queries, so I appreciate the help. FYI: I am using Postgres, but any SQL should be enough to push me in the right direction.
In any given table, you can use distinct on:
select distinct on (a.br_id) a.*
from br_tbl_1_audit a
where a.audit_dtme <= $audit_date_time
order by a.br_id, a.audit_dtme desc;
You can also filter for one or more br_id values. You can repeat this for all the tables you care about.
If you need to replace a row, then you can use update:
update br_tbl_1 t
set col_1 = a.col_1,
    col_2 = a.col_2
from (select a.*
      from br_tbl_1_audit a
      where a.audit_dtme <= $audit_date_time and a.br_id = 1
      order by a.audit_dtme desc
      limit 1
     ) a
where t.br_id = 1;
I would say that this will be very tough to handle if you have many linked tables. The following is sample code for the case where you only have to handle one table; you can modify it as per your requirements.
declare @id int,
  @br_id int,
  @br_id_input int = 1,
  @col_1 varchar(100),
  @col_2 varchar(100),
  @audit_dtme datetime,
  @operation varchar(100),
  @audit_date_time datetime = '2001-01-01 12:30:10.000';
declare cur cursor for
  select id, br_id, col_1, col_2, audit_dtme, operation
  from br_tbl_1_audit
  where br_id = @br_id_input and audit_dtme > @audit_date_time
  order by id desc
open cur
fetch next from cur into @id, @br_id, @col_1, @col_2, @audit_dtme, @operation
while @@fetch_status = 0
begin
  if (@operation = 'INSERT')
  begin
    delete from br_tbl_1 where br_id = @br_id;
  end
  else if (@operation = 'DELETE')
  begin
    set identity_insert br_tbl_1 on;
    insert into br_tbl_1 (br_id, col_1, col_2) values (@br_id, @col_1, @col_2)
    set identity_insert br_tbl_1 off;
  end
  else
  begin
    ;with cte as (
      select top 1 *
      from br_tbl_1_audit
      where br_id = @br_id and audit_dtme < @audit_dtme
      order by id desc
    )
    update tb1
    set tb1.col_1 = cte.col_1, tb1.col_2 = cte.col_2
    from br_tbl_1 tb1
    join cte on cte.br_id = tb1.br_id
  end
  delete from br_tbl_1_audit where id = @id;
  fetch next from cur into @id, @br_id, @col_1, @col_2, @audit_dtme, @operation
end
close cur
deallocate cur
To restore the foreign key tables, you will have to add another cursor inside the main cursor that inserts, updates, or deletes rows in the foreign key tables according to the primary key table rows. A cursor may not be the best solution, though, as it may be slow if there are many tables or data rows to restore.
SQL Server better way to iterate through millions of rows
I am working with SAP Timesheet data, so there are millions of rows. What I am trying to do is select the data from the SAP table and insert it into a table on MS SQL Server. So I want to insert the original record, then if an update to the original record happens, which is in the form of a new SAP record with a refcounter, I want to find the original record in my table and update it, keeping the original counter value. So I have done this successfully with a cursor (I know not the best), but with millions of records, I am wondering if there is a faster way, because I am on day 4 of my cursor running. Is there a better way then what I have below: BEGIN CREATE TABLE CATSDB ( [COUNTER] nvarchar(12), REFCOUNTER nvarchar(12), PERNR nvarchar(8), WORKDATE nvarchar(8), CATSHOURS decimal(7, 3), APDAT nvarchar(8), LAETM nvarchar(6), CATS_STATUS nvarchar(2), APPR_STATUS nvarchar(2) ) INSERT INTO CATSDB ( [COUNTER],REFCOUNTER,PERNR,WORKDATE,CATSHOURS,APDAT,LAETM,CATS_STATUS,APPR_STATUS ) VALUES ('000421692670',NULL,'00000071','20190114','6.00','20190204','174541','30','30'), ('000421692671',NULL,'00000071','20190114','3.00','20190204','174541','30','30'), ('000421692672',NULL,'00000071','20190115','6.00','00000000','000000','60','20'), ('000421692673',NULL,'00000071','20190115','3.00','00000000','000000','60','20'), ('000421692712','000421692672','00000071','20190115','0.00','20190115','111007','30','30'), ('000421692713','000421692673','00000071','20190115','0.00','20190115','111007','30','30'), ('000429718015',NULL,'00000072','20190313','7.00','00000000','000000','60','20'), ('000429718016',NULL,'00000072','20190313','1.50','20190315','164659','30','30'), ('000429718017',NULL,'00000072','20190313','1.00','20190315','164659','30','30'), ('000430154143',NULL,'00000072','20190313','2.00','00000000','000000','60','20'), ('000430154142','000429718015','00000072','20190313','5.00','00000000','000000','60','20'), 
('000430154928','000430154142','00000072','20190313','4.50','20190315','164659','30','30'),
('000430154929','000430154143','00000072','20190313','2.50','20190315','164659','30','30'),
('000429774620',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429774619',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429802106','000429774620','00000152','20190314','2.00','00000000','000000','60','20'),
('000429802105','000429774619','00000152','20190314','3.00','00000000','000000','60','20'),
('000429840242','000429802106','00000152','20190314','4.00','20190315','143857','30','30'),
('000429840241','000429802105','00000152','20190314','5.00','20190315','143857','30','30')
CREATE TABLE [TBL_COUNTER] ( [COUNTER] [varchar](12) NOT NULL, [REFCOUNTER] [varchar](12) NULL )
CREATE TABLE TEMP ( [COUNTER] [nvarchar](12) NOT NULL, [REFCOUNTER] [nvarchar](12) NULL, [PERNR] [nvarchar](8) NULL, [WORKDATE] [nvarchar](8) NULL, [CATSHOURS] [decimal](7, 3) NULL, [APDAT] [nvarchar](8) NULL, [LAETM] [nvarchar](6) NULL, [CATS_STATUS] [nvarchar](2) NULL, [APPR_STATUS] [nvarchar](2) NULL )
END
BEGIN
  DECLARE @COUNTER nvarchar(12), @REFCOUNTER nvarchar(12), @PERNR nvarchar(8), @WORKDATE nvarchar(8), @CATSHOURS decimal(7, 3), @APDAT nvarchar(8), @LAETM nvarchar(6), @CATS_STATUS nvarchar(2), @APPR_STATUS nvarchar(2)
  DECLARE @orig_counter nvarchar(12)
END
BEGIN
  DECLARE curs CURSOR FOR
  SELECT [COUNTER], REFCOUNTER, PERNR, WORKDATE, CATSHOURS, APDAT, LAETM, CATS_STATUS, APPR_STATUS
  FROM CATSDB
END
BEGIN
  OPEN curs
END
BEGIN
  FETCH NEXT FROM curs INTO @COUNTER, @REFCOUNTER, @PERNR, @WORKDATE, @CATSHOURS, @APDAT, @LAETM, @CATS_STATUS, @APPR_STATUS
END
BEGIN
  WHILE @@FETCH_STATUS = 0
  BEGIN
    BEGIN
      IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
      BEGIN
        INSERT INTO TBL_COUNTER ([COUNTER] ,REFCOUNTER) VALUES (@COUNTER ,@REFCOUNTER)
      END
    END
    BEGIN
      IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)
      BEGIN
        --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
        IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
        BEGIN
          BEGIN
            WITH n([COUNTER], REFCOUNTER) AS
            (
              SELECT cnt.[COUNTER], cnt.REFCOUNTER
              FROM TBL_COUNTER cnt
              WHERE cnt.[COUNTER] = @REFCOUNTER
              UNION ALL
              SELECT nplus1.[COUNTER], nplus1.REFCOUNTER
              FROM TBL_COUNTER as nplus1, n
              WHERE n.[COUNTER] = nplus1.REFCOUNTER
            )
            SELECT @orig_counter = [COUNTER] FROM n WHERE REFCOUNTER = '' OR REFCOUNTER IS NULL
          END
          BEGIN
            UPDATE TEMP
            SET [REFCOUNTER] = @REFCOUNTER ,[PERNR] = @PERNR ,[WORKDATE] = @WORKDATE ,[CATSHOURS] = @CATSHOURS ,[APDAT] = @APDAT ,[LAETM] = @LAETM ,[CATS_STATUS] = @CATS_STATUS ,[APPR_STATUS] = @APPR_STATUS
            WHERE [COUNTER] = @orig_counter
          END
        END
        ELSE
        BEGIN
          INSERT INTO TEMP ([COUNTER] ,[REFCOUNTER] ,[PERNR] ,[WORKDATE] ,[CATSHOURS] ,[APDAT] ,[LAETM] ,[CATS_STATUS] ,[APPR_STATUS])
          VALUES (@COUNTER ,@REFCOUNTER ,@PERNR ,@WORKDATE ,@CATSHOURS ,@APDAT ,@LAETM ,@CATS_STATUS ,@APPR_STATUS)
        END
      END
      FETCH NEXT FROM curs INTO @COUNTER, @REFCOUNTER, @PERNR, @WORKDATE, @CATSHOURS, @APDAT, @LAETM, @CATS_STATUS, @APPR_STATUS
    END
  END
END
BEGIN
  CLOSE curs
  DEALLOCATE curs
END
I shortened it and created the tables so that you can all see what is going on.
The expected result is
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| COUNTER | REFCOUNTER | PERNR | WORKDATE | CATSHOURS | APDAT | LAETM | CATS_STATUS | APPR_STATUS |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| 000421692670 | NULL | 00000071 | 20190114 | 6.00 | 20190204 | 174541 | 30 | 30 |
| 000421692671 | NULL | 00000071 | 20190114 | 3.00 | 20190204 | 174541 | 30 | 30 |
| 000421692672 | 000421692672 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000421692673 | 000421692673 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000429718015 | 000430154142 | 00000072 | 20190313 | 4.50 | 20190315 | 164659 | 30 | 30 |
| 000429718016 | NULL | 00000072 | 20190313 | 1.50 | 20190315 | 164659 | 30 | 30 |
| 000429718017 | NULL | 00000072 | 20190313 | 1.0 | 20190315 | 164659 | 30 | 30 |
| 000430154143 | 000430154143 | 00000072 | 20190313 | 2.50 | 20190315 | 164659 | 30 | 30 |
| 000429774620 | 000429774620 | 00000152 | 20190314 | 2.00 | 00000000 | 000000 | 60 | 20 |
| 000429774619 | 000429802105 | 00000152 | 20190314 | 5.00 | 20190315 | 143857 | 30 | 30 |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
I need to add to this. There are two phases. In the first phase, I will pull all the data from 2019 for an initial load of my table. Then, on a weekly basis, I will pull the new and changed records from the origin source since the last time I ran it. So I will not have the full chain every week. There needs to be a way to get back to the original counter value without the full dataset, which is why I had the counter table. I apologize for not being more clear. I am swamped with work and haven't been able to focus on this as much as I planned. I am trying all these different techniques.
I believe, following query would help you to start with and it's much efficient way to approach you goal. It was created to maintain historical info of SQL Servers in central location, and performs following activities, you have to include/replace your table structures in respective blocks of script Creates temp table Collects the information from multiple servers using OPENQUERY via Lined Servers (source) and loads into Temp Table. Creates Indexes on Temp tables Loads the data into Central Table (destination) with 3 scenarios (as commented in script) Note: Replaced the script as per your scenario BEGIN Create Table #SrcTemp ( AENAM nvarchar(12), AUTYP nvarchar(2), AWART nvarchar(4), BELNR nvarchar(10), CATSHOURS decimal(7, 3), CATSQUANTITY decimal(18, 3), CHARGE_HOLD nvarchar(24), [COUNTER] nvarchar(12), ERNAM nvarchar(12), ERSDA nvarchar(8), ERSTM nvarchar(6), HRCOSTASG nvarchar(1), LAEDA nvarchar(8), LSTAR nvarchar(6), LTXA1 nvarchar(40), MANDT nvarchar(3), PERNR nvarchar(8), RAPLZL nvarchar(8), RAUFPL nvarchar(10), REFCOUNTER nvarchar(12), RNPLNR nvarchar(12), SKOSTL nvarchar(10), CATS_STATUS nvarchar(2), SUPP3 nvarchar(10), WORKDATE nvarchar(8), ZZOH_ORDER nvarchar(24), APDAT nvarchar(8), APNAM nvarchar(12), LAETM nvarchar(6), APPR_STATUS nvarchar(2) ); -- DECLARE #orig_counter nvarchar(12) END UPDATE #SrcTemp SET REFCOUNTER = '0' WHERE REFCOUNTER = '' or REFCOUNTER is null; CREATE Clustered Index CLU_SrvTemp on #SrcTemp ([COUNTER], REFCOUNTER); BEGIN INSERT INTO #SrcTemp SELECT AENAM,AUTYP,AWART,BELNR,CATSHOURS,CATSQUANTITY,CHARGE_HOLD,[COUNTER],ERNAM,ERSDA,ERSTM,HRCOSTASG,LAEDA,LSTAR,LTXA1,MANDT, PERNR,RAPLZL,RAUFPL,REFCOUNTER,RNPLNR,SKOSTL,CATS_STATUS,SUPP3,WORKDATE,ZZOH_ORDER,APDAT,APNAM,LAETM,APPR_STATUS FROM CATSDB; END --BEGIN -- OPEN curs --END -- Scope: UNCHANGED Records ================================================================================================================================== IF EXISTS (select * from ( SELECT ROW_NUMBER () OVER 
(PARTITION BY [COUNTER] ORDER BY COUNTER) AS RN FROM #SrcTemp WHERE REFCOUNTER = '0' ) as t where t.RN > 1 ) BEGIN RAISERROR ('Primary key violation occurred in "UNCHANGED" records processing block', 16, 1) with NOWAIT; END ELSE BEGIN -- When NON-CHANGED Records NOT Existed in SQL table ------------------------------------------- BEGIN INSERT INTO TEMP ([AENAM],[AUTYP],[AWART],[BELNR],[CATSHOURS],[CATSQUANTITY],[CHARGE_HOLD],[COUNTER],[ERNAM] ,[ERSDA],[ERSTM],[HRCOSTASG],[LAEDA],[LSTAR],[LTXA1],[MANDT],[PERNR],[RAPLZL],[RAUFPL] ,[REFCOUNTER],[RNPLNR],[SKOSTL],[CATS_STATUS],[SUPP3],[WORKDATE],[ZZOH_ORDER],[APDAT],[APNAM] ,[LAETM],[APPR_STATUS] ) SELECT s.[AENAM], s.[AUTYP], s.[AWART], s.[BELNR], s.[CATSHOURS], s.[CATSQUANTITY], s.[CHARGE_HOLD], s.[COUNTER], s.[ERNAM] , s.[ERSDA], s.[ERSTM], s.[HRCOSTASG], s.[LAEDA], s.[LSTAR], s.[LTXA1], s.[MANDT], s.[PERNR], s.[RAPLZL], s.[RAUFPL] , s.[REFCOUNTER], s.[RNPLNR], s.[SKOSTL], s.[CATS_STATUS], s.[SUPP3], s.[WORKDATE], s.[ZZOH_ORDER], s.[APDAT], s.[APNAM] , s.[LAETM], s.[APPR_STATUS] FROM #SrcTemp as S LEFT JOIN TEMP as D on s.COUNTER = d.COUNTER WHERE (S.REFCOUNTER = '0') and D.COUNTER is null ; END -- When NON-CHANGED Records Existed in SQL table ------------------------------------------- BEGIN UPDATE S SET [AENAM] = D.AENAM ,[AUTYP] = D.AUTYP ,[AWART] = D.AWART ,[BELNR] = D.BELNR ,[CATSHOURS] = D.CATSHOURS ,[CATSQUANTITY] = D.CATSQUANTITY ,[CHARGE_HOLD] = D.CHARGE_HOLD ,[ERNAM] = D.ERNAM ,[ERSDA] = D.ERSDA ,[ERSTM] = D.ERSTM ,[HRCOSTASG] = D.HRCOSTASG ,[LAEDA] = D.LAEDA ,[LSTAR] = D.LSTAR ,[LTXA1] = D.LTXA1 ,[MANDT] = D.MANDT ,[PERNR] = D.PERNR ,[RAPLZL] = D.RAPLZL ,[RAUFPL] = D.RAUFPL ,[REFCOUNTER] = D.REFCOUNTER ,[RNPLNR] = D.RNPLNR ,[SKOSTL] = D.SKOSTL ,[CATS_STATUS] = D.CATS_STATUS ,[SUPP3] = D.SUPP3 ,[WORKDATE] = D.WORKDATE ,[ZZOH_ORDER] = D.ZZOH_ORDER ,[APDAT] = D.APDAT ,[APNAM] = D.APNAM ,[LAETM] = D.LAETM ,[APPR_STATUS] = D.APPR_STATUS FROM #SrcTemp as S LEFT JOIN TEMP as D on (s.COUNTER = d.COUNTER and 
S.REFCOUNTER = D.REFCOUNTER) WHERE (S.REFCOUNTER = '0') and D.COUNTER is NOT null END END -- Scope: CHANGED Records ================================================================================================================================== IF EXISTS (select * from ( SELECT ROW_NUMBER () OVER (PARTITION BY [COUNTER], REFCOUNTER ORDER BY [COUNTER]) AS RN FROM #SrcTemp WHERE not REFCOUNTER = '0' ) as t where t.RN > 1 ) BEGIN RAISERROR ('Primary key violation occurred in "CHANGED" records processing block', 10, 1) with NOWAIT; END ELSE BEGIN -- When CHANGED Records NOT Existed in SQL table ------------------------------------------- BEGIN INSERT INTO TEMP ([AENAM],[AUTYP],[AWART],[BELNR],[CATSHOURS],[CATSQUANTITY],[CHARGE_HOLD],[COUNTER],[ERNAM] ,[ERSDA],[ERSTM],[HRCOSTASG],[LAEDA],[LSTAR],[LTXA1],[MANDT],[PERNR],[RAPLZL],[RAUFPL] ,[REFCOUNTER],[RNPLNR],[SKOSTL],[CATS_STATUS],[SUPP3],[WORKDATE],[ZZOH_ORDER],[APDAT],[APNAM] ,[LAETM],[APPR_STATUS] ) SELECT s.[AENAM], s.[AUTYP], s.[AWART], s.[BELNR], s.[CATSHOURS], s.[CATSQUANTITY], s.[CHARGE_HOLD], s.[COUNTER], s.[ERNAM] , s.[ERSDA], s.[ERSTM], s.[HRCOSTASG], s.[LAEDA], s.[LSTAR], s.[LTXA1], s.[MANDT], s.[PERNR], s.[RAPLZL], s.[RAUFPL] , s.[REFCOUNTER], s.[RNPLNR], s.[SKOSTL], s.[CATS_STATUS], s.[SUPP3], s.[WORKDATE], s.[ZZOH_ORDER], s.[APDAT], s.[APNAM] , s.[LAETM], s.[APPR_STATUS] FROM #SrcTemp as S LEFT JOIN TEMP as D on s.COUNTER = d.COUNTER and S.REFCOUNTER = D.REFCOUNTER WHERE (not S.REFCOUNTER = '0') and D.COUNTER is null END -- When NON-CHANGED Records Existed in SQL table ------------------------------------------- BEGIN UPDATE S SET [AENAM] = D.AENAM ,[AUTYP] = D.AUTYP ,[AWART] = D.AWART ,[BELNR] = D.BELNR ,[CATSHOURS] = D.CATSHOURS ,[CATSQUANTITY] = D.CATSQUANTITY ,[CHARGE_HOLD] = D.CHARGE_HOLD ,[ERNAM] = D.ERNAM ,[ERSDA] = D.ERSDA ,[ERSTM] = D.ERSTM ,[HRCOSTASG] = D.HRCOSTASG ,[LAEDA] = D.LAEDA ,[LSTAR] = D.LSTAR ,[LTXA1] = D.LTXA1 ,[MANDT] = D.MANDT ,[PERNR] = D.PERNR ,[RAPLZL] = D.RAPLZL ,[RAUFPL] = 
D.RAUFPL ,[REFCOUNTER] = D.REFCOUNTER ,[RNPLNR] = D.RNPLNR ,[SKOSTL] = D.SKOSTL ,[CATS_STATUS] = D.CATS_STATUS ,[SUPP3] = D.SUPP3 ,[WORKDATE] = D.WORKDATE ,[ZZOH_ORDER] = D.ZZOH_ORDER ,[APDAT] = D.APDAT ,[APNAM] = D.APNAM ,[LAETM] = D.LAETM ,[APPR_STATUS] = D.APPR_STATUS FROM #SrcTemp as S LEFT JOIN TEMP as D on s.COUNTER = d.COUNTER and S.REFCOUNTER = D.REFCOUNTER WHERE (not S.REFCOUNTER = '0' ) and D.COUNTER is NOT null END END Drop table #SrcTemp;
It looks like it can be done with a simple recursive query. Having suitable index is also important. Sample data This is how your sample data should look like in the question. Only few relevant columns. It would be better to include several sets/chains of changes, not just one. Having only this sample data would make it harder for you to verify if presented solutions are correct. +-----------+---------------------+-----------+------------+ | BELNR | CHARGE_HOLD | COUNTER | REFCOUNTER | +-----------+---------------------+-----------+------------+ | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | | 417549506 | T4-GS023-ABC2 | 420203329 | 420202428 | | 417553156 | JGS023001 0010#* | 420206979 | 420203329 | | 417557221 | T4-GS023-ABC2 | 420211044 | 420206979 | | 417581675 | JGS023001 0010#* | 420235498 | 420211044 | | 417677969 | JGS023001 0010#* | 420331792 | 420235498 | +-----------+---------------------+-----------+------------+ The main recursive part of the query WITH CTE AS ( SELECT 1 AS Lvl, CATSDB.BELNR AS OriginalBELNR, CATSDB.CHARGE_HOLD AS OriginalCHARGE_HOLD, CATSDB.[COUNTER] AS OriginalCOUNTER, CATSDB.REFCOUNTER AS OrginalREFCOUNTER, CATSDB.BELNR AS NewBELNR, CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD, CATSDB.[COUNTER] AS NewCOUNTER, CATSDB.REFCOUNTER AS NewREFCOUNTER FROM CATSDB WHERE REFCOUNTER IS NULL UNION ALL SELECT CTE.Lvl + 1 AS Lvl, CTE.OriginalBELNR, CTE.OriginalCHARGE_HOLD, CTE.OriginalCOUNTER, CTE.OrginalREFCOUNTER, CATSDB.BELNR AS NewBELNR, CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD, CATSDB.[COUNTER] AS NewCOUNTER, CATSDB.REFCOUNTER AS NewREFCOUNTER FROM CATSDB INNER JOIN CTE ON CATSDB.REFCOUNTER = CTE.NewCOUNTER ) SELECT * FROM CTE; Intermediate result +-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+ | Lvl | OriginalBELNR | OriginalCHARGE_HOLD | OriginalCOUNTER | OrginalREFCOUNTER | NewBELNR | NewCHARGE_HOLD | NewCOUNTER | NewREFCOUNTER | 
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+ | 1 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | | 2 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417549506 | T4-GS023-ABC2 | 420203329 | 420202428 | | 3 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417553156 | JGS023001 0010#* | 420206979 | 420203329 | | 4 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417557221 | T4-GS023-ABC2 | 420211044 | 420206979 | | 5 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417581675 | JGS023001 0010#* | 420235498 | 420211044 | | 6 | 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL | 417677969 | JGS023001 0010#* | 420331792 | 420235498 | +-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+ You can see that we've taken the starting row of the chain (where RefCounter is NULL) and carried it over the whole chain of changes. Now we just need to pick the rows with the last change, i.e. with the largest Lvl for each starting row. One way to do it is to use ROW_NUMBER function with suitable partitioning. 
Final query

    WITH CTE
    AS
    (
        SELECT
            1 AS Lvl,
            CATSDB.BELNR AS OriginalBELNR,
            CATSDB.CHARGE_HOLD AS OriginalCHARGE_HOLD,
            CATSDB.[COUNTER] AS OriginalCOUNTER,
            CATSDB.REFCOUNTER AS OrginalREFCOUNTER,
            CATSDB.BELNR AS NewBELNR,
            CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
            CATSDB.[COUNTER] AS NewCOUNTER,
            CATSDB.REFCOUNTER AS NewREFCOUNTER
        FROM CATSDB
        WHERE REFCOUNTER IS NULL

        UNION ALL

        SELECT
            CTE.Lvl + 1 AS Lvl,
            CTE.OriginalBELNR,
            CTE.OriginalCHARGE_HOLD,
            CTE.OriginalCOUNTER,
            CTE.OrginalREFCOUNTER,
            CATSDB.BELNR AS NewBELNR,
            CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
            CATSDB.[COUNTER] AS NewCOUNTER,
            CATSDB.REFCOUNTER AS NewREFCOUNTER
        FROM
            CATSDB
            INNER JOIN CTE ON CATSDB.REFCOUNTER = CTE.NewCOUNTER
    )
    ,CTE_rn
    AS
    (
        SELECT
            *
            ,ROW_NUMBER() OVER (PARTITION BY OriginalCOUNTER ORDER BY Lvl DESC) AS rn
        FROM CTE
    )
    SELECT *
    FROM CTE_rn
    WHERE rn = 1
    --OPTION (MAXRECURSION 0)
    ;

If you can have a chain longer than 100, you should add OPTION (MAXRECURSION 0) to the query, because by default SQL Server limits recursion depth to 100.

Result

+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+
| Lvl | OriginalBELNR | OriginalCHARGE_HOLD | OriginalCOUNTER | OrginalREFCOUNTER | NewBELNR  | NewCHARGE_HOLD      | NewCOUNTER | NewREFCOUNTER | rn |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+
| 6   | 417548605     | T4-GS023ABC2 0150#* | 420202428       | NULL              | 417677969 | JGS023001 0010#*    | 420331792  | 420235498     | 1  |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+

Efficiency

To make it work efficiently we need an index on the REFCOUNTER column. Also, the query assumes that REFCOUNTER is NULL, not ''. If you have a mix of NULLs and empty strings, unify your data, otherwise an index would not be useful.
This index is the minimum you need. Ideally, you should have a CLUSTERED index on the REFCOUNTER column, because the query always selects all columns from the table.

    CREATE CLUSTERED INDEX [IX_RefCounter] ON [dbo].[CATSDB]
    (
        [REFCOUNTER] ASC
    )

If you can't change the indexes of your original table, I would recommend copying all the millions of rows into a temp table and creating this clustered index on that temp table. I got a pretty good plan with this clustered index.
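The chain-following idea above can be tried out without a SQL Server instance. The following is a minimal sketch using Python's built-in sqlite3 module, so the dialect is SQLite rather than T-SQL (it needs `WITH RECURSIVE`, and window functions require SQLite 3.25 or later). The function name `latest_per_chain`, the trimmed column list, and the second chain in `ROWS` are assumptions added for illustration; only the first six rows come from the sample data.

```python
import sqlite3

ROWS = [
    # (BELNR, CHARGE_HOLD, COUNTER, REFCOUNTER) -- first six rows are the sample data
    (417548605, "T4-GS023ABC2 0150#*", 420202428, None),
    (417549506, "T4-GS023-ABC2", 420203329, 420202428),
    (417553156, "JGS023001 0010#*", 420206979, 420203329),
    (417557221, "T4-GS023-ABC2", 420211044, 420206979),
    (417581675, "JGS023001 0010#*", 420235498, 420211044),
    (417677969, "JGS023001 0010#*", 420331792, 420235498),
    # hypothetical second chain, added only to show partitioning per chain
    (500000001, "X-1", 500000001, None),
    (500000002, "X-2", 500000002, 500000001),
]

QUERY = """
WITH RECURSIVE cte(lvl, original_counter, new_belnr, new_counter) AS (
    -- anchor: the first row of every chain
    SELECT 1, counter, belnr, counter
    FROM catsdb
    WHERE refcounter IS NULL
    UNION ALL
    -- step: the row that references the current end of the chain
    SELECT cte.lvl + 1, cte.original_counter, c.belnr, c.counter
    FROM catsdb AS c
    JOIN cte ON c.refcounter = cte.new_counter
),
cte_rn AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY original_counter ORDER BY lvl DESC) AS rn
    FROM cte
)
SELECT original_counter, new_belnr, new_counter, lvl
FROM cte_rn
WHERE rn = 1
ORDER BY original_counter
"""


def latest_per_chain(rows):
    """Return (first COUNTER, last BELNR, last COUNTER, chain length) per chain."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE catsdb "
        "(belnr INTEGER, charge_hold TEXT, counter INTEGER, refcounter INTEGER)"
    )
    # the index the recursive join relies on
    con.execute("CREATE INDEX ix_refcounter ON catsdb (refcounter)")
    con.executemany("INSERT INTO catsdb VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in latest_per_chain(ROWS):
        print(row)
```

Each chain yields exactly one row, the one with the highest level, which is the same selection the `rn = 1` filter performs in the T-SQL version.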
A few things you can do to improve performance:

Convert COUNTER and REFCOUNTER from nvarchar to int; operations on int are much faster than on character data.

Do not use a cursor; you can still process one record at a time using a WHILE loop.

    DECLARE @COUNTER int = 0, @NEXT int;
    WHILE (1 = 1)
    BEGIN
        -- Find the next COUNTER above the one just processed
        -- (assumes COUNTER has been converted to int, as suggested above)
        SELECT @NEXT = MIN([COUNTER]) FROM CATSDB WHERE [COUNTER] > @COUNTER;
        -- No larger COUNTER left: break out of the loop, we are done
        IF @NEXT IS NULL BREAK;
        SET @COUNTER = @NEXT;
        /* SELECT RECORD FOR THIS @COUNTER FROM CATSDB */
        /* DO THE PROCESSING FOR THIS RECORD */
    END
There is a method called SQL bulk copy. I'm not sure it will help with your problem, but give it a try.
The most performant way to do this is through BCP: https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017. You can BCP all of the data into a staging table in SQL Server and then run your inserts and updates.

Also, when checking for non-existence of a record to determine whether this is an insert or an update, "IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)" executed once per row is very expensive. Example of a more performant way to do this (table names TBL_SOURCE, TBL_DESTINATION, #TBL_UPDATES, and #TBL_INSERTS):

    -- S.* rather than *: SELECT ... INTO needs unique column names
    SELECT S.*
    into #TBL_INSERTS
    FROM TBL_SOURCE S
    left outer join TBL_DESTINATION D on S.COUNTER = D.COUNTER
    WHERE D.Counter is null

    SELECT S.*
    into #TBL_UPDATES
    FROM TBL_SOURCE S
    left outer join TBL_DESTINATION D on S.COUNTER = D.COUNTER
    WHERE D.Counter is not null

Updates will be captured in #TBL_UPDATES and inserts in #TBL_INSERTS.
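To see the set-based split in action, here is a small sketch using Python's built-in sqlite3 module (SQLite dialect, not T-SQL). The function name `split_inserts_updates` and the two-column table layout are assumptions made for the example; the real tables have more columns, and the update set is written as an inner join, which returns the same rows as the left join filtered on `IS NOT NULL`.

```python
import sqlite3


def split_inserts_updates(source_rows, destination_rows):
    """Classify source rows as inserts or updates with two joins, no per-row checks."""
    con = sqlite3.connect(":memory:")
    # hypothetical minimal schema: a key and one payload column
    con.executescript(
        """
        CREATE TABLE tbl_source (counter INTEGER PRIMARY KEY, hours REAL);
        CREATE TABLE tbl_destination (counter INTEGER PRIMARY KEY, hours REAL);
        """
    )
    con.executemany("INSERT INTO tbl_source VALUES (?, ?)", source_rows)
    con.executemany("INSERT INTO tbl_destination VALUES (?, ?)", destination_rows)

    # rows with no match in the destination are new
    inserts = con.execute(
        """
        SELECT s.counter, s.hours
        FROM tbl_source AS s
        LEFT JOIN tbl_destination AS d ON s.counter = d.counter
        WHERE d.counter IS NULL
        ORDER BY s.counter
        """
    ).fetchall()

    # rows with a match already exist and need updating
    updates = con.execute(
        """
        SELECT s.counter, s.hours
        FROM tbl_source AS s
        JOIN tbl_destination AS d ON s.counter = d.counter
        ORDER BY s.counter
        """
    ).fetchall()
    con.close()
    return inserts, updates


if __name__ == "__main__":
    ins, upd = split_inserts_updates([(1, 8.0), (2, 4.0), (3, 7.5)], [(2, 3.0)])
    print("inserts:", ins)
    print("updates:", upd)
```

Both result sets are produced by one join each, so the cost grows with the size of the tables rather than with one lookup per incoming row.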
Based on only a few sample rows and the expected output, this script cannot be guaranteed to be 100% correct or optimized where updating millions of rows is concerned. I am confident it can be improved in that direction once the requirement is fully understood.

First of all, I wonder why the data types are nvarchar; if possible, change them to varchar, int, datetime. If you can change the data types, it will do wonders for performance. Also, there is no identity column, which should be the clustered index. These two points matter from a performance point of view.

So, in my example:

    CREATE TABLE CATSDB
    (
        id int identity,
        [COUNTER] nvarchar(12),
        REFCOUNTER nvarchar(12),
        PERNR nvarchar(8),
        WORKDATE nvarchar(8),
        CATSHOURS decimal(7, 3),
        APDAT nvarchar(8),
        LAETM nvarchar(6),
        CATS_STATUS nvarchar(2),
        APPR_STATUS nvarchar(2)
    )

    ALTER TABLE CATSDB
    ADD CONSTRAINT PK_CATSDB_ID PRIMARY KEY CLUSTERED(ID)

    CREATE NONCLUSTERED INDEX FICATSDB_REFCOUNTER ON CATSDB(REFCOUNTER,[COUNTER]);

    IF OBJECT_ID('tempdb..#TEMP', 'U') IS NOT NULL
        DROP TABLE #TEMP;

    CREATE TABLE #TEMP
    (UpdateID      INT,
     FINDID        INT PRIMARY KEY,
     [COUNTER]     [NVARCHAR](12) NOT NULL,
     [REFCOUNTER]  [NVARCHAR](12) NULL,
     [PERNR]       [NVARCHAR](8) NULL,
     [WORKDATE]    [NVARCHAR](8) NULL,
     [CATSHOURS]   [DECIMAL](7, 3) NULL,
     [APDAT]       [NVARCHAR](8) NULL,
     [LAETM]       [NVARCHAR](6) NULL,
     [CATS_STATUS] [NVARCHAR](2) NULL,
     [APPR_STATUS] [NVARCHAR](2) NULL
    );

    WITH CTE
         AS (SELECT a.id,
                    a.[COUNTER],
                    a.REFCOUNTER,
                    a.id AS Findid
             FROM dbo.CATSDB A
             UNION ALL
             SELECT b.id,
                    a.[COUNTER],
                    a.[refCOUNTER],
                    a.id
             FROM dbo.CATSDB A
                  INNER JOIN CTE b ON(a.REFCOUNTER = b.[COUNTER])
             WHERE a.id >= b.Findid),
         CTE1
         AS (SELECT id,
                    MAX(Findid) Findid
             FROM CTE
             GROUP BY id)
         INSERT INTO #TEMP
         (UpdateID, FINDID, [COUNTER], [REFCOUNTER], [PERNR], [WORKDATE],
          [CATSHOURS], [APDAT], [LAETM], [CATS_STATUS], [APPR_STATUS]
         )
                SELECT c1.ID,
                       c1.FINDID,
                       a.COUNTER,
                       a.REFCOUNTER,
                       a.PERNR,
                       a.WORKDATE,
                       a.CATSHOURS,
                       a.APDAT,
                       a.LAETM,
                       a.CATS_STATUS,
                       a.APPR_STATUS
                FROM dbo.CATSDB A
                     INNER JOIN CTE1 c1 ON a.id = c1.Findid;

    BEGIN TRY
        BEGIN TRAN;

        UPDATE A
          SET
              [REFCOUNTER] = b.REFCOUNTER,
              [PERNR] = b.PERNR,
              [WORKDATE] = b.WORKDATE,
              [CATSHOURS] = b.CATSHOURS,
              [APDAT] = b.APDAT,
              [LAETM] = b.LAETM,
              [CATS_STATUS] = b.CATS_STATUS,
              [APPR_STATUS] = b.APPR_STATUS
        FROM CATSDB A
             INNER JOIN #TEMP B ON a.id = b.UpdateID;

        -- this is only a test query
        SELECT c1.UpdateID AS UpdateID,
               a.*
        FROM dbo.CATSDB A
             INNER JOIN #TEMP c1 ON a.id = c1.Findid;

        IF(@@TRANCOUNT > 0)
            ROLLBACK; -- use COMMIT once tested
    END TRY
    BEGIN CATCH
        IF(@@TRANCOUNT > 0)
            ROLLBACK;
    END CATCH;

#Temp should be a permanent table. IMO, your table badly needs an identity column, which should be the clustered index; you can alter the table to add it. (REFCOUNTER, COUNTER) should be a nonclustered index.

Only after optimizing the query, and with a proper plan, will the above index boost performance. Proper plan: should you use recursion or RBAR and update millions of records in one go, or should you update in batches? You can first test the script against millions of rows using ROLLBACK.
Is there a way using SQL to insert duplicate rows into Table A depending upon the results of a number column in Table B?
I am using T-SQL on SQL Server and have bumped into a challenge. I am querying the data out of TableA and then inserting it into TableB. See my stored procedure code below for more info.

However, as an added layer of complexity, one of the columns in TableA holds a number (it can be any number from 0 to 50), and depending upon this number I have to make 'n' duplicates of that specific row. For example, in TableA we have a column called TableA.RepeatNumber, and this dictates how many duplicate rows of this row I need to create in TableB. It's worth noting that some of the rows won't need any duplicates, as they will have a value of 0 in TableA.RepeatNumber.

(The stored procedure below works fine to insert single rows into TableB.)

    ALTER PROCEDURE [dbo].[Insert_rows]
        @IDCode As NVarChar(20),
        @UserName As NVarChar(20)
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

        -- Insert statements for procedure here
        Insert INTO TableB (Status, Number, Date, Time, User)
        SELECT Status, Number, date, Time, User
        FROM TableA
        where Status = 1
          AND RepeatNumber > 0
          AND Code = @IDCode
          AND User = @UserName
    END

Any pointers on where I should look to find a solution to this problem (if one exists) would be greatly appreciated.

Best wishes
Dick
You can use a recursive CTE:

    with a as (
          select a.Status, a.Number, a.date, a.Time, a.User, a.RepeatNumber, 1 as seqnum
          from tablea a
          where Status = 1
            and RepeatNumber > 0
            and Code = @IDCode
            and User = @UserName
          union all
          select Status, Number, date, Time, User, RepeatNumber, seqnum + 1
          from a
          where seqnum < RepeatNumber
         )
    insert INTO TableB (Status, Number, Date, Time, User)
        select Status, Number, date, Time, User
        from a;

You only need up to 50 duplicates, so you don't have to worry about the maximum recursion depth (the SQL Server default is 100). A numbers table can also be used for this purpose.
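The expansion can be checked quickly with Python's built-in sqlite3 module. This is only a sketch in the SQLite dialect, not T-SQL: the tables are cut down to the two columns that matter, and the function name `expand_rows` and the lower-case table names are assumptions made for the example.

```python
import sqlite3


def expand_rows(rows):
    """Insert each row repeat_number times and return the copies per status."""
    con = sqlite3.connect(":memory:")
    # hypothetical two-column stand-ins for TableA and TableB
    con.execute("CREATE TABLE table_a (status TEXT, repeat_number INTEGER)")
    con.execute("CREATE TABLE table_b (status TEXT)")
    con.executemany("INSERT INTO table_a VALUES (?, ?)", rows)

    con.execute(
        """
        WITH RECURSIVE a(status, repeat_number, seqnum) AS (
            -- anchor: first copy of every row that needs at least one
            SELECT status, repeat_number, 1
            FROM table_a
            WHERE repeat_number > 0
            UNION ALL
            -- step: keep adding copies until seqnum reaches repeat_number
            SELECT status, repeat_number, seqnum + 1
            FROM a
            WHERE seqnum < repeat_number
        )
        INSERT INTO table_b (status)
        SELECT status FROM a
        """
    )

    counts = con.execute(
        "SELECT status, COUNT(*) FROM table_b GROUP BY status ORDER BY status"
    ).fetchall()
    con.close()
    return counts


if __name__ == "__main__":
    print(expand_rows([("s0", 0), ("s1", 1), ("s3", 3)]))
```

A row with a repeat number of 0 never enters the anchor, so it produces no rows in the target table, which matches the `RepeatNumber > 0` filter in the query.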
To achieve this using a numbers table, avoiding recursion (which may carry a performance penalty), you can do the following. If you already have an actual numbers table in your database, you can just join to that and skip the CTE:

    declare @TableA table(Status nvarchar(10),RepeatNumber int,[date] date,Time time,[User] nvarchar(10));
    insert into @TableA values
         ('Status 0',0,'20190101','00:00:00','User 0')
        ,('Status 1',1,'20190101','01:01:01','User 1')
        ,('Status 2',2,'20190102','02:02:02','User 2')
        ,('Status 3',3,'20190103','03:03:03','User 3');

    with t(t) as (select t from (values(1),(1),(1),(1),(1),(1),(1),(1)) as t(t))
        ,n(n) as (select top 50 row_number() over (order by (select null)) from t, t t2)
    select Status
          ,RepeatNumber
          ,[date]
          ,Time
          ,[User]
          ,n.n
    from @TableA as a
        join n
            on a.RepeatNumber >= n.n
    where RepeatNumber > 0
    order by a.Status
            ,n.n;

Output

+----------+--------------+------------+------------------+--------+---+
| Status   | RepeatNumber | date       | Time             | User   | n |
+----------+--------------+------------+------------------+--------+---+
| Status 1 | 1            | 2019-01-01 | 01:01:01.0000000 | User 1 | 1 |
| Status 2 | 2            | 2019-01-02 | 02:02:02.0000000 | User 2 | 1 |
| Status 2 | 2            | 2019-01-02 | 02:02:02.0000000 | User 2 | 2 |
| Status 3 | 3            | 2019-01-03 | 03:03:03.0000000 | User 3 | 1 |
| Status 3 | 3            | 2019-01-03 | 03:03:03.0000000 | User 3 | 2 |
| Status 3 | 3            | 2019-01-03 | 03:03:03.0000000 | User 3 | 3 |
+----------+--------------+------------+------------------+--------+---+
Multiple sql query or Cursor?
I need help with something that seems complex to me.

I made a query to create tbl1, which is the Cartesian product of the tables Item and Warehouse. It gives me back all items in all warehouses:

    SELECT i.ItemID, w.WarehouseID
    FROM Item i, Warehouse w

I made a second query (tbl2) where I look up, per item and warehouse, the date of the last document on or before a variable date (@datevar) whose quantity rule is 1 (PhysicalQtyRule = 1), obtained from the StockHistory table:

    SELECT MAX(CreateDate) AS [DATE1], ItemID, Quantity, WarehouseID
    FROM StockHistory
    WHERE PhysicalQtyRule = 1 AND CreateDate <= @datevar
    GROUP BY ItemID, Quantity, WarehouseID

Now I need three more steps:

Build a third table containing, per item and warehouse, the sum of quantity where the quantity rule is 2 (PhysicalQtyRule = 2) and the date is between tbl2.DATE1 (if it exists) and @datevar, obtained from the StockHistory table. Something like this:

    SELECT ItemID, WarehouseID, SUM(Quantity)
    FROM StockHistory
    WHERE PhysicalQtyRule = 2
      AND CreateDate > tbl2.DATE1 --If exists
      AND CreateDate <= @datevar
    GROUP BY ItemID, WarehouseID

Build a fourth table containing, per item and warehouse, the sum of quantity where the quantity rule is 3 (PhysicalQtyRule = 3) and the date is between tbl2.DATE1 (if any) and @datevar, obtained from the StockHistory table. Something like this:

    SELECT ItemID, WarehouseID, SUM(Quantity)
    FROM StockHistory
    WHERE PhysicalQtyRule = 3
      AND CreateDate > tbl2.DATE1 --If exists
      AND CreateDate <= @datevar
    GROUP BY ItemID, WarehouseID

Create a final table based on the first one, with a summed quantity column. Something like this:

    SELECT i.ItemID, w.WarehouseID, tbl2.Quantity + tbl3.Quantity - tbl4.Quantity AS [Qty]
    FROM Item i, Warehouse w

I don't know whether I need cursors (something new for me) or multiple queries, but the best possible performance is important because my StockHistory table has millions of records.

Can anyone help me, please? Thank you!

Some sample data, for only one item and one warehouse:

+--------+-------------+------------+-----------------+----------+---------+----------------+
| ItemID | WarehouseID | CreateDate | PhysicalQtyRule | Quantity | Balance | Comments       |
+--------+-------------+------------+-----------------+----------+---------+----------------+
| 1234   | 11          | 2013-03-25 | 2               | 35       | 35      | Rule 2 = In    |
| 1234   | 11          | 2013-03-28 | 3               | 30       | 5       | Rule 3 = Out   |
| 1234   | 11          | 2013-04-01 | 1               | 3        | 3       | Rule 1 = Reset |
| 1234   | 11          | 2013-07-12 | 2               | 40       | 43      | Rule 2 = In    |
| 1234   | 11          | 2013-09-05 | 3               | 20       | 23      | Rule 3 = Out   |
| 1234   | 11          | 2013-12-31 | 1               | 25       | 25      | Rule 1 = Reset |
| 1234   | 11          | 2014-01-09 | 3               | 11       | 14      | Rule 3 = Out   |
| 1234   | 11          | 2014-01-16 | 3               | 6        | 8       | Rule 3 = Out   |
+--------+-------------+------------+-----------------+----------+---------+----------------+

I want to know the balance on any variable date.
Without your data, I can't test this, but I believe this should be your solution.

    SELECT i.ItemID
          ,w.WarehouseID
          ,[Qty] = ISNULL(tbl2.Quantity, 0) + ISNULL(tbl3.Quantity, 0) - ISNULL(tbl4.Quantity, 0)
    FROM Item i
    CROSS JOIN Warehouse w
    OUTER APPLY (
        -- latest Rule 1 (reset) row on or before @datevar;
        -- TOP (1) keeps one row per item and warehouse
        SELECT TOP (1)
               [DATE1] = sh.CreateDate
              ,sh.Quantity
        FROM StockHistory sh
        WHERE sh.PhysicalQtyRule = 1
          AND sh.CreateDate <= @datevar
          AND i.ItemID = sh.ItemID
          AND w.WarehouseID = sh.WarehouseID
        ORDER BY sh.CreateDate DESC
    ) tbl2
    OUTER APPLY (
        SELECT sh.ItemID
              ,sh.WarehouseID
              ,[Quantity] = SUM(sh.Quantity)
        FROM StockHistory sh
        WHERE sh.PhysicalQtyRule = 2
          AND (tbl2.DATE1 IS NULL OR sh.CreateDate > tbl2.DATE1) --If exists
          AND sh.CreateDate <= @datevar
          AND i.ItemID = sh.ItemID
          AND w.WarehouseID = sh.WarehouseID
        GROUP BY sh.ItemID, sh.WarehouseID
    ) tbl3
    OUTER APPLY (
        SELECT sh.ItemID
              ,sh.WarehouseID
              ,[Quantity] = SUM(sh.Quantity)
        FROM StockHistory sh
        WHERE sh.PhysicalQtyRule = 3
          AND (tbl2.DATE1 IS NULL OR sh.CreateDate > tbl2.DATE1) --If exists
          AND sh.CreateDate <= @datevar
          AND i.ItemID = sh.ItemID
          AND w.WarehouseID = sh.WarehouseID
        GROUP BY sh.ItemID, sh.WarehouseID
    ) tbl4
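The balance rule from the question (last Rule 1 quantity, plus Rule 2 after it, minus Rule 3 after it) can be checked against the sample rows with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not a drop-in replacement: SQLite has no OUTER APPLY, so it swaps in a single-pass conditional aggregation, it skips the Item and Warehouse cross join (pairs with no history simply do not appear), and it assumes at most one Rule 1 row per item, warehouse and date. The names `HISTORY`, `BALANCE_SQL` and `balance_on` are hypothetical.

```python
import sqlite3

HISTORY = [
    # (ItemID, WarehouseID, CreateDate, PhysicalQtyRule, Quantity) from the question
    (1234, 11, "2013-03-25", 2, 35),
    (1234, 11, "2013-03-28", 3, 30),
    (1234, 11, "2013-04-01", 1, 3),
    (1234, 11, "2013-07-12", 2, 40),
    (1234, 11, "2013-09-05", 3, 20),
    (1234, 11, "2013-12-31", 1, 25),
    (1234, 11, "2014-01-09", 3, 11),
    (1234, 11, "2014-01-16", 3, 6),
]

BALANCE_SQL = """
WITH last_reset AS (
    -- date of the latest Rule 1 (reset) row on or before the target date
    SELECT ItemID, WarehouseID, MAX(CreateDate) AS Date1
    FROM StockHistory
    WHERE PhysicalQtyRule = 1 AND CreateDate <= :d
    GROUP BY ItemID, WarehouseID
)
SELECT sh.ItemID, sh.WarehouseID,
       SUM(CASE
             WHEN sh.PhysicalQtyRule = 1 AND sh.CreateDate = r.Date1
                  THEN sh.Quantity
             WHEN sh.PhysicalQtyRule = 2
                  AND (r.Date1 IS NULL OR sh.CreateDate > r.Date1)
                  THEN sh.Quantity
             WHEN sh.PhysicalQtyRule = 3
                  AND (r.Date1 IS NULL OR sh.CreateDate > r.Date1)
                  THEN -sh.Quantity
             ELSE 0
           END) AS Qty
FROM StockHistory AS sh
LEFT JOIN last_reset AS r
       ON r.ItemID = sh.ItemID AND r.WarehouseID = sh.WarehouseID
WHERE sh.CreateDate <= :d
GROUP BY sh.ItemID, sh.WarehouseID
ORDER BY sh.ItemID, sh.WarehouseID
"""


def balance_on(date, history=HISTORY):
    """Return (ItemID, WarehouseID, Qty) rows for the balance on an ISO date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE StockHistory (ItemID INTEGER, WarehouseID INTEGER, "
        "CreateDate TEXT, PhysicalQtyRule INTEGER, Quantity INTEGER)"
    )
    con.executemany("INSERT INTO StockHistory VALUES (?, ?, ?, ?, ?)", history)
    rows = con.execute(BALANCE_SQL, {"d": date}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for day in ("2013-03-26", "2013-09-05", "2014-01-16"):
        print(day, balance_on(day))
```

The three dates in the usage lines correspond to Balance values 35, 23 and 8 in the sample table, covering the case with no reset yet, a reset followed by an In and an Out, and a reset followed only by Outs.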