I have a SQL Server database with some audit records showing changes to a third party database (OpenEdge). I have no control over the structure of the audit data, nor the way the third party database audits data changes. So I'm left with, for example, the following data...
If you follow the first five rows you can see they all belong to TransId 1532102 (represents a database transaction) where the TransSeq represents a database action within a single transaction.
In the columns prefix New the audit changes are visible. If the value is NULL then no change to that field took place.
Looking at the data you can see that where TransId = 1532102 the PrimaryIdentifier is changed from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4) and finally from 4 to 5 (row 5). You might also notice that when the PrimaryIdentifier changes from 3 to 4 the SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4).
So these multiple changes are actually only occurring on a single source record. So with this in mind rows 1, 3, 4 & 5 can all be condensed into a single row (see below)
Ultimately there are only two record changes in TransId 1532102..
I need to translate these changes into a single UPDATE statement on a target database. In order to do this I need to ensure I have a single record showing the before and after values.
So given the source data presented here I need to produce the following data set..
What query structures could I use to achieve this? I was thinking recursive CTEs or perhaps using Hierarchical structures?
Ultimately I need this to perform as well as possible so I wanted to pose the question here in case I hadn't considered all possible approaches.
Thoughts welcome and here's a script for the sample data
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', NULL, 'ghfi', NULL, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
SELECT *
FROM #TestTable
EDIT:
I've actually been unable to write any queries that successfully track the identifier changes. Can anyone help - I need a query that tracks the changes in PrimaryIdentifier values and ultimately provides a single record for each tracking with start values and end values.
EDIT 2:
There's been a deleted answer that suggests the update to the key identifiers is not possible when condensed and that I should step through the changes instead. I thought it would be valuable to add my comments for further info to the question..
I need to condense the dataset because of the volume of audit records being generated; most of which are unecessary because of the way the source DBMS makes its changes. I need to reduce the dataset and I need to track key identifier changes. The update should be possible without clashing on id change during the update statement - see this example.
I assume that
1) (PrimaryIdentifier, SecondaryIdentifier) is a PK of the target table,
2) Every transacton in the audit table leaves target table in a consistent state.
So the update of the PK in a single statement for every transaction using case will run OK:
declare #t table (id int primary key, old int);
insert #t(id, old) values (4,4),(5,5);
update #t set id = case id
when 4 then 5
when 5 then 4 end;
select * from #t;
The plan is
1. Condense transactions
2. Generate update sql into temp table. Then you can run all or selected items from the temp table. Every item is of the form
UPDATE myTable SET
PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,
SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END ,
Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Level END ,
Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Value END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd')
OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')
The query
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM #TestTable t
WHERE NOT EXISTS (SELECT 1
FROM #TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
t.NewPrimaryIdentifier,
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(root.NewLevel, t.NewLevel), coalesce(root.NewValue, t.NewValue)
FROM root
JOIN #TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END, SecondaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ''' + NewSecondaryIdentifier + ''''
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Level= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel as varchar(20)) END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Value= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END'
+ ' WHERE 1=2'
+ (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
INTO #UpdSql
FROM condensed c1
GROUP BY SyncId, TransId
SELECT *
FROM #UpdSql
ORDER BY SyncId, TransId
EDIT
Taking into account NewPrimaryIdentifier can be NULL too. See added row at #TestTable. Sql generation skipped.
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', null, 'ghfi', null, NULL -- added
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
coalesce(NewPrimaryIdentifier, PrimaryIdentifier) AS NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM #TestTable t
WHERE NOT EXISTS (SELECT 1
FROM #TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = coalesce(t2.NewPrimaryIdentifier, t2.PrimaryIdentifier)
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
coalesce(t.NewPrimaryIdentifier, root.NewPrimaryIdentifier),
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue)
FROM root
JOIN #TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
SELECT *
FROM condensed
ORDER BY SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
Here is a second stab at producing the originally asked for output. This time using a bunch of CTE:s.
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;with baseCTE as (
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from #TestTable
),
syncTransEntryPointsCte as (
select *
from baseCTE b
where not exists(
select *
from baseCTE subb
where b.SyncId = subb.SyncId
and b.TransId = subb.TransId
and b.PrimaryIdentifier = subb.NewPrimaryIdentifier
and b.SecondaryIdentifier = subb.NewSecondaryIdentifier
and b.TransSeq > subb.TransSeq
)
)
, recursiveBaseCte as (
select *, 0 as lev, TransSeq as OrigTransSec from syncTransEntryPointsCte
union all
select
c.SyncId, c.TransId, c.TransSeq, p.PrimaryIdentifier, p.SecondaryIdentifier, c.NewPrimaryIdentifier, c.NewSecondaryIdentifier, isnull(c.NewLevel, p.NewLevel), isnull(c.NewValue, p.NewValue),
p.lev + 1,
p.OrigTransSec
from baseCTE c
join recursiveBaseCte as p on (
c.SyncId = p.SyncId and c.TransId = p.TransId and c.PrimaryIdentifier = p.NewPrimaryIdentifier and c.SecondaryIdentifier = p.NewSecondaryIdentifier and c.TransSeq > p.TransSeq
)
)
select r.SyncId, r.TransId, r.OrigTransSec as TransSec,
r.PrimaryIdentifier, r.SecondaryIdentifier,
nullif(r.NewPrimaryIdentifier, r.PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(r.NewSecondaryIdentifier, r.SecondaryIdentifier) as NewSecondaryIdentifier,
r.NewLevel, r.NewValue
from recursiveBaseCte r
join (
select SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, max(lev) as mlev
from recursiveBaseCte
group by SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier
) as selectForOutput on
r.SyncId = selectForOutput.SyncId
and r.TransId = selectForOutput.TransId
and r.PrimaryIdentifier = selectForOutput.PrimaryIdentifier
and r.SecondaryIdentifier = selectForOutput.SecondaryIdentifier
and r.lev = selectForOutput.mlev
order by 1,2,3
Whether or not the CTE approach is any faster than the cursor based one is difficult to guess. I do suggest you test run this at a suitable time when the server in question is not under heavy load.
Update
The script first declares the baseCTE which is used just to make sure that we have values in NewPrimaryIdentifier and NewSecondaryIdentifier for each row, even if one or both of them were not changed in the update. This makes everything after that easier since we can then join to the next row for the same combination within a specific transaction.
The syncTransEntryPointCte in turn uses baseCTE to find all rows within one transaction that were not preceded by another row within the same transaction.
recursiveBaseCte then uses both of the previous CTE:s to recursively find rows and aggregate changes. The final query then uses it to produce the final output.
The output should be usable for updating a stale copy of the source table if you can manage to do the updates for one condensed transaction in one update statement. If, as I originally assumed, you try to build one update statement for each row in the condensed audit output, it will not work.
Finally, obligatory disclaimer: This seems to work with the test data you gave in the question. I can give no guarantees that it works for the real thing, so use with caution.
Here is a first stab at getting the desired output. It's using a CURSOR, so don't expect great performance.
set nocount on
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
DECLARE #OutputTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
--SELECT * FROM #TestTable
declare #cSyncId int, #cTransId int, #cTransSeq int, #cPrimaryId int, #cSecondaryId nchar(4), #cNewPrimaryId int, #cNewSecondary nchar(4), #cNewLevel int, #cNewValue nvarchar(20)
declare #newTransSeq int, #prevSyncId int, #prevTransId int
set #newTransSeq = 0
set #prevSyncId = 0
set #prevTransId = 0
declare auditCursor CURSOR for
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from #TestTable
order by SyncId, TransId, TransSeq
open auditCursor
fetch next from auditCursor into #cSyncId, #cTransId, #cTransSeq, #cPrimaryId, #cSecondaryId, #cNewPrimaryId, #cNewSecondary, #cNewLevel, #cNewValue
while ##FETCH_STATUS = 0
begin
if #prevSyncId != #cSyncId or #prevTransId != #cTransId
begin
set #newTransSeq = 0
set #prevSyncId = #cSyncId
set #prevTransId = #cTransId
end
if(not exists(select * from #OutputTable where SyncId = #cSyncId and TransId = #cTransId and NewPrimaryIdentifier = #cPrimaryId and NewSecondaryIdentifier = #cSecondaryId))
begin
insert into #OutputTable values(#cSyncId, #cTransId, #newTransSeq, #cPrimaryId, #cSecondaryId, #cNewPrimaryId, #cNewSecondary, #cNewLevel, #cNewValue)
set #newTransSeq = #newTransSeq + 1
end
else
begin
update #OutputTable
set NewPrimaryIdentifier = isnull(#cNewPrimaryId, NewPrimaryIdentifier),
NewSecondaryIdentifier = isnull(#cNewSecondary, NewSecondaryIdentifier),
NewLevel = isnull(#cNewLevel, NewLevel),
NewValue = isnull(#cNewValue, NewValue)
where SyncId = #cSyncId
and TransId = #cTransId
and NewPrimaryIdentifier = #cPrimaryId
and NewSecondaryIdentifier = #cSecondaryId
end
fetch next from auditCursor into #cSyncId, #cTransId, #cTransSeq, #cPrimaryId, #cSecondaryId, #cNewPrimaryId, #cNewSecondary, #cNewLevel, #cNewValue
end
deallocate auditCursor
select
SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
nullif(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from #OutputTable order by 1,2,3
As far as I can tell, this will give the output you want. But then it depends on what you want to do next if this is actually the output you should want.
If, for example, you are going to use the output to somehow generate update scripts in order to sync a copy of the database so that the copy is up to date with the source database, this will not work.
If we look at the output for transaction 1532106, the condensed audit has primary id 3 change to 4, then primary id 4 change to 3. That will of course not work.
Based on how the audit trail looks, it seems that the program manipulating the tables switches a primary id to negative value when it need to free up the id on the row. If we change one line in my sample:
if(not exists(select * from #OutputTable where SyncId = #cSyncId and TransId = #cTransId and NewPrimaryIdentifier = #cPrimaryId and NewSecondaryIdentifier = #cSecondaryId))
to
if(not exists(select * from #OutputTable where SyncId = #cSyncId and TransId = #cTransId and NewPrimaryIdentifier = #cPrimaryId and NewSecondaryIdentifier = #cSecondaryId) or #cPrimaryId < 0)
(added or #cPrimaryId < 0) then we get a different, less condensed, output that as far as I can tell should be workable for the case mentioned.
Here is a way to get "the latest condensed record" only using SQL. Since I don't have a full data set I cannot tell you how well it will perform though.
DECLARE #TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT #TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data';
WITH data AS (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN SecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_SecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewPrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewPrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewSecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewSecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewLevel IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewLevel
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewValue IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewValue
FROM #TestTable
)
, transIds
AS (
SELECT DISTINCT SyncId, TransId
FROM #TestTable)
SELECT t.SyncId
, t.TransId
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
, (SELECT d.SecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_SecondaryIdentifier = 1) AS SecondaryIdentifier
, (SELECT d.NewPrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewPrimaryIdentifier = 1) AS NewPrimaryIdentifier
, (SELECT d.NewSecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewSecondaryIdentifier = 1) AS NewSecondaryIdentifier
, (SELECT d.NewLevel FROM data d WHERE d.TransId = t.TransId AND d.rn_NewLevel = 1) AS NewLevel
, (SELECT d.NewValue FROM data d WHERE d.TransId = t.TransId AND d.rn_NewValue = 1) AS NewValue
FROM transIds t;
I'm using two CTE's. "data" contains all of the data along with a priority order for which row to use for each column of interest. "transIds" is just the distinct list of TransIds so the final result will have one row per Transaction Id in the original data set.
Note the use of the window's function in the data CTE:
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
The logic behind the windows function is to make it so the latest row with a non-null value in the respective column has a value of "1". Breaking it down:
ROWNUMBER(): gets a sequence of numbers
PARTITION BY TRANSID: restarts the sequence for each different TransId
ORDER BY CASE WHEN column IS NULL THEN 1 ELSE 0 END: Sort all nulls to the end prior to the sequence being applied.
(ORDER BY) TRANSSeq desc: Sort so the latest TransSeq is first.
In the final select, for each TransId I query the data table to get the latest non-null value for each column based on my previous window's function:
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
In your original question you asked to get both the original values and the latest values. I'm not sure that this makes sense. If you want an audit log of your own each time this changes, then you should just save the "current" row to an audit log table in your database prior to update. If you really want the first row from the original data set than I would suggest a union all combined with my query above. Just append this code to the above query:
UNION ALL
SELECT SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, NewPrimaryIdentifier, NewSecondaryIdentifier, NewLevel, NewValue
FROM #TestTable
WHERE TransSeq = 0
ORDER BY TransId;