MS SQL FIFO Partial transfers - sql

I have a number of transactions that transfer inventory from one account to another account. I can transfer all inventory and I can transfer partial inventory.
I need to pay commission to the owner of the account where inventory resides at my commission date.
My report needs to show the original origin of the inventory items if they have transferred and provide a unit_balance that I can calculate commission from.
Example Transactions:
Account 100
Account, trxid, transacted_units, transactiontype, transferfrom, transferto, date
100, 1, 100, buy, NULL, NULL, 1/1/2020
100, 2, 50, transfer in, 200, NULL, 1/2/2020
Account 200
Account, trxid, transacted_units, transactiontype, transferfrom, transferto, date
200, 3, 40, buy, NULL, NULL, 12/1/2019
200, 4, 30, buy, NULL, NULL, 12/2/2019
200, 5, 7, sell, NULL, NULL, 12/3/2019
200, 6, 50, transfer out, NULL, 100, 1/2/2020
My report output needs to show the full details of accounts associated with the inventory that relates to the unit_balance
Report Output:
[level], Account, trxid, parenttrxid, transacted_units, transactiontype, transferfrom, transferto, date, units_balance
0, 100, 1, NULL, 100, buy, NULL, NULL, 1/1/2020, 100
0, 100, 2, NULL, 50, transfer in, 200, NULL, 1/2/2020, NULL
1, 200, 3, 2, 40, buy, NULL, NULL, 12/1/2019, 33
1, 200, 4, 2, 30, buy, NULL, NULL, 12/2/2019, 17
1, 200, 5, 2, 7, sell, NULL, NULL, 12/3/2019, 0
1, 200, 6, 2, 50, transfer out, NULL, 100, 1/2/2020, 0
*The FIFO logic applies the 7 units sold to the first buy for account 200. The transfer out should then calculate the units_balance on the remaining eligible transactions.
The SQL code I have today only works when I transfer out the full inventory amount, not partial transfers:
select
[level],
parentid,
trxid,
account,
transactiontype,
date,
rnk,
transacted_units,
cumulative,
CASE
WHEN cumulative>0 and transacted_units>=cumulative THEN cumulative
WHEN cumulative>0 and transacted_units<cumulative THEN transacted_units
ELSE 0
END units_bal
from (
select
*,
sum(transacted_units*Positive_Negative_Indicator) over (partition by parenttrxid, account order by rnk, date, trxid RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) cumulative
from (
select *,
CASE
WHEN transacted_units*Positive_Negative_Indicator < 0 THEN 0
ELSE ROW_NUMBER() OVER (PARTITION BY parenttrxid, account ORDER BY Positive_Negative_Indicator ASC, date ASC, trxid ASC)
END rnk
from Transactions
) a
) a
The positive_negative_indicator field represents the direction of a transaction. A sell or transfer out is negative whereas the others are positive.

For each "in" transaction, calculate the running total of units for the previous "in" transactions. Then assign to it as many "out" units as have not already been consumed by those previous "in" transactions (that is, the running total of "out" units minus the previous "in" running total, limited to what the current "in" transaction can absorb).
declare @t table
(
Account int,
trxid int,
trunits int,
trtype varchar(20),
transfrom int,
transto int,
thedate date
);
insert into @t(Account, trxid, trunits, trtype, transfrom, transto, thedate)
values
(100, 1, 100, 'buy', NULL, NULL, '20200101'),
(100, 2, 50, 'transfer in', 200, NULL, '20200201'),
(200, 3, 40, 'buy', NULL, NULL, '20190112'),
(200, 4, 30, 'buy', NULL, NULL, '20190213'),
(200, 5, 10, 'buy', NULL, NULL, '20190214'),
(200, 6, 7, 'sell', NULL, NULL, '20190315'),
(200, 7, 9, 'sell', NULL, NULL, '20190316'),
(200, 8, 25, 'buy', NULL, NULL, '20190317'),
(200, 9, 39, 'sell', NULL, NULL, '20190318'),
(200, 10, 18, 'sell', NULL, NULL, '20190319'),
(200, 11, 14, 'sell', NULL, NULL, '20190320'),
(200, 12, 50, 'transfer out', NULL, 100, '20200201');
select *, case when t.trtype not in ('sell', 'transfer out') then t.trunits -isnull(otu.out_units, 0) else null end as leftover_units
from
(
select *, sum(case when trtype not in ('sell', 'transfer out') then trunits else 0 end) over (partition by Account order by thedate rows between unbounded preceding and 1 preceding) as previous_in_running_units
from @t
) as t
outer apply
(
select top (1) ort.out_units_running_total - isnull(t.previous_in_running_units, 0) as out_units
from
(
select sum(o.trunits) over(order by o.thedate) as out_units_running_total
from @t as o
where o.trtype in ('sell', 'transfer out')
and o.Account = t.Account
and t.trtype not in ('sell', 'transfer out') --no calculations needed when cross applying for "out" transactions
) as ort --out running totals
where ort.out_units_running_total-isnull(t.previous_in_running_units, 0) <= t.trunits --<-- ("in") use as many out units as can be consumed by current t.transaction/date after deducting what has been consumed by the previous t.transaction/date
and ort.out_units_running_total-isnull(t.previous_in_running_units, 0) > 0 --not needed(?) if balance is guaranteed.. total_out = total_in
order by ort.out_units_running_total desc
) as otu; --"out units"
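To make the FIFO rule concrete outside of SQL, here is a minimal Python sketch. It is not a translation of the query above, and the function name `fifo_sources` is made up for illustration. It walks the question's account 200 in date order, consumes the oldest lots first, and records which buy lots each outgoing transaction drew from. It assumes outgoing units never exceed the units on hand.

```python
from collections import deque

def fifo_sources(transactions):
    """Map each outgoing trxid to the (buy trxid, units) lots it consumed, oldest first.

    transactions: iterable of (trxid, units, kind), already sorted by date,
    where kind is 'in' (buy / transfer in) or 'out' (sell / transfer out).
    """
    lots = deque()   # open lots as [trxid, remaining_units], oldest on the left
    sources = {}
    for trxid, units, kind in transactions:
        if kind == "in":
            lots.append([trxid, units])
            continue
        needed, used = units, []
        while needed > 0 and lots:
            lot = lots[0]
            take = min(lot[1], needed)
            used.append((lot[0], take))
            lot[1] -= take
            needed -= take
            if lot[1] == 0:
                lots.popleft()   # lot fully consumed
        sources[trxid] = used
    return sources

# Account 200 from the question: two buys, a sell, then the partial transfer out.
account_200 = [
    (3, 40, "in"),
    (4, 30, "in"),
    (5, 7, "out"),
    (6, 50, "out"),
]
result = fifo_sources(account_200)
print(result[6])  # lots that fed the transfer out (trxid 6)
```

For the transfer out, the sketch attributes 33 units to trxid 3 and 17 units to trxid 4, which are the units_balance values the question's report expects.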

Related

How to sum and subtract one column value based on percentage in SQL Server 2008

DECLARE @BalanceTblRec TABLE
(
NetAmount decimal(18, 3),
Percentage int,
[Description] nvarchar(max)
)
DECLARE @BalanceTblPay TABLE
(
NetAmount decimal(18, 3),
Percentage int,
[Description] nvarchar(max)
)
INSERT INTO @BalanceTblRec
VALUES (21, 11, 'ReceiveReceipt'),
(20, 11, 'ReceiveReceipt'),
(20, 10, 'ReceiveReceipt'),
(20, 20, 'ReceiveReceipt'),
(10, 10, 'ReceiveReceipt')
INSERT INTO @BalanceTblPay
VALUES (10, 11, 'PayReceipt'),
(10, 11, 'PayReceipt'),
(10, 2, 'PayReceipt'),
(5, 15, 'PayReceipt'),
(30, 10, 'PayReceipt'),
(20, 10, 'PayReceipt')
;WITH MaPercentage AS
(
SELECT
Percentage,
SUM(NetAmount) AS Net,
'Receive' AS Flag
FROM
@BalanceTblRec
GROUP BY
Percentage
UNION ALL
SELECT
Percentage,
SUM(NetAmount) AS Net,
'Pay' AS Flag
FROM
@BalanceTblPay
GROUP BY
Percentage
)
SELECT * FROM MaPercentage
Now I want to subtract the Net values by flag (Receive minus Pay) for each percentage.
Like this:
Per   Net           Flag
--------------------------------
10    30.000 - 50   Receive
11    41.000 - 20   Receive
20    20.000        Receive
2     10.000        Pay
15    5.000         Pay
I think this is what you want:
DECLARE @BalanceTblRec TABLE (NetAmount decimal(18,3), Percentage int, [Description] nvarchar(max))
DECLARE @BalanceTblPay TABLE (NetAmount decimal(18,3), Percentage int, [Description] nvarchar(max))
insert into @BalanceTblRec values (21, 11, 'ReceiveReceipt'),(20, 11, 'ReceiveReceipt'),(20, 10, 'ReceiveReceipt'),(20, 20, 'ReceiveReceipt'), (10, 10, 'ReceiveReceipt')
insert into @BalanceTblPay values (10, 11, 'PayReceipt'),(10, 11, 'PayReceipt'),(10, 2, 'PayReceipt'),(5, 15, 'PayReceipt'),(30, 10, 'PayReceipt') ,(20, 10, 'PayReceipt')
;WITH MaPercentage as (
select Percentage, sum(NetAmount) as Net, 'Receive' as Flag from @BalanceTblRec group by Percentage
union all
select Percentage, -sum(NetAmount) as Net, 'Pay' as Flag from @BalanceTblPay group by Percentage
)
select
Percentage,
abs(sum(net)) as SumNet,
case when sum(net) > 0 then 'Receive'
else 'Pay'
end as Flag
from MaPercentage
group by Percentage
I just changed the sign of the payments and summed, grouping by percentage.
Another way is to FULL JOIN the receipts with the payments.
;WITH RCV AS (
select Percentage, sum(NetAmount) as Net
from @BalanceTblRec
group by Percentage
)
, PAY AS (
select Percentage, sum(NetAmount) as Net
from @BalanceTblPay
group by Percentage
)
SELECT
COALESCE(r.Percentage, p.Percentage) AS Percentage,
ABS(COALESCE(r.Net, 0) - COALESCE(p.Net, 0)) AS Net,
(CASE
WHEN (COALESCE(r.Net, 0) - COALESCE(p.Net, 0)) < 0 THEN 'Pay'
ELSE 'Receive'
END) AS Flag
FROM RCV r
FULL JOIN PAY p ON p.Percentage = r.Percentage
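As a cross-check of the expected numbers, the same netting can be sketched in plain Python. This is only an illustration under the sample data from the question. The function name `net_by_percentage` is hypothetical, and a zero net is labelled 'Pay' here, which follows the first query's CASE (the FULL JOIN version would label it 'Receive').

```python
from collections import defaultdict
from decimal import Decimal

# (NetAmount, Percentage) pairs copied from the sample inserts
receipts = [(21, 11), (20, 11), (20, 10), (20, 20), (10, 10)]
payments = [(10, 11), (10, 11), (10, 2), (5, 15), (30, 10), (20, 10)]

def net_by_percentage(receipts, payments):
    """Return {percentage: (absolute net, 'Receive' or 'Pay')}."""
    net = defaultdict(Decimal)          # missing keys start at Decimal('0')
    for amount, pct in receipts:
        net[pct] += Decimal(amount)     # receipts count as positive
    for amount, pct in payments:
        net[pct] -= Decimal(amount)     # payments count as negative
    return {
        pct: (abs(total), "Receive" if total > 0 else "Pay")
        for pct, total in sorted(net.items())
    }

netted = net_by_percentage(receipts, payments)
for pct, (amount, flag) in netted.items():
    print(pct, amount, flag)
```

With the sample data, percentage 10 nets to 20 as 'Pay' (30 received, 50 paid) and percentage 11 nets to 21 as 'Receive' (41 received, 20 paid).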

SQL Dynamically Joining Tables on Various Columns

First time posting!
I have a use case where we want to join sales data to a master agreement table to determine the applicable fees at a transactional level.
The hard part is that the agreement table has various possible column combinations, with a "catch all" row as the worst case.
We want to start at the most granular level, so the purple line matches on all possible values.
However, the blue sales record does not match the master on any value except supplier, so in that case it falls through to the catch-all.
I've thought of concatenating the values in each master row, but then I'd need a way to join that to sales, and a simple concatenation would not join the blue row example. It is as if the join has to choose dynamically which columns to compare.
Would anyone have ideas on how to achieve this?
Thanks!
(Code for tables)
create TABLE T_TEST_AGREEMENT (
SUPPLIER VARCHAR(254),
ITEM VARCHAR(254),
PROGRAM INT,
RXDA VARCHAR(254),
CTRCT INT,
FEE INT
);
create TABLE T_TEST_AGREEMENT_SALES (
SUPPLIER VARCHAR(254),
ITEM VARCHAR(254),
PROGRAM INT,
RXDA VARCHAR(254),
CTRCT INT
);
INSERT INTO T_TEST_AGREEMENT values
(123, 'A', 60, 'Y', 4, 1),
(123, 'A', 61, 'N', 4, 2),
(123, 'B', 62, null, 5, 3),
(123, 'C', null, 'Y', 6, 4),
(123, null, 63, null, null, 5),
(123, null, null, 'Y', null, 6),
(123, null, null, null, null, 7);
INSERT INTO T_TEST_AGREEMENT_SALES values
(123, 'D', 63, null, null),
(123, 'F', null, null, null),
(123, 'A', 61, 'N', 4),
(123, 'C', null, 'Y', 6);
You can use a correlated subquery:
select st.*,
(select m.fee
from T_TEST_AGREEMENT m
where m.supplier = st.supplier and
(m.item is null or m.item = st.item) and
(m.program is null or m.program = st.program) and
(m.rxda is null or m.rxda = st.rxda) and
(m.ctrct is null or m.ctrct = st.ctrct)
-- rank candidates by how many columns match exactly; the most specific row wins
order by ( (case when m.item = st.item then 1 else 0 end) +
(case when m.program = st.program then 1 else 0 end) +
(case when m.rxda = st.rxda then 1 else 0 end) +
(case when m.ctrct = st.ctrct then 1 else 0 end)
) desc
fetch first 1 row only
) as fee
from T_TEST_AGREEMENT_SALES st;
This uses standard SQL syntax. It might vary depending on your database.
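The matching rule in the subquery (a NULL in the agreement acts as a wildcard, and the row with the most exact matches wins) can be sketched in plain Python to check which fee each sample sale should get. This is an illustration only. `best_fee` is a hypothetical helper, supplier values are written as strings, and ties are broken by taking the first candidate, which the SQL does not guarantee.

```python
# (supplier, item, program, rxda, ctrct, fee); None in an agreement is a wildcard
AGREEMENTS = [
    ("123", "A", 60, "Y", 4, 1),
    ("123", "A", 61, "N", 4, 2),
    ("123", "B", 62, None, 5, 3),
    ("123", "C", None, "Y", 6, 4),
    ("123", None, 63, None, None, 5),
    ("123", None, None, "Y", None, 6),
    ("123", None, None, None, None, 7),
]
# (supplier, item, program, rxda, ctrct)
SALES = [
    ("123", "D", 63, None, None),
    ("123", "F", None, None, None),
    ("123", "A", 61, "N", 4),
    ("123", "C", None, "Y", 6),
]

def best_fee(sale, agreements):
    """Fee of the most specific agreement that fits the sale, or None if none fits."""
    best = None  # (score, fee)
    for supplier, *keys, fee in agreements:
        if supplier != sale[0]:
            continue
        pairs = list(zip(keys, sale[1:]))
        # a non-wildcard agreement value must equal the sale value
        if any(a is not None and a != s for a, s in pairs):
            continue
        score = sum(1 for a, _ in pairs if a is not None)  # exact matches
        if best is None or score > best[0]:
            best = (score, fee)
    return None if best is None else best[1]

fees = [best_fee(sale, AGREEMENTS) for sale in SALES]
print(fees)
```

The sale for item 'F' matches nothing except the supplier, so it falls through to the catch-all row with fee 7.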

SQL Gaps and Island problem with a twist -- reset a flag based on duration from previous flag

I have a data set of customer calls with the call date and the outcome of each call (Status). I would like to pay my sales rep for each sold call only if at least 5 days have passed since the most recent previously paid call.
Below is my sample data set with the table I have. Also attached is a picture of the table with the columns I have and the column I want. Colored records are sold calls; green records are the ones I want to pay, and red records are the ones I don't want to pay even though they are sold calls.
I have tried a few versions using window functions but haven't yet been successful. Any help is highly appreciated.
DECLARE @have TABLE
(
CallDate DATE,
Status VARCHAR(10)
);
INSERT INTO @have (CallDate, Status)
values
('2019-01-01', 'unsold'),
('2019-01-02', 'unsold'),
('2019-01-04', 'unsold'),
('2019-01-08', 'sold'),
('2019-01-09', 'sold'),
('2019-01-13', 'unsold'),
('2019-01-14', 'sold'),
('2019-01-19', 'unsold'),
('2019-01-21', 'unsold'),
('2019-01-22', 'sold'),
('2019-01-24', 'unsold'),
('2019-01-25', 'sold'),
('2019-01-29', 'sold'),
('2019-01-30', 'unsold'),
('2019-02-04', 'sold'),
('2019-02-05', 'sold'),
('2019-02-06', 'sold'),
('2019-02-11', 'sold'),
('2019-02-12', 'unsold'),
('2019-02-17', 'sold'),
('2019-02-18', 'unsold'),
('2019-02-19', 'unsold'),
('2019-02-20', 'sold')
;
DECLARE @want TABLE
(
CallDate DATE,
Status VARCHAR(10),
PaidCall int,
Days_Since_Last_Paid_Call int
);
INSERT INTO @want (CallDate, Status, PaidCall, Days_Since_Last_Paid_Call)
values
('2019-01-01', 'unsold', 0, NULL),
('2019-01-02', 'unsold', 0, NULL),
('2019-01-04', 'unsold', 0, NULL),
('2019-01-08', 'sold', 1, NULL),
('2019-01-09', 'sold', 0, 1),
('2019-01-13', 'unsold', 0, NULL),
('2019-01-14', 'sold', 1, 6),
('2019-01-19', 'unsold', 0, NULL),
('2019-01-21', 'unsold', 0, NULL),
('2019-01-22', 'sold', 1, 8),
('2019-01-24', 'unsold', 0, NULL),
('2019-01-25', 'sold', 0, 3),
('2019-01-29', 'sold', 1, 7),
('2019-01-30', 'unsold', 0, NULL),
('2019-02-04', 'sold', 1, 6),
('2019-02-05', 'sold', 0, 1),
('2019-02-06', 'sold', 0, 2),
('2019-02-11', 'sold', 1, 7),
('2019-02-12', 'unsold', 0, NULL),
('2019-02-17', 'sold', 1, 6),
('2019-02-18', 'unsold', 0, NULL),
('2019-02-19', 'unsold', 0, NULL),
('2019-02-20', 'sold', 0, 3)
;
I would like to add the PaidCall flag to my table as shown above. Days_Since_Last_Paid_Call is for illustration only, to explain how I arrive at the PaidCall column.
Unfortunately, you need to iteratively process the data. One method is using a recursive CTE:
with s as (
select h.*, row_number() over (order by calldate) as seqnum
from have h
where status = 'sold'
),
cte as (
select calldate, seqnum, 1 as paidquote, calldate as paidquote_date
from s
where seqnum = 1
union all
select s.calldate, s.seqnum,
(case when s.calldate >= dateadd(day, 5, cte.paidquote_date) then 1 else 0 end), -- >= so that exactly 5 days qualifies, per the question
(case when s.calldate >= dateadd(day, 5, cte.paidquote_date) then s.calldate else cte.paidquote_date end)
from cte join
s
on s.seqnum = cte.seqnum + 1
)
select h.calldate, h.status, coalesce(cte.paidquote, 0) as paidquote
from have h left join
cte
on h.calldate = cte.calldate
order by h.calldate;
Here is a db<>fiddle.
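The reason the data has to be processed iteratively is that each decision depends on the date of the last call that was actually paid, not simply on the previous row. A minimal Python sketch of that rule, using the sample data from the question, is below. `flag_paid_calls` is a hypothetical name and the 5-day threshold is treated as inclusive, following the question's ">= 5 days".

```python
from datetime import date, timedelta

CALLS = [(date.fromisoformat(d), s) for d, s in [
    ("2019-01-01", "unsold"), ("2019-01-02", "unsold"), ("2019-01-04", "unsold"),
    ("2019-01-08", "sold"), ("2019-01-09", "sold"), ("2019-01-13", "unsold"),
    ("2019-01-14", "sold"), ("2019-01-19", "unsold"), ("2019-01-21", "unsold"),
    ("2019-01-22", "sold"), ("2019-01-24", "unsold"), ("2019-01-25", "sold"),
    ("2019-01-29", "sold"), ("2019-01-30", "unsold"), ("2019-02-04", "sold"),
    ("2019-02-05", "sold"), ("2019-02-06", "sold"), ("2019-02-11", "sold"),
    ("2019-02-12", "unsold"), ("2019-02-17", "sold"), ("2019-02-18", "unsold"),
    ("2019-02-19", "unsold"), ("2019-02-20", "sold"),
]]

def flag_paid_calls(calls, min_gap_days=5):
    """Return (call_date, status, paid_flag) rows in date order."""
    last_paid = None          # date of the most recent call that was paid
    flagged = []
    for call_date, status in sorted(calls):
        paid = status == "sold" and (
            last_paid is None
            or call_date - last_paid >= timedelta(days=min_gap_days)
        )
        if paid:
            last_paid = call_date   # only paid calls reset the clock
        flagged.append((call_date, status, int(paid)))
    return flagged

paid_dates = [d.isoformat() for d, _, paid in flag_paid_calls(CALLS) if paid]
print(paid_dates)
```

The recursive CTE carries the same state (`paidquote_date`) from one sold call to the next, which is why a plain window function over the previous row is not enough.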

find the first value based on date and id column

I want to find the stationary time recorded for a given depot.
Below is the code to create the table and its values. I have already met the other requirements for the same table and share that code below as well.
I want to populate a new column [StationaryFirstWaitTime] with the first wait time for the same scenario.
For a given ShipmentId and VehicleId, where DepotId = StationaryId, take [StationaryEndTime] - [StationaryStartTime] for the first row received on a given date for that vehicle and ShipmentId.
Below is the code:
CREATE TABLE [dbo].[Table_Consolidate_Friday](
[Sno] [int] NOT NULL,
[VehicleId] [nchar](10) NULL,
[DepotId] [int] NULL,
[DepotVisitStartTime] [datetime2](7) NULL,
[DepotVisitEndTime] [datetime2](7) NULL,
[StationaryId] [int] NULL,
[StationaryStartTime] [datetime2](7) NULL,
[StationaryEndTime] [datetime2](7) NULL,
[ActualQty] [bigint] NULL,
[AggreageQty] [bigint] NULL,
[StationaryWaitTimeTotal] [datetime2](7) NULL,
[StationaryFirstWaitTime] [datetime2](7) NULL,
[StationaryRowCount] [bigint] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Table_Consolidate_Friday] ([Sno], [VehicleId], [DepotId], [DepotVisitStartTime], [DepotVisitEndTime], [StationaryId], [StationaryStartTime], [StationaryEndTime], [ActualQty], [AggreageQty], [StationaryWaitTimeTotal], [StationaryRowCount]) VALUES
(1, N'TN1 ', 15, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 15, '2019-02-15T07:55:32', '2019-02-15T08:15:23', 10, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 3, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 3, '2019-02-15T09:22:52', '2019-02-15T09:45:59', 20, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 8, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 8, '2019-02-15T11:25:36', '2019-02-15T02:35:37', 33, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 12, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 12, '2019-02-15T15:15:33', '2019-02-15T15:25:21', 56, 119, '2019-02-22T02:02:47', 4),
(2, N'KA2 ', 23, '2019-02-15T06:12:52', '2019-02-15T11:21:35', 23, '2019-02-15T10:25:13', '2019-02-15T11:15:23', 72, 114, '2019-02-22T01:24:10', 2),
(2, N'KA2 ', 20, '2019-02-15T06:12:52', '2019-02-15T11:21:35', 20, '2019-02-15T07:11:33', '2019-02-15T07:45:33', 42, 114, '2019-02-22T01:24:10', 2),
(3, N'AP3 ', 20, '2019-02-15T06:32:52', '2019-02-15T11:21:35', 20, '2019-02-15T07:13:13', '2019-02-15T08:05:01', 15, 37, '2019-02-22T01:14:18', 2),
(3, N'AP3 ', 21, '2019-02-15T06:32:52', '2019-02-15T11:21:35', 21, '2019-02-15T09:43:12', '2019-02-15T10:05:42', 22, 37, '2019-02-22T01:14:18', 2),
(3, N'AP3 ', 15, '2019-02-15T13:12:21', '2019-02-15T19:23:32', 15, '2019-02-15T14:13:13', '2019-02-15T14:45:21', 34, 34, '2019-02-22T00:32:08', 1)
I have written code to add and aggregate values and count as below
SELECT
AggreageQty = SUM(ActualQty) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryWaitTimeTotal = CAST(DATEADD(SECOND, SUM(DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) ) OVER (PARTITION BY Sno, DepotVisitStartTime), 0) AS TIME),
StationaryRowCount = COUNT(*) OVER (PARTITION BY Sno, DepotVisitStartTime)
FROM [dbo].[Table_Consolidate_Friday]
I need to get the following result for [StationaryFirstWaitTime]:
FirstWaitTime
0:-19:-51
0:-19:-51
0:-19:-51
0:-19:-51
0:-50:-10
0:-50:-10
0:-51:-48
0:-51:-48
0:-32:-8
Platform: Azure SQL Datawarehouse
Use the window function FIRST_VALUE.
The requested extra column has a non-standard look, so FORMAT() is used to meet that requirement:
SQL:
SELECT
AggreageQty = SUM(ActualQty) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryWaitTimeTotal = CAST(DATEADD(SECOND, SUM(DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) ) OVER (PARTITION BY Sno, DepotVisitStartTime), 0) AS TIME),
StationaryRowCount = COUNT(*) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryFirstWaitTime = FORMAT(FIRST_VALUE ( CAST(DATEADD(SECOND, DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) , 0) AS datetime) ) OVER (PARTITION BY Sno, DepotVisitStartTime order by StationaryStartTime), 'H:-m:-s')
FROM [dbo].[Table_Consolidate_Friday]
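That extra column of interest results in the following. For Sno 2 the earliest StationaryStartTime is 07:11:33, so the first wait is 34:00 rather than the 50:10 shown in the question: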
StationaryFirstWaitTime
0:-19:-51
0:-19:-51
0:-19:-51
0:-19:-51
0:-34:-0
0:-34:-0
0:-51:-48
0:-51:-48
0:-32:-8
Update:
The OP uses SQL Data Warehouse, where FORMAT() is not available. A workaround:
StationaryFirstWaitTime = REPLACE(CONVERT(VARCHAR(8),FIRST_VALUE ( CAST(DATEADD(SECOND, DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) , 0) AS TIME) ) OVER (PARTITION BY Sno, DepotVisitStartTime order by StationaryStartTime), 8), ':', ':-')
Which results in:
StationaryFirstWaitTime
00:-19:-51
00:-19:-51
00:-19:-51
00:-19:-51
00:-34:-00
00:-34:-00
00:-51:-48
00:-51:-48
00:-32:-08
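To double-check what FIRST_VALUE returns per partition, here is a small Python sketch over the Sno 2 and Sno 3 rows of the sample data. It is an illustration only. `first_wait_times` is a hypothetical helper that keeps, for each (Sno, DepotVisitStartTime) group, the row with the earliest StationaryStartTime and returns its end minus start.

```python
from datetime import datetime, timedelta

ts = datetime.fromisoformat

# (Sno, DepotVisitStartTime, StationaryStartTime, StationaryEndTime)
ROWS = [
    (2, "2019-02-15T06:12:52", ts("2019-02-15T10:25:13"), ts("2019-02-15T11:15:23")),
    (2, "2019-02-15T06:12:52", ts("2019-02-15T07:11:33"), ts("2019-02-15T07:45:33")),
    (3, "2019-02-15T06:32:52", ts("2019-02-15T07:13:13"), ts("2019-02-15T08:05:01")),
    (3, "2019-02-15T06:32:52", ts("2019-02-15T09:43:12"), ts("2019-02-15T10:05:42")),
    (3, "2019-02-15T13:12:21", ts("2019-02-15T14:13:13"), ts("2019-02-15T14:45:21")),
]

def first_wait_times(rows):
    """Wait time of the earliest stationary period per (Sno, DepotVisitStartTime)."""
    first = {}
    for sno, visit_start, st_start, st_end in rows:
        key = (sno, visit_start)
        # keep the row with the smallest StationaryStartTime in each group
        if key not in first or st_start < first[key][0]:
            first[key] = (st_start, st_end)
    return {key: end - start for key, (start, end) in first.items()}

waits = first_wait_times(ROWS)
for key, wait in waits.items():
    print(key, wait)
```

Ordering by StationaryStartTime is what makes Sno 2 resolve to the 07:11:33 row. Ordering by insertion order instead would need a column that records it.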

Most performant way in SQL Server to condense multiple data changes into before and after values

I have a SQL Server database with some audit records showing changes to a third party database (OpenEdge). I have no control over the structure of the audit data, nor the way the third party database audits data changes. So I'm left with, for example, the following data...
If you follow the first five rows you can see they all belong to TransId 1532102 (represents a database transaction) where the TransSeq represents a database action within a single transaction.
The audit changes are visible in the columns prefixed with New. If the value is NULL then no change to that field took place.
Looking at the data you can see that where TransId = 1532102 the PrimaryIdentifier is changed from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4) and finally from 4 to 5 (row 5). You might also notice that when the PrimaryIdentifier changes from 3 to 4 the SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4).
So these multiple changes are actually only occurring on a single source record. So with this in mind rows 1, 3, 4 & 5 can all be condensed into a single row (see below)
Ultimately there are only two record changes in TransId 1532102..
I need to translate these changes into a single UPDATE statement on a target database. In order to do this I need to ensure I have a single record showing the before and after values.
So given the source data presented here I need to produce the following data set..
What query structures could I use to achieve this? I was thinking recursive CTEs or perhaps using Hierarchical structures?
Ultimately I need this to perform as well as possible so I wanted to pose the question here in case I hadn't considered all possible approaches.
Thoughts welcome and here's a script for the sample data
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', NULL, 'ghfi', NULL, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
SELECT *
FROM @TestTable
EDIT:
I've actually been unable to write any queries that successfully track the identifier changes. Can anyone help - I need a query that tracks the changes in PrimaryIdentifier values and ultimately provides a single record for each tracking with start values and end values.
EDIT 2:
There's been a deleted answer that suggests the update to the key identifiers is not possible when condensed and that I should step through the changes instead. I thought it would be valuable to add my comments for further info to the question..
I need to condense the dataset because of the volume of audit records being generated, most of which are unnecessary because of the way the source DBMS makes its changes. I need to reduce the dataset and I need to track key identifier changes. The update should be possible without clashing on id changes during the update statement - see this example.
I assume that
1) (PrimaryIdentifier, SecondaryIdentifier) is a PK of the target table,
2) Every transaction in the audit table leaves the target table in a consistent state.
So updating the PK in a single statement per transaction, using CASE, will run OK:
declare @t table (id int primary key, old int);
insert @t(id, old) values (4,4),(5,5);
update @t set id = case id
when 4 then 5
when 5 then 4 end;
select * from @t;
The plan is
1. Condense transactions
2. Generate update sql into temp table. Then you can run all or selected items from the temp table. Every item is of the form
UPDATE myTable SET
PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,
SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END ,
Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Level END ,
Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Value END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd')
OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')
The query
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
t.NewPrimaryIdentifier,
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(root.NewLevel, t.NewLevel), coalesce(root.NewValue, t.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END, SecondaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ''' + NewSecondaryIdentifier + ''''
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Level= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel as varchar(20)) END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Value= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END'
+ ' WHERE 1=2'
+ (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
INTO #UpdSql
FROM condensed c1
GROUP BY SyncId, TransId
SELECT *
FROM #UpdSql
ORDER BY SyncId, TransId
EDIT
This version takes into account that NewPrimaryIdentifier can be NULL too; see the row marked as added in the test data. SQL generation is skipped.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', null, 'ghfi', null, NULL -- added
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
coalesce(NewPrimaryIdentifier, PrimaryIdentifier) AS NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = coalesce(t2.NewPrimaryIdentifier, t2.PrimaryIdentifier)
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
coalesce(t.NewPrimaryIdentifier, root.NewPrimaryIdentifier),
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
SELECT *
FROM condensed
ORDER BY SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
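The chain tracking that the recursive CTE performs can also be pictured as a single ordered pass over one transaction. The Python sketch below is a different technique from the query above, shown only to illustrate the idea. `condense` is a hypothetical helper, and it assumes that within a transaction no two live records ever share the same (PrimaryIdentifier, SecondaryIdentifier) key at the same time.

```python
def condense(rows):
    """Collapse one transaction's audit rows into one before/after record per source row.

    rows: (TransSeq, PrimaryIdentifier, SecondaryIdentifier,
           NewPrimaryIdentifier, NewSecondaryIdentifier, NewLevel, NewValue)
    """
    current = {}  # key a record currently holds -> its condensed record
    for seq, pid, sid, new_pid, new_sid, new_level, new_value in sorted(
        rows, key=lambda r: r[0]
    ):
        # continue an existing chain if some record currently holds this key
        rec = current.pop((pid, sid), None)
        if rec is None:
            rec = {"old": (pid, sid), "level": None, "value": None}
        if new_level is not None:
            rec["level"] = new_level      # later changes win
        if new_value is not None:
            rec["value"] = new_value
        # NULL in a New* column means "unchanged"
        rec["new"] = (pid if new_pid is None else new_pid,
                      sid if new_sid is None else new_sid)
        current[rec["new"]] = rec
    return sorted(current.values(), key=lambda r: r["old"])

# TransId 1532102 from the sample data (first five rows)
trans_1532102 = [
    (0, 2, "abcd", -2, None, None, "test data"),
    (1, 3, "abcd", 2, None, None, None),
    (2, -2, "abcd", 3, None, None, None),
    (3, 3, "abcd", 4, "efgh", None, None),
    (4, 4, "efgh", 5, None, 2, None),
]
condensed = condense(trans_1532102)
for rec in condensed:
    print(rec["old"], "->", rec["new"], rec["level"], rec["value"])
```

For this transaction the pass yields two records: (2, 'abcd') ends as (5, 'efgh') with level 2 and value 'test data', and (3, 'abcd') ends as (2, 'abcd'), matching the two WHEN branches in the generated UPDATE shown earlier.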
Here is a second stab at producing the originally requested output, this time using a bunch of CTEs.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;with baseCTE as (
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @TestTable
),
syncTransEntryPointsCte as (
select *
from baseCTE b
where not exists(
select *
from baseCTE subb
where b.SyncId = subb.SyncId
and b.TransId = subb.TransId
and b.PrimaryIdentifier = subb.NewPrimaryIdentifier
and b.SecondaryIdentifier = subb.NewSecondaryIdentifier
and b.TransSeq > subb.TransSeq
)
)
, recursiveBaseCte as (
select *, 0 as lev, TransSeq as OrigTransSec from syncTransEntryPointsCte
union all
select
c.SyncId, c.TransId, c.TransSeq, p.PrimaryIdentifier, p.SecondaryIdentifier, c.NewPrimaryIdentifier, c.NewSecondaryIdentifier, isnull(c.NewLevel, p.NewLevel), isnull(c.NewValue, p.NewValue),
p.lev + 1,
p.OrigTransSec
from baseCTE c
join recursiveBaseCte as p on (
c.SyncId = p.SyncId and c.TransId = p.TransId and c.PrimaryIdentifier = p.NewPrimaryIdentifier and c.SecondaryIdentifier = p.NewSecondaryIdentifier and c.TransSeq > p.TransSeq
)
)
select r.SyncId, r.TransId, r.OrigTransSec as TransSec,
r.PrimaryIdentifier, r.SecondaryIdentifier,
nullif(r.NewPrimaryIdentifier, r.PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(r.NewSecondaryIdentifier, r.SecondaryIdentifier) as NewSecondaryIdentifier,
r.NewLevel, r.NewValue
from recursiveBaseCte r
join (
select SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, max(lev) as mlev
from recursiveBaseCte
group by SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier
) as selectForOutput on
r.SyncId = selectForOutput.SyncId
and r.TransId = selectForOutput.TransId
and r.PrimaryIdentifier = selectForOutput.PrimaryIdentifier
and r.SecondaryIdentifier = selectForOutput.SecondaryIdentifier
and r.lev = selectForOutput.mlev
order by 1,2,3
Whether the CTE approach is any faster than the cursor-based one is hard to predict. I suggest you test it at a time when the server in question is not under heavy load.
Update
The script first declares the baseCTE which is used just to make sure that we have values in NewPrimaryIdentifier and NewSecondaryIdentifier for each row, even if one or both of them were not changed in the update. This makes everything after that easier since we can then join to the next row for the same combination within a specific transaction.
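The syncTransEntryPointsCte in turn uses baseCTE to find all rows within one transaction that were not preceded by another row within the same transaction whose new identifiers match the row's own identifiers. These rows are the starting points of each chain of changes.
recursiveBaseCte then uses both of the previous CTEs to recursively follow each chain and aggregate the changes. The final query uses it to produce the output, keeping only the deepest level reached for each starting row.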
The output should be usable for updating a stale copy of the source table if you can manage to do the updates for one condensed transaction in one update statement. If, as I originally assumed, you try to build one update statement for each row in the condensed audit output, it will not work.
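To illustrate why the single-statement approach matters, here is a minimal sketch of two rows swapping their ids. The table variable @StaleCopy and its Payload column are made up for this example and are not part of the question's schema; it also assumes the id is a primary key on the copy.

```sql
-- Hypothetical stale copy with two rows whose ids must be swapped (3 <-> 4).
DECLARE @StaleCopy TABLE (PrimaryIdentifier INT PRIMARY KEY, Payload NVARCHAR(20));
INSERT @StaleCopy (PrimaryIdentifier, Payload)
VALUES (3, N'row A'), (4, N'row B');

-- Applying the two changes one at a time would fail: setting 3 to 4 first
-- collides with the existing row 4 (primary key violation).
-- A single UPDATE applies both changes together, so no intermediate
-- state with a duplicate id is ever checked.
UPDATE @StaleCopy
SET PrimaryIdentifier = CASE PrimaryIdentifier WHEN 3 THEN 4 WHEN 4 THEN 3 END
WHERE PrimaryIdentifier IN (3, 4);

SELECT PrimaryIdentifier, Payload
FROM @StaleCopy
ORDER BY PrimaryIdentifier;
```

After the update, id 3 carries 'row B' and id 4 carries 'row A'. Generating such a CASE expression from the condensed audit rows of one transaction is left out here, since it depends on how the copy is keyed.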
Finally, obligatory disclaimer: This seems to work with the test data you gave in the question. I can give no guarantees that it works for the real thing, so use with caution.
Here is a first stab at getting the desired output. It's using a CURSOR, so don't expect great performance.
set nocount on
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
DECLARE @OutputTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
--SELECT * FROM @TestTable
declare @cSyncId int, @cTransId int, @cTransSeq int, @cPrimaryId int, @cSecondaryId nchar(4), @cNewPrimaryId int, @cNewSecondary nchar(4), @cNewLevel int, @cNewValue nvarchar(20)
declare @newTransSeq int, @prevSyncId int, @prevTransId int
set @newTransSeq = 0
set @prevSyncId = 0
set @prevTransId = 0
-- walk the audit rows in order, with unchanged identifiers filled in
declare auditCursor CURSOR for
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @TestTable
order by SyncId, TransId, TransSeq
open auditCursor
fetch next from auditCursor into @cSyncId, @cTransId, @cTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue
while @@FETCH_STATUS = 0
begin
-- restart the output sequence for each new transaction
if @prevSyncId != @cSyncId or @prevTransId != @cTransId
begin
set @newTransSeq = 0
set @prevSyncId = @cSyncId
set @prevTransId = @cTransId
end
-- no earlier output row ends at these identifiers: start a new condensed row
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId))
begin
insert into @OutputTable values(@cSyncId, @cTransId, @newTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue)
set @newTransSeq = @newTransSeq + 1
end
else
-- otherwise fold this change into the existing condensed row
begin
update @OutputTable
set NewPrimaryIdentifier = isnull(@cNewPrimaryId, NewPrimaryIdentifier),
NewSecondaryIdentifier = isnull(@cNewSecondary, NewSecondaryIdentifier),
NewLevel = isnull(@cNewLevel, NewLevel),
NewValue = isnull(@cNewValue, NewValue)
where SyncId = @cSyncId
and TransId = @cTransId
and NewPrimaryIdentifier = @cPrimaryId
and NewSecondaryIdentifier = @cSecondaryId
end
fetch next from auditCursor into @cSyncId, @cTransId, @cTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue
end
close auditCursor
deallocate auditCursor
select
SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
nullif(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @OutputTable order by 1,2,3
As far as I can tell, this will give the output you want. Whether it is the output you should want depends on what you plan to do with it next.
If, for example, you are going to use the output to somehow generate update scripts in order to sync a copy of the database so that the copy is up to date with the source database, this will not work.
If we look at the output for transaction 1532106, the condensed audit has primary id 3 change to 4, then primary id 4 change to 3. That will of course not work.
Based on how the audit trail looks, it seems that the program manipulating the tables switches a primary id to a negative value when it needs to free up the id on the row. If we change one line in my sample:
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId))
to
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId) or @cPrimaryId < 0)
(added or @cPrimaryId < 0) then we get a different, less condensed output that, as far as I can tell, should be workable for the case mentioned.
Here is a way to get "the latest condensed record" only using SQL. Since I don't have a full data set I cannot tell you how well it will perform though.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data';
WITH data AS (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN SecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_SecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewPrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewPrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewSecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewSecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewLevel IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewLevel
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewValue IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewValue
FROM @TestTable
)
, transIds
AS (
SELECT DISTINCT SyncId, TransId
FROM @TestTable)
SELECT t.SyncId
, t.TransId
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
, (SELECT d.SecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_SecondaryIdentifier = 1) AS SecondaryIdentifier
, (SELECT d.NewPrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewPrimaryIdentifier = 1) AS NewPrimaryIdentifier
, (SELECT d.NewSecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewSecondaryIdentifier = 1) AS NewSecondaryIdentifier
, (SELECT d.NewLevel FROM data d WHERE d.TransId = t.TransId AND d.rn_NewLevel = 1) AS NewLevel
, (SELECT d.NewValue FROM data d WHERE d.TransId = t.TransId AND d.rn_NewValue = 1) AS NewValue
FROM transIds t;
I'm using two CTEs. "data" contains all of the data along with a priority order for which row to use for each column of interest. "transIds" is just the distinct list of TransIds, so the final result has one row per transaction id in the original data set.
Note the use of the window function in the data CTE:
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
The logic behind the window function is to give the latest row with a non-null value in the respective column a row number of 1. Breaking it down:
ROW_NUMBER(): gets a sequence of numbers
PARTITION BY TRANSID: restarts the sequence for each different TransId
ORDER BY CASE WHEN column IS NULL THEN 1 ELSE 0 END: Sort all nulls to the end prior to the sequence being applied.
(ORDER BY) TRANSSeq desc: Sort so the latest TransSeq is first.
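The breakdown above can be tried on a small table. This is only a sketch: the table variable @Demo and its columns Grp, Seq and Val are invented for illustration and stand in for TransId, TransSeq and one of the nullable columns.

```sql
-- Hypothetical demo data: group 1 has a NULL in its latest row,
-- group 2 has no non-null value at all.
DECLARE @Demo TABLE (Grp INT, Seq INT, Val NVARCHAR(10));
INSERT @Demo (Grp, Seq, Val)
VALUES (1, 0, N'first'), (1, 1, N'second'), (1, 2, NULL),
       (2, 0, NULL),     (2, 1, NULL);

SELECT Grp, Seq, Val
     , ROW_NUMBER() OVER(PARTITION BY Grp
                         ORDER BY CASE WHEN Val IS NULL THEN 1 ELSE 0 END, Seq DESC) AS rn_Val
FROM @Demo
ORDER BY Grp, rn_Val;
```

For group 1 the row numbered 1 is Seq 1 ('second'), not the later Seq 2, because the NULL row is sorted to the end. For group 2 every value is NULL, so the row numbered 1 still carries NULL, which is the result you would want for a column that never changed.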
In the final select, for each TransId I query the data CTE to get the latest non-null value for each column, based on the window function above:
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
In your original question you asked to get both the original values and the latest values. I'm not sure that this makes sense. If you want an audit log of your own each time this changes, then you should just save the "current" row to an audit log table in your database prior to the update. If you really want the first row from the original data set, then I would suggest a UNION ALL combined with my query above. Remove the trailing semicolon from the query above and append this code to it:
UNION ALL
SELECT SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, NewPrimaryIdentifier, NewSecondaryIdentifier, NewLevel, NewValue
FROM @TestTable
WHERE TransSeq = 0
ORDER BY TransId;
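For the audit-log alternative mentioned above, one possible way to save the "current" row is the OUTPUT clause of UPDATE, which captures the pre-update values in the same statement. This technique is my suggestion rather than something taken from the question; the tables @Items and @ItemsAudit and their columns are hypothetical.

```sql
-- Hypothetical source table and audit table (names and columns are assumptions).
DECLARE @Items TABLE (PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), ItemLevel INT, ItemValue NVARCHAR(20));
DECLARE @ItemsAudit TABLE (PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), ItemLevel INT, ItemValue NVARCHAR(20),
                           ChangedAt DATETIME2 DEFAULT SYSUTCDATETIME());
INSERT @Items VALUES (2, N'abcd', 1, N'test data');

-- "deleted" exposes each row as it looked before the update,
-- so the old values land in the audit table as part of the same statement.
UPDATE @Items
SET ItemLevel = 2
OUTPUT deleted.PrimaryIdentifier, deleted.SecondaryIdentifier, deleted.ItemLevel, deleted.ItemValue
INTO @ItemsAudit (PrimaryIdentifier, SecondaryIdentifier, ItemLevel, ItemValue)
WHERE PrimaryIdentifier = 2;

SELECT * FROM @ItemsAudit;
```

The audit row keeps ItemLevel 1 while the source row now has ItemLevel 2. With permanent tables the same pattern applies, although OUTPUT ... INTO has restrictions on the target table (for example triggers and foreign keys), so check those before relying on it.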