find the first value based on date and id column - sql

I want to find the stationary wait time recorded for a given depot.
Below is the code to create the table and insert sample values. I have already met the other requirements for this table and have shared that code below as well.
I want a new column [StationaryFirstWaitTime] that holds the first wait time for the same scenario.
For a given ShipmentId and VehicleId, where DepotId = StationaryId, get [StationaryEndTime] - [StationaryStartTime] for the first row received on a given date for that vehicle and shipment.
Below is the code:
CREATE TABLE [dbo].[Table_Consolidate_Friday](
[Sno] [int] NOT NULL,
[VehicleId] [nchar](10) NULL,
[DepotId] [int] NULL,
[DepotVisitStartTime] [datetime2](7) NULL,
[DepotVisitEndTime] [datetime2](7) NULL,
[StationaryId] [int] NULL,
[StationaryStartTime] [datetime2](7) NULL,
[StationaryEndTime] [datetime2](7) NULL,
[ActualQty] [bigint] NULL,
[AggreageQty] [bigint] NULL,
[StationaryWaitTimeTotal] [datetime2](7) NULL,
[StationaryFirstWaitTime] [datetime2](7) NULL,
[StationaryRowCount] [bigint] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Table_Consolidate_Friday] ([Sno], [VehicleId], [DepotId], [DepotVisitStartTime], [DepotVisitEndTime], [StationaryId], [StationaryStartTime], [StationaryEndTime], [ActualQty], [AggreageQty], [StationaryWaitTimeTotal], [StationaryRowCount]) VALUES
(1, N'TN1 ', 15, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 15, '2019-02-15T07:55:32', '2019-02-15T08:15:23', 10, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 3, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 3, '2019-02-15T09:22:52', '2019-02-15T09:45:59', 20, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 8, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 8, '2019-02-15T11:25:36', '2019-02-15T12:35:37', 33, 119, '2019-02-22T02:02:47', 4),
(1, N'TN1 ', 12, '2019-02-15T07:25:33', '2019-02-15T17:25:33', 12, '2019-02-15T15:15:33', '2019-02-15T15:25:21', 56, 119, '2019-02-22T02:02:47', 4),
(2, N'KA2 ', 23, '2019-02-15T06:12:52', '2019-02-15T11:21:35', 23, '2019-02-15T10:25:13', '2019-02-15T11:15:23', 72, 114, '2019-02-22T01:24:10', 2),
(2, N'KA2 ', 20, '2019-02-15T06:12:52', '2019-02-15T11:21:35', 20, '2019-02-15T07:11:33', '2019-02-15T07:45:33', 42, 114, '2019-02-22T01:24:10', 2),
(3, N'AP3 ', 20, '2019-02-15T06:32:52', '2019-02-15T11:21:35', 20, '2019-02-15T07:13:13', '2019-02-15T08:05:01', 15, 37, '2019-02-22T01:14:18', 2),
(3, N'AP3 ', 21, '2019-02-15T06:32:52', '2019-02-15T11:21:35', 21, '2019-02-15T09:43:12', '2019-02-15T10:05:42', 22, 37, '2019-02-22T01:14:18', 2),
(3, N'AP3 ', 15, '2019-02-15T13:12:21', '2019-02-15T19:23:32', 15, '2019-02-15T14:13:13', '2019-02-15T14:45:21', 34, 34, '2019-02-22T00:32:08', 1)
I have written the following code to compute the aggregated quantity, the total wait time, and the row count:
SELECT
AggreageQty = SUM(ActualQty) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryWaitTimeTotal = CAST(DATEADD(SECOND, SUM(DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) ) OVER (PARTITION BY Sno, DepotVisitStartTime), 0) AS TIME),
StationaryRowCount = COUNT(*) OVER (PARTITION BY Sno, DepotVisitStartTime)
FROM [dbo].[Table_Consolidate_Friday]
I need the following result for [StationaryFirstWaitTime]:
FirstWaitTime
0:-19:-51
0:-19:-51
0:-19:-51
0:-19:-51
0:-50:-10
0:-50:-10
0:-51:-48
0:-51:-48
0:-32:-8
Platform: Azure SQL Datawarehouse

Use the window function FIRST_VALUE.
The requested column has a non-standard format, so FORMAT() is used to produce it:
SQL:
SELECT
AggreageQty = SUM(ActualQty) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryWaitTimeTotal = CAST(DATEADD(SECOND, SUM(DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) ) OVER (PARTITION BY Sno, DepotVisitStartTime), 0) AS TIME),
StationaryRowCount = COUNT(*) OVER (PARTITION BY Sno, DepotVisitStartTime),
StationaryFirstWaitTime = FORMAT(FIRST_VALUE ( CAST(DATEADD(SECOND, DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) , 0) AS datetime) ) OVER (PARTITION BY Sno, DepotVisitStartTime order by StationaryStartTime), 'H:-m:-s')
FROM [dbo].[Table_Consolidate_Friday]
The column of interest then returns:
StationaryFirstWaitTime
0:-19:-51
0:-19:-51
0:-19:-51
0:-19:-51
0:-34:-0
0:-34:-0
0:-51:-48
0:-51:-48
0:-32:-8
Update:
The OP uses Azure SQL Data Warehouse, where FORMAT() is not available. A workaround:
StationaryFirstWaitTime = REPLACE(CONVERT(VARCHAR(8),FIRST_VALUE ( CAST(DATEADD(SECOND, DATEDIFF(SECOND, StationaryStartTime, StationaryEndTime) , 0) AS TIME) ) OVER (PARTITION BY Sno, DepotVisitStartTime order by StationaryStartTime), 8), ':', ':-')
which returns:
StationaryFirstWaitTime
00:-19:-51
00:-19:-51
00:-19:-51
00:-19:-51
00:-34:-00
00:-34:-00
00:-51:-48
00:-51:-48
00:-32:-08
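The 0:-34:-0 returned for vehicle KA2 differs from the 0:-50:-10 the question expects because FIRST_VALUE takes "first" from its ORDER BY. Here is a minimal sketch of the difference, using the two KA2 rows from the question. LoadOrder is a hypothetical column (it is not in the original table) that records insertion order, and the query is written for SQL Server 2012 or later; it has not been checked on Azure SQL Data Warehouse.

```sql
-- The two KA2 rows from the question; LoadOrder is a hypothetical column
-- recording the order in which the rows were inserted.
SELECT
    v.LoadOrder,
    v.StationaryStartTime,
    -- "First" by start time: the 07:11:33 row, 2040 seconds (34:00).
    FirstWaitSecondsByStart = FIRST_VALUE(DATEDIFF(SECOND, v.StationaryStartTime, v.StationaryEndTime))
        OVER (PARTITION BY v.Sno ORDER BY v.StationaryStartTime),
    -- "First" by insertion order: the 10:25:13 row, 3010 seconds (50:10).
    FirstWaitSecondsByLoad = FIRST_VALUE(DATEDIFF(SECOND, v.StationaryStartTime, v.StationaryEndTime))
        OVER (PARTITION BY v.Sno ORDER BY v.LoadOrder)
FROM (VALUES
    (2, 1, CAST('2019-02-15T10:25:13' AS datetime2), CAST('2019-02-15T11:15:23' AS datetime2)),
    (2, 2, CAST('2019-02-15T07:11:33' AS datetime2), CAST('2019-02-15T07:45:33' AS datetime2))
) AS v(Sno, LoadOrder, StationaryStartTime, StationaryEndTime);
```

Without a column like LoadOrder the table keeps no reliable record of arrival order, so ordering by StationaryStartTime is the only deterministic choice available in the posted schema.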

Related

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock out multiple times a day, so I'm trying to get all of the clock outs in a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.
The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
declare @test table (
pk int,
workdate date,
seq int,
tIN time,
tOUT time
)
insert into @test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '01:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from @test
) i
) a
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
generates these results.
(I would provide a fiddle demo but they're not working for me today)
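The pair-number expression in the main query can be checked in isolation. This small sketch (rn stands in for the row_number() value; the alias names are mine) shows that integer division plus the remainder maps rows 1 and 2 to pair 1, rows 3 and 4 to pair 2, and so on, which is the same as the shorter (rn + 1) / 2:

```sql
-- rn stands in for row_number() within one employee and work date.
SELECT
    v.rn,
    PairNoAsWritten = (v.rn / 2) + (v.rn % 2),  -- 1, 1, 2, 2, 3, 3
    PairNoShortForm = (v.rn + 1) / 2            -- 1, 1, 2, 2, 3, 3
FROM (VALUES (1), (2), (3), (4), (5), (6)) AS v(rn);
```

Both forms rely on T-SQL integer division truncating the result, so they only work when rn is an integer type.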

MS SQL FIFO Partial transfers

I have a number of transactions that transfer inventory from one account to another account. I can transfer all inventory and I can transfer partial inventory.
I need to pay commission to the owner of the account where inventory resides at my commission date.
My report needs to show the original origin of the inventory items if they have transferred and provide a unit_balance that I can calculate commission from.
Example Transactions:
Account 100
Account, trxid, transacted_units, transactiontype, transferfrom, transferto, date
100, 1, 100, buy, NULL, NULL, 1/1/2020
100, 2, 50, transfer in, 200, NULL, 1/2/2020
Account 200
Account, trxid, transacted_units, transactiontype, transferfrom, transferto, date
200, 3, 40, buy, NULL, NULL, 12/1/2019
200, 4, 30, buy, NULL, NULL, 12/2/2019
200, 5, 7, sell, NULL, NULL, 12/3/2019
200, 6, 50, transfer out, NULL, 100, 1/2/2020
My report output needs to show the full details of accounts associated with the inventory that relates to the unit_balance
Report Output:
[level], Account, trxid, parenttrxid, transacted_units, transactiontype, transferfrom, transferto, date, units_balance
0, 100, 1, NULL, 100, buy, NULL, NULL, 1/1/2020, 100
0, 100, 2, NULL, 50, transfer in, 200, NULL, 1/2/2020, NULL
1, 200, 3, 2, 40, buy, NULL, NULL, 12/1/2019, 33
1, 200, 4, 2, 30, buy, NULL, NULL, 12/2/2019, 17
1, 200, 5, 2, 7, sell, NULL, NULL, 12/3/2019, 0
1, 200, 6, 2, 50, transfer out, NULL, 100, 1/2/2020, 0
*The FIFO logic applies the 7 units sold to the first buy for account 200. The transfer out should then calculate the units_balance on the remaining eligible transactions.
The SQL code I have today only works when I transfer out the full inventory amount, not partial transfers:
select
[level],
parentid,
trxid,
account,
transactiontype,
date,
rnk,
transacted_units,
cumulative,
CASE
WHEN cumulative>0 and transacted_units>=cumulative THEN cumulative
WHEN cumulative>0 and transacted_units<cumulative THEN transacted_units
ELSE 0
END units_bal
from (
select
*,
sum(transacted_units*Positive_Negative_Indicator) over (partition by parenttrxid, account order by rnk, date, trxid RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) cumulative
from (
select *,
CASE
WHEN transacted_units*Positive_Negative_Indicator < 0 THEN 0
ELSE ROW_NUMBER() OVER (PARTITION BY parenttrxid, account ORDER BY Positive_Negative_Indicator ASC, date ASC, trxid ASC)
END rnk
from Transactions
) a
) a
The positive_negative_indicator field represents the direction of a transaction. A sell or transfer out is negative whereas the others are positive.
For each "in" transaction, calculate the running total of units for the previous "in" transactions. Then assign to it the "out" units that have not already been consumed by those previous "in" transactions, up to what the current "in" transaction can absorb (the "out" units are measured as a running total of "out" units).
declare @t table
(
Account int,
trxid int,
trunits int,
trtype varchar(20),
transfrom int,
transto int,
thedate date
);
insert into @t(Account, trxid, trunits, trtype, transfrom, transto, thedate)
values
(100, 1, 100, 'buy', NULL, NULL, '20200101'),
(100, 2, 50, 'transfer in', 200, NULL, '20200201'),
(200, 3, 40, 'buy', NULL, NULL, '20190112'),
(200, 4, 30, 'buy', NULL, NULL, '20190213'),
(200, 5, 10, 'buy', NULL, NULL, '20190214'),
(200, 6, 7, 'sell', NULL, NULL, '20190315'),
(200, 7, 9, 'sell', NULL, NULL, '20190316'),
(200, 8, 25, 'buy', NULL, NULL, '20190317'),
(200, 9, 39, 'sell', NULL, NULL, '20190318'),
(200, 10, 18, 'sell', NULL, NULL, '20190319'),
(200, 11, 14, 'sell', NULL, NULL, '20190320'),
(200, 11, 50, 'transfer out', NULL, 100, '20200201');
select *, case when t.trtype not in ('sell', 'transfer out') then t.trunits -isnull(otu.out_units, 0) else null end as leftover_units
from
(
select *, sum(case when trtype not in ('sell', 'transfer out') then trunits else 0 end) over (partition by Account order by thedate rows between unbounded preceding and 1 preceding) as previous_in_running_units
from @t
) as t
outer apply
(
select top (1) ort.out_units_running_total - isnull(t.previous_in_running_units, 0) as out_units
from
(
select sum(o.trunits) over(order by o.thedate) as out_units_running_total
from @t as o
where o.trtype in ('sell', 'transfer out')
and o.Account = t.Account
and t.trtype not in ('sell', 'transfer out') --no calculations needed when cross applying for "out" transactions
) as ort --out running totals
where ort.out_units_running_total-isnull(t.previous_in_running_units, 0) <= t.trunits --<-- ("in") use as many out units as can be consumed by current t.transaction/date after deducting what has been consumed by the previous t.transaction/date
and ort.out_units_running_total-isnull(t.previous_in_running_units, 0) > 0 --not needed(?) if balance is guaranteed.. total_out = total_in
order by ort.out_units_running_total desc
) as otu; --"out units"
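The previous_in_running_units column depends on the window frame ending one row before the current row. A minimal sketch of just that frame, using the first three buys of account 200 from the sample data (ordered by trxid here for brevity, which is my simplification):

```sql
-- Running total of the rows strictly before the current one.
SELECT
    v.trxid,
    v.trunits,
    -- NULL for trxid 3 (empty frame), 40 for trxid 4, 70 for trxid 5
    previous_units = SUM(v.trunits) OVER (ORDER BY v.trxid
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM (VALUES (3, 40), (4, 30), (5, 10)) AS v(trxid, trunits);
```

The first row has an empty frame and therefore gets NULL, which is why the query above wraps the column in isnull(..., 0) before subtracting.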

SQL query not returning correct date range

I have a simple view created in VS 2017. Here it is:
CREATE VIEW [dbo].[ApplicantStat]
AS SELECT ISNULL(CONVERT(VARCHAR(50), NEWID()), '') AS ID,
ISNULL(AVG(ApplicationTime), 0) AS 'AvgApplicationTime',
ISNULL(AVG(ResponseTime), 0) AS 'AvgResponseTime',
ISNULL(CAST(COUNT(CASE WHEN [IsAccepted] = 1 THEN 1 END) / COUNT(CASE WHEN [IsValid] = 1 THEN 1 END) AS float), 0) AS 'PctAccepted'
FROM [Application]
WHERE CreatedOn BETWEEN CAST(GETDATE()-30 AS date) AND CAST(GETDATE()-1 AS date)
As you can see, it gets data between 2 dates and does some simple aggregation.
The idea of the cast is that I want to ignore the time and get everything for the date range regardless. So as of today, 15th March, I would like it to fetch everything for 14th March 00:00:00 - 23:59:59 and the 29 days before that.
This does not happen - it picks up 3 rows (13th) - it should pick up all 5 rows. And yes, my system date is currently 15/03/2018 14:44 (UK time).
Here's the table and data:
CREATE TABLE [dbo].[Application] (
[Id] INT NOT NULL,
[ApplicantId] INT NOT NULL,
[LoanAmount] INT NOT NULL,
[LoanTerm] SMALLINT NOT NULL,
[EmailAddress] VARCHAR (254) NOT NULL,
[MobilePhone] VARCHAR (11) NOT NULL,
[House] VARCHAR (25) NOT NULL,
[Street] VARCHAR (50) NOT NULL,
[TownCity] VARCHAR (50) NOT NULL,
[Postcode] VARCHAR (7) NOT NULL,
[IpAddress] VARCHAR (39) NOT NULL,
[IsValid] BIT NOT NULL,
[IsAccepted] BIT NOT NULL,
[Commission] DECIMAL (9, 2) NOT NULL,
[Processors] VARCHAR (500) NOT NULL,
[ResponseTime] SMALLINT NOT NULL,
[ApplicationTime] SMALLINT NOT NULL,
[CreatedOn] DATETIME NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
INSERT INTO [dbo].[Application] ([Id], [ApplicantId], [LoanAmount], [LoanTerm], [EmailAddress], [MobilePhone], [House], [Street], [TownCity], [Postcode], [IpAddress], [IsValid], [IsAccepted], [Commission], [Processors], [ResponseTime], [ApplicationTime], [CreatedOn]) VALUES (1, 1, 300, 3, N'john.doe@tmail.com', N'07957000000', N'1', N'Acacia Avenue', N'Suburbia', N'SB1 2RB', N'100.100.100.100', 1, 1, CAST(3.20 AS Decimal(9, 2)), N'1,2,3,4,5', 90, 600, N'2018-03-13 08:00:00')
INSERT INTO [dbo].[Application] ([Id], [ApplicantId], [LoanAmount], [LoanTerm], [EmailAddress], [MobilePhone], [House], [Street], [TownCity], [Postcode], [IpAddress], [IsValid], [IsAccepted], [Commission], [Processors], [ResponseTime], [ApplicationTime], [CreatedOn]) VALUES (2, 2, 500, 12, N'a@b.com', N'0', N'1', N'a', N's', N's', N'1', 0, 1, CAST(5.00 AS Decimal(9, 2)), N'1', 60, 300, N'2018-03-14 16:00:00')
INSERT INTO [dbo].[Application] ([Id], [ApplicantId], [LoanAmount], [LoanTerm], [EmailAddress], [MobilePhone], [House], [Street], [TownCity], [Postcode], [IpAddress], [IsValid], [IsAccepted], [Commission], [Processors], [ResponseTime], [ApplicationTime], [CreatedOn]) VALUES (3, 3, 1000, 6, N'a@b.com', N'0', N'1', N'a', N's', N's', N'1', 1, 1, CAST(7.00 AS Decimal(9, 2)), N'1', 75, 360, N'2018-03-13 10:00:00')
INSERT INTO [dbo].[Application] ([Id], [ApplicantId], [LoanAmount], [LoanTerm], [EmailAddress], [MobilePhone], [House], [Street], [TownCity], [Postcode], [IpAddress], [IsValid], [IsAccepted], [Commission], [Processors], [ResponseTime], [ApplicationTime], [CreatedOn]) VALUES (4, 4, 2000, 24, N'a@b.com', N'0', N'1', N'a', N's', N's', N'1', 1, 1, CAST(20.00 AS Decimal(9, 2)), N'1', 30, 365, N'2018-03-14 11:00:00')
INSERT INTO [dbo].[Application] ([Id], [ApplicantId], [LoanAmount], [LoanTerm], [EmailAddress], [MobilePhone], [House], [Street], [TownCity], [Postcode], [IpAddress], [IsValid], [IsAccepted], [Commission], [Processors], [ResponseTime], [ApplicationTime], [CreatedOn]) VALUES (5, 5, 3000, 18, N'a@b.com', N'0', N'1', N'a', N's', N's', N'1', 1, 1, CAST(40.00 AS Decimal(9, 2)), N'1', 45, 330, N'2018-03-13 12:00:00')
Try this out:
WHERE
CreatedOn >= CAST(GETDATE()-30 AS date) AND
CreatedOn < CAST(GETDATE() AS date)
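The problem is that you're casting yesterday's date to a date, which makes the upper bound midnight at the start of yesterday, so rows created later that day are excluded.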
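To see the effect on the question's five CreatedOn values, here is a small sketch that evaluates both predicates side by side. @now is a fixed stand-in for GETDATE() at the time the question was asked; the column aliases are mine.

```sql
-- @now stands in for GETDATE() at the time of the question.
DECLARE @now datetime = '2018-03-15T14:44:00';

SELECT
    v.CreatedOn,
    -- Upper bound is 2018-03-14 00:00:00, so later times on the 14th fail.
    InBetweenRange = CASE WHEN v.CreatedOn BETWEEN CAST(@now - 30 AS date)
                                               AND CAST(@now - 1 AS date)
                          THEN 1 ELSE 0 END,
    -- Half-open range: everything before 2018-03-15 00:00:00 passes.
    InHalfOpenRange = CASE WHEN v.CreatedOn >= CAST(@now - 30 AS date)
                            AND v.CreatedOn <  CAST(@now AS date)
                           THEN 1 ELSE 0 END
FROM (VALUES
    (CAST('2018-03-13T08:00:00' AS datetime)),
    (CAST('2018-03-14T16:00:00' AS datetime)),
    (CAST('2018-03-13T10:00:00' AS datetime)),
    (CAST('2018-03-14T11:00:00' AS datetime)),
    (CAST('2018-03-13T12:00:00' AS datetime))
) AS v(CreatedOn);
```

The three rows from the 13th pass both tests, while the two rows from the 14th pass only the half-open test, which matches the 3-of-5 result described in the question.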
You can CAST your CreatedOn field as DATE to remove the time portion, which is getting in your way here...
Perhaps
WHERE CAST(CreatedOn AS DATE) BETWEEN CAST(GETDATE()-30 AS date) AND CAST(GETDATE()-1 AS date)
BUT: CASTing a column in the WHERE clause may make the predicate non-SARGable. See here. So avoid this solution for large or production environments unless you know the expression will be SARGable. Use it only as a test to refine your logic and options. (Even if there is no explicit index on CreatedOn, the query may still suffer, because SQL Server automatically creates column statistics and wrapping the column can keep the optimizer from using them well.
Always worth confirming whether it is SARGable so you know for sure.)
To see what is happening - view your values in your SELECT - just to get an idea of what is working
For example:
SELECT TOP 1000
CreatedOn
,CAST(GETDATE()-30 AS date)
,CAST(GETDATE()-1 AS date)
FROM [Application]
Or see the other options for removing time values from datetime fields here,
as you may want to coerce or round the time value instead.
Instead of trying to ignore the time value, just make sure that your search terms are accurate for it. Also, don't blindly add things like ISNULL to every column. Spend a few seconds thinking if it's relevant or not. NEWID() for example, is never going to return a NULL value to you. Adding that kind of code is poor programming which will lead to less legible code.
Here's how I would write it to account for the time portions:
CREATE VIEW dbo.ApplicantStat
AS
SELECT
CONVERT(VARCHAR(50), NEWID()) AS ID,
COALESCE(AVG(ApplicationTime), 0) AS AvgApplicationTime,
COALESCE(AVG(ResponseTime), 0) AS AvgResponseTime,
COALESCE(CAST(COUNT(CASE WHEN [IsAccepted] = 1 THEN 1 END) / COUNT(CASE WHEN [IsValid] = 1 THEN 1 END) AS float), 0) AS PctAccepted
FROM
dbo.Application
WHERE
CreatedOn >= DATEADD(DAY, -30, CAST(GETDATE() AS DATE)) AND
CreatedOn < CAST(GETDATE() AS DATE)
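Separately from the date range, both versions of the view divide one COUNT by another before the CAST. COUNT returns int, so that division is integer division and the fraction is already gone by the time the result is cast to float. A small sketch, with the made-up literals 3 and 4 standing in for the two COUNT expressions:

```sql
SELECT
    CastAfterDividing  = CAST(3 / 4 AS float),             -- 0: integer division happens first
    CastBeforeDividing = CAST(3 AS float) / NULLIF(4, 0);  -- 0.75; NULLIF avoids a divide-by-zero error
```

If PctAccepted is meant to be a fraction, casting one operand before dividing is one way to keep it. With NULLIF, a zero denominator yields NULL, which the COALESCE already present in the view would turn into 0.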

Import wizard with subqueries

I want to import 100k+ rows into a SQL Server table.
I have my insert like this (note the 6th value, which is a subquery):
INSERT INTO BD_S3I.dbo.AGENDA
(COD_UNDFBR, COD_DCPLNA, COD_TECNCA, COD_ATVIDE, DAT_PROGM_AGENDA, NUM_SQNCL_AGENDA, DAT_FINAL_AGENDA, COD_OCORR, COD_ROTA, NUM_SEMAN_PRGINS, NUM_DIAIN_PRGINS, DAT_INIC_PRGINS, MRC_SITUA_AGENDA, DAT_SUSPN_AGENDA, DAT_CONCL_AGENDA, DAT_REPRG_AGENDA, DCR_SITUA_AGENDA, DCR_AGENDA, MRC_AVISO_AGENDA, MRC_NEGLG_AGENDA, NUM_PERIO_PRGINS, DAT_DIAIN_PRGINS, DAT_JUSTN_AGENDA, COD_MTVNVS, MRC_ERP_AGENDA, COD_USUS3I_JUSTN)
VALUES
(1, 290, 2, 6, '2017-09-11 00:00:00.000', (SELECT CASE WHEN MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 IS NULL THEN 1 ELSE MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 END FROM AGENDA WHERE AGENDA.COD_UNDFBR = 1 AND AGENDA.COD_DCPLNA = 290 AND AGENDA.COD_TECNCA = 2 AND AGENDA.COD_ATVIDE = 6 AND AGENDA.DAT_PROGM_AGENDA = '2017-09-11 00:00:00.000'), '2017-09-17 00:00:00.000', NULL, 492, NULL, NULL, '2017-07-24 08:30:00.000', 'P', NULL, NULL, NULL, NULL, NULL, 'S', 'S', 7, '2017-07-24 00:00:00.000', NULL, NULL, 'N', NULL);
I put all the 100k inserts one after another and start the import. It works fine, but it takes too much time to execute all 100k+ rows.
I was thinking of using the import wizard (would it be faster?).
The problem is that when I choose the Excel file with my data, the import wizard does not understand the subquery in the value. It treats it as longtext.
Have the subquery return at least one named value to be inserted into the 6th column.
Just like
( SELECT x = CASE .... )
Or put the name after the subquery, at the end.
Simply convert your INSERT...VALUES to INSERT...SELECT which works since all other values are scalars and can be included inline with the subquery's SELECT statement:
INSERT INTO BD_S3I.dbo.AGENDA (COD_UNDFBR, COD_DCPLNA, COD_TECNCA, COD_ATVIDE,
DAT_PROGM_AGENDA, NUM_SQNCL_AGENDA, DAT_FINAL_AGENDA,
COD_OCORR, COD_ROTA, NUM_SEMAN_PRGINS, NUM_DIAIN_PRGINS,
DAT_INIC_PRGINS, MRC_SITUA_AGENDA, DAT_SUSPN_AGENDA,
DAT_CONCL_AGENDA, DAT_REPRG_AGENDA, DCR_SITUA_AGENDA,
DCR_AGENDA, MRC_AVISO_AGENDA, MRC_NEGLG_AGENDA,
NUM_PERIO_PRGINS, DAT_DIAIN_PRGINS, DAT_JUSTN_AGENDA,
COD_MTVNVS, MRC_ERP_AGENDA, COD_USUS3I_JUSTN)
SELECT 1, 290, 2, 6, '2017-09-11 00:00:00.000',
CASE WHEN MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 IS NULL
THEN 1
ELSE MAX(AGENDA.NUM_SQNCL_AGENDA) + 1
END,
'2017-09-17 00:00:00.000', NULL, 492, NULL,
NULL, '2017-07-24 08:30:00.000', 'P',
NULL, NULL, NULL, NULL, NULL, 'S', 'S', 7,
'2017-07-24 00:00:00.000', NULL, NULL, 'N', NULL
FROM AGENDA
WHERE AGENDA.COD_UNDFBR = 1 AND AGENDA.COD_DCPLNA = 290
AND AGENDA.COD_TECNCA = 2 AND AGENDA.COD_ATVIDE = 6
AND AGENDA.DAT_PROGM_AGENDA = '2017-09-11 00:00:00.000';

Most performant way in SQL Server to condense multiple data changes into before and after values

I have a SQL Server database with some audit records showing changes to a third party database (OpenEdge). I have no control over the structure of the audit data, nor the way the third party database audits data changes. So I'm left with, for example, the following data...
If you follow the first five rows you can see they all belong to TransId 1532102 (represents a database transaction) where the TransSeq represents a database action within a single transaction.
The audit changes are visible in the columns prefixed with New. If the value is NULL then no change to that field took place.
Looking at the data you can see that where TransId = 1532102 the PrimaryIdentifier is changed from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4) and finally from 4 to 5 (row 5). You might also notice that when the PrimaryIdentifier changes from 3 to 4 the SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4).
So these multiple changes are actually only occurring on a single source record. So with this in mind rows 1, 3, 4 & 5 can all be condensed into a single row (see below)
Ultimately there are only two record changes in TransId 1532102..
I need to translate these changes into a single UPDATE statement on a target database. In order to do this I need to ensure I have a single record showing the before and after values.
So given the source data presented here I need to produce the following data set..
What query structures could I use to achieve this? I was thinking recursive CTEs or perhaps using Hierarchical structures?
Ultimately I need this to perform as well as possible so I wanted to pose the question here in case I hadn't considered all possible approaches.
Thoughts welcome and here's a script for the sample data
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', NULL, 'ghfi', NULL, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
SELECT *
FROM @TestTable
EDIT:
I've actually been unable to write any queries that successfully track the identifier changes. Can anyone help - I need a query that tracks the changes in PrimaryIdentifier values and ultimately provides a single record for each tracking with start values and end values.
EDIT 2:
There's been a deleted answer that suggests the update to the key identifiers is not possible when condensed and that I should step through the changes instead. I thought it would be valuable to add my comments for further info to the question..
I need to condense the dataset because of the volume of audit records being generated, most of which are unnecessary because of the way the source DBMS makes its changes. I need to reduce the dataset and I need to track key identifier changes. The update should be possible without clashing on id change during the update statement - see this example.
I assume that
1) (PrimaryIdentifier, SecondaryIdentifier) is a PK of the target table,
2) Every transaction in the audit table leaves the target table in a consistent state.
So updating the PK in a single statement per transaction, using CASE, will run OK:
declare @t table (id int primary key, old int);
insert @t(id, old) values (4,4),(5,5);
update @t set id = case id
when 4 then 5
when 5 then 4 end;
select * from @t;
The plan is
1. Condense transactions
2. Generate update sql into temp table. Then you can run all or selected items from the temp table. Every item is of the form
UPDATE myTable SET
PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,
SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END ,
Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Level END ,
Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data'
WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN Value END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd')
OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')
The query
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
t.NewPrimaryIdentifier,
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(root.NewLevel, t.NewLevel), coalesce(root.NewValue, t.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END, SecondaryIdentifier = CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN ''' + NewSecondaryIdentifier + ''''
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Level= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel as varchar(20)) END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END , Value= CASE'
+ (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier
+''' THEN '
+ CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
+ ' END'
+ ' WHERE 1=2'
+ (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20))
+' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
FROM condensed c2
WHERE c1.SyncId = c2.SyncId AND c1.TransId= c2.TransId
FOR XML PATH('') )
INTO #UpdSql
FROM condensed c1
GROUP BY SyncId, TransId
SELECT *
FROM #UpdSql
ORDER BY SyncId, TransId
EDIT
This version takes into account that NewPrimaryIdentifier can be NULL too. See the added row in @TestTable. SQL generation is skipped here.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532102, 5, 5, 'efgh', null, 'ghfi', null, NULL -- added
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;
WITH root AS (
-- Top parent updates within transactions
SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier,
coalesce(NewPrimaryIdentifier, PrimaryIdentifier) AS NewPrimaryIdentifier,
coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
newLevel, NewValue
FROM @TestTable t
WHERE NOT EXISTS (SELECT 1
FROM @TestTable t2
WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
AND t2.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = coalesce(t2.NewPrimaryIdentifier, t2.PrimaryIdentifier)
AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier)
)
-- recursion to track the chain of updates
UNION ALL
SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
coalesce(t.NewPrimaryIdentifier, root.NewPrimaryIdentifier),
coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue)
FROM root
JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
AND root.TransSeq < t.TransSeq
AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
AND t.SecondaryIdentifier = root.NewSecondaryIdentifier
)
,condensed as (
-- last update in the chain
SELECT TOP(1) WITH TIES *
FROM root
ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
order by TransSeq desc)
)
SELECT *
FROM condensed
ORDER BY SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
Here is a second stab at producing the originally requested output, this time using a bunch of CTEs.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
;with baseCTE as (
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @TestTable
),
syncTransEntryPointsCte as (
select *
from baseCTE b
where not exists(
select *
from baseCTE subb
where b.SyncId = subb.SyncId
and b.TransId = subb.TransId
and b.PrimaryIdentifier = subb.NewPrimaryIdentifier
and b.SecondaryIdentifier = subb.NewSecondaryIdentifier
and b.TransSeq > subb.TransSeq
)
)
, recursiveBaseCte as (
select *, 0 as lev, TransSeq as OrigTransSec from syncTransEntryPointsCte
union all
select
c.SyncId, c.TransId, c.TransSeq, p.PrimaryIdentifier, p.SecondaryIdentifier, c.NewPrimaryIdentifier, c.NewSecondaryIdentifier, isnull(c.NewLevel, p.NewLevel), isnull(c.NewValue, p.NewValue),
p.lev + 1,
p.OrigTransSec
from baseCTE c
join recursiveBaseCte as p on (
c.SyncId = p.SyncId and c.TransId = p.TransId and c.PrimaryIdentifier = p.NewPrimaryIdentifier and c.SecondaryIdentifier = p.NewSecondaryIdentifier and c.TransSeq > p.TransSeq
)
)
select r.SyncId, r.TransId, r.OrigTransSec as TransSec,
r.PrimaryIdentifier, r.SecondaryIdentifier,
nullif(r.NewPrimaryIdentifier, r.PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(r.NewSecondaryIdentifier, r.SecondaryIdentifier) as NewSecondaryIdentifier,
r.NewLevel, r.NewValue
from recursiveBaseCte r
join (
select SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, max(lev) as mlev
from recursiveBaseCte
group by SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier
) as selectForOutput on
r.SyncId = selectForOutput.SyncId
and r.TransId = selectForOutput.TransId
and r.PrimaryIdentifier = selectForOutput.PrimaryIdentifier
and r.SecondaryIdentifier = selectForOutput.SecondaryIdentifier
and r.lev = selectForOutput.mlev
order by 1,2,3
Whether or not the CTE approach is any faster than the cursor-based one is difficult to guess. I suggest you test-run it at a suitable time, when the server in question is not under heavy load.
Update
The script first declares the baseCTE which is used just to make sure that we have values in NewPrimaryIdentifier and NewSecondaryIdentifier for each row, even if one or both of them were not changed in the update. This makes everything after that easier since we can then join to the next row for the same combination within a specific transaction.
The syncTransEntryPointsCte in turn uses baseCTE to find all rows within one transaction that were not preceded by another row for the same identifier combination within the same transaction.
recursiveBaseCte then uses both of the previous CTEs to recursively find rows and aggregate changes. The final query then uses it to produce the final output.
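The three steps can be tried in isolation with the sketch below. It is an illustration under assumptions, not the answer's T-SQL: it uses Python's built-in sqlite3 module (SQLite's `WITH RECURSIVE` instead of SQL Server's recursive CTE syntax), only the five rows of transaction 1532102 (so the SyncId/TransId join keys are left out), and the shortened names `audit`, `Seq`, `P`, `S`, `entry`, `chain`, `cur`, `prev` and `condensed` are made up for the example.

```python
import sqlite3

# Audit rows of transaction 1532102 from the test data:
# (TransSeq, PrimaryIdentifier, SecondaryIdentifier,
#  NewPrimaryIdentifier, NewSecondaryIdentifier, NewLevel, NewValue)
rows = [
    (0, 2, "abcd", -2, None, None, "test data"),
    (1, 3, "abcd", 2, None, None, None),
    (2, -2, "abcd", 3, None, None, None),
    (3, 3, "abcd", 4, "efgh", None, None),
    (4, 4, "efgh", 5, None, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (Seq INT, P INT, S TEXT, NewP INT, NewS TEXT, "
    "NewLevel INT, NewValue TEXT)"
)
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
WITH RECURSIVE
base AS (   -- step 1: every row gets a "new" key, even if the key was unchanged
    SELECT Seq, P, S,
           COALESCE(NewP, P) AS NewP, COALESCE(NewS, S) AS NewS,
           NewLevel, NewValue
    FROM audit
),
entry AS (  -- step 2: rows whose key was not produced by an earlier row
    SELECT * FROM base b
    WHERE NOT EXISTS (SELECT 1 FROM base e
                      WHERE e.NewP = b.P AND e.NewS = b.S AND e.Seq < b.Seq)
),
chain AS (  -- step 3: follow each key forward, carrying earlier changes along
    SELECT Seq, P, S, NewP, NewS, NewLevel, NewValue, 0 AS lev FROM entry
    UNION ALL
    SELECT cur.Seq, prev.P, prev.S, cur.NewP, cur.NewS,
           COALESCE(cur.NewLevel, prev.NewLevel),
           COALESCE(cur.NewValue, prev.NewValue),
           prev.lev + 1
    FROM base cur
    JOIN chain prev
      ON cur.P = prev.NewP AND cur.S = prev.NewS AND cur.Seq > prev.Seq
)
-- keep only the deepest step of each chain, i.e. the condensed row
SELECT c.P, c.S, c.NewP, c.NewS, c.NewLevel, c.NewValue
FROM chain c
WHERE c.lev = (SELECT MAX(m.lev) FROM chain m WHERE m.P = c.P AND m.S = c.S)
ORDER BY c.P
"""
condensed = conn.execute(query).fetchall()
for row in condensed:
    print(row)
```

The row that starts as (2, 'abcd') is followed through -2, 3 and 4 until it ends as (5, 'efgh') with the level and value collected on the way, while the row that starts as (3, 'abcd') has no later step and stays a single change to 2.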
The output should be usable for updating a stale copy of the source table if you can manage to do the updates for one condensed transaction in one update statement. If, as I originally assumed, you try to build one update statement for each row in the condensed audit output, it will not work.
Finally, obligatory disclaimer: This seems to work with the test data you gave in the question. I can give no guarantees that it works for the real thing, so use with caution.
Here is a first stab at getting the desired output. It's using a CURSOR, so don't expect great performance.
set nocount on
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
DECLARE @OutputTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532110, 3, 5, 'abcd', 6, NULL, NULL, NULL
UNION SELECT 128, 1532110, 4, 6, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'
--SELECT * FROM @TestTable
declare @cSyncId int, @cTransId int, @cTransSeq int, @cPrimaryId int, @cSecondaryId nchar(4), @cNewPrimaryId int, @cNewSecondary nchar(4), @cNewLevel int, @cNewValue nvarchar(20)
declare @newTransSeq int, @prevSyncId int, @prevTransId int
set @newTransSeq = 0
set @prevSyncId = 0
set @prevTransId = 0
declare auditCursor CURSOR for
select SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
isnull(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
isnull(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @TestTable
order by SyncId, TransId, TransSeq
open auditCursor
fetch next from auditCursor into @cSyncId, @cTransId, @cTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue
while @@FETCH_STATUS = 0
begin
-- a new transaction starts: restart the output sequence numbering
if @prevSyncId != @cSyncId or @prevTransId != @cTransId
begin
set @newTransSeq = 0
set @prevSyncId = @cSyncId
set @prevTransId = @cTransId
end
-- no condensed row ends in this identifier yet: start a new condensed row
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId))
begin
insert into @OutputTable values(@cSyncId, @cTransId, @newTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue)
set @newTransSeq = @newTransSeq + 1
end
else
-- otherwise fold this change into the condensed row that ends in this identifier
begin
update @OutputTable
set NewPrimaryIdentifier = isnull(@cNewPrimaryId, NewPrimaryIdentifier),
NewSecondaryIdentifier = isnull(@cNewSecondary, NewSecondaryIdentifier),
NewLevel = isnull(@cNewLevel, NewLevel),
NewValue = isnull(@cNewValue, NewValue)
where SyncId = @cSyncId
and TransId = @cTransId
and NewPrimaryIdentifier = @cPrimaryId
and NewSecondaryIdentifier = @cSecondaryId
end
fetch next from auditCursor into @cSyncId, @cTransId, @cTransSeq, @cPrimaryId, @cSecondaryId, @cNewPrimaryId, @cNewSecondary, @cNewLevel, @cNewValue
end
close auditCursor
deallocate auditCursor
select
SyncId, TransId, TransSeq, PrimaryIdentifier, SecondaryIdentifier,
nullif(NewPrimaryIdentifier, PrimaryIdentifier) as NewPrimaryIdentifier,
nullif(NewSecondaryIdentifier, SecondaryIdentifier) as NewSecondaryIdentifier,
NewLevel, NewValue
from @OutputTable order by 1,2,3
As far as I can tell, this gives the output you asked for. Whether it is the output you actually need depends on what you want to do with it next.
If, for example, you are going to use the output to somehow generate update scripts in order to sync a copy of the database so that the copy is up to date with the source database, this will not work.
If we look at the output for transaction 1532106, the condensed audit has primary id 3 change to 4, then primary id 4 change to 3. That will of course not work.
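The collision is easy to reproduce with a small sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the table `item`, its `Payload` column and the helpers `fresh_table` and `snapshot` are hypothetical. The table deliberately has no key constraint so that the silent collision is visible; with a primary key, the first statement would fail with a duplicate-key error instead.

```python
import sqlite3


def fresh_table():
    """Create a two-row table: row A has id 3, row B has id 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (PrimaryIdentifier INT, Payload TEXT)")
    conn.executemany("INSERT INTO item VALUES (?, ?)", [(3, "A"), (4, "B")])
    return conn


def snapshot(conn):
    """Return (id, payload) pairs in a stable order."""
    return conn.execute(
        "SELECT PrimaryIdentifier, Payload FROM item ORDER BY Payload"
    ).fetchall()


# One UPDATE per condensed audit row (3 -> 4, then 4 -> 3):
# the second statement also matches the row the first one just renumbered.
conn = fresh_table()
conn.execute("UPDATE item SET PrimaryIdentifier = 4 WHERE PrimaryIdentifier = 3")
conn.execute("UPDATE item SET PrimaryIdentifier = 3 WHERE PrimaryIdentifier = 4")
row_by_row = snapshot(conn)

# One UPDATE for the whole condensed transaction: both rows are
# renumbered from their original values in a single statement.
conn = fresh_table()
conn.execute(
    """
    UPDATE item
    SET PrimaryIdentifier = CASE PrimaryIdentifier WHEN 3 THEN 4 WHEN 4 THEN 3 END
    WHERE PrimaryIdentifier IN (3, 4)
    """
)
single_statement = snapshot(conn)

print("row by row:      ", row_by_row)
print("single statement:", single_statement)
```

Applied row by row, both rows end up with id 3, so the swap is lost. Applied as one statement, A becomes 4 and B becomes 3, which is what the audit trail describes.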
Based on how the audit trail looks, it seems that the program manipulating the tables switches a primary id to a negative value when it needs to free up the id on the row. If we change one line in my sample:
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId))
to
if(not exists(select * from @OutputTable where SyncId = @cSyncId and TransId = @cTransId and NewPrimaryIdentifier = @cPrimaryId and NewSecondaryIdentifier = @cSecondaryId) or @cPrimaryId < 0)
(added or @cPrimaryId < 0) then we get a different, less condensed output that, as far as I can tell, should be workable for the case mentioned.
Here is a way to get "the latest condensed record" only using SQL. Since I don't have a full data set I cannot tell you how well it will perform though.
DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data';
WITH data AS (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN SecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_SecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewPrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewPrimaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewSecondaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewSecondaryIdentifier
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewLevel IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewLevel
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN NewValue IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_NewValue
FROM @TestTable
)
, transIds
AS (
SELECT DISTINCT SyncId, TransId
FROM @TestTable)
SELECT t.SyncId
, t.TransId
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
, (SELECT d.SecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_SecondaryIdentifier = 1) AS SecondaryIdentifier
, (SELECT d.NewPrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewPrimaryIdentifier = 1) AS NewPrimaryIdentifier
, (SELECT d.NewSecondaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_NewSecondaryIdentifier = 1) AS NewSecondaryIdentifier
, (SELECT d.NewLevel FROM data d WHERE d.TransId = t.TransId AND d.rn_NewLevel = 1) AS NewLevel
, (SELECT d.NewValue FROM data d WHERE d.TransId = t.TransId AND d.rn_NewValue = 1) AS NewValue
FROM transIds t;
I'm using two CTEs. "data" contains all of the data along with a priority order for which row to use for each column of interest. "transIds" is just the distinct list of TransIds, so the final result will have one row per transaction id in the original data set.
Note the use of the window function in the data CTE:
, ROW_NUMBER() OVER(PARTITION BY TRANSID ORDER BY CASE WHEN PrimaryIdentifier IS NULL THEN 1 ELSE 0 END, TRANSSeq desc) AS rn_PrimaryIdentifier
The logic behind the window function is to give the latest row with a non-null value in the respective column the row number 1. Breaking it down:
ROW_NUMBER(): generates a sequence of numbers
PARTITION BY TRANSID: restarts the sequence for each different TransId
ORDER BY CASE WHEN column IS NULL THEN 1 ELSE 0 END: sorts all nulls to the end before the sequence is applied.
(ORDER BY) TRANSSeq desc: sorts so the latest TransSeq comes first.
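To see the ordering rule in isolation, here is a minimal sketch that runs the same ROW_NUMBER() expression against the NewLevel and NewValue columns of transaction 1532114. It is an illustration, not part of the T-SQL above: it assumes Python's built-in sqlite3 module with an SQLite build of 3.25 or later (needed for window functions), and the table name `audit` and the variables `ranked` and `latest_level` are made up for the example.

```python
import sqlite3

# The three audit rows of transaction 1532114, reduced to the
# columns needed here: (TransId, TransSeq, NewLevel, NewValue).
rows = [
    (1532114, 0, None, None),
    (1532114, 1, 1, None),
    (1532114, 2, None, "some more test data"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (TransId INT, TransSeq INT, NewLevel INT, NewValue TEXT)"
)
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?)", rows)

query = """
SELECT TransSeq, NewLevel,
       ROW_NUMBER() OVER (
           PARTITION BY TransId
           -- non-null values first, then the latest TransSeq first
           ORDER BY CASE WHEN NewLevel IS NULL THEN 1 ELSE 0 END, TransSeq DESC
       ) AS rn_NewLevel
FROM audit
ORDER BY rn_NewLevel
"""
ranked = conn.execute(query).fetchall()
latest_level = ranked[0][1]  # the row numbered 1 holds the latest non-null NewLevel

for row in ranked:
    print(row)
```

The row with TransSeq 1 is numbered 1 even though TransSeq 2 is later, because it is the only row with a non-null NewLevel; the two null rows follow in descending TransSeq order.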
In the final select, for each TransId I query the data CTE to get the latest non-null value for each column, based on the window function above:
, (SELECT d.PrimaryIdentifier FROM data d WHERE d.TransId = t.TransId AND d.rn_PrimaryIdentifier = 1) AS PrimaryIdentifier
In your original question you asked to get both the original values and the latest values. I'm not sure that this makes sense. If you want an audit log of your own each time this changes, then you should just save the "current" row to an audit log table in your database prior to the update. If you really want the first row from the original data set, then I would suggest a UNION ALL combined with my query above. Just append this code to the above query (in place of its trailing semicolon):
UNION ALL
SELECT SyncId, TransId, PrimaryIdentifier, SecondaryIdentifier, NewPrimaryIdentifier, NewSecondaryIdentifier, NewLevel, NewValue
FROM @TestTable
WHERE TransSeq = 0
ORDER BY TransId;