Combine multiple updates with conditions, better merge? - sql

A follow-up question to "SQL Server Merge: update only changed data, tracking changes?"
We have been struggling to get an effective MERGE statement working and are now thinking about using only UPDATE statements. The problem is very simple: update Target from Source where values differ, and record the changes. Both tables have the same layout.
We have two questions. First: is it possible to combine these very simple updates into a single statement?
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
UPDATE tbladsgroups
SET tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.DisplayName <> t.DisplayName
...and so on for each column.
Second question: can we record in a separate table or variable which records have been updated?
MERGE would be perfect; however, we cannot see which records were actually changed, because the target is always updated and OUTPUT therefore returns every row.
Edit: the complete MERGE:
MERGE tblADSGroups AS TARGET
USING tblADSGroups_STAGING AS SOURCE
ON (TARGET.[SID] = SOURCE.[SID])
WHEN MATCHED
THEN UPDATE SET
TARGET.[Description]=CASE
WHEN source.[Description] != target.[Description] THEN source.[Description]
ELSE target.[Description] END,
TARGET.[displayname] = CASE
WHEN source.[displayname] != target.[displayname] THEN source.[displayname]
ELSE target.[displayname] END
...other columns cut for brevity
WHEN NOT MATCHED BY TARGET
THEN
INSERT (
[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
VALUES (
source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
WHEN NOT MATCHED BY SOURCE
THEN
UPDATE SET ACTION='Deleted'

You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.
For example
UPDATE t
SET Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
AND NOT EXISTS (
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
);
You can do something similar with MERGE if you also want to INSERT:
MERGE tbladsgroups t
USING tbladsgroups_staging s
ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS ( -- do NOT place this condition in the ON
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
)
THEN UPDATE SET
Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
WHEN NOT MATCHED
THEN INSERT (ID, Description, DisplayName)
VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
;
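If the goal is also to see what each row did, one possible extension (a sketch, not part of the answer above) is to add $action and the deleted pseudo-table to the OUTPUT clause. The temp table name #changes and its column types are assumptions made for illustration; adjust them to the real schema.

```sql
-- Hypothetical log table; names and types are assumptions.
CREATE TABLE #changes (
    MergeAction    nvarchar(10),   -- 'INSERT', 'UPDATE' or 'DELETE'
    SID            varchar(200),
    OldDescription nvarchar(max),
    NewDescription nvarchar(max),
    OldDisplayName nvarchar(max),
    NewDisplayName nvarchar(max)
);

MERGE tbladsgroups AS t
USING tbladsgroups_staging AS s
    ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS (       -- only rows where something differs
        SELECT s.Description, s.DisplayName
        INTERSECT
        SELECT t.Description, t.DisplayName
    )
    THEN UPDATE SET
        Description = s.Description,
        DisplayName = s.DisplayName,
        action      = 'Updated'
OUTPUT $action,
       inserted.SID,
       deleted.Description,  inserted.Description,   -- old value, new value
       deleted.DisplayName,  inserted.DisplayName
INTO #changes (MergeAction, SID,
               OldDescription, NewDescription,
               OldDisplayName, NewDisplayName);

-- Inspect what was actually changed.
SELECT * FROM #changes;
```

Because unchanged rows never satisfy the WHEN MATCHED condition, they are not updated and do not appear in #changes.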

We have similar needs when dealing with values in our data warehouse dimensions. MERGE works fine but can be inefficient for large tables. Your method would work, but it also seems fairly inefficient, since it needs an individual UPDATE for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration: a comparison such as s.Description <> t.Description is never true when either side is NULL, so those rows are skipped.
What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/
Using INTERSECT allows you to easily (and quickly) compare the staging table with the dimension table without having to write an explicit comparison for each individual column, and it treats two NULLs as equal.
To answer your second question, the technique above would not tell you which column changed. However, you can compare the old row with the new row: we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date.
Our code ends up looking like the following:
1. INSERT all rows from the stage table that do not have a matching key value in the dimension table (new rows)
2. Compare stage vs dimension using the INTERSECT and store all matches in a table variable
3. Using the table variable, "close" all matching rows in the dimension
4. Using the table variable, INSERT the new rows
If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
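The steps above might be sketched roughly as follows. All object names here (dbo.DimGroup, dbo.GroupStage, GroupKey, ValidFrom, ValidTo) are hypothetical, and the sketch assumes that the current version of a row is the one with ValidTo IS NULL and that the table variable holds the keys of rows whose attributes differ.

```sql
-- Keys of rows whose attributes changed (column type is an assumption).
DECLARE @changed TABLE (GroupKey varchar(200) PRIMARY KEY);
DECLARE @today date = CAST(GETDATE() AS date);

-- 1. New rows: keys present in the stage table but not in the dimension.
INSERT INTO dbo.DimGroup (GroupKey, Description, DisplayName, ValidFrom, ValidTo)
SELECT s.GroupKey, s.Description, s.DisplayName, @today, NULL
FROM dbo.GroupStage AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimGroup AS d WHERE d.GroupKey = s.GroupKey);

-- 2. NULL-safe comparison of stage vs current dimension row via INTERSECT.
INSERT INTO @changed (GroupKey)
SELECT s.GroupKey
FROM dbo.GroupStage AS s
JOIN dbo.DimGroup AS d
    ON d.GroupKey = s.GroupKey
   AND d.ValidTo IS NULL
WHERE NOT EXISTS (
    SELECT s.Description, s.DisplayName
    INTERSECT
    SELECT d.Description, d.DisplayName
);

-- 3. "Close" the current version of every changed row.
UPDATE d
SET d.ValidTo = @today
FROM dbo.DimGroup AS d
JOIN @changed AS c ON c.GroupKey = d.GroupKey
WHERE d.ValidTo IS NULL;

-- 4. Insert the new version of every changed row.
INSERT INTO dbo.DimGroup (GroupKey, Description, DisplayName, ValidFrom, ValidTo)
SELECT s.GroupKey, s.Description, s.DisplayName, @today, NULL
FROM dbo.GroupStage AS s
JOIN @changed AS c ON c.GroupKey = s.GroupKey;
```

In practice the four statements would probably be wrapped in one transaction so that a failure cannot leave rows closed without a replacement.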

I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?
|groups description|groups display|staging description|staging display|
|SomeValue|Show Me|SOME other Value|Show Me|
|Try This|Attempt|Try This|Working on it|
|Both Diff|Changes|Both Are Diff|Change Me|
So what you ultimately want is to copy both description and display name FROM the staging table back to the ads groups table.
The sample above has three rows that, once matched on ID, need to be changed. If one column already matches and the other does not, updating both columns only changes the one that differs; the matching column is overwritten with the same value and so stays as it was. If both differ, both get updated anyhow.
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
OR s.DisplayName <> t.DisplayName
That said, you are storing redundant data, and avoiding that is the whole point of a lookup table. The staging table appears to always have the correct display name and description. Your tblAdsGroups should probably drop those two columns and always get them from staging to begin with. Something like:
select
t.*,
s.Description,
s.DisplayName
from
tblAdsGroups t
JOIN tblAdsGroups_Staging s
on t.sid = s.sid
Then you always have the correct description and display name and don't have to keep syncing updates between the two tables.

Related

Request optimisation

I have two tables. One holds all the trips (courses) that the buses make:
dbo.Courses_Bus
|ID|ID_Bus|ID_Line|DateHour_Start_Course|DateHour_End_Course|
On the other all payments made in these buses
dbo.Payments
|ID|ID_Bus|DateHour_Payment|
The goal is to add the notion of a Line in the payment table to get something like this
dbo.Payments
|ID|ID_Bus|DateHour_Payment|Line|
So I tried to do this :
/** I first added a Line column to the dbo.Payments table**/
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN [dbo].[Courses_Bus] AS Table_B
ON Table_A.ID_Bus = Table_B.ID_Bus
AND Table_A.DateHour_Payment BETWEEN Table_B.DateHour_Start_Course AND Table_B.DateHour_End_Course
And this
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN (
SELECT
P.*,
CP.ID_Line AS ID_Line
FROM
[dbo].[Payments] AS P
INNER JOIN [dbo].[Courses_Bus] CP ON CP.ID_Bus = P.ID_Bus
AND CP.DateHour_Start_Course <= P.Date
AND CP.DateHour_End_Course >= P.Date
) AS Table_B ON Table_A.ID_Bus = Table_B.ID_Bus
The main problem, apart from the fact that these queries do not seem to work properly, is that each table has several million rows, with more added every day, and because of the date/time filter (mandatory, since a single bus can be on several lines in one day) SQL Server must compare each row of one table to all rows of the other.
So it takes a very long time, and that time will grow every day.
How can I make it work and optimise it?
Assuming that this is the logic you want:
UPDATE p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course;
To optimize this query, you want an index on Courses_Bus(ID_Bus, DateHour_Start_Course, DateHour_End_Course).
There might be slightly more efficient ways to optimize the query, but your question doesn't have enough information -- is there always exactly one match, for instance?
Another big issue is that updating all the rows is quite expensive. You might find that it is better to do this in loops, one chunk at a time:
UPDATE TOP (10000) p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
WHERE p.Line IS NULL;
Once again, though, this structure depends on all the initial values being NULL and an exact match for all rows.
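The statement above updates a single chunk. A minimal sketch of the surrounding loop, assuming (as noted) that Line starts out NULL, might look like this. The extra cb.ID_Line IS NOT NULL filter is an addition of this sketch: without it, a matching course with a NULL ID_Line would be "updated" to NULL again on every pass and the loop would never end.

```sql
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Update at most 10000 still-unassigned payments per pass.
    UPDATE TOP (10000) p
    SET p.Line = cb.ID_Line
    FROM [dbo].[Payments] p JOIN
         [dbo].[Courses_Bus] cb
         ON p.ID_Bus = cb.ID_Bus AND
            p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
    WHERE p.Line IS NULL
      AND cb.ID_Line IS NOT NULL;

    -- Stop once a pass finds nothing left to update.
    SET @rows = @@ROWCOUNT;
END;
```

Payments that match no course keep Line = NULL and are simply skipped by the join, so they do not prevent the loop from terminating.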
Thank you Gordon for your answer.
I have investigated and came with this query :
MERGE [dbo].[Payments] AS p
USING [dbo].[Courses_Bus] AS cb
ON p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment>= cb.DateHour_Start_Course AND
p.DateHour_Payment<= cb.DateHour_End_Course
WHEN MATCHED THEN
UPDATE SET p.Line = cb.ID_Line;
As it seems to be the most suitable in an MS-SQL environment.
It also came with the error :
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I understood this to mean that it finds several rows with identical
[p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment >= cb.DateHour_Start_Course AND
p.DateHour_Payment <= cb.DateHour_End_Course]
Yes, this can happen; however, the payment ID is different each time.
For example, two blue cards may be beeped at the same moment, or after a loss of network the equipment may be updated, which gives the beeps the same timestamp. These are different payment rows that must be treated separately, so you can get, for example:
|ID|ID_Bus|DateHour_Payments|Line|
----------------------------------
|56|204|2021-01-01 10:00:00|15|
----------------------------------
|82|204|2021-01-01 10:00:00|15|
How can I improve this query so that it takes into account different payment IDs?
I can't figure out how to do this with the help I find online. Maybe this method is not the right one in this context.
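One reading of the error, offered as a sketch rather than a definitive diagnosis: MERGE complains when a single payment row (the target) matches more than one Courses_Bus row (the source), for instance when two courses of the same bus overlap or a payment falls exactly on a boundary shared by two courses. Two payments with the same timestamp but different IDs are separate target rows and should not trigger it by themselves. The first query below looks for such payments; the second picks one course per payment with CROSS APPLY. The tie-break (latest DateHour_Start_Course wins) is an assumption.

```sql
-- Diagnostic: payments that fall inside more than one course of their bus.
SELECT p.ID, COUNT(*) AS matching_courses
FROM [dbo].[Payments] AS p
JOIN [dbo].[Courses_Bus] AS cb
    ON p.ID_Bus = cb.ID_Bus
   AND p.DateHour_Payment >= cb.DateHour_Start_Course
   AND p.DateHour_Payment <= cb.DateHour_End_Course
GROUP BY p.ID
HAVING COUNT(*) > 1;

-- Update that keeps exactly one course per payment.
UPDATE p
SET p.Line = x.ID_Line
FROM [dbo].[Payments] AS p
CROSS APPLY (
    SELECT TOP (1) cb.ID_Line
    FROM [dbo].[Courses_Bus] AS cb
    WHERE cb.ID_Bus = p.ID_Bus
      AND p.DateHour_Payment >= cb.DateHour_Start_Course
      AND p.DateHour_Payment <= cb.DateHour_End_Course
    ORDER BY cb.DateHour_Start_Course DESC   -- assumed tie-break
) AS x;
```

Every payment row is handled on its own here, so payments 56 and 82 in the example above would each receive line 15.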

SQL: IS NULL fails when trying to update values

I am currently trying to update some table values and I am stuck with a particular instance.
The situation is that I have a main table DBO.MAIN_INTERACTIONS that I join with an external table. Based on the below WHERE values I would then like to update the main database values. KCMACustomer.DBO.DATA_EXT_GREEN_ENR is a table that is linked to one particular product and my idea was to select and update the table rows of MAIN that do not have a connection with this GREEN_ENR table. That's why I put the EXT.NUMBER_OF_ACCOUNTS IS NULL.
So the rows I am trying to retrieve will have no match on the JOIN condition 'MAIN.GENID = EXT.GENID'. Is this what makes my update query fail? And would there be a better way to get the rows that have no connection with KCMACustomer.DBO.DATA_EXT_GREEN_ENR? (The main table values are all the same, so I can't differentiate there.)
Extra info: the second query does work, probably because there is a successful 'MAIN.GENID = EXT.GENID' join.
UPDATE
MAIN
SET
STATE_CODE='S8',
STATE_NAME='Reminder 1',
OLD_STATE='S4',
MODIFIED_DT = #NOW
OUTPUT INSERTED.ID, 0, 'AUTO-STATE','business','Doc.expected -> Reminder 1',INSERTED.CAMPAIGNID, #NOW,'S4','S8' INTO KCMACustomer.DBO.DATA_EBW_FFC_LOG_STATES
FROM KCMACUSTOMER.DBO.MAIN_INTERACTIONS AS MAIN
JOIN KCMACustomer.DBO.DATA_EXT_GREEN_ENR AS EXT
ON MAIN.GENID = EXT.GENID
WHERE
MAIN.STATE_CODE='S4'
AND
MAIN.TYPE_DEMAND='S4'
AND
EXT.NUMBER_OF_ACCOUNTS IS NULL
AND
DATEDIFF(hh, MAIN.MODIFIED_DT, #NOW)>=168
AND
MAIN.PRODUCT IN (#HELLO4YOU, #COMFORT_PACK, #PREMIUM_PACK)
In this case the 'MAIN.GENID = EXT.GENID' will have a match and it does update the records I want
UPDATE
MAIN
SET
STATE_CODE='S8',
STATE_NAME='Reminder 1 eID',
OLD_STATE='S4',
MODIFIED_DT = #NOW
OUTPUT INSERTED.ID, 0, 'AUTO-STATE','business','Doc expected -> Reminder 1 eID',INSERTED.CAMPAIGNID, #NOW,'S4','S8' INTO KCMACustomer.DBO.DATA_EBW_FFC_LOG_STATES
FROM KCMACUSTOMER.DBO.MAIN_INTERACTIONS AS MAIN
JOIN KCMACustomer.DBO.DATA_EXT_GREEN_ENR AS EXT
ON MAIN.GENID = EXT.GENID
WHERE
MAIN.STATE_CODE = 'S4'
AND
DATEDIFF(hh, MAIN.MODIFIED_DT, #NOW)>=120
AND
MAIN.PRODUCT IN (#HELLO4YOU, #COMFORT_PACK, #PREMIUM_PACK)
AND
EXT.NUMBER_OF_ACCOUNTS IS NOT NULL
AND
MAIN.DEMAND_DT > '2020-05-27 00:00:00'
A plain JOIN is an INNER JOIN, so rows of MAIN without a match in EXT are dropped before the WHERE clause is ever evaluated.
The use case you described requires a LEFT JOIN.
You will need to change your query so that it uses
LEFT JOIN KCMACustomer.DBO.DATA_EXT_GREEN_ENR AS EXT
ON MAIN.GENID = EXT.GENID
I also suggest testing the NULL condition on the same column you use in the join condition. So instead of doing
EXT.NUMBER_OF_ACCOUNTS IS NULL
check if
EXT.GENID IS NULL
You are more familiar with your data, so it might not have an impact on your query. But a record from MAIN could be linked to a record in EXT whose NUMBER_OF_ACCOUNTS happens to be NULL.
Checking the GENID of the EXT table instead ensures that you only get MAIN records for which no matching EXT record was found.
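An alternative way to express "MAIN rows with no EXT row", shown here only as a sketch, is NOT EXISTS. The query below is a plain SELECT that previews the affected rows before any UPDATE is run; it keeps just the literal filters from the question and leaves out the #NOW and product placeholders, which are filled in by the calling application.

```sql
-- Preview: S4 interactions that have no row at all in the GREEN_ENR table.
SELECT MAIN.ID, MAIN.GENID, MAIN.STATE_CODE
FROM KCMACUSTOMER.DBO.MAIN_INTERACTIONS AS MAIN
WHERE MAIN.STATE_CODE = 'S4'
  AND MAIN.TYPE_DEMAND = 'S4'
  AND NOT EXISTS (
        SELECT 1
        FROM KCMACustomer.DBO.DATA_EXT_GREEN_ENR AS EXT
        WHERE EXT.GENID = MAIN.GENID
  );
```

The same NOT EXISTS predicate can be placed in the WHERE clause of the UPDATE in place of the join to EXT.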

SQL Server - UPDATE data based on SELECT

I have written the following which returns a list of Buildings that have only one room, but the area of that room (fma0.area) is not equal to the area of the building (fmb0.nia)
select
rtrim(fma0.bldgcode) As bldgcode
from fma0
left join fmb0 on fma0.bldgcode = fmb0.bldgcode
where fma0.bldgcode in (
select fma0.bldgcode
from fma0
left join fmb0 on fma0.bldgcode = fmb0.bldgcode
where fmb0.bldgstatus = ''
group by fma0.bldgcode
having count(fma0.auto_key) = 1
)
and round(fma0.area,0) <> fmb0.nia
and fmb0.nia > 0
order by 1
I need to use this list of buildings to UPDATE several fields in the FMA0 table (FMA0.GROSS, FMA0.AREA, FMA0.RENTABLE) for each BLDGCODE with the value from FMB0.NIA for the same BLDGCODE
How do I convert this into an UPDATE statement that looks up the FMB0.NIA value for each BLDGCODE and updates the value in each field for the same BLDGCODE in the FMA0 table
Thanks
This seems like a much simpler way to get the buildings that you want:
select b.bldgcode
from fmb0 b join
(select r.bldgcode, max(r.area) as room_area
from fma0 r
group by r.bldgcode
having count(*) = 1
) r
on r.bldgcode = b.bldgcode and r.room_area <> b.nia;
The first subquery gets the area of rooms in buildings that have only one room. The join then simply combines them according to your rules.
This is readily turned into an update:
update b
set . . .
from fmb0 b join
(select r.bldgcode, max(r.area) as room_area
from fma0 r
group by r.bldgcode
having count(*) = 1
) r
on r.bldgcode = b.bldgcode and r.room_area <> b.nia;
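Since the question asks for the room table FMA0 to be updated (GROSS, AREA and RENTABLE set to FMB0.NIA), a sketch that targets fma0 and keeps the filters from the original SELECT might look like the following. It assumes bldgcode is unique in fmb0; the aliases r and b are only illustrative.

```sql
UPDATE r
SET r.gross    = b.nia,
    r.area     = b.nia,
    r.rentable = b.nia
FROM fma0 AS r
JOIN fmb0 AS b
    ON b.bldgcode = r.bldgcode
WHERE b.bldgstatus = ''
  AND b.nia > 0
  AND ROUND(r.area, 0) <> b.nia
  AND r.bldgcode IN (              -- buildings with exactly one room
        SELECT bldgcode
        FROM fma0
        GROUP BY bldgcode
        HAVING COUNT(*) = 1
  );
```

Running the original SELECT first and comparing its row count with the number of rows this UPDATE reports is a cheap sanity check.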
Under most circumstances, SQL updates are performed using direct references to a particular table (UPDATE books SET books.title = 'The Hobbit' WHERE books.id = 1). Yet, on occasion, it may prove beneficial to alter the contents of a table indirectly, by using a subset of data obtained from a secondary query.
Performing an UPDATE using a secondary SELECT statement can be accomplished in one of two ways, primarily depending upon which version of SQL Server you are using. We’ll briefly explore both options so you can find what works best for you.
Using INNER JOINS
For all SQL Server installations, the most basic method of performing this action is to use an INNER JOIN, whereby values in the columns of two different tables are compared to one another.
UPDATE
books
SET
books.primary_author = authors.name
FROM
books
INNER JOIN
authors
ON
books.author_id = authors.id
WHERE
books.title = 'The Hobbit'
In the above example, we’re UPDATING the books.primary_author field to match the authors.name for ‘The Hobbit’ by JOINING both tables in the query to their respective, matching values of authors.id and books.author_id.
Using MERGE to UPDATE and INSERT Simultaneously
For SQL Server 2008 and newer, Microsoft introduced the exceptionally useful MERGE operation which is similar to the above INNER JOIN method, but MERGE attempts to perform both an UPDATE and an INSERT command together. This effectively synchronizes the two tables based on the query performed, updating and inserting records as necessary for the two to match.
MERGE INTO
books
USING
authors
ON
books.author_id = authors.id
WHEN MATCHED THEN
UPDATE SET
books.primary_author = authors.name
WHEN NOT MATCHED THEN
INSERT
(author_id, primary_author)
VALUES
(authors.id, authors.name);
The full query when using MERGE is certainly a bit more complex than that of a basic INNER JOIN, but once you grasp how the operation functions, you’ll quickly understand how powerful this capability can truly be.
The first few lines are rather self-explanatory:
MERGE INTO
books
USING
authors
ON
books.author_id = authors.id
We want to MERGE INTO (UPDATE/INSERT) the books table by using the secondary authors table, and we’re matching the two based on the same books.author_id = authors.id comparison.
Where the MERGE command differs is in the branching logic that follows.
WHEN MATCHED THEN
UPDATE SET
books.primary_author = authors.name
Here we’re asking SQL to perform an action only when records MATCHED – when an existing record is found. In that case, we perform a standard UPDATE just as we did before, setting the books.primary_author field to equal the authors.name field.
Finally, if the query finds a source record that has no matching record in the target, we instead perform an INSERT.
WHEN NOT MATCHED THEN
INSERT
(author_id, primary_author)
VALUES
(authors.id, authors.name);
Here we’re simply asking SQL to INSERT a new record into the books table and passing along the values for the author_id and primary_author fields, grabbed from the associated authors table record.
The end result of our MERGE statement is that for every author in the authors table, we verify whether a corresponding book exists in books. If a record is found, we ensure books.primary_author is set using UPDATE, and where no match is found, we add a new record to books.
With that, you should have a solid understanding of two different methods that can be used to UPDATE records in SQL by using secondary, comparative SELECT statements.

Adding new fields to an existing table, inserting data into proper position, then joining

Scenario One
I have two new fields that I want to add to a table called existingTable. After I add these fields, I can update SOME but NOT ALL records with data for those fields. There will be blank entries, and I am fine with this.
Problem One
I want to make sure that the CORRECT records are updated. The primary key for the existing table and the incoming data table is Email.
Proposed Solution One
An UPDATE query looking like this is the solution.
UPDATE existingTable
SET existingTable.newField1 = incomingDataTable.newField1, existingTable.newField2 = incomingDataTable.newField2
WHERE existingTable.Email = incomingDataTable.Email
What do you think?
Scenario Two
After the table is updated with the new fields & data in the proper records, I want to join this table with two other ones. I want ALL entries, even if some fields are blank, to be in this join. I don't want ANY records excluded.
By the way, each record in these tables has a 1-to-1 relationship with its partner in the other tables. There SHOULD NOT BE ANY duplicate records. In the past, I've seen Access use an INNER JOIN, which excludes records that do not have values for newField1 and newField2. This is not what I want.
Problem
I'm inexperienced at joining tables. The different joins are a bit confusing to me.
Proposed Solution
Does the join I use necessarily matter since the three to-be-joined tables should have a one-to-one relationship?
SELECT * FROM existingTable
FULL JOIN tableToJoinWith1, tableToJoinWith2
On existingTable.Email = tableToJoinWith1.Email, tableToJoinWith1.Email = tableToJoiNWith2.Email
Clarifying your Scenario 2. I'm assuming you mean you want all the rows from existingTable even if there is no match on the Email field with either of the other tables. In this case, a LEFT JOIN is what you want:
SELECT * FROM existingTable
LEFT JOIN tableToJoinWith1 ON existingTable.email = tableToJoinWith1.email
LEFT JOIN tableToJoinWith2 ON existingTable.email = tableToJoinWith2.email
For scenario 1, the problem is that you haven't given it any sort of SELECT for incomingDataTable. In standard SQL, to my knowledge, there's no nice way to do this that supports multiple columns. So it depends what database you're using. Some will let you do this:
UPDATE existingTable
SET newField1 = incomingDataTable.newField1, newField2 = incomingDataTable.newField2
FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email
But some won't. Others will allow this:
UPDATE (Select * FROM existingTable JOIN incomingDataTable
ON existingTable.Email = incomingDataTable.Email)
SET existingTable.newField1 = incomingDataTable.newField1,
existingTable.newField2 = incomingDataTable.newField2
If it were only a single column, you could do this which is totally standard:
UPDATE existingTable SET newField1 = (SELECT newField1 FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email)

Compare Temp table and Main table to find matches

I have two tables, #mtss and tbl_Snapshot, with the same columns:
#mtss ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay])
tbl_Snapshot ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay])
I need to compare two tables and find matches
If [MM] and [ProjectID] in tbl_Snapshot match a record in #mtss, then delete the record in tbl_Snapshot and insert the record from #mtss.
Thanks in advance.
Since you're on 2008, you can use MERGE to perform this as a single logical operation:
;merge into tbl_Snapshot s
using #mtss m on s.MM = m.MM and s.ProjectId = m.ProjectId
when matched then update
set
YYYY = m.YYYY,
month_Start = m.month_Start
/* Other columns as well, not going to type them all out */
;
This would also easily extend to other cases you may have to deal with, such as data that exists in only one of the two tables, by adding further match clauses.
Of course, thinking further, in this case a simple UPDATE would work also. A DELETE followed by an INSERT (where the deleted and inserted rows are related by a key) is the equivalent of an UPDATE.
Something like:
DELETE tbl_snapshot
FROM tbl_snapshot ss
INNER JOIN #mtss m ON m.MM = ss.MM AND m.ProjectId = ss.ProjectId
(I wrote the code directly to this editor so it might have errors, but it should give you an idea how to continue)
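To complete the delete-then-insert idea, here is a hedged sketch that remembers which keys were deleted and then re-inserts only those rows from #mtss, all inside one transaction. The table variable @replaced and its column types are assumptions; the column list comes from the question.

```sql
DECLARE @replaced TABLE (MM int, ProjectID int);   -- column types are assumptions

BEGIN TRANSACTION;

-- Delete the matching snapshot rows and remember their keys.
DELETE ss
OUTPUT deleted.MM, deleted.ProjectID INTO @replaced (MM, ProjectID)
FROM tbl_Snapshot AS ss
INNER JOIN #mtss AS m
    ON m.MM = ss.MM AND m.ProjectID = ss.ProjectID;

-- Re-insert the fresh versions of exactly those rows.
INSERT INTO tbl_Snapshot
    ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],
     [ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],
     [Total_To_Bill],[Total_To_Pay])
SELECT m.[MM], m.[YYYY], m.[month_Start], m.[month_Finish], m.[ProjectID],
       m.[ProjectedBillable], m.[ProjectedPayable], m.[ActualBilled], m.[ActualPaid],
       m.[Total_To_Bill], m.[Total_To_Pay]
FROM #mtss AS m
WHERE EXISTS (
    SELECT 1
    FROM @replaced AS r
    WHERE r.MM = m.MM AND r.ProjectID = m.ProjectID
);

COMMIT TRANSACTION;
```

Using EXISTS rather than a join in the second statement avoids duplicating rows if tbl_Snapshot held more than one row for the same MM and ProjectID.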