SQL Server - UPDATE data based on SELECT - sql

I have written the following, which returns a list of buildings that have only one room but where the area of that room (fma0.area) is not equal to the area of the building (fmb0.nia):
select
rtrim(fma0.bldgcode) As bldgcode
from fma0
left join fmb0 on fma0.bldgcode = fmb0.bldgcode
where fma0.bldgcode in (
select fma0.bldgcode
from fma0
left join fmb0 on fma0.bldgcode = fmb0.bldgcode
where fmb0.bldgstatus = ''
group by fma0.bldgcode
having count(fma0.auto_key) = 1
)
and round(fma0.area,0) <> fmb0.nia
and fmb0.nia > 0
order by 1
I need to use this list of buildings to UPDATE several fields in the FMA0 table (FMA0.GROSS, FMA0.AREA, FMA0.RENTABLE) for each BLDGCODE with the value from FMB0.NIA for the same BLDGCODE.
How do I convert this into an UPDATE statement that looks up the FMB0.NIA value for each BLDGCODE and updates the value in each field for the same BLDGCODE in the FMA0 table?
Thanks

This seems like a much simpler way to get the buildings that you want:
select b.bldgcode
from fmb0 b join
(select r.bldgcode, max(r.area) as room_area
from fma0 r
group by r.bldgcode
having count(*) = 1
) r
on r.bldgcode = b.bldgcode and r.room_area <> b.nia;
The first subquery gets the area of rooms in buildings that have only one room. The join then simply combines them according to your rules.
This is readily turned into an update:
update b
set . . .
from fmb0 b join
(select r.bldgcode, max(r.area) as room_area
from fma0 r
group by r.bldgcode
having count(*) = 1
) r
on r.bldgcode = b.bldgcode and r.room_area <> b.nia;
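The original question asks to update FMA0 (the room table) rather than FMB0, so here is a rough sketch of that direction, assuming each matched building really has exactly one room and reusing the filters from the original query; re-add the bldgstatus condition if you need it:
update r
set r.gross    = b.nia,
    r.area     = b.nia,
    r.rentable = b.nia
from fma0 r
join fmb0 b on b.bldgcode = r.bldgcode
where b.nia > 0
  and round(r.area, 0) <> b.nia
  and r.bldgcode in (
      -- buildings with exactly one room, as in the original subquery
      select r2.bldgcode
      from fma0 r2
      group by r2.bldgcode
      having count(r2.auto_key) = 1
  );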

Under most circumstances, SQL updates are performed using direct references to a particular table (UPDATE books SET books.title = 'The Hobbit' WHERE books.id = 1). Yet, on occasion, it may prove beneficial to alter the contents of a table indirectly, by using a subset of data obtained from a secondary query.
Performing an UPDATE using a secondary SELECT statement can be accomplished in one of two ways, primarily depending upon which version of SQL Server you are using. We’ll briefly explore both options so you can find what works best for you.
Using an INNER JOIN
For all SQL Server installations, the most basic method of performing this action is to use an INNER JOIN, whereby values in the columns of two different tables are compared to one another.
UPDATE books
SET books.primary_author = authors.name
FROM books
INNER JOIN authors
    ON books.author_id = authors.id
WHERE books.title = 'The Hobbit';
In the above example, we’re UPDATING the books.primary_author field to match the authors.name for ‘The Hobbit’ by JOINING both tables in the query to their respective, matching values of authors.id and books.author_id.
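Before running an UPDATE ... FROM like this, it can be worth previewing exactly which rows the join will touch by running the same join as a plain SELECT first; a quick sketch (not part of the original article):
SELECT books.title, books.primary_author, authors.name
FROM books
INNER JOIN authors
    ON books.author_id = authors.id
WHERE books.title = 'The Hobbit';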
Using MERGE to UPDATE and INSERT Simultaneously
For SQL Server 2008 and newer, Microsoft introduced the exceptionally useful MERGE operation which is similar to the above INNER JOIN method, but MERGE attempts to perform both an UPDATE and an INSERT command together. This effectively synchronizes the two tables based on the query performed, updating and inserting records as necessary for the two to match.
MERGE INTO books
USING authors
    ON books.author_id = authors.id
WHEN MATCHED THEN
    UPDATE SET books.primary_author = authors.name
WHEN NOT MATCHED THEN
    INSERT (author_id, primary_author)
    VALUES (authors.id, authors.name);
The full query when using MERGE is certainly a bit more complex than that of a basic INNER JOIN, but once you grasp how the operation functions, you'll quickly understand how powerful this capability can truly be.
The first few lines are rather self-explanatory:
MERGE INTO books
USING authors
    ON books.author_id = authors.id
We want to MERGE INTO (UPDATE/INSERT) the books table by using the secondary authors table, and we’re matching the two based on the same books.author_id = authors.id comparison.
Where the MERGE command differs is in the branching logic that follows.
WHEN MATCHED THEN
    UPDATE SET books.primary_author = authors.name
Here we’re asking SQL to perform an action only when records MATCHED – when an existing record is found. In that case, we perform a standard UPDATE just as we did before, setting the books.primary_author field to equal the authors.name field.
Finally, when the query finds no matching record, we instead perform an INSERT.
WHEN NOT MATCHED THEN
    INSERT (author_id, primary_author)
    VALUES (authors.id, authors.name);
Here we’re simply asking SQL to INSERT a new record into the books table and passing along the values for the author_id and primary_author fields, grabbed from the associated authors table record.
The end result of our MERGE statement is that for every author in the authors table, we verify whether a corresponding book exists in books. If a record is found, we ensure books.primary_author is set using UPDATE, and where no match is found, we add a new record to books.
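The example above only updates and inserts; if you also need to remove books whose author no longer exists in authors, MERGE supports a third branch, WHEN NOT MATCHED BY SOURCE. A minimal sketch (not part of the original example, and destructive, so treat it with care):
MERGE INTO books
USING authors
    ON books.author_id = authors.id
WHEN MATCHED THEN
    UPDATE SET books.primary_author = authors.name
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;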
With that, you should have a solid understanding of two different methods that can be used to UPDATE records in SQL by using secondary, comparative SELECT statements.

Related

Combine multiple updates with conditions, better merge?

A follow up question to SQL Server Merge: update only changed data, tracking changes?
We have been struggling to get an effective MERGE statement working and are now thinking about only using UPDATEs. We have a very simple problem: update Target from Source where values are different and record the changes; both tables have the same layout.
So, the two questions we have are: first, is it possible to combine these very simple updates into a single statement?
UPDATE t
SET t.Description = s.Description,
    t.action = 'Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
    ON t.SID = s.SID
WHERE s.Description <> t.Description

UPDATE t
SET t.DisplayName = s.DisplayName,
    t.action = 'Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
    ON t.SID = s.SID
WHERE s.DisplayName <> t.DisplayName
....for each column.
Second question: can we record into a separate table or variable which records have been updated?
MERGE would be perfect; however, we cannot see which records were updated, as the data returned by OUTPUT shows all rows because the target is always updated.
Edit: complete merge:
MERGE tblADSGroups AS TARGET
USING tblADSGroups_STAGING AS SOURCE
ON (TARGET.[SID] = SOURCE.[SID])
WHEN MATCHED
THEN UPDATE SET
TARGET.[Description] = CASE
    WHEN source.[Description] != target.[Description] THEN source.[Description]
    ELSE target.[Description] END,
TARGET.[DisplayName] = CASE
    WHEN source.[DisplayName] != target.[DisplayName] THEN source.[DisplayName]
    ELSE target.[DisplayName] END
...other columns cut for brevity
WHEN NOT MATCHED BY TARGET
THEN
INSERT (
[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
VALUES (
source.[SID], source.[SamAccountName], source.[DisplayName], source.[Description], source.[DistinguishedName], source.[GroupCategory], source.[GroupScope], source.[Created], source.[Members], source.[MemberOf], source.[SYNCtimestamp], source.[Action]
)
WHEN NOT MATCHED BY SOURCE
THEN
UPDATE SET ACTION = 'Deleted';
You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.
For example
UPDATE t
SET Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
AND NOT EXISTS (
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
);
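Note that OUTPUT ... INTO assumes the #tbl temp table already exists. A minimal sketch of a compatible definition (the column types here are assumptions; match them to your real schema):
CREATE TABLE #tbl (
    ID int,
    Description nvarchar(256),
    DisplayName nvarchar(256)
);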
You can do a similar thing with MERGE if you also want to INSERT:
MERGE tbladsgroups t
USING tbladsgroups_staging s
ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS ( -- do NOT place this condition in the ON
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
)
THEN UPDATE SET
Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
WHEN NOT MATCHED
THEN INSERT (ID, Description, DisplayName)
VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
;
We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.
What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/
Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.
To answer your second question, the technique above would not enable you to catch which column changed. However, you can compare the old row vs the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date).
Our code ends up looking like the following (a rough sketch follows these steps):
1. INSERT all rows from the stage table that do not have a matching key value in the dimension (new rows).
2. Compare stage vs dimension using INTERSECT and store all matches in a table variable.
3. Using the table variable, "close" all matching rows in the dimension.
4. Using the table variable, INSERT the new rows.
If there's a full load taking place, we can also check for keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
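As a rough sketch of steps 2 through 4 (the table and column names here are invented; the real dimension has more columns), the INTERSECT comparison and the close-then-insert pattern could look roughly like this:
DECLARE @changed TABLE (BusinessKey int);

-- Step 2: keys whose current dimension row differs from the stage row (NULL-safe thanks to INTERSECT)
INSERT INTO @changed (BusinessKey)
SELECT s.BusinessKey
FROM StageTable s
JOIN DimTable d
    ON d.BusinessKey = s.BusinessKey
   AND d.ValidTo IS NULL
WHERE NOT EXISTS (SELECT s.Description, s.DisplayName
                  INTERSECT
                  SELECT d.Description, d.DisplayName);

-- Step 3: close the current version of each changed row
UPDATE d
SET d.ValidTo = GETDATE()
FROM DimTable d
JOIN @changed c ON c.BusinessKey = d.BusinessKey
WHERE d.ValidTo IS NULL;

-- Step 4: insert the new versions from stage
INSERT INTO DimTable (BusinessKey, Description, DisplayName, ValidFrom, ValidTo)
SELECT s.BusinessKey, s.Description, s.DisplayName, GETDATE(), NULL
FROM StageTable s
JOIN @changed c ON c.BusinessKey = s.BusinessKey;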
I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?
|groups description|groups display|staging description|staging display|
|SomeValue|Show Me|SOME other Value|Show Me|
|Try This|Attempt|Try This|Working on it|
|Both Diff|Changes|Both Are Diff|Change Me|
So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.
In the above sample, I have three rows which, based on matching IDs, present entries that would need to be changed. If the value is the same in one column but not the other and you update both columns, the net effect is that only the stale column actually changes; the column that already matched keeps the same value. If both are different, both get updated anyhow.
UPDATE t
SET t.Description = s.Description,
    t.DisplayName = s.DisplayName,
    t.action = 'Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
    ON t.SID = s.SID
WHERE s.Description <> t.Description
   OR s.DisplayName <> t.DisplayName
Now, all that being said, you have redundant data, which is exactly what a lookup table is meant to avoid. The staging table appears to always have the correct display name and description. Your tblAdsGroups should probably drop those two columns and always get them from staging to begin with... Something like:
select
t.*,
s.Description,
s.DisplayName
from
tblAdsGroups t
JOIN tblAdsGroups_Staging s
on t.sid = s.sid
Then you always have the correct description and display name and don't have to keep syncing updates between them.
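If you go that route, one way to package the lookup so callers never see the join is a view; a small sketch (the view name and column list are made up, keep whatever columns remain in tblAdsGroups):
CREATE VIEW vwAdsGroups AS
SELECT
    t.SID,
    t.[Action],      -- ...plus the other columns that stay in tblAdsGroups
    s.Description,
    s.DisplayName
FROM tblAdsGroups t
JOIN tblAdsGroups_Staging s
    ON t.SID = s.SID;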

Abap subquery Where Cond [duplicate]

I have a requirement to pull records that do not have history in an archive table. Two fields of each record need to be checked against the archive.
In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding join logic on multiple fields in SQL (without involving the application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implement a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with one well-written sub-query).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select headers~key
from headers left join items on headers~key = items~key
where items~key is null
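Applied to the two-field requirement from this question, the same pattern could be repeated, joining the archive once per field; a rough sketch in the same placeholder style as the earlier pseudocode (the [conditions] from the original subqueries would move into the respective ON clauses):
select xxxx~key
from xxxx
  left join archive_table as arc1 on arc1~key = xxxx~key          " first excluded field
  left join archive_table as arc2 on arc2~key = xxxx~foreign_key  " second excluded field
where arc1~key is null
  and arc2~key is null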
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar
subquery.
Scalar is the keyword here, i.e. it should return exactly one column.
Your subquery can have a multi-column key in its WHERE clause, and such syntax is completely legit:
SELECT planetype, seatsmax
  FROM saplane AS plane
  WHERE seatsmax < @wa-seatsmax AND
        seatsmax >= ALL ( SELECT seatsocc
                            FROM sflight
                            WHERE carrid = @wa-carrid AND
                                  connid = @wa-connid )
however you say that these two fields should be checked against different tables
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic; you cannot just use some sort of NOT IN FOR ALL ENTRIES, so it won't be that easy.

Request optimisation

I have two tables. One holds all the trips that the buses make:
dbo.Courses_Bus
|ID|ID_Bus|ID_Line|DateHour_Start_Course|DateHour_End_Course|
The other holds all the payments made on these buses:
dbo.Payments
|ID|ID_Bus|DateHour_Payment|
The goal is to add the notion of a Line to the payment table, to get something like this:
dbo.Payments
|ID|ID_Bus|DateHour_Payment|Line|
So I tried to do this:
/** I first added a Line column to the dbo.Payments table**/
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN [dbo].[Courses_Bus] AS Table_B
ON Table_A.ID_Bus = Table_B.ID_Bus
AND Table_A.DateHour_Payment BETWEEN Table_B.DateHour_Start_Course AND Table_B.DateHour_End_Course
And this:
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN (
SELECT
P.*,
CP.ID_Line AS ID_Line
FROM
[dbo].[Payments] AS P
INNER JOIN [dbo].[Courses_Bus] CP ON CP.ID_Bus = P.ID_Bus
AND CP.DateHour_Start_Course <= P.DateHour_Payment
AND CP.DateHour_End_Course >= P.DateHour_Payment
) AS Table_B ON Table_A.ID_Bus = Table_B.ID_Bus
The main problem, apart from the fact that these queries do not seem to work properly, is that each table has several million rows and grows every day, and because of the date-hour filter (mandatory, since a single bus can run on several lines every day) SQL Server must compare each row of the second table with all rows of the other table.
So it takes an enormous amount of time, which will increase every day.
How can I make it work and optimise it?
Assuming that this is the logic you want:
UPDATE p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course;
To optimize this query, then you want an index on Courses_Bus(ID_Bus, DateHour_Start_Course, DateHour_End_Course).
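For instance (the index name is arbitrary, and the INCLUDE column is an optional extra so the update can read ID_Line from the index alone):
CREATE INDEX IX_CoursesBus_Bus_Dates
    ON dbo.Courses_Bus (ID_Bus, DateHour_Start_Course, DateHour_End_Course)
    INCLUDE (ID_Line);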
There might be slightly more efficient ways to optimize the query, but your question doesn't have enough information -- is there always exactly one match, for instance?
Another big issue is that updating all the rows is quite expensive. You might find that it is better to do this in loops, one chunk at a time:
UPDATE TOP (10000) p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
WHERE p.Line IS NULL;
Once again, though, this structure depends on all the initial values being NULL and an exact match for all rows.
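The chunked version is normally wrapped in a loop that repeats until no qualifying rows remain; a sketch:
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) p
    SET p.Line = cb.ID_Line
    FROM [dbo].[Payments] p JOIN
         [dbo].[Courses_Bus] cb
         ON p.ID_Bus = cb.ID_Bus AND
            p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
    WHERE p.Line IS NULL;

    SET @rows = @@ROWCOUNT;
END;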
Thank you Gordon for your answer.
I have investigated and came up with this query:
MERGE [dbo].[Payments] AS p
USING [dbo].[Courses_Bus] AS cb
    ON p.ID_Bus = cb.ID_Bus AND
       p.DateHour_Payment >= cb.DateHour_Start_Course AND
       p.DateHour_Payment <= cb.DateHour_End_Course
WHEN MATCHED THEN
    UPDATE SET p.Line = cb.ID_Line;
As it seems to be the most suitable in an MS-SQL environment.
It also came back with this error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I understood this to mean that it finds several lines with identical
[p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment >= cb.DateHour_Start_Course AND
p.DateHour_Payment <= cb.DateHour_End_Course]
Yes, this is a possible case, but the ID is different each time.
For example, two bank cards may be tapped at the same moment, or there may be a network outage after which the equipment uploads its data, putting several payments at the same time. These are distinct rows that must be treated separately, and you can get, for example:
|ID|ID_Bus|DateHour_Payments|Line|
----------------------------------
|56|204|2021-01-01 10:00:00|15|
----------------------------------
|82|204|2021-01-01 10:00:00|15|
How can I improve this query so that it takes into account different payment IDs?
I can't figure out how to do this with the help I find online. Maybe this method is not the right one in this context.

SQL Update using value from join table

I tried using this SQL to update a new column in a table from a value in a table that already exists.
update "PROMOTION" t1
set "OFFER_CHAIN_ID" = poc."OFFER_CHAIN_ID"
from "PROMOTION_OFFER_CHAIN" poc
inner join "PROMOTION" on "PROMOTION"."ID" = poc."PROMOTION_ID"
What happened is that the first value of the join got replicated into all the subsequent entries. The original table has unique values, but after the update all the values in the updated column are the same.
Eventually I used this SQL instead.
update "PROMOTION" t1
set "OFFER_CHAIN_ID" = poc."OFFER_CHAIN_ID"
from "PROMOTION_OFFER_CHAIN" poc
where
t1."ID" = poc."PROMOTION_ID"
This update works and carries the data over correctly: 1000 unique values in the original table, 1000 unique values in the updated table.
Is this a bug, or is this the expected result?
SQL is behaving correctly. Your original query is:
update "PROMOTION" t1
--------^
set "OFFER_CHAIN_ID" = poc."OFFER_CHAIN_ID"
from "PROMOTION_OFFER_CHAIN" poc inner join
"PROMOTION"
-----------^
on "PROMOTION"."ID" = poc."PROMOTION_ID"
Note that the table PROMOTION is mentioned twice. Not good. So, the join takes place, producing lots of rows. Then there is no correlation to the t1 version of the table.
You don't mention the database you are using. In SQL Server, you would just do:
update p
set "OFFER_CHAIN_ID" = poc."OFFER_CHAIN_ID"
from "PROMOTION_OFFER_CHAIN" poc inner join
"PROMOTION" p
on p."ID" = poc."PROMOTION_ID";
Note that the alias is used after the update (or the table name if there is no alias). Now the table is mentioned only once, so the update should behave as desired.

Adding new fields to an existing table, inserting data into proper position, then joining

Scenario One
I have two new fields that I want to add to a table called existingTable. After I add these fields, I can update SOME but NOT ALL records with data for those fields. There will be blank entries, and I am fine with this.
Problem One
I want to make sure that the CORRECT records are updated. The primary key for the existing table and the incoming data table is Email.
Proposed Solution One
An UPDATE query looking like this is the solution.
UPDATE existingTable
SET existingTable.newField1 = incomingDataTable.newField1, existingTable.newField2 = incomingDataTable.newField2
WHERE existingTable.Email = incomingDataTable.Email
What do you think?
Scenario Two
After the table is updated with the new fields & data in the proper records, I want to join this table with two other ones. I want ALL entries, even if some fields are blank, to be in this join. I don't want ANY records excluded.
By the way, each record in these tables has a 1-to-1 relationship with its partner in the other tables. There SHOULD NOT BE ANY duplicate records. In the past, I've seen Access use an INNER JOIN, which excludes records that do not have values for newField1 and newField2. This is not what I want.
Problem
I'm inexperienced at joining tables. The different joins are a bit confusing to me.
Proposed Solution
Does the join I use necessarily matter since the three to-be-joined tables should have a one-to-one relationship?
SELECT * FROM existingTable
FULL JOIN tableToJoinWith1, tableToJoinWith2
On existingTable.Email = tableToJoinWith1.Email, tableToJoinWith1.Email = tableToJoiNWith2.Email
Clarifying your Scenario 2. I'm assuming you mean you want all the rows from existingTable even if there is no match on the Email field with either of the other tables. In this case, a LEFT JOIN is what you want:
SELECT * FROM existingTable
LEFT JOIN tableToJoinWith1 ON existingTable.email = tableToJoinWith1.email
LEFT JOIN tableToJoinWith2 ON existingTable.email = tableToJoinWith2.email
For scenario 1, the problem is that you haven't given it any sort of SELECT for incomingDataTable. In standard SQL, to my knowledge, there's no nice way to do this that supports multiple columns. So it depends what database you're using. Some will let you do this:
UPDATE existingTable
SET newField1 = incomingDataTable.newField1, newField2 = incomingDataTable.newField2
FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email
But some won't. Others will allow this:
UPDATE (Select * FROM existingTable JOIN incomingDataTable
ON existingTable.Email = incomingDataTable.Email)
SET existingTable.newField1 = incomingDataTable.newField1,
existingTable.newField2 = incomingDataTable.newField2
If it were only a single column, you could do this which is totally standard:
UPDATE existingTable SET newField1 = (SELECT newField1 FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email)
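One caveat with that last form: without a guard, any row in existingTable that has no match in incomingDataTable gets newField1 set to NULL. Adding a WHERE EXISTS keeps those rows untouched:
UPDATE existingTable SET newField1 = (SELECT newField1 FROM incomingDataTable
                                      WHERE existingTable.Email = incomingDataTable.Email)
WHERE EXISTS (SELECT 1 FROM incomingDataTable
              WHERE existingTable.Email = incomingDataTable.Email);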