SQL update all fields based on Join Model

I've got two tables in Postgres:
Sources [id, term, type]
Posts [id, source_id, message, term, type]
I'm de-normalizing this data, so I'm adding the term and type columns to each of the posts, and getting rid of the Sources table.
Is there a way to write a FAST query that updates each post with its respective source's data (there are about 8 million posts)?
Something like:
UPDATE posts
JOIN sources
ON posts.source_id = sources.id
SET post.term = sources.term,
posts.term_type = sources.term_type;
But that is throwing a syntax error for me.

The correct syntax in Postgres is (the columns in SET must not be qualified with the table name):
UPDATE posts
SET term = sources.term,
term_type = sources.term_type
FROM sources
WHERE posts.source_id = sources.id;
Or, you can use a row constructor with a sub-select (PostgreSQL 9.5 and later); posts without a matching source get NULLs:
UPDATE posts
SET (term, term_type) = (select s.term, s.term_type
from sources s
where posts.source_id = s.id
);
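The row-constructor form can be tried without a Postgres server using Python's built-in sqlite3 module, since SQLite (3.15 and later) accepts the same SET (...) = (SELECT ...) shape. This is a minimal sketch with invented sample rows; it says nothing about how either form performs on 8 million rows in Postgres.

```python
import sqlite3

# Hypothetical stand-in schema and sample rows (SQLite, not Postgres).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (id INTEGER PRIMARY KEY, term TEXT, term_type TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER,
                        message TEXT, term TEXT, term_type TEXT);
    INSERT INTO sources VALUES (1, 'apple', 'fruit'), (2, 'kale', 'veg');
    INSERT INTO posts (id, source_id, message)
    VALUES (10, 1, 'hello'), (11, 2, 'hi'), (12, 1, 'hey');
    """
)

# Row-constructor SET: one correlated sub-select fills both columns.
# A post whose source_id had no match would get NULL in both.
conn.execute(
    """
    UPDATE posts
    SET (term, term_type) = (SELECT s.term, s.term_type
                             FROM sources s
                             WHERE s.id = posts.source_id)
    """
)

rows = conn.execute("SELECT id, term, term_type FROM posts ORDER BY id").fetchall()
print(rows)
```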

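In Postgres each update consists of one insert and one delete (MVCC writes a new row version and marks the old one dead). So besides the double work, it also has an impact if indexes are active, because every index has to be maintained.
If you want to update the whole table, it is usually much faster to just create a new table with the new values: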
CREATE TABLE post2 AS
SELECT p.id, p.source_id, p.message, s.term, s.term_type
FROM posts p
INNER JOIN sources s
ON p.source_id = s.id;
Then drop the old table, use ALTER TABLE to rename the new one to the old name, and create the proper indexes.
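A sketch of the copy-and-swap approach, again in SQLite through Python's sqlite3 module rather than Postgres. The index name posts_source_id_idx is hypothetical. Note that the INNER JOIN silently drops posts whose source_id has no match (post 12 below); a LEFT JOIN keeps them with NULLs.

```python
import sqlite3

# Hypothetical stand-in schema; post 12 points at a source that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (id INTEGER PRIMARY KEY, term TEXT, term_type TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER, message TEXT);
    INSERT INTO sources VALUES (1, 'apple', 'fruit'), (2, 'kale', 'veg');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 2, 'hi'), (12, 3, 'orphan');

    -- Build the denormalized copy in one pass instead of updating in place.
    CREATE TABLE posts2 AS
    SELECT p.id, p.source_id, p.message, s.term, s.term_type
    FROM posts p
    INNER JOIN sources s ON p.source_id = s.id;

    -- Swap it in, then recreate the index (CREATE TABLE AS copies no indexes).
    DROP TABLE posts;
    ALTER TABLE posts2 RENAME TO posts;
    CREATE INDEX posts_source_id_idx ON posts (source_id);  -- hypothetical name
    """
)
rows = conn.execute("SELECT id, term FROM posts ORDER BY id").fetchall()
print(rows)
```

CREATE TABLE ... AS does not copy constraints such as the primary key either, so those need to be recreated along with the indexes.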

Related

Combine multiple updates with conditions, better merge?

A follow-up question to "SQL Server Merge: update only changed data, tracking changes?"
We have been struggling to get an effective MERGE statement working and are now thinking about only using updates. We have a very simple problem: update Target from Source where values are different, and record the changes. Both tables have the same layout.
So, the two questions we have are: is it possible to combine this very simple update into a single statement?
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
UPDATE tbladsgroups
SET tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.DisplayName <> t.DisplayName
... and so on for each column.
Second question:
Can we record into a separate table/variable which record has been updated?
MERGE would be perfect; however, we cannot see which records were updated, because the data returned from OUTPUT shows all rows, as the target is always updated.
Edit: the complete MERGE:
MERGE tblADSGroups AS TARGET
USING tblADSGroups_STAGING AS SOURCE
ON (TARGET.[SID] = SOURCE.[SID])
WHEN MATCHED
THEN UPDATE SET
TARGET.[Description] = CASE
WHEN source.[Description] != target.[Description] THEN source.[Description]
ELSE target.[Description] END,
TARGET.[displayname] = CASE
WHEN source.[displayname] != target.[displayname] THEN source.[displayname]
ELSE target.[displayname] END
-- ...other columns cut for brevity
WHEN NOT MATCHED BY TARGET
THEN
INSERT (
[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
VALUES (
source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
WHEN NOT MATCHED BY SOURCE
THEN
UPDATE SET ACTION='Deleted';

You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.
For example
UPDATE t
SET Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
OUTPUT inserted.SID, inserted.Description, inserted.DisplayName
INTO #tbl (SID, Description, DisplayName) -- #tbl must already exist
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
AND NOT EXISTS (
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
);
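The effect of the NOT EXISTS (... INTERSECT ...) test on NULLs can be checked in SQLite via Python's sqlite3 module. SQLite has no OUTPUT clause, so this sketch swaps in a different technique for the logging part: it first saves the changed keys into a temp table (change_log, a hypothetical name) and then updates exactly those rows. Table and column names are lowercase stand-ins for the ones in the question, and the data is made up.

```python
import sqlite3

# Hypothetical stand-ins for tbladsgroups / tbladsgroups_staging.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ads_groups (sid TEXT PRIMARY KEY, description TEXT,
                             display_name TEXT, action TEXT);
    CREATE TABLE ads_groups_staging (sid TEXT PRIMARY KEY, description TEXT,
                                     display_name TEXT);
    INSERT INTO ads_groups VALUES
        ('a', 'same', 'A', NULL),   -- unchanged
        ('b', NULL,   'B', NULL),   -- NULL becomes a value
        ('c', 'old',  'C', NULL),   -- ordinary change
        ('d', NULL,   'D', NULL);   -- NULL on both sides, unchanged
    INSERT INTO ads_groups_staging VALUES
        ('a', 'same', 'A'), ('b', 'new', 'B'), ('c', 'new', 'C'), ('d', NULL, 'D');
    """
)

# NULL-safe "row differs": INTERSECT treats two NULLs as equal.
CHANGED = """
    SELECT s.sid FROM ads_groups_staging s
    JOIN ads_groups t ON t.sid = s.sid
    WHERE NOT EXISTS (SELECT s.description, s.display_name
                      INTERSECT
                      SELECT t.description, t.display_name)
"""
# Naive version for contrast: NULL <> 'new' is NULL, not true, so 'b' is missed.
NAIVE = """
    SELECT s.sid FROM ads_groups_staging s
    JOIN ads_groups t ON t.sid = s.sid
    WHERE s.description <> t.description OR s.display_name <> t.display_name
"""
changed = sorted(r[0] for r in conn.execute(CHANGED))
naive = sorted(r[0] for r in conn.execute(NAIVE))

# No OUTPUT clause in SQLite: log the keys first, then update exactly those rows.
conn.executescript(
    f"""
    CREATE TEMP TABLE change_log AS {CHANGED};
    UPDATE ads_groups
    SET (description, display_name, action) =
        (SELECT s.description, s.display_name, 'Updated'
         FROM ads_groups_staging s WHERE s.sid = ads_groups.sid)
    WHERE sid IN (SELECT sid FROM change_log);
    """
)
logged = [r[0] for r in conn.execute("SELECT sid FROM change_log ORDER BY sid")]
print(changed, naive, logged)
```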
You can do a similar thing with MERGE if you also want to INSERT:
MERGE tbladsgroups t
USING tbladsgroups_staging s
ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS ( -- do NOT place this condition in the ON
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
)
THEN UPDATE SET
Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
WHEN NOT MATCHED
THEN INSERT (SID, Description, DisplayName)
VALUES (s.SID, s.Description, s.DisplayName)
OUTPUT inserted.SID, inserted.Description, inserted.DisplayName
INTO #tbl (SID, Description, DisplayName)
;
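SQLite has no MERGE statement; its closest analogue is INSERT ... ON CONFLICT DO UPDATE (UPSERT, SQLite 3.24 and later). The sketch below uses that swapped-in technique to mimic the MERGE above: insert unknown SIDs, and update matched SIDs only when a value actually differs, using SQLite's NULL-safe IS NOT instead of INTERSECT. It assumes a unique key on sid and has no equivalent of WHEN NOT MATCHED BY SOURCE. Names and data are hypothetical.

```python
import sqlite3

# Hypothetical stand-ins; sid must be unique for ON CONFLICT to apply.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ads_groups (sid TEXT PRIMARY KEY, description TEXT,
                             display_name TEXT, action TEXT);
    CREATE TABLE ads_groups_staging (sid TEXT PRIMARY KEY, description TEXT,
                                     display_name TEXT);
    INSERT INTO ads_groups VALUES ('a', 'same', 'A', NULL), ('b', 'old', 'B', NULL);
    INSERT INTO ads_groups_staging VALUES
        ('a', 'same', 'A'),    -- unchanged: left alone
        ('b', 'new', 'B'),     -- changed: updated
        ('c', 'brand', 'C');   -- unknown SID: inserted
    """
)

conn.execute(
    """
    INSERT INTO ads_groups (sid, description, display_name, action)
    SELECT sid, description, display_name, 'Inserted'
    FROM ads_groups_staging
    WHERE 1  -- SQLite docs advise a WHERE here so ON is not parsed as a join
    ON CONFLICT (sid) DO UPDATE SET
        description  = excluded.description,
        display_name = excluded.display_name,
        action       = 'Updated'
    -- IS NOT is SQLite's NULL-safe "not equal"
    WHERE excluded.description IS NOT ads_groups.description
       OR excluded.display_name IS NOT ads_groups.display_name
    """
)
rows = conn.execute(
    "SELECT sid, description, action FROM ads_groups ORDER BY sid"
).fetchall()
print(rows)
```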

We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.
What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/
Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.
To answer your second question, the technique above would not let you catch which column changed. However, you can compare the old row with the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date).
Our code ends up looking like the following:
1. INSERT all rows from the stage table that do not have a matching key value in the dimension (new rows).
2. Compare stage vs. dimension using INTERSECT and store all changed rows in a table variable.
3. Using the table variable, "close" all matching rows in the dimension.
4. Using the table variable, INSERT the new versions of those rows.
If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
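The four steps, plus the optional deletion check, might look roughly like this. It is a sketch under stated assumptions, not the answerer's actual code: it runs on SQLite via Python's sqlite3, uses a temp table where SQL Server would use a table variable, and the dim/stage tables and their columns (item_key, attr, valid_from, valid_to, is_deleted) are hypothetical.

```python
import sqlite3

def load(conn: sqlite3.Connection, today: str) -> None:
    """Apply one staging load to a hypothetical type-2 dimension."""
    p = {"today": today}
    # 1. New keys: stage rows with no row at all in the dimension.
    conn.execute("""
        INSERT INTO dim (item_key, attr, valid_from)
        SELECT s.item_key, s.attr, :today FROM stage s
        WHERE NOT EXISTS (SELECT 1 FROM dim d WHERE d.item_key = s.item_key)""", p)
    # 2. Changed keys vs. the open version, NULL-safe via INTERSECT.
    conn.execute("DROP TABLE IF EXISTS temp.changed")
    conn.execute("""
        CREATE TEMP TABLE changed AS
        SELECT s.item_key, s.attr FROM stage s
        JOIN dim d ON d.item_key = s.item_key AND d.valid_to IS NULL
        WHERE NOT EXISTS (SELECT s.attr INTERSECT SELECT d.attr)""")
    # 3. "Close" the current version of every changed key.
    conn.execute("""
        UPDATE dim SET valid_to = :today
        WHERE valid_to IS NULL AND item_key IN (SELECT item_key FROM changed)""", p)
    # 4. Open the new version.
    conn.execute("""
        INSERT INTO dim (item_key, attr, valid_from)
        SELECT item_key, attr, :today FROM changed""", p)
    # 5. Full load only: open keys missing from stage were deleted at the source.
    conn.execute("""
        UPDATE dim SET is_deleted = 1
        WHERE valid_to IS NULL AND item_key NOT IN (SELECT item_key FROM stage)""")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim (item_key TEXT, attr TEXT, valid_from TEXT, valid_to TEXT,
                      is_deleted INTEGER DEFAULT 0);
    CREATE TABLE stage (item_key TEXT PRIMARY KEY, attr TEXT);
    INSERT INTO dim (item_key, attr, valid_from) VALUES
        ('k1', 'same', '2024-01-01'), ('k2', 'old', '2024-01-01'),
        ('k3', 'gone', '2024-01-01');
    INSERT INTO stage VALUES ('k1', 'same'), ('k2', 'new'), ('k4', 'fresh');
""")
load(conn, "2024-01-02")  # hypothetical load date
rows = conn.execute("""
    SELECT item_key, attr, valid_to, is_deleted FROM dim
    ORDER BY item_key, valid_from""").fetchall()
print(rows)
```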

I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?
groups description | groups display | staging description | staging display
SomeValue | Show Me | SOME other Value | Show Me
Try This | Attempt | Try This | Working on it
Both Diff | Changes | Both Are Diff | Change Me
So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.
The sample above has three rows which, matched on ID, all need to be changed. If the value is the same in one column but not in the other and you update both columns, the net effect is that only the one differing column changes; the matching column is simply overwritten with the same value. If both are different, both get updated anyhow.
UPDATE t
SET t.Description = s.Description,
t.DisplayName = s.DisplayName,
t.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
OR s.DisplayName <> t.DisplayName
Now, all that being said, you have redundant data, and avoiding that is the whole point of a lookup table. The staging table appears to always have the correct display name and description. Your tblAdsGroups should probably drop those two columns and always get them from staging to begin with. Something like:
select
t.*,
s.Description,
s.DisplayName
from
tblAdsGroups t
JOIN tblAdsGroups_Staging s
on t.sid = s.sid
Then you always have the correct description and display name and don't have to keep syncing updates between them.

Cannot update a table using a simple inner join

I have 2 tables in Access 2007.
See the attached picture for the structure of the tables and the expected result.
I am trying to update the quantity field (ITQTY) in TABLE_BLNC by summing all the quantity fields (LOCQTY) from TABLE_DTL for the same item (LOITNBR=ITNBR).
In TABLE_BLNC, the item is unique while in TABLE_DTL, the item can be in multiple records.
My query is:
UPDATE TABLE_BLNC INNER JOIN
(
SELECT LOITNBR, Sum(LOCQTY) AS SumOfLOCQTY FROM TABLE_DTL GROUP BY LOITNBR) AS DTL
ON TABLE_BLNC.ITNBR=DTL.LOITNBR SET TABLE_BLNC.ITQTY = DTL.SumOfLOCQTY;
I am getting the error:
Operation must use an updateable query.

Domain Aggregate functions can be useful when Access complains that an UPDATE is not updateable. In this case, use DSum() ...
UPDATE TABLE_BLNC
SET ITQTY =
DSum("LOCQTY", "TABLE_DTL", "LOITNBR='" & ITNBR & "'");
Index TABLE_DTL.LOITNBR for optimum performance.
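What DSum() does here, one SUM per row of TABLE_BLNC, can also be written as a correlated subquery. Access itself generally rejects an aggregate subquery in SET with the same "updateable query" error, which is why DSum() is the workaround there; the sketch below instead runs the equivalent statement in SQLite through Python's sqlite3 module, with lowercase stand-in names and made-up data.

```python
import sqlite3

# Hypothetical stand-ins for TABLE_BLNC / TABLE_DTL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_blnc (itnbr TEXT PRIMARY KEY, itqty INTEGER);
    CREATE TABLE table_dtl (loitnbr TEXT, locqty INTEGER);
    CREATE INDEX dtl_item_idx ON table_dtl (loitnbr);  -- the suggested index
    INSERT INTO table_blnc VALUES ('A', 0), ('B', 0), ('C', 0);
    INSERT INTO table_dtl VALUES ('A', 5), ('A', 7), ('B', 3);

    -- One correlated SUM per balance row, like DSum("LOCQTY", "TABLE_DTL", ...).
    UPDATE table_blnc
    SET itqty = (SELECT SUM(d.locqty) FROM table_dtl d
                 WHERE d.loitnbr = table_blnc.itnbr);
    """
)
rows = conn.execute("SELECT itnbr, itqty FROM table_blnc ORDER BY itnbr").fetchall()
print(rows)
```

Item C has no detail rows, so its quantity becomes NULL; wrapping the subquery in COALESCE(..., 0) gives zero instead.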

One of the great annoyances of Access SQL is its inability to update a table from a non-updatable source. Non-updatable sources include read-only links to ODBC tables and GROUP BY (summary) queries.
What I always do is:
Copy the structure of TABLE_BLNC to a temp table: TABLE_BLNC_temp.
In your code, first delete the temp:
DELETE * FROM TABLE_BLNC_temp;
Insert the result of your summary query into temp:
INSERT INTO TABLE_BLNC_temp (ITNBR, ITQTY)
SELECT LOITNBR, Sum(LOCQTY) AS SumOfLOCQTY
FROM TABLE_DTL GROUP BY LOITNBR;
Update TABLE_BLNC from TABLE_BLNC_temp:
UPDATE TABLE_BLNC INNER JOIN TABLE_BLNC_temp AS t
ON TABLE_BLNC.ITNBR = t.ITNBR
SET TABLE_BLNC.ITQTY = t.ITQTY;
While it is an extra step or two, this approach:
Always works
Is more performant than Domain Aggregate functions for larger datasets
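The temp-table steps can be sketched in SQLite through Python's sqlite3 module. SQLite versions before 3.33 have no UPDATE ... FROM, so the last step uses a correlated subquery restricted by WHERE ... IN instead of the Access INNER JOIN; the restriction keeps the same behavior of leaving items without detail rows alone. Names are lowercase stand-ins and the data is invented.

```python
import sqlite3

# Hypothetical stand-ins for TABLE_BLNC, TABLE_BLNC_temp and TABLE_DTL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_blnc (itnbr TEXT PRIMARY KEY, itqty INTEGER);
    CREATE TABLE table_blnc_temp (itnbr TEXT PRIMARY KEY, itqty INTEGER);
    CREATE TABLE table_dtl (loitnbr TEXT, locqty INTEGER);
    INSERT INTO table_blnc VALUES ('A', 0), ('B', 0), ('C', 4);
    INSERT INTO table_dtl VALUES ('A', 5), ('A', 7), ('B', 3);

    -- 1. Empty the scratch table.
    DELETE FROM table_blnc_temp;
    -- 2. Materialize the GROUP BY result into a real (updatable) table.
    INSERT INTO table_blnc_temp (itnbr, itqty)
    SELECT loitnbr, SUM(locqty) FROM table_dtl GROUP BY loitnbr;
    -- 3. Copy totals back, only for items present in the scratch table.
    UPDATE table_blnc
    SET itqty = (SELECT t.itqty FROM table_blnc_temp t
                 WHERE t.itnbr = table_blnc.itnbr)
    WHERE itnbr IN (SELECT itnbr FROM table_blnc_temp);
    """
)
rows = conn.execute("SELECT itnbr, itqty FROM table_blnc ORDER BY itnbr").fetchall()
print(rows)
```

Item C has no detail rows, so it keeps its old quantity, just as it would be skipped by the INNER JOIN.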

MERGE vs. UPDATE

I tried looking online but couldn't find anything that settles my doubts.
I want to figure out which one is better to use, when, and why.
I know MERGE is usually used for an upsert, but there are cases where a normal UPDATE with a subquery has to select from the table twice (once more in the WHERE clause).
E.g.:
MERGE INTO TableA s
USING (SELECT sd.dwh_key, sd.serial_number FROM TableA@to_devstg sd) t
ON (s.dwh_key = t.dwh_key)
WHEN MATCHED THEN UPDATE SET s.serial_number = t.serial_number
WHERE s.serial_number <> t.serial_number;
In my case, I have to update a table with about 200 million records in one environment, based on the same table in another environment, where the serial_number field has changed. As you can see, this selects only once from the huge table.
On the other hand, I can use an UPDATE STATEMENT like this:
UPDATE TableA s
SET s.serial_number = (SELECT t.serial_number
FROM TableA@to_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key)
WHERE EXISTS (SELECT 1
FROM TableA@To_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key
AND t.serial_number <> s.serial_number)
As you can see, this selects from the huge table twice. So my question is: which is better, and why? In which cases is one better than the other?
Thanks in advance.

I would first try to load all the necessary data from the remote DB into a temporary table and then work with that temporary table.
create global temporary table tmp_stage (
dwh_key <your_dwh_key_type> primary key,
serial_number <your_serial_number_type>
) on commit preserve rows;
insert into tmp_stage
select dwh_key, serial_number
from TableA@to_devstg sd;
/* the primary key on tmp_stage.dwh_key keeps TableA key-preserved in the join view below; without it Oracle raises ORA-01779 */
update (select
src.dwh_key src_key,
tgt.dwh_key tgt_key,
src.serial_number src_serial_number,
tgt.serial_number tgt_serial_number
from tmp_stage src
join TableA tgt
on src.dwh_key = tgt.dwh_key
where src.serial_number <> tgt.serial_number
)
set tgt_serial_number = src_serial_number;
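Once the remote rows sit in a local staging table, the question's UPDATE ... WHERE EXISTS form only touches rows whose serial number really changed, and both lookups hit the local table rather than the database link. The sketch below runs that statement in SQLite via Python's sqlite3 module; tmp_stage stands in for the global temporary table, and the data is invented.

```python
import sqlite3

# Hypothetical stand-ins: table_a is the target, tmp_stage the local copy
# of the rows pulled over the database link.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (dwh_key INTEGER PRIMARY KEY, serial_number TEXT);
    CREATE TEMP TABLE tmp_stage (dwh_key INTEGER PRIMARY KEY, serial_number TEXT);
    INSERT INTO table_a VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
    INSERT INTO tmp_stage VALUES (1, 'S1'), (2, 'S2-new'), (4, 'S4');
    """
)
cur = conn.execute(
    """
    UPDATE table_a
    SET serial_number = (SELECT t.serial_number FROM tmp_stage t
                         WHERE t.dwh_key = table_a.dwh_key)
    -- only rows with a staged value that actually differs
    WHERE EXISTS (SELECT 1 FROM tmp_stage t
                  WHERE t.dwh_key = table_a.dwh_key
                    AND t.serial_number <> table_a.serial_number)
    """
)
updated = cur.rowcount
rows = conn.execute("SELECT dwh_key, serial_number FROM table_a ORDER BY dwh_key").fetchall()
print(updated, rows)
```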

Adding new fields to an existing table, inserting data into proper position, then joining

Scenario One
I have two new fields that I want to add to a table called existingTable. After I add these fields, I can update SOME but NOT ALL records with data for those fields. There will be blank entries, and I am fine with this.
Problem One
I want to make sure that the CORRECT records are updated. The primary key for the existing table and the incoming data table is Email.
Proposed Solution One
An UPDATE query looking like this is the solution.
UPDATE existingTable
SET existingTable.newField1 = incomingDataTable.newField1, existingTable.newField2 = incomingDataTable.newField2
WHERE existingTable.Email = incomingDataTable.Email
What do you think?
Scenario Two
After the table is updated with the new fields & data in the proper records, I want to join this table with two other ones. I want ALL entries, even if some fields are blank, to be in this join. I don't want ANY records excluded.
By the way, each record in these tables has a 1-to-1 relationship with its partner in the other tables. There SHOULD NOT BE ANY duplicate records. In the past, I've seen Access use an INNER JOIN, which excludes records that do not have values for newField1 and newField2. This is not what I want.
Problem
I'm inexperienced at joining tables. The different joins are a bit confusing to me.
Proposed Solution
Does the join I use necessarily matter since the three to-be-joined tables should have a one-to-one relationship?
SELECT * FROM existingTable
FULL JOIN tableToJoinWith1, tableToJoinWith2
On existingTable.Email = tableToJoinWith1.Email, tableToJoinWith1.Email = tableToJoiNWith2.Email

Clarifying your Scenario 2. I'm assuming you mean you want all the rows from existingTable even if there is no match on the Email field with either of the other tables. In this case, a LEFT JOIN is what you want:
SELECT * FROM existingTable
LEFT JOIN tableToJoinWith1 ON existingTable.email = tableToJoinWith1.email
LEFT JOIN tableToJoinWith2 ON existingTable.email = tableToJoinWith2.email
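A quick way to see the difference between LEFT and INNER joins is to run both over the same made-up rows. This sketch uses SQLite through Python's sqlite3 module, with lowercase stand-ins for the table names.

```python
import sqlite3

# Hypothetical stand-in tables: Cy has no match anywhere, Bob only in table 1.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE existing_table (email TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE table_to_join_with1 (email TEXT PRIMARY KEY, phone TEXT);
    CREATE TABLE table_to_join_with2 (email TEXT PRIMARY KEY, city TEXT);
    INSERT INTO existing_table VALUES ('a@x', 'Ann'), ('b@x', 'Bob'), ('c@x', 'Cy');
    INSERT INTO table_to_join_with1 VALUES ('a@x', '555-1'), ('b@x', '555-2');
    INSERT INTO table_to_join_with2 VALUES ('a@x', 'Oslo');
    """
)
LEFT = """
    SELECT e.email, t1.phone, t2.city
    FROM existing_table e
    LEFT JOIN table_to_join_with1 t1 ON e.email = t1.email
    LEFT JOIN table_to_join_with2 t2 ON e.email = t2.email
    ORDER BY e.email
"""
INNER = LEFT.replace("LEFT JOIN", "INNER JOIN")
left_rows = conn.execute(LEFT).fetchall()    # every existing_table row survives
inner_rows = conn.execute(INNER).fetchall()  # only rows matched in both tables
print(left_rows, inner_rows)
```

In Access specifically, a query with more than one join needs parentheses around the earlier join, for example FROM (existingTable LEFT JOIN tableToJoinWith1 ON existingTable.Email = tableToJoinWith1.Email) LEFT JOIN tableToJoinWith2 ON existingTable.Email = tableToJoinWith2.Email.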
For scenario 1, the problem is that you haven't given it any sort of SELECT for incomingDataTable. In standard SQL, to my knowledge, there's no nice way to do this that supports multiple columns. So it depends what database you're using. Some will let you do this:
UPDATE existingTable
SET newField1 = incomingDataTable.newField1, newField2 = incomingDataTable.newField2
FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email
But some won't. Others will allow this:
UPDATE (SELECT existingTable.newField1 AS oldField1,
existingTable.newField2 AS oldField2,
incomingDataTable.newField1 AS inField1,
incomingDataTable.newField2 AS inField2
FROM existingTable JOIN incomingDataTable
ON existingTable.Email = incomingDataTable.Email)
SET oldField1 = inField1,
oldField2 = inField2
If it were only a single column, you could do this which is totally standard:
UPDATE existingTable SET newField1 = (SELECT newField1 FROM incomingDataTable
WHERE existingTable.Email = incomingDataTable.Email)
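The single-column form extends to several columns by repeating the correlated subquery once per column. Adding WHERE EXISTS restricts the update to rows that have a match; without it, rows with no incoming data are set to NULL. A sketch in SQLite via Python's sqlite3 module, with snake_case stand-ins for the names in the question and invented data.

```python
import sqlite3

# Hypothetical stand-ins for existingTable / incomingDataTable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE existing_table (email TEXT PRIMARY KEY, new_field1 TEXT, new_field2 TEXT);
    CREATE TABLE incoming_data_table (email TEXT PRIMARY KEY, new_field1 TEXT, new_field2 TEXT);
    INSERT INTO existing_table (email) VALUES ('a@x'), ('b@x'), ('c@x');
    INSERT INTO incoming_data_table VALUES ('a@x', '1a', '2a'), ('b@x', '1b', '2b');

    -- One correlated subquery per column; only rows with a match are touched.
    UPDATE existing_table
    SET new_field1 = (SELECT i.new_field1 FROM incoming_data_table i
                      WHERE i.email = existing_table.email),
        new_field2 = (SELECT i.new_field2 FROM incoming_data_table i
                      WHERE i.email = existing_table.email)
    WHERE EXISTS (SELECT 1 FROM incoming_data_table i
                  WHERE i.email = existing_table.email);
    """
)
rows = conn.execute("SELECT * FROM existing_table ORDER BY email").fetchall()
print(rows)
```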

Compare Temp table and Main table to find matches

I have two tables, #mtss and tbl_Snapshot:
#mtss ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay])
tbl_Snapshot ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay])
I need to compare the two tables and find matches:
if [MM] and [ProjectID] in tbl_Snapshot match a row in #mtss, then delete the record in tbl_Snapshot and insert the record from #mtss.
Thanks in advance.

Since you're on SQL Server 2008, you can use MERGE to perform this as a single logical operation:
;merge into tbl_Snapshot s
using #mtss m on s.MM = m.MM and s.ProjectId = m.ProjectId
when matched then update
set
YYYY = m.YYYY,
month_Start = m.month_Start
/* Other columns as well, not going to type them all out */
;
With additional match clauses, this would also easily extend to other cases you may have to deal with, such as data that exists in only one table and not the other.
Of course, thinking further, in this case a simple UPDATE would work also. A DELETE followed by an INSERT (where the deleted and inserted rows are related by a key) is the equivalent of an UPDATE.

Something like:
DELETE ss
FROM tbl_snapshot ss
INNER JOIN #mtss m ON m.MM = ss.MM AND m.ProjectId = ss.ProjectId
(I wrote the code directly to this editor so it might have errors, but it should give you an idea how to continue)
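A sketch of the full delete-then-insert, in SQLite via Python's sqlite3 module. The matching #mtss rows are captured before the DELETE removes the evidence, and the DELETE and INSERT are committed together. mtss and matched are hypothetical stand-ins (SQLite has no # temp tables), and only a few of the columns are modeled.

```python
import sqlite3

# Hypothetical stand-ins with a subset of the columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_snapshot (mm INTEGER, yyyy INTEGER, project_id INTEGER,
                               actual_billed REAL);
    CREATE TEMP TABLE mtss (mm INTEGER, yyyy INTEGER, project_id INTEGER,
                            actual_billed REAL);
    INSERT INTO tbl_snapshot VALUES (1, 2010, 100, 10.0), (2, 2010, 100, 20.0);
    INSERT INTO mtss VALUES (1, 2010, 100, 99.0), (3, 2010, 100, 30.0);
    """
)

with conn:  # commits on success; the DELETE and INSERT land together
    # Remember which staged rows match before deleting the old versions.
    conn.execute("""
        CREATE TEMP TABLE matched AS
        SELECT m.* FROM mtss m
        WHERE EXISTS (SELECT 1 FROM tbl_snapshot s
                      WHERE s.mm = m.mm AND s.project_id = m.project_id)""")
    conn.execute("""
        DELETE FROM tbl_snapshot
        WHERE EXISTS (SELECT 1 FROM matched m
                      WHERE m.mm = tbl_snapshot.mm
                        AND m.project_id = tbl_snapshot.project_id)""")
    conn.execute("INSERT INTO tbl_snapshot SELECT * FROM matched")

rows = conn.execute("SELECT mm, actual_billed FROM tbl_snapshot ORDER BY mm").fetchall()
print(rows)
```

The #mtss row for month 3 has no match, so it is not inserted; whether such rows should be added is not specified in the question.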
