Compare Temp table and Main table to find matches - sql

I have two tables #mtss table:
#mtss
( [MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay])
tbl_Snapshot ([MM],[YYYY],[month_Start],[month_Finish],[ProjectID],[ProjectedBillable],[ProjectedPayable],[ActualBilled],[ActualPaid],[Total_To_Bill],[Total_To_Pay]
)
I need to compare two tables and find matches
if tbl_snapshot [MM] and [ProjectId] matches then delete record in tbl_snapshot and insert record from #mtts.
Thanks in advance.

Since you're on 2008, you can use MERGE to perform the as a single logical operation:
;merge into tbl_Snapshot s
using #mtss m on s.MM = m.MM and s.ProjectId = m.ProjectId
when matched then update
set
YYYY = m.YYYY,
month_Start = m.month_Start
/* Other columns as well, not going to type them all out */
;
This would also easily extend to other cases you may have to deal with, if the data only exists in one table and not the other, with addition match clauses.
Of course, thinking further, in this case a simple UPDATE would work also. A DELETE followed by an INSERT (where the deleted and inserted rows are related by a key) is the equivalent of an UPDATE.

Somethig like:
DELETE tbl_snapshot
FROM tbl_snapshot ss
INNER JOIN #mtss m ON m.MM = ss.MM AND m.ProjectId = ss.ProjectId
(I wrote the code directly to this editor so it might have errors, but it should give you an idea how to continue)

Related

Combine multiple updates with conditions, better merge?

A follow up question to SQL Server Merge: update only changed data, tracking changes?
we have been struggling to get an effective merge statement working, and are now thinking about only using updates, we have a very simple problem: Update Target from Source where values are different and record the changes, both tables are the same layout.
So, the two questions we have are: is it possible to combine this very simple update into a single statement?
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
UPDATE tbladsgroups
SET tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.DisplayName <> t.DisplayName
....for each column.
Second question.
Can we record into a separate table/variable which record has been updated?
Merge would be perfect, however we cannot see which record is updated as the data returned from OUTPUT shows all rows, as the target is always updated.
edit complete merge:
M
ERGE tblADSGroups AS TARGET
USING tblADSGroups_STAGING AS SOURCE
ON (TARGET.[SID] = SOURCE.[SID])
WHEN MATCHED
THEN UPDATE SET
TARGET.[Description]=CASE
WHEN source.[Description] != target.[Description] THEN(source.[Description]
)
ELSE target.[Description] END,
TARGET.[displayname] = CASE
WHEN source.[displayname] != target.[displayname] THEN source.[displayname]
ELSE target.[displayname] END
...other columns cut for brevity
WHEN NOT MATCHED BY TARGET
THEN
INSERT (
[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
VALUES (
source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
)
WHEN NOT MATCHED BY SOURCE
THEN
UPDATE SET ACTION='Deleted'
You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.
For example
UPDATE t
SET Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
AND NOT EXISTS (
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
);
You can do a similar thing with MERGE, if you also want to INSERT
MERGE tbladsgroups t
USING tbladsgroups_staging s
ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS ( -- do NOT place this condition in the ON
SELECT s.Description, s.DisplayName
INTERSECT
SELECT t.Description, t.DisplayName
)
THEN UPDATE SET
Description = s.Description,
DisplayName = s.DisplayName,
action = 'Updated'
WHEN NOT MATCHED
THEN INSERT (ID, Description, DisplayName)
VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO #tbl (ID, Description, DisplayName)
;
We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.
What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/
Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.
To answer your second question, the technique above would not enable you to catch which column changed. However, you can compare the old row vs the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date.
Our code ends up looking like the following:
INSERT all rows from the stage table that do not have a matching key value in the new table (new rows)
Compare stage vs dimension using the INTERSECT and store all matches in a table variable
Using the table variable, "close" all matching rows in the Dimension
Using the table variable, INSERT the new rows
If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?
groups description groups display staging description staging display
SomeValue Show Me SOME other Value Show Me
Try This Attempt Try This Working on it
Both Diff Changes Both Are Diff Change Me
So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.
In the above sample, I have three samples which if based on matching ID present entries that would need to be changed. If the value is the same in one column, but not the other and you update both columns, the net effect is the one bad column that get updated. The first would ultimately remain the same. If both are different, both get updated anyhow.
UPDATE tbladsgroups
SET tbladsgroups.Description = s.Description,
tbladsgroups.DisplayName = s.DisplayName,
tbladsgroups.action='Updated'
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
ON t.SID = s.SID
Where s.Description <> t.Description
OR s.DisplayName <> t.DisplayName
Now, all this resolution being said, you have redundant data and that is the whole point of a lookup table. The staging appears to always have the correct display name and description. Your tblAdsGroups should probably remove those two columns and always get them from staging to begin with... Something like..
select
t.*,
s.Description,
s.DisplayName
from
tblAdsGroups t
JOIN tblAdsGroups_Staging s
on t.sid = s.sid
Then you always have the correct description and display name and dont have to keep synching updates between them.

Request optimisation

I have two tables, on one there are all the races that the buses do
dbo.Courses_Bus
|ID|ID_Bus|ID_Line|DateHour_Start_Course|DateHour_End_Course|
On the other all payments made in these buses
dbo.Payments
|ID|ID_Bus|DateHour_Payment|
The goal is to add the notion of a Line in the payment table to get something like this
dbo.Payments
|ID|ID_Bus|DateHour_Payment|Line|
So I tried to do this :
/** I first added a Line column to the dbo.Payments table**/
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN [dbo].[Courses_Bus] AS Table_B
ON Table_A.ID_Bus = Table_B.ID_Bus
AND Table_A.DateHour_Payment BETWEEN Table_B.DateHour_Start_Course AND Table_B.DateHour_End_Course
And this
UPDATE
Table_A
SET
Table_A.Line = Table_B.ID_Line
FROM
[dbo].[Payments] AS Table_A
INNER JOIN (
SELECT
P.*,
CP.ID_Line AS ID_Line
FROM
[dbo].[Payments] AS P
INNER JOIN [dbo].[Courses_Bus] CP ON CP.ID_Bus = P.ID_Bus
AND CP.DateHour_Start_Course <= P.Date
AND CP.DateHour_End_Course >= P.Date
) AS Table_B ON Table_A.ID_Bus = Table_B.ID_Bus
The main problem, apart from the fact that these requests do not seem to work properly, is that each table has several million lines that are increasing every day, and because of the datehour filter (mandatory since a single bus can be on several lines everyday) SSMS must compare each row of the second table to all rows of the other table.
So it takes an infinite amount of time, which will increase every day.
How can I make it work and optimise it ?
Assuming that this is the logic you want:
UPDATE p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course;
To optimize this query, then you want an index on Courses_Bus(ID_Bus, DateHour_Start_Course, DateHour_End_Course).
There might be slightly more efficient ways to optimize the query, but your question doesn't have enough information -- is there always exactly one match, for instance?
Another big issue is that updating all the rows is quite expensive. You might find that it is better to do this in loops, one chunk at a time:
UPDATE TOP (10000) p
SET p.Line = cb.ID_Line
FROM [dbo].[Payments] p JOIN
[dbo].[Courses_Bus] cb
ON p.ID_Bus = cb.ID_Bus AND
p.DateHour_Payment BETWEEN cb.DateHour_Start_Course AND cb.DateHour_End_Course
WHERE p.Line IS NULL;
Once again, though, this structure depends on all the initial values being NULL and an exact match for all rows.
Thank you Gordon for your answer.
I have investigated and came with this query :
MERGE [dbo].[Payments] AS p
USING [dbo].[Courses_Bus] AS cb
ON p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment>= cb.DateHour_Start_Course AND
p.DateHour_Payment<= cb.DateHour_End_Course
WHEN MATCHED THEN
UPDATE SET p.Line = cb.ID_Ligne;
As it seems to be the most suitable in an MS-SQL environment.
It also came with the error :
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I understood this to mean that it finds several lines with identical
[p.ID_Bus= cb.ID_Bus AND
p.DateHour_Payment >= cb.DateHour_Start_Course AND
p.DateHour_Payment <= cb.DateHour_End_Course]
Yes, this is a possible case, however the ID is different each time.
For example, if two blue cards are beeped at the same time, or if there is a loss of network and the equipment has been updated, thus putting the beeps at the same time. These are different lines that must be treated separately, and you can obtain for example:
|ID|ID_Bus|DateHour_Payments|Line|
----------------------------------
|56|204|2021-01-01 10:00:00|15|
----------------------------------
|82|204|2021-01-01 10:00:00|15|
How can I improve this query so that it takes into account different payment IDs?
I can't figure out how to do this with the help I find online. Maybe this method is not the right one in this context.

MERGE vs. UPDATE

I was trying to look for it online but couldn't find anything that will settle my doubts.
I want to figure out which one is better to use, when and why?
I know MERGE is usually used for an upsert, but there are some cases that a normal update with with subquery has to select twice from the table(one from a where clause).
E.G.:
MERGE INTO TableA s
USING (SELECT sd.dwh_key,sd.serial_number from TableA#to_devstg sd
where sd.dwh_key = s.dwh_key and sd.serial_number <> s.serial_number) t
ON(s.dwh_key = t.dwh_key)
WHEN MATCHED UPDATE SET s.serial_number = t.serial_number
In my case, i have to update a table with about 200mil records in one enviorment, based on the same table from another enviorment where change has happen on serial_number field. As you can see, it select onces from this huge table.
On the other hand, I can use an UPDATE STATEMENT like this:
UPDATE TableA s
SET s.serial_number = (SELECT t.serial_number
FROM TableA#to_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key)
WHERE EXISTS (SELECT 1
FROM TableA#To_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key
AND t.serial_number <> s.serial_number)
As you can see, this select from the huge table twice now. So, my question is, what is better? why?.. which cases one will be better than the other..
Thanks in advance.
I would first try to load all necessary data from remote DB to the temporary table and then work with that temporary table.
create global temporary table tmp_stage (
dwh_key <your_dwh_key_type#to_devstg>,
serial_number <your_serial_number_type##to_devstg>
) on commit preserve rows;
insert into tmp_stage
select dwh_key, serial_number
from TableA#to_devstg sd
where sd.dwh_key = s.dwh_key;
/* index (PK on dwh_key) your temporary table if necessary ...*/
update (select
src.dwh_key src_key,
tgt.dwh_key tgt_key,
src.serial_number src_serial_number,
tgt.serial_number tgt_serial_number
from tmp_stage src
join TableA tgt
on src.dwh_key = tgt.dwh_key
)
set src_serial_number = tgt_serial_number;

SQL joins before where clause when selecting columns?

This is kind of a weird one. This particular piece of bad database design has caught me out so many times and I've always had to make ridiculous work arounds and this is no exception. To summarize: I have 3 tables, the first one is a lookup table of questions, the second is a lookup table of answers and the third stores a questions and answer id to show which questions have been answered. So far straight forward.
However the answer can be 1 of 3 types: Free text, multiple choice or multiple selection and these are all stored in the same column (Answer). Free text can be anything, like 'Hello' or a datetime '2015-07-03 00:00:00'. Multiple choice gets stored as integers 1 or 49 etc and Multiple selection gets stored as a delimited string '1,4,7,8' (i know this is very bad design, a column shouldn't store more than 1 value however it is before my time and written into our aspx web application, as I work on my own I simply do not have the resource or time to change it)
Here comes the problem; take a look at this query:
Select *
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1
The where clause should ensure that the only questions that are returned are multiple choice. (So I am not joining a free text answer to an integer) and in fact when I run this query it runs ok but when i try to select individual columns it errors out with this error message:
Conversion failed when converting the varchar value ',879' to data type smallint.
It almost like it's doing the join before it does the where although I know the query optimizer doesn't work that way. The problem is I need to select column names as this is going into a table so I need to define the column names. ?Is there anything I can do? I've tried for ages but with no results. I should mention that I am running SQL Server 2005.
Many thanks in advance
EDIT:
This is the query that causes an error:
Select LK_Answers.Answer
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1
The issue in your query is that you have a datatype mismatch on the columns you're using to join your tables together.
One of the easiest ways to correct this is to explicitly CAST both sides of the join to VARCHAR so that there is no problem matching datatypes when joining. This isn't ideal, but if you're not able to change the table schema then you have to work around it.
SQL Fiddle Demo
CREATE TABLE LeftTable
(
id INT
);
CREATE TABLE RightTable
(
id VARCHAR(30)
);
INSERT INTO LeftTable (id)
VALUES (1), (2), (3), (4), (5);
INSERT INTO RightTable (id)
VALUES ('1'), ('2'), ('3'), ('4'), ('5'), (',879');
SELECT l.*, r.*
FROM LeftTable l
JOIN RightTable r ON CAST(l.id AS VARCHAR(30)) = CAST(r.id AS VARCHAR(30))
WHERE l.id = '1'
You can use a subselect for this.
Select *
FROM AnswersTable
JOIN ( SELECT * FROM LK_Questions Where QuestionTypeID = 1) as LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
You are joining the ID field of the lookup table against the answer field of the answer table. Is this OK?
Pay attention to those fields;
AnswersTable.QuestionID, LK_Questions.QuestionID
AnswersTable.Answer , LK_Answers.AnswerID and LK_Questions.QuestionTypeID
are they sharing the same datatype?
if so, you should change your query like;
SELECT * FROM AnswersTable RIGHT JOIN LK_Questions ON LK_Questions.QuestionTypeID = AnswersTable.QuestionID JOIN LK_Answers ON AnswersTable.Answer = LK_Answers.AnswerID where LK_Questions.QuestionTypeID = 1

Determining best method to traverse a table and update another table

I am using Delphi 7, BDE, and Interbase (testing), Oracle (Production).
I have two tables (Master, Responses)
I need to step through the Responses table, use its Master_Id field to look it up in Master table (id) for matching record and update a date field in the Master table with a date field in the Responses table
Can this be done in SQL, or do i actually have to create two TTables or TQueries and step through each record?
Example:
Open two tables (Table1, Table2)
with Table1 do
begin
first;
while not EOF do
begin
//get master_id field
//locate in id field in table 2
//edit record in table 2
next;
end;
end;
thanks
One slight modification to Chris' query, throw in a where clause to select only the records that need the update. Otherwise it will set the rest of the dates to NULL
UPDATE Master m
SET
m.date = (SELECT r.date FROM Reponses r WHERE r.master_id = m.id)
WHERE m.id IN (SELECT master_id FROM Responses)
Updated to use aliases to avoid confusion which col comes from which table.
This is not ready made, copy-past'able query as UPDATE syntax differs from database to database.
You may need to consult your database sql reference for JOIN in UPDATE statement syntax.
When there are multiple responses to same master entry
UPDATE Master m
SET m.date = (
SELECT MAX(r.date) FROM Reponses r WHERE r.master_id = m.id)
WHERE m.id IN (SELECT master_id FROM Responses)
I used MAX() you can use whatever suits your business.
Again invest some time understanding SQL. Its hardly a few days effort. Get PLSQL Complete reference if you are into Oracle
Try this SQL (changing names to fit your situation)
UPDATE Master m
SET date = ( SELECT date FROM Responses WHERE id = m.id )