Oracle updating table based on a join - sql

I have two tables
table 1 : rm_example(customer, weekno, salenum, card_type,...., imputed)
table 2 : rm_dummy(customer, weekno, imputed)
The imputed column in table one is null(all columns).
I want to set "imputed" column in table 1 with the value of "imputed" in table two where customer and weekno match....
below the query I wrote.....but it is taking forever to execute...
update rm_example e
set e.imputed =
(select imputed
from rm_dummy d
inner join rm_example e on e.customer=d.customer and e.weekno=d.weekno)...
Is something wrong with the query?
I am working on remote database using sqldeveloperplus...and we are talking about million rows.

MERGE is usually quite a bit faster than an UPDATE with a subquery (the syntax might seem a little bit weird, but you'll get used to it); this assumes rm_example has a primary key column PK:
MERGE INTO rm_example target
USING
(SELECT e.pk as e_pk,
d.imputed
FROM rm_dummy d
INNER JOIN rm_example e ON e.customer=d.customer AND e.weekno=d.weekno) src
ON (target.pk = src.e_pk)
WHEN MATCHED THEN UPDATE
SET target.imputed = src.imputed;

Not sure if it will be faster than what you have done already but you try this
UPDATE
(SELECT e.imputed, d.imputed
FROM rm_example e
INNER JOIN rm_dummy d ON e.customer = d.customer AND e.weekno = d.weekno)
SET e.imputed = d.imputed;
After reading 8 Bulk Update Methods Compared (Oracle) I see this is really a Deprecated method that a MERGE syntax should be used. But in saying that depending on your system this could possibly have better performance

Related

What if the column to be indexed is nvarchar data type in SQL Server?

I retrieve data by joining multiple tables as indicated on the image below. On the other hand, as there is no data in the FK column (EmployeeID) of Event table, I have to use CardNo (nvarchar) fields in order to join the two tables. On the other hand, the digit numbers of CardNo fields in the Event and Employee tables are different, I also have to use RIGHT function of SQL Server and this makes the query to be executed approximately 10 times longer. So, in this scene what should I do? Can I use CardNo field without changing its data type to int, etc (because there are other problem might be seen after changing it and it sill be better to find a solution without changing the data type of it). Here is also execution plan of the query below.
Query:
; WITH a AS (SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID),
b AS (SELECT eve.EventID, eve.EventTime, eve.CardNo, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC
You can add a computed column to your table like this:
ALTER TABLE TEmployee -- Don't start your table names with prefixes, you already know they're tables
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
ALTER TABLE TEvent
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
CREATE INDEX TEmployee_CardNoRight8_IDX ON TEmployee (CardNoRight8)
CREATE INDEX TEvent_CardNoRight8_IDX ON TEvent (CardNoRight8)
You don't need to persist the column since it already matches the criteria for a computed column to be indexed, but adding the PERSISTED keyword shouldn't hurt and might help the performance of other queries. It will cause a minor performance hit on updates and inserts, but that's probably fine in your case unless you're importing a lot of data (millions of rows) at a time.
The better solution though is to make sure that your columns that are supposed to match actually match. If the right 8 characters of the card number are something meaningful, then they shouldn't be part of the card number, they should be another column. If this is an issue where one table uses leading zeroes and the other doesn't then you should fix that data to be consistent instead of putting together work arounds like this.
This line is what is costing you 86% of the query time:
LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
This is happening because it has to run RIGHT() on those fields for every row and then match them with the other table. This is obviously going to be inefficient.
The most straightforward solution is probably to either remove the RIGHT() entirely or else to re-implement it as a built-in column on the table so it doesn't have to be calculated on the fly while the query is running.
While inserting the record, you would have to also insert the eight, right digits of the card number and store it in this field. My original thought was to use a computed column but I don't think those can be indexed so you'd have to use a regular column.
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, eve.CardNoRightEight, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID
)
SELECT *
FROM b
LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight
ORDER BY b.EventID ASC
This will help you see how to add a calculated column to your database.
create table #temp (test varchar(30))
insert into #temp
values('000456')
alter table #temp
add test2 as right(test, 3) persisted
select * from #temp
The other alternative is to fix the data and the data entry so that both columns are the same data type and contain the same leading zeros (or remove them)
Many thanks all of your help. With the help of your answers, I managed to reduce the query execution time from 2 minutes to 1 at the first step after using computed columns. After that, when creating an index for these columns, I managed to reduce the execution time to 3 seconds. Wow, it is really perfect :)
Here are the steps posted for those who suffers from a similar problem:
Step I: Adding computed columns to the tables (As CardNo fields are nvarchar data type, I specify data type of computed columns as int):
ALTER TABLE TEvent ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
ALTER TABLE TEmployee ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
Step II: Create index for the computed columns in order to execute the query faster:
CREATE INDEX TEmployee_CardNoRightEight_IDX ON TEmployee (CardNoRightEight)
CREATE INDEX TEvent_CardNoRightEight_IDX ON TEvent (CardNoRightEight)
Step 3: Update the query by using the computed columns in it:
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight --emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, evt.EventCH, dor.DoorName, eve.CardNoRightEight --eve.CardNo
FROM TEvent eve
LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight --ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC

SQL: execute changes in inner joins AFTER changes in a row values of table

My question is somewhat conceptual.
I have a procedure on SQL where I join a few tables together. As an example: "mkt_Original" contains codes (A1,A3,etc) while "mkt_Desc" contains descriptors (Aruba, Argentina, etc). They are joined by an inner join and added to a temporary table:
SELECT CountryDesc, ColumnX into mkt_Temp
FROM mkt_Original
INNER JOIN mkt_Desc ON Mkt_Original.CountryCode = mkt_Desc.CountryCode
After generating the temporary table, I do a few changes to the country codes depending on the values of the other columns (ColumnX). For example:
UPDATE mkt_Temp
SET mkt_Temp.CountryCode = A3
WHERE mkt_Temp.ColumnX = 3
However, once i'm done with all the updates, i'm curious as to how the Country "description" will update (because now theres another country code). For example if originally the value A4 was stored it would bring back Japan and when I update to A3 it should say China.
Do I have to run another INNER JOIN query between the temporary table and all the other ones or is there a more efficient way? I was reading about Triggers but i'm not sure how one for this situation would look like. Any help or code for a trigger would be very useful!
Thanks in advance for your time.
If i were you I will be writing a case statement in the join condition itself if there are not a lot of changes
SELECT CountryDesc, ColumnX into mkt_Temp
FROM mkt_Original
INNER JOIN mkt_Desc
ON (case when Mkt_Original.CountryCode = (case
when ColumnX = 3 then 'A3' else
mkt_Desc.CountryCode end)

INNER LOOP JOIN Failing

I need to update a field called FamName in a table called Episode with randomly generated Germanic names from a different table called Surnames which has a single column called Surname.
To do this I have first added an ID field and NONCLUSTERED INDEX to my Surnames table
ALTER TABLE Surnames
ADD ID INT NOT NULL IDENTITY(1, 1);
GO
CREATE UNIQUE NONCLUSTERED INDEX idx ON Surnames(ID);
GO
I then attempt to update my Episode table via
UPDATE E
SET E.FamName = S.Surnames
FROM Episode AS E
INNER LOOP JOIN Surnames AS S
ON S.ID = (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames)));
GO
where I am attempting to force the query to 'loop' using the LOOP join hint. Of course if I don't force the optimizer to loop (using LOOP) I will get the same German name for all rows. However, this query is strangely returning zero rows affected.
Why is this returning zero affected rows and how can this be amended to work?
Note, I could use a WHILE loop to perform this update, but I want a succinct way of doing this and to find out what I am doing wrong in this particular case.
You cannot (reliably) affect query results with join hints. They are performance hints, not semantic hints. You try to rely on undefined behavior.
Moving the random number computation out of the join condition into one of the join sources prevents the expression to be treated as a constant:
UPDATE E
SET E.FamName = S.Surnames
FROM (
SELECT *, (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames))) AS SurnameID
FROM Episode AS E
) E
INNER LOOP JOIN Surnames AS S ON S.ID = E.SurnameID
The derived table E adds the computed SurnameID as a new column.
You don't need join hints any longer. I just tested that this works in my specific test case although I'm not whether this is guaranteed to work.

Optimizing sql update syntax rather than the Server

There are two tables, one is getting updated based on second table. The SQL is working but it is taking too much time because of number of records, i think. See this fiddle. The actual master table contains 1,500,000 and child contains 700,000 records and the following sql kept executing for 4 hours, hence terminated.
UPDATE master m SET m.amnt = (SELECT amnt FROM child c WHERE c.seqn = m.seqn)
WHERE m.seqn IN (SELECT seqn FROM child);
The execution plan of this sql is (Red one is master, other is child)
seqn is the primary key. No doubt it all depends upon server's performance and stats of indices. However, It bothers me that master is not being accessed by the index and child is being read twice. It is possible that the sql is optimized but oracle decided to go in this way, however i tried to optimize the sql as
UPDATE (
SELECT m.seqn m_seqn,c.seqn c_seqn, c.amnt c_amnt, m.amnt m_amnt
FROM master m INNER JOIN child c ON m.seqn = c.seqn)
SET m_amnt = c_amnt
which resulted in following error
ORA-01779: cannot modify a column which maps to a non key-preserved
table : UPDATE ( SELECT m.seqn m_seqn,c.seqn c_seqn, c.amnt c_amnt, m.amnt m_amnt
FROM master m INNER JOIN child c ON m.seqn = c.seqn) SET m_amnt = c_amnt
Is there any way i can optimize the SQL other than updating stats and tuning up the server?
EDIT The solution by #Sebas will not work if the column to be JOINED ON is not PK
check this one out:
UPDATE
(
SELECT m.amnt AS tochange, c.amnt AS newvalue
FROM child c
JOIN master m ON c.seqn = m.seqn
) t
SET t.tochange = t.newvalue;
SELECT * FROM master;
fiddle: http://www.sqlfiddle.com/#!4/c6b73/2
you just missed the PK in the fiddle.

Error while posting to a table

Platform used:
SQL Server 2008 and C++ Builder
I am doing an inner join between 2 tables which was giving me an error:
Row cannot be located for updating
Query:
SELECT DISTINCT
b.Acc, b.Region, b.Off, b.Sale, a.OrgDate
FROM
sales b
INNER JOIN
dates a ON (a.Acc = b.Acc and a.Region = b.Region and a.year= b.year)
WHERE
(a.xdate <> a.yDate)
and (b.Sale = a.SaleDate)
and b.year = 2010
Note: Acc, Region, Off are primary keys of table b and are also present in table a.
Table a has an id which is the primary key which does not appear in the query.
It turned out that my inner join was returning duplicate rows.
I changed my inner join query to use 'DISTINCT' so that only distinct rows are returned and not duplicate. The query runs but then I get the error:
Insufficient key column information for updating or refreshing.
It does turn out that the fields which are primary keys in Table A have the same names as the fields in Table B
I found that this is a bug which occurs while updating ADO record-sets.
BUG: Problem Updating ADO Hierarchical Recordset When Join Tables Share Same Column Name
I have the following 2 questions:
Is it not a good idea to use Distinct on an inner join query?
Has anyone found a resolution for that bug associated with TADO Query's?
Thank you,
The way I would solve this is to construct an update query by hand and run it through TADOQuery.ExecSQL. That assumes you actually know what you are doing.
The question is WHY are you working on a recordset that results in multiples of the same row, on all fields? You should be inspecting your query and fixing it. DISTINCT doesn't help, because SQL Server has picked one record but ADO won't know which one it picked, since there isn't enough information to properly identify the source on each side of the JOIN.
This query pulls in a.id to make the source records identifiable:
SELECT Acc,Region,Off,Sale,OrgDate,id
FROM
(
SELECT b.Acc,b.Region,b.Off,b.Sale,a.OrgDate, a.id,
rn=row_number() over (partition by b.Acc,b.Region,b.Off order by a.id asc)
FROM sales b
JOIN dates a ON(a.Acc = b.Acc and a.Region = b.Region and a.year= b.year)
WHERE a.xdate <> a.yDate
and b.Sale = a.SaleDate
and b.year = 2010
) X
WHERE rn=1;
Not tested, but it should work with ADO