Problem
I'm trying to refactor a low-performing MERGE statement to an UPDATE statement in Oracle 12.1.0.2.0. The MERGE statement looks like this:
MERGE INTO t
USING (
SELECT t.rowid rid, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
) s
ON (t.rowid = s.rid)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
It performs poorly mainly because it accesses the large (100M-row) table t twice.
Schema
These are the simplified tables involved:
t: The target table, whose account_no column is being migrated.
u: The migration instruction table, containing an account_no_old → account_no_new mapping.
v: An auxiliary table modelling a to-one relationship between contract_id and tenant_id.
The schema is:
CREATE TABLE v (
contract_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL
);
CREATE TABLE t (
t_id NUMBER(18) NOT NULL PRIMARY KEY,
-- tenant_id column is missing here
account_no NUMBER(18) NOT NULL,
contract_id NUMBER(18) NOT NULL REFERENCES v
);
CREATE TABLE u (
u_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL,
account_no_old NUMBER(18) NOT NULL,
account_no_new NUMBER(18) NOT NULL,
UNIQUE (tenant_id, account_no_old)
);
I cannot modify the schema. I'm aware that adding t.tenant_id would solve the problem by making the JOIN to v unnecessary.
Alternative MERGE doesn't work:
ORA-38104: Columns referenced in the ON Clause cannot be updated
Note that the self-join cannot be avoided, because this alternative, equivalent query leads to ORA-38104:
MERGE INTO t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
UPDATE view doesn't work:
ORA-01779: cannot modify a column which maps to a non-key-preserved table
Intuitively, I would apply transitive closure here, which should guarantee that for each updated row in t, there is at most one row in u and at most one row in v. But apparently Oracle doesn't recognise this, so the following UPDATE statement doesn't work:
UPDATE (
SELECT t.account_no, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
)
SET account_no = account_no_new
The above raises ORA-01779. Adding the undocumented hint /*+BYPASS_UJVC*/ does not seem to work anymore on 12c.
How to tell Oracle that the view is key preserving?
In my opinion, the view is still key preserving, i.e. for each row in t, there is exactly one row in v, and thus at most one row in u. The view should thus be updatable. Is there any way to rewrite this query to make Oracle trust my judgement?
Or is there any other syntax I'm overlooking that prevents the MERGE statement's double access to t?
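The key-preservation claim can at least be sanity-checked against data. This is a minimal sketch, assuming Python's built-in sqlite3 module and a few made-up rows in the simplified schema. The rows and the helper max_matches_per_t_row are hypothetical. It counts how many (u, v) combinations each t row joins to. A result of 1 means no t row would be updated twice for this data. It is a check, not a proof, and it does not change how Oracle decides ORA-01779.

```python
import sqlite3

# Made-up rows in the simplified t/u/v schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v (contract_id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL);
CREATE TABLE t (t_id INTEGER PRIMARY KEY,
                account_no INTEGER NOT NULL,
                contract_id INTEGER NOT NULL REFERENCES v);
CREATE TABLE u (u_id INTEGER PRIMARY KEY,
                tenant_id INTEGER NOT NULL,
                account_no_old INTEGER NOT NULL,
                account_no_new INTEGER NOT NULL,
                UNIQUE (tenant_id, account_no_old));
INSERT INTO v VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO t VALUES (10, 7, 1), (11, 7, 3), (12, 8, 2), (13, 9, 3);
INSERT INTO u VALUES (1, 100, 7, 70), (2, 200, 7, 700), (3, 100, 8, 80);
""")

def max_matches_per_t_row(conn):
    """Largest number of joined (u, v) rows for any single t row (0 if none join)."""
    row = conn.execute("""
        SELECT COALESCE(MAX(cnt), 0) FROM (
            SELECT t.t_id, COUNT(*) AS cnt
            FROM t
            JOIN v ON t.contract_id = v.contract_id
            JOIN u ON u.account_no_old = t.account_no
                  AND u.tenant_id = v.tenant_id
            GROUP BY t.t_id)
    """).fetchone()
    return row[0]

print(max_matches_per_t_row(conn))  # 1: no t row joins more than one u row
```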
Is there any way to rewrite this query to make Oracle trust my judgement?
I've managed to "convince" Oracle to run the MERGE by introducing a helper column in the target:
MERGE INTO (SELECT (SELECT t.account_no FROM dual) AS account_no_temp,
t.account_no, t.contract_id
FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no_temp = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
EDIT
A variation of the idea above, with the subquery moved directly into the ON clause:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON ((SELECT t.account_no FROM dual) = s.account_no_old
AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
Related article: Columns referenced in the ON Clause cannot be updated
EDIT 2:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id) s
ON((t.account_no,t.contract_id,'x')=((s.account_no_old,s.contract_id,'x')) OR 1=2)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
You may define a temporary table containing the pre-joined data from U and V.
Back it with a unique index on contract_id, account_no_old (which should be unique).
Then you may use this temporary table in an updateable join view.
create table tmp as
SELECT v.contract_id, u.account_no_old, u.account_no_new
FROM u, v
WHERE v.tenant_id = u.tenant_id;
create unique index tmp_ux1 on tmp ( contract_id, account_no_old);
UPDATE (
SELECT t.account_no, tmp.account_no_new
FROM t, tmp
WHERE t.account_no = tmp.account_no_old
AND t.contract_id = tmp.contract_id
)
SET account_no = account_no_new
;
Here is an attempt with a simpler UPDATE. It still requires a subquery.
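update t
set t.account_no = (SELECT u.account_no_new
FROM u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id)
-- only update rows that have a mapping; otherwise account_no would be set to NULL
where exists (SELECT 1
FROM u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id);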
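A minimal sketch of that correlated UPDATE, run in SQLite through Python's sqlite3 module with made-up rows. SQLite only stands in for Oracle here, so treat it as illustrative. The sketch adds a WHERE EXISTS that repeats the lookup, so rows without a mapping (t_id 13 below) keep their account_no. Without that guard, the subquery yields NULL for such rows. The NOT NULL column then rejects the update: SQLite raises a constraint error, and Oracle raises ORA-01407.

```python
import sqlite3

# Hypothetical sample rows; t_id 13 has no mapping in u.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v (contract_id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL);
CREATE TABLE t (t_id INTEGER PRIMARY KEY, account_no INTEGER NOT NULL,
                contract_id INTEGER NOT NULL);
CREATE TABLE u (u_id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL,
                account_no_old INTEGER NOT NULL, account_no_new INTEGER NOT NULL,
                UNIQUE (tenant_id, account_no_old));
INSERT INTO v VALUES (1, 100), (3, 200);
INSERT INTO t VALUES (10, 7, 1), (11, 7, 3), (13, 9, 3);
INSERT INTO u VALUES (1, 100, 7, 70), (2, 200, 7, 700);
""")

# The same correlated lookup appears in SET and in WHERE EXISTS,
# so only rows that actually have a mapping are touched.
conn.execute("""
UPDATE t
SET account_no = (SELECT u.account_no_new
                  FROM u, v
                  WHERE t.account_no = u.account_no_old
                    AND t.contract_id = v.contract_id
                    AND v.tenant_id = u.tenant_id)
WHERE EXISTS (SELECT 1
              FROM u, v
              WHERE t.account_no = u.account_no_old
                AND t.contract_id = v.contract_id
                AND v.tenant_id = u.tenant_id)
""")
rows = conn.execute("SELECT t_id, account_no FROM t ORDER BY t_id").fetchall()
print(rows)  # [(10, 70), (11, 700), (13, 9)]
```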
Bobby
Related
In Postgres 9.5, I want to connect to another DB using Postgres' dblink, get data and then use them to update another table.
-- connect to another DB, get data from table, put it in a WITH
WITH temp_table AS
(
SELECT r_id, descr, p_id
FROM
dblink('myconnection',
'SELECT
r_id, descr, p_id
FROM table
WHERE table.p_id
IN (10,20);'
)
AS tempTable(r_id integer, descr text, p_id integer)
)
-- now use temp_table to update
UPDATE anothertable
SET
descr =temp_table.descr
FROM anothertable AS x
INNER JOIN temp_table
ON
x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2) ;
dblink works fine and if I do select * from temp_table before the UPDATE, it has data.
The issue is the UPDATE itself. It runs with no errors, but it never actually updates the table.
I tried changing the UPDATE to:
UPDATE anothertable
SET
descr =temp_table.descr
FROM anothertable AS x , temp_table
WHERE x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2) ;
Same as above: runs with no errors, but it never actually updates the table.
I also tried to change the UPDATE to:
UPDATE anothertable
INNER JOIN temp_table
ON x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2)
SET descr =temp_table.descr
But I get:
ERROR: syntax error at or near "INNER" SQL state: 42601
Character: 1894
How can I fix this to actually update?
Don't repeat the target table in the FROM clause of the UPDATE:
WITH temp_table AS ( ... )
UPDATE anothertable x
SET descr = t.descr
FROM temp_table t
WHERE x.r_id = t.r_id
AND x.p_id = t.p_id
AND x.p_id IN (2);
Or simplified:
...
AND x.p_id = 2
AND t.p_id = 2
The manual:
Do not repeat the target table as a from_item unless you intend a self-join (in which case it must appear with an alias in the from_item).
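SQLite 3.33 and later accept the same PostgreSQL-style UPDATE ... FROM form, so the corrected shape can be tried locally with Python's sqlite3 module. This is a minimal sketch with hypothetical rows. A plain table named temp_table stands in for the dblink CTE. The target table appears only once, and only the row whose r_id and p_id match, with p_id = 2, is changed.

```python
import sqlite3

# Hypothetical rows; temp_table stands in for the CTE built from dblink.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anothertable (r_id INTEGER, p_id INTEGER, descr TEXT);
CREATE TABLE temp_table (r_id INTEGER, p_id INTEGER, descr TEXT);
INSERT INTO anothertable VALUES (1, 2, 'old'), (2, 2, 'old'), (3, 5, 'old');
INSERT INTO temp_table VALUES (1, 2, 'new-1'), (3, 5, 'new-3');
""")

# The target table is named once; temp_table is the only FROM item.
conn.execute("""
UPDATE anothertable AS x
SET descr = t.descr
FROM temp_table AS t
WHERE x.r_id = t.r_id
  AND x.p_id = t.p_id
  AND x.p_id = 2
""")
rows = conn.execute("SELECT r_id, descr FROM anothertable ORDER BY r_id").fetchall()
print(rows)  # [(1, 'new-1'), (2, 'old'), (3, 'old')]
```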
Related:
UPDATE statement with multiple joins in PostgreSQL
SQL update query with substring WHERE clause
I have to get rid of some unnecessary data from my Postgresql database.
Here is the query which works for small data:
WITH bad_row_history(survey_id, template_id) AS ((
SELECT row_id, (row_value->>'template_id')::INTEGER
FROM public.row_history
WHERE record_table='survey_storage'
AND row_value->>'status'IN ('Never Surveyed','Incomplete Configuration')
AND row_id NOT IN (
SELECT row_id
FROM public.row_history
WHERE record_table='survey_storage'
AND row_value->>'status'='Ready to Launch'
)
) LIMIT 10),
delete_su AS (
DELETE FROM survey_user
WHERE survey_id = ANY(ARRAY(select survey_id FROM bad_row_history))
),
delete_slu AS(
DELETE FROM survey_library_users
WHERE survey_library_id = ANY(ARRAY(select template_id FROM bad_row_history))
),
delete_ss AS(
DELETE FROM survey_storage
WHERE id = ANY(ARRAY(select survey_id FROM bad_row_history))
),
delete_sl AS(
DELETE FROM survey_library
WHERE id = ANY(ARRAY(select template_id FROM bad_row_history))
)
delete FROM row_history
WHERE row_id = ANY(ARRAY(select survey_id FROM bad_row_history))
In the CTE, you will see that I have added a LIMIT; otherwise the query never completes. Without the LIMIT, the CTE yields 937,147 rows. There are 5 DELETE statements. For each DELETE there could be at least one row, and at most 3 to 5 rows.
I have 3 questions:
Can the query be improved? Should I use a join instead of a subquery?
Should I split the single script into multiple scripts, and should I use pg_cron?
If I remove the LIMIT, will it be able to handle the volume?
I understand this will be a time-consuming job, and that's fine, but it should at least work and not hang. Yesterday I ran it without the LIMIT; after running for a few hours it hung, and all the deletions were rolled back. Earlier, with small limits like 10 or 100, it worked.
UPDATE: As suggested, I have introduced a temp table and rewritten the deletes to use subqueries against it. Here is the script:
DROP TABLE IF EXISTS bad_row_history;
CREATE TEMPORARY TABLE bad_row_history (
survey_id int8 NOT NULL,
template_id int8 NOT NULL
);
INSERT INTO bad_row_history(survey_id, template_id)
(SELECT row_id, (row_value->>'template_id')::INTEGER
FROM public.row_history
WHERE record_table='survey_storage'
AND row_value->>'status' IN ('Never Surveyed','Incomplete Configuration')
AND row_id NOT IN (
SELECT row_id
FROM public.row_history
WHERE record_table='survey_storage'
AND row_value->>'status'='Ready to Launch'
)
);
-- analyze after the table is filled, so the planner sees real statistics
ANALYZE bad_row_history;
DELETE FROM survey_user
WHERE survey_id IN (select survey_id FROM bad_row_history);
DELETE FROM survey_library_users
WHERE survey_library_id IN(select template_id FROM bad_row_history);
DELETE FROM survey_storage
WHERE id IN(select survey_id FROM bad_row_history);
DELETE FROM survey_library
WHERE id IN(select template_id FROM bad_row_history);
delete FROM row_history
WHERE row_id IN(select survey_id FROM bad_row_history);
UPDATE-2
disable_triggers.sql
ALTER TABLE survey_user DISABLE TRIGGER ALL;
ALTER TABLE survey_storage DISABLE TRIGGER ALL;
ALTER TABLE survey_library DISABLE TRIGGER ALL;
script
CREATE TEMPORARY TABLE bad_survey (
survey_id int8 NOT NULL,
template_id int8 NOT NULL
);
insert into bad_survey(survey_id, template_id)
(select id as survey_id, template_id
from survey_storage
where status in ('Never Surveyed','Incomplete Configuration','Ready to Launch')
and id=original_row_id
and tenant_id=owner_tenant_id
and tenant_id=5);
insert into bad_survey(survey_id, template_id)
(select pss.id, pss.template_id
from survey_storage css
inner join company_by_path cbp
on css.company_by_path_id = cbp.id
and css.tenant_id = cbp.tenant_id
and cbp.relationship_type = 'partner'
inner join survey_storage pss
on cbp.owner_tenant_id = pss.tenant_id
and css.master_template_id = pss.master_template_id
and css.tenant_id = pss.owner_tenant_id
and css.source_id = pss.source_id
and css.tenant_id != pss.tenant_id
and css.template_id != pss.template_id
and pss.id != pss.original_row_id
where css.id in (select id as survey_id
from survey_storage
where status in ('Never Surveyed','Incomplete Configuration','Ready to Launch')
and id=original_row_id
and tenant_id=owner_tenant_id
and tenant_id=5));
-- analyze after both inserts, so the statistics reflect the data
analyze bad_survey;
DELETE FROM survey_user su
USING bad_survey bs
WHERE su.survey_id = bs.survey_id;
DELETE FROM survey_library_users slu
USING bad_survey bs
WHERE slu.survey_library_id = bs.template_id;
DELETE FROM row_history rh
USING bad_survey bs
WHERE rh.row_id = bs.survey_id;
DELETE FROM survey_storage ss
USING bad_survey bs
WHERE ss.id = bs.survey_id;
DELETE FROM survey_library sl
USING bad_survey bs
WHERE sl.id = bs.template_id;
enable_triggers.sql
ALTER TABLE survey_user ENABLE TRIGGER ALL;
ALTER TABLE survey_storage ENABLE TRIGGER ALL;
ALTER TABLE survey_library ENABLE TRIGGER ALL;
Instead of doing everything in a single statement, proceed like this:
Create a temporary table from the result of the first CTE.
ANALYZE that temporary table.
Run one DELETE statement per table, joining with the temporary table.
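The three steps can be sketched with Python's sqlite3 module. This is a rough sketch under assumptions. The tables are cut down to hypothetical id and status columns, with the real row_value JSON replaced by a plain status column. SQLite has no DELETE ... USING, so each DELETE filters with IN (subquery) against the temporary table. In PostgreSQL, the USING join form works as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down tables: only the columns the deletes need.
conn.executescript("""
CREATE TABLE row_history (row_id INTEGER, status TEXT, template_id INTEGER);
CREATE TABLE survey_user (survey_id INTEGER);
CREATE TABLE survey_storage (id INTEGER PRIMARY KEY);
INSERT INTO row_history VALUES (1, 'Never Surveyed', 10),
                               (2, 'Ready to Launch', 20),
                               (3, 'Incomplete Configuration', 30);
INSERT INTO survey_user VALUES (1), (1), (2), (3);
INSERT INTO survey_storage VALUES (1), (2), (3);
""")

# Step 1: materialize the candidate rows once, in a temporary table.
conn.execute("""
CREATE TEMP TABLE bad_row_history AS
SELECT row_id AS survey_id, template_id
FROM row_history
WHERE status IN ('Never Surveyed', 'Incomplete Configuration')
  AND row_id NOT IN (SELECT row_id FROM row_history
                     WHERE status = 'Ready to Launch')
""")

# Step 2: gather statistics now that the table is filled.
conn.execute("ANALYZE bad_row_history")

# Step 3: one DELETE per table, each its own statement.
# Table and column names come from this fixed list, not from user input.
targets = [("survey_user", "survey_id"),
           ("survey_storage", "id"),
           ("row_history", "row_id")]
for table, col in targets:
    conn.execute(
        f"DELETE FROM {table} WHERE {col} IN (SELECT survey_id FROM bad_row_history)"
    )
conn.commit()
```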
The problem with your query is that Postgres materializes the CTE, i.e. it computes roughly a million rows and stores them in memory. The DELETE statements then convert that result to an array 5 separate times, which is very expensive and slow.
I think you could make it a lot faster by not converting to an array, i.e.
survey_library_id IN (select template_id FROM bad_row_history)
rather than
survey_library_id = ANY(ARRAY(select template_id FROM bad_row_history))
What I would probably do though is make bad_row_history a temporary table, with columns template_id, survey_id etc, and then run the deletes as separate statements with subselects on the temporary table. That way the optimiser should be able to work more effectively on each delete.
SELECT A.GRPNO, A.EMPNO, A.DEPNO, A.PENDCD FROM EMPDEP A, EEDPELIG B
WHERE A.GRPNO=B.GRPNO
AND A.EMPNO=B.EMPNO
AND A.DEPNO=B.DEPNO
AND A.GRPNO = 6606 AND A.SPOUSE = 'T'
AND B.ELIGFLAG01 = 'T' AND SNAPTHRUDT ='DEC312999'
Our SELECT statement successfully pulls the information we need. However, we're new to SQL and are struggling to write an UPDATE statement that changes a.pendcd from 0 to 20 for the rows the SELECT returns. Any help is appreciated, thank you.
update a
a.pendcd=20
FROM EMPDEP A inner join EEDPELIG B
on A.GRPNO=B.GRPNO
AND A.EMPNO=B.EMPNO
AND A.DEPNO=B.DEPNO
AND A.GRPNO = 6606 AND A.SPOUSE = 'T'
AND B.ELIGFLAG01 = 'T' AND SNAPTHRUDT ='DEC312999'
where a.pendcd=0
Oracle does not support FROM or JOIN in UPDATE (under most circumstances).
Just use EXISTS:
UPDATE EMPDEP ed
SET . . .
WHERE EXISTS (SELECT 1
FROM EEDPELIG p
WHERE ed.GRPNO = p.GRPNO AND
ed.EMPNO= p.EMPNO AND
ed.DEPNO= p.DEPNO AND
p.ELIGFLAG01 = 'T'
) AND
ed.GRPNO = 6606 AND
ed.SPOUSE = 'T' AND
ed.SNAPTHRUDT ='DEC312999';
It is unclear if the condition on SNAPTHRUDT is on the outer table or inner table. If it is on p, then move it to the subquery.
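A minimal, runnable sketch of the EXISTS approach, using SQLite via Python's sqlite3 with hypothetical rows. It spells out the update the question asks for (PENDCD from 0 to 20). It also assumes SNAPTHRUDT is an EEDPELIG column, so that condition sits in the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal tables; assumes SNAPTHRUDT is a column of EEDPELIG.
conn.executescript("""
CREATE TABLE EMPDEP (GRPNO INTEGER, EMPNO INTEGER, DEPNO INTEGER,
                     SPOUSE TEXT, PENDCD INTEGER);
CREATE TABLE EEDPELIG (GRPNO INTEGER, EMPNO INTEGER, DEPNO INTEGER,
                       ELIGFLAG01 TEXT, SNAPTHRUDT TEXT);
INSERT INTO EMPDEP VALUES (6606, 1, 1, 'T', 0), (6606, 2, 1, 'T', 0),
                          (6606, 3, 1, 'F', 0), (7000, 1, 1, 'T', 0);
INSERT INTO EEDPELIG VALUES (6606, 1, 1, 'T', 'DEC312999'),
                            (6606, 2, 1, 'F', 'DEC312999'),
                            (6606, 3, 1, 'T', 'DEC312999'),
                            (7000, 1, 1, 'T', 'DEC312999');
""")

# Conditions on EMPDEP stay in the outer WHERE; the join to EEDPELIG
# becomes a correlated EXISTS, so no FROM/JOIN is needed in the UPDATE.
conn.execute("""
UPDATE EMPDEP
SET PENDCD = 20
WHERE PENDCD = 0
  AND GRPNO = 6606
  AND SPOUSE = 'T'
  AND EXISTS (SELECT 1
              FROM EEDPELIG p
              WHERE EMPDEP.GRPNO = p.GRPNO
                AND EMPDEP.EMPNO = p.EMPNO
                AND EMPDEP.DEPNO = p.DEPNO
                AND p.ELIGFLAG01 = 'T'
                AND p.SNAPTHRUDT = 'DEC312999')
""")
rows = conn.execute(
    "SELECT GRPNO, EMPNO, PENDCD FROM EMPDEP ORDER BY GRPNO, EMPNO"
).fetchall()
print(rows)  # [(6606, 1, 20), (6606, 2, 0), (6606, 3, 0), (7000, 1, 0)]
```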
You can use MERGE statement as following:
Let's assume the EMPDEP table has a primary key, EMPDEP_UID.
MERGE INTO EMPDEP TRG
USING
(SELECT A.EMPDEP_UID, A.PENDCD
FROM EMPDEP A, EEDPELIG B
WHERE A.GRPNO=B.GRPNO
AND A.EMPNO=B.EMPNO
AND A.DEPNO=B.DEPNO
AND A.GRPNO = 6606
AND A.SPOUSE = 'T'
AND B.ELIGFLAG01 = 'T'
AND SNAPTHRUDT ='DEC312999') SRC
ON (TRG.EMPDEP_UID = SRC.EMPDEP_UID)
WHEN MATCHED THEN
UPDATE SET TRG.PENDCD = 20
WHERE TRG.PENDCD = 0;
You can use a unique key instead of the primary key to identify the records to be updated, but the primary key is safer: a unique key can contain NULLs, and because NULL = NULL is never true, rows with a NULL key would not match in the ON clause.
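A quick illustration of the NULL point, using SQLite through Python's sqlite3 with a hypothetical nullable UNIQUE column alt_key. A UNIQUE constraint accepts several NULLs, and an equality condition such as a MERGE ON clause never matches NULL to NULL. As a result, joining on alt_key loses those rows, while the primary key matches every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: alt_key is UNIQUE but nullable, so two NULLs are allowed.
conn.executescript("""
CREATE TABLE EMPDEP (EMPDEP_UID INTEGER PRIMARY KEY,
                     alt_key INTEGER UNIQUE,
                     PENDCD INTEGER);
INSERT INTO EMPDEP VALUES (1, 100, 0), (2, NULL, 0), (3, NULL, 0);
""")

def matched(key_col):
    """Count rows that find themselves when the table is joined on key_col."""
    return conn.execute(
        f"SELECT COUNT(*) FROM EMPDEP a JOIN EMPDEP b ON a.{key_col} = b.{key_col}"
    ).fetchone()[0]

print(matched("EMPDEP_UID"))  # 3: every row matches on the primary key
print(matched("alt_key"))     # 1: NULL = NULL is not true, so rows 2 and 3 drop out
```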
Cheers!!
I've generated two temporary tables and added a primary key to each, to get an index on them.
Like this on both:
ALTER TABLE TEMP_MEASURINGS ADD PRIMARY KEY (MEASURINGID)
ALTER TABLE TEMP_VALUES ADD PRIMARY KEY (<some_other_col>)
The two temporary tables are related by a date and another ID (ORDERID), as you can see in the query. Now I need to update MEASURINGID in TEMP_VALUES based on the other table.
Can I make this query go faster in any way?
UPDATE TEMP_VALUES v
SET v.MEASURINGID =
(
SELECT MEASURINGID
FROM TEMP_MEASURINGS m
WHERE m.MEASURDATE = v.MEASUREDATE
AND m.ORDERID = v.ORDERID
)
The tables need to be generated first, so I can't do an insert directly.
SELECT COUNT(*) FROM TEMP_VALUES ~6M
SELECT COUNT(*) FROM TEMP_MEASURINGS ~1.5M
Your query is going to be slow, because so many rows are being updated.
You can speed it up with an index on TEMP_MEASURINGS(MEASURDATE, ORDERID, MEASURINGID). This is a covering index for the subquery, so the lookups should be fast.
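To check that the index really covers the subquery, one option is to inspect the query plan. This is a sketch in SQLite via Python's sqlite3; in Oracle you would use EXPLAIN PLAN instead. The index name tm_cover and the ID column are hypothetical. The plan line for the correlated subquery should report a covering index, meaning each lookup is answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the question; the index name tm_cover is hypothetical.
conn.executescript("""
CREATE TABLE TEMP_MEASURINGS (MEASURINGID INTEGER PRIMARY KEY,
                              MEASURDATE TEXT, ORDERID INTEGER);
CREATE TABLE TEMP_VALUES (ID INTEGER PRIMARY KEY, MEASUREDATE TEXT,
                          ORDERID INTEGER, MEASURINGID INTEGER);
CREATE INDEX tm_cover ON TEMP_MEASURINGS (MEASURDATE, ORDERID, MEASURINGID);
""")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
plan = conn.execute("""
EXPLAIN QUERY PLAN
UPDATE TEMP_VALUES
SET MEASURINGID = (SELECT m.MEASURINGID
                   FROM TEMP_MEASURINGS m
                   WHERE m.MEASURDATE = TEMP_VALUES.MEASUREDATE
                     AND m.ORDERID = TEMP_VALUES.ORDERID)
""").fetchall()
details = [row[3] for row in plan]
for d in details:
    print(d)
```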
You might find it faster just to create a new table:
create table new_temp_values as
select v.*, m.measuringid as new_measuringid -- alias avoids a clash with v.measuringid
from temp_values v left join
temp_measurings m
on v.measuredate = m.measurdate and v.orderid = m.orderid;
The same index will work here (you can adjust the select columns to be what you really need).
Typically, creating a new table is much, much faster than updating all or even a significant number of rows in a given table.
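A small sketch of the create-a-new-table approach in SQLite via Python's sqlite3, with hypothetical rows. Here temp_values has no measuringid column, so the select list has no duplicate name. Rows without a match keep a NULL measuringid because of the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; value 102 has no matching measuring.
conn.executescript("""
CREATE TABLE temp_measurings (measuringid INTEGER PRIMARY KEY,
                              measurdate TEXT, orderid INTEGER);
CREATE TABLE temp_values (id INTEGER PRIMARY KEY, measuredate TEXT, orderid INTEGER);
INSERT INTO temp_measurings VALUES (1, '2020-01-01', 10), (2, '2020-01-02', 10);
INSERT INTO temp_values VALUES (100, '2020-01-01', 10), (101, '2020-01-02', 10),
                               (102, '2020-01-03', 10);
-- Build the result in one pass instead of updating rows in place.
CREATE TABLE new_temp_values AS
SELECT v.id, v.measuredate, v.orderid, m.measuringid
FROM temp_values v
LEFT JOIN temp_measurings m
  ON v.measuredate = m.measurdate AND v.orderid = m.orderid;
""")
rows = conn.execute("SELECT id, measuringid FROM new_temp_values ORDER BY id").fetchall()
print(rows)  # [(100, 1), (101, 2), (102, None)]
```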
Try the MERGE below for better performance:
MERGE INTO TEMP_VALUES v
USING (SELECT MEASURINGID,
MEASURDATE,
ORDERID
FROM TEMP_MEASURINGS) m
ON (m.MEASURDATE = v.MEASUREDATE
AND m.ORDERID = v.ORDERID)
WHEN MATCHED THEN
UPDATE
SET v.MEASURINGID = m.MEASURINGID;
Instead of updating all records, you can update only the records that have a match in the temp table:
UPDATE TEMP_VALUES v
SET v.MEASURINGID =
(
SELECT MEASURINGID
FROM TEMP_MEASURINGS m
WHERE m.MEASURDATE = v.MEASUREDATE
AND m.ORDERID = v.ORDERID
)
where
exists (select 1 from TEMP_MEASURINGS m
WHERE m.MEASURDATE = v.MEASUREDATE
AND m.ORDERID = v.ORDERID)
I have two tables in Oracle, defined as the following:
Location data: loc_id, capacity, loc_type
User data: loc_id, custom001
What I'm wanting to do is copy what's in location.capacity into user.custom001 if the following constraints are met:
custom001 <> capacity
loc_type IN ('A','B')
capacity is not NULL or 0
Based on some other queries I've found on Stack Exchange, I've developed this:
UPDATE user u
SET u.custom001 =
(SELECT l.capacity
FROM location l
WHERE u.loc_id = l.loc_id
AND l.capacity <> u.custom001
AND l.loc_type IN ('110','210')
AND l.capacity IS NOT NULL
AND l.capacity <> 0)
WHERE exists (select capacity from location l WHERE l.loc_id = u.loc_id)
But it's not respecting the constraints: it updates almost every row in the user table, most of them with NULLs.
Where do I need to go from here?
You're close.
The following should work:
UPDATE user u
SET u.custom001 = (
SELECT l.capacity
FROM location l
WHERE u.loc_id = l.loc_id
) where exists (
select null from location l
WHERE l.loc_id = u.loc_id
AND l.capacity <> u.custom001
AND l.loc_type IN ('110','210')
AND l.capacity IS NOT NULL
AND l.capacity <> 0
)
The basic issue with what you have is that you apply all the restrictions inside the SET subquery, but your WHERE EXISTS clause doesn't contain them. So every row with a matching loc_id is updated anyway, and rows that fail the restrictions get NULL from the subquery.
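The difference can be reproduced with SQLite through Python's sqlite3 and a few hypothetical rows. The table is renamed user_data here, since USER is a reserved word in Oracle. The original statement nulls out every row that fails the restrictions, while the corrected one leaves those rows untouched.

```python
import sqlite3

SETUP = """
CREATE TABLE location (loc_id INTEGER PRIMARY KEY, capacity INTEGER, loc_type TEXT);
CREATE TABLE user_data (loc_id INTEGER PRIMARY KEY, custom001 INTEGER);
INSERT INTO location VALUES (1, 50, '110'), (2, 0, '110'), (3, 40, '999');
INSERT INTO user_data VALUES (1, 10), (2, 20), (3, 30);
"""

# Restrictions only in the SET subquery: every row with a location is updated.
ORIGINAL = """
UPDATE user_data
SET custom001 = (SELECT l.capacity FROM location l
                 WHERE user_data.loc_id = l.loc_id
                   AND l.capacity <> user_data.custom001
                   AND l.loc_type IN ('110', '210')
                   AND l.capacity IS NOT NULL
                   AND l.capacity <> 0)
WHERE EXISTS (SELECT capacity FROM location l WHERE l.loc_id = user_data.loc_id)
"""

# Restrictions in WHERE EXISTS: only qualifying rows are updated.
FIXED = """
UPDATE user_data
SET custom001 = (SELECT l.capacity FROM location l
                 WHERE user_data.loc_id = l.loc_id)
WHERE EXISTS (SELECT NULL FROM location l
              WHERE l.loc_id = user_data.loc_id
                AND l.capacity <> user_data.custom001
                AND l.loc_type IN ('110', '210')
                AND l.capacity IS NOT NULL
                AND l.capacity <> 0)
"""

def run(update_sql):
    """Apply one UPDATE to a fresh copy of the sample data and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(update_sql)
    return conn.execute("SELECT loc_id, custom001 FROM user_data ORDER BY loc_id").fetchall()

print(run(ORIGINAL))  # [(1, 50), (2, None), (3, None)]
print(run(FIXED))     # [(1, 50), (2, 20), (3, 30)]
```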
I use this approach and it works well for me. You can use SQL Developer to generate the DDL of the table, then create the new table with its constraints (primary key, NOT NULL and foreign key constraints) from the generated DDL. If your table only has NOT NULL constraints, use:
Create table new_table as ( select * from old_table);