I've generated two temporary tables and assigned a primary key to each of them, to get an index on them.
Like this on both:
ALTER TABLE TEMP_MEASURINGS ADD PRIMARY KEY (MEASURINGID)
ALTER TABLE TEMP_VALUES ADD PRIMARY KEY (<some_other_col>)
The two temp tables are related by a date and another id, as you can see in the query below. Now I need to update the "measuringid" in TEMP_VALUES based on the other table.
Can I make this query go faster in any way?
UPDATE TEMP_VALUES v
SET v.MEASURINGID =
(
SELECT MEASURINGID
FROM TEMP_MEASURINGS m
WHERE m.MEASURDATE = v.MEASUREDATE
AND m.ORDERID = v.ORDERID
)
The tables need to be generated first, so I can't do the insert directly.
SELECT COUNT(*) FROM TEMP_VALUES      -- ~6M
SELECT COUNT(*) FROM TEMP_MEASURINGS  -- ~1.5M
Your query is going to be slow, because so many rows are being updated.
You can speed it up with an index on TEMP_MEASURINGS(MEASURDATE, ORDERID, MEASURINGID). This is a covering index for the subquery, so the lookups should be fast.
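For example, a sketch in Oracle-style syntax (the index name is invented; MEASURDATE is spelled as in your subquery):
CREATE INDEX ix_tm_cover ON TEMP_MEASURINGS (MEASURDATE, ORDERID, MEASURINGID);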
You might find it faster just to create a new table:
create table new_temp_values as
select v.*, m.measuringid
from temp_values v left join
temp_measurings m
on v.measuredate = m.measuredate and v.orderid = m.orderid;
The same index will work here (you can adjust the select columns to be what you really need).
Typically, creating a new table is much, much faster than updating all or even a significant number of rows in a given table.
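If you take the new-table route, you can then swap it in. A sketch in Oracle syntax, assuming TEMP_VALUES is an ordinary table with no dependent objects:
DROP TABLE temp_values;
ALTER TABLE new_temp_values RENAME TO temp_values;
-- CREATE TABLE AS SELECT does not copy constraints, so re-add the primary key:
ALTER TABLE temp_values ADD PRIMARY KEY (<some_other_col>);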
Try the MERGE below for performance; it lets the optimizer join the two tables once (for example with a hash join) instead of running the correlated subquery once per row:
MERGE INTO TEMP_VALUES v
USING (SELECT MEASURINGID,
              MEASURDATE,
              ORDERID
       FROM TEMP_MEASURINGS) m
ON (m.MEASURDATE = v.MEASUREDATE
    AND m.ORDERID = v.ORDERID)
WHEN MATCHED THEN
UPDATE
SET v.MEASURINGID = m.MEASURINGID;
Instead of updating all records, you can update only the records that have a match in the other temp table (this also avoids setting MEASURINGID to NULL for rows without a match):
UPDATE TEMP_VALUES v
SET v.MEASURINGID =
(
    SELECT MEASURINGID
    FROM TEMP_MEASURINGS m
    WHERE m.MEASURDATE = v.MEASUREDATE
    AND m.ORDERID = v.ORDERID
)
WHERE EXISTS
(
    SELECT 1
    FROM TEMP_MEASURINGS ttm
    WHERE ttm.MEASURDATE = v.MEASUREDATE
    AND ttm.ORDERID = v.ORDERID
)
Related
I am trying to make a query that works with a temp table work without that temp table.
I tried doing a join in the subquery instead, but I don't get the same results as the query with the temp table.
This is the query with the temp table that works as I want:
create table #results(
RowId id_t,
LastUpdatedAt date_T
)
insert into #results
select H.RowId, H.LastUpdatedAt from MemberCarrierMap M Join MemberCarrierMapHistory H on M.RowId = H.RowId
update MemberCarrierMap
set CreatedAt = (select MIN(LastUpdatedAt) from #results r where r.rowId = MemberCarrierMap.rowId)
Where CreatedAt is null;
and here is the query I tried without the temp table that doesn't work like the above:
update MemberCarrierMap
set CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH join MemberCarrierMap M on MH.RowId = M.RowId where MH.RowId = M.RowId )
Where CreatedAt is null;
I was expecting the second query to work like the first, but it does not. Any suggestions on how to achieve what the first query does without the temp table?
This should work. In your second query, the subquery joins MemberCarrierMap again, so it correlates with its own copy of M rather than with the row being updated; correlate it directly with the outer table instead:
update M
set M.CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH WHERE MH.RowId = M.RowId)
FROM MemberCarrierMap M
Where M.CreatedAt is null;
Your question is more or less a duplicate of this answer. There, you will find multiple solutions. But the ones that use correlated subqueries are less performant than the one that simply uses an uncorrelated aggregation subquery inside a join.
Applying it to your situation, you will have this:
update m
set m.createdAt = hAgg.minVal
from memberCarrierMap m
join (
select rowId, min(lastUpdatedAt) as minVal
from memberCarrierMapHistory
group by rowId
) as hAgg
on m.rowId = hAgg.rowId
where m.createdAt is null;
Basically, it's more performant because running aggregation and filtering on a row-by-row basis (which is what happens in a correlated subquery) is more expensive than getting the aggregation out of the way all at once (joins tend to happen early in processing) and performing the match afterwards.
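If memberCarrierMapHistory is large, a supporting index can let the aggregation subquery be computed from the index alone; a sketch in T-SQL (the index name is invented, and this assumes no equivalent index exists yet):
CREATE INDEX IX_mcmh_rowid_updated ON memberCarrierMapHistory (rowId, lastUpdatedAt);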
Problem
I'm trying to refactor a low-performing MERGE statement to an UPDATE statement in Oracle 12.1.0.2.0. The MERGE statement looks like this:
MERGE INTO t
USING (
SELECT t.rowid rid, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
) s
ON (t.rowid = s.rid)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
It is mostly low-performing because there are two expensive accesses to the large (100M rows) table t.
Schema
These are the simplified tables involved:
t The target table whose account_no column is being migrated.
u The migration instruction table containing an account_no_old → account_no_new mapping.
v An auxiliary table modelling a to-one relationship between contract_id and tenant_id.
The schema is:
CREATE TABLE v (
contract_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL
);
CREATE TABLE t (
t_id NUMBER(18) NOT NULL PRIMARY KEY,
-- tenant_id column is missing here
account_no NUMBER(18) NOT NULL,
contract_id NUMBER(18) NOT NULL REFERENCES v
);
CREATE TABLE u (
u_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL,
account_no_old NUMBER(18) NOT NULL,
account_no_new NUMBER(18) NOT NULL,
UNIQUE (tenant_id, account_no_old)
);
I cannot modify the schema. I'm aware that adding t.tenant_id would solve the problem by making the join to v unnecessary.
Alternative MERGE doesn't work:
ORA-38104: Columns referenced in the ON Clause cannot be updated
Note, the self join cannot be avoided, because this alternative, equivalent query leads to ORA-38104:
MERGE INTO t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
UPDATE view doesn't work:
ORA-01779: cannot modify a column which maps to a non-key-preserved table
Intuitively, I would apply transitive closure here, which should guarantee that for each updated row in t, there is at most one row in u and in v. But apparently Oracle doesn't recognise this, so the following UPDATE statement doesn't work:
UPDATE (
SELECT t.account_no, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
)
SET account_no = account_no_new
The above raises ORA-01779. Adding the undocumented hint /*+BYPASS_UJVC*/ does not seem to work anymore on 12c.
How to tell Oracle that the view is key preserving?
In my opinion, the view is still key preserving, i.e. for each row in t, there is exactly one row in v, and thus at most one row in u. The view should thus be updatable. Is there any way to rewrite this query to make Oracle trust my judgement?
Or is there any other syntax I'm overlooking that prevents the MERGE statement's double access to t?
Is there any way to rewrite this query to make Oracle trust my judgement?
I've managed to "convince" Oracle to do MERGE by introducing helper column in target:
MERGE INTO (SELECT (SELECT t.account_no FROM dual) AS account_no_temp,
t.account_no, t.contract_id
FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no_temp = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo
EDIT
A variation of the idea above, with the subquery moved directly into the ON clause:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON ((SELECT t.account_no FROM dual) = s.account_no_old
AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo2
Related article: Columns referenced in the ON Clause cannot be updated
EDIT 2:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id) s
ON ((t.account_no, t.contract_id, 'x') = ((s.account_no_old, s.contract_id, 'x')) OR 1 = 2)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo3
You may define a temporary table containing the pre-joined data from u and v. Back it with a unique index on (contract_id, account_no_old), which should be unique. Then you may use this temporary table in an updatable join view: with the unique index in place, Oracle can prove that each row of t matches at most one row of the helper table, so the view is key-preserved.
create table tmp as
SELECT v.contract_id, u.account_no_old, u.account_no_new
FROM u, v
WHERE v.tenant_id = u.tenant_id;
create unique index tmp_ux1 on tmp ( contract_id, account_no_old);
UPDATE (
SELECT t.account_no, tmp.account_no_new
FROM t, tmp
WHERE t.account_no = tmp.account_no_old
AND t.contract_id = tmp.contract_id
)
SET account_no = account_no_new
;
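Once the migration is done, the helper table can be dropped (assuming nothing else uses it):
DROP TABLE tmp;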
Trying to do this with a simpler update. It still requires a subselect, plus a WHERE EXISTS filter so that rows without a match are not set to NULL:
update t
set t.account_no = (SELECT u.account_no_new
                    FROM u, v
                    WHERE t.account_no = u.account_no_old
                    AND t.contract_id = v.contract_id
                    AND v.tenant_id = u.tenant_id)
where exists (SELECT 1
              FROM u, v
              WHERE t.account_no = u.account_no_old
              AND t.contract_id = v.contract_id
              AND v.tenant_id = u.tenant_id);
Bobby
The query shown below takes almost 2 hours to run, and I want to reduce its execution time. Any help would be much appreciated.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the Not IN clause with Not Exists and with a Left Join, but had no luck improving the run time.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
The PRODUCTS table has 432,837 records and the Store table has almost the same number. I create the PRODUCTS table in the stored procedure itself and drop it at the end of the procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use a nonclustered index on this table, or what else can I do to reduce the execution time? Thanks in advance.
First, you don't need top 1 or distinct in EXISTS and IN subqueries. But this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
     STORE srg
     On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
        prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y' and
      srg.Range_Event_Id = 45655 and
      Not Exists (Select 1
                  From Last_Authorised_Range lar
                  where lar.Range_Event_Id = srg.Range_Event_Id)
Do note that I removed the quotes around 45655. I presume this column is actually a number. If so, don't confuse yourself and the optimizer by using a string for the comparison.
Then, try indexes. I think the best indexes are the ones below (DDL sketched after the list):
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
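A sketch of that DDL in T-SQL (the index names are invented):
CREATE INDEX IX_store_cover ON STORE (Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id);
CREATE INDEX IX_products_grp ON PRODUCTS (Store_Range_Grp_Id, Range_Event_Id);
CREATE INDEX IX_lar_event ON Last_Authorised_Range (Range_Event_Id);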
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.
I am going to explain again what I am trying to do in hopes that you can help.
Table 1 has 4061 rows with columns that include
[Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone]
and 20 other columns. Table 1 is data that needs to be de-identified; it has 1534 distinct [Name] rows out of 4061 rows total.
Table 2 has auto-generated data that includes the same columns. I would like to replace the above-mentioned columns in Table 1 with data from Table 2: take the distinct [Name] rows from Table 1 and give each one a new, distinct set of [Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] values from Table 2.
I do not want to just update each row with a new address, as that would break data consistency. Replacing only the distinct values preserves consistency while changing the row data in Table 1. When I am done I would like Table 1 to contain 1534 distinct new de-identified [Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] combinations taken from Table 2.
You would use a join in the update. You can generate a random join key for the ~1,500 distinct rows using row_number() over (order by newid()):
update toupdate
set toupdate.address = f.address  -- repeat for the other columns being replaced
from (select t.*, row_number() over (order by newid()) as seqnum
      from table1 t               -- table1 stands for your Table 1
     ) toupdate join
     (select f.*, row_number() over (order by newid()) as seqnum
      from fake f                 -- fake stands for your auto-generated Table 2
     ) f
     on toupdate.seqnum = f.seqnum and toupdate.seqnum <= 1500;
Here is how I ended up doing it.
First I ran a statement to select distinct and inserted it into a table.
Select Distinct [Name],[Address1],[City],[State],[Zip],[Country],[Phone]
INTO APMAST2
FROM APMAST
I then added a name2 column and an id column to APMAST2 (sketched below), then populated the id field sequentially.
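A sketch of the column additions (the column types are guesses, since the post doesn't show them):
ALTER TABLE APMAST2 ADD name2 nvarchar(255) NULL, id int NULL;
The sequential population statement: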
DECLARE #id INT
SET #id = 0
UPDATE APMAST2
SET #id = id = #id + 1
GO
Now I have my distinct info plus a blank name2 field and a sequential id field in APMAST2. Now I can join this data with my fakenames table, which I generated from HERE using their bulk tool.
Using a join statement, I joined my fake data with APMAST2:
Update dbo.APMAST2
SET dbo.APMAST2.Name = dbo.fakenames.company,
dbo.APMAST2.Address1 = dbo.fakenames.streetaddress,
dbo.APMAST2.City = dbo.fakenames.City,
dbo.APMAST2.State = dbo.fakenames.State,
dbo.APMAST2.Zip = dbo.fakenames.zipcode,
dbo.APMAST2.Country = dbo.fakenames.countryfull,
dbo.APMAST2.Phone = dbo.fakenames.telephonenumber
FROM
dbo.APMAST2
INNER JOIN
dbo.fakenames
ON dbo.fakenames.number = dbo.APMAST2.id
Now I have my fake data loaded, but I kept my original Name values (in name2) so I could reload this data into my full table APMAST. Now I can do a join between APMAST2 and APMAST:
Update dbo.APMAST
SET dbo.APMAST.Name = dbo.APMAST2.Name,
dbo.APMAST.Address1 = dbo.APMAST2.Address1,
dbo.APMAST.City = dbo.APMAST2.City,
dbo.APMAST.State = dbo.APMAST2.State,
dbo.APMAST.Zip = dbo.APMAST2.Zip,
dbo.APMAST.Country = dbo.APMAST2.Country,
dbo.APMAST.Phone = dbo.APMAST2.Phone
FROM
dbo.APMAST
INNER JOIN
dbo.apmast2
ON dbo.apmast.name = dbo.APMAST2.name2
Now my original table has all fake data in it but keeps the integrity it had (well, most of it), so the data looks good when reported on but is de-identified. You can now remove APMAST2, or keep it if you need to match this with other data later on. I know this is long and I am sure there is a better way to do it, but this is how I did it; suggestions welcome.
I wrote this query:
INSERT INTO KeysTable (KeyText, Id)
SELECT KeyText as BKT, KeysTable.ID as CID FROM KeysTable
INNER JOIN StatTable ON KeysTable.ID = StatTable.Key_ID
WHERE StatTable.StatCommandCode = 4 AND
EXISTS (SELECT 1 FROM StatTable WHERE StatCommandCode = 4 AND StatTable.Key_ID = CID);
I know that removing the condition
AND StatTable.Key_ID = CID
would make the query very fast. Also if I replace it with
AND StatTable.Key_ID = 444 -- (444 - a random static number)
the query will be very fast too. Both the columns in this condition are indexed:
CREATE INDEX IF NOT EXISTS StatsIndex ON StatTable (Key_ID);
and in KeysTable the ID column is the primary key. Why doesn't the index improve performance in this case?
Thanks for any answers, and sorry for my bad English :(.
If there is no CID column in either of the two tables, then the EXISTS subquery merely repeats the join condition and is useless. Rewrite the statement as:
INSERT INTO KeysTable (KeyText, Id)
SELECT KeyText
, KeysTable.ID
FROM KeysTable
INNER JOIN StatTable
ON KeysTable.ID = StatTable.Key_ID
WHERE StatTable.StatCommandCode = 4
If it is still slow, you can try adding an index on StatTable(StatCommandCode, Key_ID).
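A sketch of that index, in the same SQLite-style syntax as the question (the index name is invented):
CREATE INDEX IF NOT EXISTS StatCmdKeyIndex ON StatTable (StatCommandCode, Key_ID);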