oracle update with correlated query - sql

What is the correct answer? Choose two.
Examine this SQL statement:
UPDATE orders o
SET customer_name = (
SELECT cust_last_name FROM customers WHERE customer_id=o.customer_id
);
Which two are true?
A. The subquery is executed before the UPDATE statement is executed.
B. All existing rows in the ORDERS table are updated.
C. The subquery is executed for every updated row in the ORDERS
table.
D. The UPDATE statement executes successfully even if the subquery
selects multiple rows.
E. The subquery is not a correlated subquery.
I know B is correct, but I believe all the other options are incorrect.
A. Subquery executes for every row that the outer query returns, so
it should execute after the outer query.
C. NOT for every updated row, it is for every row that the outer
query returns.
D. I tried. It causes an error ORA-01427: single-row subquery returns
more than one row
E. It is a correlated subquery.

Consider option C:
C. The subquery is executed for every updated row in the ORDERS table.
You said:
NOT for every updated row, it is for every row that the outer query returns.
Yes. The subquery is indeed executed for every row in the outer query (leaving aside possible optimizations applied by the database). And every row in the outer query is updated, as you spotted when you correctly selected option B: all existing rows in the ORDERS table are updated.
Note: your arguments against options A, D and E are valid.

The only other true answer would be
F. This is a poor design: it denormalizes CUSTOMER_NAME into the ORDERS table and therefore conflicts with normal form.
Answer C may have been right back in the days of Oracle 8 (i.e. 20 years ago), but now it is definitely wrong!
Oracle introduced scalar subquery caching precisely to limit the number of times such subqueries are executed.
Here a Simple Demonstration
This setup in Oracle 19.2 has 10K orders and 1K customers.
create table customers as
select rownum customer_id, 'cust_'||rownum customer_name from dual connect by level <= 1000;
create index customers_idx1 on customers (customer_id);
create table orders as
select rownum order_id, trunc(rownum/10)+1 customer_id, cast (null as varchar2(100)) customer_name
from dual connect by level <= 10000;
The update is performed on 10K rows, as expected:
UPDATE /*+ gather_plan_statistics */ orders o
SET customer_name = (
SELECT customer_name FROM customers WHERE customer_id=o.customer_id
);
The gather_plan_statistics hint collects the execution statistics, which we will examine.
SQL_ID 8r610vz9fknr6, child number 0
-------------------------------------
UPDATE /*+ gather_plan_statistics */ orders o SET customer_name = (
SELECT customer_name FROM customers WHERE customer_id=o.customer_id )
Plan hash value: 3416863305
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.18 | 60863 | 21 |
| 1 | UPDATE | ORDERS | 1 | | 0 |00:00:00.18 | 60863 | 21 |
| 2 | TABLE ACCESS FULL | ORDERS | 1 | 10000 | 10000 |00:00:00.01 | 21 | 18 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS | 1001 | 1 | 1000 |00:00:00.01 | 2020 | 3 |
|* 4 | INDEX RANGE SCAN | CUSTOMERS_IDX1 | 1001 | 1 | 1000 |00:00:00.01 | 1020 | 3 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUSTOMER_ID"=:B1)
The important information is in the Starts column: the CUSTOMERS table was accessed only 1001 times, i.e. once per distinct customer_id value in ORDERS and not once per order.
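The effect seen in the Starts column can be imitated outside the database. The following is a minimal Python sketch, not Oracle's actual implementation: the function name run_update and the unbounded dict used as a cache are assumptions made for illustration (Oracle's real cache has a limited size, so it may not always reach the ideal count). It reuses the data distribution of the demo above.

```python
def run_update(order_customer_ids, lookup, use_cache):
    """Simulate SET col = (scalar subquery) and count subquery executions."""
    cache = {}          # hypothetical stand-in for the scalar subquery cache
    executions = 0
    results = []
    for cid in order_customer_ids:
        if use_cache and cid in cache:
            results.append(cache[cid])   # answered from the cache, no execution
            continue
        executions += 1                  # the "subquery" really runs here
        value = lookup.get(cid)          # None models NULL when no customer matches
        if use_cache:
            cache[cid] = value
        results.append(value)
    return results, executions


# Same shape as the demo: 1K customers, 10K orders, customer_id = trunc(rownum/10)+1
customers = {i: f"cust_{i}" for i in range(1, 1001)}
order_customer_ids = [r // 10 + 1 for r in range(1, 10001)]

_, uncached = run_update(order_customer_ids, customers, use_cache=False)
_, cached = run_update(order_customer_ids, customers, use_cache=True)
print(uncached, cached)  # 10000 1001
```

The cached run executes the lookup 1001 times because the orders reference 1001 distinct customer_id values (1 through 1001, the last of which has no matching customer).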


where column is null taking longer time to execute

I am executing a select statement like the one below which is taking more than 6mins to execute.
select * from table where col1 is null;
whereas:
select * from table;
returns results in a few seconds. The table contains 25 million records. No indexes are used. There is a composite PK, but not on the column used. The same query, executed on a different table with 50 million records, returns results in a few seconds. Only this table poses a problem.
I rebuilt the table to check whether something was wrong with it, but I am still facing the same issue.
Can someone help me understand why it is taking so long?
datatype: VARCHAR2(40)
PLAN:
Plan hash value: 2838772322
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 794 | 60973 (16)| 00:00:03 |
|* 1 | TABLE ACCESS STORAGE FULL| table | 1 | 794 | 60973 (16)| 00:00:03 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("column" IS NULL)
filter("column" IS NULL)
select * from table;
Oracle SQL Developer tool has a default setting to display only 50 records unless it was manually edited. So the entire 25 million records will not be fetched as you don't need all the records for display.
select * from table where col1 is null;
But when you filter for null values, the entire set of 25 million rows has to be scanned to apply the filter and get your 81 records satisfying that predicate. Hence your second query takes longer.
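The difference can be pictured with a lazy scan that stops as soon as the client has its first page. This is only a Python sketch under assumed numbers (a hypothetical 100,000-row table with 10 NULLs and a 50-row page); full_scan is an illustrative helper, not anything SQL Developer or Oracle exposes.

```python
from itertools import islice


def full_scan(table, stats, predicate=None):
    """Yield rows one at a time, counting every row the scan has to read."""
    for row in table:
        stats["rows_read"] += 1
        if predicate is None or predicate(row):
            yield row


# Hypothetical table: 100,000 rows, col1 is NULL (None) in only 10 of them.
table = [{"id": i, "col1": None if i % 10_000 == 0 else "x"}
         for i in range(1, 100_001)]

# "select * from table": the client stops after the first 50 rows.
plain_stats = {"rows_read": 0}
first_page = list(islice(full_scan(table, plain_stats), 50))

# "select * from table where col1 is null": fewer than 50 rows match,
# so the scan must reach the end of the table before the page is complete.
null_stats = {"rows_read": 0}
null_page = list(islice(
    full_scan(table, null_stats, lambda r: r["col1"] is None), 50))

print(plain_stats["rows_read"], null_stats["rows_read"])  # 50 100000
```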

How MIN/MAX function works in SQL?

I'm trying to understand how the MIN/MAX functions calculate their value in the SQL backend.
Let's say I have the table Duplicate below:
ID NAME
1 A
2 A
3 A
4 A
5 A
6 B
7 B
8 B
9 B
10 B
11 C
12 C
13 C
14 C
So when I run the query below:
SELECT MAX(ID), NAME FROM Duplicate
GROUP BY NAME
Does the SQL engine first find the MAX value of ID in every group and then find the MAX ID out of those grouped records? Is that correct, or does something else happen?
You'll see something like this in Oracle
SQL> set autotrace traceonly explain
SQL> select owner, max(object_id)
2 from t
3 group by owner;
Execution Plan
----------------------------------------------------------
Plan hash value: 47235625
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 37 | 407 | 431 (2)| 00:00:01 |
| 1 | HASH GROUP BY | | 37 | 407 | 431 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T | 78939 | 847K| 427 (1)| 00:00:01 |
---------------------------------------------------------------------------
Note the HASH GROUP BY step. This is a mechanism that lets us avoid a massive sorting cost when performing aggregation (min, max, etc.).
Conceptually it's like this:
Read first row
Hash the group by column ("owner" in my case)
Let's say the hash value is 1234.
Store value of "object_id" in bucket 1234.
then
Read next row
Hash the group by column ("owner" in my case)
Let's say the hash value is 5678.
Store value of "object_id" in bucket 5678.
then
Read next row
Hash the group by column ("owner" in my case)
Let's say the hash value is 1234 (i.e. the same value as row 1).
Compare the object_id value with the existing object_id in bucket 1234. If it's larger, replace it; otherwise ignore it and move on.
So you can see we can identify the max value without sorting, with just a single scan of all the data.
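The steps above can be sketched in a few lines of Python, applied to the Duplicate table from the question. This is a conceptual illustration only: a Python dict stands in for the hash buckets, and hash_group_by_max is a hypothetical name, not how any particular database implements the operation internally.

```python
def hash_group_by_max(rows):
    """One pass over (id, name) rows, keeping the running MAX(id) per name."""
    buckets = {}  # the dict plays the role of the hash table keyed by NAME
    for row_id, name in rows:
        # First value for this group, or a larger one: store it in the bucket.
        if name not in buckets or row_id > buckets[name]:
            buckets[name] = row_id
        # Otherwise ignore the row and move on.
    return buckets


# The Duplicate table from the question: ids 1-5 are A, 6-10 are B, 11-14 are C.
duplicate = ([(i, "A") for i in range(1, 6)]
             + [(i, "B") for i in range(6, 11)]
             + [(i, "C") for i in range(11, 15)])

result = hash_group_by_max(duplicate)
print(result)  # {'A': 5, 'B': 10, 'C': 14}
```

No sort is involved at any point: each row is visited exactly once and compared only with the current value of its own group.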
I don't know what DB you're using, but for Teradata, which distributes table rows in a parallel manner, a simple aggregation with GROUP BY typically will do:
Aggregate rows (local)
Redistribute rows
Sort rows
Aggregate rows (global)
Return final result
What DBMS are you using? Can you run an EXPLAIN on your query to see what the query plan is? That would give you some idea.
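The local/global pattern described for Teradata can be pictured roughly as follows. This is a simplified Python sketch, not Teradata's actual execution engine: the three "partitions" and the round-robin distribution of rows are assumptions made purely for illustration.

```python
def local_max(partition):
    """Step 1: each partition aggregates only the rows it holds."""
    out = {}
    for row_id, name in partition:
        out[name] = max(row_id, out.get(name, row_id))
    return out


def two_phase_max(partitions):
    partials = [local_max(p) for p in partitions]       # aggregate rows (local)
    redistributed = {}
    for partial in partials:                            # redistribute by group key
        for name, value in partial.items():
            redistributed.setdefault(name, []).append(value)
    # Sort by key, then aggregate the partial results (global).
    return {name: max(values) for name, values in sorted(redistributed.items())}


duplicate = ([(i, "A") for i in range(1, 6)]
             + [(i, "B") for i in range(6, 11)]
             + [(i, "C") for i in range(11, 15)])

# Hypothetical distribution of the rows across three partitions.
partitions = [duplicate[i::3] for i in range(3)]
final = two_phase_max(partitions)
print(final)  # {'A': 5, 'B': 10, 'C': 14}
```

The point of the local step is that only one partial result per group and partition has to be moved, rather than every row.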

rownum / fetch first n rows

select * from Schem.Customer
where cust='20' and cust_id >= '890127'
and rownum between 1 and 2 order by cust, cust_id;
Execution time appr 2 min 10 sec
select * from Schem.Customer where cust='20'
and cust_id >= '890127'
order by cust, cust_id fetch first 2 rows only ;
Execution time appr 00.069 ms
The difference in execution time is huge, but the results are the same. My team is not willing to adopt the latter. Don't ask why.
So what is the difference between ROWNUM and FETCH FIRST 2 ROWS, and what can I do to improve the first query or to convince the team to adopt the second?
DBMS : DB2 LUW
Although both statements end up giving the same result set, that only happens with your data. There is a good chance that the result sets would differ. Let me explain why.
I will simplify your SQL a little to make it easier to understand:
SELECT * FROM customer
WHERE ROWNUM BETWEEN 1 AND 2;
In this SQL, you want only the first and second rows. That's fine. DB2 will optimize your query and never look at rows beyond the 2nd, because only the first 2 rows qualify for your query.
Then you add ORDER BY clause:
SELECT * FROM customer
WHERE ROWNUM BETWEEN 1 AND 2
ORDER BY cust, cust_id;
In this case, DB2 first fetches 2 rows, then orders them by cust and cust_id, and then sends them to the client (you). So far so good. But what if you want to order by cust and cust_id first, and then ask for the first 2 rows? There is a big difference between the two.
This is the simplified SQL for this case:
SELECT * FROM customer
ORDER BY cust, cust_id
FETCH FIRST 2 ROWS ONLY;
In this SQL, ALL rows qualify for the query, so DB2 fetches all of the rows, sorts them, and then sends the first 2 rows to the client.
In your case, both queries return the same result because the first 2 rows fetched happen to be the first 2 rows in cust, cust_id order. That will not hold if the rows are stored or retrieved in a different order.
A hint about this is that FETCH FIRST n ROWS comes after ORDER BY, which means DB2 orders the result and then retrieves the first n rows.
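The difference between the two orders of evaluation can be reproduced on a small list. The following Python sketch uses made-up rows that are deliberately stored out of order; limit_then_sort and sort_then_limit are hypothetical names standing in for the ROWNUM query and the FETCH FIRST query respectively.

```python
# Hypothetical rows in arbitrary storage order: (cust, cust_id)
customer = [
    ("20", "890300"),
    ("20", "890250"),
    ("20", "890127"),
    ("20", "890200"),
]


def limit_then_sort(rows, n):
    """ROWNUM-style: keep the first n rows as stored, then sort only those."""
    return sorted(rows[:n])


def sort_then_limit(rows, n):
    """ORDER BY ... FETCH FIRST n ROWS ONLY: sort everything, then keep n."""
    return sorted(rows)[:n]


rownum_result = limit_then_sort(customer, 2)
fetch_first_result = sort_then_limit(customer, 2)
print(rownum_result)       # [('20', '890250'), ('20', '890300')]
print(fetch_first_result)  # [('20', '890127'), ('20', '890200')]
```

Only when the stored order already matches the requested order do the two functions agree, which is the coincidence described above.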
Excellent answer here:
https://blog.dbi-services.com/oracle-rownum-vs-rownumber-and-12c-fetch-first/
Now the index range scan is chosen, with the right cardinality estimation.
So which solution is the best one? I prefer row_number() for several reasons:
I like analytic functions. They have larger possibilities, such as setting the limit as a percentage of total number of rows for example.
11g documentation for rownum says:
The ROW_NUMBER built-in SQL function provides superior support for ordering the results of a query
12c allows the ANSI syntax ORDER BY…FETCH FIRST…ROWS ONLY which is translated to row_number() predicate
12c documentation for rownum adds:
The row_limiting_clause of the SELECT statement provides superior support
rownum has first_rows_n issues as well
PLAN_TABLE_OUTPUT
SQL_ID 49m5a3f33cmd0, child number 0
-------------------------------------
select /*+ FIRST_ROWS(10) */ * from test where contract_id=500
order by start_validity fetch first 10 rows only
Plan hash value: 1912639229
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 | 15 |
|* 1 | VIEW | | 1 | 10 | 10 | 15 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 10 | 10 | 15 |
| 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 10 | 11 | 15 |
|* 4 | INDEX RANGE SCAN | TEST_PK | 1 | | 11 | 4 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber" <=10)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."START_VALIDITY") <=10 )
4 - access("CONTRACT_ID"=500)

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script and it runs quickly if I do NOT include "tt2.bulk_cnt" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------
A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(owner => 'YOUR-SCHEMA-NAME',
table_name => 'MINA_RAW2');
END;
/
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary as the initial GROUP BY doesn't do any grouping - instead, it appears this is being done to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of the members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE
HAVING COUNT(*) >= 10) -- keep the threshold from the original bulksum2
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE
I managed to remove an unnecessary subquery by putting DISTINCT inside COUNT. I've certainly used this syntax in PostgreSQL; COUNT(DISTINCT ...) is standard SQL and Oracle supports it too, but check that it gives the result you want.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use WITH queries instead of subqueries where you don't need them, you're kneecapping the query optimizer.
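Both rewrites rely on the same equivalence: grouping twice (first to get unique member/scheme/date combinations, then to count them) gives the same counts as a single COUNT(DISTINCT membercode). A small Python check of that equivalence, using invented rows and hypothetical function names, might look like this:

```python
from collections import Counter, defaultdict

# Hypothetical rows: (membercode, schemecode, transdate)
rows = [
    ("m1", "s1", "d1"),
    ("m1", "s1", "d1"),   # same member twice on the same scheme and date
    ("m2", "s1", "d1"),
    ("m3", "s2", "d1"),
    ("m3", "s2", "d1"),
]


def count_via_two_group_bys(rows, min_count):
    """GROUP BY member, scheme, date; then GROUP BY scheme, date HAVING COUNT(*) >= n."""
    unique = set(rows)
    counts = Counter((scheme, date) for _, scheme, date in unique)
    return {k: v for k, v in counts.items() if v >= min_count}


def count_via_count_distinct(rows, min_count):
    """GROUP BY scheme, date HAVING COUNT(DISTINCT membercode) >= n."""
    members = defaultdict(set)
    for member, scheme, date in rows:
        members[(scheme, date)].add(member)
    return {k: len(v) for k, v in members.items() if len(v) >= min_count}


two_step = count_via_two_group_bys(rows, min_count=2)
one_step = count_via_count_distinct(rows, min_count=2)
print(two_step, one_step)  # {('s1', 'd1'): 2} {('s1', 'd1'): 2}
```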

Need to speed up my UPDATE QUERY based on the EXPLAIN PLAN

I am updating my table data using a temporary table and it takes forever and it still has not completed. So I collected an explain plan on the query. Can someone advise me on how to tune the query or build indexes on them.
The query:
UPDATE w_product_d A
SET A.CREATED_ON_DT = (SELECT min(B.creation_date)
FROM mtl_system_items_b_temp B
WHERE to_char(B.inventory_item_id) = A.integration_id
and B.organization_id IN ('102'))
where A.CREATED_ON_DT is null;
Explain plan:
Plan hash value: 1520882583
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 47998 | 984K| 33M (2)|110:06:25 |
| 1 | UPDATE | W_PRODUCT_D | | | | |
|* 2 | TABLE ACCESS FULL | W_PRODUCT_D | 47998 | 984K| 9454 (1)| 00:01:54 |
| 3 | SORT AGGREGATE | | 1 | 35 | | |
|* 4 | TABLE ACCESS FULL| MTL_SYSTEM_ITEMS_B_TEMP | 1568 | 54880 | 688 (2)| 00:00:09 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."CREATED_ON_DT" IS NULL)
4 - filter("B"."ORGANIZATION_ID"=102 AND TO_CHAR("B"."INVENTORY_ITEM_ID")=:B1)
Note
-----
- dynamic sampling used for this statement (level=2)
For this query:
UPDATE w_product_d A
SET A.CREATED_ON_DT = (SELECT min(B.creation_date)
FROM mtl_system_items_b_temp B
WHERE to_char(B.inventory_item_id) = A.integration_id
and B.organization_id IN ('102'))
where A.CREATED_ON_DT is null;
You have a problem. Why are you creating a temporary table with the wrong type for inventory_item_id? That is likely to slow down any access. So, let's fix the table first and then do the update:
alter table mtl_system_items_b_temp
add better_inventory_item_id varchar2(255); -- or whatever the right type is
update mtl_system_items_b_temp
set better_inventory_item_id = to_char(inventory_item_id);
Next, let's define the appropriate index:
create index idx_mtl_system_items_b_temp_3 on mtl_system_items_b_temp(better_inventory_item_id, organization_id, creation_date);
Finally, an index on w_product_d can also help:
create index idx_w_product_d_1 on w_product_d(CREATED_ON_DT);
Then, write the query as:
UPDATE w_product_d p
SET CREATED_ON_DT = (SELECT min(t.creation_date)
FROM mtl_system_items_b_temp t
WHERE t.better_inventory_item_id = p.integration_id and
t.organization_id IN ('102')
)
WHERE p.CREATED_ON_DT is null;
Try a MERGE statement. It will likely go faster because it can read all the mtl_system_items_b_temp records at once rather than reading them over-and-over again for each row in w_product_d.
Also, your tables look like they're part of an Oracle E-Business Suite (EBS) environment. In MTL_SYSTEM_ITEMS_B in such an environment, the INVENTORY_ITEM_ID and ORGANIZATION_ID columns are NUMBER. You seem to be treating them as VARCHAR2 in your query. Whenever you don't use the correct data types in your queries, you invite performance problems, because Oracle must implicitly convert to the correct data type and, in doing so, can lose its ability to use indexes on the column. So, make sure your queries treat each column correctly according to its datatype. (E.g., if a column is a NUMBER, use COLUMN_X = 123 instead of COLUMN_X = '123'.)
Here's the MERGE example:
MERGE INTO w_product_d t
USING ( SELECT to_char(inventory_item_id) inventory_item_id_char, min(creation_date) min_creation_date
FROM mtl_system_items_b_temp
WHERE organization_id IN ('102') -- change this to IN (102) if organization_id is a NUMBER field!
GROUP BY to_char(inventory_item_id) -- one row per item; required because of the MIN aggregate
) u
ON ( t.integration_id = u.inventory_item_id_char )
WHEN MATCHED THEN UPDATE SET t.created_on_dt = u.min_creation_date
WHERE t.created_on_dt IS NULL; -- filter here: Oracle does not allow updating a column referenced in the ON clause (ORA-38104)
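Why the single-pass approach tends to win can be shown by counting how many temp-table rows each strategy reads. The following Python sketch uses made-up data (20 products, 100 item rows) and hypothetical function names; it imitates the shape of the two strategies, not Oracle's actual execution.

```python
def make_products():
    # Hypothetical w_product_d rows, all with CREATED_ON_DT still NULL (None).
    return [{"integration_id": str(i), "created_on_dt": None} for i in range(1, 21)]


# Hypothetical temp table: items 1..50, two creation dates each (ISO strings).
items = [{"inventory_item_id": i, "creation_date": f"2020-01-{day:02d}"}
         for i in range(1, 51) for day in (15, 3)]


def update_with_correlated_subquery(products, items):
    """Rescan the whole temp table once per product row (the original plan)."""
    rows_read = 0
    for p in products:
        if p["created_on_dt"] is None:
            dates = []
            for it in items:
                rows_read += 1
                if str(it["inventory_item_id"]) == p["integration_id"]:
                    dates.append(it["creation_date"])
            p["created_on_dt"] = min(dates) if dates else None
    return rows_read


def update_with_single_pass(products, items):
    """Aggregate the temp table once, then match by key (MERGE-style)."""
    rows_read = 0
    min_date = {}
    for it in items:
        rows_read += 1
        key = str(it["inventory_item_id"])
        if key not in min_date or it["creation_date"] < min_date[key]:
            min_date[key] = it["creation_date"]
    for p in products:
        if p["created_on_dt"] is None and p["integration_id"] in min_date:
            p["created_on_dt"] = min_date[p["integration_id"]]
    return rows_read


slow_products, fast_products = make_products(), make_products()
slow_reads = update_with_correlated_subquery(slow_products, items)
fast_reads = update_with_single_pass(fast_products, items)
print(slow_reads, fast_reads)  # 2000 100
```

Both strategies leave the products with the same dates; only the amount of work differs, and the gap grows with the number of rows to update.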