Handling a subquery that retrieves a large amount of data

I am using this query:
select test.* from
(
SELECT /*+ full(P) parallel(a,24) */
a.TRANS_DATE,a.STORE_NO,a.POS_NO,a.TICKET_NO,a.TICKET_START_TIME, a.INTERNAL_NO,
a.TRANS_DATE||a.STORE_NO||a.POS_NO||a.TICKET_NO||a.TICKET_START_TIME as VISITS,
s.city,s.region,
a.CUSTOMER_NO,
a.GROSS_AMT,
a.SALE_QTY,
a.DISCOUNT_AMT,
a.DISCOUNT_QTY,
S.FORMAT_CD,
S.FORMAT_DESC,
P.BRAND_ID,
P.BRAND_ID_DESC,
P.BRAND_TYPE,
P.BRAND_TYPE_DESC,
P.LEVEL1,P.LEVEL1_DESC,P.LEVEL2,
P.LEVEL2_DESC,P.LEVEL3,P.LEVEL3_DESC,P.LEVEL4,
P.LEVEL4_DESC,P.LEVEL5,P.LEVEL5_DESC,
P.Material_No,P.Medium_Desc,
b.mobile,
D.MOBILE,
ROW_NUMBER() OVER (PARTITION BY a.TRANS_DATE,a.STORE_NO,a.POS_NO,a.TICKET_NO,a.TICKET_START_TIME, a.INTERNAL_NO,
s.city,s.region,a.CUSTOMER_NO,a.GROSS_AMT,a.SALE_QTY,a.DISCOUNT_AMT,a.DISCOUNT_QTY,
S.FORMAT_CD,S.FORMAT_DESC,P.BRAND_ID,P.BRAND_ID_DESC,P.BRAND_TYPE,P.BRAND_TYPE_DESC,
P.LEVEL1,P.LEVEL1_DESC,P.LEVEL2,P.LEVEL2_DESC,P.LEVEL3,P.LEVEL3_DESC,P.LEVEL4,P.LEVEL4_DESC,
P.LEVEL5,P.LEVEL5_DESC,P.Material_No,P.Medium_Desc,b.mobile
ORDER BY D.MOBILE) as rnk
FROM P
INNER JOIN a ON a.Internal_No= P.Material_No
INNER JOIN S ON TO_CHAR(a.STORE_NO)= S.STORE_NO
LEFT OUTER JOIN b ON a.store_no=b.store_no and a.ticket_no=b.ticket_no and a.trans_date=b.trans_date
and a.ticket_start_time=b.ticket_start_time and a.pos_no=b.pos_no
LEFT OUTER JOIN d ON a.customer_no=d.customer_no
where a.TRANS_DATE between '07/04/2014' and sysdate
)test
where test.rnk=1
The problem with this query is that the subquery retrieves a large number of rows, and every time I run it I get the error below:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP1
How can I handle this without increasing the TEMP tablespace?

Related

Converting a SQL subquery to a join for performance gains

I have a subquery with an inner join. This join is meant to cut the data down to a more manageable size before extracting data via an unpivot, which has a further join to pull out only the relevant matches.
When I looked at the execution plan, it seemed that the outer select is being executed first, so the query takes an inordinate amount of time to complete because it processes the data for all gamers instead of the cut-down cohort.
This is the query:
SELECT
t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
FROM
(
SELECT
A.Gamer_ID
,[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10]
,CAST(Joining_date AS DATE) AS CODE_DATE
FROM Gamer_Characteristics A
INNER JOIN Gamer_Population P ON P.Gamer_ID = A.Gamer_ID --cuts down the number of gamers to the selected cohort
) s
unpivot (CODE for col in (
[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10])) as t2
INNER JOIN Gamer_feature_Code C ON C.CODE = LEFT(t2.CODE,C.CODE_LENGTH) --join to a dimension table to pull through characteristics based on code and code length
WHERE
T2.CODE_DATE <= '2020-03-31'
GROUP BY t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
I have two questions:
1. Can this be converted to use a join instead of a subquery?
2. Can I force the inner join in the subquery to take precedence over the inner join in the outer select?

Why does separate table perform significantly better than subquery?

I was trying to improve the performance of a SQL query and tried a few combinations.
Original Query
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
FROM db_A.table_A ALIAS_A
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
AND Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
The above query consumes nearly 400k impactCPU.
Optimized Query 1
SELECT New_sub_table.id1,
New_sub_table.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM ( sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE ) New_sub_table -- created a subquery
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON New_sub_table.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON New_sub_table.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(New_sub_table.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
My idea was to filter the data first and then do the joins. When I checked the performance stats afterwards, it was consuming nearly 390k CPU. Not much of a difference.
Optimized Query 2
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM INTERMEDIATE_DB.INTERMEDIATE_TABLE ALIAS_A --CREATED AN INTERMEDIATE TABLE
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
MACRO for loading data into intermediate table
INSERT INTO INTERMEDIATE_DB.INTERMEDIATE_TABLE
sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
So what I did here was use an intermediate table instead of a subquery. The intermediate table gets loaded via the macro first, and then the select query runs. It now consumes only 50k impactCPU (for the macro and the select query combined).
My question:
I am unable to work out why this happens even though the logic behind both queries is the same (or so I think). What would be the best practice if this is the wrong approach?
Your main problem is the Cast(ALIAS_A.columnD AS DATE). When you check the Explain output you will notice the optimizer has "no confidence" for this step, probably greatly overestimating the number of rows returned.
But when you materialize the Select, the number of rows is better known and the order of joins changes.
You would probably get the same plan if you Collect Statistics on the Cast(ALIAS_A.columnD AS DATE). Run DIAGNOSTIC HELPSTATS ON FOR SESSION; and Explain should then show you this as a recommended statistic.
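As a rough sketch of the steps above, assuming a Teradata release that supports statistics on single-table expressions (14.10 or later, as far as I know). The statistics name columnD_as_date is made up for illustration, and the SELECT is cut down to the filter on table_A only:

```sql
-- Ask the optimizer to report the statistics it would like to have.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Re-run Explain on the query; the recommended statistics should be
-- listed at the end of the Explain text.
EXPLAIN
SELECT ALIAS_A.id1, ALIAS_A.id2
FROM db_A.table_A ALIAS_A
WHERE CAST(ALIAS_A.columnD AS DATE) BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE;

-- Collect statistics on the expression used in the filter.
-- "columnD_as_date" is a hypothetical name chosen for this example.
COLLECT STATISTICS
    COLUMN (CAST(columnD AS DATE)) AS columnD_as_date
ON db_A.table_A;
```

After collecting, run Explain on the original query again and compare the row estimate and confidence level for the step that filters table_A.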

SQL Using WITH and JOIN together to create a view efficiently

I'm trying to use data from a table named period, which specifies what period a date falls into, and then use that result to join to another table with the following statement.
WITH rep_prod AS (
SELECT t.tran_num, t.amount, t.provider_id, t.clinic,
t.tran_date, t.type, t.impacts, p.period_id, p.fiscal_year, p.period_weeks
FROM transactions t, period p
WHERE tran_date BETWEEN period_start AND period_end
)
SELECT r.tran_num, r.amount, r.provider_id, d.first_name, d.last_name,
d.clinic, r.tran_date, r.period_id, r.period_weeks, r.type, r.impacts
FROM rep_prod AS r
INNER JOIN provider AS d
ON r.provider_id = d.provider_id AND r.clinic = d.clinic
I'm looking to create this as a view on my DB. Is there a more efficient way to accomplish this? The result currently holds around 6.2 million rows and will only continue to grow. This query alone took over 7 minutes to complete, granted I'm using SQL Express with its memory limitations.
Update: Changed the query to reflect the removal of SELECT DISTINCT.
EDIT: @Rabbit So you're suggesting something like this?
SELECT t.tran_num, t.amount, t.provider_id, d.first_name, d.last_name,
d.clinic, t.tran_date, p.period_id, p.period_weeks, p.fiscal_year, p.period_start, p.period_end, t.type, t.impacts
FROM transactions t
INNER JOIN provider d
ON d.provider_id = t.provider_id AND d.clinic = t.clinic
INNER JOIN period p
ON t.tran_date BETWEEN p.period_start AND p.period_end

Google Bigquery says "Response too large to return" with simple select

The allowLargeResults modifier is switched on, and I have also tried both interactive and batch query priority.
There are 70M records in the search_results table, 10M records in searches, and only about 900 in the buy table. The WHERE clause also reduces the number of rows considerably.
SELECT
s.flyFrom, s.to, s.typeFlight, r.price, b.price, b.affily
FROM [sptest.buy] AS b
INNER JOIN [sptest.search_results] AS r
ON b.booking_token=r.booking_token
INNER JOIN [sptest.searches] AS s
ON s.searchid=r.searchid
WHERE
DATE(r.saved_at) >= DATE('2015-06-23 00:00:00') AND
DATE(s.saved_at) >= DATE('2015-06-23 00:00:00')
LIMIT 10
Could the problem be caused by large join keys? The booking_token key is variable in size, 50-600 characters.
I would make a couple of modifications to this query:
1. Move the WHERE clause filters closer to the table scans
2. Use the JOIN EACH construct
SELECT
s.flyFrom, s.to, s.typeFlight, r.price, b.price, b.affily
FROM [sptest.buy] AS b
INNER JOIN EACH
(SELECT * FROM [sptest.search_results] WHERE saved_at > DATE('2015-06-23 00:00:00')) AS r
ON b.booking_token=r.booking_token
INNER JOIN EACH
(SELECT * FROM [sptest.searches] WHERE saved_at > DATE('2015-06-23 00:00:00')) AS s
ON s.searchid=r.searchid
LIMIT 10

ORA-22813 error with SQL complex query

I have a big SELECT statement with many nested selects in it. When I run it, I get an ORA-22813 error:
ORA-22813: the collection value from one of the inner subqueries has exceeded the system limits.
Below are some of the nested selects that return huge amounts of data. The first select returns the most data.
Is there an alternative way to handle and process the huge data returned by the inner SELECTs, so that the query completes without running out of memory or sort space?
/*****************************************BEGIN
LEFT OUTER JOIN
( SELECT *
FROM STUDENT_COURSE stu_c
LEFT OUTER JOIN STUDENT_history ch on stu_c.course_id = ch.ch_course_id
LEFT OUTER JOIN STUDENT_master stu_mca on ch.course_history_id = stu_mca.item_id
) stu_c ON stu_c.HISTORY_ID = toa.ACTIVITY_ID ----->This table is joined earlier
LEFT OUTER JOIN
(SELECT c_e.EV_ID, c_e.EV_NAME, ma.item_id, ma.cata_id
FROM EVENTS c_e LEFT OUTER
JOIN COURSE_master ma on c_e.event_Id = ma.item_id ) c_e ON c_e.EVENT_ID = toa.ACTIVITY_ID
After these selects we have GROUP BYs to further sort.
I have checked that if I add an extra limiting condition such as where rownum < 30 or rownum < 20 to each of these SELECTs, it works fine.
Full query
SELECT * FROM (SELECT
mcat.CATALOG_ITEM_ID,
mcat.CATALOG_ITEM_NAME ,
mcat.DESCRIPTION,
mcat.CATALOG_ITEM_TYPE,
mcat.DELIVERY_METHOD,
XMLElement("TRAINING_PLAN",XMLAttributes( TP.TPLAN_ID as "id" ),
XMLELEMENT("COMPLETE_QUANTITY", TP.COMPLETE_QUANTITY),
XMLELEMENT("COMPLETE_UNIT", TP.COMPLETE_UNIT),
XMLElement("TOTAL_CREDITS", TP.numberOfCredits ),
XMLELEMENT("IS_CREDIT_BASED", TP.IS_CREDIT_BASED),
XMLELEMENT("IS_FOR_CERT", TP.IS_FOR_CERT),
XMLELEMENT("ACCREDIT_ORG_NAME", TP.ACCRED_ORG_NAME),
XMLELEMENT("ACCREDIT_ORG_ID", TP.accredit_org_id ),
XMLElement("OBJECTIVE_LIST", TP.OBJECTIVE_LIST )
).extract('/').getClobVal() AS PLAN_LIST
FROM
student_master_catalog mcat
INNER JOIN
(SELECT stu_tp.TPLAN_ID,
stu_tp.COMPLETE_QUANTITY,
stu_tp.COMPLETE_UNIT,
stu_tp.TPLAN_XML_DATA.extract('//numberOfCredits/text()').getStringVal() as numberOfCredits,
stu_tp.IS_CREDIT_BASED,
stu_tp.IS_FOR_CERT,
stu_oa.ACCRED_ORG_NAME,
stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() as accredit_org_id,
objective_list.OBJECTIVE_LIST
FROM
student_training_catalog stu_tp
LEFT OUTER JOIN
stu_accrediting_org stu_oa on stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() = stu_oa.ACCRED_ORG_ID
INNER JOIN
(SELECT *
FROM
(SELECT
stu_tpo.TPLAN_ID AS OBJECTIVE_TPLAN_ID,
XMLAgg(
XMLElement("OBJECTIVE",
XMLElement("OBJECTIVE_ID",stu_tpo.T_OBJECTIVE_ID ),
XMLElement("OBJECTIVE_NAME",stu_to.T_OBJECTIVE_NAME ),
XMLElement("OBJECTIVE_REQUIRED_CREDITS_OR_ACTIVITIES",stu_tpo.REQUIRED_CREDITS ),
XMLElement("ITEM_ORDER", stu_tpo.ITEM_ORDER ),
XMLElement("ACTIVITY_LIST", activity_list.ACTIVITY_LIST )
)
) as OBJECTIVE_LIST
FROM
stu_TP_OBJECTIVE stu_tpo
INNER JOIN
stu_TRAINING_OBJECTIVE stu_to ON stu_tpo.T_OBJECTIVE_ID = stu_to.T_OBJECTIVE_ID
INNER JOIN
(SELECT *
FROM
(SELECT stu_toa.T_OBJECTIVE_ID AS ACTIVITY_TOBJ_ID, XMLAgg(
XMLElement("ACTIVITY",
XMLElement("ACTIVITY_ID",stu_toa.ACTIVITY_ID ),
XMLElement("CATALOG_ID",COALESCE(stu_c.CATALOG_ID, COALESCE( stu_e.CATALOG_ID, stu_t.CATALOG_ID ) ) ),
XMLElement("CATALOG_ITEM_ID",COALESCE(stu_c.CATALOG_ITEM_ID, COALESCE( stu_e.CATALOG_ITEM_ID, stu_t.CATALOG_ITEM_ID ) ) ),
XMLElement("DELIVERY_METHOD",COALESCE(stu_c.DELIVERY_METHOD, COALESCE( stu_e.DELIVERY_METHOD, stu_t.DELIVERY_METHOD ) ) ),
XMLElement("ACTIVITY_NAME",COALESCE(stu_c.COURSE_NAME, COALESCE( stu_e.EVENT_NAME, stu_t.TEST_NAME ) ) ),
XMLElement("ACTIVITY_TYPE",initcap( stu_toa.ACTIVITY_TYPE ) ),
XMLElement("IS_REQUIRED",stu_toa.IS_REQUIRED ),
XMLElement("IS_PREFERRED",stu_toa.IS_PREFERRED ),
XMLElement("NUMBER_OF_CREDITS",stu_lac.CREDIT_HOURS),
XMLElement("ITEM_ORDER", stu_toa.ITEM_ORDER )
)) as ACTIVITY_LIST
FROM stu_TRAIN_OBJ_ACTIVITY stu_toa
LEFT OUTER JOIN
(
SELECT distinct lac.LEARNING_ACTIVITY_ID, lac.CREDIT_HOURS
FROM student_training_catalog tp
INNER JOIN stu_TP_OBJECTIVE tpo on tp.TPLAN_ID = tpo.TPLAN_ID
INNER JOIN stu_TRAIN_OBJ_ACTIVITY toa on tpo.T_OBJECTIVE_ID = toa.T_OBJECTIVE_ID
INNER JOIN stu_LEARNINGACTIVITY_CREDITS lac on lac.LEARNING_ACTIVITY_ID = toa.ACTIVITY_ID and tp.TPLAN_XML_DATA.extract ('//accreditingOrg/text()').getStringVal() = lac.ACC_ORG_ID
where tp.tplan_id ='*************'
) stu_lac ON stu_lac.LEARNING_ACTIVITY_ID = stu_toa.ACTIVITY_ID ------>This Select returns correct no. of rows
I want to join the nested SELECTs below with stu_toa.ACTIVITY_ID. This would solve my issue.
The SELECT below, inside the LEFT OUTER JOIN, is the problem. It returns too much because three tables are joined directly without any value qualification.
LEFT OUTER JOIN
( SELECT ch.COURSE_HISTORY_ID, stu_c.COURSE_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method
FROM stu_COURSE stu_c
LEFT OUTER JOIN stu_course_history ch on stu_c.course_id = ch.ch_course_id -
--If I can qualify here with ch.ch_course_id = stu_toa.ACTIVITY_ID (stu_toa.ACTIVITY_ID from the above select with correct no. of rows )
--Here, I get errors because I can't access outside values inside a left outer join
LEFT OUTER JOIN student_master_catalog mca on ch.course_history_id = mca.catalog_item_id
) stu_c ON stu_c.COURSE_HISTORY_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_e.EVENT_ID, stu_e.EVENT_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_EVENTS stu_e LEFT OUTER JOIN student_master_catalog mca on stu_e.event_Id = mca.catalog_item_id ) stu_e ON stu_e.EVENT_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_t.TEST_HISTORY_ID, stu_t.TEST_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_TEST_HISTORY stu_t LEFT OUTER JOIN student_master_catalog mca on stu_t.test_history_id = mca.catalog_item_id) stu_t ON stu_t.test_history_id = stu_toa.ACTIVITY_ID
GROUP BY stu_toa.T_OBJECTIVE_ID) ) activity_list ON activity_list.ACTIVITY_TOBJ_ID = stu_tpo.T_OBJECTIVE_ID
GROUP BY stu_tpo.TPLAN_ID) ) objective_list ON objective_list.OBJECTIVE_TPLAN_ID = stu_tp.TPLAN_ID
)TP ON TP.TPLAN_ID = mcat.CATALOG_ITEM_ID
WHERE
mcat.CATALOG_ITEM_ID = '*****************' and mcat.CATALOG_ORG_ID = '********')
Please post the DDLs, approximate sizes (relative to each other), and the complete query, rather than just an excerpt.
Some quick hits that may or may not solve your problem (for better help, I need better information):
Are you sure you mean OUTER join? Outer joining students to courses means students who are not taking any courses will still be around. Is that the desired behaviour?
Don't select * if you only want a limited subset of the columns. Enumerate the exact columns you need. The rest might not seem like much on a row-by-row basis, but when you multiply by the total number of rows you have, this sort of thing can mean the difference between in-memory sorts and spilling to disk.
How many rows of data are you looking at? There are times when separate queries with programmatic aggregation work better. Someone with more knowledge of Oracle query optimization may be able to help; tweaking the settings could help here too.
I've had instances where a sproc that aggregated data from more than one source took far longer than making two calls from the app and putting the results together in memory.
Post the DDL of your tables and the exact plan of the query.
Meanwhile, try increasing pga_aggregate_target, sort_area_size and hash_area_size.
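For reference, a hedged sketch of how those parameters might be raised. The sizes are placeholders, not recommendations, and sort_area_size and hash_area_size are only honoured when workarea_size_policy is MANUAL; with the default AUTO setting, Oracle sizes work areas from pga_aggregate_target instead.

```sql
-- Instance-wide PGA target. Needs the ALTER SYSTEM privilege;
-- SCOPE = BOTH assumes the instance was started with an spfile.
-- 2G is a placeholder value.
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = BOTH;

-- Alternatively, for the current session only: switch to manual
-- work-area sizing so the two *_area_size parameters take effect.
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 1073741824;  -- bytes (1 GB, placeholder)
ALTER SESSION SET hash_area_size = 1073741824;  -- bytes (1 GB, placeholder)
```

The session-level variant only affects the session that runs the big query, which makes it the less intrusive option to try first.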