Left outer join with left() function - sql

I'm running the below query daily (overnight) and it's taking considerable time to run (1-1.5 hours). I'm certain the "Acc.DateKey >= LEFT (LocationKey, 8)" is the reason and if this part of the join is removed the query executes in around 5 minutes. I just cannot think of a more efficient way.
Acc.DateKey is a bigint typically 20180101 etc., with the location key being a bigint typically 201801011234 etc.
So far I've considered including a new column in the LO table "AccLocationKey" which will be inserted with the LEFT (LocationKey, 8) function when loaded.
I've decided to pose the question here first - could this be improved upon without changing the LO table?
SELECT
ISNULL(MAX(L.LocationKey),(SELECT MIN(LocationKey) FROM LO WHERE Location = Acc.Location)) AS LocationKey
FROM
Acc
LEFT OUTER JOIN
(
SELECT
LocationKey
,Location
FROM
LO
)AS L
ON Acc.Location = L.Location AND Acc.DateKey >= LEFT(LocationKey,8)

Let's rewrite the query without the subquery in the SELECT:
SELECT COALESCE(MAX(L.LocationKey), MIN(L.min_location)) AS LocationKey
FROM Acc LEFT OUTER JOIN
(SELECT MIN(l.LocationKey) OVER (PARTITION BY l.Location) as min_location,
l.*
FROM LO l
) L
ON Acc.Location = L.Location AND Acc.DateKey >= LEFT(l.LocationKey, 8);
Probably your best chance at performance is to add a computed column and appropriate index. So:
alter table lo add locationkey_datekey as (try_convert(bigint, LEFT(LocationKey, 8))) persisted;
Then, the appropriate index:
create index idx_lo_location_datekey on lo(location, locationkey_datekey);
Then use this in the query:
SELECT COALESCE(MAX(L.LocationKey), MIN(L.min_location)) AS LocationKey
FROM Acc LEFT OUTER JOIN
(SELECT MIN(l.LocationKey) OVER (PARTITION BY l.Location) as min_location,
l.*
FROM LO l
) L
ON Acc.Location = L.Location AND Acc.DateKey >= l.LocationKey_datekey;
Happily, this index will also work for the window function.
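One thing worth checking in the rewritten query: the date condition sits in the ON clause, so an Acc row with no qualifying LO row gets NULL for every L column, including the windowed minimum, and the fallback may never fire. A minimal sketch that keeps the original fallback is to join on Location only and move the date test into a conditional aggregate. It assumes LocationKey always has 12 digits (so integer division by 10000 gives the same 8-digit date as LEFT(LocationKey, 8)) and that Acc has a row identifier to group by; AccKey below is a hypothetical name, not a column from the question.

```sql
-- Sketch only: AccKey is a hypothetical per-row key on Acc; adjust the GROUP BY to the real grain.
SELECT Acc.AccKey,
       COALESCE(
           -- latest LO key whose date part is on or before the Acc date
           MAX(CASE WHEN Acc.DateKey >= L.LocationKey / 10000 THEN L.LocationKey END),
           -- fallback: earliest key for the same location
           MIN(L.LocationKey)
       ) AS LocationKey
FROM Acc
LEFT OUTER JOIN LO AS L
    ON Acc.Location = L.Location
GROUP BY Acc.AccKey;
```

This reads every LO row for a location instead of seeking on the date, so whether it beats the computed-column index depends on how many keys each location has.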

Related

Sort & Parallelism costing my query too much time

My SQL query is taking a large amount of time to run. I wrote a similar query and pit them against each other and this one runs FASTER when a small dataset (10K lines) is used, but about 20-30x slower than the other one when a LARGE dataset (500K+ lines) is used. My first query however does not have ONE column that I need, and I cannot add it without going about it with this different approach.
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] as a
LEFT OUTER JOIN (
SELECT [DESIGNATION], [CONTAINER_ID], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_CONTAINER]
) c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN (
SELECT [DESIGNATION], [TYPE], [BLDG], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_LOCATION]
) b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN (
SELECT [PBG], [PLANNED_MFG_DELIVERY_DATE], [EXTENSION_DATE], [DISCRETE_JOB_NUMBER], [PROJECT_NUMBER]
FROM [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
)d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];
You can see below the usage, 100% is roughly 20,000, whereas my other query is about 900:
Is there something I can do to speed up my query, or where did I bog it down?
Remove inner selects and join directly to the tables:
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] a
LEFT OUTER JOIN [LTS].[dbo].[LTS_CONTAINER] c
ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN [LTS].[dbo].[LTS_LOCATION] b
ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY] d
ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];

Why does separate table perform significantly better than subquery?

I was trying to improve the performance of a SQL query and tried a few combinations.
Original Query
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
FROM db_A.table_A ALIAS_A
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
AND Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
The above query consumes nearly 400k impactCPU
Optimized Query 1
SELECT New_sub_table.id1,
New_sub_table.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM ( sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE ) New_sub_table -- created a subquery
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON New_sub_table.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON New_sub_table.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(New_sub_table.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
My idea was to filter the data first and then do the joins. When I checked the performance stats afterwards, it was consuming nearly 390k CPU, so not much of a difference.
Optimized Query 2
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM INTERMEDIATE_DB.INTERMEDIATE_TABLE ALIAS_A --CREATED AN INTERMEDIATE TABLE
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
MACRO for loading data into intermediate table
INSERT INTO INTERMEDIATE_DB.INTERMEDIATE_TABLE
sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
Here I used an intermediate table instead of a subquery. The intermediate table is loaded via the macro first, and then the SELECT query runs. Together, the macro and the SELECT now consume only 50k impactCPU.
My question -
I cannot work out why this happens, even though the logic behind both queries is the same (or so I think). And if this is the wrong approach, what would be the best practice?
Your main problem is the Cast(ALIAS_A.columnD AS DATE). When you check the Explain output you will notice the optimizer has no confidence in its estimate for this step, and is probably greatly overestimating the number of rows returned.
But when you materialize the Select, the actual number of rows is known and the join order changes.
You would probably get the same plan if you collect statistics on Cast(ALIAS_A.columnD AS DATE). Run DIAGNOSTIC HELPSTATS ON FOR SESSION; and Explain should then list this among the recommended stats.
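A rough sketch of those two steps, assuming a Teradata release that supports statistics on expressions (14.10 or later, as far as I know). The statistics name columnD_as_date is made up for illustration, and the exact COLLECT STATISTICS form should be checked against the manual for your release.

```sql
-- Ask the optimizer to append its recommended statistics to every EXPLAIN in this session
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT ALIAS_A.id1
FROM db_A.table_A ALIAS_A
WHERE CAST(ALIAS_A.columnD AS DATE) BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE;

-- Expression statistics; columnD_as_date is a hypothetical name for the stats entry
COLLECT STATISTICS COLUMN (CAST(columnD AS DATE)) AS columnD_as_date ON db_A.table_A;
```

After collecting, rerun the EXPLAIN of the original query and compare the estimated row count and join order with the intermediate-table version.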

How can I make this SQL query more efficient?

I have a query trying to pull data from multiple tables but when I run it, it takes a really long time (So long I haven't even been able to wait long enough). I know it's extremely inefficient and wanted to get some input as to how it can be written better. Here it is:
SELECT
P.patient_name,
LOH.patient_id,
LOH.requesting_location,
LOH.sample_date,
LOH.lab_doing_work,
L.location_name,
LOD.test_code,
LOD.test_rdx,
LSR.tube_type
FROM
mis_db.dbo.lab_order_header AS LOH,
mis_db.dbo.patient AS P,
mis_db.dbo.lab_order_detail AS LOD,
mis_db.dbo.lab_sample_rule AS LSR,
mis_db.dbo.location AS L
WHERE
LOH.requesting_location = '000839' AND
LOH.lab_order_id = LOD.lab_order_id AND
LOH.sample_date IN ('05/28/2015', '05/29/2015')
--LOH.patient_id = LOD.patient_id
--LOD.sample_date = LOH.sample_date
ORDER BY
P.patient_name DESC
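The original query lists five tables but only joins two of them, so the other three are cross-joined. Try this (or something like it), with an explicit join condition for every table: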
SELECT P.patient_name,
lo.patient_id, lo.requesting_location,
lo.sample_date, lo.lab_doing_work,
l.location_name, d.test_code, d.test_rdx,
r.tube_type
FROM mis_db.dbo.lab_order_header lo
join mis_db.dbo.patient p on p.patient_id = lo.Patient_id
join mis_db.dbo.lab_order_detail d on d.lab_order_id = lo.lab_order_id
join mis_db.dbo.lab_sample_rule r on r.rule_id = lo.ruleId -- ????
join mis_db.dbo.location l on l.locationid = lo.requesting_location
WHERE lo.requesting_location = '000839' AND
lo.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY p.patient_name DESC
I ended up going with the following and was able to get the results I wanted:
SELECT LOH.patient_id,
patient_name,
[mis_db_rpt].[common].[string_date_format](LOD.sample_date) AS
[Draw Date],
test_description,
LOD.test_code,
LOH.lab_doing_work,
tube_type,
L.short_name
FROM [mis_db].[dbo].[lab_order_header]
LOH
INNER JOIN
[mis_db].[dbo].[lab_order_detail]
LOD
ON LOH.lab_order_id = LOD.lab_order_id
INNER JOIN
[mis_db].[dbo].[patient]
P
ON P.patient_id = LOD.patient_id
INNER JOIN
[mis_db].[dbo].[sample_tube]
ST
ON LOD.sample_id = ST.sample_id
INNER JOIN
[mis_db].[dbo].[location] AS
L
ON LOH.lab_doing_work = L.location_id
INNER JOIN
[mis_db].[dbo].[lab_test] AS
LT
ON LOD.test_code = LT.test_code
WHERE LOH.requesting_location = '000839' AND
LOD.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY LOD.sample_date,
patient_name,
LOD.patient_id,
test_description
I would try the following:
Display the estimated execution plan in SSMS and see if it suggests any missing indexes. A nonclustered index on lo.requesting_location and sample_date might help with the filter.
A descending index on p.patient_name may also help with the performance of the ORDER BY.
Try changing the IN date filter to BETWEEN '05/28/2015' AND '05/29/2015'.
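As a sketch of the index and range-filter suggestions, assuming sample_date on lab_order_header is a date or datetime column (the question does not say). The index name and the INCLUDE list are illustrative choices, not something the execution plan has confirmed.

```sql
-- Hypothetical index name; key columns mirror the two filters in the WHERE clause.
CREATE NONCLUSTERED INDEX IX_lab_order_header_location_sample_date
    ON mis_db.dbo.lab_order_header (requesting_location, sample_date)
    INCLUDE (lab_order_id, patient_id, lab_doing_work);

-- Half-open range: covers all of 28 and 29 May 2015, whatever the time part.
SELECT lo.lab_order_id, lo.patient_id, lo.sample_date
FROM mis_db.dbo.lab_order_header AS lo
WHERE lo.requesting_location = '000839'
  AND lo.sample_date >= '20150528'
  AND lo.sample_date <  '20150530';
```

If sample_date carries a time part, this range returns more rows than the IN list, which only matches values at exactly midnight, so check which behaviour is wanted.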

Enhance Postgresql query (many subqueries)

I have a PostgreSQL query which takes a long time to execute (5 min), because of the subqueries I think. I would like to find a way to speed it up:
select v.id, v.pos, v.time, v.status, vi.name,vi.type,
(select c.fullname
from company c
where vi.registered_owner_code = c.owcode ) AS registered_owner
,(select c.fullname
from company c
where vi.group_beneficial_owner_code=c.owcode) AS group_beneficial_owner
,(select c.fullname
from company c
where vi.operator_code = c.owcode ) AS operator
,(select c.fullname
from company c
where vi.manager_code = c.owcode ) AS manager
from (car_pos v left join cars vi on v.id = vi.id)
where age(now(), v.time::time with time zone) < '1 days'::interval
"because of subqueries I think"
This should not be a guessing game. You can get the query execution plan in pgAdmin or with EXPLAIN at the console:
http://www.pgadmin.org/docs/1.4/query.html
http://www.postgresql.org/docs/current/static/sql-explain.html
then you can see what's going on and what takes that much time.
After analysis you can add indexes or change something else but at least you will know what needs to be changed.
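For example, a minimal way to get the actual plan from psql. The query is a trimmed stand-in for the one in the question, so substitute your own statement. Note that EXPLAIN ANALYZE really executes the query, so it takes as long as the query itself.

```sql
-- ANALYZE runs the statement and reports real timings; BUFFERS adds I/O counts per node
EXPLAIN (ANALYZE, BUFFERS)
SELECT v.id, v.pos, v.time, v.status, vi.name, vi.type
FROM car_pos v
LEFT JOIN cars vi ON v.id = vi.id
WHERE v.time > now() - interval '1 day';
```

In the output, look for sequential scans on large tables and for nodes whose estimated row counts are far from the actual ones.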
The WHERE condition can't use an index, so you have to change that one. The column v.time should not be wrapped in a function call, age() in this case.
3 key ingredients:
Do away with the correlated subqueries, use JOIN instead - like other answers mentioned already.
In the WHERE clause, don't use an expression on your column, which cannot utilize an index. @Frank already mentions it. Only the most basic, stable expressions can be rewritten by the query planner to use an index. See how I rewrote it.
Create suitable indexes.
SELECT v.id, v.pos, v.time, v.status, c.name, c.type
,r.fullname AS registered_owner
,g.fullname AS group_beneficial_owner
,o.fullname AS operator
,m.fullname AS manager
FROM car_pos v
LEFT JOIN cars c USING (id)
LEFT JOIN company r ON r.owcode = c.registered_owner_code
LEFT JOIN company g ON g.owcode = c.group_beneficial_owner_code
LEFT JOIN company o ON o.owcode = c.operator_code
LEFT JOIN company m ON m.owcode = c.manager_code
WHERE v.time > (now() - interval '1 day');
You need unique indexes on cars.id and company.owcode (primary keys do the job, too).
And you need an index on car_pos.time like:
CREATE INDEX car_pos_time_idx ON car_pos (time DESC);
Works without descending order, too. If you have lots of rows (-> big table, big index), you might want to create a partial index that covers only recent history and recreate it on a daily or weekly basis at off hours:
CREATE INDEX car_pos_time_idx ON car_pos (time DESC)
WHERE time > $mydate;
Where $mydate is the result of (now() - interval '1 day'). This matches the condition of your query logically at any time. Effectiveness slowly deteriorates over time.
Aside: don't name a column of type timestamp "time", that's misleading from a documentation point of view. Actually, rather don't use time as column name at all. It's a reserved word in every SQL standard and a type name in PostgreSQL.
select v.id, v.pos, v.time, v.status, vi.name,vi.type,
c1.fullname as Registered_owner,
c2.fullname as group_beneficial_owner,
c3.fullname AS operator,
c4.fullname AS manager
from car_pos v
left outer join cars vi on v.id = vi.id
left outer join company c1 on vi.registered_owner_code=c1.owcode
left outer join company c2 on vi.group_beneficial_owner_code=c2.owcode
left outer join company c3 on vi.operator_code=c3.owcode
left outer join company c4 on vi.manager_code=c4.owcode
where age(now(), v.time::time with time zone) < '1 days'::interval
One trivial solution would be to convert it to joins
select v.id, v.pos, v.time, v.status, vi.name,vi.type,
reg_owner.fullname AS registered_owner,
gr_ben_owner.fullname AS group_beneficial_owner,
op.fullname AS operator,
man.fullname AS manager
from
car_pos v
left join cars vi on v.id = vi.id
left join company reg_owner on vi.registered_owner_code = reg_owner.owcode
left join company gr_ben_owner on vi.group_beneficial_owner_code = gr_ben_owner.owcode
left join company op on vi.operator_code = op.owcode
left join company man on vi.manager_code = man.owcode
where age(now(), v.time::time with time zone) < '1 days'::interval
I suspect, however, that it might be possible with only one join of the company table. I'm not 100% sure about the exact syntax, and I doubt that this will perform better than the four-join solution (because of all the CASE WHEN, GROUP BY, etc.), but I think it should work too. (I assumed that cars to car_pos is a one-to-one relationship.)
select v.id, MAX(v.pos) as pos, MAX(v.time) as vtime, MAX(v.status) as status, MAX(vi.name) as name,MAX(vi.type) as type,
MAX(CASE WHEN c.owcode = vi.registered_owner_code THEN c.fullname END) AS registered_owner,
MAX(CASE WHEN c.owcode = vi.group_beneficial_owner_code THEN c.fullname END) AS group_beneficial_owner,
MAX(CASE WHEN c.owcode = vi.operator_code THEN c.fullname END) AS operator,
MAX(CASE WHEN c.owcode = vi.manager_code THEN c.fullname END) AS manager
from
car_pos v
left join cars vi on v.id = vi.id
left join company c on c.owcode IN (vi.registered_owner_code, vi.group_beneficial_owner_code, vi.operator_code, vi.manager_code)
where age(now(), v.time::time with time zone) < '1 days'::interval
group by v.id
If you could put the table creation DDL scripts, and some inserts into the question, it would be easy to try in SQL fiddle...

ORA-22813 error with SQL complex query

I have a big SELECT statement which has many nested selects in it. When I run it, it gives me an ORA-22813 error:
Ora-22813:- The Collection value from one of the inner sub queries has exceeded the system limits and hence this error.
I have given below some of the nested selects which return huge data.
---The 1st select returns the most data.
Is there an alternative way to handle the large result sets returned by the inner SELECTs, so that the query completes without running out of memory or sort space?
/*****************************************BEGIN
LEFT OUTER JOIN
( SELECT *
FROM STUDENT_COURSE stu_c
LEFT OUTER JOIN STUDENT_history ch on stu_c.course_id = ch.ch_course_id
LEFT OUTER JOIN STUDENT_master stu_mca on ch.course_history_id = stu_mca.item_id
) stu_c ON stu_c.HISTORY_ID = toa.ACTIVITY_ID ----->This table is joined earlier
LEFT OUTER JOIN
(SELECT c_e.EV_ID, c_e.EV_NAME, ma.item_id, ma.cata_id
FROM EVENTS c_e LEFT OUTER
JOIN COURSE_master ma on c_e.event_Id = ma.item_id ) c_e ON c_e.EVENT_ID = toa.ACTIVITY_ID
After these selects we have GROUP BYs that process the rows further.
I have checked that if I add an extra limit such as WHERE rownum < 30 (or < 20) to each of these SELECTs, it works fine.
Full query
SELECT * FROM (SELECT
mcat.CATALOG_ITEM_ID,
mcat.CATALOG_ITEM_NAME ,
mcat.DESCRIPTION,
mcat.CATALOG_ITEM_TYPE,
mcat.DELIVERY_METHOD,
XMLElement("TRAINING_PLAN",XMLAttributes( TP.TPLAN_ID as "id" ),
XMLELEMENT("COMPLETE_QUANTITY", TP.COMPLETE_QUANTITY),
XMLELEMENT("COMPLETE_UNIT", TP.COMPLETE_UNIT),
XMLElement("TOTAL_CREDITS", TP.numberOfCredits ),
XMLELEMENT("IS_CREDIT_BASED", TP.IS_CREDIT_BASED),
XMLELEMENT("IS_FOR_CERT", TP.IS_FOR_CERT),
XMLELEMENT("ACCREDIT_ORG_NAME", TP.ACCRED_ORG_NAME),
XMLELEMENT("ACCREDIT_ORG_ID", TP.accredit_org_id ),
XMLElement("OBJECTIVE_LIST", TP.OBJECTIVE_LIST )
).extract('/').getClobVal() AS PLAN_LIST
FROM
student_master_catalog mcat
INNER JOIN
(SELECT stu_tp.TPLAN_ID,
stu_tp.COMPLETE_QUANTITY,
stu_tp.COMPLETE_UNIT,
stu_tp.TPLAN_XML_DATA.extract('//numberOfCredits/text()').getStringVal() as numberOfCredits,
stu_tp.IS_CREDIT_BASED,
stu_tp.IS_FOR_CERT,
stu_oa.ACCRED_ORG_NAME,
stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() as accredit_org_id,
objective_list.OBJECTIVE_LIST
FROM
student_training_catalog stu_tp
LEFT OUTER JOIN
stu_accrediting_org stu_oa on stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() = stu_oa.ACCRED_ORG_ID
INNER JOIN
(SELECT *
FROM
(SELECT
stu_tpo.TPLAN_ID AS OBJECTIVE_TPLAN_ID,
XMLAgg(
XMLElement("OBJECTIVE",
XMLElement("OBJECTIVE_ID",stu_tpo.T_OBJECTIVE_ID ),
XMLElement("OBJECTIVE_NAME",stu_to.T_OBJECTIVE_NAME ),
XMLElement("OBJECTIVE_REQUIRED_CREDITS_OR_ACTIVITIES",stu_tpo.REQUIRED_CREDITS ),
XMLElement("ITEM_ORDER", stu_tpo.ITEM_ORDER ),
XMLElement("ACTIVITY_LIST", activity_list.ACTIVITY_LIST )
)
) as OBJECTIVE_LIST
FROM
stu_TP_OBJECTIVE stu_tpo
INNER JOIN
stu_TRAINING_OBJECTIVE stu_to ON stu_tpo.T_OBJECTIVE_ID = stu_to.T_OBJECTIVE_ID
INNER JOIN
(SELECT *
FROM
(SELECT stu_toa.T_OBJECTIVE_ID AS ACTIVITY_TOBJ_ID, XMLAgg(
XMLElement("ACTIVITY",
XMLElement("ACTIVITY_ID",stu_toa.ACTIVITY_ID ),
XMLElement("CATALOG_ID",COALESCE(stu_c.CATALOG_ID, COALESCE( stu_e.CATALOG_ID, stu_t.CATALOG_ID ) ) ),
XMLElement("CATALOG_ITEM_ID",COALESCE(stu_c.CATALOG_ITEM_ID, COALESCE( stu_e.CATALOG_ITEM_ID, stu_t.CATALOG_ITEM_ID ) ) ),
XMLElement("DELIVERY_METHOD",COALESCE(stu_c.DELIVERY_METHOD, COALESCE( stu_e.DELIVERY_METHOD, stu_t.DELIVERY_METHOD ) ) ),
XMLElement("ACTIVITY_NAME",COALESCE(stu_c.COURSE_NAME, COALESCE( stu_e.EVENT_NAME, stu_t.TEST_NAME ) ) ),
XMLElement("ACTIVITY_TYPE",initcap( stu_toa.ACTIVITY_TYPE ) ),
XMLElement("IS_REQUIRED",stu_toa.IS_REQUIRED ),
XMLElement("IS_PREFERRED",stu_toa.IS_PREFERRED ),
XMLElement("NUMBER_OF_CREDITS",stu_lac.CREDIT_HOURS),
XMLElement("ITEM_ORDER", stu_toa.ITEM_ORDER )
)) as ACTIVITY_LIST
FROM stu_TRAIN_OBJ_ACTIVITY stu_toa
LEFT OUTER JOIN
(
SELECT distinct lac.LEARNING_ACTIVITY_ID, lac.CREDIT_HOURS
FROM student_training_catalog tp
INNER JOIN stu_TP_OBJECTIVE tpo on tp.TPLAN_ID = tpo.TPLAN_ID
INNER JOIN stu_TRAIN_OBJ_ACTIVITY toa on tpo.T_OBJECTIVE_ID = toa.T_OBJECTIVE_ID
INNER JOIN stu_LEARNINGACTIVITY_CREDITS lac on lac.LEARNING_ACTIVITY_ID = toa.ACTIVITY_ID and tp.TPLAN_XML_DATA.extract ('//accreditingOrg/text()').getStringVal() = lac.ACC_ORG_ID
where tp.tplan_id ='*************'
) stu_lac ON stu_lac.LEARNING_ACTIVITY_ID = stu_toa.ACTIVITY_ID ------>This Select returns correct no. of rows
I want to join the below nested SELECTs with stu_toa.ACTIVITY_ID. This would solve my issues.
This below SELECT inside the LEFT OUTER JOIN is the Problem. it returns too much because 3 tables are joined directly without any value qualification.
LEFT OUTER JOIN
( SELECT ch.COURSE_HISTORY_ID, stu_c.COURSE_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method
FROM stu_COURSE stu_c
LEFT OUTER JOIN stu_course_history ch on stu_c.course_id = ch.ch_course_id -
--If I can qualify here with ch.ch_course_id = stu_toa.ACTIVITY_ID (stu_toa.ACTIVITY_ID from the above select with correct no. of rows )
--Here, I get errors because I can't access outside values inside a left outer join
LEFT OUTER JOIN student_master_catalog mca on ch.course_history_id = mca.catalog_item_id
) stu_c ON stu_c.COURSE_HISTORY_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_e.EVENT_ID, stu_e.EVENT_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_EVENTS stu_e LEFT OUTER JOIN student_master_catalog mca on stu_e.event_Id = mca.catalog_item_id ) stu_e ON stu_e.EVENT_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_t.TEST_HISTORY_ID, stu_t.TEST_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_TEST_HISTORY stu_t LEFT OUTER JOIN student_master_catalog mca on stu_t.test_history_id = mca.catalog_item_id) stu_t ON stu_t.test_history_id = stu_toa.ACTIVITY_ID
GROUP BY stu_toa.T_OBJECTIVE_ID) ) activity_list ON activity_list.ACTIVITY_TOBJ_ID = stu_tpo.T_OBJECTIVE_ID
GROUP BY stu_tpo.TPLAN_ID) ) objective_list ON objective_list.OBJECTIVE_TPLAN_ID = stu_tp.TPLAN_ID
)TP ON TP.TPLAN_ID = mcat.CATALOG_ITEM_ID
WHERE
mcat.CATALOG_ITEM_ID = '*****************' and mcat.CATALOG_ORG_ID = '********')
Please post the DDLs, approximate sizes (relative to each other), and the complete query, rather than just an excerpt.
Some quick hits that may or may not solve your problem (for better help, I need better information) --
Are you sure you mean OUTER join? Outer joining students to courses means students who are not taking any courses will still be around. Is that the desired behaviour?
Don't select * if you only want a limited subset of the columns. Enumerate the exact columns you need. The rest might not seem like much on a row-by-row basis, but when you multiply by the total number of rows you have, this sort of thing can mean the difference between in-memory sorts and spilling to disk.
How many rows of data are you looking at? There are times when separate queries with programmatic aggregation work better. Someone with more knowledge of Oracle query optimization may be able to help; tweaking the settings could help here too.
I've had instances where a stored procedure that aggregated data from more than one source took far longer than making two calls from the app and putting the results together in memory.
Post DDL of your tables and exact plan of the query.
Meanwhile, try increasing pga_aggregate_target, sort_area_size and hash_area_size
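A sketch of how those settings could be changed. The sizes are placeholders, not recommendations, and SCOPE = BOTH assumes the instance uses an spfile. On a dedicated server, sort_area_size and hash_area_size only take effect when work-area sizing is manual, which is why the session is switched first.

```sql
-- Instance-wide PGA target (needs the ALTER SYSTEM privilege); 2G is a placeholder value
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = BOTH;

-- Per-session manual work areas for the one problematic query
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 268435456;  -- 256 MB, placeholder
ALTER SESSION SET hash_area_size = 268435456;  -- 256 MB, placeholder
```

Set the session back to workarea_size_policy = AUTO afterwards so other statements in the session use automatic PGA management again.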