ORA-22813 error with SQL complex query - sql

I have a big SELECT statement which has many nested selects in it. When I run it, it gives me an ORA-22813 error:
Ora-22813:- The Collection value from one of the inner sub queries has exceeded the system limits and hence this error.
I have given below some of the nested selects which return huge data.
---The 1st select returns the most data.
Can I handle and process the huge data returned by the INNER SELECTs into the tables in any alternate way so that there is no error of memory less, sort size less.
get, any other way so that the QUERY successfully processes without error.
/*****************************************BEGIN
LEFT OUTER JOIN
( SELECT *
FROM STUDENT_COURSE stu_c
LEFT OUTER JOIN STUDENT_history ch on stu_c.course_id = ch.ch_course_id
LEFT OUTER JOIN STUDENT_master stu_mca on ch.course_history_id = stu_mca.item_id
) stu_c ON stu_c.HISTORY_ID = toa.ACTIVITY_ID ----->This table is joined earlier
LEFT OUTER JOIN
(SELECT c_e.EV_ID, c_e.EV_NAME, ma.item_id, ma.cata_id
FROM EVENTS c_e LEFT OUTER
JOIN COURSE_master ma on c_e.event_Id = ma.item_id ) c_e ON c_e.EVENT_ID = toa.ACTIVITY_ID
After these selects---we have GROUP_BYs to further sort.
---I have checked that if I put a extra limit qualification
like where rownum <30,<20 in each of these SELECTs it works fine.
Full query
SELECT * FROM (SELECT
mcat.CATALOG_ITEM_ID,
mcat.CATALOG_ITEM_NAME ,
mcat.DESCRIPTION,
mcat.CATALOG_ITEM_TYPE,
mcat.DELIVERY_METHOD,
XMLElement("TRAINING_PLAN",XMLAttributes( TP.TPLAN_ID as "id" ),
XMLELEMENT("COMPLETE_QUANTITY", TP.COMPLETE_QUANTITY),
XMLELEMENT("COMPLETE_UNIT", TP.COMPLETE_UNIT),
XMLElement("TOTAL_CREDITS", TP.numberOfCredits ),
XMLELEMENT("IS_CREDIT_BASED", TP.IS_CREDIT_BASED),
XMLELEMENT("IS_FOR_CERT", TP.IS_FOR_CERT),
XMLELEMENT("ACCREDIT_ORG_NAME", TP.ACCRED_ORG_NAME),
XMLELEMENT("ACCREDIT_ORG_ID", TP.accredit_org_id ),
XMLElement("OBJECTIVE_LIST", TP.OBJECTIVE_LIST )
).extract('/').getClobVal() AS PLAN_LIST
FROM
student_master_catalog mcat
INNER JOIN
(SELECT stu_tp.TPLAN_ID,
stu_tp.COMPLETE_QUANTITY,
stu_tp.COMPLETE_UNIT,
stu_tp.TPLAN_XML_DATA.extract('//numberOfCredits/text()').getStringVal() as numberOfCredits,
stu_tp.IS_CREDIT_BASED,
stu_tp.IS_FOR_CERT,
stu_oa.ACCRED_ORG_NAME,
stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() as accredit_org_id,
objective_list.OBJECTIVE_LIST
FROM
student_training_catalog stu_tp
LEFT OUTER JOIN
stu_accrediting_org stu_oa on stu_tp.TPLAN_XML_DATA.extract('//accreditingOrg/text()').getStringVal() = stu_oa.ACCRED_ORG_ID
INNER JOIN
(SELECT *
FROM
(SELECT
stu_tpo.TPLAN_ID AS OBJECTIVE_TPLAN_ID,
XMLAgg(
XMLElement("OBJECTIVE",
XMLElement("OBJECTIVE_ID",stu_tpo.T_OBJECTIVE_ID ),
XMLElement("OBJECTIVE_NAME",stu_to.T_OBJECTIVE_NAME ),
XMLElement("OBJECTIVE_REQUIRED_CREDITS_OR_ACTIVITIES",stu_tpo.REQUIRED_CREDITS ),
XMLElement("ITEM_ORDER", stu_tpo.ITEM_ORDER ),
XMLElement("ACTIVITY_LIST", activity_list.ACTIVITY_LIST )
)
) as OBJECTIVE_LIST
FROM
stu_TP_OBJECTIVE stu_tpo
INNER JOIN
stu_TRAINING_OBJECTIVE stu_to ON stu_tpo.T_OBJECTIVE_ID = stu_to.T_OBJECTIVE_ID
INNER JOIN
(SELECT *
FROM
(SELECT stu_toa.T_OBJECTIVE_ID AS ACTIVITY_TOBJ_ID, XMLAgg(
XMLElement("ACTIVITY",
XMLElement("ACTIVITY_ID",stu_toa.ACTIVITY_ID ),
XMLElement("CATALOG_ID",COALESCE(stu_c.CATALOG_ID, COALESCE( stu_e.CATALOG_ID, stu_t.CATALOG_ID ) ) ),
XMLElement("CATALOG_ITEM_ID",COALESCE(stu_c.CATALOG_ITEM_ID, COALESCE( stu_e.CATALOG_ITEM_ID, stu_t.CATALOG_ITEM_ID ) ) ),
XMLElement("DELIVERY_METHOD",COALESCE(stu_c.DELIVERY_METHOD, COALESCE( stu_e.DELIVERY_METHOD, stu_t.DELIVERY_METHOD ) ) ),
XMLElement("ACTIVITY_NAME",COALESCE(stu_c.COURSE_NAME, COALESCE( stu_e.EVENT_NAME, stu_t.TEST_NAME ) ) ),
XMLElement("ACTIVITY_TYPE",initcap( stu_toa.ACTIVITY_TYPE ) ),
XMLElement("IS_REQUIRED",stu_toa.IS_REQUIRED ),
XMLElement("IS_PREFERRED",stu_toa.IS_PREFERRED ),
XMLElement("NUMBER_OF_CREDITS",stu_lac.CREDIT_HOURS),
XMLElement("ITEM_ORDER", stu_toa.ITEM_ORDER )
)) as ACTIVITY_LIST
FROM stu_TRAIN_OBJ_ACTIVITY stu_toa
LEFT OUTER JOIN
(
SELECT distinct lac.LEARNING_ACTIVITY_ID, lac.CREDIT_HOURS
FROM student_training_catalog tp
INNER JOIN stu_TP_OBJECTIVE tpo on tp.TPLAN_ID = tpo.TPLAN_ID
INNER JOIN stu_TRAIN_OBJ_ACTIVITY toa on tpo.T_OBJECTIVE_ID = toa.T_OBJECTIVE_ID
INNER JOIN stu_LEARNINGACTIVITY_CREDITS lac on lac.LEARNING_ACTIVITY_ID = toa.ACTIVITY_ID and tp.TPLAN_XML_DATA.extract ('//accreditingOrg/text()').getStringVal() = lac.ACC_ORG_ID
where tp.tplan_id ='*************'
) stu_lac ON stu_lac.LEARNING_ACTIVITY_ID = stu_toa.ACTIVITY_ID ------>This Select returns correct no. of rows
I want to join the below nested SELECTs with stu_toa.ACTIVITY_ID. This would solve my issues.
This below SELECT inside the LEFT OUTER JOIN is the Problem. it returns too much because 3 tables are joined directly without any value qualification.
LEFT OUTER JOIN
( SELECT ch.COURSE_HISTORY_ID, stu_c.COURSE_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method
FROM stu_COURSE stu_c
LEFT OUTER JOIN stu_course_history ch on stu_c.course_id = ch.ch_course_id -
--If I can qualify here with ch.ch_course_id = stu_toa.ACTIVITY_ID (stu_toa.ACTIVITY_ID from the above select with correct no. of rows )
--Here, I get errors because I can't access outside values inside a left outer join
LEFT OUTER JOIN student_master_catalog mca on ch.course_history_id = mca.catalog_item_id
) stu_c ON stu_c.COURSE_HISTORY_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_e.EVENT_ID, stu_e.EVENT_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_EVENTS stu_e LEFT OUTER JOIN student_master_catalog mca on stu_e.event_Id = mca.catalog_item_id ) stu_e ON stu_e.EVENT_ID = stu_toa.ACTIVITY_ID
LEFT OUTER JOIN
(SELECT stu_t.TEST_HISTORY_ID, stu_t.TEST_NAME, mca.catalog_item_id, mca.catalog_id, mca.delivery_method FROM stu_TEST_HISTORY stu_t LEFT OUTER JOIN student_master_catalog mca on stu_t.test_history_id = mca.catalog_item_id) stu_t ON stu_t.test_history_id = stu_toa.ACTIVITY_ID
GROUP BY stu_toa.T_OBJECTIVE_ID) ) activity_list ON activity_list.ACTIVITY_TOBJ_ID = stu_tpo.T_OBJECTIVE_ID
GROUP BY stu_tpo.TPLAN_ID) ) objective_list ON objective_list.OBJECTIVE_TPLAN_ID = stu_tp.TPLAN_ID
)TP ON TP.TPLAN_ID = mcat.CATALOG_ITEM_ID
WHERE
mcat.CATALOG_ITEM_ID = '*****************' and mcat.CATALOG_ORG_ID = '********')

Please post the DDLs, approximate sizes (relative to each other), and the complete query, rather than just an excerpt.
Some quick hits that may or may not solve your problem (for better help, I need better information) --
Are you sure you mean OUTER join? Outer joining students to courses means students who are not taking any courses will still be around. Is that the desired behaviour?
Don't select * if you only want a limited subset of the columns. Enumerate the exact columns you need. The rest might not seem like much on a row-by-row basis, but when you multiply by the total number of rows you have, this sort of thing can mean the difference between in-memory sorts and spilling to disk.

How many rows of data are you looking at? there are times when separate queries with programmatic aggregation can work better. Someone with more knowledge of Oracle query optimization may be able to help, also, tweaking the settings could help here too...
I've had instances where a sproc was being called that aggregated data from more than one source took exponentially longer than two calls in the app, and putting it together in memory.

Post DDL of your tables and exact plan of the query.
Meanwhile, try increasing pga_aggregate_target, sort_area_size and hash_area_size

Related

How to do operations between a column and a subquery

I would like to know how I can do operations between a column and a subquery, what I want to do is add to the field Subtotal what was obtained in the subquery Impuestos, the following is the query that I am using for this case.
Select
RC.PURCHID;
LRC.VALUEMST as 'Subtotal',
isnull((
select sum((CONVERT(float, TD1.taxvalue)/100)*LRC1.VALUEMST ) as a
FROM TAXONITEM TOI1
inner join TAXDATA TD1 ON (TD1.TAXCODE = TOI1.TAXCODE and RC.DATAAREAID = TD1.DATAAREAID)
inner join TRANS LRC1 on (LRC1.VEND = RC.RECID)
WHERE TOI1.TAXITEMGROUP = PL.TAXITEMGROUP and RC.DATAAREAID = TOI1.DATAAREAID
), 0) Impuestos
from VEND RC
inner join VENDTABLE VTB on VTB.ACCOUNTNUM = RC.INVOICEACCOUNT
inner join TRANS LRC on (LRC.VEND = RC.RECID)
inner join PURCHLINE PL on (PL.LINENUMBER =LRC.LINENUM and PL.PURCHID =RC.PURCHID)
where year (RC.DELIVERYDATE) =2021 and RC.PURCHASETYPE =3 order by RC.PURCHID;
Hope someone can give me some guidance when doing operations with subqueries.
A few disjointed facts that may help:
When a SELECT statement returns only one row with one column, you can enclose that statement in parenthesis and use it as a plain value. In your case, let's say that select sum(......= TOI1.DATAAREAID returns 500. Then, your outer select's second column is equivalent to isnull(500,0)
You mention in your question "subquery Impuestos". Keep in mind that, although you indeed used a subquery as we mentioned earlier, by the time it was enclosed in parentheses it is not treated as a subquery (more accurately: derived table), but as a value. Thus, the "Impuestos" is only a column alias at this point
I dislike and avoid subqueries before the from, makes things much harder to read. Here is a solution with apply which will keep your code mostly intact:
Select
RC.PURCHID,
LRC.VALUEMST as 'Subtotal',
isnull(subquery1.a, 0) as Impuestos
from VEND RC
inner join VENDTABLE VTB on VTB.ACCOUNTNUM = RC.INVOICEACCOUNT
inner join TRANS LRC on (LRC.VEND = RC.RECID)
inner join PURCHLINE PL on (PL.LINENUMBER =LRC.LINENUM and PL.PURCHID =RC.PURCHID)
outer apply
(
select sum((CONVERT(float, TD1.taxvalue)/100)*LRC1.VALUEMST ) as a
FROM TAXONITEM TOI1
inner join TAXDATA TD1 ON (TD1.TAXCODE = TOI1.TAXCODE and RC.DATAAREAID = TD1.DATAAREAID)
inner join TRANS LRC1 on (LRC1.VEND = RC.RECID)
WHERE TOI1.TAXITEMGROUP = PL.TAXITEMGROUP and RC.DATAAREAID = TOI1.DATAAREAID
) as subquery1
where year (RC.DELIVERYDATE) =2021 and RC.PURCHASETYPE =3 order by RC.PURCHID;

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far and i am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,> db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = > > db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry I am a noob so this may be really simple, but I just learn this stuff as I go on my own..
SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY create_date DESC
) ff
WHERE pf.profileID IN ('19456')
And it looks like all of your joins are actually INNER. Unless there may be profile without files (that's why OUTER apply instead of CROSS).
What about an obvious:
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;

LEFT JOIN ON COALESCE(a, b, c) - very strange behavior

I have encountered very strange behavior of my query and I wasted a lot of time to understand what causes it, in vane. So I am asking for your help.
SELECT count(*) FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = COALESCE(info_table.fk_key_table, front_table.fk_key_table_1, front_table.fk_key_table_2)
LEFT JOIN side_table ON side_table.fk_front_table = front_table.pk
WHERE side_table.pk = (SELECT MAX(pk) FROM side_table WHERE fk_front_table = front_table.pk)
OR side_table.pk IS NULL
Seems like a simple join query, with coalesce, I've used this technique before(not too many times) and it worked right.
In this query I don't ever get nulls for side_table.pk. If I remove coalesce or just don't use key_table, then the query returns rows with many null side_table.pk, but if I add coalesce join, I can't get those nulls.
It seems key_table and side_table don't have anything in common, but the result is so weird.
Also, when I don't use side_table and WHERE clause, the count(*) result with coalesce and without differs, but I can't see any pattern in rows missing, it seems random!
Real query:
SELECT ECHANGE.EXC_AUTO_KEY, STOCK_RESERVATIONS.STR_AUTO_KEY FROM EXCHANGE
LEFT JOIN WO_BOM ON WO_BOM.WOB_AUTO_KEY = EXCHANGE.WOB_AUTO_KEY
LEFT JOIN VIEW_WO_SUB ON VIEW_WO_SUB.WOO_AUTO_KEY = WO_BOM.WOO_AUTO_KEY
LEFT JOIN STOCK stock3 ON stock3.STM_AUTO_KEY = EXCHANGE.STM_AUTO_KEY
LEFT JOIN STOCK stock2 ON stock2.STM_AUTO_KEY = EXCHANGE.ORIG_STM
LEFT JOIN CONSIGNMENT_CODES con2 ON con2.CNC_AUTO_KEY = stock2.CNC_AUTO_KEY
LEFT JOIN CONSIGNMENT_CODES con3 ON con3.CNC_AUTO_KEY = stock3.CNC_AUTO_KEY
LEFT JOIN CI_UTL ON CI_UTL.CUT_AUTO_KEY = EXCHANGE.CUT_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc2 ON pcc2.PCC_AUTO_KEY = stock2.PCC_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc3 ON pcc3.PCC_AUTO_KEY = stock3.PCC_AUTO_KEY
LEFT JOIN STOCK_RESERVATIONS ON STOCK_RESERVATIONS.STM_AUTO_KEY = stock3.STM_AUTO_KEY
LEFT JOIN WAREHOUSE wh2 ON wh2.WHS_AUTO_KEY = stock2.WHS_ORIGINAL
LEFT JOIN SM_HISTORY ON (SM_HISTORY.STM_AUTO_KEY = EXCHANGE.ORIG_STM AND SM_HISTORY.WOB_REF = EXCHANGE.WOB_AUTO_KEY)
LEFT JOIN RC_DETAIL ON stock3.RCD_AUTO_KEY = RC_DETAIL.RCD_AUTO_KEY
LEFT JOIN RC_HEADER ON RC_HEADER.RCH_AUTO_KEY = RC_DETAIL.RCH_AUTO_KEY
LEFT JOIN WAREHOUSE wh3 ON wh3.WHS_AUTO_KEY = COALESCE(RC_DETAIL.WHS_AUTO_KEY, stock3.WHS_ORIGINAL, stock3.WHS_AUTO_KEY)
WHERE STOCK_RESERVATIONS.STR_AUTO_KEY = (SELECT MAX(STR_AUTO_KEY) FROM STOCK_RESERVATIONS WHERE STM_AUTO_KEY = stock3.STM_AUTO_KEY)
OR STOCK_RESERVATIONS.STR_AUTO_KEY IS NULL
Removing LEFT JOIN WAREHOUSE wh3 gives me about unique EXC_AUTO_KEY values with a lot of NULL STR_AUTO_KEY, while leaving this row removes all NULL STR_AUTO_KEY.
I recreated simple tables with numbers with the same structure and query works without any problems o.0
I have a feeling COALESCE is acting as a REQUIRED flag for the joined table, hence shooting the LEFT JOIN to become an INNER JOIN.
Try this:
SELECT COUNT(*)
FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = NVL(info_table.fk_key_table, NVL(front_table.fk_key_table_1, front_table.fk_key_table_2))
LEFT JOIN (SELECT fk_, MAX(pk) as pk FROM side_table GROUP BY fk_) st ON st.fk_ = front_table.pk
NVL might behave just the same though...
I undertood what was the problem (not entirely though): there is a LEFT JOIN VIEW_WO_SUB in original query, 3rd line. It causes this query to act in a weird way.
When I replaced the view with the other table which contained the information I needed, the query started returning right results.
Basically, with this view join, NVL, COALESCE or CASE join with combination of certain arguments did not work along with OR clause in WHERE subquery, all rest was fine. ALthough, I could get the query to work with this view join, by changing the order of joined tables, I had to place table participating in where subquery to the bottom.

Recursive query with outer joins?

I'm attempting the following query,
DECLARE #EntityType varchar(25)
SET #EntityType = 'Accessory';
WITH Entities (
E_ID, E_Type,
P_ID, P_Name, P_DataType, P_Required, P_OnlyOne,
PV_ID, PV_Value, PV_EntityID, PV_ValueEntityID,
PV_UnitValueID, PV_UnitID, PV_UnitName, PV_UnitDesc, PV_MeasureID, PV_MeasureName, PV_UnitValue,
PV_SelectionID, PV_DropDownID, PV_DropDownName, PV_DropDownOptionID, PV_DropDownOptionName, PV_DropDownOptionDesc,
RecursiveLevel
)
AS
(
-- Original Query
SELECT dbo.Entity.ID AS E_ID, dbo.EntityType.Name AS E_Type,
dbo.Property.ID AS P_ID, dbo.Property.Name AS P_Name, DataType.Name AS P_DataType, Required AS P_Required, OnlyOne AS P_OnlyOne,
dbo.PropertyValue.ID AS PV_ID, dbo.PropertyValue.Value AS PV_Value, dbo.PropertyValue.EntityID AS PV_EntityID, dbo.PropertyValue.ValueEntityID AS PV_ValueEntityID,
dbo.UnitValue.ID AS PV_UnitValueID, dbo.UnitOfMeasure.ID AS PV_UnitID, dbo.UnitOfMeasure.Name AS PV_UnitName, dbo.UnitOfMeasure.Description AS PV_UnitDesc, dbo.Measure.ID AS PV_MeasureID, dbo.Measure.Name AS PV_MeasureName, dbo.UnitValue.UnitValue AS PV_UnitValue,
dbo.DropDownSelection.ID AS PV_SelectionID, dbo.DropDown.ID AS PV_DropDownID, dbo.DropDown.Name AS PV_DropDownName, dbo.DropDownOption.ID AS PV_DropDownOptionID, dbo.DropDownOption.Name AS PV_DropDownOptionName, dbo.DropDownOption.Description AS PV_DropDownOptionDesc,
0 AS RecursiveLevel
FROM dbo.Entity
INNER JOIN dbo.EntityType ON dbo.EntityType.ID = dbo.Entity.TypeID
INNER JOIN dbo.Property ON dbo.Property.EntityTypeID = dbo.Entity.TypeID
INNER JOIN dbo.PropertyValue ON dbo.Property.ID = dbo.PropertyValue.PropertyID AND dbo.PropertyValue.EntityID = dbo.Entity.ID
INNER JOIN dbo.DataType ON dbo.DataType.ID = dbo.Property.DataTypeID
LEFT JOIN dbo.UnitValue ON dbo.UnitValue.ID = dbo.PropertyValue.UnitValueID
LEFT JOIN dbo.UnitOfMeasure ON dbo.UnitOfMeasure.ID = dbo.UnitValue.UnitOfMeasureID
LEFT JOIN dbo.Measure ON dbo.Measure.ID = dbo.UnitOfMeasure.MeasureID
LEFT JOIN dbo.DropDownSelection ON dbo.DropDownSelection.ID = dbo.PropertyValue.DropDownSelectedID
LEFT JOIN dbo.DropDownOption ON dbo.DropDownOption.ID = dbo.DropDownSelection.SelectedOptionID
LEFT JOIN dbo.DropDown ON dbo.DropDown.ID = dbo.DropDownSelection.DropDownID
WHERE dbo.EntityType.Name = #EntityType
UNION ALL
-- Recursive Query?
SELECT E2.E_ID AS E_ID, dbo.EntityType.Name AS E_Type,
dbo.Property.ID AS P_ID, dbo.Property.Name AS P_Name, DataType.Name AS P_DataType, Required AS P_Required, OnlyOne AS P_OnlyOne,
dbo.PropertyValue.ID AS PV_ID, dbo.PropertyValue.Value AS PV_Value, dbo.PropertyValue.EntityID AS PV_EntityID, dbo.PropertyValue.ValueEntityID AS PV_ValueEntityID,
dbo.UnitValue.ID AS PV_UnitValueID, dbo.UnitOfMeasure.ID AS PV_UnitID, dbo.UnitOfMeasure.Name AS PV_UnitName, dbo.UnitOfMeasure.Description AS PV_UnitDesc, dbo.Measure.ID AS PV_MeasureID, dbo.Measure.Name AS PV_MeasureName, dbo.UnitValue.UnitValue AS PV_UnitValue,
dbo.DropDownSelection.ID AS PV_SelectionID, dbo.DropDown.ID AS PV_DropDownID, dbo.DropDown.Name AS PV_DropDownName, dbo.DropDownOption.ID AS PV_DropDownOptionID, dbo.DropDownOption.Name AS PV_DropDownOptionName, dbo.DropDownOption.Description AS PV_DropDownOptionDesc,
(RecursiveLevel + 1)
FROM Entities AS E2
INNER JOIN dbo.Entity ON dbo.Entity.ID = E2.PV_ValueEntityID
INNER JOIN dbo.EntityType ON dbo.EntityType.ID = dbo.Entity.TypeID
INNER JOIN dbo.Property ON dbo.Property.EntityTypeID = dbo.Entity.TypeID
INNER JOIN dbo.PropertyValue ON dbo.Property.ID = dbo.PropertyValue.PropertyID AND dbo.PropertyValue.EntityID = E2.E_ID
INNER JOIN dbo.DataType ON dbo.DataType.ID = dbo.Property.DataTypeID
INNER JOIN dbo.UnitValue ON dbo.UnitValue.ID = dbo.PropertyValue.UnitValueID
INNER JOIN dbo.UnitOfMeasure ON dbo.UnitOfMeasure.ID = dbo.UnitValue.UnitOfMeasureID
INNER JOIN dbo.Measure ON dbo.Measure.ID = dbo.UnitOfMeasure.MeasureID
INNER JOIN dbo.DropDownSelection ON dbo.DropDownSelection.ID = dbo.PropertyValue.DropDownSelectedID
INNER JOIN dbo.DropDownOption ON dbo.DropDownOption.ID = dbo.DropDownSelection.SelectedOptionID
INNER JOIN dbo.DropDown ON dbo.DropDown.ID = dbo.DropDownSelection.DropDownID
)
SELECT E_ID, E_Type,
P_ID, P_Name, P_DataType, P_Required, P_OnlyOne,
PV_ID, PV_Value, PV_EntityID, PV_ValueEntityID,
PV_UnitValueID, PV_UnitID, PV_UnitName, PV_UnitDesc, PV_MeasureID, PV_MeasureName, PV_UnitValue,
PV_SelectionID, PV_DropDownID, PV_DropDownName, PV_DropDownOptionID, PV_DropDownOptionName, PV_DropDownOptionDesc,
RecursiveLevel
FROM Entities
INNER JOIN [dbo].[Entity] AS dE
ON dE.ID = PV_EntityID
The problem is the second query, the "recursive one" is getting the data I expect since I can't do the LEFT JOINs like in the first query. (At least to my understanding).
If I remove the fetching of the data that requires the LEFT (Outer) JOINs then the recursion works perfectly. My problem is I need both. Is there a way I can accomplish this?
Per http://msdn.microsoft.com/en-us/library/ms175972.aspx you can not have a left/right/outer join in a recursive CTE.
For a recursive CTE you can't use a subquery either so I sugest following this example.
They use two CTE's. The first is not recursive and does the left join to get the data it needs. The second CTE is recursive and inner joins on the first CTE. Since CTE1 is not recursive it can left join and supply default values for the missing rows and is guarenteed to work in the inner join.
However, you can also duplicate a left join with a union and subselect though it isn't really useful normally but it is interesting.
In that case, you would keep your first statement how it is. It will match all rows that join successfully.
Then UNION that query with another query that removes the join, but has a
NOT EXISTS(SELECT 1 FROM MISSING_ROWS_TABLE WHERE MAIN_TABLE.JOIN_CONDITION = MISSING_ROWS_TABLE.JOIN_CONDITION)
This gets all the rows that failed the previous join condition in query 1. You can replace the colmuns you would get from MISSING_ROWS_TABLE with NULL. I had to do this once using a coding framework that didn't support outer joins. Since recursive CTE's don't allow subqueries you have to use the first solution.

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.