Have merge Cartesian Join with high cost - sql

We were querying the DB to populate some logged tickets, however the query formed causing the above issue and is communicated by our performance team.
Here I am into Java programming and I don't have much idea on these joins. How can I the re-frame below piece of query to avoid the merge Cartesian Join with high cost?
FROM
SERVICE_REQ SR,
SR_COBRAND_DATA SR_COB_DATA,
REPOSITORY rep,
SR_ASSIGNEE_INFO ASSIGNEE_INFO
WHERE
SR.SR_COBRAND_ID=rep.COBRAND_ID
AND SR.SERVICE_REQ_ID=SR_COB_DATA.SERVICE_REQ_ID (+)
AND SR.SERVICE_REQ_ID = ASSIGNEE_INFO.SERVICE_REQ_ID (+)
AND SR.SR_COBRAND_ID = 99

Create a composite index on columns SR_COBRAND_ID and SERVICE_REQ_ID of table SERVICE_REQ
-- Create Index [indexname] on SERVICE_REQ (SR_COBRAND_ID , SERVICE_REQ_ID);

Just a suggestion: you should not use old implicit join syntax but join explicit join syntax:
SELECT *
FROM SERVICE_REQ SR
LEFT JOIN SR_COBRAND_DATA SR_COB_DATA ON SR.SERVICE_REQ_ID=SR_COB_DATA.SERVICE_REQ_ID
INNER JOIN REPOSITORY rep ON SR.SR_COBRAND_ID=rep.COBRAND_ID
LEFT JOIN SR_ASSIGNEE_INFO ASSIGNEE_INFO ON SR.SERVICE_REQ_ID = ASSIGNEE_INFO.SERVICE_REQ_ID
WHERE SR.SR_COBRAND_ID = 99
Anyway, based on this condition you have not a Cartesian product between the table but a left join for SERVICE_REQ with SR_COBRAND_DATA and SR_ASSIGNEE_INFO reduce by inner join with REPOSITORY.
Perhaps to explain you goal you should add proper sample data, the expected result, and your actual result.

Related

What is the best way to write inner join query in SQL?

First Way:
SELECT ST.PersonID, ST.CustomerID, ST.SaleTypeID, ST.PaymentGatewayID,
ST.CustomerMembershipID, ST.CustomerPaymentGatewayID, P.Currency
FROM ServiceTransaction ST
INNER JOIN dbo.Person P ON ST.PersonID = P.PersonID;
Second Way:
SELECT ST.PersonID, ST.CustomerID, ST.SaleTypeID, ST.PaymentGatewayID,
ST.CustomerMembershipID, ST.CustomerPaymentGatewayID, P.Currency
FROM ServiceTransaction ST
INNER JOIN dbo.Person P ON P.PersonID = ST.PersonID;
I want to optimize this query in SQL. Currently I am using second way and I need expert opinion, which one is best approach.
However, any one can add some new way to optimize this query as well.
As comments suggest, they are identical in many aspects. For a personal note tho if you keep the same format it gets easier to follow so I use the following schema to keep tabs on tables where I always use lower numbered table on left side
SELECT Column_list
FROM TABLE1
INNER JOIN TABLE2
INNER JOIN TABLE3
ON Table1.ColName = Table2.ColName
ON Table1.ColName = Table3.ColName | ON Table2.ColName = Table3.ColName
This way when you are working with multiple tables and joins it is easier to follow which table is joined where and which table was mentioned first in the query.

SQL: JOIN vs LEFT OUTER JOIN?

I have multiple SQL queries that look similar where one uses JOIN and another LEFT OUTER JOIN. I played around with SQL and found that it the same results are returned. The codebase uses JOIN and LEFT OUTER JOIN interchangeably. While LEFT JOIN seems to be interchangeable with LEFT OUTER JOIN, I cannot I cannot seem to find any information about only JOIN. Is this good practice?
Ex Query1 using JOIN
SQL
SELECT
id,
name
FROM
u_users customers
JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
Ex. Query2 using LEFT OUTER JOIN
SQL
SELECT
id,
name
FROM
u_users customers
LEFT OUTER JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
As previously noted above:
JOIN is synonym of INNER JOIN. It's definitively different from all
types of OUTER JOIN
So the question is "When should I use an outer join?"
Here's a good article, with several great diagrams:
https://www.sqlshack.com/sql-outer-join-overview-and-examples/
The short answer your your question is:
Prefer JOIN (aka "INNER JOIN") to link two related tables. In practice, you'll use INNER JOIN most of the time.
INNER JOIN is the intersection of the two tables. It's represented by the "green" section in the middle of the Venn diagram above.
Use an "Outer Join" when you want the left, right or both outer regions.
In your example, the result set happens to be the same: the two expressions happen to be equivalent.
ALSO: be sure to familiarize yourself with "Show Plan" (or equivalent) for your RDBMS: https://www.sqlshack.com/execution-plans-in-sql-server/
'Hope that helps...
First the theory:
A join is a subset of the left join (all other things equal). Under some circumstances they are identical
The difference is that the left join will include all the tuples in the left hand side relation (even if they don't match the join predicate), while the join will only include the tuples of the left hand side that match the predicate.
For instance assume we have to relations R and S.
Say we have to do R JOIN S (and R LEFT JOIN S) on some predicate p
J = R JOIN S on (p)
Now, identify the tuples of R that are not in J.
Finally, add those tuples to J (padding any attribute in J not in R with null)
This result is the left join:
R LEFT JOIN S (p)
So when all the tuples of the left hand side of the relation are in the JOIN, this result will be identical to the Left Join.
back to you problem:
Your JOIN is very likely to include all the tuples from Users. So the query is the same if you use JOIN or LEFT JOIN.
The two are exactly equivalent, because the WHERE clause turns the LEFT JOIN into an INNER JOIN.
When filtering on all but the first table in a LEFT JOIN, the condition should usually be in the ON clause. Presumably, you also have a valid join condition, connecting the two tables:
SELEC id, name
FROM u_users u LEFT JOIN
t_orders o
ON o.user_id = u.user_id AND o.status = 'PAYMENT PENDING';
This version differs from the INNER JOIN version, because this version returns all users even those with no pending payments.
Both are the same, there is no difference here.
You need to use the ON clause when using Join. It can match any data between two tables when you don't use the ON clause.
This can cause performance issue as well as map unwanted data.
If you want to see the differences you can use "execution plans".
for example, I used the Microsoft AdventureWorks database for the example.
LEFT OUTER JOIN :
LEFT JOIN :
If you use the ON clause as you wrote, there is a possibility of looping.
Example "execution plans" is below.
You can access the correct mapping and data using the ON clause complement.
select
id,
name
from
u_users customers
left outer join
t_orders orders on customers.id = orders.userid
where orders.status=='payment pending'

Condition on main table in inner join vs in where clause

Is there any reason on an INNER JOIN to have a condition on the main table vs in the WHERE clause?
Example in INNER JOIN:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND m.MainID = #MainId -- question here
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
dr.lastRecordID IS NOT NULL;
Query with condition in WHERE clause:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
m.MainID = #MainId -- question here
AND dr.lastRecordID IS NOT NULL;
Difference in other similar questions that are more general whereas this is specific to SQL Server.
In the scope of the question,
Is there any reason on an INNER JOIN to have a condition on the main
table vs in the WHERE clause?
This is a STYLE choice for the INNER JOIN.
From a pure style reflection point of view:
While there is no hard and fast rule for STYLE, it is generally observed that this is a less often used style choice. For example that might generally lead to more challenging maintenance such as if someone where to remove the INNER JOIN and all the subsequent ON clause conditions, it would effect the primary table result set, OR make the query more difficult to debug/understand when it is a very complex set of joins.
It might also be noted that this line might be placed on many INNER JOINS further adding to the confusion.

Joining Two Queries onto a Shared Table

I have been struggling with the concept of combining results of two queries together via a join on a common table and I was hoping I could gain some assistance. The following is a rough guide to the tables:
dbo.Asset (not returned in the SELECT statement, used for joins only)
- dbo.Asset.AssetID
- dbo.Asset.CatalogueID
dbo.WorkOrder
- WorkOrderID
- AssetID
- WorkOrderNumber
dbo.WorkOrderSpare
- WorkOrderID
- WorkOrderSpareID
- WorkOrderSpareDescription
- ActualQuantity
- CreatedDateTime
dbo.Catalogue
- CatalogueID (PK)
- CatalogueNumber
- CatalogueDescription
- CatalogueGroupID
dbo.CatalogueGroup
- CatalogueGroupID
- CatalogueGroupNumber
First Query:
SELECT CatalogueGroup.CatalogueGroupName,
Catalogue.CatalogueNumber,
Catalogue.CatalogueDescription,
Catalogue.CatalogueID,
Asset.IsActive
FROM CatalogueGroup
INNER JOIN Catalogue
ON CatalogueGroup.CatalogueGroupID = Catalogue.CatalogueGroupID
INNER JOIN Asset
ON Catalogue.CatalogueID = Asset.CatalogueID
Second Query:
SELECT WorkOrder.WorkOrderNumber,
WorkOrderSpare.WorkOrderSpareDescription,
WorkOrderSpare.ActualQuantity,
WorkOrderSpare.CreatedDateTime
FROM WorkOrder
INNER JOIN WorkOrderSpare
ON WorkOrder.WorkOrderID = WorkOrderSpare.WorkOrderID
Now I can do the above easily enough in Access (WYSIWYG) by joining Asset/Catalogue/CatalogueGroup together and then joining Asset.AssetID onto WorkOrder.AssetID. I can't seem to get anything similar to work via raw code, I think I have my logic correct for the joins (INNER JOIN on the results of both) but then I am new to this.
Any assistance would be greatly appreciated, any pointers on where I can read further into problems like this would be great.
EDIT: This is what I was trying to use to no avail, I should also mention I am trying to do this in ms-sql, not Access (trying to move away from drag and drop):
SELECT CatalogueGroup.CatalogueGroupName,
Catalogue.CatalogueNumber,
Catalogue.CatalogueDescription,
Catalogue.CatalogueID,
Asset.IsActive,
WorkOrderSpare.WorkOrderSpareDescription,
WorkOrderSpare.ActualQuantity,
WorkOrderSpare.CreatedDateTime,
WorkOrder.WorkOrderNumber
FROM (((CatalogueGroup
INNER JOIN Catalogue
ON CatalogueGroup.CatalogueGroupID = Catalogue.CatalogueGroupID)
INNER JOIN Asset ON Catalogue.CatalogueID = Asset.CatalogueID)
INNER JOIN WorkOrderSpare
ON WorkOrder.WorkOrderID = WorkOrderSpare.WorkOrderID)
INNER JOIN WorkOrder ON Asset.AssetID = WorkOrder.AssetID
Think I see the issue. Assuming that the joins themselves are correct (ie your columns do relate to each other), the order of your joins is a little off - when you join WorkOrder to WorkOrderSpare, neither of those two tables relate back to any table you've identified up until that point in the query. Think of it as joining two tables separately from the chain of joins you have going so far - it's almost like doing two separate join queries. If you switch the last two it should work, that way WorkOrder will join to Asset (which you've already defined) and then you can join WorkOrderSpare to WorkOrder. I've also taken the liberty of removing parentheses on the joins, that's an Access thing.
SELECT CatalogueGroup.CatalogueGroupName,
Catalogue.CatalogueNumber,
Catalogue.CatalogueDescription,
Catalogue.CatalogueID,
Asset.IsActive,
WorkOrderSpare.WorkOrderSpareDescription,
WorkOrderSpare.ActualQuantity,
WorkOrderSpare.CreatedDateTime,
WorkOrder.WorkOrderNumber
FROM CatalogueGroup
INNER JOIN Catalogue ON CatalogueGroup.CatalogueGroupID = Catalogue.CatalogueGroupID
INNER JOIN Asset ON Catalogue.CatalogueID = Asset.CatalogueID
INNER JOIN WorkOrder ON Asset.AssetID = WorkOrder.AssetID
INNER JOIN WorkOrderSpare ON WorkOrder.WorkOrderID = WorkOrderSpare.WorkOrderID
I think you were close. As a slightly different approach to joining, try this:
SELECT CatalogueGroup.CatalogueGroupName,
Catalogue.CatalogueNumber,
Catalogue.CatalogueDescription,
Catalogue.CatalogueID,
Asset.IsActive,
WorkOrderSpare.WorkOrderSpareDescription,
WorkOrderSpare.ActualQuantity,
WorkOrderSpare.CreatedDateTime,
WorkOrder.WorkOrderNumber
FROM
CatalogueGroup, Catalogue, Asset, WorkOrder, WorkOrderSpare
WHERE CatalogueGroup.CatalogueGroupID = Catalogue.CatalogueGroupID
and Catalogue.CatalogueID = Asset.CatalogueID
and Asset.AssetID = WorkOrder.AssetID
and WorkOrder.WorkOrderID = WorkOrderSpare.WorkOrderID
It looks like it should work, but not having data, hard to know if simple joins all the way through is what you want. (It's a matter of personal preference whether to use the and clauses rather than the inner join syntax. While on style preferences, I like table aliases if supported, so FROM CatalogueGroup cg for example, so that you can refer to cg.CatalogueGroupID etc, rather than writing out the full table name every time.)
Ok. so the error that you noted in a comment
The multi-part identifier "WorkOrder.WorkOrderID" could not be bound
is usually when you have an alias for a table and instead of using alias in JOIN you use the table name or when you are using the wrong column name or table name.
Ideally in SQL server your query should look like this
SELECT
cg.CatalogueGroupName,
c.CatalogueNumber,
c.CatalogueDescription,
c.CatalogueID,
A.IsActive,
Wo.WorkOrderNumber,
WoS.WorkOrderSpareDescription,
WoS.ActualQuantity,
WoS.CreatedDateTime
FROM CatalogueGroup cg
INNER JOIN Catalogue c ON cg.CatalogueGroupID = c.CatalogueGroupID
INNER JOIN Asset A ON c.CatalogueID = A.CatalogueID
INNER JOIN WorkOrder Wo ON Wo.AssetID= A.AssetID
INNER JOIN WorkOrderSpare WoS ON Wo.WorkOrderID = WoS.WorkOrderID

multiple sql joins not producing desired results

I'm new to sql and trying to tweak someone else's huge stored procedure to get a subset of the results. The code below is maybe 10% of the whole procedure. I added the lp.posting_date, last left join, and the where clause. Trying to get records where the posting date is between the start date and the end date. Am I doing this right? Apparently not because the results are unaffected by the change. UPDATE: I CHANGED THE LAST JOIN. The results are correct if there's only one area allocation term. If there is more than one area allocation term, the results are duplicated for each term.
SELECT Distinct
l.lease_id ,
l.property_id as property_id,
l.lease_number as LeaseNumber,
l.name as LeaseName,
lty.name as LeaseType,
lst.name as LeaseStatus,
l.possession_date as PossessionDate,
l.rent as RentCommencementDate,
l.store_open_date as StoreOpenDate,
msr.description as MeasureUnit,
l.comments as Comments ,
lat.start_date as atStartDate,
lat.end_date as atEndDate,
lat.rentable_area as Rentable,
lat.usable_area as Usable,
laat.start_date as aatStartDate,
laat.end_date as aatEndDate,
MK.Path as OrgPath,
CAST(laa.percentage as numeric(9,2)) as Percentage,
laa.rentable_area as aaRentable,
laa.usable_area as aaUsable,
laa.headcounts as Headcount,
laa.area_allocation_term_id,
lat.area_term_id,
laa.area_allocation_id,
lp.posting_date
INTO #LEASES FROM la_tbl_lease l
INNER JOIN #LEASEID on l.lease_id=#LEASEID.lease_id
INNER JOIN la_tbl_lease_term lt on lt.lease_id=l.lease_id and lt.IsDeleted=0
LEFT JOIN la_tlu_lease_type lty on lty.lease_type_id=l.lease_type_id and lty.IsDeleted=0
LEFT JOIN la_tlu_lease_status lst on lst.status_id= l.status_id
LEFT JOIN la_tbl_area_group lag on lag.lease_id=l.lease_id
LEFT JOIN fnd_tlu_unit_measure msr on msr.unit_measure_key=lag.unit_measure_key
LEFT JOIN la_tbl_area_term lat on lat.lease_id=l.lease_id and lat.isDeleted=0
LEFT JOIN la_tbl_area_allocat_term laat on laat.area_term_id=lat.area_term_id and laat.isDeleted=0
LEFT JOIN dbo.la_tbl_area_allocation laa on laa.area_allocation_term_id=laat.area_allocation_term_id and laa.isDeleted=0
LEFT JOIN vw_FND_TLU_Menu_Key MK on menu_type_id_key=2 and isActive=1 and id=laa.menu_id_key
INNER JOIN la_tbl_lease_projection lp on lp.lease_projection_id = #LEASEID.lease_projection_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
As may have already been hinted at you should be careful when using the WHERE clause with an OUTER JOIN.
The idea of the OUTER JOIN is to optionally join that table and provide access to the columns.
The JOINS will generate your set and then the WHERE clause will run to restrict your set. If you are using a condition in the WHERE clause that says one of the columns in your outer joined table must exist / equal a value then by the nature of your query you are no longer doing a LEFT JOIN since you are only retrieving rows where that join occurs.
Shorten it and copy it out as a new query in ssms or whatever you are using for testing. Use an inner join unless you want to preserve the left side set even when there is no matching lp.lease_id. Try something like
if object_id('tempdb..#leases) is not null
drop table #leases;
select distinct
l.lease_id
,l.property_id as property_id
,lp.posting_date
into #leases
from la_tbl_lease as l
inner join la_tbl_lease_projection as lp on lp.lease_id = l.lease_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
select * from #leases
drop table #leases
If this gets what you want then you can work from there and add the other left joins to the query (getting rid of the select * and 'drop table' if you copy it back into your proc). If it doesn't then look at your Boolean date logic or provide more detail for us. If you are new to sql and its procedural extensions, try using the object explorer to examine the properties of the columns you are querying, and try selecting the top 1000 * from the tables you are using to get a feel for what the data looks like when building queries. -Mike
You can try the BETWEEN operator as well
Where lp.posting_date BETWEEN laat.start_date AND laat.end_date
Reasoning: You can have issues wheres there is no matching values in a table. In that instance on a left join the table will populate with null. Using the 'BETWEEN' operator insures that all returns have a value that is between the range and no nulls can slip in.
As it turns out, the problem was easier to solve and it was in a different place in the stored procedure. All I had to do was add one line to one of the cursors to include area term allocations by date.