Oracle Left outer joining same tables multiple times - sql

I have four tables named as ROLE,EMPLOYEE, HIERARCHY_UNIT, EMPLOYEE_CLASS all of them have a different nameid column which is a primary key to STRING_TABLE table.Also STRING_TABLE will have a column stringtext that stores the exact text or we can say name.
All of these tables are linked as all of them will have a empid column.
Now i want to select some information from EMPLOYEE table along with the names of corresponding role, hierarchy_unit,employeeclass.My select query willl be something like
SELECT EMPST.STRINGTEXT,EROLE.STRINGTEXT,EHIER.STRINGTEXT,EEMPCLASS.STRINGTEXT
FROM EMPLOYEE EMP
INNER JOIN ROLE RO ON ROLE.EMPID=EMP.EMPID
INNER JOIN EMPLOYEE_CLASS EC ON EC.EMPID = EMP.EMPID
INNER JOIN HIERARCHY_UNIT HU ON HU.EMPID=EMP.EMPID
LEFT OUTER JOIN STRING_TABLE EMPST ON EMPST.NAMEID=EMP.NAMEID
LEFT OUTER JOIN STRING_TABLE EROLE ON RO.NAMEID=EROLE.NAMEID
LEFT OUTER JOIN STRING_TABLE EHIER ON HU.NAMEID=EHIER.NAMEID
LEFT OUTER JOIN STRING_TABLE EEMPCLASS ON EEMPCLASS.NAMEID=EC.NAMEID
The above query is working fine but i have a question whether doing the join to same table will not cause any performance issue. In the above example i have taken left outer join 3 times ( in actual i have a case of 26 joins with the string table ).Is there any way to optimize the above select query n and not to take join with same table multiple times?

It is hard to say anything about performance of a query without it's execution plan. Guessing is the only thing possible at the moment.
So, assuming STRING_TABLE.NAMEID is worth indexing (there are many unique values) and you already have this column indexed, this query is fine.
In the select-part of the query you've specified 4 STRING_TABLE's columns. This means you're asking the database to find values for different NAMEIDs from 4 different lines in that table and post them in one line for each line in the result query.
What the database has to do is looking 4 times (or 26 in production) for the particular line with particular nameid for each line in the output. This is why you need to join STRINGS_TABLE 4 times and, again, this is fine as long as column is worth-indexing and is already indexed.
If there are many non-unique data in NAMEID column, you might need to use partitioning or even to change database structure in order to get better performance. But, again, query is fine and I don't see any way to make it better

Related

Newbie to SQL I have run the the inner join query but result comes up with columns only

I have run this query in adventureworks but the result is run successfully but i only get the columns instead of the data with columns how so?
select
a.BusinessEntityID,b.bonus,b.SalesLastYear
from
[Sales].[SalesPersonQuotaHistory] a
inner join
[Sales].[SalesPerson] b
on
a.SalesQuota = b.SalesQuota
My best guess is that instead of joining the tables on SalesQuota, you should be joining them on something else - An ID field, typically.
I don't have Adventureworks here, but judging from the names of the tables and the columns that you've provided, I would assume that there's a SalesPersonID field of some sort that actually connects a Salesperson's quota history to the Salesperson him/herself.
I would expect that you're looking for something closer to this:
SELECT
a.BusinessEntityID
,b.bonus
,b.SalesLastYear
FROM [Sales].[SalesPersonQuotaHistory] a
INNER JOIN [Sales].[SalesPerson] b
ON a.SalesPersonID = b.SalesPersonID
General Knowledge:
INNER JOIN means "Show me only entries (rows) that have a matching value on both sides of the condition." (i.e. The value in Table A matches the value in Table B).
So ON a.SalesQuota = b.SalesQuota means "Only where the value of SalesQuota in Table A matches the value of SalesQuota in Table B."
I'm not sure what the purpose of this query could be, since it is entirely possible that two salespeople have the same values in both tables, and then you would get duplicate rows (because the values of SalesQuota would match in both cases), or that the values wouldn't match at all, and then you wouldn't get any rows - I suspect that is what's happening to you.
Consider the conditions of what you're trying to join. Are you really trying to join quota amounts, or are you trying to retrieve quota information for specific salespeople? The answer should help guide your JOIN conditions.

SQL Server View tables not joining

I am incredibly new to SQL and am trying to create a view for a pizza store database. The sides ordered table and the sides names table have to be separate but need a view that combines them.
This is the code I have entered,
CREATE VIEW ordered_sides_view
AS
SELECT
ordered_side_id, side.side_id, side_name, number_ordered,
SUM(number_ordered * price) AS 'total_cost'
FROM
ordered_side
FULL JOIN
side ON ordered_side.side_id = side.side_id
GROUP BY
ordered_side_id, side.side_id, side_name, number_ordered;
The problem is that this is the resulting table.
Screenshot of view table:
How do I get the names to match the ordered sides?
You fail to understand what a FULL JOIN and an INNER JOIN operation does.
FULL JOIN returns at least every row from each table (plus any extra values from the ON clause).
INNER JOIN returns only matching row sets based on the ON clause.
OUTER JOIN returns every matching row set PLUS the side of the join that the OUTER JOIN is on (LEFT OUTER JOIN vs RIGHT OUTER JOIN).
In your picture, you can clearly see that there are no rows that match from the tables ordered_side and side...
That is why switching to an INNER JOIN returns zero rows...there are no matches on the COLUMNS YOU CHOSE TO USE.
Why in your SELECT operator do you have this:
SELECT ordered_side_id, side.side_id, side_name, number_ordered,
while your ON clause has this:
side ON ordered_side.side_id = side.side_id
ordered_side_id !=ordered_side.side_id
Investigate your columns and fix your JOIN clause to match the correct columns.
P.S. I like how you structure your queries. Very nice and what an
expert does! It makes reading MUCH, MUCH easier. :)
One suggestion I might add is structure your columns in the SELECT statement in its own row:
SELECT ordered_side_id
, side.side_id
, side_name
, number_ordered
, SUM(number_ordered * price) AS Total_Cost --or written [Total_Cost]/'Total_Cost'
FROM ordered_side
FULL JOIN side ON ordered_side.ordered_side_id = side.side_id
GROUP BY ordered_side_id
, side.side_id
, side_name
, number_ordered;

Is there any performance difference between using inner join vs left join? [duplicate]

I've created SQL command that uses INNER JOIN on 9 tables, anyway this command takes a very long time (more than five minutes). So my folk suggested me to change INNER JOIN to LEFT JOIN because the performance of LEFT JOIN is better, despite what I know. After I changed it, the speed of query got significantly improved.
I would like to know why LEFT JOIN is faster than INNER JOIN?
My SQL command look like below:
SELECT * FROM A INNER JOIN B ON ... INNER JOIN C ON ... INNER JOIN D and so on
Update:
This is brief of my schema.
FROM sidisaleshdrmly a -- NOT HAVE PK AND FK
INNER JOIN sidisalesdetmly b -- THIS TABLE ALSO HAVE NO PK AND FK
ON a.CompanyCd = b.CompanyCd
AND a.SPRNo = b.SPRNo
AND a.SuffixNo = b.SuffixNo
AND a.dnno = b.dnno
INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
ON a.CompanyCd = h.CompanyCd
AND a.sprno = h.AcctSPRNo
INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
ON c.CompanyCd = h.CompanyCd
AND c.FSlipNo = h.FSlipNo
AND c.FSlipSuffix = h.FSlipSuffix
INNER JOIN coMappingExpParty d -- NO PK AND FK
ON c.CompanyCd = d.CompanyCd
AND c.CountryCd = d.CountryCd
INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
ON b.CompanyCd = e.CompanyCd
AND b.ProductSalesCd = e.ProductSalesCd
LEFT JOIN coUOM i -- PK = UOMId
ON h.UOMId = i.UOMId
INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
ON a.CompanyCd = j.CompanyCd
AND b.BFStatus = j.BFStatus
AND b.ProductSalesCd = j.ProductSalesCd
INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
ON e.ProductGroup1Cd = g1.ProductGroup1Cd
INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
ON e.ProductGroup1Cd = g2.ProductGroup1Cd
A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.
(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)
Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.
Edit:
Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:
Some of the tables are very small (say, under 10 rows);
The tables do not have sufficient indexes to cover the query.
Consider this example:
CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')
CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')
SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name
SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name
DROP TABLE #Test1
DROP TABLE #Test2
If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.
You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.
But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.
So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.
There is one important scenario that can lead to an outer join being faster than an inner join that has not been discussed yet.
When using an outer join, the optimizer is always free to drop the outer joined table from the execution plan if the join columns are the PK of the outer table, and none of the outer table columns are referenced outside of the outer join itself. For example SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY=B.KEY and B.KEY is the PK for B. Both Oracle (I believe I was using release 10) and Sql Server (I used 2008 R2) prune table B from the execution plan.
The same is not necessarily true for an inner join: SELECT A.* FROM A INNER JOIN B ON A.KEY=B.KEY may or may not require B in the execution plan depending on what constraints exist.
If A.KEY is a nullable foreign key referencing B.KEY, then the optimizer cannot drop B from the plan because it must confirm that a B row exists for every A row.
If A.KEY is a mandatory foreign key referencing B.KEY, then the optimizer is free to drop B from the plan because the constraints guarantee the existence of the row. But just because the optimizer can drop the table from the plan, doesn't mean it will. SQL Server 2008 R2 does NOT drop B from the plan. Oracle 10 DOES drop B from the plan. It is easy to see how the outer join will out-perform the inner join on SQL Server in this case.
This is a trivial example, and not practical for a stand-alone query. Why join to a table if you don't need to?
But this could be a very important design consideration when designing views. Frequently a "do-everything" view is built that joins everything a user might need related to a central table. (Especially if there are naive users doing ad-hoc queries that do not understand the relational model) The view may include all the relevent columns from many tables. But the end users might only access columns from a subset of the tables within the view. If the tables are joined with outer joins, then the optimizer can (and does) drop the un-needed tables from the plan.
It is critical to make sure that the view using outer joins gives the correct results. As Aaronaught has said - you cannot blindly substitute OUTER JOIN for INNER JOIN and expect the same results. But there are times when it can be useful for performance reasons when using views.
One last note - I haven't tested the impact on performance in light of the above, but in theory it seems you should be able to safely replace an INNER JOIN with an OUTER JOIN if you also add the condition <FOREIGN_KEY> IS NOT NULL to the where clause.
If everything works as it should it shouldn't, BUT we all know everything doesn't work the way it should especially when it comes to the query optimizer, query plan caching and statistics.
First I would suggest rebuilding index and statistics, then clearing the query plan cache just to make sure that's not screwing things up. However I've experienced problems even when that's done.
I've experienced some cases where a left join has been faster than a inner join.
The underlying reason is this:
If you have two tables and you join on a column with an index (on both tables).
The inner join will produce the same result no matter if you loop over the entries in the index on table one and match with index on table two as if you would do the reverse: Loop over entries in the index on table two and match with index in table one.
The problem is when you have misleading statistics, the query optimizer will use the statistics of the index to find the table with least matching entries (based on your other criteria).
If you have two tables with 1 million in each, in table one you have 10 rows matching and in table two you have 100000 rows matching. The best way would be to do an index scan on table one and matching 10 times in table two. The reverse would be an index scan that loops over 100000 rows and tries to match 100000 times and only 10 succeed. So if the statistics isn't correct the optimizer might choose the wrong table and index to loop over.
If the optimizer chooses to optimize the left join in the order it is written it will perform better than the inner join.
BUT, the optimizer may also optimize a left join sub-optimally as a left semi join. To make it choose the one you want you can use the force order hint.
Try both queries (the one with inner and left join) with OPTION (FORCE ORDER) at the end and post the results. OPTION (FORCE ORDER) is a query hint that forces the optimizer to build the execution plan with the join order you provided in the query.
If INNER JOIN starts performing as fast as LEFT JOIN, it's because:
In a query composed entirely by INNER JOINs, the join order doesn't matter. This gives freedom for the query optimizer to order the joins as it sees fit, so the problem might rely on the optimizer.
With LEFT JOIN, that's not the case because changing the join order will alter the results of the query. This means the engine must follow the join order you provided on the query, which might be better than the optimized one.
Don't know if this answers your question but I was once in a project that featured highly complex queries making calculations, which completely messed up the optimizer. We had cases where a FORCE ORDER would reduce the execution time of a query from 5 minutes to 10 seconds.
Have done a number of comparisons between left outer and inner joins and have not been able to find a consisten difference. There are many variables. Am working on a reporting database with thousands of tables many with a large number of fields, many changes over time (vendor versions and local workflow) . It is not possible to create all of the combinations of covering indexes to meet the needs of such a wide variety of queries and handle historical data. Have seen inner queries kill server performance because two large (millions to tens of millions of rows) tables are inner joined both pulling a large number of fields and no covering index exists.
The biggest issue though, doesn't seem to appeaer in the discussions above. Maybe your database is well designed with triggers and well designed transaction processing to ensure good data. Mine frequently has NULL values where they aren't expected. Yes the table definitions could enforce no-Nulls but that isn't an option in my environment.
So the question is... do you design your query only for speed, a higher priority for transaction processing that runs the same code thousands of times a minute. Or do you go for accuracy that a left outer join will provide. Remember that inner joins must find matches on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens so nicely, no error messages.
You can be very fast as getting 90% of the needed data and not discover the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone making that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.
Outer joins can offer superior performance when used in views.
Say you have a query that involves a view, and that view is comprised of 10 tables joined together. Say your query only happens to use columns from 3 out of those 10 tables.
If those 10 tables had been inner-joined together, then the query optimizer would have to join them all even though your query itself doesn't need 7 out of 10 of the tables. That's because the inner joins themselves might filter down the data, making them essential to compute.
If those 10 tables had been outer-joined together instead, then the query optimizer would only actually join the ones that were necessary: 3 out of 10 of them in this case. That's because the joins themselves are no longer filtering the data, and thus unused joins can be skipped.
Source:
http://www.sqlservercentral.com/blogs/sql_coach/2010/07/29/poor-little-misunderstood-views/
Your performance problems are more likely to be because of the number of joins you are doing and whether the columns you are joining on have indexes or not.
Worst case you could easily be doing 9 whole table scans for each join.
I found something interesting in SQL server when checking if inner joins are faster than left joins.
If you dont include the items of the left joined table, in the select statement, the left join will be faster than the same query with inner join.
If you do include the left joined table in the select statement, the inner join with the same query was equal or faster than the left join.
From my comparisons, I find that they have the exact same execution plan. There're three scenarios:
If and when they return the same results, they have the same speed. However, we must keep in mind that they are not the same queries, and that LEFT JOIN will possibly return more results (when some ON conditions aren't met) --- this is why it's usually slower.
When the main table (first non-const one in the execution plan) has a restrictive condition (WHERE id = ?) and the corresponding ON condition is on a NULL value, the "right" table is not joined --- this is when LEFT JOIN is faster.
As discussed in Point 1, usually INNER JOIN is more restrictive and returns fewer results and is therefore faster.
Both use (the same) indices.

Inner join between two tables with same count values

I have been working on this issue since 2 days now.
I have two tables created by using SQL Select statements
SELECT (
) Target
INNER JOIN
SELECT (
) Source
ON Join condition 1
AND Join condition 2
AND Join condition 3
AND Join condition 4
AND Join condition 5
The target table has count value of 10,000 records.
The source table has count value of 10,000 records.
but when I do an inner join between the two tables on the 5 join conditions
I get 9573 records.
I am basically trying to find a one to one match between source and target table. I feel every field from target matches every field in source.
Questions:
Why does my inner join give less records even if there are same value of records in both tables?
If it is expected, how can I make sure I get the exact 10,000 records after the join condition?
1) An INNER JOIN only outputs the rows from the JOINING of two tables where their joining columns match. So in your case, Join Condition1 may not exist in rows in both tables and therefore some rows are filtering out.
2) As the other poster mentioned a left join is one way. You need to look which table source or target you want to use as your master i.e. start from and return all those rows. You then left join the remaining table based on your conditions to add all the columns where you join conditions match.
It's probably better if you give us the tables you are working on and the query\results you are trying to achieve.
There's some really good articles about the different joins out there. But it looks like you'd be interested in left joins. So if it exists in Target, but not in Source, it will not drop the record.
So, it would be:
SELECT(...) Target
LEFT OUTER JOIN
SELECT(...) Source
ON cond1 and cond2 and cond3 and cond4 and cond5
Give that a shot and let me know how it goes!
Sometime you need to rely on logical analysis rather than feelings. Use this query to find the fields that do not match and then work out your next steps
SELECT
Target.Col1,Source.Col1,
Target.Col2,Source.Col2,
Target.Col3,Source.Col3
FROM
(
) Target
FULL OUTER JOIN
(
) Source
ON Target.Col1=Source.Col1
AND Target.Col2=Source.Col2
AND Target.Col3=Source.Col3
WHERE (
Target.Col1 IS NULL
OR Source.Col1 IS NULL
OR Target.Col2 IS NULL
OR Source.Col2 IS NULL
OR Target.Col3 IS NULL
OR Source.Col3 IS NULL
)

Successive LEFT SQL joins back to original

I need to redo sql statement in legacy Foxpro application and don't understand whether it is meaningful at all. Syntax is a bit specific - it extracts data from temporary table into the same temporary table ( overwriting) with some joins.
SELECT aa.*,b.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
LEFT JOIN job2 ON jobs.job_no=job2.rucjob;
left join jobs b on b.job_no=job2.job_no;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())
Since only one field is added from joined tables ( spa_date ) is there any point in 2 left joins or I am missing something. Isn't it equivalent to
SELECT aa.*,jobs.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())
They are different because b.spa_date come from the second left join. You may be missing filtered rows without both left joins.
You would need to know the intent of the original query and perhaps rewrite it to make more sense but I'd say the two queries are different.