How can I remove the subquery from the select statement? - sql

I need help in removing the subquery out of the original SELECT statement. Is this even possible? I'm needing this to ultimately move queries like this to Denodo/VQL, which doesn't allow subqueries in SELECT statements (but does allow CTE/WITH and subqueries in FROM/WHERE).
select case when material in (
select material
from schema.material_table
where old_material like '%55AD%'
) then 'Found'
else 'Not Found'
end
from schema.material_table;

I can see a couple of options. The most direct translation seems to be:
SELECT CASE
WHEN m2.MATERIAL IS NOT NULL THEN 'Found'
ELSE 'Not Found'
END AS IZZIT_THERE
FROM SCHEMA.MATERIAL_TABLE m2
RIGHT OUTER JOIN SCHEMA.MATERIAL_TABLE m1
ON m1.MATERIAL = m2.MATERIAL
WHERE m1.OLD_MATERIAL LIKE '%55AD%'
but the use of a RIGHT OUTER JOIN may be unfamiliar. To switch to the more familiar LEFT OUTER JOIN we need to reverse the position of the tables in the query and alter how the conditions are presented:
SELECT CASE
WHEN m1.MATERIAL IS NOT NULL THEN 'Found'
ELSE 'Not Found'
END AS IZZIT_THERE
FROM SCHEMA.MATERIAL_TABLE m1
LEFT OUTER JOIN SCHEMA.MATERIAL_TABLE m2
ON m2.MATERIAL = m1.MATERIAL
WHERE m1.OLD_MATERIAL LIKE '%55AD%'
I kept the aliases the same so you can see how they moved around in the query. In both queries m1 is the primary table, that is, it's the one that must provide data, while m2 is the secondary or "optional" table - it may or may not have data which matches the primary.
Personally, I prefer joins over subqueries as I find them easier to understand, but YMMV.

Related

Difference between SQL statements

I have come across two versions of an SQLRPGLE program and saw a change in the code as below:
Before:
Exec Sql SELECT 'N'
INTO :APRFLG
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y';
After:
Exec Sql SELECT case
when T1.ISAPRV <> 'Y' then 'N'
else T1.ISAPRV
end as APRFLG
INTO :APRFLG
FROM LG751F T1
join LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
group by T1.ISAPRV;
Could you please tell me if you see any difference in how the codes would work differently? The second SQL has a group by which is supposed to be a fix to avoid -811 SQLCod error. Apart from this, do you guys spot any difference?
They are both examples of poor coding IMO.
The requirement to "remove duplicates" is often an indication of a bad statement design and/or a bad DB design.
You appear to be doing an existence check, in which case you should be making use of the EXISTS predicate.
select 'N' into :APRFLG
from sysibm.sysdummy1
where exists (select 1
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN
AND T1.ISITNO = T2.IDMDNO
WHERE
T2.IDVIN = :M_VIN
AND T1.ISAPRV <> 'Y');
As far as the original two statements, besides the group by, the only real difference is moving columns from the JOIN clause to the WHERE clause. However, the query engine in Db2 for i will rewrite both statements equivalently and come up with the same plan; since an inner join is used.
EDIT : as Mark points out, there JOIN and WHERE are the same in both the OP's statements. But I'll leave the statement above in as an FYI.
I don't find a compelling difference, other that the addition of the group by, that will have the effect of suppressing any duplicate rows that might have been output.
It looks like the developer intended for the query to be able to vary its output to be sometimes Y and sometimes N, but forgot to remove the WHERE clause that necessarily forces the case to always be true, and hence it to always output N. This kind of pattern is usually seen when the original report includes some spec like "don't include managers in the employee Sakarya report" and that then changes to "actually we want to know if the employee is a manager or not". What was a "where employee not equal manager" becomes a "case when employee is manager then... else.." but the where clause needs removing for it to have any effect
The inner keyword has disappeared from the join statement, but overall this should also be a non-op
Another option is just to use fetch first row only like this:
Exec Sql
SELECT 'N'
INTO :APRFLG
FROM LG751F T1
JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
fetch first row only;
That makes it more obvious that you only want a single row rather than trying to use grouping which necessitates the funky do nothing CASE statement. But I do like the EXISTS method Charles provided since that is the real goal, and having exists in there makes it crystal clear.
If your lead insists on GROUP BY, you can also GROUP BY 'N' and still leave out the CASE statement.

SQL Server Query - Weird Behaviour

Consider below SQL.
SELECT DISTINCT bvc_Order.ID,
bvc_OrderItem.ProductID,
bvc_OrderItem_BundleItem.ProductID
FROM dbo.bvc_OrderItem WITH (nolock)
RIGHT OUTER JOIN dbo.bvc_Order WITH (nolock)
LEFT OUTER JOIN dbo.bvc_User WITH (nolock) ON dbo.bvc_Order.UserID = dbo.bvc_User.ID
LEFT OUTER JOIN dbo.Amazon_Merchants WITH (nolock) ON dbo.bvc_Order.CompanyID = dbo.Amazon_Merchants.ID ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem WITH (nolock) ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemID
LEFT OUTER JOIN dbo.bvc_Product WITH (nolock) ON dbo.bvc_OrderItem.ProductID = dbo.bvc_Product.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1
AND bvc_Order.StatusCode <> 999)
AND ( bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00'))
AND bvc_Order.OrderSource = 56;
The query when I execute against my database, it returns 85 rows. Well, that is not correct.
If I just remove the part "AND bvc_Order.OrderSource = 56" it returns back 5 rows which is really correct.
Strange.....
Another thing, if I remove the part
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
it will also return the 5 rows as expected even with bvc_Order.OrderSource filter.
I am not sure why it is adding more rows while I am trying to reduce rows by using filters.
the table bvc_OrderItem_BundleItem doesn't contain any rows for the result order ids or OrderItemIDs
[edit]
Thanks guys, I tried to remove the LEFT/RIGHT Join Mix but Query manager doesn't allows only LEFT, it does add at least one RIGHT join. I updated the SQL to remove extra tables and now we have only three. But same result
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
RIGHT OUTER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
[edit]So far, there is no solution for this. I previously pasted a link in my comment with example data outout for both valid/invalid results with queries. here it is again.
http://sameers.me/SQLIssue.xlsx
One thing to remember here is that ALL left join is not possible. Let me explain further
bvc_Order contains main order record
bvc_ORderItem contains Order Items/Products
bvc_ORderItem_BundleItem contains child products of the product which are available in bvC_OrderItem table.
Now NOT Every product has child products, so bvc_OrderItem_BundleItem may not have any record (and in current scenario, there is really no valid row for the orders in bvC_OrderItem_BundleItem).
In short, in current scenario, there is NO matching row available in bvc_OrderItem_BundleItem table. If I remove that join for now, it is all okay, but in real world, I can't remove that BundleItem table join ofcourse.
thank you
When you say
WHERE bvc_Order.OrderSource = 56
that evaluates to false when bvc_Order.OrderSource is NULL. If the LEFT/RIGHT join failed then it will be NULL. This effectively turns the LEFT/RIGHT join into an inner join.
You probably should write the predicate into the ON clause. An alternative approach, which might not deliver the same results, is:
WHERE (bvc_Order.OrderSource IS NULL OR bvc_Order.OrderSource = 56)
The other predicates have the same problem:
Another thing, if I remove the part OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00') it will also return the 5 rows as expected
When the join fails bvc_OrderItem_BundleItem.ProductID is NULL.
I also would recommend writing queries manually. If I understand you right this query comes from a designer. It's structure is quite confusing. I'm pulling up the most important comment:
Mixing left and right outer joins in a query is just confusing. You should start by rewriting the from clause to only use one type (and I strongly recommend left outer join). – Gordon Linoff
When you have eliminated the impossible, whatever remains, however
improbable, must be the truth? S.H.
It is impossible that an extra AND condition appended to a WHERE clause can ever result in extra rows. That would imply a database engine defect, which I hope I can assume is "impossible". (If not, then I guess it's back to square one).
That fact makes it easier to concentrate on possible reasons:
When you comment out
AND bvc_Order.OrderSource = 56;
then you also comment out the semicolon terminator. Is it possible
that there is text following this query that is affecting it? Try
putting a semicolon at the end of the previous line to make sure.
Depending on the tool you are using to run queries, sometimes when a
query fails to execute, the tool mistakenly shows an old result set.
Make sure your query is executing correctly by adding a dummy column
to the SELECT statement to absolutely prove you are seeing live
results. Which tool are you using?
when you use LEFT outer join it will give all the rows from left table (dbo.bvc_OrderItem) once the your and, or conditions satisfies,
the same thing happens with Right outer join too,
Those conditions (Left join, right join ) may not restrict the rows since rows from one table can be all, another table with some rows only.
check with your join condition
Then check you condition :
(bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
if any rows satisfying this condition
next check with another condition
[bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')]
Then bvc_Order.OrderSource = 56
compare the result of three queries and check the data in with the conditions and then write your complete query, so that you will understand where the mistake you have done.
Few points to remember
1.And is applied during Virtual join phases
2.Where clause is applied after the final result
3.Left join followed by right join is effectively an inner join in some cases
Lets break your query step by step..
dbo.bvc_OrderItem a1
LEFT OUTER JOIN
dbo.bvc_OrderItem_BundleItem b1
Above output will be a single virtual table (logically) which contains all rows from b1 with matching rows from a1
now below predicates from your and clause will be applied
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
which effectively eliminates all rows from bvc_OrderItem_BundleItem even if they have matches and gives result some thing like below if bvc_OrderItem_BundleItem.ProductID IN ('28046_00') is true
bvc_OrderItem bvc_OrderItem_BundleItem
28046 28046
null 1
null 2
null 3
if this condition(bvc_OrderItem.ProductID IN ('28046_00')) is true,then you are asking sql to ignore all rows in bvc_OrderItem ,which effectively means the same result set as above
bvc_OrderItem bvc_OrderItem_BundleItem other columns
28046 28046
null 1
null 2
null 3
next you are doing right outer join with dbo.bvc_Order which may qualifies for the join point I mentioned above
Assume ,you got below result set as output which preserves all of bvc_order table(rough output only for understanding due to lack of actual data)
bvc_OrderItem bvc_OrderItem_BundleItem statuscode ordersource
28046 28046 999 56
null 1 1 57
null 2 100 58
null 3 11 59
Next below AND predicates will be applied
status code <>1 and statuscode<> 999
which means ignore rows which match with bvc_order and has status of 1 ,999 even if they found matching rows
Next you are asking bvc_Order.OrderSource = 56; which means I don't care about other rows,preserve matching rows only for 56 and keep the rest as null
Hope this clarifies on what is happening step by step.A more better way can be provide some test data and show the expected output.
you also can control physical order of joins,you can try below to see if this is what you are trying to do..
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
dbo.bvc_OrderItem
LEFT OUTER JOIN
(
dbo.bvc_OrderItem_BundleItem
RIGHT OUTER JOIN
dbo.bvc_Order
ON dbo.bvc_OrderItem.OrderID = dbo.bvc_OrderItem_BundleItem.OrderItemId
)c
on
dbo.bvc_OrderItem.ID = c.bvc_OrderItem_BundleItem.OrderItemId
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
It looks like you are using the Query Designer. I would avoid using this as this can make your queries extremely confusing. Your queries will be much more concise if you are designing them by hand. If you don't completely understand how inner/outer joins work, a great textbook that I used to teach myself SQL is Murach's SQL Server for Developers.
https://www.murach.com/shop/murach-s-sql-server-2012-for-developers-detail
Now, onto the answer.
I've been thinking about how to resolve your problem, and if you are trying to reduce the result set to 5 rows, why are you using multiple outer joins in the first place? I would consider switching the joins to inner joins instead of outer joins if you are looking for a very specific result set. I can't really provide you with a really comprehensive answer without looking at exactly what results you are trying to achieve, but here's a general idea based on what you've provided to all of us:
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS 'bvc_OrderItem_BundleItem_ProductID'
FROM dbo.bvc_OrderItem
INNER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
INNER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
Start here and then based upon what you are searching for, add where clauses to filter criteria.
Also, your where clause must be rewritten if you use an inner join instead of an outer join:
WHERE 1=1 --not really sure why this is here. This will always be true. Omit this statement to avoid a bad result set.
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999) --this is saying, if the StatusCode is not equal to 1 and not equal to 999, don't include it.
--Revised: Look for Status codes with 1 or 999
--bvc_Order.StatusCode = 1 OR bvc_Order.StatusCode = 999
AND (bvc_OrderItem.ProductID IN ('28046_00') --I would eliminate this unless you are looking to see if this exists in Product ID. You could also accomplish this if you are trying to see if this value is in both tables, change this to:
bvc_OrderItem.ProductID = '28046_00' AND bvc_OrderItem_BundleItem.ProductID = '28046_00')
--if you are trying to see if the order source is 56, use this.
AND bvc_Order.OrderSource = 56;
If you are trying to find out rows that are not included in this result set, then I would use OUTER JOIN as necessary (LEFT preferred). Without more information about what you're looking for in your database, that's the best all of us can do.
bLike #usr writ, the reason of this unexpected (for you) result is, you build query with outer joins, and filter rows after join. If you need filter rows of outer joined tables, you should do this before join.
but probably you try build this:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
WHERE 1=1
AND o.OrderSource = 56;
AND o.StatusCode not in (1, 999)
AND '28046_00' in (oi.ProductID, isnull(bi.ProductID,'_') )
Is this query give results what you need?
if not, try change last condition, for example:
and (bi.ProductID = '28046_00' or bi.ProductID is null and oi.ProductID = '28046_00')
you can also put additional condition in to join conditions, for example:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
and bi.ProductID in ('28046_00') --this join BundleItem only if ...
WHERE 1=1
AND o.OrderSource = 56;
AND o.StatusCode not in (1, 999)
AND (oi.ProductID in ('28046_00') or bi.ProductID is not null)
ah, and if you always need join bvc_Order with bvc_OrderItem then use inner join

INNER JOIN should happen only when a condition is fulfilled

I am trying to "short-circuit" an INNER JOIN, if condition isn't met.
What I tried:
I found that if a Left Join is preceded with a False clause on "ON" condition, the LEFT JOIN fails. Hence, I tried to simulate INNER JOIN with LEFT OUTER JOIN and WHERE clause got the execution plan as below:
DECLARE #a nvarchar(4) = 'All'
SELECT A.*
FROM [dbo].[your_table] A
LEFT JOIN [dbo].[your_table_2] B
ON #a <> 'All'
WHERE A.City_Code = CASE WHEN #a <> 'All'
THEN B.City_Code
ELSE A.City_Code END
This will "short-circuit" the left join and it will never occur. Execution plan is below:
But then when I tried to execute the same statement by declaring the variable as 'Al' and not 'All', I saw execution plan was still the same.
I am puzzled if Join happened in the initial step or not?
What I want:
I want to know whether the above approach is correct? Is it really short-circuiting the INNER JOIN?
I basically want the INNER JOIN to happen between the two tables only when variable is not 'All' else it should not JOIN and continue further. I have already tried by using "OR" (to short-circuit) and "IN" (to apply filter) but the performance slows down if you have too many items in IN clause.
Please help me and do tell me if I was wrong anywhere in my approach.
Sample Data:
I should get the INNER JOIN result only when variable <> 'all'
When variable = 'All', I should get table A i.e.
Note: I have simplified this query hence it appears that simple if statement can do. In actual I have 53 parameters that I need to check and run JOINS. Plus result set of one query must be joined with another i.e. I have several other JOIN conditions preceding this :)
Have you tried something like
SELECT A.*
FROM [dbo].[your_table] A
WHERE EXISTS (
SELECT 1
FROM [dbo].[your_table_2] B
WHERE A.City_Code = B.City_Code
)
OR #a = 'All'
Did you try case statement? The CASE or DECODE is usually like an IF-ELSE statement in the SQL.

I Need some sort of Conditional Join

Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
WHERE jt.someCol = conditions
OR #neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
(SELECT TOP(1) * FROM joinedTable jt
WHERE jt.someCol = someCol
AND #neededjoin is null)
(this is even more expensive than always left joining)
SELECT #sql = #sql + ' INNER JOIN joinedTable jt ' +
' ON jt.someCol = someCol ' +
' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regards to structuring my query. Given that I have three optional conditions, and I need the results from all of them my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense to do? since all of the code from the "SELECT from db" statement is the same? It appears that the union condition is more efficient, but my query is VERY long because of it....
Thanks!
LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND #neededjoin ...
...
OR
LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...
) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement it's evaluated... so don't have it in the SQL statement by using dynamic SQL or IF ELSE
the dynamic sql solution is usually the best for these situations, but if you really need to get away from that a series of if statments in a stroed porc will do the job. It's a pain and you have to write much more code but it will be faster than trying to make joins conditional in the statement itself.
I would go for a simple and straightforward approach like this:
DECLARE #ret TABLE(...) ;
IF <coondition one> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition two> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition three> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
SELECT DISTINCT ... FROM #ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we can not guess-timate performance. we must benchmark to determine it. Yet simpler code chunks are better for readability and maintainability.
Try this:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
AND jt.someCol = conditions
AND #neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
Update
If this doesn't give the performance I claimed, then perhaps that's because the last time I did this using joins to a table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
V.*,
LookedUpValue =
CASE M.TableCode
WHEN 'A' THEN A.Value
WHEN 'B' THEN B.Value
WHEN 'C' THEN C.Value
END
FROM
ValueMaster V
INNER JOIN JoinMap M ON V.Col1 = M.oOl1 AND V.Col2 = M.Col2
LEFT JOIN TableA A ON M.TableCode = 'A'
LEFT JOIN TableB B ON M.TableCode = 'B'
LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.
The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxilliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT RelevantColumn1, RelevantColumn2
FROM AuxilliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
Good luck...
Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num

if statements in sql query

Good morning all. I have an issue with a query. I want to select something in a query only if another field is somethingelse. The below query will better explain
select
Case isnull(rl.reason,'Not Found')
When 'D' then 'Discontinued'
When 'N' then 'Not Found'
When 'I' then 'Inactive'
When 'C' then 'No Cost'
When '' then 'Not Found'
End as Reason, ***If statement to select pv.descriptor only if reason is in ('D','I','C')***pv.descriptor
from table1 as rl
left join table2 as v on v.field= rl.field
***Here i want an if statment to run if reason is in ('D','I','C')***
left join table3 as pv on
Case rl.scantype
when 'S' then cast(ltrim(rtrim(pv.field#1)) as varchar)
when 'U' then cast(ltrim(rtrim(pv.field#2)) as varchar)
when 'V' then cast(ltrim(rtrim(pv.vfield#3)) as varchar)
end
= rl.scan and pv.vend_no = rl.vendnum
***'**If statement ends*****
left join storemain..prmastp as p on p.emuserid = rl.userid
where rl.scandate between GetDate() -7 and GetDate() order by rl.scandate desc
I want the if statement to select the descriptor only if the reason selected is a 'D','I',or'C'. If not I want a null value there because i will not do the join to get that variable unless the reason is a 'D','I','C'
BY the way, I can used a case statement where i used it in the middle of the left join. It works perfectly fine. That's not my issue.
If you want it in one query, you HAVE to do the join. Using left joins and case statement as you have, you can ensure pv.descriptor is shown as null if that is what you want in certain cases.
If you want control flow, you will need to use T-SQL
If performance is your concern, you shouldn't be joining on computed values. Rethink the database design. You likey want to create new columns for your join, and may want to create intermediary tables if you have many-to-many relationships.
I think that you want to only join to pv if v.reason is (D, I, C) - is that right? If that's your problem, just change your JOIN clause to:
LEFT JOIN table3 as pv ON
LTRIM(RTRIM(
CASE rl.scantype
WHEN 'S' THEN pv.field#1
WHEN 'U' THEN pv.field#2
WHEN 'V' THEN pv.field#3
END
)) = rl.scan
AND rl.vendnum = pv.vend_no
AND rl.reason IN ('D', 'I', 'C')
Of course, you also have "If statement to select pv.descriptor only if reason is in ('D','I','C') [as] pv.descriptor" in the SELECT clause. So, assuming you want that instead, try this:
SELECT
/* your other columns */
CASE
WHEN rl.reason IN ('D', 'I', 'C') THEN pv.descriptor
ELSE NULL --optional, since it'll default to NULL
END as descriptor
What motivates the combination of two queries? Joining could cause a radically different evaluation plan to be required... Selecting between two plans based on the values of variables (or worse - columns!) is not something the query optimizer does well.
You will be far better off if you write two queries and use a SQL IF statement to flow control to one of them.
Edit: What should the query return if there are two rows from rl, one with reason=D and one with reason=S?
I'd keep it simple so whoever supports your code in the future can figure out how to maintain it (when they add another reason code, for example).
It may not be as performant, but it's more maintainable.
Also, is there a way you can get those codes in a table and test for a marker, instead of having them hard-coded in sql?