Difference between SQL statements - sql

I have come across two versions of an SQLRPGLE program and saw a change in the code as below:
Before:
Exec Sql SELECT 'N'
INTO :APRFLG
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y';
After:
Exec Sql SELECT case
when T1.ISAPRV <> 'Y' then 'N'
else T1.ISAPRV
end as APRFLG
INTO :APRFLG
FROM LG751F T1
join LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
group by T1.ISAPRV;
Could you please tell me whether you see any difference in how the two statements would behave? The second SQL has a GROUP BY, which is supposed to be a fix to avoid the SQLCODE -811 error. Apart from this, do you spot any difference?

They are both examples of poor coding IMO.
The requirement to "remove duplicates" is often an indication of a bad statement design and/or a bad DB design.
You appear to be doing an existence check, in which case you should be making use of the EXISTS predicate.
select 'N' into :APRFLG
from sysibm.sysdummy1
where exists (select 1
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN
AND T1.ISITNO = T2.IDMDNO
WHERE
T2.IDVIN = :M_VIN
AND T1.ISAPRV <> 'Y');
As far as the original two statements, besides the group by, the only real difference is moving columns from the JOIN clause to the WHERE clause. However, the query engine in Db2 for i will rewrite both statements equivalently and come up with the same plan; since an inner join is used.
EDIT: as Mark points out, the JOIN and WHERE clauses are the same in both of the OP's statements. But I'll leave the statement above in as an FYI.
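To illustrate why the EXISTS form sidesteps the "more than one row" problem, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Db2 for i, so sysibm.sysdummy1 is replaced by a FROM-less SELECT, which SQLite permits. The table and column names come from the question; the sample rows are invented.

```python
import sqlite3

# SQLite stands in for Db2 for i; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LG751F (ISBOLN INTEGER, ISITNO INTEGER, ISAPRV TEXT);
    CREATE TABLE LG752F (IDBOLN INTEGER, IDMDNO INTEGER, IDVIN TEXT);
    -- Two unapproved rows match the same VIN.
    INSERT INTO LG751F VALUES (1, 10, 'N'), (2, 20, ' ');
    INSERT INTO LG752F VALUES (1, 10, 'VIN1'), (2, 20, 'VIN1');
""")

join_sql = """
    SELECT 'N'
    FROM LG751F T1
    JOIN LG752F T2 ON T1.ISBOLN = T2.IDBOLN AND T1.ISITNO = T2.IDMDNO
    WHERE T2.IDVIN = ? AND T1.ISAPRV <> 'Y'
"""
# Wrapping the join in EXISTS yields at most one row.
exists_sql = "SELECT 'N' WHERE EXISTS (" + join_sql + ")"

join_rows = conn.execute(join_sql, ("VIN1",)).fetchall()
exists_rows = conn.execute(exists_sql, ("VIN1",)).fetchall()
no_match_rows = conn.execute(exists_sql, ("VIN9",)).fetchall()

print(len(join_rows), exists_rows, no_match_rows)
```

The plain join returns one row per match, which is what a single-row SELECT INTO cannot accept, while the EXISTS version returns one row or none.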

I don't find a compelling difference, other than the addition of the GROUP BY, which has the effect of suppressing any duplicate rows that might have been output.
It looks like the developer intended the query to vary its output, sometimes Y and sometimes N, but forgot to remove the WHERE clause, which forces the CASE condition to always be true and hence the output to always be N. This kind of pattern is usually seen when the original report includes a spec like "don't include managers in the employee salary report" and that later changes to "actually we want to know whether the employee is a manager or not". What was a "where employee not equal manager" becomes a "case when employee is manager then... else..", but the WHERE clause needs removing for it to have any effect.
The INNER keyword has disappeared from the join, but overall this should also be a no-op.
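A small sketch of the point about the WHERE clause, again using sqlite3 as a stand-in and a cut-down, hypothetical version of LG751F with invented rows. It also shows a side observation that is an assumption about the data rather than something stated in the question: if several distinct non-'Y' values exist, grouping by ISAPRV still yields more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LG751F (ISBOLN INTEGER, ISAPRV TEXT);
    INSERT INTO LG751F VALUES (1, 'Y'), (2, 'N'), (3, 'X'), (4, ' ');
""")

case_expr = "CASE WHEN ISAPRV <> 'Y' THEN 'N' ELSE ISAPRV END"

# With the WHERE clause in place, the ELSE branch can never be reached.
with_where = conn.execute(
    f"SELECT {case_expr} FROM LG751F WHERE ISAPRV <> 'Y' GROUP BY ISAPRV"
).fetchall()

# Without the WHERE clause, the CASE can produce both values.
without_where = sorted(
    conn.execute(f"SELECT DISTINCT {case_expr} FROM LG751F").fetchall()
)

print(with_where, without_where)
```

Every row from the filtered query is 'N', one per distinct ISAPRV value, so the CASE adds nothing while the WHERE clause stays.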

Another option is just to use fetch first row only like this:
Exec Sql
SELECT 'N'
INTO :APRFLG
FROM LG751F T1
JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
fetch first row only;
That makes it more obvious that you only want a single row rather than trying to use grouping which necessitates the funky do nothing CASE statement. But I do like the EXISTS method Charles provided since that is the real goal, and having exists in there makes it crystal clear.
If your lead insists on GROUP BY, you can also GROUP BY 'N' and still leave out the CASE statement.

Related

Why does Oracle SQL update query return "invalid identifier" on existing column?

I have an update query for an Oracle SQL db. Upon execution the query returns ORA-00904: "t1"."sv_id": invalid identifier
So, why do I get an "invalid identifier" error message although the column exists?
Here is the complete query (replaced actual table and column names by dummies in np++)
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
ORDER BY d.creationTimestamp ASC)
WHERE ROWNUM = 1)
END
Now I don't understand why that error occurs.
Here is what I already know:
The Queries in the CASE statement work when executed separately, provided they are wrapped into a query that provides table_1 t1 for sure.
t1.s_id seems to work, since Oracle doesn't complain about it. When I change it to a column that really doesn't exist, Oracle starts complaining about that non-existent column before returning anything about t1.sv_id. So the alias might work somehow, although I'm not sure about it.
I'm 100% sure that the column t1.sv_id exists and that no typo was made. I executed a query on t1 directly and double-checked everything in Notepad++ by marking all occurrences.
A (completely unrelated) update query like the following works as well (note the alias t1 is used in the select query). Don't assume table_1/2 to be the same as in the update query above; I just reused the names. This should just illustrate that I have successfully used an alias in an update query before.
update table_1 t1 set (t2_id) = (select id from table_2 t2 where t1.id = t2.t1_id)
UPDATE
Thx a lot for pointing me to the "you don't have access to aliases in deeper subquery layers" issue. That got me on track again pretty fast.
So here is the query I ended up with. This seems to work fine. It eliminates the access to t1 in the deeper layers and selects the oldest row, so it should return the same result I expected from the ELSE part of the original query.
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id
AND d.sv_id = t1.sv_id
AND d.creation = (SELECT MIN(id.creation) FROM table_2 id
WHERE d.s_id = id.s_id AND d.sv_id = id.sv_id))
END
You can't reference a table alias in a subquery of a subquery; the alias doesn't apply (or doesn't exist, or isn't in scope, depending on how you prefer to look at it). With the code you posted the error is reported against line 11 character 24, which is:
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
^^^^^^^^
If you change the t1.s_id reference on the same line to something invalid then the error doesn't change and is still reported as ORA-00904: "T1"."SV_ID": invalid identifier. But if you change the same reference on line 5 instead to something like
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_idXXX AND dateCheck.sv_id = t1.sv_id) = 0)
... then the error changes to ORA-00904: "T1"."S_IDXXX": invalid identifier. This is down to how the statement is being parsed. In your original version the subquery in the WHEN clause is valid, and you only break it by changing that identifier. The subquery in the ELSE is also OK. But the nested subquery in the ELSE has the problem, and changing the t1.s_id in that doesn't make any difference because the parser reads that part of the statement backwards (I don't know, or can't remember, why!).
So you have to eliminate the nested subquery. A general approach would be to make the whole CASE an inline view which you can then join using s_id and sv_id, but that's complicated as there may be no matching table_2 record (based on your count); and there may be no s_id value to match against as that isn't being checked in table_3.
It isn't clear if there will always be a table_3 record even when there is a table_2 record, or if they're mutually exclusive. If I've understood what the CASE is doing then I think you can use an outer join between those two tables and compare the combined data with the row you're updating, but because of that ambiguity it needs to be a full outer join. I think.
Here's a stab at using that construct with a MERGE instead of an update.
MERGE INTO table_1 t1
USING (
SELECT t2.s_id,
coalesce(t2.sv_id, t3.id) as sv_id,
coalesce(t2.type, t3.type) as type,
row_number() over (partition by t2.s_id, t2.sv_id
order by t2.creationtimestamp) as rn
FROM table_2 t2
FULL OUTER JOIN table_3 t3
ON t3.id = t2.sv_id
) tmp
ON ((tmp.s_id is null OR tmp.s_id = t1.s_id) AND tmp.sv_id = t1.sv_id AND tmp.rn = 1)
WHEN MATCHED THEN UPDATE SET t1.type = tmp.type;
If there will always be a table_3 record then you could use that as the driver and have a left outer join to table_2 instead, but hard to tell which might be appropriate. So this is really just a starting point.
SQL Fiddle with some made-up data that I believe would have hit both branches of your case. More realistic data would expose the flaws and misunderstandings, and suggest a more robust (or just more correct) approach...
Your query and your analysis seem sound to me. I have no solution, but here are a few things you can try that might trigger something that explains this odd behavior:
Quote the column (just in case it happens to be a SQL keyword).
Use table_1.sv_id - this works as long as the whole query contains this table only once.
Make sure that the alias t1 exists only once
Run the query with a query tool like SQuirrel SQL - the tool can examine the exact position where Oracle reports the problem. Maybe it's in a different place of the query than you think
Check () and make sure they are around the parts where they should be.
Swap the order of expressions around =

Use of 1=2 in a SQL query

Someone please explain the meaning of '1=2' in the below SQL query.
SELECT E.EmpID,
E.EmpName,
Country = CASE
WHEN T.Active = 'N'
AND 1 = 2 THEN 'Not Working Anymore'
ELSE C.Country_Name
END,
T.Contract_No
FROM Employees E (nolock)
INNER JOIN Contract T
ON T.Contract_No = E.Contract_No
LEFT JOIN Country C (nolock)
ON E.Country_ID = C.Country_ID
thanks
EDIT: Corrected a slight mistake in the example SQL query I gave.
@ALL: The query mentioned here is a simplified version of a big working query on which I have to resolve something. I created a sample scenario to keep the question simple.
There is a good use for this 1=2 part of the WHERE clause if you are creating a table from another, but you don't want to copy any rows. For example:
CREATE TABLE ABC_TEMP AS
SELECT * FROM ABC WHERE 1=2;
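A quick way to see the effect, sketched with Python's sqlite3 module (SQLite also supports CREATE TABLE ... AS SELECT). The ABC table's columns and rows here are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ABC (id INTEGER, name TEXT);
    INSERT INTO ABC VALUES (1, 'a'), (2, 'b');
    -- 1=2 is never true, so only the column layout is copied.
    CREATE TABLE ABC_TEMP AS SELECT * FROM ABC WHERE 1=2;
""")

# PRAGMA table_info returns one row per column; index 1 is the column name.
temp_columns = [row[1] for row in conn.execute("PRAGMA table_info(ABC_TEMP)")]
temp_count = conn.execute("SELECT COUNT(*) FROM ABC_TEMP").fetchone()[0]

print(temp_columns, temp_count)
```

The new table has the same column names as the source and no rows.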
when T.Active = 'N' and 1=2 then 'Not Working Anymore'
Simple, the above condition will never become true.
So the result will always be C.Country_Name
It is a common trick used in dynamic construction of SQL filter clauses. This allows the automated construction of "T.Active = 'N' and" with no check needed for a following clause, because "1=2" will always be appended.
Update:
Whether 1=1 or 1=2 is used depends on whether conjunctive or disjunctive normal form is supposed to be used in building the automated clauses. In this case, there seems to have been a mismatch of design and implementation.
Update 2
I believe most developers prefer conjunctive normal form, with major terms joined by AND, but disjunctive normal form is equal in expressive power and size of code.
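As a rough sketch of the trick described above: a generator can emit every term followed by the joiner and then close the clause with a constant, so it never has to check whether a term is the last one. The function name build_where and its parameters are hypothetical, and column names are assumed to come from trusted code, not user input.

```python
def build_where(filters, joiner="AND"):
    """Build a WHERE clause from (column, value) pairs.

    The closing constant is the neutral element for the joiner:
    1=1 for AND (conjunctive form), 1=2 for OR (disjunctive form).
    """
    closing = "1=1" if joiner == "AND" else "1=2"
    terms = []
    params = []
    for column, value in filters:
        terms.append(f"{column} = ?")
        params.append(value)
    terms.append(closing)  # always appended, so no "is this the last term?" check
    return "WHERE " + f" {joiner} ".join(terms), params


and_clause = build_where([("Active", "N"), ("Country_ID", 3)])
or_clause = build_where([("Active", "N")], joiner="OR")
empty_or = build_where([], joiner="OR")
print(and_clause, or_clause, empty_or)
```

Mixing the two up, for example closing an AND chain with 1=2, makes the whole condition false, which would match the mismatch between design and implementation suspected above.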
It corresponds to a FALSE argument.
For example ;
select * from TABLE where 1=2
returns zero rows.
Use WHERE 1=2 if you don't want to retrieve any rows,
As 1=2 is always false.
adding and 1=2 will cause that case to always return false. To find out why it's there, ask the person who put it there.
I suspect it was put there so the author could force the first condition to be false and then he forgot to remove it.
I would guess that is a debug script. It is there to always return the negative part of the case. Probably on release that part is taken out.
1 = 2 is a condition that is always false, so the query returns no rows.
For example:
Create table empt_tgt
AS
Select empno, ename, job, mgr, sal
FROM emp   -- assumed source table name
WHERE 1=2;
Then, even if the source table has records in all those columns,
when we run the following statement:
SELECT * FROM empt_tgt
EMPT_TGT will be empty: we will only see the column names empno, ename, job, mgr, sal, and no data.
I have found this in several bits of code at my company. In our case it generally gets left in as DEBUG code by mistake. Developers could use it as a place holder which looks like the case in your example.
People use 1=2 to check if their code is syntactically correct without the code performing anything. For example if you have a complicated UPDATE statement and you want to check if the code is correct without updating anything.

I Need some sort of Conditional Join

Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
WHERE jt.someCol = conditions
OR @neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
(SELECT TOP(1) * FROM joinedTable jt
WHERE jt.someCol = someCol
AND @neededjoin is null)
(this is even more expensive than always left joining)
SELECT @sql = @sql + ' INNER JOIN joinedTable jt ' +
' ON jt.someCol = someCol ' +
' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regards to structuring my query. Given that I have three optional conditions, and I need the results from all of them my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense, given that all of the code from the "SELECT from db" statement is the same? It appears that the UNION version is more efficient, but my query is VERY long because of it....
Thanks!
LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND @neededjoin ...
...
OR
LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...
) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement it's evaluated... so don't have it in the SQL statement by using dynamic SQL or IF ELSE
The dynamic SQL solution is usually the best for these situations, but if you really need to get away from that, a series of IF statements in a stored procedure will do the job. It's a pain and you have to write much more code, but it will be faster than trying to make joins conditional in the statement itself.
I would go for a simple and straightforward approach like this:
DECLARE @ret TABLE(...) ;
IF <condition one> BEGIN ;
INSERT INTO @ret() SELECT ...
END ;
IF <condition two> BEGIN ;
INSERT INTO @ret() SELECT ...
END ;
IF <condition three> BEGIN ;
INSERT INTO @ret() SELECT ...
END ;
SELECT DISTINCT ... FROM @ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we cannot guesstimate performance; we must benchmark to determine it. Still, simpler code chunks are better for readability and maintainability.
Try this:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
AND jt.someCol = conditions
AND @neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
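Here is a small sketch of what that join does to the result set, run through Python's sqlite3 module. It only demonstrates the semantics, not the performance claim, which depends on the SQL Server optimizer and has to be measured. The table main_t, the column extra, and the sample rows are hypothetical; the named parameter :needed_join plays the role of the flag variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t (id INTEGER, someCol INTEGER);
    CREATE TABLE joinedTable (someCol INTEGER, extra TEXT);
    INSERT INTO main_t VALUES (1, 10), (2, 20);
    INSERT INTO joinedTable VALUES (10, 'ten'), (20, 'twenty');
""")

sql = """
    SELECT m.id, jt.extra
    FROM main_t m
    LEFT JOIN joinedTable jt
      ON jt.someCol = m.someCol
     AND :needed_join = 1      -- the flag switches the join on or off
    ORDER BY m.id
"""

joined = conn.execute(sql, {"needed_join": 1}).fetchall()
skipped = conn.execute(sql, {"needed_join": 0}).fetchall()
print(joined, skipped)
```

With the flag off, every main row survives and the joined columns come back as NULL, so the outer query keeps the same shape either way.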
Update
If this doesn't give the performance I claimed, then perhaps that's because the last time I did this using joins to a table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
V.*,
LookedUpValue =
CASE M.TableCode
WHEN 'A' THEN A.Value
WHEN 'B' THEN B.Value
WHEN 'C' THEN C.Value
END
FROM
ValueMaster V
INNER JOIN JoinMap M ON V.Col1 = M.Col1 AND V.Col2 = M.Col2
LEFT JOIN TableA A ON M.TableCode = 'A'
LEFT JOIN TableB B ON M.TableCode = 'B'
LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.
The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxilliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT RelevantColumn1, RelevantColumn2
FROM AuxilliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
Good luck...
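The assembly steps described above might be sketched like this in Python. It is only an illustration under assumptions: build_query, AuxilliaryTable2 and ForeignKey2 are invented for the example, the other names follow the fragments in the answer, and filter values are left as ? placeholders to be bound by the caller.

```python
# Core tables are always present; optional tables are added only when needed.
CORE_FROM = (
    "CoreTable1 AS C1\n"
    "JOIN CoreTable3 AS M ON M.PrimaryKey = C1.ForeignKey"
)
OPTIONAL_JOINS = {
    "A": "JOIN AuxilliaryTable1 AS A ON M.ForeignKey1 = A.PrimaryKey",
    "B": "JOIN AuxilliaryTable2 AS B ON M.ForeignKey2 = B.PrimaryKey",  # hypothetical
}


def build_query(needed_aliases, filters):
    """Assemble SELECT, FROM and WHERE from fixed and optional parts."""
    from_clause = [CORE_FROM]
    for alias in needed_aliases:
        from_clause.append(OPTIONAL_JOINS[alias])
    sql = "SELECT C1.JoinColumn\nFROM " + "\n".join(from_clause)
    if filters:
        sql += "\nWHERE " + "\n  AND ".join(filters)
    return sql


only_a = build_query(["A"], ["A.Column1 BETWEEN ? AND ?"])
core_only = build_query([], [])
print(only_a)
```

Because a join is emitted only for the aliases requested, a query that needs no optional table never pays for one.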
Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num

if statements in sql query

Good morning all. I have an issue with a query. I want to select something in a query only if another field has a particular value. The query below explains it better:
select
Case isnull(rl.reason,'Not Found')
When 'D' then 'Discontinued'
When 'N' then 'Not Found'
When 'I' then 'Inactive'
When 'C' then 'No Cost'
When '' then 'Not Found'
End as Reason, ***If statement to select pv.descriptor only if reason is in ('D','I','C')***pv.descriptor
from table1 as rl
left join table2 as v on v.field= rl.field
***Here i want an if statment to run if reason is in ('D','I','C')***
left join table3 as pv on
Case rl.scantype
when 'S' then cast(ltrim(rtrim(pv.field#1)) as varchar)
when 'U' then cast(ltrim(rtrim(pv.field#2)) as varchar)
when 'V' then cast(ltrim(rtrim(pv.vfield#3)) as varchar)
end
= rl.scan and pv.vend_no = rl.vendnum
***'**If statement ends*****
left join storemain..prmastp as p on p.emuserid = rl.userid
where rl.scandate between GetDate() -7 and GetDate() order by rl.scandate desc
I want the if statement to select the descriptor only if the reason selected is 'D', 'I', or 'C'. If not, I want a null value there, because I will not do the join to get that variable unless the reason is 'D', 'I', or 'C'.
By the way, I can use a CASE expression in the middle of the left join, as I did above. It works perfectly fine. That's not my issue.
If you want it in one query, you HAVE to do the join. Using left joins and case statement as you have, you can ensure pv.descriptor is shown as null if that is what you want in certain cases.
If you want control flow, you will need to use T-SQL
If performance is your concern, you shouldn't be joining on computed values. Rethink the database design. You likely want to create new columns for your join, and may want to create intermediary tables if you have many-to-many relationships.
I think that you want to only join to pv if rl.reason is one of (D, I, C) - is that right? If that's your problem, just change your JOIN clause to:
LEFT JOIN table3 as pv ON
LTRIM(RTRIM(
CASE rl.scantype
WHEN 'S' THEN pv.field#1
WHEN 'U' THEN pv.field#2
WHEN 'V' THEN pv.field#3
END
)) = rl.scan
AND rl.vendnum = pv.vend_no
AND rl.reason IN ('D', 'I', 'C')
Of course, you also have "If statement to select pv.descriptor only if reason is in ('D','I','C') [as] pv.descriptor" in the SELECT clause. So, assuming you want that instead, try this:
SELECT
/* your other columns */
CASE
WHEN rl.reason IN ('D', 'I', 'C') THEN pv.descriptor
ELSE NULL --optional, since it'll default to NULL
END as descriptor
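Both variants can be tried on toy data. The sketch below uses Python's sqlite3 module with heavily simplified tables: the scantype CASE is replaced by a single hypothetical column named code, and the rows are invented, so it shows only the shape of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (scan TEXT, vendnum INTEGER, reason TEXT);
    CREATE TABLE table3 (code TEXT, vend_no INTEGER, descriptor TEXT);
    INSERT INTO table1 VALUES ('A1', 7, 'D'), ('B2', 7, 'N');
    INSERT INTO table3 VALUES ('A1', 7, 'Widget'), ('B2', 7, 'Gadget');
""")

# Variant 1: the reason test is part of the join condition.
join_rows = conn.execute("""
    SELECT rl.scan, rl.reason, pv.descriptor
    FROM table1 AS rl
    LEFT JOIN table3 AS pv
      ON pv.code = rl.scan
     AND pv.vend_no = rl.vendnum
     AND rl.reason IN ('D', 'I', 'C')
    ORDER BY rl.scan
""").fetchall()

# Variant 2: always join, and blank the column with CASE in the select list.
case_rows = conn.execute("""
    SELECT rl.scan,
           CASE WHEN rl.reason IN ('D', 'I', 'C') THEN pv.descriptor END AS descriptor
    FROM table1 AS rl
    LEFT JOIN table3 AS pv
      ON pv.code = rl.scan
     AND pv.vend_no = rl.vendnum
    ORDER BY rl.scan
""").fetchall()

print(join_rows, case_rows)
```

In both cases the row with reason 'N' keeps a NULL descriptor even though a matching table3 row exists.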
What motivates the combination of two queries? Joining could cause a radically different evaluation plan to be required... Selecting between two plans based on the values of variables (or worse - columns!) is not something the query optimizer does well.
You will be far better off if you write two queries and use a SQL IF statement to flow control to one of them.
Edit: What should the query return if there are two rows from rl, one with reason=D and one with reason=S?
I'd keep it simple so whoever supports your code in the future can figure out how to maintain it (when they add another reason code, for example).
It may not be as performant, but it's more maintainable.
Also, is there a way you can get those codes in a table and test for a marker, instead of having them hard-coded in sql?

Sql Server query syntax

I need to perform a query like this:
SELECT *,
(SELECT Table1.Column
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 WHERE tmp = 1
I know I can take a workaround but I would like to know if this syntax is possible as it is (I think) in Mysql.
The query you posted won't work on sql server, because the sub query in your select clause could possibly return more than one row. I don't know how MySQL will treat it, but from what I'm reading MySQL will also yield an error if the sub query returns any duplicates. I do know that SQL Server won't even compile it.
The difference is that MySQL will at least attempt to run the query, and if you're very lucky (Table2Id is unique in Table1) it will succeed. More probably it will return an error. SQL Server won't try to run it at all.
Here is a query that should run on either system, and won't cause an error if Table2Id is not unique in Table1. It will return "duplicate" rows in that case, where the only difference is the source of the Table1.Column value:
SELECT Table2.*, Table1.Column AS tmp
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
Perhaps if you shared what you were trying to accomplish we could help you write a query that does it.
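To make the "duplicate" rows concrete, here is a sketch of that join on invented data, run through Python's sqlite3 module as a neutral engine. The Name column and the sample rows are hypothetical, and "Column" is quoted because it is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Table2Id INTEGER, "Column" INTEGER);
    INSERT INTO Table2 VALUES (1, 'first'), (2, 'second');
    -- Table2Id = 1 appears twice, so it is not unique in Table1.
    INSERT INTO Table1 VALUES (10, 1, 1), (11, 1, 1), (12, 2, 0);
""")

rows = conn.execute("""
    SELECT Table2.*, Table1."Column" AS tmp
    FROM Table1
    INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
    WHERE Table1."Column" = 1
""").fetchall()

print(rows)
```

The Table2 row with Id 1 comes back twice, once per matching Table1 row, instead of raising an error.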
SELECT *
FROM (
SELECT t.*,
(
SELECT Table1.Column
FROM Table1
INNER JOIN
Table2
ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 t
) q
WHERE tmp = 1
This is valid syntax, but it will fail (both in MySQL and in SQL Server) if the subquery returns more than 1 row
What exactly are you trying to do?
Please provide some sample data and desired resultset.
I agree with Joel's solution but I want to discuss why your query would be a bad idea to use (even though the syntax is essentially valid). This is a correlated subquery. The first issue with these is that they don't work if the subquery could possibly return more than one value for a record. The second and more critical problem (in my mind) is that they must work row by row rather than on the set of data. This means they will virtually always affect performance. So correlated subqueries should almost never be used in a production system. In this simple case, the join Joel showed is the correct solution.
If the subquery is more complicated, you may want to turn it into a derived table instead (this also fixes the problem of more than one value being associated with a record). While a derived table looks a lot like a correlated subquery to the uninitiated, it does not perform the same way because it acts on the set of data rather than row by row, and thus will often be significantly faster. You are essentially making the query a table in the join.
Below is an example of your query re-written as a derived table. (Of course in production code you would not use select * either especially in a join, spell out the fields you need)
SELECT *
FROM Table2 t2
JOIN
(SELECT Table1.[Column] as tmp, Table1.Table2Id
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id ) as t
ON t.Table2Id = t2.Id
WHERE t.tmp = 1
You've already got a variety of answers, some of them more useful than others. But to answer your question directly:
No, SQL Server will not allow you to reference the column alias (defined in the select list) in the predicate (the WHERE clause). I think that is sufficient to answer the question you asked.
Additional details:
(this discussion goes beyond the original question you asked.)
As you noted, there are several workarounds available.
Most problematic with the query you posted (as others have already pointed out) is that we aren't guaranteed that the subquery in the SELECT list returns only one row. If it does return more than one row, SQL Server will throw a "too many rows" exception:
Subquery returned more than 1 value.
This is not permitted when the subquery
follows =, !=, <, <= , >, >= or when the
subquery is used as an expression.
For the following discussion, I'm going to assume that issue is already sufficiently addressed.
Sometimes, the easiest way to make the alias available in the predicate is to use an inline view.
SELECT v.*
FROM ( SELECT *
, (SELECT Table1.Column
FROM Table1
JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
) as tmp
FROM Table2
) v
WHERE v.tmp = 1
Note that SQL Server won't push the predicate for the outer query (WHERE v.tmp = 1) into the subquery in the inline view. So you need to push that in yourself, by including the WHERE Table1.Column = 1 predicate in the subquery, particularly if you're depending on that to make the subquery return only one value.
That's just one approach to working around the problem; there are others. I suspect the query plan for this SQL Server query is not going to be optimal. For performance, you probably want to go with a JOIN or an EXISTS predicate.
NOTE: I'm not an expert on using MySQL. I'm not all that familiar with MySQL support for subqueries. I do know (from painful experience) that subqueries weren't supported in MySQL 3.23, which made migrating an application from Oracle 8 to MySQL 3.23 particularly painful.
Oh and btw... of no interest to anyone in particular, the Teradata DBMS engine DOES have an extension that allows for the NAMED keyword in place of the AS keyword, and a NAMED expression CAN be referenced elsewhere in the QUERY, including the WHERE clause, the GROUP BY clause and the ORDER BY clause. Shuh-weeeet
That kind of syntax is basically valid (you need to move the where tmp=... to an outer "select * from (....)", though), although it's ambiguous since you have two sets named "Table2". You should probably define aliases on at least one of your usages of that table to clear up the ambiguity.
Unless you intended that to return a column from table1 corresponding to columns in table2 ... in which case you might have wanted to simply join the tables?