Obnoxious WHERE Clause in UPDATE - sql

I found the following WHERE clause in an UPDATE statement, and I hate it. The only way I can think to make it different would be with possibly a CTE and some Unions.
FROM dbo.Table1 T1
INNER JOIN #Table2 T2 ON T1.IntField1 = T2.IntField1
WHERE (ISNULL(T1.IntField2, 0) <> ISNULL(T2.IntField2, 0)
OR ISNULL(T1.IntField3, 0) <> ISNULL(T2.IntField3, 0))
AND (T2.IntField1 IN (
SELECT IntField1
FROM dbo.Table3)
OR T2.IntField1 IS NULL)
I think I've just been staring at this too long. I just happened to look at this SP, and see this. Really felt like something could be done differently/better.

It's not the prettiest, no, but there is no real need to change it unless it is performing badly. Don't ever change SQL code just because you don't like the way it looks. That is often counterproductive, because some of the worst-looking code is the most performant, and the DBA will not thank you for changing their tuned code. Thinking you should change SQL code to suit your personal preferences is a BAD habit you need to break. Read up on performance tuning instead, and refactor to improve performance, not to suit your prejudices about what is pretty (or worse, elegant!) code.
There are two things I can see that might help this though. First, why do you need OR T2.IntField1 IS NULL? Since you are joining to Table1 with an inner join on that field, there can never be a row in the result set where T2.IntField1 IS NULL.
The other thing depends on what else #table2 is used for. But since you are clearly creating and populating this table earlier, why not do the conversion of the T2.IntField2 and T2.IntField3 to 0 when they are null at the time the data is put into the table? That would reduce the complexity of the update query some. However, if you need those nulls for some other purpose during the process you can't do this.
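A minimal sketch of the first point, using Python's sqlite3 module as a stand-in for SQL Server. The lowercase table and column names and the sample rows are hypothetical, and COALESCE replaces ISNULL because SQLite does not have the two-argument ISNULL. It only illustrates that rows with a NULL join key never survive the inner join, so the extra IS NULL test cannot change the result.

```python
import sqlite3

# Hypothetical miniature versions of Table1, #Table2 and Table3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (int_field1 INTEGER, int_field2 INTEGER);
    CREATE TABLE table2 (int_field1 INTEGER, int_field2 INTEGER);
    CREATE TABLE table3 (int_field1 INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, NULL), (NULL, 30);
    INSERT INTO table2 VALUES (1, 11), (2, 5), (NULL, 31);
    INSERT INTO table3 VALUES (1), (2);
""")

base = """
    SELECT t1.int_field1
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.int_field1 = t2.int_field1
    WHERE COALESCE(t1.int_field2, 0) <> COALESCE(t2.int_field2, 0)
      AND (t2.int_field1 IN (SELECT int_field1 FROM table3) {extra})
    ORDER BY t1.int_field1
"""

# Same query with and without the "OR ... IS NULL" branch.
with_null_check = conn.execute(base.format(extra="OR t2.int_field1 IS NULL")).fetchall()
without_null_check = conn.execute(base.format(extra="")).fetchall()

print(with_null_check)
print(without_null_check)
```

The NULL-keyed rows in both tables are dropped by the join itself, because NULL = NULL does not evaluate to true.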

It looks like you could combine the elements of the where clause into joins:
Overview:
1) NOT (A AND B) is the same as NOT(A) OR NOT(B)
2) IN OR NULL can be combined in an ISNULL() join.
FROM dbo.Table1 T1
JOIN #Table2 T2 ON T1.IntField1 = T2.IntField1
AND NOT
(
ISNULL(T1.IntField2, 0) = ISNULL(T2.IntField2, 0)
and
ISNULL(T1.IntField3, 0) = ISNULL(T2.IntField3, 0)
)
JOIN dbo.Table3 t3 on
t3.IntField1 = ISNULL(T2.IntField1, t3.IntField1)
But as it's been stated before, if performance is the only focus, this -- although more readable (in my opinion) -- is not necessary.
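The first rule in the overview (De Morgan's law) can be checked on a handful of rows. This is only a sketch using Python's sqlite3 module in place of SQL Server; the pairs table and its columns are hypothetical, and COALESCE stands in for ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pairs (id INTEGER, a2 INTEGER, b2 INTEGER, a3 INTEGER, b3 INTEGER);
    INSERT INTO pairs VALUES
        (1, 1, 1, 2, 2),           -- both pairs equal
        (2, 1, 9, 2, 2),           -- first pair differs
        (3, NULL, 0, 2, 2),        -- NULL and 0 count as equal after COALESCE
        (4, NULL, NULL, 5, NULL);  -- second pair differs (5 vs 0)
""")

# Original form: at least one pair differs.
or_form = conn.execute("""
    SELECT id FROM pairs
    WHERE COALESCE(a2, 0) <> COALESCE(b2, 0)
       OR COALESCE(a3, 0) <> COALESCE(b3, 0)
    ORDER BY id
""").fetchall()

# Rewritten form: NOT (both pairs are equal).
not_form = conn.execute("""
    SELECT id FROM pairs
    WHERE NOT (COALESCE(a2, 0) = COALESCE(b2, 0)
               AND COALESCE(a3, 0) = COALESCE(b3, 0))
    ORDER BY id
""").fetchall()

print(or_form, not_form)
```

Because COALESCE removes the NULLs before comparing, three-valued logic does not interfere and the two forms select the same rows.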


Improving performance of adding a column with a single value

By experimentation, and surprisingly, I have found out that LEFT JOINing a point-table is much faster on large tables than simply assigning a single value to a column. By a point-table I mean a 1x1 table (1 row and 1 column).
Approach 1. By a simple assigning value, I mean this (slower):
SELECT A.*, 'Value' as NewColumn
FROM Table1 A
Approach 2. By left-joining a point-table, I mean this (faster):
WITH B AS (SELECT 'Value' as 'NewColumn')
SELECT * FROM Table1 A
LEFT JOIN B
ON A.ID <> B.NewColumn
Now the core of my question. Can someone advise me how to get rid of the whole ON clause:
ON A.ID <> B.NewColumn?
Checking the join condition seems an unnecessary waste of time, because the key of table A must not equal the key of table B. It would throw rows out of the results if A.ID had the same value as 'Value'. Removing that condition, or maybe changing <> to =, seems like further room to improve the join's performance.
Update February 23, 2015
Bounty question addressed to performance experts. Which of the approaches mentioned in my question and answers is the fastest.
Approach 1 Simple assigning value,
Approach 2 Left joining a point-table,
Approach 3 Cross joining a point-table (thanks to answer of Gordon Linoff)
Approach 4 Any other approach which may be suggested during the bounty period.
I have measured the query execution time in seconds for 3 approaches empirically: the second approach, with LEFT JOIN, is the fastest, then the CROSS JOIN method, and last the simple assigned value. Surprising as it is. A performance expert with a Solomon's sword is needed to confirm or deny it.
I'm surprised this is faster for a simple expression, but you seem to want a cross join:
WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
FROM Table1 A CROSS JOIN
B;
I use this construct to put "parameters" in queries (values that can easily be changed). However, I don't see why it would be faster. If the expression is more complicated (such as a subquery or very complicated calculation), then this method only evaluates it once. In the original query, it would normally be evaluated only once, but there might be cases where it is evaluated for each row.
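As a rough check that the two forms are interchangeable in terms of results (not performance), here is a sketch using Python's sqlite3 module rather than SQL Server. The three sample rows and the lowercase names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, c2 TEXT);
    INSERT INTO table1 VALUES (1, 'A'), (2, 'A'), (3, 'B');
""")

# Approach 1: assign the constant directly in the select list.
assigned = conn.execute("""
    SELECT a.*, 'Value' AS new_column
    FROM table1 a
    ORDER BY a.id
""").fetchall()

# Approach 3: cross join a one-row "parameter" CTE.
cross_joined = conn.execute("""
    WITH b AS (SELECT 'Value' AS new_column)
    SELECT a.*, b.new_column
    FROM table1 a CROSS JOIN b
    ORDER BY a.id
""").fetchall()

print(assigned)
print(cross_joined)
```

Since the CTE has exactly one row, the cross join neither adds nor removes rows. Which form is faster has to be measured on the real engine and data.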
You can also try with CROSS APPLY:
SELECT A.*, B.*
FROM Table1 A
CROSS APPLY(SELECT 'Value' as 'NewColumn') B
Can you try to insert into a temp table instead of outputting to screen:
SELECT A.*, 'Value' as NewColumn
INTO #Table1Assign
FROM Table1 A
and
WITH B AS (SELECT 'Value' as 'NewColumn')
SELECT *
INTO #Table1Join
FROM Table1 A
LEFT JOIN B
ON A.ID <> B.NewColumn
That takes the actual transmission and rendering of the data to SSMS out of the equation; any difference there could be caused by network slowdown or processing on the client.
When I run this with a 1M row table, I consistently get better performance with the simple assigning method, even if I switch to CROSS JOIN for the join method.
I doubt that the second approach will be faster, with three selects and a left join.
First of all, you should test the same query repeatedly with various sample data.
What is the real scenario like?
An inner join will definitely be faster than a left join.
How about this?
DECLARE @t TABLE (id int, c2 varchar(10))
INSERT INTO @t
SELECT 1, 'A' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 3, 'B' UNION ALL
SELECT 4, 'B'
DECLARE @t1 TABLE (nEWcOL varchar(10))
INSERT INTO @t1 VALUES ('Value')
-- Approach 1
--SELECT * FROM @t OUTER APPLY
--@t1
--Create index on both join column
-- Approach 2
SELECT * FROM @t A INNER JOIN
@t1 b ON a.c2 <> b.nEWcOL
-- Approach 3
DECLARE @value varchar(20)
SELECT @value = nEWcOL FROM @t1
SELECT *, @value value FROM @t
Too much text for a comment, so added this as an answer although I'm actually more adding to the question (**)
Somehow I think this is going to be one of those 'it depends' situations. I think it depends a lot on the amount of rows involved and even more on what happens afterwards with the data. Is it simply returned, is it used in a GROUP BY or DISTINCT later on, do we further JOIN or calculate with it etc..
Anyway, I think this IS an interesting question, in that I've had to find out the hard way that having a dozen 'parameters' in a single-row temp table was faster than having them assigned up front to 12 variables. Many, many moons ago the code I was given looked like an absurd construction to me, so I rewrote it to use @variables instead. This was in a 1000+ line stored procedure which needed some extra performance squeezed out of it. After quite a bit of refactoring it turned out to run remarkably slower than before the change?!
I've never really understood why and at the time simply reverted to the old version again. My best guess is some weird kind of combination of parameter-sniffing vs (auto-created?) statistics on the temp-table in question; if anyone could bring some light to your question it probably will lead to an answer of mine too =)
(**: I realize SO is not a forum so I apologise upfront, simply wanted to chime in that the observed behaviour of the OP isn't entirely anecdotal)
SELECT * doesn't use indexes properly in SQL; you should always specify your columns.
Other than that I would use
DECLARE @Value VARCHAR(30) = 'Value'
SELECT t.Id, t.C2, @Value NewColumn
FROM Table1 t

SQL Server stored procedure takes 1' 18" to run... seems long

Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run and it gets even worse when I run the asp.net page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
@parameter1 int = 5,
@parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WITH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob
Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 select distinct clauses in this query -- in a good design, distinct clauses are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criterion -- function calls during the join that prevent any decent query optimization. My guess is that you are using data that has internal meaning and that you are parsing the internal meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and create a computed column -- or even a plain copy of the full part number field if the combined field is unusually complex.
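A rough sketch of that idea, using Python's sqlite3 module instead of SQL Server. The notes and notes_by_line tables, their columns, and the sample keys are all hypothetical. The order number is parsed out of the combined key once and stored in its own indexed column, so the join compares plain columns. In SQL Server a persisted computed column could keep that value in sync automatically; a plain column is used here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notes (order_only TEXT);
    CREATE TABLE notes_by_line (order_key TEXT, order_only TEXT);
    INSERT INTO notes VALUES ('1001'), ('1002');
    INSERT INTO notes_by_line (order_key) VALUES ('1001-1'), ('1001-2'), ('1002-1');

    -- Parse the combined key once, at load time, instead of inside every join.
    UPDATE notes_by_line
       SET order_only = substr(order_key, 1, instr(order_key, '-') - 1);
    CREATE INDEX ix_notes_by_line_order_only ON notes_by_line (order_only);
""")

# The join now compares two plain columns, with no function calls in the ON clause.
rows = conn.execute("""
    SELECT n.order_only, l.order_key
    FROM notes n
    INNER JOIN notes_by_line l ON l.order_only = n.order_only
    ORDER BY l.order_key
""").fetchall()

print(rows)
```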
3) This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them at 1:00 when stuff is broken?
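To make the precedence point concrete, here is a small sketch run through Python's sqlite3 module. AND binds more tightly than OR in standard SQL, so this should carry over to SQL Server. The notes table and its 0/1 flag columns are hypothetical simplifications of the CS_Ready filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notes (id INTEGER, review INTEGER, has_date INTEGER, redate INTEGER);
    INSERT INTO notes VALUES
        (1, 1, 1, 1),
        (2, 1, 0, 0),
        (3, 0, 1, 1),
        (4, 1, 1, 0),
        (5, 1, 0, NULL);
""")

def ids(where_clause):
    """Return the ids matched by the given WHERE clause text."""
    sql = "SELECT id FROM notes WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# As written in the procedure: no parentheses around the two AND groups.
as_written = ids("review = 1 AND has_date = 1 AND redate = 1 "
                 "OR review = 1 AND has_date = 0 AND (redate = 0 OR redate IS NULL)")

# What the engine actually evaluates, spelled out for the next reader.
explicit = ids("(review = 1 AND has_date = 1 AND redate = 1) "
               "OR (review = 1 AND has_date = 0 AND (redate = 0 OR redate IS NULL))")

# One plausible misreading, where the OR is grouped with its neighbours first.
misread = ids("review = 1 AND has_date = 1 AND (redate = 1 OR review = 1) "
              "AND has_date = 0 AND (redate = 0 OR redate IS NULL)")

print(as_written, explicit, misread)
```

Adding the outer parentheses changes nothing for the engine, but it removes the need to remember the precedence rules.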
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer for CTEs -- I have had to recode CTE-based joins for performance issues on several occasions -- creating an actual view equivalent to the CTE and using that instead can be a significant speedup. It may well depend on the version of SQL Server, but if you are running an older version I would definitely wonder about CTE optimization. This is not as important as the first 2 things I've mentioned; try to fix those first.
ADDED
I'm going to harsh on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues, and you like the syntax, they can be useful in at least limited usage, personally I don't normally recommend them for anything more than that -- and given that it is MS specific syntactical sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the optimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating a #temptable and populating it. You can even add an index or two to your #temptable, and the optimizer can figure out how to use them too. A @table variable is similar, but at least through SQL 2012 they were no faster than #temp tables that I could tell -- supposedly new goodness in SQL Server 2014 helps with this.
A CTE is really just a temporary view in disguise, which is why I suggested you can replace it with a real view to get better performance (and you often can), or you can populate a temp table and sometimes get even better performance.

SQL - Relationship between a SubQuery and an Outer Table

Problem
I need to better understand the rules about when I can reference an outer table in a subquery and when (and why) that is an inappropriate request. I've discovered a duplication in an Oracle SQL query I'm trying to refactor but I'm running into issues when I try and turn my referenced table into a grouped subQuery.
The following statement works appropriately:
SELECT t1.*
FROM table1 t1
INNER JOIN table2 t2
on t1.id = t2.id
and t2.date = (SELECT max(date)
FROM table2
WHERE id = t1.id) --This subquery has access to t1
Unfortunately table2 sometimes has duplicate records so I need to aggregate t2 first before I join it to t1. However when I try and wrap it in a subquery to accomplish this operation, suddenly the SQL engine can't recognize the outer table any longer.
SELECT t1.*
FROM table1 t1
INNER JOIN (SELECT *
FROM table2 t2
WHERE t1.id = t2.id --This loses access to t1
and t2.date = (SELECT max(date)
FROM table2
WHERE id = t1.id)) sub on t1.id = sub.id
--Subquery loses access to t1
I know these are fundamentally different queries I'm asking the compiler to put together but I'm not seeing why the one would work but not the other.
I know I can duplicate the table references in my subquery and effectively detach my subquery from the outer table but that seems like a really ugly way of accomplishing this task (what with all the duplication of code and processing).
Helpful References
I found this fantastic description of the order in which clauses are executed in SQL Server: (INNER JOIN ON vs WHERE clause). I'm using Oracle, but I would think that this would be standard across the board. There is a clear order to clause evaluation (with FROM being first), so I would think that any clause occurring further down the list would have access to all information previously processed. I can only assume my 2nd query somehow changes that ordering so that my subquery is being evaluated too early?
In addition, I found a similar question asked (Referencing outer query's tables in a subquery), but while the input was good, they never really explained why he couldn't do what he was doing and just gave alternative solutions to his problem. I've tried their alternate solutions, but they're causing me other issues. Namely, that subquery with the date reference is fundamental to the entire operation, so I can't get rid of it.
Questions
I want to understand what I've done here... Why can my initial subquery see the outer table but not after I wrap the entire statement in a subquery?
That said, if what I'm trying to do can't be done, what is the best way of refactoring the first query to eliminate the duplication? Should I reference table1 twice (with all the duplication that requires)? Or is there (probably) a better way of tackling this problem?
Thanks in advance!
------EDIT------
As some have surmised, these queries above are not the actual query I'm refactoring but an example of the problem I'm running into. The query I'm working with is a lot more complicated, so I'm hesitant to post it here as I'm afraid it will get people off track.
------UPDATE------
So I ran this by a fellow developer and he had one possible explanation for why my subquery is losing access to t1. Because I'm wrapping this subquery in a parenthesis, he thinks that this subquery is being evaluated before my table t1 is being evaluated. This would definitely explain the 'ORA-00904: "t1"."id": invalid identifier' error I've been receiving. It would also suggest that like arithmetic order of operations, that adding parens to a statement gives it priority within certain clause evaluations. I would still love for an expert to weigh in if they agree/disagree that is a logical explanation for what I'm seeing here.
So I figured this out based on the comment that Martin Smith made above (THANKS MARTIN!) and I wanted to make sure I shared my discovery for anyone else who trips across this issue.
Technical Considerations
Firstly, it would certainly help if I used the proper terminology to describe my problem: My first statement above uses a correlated subquery:
http://en.wikipedia.org/wiki/Correlated_subquery
http://www.programmerinterview.com/index.php/database-sql/correlated-vs-uncorrelated-subquery/
This is actually a fairly inefficient way of pulling back data as it reruns the subquery for every line in the outer table. For this reason I'm going to look for ways of eliminating these type of subqueries in my code:
https://blogs.oracle.com/optimizer/entry/optimizer_transformations_subquery_unesting_part_1
My second statement on the other hand was using what is called an inline view in Oracle also known as a derived table in SQL Server:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries007.htm
http://www.programmerinterview.com/index.php/database-sql/derived-table-vs-subquery/
An inline view / derived table creates a temporary unnamed view at the beginning of your query and then treats it like another table until the operation is complete. Because the compiler needs to create a temporary view when it sees one of these subqueries in the FROM clause, those subqueries must be entirely self-contained, with no references outside the subquery.
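The difference between the two kinds of subquery can be reproduced in a small sketch. It uses Python's sqlite3 module rather than Oracle, so the error text differs from ORA-00904, and the date column is renamed to a hypothetical dt. The correlated subquery in the ON clause can see t1, the derived table cannot, and a self-contained derived table that aggregates table2 by id is one possible way to remove the duplicates without referencing t1 at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, dt TEXT);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES
        (1, '2015-01-01'), (1, '2015-02-01'), (1, '2015-02-01'), (2, '2015-03-01');
""")

# Correlated subquery in the ON clause: t1 is in scope, duplicates come through.
correlated = conn.execute("""
    SELECT t1.id, t2.dt
    FROM table1 t1
    INNER JOIN table2 t2
       ON t1.id = t2.id
      AND t2.dt = (SELECT max(dt) FROM table2 WHERE id = t1.id)
    ORDER BY t1.id
""").fetchall()

# Derived table that references t1: rejected, because it must be self-contained.
try:
    conn.execute("""
        SELECT t1.id
        FROM table1 t1
        INNER JOIN (SELECT * FROM table2 t2 WHERE t2.id = t1.id) sub
           ON t1.id = sub.id
    """)
    derived_error = None
except sqlite3.OperationalError as exc:
    derived_error = str(exc)

# Self-contained derived table: aggregate first, then join.
deduplicated = conn.execute("""
    SELECT t1.id, sub.max_dt
    FROM table1 t1
    INNER JOIN (SELECT id, max(dt) AS max_dt FROM table2 GROUP BY id) sub
       ON t1.id = sub.id
    ORDER BY t1.id
""").fetchall()

print(correlated)
print(derived_error)
print(deduplicated)
```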
Why what I was doing was stupid
What I was trying to do in that second table was essentially create a view based on an ambiguous reference to another table that was outside the knowledge of my statement. It would be like trying to reference a field in a table that you hadn't explicitly stated in your query.
Workaround
Lastly, it's worth noting that Martin suggested a fairly clever but ultimately inefficient way to accomplish what I was trying to do. The APPLY operator is proprietary SQL Server syntax, but it allows you to talk to objects outside of your derived table:
http://technet.microsoft.com/en-us/library/ms175156(v=SQL.105).aspx
Likewise this functionality is available in Oracle through different syntax:
What is the equivalent of SQL Server APPLY in Oracle?
Ultimately I'm going to re-evaluate my entire approach to this query, which means I'll have to rebuild it from scratch (believe it or not, I didn't create this monstrosity originally - I swear!). A big thanks to everyone who commented - this was definitely stumping me, but all of the input helped put me on the right track!
How about the following query:
SELECT t1.* FROM
(
SELECT *
FROM
(
SELECT t2.id,
RANK() OVER (PARTITION BY t2.id ORDER BY t2.date DESC) AS R
FROM table2 t2
)
WHERE R = 1
) sub
INNER JOIN table1 t1
ON t1.id = sub.id
In your second example you are trying to pass the t1 reference down 2 levels.. you can't do that, you can only pass it down 1 level (which is why the 1st works). If you give a better example of what you are trying to do, we can help you rewrite your query as well.

Explanation of using the operator EXISTS on a correlated subqueries

What is an explanation of the mechanics behind the following Query?
It looks like a powerful method of doing dynamic filtering on a table.
CREATE TABLE tbl (ID INT, amt INT)
INSERT tbl VALUES
(1,1),
(1,1),
(1,2),
(1,3),
(2,3),
(2,400),
(3,400),
(3,400)
SELECT *
FROM tbl T1
WHERE EXISTS
(
SELECT *
FROM tbl T2
WHERE
T1.ID = T2.ID AND
T1.amt < T2.amt
)
Live test of it here on SQL Fiddle
You can usually convert correlated subqueries into an equivalent expression using explicit joins. Here is one way:
SELECT distinct t1.*
FROM tbl T1 left outer join
tbl t2
on t1.id = t2.id and
t1.amt < t2.amt
where t2.id is null
Martin Smith shows another way.
The question of whether they are a "powerful way of doing dynamic filtering" is true, but (usually) unimportant. You can do the same filtering using other SQL constructs.
Why use correlated subqueries? There are several positives and several negatives, and one important reason that is both. On the positive side, you do not have to worry about "multiplication" of rows, as happens in the above query. Also, when you have other filtering conditions, the correlated subquery is often more efficient. And, sometimes using delete or update, it seems to be the only way to express a query.
The Achilles heel is that many SQL optimizers implement correlated subqueries as nested loop joins (even though they do not have to). So, they can be highly inefficient at times. However, the particular "exists" construct that you have is often quite efficient.
In addition, the nature of the joins between the tables can get lost in nested subqueries, with complicated conditions in where clauses. It can get hard to understand what is going on in more complicated cases.
My recommendation. If you are going to use them on large tables, learn about SQL execution plans in your database. Correlated subqueries can bring out the best or the worst in SQL performance.
Possible Edit. This is more equivalent to the script in the OP:
SELECT distinct t1.*
FROM tbl T1 inner join
tbl t2
on t1.id = t2.id and
t1.amt < t2.amt
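To see how the three formulations behave on the sample data from the question, here is a sketch run through Python's sqlite3 module (SQLite standing in for SQL Server). The ORDER BY clauses are added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (ID INT, amt INT);
    INSERT INTO tbl VALUES (1,1),(1,1),(1,2),(1,3),(2,3),(2,400),(3,400),(3,400);
""")

# The EXISTS query from the question.
exists_rows = conn.execute("""
    SELECT * FROM tbl T1
    WHERE EXISTS (SELECT * FROM tbl T2
                  WHERE T1.ID = T2.ID AND T1.amt < T2.amt)
    ORDER BY T1.ID, T1.amt
""").fetchall()

# Inner join with DISTINCT: same rows, but duplicates in tbl are collapsed.
inner_rows = conn.execute("""
    SELECT DISTINCT t1.* FROM tbl t1
    INNER JOIN tbl t2 ON t1.ID = t2.ID AND t1.amt < t2.amt
    ORDER BY t1.ID, t1.amt
""").fetchall()

# Left join with IS NULL: the complement, i.e. what NOT EXISTS would return.
anti_rows = conn.execute("""
    SELECT DISTINCT t1.* FROM tbl t1
    LEFT OUTER JOIN tbl t2 ON t1.ID = t2.ID AND t1.amt < t2.amt
    WHERE t2.ID IS NULL
    ORDER BY t1.ID, t1.amt
""").fetchall()

print(exists_rows)
print(inner_rows)
print(anti_rows)
```

On this data the inner join version matches EXISTS only up to duplicates: the repeated (1, 1) row appears twice with EXISTS and once with DISTINCT.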
Let's translate this to English:
"Select rows from tbl where tbl has a row of the same ID and bigger amt."
What this does is select everything except the rows with maximum values of amt for each ID.
Note, the last line SELECT * FROM tbl is a separate query and probably not related to the question at hand.
As others have already pointed out, using EXISTS in a correlated subquery is essentially telling the database engine "return all records for which there is a corresponding record which meets the criteria specified in the subquery." But there's more.
The EXISTS keyword represents a boolean value. It could also be taken to mean "Where at least one record exists that matches the criteria in the WHERE statement." In other words, if a single record is found, "I'm done, and I don't need to search any further."
The efficiency gain that CAN result from using EXISTS in a correlated subquery comes from the fact that as soon as EXISTS returns TRUE, the subquery stops scanning records and returns a result. Similarly, a subquery which employs NOT EXISTS will return as soon as ANY record matches the criteria in the WHERE statement of the subquery.
I believe the idea is that the subquery using EXISTS is SUPPOSED to avoid the use of nested loop searches. As @Gordon Linoff states above though, the query optimizer may or may not perform as desired. I believe MS SQL Server usually takes full advantage of EXISTS.
My understanding is that not all queries benefit from EXISTS, but often, they will, particularly in the case of simple structures such as that in your example.
I may have butchered some of this, but conceptually I believe it's on the right track.
The caveat is that if you have a performance-critical query, it would be best to evaluate execution of a version using EXISTS with one using simple JOINS as Mr. Linoff indicates. Depending on your database engine, table structure, time of day, and the alignment of the moon and stars, it is not cut-and-dried which will be faster.
Last note - I agree with lc. When you use SELECT * in your subquery, you may well be negating some or all of any performance gain. SELECT only the PK field(s).

Sql Server query syntax

I need to perform a query like this:
SELECT *,
(SELECT Table1.Column
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 WHERE tmp = 1
I know I can take a workaround but I would like to know if this syntax is possible as it is (I think) in Mysql.
The query you posted won't work on sql server, because the sub query in your select clause could possibly return more than one row. I don't know how MySQL will treat it, but from what I'm reading MySQL will also yield an error if the sub query returns any duplicates. I do know that SQL Server won't even compile it.
The difference is that MySQL will at least attempt to run the query, and if you're very lucky (Table2Id is unique in Table1) it will succeed. More probably it will return an error. SQL Server won't try to run it at all.
Here is a query that should run on either system, and won't cause an error if Table2Id is not unique in Table1. It will return "duplicate" rows in that case, where the only difference is the source of the Table1.Column value:
SELECT Table2.*, Table1.Column AS tmp
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
Perhaps if you shared what you were trying to accomplish we could help you write a query that does it.
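A small sketch of the "duplicate" rows described above, using Python's sqlite3 module as a neutral stand-in for both engines. The Name and Source columns and the sample rows are hypothetical, and the column called Column is quoted because it is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (Id INTEGER, Name TEXT);
    CREATE TABLE Table1 (Table2Id INTEGER, "Column" INTEGER, Source TEXT);
    INSERT INTO Table2 VALUES (1, 'first'), (2, 'second');
    -- Table2Id = 1 appears twice, so it is not unique in Table1.
    INSERT INTO Table1 VALUES (1, 1, 'x'), (1, 1, 'y'), (2, 0, 'z');
""")

rows = conn.execute("""
    SELECT Table2.*, Table1."Column" AS tmp
    FROM Table1
    INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
    WHERE Table1."Column" = 1
    ORDER BY Table2.Id
""").fetchall()

print(rows)
```

The Table2 row with Id 1 comes back once per matching Table1 row, which is the behaviour the join form trades for not raising an error.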
SELECT *
FROM (
SELECT t.*,
(
SELECT Table1.Column
FROM Table1
INNER JOIN
Table2
ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 t
) q
WHERE tmp = 1
This is valid syntax, but it will fail (both in MySQL and in SQL Server) if the subquery returns more than 1 row
What exactly are you trying to do?
Please provide some sample data and desired resultset.
I agree with Joel's solution but I want to discuss why your query would be a bad idea to use (even though the syntax is essentially valid). This is a correlated subquery. The first issue with these is that they don't work if the subquery could possibly return more than one value for a record. The second and more critical problem (in my mind) is that they must work row by row rather than on the set of data. This means they will virtually always affect performance. So correlated subqueries should almost never be used in a production system. In this simple case, the join Joel showed is the correct solution.
If the subquery is more complicated, you may want to turn it into a derived table instead (this also fixes the problem of more than one value being associated with a record). While a derived table looks a lot like a correlated subquery to the uninitiated, it does not perform the same way, because it acts on the set of data rather than row by row and thus will often be significantly faster. You are essentially making the query a table in the join.
Below is an example of your query re-written as a derived table. (Of course in production code you would not use select * either especially in a join, spell out the fields you need)
SELECT *
FROM Table2 t2
JOIN
(SELECT Table1.[Column] AS tmp, Table1.Table2Id
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id ) as t
ON t.Table2Id = t2.Id
WHERE t.tmp = 1
You've already got a variety of answers, some of them more useful than others. But to answer your question directly:
No, SQL Server will not allow you to reference the column alias (defined in the select list) in the predicate (the WHERE clause). I think that is sufficient to answer the question you asked.
Additional details:
(this discussion goes beyond the original question you asked.)
As you noted, there are several workarounds available.
Most problematic with the query you posted (as others have already pointed out) is that we aren't guaranteed that the subquery in the SELECT list returns only one row. If it does return more than one row, SQL Server will throw a "too many rows" exception:
Subquery returned more than 1 value.
This is not permitted when the subquery
follows =, !=, <, <= , >, >= or when the
subquery is used as an expression.
For the following discussion, I'm going to assume that issue is already sufficiently addressed.
Sometimes, the easiest way to make the alias available in the predicate is to use an inline view.
SELECT v.*
FROM ( SELECT *
, (SELECT Table1.Column
FROM Table1
JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
) as tmp
FROM Table2
) v
WHERE v.tmp = 1
Note that SQL Server won't push the predicate for the outer query (WHERE v.tmp = 1) into the subquery in the inline view. So you need to push that in yourself, by including the WHERE Table1.Column = 1 predicate in the subquery, particularly if you're depending on that to make the subquery return only one value.
That's just one approach to working around the problem; there are others. I suspect that the query plan for this SQL Server query is not going to be optimal. For performance, you probably want to go with a JOIN or an EXISTS predicate.
NOTE: I'm not an expert on using MySQL. I'm not all that familiar with MySQL support for subqueries. I do know (from painful experience) that subqueries weren't supported in MySQL 3.23, which made migrating an application from Oracle 8 to MySQL 3.23 particularly painful.
Oh and btw... of no interest to anyone in particular, the Teradata DBMS engine DOES have an extension that allows for the NAMED keyword in place of the AS keyword, and a NAMED expression CAN be referenced elsewhere in the QUERY, including the WHERE clause, the GROUP BY clause and the ORDER BY clause. Shuh-weeeet
That kind of syntax is basically valid (you need to move the where tmp=... to an outer "select * from (....)", though), although it's ambiguous since you have two sets named "Table2" - you should probably define aliases on at least one of your usages of that table to clear up the ambiguity.
Unless you intended that to return a column from table1 corresponding to columns in table2 ... in which case you might have wanted to simply join the tables?