I'm not quite sure how to describe this, and I'm not quite sure whether it's just syntactic sugar. This is the first time I've seen it, and I'm having trouble finding a reference or explanation of the why and what of it.
I have a query as follows:
select * from
table1
join table2 on field1 = field2
join (
table3
join table4 on field3 = field4
join table5 on field5 = field6
) on field3 = field2
-- notice the fields in the parens and outside the parens
-- are part of the on clause
Are the parentheses necessary? Will removing them change the join order? I'm in a SQL Server 2005 environment in this case. Thanks!
Join order should make no difference in the result set of a query using inner joins (apart from column order). The query
select *
from t1
join t2 on t2.t1_id = t1.id
produces the same result set as
select *
from t2
join t1 on t1.id = t2.t1_id
If you're using outer joins and change the order of the tables in the from clause, naturally the direction of the outer join must change:
select *
from t1
left join t2 on t2.t1_id = t1.id
is the same as
select *
from t2
right join t1 on t1.id = t2.t1_id
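To make the point about join order concrete, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server, and the sample rows are made up. It checks that swapping the tables in an inner join leaves the rows unchanged, while swapping them in a left join without flipping the direction does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, label TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');   -- t1 row 3 has no t2 rows
    INSERT INTO t2 VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

def rows(sql):
    # Compare as sets so that row order does not matter.
    return set(conn.execute(sql).fetchall())

# Inner join: table order is irrelevant.
inner_a = rows("SELECT t1.id, t2.id FROM t1 JOIN t2 ON t2.t1_id = t1.id")
inner_b = rows("SELECT t1.id, t2.id FROM t2 JOIN t1 ON t1.id = t2.t1_id")

# Left join: swapping the tables while keeping LEFT changes which side is preserved.
left_a = rows("SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 ON t2.t1_id = t1.id")
left_b = rows("SELECT t1.id, t2.id FROM t2 LEFT JOIN t1 ON t1.id = t2.t1_id")

print(inner_a == inner_b)  # same rows either way
print(left_a == left_b)    # differs: only left_a keeps the unmatched t1 row
```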
However, you may also see a subquery used as a table, with syntax like
select *
from t1
join ( select t2.*
from t2
join t3 on t3.t2_id = t2.id
where t3.foobar = 37
) x on x.t1_id = t1.id
Note the table alias (x) assigned to the subquery above.
That construct is called a derived table (though some people call it a virtual table). You can think of it as a temporary view that exists for the life of a query. It's particularly useful when you need to filter on something like the result of an aggregation (group by).
The T-SQL documentation for SELECT, under the FROM clause, goes into the details:
http://msdn.microsoft.com/en-us/library/ms189499(v=SQL.100).aspx
http://msdn.microsoft.com/en-us/library/ms177634(v=sql.100).aspx
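As an illustration of filtering on an aggregate through a derived table, here is a small sketch. The customer and orders tables, their columns, and the threshold of 100 are all hypothetical, and it runs on SQLite via Python's sqlite3 rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 70), (3, 2, 20);
""")

# The parenthesized SELECT is the derived table; x is its required alias.
# The outer WHERE filters on the aggregate computed inside it.
big_spenders = conn.execute("""
    SELECT c.name, x.total
    FROM customer c
    JOIN ( SELECT o.customer_id, SUM(o.amount) AS total
           FROM orders o
           GROUP BY o.customer_id
         ) x ON x.customer_id = c.id
    WHERE x.total > 100
    ORDER BY c.name
""").fetchall()

print(big_spenders)
```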
It's not necessary in this case.
It's necessary (or at the very least, a lot simpler) in some others, especially where you name the nested call:
select table1.fieldX, table2.fieldY, sq.field6 from
table1 join table2 on field1 = field2
join ( select
top 1 table3.field6
from table3 join table4
on field3 = field4
where table3.field7 = table2.field8
order by fieldGoshIveUsedALotOfFieldsAlready
) sq on sq.field6 = field12345
The code you had could have been:
Written like the above once, and then refactored.
Machine-generated.
A reflection of the developer's thought process: he or she thought of that part of the larger query as a unit, then worked it into the larger query.
In this case they are not necessary:
select * from table1
join table2 on field1 = field2
join table3 on field3 = field2
join table4 on field3 = field4
join table5 on field5 = field6
Produces the same result.
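The answers above cover inner joins, where the grouping does not affect the result. With outer joins the parentheses can matter. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 (not SQL Server 2005), and the tables t1, t2, t3 and their columns are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER PRIMARY KEY);
    CREATE TABLE t2 (b INTEGER PRIMARY KEY, a_ref INTEGER);
    CREATE TABLE t3 (c INTEGER PRIMARY KEY, b_ref INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (10, 1), (20, 2);
    INSERT INTO t3 VALUES (100, 10);  -- nothing in t3 points at t2 row 20
""")

def rows(sql):
    # Compare as sets so that row order does not matter.
    return set(conn.execute(sql).fetchall())

# Inner joins: flat and parenthesized forms return the same rows.
inner_flat = rows(
    "SELECT t1.a, t3.c FROM t1 "
    "JOIN t2 ON t2.a_ref = t1.a JOIN t3 ON t3.b_ref = t2.b")
inner_nested = rows(
    "SELECT t1.a, t3.c FROM t1 "
    "JOIN (t2 JOIN t3 ON t3.b_ref = t2.b) ON t2.a_ref = t1.a")

# Left join: the flat form inner-joins t3 afterwards, which drops t1 row 2.
left_flat = rows(
    "SELECT t1.a, t3.c FROM t1 "
    "LEFT JOIN t2 ON t2.a_ref = t1.a JOIN t3 ON t3.b_ref = t2.b")
# The nested form joins t2 and t3 first, then left-joins that unit to t1.
left_nested = rows(
    "SELECT t1.a, t3.c FROM t1 "
    "LEFT JOIN (t2 JOIN t3 ON t3.b_ref = t2.b) ON t2.a_ref = t1.a")

print(inner_flat == inner_nested)
print(left_flat == left_nested)
```

In this sketch the nested left join keeps t1 row 2 with a NULL for t3.c, while the flat version loses it.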
Initially, I have a query like below, doing a join on 1=1. (It's simply doing a cross join, which selects all rows from the first table and all rows from the second table and shows as a cartesian product, i.e. with all possibilities.)
SELECT * FROM Table1 t1
JOIN Table2 t2 ON 1=1
Problem: optimize this query so that it shows only the records for a particular ID, and if we don't have an ID (or the ID is NULL) it shows the same result as before (1=1). So I wrote the script below.
Declare @T2id as int;
Set @T2id = 123;
SELECT * FROM Table1 t1
JOIN Table2 t2 ON
-- left side of join on statement
CASE
WHEN @T2id Is NULL
THEN 1
ELSE
t2.Id
END
=
-- right side of join on statement
CASE
WHEN @T2id Is NULL
THEN 1
ELSE
@T2id
END
Can anyone confirm whether this is good, or is there a better approach?
I think your way of presenting a cross-join is something I haven't seen before.
My view is it's simpler to read and understand if you just:
SELECT *
FROM Table1 t1, Table2 t2
As for the question, assuming SQL Server (you didn't tag the RDBMS, but I guess from your variable declaration) you might consider:
IF @T2id IS NULL
SELECT *
FROM Table1 t1, Table2 t2;
ELSE
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.id = t2.id
WHERE t2.id = @T2id;
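For comparison, here is a sketch that runs the CASE-based ON clause from the question next to a plainer "parameter is NULL or the IDs match" predicate. The OR form is a commonly seen alternative and is not taken from the posts above. The sketch assumes SQLite via Python's sqlite3, uses a named parameter :t2id in place of the T-SQL variable, and the sample IDs are made up. It only checks that the two forms return the same rows, not which one an optimizer handles better.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES (123), (456);
""")

# The question's approach: both sides collapse to 1 when the parameter is NULL.
CASE_JOIN = """
    SELECT t1.id, t2.id
    FROM Table1 t1
    JOIN Table2 t2 ON
        CASE WHEN :t2id IS NULL THEN 1 ELSE t2.id END
      = CASE WHEN :t2id IS NULL THEN 1 ELSE :t2id END
"""

# Alternative (an assumption, not from the thread): cross join plus an optional filter.
OR_JOIN = """
    SELECT t1.id, t2.id
    FROM Table1 t1
    CROSS JOIN Table2 t2
    WHERE :t2id IS NULL OR t2.id = :t2id
"""

def run(sql, t2id):
    # Python None is bound as SQL NULL.
    return sorted(conn.execute(sql, {"t2id": t2id}).fetchall())

all_pairs = run(CASE_JOIN, None)   # full cartesian product
filtered = run(CASE_JOIN, 123)     # only pairs whose Table2 id is 123
print(len(all_pairs), len(filtered))
print(run(OR_JOIN, None) == all_pairs, run(OR_JOIN, 123) == filtered)
```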
I noticed today that this query
select * from table1 table2 where column_from_table1 = ?;
works. It works the same as (same columns return)
select * from table1 where column_from_table1 = ?;
Shouldn't the former be a syntax error? What is it interpreting table2 as?
It appears to be interpreting it as an alias that renames the table. Even though table2 exists, it happily allows the rename. This also works:
select * from table1 asdf where asdf.column_from_table1 = ?;
select * from table1 table2 where column_from_table1 = ?;
table2 is working as a table alias for table1. It's not being used as the name of an object in the database at all. The fact that a table named table2 exists is wholly irrelevant to this query. Usually you'd see something like this:
select t.id, t.name from table1 t where t.column_from_table1 = ?;
Some RDBMSs require the as keyword, so you'll also see this:
SELECT t.id, t.name FROM table1 AS t WHERE t.column_from_table1 = ?;
Table aliases are useful for making queries with multiple tables easier to write, especially if they have shared column names which need to be qualified. They're also essential for self-joins where a table is joined to itself.
Example of a join using aliases:
SELECT t1.Id,
t1.Name as t1_Name,
t2.Name as t2_Name
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.id
WHERE t1.column_from_table1 = ?;
Or, for a self-join to look for duplicate Name values, for example:
SELECT t1.Name,
t1.Id,
t2.Id as Dupe_Id
FROM table1 t1
JOIN table1 t2
ON t1.Name = t2.Name
WHERE t1.Id < t2.Id;
Notice that this query is referring to table1 twice and uses the aliases of t1 and t2 to differentiate which it's referring to.
Note that a comma join, such as FROM table1, table2 WHERE table1.id = table2.id is very old syntax that should be explicitly avoided when writing queries. The older syntax is difficult to read and maintain and doesn't support outer joins except by vendor-specific extensions. The newer syntax with the JOIN keyword was introduced in standard SQL in 1992. There's no reason to still be using comma joins.
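Both alias behaviours described above can be checked with a small sketch. It assumes SQLite through Python's sqlite3, and the column names and rows are invented for the demonstration. The first query uses table2 purely as an alias for table1 even though a real table2 exists. The second is the duplicate-finding self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE table2 (Id INTEGER PRIMARY KEY, Other TEXT);
    INSERT INTO table1 VALUES (1, 'ann'), (2, 'bob'), (3, 'ann');
    INSERT INTO table2 VALUES (1, 'unrelated');
""")

# "table2" here is only an alias for table1; the real table2 is never read.
cur = conn.execute(
    "SELECT * FROM table1 table2 WHERE table2.Name = 'ann' ORDER BY table2.Id")
alias_columns = [d[0] for d in cur.description]
alias_rows = cur.fetchall()

# Self-join: the same table appears twice, told apart by the aliases t1 and t2.
dupes = conn.execute("""
    SELECT t1.Name, t1.Id, t2.Id AS Dupe_Id
    FROM table1 t1
    JOIN table1 t2 ON t1.Name = t2.Name
    WHERE t1.Id < t2.Id
""").fetchall()

print(alias_columns, alias_rows)
print(dupes)
```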
I have three tables and I have to write one query to update table 1 row from table 3 and the only matching columns I have is in table 2.
Table 1 which has incorrect data:
Table 3 has the correct data:
I tried to write a query and execute it, but it gives me an error saying there are too many rows to select. That is true, I do have many rows to correct, but it still doesn't correct them. What do you think I should do? This is my query so far.
UPDATE Table1
SET Table1.Number = (SELECT Table3.Number
FROM Table2
FULL OUTER JOIN Table1 ON Table1.ID = Table2.ID
FULL OUTER JOIN Table3 ON Table3.Signin = Table2.Signin
WHERE (Table2.ID = Table1.ID)
AND (Table1.Number = 'xxx'))
WHERE (Table1.Number = 'xxx')
The WHERE clause of the joined subquery needs to change, because its current condition produces multiple records. Try filtering on Table3 columns instead of Table1 columns in the subquery's WHERE clause.
UPDATE Table1
SET Table1.NUMBER = (SELECT table3.NUMBER FROM Table1 FULL OUTER JOIN Table2
ON Table1.ID = Table2.ID
FULL OUTER JOIN Table3
ON Table2.SIGNIN = Table3.SIGNIN
WHERE Table3.SIGNIN = 100) -- This is the point where you need to modify your code
WHERE Table1.ID = 1;
It actually worked after I removed this line from my query.
FULL OUTER JOIN Table1 ON table1.ID = Table2.ID
Thanks for the help.
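For reference, this is roughly what the query looks like once that join is removed, so the subquery is correlated to the row being updated instead of joining Table1 a second time. It is a sketch only: it assumes SQLite through Python's sqlite3, the sample rows are made up, and Table2 is assumed to hold exactly one Signin per ID. With more than one match per ID, SQL Server would raise an error for the subquery, whereas SQLite would quietly use one of the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Number TEXT);
    CREATE TABLE Table2 (ID INTEGER, Signin INTEGER);
    CREATE TABLE Table3 (Signin INTEGER PRIMARY KEY, Number TEXT);
    INSERT INTO Table1 VALUES (1, 'xxx'), (2, '555'), (3, 'xxx');
    INSERT INTO Table2 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Table3 VALUES (100, '111'), (200, '555'), (300, '333');
""")

# The subquery refers to Table1.ID of the row being updated (a correlated
# subquery), so it yields one value per updated row in this data set.
conn.execute("""
    UPDATE Table1
    SET Number = (SELECT t3.Number
                  FROM Table2 t2
                  JOIN Table3 t3 ON t3.Signin = t2.Signin
                  WHERE t2.ID = Table1.ID)
    WHERE Number = 'xxx'
""")

updated = conn.execute("SELECT ID, Number FROM Table1 ORDER BY ID").fetchall()
print(updated)
```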
You are fairly close. When doing the update, though, unless you want to clear t1.number when no matching record exists in t3, you will want to use INNER JOIN. A FULL OUTER JOIN would mean trying to update rows in t1 that don't exist, while with a LEFT JOIN you would set t1.number to NULL whenever no matching record exists in t3.
UPDATE t1
SET t1.Number = t3.Number
FROM
Table1 t1
INNER JOIN Table2 t2
ON t1.Id = t2.Id
INNER JOIN Table3 t3
ON t2.Signin = t3.Signin
WHERE
t1.number <> t3.number
--Or if you have nulls something like
--ISNULL(t1.number,'xxx') <> ISNULL(t3.number,'xxx')
-- if you only want to update when t1.number = 'xxx' then
--t1.number = 'xxx'
t1, t2 and t3 are table aliases that I created by adding the alias after the table name. By using join syntax rather than a subselect you simplify your where conditions. In SQL Server, if more than one record in t2 and t3 matches (a one-to-many relationship), an arbitrary one of the matching rows is used. If you want a specific record when the relationship is not one-to-one, you can use window functions and common table expressions (CTEs) to limit t3 to the exact record you want to use.
For example1:
select T1.*, T2.*
from TABLE1 T1, TABLE2 T2
where T1.id = T2.id
and T1.name = 'foo'
and T2.name = 'bar';
Will that first join T1 and T2 together by id, then select the records that satisfy the name conditions?
Or will it select the records that satisfy the name conditions in T1 and T2, then join those together?
And is there a difference in performance between example1 and example2 (DB2)?
example2:
select *
from
(
select * from TABLE1 T1 where T1.name = 'foo'
) A,
(
select * from TABLE2 T2 where T2.name = 'bar'
) B
where A.id = B.id;
How the query will be executed depends on what the query planner does with it. Depending on the available indexes and how much data is in the tables the query plan may look different. The planner tries to do the work in the order that it thinks is most efficient.
If the planner does a good job, the plan for both queries should be the same; otherwise the first query is likely to be faster, because the second would create two intermediate results that don't have any indexes.
Example 1 is more efficient because it has no embedded queries. As for how the result set is built, I have no idea; I don't know DB2.
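One way to explore this is to run both forms, compare the rows, and ask the engine for its plan. The sketch below assumes SQLite through Python's sqlite3, not DB2, so the printed plans say nothing about what DB2 would choose. The sample rows are made up, and the queries select only the id columns to keep the comparison simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TABLE2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO TABLE1 VALUES (1, 'foo'), (2, 'foo'), (3, 'baz');
    INSERT INTO TABLE2 VALUES (1, 'bar'), (2, 'qux'), (3, 'bar');
""")

EXAMPLE1 = """
    select T1.id, T2.id
    from TABLE1 T1, TABLE2 T2
    where T1.id = T2.id
      and T1.name = 'foo'
      and T2.name = 'bar'
    order by T1.id
"""

EXAMPLE2 = """
    select A.id, B.id
    from ( select * from TABLE1 T1 where T1.name = 'foo' ) A,
         ( select * from TABLE2 T2 where T2.name = 'bar' ) B
    where A.id = B.id
    order by A.id
"""

result1 = conn.execute(EXAMPLE1).fetchall()
result2 = conn.execute(EXAMPLE2).fetchall()
print(result1 == result2)

# The plan text is engine- and version-specific, so it is printed, not checked.
for sql in (EXAMPLE1, EXAMPLE2):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(step)
```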
I am trying to rewrite this block with simpler logic if this can be done. I am using it within a larger SELECT statement and I think IF I can simplify this block, I might be able to improve performance of my query.
proj_catg_type_id, proj_catg_id and proj_id are all PKs in their tables.
select t1.proj_catg_name
from table1 t1, table2 t2, table3 t3
where t2.proj_catg_type_id = t1.proj_catg_type_id
and t2.proj_catg_type_id = 213
and t3.proj_id = t2.proj_id
Without knowing the referential integrity rules and the logic behind the tables, it is difficult to give a 100% correct answer. But just from looking at this statement, the most simplified logic would be
select t1.proj_catg_name
from table1 t1
where t1.proj_catg_type_id = 213;
select t1.proj_catg_name
from table1 t1 inner join table2 t2
on t2.proj_catg_type_id=t1.proj_catg_type_id
where t2.proj_catg_type_id=213
and t3.proj_id=t2.proj_i
maybe? is t3 used outside this subselect?
If t3 is a table outside the select you showed, then this is a correlated subquery, which you should not be using at all, ever! That turns your query into a row-by-agonizing-row cursor.
Use derived tables or joins to get the results.
You don't give me enough code to write a specific solution for your problem, but let me give you an example:
-- correlated subquery version
SELECT
field1
, field2
, (SELECT t3.field3
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id
WHERE t4.somefield = t2.somefield)
FROM table1 t1
JOIN table4 t4 ON t1.id = t4.id
-- derived table version
SELECT
field1
, field2
, a.field3
FROM table1 t1
JOIN table4 t4
ON t1.id = t4.id
JOIN (SELECT t3.field3, t2.somefield
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id) a
ON t4.somefield = a.somefield
The first query runs one row at a time, which is extremely slow. The second should give the same results but runs in a set-based fashion, which is much faster. It is important to make sure the derived table has an alias. You could also use a CTE.
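To check that the two shapes return the same rows, here is a runnable sketch. It assumes SQLite through Python's sqlite3 and made-up data in which every table4 row matches exactly one table2/table3 pair. The derived table exposes somefield so the outer ON clause can refer to it through the alias a. When a row has no match, the two forms diverge: the scalar subquery yields NULL, while the inner join drops the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE table4 (id INTEGER PRIMARY KEY, somefield TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, somefield TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, field3 TEXT);
    INSERT INTO table1 VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
    INSERT INTO table4 VALUES (1, 'k1'), (2, 'k2');
    INSERT INTO table2 VALUES (10, 'k1'), (20, 'k2');
    INSERT INTO table3 VALUES (10, 'v1'), (20, 'v2');
""")

# Scalar subquery in the select list, correlated on t4.somefield.
correlated = conn.execute("""
    SELECT t1.field1, t1.field2,
           (SELECT t3.field3
            FROM table2 t2
            JOIN table3 t3 ON t2.id = t3.id
            WHERE t4.somefield = t2.somefield) AS field3
    FROM table1 t1
    JOIN table4 t4 ON t1.id = t4.id
    ORDER BY t1.id
""").fetchall()

# Same lookup expressed as a join against a derived table aliased as a.
derived = conn.execute("""
    SELECT t1.field1, t1.field2, a.field3
    FROM table1 t1
    JOIN table4 t4 ON t1.id = t4.id
    JOIN (SELECT t3.field3, t2.somefield
          FROM table2 t2
          JOIN table3 t3 ON t2.id = t3.id) a
      ON t4.somefield = a.somefield
    ORDER BY t1.id
""").fetchall()

print(correlated == derived)
```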