Postgres/netezza multiple join from multiple tables - sql

Hi I have a problem when migrating from ORACLE to Netezza, netezza seems to have problem if multiple tables is declared before using JOIN`s. How could I write this join differently ?
INSERT INTO...
SELECT...
FROM table1 t1, table2 t2 //here seems to be the problem as postgres dont allow to put two tables in FROM clause if there are JOIN`s involved
JOIN talbe3 t3 ON t2.column = t3.column
JOIN table4 t4 ON t2.column = t4.column
LEFT OUTER JOIN table5 t5 ON (t4.column=t5.column AND t4.column=t2.column AND t4.column=t3.column)
WHERE....;

You simply should not mix old-style (implicit) and new-style (explicit) joins. In fact, a simple rule is simply to avoid commas in the from clause.
I imagine the problem that you have is a scoping problem for the table aliases. I know this happens in MySQL. But, because I never use commas in from clauses, I am not aware of how this affects other databases. I think the part of the from clause after the comma is parsed as a unit, and the aliases defined before are not known during this parsing stage.
In any case, whatever the problem, the simple solution is to replace the comma with CROSS JOIN:
INSERT INTO...
SELECT...
FROM table1 t1 CROSS JOIN table2 t2 //here seems to be the problem as postgres dont allow to put two tables in FROM clause if there are JOIN`s involved
JOIN table3 t3 ON t2.column = t3.column
JOIN table4 t4 ON t2.column = t4.column
LEFT OUTER JOIN table5 t5 ON (t4.column=t5.column AND t4.column=t2.column AND t4.column=t3.column)
WHERE....;
This should work in all the databases you mention -- and more.

Related

Joins with WHERE - splitting WHERE clauses

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?
Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember though that a join is almost always faster on standard JOINs compared to sub-queries, for properly indexed tables. You'll find that queries using JOIN will be orders of magnitude faster than the ones using sub-queries, except for rare cases.
You can using subqueries
SELECT ch.name,sh.name
FROM (
SELECT ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
but i think it don't have sense. subqueries will make temp table.
First query will be optimized by database server, and likely select only rows from character table that need
JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.
A SELECT can only have 1 WHERE clause.
And it comes after the JOIN's.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criteria that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example the queries below would return the same results
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar'
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar'
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo'

what operation does "select from table1, table2 " imply? [duplicate]

This question already has answers here:
Select from Table1, Table2
(3 answers)
Closed 4 years ago.
I know different joins, but I wanted to know which of them is being used when we run queries like this:
select * from table1 t1, table2 t2
is it full outer join or natural join for example?
Also does it have a unique meaning among different databases or all do the same?
UPDATE: what if we add where clause ? will it be always inner join?
The comma in the from clause -- by itself -- is equivalent to cross join in almost all databases. So:
from table1 t1, table2 t2
is functionally equivalent to:
from table1 t1 cross join table2 t2
They are not exactly equivalent, because the scoping rules within the from clause are slightly different. So:
from table1 t1, table2 t2 join
table3 t3
on t1.x = t3.x
generates an error, whereas the equivalent query with cross join works.
In general, conditions in the WHERE clause will always result in the INNER JOIN. However, some databases have extended the syntax to support outer joins in the WHERE clause.
I can think of one exception where the comma does not mean CROSS JOIN. Google's BigQuery originally used the comma for UNION ALL. However, that is only in Legacy SQL and they have removed that in Standard SQL.
Commas in the FROM clause have been out of fashion since the 1900s. They are the "original" form of joining tables in SQL, but explicit JOIN syntax is much better.
To me, they also mean someone who learned SQL decades ago and refused to learn about outer joins, or someone who has learned SQL from ancient materials -- and doesn't know a lot of other things that SQL does.
demo: db<>fiddle
This is a CROSS JOIN (cartesian product). So both of the following queries are equal
SELECT * FROM table1, table2 -- implicit CROSS JOIN
SELECT * FROM table1 CROSS JOIN table1 -- explicit CROSS JOIN
concerning UPDATE
A WHERE clause makes the general CROSS JOIN to an INNER JOIN. An INNER JOIN can be got by three ways:
SELECT * FROM table1, table2 WHERE table1.id = table2.id -- implicit CROSS JOIN notation
SELECT * FROM table1 CROSS JOIN table2 WHERE table1.id = table2.id -- really unusual!: explicit CROSS JOIN notation
SELECT * FROM table1 INNER JOIN table2 ON (table1.id = table2.id) -- explicit INNER JOIN NOTATION
Further reading (wikipedia)

Can the SQL JOIN 'USING()' clause be used with one or more 'AND ON()' clauses for a single join..?

In a DB2-400 SQL join, can the USING() clause be used with one or more AND ON clauses for a single join..? This is for a situation where some field names are the same, but not all, so USING() would only apply to part of the join.
I could have sworn I've done this before and it worked, but now it eludes me.
I've tried various combinations as shown below, but none of them work. Perhaps I'm simply mistaken and it's not possible:
SELECT * FROM T1 INNER JOIN T2 USING (COL1,COL2) AND ON (T1.COL3=T2.COL4)
SELECT * FROM T1 INNER JOIN T2 ON (T1.COL3=T2.COL4) AND USING (COL1,COL2)
SELECT * FROM T1 INNER JOIN T2 ON (T1.COL3=T2.COL4), USING (COL1,COL2)
SELECT * FROM T1 INNER JOIN T2 USING (COL1,COL2,(1.COL3=T2.COL4))
Checking the syntax diagram here https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_72/db2/rbafzjoinedt.htm
I suggest that the only options for JOIN USING are a comma separated list of columns
JOIN table-reference USING ( column-name [, column-name] ... )
and you can't mix USING with ON
You can use where:
SELECT *
FROM T1 INNER JOIN
T2 USING (COL1, COL2)
WHERE T1.COL3 = T2.COL4;
Another alternative would be to use a subquery to rename the column in one of the tables.

How to simplify to use ISNULL with LEFT OUTER JOIN to merge tables

I need to merge below Table 1 & Table 2 as indicated in "Merge".
The query I wrote below works; but it is TOO SLOW! Is there any simpler way which returns it faster?
SELECT T1.Loc, T2.SO, T2.PO, T1.Item
FROM Table1 T1
LEFT OUTER JOIN Table2 T2
ON ISNULL(T1.[SO2],T1.[SO1]) = T2.[SO]
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax:
SELECT *
FROM Table1 T1 LEFT OUTER JOIN
Table2 T2
ON COALESCE(T1.[SO2], T1.[SO1]) = T2.[SO];
There is no need to join to Table2 twice (much less getting a Cartesian product). I prefer COALESCE() to ISNULL() simply because the former is the ANSI standard function for replacing NULL values.
I should add that for performance, two joins might be preferable:
SELECT T1.Loc, COALESCE(T2_2.SO, T2_1.SO) as SO,
COALESCE(T2_2.PO, T2_1.PO) as PO, T1.Item
FROM Table1 T1 LEFT OUTER JOIN
Table2 T2_2
ON T1.[SO2] = T2_2.[SO] LEFT OUTER JOIN
Table2 T2_1
ON T1.[SO1] = T2_1.[SO] AND T1.[SO2] IS NULL;
It is easier for the SQL engine to optimize joins that use only equality and AND conditions, on the clean column references (that is, the columns are not the arguments to any function calls).
Hope this SQL full fill your requirement.
Select T2.Loc, T2.SO, T2.PO, T1.item from table 2 T2 join table1 T1
on T1.SO1 = T2.SO;

Query design for nested statements and CTEs

I have a query that sequentially joins 6 tables from their original data sources. Nested, it's a mess:
SELECT
FROM
(
SELECT
FROM
(
SELECT
FROM
(. . .)
INNER JOIN
)
INNER JOIN
)
I switched to CTE definitions, and each definition is one join on a previous definition, with the final query at the end providing the result:
WITH
Table1 (field1, field2) AS
(
SELECT
FROM
INNER JOIN
),
Table2 (field2, field3) AS
(
SELECT
FROM Table1
INNER JOIN
), . . .
SELECT
FROM Table 6
This is a lot more readable, and dependencies flow downward in logical order. However, this doesn't seem like the intended use of CTEs (and also why I'm not using Views), since each definition is really only referenced once in order.
Is there any guidance out there on how to construct sequentially nested joins like this that is both readable and logical in structure?
I don't think there is anything wrong in utilizing CTE to create temporary views.
In a larger shop, there are roles defined that separates the
responsibility of DBAs versus developers. The CREATE statement, in general, will be the victim of this bureaucracy. Hence, no view. CTE is a very good
compromise.
If the views are not really reusable anyway, keeping it with the SQL makes it more readable.
CTE is a lot more readable and intuitive than sub-queries (even with
just one level). If your subqueries are not correlated, I would just suggesting
converting all of your sub-queries to CTE.
Recursion is the "killer" app for CTE, but it doesn't mean that you shouldn't use CTE, otherwise.
The only con that I can think of is that (depending on your Database Engine) it might confuse or prevent the optimizer from doing what it's suppose to do. Optimizers are smart enough to rewrite subqueries for you.
Now, let us discuss abuse of CTE, be careful that you don't substitute your application developer knowledge for Database Engine optimization. There are a lot of smart developers (smarter that us) that designed this software, just write the query without cte or subqueries as much as possible and let the DB do the work. For example, I often see developers DISTINCT/WHERE every key in a subquery before doing their join. You may think your doing the right thing, but you're not.
With regards to your question, most people intend to solve problems and not discuss something theoretical. Hence, you get people scratching their heads on what you are after. I wouldn't say you didn't imply that in your text, but perhaps be more forceful.
May be I didn't understand the question but what's wrong with:
select * from table1 t1, table2 t2, table3 t3, table4 t4, table5 t5, table6 t6
where t1.id = t2.id and t2.id = t3.id and t3.id = t4.id
and t4.id = t5.id and t5.id = t6.id
or same using table 1 t1 INNER JOIN table2 t2 ON t1.id = t2.id ....
why didn't you just join your tables like this
select *
from Table1 as t1
inner join Table2 as t2 on t2.<column> = t1.<column>
inner join Table3 as t3 on t3.<column> = t2.<column>
inner join Table4 as t4 on t4.<column> = t3.<column>
inner join Table5 as t5 on t5.<column> = t4.<column>
inner join Table6 as t6 on t6.<column> = t5.<column>