I am writing a query for one of my applications that I support, and I am looking for what will cost less in trying to get a DB2 version of an OR statement in the join. Since you can't use "OR" inside on a
Select *
FROM
TABLE A INNER JOIN TABLE B
ON
A.USERNAME = B.NAME
OR
A.USERNAME = B.USERNAME
I have been looking at trying UNION or UNION ALL, but I am worried that the match up may take longer to return.
While I join others in confirming that OR can easily be used in join conditions, like in any other conditions (and so your query is perfectly fine), you could still do without OR in this particular case.
Conditions like column = value1 OR column = value2, where the same column is tested against a number of various fixed values, could be transformed into column IN (value1, value2). The latter should work no worse than the former and arguably looks clearer in the end.
So:
SELECT *
FROM
TABLE A
INNER JOIN
TABLE B
ON
A.USERNAME IN (B.NAME, B.USERNAME)
You can revert to using the WHERE clause syntax for joining. If the DB2 optimizer is clever (and I suspect it is), it should perform equally as well as JOIN:
SELECT * FROM TABLE1 A, TABLE2 B
WHERE A.USERNAME = B.NAME OR A.USERNAME=B.USERNAME
SELECT GROUP_REDUCTION.TOUR, FROM_DATE, TO_DATE, DISCOUNT, GROUP_SIZE, REDUCTION
FROM SEASON_DISCOUNT
RIGHT JOIN GROUP_REDUCTION ON SEASON_DISCOUNT.TOUR = GROUP_REDUCTION.TOUR
ORDER BY GROUP_REDUCTION.TOUR, FROM_DATE;
Related
Using Snowflake,have 2 tables, one with many columns and the other with a few, trying to select * on their join, get the following error:
SQL compilation error:duplicate column name
which makes sense because my joining columns are in both tables, could probably use select with columns names instead of *, but is there a way I could avoid that? or at least have the query infer the columns names dynamically from any table it gets?
I am quite sure snowflake will let you choose all from both halves of two+ tables via
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
what you will not be able to do is refer to the named of the columns in GROUP BY indirectly, thus this will not work
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY x
even though some databases know because you have JOIN ON a.x = b.x there is only one x, snowflake will not allow it (well it didn't last time I tried this)
but you can with the above use the alias name or the output column position thus both the following will work.
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY a.x
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY 1 -- assuming x is the first column
in general the * and a.* forms are super convenient, but are actually bad for performance.
when selecting you are now are risk of getting the columns back in a different order if the table has been recreated, thus making reading code unstable. Which also impacts VIEWs.
It also means all meta data for the table need to be loaded to know what the complete form of the data will be in. Where if you want x,y,z only and later a w was added to the table, the whole query plan can be compiled faster.
Lastly if you are selecting SELECT * FROM table in a sub-select and only a sub-set of those columns are needed the execution compiler doesn't need to prune these. And if all variables are attached to a correctly aliased table, if later a second table adds the same named column, naked columns are not later ambiguous. Which will only occur when that SQL is run, which might be an "annual report" which doesn't happen that often. wow, what a long use alias rant.
You can prefix the name of the column with the name of the table:
select table_a.id, table_b.name from table_a join table_b using (id)
The same works in combination with *:
select table_a.id, table_b.* from table_a join table_b using (id)
It works in "join" and "where" parts of the statement as well
select table_a.id, table_b.* from table_a join table_b
on table_a.id = table_b.id where table_b.name LIKE 'b%'
You can use table aliases to make the statement sorter:
select a.id, b.* from table_a a join table_b b
on a.id = b.id
Aliases could be applies on fields to use in subqueries, client software and (depending on the SQL server) in the other parts of the statements, for example 'order by':
select a.id as a_id, b.* from table_a a join table_b b
on a.id = b.id order by a_id
If you're after a result that includes all the distinct non-join columns from each table in the join with the join columns included in the output only once (given they will be identical for an inner-join) you can use NATURAL JOIN.
e.g.
select * from d1 natural inner join d2 order by id;
See examples: https://docs.snowflake.com/en/sql-reference/constructs/join.html#examples
I have a long query with a with structure. At the end of it, I'd like to output two tables. Is this possible?
(The tables and queries are in snowflake SQL by the way.)
The code looks like this:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
..... many more alias tables and subqueries here .....
)
select * from table_g where z = 3 ;
But for the very last row, I'd like to query table_g twice, once with z = 3 and once with another condition, so I get two tables as the result. Is there a way of doing that (ending with two queries rather than just one) or do I have to re-run the whole code for each table I want as output?
One query = One result set. That's just the way that RDBMS's work.
A CTE (WITH statement) is just syntactic sugar for a subquery.
For instance, a query similar to yours:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
select id,
product_c
from x.z ),
select *
from table_a
inner join table_b on table_a.id = table_b.id
inner join table_c on table_b.id = table_c.id;
Is 100% identical to:
select *
from
(select id, product_a from x.x) table_a
inner join (select id, product_b from x.y) table_b
on table_a.id = table_b.id
inner join (select id, product_c from x.z) table_c
on table_b.id = table_c.id
The CTE version doesn't give you any extra features that aren't available in the non-cte version (with the exception of a recursive cte) and the execution path will be 100% the same (EDIT: Please see Simon's answer and comment below where he notes that Snowflake may materialize the derived table defined by the CTE so that it only has to perform that step once should the CTE be referenced multiple times in the main query). As such there is still no way to get a second result set from the single query.
While they are the same syntactically, they don't have the same performance plan.
The first case can be when one of the stages in the CTE is expensive, and is reused via other CTE's or join to many times, under Snowflake, use them as a CTE I have witness it running the "expensive" part only a single time, which can be good so for example like this.
WITH expensive_select AS (
SELECT a.a, b.b, c.c
FROM table_a AS a
JOIN table_b AS b
JOIN table_c AS c
WHERE complex_filters
), do_some_thing_with_results AS (
SELECT stuff
FROM expensive_select
WHERE filters_1
), do_some_agregation AS (
SELECT a, SUM(b) as sum_b
FROM expensive_select
WHERE filters_2
)
SELECT a.a
,a.b
,b.stuff
,c.sum_b
FROM expensive_select AS a
LEFT JOIN do_some_thing_with_results AS b ON a.a = b.a
LEFT JOIN do_some_agregation AS c ON a.a = b.a;
This was originally unrolled, and the expensive part was some VIEWS that the date range filter that was applied at the top level were not getting pushed down (due to window functions) so resulted in full table scans, multiple times. Where pushing them into the CTE the cost was paid once. (In our case putting date range filters in the CTE made Snowflake notice the filters and push them down into the view, and things can change, a few weeks later the original code ran as good as the modified, so they "fixed" something)
In other cases, like this the different paths that used the CTE use smaller sub-sets of the results, so using the CTE reduced the remote IO so improved performance, there then was more stalls in the execution plan.
I also use CTEs like this to make the code easier to read, but giving the CTE a meaningful name, but the aliasing it to something short, for use. Really love that.
I am been asked to do performance tuning of a SQL Server query which has so many joins in it.
For example
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
There are almost 25 columns present in vw_Billing_Cenus_R but we wanted to use only 3 of them. So I wanted to know instead of selecting all the columns from the view or table, if I only select those columns which are required and then perform join like this
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
So Will this improve the performance or not?
The important part is the columns you are actually using on the outmost SELECT, not the ones to are selecting to join. The SQL Server engine is smart enough to realize that he does not need to retrieve all columns from the referenced table (or view) if he doesn't need them.
So the following 2 queries should yield the exact same query execution plan:
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
The difference would be if you actually use the selected column (in a conditional where or actually retrieving the value), as in here:
SELECT
A.SomeColumn,
X.* -- * has all X columns
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn,
X.* -- * has only X's SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
I would rather use this approach:
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
than this
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
Since in this case:
you make your query simpler,
you does not have to rely on query optimizer smartness and expect that it will eliminate unnecessary columns and rows
finally, you can select as many columns in the outer SELECT as necessary without using derived tables techniques.
In some cases, derived tables are welcome, when you want to eliminate duplicates in a table you want to join on a fly, but, imho, not in your case.
It depends on how many records are stored, but generally it will improve performance.
In this case read #LukStorms ' comments, I think he is right
Joins are usually used to fetch data from 2 tables using a common factor from either tables
Is it possible to use a join statement using a table and results of another SQL statement and if it is what is the syntax
Sure, this is called a derived table
such as:
select a.column, b.column
from
table1 a
join (select statement) b
on b.column = a.column
keep in mind that it will run the select for the derived table in entirety, so it can be helpful if you only select things you need.
EDIT: I've found that I rarely need to use this technique unless I am joining on some aggregated queries.... so I would carefully consider your design here.
For example, thus far most demonstrations in this thread have not required the use of a derived table.
It depends on what the other statement is, but one of the techniques you can use is common table expressions - this may not be available on your particular SQL platform.
In the case of SQL Server, if the other statement is a stored procedure, you may have to insert the results into a temporary table and join to that.
It's also possible in SQL Server (and some other platforms) to have table-valued functions which can be joined just like a view or table.
select *
from TableA a
inner join (select x from TableB) b
on a.x = b.x
Select c.CustomerCode, c.CustomerName, sq.AccountBalance
From Customers c
Join (
Select CustomerCode, AccountBalance
From Balances
)sq on c.CustomerCode = sq.CustomerCode
Sure, as an example:
SELECT *
FROM Employees E
INNER JOIN
(
SELECT EmployeeID, COUNT(EmployeeID) as ComplaintCount
FROM Complaints
GROUP BY EmployeeID
) C ON E.EmployeeID = C.EmployeeID
WHERE C.ComplaintCount > 3
It is. But what specifically are you looking to do?
That can be done with either a sub-select, a view or a temp table... More information would help us answer this question better, including which SQL software, and an example of what you'd like to do.
Try this:
SELECT T1.col1, t2.col2 FROM Table1 t1 INNER JOIN
(SELECT col1, col2, col3 FROM Table 2) t2 ON t1.col1 = t2.col1
I'm creating a really complex dynamic sql, it's got to return one row per user, but now I have to join against a one to many table. I do an outer join to make sure I get at least one row back (and can check for null to see if there's data in that table) but I have to make sure I only get one row back from this outer join part if there's multiple rows in this second table for this user.
So far I've come up with this: (sybase)
SELECT a.user_id
FROM table1 a
,table2 b
WHERE a.user_id = b.user_id
AND a.sub_id = (
SELECT min(c.sub_id)
FROM table2 c
WHERE b.sub_id = c.sub_id
)
The subquery finds the min value in the one to many table for that particular user.
This works but I fear nastiness from doing correlated subqueries when table 1 and 2 get very large.
Is there a better way? I'm trying to dream up a way to get joins to do it, but I'm not seeing it.
Also saying "where rowcount=1" or "top 1" doesn't help me, because I'm not trying to fix the above query, I'm ADDING the above to an already complex query.
In MySql you can ensure that any query returns at most X rows using
select *
from foo
where bar = 1
limit X;
Unfortunately, I'm fairly sure this is a MySQL-specific extension to SQL. However, a Google search for something like "mysql sybase limit" might turn up an equivalent for Sybase.
A few quick points:
You need to have definitive business rules. If the query returns more than one row then you need to think about why (beyond just "it's a 1:many relationship - WHY is it a 1:many relationship?). You should come up with the business solution rather than just use "min" because it gives you 1 row. The business solution might simply be "take the first one", in which case min might be the answer, but you need to make sure that's a conscious decision.
You should really try to use the ANSI syntax for joins. Not just because it's standard, but because the syntax that you have isn't really doing what you think it's doing (it's not an outer join) and some things are simply impossible to do with the syntax that you have.
Assuming that you end up using the MIN solution, here's one possible solution without the subquery. You should test it with various other solutions to make sure that they are equivalent in outcome and to see which performs the best.
SELECT
a.user_id, b.*
FROM
dbo.Table_1 a
LEFT OUTER JOIN dbo.Table_2 b ON b.user_id = a.user_id AND b.sub_id = a.sub_id
LEFT OUTER JOIN dbo.Table_2 c ON c.user_id = a.user_id AND c.sub_id < b.sub_id
WHERE
c.user_id IS NULL
You'll need to test this to see if it's really giving what you want and you might need to tweak it, but the basic idea is to use the second LEFT OUTER JOIN to ensure that there are no rows that exist with a lower sub_id than the one found in the first LEFT OUTER JOIN (if any is found). You can adjust the criteria in the second LEFT OUTER JOIN depending on the final business rules.
How about:
select a.user_id
from table1 a
where exists (select null from table2 b
where a.user_id = b.user_id
)
Maybe your example is too simplified, but I'd use a group by:
SELECT
a.user_id
FROM
table1 a
LEFT OUTER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
I fear the only other way would be using nested queries:
The difference between this query and your example is a 'sub table' is only generated once, however in your example you generate a 'sub table' for each row in table1 (but may depend on the compiler, so you might want to use query analyser to check performance).
SELECT
a.user_id,
b.sub_id
FROM
table1 a
LEFT OUTER JOIN (
SELECT
user_id,
min(sub_id) as sub_id,
FROM
table2
GROUP BY
user_id
) b ON (a.user_id = b.user_id)
Also, if your query is getting quite complex I'd use temporary tables to simplify the code, it might cost a little more in processing time, but will make your queries much easier to maintain.
A Temp Table example would be:
SELECT
user_id
INTO
#table1
FROM
table1
WHERE
.....
SELECT
a.user_id,
min(b.sub_id) as sub_id,
INTO
#table2
FROM
#table1 a
INNER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
SELECT
a.*,
b.sub_id
from
#table1 a
LEFT OUTER JOIN #table2 b ON (a.user_id = b.user_id)
First of all, I believe the query you are trying to write as your example is:
select a.user_id
from table1 a, table2 b
where a.user_id = b.user_id
and b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Except you wanted an outer join (which I think someone edited out the Oracle syntax).
select a.user_id
from table1 a
left outer join table2 b on a.user_id = b.user_id
where b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Well, you already have a query that works. If you are concerned about the speed you could
Add a field to table2 which
identifies which sub_id is the
'first one' or
Keep track of table2's primary key in table1, or in another table