I'm trying to write a QueryDSL expression for selecting a column value. It's a few joins away from the 'from' table:
a.b.c.get(0).field
where b may be null, but if it is not, it will have at most one record in its c collection.
What I want is something like:
new CaseBuilder()
.when(a.b.isNotNull().and(a.b.c.isNotEmpty()))
.then(a.b.c.get(0).field.stringValue())
.otherwise(Expressions.stringTemplate("''"))
This implicitly produces inner joins with b and c tables in SQL, which is not what I want because that returns no results when b is in fact null. Adding explicit left outer joins doesn't stop the implicit joins anyway. I'm pretty sure I'm not thinking about this the right way, please help me get unstuck :-)
When I added support for implicit joins to jOOQ, I researched the topic and stumbled upon this "limitation" / design in Hibernate as well. I recently reported this issue to the hibernate-dev mailing list, as I clearly think this is a bug:
http://lists.jboss.org/pipermail/hibernate-dev/2018-February/017299.html
I don't see why any projection expression should "inadvertently" produce a filter on the FROM clause. This appears to be quite contrary to the intuition that one might build when thinking in terms of relational algebra.
You can read the replies to that email, particularly this rationale by Steve Ebersole:
I was saying that I can no longer look at the
HQL/JPQL and tell what kind of SQL joins will be used for that if it is
dependent on the mapped association. This approach was mentioned earlier
in the thread. But you clarified that you mean that implicit joins ought
to just always be interpreted as an outer join.
You could plugin in your own query handling and interpret implicit joins
however you want. That is current not the most trivial task though.
I don't think this will get fixed any time soon in Hibernate. So, the workaround here is to build all of your outer joins explicitly and not use any implicit joins whenever you expect them to produce outer joins.
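To illustrate what the explicit outer joins buy you at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module. The tables a, b and c and their columns are hypothetical stand-ins for the entities in the question, and SQLite is only a convenient engine for the demo; this is not the SQL that Hibernate or QueryDSL actually generates.

```python
import sqlite3

# Hypothetical schema mirroring the a -> b -> c chain from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER, field TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO b VALUES (10), (20);
    INSERT INTO c VALUES (100, 10, 'x');
    INSERT INTO a VALUES (1, 10), (2, 20), (3, NULL);
""")

# Inner joins (what the implicit joins turn into): rows of a without b or c vanish.
inner_rows = conn.execute("""
    SELECT a.id, c.field
    FROM a
    JOIN b ON b.id = a.b_id
    JOIN c ON c.b_id = b.id
    ORDER BY a.id
""").fetchall()

# Explicit left outer joins: every row of a survives, missing values become ''.
outer_rows = conn.execute("""
    SELECT a.id, COALESCE(c.field, '')
    FROM a
    LEFT JOIN b ON b.id = a.b_id
    LEFT JOIN c ON c.b_id = b.id
    ORDER BY a.id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

Only the row of a that has both a b and a c record survives the inner joins, while the outer-join version keeps all three rows of a.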
I have a few tables in their third normal form and I need to do some cross table queries to get the information I need.
I looked at joins, but it seems like a join will create a new table. Is this the proper way to perform such queries? Or should I just do nested queries? I guess it might make sense if I have to do these queries a lot? I'm really not sure how well optimized these operations are. I'm using the Sequelize ORM and I don't see any clear solution.
It seems to me you are asking about joins vs subqueries. These are to some extent different. But let's start with a couple of points.
A join creates a new relvar, not a new table. A relvar is a variable standing in for the relation output by the join operation. It is transient (as opposed to a view which would be persistent).
Joins and subqueries are not always perfect substitutes. Sometimes you will need both.
Your query output is also a relvar.
The above being said, where possible I think joins are generally preferable. The major reason is that a SQL query that can be written using the structure below is far easier (as you master the language) to both understand and debug than most alternatives. In addition, subqueries in column lists often perform badly:
SELECT [column_list]
FROM [initial_table]
[join list]
WHERE [filters]
GROUP BY [grouping list]
HAVING [post-aggregation filters]
LIMIT [limit and offset]
If your query fits the above structure then you can usually expect that specific kinds of problems will occur in logic in specific parts of the query. On the other hand, with subqueries, you have to check these independently.
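As a rough illustration, here is the same question answered once in the join structure above and once with a subquery in the column list, run through Python's sqlite3 module. The customers and orders tables are made up for the example, and SQLite is just a stand-in engine.

```python
import sqlite3

# Hypothetical tables, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 70), (3, 2, 20);
""")

# Join form: each clause has one job, so a bug can be localized to one clause.
join_rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.total >= 20
    GROUP BY c.name
    HAVING SUM(o.total) > 30
    ORDER BY c.name
    LIMIT 10
""").fetchall()

# Subquery form: the same logic is split between an inner and an outer query,
# and each part has to be checked on its own.
subquery_rows = conn.execute("""
    SELECT name, order_total
    FROM (
        SELECT c.name AS name,
               (SELECT SUM(o.total) FROM orders o
                 WHERE o.customer_id = c.id AND o.total >= 20) AS order_total
        FROM customers c
    )
    WHERE order_total > 30
    ORDER BY name
    LIMIT 10
""").fetchall()

print(join_rows)
print(subquery_rows)
```

Both forms return the same rows here; the difference is in how many places you have to look when the result is wrong.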
I'm hoping some of the more experienced database/dwh developers or DBAs can weigh in on this one:
My team is using OBIEE as a front-end tool to drive ad-hoc reporting being done by our business units.
There is a lot of latency when generating sets that are relatively small. We are facing ~1 hour to produce ~50k records.
I looked into one of the queries that is behaving this way, and I was surprised to find that all of the tables being referenced are being cross-joined, and then filters are being applied in the WHERE clause.
So, to illustrate, the queries tend to look like this:
SELECT ...
FROM tbl1
,tbl2
,tbl3
,tbl4
WHERE tbl1.col1 = tbl2.col1
and tbl3.col2 = tbl2.col2
and tbl4.col3 = tbl3.col3
instead of like this:
SELECT ...
FROM tbl1
INNER JOIN tbl2
ON tbl1.col1 = tbl2.col1
INNER JOIN tbl3
ON tbl3.col2 = tbl2.col2
INNER JOIN tbl4
ON tbl4.col3 = tbl3.col3
Now, from what I know about the order of query operations, the FROM clause gets performed before the WHERE clause, so the first example would perform much more slowly than the latter example. Am I correct (please answer only if you know the answer in the context of Oracle DB)? Unfortunately, I don't have the admin rights to run a trace against the 2 different versions of the query.
Is there a reason to set up the query the first way, related to how the OBIEE interface works? Remember, the query is the result of a user dragging and dropping attributes into a sandbox, from a 'bank' of attributes. Selecting any combination of the attributes is supposed to generate output (if the data exists). The attributes come from many different tables. I don't have any experience in designing the mechanism that generates the SQL based on this kind of ad-hoc attribute selection, so I don't know whether the query design in the first example is required to service this kind of reporting tool.
Don't worry, historically Oracle used the first notation for inner joins but later on adopted ANSI SQL standards.
The results in terms of performance and returned recordsets are exactly the same: the implicit 'comma' joins do not cross the result sets but effectively integrate the WHERE filters. If you doubt it, run EXPLAIN PLAN FOR on both queries and you will see that the predicted plans are identical.
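If you cannot run a plan against Oracle, the same comparison can be tried on any engine you do have access to. Below is a small sketch with Python's sqlite3 module, where SQLite is only a stand-in (its EXPLAIN QUERY PLAN is not Oracle's EXPLAIN PLAN) and the tables are cut-down, made-up versions of the ones in the question. The plans are printed for you to compare rather than asserted.

```python
import sqlite3

# Made-up miniature versions of tbl1..tbl3 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (col1 INTEGER);
    CREATE TABLE tbl2 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE tbl3 (col2 INTEGER);
    INSERT INTO tbl1 VALUES (1), (2), (3);
    INSERT INTO tbl2 VALUES (1, 10), (2, 20), (4, 40);
    INSERT INTO tbl3 VALUES (10), (30);
""")

# Old-style comma joins with the join conditions in WHERE.
comma_sql = """
    SELECT tbl1.col1, tbl3.col2
    FROM tbl1, tbl2, tbl3
    WHERE tbl1.col1 = tbl2.col1
      AND tbl3.col2 = tbl2.col2
"""
# ANSI-style explicit inner joins.
ansi_sql = """
    SELECT tbl1.col1, tbl3.col2
    FROM tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
    INNER JOIN tbl3 ON tbl3.col2 = tbl2.col2
"""

comma_rows = sorted(conn.execute(comma_sql).fetchall())
ansi_rows = sorted(conn.execute(ansi_sql).fetchall())

# Print the plan text for each form so they can be compared side by side.
for label, sql in (("comma", comma_sql), ("ansi", ansi_sql)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])

print(comma_rows, ansi_rows)
```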
Expanding on this answer: you may in the future come across the analogous (+) notation used in place of ANSI outer joins. The same holds in that context.
The real issue comes when both notations (implicit and explicit joins) are mixed in the same query. That would be asking for trouble big time, but I doubt you will find such a case in OBIEE.
Those are inner joins, not cross joins; they just use the old syntax rather than the ANSI syntax you were expecting.
Most join queries contain at least one join condition, either in the FROM clause or in the WHERE clause. (Oracle Documentation)
For a simple query such as in your example the execution should be exactly the same.
Where you have set outer joins (in the business model join) you will see OBI produce a query where the inner joins are made in the WHERE clause and the outer joins are done ANSI in the FROM statement – just to make things really hard to debug!
SELECT ...
FROM tbl1
,tbl2
,tbl3 left outer join
tbl4 on tbl3.col1 = tbl4.col2
WHERE tbl1.col1 = tbl2.col1
and tbl3.col2 = tbl2.col2
and tbl4.col3 = tbl3.col3
Hi,
I've always retrieved data without joins...
but is there a benefit to one method over the other?
select * from a INNER JOIN b on a.a = b.b;
select a.*,b.* from a,b where a.a = b.b;
Thanks!
The first method using the INNER JOIN keyword is:
ANSI SQL standard
much cleaner and more expressive
Therefore, I always cringe when I see the second option used - it just bloats up your WHERE clause, and you can't really see at one glance how the tables are joined (on what fields).
Should you happen to forget one of the JOIN conditions in a long list of WHERE clause expressions, you suddenly get a messy cartesian product..... can't do that with the INNER JOIN keyword (you must express what field(s) to join on).
I'd say the biggest benefit is readability. The version with the explicitly named join types is much easier for me to comprehend.
You are using a different syntax for a JOIN, basically. As a matter of best practices, it is best to use the first syntax (explicit JOIN) because it is clearer what the intention of the query is and makes the code easier to maintain.
These are both joins. They are just two different syntactical representations for joins. The first one (using the "JOIN" keyword) is the current ANSI standard (as of 1992, I think).
In the case of inner joins only, the two different representations are functionally identical, but the latter ANSI SQL-92 standard syntax is much more readable once you get used to it, because each individual join condition is associated with the pair of intermediate resultsets being joined together. In the older representation, the join conditions are all together in the WHERE clause, along with the overall query's filter conditions, and it is not as clear which is which. This makes identifying bad join conditions (where, for example, an unintended cartesian product will be generated) much more difficult.
But more important, perhaps, is that when performing an outer join, in certain scenarios the older syntax is NOT equivalent, and in fact will generate the WRONG resultset.
You should transition to the newer syntax for all your queries.
You've always retrieved the data with joins. The second query is using old syntax, but in the background it is still a join :)
This depends on the RDBMS, but in the case of SQL Server I understand that utilizing the former syntax allows for better optimization. This is less of a SQL question and more of a vendor-specific question.
You can also use the EXPLAIN (SQL Server: Query Execution Plan) type functions to help you understand if there is a difference. Each query is unique and I imagine that the stored statistics can (and will) alter the behavior.
It seems like to combine two or more tables, we can either use join or where. What are the advantages of one over the other?
Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:
List the tables involved in a comma separated list in the FROM clause
Write the association between the tables in the WHERE clause
SELECT *
FROM TABLE_A a,
TABLE_B b
WHERE a.id = b.id
Here's the query re-written using ANSI-92 JOIN syntax:
SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
From a Performance Perspective:
Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:
Ability to control JOIN order - the order which tables are scanned
Ability to apply filter criteria on a table prior to joining
From a Maintenance Perspective:
There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:
More readable, as the JOIN criteria is separate from the WHERE clause
Less likely to miss JOIN criteria
Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
WHERE clause only serves as filtration of the cartesian product of the tables joined
From a Design Perspective:
ANSI-92 JOIN syntax is a pattern, not an anti-pattern:
The purpose of the query is more obvious; the columns used by the application are clear
It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.
Conclusion
Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.
These are the problems with using the where syntax (otherwise known as the implicit join):
First, it is all too easy to get accidental cross joins because the join conditions are not right next to the table names. If you have 6 tables being joined together, it is easy to miss one in the where clause. You will see this fixed all too often by using the distinct keyword. This is a huge performance hit for the database. You can't get an accidental cross join using the explicit join syntax as it will fail the syntax check.
Right and left joins are problematic in the old syntax in some databases (in SQL Server you are not guaranteed to get the correct results). Furthermore, I know they are deprecated in SQL Server.
If you intend to use a cross join, that is not clear from the old syntax. It is clear using the current ANSI standard.
It is much harder for the maintainer to see exactly which fields are part of the join or even which tables join together in what order using the implicit syntax. This means it might take more time to revise the queries. I have known very few people who, once they took the time to feel comfortable with the explicit join syntax, ever went back to the old way.
I've also noticed that some people who use these implicit joins don't actually understand how joins work and thus are getting incorrect results in their queries.
Honestly, would you use any other kind of code that was replaced with a better method 18 years ago?
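The first problem in the list above (an accidental cross join papered over with DISTINCT) is easy to reproduce. Here is a small sketch using Python's sqlite3 module; the customers, orders and shippers tables are invented for the example.

```python
import sqlite3

# Invented tables: every order has a customer and a shipper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shippers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, shipper_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO shippers VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
    INSERT INTO orders VALUES (1, 1, 1), (2, 2, 3);
""")

# Implicit syntax with the orders/shippers condition forgotten:
# every order is paired with every shipper (2 orders x 3 shippers).
missing_condition = conn.execute("""
    SELECT c.name
    FROM customers c, orders o, shippers s
    WHERE o.customer_id = c.id
""").fetchall()

# The all-too-common "fix": DISTINCT hides the duplicates, but the database
# still builds the full cross product before throwing most of it away.
masked = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers c, orders o, shippers s
    WHERE o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# Explicit syntax: each join carries its own condition, so nothing is missed.
explicit = conn.execute("""
    SELECT c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN shippers s ON s.id = o.shipper_id
    ORDER BY c.name
""").fetchall()

print(len(missing_condition), masked, explicit)
```

The DISTINCT version returns the same rows as the correct query, which is exactly why the mistake tends to go unnoticed until the tables get large.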
Most people tend to find the JOIN syntax a bit clearer as to what is being joined to what. Additionally, it has the benefit of being a standard.
Personally, I "grew up" on WHEREs, but the more I use the JOIN syntax the more I'm starting to see how it's more clear.
Explicit joins convey intent, leaving the where clause to do the filtering. It is cleaner and it is standard, and you can do things such as left outer or right outer which is harder to do only with where.
You can't use WHERE to combine two tables. What you can do though is to write:
SELECT * FROM A, B
WHERE ...
The comma here is equivalent to writing:
SELECT *
FROM A
CROSS JOIN B
WHERE ...
Would you write that? No - because it's not what you mean at all. You don't want a cross join, you want an INNER JOIN. But when you write comma, you're saying CROSS JOIN and that's confusing.
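You can check the comma/CROSS JOIN equivalence yourself. A minimal sketch with Python's sqlite3 module follows; the tables A and B and their columns are made up, and SQLite stands in for whatever database you use.

```python
import sqlite3

# Made-up tables: B.a_id refers to A.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (a_id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (1), (3);
""")

# Comma and CROSS JOIN both produce every pairing of rows (2 x 3 = 6).
comma_rows = sorted(conn.execute("SELECT * FROM A, B").fetchall())
cross_rows = sorted(conn.execute("SELECT * FROM A CROSS JOIN B").fetchall())

# What you actually meant: only the pairs that match on the join column.
where_rows = sorted(
    conn.execute("SELECT * FROM A, B WHERE A.id = B.a_id").fetchall()
)
inner_rows = sorted(
    conn.execute("SELECT * FROM A INNER JOIN B ON A.id = B.a_id").fetchall()
)

print(len(comma_rows), len(cross_rows))
print(where_rows, inner_rows)
```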
Actually you often need both "WHERE" and "JOIN".
"JOIN" is used to retrieve data from two tables - based ON the values of a common column. If you then want to further filter this result, use the WHERE clause.
For example, "LEFT JOIN" retrieves ALL rows from the left table, plus the matching rows from the right table. But that does not filter the records on any specific value or on other columns that are not part of the JOIN. Thus, if you want to further filter this result, specify the extra filters in the WHERE clause.
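One thing to watch when combining the two: with a LEFT JOIN, a filter on the right-hand table behaves differently depending on whether it sits in the WHERE clause or in the ON clause. Here is a small sketch using Python's sqlite3 module, with invented customers and orders tables.

```python
import sqlite3

# Invented tables: Cy has no orders at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 'paid'), (2, 2, 'open');
""")

# Plain LEFT JOIN: all customers, with NULL (None) where no order matches.
all_rows = conn.execute("""
    SELECT c.name, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# Filter in WHERE: applied after the join, so unmatched customers are dropped too.
where_rows = conn.execute("""
    SELECT c.name, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'paid'
    ORDER BY c.name
""").fetchall()

# Filter in ON: applied while matching, so every customer is still returned.
on_rows = conn.execute("""
    SELECT c.name, o.status
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
    ORDER BY c.name
""").fetchall()

print(all_rows)
print(where_rows)
print(on_rows)
```

So the WHERE clause is the right place for filters on the final result, while a condition that should only restrict which right-hand rows are matched belongs in the ON clause.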
EDIT 9-3-10: I found this blog entry recently that was very enlightening. http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html
There are times when one or the other join syntax may in fact perform better. I have also found times when I have noticed a slight performance increase (only noticeable in VLDBs) when choosing the Oracle join syntax over the ANSI one. Probably not enough to get fussy over, but for those serious about mastering the Oracle DB, it may be helpful to review the article.
I am aware of two outer join syntaxes for Oracle:
select a, b
from table1
left outer join table2
on table2.foo = table1.foo
OR
select a, b
from table1, table2
where table2.foo(+) = table1.foo
(assuming I got the syntax of the second sample right.)
Is there a performance difference between these? At first I thought it must just be a style preference on the part of the developer, but then I read something that made me think maybe there would be a reason to use one style instead of the other.
"maybe there would be a reason to use one style instead of the other."
There are reasons, but not performance related ones. The ANSI style outer joins, as well as being standard, offer FULL OUTER JOINs and outer joins to multiple tables.
Oracle didn't support ANSI syntax prior to version 9i.
Since that version, these queries do the same and yield the same plan.
Correct pre-9i syntax is this:
SELECT a, b
FROM table1, table2
WHERE table2.foo(+) = table1.foo
There is no performance difference. You can also check the execution plans of both queries to compare.
Theoretically, the second query performs the Cartesian product of the two tables and then selects those meeting the join condition. In practice, though, the database engine will optimize it exactly the same as the first.
I found some additional information in answer to my own question. Looks like the old style is very limiting, as of this doc from 3 years ago.
http://www.freelists.org/post/oracle-l/should-one-use-ANSI-join-syntax-when-writing-an-Oracle-application,2
I think perhaps it would only make sense to use the old style if for some reason the queries might be run on an outdated version of Oracle.
The stuff I see at work is almost all in the old style, but it's probably just because the consultants have been working in Oracle since before 9i and they likely didn't see a reason to go update all the old stuff.
Thanks all!
It's not the same. In the first case you're forcing Oracle to join the tables in that order.
In the second case Oracle Planner can choose the best option to execute the query.
In this trivial case the result probably will be the same in all the executions, but if you use that syntax in more complex cases the difference will be shown.