I am studying JPA and Hibernate. I see this in the repository class:
@Query(value = "FROM AccountTable")
List<Account> findAllAccount();
Is @Query(value = "FROM AccountTable") equivalent to select * from AccountTable, with no filters? Is that correct?
First of all, @Query is not a JPA annotation but a Spring Data JPA one.
According to the Hibernate documentation:
The select statement in JPQL is exactly the same as for HQL except that JPQL requires a select_clause, whereas HQL does not.
So the query in your question, which uses HQL-specific syntax, is equivalent to the following JPQL:
@Query("select a FROM AccountTable a")
List<Account> findAllAccount();
Please note that this last query is clearer and is recommended as good practice.
Even though HQL does not require the presence of a select_clause, it is generally good practice to include one. For simple queries the intent is clear and so the intended result of the select_clause is easy to infer. But on more complex queries that is not always the case.
It is usually better to explicitly specify intent. Hibernate does not actually enforce that a select_clause be present even when parsing JPQL queries, however, applications interested in JPA portability should take heed of this.
I'm trying to write a QueryDSL expression for selecting a column value. It's a few joins away from the 'from' table:
a.b.c.get(0).field
where b may be null, but if it is not, it will have at most one record in the c collection.
What I want is something like
new CaseBuilder()
.when(a.b.isNotNull().and(a.b.c.isNotEmpty()))
.then(a.b.c.get(0).field.stringValue())
.otherwise(Expressions.stringTemplate("''"))
This implicitly produces inner joins with b and c tables in SQL, which is not what I want because that returns no results when b is in fact null. Adding explicit left outer joins doesn't stop the implicit joins anyway. I'm pretty sure I'm not thinking about this the right way, please help me get unstuck :-)
When I added support for implicit joins to jOOQ, I researched the topic and stumbled upon this "limitation" / design in Hibernate as well. I recently reported this issue to the hibernate-dev mailing list, as I clearly think this is a bug:
http://lists.jboss.org/pipermail/hibernate-dev/2018-February/017299.html
I don't see why any projection expression should "inadvertently" produce a filter on the FROM clause. This appears to be quite contrary to the intuition that one might build when thinking in terms of relational algebra.
You can read the replies to that email, particularly this rationale by Steve Ebersole:
I was saying that I can no longer look at the
HQL/JPQL and tell what kind of SQL joins will be used for that if it is
dependent on the mapped association. This approach was mentioned earlier
in the thread. But you clarified that you mean that implicit joins ought
to just always be interpreted as an outer join.
You could plugin in your own query handling and interpret implicit joins
however you want. That is current not the most trivial task though.
I don't think this will get fixed any time soon in Hibernate. So, the workaround here is to build all of your outer joins explicitly and not use any implicit joins whenever you expect them to produce outer joins.
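To make the difference concrete, here is a minimal in-memory sketch in plain Java of why inner-join semantics drop the row when b is null while left-outer-join semantics keep it. It does not use Hibernate or QueryDSL; the classes A and B and both method names are hypothetical stand-ins for the entities in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an A row has an optional B, and B has a collection of c field values.
final class JoinSemanticsSketch {
    static final class B {
        final List<String> cFields;
        B(List<String> cFields) { this.cFields = cFields; }
    }

    static final class A {
        final String name;
        final B b; // may be null, as in the question
        A(String name, B b) { this.name = name; this.b = b; }
    }

    // Inner-join semantics: an A row survives only if b exists and has a c row.
    static List<String> innerJoinProjection(List<A> rows) {
        List<String> out = new ArrayList<>();
        for (A a : rows) {
            if (a.b == null) continue; // row is filtered out entirely
            for (String field : a.b.cFields) out.add(a.name + "=" + field);
        }
        return out;
    }

    // Left-outer-join semantics: every A row is kept; a missing side projects "".
    static List<String> leftJoinProjection(List<A> rows) {
        List<String> out = new ArrayList<>();
        for (A a : rows) {
            if (a.b == null || a.b.cFields.isEmpty()) {
                out.add(a.name + "=");
            } else {
                for (String field : a.b.cFields) out.add(a.name + "=" + field);
            }
        }
        return out;
    }
}
```

With one row whose b is null, the first method returns nothing for that row, which matches the behaviour described in the question, while the second still returns it with an empty value.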
We are using SAP HANA 1.0 SPS12. As INTERSECT, MINUS, and EXCEPT nodes are not available in graphical mode, we need to rely on LEFT OUTER JOIN or use the method below:
https://blogs.sap.com/2014/03/02/thinking-in-hana-part-1-set-operators/
I have tables with volumes of 1 billion rows.
Can anyone suggest which method is better from a performance point of view: using LEFT OUTER JOIN, realising INTERSECT through UNION, or using a scripted view with the INTERSECT operator?
With HANA 1 SPS 12 it’s perfectly OK to use table functions as part of graphical models.
Trying to emulate complex query operations usually worsens both performance and maintainability.
If your project "doesn’t allow" using table functions, the answer must be: fix that rule instead of producing a twisted view logic.
As for "outer join is faster": that’s not true per se. Inner joins have a stricter semantic in that they have to be executed in every case, whereas outer joins allow the actual computation of the join to be skipped in cases where the result set won’t be affected by it. That means that, when the conditions are fulfilled, outer joins can simply be avoided, which is of course faster than executing a join.
What is the difference between the two SQL statements below (one uses INNER JOIN, the second uses the FROM clause) in terms of performance and execution time, and are there any cases where I must use one instead of the other?
SELECT Tbl1_Fld1, Tbl2_Fld1 FROM DB1..TABLE1
INNER JOIN DB2..TABLE1
ON DB1..TABLE1.Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
SELECT Tbl1_Fld1, Tbl2_Fld1 FROM DB1..TABLE1, DB2..TABLE1
WHERE DB1..TABLE1.Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
In a perfect world, those should be equivalent except that the first one better documents what you want to achieve (join two tables and then search the result).
Alas, history, bugs, features, optimizers and other obstacles make this much more complicated than it needs to be.
Some databases simply don't support INNER JOIN even though it's a SQL standard syntax.
Others have bugs for certain data types, so the join won't work or will be very slow.
So in reality, you will have to run these with suitable test data to find out. There is no way to say for sure just by looking at the SQL. Sometimes, there isn't even a way to say for sure when you can run it because changes in the underlying data can have a huge impact (for example, Oracle can suddenly decide to ignore the index because too many rows in the table have been changed).
Given a sane query optimizer, there shouldn't be a difference between the two.
Use INNER JOIN if your engine supports it, mainly because it keeps the join condition clearly separated from the filters:
SELECT Tbl1_Fld1, Tbl2_Fld1
FROM
DB1..TABLE1, DB2..TABLE1
WHERE
DB1..TABLE1.Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1 AND
DB1..TABLE1.Tbl1_Fld1 = 'foo' AND DB2..TABLE1.Tbl2_Fld1 = 1
vs
SELECT Tbl1_Fld1, Tbl2_Fld1
FROM
DB1..TABLE1
INNER JOIN
DB2..TABLE1 ON DB1..TABLE1.Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
WHERE
DB1..TABLE1.Tbl1_Fld1 = 'foo' AND DB2..TABLE1.Tbl2_Fld1 = 1
There should be no difference in execution for most engines.
However, and there's always a "however", not all joins are INNER. What about an OUTER join?
The *= and =* operators are deprecated in SQL Server, for example.
This was noted in Books Online for SQL Server 2000. Here is the quote:
In earlier versions of Microsoft® SQL
Server™ 2000, left and right outer
join conditions were specified in the
WHERE clause using the *= and =*
operators. In some cases, this syntax
results in an ambiguous query that can
be interpreted in more than one way.
SQL-92 compliant outer joins are
specified in the FROM clause and do
not result in this ambiguity. Because
the SQL-92 syntax is more precise,
detailed information about using the
old Transact-SQL outer join syntax in
the WHERE clause is not included with
this release. The syntax may not be
supported in a future version of SQL
Server. Any statements using the
Transact-SQL outer joins should be
changed to use the SQL-92 syntax.
So if you have a more complex query with both inner and outer joins, it's going to become confusing if you mix styles. So add consistency to the clarity argument mentioned above.
No new queries should be written using the "From " syntax (really called an implicit join). And personally I would rewrite any I came into contact with because it is simply a bad coding practice.
This syntax is highly subject to error from accidental cross joins, and it is harder to maintain and harder to read correctly. If you need to add a left join later, you may get incorrect results if you don't convert to explicit joins. And sometimes you need a cross join, but with the implicit syntax you don't know whether the cross join was a mistake (a common one using this syntax) or deliberate, so the maintainer doesn't know whether to change it or not. It was replaced in 1992 for goodness sakes. There is no excuse for not using the explicit join syntax in 2010.
I am curious how exactly LINQ (not LINQ to SQL) performs its joins behind the scenes, compared with how SQL Server performs joins.
Before executing a query, SQL Server generates an execution plan. The execution plan is basically an expression tree describing what it believes is the best way to execute the query. Each node provides information on whether to do a Sort, Scan, Select, Join, etc.
On a 'Join' node in our execution plan, we can see three possible algorithms: Hash Join, Merge Join, and Nested Loops Join. SQL Server chooses which algorithm to use for each join operation based on the expected number of rows in the inner and outer tables, what type of join we are doing (some algorithms don't support all types of joins), whether we need the data ordered, and probably many other factors.
Join Algorithms:
Nested Loop Join:
Best for small inputs, can be optimized with ordered inner table.
Merge Join:
Best for medium to large sorted inputs, or an output that needs to be ordered.
Hash Join:
Best for medium to large inputs, can be parallelized to scale linearly.
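As a rough illustration of why a merge join wants sorted inputs, here is a minimal Java sketch that joins two ascending-sorted key arrays in a single forward pass. It is only an assumption-laden teaching example, not how SQL Server implements the operator; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeJoinSketch {
    // Joins two ascending-sorted key arrays.
    // Returns one {leftIndex, rightIndex} pair for every pair of equal keys.
    static List<int[]> mergeJoin(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++; // left key too small, advance left
            } else if (left[i] > right[j]) {
                j++; // right key too small, advance right
            } else {
                int key = left[i];
                int runStart = j; // start of the run of equal keys on the right
                // Pair every left row holding this key with every right row holding it.
                while (i < left.length && left[i] == key) {
                    j = runStart;
                    while (j < right.length && right[j] == key) {
                        out.add(new int[] {i, j});
                        j++;
                    }
                    i++;
                }
            }
        }
        return out;
    }
}
```

Because both cursors only move forward (apart from rewinding within a run of duplicate keys), the cost is roughly linear in the input sizes, but only if the inputs arrive already sorted, for example from an index.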
LINQ Query:
DataTable firstTable, secondTable;
...
var rows = from firstRow in firstTable.AsEnumerable ()
join secondRow in secondTable.AsEnumerable ()
on firstRow.Field<object> (randomObject.Property)
equals secondRow.Field<object> (randomObject.Property)
select new {firstRow, secondRow};
SQL Query:
SELECT *
FROM firstTable fT
INNER JOIN secondTable sT ON fT.Property = sT.Property
Sql Server might use a Nested Loop Join if it knows there are a small number of rows from each table, a merge join if it knows one of the tables has an index, and Hash join if it knows there are a lot of rows on either table and neither has an index.
Does LINQ choose its algorithm for joins, or does it always use the same one?
The methods on System.Linq.Enumerable are performed in the order they are issued. There is no query optimizer at play.
Many methods are very lazy, which allows you to not fully enumerate the source by putting .First or .Any or .Take at the end of the query. That is the easiest optimization to be had.
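The same kind of laziness can be observed in Java streams, used here only as an analogy to System.Linq.Enumerable and not as a description of its internals. This minimal sketch counts how many source elements are actually inspected when the pipeline ends in a short-circuiting findFirst; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyEvaluationSketch {
    // Returns how many source elements were pulled before the first match was found.
    static int inspectedBeforeFirstMatch(List<Integer> source, int threshold) {
        AtomicInteger inspected = new AtomicInteger();
        source.stream()
              .peek(x -> inspected.incrementAndGet()) // counts each element pulled from the source
              .filter(x -> x > threshold)
              .findFirst();                            // short-circuits after the first match
        return inspected.get();
    }
}
```

For a five-element list whose third element is the first match, only three elements are pulled from the source; the rest are never enumerated.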
For System.Linq.Enumerable.Join specifically, the docs state that this is a hash join.
The default equality comparer, Default, is used to hash and compare keys.
So examples:
// hash join (n+m): Enumerable.Join
from a in theAs
join b in theBs on a.prop equals b.prop
select new { a, b }
// nested loop join (n*m): Enumerable.SelectMany
from a in theAs
from b in theBs
where a.prop == b.prop
select new { a, b }
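The n+m versus n*m difference between those two query shapes can be sketched outside of LINQ as well. The following Java example is only an illustration of the two strategies, not the actual Enumerable.Join source: it joins two lists of non-empty strings on their first character, a hypothetical join key chosen for brevity, and all class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashVsNestedLoopSketch {
    // Hash join: build a lookup over the inner side once (m steps),
    // then probe it once per outer element (n steps).
    // Assumes every string is non-empty.
    static List<String> hashJoin(List<String> outer, List<String> inner) {
        Map<Character, List<String>> lookup = new HashMap<>();
        for (String b : inner) {
            lookup.computeIfAbsent(b.charAt(0), k -> new ArrayList<>()).add(b);
        }
        List<String> out = new ArrayList<>();
        for (String a : outer) {
            List<String> matches = lookup.getOrDefault(a.charAt(0), Collections.emptyList());
            for (String b : matches) out.add(a + "-" + b);
        }
        return out;
    }

    // Nested loop join: compare every outer element with every inner element (n*m steps).
    static List<String> nestedLoopJoin(List<String> outer, List<String> inner) {
        List<String> out = new ArrayList<>();
        for (String a : outer) {
            for (String b : inner) {
                if (a.charAt(0) == b.charAt(0)) out.add(a + "-" + b);
            }
        }
        return out;
    }
}
```

Both methods produce the same pairs in the same order; only the amount of work differs as the inputs grow.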
Linq to SQL does not send join hints to the server. Thus the performance of a join using Linq to SQL will be identical to the performance of the same join sent "directly" to the server (i.e. using pure ADO or SQL Server Management Studio) without any hints specified.
Linq to SQL also doesn't allow you to use join hints (as far as I know). So if you want to force a specific type of join, you'll have to do it using a stored procedure or the Execute[Command|Query] method. But unless you specify a join type by writing INNER [HASH|LOOP|MERGE] JOIN, then SQL Server always picks the type of join it thinks will be most efficient - it doesn't matter where the query came from.
Other Linq query providers - such as Entity Framework and NHibernate Linq - will do exactly the same thing as Linq to SQL. None of these have any direct knowledge of how you've indexed your database and so none of them send join hints.
Linq to Objects is a little different - it will (almost?) always perform a "hash join" in SQL Server parlance. That is because it lacks the indexes necessary to do a merge join, and hash joins are usually more efficient than nested loops, unless the number of elements is very small. But determining the number of elements in an IEnumerable<T> might require a full iteration in the first place, so in most cases it's faster just to assume the worst and use a hashing algorithm.
LINQ itself does not choose algorithms of any kind, as LINQ, strictly speaking, is simply a way of expressing a query in SQL-like syntax that can map to function calls on either IEnumerable<T> or IQueryable<T>. LINQ is entirely a language feature and does not provide functionality, only another way of expressing existing function calls.
In the case of IQueryable<T>, it's entirely up to the provider (such as LINQ to SQL) to choose the best method of producing the results.
In the case of LINQ to Objects (using IEnumerable<T>), simple enumeration is what's used (roughly equivalent to nested loops) in all cases. There is no deep inspection (or even knowledge of) the underlying data types in order to optimize the query.