Hibernate Criteria queries and how to efficiently join data? - sql

"THIS IS MY SQL THAT I WANT TO CONVERT TO CRITERIA:
select be.* from BlogEntry be join Blog b on be.blog=b.id join Follower f on b.id=f.blogId where be.publishStatus='published' and be.secured=false and f.user=? union select be1.* from BlogEntry be1 join SecureUser s on be1.id=s.blogEntryId join User u on s.userProfile=u.userProfile and u.id=? order by publishDate desc";
hello, folks. i have been trying to use HQL and native SQL to execute the above query and i have been frustrated at every turn, for the most part because doing UNION is super awkward in Hibernate. even if you try a SQLQuery you still have the whole mess of establishing your entity relationships by being FORCED to include every single attribute of every subclass referenced in the SQL. this is proving to be a total pain to get past.
SO, i am moving on to a possible criteria query solution, but i think i need some help. the query below is totally fine in MySQL workbench and fast as lightning. the hump i am trying to get over with the criteria query is that some of my entity relationships are defined by foreign key references in the tables and some are not. when they ARE, i can, of course, do something like this (which evaluates part of my query, before the UNION):
ExtendedDetachedCriteria entryDetachedCriteria = ExtendedDetachedCriteria.forClass(BlogEntry.class);
entryDetachedCriteria.createAlias("blogEntry","blogEntry");
entryDetachedCriteria.createAlias("blogEntry.blog", "blog");
etc, etc...
HOWEVER, when i am joining data in a different way, like in this portion of the SQL:
select be1.* from BlogEntry be1 join SecureUser s on be1.id=s.blogEntryId (no actual foreign key relationship defined in the tables, SecureUser entities are just stamped with the relevant BlogEntry ID when they are created)
how should i write the criteria queries differently from the way demonstrated above?
i realize that questions like these are a total pain to get your head around if you are not already knee deep in trying to solve - please excuse the convoluted-ness of the question i am asking. i would deeply appreciate any guidance someone could offer, even if it's "get your hibernate act together, ya doofus!". sort of stuck at the moment.

createAlias creates an inner join using an association between entities. So,
criteria.createAlias("blogEntry","blogEntry");
is equivalent to the following HQL:
inner join rootEntity.blogEntry as blogEntry.
The root entity is BlogEntry. And I guess you don't have a blogEntry field in the BlogEntry entity. So this line doesn't make sense.
If you don't have any association between two entities, you can't make a join. You're reduced to making an inner join in the form of an equality between two fields in the where clause:
select be1 from BlogEntry be1, SecureUser s
where be1.id = s.blogEntryId
But since Criteria only allows selecting from one root entity and a series of joined associations, it's impossible to do this using Criteria.
Your best bets are:
to do it in SQL
to do it using 2 separate HQL queries, and join the results using Java.
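The second option (two queries, merged in application code) can be sketched as follows. This is only an illustration written in Python rather than Java, and the row dictionaries are hypothetical stand-ins for whatever the two HQL queries return. It mimics what the UNION plus ORDER BY in the original SQL does: drop duplicates, then sort by publishDate descending.

```python
from datetime import date


def merge_entries(followed, secured):
    """Combine two result lists, de-duplicating by entry id, newest first."""
    by_id = {}
    for row in followed + secured:
        # Keep the first occurrence of each id, like UNION removing duplicates.
        by_id.setdefault(row["id"], row)
    return sorted(by_id.values(), key=lambda r: r["publishDate"], reverse=True)


# Hypothetical rows standing in for the results of the two separate queries.
followed = [
    {"id": 1, "title": "public post", "publishDate": date(2012, 1, 5)},
    {"id": 2, "title": "another public post", "publishDate": date(2012, 3, 1)},
]
secured = [
    {"id": 3, "title": "shared privately", "publishDate": date(2012, 2, 10)},
    {"id": 2, "title": "another public post", "publishDate": date(2012, 3, 1)},
]

merged = merge_entries(followed, secured)
merged_ids = [row["id"] for row in merged]
```

In Java the same idea would typically use a map keyed by entity id followed by a sort on publishDate.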

Related

Why is it necessary to explicitly specify the foreign keys when joining tables in SQL?

SQL server is aware of table dependencies based on foreign keys, so why is it necessary to explicitly specify a JOIN ON foreign keys?
Real world working example (This query works):
SELECT * FROM users
INNER JOIN roles ON users.role_id=roles.id
Implicit example (This query doesn't work):
SELECT * FROM users
INNER JOIN roles
Shouldn't SQL implicitly and correctly assume that if no ON keyword is specified joining should be done on the foreign keys?
I understand that the benefit of this may be trivial, but after leveraging this feature in SQL APIs such as Java Hibernate's query language I can't see why this wouldn't be built in to SQL.
EDIT
Thanks for the answers so far. Although they are interesting, none of them answer the original question regarding SQL Server.
SQL does sort-of support this notion. The standard includes natural join, which SQL Server has happily not implemented. This allows you to do:
SELECT *
FROM users u NATURAL JOIN
roles r;
A natural join has no on clause.
Alas, it does something slightly different from what you suggest. Instead of using foreign keys, it simply uses columns with the same name. I consider this an abomination, because SQL does have explicit foreign key declarations and this would be the right place to use them.
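To see the same-name behaviour in practice, here is a small sketch using SQLite through Python (SQLite does implement NATURAL JOIN; the tables and rows are made up for illustration). Both tables have id and name columns, so the natural join matches on those two columns and ignores the declared foreign key on role_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        role_id INTEGER REFERENCES roles(id)
    );
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO users VALUES (1, 'alice', 2), (2, 'bob', 1);
""")

# Explicit join on the foreign key: every user gets the intended role.
explicit_rows = conn.execute(
    "SELECT users.name, roles.name FROM users "
    "INNER JOIN roles ON users.role_id = roles.id ORDER BY users.id"
).fetchall()

# Natural join: matches on the shared column names (id AND name), not on the FK.
natural_rows = conn.execute("SELECT * FROM users NATURAL JOIN roles").fetchall()
```

With this data the explicit join returns one row per user, while the natural join returns nothing, because no user shares both id and name with a role.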
It's quite normal for tables to have multiple foreign key relationships between them.
In this case, what would you expect the database to do when there are two FKs between two tables? Pick one at random?
Typical example:
A table CLIENT.
A table PURCHASE. Since a purchase happens between two clients it needs two FKs to CLIENT: seller_id, and buyer_id, both pointing to CLIENT.
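A minimal sketch of that ambiguity, using SQLite through Python with made-up rows: the two foreign keys give two equally valid joins with different results, so there is no single "obvious" join for the database to pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        seller_id INTEGER REFERENCES client(id),
        buyer_id INTEGER REFERENCES client(id)
    );
    INSERT INTO client VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO purchase VALUES (10, 1, 2);  -- Ann sells to Ben
""")

# Joining on the first foreign key answers "who sold?"
by_seller = conn.execute(
    "SELECT c.name FROM purchase p JOIN client c ON p.seller_id = c.id"
).fetchall()

# Joining on the second foreign key answers "who bought?"
by_buyer = conn.execute(
    "SELECT c.name FROM purchase p JOIN client c ON p.buyer_id = c.id"
).fetchall()
```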
Every time there are implicit operations happening, it becomes more difficult to do exactly what you want. Having a natural join seems fast at first, but if you have to debug it then you will have to go figure out exactly how the join worked. Was it using foreign keys? Was it just columns with the same name? I'm using NHibernate right now and it seems everyone on my team has a much harder time getting it to do exactly what is needed.

What is happening under the hood when a relationship is established between tables?

This question is not limited to Power BI, but it will help me explain my problem.
If you have more than one table in Power BI, you can establish a relationship between them by dragging a column from one table to the other like this:
And you can edit that relationship by clicking the line that appears:
And by the way, here are the structures of the two tables:
# Table1
A,B
1,abc
2,def
3,ghi
4,jkl
# Table2
A,C
1,abc
1,def
2,ghi
3,ghit
This works fine since column A in Table1 consists of unique values and can work as a primary key. And now you can head over to the Report tab, set up two tables, and slice and dice to your heart's desire either by clicking directly under A in Table1, or by introducing a slicer:
But the thing is that you can do that without having established a relationship between the tables. Delete the relationship under Relationships and go back to Report and select Home > Manage Relationships to see what I mean:
As the dialog box says 'There are no relationships defined yet.' But you can still subset one table by making selections in the other just like before (EDIT: This statement has been proven wrong in the answer from RADO). I do know that you can highlight the slicer and select Format > Edit Interactions and deselect the tables associated with the slicer. But I'm still puzzled by the whole thing.
So is there something happening under the hood here that I'm not aware of? Or is the relationship between tables really defined by the very contents of the tables, in the sense that the existence of related values across tables, together with the existence of a potential primary key (be it natural or synthetic), makes it possible to query them using SQL, dplyr verbs or any other form of querying technique? And that you really do not need an explicitly defined relationship?
Or put in another way, does the establishment of a Power BI table relationship have a SQL equivalent? Perhaps like the following:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);
I'm sorry if I'm rambling a bit here, but I'm just very confused. And googling has so far only added to the confusion. So thank you for any insights!
Your statement "But you can still subset one table by making selections in the other just like before" is not correct. This is a key issue here.
Relations enable propagation of filter context in Power BI. That's a very loaded phrase, and you will have to learn what it means if you plan to use Power BI. It's the most important concept to understand.
To see what I mean, you will need to write DAX measures and try to manipulate them using your tables. You will immediately see the difference when you have or don't have relations.
How the whole system works (simplified):
PowerBI contains a language called "DAX". You will create measures in DAX, and PowerBI will then translate them into its internal language called xmSQL, which is a special flavor of SQL. In xmSQL, regular connection is translated into LEFT OUTER JOIN, like this:
SELECT SUM(Sales.Amount)
FROM Sales
LEFT OUTER JOIN Customer
ON Sales.Customer_Key = Customer.Customer_Key
Bidirectional relations are a bit more complex, but conceptually similar.
Overall, when you create relations between tables, you are telling PowerBI engine how to join the tables. The engine then also adds some optimizations to speed up the queries.
Every time you execute a DAX measure, click a slicer or a visual, PowerBI generates multiple xmSQL statements in the background, executes them, and then renders their results as visuals. You can see these SQL queries with some tools such as DAX Studio.
Note that it's not strictly necessary to establish relations between tables in PowerBI. You can imitate the same behavior using DAX (programmatically), but such "virtual" relations are more complex and can be substantially slower.
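As a rough analogy only (this is plain SQL run in SQLite through Python, not the xmSQL that Power BI actually generates), the Table1 and Table2 data from the question can be joined on column A the way a relationship would tell the engine to. No foreign key is declared; the join condition alone does the work, and selecting A = 1 in Table1 then filters Table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (A INTEGER, B TEXT);
    CREATE TABLE Table2 (A INTEGER, C TEXT);
    INSERT INTO Table1 VALUES (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl');
    INSERT INTO Table2 VALUES (1, 'abc'), (1, 'def'), (2, 'ghi'), (3, 'ghit');
""")

# "Slicer" selection A = 1 on Table1, propagated to Table2 through the join.
filtered_c = conn.execute(
    "SELECT t2.C FROM Table2 t2 "
    "LEFT OUTER JOIN Table1 t1 ON t2.A = t1.A "
    "WHERE t1.A = ? ORDER BY t2.C",
    (1,),
).fetchall()
```

Without the join condition there is nothing connecting the two tables, which corresponds to a selection in one table leaving the other unfiltered.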
In the RM (relational model) & ERM (entity-relationship model) tables represent relation(ship)s/association. Hence, relational in "RM" & relationship in "ERM".
FKs (foreign keys) get erroneously called "relationships" in pseudo-ERM methods. A SQL FK constraint says subrows appear elsewhere as PK (primary key) or UNIQUE. A DBMS uses them to disallow invalid updates & to optimize queries.
Power BI "relationships" are not FKs. They are instructions on how to build queries.
When there is a FK we do often want to join on it. So we often want a Power BI relationship when there is a FK.
Create and manage relationships in Power BI Desktop
(See also its Download PDF link for Developer.)
PS We do not need constraints to hold or be declared or be known to query. The constraints (Including PKs, FKs, UNIQUE & cardinalities) are determined by the table meanings--(characteristic) predicates--& what business situations can arise. If constraints hold then we just sometimes get fewer rows than otherwise & some query pairs always return the same results when otherwise they wouldn't.
Foreign keys are not needed to join tables!
Is there any rule of thumb to construct SQL query from a human-readable description?
PS Cross join is inner join with a TRUE condition (or no condition in some DBMSs), period. Whether there is a "relationship" aka FK is irrelevant. If the condition is FK=PK or anything else other than TRUE then it's not a cross join; otherwise it is a cross join whether or not there is a FK between the tables. It's just that we frequently want PK=FK in a condition & tools can & do use the presence of a FK towards a default condition.
CROSS JOIN vs INNER JOIN in SQL Server 2008
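The cross join point can be checked with a small sketch in SQLite through Python (table names and rows are invented for illustration). A foreign key is declared, yet CROSS JOIN and INNER JOIN with an always-true condition return the same rows; only an explicit FK = PK condition narrows the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        role_id INTEGER REFERENCES roles(id)  -- a declared FK
    );
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO users VALUES (1, 2), (2, 1);
""")


def rows(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


cross = rows("SELECT users.id, roles.id FROM users CROSS JOIN roles")
inner_true = rows("SELECT users.id, roles.id FROM users INNER JOIN roles ON 1")
inner_fk = rows(
    "SELECT users.id, roles.id FROM users "
    "INNER JOIN roles ON users.role_id = roles.id"
)
```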
You asked "What is happening under the hood?"
The simple answer is "Statements about relationships."
Many well meaning people draw ER diagrams and seem to either forget or be unaware of the fact that their ER diagrams are really "pictures of statements in language."
The problem is ambiguity.
Many well meaning people jump straight to ER diagrams without also expressing the logical statements on which their ER diagrams are based. In effect, this means that the person who draws the ER diagram seems to expect that the "reader" of the ER diagram will be able to reconstruct the statements from which the ER diagram was drawn.
Here is an example to illustrate what I mean. My purpose is to show the linguistic basis of the "under the covers" relationship between Students and their Addresses.
So, what's under the covers is language!
A simple diagram
The statements from which the diagram is derived.
A more complex diagram
The statements from which the diagram is derived.

How do I access the joined columns when using custom Arel joins?

I have a simple database with the following schema:
Book has many Tags through Taggings
Book has many Users through ReadingStatuses
What I want to do is to list all of the books, their tags, and a reading status of the currently logged in user with each book. I've managed to write this using Arel (with the arel-helpers gem), but I don't know how to access the results in each book entry while iterating over the books array.
Here's the query
join_params = Book.arel_table.join(ReadingStatus.arel_table, Arel::OuterJoin)
.on(Book[:id].eq(ReadingStatus[:book_id])
.and(ReadingStatus[:user_id].eq(User.first.id)))
.join_sources
books = Book.all.includes(:tags).joins(join_params)
and the respective SQL it generates
SELECT "books".* FROM "books"
LEFT OUTER JOIN "reading_statuses"
ON "books"."id" = "reading_statuses"."book_id"
AND "reading_statuses"."user_id" = 'XXX'
There's nothing really to be done with the tags, since includes will automatically make everything work when calling book.tags, but what I don't know is how to access the ReadingStatus that is joined to each Book when iterating over the books result?
Try using includes instead of joins. "includes" does eager fetching, but if you don't mind that, it might make your query look a lot simpler.
You will also not have to explicitly mention the left outer join.
See if that helps:
Pulling multiple levels of data efficiently in Rails
Rails: How to fetch records that are 2 'has_many' levels deep?
Eager loading
Make sure you include the call to references
"ReadingStatus[:user_id].eq(User.first.id)" can be shifted into the where clause

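One thing worth checking before moving the user condition from the ON clause into a WHERE clause: in plain SQL the two forms are not equivalent for a LEFT OUTER JOIN. The sketch below uses SQLite through Python with invented rows and a hypothetical status column; it is not the SQL Rails generates, only an illustration of the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reading_statuses (book_id INTEGER, user_id INTEGER, status TEXT);
    INSERT INTO books VALUES (1, 'A'), (2, 'B');
    INSERT INTO reading_statuses VALUES (1, 7, 'read'), (2, 8, 'reading');
""")

# Condition inside ON: every book is kept, status is NULL where user 7 has none.
on_rows = conn.execute(
    "SELECT b.title, rs.status FROM books b "
    "LEFT OUTER JOIN reading_statuses rs "
    "ON b.id = rs.book_id AND rs.user_id = 7 ORDER BY b.id"
).fetchall()

# Condition in WHERE: books without a status row for user 7 are filtered out.
where_rows = conn.execute(
    "SELECT b.title, rs.status FROM books b "
    "LEFT OUTER JOIN reading_statuses rs ON b.id = rs.book_id "
    "WHERE rs.user_id = 7 ORDER BY b.id"
).fetchall()
```

If the goal is to list all books, including those the user has no status for, the condition needs to stay in the ON clause.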
Is eager loading same as join fetch?

Is eager fetch same as join fetch?
I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?
How does Rails Active Record implement a join fetch of associations when it doesn't know the table's metadata beforehand (I mean the columns in the table)? Say for example i have
people - id, name
things - id, person_id, name
person has one-to-many relation with the things. So how does it generate the query with all the column aliases even though it cannot know it when i do a join fetch on people?
An answer hasn't been accepted so I will try to answer your questions as I understand them:
"how does it know all the fields available in a table?"
It does a SQL query for every class that inherits from ActiveRecord::Base. If the class is 'Dog', it will do a query to find the column names of the table 'dogs'. In production mode it should only do this query once per run of the server -- in development mode it does it a lot. The query will differ depending on the database you use, and it is usually an expensive query.
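The idea of asking the database for a table's columns at runtime can be sketched like this. The example uses SQLite's PRAGMA table_info through Python purely as an illustration; the query Rails issues depends on the database adapter and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(dogs)")]
```

An ORM can cache a list like this per model class and use it to build fully qualified, aliased column lists for joins.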
"Say if i have a same name for column in a table and in an associated table how does it resolve this?"
If you are doing a join, it generates sql using the table names as prefixes to avoid ambiguities. In fact, if you are doing a join in Rails and want to add a condition (using custom SQL) for name, but both the main table and join table have a name column, you need to specify the table name in your sql. (e.g. Human.joins(:pets).where("humans.name = 'John'"))
"I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?"
Different Rails versions are different. I think that early versions did a single join query at all times. Later versions would sometimes do multiple queries and sometimes a single join query, based on the realization that a single join query isn't always as performant as multiple queries. I'm not sure of the exact logic that it uses to decide. Recently, in Rails 3, I am seeing multiple queries happening in my current codebase -- but maybe it sometimes does a join as well, I'm not sure.
It knows the columns through a type of reflection. Ruby is very flexible and allows you to build functionality that will be used/defined during runtime and doesn't need to be stated ahead of time. It learns the associated "person_id" column by interpreting the "belongs_to :person" and knowing that "person_id" is the field that would be associated and the table would be called "people".
If you do People.includes(:things) then it will generate 2 queries, 1 that gets the people and a second that gets the things that have a relation to the people that exist.
http://guides.rubyonrails.org/active_record_querying.html
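The two strategies discussed above can be compared side by side. This is a sketch in Python with SQLite using the people/things tables from the question and invented rows; the column aliases are hypothetical and only show why prefixes are needed when both tables have a name column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE things (id INTEGER PRIMARY KEY, person_id INTEGER, name TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO things VALUES (1, 1, 'hat'), (2, 1, 'bag'), (3, 2, 'pen');
""")

# Strategy 1: two queries, stitched together in application code.
people = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
ids = [person_id for person_id, _ in people]
placeholders = ",".join("?" for _ in ids)
things = conn.execute(
    f"SELECT person_id, name FROM things "
    f"WHERE person_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()
names_by_id = dict(people)
two_query = {name: [] for _, name in people}
for person_id, thing in things:
    two_query[names_by_id[person_id]].append(thing)

# Strategy 2: one join query with aliased columns to avoid the name clash.
joined = conn.execute(
    "SELECT people.name AS people_name, things.name AS things_name "
    "FROM people LEFT OUTER JOIN things ON things.person_id = people.id "
    "ORDER BY people.id, things.id"
).fetchall()
single_join = {}
for person, thing in joined:
    single_join.setdefault(person, [])
    if thing is not None:
        single_join[person].append(thing)
```

Both approaches end up with the same grouping; the join repeats each person's columns once per thing, which is one reason separate queries can be cheaper for large has-many sets.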

Is there any reason not to join Foreign Key to Foreign Key?

I have the following tables:
Financial:
PK_FinancialID
FK_SchoolID
School:
PK_SchoolID
Class:
PK_ClassID
FK_SchoolID
ClassName
Both Class and Financial have Foreign Key relationships to School. I want to make a query that would show all classes that are related to Financial rows that meet certain criteria.
Initially I think to construct the query as follows:
Select Class.ClassName
From Class
Join School on Class.FK_SchoolID = School.PK_SchoolID
Join Financial on Financial.FK_SchoolID = School.PK_SchoolID
Where Financial ... -- define criteria
However, since both Financial and Class are joined on the PK_SchoolID column, it should be possible to rewrite the query as follows (cutting out the School table and joining Class and Financial directly):
Select Class.ClassName
From Class
Join Financial on Financial.FK_SchoolID = Class.FK_SchoolID
Where Financial ... -- define criteria
Which approach is preferable from a sql perspective? Would including the School table make performance better because the actual PK record is referenced (and thus a Clustered Index can be referenced)? Or does that not really matter? Anything that I am missing?
Platform: Sql Server 2005. All tables have their PK and FK columns properly declared and defined.
If you don't need School, don't join School. If you want this query to run fast, create an index on FK_SchoolID of the Financial table. It looks as if you have an n-1-1 relation between Class-School-Financial, so you should even create a unique index on Financial. You shouldn't (in most cases) add additional tables to make a query faster; just optimize the ones you use.
EDIT
If you select only ClassName, maybe what you need is:
Select Class.ClassName
From Class
Where Exists
(select * from Financial
where (Financial.FK_SchoolID = Class.FK_SchoolID) and (...))
It may be faster than other solutions and more understandable.
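A small sketch of how the three query shapes behave, using SQLite through Python rather than SQL Server. The Amount column and the "Amount >= 100" criterion are hypothetical stand-ins for the elided criteria, and the rows are invented. The two join variants return the same rows, including a duplicate when a school has several matching Financial rows, while the EXISTS form returns each class once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE School (PK_SchoolID INTEGER PRIMARY KEY);
    CREATE TABLE Class (
        PK_ClassID INTEGER PRIMARY KEY,
        FK_SchoolID INTEGER REFERENCES School(PK_SchoolID),
        ClassName TEXT
    );
    CREATE TABLE Financial (
        PK_FinancialID INTEGER PRIMARY KEY,
        FK_SchoolID INTEGER REFERENCES School(PK_SchoolID),
        Amount INTEGER  -- hypothetical column used as the criterion
    );
    INSERT INTO School VALUES (1), (2);
    INSERT INTO Class VALUES (1, 1, 'Math'), (2, 2, 'Art');
    INSERT INTO Financial VALUES (1, 1, 100), (2, 1, 200), (3, 2, 5);
""")

via_school = conn.execute(
    "SELECT Class.ClassName FROM Class "
    "JOIN School ON Class.FK_SchoolID = School.PK_SchoolID "
    "JOIN Financial ON Financial.FK_SchoolID = School.PK_SchoolID "
    "WHERE Financial.Amount >= 100 ORDER BY Class.ClassName"
).fetchall()

direct = conn.execute(
    "SELECT Class.ClassName FROM Class "
    "JOIN Financial ON Financial.FK_SchoolID = Class.FK_SchoolID "
    "WHERE Financial.Amount >= 100 ORDER BY Class.ClassName"
).fetchall()

with_exists = conn.execute(
    "SELECT Class.ClassName FROM Class WHERE EXISTS ("
    "  SELECT * FROM Financial"
    "  WHERE Financial.FK_SchoolID = Class.FK_SchoolID"
    "  AND Financial.Amount >= 100"
    ") ORDER BY Class.ClassName"
).fetchall()
```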
Yes, the index most definitely affects the performance.
Just add an index for the FK_SchoolID in the Financial table so that there is an index that the query can use.
Note that adding another index gives a slight performance hit when you add or delete records in the table. This is often outweighed by the big performance gain you get when querying the table, but it's the reason why you should be somewhat restrictive with adding indexes and don't just add indexes to all fields.
Try the following:
Select Class.ClassName
From Class
Inner Join Financial on Financial.FK_SchoolID = Class.FK_SchoolID
Where Financial....yourcriteria
No need to join school table.
Seems to me that both of your examples are wrong. The fact that a school is listed as financial and that the school offers classes does not mean that a specific class is a financial class -- it can be an art class from another course. Seems that this is a weakness of the whole model, nothing to do with your SQL technique -- or maybe I do not understand the underlying model and all special constraints you may have. However, here is an example of a similar model:
One school can offer many courses; a course can be offered by several schools.
Each school may have specific name and description for a "generic" course.
One certificate requires several courses; a course may be required by many certificates.
I'd say you're fine to leave out the actual school table. Don't see anything wrong with that.
As far as performance goes: I'm not really sure, but I'd say it would be faster because you have one less table to join - but I'm not an expert in that area...