Efficiently display results from multiple joins - sql

In a JPA project I need to display a table whose data comes from 5 related entities.
Without JPA I could write a sql query which joins the 5 database tables together and filters according to some criteria.
Suppose that the fields involved in the filtering criteria are only those of the first entity.
Using JPA I can load filtered instances of the first entity and navigate through the properties till the final entity.
My concern is that, this way, the number of queries to the database can explode if I cannot use, or make a mistake with, the fetch=FetchType.EAGER annotation attribute.
What is the best approach in such cases?
I would like to have strict control over the SQL queries that will be executed, so I can optimize them. But if I write the SQL query with the joins by hand, do I have to use the 'old' ResultSet to retrieve the data?

You can use JPA's built-in query language, the JPQL, can't you? (It does have a JOIN operator for sure.) Be aware though that this is not standard SQL, only something similar, so read the JPQL docs thoroughly. Yes, this is still plain text queries embedded in Java code, which is a shame, but hey, that's how far Java can go supporting the development process.
The main advantage here is that you get entity objects as the result of your queries, although you still need to cast them from Object unless you use the typed createQuery(String, Class) overload added in JPA 2.0. You can also use the objects (records) and their member variables (attributes) directly in the query string, so this is a step up from good old JDBC.
Alternatively you could also choose the Criteria API, but frankly, my experiences were not very good with it. The syntax is quite horrible and you basically end up building the low-level query yourself. This is clearly Java at its worst... but at least Strings containing queries can be eliminated from the code. I'm not sure it's worth it though.
Check this page for more information and examples:
http://download.oracle.com/javaee/6/tutorial/doc/gjise.html
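To make the query-count concern concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not JPA code: the `customer` and `orders` tables, the `run` helper and the `stats` counter are all hypothetical, and two tables stand in for the five entities of the question. It only illustrates why navigating from each filtered parent row costs one extra query per row, while a single join (which is what a JPQL query with JOIN is meant to produce) costs one.

```python
import sqlite3

# Hypothetical two-table schema standing in for the five related entities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ada', 'UK'), (2, 'Bob', 'US'), (3, 'Cy', 'UK');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0), (13, 3, 9.0);
""")

stats = {"queries": 0}  # counts statements sent to the database


def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def navigate_lazily(country):
    """Load the filtered parents, then fetch the children one parent at a time."""
    rows = []
    parents = run(
        "SELECT id, name FROM customer WHERE country = ? ORDER BY id", (country,)
    )
    for customer_id, name in parents:
        children = run(
            "SELECT total FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        )
        for (total,) in children:
            rows.append((name, total))
    return rows


def single_join(country):
    """Fetch the same rows with one statement that joins both tables."""
    return run(
        "SELECT c.name, o.total FROM customer c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE c.country = ? ORDER BY c.id, o.id",
        (country,),
    )
```

With this sample data, `navigate_lazily("UK")` issues three statements (one for the parents, one per matching parent) and `single_join("UK")` issues one, yet both return the same rows. The gap grows with every extra level of navigation.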

Related

Can I use straight SQL in Django models?

And, if I can, does that mean I lose my advantage of treating the results as objects? I find complex queries confusing in many ORMs, not just Django's. But, it is probably because I have never really used an ORM. Does anyone use straight up SQL anymore?
edit: Am I defeating the purpose of having a framework if I bypass the ORM completely? They all have a "nifty" ORM, but when it comes to queries with lots of subqueries, derived tables, it doesn't look pretty.
Using Django's QuerySet API you have different possibilities:
You can use extra(), which returns a queryset that evaluates to model objects. As the name says, it is somewhat limited: to return model instances it still has to query the model's table. But you can add extra SQL, e.g. for the WHERE or ORDER BY clause. Querysets that use extra() can still use the features of the ORM, such as chaining multiple filter() calls.
raw() returns a RawQuerySet, which can also be iterated over to get model instances, but you lose a lot of features that the ORM would normally provide.
And of course you can execute SQL directly, using a low level connection cursor API (no model instances of course).
Study the documentation on raw queries. It has a lot of information on, e.g., how to map a model's fields onto the data coming from a raw query, and it documents a few gotchas when passing parameters into the query.
To also answer your edited question: I wouldn't use raw SQL when you can do it with the ORM, but of course the ORM is limited, and if you need to do some more complex stuff you will always have to switch to SQL (though sometimes extra() is enough, so you can still use the advantages of the ORM). Don't forget that the ORM works with every DB backend, while custom SQL might not work with every database.
You can use raw SQL either to return objects or, if you want, to bypass the ORM completely.
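The two options named above (raw SQL that still yields objects, and raw SQL that bypasses the object layer) can be sketched without Django, using only Python's sqlite3 module. This is an illustration of the idea, not Django's implementation: the `Person` class, the `person` table and the helpers `raw_objects` and `raw_rows` are hypothetical. Column aliases are used to line the table's columns up with the object's field names, which is the same trick one would use to map a legacy table onto a model.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical "model": field names differ from the table's column names.
    id: int
    first_name: str
    last_name: str


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    INSERT INTO person VALUES (1, 'Grace', 'Hopper'), (2, 'Alan', 'Turing');
""")


def raw_objects(sql, params=()):
    """Run hand-written SQL and map each row to a Person by column name."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    return [Person(**dict(zip(columns, row))) for row in cursor]


def raw_rows(sql, params=()):
    """Run hand-written SQL and return plain tuples, with no object mapping."""
    return conn.execute(sql, params).fetchall()
```

For example, `raw_objects("SELECT id, first AS first_name, last AS last_name FROM person WHERE last = ?", ("Hopper",))` yields a list holding one `Person`, while `raw_rows("SELECT COUNT(*) FROM person")` returns bare tuples. In both cases the parameters are passed separately from the SQL string rather than formatted into it.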

Database Type Agnostic Select Query Encapsulation class

I am upgrading a webapp that will be using two different database types. The existing database is MySQL and is tightly integrated with the current systems; a MongoDB database will serve the extended functionality. The new functionality will also rely pretty heavily on the MySQL database for environmental variables such as information on the current user, content, etc.
Although I know I can just assemble the queries independently, it got me thinking of a way to make the construction of queries much simpler (only for easier legibility while building; once finished, I would convert back to hard-coded queries). It would entail an encapsulation object that would contain:
what data is being selected (including functionally derived data)
source (including joined data; I know that joins are not a good idea for non-relational DBs, but it would be nice to have the facility just in case, and it can be rewritten into two queries later for performance)
where and having conditions (stored as their own object types so they can be processed later, potentially including other select queries that can be interpreted by whatever DB is using it)
orders
groupings
limits
This data can then be passed to an interface adapter that can build and execute the query, returning it in an array, or object or whatever is desired.
Although this sounds good, I have no idea if any code like this exists. If so, can anybody point it out to me, if not, are there any resources on similar projects undertaken that might allow me to continue the work and build a basic version?
I know this is a complicated library, but I have been working on this update for the last few days, and constantly switching back and forth has been getting me muddled at times and letting mistakes creep in.
I would study things like the SQL grammar: http://www.h2database.com/html/grammar.html
It gives you an idea of how queries should be constructed.
You can study existing libraries around LINQ (C#): https://code.google.com/p/linqbridge/
Maybe even check out this link about FQL (Facebook's query language): https://code.google.com/p/mockfacebook/issues/list?q=label:fql
As you already know, this is a hard problem. It will be a big challenge to make it run efficiently. Maybe consider moving all data from MySQL and Mongo to a third data store that has a copy of all the data and then running queries against that? You could replicate all writes to something like Redis or Elasticsearch and then write your queries there.
Either way, good luck!
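The encapsulation object described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing library: `Condition`, `SelectQuery`, `to_sql` and `to_mongo` are hypothetical names, only simple AND-ed comparisons are handled, and joins, groupings and having conditions are left out. The SQL adapter emits `?` placeholders (the sqlite3 style; other drivers use different markers), and the Mongo adapter only builds plain dictionaries in the shape MongoDB's query operators expect, without talking to a server.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Comparison operators shared by both adapters, mapped to MongoDB operators.
MONGO_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", "<": "$lt", ">=": "$gte", "<=": "$lte"}


@dataclass(frozen=True)
class Condition:
    column: str
    op: str  # must be a key of MONGO_OPS
    value: Any


@dataclass
class SelectQuery:
    source: str                                        # table or collection name
    fields: List[str] = field(default_factory=list)    # empty means "everything"
    where: List[Condition] = field(default_factory=list)
    order_by: List[Tuple[str, bool]] = field(default_factory=list)  # (name, ascending)
    limit: Optional[int] = None


def to_sql(query: SelectQuery) -> Tuple[str, List[Any]]:
    """Build a parameterised SELECT. Identifiers are inserted as-is, so they
    must come from trusted code, never from user input."""
    sql = f"SELECT {', '.join(query.fields) or '*'} FROM {query.source}"
    params: List[Any] = []
    if query.where:
        for cond in query.where:
            if cond.op not in MONGO_OPS:
                raise ValueError(f"unsupported operator: {cond.op}")
        sql += " WHERE " + " AND ".join(f"{c.column} {c.op} ?" for c in query.where)
        params.extend(c.value for c in query.where)
    if query.order_by:
        sql += " ORDER BY " + ", ".join(
            f"{name} {'ASC' if ascending else 'DESC'}" for name, ascending in query.order_by
        )
    if query.limit is not None:
        sql += " LIMIT ?"
        params.append(query.limit)
    return sql, params


def to_mongo(query: SelectQuery) -> Dict[str, Any]:
    """Build the pieces a MongoDB find() call would need, as plain data."""
    filter_doc: Dict[str, Dict[str, Any]] = {}
    for cond in query.where:
        filter_doc.setdefault(cond.column, {})[MONGO_OPS[cond.op]] = cond.value
    return {
        "collection": query.source,
        "filter": filter_doc,
        "projection": {name: 1 for name in query.fields} or None,
        "sort": [(name, 1 if ascending else -1) for name, ascending in query.order_by],
        "limit": query.limit,
    }
```

A query such as `SelectQuery("users", ["name", "age"], [Condition("age", ">=", 18)], [("name", True)], 10)` can then be handed to either adapter. Keeping the conditions as their own objects, as the question suggests, is what lets each backend translate them independently.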

What is the recommendation on using NHibernate CreateSQLQuery?

My gut tells me that advanced NHibernate users would be against it. I have been looking for actual analysis on this and have found nothing, so I'd like the answer to address these questions:
What are the pros/cons of using it?
Are there any performance implications, good or bad (e.g. using it to call stored procedures)?
In which scenarios should we use/avoid it?
Who should use/avoid it?
Basically, what are the reasons to use/avoid it and why?
CreateSQLQuery exists for a reason, which is executing queries that are either:
Not supported
Hard to write
using any of the other methods.
Of course it's usually the last choice, because:
It's not object oriented (i.e. you're back to thinking of tables and columns instead of entities, properties and relationships)
It ties you to the physical model
It ties you to a specific RDBMS
It usually forces you to do more work in order to retrieve entities
It doesn't automatically support features like paging
But if you think it's needed for a particular query, go ahead. Make sure to learn all the other methods first (HQL, Linq, QueryOver, Criteria and Get) to avoid doing unnecessary work.
One of the main reasons to avoid SQL and use HQL is to avoid making the code base dependent on the RDBMS type (e.g. MySQL, Oracle). Another reason is that you have to make your code dependent on the table and column names rather than the entity names and properties.
If you are comparing raw SQL to using the NHibernate LINQ provider there are other compelling reasons to go for LINQ queries (when it works), such as type safety and being able to use VS reference search to determine in what queries a certain table or column is referenced.
My opinion is that CreateSQLQuery() is a "last way out" option. It is there because there are things you cannot do with the other NHibernate APIs but it should be avoided since it more or less goes against the whole idea of using NHibernate in the first place.

Is there any framework for parsing a SQL-like query into its component parts?

I'm interested in writing a SQL-like query syntax for a CMS I work with. The idea would be that a CMS query could be written in a SQL-ish syntax, and I would convert that to execute through the CMS API.
There would be no field or table selection, so I need some way to get from this:
SELECT WHERE Something = 'something' AND (SomethingElse != 'something' OR AnotherThing == 'something')
Essentially then, I need some way to get the WHERE clauses grouped correctly based on their parentheticals and AND/ORs.
Is there some framework for doing this? Some example of when it's been done? I don't want to re-invent the wheel here, and I know someone else has to have done this in the past.
The answer is yes, there are many frameworks that work in an analog of SQL and convert to SQL. Linq and various Linq translators are a prime example. Knowing exactly which CMS you're working with, and thus which language and platform you're developing in, would be helpful. Some .NET ORMs that support code queries are:
NHibernate - allows use of a SQL-ish language called HQL in strings, or more code-based query construction using expression lists and Linq.
Linq2SQL - On its way out, but for your simpler applications it should be fine. The framework generates DAO classes that map between tables and your domain objects, and you can use coded Linq queries to work with the classes very much like the real tables.
And of course you can use good ol' vanilla ADO.NET with a string SQL query. This has numerous drawbacks, but if you want to have queries in your code, why not make them real SQL? If you wanted to hide your table structure, you could translate table names before submitting queries, so the SQL contained at the web layer (shudder) won't run against your DB.
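For the narrower problem in the question (grouping the WHERE conditions by their parentheses and AND/OR operators), a small hand-written recursive-descent parser is often enough. The following is a minimal sketch, not an existing framework: `tokenize`, `Parser` and `parse_where` are hypothetical names, it handles only the text after WHERE, only the operators `=`, `==` and `!=`, only single-quoted string values, and it assumes the usual convention that AND binds tighter than OR.

```python
import re

# One token per match: parenthesis, comparison operator, quoted string, or word.
TOKEN = re.compile(r"\s*(?:(\(|\))|(==|!=|=)|'([^']*)'|(\w+))")


def tokenize(text):
    """Split the condition text into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        paren, op, string, word = match.groups()
        if paren:
            tokens.append((paren, paren))
        elif op:
            tokens.append(("OP", op))
        elif string is not None:
            tokens.append(("STR", string))
        elif word.upper() in ("AND", "OR"):
            tokens.append((word.upper(), word.upper()))
        else:
            tokens.append(("IDENT", word))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def expr(self):  # expr := term (OR term)*
        node = self.term()
        while self.peek() == "OR":
            self.take("OR")
            node = ("OR", node, self.term())
        return node

    def term(self):  # term := factor (AND factor)*
        node = self.factor()
        while self.peek() == "AND":
            self.take("AND")
            node = ("AND", node, self.factor())
        return node

    def factor(self):  # factor := '(' expr ')' | IDENT OP STR
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        name = self.take("IDENT")
        op = self.take("OP")
        return (op, name, self.take("STR"))


def parse_where(text):
    """Parse the condition text into a nested tuple tree."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree
```

Applied to the condition from the question, `parse_where` returns an `("AND", left, right)` tuple whose right-hand side is the parenthesised `("OR", ...)` group. Walking that tree is then the place to issue the corresponding CMS API calls.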

entity framework entity sql vs linq to entities

What's the purpose of Entity SQL? I mean, if you have LINQ to Entities, why would you need to write queries in strings? Are there any performance reasons or something?
LINQ to Entities does not allow you access to every feature of your database. Being able to "reach into" the database is sometimes necessary for advanced queries, either to pull them off in the first place or to improve the sometimes horrible choices that the LINQ to Entities system will make about your query.
That said, I believe that LINQ to Entities should be the first tool reached for. If the performance becomes a problem, or you have something more complex I would then encapsulate that problem piece in a stored procedure and call that. There is no reason for strings being used as the basis of queries these days.
ESQL does allow you to choose a collation on a where clause, something which isn't supported in LINQ-to-Anything. This can be genuinely useful. ESQL also allows you to specify the precise type you want returned when types inherit from each other (as opposed to LINQ's OfType, which returns instances of a certain type and any subtype). Beyond that, I can't think of a great reason to use it. It's occasionally nice to be able to build queries in strings, but DynamicQuery/Dynamic LINQ is generally good enough in the very rare cases where this is necessary.
I think (perhaps cynically) that the "real" purpose of ESQL is "it predates LINQ."
Regarding Godeke's point about fixing non-optimal queries, I have yet to see one I couldn't fix by changing the LINQ expression. Both ESQL and L2E end up as CCTs (canonical command trees), so the SQL generation pipeline is the same.