Is there any framework for parsing a SQL-like query into its component parts?

I'm interested in writing a SQL-like query syntax for a CMS I work with. The idea would be that a CMS query could be written in a SQL-ish syntax, and I would convert that to execute through the CMS API.
There would be no field or table selection, so I need some way to parse a query like this:
SELECT WHERE Something = 'something' AND (SomethingElse != 'something' OR AnotherThing == 'something')
Essentially, then, I need some way to get the WHERE clauses grouped correctly based on their parentheses and AND/ORs.
Is there some framework for doing this? Some example of when it's been done? I don't want to re-invent the wheel here, and I know someone else has to have done this in the past.

The answer is yes, there are many frameworks that work in an analog of SQL and convert to SQL. Linq and various Linq translators are a prime example. Knowing exactly which CMS you're working with, and thus which language and platform you're developing in, would be helpful. Some .NET ORMs that support code queries are:
NHibernate - allows use of a SQL-ish language called HQL in strings, or more code-based query construction using expression lists and Linq.
Linq2SQL - On its way out, but for your simpler applications it should be fine. The framework generates DAO classes that map between tables and your domain objects, and you can use coded Linq queries to work with the classes very much like the real tables.
And of course you can use good ol' vanilla ADO.NET with a string SQL query. This has numerous drawbacks, but if you want to have queries in your code, why not make them real SQL? If you wanted to hide your table structure, you could translate table names before submitting queries, so the SQL contained at the web layer (shudder) won't run against your DB.
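For the grouping problem itself (parentheses plus AND/OR), a small hand-written recursive-descent parser may already be enough. The following is a minimal sketch in Python, not part of any of the frameworks mentioned above. The function names (tokenize, parse_query), the tuple-based tree shape, and the supported operators (=, ==, !=) are assumptions made for illustration. Only quoted string values are handled, and AND binds tighter than OR, as in SQL.

```python
import re

# One token: a parenthesis, a comparison operator, a quoted string, or an identifier.
TOKEN_RE = re.compile(r"\s*(\(|\)|==|!=|=|'[^']*'|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Split the query text into a flat list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_query(text):
    """Parse "SELECT WHERE <conditions>" into a nested tuple tree."""
    tokens = tokenize(text)
    if [t.upper() for t in tokens[:2]] != ["SELECT", "WHERE"]:
        raise ValueError("query must start with SELECT WHERE")
    pos = 2

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return token

    def parse_or():
        # Lowest precedence: a OR b OR c
        node = parse_and()
        while peek() is not None and peek().upper() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        # Binds tighter than OR: a AND b AND c
        node = parse_term()
        while peek() is not None and peek().upper() == "AND":
            take()
            node = ("AND", node, parse_term())
        return node

    def parse_term():
        # Either a parenthesized group or a single "field op value" comparison.
        token = take()
        if token == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        field, op, value = token, take(), take()
        if op not in ("=", "==", "!="):
            raise ValueError(f"unknown operator {op!r}")
        return (op, field, value.strip("'"))

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree
```

The resulting tree can then be walked recursively and translated node by node into whatever calls the CMS API expects. For anything beyond this, a parser generator or parser-combinator library for your platform is probably the better route.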

Related

Efficiently display results from multiple joins

In a JPA project I need to display a table whose data comes from 5 related entities.
Without JPA I could write a SQL query which joins the 5 database tables together and filters according to some criteria.
Suppose that the fields involved in the filtering criteria are only those of the first entity.
Using JPA I can load filtered instances of the first entity and navigate through the properties till the final entity.
My concern is that, this way, the number of queries to the database can explode if I cannot use, or make a mistake with, the fetch = FetchType.EAGER annotation attribute.
What is the best approach in such cases?
I would like to have strict control over the SQL queries that will be executed, so I can optimize them, but if I write the SQL query with the joins by hand, do I have to use the 'old' ResultSet to retrieve the data?

You can use JPA's built-in query language, the JPQL, can't you? (It does have a JOIN operator for sure.) Be aware though that this is not standard SQL, only something similar, so read the JPQL docs thoroughly. Yes, this is still plain text queries embedded in Java code, which is a shame, but hey, that's how far Java can go supporting the development process.
The main advantage here is that you get entity objects as the result of your queries - although you still need to cast them from Object. You can also use the objects (records) and their member variables (attributes) directly in the query string, so this is a step up from good old JDBC.
Alternatively you could also choose the Criteria API, but frankly, my experiences were not very good with it. The syntax is quite horrible and you basically end up building the low-level query yourself. This is clearly Java at its worst... but at least Strings containing queries can be eliminated from the code. I'm not sure it's worth it though.
Check this page for more information and examples:
http://download.oracle.com/javaee/6/tutorial/doc/gjise.html

Is there an existing piece of software that allows you to (easily) build queries through a webpage?

I would like to build arbitrary queries to a database, by allowing the user to build queries "on the fly". For every object/table, being able to select its attributes, and then "building" the query (that would translate into a SQL statement) and finally launching it, all through a web interface.
The ticketing system "rt" does that, for example, and another example would be the http://gatherer.wizards.com/Pages/Advanced.aspx webpage.
I'm currently programming in Rails, but any existing solution that implements this (or something similar) would be welcome.

Be careful when creating dynamically generated queries like this, which will need to be executed via something like sp_executesql (on MS SQL Server, for example). Make sure you cover all of your bases so that your application isn't vulnerable to SQL injection attacks; this type of development can get you into a lot of trouble if it's done incorrectly. I would recommend storing all queries in a table and only reading queries from that table, to help isolate the queries that are run in your application. Identify each one with a label, and let the end user choose the label from a dropdown list control on the frontend.
Good luck. I'm not aware of any software that will help with this.

Not quite sure what your use case is here, but I would say check out the Doctrine ORM (Object Relational Mapper).
Edit: After reading more and looking at the example, I would only suggest Doctrine for a large website.
Then use Doctrine's DQL syntax with some JavaScript/jQuery magic for the forms.

Note that the queries you're referencing aren't arbitrary: they're on a very specific problem domain, on a specific set of SQL tables.
That said, if I were you I'd look into how people are building SQL queries with JavaScript. Something like these:
http://code.google.com/p/django-querybuilder/
http://css.dzone.com/articles/sqlike-sql-querying-engine?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+zones%2Fria+(RIA+Zone)
http://thechangelog.com/post/4914956307/rel-arel-ported-to-node-js-with-some-changes
That'll at least get you a good idea of the underlying data structures.

Complex search across grails domain classes

I want to be able to do complex searches on Grails objects. This is presently implemented on my site using stored procedures that build up a SQL query and parse all of the results.
Will something like the searchable plugin allow me to simplify this task? My guess is not, since it mostly is doing text based searching. The stored procedures are quite complex, and hard to change. Our users are employees, and the queries have to do with amounts of job experience and who they were working for, what skills they have, etc. The Employee domain object would have things like a list of roles that contain skills. The role would have a start and end date, etc.
A list of example queries:
All users with 5 years of experience in C++
All users who have worked for Stackoverflow, in California
All users who have at least 5 years of C++ experience, at least 2 years of Java experience, have worked for StackOverflow, and are available to work now.

I've never tried the searchable plugin so may be selling it short. Your best bet is probably HQL queries or Hibernate criteria builder. I like HQL for complex queries since it's similar to SQL. For a blog post comparing the use of these technologies from Grails see
http://blog.xebia.com/2008/06/04/querying-associations-in-grails-with-hql-criteria-and-hibernatecriteriabuilder/
For the HQL reference see
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/queryhql.html
For Hibernate criteria see
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html

You could also have a look at the FilterPane plugin to help you (or the user) build these queries. There is a GUI that can handle most such questions, or you could add the FilterPane fields yourself in the HTML and controller if you prefer to keep control over the queries.
FilterPane then translates the fields to Hibernate criteria builder internally.

All of the queries you have listed can be accomplished with the searchable plugin.
I believe you CAN do the queries you have suggested with HQL, but Compass/Lucene is really the better tool for that job.

In my experience HQL queries were the only solution for complex queries.
Some queries even made it necessary to use non-Hibernate functions of the underlying DB, e.g. setting dialect = "org.hibernate.dialect.ExtendedMySqlDialect" in DataSource.groovy and then implementing something like this:
    package org.hibernate.dialect;

    import org.hibernate.Hibernate;
    import org.hibernate.dialect.function.*;

    public class ExtendedMySqlDialect extends MySQL5InnoDBDialect {
        public ExtendedMySqlDialect() {
            // Expose MySQL's TIMESTAMPADD/TIMESTAMPDIFF to HQL under these names.
            registerFunction("timeStampAdd", new SQLFunctionTemplate(Hibernate.TIMESTAMP, "TIMESTAMPADD(?1, ?2, ?3)"));
            registerFunction("timeStampDiff", new SQLFunctionTemplate(Hibernate.INTEGER, "TIMESTAMPDIFF(?1, ?2, ?3)"));
        }
    }
The above functions might also help you when dealing with dates.

When to use an ORM (Sequel, Datamapper, AR, etc.) vs. pure SQL for querying

A colleague of mine is currently designing SQL queries like the one below to produce reports, which are displayed in Excel files through an external data query.
At present, only reporting processes on the DB are required (no CRUD operations).
I am trying to convince him that it would be better to use a Ruby ORM in order to be able to display the data in a Rails/Sinatra app.
Apart from the obvious advantages in displaying the data, what advantages are there for him in learning to use an ORM like Sequel or DataMapper?
The SQL queries he is writing are clearly quite complex, and being relatively new to SQL, he often complains that it is very time-consuming and confusing.
Is it possible to write extremely complex queries with an ORM? If so, which is the most suitable (I have heard Sequel is good for legacy DBs)? And what are the advantages of learning Ruby and using an ORM, versus sticking with plain SQL, for making complex database queries?

I'm the DataMapper maintainer, and I think for complex reporting you should use SQL.
While I do think someday we'll have a DSL that provides the power and conciseness of SQL, everything I've seen so far requires you to write more Ruby code than SQL for complex queries. I would much rather maintain a 5 line SQL query than 10-15 lines of Ruby code to describe the same complex operation.
Please note I say complex. If you have something simple, use the ORM's built-in finders. However, I do believe there is a line you can cross where SQL becomes simpler. Now, most apps aren't just reporting. You may have a lot of CRUD-type operations, for which an ORM is perfectly suited and far better than doing those things by hand.
One thing that an ORM will usually provide is some sort of organization to your application logic. You can group code based around each model in the same file. It's usually there that I'll put the complex SQL query, rather than embedding it in the controller, e.g.:
    class User
      include DataMapper::Resource

      property :id,   Serial
      property :name, String,  :length => 1..100, :required => true
      property :age,  Integer, :min => 1, :max => 130

      def self.some_complex_query
        repository.adapter.select <<-SQL
          SELECT ...
          FROM ...
          WHERE ...
          ... more complex stuff here ...
        SQL
      end
    end
Then I can just generate the report using User.some_complex_query. You could also push the SQL query into a view if you wanted to further clean up this code.
EDIT: By "view" in the above sentence I meant RDBMS view, rather than view in the MVC context. Just wanted to clear up any potential confusion.

If you are writing your queries by hand, you have the chance to optimize them. When I look at that query I see some potential for optimization (E.ICGROUPNAME LIKE '%san-fransisco%' or E.ICGROUPNAME LIKE '%bordeaux%' won't use an index, which means a table scan).
When using an OR mapper (the native objects/tables) for reporting, you have little or no control over the resulting SQL query.
But you could put that query in a view or stored procedure and map that view/proc with an OR mapper. That way you can optimize your queries and still use all the features of your application framework.
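The view approach can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module and a plain dataclass instead of a real OR mapper. The table, view, and class names (entries, group_totals, GroupTotal) are hypothetical and are not taken from the query discussed in the question.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class GroupTotal:
    """Plain object mapped onto one row of the hypothetical report view."""
    group_name: str
    total: int


def build_db():
    """Create an in-memory database with sample data and the report view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entries (group_name TEXT, amount INTEGER);
        INSERT INTO entries VALUES ('bordeaux', 10), ('bordeaux', 5), ('paris', 7);
        -- The hand-tuned report query lives in the database as a view.
        CREATE VIEW group_totals AS
            SELECT group_name, SUM(amount) AS total
            FROM entries
            GROUP BY group_name;
        """
    )
    return conn


def load_group_totals(conn):
    """Read the view like a table and map each row to an object."""
    rows = conn.execute(
        "SELECT group_name, total FROM group_totals ORDER BY group_name"
    )
    return [GroupTotal(*row) for row in rows]
```

The application only ever sees group_totals as if it were a table, so the SQL inside the view can be tuned or rewritten without touching the mapping code.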

Unless you're dealing with objects, an ORM is not necessary. It sounds like your friend simply needs to generate reports, in which case pure SQL is just fine so long as he knows what he's doing (e.g. avoiding SQL injection issues).
ORM stands for "Object-Relational Mapping". If you don't have the "O" (objects), then it's probably not a good fit for your app. Where ORMs really shine is in persisting objects to the database and loading them from a database.

ORM stands for Object Relational Mapping - but looking at the query your friend seems to be wanting a pretty specific table of sums and other items... I've not used Ruby's Sequel, but I've used Hibernate, and Python's SQLAlchemy (for Django/Turbogears) and while you can do these sorts of queries, I don't believe that is their strength.
The power of an ORM comes from being able to find Foo->Bar object relationships: say you want all the Bar objects for Foo's field greater than X, that sort of thing. Therefore I would not classify an ORM as a "good" solution here, though moving to a real programming language like Ruby and doing the SQL through it instead of Excel is a win in itself.
Just my 2 cents.

In a situation like that, I'd probably write them by hand or use a view (if the DB you're using supports views).

ORMs are used when you have objects (business objects). I am therefore assuming that you have an application with which you are creating and managing the business objects that are ultimately saved into the database. If so, you almost certainly already have some representation of the relationships, and probably of many of the calculations, that you are going to use in reports. The problem with using SQL to directly access your database for reports is simply maintainability.
You typically put a lot of effort into ensuring that your business objects hide any details of their database. You implement business rules and do common calculations in your business objects, build a common language for all members of the team, and so on. You then use an ORM such as Habanero or NHibernate to map to the database. We do all this in the name of maintainability, and it is great: you can migrate your application, change your design, and so on.
Then you go and write SQL to run reports, and over time you have hundreds of reports. Firstly, they often duplicate logic you already have in your business objects (usually without any tests). Even worse, maintainability is now stuffed: forget about moving a field from one table to another, splitting a table into two, or changing a relationship, because you have a number of reports that are going to break unexpectedly.
The problem with querying through your domain objects/business objects is simply one of performance.
In summary, if you are using Domain-Driven Design or business object concepts, try to use these for reports. (You will probably run some reports directly against the DB using SQL or stored procs for performance reasons, but try to limit these: use your business objects first, and then use SQL.)
The other option, of course, is to use a separate reporting database (as in some BI concepts). The mapping from your transactional DB to your reporting DB is then in one place and is easy to change when you want to change your design.
Domain objects (business objects) and ORMs have all the knowledge needed to start building high-performing queries that run directly on the database while using the domain terminology. Let's hope that these continue to evolve to the point where this is a reality.
Until then, if you are using business objects in your application, try to use them for reporting; when performance is an issue, resort to SQL.

Database Abstraction - supporting multiple syntaxes

In a PHP project I'm working on we need to create some DAL extensions to support multiple database platforms. The main pitfall we have with this is that different platforms have different syntaxes; notably, MySQL and MSSQL are quite different.
What would be the best solution to this?
Here are a couple we've discussed:
Class-based SQL building
This would involve creating a class that allows you to build SQL queries bit-by-bit. For example:
$stmt = new SQL_Stmt('mysql');
$stmt->set_type('select');
$stmt->set_columns('*');
$stmt->set_where(array('id' => 4));
$stmt->set_order('id', 'desc');
$stmt->set_limit(0, 30);
$stmt->exec();
It does involve quite a lot of lines for a single query though.
SQL syntax reformatting
This option is much cleaner - it would read SQL code and reformat it based on the input and output languages. I can see this being a much slower solution as far as parsing goes however.

I'd recommend class-based SQL building, using Doctrine, Zend_Db or MDB2. And yeah, it requires more lines to write simple selects, but at least you get to rely on a parser and don't need to re-invent the wheel.
Using any DBAL is a trade-off in speed, and not just in database execution: the first time you use any of those it will be more painful than when you are really familiar with it. Also, I'm almost 100% sure that the generated code is not the fastest possible SQL query, but that's the trade-off I meant earlier.
In the end it's up to you. I wouldn't do it myself, although it's certainly not impossible; the question remains whether you can actually save time and resources (in the long run) by implementing your own DBAL.
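To make the class-based idea concrete, here is a minimal sketch of how such a builder can hide one dialect difference (paging). It is written in Python purely for illustration and does not reflect the API of Doctrine, Zend_Db or MDB2. The SelectStatement class and its methods are hypothetical and loosely mirror the SQL_Stmt example from the question. Table and column names are assumed to come from trusted code, not from user input.

```python
class SelectStatement:
    """Tiny SELECT builder that renders per-dialect SQL plus bound parameters."""

    def __init__(self, dialect, table):
        if dialect not in ("mysql", "mssql"):
            raise ValueError(f"unsupported dialect: {dialect}")
        self.dialect = dialect
        self.table = table
        self.columns = "*"
        self.where = {}
        self.order = None
        self.limit = None

    def set_where(self, conditions):
        self.where = dict(conditions)
        return self

    def set_order(self, column, direction="asc"):
        self.order = (column, direction.upper())
        return self

    def set_limit(self, offset, count):
        self.limit = (int(offset), int(count))
        return self

    def render(self):
        """Return (sql, params); values are never interpolated into the SQL."""
        sql = f"SELECT {self.columns} FROM {self.table}"
        params = list(self.where.values())
        if self.where:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in self.where)
        if self.order:
            sql += f" ORDER BY {self.order[0]} {self.order[1]}"
        if self.limit:
            offset, count = self.limit
            if self.dialect == "mysql":
                sql += f" LIMIT {offset}, {count}"
            else:
                # SQL Server 2012+ paging syntax; it requires an ORDER BY clause.
                if not self.order:
                    raise ValueError("mssql paging needs set_order() first")
                sql += f" OFFSET {offset} ROWS FETCH NEXT {count} ROWS ONLY"
        return sql, params
```

The calling code stays the same for both platforms, for example SelectStatement("mssql", "users").set_where({"id": 4}).set_order("id", "desc").set_limit(0, 30).render(); only the dialect argument changes. A real DBAL has to handle far more than this (quoting, joins, type mapping), which is a large part of the argument for using an existing one.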

A solution could be to have different sets of queries for different platforms, keyed by ID, something like:
MySql: GET_USERS = "SELECT * FROM users"
MsSql: GET_USERS = ...
PgSql: GET_USERS = ...
Then on startup you load the needed set of queries and refer to them by ID:
Db::loadQueries($platform);
$users = $db->query(GET_USERS);
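A minimal sketch of this scheme, in Python for illustration: the QUERY_SETS table, the QueryCatalog class, and the GET_RECENT_USERS query ID are hypothetical names, chosen because a "latest ten rows" query is one place where the three platforms actually differ.

```python
# One set of queries per platform, all keyed by the same query IDs.
QUERY_SETS = {
    "mysql": {"GET_RECENT_USERS": "SELECT * FROM users ORDER BY id DESC LIMIT 10"},
    "mssql": {"GET_RECENT_USERS": "SELECT TOP 10 * FROM users ORDER BY id DESC"},
    "pgsql": {"GET_RECENT_USERS": "SELECT * FROM users ORDER BY id DESC LIMIT 10"},
}


def missing_query_ids(query_sets):
    """Report, per platform, the query IDs that other platforms define but it lacks."""
    all_ids = set()
    for queries in query_sets.values():
        all_ids.update(queries)
    return {
        platform: sorted(all_ids - set(queries))
        for platform, queries in query_sets.items()
        if all_ids - set(queries)
    }


class QueryCatalog:
    """Loads the query set for one platform at startup and serves queries by ID."""

    def __init__(self, platform):
        try:
            self._queries = QUERY_SETS[platform]
        except KeyError:
            raise ValueError(f"no query set for platform {platform!r}") from None

    def get(self, query_id):
        return self._queries[query_id]
```

Application code then asks for QueryCatalog(platform).get("GET_RECENT_USERS") and never sees dialect-specific SQL. A check like missing_query_ids is worth running in tests, since the main risk of this design is one platform's set silently falling behind the others.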

Such a scheme would not take account of all the richness which SQL offers, so you would be better off with code-generated stored procs for all your tables for each DB.
Even if you use parametrized stored procs which are more database model-aware (i.e. they do joins or are user-aware and so are optimized for each vendor), that's still a great approach. I always view the database interface layer as providing more than just simple tables to the application, because that approach can be bandwidth-intensive and roundtrip wasteful.

If you have a set of backends that support it, I would agree that generating stored procedures to form a contract is the best approach. This approach, however, doesn't work if you have a backend with limited stored procedure capabilities, in which case you build an abstraction layer to implement SQL, or generate target-specific SQL based on an abstract/limited SQL syntax.
