I am new to ANTLR and grammar writing. I need to parse INSERT/UPDATE/DELETE SQL queries to get details such as which table a row is being inserted into, updated in, or deleted from, and the list of columns and their values. ANTLR has good documentation, but if anyone can help me with a specific grammar for parsing these queries, it would be a great help.
There are various SQL grammars on the Wiki: http://www.antlr.org/grammar/list
Beware, though: they are contributed by ANTLR users, so chances are they have not been properly tested and/or contain bugs.
But why generate an SQL parser yourself? It would probably be easier to use an existing SQL parser. Just search for "SQL parser Java" (assuming you're working with Java), and you're bound to get dozens of hits.
Implementing a "decent" SQL parser is actually fairly hard. In SQL, one can write all kinds of complex statements (nested joins, ...), and people really do this, so you have to implement the "full" language. (I've seen SQL queries that cover tens of pages. Stupid, yes, but then you have to work with what you encounter).
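To make the point concrete, the sketch below (Python, standard library only) hand-parses exactly one shape of statement, `INSERT INTO t (c1, c2) VALUES (v1, v2)`, with integer and single-quoted string literals, and returns the table name plus a column-to-value map. The names `parse_simple_insert` and `Insert` are illustrative, not from any library. Anything else (subqueries, `INSERT ... SELECT`, expressions in `VALUES`, quoted identifiers, vendor syntax) is rejected, which is exactly the gap a full grammar or an existing parser has to cover.

```python
import re
from dataclasses import dataclass

# One token per match: a single-quoted string ('' escapes a quote), an integer,
# an identifier/keyword, or one punctuation character. Leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:('(?:[^']|'')*')|(\d+)|([A-Za-z_]\w*)|([(),;]))")


@dataclass
class Insert:
    table: str
    values: dict  # column name -> literal value


def tokenize(sql):
    sql, pos, tokens = sql.rstrip(), 0, []
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        string, number, ident, punct = m.groups()
        if string is not None:
            tokens.append(("STR", string[1:-1].replace("''", "'")))
        elif number is not None:
            tokens.append(("INT", int(number)))
        elif ident is not None:
            tokens.append(("ID", ident))
        else:
            tokens.append(("P", punct))
        pos = m.end()
    return tokens


def parse_simple_insert(sql):
    """Parse only: INSERT INTO table (col, ...) VALUES (literal, ...) [;]"""
    toks, pos = tokenize(sql), 0

    def take(kinds, expected=None):
        # Consume one token of an allowed kind (optionally a specific keyword/symbol).
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of statement")
        kind, value = toks[pos]
        if kind not in kinds or (expected and str(value).upper() != expected):
            raise SyntaxError(f"unexpected token {value!r}")
        pos += 1
        return value

    def paren_list(kinds):
        # '(' item (',' item)* ')'
        take({"P"}, "(")
        items = [take(kinds)]
        while (sep := take({"P"})) == ",":
            items.append(take(kinds))
        if sep != ")":
            raise SyntaxError(f"expected ')', got {sep!r}")
        return items

    take({"ID"}, "INSERT")
    take({"ID"}, "INTO")
    table = take({"ID"})
    columns = paren_list({"ID"})
    take({"ID"}, "VALUES")
    values = paren_list({"INT", "STR"})
    if toks[pos:] == [("P", ";")]:
        pos += 1
    if pos != len(toks):
        raise SyntaxError("trailing input after VALUES list")
    if len(columns) != len(values):
        raise SyntaxError("column count does not match value count")
    return Insert(table, dict(zip(columns, values)))
```

For example, `parse_simple_insert("INSERT INTO users (id, name) VALUES (1, 'O''Brien');")` returns `Insert(table='users', values={'id': 1, 'name': "O'Brien"})`, while an `INSERT ... SELECT` already raises `SyntaxError`.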
I suggest checking out SQL:2011 (the standard), which is rather comprehensive. However, that grammar may not be ANTLR-friendly, so be prepared for a fair bit of work.
You also have to worry about database/vendor specific extensions to standard SQL. PL/SQL (just the SQL sublanguage part) contains lots of Oracle-specific extensions. If you are facing PL/SQL stored procedures, and you want to do that table/column tracing, you may have to do the procedural part of PL/SQL too, and that's also pretty big.
Related
Is there an ANTLR grammar for just the where clause of an ANSI SQL query?
I am trying to parse the conditions in the WHERE clause and change the column in each condition to something that matches the indexes I have, so that performance improves.
I want to automate this, so I want to parse the dynamic SQL and change the WHERE clause dynamically.
Probably not.
I assume you've found an ANTLR grammar for all of ANSI SQL.
Picking out the subset that is just the where clause shouldn't really be hard.
What you'll have a harder time with is modifying the AST and regenerating source text, which I presume you want. ANTLR parsers provide no specific help in modifying the AST other than a library for hacking at the nodes as you visit them. Regenerating source is completely on you; it can be implemented by spitting out strings as you walk the tree or, better, by using StringTemplate. But what has happened now is that your interest in modifying the statement has turned into a big(?) engineering job before you can do it.
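A minimal sketch of the rewrite-then-regenerate step, assuming a parser has already produced the tree. The node classes (`Col`, `Lit`, `Cmp`, `BoolOp`) and the column mapping are hypothetical, not what ANTLR generates: `rename_columns` builds a new tree with column names swapped, and `to_sql` spits out strings while walking it, adding parentheses only where an OR group sits under an AND (or vice versa).

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types for a WHERE clause (what a parser might hand you).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[int, float, str]


@dataclass(frozen=True)
class Cmp:
    op: str  # "=", "<", ">=", ...
    left: object
    right: object


@dataclass(frozen=True)
class BoolOp:
    op: str  # "AND" or "OR"
    args: tuple


def rename_columns(node, mapping):
    """Return a new tree in which column names are replaced according to mapping."""
    if isinstance(node, Col):
        return Col(mapping.get(node.name, node.name))
    if isinstance(node, Cmp):
        return Cmp(node.op, rename_columns(node.left, mapping), rename_columns(node.right, mapping))
    if isinstance(node, BoolOp):
        return BoolOp(node.op, tuple(rename_columns(a, mapping) for a in node.args))
    return node  # literals are left unchanged


def to_sql(node, parent_op=None):
    """Regenerate SQL text by walking the tree and emitting strings."""
    if isinstance(node, Col):
        return node.name
    if isinstance(node, Lit):
        if isinstance(node.value, str):
            return "'" + node.value.replace("'", "''") + "'"
        return str(node.value)
    if isinstance(node, Cmp):
        return f"{to_sql(node.left)} {node.op} {to_sql(node.right)}"
    text = f" {node.op} ".join(to_sql(a, node.op) for a in node.args)
    # Parenthesize a boolean group nested under a different boolean operator.
    return f"({text})" if parent_op and parent_op != node.op else text
```

Everything the original text carried beyond the tree (comments, casing, layout) is lost on the way back out, which is part of why a tool with a proper pretty-printer is attractive.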
See my essay on "Life After Parsing" (from bio or by googling).
I think what you want is a program transformation system. These are tools that, given a grammar, can parse source code to ASTs and carry out transformations on the AST to achieve the effect you want. Pretty-printers to regenerate source text are usually provided.
Is it possible to implement my own database server that takes Oracle PL/SQL syntax as its basis? Or, to ask it differently: why do different database products (SQL Server, MySQL, SQLite, etc.) have different syntax? Can't they share a standard syntax for basic operations, including PL/SQL (excluding SQLite)? Why does everyone have a different syntax? Sorry for diverting the question into patent issues, but I could not find a better place to ask this.
Of course you can, but you have to parse the PL/SQL yourself into something other platforms understand. (You can use ANTLR, for example, as the parser tool; there is even a full-featured grammar for PL/SQL.) This is possible for small solutions with a small instruction set, but for large, full support of PL/SQL you need to be Oracle-sized.
To answer the why: two reasons:
There is no standard, so everyone picks his own;
You don't want customers to leave, so your own 'best' framework, incompatible with the others, is your USP: it prevents users from simply porting their code to another platform. They are stuck on yours.
I'm looking for a SQL implementation (and its editor) whose scripts can be translated into many other SQL dialects.
For example, I would write script file(s) in that SQL language and then translate them into script file(s) for other SQL dialects (e.g. MS SQL's, MySQL's, ...).
If you're sure to use only ANSI SQL to construct your scripts, you should be good to go.
I agree with @Justin Niessner: all SQL vendors pay attention to the SQL standards, notably core SQL-92. To take SQL Server as an example, although they find Sybase legacy code tricky to deprecate, they are not afraid to do so, and entirely new features (e.g. MERGE in MSSQL 2008) tend to extend their Standard SQL equivalents rather than reinvent the wheel.
For a product that has good standards compliance, take a look at Mimer:
Here at Mimer Information Technology, we pride ourselves on conforming to the SQL standard and we play an active role in the Database Languages standardization group which determines exactly what is SQL standard.
Mimer also provides extremely useful SQL validators for SQL-92, SQL-99 and SQL:2003 respectively.
I researched the same thing a while ago and found the Liquibase project. It is aimed at change tracking, but it also converts between different DBMSs. You can download the source code and see the datatype conversions across databases. The source is on GitHub; browse the Java files there and you'll probably find something helpful.
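As a rough illustration of the kind of datatype conversion such tools perform, here is a small hand-written Python mapping for a few common types. The per-dialect choices are common conventions, not Liquibase's actual tables, and `TYPE_MAP`, `column_type` and `create_table` are hypothetical names; check each vendor's documentation for the versions you target.

```python
# Generic type -> per-dialect spelling (common conventions; verify per version).
TYPE_MAP = {
    "boolean": {"postgresql": "BOOLEAN", "mysql": "TINYINT(1)", "sqlserver": "BIT", "oracle": "NUMBER(1)"},
    "varchar": {"postgresql": "VARCHAR({n})", "mysql": "VARCHAR({n})", "sqlserver": "NVARCHAR({n})", "oracle": "VARCHAR2({n})"},
    "timestamp": {"postgresql": "TIMESTAMP", "mysql": "DATETIME", "sqlserver": "DATETIME2", "oracle": "TIMESTAMP"},
    "text": {"postgresql": "TEXT", "mysql": "LONGTEXT", "sqlserver": "NVARCHAR(MAX)", "oracle": "CLOB"},
}


def column_type(generic, dialect, n=None):
    """Translate a generic column type into one dialect's spelling."""
    try:
        template = TYPE_MAP[generic][dialect]
    except KeyError:
        raise ValueError(f"no mapping for {generic!r} in {dialect!r}") from None
    if "{n}" in template:  # length-parameterized types need n
        if n is None:
            raise ValueError(f"{generic!r} needs a length")
        return template.format(n=n)
    return template


def create_table(table, columns, dialect):
    """columns: list of (name, generic_type, length_or_None)."""
    body = ", ".join(f"{name} {column_type(generic, dialect, n)}" for name, generic, n in columns)
    return f"CREATE TABLE {table} ({body})"
```

The same column list then yields `VARCHAR2(36)`/`NUMBER(1)` for Oracle and `NVARCHAR(36)`/`BIT` for SQL Server; a real converter also has to handle defaults, precision and vendor version differences.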
If all you want are basic operations, these are fairly universal. For instance:
SELECT
INSERT
DELETE
UPDATE
FROM
WHERE
JOIN
...are all at the most basic level the same across implementations.
However, the more complicated your scripts get, the more difficult it becomes to make them "universal". Things like aggregation, subqueries, cursors, while loops, functions, indexes, constraints, temp tables, variables, string manipulation, window operations etc. are all pretty much database-specific.
Some of these do have "universal" equivalents but the more generic you make your code the worse it will perform.
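Even something as basic as "give me the first N rows" is spelled differently per vendor. A minimal Python sketch of a dialect switch (the function name `first_n_rows` and the dialect keys are illustrative only): SQL Server uses `TOP`, MySQL, PostgreSQL and SQLite accept `LIMIT`, and the SQL:2008 form `FETCH FIRST n ROWS ONLY` is accepted by Oracle 12c and later. The demo checks the `LIMIT` variant against SQLite from the standard library.

```python
import sqlite3


def first_n_rows(dialect, columns, table, order_by, n):
    """Build a 'first n rows' query in the spelling a given dialect accepts."""
    cols, n = ", ".join(columns), int(n)  # int() keeps n from injecting SQL text
    if dialect == "sqlserver":
        return f"SELECT TOP ({n}) {cols} FROM {table} ORDER BY {order_by}"
    if dialect in ("mysql", "postgresql", "sqlite"):
        return f"SELECT {cols} FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect in ("oracle", "standard"):  # SQL:2008 form; Oracle 12c and later
        return f"SELECT {cols} FROM {table} ORDER BY {order_by} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")


if __name__ == "__main__":
    # Check the LIMIT spelling against SQLite from the standard library.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 10) for i in range(1, 6)])
    sql = first_n_rows("sqlite", ["id", "amount"], "orders", "amount DESC", 2)
    print(conn.execute(sql).fetchall())  # [(5, 50), (4, 40)]
```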
All I know is that PL/SQL is Oracle's and T-SQL is SQL Server's. I assume some things might be easier in one than in the other, but are there things I can do in PL/SQL that I can't do in T-SQL?
Are there fundamental differences that I should be aware of? If so, what are they?
T-SQL and PL/SQL are two completely different programming languages with different syntax, type system, variable declarations, built-in functions and procedures, and programming capabilities.
The only thing they have in common is direct embedding of SQL statements, and storage and execution inside a database.
(In Oracle Forms, PL/SQL is even used for client-side code, though integration with database-stored PL/SQL is (almost) seamless.)
The only fundamental difference is that PL/SQL is a procedural programming language with embedded SQL, and T-SQL is a set of procedural extensions for SQL used by MS SQL Server. The syntax differences are major. Here are a couple of good articles on converting between the two:
http://www.dba-oracle.com/t_convent_sql_server_tsql_oracle_plsql.htm
http://jopinblog.wordpress.com/2007/04/24/oracle-plsql-equivalents-for-ms-sql-server-t-sql-constructs/
They're not necessarily easier, just different. ANSI SQL is the standard code that's shared between them (the SELECT, WHERE and other query syntax), while T-SQL or PL/SQL is the control flow and advanced coding that the database vendor adds on top to let you write stored procedures and string queries together.
It's possible to run multiple queries using just ANSI-SQL statements, but if you want to do any IF statements or handle any errors, you're using more than what's in the ANSI spec, so it's either T-SQL or PL/SQL (or whatever the vendor calls it if you're not using Microsoft or Oracle).
One tidbit I can add: take what you know from one, and when using the other, forget what you know (except for using set-based logic whenever possible).
One example of the differences is cursors. They are typically considered a less-than-ideal solution in T-SQL unless there is a really good reason to use them, which there often is not. In Oracle, cursors are much more optimized; for example, they have bulk abilities, that is, the ability to work on a set of data much like a normal SQL statement can. So in Oracle, using a cursor isn't an instant failed code review, whereas it might be in a T-SQL code review.
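The set-based versus row-by-row distinction shows up from client code too. The sketch below uses Python's built-in `sqlite3`, not T-SQL or PL/SQL, and the table and function names are made up: the first function fetches rows and updates them one at a time, cursor-style; the second does the same work in one set-based `UPDATE`. `executemany` only batches client calls when per-row values really come from outside; it is loosely comparable to, not the same as, Oracle's bulk binding.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO accounts (balance, status) VALUES (?, ?)",
        [(100.0, "active"), (0.0, "active"), (-5.0, "active"), (250.0, "closed")],
    )
    return conn


def flag_overdrawn_row_by_row(conn):
    # Cursor-style: fetch each row, decide in client code, issue one UPDATE per row.
    rows = conn.execute("SELECT id, balance FROM accounts WHERE status = 'active'").fetchall()
    for account_id, balance in rows:
        if balance < 0:
            conn.execute("UPDATE accounts SET status = 'overdrawn' WHERE id = ?", (account_id,))


def flag_overdrawn_set_based(conn):
    # Set-based: one statement; the database finds and updates every matching row.
    conn.execute("UPDATE accounts SET status = 'overdrawn' WHERE status = 'active' AND balance < 0")


def apply_fees_bulk(conn, fees):
    # fees: list of (amount, account_id) pairs supplied from outside the database.
    conn.executemany("UPDATE accounts SET balance = balance - ? WHERE id = ?", fees)
```

Both flagging functions leave the table in the same state; the set-based one just lets the database do the looping.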
Overall, T-SQL is much easier to learn, as there's not much to it as far as languages go. PL/SQL is a richer language, and therefore more complicated. It is not a hard language to pick up if you have a good book. Overall, I really like PL/SQL for its depth and I really like T-SQL for its simplicity.
PL/SQL and T-SQL are extensions for SQL. PL/SQL is used for Oracle databases and T-SQL is used for Microsoft databases.
Here is more useful information:
http://techdifferences.com/difference-between-t-sql-and-pl-sql.html
I'm currently looking at some lightweight SQL abstraction modules. My workflow is such that I usually write SELECT queries manually, and INSERT/UPDATE queries via subs which take hashes.
Both of these modules seem perfect for my needs and I have a hard time deciding. SQL::Interp claims SQL::Abstract cannot provide the full expressivity of SQL, but discusses no other differences.
Does it have any disadvantages? If so, which?
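For readers outside Perl, the "subs which take hashes" pattern looks roughly like this in Python. These are hypothetical helpers, not SQL::Abstract's API: dictionary keys become column names and values become `?` bind parameters, so no value is ever pasted into the SQL text. Table and column names are trusted as-is here; real code should check them against a whitelist.

```python
import sqlite3


def insert_sql(table, row):
    """Build an INSERT from a dict: keys are columns, values become binds."""
    cols = sorted(row)  # fixed order keeps SQL text and bind list aligned
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", [row[c] for c in cols]


def update_sql(table, changes, where):
    """Build UPDATE ... SET col = ? ... WHERE col = ? AND ... from two dicts."""
    if not where:
        raise ValueError("refusing to UPDATE without a WHERE condition")
    set_cols, where_cols = sorted(changes), sorted(where)
    set_clause = ", ".join(f"{c} = ?" for c in set_cols)
    where_clause = " AND ".join(f"{c} = ?" for c in where_cols)
    binds = [changes[c] for c in set_cols] + [where[c] for c in where_cols]
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}", binds


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.execute(*insert_sql("people", {"name": "Ann", "age": 30}))
    conn.execute(*update_sql("people", {"age": 31}, {"name": "Ann"}))
    print(conn.execute("SELECT name, age FROM people").fetchall())  # [('Ann', 31)]
```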
I can't speak to SQL::Interp, but I use SQL::Abstract and it's pretty good. In conjunction with DBIx::Connector and plain old DBI, I was able to totally eliminate the use of an ORM in my system with very little downside.
The only limitations I have run into are that it's not possible to write GROUP BY queries directly (although it's easy to do by simply appending to the generated query), and that LIMIT queries are handled by a separate extension, SQL::Abstract::Limit.
I used SQL::Abstract for over a year, and then switched to SQL::Interp, which I've stuck with since.
SQL::Abstract had trouble with complex clauses. For the ones it could support, you would end up with a nest of "(", "[" and "{" characters, which you then had to mentally translate back into "AND", "OR" or actual parentheses.
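The bracket nesting comes from SQL::Abstract's convention that a hash means AND and an array means OR. A rough Python imitation of just that convention (the function `where` is hypothetical and far less capable than SQL::Abstract) shows how boolean logic turns into nested brackets: a dict is AND, a list is OR, and a list as a column's value means "this column equals any of these".

```python
def where(cond, nested=False):
    """dict -> AND of its items, list -> OR of its elements; returns (sql, binds)."""
    if isinstance(cond, list):
        parts = [where(c, nested=True) for c in cond]
        sql = " OR ".join(s for s, _ in parts)
        binds = [b for _, bs in parts for b in bs]
        return (f"({sql})" if len(parts) > 1 else sql), binds
    if isinstance(cond, dict):
        parts = []
        for col in sorted(cond):  # sorted for deterministic output
            val = cond[col]
            if isinstance(val, list):  # several values for one column -> OR
                parts.append(where([{col: v} for v in val], nested=True))
            else:
                parts.append((f"{col} = ?", [val]))
        sql = " AND ".join(s for s, _ in parts)
        binds = [b for _, bs in parts for b in bs]
        # Wrap a multi-item AND only when it sits inside an OR.
        return (f"({sql})" if nested and len(parts) > 1 else sql), binds
    raise TypeError("condition must be a dict (AND) or a list (OR)")
```

For instance, `where([{"a": 1, "b": 2}, {"c": 3}])` gives `("((a = ? AND b = ?) OR c = ?)", [1, 2, 3])`: readable enough at this size, much less so a few levels deeper.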
SQL::Interp has no such limitations and uses no middle representation. Your SQL looks like SQL with bind variables where you want them. It works for complex queries as well as simple ones. I find SQL::Interp especially pleasant to use in combination with DBIx::Simple's built-in support for it. DBIx::Simple+SQL::Interp is a friendly and intuitive replacement for using raw DBI. I use the combination in a 100,000k+ LoC mod_perl web app.
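The SQL::Interp idea (SQL text interleaved with values, which are turned into placeholders and binds) can be imitated in a few lines. This is a hypothetical Python sketch, not SQL::Interp's interface, which marks values with Perl references: plain strings stay SQL text, `Val(x)` becomes a `?` with `x` bound, and `Val([...])` becomes a parenthesized placeholder list for `IN`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    """Marks a value to be bound rather than pasted into the SQL text."""
    value: object


def interp(*parts):
    """Join SQL fragments; Val() parts become placeholders. Returns (sql, binds)."""
    sql, binds = [], []
    for part in parts:
        if isinstance(part, str):
            sql.append(part)  # literal SQL text
        elif isinstance(part, Val) and isinstance(part.value, (list, tuple)):
            if not part.value:
                raise ValueError("empty IN-list")
            sql.append("(" + ", ".join("?" for _ in part.value) + ")")
            binds.extend(part.value)
        elif isinstance(part, Val):
            sql.append("?")
            binds.append(part.value)
        else:
            raise TypeError(f"wrap values in Val(), got {part!r}")
    return " ".join(sql), binds


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])
    sql, binds = interp("SELECT name FROM users WHERE id IN", Val([1, 3]),
                        "AND name <>", Val("bob"), "ORDER BY id")
    print(conn.execute(sql, binds).fetchall())  # [('ann',), ('cy',)]
```

The query still reads as SQL, which is the main point the answer above makes in SQL::Interp's favor.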