Which language has a good SQL parsing library?

I'm looking for a good SQL parser, one that will work with subselects, non-SELECT queries, CTEs, window functions and other legal SQL elements.
The result would be some kind of abstract syntax tree that I could later work on.
The language is mostly irrelevant, as I am willing to learn a new language just to use the library, if it exists.
I know that it is technically possible to extract the parser from some open source database, but it's far from easy (at least for the PostgreSQL parser, which is what I need).

There's a non-validating SQL parser in Python: python-sqlparse. The tokens are exposed as objects. I doubt it supports "other legal SQL statements", window functions, and the like, though, as those are controlled by vendor-specific grammars and no vendor is technically fully compliant with the SQL standards.
Um (knowing that you're willing to learn a new language), why would you need to work on the syntax tree? If you need some magic in dealing with the database, you probably don't need to reinvent the wheel: Python has a fantastic database toolkit, SQLAlchemy.
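To make "non-validating" and "tokens exposed as objects" concrete, here is a minimal sketch using only the Python standard library. It is not python-sqlparse's API; the `Token` class, the `tokenize` function and the tiny `KEYWORDS` set are all hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny keyword set; real dialects have hundreds.
KEYWORDS = {"SELECT", "FROM", "WHERE", "WITH", "AS", "OVER", "PARTITION", "BY", "ORDER"}

TOKEN_RE = re.compile(
    r"""
    (?P<whitespace>\s+)
  | (?P<string>'(?:[^']|'')*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<punct>[(),;.*=<>+\-/])
    """,
    re.VERBOSE,
)


@dataclass(frozen=True)
class Token:
    kind: str
    text: str


def tokenize(sql):
    """Split SQL text into Token objects without checking the grammar."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            # Non-validating: keep unrecognised characters instead of failing.
            tokens.append(Token("unknown", sql[pos]))
            pos += 1
            continue
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "whitespace":
            continue
        if kind == "name" and text.upper() in KEYWORDS:
            kind = "keyword"
        tokens.append(Token(kind, text))
    return tokens
```

A tokenizer like this happily accepts nonsense such as "FROM FROM SELECT", which is exactly why it cannot stand in for a vendor grammar.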

You can google "sql parser". One of the results is General SQL Parser. Here are some highlighted features listed on its official website:
Offline SQL syntax check
Highly customizable SQL formatter
In-depth analysis of SQL script
Full access to SQL query parse tree
Custom SQL engine for various databases
Major programming language support
It's a commercial SQL library.

Our DMS Software Reengineering Toolkit has full PL/SQL and ANSI SQL 2011 parsers (to ASTs) and prettyprinters (ASTs back to valid text). Neither of these is PostgreSQL, but DMS has a dialect mechanism that enables one to build a dialect from a base grammar relatively easily, by revising just some of the grammar rules and retaining the rest. Doing this from the SQL 2011 grammar seems like a practical way to tackle the problem.
DMS also offers facilities to access/traverse/modify the ASTs, both procedurally and in terms of surface-syntax patterns and transformations. Think of this as "life beyond parsing".

Related

Why is SQL called a data sublanguage?

Recently I read (in a PDF document, SQL For Dummies) that SQL is actually a data sublanguage and not a programming language like C++, Java or C#. Now I am a bit confused: since you can develop things with SQL, I thought it was similar to other programming languages.
Could anyone explain to me what the difference is? Thanks
Try to write a simple but non-trivial application using nothing but standard SQL that asks the user to input their name, and outputs "Hello, <name>."
Maybe you could do it with some vendor-specific extensions, but then it wouldn't be standard SQL.
SQL is designed to be a domain-specific language for database queries. It's meant to be used in combination with a more fully-featured language. The SQL standard defines ways that you can write lines of SQL within the code file of C, C++ or other languages. There's no standard way to write a full standalone app using just SQL.
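As a rough illustration of that division of labour, here is a sketch in Python using the standard-library sqlite3 module. The `greet` function and the `users` table are invented for this example; the host language would own the user interaction, and SQL only stores and queries the data.

```python
import sqlite3


def greet(name):
    """Store a name and build the greeting with a SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
        # The parameter placeholder keeps the user's input out of the SQL text.
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        row = conn.execute(
            "SELECT 'Hello, ' || name || '.' FROM users"
        ).fetchone()
        return row[0]
    finally:
        conn.close()
```

Prompting for the name and printing the result (for instance `print(greet(input("Your name: ")))`) happens entirely in the host language; nothing in the SQL statements themselves can do it.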
Read the standard. The SQL/PSM part defines a full-blown programming language with loops and IF-THEN-ELSE and what have you. SQL/PSM was initially amended to the existing standard in 1996 and formally included in SQL:1999.
As Bill Karwin hinted at, SQL does not have features for UI interaction, but then the very same thing is true of the Java language (no, the Swing package is not part of the language), the COBOL language, the ALGOL language, and many, many others.
SQL began life as a data sublanguage. That history is >20yrs behind us. (The "data sublanguage" portion is still the most-used and most-useful part, but that does not change the fact that technically speaking, SQL-the-language has everything it takes to be regarded as a full-blown programming language.)
According to Wikipedia:
the term "sublanguage", first used for this purpose by E. F. Codd in 1970, refers to a computer language used to define or manipulate the structure and contents of a relational database management system (RDBMS). Typical sublanguages associated with modern RDBMS's are QBE (Query by Example) and SQL (Structured Query Language).
That means that sublanguages cannot be used to develop standalone applications, but they can be combined with other programming languages to manage the application-database interaction.
Standard SQL can only be used for defining database schemas and for querying and updating data. Database vendors have extensions to SQL (like Microsoft's T-SQL) that are full-blown programming languages.

Using Oracle SQL syntax for custom developed database server

Is it possible to implement my own database server taking Oracle PL/SQL syntax as the basis? I would also like to ask why different database solutions (SQL Server, MySQL, SQLite, etc.) have different syntax. Can't they share a standard syntax for basic operations, including PL/SQL (excluding SQLite)? Why does everyone have a different syntax? Sorry for diverting the question into patent issues, but I could not find a better place to ask this.
Of course you can, but you have to parse the PL/SQL yourself into something other platforms understand. (You can use ANTLR, for example, as the parser tool. There is even a full-featured grammar for PL/SQL.) This is feasible for small solutions with a small instruction set, but for large-scale, full support of PL/SQL you need to be Oracle-sized.
To answer the why: two reasons:
There is no standard, so everyone picks his own;
You don't want customers to leave, so your own 'best' framework that is incompatible with the others becomes your USP, and it prevents users from simply porting their code to another platform. They are stuck on yours.

JDBC SQL: Where is the detailed specification?

Everybody loves to mention how JDBC abstracts away vendor-specific differences between SQLs to present a single SQL flavor that would work against a whole slew of them.
But no book or reference on JDBC ever mentions a (detailed) specification or even a decent, user-space coverage of this SQL supported by (a specific version of) JDBC, say JDBC 4.1!
So, what ends up happening (at least with me) is that, if I'm working with MySQL, I must refer to the MySQL reference manual and then try to guard myself against accidentally using MySQL-specific features. For writing portable SQL (at least at the level supported by the JDBC driver version I'm using), I would prefer to refer to a JDBC spec or to an SQL spec directly instead of referring to MySQL, PostgreSQL, etc.
Is the SQL standard itself (2008, 2003, etc), on which a particular version of JDBC is based, freely available? Or, do I have to shell out $$ to get a copy?
There is no "JDBC SQL", just ISO SQL and the vendor implementations of it. JDBC defines the interface for talking to SQL databases; it's a different layer from the query language itself.
The reference for JDBC itself is the JSR documentation:
JDBC 4.0
JDBC 4.1
Unfortunately the official SQL standards are expensive and must be purchased from the ISO.
You can find late-stage drafts that're perfectly good for reference when you're not trying to develop a conforming implementation here among other places.
The SQL spec isn't the most friendly and readable of things, so in practice it's a good idea to use vendor documentation that's actually intended to be read by human beings. You can compare a couple of vendor docs or fall back on the standard doc when uncertainty arises.
Compliance with the spec isn't exactly ideal across DBs; writing code strictly to the spec doesn't necessarily mean it'll actually work. For example, MySQL before 8.0 doesn't implement window functions or common table expressions; PostgreSQL doesn't implement SQL/PSM (instead offering PL/pgSQL) and had no CALL statement before version 11; most vendors use different ways of specifying auto-increment columns or sequence generators; and so on.
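The auto-increment point can be sketched as follows. The column definitions are the commonly documented forms for each product, but only the SQLite one is actually executed here, so treat the others as unverified in this snippet; `AUTO_ID_COLUMN`, `create_table_sql` and `demo_sqlite` are names invented for the example.

```python
import sqlite3

# One logical column, four spellings. Only the "sqlite" entry is run below.
AUTO_ID_COLUMN = {
    "mysql": "id INT AUTO_INCREMENT PRIMARY KEY",
    # Identity columns are the standard-style form (PostgreSQL 10 and later).
    "postgresql": "id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY",
    "sqlserver": "id INT IDENTITY(1,1) PRIMARY KEY",
    "sqlite": "id INTEGER PRIMARY KEY AUTOINCREMENT",
}


def create_table_sql(dialect, table="items"):
    """Build a CREATE TABLE statement with a dialect-specific id column."""
    return f"CREATE TABLE {table} ({AUTO_ID_COLUMN[dialect]}, label VARCHAR(50))"


def demo_sqlite():
    """Run the SQLite variant and return the generated ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_table_sql("sqlite"))
        conn.execute("INSERT INTO items (label) VALUES ('a')")
        conn.execute("INSERT INTO items (label) VALUES ('b')")
        return [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
    finally:
        conn.close()
```

The INSERT and SELECT statements are the same everywhere; it is the DDL that has to be chosen per database.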
Please don't use the w3schools SQL guides, they're severely outdated, wrong, fail to differentiate between vendor extensions and the standard, and should generally be avoided. I mention them because w3schools tends to come up quite high in search rankings - back in the day they used to actually be useful.
You can download the JDBC 4.1 specification from http://download.oracle.com/otndocs/jcp/jdbc-4_1-mrel-spec/index.html but this only covers JDBC itself, not SQL. The specification is more a description of the interface; it does expect databases to support some level of the SQL standards, but don't expect to find more than a reference to the SQL standard when it comes to the requirements for queries.
You usually need to use the database-specific SQL anyway, because even though there is a SQL standard, database vendors don't implement it to the letter. JDBC itself defines some escapes to bridge the gaps, but as far as I know, they are hardly ever used. Drivers also usually don't translate standard SQL to database-specific SQL if the database doesn't support the standard SQL.
If you want to look at the official SQL standard, you need to buy it from ISO or your country-specific ISO representative. That said, with some searching you can find and download draft versions of the specification for free. I am not sure how helpful that is though, as the SQL standard documents are not intended as a reference manual; they are meant to be a formal description and go really deep into details that are only relevant to an implementer.

Which SQL Implementation can translate to many other(s)?

I'm looking for a SQL implementation (and its editor) whose scripts can be translated to many other SQL dialects.
For example, I would write script files in that SQL dialect and then translate them to script files for other SQL dialects (e.g. MS SQL, MySQL, ...).
If you're sure to use only ANSI SQL to construct your scripts, you should be good to go.
I agree with @Justin Niessner: all SQL vendors pay attention to the SQL Standards, notably core SQL-92. To take SQL Server as an example, although they find that Sybase legacy code is tricky to deprecate, they are not afraid to do so, and entirely new features (e.g. MERGE in MSSQL2008) tend to extend their Standard SQL equivalents rather than reinventing the wheel.
For a product that has good Standards compliance, take a look at Mimer:
"Here at Mimer Information Technology, we pride ourselves on conforming to the SQL standard and we play an active role in the Database Languages standardization group which determines exactly what is SQL standard."
Mimer also provides extremely useful SQL validators for SQL-92, SQL-99 and SQL:2003 respectively.
I was researching the same thing a while ago. What I found is a project called Liquibase. It is aimed at change tracking but also at converting between different DBMSs. You can download the source code and see the datatype conversions across databases. The source is on GitHub; browse the Java files there and you'll probably find something helpful.
If all you want are basic operations, these are fairly universal. For instance:
SELECT
INSERT
DELETE
UPDATE
FROM
WHERE
JOIN
...are all at the most basic level the same across implementations.
However, the more complicated your scripts get, the more difficult it becomes to make them "universal". Things like aggregation, subqueries, cursors, while loops, functions, indexes, constraints, temp tables, variables, string manipulation, window operations etc. are all pretty much database-specific.
Some of these do have "universal" equivalents but the more generic you make your code the worse it will perform.
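A small sketch of the "basic operations" end of that spectrum, run here against SQLite through Python's standard library. The tables and the `run_basic_statements` function are invented for the example, and the statements are written in a plain style that is intended to be portable rather than verified against every vendor.

```python
import sqlite3


def run_basic_statements():
    """Exercise INSERT, UPDATE, DELETE and a SELECT with JOIN and WHERE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(50))")
        conn.execute(
            "CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title VARCHAR(100))"
        )
        conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Codd')")
        conn.execute("INSERT INTO authors (id, name) VALUES (2, 'Date')")
        conn.execute("INSERT INTO books (id, author_id, title) VALUES (10, 1, 'Draft')")
        conn.execute("INSERT INTO books (id, author_id, title) VALUES (11, 2, 'Notes')")
        conn.execute("UPDATE books SET title = 'Relational Model' WHERE id = 10")
        conn.execute("DELETE FROM books WHERE id = 11")
        # Table aliases are written without AS, which is the more widely accepted form.
        return conn.execute(
            "SELECT a.name, b.title "
            "FROM authors a JOIN books b ON b.author_id = a.id "
            "WHERE a.id = 1"
        ).fetchall()
    finally:
        conn.close()
```

Once a script needs generated keys, string functions or procedural logic, this kind of one-size-fits-all text stops being realistic.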

Are there open source validating parsers for major SQL dialects (T-SQL, Oracle, MySQL)? Or at least precise specs for these dialects?

Word on the street is that Perl is defined not by a spec but by whatever the current interpreter version happens to accept. Now, let's consider an SQL dialect like T-SQL. Is there a published spec of it that would allow making a validator equivalent to the one inside SQL Server? Are there such validators already in existence as open source? And the same question for Oracle.
Ok, so for MySQL I am guessing that validator could be extracted directly from the MySQL codebase. Nevertheless, do they in fact publish the spec itself in case I wanted to make my own validator?
You seem to have an idea of what to do for MySQL. I can't really say much about Oracle, apart from that it mostly implements ANSI SQL and that the PL/SQL procedural language extensions to SQL can mostly be found here for Oracle 9i.
For SQL Server:
Microsoft Books Online (BOL) is the official reference spec. There are different pages for different versions of SQL Server, however.
There are a few projects relating to this.
http://www.sqlparser.com/ - This has .NET, Java, COM and VCL versions for Oracle, DB2, MySQL and SQL Server / Sybase (T-SQL). Quite reasonably priced too.
http://www.codeproject.com/Articles/1136/SharpHSQL-An-SQL-engine-written-in-C (C#)
http://antlr.org/ - This looks like a good bet.
I often use this site for formatting SQL; it also does some validation, although it's fairly crude:
http://www.dpriver.com/pp/sqlformat.htm
This is a similar site:
http://www.tsqltidy.com/
I would suggest that writing a validator for SQL even in just one of its variations is a massive undertaking. You could look at the various ISO/IEC standards for ANSI SQL. ANSI SQL-92 is very widely implemented, but there is a SQL:2008 standard as well.
You'd have to pay for the documentation for those standards though and they aren't cheap.
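Short of writing a validator, one low-effort approach is to let an existing engine's own parser do the checking. The sketch below does this for SQLite's dialect only, via Python's standard library; `check_syntax` is a hypothetical helper, it handles a single statement at a time, and it relies on SQLite's error wording ("syntax error", "incomplete input"), which is an assumption about message text rather than a documented contract.

```python
import sqlite3


def check_syntax(sql):
    """Return None if SQLite's parser accepts the statement, else the error text.

    EXPLAIN compiles the statement without running it, so nothing is modified.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        message = str(exc)
        if "syntax error" in message or "incomplete input" in message:
            return message
        # Errors such as "no such table" mean the text parsed but failed later.
        return None
    finally:
        conn.close()
    return None
```

This says nothing about whether T-SQL or Oracle would accept the same text; the equivalent trick there needs a live server of that product.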
Good luck.