I'm looking for a SQL implementation (and an editor for it) whose scripts can be translated into many other SQL dialects.
For example, I would write my script files in that SQL language and then translate them into script files for other SQL dialects (e.g. MS SQL Server, MySQL, ...).
If you stick strictly to ANSI SQL when constructing your scripts, you should be good to go.
I agree with @Justin Niessner: all SQL vendors pay attention to the SQL Standards, notably core SQL-92. To take SQL Server as an example: although they find that Sybase legacy code is tricky to deprecate, they are not afraid to do so, and entirely new features (e.g. MERGE in MSSQL2008) tend to extend their Standard SQL equivalents rather than reinvent the wheel.
For a product that has good Standards compliance, take a look at Mimer:
Here at Mimer Information Technology, we pride ourselves on conforming
to the SQL standard and we play an active role in the Database
Languages standardization group which determines exactly what is SQL
standard.
Mimer also provides extremely useful SQL validators for SQL-92, SQL-99 and SQL:2003.
I was researching the same thing a while ago. What I found is a project called Liquibase. It is aimed at change tracking, but it also handles converting between different DBMSs. You can download the source code and see how data types are converted across databases. The source is on GitHub; browse the Java files there and you'll probably find something helpful.
If all you want are basic operations, these are fairly universal. For instance:
SELECT
INSERT
DELETE
UPDATE
FROM
WHERE
JOIN
...are all at the most basic level the same across implementations.
However, the more complicated your scripts get, the more difficult it becomes to make them "universal". Things like aggregation, subqueries, cursors, while loops, functions, indexes, constraints, temp tables, variables, string manipulation, window operations etc. are all pretty much database-specific.
Some of these do have "universal" equivalents but the more generic you make your code the worse it will perform.
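To illustrate the "basic operations" point, here's a minimal sketch that runs only those statements against SQLite through Python's built-in sqlite3 module. The table and column names are made up for the example, and I've only run it on SQLite; I'd expect the same statement text to be accepted by most engines, but check yours.

```python
import sqlite3

# In-memory SQLite database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(50))")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)

# Plain INSERT / UPDATE / DELETE with no vendor-specific syntax.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO customers (id, name) VALUES (2, 'Bob')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 50)")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 2, 70)")
conn.execute("UPDATE orders SET total = 75 WHERE id = 11")
conn.execute("DELETE FROM orders WHERE id = 10")

# SELECT ... FROM ... JOIN ... WHERE, again using only the basic keywords.
basic_rows = conn.execute(
    "SELECT c.name, o.total "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "WHERE o.total > 60"
).fetchall()
print(basic_rows)  # [('Bob', 75)]
```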
I just wanted to know if there is an SQL standard compliance validator out there for Visual Studio 2019 Professional (something that could be set to strict, so that only absolutely compliant syntax would be accepted). It would be nice if it had support for native languages too, but I'm used to that kind of thing being CLR-only (I don't really know why; probably because of linking, but that's only a guess and I may be completely wrong).
Importantly, it needs to check compliance with the standard, not just with SQL Server: anything that is not in the standard should be an error.
The goal is to make SQL code that is completely independent of the DBMS. Thank you for taking the time to read my question.
The goal is to make SQL code that is completely independent of the DBMS.
Impossible goal, unless you are going to forsake writing SQL at all. It is perhaps sad, but different databases differ on very fundamental things, picking and choosing the parts of the standard they want. Happily, the major things like SELECT, JOIN and GROUP BY are common but the details are not.
You can think of them like dialects of a spoken language that vary over time and region. I'm most familiar with English, but it is true that all languages evolve and change. I can read Shakespearean English, but I am not going to write English like that. In some cases it would be grammatically incorrect, use unknown words, or rely on alternative meanings of common words.
Here are just some examples of some features that differ widely among databases:
Intervals. The standard syntax for adding an interval to a date is date_value + INTERVAL '1' DAY. This varies significantly across databases.
Some databases do not support FULL JOIN.
Some databases do not support recursive CTEs. Some use the recursive keyword; some do not.
Some databases do not support the VALUES() constructor in the FROM clause.
Some databases allow the FROM clause to be optional.
The standard has nifty functionality, such as FILTER and aggregation by functionally dependent ids, that few databases support.
Limitations on data types vary significantly -- what is the longest string, for instance.
The standard uses FETCH to limit results, which some databases do not support.
Parsing strings into dates and formatting dates into strings is totally database-dependent.
Extracting date/time components uses EXTRACT() in the standard, but not all databases actually support it.
These are just a few of the differences off the top of my head -- in no way meant to be complete or even the most important. I just want to point out that what you want to do is not possible.
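To make one of these differences concrete, here's a small sketch of the row-limiting case. The limit_query helper and its dialect labels are hypothetical names invented for this example, not part of any library. It builds the same "first n rows" query three ways (standard FETCH FIRST, the LIMIT form used by SQLite, MySQL and PostgreSQL, and SQL Server's TOP), and only the SQLite variant is actually executed.

```python
import sqlite3


def limit_query(dialect, columns, table, order_by, n):
    """Build a 'first n rows' query for a dialect label (labels are made up here)."""
    n = int(n)  # only ever interpolate an integer into the SQL text
    if dialect == "standard":
        return (f"SELECT {columns} FROM {table} ORDER BY {order_by} "
                f"FETCH FIRST {n} ROWS ONLY")
    if dialect in ("sqlite", "mysql", "postgresql"):
        return f"SELECT {columns} FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP ({n}) {columns} FROM {table} ORDER BY {order_by}"
    raise ValueError(f"unknown dialect: {dialect}")


# Run the SQLite variant against a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(3,), (1,), (2,)])
first_two = conn.execute(limit_query("sqlite", "id", "t", "id", 2)).fetchall()
print(first_two)  # [(1,), (2,)]
```

The point is only that the same intent needs different text per engine; a real abstraction layer has to cover far more cases than this.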
So, I haven't had any luck finding any articles or forum posts that explain how exactly a query language works in conjunction with a general-purpose programming language like C++ or VB. So I guess it won't hurt to ask >.<
Basically, I've been having a hard time understanding what the role of the query language is (we'll use SQL as the example query language and VB6 as the general-purpose language) when I'm creating a simple database program that fills a table with ordinary information (first name, last name, address, etc.). I somewhat know the steps for setting up a program like this, using ADO objects for the connection and whatnot, but how do we decide which of the two languages gets used for what? Does VB6 handle the basics like loops, if/else, and variable declarations, while SQL handles things like connecting to the database and doing the searching, filtering and sorting? Is it possible to do certain general-purpose VB6 actions (loops or conditionals) in SQL syntax instead? Any help would be GREATLY appreciated.
SQL is a language for querying a database. SQL is an ISO standard; relational database vendors implement the standard and then add their own customizations. For example, in SQL Server the result is called T-SQL and in Oracle it is called PL/SQL. Both implement the ISO standard, so a simple select like this one is identical in each:
select columnname from tablename where columnname=1
However, each has different syntax for string functions, date functions, etc.
By design, the ISO SQL standard is not a full procedural language with looping, subroutines, etc., the way a language like VB is.
However, each vendor has added capabilities to their version to add some of this functionality in.
For example, both T-SQL and PL/SQL can "loop" through records using various constructs in their language.
There is also a difference in how you work with data that many developers are not well attuned to: set-based operations vs. procedural operations.
Databases can work with procedural constructs but are often more performant with set-based ones. A developer who is not versed in this concept may end up creating a very inefficient query. Here's an example of this discussion.
In any situation you have to weigh the pros and cons of where it is best to do this work.
I tend to favor using procedural constructs such as loops in the language I am using over SQL. I find it easier to maintain and the language I am using offers more powerful syntax for me to get the job done.
However, I keep both options as a tool in the toolbox. For example, I have written data conversion scripts in SQL and in this case I have used the looping constructs in SQL.
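To make the set-based vs. procedural distinction concrete, here's a rough sketch using Python's built-in sqlite3 module (the products table and the function names are invented for the example). Both functions produce the same result; the first issues one UPDATE per row from the client, while the second lets the database handle the whole set in a single statement.

```python
import sqlite3


def make_db():
    # Throwaway in-memory database with a hypothetical products table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER)")
    conn.executemany(
        "INSERT INTO products (id, price) VALUES (?, ?)",
        [(1, 100), (2, 200), (3, 300)],
    )
    return conn


def raise_prices_row_by_row(conn):
    # Procedural style: fetch every row, then send one UPDATE per row.
    rows = conn.execute("SELECT id, price FROM products").fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?", (price + 10, product_id)
        )


def raise_prices_set_based(conn):
    # Set-based style: one statement describes the change for all rows.
    conn.execute("UPDATE products SET price = price + 10")


def prices(conn):
    return [row[0] for row in conn.execute("SELECT price FROM products ORDER BY id")]


procedural_db = make_db()
raise_prices_row_by_row(procedural_db)

set_based_db = make_db()
raise_prices_set_based(set_based_db)

print(prices(procedural_db), prices(set_based_db))
```

With three rows you won't notice a difference; the gap tends to show up as the row count and the number of round trips grow.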
Usually the programming language runs on the client side (or on an app server), and the query language runs on the DB server, so in the end it depends on where you want to put the work. Sometimes you can put a lot of the work on the client side by doing all the calculations in the programming language; other times you want to lean more on the DB server, and you end up using the query language or, even better, T-SQL, PL/SQL or whatever your database offers.
Relational databases are designed to manage data. In particular, they provide an efficient mechanism for managing memory, disk, and processors for large quantities of data. In addition, relational databases can handle multiple clients, guarantee transactional integrity, security, backups, persistence, and numerous other functions.
In general, if you are using an RDBMS with another language, you want to design the data structure first and then think about the API (applications programming interface) between the two. This is particularly true when you have an app/server relationship.
For a "simple" type of application, which uses a lot of data but with minimal or batch changes to it, you want to move as much of the processing into the database as is reasonable. Here are things you do not want to do:
Use queries to load things into arrays, and then do array manipulations at the language level. SQL provides joins for this.
Load data into an array and do manipulations and summaries on the array. SQL provides aggregations for this.
Save data into a file to have a backup. Databases provide backup mechanisms.
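As a rough illustration of the first two points in that list, here's a sketch (Python's built-in sqlite3 module, with made-up table names) where the join and the summary are done by the database rather than by loading rows into arrays and combining them in the host language.

```python
import sqlite3

# Hypothetical schema and data, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(50))")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(10, 1, 50), (11, 1, 25), (12, 2, 70)],
)

# The JOIN replaces matching rows by hand in arrays;
# COUNT/SUM with GROUP BY replaces looping over an array to build a summary.
order_summary = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name "
    "ORDER BY c.name"
).fetchall()
print(order_summary)  # [('Ann', 2, 75), ('Bob', 1, 70)]
```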
If your data fits into an array or an Excel spreadsheet, it is often sufficient to get started with the data stored there. Only when your needs start to expand (multiple clients, security, integration with other data) do the advantages of a database become more apparent.
These are just for guidance and to give you some ideas.
In terms of what to do where: do as much as is sensible in SQL (given that it runs on a server).
So, for instance, don't do stuff like this (pseudocode):
foreach(row in "Select * from Orders")
    if (row[CustomerID] = 876)
        Display(row)
Do this instead:
foreach(row in "Select * from Orders where CustomerId = 876")
    Display(row)
First, it's likely that Orders is indexed by CustomerID, so the database will find all of customer 876's orders far more quickly.
Second, to do the first version you just sucked every record in that table into the client's memory space, probably across your network.
Which language is used is essentially irrelevant; you could invent your own DBMS with its own language.
It's where you do what processing that matters. It's a rule with exceptions, but the essential idea is to let your backend do as much as it can.
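For what it's worth, here's roughly what the two pseudocode variants above look like as runnable code, sketched with Python's built-in sqlite3 module. The Orders data, the index name and the function names are invented for the example. Both functions return the same rows, but only the second lets the database (and its index) do the filtering.

```python
import sqlite3

# Hypothetical Orders table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID) VALUES (?, ?)",
    [(1, 876), (2, 12), (3, 876), (4, 40)],
)


def orders_filtered_in_client(conn, customer_id):
    # Pulls every row out of the database, then throws most of them away.
    all_rows = conn.execute(
        "SELECT OrderID, CustomerID FROM Orders ORDER BY OrderID"
    )
    return [row for row in all_rows if row[1] == customer_id]


def orders_filtered_in_db(conn, customer_id):
    # Only matching rows ever leave the database; the value is passed as a parameter.
    return conn.execute(
        "SELECT OrderID, CustomerID FROM Orders WHERE CustomerID = ? ORDER BY OrderID",
        (customer_id,),
    ).fetchall()


client_side = orders_filtered_in_client(conn, 876)
db_side = orders_filtered_in_db(conn, 876)
print(client_side == db_side, db_side)  # True [(1, 876), (3, 876)]
```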
I'm looking for a good SQL parser, one that will work with subselects, non-select queries, CTEs, window functions and other legal SQL elements.
The result would be some kind of abstract syntax tree that I could work on later.
The language is mostly irrelevant, as I am willing to learn a new language just to use the library, if it exists.
I know that it is technically possible to extract the parser from some open source database, but it's far from easy (at least for the parser of PostgreSQL, which is what I need).
There's a non-validating SQL parser in Python: python-sqlparse. The tokens are exposed as objects. I doubt if they support "other legal SQL statements", window functions, and the like though as those are controlled by vendor specific grammars and no vendor is technically fully compliant with SQL standards.
Um (knowing that you're willing to learn a new language), why would you need to work on the syntax tree? If you need some magic in dealing with the database, you probably don't need to reinvent the wheel: Python has a fantastic database toolkit, SQLAlchemy.
You can google "sql parser". One of the results is General SQL Parser. Here are some of the features highlighted on its official website:
Offline SQL syntax check
Highly customizable SQL formatter
In-depth analysis of SQL script
Full access to SQL query parse tree
Custom SQL engine for various databases
Major programming language support
It's a commercial SQL library.
Our DMS Software Reengineering Toolkit has full PL/SQL and ANSI SQL 2011 parsers (to ASTs) and prettyprinters (ASTs back to valid text). Neither of these is PostgreSQL, but DMS has a dialect mechanism that enables one to build a dialect from a base grammar relatively easily, by revising just some of the grammar rules and retaining the rest. Doing this from the SQL 2011 grammar seems like a practical way to tackle the problem.
DMS also offers facilities to access/traverse/modify the ASTs, both procedurally and in terms of surface-syntax patterns and transformations. Think of this as "life beyond parsing".
All that I know is that PL/SQL is Oracle and T-SQL is SQL Server. I assume some things might be easier in one versus the other, but are there certain things I can do in PL/SQL that I can't do in T-SQL?
Are there fundamental differences that I should be aware of? If so, what are they?
T-SQL and PL/SQL are two completely different programming languages with different syntax, type system, variable declarations, built-in functions and procedures, and programming capabilities.
The only thing they have in common is direct embedding of SQL statements, and storage and execution inside a database.
(In Oracle Forms, PL/SQL is even used for client-side code, though integration with database-stored PL/SQL is (almost) seamless.)
The only fundamental difference is that PL/SQL is a procedural programming language with embedded SQL, and T-SQL is a set of procedural extensions for SQL used by MS SQL Server. The syntax differences are major. Here are a couple of good articles on converting between the two:
http://www.dba-oracle.com/t_convent_sql_server_tsql_oracle_plsql.htm
http://jopinblog.wordpress.com/2007/04/24/oracle-plsql-equivalents-for-ms-sql-server-t-sql-constructs/
They're not necessarily easier, just different - ANSI-SQL is the standard code that's shared between them - the SELECT, WHERE and other query syntax, and T-SQL or PL/SQL is the control flow and advanced coding that's added on top by the database vendor to let you write stored procedures and string queries together.
It's possible to run multiple queries using just ANSI-SQL statements, but if you want to do any IF statements or handle any errors, you're using more than what's in the ANSI spec, so it's either T-SQL or PL/SQL (or whatever the vendor calls it if you're not using Microsoft or Oracle).
One tidbit I can add: take what you know in one and, while using the other, forget what you know (except for using set-based logic whenever possible).
One example of the differences: cursors are typically considered a less-than-ideal solution in T-SQL unless there is a really good reason to use them, which there often is not. In Oracle, cursors are much better optimized; for example, they have bulk abilities, that is, the ability to work on a set of data much like a normal SQL statement can. So using a cursor in Oracle isn't an instant failed code review, where it might be in a T-SQL code review.
Overall, T-SQL is much easier to learn, as there's not much to it as far as languages are concerned. PL/SQL is a richer language, and therefore more complicated. It is not a hard language to pick up if you have a good book. Overall, I really like PL/SQL for its depth and T-SQL for its simplicity.
PL/SQL and T-SQL are extensions for SQL. PL/SQL is used for Oracle databases and T-SQL is used for Microsoft databases.
Here is some more useful information:
http://techdifferences.com/difference-between-t-sql-and-pl-sql.html
I know that most SQL server software allows you to do an "UPDATE on a JOIN", but I am wondering: is this in the SQL standard?
(e.g. can I assume that any software package allows this?)
Note: I am asking this because I am writing a database library that should be easily extensible to database software that is not included in the original build. As such, there's no point in answering with a remark such as "a, b, c and d all allow that - together they make up the lion's share of the market, so you can assume that all software packages allow that". No, I am interested in whether it is in the standard or not.
If I understand the question right, I think the answer is no, there is no standard "update based on a join". The postgres manual page for UPDATE includes this under "Compatibility":
This command conforms to the SQL standard, except that the FROM and RETURNING clauses are PostgreSQL extensions, as is the ability to use WITH with UPDATE.
Some other database systems offer a FROM option in which the target table is supposed to be listed again within FROM. That is not how PostgreSQL interprets FROM. Be careful when porting applications that use this extension.
While this doesn't explicitly say there isn't, the Compatibility notes in that manual generally note when there is a related, but not identical, feature in the standard. What's more, the mention of other systems with different behaviour demonstrates that if there is a standard, you can't rely on it anyway.
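If you need something that avoids the vendor-specific FROM/JOIN forms, a commonly used workaround is an UPDATE with a correlated subquery. Here's a minimal sketch using Python's built-in sqlite3 module. The tables are invented for the example, I've only run it on SQLite, and I'm assuming (not asserting) that other engines accept the same statement.

```python
import sqlite3

# Hypothetical tables: copy each customer's region onto that customer's orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region VARCHAR(10))")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, region VARCHAR(10))"
)
conn.executemany(
    "INSERT INTO customers (id, region) VALUES (?, ?)", [(1, "EU"), (2, "US")]
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, region) VALUES (?, ?, ?)",
    [(10, 1, None), (11, 2, None), (12, 3, "old")],  # order 12 has no matching customer
)

# No FROM or JOIN on the UPDATE itself: the "join" lives in the correlated subqueries.
# The EXISTS guard keeps orders without a matching customer from being set to NULL.
conn.execute(
    "UPDATE orders "
    "SET region = (SELECT c.region FROM customers c WHERE c.id = orders.customer_id) "
    "WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = orders.customer_id)"
)

updated_orders = conn.execute("SELECT id, region FROM orders ORDER BY id").fetchall()
print(updated_orders)  # [(10, 'EU'), (11, 'US'), (12, 'old')]
```

The trade-off is that the lookup is written twice and may or may not perform as well as the engine's native joined UPDATE.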
According to the ANSI SQL-92 standard, an UPDATE on JOINed tables is NOT part of the standards; See http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt sections 13.9 and 13.10 (you'll have to search for 391, the page number).
I tried to find an ANSI 2003 standard, but the closest I came was here: www.wiscorp.com/sql_2003_standard.zip (a late draft). There was no substantial difference between the two in regards to the UPDATE statement and JOIN syntax.
Stu
You're presuming that all software packages adhere to ANSI SQL Standards.....in reality, none of them that I'm aware of adhere completely to the standards.
If you're looking to adhere to ANSI SQL standards, the best place to start would be with the documented standards themselves. Here's the SQL-92 document:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Careful now, folks. Writing truly portable code is much more difficult than you would imagine, and you also have to be willing to give up a lot in the areas of performance, ease of coding/maintenance, and readability. Just declare and use one variable in, say, SQL Server and your code is no longer truly portable. Write an audit trigger and I can guarantee that your trigger won't be portable between Oracle, SQL Server, and several other popular engines. And it shouldn't really matter, because it's not actually rocket science in any RDBMS (well, except maybe for writing a joined UPDATE in Oracle without using MERGE {which is standard but not portable, yet}).
Also, don't forget there are two basic types of SQL. That which supports the single row nature of most front-end code and that of batch code. If you really want your batch code to perform well, you'll use many of the "proprietary extensions" to the database engine you're using to efficiently process sometimes billions of rows overnight... the same night. ;-)
Be careful when aiming at writing code for "true" portability. You might end up with a tangled mess that's a whole lot slower than you might have ever imagined.