Modularizing SQL even if only syntactic sugar

Modularizing SQL even if only syntactic sugar - sql

Is there a way to modularize SQL code so that is more readable and testable?
My SQL code often becomes a long complicated series of nested joins, inner joins, etc. that are hard to write and hard to debug. By contrast, in a procedural language like Javascript or Java, one would pinch off discrete elements as separate functions you would call by name.
Yes, one could write each as entirely separate queries, stored in the database, or as stored procedures, but often I don't want to change/clutter the database, just query it is fine, especially if the DBA doesn't wish to grant write permissions to all users.
For instance, conceptually a complex query might be easily described in pseudocode like this:
(getCustomerProfile)
left join
(getSummarizedCustomerTransactionHistory)
using (customerId)
left join
(getGeographicalSummaries)
using (region, vendor)
...
I realize that a lot is written on the topic from a theoretical vantage (a few links below), but I'm just looking for a way to make the code easier to write correctly, and easier to read once written. Perhaps just syntactic sugar to abstract the complexity from sight, if not from execution, that compiles down in the literal SQL I'm trying to not look at. By analogy...
Stylus: CSS ::
CoffeeScript : Javascript ::
SAS Macro language: SAS language ::
? : SQL
And if the specific SQL flavor matters, most of my work is in PostgresQL.
http://lambda-the-ultimate.org/node/2440
Code reuse and modularity in SQL
Are Databases and Functional Programming at odds?

In most databases, you can do what you want using CTEs (Common Table Expressions):
with CustomerProfile as (
getCustomerProfile
),
SummarizedCustomerTransactionHistory as (
getSummarizedCustomerTransactionHistory
),
GeographicalSummaries as (
getGeographicalSummaries
)
select <whatever>
This works for a single query. It has the advantage that you can define a CTE once, but use it multiple times. Also, I often define a CTE called const that has constant values.
The next step is to take these constructs and create views from them. This is especially useful when sharing code among multiple modules, to ensure constant definitions. In some databases, you can put indexes on the views to "instantiate" them, further optimizing processing.
Finally, I recommend wrapping inserts/updates/deletes in stored procedures. This allows you to do have a consistent framework.
Two more comments though. First, SQL is often used for transactional or reporting systems. Often, once you get the data in the right format for the purpose, the data speaks for itself. You example might just be asking for a data mart that has three tables devoted to those three subject areas, which get populated once per week or once per day.
And, SQL is not an idea language for abstraction. With good practice, naming conventions, and indentation style, you can make it useful. I sorely miss certain things from "real" languages, such as macros, error handling (why data errors are so hard to identify and handle is beyond me), consistent methods for common functionality (can someone say group string concatenation), and some other features. That said, because it is data centric and readily parallelizable, it is more useful for me than most other languages.

The issue here is you need to think about data in a relational way. I do not believe this type of abstraction correctly fits into the relational model. In terms of making SQL modular, that is what stored procedures and/or functions are for. Notice how these have the same characteristics as methods do in Java. You can abstract out that way. Another way is to abstract the data that is what you care about into materialized views. By doing this you can put a regular view (see virtual function) over top of these materialized views which allow you to test the structure of the data without touching the "raw" tables.

Related

Why is dynamical selection of column & table names so difficult in SQL?

I figure there has to be a specific design reason why you can't write a query like the following one:
select
(select column_name
from information_schema
where column_name not like '%rate%'
and table_name = 'Fixed_Income')
from Fixed_Income
and instead have to resort to dynamic SQL.
Anyone knows what that reason is? I tried Googling it, but all the hits were cries for help in solving the problem -- meaning it's a pretty widespread need and not well understood.

The reason is that the query optimizer needs to know the exact schema objects you are referring to at compile time. It needs them to optimize the query. You wouldn't believe how slow the RDBMS would be without having this information available to the query optimizer.
It's a little like the performance difference of static vs. dynamic typing in practice: There is usually a non-trivial difference (I'm thinking just about mainstream languages here). The compiler can exploit the static information to generate great code.
Even if this feature was present, it would be implemented by first computing the table and column names and then doing a standard "static" query planning.

You ask a very interesting question.
The "relational" in "relational algebra" refers to name-value pairs, not to relationships between tables. In relational algebra, there is no requirement that all records in a set (table) have the same columns.
My best guess is that the limitation is related to the idea of entity-relationship diagrams comes into play. A database is designed around tables, and these tables have relationships to each other. The choice of a relational database for data storage and access was specifically when the data could be stored this way. Knowing the entities and their attributes suggests a static form of the data and hence static references in queries.
In addition, SQL as a language is a declarative language rather than a procedural language. This suggests -- but does not impose -- a compilation step separate from the running of the query. In general, the SQL engine does the following (at a very high level):
Compiles the query, generally into some sort of data flow process.
Optimizes the data flow process. (Typically part of the compilation process.)
Runs the query.
The first two result in what is called "the query plan". You really cannot do optimization, though, unless you know about the objects you are operating on. So, dynamically choosing tables and columns means that optimization would be part of running the query rather than compiling it.
Finally, some databases like SQL Server support dynamic SQL. This allows you to build strings that get compiled and run at the same time. This is very useful for complex decision support queries. It is not recommended when you need fast transaction throughput, because the overhead for compilation is too high relative to the query.

Confused about the role of a query language

So, I haven't had any luck finding any articles or forum posts that have explained to me how exactly a query language works in conjunction with a general use programming language like c++ or vb. So I guess it wont hurt to ask >.<
Basically, I've been having a hard time understanding what the roles of the query language are ( we'll use SQL as an example for query language and VB6 for norm language) if i'm creating a simple database query that fills a table with normal information (first name, last name, address etc). I somewhat know the steps in setting up a program like this using ado objects for the connection and whatnot, but how do we decide which language of the 2 gets used for certain things ? Does vb6 specifically handle the basics like loops, if else's, declarations of your vars, and SQL specifically handles things like connecting to the database and doing the searching, filtering and sorting ? Is it possible to do certain general use vb6 actions (loops or conditionals) in SQL syntax instead ? Any help would be GREATLY appreciated.

SQL is a language to query a database. SQL is an ISO standard and relational database vendors implement to the ISO standard and then add on their own customizations. For example in SQL Server it is called T-SQL and in Oracle it is called PL-SQL. They both implement ISO standards and so each will have identical queries for a simple select like
select columname from tablename where columnname=1
However, each have different syntax for string functions, date functions, etc....
The ISO SQL standard by design is not a full procedural language with looping, subroutines, ect as in a full procedural language like VB.
However, each vendor has added capabilities to their version to add some of this functionality in.
For example both T-SQL and PL-SQL can "loop" through records using various constructs in their language.
There is also a difference when working with data that many developers are not well in tuned with. That is set based operations vs. procedural based.
Databases can work with procedural constructs but are often more performant with set based. A developer who is not versed in this concept may end up creating a very innefficient query. Here's an example of this discussion.
With any situation you have to weight out the pro's/con's of where it is best to do this work.
I tend to favor using procedural constructs such as loops in the language I am using over SQL. I find it easier to maintain and the language I am using offers more powerful syntax for me to get the job done.
However, I keep both options as a tool in the toolbox. For example, I have written data conversion scripts in SQL and in this case I have used the looping constructs in SQL.

Usually programming language are executed in the client side (app server too), and query languages are executed in the db server, so in the end it depends where you want to put all the work. Sometimes you can put lot of work in the client side by doing all the calculations with the programming language and other times you want to use more the db server and you end up using the query language or even better tsql/psql or whatever.

Relational databases are designed to manage data. In particular, they provide an efficient mechanism for managing memory, disk, and processors for large quantities of data. In addition, relational databases can handle multiple clients, guarantee transactional integrity, security, backups, persistence, and numerous other functions.
In general, if you are using an RDBMS with another language, you want to design the data structure first and then think about the API (applications programming interface) between the two. This is particularly true when you have an app/server relationship.
For a "simple" type of application, which uses a lot of data but with minimal or batch changes to it, you want to move as much of the processing into the database as is reasonable. Here are things you do not want to do:
Use queries to load things into arrays, and then do array manipulations at the language level. SQL provides joins for this.
Load data into an array and do manipulations and summaries on the array. SQL provides aggregations for this.
Save data into a file to have a backup. Databases provide backup mechanisms.
If you data fits into an array or on an Excel spreadsheet, it is often sufficient to get started with the data stored there. Only when you start to expand the needs (multiple clients, security, integration with other data) do the advantages of a database become more apparent.
These are just for guidance and to give you some ideas.

In terms of doing what where, do as much as is sensible in SQL (given it runs on a server) as you can.
So for instance don't do stuff like this (psuedo code)
foreach(row in "Select * from Orders")
if (row[CustomerID] = 876)
Display(row)
Do
foreach(row in "Select * from Orders where CustomerId = 876")
Display(row)
First it's likely Orders is indexed by CustomerID so it will find all 876s order way quicker.
Second to do the first one you just sucked every record in that table into the client's memory space probably across your network.
What language is used is essentially irrelevant, you could invent your own DBMS with it's own language.
It's where you do what processing that matters. It's Rule with exceptions, but the essential idea is let your backend do as much as it can.

Query design practices in SQL

I am building queries for a database in MS Access 2007 and I am wondering if my current design practices are up to par. Basically, the database was configured before I came, but I have been given the responsibility of building efficient queries to extract the data.
My current queries are small and simple, each accomplishing 2-3 tasks (sometimes only 1) at a time. The reason I am taking this approach is because I am completely new to SQL, and I find it easier to work with many, simple queries and use reports to consolidate the data, as opposed to building extremely complex queries which are 1) hard to build (for me, anyways) and 2) hard to maintain.
I was just curious if anyone had any best practices for query design, and if you could give me some specific feed back for the approach listed above, and whether or not I should start making complex queries, or just stick to simple queries and reports to consolidate the relevant data.
Thanks.

The people answering this question are not coming to it from an Access point of view, so I'll offer some observations as somebody who has been creating Access applications professionally full-time since 1996.
First off, there are several places where you'll have SQL in an Access application:
stored queries.
stored properties of forms, reports, combo boxes and list boxes.
in VBA code where you are writing SQL on the fly.
Managing all of these SQL statements in an organized fashion is difficult, if not impossible. But I'm not sure it's worth it!
First off, consider just stored queries. If you follow the advice of saving a query for every individual task so that each SQL statement is used in only one place, you'll soon have a mess in the list of queries, and you'll be forced into some kind of naming convention to keep track of what's what. Because of this, I generally don't save queries EXCEPT where they MUST be saved, or where the optimization that comes with a saved query is going to be helpful (i.e., large dataset or complex joins/filtering).
For example, when I first started programming in Access, I'd save all the rowsources of my combo boxes as saved queries. I developed a naming convention so they wouldn't be mixed in with the other queries in the list of queries, so it wasn't to hard to manage. At first, I thought I'd be re-using the saved queries, but it quickly became clear that I needed to make changes for individual circumstances, and changing a query that was used elsewhere might alter its results in other contexts, so really, there was no "shared code" benefit to the saved queries (as I thought there would be). The only place where it was helpful was where I had the same combo box on multiple forms, and then I could save the rowsource for that as a saved query and if I needed to alter it, I could it in just one place. However, that was really only an advantage for a relatively complex rowsource -- a simple SELECT on a couple of fields doesn't really benefit from that kind of sharing, particularly when it's used in only a couple of different places.
In short, I quickly concluded that it was just easier to save the SQL statements where they were used -- since there was very little re-use in the first place (once I gained enough experience to realize the pitfalls of trying to re-use them), this worked much better, and it kept the SQL close to where it was being used.
For forms and reports, I do some of the same things, but in general, use saved queries for the purpose of avoiding having to write too many complex subselects for use as derived tables. Where I needed those it was always easier to write it and save it and then use it with a JOIN in another SQL statement than it was to try to use the subselect inline as a derived table (which just makes for complicated SQL that's hard to read -- particularly when you can't comment or format your SQL, as is the case with saved Access queries).
In general, I don't save the recordsources of forms or reports except where there is real re-use going on (a report will often use the same recordsource as a form, so in that case, it's useful to save it, so that when you change the SQL of the form, the report that goes with it inherits the alteration).
That all leaves dynamic SQL assembled in VBA code. I use lots of this, from dynamically setting the rowsources of combo/listboxes, to setting the recordsources of subforms for filtering purposes. This is harder to manage, and sometimes I use string constants in the module to make that easier. For instance, in a case where you're writing dynamic SQL where everything remains the same except the WHERE clause, a constant with the SELECT and a second constant with the ORDER BY makes it a lot easier to assemble the complete SQL statement.
I don't know if this really answers your questions, but I have learned over the years that the benefits of re-using SQL statements are vastly outweighed by the uncertainty that comes from the inability to track easily where that SQL statement may be used. I find that storing the SQL statment as close to where it is used as possible is the best practice, as that is a form of "self-documentation" (though not a great one!).
I do make many exceptions and save queries when there is a real and demonstrable benefit in terms of performance or managing what would otherwise become much more comples SQL. However, I would also note that one should also not go too far in the other direction, using tons of nested saved queries, because then you run into other problems (i.e., the "too many databases" problem, which is actually caused by using up the 2048 table handles available at one time -- it's done more easily than you might think).

My humble opinion, it doesn't matter if DB engine is big and monstrous as MSSQL or Oracle, or tiny and simple as SQLite, every query (or stored procedure or any other unit of data processing) should be responsible only for 1 function. I use this principle anywhere (not only in DB development) and I can say it works.
If you are not sure, try to read books about refactoring, Fawler for example. I suppose his principles are applicable to any area of development.

If you are storing your data in MSAccess then your database cannot be too large and any optimization you do is limited by the constraints MSAccess imposes. If better (more optimized) queries is a goal, then perhaps migrating the data out of Access and into SQL Server may allow you to have better flexibility in development going forward. You can leverage cached execution plans, stored procedure, and views.
This may mean that you will need to enhance your T-SQL skills to accomplish this.
So weigh out the options you propose in your question:
1. Keep code simple (comfortable at your current skill level)
2. Meet the responsibility to create efficient queries for data extraction.
SQL Server Express could be a good starting point (it's free).

Why use SQL database? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm not quite sure stackoverflow is a place for such a general question, but let's give it a try.
Being exposed to the need of storing application data somewhere, I've always used MySQL or sqlite, just because it's always done like that. As it seems like the whole world is using these databases (most of all software products, frameworks, etc), it is rather hard for a beginning developer like me to start thinking about whether this is a good solution or not.
Ok, say we have some object-oriented logic in our application, and objects are related to each other somehow. We need to map this logic to the storage logic, so relations between database objects are required too. This leads us to using relational database, and I'm ok with that - to put it simple, our database table rows sometimes will need to have references to other tables' rows. But why use SQL language for interaction with such a database?
SQL query is a text message. I can understand this is cool for actually understanding what it does, but isn't it silly to use text table and column names for a part of application that no one ever seen after deploynment? If you had to write a data storage from scratch, you would have never used this kind of solution. Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database. And it surely would name tables and colons by id numbers, not ascii-strings. In the case of changes in table structure those byte queries could be recompiled according to new db schema, stored in XML or something like that.
What are the problems of my idea? Is there any reason for me not to write it myself and to use SQL database instead?
EDIT To make my question more clear. Most of answers claim that SQL, being a text query, helps developers better understand the query itself and debug it more easily. Personally, I haven't seen people writing SQL queries by hand for a while. Everyone I know, including me, is using ORM. This situation, in which we build up a new level of abstraction to hide SQL, leads to thinking if we need SQL or not. I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why.
EDIT2 SQL is an interface between a human and a database. The question is why do we have to use it for application/database interaction? I still ask for examples of human beings writing/debugging SQL.

Everyone I know, including me, is using ORM
Strange. Everyone I know, including me, still writes most of the SQL by hand. You typically end up with tighter, more high performance queries than you do with a generated solution. And, depending on your industry and application, this speed does matter. Sometimes a lot. yeah, I'll sometimes use LINQ for a quick-n-dirty where I don't really care what the resulting SQL looks like, but thus far nothing automated beats hand-tuned SQL for when performance against a large database in a high-load environment really matters.

If all you need to do is store some application data somewhere, then a general purpose RDBMS or even SQLite might be overkill. Serializing your objects and writing them to a file might be simpler in some cases. An advantage to SQLite is that if you have a lot of this kind of information, it is all contained in one file. A disadvantage is that it is more difficult to read it. For example, if you serialize you data to YAML, you can read the file with any text editor or shell.
Personally, I would have used some
'compiled db query' bytecode, that
would be assembled once inside a
client application and passed to the
database.
This is how some database APIs work. Check out static SQL and prepared statements.
Is there any reason for me not to
write it myself and to use SQL
database instead?
If you need a lot of features, at some point it will be easier to use an existing RDMBS then to write your own database from scratch. If you don't need many features, a simpler solution may be wiser.
The whole point of database products is to avoid writing the database layer for every new program. Yes, a modern RDMBS might not always be a perfect fit for every project. This is because they were designed to be very general, so in practice, you will always get additional features you don't need. That doesn't mean it is better to have a custom solution. The glove doesn't always need to be a perfect fit.
UPDATE:
But why use SQL language for
interaction with such a database?
Good question.
The answer to that may be found in the original paper describing the relational model A Relational Model of Data for Large Shared Data Banks, by E. F. Codd, published by IBM in 1970. This paper describes the problems with the existing database technologies of the time, and explains why the relational model is superior.
The reason for using the relational model, and thus a logical query language like SQL, is data independence.
Data independence is defined in the paper as:
"... the independence of application programs and terminal activities from the growth in data types and changes in data representations."
Before the relational model, the dominate technology for databases was referred to as the network model. In this model, the programmer had to know the on-disk structure of the data and traverse the tree or graph manually. The relational model allows one to write a query against the conceptual or logical scheme that is independent of the physical representation of the data on disk. This separation of logical scheme from the physical schema is why we use the relational model. For a more on this issue, here are some slides from a database class. In the relational model, we use logic based query languages like SQL to retrieve data.
Codd's paper goes into more detail about the benefits of the relational model. Give it a read.
SQL is a query language that is easy to type into a computer in contrast with the query languages typically used in a research papers. Research papers generally use relation algebra or relational calculus to write queries.
In summary, we use SQL because we happen to use the relational model for our databases.
If you understand the relational model, it is not hard to see why SQL is the way it is. So basically, you need to study the relation model and database internals more in-depth to really understand why we use SQL. It may be a bit of a mystery otherwise.
UPDATE 2:
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
Because the database is a relational database, it only understands relational query languages. Internally it uses a relational algebra like language for specifying queries which it then turns into a query plan. So, we write our query in a form we can understand (SQL), the DB takes our SQL query and turns it into its internal query language. Then it takes the query and tries to find a "query plan" for executing the query. Then it executes the query plan and returns the result.
At some point, we must encode our query in a format that the database understands. The database only knows how to convert SQL to its internal representation, that is why there is always SQL at some point in the chain. It cannot be avoided.
When you use ORM, your just adding a layer on top of the SQL. The SQL is still there, its just hidden. If you have a higher-level layer for translating your request into SQL, then you don't need to write SQL directly which is beneficial in some cases. Some times we do not have such a layer that is capable of doing the kinds of queries we need, so we must use SQL.

Given the fact that you used MySQL and SQLite, I understand your point of view completely. Most DBMS have features that would require some of the programming from your side, while you get it from database for free:
Indexes - you can store large amounts of data and still be able to filter and search very quickly because of indexes. Of course, you could implement you own indexing, but why reinvent the wheel
data integrity - using database features like cascading foreign keys can ensure data integrity across the system. You only need to declare relationship between data, and system takes care of the rest. Of course, once more, you could implement constraints in code, but it's more work. Consider, for example, deletion, where you would have to write code in object's destructor to track all dependent objects and act accordingly
ability to have multiple applications written in different programming languages, working on different operating systems, some even distributed across the network - all using the same data stored in a common database
dead easy implementation of observer pattern via triggers. There are many cases where only some data depends on some other data and it does not affect UI aspect of application. Ensuring consistency can be very tricky or require a lot of programming. Of course, you could implement trigger-like behavior with objects but it requires more programming than simple SQL definition

There are some good answers here. I'll attempt to add my two cents.
I like SQL, I can think in it pretty easily. The queries produced by layers on top of the DB (like ORM frameworks) are usually hideous. They'll select tons of extra stuff, join in things you don't need, etc.; all because they don't know that you only want a small part of the object in this code. When you need high performance, you'll often end up going in and using at least some custom SQL queries in an ORM system just to speed up a few bottlenecks.
Why SQL? As others have said, it's easy for humans. It makes a good lowest common denominator. Any language can make SQL and call command line clients if necessary, and they is pretty much always a good library.
Is parsing out the SQL inefficient? Somewhat. The grammar is pretty structured, so there aren't tons of ambiguities that would make the parser's job really hard. The real thing is that the overhead of parsing out SQL is basically nothing.
Let's say you run a query like "SELECT x FROM table WHERE id = 3", and then do it again with 4, then 5, over and over. In that case, the parsing overhead may exist. That's why you have prepared statements (as others have mentioned). The server parses the query once, and can swap in the 3 and 4 and 5 without having to reparse everything.
But that's the trivial case. In real life, your system may join 6 tables and have to pull hundreds of thousands of records (if not more). It may be a query that you let run on a database cluster for hours, because that's the best way to do things in your case. Even with a query that takes only a minute or two to execute, the time to parse the query is essentially free compared to pulling records off disk and doing sorting/aggregation/etc. The overhead of sending the ext "LEFT OUTER JOIN ON" is only a few bytes compared to sending special encoded byte 0x3F. But when your result set is 30 MB (let alone gigs+), those few extra bytes are worthless compared to not having to mess with some special query compiler object.
Many people use SQL on small databases. The biggest one I interact with is only a few dozen gigs. SQL is used on everything from tiny files (like little SQLite DBs may be) up to terabyte size Oracle clusters. Considering it's power, it's actually a surprisingly simple and small command set.

It's an ubiquitous standard. Pretty much every programming language out there has a way to access SQL databases. Try that with a proprietary binary protocol.
Everyone knows it. You can find experts easily, new developers will usually understand it to some degree without requiring training
SQL is very closely tied to the relational model, which has been thoroughly explored in regard to optimization and scalability. But it still frequently requires manual tweaking (index creation, query structure, etc.), which is relatively easy due to the textual interface.

But why use SQL language for interaction with such a database?
I think it's for the same reason that you use a human-readable (source code) language for interaction with the compiler.
Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database.
This is an existing (optional) feature of databases, called "stored procedures".
Edit:
I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why
When I implemented my own ORM, I implemented the ORM framework using ADO.NET: and using ADO.NET includes using SQL statements in its implementation.

After all the edits and comments, the main point of your question appears to be : why is the nature of SQL closer to being a human/database interface than to being an application/database interface ?
And the very simple answer to that question is : because that is exactly what it was originally intended to be.
The predecessors of SQL (QUEL being presumably the most important one) were intended to be exactly that : a QUERY language, i.e. one that didn't have any of INSERT, UPDATE, DELETE.
Moreover, it was intended to be a query language that could be used by any user, provided that user was aware of the logical structure of the database, and obviously knew how to express that logical structure in the query language he was using.
The original ideas behind QUEL/SQL were that a database was built using "just any mechanism conceivable", that the "real" database could be really just anything (e.g. one single gigantic XML file - allthough 'XML' was not considered a valid option at the time), and that there would be "some kind of machinery" that understood how to transform the actual structure of that 'just anything' into the logical relational structure as it was perceived by the SQL user.
The fact that in order to actually achieve that, the underlying structures are required to lend themselves to "viewing them relationally", was not understood as well in those days as it is now.

Yes, it is annoying to have to write SQL statements to store and retrieve objects.
That's why Microsoft have added things like LINQ (language integrated query) into C# and VB.NET to make it possible to query databases using objects and methods instead of strings.
Most other languages have something similar with varying levels of success depending on the abilities of that language.
On the other hand, it is useful to know how SQL works and I think it is a mistake to shield yourself entirely from it. If you use the database without thinking you can write extremely inefficient queries and index the database incorrectly. But once you understand how to use SQL correctly and have tuned your database, you have a very powerful tried-and-tested tool available for finding exactly the data you need extremely quickly.

My biggest reason for SQL is Ad-hoc reporting. That report your business users want but don't know that they need it yet.

SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
I use sqlite a lot right from the simplest of tasks (like logging my firewall logs directly to a sqlite database) to more complex analytic and debugging tasks in my day-to-day research. Laying out my data in tables and writing SQL queries to munge them in interesting ways seems to be the most natural thing to me in these situations.
On your point about why it is still used as an interface between application/database, this is my simple reasoning:
There is about 3-4 decades of
serious research in that area
starting in 1970 with Codd's seminal
paper on Relational Algebra.
Relational Algebra forms the
mathematical basis to SQL (and other
QLs), although SQL does not
completely follow the relational
model.
The "text" form of the language
(aside from being easily
understandable to humans) is also
easily parsable by machines (say
using a grammar parser like like
lex) and is easily convertable to whatever "bytecode" using any number of optimizations.
I am not sure if doing this in any
other way would have yielded
compelling benefits in the generic cases. Otherwise it
would have been probably discovered
and adopted in the 3 decades of
research. SQL probably provides the
best tradeoffs when bridging the
divide between humans/databases and
applications/databases.
The question that then becomes interesting to ask is, "What are the real benefits of doing SQL in any other "non-text" way?" Will google for this now:)

SQL is a common interface used by the DBMS platform - the entire point of the interface is that all database operations can be specified in SQL without needing supplementary API calls. This means that there is a common interface across all clients of the system - application software, reports and ad-hoc query tools.
Secondly, SQL gets more and more useful as queries get more complex. Try using LINQ to specify a 12-way join a with three conditions based on existential predicates and a condition based on an aggregate calculated in a subquery. This sort of thing is fairly comprehensible in SQL but unlikely to be possible in an ORM.
In many cases an ORM will do 95% of what you want - most of the queries issued by applications are simple CRUD operations that an ORM or other generic database interface mechanism can handle easily. Some operations are best done using custom SQL code.
However, ORMs are not the be-all and end-all of database interfacing. Fowler's Patterns of Enterprise Application Architecture has quite a good section on other types of database access strategy with some discussion of the merits of each.
There are often good reasons not to use an ORM as the primary database interface layer. An example of a good one is that platform database libraries like ADO.Net often do a good enough job and integrate nicely with the rest of the environment. You might find that the gain from using some other interface doesn't really outweigh the benefits from the integration.
However, the final reason that you can't really ignore SQL is that you are ultimately working with a database if you are doing a database application. There are many, many WTF stories about screw-ups in commercial application code done by people who didn't understand databases properly. Poorly thought-out database code can cause trouble in so many ways, and blithely thinking that you don't need to understand how the DBMS works is an act of Hubris that is bound to come and bite you some day. Worse yet, it will come and bite some other poor schmoe who inherits your code.

While I see your point, SQL's query language has a place, especially in large applications with a lot of data. And to point out the obvious, if the language wasn't there, you couldn't call it SQL (Structured Query Language). The benefit of having SQL over the method you described is SQL is generally very readable, though some really push the limits on their queries.
I whole heartly agree with Mark Byers, you should not shield yourself from SQL. Any developer can write SQL, but to really make your application perform well with SQL interaction, understanding the language is a must.
If everything was precompiled with bytecode as you described, I'd hate to be the one to have to debug the application after the original developer left (or even after not seeing the code for 6 months).

I think the premise of the question is incorrect. That SQL can be represented as text is immaterial. Most modern databases would only compile queries once and cache them anyway, so you already have effectively a 'compiled bytecode'. And there's no reason this couldn't happen client-wise though I'm not sure if anyone's done it.
You said SQL is a text message, well I think of him as a messenger, and, as we know, don't shoot the messenger. The real issue is that relations are not a good enough way of organising real world data. SQL is just lipstick on the pig.

If the first part you seem to refer to what is usually called the Object - relational mapping impedance. There are already a lot of frameworks to alleviate that problem. There are tradeofs as well. Some things will be easier, others will get more complex, but in the general case they work well if you can afford the extra layer.
In the second part you seem to complain about SQL being text (it uses strings instead of ids, etc)... SQL is a query language. Any language (computer or otherwise) that is meant to be read or written by humans is text oriented for that matter. Assembly, C, PHP, you name it. Why? Because, well... it does make sense, doesn't it?
If you want precompiled queries, you already have stored procedures. Prepared statements are also compiled once on the fly, IIRC. Most (if not all) db drivers talk to the database server using a binary protocol anyway.

yes, text is a bit inefficient. But actually getting the data is a lot more costly, so the text based sql is reasonably insignificant.

SQL was created to provide an interface to make ad hoc queries against a relational database.
Generally, most relational databases understand some form of SQL.
Object-oriented databases exist, and (presumably) use objects to do their querying... but as I understand it, OO databases have a lot more overheard, and relational databases work just fine.
Relational Databases also allow you to operate in a "disconnected" state. Once you have the information you asked for, you can close the database connection. With an OO database, you either need to return all objects related to the current one (and the ones they're related to... and the... etc...) or reopen the connection to retrieve new objects as they are accessed.
In addition to SQL, you also have ORMs (object-relational mappings) that map objects to SQL and back. There are quite a few of them, including LINQ (.NET), the MS Entity Framework (.NET), Hibernate (Java), SQLAlchemy (Python), ActiveRecord (Ruby), Class::DBI (Perl), etc...

A database language is useful because it provides a logical model for your data independent of any applications that use it. SQL has a lot of shortcomings however, not the least being that its integration with other languages is poor, type support is about 30 years behind the rest of the industry and it has never been a truly relational language anyway.
SQL has survived mostly because the database market has been and remains dominated by the three mega-vendors who have a vested interest in protecting their investment. That's changing and SQL's days are probably numbered but the model that will finally replace it probably hasn't arrived yet - although there are plenty of contenders around these days.

I don't think most people are getting your question, though I think it's very clear. Unfortunately I don't have the "correct" answer. I would guess it's a combination of several things:
Semi-arbitrary decisions when it was designed such as ease of use, not needing a SQL compiler (or IDE), portability, etc.
It happened to catch on well (probably due to similar reasons)
And now due to historical reasons (compatibility, well known, proven, etc.) continues to be used.
I don't think most companies have bothered with another solution because it works well, isn't much of a bottleneck, it's a standard, blah, blah..

One of the Unix design principles can be said thusly, "Write programs to handle text streams, because that is a universal interface.".
And that, I believe, is why we typically use SQL instead of some 'byte-SQL' that only has a compilation interface. Even if we did have a byte-SQL, someone would write a "Text SQL", and the loop would be complete.
Also, MySQL and SQLite are less full-featured than, say, MSSQL and Oracle SQL. So you're still in the low end of the SQL pool.

Actually there are a few non-SQL database (like Objectivity, Oracle Berkeley DB, etc.) products came but non of them succeeded. In future if someone finds intuitive alternative for SQL, that will answer your question.

There are a lot of non relational database systems. Here are just a few:
Memcached
Tokyo Cabinet

As far as finding a relational database that doesn't use SQL as its primary interface, I think you won't find it. Reason: SQL is a great way to talk about relations. I can't figure out why that's a big deal to you: if you don't like SQL, put an abstraction over it (like an ORM) so you don't have to worry about it. Let the abstraction worry about it. It gets you to the same place.
However, the problem your'e really mentioning here is the object-relation disconnect - the problem is with the relation itself. Objects and relational-tuples don't always lend themselves to be a 1-1 relationship, which is the reason why a developer can frustrated with a database. The solution to that is to use a different database type.

Because often, you cannot be sure that (citing you) "no one ever seen after deployment". Knowing that there is an easy interface for reporting and for dataset level querying is a good path for evolution of your app.
You're right, that there are other solutions that may be valid in some situations: XML, plain text files, OODB...
But having a set of common interfaces (like ODBC) is a huge plus for the life of data.

I think the reason might be the search/find/grab algorithms the sql laungage is connected to do. Remember that sql has been developed for 40 years - and the goal has been both preformence wise and user firendly wise.
Ask yourself what the best way of finding 2 attibutes is. Now why investigating that each time you would want to do something that includes that each time you develope your application. Assuming the main goal is the developing of your application when developing an application.
An application has similarities with other applications, a database has similarities with other databases. So there should be a "best way" of these to interact, logically.
Also ask yourself how you would develop a better console only application that does not use sql laungage. If you cannot do that I think you need to develope a new kind of GUI that are even more fundamentally easier to use than with a console - to develope things from it. And that might actually be possible. But still most development of applications is based around console and typing.
Then when it comes to laungage I don´t think you can make a much more fundamentally easier text laungage than sql. And remember that each word of anything is inseperatly connected to its meaning - if you remove the meaning the word cannot be used - if you remove the word you cannot communicate the meaning. You have nothing to describe it with (And maybe you cannot even think it beacuse it woulden´t be connected to anything else you have thought before...).
So basically the best possible algorithms for database manipulation are assigned to words - if you remove these words you will have to assign these manipulations something else - and what would that be?

i think you can use ORM
if and only if you know the basic of sql.
else the result there isn't the best

When to use an ORM (Sequel, Datamapper, AR, etc.) vs. pure SQL for querying

A colleague of mine is currently designing SQL queries like the one below to produce reports, which are displayed in excel files through an external data query.
At present, only reporting processes on the DB are required (no CRUD operations).
I am trying to convince him that it would be better to use a ruby ORM in order to be able to display the data in a rails/sinatra app.
Despite the obvious advantages in displaying the data, what advantages are there for him in learning to use an ORM like Sequel or Datamapper?
The SQL queries he is writing are clearly quite complex, and being relatively new to SQL, he often complains that it is very time-consuming and confusing.
Is it possible to write extremely complex queries with an ORM? and if so, which is the most suitable(I have heard Sequel is good for legacy dbs)? and what are the advantages of learning ruby and using an ORM versus sticking with plain SQL, in making complex database queries?

I'm the DataMapper maintainer, and I think for complex reporting you should use SQL.
While I do think someday we'll have a DSL that provides the power and conciseness of SQL, everything I've seen so far requires you to write more Ruby code than SQL for complex queries. I would much rather maintain a 5 line SQL query than 10-15 lines of Ruby code to describe the same complex operation.
Please note I say complex.. if you have something simple, use the ORM's build-in finders. However, I do believe there is a line you can cross where SQL becomes simpler. Now, most apps aren't just reporting. You may have alot of CRUD type operations, for which an ORM is perfectly suited and far better than doing those things by hand.
One thing that an ORM will usually provide is some sort of organization to your application logic. You can group code based around each model in the same file. It's usually there that I'll put the complex SQL query, rather than embedding it in the controller, eg:
class User
include DataMapper::Resource
property :id, Serial
property :name, String, :length => 1..100, :required => true
property :age, Integer, :min => 1, :max => 130
def self.some_complex_query
repository.adapter.select <<-SQL
SELECT ...
FROM ...
WHERE ...
... more complex stuff here ...
SQL
end
end
Then I can just generate the report using User.some_complex_query. You could also push the SQL query into a view if you wanted to further cleanup this code.
EDIT: By "view" in the above sentence I meant RDBMS view, rather than view in the MVC context. Just wanted to clear up any potential confusion.

If you are writing your queries by hand you have the chance to optimize them. When I look at that query I see some potential for optimizations (E.ICGROUPNAME LIKE '%san-fransisco%' or E.ICGROUPNAME LIKE '%bordeaux%' wont use an index = Table Scan).
When using an OR Mapper (the native Objects/Tables) for reporting you have no or little control over the resulting SQL Query.
But: You could put that query in an View or Stored Procedure and map that View/Proc with an OR Mapper. You can optimize your queries and you can use all features of your Application Framework.

Unless you're dealing with objects, an ORM is not necessary. It sounds like your friend simply needs to generate reports, in which case pure SQL is just fine so long as he knows what he's doing (e.g. avoiding SQL injection issues).
ORM stands for "Object-Relational Mapping". If you don't have the "O" (objects), then it's probably not a good fit for your app. Where ORMs really shine is in persisting objects to the database and loading them from a database.

ORM stands for Object Relational Mapping - but looking at the query your friend seems to be wanting a pretty specific table of sums and other items... I've not used Ruby's Sequel, but I've used Hibernate, and Python's SQLAlchemy (for Django/Turbogears) and while you can do these sorts of queries, I don't believe that is their strength.
The power of ORM comes from being able to finding Foo->Bar object relationships, say you want all the Bar objects for Foo's field greater then X... That sort of thing. Therefore I would not classify an ORM as a "good" solution, though moving to a real programming language like Ruby and doing the SQL through it instead of Excel... that in itself is a win.
Just my 2 cents.

In a situation like that, I'd probably write them by hand or use a View (if the DB you're using supports views)

ORM's are used when you have Objects (Business Objects). I am therefore assuming that you have an application with which you creating and Managing the Business Objects that are ultimately saved into the database. If you have then you have almost definitely got some representation of the relationships and probably many of the calculations you are going to use in reports. The problem with using SQL to directly access your database for reports is simply maintainability.
You typically put a lot of effort into ensuring that your Business Objects hide any details of their database. You implement business rules and do common calculations in your Business Objects. Build a common language for all members of the team etc etc. You then use an ORM to map to the database and use Habanero or NHibernate or something like that to do this. This is all great. We do this all in the name of Maintainability and is great. You can migrate your application change your design etc etc.
You now go and write SQL to run reports over time you have hundreds of report. Firstly they often duplicate logic you already have in your BusinessObjects (Usually without any tests) and even worse Bham Damb sorry maintainability is now stuffed forget about moving a that field from one table to another forget about splitting that table into two changing that relationship etc you have a number of reports that are going to break unexpectedly.
The problem with quering through your Domain Objects/Business Objects is simply one of performance.
In summary if you are using Domain Driven Design or Business Object concepts try to use these for reports. (You will probably run directly from DB using SQL or stored procs for performance reasons but try limit these use your Business Objects first and then use SQL).
The other option of course is using a separate reporting database (Like some of the BI concepts) The mapping from your transactional DB to your reporting DB is therefore in one place and easily changeable in cases where you want to change your design.
Domain Objects (Business Objects) and ORMs have all the knowledge to allow you to start building high performing queries that run directly on the Database while using the Domain Terminology. Lets hope that these continue to evolve to a point where this is a reality.
Until then if you are using Business Objects in your application try use them for Reporting when performance is an issue resort to SQL.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas