Shouldn't FROM come before SELECT in Sql? - sql

This is something that has always bothered me. Wouldnt it make more sense to have the FROM clause come before the SELECT? Whenever Im writing sql, especially with joins, I always figure out the FROM clause first and then write the SELECT.
Plus, putting the FROM first would allow for better intellisense inside the editor.
Does anyone know what the reasoning was to have SELECT come first? Am I only one who is bothered by this?

Yes it is strange and counterintuitive. Hugh Darwen theorises about how this state of affairs came about:
Do you take SELECT-FROM-WHERE for
granted, or do you, like me, find it
rather curious that the System R team
should have spurned the normal way of
writing expressions of arbitrary
complexity in favour of something
utterly idiosyncratic and, one might
say, rather dictatorial...?
The fact is that in the 1960s various
scripting languages (as we tend to
call such things these days) had come
about for the purposes of report
generation, especially ad hoc report
generation. We had one such language
in the prerelational DBMS called
Terminal Business System (TBS) that I
worked on for IBM from 1969-77. Our
language required the user to specify
the required report in a series of
steps that had to be given in the
prescribed order...
A somewhat similar but much more
sophisticated report generator was
later developed by IBM in the US, as
part of a product called (prosaically,
as was IBM's style in those days)
Generalized Information System
(GIS)... when I first looked at SQL,
my immediate reaction was "Oh no!
Son of GIS? Please not that!" I
might have been quite wrong about
this. The similarity I perceived
might have been illusory and even if
it was not, I have no firm evidence
that anybody in the System R team was
familiar with GIS. The fact remains
that the general style of a fixed
order of actions was the order of the
day at the time. I postulate that
SQL's SELECT-FROM-WHERE arose out of
this fashion.
From HAVING a Blunderful Time

The syntax was to resemble English.

Related

For really complex reports, do people sometimes code in their language rather than in sql?

I have some pretty complex reports to write. Some of them... I'm not sure how I could write an sql query for just one of the values, let alone stuff them in a single query.
Is it common to just pull a crap load of data and figure it all via code instead? Or should I try and find a way to make all the reports rely on sql?
I have a very rich domain model. In fact, parts of code can be expanded on to calculate exactly what they want. The actual logic is not all that difficult to write - and it's nicer to work my domain model than with SQL. With SQL, writing the business logic, refactoring it, testing it and putting it version control is a royal pain because it's separate from your actual code.
For example, one the statistics they want is the % of how much they improved, especially in relation to other people in the same class, the same school, and compared to other schools. This requires some pretty detailed analysis of how they performed in the past to their latest information, as well as doing a calculation for the groups you are comparing against as a whole. I can't even imagine what the sql query would even look like.
The thing is, this % improvement is not a column in the database - it involves a big calculation in of itself by analyzing all the live data in real-time. There is no way to cache this data in a column as doing this calculation for every row it's needed every time the student does something is CRAZY.
I'm a little afraid about pulling out hundreds upon hundreds of records to get these numbers though. I may have to pull out that many just to figure out 1 value for 1 user... and if they want a report for all the users on a single screen, it's going to basically take analyzing the entire database. And that's just 1 column of values of many columns that they want on the report!
Basically, the report they want is a massive performance hog no matter what method I choose to write it.
Anyway, I'd like to ask you what kind of solutions you've used to these kind of a problems.
Sometimes a report can be generated by a single query. Sometimes some procedural code has to be written. And sometimes, even though a single query CAN be used, it's much better/faster/clearer to write a bit of procedural code.
Case in point - another developer at work wrote a report that used a single query. That query was amazing - turned a table sideways, did some amazing summation stuff - and may well have piped the output through hyperspace - truly a work of art. I couldn't have even conceived of doing something like that and learned a lot just from readying through it. It's only problem was that it took 45 minutes to run and brought the system to its knees in the process. I loved that query...but in the end...I admit it - I killed it. ((sob!)) I dismembered it with a chainsaw while humming "Highway To Hell"! I...I wrote a little procedural code to cover my tracks and...nobody noticed. I'd like to say I was sorry, but...in the end the job ran in 30 seconds. Oh, sure, it's easy enough to say "But performance matters, y'know"...but...I loved that query... ((sniffle...)) Anybody seen my chainsaw..? >;->
The point of the above is "Make Things As Simple As You Can, But No Simpler". If you find yourself with a query that covers three pages (I loved that query, but...) maybe it's trying to tell you something. A much simpler query and some procedural code may take up about the same space, page-wise, but could possibly be much easier to understand and maintain.
Share and enjoy.
Sounds like a challenging task you have ahead of you. I don't know all the details, but I think I would go at it from several directions:
Prioritize: You should try to negotiate with the "customer" and prioritize functionality. Chances are not everything is equally useful for them.
Manage expectations: If they have unrealistic expectations then tell them so in a nice way.
IMHO SQL is good in many respects, but it's not a brilliant programming language. So I'd rather just do calculations in the application rather than in the database.
I think I'd go for some delay in the system .. perhaps by caching calculated results for some minutes before recalculating. This is with a mind towards performance.
The short answer: for analysing large quantities of data, a SQL database is probably the best tool around.
However, that does not mean you should analyse this straight off your production database. I suggest you look into Datawarehousing.
For a one-off report, I'll write the code to produce it in whatever I can best reason about it in.
For a report that'll be generated more than once, I'll check on who is going to be producing it the next time. I'll still write the code in whatever I can best reason about it in, but I might add something to make it more attractive to use to that other person.
People usually use a third party report writing system rather than writing SQL. As an application developer, if you're spending a lot of time writing complex reports, I would severely question your manager's actions in NOT buying an off-the-shelf solution and letting less-skilled people build their own reports using some GUI.

How do you think while formulating Sql Queries. Is it an experience or a concept?

I have been working on sql server and front end coding and have usually faced problem formulating queries.
I do understand most of the concepts of sql that are needed in formulating queries but whenever some new functionality comes into the picture that can be dont using sql query, i do usually fails resolving them.
I am very comfortable with select queries using joins and all such things but when it comes to DML operation i usually fails
For every query that i never done before I usually finds uncomfortable with that while creating them. Whenever I goes for an interview I usually faces this problem.
Is it their some concept behind approaching on formulating sql queries.
Eg.
I need to create an sql query such that
A table contain single column having duplicate record. I need to remove duplicate records.
I know i can find the solution to this query very easily on Googling, but I want to know how everyone comes to the desired result.
Is it something like Practice Makes Man Perfect i.e. once you did it, next time you will be able to formulate or their is some logic or concept behind.
I could have get my answer of solving above problem simply by posting it on stackoverflow and i would have been with an answer within 5 to 10 minutes but I want to know the reason. How do you work on any new kind of query. Is it a major contribution of experience or some an implementation of concepts.
Whenever I learns some new thing in coding section I tries to utilize it wherever I can use it. But here scenario seems to be changed because might be i am lagging in some concepts.
EDIT
How could I test my knowledge and
concepts in Sql and related sql
queries ?
Typically, the first time you need to open a child proof bottle of pills, you have a hard time, but after that you are prepared for what it might/will entail.
So it is with programming (me thinks).
You find problems, research best practices, and beat your head against a couple of rocks, but in the process you will come to have a handy set of tools.
Also, reading what others tried/did, is a good way to avoid major obsticles.
All in all, with a lot of practice/coding, you will see patterns quicker, and learn to notice where to make use of what tool.
I have a somewhat methodical method of constructing queries in general, and it is something I use elsewhere with any problem solving I need to do.
The first step is ALWAYS listing out any bits of information I have in a request. Information is essentially anything that tells me something about something.
A table contain single column having
duplicate record. I need to remove
duplicate
I have a table (I'll call it table1)
I have a
column on table table1 (I'll call it col1)
I have
duplicates in col1 on table table1
I need to remove
duplicates.
The next step of my query construction is identifying the action I'll take from the information I have.
I'll look for certain keywords (e.g. remove, create, edit, show, etc...) along with the standard insert, update, delete to determine the action.
In the example this would be DELETE because of remove.
The next step is isolation.
Asnwer the question "the action determined above should only be valid for ______..?" This part is almost always the most difficult part of constructing any query because it's usually abstract.
In the above example you're listing "duplicate records" as a piece of information, but that's really an abstract concept of something (anything where a specific value is not unique in usage).
Isolation is also where I test my action using a SELECT statement.
Every new query I run gets thrown through a select first!
The next step is execution, or essentially the "how do I get this done" part of a request.
A lot of times you'll figure the how out during the isolation step, but in some instances (yours included) how you isolate something, and how you fix it is not the same thing.
Showing duplicated values is different than removing a specific duplicate.
The last step is implementation. This is just where I take everything and make the query...
Summing it all up... for me to construct a query I'll pick out all information that I have in the request. Using the information I'll figure out what I need to do (the action), and what I need to do it on (isolation). Once I know what I need to do with what I figure out the execution.
Every single time I'm starting a new "query" I'll run it through these general steps to get an idea for what I'm going to do at an abstract level.
For specific implementations of an actual request you'll have to have some knowledge (or access to google) to go further than this.
Kris
I think in the same way I cook dinner. I have some ingredients (tables, columns etc.), some cooking methods (SELECT, UPDATE, INSERT, GROUP BY etc.) then I put them together in the way I know how.
Sometimes I will do something weird and find it tastes horrible, or that it is amazing.
Occasionally I will pick up new recipes from the internet or friends, then use parts of these in my own.
I also save my recipes in handy repositories, broken down into reusable chunks.
On the "Delete a duplicate" example, I'd come to the result by googling it. This scenario is so rare if the DB is designed properly that I wouldn't bother keeping this information in my head. Why bother, when there is a good resource is available for me to look it up when I need it?
For other queries, it really is practice makes perfect.
Over time, you get to remember frequently used patterns just because they ARE frequently used. Rare cases should be kept in a reference material. I've simply got too much other stuff to remember.
Find a good documentation to your software. I am using Mysql a lot and Mysql has excellent documentation site with decent search function so you get many answers just by reading docs. If you do NOT get your answer at least you are learning something.
Than I set up an example database (or use the one I am working on) and gradually build my SQL. I tend to separate the problem into small pieces and solve it step by step - this is very successful if you are building queries including many JOINS - it is best to start with some particular case and "polute" your SQL with many conditions like WHEN id = "123" which you are taking out as you are working towards your solution.
The best and fastest way to learn good SQL is to work with someone else, preferably someone who knows more than you, but it is not necessarry condition. It can be replaced by studying mature code written by others.
Your example is a test of how well you understand the DISTINCT keyword and the GROUP BY clause, which are SQL's ways of dealing with duplicate data.
Examples and experience. You look at other peoples examples and you create your own code and once it groks, you don't need to think about it again.
I would have a look at the Mere Mortals book - I think it's the one by Hernandez. I remember that when I first started seriously with SQL Server 6.5, moving from manual ISAM databases and Access database systems using VB4, that it was difficult to understand the syntax, the joins and the declarative style. And the SQL queries, while powerful, were very intimidating to understand - because typically, I was looking at generated code in Microsoft Access.
However, once I had developed a relatively systematic approach to building queries in a consistent and straightforward fashion, my skills and confidence quickly moved forward.
From seeing your responses you have two options.
Have a copy of the specification for whatever your working on (SQL spec and the documentation for the SQL implementation (SQLite, SQL Server etc..)
Use Google, SO, Books, etc.. as a resource to find answers.
You can't formulate an answer to a problem without doing one of the above. The first option is to become well versed into the capabilities of whatever you are working on.
The second option allows you to find answers that you may not even fully know how to ask. You example is fairly simplistic, so if you read the spec/implementation documentaion you would know the answer right away. But there are times, where even if you read the spec/documentation you don't know the answer. You only know that it IS possible, just not how to do it.
Remember that as far as jobs and supervisors go, being able to resolve a problem is important, but the faster you can do it the better which can often be done with option 2.

Why no love for SQL? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I've heard a lot lately that SQL is a terrible language, and it seems that every framework under the sun comes pre-packaged with a database abstraction layer.
In my experience though, SQL is often the much easier, more versatile, and more programmer-friendly way to manage data input and output. Every abstraction layer I've used seems to be a markedly limited approach with no real benefit.
What makes SQL so terrible, and why are database abstraction layers valuable?
This is partly subjective. So this is my opinion:
SQL has a pseudo-natural-language style. The inventors believed that they can create a language just like English and that database queries will be very simple. A terrible mistake. SQL is very hard to understand except in trivial cases.
SQL is declarative. You can't tell the database how it should do stuff, just what you want as result. This would be perfect and very powerful - if you wouldn't have to care about performance. So you end up in writing SQL - reading execution plans - rephrasing SQL trying to influence the execution plan, and you wonder why you can't write the execution plan yourself.
Another problem of the declarative language is that some problems are easier to solve in a imperative manner. So you either write it in another language (you'll need standard SQL and probably a data access layer) or by using vendor specific language extensions, say by writing stored procedures and the like. Doing so you will probably find that you're using one of the worst languages you've ever seen - because it was never designed to be used as an imperative language.
SQL is very old. SQL has been standardized, but too late, many vendors already developed their language extensions. So SQL ended up in dozens of dialects. That's why applications are not portable and one reason to have a DB abstraction layer.
But it's true - there are no feasible alternatives. So we all will use SQL for the next few years.
Aside from everything that was said, a technology doesn't have to be bad to make an abstraction layer valuable.
If you're doing a very simple script or application, you can afford to mix SQL calls in your code wherever you like. However, if you're doing a complex system, isolating the database calls in separate module(s) is a good practice and so it is isolating your SQL code. It improves your code's readability, maintainability and testability. It allows you to quickly adapt your system to changes in the database model without breaking up all the high level stuff, etc.
SQL is great. Abstraction layers over it makes it even greater!
One point of abstraction layers is the fact that SQL implementations tend to be more or less incompatible with each other since the standard is slightly ambiguous, and also because most vendors have added their own (nonstandard) extras there. That is, SQL written for a MySQL DB might not work quite similarly with, say, an Oracle DB — even if it "should".
I agree, though, that SQL is way better than most of the abstraction layers out there. It's not SQL's fault that it's being used for things that it wasn't designed for.
SQL gets badmouthed from several sources:
Programmers who are not comfortable with anything but an imperative language.
Consultants who have to deal with many incompatible SQL-based products on a daily basis
Nonrelational database vendors trying to break the stranglehold of relational database vendors on the market
Relational database experts like Chris Date who view current implementations of SQL as insufficient
If you stick to one DBMS product, then I definitely agree that SQL DBs are more versatile and of higher quality than their competition, at least until you hit a scalability barrier intrinsic in the model. But are you really trying to write the next Twitter, or are you just trying to keep some accounting data organized and consistent?
Criticism of SQL is often a standin for criticisms of RDBMSes. What critics of RDBMSes seem not to understand is that they solve a huge class of computing problems quite well, and that they are here to make our lives easier, not harder.
If they were serious about criticizing SQL itself, they'd back efforts like Tutorial D and Dataphor.
It's not so terrible. It's an unfortunate trend in this industry to rubbish the previous reliable technology when a new "paradigm" comes out. At the end of the day, these frameworks are very most probably using SQL to communicate with the database so how can it be THAT bad? That said, having a "standard" abstraction layer means that a developer can focus on the application code and not the SQL code. Without such a standard layer you'd probably write a lightweight one each time you're developing a system, which is a waste of effort.
SQL is designed for management and query of SET based data. It is often used to do more and edge cases lead to frustration at times.
Actual USE of SQL can be SO impacted by the base database design that the SQL may not be the issue, but the design might - and when you toss in the legacy code associated with a bad design, changes are more impactive and costly to impliment (no one like to go back and "fix" stuff that is "working" and meeting objectives)
Carpenters can pound nails with hammers, saw lumber with saws and smooth boards with planes. It IS possible to "saw" using hammers and planes, but dang it is frustrating.
I wont say it's terrible. It's unsuitable for some tasks. For example: you can not write good procedural code with SQL. I was once forced to work with set manipulation with SQL. It took me a whole weekend to figure that out.
SQL was designed for relational algebra - that's where it should to be used.
I've heard a lot lately that SQL is a terrible language, and it seems that every framework under the sun comes pre-packaged with a database abstraction layer.
Note that these layers just convert their own stuff into SQL. For most database vendors SQL is the only way to communicate with the engine.
In my experience though, SQL is often the much easier, more versatile, and more programmer-friendly way to manage data input and output. Every abstraction layer I've used seems to be a markedly limited approach with no real benefit.
… reason for which I just described above.
The database layers don't add anything, they just limit you. They make the queries disputably more simple but never more efficient.
By definition, there is nothing in the database layers that is not in SQL.
What makes SQL so terrible, and why are database abstraction layers valuable?
SQL is a nice language, however, it takes some brain twist to work with it.
In theory, SQL is declarative, that is you declare what you want to get and the engine provides it in the fastest way possible.
In practice, there are many ways to formulate a correct query (that is the query that return correct results).
The optimizers are able to build a Lego castle out of some predefined algorithms (yes, they are multiple), but they just cannot make new algorithms. It still takes an SQL developer to assist them.
However, some people expect the optimizer to produce "the best plan possible", not "the best plan available for this query with given implementation of the SQL engine".
And as we all know, when the computer program does not meet people's expectations, it's the program that gets blamed, not the expectations.
In most cases, however, reformulating a query can produce a best plan possible indeed. There are tasks when it's impossible, however, with the new and growing improvements to SQL these cases get fewer and fewer in number.
It would be nice, though, if the vendors provided some low-level access to the functions like "get the index range", "get a row by the rowid" etc., like C compilers let you to embed the assembly right into the language.
I recenty wrote an article on this in my blog:
Double-thinking in SQL
I'm a huge ORM advocate and I still believe that SQL is very useful, although it's certainly possible to do terrible things with it (just like anything else). .
I look at SQL as a super-efficient language that does not have code re-use or maintainability/refactoring as priorities.
So lightning fast processing is the priority. And that's acceptable. You just have to be aware of the trade-offs, which to me are considerable.
From an aesthetic point of view, as a language I feel that it is lacking some things since it doesn't have OO concepts and so on -- it feels like very old school procedural code to me. But it's far and away the fastest way to do certain things, and that's a powerful niche!
SQL is excellent for certain kinds of tasks, especially manipulating and retrieving sets of data.
However, SQL is missing (or only partially implements) several important tools for managing change and complexity:
Encapsulation: SQL's encapsulation mechanisms are coarse. When you write SQL code, you have to know everything about the implementation of your data. This limits the amount of abstraction you can achieve.
Polymorphism: if you want to perform the same operation on different tables, you've got to write the code twice. (One can mitigate this with imaginative use of views.)
Visibility control: there's no standard SQL mechanism for hiding pieces of the code from one another or grouping them into logical units, so every table, procedure, etc. is
accessible from every other one, even when it's undesirable.
Modularity and Versioning
Finally, manually coding CRUD operations in SQL (and writing the code to hook it up to the rest of one's application) is repetitive and error-prone.
A modern abstraction layer provides all of those features, and allows us to use SQL where it's most effective while hiding the disruptive, repetitive implementation details. It provides tools to help overcome the object-relational impedance mismatch that complicates data access in object-oriented software development.
I would say that a database abstraction layer included with a framework is a good thing because it solves two very important problems:
It keeps the code distinct. By putting the SQL into another layer, which is generally very thin and should only be doing the basics of querying and handoff of results (in a standardized way), you keep your application free from the clutter of SQL. It's the same reason web developers (should) put CSS and Javascript in separate files. If you can avoid it, do not mix your languages.
Many programmers are just plain bad at using SQL. For whatever reason, a large number of developers (especially web developers) seem to be very, very bad at using SQL, or RDBMSes in general. They treat the database (and SQL by extension) as the grubby little middleman they have to go through to get to data. This leads to extremely poorly thought out databases with no indexes, tables stacked on top of tables in dubious manners, and very poorly written queries. Or worse, they try to be too general (Expert System, anyone?) and cannot reasonably relate data in any meaningful way.
Unfortunately, sometimes the way that someone tries to solve a problem and tools they use, whether due to ignorance, stubbornness, or some other trait, are in direct opposition with one another, and good luck trying to convince them of this. As such, in addition to just being a good practice, I consider a database abstraction layer to be a sort of safety net, as it not only keeps the SQL out of the poor developer's eyes, but it makes their code significantly easier to refactor, since all the queries are in one place.
SQL is based on Set Theory, while most high level languages are object oriented these days. Object programmers typically like to think in objects, and have to make a mental shift to use Set based tools to store their objects. Generally, it is much more natural (for the OO programmer) to just cut code in the language of their choice and do something like object.save or object.delete in application code instead of having to write sql queries and call the database to achieve the same result.
Of course, sometimes for complex things, SQL is easier to use and more efficient, so it is good to have a handle on both types of technology.
IMO, the problem that I see that people have with SQL has nothing to do with relational design nor the SQL language itself. It has to do with the discipline of modeling the data layer which in many ways is fundamentally different than modeling a business layer or interface. Mistakes in modeling at the presentation layer are generally much easier to correct than at the data layer where you have multiple applications using the database. These problems are the same as those encountered in modeling a service layer in SOA designs where you have to account for current consumers of your service and the input and output contracts.
SQL was designed to interact with relational database models. There are other data models that have existed for some time, but the discipline about designing the data layer properly exists regardless of the theoretical model used and thus, the difficulties that developers typically have with SQL are usually related to attempts to impose a non-relational data model onto a relational database product.
For one thing, they make it trivial to use parameterized queries, protecting you from SQL injection attacks. Using raw SQL, from this perspective, is riskier, that is, easier to get wrong from a security perspective. They also often present an object-oriented perspective on your database, relieving you of having to do this translation.
Heard a lot recently? I hope you're not confusing this with the NoSql movement. As far as i'm aware that is mainly a bunch of people who use NoSql for high scalability web apps and appear to have forgotten that SQL is an effective tool in a non "high scalability web app" scenario.
The abstraction layer business is just about sorting out the difference between Object Oriented code and Table - Set based code such as SQL likes to talk. Usually this results in writing lots of boiler plate and dull transition code between the two. ORM automates this and thus saves time for business objecty people.
For experienced SQL programmer the bad sides are
Verbosity
As many have said here, SQL is declarative, which means optimizing is not direct. It's like rallying compared to circuit racing.
Frameworks that try to address all possible dialects and don't support shortcuts of any of them
No easy version control.
For others, the reasons are that
some programmers are bad at SQL. Probably because SQL operates with sets, while programming languages work in object or functional paradigm. Thinking in sets (union, product, intersect) is a matter of habbit that some people don't have.
some operations aren't self-explanatory: i.e. at first it's not clear that where and having filter different sets.
there are too many dialects
The primary goal of SQL frameworks is to reduce your typing. They somehow do, but too often only for very simple queries. If you try doing something complex, you have to use strings and type a lot. Frameworks that try to handle everything possible, like SQL Alchemy, become too huge, like another programming language.
[update on 26.06.10] Recently I worked with Django ORM module. This is the only worthy SQL framework I've seen. And this one makes working with stuff a lot. Complex aggregates are a bit harder though.
SQL is not a terrible language, it just doesn't play too well with others sometimes.
If for example if you have a system that wants to represent all entities as objects in some OO language or another, then combining this with SQL without any kind of abstraction layer can become rather cumbersome. There's no easy way to map a complex SQL query onto the OO-world. To ease the tension between those worlds additional layers of abstraction are inserted (an OR-Mapper for example).
SQL is a really good language for data manipulation. From a developer perspective, what I don't like with it is that changing the database don't break your code at compile time... So I use abstraction which add this feature at the price of performance and maybe expressiveness of the SQL language, because in most application you don't need all the stuff SQL has.
The other reason why SQL is hated, is because of relational databases.
The CAP Theorem becomes popular:
What goals might you want from a
shared-data system?
Strong Consistency: all clients see the same view, even in presence of
updates
High Availability: all clients can find some replica of the data, even in
the presence of failures
Partition-tolerance: the system properties hold even when the system
is partitioned
The theorem states that you can always
have only two of the three CAP
properties at the same time
Relational database address Strong Consistency and Partition-Tolerance.
So more and more people realize that relational database is not the silver bullet, and more and more people begin to reject it in favor of high availability, because high availability makes horizontal scaling more easy. Horizontal scaling gain popularity because we have reached the limit of Moore law, so the best way to scale is to add more machine.
If relational database is rejected, SQL is rejected too.
Quick, write me SQL to paginate a dataset that works in MySQL, Oracle, MSSQL, PostgreSQL, and DB2.
Oh, right, standard SQL doesn't define any operators to limit the number of results coming back and which row to start at.
• Every vendor extends the SQL syntax to suit their needs. So unless you're doing fairly simple things, your SQL code is not portable.
• The syntax of SQL is not orthogonal; e.g., the select, insert, update,anddelete statements all have completely different syntactical structure.
I agree with your points, but to answer your question, one thing that makes SQL so "terrible" is the lack of complete standardization of T-SQL between database vendors (Sql Server, Oracle etc.), which makes SQL code unlikely to be completely portable. Database abstraction layers solve this problem, albeit with a performance cost (sometimes a very severe one).
Living with pure SQL can really be a maintenance hell. For me the greatest advantage of ORMs is the ability to safely refactor code without tedious "DB refactoring" procedures. There are good unit testing frameworks and refactoring tools for OO languages, but I yet have to see Resharper's counterpart for SQL, for example.
Still all DALs have SQL behind the scenes, and still you need to know it to understand what's happening to your database, but daily working with good abstraction layer becomes easier.
If you haven't used SQL too much, I think the major problem is the lack of good developer tools.
If you have lots of experience with SQL, you will have, at one point or another, been frustrated by the lack of control over the execution plan. This is an inherent problem in the way SQL was specified to the vendors. I think SQL needs to become a more robust language to truly harness the underlying technology (which is very powerful).
SQL has many flaws, as some other posters here have pointed out. Still, I much prefer to use SQL over many of the tools that people offer as alternatives, because the "simplifications" are often more complicated than the thing they were supposed to simplify.
My theory is that SQL was invented by a bunch of ivory-tower blue-skiers. The whole non-procedural structure. Sounds great: tell me what you want rather than how you want to do it. But in practice, it's often easier to just give the steps. Often this seems like trying to give car maintenance instructions by describing how the car should perform when you're done. Yes, you could say, "I want the car to once again get 30 miles per gallon, and to run with this humming sound like this ... hmmmm ... and, etc" But wouldn't it be easier for everyone to just say, "Replace the spark plugs" ? And even when you do figure out how to express a complex query in non-procedural terms, the database engine often comes up with a very inefficient execution plan to get there. I think SQL would be much improved by the addition of standardized ways to tell it which table to read first and what index to use.
And the handling of nulls drive me crazy! Yes, theoretically it must have sounded great when someone said, "Hey, if null means unknown, then adding an unknown value to a known value should give an unknown value. After all, by definition, we have no idea what the unknown value is." Theoretically, absolutely true. In practice, if we have 10,000 customers and we know exactly how much money 9,999 owe us but there's some question about the amount owed by the last one, and management says, "What are our total accounts receivable?", yes, the mathematically correct answer is "I don't know". But the practical answer is "we calculate $4,327,287.42 but one account is in question so that number isn't exact". I'm sure management would much rather get a close if not certain number than a blank stare. But SQL insists on this mathemcatically pristine approach, so every operation you do, you have to add extra code to check for nulls and handle them special.
All that said, I'd still rather use SQL than some layer built on top of SQL, that just creates another whole set of things I need to learn, and then I have to know that ultimately this will be translated to SQL, and sometimes I can just trust it to do the translation correctly and efficiently, but when things get complex I can't, so now I have to know the extra layer, I still have to know SQL, and I have to know how it's going to translate to I can trick the layer into tricking SQL into doing the right thing. Arggh.
There's no love for SQL because SQL is bad in syntax, semantics and current usage. I'll explain:
it's syntax is a cobol shrapnel, all the cobol criticism applies here (to a lesser degree, to be fair). Trying to be natural language like without actually attempting to interpret natural language creates arbirtrary syntax (is it DROP TABLE or DROP , UPDATE TABLE , UPDATE or UPDATE IN , DELETE or DELETE FROM ...) and syntactical monstrosities like SELECT (how many pages does it fill?)
semantics is also deeply flawed, Date explains it in great detail, but it will suffice to note that a three valued boolean logic doesn't really fit a relational algebra where a row can only be or not be part of a table
having a programming language as the main (and often only) interface to databases proved to be a really bad choice and it created a new category of security flaws
I'd agree with most of the posts here that the debate over the utility of SQL is mostly subjective, but I think it's more subjective in the nature of your business needs.
Declarative languages, as Stefan Steinegger has pointed out, are good for specifying what you want, not how you want to do it. This means that your various implementations of SQL are decent from a high-level perspective : that is, if all you want is to get some data and nothing else matters, you can satisfy yourself with writing relatively simple queries, and choosing the implementation of SQL that is right for you.
If you work on a much "lower" level, and you need to optimize all of that yourself, it's far from ideal. Using a further layer of abstraction can help, but if what you're really trying to do is specify the methods for optimizing queries and so forth, it's a little counter intuitive to add a middleman when trying to optimize.
The biggest problem I have with SQL is like other "standardized" languages, there are very few real standards. I'd almost prefer having to learn a whole new language between Sybase and MySQL so that I don't get the two conventions confused.
While SQL does get the job done it certainly has issues...
it tries to simultaneously be the high level and the low level abstraction, and that's ... odd. Perhaps it should have been two or more standards at different levels.
it is a huge failure as a standard. Lots of things go wrong when a standard either stirs in everything, asks too much of implementations, asks too little, or for some reason does not accomplish the partially social goal of motivating vendors and implementors to produce strictly conforming interoperable complete implementations. You certainly cannot say SQL has done any of that. Look at some other standards and note that success or failure of the standard is clearly a factor of the useful cooperation attained:
RS-232 (Bad, not nearly enough specified, even which pin transmits and which pin receives is optional, sheesh. You can comply but still achieve nothing. Chance of successful interop: really low until the IBM PC made a de-facto useful standard.)
IEEE 754-1985 Floating Point (Bad, overreach: not a single supercomputer or scientific workstation or RISC microprocessor ever adopted it, although eventually after 20 years we were able to implement it nicely in HW. At least the world eventually grew into it.)
C89, C99, PCI, USB, Java (Good, whether standard or spec, they succeeded in motivating strict compliance from almost everyone, and that compliance resulted in successful interoperation.)
it failed to be selected for arguably the most important database in the world. While this is more of a datapoint than a reason, the fact that Google Bigtable is not SQL and not relational is kind of an anti-achievement for SQL.
I don't dislike SQL, but I also don't want to have to write it as part of what I am developing. The DAL is not about speed to market - actually, I have never thought that there would be a DAL implementation that would be faster than direct queries from the code. But the goal of the DAL is to abstract. Abstraction comes at a cost, and here it is that it will take longer to implement.
The benefits are huge, though. Writing native tests around the code, using expressive classes, strongly typed datasets, etc. We use a "DAL" of sorts, which is a pure DDD implementation using Generics in C#. So we have generic repositories, unit of work implementations (code based transactions), and logical separation. We can do things like mock out our datasets with little effort and actually develop ahead of database implementations. There was an upfront cost in building such a framework, but it is very nice that business logic is the star of the show again. We consume data as a resource now, and deal with it in the language we are natively using in the code. An added benefit of this approach is the clear separation it provides. I no longer see a database query in a web page, for example. Yes, that page needs data. Yes, the database is involved. But now, no matter where I am pulling data from, there is one (and only one) place to go into the code and find it. Maybe not a big deal on smaller projects, but when you have hundreds of pages in a site or dozens of windows in a desktop application, you truly can appreciate it.
As a developer, I was hired to implement the requirements of the business using my logical and analytical skills - and our framework implementation allows for me to be more productive now. As a manager, I would rather have my developers using their logical and analytical skills to solve problems than to write SQL. The fact that we can build an entire application that uses the database without having the database until closer to the end of the development cycle is a beautiful thing. It isn't meant as a knock against database professionals. Sometimes a database implementation is more complex than the solution. SQL (and in our case, Views and Stored Procs, specifically) are an abstraction point where code can consume data as a service. In shops where there is a definite separation between the data and development teams, this helps to eliminate sitting in a holding pattern waiting for database implementation and changes. Developers can focus on the problem domain without hovering over a DBA and the DBA can focus on the correct implementation without a developer needing it right now.
Many posts here seem to argue that SQL is bad because it doesn't have "code optimization" features, and that you have no control over execution plans.
What SQL engines are good at is to come up with an execution plan for a written instruction, geared towards the data, the actual contents. If you care to take a look beyond the programming side of things, you will see that there is more to data than bytes being passed between application tiers.

How to think in SQL?

How do I stop thinking every query in terms of cursors, procedures and functions and start using SQL as it should be? Do we make the transition to thinking in SQL just by practise or is there any magic to learning the set based query language? What did you do to make the transition?
A few examples of what should come to your mind first if you're real SQL geek:
Bible concordance is a FULLTEXT index to the Bible
Luca Pacioli's Summa de arithmetica which describes double-entry bookkeeping is in fact a normalized database schema
When Xerxes I counted his army by walling an area that 10,000 of his men occupied and then marching the other men through this enclosure, he used HASH AGGREGATE method.
The House That Jack Built should be rewritten using a self-join.
The Twelve Days of Christmas should be rewritten using a self-join and a ROWNUM
There Was An Old Woman Who Swallowed a Fly should be rewritten using CTE's
If the European Union were called European Union All, we would see 27 spellings for the word euro on a Euro banknote, instead of 2.
And finally you can read a lame article in my blog on how I stopped worrying and learned to love SQL (I almost forgot I wrote it):
Click
And one more article just on the subject:
Double-thinking in SQL
The key thing is you're manipulating SETS & elements of sets; and relating different sets (and corresponding elements) together. That's really the heart of it, imho. That's why every table should have a primary key; why you see set operators in the language; and why set operators like UNION won't (by defualt) return duplicate rows.
Of course in practice, the rules of sets are bent or broken but it's not that hard to see when this is necessary (otherwise, SQL would be TOO limited). Imho, just crack open your discrete math book and reacquaint yourself with some set exercises.
Best advice I can give you is that every time you think about processing something row-by-row, that you stop and ask yourself if there is a set-based way to do this.
Joe Celko's Thinking in Sets (book)
Perfectly intelligent programmers
often struggle when forced to work
with SQL. Why? Joe Celko believes the
problem lies with their procedural
programming mindset, which keeps them
from taking full advantage of the
power of declarative languages. The
result is overly complex and
inefficient code, not to mention lost
productivity.
This book will change the way you
think about the problems you solve
with SQL programs.. Focusing on three
key table-based techniques, Celko
reveals their power through detailed
examples and clear explanations. As
you master these techniques, you’ll
find you are able to conceptualize
problems as rooted in sets and
solvable through declarative
programming. Before long, you’ll be
coding more quickly, writing more
efficient code, and applying the full
power of SQL.
When people ask me about joins I send them here it has a great visual representation on what they are!
The way that I learned was by doing a lot of queries, and working at a job that required you to think in terms of result sets.
From your question, it seems like you've been writing lots of front-end code that uses sequential/procedural/iterative data manipulation. If you don't get on any projects that require you to use result set skills, I personally wouldn't worry about it.
One thing you might want to try is by trying to write analytical queries, e.g., generating simplistic reports on your data. In those cases you are trying to summarize large amounts of data by cordoning them off into sets.
Another good way would be to read a book on the theoretical/mathematical foundations to RDBMSes. Those deal strictly with set theory and how parts of the SQL query syntax relate directly with the math behind it. Of course, this requires you to like math. :)
I found that the Art Of SQL was a useful kick in the head for getting into the right mindset.
Part of this, however, comes down to style.
Obviously, you need to start thinking in result sets and not just procedurally.
However, once you've start that, you will often find decisions have to be made.
Do you write the incredibly complex update statement that may be difficult to understand by anyone but yourself, and difficult to maintain, or do you write a less efficient, but easier to manage procedure?
I would HIGHLY suggest that you remember that SQL statements can have comments in them to clarifiy what they are doing, not just stored procedures.
link: The Art Of SQL
One exercise you might want to try is this:
Take some of your existing reporting code from your application layer, preferably something that produces a single, tabular data set. Starting with the most basic elements, port it over to an SQL View.
Take all of the columns pulled from a single table and write the SQL statement to select that data. Then join on one table at a time and start figuring out the appropriate conditions and logic for your output.
You might come up against some particular task that at first seems impossible in SQL, but depending on the implementation you are programming against, there is almost always a way to get the result you're looking for. Check the documentation for your SQL implementation, or try Google.
This exercise has the benefit of giving you an original report to test against, so you know if you're getting the output you expect.
A few things to watch out for:
Recursion and graphs are fairly advanced techniques; you might want to start with something easier. (Joe Celko has a good book on the topic, if you're interested.)
There's often a big difference between a BIT and a C-style bool. At the very least, you may have to explicitly cast your output from INT to BIT.
OUTER JOINs are useful when a portion of the data might be empty, but try not to abuse them.
I think it takes a while to adjust (it was long ago for me, so I don't remember too well). But perhaps the key point is that SQL is declarative - i.e. you specify what you want done, not precisely how it should be done procedurally. So for a simple example:
"Get me the names and salaries of employees in departments located in London"
The relevant SQL is almost natural:
select name, salary
from employees
join departments on departments.deptno = employees.deptno
where departments.location = 'London';
We have "told" SQL how to join departments to employees, but only declaratively (NATURAL JOIN removes the need to do that, but is dangerous so not used in practice). We haven't defined procedurally how it should be done (e.g. "for each department, find all employees...") SQL is free to choose the optimal method to perform the query.
Thinking of rows makes sense when you use SQL to dump a table to your file system and then do whatever has to be done in your favorite programming language.
Not too much leverage of SQL; waste of disk, memory, cpu and human resources.
Think of SQL as of English (or whatever human language you prefer).
Show me all customers who ride bulls and get drunk every day but never visited Indonesia with their mother-in-law whose phone number is the same as my friend Doug's except for the area code.
You can do it (and much more) in one SQL statement, just learn how to. It's very lucrative.

Languages other than SQL in postgres

I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
#Mike: this kind of thinking makes me nervous. I've heard to many times "this should be infinitely portable", but when the question is asked: do you actually foresee that there will be any porting? the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORM's, PHP PDO, etc). My opinion is:
Evaluate realistically if there is a need to support multiple RDBMS's. For example if you are writing an open source web application, chances are that you need to support MySQL and PostgreSQL at least (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relation databases (CouchDB is not a RDBMS comparable with Oracle for example), further exemplifying the point that the perceived need for portability is many times greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
If you only need a little logic, the most-portable language should probably be used (pl/pgSQL). If you need to do some serious programming though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database. Or in C# within SQL Server, or PL/SQL within Oracle, there are plenty of examples*.
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, Cache, say but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool databse platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
is there any valid reason to use an
untrusted language? It seems like
making it so that any user can execute
any operation would be a bad idea
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.
Another very good reason to process data on the db layer is if the volume of data being crunched means that network bandwidth will become an issue. I once had to categorise very large amounts of data. Processing this in the application layer was severly restricted by the time required to transfer all the data across the network for processing.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
Regarding untrusted languages, I heard a podcast from Josh Berkus (a postgres advocate) who discussed an application of postgresql that brought in data from MySQL as part of its processing, so that the communication itself was handled by the postgres server. I don't remember the full details, I think it was on the FLOSS Weekly podcast which is quite an interesting discussion of the history of PostGRESQL and some of the issues it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something send a email or connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of postgresql isolation levels you cans safely do things like this.
You can put checkpoints in the function so if the transaction fails the email or whatever won't go out. The nice thing about doing this is it removes the logic from the client and puts it on the server.
I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing db functions, triggers, etc. The usefulness of these features is to provide a control over data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language that would not have been possible in pl/sql is a version of 'df' which allowed SQL table generators to pick a tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.