I have been trying to improve my SQL chops a little bit by working through SQL Koans, since I enjoy the learn-by-doing-and-meditating approach, and my SQL knowledge is lacking. In one set of koans is the following:
-- Meditate on MANY-TO-MANY relationships
select a.first_name, a.last_name, b.title
from book b
join book_to_author_map map on _____.id = _____.book_id
join author a on _____.author_id = _____.id
where author_id in (1, 5, 6)
Having no previous experience with aliasing, and little experience with joins, I was stuck on this problem for a little while. I was stuck longer than necessary though, because in Emacs sql-mode the word map (which I understand now to be an alias for the table book_to_author_map) is highlighted as an SQL keyword. I spent a lot of time looking for documentation on this keyword, and found nothing (aside from lots of information on sqlmap...).
Peeking at the source code for the Emacs sql-mode I found that map is designated as a keyword as part of sql-mode-postgres-font-lock-keywords, so I started to search for map in relation to PostgreSQL, and found it in a list of SQL keywords in the PostgreSQL documentation. The keyword MAP is designated as a "non-reserved" keyword for SQL:2003 and a "reserved" keyword for SQL:1999. However, I have been unable so far to find any documentation for this keyword in association with SQL.
My question, more out of curiosity than anything else, is as in the title: what is the MAP keyword in SQL?
I don't have a copy of the standard, but judging by the grammar, MAP WITH <function> is a clause for the CREATE ORDERING statement.
CREATE ORDERING is used to specify a sort order for a user-defined type, though as far as I can tell, the only vendor to have implemented the MAP WITH clause is Teradata. It looks like this clause lets you define a sort order for a custom type by providing a function which maps it to an existing type with a known ordering.
There is no such statement in Postgres, which defines sort ordering via operator classes and collations.
Related
My question is strictly related to the ISO SQL standard, and not some of its implementations like Oracle, MySQL, PostgreSQL etc.
The question is:
Are nested aggregate functions allowed/forbidden by the SQL standard? For example, the following query:
select MAX(COUNT(*))
from a_table
group by a_column
In particular, I'm reading the documentation available at this link, but I can't find any reference about this
As far as I know, Oracle allows this, but MySQL and PostgreSQL not, so I was wondering if somebody that knows the standard better than me, can tell if it's a grey area where different implementations can do what they want, or if there are some constraints on the standard
What does the "Structured" word means in SQL?
Is it because this(SQL) language statements are organized into Clauses, expressions and predicates?
Because of this organization, is it called "Structured" ?
The original full name was SEQUEL, which stood for "Structured English Query Language". It later had to be renamed to SQL due to trademark issues.
So basically, it was yet another attempt to sell a programming language as "just like English, except with a formal syntax" (hence "structured").
As I understand it, SQL is actually an abbreviation of SEQUEL, or Structured English Query Language. It was meant to have queries that everyone could read. The structured part means that you can only use a structured English; i.e. select col1 from table1, but not give col1 out of table1.
Mostly because it is a backronym. They needed a S to make Sequel.
from: http://wiki.answers.com/Q/Why_sql_is_called_structured_query_language
SQL, the standard that was later developed from Codd's work, provides a means of describing data with its natural structure only - that is, without superimposing any additional structure for machine representation purposes.
Why did Andres Heilsberg designed LINQ syntax to be different than that of SQL (whereby made an overhead for the programmers to learn a whole new thing)?
Weren't it better if it used same syntax as of SQL?
LINQ isn't meant to be SQL. It's meant to be a query language which is as independent of the data source as reasonably possible. Now admittedly it has a strong SQL bias, but it's not meant to just be embedding SQL in source code (fortunately).
Personally, I vastly prefer LINQ's syntax to SQL's. In particular, the ordering is much more logical in LINQ. Just by looking at the order of the query clauses, you can see the logical order in which the query is processed. You start with a data source, possibly do some filtering, ordering etc, and usually end with a projection or grouping. Compare that with SQL, where you start off saying which columns you're interested in, not even knowing what table you're talking about yet.
Not only is LINQ more logical in that respect, but it allows tools to work with you better - if Visual Studio knows what data you're starting with, then when you start writing a select clause (for example) it can help you with IntelliSense. Additionally, it allows the translation from LINQ query expressions into "dot notation" to be relatively simple using extension methods, without the compiler having to be aware of any details of what the query will actually do.
So from my point of view: no, LINQ would be a lot worse if it had slavishly followed SQL's syntax.
First, choose your flavor of SQL - there are several! (T-, PL-, etc).
Ultimately, there are similarities and differences. A lot of the LINQ changes make more sense - i.e. choosing your source (FROM) before you try filtering (WHERE) / projection (SELECT), allowing better static analysis etc (including intellisense), and a more natural query comprehension syntax. This helps both the developer and the compiler, so I'm happy.
It is simpler to parse expression when initial data is provided in its beginning.
Because of this VS provides code completion even for partially written LINQ queries (great feature IMO).
The reason is the C# language designers use this approach because of when I first specify where the data is coming from, now Visual Studio and the C# compiler know what my data looks like. And I can have IntelliSense help in the rest of the query because Visual Studio will know that "city" (for example) is a string, and it has operations like startsWith and a property named length. And really, inside of a relational database like SQL Server, the select clause that you are writing in a SQL statement at the top is really one of the last pieces of the information the query engine has to figure out. Before that it has to figure out what table you are working against in the from clause even though the from clause comes later in SQL syntax
In just about any formally structured set of information, you start reading either from the start towards the end, or occasionally from the end towards the beginning (street addresses, for example.) But in SQL, especially SELECT queries, in order to properly understand its meaning you have to start in the middle, at the FROM clause. This can make long queries very difficult to read, especially if it contains nested SELECT queries.
Usually in programming, when something doesn't seem to make any sense, there's a historical reason behind it. Starting with the SELECT instead of the FROM doesn't make sense. Does anyone know the reason it's done that way?
I think the way in which a SQL statement is structured makes logical sense as far as English sentences are structured. Basically
I WANT THIS
FROM HERE
WHERE WHAT I WANT MEETS THESE CRITERIA
I don't think it makes much sense, In English at least, to say
FROM HERE
I WANT THIS
WHERE WHAT I WANT MEETS THESE CRITERIA
The SQL Wikipedia entry briefly describes some history:
During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, "A Relational Model of Data for Large Shared Data Banks". Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.
The original name explicitly mentioned English, explaining the syntax.
Digging a little deeper, we find the FLOW-MATIC programming language.
FLOW-MATIC, originally known as B-0 (Business Language version 0), is possibly the first English-like data processing language. It was invented and specified by Grace Hopper, and development of the commercial variant started at Remington Rand in 1955 for the UNIVAC I. By 1958, the compiler and its documentation were generally available and being used commercially.
FLOW-MATIC was the inspiration behind the Common Business Oriented Language, one of the oldest programming languages still in active use. Keeping with that spirit, SEQUEL was designed with English-like syntax (1970s is modern, compared with 1950s and 1960s).
In perspective, "modern" programming systems still access databases using the age old ideas behind
MULTIPLY PRICE BY QUANTITY GIVING COST.
I must disagree. SQL grammar is not inside-out.
From the very first look you can tell whether the query will SELECT, INSERT, UPDATE, or DELETE data (all the rest of SQL, e.g. DDL, omitted on purpose).
Back to your SELECT statement confusion: The aim of SQL is to be declarative. Which means you express WHAT you want and not HOW you want it. So it makes every sense to first state WHAT YOU WANT (list of attributes you're selecting) and then provide the DBMS with some additional info on where that should be looked up FROM.
Placing the WHERE clause at the end makes great sense too: Imagine a funnel, wide at the top, narrow at the bottom. By adding a WHERE clause towards the end of the statement, you are choking down the amount of resulting data. Applying restrictions to your query any place else than at the bottom would require the developer to turn their head around.
ORDER BY clause at the very end: once the data has gone through the funnel, sort it.
JOINS (JOIN criteria) really belong into the FROM clause.
GROUPING: basically running data through a funnel before it gets into another funnel.
SQL sytax is sweet. There's nothing inside out about it. Maybe that's why SQL is so popular even after so many decades. It's rather easy to grasp and to make sense out of. (Although I have once faced a 7-page (A4-size) SQL statement which took me quite a while to get my head around.)
It's designed to be English like. I think that's the primary reason.
As a side note, I remember the initial previews of LINQ were directly modeled after it (select ... from ...). This was changed in later previews to be more programming language like (so that the scope goes downwards). Anders Hejlsberg specifically mentioned this weird fact about SQL (which makes IntelliSense harder and doesn't match C# scope rules) as the reason they made this decision.
Anyhow, good or bad, it's what it is and it's too late to change anything.
The order of the clauses in SQL is absolutely logical. Remember that SQL is a declarative language, where you declare what you want and the system figures out how best to get it for you. The first clause is the select clause where you list the columns that you want in the result table. This is the primary purpose of the query. Having stated what you want the result to look like, you next state where the data should come from. The where clause limits the amount of data being returned. There is no point in thinking about how to limit your data unless you know where it comes from, so it goes after the from clause. The group by clause works with the aggregation operators in the select clause and could go anywhere after the from clause however it is better to think about aggregation on the filtered data, so it comes after the where clause. The having clause has to come after the group by clause. The order by clause is about how the data is presented and could go anywhere after the select.
It's consistent with the rest of SQL's syntax of having every statement start with a verb (CREATE, DROP, UPDATE, etc.).
The major disadvantage of having the column list first is that it's inconvenient for auto-complete (as Hejlsberg has mentioned), but this wasn't a concern when the syntax was designed in the 1970s.
We could have had the best of both worlds with a syntax like SELECT FROM SomeTable: ColumnA, ColumnB, but it's too late to change it now.
Anyhow, SQL's SELECT statement order isn't unique. It exactly matches that of Python list comprehensions:
[(rec.a, rec.b) for rec in data where rec.a > 0]
History of the language aside (although it is fascinating) I think the thing you are missing is that SQL isn't about telling the system what to do, so much as what end result you want (and it figures out how to do it)
saying 'go over there to that rack, pick up the hats with hatbands, blue hats first, then green, then red, and bring them to me' is very much telling the system how to do what you want. it's programmer think where we presume the worker is very stupid and needs minutely detailed instructions.
SQL is starting with the end result first, the data you want, the order of the columns, etc.. it's very much the perspective of someone who is building a report. "I want firstname, lastname, then age, then....." That is after all the purpose of making the request. So it starts with that, the format of the results you want. Then it goes into where you expect it to find the data, what criteria to look for, the order to present it, etc.
So as an alternative to specifying in minute detail what you want the worker to do, SQL presumes the system knows how to do that, and centers more on what you want.
So instead of pedantically telling your worker to go here, get this, bring it over there.. it's more like saying "I want hats, from rack 12, which have hatbands, and please sort them by color."
Sql is the standard in query languages, however it is sometime a bit verbose. I am currently writing limited query language that will make my common queries quicker to write and with a bit less mental overhead.
If you write a query over a good database schema, essentially you will be always joining over the primary key, foreign key fields so I think it should be unnecessary to have to state them each time.
So a query could look like.
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The sql that would be eqivilent to would be
select distinct s.name, r.description
from shop s
join region r on shop.region_id = region.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.sales.amount > 4000 and s.staff < 10
(the distinct is there as you are joining to a one to many table (monthly_sales) and you are not selecting off fields from that table)
I understand that original query above may be ambiguous for certain schemas i.e if there the two relationship routes between two of the tables. However there are ways around (most) of these especially if you limit the schema allowed. Most possible schema's are not worth considering anyway.
I was just wondering if there any attempts to do something like this?
(I have seen most orm solutions to making some queries easier)
EDIT: I actually really like sql. I have used orm solutions and looked at linq. The best I have seen so far is SQLalchemy (for python). However, as far as I have seen they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex (if not more so) than SQL. The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here..
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler plumbed a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.