"ORDER BY" in subquery - not avaliable in MonetDB? - sql

I found that, when using order-by directly, it is ok.
SELECT t0."D" AS fd,
SUM(t0."SD") AS top
FROM "mock_table_1" AS t0
GROUP BY t0."D"
ORDER BY top ASC
LIMIT 10
but when using it in a subquery, an syntax error is reported.
SELECT * FROM (
SELECT t0."D" AS fd,
SUM(t0."SD") AS top
FROM "mock_table_1" AS t0
GROUP BY t0."D"
ORDER BY top ASC
LIMIT 10
)
here is the error message.
syntax error, unexpected ORDER, expecting UNION or EXCEPT or INTERSECT or ')' in: "select t0."A" as d0,
So, I wonder if monetdb is designed to be like this, or it is a bug?

that is the expected behavior. offset, limit, and order by are not allowed in subqueries
https://www.monetdb.org/pipermail/users-list/2013-October/006856.html

SQL-conforming DBMSes are not supposed to allow ORDER BY in subqueries, because it contradicts the conceptual model of a relational DBMS. See:
Is order by clause allowed in a subquery
for details. A way around that, however, is to use Window Functions, which MonetDB does support. Specifically, in your subquery, instead of, say,
SELECT c1 FROM t1;
you can
SELECT c1, ROW_NUMBER() OVER () as rownum from t1;
and now you have the relative order of the inner query result available to the outer query.

Related

Can't use ORDER BY in a derived table

I am trying to select the last 20 rows of my SQL Database, but I get this error:
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
My query is:
SELECT TOP 20 * FROM (SELECT * FROM TBArticles ORDER BY id_art DESC)
I think it's because I am using ORDER BY in this second expression... but what can I do to select the 20 last rows fixing this error?
You don't need a subquery for this:
SELECT TOP 20 *
FROM TBArticles
ORDER BY id_art DESC
The documentation is quite clear on the use of ORDER BY in subqueries:
The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries, unless either the TOP or OFFSET and FETCH clauses are also specified. When ORDER BY is used in these objects, the clause is used only to determine the rows returned by the TOP clause or OFFSET and FETCH clauses. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.
Gordon's answer is probably the most direct way to handle your requirement. However, if you wanted to use a query along the same lines as the pattern you were already using, you could use ROW_NUMBER here:
SELECT *
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY id_art DESC) rn
FROM TBArticles
) t
WHERE rn <= 20;
By computing a row number in the derived table, the ordering "sticks" in the same way your original query was expecting.

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:
ROW_NUMBER() OVER()
This is particularly useful when used with ordered derived tables, such as:
SELECT t.*, ROW_NUMBER() OVER()
FROM (
SELECT ...
ORDER BY
) t
How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it will behave non-deterministically with respect to the order from the derived table:
-- This order here ---------------------vvvvvvvv
SELECT t.*, ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (
SELECT TOP 100 PERCENT ...
-- vvvvv ----redefines this order here
ORDER BY
) t
A concrete example (as can be seen on SQLFiddle):
SELECT v, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM (
SELECT TOP 100 PERCENT 1 UNION ALL
SELECT TOP 100 PERCENT 2 UNION ALL
SELECT TOP 100 PERCENT 3 UNION ALL
SELECT TOP 100 PERCENT 4
-- This descending order is not maintained in the outer query
ORDER BY 1 DESC
) t(v)
Also, I cannot reuse any expression from the derived table to reproduce the ORDER BY clause in my case, as the derived table might not be available as it may be provided by some external logic.
So how can I do it? Can I do it at all?
The Row_Number() OVER (ORDER BY (SELECT 1)) trick should NOT be seen as a way to avoid changing the order of underlying data. It is only a means to avoid causing the server to perform an additional and unneeded sort (it may still perform the sort but it's going to cost the minimum amount possible when compared to sorting by a column).
All queries in SQL server ABSOLUTELY MUST have an ORDER BY clause in the outermost query for the results to be reliably ordered in a guaranteed way.
The concept of "retaining original order" does not exist in relational databases. Tables and queries must always be considered unordered until and unless an ORDER BY clause is specified in the outermost query.
You could try the same unordered query 100,000 times and always receive it with the same ordering, and thus come to believe you can rely on said ordering. But that would be a mistake, because one day, something will change and it will not have the order you expect. One example is when a database is upgraded to a new version of SQL Server--this has caused many a query to change its ordering. But it doesn't have to be that big a change. Something as little as adding or removing an index can cause differences. And more: Installing a service pack. Partitioning a table. Creating an indexed view that includes the table in question. Reaching some tipping point where a scan is chosen instead of a seek. And so on.
Do not rely on results to be ordered unless you have said "Server, ORDER BY".

Why do partitions require nested selects?

I have a page to show 10 messages by each user (don't ask me why)
I have the following code:
SELECT *, row_number() over(partition by user_id) as row_num
FROM "posts"
WHERE row_num <= 10
It doesn't work.
When I do this:
SELECT *
FROM (
SELECT *, row_number() over(partition by user_id) as row_num FROM "posts") as T
WHERE row_num <= 10
It does work.
Why do I need nested query to see row_num column? Btw, in first request I actually see it in results but can't use where keyword for this column.
It seems to be the same "rule" as any query, column aliases aren't visible to the WHERE clause;
This will also fail;
SELECT id AS newid
FROM test
WHERE newid=1; -- must use "id" in WHERE clause
SQL Query like:
SELECT *
FROM table
WHERE <condition>
will execute in next order:
3.SELECT *
1.FROM table
2.WHERE <condition>
so, as Joachim Isaksson say, columns in SELECt clause are not visible in WHERE clause, because of processing order.
In your second query, column row_num are fetched in FROM clause first, so it will be visible in WHERE clause.
Here is simple list of steps in order they executes.
There is a good reason for this rule in standard SQL.
Consider the statement:
SELECT *, row_number() over (partition by user_id) as row_num
FROM "posts"
WHERE row_num <= 10 and p.type = 'xxx';
When does the p.type = 'xxx' get evaluated relative to the row number? In other words, would this return the first ten rows of "xxx"? Or would it return the "xxx"s in the first ten rows?
The designers of the SQL language recognize that this is a hard problem to resolve. Only allowing them in the select clause resolves the issue.
You can check this topic and this one on dba.stockexchange.com about order in which SQL executes SELECT clause. I think it aplies not only for PostgreSQL, but for all RDBMS.

SQL Server ORDER BY clause in subquery

I am facing a strange error in SQL Server and I want some explanation of it.
When I write ORDER BY in a subquery, for instance
SELECT a FROM (SELECT * FROM A ORDER BY a) T
it throws the following error
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
But when I use TOP in subquery it works normally
SELECT a
FROM
(SELECT TOP 1000000 * FROM A ORDER BY a) T
So, does it mean that I can select top row count of A, instead of
SELECT a FROM (SELECT * FROM A ORDER BY a) T
In that case. what is the reason of error?
There is no much sense to sort the subquery and after that select something from it - it is not guaranteed that top-level select will be ordered, so - there is no sense to order the inner query
But if you order inner query with TOP statement - it also not guaranteed that top level select will be ordered in such a way, but it will contain only top X rows from the inner query - that is already makes sense.

Select last n rows without use of order by clause

I want to fetch the last n rows from a table in a Postgres database. I don't want to use an ORDER BY clause as I want to have a generic query. Anyone has any suggestions?
A single query will be appreciated as I don't want to use FETCH cursor of Postgres.
That you get what you expect with Lukas' solution (as of Nov. 1st, 2011) is pure luck. There is no "natural order" in an RDBMS by definition. You depend on implementation details that could change with a new release without notice. Or a dump / restore could change that order. It can even change out of the blue when db statistics change and the query planner chooses a different plan that leads to a different order of rows.
The proper way to get the "last n" rows is to have a timestamp or sequence column and ORDER BY that column. Every RDBMS you can think of has ORDER BY, so this is as 'generic' as it gets.
The manual:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
Lukas' solution is fine to avoid LIMIT, which is implemented differently in various RDBMS (for instance, SQL Server uses TOP n instead of LIMIT), but you need ORDER BY in any case.
Use window functions!
select t2.* from (
select t1.*, row_number() over() as r, count(*) over() as c
from (
-- your generic select here
) t1
) t2
where t2.r + :n > t2.c
In the above example, t2.r is the row number of every row, t2.c is the total records in your generic select. And :n will be the n last rows that you want to fetch. This also works when you sort your generic select.
EDIT: A bit less generic from my previous example:
select * from (
select my_table.*, row_number() over() as r, count(*) over() as c
from my_table
-- optionally, you could sort your select here
-- order by my_table.a, my_table.b
) t
where t.r + :n > t.c