Is order in a subquery guaranteed to be preserved? - sql

I am wondering in particular about PostgreSQL. Given the following contrived example:
SELECT name FROM
(SELECT name FROM people WHERE age >= 18 ORDER BY age DESC) p
LIMIT 10
Are the names returned from the outer query guaranteed to be in the order they were for the inner query?

No, put the order by in the outer query:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC
LIMIT 10
The inner (sub) query returns a result set. If you put the ORDER BY there, the intermediate result set passed from the inner query to the outer query is ordered the way you designate. Without an ORDER BY in the outer query, however, the final result set produced from that intermediate result is not guaranteed to be sorted in any way.
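A minimal sketch of this fix, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, made only so the example runs on its own; the sample rows are made up). The ORDER BY sits in the outer query, so the order of the final result is explicit:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 34), ("Bob", 17), ("Cid", 52), ("Dee", 18), ("Eve", 41)],
)

# The subquery has to expose age so that the outer query can sort by it.
query = """
    SELECT name FROM
      (SELECT name, age FROM people WHERE age >= 18) p
    ORDER BY p.age DESC
    LIMIT 3
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # the three oldest adults, oldest first
```

Note that the subquery selects age as well as name; otherwise the outer ORDER BY would have nothing to sort on.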

For simple cases, @Charles' query is most efficient.
More generally, you can use the window function row_number() to carry any order you like to the main query, including:
order by columns not in the SELECT list of the subquery and thus not reproducible
arbitrary ordering of peers according to ORDER BY criteria. Postgres will reuse the same arbitrary order in the window function within the subquery. (But not truly random order from random() for instance!)
If you don't want to preserve arbitrary sort order of peers from the subquery, use rank() instead.
This may also be generally superior with complex queries or multiple query layers:
SELECT p.name
FROM (
SELECT name, row_number() OVER (ORDER BY <same order by criteria>) AS rn
FROM people
WHERE age >= 18
ORDER BY <any order by criteria>
) p
ORDER BY p.rn
LIMIT 10;
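Here is a small runnable sketch of the row_number() idea, again with Python's sqlite3 module standing in for Postgres (an assumption; it needs SQLite 3.25 or later for window functions, and the sample rows are invented). The outer query never sees age, yet it can still reproduce the age order through rn:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 34), ("Bob", 17), ("Cid", 52), ("Dee", 18), ("Eve", 41)],
)

# rn records each row's position in the desired order inside the subquery,
# so the outer query can sort on rn without having access to age.
query = """
    SELECT p.name
    FROM (
        SELECT name, row_number() OVER (ORDER BY age DESC) AS rn
        FROM people
        WHERE age >= 18
    ) p
    ORDER BY p.rn
    LIMIT 3
"""
ordered_names = [row[0] for row in conn.execute(query)]
print(ordered_names)
```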

They are not guaranteed to be in the same order, though when you run it you might see that the result generally follows that order.
You should place the ORDER BY on the main query:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC LIMIT 10

Related

Select Distinct on one column, without ordering by that column

I'm trying to select only the IDs of a table that I'm querying on, and still be able to specify ordering on other columns.
First I tried simply doing:
SELECT DISTINCT countries.id
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC
That won't work, because for SELECT DISTINCT, ORDER BY expressions must appear in select list, and returns an error.
If I add province_infos.population and country_infos.population, it works, but I then get duplicate IDs, which I cannot have.
To resolve this, I attempted using DISTINCT ON():
SELECT DISTINCT ON (countries.id)
countries.id, country_infos.population, province_infos.population
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC
That then gives me the error SELECT DISTINCT ON expressions must match initial ORDER BY expressions. I can't SELECT DISTINCT ON a column without ordering it too.
It seems the only way for this to work, is to do something like:
SELECT DISTINCT ON (countries.id)
countries.id
FROM countries
...
ORDER BY countries.id DESC, province_infos.population DESC, country_infos.population ASC
I unfortunately can't do this, since I cannot order by IDs, as it skews the results of the other orders. And it seems the only way to not order by the IDs, is if I remove the DISTINCT from the select, but then I'll get duplicates.
Anyone know how I can work around this?
EDIT:
The ... I omitted shouldn't be relevant, but in case you want to see:
JOIN country_infos ON country_infos.country_refer = countries.id
JOIN languages ON languages.country_refer = countries.id
JOIN provinces ON provinces.country_refer = countries.id
JOIN province_infos ON province_infos.province_refer = provinces.id
WHERE country_infos.population > 10.3
AND languages.alphabet = 'Latin'
And I'm not just trying to get this working for this specific query. This is just an example I'm using to explain the predicament. I'm generating these kinds of queries automatically off of an arbitrary data structure.
The general answer to your question is that when you use DISTINCT ON (x, ...) in a SELECT statement in PostgreSQL, the database sorts by the values in the DISTINCT ON clause in order to make it easy to tell whether rows have distinct values. Once the rows are ordered by those values, the database needs only one pass to remove duplicates, and it only has to compare adjacent rows. Because of this, the database forces you to sort by the same columns that appear in the DISTINCT ON clause.
You can work around this by making your original query a subquery, like so:
SELECT t.id FROM
(SELECT DISTINCT ON (countries.id) countries.id
, province_infos.population
, country_infos.founding_date
FROM countries
...
ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date ASC
) t
ORDER BY t.population DESC, t.founding_date ASC
Use GROUP BY, something like this:
SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY MAX(pi.population) DESC, MAX(ci.population) ASC;
Actually, given the nature of your problem, you might want SUM():
SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY SUM(pi.population) DESC, SUM(ci.population) ASC;
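A rough, self-contained sketch of the GROUP BY approach, using Python's sqlite3 module and a deliberately simplified, hypothetical schema (only countries and provinces, with the population stored directly on provinces; the real query joins more tables). Grouping removes the duplicate ids, and the aggregate gives ORDER BY a single value per country:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
conn.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE provinces (country_refer INTEGER, population INTEGER)")
conn.executemany("INSERT INTO countries VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO provinces VALUES (?, ?)",
    [(1, 5), (1, 50), (2, 70), (2, 10), (3, 20)],
)

# One output row per country, ordered by its largest province population.
query = """
    SELECT c.id
    FROM countries c
    JOIN provinces p ON p.country_refer = c.id
    GROUP BY c.id
    ORDER BY MAX(p.population) DESC
"""
country_ids = [row[0] for row in conn.execute(query)]
print(country_ids)
```

Whether MAX() or SUM() is the right aggregate depends on what "ordered by population" should mean for a country with several provinces.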

Order by DESC reverse result

I'm retrieving some data in SQL, order by DESC. I then want to reverse the result. I was doing this by pushing the data into an array and then using array_reverse, but I am finding it's quite taxing on CPU time and would like to simply use the correct SQL query.
I've looked at this thread SQL Server reverse order after using desc, but I cannot seem to make it work with my query.
SELECT live.message,
live.sender,
live.sdate,
users.online
FROM live, users
WHERE users.username = live.sender
ORDER BY live.id DESC
LIMIT 15
You can place your query into a subquery and then reverse the order:
SELECT t.message,
t.sender,
t.sdate,
t.online
FROM
(
SELECT live.id,
live.message,
live.sender,
live.sdate,
users.online
FROM live
INNER JOIN users
ON users.username = live.sender
ORDER BY live.id DESC
LIMIT 15
) t
ORDER BY t.id ASC
You'll notice that I replaced your implicit JOIN with an explicit INNER JOIN. It is generally considered undesirable to use commas in the FROM clause (q.v. the ANSI-92 standard) because it makes the query harder to read.
You could wrap your query with another query and order by with asc. Since you want to order by live.id, you must include it in the inner query so the outer one can sort by it:
SELECT message, sender, sdate, online
FROM (SELECT live.message, live.sender, live.sdate, users.online, live.id
FROM live, users
WHERE users.username = live.sender
ORDER BY live.id DESC
LIMIT 15) t
ORDER BY id ASC
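To see the pattern in isolation, here is a minimal sketch using Python's sqlite3 module with a made-up, cut-down live table (only id and message; the join to users is left out). The inner query picks the newest rows, and the outer query puts them back in ascending order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the live table.
conn.execute("CREATE TABLE live (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO live VALUES (?, ?)",
    [(i, "message %d" % i) for i in range(1, 21)],
)

# Inner query: the 5 rows with the highest ids.
# Outer query: the same 5 rows, re-sorted oldest first.
query = """
    SELECT t.id, t.message
    FROM (
        SELECT id, message
        FROM live
        ORDER BY id DESC
        LIMIT 5
    ) t
    ORDER BY t.id ASC
"""
latest_ids = [row[0] for row in conn.execute(query)]
print(latest_ids)
```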

Ambiguous column name using row_number() without alias

I'm trying to implement pagination in a query that is built using information from a view, and I need to use the row_number() function over a column when I don't know which table it is from.
SELECT * FROM (
SELECT class.ID as ID, user.ID as USERID, row_number() over (ORDER BY
ID desc) as row_number FROM class, user
) out_q WHERE row_number > @startrow ORDER BY row_number
The problem is that I only have the result column name (ID or USERID) that came from a previous query. If I execute this query, it will raise the error 'Ambiguous column name "ID"'. Is there a way to specify that I'm referencing the column ID that is being selected and not from a different table?
Is it possible to specify an alias to the query result itself?
I have already tried the following,
SELECT TOP 30 * FROM (
SELECT *, row_number() over (ORDER BY ID desc) as row_number FROM(
SELECT class.ID as ID, user.ID as USERID FROM class, user
) in_q
) out_q WHERE row_number > @startrow ORDER BY row_number
It works, but the DBMS gets confused about which query plan to use, because of the small row goal in the outer query and the large result set returned by the inner query. When @startrow is a small number, the query executes in less than one second; when it is a big number, the query takes minutes to execute.
Your problem is the id in the row_number itself. If you want a stable sort, then include both ids:
SELECT *
FROM (SELECT class.ID as ID, user.ID as USERID,
row_number() over (ORDER BY class.ID desc, user.id) as row_number
FROM class CROSS JOIN user
) out_q
WHERE row_number > @startrow
ORDER BY row_number;
I assume the cartesian product is intentional. Sometimes, this indicates an error in the query. In general, I would advise you to avoid using commas in the from clause. If you do want a cartesian product, then be explicit by using CROSS JOIN.
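A small sketch of why the tie-breaker matters, written with Python's sqlite3 module rather than SQL Server (an assumption; the table names classes and users and the rows are hypothetical, and SQLite 3.25 or later is needed for window functions). Sorting on both ids makes every row number unambiguous, so each page always contains the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the class and user tables.
conn.execute("CREATE TABLE classes (id INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.executemany("INSERT INTO classes VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO users VALUES (?)", [(10,), (20,)])

# Qualifying both columns avoids the ambiguity, and the second sort key
# breaks ties between rows that share the same class id.
query = """
    SELECT id, userid, rn
    FROM (SELECT c.id AS id, u.id AS userid,
                 row_number() OVER (ORDER BY c.id DESC, u.id) AS rn
          FROM classes c CROSS JOIN users u
         ) out_q
    WHERE rn > ?
    ORDER BY rn
"""
start_row = 1
page = conn.execute(query, (start_row,)).fetchall()
print(page)
```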
You could try using the option you already tried, then use the OPTIMIZE FOR hint.
OPTION ( OPTIMIZE FOR (@startrow = 100000) );
See a description of the hint in MSDN docs here: https://msdn.microsoft.com/en-us/library/ms181714.aspx.

Order by not working in Oracle subquery

I'm trying to return 7 events from a table, from today's date, and have them in date order:
SELECT ID
FROM table
where ID in (select ID from table
where DATEFIELD >= trunc(sysdate)
order by DATEFIELD ASC)
and rownum <= 7
If I remove the 'order by' it returns the IDs just fine and the query works, but it's not in the right order. Would appreciate any help with this since I can't seem to figure out what I'm doing wrong!
(edit) for clarification, I was using this before, and the order returned was really out:
select ID
from TABLE
where DATEFIELD >= trunc(sysdate)
and rownum <= 7
order by DATEFIELD
Thanks
The values for the ROWNUM "function" are applied before the ORDER BY is processed. That's why it doesn't work the way you used it (see the manual for a similar explanation).
When limiting a query using ROWNUM and an ORDER BY is involved, the ordering must be done in an inner select and the limit must be applied in the outer select:
select *
from (
select *
from table
where datefield >= trunc(sysdate)
order by datefield ASC
)
where rownum <= 7
You cannot use order by in where id in (select id from ...) kind of subquery. It wouldn't make sense anyway. This condition only checks if id is in subquery. If it affects the order of output, it's only incidental. With different data query execution plan might be different and output order would be different as well. Use explicit order by at the end of the main query.
It is a well-known 'feature' of Oracle that rownum doesn't play nice with order by. See http://www.adp-gmbh.ch/ora/sql/examples/first_rows.html for more information. In your case you should use something like:
SELECT ID
FROM (select ID, row_number() over (order by DATEFIELD ) r
from table
where DATEFIELD >= trunc(sysdate))
WHERE r <= 7
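For illustration, a runnable sketch of the same row_number() pattern using Python's sqlite3 module instead of Oracle (an assumption; SQLite has no ROWNUM or sysdate, so a fixed cutoff date is passed as a parameter, and the events table and its rows are invented). The rows are numbered in date order first, and only then is the limit applied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical events table; ISO date strings sort correctly as text.
conn.execute("CREATE TABLE events (id INTEGER, datefield TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-03-05"), (2, "2024-03-01"), (3, "2024-02-20"),
     (4, "2024-03-03"), (5, "2024-03-09"), (6, "2024-03-02")],
)

# Number the qualifying rows by date, then keep the first 3 of them.
# The outer ORDER BY makes the final output order explicit.
query = """
    SELECT id
    FROM (SELECT id, row_number() OVER (ORDER BY datefield) AS r
          FROM events
          WHERE datefield >= ?) numbered
    WHERE r <= 3
    ORDER BY r
"""
next_event_ids = [row[0] for row in conn.execute(query, ("2024-03-01",))]
print(next_event_ids)
```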
See also:
http://www.orafaq.com/faq/how_does_one_select_the_top_n_rows_from_a_table
http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
http://asktom.oracle.com/pls/asktom/f?p=100:11:507524690399301::::P11_QUESTION_ID:127412348064
See also other similar questions on SO, eg.:
Oracle SELECT TOP 10 records
Oracle/SQL - Select specified range of sequential records
Your outer query can't "see" the ORDER BY in the inner query, and in this case ordering the inner query doesn't make sense anyway: the inner query is only used to build a subset of data for the WHERE clause of the outer one, so the order of that subset doesn't matter.
Maybe if you explain better what you want to do, we can help you.
ORDER BY clause in subqueries:
The ORDER BY clause is not allowed inside a subquery, with the exception of inline views. If you attempt to include an ORDER BY clause in any other subquery, you receive an error message.
An inline view is a query in the FROM clause.
SELECT t.*
FROM (SELECT id, name FROM student) t

Avoiding Correlated Subquery in Oracle

In Oracle 9.2.0.8, I need to return a record set where a particular field (LAB_SEQ) is at a maximum (it is a sequential VARCHAR array '0001', '0002', etc.) for each of another field (WO_NUM). To select the maximum, I am attempting to order in descending order and select the first row. Everything I can find on StackOverflow suggests that the only way to do this is with a correlated subquery. Then I use this maximum in the WHERE clause of the outer query to get the row I want for each WO_NUM:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT LAB_SEQ FROM (
SELECT lab.LAB_SEQ FROM LAB_TIM lab WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM ORDER BY ROWNUM DESC
) WHERE ROWNUM=1
)
However, this returns an invalid identifier error for lt.WO_NUM. Research suggests that Oracle 8 only allows correlated subqueries one level deep, and suggests rewriting to avoid the subquery - something which discussion of selecting maximums suggests can't be done. Any help getting this statement to execute would be greatly appreciated.
Your correlated subquery would need to be something like
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT max(lab.LAB_SEQ)
FROM LAB_TIM lab
WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM
)
Since you are on Oracle 9.2, it will probably be more efficient to use a correlated subquery. I'm not sure what the predicates lab.CCN='1' AND MAS_LOC='1' are doing in your current query so I'm not quite sure how to translate them into the analytic function approach. Is the combination of LAB_SEQ and WO_NUM not unique in LAB_TIM? Do you need to add in the predicates on CCN and MAS_LOC in order to get a single unique row for every WO_NUM? Or are you using those predicates to decrease the number of rows in your output? The basic approach will be something like
SELECT *
FROM (SELECT lt.WO_NUM,
lt.EMP_NUM,
lt.LAB_END_DATE,
lt.LAB_END_TIME,
rank() over (partition by wo_num
order by lab_seq desc) rnk
FROM LAB_TIM lt)
WHERE rnk = 1
but it's not clear to me whether CCN and MAS_LOC need to be added to the ORDER BY clause in the analytic function or whether they need to be added to the WHERE clause.
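As a rough side-by-side of the two approaches, here is a sketch using Python's sqlite3 module in place of Oracle (an assumption; the lab_tim rows are invented, the CCN and MAS_LOC predicates are left out because their role is unclear, and SQLite 3.25 or later is needed for rank()). On this sample data both queries return the row with the highest LAB_SEQ for each WO_NUM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of LAB_TIM.
conn.execute("CREATE TABLE lab_tim (wo_num TEXT, lab_seq TEXT, emp_num TEXT)")
conn.executemany(
    "INSERT INTO lab_tim VALUES (?, ?, ?)",
    [("A", "0001", "e1"), ("A", "0002", "e2"),
     ("B", "0001", "e3"), ("B", "0003", "e4"), ("B", "0002", "e5")],
)

# Analytic approach: rank rows within each work order, keep rank 1.
analytic_query = """
    SELECT wo_num, emp_num
    FROM (SELECT wo_num, emp_num,
                 rank() OVER (PARTITION BY wo_num
                              ORDER BY lab_seq DESC) AS rnk
          FROM lab_tim) ranked
    WHERE rnk = 1
    ORDER BY wo_num
"""

# Correlated-subquery approach: compare each row with its group's maximum.
correlated_query = """
    SELECT lt.wo_num, lt.emp_num
    FROM lab_tim lt
    WHERE lt.lab_seq = (SELECT max(lab.lab_seq)
                        FROM lab_tim lab
                        WHERE lab.wo_num = lt.wo_num)
    ORDER BY lt.wo_num
"""
analytic_rows = conn.execute(analytic_query).fetchall()
correlated_rows = conn.execute(correlated_query).fetchall()
print(analytic_rows, correlated_rows)
```

Because LAB_SEQ is a zero-padded string, comparing it as text gives the same order as comparing the numbers it represents.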
This is one case where a correlated subquery is better, particularly if you have indexes on the table. However, it should be possible to rewrite correlated subqueries as joins.
I think the following is equivalent, without the correlated subquery:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM (select lt.*, rownum as r
from LAB_TIM lt
) lt join
(select wo_num, max(r) as maxrownum
from (select LAB_SEQ, wo_num, rownum as r
from LAB_TIM lt
where lt.CCN = '1' AND lt.MAS_LOC = '1'
)
group by wo_num
) ltsum
on lt.wo_num = ltsum.wo_num and
lt.r = ltsum.maxrownum
I'm a little unsure about how Oracle works with rownums in things like ORDER BY.