What's the meaning of this sql statement (order by count(*))? - sql

What could be the meaning of this sql statement ?
select * from tab1 order by (select count(*) from tab2) desc

The below line just returns the number of rows in tab2, which is some constant number
select count(*) from tab2
Consider the columns numbered 1 through n where n is the last column.
select * from tab1 order by 1
would order by the first column
select * from tab1 order by 2
would order by the second column and etc.
If n is larger than the number of columns then you'll run into a problemEDIT
You are using a subquery however and having
select * from tbl1 order by (select 1000)
does not cause a problem if you have <1000 columns, it seems to do nothing; the query may be missing some information

The result is to order by the column whose index is the count returned by the inner query in the ORDER BY clause. Whoever wrote this, especially without a comment, should be hanged by body parts important for reproduction.

The answer is based in Microsoft SQL functionality [edit:] where subquery in ORDER BY (subquery) expression indicates sort value.
Here's how I see it: since tab2 is not linked to tab1 in a subquery, the SQL can be reduced to:
select * from tab1 order by (SELECT <CONSTANT>) desc
therefore it's equivalent to:
select * from tab1

Quite frankly all that query is going to do is return all records from tab1 in some unknown order.
The order by clause is a bit asinine because the value returned will always be the count of all records in tab2.
I suspect it's missing a where clause on the (select count(*) from tab2) part. Something along the lines of (select count(*) from tab2 t where t.tab1id = tab1.id) Although it's hard to say without knowing the structure of those two tables.

The ORDER BY is equivalent to ORDER BY 'X'; that is, it has no effect. It does not order by the column number referenced by the count(*) in the second query -- it is not equivalent to order by 3 if the second table has three rows.
See fiddles for Oracle, MySQL, and SQL Server. If the ORDER BY was based on the count(*), the result should then be sorted by the third column. None of them are. Also, a count(*)+100 has no effect.

Related

Why do partitions require nested selects?

I have a page to show 10 messages by each user (don't ask me why)
I have the following code:
SELECT *, row_number() over(partition by user_id) as row_num
FROM "posts"
WHERE row_num <= 10
It doesn't work.
When I do this:
SELECT *
FROM (
SELECT *, row_number() over(partition by user_id) as row_num FROM "posts") as T
WHERE row_num <= 10
It does work.
Why do I need nested query to see row_num column? Btw, in first request I actually see it in results but can't use where keyword for this column.
It seems to be the same "rule" as any query, column aliases aren't visible to the WHERE clause;
This will also fail;
SELECT id AS newid
FROM test
WHERE newid=1; -- must use "id" in WHERE clause
SQL Query like:
SELECT *
FROM table
WHERE <condition>
will execute in next order:
3.SELECT *
1.FROM table
2.WHERE <condition>
so, as Joachim Isaksson say, columns in SELECt clause are not visible in WHERE clause, because of processing order.
In your second query, column row_num are fetched in FROM clause first, so it will be visible in WHERE clause.
Here is simple list of steps in order they executes.
There is a good reason for this rule in standard SQL.
Consider the statement:
SELECT *, row_number() over (partition by user_id) as row_num
FROM "posts"
WHERE row_num <= 10 and p.type = 'xxx';
When does the p.type = 'xxx' get evaluated relative to the row number? In other words, would this return the first ten rows of "xxx"? Or would it return the "xxx"s in the first ten rows?
The designers of the SQL language recognize that this is a hard problem to resolve. Only allowing them in the select clause resolves the issue.
You can check this topic and this one on dba.stockexchange.com about order in which SQL executes SELECT clause. I think it aplies not only for PostgreSQL, but for all RDBMS.

Using distinct on a column and doing order by on another column gives an error

I have a table:
abc_test with columns n_num, k_str.
This query doesnt work:
select distinct(n_num) from abc_test order by(k_str)
But this one works:
select n_num from abc_test order by(k_str)
How do DISTINCT and ORDER BY keywords work internally that output of both the queries is changed?
As far as i understood from your question .
distinct :- means select a distinct(all selected values should be unique).
order By :- simply means to order the selected rows as per your requirement .
The problem in your first query is
For example :
I have a table
ID name
01 a
02 b
03 c
04 d
04 a
now the query select distinct(ID) from table order by (name) is confused which record it should take for ID - 04 (since two values are there,d and a in Name column). So the problem for the DB engine is here when you say
order by (name).........
You might think about using group by instead:
select n_num
from abc_test
group by n_num
order by min(k_str)
The first query is impossible.
Lets explain this by example. we have this test:
n_num k_str
2 a
2 c
1 b
select distinct (n_num) from abc_test is
2
1
Select n_num from abc_test order by k_str is
2
1
2
What do you want to return
select distinct (n_num) from abc_test order by k_str?
it should return only 1 and 2, but how to order them?
How do extended sort key columns
The logical order of operations in SQL for your first query, is (simplified):
FROM abc_test
SELECT n_num, k_str i.e. add a so called extended sort key column
ORDER BY k_str DESC
SELECT n_num i.e. remove the extended sort key column again from the result.
Thanks to the SQL standard extended sort key column feature, it is possible to order by something that is not in the SELECT clause, because it is being temporarily added to it behind the scenes prior to ordering, and then removed again after ordering.
So, why doesn't this work with DISTINCT?
If we add the DISTINCT operation, it would need to be added between SELECT and ORDER BY:
FROM abc_test
SELECT n_num, k_str i.e. add a so called extended sort key column
DISTINCT
ORDER BY k_str DESC
SELECT n_num i.e. remove the extended sort key column again from the result.
But now, with the extended sort key column k_str, the semantics of the DISTINCT operation has been changed, so the result will no longer be the same. This is not what we want, so both the SQL standard, and all reasonable databases forbid this usage.
Workarounds
PostgreSQL has the DISTINCT ON syntax, which can be used here for precisely this job:
SELECT DISTINCT ON (k_str) n_num
FROM abc_test
ORDER BY k_str DESC
It can be emulated with standard syntax as follows, if you're not using PostgreSQL
SELECT n_num
FROM (
SELECT n_num, MIN(k_str) AS k_str
FROM abc_test
GROUP BY n_num
) t
ORDER BY k_str
Or, just simply (in this case)
SELECT n_num, MIN(k_str) AS k_str
FROM abc_test
GROUP BY n_num
ORDER BY k_str
I have blogged about SQL DISTINCT and ORDER BY more in detail here.
You are selecting the collection distinct(n_num) from the resultset from your query. So there is no actual relation with the column k_str anymore. A n_num can be from two rows each having a different value for k_str. So you can't order the collection distinct(n_num) by k_str.
According to SQL Standards, a SELECT clause may refer either to as clauses ("aliases") in the top level SELECT clause or columns of the resultset by ordinal position, and therefore nether of your queries would be compliant.
It seems Oracle, in common with other SQL implemetations, allows you to refer to columns that existed (logically) immediately prior to being projected away in the SELECT clause. I'm not sure whether such flexibility is such a good thing: IMO it is good practice to expose the sort order to the calling application by including the column/expressions etc in the SELECT clause.
As ever, you need to apply dsicpline to get meaningful results. For your first query, the definition of order is potentially entirely arbitrary.You should be grateful for the error ;)
This approach is available in SQL server 2000, you can select distinct values from a table and order by different column which is not included in Distinct.
But in SQL 2012 this will through you an error
"ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
So, still if you want to use the same feature as of SQL 2000 you can use the column number for ordering(its not recommended in best practice).
select distinct(n_num) from abc_test order by 1
This will order the first column after fetching the result. If you want the ordering should be done based on different column other than distinct then you have to add that column also in select statement and use column number to order by.
select distinct(n_num), k_str from abc_test order by 2
When I got same error, I got it resolved by changing it as
SELECT n_num
FROM(
SELECT DISTINCT(n_num) AS n_num, k_str
FROM abc_test
) as tbl
ORDER BY tbl.k_str
My query doesn't match yours exactly, but it's pretty close.
select distinct a.character_01 , (select top 1 b.sort_order from LookupData b where a.character_01 = b.character_01 )
from LookupData a
where
Dataset_Name = 'Sample' and status = 200
order by 2, 1
did you try this?
SELECT DISTINCT n_num as iResult
FROM abc_test
ORDER BY iResult
you can do
select distinct top 10000 (n_num) --assuming you won't have more than 10,000 rows
from abc_test order by(k_str)

Select last n rows without use of order by clause

I want to fetch the last n rows from a table in a Postgres database. I don't want to use an ORDER BY clause as I want to have a generic query. Anyone has any suggestions?
A single query will be appreciated as I don't want to use FETCH cursor of Postgres.
That you get what you expect with Lukas' solution (as of Nov. 1st, 2011) is pure luck. There is no "natural order" in an RDBMS by definition. You depend on implementation details that could change with a new release without notice. Or a dump / restore could change that order. It can even change out of the blue when db statistics change and the query planner chooses a different plan that leads to a different order of rows.
The proper way to get the "last n" rows is to have a timestamp or sequence column and ORDER BY that column. Every RDBMS you can think of has ORDER BY, so this is as 'generic' as it gets.
The manual:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
Lukas' solution is fine to avoid LIMIT, which is implemented differently in various RDBMS (for instance, SQL Server uses TOP n instead of LIMIT), but you need ORDER BY in any case.
Use window functions!
select t2.* from (
select t1.*, row_number() over() as r, count(*) over() as c
from (
-- your generic select here
) t1
) t2
where t2.r + :n > t2.c
In the above example, t2.r is the row number of every row, t2.c is the total records in your generic select. And :n will be the n last rows that you want to fetch. This also works when you sort your generic select.
EDIT: A bit less generic from my previous example:
select * from (
select my_table.*, row_number() over() as r, count(*) over() as c
from my_table
-- optionally, you could sort your select here
-- order by my_table.a, my_table.b
) t
where t.r + :n > t.c

Clarification on "rownum"

I have a table Table1
Name Date
A 01-jun-2010
B 03-dec-2010
C 12-may-2010
When i query this table with the following query
select * from table1 where rownum=1
i got output as
Name Date
A 01-jun-2010
But in the same way when i use the following queries i do not get any output.
select * from table1 where rownum=2
select * from table1 where rownum=3
Someone please give me guidance why it works like that, and how to use the rownum.
Tom has an answer for many Oracle related questions
In short, rownum is available after the where clause has been applied and before the order by clause is applied.
In the case of RowNum=2, the predicate in the where clause will never evaluate to true as RowNum starts at 1 and only increases if records matching the predicate can be found.
Adding rownums is one of the last things done after the result set has been fetched from the database. This means that the first row will always have rownum 1. Rownum is better used when you want to limit the result set, for instance when doing paging.
See this for more: http://www.orafaq.com/wiki/ROWNUM
(Not an Oracle expert by any means)
From what I understand, rownum numbers the rows in a result set.
So, in your example:
select * from table1 where rownum=2
How many rows are there going to be in the result set? Therefore, what rownum would be assigned to such a row? Can you see now why no result is actually returned?
In general, you should avoid relying on rownum, or any features that imply an order to results. Try to think about working with the entire set of results.
With that being said, I believe the following would work:
select * from (select rownum as rn,table1.* from table1) as t where t.rn = 2
Because in that case, you're numbering the rows within the subquery.

Are the results deterministic, if I partition SQL SELECT query without ORDER BY?

I have SQL SELECT query which returns a lot of rows, and I have to split it into several partitions. Ie, set max results to 10000 and iterate the rows calling the query select time with increasing first result (0, 10000, 20000). All the queries are done in same transaction, and data that my queries are fetching is not changing during the process (other data in those tables can change, though).
Is it ok to use just plain select:
select a from b where...
Or do I have to use order by with the select:
select a from b where ... order by c
In order to be sure that I will get all the rows? In other word, is it guaranteed that query without order by will always return the rows in the same order?
Adding order by to the query drops performance of the query dramatically.
I'm using Oracle, if that matters.
EDIT: Unfortunately I cannot take advantage of scrollable cursor.
Order is definitely not guaranteed without an order by clause, but whether or not your results will be deterministic (aside from the order) would depend on the where clause. For example, if you have a unique ID column and your where clause included a different filter range each time you access it, then you would have non-ordered deterministic results, i.e.:
select a from b where ID between 1 and 100
select a from b where ID between 101 and 200
select a from b where ID between 201 and 300
would all return distinct result sets, but order would not be any way guaranteed.
No, without order by it is not guaranteed that query will ALWAYS return the rows in the same order.
No guarantees unless you have an order by on the outermost query.
Bad SQL Server example, but same rules apply. Not guaranteed order even with inner query
SELECT
*
FROM
(
SELECT
*
FROM
Mytable
ORDER BY SomeCol
) foo
Use Limit
So you would do:
SELECT * FROM table ORDER BY id LIMIT 0,100
SELECT * FROM table ORDER BY id LIMIT 101,100
SELECT * FROM table ORDER BY id LIMIT 201,100
The LIMIT would be from which position you want to start and the second variable would be how many results you want to see.
Its a good pagnation trick.