ORDER BY before SELECT? - sql

I'm just trying to add the newest estimated return date for a car as a subquery in my SELECT statement, and I wanted to know whether this is how it is done.
I think I heard that SELECT happens before ORDER BY, so I wanted to do a quick check.
select top 1 ESTIMATE_RETURN_DATE
from CHECK.AOS
where AOS.AUTO_NO = CHECK_EVENT.AUTONUM_4
ORDER BY REVISION_NO desc

ORDER BY is applied before TOP restricts the result, so the ordering determines which row is returned.
With ORDER BY REVISION_NO DESC, TOP 1 returns the row with the highest REVISION_NO, so it appears you are using the query correctly.

I was pretty sure, but this explains it: http://use-the-index-luke.com/sql/partial-results/top-n-queries
Conceptually, it selects everything, sorts it, and then stops once it has returned the requested number of rows.

What you have will work. I think of it like this:
"SQL" needs to get the data before it can sort (order) the data.
This is a great resource:
http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/
I am not affiliated with that website at all, or with any of the people who contribute to it; I have just found it helpful in the past (and continue to find it helpful!).

Yes, TOP is applied after ORDER BY. First all the matching records are found and ordered, and then TOP is applied. So you are doing it right. Here is the logical order in which a query is processed in SQL Server:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
TOP/OFFSET FETCH
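
To see the last two phases of that list in action, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: SQLite spells TOP 1 as LIMIT 1, and the table aos with its sample rows is a made-up stand-in for the CHECK.AOS table in the question.

```python
import sqlite3

# Hypothetical stand-in for the CHECK.AOS table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aos (auto_no INTEGER, revision_no INTEGER, estimate_return_date TEXT)"
)
conn.executemany(
    "INSERT INTO aos VALUES (?, ?, ?)",
    [
        (7, 1, "2015-03-01"),
        (7, 3, "2015-03-20"),  # highest revision for car 7
        (7, 2, "2015-03-10"),
        (8, 1, "2015-04-01"),
    ],
)


def latest_estimate(conn, auto_no):
    """Return the estimate from the highest revision for one car, or None."""
    row = conn.execute(
        # LIMIT 1 (T-SQL: TOP 1) is applied after the rows have been
        # filtered by WHERE and sorted by ORDER BY.
        "SELECT estimate_return_date FROM aos "
        "WHERE auto_no = ? ORDER BY revision_no DESC LIMIT 1",
        (auto_no,),
    ).fetchone()
    return row[0] if row else None


print(latest_estimate(conn, 7))  # 2015-03-20
```

Even though revision 3 was inserted in the middle, it is the one returned, because the row limit only kicks in once the ordering is known.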

Related

Why is there no `select last` or `select bottom` in SQL Server like there is `select top`?

I know this might probably sound like a stupid question, but please bear with me.
In SQL-server we have
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by default), cool. If we want records to be sorted on any other column, we just specify that in the order by clause, something like this...
SELECT TOP N ... ORDER BY [ColumnName]
Even more cool. But what if I want the last row? I just write something like this...
SELECT TOP N ... ORDER BY [ColumnName] DESC
But there is a slight concern with that. I say concern and not issue because it isn't actually an issue. This way I can get the last row based on that column, but what if I want the last row that was inserted? I know about SCOPE_IDENTITY, IDENT_CURRENT and @@IDENTITY, but consider a heap (a table without a clustered index) without any identity column, and multiple accesses from many places (please don't go too far into how and when these multiple operations happen; that doesn't concern the main point). In this case there doesn't seem to be an easy way to find which row was actually inserted last. Some might answer this as
If you do a select * from [table] the last row shown in the sql result window will be the last one inserted.
To anyone thinking this: it is not actually the case, at least not always, and it is not something you can rely on (MSDN, please read the Advanced Scanning section).
So the question boils down to this, as in the title itself. Why doesn't SQL Server have a
SELECT LAST
or say
SELECT BOTTOM
or something like that, where we don't have to specify the Order By and then it would give the last record inserted in the table at the time of executing the query (again I am not going into details about how would this result in case of uncommitted reads or phantom reads).
But if someone still argues that we can't talk about this without talking about these read levels, then, for them, we could make it behave the same way as TOP works, just in the opposite direction. But if your argument is that we don't need it because we can always do
SELECT TOP N ... ORDER BY [ColumnName] DESC
then I really don't know what to say here. I know we can do that, but is there any relation-based reason, or some semantics-based reason, or some other reason why we don't have or can't have this SELECT LAST/BOTTOM? I am not looking for a way to do it with ORDER BY; I am looking for the reason why we don't have it or can't have it.
Extra
I don't know much about how NoSQL works, but I've worked (just a little bit) with MongoDB and Elasticsearch, and there doesn't seem to be anything like this there either. Is the reason they don't have it that no one ever had it before, or is it for some reason not feasible?
UPDATE
I don't need to be told whether to specify ORDER BY descending or not. Please read the question and understand my concern before answering or commenting. I know how to get the last row. That's not even the question; the main question boils down to why there is no SELECT LAST/BOTTOM like its counterpart.
UPDATE 2
After the answers from Vladimir and Pieter, I just wanted to add that I know the order is not guaranteed if I do a SELECT TOP without ORDER BY. I know that what I wrote earlier in the question might give the impression that I don't know that's the case, but if you look a little further down, I have given a link to MSDN and have mentioned that SELECT TOP without ORDER BY doesn't guarantee any ordering. So please don't add to your answer that my statement is wrong, as I have already clarified that myself a couple of lines later (where I provided the link to MSDN).
You can think of it like this.
SELECT TOP N without ORDER BY returns some N rows, neither first, nor last, just some. Which rows it returns is not defined. You can run the same statement 10 times and get 10 different sets of rows each time.
So, if the server had a syntax SELECT LAST N, then result of this statement without ORDER BY would again be undefined, which is exactly what you get with existing SELECT TOP N without ORDER BY.
You have stressed in your question that you know and understand what I've written below, but I'll still keep it to make it clear for everyone reading this later.
Your first phrase in the question
In SQL-server we have SELECT TOP N ... now in that we can get the
first n rows in ascending order (by default), cool.
is not correct. With SELECT TOP N without ORDER BY you get N "random" rows. Well, not really random, the server doesn't jump randomly from row to row on purpose. It chooses some deterministic way to scan through the table, but there could be many different ways to scan the table and server is free to change the chosen path when it wants. This is what is meant by "undefined".
The server doesn't track the order in which rows were inserted into the table, so again your assumption that results of SELECT TOP N without ORDER BY are determined by the order in which rows were inserted in the table is not correct.
So, the answer to your final question
why no select last/bottom like it's counterpart.
is:
without ORDER BY results of SELECT LAST N would be exactly the same as results of SELECT TOP N - undefined.
with ORDER BY result of SELECT LAST N ... ORDER BY X ASC is exactly the same as result of SELECT TOP N ... ORDER BY X DESC.
So, there is no point to have two key words that do the same thing.
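
A small sketch of that equivalence, using Python's built-in sqlite3 module (LIMIT plays the role of TOP). SELECT LAST does not exist in any of these engines, so the function last_n_asc below only emulates what such a hypothetical keyword would have to return; the table t and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (40, 10, 50, 20, 30)])


def top_n_desc(conn, n):
    """The existing idiom: the first n rows when ordered by x descending."""
    rows = conn.execute("SELECT x FROM t ORDER BY x DESC LIMIT ?", (n,))
    return [r[0] for r in rows]


def last_n_asc(conn, n):
    """Emulates a hypothetical 'LAST n ... ORDER BY x ASC':
    the final n rows of the ascending ordering, sliced in Python."""
    ordered = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x ASC")]
    return ordered[max(len(ordered) - n, 0):]


print(top_n_desc(conn, 2))  # [50, 40]
print(last_n_asc(conn, 2))  # [40, 50]
```

Both calls pick the same rows; only the presentation order differs, and that can be flipped by wrapping the TOP query in an outer query with its own ORDER BY.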
There is a good point in Pieter's answer: the word TOP is somewhat misleading. It really means LIMIT the result set to some number of rows.
By the way, since SQL Server 2012 they added support for ANSI standard OFFSET:
OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
[
FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
]
Here adding another keyword was justified because it is ANSI standard and it adds important functionality, pagination, which didn't exist before.
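
As a rough illustration of the pagination use case, here is a sketch with Python's built-in sqlite3 module. SQLite does not accept the OFFSET ... FETCH syntax, so the sketch uses its LIMIT ... OFFSET form instead; the T-SQL spelling is shown in a comment. The items table and the fetch_page helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 8)],  # ids 1..7
)


def fetch_page(conn, page, page_size):
    """Return the ids on one page; page numbers start at 1."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        # T-SQL (SQL Server 2012+) would write this as:
        #   ORDER BY id OFFSET @offset ROWS FETCH NEXT @page_size ROWS ONLY
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [r[0] for r in rows]


print(fetch_page(conn, 1, 3))  # [1, 2, 3]
print(fetch_page(conn, 3, 3))  # [7]
```

The ORDER BY is what makes the pages stable; without it, consecutive pages could overlap or skip rows.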
I would like to thank @Razort4x here for providing a very good link to MSDN in his question. The "Advanced Scanning" section there has an excellent example of a mechanism called "merry-go-round scanning", which demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.
This concept is often misunderstood, and I've seen many questions here on SO that would greatly benefit from a quote from that link.
The answer to your question
Why doesn't SQL Server have a SELECT LAST or say SELECT BOTTOM or
something like that, where we don't have to specify the ORDER BY and
then it would give the last record inserted in the table at the time
of executing the query (again I am not going into details about how
would this result in case of uncommitted reads or phantom reads).
is:
The devil is in the details that you want to omit. To know which record was the "last inserted in the table at the time of executing the query" (and to know this in a somewhat consistent/non-random manner) the server would need to keep track of this information somehow. Even if it is possible in all scenarios of multiple simultaneously running transactions, it is most likely costly from the performance point of view. Not every SELECT would request this information (in fact very few or none at all), but the overhead of tracking this information would always be there.
So, you can think of it like this: by default the server doesn't do anything specific to know/keep track of the order in which the rows were inserted, because it affects performance, but if you need to know that you can use, for example, IDENTITY column. Microsoft could have designed the server engine in such a way that it required an IDENTITY column in every table, but they made it optional, which is good in my opinion. I know better than the server which of my tables need IDENTITY column and which do not.
Summary
I'd like to summarise that you can look at SELECT LAST without ORDER BY in two different ways.
1) When you expect SELECT LAST to behave in line with existing SELECT TOP. In this case result is undefined for both LAST and TOP, i.e. result is effectively the same. In this case it boils down to (not) having another keyword. Language developers (T-SQL language in this case) are always reluctant to add keywords, unless there are good reasons for it. In this case it is clearly avoidable.
2) When you expect SELECT LAST to behave as SELECT LAST INSERTED ROW. Which should, by the way, extend the same expectations to SELECT TOP to behave as SELECT FIRST INSERTED ROW or add new keywords LAST_INSERTED, FIRST_INSERTED to keep existing keyword TOP intact. In this case it boils down to the performance and added overhead of such behaviour. At the moment the server allows you to avoid this performance penalty if you don't need this information. If you do need it IDENTITY is a pretty good solution if you use it carefully.
There is no select last because there is no need for it. Consider a "select top 1 * from table" . Top 1 would get you the first row that is returned. And then the process stops.
But there are no guarantees about ordering if you don't specify an ORDER BY. So it may as well be any row in the dataset you get back.
Now do a "select last 1 * from table". Now the database will have to process all the rows in order to get you the last one.
And because ordering is non-deterministic, it may as well be the same result as from the select "top 1".
See now where the problem comes? Without an order by top and last are actually the same, only "last" will take more time. And with an order by, there's really only a need for top.
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by
default), cool. If we want records to be sorted on any other column,
we just specify that in the order by clause, something like this...
What you say here is totally wrong and absolutely NOT how it works. There is no guarantee on what order you get. Ascending order on what ?
create table mytest(id int, id2 int)
insert into mytest(id,id2)values(1,5),(2,4),(3,3),(4,2),(5,1)
select top 1 * from mytest
select * from mytest
create clustered index myindex on mytest(id2)
select top 1 * from mytest
select * from mytest
insert into mytest(id,id2)values(6,0)
select top 1 * from mytest
Try this code line by line and see what you get with the last "select top 1".....you get in this case the last inserted record.
update
I think you understand that "select top 1 * from table" basically means: "Select a random row from the table".
So what would last mean? "Select the last random row from the table?" Wouldn't the last random row from a table be conceptually the same as saying any 1 random row from the table? And if that's true, top and last are the same, so there is no need for last.
Update 2
In hindsight I was happier with the syntax MySQL uses: LIMIT.
TOP doesn't say anything about ordering; it is only there to specify the number of rows to be returned.
Limits the rows returned in a query result set to a specified number of rows or percentage of rows in SQL Server 2014.
The reasons why SELECT LAST_INSERTED does not make sense.
It cannot be easily applied to non-heap tables.
Heap data can be freely moved by the DBMS, so its "natural" order is subject to change. To preserve it, the system would need some additional mechanism, which seems to be a useless waste.
If really desired it can be simulated by adding some 'auto-increment' column.
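
A minimal sketch of that simulation, using Python's built-in sqlite3 module. SQLite's INTEGER PRIMARY KEY AUTOINCREMENT is used here as a stand-in for a SQL Server IDENTITY column; the events table and its payloads are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is assigned by the engine and only ever increases, so it records
# insertion order explicitly instead of relying on physical row order.
conn.execute(
    "CREATE TABLE events (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
for payload in ("first", "second", "third"):
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))


def last_inserted(conn):
    """Return (seq, payload) of the most recently inserted row, or None."""
    return conn.execute(
        "SELECT seq, payload FROM events ORDER BY seq DESC LIMIT 1"
    ).fetchone()


print(last_inserted(conn))  # (3, 'third')
```

The "last inserted" question then becomes an ordinary TOP/LIMIT query with an ORDER BY on the counter column.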
SQL Server ordering is arbitrary unless otherwise stated. It's set based, therefore you must define what your set is. SCOPE_IDENTITY(), or the OUTPUT clause, is the correct way to capture the last inserted record. Why would you do inserts on a heap that you need to reference chronologically anyway? That's very bad database design.

Query to Find Adjacent Date Records

There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
original_group,
min(created2) change_date
from (select h.page_id,
h.group_id original_group,
i.group_id new_group,
h.created_dttm created1,
i.created_dttm created2
from page_history h,
page_history i
where h.page_id = i.page_id
and h.created_dttm < i.created_dttm
and h.group_id != i.group_id)
group by page_id, original_group, created1
order by page_id
When I try to get, say, any details of the second record, like new_group, I'm hit with an ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that's going to destroy the logic (I think it would find records displaying times a page changed from a group to another group, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to re-write the query but when you have a min function used like that it invariably seems better to put it into a separate sub select to get what you want and then compare the result of that.
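
One way to read that suggestion, sketched with Python's built-in sqlite3 module rather than Oracle: move the MIN into a correlated subquery that finds the next history row for the same page, then join back to that row so all of its columns (such as new_group) are available. The column names come from the question's query; the sample rows are invented, and the sketch assumes created_dttm is unique within a page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_history (page_id INTEGER, group_id INTEGER, created_dttm TEXT)"
)
conn.executemany(
    "INSERT INTO page_history VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-01"),
        (1, 10, "2020-01-02"),
        (1, 20, "2020-01-03"),  # page 1 moves from group 10 to 20
        (1, 10, "2020-01-04"),  # and back from 20 to 10
        (2, 30, "2020-01-01"),
        (2, 30, "2020-01-05"),  # page 2 never changes group
    ],
)

GROUP_CHANGES_SQL = """
SELECT h.page_id,
       h.group_id     AS original_group,
       i.group_id     AS new_group,
       i.created_dttm AS change_date
FROM page_history h
JOIN page_history i
  ON i.page_id = h.page_id
 -- i is the row immediately after h for the same page
 AND i.created_dttm = (SELECT MIN(n.created_dttm)
                       FROM page_history n
                       WHERE n.page_id = h.page_id
                         AND n.created_dttm > h.created_dttm)
WHERE h.group_id <> i.group_id
ORDER BY h.page_id, i.created_dttm
"""


def group_changes(conn):
    """Return (page_id, original_group, new_group, change_date) per change."""
    return conn.execute(GROUP_CHANGES_SQL).fetchall()


for change in group_changes(conn):
    print(change)
```

Because the MIN lives in the subquery, the outer query needs no GROUP BY at all, which sidesteps the ORA-00979 problem of selecting non-grouped columns.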

How to retain the order of results while using a IN Clause in DB2?

I need to fetch a set of results using an IN clause, but the results come back in the default order. Is there a way to preserve the order of the IN list in DB2?
ORDER BY FIELD would be a solution in MySQL, but is there an equivalent in DB2?
As I understand it, you want to do this:
select foo from table where bar in (3, 1, 2);
and order by which item bar matched. i.e. bar = 3 comes first, followed by 1, then 2.
I don't think there is a built-in way to do what you want in DB2.
However, take a look at this recent question, which discusses workarounds.
If you want results in a particular order, ORDER BY is how to do it. SQL does not guarantee an order unless you use ORDER BY. There is no relationship whatsoever between a sort order of a result set and the way you choose to list items in any IN() clause. An IN() clause has nothing to do with it.
Note that a specific sort order may be obtained at any time without ORDER BY purely by luck. However, it is not guaranteed. If rows change over time, a different sort order might show up without warning.
This is SQL behavior, not something specific to DB2. DB2 simply behaves the way SQL is intended to work.
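
One common workaround that stays within plain ORDER BY is a CASE expression that maps each listed value to its position in the list. The sketch below demonstrates the idea with Python's built-in sqlite3 module, not DB2; the table t and the helper select_in_listed_order are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo TEXT, bar INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
)


def select_in_listed_order(conn, wanted):
    """Return foo for rows whose bar is in wanted, in the order wanted lists them."""
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    # Builds: CASE bar WHEN ? THEN 0 WHEN ? THEN 1 ... END
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(wanted)))
    sql = (
        f"SELECT foo FROM t WHERE bar IN ({placeholders}) "
        f"ORDER BY CASE bar {whens} END"
    )
    # The values are bound twice: once for IN (...), once for the CASE.
    params = list(wanted) + list(wanted)
    return [r[0] for r in conn.execute(sql, params)]


print(select_in_listed_order(conn, [3, 1, 2]))  # ['c', 'a', 'b']
```

The IN list still only filters; the ordering comes entirely from the explicit ORDER BY.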

Why Rails ActiveRecord last method orders by id whereas first does not?

According to this, ActiveRecord first generates the SQL:
SELECT * FROM clients LIMIT 1
whereas ActiveRecord last generates the SQL:
SELECT * FROM clients ORDER BY clients.id DESC LIMIT 1
The behavior of first is not correct, in my opinion, whereas that of last is. If you do not specify the ordering, a simple SELECT returns rows in arbitrary or unpredictable order. Hence, first is not guaranteed to always return the same record (let alone the record with the minimum id).
Does anybody have a clue, why does Rails ActiveRecord work like that?
Thanks in advance
Panayotis
Since I didn't get any answer on this post, I tried to find the answer from other people on other forums. I believe that either ActiveRecord (or mysql gem) has a bug. Rick James says that if we want to get the minimum id, we should use order by with limit. Here is his answer:
http://forums.mysql.com/read.php?22,530328,530514#msg-530514
I think it is not a de jure, but a de facto rule. Most DBs return the content of the table in order of creation (so in order of their ids) when no ordering is specified, even though nothing guarantees it. It just works.

ORDER BY in a Sql Server 2008 view

we have a view in our database which has an ORDER BY in it.
Now, I realize views generally don't order, because different people may use it for different things, and want it differently ordered. This view however is used for a VERY SPECIFIC use-case which demands a certain order. (It is team standings for a soccer league.)
The database is Sql Server 2008 Express, v.10.0.1763.0 on a Windows Server 2003 R2 box.
The view is defined as such:
CREATE VIEW season.CurrentStandingsOrdered
AS
SELECT TOP 100 PERCENT *, season.GetRanking(TEAMID) RANKING
FROM season.CurrentStandings
ORDER BY
GENDER, TEAMYEAR, CODE, POINTS DESC,
FORFEITS, GOALS_AGAINST, GOALS_FOR DESC,
DIFFERENTIAL, RANKING
It returns:
GENDER, TEAMYEAR, CODE, TEAMID, CLUB, NAME,
WINS, LOSSES, TIES, GOALS_FOR, GOALS_AGAINST,
DIFFERENTIAL, POINTS, FORFEITS, RANKING
Now, when I run a SELECT against the view, it orders the results by GENDER, TEAMYEAR, CODE, TEAMID. Notice that it is ordering by TEAMID instead of POINTS as the order by clause specifies.
However, if I copy the SQL statement and run it exactly as is in a new query window, it orders correctly as specified by the ORDER BY clause.
The order of rows returned by a view with an ORDER BY clause is never guaranteed. If you need a specific row order, you must specify it with an ORDER BY in the query that selects from the view.
See the note at the top of this Books Online entry.
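
A minimal sketch of that advice, using Python's built-in sqlite3 module instead of SQL Server: the view only defines the columns, and the ordering lives in the statement that reads from it. The standings table, the view name, and the sample teams are invented and far simpler than the view in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE standings (code TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO standings VALUES (?, ?, ?)",
    [("U12", "Lions", 7), ("U12", "Bears", 12), ("U10", "Hawks", 9), ("U12", "Wolves", 9)],
)
# No ORDER BY inside the view: a view describes a set of rows, not a sequence.
conn.execute(
    "CREATE VIEW current_standings AS SELECT code, team, points FROM standings"
)


def ordered_standings(conn):
    """Read the view with the ordering stated in the outer query."""
    return conn.execute(
        "SELECT code, team, points FROM current_standings "
        "ORDER BY code, points DESC"
    ).fetchall()


for row in ordered_standings(conn):
    print(row)
```

If every consumer needs the same ordering, wrapping this outer SELECT in a stored procedure keeps the ORDER BY in one place.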
SQL Server 2005 ignores TOP 100 PERCENT by design.
Try TOP 2000000000 instead.
Now, I'll try and find a reference... I was at a seminar presented by Itzik Ben-Gan, who mentioned it.
Found some...
Kimberly L. Tripp
"TOP 100 Percent ORDER BY Considered Harmful"
In this particular case, the optimizer
recognizes that TOP 100 PERCENT
qualifies all rows and does not need
to be computed at all.
Just use :
"Top (99) Percent "
or
"Top (a number 1000s times more than your data rows like 24682468123)"
it works! just try it.
In SQL Server 2008, ORDER BY is ignored in views that use TOP 100 PERCENT. In prior versions of SQL Server, ORDER BY was only allowed if TOP 100 PERCENT was used, but a perfect order was never guaranteed. However, many assumed a perfect order was guaranteed. I infer that Microsoft does not want to mislead programmers and DBAs into believing there is a guaranteed order using this technique.
An excellent comparative demonstration of this inaccuracy, can be found here...
http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by
Oops, I just noticed that this was already answered. But checking out the comparative demonstration is worth a look anyway.
Microsoft has fixed this. You have to patch your SQL Server:
http://support.microsoft.com/kb/926292
I found an alternative solution.
My initial plan was to create a 'sort_order' column that would prevent users from having to perform a complex sort.
I used a windowed function ROW_NUMBER. In the ORDER BY clause, I specified the default sort order that I needed (just as it would have been in the ORDER BY of a SELECT statement).
I get several positive outcomes:
By default, the data is getting returned in the default sort order I originally intended (this is probably due to the windowed function having to sort the data prior to assigning the sort_order value)
Other users can sort the data in alternative ways if they choose to
The sort_order column is there for a very specific sort need, making it easier for users to sort the data should whatever tool they use rearrange the rowset.
Note: In my specific application, users are accessing the view via Excel 2010, and by default the data is presented to the user as I had hoped without further sorting needed.
Hope this helps those with a similar problem.
Cheers,
Ryan
run a profiler trace on your database and see the query that's actually being run when you query your view.
You also might want to consider using a stored procedure to return the data from your view, ordered correctly for your specific use case.