SQL Top clause with order by clause - sql

I am bit new to SQL, I want to write query with TOP clause and order by clause.
So, for returning all the records I write below query
select PatientName,PlanDate as Date,* from OPLMLA21..Exams order
by PlanDate desc
And I need top few elements from same query, so I modified the query to
select top(5) PatientName,PlanDate as Date,* from OPLMLA21..Exams
order by PlanDate desc
In my understanding it will give the top 5 results from the previous query, but I see ambiguity there. I have attached the screen shot of query results .
May be my understanding is wrong, I read a lot but not able to understand this please help me out.

I stated this in a comment, however, to repeat that:
 TOP (5) doesn't give the "top results" of the prior query though, no. It gives the top (first) rows from the dataset defined in the query it in is. If there are multiple rows that have the same "rank", then the row(s) returned for that rank are arbitrary. So, for example, for your query if you have 100 rows all with the same value for PlanDate, what 5 rows you get are completely arbitrary and could be different (including the order they are in) every time you run said query.
What I mean by arbitrary is that, effectively, SQL Server is free to choose whatever rows, of those applicable, are returned. Sometimes this might be the same everytime you run the query, but this by luck more than anything. As your database gets larger, you have more users querying the data, you involve joins, things like locks, indexes, parrallelism, etc all will effect the "order" that SQL Server is processing said data, and will effect an ambigious TOP clause.
Take the example data below:
ID | SomeDate
---|---------
1 |2020-01-01
2 |2020-01-01
3 |2020-01-01
4 |2020-01-01
5 |2020-01-01
6 |2020-01-02
Now, what would you expect if I ran a TOP (2) against that table with an ORDER BY clause of SomeDate DESC. Well, certainly, you'd expect the "last" row (with an ID of 6) to be returned, but what about the next row? The other 5 rows all have the same value for SomeDate. Perhaps, because your under the impression that data in a table is pre-sorted, you might expect the row with a value of 5 for ID. What if I told you that there was a CLUSTERED INDEX on ID ASC; that might well end up meaning that the row with a value of 1 is returned. What if there is also an index on SomeDate DESC?
What if the table was 10,000 of rows in size, and you also have a JOIN to another table, which also has a CLUSTERED INDEX, and some user is performing a query with some specific row locking on in while you run your query? What would you expect then?
Without your ORDER BY being specific enough to ensure that each row has a distinct ordering position, SQL Server will return other rows in an arbitrary order and when mixed with a TOP means the "top" rows will also be arbitrary.
Side note: I note in your image (of what appears to be SSMS), your "dates" are in the format yyyyMMdd. This strongly implies that you are storing a date value as a varchar or int type. This is a design flaw and needs to be fixed. There are 6 date and time data types, and 5 of them are far superior to using a string and numerical data type to storing the data.

Related

How does Oracle SQL order rows when there is no ORDER BY

I have currently written SQL to bring back data in a select statement without using ORDER BY. From what I have read, the selection seems to be random? As I am trying to run tests on data that should be updated once selected, is there a way to 'predict' which entries will be returned by the select statement (and therefore should be modified)? My select statement is:
SELECT x.FIELD_A,
x.FIELD_B,
x.FIELD_C,
y.FIELD_D,
y.FIELD_E,
y.FIELD_F
FROM TABLE_X x
LEFT JOIN TABLE_Y Y ON x.FIELD_A = y.FIELD_G
WHERE y.FIELD_G in ('xxx', 'yyy', 'zzz', 'www')
AND y.FIELD_E = 'vvvv'
AND y.FIELD_G IS NULL
AND y.FIELD_H IS NULL
AND y.FIED_I IS NULL
AND x.FIELD_J IN ('uuu', 'ttt', 'sss', 'rrrr')
AND y.FIELD_K_DATE > ADD_MONTHS(SYSDATE, -12)
AND ROWNUM <= 20
How does Oracle SQL order rows when there is no ORDER BY?
It doesn't order them at all.
From a classic Tom Kyte article, and quoting his own book:
You should think of a heap organized table as a big unordered collection of rows. These rows will come out in a seemingly random order, and depending on other options being used (parallel query, different optimizer modes and so on), they may come out in a different order with the same query. Do not ever count on the order of rows from a query unless you have an ORDER BY statement on your query!
If you run your query several times today you may see the same results in the same order, but might not. You could run it a hundred times and perhaps get the same result each time, and think that means that's what it will always return. But if you run it again tomorrow it may or may not be the same. If you run it weeks or months or years in the future it still could be the same as you see today, but is increasingly likely not to be as other things change - including data volumes, statistics gathering, database upgrades or even potentially patching, and so on.
Without an order-by clause you are at the mercy of the optimiser and things even that can't control, potentially even down to the order blocks of data are read from disk.
So no, you can't predict which entries will be returned by your query. You will get 20 rows but which ones you get and what order those are in is non-deterministic.
If you need a predictable result you need to order the results before apply the rownum filter (or using fetch first).

Strange behavior when doing where and order by query in postgres

Background: A large table, 50M+, all column in query is indexed.
when I do a query like this:
select * from table where A=? order by id DESC limit 10;
In statement, A, id are both indexed.
Now confusing things happen:
the more rows where returned, the less time whole sql cost
the less rows where returned, the more time whole sql cost
I have a guess here: postgres do the order by first, and then where , so it cost more time to find 10 row in the orderd index when target rowset is small(like find 10 particular sand on beach); opposite, if target rowset is large, it's easy to find the first 10.
Is it right? Or there are some other reason for this?
Final question: How to optimize this situation?
It can either use the index on A to apply the selectivity, then sort on "id" and apply the limit. Or it can read them already in order using the index on "id", then filter out the ones that meet the A condition until it finds 10 of them. It will choose the one it thinks is faster, and sometimes it makes the wrong choice.
If you had a multi-column index, on (A,id) it could use that one index to do both things, get the selectivity on A and still fetch the already in order by "id", at the same time.
Do you know PGAdmin? With "explain verbose" before your statement, you can check how the query is executed (meaning the order of the operators). Usually first happens the filter and only afterwards the sorting...

Are big-query results always ordered, that is: using OFFSET makes sense to skip rows?

In other words does a select query order results every time, so these 2 will always produce unique values:
select *
from bigquery-public-data.crypto_ethereum.balances
limit 10 OFFSET 100
select *
from bigquery-public-data.crypto_ethereum.balances
limit 10 OFFSET 2000
Assuming of course the table has unique values...I am just curious if without using "order" clause the table is always deterministic/consequetive or can the results duplicate if they're returned indeed at random? 10x!
I am just curious if without using "order" clause the table is always deterministic/consequetive or can the results duplicate if they're returned indeed at random.
No. SQL tables represent unordered set of rows. There is no inherent ordering of the rows. Unless an order by clause is specified, there is no guarantee that two consequent executive of the same query would yield an indentical result. The database is free to return the rows in whatever order it likes.
As a consequence, the results of a query with a row-limiting clause but no order by clause are not deterministic. Do add an order by clause the these queries, or you will sooner or later run into suprising and hard-to-debug behaviors.

I am using select top 5 from(select * from tbl2) tbl, will it get all records from tbl2 or will it get only specific records i.e 5

I am using select top 5 from(select * from tbl2) tbl, will it get all records from tbl2 and then it will it get only specific records i.e 5? or do it only get 5 records from internal memory. suppose we have 1000 records in tbl2.
For future reference, SQL actually has a huge amount of amazing documentation/resources online. For instance: This google search.
As to your question, it will pull the top five results matching your criteria, so it depends. It'll go through, and find the first five results matching your criteria. If those are the last results it goes through, it'll still have to do the comparisons and filtering on all rows, the only difference will be that it will have to send less rows to your computer.
For example, let's say we have a table my_table with two columns: Customer_ID (which is unique) and First_Purchase_Date (which is a date value between 2015-01-01 and 2017-07-26). If we simply do SELECT TOP 5 * FROM my_table then it will go through and pull the first five rows it finds, without looking at the rest of the rows. On the other hand, if we do SELECT TOP 5 * FROM my_table WHERE First_Purchase_Date = '2017-05-17' then it will have to go through all the rows until it can find five rows with a First_Purchase_Date of 2017-05-17. If First_Purchase_Date is indexed, this should not be very expensive, as it'll more or less know where to look. If it's not, then it depends on your how SQL has decided to structure your table, and if it has created any useful statistics. Worst case, it could not have any statistics and the desired rows could be the last five in the database, in which case it will have to complete the comparison on all the rows in the database.
By the way, this is a somewhat poor idea, as the columns returned will not necessarily stay consistent over time. It may be a good idea to throw in an ORDER BY clause, to ensure you get the same records every time.
The SELECT TOP clause is used to specify the number of records to return.
It limits the rows returned in a query result set to a specified number of rows or percentage of rows. When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered rows; otherwise, it returns the first N number of rows in an undefined order.
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;

Implicit ORDER BY order in a table

I have a table with a timestamp column and a few other char columns. I have ordered the information in the table by the timestamp but due to the fact that there are more records on the same timestamp I do not know if the order displayed is the order they were inserted in the table.
Sadly there is no index on the table so apart from the timestamp there isn't anything else I could use to order them by.
Example:
Timestamp | Foregin table | Foreign table value
2012-10-09 19:29:50.000 | tableA | "random string here"
2012-10-09 19:29:50.000 | tableA | "different string here"
2012-10-09 19:29:50.000 | tableB | "another random string here"
2012-10-09 19:29:50.000 | tableC | "string here"
2012-10-09 19:29:50.000 | tableD | "another string here"
The query that I run is something like this:
SELECT *
FROM TABLE
WHERE Timestamp BETWEEN 'x' and 'y'
ORDER BY Timestamp
Reviewing my results I assume, but I am not sure, that the query returns the data ordered by the specified column and continues ordering of the results by the next column/columns, in case there are more than 1 column, in an ascending manner (maybe by length of text or by alphabetical order).
Could you please help me clarify this situation as it is very important for me to find out the order.
SQL Server in general has no order guarantee for queries not specifying an ORDER BY. So a SELECT * FROM Tbl can essentially return the rows in any random order. Because of how the data is stored on disk, if you have an empty cache and no other activity going on on the server, you will get the data in on-disk order with a fairly high probability. That probability goes down dramatically if there are other concurrent queries on your server and if the cache is not empty. So while you might observe a specific behavior in your tests, you will not get the same behavior in production.
That said, if you need the rows to come back in a specific order you need to specify an appropriate ORDER BY clause. In the case above you could use ORDER BY [Timestamp], [Foreign table], [Foreign table value] to make sure that the records always come back in the same order. If your query contains more columns and none of them is unique you need to add all those to the ORDER BY too. But keep in mind that sorting gets more expensive the more columns are involved.
SQL Server will order the results using the explicit ORDER BY clause and gives no gaurantees as to the order of results for which all columns in the ORDER BY clause are equal.
If you want it to then order by one of your char columns alphabetically you need to specify that in your ORDER BY clause. It may be the case that your current results set is sorted in this way but it is highly unlikely that this will continue indefinitely.
Tables in databases are inherently unordered. This answer is a clarification of Sebastian's answer (the clarification is too long for a comment).
The return order is not random. It is arbitrary. And, it can change from invocation to invocation. On a table that has deletes as well as inserts, then later records can be interspersed on data pages with earlier records. Once again, there is no concept of ordering within a data table. Ordering is only part of query statements.
A bigger factor than other queries running on the system is multi-threading (and to a less extant multiple partitions).
If you have an empty cache, a query that does not access the table through an index, no deletes in the table, a single partition, and a single threaded system, then the data would normally be returned in insert order. Even in this case, though, there are no guarantees. If you want the data in a particular order, use order by. If you don't want a performance hit, include an identity primary key and use that for the ordering.