I have a table that contains 300 million rows, and a clustered index on the [DataDate] column.
How do I select the last 10 rows of this table (I want to find the most recent date in the table)?
Database: Microsoft SQL Server 2008 R2.
Update
The answers below work perfectly - but only if there is a clustered index on [DataDate]. The table is, after all, 300 million rows, and a naive query would end up taking hours to execute rather than seconds. The query plan is using the clustered index on [DataDate] to get results within a few tens of milliseconds.
TOP
SELECT TOP(10) [DataDate] FROM YourTable ORDER BY [DataDate] DESC
TOP (Transact-SQL) specifies that only the first set of rows will be returned from the query result. The set of rows can be either a number or a percent of the rows. The TOP expression can be used in SELECT, INSERT, UPDATE, MERGE, and DELETE statements.
SELECT TOP(10) *
FROM MyTable
ORDER BY DataDate DESC
Do a reverse sort using ORDER BY and use TOP.
Related
I have a big table with two datetime columns.
[Timestamp] and [TimestampRounded]
The [Timestamp] column has the full timestamp including milliseconds and the table has no index for this column.
The [TimestampRounded] column has the same timestamp with milliseconds, seconds, and minutes truncated (set to 0). The table has a clustered index on this column. That is, the table is effectively stored in the order of this column. Typically the newest row is at the top of the table. The index was created like this:
CREATE CLUSTERED INDEX cidx_time ON [dbo].[MyTable] ([TimestampRounded] DESC)
Now I want to retrieve some data leveraging my clustered index, so I run the following select. My table has around 5 million rows.
Query 1:
SELECT TOP(100) * FROM [dbo].[MyTable] ORDER BY [TimestampRounded] DESC
This query returns immediately (less than 1 second). But the 100 returned rows are not ordered with respect to milliseconds, only by hour.
Then I learned if I also want to order by a second column I do:
Query 2:
SELECT TOP(100) * FROM [dbo].[MyTable] ORDER BY [TimestampRounded] DESC, [Timestamp] DESC
This query is very slow and takes around 23 seconds to return the 100 rows.
My immediate solution was to use the first query and then just order those returned 100 rows in my client frontend code. But I found that rows that should have been returned were missing, so I would like to understand how I can fix or rewrite query 2 to return those 100 sorted rows as expected. By reasonable logic it should also take less than 1 second: since the table is already stored by hour (clustered index), I do not understand why it should take longer.
I might be oversimplifying, but why not simply create an index on the column that stores the entire timestamp?
CREATE INDEX cidx_time2 ON [dbo].[MyTable] ([Timestamp] DESC)
Then, you can just do:
SELECT TOP(100) * FROM [dbo].[MyTable] ORDER BY [Timestamp] DESC
Or, if you need the two timestamps in the ORDER BY clause for some reason, then you want an index on both columns:
CREATE INDEX cidx_time3 ON [dbo].[MyTable] ([TimestampRounded] DESC, [Timestamp] DESC);
Then you can run your original query:
SELECT TOP(100) * FROM [dbo].[MyTable] ORDER BY [TimestampRounded] DESC, [Timestamp] DESC
Specify WITH TIES so SQL Server returns up to "several thousand" rows that all share the same rounded timestamp value, then order those several thousand rows by the precise timestamp to get your truly most recent 100. It is quicker to sort thousands of rows than millions.
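The WITH TIES idea above can be illustrated outside the database. The following is a minimal Python sketch, not SQL Server code: the function name `top_n_with_ties`, the (rounded, precise) tuple layout, and the sample data are all hypothetical. It keeps every row that shares the rounded value of the n-th row, then sorts only that candidate set by the precise timestamp.

```python
from datetime import datetime, timedelta


def top_n_with_ties(rows, n):
    """Return the n most recent rows, using the rounded column as a coarse filter.

    rows: iterable of (rounded_ts, precise_ts) tuples (hypothetical layout).
    """
    # Step 1: order by the coarse key only, newest first. In the database
    # this order would come from the clustered index rather than a sort.
    by_rounded = sorted(rows, key=lambda r: r[0], reverse=True)
    if len(by_rounded) <= n:
        candidates = by_rounded
    else:
        # Step 2: emulate WITH TIES by keeping every row whose rounded
        # value is at least the rounded value of the n-th row.
        cutoff = by_rounded[n - 1][0]
        candidates = [r for r in by_rounded if r[0] >= cutoff]
    # Step 3: sort only the candidates by the precise timestamp.
    candidates.sort(key=lambda r: r[1], reverse=True)
    return candidates[:n]


# Hypothetical sample: five rows in the 12:00 hour, five in the 11:00 hour.
base = datetime(2024, 1, 1, 12, 0, 0)
rows = []
for hours_back in (0, 1):
    for minute in (5, 50, 20, 35, 10):
        precise = base - timedelta(hours=hours_back) + timedelta(minutes=minute)
        rounded = precise.replace(minute=0, second=0, microsecond=0)
        rows.append((rounded, precise))

result = top_n_with_ties(rows, 7)
```

Taking a plain top 7 by the rounded column would pick two arbitrary rows from the 11:00 hour, which is how sorting Query 1's output on the client can miss rows. Keeping the ties makes the whole 11:00 hour available before the final sort.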
I have a table with approximately 100 million rows (TABLE_A). I need to select 6 million different rows with each query; once the entire table has been selected, the process ends. TABLE_A has no index or primary key, and ORDER BY is very expensive in terms of time. I also don't need any particular order here, just different rows. I have tried ordering by ROWID because, according to this,
They are the fastest way to access a single row.
This query works but takes about 5 minutes (I would like to avoid this order by)
SELECT * FROM TABLE_A ORDER BY ROWID
OFFSET 6000000 ROWS FETCH NEXT 6000000 ROWS ONLY;
This query is faster but makes no sense, since ROWNUM, according to this,
returns a number indicating the order in which Oracle selects the row from a table
SELECT * FROM TABLE_A ORDER BY ROWNUM asc
OFFSET 6000000 ROWS FETCH NEXT 6000000 ROWS ONLY;
As expected, same query returns different results each time.
This query seems to be conceptually better.
SELECT * FROM TABLE_A WHERE ROWID >= 6000000 AND ROWID <12000000;
But it can't be done in this way, ROWID (UROWID Datatype) has values like AAAZIUAHWAAC6XcAAI
So, is there a way to select different rows while avoiding the ORDER BY, and just fetch the rows using some kind of internal ID, maybe a storage address or a default order? The whole approach was likely wrong, so I'm open to radical changes.
I've also tried something like this
SELECT * FROM TABLE_A
WHERE dbms_rowid.rowid_block_number(rowid)
BETWEEN 2049 AND 261281;
It's surprisingly fast, but unfortunately a row could have more than one block number.
Based on your last comment, some things to look at:
DBMS_PARALLEL_EXECUTE
If you are going through 100 million rows, the best place to process them is on the database itself. If your processing is done with PL/SQL, then dbms_parallel_execute can manage most of the parallelisation for you, and carve up the rows.
ROWID ranges
Even if you don't process the rows on the database, you can use DBMS_PARALLEL_EXECUTE to produce the rowid ranges for you. Then use those start-end pairs as inputs to whatever app you are using to do the processing.
Simple MOD
Each instance of your app gets an ID from 0 to 'n-1' and each issues a query:
select *
from (
select rownum r, m.* from my_table m
)
where mod(r, :n) = :x
where x is that app's ID. If you already have a numeric sequence column of some sort that is reasonably distributed, you can substitute that in for the rownum
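To see why the MOD scheme splits the work cleanly, here is a small Python sketch (not Oracle code; `partition_by_mod` and the sample row numbers are hypothetical). It assigns each row number r to worker `r % n` and lets you check that every row lands in exactly one partition. It assumes every worker sees the same row numbering, which ROWNUM without an ORDER BY does not strictly guarantee.

```python
def partition_by_mod(row_numbers, n):
    """Split row numbers across n workers using r % n, like mod(r, n) = :x."""
    partitions = {x: [] for x in range(n)}
    for r in row_numbers:
        partitions[r % n].append(r)
    return partitions


# Hypothetical example: 10 rows numbered 1..10 (ROWNUM starts at 1), 3 workers.
parts = partition_by_mod(range(1, 11), 3)

# Every row appears in exactly one worker's partition.
all_rows = sorted(r for rows in parts.values() for r in rows)
```

The partitions differ in size by at most one row, so the load stays balanced as long as the numbering column is reasonably dense.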
SQL Server 1 million records: best way to get fastest last record of table?
Example: I have a table A with 1 million records. What is the fastest way to get the last record?
I know: SELECT TOP 1 * FROM A ORDER BY ID DESC
But I don't think it's a good way for me.
The query in your question will perform very well if you have a clustered index (which may be the primary key index) on ID. There is no faster way to retrieve all columns from a single row of a table.
I'll add that a table is logically an unordered set of rows so ORDER BY is required to return a "last" or "first" row. The b-tree index on the ORDER BY column will locate the row efficiently.
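The effect of an index on an ORDER BY ... DESC top-N query can be observed with any engine that exposes its query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, purely so the example is self-contained: SQLite is not SQL Server, it uses LIMIT instead of TOP, and the table `A` with columns `ID` and `payload`, the index name `ix_A_ID`, and the helper `plan_details` are hypothetical. It captures the plan for the same query before and after an index on the ORDER BY column exists.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the given SQL."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# No primary key on purpose, so the table starts without any usable index.
conn.execute("CREATE TABLE A (ID INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO A (ID, payload) VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1, 1001)],
)

query = "SELECT * FROM A ORDER BY ID DESC LIMIT 1"

# Without an index the engine has to sort the rows to find the last one.
plan_without_index = plan_details(conn, query)

# With an index on the ORDER BY column it can read the index in reverse.
conn.execute("CREATE INDEX ix_A_ID ON A (ID)")
plan_with_index = plan_details(conn, query)

last_row = conn.execute(query).fetchone()
```

Printing the two plans shows a sort step only in the first one. On SQL Server the equivalent check would be comparing the execution plans of the TOP 1 query with and without the index on ID.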
There is only one way: an index on the primary key and on the columns used in the WHERE clause. ORDER BY has a small cost, but that is fine if you have an index on the ORDER BY column.
-- ORDER BY 1 DESC sorts by the first column in the select list, so list the primary key first
SELECT TOP (1) [Columns] FROM [TABLENAME] ORDER BY 1 DESC
-- or you can name the column explicitly if it is IDENTITY or A/A
SELECT TOP (1) [Columns] FROM [TABLENAME] ORDER BY [YOUR_COLUMN_WITHA/A ] DESC
I have a query that selects the last 5 ($new) items from my database.
SELECT OvenRunData.dataId AS id, OvenRunData.data AS data
FROM ovenRuns INNER JOIN OvenRunData ON OvenRuns.id = OvenRunData.ovenRunId
WHERE OvenRunData.ovenRunId = (SELECT MAX(id) FROM OvenRuns)
ORDER BY id DESC LIMIT '$new'
I want to execute this query every 5 seconds with an AJAX request so I can update my table.
I know this query selects the last 5 records, but I want to know whether it runs through all records and then selects the last 5, or whether it selects only the last 5 without checking all the data.
I'm really worried that I'll have lag.
You need two indexes to make it fast enough:
create index ix_OvenRuns_id on OvenRuns(id)
create index ix_OvenRunData_ovenRunId on OvenRunData(ovenRunId)
You can even include OvenRunData.dataId and OvenRunData.data in the second index, or create a clustered index instead. Either way, these indexes definitely avoid a full data scan.
That depends on the indexes.
In your case, you should have one on OvenRuns(id).
More here: http://use-the-index-luke.com/sql/partial-results/top-n-queries
The LIMIT is applied after the ORDER BY, and the ORDER BY is applied to the entire result-set. So the answer to your question is, yes it must go through all of the records in your result-set determined by your WHERE clause before applying the LIMIT.
Here is my SQL query:
select * from TABLE T where ROWNUM<=100
If I execute this and then re-execute it, I don't get the same result. Why?
Also, on a Sybase system, if I execute
set rowcount 100
select * from TABLE
I get the same result even on re-execution.
Can someone explain why, and suggest a possible solution for ROWNUM?
Thanks
If you don't use ORDER BY in your query you get the results in natural order.
Natural order is whatever is fastest for the database at the moment.
A possible solution is to ORDER BY your primary key, if it's an INT
SELECT TOP 100 START AT 1 * FROM TABLE
ORDER BY TABLE.ID;
If your primary key is not a sequentially incrementing integer and you don't have another column to order by (such as a timestamp), you may need to create an extra column SORT_ORDER INT and increment it automatically on insert, using either an autoincrement column or a sequence and an insert trigger, depending on the database.
Make sure to create an index on that column to speed up the query.
You need to specify an ORDER BY. Queries without an explicit ORDER BY clause make no guarantee about the order in which the rows are returned, and it is from this result set that you take the first 100 rows. As the order in which the rows are returned can be different every time, so can your first 100 rows.
You need to use ORDER BY first, followed by ROWNUM. You will get inconsistent results if you don't follow this order.
select * from
(
select * from TABLE T ORDER BY rowid
) where ROWNUM<=100