ROWNUM in SQL query results

I have 10K records in a table and I use ROWNUM to fetch the first 5000 records. Do frequently accessed records get a lower ROWNUM and show up in those first 5000 records, or is it based on the order in which rows were inserted into the table, i.e. records inserted first get a lower ROWNUM, and so on?
I'm looking at how the Oracle engine decides which ROWNUM to give a row.

ROWNUM is a pseudocolumn that the query engine assigns as the results of a query are returned.
It is not stored with any particular row in the database. Hence, how often a record is accessed has nothing to do with the number it gets, and neither does insertion order: without an ORDER BY, rows come back in whatever order the execution plan produces them.

From the documentation:
The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
This is dynamic in nature and allocated only at runtime. So, there is no relationship between ROWNUM and frequently accessed records.
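A small runnable sketch of that runtime numbering, using Python's built-in sqlite3 module. SQLite has no ROWNUM, so the hypothetical helper rownum_limit emulates it by numbering rows in the order the query returns them, which is what the documentation describes. It also shows why a top-N query has to sort before it numbers: filtering on the counter first and sorting afterwards only reorders whichever rows happened to come back first. The table t and its hits column are made up for illustration.

```python
import sqlite3

def rownum_limit(cursor, limit):
    """Hypothetical helper emulating Oracle's `WHERE ROWNUM <= limit`:
    number rows 1, 2, 3, ... in the order the query yields them and stop
    once the counter would exceed `limit`."""
    result = []
    for rownum, row in enumerate(cursor, start=1):
        if rownum > limit:
            break
        result.append((rownum, row))
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, hits INTEGER)")
conn.executemany("INSERT INTO t (id, hits) VALUES (?, ?)",
                 [(1, 5), (2, 90), (3, 12), (4, 70), (5, 33)])

# Like `SELECT ... FROM t WHERE ROWNUM <= 2 ORDER BY hits DESC` in Oracle:
# the counter is applied to rows in arrival order (not guaranteed without
# ORDER BY), and only those two rows are sorted afterwards.
unsorted_first = rownum_limit(conn.execute("SELECT id, hits FROM t"), 2)
numbered_then_sorted = sorted((row for _, row in unsorted_first),
                              key=lambda r: r[1], reverse=True)
print(numbered_then_sorted)

# Like `SELECT * FROM (SELECT ... FROM t ORDER BY hits DESC) WHERE ROWNUM <= 2`:
# sort first, then number, which gives a real top-N.
top_two = rownum_limit(conn.execute("SELECT id, hits FROM t ORDER BY hits DESC"), 2)
print(top_two)  # [(1, (2, 90)), (2, (4, 70))]
```

In Oracle the second pattern is the usual way to get "the first 5000 by some column": put the ORDER BY in an inline view and apply ROWNUM outside it.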

ROWNUM simply assigns a unique number to each row of the result, which can be used to identify the rows within that one result set; it is not a stable identifier across executions. It is a pseudocolumn provided by Oracle.
See the Oracle documentation for more details.

Related

Fetch rows in batches from a table/tables in guaranteed order in SQL Server 2016

I am creating a bulk load utility in Java that reads rows from tables in a source database and populates an empty destination database.
I started with the SELECT statement below. It returns batches, and the order is guaranteed by the ORDER BY clause. Everything works fine as long as no record with a past date (created_date) is inserted while the bulk load utility is running.
SELECT * FROM dbo.${batch.name}
ORDER BY created_date
OFFSET ${batch.offset} ROWS
FETCH NEXT ${batch.batchSize} ROWS ONLY;
But later I realized that some tables don't have a created_date column.
In SQL Server, order is not guaranteed without an explicit ORDER BY clause, so I cannot remove the ORDER BY; but since not every table has created_date, this query will fail for those tables.
Is there a generic SELECT query that returns rows in a defined order even though the tables don't share a common column for the ORDER BY clause? Or any query that returns the rows in insertion order?
Will the following query work? And what happens if more rows are inserted while the batch utility is running this query?
SELECT * FROM dbo.${batch.name}
ORDER BY (SELECT 1)
OFFSET ${batch.offset} ROWS
FETCH NEXT ${batch.batchSize} ROWS ONLY;
Thanks.
There is no such thing as insertion order. By definition a table is an unordered set; order only exists when you order your result set. So no, what you want is not possible. What is worse, you may think it is, because the rows will usually come back in the order of the clustered index... until suddenly they don't. ORDER BY (SELECT 1) only satisfies the requirement that OFFSET/FETCH have an ORDER BY; sorting by a constant imposes no order at all.
https://blogs.msdn.microsoft.com/conor_cunningham_msft/2008/08/27/no-seatbelt-expecting-order-without-order-by/
Because rows cannot be fetched in insertion order, I changed my query to take an ORDER BY column for each table that I want to bulk load. The downside is that those tables must not receive inserts while the bulk load is in progress, so I need to take a snapshot of the source database while the bulk load runs.
SELECT * FROM dbo.${batch.name}
ORDER BY ${batch.orderByColumn}
OFFSET ${batch.offset} ROWS
FETCH NEXT ${batch.batchSize} ROWS ONLY;
Thanks @Sean Lange and @Jason A. Long for your responses.
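A minimal sketch of the batching loop from the self-answer, using Python's sqlite3 module as a stand-in for SQL Server (SQLite writes the paging clause as LIMIT ... OFFSET rather than OFFSET ... FETCH NEXT). The helper fetch_batches and the items table are hypothetical. It assumes the per-table ORDER BY column is unique and the data does not change while paging; with ties or concurrent inserts, rows can be skipped or repeated between pages.

```python
import sqlite3

def fetch_batches(conn, table, order_col, batch_size):
    """Hypothetical helper: yield successive pages of `table` ordered by
    `order_col`, mirroring ORDER BY ... OFFSET ... FETCH NEXT ... ROWS ONLY.
    `table` and `order_col` are interpolated into the SQL, so they must come
    from a trusted list of names, never from user input."""
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY {order_col} LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += batch_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (item_id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 12)])

# 11 rows in pages of 5: every row appears exactly once, in item_id order.
batches = list(fetch_batches(conn, "items", "item_id", 5))
print([len(b) for b in batches])  # [5, 5, 1]
```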

How does order by row id work in Oracle?

I have a table called points. I executed the following query and expected a lexicographically sorted list of ROWIDs, but that did not happen. How does ORDER BY ROWID sort the rows?
select rowid from points order by rowid
I got rows like the following:
AAAE6MAAFAAABiSAAA
AAAE6MAAFAAABi+AAA
The 2nd row is lexicographically smaller than the first row. So what is the sorting criterion, if it is not lexicographical sorting?
What you see is only a representation used for display purposes.
The actual ROWID contains binary information about the data block, the row in the block, the file where the block is located and the internal object ID of the table (see the manual for details).
When you use order by rowid Oracle sorts the rows based on that (internal) information, not based on the "string representation".
If you change your query to:
select rowid,
dbms_rowid.rowid_relative_fno(rowid) as rel_fno,
dbms_rowid.rowid_row_number(rowid) as row_num,
dbms_rowid.rowid_block_number(rowid) as block_num,
dbms_rowid.rowid_object(rowid)
from points
order by rowid
you will most probably see the logic behind the ordering of the ROWIDs.
Note that the value of dbms_rowid.rowid_object() will always be the same (for a non-partitioned table). And if you only have two rows in your table, both will most probably also have the same value for rowid_block_number().
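To see why the display strings do not sort in ASCII order, here is a small Python sketch that decodes them. It assumes Oracle's documented extended ROWID layout OOOOOOFFFBBBBBBRRR (6 characters of data object number, 3 of relative file number, 6 of block number, 3 of row number), each field written in base 64 with the alphabet A-Z, a-z, 0-9, +, / where A is 0 and / is 63. In that alphabet '+' is 62 and 'S' is 18, so AAABi+ is a higher block number than AAABiS even though '+' comes before 'S' in ASCII. This only illustrates the encoding; it is not a statement of how Oracle compares ROWIDs internally.

```python
import string

# Base-64 alphabet assumed for the extended ROWID display format:
# A-Z = 0..25, a-z = 26..51, 0-9 = 52..61, + = 62, / = 63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DIGIT = {ch: value for value, ch in enumerate(ALPHABET)}

def decode64(chars):
    """Interpret a run of ROWID characters as a base-64 number."""
    value = 0
    for ch in chars:
        value = value * 64 + DIGIT[ch]
    return value

def decode_rowid(rowid):
    """Split an 18-character extended ROWID into
    (data object number, relative file number, block number, row number)."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character extended ROWID")
    return (decode64(rowid[0:6]), decode64(rowid[6:9]),
            decode64(rowid[9:15]), decode64(rowid[15:18]))

rowids = ["AAAE6MAAFAAABi+AAA", "AAAE6MAAFAAABiSAAA"]
print(sorted(rowids))                    # ASCII order: '+' sorts before 'S'
print(sorted(rowids, key=decode_rowid))  # order of the decoded numeric fields
for r in rowids:
    print(r, decode_rowid(r))
```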
The sequence of ROWIDs is not guaranteed. It depends on how you have set the NLS settings. Also, a ROWID represents the physical location of the row in the database. A ROWID is considered immutable (it does not change), but if you delete a row and insert it again, its ROWID changes.
If you delete a row, then Oracle may reassign its rowid to a new row inserted later.

Can SQL return different results for two runs of the same query using ORDER BY?

I have the following table:
CREATE TABLE dbo.TestSort
(
Id int NOT NULL IDENTITY (1, 1),
Value int NOT NULL
)
The Value column could (and is expected to) contain duplicates.
Let's also assume there are already 1000 rows in the table.
I am trying to prove a point about unstable sorting.
Given this query that returns a 'page' of 10 results from the first 1000 inserted results:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value
My intuition tells me that two runs of this query could return different rows if the Value column contains repeated values.
I'm basing this on the facts that:
the sort is not stable
if new rows are inserted in the table between the two runs of the query, it could possibly create a re-balancing of B-trees (the Value column may be indexed or not)
EDIT: For completeness: I assume rows never change once inserted, and are never deleted.
In contrast, a query with stable sort (ordering also by Id) should always return the same results, since IDs are unique:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value, Id
The question is: is my intuition correct? If yes, can you provide an actual example of operations that would produce different results (at least "on your machine")? You could modify the query, add indexes on the Value column, etc.
I don't care about the exact query, but about the principle.
I am using MS SQL Server (2014), but am equally satisfied with answers for any SQL database.
If not, then why?
Your intuition is correct. In SQL, the sort for order by is not stable. So, if you have ties, they can be returned in any order. And, the order can change from one run to another.
The documentation sort of explains this:
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on. Each query is independent and not related to each other in any way. This means that, unlike using a cursor in which the query is executed once and state is maintained on the server, the client application is responsible for tracking state. To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:
The underlying data that is used by the query must not change. That is, either the rows touched by the query are not updated or all requests for pages from the query are executed in a single transaction using either snapshot or serializable transaction isolation. For more information about these transaction isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
Although this specifically refers to offset/fetch, it clearly applies to running the query multiple times without those clauses.
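The principle can be sketched without a database. In the hypothetical Python snippet below, the two lists stand in for two different orders in which the engine might read the same rows (for example, a serial scan versus a parallel plan). Every Value is tied, so sorting by Value alone leaves the top 10 entirely dependent on read order; adding the unique Id as a tiebreaker makes the result the same either way. Python's sorted() happens to be stable, so here the difference comes purely from the input order, which is exactly the part a SQL engine does not promise to keep fixed.

```python
from operator import itemgetter

def top_n(rows, n, key):
    """Return the first n rows after sorting by `key` (sorted() is stable)."""
    return sorted(rows, key=key)[:n]

# Hypothetical data: (Id, Value) pairs in which every Value is tied.
rows = [(row_id, 2) for row_id in range(1, 1001)]

# Two different read orders of the same rows, standing in for two query plans.
read_order_a = rows
read_order_b = list(reversed(rows))

by_value = itemgetter(1)             # ORDER BY Value
by_value_then_id = itemgetter(1, 0)  # ORDER BY Value, Id

# Ties on Value: which ten rows come back depends on the read order.
print([r[0] for r in top_n(read_order_a, 10, by_value)])  # Ids 1 to 10
print([r[0] for r in top_n(read_order_b, 10, by_value)])  # Ids 1000 down to 991

# Unique tiebreaker: the same ten rows whatever the read order.
same = (top_n(read_order_a, 10, by_value_then_id)
        == top_n(read_order_b, 10, by_value_then_id))
print(same)  # True
```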
If there are ties in the ORDER BY columns, the ordering is not stable.
CREATE TABLE #TestSort
(
Id INT NOT NULL IDENTITY (1, 1) PRIMARY KEY,
Value INT NOT NULL
) ;
DECLARE @c INT = 0;
WHILE @c < 100000
BEGIN
INSERT INTO #TestSort(Value)
VALUES (2);
SET @c += 1;
END
Example:
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
DBCC DROPCLEANBUFFERS; -- run to clear cache
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
The point is that I let the query optimizer use a parallel plan (MAXDOP 4 allows up to four threads), so there is no guarantee that it will read the data sequentially, as a clustered index scan probably will when no parallelism is involved.
You cannot be sure how the query optimizer will read the data unless you explicitly sort the result on a unique combination of columns, e.g. ORDER BY Value, Id.
For more info read No Seatbelt - Expecting Order without ORDER BY.
I think this post will answer your question:
Is SQL order by clause guaranteed to be stable ( by Standards)
The result is the same every time when you are in a single-threaded environment. Once multi-threading is used, you can't guarantee it.

The Subquery which returns multiple rows in Oracle SQL

I have a complex SQL query with multiple subqueries. The query returns a very large amount of data. The tables are dynamic and get updated every day. Yesterday, the query didn't execute because one of the subqueries returned multiple rows.
The subquery would be something like this.
Select Value1 from Table1 where Table1.ColumnName = 123456
The value compared with Table1.ColumnName is fetched dynamically; nothing is hardcoded. It comes from another subquery, which runs perfectly.
My Question would be,
How to find which value in the particular subquery returned two rows.
You need to check whether each subquery returns a single row or multiple rows for a value. You can use COUNT to verify:
select column_name, count(*) from table_name
group by column_name
having count(*) > 1
The query above counts the rows for each value of the column; if any value has more than one row, that value is the culprit.
Once you know which subquery and which column are the culprit, you could then use ROWNUM or analytic functions to limit the number of rows.
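To try the duplicate check without an Oracle instance, here is a runnable sketch using Python's sqlite3 module. The Table1 contents are made up so that one value (222) appears twice, which is the situation that makes a single-row subquery fail. The GROUP BY ... HAVING COUNT(*) > 1 query is the same shape as the one above and reports that value with its count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Table1: ColumnName should identify one row,
# but 222 appears twice.
conn.execute("CREATE TABLE Table1 (ColumnName INTEGER, Value1 TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(123456, "a"), (222, "b"), (222, "c"), (333, "d")])

# Values of ColumnName that map to more than one row.
duplicates = conn.execute(
    """SELECT ColumnName, COUNT(*)
       FROM Table1
       GROUP BY ColumnName
       HAVING COUNT(*) > 1"""
).fetchall()
print(duplicates)  # [(222, 2)]
```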

How do I limit the rowcount in an SSIS data flow task?

I have an Oracle source, and I'm getting the entire table, and it is being copied to a SQL Server 2008 table that looks the same. Just for testing, I would like to only get a subset of the table.
In the old DTS packages, under Options on the data transform, I could set a first and last record number, and it would only get that many records.
If I were doing a query, I could change it to a select top 5000 or set rowcount 5000 at the top (maybe? This is an Oracle source). But I'm grabbing the entire table.
How do I limit the rowcount when selecting an Oracle table?
We can use the Row Count component in the data flow and, after the component, set User::rowCount <= 500 as the precedence constraint condition while loading into the target. Whenever the count exceeds 500, the process stops inserting data into the target table.
thanks
prav
It's been a while since I've touched PL/SQL, but I would think that you could simply add a WHERE condition of "rownum <= n", where n is the number of rows that you want for your sample. ROWNUM is a pseudocolumn available in every Oracle query... it's a handy feature for problems like this (it's roughly equivalent to T-SQL's ROW_NUMBER() function without the ability to partition and sort, I think). This would keep you from having to bring the whole table into memory:
select col1, col2
from tableA
where rownum <= 10;
For future reference (and only because I've been working with it lately), DB2's equivalent for this is the clause "fetch first n rows only" at the end of the statement:
select col1, col2
from tableA
fetch first 10 rows only;
Hope I've not been too off base.
The Row Sampling component in the data flow restricts the number of rows. Just insert it between your source and destination and set the number of rows. It is very useful for large amounts of data and when you cannot modify the query. In this example, I execute an SP in the source.