Are the results deterministic, if I partition SQL SELECT query without ORDER BY? - sql

I have a SQL SELECT query which returns a lot of rows, and I have to split it into several partitions. I.e., set max results to 10000 and iterate over the rows by calling the query several times with an increasing first result (0, 10000, 20000). All the queries are done in the same transaction, and the data that my queries are fetching is not changing during the process (other data in those tables can change, though).
Is it ok to use just plain select:
select a from b where...
Or do I have to use order by with the select:
select a from b where ... order by c
In order to be sure that I will get all the rows? In other words, is it guaranteed that a query without order by will always return the rows in the same order?
Adding order by to the query drops performance of the query dramatically.
I'm using Oracle, if that matters.
EDIT: Unfortunately I cannot take advantage of scrollable cursor.

Order is definitely not guaranteed without an order by clause, but whether or not your results will be deterministic (aside from the order) would depend on the where clause. For example, if you have a unique ID column and your where clause included a different filter range each time you access it, then you would have non-ordered deterministic results, i.e.:
select a from b where ID between 1 and 100
select a from b where ID between 101 and 200
select a from b where ID between 201 and 300
would all return distinct result sets, but the order within each would not be guaranteed in any way.
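A minimal sketch of that key-range approach, written in Python with the standard-library sqlite3 module so it can be run as-is. SQLite stands in for Oracle here, and the table contents, the helper names `fetch_partition` and `fetch_all_in_partitions`, and the partition size are all assumptions made for illustration:

```python
import sqlite3

# Hypothetical table "b" with a unique "id" column (SQLite used as a stand-in for Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a TEXT)")
conn.executemany(
    "INSERT INTO b (id, a) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 251)],
)


def fetch_partition(conn, low, high):
    """Return rows whose id is in [low, high]; order inside the partition is unspecified."""
    return conn.execute(
        "SELECT id, a FROM b WHERE id BETWEEN ? AND ?", (low, high)
    ).fetchall()


def fetch_all_in_partitions(conn, max_id, size):
    """Walk the id space in fixed-size, non-overlapping ranges."""
    rows = []
    for low in range(1, max_id + 1, size):
        rows.extend(fetch_partition(conn, low, low + size - 1))
    return rows


partitioned = fetch_all_in_partitions(conn, max_id=250, size=100)
```

Because the ranges do not overlap and together cover every id, each row is returned exactly once even though no partition is sorted.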

No, without order by it is not guaranteed that query will ALWAYS return the rows in the same order.

No guarantees unless you have an order by on the outermost query.
Bad SQL Server example, but the same rules apply: order is not guaranteed even when the inner query has an ORDER BY.
SELECT
    *
FROM
    (
    SELECT
        *
    FROM
        Mytable
    ORDER BY SomeCol
    ) foo

Use LIMIT (MySQL syntax)
So you would do:
SELECT * FROM table ORDER BY id LIMIT 0,100
SELECT * FROM table ORDER BY id LIMIT 100,100
SELECT * FROM table ORDER BY id LIMIT 200,100
The first number after LIMIT is the zero-based position you want to start from, and the second number is how many results you want to see.
It's a good pagination trick.
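A small runnable sketch of offset paging over an ordered query, using Python's sqlite3 module. The `items` table and the helpers `fetch_page` and `iter_pages` are made up for this example, and SQLite's `LIMIT ... OFFSET ...` form is used rather than the MySQL comma form. The offset advances by exactly one page size (0, 100, 200) so no row is skipped:

```python
import sqlite3

# Hypothetical table with a unique id to order by.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a TEXT)")
conn.executemany(
    "INSERT INTO items (id, a) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 251)],
)


def fetch_page(conn, offset, page_size):
    """Fetch one page; ORDER BY on a unique column keeps pages consistent."""
    return conn.execute(
        "SELECT id, a FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def iter_pages(conn, page_size):
    """Yield pages until an empty one comes back."""
    offset = 0
    while True:
        page = fetch_page(conn, offset, page_size)
        if not page:
            break
        yield page
        offset += page_size  # the next page starts right after the last row read


pages = list(iter_pages(conn, page_size=100))
```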

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Furthermore, if you use an order by clause on a column whose values appear multiple times in your result set, the order of the result set is still not guaranteed inside groups of equal values in said column. Quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both of the following result sets are valid, and you can't know in advance which one you will get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
"The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause."
Your inner query (the unpredictable SQL statement) should not contain an order by; even if it does, the order of the outer result is not guaranteed.
To get a guaranteed order, you have to order by some column in the outermost query. For the results to be deterministic, the column or columns you order by should be unique.
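To illustrate the uniqueness point, here is a short sketch in Python with sqlite3 (SQLite rather than SQL Server, which is an assumption of this example). It reuses the four-row `cte` from the earlier answer and adds `b` as a tie-breaker, so the sort key `(a, b)` is unique and only one ordering is valid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same four rows as the cte example above, in SQLite syntax.
ROWS_CTE = """
WITH cte (a, b) AS (
    SELECT 1, 'a' UNION ALL
    SELECT 1, 'b' UNION ALL
    SELECT 2, 'a' UNION ALL
    SELECT 2, 'b'
)
"""

# ORDER BY a alone leaves the order inside each group of equal "a" values unspecified.
# ORDER BY a, b sorts on a unique key, so the result is fully determined.
deterministic = conn.execute(
    ROWS_CTE + "SELECT a, b FROM cte ORDER BY a, b"
).fetchall()
```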
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding a row-counter field to be used for ordering in the subsequent pagination.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
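A rough sketch of the temp-table idea, in Python with sqlite3 instead of SQL Server. Everything named here (`mytable`, `staged`, `given_sql`, `stage_results`, `fetch_page`) is hypothetical, the staged table is simplified to a single column, and `fetchall()` is used for brevity where a real implementation would probably stream the rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [(n,) for n in ["delta", "alpha", "charlie", "bravo", "echo"]],
)

# Stands in for the unpredictable statement handed to the program.
given_sql = "SELECT name FROM mytable ORDER BY name DESC"


def stage_results(conn, sql):
    """Run the statement once and store its rows with an explicit row counter."""
    rows = conn.execute(sql).fetchall()
    conn.execute("CREATE TEMP TABLE staged (rn INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO staged (rn, name) VALUES (?, ?)",
        [(i, row[0]) for i, row in enumerate(rows, start=1)],
    )
    return len(rows)


def fetch_page(conn, page_number, page_size):
    """Page over the staged copy, ordered by the stored counter."""
    first = (page_number - 1) * page_size + 1
    last = page_number * page_size
    cursor = conn.execute(
        "SELECT name FROM staged WHERE rn BETWEEN ? AND ? ORDER BY rn",
        (first, last),
    )
    return [r[0] for r in cursor]


total = stage_results(conn, given_sql)
page1 = fetch_page(conn, 1, 2)
page2 = fetch_page(conn, 2, 2)
page3 = fetch_page(conn, 3, 2)
```

The statement is executed only once, so whatever order it produced on that run is frozen in `rn`, and every page is then read with a real ORDER BY.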
As others have noted, paginating without a sorted query is not safe. But since you already know that and have searched for alternatives, I can suggest a query like this (though I do not recommend it as a good approach):
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (@pageNumber-1)*@pageSize+1 and @pageNumber*@pageSize
I finally found a simple way to do it without any order by on a specific column:
declare @start AS INTEGER = 1, @count AS INTEGER = 5;
select * from (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET @start ROWS
FETCH NEXT @count ROWS ONLY
where select * from mytable can be any query

Reverse initial order of SELECT statement

I want to run a SQL query in Postgres that is exactly the reverse of the one that you'd get by just running the initial query without an order by clause.
So if your query was:
SELECT * FROM users
Then
SELECT * FROM users ORDER BY <something here to make it exactly the reverse of before>
Would it just be this?
ORDER BY Desc
You are building on the incorrect assumption that you would get rows in a deterministic order with:
SELECT * FROM users;
What you get is really arbitrary. Postgres returns rows in any way it sees fit. For simple queries typically in order of their physical storage, which typically is the order in which rows were entered. But there are no guarantees, and the order may change any time between two calls. For instance after any UPDATE (writing a new physical row version), or when any background process reorders rows - like VACUUM. Or a more complex query might return rows according to an index or a join. Long story short: there is no reliable order for table rows in a relational database unless you specify it with ORDER BY.
That said, assuming you get rows from the above simple query in the order of physical storage, this would get you the reverse order:
SELECT * FROM users
ORDER BY ctid DESC;
ctid is the internal tuple ID signifying physical order. Related:
In-order sequence generation
How list all tables with data changes in the last 24 hours?
Here is a T-SQL solution; this might give you an idea of how to do it in Postgres:
select * from (
SELECT *, row_number() over( order by (select 1)) rowid
FROM users
) x
order by rowid desc

Select rows randomly without changing the order in sql query

I searched everywhere to find an SQL query to select rows randomly without changing the order. Almost everyone uses something like this:
SELECT * FROM table WHERE type = 1 ORDER BY RAND() LIMIT 25
But the above query changes the order. I need a query which selects randomly among the rows but doesn't change the order, because every record also has a date.
Select the random rows and then re-order them:
select t.*
from (select *
from table t
where type = 1
order by rand()
limit 25
) t
order by datecol;
In SQL, if you want rows in a particular order, you need to use an explicit order by clause. You should never depend on the ordering of results with no order by. SQL does not guarantee the ordering. MySQL does not guarantee the ordering, unless the query has an order by.
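The sample-then-reorder pattern can be tried out with a sketch like the following, in Python with sqlite3. It uses SQLite's `random()` where MySQL has `RAND()`, and the `events` table and its columns are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, type INTEGER, datecol TEXT)"
)
conn.executemany(
    "INSERT INTO events (id, type, datecol) VALUES (?, ?, ?)",
    [(i, 1, f"2024-01-{i:02d}") for i in range(1, 32)],
)

# Inner query: pick 25 rows at random. Outer query: put them back in date order.
sample = conn.execute(
    """
    SELECT id, datecol
    FROM (
        SELECT id, datecol
        FROM events
        WHERE type = 1
        ORDER BY random()
        LIMIT 25
    ) AS picked
    ORDER BY datecol
    """
).fetchall()
```

Which 25 rows come back differs from run to run, but they always arrive sorted by `datecol` because the outermost query has its own ORDER BY.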

Why does DISTINCT not give results in order?

I ordered my results by their id's by:
CREATE TABLE my_table2 AS SELECT * FROM my_table ORDER BY record_group_id;
now when i execute:
SELECT DISTINCT record_group_id FROM my_table2 where rownum <=1000000;
I get group IDs in random order, though my order by went fine.
Here are a few of the records in the result set:
1599890050
1647717203
1647717120
1647717172
1647716972
1647717196
1647717197
1647717205
1599889999
1599889986
What could be the possible reason?
Shouldn't DISTINCT statement return records in same order as they are in table?
Neither SELECT nor DISTINCT defines the order of data.
If you want ordered data, explicitly define the order you need.
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <=1000000
ORDER BY record_group_id;
The ordering only determines the order of the source data that is inserted in the table. If there is no clustered index in the table, that means that the records will be stored in that order physically.
However, how the records are stored doesn't guarantee that they will be selected in that order. The execution planner determines the most efficient way to run the query, which means that the data might not be fetched the way that you think it is, and it can differ from time to time as the data changes, or just the statistics about the data.
For a simple query like in the example, you usually get a predictable result, but there is no guarantee, so you always need to sort the query where you fetch the data to be sure to get a predictable result.
One reason that you don't get the data in the order that they are stored in the table in this case, may be that an index is used for filtering the result, and the records are returned in the order of the index rather than the order of the table.
Use ORDER BY on your SELECT statement:
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <=1000000
ORDER BY record_group_id;
Using DISTINCT has no effect on order, only on uniqueness of values.
If you want to control order too:
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <= 1000000
ORDER BY record_group_id -- Added this line
Your assumption that data in the table is ordered is wrong.
There is no implicit ordering in a database table - it's just a bag of unsorted data.
If you need ordered data, you'll have to use ORDER BY - there's no way around it (neither DISTINCT nor GROUP BY nor ...), see TomKyte Blog on Order By

Teradata - limiting the results using TOP

I am trying to fetch a huge set of records from Teradata using JDBC. And I need to break this set into parts for which I'm using "Top N" clause in select.
But I don't know how to set the "Offset" like how we do in MySQL -
SELECT * FROM tbl LIMIT 5,10
so that next select statement would fetch me the records from (N+1)th position.
RANK and QUALIFY, I believe, are your friends here
for example
SEL RANK(custID), custID
FROM mydatabase.tblcustomer
QUALIFY RANK(custID) < 1000 AND RANK(custID) > 900
ORDER BY custID;
RANK(field) will (conceptually) retrieve all the rows of the resultset,
order them by the ORDER BY field and assign an incrementing rank ID to them.
QUALIFY allows you to slice that by limiting the rows returned to the qualification expression, which now can legally view the RANKs.
To be clear, I am returning the 900th to 1000th rows of the query that selects all customers,
NOT returning customers with IDs between 900 and 1000.
You can also use the ROW_NUMBER window aggregate on Teradata.
SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_
, custID
FROM myDatabase.myCustomers
QUALIFY RowNum_ BETWEEN 900 and 1000;
Unlike the RANK window aggregate, ROW_NUMBER will give you a gap-free sequence regardless of whether the column you are ordering by (within the optional partition set) is unique or not.
Just another option to consider.
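The difference between the two functions on a non-unique column can be seen in a sketch like this, in Python with sqlite3 rather than Teradata. It assumes a SQLite build with window-function support (3.25 or later), and the `customers` rows are invented so that `custID` 10 appears twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (custID INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers (custID, name) VALUES (?, ?)",
    [(10, "x"), (10, "y"), (20, "z"), (30, "w")],
)

# RANK repeats a value for ties and then skips ahead;
# ROW_NUMBER always produces 1, 2, 3, ... with no repeats.
numbered = conn.execute(
    """
    SELECT custID,
           RANK()       OVER (ORDER BY custID) AS rnk,
           ROW_NUMBER() OVER (ORDER BY custID) AS row_num
    FROM customers
    ORDER BY row_num
    """
).fetchall()

ranks = [r[1] for r in numbered]
row_numbers = [r[2] for r in numbered]
```

Note that which of the two tied rows receives row number 1 is still arbitrary; for stable paging the window ORDER BY needs a unique key.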