I searched everywhere to find an SQL query to select rows randomly without changing the order. Almost everyone uses something like this:
SELECT * FROM table WHERE type = 1 ORDER BY RAND() LIMIT 25
But above query changes the order. I need a query which selects randomly among the rows but doesn't changes the order, cause every record has a date also.
Select the random rows and then re-order them:
select t.*
from (select *
from table t
where type = 1
order by rand()
limit 25
) t
order by datecol;
In SQL, if you want rows in a particular order, you need to use an explicit order by clause. You should never depend on the ordering of results with no order by. SQL does not guarantee the ordering. MySQL does not guarantee the ordering, unless the query has an order by.
Related
I want to run a SQL query in Postgres that is exactly the reverse of the one that you'd get by just running the initial query without an order by clause.
So if your query was:
SELECT * FROM users
Then
SELECT * FROM users ORDER BY <something here to make it exactly the reverse of before>
Would it just be this?
ORDER BY Desc
You are building on the incorrect assumption that you would get rows in a deterministic order with:
SELECT * FROM users;
What you get is really arbitrary. Postgres returns rows in any way it sees fit. For simple queries typically in order of their physical storage, which typically is the order in which rows were entered. But there are no guarantees, and the order may change any time between two calls. For instance after any UPDATE (writing a new physical row version), or when any background process reorders rows - like VACUUM. Or a more complex query might return rows according to an index or a join. Long story short: there is no reliable order for table rows in a relational database unless you specify it with ORDER BY.
That said, assuming you get rows from the above simple query in the order of physical storage, this would get you the reverse order:
SELECT * FROM users
ORDER BY ctid DESC;
ctid is the internal tuple ID signifying physical order. Related:
In-order sequence generation
How list all tables with data changes in the last 24 hours?
here is a tsql solution, thid might give you an idea how to do it in postgres
select * from (
SELECT *, row_number() over( order by (select 1)) rowid
FROM users
) x
order by rowid desc
I ordered my results by their id's by:
CREATE TABLE my_table2 AS SELECT * FROM my_table ORDER BY record_group_id;
now when i execute:
SELECT DISTINCT record_group_id FROM my_table2 where rownum <=1000000;
I get gorup id's in random order, though my order by went fine:
Here is few of the records in result set
1599890050
1647717203
1647717120
1647717172
1647716972
1647717196
1647717197
1647717205
1599889999
1599889986
What could be the possible reason?
Shouldn't DISTINCT statement return records in same order as they are in table?
Neither SELECT or DISTINCT defines the order of data.
If you want ordered data explicitly define the Order you need.
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <=1000000
ORDER BY record_group_id;
The ordering only determines the order of the source data that is inserted in the table. If there is no clustered index in the table, that means that the records will be stored in that order physically.
However, how the records are stored doesn't guarantee that they will be selected in that order. The execution planner determines the most efficient way to run the query, which means that the data might not be fetched the way that you think it is, and it can differ from time to time as the data changes, or just the statistics about the data.
For a simple query like in the example, you usually get a predictable result, but there is no guarantee, so you always need to sort the query where you fetch the data to be sure to get a predictable result.
One reason that you don't get the data in the order that they are stored in the table in this case, may be that an index is used for filtering the result, and the records are returned in the order of the index rather than the order of the table.
Use ORDER BY on your SELECT statement:
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <=1000000
ORDER BY record_group_id;
Using DISTINCT has no effect on order, only on uniqueness of values.
If you want to control order too:
SELECT DISTINCT record_group_id
FROM my_table2
WHERE rownum <= 1000000
ORDER BY record_group_id -- Added this line
Your assumption that data in the table is ordered is wrong.
There is no implicit ordering in a database table - it's just a bag of unsorted data.
If you need ordered data, you'll have to use ORDER BY - there's no way around it (neither DISTINCT nor GROUP BY nor ...), see TomKyte Blog on Order By
I am trying to fetch a huge set of records from Teradata using JDBC. And I need to break this set into parts for which I'm using "Top N" clause in select.
But I dont know how to set the "Offset" like how we do in MySQL -
SELECT * FROM tbl LIMIT 5,10
so that next select statement would fetch me the records from (N+1)th position.
RANK and QUALIFY I beleive are your friends here
for example
SEL RANK(custID), custID
FROM mydatabase.tblcustomer
QUALIFY RANK(custID) < 1000 AND RANK(custID) > 900
ORDER BY custID;
RANK(field) will (conceptually) retrieve all the rows of the resultset,
order them by the ORDER BY field and assign an incrementing rank ID to them.
QUALIFY allows you to slice that by limiting the rows returned to the qualification expression, which now can legally view the RANKs.
To be clear, I am returning the 900-1000th rows in the query select all from cusotmers,
NOT returning customers with IDs between 900 and 1000.
You can also use the ROW_NUMBER window aggregate on Teradata.
SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_
, custID
FROM myDatabase.myCustomers
QUALIFY RowNum_ BETWEEN 900 and 1000;
Unlike the RANK windows aggregate, ROW_NUMBER will provide you a sequence regardless of whether the column you are ordering over the optional partition set is unique or not.
Just another option to consider.
I have two sql queries
select * from table1
ORDER BY column1
Select top 10 * from table1
ORDER by column1
Column1 is a non unique column and table1 does not have a primary key.
When I run both queries, I'm getting the rows returning in different orders. I attribute this to the fact that the criterion for ranking (the Order By) is non unique.
I'm wondering what method does a normal SELECT statement use to determine the output order of the rows in this case and what a select top statement would use. Is it just random?
In the non-unique case, the output order of the rows should not be relied upon.
Instead, impose the ordering you want (by including other columns in the ORDER BY)
Consider it random - even if it isn't in some circumstances, no DBMS should be making promises on the ordering of results in the case of ties short of the ordering requested by ORDER BY.
I have SQL SELECT query which returns a lot of rows, and I have to split it into several partitions. Ie, set max results to 10000 and iterate the rows calling the query select time with increasing first result (0, 10000, 20000). All the queries are done in same transaction, and data that my queries are fetching is not changing during the process (other data in those tables can change, though).
Is it ok to use just plain select:
select a from b where...
Or do I have to use order by with the select:
select a from b where ... order by c
In order to be sure that I will get all the rows? In other word, is it guaranteed that query without order by will always return the rows in the same order?
Adding order by to the query drops performance of the query dramatically.
I'm using Oracle, if that matters.
EDIT: Unfortunately I cannot take advantage of scrollable cursor.
Order is definitely not guaranteed without an order by clause, but whether or not your results will be deterministic (aside from the order) would depend on the where clause. For example, if you have a unique ID column and your where clause included a different filter range each time you access it, then you would have non-ordered deterministic results, i.e.:
select a from b where ID between 1 and 100
select a from b where ID between 101 and 200
select a from b where ID between 201 and 300
would all return distinct result sets, but order would not be any way guaranteed.
No, without order by it is not guaranteed that query will ALWAYS return the rows in the same order.
No guarantees unless you have an order by on the outermost query.
Bad SQL Server example, but same rules apply. Not guaranteed order even with inner query
SELECT
*
FROM
(
SELECT
*
FROM
Mytable
ORDER BY SomeCol
) foo
Use Limit
So you would do:
SELECT * FROM table ORDER BY id LIMIT 0,100
SELECT * FROM table ORDER BY id LIMIT 101,100
SELECT * FROM table ORDER BY id LIMIT 201,100
The LIMIT would be from which position you want to start and the second variable would be how many results you want to see.
Its a good pagnation trick.