Oracle - select random rows with potential duplicates


I am using Oracle 11gR2. Given a table, I would like to return a certain number of rows in random order, with potential duplicates.
All the posts I have seen on this topic are about selecting a number of unique rows in random order.
For example, given this table and asking for 2 random rows:
Table
-----------------
ID  LABEL
1   Row 1
2   Row 2
3   Row 3
I would like the query to return
1 Row 1
2 Row 2
but also possibly
1 Row 1
1 Row 1
How could this be done using only pure SQL (no PL/SQL or stored procedure)? The source table does not have duplicate rows; by duplicate, I mean two rows having the same ID.

Maybe something like this (where p_num is a parameter):
with sample_data as (select 1 id, 'row 1' label from dual union all
                     select 2 id, 'row 2' label from dual union all
                     select 3 id, 'row 3' label from dual),
     -- one generated row per requested result row
     dummy as (select level lvl
               from dual
               connect by level <= p_num)
select *
from (select sd.*
      from sample_data sd,
           dummy d
      order by dbms_random.value)
where rownum <= p_num;
I really wouldn't like to use this in production code, though, as I don't think it will scale at all well.
What's the reasoning behind your requirement? It doesn't sound like particularly good design to me.
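As a rough illustration of the cross-join idea above, here is a minimal sketch in Python against an in-memory SQLite database rather than Oracle. The function name random_rows_with_duplicates is hypothetical, and SQLite's random() and a recursive CTE stand in for dbms_random.value and CONNECT BY LEVEL; treat it as a demonstration of the shape of the query, not as tuned production code.

```python
import sqlite3

def random_rows_with_duplicates(conn, p_num):
    """Return p_num rows from sample_data in random order; IDs may repeat."""
    sql = """
        WITH RECURSIVE dummy(lvl) AS (   -- p_num generated rows
            SELECT 1
            UNION ALL
            SELECT lvl + 1 FROM dummy WHERE lvl < :p_num
        )
        SELECT sd.id, sd.label
        FROM sample_data sd CROSS JOIN dummy d  -- each source row appears p_num times
        ORDER BY random()
        LIMIT :p_num
    """
    return conn.execute(sql, {"p_num": p_num}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO sample_data VALUES (?, ?)",
                 [(1, "row 1"), (2, "row 2"), (3, "row 3")])

rows = random_rows_with_duplicates(conn, 2)
print(rows)  # two rows; the same id may appear twice
```

Because every source row is repeated p_num times before the random sort, asking for more rows than the table holds (for example 5 rows from this 3-row table) necessarily returns repeated IDs.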

select a random row union all select another random row
That gives you two independently randomized rows, which can be the same row if both random picks land on it, or two different rows. The key is to do two random selects rather than one select that returns two rows. Use UNION ALL rather than UNION, because UNION removes duplicates and would collapse two identical picks into a single row.
If you want more than two rows, I think the best solution would be to have a random-number table, join every source row to every row of that table (in effect a cross join), order by random, and select the top n rows of that join. Because of the join, each row of your source table appears many times in the result set before the top n are selected.

I can't think of any way to do it without a stored procedure.
You might be able to make use of DBMS_RANDOM
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_random.htm#i998925
http://www.databasejournal.com/features/oracle/article.php/3341051/Generating-random-numbers-and-strings-in-Oracle.htm
You could generate a random primary key and return that?

You can use DBMS_RANDOM in a SQL Query.
SELECT ID FROM
(
    SELECT ID FROM mytable
    ORDER BY dbms_random.value)
WHERE ROWNUM <= 2
http://www.sqlfiddle.com/#!4/c6487/13/0

Related

How to get results of a joined query (2 tables) sorted in the order shown in the main table? [duplicate]

I have a table MY_TABLE with a primary key MY_PK. Then, I have a list of ordered primary keys, for example (17,13,35,2,9).
Now I want to retrieve all rows with these primary keys, and keep the order of the rows in the same way as the given list of keys.
What I was initially doing was:
SELECT * FROM MY_TABLE WHERE MY_PK IN (:my_list)
But then the order of the returned rows is random and does not correspond to the order of the given keys anymore. Is there a way to achieve that?
The only thing I thought of is making many SELECT statements and concatenate them with UNION, but my list of primary keys can be very long and contain hundreds or even thousands of keys. The alternative I thought of was to reorder the rows afterwards in the application, but I would prefer a solution where this is not necessary.

First, doing this with a union would not necessarily help. The ordering of rows in the result set is not guaranteed unless you have an order by clause.
Here is one solution, although it is inelegant:
with keys as (
    select 1 as ordering, 17 as pk from dual union all
    select 2 as ordering, 13 as pk from dual union all
    select 3 as ordering, 35 as pk from dual union all
    select 4 as ordering, 2 as pk from dual union all
    select 5 as ordering, 9 as pk from dual
)
select mt.*
from My_Table mt join
     keys
     on mt.my_pk = keys.pk
order by keys.ordering

Oracle only guarantees the order of a result set if it is sorted with an explicit ORDER BY statement. So your actual question is, "how can I guarantee to sort my results into an arbitrary order?"
Well the simple answer is you can't. The more complicated answer is that you need to associate your arbitrary order with an index which can be sorted.
I will presume you're getting your list of IDs as a string. (If you get them as an array or something similarly table-like life is easier.) So first of all you need to tokenize your string. In my example I use the splitter function from this other SO thread. I'm going to use that in a common table expression to get some rows, and use the rownum pseudo-column to synthesize an index. Then we join the CTE output to your table.
with cte as
  ( select s.column_value as id
         , rownum as sort_order
    from table(splitter('17,13,35,2,9')) s
  )
select yt.*
from your_table yt
     join cte on yt.id = cte.id
order by cte.sort_order
Caveat: this is untested code but the principle is sound. If you do get compilation or syntax errors which you cannot resolve please include sufficient detail in the comments.

The way to guarantee an order on your result set is to use ORDER BY, so what I would do is insert your primary keys into a temporary table with two columns: the primary key and a sequential ID that you later use in the ORDER BY. Your temporary table would be:
PrimaryKey  ID
-------------------
17          1
13          2
35          3
2           4
9           5
After that, just join your table to the temporary table on the PrimaryKey column and order by the ID column of the temporary table.
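The temporary-table approach can be sketched as follows, using Python with an in-memory SQLite database purely so the example is runnable. The table name key_order, its column names and the payload column are made up for the demo; the join and ORDER BY carry over to Oracle unchanged in spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_pk INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(pk, f"row {pk}") for pk in (2, 9, 13, 17, 35, 40)])

wanted = [17, 13, 35, 2, 9]  # keys in the order the caller wants them back

# Temporary table holding each key together with its position in the list.
conn.execute("CREATE TEMP TABLE key_order (pk INTEGER, sort_id INTEGER)")
conn.executemany("INSERT INTO key_order VALUES (?, ?)",
                 [(pk, pos) for pos, pk in enumerate(wanted, start=1)])

rows = conn.execute("""
    SELECT mt.my_pk, mt.payload
    FROM my_table mt
    JOIN key_order ko ON mt.my_pk = ko.pk
    ORDER BY ko.sort_id
""").fetchall()

print([pk for pk, _ in rows])  # [17, 13, 35, 2, 9]
```

Loading the keys with a bulk insert keeps the statement size constant even when the list contains thousands of keys.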

Find out if query exceeds arbitrary limit using ROWNUM?

I have a stored proc in Oracle, and we're limiting the number of records with ROWNUM based on a parameter. However, we also have a requirement to know whether the search result count exceeded the arbitrary limit (even though we're only passing data up to the limit; searches can return a lot of data, and if the limit is exceeded a user may want to refine their query.)
The limit's working well, but I'm attempting to pass an OUT value as a flag to signal when the maximum results were exceeded. My idea for this was to get the count of the inner table and compare it to the count of the outer select query (with ROWNUM) but I'm not sure how I can get that into a variable. Has anyone done this before? Is there any way that I can do this without selecting everything twice?
Thank you.
EDIT: For the moment, I am actually doing two identical selects - one for the count only, selected into my variable, and one for the actual records. I then pass back the comparison of the base result count to my max limit parameter. This means two selects, which isn't ideal. Still looking for an answer here.

You can add a column to the query:
select * from (
    select . . . , count(*) over () as numrows
    from . . .
    where . . .
) where rownum <= 1000;
Here numrows is the number of rows matched before the ROWNUM limit is applied, so comparing it with the limit tells you whether the limit was exceeded.

You could use a nested subquery:
select id, case when max_count > 3 then 'Exceeded' else 'OK' end as flag
from (
    select id, rn, max(rn) over () as max_count
    from (
        select id, rownum as rn
        from t
    )
    where rownum <= 4
)
where rownum <= 3;
The inner level is your actual query (which in reality probably has filters and an order-by clause). The middle layer restricts to your actual limit + 1, which still allows Oracle to optimise using a stop key, and uses an analytic max over the row numbers of that inner result set to see whether you got a fourth record (without requiring a scan of all matching records). The outer layer restricts to your original limit.
With a sample table with 10 rows, this gets:
ID FLAG
---------- --------
1 Exceeded
2 Exceeded
3 Exceeded
If the inner query had a filter that returned fewer rows, say:
select id, rownum as rn
from t
where id < 4
it would get:
ID FLAG
---------- --------
1 OK
2 OK
3 OK
Of course for this demo I haven't done any ordering so you would get indeterminate results. And from your description you would use your variable instead of 3, and (your variable + 1) instead of 4.
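The same "fetch limit + 1" idea can also be applied on the client side. The following is a minimal sketch in Python with an in-memory SQLite table named t, as in the demo above; the helper name fetch_with_overflow_flag is hypothetical and SQLite's LIMIT stands in for the ROWNUM filter.

```python
import sqlite3

def fetch_with_overflow_flag(conn, limit):
    """Fetch at most `limit` ids and report whether more rows existed."""
    # Ask for one row more than the limit; the extra row is only a signal.
    rows = conn.execute(
        "SELECT id FROM t ORDER BY id LIMIT ?", (limit + 1,)
    ).fetchall()
    exceeded = len(rows) > limit
    return [r[0] for r in rows[:limit]], exceeded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])

print(fetch_with_overflow_flag(conn, 3))   # ([1, 2, 3], True)
print(fetch_with_overflow_flag(conn, 10))  # all ten ids, False
```

Only one extra row is ever read, so the cost stays close to that of the limited query itself.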

In my application I take a very simple approach. I do the normal SELECT, and when the number of returned rows equals the limit, the client application shows a "LIMIT reached" message, because it is very likely that the query would return more rows if the result were not limited.
Of course, when the number of rows is exactly the limit then this is a wrong indication. However, in my application the limit is set mainly for performance reasons by end-user, a typical limit is "1000 rows" or "10000 rows", for example.
In my case this solution is fully sufficient - and it is simple.
Update:
Are you aware of the row_limiting_clause? It was introduced in Oracle 12.1
For example this query
SELECT employee_id, last_name
FROM employees
ORDER BY employee_id
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
will return rows 6 to 15 of the entire result set. It may help you find a solution.
Another idea is this one:
SELECT employee_id, last_name
FROM employees
UNION ALL
SELECT NULL, NULL FROM dual
ORDER BY employee_id NULLS LAST
When you get the row where employee_id IS NULL then you know you reached the end of your result-set and no further records will arrive.

Select the whole thing, then select the count and the data, restricting the number of rows.
with
base as
(
select c1, c2, c3
from table
where condition
)
select (select count(*) from base), c1, c2, c3
from base
where rownum < 100

Why = operator doesn't work with ROWNUM other than for value 1?

I have the following query:
select * from abc where rownum = 10
Output: No records to display
I sure have more than 25 records in the abc table and my objective is to display the nth record.
If I write the query as: -
select * from abc where rownum = 1
it works fine and gives me the first record. Not any other record other than first.
Any idea?

Because row numbers are assigned sequentially to the rows that are fetched and returned.
Here's how that statement of yours works. It grabs the first candidate row and temporarily gives it row number 1, which doesn't match your condition so it's thrown away.
Then you get the second candidate row and it's also given row number 1 (since the previous one was tossed away). It doesn't match either.
Then the third candidate row ... well, I'm sure you can see where this is going now. In short, you will never find a row that satisfies that condition.
Row numbers are only useful for = 1, < something or <= something.
This is all explained in the Oracle docs for the rownum pseudo-column.
You should also keep in mind that SQL is a relational algebra that returns unordered sets unless you specify an order. That means row number ten may be something now and something else in three minutes.
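The row-numbering behaviour described above can be modelled with a few lines of Python. This is a deliberately simplified toy model, not Oracle's actual implementation: the function name oracle_rownum_filter is made up, and the only rule it encodes is that a row number is consumed only when a row passes the filter.

```python
def oracle_rownum_filter(candidate_rows, predicate):
    """Toy model of ROWNUM: a number is consumed only when a row is kept."""
    kept = []
    next_rownum = 1
    for row in candidate_rows:
        if predicate(next_rownum):
            kept.append(row)
            next_rownum += 1
        # A rejected row does not advance the counter, so the next
        # candidate is tested against the same row number again.
    return kept

rows = [f"record {i}" for i in range(1, 26)]

print(oracle_rownum_filter(rows, lambda rn: rn == 10))       # []
print(oracle_rownum_filter(rows, lambda rn: rn == 1))        # ['record 1']
print(len(oracle_rownum_filter(rows, lambda rn: rn <= 5)))   # 5
```

With the condition rn == 10 the counter never moves past 1, which is why the original query returns no records.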
If you want a (kludgy, admittedly) way to get the nth row, you can use something like (for the fifth row):
select * from (
    select * from (
        select col1, col2, col3 from tbl order by col1 asc
    ) where rownum < 6 order by col1 desc
) where rownum = 1
The inner select will ensure you have a consistent order on the query before you start throwing away rows, and the middle select will throw away all but the first five rows from that, and also reverse the order.
The outer select will then only return the first row of the reversed set (which is the last row of the five-row set when it was in ascending order).
A better way is probably:
select * from (
    select row_number() over (order by col1) rn, col1, col2, col3 from tbl
) where rn = 5
This works by numbering every row in col1 order and exposing that number as a "real" column, then filtering on that column. (Selecting rownum in the same query block as the order by would not work, because rownum is assigned before the sort.)

How to find unused ID in a column? [duplicate]

I have a table with a user ID column, and the user can choose which user ID to add to the table. I am wondering whether there is a single SQL query that could give me the list of unused user IDs, or even just the smallest unused ID?
For example, I have the following IDs
USER_ID
1
2
3
5
6
7
8
10
I would like to know if there is a way to select 4 or even selecting 4 and 9?

You can try using the "NOT IN" clause:
select
    user_id
from table
where
    user_id not in (select user_id from another_table)
Or use a self-join to find where each gap starts:
select
    u1.user_id + 1 as gap_start
from users u1
left outer join users u2 on u1.user_id + 1 = u2.user_id
where
    u2.user_id is null
Note that this also returns max(user_id) + 1, the first ID after the end of the table.

It depends on the database you are using. If you are using Oracle, something like this will work:
Step 1: Find the maximum user_id in your table:
select max(user_id) from tbl_userid
Call this number m.
Step 2: Find the maximum value of rownum in the following query:
select rownum from all_objects
Step 3: If that maximum is at least m, you can use the following query to list your unused user IDs:
select n as unused_id
from (select rownum as n from all_objects)
where n <= m
and n not in (select user_id from tbl_userid)
If the maximum returned by step 2 is less than m, you can tweak the query as follows:
select n as unused_id
from (select rownum as n
      from (select *
            from all_objects
            UNION ALL
            select * from all_objects)
     )
where n <= m
and n not in (select user_id from tbl_userid)
Repeat the UNION ALL until you get max(rownum) >= m.
If you are using SQL Server, kindly let me know. There is no direct equivalent of the ROWNUM pseudocolumn in SQL Server, but there are workarounds such as the ROW_NUMBER() function.

Given that SQL is generally a set-based language, the only way I could think to do this would be to create the full set of IDs, and outer join your table where no IDs matched. The problem with that is that if your table has a significant number of records, you would have to generate a temporary table containing every ID from 1 through MAX(USER_ID). Given a table with tens or hundreds of millions of records, that could be very slow.
Just out of curiosity, why do you need to know the ID holes? Is there some specific reason, or are you just trying to not "waste" an ID? Given the processing effort to find the holes, I would think it is more efficient to just let them be.
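For a small table, the "generate the full set and outer join" idea can be sketched like this. It is a minimal illustration in Python with an in-memory SQLite database; the table name users and the recursive CTE all_ids are assumptions for the demo, and the performance concern about very large tables still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 7, 8, 10)])

unused = [r[0] for r in conn.execute("""
    WITH RECURSIVE all_ids(n) AS (        -- every ID from 1 to MAX(user_id)
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM all_ids
        WHERE n < (SELECT MAX(user_id) FROM users)
    )
    SELECT a.n
    FROM all_ids a
    LEFT JOIN users u ON u.user_id = a.n  -- outer join against the real table
    WHERE u.user_id IS NULL               -- keep only IDs with no match
    ORDER BY a.n
""")]

print(unused)  # [4, 9]
```

Taking the first element of the result gives the smallest unused ID.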

Here's one way to do it using SQL Server 2005 or later. It may or may not work efficiently for you:
create table T (userid int);
insert into T values
(1),(2),(3),(5),(6),(9),(11);
with Trk as (
    select userid,
        row_number() over (
            order by userid
        ) as rk
    from T
), Truns(start,finish,gp) as (
    select -1+min(userid), 1+max(userid),
        userid-rk
    from Trk
    group by userid-rk
), Tregroup as (
    select start, finish,
        row_number() over (
            order by gp
        ) as rk
    from Truns
), Tpre as (
    select a.finish, b.start
    from Tregroup as a full outer join Tregroup as b
    on a.rk + 1 = b.rk
)
select
    rtrim(finish) + case when start = finish then '' else + '-' + rtrim(start) end as gap
from Tpre
where finish+start is not null
drop table T;

Short of looping through all the ids (perhaps using binary search tree logic?) I don't have a good answer for you.
I would ask what you want this for? By their nature, ids are essentially meaningless - all they do is identify some data, not describe it, and as such it shouldn't be a problem if you have large gaps in your user ids. (In fact, some people would say that it's even better to have unguessable ids, to avoid users tampering with information to find security holes)

Most efficient way to select 1st and last element, SQLite?

What is the most efficient way to select the first and last element only, from a column in SQLite?

The first and last element from a row?
SELECT column1, columnN
FROM mytable;
I think you must mean the first and last element from a column:
SELECT MIN(column1) AS First,
MAX(column1) AS Last
FROM mytable;
See http://www.sqlite.org/lang_aggfunc.html for MIN() and MAX().
I'm using First and Last as column aliases.

if it's just one column:
SELECT min(column) as first, max(column) as last FROM table
if you want to select the whole row:
SELECT * FROM (SELECT 'first', * FROM table ORDER BY column ASC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT 'last', * FROM table ORDER BY column DESC LIMIT 1)

The most efficient way would be to know what those fields were called and simply select them.
SELECT `first_field`, `last_field` FROM `table`;

Probably like this:
SELECT dbo.Table.FirstCol, dbo.Table.LastCol FROM Table
You get minor efficiency enhancements from specifying the table name and schema.

First: MIN() and MAX() on a text column gives AAAA and TTTT results which are not the first and last entries in my test table. They are the minimum and maximum values as mentioned.
I tried this (with .stats on) on my table which has over 94 million records:
select * from
(select col1 from mitable limit 1)
union
select * from
(select col1 from mitable limit 1 offset
(select count(0) from mitable) -1);
But it uses up a lot of virtual machine steps (281,624,718).
Then this which is much more straightforward (which works if the table was created without WITHOUT ROWID) [sql keywords are in capitals]:
SELECT col1 FROM mitable
WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
OR ROWID = (SELECT MAX(ROWID) FROM mitable);
That ran with 55 virtual machine steps on the same table and produced the same answer.

The min()/max() approach is wrong. It is only correct if the values only ever ascend. I needed something like this for currency rates, which rise and fall randomly.
This is my solution:
select st.*
from stats_ticker st,
(
select min(rowid) as first, max(rowid) as last --here is magic part 1
from stats_ticker
-- next line is just a filter I need in my case.
-- if you want first/last of the whole table leave it out.
where timeutc between datetime('now', '-1 days') and datetime('now')
) firstlast
WHERE
st.rowid = firstlast.first --and these two rows do magic part 2
OR st.rowid = firstlast.last
ORDER BY st.rowid;
magic part 1: the subselect results in a single row with the columns first,last containing rowid's.
magic part 2 easy to filter on those two rowid's.
This is the best solution I've come up with so far. Hope you like it.
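To make the difference between "smallest and largest value" and "first and last row" concrete, here is a small sketch using Python's sqlite3 module. The table name rates and its values are invented for the demo, and it assumes an ordinary rowid table where rows were only ever appended, so that rowid order matches insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (rate REAL)")  # ordinary rowid table
conn.executemany("INSERT INTO rates VALUES (?)",
                 [(1.30,), (1.45,), (1.10,), (1.25,)])

# Smallest and largest values in the column.
by_value = conn.execute("SELECT MIN(rate), MAX(rate) FROM rates").fetchone()

# First and last rows in insertion order, located through rowid.
by_rowid = [r[0] for r in conn.execute("""
    SELECT rate FROM rates
    WHERE rowid = (SELECT MIN(rowid) FROM rates)
       OR rowid = (SELECT MAX(rowid) FROM rates)
    ORDER BY rowid
""")]

print(by_value)  # (1.1, 1.45)
print(by_rowid)  # [1.3, 1.25]
```

The two results differ because the values do not rise monotonically.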

We can do that with SQL aggregate functions such as MAX and MIN. These two aggregate functions can give you the last and first elements of a table column.
SELECT MAX(column_name), MIN(column_name) FROM table_name
MAX gives you the maximum value and MIN the minimum value of the column; these are the last and first values only if the values were inserted in ascending order.
