i need to generate some sample data from a population. I want to do this with an SQL query on an Oracle 11g database.
Here is a simple working example with population size 4 and sample size 2:
with population as (
select 1 as val from dual union all
select 2 from dual union all
select 3 from dual union all
select 4 from dual)
select val from (
select val, dbms_random.value(0,10) AS RANDORDER
from population
order by randorder)
where rownum <= 2
(the oracle sample() funtion didn't work in connection with the WITH-clause for me)
But now I, I want to "upscale" or multiply my sample data. So that I can get something like 150 % sample data of the population data (population size 4 and sample size 6, e.g.)
Is there a good way to achieve this with an SQL query?
You could use CONNECT BY:
with population(val, RANDOMORDER) as (
select level, dbms_random.value(0,10) AS RANDORDER
from dual
connect by level <= 6
ORDER BY RANDORDER
)
select val
FROM population
WHERE rownum <= 4;
db<>fiddle demo
The solution depends, if you want all rows from first initial set(s) and random additional rows from last one then use:
with params(size_, sample_) as (select 4, 6 from dual)
select val
from (
select mod(level - 1, size_) + 1 val, sample_,
case when level <= size_ * floor(sample_ / size_) then 0
else dbms_random.value()
end rand
from params
connect by level <= size_ * ceil(sample_ / size_)
order by rand)
where rownum <= sample_
But if you allow possibility of result like (1, 1, 2, 2, 3, 3), where some values may not appear at all in output (here 4) then use this:
with params(size_, sample_) as (select 4, 6 from dual)
select val
from (
select mod(level - 1, size_) + 1 val, sample_, dbms_random.value() rand
from params
connect by level <= size_ * ceil(sample_ / size_)
order by rand)
where rownum <= sample_
How it works? We build set of (1, 2, 3, 4) as many times as it results from division sample / size. Then we assign random values. In first case I assign 0 to first set(s), so they will be in output for sure, and random values to last set. In second case randoms are assigned to all rows.
Related
There is a character variable in the dataset that reads as '2-6' or '4-7'.
What I am interested in selecting those observations where the numbers 3,4,5 may be in that spectrum '4-7'
it is easy enough to use the statement
where 3 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2)) or
4 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2)) or
5 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2))
Just looking to see if there is a more..eloquent solution as to group that range together in a function-
test data below
TIA-
with test (id, MEAS_VALUE) as (
select 1,'2-5' from dual union all --want this
select 2,'1-2' from dual union all --do not want this
select 3,'5-7' from dual) ----want this
select * from test
where 3 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2))
or 4 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2)) or
5 between TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE,'\d+',1,1)) and TO_NUMBER(substr(REGEXP_SUBSTR(MEAS_VALUE,'(-)\d+',1,1),2));
Use a subquery:
select t.*
from (select t.*,
TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE, '\d+', 1, 1)) as lo,
TO_NUMBER(REGEXP_SUBSTR(MEAS_VALUE, '\d+', 1, 2)) as lhi
from test t
) t
where 3 between lo and hi and
4 between lo and hi and
5 between lo and hi;
You can further simplify the condition to:
where least(3, 4, 5) >= lo and
greatest(3, 4, 5) <= hi
I have one of the column in oracle table which has below value :
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want this values to be in ascending order with select query, output would looks like :
output
==================
25,50,100,1000,5000
I tried this link, but looks like it has some restriction on number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on sample data you posted, relatively simple query would work (you need lines 3 - 7). If data doesn't really look like that, query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
Based on: How to check any missing number from a series of numbers?
I've got a similiar question. My source table has a sequence from 1 to 1000.
But it is only bad if the gap is >1 and <20. I can't get the CONNECT BY to work.
Please help me.
SELECT
'XX' AS NETWORK
,'YY' AS TYPE
,min_seq - 1 + level AS MISSING
FROM (
select
min(s.SEQUENCE_NUMBER) min_seq
, max(s.SEQUENCE_NUMBER) max_seq
FROM source s
)
CONNECT BY level <= max_seq - min_seq +20 AND level >= max_seq - min_seq +1
MINUS
SELECT
'XX' AS NETWORK
,'YY' AS TYPE
,s.SEQUENCE_NUMBER AS EXISTING
FROM source s
Old school connect by version
with tn as(
-- sample data
Select 1 n from dual
union all
Select 4 from dual
union all
Select 26 from dual
union all
Select 30 from dual
union all
Select 52 from dual
)
select distinct n, delta, n+level nn
from (
select n, delta
from (
select n, lead(n) Over(order by n) - n delta
from tn) t
where delta between 2 and 20
) t2
connect by level < delta
order by n
Use a CTE (with statement):
with CTE as
(
select level as NN
from dual
connect by level <= 20
)
select CTE.NN
from CTE
left join source s
on CTE.NN = s.SEQUENCE_NUMBER
where s.SEQUENCE_NUMBER is null
i want to find the minimum missing number of a column named (s_no) and the table named (test_table) in oracle and I write the following code..
select
min_s_no-1+level missing_number
from (
select min(s_no) min_s_no, max(s_no) max_s_no
from test_table
) connect by level <= max_s_no-min_s_no+1
minus
select s_no from test_table
;
it gives me all the missing number as a result. But I want to select the minimum
number. Can any one help me please.
thanks in advance.
Using analytical function LEAD you can get the number from the next row in ascending order. Comparing of this value with with the original number increased by 1 you get the missing values (if two numbers do not match).
To get the first missing value in ascending order is the same selecting the MIN value:
select
num,
lead(num) over (order by num) num_lead,
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
order by num;
NUM NUM_LEAD MISSING_NUM
---------- ---------- -----------
4 5
5 6
6 9 7
9 10
10 13 11
13
-- first missing number = MIN missing number
select min(missing_num)
from (
select
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
);
MIN(MISSING_NUM)
----------------
7
ADDENDUM
A good practice in writing SQL is to consider edge cases - here a table that contains a complete interval without holes. The first missing value will be the successor of the last number.
select nvl(min(missing_num),max(num)+1) first_missing_value
from (
select
num,
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
);
A complete table return no MISSING_NUM, so the original query return NULL. Using the NVL the expected result is provided.
The best way to find the gaps is to use analytic functiions lead or lag. An example with lag:
with test_data as (
select 1 num from dual union all
select 4 from dual union all
select 6 from dual union all
select 8 from dual union all
select 3 from dual union all
select 9 from dual union all
select 0 from dual
)
select min(gap) min_gap
from (
select num, lag(num) over (order by num)+1 gap
from test_data
)
where num != gap
;
MIN_GAP
------------------
2
More about how to find the gaps here
In Oracle 12.1 and above, MATCH_RECOGNIZE can do quick work of this kind of problems:
Edited. Initially I was picking the "next number" where a gap exists (in the example, the value 9). But that is not what the OP wants, he wants the first missing number (7 in this case). I edited to change the measures clause, to find the first missing number as requested. End Edit
with test_data (num) as (
select 4 from dual union all
select 5 from dual union all
select 6 from dual union all
select 9 from dual union all
select 10 from dual union all
select 13 from dual
)
-- end of test data; when you use the SQL query below,
-- replace test_data and num with your actual table and column names.
select result as num
from test_data
match_recognize (
order by num
measures last(b.num) + 1 as result
pattern ( ^ a b* c )
define b as num = prev(num) + 1,
c as num > prev(num) + 1
)
;
NUM
---
7
Using the DUAL table, how can I get a list of numbers from 1 to 100?
Your question is difficult to understand, but if you want to select the numbers from 1 to 100, then this should do the trick:
Select Rownum r
From dual
Connect By Rownum <= 100
Another interesting solution in ORACLE PL/SQL:
SELECT LEVEL n
FROM DUAL
CONNECT BY LEVEL <= 100;
Using Oracle's sub query factory clause: "WITH", you can select numbers from 1 to 100:
WITH t(n) AS (
SELECT 1 from dual
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT * FROM t;
Do it the hard way. Use the awesome MODEL clause:
SELECT V
FROM DUAL
MODEL DIMENSION BY (0 R)
MEASURES (0 V)
RULES ITERATE (100) (
V[ITERATION_NUMBER] = ITERATION_NUMBER + 1
)
ORDER BY 1
Proof: http://sqlfiddle.com/#!4/d41d8/20837
You could use XMLTABLE:
SELECT rownum
FROM XMLTABLE('1 to 100');
-- alternatively(useful for generating range i.e. 10-20)
SELECT (COLUMN_VALUE).GETNUMBERVAL() AS NUM
FROM XMLTABLE('1 to 100');
DBFiddle Demo
If you want your integers to be bound between two integers (i.e. start with something other than 1), you can use something like this:
with bnd as (select 4 lo, 9 hi from dual)
select (select lo from bnd) - 1 + level r
from dual
connect by level <= (select hi-lo from bnd);
It gives:
4
5
6
7
8
Peter's answer is my favourite, too.
If you are looking for more details there is a quite good overview, IMO, here.
Especially interesting is to read the benchmarks.
Using GROUP BY CUBE:
SELECT ROWNUM
FROM (SELECT 1 AS c FROM dual GROUP BY CUBE(1,1,1,1,1,1,1) ) sub
WHERE ROWNUM <=100;
Rextester Demo
A variant of Peter's example, that demonstrates a way this could be used to generate all numbers between 0 and 99.
with digits as (
select mod(rownum,10) as num
from dual
connect by rownum <= 10
)
select a.num*10+b.num as num
from digits a
,digits b
order by num
;
Something like this becomes useful when you are doing batch identifier assignment, and looking for the items that have not yet been assigned.
For example, if you are selling bingo tickets, you may want to assign batches of 100 floor staff (guess how i used to fund raise for sports). As they sell a batch, they are given the next batch in sequence. However, people purchasing the tickets can select to purchase any tickets from the batch. The question may be asked, "what tickets have been sold".
In this case, we only have a partial, random, list of tickets that were returned within the given batch, and require a complete list of all possibilities to determine which we don't have.
with range as (
select mod(rownum,100) as num
from dual
connect by rownum <= 100
),
AllPossible as (
select a.num*100+b.num as TicketNum
from batches a
,range b
order by num
)
select TicketNum as TicketsSold
from AllPossible
where AllPossible.Ticket not in (select TicketNum from TicketsReturned)
;
Excuse the use of key words, I changed some variable names from a real world example.
... To demonstrate why something like this would be useful
I created an Oracle function that returns a table of numbers
CREATE OR REPLACE FUNCTION [schema].FN_TABLE_NUMBERS(
NUMINI INTEGER,
NUMFIN INTEGER,
EXPONENCIAL INTEGER DEFAULT 0
) RETURN TBL_NUMBERS
IS
NUMEROS TBL_NUMBERS;
INDICE NUMBER;
BEGIN
NUMEROS := TBL_NUMBERS();
FOR I IN (
WITH TABLA AS (SELECT NUMINI, NUMFIN FROM DUAL)
SELECT NUMINI NUM FROM TABLA UNION ALL
SELECT
(SELECT NUMINI FROM TABLA) + (LEVEL*TO_NUMBER('1E'||TO_CHAR(EXPONENCIAL))) NUM
FROM DUAL
CONNECT BY
(LEVEL*TO_NUMBER('1E'||TO_CHAR(EXPONENCIAL))) <= (SELECT NUMFIN-NUMINI FROM TABLA)
) LOOP
NUMEROS.EXTEND;
INDICE := NUMEROS.COUNT;
NUMEROS(INDICE):= i.NUM;
END LOOP;
RETURN NUMEROS;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN NUMEROS;
WHEN OTHERS THEN
RETURN NUMEROS;
END;
/
Is necessary create a new data type:
CREATE OR REPLACE TYPE [schema]."TBL_NUMBERS" IS TABLE OF NUMBER;
/
Usage:
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10))--integers difference: 1;2;.......;10
And if you need decimals between numbers by exponencial notation:
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10,-1));--with 0.1 difference: 1;1.1;1.2;.......;10
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10,-2));--with 0.01 difference: 1;1.01;1.02;.......;10
If you want to generate the list of numbers 1 - 100 you can use the cartesian product of {1,2,3,4,5,6,6,7,8,9,10} X {0,10,20,30,40,50,60,70,80,90}
https://en.wikipedia.org/wiki/Cartesian_product
Something along the lines of the following:
SELECT
ones.num + tens.num
FROM
(
SELECT 1 num UNION ALL
SELECT 2 num UNION ALL
SELECT 3 num UNION ALL
SELECT 4 num UNION ALL
SELECT 5 num UNION ALL
SELECT 6 num UNION ALL
SELECT 7 num UNION ALL
SELECT 9 num UNION ALL
SELECT 10 num
) as ones
CROSS JOIN
(
SELECT 0 num UNION ALL
SELECT 10 num UNION ALL
SELECT 20 num UNION ALL
SELECT 30 num UNION ALL
SELECT 40 num UNION ALL
SELECT 50 num UNION ALL
SELECT 60 num UNION ALL
SELECT 70 num UNION ALL
SELECT 80 num UNION ALL
SELECT 90 num
) as tens;
I'm not able to test this out on an oracle database, you can place the dual where it belongs but it should work.
SELECT * FROM `DUAL` WHERE id>0 AND id<101
The above query is written in SQL in the database.