i need to generate some sample data from a population. I want to do this with an SQL query on an Oracle 11g database.
Here is a simple working example with population size 4 and sample size 2:
with population as (
select 1 as val from dual union all
select 2 from dual union all
select 3 from dual union all
select 4 from dual)
select val from (
select val, dbms_random.value(0,10) AS RANDORDER
from population
order by randorder)
where rownum <= 2
(the oracle sample() funtion didn't work in connection with the WITH-clause for me)
But now I, I want to "upscale" or multiply my sample data. So that I can get something like 150 % sample data of the population data (population size 4 and sample size 6, e.g.)
Is there a good way to achieve this with an SQL query?
You could use CONNECT BY:
with population(val, RANDOMORDER) as (
select level, dbms_random.value(0,10) AS RANDORDER
from dual
connect by level <= 6
ORDER BY RANDORDER
)
select val
FROM population
WHERE rownum <= 4;
db<>fiddle demo
The solution depends, if you want all rows from first initial set(s) and random additional rows from last one then use:
with params(size_, sample_) as (select 4, 6 from dual)
select val
from (
select mod(level - 1, size_) + 1 val, sample_,
case when level <= size_ * floor(sample_ / size_) then 0
else dbms_random.value()
end rand
from params
connect by level <= size_ * ceil(sample_ / size_)
order by rand)
where rownum <= sample_
But if you allow possibility of result like (1, 1, 2, 2, 3, 3), where some values may not appear at all in output (here 4) then use this:
with params(size_, sample_) as (select 4, 6 from dual)
select val
from (
select mod(level - 1, size_) + 1 val, sample_, dbms_random.value() rand
from params
connect by level <= size_ * ceil(sample_ / size_)
order by rand)
where rownum <= sample_
How it works? We build set of (1, 2, 3, 4) as many times as it results from division sample / size. Then we assign random values. In first case I assign 0 to first set(s), so they will be in output for sure, and random values to last set. In second case randoms are assigned to all rows.
i want to find the minimum missing number of a column named (s_no) and the table named (test_table) in oracle and I write the following code..
select
min_s_no-1+level missing_number
from (
select min(s_no) min_s_no, max(s_no) max_s_no
from test_table
) connect by level <= max_s_no-min_s_no+1
minus
select s_no from test_table
;
it gives me all the missing number as a result. But I want to select the minimum
number. Can any one help me please.
thanks in advance.
Using analytical function LEAD you can get the number from the next row in ascending order. Comparing of this value with with the original number increased by 1 you get the missing values (if two numbers do not match).
To get the first missing value in ascending order is the same selecting the MIN value:
select
num,
lead(num) over (order by num) num_lead,
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
order by num;
NUM NUM_LEAD MISSING_NUM
---------- ---------- -----------
4 5
5 6
6 9 7
9 10
10 13 11
13
-- first missing number = MIN missing number
select min(missing_num)
from (
select
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
);
MIN(MISSING_NUM)
----------------
7
ADDENDUM
A good practice in writing SQL is to consider edge cases - here a table that contains a complete interval without holes. The first missing value will be the successor of the last number.
select nvl(min(missing_num),max(num)+1) first_missing_value
from (
select
num,
case when num + 1 != lead(num) over (order by num) then num + 1 end as missing_num
from test_data
);
A complete table return no MISSING_NUM, so the original query return NULL. Using the NVL the expected result is provided.
The best way to find the gaps is to use analytic functiions lead or lag. An example with lag:
with test_data as (
select 1 num from dual union all
select 4 from dual union all
select 6 from dual union all
select 8 from dual union all
select 3 from dual union all
select 9 from dual union all
select 0 from dual
)
select min(gap) min_gap
from (
select num, lag(num) over (order by num)+1 gap
from test_data
)
where num != gap
;
MIN_GAP
------------------
2
More about how to find the gaps here
In Oracle 12.1 and above, MATCH_RECOGNIZE can do quick work of this kind of problems:
Edited. Initially I was picking the "next number" where a gap exists (in the example, the value 9). But that is not what the OP wants, he wants the first missing number (7 in this case). I edited to change the measures clause, to find the first missing number as requested. End Edit
with test_data (num) as (
select 4 from dual union all
select 5 from dual union all
select 6 from dual union all
select 9 from dual union all
select 10 from dual union all
select 13 from dual
)
-- end of test data; when you use the SQL query below,
-- replace test_data and num with your actual table and column names.
select result as num
from test_data
match_recognize (
order by num
measures last(b.num) + 1 as result
pattern ( ^ a b* c )
define b as num = prev(num) + 1,
c as num > prev(num) + 1
)
;
NUM
---
7
I have a table with two columns:
OLD_REVISIONS |NEW_REVISIONS
-----------------------------------
1,25,26,24 |1,26,24,25
1,56,55,54 |1,55,54
1 |1
1,2 |1
1,96,95,94 |1,96,94,95
1 |1
1 |1
1 |1
1 |1
1,2 |1,2
1 |1
1 |1
1 |1
1 |1
For each row there will be a list of revisions for a document (comma separated)
The comma separated list might be the same in both columns but the order/sort might be different - e.g.
2,1 |1,2
I would like to find all the instances where the highest revision in the OLD_REVISIONS column is lower than than the highest revision in NEW_REVISIONS
The following would fit that criteria
OLD_REVISIONS |NEW_REVISIONS
-----------------------------------
1,2 |1
1,56,55,54 |1,55,54
I tried a solution using the MINUS option (joining the table to itself) but it returns differences even for when the list is the same but in the wrong order
I tried the function GREATEST (i.e where greatest(new_Revisions) < greatest(old_revisions)) but i am not sure why greatest(OLD_REVISIONS) always just returns the comma separated value. It does not return the max value. I suspect it is comparing strings because the columns are VARCHAR.
Also, MAX function expects a single number.
Is there another way i can achieve the above? I am looking for a pure SQL option so i can print out the results (or a PL/SQL option that can print out the results)
Edit
Apologies for not mentioning this but for the NEW_REVISIONS i do actually have the data in a table where each revision is in a separate row:
"DOCNUMBER" "REVISIONNUMBER"
67 1
67 24
67 25
67 26
75 1
75 54
75 55
75 56
78 1
79 1
79 2
83 1
83 96
83 94
Just to give some content, a few weeks ago i suspected that there are revisions disappearing.
To investigate this, i decided to take a count of all revisions for all documents and take a snapshot to compare later to see if revisions are indeed missing.
The snapshot that i took contained the following columns:
docnumber, count, revisions
The revisions were stored in a comma separated list using the listagg function.
The trouble i have now is the on live table, new revisions have been added so when i compare the main table and the snapshot using a MINUS i get a difference because
of the new revisions in the main table.
Even though in the actual table the revisions are individual rows, in the snapshot table i dont have the individual rows.
I am thinking the only way to recreate the snapshot in the same format and compare them find out if maximum revision in the main table is lower than the max revision in the snapshot table (hence why im trying to find out how to find out the max in a comma separated string)
Enjoy.
select xmlcast(xmlquery(('max((' || OLD_REVISIONS || '))') RETURNING CONTENT) as int) as OLD_REVISIONS_max
,xmlcast(xmlquery(('max((' || NEW_REVISIONS || '))') RETURNING CONTENT) as int) as NEW_REVISIONS_max
from t
;
Assuming your base table has an id column (versions of what?) - here is a solution based on splitting the rows.
Edit: If you like this solution, check out vkp's solution, which is better than mine. I explain why his solution is better in a Comment to his Answer.
with
t ( id, old_revisions, new_revisions ) as (
select 101, '1,25,26,24', '1,26,24,25' from dual union all
select 102, '1,56,55,54', '1,55,54' from dual union all
select 103, '1' , '1' from dual union all
select 104, '1,2' , '1' from dual union all
select 105, '1,96,95,94', '1,96,94,95' from dual union all
select 106, '1' , '1' from dual union all
select 107, '1' , '1' from dual union all
select 108, '1' , '1' from dual union all
select 109, '1' , '1' from dual union all
select 110, '1,2' , '1,2' from dual union all
select 111, '1' , '1' from dual union all
select 112, '1' , '1' from dual union all
select 113, '1' , '1' from dual union all
select 114, '1' , '1' from dual
)
-- END of TEST DATA; the actual solution (SQL query) begins below.
select id, old_revisions, new_revisions
from (
select id, old_revisions, new_revisions, 'old' as flag,
to_number(regexp_substr(old_revisions, '\d+', 1, level)) as rev_no
from t
connect by level <= regexp_count(old_revisions, ',') + 1
and prior id = id
and prior sys_guid() is not null
union all
select id, old_revisions, new_revisions, 'new' as flag,
to_number(regexp_substr(new_revisions, '\d+', 1, level)) as rev_no
from t
connect by level <= regexp_count(new_revisions, ',') + 1
and prior id = id
and prior sys_guid() is not null
)
group by id, old_revisions, new_revisions
having max(case when flag = 'old' then rev_no end) !=
max(case when flag = 'new' then rev_no end)
order by id -- ORDER BY is optional
;
ID OLD_REVISION NEW_REVISION
--- ------------ ------------
102 1,56,55,54 1,55,54
104 1,2 1
You can compare every value by putting together the revisions in the same order using listagg function.
SELECT listagg(o,',') WITHIN GROUP (ORDER BY o) old_revisions,
listagg(n,',') WITHIN GROUP (ORDER BY n) new_revisions
FROM (
SELECT DISTINCT rowid r,
regexp_substr(old_revisions, '[^,]+', 1, LEVEL) o,
regexp_substr(new_revisions, '[^,]+', 1, LEVEL) n
FROM table
WHERE regexp_substr(old_revisions, '[^,]+', 1, LEVEL) IS NOT NULL
CONNECT BY LEVEL<=(SELECT greatest(MAX(regexp_count(old_revisions,',')),MAX(regexp_count(new_revisions,',')))+1 c FROM table)
)
GROUP BY r
HAVING listagg(o,',') WITHIN GROUP (ORDER BY o)<>listagg(n,',') WITHIN GROUP (ORDER BY n);
This could be a way:
select
OLD_REVISIONS,
NEW_REVISIONS
from
REVISIONS t,
table(cast(multiset(
select level
from dual
connect by level <= length (regexp_replace(t.OLD_REVISIONS, '[^,]+')) + 1
) as sys.OdciNumberList
)
) levels_old,
table(cast(multiset(
select level
from dual
connect by level <= length (regexp_replace(t.NEW_REVISIONS, '[^,]+')) + 1
)as sys.OdciNumberList
)
) levels_new
group by t.ROWID,
OLD_REVISIONS,
NEW_REVISIONS
having max(to_number(trim(regexp_substr(t.OLD_REVISIONS, '[^,]+', 1, levels_old.column_value)))) >
max(to_number(trim(regexp_substr(t.new_REVISIONS, '[^,]+', 1, levels_new.column_value))))
This uses a double string split to pick the values from every field, and then simply finds the rows where the max values among the two collections match your requirement.
You should edit this by adding some unique key in the GROUP BYclause, or a rowid if you don't have any unique key on your table.
One way to do is to split the columns on comma separation using regexp_substr and checking if the max and min values are different.
Sample Demo
with rownums as (select t.*,row_number() over(order by old_revisions) rn from t)
select old_revisions,new_revisions
from rownums
where rn in (select rn
from rownums
group by rn
connect by regexp_substr(old_revisions, '[^,]+', 1, level) is not null
or regexp_substr(new_revisions, '[^,]+', 1, level) is not null
having max(cast(regexp_substr(old_revisions,'[^,]+', 1, level) as int))
<> max(cast(regexp_substr(new_revisions,'[^,]+', 1, level) as int))
)
Comments say normalise data. I agree but also I understand it may be not possible. I would try something like query below:
select greatest(val1, val2), t1.r from (
select max(val) val1, r from (
select regexp_substr(v1,'[^,]+', 1, level) val, rowid r from tab1
connect by regexp_substr(v1, '[^,]+', 1, level) is not null
) group by r) t1
inner join (
select max(val) val2, r from (
select regexp_substr(v2,'[^,]+', 1, level) val, rowid r from tab1
connect by regexp_substr(v2, '[^,]+', 1, level) is not null
) group by r) t2
on (t1.r = t2.r);
Tested on:
create table tab1 (v1 varchar2(100), v2 varchar2(100));
insert into tab1 values ('1,3,5','1,4,7');
insert into tab1 values ('1,3,5','1,2,9');
insert into tab1 values ('1,3,5','1,3,5');
insert into tab1 values ('1,3,5','1,4');
and seems to work fine. I left rowid for reference. I guess you have some id in table.
After your edit I would change query to:
select greatest(val1, val2), t1.r from (
select max(val) val1, r from (
select regexp_substr(v1,'[^,]+', 1, level) val, DOCNUMBER r from tab1
connect by regexp_substr(v1, '[^,]+', 1, level) is not null
) group by DOCNUMBER) t1
inner join (
select max(DOCNUMBER) val2, DOCNUMBER r from NEW_REVISIONS) t2
on (t1.r = t2.r);
You may write a PL/SQL function parsing the string and returning the maximal number
select max_num( '1,26,24,25') max_num from dual;
MAX_NUM
----------
26
The query ist than very simple:
select OLD_REVISIONS NEW_REVISIONS
from revs
where max_num(OLD_REVISIONS) < max_num(NEW_REVISIONS);
A prototyp function without validation and error handling
create or replace function max_num(str_in VARCHAR2) return NUMBER as
i number;
x varchar2(1);
n number := 0;
max_n number := 0;
pow number := 0;
begin
for i in 0.. length(str_in)-1 loop
x := substr(str_in,length(str_in)-i,1);
if x = ',' then
-- check max number
if n > max_n then
max_n := n;
end if;
-- reset
n := 0;
pow := 0;
else
n := n + to_number(x)*power(10,pow);
pow := pow +1;
end if;
end loop;
return(max_n);
end;
/
Below is the interview question, can some please help me resolve it?
select 'a1b2c3d4e5f6g7' from dual;
Output is sum of given integer number(1+2+3+4+5+6+7)=28.
Any help?
Use a Regex to keep only the numbers,then connect by to add each number
With T
as (select regexp_replace('a1b2c3d4e5f6g7', '[A-Za-z]') as col from dual)
select sum(val)
From
(
select substr(col,level,1) val from t connect by level <= length(col)
)
FIDDLE
Since it is only 1 digit numbers you can use SUBSTR() to extract every other character:
SQL Fiddle
Oracle 11g R2 Schema Setup:
Query 1:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7' from dual
)
SELECT SUM( TO_NUMBER( SUBSTR( value, 2*LEVEL, 1 ) ) ) AS total
FROM data
CONNECT BY 2 * LEVEL <= LENGTH( value )
Results:
| TOTAL |
|-------|
| 28 |
However, if you have two digit numbers then you can do:
Query 2:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7h8i9j10' from dual
)
SELECT SUM( TO_NUMBER( REGEXP_SUBSTR( value, '\d+', 1, LEVEL ) ) ) AS total
FROM data
CONNECT BY LEVEL <= REGEXP_COUNT( value, '\d+' )
Results:
| TOTAL |
|-------|
| 55 |
You can use regexp_substr to extract exactly the numbers, then just sum them:
with t as (select 'a1b2c3d4e5f6g7' expr from dual)
select sum(regexp_substr(t.expr, '[0-9]+',1, level)) as col
from dual
connect by level < regexp_instr(t.expr, '[0-9]+',1, level);
example:
select sum(regexp_substr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level)) as col
from dual
connect by level < regexp_instr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level);
Result:
54
This solution works with numbers with more than 1 digit and it doesn't matter how many characters are between the numbers:
with t as (select 'a1b2c3d4e5f6g7' as str from dual)
select sum(to_number(regexp_substr(str,'[0-9]+',1,level)))
from t
connect by regexp_substr(str,'[0-9]+',1,level) is not null
Using the DUAL table, how can I get a list of numbers from 1 to 100?
Your question is difficult to understand, but if you want to select the numbers from 1 to 100, then this should do the trick:
Select Rownum r
From dual
Connect By Rownum <= 100
Another interesting solution in ORACLE PL/SQL:
SELECT LEVEL n
FROM DUAL
CONNECT BY LEVEL <= 100;
Using Oracle's sub query factory clause: "WITH", you can select numbers from 1 to 100:
WITH t(n) AS (
SELECT 1 from dual
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT * FROM t;
Do it the hard way. Use the awesome MODEL clause:
SELECT V
FROM DUAL
MODEL DIMENSION BY (0 R)
MEASURES (0 V)
RULES ITERATE (100) (
V[ITERATION_NUMBER] = ITERATION_NUMBER + 1
)
ORDER BY 1
Proof: http://sqlfiddle.com/#!4/d41d8/20837
You could use XMLTABLE:
SELECT rownum
FROM XMLTABLE('1 to 100');
-- alternatively(useful for generating range i.e. 10-20)
SELECT (COLUMN_VALUE).GETNUMBERVAL() AS NUM
FROM XMLTABLE('1 to 100');
DBFiddle Demo
If you want your integers to be bound between two integers (i.e. start with something other than 1), you can use something like this:
with bnd as (select 4 lo, 9 hi from dual)
select (select lo from bnd) - 1 + level r
from dual
connect by level <= (select hi-lo from bnd);
It gives:
4
5
6
7
8
Peter's answer is my favourite, too.
If you are looking for more details there is a quite good overview, IMO, here.
Especially interesting is to read the benchmarks.
Using GROUP BY CUBE:
SELECT ROWNUM
FROM (SELECT 1 AS c FROM dual GROUP BY CUBE(1,1,1,1,1,1,1) ) sub
WHERE ROWNUM <=100;
Rextester Demo
A variant of Peter's example, that demonstrates a way this could be used to generate all numbers between 0 and 99.
with digits as (
select mod(rownum,10) as num
from dual
connect by rownum <= 10
)
select a.num*10+b.num as num
from digits a
,digits b
order by num
;
Something like this becomes useful when you are doing batch identifier assignment, and looking for the items that have not yet been assigned.
For example, if you are selling bingo tickets, you may want to assign batches of 100 floor staff (guess how i used to fund raise for sports). As they sell a batch, they are given the next batch in sequence. However, people purchasing the tickets can select to purchase any tickets from the batch. The question may be asked, "what tickets have been sold".
In this case, we only have a partial, random, list of tickets that were returned within the given batch, and require a complete list of all possibilities to determine which we don't have.
with range as (
select mod(rownum,100) as num
from dual
connect by rownum <= 100
),
AllPossible as (
select a.num*100+b.num as TicketNum
from batches a
,range b
order by num
)
select TicketNum as TicketsSold
from AllPossible
where AllPossible.Ticket not in (select TicketNum from TicketsReturned)
;
Excuse the use of key words, I changed some variable names from a real world example.
... To demonstrate why something like this would be useful
I created an Oracle function that returns a table of numbers
CREATE OR REPLACE FUNCTION [schema].FN_TABLE_NUMBERS(
NUMINI INTEGER,
NUMFIN INTEGER,
EXPONENCIAL INTEGER DEFAULT 0
) RETURN TBL_NUMBERS
IS
NUMEROS TBL_NUMBERS;
INDICE NUMBER;
BEGIN
NUMEROS := TBL_NUMBERS();
FOR I IN (
WITH TABLA AS (SELECT NUMINI, NUMFIN FROM DUAL)
SELECT NUMINI NUM FROM TABLA UNION ALL
SELECT
(SELECT NUMINI FROM TABLA) + (LEVEL*TO_NUMBER('1E'||TO_CHAR(EXPONENCIAL))) NUM
FROM DUAL
CONNECT BY
(LEVEL*TO_NUMBER('1E'||TO_CHAR(EXPONENCIAL))) <= (SELECT NUMFIN-NUMINI FROM TABLA)
) LOOP
NUMEROS.EXTEND;
INDICE := NUMEROS.COUNT;
NUMEROS(INDICE):= i.NUM;
END LOOP;
RETURN NUMEROS;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN NUMEROS;
WHEN OTHERS THEN
RETURN NUMEROS;
END;
/
Is necessary create a new data type:
CREATE OR REPLACE TYPE [schema]."TBL_NUMBERS" IS TABLE OF NUMBER;
/
Usage:
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10))--integers difference: 1;2;.......;10
And if you need decimals between numbers by exponencial notation:
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10,-1));--with 0.1 difference: 1;1.1;1.2;.......;10
SELECT COLUMN_VALUE NUM FROM TABLE([schema].FN_TABLE_NUMBERS(1,10,-2));--with 0.01 difference: 1;1.01;1.02;.......;10
If you want to generate the list of numbers 1 - 100 you can use the cartesian product of {1,2,3,4,5,6,6,7,8,9,10} X {0,10,20,30,40,50,60,70,80,90}
https://en.wikipedia.org/wiki/Cartesian_product
Something along the lines of the following:
SELECT
ones.num + tens.num
FROM
(
SELECT 1 num UNION ALL
SELECT 2 num UNION ALL
SELECT 3 num UNION ALL
SELECT 4 num UNION ALL
SELECT 5 num UNION ALL
SELECT 6 num UNION ALL
SELECT 7 num UNION ALL
SELECT 9 num UNION ALL
SELECT 10 num
) as ones
CROSS JOIN
(
SELECT 0 num UNION ALL
SELECT 10 num UNION ALL
SELECT 20 num UNION ALL
SELECT 30 num UNION ALL
SELECT 40 num UNION ALL
SELECT 50 num UNION ALL
SELECT 60 num UNION ALL
SELECT 70 num UNION ALL
SELECT 80 num UNION ALL
SELECT 90 num
) as tens;
I'm not able to test this out on an oracle database, you can place the dual where it belongs but it should work.
SELECT * FROM `DUAL` WHERE id>0 AND id<101
The above query is written in SQL in the database.