Create table from loop output Oracle SQL - sql

I need to pull a random sample from a table of ~5 million observations based on 175 demographic options. The demographic table is something like this form:
1 40 4%
2 30 3%
3 30 3%
- -
174 2 .02%
175 1 .01%
Basically I need this same demographic breakdown randomly sampled from the 5M row table. For each demographic I need a sample of the same one from the larger table but with 5x the number of observations (example: for demographic 1 I want a random sample of 200).
SELECT *
FROM (
SELECT *
FROM my_table
ORDER BY
dbms_random.value
)
WHERE rownum <= 100;
I've used this syntax before to get a random sample but is there any way I can modify this as a loop and substitute variable names from existing tables? I'll try to encapsulate the logic I need in pseudocode:
for (each demographic_COLUMN in TABLE1)
select random(5*num_obs_COLUMN in TABLE1) from ID_COLUMN in TABLE2
/*somehow join the results of each step in the loop into one giant column of IDs */

You could join your tables (assuming the 1-175 demographic value exists in both, or there is an equivalent column to join on), something like:
select id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
Each row in the main table is given a random pseudo-row-number within its demographic (via the analytic row_number()). The outer query then uses the relevant percentage to select how many of those randomly-ordered rows for each demographic to return.
I'm not sure I've understood how you're actually picking exactly how many of each you want, so that probably needs to be adjusted.
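Since the title asks about creating a table from the result, here is a minimal sketch that wraps the same query in a CREATE TABLE ... AS SELECT. The target name sample_ids is hypothetical, and the demographics and my_table names are the same assumed ones used in the query above:

```sql
-- sample_ids is a hypothetical name for the table that stores the sampled IDs
create table sample_ids as
select id
from (
  select t.id,
         d.percentage,
         -- random position of each row within its own demographic
         row_number() over (partition by d.demographic
                            order by dbms_random.value) as rn
  from demographics d
  join my_table t on t.demographic = d.demographic
)
-- keep 5x the per-demographic figure, as in the query above
where rn <= 5 * percentage;
```

No loop is needed: the partitioning does the per-demographic work in a single statement.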
Demo with a smaller sample in a CTE, and matching smaller match condition:
-- CTEs for sample data
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
-- actual query
select demographic, percentage, id, rn
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage;
DEMOGRAPHIC PERCENTAGE ID RN
----------- ---------- ---------- ----------
1 40 94150 1
1 40 36925 2
1 40 154000 3
1 40 82425 4
...
1 40 154350 199
1 40 126175 200
2 30 36051 1
2 30 1051 2
2 30 100451 3
2 30 18026 149
2 30 151726 150
3 30 125302 1
3 30 152252 2
3 30 114452 3
...
3 30 104652 149
3 30 70527 150
174 2 35698 1
174 2 67548 2
174 2 114798 3
...
174 2 70698 9
174 2 30973 10
175 1 139649 1
175 1 156974 2
175 1 145774 3
175 1 97124 4
175 1 40074 5
(you only need the ID, but I'm including the other columns for context); or more succinctly:
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
select demographic, percentage, count(id) as ids, min(id) as min_id, max(id) as max_id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
group by demographic, percentage
order by demographic;
DEMOGRAPHIC PERCENTAGE IDS MIN_ID MAX_ID
----------- ---------- ---------- ---------- ----------
1 40 200 175 174825
2 30 150 1 174126
3 30 150 2452 174477
174 2 10 23448 146648
175 1 5 19074 118649

Related

Calculation of distinct and sum

I have the table below, where all the columns are the same except for the group column, and I am calculating count(distinct group) and blocks in the same table:
Input:
id  time  CODE   group  value  total_blocks
-------------------------------------------
1   22    32206  mn2    1      200
1   22    32206  mn4    1      200
Output:
id  time  CODE   group  value  count(distinct group)  blocks
------------------------------------------------------------
1   22    32206  mn2    1      2                      100
1   22    32206  mn4    1      2                      100
count(distinct group) is just the number of distinct values (mn2 and mn4). The blocks total for the code (32206) is 200, but I am splitting it evenly over the two rows.
The final output should look exactly like this, without removing any columns.
I tried using count(distinct) but it didn't work.
Oracle:
Try it like here:
WITH -- S a m p l e d a t a
tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
Select 1, 22, 32206, 'mn4', 1, 200 From Dual
)
-- S Q L --
Select
ID, A_TIME, CODE, A_GROUP, A_VALUE,
Count(DISTINCT A_GROUP) OVER(Partition By CODE) "COUNT_DIST_GROUP", -- Count distinct groups per CODE
-- Count(DISTINCT A_GROUP) OVER() "COUNT_DIST_GROUP", --Count distinct groups over the whole table
TOTAL_BLOCKS / Count(*) OVER(Partition By CODE) "BLOCKS" -- TOTAL_BLOCKS divided by number of rows per CODE
-- TOTAL_BLOCKS / Count(*) OVER() "BLOCKS" -- TOTAL_BLOCKS divided by number of rows in the whole table
From
tbl
R e s u l t :
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
Comments in the code explain the basic use of the Count() Over() analytic function.
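For contrast, here is a small sketch of what a plain aggregate count(distinct) does with the same sample data, which is probably why the original attempt "didn't work": GROUP BY collapses the rows, so the per-row columns are lost. The CTE simply repeats the sample data from above.

```sql
WITH tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
  Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
  Select 1, 22, 32206, 'mn4', 1, 200 From Dual
)
-- Aggregate form: returns one row per CODE instead of one row per input row
Select CODE, Count(DISTINCT A_GROUP) "COUNT_DIST_GROUP"
From tbl
Group By CODE
```

The analytic form (with OVER) keeps every input row and attaches the count to each of them, which is what the requested output needs.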
Using just ROW_NUMBER() analytic function and Max() aggregate function...
-- S Q L --
Select
r.ID, r.A_TIME, r.CODE, t.A_GROUP, r.A_VALUE, MAX_RN "COUNT_DIST_GROUP", (TOTAL_BLOCKS / MAX_RN) "BLOCKS"
From
( SELECT ID, A_TIME, CODE, A_VALUE, Max(RN) "MAX_RN"
FROM (Select ID, A_TIME, CODE, A_VALUE, Row_Number() OVER(Partition By CODE Order By CODE, A_GROUP) "RN"
From tbl )
GROUP BY ID, A_TIME, CODE, A_VALUE ) r
Inner Join tbl t ON(t.CODE = r.CODE)
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
... and it works with another group of similar data:
WITH -- S a m p l e d a t a
tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
Select 1, 22, 32206, 'mn4', 1, 200 From Dual Union All
--
Select 1, 22, 32207, 'mn6', 1, 450 From Dual Union All
Select 1, 22, 32207, 'mn7', 1, 450 From Dual Union All
Select 1, 22, 32207, 'mn8', 1, 450 From Dual
)
-- S Q L --
Select
r.ID, r.A_TIME, r.CODE, t.A_GROUP, r.A_VALUE, MAX_RN "COUNT_DIST_GROUP", (TOTAL_BLOCKS / MAX_RN) "BLOCKS"
From
( SELECT ID, A_TIME, CODE, A_VALUE, Max(RN) "MAX_RN"
FROM (Select ID, A_TIME, CODE, A_VALUE, Row_Number() OVER(Partition By CODE Order By CODE, A_GROUP) "RN"
From tbl )
GROUP BY ID, A_TIME, CODE, A_VALUE ) r
Inner Join tbl t ON(t.CODE = r.CODE)
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
1 22 32207 mn6 1 3 150
1 22 32207 mn7 1 3 150
1 22 32207 mn8 1 3 150
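One caveat worth checking: Max(RN) counts rows per CODE, which equals the number of distinct groups only while A_GROUP never repeats within a CODE. If repeats are possible, a sketch using Dense_Rank() instead of Row_Number() might look like this. The third sample row is a made-up duplicate, added only to show the difference:

```sql
WITH tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
  Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
  Select 1, 22, 32206, 'mn4', 1, 200 From Dual Union All
  Select 1, 22, 32206, 'mn4', 1, 200 From Dual   -- hypothetical duplicate group
)
Select ID, A_TIME, CODE, A_GROUP, A_VALUE,
       -- highest dense rank per CODE = number of distinct A_GROUP values
       Max(DR) OVER(Partition By CODE) "COUNT_DIST_GROUP",
       TOTAL_BLOCKS / Count(*) OVER(Partition By CODE) "BLOCKS"
From ( Select t.*,
              Dense_Rank() OVER(Partition By CODE Order By A_GROUP) "DR"
       From tbl t )
```

Here the distinct count stays at 2 even though the CODE has three rows, because equal A_GROUP values share the same dense rank.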

Select top 10 records of a certain key sql

I have 2 tables connected with each other; for simplicity, let's say the ID of the 1st is connected to UserCreated_FK of the 2nd table.
Table A:
ID   NAME
----------
237  Gal
240  blah
250  blah2
In the parallel table (aka table B) I have the following:
UserCreated_FK       col2
----------------------------------------
237                  10/10
20 more rows of 237  20 more rows of 237
240                  11/10
5 more rows of 240   20 more rows of 240
250                  12/10
14 more rows of 250  20 more rows of 250
Result wanted:
id 237: the last 10 records (they can be sorted by a date column I have).
no 240 (because it has fewer than 10).
id 250: the last 10 records (they can be sorted by a date column I have).
My task is to check which of those IDs (person IDs) have more than 10 records in table B (table A holds just the IDs and names, not the actual records).
So, for example, if ID 237 has 19 records in table B, I want to be able to pull the last 10 records of that user.
Again, the condition is: only users that have more than 10 records, and then pull the last 10 of those.
I hope that was understandable. Thanks a lot, any help will be appreciated!
You can join the two tables and then use analytic functions:
SELECT *
FROM (
SELECT a.id,
a.name,
b.col2,
b.datetime,
COUNT(*) OVER (PARTITION BY a.id) AS b_count,
ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.datetime DESC) AS rn
FROM table_a a
INNER JOIN table_b b
ON (a.id = b.user_created_fk)
)
WHERE b_count >= 10
AND rn <= 10;
If you just want the values from table_b then you do not need to use table_a:
SELECT *
FROM (
SELECT b.*,
COUNT(*) OVER (PARTITION BY b.user_created_fk) AS b_count,
ROW_NUMBER() OVER (
PARTITION BY b.user_created_fk ORDER BY b.datetime DESC
) AS rn
FROM table_b b
)
WHERE b_count >= 10
AND rn <= 10;
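The question says "more than 10", which read strictly means > 10 rather than >= 10. The sketch below uses made-up rows to show where the two differ: user 250 is given exactly 10 rows here (unlike in the question) so that it sits on the boundary. The table and column names follow the queries above.

```sql
WITH table_b (user_created_fk, datetime) AS (
  SELECT 237, SYSDATE - LEVEL FROM dual CONNECT BY LEVEL <= 20
  UNION ALL
  SELECT 240, SYSDATE - LEVEL FROM dual CONNECT BY LEVEL <= 5
  UNION ALL
  SELECT 250, SYSDATE - LEVEL FROM dual CONNECT BY LEVEL <= 10  -- boundary case
)
SELECT user_created_fk,
       COUNT(*) AS rows_returned
FROM   (
  SELECT b.*,
         COUNT(*) OVER (PARTITION BY b.user_created_fk) AS b_count,
         ROW_NUMBER() OVER (
           PARTITION BY b.user_created_fk ORDER BY b.datetime DESC
         ) AS rn
  FROM   table_b b
)
WHERE  b_count > 10   -- strict: a user with exactly 10 rows is excluded
AND    rn <= 10
GROUP BY user_created_fk;
```

With the strict comparison only user 237 comes back; changing it to b_count >= 10 would also return user 250. Pick whichever matches the intended rule.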
The WITH clause is here just to generate some sample data and it is not a part of the answer.
WITH
tbl_a AS
(
Select 237 "ID", 'Gal' "A_NAME" From Dual Union All
Select 240 "ID", 'blah' "A_NAME" From Dual Union All
Select 250 "ID", 'blah2' "A_NAME" From Dual
),
tbl_b AS
(
Select 237 "FK_ID", SYSDATE - (21 - LEVEL) "A_DATE" From Dual Connect By LEVEL <= 20 Union All
Select 240 "FK_ID", SYSDATE - (6 - LEVEL) "A_DATE" From Dual Connect By LEVEL <= 5 Union All
Select 250 "FK_ID", SYSDATE - (15 - LEVEL) "A_DATE" From Dual Connect By LEVEL <= 14
)
Select
a.ID,
a.A_NAME,
b.TOTAL_COUNT "TOTAL_COUNT",
b.RN "RN",
b.A_DATE
From
tbl_a a
Inner Join
( Select FK_ID, A_DATE,
ROW_NUMBER() OVER(Partition By FK_ID Order By FK_ID, A_DATE DESC) "RN",
Count(*) OVER(Partition By FK_ID) "TOTAL_COUNT"
From tbl_b ) b ON(b.FK_ID = a.ID)
WHERE
b.RN <= 10 And b.TOTAL_COUNT > 10
With your sample data the main SQL joins tbl_a with a subquery by FK_ID = ID. The subquery uses analytic functions to get total rows per FK_ID (in tbl_b) and gives row numbers (partitioned by FK_ID) to the dataset. All the rest is in the Where clause.
Result:
/*
ID A_NAME TOTAL_COUNT RN A_DATE
---------- ------ ----------- ---------- ---------
237 Gal 20 1 29-OCT-22
237 Gal 20 2 28-OCT-22
237 Gal 20 3 27-OCT-22
237 Gal 20 4 26-OCT-22
237 Gal 20 5 25-OCT-22
237 Gal 20 6 24-OCT-22
237 Gal 20 7 23-OCT-22
237 Gal 20 8 22-OCT-22
237 Gal 20 9 21-OCT-22
237 Gal 20 10 20-OCT-22
250 blah2 14 1 29-OCT-22
250 blah2 14 2 28-OCT-22
250 blah2 14 3 27-OCT-22
250 blah2 14 4 26-OCT-22
250 blah2 14 5 25-OCT-22
250 blah2 14 6 24-OCT-22
250 blah2 14 7 23-OCT-22
250 blah2 14 8 22-OCT-22
250 blah2 14 9 21-OCT-22
250 blah2 14 10 20-OCT-22
*/
Regards...

Procedure for adding one column from one table to another table and its data

I have two tables table1 and table2.
Table 1 has three columns
id name age
----------------
1 ram 27
2 rafi 30
Table 2-
no place
--------------
101 agra
102 delhi
103 chennai
104 hyd
In this situation I want to create a procedure that adds the no column of table2 to the id column of table1 and copies the remaining data unchanged. If table2 has more rows than table1, the data should be repeated as shown below:
id name age
-----------------
1 ram 27
2 rafi 30
101 ram 27
102 rafi 30
103 ram 27
104 rafi 30
Please help
Assuming table1.id is a monotonically increasing series starting at 1, this simple trick will work:
insert into table1
select sq2.no#
, t1.name
, t1.age
from table1 t1
inner join (
select t2.no#
, mod(rownum - 1, sq.cnt)+1 as mod# -- zero-based modulo, so the first table2 row pairs with id 1
from table2 t2
cross join (select count(*) as cnt from table1) sq
) sq2 on sq2.mod# = t1.id
/
There is a demo on db<>fiddle.
If table1.id contains gaps, or does not start from 1, you will need to replace table1 in the FROM clause above with another subquery ...
(select t1.*
, row_number() over (order by t1.id) as mod#
from table1 t1 ) sq1
... and inner join that to the existing subquery on mod#.
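Putting those two pieces together, a sketch of the complete statement could look like the following. The column names (no#, mod#) follow the answer above. As my own assumption, table2 is numbered with row_number() over an explicit ORDER BY instead of rownum so that the pairing is deterministic, and the modulo is taken on a zero-based position so that the first table2 row pairs with the first table1 row:

```sql
-- Sketch only: pairs table2 rows with table1 rows in round-robin order
insert into table1
select sq2.no#
     , sq1.name
     , sq1.age
from ( select t1.*
            , row_number() over (order by t1.id) as mod#   -- gap-free 1..cnt
       from table1 t1 ) sq1
inner join (
       select t2.no#
            , mod(row_number() over (order by t2.no#) - 1, sq.cnt) + 1 as mod#
       from table2 t2
       cross join (select count(*) as cnt from table1) sq
     ) sq2 on sq2.mod# = sq1.mod#
/
```

With the sample data this should pair 101 and 103 with ram and 102 and 104 with rafi, which is the layout requested in the question.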
You can achieve it using the ROW_NUMBER analytic function with a join on the MOD function, as follows:
WITH TABLE_1(ID, NAME, AGE)
AS (SELECT 1, 'ram', 27 FROM DUAL UNION ALL
    SELECT 2, 'rafi', 30 FROM DUAL),
TABLE_2(NO, PLACE)
AS (SELECT 101, 'agra' FROM DUAL UNION ALL
    SELECT 102, 'delhi' FROM DUAL UNION ALL
    SELECT 103, 'chennai' FROM DUAL UNION ALL
    SELECT 104, 'hyd' FROM DUAL)
-- YOUR QUERY STARTS FROM HERE
SELECT ID, NAME, AGE FROM TABLE_1
UNION ALL
SELECT T2.NO, T1.NAME, T1.AGE
FROM
  (SELECT T.*,
          ROW_NUMBER() OVER( ORDER BY ID DESC) AS R, -- position within TABLE_1
          COUNT(1) OVER() C                          -- total rows in TABLE_1
   FROM TABLE_1 T ) T1
--
JOIN (SELECT T.*,
             ROW_NUMBER() OVER( ORDER BY NO ) AS R,
             COUNT(1) OVER() C
      FROM TABLE_2 T ) T2
-- cycle through the TABLE_1 rows for successive TABLE_2 rows
ON ( MOD(T2.R, T1.C) = T1.R - 1 )
ORDER BY ID;
ID NAME AGE
---------- ---- ----------
1 ram 27
2 rafi 30
101 ram 27
102 rafi 30
103 ram 27
104 rafi 30
6 rows selected.
SQL>
Cheers!!

Oracle Finding a string match from multiple database tables

This is somewhat a complex problem to describe, but I'll try to explain it with an example. I thought I would have been able to use the Oracle Instr function to accomplish this, but it does not accept queries as parameters.
Here is a simplification of my data:
Table1
Person Qualities
Joe 5,6,7,8,9
Mary 7,8,10,15,20
Bob 7,8,9,10,11,12
Table2
Id Desc
5 Nice
6 Tall
7 Short
Table3
Id Desc
8 Angry
9 Sad
10 Fun
Table4
Id Desc
11 Boring
12 Happy
15 Cool
20 Mad
Here is somewhat of a query to give an idea of what I'm trying to accomplish:
select * from table1
where instr (Qualities, select Id from table2, 1,1) <> 0
and instr (Qualities, select Id from table3, 1,1) <> 0
and instr (Qualities, select Id from table4, 1,1) <> 0
I'm trying to figure out which people have at least 1 quality from each of the 3 groups of qualities (tables 2,3, and 4)
So Joe would not be returned in the results because he does not have a quality from each of the 3 groups, but Mary and Bob would since they have at least 1 quality from each group.
We are running Oracle 12, thanks!
Here's one option:
SQL> with
2 table1 (person, qualities) as
3 (select 'Joe', '5,6,7,8,9' from dual union all
4 select 'Mary', '7,8,10,15,20' from dual union all
5 select 'Bob', '7,8,9,10,11,12' from dual
6 ),
7 table2 (id, descr) as
8 (select 5, 'Nice' from dual union all
9 select 6, 'Tall' from dual union all
10 select 7, 'Short' from dual
11 ),
12 table3 (id, descr) as
13 (select 8, 'Angry' from dual union all
14 select 9, 'Sad' from dual union all
15 select 10, 'Fun' from dual
16 ),
17 table4 (id, descr) as
18 (select 11, 'Boring' from dual union all
19 select 12, 'Happy' from dual union all
20 select 15, 'Cool' from dual union all
21 select 20, 'Mad' from dual
22 ),
23 t1new (person, id) as
24 (select person, regexp_substr(qualities, '[^,]+', 1, column_value) id
25 from table1 cross join table(cast(multiset(select level from dual
26 connect by level <= regexp_count(qualities, ',') + 1
27 ) as sys.odcinumberlist))
28 )
29 select a.person,
30 count(b.id) bid,
31 count(c.id) cid,
32 count(d.id) did
33 from t1new a left join table2 b on a.id = b.id
34 left join table3 c on a.id = c.id
35 left join table4 d on a.id = d.id
36 group by a.person
37 having ( count(b.id) > 0
38 and count(c.id) > 0
39 and count(d.id) > 0
40 );
PERS BID CID DID
---- ---------- ---------- ----------
Bob 1 3 2
Mary 1 2 2
SQL>
What does it do?
lines #1 - 22 represent your sample data
the T1NEW CTE (lines #23 - 28) splits the comma-separated qualities into rows, one row per quality for every person
the final select (lines #29 - 40) outer joins t1new with each of the "description" tables (table2/3/4) and counts how many of each person's qualities (the rows from t1new) are found in each of them
the having clause returns only the desired persons: each of those counts has to be a positive number
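To see the splitting step in isolation, here is a minimal sketch that splits a single literal string (Joe's list from the question) rather than a table column. It uses the same regexp_substr and regexp_count idea as the T1NEW CTE:

```sql
-- One output row per comma-separated element
select regexp_substr('5,6,7,8,9', '[^,]+', 1, level) as id
from dual
connect by level <= regexp_count('5,6,7,8,9', ',') + 1;
```

This returns five rows (5, 6, 7, 8 and 9). The cross join with table(cast(multiset(...))) in the full query is what makes the same idea work per row of table1, instead of for one fixed string.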
Maybe this will help:
{1} Create a view that categorises all qualities and allows you to SELECT quality IDs and categories. {2} JOIN the view to TABLE1 and use a join condition that "splits" the CSV value stored in TABLE1.
{1} View
create or replace view allqualities
as
select 1 as category, id as qid, descr from table2
union
select 2, id, descr from table3
union
select 3, id, descr from table4
;
select * from allqualities order by category, qid ;
CATEGORY QID DESCR
---------- ---------- ------
1 5 Nice
1 6 Tall
1 7 Short
2 8 Angry
2 9 Sad
2 10 Fun
3 11 Boring
3 12 Happy
3 15 Cool
3 20 Mad
{2} Query
-- JOIN CONDITION:
-- {1} add a comma at the start and at the end of T1.qualities
-- {2} remove all blanks (spaces) from T1.qualities
-- {3} use LIKE and the qid (of allqualities), wrapped in commas
--
-- inline view: use UNIQUE, otherwise we may get counts > 3
--
select person
from (
select unique person, category
from table1 T1
join allqualities A
on ',' || replace( T1.qualities, ' ', '' ) || ',' like '%,' || A.qid || ',%'
)
group by person
having count(*) = ( select count( distinct category ) from allqualities )
;
-- result
PERSON
Bob
Mary
Tested w/ Oracle 18c and 11g.
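A quick sketch of why the join condition wraps both sides in commas: without the delimiters, a short id matches inside longer ones. The id 1 below is a hypothetical value, chosen only because it appears as a digit inside 10 and 15 in Mary's list:

```sql
select case when '7,8,10,15,20' like '%1%'
            then 'match' else 'no match' end as naive,
       case when ',' || '7,8,10,15,20' || ',' like '%,' || 1 || ',%'
            then 'match' else 'no match' end as delimited
from dual;
```

The naive test reports a match although 1 is not one of the listed qualities, while the comma-delimited test reports no match.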

Find closest or higher values in SQL

I have a table:
table1
rank value
1 10
25 120
29 130
99 980
I have to generate the following table:
table2
rank value
1 10
2 10
3 10
4 10
5 10
6 10
7 10
8 10
9 10
10 10
11 10
12 10
13 120
14 120
15 120
.
.
.
.
25 120
26 120
27 130
28 130
29 130
30 130
.
.
.
.
.
62 980
63 980
.
.
.
99 980
100 980
So, table2 should have all values from 1 to 100. There are 3 cases:
If it's an exact match, e.g. rank 25, the value would be 120.
Otherwise find the closest rank: e.g. for rank 9 in table2 we do NOT have an exact match, but 1 is closest to 9 (9-1 = 8 whereas 25-9 = 16), so assign the value of rank 1, which is 10.
If the distance is equal on both sides, use the value of the higher rank: e.g. for rank 27, ranks 25 and 29 are equally distant, so take the higher rank, 29, and assign its value, 130.
something like
-- your testdata
with table1(rank,
value) as
(select 1, 10
from dual
union all
select 25, 120
from dual
union all
select 29, 130
from dual
union all
select 99, 980
from dual),
-- range 1..100
data(rank) as
(select level from dual connect by level <= 100)
select d.rank,
min(t.value) keep(dense_rank first order by abs(t.rank - d.rank) asc, t.rank desc)
from table1 t, data d
group by d.rank;
If I understand well your need, you could use the following:
select num, value
from (
select num, value, row_number() over (partition by num order by abs(num-rank) asc, rank desc) as rn
from table1
cross join ( select level as num from dual connect by level <= 100) numbers
)
where rn = 1
This cross joins your table with the [1,100] interval and then keeps only the first row for each number, ordering by the absolute difference and, in case of equal difference, preferring the greater rank.
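To check the ordering rule on the interesting cases only, here is a sketch of the same query restricted to three hand-picked numbers: 9 (a clear nearest rank) plus 13 and 27 (both exactly halfway between two ranks). The column is named rnk here instead of rank purely as a precaution of mine, and the three-number list is an illustration, not part of the original query:

```sql
with table1 (rnk, value) as (
  select 1, 10 from dual union all
  select 25, 120 from dual union all
  select 29, 130 from dual union all
  select 99, 980 from dual
)
select num, value
from (
  select n.num, t.value,
         -- nearest rank first; on a tie, the higher rank wins
         row_number() over (partition by n.num
                            order by abs(n.num - t.rnk) asc, t.rnk desc) as rn
  from table1 t
  cross join ( select 9 as num from dual union all
               select 13 from dual union all
               select 27 from dual ) n
)
where rn = 1
order by num;
```

This should give 10 for 9, 120 for 13 and 130 for 27, which matches the expected table in the question.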
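Join a hierarchical number generator with your table and use lag() with the ignore nulls clause. Note that this carries the most recent known value forward rather than picking the nearest rank, as the rows for ranks 27 and 98 in the test output below show: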
select h.rank, case when value is null
then lag(value ignore nulls) over (order by h.rank)
else value
end value
from (select level rank from dual connect by level <= 100) h
left join t on h.rank = t.rank
order by h.rank
Test:
with t(rank, value) as (
select 1, 10 from dual union all
select 25, 120 from dual union all
select 29, 130 from dual union all
select 99, 980 from dual )
select h.rank, case when value is null
then lag(value ignore nulls) over (order by h.rank)
else value
end value
from (select level rank from dual connect by level <= 100) h
left join t on h.rank = t.rank
order by h.rank
RANK VALUE
---------- ----------
1 10
2 10
...
24 10
25 120
26 120
27 120
28 120
29 130
30 130
...
98 130
99 980
100 980
Here's an alternative that doesn't need a cross join (but does use a couple of analytic functions, so you'd need to test whether this is more performant for your set of data than the other solutions):
WITH sample_data AS (SELECT 1 rnk, 10 VALUE FROM dual UNION ALL
SELECT 25 rnk, 120 VALUE FROM dual UNION ALL
SELECT 29 rnk, 130 VALUE FROM dual UNION ALL
SELECT 99 rnk, 980 VALUE FROM dual)
SELECT rnk + LEVEL - 1 rnk,
CASE WHEN rnk + LEVEL - 1 < rnk + (next_rank - rnk)/2 THEN
VALUE
ELSE next_value
END VALUE
FROM (SELECT rnk,
VALUE,
LEAD(rnk, 1, 100 + 1) OVER (ORDER BY rnk) next_rank,
LEAD(VALUE, 1, VALUE) OVER (ORDER BY rnk) next_value
FROM sample_data)
CONNECT BY PRIOR rnk = rnk
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= next_rank - rnk;
RNK VALUE
---------- ----------
1 10
2 10
... ...
12 10
13 120
... ...
24 120
25 120
26 120
27 130
28 130
29 130
30 130
... ...
63 130
64 980
65 980
... ...
98 980
99 980
100 980
N.B. I'm not sure why you have ranks 62 and 63 with a value of 980: the midpoint between 29 and 99 is 64.
Also, you'll see that I've used 100 + 1 instead of 101. This is because, if you wanted to parameterise things, you would replace 100 with the parameter, e.g. v_max_rank + 1.
You can use a cross join. Note that CROSS JOIN does not accept an ON clause, so the condition goes in the WHERE clause:
select t1.* from table1 t1
cross join
(select * from table1) t2
where t1.rank = t2.rank;