Strange random behavior in where clause - sql

I have a table like this:
Id | GroupId | Category
------------------------
1 | 101 | A
2 | 101 | B
3 | 101 | C
4 | 103 | B
5 | 103 | D
6 | 103 | A
........................
I need to select one of the GroupId values at random. For this I have used the following PL/SQL block:
declare v_group_count number;
v_group_id number;
begin
select count(distinct GroupId) into v_group_count from MyTable;
SELECT GroupId into v_group_id FROM
(
SELECT GroupId, ROWNUM RN FROM
(SELECT DISTINCT GroupId FROM MyTable)
)
WHERE RN=Round(dbms_random.value(1, v_group_count));
end;
Because I round the random value, it is always an integer, so the condition WHERE RN=Round(dbms_random.value(1, v_group_count)) should always return exactly one row. It usually does, as expected. But strangely, it sometimes returns no rows and sometimes returns two rows. That is why this part raises an error:
SELECT GroupId into v_group_id
Does anyone know the reason for this behaviour?

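round(dbms_random.value(1, v_group_count)) is evaluated once for every row, so each row is compared against a different random number. Each row may or may not satisfy the predicate independently of the others, which means the query can return zero, one, or several rows.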
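The effect of per-row evaluation can be illustrated outside the database. What follows is a minimal Python simulation, not Oracle code: `oracle_round`, `matching_rows` and `simulate` are hypothetical names, and `random.uniform` only stands in for dbms_random.value. It counts how many rows pass the predicate when a fresh random number is drawn for each row.

```python
import math
import random
from collections import Counter


def oracle_round(x):
    # Round half up; for the positive values used here this matches ROUND.
    return math.floor(x + 0.5)


def matching_rows(group_count, rng):
    # The predicate is re-evaluated for every row, so each row number
    # is compared against its own freshly drawn random integer.
    return [rn for rn in range(1, group_count + 1)
            if rn == oracle_round(rng.uniform(1, group_count))]


def simulate(group_count=6, trials=10_000, seed=0):
    # Tally how many rows matched in each simulated execution.
    rng = random.Random(seed)
    return Counter(len(matching_rows(group_count, rng)) for _ in range(trials))


counts = simulate()
for n_rows in sorted(counts):
    print(n_rows, "matching rows in", counts[n_rows], "runs")
```

Under these assumptions a sizeable share of the runs match no row or more than one row, which is the situation that makes SELECT ... INTO fail.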
P.s.
ROUND is a bad choice.
The probability of getting any of the edge values (e.g. 1 and 10) is half the probability of getting any other value (e.g. 2 to 9).
It is 0.0555... (1/18) Vs. 0.111... (1/9)
[ 1,1.5) --> 1
[1.5,2.5) --> 2
.
.
.
[8.5,9.5) --> 9
[9.5, 10) --> 10
select n,count(*)
from (select round(dbms_random.value(1, 10)) as n
from dual
connect by level <= 100000
)
group by n
order by n
;
N COUNT(*)
1 5488
2 11239
3 11236
4 10981
5 11205
6 11114
7 11211
8 11048
9 10959
10 5519
My recommendation is to use FLOOR on dbms_random.value(1,N+1)
select n,count(*)
from (select floor(dbms_random.value(1, 11)) as n
from dual
connect by level <= 100000
)
group by n
order by n
;
N COUNT(*)
1 10091
2 10020
3 10020
4 10021
5 9908
6 10036
7 10054
8 9997
9 9846
10 10007
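The same comparison can be made exactly rather than by sampling. This is a small Python sketch under the assumption that the random value is uniform on [1, N) for the ROUND variant and on [1, N+1) for the FLOOR variant; the function names are hypothetical. Each probability is the width of the interval that maps to a value, divided by the width of the whole range.

```python
from fractions import Fraction


def round_probabilities(n):
    # x is uniform on [1, n); ROUND maps [k - 0.5, k + 0.5) to k,
    # clipped to the range, so the two edge values get half-width intervals.
    width = Fraction(n - 1)
    probs = {}
    for k in range(1, n + 1):
        low = max(Fraction(1), k - Fraction(1, 2))
        high = min(Fraction(n), k + Fraction(1, 2))
        probs[k] = (high - low) / width
    return probs


def floor_probabilities(n):
    # x is uniform on [1, n + 1); FLOOR maps [k, k + 1) to k,
    # so every value gets an interval of width 1 out of n.
    return {k: Fraction(1, n) for k in range(1, n + 1)}


rounded = round_probabilities(10)
print(rounded[1], rounded[2], rounded[10])  # 1/18 1/9 1/18
print(floor_probabilities(10)[1])           # 1/10
```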

If you want to select one randomly:
declare v_group_count number;
v_group_id number;
begin
SELECT GroupId into v_group_id
FROM (SELECT DISTINCT GroupId
FROM MyTable
ORDER BY dbms_random.value
) t
WHERE rownum = 1;
end;

Related

How to generate a dynamic sequence in Oracle

I have a table A which represents a valid sequence of numbers, which looks something like this:
| id | start | end | step |
|----|-------|-------|------|
| 1 | 4000 | 4999 | 4 |
| 2 | 3 | 20000 | 1 |
A[1] thus represents the sequence [4000, 4004, 4008, ...4996]
and another B of "occupied" numbers that looks like this:
| id | number | ... |
|-----|--------|-----|
| 1 | 4000 | ... |
| 2 | 4003 | ... |
| ... | ... | ... |
I want to construct a query which using A and B, finds the first unoccupied number for a particular sequence.
What I have been trying, and failing, to do is generate a list of valid numbers from a row in A and then left outer join table B on B.number = valid_number where B.id is null, from which result I could then select min(...).
How about this?
I simplified your test case (END value isn't that high) in order to save space (otherwise, I'd have to use smaller font :)).
What does it do?
CTEs A and B are your sample data
FULL_ASEQ creates a sequence of numbers from table A
if you want to see what it returns, remove everything from line #17 onwards and run select * from full_aseq instead
the final query returns the first available sequence number, i.e. the one that hasn't been used yet (lines #19 - 23).
Here you go:
SQL> with
2 a (id, cstart, cend, step) as
3 (select 1, 4000, 4032, 4 from dual union all
4 select 2, 3, 20, 1 from dual
5 ),
6 b (id, cnumber) as
7 (select 1, 4000 from dual union all
8 select 1, 4004 from dual union all
9 select 2, 4003 from dual
10 ),
11 full_aseq as
12 (select a.id, a.cstart + (column_value - 1) * a.step seq_val
13 from a cross join table(cast(multiset(select level from dual
14 connect by level <= (a.cend - a.cstart) / a.step + 1
15 ) as sys.odcinumberlist))
16 )
17 select f.id, min(f.seq_val) min_seq_val
18 from full_aseq f
19 where not exists (select null
20 from b
21 where b.id = f.id
22 and b.cnumber = f.seq_val
23 )
24 group by f.id;
ID MIN_SEQ_VAL
---------- -----------
1 4008
2 3
SQL>
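The idea behind the query (enumerate the valid values of one sequence, discard the occupied ones, take the smallest that remains) can be sketched in a few lines of Python. This is only an illustration: `first_free` is a hypothetical helper, and it assumes that `end` is inclusive and that the occupied numbers are passed in as a plain collection, as in table B of the question.

```python
def first_free(start, end, step, occupied):
    # Walk the valid values start, start + step, ... up to end (inclusive)
    # and return the first one that is not occupied.
    taken = set(occupied)
    for value in range(start, end + 1, step):
        if value not in taken:
            return value
    return None  # every valid value is already occupied


# Sequence A[1] from the question, with 4000 and 4003 occupied.
print(first_free(4000, 4999, 4, [4000, 4003]))  # 4004
```

Note that 4003 is not a valid value of that sequence, so it has no effect on the result.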
You can use LEAD to compute the difference between ordered rows in table B. Any row having a difference (to the next row) that exceeds the step value for that sequence is a gap.
Here's that concept, implemented (below). I threw in a sequence ID "3" that has no values in table B, to illustrate that it generates the proper first value.
with
a (id, cstart, cend, step) as
(select 1, 4000, 4032, 4 from dual union all
select 2, 3, 20000, 1 from dual union all
select 3, 100, 200, 3 from dual
),
b (id, cnumber) as
(select 1, 4000 from dual union all
select 1, 4004 from dual union all
select 1, 4012 from dual union all
select 2, 4003 from dual
),
work1 as (
select a.id,
b.cnumber cnumber,
lead(b.cnumber,1) over ( partition by b.id order by b.cnumber ) - b.cnumber diff,
a.step,
a.cstart,
a.cend
from a left join b on b.id = a.id )
select w1.id,
CASE WHEN min(w1.cnumber) is null THEN w1.cstart
WHEN min(w1.cnumber)+w1.step < w1.cend THEN min(w1.cnumber)+w1.step
ELSE null END next_cnumber
from work1 w1
where ( diff is null or diff > w1.step )
group by w1.id, w1.step, w1.cstart, w1.cend
order by w1.id
+----+--------------+
| ID | NEXT_CNUMBER |
+----+--------------+
| 1 | 4008 |
| 2 | 4004 |
| 3 | 100 |
+----+--------------+
You can further improve the results by excluding rows in table B that are impossible for the sequence. E.g., exclude a row for ID #1 having a value of, say, 4007.
I'll ask the obvious question: why not use an actual sequence?
SQL> set timing on
SQL> CREATE SEQUENCE SEQ_TEST_A
START WITH 4000
INCREMENT BY 4
MINVALUE 4000
MAXVALUE 4999
NOCACHE
NOCYCLE
ORDER
Sequence created.
Elapsed: 00:00:01.09
SQL> CREATE SEQUENCE SEQ_TEST_B
START WITH 3
INCREMENT BY 1
MINVALUE 3
MAXVALUE 20000
NOCACHE
NOCYCLE
ORDER
Sequence created.
Elapsed: 00:00:00.07
SQL> -- get nextvals from A
SQL> select seq_test_a.nextval from dual
NEXTVAL
----------
4000
1 row selected.
Elapsed: 00:00:00.09
SQL> select seq_test_a.nextval from dual
NEXTVAL
----------
4004
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_a.nextval from dual
NEXTVAL
----------
4008
1 row selected.
Elapsed: 00:00:00.08
SQL> -- get nextvals from B
SQL> select seq_test_b.nextval from dual
NEXTVAL
----------
3
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_b.nextval from dual
NEXTVAL
----------
4
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_b.nextval from dual
NEXTVAL
----------
5
1 row selected.
Elapsed: 00:00:00.08

Oracle: Get the smaller values and the first greater value

I have a table like this:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
4 Sample4 40
And I would like to get all of the rows that contain smaller values, plus the first row that contains a greater value.
For example, when I pass 25 as the parameter for the Value column, I want to get the following table:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
I'm stuck at this point, thanks in advance.
Analytic functions to the rescue!
create table your_table (
id number,
value number);
insert into your_table
select level, level * 10
from dual
connect by level <= 5;
select * from your_table;
id | value
----+------
1 | 10
2 | 20
3 | 30
4 | 40
5 | 50
Ok, now we use lag(). Specify field, offset and the default value (for the first row that has no previous one).
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table
id | value | previous_value
---+-------+---------------
1 | 10 | 10
2 | 20 | 10
3 | 30 | 20
4 | 40 | 30
5 | 50 | 40
Now apply where.
select id, value
from (
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table)
where previous_value < 25
Works for me.
id | value
----+------
1 | 10
2 | 20
3 | 30
Of course you have to have some policy on ties. For example, what happens if two rows have the same value and they are both the first greater row: do you want to keep both or only one of them? Or maybe you have some other criterion for breaking the tie (say, sort by id). But the idea is fairly simple.
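To make the lag() logic and the tie question concrete, here is a rough Python equivalent. It is a sketch only: `smaller_and_first_greater` is a hypothetical name, rows are assumed to be (id, value) tuples, and ties are broken by input order because Python's sort is stable, whereas the SQL version leaves the order of tied rows unspecified.

```python
def smaller_and_first_greater(rows, threshold):
    # rows: iterable of (id, value) tuples.
    ordered = sorted(rows, key=lambda r: r[1])
    result = []
    previous_value = None
    for row in ordered:
        # Like lag(value, 1, value): the first row falls back to its own value.
        prev = row[1] if previous_value is None else previous_value
        if prev < threshold:
            result.append(row)
        previous_value = row[1]
    return result


table = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
print(smaller_and_first_greater(table, 25))  # [(1, 10), (2, 20), (3, 30)]

# With a tie on the first greater value, only one of the tied rows is kept.
print(smaller_and_first_greater([(1, 10), (2, 30), (3, 30)], 25))  # [(1, 10), (2, 30)]
```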
You can try a query like this (note that TOP 1 is SQL Server syntax):
SELECT * FROM YourTableName WHERE Value < 25 OR ID IN (SELECT TOP 1 ID FROM YourTableName WHERE Value >= 25 ORDER BY Value)
In Oracle, you can try this (but see That Young Man's answer, which I think is better than mine):
SELECT * FROM (
SELECT ID, NAME, VALUE, 1 AS RN
FROM YT
WHERE VALUE < 25
UNION ALL
SELECT ID, NAME, VALUE, ROW_NUMBER()OVER (ORDER BY VALUE) AS RN
FROM YT
WHERE VALUE > 25
) A
WHERE RN=1;

Loop over "index table" to find out value of each row in Hive

I have two tables.
table AAA
userid exp
1 100
2 235325
3 3242
4 32543
table BBB
level levelup_exp
1 10
2 100
3 1000
4 10000
5 100000
6 1000000
To figure out a user's level, I need to loop over table BBB (ordered by level DESC) and compare AAA.exp with BBB.levelup_exp: if exp is greater than levelup_exp but less than or equal to the next levelup_exp, the user's level is found.
So the output should be like this:
userid level
1 2
2 6
3 3
4 4
How can this be accomplished by using HiveQL?
Thanks in advance.
Here is a solution you can try; use userid and next_level as the output. Note, however, that your expected output is inconsistent with the rule you describe.
select userid , level, tab1.exp , levelup_exp ,next ,next_level
from tab1 as tab1 , (
select level, levelup_exp
,lead(levelup_exp, 1) over (order by level) as next
,lead(level, 1) over (order by level) as next_level
from tab2) as tab2x
where tab1.exp > levelup_exp
and tab1.exp <= next
order by userid
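The range condition in that query (exp greater than one threshold and at most the next one) amounts to a sorted lookup. Here is a minimal Python sketch of the same rule using `bisect`. `LEVELS` copies table BBB from the question, and `next_level` is a hypothetical name. It returns None in the two cases where the query would return no row.

```python
from bisect import bisect_left

# (level, levelup_exp) pairs from table BBB, sorted by levelup_exp.
LEVELS = [(1, 10), (2, 100), (3, 1000), (4, 10000), (5, 100000), (6, 1000000)]


def next_level(exp, levels=LEVELS):
    thresholds = [levelup_exp for _, levelup_exp in levels]
    # Index of the first threshold that is >= exp.
    i = bisect_left(thresholds, exp)
    # i == 0: exp is not above any threshold.
    # i == len(thresholds): there is no next threshold to compare with.
    if i == 0 or i == len(thresholds):
        return None
    return levels[i][0]


for userid, exp in [(1, 100), (2, 235325), (3, 3242), (4, 32543)]:
    print(userid, next_level(exp))
```

For users 3 and 4 this rule gives levels 4 and 5 rather than the 3 and 4 listed in the question, which is the inconsistency mentioned above.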

Updating column based on another column's value

How do I update a table structured like this:
id[pkey] | parent_id | position
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 3
11 3
12 3
...and so on
to achieve this result:
id[pkey] | parent_id | position
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 2 1
7 2 2
8 2 3
9 2 4
10 3 1
11 3 2
12 3 3
...and so on
I was thinking about somehow mixing
SELECT DISTINCT parent_id FROM cats AS t;
with
CREATE SEQUENCE dpos;
UPDATE cats t1 SET position = nextval('dpos') WHERE t.parent_id = t1.parent_id;
DROP SEQUENCE dpos;
although I'm not really experienced with Postgres and not sure how to use some kind of FOREACH. I appreciate any help.
You can get the incremental number using row_number(). The question is how to assign it to a particular row. Here is one method using a join:
update cats
set position = c2.seqnum
from (select c2.*, c2.ctid as c_ctid,
row_number() over (partition by c2.parent_id order by NULL) as seqnum
from cats c2
) c2
where cats.parent_id = c2.parent_id and cats.ctid = c2.c_ctid;
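What row_number() over (partition by parent_id ...) computes can be sketched in plain Python as one running counter per parent. This is an illustration only: `assign_positions` is a hypothetical name, and it assumes rows should be numbered in id order within each parent, which matches the expected result in the question.

```python
from collections import defaultdict


def assign_positions(rows):
    # rows: list of (id, parent_id) tuples; returns {id: position}.
    counters = defaultdict(int)
    positions = {}
    for row_id, parent_id in sorted(rows):
        # One counter per parent_id, incremented in id order.
        counters[parent_id] += 1
        positions[row_id] = counters[parent_id]
    return positions


cats = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
        (6, 2), (7, 2), (8, 2), (9, 2),
        (10, 3), (11, 3), (12, 3)]
print(assign_positions(cats))
```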
Use the row_number function:
select id, parent_id,
row_number() over (partition by parent_id order by id) as position_id from cats;
Try this:
UPDATE table_name SET dataID = v_table_name.rn
FROM
(
SELECT row_number() over (partition by parent_id order by your_primaryKey) AS rn, your_primaryKey
FROM table_name
) AS v_table_name
WHERE table_name.your_primaryKey = v_table_name.your_primaryKey;

Highest per each group

It's hard to show my actual table and data here so I'll describe my problem with a sample table and data:
create table foo(id int,x_part int,y_part int,out_id int,out_idx text);
insert into foo values (1,2,3,55,'BAK'),(2,3,4,77,'ZAK'),(3,4,8,55,'RGT'),(9,10,15,77,'UIT'),
(3,4,8,11,'UTL'),(3,4,8,65,'MAQ'),(3,4,8,77,'YTU');
Following is the table foo:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
1 2 3 55 BAK
3 4 8 65 MAQ
9 10 15 77 UIT
2 3 4 77 ZAK
3 4 8 77 YTU
I need to select all fields of the row with the highest id for each out_id.
Expected output:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
3 4 8 65 MAQ
9 10 15 77 UIT
Using PostgreSQL.
Postgres specific (and fastest) solution:
select distinct on (out_id) *
from foo
order by out_id, id desc;
Standard SQL solution using a window function (second fastest)
select id, x_part, y_part, out_id, out_idx
from (
select id, x_part, y_part, out_id, out_idx,
row_number() over (partition by out_id order by id desc) as rn
from foo
) t
where rn = 1
order by id;
Note that both solutions return only one row per out_id, even if several rows share the same highest id for that out_id. If you want all of them returned, use dense_rank() instead of row_number().
select *
from foo
where (id,out_id) in (
select max(id),out_id from foo group by out_id
) order by out_id
Finding max(val) := finding the record for which no larger val exists:
SELECT *
FROM foo f
WHERE NOT EXISTS (
SELECT 317
FROM foo nx
WHERE nx.out_id = f.out_id
AND nx.id > f.id
);
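All of the queries above implement the same rule: keep, for each out_id, the row for which no row with a larger id exists. A rough Python sketch of that rule, using the sample data from the question, is shown here. `highest_per_group` is a hypothetical name, and when several rows tie on the highest id it keeps the first one seen.

```python
def highest_per_group(rows):
    # rows: tuples of (id, x_part, y_part, out_id, out_idx).
    best = {}
    for row in rows:
        row_id, out_id = row[0], row[3]
        # Replace the stored row only when a strictly larger id appears.
        if out_id not in best or row_id > best[out_id][0]:
            best[out_id] = row
    return [best[out_id] for out_id in sorted(best)]


foo = [(1, 2, 3, 55, 'BAK'), (2, 3, 4, 77, 'ZAK'), (3, 4, 8, 55, 'RGT'),
       (9, 10, 15, 77, 'UIT'), (3, 4, 8, 11, 'UTL'), (3, 4, 8, 65, 'MAQ'),
       (3, 4, 8, 77, 'YTU')]
for row in highest_per_group(foo):
    print(row)
```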