SQL random number that doesn't repeat within a group - sql

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!

To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.

I am not an expert on SQL, but probably do something like this:
Initialize a counter CNT=1
Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
Increment CNT and go to step 2

Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Related

How to update table by of the records in the table

I have a table in PostgerSQL and I need to make N entries in the table twice and for the first half I need to fill in the partner_id field with the value 1 and the second half with the value partner_id = 2.
i try to `
update USERS_TABLE set user_rule_id = 1;
update USERS_TABLE set user_rule_id = 2 where USERS_TABLE.id > count(*)/2;
`
I depends a lot how precise the number of users have to be that are updated with 1 or 2.
The following would be quite unprecise,a s it doesn't take the exact number of user that already exist8after deleting some rows the numbers doesn't fit anymore.
SELECT * FROM USERS_TABLE
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
If you have a lot of deleted rows and want still the half of the users, you can choose following approach, which does rely on the id, but at teh actual row number
UPDATE USERS_TABLE1
set user_rule_id = CASE WHEN rn <= (SELECT count(*) FROM USERS_TABLE1)/ 2 then 1
ELSE 2 END
FROM (SELECT id, ROW_NUMBER() OVER( ORDER BY id) rn FROM USERS_TABLE1) t
WHERE USERS_TABLE1.id = t.id;
UPDATE 5
SELECT * FROM USERS_TABLE1
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
fiddle
In the sample case it it the same result, but when you have a lot of rows and a bunch of the deleted users, the senind will give you quite a good result

Snowflake: Repeating rows based on column value

How to repeat rows based on column value in snowflake using sql.
I tried a few methods but not working such as dual and connect by.
I have two columns: Id and Quantity.
For each ID, there are different values of Quantity.
So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID
COUNT
RN
1
2
1
1
2
2
2
4
1
2
4
2
2
4
3
2
4
4
Ok let's start by generating some data. We will create 10 rows, with a QTY. The QTY will be randomly chosen as 1 or 2.
Next we want to duplicate the rows with a QTY of 2 and leave the QTY =1 as they are.
Obviously you can change all parameters above to suit your needs - this solution works super fast and in my opinion way better than table generation.
Simply stack SPLIT_TO_TABLE(), REPEAT() with a LATERAL() join and voila.
WITH TEN_ROWS AS (SELECT ROW_NUMBER()OVER(ORDER BY NULL)SOME_ID,UNIFORM(1,2,RANDOM())QTY FROM TABLE(GENERATOR(ROWCOUNT=>10)))
SELECT
TEN_ROWS.*
FROM
TEN_ROWS,LATERAL SPLIT_TO_TABLE(REPEAT('hire me $10/hour',QTY-1),'hire me $10/hour')ALTERNATIVE_APPROACH;

Keyset pagination with composite key

I am using oracle 12c database and I have a table with the following structure:
Id NUMBER
SeqNo NUMBER
Val NUMBER
Valid VARCHAR2
A composite primary key is created with the field Id and SeqNo.
I would like to fetch the data with Valid = 'Y' and apply ketset pagination with a page size of 3. Assume I have the following data:
Id SeqNo Val Valid
1 1 10 Y
1 2 20 N
1 3 30 Y
1 4 40 Y
1 5 50 Y
2 1 100 Y
2 2 200 Y
Expected result:
----------------------------
Page 1
----------------------------
Id SeqNo Val Valid
1 1 10 Y
1 3 30 Y
1 4 40 Y
----------------------------
Page 2
----------------------------
Id SeqNo Val Valid
1 5 50 Y
2 1 100 Y
2 2 200 Y
Offset pagination can be done like this:
SELECT * FROM table ORDER BY Id, SeqNo OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY;
However, in the actual db it has more than 5 millions of records and using OFFSET is going to slow down the query a lot. Therefore, I am looking for a ketset pagination approach (skip records using some unique fields instead of OFFSET)
Since a composite primary key is used, I need to offset the page with information from more than 1 field.
This is a sample SQL that should work in PostgreSQL (fetch 2nd page):
SELECT * FROM table WHERE (Id, SeqNo) > (1, 4) AND Valid = 'Y' ORDER BY Id, SeqNo LIMIT 3;
How do I achieve the same in oracle?
Use row_number() analytic function with ceil arithmetic fuction. Arithmetic functions don't have a negative impact on performance, and row_number() over (order by ...) expression automatically orders the data without considering the insertion order, and without adding an extra order by clause for the main query. So, consider :
select Id,SeqNo,
ceil(row_number() over (order by Id,SeqNo)/3) as page
from tab
where Valid = 'Y';
P.S. It also works for Oracle 11g, while OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY works only for Oracle 12c.
Demo
You can use order by and then fetch rows using fetch and offset like following:
Select ID, SEQ, VAL, VALID FROM TABLE
WHERE VALID = 'Y'
ORDER BY ID, SEQ
--FETCH FIRST 3 ROWS ONLY -- first page
--OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY -- second pages
--OFFSET 6 ROWS FETCH NEXT 3 ROWS ONLY -- third page
--Update--
You can use row_number analytical function as following.
Select id, seqNo, Val, valid from
(Select t.*,
Row_number(order by id, seq) as rn from table t
Where valid = 'Y')
Where ceil(rn/3) = 2 -- for page no. 2
Cheers!!

SQL group table by "leading rows" without pl/sql

I have this table (short example) with two columns
1 a
2 a
3 a3
4 a
5 a
6 a6
7 a
8 a8
9 a
and I would like to group/partition them into groups separated by those leading "a", ideally to add another column like this, so I can address those groups easily.
1 a 0
2 a 0
3 a3 3
4 a 3
5 a 3
6 a6 6
7 a 6
8 a8 8
9 a 8
problem is that setup of the table is dynamic so I can't use staticly lag or lead functions, any ideas how to do this without pl/sql in postgres version 9.5
Assuming the leading part is a single character. Hence the expression right(data, -1) works to extract the group name. Adapt to your actual prefix.
The solution uses two window functions, which can't be nested. So we need a subquery or a CTE.
SELECT id, data
, COALESCE(first_value(grp) OVER (PARTITION BY grp_nr ORDER BY id), '0') AS grp
FROM (
SELECT *, NULLIF(right(data, -1), '') AS grp
, count(NULLIF(right(data, -1), '')) OVER (ORDER BY id) AS grp_nr
FROM tbl
) sub;
Produces your desired result exactly.
NULLIF(right(data, -1), '') to get the effective group name or NULL if none.
count() only counts non-null values, so we get a higher count for every new group in the subquery.
In the outer query, we take the first grp value per grp_nr as group name and default to '0' with COALESCE for the first group without name (which has a NULL as group name so far).
We could use min() or max() as outer window function as well, since there is only one non-null value per partition anyway. first_value() is probably cheapest since the rows are sorted already.
Note the group name grp is data type text. You may want to cast to integer, if those are clean (and reliably) integer numbers.
This can be achieved by setting rows containing a to a specific value and all the other rows to a different value. Then use a cumulative sum to get the desired number for the rows. The group number is set to the next number when a new value in the val column is encountered and all the proceeding rows with a will have the same group number as the one before and this continues.
I assume that you would need a distinct number for each group and the number doesn't matter.
select id, val, sum(ex) over(order by id) cm_sum
from (select t.*
,case when val = 'a' then 0 else 1 end ex
from t) x
The result for the query above with the data in question, would be
id val cm_sum
--------------
1 a 0
2 a 0
3 a3 1
4 a 1
5 a 1
6 a6 2
7 a 2
8 a8 3
9 a 3
With the given data, you can use a cumulative max:
select . . .,
coalesce(max(substr(col2, 2)) over (order by col1), 0)
If you don't strictly want the maximum, then it gets a bit more difficult. The ANSI solution is to use the IGNORE NULLs option on LAG(). However, Postgres does not (yet) support that. An alternative is:
select . . ., coalesce(substr(reft.col2, 2), 0)
from (select . . .,
max(case when col2 like 'a_%' then col1 end) over (order by col1) as ref_col1
from t
) tt join
t reft
on tt.ref_col1 = reft.col1
You can also try this :
with mytable as (select split_part(t,' ',1)::integer id,split_part(t,' ',2) myvalue
from (select unnest(string_to_array($$1 a;2 a;3 a3;4 a;5 a;6 a6;7 a;8 a8;9 a$$,
';'))t) a)
select id,myvalue,myresult from mytable join (
select COALESCE(NULLIF(substr(myvalue,2),''),'0') myresult,idmin id_down
,COALESCE(lead(idmin) over (order by myvalue),999999999999) id_up
from (
select myvalue,min(id) idmin from mytable group by 1
) a) b
on id between id_down and id_up-1

Generating Random Number In Each Row In Oracle Query

I want to select all rows of a table followed by a random number between 1 to 9:
select t.*, (select dbms_random.value(1,9) num from dual) as RandomNumber
from myTable t
But the random number is the same from row to row, only different from each run of the query. How do I make the number different from row to row in the same execution?
Something like?
select t.*, round(dbms_random.value() * 8) + 1 from foo t;
Edit:
David has pointed out this gives uneven distribution for 1 and 9.
As he points out, the following gives a better distribution:
select t.*, floor(dbms_random.value(1, 10)) from foo t;
At first I thought that this would work:
select DBMS_Random.Value(1,9) output
from ...
However, this does not generate an even distribution of output values:
select output,
count(*)
from (
select round(dbms_random.value(1,9)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 62423
2 125302
3 125038
4 125207
5 124892
6 124235
7 124832
8 125514
9 62557
The reasons are pretty obvious I think.
I'd suggest using something like:
floor(dbms_random.value(1,10))
Hence:
select output,
count(*)
from (
select floor(dbms_random.value(1,10)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 111038
2 110912
3 111155
4 111125
5 111084
6 111328
7 110873
8 111532
9 110953
you don’t need a select … from dual, just write:
SELECT t.*, dbms_random.value(1,9) RandomNumber
FROM myTable t
If you just use round then the two end numbers (1 and 9) will occur less frequently, to get an even distribution of integers between 1 and 9 then:
SELECT MOD(Round(DBMS_RANDOM.Value(1, 99)), 9) + 1 FROM DUAL