Snowflake: Repeating rows based on column value - sql

How can I repeat rows based on a column value in Snowflake using SQL?
I tried a few methods, such as dual and connect by, but none of them worked.
I have two columns: Id and Quantity.
The Quantity value differs from one Id to the next.

So if you have a count, you can use a generator:
with ten_rows as (
select row_number() over (order by null) as rn
from table(generator(ROWCOUNT=>10))
), data(id, count) as (
select * from values
(1,2),
(2,4)
)
SELECT
d.*
,r.rn
from data as d
join ten_rows as r
on d.count >= r.rn
order by 1,3;
ID  COUNT  RN
1   2      1
1   2      2
2   4      1
2   4      2
2   4      3
2   4      4
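The same join-against-a-number-sequence idea can be tried outside Snowflake. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in engine; the recursive CTE replaces GENERATOR, and the column name qty is a hypothetical substitute for count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, 2), (2, 4)])

# The recursive CTE stands in for Snowflake's GENERATOR and yields rn = 1..10.
rows = conn.execute("""
    WITH RECURSIVE ten_rows(rn) AS (
        SELECT 1
        UNION ALL
        SELECT rn + 1 FROM ten_rows WHERE rn < 10
    )
    SELECT d.id, d.qty, r.rn
    FROM data AS d
    JOIN ten_rows AS r ON r.rn <= d.qty   -- one output row per unit of qty
    ORDER BY d.id, r.rn
""").fetchall()

for row in rows:
    print(row)
```

Note that the sequence must be at least as long as the largest quantity; with ten generated rows, any quantity above 10 would silently yield only ten copies.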

OK, let's start by generating some data. We will create 10 rows, each with a QTY randomly chosen as 1 or 2.
Next, we want to duplicate the rows with a QTY of 2 and leave the rows with QTY = 1 as they are.
You can change the parameters in the query below to suit your needs. In my experience this solution runs fast, and in my opinion it is a better fit than joining to a generated table.
Simply combine SPLIT_TO_TABLE() and REPEAT() with a LATERAL join.
WITH TEN_ROWS AS (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) AS SOME_ID,
UNIFORM(1, 2, RANDOM()) AS QTY
FROM TABLE(GENERATOR(ROWCOUNT => 10))
)
SELECT
TEN_ROWS.*
FROM
TEN_ROWS,
LATERAL SPLIT_TO_TABLE(REPEAT('hire me $10/hour', QTY - 1), 'hire me $10/hour') AS ALTERNATIVE_APPROACH;
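Why this works: a string made of QTY - 1 delimiters splits into exactly QTY pieces, so the lateral join emits QTY rows per input row. The following plain-Python sketch only illustrates that counting argument; the function name, the delimiter and the sample rows are hypothetical, and it does not call Snowflake.

```python
def repeat_by_split(row_id, qty, delim="|"):
    """Return `qty` copies of row_id using the repeat-then-split idea.

    Assumes qty >= 1: (qty - 1) delimiters split into exactly qty pieces.
    """
    pieces = (delim * (qty - 1)).split(delim)
    return [row_id for _ in pieces]


rows = [(1, 1), (2, 2), (3, 3)]  # hypothetical (SOME_ID, QTY) pairs

expanded = []
for some_id, qty in rows:
    expanded.extend(repeat_by_split(some_id, qty))

print(expanded)
```

In this sketch a quantity of 0 would still produce one row, because splitting an empty string yields a single empty piece, so such rows would need to be filtered out separately.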

Related

JOIN on aggregate function

I have a table showing production steps (PosID) for a production order (OrderID) and which machine (MachID) they will be run on; I’m trying to reduce the table to show one record for each order – the lowest position (field “PosID”) that is still open (field “Open” = Y); i.e. the next production step for the order.
Example data I have:
OrderID  PosID  MachID  Open
1        1      A       N
1        2      B       Y
1        3      C       Y
2        4      C       Y
2        5      D       Y
2        6      E       Y
Example result I want:
OrderID  PosID  MachID
1        2      B
2        4      C
I’ve tried two approaches, but I can’t seem to get either to work:
I don’t want to put “MachID” in the GROUP BY because that gives me all the records that are open, but I also don’t think there is an appropriate aggregate function for the “MachID” field to make this work.
SELECT "OrderID", MIN("PosID"), "MachID"
FROM Table T0
WHERE "Open" = 'Y'
GROUP BY "OrderID"
With this approach, I keep getting error messages that T1."PosID" (in the JOIN clause) is an invalid column. I’ve also tried T1.MIN("PosID") and MIN(T1."PosID").
SELECT T0."OrderID", T0."PosID", T0."MachID"
FROM Table T0
JOIN
(SELECT "OrderID", MIN("PosID")
FROM Table
WHERE "Open" = 'Y'
GROUP BY "OrderID") T1
ON T0."OrderID" = T1."OrderID"
AND T0."PosID" = T1."PosID"
Try this:
SELECT "OrderID","PosID","MachID" FROM (
SELECT
T0."OrderID",
T0."PosID",
T0."MachID",
ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") RNK
FROM Table T0
WHERE "Open" = 'Y'
) AS A
WHERE RNK = 1
I've kept the quoting around the column names as written in the question, but in general it's not needed.
The query first filters to the open rows and then numbers the rows within each OrderID from 1 to X, ordered by PosID:
OrderID  PosID  MachID  Open  RNK
1        2      B       Y     1
1        3      C       Y     2
2        4      C       Y     1
2        5      D       Y     2
2        6      E       Y     3
After that, it filters on the RNK column, where RNK = 1 marks the lowest open PosID per OrderID. ROW_NUMBER() in the select clause is a window function; there are many more, and they are quite useful.
P.S. Above solution should work for MSSQL
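The join from the question can also be made to work: the derived table exposes MIN("PosID") without a column name, so T1."PosID" does not exist until the aggregate is given an alias. The sketch below is a minimal check of both routes, assuming SQLite (3.25 or later for window functions) as a stand-in engine; the table name steps and the alias MinPosID are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE steps ("OrderID" INTEGER, "PosID" INTEGER, "MachID" TEXT, "Open" TEXT)'
)
conn.executemany("INSERT INTO steps VALUES (?, ?, ?, ?)", [
    (1, 1, "A", "N"), (1, 2, "B", "Y"), (1, 3, "C", "Y"),
    (2, 4, "C", "Y"), (2, 5, "D", "Y"), (2, 6, "E", "Y"),
])

# Join route: alias the aggregate so the outer query can reference it.
join_rows = conn.execute("""
    SELECT T0."OrderID", T0."PosID", T0."MachID"
    FROM steps T0
    JOIN (SELECT "OrderID", MIN("PosID") AS "MinPosID"
          FROM steps
          WHERE "Open" = 'Y'
          GROUP BY "OrderID") T1
      ON T0."OrderID" = T1."OrderID"
     AND T0."PosID" = T1."MinPosID"
    ORDER BY T0."OrderID"
""").fetchall()

# Window route: number the open rows per order and keep the first one.
window_rows = conn.execute("""
    SELECT "OrderID", "PosID", "MachID"
    FROM (SELECT "OrderID", "PosID", "MachID",
                 ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") AS RNK
          FROM steps
          WHERE "Open" = 'Y') AS A
    WHERE RNK = 1
    ORDER BY "OrderID"
""").fetchall()

print(join_rows)
print(window_rows)
```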

SQL Query get common column with diff values in other columns

I am not very fluent in SQL, and I'm having trouble writing an efficient query. I have a table with a composite key of columns A and B, as shown below:
A  B  C
1  1  4
1  2  5
1  3  3
2  2  4
2  1  5
3  1  4
What I need is to find rows where column C has both values 4 and 5 (4 and 5 are just examples) for a particular value of column A. Here, 4 and 5 are both present for two A values (1 and 2). For A = 3, 4 is present but 5 is not, so we cannot take it.
My explanation may be confusing; I hope you get it.
After this, I need to keep only those where the B value for 4 (the first number) is less than the B value for 5 (the second number). In this case, for A = 1, row 1 (A=1, B=1, C=4) has a lower B value than row 2 (A=1, B=2, C=5), so we take it. For A = 2, row 1 (A=2, B=2, C=4) has a greater B value than row 2 (A=2, B=1, C=5), so we cannot take it.
I hope someone gets it and can help. Thanks.
This returns the rows containing both c = 4 and c = 5 for a given a, where ordering by b and ordering by c give the same sequence:
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
row_number() over (partition by a order by c) crn
from tbl
where c in (4, 5)
) t
where cnt = 2 and brn = crn;
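To check this query against the sample data from the question, here is a minimal sketch assuming SQLite (3.25 or later) as a stand-in engine. Only the final ORDER BY is added, to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, 1, 4), (1, 2, 5), (1, 3, 3),
    (2, 2, 4), (2, 1, 5),
    (3, 1, 4),
])

rows = conn.execute("""
    SELECT a, b, c
    FROM (
        SELECT tbl.*,
               COUNT(*) OVER (PARTITION BY a) AS cnt,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS brn,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS crn
        FROM tbl
        WHERE c IN (4, 5)
    ) AS t
    WHERE cnt = 2 AND brn = crn   -- both values present, same order by b and by c
    ORDER BY a, b
""").fetchall()

print(rows)
```

Only A = 1 survives: A = 2 has both values but in the opposite B order, and A = 3 lacks the value 5.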
EDIT
If the order of the parameters matters, the position of each parameter must be set explicitly. The query below compares the b ordering to the explicit parameter position:
with params(val, pos) as (
select 4,2 union all
select 5,1
)
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
p.pos
from tbl
join params p on tbl.c = p.val
) t
where cnt = (select count(*) from params) and brn = pos;
I assume you want the values of a where this is true. If so, you can use aggregation:
select a
from t
where c in (4, 5)
group by a
having count(distinct c) = 2;
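The aggregation above only checks that both values are present. One possible extension, which is an assumption here and not part of the original answer, is to add a conditional-aggregate comparison so that the B value for 4 must also be lower than the B value for 5. A minimal sketch, using SQLite as a stand-in engine and the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, 4), (1, 2, 5), (1, 3, 3),
    (2, 2, 4), (2, 1, 5),
    (3, 1, 4),
])

matching_a = [row[0] for row in conn.execute("""
    SELECT a
    FROM t
    WHERE c IN (4, 5)
    GROUP BY a
    HAVING COUNT(DISTINCT c) = 2
       -- assumed extra rule: B for value 4 must come before B for value 5
       AND MIN(CASE WHEN c = 4 THEN b END) < MIN(CASE WHEN c = 5 THEN b END)
    ORDER BY a
""")]

print(matching_a)
```

If a value can occur more than once per A, MIN picks the earliest B for each value; whether that is the right rule depends on the data.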

Populate a sql table with duplicate data except for one column

I have a sql table :
Levels
LevelId Min Product
1 x 1
2 y 1
3 z 1
4 a 1
I need to duplicate the same data in the table, changing only the Product id from 1 to 2, 3, ... 40
example
LevelId Min Product
1 x 2
2 y 2
3 z 2
4 a 2
I could do something like
INSERT INTO dbo.Levels SELECT TOP 4 * FROM dbo.Levels
but that would just copy paste the data.
Is there a way I can copy the data and paste it changing only the Product value?
You're most of the way there - you just need to take one more logical step:
INSERT INTO dbo.Levels (LevelID, Min, Product)
SELECT LevelID, Min, 2 FROM dbo.Levels WHERE Product = 1
...will duplicate your rows with a different product ID.
Also consider that WHERE Product = 1 is more reliable than TOP 4. Once the table has more than four rows, TOP 4 is not guaranteed to return the same four rows unless you also add an ORDER BY to the select. WHERE Product = ... always returns the same rows and keeps working even if you add an extra row with a product ID of 1 (whereas you would have to change TOP 4 to TOP 5, and so on, as rows are added).
You can generate the product id's and then load them in:
with cte as (
select 2 as n
union all
select n + 1
from cte
where n < 40
)
INSERT INTO dbo.Levels([Min], Product)
SELECT l.[Min], cte.n as product
FROM dbo.Levels l cross join
cte
where l.Product = 1;
This assumes that the LevelId is an identity column, that auto-increments on insert. If not:
with cte as (
select 2 as n
union all
select n + 1
from cte
where n < 40
)
INSERT INTO dbo.Levels(LevelId, [Min], Product)
SELECT l.LevelId+(cte.n-1)*4, l.[Min], cte.n as product
FROM dbo.Levels l cross join
cte
where l.Product = 1;
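To see the effect of the cross join against the generated product ids, the following is a minimal sketch assuming SQLite as a stand-in for SQL Server (so WITH RECURSIVE and double-quoted "Min" replace the T-SQL syntax). It uses the explicit LevelId variant and then counts the rows per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Levels (LevelId INTEGER PRIMARY KEY, "Min" TEXT, Product INTEGER)')
conn.executemany(
    'INSERT INTO Levels (LevelId, "Min", Product) VALUES (?, ?, ?)',
    [(1, "x", 1), (2, "y", 1), (3, "z", 1), (4, "a", 1)],
)

# Generate product ids 2..40 and copy the four product-1 rows once per id.
conn.execute("""
    WITH RECURSIVE cte(n) AS (
        SELECT 2
        UNION ALL
        SELECT n + 1 FROM cte WHERE n < 40
    )
    INSERT INTO Levels (LevelId, "Min", Product)
    SELECT l.LevelId + (cte.n - 1) * 4, l."Min", cte.n
    FROM Levels AS l CROSS JOIN cte
    WHERE l.Product = 1
""")

per_product = conn.execute(
    "SELECT Product, COUNT(*) FROM Levels GROUP BY Product ORDER BY Product"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM Levels").fetchone()[0]
print(total, per_product[:3])
```

The offset (n - 1) * 4 relies on product 1 having exactly four rows with LevelId 1 to 4; a different source layout would need a different offset.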
INSERT INTO dbo.Levels (LevelId, Min, Product)
SELECT TOP 4
LevelId,
Min,
2
FROM dbo.Levels
You can include expressions in the SELECT statement, either hard-coded values or something like Product + 1 or anything else.
I expect you probably wouldn't want to insert the LevelId though, but left that there to match your sample. If you don't want that just remove it from the INSERT and SELECT sections.
You could use a CROSS JOIN against a numbers table, for example.
WITH
L0 AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows
L1 AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L1)
SELECT
lvl.[LevelID],
lvl.[Min],
num.[N]
FROM dbo.[Levels] lvl
CROSS JOIN Nums num
This would duplicate 4 times.

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
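A quick way to convince yourself that this never repeats a number within a household is to run it on the sample table. This is a minimal sketch assuming SQLite (3.25 or later) as a stand-in for Netezza; the output is random, so only the per-household set of values is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hh INTEGER, slot INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)])

# Each household gets the numbers 1..3 in a random order.
rows = conn.execute("""
    SELECT hh, slot,
           ROW_NUMBER() OVER (PARTITION BY hh ORDER BY random()) AS rn
    FROM t
    ORDER BY hh, slot
""").fetchall()

for row in rows:
    print(row)
```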
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
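In the query above the random positions are computed once and then shared by every household. If each household should get its own shuffle, one variation (an assumption, not the original answer) is to number the values per household after the cross join. A minimal sketch, using SQLite (3.25 or later) as a stand-in engine and assuming slots are numbered from 1 and never exceed 10:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hh INTEGER, slot INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)])

rows = conn.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 10
    ),
    shuffled AS (
        -- Shuffle 1..10 separately for every household.
        SELECT h.hh, nums.n,
               ROW_NUMBER() OVER (PARTITION BY h.hh ORDER BY random()) AS pos
        FROM (SELECT DISTINCT hh FROM t) AS h
        CROSS JOIN nums
    )
    SELECT t.hh, t.slot, s.n AS rn
    FROM t
    JOIN shuffled AS s ON s.hh = t.hh AND s.pos = t.slot
    ORDER BY t.hh, t.slot
""").fetchall()

for row in rows:
    print(row)
```

Because each slot picks a different position of the same shuffled list, numbers cannot repeat within a household, while different households are shuffled independently.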
I am not an expert on SQL, but you could probably do something like this:
1. Initialize a counter CNT=1
2. Create a table such that you sample 1 row randomly from each group, together with that group's count of null RN values, say C_NULL_RN.
3. With probability C_NULL_RN/(10-CNT+1) for each sampled row, assign CNT as RN
4. Increment CNT and go to step 2
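The steps above can be sketched procedurally. The following is one reading of them for a single group, written in plain Python instead of SQL; the function name and parameters are hypothetical. Each candidate value CNT is accepted with probability (empty slots) / (values still to come), which guarantees every slot is filled by the time CNT reaches the maximum.

```python
import random


def fill_group(n_slots, max_value=10, rng=random):
    """Assign distinct values from 1..max_value to the slots of one group."""
    if n_slots > max_value:
        raise ValueError("more slots than distinct values")
    slots = [None] * n_slots
    for cnt in range(1, max_value + 1):
        empty = [i for i, value in enumerate(slots) if value is None]
        if not empty:
            break
        # Accept cnt with probability C_NULL_RN / (max_value - cnt + 1).
        if rng.random() < len(empty) / (max_value - cnt + 1):
            # Place it in one randomly sampled empty slot.
            slots[rng.choice(empty)] = cnt
    return slots


rng = random.Random(0)  # seeded only to make the demo repeatable
groups = {hh: fill_group(3, rng=rng) for hh in (1, 2)}
print(groups)
```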
Well, I couldn't get a slick solution, so I did a hack:
1. Created a new integer field called rand_inst.
2. Assign a random number to each empty slot.
3. Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
4. Update the table to assign a different random number anywhere that rand_inst>1.
5. Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;
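The same re-roll loop can be sketched outside the database. This is a plain-Python illustration of the idea, not the Netezza code; the function name and the households argument are hypothetical. Any value that is a second or later occurrence within its household (rand_inst > 1) is re-drawn, and the pass repeats until nothing was re-drawn.

```python
import random


def assign_until_unique(households, low=1, high=13, rng=random):
    """households maps household_id -> number of slots to fill."""
    rnd = {hh: [rng.randint(low, high) for _ in range(n)]
           for hh, n in households.items()}
    while True:
        rerolled = False
        for values in rnd.values():
            seen = set()
            for i, value in enumerate(values):
                if value in seen:
                    # Duplicate within the household: draw a new number.
                    values[i] = rng.randint(low, high)
                    rerolled = True
                else:
                    seen.add(value)
        if not rerolled:  # converged: no household holds a repeat
            return rnd


result = assign_until_unique({1: 6, 2: 6}, rng=random.Random(1))
print(result)
```

The loop only terminates if each household has no more slots than there are distinct numbers in the range, which holds for 6 slots and the range 1 to 13.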

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
|Fld | Number|
1 5
2 2
And I want to make a select that retrieves as many Fld as the Number field has:
|Fld |
1
1
1
1
1
2
2
How can I achieve this? I was thinking about making a temporary table and inserting data based on the Number, but I was wondering if this could be done with a single SELECT statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON Numbers.Number <= yourtable.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
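For the join to return each Fld exactly Number times, the numbers table has to hold consecutive integers starting at 1, and the join has to keep the numbers up to each row's Number. A minimal sketch, assuming SQLite as a stand-in engine and a ten-row Numbers table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Fld INTEGER, Number INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [(1, 5), (2, 2)])

# Numbers table with the consecutive integers 1..10.
conn.execute("CREATE TABLE Numbers (Number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(1, 11)])

flds = [row[0] for row in conn.execute("""
    SELECT yourtable.Fld
    FROM yourtable
    JOIN Numbers ON Numbers.Number <= yourtable.Number
    ORDER BY yourtable.Fld
""")]

print(flds)
```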
Not a great solution (since you still query your table twice), but maybe you can work from it:
SELECT t1.fld, t1.number
FROM table t1, (
SELECT ROWNUM number FROM dual
CONNECT BY LEVEL <= (SELECT MAX(number) FROM table)) t2
WHERE t2.number<=t1.number
It generates maximum amount of rows needed and then filters it by each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
FROM <table>
UNION ALL
SELECT a.fld, a.times + 1
FROM remaining as a
JOIN <table> as b
ON b.fld = a.fld
AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (with the times column added to the select list for verification):
fld times
=============
1 1
1 2
1 3
1 4
1 5
2 1
2 2
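The recursive version can be run as-is on most engines that support recursive CTEs. Below is a minimal sketch assuming SQLite as a stand-in engine (hence WITH RECURSIVE) and a hypothetical table name src in place of <table>; times is selected so the result can be compared with the listing above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (fld INTEGER, number INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, 5), (2, 2)])

rows = conn.execute("""
    WITH RECURSIVE remaining(fld, times) AS (
        SELECT fld, 1 FROM src              -- first copy of every row
        UNION ALL
        SELECT a.fld, a.times + 1           -- further copies while number allows
        FROM remaining AS a
        JOIN src AS b
          ON b.fld = a.fld
         AND b.number > a.times
    )
    SELECT fld, times
    FROM remaining
    ORDER BY fld, times
""").fetchall()

for row in rows:
    print(row)
```

The anchor query emits one row for every source row, so a row whose number is 0 would still appear once unless it is filtered out in the anchor.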