How do I pivot this in Google BigQuery?

This is the data that I am currently working with.
x y a
3 2 LL
5 2 LL
5 4 LL
3 4 LL
6 7 RR
8 7 RR
8 9 RR
6 9 RR
I am trying to pivot the table such that it becomes:
x1 x2 y1 y2 a
3 5 2 4 LL
6 8 7 9 RR
I've tried AGG and the PIVOT functions, but can't seem to get this to work, and this has to be done using only Google BQ. The complete dataset is much larger, so I need a general solution.
Any help or suggestions would be appreciated. Thanks!

The query below should be a good start for you:
select * from (
select * from
(select *, row_number() over(partition by a order by x) pos from (select distinct x, a from your_table))
full outer join
(select *, row_number() over(partition by a order by y) pos from (select distinct y, a from your_table))
using (a, pos)
)
pivot (any_value(x) as x, any_value(y) as y for pos in (1,2))
Applied to the sample data in your question, this returns one row per value of a, with the distinct x and y values pivoted into columns.
You can build that query dynamically (so you do not need to specify the count of unique values explicitly) and use EXECUTE IMMEDIATE to run it. There are plenty of examples of that technique here on SO!
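To make the logic of that query easier to follow, here is a minimal Python sketch of the same steps: collect the distinct x and y values per a, number them, and spread them into columns. It is an illustration only, not BigQuery code. The function name pivot_distinct and the choice to number values in ascending order are assumptions made for the example.

```python
from collections import defaultdict

# Sample rows (x, y, a) copied from the question.
rows = [
    (3, 2, "LL"), (5, 2, "LL"), (5, 4, "LL"), (3, 4, "LL"),
    (6, 7, "RR"), (8, 7, "RR"), (8, 9, "RR"), (6, 9, "RR"),
]

def pivot_distinct(rows):
    """Hypothetical helper: return {a: {"x1": ..., "x2": ..., "y1": ..., "y2": ...}}."""
    xs, ys = defaultdict(set), defaultdict(set)
    for x, y, a in rows:
        xs[a].add(x)  # plays the role of SELECT DISTINCT x, a
        ys[a].add(y)  # plays the role of SELECT DISTINCT y, a
    result = {}
    for a in sorted(xs):
        wide = {}
        # Number the distinct values within each group (ascending order is an
        # assumption) and turn each position into its own column.
        for pos, x in enumerate(sorted(xs[a]), start=1):
            wide[f"x{pos}"] = x
        for pos, y in enumerate(sorted(ys[a]), start=1):
            wide[f"y{pos}"] = y
        result[a] = wide
    return result

pivoted = pivot_distinct(rows)
for a, wide in pivoted.items():
    print(a, wide)
```

Because the column names are generated from the positions, the sketch handles any number of distinct values per group, which is the part the SQL version needs dynamic query building for.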

Related

How to find the parent and child relation in sql

I have the data like below and trying to get the sum of time taken by parent.
Input
ID_P ID_C SLA FL
1 2 0.2 Y
2 3 0.5 N
3 4 0.5 N
8 9 1.5 Y
9 10 0.1 N
10 0.2 N
Expected output
ID_P Sum(SLA)
1 1.2
8 1.8
Can someone please help me with the SQL.
You can use a recursive query. The idea is to start from the parent rows - which, as I understand your data, are identified with column fl. Then you can follow the links to the children. The final step is aggregation:
with cte as (
select id_p, id_c, sla from mytable where fl = 'Y'
union all
select c.id_p, t.id_c, t.sla
from cte c
inner join mytable t on t.id_p = c.id_c
)
select id_p, sum(sla) as sum_sla from cte group by id_p
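As a quick way to try the recursive approach on the sample data, here is a sketch that runs it against an in-memory SQLite database from Python. SQLite is only an assumption for the demo (the question does not name a database), and SQLite requires the RECURSIVE keyword, which some other databases omit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id_p INTEGER, id_c INTEGER, sla REAL, fl TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 2, 0.2, "Y"), (2, 3, 0.5, "N"), (3, 4, 0.5, "N"),
        (8, 9, 1.5, "Y"), (9, 10, 0.1, "N"), (10, None, 0.2, "N"),
    ],
)

query = """
WITH RECURSIVE cte AS (
    -- anchor: the parent rows, flagged with fl = 'Y'
    SELECT id_p, id_c, sla FROM mytable WHERE fl = 'Y'
    UNION ALL
    -- recursive step: follow each child link, keeping the root id_p
    SELECT c.id_p, t.id_c, t.sla
    FROM cte c
    INNER JOIN mytable t ON t.id_p = c.id_c
)
SELECT id_p, SUM(sla) AS sum_sla FROM cte GROUP BY id_p ORDER BY id_p
"""

# Round to hide floating-point noise in the sums.
totals = [(id_p, round(total, 2)) for id_p, total in conn.execute(query)]
print(totals)
```

The recursion stops on its own at the row with an empty id_c, because a NULL never matches in the join condition.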

Split function across multiple fields in BigQuery SQL

I have data like this:
Each column will have the same number of elements across a row, where the first element in the first column corresponds to the first element in the second column etc.
How can I flatten this to get the below?
With a single column I am able to do this by combining a CROSS JOIN with an UNNEST but I cannot get this to work with multiple columns since the join ends up creating multiple variations and UNNEST loses the order of the array so I can't match them.
If I were building the arrays from scratch, I would use some kind of STRUCT element in there, but I can't find a way of doing this when the arrays are created from a SPLIT()?
WITH OFFSET is your friend here:
WITH strings AS (
SELECT "a,b,c" a, "aa,bb,cc" b
UNION ALL
SELECT "a1,b1,c1" a, "aa1,bb1,cc1" b
)
SELECT x_a, x_b
FROM strings
, UNNEST(SPLIT(a)) x_a WITH OFFSET o_a
JOIN UNNEST(SPLIT(b)) x_b WITH OFFSET o_b
ON o_a=o_b
Another approach for BigQuery Standard SQL is shown below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'a|b|c' col1, 'n|o|p' col2 UNION ALL
SELECT 2, 'd|e', 'q|r' UNION ALL
SELECT 3, 'f|g|h|i', 's|t|u|v' UNION ALL
SELECT 4, 'j', 'w' UNION ALL
SELECT 5, 'k|l|m', 'x|y|z'
)
SELECT
id,
SPLIT(col1, '|')[SAFE_ORDINAL(pos)] value1,
SPLIT(col2, '|')[SAFE_ORDINAL(pos)] value2
FROM `project.dataset.table`,
UNNEST(GENERATE_ARRAY(1, ARRAY_LENGTH(SPLIT(col1, '|')))) pos
with expected result
Row id value1 value2
1 1 a n
2 1 b o
3 1 c p
4 2 d q
5 2 e r
6 3 f s
7 3 g t
8 3 h u
9 3 i v
10 4 j w
11 5 k x
12 5 l y
13 5 m z
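Both answers rely on the same idea: split each column and pair the elements by position. A minimal Python sketch of that pairing, using the sample rows from the second answer, is below. The function name flatten is hypothetical, and this is an illustration of the logic rather than BigQuery code.

```python
# Sample rows (id, col1, col2) copied from the second answer.
rows = [
    (1, "a|b|c", "n|o|p"),
    (2, "d|e", "q|r"),
    (3, "f|g|h|i", "s|t|u|v"),
    (4, "j", "w"),
    (5, "k|l|m", "x|y|z"),
]

def flatten(rows, sep="|"):
    """Hypothetical helper: one output row per element position."""
    out = []
    for row_id, col1, col2 in rows:
        left, right = col1.split(sep), col2.split(sep)
        # zip pairs elements by position, like joining on equal offsets.
        for value1, value2 in zip(left, right):
            out.append((row_id, value1, value2))
    return out

flat = flatten(rows)
for line in flat:
    print(line)
```

Note that zip stops at the shorter list, so unmatched trailing elements are dropped, much like the inner join on offsets in the first answer.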

how to select one row from several rows with minimum value

This question is based on "SQL query to select distinct row with minimum value".
Consider the table:
id game point
1 x 1
1 y 10
1 z 1
2 x 2
2 y 5
2 z 8
Using suggested answers from mentioned question (select the ids that have the minimum value in the point column, grouped by game) we obtain
id game point
1 x 1
1 z 1
2 x 2
The question is how to obtain a single output row for each ID. Both outputs
id game point
1 x 1
2 x 2
and
id game point
1 z 1
2 x 2
are acceptable.
Use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by point asc) as seqnum
from t
) t
where seqnum = 1;
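To check this on the sample table, here is a sketch that runs the query in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (needed for window functions). The extra "game ASC" in the ORDER BY is an addition for the demo: it breaks the tie for id 1 so the result is repeatable, whereas the query above may return either tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, game TEXT, point INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", 1), (1, "y", 10), (1, "z", 1),
     (2, "x", 2), (2, "y", 5), (2, "z", 8)],
)

query = """
SELECT id, game, point
FROM (
    SELECT t.*,
           -- number the rows of each id from lowest point upward;
           -- game is an assumed tie-breaker to make the demo deterministic
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY point ASC, game ASC) AS seqnum
    FROM t
) AS ranked
WHERE seqnum = 1
ORDER BY id
"""

picked = conn.execute(query).fetchall()
print(picked)
```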
We assume that all point entries are distinct for each id, so that the minimum of each id identifies a single game. Using a subquery and an inner join with two conditions would then give you the result you're waiting for. If it doesn't work for you, I have another solution:
SELECT yt.*
FROM Yourtable yt INNER JOIN
(
SELECT ID, MIN(point) MinPoint
FROM Yourtable
GROUP BY ID
) t ON yt.ID = t.ID AND yt.point = t.MinPoint

Get a Group ID from Bridge table

I'm trying to get a Group ID key from a bridge table that looks something like this:
GROUP_KEY DIM_KEY
1 11
1 12
1 13
1 14
2 11
2 12
3 11
3 12
3 13
3 15
4 11
5 11
5 12
5 13
I've searched a little bit and got this query:
SELECT b1.group_key
FROM BRIDGE b1
JOIN BRIDGE b2 ON (b1.group_key= b2.group_key)
JOIN BRIDGE b3 ON (b1.group_key= b3.group_key)
WHERE b1.dim_key = 11
AND b2.dim_key = 12
AND b3.dim_key = 13;
But this gets me 1, 3 and 5, and I only want 5. I can filter it further with a count = 3, but my question is: is there a better way? I'm using PL/SQL, by the way.
EDIT
if you are using Oracle 11g, try the following
SELECT group_key FROM (
SELECT GROUP_KEY, listagg(DIM_KEY, ',') WITHIN GROUP(ORDER BY DIM_KEY) DIM_KEY
FROM t
GROUP BY GROUP_KEY) WHERE dim_key = '11,12,13'
I don't really know what you want. But if you want the count to be 3, then you can do it like this:
WITH CTE AS
(
SELECT
COUNT(GROUP_KEY) OVER(PARTITION BY GROUP_KEY) AS Counts,
BRIDGE.*
FROM
BRIDGE
)
SELECT
*
FROM
CTE
WHERE
CTE.Counts=3
AND CTE.dim_key IN(11,12,13);
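Another way to express "exactly these three keys and nothing else" is a single GROUP BY with two HAVING conditions. This is a different technique from the two answers above, sketched here against an in-memory SQLite database from Python. It assumes each (group_key, dim_key) pair appears only once in the bridge table. The CASE expression is standard SQL, so the query itself should carry over to Oracle, though that has not been verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bridge (group_key INTEGER, dim_key INTEGER)")
conn.executemany(
    "INSERT INTO bridge VALUES (?, ?)",
    [(1, 11), (1, 12), (1, 13), (1, 14),
     (2, 11), (2, 12),
     (3, 11), (3, 12), (3, 13), (3, 15),
     (4, 11),
     (5, 11), (5, 12), (5, 13)],
)

query = """
SELECT group_key
FROM bridge
GROUP BY group_key
HAVING COUNT(*) = 3   -- the group has exactly three members
   AND SUM(CASE WHEN dim_key IN (11, 12, 13) THEN 1 ELSE 0 END) = 3   -- all three are wanted keys
"""

matching_groups = [row[0] for row in conn.execute(query)]
print(matching_groups)
```

Groups 1 and 3 contain 11, 12 and 13 but have a fourth member, so the COUNT(*) condition rejects them.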

Using random value as join condition

I am generating some test-data and use dbms_random. I encountered some strange behavior when using dbms_random in the condition of the JOIN, that I can not explain:
------------------------# test-data (ids 1 .. 3)
With x As (
Select Rownum id From dual
Connect By Rownum <= 3
)
------------------------# end of test-data
Select x.id,
x2.id id2
From x
Join x x2 On ( x2.id = Floor(dbms_random.value(1, 4)) )
Floor(dbms_random.value(1, 4) ) returns a random number out of (1,2,3), so I would have expected all rows of x to be joined with a random row of x2, or maybe always the same random row of x2 in case the random number is evaluated only once.
When trying several times, I get results like that, though:
(1)
ID ID2
---- ----
1 2
1 3
2 2
2 3
3 2
3 3
(2)
ID ID2
---- ----
1 3
2 3
3 3
(3)
no rows selected.
What am I missing?
EDIT:
SELECT ROWNUM, FLOOR(dbms_random.VALUE (1, 4))
FROM dual CONNECT BY ROWNUM <= 3
would get the result in this case, but why does the original query behave like that?
To generate three rows with one predictable value and one random value, try this:
with x as (
select rownum id from dual
connect by rownum <= 3
)
, y as (
select floor(dbms_random.value(1, 4)) floor_val
from dual
)
select x.id,
y.floor_val
from x
cross join y;

ID FLOOR_VAL
---------- ----------
1 2
2 3
3 2
edit
Why did your original query return an inconsistent set of rows?
Well, without the random bit in the ON clause your query was basically a CROSS JOIN of X against X - it would have returned nine rows (at least it would have if the syntax had allowed it). Each of those nine rows executes a call to DBMS_RANDOM.VALUE(). Only when the random value matches the current value of X2.ID is the row included in the result set. Consequently the query can return 0-9 rows, randomly.
Your solution is obviously simpler - I didn't refactor enough :)
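The explanation above can be illustrated with a small Python simulation. It is only a model of the behavior described, not Oracle code: every one of the nine candidate pairs gets its own random draw, and the pair is kept only when the draw equals the right-hand id. The function name random_join and the fixed seed are assumptions for the example.

```python
import random

def random_join(ids, rng):
    """Hypothetical model of a join whose ON clause is re-evaluated per pair."""
    result = []
    for left in ids:
        for right in ids:
            # A fresh draw from {1, 2, 3} for every candidate pair, standing
            # in for Floor(dbms_random.value(1, 4)).
            if right == rng.randint(1, 3):
                result.append((left, right))
    return result

rng = random.Random(0)  # fixed seed only so the demo is repeatable
counts = [len(random_join([1, 2, 3], rng)) for _ in range(1000)]
print(min(counts), max(counts))
```

Each pair survives with probability 1/3, so the row count varies from run to run between 0 and 9 and averages three.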