How do I pivot in big query - google-bigquery

Say I have data
id,col1,col2,col3,col4,col5
1,a,b,c,d,e
and I want the result to be ...
1,a
1,b
1,c
1,d
1,e
How do I pivot on id in BigQuery?

Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION cols_to_rows(root STRING) AS (
ARRAY(SELECT REPLACE(SPLIT(kv, ':') [OFFSET(1)], '"', '') cols
FROM UNNEST(SPLIT(REGEXP_REPLACE(root, r'^{|}$', ''))) kv
WHERE SPLIT(kv, ':') [OFFSET(0)] != '"id"'
)
);
SELECT id, col
FROM `project.dataset.table` t,
UNNEST(cols_to_rows(TO_JSON_STRING(t))) col
You can test or play with the above using dummy data, as below:
#standardSQL
CREATE TEMP FUNCTION cols_to_rows(root STRING) AS (
ARRAY(SELECT REPLACE(SPLIT(kv, ':') [OFFSET(1)], '"', '') cols
FROM UNNEST(SPLIT(REGEXP_REPLACE(root, r'^{|}$', ''))) kv
WHERE SPLIT(kv, ':') [OFFSET(0)] != '"id"'
)
);
WITH `project.dataset.table` AS (
SELECT 1 id, 'a' col1, 'b' col2, 'c' col3, 'd' col4, 'e' col5 UNION ALL
SELECT 2 id, 'x', 'y', 'z', 'v', 'w'
)
SELECT id, col
FROM `project.dataset.table` t,
UNNEST(cols_to_rows(TO_JSON_STRING(t))) col
with the result:
id col
1 a
1 b
1 c
1 d
1 e
2 x
2 y
2 z
2 v
2 w
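To make the string handling in cols_to_rows easier to follow, here is a minimal Python sketch of the same steps: strip the outer braces from the JSON text of a row, split on commas, split each piece on the colon, skip the id key, and drop the quotes. The Python function and the sample JSON string are illustrative assumptions, not part of the answer above. The sketch also suggests a likely limitation: a value that itself contains a comma or a colon would be split in the wrong place.

```python
import re

def cols_to_rows(root, key_to_skip="id"):
    """Mimic the SQL function on the JSON text of one row (hypothetical helper)."""
    body = re.sub(r'^\{|\}$', '', root)       # like REGEXP_REPLACE(root, r'^{|}$', '')
    values = []
    for kv in body.split(','):                # SPLIT() uses ',' as its default delimiter
        parts = kv.split(':')
        if parts[0] != '"' + key_to_skip + '"':
            values.append(parts[1].replace('"', ''))  # drop the surrounding quotes
    return values

# Assumed shape of TO_JSON_STRING(t) for the first sample row
row_json = '{"id":1,"col1":"a","col2":"b","col3":"c","col4":"d","col5":"e"}'
print(cols_to_rows(row_json))  # ['a', 'b', 'c', 'd', 'e']
```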

Related

SQL Subquery with delimiter

I need to be able to split one string by the delimiter * into separate columns without including *
The column y from table x looks like this:
column y
*1HS*AB*GXX*123*02*PA45*2013-08-10*
*1R1*B*GX*123*02*PA45*2013-08-10*
*1HS*B*GX*13*01*PA45*2013-08-01*
*1P*C*GXX*123*02*PA45*2013-08-10*
STRING_SPLIT is not available.
The outcome should be this:
Column1 Column2 Column3 Column4 Column5 Column6 Column7
1HS AB GXX 123 2 PA45 10-08-2013
1R1 B GX 123 2 PA45 10-08-2013
1HS B GX 13 1 PA45 01-08-2013
1P C GXX 123 2 PA45 10-08-2013
Could you use the query below?
select RTRIM (REGEXP_SUBSTR (column y, '[^,]*,', 1, 1), ',') AS column 1
, RTRIM (REGEXP_SUBSTR (column y, '[^,]*,', 1, 2), ',') AS column 2
, RTRIM (REGEXP_SUBSTR (column y, '[^,]*,', 1, 3), ',') AS column 3
, LTRIM (REGEXP_SUBSTR (column y, ',[^,]*', 1, 3), ',') AS column 4
from YOUR_TABLE
Unfortunately, string_split() does not guarantee that it preserves the ordering of the values. And, SQL Server does not offer other useful string functions.
So, I recommend using recursive CTEs for this purpose:
with t as (
select *
from (values ('*1HS*AB*GXX*123*02*PA45*2013-08-10*'), ('*1HS*B*GX*13*01*PA45*2013-08-01*')) v(str)
),
cte as (
-- anchor: drop the leading '*' (assumed present on every value) so the first token lands on lev = 1
select convert(varchar(max), null) as val, 0 as lev, convert(varchar(max), stuff(str, 1, 1, '')) as rest,
row_number() over (order by (select null)) as id
from t
union all
-- recursive step: take the text before the next '*', then remove that text together with the '*'
select left(rest, charindex('*', rest) - 1), lev + 1, stuff(rest, 1, charindex('*', rest), ''), id
from cte
where rest <> '' and lev < 10
)
select max(case when lev = 1 then val end) as col1,
max(case when lev = 2 then val end) as col2,
max(case when lev = 3 then val end) as col3,
max(case when lev = 4 then val end) as col4,
max(case when lev = 5 then val end) as col5,
max(case when lev = 6 then val end) as col6,
max(case when lev = 7 then val end) as col7
from cte
where lev > 0
group by cte.id;
Here is a db<>fiddle.
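The recursive step can be pictured as a loop that repeatedly cuts the first token off the front of the remaining string, which is why the original order is preserved. Below is a minimal Python sketch of that idea, not a translation of the SQL. The function name split_by_peeling and its parameters are hypothetical, and it assumes a single leading delimiter should be ignored.

```python
def split_by_peeling(s, delim="*", max_lev=10):
    """Return tokens in order by repeatedly cutting the first one off `rest`."""
    # Assumption: one leading delimiter carries no token, so skip it.
    rest = s[len(delim):] if s.startswith(delim) else s
    tokens = []
    lev = 0
    while rest != "" and lev < max_lev:
        pos = rest.find(delim)
        if pos == -1:                    # no delimiter left: the remainder is the last token
            tokens.append(rest)
            break
        tokens.append(rest[:pos])        # text before the next delimiter
        rest = rest[pos + len(delim):]   # drop that text together with its delimiter
        lev += 1
    return tokens

print(split_by_peeling("*1HS*AB*GXX*123*02*PA45*2013-08-10*"))
# ['1HS', 'AB', 'GXX', '123', '02', 'PA45', '2013-08-10']
```

The position of each token in the list plays the role of lev in the query, so pivoting into columns is just indexing by position.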
Assuming you can add a table valued function to your database then Jeff Moden's string split function is the best approach I've encountered. It will allow you to maintain order as well.
Find details here

order columns by their value

I've got a table A with 3 columns that contain the same kind of data, for example:
TABLE A
KEY COL1 COL2 COL3
1 A B C
2 B C null
3 A null null
4 D E F
5 null C B
6 B C A
7 D E F
As a result I expect the distinct rows of this table, where the order of the values within a row doesn't matter. So keys 1 and 6 are the same, as are 2 and 5, and 4 and 7. The rest are different.
Of course, I can't just use DISTINCT in my select; that would only filter out 4 and 7.
I could use a very complex case statement, or a select in a select with an order by. But this needs to be used in a conversion, so performance is an issue here.
Does anyone have a good, performant way to do this?
The result I expect
COL1 COL2 COL3
A B C
B C null
A null null
D E F
If you can have many columns then you can UNPIVOT then order the values and then PIVOT and take the DISTINCT rows:
Oracle Setup:
CREATE TABLE table_name ( KEY, COL1, COL2, COL3 ) AS
SELECT 1, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 2, 'B', 'C', null FROM DUAL UNION ALL
SELECT 3, 'A', null, null FROM DUAL UNION ALL
SELECT 4, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 5, null, 'C', 'B' FROM DUAL UNION ALL
SELECT 6, 'B', 'C', 'A' FROM DUAL UNION ALL
SELECT 7, 'D', 'E', 'F' FROM DUAL
Query:
SELECT DISTINCT
COL1, COL2, COL3
FROM (
SELECT key,
value,
ROW_NUMBER() OVER ( PARTITION BY key ORDER BY value ) AS rn
FROM table_name
UNPIVOT ( value FOR name IN ( COL1, COL2, COL3 ) ) u
)
PIVOT ( MAX( value ) FOR rn IN (
1 AS COL1,
2 AS COL2,
3 AS COL3
) )
Output:
COL1 | COL2 | COL3
:--- | :--- | :---
A | B | C
B | C | null
D | E | F
A | null | null
db<>fiddle here
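The unpivot, order, pivot, distinct sequence boils down to putting each row into a canonical form and then de-duplicating. Here is a minimal Python sketch of that logic on the sample data. The helper name normalize is hypothetical, and it assumes nulls should go last, which matches UNPIVOT dropping them before the values are numbered.

```python
rows = [
    (1, "A", "B", "C"),
    (2, "B", "C", None),
    (3, "A", None, None),
    (4, "D", "E", "F"),
    (5, None, "C", "B"),
    (6, "B", "C", "A"),
    (7, "D", "E", "F"),
]

def normalize(values, width=3):
    """Sort the non-null values and pad with None so equal sets compare equal."""
    present = sorted(v for v in values if v is not None)
    return tuple(present + [None] * (width - len(present)))

# A set keeps one copy of each canonical row, like SELECT DISTINCT
distinct_rows = {normalize(r[1:]) for r in rows}

for r in sorted(distinct_rows, key=lambda t: tuple(v or "" for v in t)):
    print(r)
```

On the sample data this leaves four rows, the same four as in the output above.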
The complicated case expression is going to have the best performance. But the simplest method is going to be conditional aggregation:
select key,
max(case when seqnum = 1 then col end) as col1,
max(case when seqnum = 2 then col end) as col2,
max(case when seqnum = 3 then col end) as col3
from (select key,col,
row_number() over (partition by key order by col asc) as seqnum
from ((select key, col1 as col from t) union all
(select key, col2 as col from t) union all
(select key, col3 as col from t)
) kc
where col is not null
) kc
group by key;

How to get multiple comma-separated values as individual columns?

I have a table with data like this:
select * from data
id | col1 | col2 | col3
---+-------+-------+-------
1 | 1,2,3 | 4,5,6 | 7,8,9
I want to get the data like this:
id | name | dd | fn | suf
---+------+----+----+-----
1 | col1 | 1 | 2 | 3
1 | col2 | 4 | 5 | 6
1 | col3 | 7 | 8 | 9
Currently, I use split_part() in a query like this:
SELECT * from(
select id,
'col1' as name,
NULLIF(split_part(col1, ',', 1), '') AS dd,
NULLIF(split_part(col1, ',', 2), '') AS fn,
NULLIF(split_part(col1, ',', 3), '') AS suf
from data
UNION
select id,
'col2' as name,
NULLIF(split_part(col2, ',', 1), '') AS dd,
NULLIF(split_part(col2, ',', 2), '') AS fn,
NULLIF(split_part(col2, ',', 3), '') AS suf
from data
UNION
select id,
'col3' as name,
NULLIF(split_part(col3, ',', 1), '') AS dd,
NULLIF(split_part(col3, ',', 2), '') AS fn,
NULLIF(split_part(col3, ',', 3), '') AS suf
from data
);
Is there a more elegant way? I have 20 columns.
Assuming this table:
CREATE TABLE tbl (id int, col1 text, col2 text, col3 text);
INSERT INTO tbl VALUES (1 ,'1,2,3', '4,5,6', '7,8,9');
A VALUES expression in a LATERAL subquery should be an elegant solution.
Then just use split_part(). Add NULLIF() only if there can be actual empty strings in the source ...
SELECT id, x.name
, split_part(x.col, ',', 1) AS dd
, split_part(x.col, ',', 2) AS fn
, split_part(x.col, ',', 3) AS suf
FROM tbl t, LATERAL (
VALUES (text 'col1', t.col1)
, ( 'col2', t.col2)
, ( 'col3', t.col3)
-- ... many more?
) x(name, col);
Works in PostgreSQL 9.3 or later.
SQL Fiddle.
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
Split comma separated column data into additional columns
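As a rough illustration of what the LATERAL VALUES list does, the sketch below first turns each source column into its own (name, col) pair and only then splits the string, so the split logic is written once no matter how many columns there are. The Python names are hypothetical. The split_part helper only imitates the PostgreSQL function for positive field numbers, returning an empty string when the field does not exist.

```python
def split_part(s, delim, n):
    """Imitate PostgreSQL split_part() for n >= 1: '' if the field is missing."""
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def unpivot_and_split(row, col_names):
    """One output row per source column: (id, name, dd, fn, suf)."""
    out = []
    for name in col_names:               # plays the role of the VALUES list
        col = row[name]
        out.append((row["id"], name,
                    split_part(col, ",", 1),
                    split_part(col, ",", 2),
                    split_part(col, ",", 3)))
    return out

row = {"id": 1, "col1": "1,2,3", "col2": "4,5,6", "col3": "7,8,9"}
for r in unpivot_and_split(row, ["col1", "col2", "col3"]):
    print(r)
```

With 20 columns only the list of column names grows.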
I would do the union all first and the split_part() second:
select id, name,
coalesce(split_part(col, ',', 1), '') as dd,
coalesce(split_part(col, ',', 2), '') as fn,
coalesce(split_part(col, ',', 3), '') as suf
from ((select id, 'col1' as name, col1 as col from data
) union all
(select id, 'col2' as name, col2 as col from data
) union all
(select id, 'col3' as name, col3 as col from data
)
) t;

Cumulative string concatenation

I have a requirement where I have to show data in cumulative concatenation style, just like running total by group.
Sample data
Col1 Col2
1 a
1 b
2 c
2 d
2 e
Expected output:
Col1 Col2
1 a
1 b,a
2 c
2 d,c
2 e,d,c
The concatenation needs to be broken down by Col1. Any help regarding how to get this result by Oracle SQL will be appreciated.
Making an assumption about the order you need, this can be a solution, based on hierarchical queries:
with test as
(
select 1 as col1, 'a' as col2 from dual union all
select 1 as col1, 'b' as col2 from dual union all
select 2 as col1, 'c' as col2 from dual union all
select 2 as col1, 'd' as col2 from dual union all
select 2 as col1, 'e' as col2 from dual
)
select col1, col2
from (
select col1 AS col1, ltrim(sys_connect_by_path(col2, ','), ',') AS col2, connect_by_isleaf leaf -- ltrim drops the leading ',' that sys_connect_by_path adds
from (
select row_number() over (order by col1 asc, col2 desc) as num, col1, col2
from test
)
connect by nocycle prior col1 = col1 and prior num = num -1
)
where leaf = 1
order by col1, col2
Try:
WITH d AS (
select col1, col2,
row_number() over (partition by col1 order by col2) as x
from tab_le
),
d1( col1, col2, x, col22) as (
SELECT col1, col2, x, col2 col22 FROM d WHERE x = 1
UNION ALL
SELECT d.col1, d.col2, d.x, d.col2 || ',' || d1.col22
FROM d
JOIN d1 ON (d.col1 = d1.col1 AND d.x = d1.x + 1)
)
SELECT * FROM d1
order by 1,2;
I'm not sure you can do this with listagg as it doesn't seem to support windowing clauses. If you're on 11g or higher you can use recursive subquery factoring to achieve your result.
with your_table (col1, col2) as (
select 1, 'a' from dual
union all select 1, 'b' from dual
union all select 2, 'c' from dual
union all select 2, 'd' from dual
union all select 2, 'e' from dual
), t as (
select col1, col2, row_number() over (partition by col1 order by col2) as rn
from your_table
), r (col1, col2, rn) as (
select col1, col2, rn
from t
where rn = 1
union all
select r.col1, t.col2 ||','|| r.col2, t.rn
from r
join t on t.col1 = r.col1 and t.rn = r.rn + 1
)
select col1, col2
from r
order by col1, rn;
COL1 COL2
---------- --------------------
1 a
1 b,a
2 c
2 d,c
2 e,d,c
The your_table CTE is just to mimic your base data. The t CTE adds a row_number() analytic column to provide a sequence for the next part. The interesting part is the r recursive CTE. The anchor member starts with the first row (according to rn from the previous CTE). The recursive member then finds the next row (again according to rn) for that col1, and for that it concatenates the current col2 with the previous one, which may itself already be a concatenation.
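The anchor and recursive members described above behave like a running accumulator that is reset for each col1. A minimal Python sketch of that behaviour follows; the function name cumulative_concat is hypothetical, and it assumes rows are ordered by col2 within each col1.

```python
from itertools import groupby

def cumulative_concat(pairs, sep=","):
    """For each col1 group, prepend each new col2 to what was built so far."""
    out = []
    for col1, group in groupby(sorted(pairs), key=lambda p: p[0]):
        acc = None                        # nothing accumulated yet (the rn = 1 case)
        for _, col2 in group:
            # later rows: current col2, then the previous concatenation
            acc = col2 if acc is None else col2 + sep + acc
            out.append((col1, acc))
    return out

data = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "e")]
for row in cumulative_concat(data):
    print(row)
# (1, 'a')
# (1, 'b,a')
# (2, 'c')
# (2, 'd,c')
# (2, 'e,d,c')
```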

Oracle 11g split text column to rows

I have table:
ID |Values
-----+--------------------------------
1 |AB,AD
2 |AG, ... ,BD
3 |AV
How can I transform it to:
ID |Value
-----+------
1 |AB
1 |AD
2 |AG
... |...
2 |BD
3 |AV
Using the built-in XML functions, you can do it like this:
with sample_data as
(
select 1 id, 'AB,AD' vals from dual union all
select 2, 'AG,AK,AJ,BA,BD' from dual union all
select 3, 'AV' from dual
)
select id, cast(t.column_value.extract('//text()') as varchar2(10)) val
from sample_data,
table( xmlsequence( xmltype(
'<x><x>' || replace(vals, ',', '</x><x>') || '</x></x>'
).extract('//x/*'))) t;
Result:
ID VAL
--- -----
1 AB
1 AD
2 AG
2 AK
2 AJ
2 BA
2 BD
3 AV
Using recursive common table expression, the same query looks like this:
with sample_data as
(
select 1 id, 'AB,AD' vals from dual union all
select 2, 'AG,AK,AJ,BA,BD' from dual union all
select 3, 'AV' from dual
),
split_first(id, val, rem) as
(
select id,
coalesce(substr(vals, 1, instr(vals, ',') - 1), vals) val,
case when instr(vals, ',') > 0 then substr(vals, instr(vals, ',') + 1) end rem
from sample_data
union all
select id,
coalesce(substr(rem, 1, instr(rem, ',') - 1), rem) val,
case when instr(rem, ',') > 0 then substr(rem, instr(rem, ',') + 1) end rem
from split_first
where rem is not null
)
select id, val from split_first
order by id;
Or a slightly different approach:
with sample_data as
(
select 1 id, 'AB,AD' vals from dual union all
select 2, 'AG,AK,AJ,BA,BD' from dual union all
select 3, 'AV' from dual
),
pos(id, seq, vals, sta, stp) as
(
select id, 1, vals, 1, instr(vals, ',') from sample_data
union all
select id, seq + 1, vals, stp + 1, instr(vals, ',', stp + 1) from pos
where stp > 0
)
select id, substr(vals, sta, case when stp > 0 then stp - sta else length(vals) end) from pos
order by id, seq;
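The position-based variant never shortens the string. It only tracks where the current token starts (sta) and where the next comma is (stp), with stp = 0 meaning "no more commas". Here is a minimal Python sketch of that bookkeeping. The instr helper is a hypothetical stand-in for Oracle's 1-based INSTR, and the slicing assumes, as SUBSTR does, that asking for more characters than remain simply returns the rest.

```python
def instr(s, sub, start=1):
    """1-based position of sub in s at or after start; 0 if absent."""
    return s.find(sub, start - 1) + 1

def split_by_positions(vals, delim=","):
    """Walk (sta, stp) pairs over vals and cut out each token."""
    out = []
    sta, stp = 1, instr(vals, delim)              # anchor member
    while True:
        length = stp - sta if stp > 0 else len(vals)
        out.append(vals[sta - 1: sta - 1 + length])   # like SUBSTR(vals, sta, length)
        if stp == 0:                              # recursion stops: where stp > 0
            break
        sta, stp = stp + 1, instr(vals, delim, stp + 1)  # recursive member
    return out

for vals in ["AB,AD", "AG,AK,AJ,BA,BD", "AV"]:
    print(split_by_positions(vals))
# ['AB', 'AD']
# ['AG', 'AK', 'AJ', 'BA', 'BD']
# ['AV']
```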