How to unnest and pivot two columns in BigQuery

Say I have a BQ table containing the following information
id  test.name  test.score
1   a          5
    b          7
2   a          8
    c          3
Where test is nested. How would I pivot test into the following table?
id  a     b     c
1   5     7     null
2   8     null  3
I cannot pivot test directly, as I get the following error message at pivot(test): Table-valued function not found. Previous questions (1, 2) don't deal with nested columns or are outdated.
The following query looks like a useful first step:
select a.id, t
from `table` as a,
unnest(test) as t
However, this just provides me with:
id  test.name  test.score
1   a          5
1   b          7
2   a          8
2   c          3

Conditional aggregation is a good approach. If your tables are large, you might find that this has the best performance:
select t.id,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'a') as a,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'b') as b,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'c') as c
from `table` t;
The reason I recommend this is because it avoids the outer aggregation. The unnest() happens without shuffling the data around -- and I have found that this is a big win in terms of performance.

One option could be using conditional aggregation
select id,
max(case when t.name='a' then t.score end) as a,
max(case when t.name='b' then t.score end) as b,
max(case when t.name='c' then t.score end) as c
from
(
select a.id, t
from `table` as a,
unnest(test) as t
)A group by id
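To try the conditional-aggregation step without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no UNNEST, so the table `test_flat` is a hypothetical stand-in for the rows produced by the unnest step; only the pivot itself is shown.

```python
import sqlite3

# Hypothetical stand-in for the already-unnested rows (id, name, score).
rows = [(1, "a", 5), (1, "b", 7), (2, "a", 8), (2, "c", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_flat (id INTEGER, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO test_flat VALUES (?, ?, ?)", rows)

# One MAX(CASE ...) per target column; names missing for an id become NULL.
pivoted = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN name = 'a' THEN score END) AS a,
           MAX(CASE WHEN name = 'b' THEN score END) AS b,
           MAX(CASE WHEN name = 'c' THEN score END) AS c
    FROM test_flat
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(pivoted)  # [(1, 5, 7, None), (2, 8, None, 3)]
```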

Below is generic/dynamic way to handle your case
EXECUTE IMMEDIATE (
SELECT """
SELECT id, """ ||
STRING_AGG("""MAX(IF(name = '""" || name || """', score, NULL)) AS """ || name, ', ')
|| """
FROM `project.dataset.table` t, t.test
GROUP BY id
"""
FROM (
SELECT DISTINCT name
FROM `project.dataset.table` t, t.test
ORDER BY name
)
);
Applied to the sample data from your question, the output is:
Row id a b c
1 1 5 7 null
2 2 8 null 3
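The same idea of generating the pivot columns from the distinct names can be sketched outside BigQuery. The example below is an illustration only: it uses Python's sqlite3 with a hypothetical flattened table `test_flat` and a hypothetical helper `build_pivot_sql`, and it builds the query string in Python rather than with EXECUTE IMMEDIATE.

```python
import sqlite3

# Hypothetical flattened copy of the sample data (id, name, score).
rows = [(1, "a", 5), (1, "b", 7), (2, "a", 8), (2, "c", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_flat (id INTEGER, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO test_flat VALUES (?, ?, ?)", rows)


def build_pivot_sql(conn):
    """Build one MAX(CASE ...) column per distinct name found in the data."""
    names = [r[0] for r in conn.execute(
        "SELECT DISTINCT name FROM test_flat ORDER BY name")]
    for name in names:
        # The names are interpolated into SQL text, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unsafe column name: {name!r}")
    columns = ", ".join(
        f"MAX(CASE WHEN name = '{n}' THEN score END) AS {n}" for n in names)
    return f"SELECT id, {columns} FROM test_flat GROUP BY id ORDER BY id"


cursor = conn.execute(build_pivot_sql(conn))
header = [d[0] for d in cursor.description]
result = cursor.fetchall()

print(header)  # ['id', 'a', 'b', 'c']
print(result)  # [(1, 5, 7, None), (2, 8, None, 3)]
```

Because the column list comes from the data, a new name such as 'd' would add a column without any change to the code.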

Related

Comparing Column Values and returning ERROR or OK

I'm needing to verify a source system with a destination system and ensure the values are matching between them. The problem is the source system is a total mess and is proving hard to validate.
I've got the following sample data where they should all be OK, but they're showing as ERROR. Does anyone know a way of doing a comparison that would result as an OK for all for the below?
CREATE TABLE #testdata (
ID INT
,ValueSource VARCHAR(800)
,ValueDestination VARCHAR(800)
,Value_Varchar_Check AS (
CASE
WHEN coalesce(ValueSource, '0') = coalesce(ValueDestination, '0')
THEN 'OK'
ELSE 'ERROR'
END
)
)
INSERT INTO #testdata (
ID
,ValueSource
,ValueDestination
)
SELECT 1
,'hepatitis c,other (specify)'
,'hepatitis c, other (specify)'
UNION ALL
SELECT 2
,'lung problems / asthma,lung problems / asthma'
,'lung problems / asthma'
UNION ALL
SELECT 3
,'lung problems / asthma,diabetes'
,'diabetes, lung problems / asthma'
UNION ALL
SELECT 4
,'seizures/epilepsy,hepatitis c,seizures/epilepsy'
,'hepatitis c, seizures/epilepsy'

I don't think you can write this as a computed column, as it is quite tricky to compute. If you are using SQL Server 2016 or later, you can use STRING_SPLIT to convert the ValueSource and ValueDestination values into tables and then sort them alphabetically using a query like this (note that TRIM requires SQL Server 2017 or later; on 2016, use LTRIM(RTRIM(value)) instead):
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM testdata
CROSS APPLY STRING_SPLIT(ValueSource, ',')
For ValueSource, this produces:
ID value rn
1 hepatitis c 1
1 other (specify) 2
2 lung problems / asthma 1
3 diabetes 1
3 lung problems / asthma 2
4 hepatitis c 1
4 seizures/epilepsy 2
You can then FULL OUTER JOIN those two tables on ID, value and rn, and detect an error when there are null values from either side (since that implies that the values for a given ID and rn don't match):
WITH t1 AS (
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM testdata
CROSS APPLY STRING_SPLIT(ValueSource, ',')
),
t2 AS (
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM testdata
CROSS APPLY STRING_SPLIT(ValueDestination, ',')
)
SELECT COALESCE(t1.ID, t2.ID) AS ID,
CASE WHEN COUNT(CASE WHEN t1.value IS NULL OR t2.value IS NULL THEN 1 END) > 0 THEN 'Error'
ELSE 'OK'
END AS Status
FROM t1
FULL OUTER JOIN t2 ON t2.ID = t1.ID AND t2.rn = t1.rn AND t2.value = t1.value
GROUP BY COALESCE(t1.ID, t2.ID)
Output (for your sample data):
ID Status
1 OK
2 OK
3 OK
4 OK
You can then use the entire query above as a CTE (call it t3) to update your original table:
UPDATE t
SET t.Value_Varchar_Check = t3.Status
FROM testdata t
JOIN t3 ON t.ID = t3.ID
Output:
ID ValueSource ValueDestination Value_Varchar_Check
1 hepatitis c,other (specify) hepatitis c, other (specify) OK
2 lung problems / asthma,lung problems / asthma lung problems / asthma OK
3 lung problems / asthma,diabetes diabetes, lung problems / asthma OK
4 seizures/epilepsy,hepatitis c,seizures/epilepsy hepatitis c, seizures/epilepsy OK
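The comparison rule behind this answer (split on commas, trim, ignore duplicates and order) can be checked quickly outside the database. The following is a small sketch in Python, not the SQL Server implementation; the function names `normalize` and `check` are made up for illustration, and treating NULL as an empty list is an assumption that differs from the `coalesce(..., '0')` in the question.

```python
def normalize(value):
    """Split a comma-separated string into a set of trimmed, non-empty items."""
    if value is None:  # assumption: NULL is treated as "no items"
        return frozenset()
    return frozenset(item.strip() for item in value.split(",") if item.strip())


def check(source, destination):
    """Return 'OK' when both sides contain the same items, else 'ERROR'."""
    return "OK" if normalize(source) == normalize(destination) else "ERROR"


samples = [
    (1, "hepatitis c,other (specify)", "hepatitis c, other (specify)"),
    (2, "lung problems / asthma,lung problems / asthma", "lung problems / asthma"),
    (3, "lung problems / asthma,diabetes", "diabetes, lung problems / asthma"),
    (4, "seizures/epilepsy,hepatitis c,seizures/epilepsy",
     "hepatitis c, seizures/epilepsy"),
]

statuses = {row_id: check(src, dst) for row_id, src, dst in samples}
print(statuses)  # {1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'}
```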

Case statement with four columns, i.e. attributes

I have a table with values "1", "0" or "". The table has four columns: p, q, r and s.
I need help creating a case statement that returns values when the attribute is equal to 1.
For ID 5 the case statement should return "p s".
For ID 14 the case statement should return "s".
For ID 33 the case statement should return "p r s". And so on.
Do I need to come up with a case statement that has every possible combination? Or is there a simpler way? Below is what I have come up with thus far.
case
when p = 1 and q =1 then "p q"
when p = 1 and r =1 then "p r"
when p = 1 and s =1 then "p s"
when r = 1 then r
when q = 1 then q
when r = 1 then r
when s = 1 then s
else ''
end

One solution could be this which uses a case for each attribute to return the correct value, surrounded by a trim to remove the trailing space.
with tbl(id, p, q, r, s) as (
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual
)
select id,
trim(regexp_replace(case p when 1 then 'p' end ||
case q when 1 then 'q' end ||
case r when 1 then 'r' end ||
case s when 1 then 's' end, '(.)', '\1 '))
from tbl;
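A portable variant of the one-CASE-per-attribute idea can be tried with Python's sqlite3 module. This is only a sketch under assumptions: it appends the space inside each CASE instead of using regexp_replace (which SQLite lacks by default), and it adds ELSE '' because concatenating NULL in SQLite yields NULL, unlike Oracle. The row for ID 33 is taken from the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, p INTEGER, q INTEGER, r INTEGER, s INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [(5, 1, 0, 0, 1), (14, 0, 0, 0, 1), (33, 1, 0, 1, 1)],
)

# Each flag contributes its letter plus a space; TRIM drops the trailing space.
labels = conn.execute(
    """
    SELECT id,
           TRIM(CASE WHEN p = 1 THEN 'p ' ELSE '' END ||
                CASE WHEN q = 1 THEN 'q ' ELSE '' END ||
                CASE WHEN r = 1 THEN 'r ' ELSE '' END ||
                CASE WHEN s = 1 THEN 's ' ELSE '' END) AS attrs
    FROM tbl
    ORDER BY id
    """
).fetchall()

print(labels)  # [(5, 'p s'), (14, 's'), (33, 'p r s')]
```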
The real solution would be to fix the database design. This design technically violates fourth normal form (4NF) in that it contains more than one independent attribute. The fact that an ID "has" or "is part of" attribute p or q, etc. should be split out. This design should be three tables: the main table with the ID, the lookup table containing info about the attributes that a main ID could have (p, q, r or s), and the associative table that joins the two where appropriate (assuming an ID row could have more than one attribute and an attribute could belong to more than one ID), which is how to model a many-to-many relationship.
main_tbl            main_attr            attribute_lookup
ID  col1  col2      main_id  attr_id     attr_id  attr_desc
5                   5        1           1        p
14                  5        4           2        q
                    14       4           3        r
                                         4        s
Then it would be simple to query this model to build your list, easy to maintain if an attribute description changes (only 1 place to change it), etc.
Select from it like this:
select m.ID, m.col1, listagg(al.attr_desc, ' ') within group (order by al.attr_desc) as attr_desc
from main_tbl m
join main_attr ma
on m.ID = ma.main_id
join attribute_lookup al
on ma.attr_id = al.attr_id
group by m.id, m.col1;

You can use concatenations with decode() functions
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as "String"
from t;
If you need spaces between letters, consider using:
with t(id,p,q,r,s) as
(
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual union all
select 31,null,0,null,1 from dual union all
select 33,1,0,1,1 from dual
), t2 as
(
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as str
from t
), t3 as
(
select id, substr(str,level,1) as str, level as lvl
from t2
connect by level <= length(str)
and prior id = id
and prior sys_guid() is not null
)
select id, listagg(str,' ') within group (order by lvl) as "String"
from t3
group by id;

In my opinion, it's bad practice to use columns for relationships.
You should have two tables, one called arts and another called mapping. arts looks like this:
ID - ART
1 - p
2 - q
3 - r
4 - s
...
and mapping maps your base IDs to your art IDs and looks like this:
MYID - ARTID
5 - 1
5 - 4
Afterwards, you should make use of Oracle's PIVOT operator. It's more dynamic.

SQL query to get the count by applying group by

I want to get the below result:
source table :
Cnt A B
4 ABC YU/FGH
5 ABC YU/DFE
5 ABC KL
2 LKP BN/ER
4 JK RE
Result:
Cnt A B
9 ABC YU
5 ABC KL
2 LKP BN
4 JK RE
Here I want the count by grouping 'B' and want to display the 'B' record only till the special character (/)

Basically, you will have to filter out all the characters after the "/" symbol and then apply a SUM and a GROUP BY. You can see this below. The inner query filters out the unwanted part of the string and the outer query does the SUM and the GROUP BY:
SELECT SUM(t.Cnt), t.A, t.B
FROM (
SELECT Cnt,
A,
CASE
WHEN CHARINDEX('/', B) > 0 THEN SUBSTRING(B, 0, CHARINDEX('/', B))
ELSE B
END AS B
FROM #Tab
) t
GROUP BY t.A, t.B
ORDER BY t.A
You can see this working here -> http://rextester.com/IQJ79191
Hope this helps!!!
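For a quick check of the prefix-then-aggregate approach without SQL Server, here is a sketch using Python's sqlite3 module. The table name `src` is hypothetical, and SQLite's `substr`/`instr` stand in for SUBSTRING/CHARINDEX; appending '/' before searching guarantees a match for values that contain no slash.

```python
import sqlite3

rows = [
    (4, "ABC", "YU/FGH"),
    (5, "ABC", "YU/DFE"),
    (5, "ABC", "KL"),
    (2, "LKP", "BN/ER"),
    (4, "JK", "RE"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (Cnt INTEGER, A TEXT, B TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)

# Cut B at the first '/', then sum Cnt per (A, prefix).
totals = conn.execute(
    """
    SELECT SUM(Cnt) AS Cnt,
           A,
           substr(B, 1, instr(B || '/', '/') - 1) AS B
    FROM src
    GROUP BY A, substr(B, 1, instr(B || '/', '/') - 1)
    ORDER BY A, substr(B, 1, instr(B || '/', '/') - 1)
    """
).fetchall()

print(totals)
# [(5, 'ABC', 'KL'), (9, 'ABC', 'YU'), (4, 'JK', 'RE'), (2, 'LKP', 'BN')]
```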

You can get your string till '/' by using SUBSTRING.
select
sum(Cnt),
A,
SUBSTRING(B, 1, charindex('/', B + '/') - 1)
from source_table
group by A, SUBSTRING(B, 1, charindex('/', B + '/') - 1);

Solution for Oracle - substr(B, 1, instr(B || '/', '/') - 1) B
Put this in both the select and the group by (without the alias in the group by).

I can suggest you to use a query like this:
select
sum(Cnt) Cnt,
A,
left(B, charindex('/',B+'/',0)-1) B -- Appending `+'/'` does the trick
from
t
group by
A,
left(B, charindex('/',B+'/',0)-1);

By using String and CharIndex Functions.
;WITH SourceTable(Cnt,A,B) AS
(
SELECT 4,'ABC','YU/FGH'UNION ALL
SELECT 5,'ABC','YU/DFE'UNION ALL
SELECT 5,'ABC','KL' UNION ALL
SELECT 2,'LKP','BN/ER' UNION ALL
SELECT 4,'JK','RE'
)
SELECT SUM(Cnt) AS Cnt,A,CASE WHEN CHARINDEX('/',B) = 0 THEN B
ELSE SUBSTRING(B,0,CHARINDEX('/',B)) END AS [B] FROM SourceTable
GROUP BY A,CASE WHEN CHARINDEX('/',B) = 0 THEN B
ELSE SUBSTRING(B,0,CHARINDEX('/',B)) END
ORDER BY Cnt DESC

Try this query --
SELECT SUM(Cnt) AS [COUNT]
,A
,CASE
WHEN CHARINDEX('/', B) > 0
THEN SUBSTRING(B, 1, (CHARINDEX('/', B) - 1))
ELSE B
END AS B
FROM tblSample
GROUP BY A
,CASE
WHEN CHARINDEX('/', B) > 0
THEN SUBSTRING(B, 1, (CHARINDEX('/', B) - 1))
ELSE B
END
ORDER BY A, B

How can I Pivot a table in DB2? [duplicate]

I have table A, below, where for each unique id, there are three codes with some value.
ID Code Value
---------------------
11 1 x
11 2 y
11 3 z
12 1 p
12 2 q
12 3 r
13 1 l
13 2 m
13 3 n
I have a second table B with format as below:
Id Code1_Val Code2_Val Code3_Val
Here there is just one row for each unique id. I want to populate this second table B from first table A for each id from the first table.
For the first table A above, the second table B should come out as:
Id Code1_Val Code2_Val Code3_Val
---------------------------------------------
11 x y z
12 p q r
13 l m n
How can I achieve this in a single SQL query?

select Id,
max(case when Code = '1' then Value end) as Code1_Val,
max(case when Code = '2' then Value end) as Code2_Val,
max(case when Code = '3' then Value end) as Code3_Val
from TABLEA
group by Id
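Since the question asks to populate table B, the conditional aggregation can be wrapped in an INSERT ... SELECT. The sketch below runs that combination in Python's sqlite3 module rather than DB2, so it only illustrates the shape of the statement; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Id INTEGER, Code INTEGER, Value TEXT)")
conn.execute(
    "CREATE TABLE B (Id INTEGER, Code1_Val TEXT, Code2_Val TEXT, Code3_Val TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(11, 1, "x"), (11, 2, "y"), (11, 3, "z"),
     (12, 1, "p"), (12, 2, "q"), (12, 3, "r"),
     (13, 1, "l"), (13, 2, "m"), (13, 3, "n")],
)

# Pivot the three codes into columns and load the result into B in one statement.
conn.execute(
    """
    INSERT INTO B (Id, Code1_Val, Code2_Val, Code3_Val)
    SELECT Id,
           MAX(CASE WHEN Code = 1 THEN Value END),
           MAX(CASE WHEN Code = 2 THEN Value END),
           MAX(CASE WHEN Code = 3 THEN Value END)
    FROM A
    GROUP BY Id
    """
)

table_b = conn.execute("SELECT * FROM B ORDER BY Id").fetchall()
print(table_b)
# [(11, 'x', 'y', 'z'), (12, 'p', 'q', 'r'), (13, 'l', 'm', 'n')]
```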

SELECT Id,
max(DECODE(Code, 1, Value)) AS Code1_Val,
max(DECODE(Code, 2, Value)) AS Code2_Val,
max(DECODE(Code, 3, Value)) AS Code3_Val
FROM A
group by Id

If your version doesn't have DECODE(), you can also use this:
INSERT INTO B (id, code1_val, code2_val, code3_val)
WITH Ids (id) as (SELECT DISTINCT id
FROM A) -- Only to construct list of ids
SELECT Ids.id, a1.value, a2.value, a3.value
FROM Ids -- or substitute the actual id table
JOIN A a1
ON a1.id = ids.id
AND a1.code = 1
JOIN A a2
ON a2.id = ids.id
AND a2.code = 2
JOIN A a3
ON a3.id = ids.id
AND a3.code = 3
(Works on my V6R1 DB2 instance, and have an SQL Fiddle Example).

Here is a SQLFiddle example
insert into B (ID,Code1_Val,Code2_Val,Code3_Val)
select Id, max(V1),max(V2),max(V3) from
(
select ID,Value V1,'' V2,'' V3 from A where Code=1
union all
select ID,'' V1, Value V2,'' V3 from A where Code=2
union all
select ID,'' V1, '' V2,Value V3 from A where Code=3
) AG
group by ID

Here is the SQL query (this uses Oracle-style PIVOT syntax, which your DB2 version may not support):
insert into pivot_insert_table(id,code1_val,code2_val, code3_val)
select * from (select id,code,value from pivot_table)
pivot(max(value) for code in (1,2,3)) order by id ;

WITH Ids (id) as
(
SELECT DISTINCT id FROM A
)
SELECT Ids.id,
(select sub.value from A sub where Ids.id=sub.id and sub.code=1 fetch first rows only) Code1_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=2 fetch first rows only) Code2_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=3 fetch first rows only) Code3_Val
FROM Ids

You want to pivot your data. Since DB2 has no pivot function, you can use DECODE (basically a CASE statement) inside an aggregate, grouped by Id so that each id ends up on a single row.
The syntax should be:
SELECT Id,
MAX(DECODE(Code, 1, Value)) AS Code1_Val,
MAX(DECODE(Code, 2, Value)) AS Code2_Val,
MAX(DECODE(Code, 3, Value)) AS Code3_Val
FROM A
GROUP BY Id

SELECT DISTINCT for data groups

I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'

In my opinion, you have to create a custom aggregate that concatenates the data (for strings, a CLR approach is recommended for performance reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number() or dense_rank() function, your choice. Anyway, it should look like this:
with groupings as (
select concat(data) groups -- concat is the custom aggregate described above
from Table1
group by ID
)
select groups, row_number() over (order by groups)
from (select distinct groups from groupings) g

The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic will only work if the "Data" values are both fixed and known beforehand.
In your case, you do say that is the case. However, considering that you also say there are 1000 of them, this would frankly be a ridiculous-looking query :-)
LuckyLuke's suggestion above would be the more generic and probably saner way to implement the solution in your case.

From your sample data (having added the missing 2,'A' tuple), the following gives the renumbered (and uniquified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.

Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then proceeding analogously to LuckyLuke's solution.
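The underlying idea in these answers (build one signature per ID from its sorted data values, then number the distinct signatures) can be sketched in a few lines of Python. This is an illustration of the logic rather than a T-SQL solution; the starting number 101 is taken from the question's example, and the variable names are made up.

```python
from collections import defaultdict

rows = [(1, "A"), (2, "A"), (2, "B"), (3, "A"), (3, "B"),
        (4, "C"), (5, "D"), (6, "A"), (6, "B")]

# Collect the set of data values belonging to each ID.
groups = defaultdict(set)
for row_id, data in rows:
    groups[row_id].add(data)

data_ids = {}        # signature (sorted tuple of values) -> DataID
id_to_data_id = {}   # original ID -> DataID
for row_id in sorted(groups):
    signature = tuple(sorted(groups[row_id]))
    if signature not in data_ids:
        data_ids[signature] = 101 + len(data_ids)  # numbering starts at 101
    id_to_data_id[row_id] = data_ids[signature]

print(data_ids)
# {('A',): 101, ('A', 'B'): 102, ('C',): 103, ('D',): 104}
print(id_to_data_id)
# {1: 101, 2: 102, 3: 102, 4: 103, 5: 104, 6: 102}
```

Sorting the values before building the signature is what makes (A, B) and (B, A) count as the same group, which is the role the ORDER BY plays in the FOR XML PATH query.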
