How to flatten an Array Struct to columns in Google BigQuery - sql

Consider the table sample below where data is of type array<struct<key:string,value:string>>, with the repeated keys: 'Date', 'Country' and 'Brand':
source       data.key  data.value
first_file   Date      2022-12-14
             Country   Germany
             Brand     Mercedes
             Date      2022-12-15
             Country   Germany
             Brand     BMW
second_file  Date      2022-12-13
             Country   Sweden
             Brand     Volvo
             Date      2022-12-10
             Country   France
             Brand     Renault
By 'repeated' keys I mean that every data.key entry always consists of those keys (Date, Country, Brand). In this example, they get repeated twice per row entry, but in the real table they might get repeated even more times per unique entry. My desired result is:
source       date        country  brand
first_file   2022-12-14  Germany  Mercedes
first_file   2022-12-15  Germany  BMW
second_file  2022-12-13  Sweden   Volvo
second_file  2022-12-10  France   Renault
Any help on how I can reach that result?
If it helps, I've managed to turn the sample table into the format below in case you'd like to try a solution to this table instead:
source       date.key  date.value  country.key  country.value  brand.key  brand.value
first_file   Date      2022-12-14  Country      Germany        Brand      Mercedes
first_file   Date      2022-12-15  Country      Germany        Brand      BMW
second_file  Date      2022-12-13  Country      Sweden         Brand      Volvo
second_file  Date      2022-12-10  Country      France         Brand      Renault
Thanks!

WITH
  tmp AS (
    SELECT
      source,
      key,
      value
    FROM
      UNNEST(ARRAY<STRUCT<source string, data ARRAY<STRUCT<key string, value string>>>>[
        ("first_file", [("Date","2022-12-14"), ("Country","Germany"),("Brand","Mercedes")]),
        ("second_file", [("Date","2022-12-13"), ("Country","Sweden"),("Brand","Volvo")])
      ]),
      UNNEST(data) ) -- unnest data first
-- Note: SAFE_OFFSET(0) keeps only the first collected value per key,
-- so this query returns one row per source.
SELECT
  source, date[SAFE_OFFSET(0)] as date, country[SAFE_OFFSET(0)] as country, brand[SAFE_OFFSET(0)] as brand,
FROM
  tmp PIVOT (
    ARRAY_AGG(value IGNORE NULLS) FOR key IN ("Date", "Country", "Brand")) -- pivot table
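
The PIVOT query above yields one row per source. When the three keys repeat inside a single array, as in the question, one possible approach is to split the array into consecutive groups of three by position and emit one row per group. The Python sketch below only illustrates that idea; the `rows` sample and the `flatten` helper are hypothetical, and it assumes the pairs arrive in order and in complete Date/Country/Brand groups. In BigQuery the position could presumably be obtained with UNNEST(data) WITH OFFSET.

```python
# Hypothetical sample mirroring the question: each source holds a flat,
# ordered list of (key, value) pairs in which the three keys repeat.
rows = [
    ("first_file", [
        ("Date", "2022-12-14"), ("Country", "Germany"), ("Brand", "Mercedes"),
        ("Date", "2022-12-15"), ("Country", "Germany"), ("Brand", "BMW"),
    ]),
    ("second_file", [
        ("Date", "2022-12-13"), ("Country", "Sweden"), ("Brand", "Volvo"),
        ("Date", "2022-12-10"), ("Country", "France"), ("Brand", "Renault"),
    ]),
]

KEYS = ("Date", "Country", "Brand")


def flatten(rows, keys=KEYS):
    """Turn every run of len(keys) consecutive pairs into one output row."""
    out = []
    for source, data in rows:
        # Assumption: pairs are ordered and come in complete groups.
        for start in range(0, len(data), len(keys)):
            record = dict(data[start:start + len(keys)])
            out.append((source, *(record.get(k) for k in keys)))
    return out


result = flatten(rows)
for row in result:
    print(row)
```

Grouping by position rather than by key is what keeps the second Date/Country/Brand triple of a source from being merged into the first.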

You can use the following query:
with sources AS
(
  select
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)
select
  source,
  (SELECT value FROM UNNEST(data) WHERE key = 'Date') AS Date,
  (SELECT value FROM UNNEST(data) WHERE key = 'Country') AS Country,
  (SELECT value FROM UNNEST(data) WHERE key = 'Brand') AS Brand,
from sources;
You can also use a UDF to centralize the logic:
-- Returns the value stored under key k in an array of (key, value) structs.
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
with sources AS
(
  select
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)
SELECT
  source,
  getValue('Date', data) AS Date,
  getValue('Country', data) AS Country,
  getValue('Brand', data) AS Brand
FROM sources;
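The result is:
source       Date        Country  Brand
first_file   2022-12-14  Germany  Mercedes
second_file  2022-12-13  Sweden   Volvo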

Related

SQL ORACLE: how to transpose row to column by category?

I want to transpose rows into columns by category. Below are my data and my expected result.
Sorry for the poor wording of the question; English is not my primary language.
Insert SQL:
CREATE TABLE OBJECT(
ID NUMBER,
TYPE VARCHAR2(10),
NAME VARCHAR2(10)
);
INSERT INTO OBJECT (ID, TYPE, NAME ) VALUES
(1,'FISH','Shark'),
(2,'FISH','Carp'),
(3,'FISH','Salmon'),
(4,'ANIMAL','Cat'),
(5,'ANIMAL','Dog'),
(6,'ANIMAL','Sheep'),
(7,'ANIMAL','Lion'),
(8,'TRANS','Car'),
(9,'TRANS','Bike'),
(10,'FRUIT','Mango'),
(11,'FRUIT','Apple'),
(12,'FRUIT','Orange'),
(13,'FRUIT','Banana'),
(14,'FRUIT','Grape')
;
Number the rows for each type and then PIVOT around the row number and type:
SELECT fish_id,
       fish_name,
       animal_id,
       animal_name,
       trans_id,
       trans_name,
       fruit_id,
       fruit_name
FROM   (
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) AS rn
  FROM   object o
)
PIVOT (
  MAX(id) AS id,
  MAX(name) AS name
  FOR type IN (
    'FISH' AS fish,
    'ANIMAL' AS animal,
    'TRANS' AS trans,
    'FRUIT' AS fruit
  )
)
Which, for the sample data:
CREATE TABLE OBJECT(
ID NUMBER,
TYPE VARCHAR2(10),
NAME VARCHAR2(10)
);
INSERT ALL
INTO OBJECT (ID, TYPE, NAME ) VALUES (1,'FISH','Shark')
INTO OBJECT (ID, TYPE, NAME ) VALUES (2,'FISH','Carp')
INTO OBJECT (ID, TYPE, NAME ) VALUES (3,'FISH','Salmon')
INTO OBJECT (ID, TYPE, NAME ) VALUES (4,'ANIMAL','Cat')
INTO OBJECT (ID, TYPE, NAME ) VALUES (5,'ANIMAL','Dog')
INTO OBJECT (ID, TYPE, NAME ) VALUES (6,'ANIMAL','Sheep')
INTO OBJECT (ID, TYPE, NAME ) VALUES (7,'ANIMAL','Lion')
INTO OBJECT (ID, TYPE, NAME ) VALUES (8,'TRANS','Car')
INTO OBJECT (ID, TYPE, NAME ) VALUES (9,'TRANS','Bike')
INTO OBJECT (ID, TYPE, NAME ) VALUES (10,'FRUIT','Mango')
INTO OBJECT (ID, TYPE, NAME ) VALUES (11,'FRUIT','Apple')
INTO OBJECT (ID, TYPE, NAME ) VALUES (12,'FRUIT','Orange')
INTO OBJECT (ID, TYPE, NAME ) VALUES (13,'FRUIT','Banana')
INTO OBJECT (ID, TYPE, NAME ) VALUES (14,'FRUIT','Grape')
SELECT * FROM DUAL;
Outputs:
FISH_ID  FISH_NAME  ANIMAL_ID  ANIMAL_NAME  TRANS_ID  TRANS_NAME  FRUIT_ID  FRUIT_NAME
1        Shark      4          Cat          8         Car         10        Mango
2        Carp       5          Dog          9         Bike        11        Apple
3        Salmon     6          Sheep        null      null        12        Orange
null     null       7          Lion         null      null        13        Banana
null     null       null       null         null      null        14        Grape
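
The "number the rows, then pivot on the row number" idea can also be traced outside the database. The Python sketch below is only an illustration of that logic, not Oracle's implementation; the `pivot_by_row_number` helper is hypothetical. It numbers the objects within each type in id order and places the n-th object of every type on the same output row, padding shorter types with None (the nulls in the output above).

```python
from collections import defaultdict
from itertools import zip_longest

# Sample data from the question: (id, type, name).
objects = [
    (1, "FISH", "Shark"), (2, "FISH", "Carp"), (3, "FISH", "Salmon"),
    (4, "ANIMAL", "Cat"), (5, "ANIMAL", "Dog"), (6, "ANIMAL", "Sheep"),
    (7, "ANIMAL", "Lion"), (8, "TRANS", "Car"), (9, "TRANS", "Bike"),
    (10, "FRUIT", "Mango"), (11, "FRUIT", "Apple"), (12, "FRUIT", "Orange"),
    (13, "FRUIT", "Banana"), (14, "FRUIT", "Grape"),
]

TYPES = ("FISH", "ANIMAL", "TRANS", "FRUIT")


def pivot_by_row_number(objects, types=TYPES):
    """Align the n-th (id, name) of each type on the same output row."""
    per_type = defaultdict(list)
    # sorted() orders by id first, matching ORDER BY id in the window.
    for obj_id, obj_type, name in sorted(objects):
        per_type[obj_type].append((obj_id, name))
    rows = []
    # zip_longest plays the role of the shared row number; types that
    # run out of entries are padded with (None, None).
    for group in zip_longest(*(per_type[t] for t in types),
                             fillvalue=(None, None)):
        rows.append(tuple(value for pair in group for value in pair))
    return rows


pivoted = pivot_by_row_number(objects)
for row in pivoted:
    print(row)
```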

Show fields only when other column does not contain nulls

I have a table that stores pets and a certain number of vaccines. One column holds the pet identifier, another the name of the vaccine, and a third the date of completion. If the date is null, the pet has not received that vaccine yet.
The structure is as follows:
CREATE TABLE pets (
pet VARCHAR (10),
vaccine VARCHAR (50),
complete_date DATE
);
INSERT INTO pets VALUES ('DOG001', 'Adenovirus', '2021-01-03');
INSERT INTO pets VALUES ('DOG001', 'Parvovirus', '2021-02-03');
INSERT INTO pets VALUES ('DOG001', 'Leptospirosis', null);
INSERT INTO pets VALUES ('CAT774', 'Calcivirosis', '2021-01-06');
INSERT INTO pets VALUES ('CAT774', 'Panleukopenia', null);
INSERT INTO pets VALUES ('DOG002', 'Adenovirus', '2020-12-21');
INSERT INTO pets VALUES ('DOG002', 'Parvovirus', '2021-02-01');
INSERT INTO pets VALUES ('DOG002', 'Leptospirosis', '2021-03-01');
pet     vaccine        complete_date
DOG001  Adenovirus     2021-01-03
DOG001  Parvovirus     2021-02-03
DOG001  Leptospirosis  null
CAT774  Calcivirosis   2021-01-06
CAT774  Panleukopenia  null
DOG002  Adenovirus     2020-12-21
DOG002  Parvovirus     2021-02-01
DOG002  Leptospirosis  2021-03-01
What I need is a list of all the pets that do not have a null "date", considering all the vaccines.
In this example, the result should be simply 'DOG002' since it is the only animal with all its dates with non-null values.
A conditional aggregate in the HAVING would be one method:
SELECT Pet
FROM dbo.Pets
GROUP BY Pet
HAVING COUNT(CASE WHEN Complete_Date IS NULL THEN 1 END) = 0;
I think Larnu posted what you are looking for (+1)... BUT... just in case you want to see the pet's details.
Just another option is WITH TIES.
Select top 1 with ties *
From pets
order by sum(case when complete_date is null then 1 else 0 end) over (partition by pet)
SELECT DISTINCT Pet FROM Pets
WHERE Pet NOT IN (SELECT Pet FROM Pets WHERE Complete_Date IS NULL)
A CTE can also be used to achieve the above result:
with CTE as
(
  select pet,
         vaccine,
         complete_date,
         -- pet_flag counts the null dates per pet; 0 means fully vaccinated
         SUM(IIF(complete_date is null ,1,0)) over (PARTITION BY pet) as pet_flag
  from pets
)
select distinct Pet from CTE where
pet_flag = 0
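
As a quick sanity check, the HAVING and NOT IN variants above can be run against the sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption for illustration (the answers target SQL Server, so the dbo. prefix is dropped and dates are stored as text here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (pet TEXT, vaccine TEXT, complete_date TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        ("DOG001", "Adenovirus", "2021-01-03"),
        ("DOG001", "Parvovirus", "2021-02-03"),
        ("DOG001", "Leptospirosis", None),
        ("CAT774", "Calcivirosis", "2021-01-06"),
        ("CAT774", "Panleukopenia", None),
        ("DOG002", "Adenovirus", "2020-12-21"),
        ("DOG002", "Parvovirus", "2021-02-01"),
        ("DOG002", "Leptospirosis", "2021-03-01"),
    ],
)

# Conditional aggregate: count the null dates per pet and keep pets with none.
having_sql = """
SELECT pet
FROM pets
GROUP BY pet
HAVING COUNT(CASE WHEN complete_date IS NULL THEN 1 END) = 0
"""

# Anti-join style: drop every pet that appears with a null date.
not_in_sql = """
SELECT DISTINCT pet
FROM pets
WHERE pet NOT IN (SELECT pet FROM pets WHERE complete_date IS NULL)
"""

having_result = [row[0] for row in conn.execute(having_sql)]
not_in_result = [row[0] for row in conn.execute(not_in_sql)]
print(having_result, not_in_result)
```

On the sample data both queries return only DOG002, which matches the result the question asks for.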

Converting data file to flatfile?

I'm looking to convert a data file to a flat file format, with multiple hierarchical dimensions. I have included an example, but ideally, I will have an unknown number of columns that I wish to transform, while the hierarchical dimensions will be fixed.
If you have unknown or variable columns, you can dynamically UNPIVOT your data without using Dynamic SQL. Note that we only need to exclude the two key columns ... Where [key] not in ('Country','City')
Example
Select Country
      ,City
      ,Metric = B.[key]
      ,Value  = B.Value
 From  YourTable A
 Cross Apply ( Select *
                From  OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper ) )
                Where [key] not in ('Country','City')
             ) B
Returns
Country City Metric Value
US NY Snowfall 13
US NY Temp 94
US NY Snowfall 5
US NY Temp 84
UK London Snowfall 6
UK London Temp 85
You need to unpivot your data:
SELECT unpvt.country
     , unpvt.city
     , unpvt.metrics
     , unpvt.Valuess
FROM
  ( SELECT * FROM tablename ) p
UNPIVOT ( Valuess FOR metrics IN ( snowfall , temp) ) unpvt
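
The difference between the two answers is whether the metric columns have to be listed. The Python sketch below illustrates the "exclude the key columns, unpivot everything else" logic of the first answer; the `unpivot` helper is hypothetical, and the input rows are an assumption reconstructed from the "Returns" table above, since the original example data is not shown.

```python
# Fixed hierarchical dimensions; every other column is treated as a metric.
FIXED = ("Country", "City")

# Assumed wide-format input, reconstructed from the output shown above.
table = [
    {"Country": "US", "City": "NY", "Snowfall": 13, "Temp": 94},
    {"Country": "US", "City": "NY", "Snowfall": 5, "Temp": 84},
    {"Country": "UK", "City": "London", "Snowfall": 6, "Temp": 85},
]


def unpivot(rows, fixed=FIXED):
    """Emit one (fixed..., metric, value) tuple per non-fixed column."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key not in fixed:  # same filter as: [key] not in ('Country','City')
                out.append((*(row[f] for f in fixed), key, value))
    return out


long_format = unpivot(table)
for record in long_format:
    print(record)
```

Because the metric names are read from each row, adding a new metric column to the input needs no change to the function.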

Postgres - how to select jsonb key/value pairs as columns?

I have records in a test table like so (the metrics column is of type jsonb):
id name metrics
1 machine1 {"metric1": 50, "metric2": 100}
2 machine2 {"metric1": 31, "metric2": 46}
I would like to select the metrics as additional columns, e.g. (pseudo-code):
Select *, json_each(test.metrics) from test;
to get the result like:
id name metric1 metric2
1 machine1 50 100
2 machine2 31 46
Is this even possible?
Use the ->> operator:
select id, name,
metrics ->> 'metric1' as metric1,
metrics ->> 'metric2' as metric2
from test;
You can simply use ->>
SELECT
id,
name,
metrics ->> 'metric1' as metric1,
metrics ->> 'metric2' as metric2
FROM t
Note that the metric columns are now of type text. If you want them to be of type integer, you need to cast them as well:
(metrics ->> 'metric1')::int
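
The text-versus-integer distinction can be illustrated outside Postgres. The Python sketch below is a rough analogy, not Postgres behaviour in every detail: the hypothetical `get_text` helper imitates ->> for scalar values only (the value as text, or None when the key is missing), and the explicit int() call plays the role of the ::int cast.

```python
import json

# Sample rows from the question; metrics is kept as a JSON string.
rows = [
    (1, "machine1", '{"metric1": 50, "metric2": 100}'),
    (2, "machine2", '{"metric1": 31, "metric2": 46}'),
]

KEYS = ("metric1", "metric2")


def get_text(metrics, key):
    """Scalar-only analogue of ->>: the value as text, None if absent."""
    value = metrics.get(key)
    return None if value is None else str(value)


as_text = []
as_int = []
for row_id, name, metrics_json in rows:
    metrics = json.loads(metrics_json)
    texts = [get_text(metrics, k) for k in KEYS]
    as_text.append((row_id, name, *texts))
    # Explicit cast, like (metrics ->> 'metric1')::int
    as_int.append((row_id, name, *(int(t) for t in texts)))

print(as_text)
print(as_int)
```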

conditioned explode in presto\spark

I have a table with this structure:
input
id, email_MD5, email_SHA1, idType
d1, md1, sh1, type1
d2, null, sh2, type2
I need to transform the table to source and destination relations according to following logic:
If exactly one of email_MD5 and email_SHA1 is null, the row is converted to a single id -> email relation with the original type.
If both emails are not null, the row is converted to 3 relations: id -> email_MD5, id -> email_SHA1, and the relation between the emails, email_MD5 -> email_SHA1, with the hardcoded type email.
output
src, dst, idType
d1, md1, type1
d1, sh1, type1
md1, sh1, email
d2, sh2, type2
How can I do it in presto and spark sql?
I guess this is possible purely with a UNION of all possible combinations:
SELECT id AS src
      ,email_SHA1 AS dst
      ,idType
FROM input
WHERE email_SHA1 is not null
UNION
SELECT id AS src
      ,email_MD5 AS dst
      ,idType
FROM input
WHERE email_MD5 is not null
UNION
SELECT email_MD5 AS src
      ,email_SHA1 AS dst
      ,'email' AS idType
FROM input
WHERE email_SHA1 is not null and email_MD5 is not null
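
To check the UNION query against the expected output from the question, it can be run on the two sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption for illustration only; the question targets Presto and Spark SQL, where the same three-branch query is expected to apply but was not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE input (id TEXT, email_MD5 TEXT, email_SHA1 TEXT, idType TEXT)"
)
conn.executemany(
    "INSERT INTO input VALUES (?, ?, ?, ?)",
    [("d1", "md1", "sh1", "type1"), ("d2", None, "sh2", "type2")],
)

query = """
SELECT id AS src, email_SHA1 AS dst, idType
FROM input
WHERE email_SHA1 IS NOT NULL
UNION
SELECT id AS src, email_MD5 AS dst, idType
FROM input
WHERE email_MD5 IS NOT NULL
UNION
SELECT email_MD5 AS src, email_SHA1 AS dst, 'email' AS idType
FROM input
WHERE email_SHA1 IS NOT NULL AND email_MD5 IS NOT NULL
"""

# Sort in Python so the comparison does not depend on engine row order.
relations = sorted(conn.execute(query).fetchall())
for relation in relations:
    print(relation)
```

Each branch covers one kind of relation, so a row with a single email contributes one relation and a row with both contributes three.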