Flatten BigQuery nested field contents into new columns instead of rows - google-bigquery

I have some BigQuery data in the following format:
"thing": [
{
"name": "gameLost",
"params": [
{
"key": "total_games",
"val": {
"str_val": "3",
"int_val": null
}
},
{
"key": "games_won",
"val": {
"str_val": "2",
"int_val": null
}
},
{
"key": "game_time",
"val": {
"str_val": "44",
"int_val": null
}
}
],
"dt_a": "1470625311138000",
"dt_b": "1470620345566000"
}
I'm aware of the FLATTEN() function that will result in an output of 3 rows like so:
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| thing.name | thing.dt_a       | thing.dt_b       | thing.params.key | thing.params.val.str_val | thing.params.val.int_val |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
| gameLost   | 1470625311138000 | 1470620345566000 | total_games      | 3                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | games_won        | 2                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | game_time        | 44                       | null                     |
+------------+------------------+------------------+------------------+--------------------------+--------------------------+
where the higher level keys/values get repeated into new rows for each deeper level object.
However, I need to output the deeper key/values as entirely new columns, and not repeat fields so the results would look like this:
+------------+------------------+------------------+--------------------+-----------+-----------+
| thing.name | thing.dt_a       | thing.dt_b       | total_games_played | games_won | game_time |
+------------+------------------+------------------+--------------------+-----------+-----------+
| gameLost   | 1470625311138000 | 1470620345566000 | 3                  | 2         | 44        |
+------------+------------------+------------------+--------------------+-----------+-----------+
How can I do this? Thanks!

Standard SQL makes this easier to express (uncheck "Use Legacy SQL" under "Show Options"):
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
thing.params[OFFSET(0)].val.str_val AS total_games_played,
thing.params[OFFSET(1)].val.str_val AS games_won,
thing.params[OFFSET(2)].val.str_val AS game_time
FROM T;
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| thing                                                                   | total_games_played | games_won | game_time |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| {"name":"gameLost","dt_a":"1470625311138000","dt_b":"1470620345566000"} | 3                  | 2         | 44        |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
If you don't know the order of the keys in the array, you can use subselects to extract the relevant values:
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
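The subselects above perform a key-based lookup into the params array, which is what makes them robust to key ordering. The same pivot can be sketched outside SQL; this is an illustrative Python equivalent (not BigQuery code), using the sample values from the question:

```python
# Illustrative sketch: pivot a list of {key, val} params into named columns
# by looking each key up, mirroring the correlated subselects above.
params = [
    {"key": "total_games", "val": {"str_val": "3", "int_val": None}},
    {"key": "games_won", "val": {"str_val": "2", "int_val": None}},
    {"key": "game_time", "val": {"str_val": "44", "int_val": None}},
]

def param_value(params, key):
    # Equivalent of: (SELECT val.str_val FROM UNNEST(thing.params) WHERE key = ...)
    return next((p["val"]["str_val"] for p in params if p["key"] == key), None)

row = {
    "total_games_played": param_value(params, "total_games"),
    "games_won": param_value(params, "games_won"),
    "game_time": param_value(params, "game_time"),
}
print(row)  # {'total_games_played': '3', 'games_won': '2', 'game_time': '44'}
```

As in the SQL version, a key that is absent from the array simply yields NULL (None) rather than an error.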

Try below (Legacy SQL)
SELECT
  thing.name AS name,
  thing.dt_a AS dt_a,
  thing.dt_b AS dt_b,
  MAX(IF(thing.params.key = "total_games", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS total_games_played,
  MAX(IF(thing.params.key = "games_won", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS games_won,
  MAX(IF(thing.params.key = "game_time", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS game_time
FROM YourTable
For Standard SQL you can try the following (inspired by Elliott's answer; the important difference is that the array is ordered by key, so the order of the key values is guaranteed):
WITH Temp AS (
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
ARRAY(SELECT val.str_val AS val FROM UNNEST(thing.params) ORDER BY key) AS params
FROM YourTable
)
SELECT
thing,
params[OFFSET(2)] AS total_games_played,
params[OFFSET(1)] AS games_won,
params[OFFSET(0)] AS game_time
FROM Temp
Note: If you have other keys in params - you should add WHERE clause to SELECT inside ARRAY
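The OFFSET indices in this approach depend on the keys' alphabetical order ("game_time" < "games_won" < "total_games"), which is why total_games_played reads from OFFSET(2). A quick illustration of that ordering (Python, not SQL):

```python
# Illustrative only: ORDER BY key sorts the param keys alphabetically,
# which fixes the OFFSET each value lands at.
keys = ["total_games", "games_won", "game_time"]
ordered = sorted(keys)  # mimics ORDER BY key
print(ordered)  # ['game_time', 'games_won', 'total_games']
```

So OFFSET(0) is game_time, OFFSET(1) is games_won, and OFFSET(2) is total_games, matching the SELECT list above.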

Related

How select a row and count row in single query in postgresql?

I have a table in postgresql as follow:
id | chat_id | content | time | read_times
----+---------+-----------+------+-------------------------------------------------------------------------
1 | chat_10 | content_1 | t1 | [{"username": "user1", "time": 123}, {"username": "user2", "time": 111}]
2 | chat_10 | content_2 | t2 | [{"username": "user2", "time": 1235}]
3 | chat_10 | content_3 | t3 | []
4 | chat_11 | content_4 | t4 | [{"username": "user1", "time": 125}, {"username": "user3", "time": 121}]
5 | chat_11 | content_5 | t5 | [{"username": "user1", "time": 126}, {"username": "user3", "time": 127}]
Note: t1 < t2 < t3 < t4 < t5
After a user reads a message, we record it in the read_times column (for example, user2 read the message with id 2 at time 1235). Now I want to get the user's chat list with an unread count per chat. For user1 the result would be:
chat_id | content | unread_count
--------+-----------+--------------
chat_10 | content_3 | 2
chat_11 | content_5 | 0
Note: unread_count is the number of messages that the user didn't read in a chat_id.
Is it possible with one query?
First, extract the usernames for each chat_id and content with the json_array_elements() function, and use the FIRST_VALUE() window function to get the last content of each chat_id.
Then aggregate, combining the SUM() window function with the MAX() aggregate function, to get the unread_count column:
WITH cte AS (
SELECT t.chat_id, t.content,
FIRST_VALUE(t.content) OVER (PARTITION BY t.chat_id ORDER BY t.time DESC) last_content,
(r->>'username') username
FROM tablename t LEFT JOIN json_array_elements(read_times::json) r ON true
)
SELECT DISTINCT c.chat_id, MAX(c.last_content) "content",
SUM((MAX((COALESCE(username, '') = 'user1')::int) = 0)::int) OVER (PARTITION BY c.chat_id) unread_count
FROM cte c
GROUP BY c.chat_id, c.content
ORDER BY c.chat_id
See the demo.
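To make the logic of that query concrete: a message is unread by a user when no entry in its read_times array carries that username, and the "last content" is simply the content of the latest message per chat. Here is an illustrative Python sketch of the same computation (not PostgreSQL), using the sample rows from the question:

```python
import json

# Illustrative sketch: per chat, keep the latest content and count the
# messages whose read_times array does not mention the given user.
rows = [
    # (id, chat_id, content, time, read_times)
    (1, "chat_10", "content_1", 1, '[{"username": "user1", "time": 123}, {"username": "user2", "time": 111}]'),
    (2, "chat_10", "content_2", 2, '[{"username": "user2", "time": 1235}]'),
    (3, "chat_10", "content_3", 3, '[]'),
    (4, "chat_11", "content_4", 4, '[{"username": "user1", "time": 125}, {"username": "user3", "time": 121}]'),
    (5, "chat_11", "content_5", 5, '[{"username": "user1", "time": 126}, {"username": "user3", "time": 127}]'),
]

def chat_summary(rows, user):
    out = {}
    for _id, chat, content, _time, read_times in sorted(rows, key=lambda r: r[3]):
        readers = {entry["username"] for entry in json.loads(read_times)}
        stats = out.setdefault(chat, {"content": None, "unread_count": 0})
        stats["content"] = content                 # latest content wins (rows sorted by time)
        stats["unread_count"] += user not in readers
    return out

summary = chat_summary(rows, "user1")
print(summary)
# {'chat_10': {'content': 'content_3', 'unread_count': 2},
#  'chat_11': {'content': 'content_5', 'unread_count': 0}}
```

This matches the expected result table for user1: chat_10 has two messages without a user1 entry, chat_11 has none.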

How can I return an ARRAY/JSON structure in a column in postgresql?

Hello, I have a set of tables as a minimal example to explain my issue (I want to know how to retrieve that structure, not change the database structure):
fruit:
id | name | form| texture_id
-------------------------------
1 | Apple | round | 1
2 | Banana | long | 1
fruit_varieties:
id | name | fruit_id | color
-------------------------------
1 | golden | 1| green
2 | fuji | 1| red
3 | canarias | 2| yellow
fruit_texture
id | name
-------------------------------
1 | soft
2 | hard
variety_countries
id | name | fruit_variety_id | average_temperature
-------------------------------
1 | france | 1 | 21
2 | spain | 1 | 24
3 | italy | 2 | 23
I want to get this structure, for a given fruit.name = 'Apple':
{
"fruit_name" = "Apple",
"form" = "round",
"texture" = "soft",
"fruit_properties" = [{
"variety_name" = "Golden",
"color" = "green",
"countries" = [{
"country" = "france",
"avg_temperature" = "21",
}, {
"country" = "spain",
"avg_temperature" = "24",
}
]
}, {
"variety_name" = "fuji",
"color" = "red",
"countries" = [{
"country" = "italy",
"avg_temperature" = "23",
}
]
}
]
}
So I started with something like this
SELECT
fruit.name AS fruit_name,
fruit.form AS form,
fruit_texture.name AS texture,
(
-- I don't know how to handle this
) AS fruit_properties
FROM fruit
JOIN fruit_varieties
ON fruit.id = fruit_varieties.fruit_id
WHERE fruit.name = 'Apple'
Now I don't know how to return that array inside a column, or how to build a JSON with the whole response. I have already spent some hours trying to use the JSON path functions suggested in other questions, but I am not able to make them work.
Could someone give me a hint using this simple example?
Your output structure is not standard JSON format: it should use : instead of = between key and value. Assuming you want standard JSON output, try the query below:
select row_to_json(d2) from (
select
name,
form,
texture,
json_agg(json_build_object('variety_name',variety_name,'color',color,'countries',countries)) "fruit_properties"
from
(
select
t1.name "name",
t1.form "form",
t3.name "texture",
t2.name "variety_name",
t2.color "color",
json_agg(json_build_object( 'country',t4.name,'temp',t4.average_temperature)) "countries"
from
fruit t1 inner join fruit_varieties t2 on t1.id=t2.fruit_id
inner join fruit_texture t3 on t1.texture_id=t3.id
inner join variety_countries t4 on t4.fruit_variety_id=t2.id
group by 1,2,3,4,5
) d1
group by 1,2,3
) d2
where d2.name='Apple'
DEMO
The query above will return one row with a JSON value for each fruit if you do not use the WHERE clause.
If you literally want the output as written in your question, replace row_to_json(d2) with replace(row_to_json(d2)::text, ':', ' = ') in the query above.
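The key idea in that query is the two-level aggregation: the inner json_agg groups countries per variety, and the outer one groups varieties per fruit. The same two-level grouping can be sketched in Python over the flat join result (illustrative only; the sample values come from the question's tables):

```python
# Illustrative sketch: group flat joined rows twice -- once per variety
# (collecting countries), once per fruit (collecting varieties).
joined = [
    # (fruit, form, texture, variety, color, country, avg_temperature)
    ("Apple", "round", "soft", "golden", "green", "france", 21),
    ("Apple", "round", "soft", "golden", "green", "spain", 24),
    ("Apple", "round", "soft", "fuji", "red", "italy", 23),
]

def fruit_json(rows, fruit_name):
    header = None
    varieties = {}
    for fruit, form, texture, variety, color, country, temp in rows:
        if fruit != fruit_name:
            continue
        header = (fruit, form, texture)
        v = varieties.setdefault(
            variety, {"variety_name": variety, "color": color, "countries": []})
        v["countries"].append({"country": country, "avg_temperature": temp})
    name, form, texture = header
    return {"fruit_name": name, "form": form, "texture": texture,
            "fruit_properties": list(varieties.values())}

result = fruit_json(joined, "Apple")
print(result["fruit_name"], len(result["fruit_properties"]))  # Apple 2
```

Each level of nesting in the desired JSON corresponds to one level of grouping, which is why the SQL needs two GROUP BYs (d1 and d2).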

How to Use Hive parsing multiple nested JSON arrays

{
"base": {
"code": "xm",
"name": "project"
},
"list": [{
"ACode": "cp1",
"AName": "Product1",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}, {
"BCode": "gn2",
"BName": "Feature2"
}]
}, {
"ACode": "cp2",
"AName": "Product2",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}]
}]
}
Given JSON like this, I want to get:
| code | name | ACode | AName | BCode | BName |
| ---- | ------- | ----- | -------- | ----- | -------- |
| xm | project | cp1 | Product1 | gn1 | Feature1 |
| xm | project | cp1 | Product1 | gn2 | Feature2 |
| xm | project | cp2 | Product2 | gn1 | Feature1 |
I tried using this:
SELECT
code
, name
, get_json_object(t.list, '$.[*].ACode') AS ACode
, get_json_object(t.list, '$.[*].AName') AS AName
, get_json_object(t.list, '$.[*].BList[*].BCode') AS BCode
, get_json_object(t.list, '$.[*].BList[*].BName') AS BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
;
which returns:
xm project ["cp1","cp2"] ["Product1","Product2"] ["gn1","gn2","gn1"] ["Feature1","Feature2","Feature1"]
But joining these arrays back together generates six rows; it seems to produce a Cartesian product.
I also tried split(string, "\},\{"), but that splits the inner layer at the same time, so I hope to get help.
I solved it!
SELECT
code
, name
, ai.ACode
, ai.AName
, p.BCode
, p.BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
lateral view explode(split(regexp_replace(regexp_extract(list,'^\\[(.+)\\]$',1),'\\}\\]\\}\\,\\{', '\\}\\]\\}\\|\\|\\{'),'\\|\\|')) list as a
lateral view json_tuple(a,'ACode','AName','BList') ai as ACode
, AName
, BList
lateral view explode(split(regexp_replace(regexp_extract(BList,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|')) BList as b
lateral view json_tuple(b,'BCode','BName') p as BCode
, BName
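The regexp_replace/split gymnastics are needed because Hive's get_json_object cannot explode an array on its own. Conceptually, the two lateral-view explodes are just two nested loops over the parsed JSON, one row per (list element, BList element) pair. An illustrative Python sketch of the same flattening (not Hive), using a real JSON parser instead of the fragile string splitting:

```python
import json

# Illustrative: two nested loops mirror the two lateral-view explodes,
# yielding one row per (list item, BList item) pair.
doc = json.loads(
    '{"base":{"code":"xm","name":"project"},'
    '"list":[{"ACode":"cp1","AName":"Product1",'
    '"BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},'
    '{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}'
)

rows = [
    (doc["base"]["code"], doc["base"]["name"],
     a["ACode"], a["AName"], b["BCode"], b["BName"])
    for a in doc["list"]   # first explode: one row per list element
    for b in a["BList"]    # second explode: one row per BList element
]
for r in rows:
    print(r)
# ('xm', 'project', 'cp1', 'Product1', 'gn1', 'Feature1')
# ('xm', 'project', 'cp1', 'Product1', 'gn2', 'Feature2')
# ('xm', 'project', 'cp2', 'Product2', 'gn1', 'Feature1')
```

Because each inner row is produced from its own parent (rather than from two independently exploded arrays), there is no Cartesian product, which is exactly what the chained lateral views achieve in the Hive query above.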

How to query duplicated rows according to column value?

I'm trying to figure out how to duplicate my rows based on the Pass and Fail columns. Below is the result of my querying so far.
The code for the query is shown below; I'm querying from JSON.
SELECT
to_date(LotSummary ->> 'Start', 'HH24:MI:SS DD/MM/YY')AS "Date",
Machine AS "Machine",
LotSummary ->> 'Pass' AS "Pass",
LotSummary ->> 'Fail' AS "Fail"
FROM
(
SELECT
CASE
WHEN jsonb_typeof(json_data->'OEESummary'->'LotSummary'->'Lot') = 'array'
THEN
jsonb_array_elements(cast(json_data->'OEESummary'->'LotSummary'->'Lot' AS JSONB))
ELSE
json_data->'OEESummary'->'LotSummary'->'Lot'
END
AS LotSummary,
json_data->'OEESummary'->>'MachineID' AS Machine
FROM
(
SELECT
jsonb_array_elements(cast(json_data->>'body' AS JSONB)) AS json_data
FROM data
)t
WHERE
json_data ->> 'file_name' = 'OEE.xml'
)a
WHERE
to_date(LotSummary ->> 'Start', 'HH24:MI:SS DD/MM/YY') IS NOT NULL
So I want to query it with duplicated rows that separate the Pass and Fail values, like this:
+----------------------------+--------------+------+------+
| Date                       | Machine      | Pass | Fail |
+----------------------------+--------------+------+------+
| "2019-08-04T16:00:00.000Z" | TRS1000i-082 | 5    | NULL |
| "2019-08-04T16:00:00.000Z" | TRS1000i-082 | NULL | 2    |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | 0    | NULL |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | NULL | 0    |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | 20   | NULL |
| "2019-07-01T16:00:00.000Z" | TRS1000i-001 | NULL | 0    |
+----------------------------+--------------+------+------+
Just in case you need the JSON format (note it's not the exact data, but the format is correct):
{
"body": [
{
"file_name": "OEE.xml",
"OEESummary": {
"MachineID": "TRS1000i-012",
"LotSummary": {
"Lot": [
{
"#i": "0",
"Start": "14:52:16 15/08/19",
"Pass": "3",
"Fail": "0"
},
{
"#i": "1",
"Start": "15:40:41 15/08/19",
"Pass": "3",
"Fail": "0"
}
]
},
"Utilisation": [
"0:01:42:48",
"19.04%"
],
"MTTR": "--",
"IdleTime": "0:07:16:39",
"MUBA": "57",
"OEE": "60.55%"
}
}
],
"header": {
"json_metadata_revision": "v1.0.0",
"json_metadata_datetime_creation": "14-OCT-2019_14:55:57",
"json_metadata_uuid": "14102019145557_65b425d8-09e5-48ec-be85-e69d9a50d2e3",
"json_metadata_type": "mvst_xml_to_json"
}
}
Do help if you know any techniques I could use to solve this issue. Your help is greatly appreciated! Thank you.
With your table, you can use a lateral join:
select t.date, t.machine, v.pass, v.fail
from t cross join lateral
(values (t.pass, null), (null, t.fail)) v(pass, fail);
I'm not quite sure what your query has to do with the question, but you can define it as a CTE and then use its results as t.
Looking at the top of your result set, it's just a union query:
WITH A AS(
Select 1 id , 'TRS1000i-082' as Machine , 5 pass, 2 fail union all
Select 2 id , 'TRS1000i-001' as Machine , 0 pass, 0 fail union all
Select 3 id , 'TRS1000i-001' as Machine , 20 pass, 0 fail
)
SELECT ID
,MACHINE
,pass
,null fail
FROM a
UNION ALL
SELECT ID
,MACHINE
,null pass
,fail fail
FROM a
order by ID
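Both answers implement the same reshaping: each source row is split into two output rows, one carrying the Pass value and one carrying the Fail value, with the other column set to NULL. A minimal Python sketch of that split (illustrative only, with made-up sample values):

```python
# Illustrative: each source row fans out into a pass row and a fail row,
# mirroring both the CROSS JOIN LATERAL (VALUES ...) and the UNION ALL approach.
source = [
    # (date, machine, pass_count, fail_count)
    ("2019-08-04", "TRS1000i-082", 5, 2),
    ("2019-07-01", "TRS1000i-001", 0, 0),
]

split = [
    row
    for date, machine, passed, failed in source
    for row in ((date, machine, passed, None),   # pass row
                (date, machine, None, failed))   # fail row
]
for r in split:
    print(r)
```

The lateral-join form is generally preferable in PostgreSQL because it scans the source only once, whereas UNION ALL reads it twice.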

How to perform a pattern matching query on the keys of a hstore/jsonb column?

I'm trying to perform a pattern matching on an hstore column on a Postgresql database table.
Here's what I tried:
SELECT
*
FROM
products
WHERE
'iphone8' LIKE ANY(AVALS(available_devices))
however, it seems that the ANY operator only supports <, <=, <>, etc.
I also tried this:
SELECT
*
FROM
products
WHERE
ANY(AVALS(available_devices)) LIKE 'iphone8'
but then it raises a SyntaxError.
So, can I write a query whose WHERE clause takes a parameter and returns the rows whose hstore column contains any key matching that parameter?
eg:
for rows
id | hstore_column
1 { country: 'brazil' }
2 { city: 'amsterdam' }
3 { state: 'new york' }
4 { count: 10 }
5 { counter: 'Irelia' }
I'd like to perform a WHERE with a parameter 'count' and I expect the results to be:
id | hstore_column
1 { country: 'brazil' }
4 { count: 10 }
5 { counter: 'Irelia' }
You can use jsonb_object_keys to turn the keys into a column. Then match against the key.
For example, here's my test data.
select * from test;
id | stuff
----+---------------------------------------------
1 | {"country": "brazil"}
2 | {"city": "amsterdam"}
3 | {"count": 10}
4 | {"pearl": "jam", "counting": "crows"}
5 | {"count": "chocula", "count down": "final"}
Then we can use jsonb_object_keys to turn each key into its own row.
select id, stuff, jsonb_object_keys(stuff) as key
from test;
id | stuff | key
----+---------------------------------------------+------------
1 | {"country": "brazil"} | country
2 | {"city": "amsterdam"} | city
3 | {"count": 10} | count
4 | {"pearl": "jam", "counting": "crows"} | pearl
4 | {"pearl": "jam", "counting": "crows"} | counting
5 | {"count": "chocula", "count down": "final"} | count
5 | {"count": "chocula", "count down": "final"} | count down
This can be used in a sub-select to get each matching key/value pair.
select id, stuff, key, stuff->key as value
from (
select id, stuff, jsonb_object_keys(stuff) as key
from test
) pairs
where key like 'count%';
id | stuff | key | value
----+---------------------------------------------+------------+-----------
1 | {"country": "brazil"} | country | "brazil"
3 | {"count": 10} | count | 10
4 | {"pearl": "jam", "counting": "crows"} | counting | "crows"
5 | {"count": "chocula", "count down": "final"} | count | "chocula"
5 | {"count": "chocula", "count down": "final"} | count down | "final"
Or we can use distinct to get just the matching rows.
select distinct id, stuff
from (
select id, stuff, jsonb_object_keys(stuff) as key
from test
) pairs
where key like 'count%';
id | stuff
----+---------------------------------------------
1 | {"country": "brazil"}
3 | {"count": 10}
4 | {"pearl": "jam", "counting": "crows"}
5 | {"count": "chocula", "count down": "final"}
dbfiddle
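Conceptually, jsonb_object_keys plus the LIKE filter amounts to expanding each row into one row per key and keeping the rows where any key matches the prefix. A rough Python equivalent of that filter (illustrative only, not PostgreSQL):

```python
# Illustrative: keep rows whose JSON object has at least one key
# matching the prefix, mirroring: WHERE key LIKE 'count%'.
rows = [
    (1, {"country": "brazil"}),
    (2, {"city": "amsterdam"}),
    (3, {"count": 10}),
    (4, {"pearl": "jam", "counting": "crows"}),
    (5, {"count": "chocula", "count down": "final"}),
]

prefix = "count"
matching = [(rid, stuff) for rid, stuff in rows
            if any(key.startswith(prefix) for key in stuff)]
print([rid for rid, _ in matching])  # [1, 3, 4, 5]
```

Note that "country" matches the prefix "count" too, just as it does with LIKE 'count%' in the SQL above; anchor the pattern more tightly if you want exact-key matches only.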
Note: having to search the keys indicates your data structure might need rethinking. A traditional key/value table might work better. The values can still be jsonb. There's a little more setup, but the queries are simpler and it is easier to index.
create table attribute_group (
id bigserial primary key
);
create table test (
id bigserial primary key,
attribute_group_id bigint
references attribute_group(id)
on delete cascade
);
create table attributes (
attribute_group_id bigint
references attribute_group(id) not null,
key text not null,
value jsonb not null
);
select test.id, attrs.key, attrs.value
from test
join attributes attrs on attrs.attribute_group_id = test.attribute_group_id
where attrs.key like 'count%';
dbfiddle