Unnest array / JSON in BigQuery - Get value from key - SQL

I have an array like this
[{"name": "Nome da Empresa", "value": "Land ", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Nome do Representante", "value": "Thomas GT", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Email Representante", "value": "p#xyz.com", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Qual o plano do cliente?", "value": "Go", "updated_at": "2022-09-02T22:31:12Z"},{"name": "Forma de pagamento", "value": "Internet Banking", "updated_at": "2022-09-16T14:09:53Z"}, {"name": "Valor total da guia", "value": "227,63", "updated_at": "2022-09-16T14:09:59Z"}]
I'm trying to get values from some "fields" like Nome da Empresa or Email Representante.
I've already tried using json_extract_scalar and unnest. json_extract_scalar returns a column with no values (blank), and unnest returns the error: Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRING
Query 1:
select
  id,
  fields,
  json_extract_scalar(fields, '$.Email Representante') as categorias,
  json_value(fields, '$.Nome da Empresa') as teste
from mytable
Query 2:
SELECT
  id,
  fields
from pipefy.cards_startup_pack, UNNEST(fields)
Any ideas? Thanks a lot!

You may try the queries below.
Query 1:
SELECT
  (SELECT JSON_VALUE(f, '$.value')
   FROM UNNEST(JSON_QUERY_ARRAY(t.fields)) f
   WHERE JSON_VALUE(f, '$.name') = 'Nome da Empresa'
  ) AS teste,
  (SELECT JSON_VALUE(f, '$.value')
   FROM UNNEST(JSON_QUERY_ARRAY(t.fields)) f
   WHERE JSON_VALUE(f, '$.name') = 'Email Representante'
  ) AS categorias
FROM mytable t;
# Query results
+-------+------------+
| teste | categorias |
+-------+------------+
| Land  | p#xyz.com  |
+-------+------------+
Query 2:
SELECT JSON_VALUE(f, '$.name') name, JSON_VALUE(f, '$.value') value
FROM mytable, UNNEST(JSON_QUERY_ARRAY(fields)) f;
# Query results
+--------------------------+------------------+
| name                     | value            |
+--------------------------+------------------+
| Nome da Empresa          | Land             |
| Nome do Representante    | Thomas GT        |
| Email Representante      | p#xyz.com        |
| Qual o plano do cliente? | Go               |
| Forma de pagamento       | Internet Banking |
| Valor total da guia      | "227,63"         |
+--------------------------+------------------+
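If it helps to sanity-check the logic outside BigQuery: the JSON_QUERY_ARRAY + JSON_VALUE combination above can be sketched in plain Python. This is only an illustration of what the subquery does (parse the STRING into an array of objects, then look a field up by name); the sample data is abridged from the question.

```python
import json

# Sample "fields" value from the question: a JSON array stored as a STRING.
fields = json.dumps([
    {"name": "Nome da Empresa", "value": "Land ", "updated_at": "2022-09-02T22:30:58Z"},
    {"name": "Email Representante", "value": "p#xyz.com", "updated_at": "2022-09-02T22:30:58Z"},
])

def field_value(fields_json: str, field_name: str):
    """Mimic JSON_QUERY_ARRAY + JSON_VALUE: parse the string into an
    array of objects and return the 'value' of the object whose 'name'
    matches (or None, like the SQL scalar subquery)."""
    for obj in json.loads(fields_json):
        if obj.get("name") == field_name:
            return obj.get("value")
    return None

print(field_value(fields, "Nome da Empresa"))      # 'Land ' (note the trailing space)
print(field_value(fields, "Email Representante"))  # 'p#xyz.com'
```

This also shows why json_extract_scalar(fields, '$.Email Representante') came back blank in the question: the names live inside array elements as data, not as top-level JSON keys, so a plain path lookup finds nothing.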

Related

Query Sum of jsonb column of objects

I have a jsonb column on my DB called reactions, with a structure like the one below.
[
{
"id": "1234",
"count": 1
},
{
"id": "2345",
"count": 1
}
]
The field holds an array of objects, each with a count and id field. I'm trying to find which object in the reactions field has the highest count across the DB. Basically, I'd like to sum each object by its id and find the max.
I've figured out how to sum up all of the reaction counts, but I'm getting stuck grouping it by the ID, and finding the sum for each individual id.
SELECT SUM((x->>'count')::integer) FROM (SELECT id, reactions FROM messages) as m
CROSS JOIN LATERAL jsonb_array_elements(m.reactions) AS x
Ideally I'd end up with something like this:
id | sum
-----------
1234 | 100
2345 | 70
5678 | 50
The messages table looks something like this
id | user | reactions
------------------------
1 | 3456 | jsonb
2 | 8573 | jsonb
The calculation takes a few transformation steps.
Flatten the jsonb column from an array into individual jsonb objects using the jsonb_array_elements function:
postgres=# select jsonb_array_elements(reactions)::jsonb as data from messages;
data
----------------------------
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
...
Populate each jsonb object into separate columns with the jsonb_populate_record function:
postgres=# create table data(id text ,count int);
CREATE TABLE
postgres=# select r.* from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r;
id | count
------+-------
1234 | 1
2345 | 1
1234 | 1
2345 | 1
...
Do the sum with GROUP BY:
postgres=# select r.id, sum(r.count) from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r group by r.id;
id | sum
------+-----
2345 | 2
1234 | 2
...
The above steps should do it.
You can use the query below to convert the jsonb array into standard rows;
see https://dba.stackexchange.com/questions/203250/getting-specific-key-values-from-jsonb-into-columns
select "id", sum("count")
from messages
left join lateral jsonb_to_recordset(reactions) x ("id" text, "count" int) on true
group by "id" order by 1;

How to select a row and count rows in a single query in PostgreSQL?

I have a table in postgresql as follow:
id | chat_id | content | time | read_times
----+---------+-----------+------+-------------------------------------------------------------------------
1 | chat_10 | content_1 | t1 | [{"username": "user1", "time": 123}, {"username": "user2", "time": 111}]
2 | chat_10 | content_2 | t2 | [{"username": "user2", "time": 1235}]
3 | chat_10 | content_3 | t3 | []
4 | chat_11 | content_4 | t4 | [{"username": "user1", "time": 125}, {"username": "user3", "time": 121}]
5 | chat_11 | content_5 | t5 | [{"username": "user1", "time": 126}, {"username": "user3", "time": 127}]
Note: t1 < t2 < t3 < t4 < t5
Each time a user reads a message, we record it in the read_times column (user2 read the message with id 2 at time 1235). Now I want to get a user's chat list together with the unread count per chat. For user1 the result is as follows:
chat_id | content | unread_count
--------+-----------+--------------
chat_10 | content_3 | 2
chat_11 | content_5 | 0
Note: unread_count is the number of messages that the user didn't read in a chat_id.
Is it possible with one query?
First, extract the usernames for each chat_id and content with the json_array_elements() function, and get the last content of each chat_id with the FIRST_VALUE() window function.
Then aggregate, combining the SUM() window function with the MAX() aggregate function to get the unread_count column:
WITH cte AS (
  SELECT t.chat_id, t.content,
         FIRST_VALUE(t.content) OVER (PARTITION BY t.chat_id ORDER BY t.time DESC) last_content,
         (r->>'username') username
  FROM tablename t LEFT JOIN json_array_elements(read_times::json) r ON true
)
SELECT DISTINCT c.chat_id, MAX(c.last_content) "content",
       SUM((MAX((COALESCE(username, '') = 'user1')::int) = 0)::int) OVER (PARTITION BY c.chat_id) unread_count
FROM cte c
GROUP BY c.chat_id, c.content
ORDER BY c.chat_id
See the demo.
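To make the intended semantics easy to check, here is a small Python sketch of the same computation: per chat, keep the content of the latest message and count the messages whose read_times array does not contain the user. The numeric times 1..5 stand in for t1 < ... < t5 from the question.

```python
import json

# Rows from the question: (id, chat_id, content, time, read_times JSON).
rows = [
    (1, "chat_10", "content_1", 1, '[{"username": "user1", "time": 123}, {"username": "user2", "time": 111}]'),
    (2, "chat_10", "content_2", 2, '[{"username": "user2", "time": 1235}]'),
    (3, "chat_10", "content_3", 3, '[]'),
    (4, "chat_11", "content_4", 4, '[{"username": "user1", "time": 125}, {"username": "user3", "time": 121}]'),
    (5, "chat_11", "content_5", 5, '[{"username": "user1", "time": 126}, {"username": "user3", "time": 127}]'),
]

def chat_list(rows, user):
    """Return {chat_id: (latest_content, unread_count)} for one user."""
    result = {}
    for _, chat_id, content, t, read_times in rows:
        readers = {r["username"] for r in json.loads(read_times)}
        last, unread = result.get(chat_id, ((None, None), 0))
        if last[0] is None or t > last[0]:
            last = (t, content)  # FIRST_VALUE(...) OVER (ORDER BY time DESC)
        result[chat_id] = (last, unread + (user not in readers))
    return {c: (last[1], unread) for c, (last, unread) in result.items()}

print(chat_list(rows, "user1"))
# {'chat_10': ('content_3', 2), 'chat_11': ('content_5', 0)}
```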

How to use a wildcard for a jsonb column

--------------------------------------------------------------
ID | Day | ftfm_profile
--------------------------------------------------------------
23 | 22/10/2020 | {"name": ["EGMHTMA", "EGMHCR", "EDYYFOX2", "EGTTFIR", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
-------------------------------------------------------------------------------------------------
24 | 22/10/2020 | {"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
-------------------------------------------------------------------------------------------------
25 | 22/10/2020 | {"name": ["LFBGTH4", "LFBMC2", "LFFFE7", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
I have a table (named profile) in my Postgres database, which has 3 columns: ID, Day, and ftfm_profile of type jsonb. I tried to extract the rows where a profile name (ftfm_profile->'name') begins with 'LFBB' (SQL: LFBB%) using the wildcard as follows:
select * from public.profile where ftfm_profile ->'name' ? 'LFBB%'
the expected result:
-------------------------------------------------------------------------------------------------
24 | 22/10/2020 | {"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
-------------------------------------------------------------------------------------------------
I can't seem to find the solution. Thanks for your help!
One option unnests the json array in an EXISTS subquery:
select *
from public.profile
where exists (
  select 1
  from jsonb_array_elements_text(ftfm_profile -> 'name') as e(name)
  where e.name like 'LFBB%'
)
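The EXISTS-over-unnested-array pattern is easy to mirror in Python, which also shows why the ? operator in the question didn't work: ? tests exact membership of 'LFBB%' as a literal string, while what's needed is a prefix test on each element. The type arrays are trimmed from the sample data for brevity.

```python
import json

# Rows from the question: (ID, Day, ftfm_profile JSON), "type" keys omitted.
profiles = [
    (23, "22/10/2020", '{"name": ["EGMHTMA", "EGMHCR", "EDYYFOX2", "EGTTFIR", "EGTTEI"]}'),
    (24, "22/10/2020", '{"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"]}'),
    (25, "22/10/2020", '{"name": ["LFBGTH4", "LFBMC2", "LFFFE7", "LFTTN8", "EGTTEI"]}'),
]

# EXISTS over the unnested array: keep rows where ANY name starts with
# "LFBB", the equivalent of LIKE 'LFBB%'.
matches = [row[0] for row in profiles
           if any(n.startswith("LFBB") for n in json.loads(row[2])["name"])]
print(matches)  # [24]
```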

How to parse multiple nested JSON arrays with Hive

{
"base": {
"code": "xm",
"name": "project"
},
"list": [{
"ACode": "cp1",
"AName": "Product1",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}, {
"BCode": "gn2",
"BName": "Feature2"
}]
}, {
"ACode": "cp2",
"AName": "Product2",
"BList": [{
"BCode": "gn1",
"BName": "Feature1"
}]
}]
}
Given JSON like this, I want to get this:
| code | name | ACode | Aname | Bcode | Bname |
| ---- | ------- | ----- | -------- | ----- | -------- |
| xm | project | cp1 | Product1 | gn1 | Feature1 |
| xm | project | cp1 | Product1 | gn2 | Feature2 |
| xm | project | cp2 | Product2 | gn1 | Feature1 |
I tried this:
SELECT
code
, name
, get_json_object(t.list, '$.[*].ACode') AS ACode
, get_json_object(t.list, '$.[*].AName') AS AName
, get_json_object(t.list, '$.[*].BList[*].BCode') AS BCode
, get_json_object(t.list, '$.[*].BList[*].BName') AS BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
;
and got this:
xm project ["cp1","cp2"] ["Product1","Product2"] ["gn1","gn2","gn1"] ["Feature1","Feature2","Feature1"]
But I found it generates six rows; there seems to be a Cartesian product.
I also tried split(string, "\},\{"), but that splits the inner layer at the same time. So I hope to get some help.
I solved it!
SELECT
code
, name
, ai.ACode
, ai.AName
, p.BCode
, p.BName
FROM
(
SELECT
get_json_object(t.value, '$.base.code') AS code
, get_json_object(t.value, '$.base.name') AS name
, get_json_object(t.value, '$.list') AS list
FROM
(
SELECT
'{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
)
t
)
t
lateral view explode(split(regexp_replace(regexp_extract(list,'^\\[(.+)\\]$',1),'\\}\\]\\}\\,\\{', '\\}\\]\\}\\|\\|\\{'),'\\|\\|')) list as a
lateral view json_tuple(a,'ACode','AName','BList') ai as ACode
, AName
, BList
lateral view explode(split(regexp_replace(regexp_extract(BList,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|')) BList as b
lateral view json_tuple(b,'BCode','BName') p as BCode
, BName
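The regexp gymnastics above exist only because Hive's get_json_object can't explode nested arrays directly. The target shape itself is just two nested loops, one per "lateral view explode"; here is that logic in Python over the exact JSON from the question, for reference:

```python
import json

doc = json.loads(
    '{"base":{"code":"xm","name":"project"},'
    '"list":[{"ACode":"cp1","AName":"Product1",'
    '"BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},'
    '{"ACode":"cp2","AName":"Product2",'
    '"BList":[{"BCode":"gn1","BName":"Feature1"}]}]}'
)

# One output row per (list element, BList element) pair, with the base
# fields repeated: the two nested explodes.
rows = [
    (doc["base"]["code"], doc["base"]["name"],
     a["ACode"], a["AName"], b["BCode"], b["BName"])
    for a in doc["list"]
    for b in a["BList"]
]
for r in rows:
    print(r)
# ('xm', 'project', 'cp1', 'Product1', 'gn1', 'Feature1')
# ('xm', 'project', 'cp1', 'Product1', 'gn2', 'Feature2')
# ('xm', 'project', 'cp2', 'Product2', 'gn1', 'Feature1')
```

The six-row Cartesian product in the first attempt came from exploding the A-level and B-level arrays independently (2 x 3) instead of pairing each BList with its own parent, which is what chaining the lateral views fixes.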

Flatten BigQuery nested field contents into new columns instead of rows

I have some BigQuery data in the following format:
"thing": [
{
"name": "gameLost",
"params": [
{
"key": "total_games",
"val": {
"str_val": "3",
"int_val": null
}
},
{
"key": "games_won",
"val": {
"str_val": "2",
"int_val": null
}
},
{
"key": "game_time",
"val": {
"str_val": "44",
"int_val": null
}
}
],
"dt_a": "1470625311138000",
"dt_b": "1470620345566000"
}
I'm aware of the FLATTEN() function that will result in an output of 3 rows like so:
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
| thing.name | thing.dt_a       | thing.dt_b       | thing.params.key   | thing.params.val.str_val | thing.params.val.int_val |
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
| gameLost   | 1470625311138000 | 1470620345566000 | total_games_played | 3                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | games_won          | 2                        | null                     |
| gameLost   | 1470625311138000 | 1470620345566000 | game_time          | 44                       | null                     |
+------------+------------------+------------------+--------------------+--------------------------+--------------------------+
where the higher level keys/values get repeated into new rows for each deeper level object.
However, I need to output the deeper key/values as entirely new columns, and not repeat fields so the results would look like this:
+------------+------------------+------------------+--------------------+-----------+-----------+
| thing.name | thing.dt_a       | thing.dt_b       | total_games_played | games_won | game_time |
+------------+------------------+------------------+--------------------+-----------+-----------+
| gameLost   | 1470625311138000 | 1470620345566000 | 3                  | 2         | 44        |
+------------+------------------+------------------+--------------------+-----------+-----------+
How can I do this? Thanks!
Standard SQL makes this easier to express (uncheck "Use Legacy SQL" under "Show Options"):
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
thing.params[OFFSET(0)].val.str_val AS total_games_played,
thing.params[OFFSET(1)].val.str_val AS games_won,
thing.params[OFFSET(2)].val.str_val AS game_time
FROM T;
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| thing | total_games_played | games_won | game_time |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
| {"name":"gameLost","dt_a":"1470625311138000","dt_b":"1470620345566000"} | 3 | 2 | 44 |
+-------------------------------------------------------------------------+--------------------+-----------+-----------+
If you don't know the order of the keys in the array, you can use subselects to extract the relevant values:
WITH T AS (
SELECT STRUCT(
"gameLost" AS name,
ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
STRUCT("total_games", STRUCT("3", NULL)),
STRUCT("games_won", STRUCT("2", NULL)),
STRUCT("game_time", STRUCT("44", NULL))] AS params,
1470625311138000 AS dt_a,
1470620345566000 AS dt_b) AS thing
)
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "total_games") AS total_games_played,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "games_won") AS games_won,
(SELECT val.str_val FROM UNNEST(thing.params) WHERE key = "game_time") AS game_time
FROM T;
Try below (Legacy SQL)
SELECT
  thing.name AS name,
  thing.dt_a AS dt_a,
  thing.dt_b AS dt_b,
  MAX(IF(thing.params.key = "total_games_played", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS total_games_played,
  MAX(IF(thing.params.key = "games_won", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS games_won,
  MAX(IF(thing.params.key = "game_time", INTEGER(thing.params.val.str_val), 0)) WITHIN RECORD AS game_time
FROM YourTable
For Standard SQL you can try the following (inspired by Elliott's answer; the important difference is that the array is ordered by key, so the order of key values is guaranteed):
WITH Temp AS (
SELECT
(SELECT AS STRUCT thing.* EXCEPT (params)) AS thing,
ARRAY(SELECT val.str_val AS val FROM UNNEST(thing.params) ORDER BY key) AS params
FROM YourTable
)
SELECT
thing,
params[OFFSET(2)] AS total_games_played,
params[OFFSET(1)] AS games_won,
params[OFFSET(0)] AS game_time
FROM Temp
Note: if you have other keys in params, you should add a WHERE clause to the SELECT inside ARRAY.
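For completeness, the pivot-by-key idea that both Standard SQL answers rely on (one subselect per key, turning array elements into columns) can be sketched in Python over the struct from the question:

```python
# The "thing" record from the question, as a plain dict.
thing = {
    "name": "gameLost",
    "params": [
        {"key": "total_games", "val": {"str_val": "3", "int_val": None}},
        {"key": "games_won", "val": {"str_val": "2", "int_val": None}},
        {"key": "game_time", "val": {"str_val": "44", "int_val": None}},
    ],
    "dt_a": 1470625311138000,
    "dt_b": 1470620345566000,
}

def param(thing, key):
    """Look a param up by key, like (SELECT val.str_val FROM UNNEST(thing.params) WHERE key = ...)."""
    return next((p["val"]["str_val"] for p in thing["params"] if p["key"] == key), None)

# One output row, with one column per key instead of one row per param.
row = (thing["name"], thing["dt_a"], thing["dt_b"],
       param(thing, "total_games"), param(thing, "games_won"), param(thing, "game_time"))
print(row)  # ('gameLost', 1470625311138000, 1470620345566000, '3', '2', '44')
```

Like the subselect variant, this works regardless of the order of the keys in the array, whereas the OFFSET-based queries depend on a known or enforced ordering.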