Getting the index of a JSON array element using SQL in Snowflake

I'm flattening JSON data in Snowflake using LATERAL FLATTEN.
I have the JSON data as follows:
{
  "Fruits": [
    {
      "Apple_Type": "Type_A",
      "Banana_Type": "Type_B"
    },
    {
      "Apple_Type": "Type_A2",
      "Banana_Type": "Type_B3"
    }
  ]
}
I used the following query to get the flattened data:
SELECT v.value:Apple_Type,
v.value:Banana_Type
FROM Table1, LATERAL FLATTEN(input => Fruits) v
My result:
--------------------------------
| Apple_Type | Banana_Type |
--------------------------------
| Type_A | Type_B |
| Type_A2 | Type_B3 |
--------------------------------
How do I get the index of each element? I want the table as follows:
----------------------------------------------
| Apple_Type | Banana_Type | Index |
----------------------------------------------
| Type_A | Type_B | 0 | -> because this row comes from index 0 of the Fruits array
| Type_A2 | Type_B3 | 1 | -> because this row comes from index 1 of the Fruits array
----------------------------------------------

Use the INDEX column of the FLATTEN output. From the FLATTEN documentation:
INDEX
The index of the element, if it is an array; otherwise NULL.
SELECT v.value:Apple_Type,
v.value:Banana_Type,
v.index
FROM Table1, LATERAL FLATTEN(input => Fruits) v
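To try the same idea without a Snowflake account, here is a minimal sketch. It replaces Snowflake's LATERAL FLATTEN with SQLite's json_each table-valued function, run through Python's built-in sqlite3 module, so this is not Snowflake syntax. The column src is a hypothetical stand-in for wherever the JSON document is stored. For array elements, json_each's key column plays the role of FLATTEN's INDEX. The sketch assumes a SQLite build with the JSON functions available (built in since SQLite 3.38).

```python
import json
import sqlite3

# Sample document from the question, with the values quoted so it is valid JSON.
doc = {
    "Fruits": [
        {"Apple_Type": "Type_A", "Banana_Type": "Type_B"},
        {"Apple_Type": "Type_A2", "Banana_Type": "Type_B3"},
    ]
}

conn = sqlite3.connect(":memory:")
# Hypothetical table layout: one TEXT column, src, holding the whole document.
conn.execute("CREATE TABLE Table1 (src TEXT)")
conn.execute("INSERT INTO Table1 (src) VALUES (?)", (json.dumps(doc),))

# json_each stands in for LATERAL FLATTEN: one output row per array element.
# For arrays, its key column is the 0-based element index, like FLATTEN's INDEX.
rows = conn.execute(
    """
    SELECT json_extract(v.value, '$.Apple_Type')  AS apple_type,
           json_extract(v.value, '$.Banana_Type') AS banana_type,
           v.key                                  AS idx
    FROM Table1, json_each(Table1.src, '$.Fruits') AS v
    ORDER BY v.key
    """
).fetchall()

for apple_type, banana_type, idx in rows:
    print(apple_type, banana_type, idx)
```

As in the Snowflake query, the index is 0-based and follows the element's position in the array, not the order of keys inside each element.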


Update of value in array of jsonb returns error "invalid input syntax for type json"

I have a column of type jsonb which contains JSON arrays of the form:
[
{
"Id": 62497,
"Text": "BlaBla"
}
]
I'd like to update each Id to the value of the column word_id (type uuid) from a different table, word.
I tried this:
update inflection_copy
SET inflectionlinks = s.json_array
FROM (
SELECT jsonb_agg(
CASE
WHEN elems->>'Id' = (
SELECT word_copy.id::text
from word_copy
where word_copy.id::text = elems->>'Id'
) THEN jsonb_set(
elems,
'{Id}'::text [],
(
SELECT jsonb(word_copy.word_id::text)
from word_copy
where word_copy.id::text = elems->>'Id'
)
)
ELSE elems
END
) as json_array
FROM inflection_copy,
jsonb_array_elements(inflectionlinks) elems
) s;
So far I always get the following error:
invalid input syntax for type json
DETAIL: Token "c66a4353" is invalid.
CONTEXT: JSON data, line 1: c66a4353...
The token c66a4353 is the beginning of one of the UUIDs in the word table. I don't understand why it is marked as invalid input.
EDIT:
To give an example of some of the UUIDs:
select to_jsonb(word_id::text) from word_copy limit(5);
returns:
+----------------------------------------+
| to_jsonb |
|----------------------------------------|
| "078c979d-e479-4fce-b27c-d14087f467c2" |
| "ef288256-1599-4f0f-a932-aad85d666c9a" |
| "d1d95b60-623e-47cf-b770-de46b01042c5" |
| "f97464c6-b872-4be8-9d9d-83c0102fb26a" |
| "9bb19719-e014-4286-a2d1-4c0cf7f089fc" |
+----------------------------------------+
As requested, the corresponding id and word_id columns from the word table:
+---------------------------------------------------+
| row |
|---------------------------------------------------|
| ('27733', '078c979d-e479-4fce-b27c-d14087f467c2') |
| ('72337', 'ef288256-1599-4f0f-a932-aad85d666c9a') |
| ('72340', 'd1d95b60-623e-47cf-b770-de46b01042c5') |
| ('27741', 'f97464c6-b872-4be8-9d9d-83c0102fb26a') |
| ('72338', '9bb19719-e014-4286-a2d1-4c0cf7f089fc') |
+---------------------------------------------------+
+----------------+----------+----------------------------+
| Column | Type | Modifiers |
|----------------+----------+----------------------------|
| id | bigint | |
| value | text | |
| homonymnumber | smallint | |
| pronounciation | text | |
| audio | text | |
| level | integer | |
| alpha | bigint | |
| frequency | bigint | |
| hanja | text | |
| typeeng | text | |
| typekr | text | |
| word_id | uuid | default gen_random_uuid() |
+----------------+----------+----------------------------+
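The error comes from jsonb(word_copy.word_id::text). That is a cast from text to jsonb, which parses the text as a JSON document. A bare UUID is not valid JSON because it would need to be a quoted string. to_jsonb() instead turns the text into a JSON string value. I would suggest modifying your subquery as follows: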
update inflection_copy AS ic
SET inflectionlinks = COALESCE(
    -- correlated subquery: rebuild only this row's array, so rows don't all receive one combined array
    (SELECT jsonb_agg(CASE WHEN wc.word_id IS NULL THEN e.elems
                           ELSE jsonb_set(e.elems, array['Id'], to_jsonb(wc.word_id::text))
                      END
                      ORDER BY e.id ASC)
     FROM jsonb_path_query(ic.inflectionlinks, '$[*]') WITH ORDINALITY AS e(elems, id)
     LEFT JOIN word_copy AS wc
            ON wc.id::text = e.elems->>'Id'),
    -- an empty array yields no elements, so jsonb_agg returns NULL: keep the original value
    ic.inflectionlinks
);
The LEFT JOIN returns wc.word_id = NULL when no wc.id corresponds to e.elems->>'Id', so the CASE leaves e.elems unchanged.
The ORDER BY clause inside the aggregate function jsonb_agg keeps the elements of the jsonb array in their original order.
jsonb_path_query is used instead of jsonb_array_elements so that no error is raised when ic.inflectionlinks is not a JSON array. The path is evaluated in lax mode, which is the default.
See the test result in dbfiddle.
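The difference between parsing text as JSON and encoding text as a JSON string is easy to see outside PostgreSQL. This sketch uses Python's json module as an analogy only:

- json.loads stands in for the text-to-jsonb cast.
- json.dumps stands in for to_jsonb().
- The list rebuild mirrors what the LEFT JOIN, CASE and ORDER BY do.

The id/word_id pairs are copied from the question; the links array is a hypothetical sample.

```python
import json

# id -> word_id pairs copied from the word table rows shown in the question.
word_ids = {
    27733: "078c979d-e479-4fce-b27c-d14087f467c2",
    72337: "ef288256-1599-4f0f-a932-aad85d666c9a",
}

# Parsing a bare UUID as JSON fails, much like casting the text to jsonb.
try:
    json.loads(word_ids[27733])
    parses_as_json = True
except json.JSONDecodeError:
    parses_as_json = False

# Encoding the text yields a quoted JSON string, much like to_jsonb(text).
encoded = json.dumps(word_ids[27733])

# Hypothetical inflectionlinks array: the first Id has a match, the second does not.
links = [{"Id": 27733, "Text": "BlaBla"}, {"Id": 62497, "Text": "BlaBla"}]

# Replace matched Ids, keep unmatched elements as they are, and preserve the order.
rebuilt = [
    {**elem, "Id": word_ids[elem["Id"]]} if elem.get("Id") in word_ids else elem
    for elem in links
]

print(parses_as_json)
print(encoded)
print(json.dumps(rebuilt))
```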

Nesting jsonb in postgres without converting to jsonb[]

I have one table with two columns: an index column that holds the group number, and a payload column of jsonb data:
| Index | payload |
|----------------|----------------|
| 1 | {jsonb} |
| 1 | {jsonb} |
| 2 | {jsonb} |
| 2 | {jsonb} |
I then want to nest the payloads of each group into another jsonb value, but it must not be an array.
Expected Output:
| Index | payload |
|----------------|----------------|
| 1 |{{jsonb},{jsonb}}|
| 2 |{{jsonb},{jsonb}}|
Actual Output:
| Index | payload |
|----------------|----------------|
| 1 |[{jsonb},{jsonb}]|
| 2 |[{jsonb},{jsonb}]|
SELECT index, jsonb_agg(payload) as "payload"
FROM table1
GROUP BY 1
ORDER BY 1
As you can see, the query does aggregate the payloads into a jsonb value, but it also wraps them in an array. Is it possible to avoid the array?
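You can create your own aggregate that merges the JSONB values with jsonb_concat, the function behind the || operator. When two objects share a key, || keeps the value from the right-hand operand, so within a group a later payload overwrites an earlier one: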
create aggregate jsonb_append_agg(jsonb)
(
  sfunc = jsonb_concat,  -- merge the running state with the next payload
  stype = jsonb
);
Then you can do:
SELECT index, jsonb_append_agg(payload) as "payload"
FROM table1
GROUP BY 1
ORDER BY 1
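Here is a minimal Python sketch of what this aggregate computes per group. It assumes every payload is a JSON object; for arrays or scalars, || behaves differently. The rows are hypothetical. jsonb_concat here is a plain Python function named after the PostgreSQL one, not a binding to it. Because a later key overwrites an earlier one, the result depends on row order. In PostgreSQL you can make that order explicit by adding an ORDER BY clause inside the aggregate call.

```python
# Hypothetical (index, payload) rows; every payload is assumed to be a JSON object.
rows = [
    (1, {"a": 1, "b": 2}),
    (1, {"b": 20, "c": 3}),
    (2, {"x": 1}),
    (2, {"y": 0}),
]


def jsonb_concat(state, value):
    """Top-level merge of two objects; on duplicate keys the right-hand value wins."""
    return {**state, **value}


# Like an aggregate whose strict transition function has no initial condition:
# the first payload of a group becomes the state, later payloads are merged in.
merged = {}
for index, payload in rows:
    merged[index] = jsonb_concat(merged[index], payload) if index in merged else payload

for index in sorted(merged):
    print(index, merged[index])
```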

Aggregate JSON object's own key value attributes in Athena using OpenX SerDe

I have a JSON structure that looks similar to these two example events:
Event 1
{
  "event": {
    "type": "FooBarEvent",
    "kv": {
      "key1": "value1",
      "key2": "value2",
      "3": "three",
      "d": "4"
    }
  }
}
Event 2
{
  "event": {
    "type": "FooBarEvent",
    "kv": {
      "key1": "value1",
      "key2": "value2000",
      "e": "4"
    }
  }
}
Note that I do not know up front which keys and values will come in, and I'd like to aggregate (count) them. The output for the two events would look as follows:
+-----------+------+-----------+--------+
| EventType | Key | Value | Amount |
+-----------+------+-----------+--------+
| Foobar | key1 | value1 | 2 |
+-----------+------+-----------+--------+
| Foobar | key2 | value2 | 1 |
+-----------+------+-----------+--------+
| Foobar | key2 | value2000 | 1 |
+-----------+------+-----------+--------+
| Foobar | 3 | three | 1 |
+-----------+------+-----------+--------+
| Foobar | d | 4 | 1 |
+-----------+------+-----------+--------+
| Foobar | e | 4 | 1 |
+-----------+------+-----------+--------+
Is there a way to accomplish this in Athena without changing the JSON structure? How do I best map and flatten/query the structure?
This should work by casting kv to a map and expanding it with UNNEST. The following query assumes your data is stored in a table called json_data, with each raw JSON document in a column called json_field:
with data_formated as
(
select *
,json_extract_scalar(json_field,'$.event.type') event_type
,cast(json_extract(json_field,'$.event.kv') as map(varchar,varchar)) key_value
from json_data
)
,unnesting_data as
(
select *
from data_formated
cross join unnest(key_value) as t (k,v)
)
select event_type,k,v,count(1) amount
from unnesting_data
group by 1,2,3
order by 1,2,3;
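The two CTEs above do two things. First, they turn each kv object into (key, value) pairs, one row per pair. Then the final query counts identical (event type, key, value) combinations. Here is a small Python sketch of that logic, for illustration only (it is not Athena code). It uses the two sample events from the question, with the missing commas added.

```python
import json
from collections import Counter

# The two sample events from the question (valid JSON).
events = [
    '{"event": {"type": "FooBarEvent", "kv": {"key1": "value1", "key2": "value2", "3": "three", "d": "4"}}}',
    '{"event": {"type": "FooBarEvent", "kv": {"key1": "value1", "key2": "value2000", "e": "4"}}}',
]

counts = Counter()
for raw in events:
    event = json.loads(raw)["event"]
    # Like CROSS JOIN UNNEST(map): one (k, v) row per entry of the kv map.
    for k, v in event["kv"].items():
        # Like GROUP BY event_type, k, v with count(1).
        counts[(event["type"], k, v)] += 1

for (event_type, k, v), amount in sorted(counts.items()):
    print(event_type, k, v, amount)
```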

PostgreSQL: select rows by OR condition, including going through a JSON array

I have a table created by the following query:
create table data
(
id integer not null unique,
owner text,
users jsonb not null
);
The table looks like this:
+----+-------+---------------------------------------------+
| id | owner | users |
+----+-------+---------------------------------------------+
| 1 | alice | [] |
| 2 | bob | [{"accountId": "alice", "role": "manager"}] |
| 3 | john | [{"accountId": "bob", "role": "guest"}] |
+----+-------+---------------------------------------------+
I need to get rows 1 and 2 on behalf of Alice.
Getting owner-based rows works perfectly:
SELECT *
FROM data
WHERE owner = 'alice'
Getting jsonb-based rows is a little trickier, though manageable:
SELECT *
FROM data, jsonb_array_elements(users) x
WHERE (x ->> 'accountId') = 'alice'
But combining the two conditions returns only the jsonb-based rows:
SELECT *
FROM data, jsonb_array_elements(users) x
WHERE owner = 'alice' OR (x ->> 'accountId') = 'alice'
How do I get a result that looks like the following?
+----+-------+---------------------------------------------+
| id | owner | users |
+----+-------+---------------------------------------------+
| 1 | alice | [] |
| 2 | bob | [{"accountId": "alice", "role": "manager"}] |
+----+-------+---------------------------------------------+
Even better would be a result that looks like this:
+----+----------+
| id | role |
+----+----------+
| 1 | owner |
| 2 | manager |
+----+----------+
The problem is the empty JSON array. When data is cross joined with jsonb_array_elements(), a row whose array is empty produces no output rows, so it disappears from the result set. Instead, you can use a left join lateral:
select d.*
from data d
left join lateral jsonb_array_elements(d.users) as x(js) on 1 = 1
where 'alice' in (d.owner, x.js ->> 'accountId')
Note that if your array always contains 0 or 1 elements, you don't need the lateral join. Your query could be phrased more simply as:
select d.*
from data d
where 'alice' in (d.owner, d.users -> 0 ->> 'accountId')
Demo on DB Fiddle - both queries return:
id | owner | users
-: | :---- | :------------------------------------------
1 | alice | []
2 | bob | [{"role": "manager", "accountId": "alice"}]
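Plain Python lists show why the cross join drops Alice's own row. This sketch covers the row semantics only and is not PostgreSQL code. cross_join, left_join and matches are hypothetical helpers. The roles list shows one possible way to derive the second requested output (owner vs. role from the array) from the same left-joined rows.

```python
# The three rows from the question's data table (users arrays as Python lists).
rows = [
    {"id": 1, "owner": "alice", "users": []},
    {"id": 2, "owner": "bob", "users": [{"accountId": "alice", "role": "manager"}]},
    {"id": 3, "owner": "john", "users": [{"accountId": "bob", "role": "guest"}]},
]


def cross_join(rows):
    """Comma/CROSS JOIN with jsonb_array_elements: one pair per array element."""
    return [(row, elem) for row in rows for elem in row["users"]]


def left_join(rows):
    """LEFT JOIN LATERAL ... ON 1 = 1: an empty array still yields one pair, with None."""
    return [(row, elem) for row in rows for elem in (row["users"] or [None])]


def matches(row, elem, who):
    """WHERE who IN (d.owner, x.js ->> 'accountId'); a missing element acts like NULL."""
    account = elem.get("accountId") if elem is not None else None
    return who in (row["owner"], account)


cross_ids = [row["id"] for row, elem in cross_join(rows) if matches(row, elem, "alice")]
left_ids = [row["id"] for row, elem in left_join(rows) if matches(row, elem, "alice")]

# Second requested output: "owner" for Alice's own rows, otherwise her role in the array.
roles = [
    (row["id"], "owner" if row["owner"] == "alice" else elem["role"])
    for row, elem in left_join(rows)
    if matches(row, elem, "alice")
]

print(cross_ids)
print(left_ids)
print(roles)
```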

Can I sum an array of jsonb in PostgreSQL with dynamic keys in a select statement?

I have a jsonb array in Postgres:
[{"a": 1, "b":5}, {"a":2, "c":3}]
I would like to get an aggregate sum per unique key:
{"a":3, "b":5, "c":3}
The keys are unpredictable.
Is it possible to do this in Postgres with a select statement?
Query:
SELECT key, SUM(value::INTEGER)
FROM (
SELECT (JSONB_EACH_TEXT(j)).*
FROM JSONB_ARRAY_ELEMENTS('[{"a": 1, "b":5}, {"a":2, "c":3}]') j
) j
GROUP BY key
ORDER BY key;
Results:
| key | sum |
| --- | --- |
| a | 3 |
| b | 5 |
| c | 3 |
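The question asked for a single object such as {"a":3, "b":5, "c":3}, while the query above returns one row per key. Here is a Python sketch of the same computation that folds the per-key sums back into one object. It only illustrates the logic. In PostgreSQL, an outer query applying jsonb_object_agg(key, sum) to the grouped rows should give the equivalent single jsonb value.

```python
import json
from collections import Counter

doc = '[{"a": 1, "b":5}, {"a":2, "c":3}]'

totals = Counter()
for obj in json.loads(doc):            # like jsonb_array_elements
    for key, value in obj.items():     # like jsonb_each_text
        totals[key] += int(value)      # like SUM(value::INTEGER) ... GROUP BY key

# Fold the grouped rows back into one object, ordered by key.
result = dict(sorted(totals.items()))
print(json.dumps(result))
```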