How to unnest bigquery field that is stored as a string? - google-bigquery

I am trying to unnest a field but something is wrong with my query.
Sample data in my table
'1234', '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }'
There are 2 fields in the dataset: row_id and parts, where parts is a dictionary object with list items (categories) in it but datatype of a parts is string. I would like the output to be individual row for each category.
This is what I have tried but I am not getting any result back.
#standardSQL
with t as (
select "1234" as row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }' as parts
)
select row_id, _categories
from t,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(parts, '$.items'), r'"categories":"(.+?)"')) _categories
expected result
id, _categories
1234, cat1
1234, cat2
1234, cat3

Below is for BigQuery Standard SQL
#standardSQL
WITH t AS (
SELECT "1234" AS row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }' AS parts
)
SELECT row_id, REPLACE(_categories, '"', '') _categories
FROM t, UNNEST(SPLIT(REGEXP_EXTRACT(
JSON_EXTRACT(parts, '$.items'),
r'"categories":\[(.+?)]'))
) _categories
and produces expected result
Row row_id _categories
1 1234 cat1
2 1234 cat2
3 1234 cat3
Update
Above solution was mostly focused on fixing regexp used in extract - but not addressed more generic case of having multiple products. Below solution addresses such more generic case
#standardSQL
WITH t AS (
SELECT "1234" AS row_id, '''{ "id" : "123" , "items" : [
{ "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }},
{ "quantity" : 2 , "product" : { "id" : "p2" , "categories" : [ "cat4","cat5","cat6"] }}
] }''' AS parts
)
SELECT row_id, REPLACE(category, '"', '') category
FROM t, UNNEST(REGEXP_EXTRACT_ALL(parts, r'"categories" : \[(.+?)]')) categories,
UNNEST(SPLIT(categories)) category
with result
Row row_id category
1 1234 cat1
2 1234 cat2
3 1234 cat3
4 1234 cat4
5 1234 cat5
6 1234 cat6

Related

How to group multiple values to only two groups?

So, I have 2 tables.
Type table
id
Name
1.
General
2.
Mostly Used
3.
Low
Component table
id
Name
typeId
1.
Component 1
1
2.
Component 2
1
4.
Component 4
2
6.
Component 6
2
7.
Component 5
3
There can be numerous types but I want to get only 'General' and 'Others' as types along with the component as follows:
[{
"General": [{
"id": "1",
"name": "General",
"component": [{
"id": 1,
"name": "component 1",
"componentTypeId": 1
}, {
"id": 2,
"name": "component 2",
"componentTypeId": 1
}]
}],
"Others": [{
"id": "2",
"name": "Mostly Used",
"component": [{
"id": 4,
"name": "component 4",
"componentTypeId": 2
}, {
"id": 6,
"name": "component 6",
"componentTypeId": 2
}]
},
{
"id": "3",
"name": "Low",
"component": [{
"id": 7,
"name": "component 5",
"componentTypeId": 3
}]
}
]
}]
WITH CTE_TYPES AS (
SELECT
CASE WHEN t. "name" <> 'General' THEN
'Others'
ELSE
'General'
END AS TYPE,
t.id,
t.name
FROM
type AS t
GROUP BY
TYPE,
t.id
),
CTE_COMPONENT AS (
SELECT
c.id,
c.name,
c.typeid
FROM
component c
)
SELECT
JSON_AGG(jsonb_build_object ('id', CT.id, 'name', CT.name, 'type', CT.type, 'component', CC))
FROM
CTE_COMPONENTTYPES CT
INNER JOIN CTE_COMPONENT CC ON CT.id = CC.tradingplancomponenttypeid
GROUP BY
CT.type
I get 2 types from the query as I expected but the components are not grouped together
Can you also point to resources to learn advanced SQL queries?
Here after is a solution to get your expected result as specified in your question :
First part
The first part of the query aggregates all the components with the same TypeId into a jsonb array. It also calculates the new type column with the value 'Others' for all the type names different from General or with the value 'General' :
SELECT CASE WHEN t.name <> 'General' THEN 'Others' ELSE 'General' END AS type
, t.id, t.name
, jsonb_build_object('id', t.id, 'name', t.name, 'component', jsonb_agg(jsonb_build_object('id', c.id, 'name', c.name, 'componentTypeId', c.typeid))) AS list
FROM component AS c
INNER JOIN type AS t
ON t.id = c.typeid
GROUP BY t.id, t.name
jsonb_build_object builds a jsonb object from a set of key/value arguments
jsonb_agg aggregates jsonb objects into a single jsonb array.
Second part
The second part of the query is much more complex because of the structure of your expected result where you want to nest the types which are different from General with their components inside each other according to the TypeId order, ie Low type with TypeId = 3 is nested inside Mostly Used type with TypeId = 2 :
{ "id": "2",
, "name": "Mostly Used"
, "component": [ { "id": 4
, "name": "component 4"
, "componentTypeId": 2
}
, { ... }
, { "id": "3"
, "name": "Low" --> 'Low' type is nested inside 'Mostly Used' type
, "component": [ { "id": 7
, "name": "component 5"
, "componentTypeId": 3
}
, { ... }
]
}
]
}
To do such a nested structure with a random number of TypeId, you could create a recursive query, but I prefer here to create a user-defined aggregate function which will make the query much more simple and readable, see the manual. The aggregate function jsonb_set_inv_agg is based on the user-defined function jsonb_set_inv which inserts the jsonb object x inside the existing jsonb object z according to the path p. This function is based on the jsonb_set standard function :
CREATE OR REPLACE FUNCTION jsonb_set_inv(x jsonb, p text[], z jsonb, b boolean)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
$$
SELECT jsonb_set(z, p, COALESCE(z#>p || x, z#>p), b) ;
$$ ;
CREATE AGGREGATE jsonb_set_inv_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set_inv
, stype = jsonb
) ;
Based on the newly created aggregate function jsonb_set_inv_agg and the jsonb_agg and jsonb_build_object standard functions already seen above, the final query is :
SELECT jsonb_agg(jsonb_build_object('General', x.list)) FILTER (WHERE x.type = 'General')
|| jsonb_build_object('Others', jsonb_set_inv_agg('{component}', x.list, true ORDER BY x.id DESC) FILTER (WHERE x.type = 'Others'))
FROM
( SELECT CASE WHEN t.name <> 'General' THEN 'Others' ELSE 'General' END AS type
, t.id, t.name
, jsonb_build_object('id', t.id, 'name', t.name, 'component', jsonb_agg(jsonb_build_object('id', c.id, 'name', c.name, 'componentTypeId', c.typeid))) AS list
FROM component AS c
INNER JOIN type AS t
ON t.id = c.typeid
GROUP BY t.id, t.name
) AS x
see the full test result in dbfiddle.

SQL query to match and return all occurrences of search string

I have a json document in a column (record) with a table (TABLE) as below. Need to write a SQL query to bring all occurrences of values of fields "a", "b", 'k" within aaagroup.
Result should be:
NAME1 age1 comment1
NAME2 age2
NAME3 comment3
JSON data:
{
"reportfile": {
"aaa": {
"aaagroup": [{
"a": "NAME1",
"b": "age1",
"k": "comment1"
},
{
"a": "NAME2",
"b": "age2"
},
{
"a": "NAME3",
"k": "comment3"
}]
},
"dsa": {
"dsagroup": [{
"j": "Name"
},
{
"j": "Title"
}]
}
}
}
I used the below query for a single occurrence:
Data:
{"reportfile":{"aaa":{"aaagroup":[{"a":"NAME1","k":"age1}]},"dsa":{"dsagroup":[{"j":"USERNAME"}],"l":"1","m":"1"}}}
Query:
select
substr(cc.BUS_NME, 1, strpos(cc.BUS_NME,'"')-1) as BUS_NME,
substr(cc.AGE, 1, strpos(cc.AGE,'"')-1) as AGE
from
(substr(bb.aaa,strpos(bb.aaa,'"a":"')+5) as BUS_NME,
substr(bb.aaa,strpos(bb.aaa,'"k":"')+5) as AGE
from
(substr(aa.G, strpos(aa.G,'"aaagroup'),strpos(aa.G,'},')) as aaa
from
(select substr(record, strpos(record,'"aaagroup')) as G
from TABLE) aa) bb) cc
ush rani – If I am getting your question correctly, you will have a external table like this and you can try below query to get the desire result from external table
sample external table:
CREATE EXTERNAL TABLE Ext_JSON_data(
reportfile string
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
LOCATION
's3://bucket/folder/'
Query to fetch desire result:
WITH the_table AS (
SELECT CAST(social AS MAP(VARCHAR, JSON)) AS social_data
FROM (
VALUES
(JSON '{"aaa": {"aaagroup": [{"a": "NAME1","b": "age1","k": "comment1"},{"a": "NAME2","b": "age2"},{"a": "NAME3","k": "comment3"}]},"dsa": {"dsagroup": [{"j": "Name"},{"j": "Title"}]}}')
) AS t (social)
),
cte_first_level as
(
SELECT
first_level_key
,CAST(first_level_value AS MAP(VARCHAR, JSON))As first_level_value
FROM the_table
CROSS JOIN UNNEST (social_data) AS t (first_level_key, first_level_value)
),
cte_second_level as
(
Select
first_level_key
,SECOND_level_key
,SECOND_level_value
from
cte_first_level
CROSS JOIN UNNEST (first_level_value) AS t (SECOND_level_key, SECOND_level_value)
)
SELECT
first_level_key
,SECOND_level_key
,SECOND_level_value
,items
,items['a'] value_of_a
,items['b'] value_of_b
,items['k'] value_of_k
from
cte_second_level
cross join unnest(cast(json_extract(SECOND_level_value, '$') AS ARRAY<MAP<VARCHAR, VARCHAR>>)) t (items)
Query Output :

SQL GROUP BY and create Element for each GROUP

UPDATE:
I was able to solve this with INSERT INTO:
INSERT INTO newClass
FROM
SELECT newClassID AS id, newClassName AS name
FROM oldClass
GROUP BY newClassID
I'm using SQL with Orient DB version 2.2.
I want to group element of one class by attribute newClassID (there are multiple Elements in this class that share newClassID, lets say I have 20 Elements in the first class and 3 unique newClassID). Each group will have unique newClassIDand also newClassName as these attributes go hand in hand.
For each group I want to create an element of another Class (newClass), that will have the id = newClassID and name = newClassName. In the above case this will result to 3 new Elements in the newClass.
The Elelemt of the first class would look like:
Elem
newClassID
newClassName
1
id1
name1
2
id2
name2
3
id1
name1
4
id3
name3
5
id2
name2
And I wand to create 3 new elements in new Class:
Elem
ID
Name
1
id1
name1
2
id2
name2
3
id3
name3
With this query I get id and name as attributes:
BEGIN
let pattern = SELECT newClassID, newClassName
FROM oldClass GROUP BY newClassID
COMMIT
RETURN { 'id' : $pattern.newClassID, 'name' : $pattern.newClassName }
Result:
[
{
"#type": "d",
"#version": 0,
"name": [
"name1",
"name2",
"name3"
],
"id": [
"id1",
"id2",
"id3"
],
"#fieldTypes": "name=e,id=e"
}
]
Now I want to call this function (called GetNewClass()) and create for each entry in id a newClassElement. I can not use FOREACH as this is just available from version 3.x . To be honest: here I have no clue how to do this..
I tried:
BEGIN
INSERT INTO newClass FROM SELECT GetNewClass()
let newElements = SELECT FROM newClass
RETURN $newElements
But this gives me just one element with:
[
{
"#type": "d",
"#rid": "#41:1",
"#version": 1,
"#class": "newClass",
"GetProdID": {
"name": [
"name1",
"name2",
"name3"
],
"id": [
"id1",
"id2",
"id3"
]
}
}
]
Thanks a lot!

HIve Create Json Array that not contains duplicate

I want to create an array of jsons that not contain duplicate . I had used LATERAL VIEW EXPLODE to break the initial Array , and now i want to group the string json i received and create merged jsons based on a key.
For example if i have :
Col1 :
{"key" : ke , "value" : 1 }
{"key" : ke , "value" : 2 }
{"key" : ke1 , "value" : 5 }
I would like to have
{"key" : ke , "value" : 3 }
{"key" : ke1 , "value" : 5 }
Can you help me?
select concat('{"key":"',jt.key,'","value":',sum(jt.value),'}')
from mytable t
lateral view json_tuple(Col1, 'key', 'value') jt as key,value
group by jt.key
;

insert list into table: cassandra

How can i insert something like this in cassandra:
{
"custID" : '12736467',
"date" : '2013-06-10',
"orderID" : 19482065,
"amount" : 216.28,
"items" : [
{ "id" : 1
"sku" : 87482734,
"quantity" : 4
},
{ "id":2
"sku" : 32851042,
"quantity" : 2
}
]
}
I thought the best way to store this data in cassandra is to create one table with family column:
custId | date | orderID | amount | items
id | sku | quantity
12736467 2013-06-10 19482065 216.28 1 87482734 4
2 32851042 2
create table items {
id text,
sku int,
quantity int};
create table test {
custId text,
date bigint,
orderId text,
amount float,
items list<items >,
PRIMARY KEY ((custId))
};
but how can I do insert into this table without using update for each time (insert into test ...).
Instead of creating a table, it is better to create a User Defined Type.
Then use the following:
INSERT INTO items ("custId", date, "orderId", amount, items)
VALUES ( 'custid', 123456, 'orderid', 10.0,
[ { 'id' : '1', 'sku' : 12345, 'quantity' : 4 },
{ 'id' : '2', 'sku' : 12345, 'quantity' : 2 }]);