How to do an UNPIVOT on this json data? - sql

I commonly have json data that is stored in BigQuery that is a key-value mapping such as the following:
id product sales_data
1 socks {"US": {"Price": 2.99, "Currency": "USD"},
"CA": {"Price": 3.04, "Currency": "CAD"}}
What I want to do is two-fold:
First, push the 'keys' into a consistent values struct
Unnest the now-consistent data
For example:
# push_keys_to_value(field, path, renamed)
# push_keys_to_value(sales_data, '$', 'Country')
id product sales_data
1 socks [{"Price": 2.99, "Currency": "USD", "Country": "US"}, {"Price": 3.04, "Currency": "CAD", "Country": "CA"}]
Now unnested:
id product sales_data
1 socks {"Price": 2.99, "Currency": "USD", "Country": "US"}
1 socks {"Price": 3.04, "Currency": "CAD", "Country": "CA"}
This is a pretty common pattern I have -- taking string (json) data and 'un-pivoting' it. How could I do this in BigQuery, and is this a common pattern?

Consider the approach below:
select id, product, country,
json_extract_scalar(_, '$.Price') Price,
json_extract_scalar(_, '$.Currency') Currency
from (
select *, regexp_extract(sales_data, r'"' || country || '": ({.*?})') _
from your_table,
unnest(`bqutil.fn.json_extract_keys`(sales_data)) country
)
Applied to the sample data in your question, this yields one row per country key, with Price and Currency as separate columns.
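To make the two steps from the question concrete, here is a minimal sketch of the same reshaping outside BigQuery, in Python. The helper names push_keys_to_value and unpivot are hypothetical (the first borrows the name from the question's pseudocode) and are not BigQuery functions; the row is assumed to be a plain dict holding the JSON as a string.

```python
import json


def push_keys_to_value(mapping, renamed):
    """Turn {"US": {...}, "CA": {...}} into a list of dicts,
    copying each outer key into its value under the name `renamed`."""
    return [{**value, renamed: key} for key, value in mapping.items()]


def unpivot(row, field, renamed):
    """Return one output row per key of the JSON object stored in row[field]."""
    items = push_keys_to_value(json.loads(row[field]), renamed)
    # Keep every column except the JSON column itself.
    base = {k: v for k, v in row.items() if k != field}
    return [{**base, **item} for item in items]


# Sample row from the question (hypothetical in-memory representation).
row = {
    "id": 1,
    "product": "socks",
    "sales_data": '{"US": {"Price": 2.99, "Currency": "USD"}, '
                  '"CA": {"Price": 3.04, "Currency": "CAD"}}',
}
rows = unpivot(row, "sales_data", "Country")
for r in rows:
    print(r)
```

The first step corresponds to extracting the keys in the SQL above, and the second to the unnest over those keys.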

Related

Querying over PostgreSQL JSONB column

I have a table "blobs" with a column "metadata" in jsonb data-type,
Example:
{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
}
]
}
I want to identify statuses in the "metas" array that match with certain, given strings. I have tried the following so far but without results:
SELECT * FROM blobs
WHERE metadata is not null AND
(
SELECT count(*) FROM jsonb_array_elements(metadata->'metas') AS cn
WHERE cn->>'status' IN ('active','reported')
) > 0;
It would also be sufficient if I could compare the string with "status" in the first array object.
I am using PostgreSQL 9.6.24
For clarity, I usually break the code into a series of WITH statements. My idea for your problem would be to use a JSON path (https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH) and the function jsonb_path_query. Note that jsonb_path_query was added in PostgreSQL 12, so it is not available on 9.6.
The code below gives a list of counts; I will leave the rest to you to get the final data.
I've added an id column just to have something to join on; otherwise, join on metadata.
Also, note the additional " characters in the WHERE condition: the extracted status is still a JSON string when it is cast to character varying, so it keeps its double quotes. The left join in blob_ext is there just to get a null value if metadata is not present or the path does not match.
with blob as (
select row_number() over()"id", * from (VALUES
(
'{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
}
]}'::jsonb),
(null::jsonb)) b(metadata)
)
, blob_ext as (
select bb.*, blob_sts.status
from blob bb
left join (
select
bb2.id,
jsonb_path_query (bb2.metadata::jsonb, '$.items[*].metadata.metas[*].status'::jsonpath)::character varying "status"
FROM blob bb2
) as blob_sts ON
blob_sts.id = bb.id
)
select bbe.id, count(*) cnt, bbe.metadata
from blob_ext bbe
where bbe.status in ('"active"', '"reported"')
group by bbe.id, bbe.metadata;
Another way is to peel off one layer at a time with jsonb_extract_path() and jsonb_array_elements():
with cte_items as (
select id,
metadata,
jsonb_extract_path(jx.value,'metadata','metas') as metas
from blobs,
lateral jsonb_array_elements(jsonb_extract_path(metadata,'items')) as jx),
cte_metas as (
select id,
metadata,
jsonb_extract_path_text(s.value,'status') as status
from cte_items,
lateral jsonb_array_elements(metas) s)
select distinct
id,
metadata
from cte_metas
where status in ('active','reported');
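The layer-by-layer traversal that both SQL answers above perform can be sketched in Python as follows. This is only an illustration of the logic, not PostgreSQL code; the function name matching_statuses is hypothetical, and the sample document is a trimmed version of the one in the question.

```python
import json


def matching_statuses(metadata_json, wanted):
    """Collect every metas[*].status under items[*].metadata that is in `wanted`.

    A NULL column value (None here) simply yields no matches.
    """
    if metadata_json is None:
        return []
    doc = json.loads(metadata_json)
    statuses = []
    for item in doc.get("items", []):                           # first layer: items
        for meta in item.get("metadata", {}).get("metas", []):  # second layer: metas
            if meta.get("status") in wanted:
                statuses.append(meta["status"])
    return statuses


# Trimmed sample document (only the fields needed for the filter).
sample = json.dumps({
    "total_count": 2,
    "items": [{
        "name": "somename",
        "metadata": {"metas": [
            {"id": "11258", "status": "active"},
            {"id": "9251", "status": "active"},
        ]},
    }],
})
found = matching_statuses(sample, {"active", "reported"})
print(found)
```

A row qualifies when the returned list is non-empty, which mirrors the count(*) > 0 test in the question.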

Looping JSON Array in JSONB field

I want to loop over JSONB column and get certain values (price, discount_price, and currency) of relevant JSON objects to my filter. But I get this error:
syntax error at or near "FOR"
Value of the parts column which is JSONB:
[
{
"item_tags": ["black", "optional"],
"name": "Keyboard",
"price": 50,
"currency": "USD",
"discount_price": 40
},
{
"item_tags": ["white", "optional"],
"name": "Mouse",
"price": 40,
"currency": "USD",
"discount_price": 30
}
]
My query ($1 is the user input. Can be 'optional' or 'required'):
SELECT
id,
title,
FOR element IN SELECT * FROM jsonb_array_elements(parts)
LOOP
CASE
WHEN element->'item_tags' #> $1
THEN SELECT element->>'discount_price' AS price, element->>'currency' AS currency
ELSE SELECT element->>'price' AS price, element->>'currency' AS currency
END
END LOOP
FROM items;
This is the output I want to get if $1 is equal to 'optional':
{
"id": 1,
"title": "example title",
"parts": [
{
"name": "Keyboard",
"discount_price": 40,
"currency": "USD"
},
{
"name": "Mouse",
"discount_price": 30,
"currency": "USD"
}
]
}
Any help is highly appreciated. I have been following the official docs, but they are not beginner-friendly. I use PostgreSQL 13.
You need to unnest the array, filter out the unwanted parts, remove the unwanted key, then aggregate the changed parts back into a JSON array.
This can be done using a scalar sub-query:
select id, title,
(select jsonb_agg(x.part - 'item_tags')
from jsonb_array_elements(i.parts) as x(part)
where (x.part -> 'item_tags') ? 'optional')
from items i;
The expression x.part - 'item_tags' removes the item_tags key from the JSON object. The ? operator tests if the item_tags array contains the string on the right hand side. And jsonb_agg() then aggregates those JSON values back into an array.
You can pass your parameter in the place of the 'optional' string.
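The four steps (unnest, filter, remove the key, aggregate) can be illustrated with a small Python sketch. The function name filter_parts is hypothetical, and the parts value is assumed to be already parsed into a list of dicts.

```python
def filter_parts(parts, tag):
    """Keep the parts whose item_tags contain `tag`, and drop the item_tags key."""
    return [
        {k: v for k, v in part.items() if k != "item_tags"}   # like: part - 'item_tags'
        for part in parts                                      # like: jsonb_array_elements
        if tag in part.get("item_tags", [])                    # like: the ? operator
    ]


# The parts value from the question.
parts = [
    {"item_tags": ["black", "optional"], "name": "Keyboard",
     "price": 50, "currency": "USD", "discount_price": 40},
    {"item_tags": ["white", "optional"], "name": "Mouse",
     "price": 40, "currency": "USD", "discount_price": 30},
]
optional_parts = filter_parts(parts, "optional")
print(optional_parts)
```

One difference to keep in mind: jsonb_agg() returns NULL when no part matches, whereas this sketch returns an empty list.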

JSON database table query

I have a table with a JSON column containing some objects, and I am trying to query the amount value in each object:
{
"authorizations": [
{
"id": "d50",
"type": "passed",
"amount": 100,
"fortId": 5050,
"status": "GENERATED",
"voided": false,
"cardNumber": 3973,
"expireDate": null,
"description": "Success",
"customerCode": "858585",
"paymentMethod": "cash",
"changeDatetime": null,
"createDatetime": 000000000,
"reservationCode": "202020DD",
"authorizationCode": "D8787"
},
{
"id": "d50",
"type": "passed",
"amount": 100,
"fortId": 5050,
"status": "GENERATED",
"voided": false,
"cardNumber": 3973,
"expireDate": null,
"description": "Success",
"customerCode": "858585",
"paymentMethod": "cash",
"changeDatetime": null,
"createDatetime": 000000000,
"reservationCode": "202020DD",
"authorizationCode": "D8787"
}
],
}
I have tried the following four options, but none of these give me the value of the object:
SELECT info #> 'authorizations:[{amount}]'
FROM idv.reservations;
SELECT info -> 'authorizations:[{amount}]'
FROM idv.reservations;
info -> ''authorizations' ->> 'amount'
FROM idv.reservations
select (json_array_elements(info->'authorizations')->'amount')::int from idv.reservations
note I am using DBeaver
If you want one row per object contained in the "authorizations" JSON array, with the corresponding amount, you can use a lateral join and jsonb_array_elements():
select r.*, (x.obj ->> 'amount')::int as amount
from reservations r
cross join lateral jsonb_array_elements(r.info -> 'authorizations') x(obj)
We can also extract all amounts at once and put them in an array, like so:
select r.*,
jsonb_path_query_array(r.info, '$.authorizations[*].amount') as amounts
from reservations r

JSON Parsing in Snowflake - Square Brackets At Start

I'm trying to parse out some JSON files in snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
You can use two lateral flatten calls to process the values in the line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
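The reason two flatten calls are needed is that there are two nested arrays: the outer array of fulfillments and the line_items array inside each one. A rough Python equivalent, assuming the field has already been parsed into Python lists and dicts (the function name flatten_line_items is hypothetical):

```python
def flatten_line_items(fulfillments):
    """Yield every line item from every fulfillment (two nested loops,
    matching the two lateral flatten calls)."""
    for fulfillment in fulfillments:                        # first flatten: outer array
        for line_item in fulfillment.get("line_items", []): # second flatten: line_items
            yield line_item


# Trimmed version of the sample data from the question.
fulfillments = [
    {
        "id": 2191015870515,
        "line_items": [
            {
                "id": 5050604355635,
                "fulfillment_service": "gift_card",
                "gift_card": True,
                "name": "Gift Card - $100.00",
            }
        ],
    }
]
services = [item["fulfillment_service"] for item in flatten_line_items(fulfillments)]
print(services)
```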
Those square brackets indicate that you have an array of JSON objects in your FULFILLMENTS field. Unless there is a real need to have an array of objects in one field, you should have a look at the STRIP_OUTER_ARRAY property of the COPY command. An example from the Snowflake documentation:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
In case others are stuck with same data issue (all json data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'

How to convert JSON string column row into a queryable table

I have exported to BigQuery from Firestore a whole collection to perform certain queries on it.
After the data was populated in my BigQuery console, now I can query the whole set like this
SELECT *
FROM `myapp-1a602.firestore_orders.orders_raw_changelog`
LIMIT 1000
This statement returns several columns, but the one I'm interested in is the data column. It holds each document as a JSON string, and I need to query the values inside it.
Now, this is the data from one row
{
"cart": [{
"qty": 1,
"description": "Sprite 1 L",
"productName": "Sprite 1 Liter",
"price": 1.99,
"productId": 9
}],
"storeName": "My awesome shop",
"status": 5,
"timestamp": {
"_seconds": 1590713204,
"_nanoseconds": 916000000
}
}
This data is inside the data column, so if I do this
SELECT data
FROM `myapp-1a602.firestore_orders.orders_raw_changelog`
LIMIT 1000
I will get the JSON for each document, but I don't know how to query those values. Let's say I want to find all orders with status 5 and shopName My awesome shop. Do I need to do something with this JSON to convert it into a table? Do I need to run the query on the JSON itself?
How can I query this json output ?
Thanks
Do I need to do something with this JSON to convert it into a table? Do I need to run the query on the JSON itself?
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(data, cart_item),
JSON_EXTRACT(data, '$.status') AS status,
JSON_EXTRACT(data, '$.storeName') AS storeName,
JSON_EXTRACT(cart_item, '$.qty') AS qty,
JSON_EXTRACT(cart_item, '$.description') AS description,
JSON_EXTRACT(cart_item, '$.productName') AS productName,
JSON_EXTRACT(cart_item, '$.price') AS price,
JSON_EXTRACT(cart_item, '$.productId') AS productId
FROM `project.dataset.table`,
UNNEST(JSON_EXTRACT_ARRAY(data, '$.cart')) cart_item
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 order_id, '''
{
"cart": [{
"qty": 1,
"description": "Sprite 1 L",
"productName": "Sprite 1 Liter",
"price": 1.99,
"productId": 9
},{
"qty": 2,
"description": "Fanta 1 L",
"productName": "Fanta 1 Liter",
"price": 1.99,
"productId": 10
}],
"storeName": "My awesome shop",
"status": 5,
"timestamp": {
"_seconds": 1590713204,
"_nanoseconds": 916000000
}
}
''' data
)
SELECT * EXCEPT(data, cart_item),
JSON_EXTRACT(data, '$.status') AS status,
JSON_EXTRACT(data, '$.storeName') AS storeName,
JSON_EXTRACT(cart_item, '$.qty') AS qty,
JSON_EXTRACT(cart_item, '$.description') AS description,
JSON_EXTRACT(cart_item, '$.productName') AS productName,
JSON_EXTRACT(cart_item, '$.price') AS price,
JSON_EXTRACT(cart_item, '$.productId') AS productId
FROM `project.dataset.table`,
UNNEST(JSON_EXTRACT_ARRAY(data, '$.cart')) cart_item
result is
Row  order_id  status  storeName          qty  description   productName       price  productId
1    1         5       "My awesome shop"  1    "Sprite 1 L"  "Sprite 1 Liter"  1.99   9
2    1         5       "My awesome shop"  2    "Fanta 1 L"   "Fanta 1 Liter"   1.99   10
You can work with the JSON functions like this:
CREATE TABLE products (id INTEGER, attribs_json JSON);
INSERT INTO products VALUES (1,'{
"cart": [{
"qty": 1,
"description": "Sprite 1 L",
"productName": "Sprite 1 Liter",
"price": 1.99,
"productId": 9
}],
"storeName": "My awesome shop",
"status": 5,
"timestamp": {
"_seconds": 1590713204,
"_nanoseconds": 916000000
}
}');
select * from products where attribs_json->"$.status"
= 5 AND attribs_json->"$.storeName"
= 'My awesome shop';
id | attribs_json
-: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | {"cart": [{"qty": 1, "price": 1.99, "productId": 9, "description": "Sprite 1 L", "productName": "Sprite 1 Liter"}], "status": 5, "storeName": "My awesome shop", "timestamp": {"_seconds": 1590713204, "_nanoseconds": 916000000}}
select attribs_json->"$.storeName",attribs_json->"$.status",attribs_json->"$.cart[0].qty" from products where attribs_json->"$.status"
= 5 AND attribs_json->"$.storeName"
= 'My awesome shop';
attribs_json->"$.storeName" | attribs_json->"$.status" | attribs_json->"$.cart[0].qty"
:-------------------------- | :----------------------- | :----------------------------
"My awesome shop" | 5 | 1
There is also JSON_EXTRACT in MySQL 5.7 and above.
In the end the JSON is only text, so you could also use REGEXP or RLIKE.
To turn the JSON back into rows, you can use JSON_TABLE.
What you must do is extract the values from the JSON data, like so:
SELECT .......
WHERE data->'$.storeName'= "My awesome shop" and data->'$.status' = 5
Extracting from the 'cart' or the 'timestamp' keys will give you a JSON value that needs further extraction to get at the data.
I hope it'll help you
You probably want to have a look at the MySql documentation (https://dev.mysql.com/doc/refman/8.0/en/json.html) or https://www.mysqltutorial.org/mysql-json/.
You can use UNNEST in the FROM clause to access the cart's columns, and JSON_EXTRACT functions in the WHERE clause to filter for the rows you want. Take care to distinguish between the JSON root and the cart array: json_data and cart_items in the example below. (By the way, shopName doesn't exist in your example, but storeName does.)
WITH
`myapp-1a602.firestore_orders.orders_raw_changelog` AS (
SELECT
'{"cart": [{"qty": 1,"description": "Sprite 1 L","productName": "Sprite 1 Liter","price": 1.99,"productId": 9}, {"qty": 11,"description": "Sprite 11 L","productName": "Sprite 11 Liter","price": 11.99,"productId": 19}],"storeName": "My awesome shop","status": 5,"timestamp": {"_seconds": 1590713204,"_nanoseconds": 916000000}}' json_data )
SELECT
JSON_EXTRACT(json_data, '$.status') AS status,
JSON_EXTRACT(json_data, '$.storeName') AS storeName,
JSON_EXTRACT(cart_items, '$.productName') AS product,
JSON_EXTRACT_SCALAR(cart_items, '$.qty') AS qty
FROM
`myapp-1a602.firestore_orders.orders_raw_changelog`,
UNNEST(JSON_EXTRACT_ARRAY(json_data, '$.cart')) AS cart_items
WHERE
JSON_EXTRACT(json_data,'$.storeName') like "\"My awesome shop\"" AND
CAST(JSON_EXTRACT_SCALAR(json_data,'$.status') AS NUMERIC) = 5
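For readers who want to see the same filter-then-unnest logic outside SQL, here is a minimal Python sketch. It is an illustration only, not BigQuery code; the function name cart_rows is hypothetical, and the input is assumed to be the JSON string from the data column.

```python
import json


def cart_rows(data_json, store_name, status):
    """Return one row per cart item if the document matches the filter, else []."""
    doc = json.loads(data_json)
    # Filter on the JSON root (the WHERE clause in the SQL above).
    if doc.get("storeName") != store_name or doc.get("status") != status:
        return []
    # Unnest the cart array (the UNNEST in the SQL above).
    return [
        {
            "status": doc["status"],
            "storeName": doc["storeName"],
            "product": item["productName"],
            "qty": item["qty"],
        }
        for item in doc.get("cart", [])
    ]


# Same sample document as in the SQL example above.
data = json.dumps({
    "cart": [
        {"qty": 1, "description": "Sprite 1 L", "productName": "Sprite 1 Liter",
         "price": 1.99, "productId": 9},
        {"qty": 11, "description": "Sprite 11 L", "productName": "Sprite 11 Liter",
         "price": 11.99, "productId": 19},
    ],
    "storeName": "My awesome shop",
    "status": 5,
    "timestamp": {"_seconds": 1590713204, "_nanoseconds": 916000000},
})
result = cart_rows(data, "My awesome shop", 5)
for r in result:
    print(r)
```

Unlike JSON_EXTRACT, which returns string values with their JSON quotes, the parsed values here are native Python types, so no quote handling is needed in the comparison.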