Structure of table in BigQuery - google-bigquery

I want to create table of below JSON.
{
"store_nbr": "1234",
"sls_dt": "2014-01-01 00:00:00",
"Items": [{
"sku": "3456",
"sls_amt": "9.99",
"discounts": [{
"disc_nbr": "1",
"disc_amt": "0.99"
}, {
"disc_nbr": "2",
"disc_amt": "1.00"
}]
}]
}
Can anyone help me what would be the structure of this JSON on BigQuery ? and How I can retrieve data using SQL query ?

I am wondering what would be the structure of my table?
Try below for BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT
1234 AS store_nbr,
DATE('2014-01-01 00:00:00') AS sls_dt,
[STRUCT(
3456 AS sku,
9.99 AS sls_amt,
[STRUCT<disc_nbr INT64, disc_amt FLOAT64>
(1, 0.99),
(2, 1.00)
] AS discounts
)] AS items
)
SELECT *
FROM yourTable
The structure of table here is:
or if to look in Web UI:
How I can read values of it?
It is really depends on what exactly and how you want to "read" out of this data!
For example if you want to calc total discount per each sale - it can looks as below
#standardSQL
WITH yourTable AS (
SELECT
1234 AS store_nbr,
DATE('2014-01-01 00:00:00') AS sls_dt,
[STRUCT(
3456 AS sku, 9.99 AS sls_amt, [STRUCT<disc_nbr INT64, disc_amt FLOAT64>(1, 0.99), (2, 1.00)] AS discounts
)] AS items
)
SELECT
t.*,
(SELECT SUM(disc.disc_amt) FROM UNNEST(item.discounts) AS disc) AS total_discount
FROM yourTable AS t, UNNEST(items) AS item
I recommend you first to complete your "exercise" with table creation and actually get data into it, so than you can ask specific questions about query you want to build.
But this should be a new post so you do not mix all together as an all-in-one question, as such type of questions usually not welcomed here on SO

Related

Aggregate rows into complex json - Athena

I've an Athena query which gives me the below table for a given IDs:
ID
ID_2
description
state
First
row
abc
[MN, SD]
Second
row
xyz
[AL, CA ]
I'm using the array_agg function to merge states into an array. Within the query itself I want convert the output into the format below:
ID
ID_2
custom_object
First
row
{'description': 'abc', 'state': ['MN', 'SD']}
I'm looking at the Athena docs but haven't found function that does just this. I'm experimenting with multimap_agg and map_agg but this seems to be too complex to achieve. How can I do this, please help!
You can do it after aggregation by combining casts to json and creating map:
-- sample data
WITH dataset (ID, ID_2, description, state) AS (
VALUES ('First', 'row', 'abc', array['MN', 'SD']),
('Second', 'row', 'xyz', array['AL', 'CA' ])
)
-- query
select ID,
ID_2,
cast(
map(
array [ 'description',
'state' ],
array [ cast(description as json),
cast(state as json) ]
) as json
) custom_object
from dataset
Output:
ID
ID_2
custom_object
First
row
{"description":"abc","state":["MN","SD"]}
Second
row
{"description":"xyz","state":["AL","CA"]}

Select items from jsonb array in postgres 12

I'm trying to pull elements from JSONB column.
I have table like:
id NUMBER
data JSONB
data structure is:
[{
"id": "abcd",
"validTo": "timestamp"
}, ...]
I'm querying that row with SELECT * FROM testtable WHERE data #> '[{"id": "abcd"}]', and it almost works like I want to.
The trouble is data column is huge, like 100k records, so I would like to pull only data elements I'm looking for.
For example if I would query for
SELECT * FROM testtable WHERE data #> '[{"id": "abcd"}]' OR data #> '[{"id": "abcde"}]' I expect data column to contain only records with id abcd or abcde. Like that:
[
{"id": "abcd"},
{"id": "abcde"}
]
It would be okay if query would return separate entries with single data record.
I have no ideas how to solve it, trying lot options since days.
To have separate output for records having multiple matches
with a (id, data) as (
values
(1, '[{"id": "abcd", "validTo": 2}, {"id": "abcde", "validTo": 4}]'::jsonb),
(2, '[{"id": "abcd", "validTo": 3}, {"id": "abc", "validTo": 6}]'::jsonb),
(3, '[{"id": "abc", "validTo": 5}]'::jsonb)
)
select id, jsonb_array_elements(jsonb_path_query_array(data, '$[*] ? (#.id=="abcd" || #.id=="abcde")'))
from a;
You will need to unnest, filter and aggregate back:
select t.id, j.*
from testtable t
join lateral (
select jsonb_agg(e.x) as data
from jsonb_array_elements(t.data) as e(x)
where e.x #> '{"id": "abcd"}'
or e.x #> '{"id": "abcde"}'
) as j on true
Online example
With Postgres 12 you could use jsonb_path_query_array() as an alternative, but that would require to repeat the conditions:
select t.id,
jsonb_path_query_array(data, '$[*] ? (#.id == "abcd" || #.id == "abcde")')
from testtable t
where t.data #> '[{"id": "abcd"}]'
or t.data #> '[{"id": "abcde"}]'
Didn't quite get your question.Are you asking that the answer should only contain data column without id column .Then I think this is the query:
Select data from testtable where id="abcd" or id="abcde";

How to iterate on json data with sql/knexjs query

I'm using postgresql db.I have a table named 'offers' which has a column 'validity' which contains the following data in JSON format:
[{"end_date": "2019-12-31", "program_id": "4", "start_date": "2019-10-27"},
{"end_date":"2020-12-31", "program_id": "6", "start_date": "2020-01-01"},
{"end_date": "2020-01-01", "program_id": "3", "start_date": "2019-10-12"}]
Now I want to get all records where 'validity' column contains:
program_id = 4 and end_date > current_date.
How to write SQL query or knexjs query to achieve this?
Thanks in advance
You can use an EXISTS condition:
select o.*
from offers o
where exists (select *
from jsonb_array_elements(o.validity) as v(item)
where v.item ->> 'program_id' = '3'
and (v.item ->> 'end_date')::date > current_date)
Online example

Extracting data from an array of JSON objects for specific object values

In my table, there is a column of JSON type which contains an array of objects describing time offsets:
[
{
"type": "start"
"time": 1.234
},
{
"type": "end"
"time": 50.403
}
]
I know that I can extract these with JSON_EACH() and JSON_EXTRACT():
CREATE TEMPORARY TABLE Items(
id INTEGER PRIMARY KEY,
timings JSON
);
INSERT INTO Items(timings) VALUES
('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]'),
('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]');
SELECT
JSON_EXTRACT(Timings.value, '$.type'),
JSON_EXTRACT(Timings.value, '$.time')
FROM
Items,
JSON_EACH(timings) AS Timings;
This returns a table like:
start 12.345
end 67.891
start 24.56
end 78.901
What I really need though is to:
Find the timings of specific types. (Find the first object in the array that matches a condition.)
Take this data and select it as a column with the rest of the table.
In other words, I'm looking for a table that looks like this:
id start end
-----------------------------
0 12.345 67.891
1 24.56 78.901
I'm hoping for some sort of query like this:
SELECT
id,
JSON_EXTRACT(timings, '$.[type="start"].time'),
JSON_EXTRACT(timings, '$.[type="end"].time')
FROM Items;
Is there some way to use path in the JSON functions to select what I need? Or, some other way to pivot what I have in the first example to apply to the table?
One possibility:
WITH cte(id, json) AS
(SELECT Items.id
, json_group_object(json_extract(j.value, '$.type'), json_extract(j.value, '$.time'))
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
GROUP BY Items.id)
SELECT id
, json_extract(json, '$.start') AS start
, json_extract(json, '$.end') AS "end"
FROM cte
ORDER BY id;
which gives
id start end
---------- ---------- ----------
1 12.345 67.891
2 24.56 78.901
Another one, that uses the window functions added in sqlite 3.25 and avoids creating intermediate JSON objects:
SELECT DISTINCT Items.id
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'start') OVER ids AS start
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'end') OVER ids AS "end"
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WINDOW ids AS (PARTITION BY Items.id)
ORDER BY Items.id;
The key is using the ON clause of the JOIN to limit results to just the two objects in each array that you care about, and then merging those up to two rows for each Items.id into one with a couple of different approaches.

BigQuery query nested json

I have JSON data which is saved in BigQuery as a string.
{
"event":{
"action":"prohibitedSoftwareCheckResult",
"clientTime":"2017-07-16T12:55:40.828Z",
"clientTimeZone":"3",
"serverTime":"2017-07-16T12:55:39.000Z",
"processList":{
"1":"outlook.exe",
"2":"notepad.exe"
}
},
"user":{
"id":123456,
}
}
I want to have a result set where each process will be in a different row.
Something like:
UserID ProcessName
-------------------------
123456 outlook.exe
123456 notepad.exe
I saw there is an option to query repeated data but the field needs to be RECORD type to my understanding.
Is it possible to convert to RECORD type "on the fly" in a subquery? (I can't change the source field to RECORD).
Or, is there a different way to return the desired result set?
This could be a possible work around for you:
SELECT
user_id,
processListValues
FROM(
SELECT
JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
FROM data
),
UNNEST(processListValues) processListValues
Using your JSON as example:
WITH data AS(
SELECT """{
"event":{
"action":"prohibitedSoftwareCheckResult",
"clientTime":"2017-07-16T12:55:40.828Z",
"clientTimeZone":"3",
"serverTime":"2017-07-16T12:55:39.000Z",
"processList":{
"1":"outlook.exe",
"2":"notepad.exe",
"3":"outlo3245345okexe"
}
},
"user":{
"id":123456,
}
}""" as json_data
)
SELECT
user_id,
processListValues
FROM(
SELECT
JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
FROM data
),
UNNEST(processListValues) processListValues
Results:
Row user_id processListValues
1 123456 outlook.exe
2 123456 notepad.exe
3 123456 outlo3245345okexe