BigQuery query nested json - sql

I have JSON data which is saved in BigQuery as a string.
{
"event":{
"action":"prohibitedSoftwareCheckResult",
"clientTime":"2017-07-16T12:55:40.828Z",
"clientTimeZone":"3",
"serverTime":"2017-07-16T12:55:39.000Z",
"processList":{
"1":"outlook.exe",
"2":"notepad.exe"
}
},
"user":{
"id":123456
}
}
I want to have a result set where each process will be in a different row.
Something like:
UserID ProcessName
-------------------------
123456 outlook.exe
123456 notepad.exe
I saw there is an option to query repeated data but the field needs to be RECORD type to my understanding.
Is it possible to convert to RECORD type "on the fly" in a subquery? (I can't change the source field to RECORD).
Or, is there a different way to return the desired result set?

A possible workaround:
SELECT
  user_id,
  processListValues
FROM (
  SELECT
    JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
    REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
  FROM data
),
UNNEST(processListValues) processListValues
Using your JSON as example:
WITH data AS(
SELECT """{
"event":{
"action":"prohibitedSoftwareCheckResult",
"clientTime":"2017-07-16T12:55:40.828Z",
"clientTimeZone":"3",
"serverTime":"2017-07-16T12:55:39.000Z",
"processList":{
"1":"outlook.exe",
"2":"notepad.exe",
"3":"outlo3245345okexe"
}
},
"user":{
"id":123456
}
}""" as json_data
)
SELECT
  user_id,
  processListValues
FROM (
  SELECT
    JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id,
    REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9\.]+)"') processListValues
  FROM data
),
UNNEST(processListValues) processListValues
Results:
Row user_id processListValues
1 123456 outlook.exe
2 123456 notepad.exe
3 123456 outlo3245345okexe
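To see what the regular expression does and does not capture, here is a minimal Python sketch of the same extraction logic. It is an illustration only, not BigQuery code: the function name explode_processes and the sample document (including the hyphenated my-tool.exe entry) are made up for the example. Because the character class [a-zA-Z0-9\.] excludes hyphens, underscores and spaces, process names containing them are silently skipped.

```python
import json
import re

# Same pattern as in the query: a colon, then a quoted value made of
# letters, digits and dots only.
PROCESS_VALUE = re.compile(r':"([a-zA-Z0-9\.]+)"')


def explode_processes(json_text):
    """Return one (user_id, process_name) pair per matched process."""
    doc = json.loads(json_text)
    user_id = str(doc["user"]["id"])
    # Serialise the processList object compactly so values look like :"name"
    process_list = json.dumps(doc["event"]["processList"], separators=(",", ":"))
    return [(user_id, name) for name in PROCESS_VALUE.findall(process_list)]


# Hypothetical sample; "my-tool.exe" is added to show the pattern's limit.
sample = (
    '{"event":{"processList":{"1":"outlook.exe","2":"notepad.exe",'
    '"3":"my-tool.exe"}},"user":{"id":123456}}'
)
rows = explode_processes(sample)
print(rows)
```

If your process names can contain other characters, widening the character class (for example to [^"]+) is one option to consider.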

Related

How to group results by values that are inside json array in postgreSQL

I have a column of type JSONB that has data like this:
column name: used_filters
row number 1 example:
{ "categories" : ["economic", "Social"], "tags": ["world" ,"eco-friendly"] }
row number 2 example:
{ "categories" : ["economic"], "tags": ["eco-friendly"] , "keywords" : ["2050"] }
I want to group the results to get the most frequent value for each of the keys,
something like this:
key most_freq
-------------------------
category economic
tags eco-friendly
keyword 2050
The keys are not fixed and may differ from the ones in this example, but I know that each of them will occur frequently.
You can first expand each object into key/value pairs with jsonb_each, then unnest each value array with jsonb_array_elements_text. The rest is ordinary aggregation: count each (key, value) pair and rank the counts per key with a window function. Assuming a table t with the JSONB column js:
SELECT key, value
FROM ( SELECT j.key, jj.value,
RANK() OVER (PARTITION BY j.key ORDER BY COUNT(*) DESC)
FROM t,
LATERAL jsonb_each(js) AS j,
LATERAL jsonb_array_elements_text(j.value) AS jj
GROUP BY j.key, jj.value ) AS q
WHERE rank = 1
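The same count-then-rank logic can be sketched in plain Python, which may help when checking what the query should return. This is only an illustration of the idea, not PostgreSQL code; most_frequent_per_key is a hypothetical helper and the rows are the two examples from the question. Like RANK(), it keeps every value that ties for first place.

```python
from collections import Counter, defaultdict


def most_frequent_per_key(rows):
    """For each key, return the value(s) with the highest count (ties kept)."""
    counts = defaultdict(Counter)
    for row in rows:
        for key, values in row.items():
            # Equivalent of unnesting the array: count each element once
            counts[key].update(values)
    result = {}
    for key, counter in counts.items():
        top = max(counter.values())
        result[key] = sorted(v for v, n in counter.items() if n == top)
    return result


rows = [
    {"categories": ["economic", "Social"], "tags": ["world", "eco-friendly"]},
    {"categories": ["economic"], "tags": ["eco-friendly"], "keywords": ["2050"]},
]
winners = most_frequent_per_key(rows)
print(winners)
```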

Filtering multiple XML nodes using XPath in PostgreSQL

There is an XML with the structure:
- Item
- Documents
- Document
- Records
- Record
- Category
- Code
- Value
And here is the SQL query that selects record values filtered by category code
SELECT (xpath('/ns:Record/ns:Value/text()', rec, ARRAY[ARRAY['ns', 'http://some-ns']]))[1]::text AS val
FROM (
SELECT unnest(xpath('/ns:Item/ns:Documents/ns:Document/ns:Records/ns:Record[ns:Category/ns:Code/text()="MAIN.CAT001"]',
'<Item xmlns="http://some-ns"><Documents><Document><Records><Record><Category><Code>MAIN.CAT001</Code></Category><Value>Value 001</Value></Record><Record><Category><Code>MAIN.CAT002</Code></Category><Value>Value 002</Value></Record><Record><Category><Code>MAIN.CAT003</Code></Category><Value>Value 003</Value></Record></Records></Document></Documents></Item>'::xml,
ARRAY[ARRAY['ns', 'http://some-ns']])) AS rec
) t
Is it possible to filter "records" not only by one "category code", but by multiple ones?
I mean I'd like to use filter like this
ns:Record[ns:Category/ns:Code/text()=("MAIN.CAT001", "MAIN.CAT003")]
or this
ns:Record[ns:Category/ns:Code/text()="MAIN.CAT001" or ns:Category/ns:Code/text()="MAIN.CAT003"]
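But neither solution works.
Try using contains(ns:Code,"MAIN.CAT001") or contains(ns:Code,"MAIN.CAT003") inside the predicate. Note that contains() is a substring test, so a code such as MAIN.CAT0010 would match as well: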
SELECT (xpath('/ns:Record/ns:Value/text()', rec, ARRAY[ARRAY['ns', 'http://some-ns']]))[1]::text AS val
FROM (
SELECT
unnest(xpath('/ns:Item/ns:Documents/ns:Document/ns:Records/ns:Record[ns:Category[contains(ns:Code,"MAIN.CAT001") or contains(ns:Code,"MAIN.CAT003")]]',
'<Item xmlns="http://some-ns"><Documents><Document><Records><Record><Category><Code>MAIN.CAT001</Code></Category><Value>Value 001</Value></Record><Record><Category><Code>MAIN.CAT002</Code></Category><Value>Value 002</Value></Record><Record><Category><Code>MAIN.CAT003</Code></Category><Value>Value 003</Value></Record></Records></Document></Documents></Item>'::xml,
ARRAY[ARRAY['ns', 'http://some-ns']])) AS rec
) t;
val
-----------
Value 001
Value 003
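For comparison, here is a small Python sketch that filters the same document by an exact set of codes, using the standard xml.etree.ElementTree module. It is not a PostgreSQL solution; values_for_codes is a hypothetical helper, and the membership test in Python stands in for the multi-value predicate. Unlike contains(), it only accepts exact matches.

```python
import xml.etree.ElementTree as ET

NS = {"ns": "http://some-ns"}

XML = (
    '<Item xmlns="http://some-ns"><Documents><Document><Records>'
    "<Record><Category><Code>MAIN.CAT001</Code></Category><Value>Value 001</Value></Record>"
    "<Record><Category><Code>MAIN.CAT002</Code></Category><Value>Value 002</Value></Record>"
    "<Record><Category><Code>MAIN.CAT003</Code></Category><Value>Value 003</Value></Record>"
    "</Records></Document></Documents></Item>"
)


def values_for_codes(xml_text, wanted):
    """Return the Value text of every Record whose Code is in `wanted`."""
    root = ET.fromstring(xml_text)  # root is the Item element
    path = "ns:Documents/ns:Document/ns:Records/ns:Record"
    out = []
    for rec in root.findall(path, NS):
        code = rec.findtext("ns:Category/ns:Code", namespaces=NS)
        if code in wanted:  # exact match, not a substring test
            out.append(rec.findtext("ns:Value", namespaces=NS))
    return out


vals = values_for_codes(XML, {"MAIN.CAT001", "MAIN.CAT003"})
print(vals)
```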

Extracting data from an array of JSON objects for specific object values

In my table, there is a column of JSON type which contains an array of objects describing time offsets:
[
{
"type": "start",
"time": 1.234
},
{
"type": "end",
"time": 50.403
}
]
I know that I can extract these with JSON_EACH() and JSON_EXTRACT():
CREATE TEMPORARY TABLE Items(
id INTEGER PRIMARY KEY,
timings JSON
);
INSERT INTO Items(timings) VALUES
('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]'),
('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]');
SELECT
JSON_EXTRACT(Timings.value, '$.type'),
JSON_EXTRACT(Timings.value, '$.time')
FROM
Items,
JSON_EACH(timings) AS Timings;
This returns a table like:
start 12.345
end 67.891
start 24.56
end 78.901
What I really need though is to:
Find the timings of specific types. (Find the first object in the array that matches a condition.)
Take this data and select it as a column with the rest of the table.
In other words, I'm looking for a table that looks like this:
id start end
-----------------------------
0 12.345 67.891
1 24.56 78.901
I'm hoping for some sort of query like this:
SELECT
id,
JSON_EXTRACT(timings, '$.[type="start"].time'),
JSON_EXTRACT(timings, '$.[type="end"].time')
FROM Items;
Is there some way to use path in the JSON functions to select what I need? Or, some other way to pivot what I have in the first example to apply to the table?
One possibility:
WITH cte(id, json) AS
(SELECT Items.id
, json_group_object(json_extract(j.value, '$.type'), json_extract(j.value, '$.time'))
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
GROUP BY Items.id)
SELECT id
, json_extract(json, '$.start') AS start
, json_extract(json, '$.end') AS "end"
FROM cte
ORDER BY id;
which gives
id start end
---------- ---------- ----------
1 12.345 67.891
2 24.56 78.901
Another one, which uses the window functions added in SQLite 3.25 and avoids creating intermediate JSON objects:
SELECT DISTINCT Items.id
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'start') OVER ids AS start
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'end') OVER ids AS "end"
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WINDOW ids AS (PARTITION BY Items.id)
ORDER BY Items.id;
The key is using the ON clause of the JOIN to limit results to just the two objects in each array that you care about, and then merging those up to two rows for each Items.id into one with a couple of different approaches.
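If you want to try the pivot from a script, the following Python sketch runs a third variant through the standard sqlite3 module. It uses plain conditional aggregation (MAX over a CASE expression with GROUP BY) instead of json_group_object or window functions; that substitution is mine, not part of the answer above. It assumes the SQLite library bundled with your Python build includes the JSON functions, and the timings column is declared TEXT here for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items(id INTEGER PRIMARY KEY, timings TEXT)")
conn.executemany(
    "INSERT INTO Items(timings) VALUES (?)",
    [
        ('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]',),
        ('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]',),
    ],
)

# One output row per Items.id; each CASE picks the time of one object type,
# and MAX ignores the NULLs produced for the other type.
pivot = conn.execute(
    """
    SELECT Items.id,
           MAX(CASE WHEN json_extract(j.value, '$.type') = 'start'
                    THEN json_extract(j.value, '$.time') END) AS start,
           MAX(CASE WHEN json_extract(j.value, '$.type') = 'end'
                    THEN json_extract(j.value, '$.time') END) AS "end"
    FROM Items
    JOIN json_each(Items.timings) AS j
    GROUP BY Items.id
    ORDER BY Items.id
    """
).fetchall()
conn.close()

for row in pivot:
    print(row)
```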

postgresql check if json value exist with good performance for example:index

I want to get rows where json value equals '111'
id json
1 {"1":"111", "2":"222"}
2 {"1":"111", "3":"333"}
3 {"4":"444", "2":"222"}
4 {"4":"666", "2":"111"}
5 {"1":"777", "3":"888"}
If you do not know which keys might hold 111, you can simply do:
SELECT *
FROM json_test
WHERE json::text LIKE '%"111"%';
If you know which keys may hold the value, compare them directly:
SELECT *
FROM json_test
WHERE json ->> '1' = '111'
   OR json ->> '2' = '111'
Updated data set:
SELECT
id, json
FROM
json_test, json_each_text(json)
WHERE
value = '111'
This expands each JSON object into key/value pairs and keeps the rows where the value equals '111'.
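One difference between the text search and the value comparison is worth checking on your own data: a LIKE '%"111"%' test on the raw text also matches a key named "111", not only a value. The Python sketch below mimics both checks to make that visible. It is an illustration rather than PostgreSQL code; match_by_text and match_by_value are hypothetical helpers, and row 6 is invented to show the edge case.

```python
import json

rows = {
    1: '{"1":"111", "2":"222"}',
    3: '{"4":"444", "2":"222"}',
    4: '{"4":"666", "2":"111"}',
    6: '{"111":"999"}',  # hypothetical row: "111" is a key, not a value
}


def match_by_text(rows, needle):
    """Mimics json::text LIKE '%"needle"%' (substring test on raw text)."""
    return [i for i, text in rows.items() if f'"{needle}"' in text]


def match_by_value(rows, needle):
    """Mimics expanding the object and filtering on value = needle."""
    return [i for i, text in rows.items() if needle in json.loads(text).values()]


by_text = match_by_text(rows, "111")
by_value = match_by_value(rows, "111")
print(by_text, by_value)
```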

How to query all entries with a value in a nested bigquery table

I generated a BigQuery table using an existing BigTable table, and the result is a multi-nested dataset that I'm struggling to query from. Here's the format of an entry from that BigQuery table just doing a simple select * from my_table limit 1:
[
{
"rowkey": "XA_1234_0",
"info": {
"column": [],
"somename": {
"cell": [
{
"timestamp": "1514357827.321",
"value": "1234"
}
]
},
...
}
},
...
]
What I need is to be able to get all entries from my_table where the value of somename is X, for instance. There will be multiple rowkeys where the value of somename will be X and I need all the data from each of those rowkey entries.
OR
Alternatively, a query where rowkey contains X would work, returning "XA_1234_0", "XA_1234_1", and so on. The "XA" and the "0" can change, but the middle numbers stay the same. I've tried where rowkey like "$_1234_$", but the query runs for over a minute, which seems far too long.
I am using standard SQL.
EDIT: Here's an example of a query I tried that didn't work (with error: Cannot access field value on a value with type ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>), but best describes what I'm trying to achieve:
SELECT * FROM `my_dataset.mytable` where info.field_name.cell.value=12345
I want to get all records whose value in field_name equals some value.
From the sample Firebase Analytics dataset:
#standardSQL
SELECT *
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
WHERE EXISTS(
SELECT * FROM UNNEST(user_dim.user_properties)
WHERE key='powers' AND value.value.string_value='20'
)
LIMIT 1000
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM `my_dataset.mytable` t,
UNNEST(info.somename.cell) c
WHERE c.value = '1234'
The query above assumes the value appears at most once per record; if it appears in several cells, the row is returned once per match.
If that is not the case for your data, the following should work:
#standardSQL
SELECT *
FROM `yourproject.yourdataset.yourtable`
WHERE EXISTS(
SELECT *
FROM UNNEST(info.somename.cell)
WHERE value = '1234'
)
I just realised this is pretty much the same as Felipe's version, just using your table and schema.
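The difference between joining on UNNEST and filtering with EXISTS can be sketched in Python. This is only a model of the row semantics, not BigQuery code: join_unnest and where_exists are hypothetical helpers, and the second cell and the second row in the sample are invented to show the duplicate.

```python
table = [
    {
        "rowkey": "XA_1234_0",
        "info": {"somename": {"cell": [
            {"timestamp": "1514357827.321", "value": "1234"},
            # hypothetical second cell with the same value
            {"timestamp": "1514357900.000", "value": "1234"},
        ]}},
    },
    {
        # hypothetical non-matching row
        "rowkey": "XB_5678_0",
        "info": {"somename": {"cell": [
            {"timestamp": "1514357827.321", "value": "5678"},
        ]}},
    },
]


def join_unnest(table, wanted):
    """Models FROM t, UNNEST(cell) c WHERE c.value = wanted: one row per matching cell."""
    return [
        row["rowkey"]
        for row in table
        for cell in row["info"]["somename"]["cell"]
        if cell["value"] == wanted
    ]


def where_exists(table, wanted):
    """Models WHERE EXISTS(SELECT ... FROM UNNEST(cell)): one row per matching record."""
    return [
        row["rowkey"]
        for row in table
        if any(cell["value"] == wanted for cell in row["info"]["somename"]["cell"])
    ]


joined = join_unnest(table, "1234")
existing = where_exists(table, "1234")
print(joined, existing)
```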