I have a BigQuery table with nested and repeated columns that I'm having trouble unnesting to columns instead of rows.
This is what the table looks like:
{
  "dateTime": "Tue Mar 01 2022 11:11:11 GMT-0800 (Pacific Standard Time)",
  "Field1": "123456",
  "Field2": true,
  "Matches": [
    {
      "ID": "abc123",
      "FieldA": "asdfljadsf",
      "FieldB": "0.99"
    },
    {
      "ID": "def456",
      "FieldA": "sdgfhgdkghj",
      "FieldB": "0.92"
    },
    {
      "ID": "ghi789",
      "FieldA": "tfgjyrtjy",
      "FieldB": "0.64"
    }
  ]
}
What I want is to return a table with each one of the nested fields as an individual column so I have a clean dataframe to work with:
{
  "dateTime": "Tue Mar 01 2022 11:11:11 GMT-0800 (Pacific Standard Time)",
  "Field1": "123456",
  "Field2": true,
  "Matches_1_ID": "abc123",
  "Matches_1_FieldA": "asdfljadsf",
  "Matches_1_FieldB": "0.99",
  "Matches_2_ID": "def456",
  "Matches_2_FieldA": "sdgfhgdkghj",
  "Matches_2_FieldB": "0.92",
  "Matches_3_ID": "ghi789",
  "Matches_3_FieldA": "tfgjyrtjy",
  "Matches_3_FieldB": "0.64"
}
I tried using UNNEST as below; however, it creates a new row per array element with only one set of the additional columns, which is not what I'm looking for.
SELECT *
FROM table,
UNNEST(Matches) as items
Any solution for this? Thank you in advance.
Consider the approach below:
select * from (
select t.* except(Matches), match.*, offset + 1 as offset
from your_table t, t.Matches match with offset
)
pivot (min(ID) as id, min(FieldA) as FieldA, min(FieldB) as FieldB for offset in (1,2,3))
If applied to the sample data in your question, the output is a single row with one set of id / FieldA / FieldB columns per offset.
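To try this without creating a table, the sample row can be inlined in a WITH clause (a sketch; column types are assumed from the question):
with your_table as (
  select
    timestamp '2022-03-01 11:11:11-08' as dateTime,
    '123456' as Field1,
    true as Field2,
    [
      struct('abc123' as ID, 'asdfljadsf' as FieldA, '0.99' as FieldB),
      struct('def456' as ID, 'sdgfhgdkghj' as FieldA, '0.92' as FieldB),
      struct('ghi789' as ID, 'tfgjyrtjy' as FieldA, '0.64' as FieldB)
    ] as Matches
)
select * from (
  select t.* except(Matches), match.*, offset + 1 as offset
  from your_table t, t.Matches match with offset
)
pivot (min(ID) as id, min(FieldA) as FieldA, min(FieldB) as FieldB for offset in (1,2,3))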
If there are no matches, that entire row is not included in the output table, whereas I want to keep that record with the fields it does have. How can I modify the query to keep all records?
Use LEFT JOIN instead, as below:
select * from (
select t.* except(Matches), match.*, offset + 1 as offset
from your_table t left join t.Matches match with offset
)
pivot (min(ID) as id, min(FieldA) as FieldA, min(FieldB) as FieldB for offset in (1,2,3))
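A minimal sketch with hypothetical inline data (dateTime omitted for brevity): the second row below has an empty Matches array, and the LEFT JOIN keeps it, with NULLs in the pivoted columns.
with your_table as (
  select '123456' as Field1, true as Field2,
         [struct('abc123' as ID, 'asdfljadsf' as FieldA, '0.99' as FieldB)] as Matches
  union all
  select '999999', false, array<struct<ID string, FieldA string, FieldB string>>[]
)
select * from (
  select t.* except(Matches), match.*, offset + 1 as offset
  from your_table t left join t.Matches match with offset
)
pivot (min(ID) as id, min(FieldA) as FieldA, min(FieldB) as FieldB for offset in (1,2,3))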
The table has id and step_data columns.
id | step_data
--------------
a1 | {...}
a2 | {...}
a3 | {...}
step_data is a nested structure as follows, where the actual metadata keys can appear in any of the events objects.
{
  "events": [
    {
      "timestamp": "2021-04-07T17:46:13.739Z",
      "meta": [
        {
          "key": "length",
          "value": "0.898"
        },
        {
          "key": "height",
          "value": "607023104"
        },
        {
          "key": "weight",
          "value": "33509376"
        }
      ]
    },
    {
      "timestamp": "2021-04-07T17:46:13.781Z",
      "meta": [
        {
          "key": "color",
          "value": "0.007"
        },
        {
          "key": "count",
          "value": "641511424"
        },
        {
          "key": "age",
          "value": "0"
        }
      ]
    }
  ]
}
I can extract one field like length pretty easily.
select cast(metadata ->> 'value' as double precision) as length,
id
from (
select jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') metadata,
id
from table
) as parsed_keys
where metadata #> '{"key": "length"}'::jsonb
id | length
-----------
a1 | 0.898
a2 | 0.800
But what I really need is to extract the metadata as columns for a couple of known keys, like length and color. I'm not sure how to get another column efficiently once I split the array with jsonb_array_elements().
Is there an efficient way to do this without having to call jsonb_array_elements() again and do a join for every single key? For example, so that the result set looks like this:
id | length | color | weight
----------------------------
a1 | 0.898  | 0.007 | 33509376
a2 | 0.800  | 1.000 | 15812391
Using Postgres 11.7.
With Postgres 11, I can only think of unnesting both levels, then aggregating back into a key/value pair from which you can extract the desired keys:
select t.id,
(x.att ->> 'length')::numeric as length,
(x.att ->> 'color')::numeric as color,
(x.att ->> 'weight')::numeric as weight
from the_table t
cross join lateral (
select jsonb_object_agg(m.item ->> 'key', m.item -> 'value') as att
from jsonb_array_elements(t.step_data -> 'events') as e(event)
cross join jsonb_array_elements(e.event -> 'meta') as m(item)
where m.item ->> 'key' in ('color', 'length', 'weight')
) x
;
With Postgres 12 you could write it a bit simpler:
select t.id,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (#.key == "length").value') #>> '{}' as length,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (#.key == "color").value') #>> '{}' as color,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (#.key == "weight").value') #>> '{}' as weight
from the_table t
;
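For reference, a minimal test setup matching the sample in the question (table name and column types assumed), against which either query above can be run:
-- hypothetical table and data shaped like the question's step_data column
create table the_table (id text primary key, step_data jsonb);

insert into the_table (id, step_data) values
('a1', '{"events": [
   {"timestamp": "2021-04-07T17:46:13.739Z",
    "meta": [{"key": "length", "value": "0.898"},
             {"key": "height", "value": "607023104"},
             {"key": "weight", "value": "33509376"}]},
   {"timestamp": "2021-04-07T17:46:13.781Z",
    "meta": [{"key": "color", "value": "0.007"},
             {"key": "count", "value": "641511424"},
             {"key": "age", "value": "0"}]}
 ]}');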
crosstab()
For any Postgres version.
You could feed the result into the crosstab() function to pivot the result. You need the additional module tablefunc installed. If you are unfamiliar, read basic instructions here first:
PostgreSQL Crosstab Query
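If it is not installed yet, it can be added once per database (given sufficient privileges):
CREATE EXTENSION IF NOT EXISTS tablefunc;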
SELECT *
FROM crosstab(
$$
SELECT id, metadata->>'key', metadata ->>'value'
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
ORDER BY 1
$$
, $$VALUES ('length'), ('color'), ('weight')$$
) AS ct (id text, length float, color float, weight float);
db<>fiddle here (demonstrating all)
Should deliver best performance, especially for many columns.
Note how we need no explicit cast to double precision (float). crosstab() processes text input anyway, the result is coerced to the types given in the column definition list.
If one of the keys should appear multiple times, the last row wins. (No error is raised, like you would seem to prefer.) You can add a deterministic sort order to the query in $1 to sort the preferred row last. Example: to get the lowest value per key:
ORDER BY 1, 2, 3 DESC
Conditional aggregates with FILTER clause
For Postgres 9.4 or newer.
See:
Aggregate columns with additional (distinct) filters
SELECT id
, min(val) FILTER (WHERE key = 'length') AS length
, min(val) FILTER (WHERE key = 'color') AS color
, min(val) FILTER (WHERE key = 'weight') AS weight
FROM (
SELECT id, metadata->>'key' AS key, (metadata ->>'value')::float AS val
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
) sub
GROUP BY id;
In Postgres 12 or later I would compare performance with jsonb_path_query_first(). a_horse provided a solution.
This query is an option:
SELECT id,
MAX(CASE WHEN metadata->>'key' = 'length' THEN metadata->>'value' END) AS length,
MAX(CASE WHEN metadata->>'key' = 'color' THEN metadata->>'value' END) AS color,
MAX(CASE WHEN metadata->>'key' = 'weight' THEN metadata->>'value' END) AS weight
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') as metadata
FROM table t) AS aux
GROUP BY id;
I'm trying to pull elements from a JSONB column.
I have table like:
id NUMBER
data JSONB
data structure is:
[
  {
    "id": "abcd",
    "validTo": "timestamp"
  },
  ...
]
I'm querying that row with SELECT * FROM testtable WHERE data #> '[{"id": "abcd"}]', and it almost works like I want to.
The trouble is that the data column is huge, around 100k records, so I would like to pull only the data elements I'm looking for.
For example if I would query for
SELECT * FROM testtable WHERE data #> '[{"id": "abcd"}]' OR data #> '[{"id": "abcde"}]' I expect data column to contain only records with id abcd or abcde. Like that:
[
{"id": "abcd"},
{"id": "abcde"}
]
It would also be okay if the query returned separate rows, each with a single data record.
I have no idea how to solve it; I've been trying lots of options for days.
To get separate output rows for records having multiple matches:
with a (id, data) as (
values
(1, '[{"id": "abcd", "validTo": 2}, {"id": "abcde", "validTo": 4}]'::jsonb),
(2, '[{"id": "abcd", "validTo": 3}, {"id": "abc", "validTo": 6}]'::jsonb),
(3, '[{"id": "abc", "validTo": 5}]'::jsonb)
)
select id, jsonb_array_elements(jsonb_path_query_array(data, '$[*] ? (#.id=="abcd" || #.id=="abcde")'))
from a;
You will need to unnest, filter and aggregate back:
select t.id, j.*
from testtable t
join lateral (
select jsonb_agg(e.x) as data
from jsonb_array_elements(t.data) as e(x)
where e.x #> '{"id": "abcd"}'
or e.x #> '{"id": "abcde"}'
) as j on true
Online example
With Postgres 12 you could use jsonb_path_query_array() as an alternative, but that would require repeating the conditions:
select t.id,
jsonb_path_query_array(data, '$[*] ? (#.id == "abcd" || #.id == "abcde")')
from testtable t
where t.data #> '[{"id": "abcd"}]'
or t.data #> '[{"id": "abcde"}]'
I didn't quite get your question. Are you asking that the answer should only contain the data column, without the id column? Then I think this is the query:
Select data from testtable where id = 'abcd' or id = 'abcde';
I'm using a PostgreSQL DB. I have a table named 'offers' which has a column 'validity' containing the following data in JSON format:
[{"end_date": "2019-12-31", "program_id": "4", "start_date": "2019-10-27"},
{"end_date":"2020-12-31", "program_id": "6", "start_date": "2020-01-01"},
{"end_date": "2020-01-01", "program_id": "3", "start_date": "2019-10-12"}]
Now I want to get all records where the 'validity' column contains an entry with program_id = 4 and end_date > current_date.
How can I write a SQL or knexjs query to achieve this?
Thanks in advance
You can use an EXISTS condition:
select o.*
from offers o
where exists (select *
              from jsonb_array_elements(o.validity) as v(item)
              where v.item ->> 'program_id' = '4'
                and (v.item ->> 'end_date')::date > current_date)
Online example
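A minimal sketch of test data matching the question (column types assumed) that the query can be run against:
-- hypothetical table using the validity sample from the question
create table offers (id int primary key, validity jsonb);

insert into offers (id, validity) values
(1, '[{"end_date": "2019-12-31", "program_id": "4", "start_date": "2019-10-27"},
      {"end_date": "2020-12-31", "program_id": "6", "start_date": "2020-01-01"},
      {"end_date": "2020-01-01", "program_id": "3", "start_date": "2019-10-12"}]');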
In my table, there is a column of JSON type which contains an array of objects describing time offsets:
[
  {
    "type": "start",
    "time": 1.234
  },
  {
    "type": "end",
    "time": 50.403
  }
]
I know that I can extract these with JSON_EACH() and JSON_EXTRACT():
CREATE TEMPORARY TABLE Items(
id INTEGER PRIMARY KEY,
timings JSON
);
INSERT INTO Items(timings) VALUES
('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]'),
('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]');
SELECT
JSON_EXTRACT(Timings.value, '$.type'),
JSON_EXTRACT(Timings.value, '$.time')
FROM
Items,
JSON_EACH(timings) AS Timings;
This returns a table like:
start 12.345
end 67.891
start 24.56
end 78.901
What I really need though is to:
Find the timings of specific types. (Find the first object in the array that matches a condition.)
Take this data and select it as a column with the rest of the table.
In other words, I'm looking for a table that looks like this:
id start end
-----------------------------
0 12.345 67.891
1 24.56 78.901
I'm hoping for some sort of query like this:
SELECT
id,
JSON_EXTRACT(timings, '$.[type="start"].time'),
JSON_EXTRACT(timings, '$.[type="end"].time')
FROM Items;
Is there some way to use path in the JSON functions to select what I need? Or, some other way to pivot what I have in the first example to apply to the table?
One possibility:
WITH cte(id, json) AS
(SELECT Items.id
, json_group_object(json_extract(j.value, '$.type'), json_extract(j.value, '$.time'))
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
GROUP BY Items.id)
SELECT id
, json_extract(json, '$.start') AS start
, json_extract(json, '$.end') AS "end"
FROM cte
ORDER BY id;
which gives
id start end
---------- ---------- ----------
1 12.345 67.891
2 24.56 78.901
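To see what the CTE builds before the outer extraction, the aggregation can be run on its own; for the first sample row it produces a single JSON object keyed by type (a sketch):
-- for id = 1 this yields roughly {"start":12.345,"end":67.891}
SELECT json_group_object(json_extract(j.value, '$.type'),
                         json_extract(j.value, '$.time')) AS json
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WHERE Items.id = 1;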
Another one, that uses the window functions added in sqlite 3.25 and avoids creating intermediate JSON objects:
SELECT DISTINCT Items.id
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'start') OVER ids AS start
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'end') OVER ids AS "end"
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WINDOW ids AS (PARTITION BY Items.id)
ORDER BY Items.id;
The key is using the ON clause of the JOIN to limit results to just the two objects in each array that you care about, and then merging those up to two rows for each Items.id into one with a couple of different approaches.
I generated a BigQuery table using an existing BigTable table, and the result is a multi-nested dataset that I'm struggling to query. Here's the format of an entry from that BigQuery table, obtained by doing a simple select * from my_table limit 1:
[
  {
    "rowkey": "XA_1234_0",
    "info": {
      "column": [],
      "somename": {
        "cell": [
          {
            "timestamp": "1514357827.321",
            "value": "1234"
          }
        ]
      },
      ...
    }
  },
  ...
]
What I need is to be able to get all entries from my_table where the value of somename is X, for instance. There will be multiple rowkeys where the value of somename will be X and I need all the data from each of those rowkey entries.
OR
If I could have a query where rowkey contains X, so as to get "XA_1234_0", "XA_1234_1", and so on. The "XA" and the "0" can change, but the middle numbers stay the same. I've tried doing a where rowkey like "$_1234_$", but the query runs for over a minute, which is way too long for some reason.
I am using standard SQL.
EDIT: Here's an example of a query I tried that didn't work (it fails with the error Cannot access field value on a value with type ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>), but it best describes what I'm trying to achieve:
SELECT * FROM `my_dataset.mytable` where info.field_name.cell.value=12345
I want to get all records whose value in field_name equals some value.
From the sample Firebase Analytics dataset:
#standardSQL
SELECT *
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
WHERE EXISTS(
SELECT * FROM UNNEST(user_dim.user_properties)
WHERE key='powers' AND value.value.string_value='20'
)
LIMIT 1000
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM `my_dataset.mytable` t,
UNNEST(info.somename.cell) c
WHERE c.value = '1234'
The above assumes the specific value can appear in each record just once - hope this is true for your case.
If this is not the case, the below should do it:
#standardSQL
SELECT *
FROM `yourproject.yourdadtaset.yourtable`
WHERE EXISTS(
SELECT *
FROM UNNEST(info.somename.cell)
WHERE value = '1234'
)
which, I just realised, is pretty much the same as Felipe's version - just using your table / schema
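For reference, both forms can be sanity-checked against a hypothetical inline row shaped like the sample above (table and field names assumed):
#standardSQL
WITH mytable AS (
  SELECT
    'XA_1234_0' AS rowkey,
    STRUCT(
      STRUCT(
        [STRUCT(TIMESTAMP_MILLIS(1514357827321) AS timestamp,
                '1234' AS value)] AS cell
      ) AS somename
    ) AS info
)
SELECT t.*
FROM mytable t
WHERE EXISTS(
  SELECT * FROM UNNEST(info.somename.cell) WHERE value = '1234'
)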