Select items from jsonb array in postgres 12 - sql

I'm trying to pull elements from a JSONB column.
I have a table like:
id NUMBER
data JSONB
data structure is:
[{
"id": "abcd",
"validTo": "timestamp"
}, ...]
I'm querying that row with SELECT * FROM testtable WHERE data @> '[{"id": "abcd"}]', and it almost works like I want it to.
The trouble is that the data column is huge, around 100k elements, so I would like to pull only the elements I'm looking for.
For example, if I queried
SELECT * FROM testtable WHERE data @> '[{"id": "abcd"}]' OR data @> '[{"id": "abcde"}]', I would expect the data column to contain only the elements with id abcd or abcde, like this:
[
{"id": "abcd"},
{"id": "abcde"}
]
It would also be fine if the query returned separate rows, each with a single data element.
I have no idea how to solve this; I've been trying lots of options for days.

To get separate output rows for records having multiple matches:
with a (id, data) as (
values
(1, '[{"id": "abcd", "validTo": 2}, {"id": "abcde", "validTo": 4}]'::jsonb),
(2, '[{"id": "abcd", "validTo": 3}, {"id": "abc", "validTo": 6}]'::jsonb),
(3, '[{"id": "abc", "validTo": 5}]'::jsonb)
)
select id, jsonb_array_elements(jsonb_path_query_array(data, '$[*] ? (@.id=="abcd" || @.id=="abcde")'))
from a;
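As a plain-Python sanity check (an illustration, not part of the SQL answer), the filtering the jsonpath expression performs per row amounts to: keep only the array elements whose id is in the wanted set, and emit one output row per surviving element.

```python
# Pure-Python sketch of the jsonpath filter above: keep only elements
# whose "id" is in the wanted set, one output row per surviving element.
rows = [
    (1, [{"id": "abcd", "validTo": 2}, {"id": "abcde", "validTo": 4}]),
    (2, [{"id": "abcd", "validTo": 3}, {"id": "abc", "validTo": 6}]),
    (3, [{"id": "abc", "validTo": 5}]),
]
wanted = {"abcd", "abcde"}

result = [(row_id, elem)
          for row_id, data in rows
          for elem in data
          if elem["id"] in wanted]

for row_id, elem in result:
    print(row_id, elem)
```

Note that row 3 disappears entirely, matching the SQL behavior: jsonb_path_query_array() produces an empty array for it and jsonb_array_elements() then yields no rows.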

You will need to unnest, filter and aggregate back:
select t.id, j.*
from testtable t
join lateral (
select jsonb_agg(e.x) as data
from jsonb_array_elements(t.data) as e(x)
where e.x @> '{"id": "abcd"}'
or e.x @> '{"id": "abcde"}'
) as j on true
With Postgres 12 you could use jsonb_path_query_array() as an alternative, but that would require repeating the conditions:
select t.id,
jsonb_path_query_array(data, '$[*] ? (@.id == "abcd" || @.id == "abcde")')
from testtable t
where t.data @> '[{"id": "abcd"}]'
or t.data @> '[{"id": "abcde"}]'

I didn't quite get your question. Are you asking that the result should only contain the data column, without the id column? Then I think this is the query:
SELECT data FROM testtable WHERE id = 'abcd' OR id = 'abcde';

Related

Postgres JSONB select json objects as columns

The table has an id and step_data columns.
id | step_data
--------------
a1 | {...}
a2 | {...}
a3 | {...}
Where step_data is a nested structure as follows, and the actual keys of the metadata can be in any of the events objects.
{
"events": [
{
"timestamp": "2021-04-07T17:46:13.739Z",
"meta": [
{
"key": "length",
"value": "0.898"
},
{
"key": "height",
"value": "607023104"
},
{
"key": "weight",
"value": "33509376"
}
]
},
{
"timestamp": "2021-04-07T17:46:13.781Z",
"meta": [
{
"key": "color",
"value": "0.007"
},
{
"key": "count",
"value": "641511424"
},
{
"key": "age",
"value": "0"
}
]
}
]
}
I can extract one field like length pretty easily.
select cast(metadata ->> 'value' as double precision) as length,
id
from (
select jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') metadata,
id
from table
) as parsed_keys
where metadata @> '{"key": "length"}'::jsonb
id | length
---+-------
a1 | 0.898
a2 | 0.800
But what I really need is to extract the metadata as columns from a couple of known keys, like length and color. Not sure how to get another column efficiently once I split the array with jsonb_array_elements().
Is there an efficient way to do this without having to call jsonb_array_elements() again and do a join on every single one? For example such that the result set looks like this.
id | length | color | weight
---+--------+-------+---------
a1 | 0.898  | 0.007 | 33509376
a2 | 0.800  | 1.000 | 15812391
Using Postgres 11.7.
With Postgres 11, I can only think of unnesting both levels, then aggregating back into a key/value pair from which you can extract the desired keys:
select t.id,
(x.att ->> 'length')::numeric as length,
(x.att ->> 'color')::numeric as color,
(x.att ->> 'weight')::numeric as weight
from the_table t
cross join lateral (
select jsonb_object_agg(m.item ->> 'key', m.item -> 'value') as att
from jsonb_array_elements(t.step_data -> 'events') as e(event)
cross join jsonb_array_elements(e.event -> 'meta') as m(item)
where m.item ->> 'key' in ('color', 'length', 'weight')
) x
;
With Postgres 12 you could write it a bit simpler:
select t.id,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "length").value') #>> '{}' as length,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "color").value') #>> '{}' as color,
jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "weight").value') #>> '{}' as weight
from the_table t
;
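The logic of jsonb_path_query_first() above can be sketched in plain Python (a sanity check, not Postgres code): walk $.events[*].meta[*] in order and return the first value whose key matches.

```python
# Sketch of jsonb_path_query_first: walk events[*].meta[*] in document
# order and return the first "value" whose "key" matches.
step_data = {
    "events": [
        {"timestamp": "2021-04-07T17:46:13.739Z",
         "meta": [{"key": "length", "value": "0.898"},
                  {"key": "height", "value": "607023104"},
                  {"key": "weight", "value": "33509376"}]},
        {"timestamp": "2021-04-07T17:46:13.781Z",
         "meta": [{"key": "color", "value": "0.007"},
                  {"key": "count", "value": "641511424"},
                  {"key": "age", "value": "0"}]},
    ]
}

def query_first(data, key):
    """Return the first meta value for `key`, or None if absent."""
    for event in data["events"]:
        for item in event["meta"]:
            if item["key"] == key:
                return item["value"]
    return None

print(query_first(step_data, "length"), query_first(step_data, "color"))
```

Like the SQL version, this returns NULL (None) for a key that appears in no event, which is why the columns stay nullable.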
crosstab()
For any Postgres version.
You could feed the result into the crosstab() function to pivot the result. You need the additional module tablefunc installed. If you are unfamiliar, read basic instructions here first:
PostgreSQL Crosstab Query
SELECT *
FROM crosstab(
$$
SELECT id, metadata->>'key', metadata ->>'value'
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
ORDER BY 1
$$
, $$VALUES ('length'), ('color'), ('weight')$$
) AS ct (id text, length float, color float, weight float);
Should deliver best performance, especially for many columns.
Note how we need no explicit cast to double precision (float). crosstab() processes text input anyway, the result is coerced to the types given in the column definition list.
If one of the keys appears multiple times, the last row wins. (No error is raised.) You can add a deterministic sort order to the query in $1 so the preferred row sorts last. Example, to get the lowest value per key:
ORDER BY 1, 2, 3 DESC
Conditional aggregates with FILTER clause
For Postgres 9.4 or newer.
See:
Aggregate columns with additional (distinct) filters
SELECT id
, min(val) FILTER (WHERE key = 'length') AS length
, min(val) FILTER (WHERE key = 'color') AS color
, min(val) FILTER (WHERE key = 'weight') AS weight
FROM (
SELECT id, metadata->>'key' AS key, (metadata ->>'value')::float AS val
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
) sub
GROUP BY id;
In Postgres 12 or later I would compare performance with jsonb_path_query_first(). a_horse provided a solution.
This query is an option:
SELECT id,
MAX(CASE WHEN metadata->>'key' = 'length' THEN metadata->>'value' END) AS length,
MAX(CASE WHEN metadata->>'key' = 'color' THEN metadata->>'value' END) AS color,
MAX(CASE WHEN metadata->>'key' = 'weight' THEN metadata->>'value' END) AS weight
FROM (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') as metadata
FROM table t) AS aux
GROUP BY id;

How to iterate on json data with sql/knexjs query

I'm using a postgresql db. I have a table named 'offers' which has a column 'validity' containing the following data in JSON format:
[{"end_date": "2019-12-31", "program_id": "4", "start_date": "2019-10-27"},
{"end_date":"2020-12-31", "program_id": "6", "start_date": "2020-01-01"},
{"end_date": "2020-01-01", "program_id": "3", "start_date": "2019-10-12"}]
Now I want to get all records where 'validity' column contains:
program_id = 4 and end_date > current_date.
How to write SQL query or knexjs query to achieve this?
Thanks in advance
You can use an EXISTS condition:
select o.*
from offers o
where exists (select *
from jsonb_array_elements(o.validity) as v(item)
where v.item ->> 'program_id' = '4'
and (v.item ->> 'end_date')::date > current_date)
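As a plain-Python sanity check (not part of the SQL), the EXISTS condition keeps an offer when any element of its validity array satisfies both predicates; the cutoff date below is a stand-in for current_date.

```python
from datetime import date

# Sketch of the EXISTS condition: keep an offer when any element of
# its validity array has the wanted program_id and a future end_date.
offers = [
    {"id": 1, "validity": [
        {"end_date": "2019-12-31", "program_id": "4", "start_date": "2019-10-27"},
        {"end_date": "2020-12-31", "program_id": "6", "start_date": "2020-01-01"},
        {"end_date": "2020-01-01", "program_id": "3", "start_date": "2019-10-12"}]},
    {"id": 2, "validity": [
        {"end_date": "2999-12-31", "program_id": "4", "start_date": "2019-10-27"}]},
]

def matches(offer, program_id, today):
    return any(v["program_id"] == program_id
               and date.fromisoformat(v["end_date"]) > today
               for v in offer["validity"])

today = date(2021, 1, 1)  # stand-in for current_date
kept = [o["id"] for o in offers if matches(o, "4", today)]
print(kept)
```

Offer 1 is dropped because its only program_id = "4" entry has already expired relative to the cutoff.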

JSONB subset of array

I have the following JSONB field on my posts table: comments.
comments looks like this:
[
{"id": 1, "text": "My comment"},
{"id": 2, "text": "My other comment"}
]
I would like to select some information about each comment.
SELECT comments->?????? FROM posts WHERE posts.id = 1;
Is there a way for me to select only the id fields of my JSON. Eg. the result of my SQL query should be:
[{"id": 1}, {"id": 2}]
Thanks!
You can use jsonb_to_recordset to split each comment into its own row. Only columns you specify will end up in the row, so you can use this to keep only the id column. Then you can aggregate the comments for one post into an array using json_agg:
select json_agg(c)
from posts p
cross join lateral
jsonb_to_recordset(comments) c(id int) -- Only keep id
where p.id = 1
This results in:
[{"id":1},{"id":2}]
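In plain Python terms (a sketch, not part of the SQL), the projection this performs is simply:

```python
comments = [
    {"id": 1, "text": "My comment"},
    {"id": 2, "text": "My other comment"},
]

# Keep only the "id" key of each comment, which is what
# jsonb_to_recordset with a column list of just `id int` does.
ids_only = [{"id": c["id"]} for c in comments]
print(ids_only)
```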
You may use json_build_object + json_agg
select json_agg(json_build_object('id',j->>'id'))
from posts cross join jsonb_array_elements(comments) as j
where posts.id = 1;

Extracting data from an array of JSON objects for specific object values

In my table, there is a column of JSON type which contains an array of objects describing time offsets:
[
{
"type": "start",
"time": 1.234
},
{
"type": "end",
"time": 50.403
}
]
I know that I can extract these with JSON_EACH() and JSON_EXTRACT():
CREATE TEMPORARY TABLE Items(
id INTEGER PRIMARY KEY,
timings JSON
);
INSERT INTO Items(timings) VALUES
('[{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]'),
('[{"type": "start", "time": 24.56}, {"type": "end", "time": 78.901}]');
SELECT
JSON_EXTRACT(Timings.value, '$.type'),
JSON_EXTRACT(Timings.value, '$.time')
FROM
Items,
JSON_EACH(timings) AS Timings;
This returns a table like:
start 12.345
end 67.891
start 24.56
end 78.901
What I really need though is to:
Find the timings of specific types. (Find the first object in the array that matches a condition.)
Take this data and select it as a column with the rest of the table.
In other words, I'm looking for a table that looks like this:
id start end
-----------------------------
0 12.345 67.891
1 24.56 78.901
I'm hoping for some sort of query like this:
SELECT
id,
JSON_EXTRACT(timings, '$.[type="start"].time'),
JSON_EXTRACT(timings, '$.[type="end"].time')
FROM Items;
Is there some way to use path in the JSON functions to select what I need? Or, some other way to pivot what I have in the first example to apply to the table?
One possibility:
WITH cte(id, json) AS
(SELECT Items.id
, json_group_object(json_extract(j.value, '$.type'), json_extract(j.value, '$.time'))
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
GROUP BY Items.id)
SELECT id
, json_extract(json, '$.start') AS start
, json_extract(json, '$.end') AS "end"
FROM cte
ORDER BY id;
which gives
id start end
---------- ---------- ----------
1 12.345 67.891
2 24.56 78.901
Another one, that uses the window functions added in sqlite 3.25 and avoids creating intermediate JSON objects:
SELECT DISTINCT Items.id
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'start') OVER ids AS start
, max(json_extract(j.value, '$.time'))
FILTER (WHERE json_extract(j.value, '$.type') = 'end') OVER ids AS "end"
FROM Items
JOIN json_each(timings) AS j ON json_extract(j.value, '$.type') IN ('start', 'end')
WINDOW ids AS (PARTITION BY Items.id)
ORDER BY Items.id;
The key is using the ON clause of the JOIN to limit results to just the two objects in each array that you care about, and then merging those up to two rows for each Items.id into one with a couple of different approaches.
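The json_group_object step in the first query can be sketched in plain Python (an illustration, not SQLite code): per id, fold the matching array elements into a type → time mapping, then read start and end back out as columns.

```python
# Sketch of the CTE approach: per row, fold the "start"/"end" elements
# into a {type: time} mapping (what json_group_object builds), then
# read the two keys back out as columns.
items = [
    (1, [{"type": "start", "time": 12.345}, {"type": "end", "time": 67.891}]),
    (2, [{"type": "start", "time": 24.56},  {"type": "end", "time": 78.901}]),
]

table = []
for row_id, timings in items:
    grouped = {t["type"]: t["time"]
               for t in timings if t["type"] in ("start", "end")}
    table.append((row_id, grouped.get("start"), grouped.get("end")))
print(table)
```

As with the dict comprehension here, a duplicate "start" or "end" object in one array would silently overwrite the earlier one rather than raise an error.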

Structure of table in BigQuery

I want to create table of below JSON.
{
"store_nbr": "1234",
"sls_dt": "2014-01-01 00:00:00",
"Items": [{
"sku": "3456",
"sls_amt": "9.99",
"discounts": [{
"disc_nbr": "1",
"disc_amt": "0.99"
}, {
"disc_nbr": "2",
"disc_amt": "1.00"
}]
}]
}
Can anyone help me what would be the structure of this JSON on BigQuery ? and How I can retrieve data using SQL query ?
I am wondering what would be the structure of my table?
Try below for BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT
1234 AS store_nbr,
TIMESTAMP '2014-01-01 00:00:00' AS sls_dt,
[STRUCT(
3456 AS sku,
9.99 AS sls_amt,
[STRUCT<disc_nbr INT64, disc_amt FLOAT64>
(1, 0.99),
(2, 1.00)
] AS discounts
)] AS items
)
SELECT *
FROM yourTable
The table structure here (as you would also see it in the Web UI schema view): items is a REPEATED RECORD containing sku and sls_amt, plus a nested REPEATED RECORD discounts with fields disc_nbr and disc_amt.
How can I read values from it?
That really depends on what exactly, and how, you want to read out of this data!
For example, if you want to calculate the total discount for each sale, it could look like this:
#standardSQL
WITH yourTable AS (
SELECT
1234 AS store_nbr,
TIMESTAMP '2014-01-01 00:00:00' AS sls_dt,
[STRUCT(
3456 AS sku, 9.99 AS sls_amt, [STRUCT<disc_nbr INT64, disc_amt FLOAT64>(1, 0.99), (2, 1.00)] AS discounts
)] AS items
)
SELECT
t.*,
(SELECT SUM(disc.disc_amt) FROM UNNEST(item.discounts) AS disc) AS total_discount
FROM yourTable AS t, UNNEST(items) AS item
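In plain Python, the per-item aggregation that the correlated subquery performs is the following (a sketch, using the sample row above):

```python
# Sketch of the correlated subquery: for each unnested item,
# total_discount is the sum of its discounts' disc_amt values.
row = {
    "store_nbr": 1234,
    "sls_dt": "2014-01-01 00:00:00",
    "items": [{
        "sku": 3456,
        "sls_amt": 9.99,
        "discounts": [{"disc_nbr": 1, "disc_amt": 0.99},
                      {"disc_nbr": 2, "disc_amt": 1.00}],
    }],
}

totals = [sum(d["disc_amt"] for d in item["discounts"])
          for item in row["items"]]
print(totals)
```

Just as UNNEST(items) produces one output row per item, the list comprehension produces one total per item.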
I recommend you first complete your "exercise" with table creation and actually get data into it, so that you can then ask specific questions about the query you want to build.
But that should be a new post, so you don't mix everything together in an all-in-one question; that type of question is usually not welcomed here on SO.