If I have a table with a single jsonb column and the table has data like this:
[{"body": {"project-id": "111"}},
{"body": {"my-org.project-id": "222"}},
{"body": {"other-org.project-id": "333"}}]
Basically it stores project-id differently for different rows.
Now I need a query where the values under data->'body' from different rows coalesce into a single field 'project-id'. How can I do that?
e.g.: if I do something like this:
select data->'body'->'project-id' projectid from mytable
it will return something like:
| projectid |
| :-------- |
| 111 |
But I also want the project-ids from the other rows, without additional columns in the results, i.e. I want this:
| projectid |
| :-------- |
| 111 |
| 222 |
| 333 |
I understand that each of your rows contains a JSON object with a nested object whose key varies from row to row, and whose value you want to extract.
Assuming the 'body' always has a single key, you could do:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
The lateral join on jsonb_object_keys() extracts all keys in the object as rows. Then we use jsonb_extract_path_text() to get the corresponding value.
Demo on DB Fiddle:
with t as (
select '{"body": {"project-id": "111"}}'::jsonb js
union all select '{"body": {"my-org.project-id": "222"}}'::jsonb
union all select '{"body": {"other-org.project-id": "333"}}'::jsonb
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
| projectid |
| :--------- |
| 111 |
| 222 |
| 333 |
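If 'body' can ever hold more than one key, a variation of the same query keeps only the project-id style keys. This is a sketch; the suffix test assumes your keys always end in project-id, as in the sample:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
-- keep only keys that are, or end in, project-id (assumption about naming)
where x.k = 'project-id' or x.k like '%.project-id'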
I have a table of data jsonb documents in Postgres and a second table containing templates for the data.
I need to match each data jsonb row with its template jsonb row simply by the order of the elements in the arrays, in an efficient way.
template jsonb document:
{
"template":1,
"rows":[
"first row",
"second row",
"third row"
]
}
data jsonb document:
{
"template":1,
"data":[
125,
578,
445
]
}
desired output:
| Desc | Amount |
| ---------- | ------ |
| first row | 125 |
| second row | 578 |
| third row | 445 |
template table:
| id | jsonb |
| -------- | ------------------------------------------------------ |
| 1 | {"template":1,"rows":["first row","second row","third row"]} |
| 2 | {"template":2,"rows":["first row","second row","third row"]} |
| 3 | {"template":3,"rows":["first row","second row","third row"]} |
data table:
| id | jsonb |
| -------- | ------------------------------------------- |
| 1 | {"template":1,"data":[125,578,445]} |
| 2 | {"template":1,"data":[125,578,445]} |
| 3 | {"template":2,"data":[125,578,445]} |
I have millions of data jsonb documents and hundreds of templates.
I could do it just by converting both to tables and then using the row_number() window function, but that does not seem like a very efficient approach to me.
Is there better way of doing this?
You will have to normalize this mess "on-the-fly" to get the output you want.
You need to unnest each array using jsonb_array_elements_text() with the with ordinality option to get the array index. You can then join the two tables by extracting the value of the template key:
Assuming you want to return this for a specific row from the data table:
select td.val as "Desc", dt.val as "Amount"
from data
cross join jsonb_array_elements_text(data.jsonb_column -> 'data') with ordinality as dt(val, idx)
left join template tpl
on tpl.jsonb_column ->> 'template' = data.jsonb_column ->> 'template'
left join jsonb_array_elements_text(tpl.jsonb_column -> 'rows') with ordinality as td(val, idx)
on td.idx = dt.idx
where data.id = 1;
Online example
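If you run this over millions of data rows, expression indexes on the extracted template key should help Postgres with the join. A sketch, assuming the jsonb columns are really named jsonb_column as above (index names are illustrative):
-- index the extracted template key on both sides of the join
create index data_template_idx on data ((jsonb_column ->> 'template'));
create index template_template_idx on template ((jsonb_column ->> 'template'));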
Suppose that I have a table named agents_timesheet with a structure like this:
ID | name | health_check_record                                                                                  | date       | clock_in | clock_out
---+------+------------------------------------------------------------------------------------------------------+------------+----------+----------
1  | AAA  | {"mental":{"stress":"no","depression":"no"},"physical":{"other_symptoms":"headache","flu":"no"}}    | 6-Dec-2021 | 08:25:07 |
2  | BBB  | {"mental":{"stress":"no","depression":"no"},"physical":{"other_symptoms":"no","flu":"yes"}}         | 6-Dec-2021 | 08:26:12 |
3  | CCC  | {"mental":{"stress":"no","depression":"severe"},"physical":{"other_symptoms":"cancer","flu":"yes"}} | 6-Dec-2021 | 08:27:12 |
Now I need to get all agents who had flu on that day. As for getting the flu value from a single JSON document in Oracle SQL, I can already do it with this statement:
SELECT * FROM JSON_TABLE(
'{"mental":{"stress":"no", "depression":"no"}, "physical":{"fever":"no", "flu":"yes"}}', '$'
COLUMNS (flu VARCHAR2(3) PATH '$.physical.flu')
);
As for reading the raw column health_check_record, I can already do that with a plain SELECT.
But how do I get the values of flu from the JSON in the health_check_record column of that table?
Additional question
Based on the table, how can I retrieve full list of other_symptoms, then it will get me this kind of output:
ID | name | other_symptoms
-------------------------------
1 | AAA | headache
2 | BBB | no
3 | CCC | cancer
You can use the JSON_EXISTS() function.
SELECT *
FROM agents_timesheet
WHERE JSON_EXISTS(health_check_record, '$.physical.flu == "yes"');
There is also "plain old way" without JSON parsing only treting column like a standard VARCHAR one. This way will not work in 100% of cases, but if you have the data in the same way like you described it might be sufficient.
SELECT *
FROM agents_timesheet
WHERE health_check_record LIKE '%"flu":"yes"%';
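A slightly more tolerant variant of the same string hack uses REGEXP_LIKE() so that optional whitespace around the colon does not break the match; it is still not real JSON parsing:
SELECT *
FROM agents_timesheet
-- match "flu":"yes" with optional whitespace around the colon
WHERE REGEXP_LIKE(health_check_record, '"flu"\s*:\s*"yes"');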
How do I get the values of flu from the JSON in the health_check_record of that table?
From Oracle 12, to get the values you can use JSON_TABLE with a correlated CROSS JOIN to the table:
SELECT a.id,
a.name,
j.*,
a."DATE",
a.clock_in,
a.clock_out
FROM agents_timesheet a
CROSS JOIN JSON_TABLE(
a.health_check_record,
'$'
COLUMNS (
mental_stress VARCHAR2(3) PATH '$.mental.stress',
mental_depression VARCHAR2(3) PATH '$.mental.depression',
physical_fever VARCHAR2(3) PATH '$.physical.fever',
physical_flu VARCHAR2(3) PATH '$.physical.flu'
)
) j
WHERE physical_flu = 'yes';
db<>fiddle here
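For the additional question you do not even need JSON_TABLE; a minimal sketch with JSON_VALUE() (same table assumed) returns one scalar per row:
SELECT id,
       name,
       -- JSON_VALUE returns a single scalar per row (NULL if the path is absent)
       JSON_VALUE(health_check_record, '$.physical.other_symptoms') AS other_symptoms
FROM   agents_timesheet;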
You can use "dot notation" to access data from a JSON column. Like this:
select "DATE", id, name
from agents_timesheet t
where t.health_check_record.physical.flu = 'yes'
;
DATE ID NAME
----------- --- ----
06-DEC-2021 2 BBB
Note that this approach requires that you use an alias for the table name (so you can use it in accessing the JSON data).
For testing I used the data posted by MT0 on dbfiddle. I am not a big fan of double-quoted column names; use something else for "DATE", such as dt or date_.
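Also note that this dot notation only works when Oracle knows the column contains JSON, i.e. the column carries an IS JSON check constraint. A sketch, with an illustrative constraint name:
-- tell Oracle the column holds JSON so dot notation can be used
ALTER TABLE agents_timesheet ADD CONSTRAINT health_check_record_is_json
  CHECK (health_check_record IS JSON);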
I have a SELECT query that works perfectly fine and it returns a single row with multiple named columns:
| registered | downloaded | subscribed | requested_invoice | paid |
|------------|------------|------------|-------------------|------|
| 9000 | 7000 | 5000 | 4000 | 3000 |
But I need to transpose this result to a new table that looks like this:
| type | value |
|-------------------|-------|
| registered | 9000 |
| downloaded | 7000 |
| subscribed | 5000 |
| requested_invoice | 4000 |
| paid | 3000 |
I have the additional module tablefunc enabled at PostgreSQL but I can't get the crosstab() function to work for this. What can I do?
You need the reverse operation of what crosstab() does. Some call it "unpivot". A LATERAL join to a VALUES expression should be the most elegant way:
SELECT l.*
FROM tbl -- or replace the table with your subquery
CROSS JOIN LATERAL (
VALUES
('registered' , registered)
, ('downloaded' , downloaded)
, ('subscribed' , subscribed)
, ('requested_invoice', requested_invoice)
, ('paid' , paid)
) l(type, value)
WHERE id = 1; -- or whatever
You may need to cast some or all columns to arrive at a common data type. Like:
...
VALUES
('registered' , registered::text)
, ('downloaded' , downloaded::text)
, ...
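As an aside, if you would rather not list the columns at all, a sketch using to_jsonb() and jsonb_each_text() unpivots every column of the row; it assumes an id column you want to exclude, and every value comes back as text:
SELECT j.key AS type, j.value
FROM   tbl
-- turn the whole row into jsonb, drop the id key, and unpivot the rest
CROSS  JOIN LATERAL jsonb_each_text(to_jsonb(tbl) - 'id') AS j(key, value)
WHERE  tbl.id = 1;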
Related:
Postgres: convert single row to multiple rows (unpivot)
For the reverse operation - "pivot" or "cross-tabulation":
PostgreSQL Crosstab Query
Suppose I have this table:
Content
+----+-------+
| id | title |
+----+-------+
| 1  | lorem |
+----+-------+
And this one:
Fields
+----+------------+----------+-------+
| id | id_content | name     | value |
+----+------------+----------+-------+
| 1  | 1          | subtitle | ipsum |
| 2  | 1          | tags     | tag1  |
| 3  | 1          | tags     | tag2  |
| 4  | 1          | tags     | tag3  |
+----+------------+----------+-------+
The thing is: I want to query the content, transforming all the rows from "Fields" into columns, to get something like:
+----+-------+----------+------------------+
| id | title | subtitle | tags             |
+----+-------+----------+------------------+
| 1  | lorem | ipsum    | [tag1,tag2,tag3] |
+----+-------+----------+------------------+
Also, subtitle and tags are just examples. I can have as many fields as I want, whether they hold arrays or not.
But I haven't found a way to collect the repeated "name" values into an array, let alone without turning "subtitle" into an array as well. If that's not possible, "subtitle" could also become an array and I could fix it later in code, but I need at least to group everything somehow. Any ideas?
You can use array_agg, e.g.
SELECT id_content, array_agg(value)
FROM fields
WHERE name = 'tags'
GROUP BY id_content
If you need the subtitle too, use a self-join. I use a subselect to cope with contents that don't have any tags, so we don't return arrays filled with NULLs, i.e. {NULL}.
SELECT f1.id_content, f1.value AS subtitle, f2.value AS tags
FROM fields f1
LEFT JOIN (
SELECT id_content, array_agg(value) AS value
FROM fields
WHERE name = 'tags'
GROUP BY id_content
) f2 ON (f1.id_content = f2.id_content)
WHERE f1.name = 'subtitle';
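If you also want the title from Content, a sketch joining the same aggregate back to the content table (table and column names as in the question):
SELECT c.id, c.title, sub.value AS subtitle, tag.tags
FROM content c
-- one row per content holding the subtitle, if any
LEFT JOIN fields sub ON sub.id_content = c.id AND sub.name = 'subtitle'
-- all tags of a content collapsed into one array
LEFT JOIN (
    SELECT id_content, array_agg(value) AS tags
    FROM fields
    WHERE name = 'tags'
    GROUP BY id_content
) tag ON tag.id_content = c.id;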
See http://www.postgresql.org/docs/9.3/static/functions-aggregate.html for details.
If you have access to the tablefunc module, another option is to use crosstab as pointed out by Houari. You can make it return arrays and non-arrays with something like this:
SELECT id_content, unnest(subtitle), tags
FROM crosstab('
SELECT id_content, name, array_agg(value)
FROM fields
GROUP BY id_content, name
ORDER BY 1, 2
') AS ct(id_content integer, subtitle text[], tags text[]);
However, crosstab requires that the values always appear in the same order. For instance, if the first group (with the same id_content) doesn't have a subtitle and only has tags, the tags will be unnested and will appear in the same column with the subtitles.
See also http://www.postgresql.org/docs/9.3/static/tablefunc.html
If the subtitle value is the only "constant" that you want to separate, you can do:
SELECT * FROM crosstab
(
'SELECT content.id,name,array_to_string(array_agg(value),'','')::character varying FROM content inner join
(
select * from fields where fields.name = ''subtitle''
union all
select * from fields where fields.name <> ''subtitle''
) fields_ordered
on fields_ordered.id_content = content.id group by content.id,name'
)
AS
(
id integer,
content_name character varying,
tags character varying
);
Consider this schema:
key: REQUIRED INTEGER
description: NULLABLE STRING
field: REPEATED RECORD {
field.names: REQUIRED STRING
field.value: NULLABLE FLOAT
}
Here key is unique per table, and field.names is actually a comma-separated list of properties ("property1","property2","property3", ...).
Sample dataset (don't pay attention to the actual values, they are only for demonstration of the structure):
{"key":1,"description":"Cool","field":[{"names":"\"Nice\",\"Wonderful\",\"Woohoo\"", "value":1.2},{"names":"\"Everything\",\"is\",\"Awesome\"", "value":20}]}
{"key":2,"description":"Stack","field":[{"names":"\"Overflow\",\"Exchange\",\"Nice\"", "value":2.0}]}
{"key":3,"description":"Iron","field":[{"names":"\"The\",\"Trooper\"", "value":666},{"names":"\"Aces\",\"High\",\"Awesome\"", "value":333}]}
What I need is a way to query for the values of multiple field.names at once. The output should be like this:
+-----+--------+-------+-------+-------+-------+
| key | desc | prop1 | prop2 | prop3 | prop4 |
+-----+--------+-------+-------+-------+-------+
| 1 | Desc 1 | 1.0 | 2.0 | 3.0 | 4.0 |
| 2 | Desc 2 | 4.0 | 3.0 | 2.0 | 1.0 |
| ... | | | | | |
+-----+--------+-------+-------+-------+-------+
If the same key contains fields with the same queried name, only the first value should be considered.
And here is my query so far:
select all.key as key, all.description as desc,
t1.col as prop1, t2.col as prop2, t3.col as prop3 //and so on...
from mydataset.mytable all
left join each
(select key, field.value as col from
mydataset.mytable
where lower(field.names) contains '"trooper"'
group each by key, col
) as t1 on all.key = t1.key
left join each
(select key, field.value as col from
mydataset.mytable
where lower(field.names) contains '"awesome"'
group each by key, col
) as t2 on all.key = t2.key
left join each
(select key, field.value as col from
mydataset.mytable
where lower(field.names) contains '"nice"'
group each by key, col
) as t3 on all.key = t3.key
//and so on...
The output of this query would be:
+-----+-------+-------+-------+-------+
| key | desc | prop1 | prop2 | prop3 |
+-----+-------+-------+-------+-------+
| 1 | Cool | null | 20.0 | 1.2 |
| 2 | Stack | null | null | 2.0 |
| 3 | Iron | 666.0 | 333.0 | null |
+-----+-------+-------+-------+-------+
So my question is: is this the way to go? If my user wants, let's say, 200 properties from my table, should I just make 200 self-joins? Is that scalable, considering the table can grow to billions of rows? Is there another way to do the same thing in BigQuery?
Thanks.
Generally speaking, a query with more than 50 joins can start to become problematic, particularly if you're joining large tables. Even with repeated fields, you want to try to scan your tables in one pass wherever possible.
It's useful to note that when you query a table with a repeated field, you are really querying a semi-flattened representation of that table. You can pretend that each repetition is its own row, and apply filters, expressions, and grouping accordingly.
In this case, I think you can probably get away with a single scan:
select
key,
description as desc,
max(if(lower(field.names) contains "trooper", field.value, null))
within record as prop1,
max(if(lower(field.names) contains "awesome", field.value, null))
within record as prop2,
...
from mydataset.mytable
In this case, each "prop" field just selects the value corresponding to each desired field name, or null if it doesn't exist, and then aggregates those results using the "max" function. I'm assuming that there's only one occurrence of a field name per key, in which case the specific aggregation function doesn't matter much, since it only exists to collapse nulls. But obviously you should swap it for something more appropriate if needed.
The "within record" syntax tells BigQuery to perform those aggregations only over the repeated fields within a record, and not across the entire table, thus eliminating the need for a "group by" clause at the end.