How to sum a value in a JSONB array in PostgreSQL?

Given the following data in the jsonb column p06 in the table ryzom_characters:
-[ RECORD 1 ]----------------------------------------
p06 | {
"id": 675010,
"cname": "Bob",
"rpjobs": [
{
"progress": 25
},
{
"progress": 13
},
{
"progress": 30
}
]
}
I am attempting to sum the value of progress. I have attempted the following:
SELECT
c.cname AS cname,
jsonb_array_elements(c.p06->'rpjobs')::jsonb->'progress' AS value
FROM ryzom_characters c
Where cid = 675010
ORDER BY value DESC
LIMIT 50;
Which correctly lists the values:
cname | value
--------+-------
Savisi | 30
Savisi | 25
Savisi | 13
(3 rows)
But now I would like to sum these values, which could be null.
How do I correctly sum an object field within an array?
Here is the table structure:
Table "public.ryzom_characters"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+---------
cid | bigint | | |
cname | character varying(255) | | not null |
p06 | jsonb | | |
x01 | jsonb | | |

Use the function jsonb_array_elements() in a lateral join in the from clause:
select cname, sum(coalesce(value, '0')::int) as value
from (
select
p06->>'cname' as cname,
value->>'progress' as value
from ryzom_characters
cross join jsonb_array_elements(p06->'rpjobs')
where cid = 675010
) s
group by cname
order by value desc
limit 50;
You can use left join instead of cross join to protect the query against inconsistent data:
left join jsonb_array_elements(p06->'rpjobs')
on jsonb_typeof(p06->'rpjobs') = 'array'
where p06->'rpjobs' <> 'null'

The function jsonb_array_elements() is a set-returning function. You should therefore use it as a row source (in the FROM clause). After the call you have a table where every row contains an array element. From there on it is relatively easy.
SELECT cname,
sum(coalesce((r.prog->>'progress')::int, 0)) AS value
FROM ryzom_characters c,
jsonb_array_elements(c.p06->'rpjobs') r (prog)
WHERE c.cid = 675010
GROUP BY cname
ORDER BY value DESC
LIMIT 50;
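As a cross-check outside the database, here is a minimal Python sketch of what the queries above compute. It uses the sample document from the question plus one hypothetical entry with a null progress (an assumption added only to exercise the coalesce-style fallback), and the helper name sum_progress is illustrative.

```python
import json

# Sample document from the question, plus one hypothetical entry with a
# null progress (not in the original data) to exercise the fallback to 0.
doc = json.loads("""
{
  "id": 675010,
  "cname": "Bob",
  "rpjobs": [
    {"progress": 25},
    {"progress": 13},
    {"progress": 30},
    {"progress": null}
  ]
}
""")


def sum_progress(document):
    """Sum rpjobs[*].progress, treating missing or null values as 0."""
    jobs = document.get("rpjobs")
    # Same guard as jsonb_typeof(p06->'rpjobs') = 'array' in the left join variant.
    if not isinstance(jobs, list):
        return 0
    # "or 0" plays the role of coalesce(..., 0) for null/missing progress.
    return sum((job.get("progress") or 0) for job in jobs)


total = sum_progress(doc)
print(doc["cname"], total)
```

The isinstance check corresponds to the defensive left join: a row whose rpjobs is null or not an array contributes 0 instead of raising an error.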

Related

Match two jsonb documents by order of elements in array

I have table of data jsonb documents in postgres and second table containing templates for data.
I need to match data jsonb row with template jsonb row just by order of elements in array in effective way.
template jsonb document:
{
"template":1,
"rows":[
"first row",
"second row",
"third row"
]
}
data jsonb document:
{
"template":1,
"data":[
125,
578,
445
]
}
desired output:
| Desc       | Amount |
| ---------- | ------ |
| first row  | 125    |
| second row | 578    |
| third row  | 445    |
template table:
| id | jsonb |
| -------- | ------------------------------------------------------ |
| 1 | {"template":1,"rows":["first row","second row","third row"]} |
| 2 | {"template":2,"rows":["first row","second row","third row"]} |
| 3 | {"template":3,"rows":["first row","second row","third row"]} |
data table:
| id | jsonb |
| -------- | ------------------------------------------- |
| 1 | {"template":1,"data":[125,578,445]} |
| 2 | {"template":1,"data":[125,578,445]} |
| 3 | {"template":2,"data":[125,578,445]} |
I have millions of data jsonb documents and hundreds of templates.
I would do it just by converting both to tables, then use row_number windowed function but it does not seem very effective way to me.
Is there better way of doing this?
You will have to normalize this mess "on-the-fly" to get the output you want.
You need to unnest each array using jsonb_array_elements_text() with the with ordinality option, which returns each element together with its array index. You can join the two tables by extracting the value of the template key, and then join the two unnested sets on the index.
Assuming you want to return this for a specific row from the data table:
select td.val, dt.val
from data
cross join jsonb_array_elements_text(data.jsonb_column -> 'data') with ordinality as dt(val, idx)
left join template tpl
on tpl.jsonb_column ->> 'template' = data.jsonb_column ->> 'template'
left join jsonb_array_elements_text(tpl.jsonb_column -> 'rows') with ordinality as td(val, idx)
on td.idx = dt.idx
where data.id = 1;
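To make the positional join concrete, here is a rough Python sketch that pairs the two arrays by index the way the with ordinality join does. It uses the sample documents from the question; the function name pair_by_position is hypothetical.

```python
import json

template = json.loads('{"template": 1, "rows": ["first row", "second row", "third row"]}')
data = json.loads('{"template": 1, "data": [125, 578, 445]}')


def pair_by_position(template_doc, data_doc):
    """Pair rows[i] with data[i], like joining on the WITH ORDINALITY index."""
    # If the template ids differ, no template rows are available,
    # which mirrors the left join producing NULL descriptions.
    if template_doc.get("template") == data_doc.get("template"):
        rows = template_doc.get("rows", [])
    else:
        rows = []
    pairs = []
    for idx, amount in enumerate(data_doc["data"]):
        # A data element without a matching template row keeps None.
        desc = rows[idx] if idx < len(rows) else None
        pairs.append((desc, amount))
    return pairs


pairs = pair_by_position(template, data)
for desc, amount in pairs:
    print(desc, amount)
```

The data array drives the loop, just as the data table drives the query, so every amount appears even when a description is missing.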

Postgres jsonb. Heterogenous json fields

If I have a table with a single jsonb column and the table has data like this:
[{"body": {"project-id": "111"}},
{"body": {"my-org.project-id": "222"}},
{"body": {"other-org.project-id": "333"}}]
Basically it stores project-id differently for different rows.
Now I need a query where the values under data->'body' from different rows are coalesced into a single field 'project-id'. How can I do that?
e.g.: if I do something like this:
select data->'body'->'project-id' projectid from mytable
it will return something like:
| projectid |
| 111 |
But I also want project-id's in other rows too, but I don't want additional columns in the results. i.e, I want this:
| projectid |
| 111 |
| 222 |
| 333 |
I understand that each of your rows contains a json object, with a nested object whose key varies over rows, and whose value you want to acquire.
Assuming the 'body' always has a single key, you could do:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
The lateral join on jsonb_object_keys() extracts all keys in the object as rows. Then we use jsonb_extract_path_text() to get the corresponding value.
Demo on DB Fiddle:
with t as (
select '{"body": {"project-id": "111"}}'::jsonb js
union all select '{"body": {"my-org.project-id": "222"}}'::jsonb
union all select '{"body": {"other-org.project-id": "333"}}'::jsonb
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
| projectid |
| :--------- |
| 111 |
| 222 |
| 333 |

Unnest json string array

I'm using psql and I have a table that looks like this:
id | dashboard_settings
-----------------------
1 | {"query": {"year_end": 2018, "year_start": 2015, "category": ["123"]}}
There are numerous rows, but for every row the "category" value is an array with one integer (in string format).
Is there a way I can 'unpackage' the category object? So that it just has 123 as an integer?
I've tried this but had no success:
SELECT jsonb_extract_path_text(dashboard_settings->'query', 'category') from table
This returns:
jsonb_extract_path_text | ["123"]
when I want:
jsonb_extract_path_text | 123
You need to use the array access operator, which is simply ->> followed by the array index:
select jsonb_extract_path(dashboard_settings->'query', 'category') ->> 0
from the_table
alternatively:
select dashboard_settings -> 'query' -> 'category' ->> 0
from the_table
Consider:
select dashboard_settings->'query'->'category'->>0 c from mytable
Demo on DB Fiddle:
| c |
| :-- |
| 123 |
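Note that ->> returns text, so the result is still the string 123. In PostgreSQL you would wrap the expression in parentheses and append a ::int cast to get an integer. The same two steps (take the first array element, then convert it) look like this in a small Python sketch using the row from the question:

```python
import json

settings = json.loads(
    '{"query": {"year_end": 2018, "year_start": 2015, "category": ["123"]}}'
)

# Equivalent of dashboard_settings -> 'query' -> 'category' ->> 0:
# walk the path and take the first array element as text.
category_text = settings["query"]["category"][0]

# Equivalent of the ::int cast applied to the extracted text.
category = int(category_text)

print(repr(category_text), category)
```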

Greatest N Per Group with JOIN and multiple order columns

I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.
Answering your direct question "how to avoid...":
You get this error when you specify a column in the SELECT list of a statement that isn't present in the GROUP BY clause and isn't part of an aggregate function like MAX, MIN, or AVG.
With your data, I cannot say:
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
This query doesn't say what to do with SITE: it's either a key of the group (in which case I'll get every unique combination of ID and site, with the min time in each) or it should be aggregated (e.g. max site per ID).
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
I cannot simply leave it unspecified: what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that.) The programmer of the database cannot make this decision for you; you must make it.
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)
Looking at your actual problem:
The ID value must exist in two tables.
The Type value must be largest group by id.
The Time value must be smallest in the largest type group.
Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:
You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
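The last query returns only the id and the minimum time. One way to also pick up SITE from that same row, following the same "aggregate, then join back" idea, is to join the result to the table once more. The sketch below runs this against the sample data from the question using Python's built-in sqlite3 module, which stands in for the real database here (an assumption made for runnability); the lowercase table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER)")
conn.executemany(
    "INSERT INTO table0 VALUES (?, ?, ?, ?)",
    [
        ("aa", 1, "12-18", 100), ("aa", 1, "12-10", 101),
        ("bb", 2, "12-10", 102), ("cc", 1, "12-09", 100),
        ("cc", 2, "12-12", 103), ("cc", 2, "12-01", 109),
        ("cc", 1, "12-07", 101), ("dd", 1, "12-08", 100),
    ],
)
conn.execute("CREATE TABLE table1 (id TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [("aa",), ("cc",), ("cc",), ("dd",), ("dd",)],
)

query = """
SELECT t.id, t.type, t.time, t.site
FROM (
    -- Step 2: min time within the max type of each id
    SELECT m.id, m.type, MIN(m.time) AS mintime
    FROM (
        -- Step 1: max type per id
        SELECT id, MAX(type) AS maxtype
        FROM table0
        GROUP BY id
    ) findmax
    INNER JOIN table0 m ON m.id = findmax.id AND m.type = findmax.maxtype
    GROUP BY m.id, m.type
) findmin
-- Step 3: join back to fetch the full row, including site
INNER JOIN table0 t
    ON t.id = findmin.id AND t.type = findmin.type AND t.time = findmin.mintime
-- The id must also exist in the second table
WHERE t.id IN (SELECT id FROM table1)
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows share the same id, type, and minimum time, this returns both of them; a tie-breaker would be needed to keep exactly one row per id.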
demo:db<>fiddle
SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables on their ids. The result of an inner join contains only rows whose id exists in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group of ids ordered by time". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY), so you get groups of ids which can be ordered separately. first_value() gives the first value of each of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.
I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;

SQL/PostgreSQL: How to select limited amount of rows of different types based on limits stored in a different table?

I have a table (table 1) where the first column is the key and the second column contains elements of different types. In table 1, there's three types (type A, B, C) but the actual database have many more types.
Table.1. A minimal example.
_________________
| | |
|_KEY| attribute |
|____|___________|
|k1 | A |
|k2 | A |
|k3 | B |
|k4 | C |
|k5 | C |
|____|___________|
From table 1; I am interested in retrieving only a limited amount of elements from each type. The limited amount of elements of a given type is provided by table 2, in which the elements type is the key of the table (_element).
To clarify; The limited amount of elements of type A to obtain from table 1. in this minimal example is 1. Likewise, for type B it is 2 and for type C it is 1.
Table 2. Limits of item to obtain for each type in table 1.
____________________
| _Element | Limit |
|----------|-------|
| A | 1 |
| B | 2 |
| C | 1 |
|__________|_______|
Finally, the elements should be retrieved from table 1 from top to bottom.
Thanks for any help and/or pointers / gus.
P.S.
For the above minimal example, the expected output would be
___________________
| Key| Attribute |
|____|____________|
| k1 | A |
| k3 | B |
| K4 | C |
|____|____________|
Since there only exists 1 C attribute for this particular minimal example. Note that if there had been, say, 5 elements of type C, then the following table would have been obtained instead (since the limit for C elements is 2):
___________________
| Key| Attribute |
|____|____________|
| k1 | A |
| k3 | B |
| K4 | C |
|_k5 | C |
|____|____________|
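You can always do it with a union. Note that TOP is SQL Server syntax; in PostgreSQL use LIMIT, which accepts a scalar subquery, and wrap each limited branch in parentheses:
(select * from table1 where attribute = 'A' order by _key
limit (select t2.limit from table2 t2 where t2._element = 'A'))
UNION ALL
(select * from table1 where attribute = 'B' order by _key
limit (select t2.limit from table2 t2 where t2._element = 'B'))
UNION ALL
(select * from table1 where attribute = 'C' order by _key
limit (select t2.limit from table2 t2 where t2._element = 'C'))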
Or using row_number, keeping only the rows whose number within their attribute does not exceed the limit:
with cte as (SELECT _Key,
attribute,
ROW_NUMBER() OVER (Partition by attribute Order by _Key ASC) as rowno
From Table1)
SELECT cte._Key, cte.attribute FROM cte
JOIN Table2 on Table2._Element = cte.attribute
WHERE cte.rowno <= Table2.limit
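Here is a small runnable sketch of the row_number approach against the minimal example from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption for runnability; window functions need SQLite 3.25 or newer), and it keeps a row when its number within its attribute is at most that attribute's limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (_key TEXT PRIMARY KEY, attribute TEXT)")
# "limit" is a reserved word, so the column name is quoted.
conn.execute('CREATE TABLE table2 (_element TEXT PRIMARY KEY, "limit" INTEGER)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("k1", "A"), ("k2", "A"), ("k3", "B"), ("k4", "C"), ("k5", "C")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("A", 1), ("B", 2), ("C", 1)],
)

query = """
WITH cte AS (
    -- Number the rows of each attribute from top to bottom by key
    SELECT _key, attribute,
           ROW_NUMBER() OVER (PARTITION BY attribute ORDER BY _key) AS rowno
    FROM table1
)
SELECT cte._key, cte.attribute
FROM cte
JOIN table2 ON table2._element = cte.attribute
-- Keep only the first "limit" rows of each attribute
WHERE cte.rowno <= table2."limit"
ORDER BY cte._key
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With limits of 1, 2, and 1 this yields k1, k3, and k4; B contributes a single row because only one B row exists.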
I truly like the power of PostgreSQL arrays. So
select
table2._element,
unnest((array_agg(table1._key order by table1._key desc))[1:table2.limit]) as _key
from
table1 join table2 on (table1.attribute = table2._element)
group by
table2._element, table2.limit
where in the second field of the query:
array_agg(table1._key order by table1._key desc) - collects values into an array in the specified order (note that order by table1._key desc is just an example; you might skip it or specify another order),
(...)[1:table2.limit] - returns array elements from 1 to table2.limit,
unnest(...) - unwraps previous result to rows.