How to SUM numbers from a plain jsonb array? - sql

I'm facing issues with a jsonb ARRAY column in PostgreSQL.
I need to sum this column for each row.
Expected Result:
index | sum(snx_wavelengths)
------|---------------------
1     | 223123
2     | 223123

You can solve this ...
... with a subquery, then aggregate:
SELECT index, sum(nr) AS wavelength_sum
FROM  (
   SELECT index, jsonb_array_elements(snx_wavelengths)::numeric AS nr
   FROM   tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;  -- optional
... with an aggregate in a correlated subquery:
SELECT index
     , (SELECT sum(nr::numeric) FROM jsonb_array_elements(snx_wavelengths) nr) AS wavelength_sum
FROM   tbl
ORDER  BY 1;  -- optional
... or with an aggregate in a LATERAL subquery:
SELECT t.index, js.wavelength_sum
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT sum(nr::numeric) AS wavelength_sum
   FROM   jsonb_array_elements(t.snx_wavelengths) nr
   ) js ON true
ORDER  BY 1;  -- optional
See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Your screenshot shows fractional digits. Cast to numeric to get exact results; a floating-point type like real or float can introduce rounding errors.
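The difference is easy to demonstrate with a self-contained comparison (not tied to the question's data):

```sql
-- float8 is binary floating point, so decimal fractions can be inexact;
-- numeric is exact decimal arithmetic:
SELECT 0.1::float8  + 0.2::float8  = 0.3 AS float_exact    -- false
     , 0.1::numeric + 0.2::numeric = 0.3 AS numeric_exact; -- true
```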

You’ll need to extract the array elements with the jsonb_array_elements() function before summing them. Here’s an example:
SELECT SUM(w::text::float) AS wavelength_sum  -- cast via text works on Postgres < 11, too
FROM (
   SELECT jsonb_array_elements(snx_wavelengths) AS w
   FROM   my_table
) sub;  -- the subquery needs an alias in Postgres
This should work if I remember correctly (remember to replace my_table with your table name). More info here: https://www.postgresql.org/docs/9.5/functions-json.html

Related

Extract Earliest Date from Postgres jsonb array

Suppose I have a jsonb array in a postgres column like
[{"startDate": "2019-09-01"}, {"startDate": "2019-07-22"}, {"startDate": "2019-08-08"}]
Is there a way to extract the earliest startDate from the jsonb array? I have tried using jsonb_array_elements but don't see how to loop through all the elements.
You can use a scalar sub-query:
select (select (e.element ->> 'startDate')::date as start_date
        from jsonb_array_elements(t.the_column) as e(element)
        order by start_date  -- ascending, to get the earliest date
        limit 1) as start_date
from the_table t
You need to replace the_table and the_column with the actual table and column name you are using.
You can directly use MIN() aggregation after casting the derived value to date:
SELECT MIN((elm ->> 'startDate')::date)
FROM   t
CROSS  JOIN jsonb_array_elements(jsdata) AS j(elm);
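A quick self-contained check, plugging the sample array from the question directly into the same expression:

```sql
SELECT MIN((elm ->> 'startDate')::date) AS earliest
FROM   jsonb_array_elements(
         '[{"startDate": "2019-09-01"}, {"startDate": "2019-07-22"}, {"startDate": "2019-08-08"}]'::jsonb
       ) AS j(elm);
-- earliest: 2019-07-22
```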

Generate series / range / array in SQL (Big Query) with min and max values taken from another table

I need to generate table t1 with N consecutive numbers in each row, starting with the smallest value in another table t and ending with the largest value in the table t.
How do I do this with BigQuery Standard SQL?
For simplicity's sake, imagine t is created in the following way (except that you do not know the start and end value beforehand):
SELECT num FROM UNNEST(GENERATE_ARRAY(51, 650)) AS num;
Somehow I would like to do something to the effect of
SELECT num FROM UNNEST(GENERATE_ARRAY(MIN(t.num), MAX(t.num))) AS t1;
This question is very similar to [1], with the difference that the start and end of the series depend on the min/max values of another table.
[1] How to generate series in BigQuery Standard SQL
You can use a subquery:
SELECT tt
FROM (
  SELECT MIN(t.num) AS min_num, MAX(t.num) AS max_num
  FROM t
) t
CROSS JOIN UNNEST(GENERATE_ARRAY(t.min_num, t.max_num)) tt
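The same result can also be written with scalar subqueries as the GENERATE_ARRAY() arguments (just a different style, assuming t.num is an integer column):

```sql
SELECT num
FROM UNNEST(GENERATE_ARRAY((SELECT MIN(num) FROM t),
                           (SELECT MAX(num) FROM t))) AS num;
```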

Compare result of two table functions using one column from each

According to the instructions here, I have created two functions that use EXECUTE with format() and return the same table of (int, smallint).
Sample definitions:
CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
Both functions return the exact same number of rows. Sample result (will be always ordered by dist):
(49,0)
(206022,3)
(206041,3)
(92233,4)
Is there a way to compare values of the second field between the two functions for the same rows, to ensure that both results are the same:
For example:
SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)
Returns something like:
(49,0) (49,0)
(206022,3) (206022,3)
(206041,3) (206041,3)
(92233,4) (133,4)
Although I am not expecting identical results (each function is a topK query and I have ties which are broken arbitrarily / with some optimizations in the second function for faster performance) I can ensure that both functions return correct results, if for each row the second numbers in the results are the same. In the example above, I can ensure I get correct results, because:
1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4
despite the fact that for the 4th row, 92233!=133
Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:
SELECT COUNT(*)
FROM
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;
I am using PostgreSQL 9.3.
Is there a way to get only the 2nd field of each function result, to batch compare them?
All of the following answers assume that rows are returned in matching order.
Postgres 9.3
With the quirky feature of exploding rows from SRF functions returning the same number of rows in parallel:
SELECT count(*) AS mismatches
FROM  (
   SELECT function1('tblp1','tblp2',49)       AS f1
        , function2('tblp1_v2','tblp2_v2',49) AS f2
   ) sub
WHERE (f1).dist <> (f2).dist;  -- note the parentheses!
The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.
This defaults to Cartesian product of rows if the number of returned rows is not the same (which would break it completely for you).
Postgres 9.4
WITH ORDINALITY to generate row numbers on the fly
You can use WITH ORDINALITY to generate a row number on the fly and don't need to depend on pairing the results of SRF functions in the SELECT list:
SELECT count(*) AS mismatches
FROM   function1('tblp1','tblp2',49)       WITH ORDINALITY AS f1(id, dist, rn)
FULL   JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id, dist, rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist;
This works for the same number of rows from each function as well as differing numbers (which would be counted as mismatch).
Related:
PostgreSQL unnest() with element number
ROWS FROM to join sets row-by-row
SELECT count(*) AS mismatches
FROM ROWS FROM (function1('tblp1','tblp2',49)
, function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE t.dist1 IS DISTINCT FROM t.dist2;
Related answer:
Is it possible to answer queries on a view before fully materializing the view?
Aside:
EXECUTE with format() is not a plpgsql-specific feature; RETURN QUERY is. format() is just a convenient function for building a query string and can be used anywhere in SQL or plpgsql.
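For illustration, format() used in plain SQL (table name and value are made up):

```sql
-- %I quotes an identifier when needed, %L quotes a literal
SELECT format('SELECT * FROM %I WHERE id = %L', 'my_table', 42);
-- → SELECT * FROM my_table WHERE id = '42'
```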
The order in which the rows are returned from the functions is not guaranteed. If you can return the row_number() (rn in the example below) from the functions, then:
select
   count(f1.dist is distinct from f2.dist or null) as diff_count
from
   function1('tblp1','tblp2',49) f1
full join
   function2('tblp1_v2','tblp2_v2',49) f2 using (rn)
For future reference:
Checking difference in number of rows:
SELECT ABS(count(f1a.*) - count(f2a.*))
FROM  (SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) AS rn
       FROM   function1('tblp1','tblp2',49) f1) f1a
FULL  JOIN
      (SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) AS rn
       FROM   function2('tblp1_v2','tblp2_v2',49) f2) f2a
USING (rn);
Checking difference in dist for same ordered rows:
SELECT COUNT(*)
FROM  (SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) AS rn
       FROM   function1('tblp1','tblp2',49) f1) f1a
JOIN  (SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) AS rn
       FROM   function2('tblp1_v2','tblp2_v2',49) f2) f2a
  ON  f1a.rn = f2a.rn
WHERE f1a.dist <> f2a.dist;
A simple OVER() might also work, since the results of the functions are already ordered; the ORDER BY is added as an extra check.

Oracle to PostgreSQL query conversion with string_to_array()

I have below query in Oracle:
SELECT to_number(a.v_VALUE), b.v_VALUE
FROM TABLE(inv_fn_splitondelimiter('12;5;25;10',';')) a
JOIN TABLE(inv_fn_splitondelimiter('10;20;;', ';')) b
ON a.v_idx = b.v_idx
which gives me a result like:
I want to convert the query to Postgres. I have tried a query like:
SELECT UNNEST(String_To_Array('10;20;',';'))
I have also tried:
SELECT a,b
FROM (select UNNEST(String_To_Array('12;5;25;10;2',';'))) a
LEFT JOIN (select UNNEST(String_To_Array('12;5;25;10',';'))) b
ON a = b
But I didn't get a correct result.
I don't know how to write a query that's fully equivalent to the Oracle version. Anyone?
Starting with Postgres 9.4 you can use unnest() with multiple arrays to unnest them in parallel:
SELECT *
FROM   unnest('{12,5,25,10,2}'::int[]
            , '{10,20}'::int[]) AS t(col1, col2);
That's all. NULL values are filled in automatically for missing elements to the right.
If parameters are provided as strings, convert with string_to_array() first. Like:
SELECT *
FROM   unnest(string_to_array('12;5;25;10', ';')
            , string_to_array('10;20', ';')) AS t(col1, col2);
More details and an alternative solution for older versions:
Unnest multiple arrays in parallel
Split given string and prepare case statement
In the expression select a, a is not a column but the alias of the derived table. Consequently, that expression selects a complete row-tuple (albeit with just a single column), not a single column.
You need to define proper column aliases for the derived tables. It is also recommended to use set-returning functions only in the from clause, not in the select list.
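The difference is easy to see with a stripped-down version of the attempted query (the values are just placeholders):

```sql
-- "a" is the table alias, so this yields row values like (12):
SELECT a   FROM (SELECT unnest(string_to_array('12;5', ';'))) a;

-- With a column alias, the text value itself can be selected:
SELECT a.v FROM (SELECT unnest(string_to_array('12;5', ';')) AS v) a;
```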
If you are not on 9.4 you need to generate the "index" using a window function. If you are on 9.4 then Erwin's answer is much better.
SELECT a.v_value, b.v_value
FROM  (
   SELECT row_number() OVER () AS idx,  -- generate an index for each element
          i AS v_value
   FROM   unnest(string_to_array('12;5;25;10;2', ';')) i
   ) a
JOIN  (
   SELECT row_number() OVER () AS idx,
          i AS v_value
   FROM   unnest(string_to_array('10;20;;', ';')) i
   ) b
   ON a.idx = b.idx;
An alternative way in 9.4 would be to use the with ordinality option to generate the row index in case you do need the index value:
select a.v_value, b.v_value
from   regexp_split_to_table('12;5;25;10;2', ';') with ordinality as a(v_value, idx)
left   join regexp_split_to_table('10;20;;', ';') with ordinality as b(v_value, idx)
       on a.idx = b.idx;

PostgreSQL order of WHERE "table"."column" IN query

In PostgreSQL, does this query
SELECT "table".* FROM "table" WHERE "table"."column" IN (1, 5, 3)
always return the results in the 1, 5, 3 order or is it ambiguous?
If it's ambiguous, how do I properly assure the results are in the order 1, 5, 3?
The WHERE clause will not order the results in any way; it just selects matching records, in whatever order the database finds them.
You'll have to add an order by clause.
Add something like the following to your select statement
order by CASE WHEN "column" = 1 THEN 1
              WHEN "column" = 5 THEN 2
              ELSE 3
         END
If you have many more than three values, it may be easier to make a lookup table and join to that in your query.
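For example, with an inline lookup table built from a VALUES list (the sort_order numbers are made up here to encode the desired 1, 5, 3 order):

```sql
SELECT t.*
FROM   "table" t
JOIN  (VALUES (1, 1), (5, 2), (3, 3)) AS ord("column", sort_order)
       ON t."column" = ord."column"
ORDER  BY ord.sort_order;
```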
False orders before true
SELECT "table".*
FROM   "table"
WHERE  "table"."column" IN (1, 5, 3)
ORDER  BY "column" != 1
        , "column" != 5
        , "column" != 3;
When you use IN in a select statement, Postgres just returns the rows matching that list of values. It's not ambiguous in any way, but if you need to order the results you need to add an explicit ORDER BY with the relevant columns.
A set knows no order per se. A SELECT query needs ORDER BY to return ordered rows.
Other answers have suggested CASE expressions or boolean expressions in your ORDER BY, but that's far from elegant and rather inefficient with big tables. I suggest using an array or a comma-separated string instead of a set for your query.
For a given table:
CREATE TABLE tbl (col int);
Using an array it can work like this:
SELECT col
FROM   tbl
JOIN  (
   SELECT col, row_number() OVER () AS rn
   FROM   unnest('{1,5,3}'::int[]) AS col
   ) u USING (col)
ORDER  BY rn;
Returns all rows found in the sequence of the input array:
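On Postgres 9.5 or later, array_position() offers a shorter variant of the same idea (a sketch against the same tbl as above):

```sql
SELECT col
FROM   tbl
WHERE  col = ANY ('{1,5,3}'::int[])
ORDER  BY array_position('{1,5,3}'::int[], col);
```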
For more details and future-proof code consider this closely related question:
PostgreSQL unnest() with element number
Or the corresponding question on dba.SE:
How to preserve the original order of elements in an unnested array?