Oracle to PostgreSQL query conversion with string_to_array() - sql

I have the below query in Oracle:
SELECT to_number(a.v_VALUE), b.v_VALUE
FROM TABLE(inv_fn_splitondelimiter('12;5;25;10',';')) a
JOIN TABLE(inv_fn_splitondelimiter('10;20;;', ';')) b
ON a.v_idx = b.v_idx
which gives me a result like:
I want to convert the query to Postgres. I have tried a query like:
SELECT UNNEST(String_To_Array('10;20;',';'))
I have also tried:
SELECT a,b
FROM (select UNNEST(String_To_Array('12;5;25;10;2',';'))) a
LEFT JOIN (select UNNEST(String_To_Array('12;5;25;10',';'))) b
ON a = b
But I didn't get a correct result.
I don't know how to write a query that's fully equivalent to the Oracle version. Anyone?

Starting with Postgres 9.4 you can use unnest() with multiple arrays to unnest them in parallel:
SELECT *
FROM unnest('{12,5,25,10,2}'::int[]
, '{10,20}' ::int[]) AS t(col1, col2);
That's all. NULL values are filled in automatically for missing elements to the right.
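With the two arrays above, the result looks like this (a sketch of the expected output):

col1 | col2
-----+------
  12 |   10
   5 |   20
  25 | NULL
  10 | NULL
   2 | NULL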
If parameters are provided as strings, convert with string_to_array() first. Like:
SELECT *
FROM unnest(string_to_array('12;5;25;10', ';')
, string_to_array('10;20' , ';')) AS t(col1, col2);
More details and an alternative solution for older versions:
Unnest multiple arrays in parallel
Split given string and prepare case statement

In the expression select a, the a is not a column but the name of the table alias. Consequently, that expression selects a complete row tuple (albeit with just a single column), not a single column.
You need to define proper column aliases for the derived tables. It is also recommended to use set returning functions only in the from clause, not in the select list.
If you are not on 9.4 you need to generate the "index" using a window function. If you are on 9.4 then Erwin's answer is much better.
SELECT a.v_value, b.v_value
FROM (
   select row_number() over () as idx,  -- generate an index for each element
          i as v_value
   from   UNNEST(String_To_Array('12;5;25;10;2',';')) i
) as a
JOIN (
   select row_number() over () as idx,
          i as v_value
   from   UNNEST(String_To_Array('10;20;;',';')) i
) as b
ON a.idx = b.idx;
An alternative way in 9.4 would be to use the with ordinality option to generate the row index in case you do need the index value:
select a.v_value, b.v_value
from regexp_split_to_table('12;5;25;10;2',';') with ordinality as a(v_value, idx)
left join regexp_split_to_table('10;20;;',';') with ordinality as b(v_value, idx)
on a.idx = b.idx
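With the sample strings above, this should return something like the following (values are text here; the empty strings come from the trailing delimiters in '10;20;;', and the unmatched fifth element is padded with NULL by the left join):

v_value | v_value
--------+---------
12      | 10
5       | 20
25      | ''
10      | ''
2       | NULL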

Related

How to SUM numbers from a plain jsonb array?

I'm facing issues with a jsonb array column in PostgreSQL.
I need to sum the array elements for each row.
Expected Result:
index | sum(snx_wavelengths)
------+---------------------
    1 | 223123
    2 | 223123
You can solve this ...
... with a subquery, then aggregate:
SELECT index, sum(nr) AS wavelength_sum
FROM (
SELECT index, jsonb_array_elements(snx_wavelengths)::numeric AS nr
FROM tbl
) sub
GROUP BY 1
ORDER BY 1; -- optional?
... with an aggregate in a correlated subquery:
SELECT index
, (SELECT sum(nr::numeric) FROM jsonb_array_elements(snx_wavelengths) nr) AS wavelength_sum
FROM tbl
ORDER BY 1; -- optional?
... or with an aggregate in a LATERAL subquery:
SELECT t.index, js.wavelength_sum
FROM tbl t
LEFT JOIN LATERAL (
SELECT sum(nr::numeric) AS wavelength_sum
FROM jsonb_array_elements(t.snx_wavelengths) nr
) js ON true
ORDER BY 1; -- optional?
fiddle
See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Your screenshot shows fractional digits. Cast to the type numeric to get exact results. A floating point type like real or float can introduce rounding errors.
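A quick way to see the difference (a standalone illustration, not tied to the table above):

SELECT 0.1::float8  + 0.2::float8  AS float_sum
     , 0.1::numeric + 0.2::numeric AS numeric_sum;
-- on current Postgres versions the float sum displays as 0.30000000000000004,
-- while the numeric sum is exactly 0.3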
You'll need to extract the jsonb array contents using the jsonb_array_elements function before summing them. Here's an example:
SELECT SUM(w::float) AS wavelength_sum
FROM (
   SELECT jsonb_array_elements(snx_wavelengths) AS w
   FROM my_table
) AS elems;  -- the derived table needs an alias in PostgreSQL
This should work if I remember correctly (remember to update my_table to your table name). More info here https://www.postgresql.org/docs/9.5/functions-json.html

How to use pypika to generate the following SQL query?

I have a table with arrays as one column, and I want to sum the array elements together.
For example, if I have two arrays:
[1,2,3] and [2,1,3]
the result array will look like:
[3,3,6]
This can be done with the following query:
SELECT ARRAY (
SELECT sum(elem)
FROM tbl t
, unnest(t.arr) WITH ORDINALITY x(elem, rn)
GROUP BY rn
ORDER BY rn
);
How can I use pypika to generate this exact query? I have been trying to solve the problem using pypika's CustomFunction and AnalyticFunction.
I'm using PostgreSQL 11.8.1
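For reference, the target query can be sanity-checked on a throwaway table (tbl and arr are the names already used in the query above; this is just a sketch to confirm the expected result):

CREATE TEMP TABLE tbl (arr int[]);
INSERT INTO tbl VALUES ('{1,2,3}'), ('{2,1,3}');

SELECT ARRAY (
   SELECT sum(elem)
   FROM tbl t, unnest(t.arr) WITH ORDINALITY x(elem, rn)
   GROUP BY rn
   ORDER BY rn
);
-- expected result: {3,3,6}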

Using a calculation with an aliased column in ORDER BY

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.
The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to either a column or column alias in the select list or to a column defined in a table specified in the FROM clause without any ambiguities. If the ORDER BY clause references a column alias from the select list, the column alias must be used standalone, and not as a part of some expression in the ORDER BY clause.
Technically speaking, your query should work, since the ORDER BY clause is logically evaluated after the SELECT clause and should have access to all expressions declared in the SELECT clause. But without access to the SQL specs I cannot say whether this is a limitation of SQL Server or a bonus feature in the other RDBMSs.
Anyway, you can use CROSS APPLY as a trick... it is part of the FROM clause, so the expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)

Compare result of two table functions using one column from each

According to the instructions here, I have created two functions that use EXECUTE FORMAT and return the same table of (int, smallint).
Sample definitions:
CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
Both functions return the exact same number of rows. Sample result (it will always be ordered by dist):
(49,0)
(206022,3)
(206041,3)
(92233,4)
Is there a way to compare the values of the second field between the two functions for the same rows, to ensure that both results are the same?
For example:
SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)
Returns something like:
(49,0) (49,0)
(206022,3) (206022,3)
(206041,3) (206041,3)
(92233,4) (133,4)
Although I am not expecting identical results (each function is a top-k query, and ties are broken arbitrarily / with some optimizations in the second function for faster performance), I can verify that both functions return correct results if, for each row, the second numbers in the results are the same. In the example above, I can ensure I get correct results, because:
1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4
despite the fact that for the 4th row, 92233!=133
Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:
SELECT COUNT(*)
FROM
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;
I am using PostgreSQL 9.3.
Is there a way to get only the 2nd field of each function result, to batch compare them?
All of the following answers assume that rows are returned in matching order.
Postgres 9.3
With the quirky feature of exploding rows from SRF functions returning the same number of rows in parallel:
SELECT count(*) AS mismatches
FROM (
SELECT function1('tblp1','tblp2',49) AS f1
, function2('tblp1_v2','tblp2_v2',49) AS f2
) sub
WHERE (f1).dist <> (f2).dist; -- note the parentheses!
The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.
This defaults to Cartesian product of rows if the number of returned rows is not the same (which would break it completely for you).
Postgres 9.4
WITH ORDINALITY to generate row numbers on the fly
You can use WITH ORDINALITY to generate a row number on the fly, so you don't need to depend on pairing the results of SRF functions in the SELECT list:
SELECT count(*) AS mismatches
FROM function1('tblp1','tblp2',49) WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE f1.dist IS DISTINCT FROM f2.dist;
This works for the same number of rows from each function as well as differing numbers (which would be counted as mismatch).
Related:
PostgreSQL unnest() with element number
ROWS FROM to join sets row-by-row
SELECT count(*) AS mismatches
FROM ROWS FROM (function1('tblp1','tblp2',49)
, function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE t.dist1 IS DISTINCT FROM t.dist2;
Related answer:
Is it possible to answer queries on a view before fully materializing the view?
Aside:
EXECUTE FORMAT is not a distinct plpgsql feature; RETURN QUERY is. format() is just a convenient function for building a query string and can be used anywhere in SQL or plpgsql.
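For illustration, a minimal sketch of that pattern (the function name demo_fn is made up, the signature mirrors the sample definitions above, and it assumes the target table has matching id and dist columns):

CREATE OR REPLACE FUNCTION demo_fn(_tbl regclass, _field1 integer)
  RETURNS TABLE(id integer, dist smallint) AS
$func$
BEGIN
   -- format() builds the query string; EXECUTE ... USING runs it with a parameter
   RETURN QUERY EXECUTE format(
      'SELECT id, dist FROM %s WHERE id = $1', _tbl)
   USING _field1;
END
$func$ LANGUAGE plpgsql;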
The order in which the rows are returned from the functions is not guaranteed. If you can return row_number() (rn in the example below) from the functions, then:
select
count(f1.dist is distinct from f2.dist or null) as diff_count  -- counts rows where dist differs
from
function1('tblp1','tblp2',49) f1
inner join
function2('tblp1_v2','tblp2_v2',49) f2 using (rn)
For future reference:
Checking difference in number of rows:
SELECT
ABS(count(f1a.*)-count(f2a.*))
FROM
(SELECT f1.dist, row_number() OVER(ORDER BY f1.dist) rn
FROM
function1('tblp1','tblp2',49) f1)
f1a FULL JOIN
(SELECT f2.dist, row_number() OVER(ORDER BY f2.dist) rn
FROM
function2('tblp1_v2','tblp2_v2',49) f2) f2a
USING (rn);
Checking difference in dist for same ordered rows:
SELECT
COUNT(*)
FROM
(SELECT f1.dist, row_number() OVER(ORDER BY f1.dist) rn
 FROM function1('tblp1','tblp2',49) f1) f1a,
(SELECT f2.dist, row_number() OVER(ORDER BY f2.dist) rn
 FROM function2('tblp1_v2','tblp2_v2',49) f2) f2a
WHERE f1a.rn = f2a.rn
AND f1a.dist <> f2a.dist;
A simple OVER() might also work since the results of the functions are already ordered, but ORDER BY dist is added as an extra check.

PostgreSQL order of WHERE "table"."column" IN query

In PostgreSQL, does this query
SELECT "table".* FROM "table" WHERE "table"."column" IN (1, 5, 3)
always return the results in the 1, 5, 3 order or is it ambiguous?
If it's ambiguous, how do I properly assure the results are in the order 1, 5, 3?
The WHERE clause will not order the results in any way; it just selects matching records, in whatever order the database finds them.
You'll have to add an ORDER BY clause.
Add something like the following to your select statement:
order by CASE WHEN "column"=1 THEN 1
              WHEN "column"=5 THEN 2
              ELSE 3
         END
If you have many more than three values, it may be easier to make a lookup table and join to that in your query, as sketched below.
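For example, with the values from the question (the aliases sort_order and sort_key are made up):

SELECT t.*
FROM "table" t
JOIN (VALUES (1, 1), (5, 2), (3, 3)) AS sort_order(val, sort_key)
  ON t."column" = sort_order.val
ORDER BY sort_order.sort_key;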
In a boolean sort, false orders before true:
SELECT "table".*
FROM "table"
WHERE "table"."column" IN (1, 5, 3)
order by
"column" != 1,
"column" != 5,
"column" != 3
When you use IN in a SELECT statement, Postgres just returns the rows matching that set of values. It's not ambiguous in any way, but if you need to order the results you need to explicitly add ORDER BY column1, column2, ...
A set knows no order per se. A SELECT query needs ORDER BY to return ordered rows.
Other answers have suggested CASE statements or boolean expressions in your ORDER BY, but that's far from elegant and rather inefficient with big tables. I suggest using an array or a comma-separated string instead of a set for your query.
For a given table:
CREATE TABLE tbl (col int);
Using an array it can work like this:
SELECT col
FROM tbl
JOIN (
SELECT col, row_number() OVER () AS rn
FROM unnest('{1,5,3}'::int[]) AS col
) u USING (col)
ORDER BY rn;
Returns all rows found in the sequence of the input array:
-> SQLfiddle
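On Postgres 9.4 or later, the same idea can be written with WITH ORDINALITY instead of the window function (a sketch, using the same tbl as above):

SELECT t.col
FROM unnest('{1,5,3}'::int[]) WITH ORDINALITY AS u(col, rn)
JOIN tbl t USING (col)
ORDER BY u.rn;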
For more details and future-proof code consider this closely related question:
PostgreSQL unnest() with element number
Or the corresponding question on dba.SE:
How to preserve the original order of elements in an unnested array?