Check if NULL exists in Postgres array - sql

Similar to this question, how can I find if a NULL value exists in an array?
Here are some attempts.
SELECT num, ar, expected,
ar @> ARRAY[NULL]::int[] AS test1,
NULL = ANY (ar) AS test2,
array_to_string(ar, ', ') <> array_to_string(ar, ', ', '(null)') AS test3
FROM (
SELECT 1 AS num, '{1,2,NULL}'::int[] AS ar, true AS expected
UNION SELECT 2, '{1,2,3}'::int[], false
) td ORDER BY num;
num | ar | expected | test1 | test2 | test3
-----+------------+----------+-------+-------+-------
1 | {1,2,NULL} | t | f | | t
2 | {1,2,3} | f | f | | f
(2 rows)
Only a trick with array_to_string shows the expected value. Is there a better way to test this?

Postgres 9.5 or later
Or use array_position(). Basically:
SELECT array_position(arr, NULL) IS NOT NULL AS array_has_null
See demo below.
Postgres 9.3 or later
You can test with the built-in functions array_remove() or array_replace().
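For example, a minimal check (the same expressions appear in the demo below):
SELECT array_remove('{1,2,NULL}'::int[], NULL) <> '{1,2,NULL}'::int[] AS array_has_null;  -- true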
Postgres 9.1 or any version
If you know a single element that can never exist in your arrays, you can use this fast expression. Say, you have an array of positive numbers, and -1 can never be in it:
-1 = ANY(arr) IS NULL
Related answer with detailed explanation:
Is array all NULLs in PostgreSQL
If you cannot be absolutely sure, you could fall back to one of the expensive but safe methods with unnest(). Like:
(SELECT bool_or(x IS NULL) FROM unnest(arr) x)
or:
EXISTS (SELECT 1 FROM unnest(arr) x WHERE x IS NULL)
But you can have fast and safe with a CASE expression. Use an unlikely number and fall back to the safe method if it should exist. You may want to treat the case arr IS NULL separately. See demo below.
Demo
SELECT num, arr, expect
, -1 = ANY(arr) IS NULL AS t_1 -- 50 ms
, (SELECT bool_or(x IS NULL) FROM unnest(arr) x) AS t_2 -- 754 ms
, EXISTS (SELECT 1 FROM unnest(arr) x WHERE x IS NULL) AS t_3 -- 521 ms
, CASE -1 = ANY(arr)
WHEN FALSE THEN FALSE
WHEN TRUE THEN EXISTS (SELECT 1 FROM unnest(arr) x WHERE x IS NULL)
ELSE NULLIF(arr IS NOT NULL, FALSE) -- catch arr IS NULL -- 55 ms
-- ELSE TRUE -- simpler for columns defined NOT NULL -- 51 ms
END AS t_91
, array_replace(arr, NULL, 0) <> arr AS t_93a -- 99 ms
, array_remove(arr, NULL) <> arr AS t_93b -- 96 ms
, cardinality(array_remove(arr, NULL)) <> cardinality(arr) AS t_94 -- 81 ms
, COALESCE(array_position(arr, NULL::int), 0) > 0 AS t_95a -- 49 ms
, array_position(arr, NULL) IS NOT NULL AS t_95b -- 45 ms
, CASE WHEN arr IS NOT NULL
THEN array_position(arr, NULL) IS NOT NULL END AS t_95c -- 48 ms
FROM (
VALUES (1, '{1,2,NULL}'::int[], true) -- extended test case
, (2, '{-1,NULL,2}' , true)
, (3, '{NULL}' , true)
, (4, '{1,2,3}' , false)
, (5, '{-1,2,3}' , false)
, (6, NULL , null)
) t(num, arr, expect);
Result:
num | arr | expect | t_1 | t_2 | t_3 | t_91 | t_93a | t_93b | t_94 | t_95a | t_95b | t_95c
-----+-------------+--------+--------+------+-----+------+-------+-------+------+-------+-------+-------
1 | {1,2,NULL} | t | t | t | t | t | t | t | t | t | t | t
2 | {-1,NULL,2} | t | f --!! | t | t | t | t | t | t | t | t | t
3 | {NULL} | t | t | t | t | t | t | t | t | t | t | t
4 | {1,2,3} | f | f | f | f | f | f | f | f | f | f | f
5 | {-1,2,3} | f | f | f | f | f | f | f | f | f | f | f
6 | NULL | NULL | t --!! | NULL | f | NULL | NULL | NULL | NULL | f | f | NULL
Note that array_remove() and array_position() are not allowed for multi-dimensional arrays. All expressions to the right of t_93a only work for 1-dimensional arrays.
db<>fiddle here - Postgres 13, with more tests
Old sqlfiddle
Benchmark setup
The added times are from a benchmark test with 200k rows in Postgres 9.5. This is my setup:
CREATE TABLE t AS
SELECT row_number() OVER() AS num
, array_agg(elem) AS arr
, bool_or(elem IS NULL) AS expected
FROM (
SELECT CASE WHEN random() > .95 THEN NULL ELSE g END AS elem -- 5% NULL values
, count(*) FILTER (WHERE random() > .8)
OVER (ORDER BY g) AS grp -- avg 5 elements per array
FROM generate_series (1, 1000000) g -- increase for big test case
) sub
GROUP BY grp;
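With this table in place, each expression can be timed individually. A minimal sketch (timings vary by machine):
EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM t WHERE array_position(arr, NULL) IS NOT NULL;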
Function wrapper
For repeated use, I would create a function in Postgres 9.5 like this:
CREATE OR REPLACE FUNCTION f_array_has_null (anyarray)
RETURNS bool
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT array_position($1, NULL) IS NOT NULL';
PARALLEL SAFE only for Postgres 9.6 or later.
Using a polymorphic input type this works for any array type, not just int[].
Make it IMMUTABLE to allow performance optimization and index expressions.
Does PostgreSQL support "accent insensitive" collations?
But don't make it STRICT, which would disable "function inlining" and impair performance because array_position() is not STRICT itself. See:
Function executes faster without STRICT modifier?
If you need to catch the case arr IS NULL:
CREATE OR REPLACE FUNCTION f_array_has_null (anyarray)
RETURNS bool
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT CASE WHEN $1 IS NOT NULL
THEN array_position($1, NULL) IS NOT NULL END';
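A quick usage sketch (example values chosen here for illustration):
SELECT f_array_has_null('{1,2,NULL}'::int[]); -- true
SELECT f_array_has_null('{a,b,c}'::text[]);   -- false; polymorphic, so any array type works
SELECT f_array_has_null(NULL::int[]);         -- NULL with this CASE variant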
For Postgres 9.1 use the t_91 expression from above. The rest applies unchanged.
Closely related:
How to determine if NULL is contained in an array in Postgres?

PostgreSQL's unnest() function is a better choice. You can write a simple function like the one below to check for NULL values in an array.
create or replace function NULL_EXISTS(val anyarray) returns boolean as
$$
select exists (
select 1 from unnest(val) arr(el) where el is null
);
$$
language sql;
For example,
SELECT NULL_EXISTS(array [1,2,NULL])
,NULL_EXISTS(array [1,2,3]);
Result:
 null_exists | null_exists
-------------+-------------
 t           | f
So you can use the NULL_EXISTS() function in your query like below.
SELECT num, ar, expected, NULL_EXISTS(ar)
FROM (
SELECT 1 AS num, '{1,2,NULL}'::int[] AS ar, true AS expected
UNION SELECT 2, '{1,2,3}'::int[], false
) td ORDER BY num;

PostgreSQL 9.5 (I know you specified 9.1, but anyway) has the array_position() function to do just what you want, without having to use the horribly inefficient unnest() for something as trivial as this (see test4):
patrick#puny:~$ psql -d test
psql (9.5.0)
Type "help" for help.
test=# SELECT num, ar, expected,
ar @> ARRAY[NULL]::int[] AS test1,
NULL = ANY (ar) AS test2,
array_to_string(ar, ', ') <> array_to_string(ar, ', ', '(null)') AS test3,
coalesce(array_position(ar, NULL::int), 0) > 0 AS test4
FROM (
SELECT 1 AS num, '{1,2,NULL}'::int[] AS ar, true AS expected
UNION SELECT 2, '{1,2,3}'::int[], false
) td ORDER BY num;
num | ar | expected | test1 | test2 | test3 | test4
-----+------------+----------+-------+-------+-------+-------
1 | {1,2,NULL} | t | f | | t | t
2 | {1,2,3} | f | f | | f | f
(2 rows)

I use this
select
array_position(array[1,null], null) is not null
array_position() returns the subscript of the first occurrence of the second argument in the array, starting at the element indicated by the third argument, or at the first element (the array must be one-dimensional).
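For example (illustrative values):
SELECT array_position(ARRAY[1, NULL, 3], NULL);          -- returns 2
SELECT array_position(ARRAY[1, 2, 3], NULL) IS NOT NULL; -- false: no NULL present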

Related

Find unique dataset with max. value from 3 columns

Imagine the following table:
ID:PrimaryKey (Sequence generated Number)
ColA:ForeignKey(Number)
ColB:ForeignKey(Number)
ColC:ForeignKey(Number)
State:Enumeration(Number) 10,20,30,... 90
ValidFrom:TimeStamp(6)
LastUpdate:TimeStamp(6)
I now created a query to fetch any combination in the highest states (70 and above). The combination of ColA, ColB and ColC should be unique. If a ValidFrom is available, the highest one wins. If there are two in state 90, the newest one wins.
So for a table like this:
|------|------|------|-------|-------------|------------|
| ColA | ColB | ColC | State |ValidFrom |LastUpdate |
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 10 | null | 10.10.2018 | //Excluded
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 70 | null | 09.10.2018 | // lower State
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 90 | null | 05.05.2018 | // older LastUpdate
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 90 | null | 12.07.2018 | //Should Win
|------|------|------|-------|-------------|------------|
| 1 | 2 | 1 | 90 | 18.10.2018 | 12.07.2018 | //Should Win
|------|------|------|-------|-------------|------------|
| 1 | 2 | 1 | 90 | null | 18.11.2018 | //loses against ValidFrom
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 90 | 02.12.2018 | 04.08.2018 | //lower ValidFrom
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 70 | 19.10.2018 | 17.11.2018 | //lower state
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 90 | 18.10.2018 | 14.08.2018 | //Should win
|------|------|------|-------|-------------|------------|
So, as you can see, the combination of ColA, ColB and ColC should be unique in the end.
So I started writing a script that gives me all the data with the highest state per combination:
SELECT MAINSELECT.*
FROM
FOO MAINSELECT
WHERE
MAINSELECT.STATE >= 70
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE > MAINSELECT.STATE);
This now gives me all rows in the highest state. As I do not want to use an OR condition, I tried to solve the problem by querying either NULL as ValidFrom or the MAX in two different queries (and combining them with UNION). So I extended this base SELECT like this to get all rows with a ValidFrom != null && Max(ValidFrom):
SELECT MAINSELECT.*
FROM
FOO MAINSELECT
WHERE
MAINSELECT.STATE >= 70
AND MAINSELECT.VALIDFROM IS NOT NULL
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE > MAINSELECT.STATE)
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID -- Should not be the same
AND SUBSELECT.COLA = MAINSELECT.COLA -- Same combination!
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE = MAINSELECT.STATE --Filter on same state!
AND SUBSELECT.VALIDFROM > MAINSELECT.VALIDFROM);
But this doesn't seem to work as expected.
I am expecting just rows 5 and 9! [Starting at 1 ;-)]
And I currently get rows 5, 7 and 9!
So the combination [3,2,1] is duplicated.
I do not get why the 2nd NOT EXISTS has no effect. It's like it is simply ignored!
Use row_number():
dbfiddle demo
select *
from (
select row_number() over (
partition by cola, colb, colc
order by state desc, validfrom desc nulls last, lastupdate desc) rn,
foo.*
from foo)
where rn = 1
7 wins against 9 because 2018-12-02 is newer than 2018-10-18.
Explanation:
partition by cola, colb, colc causes the numbering to be done separately for each combination of these columns,
next come the ordering criteria: a higher state wins, then a newer, non-null validfrom wins, and at the end a newer lastupdate wins.
For each combination of a, b, c we get a separate set of numbered rows. The outer query filters only the rows numbered 1.
I found an answer. Instead of using NOT EXISTS, I use max, rpad and coalesce to create a string which I then compare:
SELECT
MAINSELECT.*
FROM
FOO MAINSELECT
WHERE (1 = 1)
AND MAINSELECT.STATE >= 70
AND coalesce(to_char(MAINSELECT.state), rpad('0', 3, '0'))
 || coalesce(to_char(MAINSELECT.validfrom, 'YYMMDDhh24missFF'), rpad('0', 18, '0'))
 || coalesce(to_char(MAINSELECT.lastupdate, 'YYMMDDhh24missFF'), rpad('0', 18, '0'))
 = (select max(coalesce(to_char(SUBSELECT.state), rpad('0', 3, '0'))
 || coalesce(to_char(SUBSELECT.validfrom, 'YYMMDDhh24missFF'), rpad('0', 18, '0'))
 || coalesce(to_char(SUBSELECT.lastupdate, 'YYMMDDhh24missFF'), rpad('0', 18, '0')))
FROM
FOO SUBSELECT
WHERE (1 = 1)
AND SUBSELECT.STATE >= 70
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
);
This creates a simple string from the values of the columns STATE, VALIDFROM and LASTUPDATE and then finds the max of these strings, starting with the state, which has the highest weight because it comes at the front.

Convert one row into multiple rows with fewer columns

I'd like to convert single rows into multiple rows in PostgreSQL, where some of the columns are removed. Here's an example of the current output:
name | st | ot | dt |
-----|----|----|----|
Fred | 8 | 2 | 3 |
Jane | 8 | 1 | 0 |
Samm | 8 | 0 | 6 |
Alex | 8 | 0 | 0 |
Using the following query:
SELECT
name, st, ot, dt
FROM
times;
And here's what I want:
name | t | val |
-----|----|-----|
Fred | st | 8 |
Fred | ot | 2 |
Fred | dt | 3 |
Jane | st | 8 |
Jane | ot | 1 |
Samm | st | 8 |
Samm | dt | 6 |
Alex | st | 8 |
How can I modify the query to get the above desired output?
SELECT
times.name, x.t, x.val
FROM
times cross join lateral (values('st',st),('ot',ot),('dt',dt)) as x(t,val)
WHERE
x.val <> 0;
The core problem is the reverse of a pivot / crosstab operation. Sometimes called "unpivot".
Basically, Abelisto's query is the way to go in Postgres 9.3 or later. Related:
SELECT DISTINCT on multiple columns
You may want to use LEFT JOIN LATERAL ... ON u.val <> 0 to include names without valid values in the result (and shorten the syntax a bit). See the sketch below.
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
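A minimal sketch of that variant, based on Abelisto's query above:
SELECT t.name, u.t, u.val
FROM   times t
LEFT   JOIN LATERAL (VALUES ('st', t.st), ('ot', t.ot), ('dt', t.dt)) u(t, val) ON u.val <> 0;
A name whose values are all 0 is kept as a single row with NULL for t and val.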
If you have more than a few value columns (or varying lists of columns) you may want to use a function to build and execute the query automatically:
CREATE OR REPLACE FUNCTION f_unpivot_columns(VARIADIC _cols text[])
RETURNS TABLE(name text, t text, val int)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT
'SELECT t.name, u.t, u.val
FROM times t
LEFT JOIN LATERAL (VALUES '
|| string_agg(format('(%L, t.%I)', c, c), ', ')
|| ') u(t, val) ON (u.val <> 0)'
FROM unnest(_cols) c
);
END
$func$;
Call:
SELECT * FROM f_unpivot_columns(VARIADIC '{st, ot, dt}');
Or:
SELECT * FROM f_unpivot_columns('ot', 'dt');
Column names are provided as string literals and must be in correct (case-sensitive!) spelling with no extra double-quotes. See:
Are PostgreSQL column names case-sensitive?
db<>fiddle here
Related with more examples and explanation:
How to unpivot a table in PostgreSQL
One way:
with times(name , st , ot , dt) as(
select 'Fred',8 , 2 , 3 union all
select 'Jane',8 , 1 , 0 union all
select 'Samm',8 , 0 , 6 union all
select 'Alex',8 , 0 , 0
)
select name, key as t, value::int from
(
select name, json_build_object('st' ,st , 'ot',ot, 'dt',dt) as j
from times
) t
join lateral json_each_text(j)
on true
where value <> '0'
-- order by name, case when key = 'st' then 0 when key = 'ot' then 1 when key = 'dt' then 2 end

How to pivot or 'merge' rows with column names?

I have the following table:
crit_id | criterium | val1 | val2
----------+------------+-------+--------
1 | T01 | 9 | 9
2 | T02 | 3 | 5
3 | T03 | 4 | 9
4 | T01 | 2 | 3
5 | T02 | 5 | 1
6 | T03 | 6 | 1
I need to convert the values in 'criterium' into columns as a 'cross product' with val1 and val2. So the result has to look like:
T01_val1 |T01_val2 |T02_val1 |T02_val2 | T03_val1 | T03_val2
---------+---------+---------+---------+----------+---------
9 | 9 | 3 | 5 | 4 | 9
2 | 3 | 5 | 1 | 6 | 1
Or to put it differently: I need every value for all criteria to be in one row.
This is my current approach:
select
case when criterium = 'T01' then val1 else null end as T01_val1,
case when criterium = 'T01' then val2 else null end as T01_val2,
case when criterium = 'T02' then val1 else null end as T02_val1,
case when criterium = 'T02' then val2 else null end as T02_val2,
case when criterium = 'T03' then val1 else null end as T03_val1,
case when criterium = 'T03' then val2 else null end as T03_val2
from crit_table;
But the result does not look the way I want it to:
T01_val1 |T01_val2 |T02_val1 |T02_val2 | T03_val1 | T03_val2
---------+---------+---------+---------+----------+---------
9 | 9 | null | null | null | null
null | null | 3 | 5 | null | null
null | null | null | null | 4 | 9
What's the fastest way to achieve my goal?
Bonus question:
I have 77 criteria and seven different kinds of values for every criterium. So I would have to write 539 case statements. What's the best way to create them dynamically?
I'm working with PostgreSQL 9.4.
Prepare for crosstab
In order to use the crosstab() function, the data must be reorganized. You need a dataset with three columns (row number, criterium, value). To get all values into one column you must unpivot the last two columns, changing the names of the criteria at the same time. As a row number you can use the rank() function over partitions by the new criteria.
select rank() over (partition by criterium order by crit_id), criterium, val
from (
select crit_id, criterium || '_v1' criterium, val1 val
from crit
union
select crit_id, criterium || '_v2' criterium, val2 val
from crit
) sub
order by 1, 2
rank | criterium | val
------+-----------+-----
1 | T01_v1 | 9
1 | T01_v2 | 9
1 | T02_v1 | 3
1 | T02_v2 | 5
1 | T03_v1 | 4
1 | T03_v2 | 9
2 | T01_v1 | 2
2 | T01_v2 | 3
2 | T02_v1 | 5
2 | T02_v2 | 1
2 | T03_v1 | 6
2 | T03_v2 | 1
(12 rows)
This dataset can be used in crosstab():
create extension if not exists tablefunc;
select * from crosstab($ct$
select rank() over (partition by criterium order by crit_id), criterium, val
from (
select crit_id, criterium || '_v1' criterium, val1 val
from crit
union
select crit_id, criterium || '_v2' criterium, val2 val
from crit
) sub
order by 1, 2
$ct$)
as ct (rank bigint, "T01_v1" int, "T01_v2" int,
"T02_v1" int, "T02_v2" int,
"T03_v1" int, "T03_v2" int);
rank | T01_v1 | T01_v2 | T02_v1 | T02_v2 | T03_v1 | T03_v2
------+--------+--------+--------+--------+--------+--------
1 | 9 | 9 | 3 | 5 | 4 | 9
2 | 2 | 3 | 5 | 1 | 6 | 1
(2 rows)
Alternative solution
For 77 criteria * 7 parameters the above query may be troublesome. If you can accept a slightly different way of presenting the data, the issue becomes much easier.
select * from crosstab($ct$
select
rank() over (partition by criterium order by crit_id),
criterium,
concat_ws(' | ', val1, val2) vals
from crit
order by 1, 2
$ct$)
as ct (rank bigint, "T01" text, "T02" text, "T03" text);
rank | T01 | T02 | T03
------+-------+-------+-------
1 | 9 | 9 | 3 | 5 | 4 | 9
2 | 2 | 3 | 5 | 1 | 6 | 1
(2 rows)
CREATE TABLE #Table1
(crit_id int, criterium varchar(3), val1 int, val2 int)
;
INSERT INTO #Table1
(crit_id, criterium, val1, val2)
VALUES
(1, 'T01', 9, 9),
(2, 'T02', 3, 5),
(3, 'T03', 4, 9),
(4, 'T01', 2, 3),
(5, 'T02', 5, 1),
(6, 'T03', 6, 1)
;
select [T01] As [T01_val1], [T01-1] As [T01_val2], [T02] As [T02_val1], [T02-1] As [T02_val2], [T03] As [T03_val1], [T03-1] As [T03_val2] from (
select T.criterium, T.val1, ROW_NUMBER() OVER(PARTITION BY T.criterium ORDER BY (SELECT NULL)) RN from (
select criterium, val1 from #Table1
UNION ALL
select criterium + '-' + '1', val2 from #Table1) T) PP
PIVOT (MAX(val1) FOR criterium IN ([T01],[T02],[T03],[T01-1],[T02-1],[T03-1])) P
I agree with Michael's comment that this requirement looks a bit weird, but if you really need it that way, you were on the right track with your solution. It just needs a little bit of additional code (and small corrections wherever val_1 and val_2 were mixed up):
select
sum(case when criterium = 'T01' then val_1 else null end) as T01_val1,
sum(case when criterium = 'T01' then val_2 else null end) as T01_val2,
sum(case when criterium = 'T02' then val_1 else null end) as T02_val1,
sum(case when criterium = 'T02' then val_2 else null end) as T02_val2,
sum(case when criterium = 'T03' then val_1 else null end) as T03_val1,
sum(case when criterium = 'T03' then val_2 else null end) as T03_val2
from
crit_table
group by
trunc((crit_id-1)/3.0)
order by
trunc((crit_id-1)/3.0);
This works as follows. To aggregate the result you posted into the result you would like to have, the first helpful observation is that the desired result has fewer rows than your preliminary one. So some kind of grouping is necessary, and the key question is: "What is the grouping criterion?" In this case, it is rather non-obvious: it is the criterion ID (minus 1, to start counting with 0) divided by 3, and truncated. The 3 comes from the number of different criteria.
After that puzzle is solved, it is easy to see that among the input rows that are aggregated into the same result row, there is only one non-null value per column. That means the choice of aggregate function is not so important, as it only needs to return the one non-null value. I used sum in my code snippet, but you could as well use min or max.
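To see the grouping criterion at work, here is a minimal illustration:
SELECT crit_id, trunc((crit_id-1)/3.0) AS grp
FROM   generate_series(1, 6) AS crit_id;
 crit_id | grp
---------+-----
       1 |   0
       2 |   0
       3 |   0
       4 |   1
       5 |   1
       6 |   1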
As for the bonus question: Use a code generator query that generates the query you need. The code looks like this (with only three types of values to keep it brief):
with value_table as /* possible kinds of values, add the remaining ones here */
(select 'val_1' value_type union
select 'val_2' value_type union
select 'val_3' value_type )
select contents from (
select 0 order_id, 'select' contents
union
select row_number() over () order_id,
'max(case when criterium = '''||criterium||''' then '||value_type||' else null end) '||criterium||'_'||value_type||',' contents
from crit_table
cross join value_table
union select 9999999 order_id,
' from crit_table group by trunc((crit_id-1)/3.0) order by trunc((crit_id-1)/3.0);' contents
) v
order by order_id;
This basically only uses a string template of your query and then inserts the appropriate combinations of values for the criteria and the val-columns. You could even get rid of the with-clause by reading column names from information_schema.columns (a sketch follows below), but I think the basic idea is clearer in the version above. Note that the generated code contains one comma too many directly after the last column (before the from clause). It's easier to delete that by hand afterwards than to correct it in the generator.
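A minimal sketch of that idea, assuming the value columns share a common name prefix (adjust the filter to your schema):
SELECT column_name
FROM   information_schema.columns
WHERE  table_name = 'crit_table'
AND    column_name LIKE 'val%'
ORDER  BY ordinal_position;
The result set replaces the hand-written value_table in the generator above.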

Filling null values with last not null one => huge number of columns

I want to copy rows from SOURCE_TABLE to TARGET_TABLE while filling NULL values with the last non-null one.
What I have :
SOURCE_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
NULL | NULL | 1 | 1 | 2
NULL | 3 | 2 | 1 | 3
DEST_TABLE is empty
What I want :
DEST_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
1 | 2 | 1 | 1 | 2
1 | 3 | 2 | 1 | 3
To achieve that I'm dynamically generating the following SQL:
insert into TARGET_TABLE (A, B, C)
select coalesce(A, lag(A ignore nulls) over (partition by parent order by seq_number)) as A,
coalesce(B, lag(B ignore nulls) over (partition by parent order by seq_number)) as B,
coalesce(C, lag(C ignore nulls) over (partition by parent order by seq_number)) as C
from SOURCE_TABLE;
Everything works fine if SOURCE and TARGET tables have a small number of columns. In my case they have 400+ columns (yes this is bad but it is legacy and cannot be changed) and I got the following error :
ORA-01467: sort key too long
First, I don't really understand this error. Is it because I'm using too many lag functions, each with its own "order by"/"partition by"? If I replace coalesce(A, lag(A...)) with coalesce(A, A), the error disappears.
Then, is there a workaround or another way to achieve the same result?
Thanks
Just do it using a PL/SQL anonymous block:
declare
x tgt_table%rowtype; --keep values from last row
begin
for r in (select * from src_table order by seq_number) loop
x.a := nvl(r.a, x.a);
...
x.z := nvl(r.z, x.z);
insert into tgt_table
values x;
end loop;
end;
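The query in the question partitions by parent, while the loop above carries values across parent boundaries. A sketch of a variation that resets per parent (untested; the column list is elided the same way):
declare
x     tgt_table%rowtype; -- keeps values from the last row
blank tgt_table%rowtype; -- never assigned: used to reset x
begin
for r in (select * from src_table order by parent, seq_number) loop
if x.parent is null or r.parent <> x.parent then
x := blank; -- new parent: forget the carried values
end if;
x.a := nvl(r.a, x.a);
...
x.z := nvl(r.z, x.z);
x.parent := r.parent;
x.seq_number := r.seq_number;
insert into tgt_table
values x;
end loop;
end;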

Postgres perform query to select array index

How can I perform a query that returns a row if the wanted values are at the same index in different columns? For example, here is some code:
select id_reg, a1, a2 from lb_reg_teste2;
id_reg | a1 | a2
--------+------------------+-------------
1 | {10,10,20,20,10} | {3,2,4,3,6}
(1 row)
The query would be someting like:
select id_reg from lb_reg_teste2 where idx(a1, '20') = idx(a2, '3');
-- should return id_reg = 1
I found this script, but it only returns the first occurrence of a value in an array. For this case, I need all occurrences.
CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS
$$
SELECT i FROM (
SELECT generate_series(array_lower($1,1),array_upper($1,1))
) g(i)
WHERE $1[i] = $2
LIMIT 1;
$$ LANGUAGE sql IMMUTABLE;
You can extract the values from the arrays along with their indices, and then filter out the results.
If the arrays have the same number of elements, consider this query:
SELECT id_reg,
generate_subscripts(a1,1) as idx1,
unnest(a1) as val1,
generate_subscripts(a2,1) as idx2,
unnest(a2) as val2
FROM lb_reg_teste2
With the sample values of the question, this would generate this:
id_reg | idx1 | val1 | idx2 | val2
--------+------+------+------+------
1 | 1 | 10 | 1 | 3
1 | 2 | 10 | 2 | 2
1 | 3 | 20 | 3 | 4
1 | 4 | 20 | 4 | 3
1 | 5 | 10 | 5 | 6
Then use it as a subquery and add a WHERE clause to filter out as necessary.
For the example with 20 and 3 as the values to find at the same index:
SELECT DISTINCT id_reg FROM
( SELECT id_reg,
generate_subscripts(a1,1) as idx1,
unnest(a1) as val1,
generate_subscripts(a2,1) as idx2,
unnest(a2) as val2
FROM lb_reg_teste2 ) s
WHERE idx1=idx2 AND val1=20 AND val2=3;
If the number of elements of a1 and a2 differ, the subquery above generates extra rows (the least common multiple of the array sizes in Postgres 9.x; Postgres 10+ pads the shorter column with NULLs instead), so this will be less efficient but still produce the correct result, as far as I understood what you expect.
In this case, a variant would be to generate two distinct subqueries with the (values,indices) of each array and join them by the equality of the indices.
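A sketch of that variant, using the sample table and values from the question:
SELECT DISTINCT id_reg
FROM  (SELECT id_reg, generate_subscripts(a1, 1) AS idx, unnest(a1) AS val
       FROM   lb_reg_teste2) s1
JOIN  (SELECT id_reg, generate_subscripts(a2, 1) AS idx, unnest(a2) AS val
       FROM   lb_reg_teste2) s2 USING (id_reg, idx)
WHERE s1.val = 20 AND s2.val = 3;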