Postgres: perform query to select array index

How can I perform a query that returns a row if I have the wanted values at the same index in different columns? For example, here is some code:
select id_reg, a1, a2 from lb_reg_teste2;
id_reg | a1 | a2
--------+------------------+-------------
1 | {10,10,20,20,10} | {3,2,4,3,6}
(1 row)
The query would be something like:
select id_reg from lb_reg_teste2 where idx(a1, '20') = idx(a2, '3');
-- Should return id_reg = 1
I found this function, but it only returns the first occurrence of a value in an array. In this case, I need all occurrences.
CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS
$$
SELECT i FROM (
SELECT generate_series(array_lower($1,1),array_upper($1,1))
) g(i)
WHERE $1[i] = $2
LIMIT 1;
$$ LANGUAGE sql IMMUTABLE;
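For reference, on PostgreSQL 9.5 or later the built-in array_positions() already returns every position at which a value occurs. That may make the helper function unnecessary. The following is a sketch against the sample row above, and the CTE simply stands in for lb_reg_teste2:

```sql
-- Assumes PostgreSQL 9.5+ (array_positions was added in 9.5).
-- The CTE reproduces the sample row and stands in for the real table.
WITH lb_reg_teste2(id_reg, a1, a2) AS (
    VALUES (1, ARRAY[10,10,20,20,10], ARRAY[3,2,4,3,6])
)
SELECT id_reg
FROM lb_reg_teste2
-- array_positions() returns all 1-based positions of the value ({3,4} and {1,4} here);
-- && is true when the two position arrays share at least one element.
WHERE array_positions(a1, 20) && array_positions(a2, 3);
```

On the sample row this returns id_reg = 1, because both arrays have a match at position 4.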

You can extract the values from the arrays along with their indices, and then filter out the results.
If the arrays have the same number of elements, consider this query:
SELECT id_reg,
generate_subscripts(a1,1) as idx1,
unnest(a1) as val1,
generate_subscripts(a2,1) as idx2,
unnest(a2) as val2
FROM lb_reg_teste2
With the sample values from the question, this generates:
id_reg | idx1 | val1 | idx2 | val2
--------+------+------+------+------
1 | 1 | 10 | 1 | 3
1 | 2 | 10 | 2 | 2
1 | 3 | 20 | 3 | 4
1 | 4 | 20 | 4 | 3
1 | 5 | 10 | 5 | 6
Then use it as a subquery and add a WHERE clause to filter out as necessary.
For the example with 20 and 3 as the values to find at the same index:
SELECT DISTINCT id_reg FROM
( SELECT id_reg,
generate_subscripts(a1,1) as idx1,
unnest(a1) as val1,
generate_subscripts(a2,1) as idx2,
unnest(a2) as val2
FROM lb_reg_teste2 ) s
WHERE idx1=idx2 AND val1=20 AND val2=3;
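If a1 and a2 have different lengths, the query above still returns the correct result for the positions both arrays share, but the number of intermediate rows depends on the server version. Before PostgreSQL 10, several set-returning functions in the select list produce as many rows as the least common multiple of their lengths (up to N*M rows, where N and M are the array sizes), which is less efficient. From PostgreSQL 10 on, they run in lockstep and the shorter one is padded with NULLs.
In that case, a variant is to build two separate subqueries with the (value, index) pairs of each array and join them on equal indices.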
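Here is a sketch of that join-by-index variant. It assumes PostgreSQL 9.4 or later for WITH ORDINALITY, and the CTE reproduces the sample row from the question in place of the real table:

```sql
-- Assumes PostgreSQL 9.4+ (WITH ORDINALITY); sample data from the question.
WITH lb_reg_teste2(id_reg, a1, a2) AS (
    VALUES (1, ARRAY[10,10,20,20,10], ARRAY[3,2,4,3,6])
)
SELECT DISTINCT r.id_reg
FROM lb_reg_teste2 r
-- one row per element of a1, numbered 1..N as idx
CROSS JOIN LATERAL unnest(r.a1) WITH ORDINALITY AS x1(val, idx)
-- pair it only with the element of a2 at the same position
JOIN LATERAL unnest(r.a2) WITH ORDINALITY AS x2(val, idx)
  ON x2.idx = x1.idx
WHERE x1.val = 20
  AND x2.val = 3;
```

Each array is expanded in its own FROM item, so the pairing no longer depends on how the server handles several set-returning functions in one select list.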

Related

Is there a way to preserve the order of an array when using ANY in a Postgres query?

I'd like to be able to do a query using ANY that maintains the order of the array passed into the any function. Consider this simple example:
create table stuff (
id serial,
value int
);
insert into stuff (value) values (1), (2), (3), (4), (5);
select * from stuff where value = ANY(ARRAY[1,2,3,4,5]);
select * from stuff where value = ANY(ARRAY[5,4,3,2,1]);
which results in the same order for both queries, even though the arrays had a different order.
id | value
----+-------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
(5 rows)
id | value
----+-------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
(5 rows)
I'd like to have a shorthand way, if possible, to preserve results in order of array inside of the ANY. Is this possible?
So far I've had to write something like this, which feels a bit heavy-handed:
CREATE FUNCTION ordered_any (
ints int[]
) RETURNS int[] as $$
DECLARE
results int[];
i int;
value int;
BEGIN
FOR i IN 1 .. cardinality(ints) LOOP
SELECT f.id FROM stuff f
WHERE f.value = ints[i]
INTO value;
results = array_append(results, value);
END LOOP;
RETURN results;
END;
$$
LANGUAGE 'plpgsql';
select ordered_any(ARRAY[5,4,3,2,1]);
"Any" help is appreciated! No pun intended ;)
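With PostgreSQL 9.5 or later, you can compute each row's position in the array with array_position() and sort by it:
select
id,
value,
array_position(array[5,4,3,2,1], value) as ord
from stuff where value=any(array[5,4,3,2,1])
order by ord;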
output:
id | value | ord
----+-------+-----
5 | 5 | 1
4 | 4 | 2
3 | 3 | 3
2 | 2 | 4
1 | 1 | 5
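Another option, assuming PostgreSQL 9.4 or later, is to unnest the array WITH ORDINALITY and join it to the table, so that each element carries its own position. The CTE named stuff below only recreates the sample rows. Drop it to run against the real table.

```sql
-- Sample rows from the question; this CTE shadows the real "stuff" table.
WITH stuff(id, value) AS (
    VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
)
SELECT s.id, s.value
-- ord is the 1-based position of each element in the literal array
FROM unnest(ARRAY[5,4,3,2,1]) WITH ORDINALITY AS a(value, ord)
JOIN stuff s ON s.value = a.value
ORDER BY a.ord;
```

Unlike array_position(), which returns the first match, this join returns a row once per occurrence if the array contains duplicates. That may or may not be what you want.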

Join Postgresql array to table

I have the following tables:
create table top100
(
id integer not null,
top100ids integer[] not null
);
create table top100_data
(
id integer not null,
data_string text not null
);
Rows in table top100 look like:
1, {1,2,3,4,5,6...100}
Rows in table top100_data look like:
1, 'string of text, up to 500 chars'
I need to get the text values from table top100_data and join them with table top100.
So the result will be:
1, {'text1','text2','text3',...'text100'}
I am currently doing this on the application side. I select from top100 and iterate over all array items, then select from top100_data and iterate again, transforming the ids into their data_string text values.
This can be very slow on large data sets.
Is it possible to get the same result with a single SQL query?
You can unnest() and re-aggregate:
select t100.id, array_agg(t100d.data_string order by top100id)
from top100 t100 cross join
unnest(top100ids) as top100id join
top100_data t100d
on t100d.id = top100id
group by t100.id;
Or if you want to keep the original ordering:
select t100.id, array_agg(t100d.data_string order by top100id.n)
from top100 t100 cross join
unnest(top100ids) with ordinality as top100id(id, n) join
top100_data t100d
on t100d.id = top100id.id
group by t100.id;
Just use the unnest and array_agg functions in PostgreSQL. Your final SQL could look like this:
with core as (
select
id,
unnest(top100ids) as top_id
from
top100
)
select
c.id,
array_agg(d.data_string) as text_datas
from
core c
join
top100_data d on d.id = c.top_id
group by
c.id;
An example of unnest:
postgres=# select * from my_test;
id | top_ids
----+--------------
1 | {1,2,3,4,5}
2 | {6,7,8,9,10}
(2 rows)
postgres=# select id, unnest(top_ids) from my_test;
id | unnest
----+--------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
2 | 6
2 | 7
2 | 8
2 | 9
2 | 10
(10 rows)
An example of array_agg:
postgres=# select * from my_test_1 ;
id | content
----+---------
1 | a
1 | b
1 | c
1 | d
2 | x
2 | y
(6 rows)
postgres=# select id,array_agg(content) from my_test_1 group by id;
id | array_agg
----+-----------
1 | {a,b,c,d}
2 | {x,y}
(2 rows)
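Here is a self-contained sketch that combines both functions. The sample rows are made up: the id 99 and the text values are hypothetical. It uses WITH ORDINALITY to keep the original order and a LEFT JOIN, so an id with no match in top100_data shows up as NULL instead of silently shrinking the array.

```sql
-- Hypothetical sample rows standing in for top100 / top100_data.
WITH top100(id, top100ids) AS (
    VALUES (1, ARRAY[3,1,2]), (2, ARRAY[2,99])
), top100_data(id, data_string) AS (
    VALUES (1, 'text1'), (2, 'text2'), (3, 'text3')
)
SELECT t.id,
       -- rebuild the array in the original element order (n = position)
       array_agg(d.data_string ORDER BY u.n) AS texts
FROM top100 t
CROSS JOIN LATERAL unnest(t.top100ids) WITH ORDINALITY AS u(data_id, n)
-- LEFT JOIN keeps a NULL slot for ids missing from top100_data
LEFT JOIN top100_data d ON d.id = u.data_id
GROUP BY t.id
ORDER BY t.id;
```

With these rows, id 1 gives {text3,text1,text2} and id 2 gives {text2,NULL}.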

How to pivot or 'merge' rows with column names?

I have the following table:
crit_id | criterium | val1 | val2
----------+------------+-------+--------
1 | T01 | 9 | 9
2 | T02 | 3 | 5
3 | T03 | 4 | 9
4 | T01 | 2 | 3
5 | T02 | 5 | 1
6 | T03 | 6 | 1
I need to convert the values in 'criterium' into columns as a 'cross product' with val1 and val2. So the result has to look like this:
T01_val1 |T01_val2 |T02_val1 |T02_val2 | T03_val1 | T03_val2
---------+---------+---------+---------+----------+---------
9 | 9 | 3 | 5 | 4 | 9
2 | 3 | 5 | 1 | 6 | 1
Or to say differently: I need every value for all criteria to be in one row.
This is my current approach:
select
case when criterium = 'T01' then val1 else null end as T01_val1,
case when criterium = 'T01' then val2 else null end as T01_val2,
case when criterium = 'T02' then val1 else null end as T02_val1,
case when criterium = 'T02' then val2 else null end as T02_val2,
case when criterium = 'T03' then val1 else null end as T03_val1,
case when criterium = 'T03' then val2 else null end as T03_val2
from crit_table;
But the result does not look the way I want:
T01_val1 |T01_val2 |T02_val1 |T02_val2 | T03_val1 | T03_val2
---------+---------+---------+---------+----------+---------
9 | 9 | null | null | null | null
null | null | 3 | 5 | null | null
null | null | null | null | 4 | 9
What's the fastest way to achieve my goal?
Bonus question:
I have 77 criteria and seven different kinds of values for every criterium, so I would have to write 539 case statements. What's the best way to create them dynamically?
I'm working with PostgreSQL 9.4.
Prepare for crosstab
In order to use the crosstab() function, the data must be reorganized. You need a dataset with three columns (row number, criterium, value). To have all values in one column, you must unpivot the last two columns and change the names of the criteria at the same time. For the row number, you can use the rank() function partitioned by the new criteria.
select rank() over (partition by criterium order by crit_id), criterium, val
from (
select crit_id, criterium || '_v1' criterium, val1 val
from crit
union
select crit_id, criterium || '_v2' criterium, val2 val
from crit
) sub
order by 1, 2
rank | criterium | val
------+-----------+-----
1 | T01_v1 | 9
1 | T01_v2 | 9
1 | T02_v1 | 3
1 | T02_v2 | 5
1 | T03_v1 | 4
1 | T03_v2 | 9
2 | T01_v1 | 2
2 | T01_v2 | 3
2 | T02_v1 | 5
2 | T02_v2 | 1
2 | T03_v1 | 6
2 | T03_v2 | 1
(12 rows)
This dataset can be used in crosstab():
create extension if not exists tablefunc;
select * from crosstab($ct$
select rank() over (partition by criterium order by crit_id), criterium, val
from (
select crit_id, criterium || '_v1' criterium, val1 val
from crit
union
select crit_id, criterium || '_v2' criterium, val2 val
from crit
) sub
order by 1, 2
$ct$)
as ct (rank bigint, "T01_v1" int, "T01_v2" int,
"T02_v1" int, "T02_v2" int,
"T03_v1" int, "T03_v2" int);
rank | T01_v1 | T01_v2 | T02_v1 | T02_v2 | T03_v1 | T03_v2
------+--------+--------+--------+--------+--------+--------
1 | 9 | 9 | 3 | 5 | 4 | 9
2 | 2 | 3 | 5 | 1 | 6 | 1
(2 rows)
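If some criterium can be missing for a given row number, the one-argument crosstab() used above fills the value columns strictly left to right, so later values can shift into the wrong column. The two-argument form from the same tablefunc extension takes a second query that lists the categories and places each value by category name. The following sketch assumes the crit table and sample data from the question, recreated here as a temporary table:

```sql
-- Requires the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Sample data from the question.
CREATE TEMP TABLE crit (crit_id int, criterium text, val1 int, val2 int);
INSERT INTO crit VALUES
    (1, 'T01', 9, 9), (2, 'T02', 3, 5), (3, 'T03', 4, 9),
    (4, 'T01', 2, 3), (5, 'T02', 5, 1), (6, 'T03', 6, 1);

SELECT * FROM crosstab(
    -- source query: row name, category, value (ordered by row name)
    $src$
    SELECT rank() OVER (PARTITION BY criterium ORDER BY crit_id), criterium, val
    FROM (
        SELECT crit_id, criterium || '_v1' AS criterium, val1 AS val FROM crit
        UNION ALL
        SELECT crit_id, criterium || '_v2', val2 FROM crit
    ) sub
    ORDER BY 1, 2
    $src$,
    -- category query: one row per output column, in output order
    $cat$
    VALUES ('T01_v1'), ('T01_v2'), ('T02_v1'), ('T02_v2'), ('T03_v1'), ('T03_v2')
    $cat$
) AS ct (rank bigint, "T01_v1" int, "T01_v2" int,
                      "T02_v1" int, "T02_v2" int,
                      "T03_v1" int, "T03_v2" int);
```

On the sample data this yields the same two rows as above. A missing category would show up as NULL in its own column instead of shifting the others.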
Alternative solution
For 77 criteria with 7 parameters each, the above query may be troublesome. If you can accept a slightly different way of presenting the data, the issue becomes much easier.
select * from crosstab($ct$
select
rank() over (partition by criterium order by crit_id),
criterium,
concat_ws(' | ', val1, val2) vals
from crit
order by 1, 2
$ct$)
as ct (rank bigint, "T01" text, "T02" text, "T03" text);
rank | T01 | T02 | T03
------+-------+-------+-------
1 | 9 | 9 | 3 | 5 | 4 | 9
2 | 2 | 3 | 5 | 1 | 6 | 1
(2 rows)
In SQL Server (T-SQL), the same result can be produced with PIVOT:
CREATE TABLE #Table1
(crit_id int, criterium varchar(3), val1 int, val2 int)
;
INSERT INTO #Table1
(crit_id, criterium, val1, val2)
VALUES
(1, 'T01', 9, 9),
(2, 'T02', 3, 5),
(3, 'T03', 4, 9),
(4, 'T01', 2, 3),
(5, 'T02', 5, 1),
(6, 'T03', 6, 1)
;
select [T01] As [T01_val1],[T01-1] As [T01_val2],[T02] As [T02_val1],[T02-1] As [T02_val2],[T03] As [T03_val1],[T03-1] As [T03_val2] from (
select T.criterium,T.val1,ROW_NUMBER()OVER(PARTITION BY T.criterium ORDER BY T.crit_id) RN from (
select crit_id, criterium, val1 from #Table1
UNION ALL
select crit_id, criterium+'-'+'1', val2 from #Table1)T)PP
PIVOT (MAX(val1) FOR criterium IN([T01],[T02],[T03],[T01-1],[T02-1],[T03-1]))P
I agree with Michael's comment that this requirement looks a bit weird, but if you really need it that way, you were on the right track with your solution. It just needs a little additional code (and small corrections wherever val1 and val2 were mixed up):
select
sum(case when criterium = 'T01' then val1 else null end) as T01_val1,
sum(case when criterium = 'T01' then val2 else null end) as T01_val2,
sum(case when criterium = 'T02' then val1 else null end) as T02_val1,
sum(case when criterium = 'T02' then val2 else null end) as T02_val2,
sum(case when criterium = 'T03' then val1 else null end) as T03_val1,
sum(case when criterium = 'T03' then val2 else null end) as T03_val2
from
crit_table
group by
trunc((crit_id-1)/3.0)
order by
trunc((crit_id-1)/3.0);
This works as follows. To aggregate the result you posted into the result you would like to have, the first helpful observation is that the desired result has fewer rows than your preliminary one. So some kind of grouping is necessary, and the key question is: "What's the grouping criterion?" In this case, it's rather non-obvious. It's the criterion ID minus 1 (to start counting at 0), divided by 3 and truncated. The 3 comes from the number of different criteria, and the formula assumes that the crit_id values are consecutive and that the criteria repeat in a fixed cycle. Once that puzzle is solved, it is easy to see that among the input rows aggregated into the same result row, there is only one non-null value per column. That means the choice of aggregate function is not very important, since it only needs to return that single non-null value. I used sum in my code snippet, but you could just as well use min or max.
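Here is a sketch of the same pivot with a grouping key that does not depend on crit_id being consecutive or on the number of criteria. row_number() numbers the occurrences of each criterium, and rows with the same number are collapsed into one output row. The CTE reproduces the question's sample data, and max() plays the same role as sum() above.

```sql
-- Sample data from the question; crit_table is a CTE here.
WITH crit_table(crit_id, criterium, val1, val2) AS (
    VALUES (1, 'T01', 9, 9), (2, 'T02', 3, 5), (3, 'T03', 4, 9),
           (4, 'T01', 2, 3), (5, 'T02', 5, 1), (6, 'T03', 6, 1)
), numbered AS (
    -- rn = 1 for the first occurrence of each criterium, 2 for the second, ...
    SELECT *, row_number() OVER (PARTITION BY criterium ORDER BY crit_id) AS rn
    FROM crit_table
)
SELECT max(CASE WHEN criterium = 'T01' THEN val1 END) AS t01_val1,
       max(CASE WHEN criterium = 'T01' THEN val2 END) AS t01_val2,
       max(CASE WHEN criterium = 'T02' THEN val1 END) AS t02_val1,
       max(CASE WHEN criterium = 'T02' THEN val2 END) AS t02_val2,
       max(CASE WHEN criterium = 'T03' THEN val1 END) AS t03_val1,
       max(CASE WHEN criterium = 'T03' THEN val2 END) AS t03_val2
FROM numbered
GROUP BY rn
ORDER BY rn;
```

On the sample data this gives (9, 9, 3, 5, 4, 9) and (2, 3, 5, 1, 6, 1).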
As for the bonus question: Use a code generator query that generates the query you need. The code looks like this (with only three types of values to keep it brief):
with value_table as /* possible kinds of values, add the remaining ones here */
(select 'val1' value_type union
select 'val2' value_type union
select 'val3' value_type )
select contents from (
select 0 order_id, 'select' contents
union
select row_number() over (order by criterium, value_type) order_id,
'max(case when criterium = '''||criterium||''' then '||value_type||' else null end) '||criterium||'_'||value_type||',' contents
from (select distinct criterium from crit_table) c
cross join value_table
union select 9999999 order_id,
' from crit_table group by trunc((crit_id-1)/3.0) order by trunc((crit_id-1)/3.0);' contents
) v
order by order_id;
This basically just uses a string template of your query and inserts the appropriate combinations of criteria and value columns. You could even get rid of the WITH clause by reading column names from information_schema.columns, but I think the basic idea is clearer in the version above. Note that the generated code contains one comma too many, directly after the last column (before the FROM clause). It's easier to delete that by hand afterwards than to correct it in the generator.
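A variant of the generator that avoids the stray comma is sketched below. It assumes PostgreSQL 9.1 or later for format(). string_agg() joins the column expressions with commas, %L quotes the criterium as a literal, and %I quotes the column names as identifiers. The crit_table CTE and the two value columns are placeholders for your real table and your seven value columns. Following the grouping logic above, the GROUP BY divisor is taken from the number of distinct criteria.

```sql
-- Placeholders: replace the CTE with your real crit_table and list all value columns.
WITH crit_table(crit_id, criterium, val1, val2) AS (
    VALUES (1, 'T01', 9, 9), (2, 'T02', 3, 5), (3, 'T03', 4, 9),
           (4, 'T01', 2, 3), (5, 'T02', 5, 1), (6, 'T03', 6, 1)
), value_table(value_type) AS (
    VALUES ('val1'), ('val2')
), crits AS (
    SELECT DISTINCT criterium FROM crit_table
)
SELECT 'SELECT '
       -- one "max(CASE ...) AS name" per criterium/value pair, comma-separated
       || string_agg(
            format('max(CASE WHEN criterium = %L THEN %I END) AS %I',
                   c.criterium, v.value_type, c.criterium || '_' || v.value_type),
            ', ' ORDER BY c.criterium, v.value_type)
       -- divisor = number of distinct criteria (same assumption as the answer above)
       || format(' FROM crit_table GROUP BY trunc((crit_id - 1) / %s.0)'
                 ' ORDER BY trunc((crit_id - 1) / %s.0);',
                 (SELECT count(*) FROM crits), (SELECT count(*) FROM crits))
       AS generated_sql
FROM crits c
CROSS JOIN value_table v;
```

The single generated_sql value can then be copied and run as a normal query. Because %I preserves case, the output columns come out quoted, such as "T01_val1".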

Why does the IN operator return distinct rows when passed duplicate values (value1, value1, ...)?

Using SQL Server 2008
Why does the IN operator return distinct values when selecting duplicate values?
Table #temp
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
2 | Second 1 | second 2 | second 3
When I execute this query
SELECT * FROM #temp WHERE x IN (1,1)
it will return
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
How can I make it so it returns this instead:
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
1 | first 1 | first 2 | first 3
What is the alternative of IN in this case?
If you want to return duplicates, you need to phrase the query as a join. IN simply tests a condition on each row. Whether the condition is met once or twice doesn't matter: the row either stays in or gets filtered out.
with xes as (
select 1 as x union all
select 1 as x
)
SELECT *
FROM #temp t join
xes
on t.x = xes.x;
EDIT:
If you have a subquery, then it is even simpler:
select *
from #temp t join
(<subquery>) s
on t.x = s.x
This would be a "normal" use of a join.
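Here is a sketch of the same idea without a CTE of unions. A table value constructor (VALUES in a derived table, which both SQL Server 2008 and PostgreSQL accept) lists the wanted keys, duplicates included. temp_t stands in for #temp. Its column names c1 to c3 are hypothetical, because the real ones (1, 2, 3) would need quoting.

```sql
-- temp_t is a stand-in for #temp; c1..c3 are hypothetical column names.
WITH temp_t AS (
    SELECT *
    FROM (VALUES (1, 'first 1', 'first 2', 'first 3'),
                 (2, 'Second 1', 'second 2', 'second 3')) AS v(x, c1, c2, c3)
)
SELECT t.*
FROM temp_t t
-- one output row per matching key row, so the duplicate 1 yields two rows
JOIN (VALUES (1), (1)) AS wanted(x)
  ON t.x = wanted.x;
```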

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a_id and b_id refer to points in a different table; distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column is a_id. If I wanted to keep only the closest b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from mytable group by a_id) as x where mytable.a_id = x.a_id and mytable.distance > x.dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However, I want to keep the 10 closest points for each a_id. I could do this with a plpgsql function, but I'm curious whether there is a more SQL-y way.
min() and max() return the smallest and largest values. If there were an aggregate function like nth() that returned the nth largest/smallest value, I could do this in a similar manner to the above.
I'm using PostgreSQL.
Try this:
SELECT *
FROM (
SELECT a_id, (
SELECT b_id
FROM mytable mib
WHERE mib.a_id = ma.a_id
ORDER BY
distance
LIMIT 1 OFFSET s
) AS b_id
FROM (
SELECT DISTINCT a_id
FROM mytable mia
) ma, generate_series (0, 9) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love Postgres, so I took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable t1
WHERE id IN
(SELECT id FROM mytable t2 WHERE t2.a_id = t1.a_id
ORDER BY distance LIMIT 10)
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1343.2
PostgreSQL 8.4 and later support the window function rank(), so you can try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
from mytable
) ranked where rnk <= 10;
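The original goal is to delete everything except the 10 closest points per a_id, so the window function can drive the DELETE directly. This sketch assumes a unique id column, as in the question. It uses row_number() rather than rank(), because rank() gives tied rows the same rank and could leave more than k rows per group. It also uses k = 2 on a temporary copy of the sample rows (without the delete flag column) so the effect is visible. Replace 2 with 10 for the real table.

```sql
-- Temporary copy of the question's sample rows (delete flag column omitted).
CREATE TEMP TABLE mytable (id int PRIMARY KEY, a_id int, b_id int, distance numeric);
INSERT INTO mytable VALUES
    (1, 1, 1, 1), (2, 1, 2, 0.2345), (3, 1, 3, 100),
    (4, 2, 1, 1343.2), (5, 2, 2, 0.45), (6, 2, 3, 110);

-- Keep the k = 2 closest rows per a_id; delete the rest.
DELETE FROM mytable
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY a_id ORDER BY distance) AS rn
        FROM mytable
    ) ranked
    WHERE rn > 2
);
```

Afterwards, rows 1, 2, 5 and 6 remain. The farthest point of each a_id (ids 3 and 4) has been removed.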
This SQL finds the Nth lowest distance and should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS (note: performance is poor because of the correlated subquery):
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query cannot be processed independently of the outer query, since it uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.
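Here is a worked example of that idea on the question's sample rows, with N = 2. It makes one change that is not in the original query: the subquery also matches tbl2.a_id = tbl1.a_id. The count therefore runs within each a_id, and the result is the second-closest point per a_id rather than across the whole table.

```sql
-- Sample rows from the question, as a CTE.
WITH mytable(id, a_id, b_id, distance) AS (
    VALUES (1, 1, 1, 1.0), (2, 1, 2, 0.2345), (3, 1, 3, 100.0),
           (4, 2, 1, 1343.2), (5, 2, 2, 0.45), (6, 2, 3, 110.0)
)
SELECT *
FROM mytable tbl1
-- N = 2: keep rows with exactly N - 1 = 1 smaller distinct distance in the same a_id
WHERE 2 - 1 = (
    SELECT count(DISTINCT tbl2.distance)
    FROM mytable tbl2
    WHERE tbl2.a_id = tbl1.a_id
      AND tbl2.distance < tbl1.distance
);
```

This returns id 1 (distance 1 for a_id 1) and id 6 (distance 110 for a_id 2).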