How to get an array in postgres where the array size is greater than 1 - sql

I have a table that looks like this:
val | fkey | num
----+------+-----
  1 |    1 |   1
  1 |    2 |   1
  1 |    3 |   1
  2 |    3 |   1
What I would like to do is return a set of rows in which values are grouped by 'val', with an array of fkeys, but only where the array of fkeys has more than one element. So, in the above example, the return would look something like:
1 | [1,2,3]
I have the following query, which aggregates the arrays:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val;
But this returns something like:
1 | [1,2,3]
2 | [3]
What would be the best way of doing this? I guess one possibility would be to use my existing query as a subquery, and do a sum / count on that, but that seems inefficient. Any feedback would really help!

Use a HAVING clause to keep only the groups that have more than one fkey:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val
HAVING count(fkey) > 1;
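With the sample data above, the val = 2 group only has one fkey, so it is filtered out and the query returns:
val | array_agg
----+-----------
  1 | {1,2,3}
(Postgres displays arrays with curly braces rather than square brackets.)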

Using the HAVING clause as @Fireblade pointed out is probably more efficient, but you can also leverage subqueries:
SQLFiddle: Subquery
SELECT * FROM (
    SELECT val, array_agg(fkey) AS fkeys
    FROM mytable
    GROUP BY val
) array_creation
WHERE array_length(fkeys, 1) > 1
You could also use the array_length function in the HAVING clause, but again, @Fireblade has used count(), which should be more efficient. Still:
SQLFiddle: Having Clause
SELECT val, array_agg(fkey) fkeys
FROM mytable
GROUP BY val
HAVING array_length(array_agg(fkey),1) > 1
This isn't a total loss, though. Using array_length in the HAVING clause can be useful if you want a distinct list of fkeys:
SELECT val, array_agg(DISTINCT fkey) fkeys
There may still be other ways, but this method is more descriptive, which may make your SQL easier to understand when you come back to it years from now.
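For completeness, a distinct-list version of the full query might look like this (an untested sketch; note that a plain count(fkey) would still count duplicate fkeys, which is why the length check is done on the de-duplicated array):
SELECT val, array_agg(DISTINCT fkey) AS fkeys
FROM mytable
GROUP BY val
HAVING array_length(array_agg(DISTINCT fkey), 1) > 1;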

Related

Where clause to select rows with only unique values

First, let me describe my problem. I need to ignore all repeated values in my select query. For example, if I have something like this:
| Other columns | THE COLUMN I'm working with |
| ............. | Value 1                     |
| ............. | Value 2                     |
| ............. | Value 2                     |
I'd like to get the result containing only the row with "Value 1"
Now, because of the specifics of my task, I need to validate it with a subquery.
So I've figured out something like this:
NOT EXISTS (SELECT 1 FROM TABLE fpd WHERE fpd.value = fp.value HAVING count(*) > 2)
It works like I want, but I'm aware it's slow. Also, I've tried putting 1 instead of 2 in the HAVING comparison, but it just returns zero results. Could you explain where the 2 comes from?
I would suggest window functions:
select t.*
from (select t.*, count(*) over (partition by value) as cnt
from fpd t
) t
where cnt = 1;
Alternatively, you can use not exists with a primary key:
where not exists (select 1
from fpd fpd2
where fpd2.value = fp.value and
fpd2.primarykey <> fp.primarykey
)
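In either case, an index on value should speed up the correlated lookup; a sketch (the index name is my own invention):
CREATE INDEX fpd_value_idx ON fpd (value);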
SELECT DISTINCT myColumn FROM myTable

How to efficiently get a value from the last row in bulk on SQL Server

I have a table like so
Id | Type | Value
--------------------
0 | Big | 2
1 | Big | 3
2 | Small | 3
3 | Small | 3
I would like to get a table like this
Type | Last Value
--------------------
Small | 3
Big | 3
How can I do this? I understand there is a SQL Server function called LAST_VALUE(...) OVER (...), but I can't get it to work with GROUP BY.
I've also tried using SELECT MAX(Id) and SELECT TOP 1, but this seems a bit inefficient since there would be a subquery for each value. The queries take too long when the table has a few million rows.
Is there a way to quickly get the last value for these, perhaps using LAST_VALUE?
You can do it using ROW_NUMBER():
select
    type,
    value
from
(
    select
        type,
        value,
        row_number() over (partition by type order by id desc) as RN
    from likeso -- assuming the sample table is named likeso, as in the answer below
) TMP
where RN = 1
Can't test this now since SQL Fiddle doesn't seem to work, but hopefully that's ok.
The most efficient method might be not exists, which uses an anti-join for the underlying operator:
select type, value
from likeso l
where not exists (select 1 from likeso l2 where l2.type = l.type and l2.id > l.id)
For performance, you want an index on likeso(type, id).
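For example (the index name is my own):
CREATE INDEX likeso_type_id ON likeso (type, id);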
I really wonder if there is a more efficient solution, but I use the following query for such needs:
select Id, Type, Value
from (
    select *, max(Id) over (partition by Type) as LastId
    from #Table
) T
where Id = LastId

Joining arrays within group by clause

We have a problem grouping arrays into a single array.
We want to join the values from two columns into one single array and aggregate these arrays of multiple rows.
Given the following input:
| id | name | col_1 | col_2 |
| 1 | a | 1 | 2 |
| 2 | a | 3 | 4 |
| 4 | b | 7 | 8 |
| 3 | b | 5 | 6 |
We want the following output:
| a | { 1, 2, 3, 4 } |
| b | { 5, 6, 7, 8 } |
The order of the elements is important and should correlate with the id of the aggregated rows.
We tried the array_agg() function:
SELECT array_agg(ARRAY[col_1, col_2]) FROM mytable GROUP BY name;
Unfortunately, this statement raises an error:
ERROR: could not find array type for data type character varying[]
It seems to be impossible to merge arrays in a group by clause using array_agg().
Any ideas?
UNION ALL
You could "unpivot" with UNION ALL first:
SELECT name, array_agg(c) AS c_arr
FROM (
SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
UNION ALL
SELECT name, id, 2, col2 FROM tbl
ORDER BY name, id, rnk
) sub
GROUP BY 1;
Adapted to produce the order of values you later requested. The manual:
The aggregate functions array_agg, json_agg, string_agg, and xmlagg,
as well as similar user-defined aggregate functions, produce
meaningfully different result values depending on the order of the
input values. This ordering is unspecified by default, but can be
controlled by writing an ORDER BY clause within the aggregate call, as
shown in Section 4.2.7. Alternatively, supplying the input values from
a sorted subquery will usually work.
Bold emphasis mine.
LATERAL subquery with VALUES expression
LATERAL requires Postgres 9.3 or later.
SELECT t.name, array_agg(c) AS c_arr
FROM (SELECT * FROM tbl ORDER BY name, id) t
CROSS JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP BY 1;
Same result. Only needs a single pass over the table.
Custom aggregate function
Or you could create a custom aggregate function, as discussed in these related answers:
Selecting data into a Postgres array
Is there something like a zip() function in PostgreSQL that combines two arrays?
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
Then you can:
SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Or, typically faster, while not standard SQL:
SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM (SELECT * FROM tbl ORDER BY name, id) t
GROUP BY 1;
The added ORDER BY id (which can be appended to such aggregate functions) guarantees your desired result:
a | {1,2,3,4}
b | {5,6,7,8}
Or you might be interested in this alternative:
SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Which produces 2-dimensional arrays:
a | {{1,2},{3,4}}
b | {{5,6},{7,8}}
The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later, which added the capability of aggregating arrays:
SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Same result. The manual:
input arrays concatenated into array of one higher dimension (inputs
must all have same dimensionality, and cannot be empty or null)
So it's not exactly the same as our custom aggregate function array_agg_mult().
select n, array_agg(c) as c
from (
select n, unnest(array[c1, c2]) as c
from t
) s
group by n
Or simpler:
select
n,
array_agg(c1) || array_agg(c2) as c
from t
group by n
To address the new ordering requirement:
select n, array_agg(c order by id, o) as c
from (
select
id, n,
unnest(array[c1, c2]) as c,
unnest(array[1, 2]) as o
from t
) s
group by n
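As an aside, on Postgres 9.4 or later the ordering number can come from WITH ORDINALITY instead of the parallel-unnest trick; a sketch of the same query:
select n, array_agg(c order by id, o) as c
from t, unnest(array[c1, c2]) with ordinality as u(c, o)
group by n;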

oracle - getting 1 or 0 records based on the number of occurrences of a non-unique field

I have a table MYTABLE
N_REC | MYFIELD
------+--------
    1 | foo
    2 | foo
    3 | bar
where N_REC is the primary key and MYFIELD is a non-unique field.
I need to query this table on MYFIELD and extract the associated N_REC, but only if there is only one occurrence of MYFIELD; otherwise I need no records returned.
So if I go with MYFIELD='bar' I will get 3, if I go with MYFIELD='foo' I will get no records.
I went with the following query:
select * from
(
    select
        n_rec,
        ( select count(*) from mytable where myfield = my.myfield ) as counter
    from mytable my
    where myfield = ?
)
where counter = 1
While it gives me the desired result, I feel like I'm running the same query twice.
Are there better ways to achieve what I'm doing?
I think that this should do what you want:
SELECT
my_field,
MAX(n_rec)
FROM
My_Table
GROUP BY
my_field
HAVING
COUNT(*) = 1
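Since you only ever look up one MYFIELD value at a time, you can also push the filter into the grouped query itself; a sketch of that variant:
SELECT MAX(n_rec)
FROM My_Table
WHERE my_field = ?
GROUP BY my_field
HAVING COUNT(*) = 1
This returns either the single N_REC or no rows at all, matching the 1-or-0 requirement.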
You might also try the analytic or windowing version of count(*) and compare plans to the other options:
select n_rec, my_field
from (select n_rec, my_field
, count(*) over (partition by my_field) as Counter
from myTable
where my_field = ?)
where Counter = 1

PostgreSQL if query?

Is there a way to select records using an if statement?
My table looks like this:
id | num | dis
1 | 4 | 0.5234333
2 | 4 | 8.2234
3 | 8 | 2.3325
4 | 8 | 1.4553
5 | 4 | 3.43324
And I want to select the num and dis where dis is the lowest number... So, a query that will produce the following results:
id | num | dis
1 | 4 | 0.5234333
4 | 8 | 1.4553
If you want all the rows with the minimum value within the group:
SELECT id, num, dis
FROM table1 T1
WHERE dis = (SELECT MIN(dis) FROM table1 T2 WHERE T1.num = T2.num)
Or you could use a join to get the same result:
SELECT T1.id, T1.num, T1.dis
FROM table1 T1
JOIN (
SELECT num, MIN(dis) AS dis
FROM table1
GROUP BY num
) T2
ON T1.num = T2.num AND T1.dis = T2.dis
If you only want a single row from each group, even if there are ties, then you can use this:
SELECT id, dis, num FROM (
SELECT id, dis, num, ROW_NUMBER() OVER (PARTITION BY num ORDER BY dis) rn
FROM table1
) T1
WHERE rn = 1
Unfortunately this won't be very efficient. If you need something more efficient then please see Quassnoi's page on selecting rows with a groupwise maximum for PostgreSQL. Here he suggests several ways to perform this query and explains the performance of each. The summary from the article is as follows:
Unlike MySQL, PostgreSQL implements several clean and documented ways to select the records holding group-wise maximums, including window functions and DISTINCT ON.
However, due to the lack of loose index scan support in PostgreSQL's optimizer and the less efficient usage of indexes in PostgreSQL, the queries using these functions take too long.
To work around these problems and improve the queries against low-cardinality grouping conditions, a certain solution described in the article should be used.
This solution uses recursive CTEs to emulate a loose index scan and is very efficient if the grouping columns have low cardinality.
Use this:
SELECT DISTINCT ON (num) id, num, dis
FROM tbl
ORDER BY num, dis
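Note that DISTINCT ON keeps the first row per num according to the ORDER BY, so ties on dis are broken arbitrarily; append id to the ORDER BY if you need a deterministic winner. An index matching the sort should also help (the name is my own):
CREATE INDEX tbl_num_dis_idx ON tbl (num, dis);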
Or, if you intend to use another RDBMS in the future, use this:
select * from tbl a where dis =
(select min(dis) from tbl b where b.num = a.num)
If you need IF logic, you can use PL/pgSQL:
http://www.postgresql.org/docs/8.4/interactive/plpgsql-control-structures.html
But try to solve your issue with plain SQL first if possible; it will be faster. Use PL/pgSQL only when SQL can't solve your problem.
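For illustration, a minimal PL/pgSQL function with an IF branch might look like this (the function name and threshold are made up):
CREATE OR REPLACE FUNCTION dis_label(d double precision)
RETURNS text AS $$
BEGIN
    -- hypothetical example: classify a dis value
    IF d < 1 THEN
        RETURN 'small';
    ELSE
        RETURN 'large';
    END IF;
END;
$$ LANGUAGE plpgsql;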