PostgreSQL group by column with aggregate - sql

I need to group by id and select the task with min/max seq as start and end
id | task | seq
----+------+-----
1 | aaa | 1
1 | bbb | 2
1 | ccc | 3
SELECT
id,
CASE WHEN seq = MIN(seq) THEN task END AS start,
CASE WHEN seq = MAX(seq) THEN task END AS end
FROM table
GROUP BY id;
But this results in
ERROR: column "seq" must appear in the GROUP BY clause or be used in an aggregate function
But I do not want to group by seq.

One method uses arrays:
SELECT id,
(ARRAY_AGG(task ORDER BY seq ASC))[1] as start_task,
(ARRAY_AGG(task ORDER BY seq DESC))[1] as end_task
FROM table
GROUP BY id;
Another method uses window functions with SELECT DISTINCT:
select distinct id,
first_value(task) over (partition by id order by seq) as start_task,
first_value(task) over (partition by id order by seq desc) as end_task
from t;

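You can use window functions with a derived table. Note that this returns the rows holding the minimum and maximum seq for each id (up to two rows per id), not a single pivoted row: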
select id, task, min_seq as start, max_seq as "end"
from (
select id, task, seq,
max(seq) over (partition by id) as max_seq,
min(seq) over (partition by id) as min_seq
from the_table
) t
where seq in (max_seq, min_seq)

One option here would be to use ROW_NUMBER along with aggregation and pivoting logic:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) rn_min,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) rn_max
FROM yourTable
)
SELECT
id,
MAX(CASE WHEN rn_min = 1 THEN task END) AS start,
MAX(CASE WHEN rn_max = 1 THEN task END) AS end
FROM cte
GROUP BY
id;
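To check the ROW_NUMBER pivot against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ is assumed for window functions). The table name tasks and the function name start_end_tasks are made up for illustration, and the output columns are called start_task and end_task to stay clear of reserved words.

```python
import sqlite3

def start_end_tasks():
    # In-memory database loaded with the three sample rows from the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tasks (id INTEGER, task TEXT, seq INTEGER);
        INSERT INTO tasks VALUES (1, 'aaa', 1), (1, 'bbb', 2), (1, 'ccc', 3);
    """)
    rows = conn.execute("""
        WITH cte AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) AS rn_min,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) AS rn_max
            FROM tasks
        )
        SELECT id,
               -- CASE yields NULL for non-matching rows; MAX ignores NULLs
               MAX(CASE WHEN rn_min = 1 THEN task END) AS start_task,
               MAX(CASE WHEN rn_max = 1 THEN task END) AS end_task
        FROM cte
        GROUP BY id
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

print(start_end_tasks())
```

For the sample data this should give one row for id 1 with aaa as the start task and ccc as the end task.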

Related

How to get columns from multiple rows in a single row in SQL

I want to get 2 columns col_a and col_b's values for min and max of some other column. For example:
id | last_updated | col_a | col_b
----+--------------+-------+-------
1 | 2021-01-01 | abc | xyz
1 | 2021-01-02 | abc_0 | xyz_0
1 | 2021-01-03 | abc_1 | xyz_1
1 | 2021-01-04 | abc_2 | xyz_2
2 | 2021-01-01 | abc | xyz
2 | 2021-01-01 | abc | xyz
...
I want to get the result:
|1|abc|abc_2|xyz|xyz_2|
That is, group by id and take the values of col_a and col_b from the rows with the minimum and maximum value of another column (last_updated).
I came up with the following query:
select id, max(last_updated), min(last_updated)
from my_table
group by id
This gives me the id and min and max dates but not the other 2 columns. I'm not sure how to get the values for the other 2 columns for both dates in same query.
You can use the MIN and MAX analytic functions as follows:
select id,
max(case when mindt = last_updated then col_a end) as min_col_a,
max(case when maxdt = last_updated then col_a end) as max_col_a,
max(case when mindt = last_updated then col_b end) as min_col_b,
max(case when maxdt = last_updated then col_b end) as max_col_b
from
(select t.*,
min(last_updated) over (partition by id) as mindt,
max(last_updated) over (partition by id) as maxdt
from your_table t) t
group by id
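As a quick check of the MIN/MAX analytic approach, here is a sketch that runs the same query shape against the sample rows using Python's sqlite3 module (SQLite 3.25+ assumed). Storing last_updated as ISO-formatted text is my assumption; it makes string comparison match date order. The function name first_last_values is hypothetical.

```python
import sqlite3

def first_last_values():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, last_updated TEXT, col_a TEXT, col_b TEXT)"
    )
    # Sample rows from the question; dates stored as ISO text (an assumption).
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
        (1, "2021-01-01", "abc", "xyz"),
        (1, "2021-01-02", "abc_0", "xyz_0"),
        (1, "2021-01-03", "abc_1", "xyz_1"),
        (1, "2021-01-04", "abc_2", "xyz_2"),
        (2, "2021-01-01", "abc", "xyz"),
        (2, "2021-01-01", "abc", "xyz"),
    ])
    rows = conn.execute("""
        SELECT id,
               MAX(CASE WHEN mindt = last_updated THEN col_a END) AS min_col_a,
               MAX(CASE WHEN maxdt = last_updated THEN col_a END) AS max_col_a,
               MAX(CASE WHEN mindt = last_updated THEN col_b END) AS min_col_b,
               MAX(CASE WHEN maxdt = last_updated THEN col_b END) AS max_col_b
        FROM (
            -- attach each group's earliest and latest date to every row
            SELECT t.*,
                   MIN(last_updated) OVER (PARTITION BY id) AS mindt,
                   MAX(last_updated) OVER (PARTITION BY id) AS maxdt
            FROM my_table t
        ) t
        GROUP BY id
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

for row in first_last_values():
    print(row)
```

For id 1 this should return abc, abc_2, xyz, xyz_2, matching the requested result. If several rows tie on the minimum or maximum date, the outer MAX picks the largest value among the tied rows.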
We can use ROW_NUMBER, twice, to find the first and last rows, as ordered by last_updated, for each id group of records. Then, aggregate by id and pivot out columns for the various col_a and col_b values.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated) rn_min,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated DESC) rn_max
FROM yourTable
)
SELECT
id,
MAX(CASE WHEN rn_min = 1 THEN col_a END) AS col_a_min,
MAX(CASE WHEN rn_max = 1 THEN col_a END) AS col_a_max,
MAX(CASE WHEN rn_min = 1 THEN col_b END) AS col_b_min,
MAX(CASE WHEN rn_max = 1 THEN col_b END) AS col_b_max
FROM cte
GROUP BY id;
Not the neatest solution, but it demonstrates another way to obtain the data you want. We join the table to itself because we want data from two rows, then use cross apply to restrict those rows to the first and last.
select T1.id, T2.col_a, T1.col_a, T2.col_b, T1.col_b
from #my_table T1
inner join #my_table T2 on T1.id = T2.id
cross apply (
select id, max(last_updated) MaxLastUpdated, min(last_updated) MinLastUpdated
from #my_table
group by id
) X
where X.id = T1.id and T1.last_updated = X.MaxLastUpdated and T2.last_updated = X.MinLastUpdated;
With the sample data provided, this appears to perform worse than the row_number() solution. The fastest solution is the one using analytic functions.
select id,
max(last_updated) last_updatedMax, min(last_updated) last_updatedMin,
max(col_aMax) col_aMax, max(col_aMin) col_aMin,
max(col_bMax) col_bMax, max(col_bMin) col_bMin
from
(
select
*
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated desc) as col_aMax
, first_value(col_a) OVER (PARTITION BY id ORDER BY last_updated asc) as col_aMin
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated desc) as col_bMax
, first_value(col_b) OVER (PARTITION BY id ORDER BY last_updated asc) as col_bMin
from my_table
) t
group by id

Group sequential repeated values sqlite

I have data that repeats sequentially:
A
A
A
B
B
B
A
A
A
I need to group them like this
A
B
A
What is the best approach to do so using sqlite?
Assuming that you have a column that defines the ordering of the rows, say id, you can address this gaps-and-islands problem with window functions:
select col, count(*) cnt, min(id) first_id, max(id) last_id
from (
select t.*,
row_number() over(order by id) rn1,
row_number() over(partition by col order by id) rn2
from mytable t
) t
group by col, rn1 - rn2
order by min(id)
I added a few columns to the resultset that give more information about the content of each group.
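To see why rn1 - rn2 identifies each island, here is a small sketch that runs the query on the nine sample values with Python's sqlite3 module (SQLite 3.25+ assumed). The id column is an assumed auto-assigned integer key, since the question does not show one, and the function name islands is made up.

```python
import sqlite3

def islands():
    conn = sqlite3.connect(":memory:")
    # id is an assumed ordering column; SQLite assigns 1..9 automatically.
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col TEXT)")
    conn.executemany(
        "INSERT INTO mytable (col) VALUES (?)", [(c,) for c in "AAABBBAAA"]
    )
    rows = conn.execute("""
        SELECT col, COUNT(*) AS cnt, MIN(id) AS first_id, MAX(id) AS last_id
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (ORDER BY id) AS rn1,
                   ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS rn2
            FROM mytable t
        ) t
        -- within a run of equal values rn1 and rn2 advance together,
        -- so their difference is constant for the whole run
        GROUP BY col, rn1 - rn2
        ORDER BY MIN(id)
    """).fetchall()
    conn.close()
    return rows

for row in islands():
    print(row)
```

Here the differences are 0 for the first run of A, 3 for the run of B, and 3 for the second run of A, which is why col has to be part of the GROUP BY as well.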
If you have a column that defines the order of the rows, like an id, you can use the window function LEAD():
select col
from (
select col, lead(col, 1, '') over (order by id) next_col
from tablename
)
where col <> next_col
Results:
| col |
| --- |
| A |
| B |
| A |
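The LEAD() approach can be tried the same way. This is a sketch with Python's sqlite3 module (SQLite 3.25+ assumed); the id column is again an assumed integer key and run_boundaries is a hypothetical name. I carry id through the subquery only so that the final ORDER BY is explicit.

```python
import sqlite3

def run_boundaries():
    conn = sqlite3.connect(":memory:")
    # id is an assumed ordering column; SQLite assigns 1..9 automatically.
    conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, col TEXT)")
    conn.executemany(
        "INSERT INTO tablename (col) VALUES (?)", [(c,) for c in "AAABBBAAA"]
    )
    rows = conn.execute("""
        SELECT col
        FROM (
            -- next_col is the following row's value, or '' on the last row
            SELECT id, col, LEAD(col, 1, '') OVER (ORDER BY id) AS next_col
            FROM tablename
        )
        WHERE col <> next_col   -- keep only the last row of each run
        ORDER BY id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(run_boundaries())
```

The default value '' only works as a sentinel if col never contains an empty string; pick a different default otherwise.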

Consolidate Rows with rank

Current:
When RNK is 1, consolidate the IDs as shown; when RNK is 0, keep the row as it is.
Please help me work out how to do this.
Required:
This is a gaps-and-islands problem. However, you really only care about the islands when rnk = 1. So, a convenient way to calculate them is the cumulative sum of rnk = 0. Then the rest is aggregation and combining the ids:
select (case when min(id) = max(id) then min(id)
else min(id) || '-' || max(id)
end) as id,
rnk
from (select t.*, sum(1 - rnk) over (order by id) as grp
from t
) t
group by grp, rnk
order by min(id);
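Here is a sketch of the cumulative-sum grouping, run with Python's sqlite3 module (SQLite 3.25+ assumed). The question's input is not shown in text form, so the seven rows below are my reconstruction from the result table posted further down and should be treated as an assumption. The table name ranks, the alias id_range, and the function name consolidate are also made up.

```python
import sqlite3

def consolidate():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ranks (id TEXT, rnk INTEGER)")
    # Assumed input, reconstructed from the expected output in the thread.
    conn.executemany("INSERT INTO ranks VALUES (?, ?)", [
        ("A100", 1), ("A101", 1), ("A102", 1),
        ("A103", 0), ("A104", 0),
        ("A105", 1), ("A106", 1),
    ])
    rows = conn.execute("""
        SELECT CASE WHEN MIN(id) = MAX(id) THEN MIN(id)
                    ELSE MIN(id) || '-' || MAX(id)
               END AS id_range,
               rnk
        FROM (
            -- running count of rnk = 0 rows: each 0 starts a new group,
            -- and consecutive 1s share the current group number
            SELECT t.*, SUM(1 - rnk) OVER (ORDER BY id) AS grp
            FROM ranks t
        ) t
        GROUP BY grp, rnk
        ORDER BY MIN(id)
    """).fetchall()
    conn.close()
    return rows

for row in consolidate():
    print(row)
```

With this input the running sums are 0, 0, 0, 1, 2, 2, 2, so grouping by grp and rnk separates A104 (rnk 0) from A105 and A106 (rnk 1) even though they share grp = 2.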
This is a gaps-and-islands problem. You want to group together adjacent rows where rnk has value 1.
Here is an approach using row_number() and conditional expressions:
select
case when min(id) <> max(id) then concat(min(id), '-', max(id)) else min(id) end id,
min(rnk) rnk
from (
select
t.*,
row_number() over(order by id) rn1,
row_number() over(partition by rnk order by id) rn2
from mytable t
) t
group by rnk, case when rnk = 1 then rn1 - rn2 else rn1 + rn2 end
order by min(id)
Result:
id | rnk
:-------- | --:
A100-A102 | 1
A103 | 0
A104 | 0
A105-A106 | 1

Select SUM and column with max

I am looking for the best or simplest way to SELECT type, user_with_max_value, SUM(value) GROUP BY type. The table looks similar to this:
type | user | value
type1 | 1 | 100
type1 | 2 | 200
type2 | 1 | 50
type2 | 2 | 10
And the result should look like:
type1 | 2 | 300
type2 | 1 | 60
Use window functions:
select type, max(case when seqnum = 1 then user end), sum(value)
from (select t.*,
row_number() over (partition by type order by value desc) as seqnum
from t
) t
group by type;
Some databases have functionality for an aggregation function that returns the first value. One method without a subquery using standard SQL is:
select distinct type,
first_value(user) over (partition by type order by value desc) as user,
sum(value) over (partition by type)
from t;
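A short sketch of the SELECT DISTINCT variant on the sample table, using Python's sqlite3 module (SQLite 3.25+ assumed). The aliases top_user and total, the ORDER BY, and the function name top_user_and_total are my additions for a readable, deterministic result.

```python
import sqlite3

def top_user_and_total():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (type TEXT, user INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
        ("type1", 1, 100),
        ("type1", 2, 200),
        ("type2", 1, 50),
        ("type2", 2, 10),
    ])
    rows = conn.execute("""
        SELECT DISTINCT type,
               -- user on the row with the largest value in each type
               FIRST_VALUE(user) OVER (PARTITION BY type ORDER BY value DESC) AS top_user,
               -- total over the whole partition (no ORDER BY, so no running sum)
               SUM(value) OVER (PARTITION BY type) AS total
        FROM t
        ORDER BY type
    """).fetchall()
    conn.close()
    return rows

for row in top_user_and_total():
    print(row)
```

Both window expressions produce the same value on every row of a partition, which is why DISTINCT collapses each type to a single row.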
You can use window functions:
select t.*
from (select t.type, t.user,
row_number() over (partition by type order by value desc) as seq,
sum(value) over (partition by type) as value
from table t
) t
where seq = 1;
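Try the query below.
Note that max(user) returns the largest user id in each group, which is not necessarily the user with the largest value (for type2 in the sample data it returns 2, not 1).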
SELECT type, max(user), SUM(value) from table1 GROUP BY type
Use analytic functions:
create table poo2
(
thetype varchar(5),
theuser int,
thevalue int
);
insert into poo2
select 'type1',1,100 union all
select 'type1',2,200 union all
select 'type2',1,50 union all
select 'type2',2,10;
select thetype,theuser,mysum
from
(
select thetype ,theuser
,row_number() over (partition by thetype order by thevalue desc) r
,sum(thevalue) over (partition by thetype) mysum from poo2
) ilv
where r=1;

Get max number Oracle SQL

I have the following data in the database; the primary key is the field SEQ. I want to select the row that has the maximum seq:
ID SEQ FILE
1007 1 abc
1007 2 def
The following query is invalid, but it shows what I am trying to do:
SELECT * FROM table1 WHERE id = '1007' AND Max(seq)
SELECT id, seq, "FILE"
FROM (
select id, seq, "FILE",
max(seq) over (partition by id) as max_seq
from table1
WHERE id = '1007'
) t
where seq = max_seq;
If I understood correctly, you want something like this:
select *
from (select t.*,
max(t.seq)
keep (dense_rank first order by t.seq desc)
over (partition by t.id) max#
from table1 t)
where seq = max#
Another approach:
select id,seq,"FILE" from
(select t1.*,row_number() over (partition by id order by seq desc) cont_seq
from your_table t1)
where cont_seq = 1
order by seq;
This will give you, for each id, the row that has the max seq value. If you want a specific id, just add the condition to the where clause like this:
select id,seq,"FILE" from
(select t1.*,row_number() over (partition by id order by seq desc) cont_seq
from your_table t1)
where cont_seq = 1 and id = '1007'
order by seq;
select * from alber.table1;
MYID SEQ FILENAME
1007 2 abc
1007 10 def
1008 45 abc
1008 9 def
SELECT myid, seq, filename from alber.table1 mq
where seq = (select max(seq) from alber.table1 sq where sq.myid = mq.myid);
MYID SEQ FILENAME
1007 10 def
1008 45 abc
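The correlated subquery above is plain SQL, so it can be tried outside Oracle too. Here is a sketch with Python's sqlite3 module using the four rows listed; the alber schema prefix is dropped, the ORDER BY is added for a stable row order, and the function name max_seq_rows is made up.

```python
import sqlite3

def max_seq_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (myid INTEGER, seq INTEGER, filename TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
        (1007, 2, "abc"),
        (1007, 10, "def"),
        (1008, 45, "abc"),
        (1008, 9, "def"),
    ])
    rows = conn.execute("""
        SELECT myid, seq, filename
        FROM table1 mq
        -- the subquery is re-evaluated per outer row, restricted to its myid
        WHERE seq = (SELECT MAX(seq) FROM table1 sq WHERE sq.myid = mq.myid)
        ORDER BY myid
    """).fetchall()
    conn.close()
    return rows

for row in max_seq_rows():
    print(row)
```

Unlike the ROW_NUMBER version, this returns every row that ties for the maximum seq within an id, which does not matter here because seq is the primary key.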