SQL Row Count Over Partition By - sql

As we know, row_number() over (partition by ...) increases until the group changes; when the group changes, it starts over. How can the opposite be done? That is, as long as the group does not change the number should repeat, and it should only increase when the group changes, as follows:
NAME | ROW_COUNT
-----|----------
A    | 1
A    | 1
A    | 1
B    | 2
C    | 3
C    | 3
D    | 4
E    | 5

Your scenario calls for dense_rank(): rank() does not keep a gap-free sequence, it just ranks the rows and leaves gaps after ties, while row_number() keeps a sequence but assigns a unique number to every row, even rows in the same group.
select name
     , dense_rank() over (order by name) as row_count
from t;
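For comparison, here is a minimal sketch (assuming the sample table is named t) of how the three ranking functions differ on the NAME data above:
select name,
       row_number() over (order by name) as rn,    -- 1,2,3,4,5,6,7,8: always unique
       rank()       over (order by name) as rnk,   -- 1,1,1,4,5,5,7,8: gaps after ties
       dense_rank() over (order by name) as drnk   -- 1,1,1,2,3,3,4,5: the desired output
from t;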

Use window functions to select the value from a column based on the sum of another column, in an aggregate query

Consider this data (View on DB Fiddle):

id  dept  value
--  ----  -----
1   A     5
1   A     5
1   B     7
1   C     5
2   A     5
2   A     5
2   B     15
2   A     2
The base query I am running is pretty simple. Just get the total value by id and the most frequent dept.
SELECT
  id,
  MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
  SUM(value) AS value
FROM test
GROUP BY id;
id  dept_freq  value
--  ---------  -----
1   A          22
2   A          27
But I also need to get, for each id, the dept that concentrates the greatest value (so the greatest sum of value by id and dept, not the highest individual value in the original table).
Is there any way to use window functions to achieve that and do it directly in the base query above?
The expected output for this particular example would be:
id  dept_freq  dept_value  value
--  ---------  ----------  -----
1   A          A           22
2   A          B           27
I could achieve that with the query below and then joining its result with the base query above:
SELECT * FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS row
  FROM (
    SELECT id, dept, SUM(value) AS value
    FROM test
    GROUP BY id, dept
  ) AS alias1
) AS alias2
WHERE alias2.row = 1;
id  dept  value  row
--  ----  -----  ---
1   A     10     1
2   B     15     1
But it is not easy to read/maintain, and it also seems pretty inefficient. So I thought it should be possible to achieve this using window functions directly in the base query, which might also help Postgres come up with a better query plan that makes fewer passes over the data. But none of my attempts using OVER (PARTITION BY ...) and FILTER worked.
step-by-step demo: db<>fiddle
You can fetch the dept with the highest total using the first_value() window function, ordered by the per-department sum. Adding this before your mode() grouping should do it:
SELECT
  id,
  highest_value_dept,
  MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
  SUM(value) AS value
FROM (
  SELECT
    id,
    dept,
    value,
    FIRST_VALUE(dept) OVER (PARTITION BY id ORDER BY dept_sum DESC) AS highest_value_dept
  FROM (
    -- total value per (id, dept); ordering first_value() by the raw value
    -- would instead pick the dept with the highest single row (B for id 1)
    SELECT *, SUM(value) OVER (PARTITION BY id, dept) AS dept_sum
    FROM test
  ) w
) s
GROUP BY 1, 2;
The innermost subquery is needed because Postgres does not allow one window function call inside another window definition.

How to return the category with max value for every user in postgresql?

This is the table:
id  category  value
--  --------  -----
1   A         40
1   B         20
1   C         10
2   A         4
2   B         7
2   C         7
3   A         32
3   B         21
3   C         2
I want the result like this:
id  category
--  --------
1   A
2   B
2   C
3   A
For small tables or for only very few rows per user, a subquery with the window function rank() (as demonstrated by The Impaler) is just fine. The resulting sequential scan over the whole table, followed by a sort will be the most efficient query plan.
For more than a few rows per user, this gets increasingly inefficient though.
Typically, you also have a users table holding one distinct row per user. If you don't have it, create it! See:
Is there a way to SELECT n ON (like DISTINCT ON, but more than one of each)
Select first row in each GROUP BY group?
We can leverage that for an alternative query that scales much better - using WITH TIES in a LATERAL JOIN. Requires Postgres 13 or later.
SELECT u.id, t.*
FROM   users u
CROSS  JOIN LATERAL (
   SELECT t.category
   FROM   tbl t
   WHERE  t.id = u.id
   ORDER  BY t.value DESC
   FETCH  FIRST 1 ROWS WITH TIES -- !
   ) t;
db<>fiddle here
See:
Get top row(s) with highest value, with ties
Fetching a minimum of N rows, plus all peers of the last row
This can use a multicolumn index to great effect - which must exist, of course:
CREATE INDEX ON tbl (id, value);
Or:
CREATE INDEX ON tbl (id, value DESC);
Even faster index-only scans become possible with:
CREATE INDEX ON tbl (id, value DESC, category);
Or (the optimum for the query at hand):
CREATE INDEX ON tbl (id, value DESC) INCLUDE (category);
This assumes value is defined NOT NULL; otherwise we have to use DESC NULLS LAST. See:
Sort by column ASC, but NULL values first?
To keep users in the result that don't have any rows in table tbl, use LEFT JOIN LATERAL (...) ON true, as sketched after the link below. See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
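A minimal sketch of that variant, using the same users and tbl tables as above:
SELECT u.id, t.category
FROM   users u
LEFT   JOIN LATERAL (
   SELECT t.category
   FROM   tbl t
   WHERE  t.id = u.id
   ORDER  BY t.value DESC
   FETCH  FIRST 1 ROWS WITH TIES
   ) t ON true;  -- users without any rows in tbl are kept, with category = NULL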
You can use RANK() to identify the rows you want. Then, filtering is easy. For example:
select *
from (
  select *,
         rank() over (partition by id order by value desc) as rk
  from t
) x
where rk = 1;
Result:
id  category  value  rk
--  --------  -----  --
1   A         40     1
2   B         7      1
2   C         7      1
3   A         32     1
See running example at DB Fiddle.

How to get number sequence in Postgres for similar value of data in a particular column?

I'm looking for an efficient approach where I can assign numbers in sequence to each group.
Record  Group  GroupSequence
------  -----  -------------
1       Car    1
2       Car    2
3       Bike   1
4       Bus    1
5       Bus    2
6       Bus    3
I came across this question: How to add sequence number for groups in a SQL query without temp tables. But my use case is slightly different. Any ideas on how to accomplish this with a single query?
You are looking for row_number():
select t.*,
       -- "group" must be double-quoted: GROUP is a reserved word in Postgres
       row_number() over (partition by "group" order by record) as group_sequence
from t;
You can calculate this when you need it, so I see no reason to store it. However, you can update the values if you like:
update t
set group_sequence = tt.new_group_sequence
from (select t.*,
             row_number() over (partition by "group" order by record) as new_group_sequence
      from t
     ) tt
where tt.record = t.record;

SQL Query to obtain the maximum value for each unique value in another column

ID  Sum  Name
--  ---  ------
a   10   Joe
a   8    Mary
b   21   Kate
b   110  Casey
b   67   Pierce
What would you recommend as the best way to obtain, for each ID, the name that corresponds to the largest sum (grouping by ID)?
What I tried so far:
select ID, SUM(Sum) s, Name
from Table1
group by ID, Name
Order by SUM(Sum) DESC;
This will arrange the records into groups that have the highest sum first. Then I have to somehow flag those records and keep only those. Any tips or pointers? Thanks a lot.
In the end I'd like to obtain:
a  10   Joe
b  110  Casey
You want the row_number() function:
select id, [sum], name
from (select t.*,
             row_number() over (partition by id order by [sum] desc) as seqnum
      from table1 t
     ) t
where seqnum = 1;
Your question is more confusing than it needs to be because you have a column called sum. You should avoid using SQL reserved words for identifiers.
The row_number() function assigns a sequential number to a group of rows, starting with 1. The group is defined by the partition by clause. In this case, all rows with the same id are in the same group. The ordering of the numbers is determined by the order by clause, so the one with the largest value of sum gets the value of 1.
If you might have duplicate maximum values and you want all of them, use the related function rank() or dense_rank().
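For instance, a sketch with rank() swapped into the query above (same table1); if two names tied for the top [sum] within an id, both rows would be returned:
select id, [sum], name
from (select t.*,
             rank() over (partition by id order by [sum] desc) as seqnum
      from table1 t
     ) t
where seqnum = 1;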
select *
from (
    select *,
           row_number() over (partition by Id order by [sum] desc) as rn
    from table1
) x
where x.rn = 1
demo

query for rows returning the first element of a group in db2

Suppose I have a table filled with the data below. What SQL function or query should I use in DB2 to retrieve all rows having the FIRST field FLD_A with value A, the FIRST field FLD_A with value B, and so on?
ID  FLD_A  FLD_B
--  -----  -----
1   A      10
2   A      20
3   A      30
4   B      10
5   A      20
6   C      30
I am expecting a table like the one below. I am aware of the grouping done by GROUP BY, but how can I limit the query to return only the very first row of each group? Essentially, I want the information from the very first row where each new value of FLD_A appears:
ID  FLD_A  FLD_B
--  -----  -----
1   A      10
4   B      10
6   C      30
Try this; it works in standard SQL:
SELECT * FROM Table1
WHERE ID IN (SELECT MIN(ID) FROM Table1 GROUP BY FLD_A)
A good way to approach this problem is with window functions and row_number() in particular:
select t.*
from (select t.*,
             row_number() over (partition by fld_a order by id) as seqnum
      from table1 t
     ) t
where seqnum = 1;
(This is assuming that "first" means "minimum id".)
If you use t.* in the outer query, the result includes the extra seqnum column. You can list the columns you want explicitly to avoid this, as in the sketch below.
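For instance, a variant of the query above (same assumed table1) that returns only the original columns:
select id, fld_a, fld_b
from (select t.*,
             row_number() over (partition by fld_a order by id) as seqnum
      from table1 t
     ) t
where seqnum = 1;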