How to get rows with minimum ID on a multiple columns query - sql

I have a table like this:
Id | Type | multiple columns (a lot)...
---+------+----------------------------
 1 |   50 |
 2 |   50 |
 3 |   50 |
 4 |   75 |
 5 |   75 |
 6 |   75 |
I need to get only the rows with the oldest (minimum) Id for each Type as part of my query. The result should include all the columns of the table, but because those other columns hold different values from row to row, it's not possible to simply use MIN() with GROUP BY.
I need something like this:
Id | Type | multiple columns (a lot)...
---+------+----------------------------
 1 |   50 |
 4 |   75 |
I've tried using MIN() with GROUP BY, but that's not an option because the rest of the columns have different values: if I group by them as well, I get all the rows back and not only the ones with the lowest Ids.
Any ideas?
Thanks!

You can use the WITH TIES option in concert with the window function lag() over()
To be clear, this will flag when the value changes
Example
Select top 1 with ties *
From YourTable
Order by case when lag([type],1) over (order by id) = [Type] then 0 else 1 end desc
Results
Id Type
1 50
4 75
Based on Rodrigo's solution, you may have wanted the first [Type] regardless of sequence.
Select top 1 with ties *
From YourTable
Order by row_number() over (partition by [Type] order by ID)
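The two queries above differ only when a [Type] reappears after a different one. A minimal sketch of that difference, run through Python's sqlite3 module: the table name your_table and the sample rows are made up for illustration, and because SQLite has no TOP ... WITH TIES, the same lag() and row_number() logic is applied as a filter on a derived table instead (this assumes an SQLite build with window functions, 3.25 or later).

```python
import sqlite3

# Hypothetical sample data: type 50 reappears at id 4 after a row of type 75.
rows = [(1, 50), (2, 50), (3, 75), (4, 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, type INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)

# lag(): keep a row whenever its type differs from the previous row's type
# (the very first row has no previous row, so prev_type is NULL).
changes = conn.execute("""
    SELECT id, type
    FROM (SELECT id, type, LAG(type) OVER (ORDER BY id) AS prev_type
          FROM your_table)
    WHERE prev_type IS NULL OR prev_type <> type
    ORDER BY id
""").fetchall()

# row_number(): keep only the lowest id of each type, regardless of sequence.
first_per_type = conn.execute("""
    SELECT id, type
    FROM (SELECT id, type,
                 ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) AS rn
          FROM your_table)
    WHERE rn = 1
    ORDER BY id
""").fetchall()
```

With this data, changes also contains id 4 (type 50 starts a new run there), while first_per_type has exactly one row per type.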

You can add a column that numbers the rows within each Type, starting from the lowest id.
Filtering on that column then keeps only the first row of each Type.
You can use a common table expression (CTE) to split the steps:
WITH rows_with_index AS (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY Type ORDER BY id) AS rn,
        id,
        Type
        -- list the remaining columns here
    FROM
        <TABLE>
)
SELECT * FROM rows_with_index t
WHERE t.rn = 1;

Related

Use window functions to select the value from a column based on the sum of another column, in an aggregate query

Consider this data (View on DB Fiddle):
id | dept | value
---+------+------
 1 | A    |     5
 1 | A    |     5
 1 | B    |     7
 1 | C    |     5
 2 | A    |     5
 2 | A    |     5
 2 | B    |    15
 2 | A    |     2
The base query I am running is pretty simple. Just get the total value by id and the most frequent dept.
SELECT
id,
MODE() WITHIN GROUP(ORDER BY dept) AS dept_freq,
SUM(value) AS value
FROM test
GROUP BY id
;
id | dept_freq | value
---+-----------+------
 1 | A         |    22
 2 | A         |    27
But I also need to get, for each id, the dept that concentrates the greatest value (so the greatest sum of value by id and dept, not the highest individual value in the original table).
Is there any way to use window functions to achieve that and do it directly in the base query above?
The expected output for this particular example would be:
id | dept_freq | dept_value | value
---+-----------+------------+------
 1 | A         | A          |    22
 2 | A         | B          |    27
I could achieve that with the query below and then joining that with the results of the base query above
SELECT * FROM(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY value DESC) as row
FROM (
SELECT id, dept, SUM(value) AS value
FROM test
GROUP BY id, dept
) AS alias1
) AS alias2
WHERE alias2.row = 1
;
id | dept | value | row
---+------+-------+----
 1 | A    |    10 |   1
 2 | B    |    15 |   1
But it is not easy to read or maintain, and it also seems pretty inefficient. So I thought it should be possible to achieve this using window functions directly in the base query, which might also help Postgres come up with a better query plan that makes fewer passes over the data. But none of my attempts using OVER (PARTITION BY ...) and FILTER worked.
step-by-step demo:db<>fiddle
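You can fetch the dept with the highest total using the first_value() window function. Because you want the greatest sum per dept rather than the highest individual value, compute each dept's total per id first and order by that. Adding this before your mode() grouping should do it:
SELECT
    id,
    highest_value_dept,
    MODE() WITHIN GROUP(ORDER BY dept) AS dept_freq,
    SUM(value) AS value
FROM (
    SELECT
        id,
        dept,
        value,
        FIRST_VALUE(dept) OVER (PARTITION BY id ORDER BY dept_total DESC) AS highest_value_dept
    FROM (
        -- total value of each dept within each id, repeated on every row
        SELECT *, SUM(value) OVER (PARTITION BY id, dept) AS dept_total
        FROM test
    ) t
) s
GROUP BY 1, 2;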
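The question asks for the dept with the greatest summed value per id, so a per-(id, dept) total has to exist before first_value() can pick a dept. A minimal sketch of that step on the sample data, run through Python's sqlite3 module: the column name dept_total is made up for illustration, and dept_freq is left out because SQLite has no MODE() aggregate (this assumes SQLite 3.25 or later for window functions).

```python
import sqlite3

# Sample rows from the question: (id, dept, value).
data = [
    (1, "A", 5), (1, "A", 5), (1, "B", 7), (1, "C", 5),
    (2, "A", 5), (2, "A", 5), (2, "B", 15), (2, "A", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, dept TEXT, value INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", data)

result = conn.execute("""
    SELECT id, dept_value, SUM(value) AS value
    FROM (
        -- pick, for every row, the dept whose total is largest within the id
        SELECT id, value,
               FIRST_VALUE(dept) OVER (
                   PARTITION BY id ORDER BY dept_total DESC
               ) AS dept_value
        FROM (
            -- attach each dept's total per id to every row (hypothetical name)
            SELECT id, dept, value,
                   SUM(value) OVER (PARTITION BY id, dept) AS dept_total
            FROM test
        )
    )
    GROUP BY id, dept_value
    ORDER BY id
""").fetchall()
```

For id 1 the single largest value belongs to B (7), but A has the largest total (10), so ordering by the per-dept total rather than by value is what yields the expected output. If two depts can tie on their total, adding a second sort key to the ORDER BY would make the choice deterministic.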

How to get number sequence in Postgres for similar value of data in a particular column?

I'm looking for an efficient approach where I can assign numbers in sequence to each group.
Record | Group | GroupSequence
-------+-------+--------------
     1 | Car   |             1
     2 | Car   |             2
     3 | Bike  |             1
     4 | Bus   |             1
     5 | Bus   |             2
     6 | Bus   |             3
I came across this question: How to add sequence number for groups in a SQL query without temp tables. But my use case is slightly different from it. Any ideas on how to accomplish this with a single query?
You are looking for row_number():
select t.*, row_number() over (partition by "group" order by record) as group_sequence
from t;
You can calculate this when you need it, so I see no reason to store it. However, you can update the values if you like:
update t
set group_sequence = tt.new_group_sequence
from (select t.*,
row_number() over (partition by "group" order by record) as new_group_sequence
from t
) tt
where tt.record = t.record;
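A minimal sketch of the row_number() numbering on the question's sample data, run through Python's sqlite3 module. The lowercase column names are an assumption about the real table, and "group" is double-quoted because GROUP is a reserved word (this assumes SQLite 3.25 or later for window functions).

```python
import sqlite3

# Sample rows from the question: (record, group).
rows = [(1, "Car"), (2, "Car"), (3, "Bike"), (4, "Bus"), (5, "Bus"), (6, "Bus")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (record INTEGER PRIMARY KEY, "group" TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Number the rows inside each group, restarting at 1 for every new group.
sequenced = conn.execute("""
    SELECT record,
           "group",
           ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY record) AS group_sequence
    FROM t
    ORDER BY record
""").fetchall()
```

The outer ORDER BY record only restores the original row order for display; the numbering itself comes from the ORDER BY inside the window.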

return tuples with maximum values in one field when there is a tie

I have a sql table like:
Name | Value
------+------
Andy | 22
Ben | 22
Carl | 22
David | 21
Eddie | 20
Frank | 19
I need an sql query that will return the tuples containing the maximum value, and if there is a tie (as in the example above), the relevant tuples in the tie will all need to be returned. Note that the values are already in descending order, and if there is no tie, one tuple is returned.
I have tried TOP and MAX in conjunction with GROUP BYs, but none of these are working.
TOP returns an error for invalid syntax and my attempts with MAX are flat out wrong.
In the above example, the tuples with Andy, Ben and Carl should be returned.
You mention TOP which suggests SQL Server. If so, you can use TOP WITH TIES:
select top (1) with ties t.*
from t
order by value desc;
Alas, that won't work in Postgres. Just use a subquery:
select t.*
from t
where t.value = (select max(t2.value) from t t2);
In Postgres, you can use window function rank():
select name, value
from (
select
t.*,
rank() over(order by value desc) rn
from mytable t
) t
where rn = 1
The subquery ranks records per descending value; records that have the same value get the same rank. Then, the outer query filters on the top record(s).
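A minimal sketch comparing the max() subquery and the rank() approach on the question's sample data, run through Python's sqlite3 module. The table name mytable follows the second answer, and the final ORDER BY name is added only to make the output order predictable (this assumes SQLite 3.25 or later for window functions).

```python
import sqlite3

# Sample rows from the question: (name, value).
people = [("Andy", 22), ("Ben", 22), ("Carl", 22),
          ("David", 21), ("Eddie", 20), ("Frank", 19)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", people)

# Approach 1: compare every row against the overall maximum.
by_max = conn.execute("""
    SELECT t.name, t.value
    FROM mytable t
    WHERE t.value = (SELECT MAX(t2.value) FROM mytable t2)
    ORDER BY t.name
""").fetchall()

# Approach 2: rank() gives tied rows the same rank, so rn = 1 keeps all of them.
by_rank = conn.execute("""
    SELECT name, value
    FROM (SELECT name, value, RANK() OVER (ORDER BY value DESC) AS rn
          FROM mytable)
    WHERE rn = 1
    ORDER BY name
""").fetchall()
```

Both queries return the three tied rows here; with no tie at the top, each would return a single row.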

SQL select top rows based on limit

Please help me write the select query described below.
Source table
name Amount
-----------
A 2
B 3
C 2
D 7
if limit is 5 then result table should be
name Amount
-----------
A 2
B 3
if limit is 8 then result table
name Amount
-----------
A 2
B 3
C 2
You can use window function to achieve this:
select name,
amount
from (
select t.*,
sum(amount) over (
order by name
) s
from your_table t
) t
where s <= 8;
The window function sum() is evaluated row by row in the order given by order by name, so s holds a running total of amount.
Once you have the running total up to each row, a simple where clause keeps only the rows whose running total is less than or equal to the given limit.
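A minimal sketch of the running-total filter with the limit passed as a parameter, run through Python's sqlite3 module. The helper name rows_within_limit and the column alias running_total are made up for illustration (this assumes SQLite 3.25 or later for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("A", 2), ("B", 3), ("C", 2), ("D", 7)],
)


def rows_within_limit(limit):
    """Return the leading rows (by name) whose running total stays <= limit."""
    return conn.execute(
        """
        SELECT name, amount
        FROM (SELECT name, amount,
                     SUM(amount) OVER (ORDER BY name) AS running_total
              FROM your_table)
        WHERE running_total <= ?
        ORDER BY name
        """,
        (limit,),
    ).fetchall()


limit_5 = rows_within_limit(5)
limit_8 = rows_within_limit(8)
```

The running totals here are 2, 5, 7 and 14, so a limit of 5 keeps A and B and a limit of 8 keeps A, B and C. If the ordering column can contain duplicates, rows with equal values share one running total under the default window frame, so an explicit ROWS frame or a unique tie-breaker in the ORDER BY may be needed.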
More on this topic:
The SQL OVER() clause - when and why is it useful?
https://explainextended.com/2009/03/08/analytic-functions-sum-avg-row_number/

query for rows returning the first element of a group in db2

Suppose I have a table filled with the data below, what SQL function or query I should use in db2 to retrieve all rows having the FIRST field FLD_A with value A, the FIRST field FLD_A with value B..and so on?
ID FLD_A FLD_B
1 A 10
2 A 20
3 A 30
4 B 10
5 A 20
6 C 30
I am expecting a table like below; I am aware of grouping done by function GROUP BY but how can I limit the query to return the very first of each group?
Essentially I would like to have the information about the very first row where a new value for FLD_A is appearing for the first time?
ID FLD_A FLD_B
1 A 10
4 B 10
6 C 30
Try this; it works in SQL:
SELECT * FROM Table1
WHERE ID IN (SELECT MIN(ID) FROM Table1 GROUP BY FLD_A)
A good way to approach this problem is with window functions and row_number() in particular:
select t.*
from (select t.*,
row_number() over (partition by fld_a order by id) as seqnum
from table1
) t
where seqnum = 1;
(This is assuming that "first" means "minimum id".)
If you use t.*, this will add one extra column to the output. You can just list the columns you want to avoid this.
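A minimal sketch checking that the MIN(ID) subquery and the row_number() query agree on the question's sample data, run through Python's sqlite3 module rather than DB2, so dialect details may differ. The outer queries list the columns explicitly so the helper column seqnum does not appear in the output (this assumes SQLite 3.25 or later for window functions).

```python
import sqlite3

# Sample rows from the question: (ID, FLD_A, FLD_B).
data = [(1, "A", 10), (2, "A", 20), (3, "A", 30),
        (4, "B", 10), (5, "A", 20), (6, "C", 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, fld_a TEXT, fld_b INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", data)

# Approach 1: keep the rows whose id is the minimum id of some fld_a group.
via_min = conn.execute("""
    SELECT id, fld_a, fld_b
    FROM table1
    WHERE id IN (SELECT MIN(id) FROM table1 GROUP BY fld_a)
    ORDER BY id
""").fetchall()

# Approach 2: number the rows per fld_a by id and keep number 1.
via_row_number = conn.execute("""
    SELECT id, fld_a, fld_b
    FROM (SELECT id, fld_a, fld_b,
                 ROW_NUMBER() OVER (PARTITION BY fld_a ORDER BY id) AS seqnum
          FROM table1)
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()
```

Both return ids 1, 4 and 6. The row with id 5 is not returned even though A reappears there, because "first" is taken to mean the minimum id per FLD_A value.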