Select the highest value of column 2 per column 1 - sql

Given the following table P_PROV
+----+------------+-----------+
| id | date       | person_id |
+----+------------+-----------+
|  1 | 19/06/2019 |         1 |
|  2 | 18/07/2010 |         2 |
|  3 | 19/06/2020 |         1 |
|  4 | 17/06/2020 |         2 |
|  5 | 28/06/2020 |         3 |
+----+------------+-----------+
I want this output
+----+------------+-----------+
| id | date       | person_id |
+----+------------+-----------+
|  3 | 19/06/2020 |         1 |
|  4 | 17/06/2020 |         2 |
|  5 | 28/06/2020 |         3 |
+----+------------+-----------+
Put into words, I want to return the row with the maximum date for each person. I tried something like this:
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This returns only one row, of course, because MAX returns the single maximum date across the whole table. I don't know how to approach this; any help would be appreciated.

ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;
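To check the ROW_NUMBER query against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later) and stores the dates as ISO strings (YYYY-MM-DD) so that text ordering matches chronological ordering; that storage choice is an assumption of the sketch, not something the question states.

```python
import sqlite3

# In-memory copy of P_PROV with the question's rows; dates rewritten as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE P_PROV (id INTEGER, date TEXT, person_id INTEGER)")
conn.executemany(
    "INSERT INTO P_PROV VALUES (?, ?, ?)",
    [
        (1, "2019-06-19", 1),
        (2, "2010-07-18", 2),
        (3, "2020-06-19", 1),
        (4, "2020-06-17", 2),
        (5, "2020-06-28", 3),
    ],
)

# Number each person's rows from newest to oldest and keep the newest one.
latest_per_person = conn.execute(
    """
    SELECT id, date, person_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) AS rn
        FROM P_PROV t
    ) t
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

for row in latest_per_person:
    print(row)
```

If the dates were kept as dd/mm/yyyy text, ORDER BY would compare them as strings and could pick the wrong row, so a real DATE column (or ISO text) is the safer choice.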

Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
If your ids always increase together with the dates for each person, as they do in the sample data, this also returns what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;
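The two independent MAX calls only line up when the newest date also has the highest id. A small sketch with hypothetical rows (not from the question) shows what happens otherwise; it uses Python's sqlite3 module purely as a convenient way to run the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE P_PROV (id INTEGER, date TEXT, person_id INTEGER)")
# Hypothetical rows: the higher id (3) carries the OLDER date.
conn.executemany(
    "INSERT INTO P_PROV VALUES (?, ?, ?)",
    [(1, "2020-06-19", 1), (3, "2019-06-19", 1)],
)

# MAX(id) and MAX(date) are computed independently of each other.
mixed = conn.execute(
    "SELECT MAX(id) AS id, MAX(date) AS date, person_id "
    "FROM P_PROV GROUP BY person_id"
).fetchall()
stored = conn.execute("SELECT id, date, person_id FROM P_PROV").fetchall()

print(mixed)               # id from one row, date from another
print(mixed[0] in stored)  # the combined row does not exist in the table
```

In that situation the ROW_NUMBER or KEEP (DENSE_RANK FIRST ...) forms are the ones that return an actual row.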

Related

SQL Server Add row number each group

I am working on a query for SQL Server 2016. The rows are ordered by serial_no, and I would like to add a row number that restarts whenever pay_type changes, as in the example below:
row_no | pay_type | serial_no
1 | A | 4000118445
2 | A | 4000118458
3 | A | 4000118461
4 | A | 4000118473
5 | A | 4000118486
1 | B | 4000118499
2 | B | 4000118506
3 | B | 4000118519
4 | B | 4000118521
1 | A | 4000118534
2 | A | 4000118547
3 | A | 4000118550
1 | B | 4000118562
2 | B | 4000118565
3 | B | 4000118570
4 | B | 4000118572
Help me please.
SELECT
ROW_NUMBER() OVER(PARTITION BY paytype ORDER BY serial_no) as row_no,
paytype, serial_no
FROM table
ORDER BY serial_no
You can assign groups to adjacent pay types that are the same and then use row_number(). For this purpose, the difference of row numbers is a good way to determine the groups:
select row_number() over (partition by pay_type, seqnum - seqnum_2 order by serial_no) as row_no,
t.*
from (select t.*,
row_number() over (order by serial_no) as seqnum,
row_number() over (partition by pay_type order by serial_no) as seqnum_2
from t
) t;
This type of problem is one example of a gaps-and-islands problem. Why does the difference of row numbers work? I find that the simplest way to understand is to look at the results of the subquery.
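To make the difference-of-row-numbers trick visible, the sketch below runs the query on the question's sample rows and prints the intermediate columns. It uses Python's sqlite3 module (assuming SQLite 3.25 or later for window functions); the table name payments and the column alias grp are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (pay_type TEXT, serial_no INTEGER)")
sample = [
    ("A", 4000118445), ("A", 4000118458), ("A", 4000118461),
    ("A", 4000118473), ("A", 4000118486),
    ("B", 4000118499), ("B", 4000118506), ("B", 4000118519),
    ("B", 4000118521),
    ("A", 4000118534), ("A", 4000118547), ("A", 4000118550),
    ("B", 4000118562), ("B", 4000118565), ("B", 4000118570),
    ("B", 4000118572),
]
conn.executemany("INSERT INTO payments VALUES (?, ?)", sample)

# seqnum counts all rows; seqnum_2 counts rows within one pay_type.
# Their difference stays constant inside each run of adjacent equal pay types.
rows = conn.execute(
    """
    SELECT pay_type, serial_no, seqnum, seqnum_2,
           seqnum - seqnum_2 AS grp,
           ROW_NUMBER() OVER (PARTITION BY pay_type, seqnum - seqnum_2
                              ORDER BY serial_no) AS row_no
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (ORDER BY serial_no) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY pay_type ORDER BY serial_no) AS seqnum_2
        FROM payments p
    ) p
    ORDER BY serial_no
    """
).fetchall()

for row in rows:
    print(row)
```

Within each run of adjacent rows of the same pay type, both counters advance by one per row, so grp is constant; when the other pay type interrupts, only seqnum keeps advancing, so the next run of the first type gets a different grp. Partitioning by (pay_type, grp) therefore restarts the numbering at every switch.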
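Add this to your select list:
ROW_NUMBER() OVER ( ORDER BY (SELECT 1) )
Since the query already sorts by serial_no, the window function does not need its own sort, which may save some CPU. Note that this numbers the whole result set in one sequence, in no guaranteed order, so it does not restart the count for each group of pay types.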

Extract employee record based on certain criteria

I have a database of employees with their employment history in the organization.
Sample Data -
+----+----------+------------+
| ID | Date     | Event      |
+----+----------+------------+
|  1 | 20190807 | Hired      |
|  1 | 20191209 | Promoted   |
|  1 | 20200415 | Terminated |
|  2 | 20180901 | Hired      |
|  2 | 20191231 | Terminated |
|  3 | 20180505 | Hired      |
|  3 | 20190630 | Promoted   |
+----+----------+------------+
I want to extract the list of employees who were terminated after a promotion. In the above example, the query should return ID 1.
I am using SSMS 17 if that helps.
You can try using lag()
select distinct ID from
(
select *,lag(event) over(partition by id order by dateval) as prevval
from t
)A where prevval='Promoted' and event='Terminated'
If you want immediately after, then you would use lag(). If you want any time after, then you can use aggregation:
select id
from t
group by id
having max(case when event = 'Promoted' then dateval end) < max(case when event = 'Terminated' then dateval end);
Using lag(), the code looks like:
select id
from (select t.*, lag(event) over (partition by id order by dateval) as prev_event
from t
) t
where prev_event = 'Promoted' and event = 'Terminated';
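The difference between "immediately after" and "any time after" only shows up when another event sits between the promotion and the termination. The sketch below runs both queries with Python's sqlite3 module (assuming SQLite 3.25 or later for lag()). Employees 1 to 3 come from the question; employee 4 and the 'Transferred' event are hypothetical additions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dateval TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "20190807", "Hired"), (1, "20191209", "Promoted"),
        (1, "20200415", "Terminated"),
        (2, "20180901", "Hired"), (2, "20191231", "Terminated"),
        (3, "20180505", "Hired"), (3, "20190630", "Promoted"),
        # Hypothetical employee 4: an event sits between promotion and termination.
        (4, "20170101", "Hired"), (4, "20180101", "Promoted"),
        (4, "20190101", "Transferred"), (4, "20200101", "Terminated"),
    ],
)

# Any time after: latest promotion precedes latest termination.
any_time_after = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM t
    GROUP BY id
    HAVING MAX(CASE WHEN event = 'Promoted' THEN dateval END)
         < MAX(CASE WHEN event = 'Terminated' THEN dateval END)
    ORDER BY id
    """
)]

# Immediately after: the event directly before the termination is a promotion.
immediately_after = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM (
        SELECT t.*, LAG(event) OVER (PARTITION BY id ORDER BY dateval) AS prev_event
        FROM t
    ) t
    WHERE prev_event = 'Promoted' AND event = 'Terminated'
    ORDER BY id
    """
)]

print(any_time_after)
print(immediately_after)
```

Employees without a promotion or without a termination drop out of the aggregation query because the comparison against NULL is not true.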
A simple exists check could also solve this simple requirement.
select * from table1 a
where event='Terminated'
and exists(select 1 from table1 b where a.ID = b.ID and event='Promoted');
output:
ID date1 event
1 20200415 Terminated
We can also compare the event dates in the correlated subquery, so that only terminations dated after a promotion are returned.
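As a sketch of that date comparison, the example below adds `b.date1 < a.date1` to the EXISTS subquery and compares the result with the unfiltered version. It uses Python's sqlite3 module and the question's sample rows for employees 1 to 3; employee 4 (terminated, re-hired, then promoted) is a hypothetical addition that shows why the comparison matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, date1 TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "20190807", "Hired"), (1, "20191209", "Promoted"),
        (1, "20200415", "Terminated"),
        (2, "20180901", "Hired"), (2, "20191231", "Terminated"),
        (3, "20180505", "Hired"), (3, "20190630", "Promoted"),
        # Hypothetical employee 4: the promotion comes AFTER the termination.
        (4, "20170101", "Hired"), (4, "20180601", "Terminated"),
        (4, "20190101", "Hired"), (4, "20200101", "Promoted"),
    ],
)

base = """
    SELECT a.ID, a.date1, a.event
    FROM table1 a
    WHERE a.event = 'Terminated'
      AND EXISTS (SELECT 1 FROM table1 b
                  WHERE a.ID = b.ID AND b.event = 'Promoted' {extra})
    ORDER BY a.ID
"""

# Without the date check, any promotion for the same ID counts.
without_dates = conn.execute(base.format(extra="")).fetchall()
# With the date check, the promotion must precede the termination.
with_dates = conn.execute(base.format(extra="AND b.date1 < a.date1")).fetchall()

print(without_dates)
print(with_dates)
```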

SQL query for selecting multiple records for one product for a single id

My table looks like this. What I'm trying to achieve is to pull out, for each user, all the records for the product that has the earliest date.
product | type_id | user | Date | Desired ROW_NUMBER as output
--------+---------+------+------+-----------------------------
1       | 1       | A    | 0101 | 1
1       | 1       | A    | 0102 | 1
2       | 3       | A    | 0105 | 2
2       | 5       | A    | 0105 | 2
3       | 7       | B    | 0101 | 1
3       | 8       | B    | 0104 | 1
So I want to pull all the records with "1" in the desired row_num column, but I haven't figured out how to get this without doing another group by. Any help would be appreciated.
You can use window functions:
select t.*
from (select t.*,
rank() over (partition by user order by min_date) as seqnum
from (select t.*,
min(date) over (partition by user, product) as min_date
from t
) t
) t
where seqnum = 1;
Or, with only one subquery:
select t.*
from (select t.*,
min(date) over (partition by user, product) as min_date_up,
min(date) over (partition by user) as min_date_u
from t
) t
where min_date_u = min_date_up;
You can interpret this as "return all rows where the product has the minimum date for the user".
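The single-subquery version can be checked against the question's sample rows with a short sketch. It uses Python's sqlite3 module (assuming SQLite 3.25 or later); the table name purchases is hypothetical, and the column names user and date are kept because SQLite accepts them unquoted, which other databases may not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (product INTEGER, type_id INTEGER, user TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", "0101"), (1, 1, "A", "0102"),
        (2, 3, "A", "0105"), (2, 5, "A", "0105"),
        (3, 7, "B", "0101"), (3, 8, "B", "0104"),
    ],
)

# Keep every row of the product whose earliest date equals the user's earliest date.
earliest_product_rows = conn.execute(
    """
    SELECT product, type_id, user, date
    FROM (
        SELECT t.*,
               MIN(date) OVER (PARTITION BY user, product) AS min_date_up,
               MIN(date) OVER (PARTITION BY user) AS min_date_u
        FROM purchases t
    ) t
    WHERE min_date_u = min_date_up
    ORDER BY user, product, date, type_id
    """
).fetchall()

for row in earliest_product_rows:
    print(row)
```

If two products of the same user share the earliest date, both products' rows are returned, which matches the rank() behavior of the two-subquery version.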
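If the desired row number is already stored as a column, you can filter on it directly:
SELECT * FROM [tableName] WHERE [Desired ROW_NUMBER] = 1 ORDER BY [Date] ASC;
Pass the desired ROW_NUMBER value dynamically as a parameter, and use DESC instead of ASC to reverse the order.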

Selecting a row after multiple groupings in postgres

I have a table in a Postgres DB which has the following structure:
id | date | groupme1 | groupme2 | value
----------------------------------------
1 |
2 |
3 |
Now I want to achieve the following:
Group the table by groupme1 and groupme2
Get the value for every group
But only the last entry for each group combination (ordered by date)
Example:
id | date | groupme1 | groupme2 | value
---------------------------------------
| | A | 1 | 4
| | A | 2 | 7
| | A | 3 | 3
| | B | 1 | 9
My current approach looks like this:
SELECT a.*
FROM table AS a
JOIN (SELECT max(id) AS id
FROM table
GROUP BY groupme1, groupme2) AS b
ON a.id = b.id
The problems with this approach:
It assumes that higher dates have a higher id
It takes a long time
Is there a faster and better way of doing this? Can windowing function help with this?
I think you just want window functions:
select t.*
from (select t.*,
row_number() over (partition by groupme1, groupme2 order by date desc) as seqnum
from t
) t
where seqnum = 1;
Or, a better way to do this in Postgres uses distinct on:
select distinct on (groupme1, groupme2) t.*
from t
order by groupme1, groupme2, date desc;
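The sketch below compares the row_number() form with the max(id) join from the question on data where id order and date order disagree. It runs in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later), so it cannot show DISTINCT ON, which is specific to Postgres. The table name readings and all rows are hypothetical, since the question's sample leaves id and date blank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings "
    "(id INTEGER, date TEXT, groupme1 TEXT, groupme2 INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2021-01-02", "A", 1, 4),   # newest row of group (A, 1)
        (2, "2021-01-01", "A", 1, 99),  # higher id, but older date
        (3, "2021-01-01", "A", 2, 7),
        (4, "2021-01-03", "B", 1, 9),
    ],
)

# Latest row per group by date, independent of how ids were assigned.
by_date = conn.execute(
    """
    SELECT id, groupme1, groupme2, value
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY groupme1, groupme2
                                  ORDER BY date DESC) AS seqnum
        FROM readings t
    ) t
    WHERE seqnum = 1
    ORDER BY groupme1, groupme2
    """
).fetchall()

# The question's approach: highest id per group.
by_max_id = conn.execute(
    """
    SELECT a.id, a.groupme1, a.groupme2, a.value
    FROM readings AS a
    JOIN (SELECT MAX(id) AS id FROM readings GROUP BY groupme1, groupme2) AS b
      ON a.id = b.id
    ORDER BY a.groupme1, a.groupme2
    """
).fetchall()

print(by_date)
print(by_max_id)
```

The two results differ for group (A, 1), which is exactly the case the question worries about.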

Select entire partition where max row in partition is greater than 1

I'm partitioning by a non-unique identifier, but I'm only interested in the partitions with at least two rows. How can I filter out all the instances where there's exactly one row for the identifier?
Query I'm using:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
What I'm getting:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1235 | 2014-10-08...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
What I want:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
Thanks for any direction :)
Based on syntax, I'm assuming this is SQL Server 2005 or higher. My answer will be meant for that.
You have a couple options.
One, use a CTE:
;WITH CTE AS (
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
)
SELECT *
FROM CTE t
WHERE EXISTS (SELECT 1 FROM CTE WHERE row = 2 and nonUniqueId = t.nonUniqueId);
Or, you can use subqueries:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable t
WHERE EXISTS (SELECT 1 FROM myTable
WHERE nonUniqueId = t.nonUniqueId GROUP BY nonUniqueId HAVING COUNT(*) >= 2);
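As a quick check of the CTE approach, the sketch below runs it on the question's sample identifiers with Python's sqlite3 module (assuming SQLite 3.25 or later). The timestamps are shortened to date-only strings because the question truncates them, and the alias rn replaces row to stay clear of keyword clashes; both are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (nonUniqueId INTEGER, aTimeStamp TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [
        (1234, "2014-10-08"), (1234, "2014-10-09"),
        (1235, "2014-10-08"),
        (1236, "2014-10-08"), (1236, "2014-10-09"),
    ],
)

# A partition has at least two rows exactly when it contains a row numbered 2.
multi_row_partitions = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY nonUniqueId ORDER BY aTimeStamp) AS rn,
               nonUniqueId,
               aTimeStamp
        FROM myTable
    )
    SELECT rn, nonUniqueId, aTimeStamp
    FROM cte t
    WHERE EXISTS (SELECT 1 FROM cte c
                  WHERE c.rn = 2 AND c.nonUniqueId = t.nonUniqueId)
    ORDER BY nonUniqueId, rn
    """
).fetchall()

for row in multi_row_partitions:
    print(row)
```

Identifier 1235 has a single row, so no row numbered 2 exists for it and it drops out.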