Selecting a row after multiple groupings in postgres

I have a table in a Postgres DB with the following structure:
id | date | groupme1 | groupme2 | value
----------------------------------------
1 |
2 |
3 |
Now I want to achieve the following:
Group the table by groupme1 and groupme2
Get the value for every group
But only the last entry for each group combination (ordered by date)
Example:
id | date | groupme1 | groupme2 | value
---------------------------------------
| | A | 1 | 4
| | A | 2 | 7
| | A | 3 | 3
| | B | 1 | 9
My current approach looks like this:
SELECT a.*
FROM table AS a
JOIN (SELECT max(id) AS id
      FROM table
      GROUP BY groupme1, groupme2) AS b
  ON a.id = b.id
The problems with this approach:
it assumes that higher dates have a higher id
it is slow
Is there a faster and better way of doing this? Can window functions help with this?

I think you just want window functions:
select t.*
from (select t.*,
             row_number() over (partition by groupme1, groupme2 order by date desc) as seqnum
      from t
     ) t
where seqnum = 1;
Or, a better way to do this in Postgres uses distinct on:
select distinct on (groupme1, groupme2) t.*
from t
order by groupme1, groupme2, date desc;
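To try the row_number() version without a Postgres server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for Postgres (it has window functions but no distinct on), and the sample rows and ISO date strings are made up for illustration. The A/1 group deliberately has its newest row on the lowest id, which is the case where the max(id) approach from the question goes wrong.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, date TEXT, groupme1 TEXT, groupme2 INTEGER, value INTEGER)"
)
# Hypothetical sample rows; dates are ISO strings so they sort chronologically.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-03", "A", 1, 4),   # newest A/1 row, but lowest id
        (2, "2020-01-01", "A", 1, 99),
        (3, "2020-01-02", "A", 2, 7),
        (4, "2020-01-01", "B", 1, 5),
        (5, "2020-01-05", "B", 1, 9),
    ],
)

# Number the rows of each (groupme1, groupme2) group from newest to oldest
# and keep only the newest one.
latest = conn.execute(
    """
    SELECT id, groupme1, groupme2, value
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY groupme1, groupme2
                                    ORDER BY date DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY groupme1, groupme2
    """
).fetchall()

for row in latest:
    print(row)
```

Ordering by date inside the window is what removes the "higher dates have a higher id" assumption.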

Related

Extract employee record based on certain criteria

I have a database of employees with their employment history in organization.
Sample Data -
+----+----------+------------+
| ID | Date | Event |
+----+----------+------------+
| 1 | 20190807 | Hired |
| 1 | 20191209 | Promoted |
| 1 | 20200415 | Terminated |
| 2 | 20180901 | Hired |
| 2 | 20191231 | Terminated |
| 3 | 20180505 | Hired |
| 3 | 20190630 | Promoted |
+----+----------+------------+
I want to extract the list of employees who were terminated after a promotion. In the example above, the query should return ID 1.
I am using SSMS 17 if that helps.

You can try using lag():
select distinct ID from
(
select *, lag(event) over (partition by id order by dateval) as prevval
from t
) A where prevval = 'Promoted' and event = 'Terminated'

If you want immediately after, then you would use lag(). If you want any time after, then you can use aggregation:
select id
from t
group by id
having max(case when event = 'Promoted' then dateval end) < max(case when event = 'Terminated' then dateval end);
Using lag(), the code looks like:
select id
from (select t.*, lag(event) over (partition by id order by dateval) as prev_event
from t
) t
where prev_event = 'Promoted' and event = 'Terminated';
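The difference between "immediately after" and "any time after" is easier to see on data. Below is a rough sketch using Python's sqlite3 module as a stand-in for SQL Server; it loads the sample rows from the question plus one hypothetical employee (ID 4) who has another event between promotion and termination. The dateval column name follows the answers above, and the 'Transferred' event is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in engine (assumption)
conn.execute("CREATE TABLE t (id INTEGER, dateval TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "20190807", "Hired"), (1, "20191209", "Promoted"), (1, "20200415", "Terminated"),
        (2, "20180901", "Hired"), (2, "20191231", "Terminated"),
        (3, "20180505", "Hired"), (3, "20190630", "Promoted"),
        # Hypothetical employee 4: an extra event between promotion and termination.
        (4, "20180101", "Hired"), (4, "20190101", "Promoted"),
        (4, "20190601", "Transferred"), (4, "20200101", "Terminated"),
    ],
)

# lag(): the event directly before the termination must be the promotion.
immediately_after = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM (SELECT t.*,
                 lag(event) OVER (PARTITION BY id ORDER BY dateval) AS prev_event
          FROM t) AS e
    WHERE prev_event = 'Promoted' AND event = 'Terminated'
    ORDER BY id
    """
)]

# Aggregation: the latest promotion only has to be earlier than the latest termination.
any_time_after = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM t
    GROUP BY id
    HAVING max(CASE WHEN event = 'Promoted' THEN dateval END)
         < max(CASE WHEN event = 'Terminated' THEN dateval END)
    ORDER BY id
    """
)]

print(immediately_after, any_time_after)
```

Employee 4 only shows up in the aggregation result, because the termination does not directly follow the promotion.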

A simple exists check could also solve this requirement:
select * from table1 a
where event='Terminated'
and exists(select 1 from table1 b where a.ID = b.ID and event='Promoted');
output:
ID date1 event
1 20191209 Terminated
We can also compare the event dates in the correlated subquery, so that only a promotion dated before the termination counts.

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
In other words, I want to return the row with the maximum date for each person. I tried something like this:
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This returns only one row (of course, because MAX returns only the overall maximum date), and I don't know how to approach this. Any help would be appreciated.

ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;

Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids are increasing, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;
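Another option that stays close to the query in the question is to correlate the MAX subquery on person_id, so that each row is compared with the maximum date of its own person rather than the overall maximum. Here is a minimal sketch, run through Python's sqlite3 module rather than Oracle; it assumes the dates are stored as a real date type or, as here, as ISO strings, since dd/mm/yyyy text would not sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in engine (assumption)
conn.execute("CREATE TABLE P_PROV (id INTEGER, date TEXT, person_id INTEGER)")
# Rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO P_PROV VALUES (?, ?, ?)",
    [
        (1, "2019-06-19", 1),
        (2, "2010-07-18", 2),
        (3, "2020-06-19", 1),
        (4, "2020-06-17", 2),
        (5, "2020-06-28", 3),
    ],
)

# The subquery is evaluated per outer row and only looks at that row's person.
latest_per_person = conn.execute(
    """
    SELECT pp.id, pp.date, pp.person_id
    FROM P_PROV pp
    WHERE pp.date = (SELECT MAX(aa.date)
                     FROM P_PROV aa
                     WHERE aa.person_id = pp.person_id)
    ORDER BY pp.id
    """
).fetchall()

for row in latest_per_person:
    print(row)
```

Unlike the ROW_NUMBER version, this returns every tied row when a person has two rows on the same maximum date.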

SQL query for selecting multiple records for one product for a single id

My table looks like this. What I'm trying to achieve is to pull out all the records for one user for the product that has the earliest date.
product |type_id| user | Date |Desired ROW_NUMBER as output |
-------+--------+------+-------+---------------------
1 | 1 | A | 0101 | 1
1 | 1 | A | 0102 | 1
2 | 3 | A | 0105 | 2
2 | 5 | A | 0105 | 2
3 | 7 | B | 0101 | 1
3 | 8 | B | 0104 | 1
So I want to pull all the records with "1" in the desired row_num column, but I haven't figured out how to get this without doing another group by. Any help would be appreciated.

You can use window functions:
select t.*
from (select t.*,
rank() over (partition by user order by min_date) as seqnum
from (select t.*,
min(date) over (partition by user, product) as min_date
from t
) t
) t
where seqnum = 1;
Or, with only one subquery:
select t.*
from (select t.*,
min(date) over (partition by user, product) as min_date_up,
min(date) over (partition by user) as min_date_u
from t
) t
where min_date_u = min_date_up;
You can interpret this as "return all rows where the product has the minimum date for the user".
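As a quick check of the single-subquery version against the sample data, here is a small sketch using Python's sqlite3 module as a stand-in engine. The column names usr and dt are renamed placeholders for user and Date (chosen to stay clear of reserved words), and the dates are kept as the four-character strings from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in engine (assumption)
# usr and dt are hypothetical names replacing "user" and "Date".
conn.execute("CREATE TABLE t (product INTEGER, type_id INTEGER, usr TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", "0101"),
        (1, 1, "A", "0102"),
        (2, 3, "A", "0105"),
        (2, 5, "A", "0105"),
        (3, 7, "B", "0101"),
        (3, 8, "B", "0104"),
    ],
)

# Keep every row whose product holds the user's overall earliest date.
rows = conn.execute(
    """
    SELECT product, type_id, usr, dt
    FROM (SELECT t.*,
                 min(dt) OVER (PARTITION BY usr, product) AS min_date_up,
                 min(dt) OVER (PARTITION BY usr)          AS min_date_u
          FROM t) AS w
    WHERE min_date_u = min_date_up
    ORDER BY usr, product, dt, type_id
    """
).fetchall()

for row in rows:
    print(row)
```

Both rows of product 1 come back for user A and both rows of product 3 for user B, matching the rows marked 1 in the question.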

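If the desired row number is already stored as a column, you can filter on it directly:
SELECT * FROM [tableName] WHERE [Desired ROW_NUMBER] = 1 ORDER BY [Date] ASC
Pass the desired row number value dynamically as a parameter, and use DESC instead of ASC to reverse the order.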

Keep all columns in MIN / MAX query, but return 1 result

I'm sure I've done this before, but seem to have forgotten how..
I'm trying to filter a recordset so that I get just the 1 record, so for example, if this is my table called TableA:
| ID | User | Type | Date |
------------------------------------
| 1 | Matt | Opened | 1/8/2014 |
| 2 | Matt | Opened | 2/8/2014 |
| 3 | Matt | Created| 5/8/2014 |
| 4 | John | Opened | 1/8/2014 |
| 5 | John | Created| 2/8/2014 |
I'd want to filter it so I get the MIN of Date where the User is "Matt" and the Type is "Opened".
The result set needs to include the ID field and return just the 1 record, so it would look like this:
| ID | User | Type | Date |
------------------------------------
| 1 | Matt | Opened | 1/8/2014 |
I'm struggling with getting past the GROUP BY requirement when selecting the ID field... this seems to ignore MIN of Date and return more than 1 record.

Use TOP and ORDER BY:
select top 1 *
from TableA
where [User] = 'Matt' and [Type] = 'Opened'
order by [Date] asc;
Edit: changed order by from desc to asc as this achieves the MIN effect I'm after.

Another way is to find the min or max date per user and type, then join the result back to the main table:
SELECT A.ID,
       A.[User],
       A.Type,
       A.Date
FROM yourtable A
INNER JOIN (SELECT [User],
                   Type,
                   Min(Date) Date
            FROM yourtable
            WHERE [User] = 'Matt'
              AND Type = 'Opened'
            GROUP BY [User],
                     Type) B
  ON A.[User] = B.[User]
 AND A.Type = B.Type
 AND A.Date = B.Date
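To see the join-back approach run end to end, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. The user_name column is a renamed placeholder for User, and the dates are rewritten as ISO strings on the assumption that 1/8/2014 means 1 August 2014.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in engine (assumption)
# user_name is a hypothetical name replacing the reserved word "User".
conn.execute("CREATE TABLE TableA (id INTEGER, user_name TEXT, type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "Matt", "Opened", "2014-08-01"),
        (2, "Matt", "Opened", "2014-08-02"),
        (3, "Matt", "Created", "2014-08-05"),
        (4, "John", "Opened", "2014-08-01"),
        (5, "John", "Created", "2014-08-02"),
    ],
)

# Step 1 (inner query): earliest date for the user/type pair.
# Step 2 (join): fetch the full row(s) carrying that date, including the id.
result = conn.execute(
    """
    SELECT a.id, a.user_name, a.type, a.date
    FROM TableA a
    JOIN (SELECT user_name, type, MIN(date) AS min_date
          FROM TableA
          WHERE user_name = 'Matt' AND type = 'Opened'
          GROUP BY user_name, type) b
      ON a.user_name = b.user_name
     AND a.type = b.type
     AND a.date = b.min_date
    """
).fetchall()

print(result)
```

If two rows shared the same earliest date, the join would return both of them, so this only guarantees a single record when the minimum is unique.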

You can also try a window function with PARTITION BY. It is easy to write, gives a result for each user, and may perform better:
;WITH cte
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY [USER]
,type ORDER BY DATE ASC
) rnk
FROM tablea
)
SELECT *
FROM cte
WHERE type = 'opened'
AND rnk = 1

Make a field monotonic across all rows

I have a column in a table in my SQL Server database that I want to convert to a PK column.
To do that, I want to change the value of this column in each row to 1, 2, 3, ...
Could you write a T-SQL query for that task?
Thanks for your help.
begin state:
Id | Name |
----------
1 | One |
2 | Two |
2 | Three|
x | xxx |
result:
Id | Name |
----------
1 | One |
2 | Two |
3 | Three|
4 | xxx |

;with cte as
(
SELECT Id, ROW_NUMBER() over (order by Id) as rn
from YourTable
)
UPDATE cte SET Id = rn

You can also order by name if you don't have the id:
;with cte as
(
SELECT Id, ROW_NUMBER() over (order by name) as rn
from YourTable
)
UPDATE cte SET Id = rn
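Updating through a CTE like this is T-SQL behaviour. For anyone who wants to test the renumbering idea outside SQL Server, here is a rough sketch with Python's sqlite3 module that swaps in a different technique: it computes the numbering into a temporary table first and then copies it back. The temp table name numbering, the tie-break on SQLite's rowid, and the duplicate and out-of-range Id values are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server (assumption)
conn.execute("CREATE TABLE YourTable (Id INTEGER, Name TEXT)")
# Hypothetical starting state: a duplicate Id (2) and a gap (7).
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "One"), (2, "Two"), (2, "Three"), (7, "xxx")],
)

# Step 1: compute the new numbering once, keyed by SQLite's internal rowid.
# Ties on Id are broken by rowid so the result is deterministic.
conn.execute(
    """
    CREATE TEMP TABLE numbering AS
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (ORDER BY Id, rowid) AS rn
    FROM YourTable
    """
)

# Step 2: copy the numbering back into the Id column.
conn.execute(
    "UPDATE YourTable SET Id = (SELECT rn FROM numbering WHERE rid = YourTable.rowid)"
)

result = conn.execute("SELECT Id, Name FROM YourTable ORDER BY Id").fetchall()
print(result)
```

Once the values are unique and gap-free, the column can be given a primary key constraint.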
