Extract employee record based on certain criteria - sql

I have a database of employees with their employment history in organization.
Sample Data -
+----+----------+------------+
| ID | Date | Event |
+----+----------+------------+
| 1 | 20190807 | Hired |
| 1 | 20191209 | Promoted |
| 1 | 20200415 | Terminated |
| 2 | 20180901 | Hired |
| 2 | 20191231 | Terminated |
| 3 | 20180505 | Hired |
| 3 | 20190630 | Promoted |
+----+----------+------------+
I want to extract the list of employees who were terminated after promotion. In above example, the query should return ID 1.
I am using SSMS 17 if that helps.

You can try using lag():
select distinct ID from
(
select *, lag(event) over (partition by id order by dateval) as prevval
from t
) A where prevval = 'Promoted' and event = 'Terminated'

If you want immediately after, then you would use lag(). If you want any time after, then you can use aggregation:
select id
from t
group by id
having max(case when event = 'Promoted' then dateval end) < max(case when event = 'Terminated' then dateval end);
Using lag(), the code looks like:
select id
from (select t.*, lag(event) over (partition by id order by dateval) as prev_event
from t
) t
where prev_event = 'Promoted' and event = 'Terminated';
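To see how the two readings differ, here is a minimal sketch that runs both queries through Python's built-in sqlite3 module. Using SQLite is an assumption made for portability (the question targets SQL Server, and window functions need SQLite 3.25 or later). Employee 4 is hypothetical and not part of the question's data; it is added so that another event sits between the promotion and the termination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dateval TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "20190807", "Hired"), (1, "20191209", "Promoted"),
        (1, "20200415", "Terminated"),
        (2, "20180901", "Hired"), (2, "20191231", "Terminated"),
        (3, "20180505", "Hired"), (3, "20190630", "Promoted"),
        # Hypothetical employee, not in the question's sample data.
        (4, "20180101", "Hired"), (4, "20180601", "Promoted"),
        (4, "20190101", "Transferred"), (4, "20190601", "Terminated"),
    ],
)

# Terminated at any time after a promotion (conditional aggregation).
any_time_after = [row[0] for row in conn.execute("""
    SELECT id
    FROM t
    GROUP BY id
    HAVING MAX(CASE WHEN event = 'Promoted' THEN dateval END)
         < MAX(CASE WHEN event = 'Terminated' THEN dateval END)
    ORDER BY id
""")]

# Terminated as the very next event after a promotion (lag).
immediately_after = [row[0] for row in conn.execute("""
    SELECT id
    FROM (SELECT t.*,
                 LAG(event) OVER (PARTITION BY id ORDER BY dateval) AS prev_event
          FROM t) AS x
    WHERE prev_event = 'Promoted' AND event = 'Terminated'
    ORDER BY id
""")]

print(any_time_after)     # [1, 4]
print(immediately_after)  # [1]
```

Employee 4 shows up only in the aggregation result, because the event right before the termination is not a promotion.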

A simple exists check could also solve this simple requirement.
select * from table1 a
where event='Terminated'
and exists(select 1 from table1 b where a.ID = b.ID and event='Promoted');
output:
ID date1 event
1 20200415 Terminated
We can also compare the event dates in the correlated subquery, so that only a termination dated after the promotion is returned.
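The date comparison could look something like the sketch below. It reuses the table1, ID, date1 and event names from the query above and runs on SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example; the extra condition on date1 is the only change to the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, date1 TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "20190807", "Hired"), (1, "20191209", "Promoted"),
        (1, "20200415", "Terminated"),
        (2, "20180901", "Hired"), (2, "20191231", "Terminated"),
        (3, "20180505", "Hired"), (3, "20190630", "Promoted"),
    ],
)

# Keep a termination only if the same employee has an earlier promotion.
result = conn.execute("""
    SELECT a.ID, a.date1, a.event
    FROM table1 AS a
    WHERE a.event = 'Terminated'
      AND EXISTS (SELECT 1
                  FROM table1 AS b
                  WHERE b.ID = a.ID
                    AND b.event = 'Promoted'
                    AND b.date1 < a.date1)
""").fetchall()

print(result)  # [(1, '20200415', 'Terminated')]
```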

Related

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
Putting this in words, I want to return per person the maximum date. I tried something like this
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This one is only returning one row (of course, because the MAX will return the maximum date only), but I really don't know how to approach this issue. Any kind of help would be appreciated.
ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;
Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids are increasing, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;
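The shortcut with two independent MAX calls depends on ids growing together with dates. A minimal sketch of what happens otherwise, run on SQLite through Python's sqlite3 module (an assumption, since the question is about Oracle; window functions need SQLite 3.25 or later). The column is named prov_date and the dates are stored as ISO strings so they sort chronologically; both are assumptions, and row 6 is a hypothetical late-entered record that is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE p_prov (id INTEGER, prov_date TEXT, person_id INTEGER)"
)
conn.executemany(
    "INSERT INTO p_prov VALUES (?, ?, ?)",
    [
        (1, "2019-06-19", 1), (2, "2010-07-18", 2), (3, "2020-06-19", 1),
        (4, "2020-06-17", 2), (5, "2020-06-28", 3),
        (6, "2015-01-01", 1),  # hypothetical: old record entered late
    ],
)

# ROW_NUMBER keeps id and date from the same row.
latest = conn.execute("""
    SELECT id, prov_date, person_id
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY person_id
                                    ORDER BY prov_date DESC) AS rn
          FROM p_prov AS p) AS ranked
    WHERE rn = 1
    ORDER BY person_id
""").fetchall()

# Two independent MAX calls can mix values from different rows.
shortcut = conn.execute("""
    SELECT MAX(id), MAX(prov_date), person_id
    FROM p_prov
    GROUP BY person_id
    ORDER BY person_id
""").fetchall()

print(latest[0])    # (3, '2020-06-19', 1)
print(shortcut[0])  # (6, '2020-06-19', 1)
```

For person 1 the shortcut pairs id 6 with a date that belongs to id 3, while the ROW_NUMBER version returns the row that actually holds the latest date.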

Update in the same column from the same table

I'm trying to update a column in my table that was ignored at the initial insert, based on a key and the non-null values in the same column.
My table is a history table in a data warehouse. To simplify, it consists of:
id which is its primary key
employee_id
date_of_birth
project_id
The rows help the company keep track of projects that an employee had worked on.
The problem is that when updating this table, the date_of_birth column is ignored, which is a problem for me since I'm working on a project that needs the age of the employee at the time he changed projects.
Actual:
+----+-------------+---------------+------------+
| ID | EMPLOYEE_ID | YEAR_OF_BIRTH | PROJECT_ID |
+----+-------------+---------------+------------+
| 1 | 1 | 1980 | 1 |
| 2 | 1 | NULL | 2 |
| 3 | 2 | 1990 | 2 |
| 4 | 2 | NULL | 1 |
+----+-------------+---------------+------------+
And this is what I want:
+----+-------------+---------------+------------+
| ID | EMPLOYEE_ID | YEAR_OF_BIRTH | PROJECT_ID |
+----+-------------+---------------+------------+
| 1 | 1 | 1980 | 1 |
| 2 | 1 | 1980 | 2 |
| 3 | 2 | 1990 | 2 |
| 4 | 2 | 1990 | 1 |
+----+-------------+---------------+------------+
We could try using COALESCE to conditionally replace a NULL year of birth with a non NULL value:
SELECT
ID,
EMPLOYEE_ID,
COALESCE(YEAR_OF_BIRTH, MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID)) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM yourTable;
The following query should do what you want:
UPDATE yourTable
SET YEAR_OF_BIRTH = (SELECT MIN(a.YEAR_OF_BIRTH) FROM yourTable a WHERE a.EMPLOYEE_ID = yourTable.EMPLOYEE_ID)
WHERE YEAR_OF_BIRTH IS NULL
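One detail worth checking in a correlated update like this is that the outer column is qualified with the outer table's name. An unqualified EMPLOYEE_ID inside the subquery resolves to the inner table, so the condition is always true and the subquery aggregates over the whole table. The sketch below shows both variants on the sample data, using SQLite through Python's sqlite3 module (an assumption; the fresh and years helpers are hypothetical names introduced for the demonstration).

```python
import sqlite3

def fresh():
    """Build a new in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, employee_id INTEGER,"
        " year_of_birth INTEGER, project_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
        [(1, 1, 1980, 1), (2, 1, None, 2), (3, 2, 1990, 2), (4, 2, None, 1)],
    )
    return conn

def years(conn):
    """Return year_of_birth for every row, ordered by id."""
    return [r[0] for r in conn.execute(
        "SELECT year_of_birth FROM yourTable ORDER BY id")]

# Outer column qualified: each employee gets their own year.
qualified = fresh()
qualified.execute("""
    UPDATE yourTable
    SET year_of_birth = (SELECT MIN(a.year_of_birth)
                         FROM yourTable AS a
                         WHERE a.employee_id = yourTable.employee_id)
    WHERE year_of_birth IS NULL
""")

# Outer column unqualified: employee_id binds to the inner alias a,
# so MIN runs over the whole table.
unqualified = fresh()
unqualified.execute("""
    UPDATE yourTable
    SET year_of_birth = (SELECT MIN(a.year_of_birth)
                         FROM yourTable AS a
                         WHERE a.employee_id = employee_id)
    WHERE year_of_birth IS NULL
""")

print(years(qualified))    # [1980, 1980, 1990, 1990]
print(years(unqualified))  # [1980, 1980, 1990, 1980]
```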
According to your sample data, you can also use a correlated subquery as
SELECT T1.ID,
T1.EMPLOYEE_ID,
ISNULL(T1.YEAR_OF_BIRTH, (
SELECT MAX(T2.YEAR_OF_BIRTH)
FROM T T2
WHERE T2.EMPLOYEE_ID = T1.EMPLOYEE_ID
)) AS YEAR_OF_BIRTH,
T1.PROJECT_ID
FROM T T1 ;
OR
SELECT ID,
EMPLOYEE_ID,
ISNULL(YEAR_OF_BIRTH, MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID)) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM T;
I would use an updatable CTE for this purpose:
with toupdate as (
select a.*, min(year_of_birth) over (partition by employee_id) as min_year_of_birth
from actual a
)
update toupdate
set year_of_birth = min_year_of_birth
where year_of_birth is null or year_of_birth <> min_year_of_birth;
The where clause reduces the number of rows being updated.
That said, FIX YOUR DATA MODEL. Sorry for raising my voice. The date-of-birth information should not be stored in this table. It should be in the employee table, because an employee has only one of them.
You can get your desired output with this query:
SELECT ID, EMPLOYEE_ID,
MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM Table1

SELECT based on multiple fields in MS-SQL

I have a table with 4 columns:
AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType
There are multiple records for each AcctNumb, with the date that each record was recorded.
What I want to do is grab the most recent date, consumption reading, and reading type for each account.
I have tried using MAX(PeriodEndingDate) and GROUP BY AcctNumb, but I would need to aggregate all the other values, and none of the aggregate functions help me for the WaterConsumption, etc.
Can anyone point me in the right direction?
Thanks
EDIT
Here is a sample table
+----------+------------------+------------------+-------------+
| AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType |
+----------+------------------+------------------+-------------+
| 1000 | 2018-03-31 | 122230 | A |
| 1001 | 2018-03-31 | 24850 | A |
| 1002 | 2018-03-31 | 88540 | A |
| 1000 | 2017-12-31 | 123800 | A |
| 1001 | 2017-12-31 | 3000 | E |
+----------+------------------+------------------+-------------+
The ReadingType is whether it's an actual (A) reading, or an estimate (E).
Try this
SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType
FROM (SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType,
ROW_NUMBER() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS MostrecentRecord
FROM <TableName>) dt
WHERE MostrecentRecord= 1
This can be done using ROW_NUMBER. It has been asked and answered thousands of times, but the query is easier to write than to find a duplicate.
select *
from
(
select *
, RowNum = ROW_NUMBER() over(partition by AcctNumb order by PeriodEndingDate desc)
from YourTable
) x
where x.RowNum = 1
SELECT DQ.* FROM
(SELECT *,
Row_Number() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS RN
FROM YourTable
) AS DQ
WHERE DQ.RN = 1
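The sort direction inside ROW_NUMBER decides which row gets number 1, so it is worth checking on the sample data. A minimal sketch, assuming a table named readings (a hypothetical name) and SQLite 3.25 or later through Python's sqlite3 module instead of SQL Server; first_per_account is a helper introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (AcctNumb INTEGER, PeriodEndingDate TEXT,"
    " WaterConsumption INTEGER, ReadingType TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1000, "2018-03-31", 122230, "A"),
        (1001, "2018-03-31", 24850, "A"),
        (1002, "2018-03-31", 88540, "A"),
        (1000, "2017-12-31", 123800, "A"),
        (1001, "2017-12-31", 3000, "E"),
    ],
)

def first_per_account(direction):
    """Return the row numbered 1 per account for the given sort direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = f"""
        SELECT AcctNumb, PeriodEndingDate, WaterConsumption, ReadingType
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY AcctNumb
                                        ORDER BY PeriodEndingDate {direction}) AS rn
              FROM readings) AS x
        WHERE rn = 1
        ORDER BY AcctNumb
    """
    return conn.execute(sql).fetchall()

most_recent = first_per_account("DESC")
oldest = first_per_account("ASC")

print(most_recent[1])  # (1001, '2018-03-31', 24850, 'A')
print(oldest[1])       # (1001, '2017-12-31', 3000, 'E')
```

Only the DESC ordering returns the most recent reading per account; with ASC, account 1001 comes back with its older estimated reading.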

Selecting a row after multiple groupings in postgres

I have a table in a Postgres DB which has the following structure:
id | date | groupme1 | groupme2 | value
----------------------------------------
1 |
2 |
3 |
Now I want to achieve the following:
Group the table by groupme1 and groupme2
Get the value for every group
But only the last entry for each group combination (ordered by date)
Example:
id | date | groupme1 | groupme2 | value
---------------------------------------
| | A | 1 | 4
| | A | 2 | 7
| | A | 3 | 3
| | B | 1 | 9
My current approach looks like this:
SELECT a.*
FROM table AS a
JOIN (SELECT max(id) AS id
FROM table
GROUP BY groupme1, groupme2) AS b
ON a.id = b.id
The problems with this approach:
it assumes that higher dates have a higher id
it takes long
Is there a faster and better way of doing this? Can window functions help with this?
I think you just want window functions:
select t.*
from (select t.*,
row_number() over (partition by groupme1, groupme2 order by date desc) as seqnum
from t
) t
where seqnum = 1;
Or, a better way to do this in Postgres uses distinct on:
select distinct on (groupme1, groupme2) t.*
from t
order by groupme1, groupme2, date desc;

Keep all columns in MIN / MAX query, but return 1 result

I'm sure I've done this before, but seem to have forgotten how..
I'm trying to filter a recordset so that I get just the 1 record, so for example, if this is my table called TableA:
| ID | User | Type | Date |
------------------------------------
| 1 | Matt | Opened | 1/8/2014 |
| 2 | Matt | Opened | 2/8/2014 |
| 3 | Matt | Created| 5/8/2014 |
| 4 | John | Opened | 1/8/2014 |
| 5 | John | Created| 2/8/2014 |
I'd want to filter it so I get the MIN of Date where the User is "Matt" and the Type is "Opened".
The result set needs to include the ID field and return just the 1 record, so it would look like this:
| ID | User | Type | Date |
------------------------------------
| 1 | Matt | Opened | 1/8/2014 |
I'm struggling with getting past the GROUP BY requirement when selecting the ID field... this seems to ignore MIN of Date and return more than 1 record.
Use TOP and ORDER BY:
select top 1 *
from TableA
where [User] = 'Matt' and Type = 'Opened'
order by [Date] asc;
Edit: changed order by from desc to asc as this achieves the MIN effect I'm after.
Another way is by finding the min or max date per user and type then join the result back to the main table
SELECT A.ID,
A.[USER],
A.Type,
A.Date
FROM yourtable A
INNER JOIN (SELECT [USER],
Type,
Min(Date) Date
FROM yourtable
WHERE [USER] = 'Matt'
AND type = 'Opened'
GROUP BY [USER],
Type) B
ON A.[USER] = B.[USER]
AND A.Type = B.Type
AND A.date = B.Date
You can also try a window function with PARTITION BY. It is easy to write, gives a result for each user, and performs better:
;WITH cte
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY [USER]
,type ORDER BY DATE ASC
) rnk
FROM tablea
)
SELECT *
FROM cte
WHERE type = 'Opened'
AND rnk = 1