Join these 2 tables to get this specific output? - sql

I am using SQL Server 2014. I am working on a query where I want to extract information from 2 specific tables to create my final output.
An extract of the 2 tables (Rebookings and ResaList) are given below.
Rebookings Table (each CancelledID has its corresponding RebookingID):
CancelledID | RebookingID
------------|------------
102         | 541
250         | 351
129         | 800
...
ResaList Table:
ID  | Property | ArrivalDate | RN
----|----------|-------------|---
100 | X        | 2020-05-22  | 9
102 | X        | 2020-03-05  | 7
250 | D        | 2020-04-12  | 10
129 | E        | 2020-03-15  | 8
351 | D        | 2020-09-23  | 5
541 | X        | 2020-06-01  | 7
800 | E        | 2020-07-11  | 8
...
Here is my desired output:
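ID  | Property | ArrivalDate | RN | RebookingID | Rebooking_ArrivalDate | Rebooking_RN | Tag
----|----------|-------------|----|-------------|-----------------------|--------------|----------
102 | X        | 2020-03-05  | 7  | 541         | 2020-06-01            | 7            | Cancelled
250 | D        | 2020-04-12  | 10 | 351         | 2020-09-23            | 5            | Re-booked
129 | E        | 2020-03-15  | 8  | 800         | 2020-07-11            | 8            | Re-booked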
This is what I have done so far:
USE [MyDatabase]
select
a.[ID],
a.[Property],
a.[Arrival Date],
a.[RN],
b.[RebookingID],
(CASE WHEN a.[ID] in (SELECT [CancelledID] FROM [Rebookings]) THEN 'Re-booked'
ELSE 'Cancelled'
END) as [Tag]
from [ResaList] a
LEFT JOIN [Rebookings] b on b.[CancelledID] = a.[ID]
where a.[ID] in (SELECT [CancelledID] FROM [Rebookings])
GROUP BY a.[ID], a.[Property], a.[ArrivalDate], b.[RebookingID]
I am stuck at how to bring Rebooking_ArrivalDate and Rebooking_RN into the above output. Any help would be appreciated.

I have tried this in SQL Server.
DECLARE @cancelled TABLE (cancelledId int, rebookingId int)
INSERT INTO @cancelled VALUES
(102, 541),
(250, 351),
(129, 800);
DECLARE @ResaList TABLE (Id int, property CHAR(1), ArrivalDate date, RN int)
INSERT INTO @ResaList VALUES
(100, 'X', '2020-05-22', 9),
(102, 'X', '2020-03-05', 7),
(250, 'D', '2020-04-12', 10),
(129, 'E', '2020-03-15', 8),
(351, 'D', '2020-09-23', 5),
(541, 'X', '2020-06-01', 7),
(800, 'E', '2020-07-11', 8);
SELECT r.Id, r.Property, r.ArrivalDate, r.rn,
rebooking.RebookingId
, rebooking.Rebooking_ArrivalDate
, rebooking.Rebooking_RN
, CASE WHEN rebooking.RebookingId IS NOT NULL THEN 'Re-booked' ELSE 'cancelled' end as tag
FROM @ResaList as r
OUTER APPLY (SELECT rc.Id, rc.ArrivalDate, rc.RN
FROM @cancelled as c
INNER JOIN @ResaList AS rc
ON rc.Id = c.rebookingId
WHERE c.cancelledId = r.Id) as rebooking(RebookingId, Rebooking_ArrivalDate, Rebooking_RN)
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+
| Id | Property | ArrivalDate | rn | RebookingId | Rebooking_ArrivalDate | Rebooking_RN | tag |
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+
| 100 | X | 2020-05-22 | 9 | NULL | NULL | NULL | cancelled |
| 102 | X | 2020-03-05 | 7 | 541 | 2020-06-01 | 7 | Re-booked |
| 250 | D | 2020-04-12 | 10 | 351 | 2020-09-23 | 5 | Re-booked |
| 129 | E | 2020-03-15 | 8 | 800 | 2020-07-11 | 8 | Re-booked |
| 351 | D | 2020-09-23 | 5 | NULL | NULL | NULL | cancelled |
| 541 | X | 2020-06-01 | 7 | NULL | NULL | NULL | cancelled |
| 800 | E | 2020-07-11 | 8 | NULL | NULL | NULL | cancelled |
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+

Checking for existing records with IN (SELECT ...) in both the CASE and the WHERE clause repeats the same lookup and can get slow on larger tables.
Here is a more compact way that joins once:
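select
a.ID,
a.Property,
a.ArrivalDate,
a.RN,
b.RebookingID,
b.Rebooking_ArrivalDate,
b.Rebooking_RN,
coalesce(b.tag,'Cancelled') as tag
from ResaList a left join
(
-- Rebookings only holds the two IDs, so the rebooking details come from ResaList
select rb.CancelledID, rb.RebookingID,
r.ArrivalDate as Rebooking_ArrivalDate, r.RN as Rebooking_RN,
'Re-booked' as tag
from Rebookings rb
join ResaList r on r.ID = rb.RebookingID
) b
on a.ID=b.CancelledID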

You can select the same table twice by using aliases.
SELECT
*
,(CASE WHEN b.[ID] IS NOT NULL THEN 'Re-booked'
ELSE 'Cancelled'
END) as [Tag]
FROM ResaList AS a
LEFT JOIN Rebookings ON a.ID = Rebookings.CancelledID
LEFT JOIN ResaList AS b ON b.ID = Rebookings.RebookingID
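If you only want the rows from the question's desired output, with named columns instead of SELECT *, the double-alias join could look something like the sketch below. The table variables @Rebookings and @ResaList are stand-ins I made up so the snippet runs on its own; the data is copied from the question. The INNER JOIN to Rebookings is my assumption about what the WHERE ... IN filter in the question was meant to do.

```sql
-- Stand-in table variables (hypothetical names) loaded with the question's sample rows.
DECLARE @Rebookings TABLE (CancelledID int, RebookingID int);
INSERT INTO @Rebookings VALUES (102, 541), (250, 351), (129, 800);

DECLARE @ResaList TABLE (ID int, Property char(1), ArrivalDate date, RN int);
INSERT INTO @ResaList VALUES
(100, 'X', '2020-05-22', 9),
(102, 'X', '2020-03-05', 7),
(250, 'D', '2020-04-12', 10),
(129, 'E', '2020-03-15', 8),
(351, 'D', '2020-09-23', 5),
(541, 'X', '2020-06-01', 7),
(800, 'E', '2020-07-11', 8);

-- a = the original reservation, b = the reservation that replaced it.
SELECT
    a.ID,
    a.Property,
    a.ArrivalDate,
    a.RN,
    rb.RebookingID,
    b.ArrivalDate AS Rebooking_ArrivalDate,
    b.RN AS Rebooking_RN,
    CASE WHEN b.ID IS NOT NULL THEN 'Re-booked' ELSE 'Cancelled' END AS Tag
FROM @ResaList AS a
INNER JOIN @Rebookings AS rb ON rb.CancelledID = a.ID  -- keeps only IDs listed in Rebookings
LEFT JOIN @ResaList AS b ON b.ID = rb.RebookingID      -- rebooking details, NULL if no match
ORDER BY a.ID;
```

With the sample data this returns one row per entry in Rebookings. Switch the INNER JOIN back to a LEFT JOIN if reservations without a rebooking should also be listed.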

Related

SQL - get default NULL value if data is not available

I got a table data as follows:
ID | TYPE_ID | CREATED_DT | ROW_NUM
=====================================
123 | 485 | 2019-08-31 | 1
123 | 485 | 2019-05-31 | 2
123 | 485 | 2019-02-28 | 3
123 | 485 | 2018-11-30 | 4
123 | 485 | 2018-08-31 | 5
123 | 485 | 2018-05-31 | 6
123 | 487 | 2019-05-31 | 1
123 | 487 | 2018-05-31 | 2
I would like to select 6 ROW_NUMs for each TYPE_ID, if there is missing data I need to return NULL value for CREATED_DT and the final result set should look like:
ID | TYPE_ID | CREATED_DT | ROW_NUM
=====================================
123 | 485 | 2019-08-31 | 1
123 | 485 | 2019-05-31 | 2
123 | 485 | 2019-02-28 | 3
123 | 485 | 2018-11-30 | 4
123 | 485 | 2018-08-31 | 5
123 | 485 | 2018-05-31 | 6
123 | 487 | 2019-05-31 | 1
123 | 487 | 2018-05-31 | 2
123 | 487 | NULL | 3
123 | 487 | NULL | 4
123 | 487 | NULL | 5
123 | 487 | NULL | 6
Query:
SELECT
A.*
FROM TBL AS A
WHERE A.ROW_NUM <= 6
UNION ALL
SELECT
B.*
FROM TBL AS B
WHERE B.ROW_NUM NOT IN (SELECT ROW_NUM FROM TBL)
AND B.ROW_NUM <= 6
I tried using UNION ALL and ISNULL to backfill the data that is not available, but it still gives me only the existing data and not the expected result. I think this can be done in an easy way with a CTE, but I am not sure how to get it working. Can anyone help me in this regard?
This assumes tbl contains every Row_Num from 1 to 6 at least once, with no fractional, zero or negative values.
We get a list of all the distinct ID and Type_ID combinations (alias A).
Then we get a distinct list of row numbers less than 7 (giving us 6 records; alias B).
We cross join these to ensure each ID and Type_ID has all 6 rows.
We then left join back to the base set (tbl) to get the dates where they exist. Because it is a left join, the rows without a date are kept.
SELECT A.ID, A.Type_ID, C.Created_DT, B.Row_Num
FROM (SELECT DISTINCT ID, Type_ID FROM tbl) A
CROSS JOIN (SELECT distinct row_num from tbl where Row_num < 7) B
LEFT JOIN tbl C
on C.ID = A.ID
and C.Type_ID = A.Type_ID
and C.Row_num = B.Row_num
Giving us:
+----+-----+---------+------------+---------+
| | ID | Type_ID | Created_DT | Row_Num |
+----+-----+---------+------------+---------+
| 1 | 123 | 485 | 2019-08-31 | 1 |
| 2 | 123 | 485 | 2019-05-31 | 2 |
| 3 | 123 | 485 | 2019-02-28 | 3 |
| 4 | 123 | 485 | 2018-11-30 | 4 |
| 5 | 123 | 485 | 2018-08-31 | 5 |
| 6 | 123 | 485 | 2018-05-31 | 6 |
| 7 | 123 | 487 | 2019-05-31 | 1 |
| 8 | 123 | 487 | 2018-05-31 | 2 |
| 9 | 123 | 487 | NULL | 3 |
| 10 | 123 | 487 | NULL | 4 |
| 11 | 123 | 487 | NULL | 5 |
| 12 | 123 | 487 | NULL | 6 |
+----+-----+---------+------------+---------+
This also assumes that you'd want 1-6 for each combination of type_id and ID. If ID's irrelevant, then simply exclude it from the join criteria. I included it as it's an ID and seems like it's part of a key.
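If you cannot rely on tbl already containing every row number from 1 to 6, one variation is to build the 1-6 list with a table value constructor instead of reading it from the data. This is only a sketch: @tbl is a stand-in table variable I made up, filled with the rows from the question.

```sql
-- Stand-in for tbl (hypothetical name), loaded with the question's sample rows.
DECLARE @tbl TABLE (ID int, TYPE_ID int, CREATED_DT date, ROW_NUM int);
INSERT INTO @tbl VALUES
(123, 485, '2019-08-31', 1),
(123, 485, '2019-05-31', 2),
(123, 485, '2019-02-28', 3),
(123, 485, '2018-11-30', 4),
(123, 485, '2018-08-31', 5),
(123, 485, '2018-05-31', 6),
(123, 487, '2019-05-31', 1),
(123, 487, '2018-05-31', 2);

SELECT k.ID, k.TYPE_ID, t.CREATED_DT, n.ROW_NUM
FROM (SELECT DISTINCT ID, TYPE_ID FROM @tbl) AS k
-- Fixed list 1..6, independent of which row numbers exist in the data
CROSS JOIN (VALUES (1), (2), (3), (4), (5), (6)) AS n (ROW_NUM)
LEFT JOIN @tbl AS t
    ON  t.ID = k.ID
    AND t.TYPE_ID = k.TYPE_ID
    AND t.ROW_NUM = n.ROW_NUM
ORDER BY k.ID, k.TYPE_ID, n.ROW_NUM;
```

The join logic is the same as above; only the source of the row numbers changes, so a missing row number in tbl no longer drops that number from the result.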
Please reference the other answer for how you can do this using a CROSS JOIN - which is pretty neat. Alternatively, we can utilize the programming logic available in MS-SQL to achieve the desired results. The following approach stores distinct ID and TYPE_ID combinations inside a SQL cursor. Then it iterates through the cursor entries to ensure the appropriate amount of data is stored into a temp table. Finally, the SELECT is performed on the temp table and the cursor is closed. Here is a proof of concept that I validated on https://rextester.com/l/sql_server_online_compiler.
-- Create schema for testing
CREATE TABLE Test (
ID INT,
TYPE_ID INT,
CREATED_DT DATE
)
-- Populate data
INSERT INTO Test(ID, TYPE_ID, CREATED_DT)
VALUES
(123,485,'2019-08-31')
,(123,485,'2019-05-31')
,(123,485,'2019-02-28')
,(123,485,'2018-11-30')
,(123,485,'2018-08-31')
,(123,485,'2018-05-31')
,(123,487,'2019-05-31')
,(123,487,'2018-05-31');
-- Create TempTable for output
CREATE TABLE #OutputTable (
ID INT,
TYPE_ID INT,
CREATED_DT DATE,
ROW_NUM INT
)
-- Declare local variables
DECLARE @tempID INT, @tempType INT;
-- Create cursor to iterate ID and TYPE_ID
DECLARE mycursor CURSOR FOR (
SELECT DISTINCT ID, TYPE_ID FROM Test
);
OPEN mycursor
-- Populate cursor
FETCH NEXT FROM mycursor
INTO @tempID, @tempType;
-- Loop
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @count INT = (SELECT COUNT(*) FROM Test WHERE ID = @tempID AND TYPE_ID = @tempType);
INSERT INTO #OutputTable (ID, TYPE_ID, CREATED_DT, ROW_NUM)
-- Number rows newest first so ROW_NUM 1 is the latest date, as in the expected output
SELECT ID, TYPE_ID, CREATED_DT, ROW_NUMBER() OVER(ORDER BY CREATED_DT DESC)
FROM Test
WHERE ID = @tempID AND TYPE_ID = @tempType;
WHILE @count < 6
BEGIN
SET @count = @count + 1
INSERT INTO #OutputTable
VALUES (@tempID, @tempType, NULL, @count);
END
FETCH NEXT FROM mycursor
INTO @tempID, @tempType;
END
END
-- Close and release cursor
CLOSE mycursor;
DEALLOCATE mycursor;
-- View results
SELECT * FROM #OutputTable;
Note, if you have an instance where a unique combination of ID and TYPE_ID are grouped more than 6 times, the additional groupings will be included in your final result. If you must only show exactly 6 groupings, you can change that part of the query to SELECT TOP 6 ....
Create a CTE with a number series and cross apply it:
CREATE TABLE Test (
ID INT,
TYPE_ID INT,
CREATED_DT DATE
)
INSERT INTO Test(ID, TYPE_ID, CREATED_DT)
VALUES
(123,485,'2019-08-31')
,(123,485,'2019-05-31')
,(123,485,'2019-02-28')
,(123,485,'2018-11-30')
,(123,485,'2018-08-31')
,(123,485,'2018-05-31')
,(123,487,'2019-05-31')
,(123,487,'2018-05-31')
;
WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 6
)
,id_n as (
SELECT
DISTINCT
ID
,TYPE_ID
,n
FROM
Test
cross apply n
)
SELECT
id_n.ID
,id_n.TYPE_ID
,test.CREATED_DT
,id_n.n row_num
FROM
id_n
left join
(
select
ID
,TYPE_ID
,CREATED_DT
,ROW_NUMBER() over(partition by id, type_id order by created_dt desc) rn
from
Test
) Test on Test.ID = id_n.ID and Test.TYPE_ID = id_n.TYPE_ID and id_n.n = test.rn
drop table Test

SQL server 2008: join 3 tables and select last entered record from child table against each parent record

I have the following 3 tables and want the last entered reasoncode from the Reasons table for each claimno in the Claims table.
Reasons:
Rid  |chargeid| enterydate  | user   | reasoncode
-----|--------|-------------|--------|----------
1 | 210 | 04/03/2018 | john | 99
2 | 212 | 05/03/2018 | juliet | 24
5 | 212 | 26/12/2018 | umar | 55
3 | 212 | 07/03/2018 | borat | 30
4 | 211 | 03/03/2018 | Juliet | 20
6 | 213 | 03/03/2018 | borat | 50
7 | 213 | 24/12/2018 | umer | 60
8 | 214 | 01/01/2019 | john | 70
Charges:
chargeid |claim# | amount
---------|-------|---------
210 | 1 | 10
211 | 1 | 24.2
212 | 2 | 5.45
213 | 2 | 76.30
214 | 1 | 2.10
Claims:
claimno | Code | Code
--------|-------|------
1 | AH22 | AH22
2 | BB32 | BB32
Expected result would be like this:
claimno | enterydate | user | reasoncode
--------|-------------|--------|-----------
1 | 01/01/2019 | john | 70
2 | 26/12/2018 | umar | 55
I have applied many solutions but no luck. Following is the latest solution I was trying using SQL Server 2008 but still got incorrect result.
With x As
(
select r.chargeid,r.enterydate,ch.claimno from charges ch
join (select chargeid,max(enterydate) enterydate,user from Reasons group by chargeid) r on r.chargeid = ch.chargeid
)
select x.*,r1.user, r1.reasoncode from x
left outer join Reasons r1 on r1.chargeid = x.chargeid and r1.enterydate = x.enterydate
--group by x.claimno
Is this what you want?
select claimno, enterydate, [user], reasoncode
from (select c.claimno, r.*,
row_number() over (partition by c.claimno order by r.enterydate desc) as seqnum
from charges c join
reasons r
on c.chargeid = r.chargeid
) cr
where seqnum = 1;
You can try using row_number()
select * from
(
select r1.chargeid, r1.enterydate, ch.claimno, r1.[user], r1.reasoncode,
row_number() over(partition by ch.claimno order by r1.enterydate desc) as rn
from charges ch left outer join Reasons r1 on r1.chargeid = ch.chargeid
)A where rn=1
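For anyone who wants to try the row_number() approach without the real tables, here is a self-contained sketch. The table variables @Reasons and @Charges are stand-ins I made up. I am assuming enterydate is stored as a date (the question shows dd/mm/yyyy, which I converted to ISO literals) and that the claim column in Charges is called claimno, as in the queries above. Rid is added as a tie-breaker in case two reasons share the same date, which is also my assumption.

```sql
-- Stand-in tables (hypothetical names) with the question's sample rows.
DECLARE @Reasons TABLE (Rid int, chargeid int, enterydate date, [user] varchar(20), reasoncode int);
INSERT INTO @Reasons VALUES
(1, 210, '2018-03-04', 'john',   99),
(2, 212, '2018-03-05', 'juliet', 24),
(5, 212, '2018-12-26', 'umar',   55),
(3, 212, '2018-03-07', 'borat',  30),
(4, 211, '2018-03-03', 'Juliet', 20),
(6, 213, '2018-03-03', 'borat',  50),
(7, 213, '2018-12-24', 'umer',   60),
(8, 214, '2019-01-01', 'john',   70);

DECLARE @Charges TABLE (chargeid int, claimno int, amount decimal(10, 2));
INSERT INTO @Charges VALUES
(210, 1, 10), (211, 1, 24.2), (212, 2, 5.45), (213, 2, 76.30), (214, 1, 2.10);

-- Rank every reason within its claim, newest first, then keep rank 1.
SELECT cr.claimno, cr.enterydate, cr.[user], cr.reasoncode
FROM (
    SELECT ch.claimno, r.enterydate, r.[user], r.reasoncode,
           ROW_NUMBER() OVER (PARTITION BY ch.claimno
                              ORDER BY r.enterydate DESC, r.Rid DESC) AS seqnum
    FROM @Charges AS ch
    INNER JOIN @Reasons AS r ON r.chargeid = ch.chargeid
) AS cr
WHERE cr.seqnum = 1
ORDER BY cr.claimno;
```

Note that [user] is bracketed because USER is a reserved word in SQL Server. If enterydate is actually stored as text in dd/mm/yyyy form, it would need converting to a date before the ORDER BY gives the right answer.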

Parse 2 records into one and join tables

Reviewed
+----+-----+-------------+------+--------+
| id | POS | Review Date | Role | app ID |
+----+-----+-------------+------+--------+
| 1  | A   | 2018-12-03  | E    | 170    |
| 2  | A   | 2018-12-04  | P    | 170    |
| 3  | B   | 2018-12-01  | E    | 180    |
| 4  | B   | 2018-12-05  | P    | 180    |
| 5  | B   | 2018-12-05  | X    | 190    |
| 6  | B   | 2018-12-05  | w    | 195    |
| 7  | C   | 2018-12-06  | w    | 170    |
+----+-----+-------------+------+--------+
Call_Center
+----+------+-----+------+
| id | POS | Emp| yrs |
+----+------+-----+------+
| 1 | A | F | 4 |
| 2 | B | F | 3 |
| 3 | C | P | 3 |
+----+------+-----+------+
I need Call_Center joined in. I also forgot to mention that there can be many roles (x, w, u, t), but I am only interested in combining the review dates for roles E and P.
I need to return one record for each unique POS, with both review dates for roles E and P ONLY, for app ID 170 only, plus Emp and yrs from Call_Center, joining on POS.
For example:
POS Review_Date(role E) Review_Date(role P) EMP Yrs app ID
A 2018-12-03 2018-12-04 F 4 170
See updated tables
Oracle syntax please
You could use case when to filter the dates matching the desired role, and combine that with a group by (and max):
select Pos,
max(case Role when 'E' then review_date end) review_E,
max(case Role when 'P' then review_date end) review_P
from reviewed
group by Pos
You can also use the pivot clause available since Oracle 11g:
select *
from (
select Pos, Role, review_date
from reviewed
)
pivot
(
max(review_date)
for Role
in ('E', 'P')
);
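The conditional aggregation from the start of this answer could be extended to cover the rest of the request (the Call_Center join and the app ID 170 filter) roughly like this. It is a sketch under assumptions: the inline CTEs only stand in for the real tables so the query runs on its own, and the column names review_date and app_id are my guesses for "Review Date" and "app ID".

```sql
with reviewed (id, pos, review_date, role, app_id) as (
    -- sample rows from the question; replace with the real table
    select 1, 'A', date '2018-12-03', 'E', 170 from dual union all
    select 2, 'A', date '2018-12-04', 'P', 170 from dual union all
    select 3, 'B', date '2018-12-01', 'E', 180 from dual union all
    select 4, 'B', date '2018-12-05', 'P', 180 from dual union all
    select 5, 'B', date '2018-12-05', 'X', 190 from dual union all
    select 6, 'B', date '2018-12-05', 'w', 195 from dual union all
    select 7, 'C', date '2018-12-06', 'w', 170 from dual
),
call_center (id, pos, emp, yrs) as (
    select 1, 'A', 'F', 4 from dual union all
    select 2, 'B', 'F', 3 from dual union all
    select 3, 'C', 'P', 3 from dual
)
select r.pos,
       max(case r.role when 'E' then r.review_date end) as review_date_e,
       max(case r.role when 'P' then r.review_date end) as review_date_p,
       c.emp,
       c.yrs,
       r.app_id
from reviewed r
join call_center c on c.pos = r.pos
where r.app_id = 170            -- only app ID 170
  and r.role in ('E', 'P')      -- ignore the other roles
group by r.pos, c.emp, c.yrs, r.app_id
order by r.pos;
```

With the sample rows only POS A survives both filters, which matches the single example row in the question. The column list after each CTE name needs Oracle 11g Release 2 or later.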
I would use two CTEs:
with
e as (
select * from reviewed where role = 'E'
),
p as (
select * from reviewed where role = 'P'
)
select
coalesce(e.pos, p.pos) as pos,
e.review_date as review_date_role_e,
p.review_date as review_date_role_p
from e
full outer join p on e.pos = p.pos
order by coalesce(e.pos, p.pos)
select A.POS, A.review_date as review_date1, a.ROLE as ROLE1, B.review_date as
review_date2, B.ROLE as ROLE2
from reviewed a
left join reviewed b
on a.POS = b.POS
and a.review_date < b.review_date
where B.ROLE is not null
;

Adding Missing Days in Data -SQL

I have a table that's only showing weekday values. It's grabbing these from a file that's imported only on the weekdays. I'm needing to also add in the weekend (or holidays) with the previously known day's value. I have asked this question when I was needing it to be used in MS Access. I'm now moving this database to SQL Server.
If you're wanting to see what worked for me in Access, you're more than welcome to check out the link.
I have attempted to adapt the MS Access SQL to SQL Server with:
SELECT a1.IDNbr, a1.Balance, CONVERT(int, DAY(a1.BalDate)) + 3
FROM tblID a1 INNER JOIN tblID a2 ON (CONVERT(int, DAY(a1.BalDate)) + 4 = a2.BalDate) AND (a1.IDNbr = a2.IDNbr)
WHERE NOT EXISTS (
SELECT *
FROM tblID a3
WHERE a3.IDNbr = a1.IDNbr AND a3.BalDate = CONVERT(int, DAY(a1.BalDate)) + 3) AND (DATEPART(W, a1.BalDate) = 6
);
However, I'm getting the Error:
Msg 206, Level 16, State 2, Line 4
Operand type clash: date is incompatible with int
Question: How can I get this query (which I will be turning into an INSERT statement) to show all the missing days within my data and to assign the value of the last known day to the missing days?
Data that I have(starting on Friday):
+-------------------------------------+
|ID | IDNbr | Balance | BalDate |
+-------------------------------------+
|001| 91 | 529 | 1/5/2018 |
|002| 87 | 654 | 1/5/2018 |
|003| 45 | 258 | 1/5/2018 |
|004| 91 | 611 | 1/8/2018 |
|005| 87 | 753 | 1/8/2018 |
|006| 45 | 357 | 1/8/2018 |
|...| .. | ... | ........ |
+-------------------------------------+
'BalDate then skips past 1/6/2018 and 1/7/2018 to 1/8/2018
Data that I'm needing:
+-------------------------------------+
|ID | IDNbr | Balance | BalDate |
+-------------------------------------+
|001| 91 | 529 | 1/5/2018 |
|002| 87 | 654 | 1/5/2018 |
|003| 45 | 258 | 1/5/2018 |
|004| 91 | 529 | 1/6/2018 |
|005| 87 | 654 | 1/6/2018 |
|006| 45 | 258 | 1/6/2018 |
|007| 91 | 529 | 1/7/2018 |
|008| 87 | 654 | 1/7/2018 |
|009| 45 | 258 | 1/7/2018 |
|010| 91 | 611 | 1/8/2018 |
|011| 87 | 753 | 1/8/2018 |
|012| 45 | 357 | 1/8/2018 |
|...| .. | ... | ........ |
+-------------------------------------+
'I'm needing it to add the Saturday(1/6/2018) and Sunday(1/7/2018) before continuing on to 1/8/2018
Any help would be appreciated. Thank you in advance!
If there are downvotes, I ask that you please explain why you are downvoting so I may correct it!
Ok, you're going to need the CalTable() function from Bernd's answer. We're going to use it to create a list of all calendar dates between the MIN(BalDate) and the MAX(BalDate) in tblID. We're also going to CROSS JOIN that with the list of DISTINCT IDNbr values, which I assume is the PK of tblID.
Let's create some sample data.
CREATE TABLE #tblID (ID VARCHAR(3), IDNbr INT, Balance INT, BalDate DATE)
INSERT INTO #tblID
(
ID
,IDNbr
,Balance
,BalDate
)
VALUES
('001',91,529,'1/5/2018'),
('002',87,654,'1/5/2018'),
('003',45,258,'1/5/2018'),
('004',91,611,'1/8/2018'),
('005',87,753,'1/8/2018'),
('006',45,357,'1/8/2018')
Next, we're going to INSERT new records into #tblID for the missing days. The magic here is in the LAG() function, which can look at a previous row's data. We give it an expression for the offset value, based on the difference between the missing date and the last date with data.
;WITH IDs AS
(
SELECT DISTINCT
IDNbr
FROM #tblID
)
,IDDates AS
(
SELECT
BalDate = c.[Date]
,i.IDNbr
FROM [CalTable]((SELECT MIN(BalDate) FROM #tblID), (SELECT MAX(BalDate) FROM #tblID)) c
CROSS APPLY IDs i
)
,FullResults AS
(
SELECT
i.BalDate
,i.IDNbr
,Balance = CASE WHEN t.Balance IS NOT NULL THEN t.Balance
ELSE LAG(t.Balance,
DATEDIFF(
DAY
,(SELECT MAX(t1.BalDate) FROM #tblID t1 WHERE t1.IDNbr = i.IDNbr AND t1.BalDate <= i.BalDate GROUP BY t1.IDNbr)
,i.BalDate
)
) OVER (PARTITION BY i.IDNbr ORDER BY i.BalDate ASC)
END
FROM IDDates i
LEFT JOIN #tblID t ON t.BalDate = i.BalDate AND t.IDNbr = i.IDNbr
)
INSERT INTO #tblID
(
IDNbr
,Balance
,BalDate
)
SELECT
f.IDNbr
,f.Balance
,f.BalDate
FROM FullResults f
LEFT JOIN #tblID t ON t.IDNbr = f.IDNbr AND t.BalDate = f.BalDate
WHERE t.IDNbr IS NULL
At this point, if we didn't care about the ID field, which appears to be a 3-character string representation of the row number, we'd be good. However, while I don't think it's a good practice to use a string in this manner, I'm also not one to comment on someone else's business requirements that I am not privy to.
So let's assume we have to update the ID field to match the expected output. We can do that like this:
;WITH IDUpdate AS
(
SELECT
ID = RIGHT('000' + CAST(ROW_NUMBER() OVER (ORDER BY BalDate ASC, IDNbr DESC) AS VARCHAR), 3)
,t.IDNbr
,t.Balance
,t.BalDate
FROM #tblID t
)
UPDATE t
SET t.ID = i.ID
FROM #tblID t
INNER JOIN IDUpdate i ON i.IDNbr = t.IDNbr AND i.BalDate = t.BalDate
Now if you query your updated table, you'll get the following:
SELECT
ID
,IDNbr
,Balance
,BalDate
FROM #tblID
ORDER BY BalDate ASC, IDNbr DESC
Output:
ID | IDNbr | Balance | BalDate
------------------------------
001 | 91 | 529 | 2018-01-05
002 | 87 | 654 | 2018-01-05
003 | 45 | 258 | 2018-01-05
004 | 91 | 529 | 2018-01-06
005 | 87 | 654 | 2018-01-06
006 | 45 | 258 | 2018-01-06
007 | 91 | 529 | 2018-01-07
008 | 87 | 654 | 2018-01-07
009 | 45 | 258 | 2018-01-07
010 | 91 | 611 | 2018-01-08
011 | 87 | 753 | 2018-01-08
012 | 45 | 357 | 2018-01-08
Here is a sample of the linked function:
create FUNCTION [dbo].[CalTable]
(
@startDate date,
@endDate date
)
RETURNS
@calender TABLE
(
[Date] date not null primary key CLUSTERED,
isMondayToFriday bit not null
)
AS
BEGIN
declare @currentday date = @startDate;
declare @isMondayToFriday bit;
while (@currentday<=@endDate)
begin
-- respect DATEFIRST depending on language settings
if (DATEPART(dw, @currentday)+@@DATEFIRST-2)%7+1>5
set @isMondayToFriday = 0
else
set @isMondayToFriday = 1
insert into @calender values (@currentday, @isMondayToFriday);
set @currentday = DATEADD(D, 1, @currentday);
end
RETURN
END
GO
select * from [CalTable]({d'2018-01-01'}, {d'2018-02-03'});
Use this to find the gaps.
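If you would rather not create a function, a different way to get the same missing rows is a recursive CTE for the calendar plus CROSS APPLY with TOP (1) to fetch the last known balance. This is a swapped-in technique, not the LAG() approach from the other answer, and only a sketch: @tblID is a stand-in table variable I made up, with the ID column left out for brevity.

```sql
-- Stand-in for tblID (hypothetical name) with the question's sample rows.
DECLARE @tblID TABLE (IDNbr int, Balance int, BalDate date);
INSERT INTO @tblID VALUES
(91, 529, '2018-01-05'), (87, 654, '2018-01-05'), (45, 258, '2018-01-05'),
(91, 611, '2018-01-08'), (87, 753, '2018-01-08'), (45, 357, '2018-01-08');

DECLARE @minDate date = (SELECT MIN(BalDate) FROM @tblID);
DECLARE @maxDate date = (SELECT MAX(BalDate) FROM @tblID);

-- One row per calendar day between the first and last BalDate.
WITH Calendar AS (
    SELECT @minDate AS CalDate
    UNION ALL
    SELECT DATEADD(DAY, 1, CalDate) FROM Calendar WHERE CalDate < @maxDate
)
SELECT i.IDNbr, lastKnown.Balance, c.CalDate AS BalDate
FROM Calendar AS c
CROSS JOIN (SELECT DISTINCT IDNbr FROM @tblID) AS i
-- Most recent balance on or before the calendar day for this IDNbr
CROSS APPLY (
    SELECT TOP (1) t.Balance
    FROM @tblID AS t
    WHERE t.IDNbr = i.IDNbr AND t.BalDate <= c.CalDate
    ORDER BY t.BalDate DESC
) AS lastKnown
-- Only days that are not already in the table
WHERE NOT EXISTS (
    SELECT 1 FROM @tblID AS x
    WHERE x.IDNbr = i.IDNbr AND x.BalDate = c.CalDate
)
ORDER BY c.CalDate, i.IDNbr DESC
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

The SELECT can be put behind an INSERT INTO to write the rows back. For long date ranges a permanent calendar table would probably be kinder to the server than the recursive CTE.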

Cleaning up duplicate chronological values

I am trying to clean up some chronological data to remove duplicate chronological data.
Example Table:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | 50 | 2015-05-22 |
| 1 | null | 2015-07-04 |
| 1 | null | 2015-07-24 |
| 1 | null | 2015-07-30 |
| 1 | 50 | 2015-09-07 |
| 1 | 50 | 2016-01-16 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 2 | 60 | 2015-11-22 |
| 2 | 60 | 2016-07-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 50 | 2015-07-15 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
As you can see, the same individual may have several consecutive rows for the same department with different effective_dates. I want to clean this up with a query so that only the first date of each department change is kept. However, I don't want to remove instances where someone went from department 50 to null then back to 50, as those are actual changes in position.
Example Output:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | null | 2015-07-04 |
| 1 | 50 | 2015-09-07 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
How can I achieve this?
My solution is
DECLARE @myTable TABLE (emp_id INT, department INT, effective_date DATE);
INSERT INTO @myTable VALUES
(1, 50 , '2015-04-01'),
(1, 50 , '2015-05-22'),
(1, null, '2015-07-04'),
(1, null, '2015-07-24'),
(1, null, '2015-07-30'),
(1, 50 , '2015-09-07'),
(1, 50 , '2016-01-16'),
(1, null, '2016-04-23'),
(2, 60 , '2015-01-20'),
(2, 60 , '2015-11-22'),
(2, 60 , '2016-07-20'),
(3, 50 , '2015-04-02'),
(3, 50 , '2015-07-15'),
(3, 60 , '2016-01-25')
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM @myTable
)
)
SELECT T1.emp_id, T1.department, T1.effective_date
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE (CASE WHEN ISNULL(T1.department,'') = ISNULL(T2.department,'') THEN 1 ELSE 0 END) = 0
ORDER BY T1.emp_id, T1.RN
Result:
emp_id department effective_date
----------- ----------- --------------
1 50 2015-04-01
1 NULL 2015-07-04
1 50 2015-09-07
1 NULL 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
(7 row(s) affected)
To delete the duplicate values:
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM @myTable
)
DELETE T1
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE ( CASE
WHEN ISNULL(T1.department,'') <> ISNULL(T2.department,'') THEN 1
ELSE 0 END ) = 0
An alternative for the WHERE clause:
WHERE ( CASE WHEN T1.department <> T2.department
OR (T1.department IS NULL AND T2.department IS NOT NULL)
OR (T2.department IS NULL AND T1.department IS NOT NULL)
THEN 1 ELSE 0 END ) = 0
This was tougher than expected:
declare @temp as table (emp_id int, department int,effective_date date)
insert into @temp
values
(1,50,'2015-04-01')
, (1,50,'2015-05-22')
, (1, null ,'2015-07-04')
, (1, null ,'2015-07-24')
, (1, null ,'2015-07-30')
, (1,50,'2015-09-07')
, (1,50,'2016-01-16')
, (1, null ,'2016-04-23')
, (2,60,'2015-01-20')
, (2,60,'2015-11-22')
, (2,60,'2016-07-20')
, (3,50,'2015-04-02')
, (3,50,'2015-07-15')
, (3,60,'2016-01-25')
;with cte as
(
--Please note I am changing null to -1 for comparison
select emp_id,isnull(department,-1) department,effective_date
,row_number() over (partition by emp_id order by effective_date) rn
from @temp
)
)
,cte2 as
(
--Compare to next record
select cte.*
,ctelast.emp_id cte2Emp
,ctelast.department cte2dept
,ctelast.effective_date cte2ED
,isSame = case when cte.department=ctelast.department then 1 else 0 end
from cte
join cte ctelast
on cte.emp_id=ctelast.emp_id and cte.rn = ctelast.rn-1
)
/*
Result of above:
emp_id department effective_date rn cte2Emp cte2dept cte2ED isSame
1 50 2015-04-01 1 1 50 2015-05-22 1
1 50 2015-05-22 2 1 -1 2015-07-04 0
1 -1 2015-07-04 3 1 -1 2015-07-24 1
1 -1 2015-07-24 4 1 -1 2015-07-30 1
1 -1 2015-07-30 5 1 50 2015-09-07 0
1 50 2015-09-07 6 1 50 2016-01-16 1
1 50 2016-01-16 7 1 -1 2016-04-23 0
2 60 2015-01-20 1 2 60 2015-11-22 1
2 60 2015-11-22 2 2 60 2016-07-20 1
3 50 2015-04-02 1 3 50 2015-07-15 1
3 50 2015-07-15 2 3 60 2016-01-25 0
*/
--Now you want both the first record and then any changes
select emp_id,department,effective_date from cte2 where rn=1
union all
select cte2emp,cte2dept,cte2.cte2ED from cte2 where isSame=0
order by 1,3
/*
result:
emp_id department effective_date
1 50 2015-04-01
1 -1 2015-07-04
1 50 2015-09-07
1 -1 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
*/
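On SQL Server 2012 or later, the same "compare with the previous row" idea can also be written with LAG() instead of a self-join. This is just a sketch of that variation, assuming the version supports LAG(); @history is a stand-in table variable I made up with the question's rows, and NULL departments are compared explicitly rather than being replaced by a placeholder value.

```sql
-- Stand-in table (hypothetical name) with the question's sample rows.
DECLARE @history TABLE (emp_id int, department int, effective_date date);
INSERT INTO @history VALUES
(1, 50,   '2015-04-01'), (1, 50,   '2015-05-22'),
(1, NULL, '2015-07-04'), (1, NULL, '2015-07-24'), (1, NULL, '2015-07-30'),
(1, 50,   '2015-09-07'), (1, 50,   '2016-01-16'), (1, NULL, '2016-04-23'),
(2, 60,   '2015-01-20'), (2, 60,   '2015-11-22'), (2, 60,   '2016-07-20'),
(3, 50,   '2015-04-02'), (3, 50,   '2015-07-15'), (3, 60,   '2016-01-25');

WITH Marked AS (
    SELECT emp_id, department, effective_date,
           -- department on the previous row of the same employee (NULL on the first row)
           LAG(department) OVER (PARTITION BY emp_id ORDER BY effective_date) AS prev_department,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY effective_date) AS rn
    FROM @history
)
SELECT emp_id, department, effective_date
FROM Marked
WHERE rn = 1                                                  -- always keep the first record
   OR department <> prev_department                           -- value changed
   OR (department IS NULL AND prev_department IS NOT NULL)    -- changed to NULL
   OR (department IS NOT NULL AND prev_department IS NULL)    -- changed from NULL
ORDER BY emp_id, effective_date;
```

The rn = 1 test is there because LAG() returns NULL on an employee's first row, which would otherwise be indistinguishable from a previous NULL department. On the sample data this keeps the 7 rows listed in the question's example output.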