Sql Server Find Next Most Recent Changed Record - sql

In my employee history table I'm trying to find what the salary was and then what it was changed to. Each Salary change inserts a new record because the new salary is considered a new "job" so it has a start and end date attached. I can select all these dates fine but I keep getting duplicates because I can't seem to compare the current record only against its most recent prior record for that employee. (if that makes sense)
I would like the results to be along the lines of:
Employee Name, OldSalary, NewSalary, ChangeDate(EndDate)
Joe 40,000 42,000 01/10/2011
Example data looks like
EmployeeHistId EmpId Name Salary StartDate EndDate
1 45 Joe 40,000.00 01/05/2011 01/10/2011
2 45 Joe 42,000.00 01/11/2011 NULL
3 46 Bob 20,000.00 01/12/2011 NULL

The Swiss army ROW_NUMBER() to the rescue:
with cte as (
select EmployeeHistId
, EmpId
, Name
, Salary
, StartDate
, EndDate
, row_number () over (
partition by EmpId order by StartDate desc) as StartDateRank
from EmployeeHist)
select n.EmpId
, n.Name
, o.Salary as OldSalary
, n.Salary as NewSalary
, o.EndDate as ChangeDate
from cte n
join cte o on o.EmpId = n.EmpId
and n.StartDateRank = 1
and o.StartDateRank = 2;
Use an outer join to also get employees that never got a raise.
These kinds of queries are always tricky because of data purity issues, for instance if StartDate and EndDate ranges overlap.
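A minimal sketch of the outer-join variant mentioned above, run against the three sample rows from the question. It uses Python's sqlite3 module (SQLite 3.25 or later is needed for window functions) instead of SQL Server, and stores the dates in ISO format; both choices are assumptions made only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeHist (
    EmployeeHistId INTEGER PRIMARY KEY,
    EmpId INTEGER, Name TEXT, Salary REAL,
    StartDate TEXT, EndDate TEXT            -- ISO dates (assumption)
);
INSERT INTO EmployeeHist VALUES
    (1, 45, 'Joe', 40000, '2011-01-05', '2011-01-10'),
    (2, 45, 'Joe', 42000, '2011-01-11', NULL),
    (3, 46, 'Bob', 20000, '2011-01-12', NULL);
""")

query = """
WITH cte AS (
    SELECT EmpId, Name, Salary, StartDate, EndDate,
           ROW_NUMBER() OVER (PARTITION BY EmpId
                              ORDER BY StartDate DESC) AS StartDateRank
    FROM EmployeeHist)
SELECT n.EmpId, n.Name,
       o.Salary  AS OldSalary,
       n.Salary  AS NewSalary,
       o.EndDate AS ChangeDate
FROM cte n
LEFT JOIN cte o                      -- keeps employees with no prior record
       ON o.EmpId = n.EmpId
      AND o.StartDateRank = 2
WHERE n.StartDateRank = 1
ORDER BY n.EmpId;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (45, 'Joe', 40000.0, 42000.0, '2011-01-10')
# (46, 'Bob', None, 20000.0, None)
```

Note that the rank filter on the newest row sits in the WHERE clause, while the filter on the prior row stays in the join condition; moving the latter into WHERE would turn the outer join back into an inner join.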

I assume the StartDate of the new job will be the same as the EndDate of the previous job.
If that's the case, try this.
SELECT a.Name AS EmployeeName, b.Salary AS OldSalary, a.Salary AS NewSalary, a.StartDate AS ChangeDate
FROM EMPLOYEE A, EMPLOYEE B
WHERE a.EmpID = b.EmpID
AND a.EndDate IS NULL
AND a.StartDate = b.EndDate

You can use the correlated join operator APPLY which can solve these types of challenges easily
select e.name, curr.salary, prev.salary, prev.enddate
from employee e
cross apply ( -- to get the current
select top(1) *
from emphist h
where e.empid = h.empid -- related to the employee
order by startdate desc) curr
outer apply ( -- to get the prior, if any
select top(1) *
from emphist h
where e.empid = h.empid -- related to the employee
and h.EmployeeHistId <> curr.EmployeeHistId -- prevent curr=prev
order by enddate desc) prev -- last ended

Related

Find two values of one Column based on value of another column per ID

I have a SQLite db with one table called Loan. This table, with sample data, is here:
Loan Table at sqlfiddle.com
This table contains below Columns:
mainindex | brindx | YearMonth | EmpID | VarNo | Name | LastName | CodepayID | CodepaySH | Lval | Lint | Lrmn
Now, I need a query to show desired result, contain
[empid],[Codepayid],[Lval-1],[Lval-2],[Sum(Lint)],[Lrmn-1],[Lrmn-2],
With this Conditions:
[lVal-1] as Value of Lval Column of each Employee Correspond to their Lowest YearMonth
[lVal-2] as Value of Lval Column of each Employee Correspond to their Highest YearMonth
[Sum(Lint)] Sum of Lint Column for each Employee.
[Lrmn-1] as Value of Lrmn Column of each Employee Correspond to that Lowest YearMonth
[Lrmn-2] as Value of Lrmn Column of each Employee Correspond to that Highest YearMonth
For example:
select empid, Codepayid, Lval1, Lval2, Sum(Lint), Lrmn1, Lrmn2
from Loan
where CodepayID=649 and EmpID=12450400
group by EmpID
Result:
EmpID CodepayID Lval1 Lval2 Sum(Lint) Lrmn1 Lrmn2
12450400 649 405480 405485 270320 337900 202740
Use FIRST_VALUE() and SUM() window functions:
SELECT DISTINCT EmpID, CodepayID,
FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth) LVal1,
FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) LVal2,
SUM(Lint) OVER (PARTITION BY EmpID, CodepayID) sum_Lint,
FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth) Lrmn1,
FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) Lrmn2
FROM loan
See the demo.
To get your desired result, first find the min and max values of the YearMonth column for each employee, then use those values to get the respective lval and lrmn for the min and max YearMonth.
You can do this either in a stored procedure or in a single query, whichever you prefer. Below is a query which will give your desired output:
SELECT l.empid, l.CodepayID, min(a.lval1) as lval1, max(b.lval2) as lval2 ,sum(l.lint) as lint,min(a.lrmn1) as lrmn1,max(b.lrmn2) as lrmn2
FROM loan l inner join
( SELECT empid,CodepayID, lval lval1, Lrmn lrmn1
FROM loan ln
WHERE YEARMonth = (SELECT MIN(YearMonth)
FROM loan
WHERE empid = ln.EmpID and CodepayID = ln.CodepayID
group by empid,CodepayID)
) a on l.empid = a.EmpID and l.CodepayID = a.codepayid
inner join
( SELECT empid, CodepayID, YearMonth,lval lval2, Lrmn lrmn2
FROM loan ln
WHERE YEARMonth = (SELECT MAX(YearMonth)
FROM loan
WHERE empid = ln.EmpID and CodepayID = ln.CodepayID
GROUP BY empid,CodepayID)
) b on l.empid = b.EmpID and l.CodepayID = b.codepayid
GROUP BY l.EmpID,l.Codepayid
The accepted answer is unnecessarily complex. Even if you don't want to use window functions, the following uses the same logic as the accepted answer with far less work for the database, completely avoiding correlated sub-queries:
SELECT
l.empid,
l.CodepayID,
min(case when l.yearmonth = s.yearmonth_min then l.lval end) as lval1,
max(case when l.yearmonth = s.yearmonth_max then l.lval end) as lval2,
sum(l.lint) as lint,
min(case when l.yearmonth = s.yearmonth_min then l.lrmn end) as lrmn1,
max(case when l.yearmonth = s.yearmonth_max then l.lrmn end) as lrmn2
FROM
loan l
inner join
(
select
empid,
CodepayID,
min(yearmonth) as yearmonth_min,
max(yearmonth) as yearmonth_max
from
loan
group by
empid,
CodepayID
)
s
on l.empid = s.empid
and l.CodepayID = s.codepayid
group by
l.EmpID,
l.Codepayid
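As a rough check that the window-function query and the conditional-aggregation query describe the same result, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The rows below are hypothetical; they are not the data from the sqlfiddle link, and the table is reduced to the columns the queries actually touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Loan (EmpID INTEGER, CodepayID INTEGER, YearMonth INTEGER,
                   Lval INTEGER, Lint INTEGER, Lrmn INTEGER);
-- hypothetical sample rows
INSERT INTO Loan VALUES
    (1, 649, 202001, 100, 10,  90),
    (1, 649, 202002, 100, 10,  80),
    (1, 649, 202003, 105, 10,  70),
    (2, 649, 202001, 200, 20, 180),
    (2, 649, 202002, 200, 20, 160);
""")

window_sql = """
SELECT DISTINCT EmpID, CodepayID,
  FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth)      AS Lval1,
  FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) AS Lval2,
  SUM(Lint)         OVER (PARTITION BY EmpID, CodepayID)                         AS sum_Lint,
  FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth)      AS Lrmn1,
  FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) AS Lrmn2
FROM Loan
ORDER BY EmpID, CodepayID;
"""

aggregate_sql = """
SELECT l.EmpID, l.CodepayID,
  MIN(CASE WHEN l.YearMonth = s.ym_min THEN l.Lval END) AS Lval1,
  MAX(CASE WHEN l.YearMonth = s.ym_max THEN l.Lval END) AS Lval2,
  SUM(l.Lint)                                           AS sum_Lint,
  MIN(CASE WHEN l.YearMonth = s.ym_min THEN l.Lrmn END) AS Lrmn1,
  MAX(CASE WHEN l.YearMonth = s.ym_max THEN l.Lrmn END) AS Lrmn2
FROM Loan l
JOIN (SELECT EmpID, CodepayID,
             MIN(YearMonth) AS ym_min, MAX(YearMonth) AS ym_max
      FROM Loan
      GROUP BY EmpID, CodepayID) s
  ON l.EmpID = s.EmpID AND l.CodepayID = s.CodepayID
GROUP BY l.EmpID, l.CodepayID
ORDER BY l.EmpID, l.CodepayID;
"""

window_rows = conn.execute(window_sql).fetchall()
aggregate_rows = conn.execute(aggregate_sql).fetchall()
print(window_rows)
# [(1, 649, 100, 105, 30, 90, 70), (2, 649, 200, 200, 40, 180, 160)]
print(window_rows == aggregate_rows)  # True
```

Both forms rely on (EmpID, CodepayID, YearMonth) identifying a single row; with duplicate YearMonth values per group the two queries could pick different rows.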

Instead of ROW_NUMBER ORDER BY

In my last post (Below link), I've tried to use the ROW_NUMBER ORDER BY and finally got the required solution. See the following link:
Get Wages Yearly Increment Column Wise Using Sql
Now I am trying to use the following query in Sql server 2000, just for demonstration purpose. I know, ROW_NUMBER ORDER BY can't be used in it. And after some googling, tried to use the following for Sql server 2000: (1st query)
SELECT k.ID, k.[Name], m.Amt,
(SELECT COUNT(*) FROM EmpIncrement l WHERE l.EmpID <= m.EmpID) as RowNum
FROM EmpIncrement m
JOIN Employee k ON m.EmpID = k.ID
And I got this output:
ID Name Amt RowNum
1 John 2000 2
2 Jack 8000 4
1 John 1000 2
2 Jack 4000 4
Similarly when I use the following with ROW_NUMBER ORDER BY, then it shows different output: (2nd query)
SELECT k.ID, k.Name, m.Amt,
ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY DATEPART(yy,IncrementDate)) as RowNum
FROM EmpIncrement m
JOIN Employee k ON k.ID = m.EmpID
Output:
ID Name Amt RowNum
1 John 1000 1
2 John 2000 2
1 Jack 4000 1
2 Jack 8000 2
The row numbering per employee (RowNum) differs between the two queries, and the output of the second query is the correct one. I would like to understand why the two outputs differ and whether the 1st query can be made equivalent to ROW_NUMBER ORDER BY. Thanks.
Note: I didn't include the table structure and sample data here again. Never mind - You can see the earlier post for that.
To recreate the partition by EmpId, your subquery should have l.EmpId = m.EmpId. You also need a unique column, or set of columns, that uniquely identifies and orders the rows for this version to work properly. Based on the given data, if (EmpId, Amt) is a unique pair you can add and l.Amt <= m.Amt. If you have a surrogate id on the table, it would be better to use that instead of Amt.
select
k.id
, k.[Name]
, m.Amt
, ( select count(*)
from EmpIncrement l
where l.Empid = m.Empid
and l.Amt <= m.Amt
) as RowNum
from EmpIncrement m
inner join Employee k
on m.Empid = k.id
If you have no set of columns to uniquely identify and order the rows, you can use a temporary table with an identity() column.
create table #temp (tmpid int identity(1,1) not null, id int, [Name] varchar(32), Amt int);
insert into #temp (id, [Name], Amt)
select
k.id
, k.[Name]
, m.Amt
from EmpIncrement m
inner join Employee k
on m.Empid = k.id;
select
t.id
, t.[Name]
, t.Amt
, ( select count(*)
from #Temp i
where i.id = t.id
and i.tmpId <= t.tmpId
) as RowNum
from #temp t
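To see why the asker's first query returned 2, 2, 4, 4 and why the corrected correlated count matches ROW_NUMBER(), here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The EmpID and Amt values come from the outputs shown in the question; the join to Employee is left out to keep the example short, and ordering by Amt instead of the increment date is an assumption that only holds because Amt is unique per employee here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpIncrement (EmpID INTEGER, Amt INTEGER);
INSERT INTO EmpIncrement VALUES (1, 1000), (1, 2000), (2, 4000), (2, 8000);
""")

# The asker's subquery: counts every row whose EmpID is <= the current one,
# so it is a running total across employees, not a per-employee row number.
original = conn.execute("""
SELECT m.EmpID, m.Amt,
       (SELECT COUNT(*) FROM EmpIncrement l
        WHERE l.EmpID <= m.EmpID) AS RowNum
FROM EmpIncrement m
ORDER BY m.EmpID, m.Amt
""").fetchall()

# Corrected emulation: equality restricts the count to the same employee
# (the "partition"), and Amt <= Amt supplies the ordering within it.
emulated = conn.execute("""
SELECT m.EmpID, m.Amt,
       (SELECT COUNT(*) FROM EmpIncrement l
        WHERE l.EmpID = m.EmpID
          AND l.Amt <= m.Amt) AS RowNum
FROM EmpIncrement m
ORDER BY m.EmpID, m.Amt
""").fetchall()

# Reference result from the real window function.
windowed = conn.execute("""
SELECT EmpID, Amt,
       ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY Amt) AS RowNum
FROM EmpIncrement
ORDER BY EmpID, Amt
""").fetchall()

print(original)  # [(1, 1000, 2), (1, 2000, 2), (2, 4000, 4), (2, 8000, 4)]
print(emulated)  # [(1, 1000, 1), (1, 2000, 2), (2, 4000, 1), (2, 8000, 2)]
print(emulated == windowed)  # True
```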

SQL Server : finding consecutive absence counts for students over custom dates

I have a table which stores the attendance of students for each day. I need to get the students who are absent for 3 consecutive recorded days. However, the dates on which attendance is taken are not contiguous: some days, such as non-attendance days, holidays and weekends, are excluded. The dates on which attendance was taken are the dates for which records exist in that table.
The data is like
StudentId Date Attendance
-----------------------------------------
178234 1/1/2017 P
178234 5/1/2017 A
178234 6/1/2017 A
178234 11/1/2017 A
178432 1/1/2017 P
178432 5/1/2017 A
178432 6/1/2017 P
178432 11/1/2017 A
In the above case the result should be
StudentId AbsenceStartDate AbsenceEndDate ConsecutiveAbsences
----------------------------------------------------------------------------
178234 5/1/2017 11/1/2017 3
I have tried to implement the solution from "Calculating Consecutive Absences in SQL". However, that only worked for contiguous dates. Any suggestions would be great, thanks.
Oh, you have both absences and presences in the table. You can use the difference of row_number() values approach:
select studentid, min(date), max(date)
from (select a.*,
row_number() over (partition by studentid order by date) as seqnum,
row_number() over (partition by studentid, attendance order by date) as seqnum_a
from attendance a
) a
where attendance = 'A'
group by studentid, (seqnum - seqnum_a)
having count(*) >= 3;
The difference of row numbers gets consecutive values that are the same. This is a little tricky to understand, but if you run the subquery, you should see how the difference is constant for consecutive absences or presents. You only care about absences, hence the where in the outer query.
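The following sketch runs the difference-of-row-numbers query on the question's sample rows so the constant difference can be inspected directly. It uses Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server, reads the sample dates as day/month/year stored in ISO format, and adds COUNT(*) to the select list; those three points are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (studentid INTEGER, date TEXT, attendance TEXT);
INSERT INTO attendance VALUES
    (178234, '2017-01-01', 'P'),
    (178234, '2017-01-05', 'A'),
    (178234, '2017-01-06', 'A'),
    (178234, '2017-01-11', 'A'),
    (178432, '2017-01-01', 'P'),
    (178432, '2017-01-05', 'A'),
    (178432, '2017-01-06', 'P'),
    (178432, '2017-01-11', 'A');
""")

numbered_sql = """
SELECT a.*,
       ROW_NUMBER() OVER (PARTITION BY studentid ORDER BY date)             AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY studentid, attendance ORDER BY date) AS seqnum_a
FROM attendance a
"""

# Inner query on its own: seqnum - seqnum_a stays constant inside each run
# of identical attendance values and changes when a run is interrupted.
for sid, day, att, seqnum, seqnum_a in conn.execute(
        numbered_sql + " ORDER BY studentid, date"):
    print(sid, day, att, seqnum, seqnum_a, seqnum - seqnum_a)

streaks = conn.execute(f"""
SELECT studentid, MIN(date), MAX(date), COUNT(*)
FROM ({numbered_sql}) a
WHERE attendance = 'A'
GROUP BY studentid, (seqnum - seqnum_a)
HAVING COUNT(*) >= 3
""").fetchall()

print(streaks)  # [(178234, '2017-01-05', '2017-01-11', 3)]
```

For student 178432 the two absences get differences 1 and 2, so they fall into separate groups of one row each and are filtered out by the HAVING clause.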
try this:
declare @t table (sid int, d date, att char(1))
insert @t (sid,d, att) values
(178234, '1/1/2017','P'),
(178234, '5/1/2017','A'),
(178234, '6/1/2017','A'),
(178234, '11/1/2017','A'),
(178432, '1/1/2017','P'),
(178432, '5/1/2017','A'),
(178432, '6/1/2017','P'),
(178432, '11/1/2017','A')
Select s.sid, Min(s.d) startDt, Max(e.d) endDt, s.att, e.att, count(*)
from @t s join @t e on e.d <=
(select max(d) from @t m
Where sid = s.sid
and d > s.d
and att = 'A'
and not exists
(Select * from @t
where sid = s.sid
and d between s.d and m.d
and att = 'P'))
Where s.att = 'A'
and s.d = (Select Min(d) from @t
Where sid = s.sid
and d < e.d
and att = 'A')
group by s.sid, s.d, s.att, e.att
This is also tricky to explain.
Basically, it joins the table to itself using the aliases s (for start) and e (for end), where the s row is the first row in a set of contiguous absences, and the e rows are all the following absences that come before the next date on which the student is present. This generates the sets of 'A' rows that have no 'P' row within them. The SQL then groups by the appropriate values to return the earliest and latest date, and the count of rows, in each group.
The last where clause ensures that the s row is the first row in the group.

Get record with min time difference

I have a requirement where I need to get the record with the minimum time difference from the current record.
Let us assume that the table has insert date, group Id and Id columns.
I have selected a record and now want to get another record for which the difference between the insert date of the selected record and that of the other record is minimal.
I have tried outer apply, but that query takes forever to run.
Query:
select e.id
from B.Emp t
outer apply (
select top 1 *
from B.Emp
where t.group_id = group_id
order by insert_time desc ) e
where t.id = 5
select * From B.Emp emp
Inner Join
(
select MAX(emp1.insert_time) maxTime, emp1.id From B.Emp emp1 group by emp1.id
) maxDateRec ON maxDateRec.id = emp.id AND maxDateRec.maxTime = emp.insert_time
where emp.id = 5
Try with this second one.
If you indeed want to find the closest insert time within the selected record's group, regardless of the direction, then something like this might help:
select t.insert_time, ca.*
from B.Emp t
outer apply (
select top (1) * from B.Emp e
where e.group_id = t.group_id
and e.id != t.id
order by abs(datediff(ms, t.insert_time, e.insert_time))
) ca
where t.Id = 5;
EDIT: You will definitely need to check indexing options, though. For starters, assuming you have a clustered primary key on Id column, this should help:
create index [IX_B_Emp_GroupInsert] on B.Emp (group_id, insert_time);
This query is not tested.
Please let me know if there are any errors or if any changes need to be made to the query.
CREATE TABLE Emp
(
group_Id INT,
ID INT,
date DATETIME
)
INSERT INTO EMP
(group_Id,ID,date)
values
(1,5,'11-Jan-2014'),
(1,5,'12-Jan-2014')
--
select ID,date from Emp t
group by
ID,
date having id = 5
and
date =
(select max(date) from Emp where id = 5)

What is wrong with this sql innerjoin?

Here is my select statement with the innerjoin of two tables,
if not exists(select EmpId from SalaryDetails
where EmpId in (select Emp_Id
from Employee where Desig_Id=@CategoryId))
begin
-- some statements here
end
else
begin
SELECT e.Emp_Id, e.Identity_No, e.Emp_Name,
case WHEN e.SalaryBasis=1 THEN 'Weekly'
ELSE 'Monthly' end as SalaryBasis,e.FixedSalary,
(SELECT TOP 1 RemainingAdvance
FROM SalaryDetails
ORDER BY CreatedDate DESC) as Advance
FROM Employee as e inner join Designation as d on e.Desig_Id=d.Desig_Id
INNER JOIN SalaryDetails as S on e.Emp_Id=S.EmpId
End
My results pane (screenshot): http://img220.imageshack.us/img220/7774/resultpane.jpg
And my SalaryDetails table (screenshot): http://img28.imageshack.us/img28/770/salarydettable.jpg
EDIT:
My output must be:
16 CR14 Natarajan Weekly 150.00 354.00
17 cr12333 Pandian Weekly 122.00 0.00
You're not filtering the sub-query (SELECT TOP 1 RemainingAdvance FROM SalaryDetails ORDER BY CreatedDate DESC) on any employee ID, so it's giving you the first record in the entire table when sorted by CreatedDate DESC (which I'm guessing is 354.)
You will probably want to move that table expression into your FROM clause, not your SELECT, include your employee ID, and do a join on that expression.
SELECT
e.Emp_Id,e.Identity_No,e.Emp_Name,case WHEN e.SalaryBasis=1 THEN 'Weekly' ELSE 'Monthly' end as SalaryBasis,e.FixedSalary,
Advance.RemainingAdvance as Advance
from Employee as e inner join Designation as d on e.Desig_Id=d.Desig_Id
inner join
(SELECT EmpID, RemainingAdvance, RANK() OVER (PARTITION BY EmpID ORDER BY CreatedDate DESC) AS SalaryRank FROM SalaryDetails) as Advance ON Advance.EmpID = e.Emp_ID AND Advance.SalaryRank = 1
This is just off the top of my head so may take a bit of tweaking to run correctly. Note also the use of the RANK() function - if you use TOP 1, you're only ever getting the first record of the entire table. What you need is the first record per employee ID.
If this was me I would probably make that table expression a view or even a scalar-valued function taking your employee ID and returning the first RemainingAdvance value, then you could use TOP 1 and filter on the employee ID.
It looks like your join to Designation isn't even used and you're also missing your WHERE clause that you used in the IF statement at the top. I'd also move the subquery down into the join like Andy pointed out. Without having the DB to test against this probably won't be exact but I'd rewrite it to something like;
SELECT e.Emp_Id, e.Identity_No, e.Emp_Name,
case WHEN e.SalaryBasis=1
THEN 'Weekly'
ELSE 'Monthly' end as SalaryBasis,
e.FixedSalary,S.RemainingAdvance as Advance
FROM Employee as e
INNER JOIN (
SELECT TOP 1 EmpId, RemainingAdvance
FROM SalaryDetails
ORDER BY CreatedDate DESC) as S on e.Emp_Id=S.EmpId
WHERE e.Desig_Id=@CategoryId
Andy's suggestion to move the subquery into a view is a good one, much easier to read and probably a lot more efficient if the DB is large.
EDIT: (ANSWER)
(SELECT sd.empid,
sd.remainingadvance,
ROW_NUMBER() OVER (PARTITION BY sd.empid ORDER BY sd.createddate DESC) AS rank
FROM SALARYDETAILS sd
JOIN EMPLOYEE e ON e.emp_id = sd.empid
AND e.desig_id = @CategoryId) s
WHERE s.rank = 1
I edited jay's answer because he came close to my output...
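For reference, here is one way the ROW_NUMBER() fragment above might be assembled into a complete query, sketched with Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server. The employee rows mirror the expected output in the question; the SalaryDetails rows, their dates, and the Desig_Id value are hypothetical, and the derived column is named rn here purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Emp_Id INTEGER PRIMARY KEY, Identity_No TEXT,
                       Emp_Name TEXT, SalaryBasis INTEGER,
                       FixedSalary REAL, Desig_Id INTEGER);
CREATE TABLE SalaryDetails (EmpId INTEGER, RemainingAdvance REAL,
                            CreatedDate TEXT);
INSERT INTO Employee VALUES
    (16, 'CR14',    'Natarajan', 1, 150.0, 1),
    (17, 'cr12333', 'Pandian',   1, 122.0, 1);
-- hypothetical salary history: only the newest row per employee should win
INSERT INTO SalaryDetails VALUES
    (16, 500.0, '2010-01-01'),
    (16, 354.0, '2010-01-08'),
    (17,   0.0, '2010-01-05');
""")

latest_advance_sql = """
SELECT e.Emp_Id, e.Identity_No, e.Emp_Name,
       CASE WHEN e.SalaryBasis = 1 THEN 'Weekly' ELSE 'Monthly' END AS SalaryBasis,
       e.FixedSalary,
       s.RemainingAdvance AS Advance
FROM Employee e
JOIN (SELECT sd.EmpId, sd.RemainingAdvance,
             ROW_NUMBER() OVER (PARTITION BY sd.EmpId
                                ORDER BY sd.CreatedDate DESC) AS rn
      FROM SalaryDetails sd) s
  ON s.EmpId = e.Emp_Id
 AND s.rn = 1                       -- newest SalaryDetails row per employee
WHERE e.Desig_Id = ?
ORDER BY e.Emp_Id;
"""

result = conn.execute(latest_advance_sql, (1,)).fetchall()
for row in result:
    print(row)
# (16, 'CR14', 'Natarajan', 'Weekly', 150.0, 354.0)
# (17, 'cr12333', 'Pandian', 'Weekly', 122.0, 0.0)
```

The difference from TOP 1 is that the numbering restarts for each EmpId, so every employee gets their own most recent RemainingAdvance instead of all rows sharing the newest value in the whole table.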