Select latest and 2nd latest date rows per user - sql

I have the following query, which selects rows whose LAST_UPDATE_DATE falls within the last 7 days. It works great.
SELECT 'NEW ROW' AS 'ROW_TYPE', A.EMPLID, B.FIRST_NAME, B.LAST_NAME,
A.BANK_CD, A.ACCOUNT_NUM, ACCOUNT_TYPE, PRIORITY, A.LAST_UPDATE_DATE
FROM PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
WHERE
B.EMPL_STATUS NOT IN ('T','R','D')
AND ((A.DEPOSIT_TYPE = 'P' AND A.AMOUNT_PCT = 100)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B')
AND A.EFFDT = (SELECT MAX(A1.EFFDT)
FROM PS_DIR_DEP_DISTRIB A1
WHERE A1.EMPLID = A.EMPLID
AND A1.EFFDT <= GETDATE())
AND D.EFF_STATUS = 'A'
AND D.EFFDT = (SELECT MAX(D1.EFFDT)
FROM PS_DIRECT_DEPOSIT D1
WHERE D1.EMPLID = D.EMPLID
AND D1.EFFDT <= GETDATE())
AND A.LAST_UPDATE_DATE >= GETDATE() - 7
I would like to extend it to also return the previous (2nd MAX) row per EMPLID, so that I can output the 'old' row (the one that was current before the latest update) alongside the new row the query already returns.
ROW_TYPE EMPLID FIRST_NAME LAST_NAME BANK_CD ACCOUNT_NUM ACCOUNT_TYPE PRIORITY LAST_UPDATE_DATE
NEW ROW 12345 JOHN SMITH 123548999 45234879 C 999 2019-03-06 00:00:00.000
OLD ROW 12345 JOHN SMITH 214080046 92178616 C 999 2018-10-24 00:00:00.000
NEW ROW 56399 CHARLES MASTER 785816167 84314314 C 999 2019-03-07 00:00:00.000
OLD ROW 56399 CHARLES MASTER 345761227 547352 C 999 2017-05-16 00:00:00.000
So the EMPLID would be ordered by NEW ROW, followed by OLD ROW as shown above. In this example the 'NEW ROW' is getting the record that is within the past 7 days, as indicated by the LAST_UPDATE_DATE.
I would like to get feedback on how to modify the query so I can also get the 'old' row (which is the max row that is less than the 'NEW' row retrieved above).

It was a slow day for crime in Gotham, so I gave this a whirl.
It's unlikely to work right out of the box, but it should get you started.
Your LAST_UPDATE_DATE column is on the table PS_DIR_DEP_DISTRIB, so we'll start there. First, you want to identify all of the records that were updated in the last 7 days because those are the only ones you're interested in. Throughout this, I'm assuming, and I'm probably wrong, that the natural key for the table consists of EMPLID, BANK_CD, and ACCOUNT_NUM. You'll want to sub in the actual natural key for those columns in a few places. That said, the date limiter looks something like this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND
limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
Now we'll use that as a correlated sub-query in a WHERE EXISTS clause that we'll correlate back to the base table to limit ourselves to records with natural key values that were updated in the last week. I altered the SELECT list to just SELECT 1, which is typical verbiage for a correlated sub, since it stops looking for a match when it finds one (1), and doesn't actually return any values at all.
Additionally, since we're filtering this record set anyway, I moved all the other WHERE clause filters for this table into this (soon to be) sub-query.
Finally, in the SELECT portion, I added a DENSE_RANK to force order the records. We'll use the DENSE_RANK value later to filter off only the first (N) records of interest.
So that leaves us with this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
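AND
(
--Keep the three alternatives inside one set of parentheses so the ORs can't bypass the EXISTS filter.
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)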
Replace the original INNER JOIN to PS_DIR_DEP_DISTRIB with that query. In the SELECT list, the first hard-coded value is now dependent on the RowNum value, so that's a CASE expression now. In the WHERE clause, the dates are all driven by the subquery, so they're gone, several were folded into the subquery, and we're adding WHERE dist.RowNum <= 2 to bring back the top 2 records.
(I also replaced all the table aliases so I could keep track of what I was looking at.)
SELECT
CASE dist.RowNum
WHEN 1 THEN 'NEW ROW'
ELSE 'OLD ROW'
END AS ROW_TYPE
,dist.EMPLID
,emp.FIRST_NAME
,emp.LAST_NAME
,dist.BANK_CD
,dist.ACCOUNT_NUM
,ACCOUNT_TYPE
,dist.PRIORITY
,dist.LAST_UPDATE_DATE
FROM
PS_DIRECT_DEPOSIT AS dd
INNER JOIN
(
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
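AND
(
--Keep the three alternatives inside one set of parentheses so the ORs can't bypass the EXISTS filter.
(
sdist.DEPOSIT_TYPE = 'P'
AND sdist.AMOUNT_PCT = 100
)
OR sdist.PRIORITY = 999
OR sdist.DEPOSIT_TYPE = 'B'
)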
) AS dist
ON
dist.EMPLID = dd.EMPLID
AND dist.EFFDT = dd.EFFDT
INNER JOIN
PS_EMPLOYEES AS emp
ON
emp.EMPLID = dist.EMPLID
WHERE
dist.RowNum <= 2
AND
emp.EMPL_STATUS NOT IN ('T', 'R', 'D')
AND
dd.EFF_STATUS = 'A';
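
The ranking step can be tried in isolation. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The single `dist` table, its three columns, and the third row for employee 12345 are assumptions made for illustration; the other rows come from the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dist (emplid TEXT, bank_cd TEXT, last_update_date TEXT);
INSERT INTO dist VALUES
  ('12345', '123548999', '2019-03-06'),
  ('12345', '214080046', '2018-10-24'),
  ('12345', '999999999', '2017-01-15'),  -- hypothetical older row, should be dropped
  ('56399', '785816167', '2019-03-07'),
  ('56399', '345761227', '2017-05-16');
""")

# Rank each employee's rows from newest to oldest, keep the top two,
# and label them the way the final query does.
rows = conn.execute("""
SELECT CASE rnk WHEN 1 THEN 'NEW ROW' ELSE 'OLD ROW' END AS row_type,
       emplid, bank_cd, last_update_date
FROM (
    SELECT emplid, bank_cd, last_update_date,
           DENSE_RANK() OVER (PARTITION BY emplid
                              ORDER BY last_update_date DESC) AS rnk
    FROM dist
) AS ranked
WHERE rnk <= 2
ORDER BY emplid, rnk
""").fetchall()

for row in rows:
    print(row)
```

If two rows for the same employee can share a LAST_UPDATE_DATE, DENSE_RANK gives both the same rank, so more than two rows can come back for that employee. ROW_NUMBER with a tie-breaker column in the ORDER BY would cap it at two.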

Related

Need to join 3 tables and return results where everyone is over age 75

I have table employee with
id, name, dob, emplid
table documentation has
cdid, emplid, status, record
table appointment has
cvid, emplid, slotid
Everyone has a record in table Employee. Table Appointment stores everyone who schedules an appointment and table Documentation is where the record gets inserted when they complete their appointment. The problem is, they take walk-ins and will have a record in table Documentation, but no record in table Appointment. They want me to find everyone in table Employee who is over age 75, but does not currently have an appointment or has never come in as a walk-in.
I started with the below, but I am stuck on how to accurately get everyone counted.
SELECT COUNT(AgeYearsIntTrunc)
FROM (
SELECT DATEDIFF(hour,e.DOB,GETDATE())/8766.0 AS AgeYearsDecimal
,CONVERT(int,ROUND(DATEDIFF(hour,e.DOB,GETDATE())/8766.0,0)) AS AgeYearsIntRound
,DATEDIFF(hour,e.DOB,GETDATE())/8766 AS AgeYearsIntTrunc
FROM [dbo].Employee e
LEFT JOIN [dbo].Documentation d ON e.EmplID = d.EMPLID
WHERE d.Status IS NULL
) dt WHERE AgeYearsIntTrunc >='75'
It's normally best not to expect the compiler to do algebra for you. In other words, write predicates like x > y + 5 rather than x - y > 5.
Generally, the most efficient way to compare dates is to leave the column you are checking free of any function and do the necessary calculations on the GETDATE side.
NOT EXISTS is much easier for the compiler to reason about than LEFT JOIN / IS NULL.
SELECT e.* --COUNT(*)
FROM [dbo].Employee e
WHERE e.DOB < DATEADD(year, -75, GETDATE())
AND (
NOT EXISTS (SELECT 1
FROM [dbo].Documentation d
WHERE e.EmplID = d.EMPLID
)
AND -- do you want OR maybe?
NOT EXISTS (SELECT 1
FROM dbo.Appointment a
WHERE a.EmplId = e.EmplId
)
)
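
For a quick check of the logic, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The four sample employees are invented for illustration, the tables are trimmed to the columns the query needs, and SQLite's `date(?, '-75 years')` plays the role of `DATEADD(year, -75, GETDATE())` with a fixed "as of" date so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emplid INTEGER, name TEXT, dob TEXT);
CREATE TABLE documentation (cdid INTEGER, emplid INTEGER);
CREATE TABLE appointment (cvid INTEGER, emplid INTEGER);
INSERT INTO employee VALUES
  (1, 'Ann', '1940-05-01'),   -- over 75, no records at all
  (2, 'Bob', '1941-02-10'),   -- over 75, walk-in (documentation only)
  (3, 'Cy',  '1939-11-30'),   -- over 75, has an appointment
  (4, 'Dee', '1990-07-04');   -- under 75
INSERT INTO documentation VALUES (10, 2);
INSERT INTO appointment VALUES (20, 3);
""")

as_of = "2021-01-01"  # fixed reference date instead of "now"

# The DOB column is compared as-is; the arithmetic happens on the other side.
names = [r[0] for r in conn.execute("""
SELECT e.name
FROM employee e
WHERE e.dob < date(?, '-75 years')
  AND NOT EXISTS (SELECT 1 FROM documentation d WHERE d.emplid = e.emplid)
  AND NOT EXISTS (SELECT 1 FROM appointment a WHERE a.emplid = e.emplid)
ORDER BY e.name
""", (as_of,))]

print(names)  # ['Ann']
```

Only Ann is returned here because both NOT EXISTS conditions are joined with AND. Switching to OR would also return Bob and Cy, who each lack one of the two record types.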

Need expert opinion for the scenario defined below in description

SELECT
VP.PERSONNUM,
VP.PAYCODENAME,
VP.PAYCODEID,
VP.TIMEINSECONDS,
TI.ENTEREDONDTM,
VP.APPLYDATE,
VP.LABORLEVELNAME2,
VP.LABORLEVELNAME4,
VP.LABORLEVELNAME5,
CONVERT(VARCHAR(10),VP.ADJUSTEDAPPLYDATE,23) ORIGINALDATE,
CONVERT(VARCHAR(10),VP.PREVPAYPERIODSTART,23)PREVPAYPERIODSTAR,
CONVERT(VARCHAR(10),VP.PREVPAYPERIODEND,112)PREVPAYPERIODEND
FROM VP_TOTALS VP
JOIN TIMESHEETITEM TI
ON TI.EMPLOYEEID = VP.EMPLOYEEID AND TI.TIMESHEETITEMID = VP.TIMESHEETITEMID and TI.DELETEDSW <> '1'
WHERE VP.PAYCODETYPE <> 'G'
and VP.PERSONNUM='100419'
and vp.PAYCODEID in ('145','701')
AND APPLYDATE BETWEEN '2020-06-01' AND '2020-06-30'
Order by VP.PERSONNUM, VP.APPLYDATE
I have 2 pay codes, Overtime(SWG) and Regular(SWG), that come back as separate rows for the same date.
Example: 1st row: personnum = 100149, applydate = 6/3/2020, paycode = Overtime(SWG), Timeinseconds = 1800
2nd row: personnum = 100149, applydate = 6/3/2020, paycode = Regular(SWG), Timeinseconds = 1500
My main requirement is that these two rows be added together and shown as a single row,
i.e. personnum = 100149, applydate = 6/3/2020, paycode = [Overtime(SWG)+Regular(SWG)], Timeinseconds = 3300
It's hard to tell what all is going on with the WHERE conditions. Based on what's described in the notes I'm thinking something like this.
[Edit] Added the pay code concatenation to the GROUP BY clause
;with
-- The pay code names live in PAYCODENAME; PAYCODEID holds the numeric codes ('145', '701').
overtime_cte as (select * from VP_TOTALS where PAYCODENAME='Overtime(SWG)' and PERSONNUM='100419'),
regular_cte as (select * from VP_TOTALS where PAYCODENAME='Regular(SWG)' and PERSONNUM='100419')
select rc.PERSONNUM, rc.APPLYDATE, concat(oc.PAYCODENAME, '+', rc.PAYCODENAME) PAYCODENAME,
sum(rc.TIMEINSECONDS+oc.TIMEINSECONDS) TIMEINSECONDS
from regular_cte rc
join
overtime_cte oc on rc.PERSONNUM=oc.PERSONNUM
and rc.APPLYDATE=oc.APPLYDATE
group by rc.PERSONNUM, rc.APPLYDATE, concat(oc.PAYCODENAME, '+', rc.PAYCODENAME);
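
To see what the join-and-sum approach returns, here is a small sketch that runs it with Python's built-in sqlite3 module as a stand-in for SQL Server. The table is reduced to four columns, `||` replaces `concat`, and the 2020-06-04 row is a hypothetical extra added to show what happens on a day with only one pay code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vp_totals (personnum TEXT, applydate TEXT,
                        paycodename TEXT, timeinseconds INTEGER);
INSERT INTO vp_totals VALUES
  ('100419', '2020-06-03', 'Overtime(SWG)', 1800),
  ('100419', '2020-06-03', 'Regular(SWG)',  1500),
  ('100419', '2020-06-04', 'Regular(SWG)',  2000);  -- hypothetical: no overtime this day
""")

# Pair each regular row with the overtime row for the same person and date,
# then add the two durations together.
rows = conn.execute("""
WITH overtime_cte AS (SELECT * FROM vp_totals WHERE paycodename = 'Overtime(SWG)'),
     regular_cte  AS (SELECT * FROM vp_totals WHERE paycodename = 'Regular(SWG)')
SELECT rc.personnum, rc.applydate,
       oc.paycodename || '+' || rc.paycodename AS paycodename,
       SUM(rc.timeinseconds + oc.timeinseconds) AS timeinseconds
FROM regular_cte rc
JOIN overtime_cte oc
  ON oc.personnum = rc.personnum AND oc.applydate = rc.applydate
GROUP BY rc.personnum, rc.applydate, oc.paycodename || '+' || rc.paycodename
""").fetchall()

print(rows)
```

Because this is an inner join, the 2020-06-04 row drops out of the result: days that have only one of the two pay codes are not returned. A LEFT JOIN, or a plain GROUP BY on person and date, would keep them.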

Clean up 'duplicate' data while preserving most recent entry

I want to display each crew member, basic info, and the most recent start date from their contracts. With my basic query, it returns a row for each contract, duplicating the basic info with a distinct start and end date.
I only need one row per person, with the latest start date (or null if they have never yet had a start date).
I have limited understanding of GROUP BY and partition functions. Queries I have reverse-engineered for similar data use PARTITION and create temp tables that they then select from. Ultimately I could reuse that, but it seems more convoluted than what we need.
select
Case when P01.EMPLOYMENTENDDATE < getdate() then 'Y'
else ''
end as "Deactivate",
concat(p01.FIRSTNAME,' ',p01.MIDDLENAME) as "First and Middle",
p01.LASTNAME,
p01.PIN,
(select top 1 TELENO FROM PW001P0T WHERE PIN = P01.PIN and TELETYPE = 6 ORDER BY TELEPRIORITY) as "EmailAddress",
org.NAME AS Vessel,
case
WHEN c02.CODECATEGORY= '20' then 'MARINE'
WHEN c02.CODECATEGORY= '10' then 'MARINE'
ELSE 'HOTEL' end as "Department",
c02.name as RankName,
c02.Alternative RankCode,
convert(varchar, ACT.DATEFROM,101) EmbarkDate,
convert(varchar,(case when ACT.DATEFROM is null then p03.TODATEESTIMATED else ACT.DATEFROM end),101) DebarkDate
FROM PW001P01 p01
JOIN PW001P03 p03
ON p03.PIN = p01.PIN
LEFT JOIN PW001C02 c02
ON c02.CODE = p03.RANK
/*LEFT JOIN PW001C02 CCIRankTbl
ON CCIRankTbl.CODE = p01.RANK*/
LEFT JOIN PWORG org
ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT
ON ACT.numorgid=dbo.ad_scanorgtree(p03.numorgid,3)
where P01.EMPLOYMENTENDDATE > getdate()-10 or P01.EMPLOYMENTENDDATE is null
I only need to show one row per person. The first 5 columns will always be the same. The last columns depend on the contract, and we just need data from the most recent one.
<table><tbody><tr><th>Deactivate</th><th>First and Middle</th><th>Lastname</th><th>PIN</th><th>Email</th><th>Vessel</th><th>Department</th><th>Rank</th><th>RankCode</th><th>Embark</th><th>Debark</th></tr><tr><td> </td><td>Martin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship1</td><td>Marine</td><td>ViceCaptain</td><td>VICE</td><td>9/1/2008</td><td>9/20/2008</td></tr><tr><td> </td><td>Matin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship2</td><td>Marine</td><td>Captain</td><td>CAP</td><td>12/1/2008</td><td>12/20/2008</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship1</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>5/1/2009</td><td>8/1/2009</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship3</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>10/1/2010</td><td>12/20/2010</td></tr></tbody></table>
Change your query to a SELECT DISTINCT on the main query and use a sub-select for DebarkDate column:
(SELECT TOP 1 A.DATEFROM FROM PWORGVESACT A WHERE A.numorgid = ACT.numorgid ORDER BY A.DATEFROM DESC) AS DebarkDate
You can do whatever conversions on the date you need to from the result of that sub-query.

Query for active records between date range or most recent before date range

I need to find active records that fall between a range of date parameters from a table containing applications. First, I look for a record within the date range in a table called 'app_notes' and check if it is linked to an application. If there is no app_note record in the date range, I must look at the most recent app note from before the date range. If this app note indicates a status of active, I select it.
The app_indiv table connects an individual to an application. There can be multiple app_indiv records for each individual and multiple app_notes for each app_indiv. Here is what I have so far:
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
app_note ON app_indiv.app_indiv_id = app_note.app_indiv_id
WHERE (app_note.when_mod BETWEEN @date_from AND @date_to)
/* OR most recent app_note indicates active */
How can I get the most recent app_note record if there is not one in the date range? Since there are multiple app_note records possible, I don't know how to make it only retrieve the most recent.
SELECT *
FROM individual i
INNER JOIN app_indiv ai
ON ai.indiv_id = i.indiv_id
OUTER APPLY
(
SELECT TOP 1 * FROM app_note an
WHERE an.app_indiv_id = ai.app_indiv_id
AND an.when_mod < @date_to
ORDER BY an.when_mod DESC
) d
WHERE d.status = 'active'
Find the last note less than end date, check to see if it's active and if so show the individual record.
(untested) You'll need to use a CASE switch.
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
app_note ON app_indiv.app_indiv_id = app_note.app_indiv_id
WHERE (CASE WHEN app_note.when_mod BETWEEN @date_from AND @date_to
THEN (SELECT appnote.when_mod from individual where appnote.when_mod BETWEEN @date_from AND @date_to)
WHEN app_note.when_mod NOT BETWEEN @date_from and @date_to
THEN (SELECT appnote.when_mod from individual appnote.when_mod LIMIT 1))
Query might not be correct. Switch might need to be in the first SELECT part of the query.
It seems to me that you really only care about the end date of your date range, since you want to be able to look farther back if there's nothing in that date range. I would use a CTE and the ROW_NUMBER() function. The CTE is just a cleaner way to write a sub-query (although in general a CTE can do a lot more). The ROW_NUMBER function numbers the rows based on the ORDER BY clause. The PARTITION BY resets the numbering to one each time you hit a new value in that column.
with AppNoteCTE as
(select
-- <not sure what columns you need here>
app_indiv_id,
ROW_NUMBER() OVER (PARTITION BY APP_INDIV_ID ORDER BY WHEN_MOD DESC) RN
FROM
APP_NOTE -- when_mod lives on app_note, so the notes are what get numbered
WHERE
WHEN_MOD <= @endDate)
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
AppNoteCTE ON app_indiv.app_indiv_id = AppNoteCTE.app_indiv_id
and AppNoteCTE.RN = 1
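
The "latest note on or before the end date" idea can be sketched end to end as follows, using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are invented for illustration, and the `status = 'active'` check is an assumption borrowed from the OUTER APPLY answer, since the question does not show the column layout of app_note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_indiv (app_indiv_id INTEGER, indiv_id INTEGER);
CREATE TABLE app_note (app_indiv_id INTEGER, when_mod TEXT, status TEXT);
INSERT INTO app_indiv VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO app_note VALUES
  (1, '2020-01-10', 'active'),    -- only note, dated before the range
  (2, '2020-01-05', 'active'),
  (2, '2020-03-15', 'closed'),    -- newest note for 2 is not active
  (3, '2020-03-20', 'active'),
  (3, '2020-05-01', 'closed');    -- after the end date, so ignored
""")

date_to = "2020-03-31"

# Number each application's notes from newest to oldest, looking only at
# notes on or before the end date, then keep the newest one if it is active.
ids = [r[0] for r in conn.execute("""
WITH latest AS (
    SELECT app_indiv_id, status,
           ROW_NUMBER() OVER (PARTITION BY app_indiv_id
                              ORDER BY when_mod DESC) AS rn
    FROM app_note
    WHERE when_mod <= ?
)
SELECT DISTINCT ai.indiv_id
FROM app_indiv ai
JOIN latest ON latest.app_indiv_id = ai.app_indiv_id
WHERE latest.rn = 1 AND latest.status = 'active'
ORDER BY ai.indiv_id
""", (date_to,))]

print(ids)  # [100, 300]
```

Individual 200 is left out because the newest note on or before the end date is closed, even though an older note was active.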

SQL Gap Overlap DateRanges

I have a query that identifies gaps and overlaps of date ranges in SQL Server 2008 R2. Each unique data set has 12 records. What I would like to do is adjust or add to the code that identifies the gaps and overlaps so that it updates the records to be sequential.
--gaps and overlaps tbl_volumes
with s as
(
select esiid,Read_Start,Read_End ,row_number() over(partition by esiid order by Read_Start) rn
from tbl_Volumes
where Status=0
group by esiid,Read_Start,Read_End)
select a.esiid, a.Read_Start, a.Read_End, b.Read_Start as nextstartdate,datediff(d,a.Read_End, b.Read_Start) as gap
into #go
from s a
join s b on b.esiid = a.esiid and b.rn = a.rn + 1
where datediff(d, a.Read_End, b.Read_Start) not in (0,1)
order by a.esiid
Here is the bad record set that I would like to see sequential:
e Read_Start Read_End Source
10032789402145965 2011-01-21 2011-02-22 867_03_1563303
10032789402145965 2011-02-22 2011-03-21 867_03_1665865
10032789402145965 2011-03-26 2011-04-20 867_03_1782993
Well, you could just assign a new Read_End to each record based on the next record's Read_Start. The calculation for the new end can be done like this:
select t.*,
(select top 1 Read_Start
from t t2
where t2.e = t.e and t2.Read_Start > t.Read_Start
order by t2.Read_Start
) as New_Read_End
from t
Do you actually want to update the value or just see what it should be?
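
As a rough check of that calculation against the three sample rows from the question, here is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server 2008 R2. The table name `t` and column `e` follow the answer's query, and `LIMIT 1` replaces `TOP 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (e TEXT, read_start TEXT, read_end TEXT);
INSERT INTO t VALUES
  ('10032789402145965', '2011-01-21', '2011-02-22'),
  ('10032789402145965', '2011-02-22', '2011-03-21'),
  ('10032789402145965', '2011-03-26', '2011-04-20');
""")

# For each row, look up the earliest later Read_Start for the same id
# and propose it as the new Read_End.
rows = conn.execute("""
SELECT t.read_start, t.read_end,
       (SELECT t2.read_start
        FROM t AS t2
        WHERE t2.e = t.e AND t2.read_start > t.read_start
        ORDER BY t2.read_start
        LIMIT 1) AS new_read_end
FROM t
ORDER BY t.read_start
""").fetchall()

for row in rows:
    print(row)
```

The second row's end moves from 2011-03-21 to 2011-03-26, which closes the gap. The last row has no successor, so its new end is NULL (None in Python); wrapping the subquery in COALESCE with the existing Read_End would keep the original value there.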