Delete rows with continuous dates within date period in SQL Server [duplicate] - sql

I am trying to consolidate the records of employees who have been continuously enrolled in a specific department.
Note: if the gap between emp_eff_to_date and the next row's emp_eff_from_date is less than 45 days, the enrollment is considered continuous.
INPUT:
EMP_ID  DEPT_ID  EMP_EFF_FROM_DATE  EMP_EFF_TO_DATE
---------------------------------------------------
10      10001    8/1/2008           10/31/2009
10      10001    11/1/2009          2/25/2010
10      10001    2/26/2010          5/1/2011
10      10001    8/1/2011           10/30/2011
10      10001    12/1/2011          10/31/2012
10      10003    7/1/2007           10/31/2007
10      10004    9/27/2004          6/8/2006
10      10004    6/30/2006          6/29/2007
10      10007    6/25/2006          6/20/2007
10      10007    8/25/2007          5/25/2008
Output desired:
EMP_ID  DEPT_ID  EMP_EFF_FROM_DATE  EMP_EFF_TO_DATE
---------------------------------------------------
10      10001    2008-08-01         2011-05-01
10      10001    2011-08-01         2012-10-31
10      10003    2007-07-01         2007-10-31
10      10004    2004-09-27         2007-06-29
10      10007    2006-06-25         2007-06-20
10      10007    2007-08-25         2007-06-29

I had to do something very similar recently. My first thought was a recursive common table expression (CTE), which works but may not be the best solution, depending on how much data is in your table.
It is not clear whether you want to actually delete the rows from the database or just view the consolidated results based on the records as they currently are.
SOLUTION 1 (SQL Fiddle)
This uses a recursive CTE to just select the results. It finds the next row whose from date is within 45 days of the current row's to date, and keeps looping until there are no more matches. It then keeps, for each from date, the row from the deepest recursion level (the MaxRecursion field), and excludes all other rows that fall within the date range of another row.
WITH CTE AS
( SELECT *, [Recursion] = 0
FROM T
UNION ALL
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T2.EMP_EFF_TO_DATE,
T.[Recursion] + 1
FROM CTE T
INNER JOIN T T2
ON T2.EMP_ID = T.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
), CTE2 AS
( SELECT *,
[MaxRecursion] = MAX(Recursion) OVER(PARTITION BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE)
FROM CTE
)
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T.EMP_EFF_TO_DATE
FROM CTE2 T
WHERE Recursion = MaxRecursion
AND NOT EXISTS
( SELECT 1
FROM CTE2 T2
WHERE T.EMP_ID = T2.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
AND T.EMP_EFF_FROM_DATE > T2.EMP_EFF_FROM_DATE
AND T.EMP_EFF_TO_DATE <= T2.EMP_EFF_TO_DATE
)
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE;
SOLUTION 2 (SQL Fiddle)
This will actually update existing rows and delete redundant rows, meaning you can just select from the table to get the desired results. Of course, if you don't want to actually delete from the database, you could insert the data into a temp table and apply the same principle (Example here). In my case this solution ran a lot faster than the recursive CTE, because at each stage of the loop the query deals with less data, rather than more as with the recursive CTE.
-- Loop while some row has a later row that starts within 45 days
-- of its to date and would extend it.
WHILE EXISTS
( SELECT 1
FROM T
INNER JOIN T T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
)
BEGIN
-- Extend each row to the to date of a row that continues it.
UPDATE T
SET EMP_EFF_TO_DATE = T2.EMP_EFF_TO_DATE
FROM T
INNER JOIN
( SELECT *
FROM T
) T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE);
-- Remove rows that are now fully covered by an earlier-starting row.
DELETE T
FROM T
WHERE EXISTS
( SELECT 1
FROM T T2
WHERE T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE >= T.EMP_EFF_TO_DATE
)
END;
SELECT *
FROM T
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE;
Both solutions differ from your desired output in the last row, which appears to contain an error.
I think this row:
10 10007 2007-08-25 2007-06-29
should be:
10 10007 2007-08-25 2008-05-25
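For what it's worth, the 45-day rule can also be checked outside SQL. The following is only a rough Python sketch, not part of either solution above: the names rows, MAX_GAP and consolidate are made up for illustration, and it assumes that a gap of up to 45 days counts as continuous (mirroring the DATEADD comparison). On the sample data it yields six rows, and the last 10007 row ends on 2008-05-25.

```python
from datetime import date, timedelta

# Assumption: a gap of up to 45 days between one row's to date and the
# next row's from date still counts as continuous enrollment.
MAX_GAP = timedelta(days=45)

# Sample rows from the question: (EMP_ID, DEPT_ID, from_date, to_date).
rows = [
    (10, 10001, date(2008, 8, 1), date(2009, 10, 31)),
    (10, 10001, date(2009, 11, 1), date(2010, 2, 25)),
    (10, 10001, date(2010, 2, 26), date(2011, 5, 1)),
    (10, 10001, date(2011, 8, 1), date(2011, 10, 30)),
    (10, 10001, date(2011, 12, 1), date(2012, 10, 31)),
    (10, 10003, date(2007, 7, 1), date(2007, 10, 31)),
    (10, 10004, date(2004, 9, 27), date(2006, 6, 8)),
    (10, 10004, date(2006, 6, 30), date(2007, 6, 29)),
    (10, 10007, date(2006, 6, 25), date(2007, 6, 20)),
    (10, 10007, date(2007, 8, 25), date(2008, 5, 25)),
]


def consolidate(rows, max_gap=MAX_GAP):
    """Merge rows of the same employee/department separated by small gaps."""
    merged = []
    # Sorting the tuples orders them by employee, department, then from date.
    for emp, dept, start, end in sorted(rows):
        if merged:
            p_emp, p_dept, p_start, p_end = merged[-1]
            if (p_emp, p_dept) == (emp, dept) and start <= p_end + max_gap:
                # Continuous with the previous period: extend it.
                merged[-1] = (p_emp, p_dept, p_start, max(p_end, end))
                continue
        merged.append((emp, dept, start, end))
    return merged


consolidated = consolidate(rows)
for emp, dept, start, end in consolidated:
    print(emp, dept, start.isoformat(), end.isoformat())
```

Because the rows are sorted first, a single pass is enough here; the SQL versions need recursion or a loop only because they work set-wise.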

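Assuming the "next row" means the next one when sorted by the emp_eff_from_date field, here is a way to find the rows that are continuous with their successor:
-- Number the rows of each employee/department in date order.
WITH DATA
AS (SELECT *,
Row_number()
OVER (
PARTITION BY EMP_ID, DEPT_ID
ORDER BY EMP_EFF_FROM_DATE) rn
FROM TEST)
-- Keep rows whose successor in the same group starts within 45 days.
SELECT t1.*
FROM DATA t1
INNER JOIN DATA t2
ON t1.EMP_ID = t2.EMP_ID
AND t1.DEPT_ID = t2.DEPT_ID
AND t1.RN = t2.RN - 1
WHERE Datediff(DAY, t1.EMP_EFF_TO_DATE, t2.EMP_EFF_FROM_DATE) <= 45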
The full solution is here
Let me know if it's not exactly what you wanted.

Related

SQL Get closest value to a number

I need to find, for each number in the Divide column, the closest value in the Quantity column, and put the value found in the Value column for both quantities.
Example:
The Divide value 5166 would be closest to the Quantity value 5000. To keep from using those two values more than once, I need to place 5000 in the Value column for both rows, as in the example below. Also, is it possible to do this without a loop?
Quantity  Divide  Rank  Value
15500     5166    5     5000
1250      416     5     0
5000      1666    5     5000
12500     4166    4     0
164250    54750   3     0
5250      1750    3     0
6250      2083    3     0
12250     4083    3     0
1750      583     2     0
17000     5666    2     0
2500      833     2     0
11500     3833    2     0
1250      416     1     0
There are a couple of answers here, but they both use CTEs or complex subqueries. A simpler and possibly faster way is to use a couple of self joins and a GROUP BY:
https://www.db-fiddle.com/f/rM268EYMWuK7yQT3gwSbGE/0
select
min(min.quantity) as minQuantityOverDivide
, t1.divide
, max(max.quantity) as maxQuantityUnderDivide
, case
when
(abs(t1.divide - coalesce(min(min.quantity),0))
<
abs(t1.divide - coalesce(max(max.quantity),0)))
-- the quantity at or above divide is nearer, so return it
then min(min.quantity)
-- otherwise return the quantity below divide (or the one above if none is below)
else coalesce(max(max.quantity), min(min.quantity)) end as closestQuantity
from t1
left join (select quantity from t1) min on min.quantity >= t1.divide
left join (select quantity from t1) max on max.quantity < t1.divide
group by
t1.divide
If I understood the requirements, 5166 is not closest to 5000; it's closest to 5250 (a delta of 166 for 5000 versus 84 for 5250).
The corresponding query, without loops, would be (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=be434e67ba73addba119894a98657f17).
(I added a Value_Rank, as it's not clear whether you want Rank to be kept or recomputed.)
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
Quantity, Divide, Rank,
--
case
when abs(Quantity_let_delta) < abs(Quantity_get_delta) then Divide + Quantity_let_delta
else Divide + Quantity_get_delta
end as Value
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume GreaterEqualThan
max(isnull(so_let.Quantity, so_get.Quantity)) - so.Divide as Quantity_let_delta,
-- There is no GreaterEqualThan, assume LessEqualThan
min(isnull(so_get.Quantity, so_let.Quantity)) - so.Divide as Quantity_get_delta
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
--
left outer join SO so_get
on so_get.Quantity >= so.Divide
group by so.Quantity, so.Divide, so.Rank
) so
) result
Or, if by closest you mean the closest quantity at or below Divide (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b41fb1a3fc11039c7f82926f8816e270).
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume 0
max(isnull(so_let.Quantity, 0)) as Value
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
group by so.Quantity, so.Divide, so.Rank
) result
You don't need a loop. Basically, you need to find the lowest difference between each Divide and all the quantities (first CTE). Then use this distance to find the corresponding record (second CTE), and join with your initial table to get the converted values (final select).
;with cte as (
select t.Divide, min(abs(t2.Quantity-t.Divide)) as ClosestQuantity
from #t1 as t
cross apply #t1 as t2
group by t.Divide
)
,cte2 as (
select distinct
t.Divide, t2.Quantity
from #t1 as t
cross apply #t1 as t2
where abs(t2.Quantity-t.Divide) = (select ClosestQuantity from cte as c where c.Divide = t.Divide)
)
select t.Quantity, cte2.Quantity as Divide, t.Rank, t.Value
from #t1 as t
left outer join cte2 on t.Divide = cte2.Divide
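To make the "lowest difference" idea concrete, here is a small Python sketch run over the sample data from the question. It is only an illustration: the names quantities, divides and closest_quantity are invented, ties go to whichever quantity comes first in the list, and it does not attempt the "use each value only once" part of the requirement.

```python
# Quantity and Divide columns copied from the question's sample table.
quantities = [15500, 1250, 5000, 12500, 164250, 5250, 6250,
              12250, 1750, 17000, 2500, 11500, 1250]
divides = [5166, 416, 1666, 4166, 54750, 1750, 2083,
           4083, 583, 5666, 833, 3833, 416]


def closest_quantity(divide, candidates):
    """Return the candidate with the smallest absolute distance to divide."""
    # min() with a key keeps the first candidate when distances are tied.
    return min(candidates, key=lambda q: abs(q - divide))


closest = {d: closest_quantity(d, quantities) for d in divides}
for d in divides:
    print(d, closest[d])
```

With this reading of "closest", 5166 maps to 5250 rather than 5000, which is the discrepancy pointed out in one of the answers above.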

What's the most efficient way to match values between 2 tables based on most recent prior date?

I've got two tables in MS SQL Server:
dailyt - which contains daily data:
date val
---------------------
2014-05-22 10
2014-05-21 9.5
2014-05-20 9
2014-05-19 8
2014-05-18 7.5
etc...
And periodt - which contains data coming in at irregular periods:
date val
---------------------
2014-05-21 2
2014-05-18 1
Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:
addt
date val
---------------------
2014-05-22 12 <- add 2 from 2014-05-21
2014-05-21 11.5 <- add 2 from 2014-05-21
2014-05-20 10 <- add 1 from 2014-05-18
2014-05-19 9 <- add 1 from 2014-05-18
2014-05-18 8.5 <- add 1 from 2014-05-18
I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date, add ROW_NUMBER() OVER (PARTITION BY dailyt.date ORDER BY periodt.date DESC), and then filter with a WHERE condition on the row number = 1.
Is there another way to do this that would be more efficient? Or is this pretty much optimal?
I think using APPLY would be the most efficient way:
SELECT d.Val,
p.Val,
NewVal = d.Val + ISNULL(p.Val, 0)
FROM Dailyt AS d
OUTER APPLY
( SELECT TOP 1 Val
FROM Periodt p
WHERE p.Date <= d.Date
ORDER BY p.Date DESC
) AS p;
Example on SQL Fiddle
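The "latest periodt row on or before each date" lookup that the APPLY performs can be sketched in Python with a binary search over the sorted period dates. This is just an illustration of the matching rule, using the sample data from the question; the names periodt, dailyt, period_dates and adjusted are made up, and a missing earlier period is assumed to add 0, like the ISNULL in the query.

```python
from bisect import bisect_right
from datetime import date

# Sample data from the question, as (date, val) pairs.
periodt = sorted([(date(2014, 5, 21), 2), (date(2014, 5, 18), 1)])
dailyt = [
    (date(2014, 5, 22), 10.0),
    (date(2014, 5, 21), 9.5),
    (date(2014, 5, 20), 9.0),
    (date(2014, 5, 19), 8.0),
    (date(2014, 5, 18), 7.5),
]

period_dates = [d for d, _ in periodt]


def adjusted(day, val):
    """Add the value of the latest period row dated on or before day."""
    # bisect_right counts the period rows whose date is <= day.
    i = bisect_right(period_dates, day)
    extra = periodt[i - 1][1] if i else 0  # assumption: no match adds 0
    return val + extra


addt = [(d, adjusted(d, v)) for d, v in dailyt]
for d, v in addt:
    print(d.isoformat(), v)
```

The binary search plays the same role an index on periodt.date would play for the TOP 1 ... ORDER BY Date DESC lookup.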
If there are relatively few periodt rows, then there is an option that may prove quite efficient.
Convert periodt into a table of From/To ranges using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.
;WITH PIds AS (
SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
FROM @periodt
),
PRange AS (
SELECT f.PDate AS FromDate, t.PDate as ToDate, f.PVal
FROM PIds f
LEFT OUTER JOIN PIds t ON
t.RN = f.RN + 1
)
SELECT d.*, p.PVal
FROM @dailyt d
LEFT OUTER JOIN PRange p ON
d.DDate >= p.FromDate
AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC
If you want to try the query, the following produces the sample data using table variables (they must be declared in the same batch, before the query). Note I added an extra row to dailyt to demonstrate the case where there is no periodt entry with an earlier date.
DECLARE @dailyt table (
DDate date NOT NULL,
DVal float NOT NULL
)
INSERT INTO @dailyt(DDate, DVal)
SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5
DECLARE @periodt table (
PDate date NOT NULL,
PVal int NOT NULL
)
INSERT INTO @periodt
SELECT '20140521', 2
UNION ALL SELECT '20140518', 1

SQL query - Difference between the values from two rows and two columns

I am struggling to get this working, using T-SQL Query (SQL SERVER 2008) for the following problem:
Ky  ProductID  Start #  End #  Diff
1   100        10       12     0
2   100        14       20     2 (14 - 12)
3   100        21       25     1 (21 - 20)
4   100        30       33     5 (30 - 25)
1   110        6        16     0
2   110        20       21     4 (20 - 16)
3   110        22       38     1 (22 - 21)
As you can see, I need the difference between values in two different rows and two different columns: each row's Start # minus the previous row's End # for the same product.
I tried
with t1
( select ROW_NUMBER() OVER (PARTITION by ProductID ORDER BY ProductID, Start# ) as KY
, productid
, start#
, end#
from mytable)
and
select DATEDIFF(ss, T2.complete_dm, T1.start_dm)
, <Keeping it simple not including all the columns which I selected..>
FROM T1 as T2
RIGHT OUTER JOIN T1 on T2.Ky + 1 = T1.KY
and T1.ProductID = T2.ProductID
The problem with the above query is when the productID changes from 100 to 110 still it calculates the difference.
Any help in modifying the query or any simpler solution much appreciated.
Thanks
You can try below code for the required result :
select ky,Start,[End],(select [end] from table1 tt where (tt.ky)=(t.ky-1) and tt.ProductID=t.ProductID) [End_Prev_Row],
case ky when 1 then 0
else (t.start -(select [end] from table1 tt where (tt.ky)=(t.ky-1) and tt.ProductID=t.ProductID))
end as Diff
from table1 t
SQL FIDDLE
Try something like that. It should give you the difference you want. I'm getting the first row for each product in the first part and then recursively build up by using the next Ky.
with t1
as
(
select ProductID, Ky, 0 as Difference, [End#]
from mytable where ky = 1
union all
select m.ProductID, m.Ky, m.[Start#] - t1.[End#] as Difference, m.[End#]
from mytable m
inner join t1 on m.ProductID = t1.ProductID and m.Ky = t1.Ky + 1
)
select Ky, ProductID, Difference from t1
order by ProductID, Ky
As Anup mentioned, your query seems to work fine. I only removed DATEDIFF from the difference calculation, because judging by your example the columns are not of a date type; I guess that was the issue. The modified query is below (the row number is ordered by st so that "previous row" is well defined):
with t1
as
( select ROW_NUMBER() OVER (PARTITION by ProductID ORDER BY st ) as KY
, productid
, st
, ed
from YourTable)
select T1.ProductID, t1.ST,t1.ED, ISNULL(T1.st - T2.ed,0) as Diff
FROM T1 as T2
RIGHT OUTER JOIN T1 on T2.KY+1 = T1.KY
and T1.ProductID = T2.ProductID
SELECT ROW_NUMBER() OVER (PARTITION by rc.ContractID ORDER BY rc.ID) AS ROWID,rc.ID,rc2.ID,rc.ContractID,rc2.ContractID,rc.ToDate,rc2.FromDate
FROM tbl_RenewContracts rc
LEFT OUTER JOIN tbl_RenewContracts rc2
ON rc2.ID = (SELECT MAX(ID) FROM tbl_RenewContracts rcs WHERE rcs.ID < rc.ID AND rcs.ContractID = rc.ContractID)
ORDER BY rc.ContractID
Replace your table name and columns and add calculated column to get the DATEDIFF.
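As a quick sanity check of the expected Diff column, the "Start # minus the previous End # within the same product" rule can be sketched in Python. The names rows and diffs are hypothetical, and the data is copied from the question.

```python
# (ProductID, Start #, End #) rows from the question.
rows = [
    (100, 10, 12), (100, 14, 20), (100, 21, 25), (100, 30, 33),
    (110, 6, 16), (110, 20, 21), (110, 22, 38),
]


def diffs(rows):
    """Return (product, start, end, diff) with diff = start - previous end."""
    out = []
    prev_end = {}  # last End # seen so far for each product
    for product, start, end in sorted(rows):
        # The first row of each product has no predecessor, so its diff is 0.
        diff = start - prev_end[product] if product in prev_end else 0
        out.append((product, start, end, diff))
        prev_end[product] = end
    return out


result = diffs(rows)
for row in result:
    print(*row)
```

Keeping the previous End # per product is what stops the calculation from carrying over when ProductID changes from 100 to 110, which is the same job the ProductID condition does in the SQL joins.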

Grouping in SQL Statement

I have the following SQL statement:
SELECT TOP 30
a.ClassAdID, -- 0
a.AdTitle, -- 1
a.ClassAdCatID, -- 2
b.ClassAdCat, -- 3
a.Img1, -- 4
e.Domain, -- 5
a.AdText, -- 6
a.RegionID, -- 7
a.IsEvent, -- 8
a.IsCoupon, -- 9
b.ParentID, -- 10
a.MemberID, -- 11
a.AdURL, -- 12
a.Location, -- 13
a.GroupID -- 14
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCatID = a.ClassAdCatID
INNER JOIN Member d ON d.MemberID = a.MemberID
INNER JOIN Region e ON e.RegionID = a.RegionID
WHERE DATEDIFF(d, GETDATE(), a.ExpirationDate) >= 0
AND PostType <> 'CPN'
ORDER BY a.CreateDate DESC
I want to only show one from each GROUPID... How can I adjust the statement to achieve this as I am lost with DISTINCT, GROUP BY etc..
Any help would be appreciated.
Many thanks,
Paul
You can use the ROW_NUMBER function to partition the data set by GroupID: for every new GroupID value the counter restarts from 1, and the first row (ROW_NUMBER = 1) is the newest record (ORDER BY a.CreateDate DESC). Then we keep only the rows having ROW_NUMBER = 1.
SELECT TOP 30 *
FROM
(
SELECT
a.ClassAdID, -- 0
a.AdTitle, -- 1
a.ClassAdCatID, -- 2
b.ClassAdCat, -- 3
a.Img1, -- 4
e.Domain, -- 5
a.AdText, -- 6
a.RegionID, -- 7
a.IsEvent, -- 8
a.IsCoupon, -- 9
b.ParentID, -- 10
a.MemberID, -- 11
a.AdURL, -- 12
a.Location, -- 13
a.GroupID, -- 14
a.CreateDate, -- needed by the outer ORDER BY
ROW_NUMBER() OVER(PARTITION BY a.GroupId ORDER BY a.CreateDate DESC) AS PseudoId
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCatID = a.ClassAdCatID
INNER JOIN Member d ON d.MemberID = a.MemberID
INNER JOIN Region e ON e.RegionID = a.RegionID
WHERE DATEDIFF(d, GETDATE(), a.ExpirationDate) >= 0
AND PostType <> 'CPN'
) q
WHERE q.PseudoId = 1
ORDER BY q.CreateDate DESC; -- without an ORDER BY, TOP 30 returns an arbitrary 30 rows
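The "number the rows per group, keep row 1" idea amounts to keeping the newest record per GroupID. A minimal Python sketch of the same logic follows; the sample ads are invented purely for illustration and only carry the three columns that matter here.

```python
from datetime import date

# Hypothetical ads; only the columns relevant to the grouping are shown.
ads = [
    {"ClassAdID": 1, "GroupID": 7, "CreateDate": date(2012, 1, 5)},
    {"ClassAdID": 2, "GroupID": 7, "CreateDate": date(2012, 3, 9)},
    {"ClassAdID": 3, "GroupID": 8, "CreateDate": date(2012, 2, 1)},
    {"ClassAdID": 4, "GroupID": 9, "CreateDate": date(2012, 4, 2)},
    {"ClassAdID": 5, "GroupID": 9, "CreateDate": date(2012, 1, 20)},
]

# Keep the newest ad per GroupID (the row that would get ROW_NUMBER = 1).
newest_per_group = {}
for ad in ads:
    current = newest_per_group.get(ad["GroupID"])
    if current is None or ad["CreateDate"] > current["CreateDate"]:
        newest_per_group[ad["GroupID"]] = ad

# Newest first, limited to 30 rows, like TOP 30 ... ORDER BY CreateDate DESC.
top_30 = sorted(newest_per_group.values(),
                key=lambda ad: ad["CreateDate"], reverse=True)[:30]
for ad in top_30:
    print(ad["ClassAdID"], ad["GroupID"], ad["CreateDate"].isoformat())
```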
GROUP BY goes with an aggregate function, meaning you want to add up the values in the group, or find the biggest or smallest in the group, etc.
DISTINCT will remove duplicate rows.
In your query, you may be getting a bunch of not-so-similar rows that all happen to have the same GroupID. If so, you need to decide which one of those rows you really want to see.
Maybe you want the newest one, or the one with the longest name, or something like that.
For grouping, you would pick a column like createdon and put something like MAX(createdon) in the select list, then group on every other column in the select list. Rows that match each other on everything except createdon are then returned only once, with the largest createdon value. Hope that makes sense.
Edit:
A very simple example for group ID and create date (you can keep adding more columns as needed, one in the GROUP BY list for every one in the select list):
SELECT groupid, max( createdate )
FROM ClassAd
GROUP BY groupId
If I understand correctly, you want to get one row from each group (like GroupID).
I used SQL Server 2005 (Northwind):
SELECT TOP 30 Customers.CompanyName, Orders.ShipCity, Orders.Freight
FROM Customers INNER JOIN
Orders ON Customers.CustomerID = Orders.CustomerID
GROUP BY Customers.CompanyName, Orders.ShipCity, Orders.Freight

SQL select row-wise increase in amount of running total column

Suppose I have a table with columns (DayId, RunningTotal):
DayId RunningTotal
---------------------
1 25
3 50
6 100
9 200
10 250
How can I select the DayId and the amount the RunningTotal has increased from the previous day? i.e. how can I select:
DayId DayTotal
---------------------
1 25
3 25
6 50
9 100
10 50
The only current method I know is with a while loop I am trying to factor out. Also, the DayId has no regular rules, just that it is some increasing integer value, but it increases by an irregular amount as shown in the example table.
EDIT: using MS SQL Server 2005
with cte as (
select dayid, runningtotal, row_number() over (order by dayid asc) as row_index
from #the_table
)
select cur.dayid, cur.runningtotal - coalesce(prev.runningtotal, 0) as daytotal
from cte cur
left join cte prev on prev.row_index = cur.row_index - 1
(I really wish they'd implemented support for the lead and lag functions in SQL Server :|)
There is probably a more succinct way than this, but try:
select t3.DayId,
case when t4.DayId is null then t3.RunningTotal else t3.RunningTotal - t4.RunningTotal end as DayTotal
from (
select t1.DayId, max(t2.DayId) as PreviousDayId
from MyTable t1
left outer join MyTable t2 on t2.DayId < t1.DayId
group by t1.DayId
) a
inner join MyTable t3 on a.DayId = t3.DayId
left outer join MyTable t4 on a.PreviousDayId = t4.DayId
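Both queries pair each row with the row that has the next-lower DayId and subtract. For comparison, here is the same computation as a short Python sketch over the sample data; the names running and day_totals are made up, and the first day is assumed to have a previous total of 0, as in the coalesce above.

```python
# (DayId, RunningTotal) rows from the question.
running = [(1, 25), (3, 50), (6, 100), (9, 200), (10, 250)]


def day_totals(rows):
    """Return (DayId, increase of RunningTotal over the previous DayId)."""
    rows = sorted(rows)
    # Previous totals, shifted by one row; the first row has no predecessor.
    previous_totals = [0] + [total for _, total in rows[:-1]]
    return [(day, total - prev)
            for (day, total), prev in zip(rows, previous_totals)]


totals = day_totals(running)
for day, total in totals:
    print(day, total)
```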