Deleting rows within floating data ranges using SQL - sql

I have some date data as follows:-
Person | Date
1 | 1/1/2000
1 | 6/1/2000
1 | 11/1/2000
1 | 21/1/2000
1 | 28/1/2000
I need to delete rows within 14 days of a previous one. However, if a row is deleted, it should not later become a 'base' date against which later rows are checked. It's perhaps easier to show the results needed:-
Person | Date
1 | 1/1/2000
1 | 21/1/2000
My feeling is that recursive SQL will be needed but I'm not sure how to set it up. I'll be running this on Teradata.
Thanks.
--- Edit ---
Well, this is embarrassing. It turns out this question has been asked before - and it was asked by me! See this old question for an excellent answer from @dnoeth:-
Drop rows identified within moving time window

Use a recursive CTE. Use ROW_NUMBER() to order and number the dates, and DATEDIFF() to get the number of days elapsed since the previous date.
SQL Server 2012 and above might simplify this with SUM() OVER (PARTITION BY ...) and a RANGE frame, but I didn't find it useful in this case.
DECLARE @Tab TABLE ([MyDate] SMALLDATETIME)
INSERT INTO @Tab ([MyDate])
VALUES
('2000-01-06'),
('2000-01-01'),
('2000-01-11'),
('2000-01-21'),
('2000-01-28')
;
WITH DOrder (MyDate, SortID) AS (
SELECT MyDate,
ROW_NUMBER() OVER (ORDER BY MyDate) SortID
FROM @Tab t)
,Summarize(MyDate, SortID, sSum, rSum) AS (
-- Anchor row: the earliest date is always a base date
SELECT MyDate, SortID, 0, 0
FROM DOrder WHERE SortID = 1
UNION ALL
-- sSum: days since the previous row; rSum: days since the current base date (0 marks a new base date)
SELECT t.MyDate, t.SortID, DATEDIFF(D, ISNULL(s.MyDate,t.MyDate), t.MyDate),
CASE WHEN DATEDIFF(D, ISNULL(s.MyDate,t.MyDate), t.MyDate) + s.rSum > 14 THEN 0
ELSE DATEDIFF(D, ISNULL(s.MyDate,t.MyDate), t.MyDate) + s.rSum
END
FROM DOrder t INNER JOIN Summarize s
ON (t.SortID = s.SortID+1))
SELECT MyDate
FROM Summarize
WHERE rSum=0
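The rule from the question (a dropped row never becomes a base date) can also be checked outside the database. The following is a minimal Python sketch, not part of either original post; the function name keep_base_dates is hypothetical, and it assumes "within 14 days" means a gap of 14 days or less from the last kept date, matching the > 14 test in the query above.

```python
from datetime import date

def keep_base_dates(dates, window_days=14):
    """Keep a date only if it is more than window_days after the last KEPT date."""
    kept = []
    for d in sorted(dates):
        # Compare against the last kept date, never against a dropped one.
        if not kept or (d - kept[-1]).days > window_days:
            kept.append(d)
    return kept

# Sample data from the question (day/month/year in the original post).
rows = [date(2000, 1, 6), date(2000, 1, 1), date(2000, 1, 11),
        date(2000, 1, 21), date(2000, 1, 28)]
print(keep_base_dates(rows))
```

For the sample data this keeps 1 January and 21 January 2000, which is the result the question asks for. It can serve as a reference when validating a recursive SQL version on a small extract.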

Related

Selecting the difference between dates in a stored procedure using a subquery

I can't get my head around whether this is even possible, but I feel like I might have done it before and lost that bit of code. I am trying to craft a select statement that contains an inner join on a subquery to show the number of days between two dates from the same table.
A simple example of the data structure would look like:
Name ID Date Day Hours
Bill 1 3/3/20 Thursday 8
Fred 2 4/3/20 Monday 6
Bill 1 8/3/20 Tuesday 2
Based on this data, I want to select each row plus an extra column which is the number of days between the date from each row for each ID. Something like:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < Date), Date) And ID = ID)
or for simplicity:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < 8/3/20), 8/3/20) And ID = 1)
The resulting dataset would look like:
Name ID Date Day Hours DaysBtwn
Bill 1 3/3/20 Thursday 8 4 (Assuming there was an earlier row in the table)
Fred 2 4/3/20 Monday 6 5 (Assuming there was an earlier row in the table)
Bill 1 8/3/20 Tuesday 2 5 (Based on the previous row date being 3/3/20 for Bill)
Does this make sense, or am I trying to do this the wrong way? I want to do this for about 600,000 rows in the table, so efficiency is key; if there is a better way to do this, I'm open to suggestions.
You can use lag():
select t.*, datediff(day, lag(date) over(partition by id order by date), date) diff
from mytable t
I think you just want lag():
select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t;
Note: If you want to filter the data so that rows are used for the lag() but do not appear in the result set, then use a subquery:
select t.*
from (select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t
) t
where date < '2020-08-03';
Also note the use of the date constant as a string in YYYY-MM-DD format.
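To make the lag() behaviour concrete, here is a small Python sketch of the same idea: sort the rows per ID, then subtract the previous date in that group. It is only an illustration; the function name days_since_previous is hypothetical, and the sample dates assume the question uses day/month/year (so 8/3/20 is 8 March 2020).

```python
from datetime import date
from itertools import groupby

def days_since_previous(rows):
    """rows: iterable of (id, date). Returns a list of (id, date, diff_in_days or None)."""
    out = []
    # Sorting by (id, date) plays the role of PARTITION BY id ORDER BY date.
    for _, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev = None
        for rid, d in group:
            # The first row of each id has no previous row, like lag() returning NULL.
            out.append((rid, d, None if prev is None else (d - prev).days))
            prev = d
    return out

data = [(1, date(2020, 3, 3)), (2, date(2020, 3, 4)), (1, date(2020, 3, 8))]
for row in days_since_previous(data):
    print(row)
```

As with lag(), the first row for each ID has no predecessor, so its difference is empty rather than a number.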

Finding most recent date based on consecutive dates

I have a table that lists absences (holidays) of all employees, and what we would like to find out is who is away today, and the date that they will return.
Unfortunately, absences aren't given IDs, so you can't just retrieve the max date from an absence ID if one of those dates is today.
However, absences are given an incrementing ID per day as they are input, so I need a query that will find the EmployeeID if there is an entry with today's date, then follow the incrementing AbsenceID column to find the max date of that absence.
Table Example (assuming today's date is 11/11/2014, UK format):
AbsenceID EmployeeID AbsenceDate
100 10 11/11/2014
101 10 12/11/2014
102 10 13/11/2014
103 10 14/11/2014
104 10 15/11/2014
107 21 11/11/2014
108 21 12/11/2014
120 05 11/11/2014
130 15 20/11/2014
140 10 01/03/2015
141 10 02/03/2015
142 10 03/03/2015
143 10 04/03/2015
So, from the above, we'd want the return dates to be:
EmployeeID ReturnDate
10 15/11/2014
21 12/11/2014
05 11/11/2014
Edit: note that the 140-143 range shouldn't be included in the results, as those dates are in the future and none of the dates in that absence is today.
Presumably I need an iterative sub-function running on each entry with today's date where the employeeID matches.
So, based on what I believe you're asking, you want to return a list of the people who are off today and when they are expected back, based on the holidays recorded in the system; this should only work on consecutive days.
SQL Fiddle Demo
Schema Setup:
CREATE TABLE EmployeeAbsence
([AbsenceID] int, [EmployeeID] int, [AbsenceDate] DATETIME)
;
INSERT INTO EmployeeAbsence
([AbsenceID], [EmployeeID], [AbsenceDate])
VALUES
(100, 10, '2014-11-11'),
(101, 10, '2014-11-12'),
(102, 10, '2014-11-13'),
(103, 10, '2014-11-14'),
(104, 10, '2014-11-15'),
(107, 21, '2014-11-11'),
(108, 21, '2014-11-12'),
(120, 05, '2014-11-11'),
(130, 15, '2014-11-20')
;
Recursive CTE to generate the output:
;WITH cte AS (
SELECT EmployeeID, AbsenceDate
FROM dbo.EmployeeAbsence
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
UNION ALL
SELECT e.EmployeeID, e.AbsenceDate
FROM cte
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
)
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Results:
| EMPLOYEEID | Return Date |
|------------|---------------------------------|
| 5 | November, 11 2014 00:00:00+0000 |
| 10 | November, 15 2014 00:00:00+0000 |
| 21 | November, 12 2014 00:00:00+0000 |
Explanation:
The first SELECT in the CTE gets employees that are off today with this filter:
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
This result set is then UNIONED back to the EmployeeAbsence table with a join that matches EmployeeID as well as the AbsenceDate + 1 day to find the consecutive days recursively using:
-- add a day to the cte.AbsenceDate from the first SELECT
e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
The final SELECT simply groups the cte results by employee with the MAX AbsenceDate that has been calculated per employee.
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Excluding Weekends:
I've done a quick test based on your comment and the below modification to the INNER JOIN within the CTE should exclude weekends when adding the extra days if it detects that adding a day will result in a Saturday:
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = CASE WHEN datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7
THEN DATEADD(d,3,cte.AbsenceDate)
ELSE DATEADD(d,1,cte.AbsenceDate) END
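So when you add a day, datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7 checks whether the result is a Saturday (7); if it is, you add 3 days instead of 1 to get Monday: DATEADD(d,3,cte.AbsenceDate). Note that Saturday is 7 only under the default SET DATEFIRST 7 (U.S. English) setting.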
You'd need to do a few things to get this data into a usable format. You need to be able to work out where a group begins and ends. This is difficult with this example because there is no straight forward grouping column.
So that we can calculate when a group starts and ends, you need to create a CTE containing all the columns and also use LAG() to get the AbsenceID and EmployeeID from the previous row for each row. In this CTE you should also use ROW_NUMBER() at the same time so that we have a way to re-order the rows into the same order again.
Something like:
WITH
[AbsenceStage] AS (
SELECT [AbsenceID], [EmployeeID], [AbsenceDate]
,[RN] = ROW_NUMBER() OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[AbsenceID_Prev] = LAG([AbsenceID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[EmployeeID_Prev] = LAG([EmployeeID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
FROM [HR_Absence]
)
Now that we have this we can compare each row to the previous to see if the current row is in a different "group" to the previous row.
The condition would be something like:
[EmployeeID_Prev] IS NULL -- We have a new group if the previous row is null
OR [EmployeeID_Prev] <> [EmployeeID] -- Or if the previous row is for a different employee
OR [AbsenceID_Prev] <> ([AbsenceID]-1) -- Or if the AbsenceID is not sequential
You can then use this to join the CTE to itself to find the first row in each group with something like:
....
FROM [AbsenceStage] AS [Row]
INNER JOIN [AbsenceStage] AS [First]
ON ([First].[RN] = (
-- Get the first row before ([RN] less than or equal to) this one where it is the start of a grouping
SELECT MAX([RN]) FROM [AbsenceStage]
WHERE [RN] <= [Row].[RN] AND (
[EmployeeID_Prev] IS NULL
OR [EmployeeID_Prev] <> [EmployeeID]
OR [AbsenceID_Prev] <> ([AbsenceID]-1)
)
))
...
You can then GROUP BY the [First].[RN] which will now act like a group id and allow you to get the start and end date of each absence group.
SELECT
[Row].[EmployeeID]
,MIN([Row].[AbsenceDate]) AS [Absence_Begin]
,MAX([Row].[AbsenceDate]) AS [Absence_End]
...
-- FROM and INNER JOIN from above
...
GROUP BY [First].[RN], [Row].[EmployeeID];
You could then put all that into a view giving you the EmployeeID with the start and end date of each absence. You can then easily pull out the employees currently off with a:
WHERE CAST(CURRENT_TIMESTAMP AS date) BETWEEN [Absence_Begin] AND [Absence_End]
SQL Fiddle
Like another answer here, I'm going to create the leave intervals, but via a different method. First the code:
declare @today date = getdate(); --use whatever date here
with g as (
select *, dateadd(day, -1 * row_number() over (partition by employeeid order by absencedate), AbsenceDate) as group_number
from employeeabsence
) , leave_intervals as (
select employeeid, min(absencedate) as [start], max(absencedate) as [end]
from g
group by EmployeeID, group_number
)
select employeeid, [start], [end]
from leave_intervals
where @today between [start] and [end]
By way of explanation, we first put a date value into a variable. I chose today, but this code will work for any date passed in. Next, we create a common table expression (CTE) that adds a grouping column to your table. This is the meat of the solution, so it bears some treatment. Within a given interval, the AbsenceDate increases at a rate of one day per row. row_number() also increases at a rate of one per row. So, if we subtract a row_number() number of days from the AbsenceDate, we'll get another (arbitrary) date. The key here is to realize that this arbitrary date will be the same for every row in the interval, so we can use it to group by. From there, it's just a matter of doing just that: get the min and max per interval. Lastly, we find which intervals contain @today.
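The "date minus row number" trick described above can be traced step by step in a few lines of Python. This is a sketch only: the names leave_intervals and away_on are hypothetical, and it assumes each employee has at most one row per date, as in the sample table.

```python
from collections import defaultdict
from datetime import date, timedelta

def leave_intervals(rows):
    """rows: iterable of (employee_id, absence_date). Returns (employee_id, start, end) tuples."""
    by_employee = defaultdict(list)
    for emp, d in rows:
        by_employee[emp].append(d)
    intervals = []
    for emp, dates in by_employee.items():
        islands = defaultdict(list)
        for rn, d in enumerate(sorted(dates), start=1):
            # Consecutive dates minus their row number all land on the same anchor date.
            islands[d - timedelta(days=rn)].append(d)
        for members in islands.values():
            intervals.append((emp, min(members), max(members)))
    return intervals

def away_on(rows, today):
    """Employees absent on `today`, with the last day of that absence."""
    return sorted((emp, end) for emp, start, end in leave_intervals(rows)
                  if start <= today <= end)

absences = (
    [(10, date(2014, 11, d)) for d in range(11, 16)]
    + [(21, date(2014, 11, 11)), (21, date(2014, 11, 12))]
    + [(5, date(2014, 11, 11)), (15, date(2014, 11, 20))]
    + [(10, date(2015, 3, d)) for d in range(1, 5)]
)
print(away_on(absences, date(2014, 11, 11)))
```

For employee 10, the November rows all map to the same anchor date and the March rows map to a different one, so they form two separate intervals and only the November one contains the chosen day.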

Calculate items, loop by month, adding month each time through

I have a table of tickets. I am trying to calculate how many tickets were "open" at each month end over the course of the current year. I am also pushing this to a bar chart, so I need to output it into an array through LINQ.
My SQL query to get my calculation is:
SELECT
(SELECT COUNT(*) FROM tblMaintenanceTicket t WHERE (CreateDate < DATEADD(MM, 1, '01/01/2012')))
-
(SELECT COUNT(*) FROM tblMaintenanceTicket t WHERE (CloseDate < DATEADD(MM, 1, '01/01/2012'))) AS 'Open @Month End'
My logic is the following: count all tickets created before the end of the month, then subtract the tickets closed before the end of the month.
UPDATED:
I have updated my query based on the comments below, but it is not working: it fails with errors in the GROUP BY. I guess I don't truly understand the logic; my lack of skill in SQL is to blame.
I have added a SQL Fiddle example to show you my query: http://sqlfiddle.com/#!3/c9b638/1
Desired output:
-----------
| Jan | 3 |
-----------
| Feb | 4 |
-----------
| Mar | 0 |
-----------
Your SQL has several errors. You are grouping by CreateDate, but you don't have it as a column from the subqueries. And you don't have a column alias on the count(*).
I think this is what you are trying to do:
select DATENAME(MONTH,CreateDate), DATEPART(YEAR,CreateDate),
(sum(case when CreateDate < DATEADD(MM, 1, '01/01/2012') then 1 else 0 end) -
sum(case when CloseDate < DATEADD(MM, 1, '01/01/2012') then 1 else 0 end)
)
from tblMaintenanceTicket
group by DATENAME(MONTH,CreateDate), DATEPART(YEAR,CreateDate)
Your comment seems to explain what you want more clearly than your question (the explanation in the question is a bit buried). What you need is a driver table of months, which you then join to your table. Something like:
select mons.yr, mons.mon, count(mt.CreateDate) as OpenTickets -- count a column from mt so months with no open tickets return 0
from (select month(CreateDate) as mon, year(CreateDate) as yr,
cast(min(CreateDate) as date) as MonthStart,
cast(max(CreateDate) as date) as monthEnd
from tblMaintenanceTicket
group by month(CreateDate), year(CreateDate)
) mons left outer join
tblMaintenanceTicket mt
on mt.CreateDate <= mons.MonthEnd and
(mt.CloseDate > mons.MonthEnd or mt.CloseDate is null)
group by mons.yr, mons.mon
I am assuming records are created on every day. This is a convenience so I don't have to think about getting the first and last day of each month using other SQL functions.
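The "open at month end" condition used in the join can be written out directly. The Python sketch below is illustrative only: the function name open_at_month_end and the sample tickets are made up, and it computes the true last day of each month rather than relying on a ticket being created every day.

```python
import calendar
from datetime import date

def open_at_month_end(tickets, year):
    """tickets: iterable of (create_date, close_date or None). Returns {month: open_count}."""
    counts = {}
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        month_end = date(year, month, last_day)
        # Open at month end: created on or before it, and not yet closed by then.
        counts[month] = sum(
            1 for created, closed in tickets
            if created <= month_end and (closed is None or closed > month_end)
        )
    return counts

# Hypothetical sample tickets, not taken from the original post.
tickets = [
    (date(2012, 1, 5), None),
    (date(2012, 1, 20), date(2012, 2, 10)),
    (date(2012, 2, 3), date(2012, 2, 25)),
    (date(2012, 2, 15), None),
]
print(open_at_month_end(tickets, 2012))
```

A month with no open tickets gets a count of 0 here, which is the behaviour the desired output in the question shows for March.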
If your query is returning what you need, then simply use DATENAME(MONTH, yourDate) to retrieve the month and group by Month,Year:
SELECT SUM(yourCount), DATENAME(MONTH,yourDate), DATEPART(YEAR,yourDate) -- yourCount: placeholder for the value column of your query
FROM
(
your actual query here
) q
GROUP BY DATENAME(MONTH,yourDate), DATEPART(YEAR,yourDate)

Modulo Time in SQL Server 2005 - Return data every n hours

I have something like this:
SELECT *
FROM (
SELECT prodid, date, time, tmp, rowid
FROM live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
) b
WHERE b.rowid % 400 = 0
FYI: The reason for the convert in the where clause, is because my date is stored as a varchar(10), I had to convert it to datetime in order to get the correct range of data. (I tried a bunch of different things and this worked)
I'm wondering how I can return the data I want every 4 hours during those selected dates. I have data collected approximately every 5 seconds (with some breaks in data) - ie data wasn't collected during a 2 hour period, but then continues at 5 second increments.
In my example I just used a modulo on my rowid, and the syntax works, but as I mentioned above there are some periods where data isn't collected. So logic like "data arrives every 5 seconds, so multiply that out over 4 hours to estimate how many rows lie in between" won't work.
My time column is a varchar column and is in the form hh:mm:ss
My ideal output is:
| prodid | date | time | tmp |
| 4 | 3/19/2012 | 10:00:00 | 2.3 |
| 7 | 3/19/2012 | 14:00:24 | 3.2 |
As you can see I can be a bit off (in terms of seconds) - I more so need the approximate value in terms of time.
Thank you in advance.
This should work
select lpp.prodid, lpp.date, lpp.time, lpp.tmp, lpp.rowid -- qualified, because prodid exists on both sides of the join
from live_pilot_plant as lpp
inner join (
select min(prodid) as prodid -- is prodid your PK? if not, change it to rowid or whatever your PK is
from live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101) -- or whatever you want
AND CONVERT(DATETIME, '3/31/2012', 101) -- for better performance it is on the inner select
group by date,
floor( -- floor does the trick
convert(float,convert(datetime, time)) -- assumes "time" column is a varchar containing data like '19:23:05'
* 6 -- 6 comes from 24 hours / 4 hours
)
) as filter on lpp.prodid = filter.prodid -- if prodid is not the PK, also correct it here.
A side note for anyone who has date + time data in a single datetime field, say named "when_it_was": the group by can be as simple as:
group by floor(convert(float, when_it_was) * 6) -- again, 6 comes from 24/4
Something along the lines of the following should work. Basically, create date + time partitions, each partition representing a block of 4 hours, and pick the first-ranked record from each partition:
select * from (
select *,
row_number() over (partition by date,cast(left( time, charindex( ':', time) - 1) as int) / 4 order by
date, time) as ranker from live_pilot_plant
) Z where ranker = 1
Assuming rowid is a PK and increases with date/time, just convert the time field to a 4-hour interval number with convert(int,substring(time,1,2))/4 and select MIN(rowid) from each 4-hour group in a day:
select prodid, date, time, tmp, rowid from live_pilot_plant where rowid in
(
select min(rowid)
from live_pilot_plant
WHERE CONVERT(DATETIME, date, 101) BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
group by date,convert(int,substring(time,1,2))/4
)
order by CONVERT(DATETIME, date, 101),time
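All three answers rely on the same bucketing step: integer-divide the hour by 4 and keep the first row in each (date, bucket) pair. A minimal Python sketch of that step follows; the function name first_reading_per_block and the sample readings are hypothetical, and it assumes dates formatted like 3/19/2012 and times like hh:mm:ss, as in the question.

```python
from datetime import datetime

def first_reading_per_block(rows, block_hours=4):
    """rows: iterable of (date_str, time_str, tmp). Returns the first row of each block."""
    parsed = sorted(
        (datetime.strptime(f"{d} {t}", "%m/%d/%Y %H:%M:%S"), d, t, tmp)
        for d, t, tmp in rows
    )
    first = {}
    for ts, d, t, tmp in parsed:
        # Hours 0-3 fall in block 0, 4-7 in block 1, and so on.
        key = (ts.date(), ts.hour // block_hours)
        first.setdefault(key, (d, t, tmp))  # keep only the earliest row per block
    return list(first.values())

readings = [
    ("3/19/2012", "10:00:00", 2.3),
    ("3/19/2012", "10:00:05", 2.4),
    ("3/19/2012", "14:00:24", 3.2),
    ("3/19/2012", "15:30:00", 3.0),
    ("3/20/2012", "00:00:10", 1.9),
]
for row in first_reading_per_block(readings):
    print(row)
```

Because the grouping is on the clock time rather than on a row count, gaps in data collection do not shift the sampling points.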

Select repeat occurrences within time period <x days

Suppose I have a large table (100,000+ entries) holding service records or perhaps admission records. How would I find all the instances of re-occurrence within a set number of days?
The table setup could be something like this, likely with more columns.
Record ID Customer ID Start Date Time Finish Date Time
1 123456 24/04/2010 16:49 25/04/2010 13:37
3 654321 02/05/2010 12:45 03/05/2010 18:48
4 764352 24/03/2010 21:36 29/03/2010 14:24
9 123456 28/04/2010 13:49 31/04/2010 09:45
10 836472 19/03/2010 19:05 20/03/2010 14:48
11 123456 05/05/2010 11:26 06/05/2010 16:23
What I am trying to do is work out a way to select the records where there is a re-occurrence of the field [Customer ID] within a certain time period (< X days), where the time period is the Start Date Time of the 2nd occurrence minus the Finish Date Time of the first occurrence.
This is what I would like it to look like once it was run for say x=7
Record ID Customer ID Start Date Time Finish Date Time Re-occurence
9 123456 28/04/2010 13:49 31/04/2010 09:45 1
11 123456 05/05/2010 11:26 06/05/2010 16:23 2
I can solve this problem with a smaller set of records in Excel but have struggled to come up with a SQL solution in MS Access. I do have some SQL queries that I have tried but I am not sure I am on the right track.
Any advice would be appreciated.
I think this is a clear expression of what you want. It's not extremely high performance but I'm not sure that you can avoid either correlated sub-query or a cartesian JOIN of the table to itself to solve this problem. It is standard SQL and should work in most any engine, although the details of the date math may differ:
SELECT * FROM YourTable YT1 WHERE EXISTS
(SELECT * FROM YourTable YT2 WHERE
YT2.CustomerID = YT1.CustomerID
AND YT1.StartTime BETWEEN YT2.FinishTime AND YT2.FinishTime + 7)
In order to accomplish this you would need to make a self join as you are comparing the entire table to itself. Assuming similar names it would look something like this:
select r1.customer_id, min(r1.start_time), max(r1.finish_time), count(1) as reoccurrences
from records r1,
records r2
where r1.record_id > r2.record_id -- this ensures you don't double count the records
and r1.customer_id = r2.customer_id
and r1.start_time - r2.finish_time <= 7
group by r1.customer_id
You wouldn't be able to easily get both the record_id and the number of occurrences, but you could go back and find it by correlating the start time to the record number with that customer_id and start_time.
This will do it:
declare @t table(Record_ID int, Customer_ID int, StartDateTime datetime, FinishDateTime datetime)
insert @t values(1 ,123456,'2010-04-24 16:49','2010-04-25 13:37')
insert @t values(3 ,654321,'2010-05-02 12:45','2010-05-03 18:48')
insert @t values(4 ,764352,'2010-03-24 21:36','2010-03-29 14:24')
insert @t values(9 ,123456,'2010-04-28 13:49','2010-04-30 09:45')
insert @t values(10,836472,'2010-03-19 19:05','2010-03-20 14:48')
insert @t values(11,123456,'2010-05-05 11:26','2010-05-06 16:23')
declare @days int
set @days = 7
;with a as (
select record_id, customer_id, startdatetime, finishdatetime,
rn = row_number() over (partition by customer_id order by startdatetime asc)
from @t),
b as (
select record_id, customer_id, startdatetime, finishdatetime, rn, 0 recurrence
from a
where rn = 1
union all
select a.record_id, a.customer_id, a.startdatetime, a.finishdatetime,
a.rn, case when a.startdatetime - @days < b.finishdatetime then b.recurrence + 1 else 0 end
from b join a
on b.rn = a.rn - 1 and b.customer_id = a.customer_id
)
select record_id, customer_id, startdatetime, recurrence from b
where recurrence > 0
Result:
https://data.stackexchange.com/stackoverflow/q/112808/
I just realized it should be done in Access. I am so sorry, this was written for SQL Server 2005. I don't know how to rewrite it for Access.
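Since the recursive query above is specific to SQL Server, the per-customer logic it implements may be easier to port if written out procedurally. The following Python sketch mirrors that query under the same assumptions (records ordered by start time per customer, each compared with the immediately preceding one); the function name recurrences is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def recurrences(records, days=7):
    """records: iterable of (record_id, customer_id, start, finish).
    Returns (record_id, customer_id, recurrence) for repeat visits only."""
    by_customer = defaultdict(list)
    for rec in records:
        by_customer[rec[1]].append(rec)
    out = []
    for customer, recs in by_customer.items():
        recs.sort(key=lambda r: r[2])  # order by start time, like row_number()
        count, prev_finish = 0, None
        for record_id, _, start, finish in recs:
            # Same test as the query: start - days < previous finish.
            if prev_finish is not None and start - prev_finish < timedelta(days=days):
                count += 1
                out.append((record_id, customer, count))
            else:
                count = 0  # the chain of repeat visits is broken
            prev_finish = finish
    return out

records = [
    (1, 123456, datetime(2010, 4, 24, 16, 49), datetime(2010, 4, 25, 13, 37)),
    (3, 654321, datetime(2010, 5, 2, 12, 45), datetime(2010, 5, 3, 18, 48)),
    (4, 764352, datetime(2010, 3, 24, 21, 36), datetime(2010, 3, 29, 14, 24)),
    (9, 123456, datetime(2010, 4, 28, 13, 49), datetime(2010, 4, 30, 9, 45)),
    (10, 836472, datetime(2010, 3, 19, 19, 5), datetime(2010, 3, 20, 14, 48)),
    (11, 123456, datetime(2010, 5, 5, 11, 26), datetime(2010, 5, 6, 16, 23)),
]
print(recurrences(records))
```

With the sample data this flags records 9 and 11 for customer 123456 with recurrence counts 1 and 2, matching the output requested in the question.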