Find missing record between date range - sql

At the end of an enormous stored procedure (in SQL Server), I've created two CTEs: one with some date ranges (at 6-month intervals) and one with some records.
Let's assume I have date ranges in table B from 2020-01-01 back to 2010-01-01 (at 6-month intervals):
Start End
----------------------
2020-01-01 | 2020-07-01
... ...
other years here
... ...
2010-01-01 | 2010-07-01
and on table A this situation:
Name Date
-----------------
John 2020-01-01
John 2019-01-01
John 2018-07-01
... ...
Rob 2020-01-01
Rob 2019-07-01
Rob 2018-07-01
... ...
I'm trying to generate a recordset like this:
Name MissingDate
-----------------
John 2019-07-01
... ...
John 2010-01-01
Rob 2019-01-01
... ...
Rob 2010-01-01
I've got the flu and I barely know who I am at this moment, I hope it was clear and if anyone could help me with this I would really appreciate it.

If you want missing dates (which appear to be by month), then generate all available dates and take out the ones you have.
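with cte as (
-- anchor: one row per range in dateranges ("end" is a reserved word, so it is bracketed)
select [start], [end]
from dateranges
union all
-- recursive step: advance one month at a time while staying inside the range
select dateadd(month, 1, [start]), [end]
from cte
where dateadd(month, 1, [start]) < [end]
)
select n.name, cte.[start]
from cte cross join
(select distinct name from tablea) n left join
tablea a
on a.date = cte.[start] and a.name = n.name
where a.date is null;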
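If the range table already holds exactly one row per 6-month interval, as the question's sample suggests, the recursion may not be needed at all. The following is a minimal sketch under that assumption; the table and column names (`dateranges`, `tablea`, `name`, `date`) are taken from the answer above and may differ from the real CTE names.

```sql
-- Sketch only: assumes dateranges([start], [end]) has one row per 6-month interval
-- and tablea(name, date) holds the records that do exist.
SELECT n.name,
       r.[start] AS MissingDate
FROM dateranges r
CROSS JOIN (SELECT DISTINCT name FROM tablea) n   -- every name paired with every interval start
WHERE NOT EXISTS (
          SELECT 1
          FROM tablea a
          WHERE a.name = n.name
            AND a.date = r.[start]                -- a record exists for this name and start date
      )
ORDER BY n.name, r.[start] DESC;
```

`NOT EXISTS` is used here in place of the `LEFT JOIN ... IS NULL` pattern; both express the same anti-join.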

Related

How to subtract next row from first one for each account id in SQL?

The question I am trying to answer is: how can I return the correct week number, in sequence, for each ID? For example, the first date for each ID always falls in week 1 (it's the first week in the series), but the following date in the series may also fall within the first week (so it should return 1 again) or in, say, the 3rd week (so it should return 3).
The code I've written so far is:
select distinct
row_number() over (partition by ID order by date) row_nums
,ID
,date
from table_a
Which simply returns the running tally of dates by ID, and doesn't take into account what week number that date falls in.
But what I'm looking for is this:
Here's some setup code to assist:
CREATE TABLE random_table
(
ID VarChar(50),
date DATETIME
);
-- Date literals must be quoted; an unquoted 5/14/2021 is evaluated as integer division.
INSERT INTO random_table
VALUES
('AAA','5/14/2021'),
('AAA','6/2/2021'),
('AAA','7/9/2021'),
('BBB','5/25/2021'),
('CCC','12/2/2020'),
('CCC','12/6/2020'),
('CCC','12/10/2020'),
('CCC','12/14/2020'),
('CCC','12/18/2020'),
('CCC','12/22/2020'),
('CCC','12/26/2020'),
('CCC','12/30/2020'),
('CCC','1/3/2021'),
('DDD','1/7/2021'),
('DDD','1/11/2021');
with adj as (
select *, dateadd(day, -1, "date") as adj_dt
from table_a
)
select
datediff(week,
min(adj_dt) over (partition by id),
adj_dt) + 1 as week_logic,
id, "date"
from adj
This relies on DATEDIFF(week, ...) treating Sunday as the first day of the week, which it always does regardless of the @@DATEFIRST setting. For a Sunday to Saturday definition you would find 12/06/2020 and 12/10/2020 in the same week, so presumably you want something like a Monday start instead (which also seems to line up with the numbering for 12/02/2020, 12/14/2020 and 12/18/2020). I'm compensating by sliding backward a day in the weeks calculation. That step could be handled inline without a CTE, but perhaps it illustrates the approach more clearly.
Your objective isn't clear but I think you would benefit from a Tally-Table of the weeks and then LEFT JOIN to your source data.
This will give you a row for each week AND source data if it exists
SELECT
CASE WHEN ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [date])=1 THEN 1
ELSE DATEPART(WK, (DATE) ) - DATEPART(WK, FIRST_VALUE([DATE]) OVER (PARTITION BY ID ORDER BY [date])) END PD,
ID,
CONVERT(VARCHAR(10), [date],120)
FROM random_table rt
ORDER BY ID,[date]
DBFIDDLE
output:
PD  | ID  | (No column name)
----+-----+-----------------
1   | AAA | 2021-05-14
3   | AAA | 2021-06-02
8   | AAA | 2021-07-09
1   | BBB | 2021-05-25
1   | CCC | 2020-12-02
1   | CCC | 2020-12-06
1   | CCC | 2020-12-10
2   | CCC | 2020-12-14
2   | CCC | 2020-12-18
3   | CCC | 2020-12-22
3   | CCC | 2020-12-26
4   | CCC | 2020-12-30
-47 | CCC | 2021-01-03
1   | DDD | 2021-01-07
1   | DDD | 2021-01-11
Dates are in the format YYYY-MM-DD.
I will leave the -47 in here, so you can fix it yourself (as an exercise) 😁😉
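One possible way to approach that exercise: the -47 appears because DATEPART(WK, ...) restarts at 1 every January, so subtracting week numbers breaks across a year boundary. DATEDIFF(WEEK, ...) counts week boundaries between two dates and is not affected by the year change. The sketch below assumes the `random_table` setup from the question and numbers weeks from 1 for every row, which is an assumption about the intended output and differs slightly from the CASE expression above.

```sql
-- Sketch only: week number relative to each ID's first date, safe across year ends.
-- DATEDIFF(WEEK, ...) counts Sunday boundaries crossed, so weeks here run Sunday to Saturday.
SELECT
    DATEDIFF(WEEK, MIN([date]) OVER (PARTITION BY ID), [date]) + 1 AS PD,
    ID,
    CONVERT(VARCHAR(10), [date], 120) AS [date]
FROM random_table
ORDER BY ID, [date];
```

If weeks should start on Monday instead, the one-day shift from the earlier answer (DATEADD(day, -1, [date]) on both arguments) can be applied to the same expression.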

query to find the records between two tables for a given date range

I have an employee table that records which department an employee belonged to during a period of time. At any point in time an employee can belong to only one department. The enddate column holds the last date the employee stayed in a particular department. If enddate holds a future date, that row is the employee's latest (current) department.
empid | deptname  | startdate   | enddate
------+-----------+-------------+------------
1     | sales     | jan-20-2022 | jan-24-2022
1     | marketing | jan-25-2022 | feb-03-2022
1     | support   | feb-04-2022 | feb-06-2022
1     | training  | feb-07-2022 | dec-31-2050
I have a call details table that records which employee took each call, along with the call start time and call end time.
call_id | empid | callstart_time       | callendtime
--------+-------+----------------------+---------------------
10      | 1     | jan-21-2022 10:00:00 | jan-21-2022 10:30:00
11      | 1     | jan-21-2022 10:40:00 | jan-21-2022 10:45:00
12      | 1     | feb-01-2022 11:20:00 | feb-01-2022 11:30:00
13      | 1     | feb-05-2022 09:00:00 | feb-05-2022 10:00:00
14      | 1     | feb-08-2022 10:00:00 | feb-08-2022 11:00:00
Now my question is:
I am looking for suggestions and a sample query to find out which department an employee belonged to at the time they took a call.
For example, if I want to know which calls an employee took from jan-20-2022 to feb-02-2022 and what their department name was at the time of each call, I need the output below.
call_id | empid | callstart_time       | callendtime          | deptname
--------+-------+----------------------+----------------------+----------
10      | 1     | jan-21-2022 10:00:00 | jan-21-2022 10:30:00 | sales
11      | 1     | jan-21-2022 10:40:00 | jan-21-2022 10:45:00 | sales
12      | 1     | feb-01-2022 11:20:00 | feb-01-2022 11:30:00 | marketing
If I run the query for a date range from feb-04-2022 to feb-10-2022, I want to see the output below.
call_id | empid | callstart_time       | callendtime          | deptname
--------+-------+----------------------+----------------------+----------
13      | 1     | feb-05-2022 09:00:00 | feb-05-2022 10:00:00 | support
14      | 1     | feb-08-2022 10:00:00 | feb-08-2022 11:00:00 | training
Please share a few suggestions on how to achieve this output with a SQL query.
A CROSS APPLY lets you define a subselect to pick the applicable employee record. In your case, the latest employee record prior to the call. Something like:
SELECT C.*, E.deptname
FROM calldetails C
CROSS APPLY (
SELECT TOP 1 *
FROM employee E
WHERE E.empid = C.empid
AND E.startdate <= C.callstart_time
ORDER BY E.startdate DESC
) E
ORDER BY C.callstart_time
See this db<>fiddle
Or since you have end date, a simple join will do
SELECT C.*, E.deptname
FROM calldetails C
JOIN employee E
ON E.empid = C.empid
AND E.startdate <= C.callstart_time
AND E.enddate > DATEADD(day, -1, C.callstart_time)
ORDER BY C.callstart_time
Note the date adjustment needed for the enddate comparison. This is needed because you are using inclusive end dates. Using exclusive end dates (where enddate = startdate of the next record) works much better for range checks and calculations.
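Neither query above applies the reporting date range the question asks about. A minimal sketch of how that filter might be added follows; `@from` and `@to` are hypothetical parameters introduced only for illustration, and the inclusive enddate is converted to an exclusive upper bound by adding one day, which is a rewrite of the adjustment above and not the original author's exact predicate.

```sql
-- Hypothetical parameters for the reporting window (not part of the original answer).
DECLARE @from date = '2022-01-20',
        @to   date = '2022-02-02';

SELECT C.call_id, C.empid, C.callstart_time, C.callendtime, E.deptname
FROM calldetails C
JOIN employee E
  ON E.empid = C.empid
 AND C.callstart_time >= E.startdate
 AND C.callstart_time <  DATEADD(day, 1, E.enddate)  -- inclusive enddate as an exclusive next-day bound
WHERE C.callstart_time >= @from
  AND C.callstart_time <  DATEADD(day, 1, @to)       -- include calls on the whole last day
ORDER BY C.callstart_time;
```

Writing both range checks as half-open intervals (>= lower bound, < upper bound) avoids missing calls that start late on the last day of a range.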

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data, the given structure is poor database design. Create two separate tables - one called users and one called appointments. The users table contains the user id, enrollment date and any other specific user information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables together and calculates the difference between the enrollment date and the max (most recent) value of the appointment date.
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the count of months from Enrolment Date to the final appointment date then use below query.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
And, If you want the count of months from Enrolment Date to the nearest appointment date then use below query.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
Try this on sqlfiddle
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
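The question's update also asks how to transform the table. The same VALUES trick can be used to unpivot the wide table into one row per appointment. This is a sketch only: `Appointments` is a hypothetical target table, `t` is the source table name used in the query above, the column types are assumed to be dates, and the column list is abbreviated and would need every AppointmentN_Date column spelled out.

```sql
-- Hypothetical normalized table: one row per (ID, appointment date).
CREATE TABLE Appointments (
    ID               int  NOT NULL,
    Appointment_Date date NOT NULL
);

-- Unpivot the wide columns into rows, skipping empty appointment slots.
INSERT INTO Appointments (ID, Appointment_Date)
SELECT t.ID, v.Appointment_Date
FROM t
CROSS APPLY (VALUES (t.Appointment1_Date),
                    (t.Appointment2_Date),
                    (t.Appointment3_Date)   -- list the remaining AppointmentN_Date columns here
            ) v(Appointment_Date)
WHERE v.Appointment_Date IS NOT NULL;

-- Months from enrolment to the most recent appointment, using the new table.
SELECT t.ID,
       t.Enrolment_Date,
       DATEDIFF(month, t.Enrolment_Date, MAX(a.Appointment_Date)) AS months_diff
FROM t
LEFT JOIN Appointments a ON a.ID = t.ID
GROUP BY t.ID, t.Enrolment_Date;
```

Once the data lives in the narrow table, a 151st appointment is just another row and no query needs to change.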

Counting number of Days in the where clause

I have a SQL Server table that looks like this:
ID | Club Name | Booking Date | Submission Date
---+-------------+-------------------------+-------------------------
1 | Basketball | 2015-10-21 00:00:00.000 | 9/18/2015 3:23:42 PM
2 | Tennis | 2015-10-14 00:00:00.000 | 9/28/2015 1:50:25 PM
3 | Basketball | 2015-10-06 00:00:00.000 | 9/29/2015 11:08:20 AM
1 | Other | 2015-10-21 00:00:00.000 | 9/29/2015 11:08:39 AM
I want to know how many times each club made a submission less than 15 days before the booking date.
The solution I came up with was adding a new column, running the DATEDIFF function, and storing the value in the new column, then grouping by club name and filtering on the new column (> 15).
The question I have is: can this be done on the fly without having to create the new column? How much would that affect performance?
Yes, this can be done inline, in a query. In a database, you almost never want to store a calculated column, which is what that datediff column would be. Instead, you can do the math in the WHERE clause.
SELECT
*
FROM
myTable
WHERE
SubmissionDate > DATEADD(day, -15, BookingDate) -- submitted less than 15 days before the booking
I wrote that pretty quickly, so double-check that the date math goes in the direction you want (checking in the past rather than the future). Just keep in mind that, if this table gets very big, you're going to be doing a TON of date calculations, and that can have a performance impact.
Something like this?
Declare @Table table (Id int,Club_Name varchar(50),Booking_Date datetime,Submission_Date datetime)
Insert @Table values
(1,'Basketball','2015-10-21 00:00:00.000','9/18/2015 3:23:42 PM'),
(2,'Tennis ','2015-10-14 00:00:00.000','9/28/2015 1:50:25 PM'),
(3,'Basketball','2015-10-06 00:00:00.000','9/29/2015 11:08:20 AM'),
(1,'Other ','2015-10-21 00:00:00.000','9/29/2015 11:08:39 AM')
Select Club_Name
,Submissions= count(*)
,Early = sum(case when datediff(DD,Submission_Date,Booking_Date)<15 then 1 else 0 end)
From @Table
Group By Club_Name
Returns
Club_Name  | Submissions | Early
-----------+-------------+------
Basketball | 2           | 1
Other      | 1           | 0
Tennis     | 1           | 0
Try this.
-- Filter rows before grouping: a HAVING clause cannot reference non-aggregated
-- columns that are missing from GROUP BY.
SELECT ID,
ClubName,
COUNT(*) AS Ttle
FROM TableName
WHERE DATEDIFF(D, SubmissionDate, BookingDate) < 15 -- less than 15 days before the booking
GROUP BY ID,
ClubName
ORDER BY Ttle DESC

Total number of days for a task before going on to the next one, grouped by person

I am trying to figure out how to show how many days have been worked on a certain task by using the dates in between each “task login” for each person. I think this can be done with one query? I'm open to suggestions and/or ideas.
The Table:
--------+-----------+----------
Person | TaskLogin | Date
--------+-----------+----------
Jane | A | 2013-01-01
Jane | B | 2013-01-03
Jane | A | 2013-01-06
Jane | B | 2013-01-10
Bob | A | 2013-01-01
Bob | A | 2013-01-06
---------------------------------------------------------------------
Row 1: Jane starts task A starting 2013-01-01 and works on it until starting Task B on 2013-01-03 = 2 days worked on Task A
Row 2: Jane starts on task B starting 2013-01-03 and works on it until starting task A on 2013-01-06 = 3 days worked on Task B
Row 3: Jane starts on task A starting 2013-01-06 and works on it until starting task B on 2013-01-10 = 4 days worked on Task A
Row 4: Skip because that is the highest date for Jane (Jane may or may not finish task B 2013-01-10 but we will not count it)
Row 5: Bob starts task A starting on 2013-01-01 and works on it until continuing to work on task A by logging it again on 2013-01-06 = 5 days worked on task A
Row 6: Skip because that is the highest date for Bob
A = 11 days because 2 + 4 + 5
B = 3 days because of Row 2
The output:
------+---------------------
Tasks | Time between Tasks
------+---------------------
A | 11 days
B | 3 days
EDIT:
The solutions from Nicarus and Gordon Linoff (specifically the first, pre-2013 solution, with my edits in the comments) work. Note that (select distinct * from table t) t can be substituted for table in Gordon Linoff's solution to handle the case of someone logging in twice on the same day.
What you are looking for is the lead() function. It is only available in SQL Server 2012 and later. Before that, the easiest way is a correlated subquery (here "table" stands in for your table name):
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*,
(select top 1 t2.date
from table t2
where t2.person = t.person
and t2.date > t.date -- only later logins by the same person
order by t2.date -- the earliest of those is the next login
) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
In SQL Server 2012 or later, it would be:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*, lead(date) over (partition by person order by date) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
Maybe not the most elegant way, but it certainly works:
-- Setup table/insert values --
IF OBJECT_ID('TempDB.dbo.#TaskAccounting') IS NOT NULL BEGIN
DROP TABLE #TaskAccounting
END
CREATE TABLE #TaskAccounting
(
Person VARCHAR(4) NOT NULL,
TaskLogin CHAR(1) NOT NULL,
TaskDate DATETIME NOT NULL
)
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-03')
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-06')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-10')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-06');
-- Use a CTE to add sequence and join on it --
WITH Tasks AS (
SELECT
Person,
TaskLogin,
TaskDate,
ROW_NUMBER() OVER(PARTITION BY Person ORDER BY TaskDate) AS Sequence
FROM
#TaskAccounting
)
SELECT
a.TaskLogin AS Tasks,
CAST(SUM(DATEDIFF(DD,a.TaskDate,b.TaskDate)) AS VARCHAR) + ' days' AS TimeBetweenTasks
FROM
Tasks a
JOIN
Tasks b
ON (a.Person = b.Person)
AND (a.Sequence = b.Sequence - 1)
GROUP BY
a.TaskLogin
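For comparison, the LEAD-based approach can be run against the same #TaskAccounting temp table, with the de-duplication mentioned in the question's EDIT folded in. This is a sketch under the assumption that SQL Server 2012 or later is available and that exact duplicate rows (same person, task, and date) should be counted once.

```sql
-- Sketch only: days per task using LEAD, reusing the #TaskAccounting table created above.
SELECT t.TaskLogin AS Tasks,
       SUM(DATEDIFF(day, t.TaskDate, t.NextDate)) AS DaysWorked
FROM (
        SELECT d.Person,
               d.TaskLogin,
               d.TaskDate,
               LEAD(d.TaskDate) OVER (PARTITION BY d.Person ORDER BY d.TaskDate) AS NextDate
        FROM (
                -- drop exact duplicates, e.g. the same login recorded twice on one day
                SELECT DISTINCT Person, TaskLogin, TaskDate
                FROM #TaskAccounting
             ) d
     ) t
WHERE t.NextDate IS NOT NULL   -- each person's last login has no following row to measure against
GROUP BY t.TaskLogin;
```

Unlike the self-join on ROW_NUMBER, this reads the table once per query, which may matter on larger data sets.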