I've got a table where we have registries of employees and where they have worked. In each row, we have the employee's starting date on that place. It's something like this:
Employee ID
Name
Branch
Start Date
1
John Doe
234
2018-01-20
1
John Doe
300
2019-03-20
1
John Doe
250
2022-01-19
2
Jane Doe
200
2019-02-15
2
Jane Doe
234
2020-05-20
I need a query where the data returned looks for the next value, making the starting date on the next branch as the end of the current. Eg:
Employee ID
Name
Branch
Start Date
End Date
1
John Doe
234
2018-01-20
2019-03-20
1
John Doe
300
2019-03-20
2022-01-19
1
John Doe
250
2022-01-19
---
2
Jane Doe
200
2019-02-15
2020-05-20
2
Jane Doe
234
2020-05-20
---
When there is not another register, we assume that the employee is still working on that branch, so we can leave it blank or put a default "9999-01-01" value.
Is there any way we can achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row that is in a range. For example, if I look for what branch John Doe worked in 2020-12-01, the query should return the row that shows the branch 300.
You can use LEAD() to peek at the next row, according to a subgroup and ordering within it.
For example:
select
t.*,
lead(start_date) over(partition by employee_id order by start_date) as end_date
from t
I have 2 Tables (History and Responsible). They need to be JOINED based on Service Date.
History Table:
Id
ServiceDate
Hours
ClientId
ClientName
1
2021-10-15
3
123
Tom Holland
2
2021-10-25
5
123
Tom Holland
3
2022-01-14
2
123
Tom Holland
Responsible Table:
2999-12-31 means Responsible has no end date (current)
ClientId
ClientName
ResponsibleId
ResponsibleName
ResponsibleStartDate
ResponsibleEndtDate
123
Tom Holland
77
Thomas Anderson
2020-09-17
2021-10-17
123
Tom Holland
88
Tom Cruise
2021-10-18
2999-12-31
123
Tom Holland
99
Sten Lee
2022-01-07
2999-12-31
My code produces multiple rows, because 2022-01-14 Service date falls under multiple date ranges from Responsible Table:
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
The output of the query above is:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
Technically, output is correct (because 2022-01-14 is between 2021-10-18 - 2999-12-31 as well between 2022-01-07 - 2999-12-31), but not what I need.
I would like to know if possible to achieve 2 outputs:
1) If Service Date falls in multiple date ranges from Responsible Table, Responsible Should be the person who's ResponsibleStartDate is closer to the ServiceDate:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
2) Keep all rows, if Service Date falls in multiple date ranges from Responsible Table, but split Hours evenly between Responsible:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
1
123
Tom Holland
Tom Cruise
3
2022-01-14
1
123
Tom Holland
Sten Lee
First one, we can use a window function to apply a row number, based on how close to ServiceDate the ResponsibleStartDate is, then we can just pick the first row per h.Id. If there is a tie we can break it by picking something that will give us deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.
;WITH cte AS
(
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
(PARTITION BY h.Id
ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;
Output:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
Second one doesn't require a CTE, we can simply divide the Hours in h by the number of rows that exist for that h.Id, then limit it to 2 decimal places:
SELECT h.Id,
h.ServiceDate,
Hours = CONVERT(decimal(11,2),
h.Hours * 1.0
/ COUNT(h.Id) OVER (PARTITION BY h.Id)),
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);
Output:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3.00
123
Tom Holland
Thomas Anderson
2
2021-10-25
5.00
123
Tom Holland
Tom Cruise
3
2022-01-14
1.00
123
Tom Holland
Tom Cruise
3
2022-01-14
1.00
123
Tom Holland
Sten Lee
Both demonstrated in this db<>fiddle.
My attempt at part 1 - it doesn't work if there's more than one Responsible as of the same start date.
WITH
"all_services" AS (
SELECT
h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
r.ResponsibleStartDate
FROM History AS h
LEFT JOIN Responsible AS r
ON h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
SELECT
ServiceDate,
ClientId,
MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
FROM all_services
GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
USING (ServiceDate, ClientId, ResponsibleStartDate)
Posting it anyway as a contrast to Aaron's better solution as a learning point for myself.
I have the following dataset:
A
B
C
1
John
2018-08-14
1
John
2018-08-20
1
John
2018-09-03
2
John
2018-11-13
2
John
2018-12-11
2
John
2018-12-12
1
John
2020-01-20
1
John
2020-01-21
3
John
2021-03-02
3
John
2021-03-03
1
John
2020-05-10
1
John
2020-05-12
And I would like to have the following result:
A
B
C
1
John
2018-08-14
2
John
2018-11-13
1
John
2020-01-20
3
John
2021-03-02
1
John
2020-05-10
If I group by A, B the 1st row and the third just concatenate which is coherent. How could I create another columns to still use a group by and have the result I want.
If you have another ideas than mine, please explain it !
I tried to use some first, last, rank, dense_rank without success.
Use lag(). Looks like B is a function of A in your data. So checking lag(A) will suffice.
select A,B,C
from (
select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
from mytable
) t
where startFlag = 1
order by C
Table: Teacher
ID_Teacher Password Id_Users Name
--------------------------------------
1001 1234 1 A
1002 1234 2 B
Table: Student
Id_Student Password Id_Users Name
--------------------------------------
52001 1234 3 C
52002 1234 4 D
Table: Employee
Id_Employee Password Id_Users Name Country
--------------------------------------------------
60001 1234 5 E NewYork
60002 1234 6 F London
Table: Shop
Id_Shop Password Id_Users Name Country
--------------------------------------------------
70001 1234 7 G NewYork
70002 1234 8 H London
Table:Transaction
ID_Transaction Transaction_Of(Fk Id_Users) Method Recived_Transaction(Fk Id_Users) Money Status TimeStamp
----------------------------------------------------------------------------------------------------------------------
1 1 Tranfer 5 500 Sucess 2020-01-05 18:00:00
2 5 Tranfer 8 500 Sucess 2020-01-05 18:00:00
I need this
Result:
ID_Transaction Transaction_Of Method Recived_Transaction(Fk Id_Users) Money Status TimeStamp
----------------------------------------------------------------------------------------------------------------------
1 A Tranfer E 500 Sucess 2020-01-05 18:00:00
2 E Tranfer H 500 Sucess 2020-01-05 18:00:00
How I can write query?
This may work
$result = $this->db->select("t.ID_Transaction, c.Name, t.Method, e.Name, t.Money, t.Status, t.TimeStamp")
->join("Teacher as c" , "t.Transaction_Of = c.Id_Users")
->join("Employee as e" , "t.Recived_Transaction = e.Id_Users")
->get("Table_Transaction as t")->result_array();
How would I go about extracting the overlapping dates from the following table?
ID Name StartDate EndDate Type
==============================================================
1 John Smith 01/01/2014 31/01/2014 A
2 John Smith 20/01/2014 20/02/2014 B
3 John Smith 01/03/2014 28/03/2014 A
4 John Smith 18/03/2014 24/03/2014 B
5 John Smith 01/07/2014 31/07/2014 A
6 John Smith 15/07/2014 31/07/2014 B
7 John Smith 25/07/2014 25/08/2014 C
Based on the first example for John Smith, the dates 01/01/2014 to 31/01/2014 overlap with 20/01/2014 to 20/02/2014, so I am expecting just overlapping period back which is 20/01/2014 to 31/01/2014.
The final result would be:
ID Name StartDate EndDate
==================================================
8 John Smith 20/01/2014 31/01/2014
9 John Smith 18/03/2014 24/03/2014
10 John Smith 15/07/2014 31/07/2014
11 John Smith 25/07/2014 31/07/2014
HELP REQUIRED 10 August 2014
In addition to the above request, I am looking for help or guidance on how to get the following results which should include the dates that overlap and the dates that don't. The ID column is irrelevant.
ID Name StartDate EndDate Type
==================================================
1 John Smith 01/01/2014 19/01/2014 A
8 John Smith 20/01/2014 31/01/2014 AB
2 John Smith 01/02/2014 20/02/2014 B
3 John Smith 01/03/2014 17/03/2014 A
9 John Smith 18/03/2014 24/03/2014 AB
3 John Smith 25/03/2014 28/03/2014 A
5 John Smith 01/07/2014 14/07/2014 A
10 John Smith 15/07/2014 31/07/2014 AB
11 John Smith 25/07/2014 31/07/2014 ABC
7 John Smith 01/08/2014 25/08/2014 C
Although the following image is not an exact reflection of the above, for illustration purposes, I am interested in seeing the dates that overlap (red) and the dates that don't (sky blue) in the same result set.
http://imgur.com/SeR9sY1
If you want just overlapping periods, you can get this with a self join. Do note that the results might be redundant if more than two periods overlap on certain dates.
select ft.name,
(case when max(ft.startdate) > max(ft2.startdate) then max(ft.startdate)
else max(ft2.startdate)
end) as startdate,
(case when min(ft.enddate) > min(ft2.enddate) then min(ft.enddate)
else min(ft2.enddate)
end) as enddate
from followingtable ft join
followingtable ft2
on ft.name = ft2.name and
ft.id < ft2.id and
ft.startdate <= ft2.enddate and
ft.enddate > ft2.startdate
group by ft.name, ft.id, ft2.id;
This doesn't assign the ids. You can do that with row_number() and an offset.