JOIN Tables based on Service Date - sql

I have 2 Tables (History and Responsible). They need to be JOINED based on Service Date.
History Table:
Id  ServiceDate  Hours  ClientId  ClientName
1   2021-10-15   3      123       Tom Holland
2   2021-10-25   5      123       Tom Holland
3   2022-01-14   2      123       Tom Holland
Responsible Table:
2999-12-31 means Responsible has no end date (current)
ClientId  ClientName   ResponsibleId  ResponsibleName  ResponsibleStartDate  ResponsibleEndtDate
123       Tom Holland  77             Thomas Anderson  2020-09-17            2021-10-17
123       Tom Holland  88             Tom Cruise       2021-10-18            2999-12-31
123       Tom Holland  99             Sten Lee         2022-01-07            2999-12-31
My code produces multiple rows, because 2022-01-14 Service date falls under multiple date ranges from Responsible Table:
SELECT h.Id,
       h.ServiceDate,
       h.Hours,
       h.ClientId,
       h.ClientName,
       r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
    ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
The output of the query above is:
Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee
Technically, the output is correct (2022-01-14 falls between 2021-10-18 and 2999-12-31 as well as between 2022-01-07 and 2999-12-31), but it is not what I need.
I would like to know whether it is possible to achieve two outputs:
1) If the Service Date falls in multiple date ranges from the Responsible table, the Responsible should be the person whose ResponsibleStartDate is closest to the ServiceDate:
Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee
2) Keep all rows when the Service Date falls in multiple date ranges from the Responsible table, but split Hours evenly between the Responsibles:
Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   1      123       Tom Holland  Tom Cruise
3   2022-01-14   1      123       Tom Holland  Sten Lee

For the first one, we can use a window function to apply a row number based on how close ResponsibleStartDate is to ServiceDate, then pick the first row per h.Id. If there is a tie, we can break it by adding a column that gives a deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.
;WITH cte AS
(
    SELECT h.Id,
           h.ServiceDate,
           h.Hours,
           h.ClientId,
           h.ClientName,
           r.ResponsibleName,
           RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
               (PARTITION BY h.Id
                ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
    FROM dbo.History AS h
    LEFT JOIN dbo.Responsible AS r
        ON (h.ClientId = r.ClientId
            AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;
Output:
Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3      123       Tom Holland  Thomas Anderson
2   2021-10-25   5      123       Tom Holland  Tom Cruise
3   2022-01-14   2      123       Tom Holland  Sten Lee
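The tie-breaker mentioned above can be checked with a minimal sketch. It is an assumption-laden port, not the answer's own code: it uses Python's built-in sqlite3 instead of SQL Server, replaces DATEDIFF with a difference of julianday() values, and keeps only the columns needed to show the ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER, ClientId INTEGER);
    CREATE TABLE Responsible (ClientId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123), (2, '2021-10-25', 5, 123), (3, '2022-01-14', 2, 123);
    INSERT INTO Responsible VALUES
        (123, 'Thomas Anderson', '2020-09-17', '2021-10-17'),
        (123, 'Tom Cruise',      '2021-10-18', '2999-12-31'),
        (123, 'Sten Lee',        '2022-01-07', '2999-12-31');
""")

rows = conn.execute("""
    WITH ranked AS (
        SELECT h.Id,
               r.ResponsibleName,
               ROW_NUMBER() OVER (
                   PARTITION BY h.Id
                   -- distance in days, then the name as a deterministic tie-breaker
                   ORDER BY ABS(julianday(h.ServiceDate) - julianday(r.ResponsibleStartDate)),
                            r.ResponsibleName
               ) AS rn
        FROM History AS h
        LEFT JOIN Responsible AS r
          ON h.ClientId = r.ClientId
         AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    )
    SELECT Id, ResponsibleName FROM ranked WHERE rn = 1 ORDER BY Id
""").fetchall()

print(rows)
```

With the sample data no two candidates are equally close, so the second ORDER BY column only matters once two Responsibles share the same start date for a client.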
The second one doesn't require a CTE; we can simply divide the Hours in h by the number of rows that exist for that h.Id, then limit the result to 2 decimal places:
SELECT h.Id,
       h.ServiceDate,
       Hours = CONVERT(decimal(11,2),
                   h.Hours * 1.0
                   / COUNT(h.Id) OVER (PARTITION BY h.Id)),
       h.ClientId,
       h.ClientName,
       r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
    ON (h.ClientId = r.ClientId
        AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);
Output:
Id  ServiceDate  Hours  ClientId  ClientName   ResponsibleName
1   2021-10-15   3.00   123       Tom Holland  Thomas Anderson
2   2021-10-25   5.00   123       Tom Holland  Tom Cruise
3   2022-01-14   1.00   123       Tom Holland  Tom Cruise
3   2022-01-14   1.00   123       Tom Holland  Sten Lee
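As a sanity check on the split, the sketch below reruns the same idea in SQLite through Python's sqlite3 (an assumption; the answer targets SQL Server) and confirms that the split hours still add up to the original total. ROUND stands in for CONVERT(decimal(11,2), ...), and the tables are trimmed to the columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE History (Id INTEGER, ServiceDate TEXT, Hours INTEGER, ClientId INTEGER);
    CREATE TABLE Responsible (ClientId INTEGER, ResponsibleName TEXT,
                              ResponsibleStartDate TEXT, ResponsibleEndtDate TEXT);
    INSERT INTO History VALUES
        (1, '2021-10-15', 3, 123), (2, '2021-10-25', 5, 123), (3, '2022-01-14', 2, 123);
    INSERT INTO Responsible VALUES
        (123, 'Thomas Anderson', '2020-09-17', '2021-10-17'),
        (123, 'Tom Cruise',      '2021-10-18', '2999-12-31'),
        (123, 'Sten Lee',        '2022-01-07', '2999-12-31');
""")

rows = conn.execute("""
    SELECT h.Id,
           -- each History row's hours divided by the number of matching Responsibles
           ROUND(h.Hours * 1.0 / COUNT(h.Id) OVER (PARTITION BY h.Id), 2) AS Hours,
           r.ResponsibleName
    FROM History AS h
    LEFT JOIN Responsible AS r
      ON h.ClientId = r.ClientId
     AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
    ORDER BY h.Id, r.ResponsibleName
""").fetchall()

total_split_hours = sum(hours for _, hours, _ in rows)
print(rows, total_split_hours)
```

Because the join is a LEFT JOIN, a History row with no matching Responsible still counts as one row, so its hours are kept whole rather than divided by zero.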
Both demonstrated in this db<>fiddle.

My attempt at part 1 - it doesn't work if there's more than one Responsible as of the same start date.
WITH
"all_services" AS (
    SELECT
        h.Id,
        h.ServiceDate,
        h.Hours,
        h.ClientId,
        h.ClientName,
        r.ResponsibleName,
        r.ResponsibleStartDate
    FROM History AS h
    LEFT JOIN Responsible AS r
        ON h.ClientId = r.ClientId
        AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
    SELECT
        ServiceDate,
        ClientId,
        MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
    FROM all_services
    GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
    USING (ServiceDate, ClientId, ResponsibleStartDate)
Posting it anyway as a contrast to Aaron's better solution as a learning point for myself.

Related

Duplicating a row and changing one word

I have a table that contains data similar to the following
ID  Created           Username  Email  Dept  PW
1   01/01/2021 07:00  admin            a     werfv
2   02/01/2021 07:00  George           a     rtyh
3   03/01/2021 07:00  Jane             a     earg
4   04/01/2021 07:00  Admin            b     sdfbrgsth
5   05/01/2021 07:00  George           b     sdgrf
6   06/01/2021 07:00  Mike             b     sthjyu
7   07/01/2021 07:00  admin            c     drytdyt
8   08/01/2021 07:00  jenny            c     aregerg
9   09/01/2021 07:00  admin            d     erte453
10  10/01/2021 07:00  harry            d     argkjtyui
Now I need to change every row where the user is admin so that the user becomes John, and create new duplicate entries for the usernames jason, liz, and sally in that dept.
There are 463 depts, so doing this manually will take a while.
Something like this?
WITH CTE AS (
    SELECT DISTINCT
           YT.dept,
           V.username
    FROM dbo.YourTable YT
         -- one VALUES row per new username, exposed as column V.username
         CROSS APPLY (VALUES ('jason'), ('liz'), ('sally')) V (username)
    WHERE YT.username = 'admin'
)
INSERT INTO dbo.YourTable (dept, username)
SELECT C.dept,
       C.username
FROM CTE C;

UPDATE dbo.YourTable
SET username = 'John'
WHERE username = 'admin';
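The insert-then-update order can be tried out on a small copy of the data. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 rather than SQL Server, so CROSS APPLY (VALUES ...) becomes a CROSS JOIN against a VALUES CTE named new_users (a hypothetical name), COLLATE NOCASE imitates a case-insensitive collation so that both admin and Admin match, and the table is cut down to username and dept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, username TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO YourTable (username, dept) VALUES (?, ?)",
    [("admin", "a"), ("George", "a"), ("Admin", "b"), ("Mike", "b")],
)

# Step 1: add one row per (department that has an admin, new username) pair.
conn.execute("""
    WITH new_users(username) AS (VALUES ('jason'), ('liz'), ('sally'))
    INSERT INTO YourTable (dept, username)
    SELECT DISTINCT yt.dept, nu.username
    FROM YourTable AS yt
    CROSS JOIN new_users AS nu
    WHERE yt.username = 'admin' COLLATE NOCASE
""")

# Step 2: only now rename the admin rows, so step 1 could still find them.
conn.execute("UPDATE YourTable SET username = 'John' WHERE username = 'admin' COLLATE NOCASE")

rows = conn.execute("SELECT dept, username FROM YourTable ORDER BY dept, id").fetchall()
print(rows)
```

Running the UPDATE first would leave no admin rows for the INSERT to key on, which is why the two statements appear in this order.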

How to select data in SQL based on a filter which changes if there is no data in a specific table column?

I have tables similar to the three below. I need to join the first two tables based on id, and then join the third table based on second name. However the last table needs a filter where the city should be equal to London unless age is empty in which case the city should equal Manchester.
I tried the code below using a CASE expression, but it is not working. I am new to SQL, so I was not sure how to combine a WHERE clause with an if-style condition where the filter changes depending on whether there is data in a column other than the one being filtered. I am using Oracle, through Toad for Oracle.
FIRST.NAME.TABLE
ID FIRST_NAME ENTRY_DATE
1 JOHN 09/09/2019
2 NICOLA 09/09/2019
3 PATRICK 05/09/2019
4 JOAN 01/09/2019
5 JAKE 09/09/2019
6 AMELIA 01/09/2019
7 CAMERON 09/09/2019
SECOND.NAME.TABLE
ID SECOND_NAME ENTRY_DATE
1 BROWN 09/09/2019
2 SMITH 09/09/2019
3 COLE 05/09/2019
4 HOUSTON 01/09/2019
5 FARRIS 09/09/2019
6 HATHAWAY 01/09/2019
7 JONES 09/09/2019
CITY.AGE.TABLE
CITY SECOND_NAME AGE
LONDON BROWN 24.00
LONDON SMITH
MANCHESTER COLE 30.00
MANCHESTER HOUSTON 66.00
LONDON FARRIS
LONDON HATHAWAY 32.00
GLASGOW JONES 28.00
MANCHESTER SMITH 32.00
LONDON FARRIS 62.00
SELECT FN.ID,
FN.FIRST_NAME,
SN.SECOND_NAME,
AC.CITY,
AC.AGE
FROM FIRST.NAME.TABLE AS FN
INNER JOIN SECOND.NAME.TABLE SN
ON FN.ID=SN.ID
INNER JOIN CITY.AGE.TABLE AS CA
ON SN.SECOND NAME=AC.SECOND_NAME
WHERE FN.ENTRY_DATE='09-SEP-19'
AND SN.ENTRY_DATE='09-SEP-19'
AND (CASE WHEN AC.CITY='LONDON' AND AC.AGE IS NOT NULL
THEN AC.CITY='LONDON'
ELSE AS.CITY='MANCHESTER' END)
You can express this as boolean logic:
WHERE FN.ENTRY_DATE = DATE '2019-09-09' AND
SN.ENTRY_DATE = DATE '2019-09-09' AND
(AC.AGE IS NOT NULL AND AC.CITY = 'LONDON' OR
AC.AGE IS NULL AND AC.CITY = 'MANCHESTER'
)
This answers your question about how to implement the logic using SQL. However, I'm not sure that is the logic that you really want. I speculate that you really want a LEFT JOIN to the age table.
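To see what this predicate actually keeps, here is a minimal check against the sample CITY.AGE.TABLE rows only. It assumes the blank AGE cells are NULLs, uses SQLite through Python's sqlite3 instead of Oracle, and names the table city_age for convenience; the joins to the name tables are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_age (city TEXT, second_name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO city_age VALUES (?, ?, ?)",
    [
        ("LONDON", "BROWN", 24.0),
        ("LONDON", "SMITH", None),       # blank age treated as NULL
        ("MANCHESTER", "COLE", 30.0),
        ("MANCHESTER", "HOUSTON", 66.0),
        ("LONDON", "FARRIS", None),      # blank age treated as NULL
        ("LONDON", "HATHAWAY", 32.0),
        ("GLASGOW", "JONES", 28.0),
        ("MANCHESTER", "SMITH", 32.0),
        ("LONDON", "FARRIS", 62.0),
    ],
)

rows = conn.execute("""
    SELECT city, second_name, age
    FROM city_age
    WHERE (age IS NOT NULL AND city = 'LONDON')
       OR (age IS NULL AND city = 'MANCHESTER')
    ORDER BY second_name
""").fetchall()

print(rows)
```

On this data SMITH disappears entirely: the LONDON row has no age and the MANCHESTER row has one, so neither branch matches. That is consistent with the doubt raised above about whether a per-row filter is the intended logic.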

How to write a query to identify names with similar sounds?

How do I write a query to identify names (possibly including non-English names) that sound similar? Soundex does not seem to handle non-English names well.
The code should be able to identify that, for example, the following (or most of them) are names with similar sounds:
Helena - Elena
Violet - Viola
Beatrix - Beatrice
Madeline - Madeleine (ma-duh-LINE vs ma-duh-LEN)
Alice - Elise
Madeline - Adeline
Kristen - Kirsten
Lily - Millie
Charlotte - Scarlett
Zara / Lara / Sara / Mara
Elena - Alana
Emily - Emmeline
Amelia - Amalia
Stella - Bella - Ella
Isabel - Isabeau
Holly - Hallie
Laura - Lara
Fiona - Finola
Louise - Eloise
Cara - Clara
Susanna vs Susannah
Nora vs Norah
Talia vs Tahlia vs Thalia
Catherine vs Katherine
Cecilia vs Cecelia
Lucy vs Lucie
Vivian vs Vivien
Lillian vs Lilian
Gwendolen vs Gwendolyn
Sofia vs Sophia
Isabel vs Isobel vs Isabelle
Seraphina vs Serafina
Juliet vs Juliette
Annabel vs Annabelle
Emily vs Emilie
Elisabeth vs Elizabeth
...and non-English names too.
Would it help to use an algorithm such as Levenshtein distance to compare the similarity between two sequences?
https://en.wikipedia.org/wiki/Levenshtein_distance
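For readers unfamiliar with the measure, a minimal Python sketch of the classic dynamic-programming Levenshtein distance follows. The function name levenshtein is illustrative; the comparison is case-sensitive and counts insertions, deletions, and substitutions as one edit each, which is an assumption about the variant wanted.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


for pair in [("Amelia", "Amalia"), ("Kristen", "Kirsten"), ("Charlotte", "Scarlett")]:
    print(pair, levenshtein(*pair))
```

Edit distance measures spelling similarity rather than sound, so pairs such as Charlotte and Scarlett score as fairly distant even though they sound alike.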
Particularly in Oracle, you can use utl_match.
For example:
--Find closest names based on UTL_MATCH.EDIT_DISTANCE.
with names as
(
--Names data.
select column_value name
from table(sys.odcivarchar2list('Adeline','Alana','Alice','Amalia','Amelia','Annabel',
'Annabelle','Beatrice','Beatrix','Bella','Cara','Catherine','Cecelia','Cecilia',
'Charlotte','Clara','Elena','Elisabeth','Elise','Elizabeth','Ella','Eloise','Emilie',
'Emily','Emmeline','Finola','Fiona','Gwendolen','Gwendolyn','Hallie','Helena','Holly',
'Isabeau','Isabel','Isabelle','Isobel','Juliet','Juliette','Katherine','Kirsten',
'Kristen','Lara','Laura','Lilian','Lillian','Lily','Louise','Lucie','Lucy',
'Madeleine','Madeline','Mara','Millie','Nora','Norah','Sara','Scarlett','Serafina',
'Seraphina','Sofia','Sophia','Stella','Susanna','Susannah','Tahlia','Talia','Thalia',
'Viola','Violet','Vivian','Vivien','Zara'))
)
--Name with the closest matches.
select name1, edit_distance, listagg(name2, ',') within group (order by name2) names
from
(
--Compare strings.
select names1.name name1, names2.name name2
,utl_match.edit_distance(names1.name, names2.name) edit_distance
,min(utl_match.edit_distance(names1.name, names2.name))
over (partition by names1.name) min_edit_distance
from names names1
cross join names names2
--This cross join could get expensive. It may help to add conditions here to
--filter out obvious non-matches. For example, maybe throw out rows where the
--string length is vastly different?
where names1.name <> names2.name
order by 1, 3, 2
)
where edit_distance = min_edit_distance
group by name1, edit_distance
order by 1;
Results:
NAME1 EDIT_DISTANCE NAMES
----- ------------- -----
Adeline 2 Madeline
Alana 2 Clara,Elena
Alice 2 Elise
Amalia 1 Amelia
Amelia 1 Amalia
Annabel 2 Annabelle
Annabelle 2 Annabel
Beatrice 2 Beatrix
Beatrix 2 Beatrice
Bella 2 Ella,Stella
Cara 1 Clara,Lara,Mara,Sara,Zara
Catherine 1 Katherine
Cecelia 1 Cecilia
Cecilia 1 Cecelia
Charlotte 4 Scarlett
Clara 1 Cara
Elena 2 Alana,Ella,Helena
Elisabeth 1 Elizabeth
Elise 1 Eloise
Elizabeth 1 Elisabeth
Ella 2 Bella,Elena
Eloise 1 Elise
Emilie 2 Emily
Emily 2 Emilie,Lily
Emmeline 3 Adeline,Emilie,Madeline
Finola 2 Fiona,Viola
Fiona 2 Finola,Viola
Gwendolen 1 Gwendolyn
Gwendolyn 1 Gwendolen
Hallie 2 Millie
Helena 2 Elena
Holly 3 Bella,Ella,Emily,Hallie,Lily
Isabeau 2 Isabel
Isabel 1 Isobel
Isabelle 2 Isabel
Isobel 1 Isabel
Juliet 2 Juliette
Juliette 2 Juliet
Katherine 1 Catherine
Kirsten 2 Kristen
Kristen 2 Kirsten
Lara 1 Cara,Laura,Mara,Sara,Zara
Laura 1 Lara
Lilian 1 Lillian
Lillian 1 Lilian
Lily 2 Emily,Lucy
Louise 3 Elise,Eloise,Lucie
Lucie 2 Lucy
Lucy 2 Lily,Lucie
Madeleine 1 Madeline
Madeline 1 Madeleine
Mara 1 Cara,Lara,Sara,Zara
Millie 2 Hallie
Nora 1 Norah
Norah 1 Nora
Sara 1 Cara,Lara,Mara,Zara
Scarlett 4 Charlotte
Serafina 2 Seraphina
Seraphina 2 Serafina
Sofia 2 Sophia
Sophia 2 Sofia
Stella 2 Bella
Susanna 1 Susannah
Susannah 1 Susanna
Tahlia 1 Talia
Talia 1 Tahlia,Thalia
Thalia 1 Talia
Viola 2 Finola,Fiona,Violet
Violet 2 Viola
Vivian 1 Vivien
Vivien 1 Vivian
Zara 1 Cara,Lara,Mara,Sara

SQL: Group By on Consecutive Records Part 2

Problem 1
I have the following table which shows the location of a person at 1 hour intervals
Id EntityID EntityName LocationID Timex delta
1 1 Mickey Club house 0300 1
2 1 Mickey Club house 0400 1
3 1 Mickey Park 0500 2
4 1 Mickey Minnies Boutique 0600 3
5 1 Mickey Minnies Boutique 0700 3
6 1 Mickey Club house 0800 4
7 1 Mickey Club house 0900 4
8 1 Mickey Park 1000 5
9 1 Mickey Club house 1100 6
The delta increments by +1 every time the location changes.
I would like to return an aggregate grouped by delta as per example below.
EntityName LocationID StartTime EndTime
Mickey Club house 0300 0500
Mickey Park 0500 0600
Mickey Minnies Boutique 0600 0800
Mickey Club house 0800 1000
Mickey Park 1000 1100
Mickey Club house 1100 1200
I am using the following query which was taken and adapted from here
SQL: Group By on Consecutive Records
(which works fine):
select
min(timex) as start_date
,end_date
,entityid
,entityname
,locationid
,delta
from
(
select
s1.timex
,(
select
max(timex)
from
[locationreport2] s2
where
s2.entityid = s1.entityid
and s2.delta = s1.delta
and not exists
(
select
null
from
[dbo].[locationreport2] s3
where
s3.timex < s2.timex
and s3.timex > s1.timex
and s3.entityid <> s1.entityid
and s3.entityname <> s1.entityname
and s3.delta <> s1.delta
)
) as end_date
, s1.entityid
, s1.entityname
, s1.locationid
, s1.delta
from
[dbo].[locationreport2] s1
) Result
group by
end_date
, entityid
, entityname
, locationid
, delta
order by
1 asc
However I would like to not use the delta (it takes effort to calculate and populate it); instead I am wondering if there is any way to calculate it as part / whilst running the query.
I wouldn’t mind using a view either.
 
Problem 2
I have the following table which shows the location of different people at 1 hour intervals
Id EntityID EntityName LocationID Timex Delta
1 1 Mickey Club house 0900 1
2 1 Mickey Club house 1000 1
3 1 Mickey Park 1100 2
4 2 Donald Club house 0900 1
5 2 Donald Park 1000 2
6 2 Donald Park 1100 2
7 3 Goofy Park 0900 1
8 3 Goofy Club house 1000 2
9 3 Goofy Park 1100 3
I would like to return an aggregate grouped by person and location.
For example
EntityID EntityName LocationID StartTime EndTime
1 Mickey Club house 0900 1100
1 Mickey Park 1100 1200
2 Donald Club house 0900 1000
2 Donald Park 1000 1200
3 Goofy Park 0900 1000
3 Goofy Club house 1000 1100
3 Goofy Park 1100 1200
What modifications do I need to the above query (Problem 1)?
Sounds like a case for analytic functions. You would need to add 1 hour to the EndTime, but that will be dependent on the data type.
SELECT EntityName, LocationID, StartTime, EndTime
FROM ( SELECT EntityName, LocationID
,MIN(Timex) OVER (PARTITION BY EntityID, delta ORDER BY delta) AS StartTime
,MAX(Timex) OVER (PARTITION BY EntityID, delta ORDER BY delta) AS EndTime
FROM locationreport2
) x
GROUP BY EntityName, LocationID, StartTime, EndTime
ORDER BY EntityName, StartTime
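The question also asks whether the stored delta can be avoided. One common technique for that, which is not part of the answer above, is the difference of two row numbers ("gaps and islands"): consecutive rows at the same location share the same difference. The sketch below assumes SQLite through Python's sqlite3 and Timex stored as zero-padded text; like the query above, it leaves adding one hour to EndTime to the reader because that depends on the data type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE locationreport2 (
        Id INTEGER, EntityID INTEGER, EntityName TEXT, LocationID TEXT, Timex TEXT
    )
""")
conn.executemany(
    "INSERT INTO locationreport2 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Mickey", "Club house", "0300"),
        (2, 1, "Mickey", "Club house", "0400"),
        (3, 1, "Mickey", "Park", "0500"),
        (4, 1, "Mickey", "Minnies Boutique", "0600"),
        (5, 1, "Mickey", "Minnies Boutique", "0700"),
        (6, 1, "Mickey", "Club house", "0800"),
        (7, 1, "Mickey", "Club house", "0900"),
        (8, 1, "Mickey", "Park", "1000"),
        (9, 1, "Mickey", "Club house", "1100"),
    ],
)

rows = conn.execute("""
    WITH numbered AS (
        SELECT EntityID, EntityName, LocationID, Timex,
               -- constant within each unbroken run at one location
               ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY Timex)
             - ROW_NUMBER() OVER (PARTITION BY EntityID, LocationID ORDER BY Timex) AS island
        FROM locationreport2
    )
    SELECT EntityName, LocationID, MIN(Timex) AS StartTime, MAX(Timex) AS EndTime
    FROM numbered
    GROUP BY EntityID, EntityName, LocationID, island
    ORDER BY EntityID, StartTime
""").fetchall()

for row in rows:
    print(row)
```

Because both row numbers are partitioned by EntityID, the same query should cover the multi-person table from Problem 2 without changes.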

SQL Server Extract overlapping date ranges (return dates that cross other dates)

How would I go about extracting the overlapping dates from the following table?
ID Name StartDate EndDate Type
==============================================================
1 John Smith 01/01/2014 31/01/2014 A
2 John Smith 20/01/2014 20/02/2014 B
3 John Smith 01/03/2014 28/03/2014 A
4 John Smith 18/03/2014 24/03/2014 B
5 John Smith 01/07/2014 31/07/2014 A
6 John Smith 15/07/2014 31/07/2014 B
7 John Smith 25/07/2014 25/08/2014 C
Based on the first example for John Smith, the dates 01/01/2014 to 31/01/2014 overlap with 20/01/2014 to 20/02/2014, so I am expecting just overlapping period back which is 20/01/2014 to 31/01/2014.
The final result would be:
ID Name StartDate EndDate
==================================================
8 John Smith 20/01/2014 31/01/2014
9 John Smith 18/03/2014 24/03/2014
10 John Smith 15/07/2014 31/07/2014
11 John Smith 25/07/2014 31/07/2014
HELP REQUIRED 10 August 2014
In addition to the above request, I am looking for help or guidance on how to get the following results which should include the dates that overlap and the dates that don't. The ID column is irrelevant.
ID Name StartDate EndDate Type
==================================================
1 John Smith 01/01/2014 19/01/2014 A
8 John Smith 20/01/2014 31/01/2014 AB
2 John Smith 01/02/2014 20/02/2014 B
3 John Smith 01/03/2014 17/03/2014 A
9 John Smith 18/03/2014 24/03/2014 AB
3 John Smith 25/03/2014 28/03/2014 A
5 John Smith 01/07/2014 14/07/2014 A
10 John Smith 15/07/2014 31/07/2014 AB
11 John Smith 25/07/2014 31/07/2014 ABC
7 John Smith 01/08/2014 25/08/2014 C
Although the following image is not an exact reflection of the above, for illustration purposes, I am interested in seeing the dates that overlap (red) and the dates that don't (sky blue) in the same result set.
http://imgur.com/SeR9sY1
If you want just overlapping periods, you can get this with a self join. Do note that the results might be redundant if more than two periods overlap on certain dates.
select ft.name,
       -- the overlap starts at the later of the two start dates
       (case when max(ft.startdate) > max(ft2.startdate) then max(ft.startdate)
             else max(ft2.startdate)
        end) as startdate,
       -- and ends at the earlier of the two end dates
       (case when min(ft.enddate) < min(ft2.enddate) then min(ft.enddate)
             else min(ft2.enddate)
        end) as enddate
from followingtable ft join
     followingtable ft2
     on ft.name = ft2.name and
        ft.id < ft2.id and
        ft.startdate <= ft2.enddate and
        ft.enddate > ft2.startdate
group by ft.name, ft.id, ft2.id;
This doesn't assign the ids. You can do that with row_number() and an offset.
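A minimal sketch of the row_number() plus offset idea, under several assumptions: it runs on SQLite through Python's sqlite3 rather than SQL Server, stores dates as ISO text, names the table periods, treats both range ends as inclusive, and uses DISTINCT to drop the redundant overlap that appears when three periods intersect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE periods (ID INTEGER, Name TEXT, StartDate TEXT, EndDate TEXT, Type TEXT)"
)
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Smith", "2014-01-01", "2014-01-31", "A"),
        (2, "John Smith", "2014-01-20", "2014-02-20", "B"),
        (3, "John Smith", "2014-03-01", "2014-03-28", "A"),
        (4, "John Smith", "2014-03-18", "2014-03-24", "B"),
        (5, "John Smith", "2014-07-01", "2014-07-31", "A"),
        (6, "John Smith", "2014-07-15", "2014-07-31", "B"),
        (7, "John Smith", "2014-07-25", "2014-08-25", "C"),
    ],
)

rows = conn.execute("""
    WITH overlaps AS (
        SELECT DISTINCT
               a.Name,
               MAX(a.StartDate, b.StartDate) AS StartDate,  -- later start
               MIN(a.EndDate, b.EndDate)     AS EndDate     -- earlier end
        FROM periods AS a
        JOIN periods AS b
          ON a.Name = b.Name
         AND a.ID < b.ID
         AND a.StartDate <= b.EndDate
         AND a.EndDate >= b.StartDate
    )
    SELECT (SELECT MAX(ID) FROM periods)
           + ROW_NUMBER() OVER (ORDER BY StartDate, EndDate) AS ID,
           Name, StartDate, EndDate
    FROM overlaps
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

The offset here is simply the highest existing ID, so new ids continue from 8; in SQLite the two-argument MAX and MIN are scalar functions, whereas SQL Server needs the CASE expressions shown above (or an equivalent).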