SQL - Period range in subgroups of a group by

I have the following dataset:
A  B     C
======================
1  John  2018-08-14
1  John  2018-08-20
1  John  2018-09-03
2  John  2018-11-13
2  John  2018-12-11
2  John  2018-12-12
1  John  2020-01-20
1  John  2020-01-21
3  John  2021-03-02
3  John  2021-03-03
1  John  2020-05-10
1  John  2020-05-12
And I would like to have the following result:
A  B     C
======================
1  John  2018-08-14
2  John  2018-11-13
1  John  2020-01-20
3  John  2021-03-02
1  John  2020-05-10
If I group by A, B, the first and third rows of the desired result collapse into one, which is expected. How could I create another column so that I can still use a GROUP BY and get the result I want?
If you have a different idea from mine, please explain it!
I tried first, last, rank, and dense_rank without success.

Use lag(). In your data, B appears to be determined by A, so checking lag(A) is enough.
select A, B, C
from (
    select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
    from mytable
) t
where startFlag = 1
order by C
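To try the lag() approach without a database server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the helper name run_starts is made up for this illustration.

```python
import sqlite3

# Sample rows from the question; the table name `mytable` follows the answer.
ROWS = [
    (1, "John", "2018-08-14"), (1, "John", "2018-08-20"), (1, "John", "2018-09-03"),
    (2, "John", "2018-11-13"), (2, "John", "2018-12-11"), (2, "John", "2018-12-12"),
    (1, "John", "2020-01-20"), (1, "John", "2020-01-21"),
    (3, "John", "2021-03-02"), (3, "John", "2021-03-03"),
    (1, "John", "2020-05-10"), (1, "John", "2020-05-12"),
]


def run_starts(rows):
    """Return the first row of each consecutive run of A, ordered by C."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (A INTEGER, B TEXT, C TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT A, B, C
        FROM (
            -- startFlag is 1 when the previous row (by C) has a different A,
            -- or when there is no previous row (lag() returns NULL)
            SELECT A, B, C,
                   CASE WHEN LAG(A) OVER (ORDER BY C) = A THEN 0 ELSE 1 END AS startFlag
            FROM mytable
        ) t
        WHERE startFlag = 1
        ORDER BY C
        """
    ).fetchall()
    con.close()
    return result


starts = run_starts(ROWS)
for row in starts:
    print(row)
```

Because the window is ordered by C, the 2020-05 rows directly follow the 2020-01 rows and fall into the same run of A = 1, so on this sample the sketch returns four start rows.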

Related

SQL Db2 - How to unify two rows in one using datetime

I have a table that records employees and the branches where they have worked. Each row holds the employee's start date at that branch. It looks something like this:
Employee ID  Name      Branch  Start Date
==========================================
1            John Doe  234     2018-01-20
1            John Doe  300     2019-03-20
1            John Doe  250     2022-01-19
2            Jane Doe  200     2019-02-15
2            Jane Doe  234     2020-05-20
I need a query that looks at the next row and uses the start date at the next branch as the end date of the current one. E.g.:
Employee ID  Name      Branch  Start Date  End Date
====================================================
1            John Doe  234     2018-01-20  2019-03-20
1            John Doe  300     2019-03-20  2022-01-19
1            John Doe  250     2022-01-19  ---
2            Jane Doe  200     2019-02-15  2020-05-20
2            Jane Doe  234     2020-05-20  ---
When there is no later record, we assume that the employee is still working at that branch, so the end date can be left blank or set to a default value such as "9999-01-01".
Is there any way to achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row whose range contains a given date. For example, if I ask which branch John Doe worked at on 2020-12-01, the query should return the row for branch 300.
You can use LEAD() to peek at the next row, according to a subgroup and ordering within it.
For example:
select
    t.*,
    lead(start_date) over(partition by employee_id order by start_date) as end_date
from t
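As a rough illustration of both requests (deriving the end date, and looking up the branch for a given day), here is a sketch using Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the column names, the half-open interval convention (start inclusive, end exclusive), and the helper names open_db and branch_on are assumptions made for this example, and '9999-01-01' is the default the question suggests.

```python
import sqlite3

ROWS = [
    (1, "John Doe", 234, "2018-01-20"),
    (1, "John Doe", 300, "2019-03-20"),
    (1, "John Doe", 250, "2022-01-19"),
    (2, "Jane Doe", 200, "2019-02-15"),
    (2, "Jane Doe", 234, "2020-05-20"),
]

# LEAD() takes the next start_date of the same employee; COALESCE supplies
# the default end date when there is no later row.
PERIODS_SQL = """
    SELECT employee_id, name, branch, start_date,
           COALESCE(LEAD(start_date) OVER (PARTITION BY employee_id
                                           ORDER BY start_date),
                    '9999-01-01') AS end_date
    FROM t
"""


def open_db():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (employee_id INTEGER, name TEXT, branch INTEGER, start_date TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    return con


def branch_on(con, employee_id, day):
    """Return the branch whose [start_date, end_date) range contains `day`."""
    row = con.execute(
        f"SELECT branch FROM ({PERIODS_SQL}) p "
        "WHERE employee_id = ? AND start_date <= ? AND ? < end_date",
        (employee_id, day, day),
    ).fetchone()
    return row[0] if row else None


con = open_db()
periods = con.execute(PERIODS_SQL + " ORDER BY employee_id, start_date").fetchall()
john_branch = branch_on(con, 1, "2020-12-01")
print(periods)
print(john_branch)
```

ISO-formatted date strings compare correctly as text, which is why plain string comparison is enough in this sketch.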

How to join three slowly changing dimensions, sorted by all start date columns?

I'm trying to join data from three type 2 slowly changing dimensions (SCD). When I query the result, the rows are not ordered by date across the dimensions as expected.
I have the slowly changing dimensions below:
Table Subsidiaries
id  name         subsidiary  department   start_date_dep  end_date_dep  last_record_flg
=========================================================================================
1   John Doe     AL          Engineering  2005-10-01      2013-01-01    0
1   John Doe     AL          Sales        2013-01-01      2014-05-01    0
1   John Doe     NY          Sales        2014-05-01      NULL          1
38  Ivy Johnson  NY          Sales        2020-06-01      NULL          1
Table Functions
id  function  start_date_fun  end_date_fun  last_record_flg
=============================================================
1   operator  2005-10-01      2009-08-01    0
1   leader    2009-08-01      2011-10-01    0
1   manager   2011-10-01      2017-07-01    0
1   director  2017-07-01      NULL          1
38  operator  2020-06-01      NULL          1
Table Graduations
id  university_graduation  conclusion_date  last_record_flg
=============================================================
1   bachelor               15/12/2005       0
1   master                 15/12/2008       1
38  bachelor               15/12/2014       1
The desired result is:
id
name
subsidiary
department
start_date_dep
end_date_dep
last_record_flg
function
start_date_fun
end_date_fun
last_record_flg
university_graduation
conclusion_date
last_record_flg
max_date
seq
start
end
last_record_flg
1
John Doe
AL
Engineering
2005-10-01
2013-01-01
0
operator
2005-10-01
2009-08-01
0
bachelor
2005-12-15
0
2005-12-15
1
2005-10-01
2008-12-15
0
1
John Doe
AL
Engineering
2005-10-01
2013-01-01
0
operator
2005-10-01
2009-08-01
0
master
2008-12-15
1
2008-12-15
1
2008-12-15
2009-08-01
0
1
John Doe
AL
Engineering
2005-10-01
2013-01-01
0
leader
2009-08-01
2011-10-01
0
master
2008-12-15
1
2009-08-01
1
2009-08-01
2011-10-01
0
1
John Doe
AL
Engineering
2005-10-01
2013-01-01
0
manager
2011-10-01
2017-07-01
0
master
2008-12-15
1
2011-10-01
1
2011-10-01
2013-01-01
0
1
John Doe
AL
Sales
2013-01-01
2014-05-01
0
manager
2011-10-01
2017-07-01
0
master
2008-12-15
1
2013-01-01
1
2013-01-01
2014-05-01
0
1
John Doe
NY
Sales
2014-05-01
NULL
1
manager
2011-10-01
2017-07-01
0
master
2008-12-15
1
2014-05-01
1
2014-05-01
2017-07-01
0
1
John Doe
NY
Sales
2014-05-01
NULL
1
director
2017-07-01
NULL
1
master
2008-12-15
1
2017-07-01
1
2017-07-01
NULL
1
38
Ivy Johnson
NY
Sales
2020-06-01
NULL
1
operator
2020-06-01
NULL
1
bachelor
2014-12-15
1
2020-06-01
1
2020-06-01
NULL
1
I tried CROSS APPLY, but it returns only one row for each id. I'm now trying CASE WHEN, but the query output does not exactly match the desired result. In my output, the columns 'FUNCTION' and 'START_DATE_FUN' do not follow the order shown in the desired result, and the same happens for the columns 'UNIVERSITY_GRADUATION' and 'CONCLUSION_DATE'.
The query:
select
    *
from (
    select
        tb.*
        ,row_number() over(partition by tb.id,tb.max_date order by tb.max_date) as seq
        ,tb.max_date as [start]
        ,lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) as [end]
        ,case when lead( tb.max_date ) over( partition by tb.id order by tb.max_date ) is null then 1 else 0 end as last_record_flg
    from (
        select
            sb.id
            ,sb.[name]
            ,sb.subsidiary
            ,sb.department
            ,sb.start_date_dep
            ,sb.end_date_dep
            ,sb.last_record_flg as lr_sb
            ,fc.[function]
            ,fc.start_date_fun
            ,fc.end_date_fun
            ,fc.last_record_flg as lr_fc
            ,gd.university_graduation
            ,gd.end_date_grad
            ,gd.last_record_flg as lr_gd
            ,case
                when sb.start_date_dep >= fc.start_date_fun and sb.start_date_dep >= gd.end_date_grad then sb.start_date_dep
                when fc.start_date_fun >= sb.start_date_dep and fc.start_date_fun >= gd.end_date_grad then fc.start_date_fun
                else gd.end_date_grad
            end as max_date
        from
            #Subsidiaries as sb
            left outer join #Functions as fc
                on sb.id = fc.id
            left outer join #Graduations as gd
                on sb.id = gd.id
    ) as tb
) as tb2
where
    tb2.seq = 1
Below the DDL:
create table #Subsidiaries (
id int
,[name] varchar(15)
,subsidiary varchar(2)
,department varchar(15)
,start_date_dep date
,end_date_dep date
,last_record_flg bit
)
go
insert into #Subsidiaries values
(1,'John Doe','AL','Engineering','2005-10-01','2013-01-01',0),
(1,'John Doe','AL','Sales','2013-01-01','2014-05-01',0),
(1,'John Doe','NY','Sales','2014-05-01',null,1),
(38,'Ivy Johnson','NY','Sales','2020-06-01',null,1)
go
create table #Functions (
id int
,[function] varchar(15)
,start_date_fun date
,end_date_fun date
,last_record_flg bit
)
go
insert into #Functions values
(1,'operator','2005-10-01','2009-08-01',0),
(1,'leader','2009-08-01','2011-10-01',0),
(1,'manager','2011-10-01','2017-07-01',0),
(1,'director','2017-07-01',null,1),
(38,'operator','2020-06-01',null,1)
go
create table #Graduations (
id int
,university_graduation varchar(15)
,end_date_grad date
,last_record_flg bit
)
go
insert into #Graduations values
(1,'bachelor','2005-12-15',0),
(1,'master','2008-12-15',1),
(38,'bachelor','2014-12-15',1)
go
In case someone else runs into the same difficulty joining two or more type 2 SCDs: I found a reference at https://sqlsunday.com/2014/11/30/joining-two-scd2-tables/ (SQL Sunday) that helped me build the query. It uses the range intervals in the join condition to return the desired result.
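For readers who want to see what a range-interval join can look like, here is a minimal sketch for two of the three tables, run through Python's sqlite3 module. It is not the query from the linked article: the overlap condition, the far-future stand-in for NULL end dates, the output column names range_start and range_end, and the helper name join_scd2 are all assumptions made for this illustration.

```python
import sqlite3

FAR_FUTURE = "9999-12-31"  # assumed stand-in for an open-ended (NULL) end date

SUBSIDIARIES = [
    (1, "John Doe", "AL", "Engineering", "2005-10-01", "2013-01-01"),
    (1, "John Doe", "AL", "Sales", "2013-01-01", "2014-05-01"),
    (1, "John Doe", "NY", "Sales", "2014-05-01", None),
    (38, "Ivy Johnson", "NY", "Sales", "2020-06-01", None),
]
FUNCTIONS = [
    (1, "operator", "2005-10-01", "2009-08-01"),
    (1, "leader", "2009-08-01", "2011-10-01"),
    (1, "manager", "2011-10-01", "2017-07-01"),
    (1, "director", "2017-07-01", None),
    (38, "operator", "2020-06-01", None),
]


def join_scd2():
    """Join the two dimensions on id and on overlapping validity ranges."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Subsidiaries (id INTEGER, name TEXT, subsidiary TEXT, "
        "department TEXT, start_date_dep TEXT, end_date_dep TEXT)"
    )
    con.execute(
        'CREATE TABLE Functions (id INTEGER, "function" TEXT, '
        "start_date_fun TEXT, end_date_fun TEXT)"
    )
    con.executemany("INSERT INTO Subsidiaries VALUES (?, ?, ?, ?, ?, ?)", SUBSIDIARIES)
    con.executemany("INSERT INTO Functions VALUES (?, ?, ?, ?)", FUNCTIONS)
    rows = con.execute(
        """
        SELECT sb.id, sb.subsidiary, sb.department, fc."function",
               -- the combined range starts at the later start date ...
               MAX(sb.start_date_dep, fc.start_date_fun) AS range_start,
               -- ... and ends at the earlier end date (NULL if both are open)
               NULLIF(MIN(COALESCE(sb.end_date_dep, :far),
                          COALESCE(fc.end_date_fun, :far)), :far) AS range_end
        FROM Subsidiaries AS sb
        JOIN Functions AS fc
          ON sb.id = fc.id
         AND sb.start_date_dep < COALESCE(fc.end_date_fun, :far)
         AND fc.start_date_fun < COALESCE(sb.end_date_dep, :far)
        ORDER BY sb.id, range_start
        """,
        {"far": FAR_FUTURE},
    ).fetchall()
    con.close()
    return rows


joined = join_scd2()
for row in joined:
    print(row)
```

Each output row covers a stretch of time during which both the department row and the function row were valid. A third dimension could be joined the same way against the combined range, which is left out here to keep the sketch short.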

SQL Select statement and show only changed

I have a simple select script that generates the following audit table.
SELECT *
FROM Mytable
WHERE File = '123456A'
Output:
ID   File     StatusA  StatusB  User    UpdateDate
===================================================
1    123456A  A        0        Tom     2021-01-01
12   123456A  B        0        Jack    2021-01-05
19   123456A  A        1        Alicia  2021-02-09
56   123456A  B        1        Jason   2021-03-09
87   123456A  A        1        Jason   2021-03-10
107  123456A  B        0        Ellie   2021-03-26
203  123456A  A        0        lucy    2021-04-08
239  123456A  B        1        Ellie   2021-04-16
I am trying to retrieve only the rows where column StatusB has changed, so that the query generates a table like this:
SELECT *
FROM Mytable
WHERE File = '123456A'
-- AND StatusB is changed
ID   File     StatusA  StatusB  User    UpdateDate
===================================================
1    123456A  A        0        Tom     2021-01-01
19   123456A  A        1        Alicia  2021-02-09
107  123456A  B        0        Ellie   2021-03-26
239  123456A  B        1        Ellie   2021-04-16
In this case, I can see that Alicia and Ellie changed the column StatusB. I am still working out how to accomplish this.
Thanks,
-Ming
You can use lag():
select t.*
from (select t.*,
             lag(statusB) over (order by updatedate) as prev_statusB
      from Mytable t
      where File = '123456A'
     ) t
where prev_statusB is null or prev_statusB <> statusB;
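The query above can be checked against the sample audit rows with a small sketch in Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); File and User are quoted only as a precaution, and the helper name status_b_changes is made up for this example.

```python
import sqlite3

AUDIT_ROWS = [
    (1, "123456A", "A", 0, "Tom", "2021-01-01"),
    (12, "123456A", "B", 0, "Jack", "2021-01-05"),
    (19, "123456A", "A", 1, "Alicia", "2021-02-09"),
    (56, "123456A", "B", 1, "Jason", "2021-03-09"),
    (87, "123456A", "A", 1, "Jason", "2021-03-10"),
    (107, "123456A", "B", 0, "Ellie", "2021-03-26"),
    (203, "123456A", "A", 0, "lucy", "2021-04-08"),
    (239, "123456A", "B", 1, "Ellie", "2021-04-16"),
]


def status_b_changes(file_id):
    """Return (ID, User, StatusB) for rows where StatusB differs from the previous row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE Mytable (ID INTEGER, "File" TEXT, StatusA TEXT, '
        'StatusB INTEGER, "User" TEXT, UpdateDate TEXT)'
    )
    con.executemany("INSERT INTO Mytable VALUES (?, ?, ?, ?, ?, ?)", AUDIT_ROWS)
    rows = con.execute(
        """
        SELECT ID, "User", StatusB
        FROM (SELECT t.*,
                     LAG(StatusB) OVER (ORDER BY UpdateDate) AS prev_statusB
              FROM Mytable t
              WHERE "File" = ?
             ) t
        -- the first row has no predecessor, so prev_statusB is NULL there
        WHERE prev_statusB IS NULL OR prev_statusB <> StatusB
        ORDER BY UpdateDate
        """,
        (file_id,),
    ).fetchall()
    con.close()
    return rows


changes = status_b_changes("123456A")
print(changes)
```

On the sample data this keeps IDs 1, 19, 107, and 239, which matches the table the question asks for.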

MS Access SQL query - Count records until value is met

I have an Access query (qr1) that returns the following data:
dateField       stringField1  stringField2  booleanField
==========================================================
11/09/20 17:15  John          Nick          0
12/09/20 17:00  John          Mary          -1
13/09/20 17:30  Ann           John          0
13/09/20 19:30  Kate          Alan          0
19/09/20 19:30  Ann           Missy         0
20/09/20 17:15  Jim           George        0
20/09/20 19:30  John          Nick          0
27/09/20 15:00  John          Mary          -1
27/09/20 17:00  Ann           John          -1
27/09/20 19:30  Kate          Alan          0
28/09/20 18:30  Ann           Missy         -1
03/10/20 18:30  Jim           George        -1
04/10/20 15:00  John          Nick          0
04/10/20 17:15  John          Mary          0
04/10/20 20:45  Ann           John          0
05/10/20 18:30  Kate          Alan          0
17/10/20 15:00  Jim           George        0
17/10/20 17:15  John          Nick          0
18/10/20 15:00  John          Mary          -1
18/10/20 17:15  Ann           John          0
Notes:
The string data may or may not be repetitive.
The dates are stored as strings. I use a function to convert them to dates.
Public Function STR2TIME(sTime As String) As Date
Dim arr() As String
sTime = Replace(sTime, ".", "/")
arr = Split(sTime, " ")
STR2TIME = DateValue(Format(arr(0), "dd/mm/yyyy")) + TimeValue(arr(1))
End Function
qr1 is ORDERED BY STR2TIME(dateField) ASC
Now I need to run an extra query that adds an extra column which:
counts records until a yes (-1) appears in booleanField
after that, starts counting again from 1
In this case the output should look like this:
dateField       stringField1  stringField2  booleanField  countField
======================================================================
11/09/20 17:15  John          Nick          0             1
12/09/20 17:00  John          Mary          -1            2
13/09/20 17:30  Ann           John          0             1
13/09/20 19:30  Kate          Alan          0             2
19/09/20 19:30  Ann           Missy         0             3
20/09/20 17:15  Jim           George        0             4
20/09/20 19:30  John          Nick          0             5
27/09/20 15:00  John          Mary          -1            6
27/09/20 17:00  Ann           John          -1            1
27/09/20 19:30  Kate          Alan          0             1
28/09/20 18:30  Ann           Missy         -1            2
03/10/20 18:30  Jim           George        -1            1
04/10/20 15:00  John          Nick          0             1
04/10/20 17:15  John          Mary          0             2
04/10/20 20:45  Ann           John          0             3
05/10/20 18:30  Kate          Alan          0             4
17/10/20 15:00  Jim           George        0             5
17/10/20 17:15  John          Nick          0             6
18/10/20 15:00  John          Mary          -1            7
18/10/20 17:15  Ann           John          0             1
Problem
I have tried many things, all giving wrong numeric results.
Finally, I thought that counting the zeros from the previous yes date (the latest one earlier than the current date) up to the current date would do the trick:
SELECT t.*, (SELECT COUNT(*)
FROM qr1 tt
WHERE booleanField = 0
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 dateField
FROM qr1
WHERE booleanField = -1
AND STR2TIME(dateField) < STR2TIME(t.dateField)
ORDER BY STR2TIME(dateField) DESC
)
AND STR2TIME(tt.dateField) <= STR2TIME(t.dateField)
) AS CountMatches
FROM qr1 t;
but still gives me wrong numeric results on countField:
countField
0
0
1
2
3
4
5
6
1
2
3
1
12
13
14
15
16
17
18
13
What am I doing wrong? I can't get it. How to get the desired result?
EDIT:
I'm posting the final code, based on @Gordon Linoff's and @Gustav's answers, slightly simplified.
Explanation of changes:
I got rid of the conversion function in this step. Instead of converting 7 times for every single record, I convert only once in the first query, so the values here are ready to compare.
I omitted the check for zeros, as it was not necessary.
I added the NZ function to supply a value when the inner subquery returns NULL, that is, when there is no yes with an earlier date to count from (usually the first records).
The only problem left was that with NZ I got values 1 less than what I needed, so I subtracted 1 from dateField to count one more.
Here is the code:
SELECT t.*, (SELECT COUNT(*) FROM qr2 tt
WHERE tt.dateField <= t.dateField
AND tt.dateField > NZ((SELECT TOP 1 dateField FROM qr2
WHERE booleanField = True
AND dateField < t.dateField
ORDER BY dateField DESC
), tt.dateField - 1)
) AS CountMatches
FROM qr2 AS t;
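The counting idea in the final query can also be tried outside Access. The sketch below is an approximation in Python's sqlite3 module, not the Access query itself: COALESCE stands in for NZ, MAX(...) stands in for TOP 1 ... ORDER BY ... DESC, the dates are rewritten as ISO strings so that text comparison is chronological, and the helper name count_since_last_yes is made up. It also assumes that no two rows share the same timestamp.

```python
import sqlite3

# (dateField, booleanField) pairs from the question, as ISO timestamps.
ROWS = [
    ("2020-09-11 17:15", 0), ("2020-09-12 17:00", -1), ("2020-09-13 17:30", 0),
    ("2020-09-13 19:30", 0), ("2020-09-19 19:30", 0), ("2020-09-20 17:15", 0),
    ("2020-09-20 19:30", 0), ("2020-09-27 15:00", -1), ("2020-09-27 17:00", -1),
    ("2020-09-27 19:30", 0), ("2020-09-28 18:30", -1), ("2020-10-03 18:30", -1),
    ("2020-10-04 15:00", 0), ("2020-10-04 17:15", 0), ("2020-10-04 20:45", 0),
    ("2020-10-05 18:30", 0), ("2020-10-17 15:00", 0), ("2020-10-17 17:15", 0),
    ("2020-10-18 15:00", -1), ("2020-10-18 17:15", 0),
]


def count_since_last_yes(rows):
    """For each row, count the rows after the previous yes up to and including it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE qr2 (dateField TEXT, booleanField INTEGER)")
    con.executemany("INSERT INTO qr2 VALUES (?, ?)", rows)
    counts = con.execute(
        """
        SELECT (SELECT COUNT(*)
                FROM qr2 AS tt
                WHERE tt.dateField <= t.dateField
                  -- latest yes strictly before the current row; '' sorts before
                  -- every timestamp, so all earlier rows count when there is none
                  AND tt.dateField > COALESCE(
                        (SELECT MAX(p.dateField)
                         FROM qr2 AS p
                         WHERE p.booleanField <> 0
                           AND p.dateField < t.dateField),
                        '')
               ) AS countField
        FROM qr2 AS t
        ORDER BY t.dateField
        """
    ).fetchall()
    con.close()
    return [c for (c,) in counts]


count_field = count_since_last_yes(ROWS)
print(count_field)
```

On the sample rows this reproduces the countField column from the desired output in the question.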
This is doable, though a little convoluted:
Select
qr1.dateField,
qr1.stringField1,
qr1.stringField2,
qr1.booleanField,
(Select Count(*) From qr1 As t1
Where
(t1.booleanfield = true And t1.dateField = qr1.dateField)
Or
(t1.booleanfield = false And t1.dateField <= qr1.dateField And
t1.dateField >= Nz(
(Select Top 1 dateField From qr1 As t
Where t.dateField < qr1.dateField And t.booleanField = True
Order By t.dateField Desc),
t1.dateField ))) As countField
From
qr1;
Your string converter can be replaced by this expression:
TrueDate = CDate(Replace(TextDotDate, ".", "/"))
You should apply this at a much earlier stage, such as in qr1.
One obvious problem is:
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 dateField
This should be:
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 STR2TIME(dateField)
Second, you have:
WHERE booleanField = 0
But I don't think this filter is appropriate.

SQL Server Extract overlapping date ranges (return dates that cross other dates)

How would I go about extracting the overlapping dates from the following table?
ID  Name        StartDate   EndDate     Type
==============================================================
1   John Smith  01/01/2014  31/01/2014  A
2   John Smith  20/01/2014  20/02/2014  B
3   John Smith  01/03/2014  28/03/2014  A
4   John Smith  18/03/2014  24/03/2014  B
5   John Smith  01/07/2014  31/07/2014  A
6   John Smith  15/07/2014  31/07/2014  B
7   John Smith  25/07/2014  25/08/2014  C
Based on the first example for John Smith, the dates 01/01/2014 to 31/01/2014 overlap with 20/01/2014 to 20/02/2014, so I expect to get back just the overlapping period, which is 20/01/2014 to 31/01/2014.
The final result would be:
ID  Name        StartDate   EndDate
==================================================
8   John Smith  20/01/2014  31/01/2014
9   John Smith  18/03/2014  24/03/2014
10  John Smith  15/07/2014  31/07/2014
11  John Smith  25/07/2014  31/07/2014
HELP REQUIRED 10 August 2014
In addition to the above request, I am looking for help or guidance on how to get the following results which should include the dates that overlap and the dates that don't. The ID column is irrelevant.
ID  Name        StartDate   EndDate     Type
==================================================
1   John Smith  01/01/2014  19/01/2014  A
8   John Smith  20/01/2014  31/01/2014  AB
2   John Smith  01/02/2014  20/02/2014  B
3   John Smith  01/03/2014  17/03/2014  A
9   John Smith  18/03/2014  24/03/2014  AB
3   John Smith  25/03/2014  28/03/2014  A
5   John Smith  01/07/2014  14/07/2014  A
10  John Smith  15/07/2014  31/07/2014  AB
11  John Smith  25/07/2014  31/07/2014  ABC
7   John Smith  01/08/2014  25/08/2014  C
Although the following image is not an exact reflection of the above, for illustration purposes, I am interested in seeing the dates that overlap (red) and the dates that don't (sky blue) in the same result set.
http://imgur.com/SeR9sY1
If you want just overlapping periods, you can get this with a self join. Do note that the results might be redundant if more than two periods overlap on certain dates.
select ft.name,
       -- the overlap starts at the later of the two start dates
       (case when max(ft.startdate) > max(ft2.startdate) then max(ft.startdate)
             else max(ft2.startdate)
        end) as startdate,
       -- and ends at the earlier of the two end dates
       (case when min(ft.enddate) < min(ft2.enddate) then min(ft.enddate)
             else min(ft2.enddate)
        end) as enddate
from followingtable ft join
     followingtable ft2
     on ft.name = ft2.name and
        ft.id < ft2.id and
        ft.startdate <= ft2.enddate and
        ft.enddate > ft2.startdate
group by ft.name, ft.id, ft2.id;
This doesn't assign the ids. You can do that with row_number() and an offset.
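As a quick check of the self-join idea on the sample data, here is a sketch in Python's sqlite3 module. It is a variation rather than the exact query above: it uses SQLite's two-argument MAX/MIN in place of the CASE expressions, rewrites the dates as ISO strings, treats both range ends as inclusive, and adds DISTINCT to drop the redundant rows mentioned above. The helper name overlapping_periods is made up for this example.

```python
import sqlite3

PERIODS = [
    (1, "John Smith", "2014-01-01", "2014-01-31", "A"),
    (2, "John Smith", "2014-01-20", "2014-02-20", "B"),
    (3, "John Smith", "2014-03-01", "2014-03-28", "A"),
    (4, "John Smith", "2014-03-18", "2014-03-24", "B"),
    (5, "John Smith", "2014-07-01", "2014-07-31", "A"),
    (6, "John Smith", "2014-07-15", "2014-07-31", "B"),
    (7, "John Smith", "2014-07-25", "2014-08-25", "C"),
]


def overlapping_periods(periods):
    """Return distinct (name, start, end) overlaps between pairs of periods."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE followingtable "
        "(id INTEGER, name TEXT, startdate TEXT, enddate TEXT, type TEXT)"
    )
    con.executemany("INSERT INTO followingtable VALUES (?, ?, ?, ?, ?)", periods)
    rows = con.execute(
        """
        SELECT DISTINCT ft.name,
               MAX(ft.startdate, ft2.startdate) AS overlap_start,  -- later start
               MIN(ft.enddate, ft2.enddate)     AS overlap_end     -- earlier end
        FROM followingtable ft
        JOIN followingtable ft2
          ON ft.name = ft2.name
         AND ft.id < ft2.id                  -- each pair only once
         AND ft.startdate <= ft2.enddate     -- the two ranges intersect
         AND ft2.startdate <= ft.enddate
        ORDER BY 2, 3
        """
    ).fetchall()
    con.close()
    return rows


overlaps = overlapping_periods(PERIODS)
for row in overlaps:
    print(row)
```

Without DISTINCT, the period 25/07/2014 to 31/07/2014 would appear twice, once for the pair (5, 7) and once for the pair (6, 7).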