Loop and calculate month difference based on criteria - sql

I've got a dataset like the one below and I'm not really sure where to start. I'm using Aginity Workbench for Netezza. Whenever there is an interaction, I want to see whether there is a conversion within 3 months. It needs to scale across multiple customers.
Date        Customer    Interaction  Conversion
1/01/2017   John Smith  1            0
1/02/2017   John Smith               0
1/03/2017   John Smith               0
1/04/2017   John Smith               0
1/05/2017   John Smith               0
1/06/2017   John Smith  1            0
1/07/2017   John Smith  1            0
1/08/2017   John Smith               1
1/09/2017   John Smith               0
1/10/2017   John Smith               0
1/11/2017   John Smith               0
1/12/2017   John Smith               0
Ideally the output should look like the below, where the conversion is attributed once based on a three-month window of interactions. So if there are any interactions in subsequent months, the conversion is attributed to the first month of the 3-month window. It also needs to flag when an interaction and a conversion happen in the same month.
Date        Customer    Interaction  Conversion  3MonthConversion
1/01/2017   John Smith  1            0           0
1/02/2017   John Smith               0
1/03/2017   John Smith               0
1/04/2017   John Smith               0
1/05/2017   John Smith               0
1/06/2017   John Smith  1            0           1
1/07/2017   John Smith  1            0
1/08/2017   John Smith               1
1/09/2017   John Smith               0
1/10/2017   John Smith               0
1/11/2017   John Smith               0
1/12/2017   John Smith               0

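The query below should work: it flags an interaction when any of the next three rows for the same customer has a conversion. Partitioning by customer keeps LEAD() from reading another customer's rows. Please let me know if you face any issues.
select date, customer, interaction, conversion,
case when interaction=1 and (lead(conversion,1) over (partition by customer order by date))=1 then 1
when interaction=1 and (lead(conversion,2) over (partition by customer order by date))=1 then 1
when interaction=1 and (lead(conversion,3) over (partition by customer order by date))=1 then 1
else 0 end as threeMonthconversion
from test_month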
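A minimal sketch of the same LEAD() idea, run against an in-memory SQLite database from Python so it can be tried without Netezza. Assumptions: SQLite 3.25 or newer (for window functions), ISO-formatted dates so they sort correctly as text, one row per customer per month, and a second customer, Jane Roe, who is made up here only to show the effect of partitioning by customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test_month "
    "(date text, customer text, interaction integer, conversion integer)"
)

# John Smith's rows follow the question; blank interactions are stored as NULL.
john = [
    ("2017-01-01", 1, 0), ("2017-02-01", None, 0), ("2017-03-01", None, 0),
    ("2017-04-01", None, 0), ("2017-05-01", None, 0), ("2017-06-01", 1, 0),
    ("2017-07-01", 1, 0), ("2017-08-01", None, 1), ("2017-09-01", None, 0),
    ("2017-10-01", None, 0), ("2017-11-01", None, 0), ("2017-12-01", None, 0),
]
# Jane Roe is hypothetical (not in the question); she only exercises the partition.
jane = [("2017-10-01", 1, 0), ("2017-11-01", None, 1)]

rows = [(d, "John Smith", i, c) for d, i, c in john]
rows += [(d, "Jane Roe", i, c) for d, i, c in jane]
conn.executemany("insert into test_month values (?, ?, ?, ?)", rows)

# Flag an interaction when one of the next three rows for the same customer converts.
query = """
select customer, date
from (
    select date, customer,
           case when interaction = 1 and (
                    lead(conversion, 1) over (partition by customer order by date) = 1
                 or lead(conversion, 2) over (partition by customer order by date) = 1
                 or lead(conversion, 3) over (partition by customer order by date) = 1
                ) then 1 else 0 end as three_month_conversion
    from test_month
)
where three_month_conversion = 1
order by customer, date
"""
flagged = conn.execute(query).fetchall()
print(flagged)
```

On this data both the June and the July interaction for John Smith are flagged, because each has a conversion within the next three rows. Attributing the conversion only to the first month of the window, as the question asks, would need an extra step on top of this.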

Related

SQL Db2 - How to unify two rows in one using datetime

I've got a table that records employees and where they have worked. Each row holds the employee's start date at that branch. It looks something like this:
Employee ID  Name      Branch  Start Date
1            John Doe  234     2018-01-20
1            John Doe  300     2019-03-20
1            John Doe  250     2022-01-19
2            Jane Doe  200     2019-02-15
2            Jane Doe  234     2020-05-20
I need a query that looks up the next row for each employee and uses the start date at the next branch as the end date of the current one. E.g.:
Employee ID  Name      Branch  Start Date  End Date
1            John Doe  234     2018-01-20  2019-03-20
1            John Doe  300     2019-03-20  2022-01-19
1            John Doe  250     2022-01-19  ---
2            Jane Doe  200     2019-02-15  2020-05-20
2            Jane Doe  234     2020-05-20  ---
When there is no later record, we assume that the employee is still working at that branch, so we can leave the end date blank or use a default "9999-01-01" value.
Is there any way to achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row whose range contains a given date. For example, if I look up which branch John Doe worked at on 2020-12-01, the query should return the row for branch 300.
You can use LEAD() to peek at the next row, according to a subgroup and ordering within it.
For example:
select
t.*,
lead(start_date) over(partition by employee_id order by start_date) as end_date
from t
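A minimal sketch of the LEAD() approach, run from Python against an in-memory SQLite database rather than Db2 (this assumes SQLite 3.25 or newer for window functions). It also covers the two follow-up points in the question: COALESCE supplies the "9999-01-01" default, and a hypothetical helper, branch_on, looks up the row whose range contains a given date, treating the end date as exclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (employee_id integer, name text, branch integer, start_date text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "John Doe", 234, "2018-01-20"),
        (1, "John Doe", 300, "2019-03-20"),
        (1, "John Doe", 250, "2022-01-19"),
        (2, "Jane Doe", 200, "2019-02-15"),
        (2, "Jane Doe", 234, "2020-05-20"),
    ],
)

# Each row's end date is the next start date for the same employee;
# the last row per employee falls back to the open-ended default.
ranges_sql = """
select employee_id, name, branch, start_date,
       coalesce(
           lead(start_date) over (partition by employee_id order by start_date),
           '9999-01-01'
       ) as end_date
from t
"""

def branch_on(employee_id, day):
    """Return the branch whose [start_date, end_date) range contains `day`."""
    row = conn.execute(
        f"select branch from ({ranges_sql}) "
        "where employee_id = ? and start_date <= ? and ? < end_date",
        (employee_id, day, day),
    ).fetchone()
    return row[0] if row else None

print(branch_on(1, "2020-12-01"))
```

The dates compare correctly as text only because they are in ISO format; with a real DATE column the same comparison applies directly.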

SQL - Period range in subgroups of a group by

I have the following dataset:
A  B     C
1  John  2018-08-14
1  John  2018-08-20
1  John  2018-09-03
2  John  2018-11-13
2  John  2018-12-11
2  John  2018-12-12
1  John  2020-01-20
1  John  2020-01-21
3  John  2021-03-02
3  John  2021-03-03
1  John  2020-05-10
1  John  2020-05-12
And I would like to have the following result:
A  B     C
1  John  2018-08-14
2  John  2018-11-13
1  John  2020-01-20
3  John  2021-03-02
1  John  2020-05-10
If I group by A, B, the first and third groups of rows (both A = 1) are merged, which makes sense. How could I create another column so that I can still use a group by and get the result I want?
If you have other ideas, please explain them!
I tried first, last, rank and dense_rank without success.
Use lag(). Looks like B is a function of A in your data. So checking lag(A) will suffice.
select A,B,C
from (
select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
from mytable
) t
where startFlag = 1
order by C
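A minimal sketch of the lag() query above, run from Python against an in-memory SQLite database (assuming SQLite 3.25 or newer for window functions) and loaded with the rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (A integer, B text, C text)")
data = [
    (1, "2018-08-14"), (1, "2018-08-20"), (1, "2018-09-03"),
    (2, "2018-11-13"), (2, "2018-12-11"), (2, "2018-12-12"),
    (1, "2020-01-20"), (1, "2020-01-21"),
    (3, "2021-03-02"), (3, "2021-03-03"),
    (1, "2020-05-10"), (1, "2020-05-12"),
]
conn.executemany("insert into mytable values (?, 'John', ?)", data)

# A row starts a new group when the previous row (ordered by C) has a different A.
# For the very first row lag(A) is NULL, so the comparison fails and the flag is 1.
starts = conn.execute("""
select A, B, C
from (
    select *, case when lag(A) over (order by C) = A then 0 else 1 end as startFlag
    from mytable
) t
where startFlag = 1
order by C
""").fetchall()

for row in starts:
    print(row)
```

On the data exactly as posted this returns four rows, not the five in the desired result: ordered by C, the 2020-05-10 and 2020-05-12 rows sit directly after the 2020-01 rows with the same A, so they belong to the same run. Getting a separate 2020-05-10 row would require a different ordering column than C.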

MS Access SQL query - Count records until value is met

I have an Access query (qr1) that returns the following data:
dateField       stringField1  stringField2  booleanField
11/09/20 17:15  John          Nick          0
12/09/20 17:00  John          Mary          -1
13/09/20 17:30  Ann           John          0
13/09/20 19:30  Kate          Alan          0
19/09/20 19:30  Ann           Missy         0
20/09/20 17:15  Jim           George        0
20/09/20 19:30  John          Nick          0
27/09/20 15:00  John          Mary          -1
27/09/20 17:00  Ann           John          -1
27/09/20 19:30  Kate          Alan          0
28/09/20 18:30  Ann           Missy         -1
03/10/20 18:30  Jim           George        -1
04/10/20 15:00  John          Nick          0
04/10/20 17:15  John          Mary          0
04/10/20 20:45  Ann           John          0
05/10/20 18:30  Kate          Alan          0
17/10/20 15:00  Jim           George        0
17/10/20 17:15  John          Nick          0
18/10/20 15:00  John          Mary          -1
18/10/20 17:15  Ann           John          0
Notes:
The string data may or may not be repetitive.
The dates are stored as strings. I use a function to convert them to dates:
Public Function STR2TIME(sTime As String) As Date
Dim arr() As String
sTime = Replace(sTime, ".", "/")
arr = Split(sTime, " ")
STR2TIME = DateValue(Format(arr(0), "dd/mm/yyyy")) + TimeValue(arr(1))
End Function
qr1 is ORDERED BY STR2TIME(dateField) ASC
Now I need to run an extra query that adds an extra column which:
counts records until a yes (-1) appears in booleanField, and
after that, starts counting again from 1.
In this case the output should look like this:
dateField       stringField1  stringField2  booleanField  countField
11/09/20 17:15  John          Nick          0             1
12/09/20 17:00  John          Mary          -1            2
13/09/20 17:30  Ann           John          0             1
13/09/20 19:30  Kate          Alan          0             2
19/09/20 19:30  Ann           Missy         0             3
20/09/20 17:15  Jim           George        0             4
20/09/20 19:30  John          Nick          0             5
27/09/20 15:00  John          Mary          -1            6
27/09/20 17:00  Ann           John          -1            1
27/09/20 19:30  Kate          Alan          0             1
28/09/20 18:30  Ann           Missy         -1            2
03/10/20 18:30  Jim           George        -1            1
04/10/20 15:00  John          Nick          0             1
04/10/20 17:15  John          Mary          0             2
04/10/20 20:45  Ann           John          0             3
05/10/20 18:30  Kate          Alan          0             4
17/10/20 15:00  Jim           George        0             5
17/10/20 17:15  John          Nick          0             6
18/10/20 15:00  John          Mary          -1            7
18/10/20 17:15  Ann           John          0             1
Problem
I have tried many things, all giving wrong numeric results.
Finally, I thought that counting the zeros between the current date and the previous yes date (the largest one smaller than the current date) would do the trick:
SELECT t.*, (SELECT COUNT(*)
FROM qr1 tt
WHERE booleanField = 0
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 dateField
FROM qr1
WHERE booleanField = -1
AND STR2TIME(dateField) < STR2TIME(t.dateField)
ORDER BY STR2TIME(dateField) DESC
)
AND STR2TIME(tt.dateField) <= STR2TIME(t.dateField)
) AS CountMatches
FROM qr1 t;
but still gives me wrong numeric results on countField:
countField
0
0
1
2
3
4
5
6
1
2
3
1
12
13
14
15
16
17
18
13
What am I doing wrong? How can I get the desired result?
EDIT:
I'm posting the final code, based on @Gordon Linoff's and @Gustav's answers, slightly simplified.
Explanation of changes:
I got rid of the conversion function in this step. Instead of converting 7 times for every single record, I convert only once in the first query, so the values here are ready to compare.
I omitted the check for zeros, as it was not necessary.
I added the NZ function to supply a value when the inner subquery returns NULL, that is, when there is no earlier yes to count from (usually the first records).
The only problem left was that with NZ I got values 1 less than what I needed, so I subtracted 1 from the dateField to count 1 more.
Here is the code:
SELECT t.*, (SELECT COUNT(*) FROM qr2 tt
WHERE tt.dateField <= t.dateField
AND tt.dateField > NZ((SELECT TOP 1 dateField FROM qr2
WHERE booleanField = True
AND dateField < t.dateField
ORDER BY dateField DESC
), tt.dateField - 1)
) AS CountMatches
FROM qr2 AS t;
This is doable, though a little convoluted:
Select
qr1.dateField,
qr1.stringField1,
qr1.stringField2,
qr1.booleanField,
(Select Count(*) From qr1 As t1
Where
(t1.booleanfield = true And t1.dateField = qr1.dateField)
Or
(t1.booleanfield = false And t1.dateField <= qr1.dateField And
t1.dateField >= Nz(
(Select Top 1 dateField From qr1 As t
Where t.dateField < qr1.dateField And t.booleanField = True
Order By t.dateField Desc),
t1.dateField ))) As countField
From
qr1;
Your string converter can be replaced by this expression:
TrueDate = CDate(Replace(TextDotDate, ".", "/"))
You should apply this at a much earlier stage, such as in qr1.
One obvious problem is:
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 dateField
This should be:
AND STR2TIME(tt.dateField) >= (SELECT TOP 1 STR2TIME(dateField)
Second, you have:
WHERE booleanField = 0
But I don't think this filter is appropriate.
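The counting rule the question asks for can be stated procedurally, which makes it easier to check any query against the expected countField column. The sketch below is plain Python, not Access SQL, and the function name count_until_flag is made up for illustration: the counter goes up by one on every record and resets after a record whose booleanField is yes (-1).

```python
def count_until_flag(flags):
    """Number records 1, 2, 3, ... and restart at 1 after each truthy flag."""
    counts = []
    running = 0
    for flag in flags:
        running += 1
        counts.append(running)
        if flag:  # Access stores yes as -1, which is truthy in Python
            running = 0
    return counts

# booleanField values from qr1, in date order, as listed in the question.
boolean_field = [0, -1, 0, 0, 0, 0, 0, -1, -1, 0,
                 -1, -1, 0, 0, 0, 0, 0, 0, -1, 0]

count_field = count_until_flag(boolean_field)
print(count_field)
```

Note that the yes record itself is counted in the run it closes, which is why a filter on booleanField = 0 alone cannot reproduce the expected numbers.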

Extract Conditional Middle name and last name

I have a column in a data frame that holds a full name as first name, middle name, last name. Some records have no middle name, and I want to populate the middle name conditionally based on the available pattern, but I'm not sure how to achieve this.
import pandas as pd
name_df = pd.read_csv(r"NameData1.txt",delimiter=",")
splitted_name=name_df.name.str.split(' ',expand=True).fillna('No Value')
##splited_name['middle_name']= splited_name.apply(lambda x : x[1] if x[2] != 'No Value' else '' )
name_df['Middle_name']=name_df.apply(lambda splited_name : splited_name[1] if splited_name[2] != 'No Value' else '')
name_df
I want to display the middle name only when it's there else the last name should be populated.
Sample records:
Id,name
1,TOM M SMITH
2,Gary SMITH
3,John C Doe
4,Hary Knox
5,Rakesh Vaidya
6,John Doe Doe
Use numpy.where to set the new columns by condition; missing values (None) are detected here with Series.notna:
import numpy as np
splitted_name=name_df.name.str.split(expand=True)
name_df['First_name'] = splitted_name[0]
name_df['Middle_name']= np.where(splitted_name[2].notna(), splitted_name[1], '')
name_df['Last_name']= np.where(splitted_name[2].notna(), splitted_name[2], splitted_name[1])
print (name_df)
   Id  name           First_name  Middle_name  Last_name
0  1   TOM M SMITH    TOM         M            SMITH
1  2   Gary SMITH     Gary                     SMITH
2  3   John C Doe     John        C            Doe
3  4   Hary Knox      Hary                     Knox
4  5   Rakesh Vaidya  Rakesh                   Vaidya
5  6   John Doe Doe   John        Doe          Doe
I want to display the middle name only when it's there else the last name should be populated.
So you can do the below using str.split():
df['middle_or_last']=df.name.apply(lambda x:x.split(' ', maxsplit=len(x.split()))).str[1]
print(df)
   Id  name           middle_or_last
0  1   TOM M SMITH    M
1  2   Gary SMITH     SMITH
2  3   John C Doe     C
3  4   Hary Knox      Knox
4  5   Rakesh Vaidya  Vaidya
5  6   John Doe Doe   Doe
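The rule behind both answers can also be written without pandas, which may help when checking edge cases. This is only a sketch: the helper name split_name is made up, and it assumes names are separated by whitespace and have at most three parts, as in the sample records.

```python
def split_name(full_name):
    """Split a full name into (first, middle, last); middle is '' when absent."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    if len(parts) >= 3:
        # Three parts: the second is the middle name, the third the last name.
        return first, parts[1], parts[2]
    if len(parts) == 2:
        # Two parts: no middle name, the second part is the last name.
        return first, "", parts[1]
    return first, "", ""

sample = ["TOM M SMITH", "Gary SMITH", "John C Doe",
          "Hary Knox", "Rakesh Vaidya", "John Doe Doe"]

for name in sample:
    print(name, "->", split_name(name))
```

The middle_or_last column from the second answer corresponds to the second whitespace-separated part of the name, whichever role it plays.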

SQL Server Extract overlapping date ranges (return dates that cross other dates)

How would I go about extracting the overlapping dates from the following table?
ID Name StartDate EndDate Type
==============================================================
1 John Smith 01/01/2014 31/01/2014 A
2 John Smith 20/01/2014 20/02/2014 B
3 John Smith 01/03/2014 28/03/2014 A
4 John Smith 18/03/2014 24/03/2014 B
5 John Smith 01/07/2014 31/07/2014 A
6 John Smith 15/07/2014 31/07/2014 B
7 John Smith 25/07/2014 25/08/2014 C
Based on the first example for John Smith, the dates 01/01/2014 to 31/01/2014 overlap with 20/01/2014 to 20/02/2014, so I expect to get back just the overlapping period, which is 20/01/2014 to 31/01/2014.
The final result would be:
ID Name StartDate EndDate
==================================================
8 John Smith 20/01/2014 31/01/2014
9 John Smith 18/03/2014 24/03/2014
10 John Smith 15/07/2014 31/07/2014
11 John Smith 25/07/2014 31/07/2014
HELP REQUIRED 10 August 2014
In addition to the above request, I am looking for help or guidance on how to get the following results which should include the dates that overlap and the dates that don't. The ID column is irrelevant.
ID Name StartDate EndDate Type
==================================================
1 John Smith 01/01/2014 19/01/2014 A
8 John Smith 20/01/2014 31/01/2014 AB
2 John Smith 01/02/2014 20/02/2014 B
3 John Smith 01/03/2014 17/03/2014 A
9 John Smith 18/03/2014 24/03/2014 AB
3 John Smith 25/03/2014 28/03/2014 A
5 John Smith 01/07/2014 14/07/2014 A
10 John Smith 15/07/2014 31/07/2014 AB
11 John Smith 25/07/2014 31/07/2014 ABC
7 John Smith 01/08/2014 25/08/2014 C
Although the following image is not an exact reflection of the above, for illustration purposes, I am interested in seeing the dates that overlap (red) and the dates that don't (sky blue) in the same result set.
http://imgur.com/SeR9sY1
If you want just overlapping periods, you can get this with a self join. Do note that the results might be redundant if more than two periods overlap on certain dates.
select ft.name,
-- the overlap starts at the later of the two start dates
(case when max(ft.startdate) > max(ft2.startdate) then max(ft.startdate)
else max(ft2.startdate)
end) as startdate,
-- the overlap ends at the earlier of the two end dates
(case when min(ft.enddate) < min(ft2.enddate) then min(ft.enddate)
else min(ft2.enddate)
end) as enddate
from followingtable ft join
followingtable ft2
on ft.name = ft2.name and
ft.id < ft2.id and
ft.startdate <= ft2.enddate and
ft.enddate > ft2.startdate
group by ft.name, ft.id, ft2.id;
This doesn't assign the ids. You can do that with row_number() and an offset.
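A minimal sketch of the self-join idea, run from Python against an in-memory SQLite database rather than SQL Server, using the rows from the question with dates rewritten in ISO format so that they compare correctly as text. Since each joined pair is a single row, the sketch drops the aggregates and the group by; the overlap of a pair runs from the later start date to the earlier end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table followingtable "
    "(id integer, name text, startdate text, enddate text, type text)"
)
conn.executemany(
    "insert into followingtable values (?, 'John Smith', ?, ?, ?)",
    [
        (1, "2014-01-01", "2014-01-31", "A"),
        (2, "2014-01-20", "2014-02-20", "B"),
        (3, "2014-03-01", "2014-03-28", "A"),
        (4, "2014-03-18", "2014-03-24", "B"),
        (5, "2014-07-01", "2014-07-31", "A"),
        (6, "2014-07-15", "2014-07-31", "B"),
        (7, "2014-07-25", "2014-08-25", "C"),
    ],
)

# Pair every period with each later-numbered period of the same person that
# overlaps it, and keep the later start and the earlier end of the pair.
overlaps = conn.execute("""
select ft.name,
       case when ft.startdate > ft2.startdate then ft.startdate
            else ft2.startdate end as startdate,
       case when ft.enddate < ft2.enddate then ft.enddate
            else ft2.enddate end as enddate
from followingtable ft
join followingtable ft2
  on ft.name = ft2.name
 and ft.id < ft2.id
 and ft.startdate <= ft2.enddate
 and ft.enddate > ft2.startdate
order by ft.id, ft2.id
""").fetchall()

for row in overlaps:
    print(row)
```

On this data the join returns five rows, one more than the four in the question's expected result: the period 2014-07-25 to 2014-07-31 appears twice, once for the pair (5, 7) and once for (6, 7). That is the redundancy mentioned above when more than two periods overlap on the same dates.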