Join two data frames SQL where and between overlaps [duplicate] - sql

This question already has an answer here:
Finding Overlaps between interval sets / Efficient Overlap Joins
(1 answer)
Closed 7 years ago.
I am trying to join two data frames which in SQL would utilise a where and a between statement for dates.
In SQL, the code would be:
select a.Date, (a.Value1 - b.Test1) as Ans1, (a.Value2 - b.Test2) as Ans2, a.ID
from Data a
inner join Test b on a.ID = b.ID and a.Date between b.DateStart and b.DateEnd
This is Data
Date Value1 Value2 ID
01/01/16 19:30:00 10 30 A
01/01/16 19:50:20 20 40 B
01/01/16 19:55:30 30 50 C
This is Test
RowNumber DateStart DateEnd Test1 Test2 ID
1 01/01/16 17:00:00 01/01/16 22:00:05 2 4 A
2 01/01/16 22:00:06 01/01/16 01:50:00 3 6 A
3 01/01/16 17:00:00 01/01/16 22:00:05 4 8 B
4 01/01/16 22:00:06 01/01/16 01:50:00 5 2 B
5 01/01/16 17:00:00 01/01/16 22:00:05 6 4 C
6 01/01/16 22:00:06 01/01/16 01:50:00 7 5 C
The results I am trying to create
Date Ans1 Ans2 ID
01/01/16 19:30:00 8 26 A
01/01/16 19:50:20 16 32 B
01/01/16 19:55:30 24 46 C
Any help and pointers would be great.

Following advice from @zx8754, I tried data.table::foverlaps().
In Data, rename the Date field to DateStart and create a second date field, DateEnd, equal to Date. Then run the following code:
setkey(Data, ID, DateStart, DateEnd)
setkey(Test, ID, DateStart, DateEnd)
CompleteDataset <- foverlaps(Data, Test, type="any")
This gives me exactly what I want.
Finding Overlaps between interval sets / Efficient Overlap Joins

Simply merge the two datasets on ID, then filter the rows on the date range; these two steps correspond to SQL's JOIN and WHERE clauses. Finally, compute the differences and select the columns you need.
# Assumes Date, DateStart and DateEnd are stored as date-times (e.g. POSIXct), not as strings
mergedf <- merge(Data, Test, by="ID")
# Keep only rows whose Date falls inside [DateStart, DateEnd]
mergedf <- mergedf[mergedf$Date >= mergedf$DateStart &
                   mergedf$Date <= mergedf$DateEnd, ]
mergedf$Ans1 <- mergedf$Value1 - mergedf$Test1
mergedf$Ans2 <- mergedf$Value2 - mergedf$Test2
mergedf <- mergedf[c('Date', 'Ans1', 'Ans2', 'ID')]
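To check the join logic outside R, the SQL from the question can be run against the sample rows. The following is only a sketch using Python's built-in sqlite3 module. It assumes the dates are rewritten as ISO strings (01/01/16 taken to mean 2016-01-01) so that BETWEEN compares them correctly as text, and it uses b.DateEnd for the upper bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (Date TEXT, Value1 INTEGER, Value2 INTEGER, ID TEXT);
CREATE TABLE Test (RowNumber INTEGER, DateStart TEXT, DateEnd TEXT,
                   Test1 INTEGER, Test2 INTEGER, ID TEXT);
""")

# Sample rows from the question, with dates converted to ISO text (an assumption).
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", [
    ("2016-01-01 19:30:00", 10, 30, "A"),
    ("2016-01-01 19:50:20", 20, 40, "B"),
    ("2016-01-01 19:55:30", 30, 50, "C"),
])
conn.executemany("INSERT INTO Test VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "2016-01-01 17:00:00", "2016-01-01 22:00:05", 2, 4, "A"),
    (2, "2016-01-01 22:00:06", "2016-01-01 01:50:00", 3, 6, "A"),
    (3, "2016-01-01 17:00:00", "2016-01-01 22:00:05", 4, 8, "B"),
    (4, "2016-01-01 22:00:06", "2016-01-01 01:50:00", 5, 2, "B"),
    (5, "2016-01-01 17:00:00", "2016-01-01 22:00:05", 6, 4, "C"),
    (6, "2016-01-01 22:00:06", "2016-01-01 01:50:00", 7, 5, "C"),
])

# Join on ID and keep rows whose Date lies inside the [DateStart, DateEnd] range.
rows = conn.execute("""
SELECT a.Date, a.Value1 - b.Test1 AS Ans1, a.Value2 - b.Test2 AS Ans2, a.ID
FROM Data a
INNER JOIN Test b
   ON a.ID = b.ID
  AND a.Date BETWEEN b.DateStart AND b.DateEnd
ORDER BY a.ID
""").fetchall()

for row in rows:
    print(row)
```

Each Data row matches exactly one Test interval here, so the result has one row per ID.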

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The below SQL code is text of the query to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#;
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count(), referencing a textbox on a form for the start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1  Dt2
10   8
Or a report with 60 textboxes and each calls a DCount() expression with criteria same as used in query.
Or a VBA procedure that writes data to a 'temp' table.
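Writing 60 calculated fields by hand is tedious, so the query text can be generated instead. The following is only a sketch of that idea in Python with the built-in sqlite3 module, not Access: it swaps IIf() for a standard CASE expression, passes the month starts as parameters instead of reading a form textbox, and loads the Id / CERT_EXP_DTE sample from the first answer. The helper names month_starts and build_query are made up for this illustration.

```python
import sqlite3
from datetime import date

def month_starts(first, n):
    """Return n consecutive first-of-month dates, beginning with `first`."""
    starts, year, month = [], first.year, first.month
    for _ in range(n):
        starts.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts

def build_query(n):
    """Build one SELECT with n conditional counts named Dt1..Dtn."""
    cols = ",\n       ".join(
        f"COUNT(CASE WHEN CERT_EXP_DTE >= ? THEN 1 END) AS Dt{i}"
        for i in range(1, n + 1)
    )
    return f"SELECT {cols}\nFROM dbo_Insp_Type"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbo_Insp_Type (Id INTEGER, CERT_EXP_DTE TEXT)")
expiry = [
    "2022-01-15", "2022-01-23", "2022-02-01", "2022-02-03",
    "2022-05-01", "2022-06-06", "2022-06-07", "2022-07-21",
    "2022-02-20", "2021-11-05", "2021-12-01", "2021-12-24",
]
conn.executemany("INSERT INTO dbo_Insp_Type VALUES (?, ?)",
                 enumerate(expiry, start=1))

# Three months keep the demo short; pass 60 for the full five years.
starts = month_starts(date(2022, 1, 1), 3)
params = [d.isoformat() for d in starts]  # ISO text compares correctly as strings
counts = conn.execute(build_query(len(starts)), params).fetchone()

for start, count in zip(starts, counts):
    print(start.strftime("%Y-%m"), count)
```

The same generated column list could be pasted into an Access query after translating CASE back to IIf() and replacing the parameters with DateAdd() expressions.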

How to select max date from table for distinct values [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 11 months ago.
I have a table that looks like this:
date        account  asset  amount
01-01-2022  1        A      12
01-01-2022  1        B      100
02-01-2022  1        A      14
02-01-2022  1        B      98
01-01-2022  2        A      15
01-01-2022  2        C      230
02-01-2022  2        A      13
02-01-2022  2        B      223
03-01-2022  2        A      17
03-01-2022  2        B      237
I want to be able to get the last values (i.e. max date) for each account. So the result should look like this:
date        account  asset  amount
02-01-2022  1        A      14
02-01-2022  1        B      98
03-01-2022  2        A      17
03-01-2022  2        B      237
How can this be done in SQL?
EDIT: Notice that the max dates for the different accounts are not the same.
You can do it by first selecting the max dates for each account and then forcing the match between accounts given the date constraints, like in the following query:
SELECT
    *
FROM
    (
        SELECT
            MAX(date) AS date,
            account
        FROM
            tab
        GROUP BY
            account
    ) max_date_per_account
INNER JOIN
    tab
ON
    tab.date = max_date_per_account.date
AND
    tab.account = max_date_per_account.account
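The query above can be tried against the sample rows with a small sketch like the following, which uses Python's built-in sqlite3 module. It assumes the dates are day-month-year and stores them as ISO text (2022-01-02 for 02-01-2022) so that MAX() orders them chronologically. The explicit column list and ORDER BY are additions for a stable, readable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (date TEXT, account INTEGER, asset TEXT, amount INTEGER)"
)
# Sample rows from the question; dates converted to ISO text (an assumption).
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", [
    ("2022-01-01", 1, "A", 12), ("2022-01-01", 1, "B", 100),
    ("2022-01-02", 1, "A", 14), ("2022-01-02", 1, "B", 98),
    ("2022-01-01", 2, "A", 15), ("2022-01-01", 2, "C", 230),
    ("2022-01-02", 2, "A", 13), ("2022-01-02", 2, "B", 223),
    ("2022-01-03", 2, "A", 17), ("2022-01-03", 2, "B", 237),
])

# Find the latest date per account, then join back to fetch the full rows.
latest = conn.execute("""
SELECT tab.date, tab.account, tab.asset, tab.amount
FROM (
    SELECT MAX(date) AS date, account
    FROM tab
    GROUP BY account
) AS max_date_per_account
INNER JOIN tab
   ON tab.date = max_date_per_account.date
  AND tab.account = max_date_per_account.account
ORDER BY tab.account, tab.asset
""").fetchall()

for row in latest:
    print(row)
```

Account 1 keeps its rows from the second date and account 2 keeps its rows from the third, which shows that each account is matched against its own maximum.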

Creating a new calculated column in SQL

Is there a way to compute this? One date, June 24, appears 2 times, so it has 2 UDs; each of the remaining dates appears only once.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that 4 users used the application on 1 day and 2 users used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
      from t
      group by date
     ) d
group by cnt
order by cnt desc;
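To see what the two levels of aggregation return, the query can be run on the six sample rows. This is a sketch with Python's built-in sqlite3 module; the column names pk and ud and the alias num_dates are chosen for this illustration, and the timestamps are shortened to dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER, ud INTEGER, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 123, "2015-06-24"),
    (6, 456, "2015-06-24"),
    (2, 123, "2015-06-25"),
    (3, 658, "2015-06-26"),
    (4, 598, "2015-06-27"),
    (5, 156, "2015-06-28"),
])

# Inner query: rows per date. Outer query: how many dates share each row count.
result = conn.execute("""
SELECT cnt, COUNT(*) AS num_dates
FROM (
    SELECT date, COUNT(*) AS cnt
    FROM t
    GROUP BY date
) d
GROUP BY cnt
ORDER BY cnt DESC
""").fetchall()

for cnt, num_dates in result:
    print(f"{num_dates} date(s) with {cnt} row(s)")
```

With this data the inner query finds one date with two rows and four dates with one row each. Grouping the inner query by UD instead of by date would count days per user.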

SAS/SQL: Combine two columns while retaining others

I need to merge two data sets. Each data set contains a sequential observation number. The first data set contains only the first observation. The second data set contains all subsequent observations. Not all subjects have the same number of observations.
The problem is as follows. There are two different types of subject. The type is contained only in the first data set. When I merge the two data sets together, the type is missing on all observations but the first for each subject. Please see my example below.
I would like to know how to do this with both SQL and a DATA step. My real data sets are not large, so processing efficiency is not a major concern.
I have tried using RETAIN, but as the second data set doesn't contain the TYPE variable, there is no value to retain. Regarding SQL, it seems like UNION should work, and there are countless examples of UNION on the internet, but they all involve a single variable. I need to know how to union the Observation variable by ID while retaining the Amount and assigning the Type.
Example
data set1;
input ID $
Observation
Type $
Amount
;
datalines;
002 1 A 15
026 1 A 30
031 1 B 7
028 1 B 10
036 1 A 22
;
run;
data set2;
input ID $
Observation
Amount
;
datalines;
002 2 11
002 3 35
002 4 13
002 5 12
026 2 21
026 3 12
026 4 40
031 2 11
028 2 27
036 2 10
036 3 15
036 4 16
036 5 12
036 6 20
;
run;
proc sort data = set1;
by ID
Observation
;
run;
proc sort data = set2;
by ID
Observation
;
run;
data merged;
merge set1
set2
;
by ID
Observation
;
run;
This gives
ID Observation Type Amount
002 1 A 15
002 2 11
002 3 35
002 4 13
002 5 12
026 1 A 30
026 2 21
026 3 12
026 4 40
028 1 B 10
028 2 27
031 1 B 7
031 2 11
036 1 A 22
036 2 10
036 3 15
036 4 16
036 5 12
036 6 20
However, what I need is
ID Observation Type Amount
002 1 A 15
002 2 A 11
002 3 A 35
002 4 A 13
002 5 A 12
026 1 A 30
026 2 A 21
026 3 A 12
026 4 A 40
028 1 B 10
028 2 B 27
031 1 B 7
031 2 B 11
036 1 A 22
036 2 A 10
036 3 A 15
036 4 A 16
036 5 A 12
036 6 A 20
I'm sure there are other ways to do it, but this is how I'd do it.
First, stack the data keeping only the common fields.
data new;
set set1 (drop = TYPE) set2;
run;
Then merge the type field back over.
proc sql;
create table new2 as select
a.*,
b.TYPE
from new a
left join set1 b
on a.id=b.id;
quit;
Proc SQL:
proc sql;
  create table want as
  select coalesce(a.id, b.id) as id, observation, type, amount
  from (select * from set1(drop=type)
        union
        select * from set2) a
  left join set1(keep=id type) b
    on a.id = b.id;
quit;
The DATA step method is straightforward: just use SET with BY to interleave the records. You need to create a NEW variable to retain the values. If you want, you can drop the old variable and rename the new one to take its name.
data want ;
set set1 set2 ;
by id ;
if first.id then new_type=type;
retain new_type;
run;
For SQL, use the method that @JJFord3 posted: first union the common fields, then merge on the TYPE flag. You can combine both steps into a single statement.
proc sql;
create table want as
select a.*,b.type
from
(select id,observation,amount from set1
union
select id,observation,amount from set2
) a
left join set1 b
on a.id = b.id
order by 1,2
;
quit;
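The union-then-join pattern is plain SQL, so it can be checked outside SAS as well. The following sketch uses Python's built-in sqlite3 module with the example data from the question. It is an illustration of the pattern only and does not reproduce PROC SQL specifics such as data set options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set1 (id TEXT, observation INTEGER, type TEXT, amount INTEGER);
CREATE TABLE set2 (id TEXT, observation INTEGER, amount INTEGER);
""")
conn.executemany("INSERT INTO set1 VALUES (?, ?, ?, ?)", [
    ("002", 1, "A", 15), ("026", 1, "A", 30), ("031", 1, "B", 7),
    ("028", 1, "B", 10), ("036", 1, "A", 22),
])
conn.executemany("INSERT INTO set2 VALUES (?, ?, ?)", [
    ("002", 2, 11), ("002", 3, 35), ("002", 4, 13), ("002", 5, 12),
    ("026", 2, 21), ("026", 3, 12), ("026", 4, 40),
    ("031", 2, 11), ("028", 2, 27),
    ("036", 2, 10), ("036", 3, 15), ("036", 4, 16), ("036", 5, 12),
    ("036", 6, 20),
])

# Stack the common columns, then look up each subject's type from set1 by id.
want = conn.execute("""
SELECT a.id, a.observation, b.type, a.amount
FROM (
    SELECT id, observation, amount FROM set1
    UNION
    SELECT id, observation, amount FROM set2
) AS a
LEFT JOIN set1 AS b
  ON a.id = b.id
ORDER BY a.id, a.observation
""").fetchall()

for row in want:
    print(row)
```

Because set1 holds exactly one row per subject, the left join adds the type without duplicating any observation.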

Max date among records and across tables - SQL Server

I tried to provide the data in table format, but it did not render well on Stack Overflow, so I am attaching a snapshot of the 2 tables. Apologies for the formatting.
SQL Server 2012
MS Table
mId  tdId  name                      dueDate
1    1     forecastedDate            1/1/2015
2    1     hypercareDate             11/30/2016
3    1     LOE 1                     7/4/2016
4    1     LOE 2                     7/4/2016
5    1     demo for yy test          10/15/2016
6    1     Implementation – testing  7/4/2016
7    1     Phased Rollout – final    7/4/2016
8    2     forecastedDate            1/7/2016
9    2     hypercareDate             11/12/2016
10   2     domain - Forte            NULL
11   2     Fortis completion         1/1/2016
12   2     Certification             NULL
13   2     Implementation            7/4/2016
-----------------------------------------------
MSRevised
mId  revisedDate
1    1/5/2015
1    1/8/2015
3    3/25/2017
2    2/1/2016
2    12/30/2016
3    4/28/2016
4    4/28/2016
5    10/1/2016
6    7/28/2016
7    7/28/2016
8    4/28/2016
9    8/4/2016
9    5/28/2016
11   10/4/2016
11   10/5/2016
13   11/1/2016
----------------------------------------
The required output is:
1. A 'tId' number will be passed in, for instance 1; call it tId (1).
2. Compare all of tId (1)'s milestones (except hypercareDate) with tId (1)'s forecastedDate milestone.
3. Return whether any milestone date (other than hypercareDate) is greater than the forecastedDate.
The above 3 steps are simple, but I first have to compare each milestone date with its corresponding revised dates, if any, from the revised table, and pick the max date among them; that max is what needs to be compared with the forecastedDate.
I managed to solve this. Posting the answer in the hope that it helps somebody.
-- Insert the result into a temp table
INSERT INTO #mstab
SELECT [mId]
     , [tId]
     , [msDate]
FROM [dbo].[MS]
WHERE ([msName] NOT LIKE 'forecastedDate' AND [msName] NOT LIKE 'hypercareDate');
-- This scalar function gets the max date between the forecasted due date and the forecasted revised date
SELECT @maxForecastedDate = [dbo].[fnGetMaxDate]('forecastedDate');
-- This gets the max date from the temp table, to be compared with the forecasted date
SET @maxmilestoneDate = (SELECT MAX(maxDate)
    FROM ( SELECT ms.msDueDate AS dueDate
                , mr.msRevisedDate AS revDate
           FROM #mstab AS ms
           LEFT JOIN [MSRev] AS mr ON ms.msId = mr.msId
         ) maxDate
    UNPIVOT (maxDate FOR DateCols IN (dueDate, revDate)) up );
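The comparison described in the question can also be sketched end to end on the sample rows for tId 1. This is an illustration in Python with the built-in sqlite3 module, not the T-SQL above: SQLite has no UNPIVOT, so due dates and revised dates are stacked with UNION ALL instead. The column names follow the table listing, with tId assumed for the second column, and the dates are assumed to be month/day/year and stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MS (mId INTEGER, tId INTEGER, name TEXT, dueDate TEXT);
CREATE TABLE MSRevised (mId INTEGER, revisedDate TEXT);
""")
conn.executemany("INSERT INTO MS VALUES (?, ?, ?, ?)", [
    (1, 1, "forecastedDate", "2015-01-01"),
    (2, 1, "hypercareDate", "2016-11-30"),
    (3, 1, "LOE 1", "2016-07-04"),
    (4, 1, "LOE 2", "2016-07-04"),
    (5, 1, "demo for yy test", "2016-10-15"),
    (6, 1, "Implementation – testing", "2016-07-04"),
    (7, 1, "Phased Rollout – final", "2016-07-04"),
])
conn.executemany("INSERT INTO MSRevised VALUES (?, ?)", [
    (1, "2015-01-05"), (1, "2015-01-08"), (3, "2017-03-25"),
    (2, "2016-02-01"), (2, "2016-12-30"), (3, "2016-04-28"),
    (4, "2016-04-28"), (5, "2016-10-01"), (6, "2016-07-28"),
    (7, "2016-07-28"),
])

query = """
WITH all_dates AS (          -- due dates and revised dates in one column
    SELECT mId, dueDate AS d FROM MS WHERE tId = :tid
    UNION ALL
    SELECT r.mId, r.revisedDate
    FROM MSRevised r JOIN MS m ON m.mId = r.mId
    WHERE m.tId = :tid
),
latest AS (                  -- latest date per milestone
    SELECT m.name, MAX(a.d) AS latestDate
    FROM all_dates a JOIN MS m ON m.mId = a.mId
    GROUP BY m.mId, m.name
)
SELECT
    (SELECT latestDate FROM latest WHERE name = 'forecastedDate'),
    (SELECT MAX(latestDate) FROM latest
     WHERE name NOT IN ('forecastedDate', 'hypercareDate'))
"""
forecast, max_milestone = conn.execute(query, {"tid": 1}).fetchone()
is_late = max_milestone > forecast  # ISO text compares chronologically
print(forecast, max_milestone, is_late)
```

MAX() ignores NULLs, so milestones without a due date or a revision do not need special handling in this sketch.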