SELECT Earliest Date from Grouped Results in DATEDIFF

From table T2, I need to select the earliest SDate for each ID where Prog is 'YY' and use it in DATEDIFF against the EDate of that same row:
+----+-----------+-----------+------+
| ID | SDate | Edate | Prog |
+----+-----------+-----------+------+
| 1 | 4/12/2016 | 5/18/2016 | XX |
| 1 | 4/1/2016 | 4/4/2016 | YY |
| 1 | 5/23/2016 | 5/28/2016 | YY |
| 2 | 9/21/2016 | 9/26/2016 | XX |
| 2 | 8/7/2016 | 8/9/2016 | YY |
| 3 | 8/2/2015 | 8/12/2015 | YY |
| 3 | 4/12/2015 | 4/18/2015 | YY |
+----+-----------+-----------+------+
And then show it with the aggregate level in Table T1 as the Desired Output:
+----+------+-----+-----------+------+
| ID | Name | Age | SDate | Days |
+----+------+-----+-----------+------+
| 1 | A | 52 | 4/1/2016 | 3 |
| 2 | B | 11 | 8/7/2016 | 2 |
| 3 | C | 24 | 4/12/2015 | 6 |
+----+------+-----+-----------+------+
Attempt:
SELECT
T1.ID,
T1.Name,
T1.Age,
MIN(T2.SDate) AS [SDate]
--, DATEDIFF(day,MIN(T2.SDate),T2.EDate) AS [Days]
FROM T1
INNER JOIN T2
ON T1.ID=T2.ID
WHERE T2.Prog='YY'
GROUP BY
T1.ID,
T1.Name,
T1.Age
I commented out the DATEDIFF function for Days because I am not sure how to formulate it. Obviously, something like DATEDIFF(day,SELECT MIN(SDate) FROM T2 WHERE Prog='YY','Another Date') won't work, since it gives an overall MIN(SDate) that isn't partitioned by ID. I can't put SELECT ID,MIN(SDate) FROM T2 WHERE Prog='YY' GROUP BY ID in the inner subquery either, since DATEDIFF accepts only a single date value.
So how do I extract MIN(SDate) and calculate the DATEDIFF against the corresponding EDate for each grouped ID?

Use MIN as a window function to get the earliest 'YY' SDate for each ID, keep the row that holds it, and compute the date difference against that row's EDate.
SELECT ID, Name, Age, SDate, DATEDIFF(day, SDate, EDate) AS [Days]
FROM (
SELECT
T1.ID,
T1.Name,
T1.Age,
T2.SDate,
T2.EDate,
MIN(T2.SDate) OVER (PARTITION BY T2.ID) AS [MinSDate]
FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
WHERE T2.Prog = 'YY'
) x
WHERE SDate = MinSDate -- keep only the earliest 'YY' row per ID

Use MIN as a window function:
SELECT T1.ID,
T1.Name,
T1.Age,
DATEDIFF(day,
MIN(T2.SDate) OVER (PARTITION BY T1.ID, T1.Name, T1.Age),
T2.EDate) AS [Days]
FROM T1
INNER JOIN T2
ON T1.ID = T2.ID
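WHERE T2.Prog = 'YY'
-- note: this returns one row per 'YY' row in T2; filter to the earliest row if you need one row per ID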
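To check the window-function idea end to end, here is a minimal sketch that runs it against the sample rows in SQLite from Python. It is an illustration under assumptions, not a drop-in T-SQL query: the dates are rewritten as ISO-8601 text, and julianday() subtraction stands in for SQL Server's DATEDIFF(day, ...), which SQLite does not have. It takes MIN(SDate) OVER (PARTITION BY ID) over the 'YY' rows only, keeps the row whose SDate equals that minimum, and measures the days to that row's EDate.

```python
import sqlite3

# Rows from the question's T1/T2 tables. Assumption: dates rewritten as ISO-8601
# text, because SQLite has no DATE type and ISO strings sort chronologically.
T1_ROWS = [(1, "A", 52), (2, "B", 11), (3, "C", 24)]
T2_ROWS = [
    (1, "2016-04-12", "2016-05-18", "XX"),
    (1, "2016-04-01", "2016-04-04", "YY"),
    (1, "2016-05-23", "2016-05-28", "YY"),
    (2, "2016-09-21", "2016-09-26", "XX"),
    (2, "2016-08-07", "2016-08-09", "YY"),
    (3, "2015-08-02", "2015-08-12", "YY"),
    (3, "2015-04-12", "2015-04-18", "YY"),
]

QUERY = """
SELECT ID, Name, Age, SDate,
       -- julianday() subtraction stands in for T-SQL DATEDIFF(day, SDate, EDate)
       CAST(julianday(EDate) - julianday(SDate) AS INTEGER) AS Days
FROM (
    SELECT T1.ID, T1.Name, T1.Age, T2.SDate, T2.EDate,
           -- earliest 'YY' start date, repeated on every 'YY' row of the same ID
           MIN(T2.SDate) OVER (PARTITION BY T2.ID) AS MinSDate
    FROM T1
    JOIN T2 ON T1.ID = T2.ID
    WHERE T2.Prog = 'YY'
) x
WHERE SDate = MinSDate  -- keep only the row that holds that earliest date
ORDER BY ID
"""


def earliest_yy_days():
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T1 (ID INTEGER, Name TEXT, Age INTEGER)")
    con.execute("CREATE TABLE T2 (ID INTEGER, SDate TEXT, EDate TEXT, Prog TEXT)")
    con.executemany("INSERT INTO T1 VALUES (?, ?, ?)", T1_ROWS)
    con.executemany("INSERT INTO T2 VALUES (?, ?, ?, ?)", T2_ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(earliest_yy_days())
```

Filtering on Prog = 'YY' before the window function runs is what keeps the 'XX' rows out of the minimum; with the sample data the result matches the desired output (3, 2 and 6 days).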


Take the row after the specific row

I have a table in which I need to take the next row after each row that has course 'TA' and flag = 1. For this I created the column rnum (a row number per student, ordered by date), which may help in finding it:
| student | date | course | flag | rnum |
| ------- | ----- | ----------- | ---- | ---- |
| 1 | 17:00 | Math | null | 1 |
| 1 | 17:10 | Python | null | 2 |
| 1 | 17:15 | TA | 1 | 3 |
| 1 | 17:20 | English | null | 4 |
| 1 | 17:35 | Geography | null | 5 |
| 2 | 16:10 | English | null | 1 |
| 2 | 16:20 | TA | 1 | 2 |
| 2 | 16:30 | SQL | null | 3 |
| 2 | 16:40 | Python | null | 4 |
| 3 | 19:05 | English | null | 1 |
| 3 | 19:20 | Literature | null | 2 |
| 3 | 19:30 | TA | null | 3 |
| 3 | 19:40 | Python | null | 4 |
| 3 | 19:50 | Python | null | 5 |
As a result I should have:
| student | date | course | flag | rnum |
| ------- | ----- | ------- | ---- | ---- |
| 1 | 17:20 | English | null | 4 |
| 2 | 16:30 | SQL | null | 3 |
There are many ways to get your desired result; let's look at some of them.
1) EXISTS
You can use the EXISTS clause, specifying a subquery to match for the condition.
SELECT T2.*
FROM #MyTable T2
WHERE EXISTS (
SELECT 'x' x
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
AND T1.student = T2.student AND T2.rnum = T1.rnum + 1
)
2) LAG
You can use the window function LAG to access the previous row in a given order and then filter the result set with your conditions.
SELECT w.student, w.date, w.course, w.flag, w.rnum
FROM (
SELECT T1.*
, LAG(course, 1) OVER (PARTITION BY student ORDER BY rnum) prevCourse
, LAG(flag, 1) OVER (PARTITION BY student ORDER BY rnum) prevFlag
FROM #MyTable T1
) w
WHERE prevCourse = 'TA' AND prevFlag = 1
3) JOIN
You can self-JOIN your table on the next rnum and keep only the rows that match the condition.
SELECT T2.*
FROM #MyTable T1
JOIN #MyTable T2 ON T1.student = T2.student AND T2.rnum = T1.rnum + 1
WHERE T1.course = 'TA' AND T1.flag = 1
4) CROSS APPLY
You can use CROSS APPLY to specify a subquery with the matching condition. It is very similar to the EXISTS clause, but the columns from the subquery are also available to include in your result set.
SELECT T2.*
FROM #MyTable T2
CROSS APPLY (
SELECT 'x' x
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
AND T1.student = T2.student AND T2.rnum = T1.rnum + 1
) x
5) CTE
You can use common table expression (CTE) to extract matching rows and then use it to filter your table with a JOIN.
;WITH
T1 AS (
SELECT student, rnum
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
)
SELECT T2.*
FROM #MyTable T2
JOIN T1 ON T1.student = T2.student AND T2.rnum = T1.rnum + 1
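As a rough check of two of these variants, this sketch runs the LAG query and the self-JOIN query against the sample table in an in-memory SQLite database from Python. Assumptions: the table is called MyTable (SQLite has no #temp-table syntax), and the question's null flags are stored as NULL.

```python
import sqlite3

# Sample rows from the question; a null flag is represented by Python None.
ROWS = [
    (1, "17:00", "Math", None, 1),
    (1, "17:10", "Python", None, 2),
    (1, "17:15", "TA", 1, 3),
    (1, "17:20", "English", None, 4),
    (1, "17:35", "Geography", None, 5),
    (2, "16:10", "English", None, 1),
    (2, "16:20", "TA", 1, 2),
    (2, "16:30", "SQL", None, 3),
    (2, "16:40", "Python", None, 4),
    (3, "19:05", "English", None, 1),
    (3, "19:20", "Literature", None, 2),
    (3, "19:30", "TA", None, 3),
    (3, "19:40", "Python", None, 4),
    (3, "19:50", "Python", None, 5),
]

# LAG variant: look one row back within each student, ordered by rnum.
LAG_QUERY = """
SELECT student, date, course, flag, rnum
FROM (
    SELECT t.*,
           LAG(course) OVER (PARTITION BY student ORDER BY rnum) AS prevCourse,
           LAG(flag)   OVER (PARTITION BY student ORDER BY rnum) AS prevFlag
    FROM MyTable t
) w
WHERE prevCourse = 'TA' AND prevFlag = 1
ORDER BY student
"""

# Self-join variant: pair each flagged 'TA' row with the row whose rnum is one higher.
JOIN_QUERY = """
SELECT t2.*
FROM MyTable t1
JOIN MyTable t2 ON t1.student = t2.student AND t2.rnum = t1.rnum + 1
WHERE t1.course = 'TA' AND t1.flag = 1
ORDER BY t2.student
"""


def run(query):
    """Load the sample rows into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE MyTable (student INTEGER, date TEXT, course TEXT, flag INTEGER, rnum INTEGER)"
    )
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(run(LAG_QUERY))
print(run(JOIN_QUERY))
```

Both queries return the rows for student 1 (English, rnum 4) and student 2 (SQL, rnum 3); student 3 is skipped because its 'TA' row has a null flag.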
Adding the row number was a good start; you can use it to join the table with itself:
WITH matches AS (
SELECT
student,
rnum
FROM my_table
WHERE flag = 1
AND course = 'TA'
)
SELECT t.*
FROM my_table t
JOIN matches m
on t.student = m.student
and t.rnum = m.rnum + 1

Using a subquery in the where clause to select 2nd highest date from a table

I need to do the following (in pseudocode):
where yyyy_mm_dd >= '2019-02-01'
and yyyy_mm_dd <= second highest date in a table
To achieve this, I've used this code:
where
p.yyyy_mm_dd >= "2019-02-02"
and p.yyyy_mm_dd <= (select max(yyyy_mm_dd) from schema.table1 where yyyy_mm_dd < (select max(yyyy_mm_dd) from schema.table1 where yyyy_mm_dd is not null))
The above works when it is wrapped in spark.sql(), but when I run the query without Spark, i.e. as raw HQL, I get this error:
Error while compiling statement: FAILED: ParseException line 102:25 cannot recognize input near 'select' 'max' '(' in expression specification
I tried to fix it by aliasing all columns in the subquery like this:
where
p.yyyy_mm_dd >= "2019-02-02"
and p.yyyy_mm_dd <= (select max(t1.yyyy_mm_dd) from schema.table1 t1 where t1.yyyy_mm_dd < (select max(t2.yyyy_mm_dd) from schema.table2 t2 where t2.yyyy_mm_dd is not null))
However, I still get the same error.
Edit to include sample data and query:
table1:
| yyyy_mm_dd | company_id | account_manager |
|------------|------------|-----------------|
| 2020-11-10 | 321 | Peter |
| 2020-11-09 | 632 | John |
| 2020-11-08 | 598 | Doe |
| 2020-11-07 | 104 | Bob |
| ... | ... | ... |
| ... | ... | ... |
table2:
| yyyy_mm_dd | company_id | tier |
|-------------------|------------|--------|
| 2020-11-10 | 321 | Bronze |
| 2020-11-09 | 632 | Silver |
| 2020-11-08 | 598 | Gold |
| 2020-11-07 | 104 | Bob |
| ... | ... | ... |
| ... | ... | ... |
| 2019_12_13_backup | 321 | Bronze |
| 2019_12_13_backup | 632 | Silver |
| ... | | |
Query:
select
p.yyyy_mm_dd,
p.company_id,
p.account_manager,
t.tier
from
table1 p
left join(
select
yyyy_mm_dd,
company_id,
max(tier) as tier
from
table2
where
yyyy_mm_dd >= "2019-02-02"
group by
1,2
) t on (t.company_id = p.company_id and t.yyyy_mm_dd = p.yyyy_mm_dd)
where
p.yyyy_mm_dd >= "2019-02-02"
and p.yyyy_mm_dd <= (select max(yyyy_mm_dd) from table2 where yyyy_mm_dd < (select max(yyyy_mm_dd) from table2 where yyyy_mm_dd is not null))
Because table2 contains backup_2019_12_31 in the yyyy_mm_dd column, max() on that column returns the backup value. So I need the second-highest value, which in the dataset here would be 2020-11-10. There are multiple company_ids per yyyy_mm_dd.
In essence, I want to query table1 where yyyy_mm_dd is between the table1 starting point (hard-coded as 2019-02-02) and the true max date from table2.
To get the second-highest date from table2 you can use dense_rank(). All rows with the second-highest date are assigned rn=2. Use LIMIT to get a single row (or use max() or a DISTINCT aggregation to the same effect), then cross join your table with max_date and filter.
with max_date as(
select yyyy_mm_dd
from
(
select yyyy_mm_dd,
dense_rank() over(order by yyyy_mm_dd desc) rn
from table2
)s
where rn=2 --second max date
limit 1 --need only one record
)
select t1.*
from table1 t1
cross join max_date t2
where t1.yyyy_mm_dd >= '2019-02-02' --hard-coded starting point
and t1.yyyy_mm_dd <= t2.yyyy_mm_dd
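A minimal sketch of the dense_rank() approach, run in SQLite from Python rather than Hive, so treat it as an illustration only. Assumption: the backup rows hold the literal 'backup_2019_12_31' mentioned in the question text (the sample table shows a different spelling); because it starts with a letter, it sorts above every '2020-...' string, which is why a plain max() returns it. The hard-coded lower bound '2019-02-02' is applied alongside the upper one.

```python
import sqlite3

# table1 rows from the question (the "..." rows are omitted).
TABLE1 = [
    ("2020-11-10", 321, "Peter"),
    ("2020-11-09", 632, "John"),
    ("2020-11-08", 598, "Doe"),
    ("2020-11-07", 104, "Bob"),
]
# table2 rows from the question. Assumption: the backup rows hold the literal
# 'backup_2019_12_31', which sorts above any '2020-...' string.
TABLE2 = [
    ("2020-11-10", 321, "Bronze"),
    ("2020-11-09", 632, "Silver"),
    ("2020-11-08", 598, "Gold"),
    ("2020-11-07", 104, "Bob"),
    ("backup_2019_12_31", 321, "Bronze"),
    ("backup_2019_12_31", 632, "Silver"),
]

MAX_DATE_CTE = """
WITH max_date AS (
    SELECT yyyy_mm_dd
    FROM (
        SELECT yyyy_mm_dd,
               DENSE_RANK() OVER (ORDER BY yyyy_mm_dd DESC) AS rn
        FROM table2
    ) s
    WHERE rn = 2  -- second-highest distinct value
    LIMIT 1       -- several rows can share it; one is enough
)
"""

SECOND_HIGHEST = MAX_DATE_CTE + "SELECT yyyy_mm_dd FROM max_date"

FILTERED = MAX_DATE_CTE + """
SELECT t1.*
FROM table1 t1
CROSS JOIN max_date m
WHERE t1.yyyy_mm_dd >= '2019-02-02'  -- hard-coded lower bound from the question
  AND t1.yyyy_mm_dd <= m.yyyy_mm_dd  -- "true" max date from table2
ORDER BY t1.yyyy_mm_dd
"""


def run(sql):
    """Load the sample rows into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (yyyy_mm_dd TEXT, company_id INTEGER, account_manager TEXT)")
    con.execute("CREATE TABLE table2 (yyyy_mm_dd TEXT, company_id INTEGER, tier TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print(run(SECOND_HIGHEST))
print(run(FILTERED))
```

With these rows, rn = 2 is 2020-11-10, so all four table1 rows fall inside the range.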

Join on the first id where the date is the same or in the past

How can I join Table1 to Table2 on opid, only if Table1's date <= Table2's date AND the row has no other matches?
Here are some example tables:
Table1
+------------+-------+-----+
| date | spend | opid|
+------------+-------+-----+
| 2019-07-05 | 5 | 1 |
+------------+-------+-----+
| 2019-07-07 | 4 | 2 |
+------------+-------+-----+
| 2019-07-08 | 6 | 2 |
+------------+-------+-----+
Table2
+------------+-------+-----+
| date | users | opid|
+------------+-------+-----+
| 2019-07-06 | 100 | 1 |
+------------+-------+-----+
| 2019-07-08 | 200 | 2 |
+------------+-------+-----+
Expected Table
+------------+-------+-------+
| date | spend | users |
+------------+-------+-------+
| 2019-07-05 | 10 | 100 |
+------------+-------+-------+
| 2019-07-07 | 4 | null |
+------------+-------+-------+
| 2019-07-08 | 6 | 200 |
+------------+-------+-------+
So 7-July doesn't join, because 8-July has already joined.
I think you should try an inner join:
select t1.opid, t1.date, t1.spend, t2.opid as table2_opid, t2.users
from table1 t1 inner join
table2 t2
on t1.opid = t2.opid and t1.date <= t2.date;
This answers the original version of the question.
I think you want a full join:
select t1.opid, t1.date, t1.spend, t2.opid as table2_opid, t2.users
from table1 t1 full join
table2 t2
on t1.opid = t2.opid and t1.date = t2.date;
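One way to express the rule in the question, sketched in SQLite from Python: rank every Table1 row that shares a Table2 row's opid and falls on or before its date, keep only the closest one (ROW_NUMBER() = 1), then LEFT JOIN that winner back to Table1 so unmatched rows show users as NULL. This is an assumption about the intended semantics, not a query from the answers above.

```python
import sqlite3

# Sample rows from the question.
TABLE1 = [("2019-07-05", 5, 1), ("2019-07-07", 4, 2), ("2019-07-08", 6, 2)]
TABLE2 = [("2019-07-06", 100, 1), ("2019-07-08", 200, 2)]

QUERY = """
WITH candidates AS (
    -- every Table1 row on or before a Table2 row with the same opid,
    -- numbered so that the closest (latest) Table1 date gets rn = 1
    SELECT t1.date AS t1_date, t1.opid, t2.users,
           ROW_NUMBER() OVER (PARTITION BY t2.opid, t2.date
                              ORDER BY t1.date DESC) AS rn
    FROM table1 t1
    JOIN table2 t2 ON t1.opid = t2.opid AND t1.date <= t2.date
)
SELECT t1.date, t1.spend, c.users
FROM table1 t1
LEFT JOIN candidates c
       ON c.t1_date = t1.date AND c.opid = t1.opid AND c.rn = 1  -- only the closest match joins
ORDER BY t1.date
"""


def closest_matches():
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (date TEXT, spend INTEGER, opid INTEGER)")
    con.execute("CREATE TABLE table2 (date TEXT, users INTEGER, opid INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(closest_matches())
```

With the sample rows, 2019-07-07 gets NULL because 2019-07-08 is the closer match for the same Table2 row. The first row's spend comes out as 5, the value stored in Table1, whereas the expected table shows 10. If one Table1 row were the closest match for two Table2 rows, this sketch would return it twice.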

Select max value record from one-to-many join

I want to join two tables, but the second table contains multiple rows of parameters for each key on which I want to build my join.
TABLE1
+------------+-----------+
| Ddate | ROOMNO |
+------------+-----------+
| 2018-22-11 | 101 |
| 2018-22-11 | 102 |
| 2018-22-11 | 103 |
| 2018-22-11 | 104 |
+------------+-----------+
TABLE2 (Multiple rows per Room No)
+------------+-----------+------------------+
| Ddate | ROOMNO | MaxVoltage |
+------------+-----------+------------------+
| 2018-22-11 | 101 | 230 |
| 2018-22-11 | 101 | 240 |
| 2018-22-11 | 101 | 250 -----MAX |
| 2018-22-11 | 102 | 230 |
| 2018-22-11 | 102 | 255 -----MAX |
+------------+-----------+------------------+
DESIRED RESULT (I want the Max Voltage for the Room on the Ddate)
+------------+-----------+------------+
| Ddate | ROOMNO | MaxVoltage |
+------------+-----------+------------+
| 2018-22-11 | 101 | 250 |
| 2018-22-11 | 102 | 255 |
| 2018-22-11 | 103 | 235 |
| 2018-22-11 | 104 | 238 |
| 2018-22-11 | 105 | 255 |
+------------+-----------+------------+
SELECT t2.ddate, t2.roomno, MAX(t2.maxvoltage) AS maxvoltage
FROM table1 AS t1 JOIN table2 AS t2 ON t1.ddate = t2.ddate
AND t1.roomno = t2.roomno
GROUP BY t2.ddate, t2.roomno;
Use a subquery:
select t1.dDate, t1.roomno, mvoltage from table1 t1 join
(select Ddate, roomno, max(MaxVoltage) as mvoltage from table2
group by Ddate, roomno
) t2 on t1.Ddate = t2.Ddate and t1.roomno = t2.roomno
Use apply:
select t1.*, t2.maxvoltage
from table1 t1 outer apply
(select top (1) t2.*
from table2 t2
where t2.roomno = t1.roomno and t2.ddate = t1.ddate
order by maxvoltage desc
) t2;
select t1.dDate,t1.roomno, mvoltage from table1 t1 join
(select Ddate ,roomno,max(MaxVoltage ) as mvoltage from table2
group by Ddate,roomno
) t2 on t1.Ddate=t2.Ddate and t1.Roomno = t2.RoomNo
First join the two tables in the regular way, on the date and room number.
Then apply the aggregate function MAX to the voltage field with an OVER clause partitioned by room number and date, and use DISTINCT to keep one row per room and date, like the following:
select distinct t1.Ddate, t1.RoomNo, MAX(t2.MaxVoltage) over(partition by t1.RoomNo, t1.Ddate) MaxVoltage
from Table1 t1
join Table2 t2 on t2.Ddate = t1.Ddate and t2.RoomNo = t1.RoomNo
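A minimal sketch of the aggregate-then-join approach in SQLite from Python: collapse TABLE2 to one row per (Ddate, ROOMNO) first, then join on both keys. It swaps the inner join for a LEFT JOIN (an assumption about what you want) so rooms with no rows in TABLE2 still appear; since the sample TABLE2 has no readings for rooms 103 and 104, they come out as NULL rather than the values in the desired result.

```python
import sqlite3

# Sample rows from the question (dates kept exactly as written there).
TABLE1 = [("2018-22-11", 101), ("2018-22-11", 102), ("2018-22-11", 103), ("2018-22-11", 104)]
TABLE2 = [
    ("2018-22-11", 101, 230),
    ("2018-22-11", 101, 240),
    ("2018-22-11", 101, 250),
    ("2018-22-11", 102, 230),
    ("2018-22-11", 102, 255),
]

QUERY = """
SELECT t1.Ddate, t1.ROOMNO, m.MaxVoltage
FROM table1 t1
LEFT JOIN (
    -- collapse table2 to one row per (Ddate, ROOMNO) before joining
    SELECT Ddate, ROOMNO, MAX(MaxVoltage) AS MaxVoltage
    FROM table2
    GROUP BY Ddate, ROOMNO
) m ON m.Ddate = t1.Ddate AND m.ROOMNO = t1.ROOMNO  -- join on both keys
ORDER BY t1.ROOMNO
"""


def max_voltage_per_room():
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (Ddate TEXT, ROOMNO INTEGER)")
    con.execute("CREATE TABLE table2 (Ddate TEXT, ROOMNO INTEGER, MaxVoltage INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(max_voltage_per_room())
```

Aggregating before the join means each TABLE1 row meets at most one TABLE2 row, so no GROUP BY or DISTINCT is needed on the outer query.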

How can I write a select statement for this use case?

Please help me compose a SELECT statement. I have these two tables:
Table1
+----+-------+
| ID | PName |
+----+-------+
| 1 | Neil |
| 2 | Mark |
| 3 | Onin |
+----+-------+
Table2
+----+--------+------------+-------------+
| ID | NameID | DateActive | HoursActive |
+----+--------+------------+-------------+
| 1 | 1 | 8/2/2013 | 3 |
| 2 | 1 | 8/3/2013 | 4 |
| 3 | 2 | 8/2/2013 | 2 |
| 4 | 2 | 8/6/2013 | 5 |
| 5 | 3 | 8/7/2013 | 1 |
| 6 | 3 | 8/8/2013 | 10 |
+----+--------+------------+-------------+
I want to retrieve the earliest DateActive for each PName, without duplicate PNames, like this:
PName | DateActive | HoursActive |
----------------------------------------
Neil | 8/2/2013 | 3 |
Mark | 8/2/2013 | 2 |
Onin | 8/7/2013 | 1 |
----------------------------------------
Something like this might do it. You need to find the min date for each NameID first, then join back to the table to get the hours.
SELECT
PName, t2.DateActive AS DateActive, HoursActive
From
Table1 t1
inner Join Table2 t2 on t1.ID = t2.NameID
Inner Join (Select min(DateActive) as mindate, NameID from Table2 Group by NameID) as t3 on t3.mindate = t2.DateActive and t3.NameID = t2.NameID
This should be a pretty standard solution:
select t.pname,
t2.dateactive,
t2.hoursactive
from table1 t
join table2 t2 on t.id = t2.nameid
join (
select nameid, min(dateactive) mindateactive
from table2
group by nameid
) t3 on t2.nameid = t3.nameid
and t3.mindateactive = t2.dateactive
If you are using an RDBMS that supports window functions (PARTITION BY), this is often more efficient, since it avoids joining Table2 back to an aggregate of itself:
select pname, dateactive, HoursActive
from (
select t.pname,
t2.dateactive,
t2.hoursactive,
rank() over (partition by t.id order by t2.dateactive) rn
from table1 t
join table2 t2 on t.id = t2.nameid
) t
where rn = 1
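Both the GROUP BY/join-back query and the rank() query can be checked against the sample data with this SQLite sketch from Python. Assumption: DateActive is stored as ISO-8601 text ('2013-08-02'); stored as the question's '8/2/2013' strings, MIN() and ORDER BY would compare them as text, and '8/10/2013' would sort before '8/2/2013'.

```python
import sqlite3

# Sample rows from the question, with DateActive rewritten as ISO-8601 text.
TABLE1 = [(1, "Neil"), (2, "Mark"), (3, "Onin")]
TABLE2 = [
    (1, 1, "2013-08-02", 3),
    (2, 1, "2013-08-03", 4),
    (3, 2, "2013-08-02", 2),
    (4, 2, "2013-08-06", 5),
    (5, 3, "2013-08-07", 1),
    (6, 3, "2013-08-08", 10),
]

# Aggregate-then-join-back: find each NameID's earliest date, then fetch that row.
GROUPED_QUERY = """
SELECT t.PName, t2.DateActive, t2.HoursActive
FROM table1 t
JOIN table2 t2 ON t.ID = t2.NameID
JOIN (
    SELECT NameID, MIN(DateActive) AS MinDateActive
    FROM table2
    GROUP BY NameID
) t3 ON t3.NameID = t2.NameID AND t3.MinDateActive = t2.DateActive
ORDER BY t.ID
"""

# Window-function variant: rank rows per person by date and keep rank 1.
RANK_QUERY = """
SELECT PName, DateActive, HoursActive
FROM (
    SELECT t.ID, t.PName, t2.DateActive, t2.HoursActive,
           RANK() OVER (PARTITION BY t.ID ORDER BY t2.DateActive) AS rn
    FROM table1 t
    JOIN table2 t2 ON t.ID = t2.NameID
) x
WHERE rn = 1
ORDER BY ID
"""


def run(query):
    """Load the sample rows into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (ID INTEGER, PName TEXT)")
    con.execute(
        "CREATE TABLE table2 (ID INTEGER, NameID INTEGER, DateActive TEXT, HoursActive INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(run(GROUPED_QUERY))
print(run(RANK_QUERY))
```

Both queries return Neil, Mark and Onin once each. Note that RANK() (like the join-back version) would return two rows for a person with two entries on the same earliest date; ROW_NUMBER() would pick one arbitrarily instead.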