SQL: Select Multiple Columns with Max() on calculated values

SQL: Select Multiple Columns with Max() on calculated values - sql

Real basic: I have table T with following data:
ID StartDate Term (months)
----------------------
1 10/1/2012 12
2 10/1/2012 24
3 12/1/2012 12
I need to know the ID of the row that has the max end date. I've successfully calculated the end date as
select max( DateAdd(month, term, StartDate) from table [this would result in 10/1/2014]
how do i get the ID value and Start Date of the row that contains the max end date?

MS SQL:
SELECT TOP 1 ID, StartDate
FROM T
ORDER BY DateAdd(month, term, StartDate) DESC
MySQL:
SELECT ID, StartDate
FROM T
ORDER BY DateAdd(month, term, StartDate) DESC
LIMIT 1

In case more than one ID has the same extreme "end date" and you need them all, you can try this:
SELECT x.id
FROM (
SELECT id
, RANK ( ) OVER ( ORDER BY DateAdd(month, term, StartDate) DESC) as rn
FROM T
) x
WHERE t.rn = 1

Related

Oracle SQL LAG() function results in duplicate rows

I have a very simple query that results in two rows:
SELECT DISTINCT
id,
trunc(start_date) start_date
FROM example.table
WHERE ID = 1
This results in the following rows:
id start_date
1 7/1/2012
1 9/1/2016
I want to add a column that simply shows the previous date for each row. So I'm using the following:
SELECT DISTINCT id,
Trunc(start_date) start_date,
Lag(start_date, 1)
over (
ORDER BY start_date) pdate
FROM example.table
WHERE id = 1
However, when I do this, I get four rows instead of two:
id start_date pdate
1 7/1/2012 NULL
1 7/1/2012 7/1/2012
1 9/1/2016 7/1/2012
1 9/1/2016 9/1/2012
If I change the offset to 2 or 3 the results remain the same. If I change the offset to 0, I get two rows again but of course now the start_date == pdate.
I can't figure out what's going on

Use an explicit GROUP BY instead:
SELECT id, trunc(start_date) as start_date,
LAG(trunc(start_date)) OVER (PARTITION BY id ORDER BY trunc(start_date))
FROM example.table
WHERE ID = 1
GROUP BY id, trunc(start_date)

The reason for this is: the order of execution of an SQL statements, is that LAG runs before the DISTINCT.
You actually want to run the LAG after the DISTINCT, so the right query should be:
WITH t1 AS (
SELECT DISTINCT id, trunc(start_date) start_date
FROM example.table
WHERE ID = 1
)
SELECT *, LAG(start_date, 1) OVER (ORDER BY start_date) pdate
FROM t1

SQL Query - Combine rows based on multiple columns

On the image above, I'd like to combine rows with the same value on consecutive days.
Combined rows will have the earliest date on From column and the latest date on To column.
Looking at the example, even if Rows 3 and 4 have the same value, they were not combined because of the date gap.
I've tried using LAG and LEAD functions but no luck.

You can try below way -
DEMO
with c as
(
select *, datediff(dd,todate,laedval) as leaddiff,
datediff(dd,todate,lagval) as lagdiff
from
(
select *,lead(todate) over(partition by value order by todate) laedval,
lag(todate) over(partition by value order by todate) lagval
from t1
)A
)
select * from
(
select value,min(todate) as fromdate,max(todate) as todate from c
where coalesce(leaddiff,0)+coalesce(lagdiff,0) in (1,-1)
group by value
union all
select value,fromdate,todate from c
where coalesce(leaddiff,0)+coalesce(lagdiff,0)>1 or coalesce(leaddiff,0)+coalesce(lagdiff,0)<-1
)A order by value
OUTPUT:
value fromdate todate
1 16/07/2019 00:00:00 17/07/2019 00:00:00
3 21/07/2019 00:00:00 26/07/2019 00:00:00
2 18/07/2019 00:00:00 18/07/2019 00:00:00
2 20/07/2019 00:00:00 20/07/2019 00:00:00

I am going to recommend the following approach:
Find where each new group begins. You can do this by comparing the previous maximum todate with the fromdate in this row.
Do a cumulative sum of the starts to define a group.
Aggregate the results.
This can be handled using window functions and aggregation:
select value, min(fromdate) as fromdate, max(todate) as todate
from (select t.*,
sum(case when prev_todate >= dateadd(day, -1, fromdate)
then 0 -- overlap, so this does not begin a new group
else 1 -- no overlap, so this does begin a new group
end) over
(partition by value order by fromdate) as grp
from (select t.*,
max(todate) over (partition by value
order by fromdate
rows between unbounded preceding and 1 preceding
) as prev_todate
from t
) t
) t
group by value, grp
order by value, min(fromdate);
Here is a db<>fiddle.

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?

You can do that using the lag() function in MS SQL (but that is available starting with 2012?).
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
If lag() is not available in MS SQL 2008, then you can mimic it with row_number() and joining.

select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
Working SQLFiddle example is -> Here
FOR 2008 VERSION
SELECT
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
FROM t X
LEFT JOIN (
SELECT
*
, (SELECT MAX(ID) FROM t WHERE ID < A.ID) as ID2
FROM t A) Y ON X.ID = Y.ID2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1
Working SQLFiddle example is -> Here

This should work in 2008, it assumes that ranges in your table do not overlap. It will also eliminate rows where the end_date of the current row is a day before the start date of the next row.
with dtRanges as (
select start_dt, end_dt, row_number() over (order by start_dt) as rownum
from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
FROM
( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
from dtRanges dr1
left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
) t2
where
t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')

http://sqlfiddle.com/#!18/65238/1
SELECT
*
FROM
(
SELECT
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
FROM
yourTable
)
gaps
WHERE
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's really clear that my range is one day long, if you subtract one from the other you get a day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)

This solves your case, but provide some sample data if there will ever be overlaps, fringe cases, etc.
Take one day after your end date and 1 day before the next line's start date.
DECLARE # TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO # (id, start_dt, end_dt, prc)
VALUES
(10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
(10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
(10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
, DATEADD(DAY, -1, LEAD(start_dt, 1, '9999-12-31') OVER(ORDER BY id) )
FROM #

You may want to take a look at this:
http://sqlfiddle.com/#!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.

how can i create a loop in my SQL query that will look for each week out of the year and return the top 10 values for each week?

I am writing a query that returns the total number of submissions by person per week. Like Below...
Person WeekNumber Total
ABC 1 12
ADE 1 10
ACD 1 8
LKJ 2 15
HJK 2 14
FGH 2 12
So far, I have the query to do everything EXCEPT being able to select the top Person from EACH WEEK. I am thinking I might need to use a loop to do this but just trying to see if anyone might have a better/easier idea?
Here is my query:
Select sub.Person, sub.WeekNumber, sum(sr_id_count) as TotalSRID
from
(
SELECT
Person,
DATEDIFF(week, '2016-12-25', create_date) AS WeekNumber,
count(SR_ID) as SR_ID_COUNT
from [dbo].[tbl_Hist]
where create_date >= '01/01/2017'
and SR_STatus <> 'Canceled'
and Created_by <> 'System'
group by person, create_date
) sub
group by sub.Person, sub.WeekNumber
order by WeekNumber, TotalSRID desc

The syntax is SQL Server, so I will assume that is the database you are using. If so, you can use row_number(). To get the top person per week:
select *
from (select Person, datediff(week, '2016-12-25', create_date) AS WeekNumber,
count(SR_ID) as SR_ID_COUNT,
row_number() over (partition by datediff(week, '2016-12-25', create_date order by count(SR_ID) desc) as seqnum
from [dbo].[tbl_Hist]
where create_date >= '2017-01-01' and SR_STatus <> 'Canceled' and
Created_by <> 'System'
group by person, datediff(week, '2016-12-25', create_date)
) sub
where seqnum = 1;
If you want the top 10 per week, then use seqnum <= 10.

To club the rows for week days

I have data like below:
StartDate EndDate Duration
----------
41890 41892 3
41898 41900 3
41906 41907 2
41910 41910 1
StartDate and EndDate are respective ID values for any dates from calendar. I want to calculate the sum of duration for consecutive days. Here I want to include the days which are weekends. E.g. in the above data, let's say 41908 and 41909 are weekends, then my required result set should look like below.
I already have another proc that can return me the next working day, i.e. if I pass 41907 or 41908 or 41909 as DateID in that proc, it will return 41910 as the next working day. Basically I want to check if the DateID returned by my proc when I pass the above EndDateID is same as the next StartDateID from above data, then both the rows should be clubbed. Below is the data I want to get.
ID StartDate EndDate Duration
----------
278457 41890 41892 3
278457 41898 41900 3
278457 41906 41910 3
Please let me know in case the requirement is not clear, I can explain further.
My Date Table is like below:
DateId Date Day
----------
41906 09-04-2014 Thursday
41907 09-05-2014 Friday
41908 09-06-2014 Saturdat
41909 09-07-2014 Sunday
41910 09-08-2014 Monday
Here is the SQL Code for setup:
CREATE TABLE Table1
(
StartDate INT,
EndDate INT,
LeaveDuration INT
)
INSERT INTO Table1
VALUES(41890, 41892, 3),
(41898, 41900, 3),
(41906, 41907, 3),
(41910, 41910, 1)
CREATE TABLE DateTable
(
DateID INT,
Date DATETIME,
Day VARCHAR(20)
)
INSERT INTO DateTable
VALUES(41907, '09-05-2014', 'Friday'),
(41908, '09-06-2014', 'Saturday'),
(41909, '09-07-2014', 'Sunday'),
(41910, '09-08-2014', 'Monday'),
(41911, '09-09-2014', 'Tuesday')

This is rather complicated. Here is an approach using window functions.
First, use the date table to enumerate the dates without weekends (you can also take out holidays if you want). Then, expand the periods into one day per row, by using a non-equijoin.
You can then use a trick to identify sequential days. This trick is to generate a sequential number for each id and subtract it from the sequential number for the dates. This is a constant for sequential days. The final step is simply an aggregation.
The resulting query is something like this:
with d as (
select d.*, row_number() over (order by date) as seqnum
from dates d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(d.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from table t join
d ds
on ds.date between t.startdate and t.enddate
) t
group by t.id, grp;
EDIT:
The following is the version on this SQL Fiddle:
with d as (
select d.*, row_number() over (order by date) as seqnum
from datetable d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from (select t.*, 'abc' as id from table1 t) t join
d ds
on ds.dateid between t.startdate and t.enddate
) t
group by grp;
I believe this is working, but the date table doesn't have all the dates in it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: Select Multiple Columns with Max() on calculated values - sql

MS SQL: SELECT TOP 1 ID, StartDate FROM T ORDER BY DateAdd(month, term, StartDate) DESC MySQL: SELECT ID, StartDate FROM T ORDER BY DateAdd(month, term, StartDate) DESC LIMIT 1

In case more than one ID has the same extreme "end date" and you need them all, you can try this: SELECT x.id FROM ( SELECT id , RANK ( ) OVER ( ORDER BY DateAdd(month, term, StartDate) DESC) as rn FROM T ) x WHERE t.rn = 1

Related

Oracle SQL LAG() function results in duplicate rows

SQL Query - Combine rows based on multiple columns

SQL how to write a query that return missing date ranges?

how can i create a loop in my SQL query that will look for each week out of the year and return the top 10 values for each week?

To club the rows for week days

Categories

Resources