sql select query and group by issue - sql

Given the following Contract table records
Id EmployeeId StartDate EndDate
1 5601 2011-01-01 2011-09-01
2 5601 2011-09-02 2012-05-01
3 5601 2012-02-01 2012-08-01
4 5602 2011-01-01 2011-09-01
5 5602 2011-07-01 2012-10-01
Every employee can have multiple contracts.
I'm trying to find, for each employee, the invalid contracts whose StartDate is bigger than EndDate.
For the given data, Id=3 and Id=5 are invalid.
What I have done is:
SELECT a.Id
FROM Contracts a
GROUP BY a.EmpId
HAVING a.StartDate > a.EndDate
But I get this error:
Column 'Contract.Id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the group by clause.
Any ideas?

If I understood correctly, you want the records whose StartDate is not bigger than the previous EndDate?
You can do that with a CTE and the ROW_NUMBER() function, joining each record to the previous one for the same employee.
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY StartDate) RN
FROM Contracts
)
SELECT * FROM CTE c1
INNER JOIN CTE c2 ON c1.RN + 1 = c2.RN AND c1.EmployeeID = c2.EmployeeID
WHERE c1.EndDATE > c2.StartDate
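As a rough check of the previous/current join above, the sketch below runs the same idea against the sample rows. It uses SQLite through Python's sqlite3 module purely so the example is self-contained (an assumption; the thread targets SQL Server), and it needs an SQLite build with window functions (3.25 or later). Returning only c2.Id is a choice made here to list the overlapping contracts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contracts (Id INTEGER, EmployeeId INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO Contracts VALUES (?, ?, ?, ?)",
    [
        (1, 5601, "2011-01-01", "2011-09-01"),
        (2, 5601, "2011-09-02", "2012-05-01"),
        (3, 5601, "2012-02-01", "2012-08-01"),
        (4, 5602, "2011-01-01", "2011-09-01"),
        (5, 5602, "2011-07-01", "2012-10-01"),
    ],
)

# Number each employee's contracts by StartDate, then pair every contract (c2)
# with the one just before it (c1) and keep pairs where c1 ends after c2 starts.
overlap_sql = """
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY StartDate) AS RN
    FROM Contracts
)
SELECT c2.Id
FROM CTE c1
INNER JOIN CTE c2 ON c1.RN + 1 = c2.RN AND c1.EmployeeId = c2.EmployeeId
WHERE c1.EndDate > c2.StartDate
ORDER BY c2.Id
"""
invalid_ids = [row[0] for row in conn.execute(overlap_sql)]
print(invalid_ids)  # [3, 5]
```

ISO-formatted date strings compare correctly as text, which is why plain TEXT columns are enough for this sketch.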

You may try:
SELECT a.Id, a.EmployeeId
FROM Contracts a
WHERE a.StartDate > a.EndDate
GROUP BY a.Id, a.EmployeeId

Related

Order By One Column in MSSQL

I have the following SQL tables:
[Calendar]
[CalendarId]
[ResourceKey]
[Name]
SAMPLE DATA:
CalendarId ResourceKey Name
1 1 tk1-Room1
2 2 tk1-Room2
3 3 tk1-noentries
[CalendarEntry]
[CalendarId]
[CalendarEntryId]
[Start]
[End]
SAMPLE DATA:
CalendarId Start End
1 2019-11-18 16:00:00.0000000 2019-11-18 17:00:00.0000000
1 2019-11-19 16:00:00.0000000 2019-11-19 17:00:00.0000000
2 2019-11-25 16:00:00.0000000 2019-11-25 17:00:00.0000000
1 2019-11-25 17:00:00.0000000 2019-11-25 18:00:00.0000000
Expected output:
Name StartDate EndDate ResourceKey
tk1-Room1 2019-11-25 17:00:00 2019-11-25 18:00:00 1
tk1-Room2 2019-11-25 16:00:00 2019-11-25 17:00:00 2
tk1-noentries NULL NULL 3
I am trying to list all Calendar entries, with their corresponding Start(Most Recent) and End Times.
I have the following code which is working partially:
SELECT Name,StartDate,ResourceKey FROM [Calendar].[dbo].[Calendar] CAL
LEFT JOIN(
SELECT
CalendarId,
MAX(ENT.[Start]) as StartDate
FROM [CalendarEntry] ENT
GROUP BY CalendarId
)
AS ST on CAL.CalendarId = ST.CalendarId
However, if I include the End column in my sub-SELECT, e.g.:
SELECT
CalendarId,
MAX(ENT.[Start]) as StartDate,
ENT.[End] as endDate
I get the following error:
Column 'CalendarEntry.End' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
However, including it in the GROUP BY causes multiple CalendarEntry rows to come back for each Calendar.
What is the best way for me to grab the most recent row out of CalendarEntry which allows me access to all the columns?
Thanks!
This is a typical top 1 per group question.
You can either use row_number():
select *
from (
select
c.*,
e.*,
row_number() over(partition by c.CalendarId order by e.Start desc) rn
from [Calendar].[dbo].[Calendar] c
left join [CalendarEntry] e ON c.CalendarId = e.CalendarId
) t
where rn = 1
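To see the row_number() approach on the sample data, here is a minimal sketch. It runs in SQLite via Python's sqlite3 module only to stay self-contained (an assumption; the question is about SQL Server), and it quotes "Start" and "End" because END is a keyword in SQLite. The StartDate and EndDate aliases follow the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (CalendarId INTEGER, ResourceKey INTEGER, Name TEXT)")
conn.execute('CREATE TABLE CalendarEntry (CalendarId INTEGER, "Start" TEXT, "End" TEXT)')
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?)",
    [(1, 1, "tk1-Room1"), (2, 2, "tk1-Room2"), (3, 3, "tk1-noentries")],
)
conn.executemany(
    "INSERT INTO CalendarEntry VALUES (?, ?, ?)",
    [
        (1, "2019-11-18 16:00:00", "2019-11-18 17:00:00"),
        (1, "2019-11-19 16:00:00", "2019-11-19 17:00:00"),
        (2, "2019-11-25 16:00:00", "2019-11-25 17:00:00"),
        (1, "2019-11-25 17:00:00", "2019-11-25 18:00:00"),
    ],
)

# Rank each calendar's entries from newest to oldest and keep rank 1.
# The LEFT JOIN keeps calendars without entries (their entry columns are NULL).
top1_sql = """
SELECT Name, StartDate, EndDate, ResourceKey
FROM (
    SELECT c.Name, c.ResourceKey,
           e."Start" AS StartDate, e."End" AS EndDate,
           ROW_NUMBER() OVER (PARTITION BY c.CalendarId ORDER BY e."Start" DESC) AS rn
    FROM Calendar c
    LEFT JOIN CalendarEntry e ON c.CalendarId = e.CalendarId
) t
WHERE rn = 1
ORDER BY ResourceKey
"""
latest = conn.execute(top1_sql).fetchall()
for row in latest:
    print(row)
```

Because the whole entry row is carried through the subquery, any other CalendarEntry column can be added to the outer SELECT without touching a GROUP BY.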
Or you can filter with a correlated subquery:
select c.*, e.*
from [Calendar].[dbo].[Calendar] c
left join [CalendarEntry] e
    on c.CalendarId = e.CalendarId
    and e.Start = (
        -- latest Start among this calendar's entries
        select max(e1.Start) from [CalendarEntry] e1 where c.CalendarId = e1.CalendarId
    )
I am trying to list all Calendar entries, with their corresponding Start(Most Recent) and End Times.
I interpret this as the most recent record from CalendarEntry for each CalendarId:
select ce.*
from CalendarEntry ce
where ce.[Start] = (select max(ce2.[Start])
                    from CalendarEntry ce2
                    where ce2.CalendarId = ce.CalendarId
                   );
You can try OUTER APPLY too; however, @GMB's answer is the better approach from a performance perspective.
SELECT C.Name,
       K.[Start] AS StartDate,
       K.[End] AS EndDate,
       C.ResourceKey
FROM dbo.Calendar AS C
OUTER APPLY
(
    -- newest entry for the current calendar, if any
    SELECT TOP 1 *
    FROM dbo.CalendarEntry
    WHERE CalendarId = C.CalendarId
    ORDER BY [Start] DESC,
             [End] DESC
) AS K;
You can also try the LAST_VALUE/FIRST_VALUE functions (available in SQL Server 2012 and later), as below; however, again, @GMB's answer is the better approach from a performance perspective:
SELECT DISTINCT
       Name,
       LAST_VALUE([Start]) OVER (PARTITION BY C.CalendarId
                                 ORDER BY [Start],[End]
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS StartDate,
       LAST_VALUE([End]) OVER (PARTITION BY C.CalendarId
                               ORDER BY [Start],[End]
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EndDate,
       ResourceKey
FROM dbo.Calendar AS C
LEFT JOIN dbo.CalendarEntry
    ON CalendarEntry.CalendarId = C.CalendarId;
If you want to use the FIRST_VALUE function, rewrite the ORDER BY as below:
ORDER BY [Start] DESC,[End] DESC
You also will not need to specify the ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause:
SELECT DISTINCT
       Name,
       FIRST_VALUE([Start]) OVER (PARTITION BY C.CalendarId
                                  ORDER BY [Start] DESC,[End] DESC) AS StartDate,
       FIRST_VALUE([End]) OVER (PARTITION BY C.CalendarId
                                ORDER BY [Start] DESC,[End] DESC) AS EndDate,
       ResourceKey
FROM dbo.Calendar AS C
LEFT JOIN dbo.CalendarEntry
    ON CalendarEntry.CalendarId = C.CalendarId;

SQL analytical functions first value over max(another attribute)

I have a table month_totals, which looks like:
Name DateFrom Total
a 2017-01-01 34
b 2017-01-01 54
a 2017-02-01 22
b 2017-02-01 12
a 2017-03-01 34
b 2017-03-01 54
How can I select the latest Total per Name where DateFrom < '2017-03-01' (possibly using analytical functions)?
The following statement does not work as expected:
SELECT name,
First_value(total)
OVER (
ORDER BY Max(datefrom) DESC)
FROM month_totals
WHERE datefrom < '2017-03-01'
GROUP BY NAME
The desired result should be
Name Total
a 22
b 12
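You can try the query below using MIN aggregation; note that it matches the desired result only because, in the sample data, the latest total before the cutoff also happens to be the smallest one:
SELECT name, min(total)
FROM month_totals
WHERE datefrom < '2017-03-01'
GROUP BY name
Or you can use row_number(), ordering by datefrom descending so that rn = 1 is the latest row per name:
select name, total from
(
    SELECT name, total,
           row_number() over(partition by name order by datefrom desc) rn
    FROM month_totals
    WHERE datefrom < '2017-03-01'
) A where rn = 1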
Although you can use window functions, I think a correlated subquery is a simple enough way to write the query and should have good performance:
select mt.*
from month_totals mt
where mt.datefrom = (select max(mt2.datefrom)
from month_totals mt2
where mt2.name = mt.name and mt2.datefrom < '2017-03-01'
);
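The correlated subquery can be tried on the sample rows with the sketch below. SQLite through Python's sqlite3 module is used only to keep the example self-contained (an assumption; any engine with the same table should behave alike), and the outer query selects just Name and Total to mirror the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE month_totals (Name TEXT, DateFrom TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO month_totals VALUES (?, ?, ?)",
    [
        ("a", "2017-01-01", 34),
        ("b", "2017-01-01", 54),
        ("a", "2017-02-01", 22),
        ("b", "2017-02-01", 12),
        ("a", "2017-03-01", 34),
        ("b", "2017-03-01", 54),
    ],
)

# For each row, keep it only if its DateFrom is the latest one
# before the cutoff for the same Name.
latest_sql = """
SELECT mt.Name, mt.Total
FROM month_totals mt
WHERE mt.DateFrom = (SELECT MAX(mt2.DateFrom)
                     FROM month_totals mt2
                     WHERE mt2.Name = mt.Name
                       AND mt2.DateFrom < '2017-03-01')
ORDER BY mt.Name
"""
latest_totals = conn.execute(latest_sql).fetchall()
print(latest_totals)  # [('a', 22), ('b', 12)]
```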

select rows in sql with end_date >= start_date for each ID repeated multiple times

In my table I have 3 columns (id, start date, and end date) with values like this:
id start date end date
-------------------------------
100 2015-01-01 2015-12-31
100 2016-01-10 2018-12-31
200 2015-02-15 2016-03-15
200 2016-03-15 2016-12-31
300 2016-01-01 2016-12-31
400 2017-01-01 2017-12-31
500 2017-02-01 2017-12-31
600 2017-01-15 2017-03-05
600 2017-02-01 2018-12-31
I want my output to be
id start date end date
--------------------------------
100 2015-01-01 2015-12-31
100 2016-01-10 2018-12-31
200 2015-02-15 2016-12-31
300 2016-01-01 2016-12-31
400 2017-01-01 2017-12-31
500 2017-02-01 2017-12-31
600 2017-01-15 2018-12-31
Query:
select
id, *
from
dbo.test_sl
where
id in (select id
from dbo.test_sl
where end_date >= start_date
group by id)
Please help me get the output I am looking for.
This is an example of a gaps-and-islands problem. In this case, you want to find the rows that do not overlap an earlier row for the same id. These are the starts of groups. A cumulative sum of the group-start flags provides a grouping number, which can be used for aggregation.
In a query, this looks like:
select id, min(start_date), max(end_date)
from (select t.*,
             sum(isstart) over (partition by id order by start_date) as grp
      from (select t.*,
                   -- isstart = 1 when no earlier row of the same id reaches this row's start
                   (case when exists (select 1
                                      from test_sl t2
                                      where t2.id = t.id and
                                            t2.start_date < t.start_date and
                                            t2.end_date >= t.start_date
                                     )
                         then 0 else 1
                    end) as isstart
            from test_sl t
           ) t
     ) t
group by id, grp;
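The gaps-and-islands steps above (flag group starts, take a running sum, aggregate per group) can be checked on the question's data with this sketch. It runs in SQLite through Python's sqlite3 module only for self-containment (an assumption; the thread is about SQL Server), and the aliases s, g, merged_start and merged_end are names introduced here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sl (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO test_sl VALUES (?, ?, ?)",
    [
        (100, "2015-01-01", "2015-12-31"),
        (100, "2016-01-10", "2018-12-31"),
        (200, "2015-02-15", "2016-03-15"),
        (200, "2016-03-15", "2016-12-31"),
        (300, "2016-01-01", "2016-12-31"),
        (400, "2017-01-01", "2017-12-31"),
        (500, "2017-02-01", "2017-12-31"),
        (600, "2017-01-15", "2017-03-05"),
        (600, "2017-02-01", "2018-12-31"),
    ],
)

islands_sql = """
SELECT id, MIN(start_date) AS merged_start, MAX(end_date) AS merged_end
FROM (SELECT s.*,
             -- running count of group starts = group number within the id
             SUM(isstart) OVER (PARTITION BY id ORDER BY start_date) AS grp
      FROM (SELECT t.*,
                   -- 1 when no earlier row of the same id reaches this start date
                   CASE WHEN EXISTS (SELECT 1
                                     FROM test_sl t2
                                     WHERE t2.id = t.id
                                       AND t2.start_date < t.start_date
                                       AND t2.end_date >= t.start_date)
                        THEN 0 ELSE 1 END AS isstart
            FROM test_sl t) s
     ) g
GROUP BY id, grp
ORDER BY id, merged_start
"""
merged = conn.execute(islands_sql).fetchall()
for row in merged:
    print(row)
```

On this data the overlapping rows for ids 200 and 600 collapse into one range each, while id 100 keeps two ranges because of the gap in early January 2016.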
Assuming that only two records can be combined together, you can LEFT JOIN the table with itself and then use a CASE to display the end date of the self-joined record, if available.
SELECT
t1.id,
min(t1.start_date),
CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END
FROM
test_sl t1
LEFT JOIN test_sl t2
ON t1.id = t2.id
AND t2.start_date > t1.start_date
AND t2.start_date <= t1.end_date
GROUP BY
t1.id,
CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END
ORDER BY 1
Tested in this SQL Fiddle
Here's a solution that uses a Recursive CTE.
It basically loops through the dates per id, and keeps the smallest start_date for the overlapping end_date/start_date.
Then the result is grouped so there are no more overlaps.
Test here on rextester.
WITH SRC AS
(
SELECT id, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
FROM test_sl
)
, RCTE AS
(
SELECT id, rn, start_date, end_date
FROM SRC
WHERE rn = 1
UNION ALL
SELECT t.id, t.rn, iif(r.end_date >= t.start_date, r.start_date, t.start_date), t.end_date
FROM RCTE r
JOIN SRC t ON t.id = r.id AND t.rn = r.rn + 1
)
SELECT id, start_date, max(end_date) as end_date
FROM RCTE
GROUP BY id, start_date
ORDER BY id, start_date;

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result that I'm looking for may look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the start date and expiry date of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as you only sum two differences, this does not seem correct to me either...
Additional logic must be applied based on your rules regarding customers with fewer than 3 purchases and overlapping intervals.
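The LEAD variant and the subtraction mentioned above can be reproduced on the sample rows with the following sketch. It uses SQLite through Python's sqlite3 module for self-containment (an assumption; the thread targets SQL Server). SQLite has no DATEDIFF, so the day difference is computed with julianday() instead, which is a substitution made here and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
    "ServiceStartDate TEXT, ServiceExpiryDate TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("A", "X1", "2010-01-01", "2010-06-01"),
        ("A", "X2", "2010-08-12", "2010-12-30"),
        ("B", "X4", "2011-10-01", "2012-01-15"),
        ("B", "X3", "2012-04-01", "2012-06-01"),
        ("B", "X7", "2012-08-01", "2013-10-01"),
        ("A", "X5", "2013-01-01", "2015-06-01"),
    ],
)

# With rows ordered newest first, LEAD() looks at the next-older purchase,
# so prevEnd is the expiry date of the purchase made just before this one.
gaps_sql = """
WITH cte AS (
    SELECT CustomerID, ServiceStartDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY ServiceStartDate DESC) AS rn,
           LEAD(ServiceExpiryDate) OVER (PARTITION BY CustomerID
                                         ORDER BY ServiceStartDate DESC) AS prevEnd
    FROM tab
)
SELECT CustomerID,
       CAST(SUM(julianday(ServiceStartDate) - julianday(prevEnd)) AS INTEGER) AS IntervalDays,
       COUNT(*) AS purchases
FROM cte
WHERE rn <= 3
GROUP BY CustomerID
ORDER BY CustomerID
"""
rows = conn.execute(gaps_sql).fetchall()
print(rows)  # [('A', 805, 3), ('B', 138, 3)]

# Subtracting the purchase count reproduces the figures from the question.
adjusted = [(cust, days - purchases) for cust, days, purchases in rows]
print(adjusted)  # [('A', 802), ('B', 135)]
```

The oldest of the three rows has no previous purchase, so its difference is NULL and SUM simply skips it.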
Assuming there are no overlaps, I think you want this:
select customerId,
       sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
                                     order by ServiceStartDate desc) as seqnum
      from tab t
     ) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138

Concatenation of adjacent dates in SQL

I would like to know how to make intersections or concatenations of adjacent date ranges in sql.
I have a list of customer start and end dates, for example (in dd/mm/yyyy format, where 31/12/9999 means the customer is still a current customer).
CustID | StartDate | Enddate |
1 | 01/08/2011|19/06/2012|
1 | 20/06/2012|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|10/10/2010|
2 | 11/10/2010|31/12/9999|
3 | 01/08/2010|19/08/2010|
3 | 20/08/2010|26/12/2011|
Although the dates in different rows don't overlap, I would consider some of the ranges as a contiguous period of time, e.g. when the start date comes one day after an end date (for a given customer). Hence I would like a query that returns just the merged date ranges:
CustID | StartDate | Enddate |
1 | 01/08/2011|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|31/12/9999|
3 | 01/08/2010|26/12/2011|
I've looked at CTEs, but I can't figure out how to return just one row for one contiguous block of dates.
This should work in SQL Server 2005 and later:
;WITH cte2 AS (SELECT 0 AS Number
UNION ALL
SELECT Number + 1
FROM cte2
WHERE Number < 10000)
SELECT CustID, Min(GroupStart) StartDate, MAX(EndDate) EndDate
FROM (SELECT *
, DATEADD(DAY,b.number,a.StartDate) GroupStart
, DATEADD(DAY,1- DENSE_RANK() OVER (PARTITION BY CustID ORDER BY DATEADD(DAY,b.number,a.StartDate)),DATEADD(DAY,b.number,a.StartDate)) GroupDate
FROM Table1 a
JOIN cte2 b
ON b.number <= DATEDIFF(d, startdate, EndDate)
) X
GROUP BY CustID, GroupDate
ORDER BY CustID, StartDate
OPTION (MAXRECURSION 0)
Demo: SQL Fiddle
You can build a permanent table of numbers, from 0 up to something large enough to cover the spread of dates in your ranges, to replace the CTE so it doesn't run each time; indexed properly, it will run quickly.
You can do this with a recursive common table expression:
with cte as (
select t.CustID, t.StartDate, t.EndDate, t2.StartDate as NextStartDate
from Table1 as t
left outer join Table1 as t2 on t2.CustID = t.CustID and t2.StartDate = case when t.EndDate < '99991231' then dateadd(dd, 1, t.EndDate) end
), cte2 as (
select c.CustID, c.StartDate, c.EndDate, c.NextStartDate
from cte as c
where c.NextStartDate is null
union all
select c.CustID, c.StartDate, c2.EndDate, c2.NextStartDate
from cte2 as c2
inner join cte as c on c.CustID = c2.CustID and c.NextStartDate = c2.StartDate
)
select CustID, min(StartDate) as StartDate, EndDate
from cte2
group by CustID, EndDate
order by CustID, StartDate
option (maxrecursion 0);
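A sketch of the same recursive idea on the question's rows follows, with the dates rewritten in ISO format. It runs in SQLite via Python's sqlite3 module to stay self-contained (an assumption; the answer is written for SQL Server), so dateadd is replaced by SQLite's date(..., '+1 day'), WITH RECURSIVE is spelled out, and the MergedStart alias is a name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (CustID INTEGER, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "2011-08-01", "2012-06-19"),
        (1, "2012-06-20", "2012-03-07"),  # kept exactly as given in the question
        (1, "2012-05-03", "9999-12-31"),
        (2, "2009-03-09", "2009-08-16"),
        (2, "2010-01-16", "2010-10-10"),
        (2, "2010-10-11", "9999-12-31"),
        (3, "2010-08-01", "2010-08-19"),
        (3, "2010-08-20", "2011-12-26"),
    ],
)

merge_sql = """
WITH RECURSIVE cte AS (
    -- attach to each range the start of the range that begins the next day
    SELECT t.CustID, t.StartDate, t.EndDate, t2.StartDate AS NextStartDate
    FROM Table1 AS t
    LEFT JOIN Table1 AS t2
        ON t2.CustID = t.CustID
       AND t2.StartDate = CASE WHEN t.EndDate < '9999-12-31'
                               THEN date(t.EndDate, '+1 day') END
), cte2 AS (
    -- anchor: ranges with no follower are the ends of contiguous blocks
    SELECT c.CustID, c.StartDate, c.EndDate, c.NextStartDate
    FROM cte AS c
    WHERE c.NextStartDate IS NULL
    UNION ALL
    -- walk backwards, carrying the block's end date to earlier ranges
    SELECT c.CustID, c.StartDate, c2.EndDate, c2.NextStartDate
    FROM cte2 AS c2
    INNER JOIN cte AS c
        ON c.CustID = c2.CustID AND c.NextStartDate = c2.StartDate
)
SELECT CustID, MIN(StartDate) AS MergedStart, EndDate
FROM cte2
GROUP BY CustID, EndDate
ORDER BY CustID, MergedStart
"""
blocks = conn.execute(merge_sql).fetchall()
for row in blocks:
    print(row)
```

Each contiguous block ends up with one row: the earliest start date reached by the backward walk, paired with the end date of the block's last range.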
sql fiddle demo
Quick performance tests:
Results on 750 rows, small periods of 2 days length:
sql fiddle demo
My query: 300 ms
Goat CO query with CTE: 10804 ms
Goat CO query with table of fixed numbers: 7 ms
Results on 5 rows, large periods:
sql fiddle demo
My query: 1 ms
Goat CO query with CTE: 700 ms
Goat CO query with table of fixed numbers: 36 ms