Remove all duplicated records from a result set (remove both) - sql

I have a result set, generated as a CTE using UNION, that contains duplicate records, as in the image below:
And the query is:
WITH CTE (StartTime ,EndTime )
AS
(
SELECT StartTime ,EndTime, Null as Exclude, SupplierId FROM cms.TimeSlotMaster
WHERE Monday = 1 AND SupplierID IS NULL
UNION
SELECT StartTime ,EndTime FROM cms.TimeSlotOverRider
WHERE SupplierID IS NULL
AND StartDate <= cast(GETDATE() as DATE) AND EndDate >= cast(GETDATE() as DATE)
)
Now I am trying to remove the duplicated rows from this result set entirely, dropping both copies of each. The final result set should contain only 2 rows and look like below:
Any help would be appreciated. Thanks.
For more information, the first result set is generated using the CTE shown above.

You can use NOT EXISTS:
SELECT t.*
FROM dbo.TableName t
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.TableName t2
WHERE t.ID <> t2.ID
AND t.StartTime = t2.StartTime
AND t.EndTime = t2.EndTime
)
or - if you don't have a primary key in this table:
WITH CTE AS
(
SELECT t.*, cnt = COUNT(*) OVER (PARTITION BY StartTime, EndTime)
FROM dbo.TableName t
)
SELECT StartTime, EndTime
FROM CTE
WHERE cnt = 1
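As a rough sketch of how the counting idea could be applied to the two tables from the question (assuming only StartTime and EndTime are needed, and dropping the extra Exclude and SupplierId columns), the branches can be combined with UNION ALL so that the duplicates survive long enough to be counted:

```sql
-- Sketch only: table and column names are taken from the question;
-- selecting just StartTime and EndTime is an assumption.
WITH CTE AS
(
    SELECT StartTime, EndTime
    FROM cms.TimeSlotMaster
    WHERE Monday = 1 AND SupplierID IS NULL
    UNION ALL  -- UNION would collapse identical rows before they can be counted
    SELECT StartTime, EndTime
    FROM cms.TimeSlotOverRider
    WHERE SupplierID IS NULL
      AND StartDate <= CAST(GETDATE() AS date)
      AND EndDate >= CAST(GETDATE() AS date)
)
SELECT StartTime, EndTime
FROM CTE
GROUP BY StartTime, EndTime
HAVING COUNT(*) = 1;  -- keep only slots that occur exactly once
```

Any StartTime/EndTime pair that occurs more than once across the two sources is dropped entirely, which is the "remove both" behaviour asked for.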

Related

Avoiding use of a temp table by using sub queries

I want to create a subquery to avoid the use of a temp table. Right now I have:
select id,COUNT (id)as Attempts
into #tmp
from Table1
where State in ('SD')
and Date >= cast( GETDATE() -7 as date )
group by Accountid having COUNT (accountid) > 2
select *
from #tmp a join Table1 b on a.id= b.id
and b.Date >= cast( GETDATE() -7 as date )
where CAST(Date as date) = cast(GETDATE()-1 as date)
order by a.id,b.Date
Is there a way to get this result in just one query?
Replace #tmp in the second query with the first query enclosed in parentheses, as in:
select *
from (
select id,COUNT (id) as Attempts
from Table1
where State in ('SD')
and Date >= cast( GETDATE() -7 as date )
group by Accountid having COUNT (accountid) > 2
) a join Table1 b on a.id= b.id
and b.Date >= cast( GETDATE() -7 as date )
where CAST(Date as date) = cast(GETDATE()-1 as date)
order by a.id,b.Date
The first query becomes a "table expression".
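If the nested form is hard to read, the same table expression can probably be written as a CTE instead. This is a sketch, not a tested rewrite: the name recent_attempts is made up here, and it assumes the grouping column is meant to be id (the question mixes id and Accountid).

```sql
-- Sketch only: recent_attempts is a hypothetical name, and grouping by id
-- (rather than Accountid) is an assumption about the intended column.
WITH recent_attempts AS
(
    SELECT id, COUNT(id) AS Attempts
    FROM Table1
    WHERE State IN ('SD')
      AND Date >= CAST(GETDATE() - 7 AS date)
    GROUP BY id
    HAVING COUNT(id) > 2
)
SELECT *
FROM recent_attempts a
JOIN Table1 b
  ON a.id = b.id
 AND b.Date >= CAST(GETDATE() - 7 AS date)
WHERE CAST(b.Date AS date) = CAST(GETDATE() - 1 AS date)
ORDER BY a.id, b.Date;
```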

Select date ranges where periods do not overlap

I have two tables each containing the start and end dates of several periods. I want an efficient way to find periods (date ranges) where dates are within the ranges of the first table but not within ranges of the second table.
For example, if this is my first table (with dates that I want)
start_date end_date
2001-01-01 2010-01-01
2012-01-01 2015-01-01
And this is my second table (with dates that I do not want)
start_date end_date
2002-01-01 2006-01-01
2003-01-01 2004-01-01
2005-01-01 2009-01-01
2014-01-01 2018-01-01
Then the output should look like
start_date end_date
2001-01-01 2001-12-31
2009-01-02 2010-01-01
2012-01-01 2013-12-31
We can safely assume that periods in the first table are non-overlapping, but cannot assume that periods in the second table are non-overlapping.
I already have a method for doing this, but it is an order of magnitude slower than I can accept, so I am hoping someone can propose a faster approach.
My present method looks like:
(1) merge table 2 into non-overlapping periods
(2) find the inverse of table 2
(3) join overlapping periods from table 1 and inverted-table-2
I am sure there is a faster way if some of these steps can be merged together.
In more detail
/* (1) merge overlapping periods */
WITH
spell_starts AS (
SELECT [start_date], [end_date]
FROM table_2 s1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 s2
WHERE s2.[start_date] < s1.[start_date]
AND s1.[start_date] <= s2.[end_date]
)
),
spell_ends AS (
SELECT [start_date], [end_date]
FROM table_2 t1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 t2
WHERE t2.[start_date] <= t1.[end_date]
AND t1.[end_date] < t2.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]
/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
/* all forward looking spells */
SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
FROM merge_table_2
UNION ALL
/* back looking spell (to 'origin of time') created separately */
SELECT '1900-01-01' AS [start_date]
,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date]
AND [end_date] <= '9999-01-01'
/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date
Hope this helps. I have commented the two CTEs I am using for explanation purposes.
Here you go:
drop table table1
select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01', '2015-01-01'
drop table table2
select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01', '2004-01-01'
union select '2005-01-01', '2009-01-01'
union select '2014-01-01', '2018-01-01'
/***** Solution *****/
-- This CTE puts all dates into one column
with cte as
(
select t
from
(
select start_date as t
from table1
union all
select end_date
from table1
union all
select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
from table2
union all
select dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
from table2
)a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e
If you want performance, then you want to use window functions.
The idea is to:
Combine the dates with flags of being in-and-out of the two tables.
Use cumulative sums to determine where dates start being in-and-out.
Then you have a gaps-and-islands problem where you want to combine the results.
Finally, filter on the particular periods you want.
This looks like:
with dates as (
select start_date as dte, 1 as in1, 0 as in2
from table1
union all
select dateadd(day, 1, end_date), -1, 0
from table1
union all
select start_date, 0, 1 as in2
from table2
union all
select dateadd(day, 1, end_date), 0, -1
from table2
),
d as (
select dte,
sum(sum(in1)) over (order by dte) as ins_1,
sum(sum(in2)) over (order by dte) as ins_2
from dates
group by dte
)
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte,
row_number() over (order by dte) as seqnum,
row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
from d
) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);
Here is a db<>fiddle.
Thanks to @zip and @Gordon for their answers. Both were superior to my initial approach. However, the following solution was faster than both of their approaches in my environment and context:
WITH acceptable_starts AS (
SELECT [start_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE a.[start_date] BETWEEN b.[start_date] AND b.[end_date] -- a table1 start is acceptable only if it is not inside a table2 period
)
UNION ALL
SELECT DATEADD(DAY, 1, [end_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
)
),
acceptable_ends AS (
SELECT [end_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE a.[end_date] BETWEEN b.[start_date] AND b.[end_date] -- a table1 end is acceptable only if it is not inside a table2 period
)
UNION ALL
SELECT DATEADD(DAY, -1, [start_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] < e.[end_date]
GROUP BY s.[start_date]

Calculating AVG for NULL values from all previous rows

I have a table with 4 columns like this: EmployeeID, Date, StartTime, EndTime. The first two columns are not nullable, but the other two are.
I want to generate a report and fill the missing StartTime and EndTime with AVG value of the previous rows. I'm using the following statement for the StartTime column:
ISNULL([StartTime], DATEADD(SECOND, AVG([dbo].[GetTimeInSecondsFromDateTime]([StartTime])) OVER (PARTITION BY [EmployeeID] ORDER BY [Date]), [Date]))
The problem is that when I have 2 NULL values one after another, they get the same value (the AVG of all the previous ones). What I need is for the calculation of the second NULL value to also include the previous one (which is itself calculated), and I have no idea how to implement it.
The query is not tested. I hope it helps.
Because of the NULL values, I suggest you first update StartTime:
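UPDATE t1
SET StartTime = ISNULL(StartTime, t2.AvgStartTime)
FROM yourTable t1
JOIN (
SELECT
EmployeeID,
Date,
-- running average per employee, ordered by date; if StartTime is a date/time type,
-- average a numeric conversion (such as the question's GetTimeInSecondsFromDateTime) instead
Avg(StartTime) OVER(PARTITION BY EmployeeID ORDER BY Date Asc) As AvgStartTime
FROM yourTable
) t2 ON t1.EmployeeID = t2.EmployeeID AND t1.Date = t2.Date
Where
t1.StartTime is null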
Then for EndTime:
UPDATE t1
SET EndTime = ISNULL(EndTime, t2.AvgEndTime)
FROM yourTable t1
JOIN (
SELECT
EmployeeID,
Date,
-- running average per employee, ordered by date
Avg(EndTime) OVER(PARTITION BY EmployeeID ORDER BY Date Asc) As AvgEndTime
FROM yourTable
) t2 ON t1.EmployeeID = t2.EmployeeID AND t1.Date = t2.Date
Where
t1.EndTime is null
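A side note on the question's premise, offered as a sketch rather than a tested claim about the asker's data: if a missing value is filled with the mean of the preceding known values, then including that filled value in the next average does not change the result. Writing m_n for the mean of the n known values x_1, ..., x_n (notation introduced here for illustration):

```latex
m_n = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\frac{\left(\sum_{i=1}^{n} x_i\right) + m_n}{n+1}
  = \frac{n\,m_n + m_n}{n+1}
  = m_n
```

So two consecutive NULL rows would receive the same value either way, apart from rounding differences when the average is computed on integer seconds.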

SQL - create multiple calendars in one table

I am trying to create a table which calculates multiple calendars depending on the start and end date of a certain row. I have a table which looks like this:
key Start_date End_date
123.1 1-10-2009 24-12-2009
123.2 1-7-2010 9-2-2011
123.3 1-5-2011 30-10-2011
.........
For each key I want a new row with startdate +1 month until enddate.
For now I have a query which works only if my temporary table contains one row:
DECLARE @StartDate DATE = (select Start_date from #dim2);
SET DATEFIRST 7;
SET DATEFORMAT ymd;
SET LANGUAGE US_ENGLISH;
DECLARE @CutoffDate DATE = (select End_date from #dim2);
CREATE TABLE #dim3
([verwachte_aflossing] DATE,-- PRIMARY KEY,
);
INSERT #dim3([verwachte_aflossing] )
SELECT d
FROM
(
SELECT d = DATEADD(month, rn-1, @StartDate)
FROM
(
SELECT TOP (DATEDIFF(month, @StartDate, @CutoffDate))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y
Does anyone know how to handle this?
For SQL Server, a recursive CTE is one option to do that:
WITH CTE as
(
select [Key], start_date, end_date from table
union all
select [Key], dateadd(month,1,start_date) start_date, end_date
from cte
where datediff(month, start_date, end_date) > 0
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)
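If recursion is not wanted, the numbers-table approach from the question could likely be extended to many rows with CROSS APPLY. The sketch below assumes the source rows live in #dim2 with columns [key], Start_date and End_date (names taken from the question), and that the month containing the end date should be included, hence the + 1.

```sql
-- Sketch only: #dim2 and its column names come from the question;
-- including the final month (the "+ 1") is an assumption.
SELECT d.[key],
       DATEADD(MONTH, n.rn - 1, d.Start_date) AS verwachte_aflossing
FROM #dim2 AS d
CROSS APPLY
(
    -- one numbered row per month between this row's start and end date
    SELECT TOP (DATEDIFF(MONTH, d.Start_date, d.End_date) + 1)
           rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
    ORDER BY rn
) AS n
ORDER BY d.[key], verwachte_aflossing;
```

Because each date is computed from the original Start_date rather than from the previous row, a start on the 31st does not drift to the 28th after passing through February, which can happen when a month is added repeatedly in a recursive step.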

Select rows where value is equal given value or lower and nearest to it

Sorry for the confusing title. Please tell me whether this is possible with a database query. Assume we have the following table:
ind_id name value date
----------- -------------------- ----------- ----------
1 a 10 2010-01-01
1 a 20 2010-01-02
1 a 30 2010-01-03
2 b 10 2010-01-01
2 b 20 2010-01-02
2 b 30 2010-01-03
2 b 40 2010-01-04
3 c 10 2010-01-01
3 c 20 2010-01-02
3 c 30 2010-01-03
3 c 40 2010-01-04
3 c 50 2010-01-05
4 d 10 2010-01-05
I need to query all rows so that each ind_id is included once for the given date. If there is no row for an ind_id on the given date, take the nearest earlier date; if there are no earlier dates, return ind_id + name (name/ind_id pairs are equal) with NULLs.
For example, if the date is 2010-01-04, I expect the following result:
ind_id name value date
----------- -------------------- ----------- ----------
1 a 30 2010-01-03
2 b 40 2010-01-04
3 c 40 2010-01-04
4 d NULL NULL
If it's possible, I would be very grateful if someone could help me build the query. I'm using SQL Server 2008.
Check this SQL FIDDLE DEMO
with CTE_test
as
(
select int_id,
max(date) MaxDate
from test
where date<='2010-01-04 00:00:00:000'
group by int_id
)
select A.int_id, A.[Value], A.[Date]
from test A
inner join CTE_test B
on a.int_id=b.int_id
and a.date = b.Maxdate
union all
select int_id, null, null
from test
where int_id not in (select int_id from CTE_test)
(Updated) Try:
with cte as
(select m.*,
max(date) over (partition by ind_id) max_date,
max(case when date <= @date then date end) over
(partition by ind_id) max_acc_date
from myTable m)
select ind_id,
name,
case when max_acc_date is null then null else value end value,
max_acc_date date
from cte c
where date = coalesce(max_acc_date, max_date)
(SQLFiddle here)
Here is a query that returns the result that you are looking for:
SELECT
t1.ind_id
, CASE WHEN t1.date <= '2010-01-04' THEN t1.value ELSE null END
FROM test t1
WHERE t1.date=COALESCE(
(SELECT MAX(DATE)
FROM test t2
WHERE t2.ind_id=t1.ind_id AND t2.date <= '2010-01-04')
, t1.date)
The idea is to pick a row in a correlated query such that its ID matches that of the current row, and the date is the highest one prior to your target date of '2010-01-04'.
When no such row exists, the date of the current row is returned. This date needs to be replaced with a null; this is what the CASE expression at the top is doing.
Here is a demo on sqlfiddle.
You can use something like:
declare @date date = '2010-01-04'
;with ids as
(
select distinct ind_id
from myTable
)
,ranks as
(
select *
, ranking = row_number() over (partition by ind_id order by date desc)
from myTable
where date <= @date
)
select ids.ind_id
, ranks.value
, ranks.date
from ids
left join ranks on ids.ind_id = ranks.ind_id and ranks.ranking = 1
SQL Fiddle with demo.
Ideally you wouldn't be using the DISTINCT statement to get the ind_id values to include, but I've used it in this case to get the results you needed.
Also, standard disclaimer for these sorts of queries; if you have duplicate data you should consider a tie-breaker column in the ORDER BY or using RANK instead of ROW_NUMBER.
Edited after the OP's update
Just add the new column into the existing query:
with ids as
(
select distinct ind_id, name
from myTable
)
,ranks as
(
select *
, ranking = row_number() over (partition by ind_id order by date desc)
from myTable
where date <= @date
)
select ids.ind_id
, ids.name
, ranks.value
, ranks.date
from ids
left join ranks on ids.ind_id = ranks.ind_id and ranks.ranking = 1
SQL Fiddle with demo.
As with the previous one it would be best to get the ind_id/name information through joining to a standing data table if available.
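The tie-breaker mentioned in the disclaimer above could look something like the following. This is only a sketch: it reuses myTable and the column names from this answer, and picking value DESC as the second sort key is an arbitrary assumption made for illustration.

```sql
-- Sketch only: myTable and its columns follow the answer above;
-- using value DESC as the tie-breaker is an assumption.
DECLARE @date date = '2010-01-04';

WITH ids AS
(
    SELECT DISTINCT ind_id, name
    FROM myTable
),
ranks AS
(
    SELECT ind_id, value, date,
           ranking = ROW_NUMBER() OVER (
               PARTITION BY ind_id
               ORDER BY date DESC, value DESC  -- second key decides between rows sharing a date
           )
    FROM myTable
    WHERE date <= @date
)
SELECT ids.ind_id, ids.name, ranks.value, ranks.date
FROM ids
LEFT JOIN ranks
  ON ids.ind_id = ranks.ind_id
 AND ranks.ranking = 1;
```

With the extra sort key, the row chosen for an ind_id with several rows on the same date no longer depends on whatever order the engine happens to return them in.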
Try
DECLARE @date DATETIME;
SET @date = '2010-01-04';
WITH temp1 AS
(
SELECT t.ind_id
, t.name
, CASE WHEN t.date <= @date THEN t.value ELSE NULL END AS value
, CASE WHEN t.date <= @date THEN t.date ELSE NULL END AS date
FROM test1 AS t
),
temp AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ind_id ORDER BY t.date DESC) AS rn
FROM temp1 AS t
WHERE t.date <= @date OR t.date IS NULL
)
SELECT *
FROM temp AS t
WHERE rn = 1
Use an option with the EXISTS operator:
DECLARE @date date = '20100104'
SELECT ind_id,
CASE WHEN date <= @date THEN value END AS value,
CASE WHEN date <= @date THEN date END AS date
FROM dbo.test57 t
WHERE EXISTS (
SELECT 1
FROM dbo.test57 t2
WHERE t.ind_id = t2.ind_id AND t2.date <= @date
HAVING ISNULL(MAX(t2.date), t.date) = t.date
)
Demo on SQLFiddle
This is not an exact answer, but it will give you the concept; I wrote it down quickly without any testing.
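-- Sketch for a single ind_id; @ind_id and @date are assumed parameters,
-- and myTable stands in for the question's table.
DECLARE @ind_id int = 1, @date date = '2010-01-04';
IF EXISTS (SELECT 1 FROM myTable WHERE ind_id = @ind_id AND date = @date)
    -- your code to get the matching value
    SELECT ind_id, name, value, date FROM myTable
    WHERE ind_id = @ind_id AND date = @date
ELSE IF EXISTS (SELECT 1 FROM myTable WHERE ind_id = @ind_id AND date < @date)
    -- your query to get the nearest earlier date record
    SELECT TOP (1) ind_id, name, value, date FROM myTable
    WHERE ind_id = @ind_id AND date < @date
    ORDER BY date DESC
ELSE
    -- your query with NULL values
    SELECT DISTINCT ind_id, name, NULL AS value, NULL AS date FROM myTable
    WHERE ind_id = @ind_id;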