Finding efficient overlapped entries in a SQL table - sql

What is the most efficient way to find all entries which do overlap with others in the same table? Every entry has a start and end date. For example I have the following database setup:
CREATE TABLE DEMO
(
DEMO_ID int IDENTITY ,
START date NOT NULL ,
END date NOT NULL
);
INSERT INTO DEMO (DEMO_ID, START, END) VALUES (1, '20100201', '20100205');
INSERT INTO DEMO (DEMO_ID, START, END) VALUES (2, '20100202', '20100204');
INSERT INTO DEMO (DEMO_ID, START, END) VALUES (3, '20100204', '20100208');
INSERT INTO DEMO (DEMO_ID, START, END) VALUES (4, '20100206', '20100211');
My query looks as follows:
SELECT DISTINCT *
FROM DEMO A, DEMO B
WHERE A.DEMO_ID != B.DEMO_ID
AND A.START < B.END
AND B.START < A.END
The problem is that when my DEMO table has, for example, 20,000 rows, the query takes too long. My environment is MS SQL Server 2008.
Thanks for any more efficient solution.

This is simpler and executes in about 2 seconds for over 20,000 records:
select * from demo a
where exists(
-- EXISTS keeps the rows that overlap another row; NOT EXISTS would return the non-overlapping ones
select 1 from demo b
where a.demo_id!=b.demo_id
AND A.S < B.E
AND B.S < A.E)

You could rewrite the query a bit:
SELECT A.DEMO_ID, B.DEMO_ID
FROM DEMO A, DEMO B
WHERE A.DEMO_ID != B.DEMO_ID
AND A.START >= B.START
AND A.START <= B.END
Getting rid of the DISTINCT keyword may make things cheaper, because SQL Server sorts on the returned columns (which is all of them when you use DISTINCT *) to eliminate duplicates.
You should also consider adding an index. With SQL Server 2008, I would recommend an index on START, END that includes DEMO_ID.
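The overlap test and the suggested index can be tried out on a small scale. The following is a minimal sketch, not the poster's code: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, renames the columns to start_date and end_date (hypothetical names, chosen to avoid reserved words), adds a fifth non-overlapping row for contrast, and stores dates as ISO text so that string comparison orders them correctly. SQLite has no INCLUDE clause, so demo_id is simply appended as a trailing index column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (
        demo_id    INTEGER PRIMARY KEY,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL
    );
    -- Stand-in for the suggested index on START, END containing DEMO_ID.
    CREATE INDEX ix_demo_start_end ON demo (start_date, end_date, demo_id);
    INSERT INTO demo (demo_id, start_date, end_date) VALUES
        (1, '2010-02-01', '2010-02-05'),
        (2, '2010-02-02', '2010-02-04'),
        (3, '2010-02-04', '2010-02-08'),
        (4, '2010-02-06', '2010-02-11'),
        (5, '2010-03-01', '2010-03-02');  -- hypothetical row, overlaps nothing
""")

# Joining on a.demo_id < b.demo_id reports each overlapping pair once,
# so no DISTINCT is needed.
overlapping_pairs = conn.execute("""
    SELECT a.demo_id, b.demo_id
    FROM demo AS a
    JOIN demo AS b
      ON a.demo_id < b.demo_id
     AND a.start_date < b.end_date
     AND b.start_date < a.end_date
    ORDER BY a.demo_id, b.demo_id
""").fetchall()

# EXISTS lists every entry that overlaps at least one other entry.
overlapping_ids = [row[0] for row in conn.execute("""
    SELECT a.demo_id
    FROM demo AS a
    WHERE EXISTS (
        SELECT 1
        FROM demo AS b
        WHERE a.demo_id <> b.demo_id
          AND a.start_date < b.end_date
          AND b.start_date < a.end_date
    )
    ORDER BY a.demo_id
""")]

print(overlapping_pairs)  # [(1, 2), (1, 3), (3, 4)]
print(overlapping_ids)    # [1, 2, 3, 4]
```

Whether the index actually helps on SQL Server 2008 depends on the data distribution, so the execution plan is still worth checking there.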

Use a function or stored procedure:
First, order the entries by Start and End
DECLARE @t table (
Position int identity(1,1),
DEMO_ID int,
START date NOT NULL ,
END date NOT NULL
)
INSERT INTO @t (DEMO_ID, START, END)
SELECT DEMO_ID, START, END
FROM DEMO
ORDER BY START, END
Then check for overlaps with the previous and next record:
SELECT t.DEMO_ID
FROM @t t INNER JOIN @t u ON t.Position + 1 = u.Position
WHERE u.Start <= t.End
UNION
SELECT t.DEMO_ID
FROM @t t INNER JOIN @t u ON t.Position - 1 = u.Position
WHERE t.Start <= u.End
You need to measure to be sure this is faster. In any case, we won't compare the date fields of all records with all the other records, so this could be faster for large datasets.
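One caveat with comparing only the immediate neighbours: an entry can be covered by a long interval that started several rows earlier, and the neighbour check will not see it. A variation that keeps the sort-once idea is to track the latest end date seen so far. The sketch below shows that sweep in plain Python rather than T-SQL; the function name is hypothetical, it assumes every entry has start < end, and it uses the strict comparison from the question.

```python
from datetime import date


def find_overlapping_ids(rows):
    """Return the ids of entries that overlap at least one other entry.

    rows is an iterable of (entry_id, start, end) tuples with start < end.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    hits = set()

    # Forward pass: does an entry start before the latest end seen so far?
    max_end = None
    for entry_id, start, end in ordered:
        if max_end is not None and start < max_end:
            hits.add(entry_id)
        if max_end is None or end > max_end:
            max_end = end

    # Backward pass: does a later entry start before this entry ends?
    min_start = None
    for entry_id, start, end in reversed(ordered):
        if min_start is not None and min_start < end:
            hits.add(entry_id)
        if min_start is None or start < min_start:
            min_start = start

    return hits


demo = [
    (1, date(2010, 2, 1), date(2010, 2, 5)),
    (2, date(2010, 2, 2), date(2010, 2, 4)),
    (3, date(2010, 2, 4), date(2010, 2, 8)),
    (4, date(2010, 2, 6), date(2010, 2, 11)),
]
print(sorted(find_overlapping_ids(demo)))  # [1, 2, 3, 4]
```

After the sort, each pass touches every row once, which is the property the neighbour-join approach is aiming for.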

Late answer, but wondering if this would help:
create index IXNCL_Demo_DemoId on Demo(Demo_Id)
select a.demo_id, b.demo_id as [CrossingDate]
from demo a
cross join demo b
where a.[end] between b.start and b.[end]
and a.demo_id <> b.demo_id


Variables Declaration, CTEs, and While Loops in Oracle SQL

So I might be stuck on something very trivial, but I can't figure out how to make it work. I created two blocks of code that work in SQL, but I have some problems with the date variable declaration in Oracle SQL.
I had write access to the SQL database when I created this code, so I did an 'Insert Into' to create temp tables. I don't have write access anymore, so I am using CTEs instead.
The original code looks like this:
DECLARE @Startdate Datetime = '2021-Jun-01 00:00:00.000'
DECLARE @Enddate Datetime = '2021-Jun-30 00:00:00.000'
Insert into Temp1
select ...
from ...
WHILE @Startdate <= @Enddate
BEGIN
Insert into Temp2
select ...
from (Temp 1)
left join
select ...
set @startdate=dateadd(d,1,@startdate)
end;
With my new code, I have made the following adjustments:
VARIABLE Startdate Datetime = '2021-Jun-01 00:00:00.000'
VARIABLE Enddate Datetime = '2021-Jun-30 00:00:00.000'
EXEC :Startdate := '2021-Jun-30 00:00:00.000'
EXEC :Enddate := '2021-Jun-30 00:00:00.000'
WITH Temp1 as (
select ...
from ...),
/* Unsure about using WHILE with 2 CTEs, so removing it for now, but it will need to be added */
WITH Temp2 as
select ...
from (Temp 1)
left join
select ...
set startdate = :startdate + 1
end)
select * from Temp2;
The 2 blocks of code work perfectly individually. I think my concern lies with one or all of the following:
Variable declaration - I read a couple of Stack Overflow posts, and it seems there are bind variables and substitution variables. Is there a different way to declare variables?
The WHILE loop, especially between 2 CTEs. Can we do a while loop as a CTE (similar to the question "create while loop with cte")?
How the date is incremented. Is this the proper way to increment dates in Oracle PL/SQL?
Any guidance would be helpful.
I am also adding the 2 blocks of code for reference:
Details of Tables:
Transactions - Contains Transaction information. Execution Date is a timestamp of the transaction execution
Account - Contains Account Information with a unique Account_Key for every account
Code_Rel - Maps the transaction code to a transaction type
Group Rel - Maps the transaction type to a transaction group
/***Block 1 of Code***/
insert into Temp1
select
a.ACCOUNT_KEY
,a.SPG_CD
,t.EXECUTION_DATE
from Schema_Name.TRANSACTIONS t
inner join Schema_Name.ACCOUNT a on a.en_sk=t.ac_sk
inner join Schema_Name.Code_Rel tr on t.t_cd_s = tr.t_cd_s
inner join ( select * from Schema_Name.Group_Rel
where gtrt_cd in ('Type1','Type2')) tt on tr.trt_cd = tt.trt_cd
where t.EXECUTION_DATE >= @startdate and t.EXECUTION_DATE<=@EndDt
and tt.gtrt_cd in ('Type1','Type2')
group by a.ACCOUNT_KEY ,a.SPG_CD, t.EXECUTION_DATE;
/***WHILE LOOP***/
while @startdate <= @EndDt
BEGIN
/***INSERT AND BLOCK 2 OF CODE***/
insert into Temp2
select table1.account_key, table1.SPG_CD, @startdate, coalesce(table2.sum_tr1,0),coalesce(table3.sum_tr2,0),
case when coalesce(table3.sum_tr2,0)>0 THEN coalesce(table2.sum_tr1,0)/coalesce(table3.sum_tr2,0) ELSE 0 END,
case when coalesce(table3.sum_tr2,0)>0 THEN
CASE WHEN coalesce(table2.sum_tr1,0)/coalesce(table3.sum_tr2,0)>=0.9 and coalesce(table2.sum_tr1,0)/coalesce(table3.sum_tr2,0)<=1.10 and coalesce(table2.sum_tr1,0)>=1000 THEN 'Yes' else 'No' END
ELSE 'No' END
FROM ( SELECT * FROM Temp1 WHERE execution_date=@startdate) TABLE1 LEFT JOIN
(
select a.account_key,a.SPG_CD, SUM(t.AC_Amt) as sum_tr1
from Schema_Name.TRANSACTIONS t
inner join Schema_Name.ACCOUNT a on a.en_sk=t.ac_sk
inner join Schema_Name.Code_Rel tr on t.t_cd_s = tr.t_cd_s
inner join ( select * from Schema_Name.Group_Rel
where gtrt_cd in ('Type1')) tt on tr.trt_cd = tt.trt_cd
where t.EXECUTION_DATE <= @startdate
and t.EXECUTION_DATE >=dateadd(day,-6,@startdate)
and tt.gtrt_cd in ('Type1')
group by a.account_key, a.SPG_CD
) table2 ON table1.account_key=table2.account_key
LEFT JOIN
(
select a.account_key,a.SPG_CD, SUM(t.AC_Amt) as sum_tr2
from Schema_Name.TRANSACTIONS t
inner join Schema_Name.ACCOUNT a on a.en_sk=t.ac_sk
inner join Schema_Name.Code_Rel tr on t.t_cd_s = tr.t_cd_s
inner join ( select * from Schema_Name.Group_Rel
where gtrt_cd in ('Type2')) tt on tr.trt_cd = tt.trt_cd
where t.EXECUTION_DATE <= @startdate
and t.EXECUTION_DATE >=dateadd(day,-6,@startdate)
and tt.gtrt_cd in ('Type2')
group by a.account_key, a.SPG_CD ) table3 on table1.account_key=table3.account_key
where coalesce(table2.sum_tr1,0)>=1000
set @startdate=dateadd(d,1,@startdate)
end;
You do not need to use PL/SQL or a WHILE loop or to declare variables and can probably do it all in a single SQL query using subquery factoring clauses (and recursion) to generate a calendar of incrementing dates. Something like this made-up example:
INSERT INTO temp2 (col1, col2, col3)
WITH time_bounds(start_date, end_date) AS (
-- You can declare the bounds in the query.
SELECT DATE '2021-06-01',
DATE '2021-06-30'
FROM DUAL
),
calendar (dt, end_date) AS (
-- Recursive query to generate a row for each day.
SELECT start_date, end_date FROM time_bounds
UNION ALL
SELECT dt + INTERVAL '1' DAY, end_date
FROM calendar
WHERE dt + INTERVAL '1' DAY <= end_date
),
temp1 (a, b, c) AS (
-- Made-up query
SELECT a, b, c FROM some_table
),
temp2 (a, d, e) AS (
-- Another made-up query.
SELECT t1.a,
s2.d,
s2.e
FROM temp1 t1
LEFT OUTER JOIN some_other_table s2
ON (t1.b = s2.b)
)
-- Get the values to insert.
SELECT t2.a,
t2.d,
t2.e
FROM temp2 t2
INNER JOIN calendar c
ON (t2.e = c.dt)
WHERE a BETWEEN 3.14159 AND 42;
If you try doing it with multiple inserts in a PL/SQL loop then it will be much slower than a single statement.
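To make the "calendar instead of a loop" idea concrete, here is a minimal sketch of a per-day trailing 7-day sum computed in one statement. It is not Oracle code: it runs against SQLite through Python's sqlite3 module, the table tx and its columns are hypothetical, the three rows are made-up data, and the bounds are passed as named bind parameters rather than declared variables. In Oracle the date arithmetic would be written as in the answer above (dt + INTERVAL '1' DAY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified transactions table with made-up rows.
    CREATE TABLE tx (execution_date TEXT NOT NULL, amount INTEGER NOT NULL);
    INSERT INTO tx (execution_date, amount) VALUES
        ('2021-06-01', 100),
        ('2021-06-03', 50),
        ('2021-06-08', 25);
""")

rows = conn.execute("""
    WITH RECURSIVE calendar (dt) AS (
        -- One row per day between the two bind parameters.
        SELECT date(:start_date)
        UNION ALL
        SELECT date(dt, '+1 day') FROM calendar WHERE dt < date(:end_date)
    )
    SELECT c.dt,
           COALESCE(SUM(t.amount), 0) AS sum_7d
    FROM calendar AS c
    LEFT JOIN tx AS t
      -- Same window as the loop body: the day itself and the 6 days before it.
      ON t.execution_date BETWEEN date(c.dt, '-6 day') AND c.dt
    GROUP BY c.dt
    ORDER BY c.dt
""", {"start_date": "2021-06-01", "end_date": "2021-06-10"}).fetchall()

rolling = dict(rows)
print(rolling["2021-06-07"])  # 150
print(rolling["2021-06-08"])  # 75
```

The join against the calendar plays the role of the WHILE loop: each calendar row corresponds to one iteration, and the database is free to evaluate them all in a single pass.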

query on sql server

I have the following query
SELECT
A.IdDepartment,
A.IdParent,
A.Localidad,
A.Codigo,
A.Nombre,
A.Departamento,
A.Fecha,
A.[Registro Entrada],
A.[Registro Salida],
CASE
WHEN (SELECT IdUser FROM Exception WHERE IdUser = A.Codigo) <> ''
THEN(SELECT Description FROM Exception WHERE IdUser = A.Codigo AND A.Fecha BETWEEN BeginingDate AND EndingDate)
ELSE ('Ausente')
END AS Novedades
FROM VW_HORARIOS A
WHERE A.[Registro Entrada] = A.[Registro Salida]
GROUP BY A.IdDepartment,A.IdParent, A.Localidad, A.Codigo, A.Nombre, A.Departamento, A.Fecha, A.[Registro Entrada],A.[Registro Salida]
ORDER BY A.Fecha
The query above selects all the matching records. What I want is the following: if there is no record for a given date, I want to create one, but I do not know how to create a record that does not exist. If someone can help me, I would appreciate it.
You can try something like this. Just fill out your own date table with values that are within your range of dates.
Remember to verify the last join; I don't know whether it is the unique business key within your data sample.
SQL Test Code
declare @DateTable table (Dates date)
insert into @DateTable
values
('2017-01-01'),
('2017-01-02'),
('2017-01-03'),
('2017-01-04'),
('2017-01-05'),
('2017-01-06'),
('2017-01-07'),
('2017-01-08'),
('2017-01-09'),
('2017-01-10')
declare @SampleTable table (DateStamp date,Department nvarchar(50),LocationId nvarchar(50),Code int,name nvarchar(50),Entrada nvarchar(50))
insert into @SampleTable
values
('2017-01-01','BOTELLO','SANTO',5540,'JOSE','Something'),
('2017-01-04','BOTELLO','SANTO',5540,'JOSE','Something'),
('2017-01-06','BOTELLO','SANTO',5540,'JOSE','Something'),
('2017-01-09','BOTELLO','SANTO',5540,'JOSE','Something')
select z.Department,z.LocationId,z.Code,z.name,z.Dates,COALESCE(a.Entrada,'EMPTY') as Entrada from (
Select Department,LocationId,Code,Name,Dates from (
select Department,LocationId,Code,Name,MIN(DateStamp) mind, MAX(Datestamp) maxd from @SampleTable
group by Department,LocationId,Code,Name
)x
CROSS JOIN @DateTable b
where b.Dates between x.mind and x.maxd
) z
left join @SampleTable a on a.Department = z.Department and a.LocationId = z.LocationId and a.Code = z.Code and a.name = z.name
and a.DateStamp = z.Dates
You can use a recursive query building all dates from the minimum date to the maximum date found in your table.
with dates(fecha, maxfecha) as
(
select min(fecha) as fecha, max(fecha) as maxfecha from vw_horarios
union all
select dateadd(dd, 1, fecha) as fecha, maxfecha from dates where fecha < maxfecha
)
select d.fecha, q.*
from dates d
left join ( your query here ) q on q.fecha = d.fecha;
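A small end-to-end run may help to see what the recursive date list buys you. This is a sketch under stated assumptions, not the original view: it uses SQLite via Python's sqlite3 module instead of SQL Server, a hypothetical table horarios with made-up rows stands in for VW_HORARIOS, and missing days are labelled 'Ausente' as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for VW_HORARIOS with four recorded days.
    CREATE TABLE horarios (fecha TEXT NOT NULL, codigo INTEGER NOT NULL, entrada TEXT);
    INSERT INTO horarios (fecha, codigo, entrada) VALUES
        ('2017-01-01', 5540, 'Something'),
        ('2017-01-04', 5540, 'Something'),
        ('2017-01-06', 5540, 'Something'),
        ('2017-01-09', 5540, 'Something');
""")

rows = conn.execute("""
    WITH RECURSIVE dates (fecha, maxfecha) AS (
        SELECT MIN(fecha), MAX(fecha) FROM horarios
        UNION ALL
        SELECT date(fecha, '+1 day'), maxfecha FROM dates WHERE fecha < maxfecha
    )
    SELECT d.fecha,
           COALESCE(h.entrada, 'Ausente') AS novedad
    FROM dates AS d
    LEFT JOIN horarios AS h ON h.fecha = d.fecha
    ORDER BY d.fecha
""").fetchall()

# Days that had no record and were filled in by the LEFT JOIN.
missing = [fecha for fecha, novedad in rows if novedad == "Ausente"]
print(missing)
# ['2017-01-02', '2017-01-03', '2017-01-05', '2017-01-07', '2017-01-08']
```

On SQL Server, a recursive CTE stops after 100 recursion levels by default, so a date range longer than that needs an OPTION (MAXRECURSION n) hint on the statement.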

tsql most effective way to compare three date values in where clause?

I am trying to create a stored procedure.
What is the most effective way to compare three date values in a WHERE clause?
Example:
tbl1.Date1,
tbl2.Date2, -- NOTE: Date2 can be NULL.
tbl3.Date3
Example data:
Date1       Date2       Date3
2016-12-20  2016-11-21  2016-11-30
2016-11-21  NULL        2016-12-20
First, I compare Date1 and Date2 and choose the "bigger" date.
Then I compare this "bigger" date to Date3.
If the comparison is true, I write the values to a table.
-- This is simplified example:
INSERT INTO records
(
[user_date],
[user_name]
)
SELECT
tbl1.Date1,
tbl1.user_name
FROM
table1 AS tbl1
INNER JOIN table2 AS tbl2 ON tbl1.id = tbl2.id
INNER JOIN table3 AS tbl3 ON tbl2.id = tbl3.id
WHERE
-- I need to know what is bigger, Date1 or Date2, so I can compare correct date to Date3.
ISNULL(tbl2.Date2, tbl1.Date1) <= tbl3.Date3 -- ISNULL, doesn't work here, because Date2 and Date1 can get a value and comparison fails if Date1 is bigger than Date3.
AND ISNULL(tbl2.Date2, tbl1.Date1) > tbl3.last_date
Perhaps something like this?
Declare @tbl1 table (id int,date1 date);Insert Into @tbl1 values (1,'2016-12-20'),(2,'2016-11-21');
Declare @tbl2 table (id int,date2 date);Insert Into @tbl2 values (1,'2016-11-21'),(2,null);
Declare @tbl3 table (id int,date3 date);Insert Into @tbl3 values (1,'2016-12-30'),(2,'2016-12-20');
Select User_Date = (Select max(d) from (values(date1),(date2),(date3)) D(D))
,A.ID
From @tbl1 A
Join @tbl2 B on A.ID=B.ID
Join @tbl3 C on A.ID=C.ID
Returns
User_Date   ID
2016-12-30  1
2016-12-20  2
You can use a case expression, or inline if, to conditionally return the bigger of the two dates, for comparison.
Sample Data
-- Sample data.
DECLARE @Sample TABLE
(
Date1 DATE NOT NULL,
Date2 DATE NULL,
Date3 DATE NOT NULL
)
;
INSERT INTO @Sample
(
Date1,
Date2,
Date3
)
VALUES
('2016-01-02', NULL, '2016-01-01'), -- D1 > D3.
('2015-12-31', '2016-01-02', '2016-01-01'), -- D2 > D1 and D3
('2015-12-30', '2015-12-31', '2016-01-01') -- D3 > D1 and D2
;
Case Expression
-- Using a CASE EXPRESSION.
SELECT
CASE WHEN s.Date2 > s.Date1 THEN s.Date2 ELSE s.Date1 END AS Bigger_of_D1_D2,
*
FROM
@Sample AS s
WHERE
CASE WHEN s.Date2 > s.Date1 THEN s.Date2 ELSE s.Date1 END > s.Date3
;
Inline If (SQL Server 2012, or above)
-- Using an INLINE IF.
SELECT
IIF(s.Date2 > s.Date1, s.Date2, s.Date1) AS Bigger_of_D1_D2,
*
FROM
@Sample AS s
WHERE
IIF(s.Date2 > s.Date1, s.Date2, s.Date1) > s.Date3
;
Both methods rely on the fact that NULL is not equal to anything, including itself. This means that checking Date2 against Date1 will never evaluate to true if Date2 is NULL, so the ELSE branch returns Date1. If Date1 also allowed NULLs, this technique would fail. In that case you could expand the case expression to include more WHEN expressions, or include the ISNULL function.
The second option is an example of syntactic sugar. Behind the scenes SQL Server will convert your code into a simple case expression, as per MSDN:
IIF is a shorthand way for writing a CASE expression.
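The NULL behaviour described above is easy to check on a toy table. The following sketch is an illustration only: it uses SQLite through Python's sqlite3 module rather than SQL Server, the table name and rows are made up, and dates are stored as ISO text. The CASE expression itself is the same one used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up rows: date2 is NULL in row 1, and date3 is the biggest in row 3.
    CREATE TABLE sample (
        id    INTEGER PRIMARY KEY,
        date1 TEXT NOT NULL,
        date2 TEXT,
        date3 TEXT NOT NULL
    );
    INSERT INTO sample (id, date1, date2, date3) VALUES
        (1, '2016-01-02', NULL,         '2016-01-01'),
        (2, '2015-12-31', '2016-01-02', '2016-01-01'),
        (3, '2015-12-30', '2015-12-31', '2016-01-01');
""")

rows = conn.execute("""
    SELECT id,
           CASE WHEN date2 > date1 THEN date2 ELSE date1 END AS bigger_of_d1_d2
    FROM sample
    -- A NULL date2 makes the comparison unknown, so the ELSE branch returns date1.
    WHERE CASE WHEN date2 > date1 THEN date2 ELSE date1 END > date3
    ORDER BY id
""").fetchall()

print(rows)  # [(1, '2016-01-02'), (2, '2016-01-02')]
```

Row 3 is filtered out because neither of its first two dates is later than date3.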
Here is my suggestion:
INSERT INTO records
(
[user_date],
[user_name]
)
SELECT tbl1.Date1
,tbl1.user_name
FROM table1 as tbl1
INNER JOIN table2 as tbl2 on tbl1.ID=tbl2.ID
INNER JOIN table3 as tbl3 on tbl1.ID=tbl3.ID
WHERE (SELECT CASE WHEN tbl1.Date1 > ISNULL(tbl2.Date2,'1900-01-01') THEN tbl1.Date1 ELSE tbl2.Date2 END) <= tbl3.Date3
AND
(SELECT CASE WHEN tbl1.Date1 > ISNULL(tbl2.Date2,'1900-01-01') THEN tbl1.Date1 ELSE tbl2.Date2 END) > tbl3.last_date

Rank & find difference of value in the same column

I have the below table -
Here, I have created the "Order" column by using the rank function partitioned by case_identifier ordered by audit_date.
Now, I want to create a new column as below -
The logic for the new column would be -
select *,
case when [order] = '1' then [days_diff]
else (val of [days_diff] in rank 2) - (val of [days_diff] in rank 1) ...
end as '[New_Col]'
from TABLE
Can you please help me with the syntax? Thanks.
Take a look at the LAG function. It will provide you with what you want.
something like:
declare @temptable TABLE (case_id varchar(2), row_order int, days_diff float)
INSERT INTO @temptable values ('A',1,5)
INSERT INTO @temptable values ('A',2,3)
INSERT INTO @temptable values ('A',3,2)
INSERT INTO @temptable values ('B',1,5)
INSERT INTO @temptable values ('B',2,1)
--select * from @temptable
SELECT case_id,row_order, LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row,days_diff,
CASE
WHEN row_order = 1 THEN days_diff
ELSE LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff
END AS newcolumn
FROM @temptable
order by case_id,row_order asc
SELECT case_id,row_order,LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row, days_diff,
COALESCE(LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff , days_diff)
FROM @temptable
order by case_id,row_order asc
Other answers will use a coalesce in place of the CASE statement. It's probably faster, but I feel like this is clearer.
If you run both and look at the execution plans they are the same.
I believe the following query gets you what you want.
SELECT a.*,
'NEW DAYS DIFF' =
CASE
WHEN a.[order] = 1 THEN a.days_diff
ELSE a.days_diff - b.days_diff
END
FROM dbo.tblCaseDaysDiff a
INNER JOIN dbo.tblCaseDaysDiff b
ON
(b.CASE_ID = a.CASE_ID AND b.[order] + 1 = a.[order] ) -- Get the current row and compare with the next highest order
OR (b.CASE_ID = a.CASE_ID AND b.[order] = 1 AND a.[order] = 1) --WHEN ORDER = 1 Get days_diff value
ORDER BY a.CASE_ID, a.[order]
As it happens, you're already hip-deep in window functions, and as others have pointed out, LAG will do the trick. In general, though, you can always get the difference of two rows by making one row: by joining the table to itself.
with T (CASE_IDENTIFIER, AUDIT_DATE, [order], days_diff)
as (
... your query ...
)
select a.*,
a.days_diff - coalesce(b.days_diff, 0) as delta_days_diff
from T as a left join T as b
on a.CASE_IDENTIFIER = b.CASE_IDENTIFIER
and b.[order] = a.[order] - 1
LAG METHOD
SELECT
CASE_IDENTIFIER
,AUDIT_DATE
,[order]
,days_diff
,days_diff - ISNULL(LAG(days_diff,1) OVER (PARTITION BY CASE_IDENTIFIER ORDER BY [order]),0) AS New_Column
FROM #Table
SELF JOIN METHOD
SELECT
t1.CASE_IDENTIFIER
,AUDIT_DATE
,t1.[order]
,t1.days_diff
,t1.days_diff - ISNULL(t2.days_diff,0) AS New_Column
FROM
#Table t1
LEFT JOIN #Table t2
ON t1.CASE_IDENTIFIER = t2.CASE_IDENTIFIER
AND t1.[order] - 1 = t2.[order]
I feel like a lot of the other answers are on the right track, but there are some nuances or easier ways of writing some of them, and some of the answers point in the right direction but have something wrong with their join or syntax. Anyway, you don't need the CASE statement whether you use the LAG or the SELF JOIN method. Also, COALESCE() is great, but you are only comparing 2 values, so ISNULL() works fine too on SQL Server; either will do.
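To see that the two methods agree, both can be run side by side on the small data set from the earlier answer. This is a sketch, not SQL Server code: it runs on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later), the table name case_days is hypothetical, the rank column is called row_order to avoid the reserved word, and COALESCE is used because SQLite has no ISNULL function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE case_days (case_id TEXT, row_order INTEGER, days_diff INTEGER);
    INSERT INTO case_days (case_id, row_order, days_diff) VALUES
        ('A', 1, 5), ('A', 2, 3), ('A', 3, 2),
        ('B', 1, 5), ('B', 2, 1);
""")

# LAG method: the previous row's value within the same case, 0 for the first row.
lag_rows = conn.execute("""
    SELECT case_id,
           row_order,
           days_diff - COALESCE(
               LAG(days_diff, 1) OVER (PARTITION BY case_id ORDER BY row_order), 0
           ) AS new_column
    FROM case_days
    ORDER BY case_id, row_order
""").fetchall()

# Self-join method: join each row to the row with the previous rank.
join_rows = conn.execute("""
    SELECT t1.case_id,
           t1.row_order,
           t1.days_diff - COALESCE(t2.days_diff, 0) AS new_column
    FROM case_days AS t1
    LEFT JOIN case_days AS t2
      ON t1.case_id = t2.case_id
     AND t1.row_order - 1 = t2.row_order
    ORDER BY t1.case_id, t1.row_order
""").fetchall()

print(lag_rows)
# [('A', 1, 5), ('A', 2, -2), ('A', 3, -1), ('B', 1, 5), ('B', 2, -4)]
print(lag_rows == join_rows)  # True
```

The self-join relies on the rank having no gaps within a case; LAG does not, since it only looks at row order.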

What's the best way to select max over multiple fields in SQL?

What I kind of want to do is select max(f1, f2, f3). I know this doesn't work, but I think what I want should be pretty clear (see update 1).
I was thinking of doing select max(concat(f1, '--', f2 ...)), but this has various disadvantages. In particular, doing concat will probably slow things down. What's the best way to get what I want?
update 1: The answers I've gotten so far aren't what I'm after. max works over a set of records, but it compares them using only one value; I want max to consider several values, just like the way order by can consider several values.
update 2: Suppose I have the following table:
id  class_name  order_by1  order_by2
1   a           0          0
2   a           0          1
3   b           1          0
4   b           0          9
I want a query that will group the records by class_name. Then, within each "class", select the record that would come first if you ordered by order_by1 ascending then order_by2 ascending. The result set would consist of records 2 and 3. In my magical query language, it would look something like this:
select max(* order by order_by1 ASC, order_by2 ASC)
from table
group by class_name
Select max(val)
From
(
Select max(fld1) as val FROM YourTable
union
Select max(fld2) as val FROM YourTable
union
Select max(fld3) as val FROM YourTable
) x
Edit: Another alternative is:
SELECT
CASE
WHEN MAX(fld1) >= MAX(fld2) AND MAX(fld1) >= MAX(fld3) THEN MAX(fld1)
WHEN MAX(fld2) >= MAX(fld1) AND MAX(fld2) >= MAX(fld3) THEN MAX(fld2)
WHEN MAX(fld3) >= MAX(fld1) AND MAX(fld3) >= MAX(fld2) THEN MAX(fld3)
END AS MaxValue
FROM YourTable
I have one of these.
CREATE FUNCTION [dbo].[MathMax]
(
@a int,
@b int
)
RETURNS int
WITH SCHEMABINDING
AS
BEGIN
IF @a IS NULL AND @b IS NULL
RETURN 0
IF @a IS NULL
RETURN @b
IF @b IS NULL
RETURN @a
IF @a < @b
RETURN @b
RETURN @a
END
Then I can do SELECT dbo.MathMax(dbo.MathMax(MAX(f1), MAX(f2)), MAX(f3)) FROM T1
I believe this performs about the same as the CASE while being more readable, and performs much better than the multi-UNION (even the slightly more efficient UNION ALL).
Based on an answer I gave to another question: SQL - SELECT MAX() and accompanying field
To make it work for multiple columns, add more columns to the inner select's ORDER BY.
You would group by your class (could be multiple columns) and use a subselect within the column list that uses an ordinary order by in combination with limit 1.
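The subselect with ORDER BY and LIMIT 1 described above can be sketched on the table from update 2. Assumptions to note: this runs on SQLite through Python's sqlite3 module, the table name items is hypothetical, and the ordering is descending on both columns, since that is the "max" reading that yields records 2 and 3 as the question expects. On SQL Server the subselect would use TOP 1 instead of LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id         INTEGER PRIMARY KEY,
        class_name TEXT NOT NULL,
        order_by1  INTEGER NOT NULL,
        order_by2  INTEGER NOT NULL
    );
    INSERT INTO items (id, class_name, order_by1, order_by2) VALUES
        (1, 'a', 0, 0),
        (2, 'a', 0, 1),
        (3, 'b', 1, 0),
        (4, 'b', 0, 9);
""")

# For each row, keep it only if it is the top row of its class under a
# multi-column ORDER BY; extra tie-breaker columns go into that ORDER BY.
winners = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM items AS t
    WHERE t.id = (
        SELECT i.id
        FROM items AS i
        WHERE i.class_name = t.class_name
        ORDER BY i.order_by1 DESC, i.order_by2 DESC
        LIMIT 1
    )
    ORDER BY t.id
""")]

print(winners)  # [2, 3]
```

If two rows of a class tie on every ordering column, which of them wins is arbitrary, so add a unique column such as id to the ORDER BY when that matters.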