Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals then computes the new intervals during the iterate phase. However I can't see this approach being practical when working with large datasets. So if some bright mind out there have a query or some innate function that I'm not aware of I would greatly appreciate the help.
You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
Related
For example, I have a sequence of numbers: {1, 2, 5, 7}.
I need to find the smallest and the biggest one, which are missed in this sequence (min=3 and max=6 for this example). Values can also be negative.
Here is my solution, but it doesn't pass on extra checking database (Wrong number of records (less by 1)), so I can't say what is exactly wrong. I also tried versions with LEFT OUTER JOIN and EXCEPT predicates - same problem. Please, help me to improve my solution.
WITH AA AS (SELECT MAX(Q_ID) MX
FROM UTQ),
BB AS (SELECT MIN(Q_ID) CODE
FROM UTQ
UNION ALL
SELECT CODE + 1
FROM BB
WHERE CODE < (SELECT MX
FROM AA)
)
SELECT MIN(CODE) MIN_RES, MAX(CODE) MAX_RES
FROM BB
WHERE CODE NOT IN (SELECT Q_ID
FROM UTQ)
One method is not exists:
select min(q_id + 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id + 1)
union all
select max(q_id - 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id - 1);
You can also use lead() and lag():
select min(case when next_q_id <> q_id + 1 then q_id + 1 end),
max(case when prev_q_id <> q_id - 1 then q_id - 1 end)
from (select utq.*,
lag(q_id) over (order by q_id) as prev_q_id,
lead(q_id) over (order by q_id) as next_q_id
from utq
) utq;
A tally based method seems like a good approach here. Especially if the sequences are large.
The first CTE summarizes the maximum and minimum q_id's in the test table. The second CTE selects the missing integers by generating the complete sequence (using the fnNumbers tvf) between the minimum and maximum q_id values and comparing WHERE NOT EXISTS to the original sequence. Something like this.
numbers function
create function [dbo].[fnNumbers](
#zero_or_one bit,
#n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where #zero_or_one = 0
union all
select top(#n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
data and query
drop table if exists #seq;
go
create table #seq(
q_id int unique not null);
insert #seq values (1),(2),(5),(7);
with
max_min_cte(max_q, min_q) as (
select max(q_id), min(q_id)
from #seq),
missing_cte(q_id) as (
select mm.min_q+fn.n
from max_min_cte mm
cross apply dbo.fnNumbers(0, mm.max_q-mm.min_q) fn
where not exists (select 1
from #seq s
where (mm.min_q+fn.n)=s.q_id))
select max(q_id) max_missing, min(q_id) min_missing
from missing_cte;
output
max_missing min_missing
6 3
You can try like following using LEAD
SELECT MIN(Q_ID + 1) AS MinValue
,MAX(Q_ID + 1) AS MaxValue
FROM (
SELECT *,LEAD(Q_ID) OVER (ORDER BY Q_ID) NQ_ID
FROM (VALUES (1),(2),(5),(7)) v(Q_ID)
) t
WHERE NQ_ID - Q_ID <> 1
I have the following table and as you can see the ids are not the same. So I can't do group by. I need to count all the ones that are in sequence. Like from id 9 to 13, from id 20 to 23. How i do it?
Here's a solution with LAG and LEAD.
;WITH StackValues AS
(
SELECT
T.*,
PreviousStatus = LAG(T.Status, 1, 0) OVER (ORDER BY T.ID ASC),
NextStatus = LEAD(T.Status, 1, 0) OVER (ORDER BY T.ID ASC)
FROM
#YourTable AS T
),
ValuesToSum AS
(
SELECT
L.*,
ValueToSum = CASE
WHEN L.Status = 1 AND L.PreviousStatus = 1 AND L.NextStatus = 0 THEN 1
ELSE 0 END
FROM
StackValues AS L
)
SELECT
Total = SUM(V.ValueToSum)
FROM
ValuesToSum AS V
LAG will give you the N previous row (N = 1 for this example) while LEAD will give you the N next row (N = 1 for this example). The query generates another column (ValueToSum) based on the previous and next values and uses it's result to sum.
table A
no date count
1 20160401 1
1 20160403 4
2 20160407 3
result
no date count
1 20160401 1
1 20160402 0
1 20160403 4
1 20160404 0
.
.
.
2 20160405 0
2 20160406 0
2 20160407 3
.
.
.
I'm using Oracle and I want to write a query that returns rows for every date within a range based on table A.
Is there some function in Oracle that can help me?
you can use the SEQUENCES.
First create a sequence
Create Sequence seq_name start with 20160401 max n;
where n is the max value till u want to display.
Then use the sql
select seq_name.next,case when seq_name.next = date then count else 0 end from tableA;
Note:- Its better not to use date,count as the column names.
Try this:
with
A as (
select 1 no, to_date('20160401', 'yyyymmdd') dat, 1 cnt from dual union all
select 1 no, to_date('20160403', 'yyyymmdd') dat, 4 cnt from dual union all
select 2 no, to_date('20160407', 'yyyymmdd') dat, 3 cnt from dual),
B as (select min(dat) mindat, max(dat) maxdat from A t),
C as (select level + mindat - 1 dat from B connect by level + mindat - 1 <= maxdat),
D as (select distinct no from A),
E as (select * from D,C)
select E.no, E.dat, nvl(cnt, 0) cnt
from E
full outer join A on A.no = E.no and A.dat = E.dat
order by 1, 2, 3
This isn't an oracle specific answer, you'll need to translate it to oracle yourself.
Create an intervals table, containing all integers from 0 to 999. Something like this:
CREATE TABLE intervals (days int);
INSERT INTO intervals (days) VALUES (0), (1);
DECLARE #rc int;
SELECT #rc = 2;
WHILE (SELECT Count(*) FROM intervals) < 1000 BEGIN
INSERT INTO intervals (days) SELECT days + #rc FROM intervals WHERE days + #rc < 1000;
SELECT #rc = #rc * 2
END;
Then all the dates in the range can be identified by adding intervals.days to the first date you've got, where the first date + intervals.days is <= the end date, and the resultant date is new. Do this by cross joining intervals to your own table. Something like (it would be in SQL, but again you'll need to translate):
SELECT DateAdd(a.date, d, i.days)
FROM (select min(date) from table_A) a, intervals I
WHERE DateAdd(a.date, d, i.days) < (select max(date) from table_A)
AND NOT EXISTS (select 1 from table_A aa where aa.date = DateAdd(a.date, d, i.days))
Hope this gives you a starting point
Intention: detect whether a numeric sequence contains gaps. No need to identify the missing elements, just flag (true / false) the sequence if it contains gaps.
CREATE TABLE foo(x INTEGER);
INSERT INTO foo(x) VALUES (1), (2), (4);
Below is my (apparently correctly functioning) query to detect gaps:
WITH cte AS
(SELECT DISTINCT x FROM foo)
SELECT
( (SELECT COUNT(*) FROM cte a
CROSS JOIN cte b
WHERE b.x=a.x-1)
=(SELECT COUNT(*)-1 FROM cte))
OR (NOT EXISTS (SELECT 1 FROM cte))
where the OR is needed for the edge case where the table is empty. The query's logic is based on the observation that in a contiguous sequence the number of links equals the number of elements minus 1.
Anything more idiomatic or performant (should I be worried by the CROSS JOIN in particularly long sequences?)
Try this:
SELECT
CASE WHEN ((MAX(x)-MIN(x)+1 = COUNT(DISTINCT X)) OR
(COUNT(DISTINCT X) = 0) )
THEN 'TRUE'
ELSE 'FALSE'
END
FROM foo
SQLFiddle demo
The following should detect whether or not there are gaps:
select (case when max(x) - min(x) + 1 = count(distinct x)
then 'No Gaps'
else 'Some Gaps'
end)
from foo;
If there are no gaps or duplicates, then the number of distinct values of x is the max minus the min plus 1.
A different approach...
If you subtract your min value from the max value, and add 1, you should equal the count.
if count = (max-min)+1 then "no gaps!"
If you can express that in SQL, it should be very efficient.
SELECT 'Has ' || count(*) - 1 || ' gaps.' AS gaps
FROM foo f1
LEFT JOIN foo f2 ON f2.id = f1.id + 1
WHERE f2.id IS NULL;
The trick is to count rows, where the next row is missing - which only happens for the last row(s) if there are no gaps.
If there are no rows, you get 'Has -1 gaps.'.
If there are no gaps, you get 'Has 0 gaps.'.
Else you get 'Has n gaps.' .. n being the exact number of gaps, no matter how big.
The count can be increased for duplicates, but 0 and -1 are immune to dupes.
I have the specific result above in a select:
1 2
1 3
1 5
1 6
1 9
1 10
1 11
1 13
1 14
1 16
1 18
1 20
1 23
1 24
1 25
What I want to find is the longest increasing-by-one chain that occurs in the results.
For example, I know that 3 is the maximum length sequence in this number range, coming from the last 3 results (23,24,25 being 3 in a row).
A sequence will have the property that the difference between the number and a sequential ordering will be constant. In most dialects of SQL, you have a function called row_number(), which assigns sequential numbers.
We can use this observation to solve your problem:
select (num - seqnum), count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
This gives every sequence. To get the max, either use max() with a subquery or some version of limit/top. In SQL Server, for instance, you can do:
select top 1 count(*) as NumInSequence
from (select t.*, row_number() over (order by num) as seqnum
from t
) t
group by (num - seqnum)
order by NumInSQuence desc
Using this article as the main query:
http://www.xaprb.com/blog/2006/03/22/find-contiguous-ranges-with-sql/
Just add a column that calculates the difference and select the MAX().
SELECT MAX(seq.end - seq.start)
FROM (
select l.id as start,
(
select min(a.id) as id
from sequence as a
left outer join sequence as b on a.id = b.id - 1
where b.id is null
and a.id >= l.id
) as end,
from sequence as l
left outer join sequence as r on r.id = l.id - 1
where r.id is null;
) AS seq
#Gordon gave a brilliant and more terse answer. However, I think a recursive implementation may be useful as well. Here's a very useful article on recursive CTEs: http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
-- This first CTE is unnecessary because you presumably already have
-- your data. But I wanted to include it to make it easier test.
WITH myNumbers AS (
SELECT *
FROM (
VALUES
(2),
(3),
(5),
(6),
(9),
(10),
(11),
(13),
(14),
(16),
(18),
(20),
(23),
(24),
(25)
) AS x (num)
),
-- To get my sequences I recurse until there is no num + 1 in my set
mySequences AS (
-- Anchor member definition: Create the first invocation
SELECT v.num, 0 AS iteration, v.num AS previous, v.num AS start
FROM myNumbers v
UNION ALL
-- Recursive member definition: Recurse until value + 1 does not exist
SELECT s.num + 1, s.iteration + 1 AS iteration, s.num AS previous, s.start
FROM mySequences s -- Notice that we can reference the CTE within itself
JOIN myNumbers v
ON v.num = s.num + 1
)
-- I must increment by 1 because I chose to start my recursion at 0
SELECT MAX(iteration + 1)
FROM mySequences
That recursive query is similar to writing
public int GetSequenceLength(int start, int iteration, int[] myNumbers)
{
if (myNumbers.Contains(start + 1))
{
return GetSequenceLength(start + 1, iteration + 1, myNumbers);
}
return iteration;
}
foreach (var myNumber in myNumbers)
{
var sequenceLength = GetSequenceLength(myNumber, 0, myNumbers) + 1;
Console.WriteLine(myNumber + " : " + sequenceLength);
}