I have the following table of rates for given date ranges.
I want to write an SQL query (PostgreSQL) that returns the sum of the prices for a given period, but only if that period is continuous. For example:
if I specify 2011-05-02 to 2011-05-09 on the first set, the sum of the 6 rows should be returned,
but
if I specify 2011-05-02 to 2011-05-11 on the second set, nothing should be returned.
My problem is that I don't know how to determine whether a date range is continuous. Can you please help? Thanks a lot
case 1: sum expected
price from_date to_date
------ ------------ ------------
1.0 "2011-05-02" "2011-05-02"
2.0 "2011-05-03" "2011-05-03"
3.0 "2011-05-04" "2011-05-05"
4.0 "2011-05-05" "2011-05-06"
5.0 "2011-05-06" "2011-05-07"
4.0 "2011-05-08" "2011-05-09"
case 2: no results expected
price from_date to_date
------ ------------ ------------
1.0 "2011-05-02" "2011-05-02"
2.0 "2011-05-03" "2011-05-03"
3.0 "2011-05-07" "2011-05-09"
4.0 "2011-05-09" "2011-05-11"
My rate date ranges do not overlap.
Not sure I understood the question completely, but what about this:
select *
from prices
where from_date >= DATE '2011-05-01'
and to_date < DATE '2011-07-01'
and not exists (
select 1 from (
select from_date - lag(to_date) over (order by from_date asc) as days_diff
from prices
where from_date >= DATE '2011-05-01'
and to_date < DATE '2011-07-01'
) t
where coalesce(days_diff, 0) > 1
)
order by from_date
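The gap test in the query above (a row that starts more than one day after the previous row ends means a day is missing) can be sketched in Python to make the logic concrete. This is only an illustration: `continuous_sum` is a hypothetical helper, rows are assumed to be `(price, from_date, to_date)` tuples with inclusive dates, and unlike `lag()` it compares each row with the latest end date seen so far.

```python
from datetime import date


def continuous_sum(rows, start, end):
    """Sum of prices if the rows cover [start, end] without gaps, else None.

    rows: iterable of (price, from_date, to_date) tuples, dates inclusive.
    """
    # keep the rows that intersect the requested period, ordered by from_date
    picked = sorted((r for r in rows if r[2] >= start and r[1] <= end),
                    key=lambda r: r[1])
    if not picked or picked[0][1] > start:
        return None  # the period's first day is not covered
    covered_to = picked[0][2]
    for _, from_date, to_date in picked[1:]:
        # same idea as from_date - lag(to_date) > 1: at least one day is missing
        if (from_date - covered_to).days > 1:
            return None
        covered_to = max(covered_to, to_date)
    if covered_to < end:
        return None  # the period's last days are not covered
    return sum(price for price, _, _ in picked)
```

With the first data set and 2011-05-02 to 2011-05-09 this returns 19.0; with the second set and 2011-05-02 to 2011-05-11 it returns None because 2011-05-04 to 2011-05-06 are missing.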
Here's a rather funky way to solve it:
WITH RECURSIVE t AS (
SELECT * FROM d WHERE '2011-05-02' BETWEEN start_date AND end_date
UNION ALL
SELECT d.* FROM t JOIN d ON (d.key=t.key AND d.start_date=t.end_date+'1 DAY'::INTERVAL)
WHERE d.start_date <= '2011-05-09')
SELECT sum(price), min(start_date), max(end_date)
FROM t
HAVING min(start_date) <= '2011-05-02' AND max(end_date)>= '2011-05-09';
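The recursive CTE walks a chain: the anchor is the row containing the start date, and each recursive step follows the row that starts exactly one day after the current row ends. A Python sketch of that walk, assuming a single rate series (the SQL's `key` column is ignored) and `(price, start_date, end_date)` tuples; `chain_sum` is a hypothetical name:

```python
from datetime import date, timedelta


def chain_sum(rows, start, end):
    """Sum prices along back-to-back rows from start to end, or None.

    A row continues the chain only if it starts exactly one day after
    the previous row ends, as in the recursive member of the CTE.
    """
    by_start = {r[1]: r for r in rows}
    # anchor member: the row whose range contains the requested start date
    current = next((r for r in rows if r[1] <= start <= r[2]), None)
    total = 0.0
    while current is not None and current[1] <= end:
        total += current[0]
        if current[2] >= end:
            return total  # the chain reached the end of the period
        # recursive member: the row starting the day after this one ends
        current = by_start.get(current[2] + timedelta(days=1))
    return None  # the chain broke before reaching `end`
```

Because the join requires an exact "next day" match, rows that share a boundary day (such as 2011-05-05 in the first data set) end the chain, so this approach suits data whose ranges meet exactly.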
I think you need to combine window functions and CTEs:
WITH
raw_rows AS (
SELECT your_table.*,
lag(to_date) OVER w as prev_date,
lead(from_date) OVER w as next_date
FROM your_table
WHERE ...
WINDOW w as (ORDER by from_date, to_date)
)
SELECT sum(stuff)
FROM raw_rows
HAVING bool_and(prev_date >= from_date - interval '1 day' AND
next_date <= to_date + interval '1 day');
http://www.postgresql.org/docs/9.0/static/tutorial-window.html
http://www.postgresql.org/docs/9.0/static/queries-with.html
Related
This is a little confusing, so I'll try to clarify.
Let's say I have an employee table like this:
employee  eff_Dt      end_effective_date
--------  ----------  ------------------
1         1900-01-01  2020-12-31
1         2021-01-01  2021-02-01
1         2021-02-02  9999-01-01
2         1900-01-01  9999-01-01
3         1900-01-01  2015-12-31
3         2016-01-01  2020-01-01
4         1900-01-01  2016-01-01
4         2018-01-01  9999-01-01
Employees 1 and 2 are fine. They have a full effective-dated history from 1900-01-01 to 9999-01-01. All of my employee records need that.
I need SQL that finds records like 3 and 4. Employee 3 is missing data from 2020-01-02 to 9999-01-01, and employee 4 is missing data from 2016-01-02 to 2017-12-31.
How can I write a query that returns these records? I am on Oracle SQL. I would prefer an ANSI SQL solution if possible, but if the best solution uses Oracle-specific functions, then so be it. I do not have access to create indexes or stored procedures; this can only be done via a query.
Thank you in advance.
I'd suggest counting days. If the periods never overlap, an employee's history is complete exactly when the inclusive day counts of their rows (end - start + 1) add up to the number of days from 1900-01-01 through 9999-01-01, which is 2958100.
This query will return employees 3 and 4:
select employee, sum(end_effective_date - eff_dt + 1)
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < 2958100;
UPD: Same query without hard-coded values:
select employee, sum(end_effective_date - eff_dt + 1)
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < (date '9999-01-01' - date '1900-01-01' + 1);
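The day-counting idea is easy to check in Python, including the size of the full range. A minimal sketch, assuming non-overlapping periods as the SQL does; `incomplete_employees` is a hypothetical helper and rows are `(employee, eff_dt, end_effective_date)` tuples:

```python
from datetime import date

FULL_START, FULL_END = date(1900, 1, 1), date(9999, 1, 1)
# inclusive number of days in the required history
FULL_DAYS = (FULL_END - FULL_START).days + 1


def incomplete_employees(rows):
    """Employees whose inclusive day counts fall short of FULL_DAYS.

    Assumes no overlapping periods; overlaps would be double-counted.
    """
    days = {}
    for emp, eff, end in rows:
        days[emp] = days.get(emp, 0) + (end - eff).days + 1
    return sorted(emp for emp, total in days.items() if total < FULL_DAYS)
```

Counting both endpoints matters: back-to-back rows such as 2020-12-31 / 2021-01-01 would otherwise lose one day per boundary and a complete history like employee 1's would be flagged.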
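If you want the employees who are missing dates (assuming the periods do not overlap):
select employee
from t
group by employee
having min(eff_Dt) <> date '1900-01-01' or
max(end_effective_date) <> date '9999-01-01' or
-- catches gaps in the middle of the history
sum(end_effective_date - eff_dt + 1) <> max(end_effective_date) - min(eff_dt) + 1;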
If you want the specific missing time periods, use lead() for most of them . . . and then union all to get the first one:
select employee,
end_effective_date + interval '1' day as missing_eff_dt,
next_eff_dt - interval '1' day as missing_end_dt
from (select t.*,
-- the last row gets 9999-01-02 as its "next" start,
-- so a trailing gap up to 9999-01-01 is reported as well
lead(eff_dt, 1, date '9999-01-02') over (partition by employee order by eff_dt) as next_eff_dt
from t
) t
where next_eff_dt > end_effective_date + interval '1' day
union all
select employee, date '1900-01-01',
min(eff_dt) - interval '1' day
from t
group by employee
having min(eff_dt) > date '1900-01-01'
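The same "where does the next period start?" logic can be sketched in Python to list the missing periods, covering gaps at the beginning, in the middle, and at the end. `missing_periods` is a hypothetical helper; rows are assumed to be `(employee, eff_dt, end_effective_date)` tuples with inclusive dates:

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)
FULL_START, FULL_END = date(1900, 1, 1), date(9999, 1, 1)


def missing_periods(rows):
    """Return (employee, missing_from, missing_to) tuples, inclusive dates."""
    gaps = []
    for emp, emp_rows in groupby(sorted(rows), key=lambda r: r[0]):
        expected = FULL_START  # first day not yet covered
        for _, eff, end in emp_rows:
            if eff > expected:  # like next_eff_dt > end_effective_date + 1 day
                gaps.append((emp, expected, eff - ONE_DAY))
            expected = max(expected, end + ONE_DAY)
        if expected <= FULL_END:  # history stops before 9999-01-01
            gaps.append((emp, expected, FULL_END))
    return gaps
```

For the sample data this yields 2020-01-02 to 9999-01-01 for employee 3 and 2016-01-02 to 2017-12-31 for employee 4.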
If I understand correctly, you need to find the employees who have gaps between end dates and start dates, and those who don't have 9999-01-01 as their maximum end date. The query below does that.
select EMPLOYEE, EFF_DT, END_EFFECTIVE_DATE
from (
select tt.*
, count(distinct grp)
over(partition by EMPLOYEE) cnt
, max(END_EFFECTIVE_DATE)
over(partition by EMPLOYEE order by EFF_DT desc) max_END_EFFECTIVE_DATE
from (
select t.*
, case
when EFF_DT != nvl(
lag(END_EFFECTIVE_DATE, 1)over(partition by EMPLOYEE order by EFF_DT)
, date '-4712-01-01'
)+ 1
then row_number()over (partition by EMPLOYEE order by EFF_DT)
else null
end
grp
from your_table
) tt
)ttt
where cnt > 1
or max_END_EFFECTIVE_DATE < date '9999-01-01'
;
The query below uses the Tabibitosan method to stitch together the adjacent time periods. The method itself uses the analytic sum() function and standard aggregation; it works almost unchanged in any SQL dialect that supports basic analytic functions.
The output shows only the employees with incomplete data. It shows uninterrupted periods of "effectivity"; if data is "complete", then there should be only one such interval for the employee, from 1 JAN 1900 to 1 JAN 9999. Those are excluded; the output shows the employees with gaps at the beginning, in the middle, and/or the end, and for those employees it shows the interval (or intervals) of "effectivity".
While you didn't request this, the query could be modified easily to show the "missing" periods for each employee (the periods when they were not effective).
with
t (employee, eff_dt, end_effective_date, grp) as (
select employee, eff_dt, end_effective_date,
end_effective_date - sum(end_effective_date + 1 - eff_dt)
over (partition by employee order by eff_dt)
from sample_data
)
select employee, min(eff_dt) as eff_dt, max(end_effective_date) as end_dt
from t
group by employee, grp
having min(eff_dt) != date '1900-01-01'
or max(end_effective_date) != date '9999-01-01'
order by employee, eff_dt
;
EMPLOYEE EFF_DT END_DT
---------- ---------- ----------
3 1900-01-01 2020-01-01
4 1900-01-01 2016-01-01
4 2018-01-01 9999-01-01
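To see why the Tabibitosan key works: for back-to-back periods, end_effective_date minus the running sum of inclusive lengths always comes back to the first period's eff_dt minus one day, so every row of an uninterrupted run shares one key. A Python sketch of that computation, using date ordinals in place of Oracle date arithmetic; `effective_islands` and `incomplete` are hypothetical names:

```python
from datetime import date
from itertools import groupby

FULL_START, FULL_END = date(1900, 1, 1), date(9999, 1, 1)


def effective_islands(rows):
    """Merge back-to-back periods per employee using the Tabibitosan key.

    rows: (employee, eff_dt, end_effective_date) tuples, inclusive dates.
    """
    islands = []
    for emp, emp_rows in groupby(sorted(rows), key=lambda r: r[0]):
        running = 0
        keyed = []
        for _, eff, end in emp_rows:
            # running sum of inclusive lengths, ordered by eff_dt
            running += end.toordinal() + 1 - eff.toordinal()
            # end - running sum stays constant while periods are contiguous
            keyed.append((end.toordinal() - running, eff, end))
        for _, island in groupby(keyed, key=lambda k: k[0]):
            island = list(island)
            islands.append((emp, island[0][1], max(k[2] for k in island)))
    return islands


def incomplete(rows):
    # keep islands that do not span the whole 1900-01-01 .. 9999-01-01 range
    return [i for i in effective_islands(rows)
            if (i[1], i[2]) != (FULL_START, FULL_END)]
```

On the sample data `incomplete` returns the same three rows as the SQL output above.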
The best way to explain what I need is to show it, so here it is:
Currently I have this query:
select
date_
,count(*) as count_
from table
group by date_
which returns the following result:
Now I need a new column that shows the count for the previous 7 days, relative to the row's date_.
So, if the row is for 29/06, I have to count all occurrences on that day (my query already does this) and also all occurrences from 22/06 to 29/06.
The result should be something like this:
If you have values for all dates, without gaps, then you can use window functions with a rows frame:
select
date,
count(*) cnt,
sum(count(*)) over(order by date rows between 7 preceding and current row) cnt_d7
from mytable
group by date
order by date
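The reason for the "no gaps" condition is that a ROWS frame counts rows, not calendar days. A small Python sketch of the same frame (hypothetical helper `rolling_rows`, one row per distinct date as after GROUP BY) makes this visible:

```python
from collections import Counter
from datetime import date


def rolling_rows(dates, preceding=7):
    """(date, count, sum over this row and the `preceding` previous rows)."""
    counts = sorted(Counter(dates).items())  # one row per distinct date, like GROUP BY
    out = []
    for i, (d, c) in enumerate(counts):
        # ROWS frame: a fixed number of rows, regardless of the dates they hold
        window = counts[max(0, i - preceding): i + 1]
        out.append((d, c, sum(n for _, n in window)))
    return out
```

With two rows on 2020-06-01 and one on 2020-06-20, the 2020-06-20 row gets a total of 3, even though the earlier date is 19 days back.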
You can try something like this:
select
date_,
count(*) as count_,
(select count(*)
from table as b
where b.date_ <= a.date_ and b.date_ > a.date_ - interval '7 days'
) as count7days_
from table as a
group by date_
If you have gaps, you can do a more complicated solution where you add and subtract the values:
with t as (
select date_, count(*) as count_
from table
group by date_
union all
select date_ + interval '7 day', -count(*) as count_
from table
group by date_
)
select date_,
sum(sum(count_)) over (order by date_ rows between unbounded preceding and current row) - sum(count_) as prev_7_days
from t
group by date_
order by date_;
The - sum(count_) term is there because you do not seem to want the current day in the cumulative total.
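The add-and-subtract trick can be sketched in Python: each day's count enters a running total on its own date and leaves it `days` days later, and reading the total just before a date's own delta is applied gives the counts of the preceding `days` dates, excluding the current one. `previous_7_days` is a hypothetical helper; the 7-day window is an assumption, and results are reported only for dates present in the input.

```python
from collections import Counter
from datetime import date, timedelta


def previous_7_days(dates, days=7):
    """For each distinct date d, the total count on dates d-days .. d-1."""
    per_day = Counter(dates)
    delta = Counter()
    for d, c in per_day.items():
        delta[d] += c                          # enters the window on day d ...
        delta[d + timedelta(days=days)] -= c   # ... and leaves it `days` later
    result = {}
    running = 0
    for d in sorted(delta):
        # running total before today's delta = strictly earlier days only
        if d in per_day:
            result[d] = running
        running += delta[d]
    return result
```

Only one pass over the sorted dates is needed, which is why this scales better than a self-join as the window grows.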
You can also use the nasty self-join approach . . . which should be okay for 7 days:
with t as (
select date_, count(*) as count_
from table
group by date_
)
select t.date_, t.count_, sum(tprev.count_)
from t left join
t tprev
on tprev.date_ >= t.date_ - interval '7 day' and
tprev.date_ < t.date_
group by t.date_, t.count_;
The performance will get worse and worse as "7" gets bigger.
Try a subquery for the new column:
select
a.date_ as groupdate,
count(a.date_) as date_count,
(select count(b.date_)
from table b
where b.date_ <= a.date_ and b.date_ >= a.date_ - interval '7 day'
) as total7
from table a
group by a.date_
order by groupdate
Dear Stack Overflow community,
I am looking for the patient IDs where each of the two dates after the very first one is within 7 days of the date before it.
So the difference between the 2nd and 1st dates is <= 7 days,
and the difference between the 3rd and 2nd dates is <= 7 days.
Example:
ID Date
1 9/8/2014
1 9/9/2014
1 9/10/2014
2 5/31/2014
2 7/20/2014
2 9/8/2014
For patient 1, each of the two dates following the first one is less than 7 days after the previous date.
For patient 2, however, the following dates are more than 7 days apart (50 days).
I am trying to write an SQL query that outputs only the patient ID "1".
Thanks for your help :)
You want to use lead(), but this is complicated because you want this only for the first three rows. I think I would go for:
select t.*
from (select t.*,
lead(date, 1) over (partition by id order by date) as next_date,
lead(date, 2) over (partition by id order by date) as next_date_2,
row_number() over (partition by id order by date) as seqnum
from t
) t
where seqnum = 1 and
next_date <= date + interval '7' day and
next_date_2 <= next_date + interval '7' day;
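In Python terms, the query keeps the first date per patient (seqnum = 1) and compares it with the next two dates (lead 1 and lead 2). A minimal sketch, assuming `(id, date)` tuples; `patients_with_quick_followups` is a hypothetical name:

```python
from datetime import date, timedelta
from itertools import groupby


def patients_with_quick_followups(rows, max_gap=timedelta(days=7)):
    """IDs whose 2nd and 3rd dates each follow the previous one within max_gap."""
    hits = []
    for pid, grp in groupby(sorted(rows), key=lambda r: r[0]):
        dates = [d for _, d in grp]  # ordered by date within the patient
        # first date plus its next two (lead 1 and lead 2 on the seqnum = 1 row)
        if (len(dates) >= 3
                and dates[1] - dates[0] <= max_gap
                and dates[2] - dates[1] <= max_gap):
            hits.append(pid)
    return hits
```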
You can try using the window function lag():
select * from
(
select id, date, lag(date) over (partition by id order by date) as prevdate
from tablename
) A where datediff(day, prevdate, date) <= 7
I have a table in which each row contains a start and an end date:
DROP TABLE temp_period;
CREATE TABLE public.temp_period
(
id integer NOT NULL,
"startDate" date,
"endDate" date
);
INSERT INTO temp_period(id,"startDate","endDate") VALUES(1,'2010-01-01','2010-03-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(2,'2013-05-17','2013-07-18');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(3,'2010-02-15','2010-05-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(7,'2014-01-01','2014-12-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(56,'2014-03-31','2014-06-30');
Now I want to know the total duration of all periods stored there. I need just the time as an interval. That's pretty easy:
SELECT sum(age("endDate","startDate")) FROM temp_period;
However, the problem is that those periods overlap. I want to eliminate all overlaps, so that I get the total amount of time that is covered by at least one record in the table.
There are also quite a few gaps between the periods, so passing the smallest start date and the latest end date to the age function won't do the trick. I thought about doing that and then subtracting the total length of the gaps, but no elegant way to do that came to mind.
I use PostgreSQL 9.6.
What about this:
WITH
/* get all time points where something changes */
points AS (
SELECT "startDate" AS p
FROM temp_period
UNION SELECT "endDate"
FROM temp_period
),
/*
* Get all date ranges between these time points.
* The first time range will start with NULL,
* but that will be excluded in the next CTE anyway.
*/
inter AS (
SELECT daterange(
lag(p) OVER (ORDER BY p),
p
) i
FROM points
),
/*
* Get all date ranges that are contained
* in at least one of the intervals.
*/
overlap AS (
SELECT DISTINCT i
FROM inter
CROSS JOIN temp_period
WHERE i <@ daterange("startDate", "endDate")
)
/* sum the lengths of the date ranges */
SELECT sum(age(upper(i), lower(i)))
FROM overlap;
For your data it will return:
┌──────────┐
│ interval │
├──────────┤
│ 576 days │
└──────────┘
(1 row)
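The idea behind this query (cut the timeline at every start and end date, then keep the elementary ranges that lie inside at least one period) can be sketched in Python, treating periods as half-open like `daterange`. `covered_days_elementary` is a hypothetical name; note that each elementary range is checked against every period, so the cost grows with points times periods.

```python
from datetime import date


def covered_days_elementary(periods):
    """Days covered by at least one half-open [start, end) period."""
    points = sorted({d for period in periods for d in period})
    total = 0
    for lo, hi in zip(points, points[1:]):
        # [lo, hi) <@ [s, e)  <=>  s <= lo and hi <= e
        if any(s <= lo and hi <= e for s, e in periods):
            total += (hi - lo).days
    return total
```

For the five sample periods this gives 576 days.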
You could try using a recursive CTE to calculate the total. For each record, we check whether it overlaps the previous records. If it does, we only count the part that does not overlap.
WITH RECURSIVE ordered_data AS
(
SELECT "startDate" AS startDate,
"endDate" AS endDate,
ROW_NUMBER() OVER (ORDER BY "startDate") AS rowSeq
FROM temp_period),
days_count AS
(
SELECT startDate,
endDate,
AGE(endDate, startDate) AS total_days,
rowSeq
FROM ordered_data
WHERE rowSeq = 1
UNION ALL
-- endDate carries the latest end seen so far, so only the uncovered part of curr is counted
SELECT GREATEST(curr.startDate, prev.endDate) AS startDate,
GREATEST(curr.endDate, prev.endDate) AS endDate,
AGE(GREATEST(curr.endDate, prev.endDate), GREATEST(curr.startDate, prev.endDate)) AS total_days,
curr.rowSeq
FROM ordered_data curr
INNER JOIN days_count prev
ON curr.rowSeq = prev.rowSeq + 1)
SELECT SUM(total_days) AS total_days
FROM days_count;
Actually there is a case that is not covered by the previous examples.
What if we have a period like this?
INSERT INTO temp_period(id,"startDate","endDate") VALUES(100,'2010-01-03','2010-02-10');
We have the following intervals:
Interval No. | start_date | end_date
-------------+------------+------------
1            | 2010-01-01 | 2010-03-31
2            | 2010-01-03 | 2010-02-10
3            | 2010-02-15 | 2010-05-31
4            | 2013-05-17 | 2013-07-18
5            | 2014-01-01 | 2014-12-31
6            | 2014-03-31 | 2014-06-30
Segment 3 overlaps segment 1, but lag() only compares its start with the end of segment 2 (2010-02-10), so it is treated as a new segment, hence the (wrong) result:
sum
-----
620
(1 row)
The solution is to tweak the core of the query; this expression
CASE WHEN start_date < lag(end_date) OVER (ORDER BY start_date, end_date) then NULL ELSE start_date END
needs to be replaced by
CASE WHEN start_date < max(end_date) OVER (ORDER BY start_date, end_date rows between unbounded preceding and 1 preceding) then NULL ELSE start_date END
Then it works as expected:
sum
-----
576
(1 row)
Summary:
SELECT sum(e - s)
FROM (
SELECT left_edge as s, max(end_date) as e
FROM (
SELECT start_date, end_date, max(new_start) over (ORDER BY start_date, end_date) as left_edge
FROM (
SELECT start_date, end_date, CASE WHEN start_date < max(end_date) OVER (ORDER BY start_date, end_date rows between unbounded preceding and 1 preceding) then NULL ELSE start_date END AS new_start
FROM temp_period
) s1
) s2
GROUP BY left_edge
) s3;
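The corrected query is a gaps-and-islands merge: a period starts a new island only if it begins at or after the maximum end date of all earlier periods, and each island contributes max(end) minus its left edge. A Python sketch with half-open periods; `covered_days` is a hypothetical name:

```python
from datetime import date


def covered_days(periods):
    """Days covered by at least one half-open [start, end) period."""
    total = 0
    left = right = None  # current island's left edge and running max end
    for start, end in sorted(periods):
        if right is None or start >= right:
            # like new_start: start is not before max(end_date) of earlier rows
            if right is not None:
                total += (right - left).days
            left, right = start, end
        else:
            right = max(right, end)  # overlap: extend the current island
    if right is not None:
        total += (right - left).days
    return total
```

Because `right` is a running maximum rather than the previous row's end, adding the 2010-01-03 to 2010-02-10 period still yields 576 days.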
This one required two outer joins on a complex query. One join identifies all overlaps with a start date later than the current record's and expands the timespan to cover the larger of the two. The second join is needed to match records with no overlaps. Then take the minimum of the minimums and the maximum of the maximums, including unmatched records. I used MSSQL, so the syntax may be a bit different.
CREATE TABLE #temp_period
(
id int NOT NULL,
startDate datetime,
endDate datetime
)
INSERT INTO #temp_period(id,startDate,endDate) VALUES(1,'2010-01-01','2010-03-31')
INSERT INTO #temp_period(id,startDate,endDate) VALUES(2,'2013-05-17','2013-07-18')
INSERT INTO #temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-05-31')
INSERT INTO #temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-07-31')
INSERT INTO #temp_period(id,startDate,endDate) VALUES(7,'2014-01-01','2014-12-31')
INSERT INTO #temp_period(id,startDate,endDate) VALUES(56,'2014-03-31','2014-06-30')
;WITH OverLaps AS
(
SELECT
Main.id,
OverlappedID=Overlaps.id,
OverlapMinDate,
OverlapMaxDate
FROM
#temp_period Main
LEFT OUTER JOIN
(
SELECT
This.id,
OverlapMinDate=CASE WHEN This.StartDate<Prior.StartDate THEN This.StartDate ELSE Prior.StartDate END,
OverlapMaxDate=CASE WHEN This.EndDate>Prior.EndDate THEN This.EndDate ELSE Prior.EndDate END,
PriorID=Prior.id
FROM
#temp_period This
LEFT OUTER JOIN #temp_period Prior ON Prior.endDate > This.startDate AND Prior.startdate < this.endDate AND This.Id<>Prior.ID
) Overlaps ON Main.Id=Overlaps.PriorId
)
SELECT
T.Id,
--If has overlapped then sum all overlapped records prior to this one, else not and overlap get the start and end
MinDate=MIN(COALESCE(HasOverlapped.OverlapMinDate,startDate)),
MaxDate=MAX(COALESCE(HasOverlapped.OverlapMaxDate,endDate))
FROM
#temp_period T
LEFT OUTER JOIN OverLaps IsAOverlap ON IsAOverlap.OverlappedID=T.id
LEFT OUTER JOIN OverLaps HasOverlapped ON HasOverlapped.Id=T.id
WHERE
IsAOverlap.OverlappedID IS NULL -- Exclude older records that have overlaps
GROUP BY
T.Id
Beware: the answer by Laurenz Albe has a huge scalability issue.
I was more than happy when I found it. I customized it for our needs. We deployed to staging and very soon, the server took several minutes to return the results.
Then I found this solution on the PostgreSQL wiki. It is much more efficient:
https://wiki.postgresql.org/wiki/Range_aggregation
SELECT sum(e - s)
FROM (
SELECT left_edge as s, max(end_date) as e
FROM (
SELECT start_date, end_date, max(new_start) over (ORDER BY start_date, end_date) as left_edge
FROM (
SELECT start_date, end_date, CASE WHEN start_date < lag(end_date) OVER (ORDER BY start_date, end_date) then NULL ELSE start_date END AS new_start
FROM temp_period
) s1
) s2
GROUP BY left_edge
) s3;
Result:
sum
-----
576
(1 row)
I have data in an Ingres table, something like this:
REF FROM_DATE TO_DATE
A 01.04.1997 01.04.1998
A 01.04.1998 27.05.1998
A 27.05.1998 01.04.1999
B 01.04.1997 01.04.1998
B 01.04.1998 26.07.1998
B 01.04.2012 01.04.2013
Some refs have continuous periods from the min(from_date) to the max(to_date), but some have gaps in the period.
I would like to know a way in Ingres SQL of identifying which refs have gaps in the date periods.
I am doing this as a Unix shell script calling the Ingres sql command.
Please advise.
I am not familiar with the date functions in Ingres. Let me assume that subtracting two dates with - returns the difference in days.
If there are no overlaps in the data, you can do what you want pretty easily. If there are no gaps, then the difference between the minimum and maximum dates equals the sum of the per-row differences. If the first is larger than the second (so total_gaps below is greater than 0), there are gaps.
So:
select ref,
((max(to_date) - min(from_date)) -
sum(to_date - from_date)
) as total_gaps
from t
group by ref;
I believe this will work in your case. In other cases, there might be an "off-by-1" problem, depending on whether or not the end date is included in the period.
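The arithmetic is easy to check in Python. This sketch assumes the dates are DD.MM.YYYY (so 01.04.1997 is 1 April 1997) and treats each row as spanning end - start days, as the SQL does; `total_gaps` is a hypothetical name:

```python
from collections import defaultdict
from datetime import date


def total_gaps(rows):
    """Per ref: (max(to_date) - min(from_date)) - sum(to_date - from_date), in days."""
    lo, hi, covered = {}, {}, defaultdict(int)
    for ref, start, end in rows:
        lo[ref] = min(lo.get(ref, start), start)
        hi[ref] = max(hi.get(ref, end), end)
        covered[ref] += (end - start).days
    return {ref: (hi[ref] - lo[ref]).days - covered[ref] for ref in covered}
```

For the sample data, ref A gives 0 and ref B gives the number of days between 26.07.1998 and 01.04.2012.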
This query works in SQL Server. PARTITION BY is ANSI SQL; I don't know whether Ingres supports it. If it does, it probably also has an equivalent of DENSE_RANK().
select *
INTO #TEMP
from (
select 'A' as Ref, Cast('1997-01-04' as DateTime) as From_date, Cast('1998-01-04' as DateTime) as to_date
union
select 'A' as Ref, Cast('1998-01-04' as DateTime) as From_date, Cast('1998-05-27' as DateTime) as to_date
union
select 'A' as Ref, Cast('1998-05-27' as DateTime) as From_date, Cast('1999-01-04' as DateTime) as to_date
union
select 'B' as Ref, Cast('1997-01-04' as DateTime) as From_date, Cast('1998-01-04' as DateTime) as to_date
union
select 'B' as Ref, Cast('1998-01-04' as DateTime) as From_date, Cast('1998-07-26' as DateTime) as to_date
union
select 'B' as Ref, Cast('2012-01-04' as DateTime) as From_date, Cast('2013-01-04' as DateTime) as to_date
) X
SELECT *
FROM
(
SELECT Ref, Min(NewStartDate) From_Date, MAX(To_Date) To_Date, COUNT(1) OVER (PARTITION BY Ref ) As [CountRanges]
FROM
(
SELECT Ref, From_Date, To_Date,
NewStartDate = Range_UNTIL_NULL.From_Date + NUMBERS.number,
NewStartDateGroup = DATEADD(d,
1 - DENSE_RANK() OVER (PARTITION BY Ref ORDER BY Range_UNTIL_NULL.From_Date + NUMBERS.number),
Range_UNTIL_NULL.From_Date + NUMBERS.number)
FROM
(
--This subquery is needed to "expand" To_date to the next day and to allow it to be NULL
SELECT
REF, From_date, DATEADD(d, 1, ISNULL(To_Date, From_Date)) AS to_date
FROM #Temp T1
WHERE
NOT EXISTS ( SELECT *
FROM #Temp t2
WHERE T1.Ref = T2.Ref and T1.From_Date > T2.From_Date AND T2.To_Date IS NULL
)
) AS Range_UNTIL_NULL
CROSS APPLY Enumerate ( ABS(DATEDIFF(d, From_Date, To_Date))) AS NUMBERS
) X
GROUP BY Ref, NewStartDateGroup
) OVERLAPED_RANGES_WITH_COUNT
-- WHERE OVERLAPED_RANGES_WITH_COUNT.CountRanges >= 2 --This filter is for identifying ranges that have at least one gap
ORDER BY Ref, From_Date
The result for the given example is:
Ref From_Date To_Date CountRanges
---- ----------------------- ----------------------- -----------
A 1997-01-04 00:00:00.000 1999-01-05 00:00:00.000 1
B 1997-01-04 00:00:00.000 1998-07-27 00:00:00.000 2
B 2012-01-04 00:00:00.000 2013-01-05 00:00:00.000 2
As you can see, refs with CountRanges > 1 have at least one gap.
This answer goes far beyond the initial question, because:
Ranges can overlap (it is not clear from the question whether that can happen).
The question only asks which refs have gaps, but this query also lists the gaps.
This query allows To_date to be NULL, representing an open-ended range.