Group price with start and end date - sql

I have a table
Recordid  Price  Start date  End date
--------------------------------------
1         20     2017-10-01  2017-10-02
2         20     2017-10-03  2017-10-04
3         30     2017-10-05  2017-10-05
4         20     2017-10-06  2017-10-07
I want to get every price with the date it started and the date it ended, so my result set would be:
20  2017-10-01  2017-10-04
30  2017-10-05  2017-10-05
20  2017-10-06  2017-10-07
I'm having trouble figuring it out.
It's an Oracle database.

I figured it out with the code below:
SELECT distinct price
     , case when start_dt is null then lag(start_dt) over (order by start_date)
            else start_dt end realstart
     , case when end_dt is null then lead(end_dt) over (order by end_date)
            else end_dt end realend
FROM (SELECT case when nvl(lag(price) over (order by start_date), -1) <> price then start_date end start_dt
           , case when nvl(lead(price) over (order by end_date), -1) <> price then end_date end end_dt
           , price
           , start_date
           , end_date
      FROM t) main
WHERE start_dt is not null
   OR end_dt is not null

From your sample data I think you want the start date and end date whenever the price changes, in order of the record id.
The following query may contain more subqueries than necessary, for readability's sake. The innermost select determines where the price changes (here called group changes). The next level forms the groups via a rolling sum; this works because only the group-change rows contain values > 0. The rest is straightforward.
SELECT GRP,
       PRICE,
       MIN("Start date") AS "Start date",
       MAX("end date")   AS "end date"
FROM (SELECT sub.*,
             SUM(GROUP_CHANGE) OVER (ORDER BY RECORDID) AS GRP
      FROM (SELECT t.*,
                   CASE
                     WHEN RECORDID = LAG(t.RECORDID) OVER (ORDER BY t.PRICE, t.RECORDID) + 1
                       THEN 0
                     ELSE RECORDID
                   END AS GROUP_CHANGE
            FROM t) sub) fin
GROUP BY GRP, PRICE
ORDER BY GRP
       GRP      PRICE Start date end date
---------- ---------- ---------- --------
         1         20 01.10.17   04.10.17
         4         30 05.10.17   05.10.17
         8         20 06.10.17   11.10.17
Tested with the following data (note that I added some records to your delivered sample data, as I wanted to have a group with three records):
CREATE TABLE t (
  Recordid INT,
  Price INT,
  "Start date" DATE,
  "end date" DATE
);
INSERT INTO t VALUES (1, 20, TO_DATE('2017-10-01', 'YYYY-MM-DD'), TO_DATE('2017-10-02', 'YYYY-MM-DD'));
INSERT INTO t VALUES (2, 20, TO_DATE('2017-10-03', 'YYYY-MM-DD'), TO_DATE('2017-10-04', 'YYYY-MM-DD'));
INSERT INTO t VALUES (3, 30, TO_DATE('2017-10-05', 'YYYY-MM-DD'), TO_DATE('2017-10-05', 'YYYY-MM-DD'));
INSERT INTO t VALUES (4, 20, TO_DATE('2017-10-06', 'YYYY-MM-DD'), TO_DATE('2017-10-07', 'YYYY-MM-DD'));
INSERT INTO t VALUES (5, 20, TO_DATE('2017-10-08', 'YYYY-MM-DD'), TO_DATE('2017-10-09', 'YYYY-MM-DD'));
INSERT INTO t VALUES (6, 20, TO_DATE('2017-10-10', 'YYYY-MM-DD'), TO_DATE('2017-10-11', 'YYYY-MM-DD'));
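To make the rolling sum concrete, here are the intermediate values the inner selects should produce for this data (worked out by hand, so worth double-checking). Ordering by PRICE, RECORDID puts the price-20 rows (1, 2, 4, 5, 6) before the price-30 row (3), so GROUP_CHANGE is non-zero exactly where a row does not continue its predecessor in that order:
RECORDID  PRICE  GROUP_CHANGE  GRP
       1     20             1    1
       2     20             0    1
       3     30             3    4
       4     20             4    8
       5     20             0    8
       6     20             0    8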

Here is one method that might work in your case:
select price, min(start_date), max(end_date)
from (select t.*,
             sum(case when prev_price = price and prev_end_date = start_date - 1
                      then 0 else 1
                 end) over (order by t.start_date) as grp
      from (select t.*,
                   lag(t.end_date) over (order by t.start_date) as prev_end_date,
                   lag(t.price) over (order by t.start_date) as prev_price
            from t
           ) t
     ) t
group by price, grp
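If I read it right, the running sum starts a new group whenever the price changes or the periods stop being contiguous (prev_end_date = start_date - 1), so consecutive rows with the same price and adjacent dates collapse into a single group.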

Related

SQL Server - Get row based on a sum

I need to calculate 5 working days from a given date, based on the table below.
5 working days from Jan 9 is Jan 16 because the sum of the working_days column below between those dates is 5.
Here is SQL that I used.
WITH dates AS
(
    SELECT t_from.start_date, t_to.start_date end_date
    FROM #t t_from, #t t_to
    WHERE t_from.start_date < t_to.start_date
),
sum_days AS
(
    SELECT start_date, end_date,
           (SELECT SUM(t_sum.working_days)
            FROM #t t_sum
            WHERE t_sum.start_date BETWEEN d.start_date AND d.end_date) tot_days
    FROM dates d
)
SELECT start_date, MAX(end_date) end_date
FROM sum_days
WHERE tot_days = 5
GROUP BY start_date
It works, but it is inefficient. The real table that I'm using has 1,000 rows, and it takes over 1 minute for the query to return.
My question: is there a better way?
Input:
start_date  working_days
2023-01-09  1
2023-01-10  1
2023-01-11  1
2023-01-12  1
2023-01-13  1
2023-01-14  0
2023-01-15  0
2023-01-16  0
2023-01-17  1
2023-01-18  1
2023-01-19  1
2023-01-20  1
2023-01-21  0
2023-01-22  0
2023-01-23  1
2023-01-24  1
Desired output:
start_date  end_date
2023-01-09  2023-01-16
2023-01-10  2023-01-17
2023-01-11  2023-01-18
2023-01-12  2023-01-19
2023-01-13  2023-01-22
2023-01-14  2023-01-23
2023-01-15  2023-01-23
2023-01-16  2023-01-23
2023-01-17  2023-01-23
2023-01-18  2023-01-24
SQL to create the table:
drop table if exists #t;
GO
select '2023-01-09' start_date,1 working_days into #t;
GO
insert into #t values('2023-01-10',1) ;
go
insert into #t values('2023-01-11',1);
insert into #t values('2023-01-12',1);
insert into #t values('2023-01-13',1);
insert into #t values('2023-01-14',0);
insert into #t values('2023-01-15',0);
insert into #t values('2023-01-16',0);
insert into #t values('2023-01-17',1);
insert into #t values('2023-01-18',1);
insert into #t values('2023-01-19',1);
insert into #t values('2023-01-20',1);
insert into #t values('2023-01-21',0);
insert into #t values('2023-01-22',0);
insert into #t values('2023-01-23',1);
insert into #t values('2023-01-24',1);
go
FROM #t t_from, #t t_to
where t_from.start_date < t_to.start_date
is a "triangular" join. It is not quite as bad as a cross join but getting that way (rows returned are N*(N-1)/2 rather than N*N).
This will not scale with large numbers of rows in #t.
One way of getting your desired results (db fiddle) is
WITH Dates AS
(
    SELECT *,
           SUM(working_days) OVER (ORDER BY start_date) AS working_day_count
    FROM #t
)
SELECT D1.start_date,
       MAX(D2.start_date)
FROM Dates D1
     JOIN Dates D2
       ON D1.working_day_count + 5 - D1.working_days = D2.working_day_count
GROUP BY D1.start_date
This calculates the running total efficiently. Perhaps someone will offer a solution that does it all in one pass rather than requiring the self join above, but this is at least an equi join and should be a lot faster in your 1,000 row case than your current method.
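To see the join condition at work with the sample data: for start_date 2023-01-09 the running total is 1 and working_days is 1, so we look for rows with working_day_count = 1 + 5 - 1 = 5. The rows 2023-01-13 through 2023-01-16 all have a running total of 5, and the latest of them is 2023-01-16, matching the desired output.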
Out of curiosity, here is a solution without a JOIN:
with cte as (
    select start_date, working_days
          ,lead(start_date,1) over(order by start_date) d1
          ,lead(start_date,4) over(order by start_date) d2 -- target date
          ,lead(start_date,5) over(order by start_date) d3 -- target+1, if not_working_days between d2 and d3
    from (select * from #t where working_days=1) t -- dates, only working_days
)
,t2 as (
    select *
          ,case when datediff(d,d2,d3)>1 then dateadd(d,-1,d3)
                else d2
           end end_date
          ,datediff(d,start_date,d1) dn
    from cte
)
select
     dateadd(d,isnull(n,0),start_date) start_date
    ,case when isnull(n,0)=0 then working_days else 0 end working_days
    ,case when isnull(n,0)=0 then end_date else dateadd(d,1,end_date) end end_date
from t2 left join (values(0),(1),(2),(3),(4),(5)) nn(n) -- to restore not_working_days
    on nn.n < t2.dn
If there is an opportunity to compare the costs, it would be interesting.
Test example
with data as (
    select start_date as dt, working_days as adj,
           sum(cast(working_days as int)) over (order by start_date) as ofs
    from #t
)
select ds.dt as start_date, de.dt as end_date
from data ds cross apply (
    select max(dt) as end_date
    from data de
    where de.ofs = ds.ofs + 5 - ds.adj
) de(dt)
where de.dt is not null;
Basically the same as above, but cross apply might be an improvement. This seems to favor a clustered index on start_date in my experiments.
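If you want to experiment with that, a minimal sketch (the index name is arbitrary):
CREATE CLUSTERED INDEX ix_t_start_date ON #t (start_date);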
Or you could just search via lead():
with data as (
    select start_date as dt, working_days as adj,
           sum(cast(working_days as int)) over (order by start_date) as ofs
    from #t
), data2 as (
    select dt as start_date,
           case ofs + 5 - adj -- check in reverse order
                when lead(ofs, 9) over (order by dt) then lead(dt, 9) over (order by dt)
                when lead(ofs, 8) over (order by dt) then lead(dt, 8) over (order by dt)
                when lead(ofs, 7) over (order by dt) then lead(dt, 7) over (order by dt)
                when lead(ofs, 6) over (order by dt) then lead(dt, 6) over (order by dt)
           end as end_date
    from data
)
select * from data2 where end_date is not null;
This query assumes at least two days off per week and a maximum of four-day weekends, to limit the number of rows that need to be searched: five working days plus one to four intervening days off puts the end date between six and nine rows ahead. Expand as necessary.
Check out the fiddle here with a demonstration that both of these approaches seem to generate cheaper plans: https://dbfiddle.uk/1SdBRmmg
There is a way to use a single pass over the data. You'll have to wrap it up in a table expression to filter the null values near the end of the calendar.
select start_date,
       dateadd(day,
               case 5
                    when sum(working_days) over (order by start_date rows between current row and 9 following) then 9
                    when sum(working_days) over (order by start_date rows between current row and 8 following) then 8
                    when sum(working_days) over (order by start_date rows between current row and 7 following) then 7
                    when sum(working_days) over (order by start_date rows between current row and 6 following) then 6
               end,
               start_date) as end_date
from #t;
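The wrapping mentioned above might look like this (a sketch of the same query inside a CTE, filtering out the NULLs near the end of the calendar):
with results as (
    select start_date,
           dateadd(day,
                   case 5
                        when sum(working_days) over (order by start_date rows between current row and 9 following) then 9
                        when sum(working_days) over (order by start_date rows between current row and 8 following) then 8
                        when sum(working_days) over (order by start_date rows between current row and 7 following) then 7
                        when sum(working_days) over (order by start_date rows between current row and 6 following) then 6
                   end,
                   start_date) as end_date
    from #t
)
select start_date, end_date
from results
where end_date is not null;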
And just for fun you could do this with range over (https://dbfiddle.uk/cm3KJO-W) just not on SQL Server yet:
with data as (select *, sum(working_days) over (order by start_date) as ofs from t)
select start_date,
max(start_date) over (order by ofs range between 5 following and 5 following) as end_date
from data;

Incremental count

I have a table with a list of Customer Numbers and Order Dates, and want to add a count against each Customer Number, restarting from 1 each time the customer number changes. I've sorted the table into Customer then Date order, and need to add an order count column.
CASE WHEN 'Customer Number' on This row = 'Customer Number' on Previous Row then ( Count = Count on Previous Row + 1 )
Else Count = 1
What is the best way to approach this?
Customer and Dates in Customer then Date order:
Customer  Date      Count
0001      01/05/18  1
0001      02/05/18  2
0001      03/05/18  3
0002      03/05/18  1   <- back to one here as Customer changed
0002      04/05/18  2
0003      05/05/18  1   <- back to one again
I've just tried COUNT(*) OVER (PARTITION BY Customer) AS COUNT, but for some reason it doesn't seem to restart from 1 when the Customer changes.
It's hard to tell what you want, but "to add a count against each Customer number, restarting from 1 each time the customer number changes" sounds as if you simply want:
count(*) over (partition by customer_number)
or maybe that should be the count "up-to" the date of the row:
count(*) over (partition by customer_number order by order_date)
It sounds like you just want an analytic row_number() call:
select customer_number,
       order_date,
       row_number() over (partition by customer_number order by order_date) as num
from your_table
order by customer_number,
         order_date
Using an analytic count also works, as #horse_with_no_name suggested:
count(*) over (partition by customer_number order by order_date) as num
Quick demo showing both, with your sample data in a CTE:
with your_table (customer_number, order_date) as (
select '0001', date '2018-05-01' from dual
union all select '0001', date '2018-05-03' from dual
union all select '0001', date '2018-05-02' from dual
union all select '0002', date '2018-05-03' from dual
union all select '0002', date '2018-05-04' from dual
union all select '0003', date '2018-05-05' from dual
)
select customer_number,
       order_date,
       row_number() over (partition by customer_number order by order_date) as num1,
       count(*) over (partition by customer_number order by order_date) as num2
from your_table
order by customer_number,
         order_date
/
CUST ORDER_DATE       NUM1       NUM2
---- ---------- ---------- ----------
0001 2018-05-01          1          1
0001 2018-05-02          2          2
0001 2018-05-03          3          3
0002 2018-05-03          1          1
0002 2018-05-04          2          2
0003 2018-05-05          1          1

Select min/max dates for periods that don't intersect

Example: I have a table with 4 columns; the date format is dd.MM.yy.
id  ban  start     end
1   1    01.01.15  31.12.18
1   1    02.02.15  31.12.18
1   1    05.04.15  31.12.17
In this case the dates from rows 2 and 3 are included in the dates from row 1.
1   1    02.04.19  31.12.20
1   1    05.05.19  31.12.20
In this case the dates from row 5 are included in the dates from row 4. Basically we have 2 periods that don't intersect:
01.01.15 31.12.18
and
02.04.19 31.12.20
Situations where a date starts in one period and ends in another are impossible. The end result should look like this:
1   1    01.01.15  31.12.18
1   1    02.04.19  31.12.20
I tried using analytic functions (LAG):
select id
     , ban
     , case
         when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
          and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
         then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
         else start
       end as start
     , case
         when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
          and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
         then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
         else end
       end as end
from table
I order the rows, and if the current dates are included in the previous ones I replace them. It works if I have just 2 rows. For example, this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields to get what I want, but if there are more rows
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
          from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
                 sum(direction) over (partition by id, ban order by dt) sm
          from step1),
step3 as (select id, ban, direction, dt dt1,
                 lead(dt) over (partition by id, ban order by dt) dt2
          from step2
          where (direction = 1 and sm = 1) or (direction = -1 and sm = 0))
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot the dates and assign 1 for a start date, -1 for an end date (column direction)
step2 - add a cumulative sum over direction
step3 - keep only the interesting dates, and pivot the second date back using lead()
You can shorten this syntax; I divided it into steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot work in Oracle 11 and later; for earlier versions you need case when.
BTW - START is a reserved word in Oracle, so in my example I changed the column names slightly.
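For those earlier versions, a sketch of what step1 might look like with case when instead of unpivot (a two-row generator doubles each row; drop-in replacement for the step1 CTE above, untested):
step1 as (select id, ban,
                 case n when 1 then dtstart else dtend end dt,
                 case n when 1 then 1 else -1 end direction
          from t
          cross join (select 1 n from dual union all select 2 n from dual)),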
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
      from (select t.*,
                   (case when exists (select 1
                                      from t t2
                                      where t2.id = t.id and t2.ban = t.ban and
                                            t.start <= t2.end and t.end >= t2.start and
                                            t.start <> t2.start and t.end <> t2.end
                                     )
                         then 0 else 1
                    end) as start_flag
            from t
           ) t
     ) t
group by id, ban, grp;

Oracle query to get how many days a record has certain status before today

I need a query to get for how many days a meter was online the last time it was on.
For example :
METER  PDATE  STATUS
ABC    1-Jan  off
ABC    2-Jan  on
ABC    3-Jan  on
ABC    4-Jan  on
ABC    5-Jan  off
ABC    6-Jan  off
ABC    7-Jan  on
ABC    8-Jan  on
ABC    9-Jan  off
If today is Jan 8th then the query will return 3 (Jan 2-4).
If today is Jan 9th then the query will return 2 (Jan 7-8).
My query below works OK, but it takes 40-50 seconds when applied to the real table, which has 5 million records.
Please let me know if there are faster ways to get such data.
with last_off as
(
  select meter, pdate lastoff from
  (
    select meter, pdate,
           row_number() over (partition by meter order by pdate desc) rnum
    from mytable
    where status = 'off'
  )
  where rnum = 1
),
last_on as
(
  select meter, laston from
  (
    select a.meter, a.pdate laston, b.lastoff,
           row_number() over (partition by a.meter order by a.pdate desc) rnum
    from mytable a, last_off b
    where status = 'on'
      and a.meter = b.meter(+) and a.pdate < b.lastoff
  )
  where rnum = 1
),
days_on as
(
  select meter, laston - pdate dayson from
  (
    select a.meter, a.pdate, b.laston,
           row_number() over (partition by a.meter order by a.pdate desc) rnum
    from mytable a, last_on b
    where status = 'off'
      and a.meter = b.meter(+) and a.pdate < b.laston
  )
  where rnum = 1
)
select meter, dayson
from days_on
with t as (
  select meter, pdate, status,
         case when lag(status) over (partition by meter order by pdate)
                   < status then 1 end chg1,
         case when lead(status) over (partition by meter order by pdate)
                   < status then 1 end chg2
  from mytable),
d2 as (
  select meter, max(pdate) do2
  from t where chg2 = 1 and pdate < date '2015-01-09' group by meter),
d1 as (
  select meter, max(pdate) do1 from t join d2 using (meter)
  where chg1 = 1 and pdate < d2.do2 group by meter)
select meter, do2 - do1 + 1 days_on from d1 join d2 using (meter)
SQLFiddle demo
Change the value in the line containing date '2015-01-09' to whatever you want, probably trunc(sysdate). Also change the last line to:
select meter, count(1) cnt from t join d1 using (meter) join d2 using (meter)
where pdate between do1 and do2 group by (meter)
if you want to count rows from the main table instead of simply subtracting days.
This would get the list of meters that have been on, and how many days they've been on.
(caveat: I don't have an Oracle instance to try this on as I'm writing it)
select maxon.METER,
       (maxon.maxdate - maxoff.maxdate) as dayson
from
  (select METER,
          Max(PDATE) maxdate
   from MY_TABLE
   where STATUS = 'on'
   group by METER) maxon,
  (select METER,
          Max(PDATE) maxdate
   from MY_TABLE
   where STATUS = 'off'
   group by METER) maxoff
where maxon.meter = maxoff.meter
  and maxon.maxdate > maxoff.maxdate;
You could union a second query to get the meters that have been off, or just be more clever in how you interpret the subtraction result (i.e., use a CASE expression such that if the result is negative the meter is off, and if positive it's on).
http://www.techonthenet.com/oracle/functions/case.php
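A sketch of that CASE interpretation (untested, reusing the MY_TABLE columns from the data setup below):
select maxon.METER,
       case
         when maxon.maxdate > maxoff.maxdate
           then 'on for ' || (maxon.maxdate - maxoff.maxdate) || ' day(s)'
         else 'off'
       end as state
from (select METER, max(PDATE) maxdate from MY_TABLE where STATUS = 'on' group by METER) maxon,
     (select METER, max(PDATE) maxdate from MY_TABLE where STATUS = 'off' group by METER) maxoff
where maxon.meter = maxoff.meter;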
Data Setup:
CREATE TABLE my_table
(METER varchar2(3), PDATE date, STATUS varchar2(3))
;
INSERT ALL
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-01', 'off')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-02', 'on')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-03', 'on')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-04', 'on')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-05', 'off')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-06', 'off')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-07', 'on')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-08', 'on')
  INTO my_table (METER, PDATE, STATUS) VALUES ('ABC', DATE '2001-01-09', 'off')
SELECT * FROM dual;
You can make use of the analytic function LAG.
Query:
SELECT meter,
       pdate,
       pdate - MIN(pdate) OVER (ORDER BY grp DESC) + 1 AS daysoff
FROM (SELECT meter,
             pdate,
             status,
             MAX(grp) OVER (ORDER BY pdate) grp
      FROM (SELECT meter,
                   pdate,
                   status,
                   CASE
                     WHEN LAG(status) OVER (ORDER BY pdate) != status THEN
                       ROW_NUMBER() OVER (ORDER BY pdate)
                     WHEN ROW_NUMBER() OVER (ORDER BY pdate) = 1 THEN 1
                   END grp
            FROM my_table))
WHERE status = 'on'
ORDER BY pdate ASC;
Results:
METER  PDATE                      DAYSOFF
ABC    January, 02 2001 00:00:00        1
ABC    January, 03 2001 00:00:00        2
ABC    January, 04 2001 00:00:00        3
ABC    January, 07 2001 00:00:00        1
ABC    January, 08 2001 00:00:00        2
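In case the logic is not obvious: the inner CASE marks the first row of each run of equal statuses with a unique number (its row_number), MAX(grp) OVER (ORDER BY pdate) carries that marker forward across the run, and the outer query counts the days elapsed since the run's first date via a running MIN; filtering on status = 'on' leaves only the on-runs.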

breakdown by weeks

Below is a simple query and the result. Is there a way to aggregate the total EVENTs by 7 days, then sum up the total EVENTs? Would a rollup function work? I am using SQL Server 05 & 08. Thanks again, folks.
SELECT DATE_SOLD, count(DISTINCT PRODUCTS) AS PRODUCT_SOLD
FROM PRODUCTS
WHERE DATE >='10/1/2009'
and DATE <'10/1/2010'
GROUP BY DATE_SOLD
RESULTS:
DATE_SOLD  PRODUCT_SOLD
10/1/09    5
10/2/09    11
10/3/09    14
10/4/09    6
10/5/09    11
10/6/09    13
10/7/09    10
Total      70
10/8/09    4
10/9/09    11
10/10/09   8
10/11/09   4
10/12/09   7
10/13/09   4
10/14/09   9
Total      47
Not having your table design to work with, here's what I think you are after (although I have to admit the output needs to be cleaned up). It should, at least, get you some way towards the solution you are looking for.
CREATE TABLE MyTable(
event_date date,
event_type char(1)
)
GO
INSERT MyTable VALUES ('2009-1-01', 'A')
INSERT MyTable VALUES ('2009-1-11', 'B')
INSERT MyTable VALUES ('2009-1-11', 'C')
INSERT MyTable VALUES ('2009-1-20', 'N')
INSERT MyTable VALUES ('2009-1-20', 'N')
INSERT MyTable VALUES ('2009-5-23', 'D')
INSERT MyTable VALUES ('2009-5-23', 'E')
INSERT MyTable VALUES ('2009-5-10', 'F')
INSERT MyTable VALUES ('2009-5-10', 'F')
GO
WITH T AS (
    SELECT DATEPART(MONTH, event_date) event_month, event_date, event_type
    FROM MyTable
)
SELECT CASE WHEN (GROUPING(event_month) = 0)
            THEN event_month ELSE '99' END AS event_month,
       CASE WHEN (GROUPING(event_date) = 1)
            THEN '9999-12-31' ELSE event_date END AS event_date,
       COUNT(DISTINCT event_type) AS event_count
FROM T
GROUP BY event_month, event_date WITH ROLLUP
ORDER BY event_month, event_date
This gives the following output:
event_month  event_date  event_count
1            2009-01-01  1
1            2009-01-11  2
1            2009-01-20  1
1            9999-12-31  4
5            2009-05-10  1
5            2009-05-23  2
5            9999-12-31  3
99           9999-12-31  7
Where the '99' for the month and '9999-12-31' for the date mark the totals.
SELECT DATEDIFF(week, 0, DATE_SOLD) Week,
       DATEADD(week, DATEDIFF(week, 0, DATE_SOLD), 0) [From],
       DATEADD(week, DATEDIFF(week, 0, DATE_SOLD), 0) + 6 [To],
       COUNT(DISTINCT PRODUCTS) PRODUCT_SOLD
FROM dbo.PRODUCTS
WHERE DATE >= '2009-10-01'
  AND DATE < '2010-10-01'
GROUP BY DATEDIFF(week, 0, DATE_SOLD) WITH ROLLUP
ORDER BY DATEDIFF(week, 0, DATE_SOLD)
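Note that DATEDIFF(week, 0, ...) buckets by calendar weeks (boundaries relative to the base date 1900-01-01), not by 7-day windows anchored at 10/1/2009 as in your sample totals (10/1-10/7, 10/8-10/14). A sketch of the anchored variant, assuming DATE_SOLD is the date column throughout (your query mixes DATE and DATE_SOLD):
SELECT wk AS WeekNo,
       DATEADD(day, wk * 7, '2009-10-01') AS [From],
       DATEADD(day, wk * 7 + 6, '2009-10-01') AS [To],
       COUNT(DISTINCT PRODUCTS) AS PRODUCT_SOLD
FROM (SELECT PRODUCTS,
             -- integer division: days since the anchor date, in whole 7-day buckets
             DATEDIFF(day, '2009-10-01', DATE_SOLD) / 7 AS wk
      FROM dbo.PRODUCTS
      WHERE DATE_SOLD >= '2009-10-01'
        AND DATE_SOLD < '2010-10-01') x
GROUP BY wk WITH ROLLUP
ORDER BY wk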