Oracle SQL Trending MTD Data - sql

I am trying to solve a trending problem at work very similar to the below example. I think I have a method but don't know how to do it in SQL.
The input data is:
MTD       LOC_ID  RAINED
1-Apr-16  1       Y
1-Apr-16  2       N
1-May-16  1       N
1-May-16  2       N
1-Jun-16  1       N
1-Jun-16  2       N
1-Jul-16  1       Y
1-Jul-16  2       N
1-Aug-16  1       N
1-Aug-16  2       Y
The desired output is:
MTD       LOC_ID  RAINED  TRENDS
1-Apr-16  1       Y       New
1-May-16  1       N       No Rain
1-Jun-16  1       N       No Rain
1-Jul-16  1       Y       Carryover
1-Aug-16  1       N       No Rain
1-Apr-16  2       N       No Rain
1-May-16  2       N       No Rain
1-Jun-16  2       N       No Rain
1-Jul-16  2       N       No Rain
1-Aug-16  2       Y       New
I'm trying to produce the output from the input by trending over MTD without hard-coding specific months, so that when new months are added to the input, the output changes without editing the query.
The TRENDS logic is applied per unique LOC_ID. TRENDS has three values: "New" in the first month where RAINED is "Y", "Carryover" in any following month where RAINED is "Y", and "No Rain" in any month where RAINED is "N".
I'd like to automate this problem by introducing an intermediate step with a listagg. For example, for LOC_ID = "1":
MTD       LOC_ID  RAINED  PREV_RAINED
1-Apr-16  1       Y       (null) / 0 / (I don't care)
1-May-16  1       N       Y
1-Jun-16  1       N       Y;N
1-Jul-16  1       Y       Y;N;N
1-Aug-16  1       N       Y;N;N;Y
This way, to produce "TRENDS" in the output, I can say:
case when RAINED = 'Y' then
       case when not regexp_like(PREV_RAINED, 'Y', 'i') then 'New'
            else 'Carryover'
       end
     else 'No Rain'
end as TRENDS
My problem is that I'm not sure how to produce PREV_RAINED for each unique LOC_ID. I have a feeling it needs to combine LAG() with PARTITION BY LOC_ID ORDER BY MTD, but the number of lags I would need varies by month.
Is there an easy way to produce PREV_RAINED or a simpler way to solve my overall problem while preserving automation each month?
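For reference, one untested idea would be to build PREV_RAINED with a correlated LISTAGG over the earlier months of the same LOC_ID, so no fixed number of LAG() calls is needed (input_data below just stands in for the real table):
-- untested sketch: PREV_RAINED = ';'-separated RAINED values of all earlier
-- months for the same LOC_ID (NULL for the first month)
select i.mtd, i.loc_id, i.rained,
       (select listagg(p.rained, ';') within group (order by p.mtd)
          from input_data p
         where p.loc_id = i.loc_id
           and p.mtd < i.mtd) as prev_rained
from input_data i
order by i.loc_id, i.mtd;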
Thanks for reading all of this! :)

The SQL below has two parts:
(i) compute a ROW_NUMBER for the RAINED value within each (loc_id, rained) partition, ordered by mtd;
(ii) get the row count per (loc_id, rained) partition.
With these two values, the CASE WHEN logic can calculate the trends according to your requirement.
SELECT mtd,
       loc_id,
       rained,
       CASE WHEN rained = 'N' THEN 'No Rain'
            WHEN rained = 'Y' AND rn = 1 THEN 'New'
            ELSE 'Carryover'
       END AS Trends
FROM
(
  SELECT mtd,
         loc_id,
         rained,
         ROW_NUMBER() OVER ( PARTITION BY loc_id, rained ORDER BY mtd ) AS rn,
         COUNT(*) OVER ( PARTITION BY loc_id, rained ) AS count_locid_rained
  FROM INPUT
  ORDER BY loc_id, mtd, rained, rn
) X;

Here is a solution for older versions. The WITH clause is for input data; the solution starts right after the WITH clause.
I'll work on a MATCH_RECOGNIZE solution next; I may add it to this answer.
with
input_data ( mtd, loc_id, rained ) as (
select to_date('1-Apr-16', 'dd-Mon-rr'), 1, 'Y' from dual union all
select to_date('1-Apr-16', 'dd-Mon-rr'), 2, 'N' from dual union all
select to_date('1-May-16', 'dd-Mon-rr'), 1, 'N' from dual union all
select to_date('1-May-16', 'dd-Mon-rr'), 2, 'N' from dual union all
select to_date('1-Jun-16', 'dd-Mon-rr'), 1, 'N' from dual union all
select to_date('1-Jun-16', 'dd-Mon-rr'), 2, 'N' from dual union all
select to_date('1-Jul-16', 'dd-Mon-rr'), 1, 'Y' from dual union all
select to_date('1-Jul-16', 'dd-Mon-rr'), 2, 'N' from dual union all
select to_date('1-Aug-16', 'dd-Mon-rr'), 1, 'N' from dual union all
select to_date('1-Aug-16', 'dd-Mon-rr'), 2, 'Y' from dual
)
select mtd, loc_id, rained,
       case rained when 'N' then 'No Rain'
                   else case when rn = 1 then 'New'
                             else 'Carryover' end
       end as trends
from   ( select mtd, loc_id, rained,
                row_number() over (partition by loc_id, rained order by mtd) rn
         from   input_data
       )
order by loc_id, mtd
;
Output
MTD                     LOC_ID RAINED TRENDS
------------------- ---------- ------ ---------
01/04/2016 00:00:00          1 Y      New
01/05/2016 00:00:00          1 N      No Rain
01/06/2016 00:00:00          1 N      No Rain
01/07/2016 00:00:00          1 Y      Carryover
01/08/2016 00:00:00          1 N      No Rain
01/04/2016 00:00:00          2 N      No Rain
01/05/2016 00:00:00          2 N      No Rain
01/06/2016 00:00:00          2 N      No Rain
01/07/2016 00:00:00          2 N      No Rain
01/08/2016 00:00:00          2 Y      New
10 rows selected

Solution using MATCH_RECOGNIZE (for Oracle 12c only). Test the different solutions on your dataset; I am told that MATCH_RECOGNIZE may be significantly faster than other solutions, but this depends on many factors.
select loc_id, mtd, rained, trends
from input_data
match_recognize (
partition by loc_id, rained
order by mtd
measures mtd as mtd,
case when rained = 'N' then 'No Rain'
else case when match_number() = 1 then 'New' else 'Carryover' end
end as trends
pattern (a)
define a as 0 = 0
)
order by loc_id, mtd;

Related

SQL Server : create summarization based on multiple dates

I have the following table containing positions for workers, dating back 10 years:
worker_id  position_code  date_from   date_to
1          x1             2021-01-01  2100-12-31
1          x2             2020-12-01  2021-01-01
2          x3             2000-01-01  2100-12-31
I want to create a view where I can see, for each worker, what their position is for every month.
So for example:
year  month  worker_id  position_code
2020  12     1          x2
2020  12     2          x3
2021  1      1          x1
2021  1      2          x3
2021  2      1          x1
Ideally I'm only interested in the last 6 months, to get better performance.
Overall there are ~10000 workers, and the table itself is around ~100000 rows.
Some workers have only 1 position, but there can be multiple.
In theory a position only changes at the beginning of a month, but it would be better to watch for mid-month changes as well, and in that case take the position which is active at the end of the month
(so for example: from Jan 1-10 the position is x1, from Jan 10 to 31 it is x2; in this case x2 is the one I'm looking for).
WITH WORKERS(worker_id, position_code, date_from, date_to) AS
(
SELECT 1 , 'x1', '2021-01-01', '2100-12-31' UNION ALL
SELECT 1 , 'x2' , '2020-12-01', '2021-01-01' UNION ALL
SELECT 2 , 'x3' , '2000-01-01' , '2100-12-31'
),
MINI_MAX AS
(
SELECT MIN(DATE_FROM)AS STARTT_DATE,MAX(DATE_TO)AS END_DATE
FROM WORKERS
),
CALENDAR AS
(
SELECT CAST(STARTT_DATE AS DATE)DATE_D FROM MINI_MAX AS W
UNION ALL
SELECT DATEADD(MONTH,1,Z.DATE_D)
FROM CALENDAR AS Z
WHERE Z.DATE_D<=(SELECT END_DATE FROM MINI_MAX)
),
RESULT AS
(
SELECT YEAR(C.DATE_D)AS YEARR,MONTH(C.DATE_D)MONTHH,W.worker_id,W.position_code
FROM CALENDAR AS C
JOIN WORKERS AS W ON C.DATE_D BETWEEN W.date_from AND W.date_to
)
SELECT R.YEARR,R.MONTHH,R.worker_id,R.position_code
FROM RESULT AS R
OPTION(MAXRECURSION 0)
I would say that the most suitable approach for this kind of query is to use a permanent calendar table and perform the JOIN directly against it, for example:
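Assuming a permanent dbo.Calendar table that holds the first day of every month (the table and column names here are only placeholders), the view body could be as simple as the untested sketch below, which picks the position active on the last day of each month and limits the output to the last 6 months:
-- untested sketch: join each calendar month to the position active at month end
SELECT YEAR(c.month_start)  AS [year],
       MONTH(c.month_start) AS [month],
       w.worker_id,
       w.position_code
FROM   dbo.Calendar AS c
JOIN   WORKERS      AS w
  ON   EOMONTH(c.month_start) BETWEEN w.date_from AND w.date_to
WHERE  c.month_start >= DATEADD(MONTH, -6, DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1));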
The hard part is generating the months. One method is a recursive CTE:
with cte as (
select worker_id, position_code, date_from as dte,
eomonth(case when date_to < eomonth(getdate()) then dateadd(day, -1, date_to) else getdate() end) as date_to
from t
union all
select worker_id, position_code,
dateadd(month, 1, datefromparts(year(dte), month(dte), 1)), date_to
from cte
where eomonth(dte) < eomonth(date_to)
)
select *
from cte
order by worker_id, dte desc
option (maxrecursion 0)
Note: You might get duplicates if a worker starts a position in the middle of a month.
Here is a db<>fiddle.
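If those duplicates matter, one untested way to keep only the position active at the end of each month is to replace the final SELECT above with a ROW_NUMBER() filter that keeps, per worker and month, the row whose period starts latest:
-- rn = 1 keeps, for each worker and month, the latest-starting row,
-- i.e. the position still active at month end (assuming no overlapping positions)
select worker_id, year(dte) as yearr, month(dte) as monthh, position_code
from (
    select cte.*,
           row_number() over (partition by worker_id, year(dte), month(dte)
                              order by dte desc) as rn
    from cte
) x
where rn = 1
order by worker_id, yearr, monthh
option (maxrecursion 0)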

Oracle SQL recursive adding values

I have the following data in the table
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 null
01/03/20 3 null
01/04/20 8 null
01/05/20 31 null
Based on the above data I would like to have the following situation.
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
Additional data
01/06/20 21 0 (previously it would be -2)
01/07/20 25 25
01/08/20 29 4
The pattern for the additional data is:
if total_amount < previous(r_total) then 0
Based on the filled-in data, the pattern is:
R_total = total_amount - previous(R_total)
Could you please help me out with this issue?
As Gordon Linoff suspected, it is possible to solve this problem with analytic functions. The benefit is that the query will likely be much faster. The price to pay for that benefit is that you need to do a bit of math beforehand (before ever thinking about "programming" and "computers").
A bit of elementary arithmetic shows that R_TOTAL is an alternating sum of TOTAL_AMOUNT. This can be arranged easily by using ROW_NUMBER() (to get the signs) and then an analytic SUM(), as shown below.
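To see why, expand the recurrence R_total(n) = total_amount(n) - R_total(n-1) a couple of steps: R(3) = T(3) - R(2) = T(3) - (T(2) - T(1)) = T(3) - T(2) + T(1), and in general R(n) = T(n) - T(n-1) + T(n-2) - ... With the sample data this gives R(3) = 3 - 5 + 2 = 0, matching the expected output above.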
Table setup:
create table sample_data (period, total_amount) as
select to_date('01/01/20', 'mm/dd/rr'), 2 from dual union all
select to_date('01/02/20', 'mm/dd/rr'), 5 from dual union all
select to_date('01/03/20', 'mm/dd/rr'), 3 from dual union all
select to_date('01/04/20', 'mm/dd/rr'), 8 from dual union all
select to_date('01/05/20', 'mm/dd/rr'), 31 from dual
;
Query and result:
with
prep (period, total_amount, sgn) as (
select period, total_amount,
case mod(row_number() over (order by period), 2) when 0 then 1 else -1 end
from sample_data
)
select period, total_amount,
sgn * sum(sgn * total_amount) over (order by period) as r_total
from prep
;
PERIOD TOTAL_AMOUNT R_TOTAL
-------- ------------ ----------
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
This may be possible with window functions, but the simplest method is probably a recursive CTE:
with t as (
select t.*, row_number() over (order by period) as seqnum
from yourtable t
),
cte(period, total_amount, r_amount, seqnum) as (
select period, total_amount, total_amount as r_amount, seqnum
from t
where seqnum = 1
union all
select t.period, t.total_amount, t.total_amount - cte.r_amount, t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select *
from cte;
This question explicitly talks about "recursively" adding values. If you want to solve this using another mechanism, you might explain the logic in detail and ask if there is a non-recursive CTE solution.

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within a range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake it is safe to assume that dates are consecutive: for contiguous rows, To is always 1 day before the next row's From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL, which seems to work, although I'm not sure whether there are better ways or whether there are issues with it.
WITH grouped_rates AS
  (SELECT from_date,
          to_date,
          SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
   FROM (SELECT from_date,
                to_date,
                CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
                          OVER (ORDER BY from_date, to_date)
                     THEN 0
                     ELSE 1
                END AS grp_start
         FROM rates
         GROUP BY from_date, to_date) AS start_groups)
SELECT
  min(from_date) AS from_date,
  max(to_date)   AS to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
select r.*,
(case when not exists (select 1
from rates r2
where r2.from < r.from and r2.to >= r.to or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from < r.from and r2.to >= r.to) or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(from), max(to)
from (select r.*,
sum(r.StartFlag) over (order by r.from) as grp
from r
) r
group by grp;
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)

Finding Avg of following dataset

Following is the data.
select * from (
select to_date('20140601','YYYYMMDD') log_date, null weight from dual
union
select to_date('20140601','YYYYMMDD')+1 log_date, 0 weight from dual
union
select to_date('20140601','YYYYMMDD')+2 log_date, 4 weight from dual
union
select to_date('20140601','YYYYMMDD')+3 log_date, 4 weight from dual
union
select to_date('20140601','YYYYMMDD')+4 log_date, null weight from dual
union
select to_date('20140601','YYYYMMDD')+5 log_date, 8 weight from dual);
Log_date weight avg_weight
----------------------------------
6/1/2014 NULL 0 (0/1) Since no previous data, I consider it as 0
6/2/2014 0 0 ((0+0)/2)
6/3/2014 4 4/3 ((0+0+4)/3)
6/4/2014 4 2 (0+0+4+4)/4
6/5/2014 NULL 2 (0+0+4+4+2)/5 Since it is NULL I want to take previous day avg = 2
6/6/2014 8 3 (0+0+4+4+2+8)/6 =3
So the average for the above data should be 3.
How can I achieve this in SQL instead of PL/SQL? I'd appreciate any help on this.
I just learned how to use recursive CTEs today, really excited! Hope this helps...
; WITH RawData (log_Date, Weight) AS (
select cast('2014-06-01' as SMALLDATETIME)+0, null
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+1, 0
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+2, 4
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+3, 4
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+4, null
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+5, 8
)
, IndexedData (Id, log_Date, Weight) AS (
SELECT ROW_NUMBER() OVER (ORDER BY log_Date)
, log_Date
, Weight
FROM RawData
)
, ResultData (Id, log_Date, Weight, total, avg_weight) AS (
SELECT Id
, log_Date
, Weight
, CAST(CASE WHEN Weight IS NULL THEN 0 ELSE Weight END AS FLOAT)
, CAST(CASE WHEN Weight IS NULL THEN 0 ELSE Weight END AS FLOAT)
FROM IndexedData
WHERE Id = 1
UNION ALL
SELECT i.Id
, i.log_Date
, i.Weight
, CAST(r.total + CASE WHEN i.Weight IS NULL THEN r.avg_weight ELSE i.Weight END AS FLOAT)
, CAST(r.total + CASE WHEN i.Weight IS NULL THEN r.avg_weight ELSE i.Weight END AS FLOAT) / i.Id
FROM ResultData r
JOIN IndexedData i ON i.Id = r.Id + 1
)
SELECT Log_Date, Weight, avg_weight FROM ResultData
OPTION (MAXRECURSION 0)
This gives the output:
Log_Date Weight avg_weight
----------------------- ----------- ----------------------
2014-06-01 00:00:00 NULL 0
2014-06-02 00:00:00 0 0
2014-06-03 00:00:00 4 1.33333333333333
2014-06-04 00:00:00 4 2
2014-06-05 00:00:00 NULL 2
2014-06-06 00:00:00 8 3
Note that in my answer, I modified the "Data" section of your question as it didn't compile for me. It's still the same data though, hope it helps.
Edit: By default, MAXRECURSION is set to 100. This means that the query will not work for more than 101 rows of Raw Data. By adding the OPTION (MAXRECURSION 0), I have removed this limit so that the query works for all input data. However, this can be dangerous if the query isn't tested thoroughly because it might lead to infinite recursion.
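Since the question's sample data uses DUAL, here is a rough, untested translation of the same anchor/recursive idea into Oracle's recursive subquery factoring (11gR2 or later); raw_data below just repackages the question's SELECT ... FROM dual data:
with raw_data (log_date, weight) as (
  -- the question's sample data
  select to_date('20140601','YYYYMMDD'),     null from dual union all
  select to_date('20140601','YYYYMMDD') + 1, 0    from dual union all
  select to_date('20140601','YYYYMMDD') + 2, 4    from dual union all
  select to_date('20140601','YYYYMMDD') + 3, 4    from dual union all
  select to_date('20140601','YYYYMMDD') + 4, null from dual union all
  select to_date('20140601','YYYYMMDD') + 5, 8    from dual
),
indexed (rn, log_date, weight) as (
  select row_number() over (order by log_date), log_date, weight
  from raw_data
),
running (rn, log_date, weight, total, avg_weight) as (
  select rn, log_date, weight, nvl(weight, 0), nvl(weight, 0)
  from indexed
  where rn = 1
  union all
  -- a NULL weight contributes the previous day's running average instead
  select i.rn, i.log_date, i.weight,
         r.total + nvl(i.weight, r.avg_weight),
         (r.total + nvl(i.weight, r.avg_weight)) / i.rn
  from running r
  join indexed i on i.rn = r.rn + 1
)
select log_date, weight, avg_weight
from running
order by log_date;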

Query the Minimum Value per day within a month's worth of data

I have two sets of pricing data (A and B). Set A consists of all of my pricing data per order over a month. Set B consists of all of my competitor's pricing data over the same month. I want to compare my competitor's lowest price to each of my prices per day.
Graphically, the data appears like this:
Date:-- Set A: -- Set B:
1---------25---------31
1---------54---------47
1---------23---------56
1---------12---------23
1---------76---------40
1---------42
I want to pass only the lowest price to a case statement which evaluates which prices are better. I would like to process an entire month's worth of data at one time, so in my example, dates 1 thru 31 would be included and crunched all at once, and for each day, there would only be one value from Set B included: the lowest price in the set.
Important notes: Set B does not have a datapoint for each point in Set A
Hopefully this makes sense. Thanks in advance for any help you may be able to render.
That's a strange example you have - do you really have prices ranging from 12 to 76 within a single day?
Anyway, left joining your (grouped) data with their (grouped) data should work (untested):
with
my_min_prices as (
  select price_date, min(price_value) min_price from my_prices group by price_date),
their_min_prices as (
  select price_date, min(price_value) min_price from their_prices group by price_date)
select
  mine.price_date,
  (case
     when theirs.min_price is null then mine.min_price
     when theirs.min_price >= mine.min_price then mine.min_price
     else theirs.min_price
   end) min_price
from
  my_min_prices mine
  left join their_min_prices theirs on mine.price_date = theirs.price_date
I'm still not sure that I understand your requirements. My best guess is that you want something like
with your_data as (
  select 1 date_id, 25 price_a, 31 price_b from dual
  union all
  select 1, 54, 47 from dual union all
  select 1, 23, 56 from dual union all
  select 1, 12, 23 from dual union all
  select 1, 76, 40 from dual union all
  select 1, 42, null from dual)
select date_id,
       sum( case when price_a < min_price_b
                 then 1
                 else 0
            end) better,
       sum( case when price_a = min_price_b
                 then 1
                 else 0
            end) tie,
       sum( case when price_a > min_price_b
                 then 1
                 else 0
            end) worse
from ( select date_id,
              price_a,
              min(price_b) over (partition by date_id) min_price_b
       from your_data )
group by date_id;
Output:
DATE_ID BETTER TIE WORSE
---------- ---------- ---------- ----------
1 1 1 4