Thanks to everyone who took the time to comment and answer.
I have a price history table like this (pseudocode):
table price_history (
product_id,
price,
changed_date
)
in which the historical prices of some products are stored:
1, 1.0, '2017-12-18'
1, 1.2, '2017-12-20'
1, 0.9, '2018-04-20'
1, 1.1, '2018-07-20'
1, 1.3, '2018-07-22'
2, 10.0, '2017-12-15'
2, 11.0, '2017-12-16'
2, 9.9, '2018-01-02'
2, 10.3, '2018-04-04'
Now I want the prices of some products within a certain period, e.g. between 2018-01-01 and now.
The simple approach:
SELECT * FROM price_history
WHERE product_id in (1,2) AND changed_date >= '2018-01-01'
is not ok, since the individual price for each product from 2018-01-01 until the first price change is not included:
1, 0.9, '2018-04-20'
1, 1.1, '2018-07-20'
1, 1.3, '2018-07-22'
2, 9.9, '2018-01-02'
2, 10.3, '2018-04-04'
But it is crucial to know the prices from the start of the period.
So, in addition to the price changes within the period, the last change before the period must also be included.
The result should be like so:
1, 1.2, '2017-12-20'
1, 0.9, '2018-04-20'
1, 1.1, '2018-07-20'
1, 1.3, '2018-07-22'
2, 11.0, '2017-12-16'
2, 9.9, '2018-01-02'
2, 10.3, '2018-04-04'
Q: How do I write such a SELECT statement?
Edit:
The test scenario and the solution from Ajay Gupta
CREATE TABLE price_history (
product_id integer,
price float,
changed_date timestamp
);
INSERT INTO price_history (product_id,price,changed_date) VALUES
(1, 1.0, '2017-12-18'),
(1, 1.2, '2017-12-20'),
(1, 0.9, '2018-04-20'),
(1, 1.1, '2018-07-20'),
(1, 1.3, '2018-07-22'),
(2, 10.0, '2017-12-15'),
(2, 11.0, '2017-12-16'),
(2, 9.9, '2018-01-02'),
(2, 10.3, '2018-04-04');
Winning Select:
with cte1 as
(Select *, lag(changed_date,1,'01-01-1900')
over(partition by product_id order by changed_date)
as FromDate from price_history),
cte2 as (Select product_id, max(FromDate)
as changed_date from cte1
where '2018-01-01'
between FromDate and changed_date group by product_id)
Select p.* from price_history p
join cte2 c on p.product_id = c.product_id
where p.changed_date >= c.changed_date
order by product_id,changed_date;
Result:
product_id | price | changed_date
------------+-------+---------------------
1 | 1.2 | 2017-12-20 00:00:00
1 | 0.9 | 2018-04-20 00:00:00
1 | 1.1 | 2018-07-20 00:00:00
1 | 1.3 | 2018-07-22 00:00:00
2 | 11 | 2017-12-16 00:00:00
2 | 9.9 | 2018-01-02 00:00:00
2 | 10.3 | 2018-04-04 00:00:00
I must admit, this is way beyond my limited (PG-)SQL skills.
Using Lag and cte
with cte1 as (
Select *,
lag(changed_date,1,'01-01-1900') over(partition by product_id order by changed_date) as FromDate
from price_history
), cte2 as (
Select product_id, max(FromDate) as changed_date
from cte1
where '2018-01-01' between FromDate and changed_date
group by product_id
)
Select p.*
from price_history p
join cte2 c on p.product_id = c.product_id
where p.changed_date >= c.changed_date;
I guess this is what you are looking for
SELECT Top 1 * FROM price_history WHERE product_id in (1,2) AND changed_date < '2018-01-01'
UNION ALL
SELECT * FROM price_history WHERE product_id in (1,2) AND changed_date >= '2018-01-01'
You need the 1st change date and all other dates >= '2018-01-01'
select product_id,price, changed_date
from
(
select product_id,price, changed_date,
row_number() over(partition by product_id order by changed_date ) as rn
from price_history
) x
where x.rn = 2 and product_id in (1,2)
union all
select product_id,price, changed_date from price_history
where product_id in (1,2) and changed_date >= '2018-01-01'
If you did have the option to change your table structure, a different approach would be to have both start_date and end_date in your table, this way your records would not depend on prev/next row and your query becomes easier to write. See Slowly changing dimension - Type 2
If you want to solve the problem with the existing structure, in PostgreSQL you can use LIMIT 1 to get the latest record before the start of the period:
SELECT
*
FROM
price_history
WHERE
product_id in (1,2)
AND changed_date >= '2018-01-01'
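UNION ALL
-- this would give you the latest price before the start of the period
-- (parenthesized so that ORDER BY / LIMIT apply to this branch only, not to the whole UNION;
-- note that LIMIT 1 returns a single row overall, not one row per product)
(SELECT
*
FROM
price_history
WHERE
product_id in (1,2)
AND changed_date < '2018-01-01'
ORDER BY
changed_date DESC
LIMIT 1)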
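To illustrate the start_date/end_date idea mentioned above, here is a minimal sketch in Python rather than SQL. It derives validity ranges from the sample change dates and then filters them with a single comparison. The names HISTORY, to_validity_ranges and prices_valid_from are made up for this illustration, and the end date is assumed to be exclusive (the next change date), which is only one possible convention.

```python
from collections import defaultdict

# Sample rows from the question: (product_id, price, changed_date as ISO text).
HISTORY = [
    (1, 1.0, "2017-12-18"), (1, 1.2, "2017-12-20"), (1, 0.9, "2018-04-20"),
    (1, 1.1, "2018-07-20"), (1, 1.3, "2018-07-22"),
    (2, 10.0, "2017-12-15"), (2, 11.0, "2017-12-16"),
    (2, 9.9, "2018-01-02"), (2, 10.3, "2018-04-04"),
]


def to_validity_ranges(history):
    """Turn change events into (product_id, price, start_date, end_date) rows.

    end_date is the next change date of the same product (exclusive),
    or None while the price is still the current one.
    """
    by_product = defaultdict(list)
    for product_id, price, changed in history:
        by_product[product_id].append((changed, price))
    ranges = []
    for product_id in sorted(by_product):
        changes = sorted(by_product[product_id])  # ISO dates sort as text
        for i, (start, price) in enumerate(changes):
            end = changes[i + 1][0] if i + 1 < len(changes) else None
            ranges.append((product_id, price, start, end))
    return ranges


def prices_valid_from(ranges, period_start):
    """Keep every range that is still valid after period_start."""
    return [r for r in ranges if r[3] is None or r[3] > period_start]


for row in prices_valid_from(to_validity_ranges(HISTORY), "2018-01-01"):
    print(row)
```

With both dates stored on the row, the filter no longer needs to look at the previous or next row, which is the point of the Type 2 layout.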
The solution with union is still simpler but not realized correctly in other answers. So:
SELECT * FROM price_history
WHERE product_id in (1,2) AND changed_date >= '2018-01-01'
union all
(
select distinct on (product_id)
*
from price_history
where product_id in (1,2) AND changed_date < '2018-01-01'
order by product_id, changed_date desc)
order by product_id, changed_date;
Demo
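If you want to try the idea without a PostgreSQL instance, the following is a rough sketch using Python's built-in sqlite3 module. DISTINCT ON is PostgreSQL-specific, so this version swaps in a correlated subquery that finds, per product, the last change on or before the period start. The helper names make_demo_db and prices_in_period are invented for this example, and dates are stored as ISO text, which is an assumption of the sketch.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, 1.0, "2017-12-18"), (1, 1.2, "2017-12-20"), (1, 0.9, "2018-04-20"),
    (1, 1.1, "2018-07-20"), (1, 1.3, "2018-07-22"),
    (2, 10.0, "2017-12-15"), (2, 11.0, "2017-12-16"),
    (2, 9.9, "2018-01-02"), (2, 10.3, "2018-04-04"),
]


def make_demo_db():
    """Create an in-memory database holding the sample price history."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE price_history "
        "(product_id INTEGER, price REAL, changed_date TEXT)"
    )
    conn.executemany("INSERT INTO price_history VALUES (?, ?, ?)", ROWS)
    return conn


def prices_in_period(conn, product_ids, period_start):
    """Return the price in force at period_start plus every later change."""
    placeholders = ",".join("?" for _ in product_ids)
    sql = f"""
        SELECT p.product_id, p.price, p.changed_date
        FROM price_history p
        WHERE p.product_id IN ({placeholders})
          AND p.changed_date >= (
              -- last change on or before the period start for this product;
              -- fall back to the period start if there is no earlier change
              SELECT COALESCE(MAX(q.changed_date), ?)
              FROM price_history q
              WHERE q.product_id = p.product_id
                AND q.changed_date <= ?
          )
        ORDER BY p.product_id, p.changed_date
    """
    params = [*product_ids, period_start, period_start]
    return conn.execute(sql, params).fetchall()


for row in prices_in_period(make_demo_db(), [1, 2], "2018-01-01"):
    print(row)
```

For the sample data this prints the seven rows listed as the desired result in the question.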
I want to show cases of clients with at least 2 purchases on the same day. But I only want to count those purchases that were made in different stores.
So far I have:
Select Purchase.PurClientId, Purchase.PurDate, Purchase.PurId
from Purchase
join
(
Select count(Purchase.PurId),
Purchase.PurClientId,
to_date(Purchase.PurDate)
from Purchase
group by Purchase.PurClientId,
to_date(Purchase.PurDate)
having count (Purchase.PurId) >=2
) k
on k.PurClientId=Purchase.PurClientId
But I have no clue how to make it count purchases only if they were made in different stores. The column that identifies the shop is Purchase.PurShopId.
Thanks for help!
You can use:
SELECT PurId,
PurDate,
PurClientId,
PurShopId
FROM (
SELECT p.*,
COUNT(DISTINCT PurShopId) OVER (
PARTITION BY PurClientId, TRUNC(PurDate)
) AS num_stores
FROM Purchase p
)
WHERE num_stores >= 2;
Or
SELECT *
FROM Purchase p
WHERE EXISTS(
SELECT 1
FROM Purchase x
WHERE p.purclientid = x.purclientid
AND p.purshopid != x.purshopid
AND TRUNC(p.purdate) = TRUNC(x.purdate)
);
Which, for the sample data:
CREATE TABLE purchase (
purid PRIMARY KEY,
purdate,
purclientid,
PurShopId
) AS
SELECT 1, DATE '2021-01-01', 1, 1 FROM DUAL UNION ALL
SELECT 2, DATE '2021-01-02', 1, 1 FROM DUAL UNION ALL
SELECT 3, DATE '2021-01-02', 1, 2 FROM DUAL UNION ALL
SELECT 4, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 5, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 6, DATE '2021-01-04', 1, 2 FROM DUAL;
Both output:
PURID | PURDATE             | PURCLIENTID | PURSHOPID
------+---------------------+-------------+----------
2     | 2021-01-02 00:00:00 | 1           | 1
3     | 2021-01-02 00:00:00 | 1           | 2
db<>fiddle here
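For completeness, the GROUP BY / HAVING approach from the question can also be made to work by counting distinct shops and joining back on both client and day. Below is a small sketch of that variant, run through Python's sqlite3 module so it can be tried without Oracle. The function name multi_store_purchases is made up here, and SQLite's date() stands in for TRUNC, which is an assumption of this sketch rather than Oracle syntax.

```python
import sqlite3

# Sample data from the answer: (purid, purdate, purclientid, purshopid).
PURCHASES = [
    (1, "2021-01-01", 1, 1),
    (2, "2021-01-02", 1, 1),
    (3, "2021-01-02", 1, 2),
    (4, "2021-01-03", 1, 1),
    (5, "2021-01-03", 1, 1),
    (6, "2021-01-04", 1, 2),
]


def multi_store_purchases(rows):
    """Return purchases made on days when the client bought in >= 2 shops."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchase (purid INTEGER PRIMARY KEY, purdate TEXT, "
        "purclientid INTEGER, purshopid INTEGER)"
    )
    conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT p.purid, p.purdate, p.purclientid, p.purshopid
        FROM purchase p
        JOIN (
            -- one row per client and day with at least two different shops
            SELECT purclientid, date(purdate) AS purday
            FROM purchase
            GROUP BY purclientid, date(purdate)
            HAVING COUNT(DISTINCT purshopid) >= 2
        ) k
          ON k.purclientid = p.purclientid
         AND k.purday = date(p.purdate)
        ORDER BY p.purid
    """
    return conn.execute(sql).fetchall()


for row in multi_store_purchases(PURCHASES):
    print(row)
```

The two changes compared with the query in the question are COUNT(DISTINCT purshopid) in the HAVING clause and the extra join condition on the day, without which every purchase of a matching client would be returned.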
I have this table that contains sales by stores & date.
-------------------------------------------
P_DATE - P_STORE - P_SALES
-------------------------------------------
2019-02-05 - S1 - 5000
2019-02-05 - S2 - 9850
2018-06-17 - S1 - 6980
2018-05-17 - S2 - 6590
..
..
..
-------------------------------------------
I want to compare the sum of sales per store for the last 10 weeks of this year with the same weeks of the previous year.
I want a result like this :
---------------------------------------------------
Week - Store - Sales-2019 - Sales2018
---------------------------------------------------
20 - S1 - 2580 - 2430
20 - S2 - 2580 - 2430
.
.
10 - S1 - 5905 - 5214
10 - S2 - 4789 - 6530
---------------------------------------------------
I've tried this:
Select
[Week] = DATEPART(WEEK, P_Date),
[Store] = P_Store,
[Sales 2019] = Sum(Case when Year(P_Date) = 2019 Then P_Sales Else 0 End),
[Sales 2018] = Sum(Case when Year(P_Date) = 2018 Then P_Sales Else 0 End)
From
PIECE
Group by
DATEPART(WEEK, P_Date),
P_Store
I need your help please.
This script considers 10 weeks, including the current week:
WITH wk_list (COMMON,DayMinus)
AS
(
SELECT 1,0 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,2 UNION ALL
SELECT 1,3 UNION ALL
SELECT 1,4 UNION ALL
SELECT 1,5 UNION ALL
SELECT 1,6 UNION ALL
SELECT 1,7 UNION ALL
SELECT 1,8 UNION ALL
SELECT 1,9
)
SELECT
DATEPART(ISO_WEEK, P_DATE) WK,
P_STORE,
SUM(CASE WHEN YEAR(P_DATE) = 2019 THEN P_SALES ELSE 0 END) SALES_2019,
SUM(CASE WHEN YEAR(P_DATE) = 2018 THEN P_SALES ELSE 0 END) SALES_2018
FROM your_table
WHERE YEAR(P_DATE) IN (2019,2018)
AND DATEPART(ISO_WEEK, P_DATE) IN
(
SELECT A.WKNUM-wk_list.DayMinus AS [WEEK NUMBER]
FROM wk_list
INNER JOIN (
SELECT 1 AS COMMON,DATENAME(ISO_WEEK,GETDATE()) WKNUM
) A ON wk_list.COMMON = A.COMMON
)
GROUP BY DATEPART(ISO_WEEK, P_DATE),P_STORE
But if you want to exclude the current week, just replace the following part of the script above:
WITH wk_list (COMMON,DayMinus)
AS
(
SELECT 1,1 UNION ALL
SELECT 1,2 UNION ALL
SELECT 1,3 UNION ALL
SELECT 1,4 UNION ALL
SELECT 1,5 UNION ALL
SELECT 1,6 UNION ALL
SELECT 1,7 UNION ALL
SELECT 1,8 UNION ALL
SELECT 1,9 UNION ALL
SELECT 1,10
)
Is this what you're looking for?
DECLARE @t TABLE (TransactionID INT, Week INT, Year INT, Amount MONEY)
INSERT INTO @t
(TransactionID, Week, Year, Amount)
VALUES
(1, 20, 2018, 50),
(2, 20, 2019, 20),
(3, 19, 2018, 35),
(4, 19, 2019, 40),
(5, 20, 2018, 70),
(6, 20, 2019, 80)
SELECT TOP 10 Week, [2018], [2019] FROM (SELECT Week, Year, SUM(Amount) As Amount FROM @t GROUP BY Week, Year) t
PIVOT
(
SUM(Amount)
FOR Year IN ([2018], [2019])
) sq
ORDER BY Week DESC
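The PIVOT above can also be written as plain conditional aggregation, which is the same idea the first answer uses with SUM(CASE ...). Here is a minimal sketch of that on the same six sample transactions, run through Python's sqlite3 module since SQLite has no PIVOT. The function name sales_by_week, the table name t and the lower-case column names are assumptions of this example.

```python
import sqlite3

# Sample transactions from the answer: (transaction_id, week, year, amount).
TRANSACTIONS = [
    (1, 20, 2018, 50),
    (2, 20, 2019, 20),
    (3, 19, 2018, 35),
    (4, 19, 2019, 40),
    (5, 20, 2018, 70),
    (6, 20, 2019, 80),
]


def sales_by_week(rows, limit=10):
    """Return (week, sales_2018, sales_2019) for the latest `limit` weeks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (transaction_id INTEGER, week INTEGER, "
        "year INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT week,
               -- each SUM only picks up the rows of its own year
               SUM(CASE WHEN year = 2018 THEN amount ELSE 0 END) AS sales_2018,
               SUM(CASE WHEN year = 2019 THEN amount ELSE 0 END) AS sales_2019
        FROM t
        GROUP BY week
        ORDER BY week DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


for row in sales_by_week(TRANSACTIONS):
    print(row)
```

Adding the store to both the SELECT list and the GROUP BY would give one row per week and store, as asked for in the question.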
I have a SQL table that contains employeeid, StartDateTime and EndDatetime as follows:
CREATE TABLE Sample
(
SNO INT,
EmployeeID NVARCHAR(10),
StartDateTime DATE,
EndDateTime DATE
)
INSERT INTO Sample
VALUES
( 1, 'xyz', '2018-01-01', '2018-01-02' ),
( 2, 'xyz', '2018-01-03', '2018-01-05' ),
( 3, 'xyz', '2018-01-06', '2018-02-01' ),
( 4, 'xyz', '2018-02-15', '2018-03-15' ),
( 5, 'xyz', '2018-03-16', '2018-03-19' ),
( 6, 'abc', '2018-01-16', '2018-02-25' ),
( 7, 'abc', '2018-03-08', '2018-03-19' ),
( 8, 'abc', '2018-02-26', '2018-03-01' )
I want the result to be displayed as
EmployeeID | StartDateTime | EndDateTime
------------+-----------------+---------------
xyz | 2018-01-01 | 2018-02-01
xyz | 2018-02-15 | 2018-03-19
abc | 2018-01-16 | 2018-03-01
abc | 2018-03-08 | 2018-03-19
Basically, I want to recursively look at the records of each employee, determine the continuity of the start and end dates, and build a set of continuous date ranges.
I wrote my query as follows:
SELECT *
FROM Sample T1
LEFT JOIN Sample T2 ON T2.EmployeeID = T1.EmployeeID
WHERE T1.EndDateTime = DATEADD(DAY, -1, T2.StartDateTime)
to see if I could spot a pattern in the output. I later realized that with this approach I would need to join the same table multiple times to get the output I want.
Also, there can be records for multiple employees, so I need direction on an efficient way of getting the desired output.
Any help is greatly appreciated.
This will do it for you. Use a recursive CTE to get all the adjacent rows, then get the highest end date for each start date, then the first start date for each end date.
;with cte as (
select EmployeeID, StartDateTime, EndDateTime
from sample s
union all
select CTE.EmployeeID, CTE.StartDateTime, s.EndDateTime
from sample s
join cte on cte.EmployeeID=s.EmployeeID and s.StartDateTime=dateadd(d,1,CTE.EndDateTime)
)
select EmployeeID, Min(StartDateTime) as StartDateTime, EndDateTime from (
select EmployeeID, StartDateTime, Max(EndDateTime) as EndDateTime from cte
group by EmployeeID, StartDateTime
) q group by EmployeeID, EndDateTime
You can use this.
WITH T AS (
SELECT S1.SNO,
S1.EmployeeID,
S1.StartDateTime,
ISNULL(S2.EndDateTime, S1.EndDateTime) EndDateTime,
ROW_NUMBER() OVER(PARTITION BY S1.EmployeeId ORDER BY S1.StartDateTime)
- ROW_NUMBER() OVER(PARTITION BY S1.EmployeeId, CASE WHEN S2.StartDateTime IS NULL THEN 0 ELSE 1 END ORDER BY S1.StartDateTime ) RN,
ROW_NUMBER() OVER(PARTITION BY S1.EmployeeId, ISNULL(S2.EndDateTime, S1.EndDateTime) ORDER BY S1.EmployeeId, S1.StartDateTime) RN_END
FROM Sample S1
LEFT JOIN Sample S2 ON S2.EmployeeID = S1.EmployeeID AND DATEADD(DAY,1,S1.EndDateTime) = S2.StartDateTime
)
SELECT EmployeeID, MIN(StartDateTime) StartDateTime,MAX(EndDateTime) EndDateTime FROM T
WHERE RN_END = 1
GROUP BY EmployeeID, RN
ORDER BY EmployeeID DESC, StartDateTime
Result:
EmployeeID StartDateTime EndDateTime
---------- ------------- -----------
xyz 2018-01-01 2018-02-01
xyz 2018-02-15 2018-03-19
abc 2018-01-16 2018-03-01
abc 2018-03-08 2018-03-19
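To cross-check the result of either query, the merging rule itself is easy to write down procedurally. The sketch below is plain Python, not T-SQL: it sorts the sample rows per employee and extends the current range whenever the next start date is at most one day after the current end date. The function name merge_contiguous is made up, and treating overlapping ranges the same as adjacent ones is an assumption that goes slightly beyond the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (EmployeeID, StartDateTime, EndDateTime).
SAMPLE = [
    ("xyz", date(2018, 1, 1), date(2018, 1, 2)),
    ("xyz", date(2018, 1, 3), date(2018, 1, 5)),
    ("xyz", date(2018, 1, 6), date(2018, 2, 1)),
    ("xyz", date(2018, 2, 15), date(2018, 3, 15)),
    ("xyz", date(2018, 3, 16), date(2018, 3, 19)),
    ("abc", date(2018, 1, 16), date(2018, 2, 25)),
    ("abc", date(2018, 3, 8), date(2018, 3, 19)),
    ("abc", date(2018, 2, 26), date(2018, 3, 1)),
]


def merge_contiguous(rows):
    """Merge ranges of the same employee that touch or overlap."""
    merged = []
    one_day = timedelta(days=1)
    # Sorting by (employee, start, end) puts candidate neighbours side by side.
    for emp, start, end in sorted(rows):
        if merged and merged[-1][0] == emp and start <= merged[-1][2] + one_day:
            prev_emp, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_emp, prev_start, max(prev_end, end))
        else:
            merged.append((emp, start, end))
    return merged


for emp, start, end in merge_contiguous(SAMPLE):
    print(emp, start, end)
```

The output is ordered by employee ID, so abc comes before xyz, but the four ranges are the same as in the expected result.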
I have multiple tables in BigQuery, and for each of them a SQL statement that gives me "the number of X per day". For example:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as installs
FROM database.table1
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | installs |
-------------------------
| 2017-01-01 | 11 |
| 2017-01-02 | 22 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as uninstalls
FROM database.table2
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | uninstalls |
---------------------------
| 2017-01-02 | 22 |
| 2017-01-03 | 33 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as cases
FROM database.table3
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | cases |
----------------------
| 2017-01-01 | 11 |
| 2017-01-03 | 33 |
etc
etc
Now I need to combine all these into a single SELECT statement that gives the following results:
| day | installs | uninstalls | cases |
----------------------------------------------
| 2017-01-01 | 11 | 0 | 11 |
| 2017-01-02 | 22 | 22 | 0 |
| 2017-01-03 | 0 | 33 | 33 |
etc
Is this even possible?
Or what's the closest SQL-statement I can write that would give me a similar result?
Any feedback is appreciated!
Here is a self-contained example that might help to get you started. It uses two dummy tables, InstallEvents and UninstallEvents, which contain timestamps for the respective actions. It creates a common table expression called StartAndEnd that computes the minimum and maximum dates for these events in order to decide which dates to aggregate over, then unions the contents of the InstallEvents and UninstallEvents, counting the events for each day.
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
),
StartAndEnd AS (
SELECT MIN(DATE(timestamp)) AS min_date, MAX(DATE(timestamp)) AS max_date
FROM (
SELECT * FROM InstallEvents UNION ALL
SELECT * FROM UninstallEvents
)
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
(SELECT min_date FROM StartAndEnd),
(SELECT max_date FROM StartAndEnd)
)) AS day
GROUP BY day
ORDER BY day;
If you know what the start and end dates are in advance, you can hard-code them in the query instead and then omit the StartAndEnd CTE:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-04')) AS day
GROUP BY day
ORDER BY day;
To see the events in the sample data, use a query that unions the contents:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT timestamp, true AS is_install
FROM InstallEvents UNION ALL
SELECT timestamp, false
FROM UninstallEvents;
Below is for BigQuery Standard SQL
#standardSQL
WITH calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
-- ORDER BY day
Please note: you are using timestamp as a column name, which is not best practice because it is a keyword. In my example I kept your naming, but consider changing it!
You can test and play with this solution using the dummy data below
#standardSQL
WITH `database.table1` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs
UNION ALL SELECT TIMESTAMP '2017-01-01', 22
),
`database.table2` AS (
SELECT TIMESTAMP '2016-12-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
`database.table3` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-10', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
ORDER BY day
I am not very familiar with bigquery, so this is probably not going to be a copy-paste answer.
You'll first have to build a calendar table to make sure you have all dates. Here's an example for sql server. There are probably examples for bigquery available as well. The following assumes a Calendar table with a Date attribute of type timestamp.
Once you have your calendar table you can join all your tables to that:
SELECT FORMAT_TIMESTAMP("%F",C.Date) AS day
, COUNT(T1.TIMESTAMP) AS installs --Counts matched rows only, so days without installs give 0
, COUNT(T2.TIMESTAMP) AS uninstalls
--Caution: joining both tables at once multiplies rows on days that have several installs and uninstalls; aggregate each table per day first if that can happen
FROM Calendar C
LEFT JOIN database.table1 T1
ON DATE(T1.TIMESTAMP) = DATE(C.Date) --Convert to date to remove times, you could also use your FORMAT_TIMESTAMP
LEFT JOIN database.table2 T2
ON DATE(T2.TIMESTAMP) = DATE(C.Date)
GROUP BY day
ORDER BY day ASC
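If days without any event at all do not need to appear, a calendar table is not strictly required: the three tables can be stacked with UNION ALL, tagging each row with a 1 in its own column, and then summed per day. Below is a rough sketch of that alternative, using Python's sqlite3 module instead of BigQuery so it can be run locally. The column name ts, the helper make_demo_db and the generated rows are assumptions of the example; the per-day counts mirror the tables in the question.

```python
import sqlite3


def make_demo_db():
    """Build three event tables whose daily counts match the question."""
    counts = {
        "table1": {"2017-01-01": 11, "2017-01-02": 22},  # installs
        "table2": {"2017-01-02": 22, "2017-01-03": 33},  # uninstalls
        "table3": {"2017-01-01": 11, "2017-01-03": 33},  # cases
    }
    conn = sqlite3.connect(":memory:")
    for table, per_day in counts.items():
        conn.execute(f"CREATE TABLE {table} (ts TEXT)")
        for day, n in per_day.items():
            # n identical noon timestamps are enough for counting purposes
            conn.executemany(
                f"INSERT INTO {table} VALUES (?)", [(f"{day} 12:00:00",)] * n
            )
    return conn


DAILY_COUNTS_SQL = """
    SELECT day, SUM(installs), SUM(uninstalls), SUM(cases)
    FROM (
        SELECT date(ts) AS day, 1 AS installs, 0 AS uninstalls, 0 AS cases
        FROM table1
        UNION ALL
        SELECT date(ts), 0, 1, 0 FROM table2
        UNION ALL
        SELECT date(ts), 0, 0, 1 FROM table3
    )
    GROUP BY day
    ORDER BY day
"""

for row in make_demo_db().execute(DAILY_COUNTS_SQL):
    print(row)
```

Because every source row contributes to exactly one output row, this shape also avoids the row multiplication that can happen when several one-to-many tables are joined to the same calendar day.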
I have a table with four columns : id,validFrom,validTo and price.
This table contains the price of an article and the duration when that price is effective.
| id| validFrom | validTo | price
|---|-----------|-----------|---------
| 1 | 01-01-17 | 10-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
Now, for these rows in my table, the query output should be:
| id| validFrom | validTo | price
|---|-----------|----------|-------
| 1 | 01-01-17 | 03-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
| 1 | 10-01-17 | 10-01-17 | 30000
I can compare the dates and check if products with same id have overlapping dates but I have no idea how to split those dates into non-overlapping dates. Also I am not allowed to use PL/SQL.
Is this possible using only SQL?
Oracle Setup:
CREATE TABLE prices ( id, validFrom, validTo, price ) AS
SELECT 1, DATE '2017-01-01', DATE '2017-01-10', 30000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-04', DATE '2017-01-09', 20000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11', DATE '2017-01-15', 10000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-16', DATE '2017-01-18', 15000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-17', DATE '2017-01-20', 40000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-21', DATE '2017-01-24', 28000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-23', DATE '2017-01-26', 23000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-26', DATE '2017-01-26', 17000 FROM DUAL;
Query:
WITH daily_prices ( id, dt, price, duration ) AS (
-- Unroll the price ranges to individual days
SELECT id,
d.COLUMN_VALUE,
price,
validTo - validFrom
FROM prices p,
TABLE(
CAST(
MULTISET(
SELECT p.validFrom + LEVEL - 1
FROM DUAL
CONNECT BY p.validFrom + LEVEL - 1 <= p.validTo
)
AS SYS.ODCIDATELIST
)
) d
),
min_daily_prices ( id, dt, price ) AS (
-- Where a day falls between multiple ranges group them so the price
-- is for the shortest duration offer and if there are two equally short
-- durations then take the minimum price
SELECT id,
dt,
MIN( price ) KEEP ( DENSE_RANK FIRST ORDER BY duration )
FROM daily_prices
GROUP BY id, dt
),
group_changes ( id, dt, price, has_changed_group ) AS (
-- Find when the price changes or a day is skipped which means a new price
-- group is beginning
SELECT id,
dt,
price,
CASE WHEN dt = LAG( dt ) OVER ( PARTITION BY id ORDER BY dt ) + 1
AND price = LAG( price ) OVER ( PARTITION BY id ORDER BY dt )
THEN 0
ELSE 1
END
FROM min_daily_prices
),
groups ( id, dt, price, grp ) AS (
-- Calculate unique indexes (per id) for each group of price ranges
SELECT id,
dt,
price,
SUM( has_changed_group ) OVER ( PARTITION BY id ORDER BY dt )
FROM group_changes
)
SELECT id,
MIN( dt ) AS validFrom,
MAX( dt ) AS validTo,
MIN( price ) AS price
FROM groups
GROUP BY id, grp
ORDER BY id, validFrom;
Output:
ID VALIDFROM VALIDTO PRICE
---------- -------------------- -------------------- ----------
1 01-JAN-2017 00:00:00 03-JAN-2017 00:00:00 30000
1 04-JAN-2017 00:00:00 09-JAN-2017 00:00:00 20000
1 10-JAN-2017 00:00:00 10-JAN-2017 00:00:00 30000
1 11-JAN-2017 00:00:00 15-JAN-2017 00:00:00 10000
1 16-JAN-2017 00:00:00 18-JAN-2017 00:00:00 15000
1 19-JAN-2017 00:00:00 20-JAN-2017 00:00:00 40000
1 21-JAN-2017 00:00:00 22-JAN-2017 00:00:00 28000
1 23-JAN-2017 00:00:00 25-JAN-2017 00:00:00 23000
1 26-JAN-2017 00:00:00 26-JAN-2017 00:00:00 17000
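The steps of the query (unroll to days, pick one price per day, regroup consecutive days) can be mirrored in a few lines of procedural code, which may help when checking the SQL against new data. This is only a sketch in Python for a single article id, using the two rows from the question. The function name split_ranges is invented, and the tie-break (shortest range wins, then lowest price) follows the comments in the query above.

```python
from datetime import date, timedelta

# The two rows from the question: (validFrom, validTo, price) for id 1.
RANGES = [
    (date(2017, 1, 1), date(2017, 1, 10), 30000),
    (date(2017, 1, 4), date(2017, 1, 9), 20000),
]


def split_ranges(ranges):
    """Split overlapping price ranges into non-overlapping ones."""
    one_day = timedelta(days=1)
    # Step 1 and 2: unroll every range to single days and keep, per day,
    # the offer with the shortest duration (ties: the lowest price).
    best = {}
    for start, end, price in ranges:
        candidate = ((end - start).days, price)
        day = start
        while day <= end:
            if day not in best or candidate < best[day]:
                best[day] = candidate
            day += one_day
    # Step 3: glue consecutive days with the same price back together.
    result = []
    for day in sorted(best):
        price = best[day][1]
        if result and result[-1][2] == price and result[-1][1] + one_day == day:
            result[-1] = (result[-1][0], day, price)
        else:
            result.append((day, day, price))
    return result


for valid_from, valid_to, price in split_ranges(RANGES):
    print(valid_from, valid_to, price)
```

For several article ids the same function could be applied per id, for example after grouping the input rows by id.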