How to build a date change query using Presto SQL

I need to build a tracker that lists each item name together with the dates on which its price changed.
The date should show when the price changed. I used lead, but it returns a row for every day rather than only for the days on which the price changed.
For instance:
item name = A
price date = 2022-11-21, price = $4
price date = 2022-11-25, price = $3
price date = 2022-11-30, price = $4
The expectation for the result is:
start date, next date, price
2022-11-21 2022-11-24 $4
2022-11-25 2022-11-29 $3
2022-11-30 2023-02-14 (current date) $4
Any help would be appreciated.
#Updated:
The dataset contains daily prices,
for instance :
item name = A
price date = 2022-11-21, price = $4
price date = 2022-11-22, price = $4
price date = 2022-11-23, price = $4
price date = 2022-11-24, price = $4
price date = 2022-11-25, price = $3
price date = 2022-11-26, price = $3
price date = 2022-11-27, price = $3
price date = 2022-11-29, price = $3
price date = 2022-12-01, price = $4
Query :
select
item_name,
supplier_name,
price_date,
price,
lead(price_date) over (partition by item_name order by price) as next_price from
price
Result:
price = $4, 2022-11-21, 2022-11-22
price = $4, 2022-11-22, 2022-11-23
price = $4, 2022-11-23, 2022-11-24
price = $3, 2022-11-25, 2022-11-26
price = $3, 2022-11-26, 2022-11-27
price = $3, 2022-11-27, 2022-11-29
price = $3, 2022-11-29, 2022-11-30
price = $4, 2022-12-01, 2023-02-14
While my expectation is :
price = $4, 2022-11-21, 2022-11-25
price = $3, 2022-11-26, 2022-11-30
price = $4, 2022-12-01, 2023-02-14

You can try the gaps-and-islands approach: introduce a column that flags each change in price, use a cumulative sum over that flag to number the groups, and then use group by to calculate the result per group:
-- sample data
with dataset(date, price) as (
values (date '2022-11-21', 4),
(date '2022-11-22', 4),
(date '2022-11-23', 4),
(date '2022-11-24', 4),
(date '2022-11-25', 3),
(date '2022-11-26', 3),
(date '2022-11-27', 3),
(date '2022-11-29', 3),
(date '2022-12-01', 4)
)
-- query
select arbitrary(price) price,
min(date) as start_dt,
max(date) as end_dt
from (
select date, price, sum (change) over (order by date) as grp
from (
select *, if(lag(price) over (order by date) != price, 1, 0) as change
from dataset)
)
group by grp;
Output:
price  start_dt    end_dt
4      2022-11-21  2022-11-24
3      2022-11-25  2022-11-29
4      2022-12-01  2022-12-01
A few notes:
Since your actual data is partitioned, do not forget to add item_name to the window partitioning and to the final group by.
If you really need the current date as the end_dt for the final row in each partition, there are several ways to achieve that: the not very fun one of inserting the missing dates (see this for inspiration), or you can just wrap the result in another subquery that checks whether lead(end_dt) over (partition ... order by end_dt) is null and uses the current date for end_dt in that case.
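Both notes can be combined into one query. The following is only a minimal sketch under several assumptions: it uses SQLite (through Python's sqlite3 module) as a stand-in for Presto, adds a made-up second item B to show the partitioning, and passes a fixed :today parameter where a real query would use current_date. SQLite has no if(), so case is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table price (item_name text, price_date text, price integer)")
conn.executemany(
    "insert into price values (?, ?, ?)",
    [
        ("A", "2022-11-21", 4), ("A", "2022-11-22", 4), ("A", "2022-11-23", 4),
        ("A", "2022-11-24", 4), ("A", "2022-11-25", 3), ("A", "2022-11-26", 3),
        ("A", "2022-11-27", 3), ("A", "2022-11-29", 3), ("A", "2022-12-01", 4),
        # item B is hypothetical, added only to exercise the partitioning
        ("B", "2022-11-21", 7), ("B", "2022-11-22", 8),
    ],
)

query = """
with flagged as (
    -- 1 on every row whose price differs from the previous row of the same item
    select item_name, price_date, price,
           case when lag(price) over (partition by item_name order by price_date) <> price
                then 1 else 0 end as is_change
    from price
), grouped as (
    -- the running sum of the flags numbers the islands within each item
    select item_name, price_date, price,
           sum(is_change) over (partition by item_name order by price_date) as grp
    from flagged
), islands as (
    select item_name, grp, min(price) as price,
           min(price_date) as start_dt, max(price_date) as end_dt
    from grouped
    group by item_name, grp
)
-- the last island of each item has no following island, so it ends "today"
select item_name, price, start_dt,
       case when lead(start_dt) over (partition by item_name order by start_dt) is null
            then :today else end_dt end as end_dt
from islands
order by item_name, start_dt
"""

result = conn.execute(query, {"today": "2023-02-14"}).fetchall()
for row in result:
    print(row)
```

Here only the final row of each item gets the substituted end date; all earlier rows keep the last date actually observed at that price.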

Related

(SQL) How do you select a max float value along with other datatypes values within a query?

I'm working with the Iowa Liquor Sales dataset which in this case is called "bigquery-public-data.iowa_liquor_sales.sales". Relevant columns and their datatypes are date(DATE), sale_dollars(FLOAT), item_description(STRING), store_name(STRING).
I am trying to write a query that will return the top sale for each year, of the past three years (2021,2020,2019) along with the date, item_description, and store_name.
The below code works, but only covers one year. I know I could copy+paste and change the date every time but that seems tedious. Is there a better way?
SELECT date, sale_dollars, item_description, store_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31'
ORDER BY sale_dollars DESC
LIMIT 1
date        sale_dollars  item_description      store_name
2021-04-19  250932.0      Titos Handmade Vodka  Hy-Vee #3
When trying different ways to write it so the max sale of 2019,2020, and 2021 return along with their date, item_description, and store_name, I ran into errors. The below is the closest I got (missing date, item_description, and store_name).
SELECT
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31') as sale_2021,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2020-01-01' and '2020-12-31') as sale_2020,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2019-12-31') as sale_2019
How can I write a query that returns the max sale of each of the past three years along with its date, item, and store name?
Consider the query below:
SELECT EXTRACT(YEAR FROM date) year,
ARRAY_AGG(
STRUCT(date, sale_dollars, item_description, store_name)
ORDER BY sale_dollars DESC LIMIT 1
)[OFFSET(0)].*
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date BETWEEN '2019-01-01' AND '2021-12-31'
GROUP BY 1;
Query results
+------+------------+--------------+----------------------+-------------------------------+
| year | date | sale_dollars | item_description | store_name |
+------+------------+--------------+----------------------+-------------------------------+
| 2020 | 2020-10-08 | 250932.0 | Titos Handmade Vodka | Hy-Vee #3 / BDI / Des Moines |
| 2019 | 2019-10-08 | 78435.0 | Makers Mark | Hy-Vee Food Store / Urbandale |
| 2021 | 2021-07-05 | 250932.0 | Titos Handmade Vodka | Hy-Vee #3 / BDI / Des Moines |
+------+------------+--------------+----------------------+-------------------------------+
Or you can get the same result with a window function:
SELECT date, sale_dollars, item_description, store_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date BETWEEN '2019-01-01' AND '2021-12-31'
QUALIFY ROW_NUMBER() OVER (
PARTITION BY EXTRACT(YEAR FROM date) ORDER BY sale_dollars DESC
) = 1;
Since each of the three subqueries returns a single value, you can add them to the first query, adapted to cover all three years:
SELECT
date, sale_dollars, item_description, store_name,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31') as sale_2021,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2020-01-01' and '2020-12-31') as sale_2020,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2019-12-31') as sale_2019
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2021-12-31'
ORDER BY sale_dollars DESC
LIMIT 1

Cumulative average and count over occurrences increasing in time

I am looking to calculate an average (over number of occurrences) and observation count over increasing dates per instance (take customer as an example instance) in Oracle SQL.
So the count will increase as date goes up, the average could go up or down.
I can do it for an individual case and a fixed time interval, but I would like to see a series for every customer, with every row a separate date where a sale occurred. Right now, I have a single row per customer. Here is the SQL summarizing the average and count for a fixed time interval:
SELECT AVG(bought_usd) as avg_bought
, COUNT(*) as num_of_interactions
, cust_id
FROM salesTable
WHERE obsdate >= DATE('2000-01-01')
AND obsdate <= DATE('2022-01-01')
GROUP BY cust_id
So for an input of:
the output should look like:
Use analytic functions:
SELECT "DATE",
cust,
AVG(bought_usd) OVER (PARTITION BY cust ORDER BY "DATE") AS avg,
COUNT(*) OVER (PARTITION BY cust ORDER BY "DATE") AS cnt
FROM salestable
ORDER BY cust, "DATE"
Note: DATE is a reserved word. You should not use it as an identifier.
Which, for the sample data:
CREATE TABLE salestable ("DATE", cust, bought_usd) AS
SELECT DATE '2010-10-01', 'Cust A', 100 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust A', 50 FROM DUAL UNION ALL
SELECT DATE '2010-12-18', 'Cust B', 120 FROM DUAL UNION ALL
SELECT DATE '2011-10-01', 'Cust B', 180 FROM DUAL;
Outputs:
DATE                 CUST    AVG  CNT
2010-10-01 00:00:00  Cust A  100  1
2010-12-18 00:00:00  Cust A  75   2
2010-12-18 00:00:00  Cust B  120  1
2011-10-01 00:00:00  Cust B  150  2

Table with daily historical stock prices. How to pull stocks where the price reached a certain number for the first time

I have a table with historical stocks prices for hundreds of stocks. I need to extract only those stocks that reached $10 or greater for the first time.
Stock  Price  Date
AAA    9      2021-10-01
AAA    10     2021-10-02
AAA    8      2021-10-03
AAA    10     2021-10-04
BBB    9      2021-10-01
BBB    11     2021-10-02
BBB    12     2021-10-03
Is there a way to count how many times each stock hit >= 10, so that I can pull only those where count = 1 (in this case that would be stock BBB, since it had never reached 10 before)?
Since I couldn't figure out how to create the count, I tried the manipulations below with min/max dates, but this looks like a rather awkward approach. Any idea for a simpler solution?
with query1 as (
select Stock, min(date) as min_greater10_dt
from t
where Price >= 10
group by Stock
), query2 as (
select Stock, max(date) as max_greater10_dt
from t
where Price >= 10
group by Stock
)
select Stock
from t a
join query1 b on b.Stock = a.Stock
join query2 c on c.Stock = a.Stock
where not(a.Price < 10 and a.Date between b.min_greater10_dt and c.max_greater10_dt)
This is a type of gaps-and-islands problem, which can be solved as follows:
detect the change from < 10 to >= 10 using a lagged price
count the number of such changes
keep only the stocks where this has happened exactly once
take the first row, since you only want the stock (you could use group by here, but a row number lets you select the entire row should you wish to).
-- sample data in a table variable
declare @Table table (Stock varchar(3), Price money, [Date] date);
insert into @Table (Stock, Price, [Date])
values
('AAA', 9, '2021-10-01'),
('AAA', 10, '2021-10-02'),
('AAA', 8, '2021-10-03'),
('AAA', 10, '2021-10-04'),
('BBB', 9, '2021-10-01'),
('BBB', 11, '2021-10-02'),
('BBB', 12, '2021-10-03');
with cte1 as (
select Stock, Price, [Date]
-- rn numbers the rows of each stock separately for prices >= 10 and prices < 10
, row_number() over (partition by Stock, case when Price >= 10 then 1 else 0 end order by [Date] asc) rn
-- previous price of the same stock; defaults to 0 on the first row
, lag(Price,1,0) over (partition by Stock order by [Date] asc) LaggedStock
from @Table
), cte2 as (
select Stock, Price, [Date], rn, LaggedStock
, sum(case when Price >= 10 and LaggedStock < 10 then 1 else 0 end) over (partition by Stock) StockOver10
from cte1
)
select Stock
--, Price, [Date], rn, LaggedStock, StockOver10 -- debug
from cte2
where Price >= 10
and StockOver10 = 1 and rn = 1;
Returns:
Stock
BBB
Note: providing DDL+DML as shown above makes it much easier for people to assist.
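The group by variant mentioned in the steps above can be sketched as follows. This is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, and the names t, price_date, prev_price and first_hit are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (stock text, price real, price_date text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("AAA", 9, "2021-10-01"), ("AAA", 10, "2021-10-02"),
        ("AAA", 8, "2021-10-03"), ("AAA", 10, "2021-10-04"),
        ("BBB", 9, "2021-10-01"), ("BBB", 11, "2021-10-02"),
        ("BBB", 12, "2021-10-03"),
    ],
)

query = """
with lagged as (
    -- previous price per stock; 0 on the first row, as in the answer above
    select stock, price, price_date,
           lag(price, 1, 0) over (partition by stock order by price_date) as prev_price
    from t
)
select stock, min(price_date) as first_hit
from lagged
where price >= 10 and prev_price < 10   -- keep only upward crossings of 10
group by stock
having count(*) = 1                      -- stocks that crossed exactly once
order by stock
"""

result = conn.execute(query).fetchall()
print(result)
```

AAA crosses 10 twice (on 2021-10-02 and 2021-10-04) and is filtered out, while BBB crosses once. Because of the default of 0 in lag, a stock whose very first recorded price is already >= 10 also counts as one crossing, the same as in the row_number version.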

calculate weekly and monthly sales from daily sales

If I have daily sales, how do I show weekly and monthly sales along with the daily ones in a single record in Oracle? I can calculate the weekly sum and the monthly sum in separate tables, but I want the results in a single data set.
Output should look like shown below.
Date Week Month Daily_Sale Weekly_Sale Monthly_Sale
1/1/20 1 1 $5 $5 $5
1/2/20 1 1 $5 $10 $10
1/3/20 1 1 $1 $11 $11
1/4/20 1 1 $2 $13 $13
1/5/20 1 1 $5 $18 $18
1/6/20 1 1 $1 $19 $19
1/7/20 1 1 $1 $20 $20
1/8/20 2 1 $5 $5 $25
1/8/20 2 1 $5 $10 $30
1/10/20 2 1 $1 $11 $31
1/11/20 2 1 $2 $13 $33
1/12/20 2 1 $5 $18 $38
1/13/20 2 1 $1 $19 $39
1/14/20 2 1 $1 $20 $40
Thank you!
Edit: Highlighting the table
You seem to want running totals. Assuming that your table contains sales information for more than a single year, you need to partition on year as well.
-- note: Oracle's extract() has no week field; there, to_char(Date, 'WW') is one way to get the week of the year
select Date, extract(week from Date) Wk, extract(year from Date) Yr,
Daily_Sale,
sum(Daily_Sale) over (
partition by extract(year from Date), extract(week from Date)
order by Date
) as Weekly_Sale,
sum(Daily_Sale) over (
partition by extract(year from Date), extract(month from Date)
order by Date
) as Monthly_Sale
from T
order by Date;
I don't have an Oracle instance available, so I replicated the scenario in SSMS (SQL Server). Here is the query:
SELECT T1.D1, T1.AnnualSales, T1.MonthlySales, T1.WeeklySales, T1.DailySales
From
(SELECT [Date] As D1,
Sum([Sales]) Over (Partition by Year(Date)) as AnnualSales,
Sum([Sales]) Over (Partition by Month(Date)) as MonthlySales,
Sum([Sales]) Over (Partition by Datepart(wk,Date)) as WeeklySales,
Sum([Sales]) Over (Partition by Day(Date)) as DailySales
FROM [dbo].[DailySales_Test]) AS T1
Group by T1.D1, T1.AnnualSales, T1.MonthlySales, T1.WeeklySales, T1.DailySales

Group price with start and end date

I have a table
Recordid Price Start date end date
-----------------------------------------
1 20 2017-10-01 2017-10-02
2 20 2017-10-03 2017-10-04
3 30 2017-10-05 2017-10-05
4 20 2017-10-06 2017-10-07
I want to get every price when it started and when it ended so my result set would be
20. 2017-10-01. 2017-10-04
30. 2017-10-05. 2017-10-05
20. 2017-10-06. 2017-10-07
I'm having trouble figuring it out.
It's an Oracle database.
I figured it out with the code below:
SELECT distinct price
, case when start_dt is null then lag(start_dt) over (order by start_date)
else start_dt end realstart
, case when end_dt is null then lead(end_dt) over (order by end_date)
else end_dt end realend
FROM (SELECT case when nvl(lag(price) over (order by start_date),-1) <> price then start_date end start_dt
, case when nvl(lead(price) over (order by end_date),-1) <>price then end_date end end_dt
, price
, start_date
, end_date
FROM t) main
WHERE start_dt is not null
or end_dt is not null
From your sample data, I think you want a start date and an end date for each run of the same price, in order of the record id.
The following query may contain more subqueries than necessary, for the sake of readability. The innermost select determines where the price changes (called GROUP_CHANGE here). The next level forms the groups with a running sum. This works because only the rows where a new group starts carry a value > 0. The rest is a plain group by.
SELECT GRP,
PRICE,
MIN("Start date") AS "Start date",
MAX("end date") AS "end date"
FROM ( SELECT sub.*,
SUM(GROUP_CHANGE) OVER (ORDER BY RECORDID) AS GRP
FROM ( SELECT t.*,
CASE
WHEN RECORDID = LAG(t.RECORDID) OVER (ORDER BY t.PRICE, t.RECORDID) + 1
THEN 0
ELSE RECORDID
END AS GROUP_CHANGE
FROM t ) sub ) fin
GROUP BY GRP, PRICE
ORDER BY GRP
GRP PRICE Start date end date
---------- ---------- ---------- --------
1 20 01.10.17 04.10.17
4 30 05.10.17 05.10.17
8 20 06.10.17 11.10.17
Tested with the following data (note that I added some records to your sample data, as I wanted a group with three records):
CREATE TABLE t (
Recordid INT,
Price INT,
"Start date" DATE,
"end date" DATE
);
INSERT INTO t VALUES (1, 20, TO_DATE('2017-10-01', 'YYYY-MM-DD'), TO_DATE('2017-10-02', 'YYYY-MM-DD'));
INSERT INTO t VALUES (2, 20, TO_DATE('2017-10-03', 'YYYY-MM-DD'), TO_DATE('2017-10-04', 'YYYY-MM-DD'));
INSERT INTO t VALUES (3, 30, TO_DATE('2017-10-05', 'YYYY-MM-DD'), TO_DATE('2017-10-05', 'YYYY-MM-DD'));
INSERT INTO t VALUES (4, 20, TO_DATE('2017-10-06', 'YYYY-MM-DD'), TO_DATE('2017-10-07', 'YYYY-MM-DD'));
INSERT INTO t VALUES (5, 20, TO_DATE('2017-10-08', 'YYYY-MM-DD'), TO_DATE('2017-10-09', 'YYYY-MM-DD'));
INSERT INTO t VALUES (6, 20, TO_DATE('2017-10-10', 'YYYY-MM-DD'), TO_DATE('2017-10-11', 'YYYY-MM-DD'));
Here is one method that might work in your case:
select price, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_price = price and prev_end_date = start_date - 1
then 0 else 1
end) over (order by t.start_date) as grp
from (select t.*,
lag(t.end_date) over (order by t.start_date) as prev_end_date,
lag(t.price) over (order by t.start_date) as prev_price
from t
) t
) t
group by price, grp
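To try this method without an Oracle instance, here is a minimal sketch that runs the same logic on the four sample rows from the question. It assumes SQLite (through Python's sqlite3 module) as a stand-in, so the date arithmetic start_date - 1 becomes date(start_date, '-1 day'); the aliases range_start and range_end are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (recordid integer, price integer, start_date text, end_date text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 20, "2017-10-01", "2017-10-02"),
        (2, 20, "2017-10-03", "2017-10-04"),
        (3, 30, "2017-10-05", "2017-10-05"),
        (4, 20, "2017-10-06", "2017-10-07"),
    ],
)

query = """
with prev as (
    select price, start_date, end_date,
           lag(end_date) over (order by start_date) as prev_end_date,
           lag(price)    over (order by start_date) as prev_price
    from t
), grouped as (
    -- a row continues the current group only if the price is unchanged
    -- and it starts the day after the previous row ended
    select price, start_date, end_date,
           sum(case when prev_price = price
                     and prev_end_date = date(start_date, '-1 day')
                    then 0 else 1 end) over (order by start_date) as grp
    from prev
)
select price, min(start_date) as range_start, max(end_date) as range_end
from grouped
group by price, grp
order by range_start
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Grouping by both price and grp keeps the two separate runs at price 20 apart, which a plain group by price would merge.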