How to find the first time a price has changed in SQL

I have a table that contains an item ID, the date and the price. Every item has a price row for each day, but I only want to select the items whose price has not changed and show the number of days without a change.
An example of the table is:
id Price Day Month Year
asdf 10 03 11 2022
asdr1 8 03 11 2022
asdf 10 02 11 2022
asdr1 8 02 11 2022
asdf 10 01 11 2022
asdr1 7 01 11 2022
asdf 9 31 10 2022
asdr1 8 31 10 2022
asdf 8 31 10 2022
asdr1 8 31 10 2022
The output I want is:
Date id Last_Price First_Price_Appearance DaysWOchange
2022-11-03 asdf 10 2022-11-01 2
2022-11-03 asdr1 8 2022-11-02 1
The solution needs to run quickly. What are some efficient ways to solve this, given that the table has millions of rows and some items have not changed their price in years?
The efficiency problem arises because, for each id, I would need to scan the entire table looking backwards for the first row in which the price changed, and repeat this for thousands of items.
I have tried comparing the current (last) price against the item's whole history, but this is slow and can take several minutes to calculate over all of history.
The main concern for this problem is efficiency.
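The sketch below (Python, not SQL) makes the point about per-id rescans concrete. If the rows are read once in (id, date) order, each item's current run of unchanged prices can be tracked with a couple of variables, so no item needs its own scan of the table.

The function name `current_runs` and the sample data are hypothetical. The sample is the question's rows, keeping only the first 31 Oct row per item. In a database, the ordering would typically come from an index on (id, date) rather than an in-memory sort.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def current_runs(rows):
    """Summarise each item's current run of unchanged prices.

    rows: iterable of (item_id, day, price) tuples, in any order.
    Returns {item_id: (last_day, last_price, run_start, days_without_change)}.
    """
    result = {}
    # One sort by (item_id, day), then a single forward pass: no per-item rescan.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for item_id, group in groupby(ordered, key=itemgetter(0)):
        prev_price = None
        run_start = last_day = None
        for _, day, price in group:
            if price != prev_price:
                run_start = day  # first row, or the price changed: a new run starts
            prev_price, last_day = price, day
        result[item_id] = (last_day, prev_price, run_start, (last_day - run_start).days)
    return result


# Hypothetical sample: the question's rows with only the first 31 Oct row per item.
sample = [
    ("asdf", date(2022, 10, 31), 9), ("asdf", date(2022, 11, 1), 10),
    ("asdf", date(2022, 11, 2), 10), ("asdf", date(2022, 11, 3), 10),
    ("asdr1", date(2022, 10, 31), 8), ("asdr1", date(2022, 11, 1), 7),
    ("asdr1", date(2022, 11, 2), 8), ("asdr1", date(2022, 11, 3), 8),
]

for item, summary in current_runs(sample).items():
    print(item, *summary)
```

For this sample it reproduces the desired output: asdf has been at 10 since 2022-11-01 (2 days) and asdr1 at 8 since 2022-11-02 (1 day).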

CREATE TABLE #table (id NVARCHAR(5), Price INT, Date DATE);
INSERT INTO #table (id, Price, Date) VALUES
('asdf', 10, '2022-10-20'),
('asdr1', 8, '2022-10-15'),
('asdf', 10, '2022-11-03'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-02'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-01'),
('asdr1', 7, '2022-11-01'),
('asdf', 9, '2022-10-31'),
('asdr1', 8, '2022-10-31'),
('asdf', 8, '2022-10-31'),
('asdr1', 8, '2022-10-31')
Tables of data are useful, but it's even more helpful if you can put the demo data into an object.
SELECT id, FirstDate, LastChange, Price, DaysSinceChange
FROM (
    SELECT id,
           MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate,
           Date AS LastChange,
           Price,
           CASE WHEN LEAD(Date, 1) OVER (PARTITION BY id ORDER BY Date) IS NULL
                THEN DATEDIFF(DAY, Date, CURRENT_TIMESTAMP)
                ELSE DATEDIFF(DAY, LAG(Date) OVER (PARTITION BY id ORDER BY Date), Date)
           END AS DaysSinceChange,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY Date DESC) AS rn
    FROM #table
) a
WHERE rn = 1
This is a quick way to get what you want. If you execute the subquery on its own, you can see the full history.
id FirstDate LastChange Price DaysSinceChange
-------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0
asdr1 2022-10-15 2022-11-02 8 1
SELECT id,
       MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate,
       Date AS LastChange,
       Price,
       CASE WHEN LEAD(Date, 1) OVER (PARTITION BY id ORDER BY Date) IS NULL
            THEN DATEDIFF(DAY, Date, CURRENT_TIMESTAMP)
            ELSE DATEDIFF(DAY, LAG(Date) OVER (PARTITION BY id ORDER BY Date), Date)
       END AS DaysSinceChange,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY Date DESC) AS rn
FROM #table
id FirstDate LastChange Price DaysSinceChange rn
------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0 1
asdf 2022-10-20 2022-11-02 10 1 2
asdf 2022-10-20 2022-11-01 10 1 3
asdf 2022-10-20 2022-10-31 9 11 4
asdf 2022-10-20 2022-10-31 8 0 5
asdf 2022-10-20 2022-10-20 10 NULL 6
asdr1 2022-10-15 2022-11-02 8 1 1
asdr1 2022-10-15 2022-11-02 8 1 2
asdr1 2022-10-15 2022-11-01 7 1 3
asdr1 2022-10-15 2022-10-31 8 16 4
asdr1 2022-10-15 2022-10-31 8 0 5
asdr1 2022-10-15 2022-10-15 8 NULL 6

You can use lag() and a windowed max():
select id, date, price
from (select t.*,
             max(case when price <> lag_price then date end) over (partition by id) as price_change_date
      from (select t.*, lag(price) over (partition by id order by date) as lag_price
            from t
           ) t
     ) t
where price_change_date is null or date >= price_change_date;
This calculates the most recent date on which each id's price changed. It then keeps only the rows from that date onward (or all rows, for ids whose price never changed), which is the current run of unchanged prices.
The use of window functions should be highly efficient, taking advantage of indexes on (id, date) and (id, price, date).
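Below is a minimal sketch of the lag + windowed max() idea. It runs through Python's built-in sqlite3 module, so it can be tried without SQL Server. It extends the idea to the question's full output (last price, first appearance, days without change).

It rests on these assumptions:
- SQLite 3.25 or later, which is needed for window functions.
- A hypothetical table prices(id, price, day) holding ISO date strings.
- julianday() standing in for DATEDIFF.
- The question's sample with the duplicate 31 Oct rows removed.

The first row of each id counts as a change, so items that never changed still get a start date.

```python
import sqlite3

# Window functions (LAG, MAX ... OVER) need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id TEXT, price INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("asdf", 9, "2022-10-31"), ("asdf", 10, "2022-11-01"),
        ("asdf", 10, "2022-11-02"), ("asdf", 10, "2022-11-03"),
        ("asdr1", 8, "2022-10-31"), ("asdr1", 7, "2022-11-01"),
        ("asdr1", 8, "2022-11-02"), ("asdr1", 8, "2022-11-03"),
    ],
)

QUERY = """
WITH flagged AS (
    SELECT id, price, day,
           LAG(price) OVER (PARTITION BY id ORDER BY day) AS prev_price
    FROM prices
),
runs AS (
    -- Latest day on which the price differed from the previous row;
    -- the first row counts as a change, so unchanged items still get a start.
    SELECT id, price, day,
           MAX(CASE WHEN prev_price IS NULL OR price <> prev_price THEN day END)
               OVER (PARTITION BY id) AS run_start
    FROM flagged
)
SELECT id,
       MAX(day) AS last_day,
       price AS last_price,
       run_start AS first_price_appearance,
       CAST(julianday(MAX(day)) - julianday(run_start) AS INTEGER) AS days_wo_change
FROM runs
WHERE day >= run_start          -- keep only the current run of unchanged prices
GROUP BY id, price, run_start
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This returns ('asdf', '2022-11-03', 10, '2022-11-01', 2) and ('asdr1', '2022-11-03', 8, '2022-11-02', 1), the same figures as the desired output in the question.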

Related

Group items from the first time + certain time period

I want to group orders from the same customer if they happen within 10 minutes of the first order, then find the next first order and group them and so on.
Ex:
Customer group orders
6 1 3
6 2 4,5
6 3 8
7 1 9,10
7 2 11,12
7 3 13
id customer time
3 6 2021-05-12 12:14:22.000000
4 6 2021-05-12 12:24:24.000000
5 6 2021-05-12 12:29:16.000000
8 6 2021-05-12 13:01:40.000000
9 7 2021-05-14 12:13:11.000000
10 7 2021-05-14 12:20:01.000000
11 7 2021-05-14 12:45:00.000000
12 7 2021-05-14 12:48:41.000000
13 7 2021-05-14 12:58:16.000000
18 9 2021-05-18 12:22:13.000000
25 15 2021-05-18 13:44:02.000000
26 16 2021-05-17 09:39:02.000000
27 16 2021-05-18 19:38:43.000000
28 17 2021-05-18 15:40:02.000000
29 18 2021-05-19 15:32:53.000000
30 18 2021-05-19 15:45:56.000000
31 18 2021-05-19 16:29:09.000000
34 15 2021-05-24 15:45:14.000000
35 15 2021-05-24 15:45:14.000000
36 19 2021-05-24 17:14:53.000000
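The grouping rule as stated can be sketched procedurally in plain Python (not SQL). A group starts at an order and absorbs later orders from the same customer up to 10 minutes after that starting order; the next order beyond that starts a new group. This mirrors the re-anchoring that the recursive CTE below is attempting.

The function name `group_orders` is hypothetical, and the sample includes only customers 6 and 7. As in the CASE expression below, an order more than 10 minutes (`>`) after the group's first order starts a new group.

```python
from datetime import datetime, timedelta
from itertools import groupby

WINDOW = timedelta(minutes=10)


def group_orders(orders):
    """orders: iterable of (order_id, customer, time_str).

    Returns {customer: [[order_id, ...], ...]}, one inner list per group.
    """
    parsed = sorted(
        (cust, datetime.fromisoformat(ts), oid) for oid, cust, ts in orders
    )
    result = {}
    for cust, rows in groupby(parsed, key=lambda r: r[0]):
        groups, anchor = [], None
        for _, ts, oid in rows:
            if anchor is None or ts - anchor > WINDOW:
                anchor = ts  # this order becomes the first order of a new group
                groups.append([])
            groups[-1].append(oid)
        result[cust] = groups
    return result


orders = [
    (3, 6, "2021-05-12 12:14:22.000000"), (4, 6, "2021-05-12 12:24:24.000000"),
    (5, 6, "2021-05-12 12:29:16.000000"), (8, 6, "2021-05-12 13:01:40.000000"),
    (9, 7, "2021-05-14 12:13:11.000000"), (10, 7, "2021-05-14 12:20:01.000000"),
    (11, 7, "2021-05-14 12:45:00.000000"), (12, 7, "2021-05-14 12:48:41.000000"),
    (13, 7, "2021-05-14 12:58:16.000000"),
]

print(group_orders(orders))
```

For customers 6 and 7 this yields the groups in the expected output: 3 / 4,5 / 8 and 9,10 / 11,12 / 13.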
Here is what I have currently. I think the comparison case when d.StartTime > dateadd(minute, 10, c.first_time) is not being done per customer, so it compares the StartTime of all orders across all customers.
with
data as (
    select Customer, StartTime, Id,
           row_number() over (partition by Customer order by StartTime) rn
    from orders t
),
cte as (
    select d.*, StartTime as first_time
    from data d
    where rn = 1
    union all
    select d.*,
           case when d.StartTime > dateadd(minute, 10, c.first_time)
                then d.StartTime
                else c.first_time
           end
    from cte c
    inner join data d on d.rn = c.rn + 1
)
select c.*, dense_rank() over (partition by Customer order by first_time) grp
from cte c;
I have two databases (MySQL and SQL Server) with similar schemas, so either would work for me.
Try the following on SQL Server:
SELECT customer,
       ROW_NUMBER() OVER (PARTITION BY customer ORDER BY grp) AS group_no,
       STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) AS orders
FROM
(
    SELECT id, customer, [time],
           (DATEDIFF(SECOND, MIN([time]) OVER (PARTITION BY customer), [time]) / 60) / 10 AS grp
    FROM orders
) T
GROUP BY customer, grp
ORDER BY customer
According to your stated requirement, you are trying to divide the period between each customer's first and last orders into groups (time frames), each 10 minutes long.
Here is what the query does for each order. It computes the difference in seconds between the order time and the customer's first order time. It integer-divides that by 60 to get whole minutes, then integer-divides by 10 to get the time-frame number. For example, a difference of 599 s gives 599/60 = 9 minutes and 9/10 = frame 0; a difference of 620 s gives 620/60 = 10 minutes and 10/10 = frame 1.
Once each order has its time frame, you can use the STRING_AGG function to get the desired output. Note that STRING_AGG is available in SQL Server 2017 (14.x) and later.
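The fixed-frame calculation can be checked with Python's sqlite3 module. This is an adaptation, not the SQL Server query above: SQLite has no DATEDIFF or STRING_AGG. The hypothetical query below therefore converts timestamps to Unix seconds with strftime('%s', ...), numbers the frames with DENSE_RANK, and joins the ids in Python. It assumes SQLite 3.25 or later and loads only customers 6 and 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (3, 6, "2021-05-12 12:14:22"), (4, 6, "2021-05-12 12:24:24"),
        (5, 6, "2021-05-12 12:29:16"), (8, 6, "2021-05-12 13:01:40"),
        (9, 7, "2021-05-14 12:13:11"), (10, 7, "2021-05-14 12:20:01"),
        (11, 7, "2021-05-14 12:45:00"), (12, 7, "2021-05-14 12:48:41"),
        (13, 7, "2021-05-14 12:58:16"),
    ],
)

# Seconds since the customer's first order, integer-divided into 10-minute frames.
QUERY = """
SELECT customer,
       DENSE_RANK() OVER (PARTITION BY customer ORDER BY frame) AS group_no,
       id
FROM (
    SELECT id, customer,
           ((CAST(strftime('%s', time) AS INTEGER)
             - MIN(CAST(strftime('%s', time) AS INTEGER)) OVER (PARTITION BY customer))
            / 60) / 10 AS frame
    FROM orders
) AS framed
ORDER BY customer, group_no, id
"""

groups = {}
for customer, group_no, order_id in conn.execute(QUERY):
    groups.setdefault((customer, group_no), []).append(order_id)

for (customer, group_no), ids in groups.items():
    print(customer, group_no, ",".join(map(str, ids)))
```

For this sample the frames match the expected groups. Frames are counted from each customer's first order rather than restarting at the first order of each group, so the two can differ. For example, orders at 0, 12 and 21 minutes fall into frames 0, 1 and 2. Restarting at the 12-minute order would instead put the 21-minute order in the same group as it.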

Estimation of Cumulative value every 3 months in SQL

I have a table like this:
ID Date Prod
1 1/1/2009 5
1 2/1/2009 5
1 3/1/2009 5
1 4/1/2009 5
1 5/1/2009 5
1 6/1/2009 5
1 7/1/2009 5
1 8/1/2009 5
1 9/1/2009 5
And I need to get the following result:
ID Date Prod CumProd
1 2009/03/01 5 15 ---Each 3 months
1 2009/06/01 5 30 ---Each 3 months
1 2009/09/01 5 45 ---Each 3 months
What could be the best approach to take in SQL?
You can try the following, using window functions:
select * from
(
    select *,
           sum(prod) over (partition by id
                           order by year(dateval), datepart(qq, dateval)) as cum_sum,
           row_number() over (partition by id, year(dateval), datepart(qq, dateval)
                              order by dateval desc) as rn
    from t
) A
where rn = 1
How about just filtering on the month number?
select t.*
from (select id, date, prod, sum(prod) over (partition by id order by date) as running_prod
from t
) t
where month(date) in (3, 6, 9, 12);
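The month-filter approach can be tried with Python's sqlite3 module. SQLite has no month() function, so this sketch uses strftime('%m', ...) instead. It stores dates as ISO strings and assumes SQLite 3.25 or later for the window function. The table t mirrors the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, prod INTEGER)")
# The question's data: id 1, the first of each month from January to September 2009, prod 5.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, f"2009-{m:02d}-01", 5) for m in range(1, 10)],
)

QUERY = """
SELECT id, date, prod, running_prod
FROM (
    SELECT id, date, prod,
           SUM(prod) OVER (PARTITION BY id ORDER BY date) AS running_prod
    FROM t
) AS x
-- SQLite has no month(); strftime('%m', ...) returns '01'..'12' as text.
WHERE CAST(strftime('%m', date) AS INTEGER) IN (3, 6, 9, 12)
ORDER BY id, date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

It returns the March, June and September rows with running totals 15, 30 and 45, matching the desired result. This relies on there being exactly one row per month. If a quarter-end month is missing, that quarter is skipped; if it has several rows, the quarter is reported more than once.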

Identify New Seller (without buying in recent 3 months)

In BigQuery SQL, I have a table with 3 columns, Month, Date and ID, recording users' transactions.
Here is the example
I want to identify which IDs are new sellers in each month, where a new seller is one that has not bought anything in the recent 3 months.
I tried numbering each ID's rows with row_number, ordered by date and ID, reasoning that the rows whose row_number is not in (2,3,4) are new sellers. However, an ID can skip a month and buy again the next month, and my code doesn't handle that situation.
Could you please help me to solve this problem? Thank you very much.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
COUNT(1) OVER(
PARTITION BY id
ORDER BY DATE_DIFF(`date`, '2000-01-01', MONTH)
RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING
) = 0 AS new_seller
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Mar-19' month, DATE '2019-03-01' `date`, 1 id UNION ALL
SELECT 'Mar-19', '2019-03-03', 2 UNION ALL
SELECT 'Mar-19', '2019-03-04', 3 UNION ALL
SELECT 'Apr-19', '2019-04-05', 3 UNION ALL
SELECT 'Apr-19', '2019-04-06', 4 UNION ALL
SELECT 'Apr-19', '2019-04-07', 5 UNION ALL
SELECT 'May-19', '2019-05-03', 3 UNION ALL
SELECT 'May-19', '2019-05-04', 6 UNION ALL
SELECT 'May-19', '2019-05-05', 5 UNION ALL
SELECT 'Jun-19', '2019-06-06', 1 UNION ALL
SELECT 'Jun-19', '2019-06-07', 7 UNION ALL
SELECT 'Jun-19', '2019-06-08', 8 UNION ALL
SELECT 'Jun-19', '2019-06-09', 9 UNION ALL
SELECT 'Jul-19', '2019-07-05', 2 UNION ALL
SELECT 'Jul-19', '2019-07-06', 5 UNION ALL
SELECT 'Jul-19', '2019-07-07', 9
)
SELECT *,
COUNT(1) OVER(
PARTITION BY id
ORDER BY DATE_DIFF(`date`, '2000-01-01', MONTH)
RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING
) = 0 AS new_seller
FROM `project.dataset.table`
-- ORDER BY `date`
with the output below:
Row month date id new_seller
1 Mar-19 2019-03-01 1 true
2 Mar-19 2019-03-03 2 true
3 Mar-19 2019-03-04 3 true
4 Apr-19 2019-04-05 3 false
5 Apr-19 2019-04-06 4 true
6 Apr-19 2019-04-07 5 true
7 May-19 2019-05-03 3 false
8 May-19 2019-05-04 6 true
9 May-19 2019-05-05 5 false
10 Jun-19 2019-06-06 1 false
11 Jun-19 2019-06-07 7 true
12 Jun-19 2019-06-08 8 true
13 Jun-19 2019-06-09 9 true
14 Jul-19 2019-07-05 2 false
15 Jul-19 2019-07-06 5 false
16 Jul-19 2019-07-07 9 false
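To check the windowing logic outside BigQuery, here is a plain-Python re-implementation of the same rule (not BigQuery code). Each date is mapped to a month number, as DATE_DIFF(..., MONTH) does. A row is then flagged as a new seller when the same id has no row in the 1 to 4 months before, mirroring RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING. The function names and the `lookback` parameter are hypothetical.

```python
from datetime import date


def month_index(d):
    """One integer per calendar month, so month differences are plain subtraction."""
    return d.year * 12 + d.month


def flag_new_sellers(rows, lookback=4):
    """rows: list of (month_label, day, seller_id).

    Returns one bool per row: True when the seller has no row in any of the
    `lookback` calendar months before the row's month (the current month is
    excluded), like RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING.
    """
    months_by_id = {}
    for _, day, sid in rows:
        months_by_id.setdefault(sid, set()).add(month_index(day))
    flags = []
    for _, day, sid in rows:
        m = month_index(day)
        seen = months_by_id[sid]
        flags.append(not any(m - k in seen for k in range(1, lookback + 1)))
    return flags


# The sample data from the answer's CTE.
rows = [
    ("Mar-19", date(2019, 3, 1), 1), ("Mar-19", date(2019, 3, 3), 2),
    ("Mar-19", date(2019, 3, 4), 3), ("Apr-19", date(2019, 4, 5), 3),
    ("Apr-19", date(2019, 4, 6), 4), ("Apr-19", date(2019, 4, 7), 5),
    ("May-19", date(2019, 5, 3), 3), ("May-19", date(2019, 5, 4), 6),
    ("May-19", date(2019, 5, 5), 5), ("Jun-19", date(2019, 6, 6), 1),
    ("Jun-19", date(2019, 6, 7), 7), ("Jun-19", date(2019, 6, 8), 8),
    ("Jun-19", date(2019, 6, 9), 9), ("Jul-19", date(2019, 7, 5), 2),
    ("Jul-19", date(2019, 7, 6), 5), ("Jul-19", date(2019, 7, 7), 9),
]

for (month, day, sid), flag in zip(rows, flag_new_sellers(rows)):
    print(month, day, sid, flag)
```

With the default lookback of 4, the flags match the output table above. With lookback=3, row 14 (id 2, last seen in March) would be flagged as a new seller in July. The window size is therefore worth checking against the intended meaning of "recent 3 months".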

Current record with group by function

I'm trying to get the most recent agg value per session_id for a given userid:
session_id 3 has two records; the most recent agg value is 80.00.
session_id 4 has four records; the most recent agg value is 95.00.
session_id 6 has three records; the most recent agg value is 72.00.
Table: session_agg
id session_id userid agg date
-- ---------- ------ ----- -------
1 3 11 60.00 1573561586
4 3 11 80.00 1573561586
6 4 11 35.00 1573561749
7 4 11 50.00 1573561751
8 4 11 70.00 1573561912
10 4 11 95.00 1573561921
11 6 14 40.00 1573561945
12 6 14 67.00 1573561967
13 6 14 72.00 1573561978
select id, session_id, userid, agg, date from session_agg
WHERE date IN (select MAX(date) from session_agg GROUP BY session_id) AND
userid = 11
If you want to stick with your current approach, then you need to correlate the session_id in the subquery which checks for the max date for each session:
SELECT id, session_id, userid, agg, date
FROM session_agg sa1
WHERE
date = (SELECT MAX(date) FROM session_agg sa2 WHERE sa2.session_id = sa1.session_id) AND
userid = 11;
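Note that session_id 3 has two rows with the same date, so this query returns both of them. If your database supports analytic (window) functions, ROW_NUMBER is an easier way to do this and returns exactly one row per session: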
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY date DESC, id DESC) rn
    FROM session_agg
)
SELECT id, session_id, userid, agg, date
FROM cte
WHERE rn = 1 AND userid = 11;
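The ROW_NUMBER approach can be run against the sample data with Python's sqlite3 module (SQLite 3.25 or later assumed). Because session_id 3 has two rows with the same date, this sketch adds id DESC as a tie-breaker. That assumes a higher id means a later record, which the question does not state. The userid is passed as a query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_agg "
    "(id INTEGER, session_id INTEGER, userid INTEGER, agg REAL, date INTEGER)"
)
conn.executemany(
    "INSERT INTO session_agg VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, 11, 60.00, 1573561586), (4, 3, 11, 80.00, 1573561586),
        (6, 4, 11, 35.00, 1573561749), (7, 4, 11, 50.00, 1573561751),
        (8, 4, 11, 70.00, 1573561912), (10, 4, 11, 95.00, 1573561921),
        (11, 6, 14, 40.00, 1573561945), (12, 6, 14, 67.00, 1573561967),
        (13, 6, 14, 72.00, 1573561978),
    ],
)

QUERY = """
WITH cte AS (
    SELECT *,
           -- id DESC breaks the tie in session 3, where both rows share one date
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY date DESC, id DESC) AS rn
    FROM session_agg
)
SELECT id, session_id, userid, agg, date
FROM cte
WHERE rn = 1 AND userid = ?
ORDER BY session_id
"""

result = conn.execute(QUERY, (11,)).fetchall()
for row in result:
    print(row)
```

For userid 11 it returns id 4 (agg 80.00) and id 10 (agg 95.00), matching the values expected in the question.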

SELECT query for skipping rows with duplicates but leaving the first and the last occurrences in PostgreSQL

I have a table with items, dates, and prices, and I am trying to write a SELECT query in PostgreSQL that skips rows with duplicate prices, so that only the first and last rows of each consecutive run of the same price remain. After a price change, the price can go back to a previous value, and that run should be preserved as well.
id date price item
1 20.10.2018 10 a
2 21.10.2018 10 a
3 22.10.2018 10 a
4 23.10.2018 15 a
5 24.10.2018 15 a
6 25.10.2018 15 a
7 26.10.2018 10 a
8 27.10.2018 10 a
9 28.10.2018 10 a
10 29.10.2018 10 a
11 26.10.2018 3 b
12 27.10.2018 3 b
13 28.10.2018 3 b
14 29.10.2018 3 c
Result:
id date price item
1 20.10.2018 10 a
3 22.10.2018 10 a
4 23.10.2018 15 a
6 25.10.2018 15 a
7 26.10.2018 10 a
10 29.10.2018 10 a
11 26.10.2018 3 b
13 28.10.2018 3 b
14 29.10.2018 3 c
You can use lag() and lead():
select id, date, price, item
from (select t.*,
lag(price) over (partition by item order by date) as prev_price,
lead(price) over (partition by item order by date) as next_price
from t
) t
where prev_price is null or prev_price <> price or
next_price is null or next_price <> price
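The lag()/lead() filter can be checked with Python's sqlite3 module, which supports the same window functions (SQLite 3.25 or later assumed). The dates are rewritten in ISO format (YYYY-MM-DD) so that text ordering matches date ordering. In PostgreSQL the column would presumably be a real date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, price INTEGER, item TEXT)")
rows = [
    (1, "2018-10-20", 10, "a"), (2, "2018-10-21", 10, "a"), (3, "2018-10-22", 10, "a"),
    (4, "2018-10-23", 15, "a"), (5, "2018-10-24", 15, "a"), (6, "2018-10-25", 15, "a"),
    (7, "2018-10-26", 10, "a"), (8, "2018-10-27", 10, "a"), (9, "2018-10-28", 10, "a"),
    (10, "2018-10-29", 10, "a"), (11, "2018-10-26", 3, "b"), (12, "2018-10-27", 3, "b"),
    (13, "2018-10-28", 3, "b"), (14, "2018-10-29", 3, "c"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

QUERY = """
SELECT id, date, price, item
FROM (
    SELECT t.*,
           LAG(price)  OVER (PARTITION BY item ORDER BY date) AS prev_price,
           LEAD(price) OVER (PARTITION BY item ORDER BY date) AS next_price
    FROM t
) AS x
-- keep a row if it starts a run (no equal row before it) or ends one (none after it)
WHERE prev_price IS NULL OR prev_price <> price
   OR next_price IS NULL OR next_price <> price
ORDER BY id
"""

kept = [row[0] for row in conn.execute(QUERY)]
print(kept)
```

The surviving ids are 1, 3, 4, 6, 7, 10, 11, 13 and 14, the same rows as the expected result.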