Find the start and end date of stock difference - sql

Please suggest a good SQL query to find the start and end date of each run of identical stock values.
Imagine I have data in a table like the one below.
Sample_table
transaction_date stock
2018-12-01 10
2018-12-02 10
2018-12-03 20
2018-12-04 20
2018-12-05 20
2018-12-06 20
2018-12-07 20
2018-12-08 10
2018-12-09 10
2018-12-10 30
Expected result should be
Start_date end_date stock
2018-12-01 2018-12-02 10
2018-12-03 2018-12-07 20
2018-12-08 2018-12-09 10
2018-12-10 null 30

This is a gaps-and-islands problem. You may use row_number and group by for this.
select t.stock, min(transaction_date), max(transaction_date)
from (
select row_number() over (order by transaction_date) -
row_number() over (partition by stock order by transaction_date) grp,
transaction_date,
stock
from data
) t
group by t.grp, t.stock
In the following DBFIDDLE DEMO I also handle the null value of the last group, but the main idea of finding consecutive rows is built on the above query.
You may check this for an explanation of this solution.
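To see why the difference of the two row numbers works, here is a minimal sketch that runs the same idea on the question's sample rows. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in engine. The rule used for the null end date (a run that reaches the latest date in the table is treated as still open) is my assumption, not necessarily what the linked demo does.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table (assumption: the
# original engine differs; window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (transaction_date TEXT, stock INTEGER)")
stocks = [10, 10, 20, 20, 20, 20, 20, 10, 10, 30]
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?)",
    [(f"2018-12-{day:02d}", stock) for day, stock in enumerate(stocks, start=1)],
)

# Step 1: the difference of the two row numbers is constant within a run.
groups = conn.execute("""
    SELECT transaction_date, stock,
           ROW_NUMBER() OVER (ORDER BY transaction_date)
         - ROW_NUMBER() OVER (PARTITION BY stock ORDER BY transaction_date) AS grp
    FROM sample_table
    ORDER BY transaction_date
""").fetchall()

# Step 2: group by (stock, grp). Assumption: a run that reaches the latest
# date in the table is still open, so its end date is reported as NULL.
islands = conn.execute("""
    SELECT MIN(transaction_date) AS start_date,
           CASE WHEN MAX(transaction_date) =
                     (SELECT MAX(transaction_date) FROM sample_table)
                THEN NULL ELSE MAX(transaction_date) END AS end_date,
           stock
    FROM (
        SELECT transaction_date, stock,
               ROW_NUMBER() OVER (ORDER BY transaction_date)
             - ROW_NUMBER() OVER (PARTITION BY stock ORDER BY transaction_date) AS grp
        FROM sample_table
    ) AS t
    GROUP BY stock, grp
    ORDER BY start_date
""").fetchall()

for row in islands:
    print(row)
```

The grp values come out as 0, 0, 2, 2, 2, 2, 2, 5, 5, 9, so each consecutive run gets its own (stock, grp) pair even though stock 10 appears in two separate runs.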

You can try below using row_number()
select stock,min(transaction_date) as start_date,
case when min(transaction_date)=max(transaction_date) then null else max(transaction_date) end as end_date
from
(
select *,row_number() over(order by transaction_date)-
row_number() over(partition by stock order by transaction_date) as rn
from t1
)A group by stock,rn

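Try to use GROUP BY with MIN and MAX. Note that grouping by stock alone merges separate runs of the same stock value (here the two runs of 10), so this only works when each value occurs in a single consecutive run: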
SELECT
stock,
MIN(transaction_date) Start_date,
CASE WHEN COUNT(*)>1 THEN MAX(transaction_date) END end_date
FROM Sample_table
GROUP BY stock
ORDER BY stock

You can try with LEAD, LAG functions as below:
select currentStockDate as startDate,
LEAD(currentStockDate,1) over(order by currentStockDate) as EndDate, -- date on which the next run starts
currentStock
from
(select *
from
(select
LAG(transaction_date,1) over(order by transaction_date) as prevStockDate,
transaction_date as CurrentstockDate,
LAG(stock,1) over(order by transaction_date) as prevStock,
stock as currentStock
from sample_table) as t
where (prevStock <> currentStock) or (prevStock is null)
) as t2

Related

Rank the dates in a table for each month

I need to find the last three distinct loaddates for each month in various tables for reporting purposes. Example: if I have data from February 2021 to today, I need the three loaddates of Feb 2021, March 2021, and so on until Dec 2022.
So far, I'm able to create the below query in SQL Server which gives me the result for a particular month that I pass in the where condition.
SELECT ROW_NUMBER() OVER (ORDER BY loaddate desc) AS myrank, loaddate
FROM <tablename>
where year(loaddate) = 2022 and month(loaddate) = 6
group by loaddate
It gives me:
myrank loaddate
1 2022-08-29 00:00:00.000
2 2022-08-25 00:00:00.000
3 2022-08-18 00:00:00.000
4 2022-08-17 00:00:00.000
5 2022-08-11 00:00:00.000
From this I can easily select the top three dates with the below query:
SELECT myrank, loaddate
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY loaddate desc) AS myrank, loaddate
FROM <tablename>
where year(loaddate) = 2022 and month(loaddate) = 6
group by loaddate
) as daterank
WHERE daterank.myrank <= 3
which outputs:
myrank loaddate
1 2022-08-29 00:00:00.000
2 2022-08-25 00:00:00.000
3 2022-08-18 00:00:00.000
But this is only for one month. I'm manually passing the month number in the where condition. How can I make this ranking query give me the last 3 distinct loaddates for each month of data that exists in the table?
Also, how do I run such a generic query on a list of 400+ tables instead of changing the table name manually for each table in the list?
You just add the PARTITION BY clause to ROW_NUMBER() and partition by month (and year since your data might cross a year boundary).
WITH cte AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY DATEPART(year, loaddate), DATEPART(month, loaddate) ORDER BY loaddate desc) AS myrank
FROM #MyTable
)
SELECT *
FROM cte
WHERE myrank <= 3
ORDER BY loaddate;
Note: The CTE is doing the same thing as your sub-query - don't let that confuse you - I just prefer it for neatness.
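As a rough check of the PARTITION BY idea, here is a minimal sketch on made-up load dates. It assumes SQLite through Python's sqlite3 module instead of SQL Server, so strftime stands in for DATEPART. The table name my_table and the dates are hypothetical. Because the question asks for distinct loaddates, the sketch de-duplicates before numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (loaddate TEXT)")
# Hypothetical load dates, including a duplicate and two different Decembers.
dates = ["2021-12-30", "2021-12-28", "2021-12-20", "2021-12-05",
         "2022-12-15", "2022-12-15", "2022-12-10", "2022-12-03", "2022-12-01"]
conn.executemany("INSERT INTO my_table VALUES (?)", [(d,) for d in dates])

top3 = conn.execute("""
    WITH distinct_dates AS (
        SELECT DISTINCT loaddate FROM my_table
    ),
    ranked AS (
        SELECT loaddate,
               ROW_NUMBER() OVER (
                   -- year and month together, so Dec 2021 and Dec 2022 stay apart
                   PARTITION BY strftime('%Y', loaddate), strftime('%m', loaddate)
                   ORDER BY loaddate DESC
               ) AS myrank
        FROM distinct_dates
    )
    SELECT myrank, loaddate
    FROM ranked
    WHERE myrank <= 3
    ORDER BY loaddate DESC
""").fetchall()

for row in top3:
    print(row)
```

Each (year, month) pair restarts the numbering at 1, which is what lets a single query replace the per-month where condition.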
If I understand your request I think this would help you:
SELECT myrank, loaddate, monthofyear
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY year(loaddate), month(loaddate) ORDER BY loaddate DESC) AS myrank
, loaddate, month(loaddate) as monthofyear
FROM Db15.dbo.mytable
GROUP BY loaddate
) AS daterank
WHERE daterank.myrank <= 3
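For the second part of the question (running the same query over many tables), one possible approach is to generate the statement per table name from a client script. The sketch below assumes SQLite through Python's sqlite3 module and two hypothetical tables, loads_a and loads_b. On SQL Server the same loop could be driven from whatever client you use, or done with dynamic SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables standing in for the list of 400+ tables.
tables = {
    "loads_a": ["2022-06-01", "2022-06-02", "2022-06-03", "2022-06-04"],
    "loads_b": ["2022-07-10", "2022-07-11"],
}
for name, dates in tables.items():
    conn.execute(f'CREATE TABLE "{name}" (loaddate TEXT)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(d,) for d in dates])

QUERY = """
    SELECT loaddate FROM (
        SELECT loaddate,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m', loaddate)
                   ORDER BY loaddate DESC
               ) AS myrank
        FROM (SELECT DISTINCT loaddate FROM "{table}")
    )
    WHERE myrank <= 3
    ORDER BY loaddate DESC
"""

# Table names cannot be bound as parameters, so they are formatted into the
# statement; only use names that come from a trusted list.
results = {
    name: [row[0] for row in conn.execute(QUERY.format(table=name))]
    for name in tables
}
print(results)
```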

Sales amounts of the top n selling vendors by month in bigquery

I have a table in BigQuery like this (260000 rows):
vendor date item_price
x 2021-07-08 23:41:10 451,5
y 2021-06-14 10:22:10 41,7
z 2020-01-03 13:41:12 74
s 2020-04-12 01:14:58 88
....
What I want is to group this data by month and find the sum of the sales of only the top 20 vendors in each month. Expected output:
month sum_of_only_top20_vendor's_sales
2020-01 7857
2020-02 9685
2020-03 3574
2020-04 7421
.....
Consider below approach
select month, sum(sale) as sum_of_only_top20_vendor_sales
from (
select vendor,
format_datetime('%Y%m', date) month,
sum(item_price) as sale
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sale desc) <= 20
)
group by month
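The shape of this query (aggregate per vendor and month, rank inside each month, keep the top n, then sum) can be tried on toy data. This is a minimal sketch assuming SQLite through Python's sqlite3 module. SQLite has no QUALIFY, so the rank filter moves into an outer WHERE. The sales table, its rows, and TOP_N = 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (vendor TEXT, date TEXT, item_price REAL)")
# Hypothetical rows; the question's real table has about 260000.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("x", "2020-01-03 13:41:12", 50), ("x", "2020-01-09 08:00:00", 25),
    ("y", "2020-01-05 10:22:10", 40), ("z", "2020-01-20 01:14:58", 10),
    ("x", "2020-02-01 09:00:00", 5), ("y", "2020-02-02 09:00:00", 7),
    ("z", "2020-02-03 09:00:00", 9),
])

TOP_N = 2  # the question uses 20; 2 keeps the toy data readable

monthly = conn.execute("""
    SELECT month, SUM(sale) AS sum_of_top_vendor_sales
    FROM (
        SELECT month, sale,
               ROW_NUMBER() OVER (PARTITION BY month ORDER BY sale DESC) AS rn
        FROM (
            -- one row per vendor and month with that vendor's monthly total
            SELECT vendor,
                   strftime('%Y-%m', date) AS month,
                   SUM(item_price) AS sale
            FROM sales
            GROUP BY vendor, month
        )
    )
    WHERE rn <= ?
    GROUP BY month
    ORDER BY month
""", (TOP_N,)).fetchall()

for row in monthly:
    print(row)
```

In January the vendor totals are 75, 40 and 10, so the top two sum to 115. In February they are 9, 7 and 5, so the top two sum to 16.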
Another solution that can potentially perform much better on really big data, at the cost of approximate results from approx_top_sum:
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t
or with a little refactoring
select month, sum(sum) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t, t.top_20_vendors
group by month

Joining on the same key on the next row

Suppose we have a table which contains customer_id, order_date, and ship_date. A reorder of the product occurs when the same customer's next order_date is within 30 days of the last ship_date.
select * from mytable
customer_id order_date ship_date
1 2017-08-04 2017-08-09
1 2017-09-01 2017-09-05
2 2017-02-02 2017-03-01
2 2017-04-05 2017-04-09
2 2017-04-15 2017-04-19
3 2018-02-02 2018-03-01
Requested: Reorders
customer_id order_date ship_date
1 2017-09-01 2017-09-05
2 2017-04-15 2017-04-19
How can I retrieve only the records for the same customers who had reorders, that is, where the next order_date is within 30 days of the last ship_date?
You can use exists as follows:
Select * from your_table t
Where exists (select 1 from your_table tt
Where tt.customer_id = t.customer_id
And t.order_date > tt.ship_date
and t.order_date <= dateadd(day, 30, tt.ship_date))
One method is lead():
select t.customer_id, t.next_order_date, t.next_ship_date
from (select t.*,
lead(order_date) over (partition by customer_id order by order_date) as next_order_date,
lead(ship_date) over (partition by customer_id order by order_date) as next_ship_date
from t
) t
where next_order_date < dateadd(day, 30, ship_date);
EDIT:
If you want the "reorder" row, just use lag():
select t.*
from (select t.*,
lag(ship_date) over (partition by customer_id order by order_date) as prev_ship_date
from t
) t
where prev_ship_date > dateadd(day, -30, order_date);
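The lag() idea can be checked against the question's sample rows. This is a minimal sketch assuming SQLite through Python's sqlite3 module, where date(x, '+30 days') plays the role of dateadd(day, 30, x). Whether the 30th day itself counts is a choice; the sketch includes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (customer_id INTEGER, order_date TEXT, ship_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "2017-08-04", "2017-08-09"), (1, "2017-09-01", "2017-09-05"),
    (2, "2017-02-02", "2017-03-01"), (2, "2017-04-05", "2017-04-09"),
    (2, "2017-04-15", "2017-04-19"), (3, "2018-02-02", "2018-03-01"),
])

reorders = conn.execute("""
    SELECT customer_id, order_date, ship_date
    FROM (
        SELECT m.*,
               LAG(ship_date) OVER (
                   PARTITION BY customer_id ORDER BY order_date
               ) AS prev_ship_date
        FROM mytable AS m
    )
    -- the first order of each customer has a NULL prev_ship_date and drops out
    WHERE order_date <= date(prev_ship_date, '+30 days')
    ORDER BY customer_id, order_date
""").fetchall()

for row in reorders:
    print(row)
```

This returns the two rows listed under "Requested: Reorders" in the question.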

Additional condition within partition over

https://www.db-fiddle.com/f/rgLXTu3VysD3kRwBAQK3a4/3
My problem is that I want ROW_NUMBER() OVER (PARTITION BY ...) to start counting rows only from a certain time range.
In this example, if I add rn = 1 at the end, order_id = 5 is excluded from the results (because the partition is ordered by paid_date and there's order_id = 6 with an earlier date), but it shouldn't be, as I want the time range for the partition to start from '2019-01-10'.
With the condition rn = 1 added, the expected output is order_id 3, 5, 11, 15; now it's only 3, 11, 15.
It should include only orders with is_paid = 0 that are the first within the given time range (if there's a preceding order with is_paid = 1, it shouldn't be counted).
Use a correlated subquery with not exists:
DEMO
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date,order_id) rn
FROM orders o
WHERE paid_date between '2019-01-10'
and '2019-01-15'
) x where rn=1 and not exists (select 1 from orders o1 where x.order_id=o1.order_id
and is_paid=1)
OUTPUT:
order_id customer_id amount is_paid paid_date rn
3 101 30 0 10/01/2019 00:00:00 1
5 102 15 0 10/01/2019 00:00:00 1
11 104 31 0 10/01/2019 00:00:00 1
15 105 11 0 10/01/2019 00:00:00 1
If priority should be given to order_id, put it before paid_date in the ORDER BY clause of the window function; this will solve your issue.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_id,paid_date) rn
FROM orders o
) x WHERE is_paid = 0 and paid_date between
'2019-01-10' and '2019-01-15' and rn=1
Since you need paid_date to be ordered first, you need to apply the where condition inside the subquery that computes the row numbers, so that dates outside the range do not affect the numbering.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date, order_id) rn
FROM orders o
where paid_date between '2019-01-10' and '2019-01-15'
) x WHERE is_paid = 0 and rn=1
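The difference between filtering before and after the row numbers are assigned can be seen on a few made-up rows. This is a minimal sketch assuming SQLite through Python's sqlite3 module. The rows are hypothetical and are not the data from the linked fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                                     is_paid INTEGER, paid_date TEXT)""")
# Hypothetical rows: customer 102 has order 6 dated before the range,
# customer 103 has a paid order preceding an unpaid one inside the range.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (3, 101, 0, "2019-01-10"), (4, 101, 0, "2019-01-12"),
    (6, 102, 0, "2019-01-08"), (5, 102, 0, "2019-01-10"),
    (7, 103, 1, "2019-01-11"), (8, 103, 0, "2019-01-12"),
])

INNER = """
    SELECT o.*, ROW_NUMBER() OVER (
               PARTITION BY customer_id ORDER BY paid_date, order_id) AS rn
    FROM orders AS o {where}
"""
RANGE = "paid_date BETWEEN '2019-01-10' AND '2019-01-15'"
inner_all = INNER.format(where="")
inner_ranged = INNER.format(where="WHERE " + RANGE)

# Range filter applied after numbering: order 6 takes rn = 1 for customer 102.
late = conn.execute(
    f"SELECT order_id FROM ({inner_all}) "
    f"WHERE rn = 1 AND is_paid = 0 AND {RANGE} ORDER BY order_id"
).fetchall()

# Range filter applied before numbering: counting starts inside the range.
early = conn.execute(
    f"SELECT order_id FROM ({inner_ranged}) "
    f"WHERE rn = 1 AND is_paid = 0 ORDER BY order_id"
).fetchall()

print(late, early)
```

With the filter inside the subquery, order 5 becomes the first row for customer 102, while order 8 is still skipped because the paid order 7 precedes it inside the range.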

To subtract a previous row value in SQL Server 2012

This is my SQL query:
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) [Sno],
_Date,
SUM(Payment) Payment
FROM
DailyPaymentSummary
GROUP BY
_Date
ORDER BY
_Date
This returns output like this
Sno _Date Payment
---------------------------
1 2017-02-02 46745.80
2 2017-02-03 100101.03
3 2017-02-06 140436.17
4 2017-02-07 159251.87
5 2017-02-08 258807.51
6 2017-02-09 510986.79
7 2017-02-10 557399.09
8 2017-02-13 751405.89
9 2017-02-14 900914.45
How can I get the additional column like below
Sno _Date Payment Diff
--------------------------------------
1 02/02/2017 46745.80 46745.80
2 02/03/2017 100101.03 53355.23
3 02/06/2017 140436.17 40335.14
4 02/07/2017 159251.87 18815.70
5 02/08/2017 258807.51 99555.64
6 02/09/2017 510986.79 252179.28
7 02/10/2017 557399.09 46412.30
8 02/13/2017 751405.89 194006.80
9 02/14/2017 900914.45 149508.56
I have tried the following query but am not able to resolve the error:
WITH cte AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) [Sno],
_Date,
SUM(Payment) Payment
FROM
DailyPaymentSummary
GROUP BY
_Date
ORDER BY
_Date
)
SELECT
t.Payment,
t.Payment - COALESCE(tprev.col, 0) AS diff
FROM
DailyPaymentSummary t
LEFT OUTER JOIN
t tprev ON t.seqnum = tprev.seqnum + 1;
Can anyone help me?
Use an order by on actual column(s) to get consistent results.
Use the lag function to get data from the previous row and do the subtraction like this:
with t
as (
select ROW_NUMBER() over (order by _date) [Sno],
_Date,
sum(Payment) Payment
from DailyPaymentSummary
group by _date
)
select *,
Payment - lag(Payment, 1, 0) over (order by [Sno]) diff
from t;
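Here is a minimal sketch of the same lag() query on the first four daily totals from the question. It assumes SQLite through Python's sqlite3 module instead of SQL Server 2012. The ROUND call is my addition to hide floating-point noise from the REAL column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DailyPaymentSummary (_Date TEXT, Payment REAL)")
# First four daily totals from the question, one row per date for brevity.
conn.executemany("INSERT INTO DailyPaymentSummary VALUES (?, ?)", [
    ("2017-02-02", 46745.80), ("2017-02-03", 100101.03),
    ("2017-02-06", 140436.17), ("2017-02-07", 159251.87),
])

rows = conn.execute("""
    WITH t AS (
        SELECT ROW_NUMBER() OVER (ORDER BY _Date) AS Sno,
               _Date,
               SUM(Payment) AS Payment
        FROM DailyPaymentSummary
        GROUP BY _Date
    )
    SELECT Sno, _Date, Payment,
           -- the third LAG argument (0) is the default used for the first row
           ROUND(Payment - LAG(Payment, 1, 0) OVER (ORDER BY Sno), 2) AS Diff
    FROM t
    ORDER BY Sno
""").fetchall()

for row in rows:
    print(row)
```

Because the default for the first row is 0, its Diff equals its own Payment, as in the expected output in the question.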
You can use lag() to get previous row values
coalesce(lag(sum_payment_col) OVER (ORDER BY _Date), 0)