QUESTION: Display the average billing amount for each customer, but only for the years 2019-2021.
If a customer has no billing amount for a particular year, treat that year's amount as 0.
OUTPUT:
Customer_ID | Customer_Name | AVG_Billed_Amount
-------------------------------------------------------------------------
1 | A | 87.00
2 | B | 200.00
3 | C | 183.00
EXPLANATION:
If a customer has no billing records for one of these 3 years, we need to count that year as one record with billing_amount = 0.
For example, customer C has no record for year 2020, so C's average will be
(250 + 300 + 0) / 3 = 183.33 (shown as 183.00 in the output above).
The temp table has the following data:
DROP TABLE IF EXISTS #TEMP;
CREATE TABLE #TEMP
(
Customer_ID INT
, Customer_Name NVARCHAR(100)
, Billing_ID NVARCHAR(100)
, Billing_creation_Date DATETIME
, Billed_Amount INT
);
INSERT INTO #TEMP
SELECT 1, 'A', 'ID1', TRY_CAST('10-10-2020' AS DATETIME), 100 UNION ALL
SELECT 1, 'A', 'ID2', TRY_CAST('11-11-2020' AS DATETIME), 150 UNION ALL
SELECT 1, 'A', 'ID3', TRY_CAST('12-11-2021' AS DATETIME), 100 UNION ALL
SELECT 2, 'B', 'ID4', TRY_CAST('10-11-2019' AS DATETIME), 150 UNION ALL
SELECT 2, 'B', 'ID5', TRY_CAST('11-11-2020' AS DATETIME), 200 UNION ALL
SELECT 2, 'B', 'ID6', TRY_CAST('12-11-2021' AS DATETIME), 250 UNION ALL
SELECT 3, 'C', 'ID7', TRY_CAST('01-01-2018' AS DATETIME), 100 UNION ALL
SELECT 3, 'C', 'ID8', TRY_CAST('05-01-2019' AS DATETIME), 250 UNION ALL
SELECT 3, 'C', 'ID9', TRY_CAST('06-01-2021' AS DATETIME), 300
-----------------------------------------------------------------------------------
Here, 'A' has 3 transactions: two in year 2020 (100 + 150) and one in year 2021 (100), but none in 2019 (so Billed_Amount = 0 for that year).
So the average will be calculated as (100 + 150 + 100 + 0) / 4.
DECLARE @BILL_DATE DATE = (SELECT Billing_creation_Date FROM #TEMP GROUP BY Customer_ID, Billing_creation_Date); /* -- THIS THROWS AN ERROR, AS @BILL_DATE WON'T ACCEPT MULTIPLE VALUES. */
OUTPUT should look like this:
Customer_ID | Customer_Name | AVG_Billed_Amount
1 | A | 87.00
2 | B | 200.00
3 | C | 183.00
You just need a formula to count the number of missing years.
That's 3 - COUNT(DISTINCT YEAR(Billing_creation_Date)).
Then the average = SUM() / (COUNT() + (3 - COUNT(DISTINCT YEAR)))...
SELECT
Customer_ID,
Customer_Name,
SUM(Billed_Amount) * 1.0
/
(COUNT(*) + 3 - COUNT(DISTINCT YEAR(Billing_creation_Date)))
AS AVG_Billed_amount
FROM
#temp
WHERE
Billing_creation_Date >= '2019-01-01'
AND Billing_creation_Date < '2022-01-01'
GROUP BY
Customer_ID,
Customer_Name
Demo : https://dbfiddle.uk/ILcfiGWL
Note: The WHERE clause in another answer here would cause a scan of the table, due to hiding the filtered column behind a function. The way I've formed the WHERE clause allows a "Range Seek" if the column is in an index.
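For illustration, a minimal sketch of the two predicate forms against #TEMP (assuming an index on Billing_creation_Date):
-- Non-sargable: the function hides the column from the index, forcing a scan:
SELECT Customer_ID FROM #TEMP
WHERE YEAR(Billing_creation_Date) BETWEEN 2019 AND 2021;
-- Sargable: a half-open range on the bare column permits an index range seek:
SELECT Customer_ID FROM #TEMP
WHERE Billing_creation_Date >= '2019-01-01'
  AND Billing_creation_Date < '2022-01-01';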
Here is a query that can do that:
select s.Customer_ID, s.Customer_Name, sum(s.Billed_amount) / (6 - count(1)) as AVG_Billed_Amount
from (
    select Customer_ID, Customer_Name, sum(Billed_Amount) as Billed_amount
    from #TEMP
    where year(Billing_creation_Date) between 2019 and 2021
    group by Customer_ID, Customer_Name, year(Billing_creation_Date)
) as s
group by s.Customer_ID, s.Customer_Name;
According to your description, customer C will be 137.5000, not 183.00, since 2018 is not counted and there is no record for 2020: the inner query leaves two year-level rows (250 for 2019 and 300 for 2021), so the result is (250 + 300) / (6 - 2) = 137.50.
Forgive me if I word this poorly, and sorry if it has already been asked, but I was not able to find an answer here.
I'm using Snowflake to try and do the below.
Basically, I'm trying to find out, for each customer, how many times they placed an order after a specific date.
Scenario:
We want to see if customers continue to shop with us after they have been short-shipped (received 1 or more items less than they ordered).
So for example:
Customer 1 places an order on 01/01/2020 and this was a short-shipment.
They then go on to place an order on 06/06/2020 and 02/02/2021.
So this customer has a total of 2 additional orders since they were short-shipped on 01/01/2020.
Customer 2 places an order on 02/03/2020 and this was short-shipped.
Customer 2 has not since placed an order, so they will have 0 additional orders.
Data available:
cust_id | ord_id | order_date
1 | 0123 | 01/01/2020
1 | 0456 | 06/06/2020
1 | 0789 | 02/02/2021
2 | 1011 | 01/01/2020
Desired output:
cust_id | number_of_orders
1 | 2
2 | 0
So using a boosted version of your data:
with data_cte( cust_id, ord_id, order_date, short_order_flg) as (
select * from values
(1, '1', '2018-06-06'::date, false),
(1, '2', '2019-01-01'::date, true),
(1, '3', '2019-06-06'::date, false),
(1, '4', '2019-12-02'::date, false),
(1, '5', '2020-01-01'::date, true),
(1, '6', '2020-06-06'::date, false),
(1, '7', '2021-02-02'::date, false),
(2, '8', '2020-01-01'::date, true)
)
which shows a "valid" purchase, multiple "short ships", and how to batch them:
SELECT
cust_id,
min(order_date) as short_date,
count(*) -1 as follow_count
FROM (
select
cust_id
,order_date
,CONDITIONAL_TRUE_EVENT(short_order_flg) over(partition by cust_id order by order_date ) as edge
from data_cte
)
where edge > 0
group by 1, edge
order by 1,2;
gives:
CUST_ID | SHORT_DATE | FOLLOW_COUNT
1 | 2019-01-01 | 2
1 | 2020-01-01 | 2
2 | 2020-01-01 | 0
The key things to note: CONDITIONAL_TRUE_EVENT increments each time the event happens, which gives (cust_id, edge) as a batch key; on rows where the event has not yet happened, edge is zero, thus the WHERE filter.
The last thing is that, given we count at least one row for the start of each "post short" batch, we need to subtract one from the count.
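To see the edge numbering on its own, you can run the inner select by itself (prepend the data_cte definition above); the expected values are noted in comments:
select
    cust_id
    ,order_date
    ,short_order_flg
    ,CONDITIONAL_TRUE_EVENT(short_order_flg) over(partition by cust_id order by order_date) as edge
from data_cte
order by cust_id, order_date;
-- cust_id 1: edge = 0 on 2018-06-06 (before any short ship, so dropped by edge > 0),
-- edge = 1 from 2019-01-01 through 2019-12-02, edge = 2 from 2020-01-01 onward.
-- cust_id 2: edge = 1 on 2020-01-01.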
Try this
with CTE as (
select 1 as cust_id, '0123' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg union all
select 1 as cust_id, '0456' as ord_id, '2020-06-06'::date as order_date, 0 as short_order_flg union all
select 1 as cust_id, '0789' as ord_id, '2021-02-02'::date as order_date, 0 as short_order_flg union all
select 2 as cust_id, '1011' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg
),
following_orders as (
select cust_id, short_order_flg, count(ord_id) over (partition by cust_id order by order_date rows between current row and unbounded following) - 1 as number_of_orders
from cte
order by cust_id, order_date
)
select cust_id, number_of_orders
from following_orders
where short_order_flg = 1
;
I added the column short_order_flg to indicate which record represents the short order. Then I used the window function count(ord_id) over (...) to calculate the number of orders following each order, subtracting 1 to exclude the current record itself. Finally, I applied a filter to select only the short-order records.
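As a side note, a frame of rows between 1 following and unbounded following counts only the rows strictly after the current one, which removes the need to subtract 1. A minimal variant against the same CTE, using Snowflake's QUALIFY in place of the outer query:
select cust_id,
    count(ord_id) over (partition by cust_id order by order_date
                        rows between 1 following and unbounded following) as number_of_orders
from cte
qualify short_order_flg = 1;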
I have a table orders that looks like this:
order_id | sales_amount | order_time | store_id
1412412 | 30 | 2022/03/28 | 456
1551211 | 5 | 2022/03/27 | 145
I am interested in calculating the sales from stores that had their first order in the last 28 days, on a rolling basis. The following will give me this for the most recent day:
with first_order_dates AS (
select
min(order_time) as first_order_time,
store_id
from Orders
group by store_id
)
select
dateadd(day,-1, cast(getdate() as date)) AS date,
sum(sales_amount) AS new_revenue_last_28d
from Orders
left join first_order_dates
on first_order_dates.store_id = Orders.store_id
where first_order_time between dateadd(day,-29, cast(getdate() as date)) and dateadd(day,-1, cast(getdate() as date))
group by dateadd(day,-1, cast(getdate() as date))
Resulting in:
Date | new_revenue_last_28d
2022/04/06 | 5400
What I want is to go back and calculate this for every historical day, i.e to end up with
Date | new_revenue_last_28d
2022/04/06 | 5400
2022/04/05 | 5732
2022/04/04 | 4300
and so on so I can chart this. I have run out of ideas - how can I do this with only the info I have available? Using Snowflake ideally
So if you want to only show sales for shops that have their first sale in the last 28 days, and for those 28 days have a rolling window of the sum of those sales:
WITH data as (
select * from values
(100, '2022-04-07'::date, 10),
(100, '2022-04-06'::date, 8),
(100, '2022-04-05'::date, 11),
(100, '2022-04-01'::date, 12),
(101, '2022-04-02'::date, 110),
(101, '2022-04-01'::date, 120)
t(store_id, order_date, sales_amount)
), store_valid_orders as (
select
store_id
,order_date
,sales_amount
from data
qualify min(order_date) over(partition by store_id) >= current_date() - 28
), those_28_days as (
select current_date() - row_number()over(order by null) + 1 as date
from table(generator(ROWCOUNT => 29))
), day_join_sales as (
select
d.date
,s.store_id
,sum(s.sales_amount) as sales_amount
from those_28_days as d
left join store_valid_orders as s on d.date = s.order_date
group by 1,2
)
select
date
,store_id
,sum(sales_amount) over(partition by store_id order by date rows between 28 preceding and current row ) as prior_28_days_sales
from day_join_sales
qualify store_id is not null;
gives:
DATE | STORE_ID | PRIOR_28_DAYS_SALES
2022-04-01 | 100 | 12
2022-04-05 | 100 | 23
2022-04-06 | 100 | 31
2022-04-07 | 100 | 41
2022-04-01 | 101 | 120
2022-04-02 | 101 | 230
That is actually more complex than it needs to be, but I half have the concept for solving rolling windows of days that include the first sales with respect to a rolling date, which is more complex. The above might be enough to answer your question, so I will stop here.
Take 2:
With 28 days of daily sales per store, rolled into a single daily total:
WITH data as (
select * from values
(100, '2022-04-07'::date, 10),
(100, '2022-04-06'::date, 8),
(100, '2022-04-05'::date, 11),
(100, '2022-04-01'::date, 12),
(101, '2022-04-02'::date, 110),
(101, '2022-04-01'::date, 120)
t(store_id, order_date, sales_amount)
), store_first_orders as (
select
store_id
,min(order_date) as first_order
from data
group by 1
), _29_rows as (
select
row_number()over(order by null) - 1 as rn
from table(generator(ROWCOUNT => 29))
), those_29_rows as (
select
v.store_id
,dateadd(day, r.rn, v.first_order) as date
from _29_rows as r
cross join store_first_orders as v -- pair every store with each of the 29 day offsets
), first_28_days_of_data as (
select
r.store_id
,r.date
,d.sales_amount
from those_29_rows r
left join data as d
on d.store_id = r.store_id AND d.order_date = r.date
), per_site_dailies as (
select
store_id
,date
,sum(sales_amount) over(partition by store_id order by date) as roll_sales
from first_28_days_of_data
order by 2,1
)
select
date,
sum(roll_sales) as new_revenue_last_28d
from per_site_dailies
group by 1
having date <= current_date()
order by 1;
gives:
DATE | NEW_REVENUE_LAST_28D
2022-04-01 | 132
2022-04-02 | 242
2022-04-03 | 242
2022-04-04 | 242
2022-04-05 | 253
2022-04-06 | 261
2022-04-07 | 271
2022-04-08 | 271
I am trying to figure out how to write a query that will give me the correct historical data between dates, using only SQL. I know it is possible by coding a loop, but I'm not sure if it is possible in a SQL query. Dates: DD/MM/YYYY
An Example of Data
ID | Points | DATE
1 | 10 | 01/01/2018
1 | 20 | 02/01/2019
1 | 25 | 03/01/2020
1 | 10 | 04/01/2021
With a simple query
SELECT ID, Points, MIN(Date), MAX(Date)
FROM table
GROUP BY ID,POINTS
The Min date for 10 points would be 01/01/2018, and the Max Date would be 04/01/2021, which would be wrong in this instance. It should be:
ID | Points | Min DATE | Max DATE
1 | 10 | 01/01/2018 | 01/01/2019
1 | 20 | 02/01/2019 | 02/01/2020
1 | 25 | 03/01/2020 | 03/01/2021
1 | 10 | 04/01/2021 | 04/01/2021
I was thinking of using LAG, but need some ideas here. What I haven't told you is that there is a record per day, so I would need to group until a change of points. This is to create a view from the data that I already have.
It looks like - for your sample data set - the following LEAD should suffice:
select id, points, date as MinDate,
IsNull(DateAdd(day, -1, Lead(Date,1) over(partition by Id order by Date)), Date) as MaxDate
from t
Example Fiddle
I'm guessing you want the MAX date to be 1 day before the next MIN date.
And you can use the window function LEAD to get the next MIN date.
And if you group also by the year, then the date ranges match the expected result.
SELECT ID, Points
, MIN([Date]) AS [Min Date]
, COALESCE(DATEADD(day, -1, LEAD(MIN([Date])) OVER (PARTITION BY ID ORDER BY MIN([Date]))), MAX([Date])) AS [Max Date]
FROM your_table
GROUP BY ID, Points, YEAR([Date]);
ID | Points | Min Date | Max Date
1 | 10 | 2018-01-01 | 2019-01-01
1 | 20 | 2019-01-02 | 2020-01-02
1 | 25 | 2020-01-03 | 2021-01-03
1 | 10 | 2021-01-04 | 2021-01-04
Test on db<>fiddle here
We can do this by creating two tables, one with the minimum and one with the maximum date for each grouping, and then combining them:
CREATE TABLE dataa(
id INT,
points INT,
ddate DATE);
INSERT INTO dataa values(1 , 10 ,'2018-10-01');
INSERT INTO dataa values(1 , 20 ,'2019-01-02');
INSERT INTO dataa values(1 , 25 ,'2020-01-03');
INSERT INTO dataa values(1 , 10 ,'2021-01-04');
SELECT
mi.id, mi.points,mi.date minDate, ma.date maxDate
FROM
(select id, points, min(ddate) date from dataa group by id,points) mi
JOIN
(select id, points, max(ddate) date from dataa group by id,points) ma
ON
mi.id = ma.id
AND
mi.points = ma.points;
DROP TABLE dataa;
This gives the following output:
+------+--------+------------+------------+
| id | points | minDate | maxDate |
+------+--------+------------+------------+
| 1 | 10 | 2018-10-01 | 2021-01-04 |
| 1 | 20 | 2019-01-02 | 2019-01-02 |
| 1 | 25 | 2020-01-03 | 2020-01-03 |
+------+--------+------------+------------+
I've used the default date formatting. This could be modified if you wish.
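For example (assuming MySQL, which the table output above suggests), DATE_FORMAT can render the dates as DD/MM/YYYY:
-- e.g. '2018-10-01' becomes '01/10/2018'
SELECT id, points, DATE_FORMAT(ddate, '%d/%m/%Y') AS formatted_date
FROM dataa;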
*** See my other answer, as I don't think this answer is correct after reexamining the OP's question. Leaving this answer in place, in case it has any value.
As I understand the problem, consecutive daily records with the same value for a given ID may be ignored. This can be done by examining the prior value using the LAG() function and excluding records where the current value is unchanged from the prior.
From the remaining records, the LEAD() function can be used to look ahead to the next included record to extract the date where this value is superseded. Max Date is then calculated as one day prior.
Below is an example that includes expanded test data to cover multiple IDs and repeated Points values.
DECLARE #Data TABLE (Id INT, Points INT, Date DATE)
INSERT #Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *, PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM #Data
)
SELECT ID, Points, MinDate = Date,
MaxDate = DATEADD(day, -1, (LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)))
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL) -- Exclude unchanged
ORDER BY Id, Date
Results:
ID | Points | MinDate | MaxDate
1 | 10 | 2018-01-01 | 2019-01-01
1 | 20 | 2019-01-02 | 2020-01-02
1 | 25 | 2020-01-03 | 2021-01-03
1 | 10 | 2021-01-04 | null
2 | 10 | 2022-01-01 | 2022-01-31
2 | 20 | 2022-02-01 | 2022-05-31
2 | 25 | 2022-06-01 | 2022-07-31
2 | 20 | 2022-08-01 | 2022-09-07
2 | 25 | 2022-09-08 | 2022-10-08
2 | 10 | 2022-10-09 | null
3 | 10 | 2022-01-01 | 2022-01-02
3 | 20 | 2022-01-03 | 2022-01-05
3 | 10 | 2022-01-06 | null
db<>fiddle
For the last value for a given ID, the calculated MaxDate is NULL, indicating no upper bound to the date range. If you really want MaxDate = MinDate for this case, you can add ISNULL( ..., Date).
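That is, a sketch of the final select with the ISNULL wrapping applied:
SELECT ID, Points, MinDate = Date,
    MaxDate = ISNULL(
        DATEADD(day, -1, LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)),
        Date) -- fall back to the row's own date when there is no next row
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL)
ORDER BY Id, Date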
(I am adding this as an alternative (and simpler) interpretation of the OP's question.)
Problem restatement: Given a collection of IDs, Dates, and Points values, a group is defined as any consecutive sequence of the same Points value for a given ID and ascending dates. For each such group, calculate the min and max dates.
The start of such a group can be identified as a row where the Points value changes from the preceding value, or if there is no preceding value for a given ID. If we first tag such rows (NewGroup = 1), we can then assign group numbers based on a count of preceding tagged rows (including the current row). Once we have assigned group numbers, it is then a simple matter to apply a group and aggregate operation.
Below is a sample that includes some additional test data to show multiple IDs and repeating values.
DECLARE #Data TABLE (Id INT, Points INT, Date DATE)
INSERT #Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *,
PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM #Data
)
, CTE2 AS (
SELECT *,
NewGroup = CASE WHEN (PriorPoints <> Points OR PriorPoints IS NULL)
THEN 1 ELSE 0 END
FROM CTE
)
, CTE3 AS (
SELECT *, GroupNo = SUM(NewGroup) OVER(
PARTITION BY ID
ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM CTE2
)
SELECT Id, Points, MinDate = MIN(Date), MaxDate = MAX(Date)
FROM CTE3
GROUP BY Id, GroupNo, Points
ORDER BY Id, GroupNo
Results:
Id | Points | MinDate | MaxDate
1 | 10 | 2018-01-01 | 2018-01-01
1 | 20 | 2019-01-02 | 2019-01-02
1 | 25 | 2020-01-03 | 2020-01-03
1 | 10 | 2021-01-04 | 2021-01-04
2 | 10 | 2022-01-01 | 2022-01-01
2 | 20 | 2022-02-01 | 2022-05-01
2 | 25 | 2022-06-01 | 2022-07-01
2 | 20 | 2022-08-01 | 2022-08-01
2 | 25 | 2022-09-08 | 2022-09-08
2 | 10 | 2022-10-09 | 2022-10-09
3 | 10 | 2022-01-01 | 2022-01-02
3 | 20 | 2022-01-03 | 2022-01-05
3 | 10 | 2022-01-06 | 2022-01-07
To see the intermediate results, replace the final select with SELECT * FROM CTE3 ORDER BY Id, Date.
If you wish to treat gaps in dates as group criteria, add a PriorDate calculation to CTE and add OR Date <> PriorDate to the NewGroup condition.
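A sketch of that modification (reading a "gap" as the date not being exactly one day after the prior row's date, since there is normally a record per day):
WITH CTE AS (
    SELECT *,
        PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date),
        PriorDate   = LAG(Date)   OVER (PARTITION BY Id ORDER BY Date)
    FROM #Data
)
, CTE2 AS (
    SELECT *,
        NewGroup = CASE WHEN PriorPoints <> Points
                          OR PriorPoints IS NULL
                          OR Date <> DATEADD(day, 1, PriorDate) -- a gap in dates also starts a new group
                        THEN 1 ELSE 0 END
    FROM CTE
)
SELECT * FROM CTE2 ORDER BY Id, Date -- or continue with CTE3 and the final select, unchanged from above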
db<>fiddle
Caution: In your original post, you state that "this is to create a view". Beware that if the above logic is included in a view, the entire result may be recalculated every time the view is accessed, regardless of any ID or date criteria applied. It might make more sense to use the above to populate and periodically refresh a historic roll-up data table for efficient access. Another alternative is to make a stored procedure with appropriate parameters that could filter that data before feeding it into the above.
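For illustration, a rough sketch of the stored-procedure route (the procedure and table names here are hypothetical); the parameter restricts the rows before the window functions run:
CREATE PROCEDURE dbo.GetPointsRanges -- hypothetical name
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    WITH CTE AS (
        SELECT *, PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
        FROM dbo.PointsHistory -- hypothetical source table
        WHERE Id = @Id -- filter before the grouping logic runs
    ), CTE2 AS (
        SELECT *, NewGroup = CASE WHEN PriorPoints <> Points OR PriorPoints IS NULL
                                  THEN 1 ELSE 0 END
        FROM CTE
    ), CTE3 AS (
        SELECT *, GroupNo = SUM(NewGroup) OVER (PARTITION BY Id ORDER BY Date
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM CTE2
    )
    SELECT Id, Points, MinDate = MIN(Date), MaxDate = MAX(Date)
    FROM CTE3
    GROUP BY Id, GroupNo, Points
    ORDER BY Id, GroupNo;
END
Filtering on Id is safe here because the grouping is partitioned by Id; a date-range parameter would need more care, since it could cut a group at the boundary.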
I have a table with historical stock prices for hundreds of stocks. I need to extract only those stocks that reached $10 or greater for the first time.
Stock | Price | Date
AAA | 9 | 2021-10-01
AAA | 10 | 2021-10-02
AAA | 8 | 2021-10-03
AAA | 10 | 2021-10-04
BBB | 9 | 2021-10-01
BBB | 11 | 2021-10-02
BBB | 12 | 2021-10-03
Is there a way to count how many times each stock hit >= 10, in order to pull only those where the count = 1 (in this case it would be stock BBB, considering it had never reached 10 before)?
Since I couldn't figure out how to create the count, I tried the min/max date manipulations below, but this looks like a bit of an awkward approach. Any idea of a simpler solution?
with query1 as (
select Stock, min(date) as min_greater10_dt
from t
where Price >= 10
group by Stock
), query2 as (
select Stock, max(date) as max_greater10_dt
from t
where Price >= 10
group by Stock
)
select Stock
from t a
join query1 b on b.Stock = a.Stock
join query2 c on c.Stock = a.Stock
where not(a.Price < 10 and a.Date between b.min_greater10_dt and c.max_greater10_dt)
This is a type of gaps-and-islands problem, which can be solved as follows:
detect the change from < 10 to >= 10 using a lagged price;
count the number of such changes;
filter in only stocks where this has happened exactly once;
and take the first row, since you only want the stock (you could group by here, but a row number allows you to select the entire row should you wish to).
declare #Table table (Stock varchar(3), Price money, [Date] date);
insert into #Table (Stock, Price, [Date])
values
('AAA', 9, '2021-10-01'),
('AAA', 10, '2021-10-02'),
('AAA', 8, '2021-10-03'),
('AAA', 10, '2021-10-04'),
('BBB', 9, '2021-10-01'),
('BBB', 11, '2021-10-02'),
('BBB', 12, '2021-10-03');
with cte1 as (
select Stock, Price, [Date]
, row_number() over (partition by Stock, case when Price >= 10 then 1 else 0 end order by [Date] asc) rn
, lag(Price,1,0) over (partition by Stock order by [Date] asc) LaggedStock
from #Table
), cte2 as (
select Stock, Price, [Date], rn, LaggedStock
, sum(case when Price >= 10 and LaggedStock < 10 then 1 else 0 end) over (partition by Stock) StockOver10
from cte1
)
select Stock
--, Price, [Date], rn, LaggedStock, StockOver10 -- debug
from cte2
where Price >= 10
and StockOver10 = 1 and rn = 1;
Returns:
Stock
BBB
Note: providing DDL+DML as shown above makes it much easier for people to assist.