Exclude nulls or zeroes from a previous-trading-days average calculation - SQL

I thought I had it, but actually not. I'm working with some trading data and need to compute the average stock price over trading days only. I used the query below for a 3-day average, but recently found out there can be dividends on a trading holiday; on those days the fact table has a row for the stock code whose ClosePrice is either zero or NULL.
Please help me improve my query so it ignores zeroes and NULLs in the 3 preceding trading days' average calculation:
select StockCode, datekey, ClosePrice,
       AVG(ClosePrice) OVER (partition by StockCode
                             order by datekey
                             ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) Avg3Days
from Fact

You can partition by StockCode AND sign(NullIf([ClosePrice],0)) rather than having to know the trading days.
Example
Declare @YourTable Table ([datekey] date, [StockCode] varchar(50), [ClosePrice] money)
Insert Into @YourTable Values
 ('2019-06-15','xyx',5)
,('2019-06-16','xyx',10)
,('2019-06-17','xyx',NULL)
,('2019-06-18','xyx',0)
,('2019-06-19','xyx',15)
,('2019-06-20','xyx',20)
Select *
      ,AvgPrice = AVG(ClosePrice) over (partition by StockCode, sign(NullIf([ClosePrice],0))
                                        order by datekey
                                        rows between 3 preceding and 1 preceding)
 from @YourTable
Order By datekey
Returns
datekey StockCode ClosePrice AvgPrice
2019-06-15 xyx 5.00 NULL
2019-06-16 xyx 10.00 5.00
2019-06-17 xyx NULL NULL
2019-06-18 xyx 0.00 NULL
2019-06-19 xyx 15.00 7.50
2019-06-20 xyx 20.00 10.00
Update
A little uglier, but perhaps something like this
Select *
      ,AvgPrice = case when sum(1) over (partition by StockCode, sign(NullIf([ClosePrice],0))
                                         order by datekey
                                         rows between 3 preceding and 1 preceding) = 3
                       then avg(ClosePrice) over (partition by StockCode, sign(NullIf([ClosePrice],0))
                                                  order by datekey
                                                  rows between 3 preceding and 1 preceding)
                       else null
                  end
 from @YourTable
Order By datekey
Returns
datekey StockCode ClosePrice AvgPrice
2019-06-15 xyx 5.00 NULL
2019-06-16 xyx 10.00 NULL
2019-06-17 xyx NULL NULL
2019-06-18 xyx 0.00 NULL
2019-06-19 xyx 15.00 NULL
2019-06-20 xyx 20.00 10.00

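For what it's worth, the "only average when all 3 preceding trading days exist" guard can be sketched in portable SQL. Below is a minimal SQLite check (run via Python's sqlite3; window functions need SQLite >= 3.25) that filters the zero/NULL rows in a WHERE clause rather than partitioning them away with sign() — table and column names are just illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE YourTable (datekey TEXT, StockCode TEXT, ClosePrice REAL);
INSERT INTO YourTable VALUES
 ('2019-06-15','xyx',5), ('2019-06-16','xyx',10),
 ('2019-06-17','xyx',NULL), ('2019-06-18','xyx',0),
 ('2019-06-19','xyx',15), ('2019-06-20','xyx',20);
""")

# Only emit an average when the frame holds exactly 3 preceding trading days;
# zero/NULL rows are removed up front (ClosePrice > 0 is NULL-safe: NULL > 0 is not true).
rows = con.execute("""
SELECT datekey, ClosePrice,
       CASE WHEN COUNT(*) OVER w = 3
            THEN AVG(ClosePrice) OVER w
       END AS Avg3Days
FROM YourTable
WHERE ClosePrice > 0
WINDOW w AS (PARTITION BY StockCode ORDER BY datekey
             ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY datekey
""").fetchall()
```

Only 2019-06-20 has three non-zero preceding trading days, so only it gets an average (10.0), matching the guarded T-SQL version above.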
Assuming you have a flag that indicates trading days, you can do something like this:
SELECT StockCode, datekey, ClosePrice,
       (CASE WHEN isTradingDay = 1
             THEN AVG(ClosePrice) OVER (PARTITION BY StockCode, isTradingDay
                                        ORDER BY datekey
                                        ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
        END) as Avg3Days
FROM Fact;
This takes the average of the previous three trading days. The value is NULL on non-trading days.
If the ClosePrice is NULL, it will not be included in the average anyway. If the only indicator is the ClosePrice, then one method is:
SELECT f.StockCode, f.datekey, f.ClosePrice,
       (CASE WHEN v.isTradingDay = 1
             THEN AVG(f.ClosePrice) OVER (PARTITION BY f.StockCode, v.isTradingDay
                                          ORDER BY f.datekey
                                          ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
        END) as Avg3Days
FROM Fact f CROSS APPLY
     (VALUES (CASE WHEN f.ClosePrice > 0 THEN 1 ELSE 0 END)) v(isTradingDay);
Personally, I would prefer to have an explicit trading-day indicator rather than relying on special values of the close price. For instance, trading in a single stock might be suspended for some company-specific reason.
You may want to also have WHERE f.StockCode <> '' to filter out invalid stock codes.
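As a rough check of the flag approach, the CROSS APPLY can be replaced by a derived subquery and run on SQLite (>= 3.25) via Python's sqlite3, using the sample rows from the first answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact (datekey TEXT, StockCode TEXT, ClosePrice REAL);
INSERT INTO fact VALUES
 ('2019-06-15', 'xyx', 5), ('2019-06-16', 'xyx', 10),
 ('2019-06-17', 'xyx', NULL), ('2019-06-18', 'xyx', 0),
 ('2019-06-19', 'xyx', 15), ('2019-06-20', 'xyx', 20);
""")

# Derive the trading-day flag in a subquery (SQLite has no CROSS APPLY),
# then average only within the isTradingDay = 1 partition.
rows = con.execute("""
SELECT datekey, StockCode, ClosePrice,
       CASE WHEN isTradingDay = 1
            THEN AVG(ClosePrice) OVER (PARTITION BY StockCode, isTradingDay
                                       ORDER BY datekey
                                       ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS Avg3Days
FROM (SELECT f.*,
             CASE WHEN ClosePrice > 0 THEN 1 ELSE 0 END AS isTradingDay
      FROM fact f)
ORDER BY datekey
""").fetchall()
```

The zero/NULL rows land in the isTradingDay = 0 partition and get a NULL average; the trading-day rows average only over preceding trading days (5.0, 7.5, 10.0).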

Related

Joining client records based on overlapping date ranges in Oracle SQL

I have a dataset that looks like this:
Client id  stayId  start_date  end_date    type
1          101     1-1-2010    20-7-2010   A
1          105     1-7-2010    30-12-2010  A
2          108     8-10-2012   10-12-2012  B
2          108     8-10-2012   10-12-2012  B
I want to merge rows with overlapping date ranges and take the highest stayId, but only if the client id and type match. How can I do this in Oracle SQL?
The result would look like this:
Client id  stayId  start_date  end_date    type
1          105     1-1-2010    30-12-2010  A
2          108     8-10-2012   10-12-2012  B
2          108     01-01-2013  13-10-2013  B
This is a type of gaps-and-islands problem. It looks tricky, because there can be arbitrary overlaps -- I suspect that the overlap might even be an earlier record, as in:
|------| |-------|
|------------------|
For this version, I recommend a cumulative max to identify the rows with no overlap. These rows start the "islands". Then, a cumulative sum identifies the islands (the sum of rows where there is no overlap). The final step is aggregation:
select clientid, type, max(stayid),
       min(start_date), max(end_date)
from (select t.*,
             sum(case when prev_end_date >= start_date then 0 else 1 end) over
                 (partition by clientid, type order by start_date) as grp
      from (select t.*,
                   max(end_date) over (partition by clientid, type
                                       order by start_date
                                       range between unbounded preceding
                                             and interval '1' day preceding
                                      ) as prev_end_date
            from t
           ) t
     ) t
group by clientid, type, grp;
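As a sanity check, the same island logic runs on SQLite (>= 3.25) via Python's sqlite3. The data below is adapted: dates are ISO strings so they compare correctly, the second stay for client 2 is given a hypothetical stayId of 110, and the cumulative max uses a plain ROWS frame (which excludes only the current row — fine while start dates are distinct) instead of Oracle's interval-based RANGE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (clientid INT, stayid INT, start_date TEXT, end_date TEXT, type TEXT);
INSERT INTO t VALUES
 (1, 101, '2010-01-01', '2010-07-20', 'A'),
 (1, 105, '2010-07-01', '2010-12-30', 'A'),
 (2, 108, '2012-10-08', '2012-12-10', 'B'),
 (2, 110, '2013-01-01', '2013-10-13', 'B');
""")

rows = con.execute("""
SELECT clientid, type, MAX(stayid), MIN(start_date), MAX(end_date)
FROM (SELECT t.*,
             -- a new island starts whenever no earlier stay reaches this start date
             SUM(CASE WHEN prev_end >= start_date THEN 0 ELSE 1 END)
               OVER (PARTITION BY clientid, type ORDER BY start_date) AS grp
      FROM (SELECT t.*,
                   MAX(end_date) OVER (PARTITION BY clientid, type
                                       ORDER BY start_date
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND 1 PRECEDING) AS prev_end
            FROM t) t) t
GROUP BY clientid, type, grp
ORDER BY clientid, MIN(start_date)
""").fetchall()
```

Client 1's two overlapping stays merge into one row carrying the higher stayId; client 2's two disjoint stays stay separate.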

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add up the prices of the previous rows for each row; row 1 is 0 because it has no previous row. Then find the ratio, e.g. 10/30 = 0.33.
You can use the analytic functions ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
DB<>Fiddle demo
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Demo
Please also check another method:
with cte as (
    select *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
           SUM(price) OVER (PARTITION BY id ORDER BY date) AS ss
    from yourTable
)
select id, status_count, isnull(ss, 0) - price AS price
from cte
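As a quick check of the frame-based variant, here it is against SQLite via Python's sqlite3 (the date column is renamed dt to avoid the keyword; the ratio column is omitted since its definition in the question is ambiguous):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE yourTable (id INT, status TEXT, price INT, dt TEXT);
INSERT INTO yourTable VALUES
 (2, 'complete', 10, '2020-01-01 10:10:10'),
 (2, 'complete', 20, '2020-02-02 10:10:10'),
 (2, 'complete', 10, '2020-03-03 10:10:10');
""")

# status_count: how many earlier rows this id has;
# prev_price: sum of prices over those earlier rows only (the 1 PRECEDING frame).
rows = con.execute("""
SELECT id,
       COUNT(*) OVER (PARTITION BY id ORDER BY dt) - 1 AS status_count,
       COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY dt
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND 1 PRECEDING), 0) AS prev_price
FROM yourTable
ORDER BY dt
""").fetchall()
```

This reproduces the first two required columns: (0, 0), (1, 10), (2, 30).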

Add a 0 in the next row of a column after the last data point

I have written a query which gives the output as shown below:
Date Amount
01-01-2020
01-02-2020 10000
01-03-2020 20000
01-04-2020 30000
01-05-2020 40000
01-06-2020
01-07-2020
01-08-2020
In the above table, the amount is NULL for 01-01-2020, 01-06-2020, 01-07-2020 and 01-08-2020. Now I want to add a 0 to the amount column for just one row, i.e. for the date 01-06-2020, which comes right after the last data point (40000). I'm not sure how to do it. Is there a straightforward query to achieve this? Thank you.
You can use lag() and a case expression:
select date,
case when amount is null and lag(amount) over(order by date) is not null
then 0
else amount
end as amount
from mytable
If you wanted an update statement:
with cte as (
select amount,
case when amount is null and lag(amount) over(order by date) is not null
then 0
end as new_amount
from mytable
)
update cte set amount = new_amount where new_amount = 0
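The select version of this lag() trick can be verified on SQLite via Python's sqlite3 (dates switched to ISO format so they order correctly; the key point is that the lag test looks at the stored value, so only the first NULL after the last data point becomes 0 and the later NULLs stay NULL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mytable (dt TEXT, amount INT);
INSERT INTO mytable VALUES
 ('2020-01-01', NULL), ('2020-02-01', 10000), ('2020-03-01', 20000),
 ('2020-04-01', 30000), ('2020-05-01', 40000),
 ('2020-06-01', NULL), ('2020-07-01', NULL), ('2020-08-01', NULL);
""")

# A row gets 0 only if it is NULL and the immediately previous row is not NULL.
rows = con.execute("""
SELECT dt,
       CASE WHEN amount IS NULL
             AND LAG(amount) OVER (ORDER BY dt) IS NOT NULL
            THEN 0
            ELSE amount
       END AS amount
FROM mytable
ORDER BY dt
""").fetchall()
```

2020-06-01 becomes 0; 2020-07-01 and 2020-08-01 remain NULL because their previous row is also NULL.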

Count of difference in values by date

I have a data set that contain two columns [date, cust_id].
date cust_id
2019-12-08 123
2019-12-08 321
2019-12-09 123
2019-12-09 456
There is a high churn rate for my customers and I am trying to create two additional columns [new_cust, left_cust] by counting the numbers of cust_id that are new and have left by day respectively.
When I have two tables broken out by day, I have no issues querying:
count of new customers
SELECT DISTINCT cust_id
FROM 2019-12-09
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM 2019-12-08)
count of customers who churned
SELECT DISTINCT cust_id
FROM 2019-12-08
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM 2019-12-09)
I'm not sure how I would query a single table and compare these values by date. What would be the best approach to getting the correct results? I am using AWS Athena.
Expected results:
date new_cust cust_left
2019-12-08 2 0
2019-12-09 1 1
Explanation: Assuming 2019-12-08 is the very first date, I have 2 new customers and 0 customers who have churned. 2019-12-09, I have gained 1 new customer "456", but have 1 customer "321" who has churned. I would have to apply this to a longer range of dates and cust_id.
Hmmm. I think you want:
select date,
       sum(case when prev_date is null then 1 else 0 end) as new_cust,
       sum(case when next_date = date + interval '1' day then 0 else 1 end) as left_cust
from (select t.*,
             lag(date) over (partition by cust_id order by date) as prev_date,
             lead(date) over (partition by cust_id order by date) as next_date
      from t
     ) t
group by date;
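The lead/lag idea can be spot-checked on SQLite via Python's sqlite3 (the visits table and dt column names are illustrative; Athena's date arithmetic becomes SQLite's date(dt, '+1 day')). Note the convention this query actually implements: a customer counts as churned on the last day they appear, so on the final date in the data everyone is counted as left — that differs from the question's expected output, which attributes the churn of '321' to the following day:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE visits (dt TEXT, cust_id INT);
INSERT INTO visits VALUES
 ('2019-12-08', 123), ('2019-12-08', 321),
 ('2019-12-09', 123), ('2019-12-09', 456);
""")

# new_cust: first appearance (no earlier visit); left_cust: no visit on the next day.
rows = con.execute("""
SELECT dt,
       SUM(CASE WHEN prev_dt IS NULL THEN 1 ELSE 0 END) AS new_cust,
       SUM(CASE WHEN next_dt = date(dt, '+1 day') THEN 0 ELSE 1 END) AS left_cust
FROM (SELECT dt, cust_id,
             LAG(dt)  OVER (PARTITION BY cust_id ORDER BY dt) AS prev_dt,
             LEAD(dt) OVER (PARTITION BY cust_id ORDER BY dt) AS next_dt
      FROM visits) t
GROUP BY dt
ORDER BY dt
""").fetchall()
```

Here 2019-12-08 gets new_cust = 2 and left_cust = 1 (customer 321 last seen that day), and 2019-12-09 gets new_cust = 1 and left_cust = 2 (both remaining customers are "last seen" on the final date).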

Redshift SQL Window Function frame_clause with days

I am trying to perform a window function on a data set in Redshift, using days as the interval for the preceding rows.
Example data:
date ID score
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 3
3/5/2017 555 2
SQL window function for the average of the last 3 scores:
select date,
       id,
       avg(score) over (partition by id
                        order by date
                        rows between 3 preceding and current row) as LAST_3_SCORES_AVG
from DATASET
Result:
date ID LAST_3_SCORES_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2
The problem is that I would like the average score from the last 3 DAYS (a moving average) and not the last three tests. I have gone over the Redshift and Postgres documentation and can't seem to find any way of doing it.
Desired Result:
date ID 3_DAY_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2.5
Any direction would be appreciated.
You can use lag() and explicitly calculate the average.
select t.*,
       (score +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 1) over (partition by id order by date)
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 2) over (partition by id order by date)
              else 0
         end)
       ) /
       (1 +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end)
       ) as last_3_days_avg
from dataset t;
The following approach could be used instead of the RANGE window option in a lot of (or all) cases.
You can introduce "expiry" for each of the input records. The expiry record would negate the original one, so when you aggregate all preceding records, only the ones in the desired range will be considered.
AVG is a bit harder as it doesn't have a direct opposite, so we need to think of it as SUM/COUNT and negate both.
SELECT id, date, running_avg_score
FROM (SELECT id, date, n,
             SUM(score) OVER (PARTITION BY id ORDER BY date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
             / NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0)
               AS running_avg_score
      FROM (SELECT date, id, score, 1 AS n
            FROM DATASET
            UNION ALL
            -- expiry and negate
            SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
            FROM DATASET
           ) x
     ) a
WHERE a.n = 1
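A hedged SQLite (>= 3.25) sketch of this expiry trick, run via Python's sqlite3: DATEADD is swapped for SQLite's date(), and the window orders by dt, n so that an expiry landing on a given date is applied before that day's own score row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataset (dt TEXT, id INT, score REAL);
INSERT INTO dataset VALUES
 ('2017-03-01', 123, 1), ('2017-03-01', 555, 1),
 ('2017-03-02', 123, 1), ('2017-03-03', 555, 3),
 ('2017-03-05', 555, 2);
""")

# Each score row is paired with an "expiry" row 3 days later that negates it,
# so the running SUM/COUNT at any score row covers only the last 3 calendar days.
rows = con.execute("""
SELECT id, dt, running_avg
FROM (SELECT id, dt, n,
             SUM(score) OVER w * 1.0 / NULLIF(SUM(n) OVER w, 0) AS running_avg
      FROM (SELECT dt, id, score, 1 AS n FROM dataset
            UNION ALL
            SELECT date(dt, '+3 day'), id, -score, -1 FROM dataset
           ) x
      WINDOW w AS (PARTITION BY id ORDER BY dt, n
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
     )
WHERE n = 1
ORDER BY id, dt
""").fetchall()
```

This reproduces the desired 3-day moving averages: id 555 gets 2.0 on 3/3 (scores 1 and 3) and 2.5 on 3/5 (the 3/1 score has expired, leaving 3 and 2).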