This is my table:
userID
Year
Month
Day
NbOfVisits
I would like to calculate a 7-day moving average; my query is as follows:
select userID,year,month,day, sum(nbofvisits) OVER (Partition by userID order by year,month,day RANGE BETWEEN 7 PRECEDING AND CURRENT ROW) as nbVisits7days
from table
order by userID, year, month, day;
But I keep getting this error: "A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions". I understand that I have multiple ORDER BY expressions, but I can't think of a more straightforward way to write this.
Following Jon Armstrong's comment, I managed to run my intended query as follows:
select userID,year,month,day, sum(nbofvisits) OVER (Partition by userID order by TIMESTAMP(concat(year,'-',month,'-',day)) RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) as nbVisits7days
from table
order by userID, year, month, day;
Thank you!
I want to calculate a rolling average over a rolling period of 252 days, but only if 252 days of data are available in the table; otherwise the row should get a null value.
Currently I am using this query:
SELECT datestamp, symbol, avg(close) OVER (PARTITION BY symbol ORDER BY datestamp ROWS BETWEEN 251 PRECEDING AND CURRENT ROW) FROM daily_prices
It gives an average even when 252 days of data are not available.
I want to achieve the same result as the pandas rolling function with its min_periods value.
"i want to calculate rolling average over a rolling period of 252 days"
The clause ROWS BETWEEN 251 PRECEDING AND CURRENT ROW doesn't refer to a period of time but to the number of rows in the window that precede the current row according to the ORDER BY datestamp clause.
I would suggest a slightly different window function in order to implement the period of time:
SELECT datestamp, symbol, avg(close) OVER (PARTITION BY symbol ORDER BY datestamp RANGE BETWEEN '251 days' PRECEDING AND CURRENT ROW) FROM daily_prices
That said, I don't understand in which case you want a null value. The window of a given row always contains at least the current row, so avg() can't be null.
Just do a count over the same window and use it to modify your results.
I used a named window to avoid specifying the same window repeatedly.
-- sample data: one symbol with a constant daily close for the past year
with daily_prices as (
    select 1 as symbol, 5 as close, t as datestamp
    from generate_series(now() - interval '1 year', now(), interval '1 day') f(t)
)
SELECT
    datestamp,
    symbol,
    case when count(close) OVER w = 252 then
        avg(close) OVER w
    end as avg_252
FROM daily_prices
window w as (PARTITION BY symbol ORDER BY datestamp ROWS BETWEEN 251 PRECEDING AND CURRENT ROW);
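The two ideas can also be combined: a calendar-based frame (as in the first answer) plus the count check, so the average is returned only when every one of the 252 calendar days in the window actually has a row. A sketch, assuming PostgreSQL 11+ (required for RANGE frames with an interval offset):
SELECT
    datestamp,
    symbol,
    -- average only when all 252 calendar days in the window have a row, else null
    case when count(close) OVER w = 252 then
        avg(close) OVER w
    end as avg_252
FROM daily_prices
window w as (PARTITION BY symbol
             ORDER BY datestamp
             RANGE BETWEEN INTERVAL '251 days' PRECEDING AND CURRENT ROW);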
I am having trouble using window functions in SNOWFLAKE to look at historical data (from 12 months prior). When I add a dimension, this code doesn't work.
SELECT
DATE_TRUNC('MONTH',pl.DATE) AS MONTH,
COUNT(DISTINCT PL.ID) AS CURRENT,
PL.DIMENSION,
FIRST_VALUE(count(DISTINCT pl.ID)) OVER (PARTITION BY PL.DIMENSION ORDER BY MONTH ASC ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS 1_YEAR_AGO
from table1 pl
group by MONTH, PL.DIMENSION
ORDER BY MONTH
Here are the results if I filter on the dimension:
I am wanting more rows. For example, for month = 2019-10-01, CURRENT would be NULL and 1_YR_AGO should be 1, and so on. What am I missing? (I put examples of this in the highlighted section of the picture; the actual results are unhighlighted.)
NOTE: I've also tried a lag and it does the same thing here.
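A likely cause: window functions only see the rows that the GROUP BY produces, so months with no events for a dimension (including months after the last event) simply have no row to carry the year-ago value, and a 12-row offset can also mis-count when months are missing. One common workaround, sketched below using the question's table1/DATE/ID/DIMENSION names, is to generate a month spine, cross join it with the distinct dimensions, and LAG over that; the spine's start date, its row count, and the current_cnt/one_year_ago aliases are assumptions to adjust:
with months as (
    -- hypothetical month spine; adjust the start date and ROWCOUNT to cover your data
    select dateadd('MONTH', row_number() over (order by seq4()) - 1, '2017-01-01'::date) as month
    from table(generator(rowcount => 48))
),
dims as (
    select distinct DIMENSION from table1
),
agg as (
    select DATE_TRUNC('MONTH', pl.DATE) as month,
           pl.DIMENSION,
           count(distinct pl.ID) as current_cnt
    from table1 pl
    group by 1, 2
)
select m.month,
       d.DIMENSION,
       a.current_cnt,
       lag(a.current_cnt, 12) over (partition by d.DIMENSION order by m.month) as one_year_ago
from months m
cross join dims d
left join agg a
       on a.month = m.month
      and a.DIMENSION = d.DIMENSION
order by m.month, d.DIMENSION;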
I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row), but BigQuery doesn't support window framing in percentile_cont. Can anyone please help me with a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
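If month is a DATE rather than an integer, one way this could look (a sketch, since BigQuery RANGE frames need a numeric ORDER BY expression) is to order the frame by a month number; the anchor date in DATE_DIFF below is arbitrary:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (
                 partition by city
                 -- months elapsed since an arbitrary anchor date
                 order by date_diff(month, date '2000-01-01', month)
                 range between 1 preceding and current row
             ) as ar_earnings
      from t
     ) t;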
I am trying to form a query for returning, reactivated and WAU as defined below:
Returning WAU - active last week
WAU - not active last week, but active within last 30 days
Reactivated WAU - not seen in 30+ days
I have a table for the past 60 days containing cust_id and logindate, but I can't get the LAG function to work (Teradata ODBC connection). I keep getting this error:
[3706] Syntax error: Data Type "logindate" does not match a Defined Type name.
My format is: select .... lag(logindate, 1) over (partition by cust_id order by 1 asc) as lag_ind from ( ....
Please help for the 3 cases above.
You can aggregate to get the expected answer:
select cust_id,
case
when max(logindate) > current_date - 7 -- active last week
then 'Returning WAU'
when max(logindate) > current_date - 30 -- not active last week, but active within last 30 days
then 'WAU'
else 'Reactivated WAU' -- not seen in 30+ days
end
from tab
group by 1
Regarding the issue with LAG: it was only introduced in Teradata 16.10, so on earlier releases you have to rewrite

lag(logindate, 1)
over (partition by cust_id
      order by col asc) as lag_ind

as

max(logindate)
over (partition by cust_id
      order by col asc
      rows between 1 preceding and 1 preceding) as lag_ind
Hint: never use ORDER BY 1 in an OLAP function; there it means the literal value one, not the first column.
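Applied to the question's table, the pre-16.10 rewrite would look roughly like this (a sketch; tab is the login table from the aggregate above and prev_logindate is my alias):
select cust_id,
       logindate,
       -- previous login per customer, equivalent to lag(logindate, 1) on Teradata 16.10+
       max(logindate) over (partition by cust_id
                            order by logindate
                            rows between 1 preceding and 1 preceding) as prev_logindate
from tab;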
I have been working with window functions a fair amount but I don't think I understand enough about how they work to answer why they behave the way they do.
For the query that I was working on (below), why am I required to take my aggregated field and add it to the group by? (In the second half of my query below I am unable to produce a result if I don't include "Events" in my second group by)
With Data as (
Select
CohortDate as month
,datediff(week,CohortDate,EventDate) as EventAge
,count(distinct case when EventDate is not null then GUID end) as Events
From MyTable
where month >= [getdate():month] - interval '12 months'
group by 1, 2
order by 1, 2
)
Select
month
,EventAge
,sum(Events) over (partition by month order by EventAge asc rows between unbounded preceding and current row) as TotEvents
from data
group by 1, 2, Events
order by 1, 2
I have run into this enough that I have just taken it for granted, but would really love some more color as to why this is needed. Is there a way I should be formatting these differently in order to avoid this (somewhat non-intuitive) requirement?
Thanks a ton!
What you are looking for is presumably a cumulative sum. That would be:
select month, EventAge,
sum(sum(Events)) over (partition by month
order by EventAge asc
rows between unbounded preceding and current row
) as TotEvents
from data
group by 1, 2
order by 1, 2 ;
Why? That might be a little hard to explain. Perhaps if you see the equivalent version with a subquery it will be clearer:
select me.*,
sum(sum_events) over (partition by month
order by EventAge asc
rows between unbounded preceding and current row
) as TotEvents
from (select month, EventAge, sum(events) as sum_events
from data
group by 1, 2
) me
order by 1, 2 ;
This is pretty much exact shorthand for that query. The window function is evaluated after aggregation: you want to sum the SUM of the events after the aggregation, hence sum(sum(events)). After the aggregation, events by itself is no longer available.
The nesting of aggregation functions is awkward at first -- at least it was for me. When I first started using window functions, I think I first spent a few days writing aggregation queries using subqueries and then rewriting without the subqueries. Quickly, I got used to writing them without subqueries.
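A tiny self-contained illustration of that evaluation order (made-up values; the sample_data name is hypothetical):
with sample_data as (
    select 1 as month, 1 as EventAge, 10 as Events union all
    select 1, 1, 5 union all
    select 1, 2, 7
)
select month, EventAge,
       -- the inner sum() runs during aggregation (15 and 7); the outer sum()
       -- then accumulates those grouped totals (15, then 22)
       sum(sum(Events)) over (partition by month
                              order by EventAge
                              rows between unbounded preceding and current row) as TotEvents
from sample_data
group by month, EventAge;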