How to use a window function in Snowflake to look back 12 months

I am having trouble using window functions in SNOWFLAKE to look at historical data (from 12 months prior). When I add a dimension, this code doesn't work.
SELECT
DATE_TRUNC('MONTH',pl.DATE) AS MONTH,
COUNT(DISTINCT PL.ID) AS CURRENT,
PL.DIMENSION,
FIRST_VALUE(count(DISTINCT pl.ID)) OVER (PARTITION BY PL.DIMENSION ORDER BY MONTH ASC ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS 1_YEAR_AGO
from table1 pl
group by MONTH, PL.DIMENSION
ORDER BY MONTH
Here are the results if I filter on the dimension:
I want more rows. For example, for month = 2019-10-01, CURRENT should be NULL and 1_YEAR_AGO should be 1, and so on. What am I missing? (I put examples of this in the highlighted section of the picture; the actual results are unhighlighted.)
NOTE: I've also tried LAG and it does the same thing here.
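One possible reading of the problem, sketched rather than confirmed: window functions never create rows, so a month with no data for a dimension simply does not exist after the GROUP BY, and ROWS BETWEEN 12 PRECEDING (or LAG(..., 12)) counts 12 existing rows rather than 12 calendar months. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Snowflake (date functions differ between the two). The sample rows, the month range, and the names months, dims, monthly, current_cnt and one_year_ago are all made up for illustration. It builds a calendar of months, left-joins the monthly counts onto it, and only then applies LAG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date TEXT, dimension TEXT);
    -- Hypothetical sample: no rows at all between 2018-12 and 2019-10.
    INSERT INTO table1 VALUES
        (1, '2018-10-05', 'a'),
        (2, '2018-11-10', 'a'),
        (3, '2018-11-20', 'a'),
        (4, '2019-11-02', 'a');
""")

query = """
WITH RECURSIVE months(month) AS (      -- one row per calendar month
    SELECT '2018-10-01'
    UNION ALL
    SELECT date(month, '+1 month') FROM months WHERE month < '2019-11-01'
),
dims AS (SELECT DISTINCT dimension FROM table1),
monthly AS (                           -- the original aggregation
    SELECT date(pl.date, 'start of month') AS month,
           pl.dimension,
           COUNT(DISTINCT pl.id) AS cnt
    FROM table1 pl
    GROUP BY date(pl.date, 'start of month'), pl.dimension
)
SELECT m.month, d.dimension, mo.cnt AS current_cnt,
       -- every month now exists, so 12 rows back is 12 months back
       LAG(mo.cnt, 12) OVER (PARTITION BY d.dimension ORDER BY m.month) AS one_year_ago
FROM months m
CROSS JOIN dims d
LEFT JOIN monthly mo ON mo.month = m.month AND mo.dimension = d.dimension
ORDER BY m.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample, the row for 2019-10-01 comes back with a NULL current count and a year-ago count of 1, which matches the shape of output described in the question.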

Related

Is there a way to calculate percentile using percentile_cont() function over a rolling window in Big Query?

I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Can anyone please help me if there is a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
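To make the array-then-percentile idea concrete, here is a minimal plain-Python sketch of the same logic, not BigQuery code. It assumes month is an incrementing integer, as the answer does. The sample rows and the function name rolling_median are invented for illustration. For each row it collects the earnings of the same city in the current and previous month (the equivalent of the array_agg window) and takes the median of that list, which is what percentile_cont at 0.5 computes.

```python
from statistics import median

# Hypothetical sample rows: (city, month, earnings)
rows = [
    ("NYC", 1, 10.0), ("NYC", 1, 30.0), ("NYC", 2, 50.0), ("NYC", 3, 70.0),
    ("LA", 1, 5.0), ("LA", 2, 15.0),
]

def rolling_median(rows, lookback=1):
    """Median of earnings per (city, month) over months [month - lookback, month]."""
    result = {}
    for city, month, _ in rows:
        # Same role as array_agg(...) over (partition by city order by month
        # range between 1 preceding and current row)
        window = [e for c, m, e in rows
                  if c == city and month - lookback <= m <= month]
        result[(city, month)] = median(window)
    return result

medians = rolling_median(rows)
print(medians)
```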

SQL Server need the total of the previous 6 rows

I'm using SQL Server and I need to get the sum of the previous 6 rows of my table and place the results in its own column.
I'm able to get the 6th row back with the below query:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
which gives me the sixth row back, but what I need is the sum of all six previous rows.
Any help would be appreciated.
Mike
You can use:
SELECT ja.id, ja.FileSize, ja.CompleteTime,
       SUM(ja.FileSize) OVER (ORDER BY ja.CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this sums 6 rows including the current row. If you want the six preceding rows:
SELECT ja.id, ja.FileSize, ja.CompleteTime,
       SUM(ja.FileSize) OVER (ORDER BY ja.CompleteTime, ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja;
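The difference between the two frames is easiest to see side by side. The following is a small sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server (the frame syntax is the same in both). The eight sample rows and the column aliases incl_current and prev_six are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs_analytics (id INTEGER, FileSize INTEGER, CompleteTime TEXT)"
)
# Hypothetical data: ids 1..8 with FileSize 10, 20, ..., 80, one job per day.
conn.executemany(
    "INSERT INTO Jobs_analytics VALUES (?, ?, ?)",
    [(i, 10 * i, f"2024-01-{i:02d}") for i in range(1, 9)],
)

rows = conn.execute("""
    SELECT id, FileSize,
           -- current row plus the five before it
           SUM(FileSize) OVER (ORDER BY CompleteTime, id
                               ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS incl_current,
           -- the six rows strictly before the current one
           SUM(FileSize) OVER (ORDER BY CompleteTime, id
                               ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) AS prev_six
    FROM Jobs_analytics
    ORDER BY id
""").fetchall()
for row in rows:
    print(row)
```

The first row has no preceding rows, so prev_six is NULL there; wrap the SUM in COALESCE if a zero is preferred.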

Why Window Functions Require My Aggregated Column in Group

I have been working with window functions a fair amount but I don't think I understand enough about how they work to answer why they behave the way they do.
For the query that I was working on (below), why am I required to take my aggregated field and add it to the group by? (In the second half of my query below I am unable to produce a result if I don't include "Events" in my second group by)
With Data as (
Select
CohortDate as month
,datediff(week,CohortDate,EventDate) as EventAge
,count(distinct case when EventDate is not null then GUID end) as Events
From MyTable
where month >= [getdate():month] - interval '12 months'
group by 1, 2
order by 1, 2
)
Select
month
,EventAge
,sum(Events) over (partition by month order by EventAge asc rows between unbounded preceding and current row) as TotEvents
from data
group by 1, 2, Events
order by 1, 2
I have run into this enough that I have just taken it for granted, but would really love some more color as to why this is needed. Is there a way I should be formatting these differently in order to avoid this (somewhat non-intuitive) requirement?
Thanks a ton!
What you are looking for is presumably a cumulative sum. That would be:
select month, EventAge,
       sum(sum(Events)) over (partition by month
                              order by EventAge asc
                              rows between unbounded preceding and current row
                             ) as TotEvents
from data
group by 1, 2
order by 1, 2 ;
Why? That might be a little hard to explain. Perhaps if you see the equivalent version with a subquery it will be clearer:
select me.*,
       sum(sum_events) over (partition by month
                             order by EventAge asc
                             rows between unbounded preceding and current row
                            ) as TotEvents
from (select month, EventAge, sum(events) as sum_events
      from data
      group by 1, 2
     ) me
order by 1, 2 ;
The nested form is pretty much exact shorthand for this subquery version. The window function is evaluated after aggregation, so you want to sum the SUM of the events after the aggregation. Hence, you need sum(sum(events)). After the aggregation, the raw events column is no longer available.
The nesting of aggregation functions is awkward at first -- at least it was for me. When I first started using window functions, I think I first spent a few days writing aggregation queries using subqueries and then rewriting without the subqueries. Quickly, I got used to writing them without subqueries.
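As a quick check that the two forms agree, here is a sketch that runs both against the same table in SQLite through Python's sqlite3 module. SQLite is only a stand-in for the asker's database, and the sample rows are invented (including a duplicate month/EventAge pair so the inner SUM has something to aggregate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (month TEXT, EventAge INTEGER, Events INTEGER);
    -- Hypothetical rows; ('2024-01', 0) appears twice on purpose.
    INSERT INTO data VALUES
        ('2024-01', 0, 2), ('2024-01', 0, 3), ('2024-01', 1, 3), ('2024-01', 2, 2),
        ('2024-02', 0, 4), ('2024-02', 1, 1);
""")

# Nested form: the inner SUM is the GROUP BY aggregate, the outer SUM is the window.
nested = conn.execute("""
    SELECT month, EventAge,
           SUM(SUM(Events)) OVER (PARTITION BY month ORDER BY EventAge
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotEvents
    FROM data
    GROUP BY month, EventAge
    ORDER BY month, EventAge
""").fetchall()

# Subquery form: aggregate first, then run the window over the aggregated rows.
subquery = conn.execute("""
    SELECT month, EventAge,
           SUM(sum_events) OVER (PARTITION BY month ORDER BY EventAge
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotEvents
    FROM (SELECT month, EventAge, SUM(Events) AS sum_events
          FROM data
          GROUP BY month, EventAge) me
    ORDER BY month, EventAge
""").fetchall()

print(nested)
print(nested == subquery)
```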

Display a rolling 12 weeks chart in SSRS report

I am calling the data query in ssrs like this:
SELECT * FROM [DATABASE].[dbo].[mytable]
The current week is the last week returned by the query (e.g. 3/31 - 4/4), and each number represents one week further back, until we reach 12 weeks prior to this week. I want to display these in a point chart.
How can I accomplish grouping all the visits for all locations by weeks and adding it to the chart?
I suggest updating your SQL query to Group by a descending Dense_Rank of DatePart(Week,ARRIVED_DATE). In this example, I have one column for Visits because I couldn't tell which columns you were using to get your Visit count:
-- load some test data
if object_id('tempdb..#MyTable') is not null
drop table #MyTable
create table #MyTable(ARRIVED_DATE datetime,Visits int)
while (select count(*) from #MyTable) < 1000
begin
insert into #MyTable values
(dateadd(day,round(rand()*100,0),'2014-01-01'),round(rand()*1000,0))
end
-- Sum Visits by WeekNumber relative to today's WeekNumber
select
dense_rank() over(order by datepart(week,ARRIVED_DATE) desc) [Week],
sum(Visits) Visits
from #MyTable
where datepart(week,ARRIVED_DATE) >= datepart(week,getdate()) - 11
group by datepart(week,ARRIVED_DATE)
order by datepart(week,ARRIVED_DATE)
Let me know if I can provide any more detail to help you out.
You are going to want to do the grouping of the visits within SQL. You should be able to add a calculated column to your table, something like WorkWeek, calculated from the difference in days from a fixed reference day such as a Sunday. This column will then be your X value rather than the date field you were using.
Here is a good article that goes into first day of week: First Day of Week
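A rough sketch of how the two suggestions could combine, using SQLite through Python's sqlite3 module instead of SQL Server (so julianday replaces DATEDIFF, and there is no getdate() filter). The reference Sunday 2014-01-05, the sample visits, and the names wk and weeks_back are assumptions made for illustration. Each date is bucketed by whole weeks since the reference Sunday, visits are summed per bucket, and DENSE_RANK numbers the buckets from the most recent one backwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ARRIVED_DATE TEXT, Visits INTEGER);
    -- Hypothetical visits spread over four consecutive weeks.
    INSERT INTO MyTable VALUES
        ('2014-01-06', 10), ('2014-01-08', 5),
        ('2014-01-13', 7),
        ('2014-01-25', 3),
        ('2014-01-26', 4);
""")

weekly = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY wk DESC) AS weeks_back,
           SUM(Visits) AS Visits
    FROM (
        -- whole weeks elapsed since the reference Sunday 2014-01-05
        SELECT CAST((julianday(ARRIVED_DATE) - julianday('2014-01-05')) / 7 AS INTEGER) AS wk,
               Visits
        FROM MyTable
    )
    GROUP BY wk
    ORDER BY wk
""").fetchall()
print(weekly)
```

Note that rank 1 here is the most recent week present in the data, which is only the current week if the current week has visits. Weeks with no visits get no row at all, so a chart that must show empty weeks would need a calendar table joined in.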

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price every day. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
data dprev
where cast(d.capture_timestamp as date) = cast(current_timestamp as date) and
      cast(dprev.capture_timestamp as date) = cast(current_timestamp as date) - 7 and
      d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to build on the following query for whatever calculations you want to do. It assumes one record per day per ticker. Note that 7 PRECEDING is literal: the frame counts rows, not calendar days, so together with the current row it spans 8 records.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data
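One way the windowed min and max might then be used to flag a 5% move, sketched with SQLite through Python's sqlite3 module. The tickers, prices, the fixed "latest" date 2024-01-08, and the aliases min_7 and max_7 are all invented for illustration. This flags a price that sits at least 5% above the window's low or at least 5% below its high, which is a slightly different test from comparing against the price exactly 7 days earlier as the self-join does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ticker TEXT, capture_ts TEXT, price REAL)")

# Hypothetical prices, one record per ticker per day for 8 days.
prices = {
    "ABC": [100, 101, 102, 101, 103, 104, 105, 106],  # ends 6% above its low
    "XYZ": [50, 50, 51, 50, 49, 50, 51, 50],           # stays within 5%
}
for ticker, series in prices.items():
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [(ticker, f"2024-01-{day:02d}", p) for day, p in enumerate(series, start=1)],
    )

flagged = conn.execute("""
    WITH windowed AS (
        SELECT ticker, capture_ts, price,
               MIN(price) OVER w AS min_7,
               MAX(price) OVER w AS max_7
        FROM data
        WINDOW w AS (PARTITION BY ticker ORDER BY capture_ts
                     ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
    )
    SELECT ticker, price, min_7, max_7
    FROM windowed
    WHERE capture_ts = '2024-01-08'            -- stand-in for "today"
      AND (price >= min_7 * 1.05 OR price <= max_7 * 0.95)
""").fetchall()
print(flagged)
```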