Moving trailing 13-week average in Postgres

I am trying to build a view that generates a moving 13-week average over the past year.
My source data includes a date, a customer ID, and an integer value. For each week in the previous 52 weeks, I want to average the 13 most recent values (including the current one). When I'm finished, I'd like to have a table with a date, each customer ID, and the trailing 13-week average for that customer.

After upgrading Postgres to 9.1, the window functions worked great for this:
SELECT vs.weekending,
cs.slinkcust AS customer,
cs.slinkid AS id,
round(avg(vs.maxattached) OVER (PARTITION BY cs.slinkid ORDER BY vs.weekending DESC ROWS BETWEEN 0 PRECEDING AND 12 FOLLOWING), 2) AS rolling_conc_avg,
round(avg(vs.totsessions) OVER (PARTITION BY cs.slinkid ORDER BY vs.weekending DESC ROWS BETWEEN 0 PRECEDING AND 12 FOLLOWING), 2) AS rolling_sess_avg,
dense_rank() OVER (ORDER BY vs.weekending) AS week_number
FROM cfg_slink cs
JOIN view_statslink vs ON cs.slinkid = vs.id
WHERE vs.weekending >= (now() - '364 days'::interval) AND cs.disabled = 0
GROUP BY vs.weekending, cs.slinkid, vs.maxattached, vs.totsessions
ORDER BY vs.weekending DESC, cs.slinkcust;
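To see what the frame in the query above does, here is a minimal sketch. It assumes SQLite (3.25 or later, through Python's sqlite3 module) as a stand-in for Postgres, a hypothetical table named weekly, and a 3-row window instead of 13 to keep the data small. It compares the descending-order frame used above with the equivalent ascending-order frame.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (id INTEGER, weekending TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?)",
    [(1, "2024-01-07", 10), (1, "2024-01-14", 20),
     (1, "2024-01-21", 30), (1, "2024-01-28", 40)],
)

# A 3-row trailing window keeps the example small; the question uses 13 rows.
query = """
SELECT weekending,
       avg(val) OVER (PARTITION BY id ORDER BY weekending DESC
                      ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS avg_desc,
       avg(val) OVER (PARTITION BY id ORDER BY weekending
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_asc
FROM weekly
ORDER BY weekending
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both columns agree on every row. The earliest rows average fewer values because the frame runs out of data. The same thing should happen in the query above, since the WHERE filter is applied before the window functions and the first 12 weeks of the 364-day range have fewer than 13 rows available.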

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
CASE
WHEN SUM(fr.imps) > 100 THEN 1
WHEN SUM(fr.imps) < 100 THEN 0
END AS active_or_passive
FROM table1 AS fr
WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id
ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (SELECT id, month,
SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
FROM active_pass AS a
GROUP BY month, id
ORDER BY MONTH desc)
What it does is create a trailing total over the previous 3 months to see in how many of those months the customer was active. However, I have no idea how to join with the next table to get a sum of the revenue that client generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns way higher values for revenue than the actual ones, and I have no idea why. Furthermore, could you please tell me how to join those two tables together so that I only get the customer's revenue for the months where their activity was equal to 3?
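One likely cause of the inflated revenue is that the join condition uses only id, while active_pass_assigned has one row per id and month, so each revenue row is matched once per month of activity. The sketch below reproduces that effect. It assumes SQLite through Python's sqlite3 module as a stand-in for BigQuery, and the tables activity and revenue (already reduced to one row per month) are hypothetical.

```python
import sqlite3

# Hypothetical monthly tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (id INTEGER, month TEXT, trailing_act INTEGER);
CREATE TABLE revenue  (id INTEGER, month TEXT, revenue INTEGER);
INSERT INTO activity VALUES (1, '2024-01', 2), (1, '2024-02', 3), (1, '2024-03', 3);
INSERT INTO revenue  VALUES (1, '2024-01', 100), (1, '2024-02', 200), (1, '2024-03', 300);
""")

# Joining on id alone pairs every revenue row with all three activity rows.
id_only = conn.execute("""
    SELECT r.month, SUM(r.revenue)
    FROM revenue AS r
    JOIN activity AS a ON r.id = a.id
    GROUP BY r.month
    ORDER BY r.month
""").fetchall()

# Joining on id and month gives one match per revenue row;
# the WHERE clause then keeps only months with trailing_act = 3.
id_and_month = conn.execute("""
    SELECT r.month, SUM(r.revenue)
    FROM revenue AS r
    JOIN activity AS a ON r.id = a.id AND r.month = a.month
    WHERE a.trailing_act = 3
    GROUP BY r.month
    ORDER BY r.month
""").fetchall()

print(id_only)
print(id_and_month)
```

In the original query, the equivalent change would probably be to add the month to the join condition (for example, comparing DATE_TRUNC(c.day, MONTH) with a.month) and to filter on trailing_act = 3, but that is untested against the real tables.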

Issues with calculating running total in BigQuery

Not sure what the error here is but the returned result won't give the running total. I keep getting the same numbers returned for both ad_rev and running_total_ad_rev. Maybe someone could point out what the issue is?
Thank you!
SELECT
days,
sum(ad_revenue) as ad_rev,
sum(sum(ad_revenue)) over (partition by days ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_total_ad_rev
FROM(
SELECT
DATE_DIFF(activity_date,creation_date,DAY) AS days,
ad_revenue
FROM
table1 INNER JOIN table2
USING (id)
WHERE
creation_date >= *somedate*
and
activity_date = *somedate*
GROUP BY 1,2
ORDER BY 1)
GROUP BY 1
ORDER BY 1
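You don't need PARTITION BY days if you want a running sum: after grouping, each day is its own partition, so the window sum just equals that day's total. You also need to calculate daily_revenue one step earlier. It looks like this is what you are trying to achieve: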
SELECT
days,
daily_revenue,
SUM(daily_revenue) OVER ( ORDER BY days ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as running_total_ad_rev
FROM(
SELECT
DATE_DIFF(activity_date,creation_date,DAY) AS days,
SUM(ad_revenue) AS daily_revenue
FROM
table1
INNER JOIN table2
USING (id)
WHERE
creation_date >= *somedate*
and
activity_date = *somedate*
GROUP BY 1
ORDER BY 1)
ORDER BY 1
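The difference between the two window definitions can be checked on a tiny data set. This is a minimal sketch that assumes SQLite through Python's sqlite3 module as a stand-in for BigQuery, with a hypothetical pre-aggregated table named daily.

```python
import sqlite3

# Hypothetical pre-aggregated table; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (days INTEGER, daily_revenue INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [(0, 5), (1, 7), (2, 3)])

rows = conn.execute("""
    SELECT days,
           daily_revenue,
           -- one partition per day: the sum never accumulates
           SUM(daily_revenue) OVER (PARTITION BY days) AS per_partition,
           -- a single ordered partition: the sum accumulates row by row
           SUM(daily_revenue) OVER (ORDER BY days
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM daily
    ORDER BY days
""").fetchall()

for row in rows:
    print(row)
```

The per_partition column repeats daily_revenue, which matches the symptom described in the question, while running_total grows with each day.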

Faster alternative of MIN/MAX in SQL Server

I need the lowest/highest price of stocks for the past n days. The following query runs really slowly. I would appreciate a faster alternative:
SELECT
*,
MIN(Close) OVER (PARTITION BY Ticker ORDER BY PriceDate ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING) AS MinPrice14d,
MAX(Close) OVER (PARTITION BY Ticker ORDER BY PriceDate ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING) AS MaxPrice14d
FROM
(SELECT CompanyID, Ticker, PriceDate, Close
FROM price.PriceHistoryDaily) a
I need the columns specified.
It is trailing, so I need it day by day.
As for period, I will limit it to one year.
Although it doesn't affect the performance, no subquery is needed. So start with the simpler version:
SELECT phd.CompanyID, phd.Ticker, phd.PriceDate, phd.Close,
min(Close) over (partition by Ticker
order by PriceDate
rows between 14 preceding and 1 preceding
) as MinPrice14d,
max(Close) over (partition by Ticker
order by PriceDate
rows between 14 preceding and 1 preceding
) as MaxPrice14d
FROM price.PriceHistoryDaily phd;
Then try adding an index: PriceHistoryDaily(Ticker, PriceDate).
Note that this returns all rows from PriceHistoryDaily and, depending on the size of the table, that might be what is driving the performance.
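For reference, here is a minimal sketch of the same pattern on a small data set. It assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server, a hypothetical table named prices, and a 2-row look-back instead of 14. The index mirrors the PARTITION BY and ORDER BY columns; whether the optimizer actually uses it to avoid a sort depends on the engine and the plan.

```python
import sqlite3

# Hypothetical table; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, price_date TEXT, close_price REAL)")
# Index columns follow the window's PARTITION BY and ORDER BY.
conn.execute("CREATE INDEX ix_prices_ticker_date ON prices (ticker, price_date)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("ABC", "2024-01-01", 10.0), ("ABC", "2024-01-02", 12.0),
    ("ABC", "2024-01-03", 9.0), ("ABC", "2024-01-04", 11.0),
])

# The frame ends at 1 PRECEDING, so the current row is excluded.
rows = conn.execute("""
    SELECT price_date, close_price,
           MIN(close_price) OVER (PARTITION BY ticker ORDER BY price_date
               ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS min_prev2,
           MAX(close_price) OVER (PARTITION BY ticker ORDER BY price_date
               ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS max_prev2
    FROM prices
    ORDER BY price_date
""").fetchall()

for row in rows:
    print(row)
```

The first row has an empty frame, so both aggregates come back as NULL (None in Python).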

SQL Server : PRECEDING with another condition

I have a query that finds the sum and average for the last 3 months and the last year. It worked fine until I got a new request to break the results down by AwardCode.
How do I include that?
I mean in this section:
SUM(1.0 * InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
I want to find the last 3 months based on AwardCode.
My original query that is working is
SELECT
Calendar_Date, Mth, NoOfEmp, MaleCount, FemaleCount,
SUM(1.0*InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
SUM(1.0*TotalTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth
FROM #X
The result is
But now I need to add another grouping column, AwardCode:
SELECT
Mth, AwardCode, NoOfEmp, MaleCount, FemaleCount,
SUM(1.0 * InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
SUM(1.0 * TotalTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth
FROM #X
The result will be like this
Notice that the sums of InvolMov3Mth and TermSum12Mth for the whole period do not match the query above.
I think I found the answer to my question.
I used PARTITION BY AwardCode before ORDER BY, and it seems to be working.
SUM(1.0*TotalTerm) OVER (PARTITION BY AwardCode ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth,
Yes. "Partition by" will make it work for your requirement.
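A small check of the partitioned frame, as a sketch only: it assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the table terms and its data are hypothetical.

```python
import sqlite3

# Hypothetical table with two award codes per month; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE terms (calendar_date TEXT, award_code TEXT, invol_term INTEGER)"
)
conn.executemany("INSERT INTO terms VALUES (?, ?, ?)", [
    ("2024-01-01", "A", 1), ("2024-01-01", "B", 10),
    ("2024-02-01", "A", 2), ("2024-02-01", "B", 20),
    ("2024-03-01", "A", 3), ("2024-03-01", "B", 30),
])

# PARTITION BY restarts the 3-row frame for each award code.
rows = conn.execute("""
    SELECT award_code, calendar_date,
           SUM(invol_term) OVER (PARTITION BY award_code ORDER BY calendar_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mov3
    FROM terms
    ORDER BY award_code, calendar_date
""").fetchall()

for row in rows:
    print(row)
```

Without PARTITION BY, rows from different award codes share one ordered sequence, so a 3-row frame mixes codes, and rows with the same Calendar_Date tie in the ORDER BY, which makes the frame contents depend on an arbitrary row order.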

How can I create a week-to-date metric in vertica?

I have a table which stores year-to-date metrics once per client per day. The schema, simplified, looks roughly like this (let's call this table history):
bus_date | client_id | ytd_costs
I'd like to create a view that adds week-to-date costs: essentially, any cost that occurs after the prior Friday would be considered part of the week-to-date. Currently, I have the following, but I'm concerned about the switch/case logic.
Here is an example of the logic I have right now to show that this works.
I also got to use the timeseries clause which I've never used before...
;with history as (
select bus_date,client_id,ts_first_Value(value,'linear') "ytd_costs"
from (select {ts'2016-10-07'} t,1 client_id,5.0 "value" union all select {ts'2016-10-14'},1, 15) k
timeseries bus_Date as '1 day' over (partition by client_id order by t)
)
,history_with_wtd as (select bus_date
,client_id
,ytd_costs
,ytd_costs - decode(
dayofweek(bus_date)
,6,first_value(ytd_costs) over (partition by client_id order by bus_date range '1 week' preceding)
,first_value(ytd_costs) over (partition by client_id,date_trunc('week',bus_date+3) order by bus_date)
) as "wtd_costs"
,ytd_costs - 5 "expected_wtd"
from history)
select *
from history_with_wtd
where date_trunc('week',bus_date) = '2016-10-10'
In SQL Server, I could just use the LAG function, as I can pass a variable as the look-back offset, but in Vertica no such option exists.
How about you partition by week starting on Saturday? First grab the first day of the week, then offset it so the week starts on Saturday: trunc(bus_date + 1,'D') - 1
Also notice the window frame is from the start of the partition (Saturday, unbounded preceding) to the current row.
select
bus_date
,client_id
,ytd_costs
,ytd_costs - first_value(ytd_costs) over (
partition by client_id, trunc(bus_date + 1,'D') - 1
order by bus_date
range between unbounded preceding and current row) wtd_costs
from sos.history
order by client_id, bus_date
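Here is a rough sketch of the Saturday-based partitioning on sample data. It assumes SQLite through Python's sqlite3 module as a stand-in for Vertica, so the week start is computed with strftime('%w', ...) and julianday() instead of trunc(bus_date + 1,'D') - 1. The daily values are made up; only the 2016-10-07 and 2016-10-14 values come from the question.

```python
import sqlite3

# Hypothetical daily year-to-date values; SQLite stands in for Vertica here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (bus_date TEXT, client_id INTEGER, ytd_costs REAL)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    ("2016-10-07", 1, 5.0),   # Friday
    ("2016-10-08", 1, 6.0),   # Saturday
    ("2016-10-09", 1, 8.0),   # Sunday
    ("2016-10-10", 1, 11.0),  # Monday
    ("2016-10-14", 1, 15.0),  # Friday
    ("2016-10-15", 1, 20.0),  # Saturday
])

# strftime('%w') is 0 for Sunday .. 6 for Saturday, so (w + 1) % 7 is the
# number of days since the most recent Saturday.
rows = conn.execute("""
    WITH h AS (
        SELECT bus_date, client_id, ytd_costs,
               date(julianday(bus_date)
                    - ((CAST(strftime('%w', bus_date) AS INTEGER) + 1) % 7)) AS week_start
        FROM history
    )
    SELECT bus_date, week_start,
           ytd_costs - first_value(ytd_costs) OVER (
               PARTITION BY client_id, week_start
               ORDER BY bus_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS wtd_costs
    FROM h
    ORDER BY bus_date
""").fetchall()

for row in rows:
    print(row)
```

One thing this toy data suggests: the baseline is the first row of each Saturday-based week, so a cost that posts on the Saturday itself is not counted (2016-10-14 gives 9.0 here, versus 10.0 if the prior Friday's value of 5 were the baseline). If costs can post on Saturdays, the partition may need to start on Friday instead.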