Issues with calculating running total in BigQuery - sql

I'm not sure what the error is, but the query doesn't return a running total. I keep getting the same numbers for both ad_rev and running_total_ad_rev. Could someone point out what the issue is?
Thank you!
SELECT
days,
sum(ad_revenue) as ad_rev,
sum(sum(ad_revenue)) over (partition by days ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_total_ad_rev
FROM(
SELECT
DATE_DIFF(activity_date,creation_date,DAY) AS days,
ad_revenue
FROM
table1 INNER JOIN table2
USING (id)
WHERE
creation_date >= *somedate*
and
activity_date = *somedate*
GROUP BY 1,2
ORDER BY 1)
GROUP BY 1
ORDER BY 1

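You don't need PARTITION BY days if you want a running sum: partitioning by days puts every day in its own window, so the "running" total is just that day's value. You also need to calculate daily_revenue one step earlier. I think this is what you are trying to achieve: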
SELECT
days,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY days ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_ad_rev
FROM(
SELECT
DATE_DIFF(activity_date,creation_date,DAY) AS days,
SUM(ad_revenue) AS daily_revenue
FROM
table1
INNER JOIN table2
USING (id)
WHERE
creation_date >= *somedate*
and
activity_date = *somedate*
GROUP BY 1
ORDER BY 1)
ORDER BY 1
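To see the difference between the two window definitions, here is a minimal sketch on made-up data. The `daily` CTE and its values are assumptions for illustration only; they stand in for the already aggregated subquery above.

```sql
-- Toy data: one row per day, already aggregated (values are made up).
WITH daily AS (
  SELECT 0 AS days, 10 AS daily_revenue UNION ALL
  SELECT 1, 20 UNION ALL
  SELECT 2, 5
)
SELECT
  days,
  daily_revenue,
  -- Each partition holds a single day, so this just repeats daily_revenue.
  SUM(daily_revenue) OVER (
    PARTITION BY days ORDER BY days
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS per_day_total,
  -- No PARTITION BY: the frame covers all earlier days plus the current one.
  SUM(daily_revenue) OVER (
    ORDER BY days
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM daily
ORDER BY days;
-- days | daily_revenue | per_day_total | running_total
--    0 |            10 |            10 |            10
--    1 |            20 |            20 |            30
--    2 |             5 |             5 |            35
```

The per_day_total column reproduces the symptom from the question, while running_total accumulates as expected.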

Related

SQL - calculating hours since the earliest date in a partition

I have the following SQL code:
select
survey.ContactId,
survey.CommId,
survey.CommCreatedDate,
survey.CommIdStatus,
br.[Value],
null as HoursPastSinceFirstActiveSurvey,
row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row]
from
Survey_Completed survey
inner join
Business_Rules br on br.Name = 'OPT_OUT_TIME'
where
survey.CommIdStatus = 'Active'
Which produces the following result set:
What I need help with is filling out HoursPastSinceFirstActiveSurvey. The logic here should be as follows:
Calculate the total number of hours that has passed since the earliest (by CommCreatedDate) record in the partition for consecutive (by day) records. In order to address the "consecutive" part, I was thinking perhaps it might be possible to add to the partitioning logic to only partition if the days are consecutive. I'm not entirely sure if that's possible though. So for example, look at the last two records. They are grouped as a partition and the dates are consecutive and the earliest date/time on this partition is Nov 11 2020 12:00 AM. So I would want to perform the following in order to populate HoursPastSinceFirstActiveSurvey for these two records:
Today's date minus Nov 11 2020 12:00 AM.
This would be the value for those two records in the partition for HoursPastSinceFirstActiveSurvey. I am not sure where to even start with this!! Thank you all.
I was able to solve this with the following query. Feedback is entirely WELCOME!
select
Q2.ContactId,
min(Q2.CommCreatedDate) as MinDate,
max(Q2.CommCreatedDate) as MaxDate,
Q2.Consecutive,
datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) AS HoursPassed
from
(select
Q1.ContactId,
Q1.CommId,
Q1.CommCreatedDate,
Q1.CommIdStatus,
Q1.[Value],
Q1.Consecutive,
Q1.[row],
Q1.countOfPartition
from
(select
survey.ContactId,
survey.CommId,
survey.CommCreatedDate,
survey.CommIdStatus,
br.[Value],
CAST(dateadd(day,-row_number() over (partition by survey.ContactId order by survey.CommCreatedDate), survey.CommCreatedDate) as Date) as Consecutive,
row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row],
count(*) over (partition by survey.ContactId) as countOfPartition
from
Survey_Completed survey
inner join
Business_Rules br on br.Name = 'OPT_OUT_TIME'
where
survey.CommIdStatus = 'Active') Q1
where
Q1.countOfPartition <> 1) Q2
group by
Q2.ContactId, Q2.Consecutive, Q2.[Value]
having
datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) > Q2.[Value]
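The key step in the query above is the Consecutive column: subtracting the row number (in days) from each date gives the same anchor date for every row in a run of consecutive days. A minimal sketch on made-up data (the `s` CTE and its rows are hypothetical, not the real Survey_Completed table):

```sql
-- Toy data (made up): contact 1 has surveys on Nov 1, 2, 5 and 6.
WITH s AS (
  SELECT *
  FROM (VALUES
    (1, CAST('2020-11-01T09:00:00' AS datetime)),
    (1, CAST('2020-11-02T10:00:00' AS datetime)),
    (1, CAST('2020-11-05T08:00:00' AS datetime)),
    (1, CAST('2020-11-06T12:00:00' AS datetime))
  ) AS v(ContactId, CommCreatedDate)
)
SELECT
  ContactId,
  CommCreatedDate,
  -- date minus row number: constant within a run of consecutive days
  CAST(DATEADD(day,
               -ROW_NUMBER() OVER (PARTITION BY ContactId ORDER BY CommCreatedDate),
               CommCreatedDate) AS date) AS Consecutive
FROM s
ORDER BY ContactId, CommCreatedDate;
-- Nov 1 and Nov 2 -> Consecutive = 2020-10-31
-- Nov 5 and Nov 6 -> Consecutive = 2020-11-02
```

Note that this trick assumes at most one record per contact per day. Two records on the same day get different row numbers but the same date, so they would land in different groups.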

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (I had to do the 2nd one in Excel for now)
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for a complete solution, because that would defeat the point of getting better; I am just looking for hints about which PostgreSQL-specific functions to use. What I am trying to calculate is the 2-month average in the 2nd table, based on the lagged values in that table. Thanks!
Here is the query for the 1st table:
with t1 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_1M_before
from data
where date between '2019-07-01' and '2019-11-30'
group by 1),
t2 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_2M_before
from data
where date between '2019-07-01' and '2019-10-31'
group by 1)
select t0.date,
count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate,
t1.SR_1M_before,
t2.SR_2M_before
from data as t0
left join t1 on t0.date=t1.date
left join t2 on t0.date=t2.date
where t0.date between '2019-07-01' and '2019-12-31'
group by 1,3,4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous months with a range clause (a range frame with an interval offset requires PostgreSQL 11 or later):
select
date,
count(sales)::numeric/count(poss_sales) as Sales_Rate,
avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) Avg_Sales_Rate_2M,
count(sales)::numeric/count(poss_sales)
- avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
Your data is a bit confusing -- it would be less confusing if you had decimal places (that is, 58% being the average of 57% and 58% is not obvious).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
<whatever generates the data you have shown>
)
select q.*,
(sum(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
) /
nullif(count(*) over (order by date
rows between 2 preceding and 1 preceding
), 1) -- a one-row frame (the second row) becomes NULL
) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
(case when row_number() over (order by date) > 2
then avg(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
)
end) as two_month_average
from q;
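To make the frame `rows between 2 preceding and 1 preceding` concrete, here is a small sketch on made-up monthly rates. The `q` CTE and its values are assumptions for illustration; in practice `q` would be whatever query produces the monthly sales rates.

```sql
-- Toy data (made up): one sales rate per month.
with q(date, sales_rate) as (
  values (date '2019-07-01', 0.50),
         (date '2019-08-01', 0.60),
         (date '2019-09-01', 0.70),
         (date '2019-10-01', 0.40)
)
select q.*,
       -- plain avg(): NULL on the first row only (the frame is empty there)
       avg(sales_rate) over (order by date
                             rows between 2 preceding and 1 preceding
                            ) as avg_prev,
       -- guarded version: NULL until two full previous rows exist
       case when row_number() over (order by date) > 2
            then avg(sales_rate) over (order by date
                                       rows between 2 preceding and 1 preceding
                                      )
       end as two_month_average
from q
order by date;
-- date       | sales_rate | avg_prev | two_month_average  (trailing zeros trimmed)
-- 2019-07-01 | 0.50       | NULL     | NULL
-- 2019-08-01 | 0.60       | 0.50     | NULL
-- 2019-09-01 | 0.70       | 0.55     | 0.55
-- 2019-10-01 | 0.40       | 0.65     | 0.65
```

The second row is where the two columns differ: the frame holds one previous row, so a plain avg() returns that single value rather than NULL.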

SQL count new values only with partition by - running count with no duplicates

Based on the table below, in Presto, I need a column counting all new 'rid' values. What I have managed so far is the same as what I can get with partition by, but it's not exactly what I'm looking for (db<>fiddle demo).
The goal is to have counts for many groupings, but I think this describes the problem sufficiently.
I need the data truncated by day and a column for new users each day, as shown in the example below. In simple words: if a value repeats, don't count it. I've tried to find a connection between this and the relational division problem, but I'm just stuck.
You could use row_number() to rank the records of each rid by time; then you can aggregate and count in only the top record per group.
select
date_trunc('day', t.time) dy,
count(*) rid_count,
sum(case when t.rn = 1 then 1 else 0 end) new_rid_count
from (
select
t.*,
row_number() over(partition by t.rid order by t.time) rn
from mytable t
) t
group by date_trunc('day', t.time)
I think of this as two levels of aggregation: the inner one gets the earliest date for each rid, and the outer one counts the rids per first day:
select first_day, count(*)
from (select rid, cast(date_trunc('day', min(time)) as date) as first_day
from orders o
group by rid
) r
group by 1
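As a quick check of the row_number() approach, here is a sketch with inline made-up data in place of the real table. The rows are hypothetical; the table and column names follow the first answer.

```sql
-- Toy data (made up): 'a' appears on both days, 'b' and 'c' once each.
with mytable (rid, time) as (
  values
    ('a', timestamp '2020-01-01 10:00:00'),
    ('b', timestamp '2020-01-01 11:00:00'),
    ('a', timestamp '2020-01-02 09:00:00'),
    ('c', timestamp '2020-01-02 12:00:00')
)
select
  date_trunc('day', t.time) as dy,
  count(*) as rid_count,
  -- rn = 1 marks the first time a rid is ever seen
  sum(case when t.rn = 1 then 1 else 0 end) as new_rid_count
from (
  select
    m.*,
    row_number() over (partition by m.rid order by m.time) as rn
  from mytable m
) t
group by date_trunc('day', t.time)
order by 1;
-- 2020-01-01: rid_count = 2, new_rid_count = 2
-- 2020-01-02: rid_count = 2, new_rid_count = 1  (only 'c' is new)
```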

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table below, if any dates fall within 15 days of each other based on the effectivedate column, I should return only the first one.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, because the two dates are within 15 days of each other and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
sum(case when prev_ed >= dateadd(day, -15, effectivedate)
then 0 else 1
end) over (partition by polno order by effectivedate) as grp
from (select t.*,
lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
from t
) t
) t
group by polno, grp;
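The grouping logic can be traced on a few made-up rows. In this sketch the flag is 1 when a row starts a new group (no previous row, or the previous expiration date is more than 15 days before this effective date) and 0 otherwise; the cumulative sum of the flags is the group number. The dates and the `flagged` CTE name are assumptions for illustration.

```sql
-- Toy data (made up) for a single policy.
WITH t AS (
  SELECT *
  FROM (VALUES
    ('A1234', CAST('2016-06-12' AS date), CAST('2016-06-17' AS date)),
    ('A1234', CAST('2016-06-18' AS date), CAST('2017-06-18' AS date)),
    ('A1234', CAST('2017-08-01' AS date), CAST('2018-08-01' AS date))
  ) AS v(polno, effectivedate, expirationdate)
),
flagged AS (
  SELECT t.*,
         LAG(expirationdate) OVER (PARTITION BY polno ORDER BY effectivedate) AS prev_ed
  FROM t
)
SELECT polno, effectivedate, prev_ed,
       -- running count of "new group" flags = group number
       SUM(CASE WHEN prev_ed >= DATEADD(day, -15, effectivedate)
                THEN 0 ELSE 1
           END) OVER (PARTITION BY polno ORDER BY effectivedate) AS grp
FROM flagged
ORDER BY polno, effectivedate;
-- effectivedate | prev_ed    | grp
-- 2016-06-12    | NULL       | 1   (first row starts group 1)
-- 2016-06-18    | 2016-06-17 | 1   (within 15 days, same group)
-- 2017-08-01    | 2017-06-18 | 2   (more than 15 days apart, new group)
```

Grouping by polno and grp then collapses each group to one row, as in the query above.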

How do I get all rows from the second to latest date?

I have gotten all rows for the latest date like this:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date = (SELECT MAX(date) FROM regions)
ORDER BY 1
So how would I get the rows for the second latest date?
I tried the following, but with no luck:
SELECT MAX(date), quarter, sales_region, revenue
FROM regions
WHERE date < (SELECT MAX(date) FROM regions)
ORDER BY 1
Here is one method:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date = (SELECT DISTINCT date
FROM regions r2
ORDER BY date DESC
OFFSET 1 FETCH FIRST 1 ROW ONLY
)
ORDER BY 1;
Another method uses dense_rank():
select r.*
from (select r.*, dense_rank() over (order by date desc) as seqnum
from regions r
) r
where seqnum = 2;
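The reason for dense_rank() rather than row_number() is that all rows sharing a date get the same rank, so seqnum = 2 picks up every row of the second latest date. A sketch on made-up data (the rows below are hypothetical, and the quarter column is left out for brevity):

```sql
-- Toy data (made up): three distinct dates, several regions per date.
with regions(date, sales_region, revenue) as (
  values (date '2021-03-31', 'east', 100),
         (date '2021-03-31', 'west',  80),
         (date '2021-06-30', 'east', 120),
         (date '2021-06-30', 'west',  90),
         (date '2021-09-30', 'east', 130)
)
select r.*,
       dense_rank() over (order by date desc) as seqnum
from regions r
order by date desc, sales_region;
-- date       | sales_region | revenue | seqnum
-- 2021-09-30 | east         |     130 |      1
-- 2021-06-30 | east         |     120 |      2
-- 2021-06-30 | west         |      90 |      2
-- 2021-03-31 | east         |     100 |      3
-- 2021-03-31 | west         |      80 |      3
```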
Gordon answered your question precisely, but if you want to get the records for the last two dates in one query, you could use IN instead of =, and get the top two records with LIMIT 2:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date IN (SELECT DISTINCT date
FROM regions r2
ORDER BY date DESC
LIMIT 2)
ORDER BY 1;
Starting with PostgreSQL 8.4, you can also use FETCH FIRST 2 ROWS ONLY instead of LIMIT 2.