Redshift - How to SUM number over last 4 weeks as a window function per row? - sql

is it possible to SUM a number over a special time period in Amazon Redshift with a WINDOW-Function?
As an example I'm counting login numbers for different companies per day.
What I now want per row is, that it sums up the logins over the last 4 weeks (referenced by the date of the row): The field which I'm serarching for is marked yellow in the screenshot.
Thanks in advance for your help.

If you have data for each day, then you can use rows:
select t.*,
sum(logs) over (partition by company
order by date
rows between 27 preceding and current row
) as logins_4_weeks
from t;
Redshift does not yet support range for the window frame, so this is your best bet.

Related

SQL LAG function

I tried using the LAG function to calculate the value of previous weeks, but there are gaps in the data due to the fact that certain weeks are missing.
This is the table:
The problem is that the LAG functions takes the previous found week in the table. But I would like it to be zero if the previous week is not consecutive previous week.
This is what I would like it to be:
I'm open to any solutions.
Thank you in advance
Your example data is baffling. You have multiple rows per time frame. The first column looks like a string, which doesn't really make sense for the comparison.
So, let me answer based on a simpler data mode. The answer is to use range. If you had an integer column that specified the time frame:
ordering sales
1 10
2 20
3 30
5 50
Then you would phrase this as:
select max(sales) over (order by ordering range between 1 preceding and 1 preceding)
This would return the value from the "previous" row as defined by the first column. The value would be in a separate column, not a separate row.

Window function - N preceding and unbounded question

Say I create a window function and specify:
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
How does the window function treat the first 9 rows? Does it only calculate up to however many rows above it are available?
I couldn't find this documented in SQL Server's documentation but I could find it in Postgres, and I believe it is standardised1:
In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.
(My emphasis)
1Have also search MySQL documentation to no avail; This Q is just tagged sql so should be based on the standard but I can't find any downloadable drafts of those at the moment either.
It does the computation ,considering the 10 rows prior to the current row and the current row ,for the given partition window .For example if you want to sum up a number based on the last 3 years and current year ,you can do sum(amount) over (order by year asc) rows between 3 PRECEDING and CURRENT ROW.
To answer your question "Does it only calculate up to however many rows above it are available?" - Yes it considers only those rows which are available

Big Query SQL - Group into every n numbers

I have a table that includes a column called minutes_since. It is an integer containing the number of minutes since a pre-defined event. Multiple rows maybe fall within the same minute.
I want to group and aggregate the rows into every n minutes. For example, I want to get the average of another column for all rows occurring within 5 minute intervals.
How could this be achieved in big query standard sql?
#standardSQL
SELECT
MIN(minutes_since) minute_start,
MAX(minutes_since) minute_end,
AVG(value) value_avg
FROM `project.dataset.table`
GROUP BY DIV(minutes_since - 1, 5)

Use row number in aggregate sum over UNBOUNDED FOLLOWING SQL

I would like to add a discount rate when summing Cashflows over a number of period. To do this I need to multiply each of the remaining cashflows by the discount rate, consummate with this period. I could do this, if I knew the row number of each period, but I can't use it with the window calc I am using. The example below shows the column 'Remaining Interest' which is what I am trying to calculate based on raw data of period and interest.
select Period,RemainingInterest = SUM(PeriodInterestPaid)
OVER (PARTITION BY Name ORDER BY period ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM CF A
Period Interest Remaining Interest(Query) Remaining Interest(Required)
1 1000 1000+2000 1000/1.02^1+2000/1.02^2
2 2000 2000 2000/1.02^1
hi i hope i understand Well ---
you need to get the sum of value based on the period that what i under stand from the query but u said that you need a multiply
So there's no need to make a window function just group by
select Period, SUM(PeriodInterestPaid) as RemainingInterest
FROM CF A
and if u want a multiplay you will make group by also but u will use anther exp :
Pls explan what exactly u need

Aggregating 15-minute data into weekly values

I'm currently working on a project in which I want to aggregate data (resolution = 15 minutes) to weekly values.
I have 4 weeks and the view should include a value for each week AND every station.
My dataset includes more than 50 station.
What I have is this:
select name, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name
order by name
But it only displays the avg value of all weeks. What I need is avg values for each week and each station.
Thanks for your help!
The problem is that when you do a 'GROUP BY' on just name you then flatten the weeks and you can only perform aggregate functions on them.
Your best option is to do a GROUP BY on both name and week so something like:
select name, week, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name, week
order by name
PS - It' not entirely clear whether you're suggesting that you need one set of results for stations and one for weeks, or whether you need a set of results for every week at every station (which this answer provides the solution for). If you require the former then separate queries are the way to go.