I tried using the LAG function to get the value from the previous week, but there are gaps in the data because certain weeks are missing.
This is the table:
The problem is that the LAG function takes the previous week found in the table. I would like the value to be zero when that row is not the immediately preceding week.
This is what I would like it to be:
I'm open to any solutions.
Thank you in advance
Your example data is baffling. You have multiple rows per time frame. The first column looks like a string, which doesn't really make sense for the comparison.
So, let me answer based on a simpler data model. The answer is to use a range window frame. If you had an integer column that specified the time frame:
ordering  sales
1         10
2         20
3         30
5         50
Then you would phrase this as:
select t.*,
       max(sales) over (order by ordering range between 1 preceding and 1 preceding) as prev_sales
from t;
This would return the value from the "previous" row as defined by the first column. The value would be in a separate column, not a separate row.
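To make the behaviour concrete, here is a minimal sketch that runs the same frame against the four sample rows above. It uses Python's built-in sqlite3 module purely as a convenient test bed, which is an assumption (the original question does not name a database), and it needs SQLite 3.28 or later for RANGE with a numeric offset. The table name t and the alias prev_sales are made up for illustration, and the COALESCE wrapper is added to turn the missing week into zero, as the question asks.

```python
import sqlite3

# Hypothetical table t, filled with the sample rows from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ordering INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (5, 50)])

# The frame holds only rows whose ordering value is exactly current - 1.
# When no such row exists, MAX() sees an empty frame and yields NULL,
# which COALESCE turns into 0.
query = """
SELECT ordering,
       sales,
       COALESCE(
           MAX(sales) OVER (ORDER BY ordering
                            RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING),
           0) AS prev_sales
FROM t
ORDER BY ordering
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 10, 0)
# (2, 20, 10)
# (3, 30, 20)
# (5, 50, 0)   <- week 4 is missing, so the previous value is 0
```

Unlike LAG, which always looks one row back, the RANGE frame looks one unit back in the ordering column, so a gap produces an empty frame instead of an older row.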
Say I create a window function and specify:
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
How does the window function treat the first 9 rows? Does it only calculate up to however many rows above it are available?
I couldn't find this documented in SQL Server's documentation, but I could find it in the Postgres documentation, and I believe it is standardised [1]:
In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.
(My emphasis)
[1] I have also searched the MySQL documentation to no avail. This question is tagged only sql, so the answer should be based on the standard, but I can't find any downloadable drafts of it at the moment either.
It does the computation over the 10 rows prior to the current row plus the current row, within the given partition. For example, if you want to sum an amount over the last 3 years and the current year, you can write sum(amount) over (order by year asc rows between 3 preceding and current row). Note that the frame clause goes inside the parentheses of OVER.
To answer your question "Does it only calculate up to however many rows above it are available?": yes, it considers only those rows which are available.
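A small sketch of that behaviour, assuming SQLite (through Python's sqlite3 module, version 3.25 or later) as a stand-in engine. The table payments and its values are made up. COUNT(*) over the same window shows how many rows each frame actually contains.

```python
import sqlite3

# Hypothetical table with one made-up amount per year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [(2015, 1), (2016, 2), (2017, 4), (2018, 8), (2019, 16)])

# The frame asks for 3 preceding rows plus the current one, but near the
# start of the partition fewer rows exist, so the frame simply shrinks.
query = """
SELECT year,
       COUNT(*)    OVER w AS rows_in_frame,
       SUM(amount) OVER w AS running_sum
FROM payments
WINDOW w AS (ORDER BY year ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
ORDER BY year
"""
frames = conn.execute(query).fetchall()
for row in frames:
    print(row)
# (2015, 1, 1)
# (2016, 2, 3)
# (2017, 3, 7)
# (2018, 4, 15)
# (2019, 4, 30)
```

The first three rows are computed from 1, 2 and 3 rows respectively; no error is raised and no padding is added.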
Is it possible to SUM a number over a specific time period in Amazon Redshift with a window function?
As an example I'm counting login numbers for different companies per day.
What I now want is for each row to hold the sum of the logins over the last 4 weeks, relative to the date of that row. The field I'm searching for is marked yellow in the screenshot.
Thanks in advance for your help.
If you have data for each day, then you can use rows:
select t.*,
       sum(logs) over (partition by company
                       order by date
                       rows between 27 preceding and current row
                      ) as logins_4_weeks
from t;
Redshift does not yet support range for the window frame, so this is your best bet.
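The query above can be tried out on a small generated data set. The sketch below assumes SQLite (via Python's sqlite3, 3.25 or later) rather than Redshift, keeps the column names company, date and logs from the query, and fills the table with made-up data: 30 consecutive days for two hypothetical companies, A with 1 login per day and B with 2.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (company TEXT, date TEXT, logs INTEGER)")

# Made-up data: one row per company per day, with no missing days.
start = date(2024, 1, 1)
data = [(company, (start + timedelta(days=d)).isoformat(), logs)
        for company, logs in (("A", 1), ("B", 2))
        for d in range(30)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

# 27 preceding rows + the current row = 28 rows = 4 weeks of daily data.
query = """
SELECT company, date,
       SUM(logs) OVER (PARTITION BY company ORDER BY date
                       ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) AS logins_4_weeks
FROM t
ORDER BY company, date
"""
result = {(c, d): s for c, d, s in conn.execute(query)}

print(result[("A", "2024-01-01")])  # 1  (only one row available so far)
print(result[("A", "2024-01-30")])  # 28 (a full 28-row frame)
print(result[("B", "2024-01-30")])  # 56
```

Because the frame counts rows rather than days, a missing day would make the 28-row frame reach back further than 4 weeks, which is why the answer is conditional on having data for each day.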
Example in the attached image.
I'm trying to write a SQL query that checks a given row against the available preceding data.
In this case, the yellow row (6/18/2018) should be checked to see whether its dtstart and dtend fall within the min(dtstart) and max(dtstart) of the consecutive preceding rows where cumulative = 1.
E.g.
The current min(dtstart) = 6/1/2018 and max(dtstart) = 6/30/2018. However, if the 6/7/2018 row had cumulative = 1, then the min(dtstart) = 6/8/2018 and max(dtstart) = 6/30/2018.
With Pandas, I'd separate out the rows and come up with a ranking for each set of continuous values to find the min/max of each set, then compare against the compacted list. I'm not sure what the best approach is in SQL.
Thanks in advance for any help.
Consider the following result set returned from a stored procedure:
The goal with the IHD column is to calculate an IHD value from the previous 6 rows (days) within the stored procedure.
In this case, there will only be an IHD value from row 7 onwards, since the calculation needs to average the closing balances of the previous 6 days plus the current day (day 7). Basically, it needs to use rows 1 to 7 for row 7's IHD value. Then, to calculate row 8's IHD value, it needs to use rows 2 to 8.
I have had a look at the SQL LAG function, but as far as I can tell it only lets me reach back to a single previous row, and I am not sure whether I could use LAG in a self-referencing CTE when an average over more than one previous row is required.
How should I approach this scenario?
Use ROWS BETWEEN. Without consumable sample data and expected results I can only give pseudo-SQL, but this will put you on the right path:
AVG({Your Column}) OVER ([PARTITION BY {Other Column}] ORDER BY {Column To Order BY}
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Obviously replace the parts in braces ({}) and remove the parts in brackets ([]) if not required.
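As a filled-in version of the pseudo-SQL, here is a minimal sketch run on SQLite through Python's sqlite3 module (3.25 or later), which is an assumption since the question concerns a stored procedure elsewhere. The table balances, its columns day and closing_balance, and the values are all hypothetical. The CASE expression is an addition that leaves IHD empty until a full 7-row frame is available, as the question requires.

```python
import sqlite3

# Hypothetical data: nine days with closing balances 100, 200, ..., 900.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (day INTEGER, closing_balance REAL)")
conn.executemany("INSERT INTO balances VALUES (?, ?)",
                 [(d, 100.0 * d) for d in range(1, 10)])

# AVG over the current row and the 6 rows before it. COUNT(*) over the same
# window tells us whether the frame is full; if not, CASE yields NULL.
query = """
SELECT day,
       CASE WHEN COUNT(*) OVER w = 7
            THEN AVG(closing_balance) OVER w
       END AS ihd
FROM balances
WINDOW w AS (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
ORDER BY day
"""
ihd = dict(conn.execute(query).fetchall())

print(ihd[6])  # None  (fewer than 7 rows available)
print(ihd[7])  # 400.0 (average of days 1 to 7)
print(ihd[8])  # 500.0 (average of days 2 to 8)
```

On an engine without a named WINDOW clause, the same frame definition can be repeated inside each OVER (...) instead.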
I have one problem in PostgreSQL.
This is my table (the image does not show all of the data).
My requirement is:
Step 1: find the count of each value (value is a column in the table), ordered by value, for today's date. The result looks like this, and I have already done this part.
Step 2: find the count of each value for the last 30 days, starting from today. I am stuck here. There is one more thing included in this step:
Example: today there are 10 rows for the value kash, which contributes 10x30;
yesterday there were 4 rows for the same value, which contributes 4x29; so the total sum would be
(10x30) + (4x29) = 416.
This calculation is done for every value.
The loop executes 30 times (for the last 30 days starting from today, as I said before), taking today as the thirtieth day.
The query only needs to return two columns, value and sum, ordered by the sum.
Add a WHERE clause to your existing query:
WHERE Timestamp > current_date - interval '30' day
To order by the sum, add an ORDER BY clause:
ORDER BY COUNT(*) DESC
I do not believe that you will need a loop (CURSOR) for this query.
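For the day-weighted sum described in the question (today counts 30 times, yesterday 29 times, and so on), one loop-free option is to weight each row by 30 minus its age in days and aggregate. The sketch below is only an illustration under several assumptions: it runs on SQLite via Python's sqlite3 rather than PostgreSQL, the table events and its columns value and ts are hypothetical, and "today" is pinned to a fixed made-up date so the result is reproducible.

```python
import sqlite3
from datetime import date, timedelta

today = date(2024, 3, 31)  # fixed, made-up reference date
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT, ts TEXT)")

# Made-up rows: 10 today and 4 yesterday for "kash" (the question's example),
# plus rows for "other" on day 29 (inside) and day 30 (outside the window).
rows = ([("kash", today.isoformat())] * 10
        + [("kash", (today - timedelta(days=1)).isoformat())] * 4
        + [("other", (today - timedelta(days=29)).isoformat())] * 5
        + [("other", (today - timedelta(days=30)).isoformat())] * 7)
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Each row contributes (30 - age in days); summing per value gives
# count_today * 30 + count_yesterday * 29 + ... without any loop.
query = """
SELECT value,
       SUM(30 - CAST(julianday(:today) - julianday(ts) AS INTEGER)) AS weighted_sum
FROM events
WHERE julianday(:today) - julianday(ts) BETWEEN 0 AND 29
GROUP BY value
ORDER BY weighted_sum DESC
"""
totals = conn.execute(query, {"today": today.isoformat()}).fetchall()
print(totals)  # [('kash', 416), ('other', 5)]
```

In PostgreSQL the age in days could be computed by subtracting two dates, for example current_date minus the timestamp cast to date, in place of the julianday() arithmetic used here.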