Calculate averages of previous 7 rows in SQL

Consider the following result set returned from a stored procedure:
The goal with the IHD column is to calculate an IHD value from the previous 6 rows (days) within the stored procedure.
In this case, an IHD value exists only from row 7 onwards, since the calculation must average the closing balances of the previous 6 days plus the current day (day 7). In other words, row 7's IHD value uses rows 1 to 7, and row 8's IHD value uses rows 2 to 8.
I have had a look at the SQL LAG function, but this only allows me to skip back to 1 previous row, and I am not sure whether I could successfully use LAG in a self-referencing CTE where averages of more than one previous row are required.
How should I approach this scenario?

Use ROWS BETWEEN. Without consumable sample data and expected results I can only give pseudo-SQL, but this will put you on the right path:
AVG({Your Column}) OVER ([PARTITION BY {Other Column}] ORDER BY {Column To Order BY}
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Obviously replace the parts in braces ({}) and remove the parts in brackets ([]) if not required.
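As a minimal runnable sketch of the frame above, the following uses Python's built-in sqlite3 module (an assumption; the question is about a stored procedure, most likely on another engine) with a hypothetical balances table holding nine days of made-up closing balances. The CASE expression is an addition to the pseudo-SQL: it leaves IHD as NULL until a full 7 rows are available, which is what the question asks for.

```python
import sqlite3

# Hypothetical table and data: nine days with closing balances 10, 20, ..., 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (day INTEGER PRIMARY KEY, closing_balance REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?)",
    [(d, d * 10.0) for d in range(1, 10)],
)

rows = conn.execute("""
    SELECT day,
           closing_balance,
           -- Only emit an average once the frame really contains 7 rows.
           CASE
               WHEN COUNT(*) OVER (ORDER BY day
                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) = 7
               THEN AVG(closing_balance) OVER (ORDER BY day
                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
           END AS ihd
    FROM balances
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

Rows 1 to 6 come back with a NULL (None) IHD, and row 7 averages rows 1 to 7. Without the CASE wrapper, the early rows would simply average however many rows exist so far.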

Related

SQL LAG function

I tried using the LAG function to calculate the value of the previous week, but there are gaps in the data because certain weeks are missing.
This is the table:
The problem is that the LAG function takes the previous week found in the table. But I would like the result to be zero if that row is not the immediately preceding week.
This is what I would like it to be:
I'm open to any solutions.
Thank you in advance
Your example data is baffling. You have multiple rows per time frame. The first column looks like a string, which doesn't really make sense for the comparison.
So, let me answer based on a simpler data model. The answer is to use a RANGE frame. If you had an integer column that specified the time frame:
ordering sales
1 10
2 20
3 30
5 50
Then you would phrase this as:
select max(sales) over (order by ordering range between 1 preceding and 1 preceding)
This would return the value from the "previous" row as defined by the first column. The value would be in a separate column, not a separate row.
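A small runnable sketch of that RANGE frame, using Python's sqlite3 module (an assumption chosen for convenience; not every engine accepts numeric RANGE offsets, so check yours). The table name t and the COALESCE wrapper are additions here: COALESCE turns the NULL produced by an empty frame into the zero the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the data is the four-row sample from the answer.
conn.execute("CREATE TABLE t (ordering INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (5, 50)])

rows = conn.execute("""
    SELECT ordering,
           sales,
           -- The frame holds only rows whose ordering equals (current ordering - 1).
           -- If no such row exists the frame is empty and MAX yields NULL.
           COALESCE(
               MAX(sales) OVER (ORDER BY ordering
                                RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING),
               0
           ) AS prev_sales
    FROM t
    ORDER BY ordering
""").fetchall()

for row in rows:
    print(row)
```

For ordering 5 there is no row with ordering 4, so prev_sales is 0 rather than the 30 that LAG would have picked up from ordering 3.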

Window function - N preceding and unbounded question

Say I create a window function and specify:
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
How does the window function treat the first 9 rows? Does it only calculate up to however many rows above it are available?
I couldn't find this documented in SQL Server's documentation, but I could find it in the Postgres documentation, and I believe it is standardised [1]:
In any case, the distance to the end of the frame is limited by the distance to the end of the partition, so that for rows near the partition ends the frame might contain fewer rows than elsewhere.
(My emphasis)
[1] I have also searched the MySQL documentation to no avail. This question is just tagged sql, so the answer should be based on the standard, but I can't find any downloadable drafts of it at the moment either.
The computation considers the 10 rows before the current row plus the current row, within the given partition. For example, to sum an amount over the last 3 years and the current year, you can use sum(amount) over (order by year asc rows between 3 preceding and current row).
To answer your question "Does it only calculate up to however many rows above it are available?": yes, it considers only the rows that are available.
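The shrinking frame is easy to observe directly. The sketch below uses Python's sqlite3 module and a hypothetical single-column table nums holding 1 to 12 (both assumptions, not from the question), and counts how many rows fall inside ROWS BETWEEN 10 PRECEDING AND CURRENT ROW for each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the integers 1..12, one per row.
conn.execute("CREATE TABLE nums (n INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, 13)])

rows = conn.execute("""
    SELECT n,
           -- Number of rows actually inside the frame for this row.
           COUNT(*) OVER (ORDER BY n
                          ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS frame_size,
           SUM(n)   OVER (ORDER BY n
                          ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS frame_sum
    FROM nums
    ORDER BY n
""").fetchall()

for row in rows:
    print(row)
```

The frame size grows from 1 on the first row to 11 on the eleventh row and stays at 11 afterwards, so the early rows are aggregated over whatever rows exist rather than being skipped or padded.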

SQL Query to check if current row date ranges fall within preceding sequence

Example in the attached image.
I'm trying to write a SQL query that checks a given row against the available preceding data.
In this case, the yellow row (6/18/2028) should be checked to see whether its dtstart and dtend fall within the min(dtstart) and max(dtstart) of the consecutive preceding rows where cumulative = 1.
E.g.
The current min(dtstart) = 6/1/2018 and max(dtstart) = 6/30/2018. However, if the 6/7/2018 row had cumulative = 1, then the min(dtstart) = 6/8/2018 and max(dtstart) = 6/30/2018.
With Pandas, I'd separate out the rows and come up with a ranking for each set of continuous values to find the min/max of each set, and compare against the compacted list. I'm not sure what the best approach is in SQL.
Thanks in advance for any help.

Running a complex loop query in PostgreSQL

I have one problem in PostgreSQL.
This is my table (the image does not show all of the data).
My requirement is:
Step 1: find the count of each value (value is a column in the table) for today's date, ordered by value. The result looks like this, and I have done this part.
Step 2: find the count of each value for the last 30 days, starting from today. I am stuck here. This step also includes a weighting:
Example: today has a count of 10 for the value kash, so this contributes 10x30;
yesterday had a count of 4 for the same value, so that contributes 4x29, and the total sum would be
(10x30) + (4x29) = 416.
This calculation is done for each and every value.
The loop executes 30 times (for the last 30 days starting from today, as I said). Take today as the thirtieth day.
The query just needs to return two columns, value and sum, ordered by the sum.
Add a WHERE clause to your existing query:
WHERE Timestamp > current_date - interval '30' day;
To order by the sum, add an ORDER BY clause:
ORDER BY COUNT(*) DESC
I do not believe that you will need a loop (CURSOR) for this query.
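The day weighting from the question (today counts 30 times, yesterday 29 times, and so on) can also be expressed without a loop, as one grouped query. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the events table, its value and ts columns, and the fixed TODAY date are all hypothetical, and the day difference is computed with SQLite's julianday function. In PostgreSQL, subtracting one date from another gives the same whole-day difference.

```python
import sqlite3

TODAY = "2024-01-30"  # hypothetical fixed "today" so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT, ts TEXT)")  # hypothetical schema
sample = (
    [("kash", "2024-01-30")] * 10    # 10 rows today      -> weight 30 each
    + [("kash", "2024-01-29")] * 4   # 4 rows yesterday   -> weight 29 each
    + [("other", "2024-01-30")]      # 1 row today        -> weight 30
    + [("other", "2023-12-01")]      # older than 30 days -> filtered out
)
conn.executemany("INSERT INTO events VALUES (?, ?)", sample)

rows = conn.execute("""
    SELECT value,
           -- Each row contributes (30 - days_ago); summing per value gives
           -- count_today * 30 + count_yesterday * 29 + ...
           SUM(30 - CAST(julianday(:today) - julianday(ts) AS INTEGER)) AS weighted_sum
    FROM events
    WHERE ts > date(:today, '-30 days') AND ts <= :today
    GROUP BY value
    ORDER BY weighted_sum DESC
""", {"today": TODAY}).fetchall()

for row in rows:
    print(row)
```

With this sample data, kash gets (10x30) + (4x29) = 416, matching the worked example in the question.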

Subtract last year's ending quarter value from current quarter value

I know
Format(DateAdd("s",-1,
DateAdd("q",DateDiff("q","1/1/1900",
DateAdd("yyyy",-1,Date())),"1/1/1900")),
"Short Date")
returns the last day of a quarter 1 year ago.
All of the NAV_Dates are the last day of each quarter, and have a value associated with them which makes the row unique. (Closing value titled as NetAssetValue)
How can I use that (or something similar) to get the value associated with the prior year's final quarter-end date, and subtract it from the current quarter's ending value? Note: I do not have to use this; it's just the only SQL I know that returns a value somewhat close to what I need.
The table's values would be set up similar to this:
+----------+--------------+
|NAV_Date  |NetAssetValue |
+----------+--------------+
|12/31/2012|        $4,000|
+----------+--------------+
|03/31/2013|        $5,000|
+----------+--------------+
The Year to Date would then be (5,000/4,000) - 1 and saved as a percent. Another example would be:
+----------+--------------+
|NAV_Date  |NetAssetValue |
+----------+--------------+
|12/31/2012|        $4,000|
+----------+--------------+
|06/30/2013|        $4,025|
+----------+--------------+
Year to Date calculation: (4,025/4,000) - 1 and saved as a percent.
I know it involves a nested subquery (or possibly more than one) and that we'd essentially have to capture the current quarter's end, use that value, and capture the prior year's quarter end and use that value also. Just not quite sure how to do it.
You were on the right track considering a correlated subquery for this. I think you want the subquery to return the year-end NetAssetValue for each quarterly record.
I hope the WHERE clause in this query makes the logic clear. However it would force the subquery to run the Year() function against every row in the table. Even so, you may be satisfied with the performance if the table is small enough.
SELECT
    y1.NAV_Date,
    y1.NetAssetValue,
    (
        SELECT TOP 1 y2.NetAssetValue
        FROM YourTable AS y2
        WHERE Year(y2.NAV_Date) = Year(y1.NAV_Date)
        ORDER BY y2.NAV_Date DESC
    ) AS YearEndValue
FROM YourTable AS y1;
I think the following WHERE clause should offer better performance than the one above, assuming NAV_Date is indexed. However, you may find it less intuitive. If so, try the first version and then work on this one later if you need it:
WHERE y2.NAV_Date <= DateSerial(Year(y1.NAV_Date), 12, 31)
Beware, in the current year, the query will return NetAssetValue from the most recent quarterly record as YearEndValue, even though the year hasn't ended. I don't know what else you would want in that situation.
Finally, the query should give you NetAssetValue and YearEndValue for each quarter. All you have left is to add your calculation which uses those values.
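To illustrate that last step, here is a rough sketch of the whole calculation. It makes several assumptions that are not in the answer: it runs on SQLite through Python's sqlite3 module instead of Access, the nav table and its column names are hypothetical, dates are stored as ISO strings, and the subquery picks the latest record before the start of each row's year, following the question's examples where 2013 quarters are compared with 12/31/2012. The sample rows combine the two examples from the question into one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; values are the question's sample figures.
conn.execute("CREATE TABLE nav (nav_date TEXT PRIMARY KEY, net_asset_value REAL)")
conn.executemany(
    "INSERT INTO nav VALUES (?, ?)",
    [("2012-12-31", 4000.0), ("2013-03-31", 5000.0), ("2013-06-30", 4025.0)],
)

rows = conn.execute("""
    SELECT nav_date,
           net_asset_value,
           prior_year_end,
           -- Year to date as a fraction; NULL when there is no prior year-end row.
           net_asset_value / prior_year_end - 1 AS ytd
    FROM (
        SELECT y1.nav_date,
               y1.net_asset_value,
               -- Latest record dated before January 1 of this row's year.
               (SELECT y2.net_asset_value
                FROM nav AS y2
                WHERE y2.nav_date < strftime('%Y-01-01', y1.nav_date)
                ORDER BY y2.nav_date DESC
                LIMIT 1) AS prior_year_end
        FROM nav AS y1
    )
    ORDER BY nav_date
""").fetchall()

for row in rows:
    print(row)
```

The 03/31/2013 row yields (5,000/4,000) - 1 = 0.25, and the 12/31/2012 row has no earlier record, so its prior year-end value and ratio are NULL.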