I'm using SQL Server and I need to get the sum of the previous 6 rows of my table and place the results in its own column.
I'm able to get the 6th row back with the below query:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
which gives me the six row back, but what I need is the sum of all six rows previous.
any help would be appreciate
Mike
You can use:
SELECT ja.id, ja.FileSize, CompleteTime,
SUM(FileSize) OVER (ORDER CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this takes 6 days including the current day. If you want the six preceding rows:
SELECT ja.id, ja.FileSize, DATE,
SUM(FileSize) OVER (ORDER BY CompleteTime ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja
Related
I am having trouble using window functions in SNOWFLAKE to look at historical data (from 12 months prior). When I add a dimension, this code doesn't work.
SELECT
DATE_TRUNC('MONTH',pl.DATE) AS MONTH,
COUNT(DISTINCT PL.ID) AS CURRENT,
PL.DIMENSION,
FIRST_VALUE(count(DISTINCT pl.ID)) OVER (PARTITION BY PL.DIMENSION ORDER BY MONTH ASC ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS 1_YEAR_AGO
from table1 pl
group by MONTH, PL.DIMENSION
ORDER BY MONTH
here are the results if i filter on the dimension:
i am wanting more rows.. for example month = 2019-10-01, CURRENT_ would be NULL and 1_YR_AGO should be 1 and so on.. what am I missing? (I put examples of this in the highlighted section of the picture. the results are unhighlighted.
NOTE: I've also tried a lag and it does the same thing here.
I am using Hive SQL server. In my database, I am trying to remove records which have less than 7 days of the gap with the previous record but when removing the record I want to check the gap with a "previous retained record", not with any previous record.
I want to retain all the record marked as 1 specifically Rec # 7 Although the gap of 7th record is <7, since the previous record is being removed the gap of 7th with 5th becomes 8.
You can use a cumulative max:
select t.*
from (select t.*,
max(case when retained = 1 then intdate end) over (order by intdate rows between unbounded preceding and 1 preceding) as prev_intdate
from t
) t
where prev_intdate is null or
prev_intdate > dateadd(intdate, 7);
I’m trying to calculate a 3 month rolling average grouped by region and month, as in
Region Month Avg(var_a) Avg(var_b)
Northland Dec-Jan-Feb 7.1 5.9
Southland Dec-Jan-Feb 7.2 6.1
Northland Nov-Dec-Jan 7.4 6.1
Southland Nov-Dec-Jan 7.5 6.2
Northland Oct-Nov-Dec 7.5 6.2
Southland Oct-Nov-Dec 7.5 6.1
Note that month is expanded for illustrative purposes, I’d really expect the output to just say a single month.
Now I can do this by creating a CTE grouping by region and month, then joining to it a couple times like
With month_rollup_cte as
(Select region,month,sum(var_a) a_sum, sum(var_b) b_sum, count(1) cnt
From vw_score_by_region
Group by region,month)
Select c1.region, c1.month,sum(c1.a_sum + c2.a_sum + c3.a_sum) / sum(c1.cnt + c2.cnt + c3.cnt) a_avg, sum(c1.b_sum + c2.b_sum + c3.b_sum) / sum(c1.cnt + c2.cnt + c3.cnt) b_avg
From month_rollup_cte c1
Join month_rollup_cte c2 on c1.region = c2. Region and c1.month = dateadd(mm,1,c2.month)
Join month_rollup_cte c3 on c1.region = c3. Region and c1.month = dateadd(mm,2,c3.month)
Group by c1.region, c1.month;
But that’s ugly, imagine if you had to do a 6 month rolling average or 12 month rolling average… I’m trying to use the t-sql 2012 analytic functions, specifically the RANGE option. I’ve used ROWS preceding before, but never range.
What I tried was
select region,avg(var_a) OVER (order by (year(entry_month) * 100 + month(entry_month)) range between 2 preceding and 1 following)
from [dbo].[vw_score_by_region]
group by region
But I get a syntax error:
*Msg 8120, Level 16, State 1, Line 2
Column 'dbo.vw_score_by_region.month' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.*
Clearly I'm doing something silly, but I'm not sure what.
First of all RANGE is only supported with UNBOUNDED and CURRENT ROW frame delimiters, It cannot be used with N PRECEDING or N FOLLOWING. From your title, looks like your want to get 3 months rolling avg (sliding avg), then you'd better to use ROWS
Using ROWS (This is more likely what you need) SQl Fiddle Demo
select region,
avg(var_a) OVER (partition by region
order by (entry_month)
rows between 2 preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: No need to calcuate year+month, if entry_month is date or datetime, it is sortable already, thanks for Steve's correction.
Using RANGE:
select region,
avg(var_a) OVER (partition by region,(year(entry_month) * 12 + month(entry_month))/3
order by (entry_month)
range between unbounded preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: Using RANGE you have to control the partition width, since you want to agg by 3 month, and range doesn't support N PRECEDING and N FOLLOWING, it only supports following:
| UNBOUNDED PRECEDING | Starts the window at first row of the partition
| UNBOUNDED FOLLOWING | Ends the window at last row of the partition
| CURRENT ROW | Starts or Ends the window at current row
Looking for advice with 2 different types of sub-totals using PLSQL.
I need to pull a data set with 1) a unique headcount, and 2) a total number of credits, as a running total over time.
Raw Data:
This is the transactional data -- every time a student registers or a course, a record is inserted with the date, student id, and credits (along with course number and a bunch of other relevant data). One record per course per student.
STUDENT_ID CREDITS DATE
1 3 01-JAN-12
1 2 02-JAN-12
57 1 03-JAN-12
1 1 03-JAN-12
Processed Data:
This is what the boss needs to see -- it will be used for trending later (to see, for example, how this year's Jan-01 is measuring up against last year's Jan-01, etc.).
UniqueHeadcount SumCredits Date
1 3 01-JAN-12
1 5 02-JAN-12
2 7 03-JAN-12
The brute approach to this is to write a bunch of separate SELECTS (one for each day), and UNION them together. For example:
SELECT
COUNT(DISTINCT STUDENT_ID) as "UniqueHeadcount",
SUM(CREDIT_HR) as "SumCredits",
'01-JAN-12' as "DATE"
FROM
REGISTRATIONS
WHERE
TO_CHAR(DATE,'yyyymmdd') <= '20120101'
GROUP BY
'01-JAN-12'
UNION
SELECT
COUNT(DISTINCT STUDENT_ID) as "UniqueHeadcount",
SUM(CREDIT_HR) as "SumCredits",
'02-JAN-12' as "DATE"
FROM
REGISTRATIONS
WHERE
TO_CHAR(DATE,'yyyymmdd') <= '20120102'
GROUP BY
'02-JAN-12'
UNION
...
And that works -- the results are accurate -- but as you can see -- this is nowhere near elegant -- and if you have to do it for 365 days, well...it's a beast. There's got to be a better way to do it.
So far in my search, I've learned about an 'OVER' clause that I can use -- like this:
SELECT
COUNT(DISTINCT STUDENT_ID) OVER(ORDER BY TRUNC(RSTS_DATE) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "UniqueHeadcount",
SUM(CREDIT_HR) OVER(ORDER BY TRUNC(RSTS_DATE) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as "SumCredits",
TRUNC(RSTS_DATE) as "DATE"
FROM
REGISTRATIONS
This query is way, way shorter (yay) -- but has two significant problems that I can't yet find my way around. First is that it doesn't work (by design, aparently?) with the COUNT DISTINCT. So I comment that out for a moment, but then run into the second problem: it ignores the TRUNC() function. The RSTS_DATE, though it appears to be just a day/month/year value when you run a SELECT on it, actually holds the time as well, so the result set I get is not summed simply over date, but also over times -- so instead of one record per day, my processed data returns hundreds of records per day (one for each individual course registration). For example:
UniqueHeadcount SumCredits Date
1 3 01-JAN-12
1 5 02-JAN-12
2 6 03-JAN-12 (hidden time: 07:32:27)
2 7 03-JAN-12 (hidden time: 08:01:33)
Not what I'm after.
So I'm looking for expertise -- if what I've explained so far makes sense -- is there another way to use the OVER clause, or perhaps there may be another feature of PLSQL altogether I should be using for this? I'm not strong in PLSQL if you can't tell, but if anyone can give me some direction -- even just words to google, I'd appreciate the help.
Thanks
Try this:
WITH CRdata AS
(
SELECT COUNT(DISTINCT STUDENT_ID) AS UniqueHeadcount,
SUM(CREDIT_HR) AS SumCredits,
TRUNC(RSTS_DATE) RSTS_DATE
FROM REGISTRATIONS
GROUP BY TRUNC(RSTS_DATE)
)
SELECT SUM(UniqueHeadcount) OVER(ORDER BY RSTS_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS UniqueHeadcount,
SUM(SumCredits) OVER(ORDER BY RSTS_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumCredits,
RSTS_DATE
FROM CRdata
I want to see if the price of a stock has changed by 5% this week. I have data that captures the price everyday. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
data dprev
where cast(d.capture_timestamp as date = date(current_timestamp) and
cast(dprev.capture_timestamp as date) )= cast(current_timestamp as date)-7 and
d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to use query from the following subquery for whatever calculations you want to do. This is assuming one record per day. The 7 preceding rows is literal.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data