I'm currently stuck on an issue. I have daily data and I want to SUM all the data for the next 30 days, repeatedly, over a year.
Date        Views
28-01-2021  1
29-01-2021  5
30-01-2021  1
31-01-2021  5
And I want to have the number of views starting on the 28th for the next 30 days, then the next 30 days, and so on, over a year (or twelve times).
So basically what I want to see is something like this, Series being the series of 30 days (first 30 days, second 30 days, etc.):
Series                    Views
1 (from 28-01 to 28-02)   250
2 (from 01-03 to 30-03)   200
3 (from 31-03 to 29-04)   300
4 (from 30-04 to 29-05)   550
Thank you if anyone can help.
Regards
Assuming you have one row for each day, you can use a window function:
select t.*,
       sum(views) over (order by date
                        rows between current row and 29 following
                       ) as views_30days
from t;
Note: This interprets "next 30 days" as really being "today plus the next 29 days". If you don't want the current day, then the window frame would be:
rows between 1 following and 30 following
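If instead you want twelve consecutive, non-overlapping 30-day buckets, which is what the sample output suggests, here is a minimal sketch in Postgres syntax, assuming the table is t(date, views) and the first series starts at the earliest date:

select 1 + (t.date - s.start_date) / 30 as series,      -- integer division assigns each day to a 30-day bucket
       sum(t.views) as views
from t
cross join (select min(date) as start_date from t) s    -- anchor: the first day in the data
group by 1
order by 1;

In Postgres, subtracting two dates yields an integer number of days, so the division by 30 buckets the rows directly; other databases would need their DATEDIFF equivalent.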
I'm struggling with this.
I have a column in Snowflake called DURATION; it is of VARCHAR type.
The values are durations expressed in days, hours, minutes, and seconds. A value can contain a single number with one unit of time (day, hour, minute, or second), such as 3 hours, 14 minutes, or 3 seconds, or it can combine several units, such as 1 day 3 hours 35 minutes, 1 hour 9 minutes, or 45 minutes 1 second.
A value can also be blank or invalid, such as free text, or it can name a day, hour, or minute without a number (see the last 3 rows in the table below).
I would greatly appreciate it if you guys could help me with the following:
in Snowflake, convert all valid values to a number type and normalize them to minutes (e.g. the resulting value for 7 Hours 13 Minutes would be 433).
Thanks a lot, guys!
DURATION
1 Second
10 Seconds
1 Minute
3 Minutes
20 Minutes
1 Hour
2 Hours
7 Hours 13 Minutes
1 Hour 1 Minute
1 Day
1 Day 1 Hour
1 Day 1 Hour 1 Minute
1 Day 10 Hours
2 Days 1 Hour
3 Days 9 Hours
1 Day 3 Hours 45 Minutes
Duration (invalid)
Days
Day Minute
Minutes
I tried many things using the REGEXP_SUBSTR, TRY_TO_NUMBER, and COALESCE functions in CASE expressions, but I'm getting either 0s or NULLs for all values. Very frustrating.
I think you would want to use STRTOK_TO_ARRAY in a CTE subquery, or put the result into a temp table. Then:
1. Use ARRAY_POSITION to find the unit labels; the index one less than a label's is its value.
2. Pull those values into separate columns, with a CASE expression for each label.
3. The CASE expressions could be computed columns if you insert the results of the first query into a table.
4. From there you can concatenate colons, cast to a time type, and use DATEDIFF, or do the arithmetic to calculate the minutes.
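Alternatively, since the question already mentions REGEXP_SUBSTR and TRY_TO_NUMBER, here is a minimal sketch along those lines rather than STRTOK_TO_ARRAY; it assumes a table named durations holding the DURATION column, and treats any value without a digit as invalid:

select duration,
       case when regexp_like(duration, '.*\\d.*')   -- a valid value must contain at least one digit
            then coalesce(try_to_number(regexp_substr(duration, '(\\d+)\\s*day',    1, 1, 'i', 1)), 0) * 1440
               + coalesce(try_to_number(regexp_substr(duration, '(\\d+)\\s*hour',   1, 1, 'i', 1)), 0) * 60
               + coalesce(try_to_number(regexp_substr(duration, '(\\d+)\\s*minute', 1, 1, 'i', 1)), 0)
               + coalesce(try_to_number(regexp_substr(duration, '(\\d+)\\s*second', 1, 1, 'i', 1)), 0) / 60
       end as duration_minutes   -- NULL for blank or invalid values
from durations;

Each pattern captures the number immediately before its unit word (the final argument 1 extracts capture group 1, and 'i' makes the match case-insensitive), so missing units fall back to 0 via COALESCE. Seconds contribute fractional minutes; wrap the expression in ROUND or FLOOR if you want whole numbers.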
I am trying to sum over rows whose timestamp falls within the 5 minutes before the timestamp of any given row. The issue I have faced is that there is no predefined number of rows that could be within the previous 5 minutes, and counting 5-minute intervals from the initial timestamp will not give the desired outcome.
The table below shows the desired output:
TimeStamp   Sales   SUM_PREVIOUS_5_MINS_SALES
18:04:03    2       2
18:05:23    5       7
18:05:58    3       10
18:09:34    4       12
One method uses a window function:
select t.*,
       sum(sales) over (order by extract(epoch from timestamp)
                        range between 299 preceding and current row
                       ) as sales_prev_5_minutes
from t;
Note: 299 = 5 * 60 - 1, i.e. the number of seconds in 5 minutes (well, minus one because the current row is included).
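If you are on Postgres 11 or later, a range frame can also take an interval directly when ordering by the timestamp column itself; a sketch, with the column name assumed from the sample data:

select t.*,
       sum(sales) over (order by "timestamp"
                        range between interval '5 minutes' preceding and current row
                       ) as sales_prev_5_minutes
from t;

Note that interval '5 minutes' preceding also includes a row that is exactly 300 seconds older; use interval '299 seconds' to match the epoch-based frame above.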
I have 2 columns and I need to get the period between them in days and hours.
Example: 2019-10-22 13:22:59 compared to GETDATE().
I need the result to be:
3 days 7 hours
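Since GETDATE() points at SQL Server, a minimal sketch of one way to do it, assuming the first column is called start_time in a table t (both names are made up for illustration):

select cast(datediff(minute, start_time, getdate()) / 1440 as varchar(10)) + ' days '
     + cast(datediff(minute, start_time, getdate()) % 1440 / 60 as varchar(10)) + ' hours'
       as elapsed
from t;

DATEDIFF(minute, ...) gives the total elapsed minutes; integer division by 1440 yields whole days, and the remainder divided by 60 yields the leftover hours. Bear in mind that DATEDIFF counts minute-boundary crossings, so the result can be off by up to a minute.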
This is a slightly adjusted version of the question here: Calculating Weekly Returns from Daily Time Series of Prices, which was answered by @Scott Craner:
I want to calculate weekly returns of a mutual fund from a time series of daily prices. My data looks like this:
A B C D E
DATE WEEK W.DAY MF.PRICE WEEKLY RETURN
02/01/12 1 1 2,7587 -0,0108
03/01/12 1 2 2,7667
04/01/12 1 3 2,7892
05/01/12 1 4 2,7666
06/01/12 1 5 2,7391
09/01/12 2 1 2,7288 0,0067
10/01/12 2 2 2,6707
11/01/12 2 3 2,7044
12/01/12 2 4 2,7183
13/01/12 2 5 2,7619
16/01/12 3 1 2,7470 0,0511
17/01/12 3 2 2,7878
18/01/12 3 3 2,8156
19/01/12 3 4 2,8310
20/01/12 3 5 2,8760
23/01/12 4 1 2,8875
The date is in dd/mm/yy format and "," is the decimal separator. The return would be computed with this formula: (price for the first weekday of the next week - price for the first weekday of the current week) / (price for the first weekday of the current week). For example, the return for the first week is (2,7288 - 2,7587)/2,7587 = -0,0108 and for the second it is (2,7470 - 2,7288)/2,7288 = 0,0067.
The problem is that the list goes on for a year, and some weeks have fewer than five working days due to holidays or other reasons. Some weeks start with weekday 2, some end with weekday 3, so I can't simply copy and paste the formula above. I added the two extra columns for week number and weekday using the WEEKNUM and WEEKDAY functions, thinking it might help. I want to automate this with a formula and hope to get a table like this:
WEEK   RETURN
1      -0,0108
2      0,0067
3      0,0511
...
I'm looking for a way to tell Excel to find the prices that correspond to the minimum weekdays of two consecutive weeks and apply the formula (price for the first weekday of the next week - price for the first weekday of the current week) / (price for the first weekday of the current week).
I would appreciate any help! (I have 5 separate worksheets for consecutive years, each with daily prices of 20 mutual funds.)
It seems to me that you can generate your column E with this formula in E2:
=IF(B2=B1, "", (VLOOKUP(1+B2, B3:D9, 3, FALSE) - D2)/D2)
It's a VLOOKUP limited to the next 7 rows below each row that starts a new week.
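For instance, in E2 the lookup VLOOKUP(1+B2, B3:D9, 3, FALSE) searches the next seven rows for week 2 and returns 2,7288, the price on that week's first listed day, so E2 evaluates to (2,7288 - 2,7587)/2,7587 = -0,0108, matching the expected first-week return. Since a week has at most five price rows, the 7-row window always reaches the next week's first entry.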
Copying it into all the cells of column E will give the result shown in your first table. Turning that result into the (Week, Return) list is then just a matter of a filter that hides the blanks in E.
Notice that a problem could occur if WEEKNUM restarts from one when a new year is reached, but since you say that each of your sheets covers one calendar year, it shouldn't happen.
I've been struggling with the following problem:
EXPLAINING
I have a table called part_subhourly_data that holds production data for a part (for the purpose of this problem, there is no need to know what a part is).
I need to archive any data older than 100 days. But since there is a lot of data (rows arrive every 5 or 10 minutes) and we have more than 1000 parts, I need to do it the 5 oldest days at a time.
This is the schema of my table:
part_subhourly_data (
    id          INTEGER,
    part_id     INTEGER,
    produced_at TIMESTAMP,
    data        HSTORE
)
So basically I need to get all the data in this table where produced_at is prior to 100 days ago, limited to the first 5 days per part.
Example:
Part 1 has data from 15 Aug 2016 until 12 Dec 2016
Part 2 has data from 1st Sep 2016 until 12 Dec 2016
100 days ago would be 3 Sep 2016.
For Part 1 I would take data from 15 Aug 2016 until 19 Aug 2016 (5 days).
For Part 2 I would take data from 1st Sep 2016 until 3 Sep 2016 (3 days because of the 100 days old condition).
WHAT HAVE I TRIED
Well, I'm using Rails on this, but a SQL solution is welcome as well. For now, what I'm doing is grabbing the oldest data with:
SELECT "part_subhourly_data"."part_id", MIN(produced_at) produced_at
FROM "part_subhourly_data"
WHERE (produced_at < (NOW() - INTERVAL '100 days'))
GROUP BY "part_subhourly_data"."part_id"
And then I loop over each part_id and grab the data based on the MIN(produced_at). It works, but it doesn't seem ideal. I'm sure that there is some SQL magic to make it simpler and quicker, without having to loop over each part.
Any idea?
1. Take all records where produced_at is prior to 100 days ago.
2. Dense-rank the records per part_id, ordered by produced_at::date in ascending order: the records with the oldest date get 1, the records with the next oldest date get 2, and so on.
3. Keep only the records ranked 5 or lower, i.e. the 5 oldest days per part.
select part_id, produced_at
from (select part_id, produced_at,
             dense_rank() over (partition by part_id order by produced_at::date) as dr
      from part_subhourly_data
      where produced_at < now() - interval '100 days'
     ) p
where dr <= 5;
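If you also want to move those rows out in a single round trip instead of looping in Rails, here is a hedged sketch using Postgres data-modifying CTEs; the archive table part_subhourly_data_archive is assumed to exist with the same columns (its name is made up for illustration):

with oldest as (
    -- same ranking query as above, keeping only the ids of the 5 oldest days per part
    select id
    from (select id,
                 dense_rank() over (partition by part_id order by produced_at::date) as dr
          from part_subhourly_data
          where produced_at < now() - interval '100 days'
         ) p
    where dr <= 5
), moved as (
    -- delete the selected rows and hand them to the insert below
    delete from part_subhourly_data d
    using oldest o
    where d.id = o.id
    returning d.*
)
insert into part_subhourly_data_archive
select * from moved;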