Rolling 31 day average including previous 31 days from BigQuery - sql

I've created a query that returns a counts the number of rows (records) for the last 31 days (based on a timestamp field) and include the previous 31 days before that period as well, eg. produce a query that returns both. I now have the following query:
SELECT
COUNT(*) OVER(ORDER BY datetime DESC RANGE BETWEEN 2678400000 PRECEDING AND CURRENT ROW) AS rolling_avg_31_days,
COUNT(*) OVER(ORDER BY datetime DESC RANGE BETWEEN 5356800000 PRECEDING AND CURRENT ROW) AS rolling_avg_62_days
FROM `p`
ORDER BY rolling_avg_31_days DESC LIMIT 1
And it returns some data, but not really the data I was hoping for:
rolling_avg_31_days | rolling_avg_62_days
8,422,783 | 9,790,304
If I query the same table with (rolling 62 days):
SELECT COUNT(*) FROM `p`
WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-5356800000 AND datetime < UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000'
I get a value of 6,192,920
I'm not sure what I'm doing wrong. Any help is much appreciated!

So, the first query is correct and gives you rolling counts (31 and 62 days) based on the timestamp field - also, because of order by .. desc and limit 1 you are getting the most row that has biggest rolling_avg_31_days which is not necessarily row for the most recent () datetime
The second query just produces count of rows between 62 and 31 days based on the current timestamp - which is as explain above is not what first query produces - thus the discrepancy
To further troubleshoot or to try to understand difference - change ORDER BY rolling_avg_31_days DESC LIMIT 1 to ORDER BY datetime DESC LIMIT 1 and also add datetime to select statement so you can see if it belong to current date or close to current statement so results are comparable

Instead of going with the above, I've decided to change the query to be a bit simpler:
SELECT
(SELECT COUNT(DISTINCT(wasabi_user_id)) FROM `p` WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-5356800000 AND datetime < UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000) as _62days,
(SELECT COUNT(DISTINCT(wasabi_user_id)) FROM `p` WHERE datetime > UNIX_MILLIS(CURRENT_TIMESTAMP)-2678400000) AS _31days
FROM `mycujoo_kafka_public.v_web_event_pageviews` LIMIT 1
Thanks #Mikhail for the help though!

Related

sum last n days quantity using sql window function

I am trying to create following logic in Alteryx and data is coming from Exasol database.
Column “Sum_Qty_28_days“ should sum up the values of “Qty ” column for same article which falls under last 28 days.
My sample data looks like:
and I want following output:
E.g. “Sum_Qty_28_days” value for “article” = ‘A’ and date = ‘’2019-10-8” is 8 because it is summing up the “Qty” values associated with dates (coming within previous 28 days) Which are:
2019-09-15
2019-10-05
2019-10-08
for “article” = ‘A’.
Is this possible using SQL window function?
I tried myself with following code:
SUM("Qty") OVER (PARTITION BY "article", date_trunc('month',"Date")
ORDER BY "Date")
But, it is far from what I need. It is summing up the Qty for dates falling in same month. However, I need to sum of Qty for last 28 days.
Thanks in advance.
Yes, this is possible using standard SQL and in many databases. However, this will not work in all databases:
select t.*,
sum(qty) over (partition by article
order by date
range between interval '27 day' preceding and current row
) as sum_qty_28_days
from t;
If your RDBMS does not support the range frame, an alternative solution is to use an inline subquery:
select
t.*,
(
select sum(t1.qty)
from mytable t1
where
t1.article = t.article
and t1.date between t.date - interval 28 days and t.date
) sum_qty_28_days
from mytable t

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2
You didn't show us your table definition, but something like this:
select *
from (
select t.*,
t.timestamp_column,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp not date (otherwise the result of the difference wouldn't be an interval)

how to display last 10 numbers in sql

i have a table A with two column (number varchar(600),Date_ varchar(800))
now i have to display last 10 numbers order by Date_.
SELECT top(10) Number,Date FROM A ORDER BY Date_ DESC,
the problem is that for one month its showing result as desired,
but as soon next month start it not showing result as desired
i want the result like this.
10,2/2/2016
22,1/2/2016
10,31/1/2016
20,30/1/2016
30,29/1/2016
23,28/1/2016
20,27/1/2016
11,26/1/2016
18,25/1/2016
62,24/1/2016
56,23/1/2016
54,22/1/2016
44,21/1/2016
i am getting this result for --/1/2016 month but not for --/2/2016.
so kindly help.
Try the below script
SELECT top(10) Number,Date
FROM A
ORDER BY convert(datetime,Date,103) DESC
If you don't want to/can't change the structure of your table, then you need to use Parsing.
SELECT TOP 10 PARSE(Number AS int) AS Number,
PARSE(Date AS datetime2) AS Date
FROM A
ORDER BY Date DESC
You may need to do a PARSE in your ORDER BY as well.
Just a small change to your code should fix this
SELECT top(10) Number,Date FROM A ORDER BY cast(DATE_ as date) DESC.
Typically dates are stored as numbers in Microsoft world, i.e. 1/1/1900 is 1
1/2/1900 is 2
1/31/1900 is 31 and so on...
So changing your varchar to a date (provided there is no junk in the field) should fix this.

Change select to a previous date

I have basic knowledge of SQL and have a question:
I am trying to select data from a time series (date and windspeed). I want to select the original wind speed value if it lies between hours 7 and 21. If the hour is outside this range I would like to assign the wind speed to the previous wind speed at hour 21. There is also a concern that there is the occasional point where hour 21 does not exist and would like to assign the windspeed as hour 20... 19 etc until it finds the next available hour.
SELECT
date,
CASE WHEN DATEPART(HH,date) < 7 OR DATEPART(HH,date) > 21
THEN '<WIND SPEED AT HOUR 21> ELSE <WIND SPEED> END AS ModifiedWindspeed
,WindSpeed, winddirection
from TerrainCorrectedHourlyWind w
This might make things clearer. If the hour is in the specified range, select windspeed. If not then select the wind speed from the prior day at 21 hours.
Though you've tagged the question mysql, I'm guessing this is actually SQL Server because of the DATEPART() function used. Try the following, which uses an OUTER APPLY to get your alternate value:
SELECT Date
, CASE
WHEN DATEPART(HOUR, Date)BETWEEN 7 AND 21 THEN w.WindSpeed
ELSE m.WindSpeed
END AS ModifiedWindSpeed
, w.WindSpeed
, w.WindDirection
FROM TerrainCorrectedHourlyWind AS w
OUTER APPLY(SELECT TOP 1 WindSpeed
FROM TerrainCorrectedHourlyWind
WHERE DATEPART(HOUR, Date)BETWEEN 7 AND 21
AND Date < w.Date
ORDER BY Date DESC)AS m;
Just to explain what this is doing--the OUTER APPLY will get the single most recent record (TOP 1 and ORDER BY Date DESC) for dates prior to the record in question (Date < w.Date) as well as within the hours specified. The CASE near the top chooses whether to use the current value or this alternate one based on the hour.

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My Stored Procedure has parameter #CurrentDate, I want to find out the date in the column Y in above table dbo.X which is less than and closest to #CurrentDate.
How to find it?
The where clause will match all rows with date less than #CurrentDate and, since they are ordered descendantly, the TOP 1 will be the closest date to the current date.
SELECT TOP 1 *
FROM x
WHERE x.date < #CurrentDate
ORDER BY x.date DESC
Use DateDiff and order your result by how many days or seconds are between that date and what the Input was
Something like this
select top 1 rowId, dateCol, datediff(second, #CurrentDate, dateCol) as SecondsBetweenDates
from myTable
where dateCol < #currentDate
order by datediff(second, #CurrentDate, dateCol)
I have a better solution for this problem i think.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX Rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed and as such some rates have zero values. I want to fill the missing data with rates for that same currency that as closest in time to the missing rate. Basically I want to get the RateId for the nearest non zero rate which I will then substitute. (This is not shown here in my example.)
1) So to start off lets identify the missing rates information:
Query showing my missing rates i.e. have a rate value of zero
2) Next lets identify rates that are not missing.
Query showing rates that are not missing
3) This query is where the magic happens. I have made an assumption here which can be removed but was added to improve the efficiency/performance of the query. The assumption on line 26 is that I expect to find a substitute transaction on the same day as that of the missing / zero transaction.
The magic happens is line 23: The Row_Number function adds an auto number starting at 1 for the shortest time difference between the missing and non missing transaction. The next closest transaction has a rownum of 2 etc.
Please note that in line 25 I must join the currencies so that I do not mismatch the currency types. That is I don't want to substitute a AUD currency with CHF values. I want the closest matching currencies.
Combining the two data sets with a row_number to identify nearest transaction
4) Finally, lets get data where the RowNum is 1
The final query
The query full query is as follows;
; with cte_zero_rates as
(
Select *
from fxrates
where (spot_exp = 0 or spot_exp = 0)
),
cte_non_zero_rates as
(
Select *
from fxrates
where (spot_exp > 0 and spot_exp > 0)
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifferece
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate