7 day rolling average using Unix seconds in Hive - hive

I need to find the 7 day rolling average of the temperature.
Using the date strings provided in my dataset, I created a Unix timestamp. I replaced the first timestamp on each day with the Unix timestamp for midnight of that day.
There are 604800 unix seconds in one week, so I tried using the following code to calculate it, but it did not work. How can I fix this code so it performs the window calculation correctly?
DROP VIEW IF EXISTS every_7_days;
CREATE VIEW every_7_days AS
SELECT weather_dt,
time,
fixed_unix_time,
temperature,
avg(temperature) OVER(ORDER BY fixed_unix_time RANGE BETWEEN 604800 PRECEDING AND CURRENT ROW) AS roll7day_avg
FROM clean_first_row
ORDER BY fixed_unix_time;
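As a reference for what the window in the query above is meant to compute, here is a minimal Python sketch of a RANGE-style rolling average keyed on Unix seconds. The function name rolling_avg and the sample readings are hypothetical, and the sketch assumes the timestamps are unique and already sorted. It is not Hive code and does not diagnose why the view fails; it only pins down the expected result.

```python
from collections import deque

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800, as in the question


def rolling_avg(rows, window=WEEK_SECONDS):
    """Yield (unix_time, temperature, average) for rows sorted by unix_time.

    Mirrors RANGE BETWEEN <window> PRECEDING AND CURRENT ROW: every row whose
    timestamp lies in [t - window, t] contributes to the average at time t.
    Assumes timestamps are unique (no peer rows sharing the same value).
    """
    buf = deque()  # (unix_time, temperature) pairs currently inside the window
    total = 0.0
    for t, temp in rows:
        buf.append((t, temp))
        total += temp
        # Drop readings that are more than `window` seconds older than t.
        while buf[0][0] < t - window:
            _, old_temp = buf.popleft()
            total -= old_temp
        yield t, temp, total / len(buf)


# Hypothetical readings: one per midnight, temperature equal to the day index.
sample = [(day * 86400, float(day)) for day in range(10)]
result = list(rolling_avg(sample))
```

Note that the bound is inclusive at both ends, so with one reading per midnight the window holds eight readings, not seven. A window of 6 days (518400 seconds) would cover exactly seven midnights.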

Related

Calculate the average time between two dates

I need to find the result of a calculation that is nothing more than the average time in days from creation to completion of a task.
In this case, using a Redshift database (looker).
I have two dates (2022/10/01 to 2022/10/21) and I need to find the average day of execution of the creation of an object from start to finish.
Previously, I was able to calculate the totals of objects created per day, but I can't bring up the average:
SELECT created::date, count(n1pk_package_id)
FROM dbt_dw.base_package
WHERE fk_company_id = 245821 and created >= '2022-10-01' and created < '2022-10-22'
GROUP BY created::date
ORDER BY created DESC
I can't work out how to go from this per-day count to the average over the range of days.
Assumption:
There is a created column in your table
You want to know the 'average' of the created column
You could extract the number of days that each date is different from a base date, and then use that to determine the 'average date'. It would be something like this:
select
date '2022-10-01' + interval '1 day' * int(avg(created - date '2022-10-01'))
from table
It subtracts a date (any date will do) from created, finds the average of that value against all desired rows, converts it to days and adds it back to that same date.
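The same idea can be sketched in Python to check the arithmetic outside the database. The function name average_date and the sample dates are hypothetical, and int() here stands in for the int(...) step in the SQL above.

```python
from datetime import date, timedelta


def average_date(dates, base=date(2022, 10, 1)):
    """Average a list of dates by averaging their day offsets from `base`."""
    offsets = [(d - base).days for d in dates]
    mean_days = sum(offsets) / len(offsets)
    # int() truncates the fractional part, so the result is a whole date.
    return base + timedelta(days=int(mean_days))


# Hypothetical created dates within the range from the question.
created = [date(2022, 10, 1), date(2022, 10, 11), date(2022, 10, 21)]
print(average_date(created))
```

Changing base to any other date gives the same result for this sample, which is the point of "any date will do".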

Date Functions Trunc (SysDate)

I am running the query below to get data recorded in the past 24 hours. I need the same data recorded since midnight (DATE > 12:00 AM), and also data recorded since the beginning of the month. I'm not sure if using BETWEEN will work or if there is a better option. Any suggestions?
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CNLD'
AND
TRUNC(TO_DATE('1970-01-01','YYYY-MM-DD') + OPEN_DATE/86400) = trunc(sysdate)
Output (just need the count). The OPEN_DATE data type is NUMBER. The output below displays the count for the last 24 hours. I need one count starting at midnight and another starting at the beginning of the month.
The query you've shown will get the count of rows where OPEN_DATE is an 'epoch date' number representing time after midnight this morning*. The condition:
TRUNC(TO_DATE('1970-01-01','YYYY-MM-DD') + OPEN_DATE/86400) = trunc(sysdate)
requires every OPEN_DATE value in your table (or at least all those for CNLD rows) to be converted from a number to an actual date, which is going to be doing a lot more work than necessary, and would stop a standard index against that column being used. It could be rewritten as:
OPEN_DATE >= (trunc(sysdate) - date '1970-01-01') * 86400
which converts midnight this morning to its epoch equivalent, once, and compares all the numbers against that value; using an index if there is one and the optimiser thinks it's appropriate.
To get everything since the start of the month you could just change the default behaviour of trunc(), which is to truncate to the 'DD' element, to truncate to the start of the month instead:
OPEN_DATE >= (trunc(sysdate, 'MM') - date '1970-01-01') * 86400
And for the last 24 hours, subtract a day from the current time instead of truncating it:
OPEN_DATE >= ((sysdate - 1) - date '1970-01-01') * 86400
db<>fiddle with some made-up data to get 72 back for today, more for the last 24 hours, and more still for the whole month.
Based on your current query I'm assuming there won't be any future-dated values, so you don't need to worry about an upper bound for any of these.
*Ignoring leap seconds...
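To sanity-check the three lower bounds outside Oracle, the same arithmetic can be sketched in Python. The names to_epoch and lower_bounds are hypothetical, and the sketch assumes naive datetimes in the same time zone the epoch numbers were written in.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch(dt):
    """Seconds between 1970-01-01 and a naive datetime (leap seconds ignored)."""
    return int((dt - EPOCH).total_seconds())


def lower_bounds(now):
    """Return epoch lower bounds for today, this month, and the last 24 hours."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        "today": to_epoch(midnight),                    # trunc(sysdate)
        "month": to_epoch(midnight.replace(day=1)),     # trunc(sysdate, 'MM')
        "last_24h": to_epoch(now - timedelta(days=1)),  # sysdate - 1
    }


# Hypothetical "current time" so the result is reproducible.
bounds = lower_bounds(datetime(2022, 10, 21, 15, 30))
```

Each value is computed once and can then be compared directly against the numeric column, which is what lets an index on OPEN_DATE be used.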
It sounds like you have a column that is of data type TIMESTAMP and you only want to select rows where that TIMESTAMP indicates that it is today's date? And as a related problem, you want to find those that are the current month, based on some system values like CURRENT TIMESTAMP and CURRENT DATE? If so, let's call your column TRANSACTION_TIMESTAMP instead of (reserved word) DATE. Your first query could be:
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CNLD'
AND
DATE(TRANSACTION_TIMESTAMP)=CURRENT DATE
The second example of finding all for the current month up to today's date could be:
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CNLD'
AND
YEAR(DATE(TRANSACTION_TIMESTAMP))=YEAR(CURRENT DATE) AND
MONTH(DATE(TRANSACTION_TIMESTAMP))=MONTH(CURRENT DATE) AND
DAY(DATE(TRANSACTION_TIMESTAMP))<=DAY(CURRENT DATE)

SQL timestamp filtering based only on time

I want to create a query in Oracle SQL that will grab records from a given time interval, during certain hours of the day, e.g. records between 10am to noon, in the past 10 days. I tried this, but it does not work:
select * from my_table where timestamp between
to_timestamp('2020-12-30','YYYY-MM-DD')
and
to_timestamp('2021-01-08','YYYY-MM-DD') and
timestamp between
to_timestamp('10:00:00','HH24:MI:SS')
and
to_timestamp('12:00:00','HH24:MI:SS')
where timestamp is of type TIMESTAMP. I have also thought of using a join, but I am struggling to find a way to filter on time of day.
Is there a way to filter using only the time, not the date, or a way to filter on time for every day in the interval?
select *
from my_table
where timestamp between to_timestamp('2020-12-30','YYYY-MM-DD')
and to_timestamp('2021-01-08','YYYY-MM-DD')
and timestamp - trunc(timestamp) between interval '10' hour
and interval '12' hour
If you don't need to include exactly noon (including no fractional seconds), you could also do
select *
from my_table
where timestamp between to_timestamp('2020-12-30','YYYY-MM-DD')
and to_timestamp('2021-01-08','YYYY-MM-DD')
and extract( hour from timestamp ) between 10 and 11
As an aside, I'd hope that your actual column name isn't timestamp. It's legal as a column name, but it is also the name of a data type, so you're generally much better off using a different name.
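The two-part filter (a date range plus a time-of-day window) can be sketched in Python to make the boundary behaviour explicit. The function name in_window and the sample rows are hypothetical.

```python
from datetime import datetime, time


def in_window(ts, start_day, end_day, start_time=time(10), end_time=time(12)):
    """True if ts is in [start_day, end_day] and its time of day is in [start_time, end_time]."""
    return start_day <= ts <= end_day and start_time <= ts.time() <= end_time


start = datetime(2020, 12, 30)
end = datetime(2021, 1, 8)
rows = [
    datetime(2020, 12, 31, 9, 59),  # too early in the day
    datetime(2021, 1, 2, 10, 0),    # kept
    datetime(2021, 1, 5, 12, 0),    # kept: noon is inclusive, like BETWEEN
    datetime(2021, 1, 9, 11, 0),    # outside the date range
]
kept = [ts for ts in rows if in_window(ts, start, end)]
```

As in the SQL, the upper date bound is midnight at the start of 2021-01-08, so nothing later on that day would be kept.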

grafana: last 24 hours - shifted and 2 hours missing

I have a grafana chart showing the data of the last 24 hours
But the data does not fit the time axis. Two hours are missing at the beginning of the 24-hour period, and the last value at 21:27:57 is shown as 66.74, although at that time it was 73.50.
The time axis seems to be shifted by 2 hours. The data at time x shows the data of time x-2h.
The timestamp (datetime) in the SQL database is correct.
EDIT:
Changing the timezone doesn't help much. Using UTC (which is wrong for me), the most recent time on the time axis is about 20:40 (wrong).
Using UTC+2 (which fits my timezone), the most recent time is about 22:40, the correct local time when taking the screenshot.
The data is not affected, and 2 hours are still missing from the 24-hour period. The most recent value in the chart still shows the value from 2 hours ago.
I don't really understand why, but I figured out that UNIX_TIMESTAMP() is needed:
SELECT
UNIX_TIMESTAMP(timestamp) AS "time",
humidity
FROM Sensor_BME280_01
WHERE
$__timeFilter(timestamp)
ORDER BY timestamp
instead of
SELECT
timestamp AS "time",
humidity
FROM Sensor_BME280_01
WHERE
$__timeFilter(timestamp)
ORDER BY timestamp
The value timestamp is of type DATETIME in a MariaDB.
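One possible reading of why UNIX_TIMESTAMP() helps, offered as an assumption rather than a confirmed diagnosis: a bare DATETIME carries no time zone, so if the panel treats it as UTC while the values were written in local time (UTC+2 here), every point lands two hours off. The Python sketch below only illustrates that arithmetic; the sample value and the UTC+2 zone are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# A bare DATETIME value as stored, with no time zone attached (hypothetical reading).
stored = datetime(2022, 6, 1, 21, 27, 57)

local_tz = timezone(timedelta(hours=2))  # assumed session time zone: UTC+2

# Interpretation 1: the value is taken to be UTC wall-clock time.
as_utc = stored.replace(tzinfo=timezone.utc).timestamp()

# Interpretation 2: the value is local time and is converted to an epoch,
# which is roughly what UNIX_TIMESTAMP() does using the session time zone.
as_local = stored.replace(tzinfo=local_tz).timestamp()

# Difference between the two interpretations, in hours.
shift_hours = (as_utc - as_local) / 3600
print(shift_hours)
```

An epoch value is unambiguous, so returning one removes the need to guess which zone the DATETIME was written in.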

datetime manipulation: replace all dates with 00:00 time with 24:00 the previous day

I have a table described here: http://sqlfiddle.com/#!3/f8852/3
The date_time field for when the time is 00:00 is wrong. For example:
5/24/2013 00:00
This should really be:
5/23/2013 24:00
So hour 00:00 corresponds to the last hour of the previous day (I didn't create this table but have to work with it). Is there a quick way, when I do a select, to replace all dates that have 00:00 as the time with 24:00 the previous day? I can do it easily in Python in a for loop, but I'm not quite sure how to structure it in SQL. Appreciate the help.
All datetimes are instants in time, not spans of finite length, and each belongs to exactly one day. The instant that represents midnight is, by definition, in the next day: the day of which it is the start. In other words, a day is closed at its beginning and open at its end, so the valid time values within a single calendar date run from 00:00:00.000 up to, but not including, 24:00:00.
This would be analogous to asking that the minute value within an hour be allowed to vary from 1 to 60, instead of from 0 to 59, and that the value of 60 was the last minute of the previous hour.
What you are talking about is only a display issue. Even if you could enter a date as 1 Jan 2013 24:00, (24:00:00 is not a legal time of day) it would be entered as a datetime at the start of the date 2 Jan, not at the end of 1 Jan.
One thing that illustrates this is rounding: SQL Server's datetime type can only resolve times to about 3.33 milliseconds (1/300 of a second), so if you create a datetime that is only a millisecond or so before midnight, it will round up to midnight and move to the next day, as can be seen by running the following in Enterprise Manager...
Select cast ('1 Jan 2013 23:59:59.999' as datetime)
SQL Server stores all datetimes as two integers, one that represents the number of days since 1 Jan 1900, and the other the number of ticks (1 tick is 1/300th of a second, about 3.33 ms) since midnight. If zero time has elapsed since midnight, it is still the same day, not the previous day.
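The roll-over can be modelled in Python as a rough sketch. The function name round_to_ticks is hypothetical, and rounding to the nearest 1/300 s tick is a simplification of the datetime type's actual rounding rule, so treat it as an illustration rather than an exact emulation.

```python
from datetime import datetime, timedelta

TICKS_PER_SECOND = 300  # one tick is 1/300 s, about 3.33 ms


def round_to_ticks(dt):
    """Round a datetime to the nearest 1/300 s tick since midnight (simplified model)."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    micros = (dt - midnight) // timedelta(microseconds=1)
    ticks = (micros * TICKS_PER_SECOND + 500_000) // 1_000_000  # round half up
    return midnight + timedelta(microseconds=ticks * 1_000_000 // TICKS_PER_SECOND)


# The same instant as the cast above: one millisecond before midnight.
rounded = round_to_ticks(datetime(2013, 1, 1, 23, 59, 59, 999000))
print(rounded)
```

In this model the value lands on tick 25,920,000, which is a full day's worth of ticks, so the result belongs to 2 Jan rather than 1 Jan.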
If you have been inserting data assuming that midnight 00:00:00 means the end of the day, you need to fix that.
If you need to correct your existing data, you need to add one day to every date in your database that has midnight as its time component (i.e., has a zero time component).
Update table set
date_time = dateAdd(day, 1, date_time)
Where date_time = dateadd(day, datediff(day, 0, date_time), 0)
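If the 24:00 form is only needed for display, as the answer above suggests, the stored values can stay untouched and the label can be produced at output time. A minimal Python sketch, with the hypothetical function name format_end_of_day and the M/D/YYYY layout taken from the question:

```python
from datetime import datetime, timedelta


def format_end_of_day(dt):
    """Render exact midnight as 24:00 of the previous day; leave other times as-is."""
    if dt.time() == datetime.min.time():
        prev = dt - timedelta(days=1)
        return f"{prev.month}/{prev.day}/{prev.year} 24:00"
    return f"{dt.month}/{dt.day}/{dt.year} {dt:%H:%M}"


labels = [
    format_end_of_day(d)
    for d in (datetime(2013, 5, 24, 0, 0), datetime(2013, 5, 24, 1, 0))
]
print(labels)
```

The result is a string for display only; 24:00 is not a valid time of day, so it cannot be stored back as a datetime.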