Simple Averaging Algorithm is Slightly Off. Why? Active Record/PostgreSQL issue? - sql

In my Rails app, I have two custom Rake Tasks running every 30 minutes. Task A scrapes hourly prices from the internet and saves them to a database as HourlyPrice. Task B goes into the db, takes hourly prices from each day for the last seven days, and averages them to create a new DailyAveragePrice record in a separate DB Table.
However, when running Task B, the last day's (of the seven) average price is incorrect.
After fiddling with the hourly prices of that day in an Excel spreadsheet, I see that the average price Task B is generating is the result of taking only the last three hours and averaging them.
Task B is mostly done with this single query:
averages = HourlyPrice.where('date >= ?', 7.days.ago).average(:price, :group => "DATE_TRUNC('day', date - INTERVAL '1 hour')")
I can't figure out why this is happening.
Clues
HourlyPrice has two attributes (datetime,price). Each HourlyPrice actually represents a price for the previous hour. So, source data lists a 24:00:00 price for each day which PostgreSQL does not want to import as is into a datetime column. Instead, it converts all 24:00:00 prices to 00:00:00 of the next day. To make up for this, I've tried to subtract an hour interval, as you can see in the query. Is this causing the problem?
My ActiveRecord's time zone is currently set to 'Mountain Time (US & Canada)'. That is where the price exchange is located. I have not adjusted my PostgreSQL DB's timezone, and I believe it defaults to UTC. When running Task B, I noticed that it was 9:20PM UTC, leaving three hours left in the UTC day, which might explain the averaging of only three HourlyPrices of the last of the seven days. I'll try running Task B again in the next hour to see if it will average only two hours. Update to come... Is this timezone conflict causing a problem, or is what I am doing insulated from timezones since I have my own date columns?
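To illustrate the timezone clue, here is a minimal plain-Ruby sketch (not the app's actual code) showing how the same instant can fall on different calendar days in UTC and in Mountain Time. The fixed "-07:00" offset and the helper names are assumptions for illustration; a fixed offset ignores daylight saving.

```ruby
require "date"

# Assumption: Mountain Standard Time as a fixed offset (ignores daylight saving).
MOUNTAIN_STANDARD_OFFSET = "-07:00"

# Calendar day of the instant when viewed in UTC.
def day_in_utc(instant)
  instant.getutc.to_date
end

# Calendar day of the same instant when viewed in Mountain Standard Time.
def day_in_mountain(instant)
  instant.getlocal(MOUNTAIN_STANDARD_OFFSET).to_date
end

instant = Time.utc(2013, 1, 2, 2, 0, 0) # 02:00 UTC on January 2
puts day_in_utc(instant)      # prints 2013-01-02
puts day_in_mountain(instant) # prints 2013-01-01
```

If the database truncates to the day in UTC while the prices are meant in Mountain Time, hours near the day boundary end up grouped under a different day than intended.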
UPDATE - Problem identified, but how to fix?
Clue #2 is correct. It is a timezone issue. I just ran Task B again (an hour later, with 2 hours left until UTC day change), and it only averages two HourlyPrices now for the last of the seven days.
How can I fix my query above to average ONLY if there are 24 HourlyPrice records available?
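One way to think about "average only complete days" is sketched below in plain Ruby over in-memory [time, price] pairs. This is an illustration of the idea, not an ActiveRecord query; the method name and sample data are hypothetical. In SQL the corresponding construct would be a HAVING COUNT(*) = 24 clause on the grouped query.

```ruby
require "date"

HOUR = 3600 # seconds

# Groups [time, price] pairs by day (shifted back one hour, as in the query
# above) and averages only the days that have the expected number of records.
def complete_day_averages(hourly_prices, expected_count = 24)
  by_day = hourly_prices.group_by { |time, _price| (time - HOUR).to_date }
  complete = by_day.select { |_day, rows| rows.size == expected_count }
  complete.transform_values { |rows| rows.sum { |_time, price| price } / rows.size }
end

# Hypothetical sample: one full day (hours 01:00 through 24:00) and one
# partial day with only three hourly prices.
base = Time.utc(2013, 1, 1)
full_day = (1..24).map { |h| [base + h * HOUR, 10.0] }
partial_day = (25..27).map { |h| [base + h * HOUR, 20.0] }

averages = complete_day_averages(full_day + partial_day)
puts averages.keys.first # prints 2013-01-01
puts averages.size       # prints 1 (the partial day is skipped)
```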

Related

How to Schedule an Azure Data Factory Trigger for 30-minute intervals per day

I would like to create an Azure Data Factory trigger that runs every day at 30-minute intervals. However, I don't seem to be able to create a 30-minute interval per day. The nearest I can get is 1 hour.
E.g., I would like 6:30, 7:00, 7:30, 8:00, etc.
But, as you can see, I appear to be able to schedule only hourly per day.
Why not this? It's simpler if you want to start at the :00 or :30 minute mark.
Then set the start time accordingly.
I figured it out.
I simply adjusted the execution times to include start minutes of 0, with an interval of 30.

Date_diff with specific condition time start and time end

Is it possible to use date_diff with a specific start and end time?
Let's say my store is open from 8AM to 10PM, which is 14 hours.
I have a lot of stuff to sell during that time. One of the SKUs is out of stock from 2022-11-01 06.00 PM until the next day, 2022-11-02 11.00 AM.
Instead of counting every elapsed hour, I want to count only the hours when the store is open, until it closes or until the restock. That means 6PM to 10PM (4 hours) plus 8AM to 11AM (3 hours), which is 7 hours.
My query:
select date_diff('2022-11-02 11.00 AM', '2022-11-01 06.00 PM', hour) from table
The result is 17 hours instead of 7 hours.
There isn't a way to configure DATE_DIFF to do this for you, but it's possible to do what you want, with some effort.
You should convert your dates to timestamps (TIMESTAMP(yourdate) or CAST(yourdate AS TIMESTAMP)) and use TIMESTAMP_DIFF instead.
This will allow you to work with smaller intervals than days.
For your calculation, you ultimately need to find the total time difference between the two timestamps and then subtract the out-of-hours timeframe.
However, calculating the latter is not as simple as taking the difference in days and multiplying by 10 hours (10pm-8am), because your out-of-hours calculation has to account for weekends and possibly holidays etc. Hence it can get quite complex, which is where the solution in my first link might come in.
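As a rough illustration of the idea, the following plain-Ruby sketch sums the overlap between the out-of-stock window and each day's opening window, rather than subtracting closed time from the total. It assumes the 8AM to 10PM opening hours from the question, uses UTC times to sidestep daylight saving, and deliberately ignores weekends and holidays. The method and constant names are hypothetical.

```ruby
OPEN_HOUR = 8   # store opens at 8AM (from the question)
CLOSE_HOUR = 22 # store closes at 10PM (from the question)

# Returns the hours between start_time and end_time that fall inside the
# daily opening window. Weekends and holidays are NOT handled here.
def open_hours_between(start_time, end_time)
  total_seconds = 0.0
  day = Time.utc(start_time.year, start_time.month, start_time.day)
  while day <= end_time
    open_at = day + OPEN_HOUR * 3600
    close_at = day + CLOSE_HOUR * 3600
    overlap_start = [start_time, open_at].max
    overlap_end = [end_time, close_at].min
    total_seconds += overlap_end - overlap_start if overlap_end > overlap_start
    day += 24 * 3600
  end
  total_seconds / 3600
end

out_of_stock = Time.utc(2022, 11, 1, 18, 0, 0)
restocked = Time.utc(2022, 11, 2, 11, 0, 0)
puts open_hours_between(out_of_stock, restocked) # prints 7.0
```

For the window in the question, this counts 4 hours on the first evening (6PM to 10PM) and 3 hours the next morning (8AM to 11AM).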

SQL date time Query

I need help getting data in a particular format.
We have a table of production data. We need to select the data for each day, split by time period into three shifts: A, B, and C.
The table has a datetime column that captures data every second. We need that data shift-wise: 6am to 2pm is the shift A production count, 2pm to 10pm is shift B, and 10pm to 6am is shift C.
I can get the data for a single day with the query below, which works well.
select distinct(count(PRD_SERIAL_NUMBER)),(select convert(date,getdate())) as date,'B' as shift_name
from table_name
where status=02
and LAST_UPDATED_DATE
between (SELECT FORMAT(GETDATE(),'yyyy-MM-dd 14:01:00.000')) and
(SELECT FORMAT(GETDATE()-26,'yyyy-MM-dd 22:01:00.000'))
Refer to output image 1 below.
Here I get the count for a single day, and I have a solution for upcoming days. The question now is that I have the past 4 months of data, for which I need date-wise and shift-wise counts. The prd_serial_number column has duplicate entries, so the count should be distinct.
Please refer to image 2 below for the required output format.
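The grouping logic being asked for can be sketched in plain Ruby over in-memory [serial, time] pairs. This is only an illustration of the bucketing, not a T-SQL answer. One assumption to flag: records between midnight and 6am are attributed to shift C of the previous date, which the question does not state. All method names are hypothetical.

```ruby
require "date"
require "set"

# Maps a timestamp to its shift: A is 6am-2pm, B is 2pm-10pm, C is 10pm-6am.
def shift_for(time)
  case time.hour
  when 6...14 then "A"
  when 14...22 then "B"
  else "C"
  end
end

# Assumption: the production day starts at 6am, so early-morning shift C
# records count toward the previous calendar date.
def production_date(time)
  (time - 6 * 3600).to_date
end

# Counts distinct serial numbers per [production date, shift].
def distinct_counts(rows)
  serials = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each do |serial, time|
    serials[[production_date(time), shift_for(time)]] << serial
  end
  serials.transform_values(&:size)
end

rows = [
  ["SN1", Time.utc(2023, 1, 1, 7, 0, 0)],
  ["SN1", Time.utc(2023, 1, 1, 9, 0, 0)], # duplicate serial, same shift
  ["SN2", Time.utc(2023, 1, 2, 3, 0, 0)]  # early morning, previous day's shift C
]
distinct_counts(rows).each { |(date, shift), count| puts "#{date} #{shift} #{count}" }
```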

An Advanced Query Date Grouping Dilemma

In my Rails app's PostgreSQL DB are records containing hourly prices for the last 10 years:
10(24 x 365) of these: "12/31/2012 01:00:00", "11.99"
The following query groups prices by day, averages the prices in those daily groupings to create daily price averages, and returns "day", "daily average" pairs for each day:
HourlyPrice.average(:price, :group => "DATE_TRUNC('day', date)")
The problem is, the hourly prices in my source data actually reflect the price for the previous hour. So, in my data source .CSV, the day starts at the time 01:00:00 and ends at the time 24:00:00.
This conflicts with how PostgreSQL likes to save records in its DateTime column. Upon importing the CSV data, PostgreSQL converts my records containing the time 24:00:00 to 00:00:00 of the next day.
This throws off the accuracy of my averaging query above. To fix the query, I still want to group by day, but offset by 1 hour, so that the range averaged starts at 01:00:00 and ends with the 00:00:00 value of the next day.
Is it possible to adjust the above query to reflect this?
You could subtract one hour from date before applying the DATE_TRUNC function to it, like this:
HourlyPrice.average(:price, :group => "DATE_TRUNC('day', date - INTERVAL '1 hour')")

A Database DateTime Value Conflict

I have hourly price data for 10 years. Meaning, 24 prices for each day.
The problem is, the price is from the previous hour of trading. So, the source of my data has listed a 24th hour for each day, and there is no 0 hour.
Example (for further clarity):
The records for a day start at: 07/20/2010 01:00:00
The records for a day end at: 07/20/2010 24:00:00
This conflicts with the way my Rails app's PostgreSQL DB wants to save DateTime values. When I imported this data from CSV into my DB and saved the dates into a DateTime column, it changed all of the 24:00:00 values into 00:00:00 of the following day. This throws off the accuracy of my various end uses.
Is there any way I can modify my Postgres DB's behavior to not do this? Any other suggestions?
You could always subtract an hour after you perform the import.
I don't know your database schema so to do this in a general fashion you'd have to execute this SQL on each column that has a date.
UPDATE table SET date_field = date_field - INTERVAL '1 hour'
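An alternative to updating after the import is to normalize each timestamp while reading the CSV. The plain-Ruby sketch below parses the source format shown in the question and returns the start of the hour each price covers, so 24:00:00 never reaches the database. The method name is hypothetical, and treating the values as UTC is an assumption.

```ruby
require "date"

# Parses a source timestamp such as "07/20/2010 24:00:00" (hours run 1..24)
# and returns the start of the hour the price covers.
# Assumption: values are treated as UTC for simplicity.
def hour_start_from_source(text)
  date_part, time_part = text.split(" ")
  day = Date.strptime(date_part, "%m/%d/%Y")
  hour = time_part.split(":").first.to_i
  Time.utc(day.year, day.month, day.day) + (hour - 1) * 3600
end

puts hour_start_from_source("07/20/2010 24:00:00") # prints 2010-07-20 23:00:00 UTC
puts hour_start_from_source("07/20/2010 01:00:00") # prints 2010-07-20 00:00:00 UTC
```

With this approach every record for 07/20/2010 stays on 07/20/2010, which has the same effect as subtracting one hour after the import.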