SQL: select one record for each day nearest to a specific time

I have one table that stores values with a point in time:
CREATE TABLE values
(
value DECIMAL,
datetime DATETIME
)
There may be many values on each day, or only one value for a given day. For each day in a given timespan (e.g. one month), I want to get the value that is nearest to a given time of day. I want exactly one value per day if there are records for that day, and no value if there are none. My database is PostgreSQL. I'm quite stuck on this. I could fetch all values in the timespan and select the nearest value for each day programmatically, but that would mean pulling a huge amount of data from the database, because there can be many values on one day.
(Update)
To put it more abstractly: I have data of arbitrary precision (the interval could be one minute, two hours or two days) and I want to convert it to a fixed precision of one day, at a specific time of day.
(second update)
This is the query from the accepted answer with correct PostgreSQL type conversions, assuming the desired time is 16:00:
SELECT datetime, value FROM values, (
SELECT DATE(datetime) AS date, MIN(ABS(EXTRACT(EPOCH FROM TIME '16:00' - CAST(datetime AS TIME)))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE
CAST(values.datetime AS TIME) BETWEEN TIME '16:00' - CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND TIME '16:00' + CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND DATE(values.datetime) = besttimes.date

How about going in this direction?
SELECT values.value, values.datetime
FROM values,
( SELECT DATE(datetime) AS date, MIN(ABS(_WANTED_TIME_ - TIME(datetime))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE TIME(values.datetime) BETWEEN _WANTED_TIME_ - besttimes.timediff
AND _WANTED_TIME_ + besttimes.timediff
AND DATE(values.datetime) = besttimes.date
I am not sure about the date/time extraction and abs(time) functions, so you will probably have to replace them with your database's equivalents.
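As a rough check of the idea above (per day, find the smallest distance to the wanted time, then keep the row at that distance), here is a minimal sketch that runs the same join-against-the-minimum query through Python's sqlite3 module. SQLite stands in for PostgreSQL purely so the example is self-contained. The table name readings, the column ts and the helper names are assumptions, and the seconds-of-day expression replaces the CAST(... AS TIME) arithmetic, which SQLite does not have.

```python
import sqlite3


def seconds_of_day(col):
    """SQLite expression for the number of seconds since midnight of `col`."""
    return (
        f"(CAST(strftime('%H', {col}) AS INTEGER) * 3600"
        f" + CAST(strftime('%M', {col}) AS INTEGER) * 60"
        f" + CAST(strftime('%S', {col}) AS INTEGER))"
    )


def nearest_per_day(conn, target_seconds):
    """Return, for each day, the (ts, value) row closest to the target time.

    If two rows on the same day are equally close, both are returned.
    """
    sql = f"""
        SELECT r.ts, r.value
        FROM readings AS r
        JOIN (
            -- smallest distance (in seconds) to the target time, per day
            SELECT date(ts) AS day,
                   MIN(ABS({seconds_of_day('ts')} - :target)) AS best
            FROM readings
            GROUP BY date(ts)
        ) AS b
          ON date(r.ts) = b.day
         AND ABS({seconds_of_day('r.ts')} - :target) = b.best
        ORDER BY r.ts
    """
    return conn.execute(sql, {"target": target_seconds}).fetchall()


def demo_connection():
    """Build an in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (value REAL, ts TEXT)")
    conn.executemany(
        "INSERT INTO readings (value, ts) VALUES (?, ?)",
        [
            (1.0, "2024-01-01 09:00:00"),
            (2.0, "2024-01-01 15:30:00"),
            (3.0, "2024-01-01 17:00:00"),
            (4.0, "2024-01-03 08:00:00"),  # no rows at all on 2024-01-02
        ],
    )
    return conn


for row in nearest_per_day(demo_connection(), 16 * 3600):  # 16:00
    print(row)
```

Days without any rows simply produce no output row, because the join has nothing to match. On PostgreSQL the same shape should work with the CAST-based expressions from the question.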

It appears you have two parts to solve:
Are there any results for a day at all?
If there are, then which is the nearest one?
By short-circuiting at part 1 when a day has no results, you'll save a lot of execution time.
The next thing to note is that you don't have to pull the data out of the database: you can work out the answer on the server first, using PL/pgSQL functions (or another server-side mechanism).
Once you have a selection of times to check, you can use intervals to compare them. Check the Postgres docs on intervals and date/time functions for precise instructions, but basically you subtract each candidate timestamp from the one you've given, and the candidate with the smallest absolute interval is the one you want.

Related

How can I output values for time intervals with no data in QuestDB

I am using QuestDB to get the number of events we receive every 500 milliseconds. Everything works as expected and I can use SAMPLE BY 500T to aggregate in half-second intervals.
However, for the intervals where we don't have any data, we are not getting any rows. I guess this is expected, but it would be good to have some way of getting a row for those intervals just with null or empty values.
Luckily in QuestDB you have the FILL keyword to do exactly that. Take this query running at the public QuestDB demo:
SELECT
timestamp, count()
FROM trades
WHERE timestamp > dateadd('d', -1, now())
SAMPLE BY 500T ALIGN TO CALENDAR;
In this case I am aggregating every 500 milliseconds and getting results only for the intervals where I have data. I am limiting to only the past day. You can run this on the demo site as it is a live dataset and you should see gaps for some intervals.
Now, by using FILL, I can add rows for the periods with no values:
SELECT
timestamp, count()
FROM trades
WHERE timestamp > dateadd('d', -1, now())
SAMPLE BY 500T FILL(NULL) ALIGN TO CALENDAR;
Note that you could also fill with LINEAR (linear interpolation of previous and next rows), PREV for the value of the row before, or with a constant value.
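To make the effect of FILL concrete, here is a small Python sketch of gap filling over fixed-width buckets. It only illustrates the idea and is not how QuestDB implements it; the function name fill_buckets, its parameters and the dict-of-counts input are assumptions made for the example.

```python
def fill_buckets(counts, start_ms, end_ms, step_ms=500, fill=None, use_prev=False):
    """Return (bucket_start, count) for every bucket in [start_ms, end_ms).

    counts maps bucket start (ms) to a count for buckets that have data.
    Empty buckets get `fill`, or the previous bucket's value if use_prev is True.
    """
    rows = []
    prev = fill  # nothing seen yet, so fall back to the constant
    for bucket in range(start_ms, end_ms, step_ms):
        if bucket in counts:
            prev = counts[bucket]
            rows.append((bucket, prev))
        else:
            rows.append((bucket, prev if use_prev else fill))
    return rows


sparse = {0: 3, 1000: 1}  # no events in the 500 ms and 1500 ms buckets
print(fill_buckets(sparse, 0, 2000))                  # like FILL(NULL)
print(fill_buckets(sparse, 0, 2000, use_prev=True))   # like FILL(PREV)
print(fill_buckets(sparse, 0, 2000, fill=0))          # like a constant fill
```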

Finding the Closest Unbooked Dates Using SQL

Scenario
A user selects a date. Based on the selection I check whether the date & time is booked or not (No issues here).
If a date & time is booked, I need to show them n alternative dates, based on their date and time parameters. The proposed alternative dates have to be as close to their chosen date as possible. The list of alternative dates should start from the date the query is run on (my backend handles this).
My Progress So Far
SELECT alternative_date
FROM GENERATE_SERIES(
TIMESTAMP '2022-08-20 05:00:00',
date_trunc('month', TIMESTAMP '2022-08-20 07:00:00') + INTERVAL '1 month - 1 day',
INTERVAL '1 day'
) AS G(alternative_date)
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
)
The code above uses the GENERATE_SERIES(...) function in PostgreSQL. It generates all dates starting from 2022-08-20 up to the end of August, and returns only the dates that do not exist in the bookDate column (meaning they have not been booked yet).
Problems I Need Help With
When searching for alternative dates, I'm providing 3 important things:
The user's preferred booking date, so I can suggest other free dates that are close to it. How would I go about doing this? This is the part I'm having the most trouble with.
The user's start and end times, so that when providing a list of alternative dates I can tell them, for instance, "there's free space between 06 and 07 on 2022-08-22". I'm also facing some issues here; a push in the right direction would be great!
I want to add another WHERE but it fails. The current WHERE is a NOT EXISTS, so it looks for all dates that do not match any booked date. My other WHERE should check whether the place is open for booking or not.
To get the closest free dates, you can order your result by the "distance" of each alternative date from the user's preferred date, so that the shortest intervals come first:
ORDER BY alternative_date - TIMESTAMP '2022-08-20 05:00:00'
If you want to recommend time slots smaller than whole dates (an hour range), you need to switch the whole thing from dates to hours, i.e. change the generate_series step from 1 day to 1 hour (or whatever your smallest bookable unit is) and exclude invalid hours (nighttime, I assume) in the WHERE clause. From there, it is pretty much the same as with dates.
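The ordering idea can be sketched outside the database in a few lines of Python. This is only an illustration under assumptions: the function name closest_free_dates and its parameters are made up, booked dates are passed in as a set, and the distance uses an absolute value so that free dates before the preferred date are ranked as well (the ORDER BY above never sees those, because the series starts at the preferred date).

```python
from datetime import date, timedelta


def closest_free_dates(preferred, booked, first, last, n=3):
    """Return up to n unbooked dates in [first, last], nearest to `preferred` first."""
    days = (last - first).days + 1
    candidates = (first + timedelta(days=i) for i in range(days))
    free = [d for d in candidates if d not in booked]
    # Sort by distance in days; ties are broken by the earlier date.
    return sorted(free, key=lambda d: (abs((d - preferred).days), d))[:n]


booked = {date(2022, 8, 20), date(2022, 8, 21)}
suggestions = closest_free_dates(
    preferred=date(2022, 8, 20),
    booked=booked,
    first=date(2022, 8, 18),
    last=date(2022, 8, 31),
)
print(suggestions)
```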
As for the "second WHERE": there can be only one WHERE clause, but it can be composed of multiple conditions. You can add more conditions using the AND operator (and a condition can also be a sub-query if needed):
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
) AND NOT EXISTS (
SELECT 1 FROM events WHERE "roomId" = '13b46460-162d-4d32-94c0-e27dd9246c79'
)
(Warning: this second sub-query is probably dangerous in the real world. The room will be used more than once, I assume, so you need to add a time condition to the sub-query to check against the date.)

SQL: select date rows that contain specific hour and minute

I am querying a table that has the date column as follows:
date
2021-03-08 05:05:31+00
2021-03-08 05:10:31+00
How can I select all the rows that contain 05:05 as the hour and minute in SQL? i.e. rows where hour = 05, and minute = 05. In this case it will be the first row.
Q: How can I select all the rows that contain 05:05 as the hour and minute in SQL?
A: For MySQL, look in the MySQL Date and Time functions. There, you'll find EXTRACT().
You can use it as follows:
https://www.w3schools.com/sql/func_mysql_extract.asp
Extract the minute from a datetime:
SELECT EXTRACT(MINUTE FROM "2017-06-15 09:34:21");
This assumes that you're storing the column as a "Date" type.
Different RDBMS vendors have different Date/Time functions. You'll have to read the documentation and experiment to determine which syntax to use for your particular DB vendor and your particular table schema.
You can use the query below to get the result you asked for.
SQL Server has a DATENAME function that you can use in your query as shown.
CreatedDate is the column name.
Example (this one matches 07:07; change the values to 5 for 05:05):
Select * from #tmp1 where datename(hour,createdDate)=07 And datename(minute,CreatedDate)=07

Creating a DAX pattern that counts days between a date field and a month value on a chart's x-axis

I am struggling with a DAX pattern to allow me to plot an average duration value on a chart.
Here is the problem: My dataset has a field called dtOpened which is a date value describing when something started, and I want to be able to calculate the duration in days since that date.
I then want to be able to create an average duration since that date over a time period.
It is very easy to do when thinking about the value as it is now, but I want to be able to show a chart that describes what that average value would have been over various time periods on the x-axis (month/quarter/year).
The problem that I am facing is that if I create a calculated column to find the current age (NOW() - [dtOpened]), then it always uses the NOW() function - which is no use for historic time spans. Maybe I need a Measure for this, rather than a calculated column, but I cannot work out how to do it.
I have thought about using LASTDATE (rather than NOW) to work out what the last date would be in the filter context of any single month/quarter/year, but if the current month is only half way through, then it would probably need to consider today's date as the value from which to subtract the dtOpened value.
I would appreciate any help or pointers that you can give me!
It looks like you have a table (let's call it Cases) storing your cases with one record per case with fields like the following:
casename, dtOpened, OpenClosedFlag
You should create a date table with one record per day spanning your date range. The date table will have a month ending date field identifying the last day of the month (same for quarter & year). But this will be a disconnected date table: don't create a relationship between the Date on the Date table and your case open date.
Then use the AVERAGEX iterator to average the date differences.
Average Duration (days) :=
CALCULATE (
AVERAGEX ( Cases, MAX ( DateTable[Month Ending] ) - Cases[dtopened] ),
FILTER ( Cases, Cases[OpenClosedFlag] = "Open" ),
FILTER ( Cases, Cases[dtopened] <= MAX ( DateTable[Month Ending] ) )
)
Once you plot the measure against your Month you should see the average values represented correctly. You can do something similar for quarter & year.
You're a genius, Rory; Thanks.
In my example, I had a dtClosed field rather than an Opened/Closed flag, so there was one extra piece of filtering to do to test if the Case was closed at that point in time. So my measure ended up looking like this:
Average Duration:=CALCULATE(
AVERAGEX(CasesOnly, MAX(DT[LastDateM]) - CasesOnly[Owner Opened dtOnly]),
FILTER(CasesOnly, OR(ISBLANK(CasesOnly[Owner Resolution dtOnly]),
CasesOnly[Owner Resolution dtOnly] > MAX(DT[LastDateM]))),
FILTER(CasesOnly, CasesOnly[Owner Opened dtOnly] <= MAX(DT[LastDateM]))
)
And to get the chart, I plotted the DT[Date] field on the x-axis.
Thanks very much again.
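For readers who want to check the numbers outside the model, the logic of the measure with the dtClosed variant can be sketched in Python. This is an illustration only, not DAX: the function name average_duration and the (opened, closed) tuple layout are assumptions, and period_end plays the role of MAX(DT[LastDateM]).

```python
from datetime import date


def average_duration(cases, period_end):
    """Average age in days, as of period_end, of cases still open at that date.

    cases is an iterable of (opened, closed) pairs; closed is None if unresolved.
    Returns None when no case was open at period_end.
    """
    durations = [
        (period_end - opened).days
        for opened, closed in cases
        # opened on or before the period end, and not yet closed at that point
        if opened <= period_end and (closed is None or closed > period_end)
    ]
    return sum(durations) / len(durations) if durations else None


cases = [
    (date(2023, 1, 10), None),
    (date(2023, 1, 20), date(2023, 2, 15)),
    (date(2023, 3, 5), None),
]
for month_end in (date(2023, 1, 31), date(2023, 2, 28)):
    print(month_end, average_duration(cases, month_end))
```

Evaluating the function once per month-end date mirrors what the chart does when the measure is plotted against the date table.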

sqlalchemy select by date column only x newest days

Suppose I have a table MyTable with a column some_date (date type, of course) and I want to select the newest 3 months of data (or x days).
What is the best way to achieve this?
Please note that the date should not be measured from today but rather from the date range in the table (which might be older than today).
I need to find the maximum date and compare it to each row: if the difference is less than x days, return the row.
All of this should be done with SQLAlchemy and without loading the entire table.
What is the best way of doing it? Must I have a subquery to find the maximum date? How do I select the last x days?
Any help is appreciated.
EDIT:
The following query works in Oracle but seems inefficient (is max calculated for each row?) and I don't think that it'll work for all dialects:
select * from my_table where (select max(some_date) from my_table) - some_date < 10
You can do this in a single query and without resorting to creating datediff.
Here is an example I used for getting everything in the past day:
from datetime import datetime, timedelta
one_day = timedelta(hours=24)
one_day_ago = datetime.now() - one_day
Message.query.filter(Message.created > one_day_ago).all()
You can adapt the timedelta to whatever time range you are interested in.
UPDATE
Upon re-reading your question, it looks like I failed to take into account that you want to compare two dates that are both in the database, rather than comparing against today's date. I'm pretty sure that this sort of behavior is going to be database specific. In Postgres, you can use straightforward arithmetic.
Operations with DATEs
1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference
DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19
You may add or subtract an INTEGER to a DATE to produce another DATE
DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'
You're probably using timestamps if you are storing dates in Postgres. Doing math with timestamps produces an interval object. SQLAlchemy works with timedeltas as a representation of intervals. So you could do something like:
one_day = timedelta(hours=24)
Model.query.join(ModelB, Model.created - ModelB.created < one_day)
I haven't tested this exactly, but I've done things like this and they have worked.
I ended up doing two selects: one to get the max date and another to get the data.
Using the datediff recipe from this thread, I added a datediff function and used the query q = session.query(MyTable).filter(datediff(max_date, some_date) < 10)
I still don't think this is the best way, but until someone proves me wrong, it will have to do...
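For comparison, the single-query form from the question's EDIT (an uncorrelated scalar subquery for the maximum date) can be tried out with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the real database, julianday() is SQLite's way of turning dates into day numbers, and the id column and function name are made up. It uses plain SQL rather than SQLAlchemy so that it runs without extra packages; in SQLAlchemy the same shape would presumably be built around a scalar subquery over func.max(MyTable.some_date).

```python
import sqlite3


def newest_rows(conn, days):
    """Rows whose some_date is less than `days` days older than the newest date."""
    sql = """
        SELECT id, some_date
        FROM my_table
        WHERE julianday((SELECT MAX(some_date) FROM my_table))
              - julianday(some_date) < :days
        ORDER BY some_date
    """
    return conn.execute(sql, {"days": days}).fetchall()


def demo_connection():
    """Build an in-memory table with made-up ISO-formatted dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, some_date TEXT)")
    conn.executemany(
        "INSERT INTO my_table (id, some_date) VALUES (?, ?)",
        [(1, "2020-01-01"), (2, "2020-01-25"), (3, "2020-01-31")],
    )
    return conn


print(newest_rows(demo_connection(), 10))
```

Because the subquery does not reference the outer row, a database is free to evaluate it once rather than per row, although whether it does so depends on the engine.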