Finding the Closest Unbooked Dates Using SQL - sql

Scenario
A user selects a date. Based on the selection I check whether the date & time is booked or not (No issues here).
If a date & time is booked, I need to show them n alternative dates. Based on their date and time parameters, and those proposed alternative dates have to be as close as to their chosen date as possible. The list of alternative dates should start from the date the query is ran on My backend handles this.
My Progress So Far
SELECT alternative_date
FROM GENERATE_SERIES(
TIMESTAMP '2022-08-20 05:00:00',
date_trunc('month', TIMESTAMP '2022-08-20 07:00:00') + INTERVAL '1 month - 1 day',
INTERVAL '1 day'
) AS G(alternative_date)
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
)
The code above uses the GENERATE_SERIES(...) function in PSQL. It searches for all dates, starting from 2022-08-20, and up to the end of August. It specifically returns the dates which does not exist in the bookDate column (Meaning it has not yet been booked).
Problems I Need Help With
When searching for alternative dates, I'm providing 3 important things
The user's preferred booking date, so I can suggest which other dates are close to him that he can choose? How would I go about doing this? It's the part where I'm facing most trouble.
The user's start and end times, so when providing a list of alternative dates, I can tell him, hey there's free space between 06 and 07 on the date 2022-08-22 for instance. I'm also facing some issues here, a push in the right track will be great!
I want to add another WHERE but it fails, the current WHERE is a NOT EXISTS so it looks for all dates not equaling to what is given. My other WHERE basically means WHERE the place is open for booking or not.

To get closest free dates, you can ORDER BY your result by "distance" of particular alternative date to user's preferred date - the shortest intervals will be first:
ORDER BY alternative_date - TIMESTAMP '2022-08-20 05:00:00'
If you want to recommend time slots smaller than whole dates (hour range), you need to switch the whole thing from dates to hours, i.e. generate_series from 1 day to 1 hour (or whatever your smallest bookable unit is) and excluse invalid hours (nighttime I assume) in WHERE. From there, it is pretty much the same as with dates.
As for "second where", there can be only one WHERE, but it can be composed from multiple conditions - you can add more conditions using AND operator (and it can also be sub-query if needed):
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
) AND NOT EXISTS (
SELECT 1 FROM events WHERE "roomId" = '13b46460-162d-4d32-94c0-e27dd9246c79'
)
(warning: this second sub-query is probably dangerous in real world, since the room will be used more than one time, I assume, so you need to add some time condition to the subquery to check against date)

Related

How to set a max range condition with timescale time_bucket_gapfill() in order to not fill real missing values?

I'd like some advices to know if what I need to do is achievable with timescale functions.
I've just found out I can use time_bucket_gapfill() to complete missing data, which is amazing! I need data each 5 minutes but I can receive 10 minutes, 30 minutes or 1 hour data. So the function helps me to complete the missing points in order to have only 5 minutes points. Also, I use locf() to set the gapfilled value with last value found.
My question is: can I set a max range when I set the last value found with locf() in order to never overpass 1 hour ?
Example: If the last value found is older than 1 hour ago I don't want to fill gaps, I need to leave it empty to say we have real missing values here.
I think I'm close to something with this but apparently I'm not allowed to use locf() in the same case.
ERROR: multiple interpolate/locf function calls per resultset column not supported
Somebody have an idea how I can resolve that?
How to reproduce:
Create table powers
CREATE table powers (
delivery_point_id BIGINT NOT NULL,
at timestamp NOT NULL,
value BIGINT NOT NULL
);
Create hypertable
SELECT create_hypertable('powers', 'at');
Create indexes
CREATE UNIQUE INDEX idx_dpid_at ON powers(delivery_point_id, at);
CREATE INDEX index_at ON powers(at);
Insert data for one day, one delivery point, point 10 minutes
INSERT INTO powers SELECT 1, at, round(random()*10000) FROM generate_series(TIMESTAMP '2021-01-01 00:00:00', TIMESTAMP '2022-01-02 00:00:00', INTERVAL '10 minutes') AS at;
Remove three hours of data from 4am to 7am
DELETE FROM powers WHERE delivery_point_id = 1 AND at < '2021-01-1 07:00:00' AND at > '2021-01-01 04:00:00';
The query that need to be fixed
SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
CASE
WHEN (locf(at) - at) > interval '1 hour' THEN null
ELSE locf(avg(value))
END AS gapfilled
FROM powers
GROUP BY point_five, at
ORDER BY point_five;
Actual: ERROR: multiple interpolate/locf function calls per resultset column not supported
Expected: Gapfilled values each 5 minutes except between 4am and 7 am (real missing values).
This is a great question! I'm going to provide a workaround for how to do this with the current stuff, but I think it'd be great if you'd open a Github issue as well, because there might be a way to add an option for this that doesn't require a workaround like this.
I also think your attempt was a good approach and just requires a few tweaks to get it right!
The error that you're seeing is that we can't have multiple locf calls in a single column, this is a limitation that's pretty easy to work around as we can just shift both of them into a subquery, but that's not enough. The other thing that we need to change is that locf only works on aggregates, right now, you’re trying to use it on a column (at) that isn’t aggregated, which isn’t going to work, because it wouldn’t know which of the values of at in a time_bucket to “pull forward” for the gapfill.
Now you said you want to fill data as long as the previous point wasn’t more than one hour ago, so, we can take the last value of at in the bucket by using last(at, at) this is also the max(at) so either of those aggregates would work. So we put that into a CTE (common table expression or WITH query) and then we do the case statement outside like so:
WITH filled as (SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
locf(last(at, at)) as filled_from,
locf(avg(value)) as filled_avg
FROM powers
WHERE at BETWEEN '2021-01-01 01:30:00' AND '2021-01-01 08:30:00'
AND delivery_point_id = 1
GROUP BY point_five
ORDER BY point_five)
SELECT point_five,
avg,
filled_from,
CASE WHEN point_five - filled_from > '1 hour'::interval THEN NULL
ELSE filled_avg
END as gapfilled
FROM filled;
Note that I’ve tried to name my CTE expressively so that it’s a little easier to read!
Also, I wanted to point out a couple other hyperfunctions that you might think about using:
heartbeat_agg is a new/experimental one that will help you determine periods when your system is up or down, so if you're expecting points at least every hour, you can use it to find the periods where the delivery point was down or the like.
When you have more irregular sampling or want to deal with different data frequencies from different delivery points, I’d take a look a the time_weight family of functions. They can be more efficient than using something like gapfill to upsample, by instead letting you treat all the different sample rates similarly, without having to create more points and more work to do so. Even if you want to, for instance, compare sums of values, you’d use something like integral to get the time weighted sum over a period based on the locf interpolation.
Anyway, hope all that is helpful!

SQL SELECT date from table, and calculate how many days since that date

I'm looking to calculate how many days have passed since a specific date, retrieved from a table in my database. Based on the info I've found on W3Schools (Here), I have attempted using DATEDIFF, but am coming up against a couple of different errors I can't seem to work around.
I have included my code below, and based on this, what I want to happen is this: Select the "DD" from the "Wave_Data" table, and, based on "sysdate", work out how many days have lapsed since then.
SELECT DATEDIFF(WEEKDAY,:P1_DD,SYSDATE)
FROM WAVE_DATA
WHERE WAVE_NUMBER = :P1_WAVE;
The final calculation would then be inputted into a text field within my ApEx database.
Thank you in advance for any help you may be able to provide,
Dominic
In Oracle you can just subtract one Date from another to get the difference (in days) between them:
SELECT SYSDATE - :p1_dd
FROM Wave_Data
WHERE Wave_Number = :p1_wave;
If you want to know the difference between the dates without any time parts then you can do:
SELECT TRUNC( SYSDATE ) - TRUNC( :p1_dd )
FROM Wave_Data
WHERE Wave_Number = :p1_wave;
or
SELECT FLOOR( SYSDATE - :p1_dd )
FROM Wave_Data
WHERE Wave_Number = :p1_wave;

sqlalchemy select by date column only x newset days

suppose I have a table MyTable with a column some_date (date type of course) and I want to select the newest 3 months data (or x days).
What is the best way to achieve this?
Please notice that the date should not be measured from today but rather from the date range in the table (which might be older then today)
I need to find the maximum date and compare it to each row - if the difference is less than x days, return it.
All of this should be done with sqlalchemy and without loading the entire table.
What is the best way of doing it? must I have a subquery to find the maximum date? How do I select last X days?
Any help is appreciated.
EDIT:
The following query works in Oracle but seems inefficient (is max calculated for each row?) and I don't think that it'll work for all dialects:
select * from my_table where (select max(some_date) from my_table) - some_date < 10
You can do this in a single query and without resorting to creating datediff.
Here is an example I used for getting everything in the past day:
one_day = timedelta(hours=24)
one_day_ago = datetime.now() - one_day
Message.query.filter(Message.created > one_day_ago).all()
You can adapt the timedelta to whatever time range you are interested in.
UPDATE
Upon re-reading your question it looks like I failed to take into account the fact that you want to compare two dates which are in the database rather than today's day. I'm pretty sure that this sort of behavior is going to be database specific. In Postgres, you can use straightforward arithmetic.
Operations with DATEs
1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference
DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19
You may add or subtract an INTEGER to a DATE to produce another DATE
DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'
You're probably using timestamps if you are storing dates in postgres. Doing math with timestamps produces an interval object. Sqlalachemy works with timedeltas as a representation of intervals. So you could do something like:
one_day = timedelta(hours=24)
Model.query.join(ModelB, Model.created - ModelB.created < interval)
I haven't tested this exactly, but I've done things like this and they have worked.
I ended up doing two selects - one to get the max date and another to get the data
using the datediff recipe from this thread I added a datediff function and using the query q = session.query(MyTable).filter(datediff(max_date, some_date) < 10)
I still don't think this is the best way, but untill someone proves me wrong, it will have to do...

Get records as of today or up to a certain date in Oracle

Could somebody recommend the query to retrieve records up to today or certain dates?
I'm required to produce an Oracle report where user needs to enter a date and records up to that date will be shown.
I tried
select * from the_table where the_date <= sysdate
However it seems to produce an inaccurate result. What is the better query for this. For now I'm just playing around with sysdate. Later I will need to use a certain date keyed in by the user and all the records up to that date needs to be shown.
Any suggestions?
Sometimes you get inaccurate records because of little differences like minutes and seconds when two dates have the same day/month/year. Try the following
select * from the_table where TRUNC(the_date) <= sysdate
The TRUNC removes the minute and the seconds. Sometimes you get inaccurate records without using that

SQL: select one record for each day nearest to a specific time

I have one table that stores values with a point in time:
CREATE TABLE values
(
value DECIMAL,
datetime DATETIME
)
There may be many values on each day, there may also be only one value for a given day. Now I want to get the value for each day in a given timespan (e.g. one month) which is nearest to a given time of day. I only want to get one value per day if there are records for this day or no value if there are no records. My database is PostgreSQL. I'm quite stuck with that. I could just get all values in the timespan and select the nearest value for each day programmatically, but that would mean to pull a huge amount of data from the database, because there can be many values on one day.
(Update)
To formulate it a bit more abstract: I have data of arbitrary precision (could be one minute, could be two hours or two days) and I want to convert it to a fixed precision of one day, with a specific time of day.
(second update)
This is the query from the accepted answer with correct postgresql type converstions, assuming the desired time is 16:00:
SELECT datetime, value FROM values, (
SELECT DATE(datetime) AS date, MIN(ABS(EXTRACT(EPOCH FROM TIME '16:00' - CAST(datetime AS TIME)))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE
CAST(values.datetime AS TIME) BETWEEN TIME '16:00' - CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND TIME '16:00' + CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND DATE(values.datetime) = besttimes.date
How about going into this direction?
SELECT values.value, values.datetime
FROM values,
( SELECT DATE(datetime) AS date, MIN(ABS(_WANTED_TIME_ - TIME(datetime))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE TIME(values.datetime) BETWEEN _WANTED_TIME_ - besttimes.timediff
AND _WANTED_TIME_ + besttimes.timediff
AND DATE(values.datetime) = besttimes.date
I am not sure about the date/time extracting and abs(time) functions, so you will have to replace them probably.
It appears you have two parts to solve:
Are there any results for a day at all?
If there are, then which is the nearest one?
By shortcircuiting the process at part 1 if you have no results you'll save a lot of execution time.
The next thing to note is that you don't have to pull the data from the database, wait until you have an answer or not by using PLSQL functions (or something else) to work it out on the server first.
Once you have a selection of times to check you can use intervals to compare them. Check the Postgres docs on intervals and datetime functions for precise instructions, but basically you minus the selected dates from the date you've given and the one with the smallest interval is the one you want.