SQL performance when querying for a time interval - sql

I have one table of tickets containing three relevant columns: id, start and finish, where start and finish are timestamps.
I have a second table (intervals) with only one relevant column, time_point, which is also a timestamp. The time points are spaced exactly 15 minutes apart, so the content of this second table is:
8:00
8:15
8:30
...
The first table (ticket) has 4 million records. The second has only 96 records (24 * 4).
I have to select how many tickets are open at each time_point.
I wrote the following query (simplified version):
select *
from interval, ticket
where (finish is null or finish > time_point)
and start < time_point
which works, but it is too slow. The problem is that there is no real join between the two tables, and I presume that a full table scan is performed for every row.
How can I get better performance here?
Thanks!
EDIT: This is an Oracle DB.

I believe you don't need a cross join or an interval table.
Instead, try the following:
> select count(*), tsd from (
> select
> /****************************************************************
> Now
> 1- bring your finish column into the format you need: HH24:MI
> 2- truncate its content down to the interval the row belongs to
> ****************************************************************/
> to_char(dt,'HH24')|| decode(trunc(to_char(dt,'MI')/15) * 15,0,'00',trunc(to_char(dt,'MI')/15)*15)
> tsd
> from (
> select nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy')) dt --
> from tickets
> /****************************************************************
> Now Filter out your tickets(before truncate), to find the relevant
> tickets for your period use a Parameter date and compare it to the
> start and end columns nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy'))
> ****************************************************************/
> where P_YOUR_PARAM_DATE between start
> and nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy'))
> ) dat
> ) group by tsd order by tsd ;

One way to speed this up is to include the finish column in a composite index so there's no need to read from the table to fetch that value:
create index IX_Tickets on Tickets(start,finish)
P.S. Drop any simple index on Tickets.start as well.
P.P.S. Please clarify: 8:00 and 8:15 in your intervals table are not timestamp values. Did you eliminate the date element in your question for the sake of simplicity?
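As a small illustration of the count the question asks for, the sketch below runs the same open-ticket condition in SQLite through Python's sqlite3 module. It is not Oracle, so it says nothing about Oracle's execution plan or timing. The sample rows, the ISO-formatted text timestamps and the LEFT JOIN / COUNT form are assumptions made for the example; the composite index mirrors the one suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, start TEXT NOT NULL, finish TEXT);
    CREATE TABLE intervals (time_point TEXT PRIMARY KEY);
    -- composite index on both columns used by the open-ticket condition
    CREATE INDEX ix_tickets ON ticket (start, finish);
""")

# Hypothetical sample data; a NULL finish means the ticket is still open.
conn.executemany("INSERT INTO ticket VALUES (?, ?, ?)", [
    (1, "2024-01-01 07:50", "2024-01-01 08:20"),
    (2, "2024-01-01 08:10", None),
    (3, "2024-01-01 08:40", "2024-01-01 09:00"),
])
conn.executemany("INSERT INTO intervals VALUES (?)", [
    ("2024-01-01 08:00",), ("2024-01-01 08:15",),
    ("2024-01-01 08:30",), ("2024-01-01 08:45",),
])

# One row per time point; COUNT(t.id) ignores the NULLs produced by the
# LEFT JOIN, so time points without open tickets would report 0.
open_counts = conn.execute("""
    SELECT i.time_point, COUNT(t.id) AS open_tickets
    FROM intervals AS i
    LEFT JOIN ticket AS t
           ON t.start < i.time_point
          AND (t.finish IS NULL OR t.finish > i.time_point)
    GROUP BY i.time_point
    ORDER BY i.time_point
""").fetchall()

for time_point, open_tickets in open_counts:
    print(time_point, open_tickets)
```

Aggregating in the database returns 96 rows instead of every ticket/time-point pair, which is usually the first thing to try before tuning indexes.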

Related

How to make a group query to select multiple rows?

I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per each day.
I need to filter the rows where stamp is before 9am (here is no problem) and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of the rows where the time in timestamp is less than 9 am (50 rows from 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1) to feed it with the first query result (just the rows before 9am) and then make a cross table query to find in the source table public.tickets all the days which are equal to the public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd want to avoid using a temporary table and make a request just for one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
SELECT FROM tickets t1
WHERE t1.date_in::date = t.date_in::date -- same day
AND t1.date_in::time < time '9:00' -- time before 9:00
AND t1.id <> t.id -- exclude self
)
ORDER BY date_in; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_in) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on undisclosed information for performance questions.
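The EXISTS idea can be tried out on the sample rows from the question with the following sketch. It uses SQLite via Python's sqlite3 module rather than PostgreSQL, so date() and time() stand in for the ::date and ::time casts. The two 2022-07-01 rows are made-up additions representing a day with no early stamp, and this variant leaves out the self-exclusion so that a day's only early row is itself returned, as in the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, date_in TEXT NOT NULL)")

stamps = [
    "2022-05-22 08:10:12", "2022-05-22 10:11:45", "2022-05-22 12:14:59",
    "2022-04-23 07:11:13", "2022-04-23 11:42:25",
    "2022-06-15 08:45:26", "2022-06-15 15:10:57",
    # hypothetical day without any stamp before 9 am
    "2022-07-01 10:00:00", "2022-07-01 13:30:00",
]
conn.executemany("INSERT INTO tickets (date_in) VALUES (?)", [(s,) for s in stamps])

# Keep every row whose calendar day has at least one stamp before 09:00.
early_day_rows = [row[0] for row in conn.execute("""
    SELECT t.date_in
    FROM tickets AS t
    WHERE EXISTS (
        SELECT 1
        FROM tickets AS t1
        WHERE date(t1.date_in) = date(t.date_in)   -- same day
          AND time(t1.date_in) < '09:00:00'        -- time before 9:00
    )
    ORDER BY t.date_in
""")]

for stamp in early_day_rows:
    print(stamp)
```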

sql query to get today new records compared with yesterday

I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today compared with yesterday (the IDs from today that were not present yesterday).
This needs to be done with just one query and maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with 2 queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
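The GROUP BY / HAVING approach can be checked on a handful of rows with the sketch below. It runs in SQLite through Python's sqlite3 module, not Oracle, so a fixed "today" parameter replaces trunc(sysdate) and date(:today, '-1 day') replaces trunc(sysdate) - 1. The table contents and the chosen date are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (cod INTEGER PRIMARY KEY, id TEXT NOT NULL, dt TEXT NOT NULL)"
)

today = "2024-03-10"  # fixed so the example is reproducible
conn.executemany("INSERT INTO your_table (id, dt) VALUES (?, ?)", [
    ("A", "2024-03-09"), ("A", "2024-03-10"),  # present yesterday and today
    ("B", "2024-03-10"),                       # today only
    ("C", "2024-03-09"),                       # yesterday only
    ("D", "2024-03-08"), ("D", "2024-03-10"),  # two days ago and today
])

# Read only yesterday's and today's rows, then keep the ids whose earliest
# date in that window is today, i.e. ids that did not appear yesterday.
new_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM your_table
    WHERE dt IN (:today, date(:today, '-1 day'))
    GROUP BY id
    HAVING MIN(dt) = :today
    ORDER BY id
""", {"today": today})]

print(new_ids)
```

Note that D counts as new because the comparison is only against yesterday, which is how the question states the requirement.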
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to full-scan the table, which would probably be quite slow.
create index idx on mytable( trunc(date) , id );

SELECT statement optimization

I'm not an expert in SQL queries, but not a complete newbie either.
I'm exporting data from an MS-SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and an hour; these are the columns I use in the WHERE clause.
In detail, I have about 200 rows for each day, each with a different hour, over many days. I need to extract the first value after 15:00 of each day, for several days.
Since the hours are different for each day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes at 15:03 and so on (I'm looking for the closest value after 15:00). To fix this I used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > "15:00"
This way I can take the first value after 15:00 for one day. The problem is that I need this for several days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data="first_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="second_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="third_day" AND hour > "15:00"
...and so on for all the days (I build the SQL string with one SELECT for each day in the specified interval).
Until now this has worked, but now I need to expand the day interval (currently at most a week, so 5 days) to up to 60 days. I don't want to build a huge query string, but I can't think of an alternative way to write the SQL.
Any help appreciated
Ettore
A typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;
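A runnable sketch of the row_number() approach follows. It uses SQLite through Python's sqlite3 module instead of MS-SQL, and assumes a SQLite build with window-function support (3.25 or later). The readings table, its rows and the text encoding of day and hour are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT NOT NULL, hour TEXT NOT NULL, a INTEGER)")

# Hypothetical rows; the third day has no value after 15:00 at all.
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("2024-01-01", "14:59", 1),
    ("2024-01-01", "15:03", 2),
    ("2024-01-01", "15:10", 3),
    ("2024-01-02", "15:01", 4),
    ("2024-01-02", "16:00", 5),
    ("2024-01-03", "14:00", 6),
])

# Number the rows after 15:00 within each day, earliest first, and keep
# only the first one per day.
first_after_three = conn.execute("""
    SELECT a, hour, day
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) AS seqnum
        FROM readings AS r
        WHERE hour > '15:00'
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY day
""").fetchall()

for row in first_after_three:
    print(row)
```

A user-specified range of days then only needs one extra condition on day in the inner WHERE clause, rather than one UNION ALL branch per day.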

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine whether it is possible, using only SQL for Postgres, to select a range of time-ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5-minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping through the results and pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes, you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that adding a computed (expression) index on the where clause, where the value of the expression makes up the index, could lead to major speed improvements. It may not be very selective in this case, but it is good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. Hard to find in bookstores now, but well worth it.
Extract the minutes, convert to int4, and see if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based and you just want every 5th row, or if the times are regular and you always have one record per minute, the query below gives you one record out of every 5:
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select tbl.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table -- placeholder name for the table holding ts
group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min, even better would be to use one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
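The bucketing variant can be sketched as follows, using SQLite through Python's sqlite3 module rather than PostgreSQL, with strftime('%M', ...) standing in for extract(minute from ...). The sample timestamps are made up, and grouping by the hour as well as the 5-minute bucket is an addition here so that data spanning more than one hour does not collapse into the same buckets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (your_timestamp TEXT NOT NULL)")

# Irregular readings: two within the same minute, some minutes skipped.
stamps = [
    "2011-01-01 07:00:02", "2011-01-01 07:00:59", "2011-01-01 07:03:10",
    "2011-01-01 07:05:30",
    "2011-01-01 07:12:00", "2011-01-01 07:14:00",
    "2011-01-01 08:01:00",
]
conn.executemany("INSERT INTO your_table VALUES (?)", [(s,) for s in stamps])

# Integer division of the minute by 5 yields the bucket number (0..11);
# MIN picks the earliest reading inside each hour/bucket pair.
firsts = [row[0] for row in conn.execute("""
    SELECT MIN(your_timestamp)
    FROM your_table
    GROUP BY strftime('%Y-%m-%d %H', your_timestamp),
             CAST(strftime('%M', your_timestamp) AS INTEGER) / 5
    ORDER BY 1
""")]

for stamp in firsts:
    print(stamp)
```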

Help me build a SQL select statement

SQL isn't my greatest strength and I need some help building a select statement.
Basically, this is my requirement. The table stores a list of names and a timestamp of when the name was entered in the table. Names may be entered multiple times during a week, but only once a day.
I want the select query to return names that were entered anytime in the past 7 days, but not today.
To get a list of names entered today, this is the statement I have:
Select * from table where Date(timestamp) = Date(now())
And to get a list of names entered in the past 7 days, not including today:
Select * from table where (Date(now())- Date(timestamp) < 7) and (date(timestamp) != date(now()))
If the first query returns a set of results, say A, and the second query returns B, how can I get
B-A
Try this if you're working with SQL Server:
SELECT * FROM Table
WHERE Timestamp BETWEEN
dateadd(day,datediff(day,0,getdate()),-7)
AND dateadd(day,datediff(day,0,getdate()),0)
This ensures that the timestamp is between 00:00 7 days ago, and 00:00 today. Today's entries with time greater than 00:00 will not be included.
In plain English, you want records from your second query where the name is not in your first query. In SQL:
Select *
from table
where (Date(now())- Date(timestamp) < 7)
and (date(timestamp) != date(now()))
and name not in (Select name
from table
where Date(timestamp) = Date(now())
)
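The NOT IN form above can be exercised with the sketch below, which runs in SQLite through Python's sqlite3 module. A fixed "today" parameter replaces now(), and the seven-day window is written as a date range rather than a date subtraction; the entries table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT NOT NULL, ts TEXT NOT NULL)")

today = "2024-03-10"  # fixed so the example is reproducible
conn.executemany("INSERT INTO entries VALUES (?, ?)", [
    ("alice", "2024-03-07 10:00:00"),
    ("alice", "2024-03-10 09:00:00"),  # also entered today, so excluded
    ("bob",   "2024-03-05 12:00:00"),
    ("carol", "2024-03-01 08:00:00"),  # older than the window
    ("dave",  "2024-03-09 23:59:00"),
])

# Names entered on one of the previous six days (B) minus names entered
# today (A).
recent_not_today = [row[0] for row in conn.execute("""
    SELECT DISTINCT name
    FROM entries
    WHERE date(ts) < :today
      AND date(ts) > date(:today, '-7 days')
      AND name NOT IN (SELECT name FROM entries WHERE date(ts) = :today)
    ORDER BY name
""", {"today": today})]

print(recent_not_today)
```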
Use not in, like:
select pk from B where pk not in (select pk from A)
or you can do something like:
Select * from table where (Date(now())- Date(timestamp) < 7) and (Date(now())- Date(timestamp) >= 1)