How to select data but without similar times? - sql

I have a table with create_dt times and I need to get records, but without the rows that have a similar create_dt time (within 15 minutes).
So I need to get only one record instead of two if the second create_dt is within 15 minutes of the first one.
The date and time format is ('29.03.2019 00:00:00', 'DD.MM.YYYY HH24:MI:SS'). Thanks

It's a bit unclear what exactly you want, but one thing I can think of, is to round all values to the nearest "15 minute" and then only pick one row from those "15 minute" intervals:
with rounded as (
select create_dt,
date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) as rounded,
... other columns ....
from your_table
), numbered as (
select create_dt,
rounded,
row_number() over (partition by rounded order by create_dt) as rn,
... other columns ....
from rounded
)
select *
from numbered
where rn = 1;
The expression date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) returns create_dt rounded up or down to the nearest 15-minute boundary.
The row_number() then numbers the rows within each distinct 15-minute interval, ordered by create_dt, and the final select keeps only the first row of each interval.
Online example: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=e6c7ea651c26a6f07ccb961185652de7
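The rounding idea above can be tried out without an Oracle instance. The following is only a sketch under assumptions: it uses Python's sqlite3 module, swaps the Oracle date arithmetic for rounding epoch seconds to the nearest 900 seconds, assumes SQLite 3.25 or later for window functions, and the id column and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data; only create_dt comes from the question.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, create_dt TEXT)")
conn.executemany(
    "INSERT INTO your_table (create_dt) VALUES (?)",
    [
        ("2019-03-29 00:00:00",),
        ("2019-03-29 00:05:00",),
        ("2019-03-29 00:10:00",),
        ("2019-03-29 00:40:00",),
    ],
)

query = """
WITH rounded AS (
    SELECT id, create_dt,
           -- epoch seconds rounded to the nearest multiple of 900 s (15 min)
           ((CAST(strftime('%s', create_dt) AS INTEGER) + 450) / 900) * 900 AS bucket
    FROM your_table
), numbered AS (
    SELECT id, create_dt, bucket,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY create_dt) AS rn
    FROM rounded
)
SELECT create_dt FROM numbered WHERE rn = 1 ORDER BY create_dt
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

With this sample data, 00:05 rounds down to the 00:00 bucket and is dropped, while 00:10 rounds up to 00:15 and is kept even though it is only five minutes after 00:05. Bucketing by rounded time is therefore not quite the same as "within 15 minutes of the previous row", which may or may not matter for your case.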

I'm going to walk you through this conceptually. First of all, there's a difficulty in doing this that you might not have noticed.
Let's say you wanted one record from the same hour or day. If there are two records created on the same day, you only want one in your results. Which one?
I mention this because the designers of SQL could not give SQL a single answer to pick. They cannot show data from both records without both records being in the tabular output.
This is a common problem, but the feature the designers of SQL provided to handle it only works if there is no ambiguity about how to produce one row of result from two records. That solution is GROUP BY, but it only works for showing the fields other than the timestamp if they are the same for all the records which match the time period. Every selected field also has to appear in the GROUP BY clause, and if multiple records in your time period differ in those fields, they will create multiple records in your output. So although there is a tool, GROUP BY, for this problem, you might not be able to use it.
So here is the solution you want. If multiple records are close together, then don't include the records after the first one. So you want a WHERE clause which will exclude a record if another record closely precedes it. So the test for each record in the result will involve other records in the table. You need to join the table to itself.
Let's say we have a table named error_events. If we get multiples of the same value in the field error_type very close to the time of other similar events, we only want to see the first one. The SQL will look something like this:
SELECT A.*
FROM error_events A
INNER JOIN error_events B ON A.error_type = B.error_type
WHERE ???
You will have to figure out the details of the WHERE clause, and the timestamp functions will depend on which RDBMS product you are using. (MySQL and PostgreSQL, for instance, may work differently.)
You want only the records where there is no record which is earlier by less than 15 minutes. You do want the original record. That record will match itself in the join, but it will be the only record in the time period between its timestamp and 15 minutes prior.
So an example WHERE clause would be
WHERE B.create_dt BETWEEN [15 minutes before A.create_dt] and A.create_dt
GROUP BY A.*
HAVING 1 = COUNT(B.pkey)
Like we said, you will have to find out how your database product subtracts time, and how 15 minutes is represented in that difference.
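To make the self-join concrete, here is a minimal sketch run through Python's sqlite3 module. It is an illustration under assumptions, not the one correct form: the pkey column, the sample rows and the use of SQLite's datetime(..., '-15 minutes') modifier are choices made for this example, and other databases spell the time subtraction differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the error_events table described above.
conn.execute(
    "CREATE TABLE error_events (pkey INTEGER PRIMARY KEY, error_type TEXT, create_dt TEXT)"
)
conn.executemany(
    "INSERT INTO error_events VALUES (?, ?, ?)",
    [
        (1, "disk", "2019-03-29 00:00:00"),
        (2, "disk", "2019-03-29 00:10:00"),  # 10 min after row 1
        (3, "disk", "2019-03-29 00:20:00"),  # 10 min after row 2
        (4, "disk", "2019-03-29 01:00:00"),  # 40 min after row 3
        (5, "net", "2019-03-29 00:05:00"),   # different error_type
    ],
)

query = """
SELECT A.pkey
FROM error_events A
INNER JOIN error_events B ON A.error_type = B.error_type
-- BETWEEN is inclusive, so a row exactly 15 minutes earlier also counts
WHERE B.create_dt BETWEEN datetime(A.create_dt, '-15 minutes') AND A.create_dt
GROUP BY A.pkey, A.create_dt
-- a row that only matches itself has no recent predecessor
HAVING COUNT(B.pkey) = 1
ORDER BY A.create_dt
"""
first_events = [row[0] for row in conn.execute(query)]
print(first_events)
```

Note that row 3 is dropped although it is 20 minutes after row 1, because row 2 sits in between. A chain of closely spaced events collapses to its first member, which may or may not be what you need.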

Related

How to make a group query to select multiple rows?

I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per each day.
I need to filter the rows where stamp is before 9am (here is no problem) and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of the rows where the time in timestamp is less than 9 am (50 rows from 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1) to feed it with the first query result (just the rows before 9am) and then make a cross table query to find in the source table public.tickets all the days which are equal to the public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd want to avoid using a temporary table and make a request just for one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
SELECT FROM tickets t1
WHERE t1.date_in::date = t.date_in::date -- same day
AND t1.date_in::time < time '9:00' -- time before 9:00
AND t1.id <> t.id -- exclude self
)
ORDER BY date_in; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_in) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on undisclosed information for performance questions.
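The EXISTS approach can be checked against the sample data from the question. This is a sketch only, run through Python's sqlite3 module with SQLite's date() and time() functions standing in for the PostgreSQL casts; the id values and the extra 2022-05-23 row are assumptions added for illustration. It deliberately leaves out the self-exclusion so that the early row itself is returned, which is what the expected output in the question shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, date_in TEXT)")
conn.executemany(
    "INSERT INTO tickets (date_in) VALUES (?)",
    [
        ("2022-05-22 08:10:12",),
        ("2022-05-22 10:11:45",),
        ("2022-05-22 12:14:59",),
        ("2022-04-23 07:11:13",),
        ("2022-04-23 11:42:25",),
        ("2022-06-15 08:45:26",),
        ("2022-06-15 15:10:57",),
        ("2022-05-23 10:00:00",),  # hypothetical day with no stamp before 9 am
    ],
)

query = """
SELECT t.date_in
FROM tickets t
WHERE EXISTS (
    SELECT 1
    FROM tickets t1
    WHERE date(t1.date_in) = date(t.date_in)   -- same day
      AND time(t1.date_in) < '09:00:00'        -- time before 9:00
)
ORDER BY t.date_in
"""
days_with_early_stamp = [row[0] for row in conn.execute(query)]
print(days_with_early_stamp)
```

All seven rows from the three days with an early stamp come back, and the 2022-05-23 row does not.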

Sorting table by different cols, depending on what happens to another column

This is my first time I'm asking a question, and English is not my native language. I apologize for any misspelling or misbehaving beforehand.
Now to my question:
I have a table looking like this. (image 1)
1. Every booked time is half an hour long
2. There are always booking_date, hour and minute
3. Some rows have a delivery_date, some don't yet
4. Delivery_date are always AFTER booking_date
unordered table
If the day being printed out is, for example, 2018-12-01, I want the table to be ordered by the date being viewed (2018-12-01), whether that is the booking_date or the delivery_date, whichever comes first. It should also be ordered by the (hour, minute) of each, like below: (image 2)
Ideal ordered
As you can see, it jumps from row 01 to 08 and then back to 05, then to 02. That is because it has to be ordered by hour and minute. And yet the delivery_date has such priority that it jumps in between the rows (like row 05).
I’ve tried this SQL:
SELECT * FROM booking WHERE booking_date=$b_date OR DATE(delivery_date)=$b_date ORDER BY hour, minute ASC, HOUR(delivery_date), MINUTE(delivery_date) ASC
This will give me the booking_date ordered correctly by hour, minute, but the delivery_date is not correctly ordered
Then I have also searched, found on Stackoverflow.com and tried this one:
SELECT * FROM booking WHERE booking_date=$b_date OR DATE(delivery_date)=$b_date ORDER BY CASE WHEN booking_date=$b_date THEN hour, minute WHEN DATE(delivery_date)=$b_date THEN HOUR(delivery_date), MINUTE(delivery_date) END ASC
This gives me the following error:
check the manual that corresponds to your MySQL server version for the right syntax to use near ' minute WHEN DATE(delivery_date)=$b_date THEN HOUR(delivery_date), MINUTE(delivery_date) END' at line 1
and that is the comma after ”hour”. I take it, it doesn’t like the comma, so I can’t use 2 columns for ORDER BY. When I use only one column it works, but the minutes will be wrong.
Is there a way to use 2 columns in ORDER BY?
The THEN clause may only specify one value. Instead you can order by minutes:
SELECT *
FROM booking
WHERE booking_date=$b_date OR DATE(delivery_date)=$b_date
ORDER BY CASE WHEN booking_date=$b_date
THEN hour * 60 + minute
WHEN DATE(delivery_date)=$b_date
THEN HOUR(delivery_date) * 60 + MINUTE(delivery_date)
END ASC
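The single-key ordering can be sketched as follows with Python's sqlite3 module. Everything about the data is an assumption, since the original images are not available: the rows are invented so that the order matches the 01, 08, 05, 02 sequence described in the question, and strftime('%H') / strftime('%M') stand in for MySQL's HOUR() and MINUTE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (id INTEGER PRIMARY KEY, booking_date TEXT, "
    "hour INTEGER, minute INTEGER, delivery_date TEXT)"
)
# Hypothetical rows; row 5 was booked earlier but is delivered on the viewed day.
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2018-12-01", 8, 0, None),
        (2, "2018-12-01", 10, 30, "2018-12-03 09:00:00"),
        (5, "2018-11-28", 14, 0, "2018-12-01 09:30:00"),
        (8, "2018-12-01", 8, 30, None),
        (9, "2018-12-02", 7, 0, None),  # other day, filtered out
    ],
)

query = """
SELECT id
FROM booking
WHERE booking_date = :d OR date(delivery_date) = :d
ORDER BY CASE
    WHEN booking_date = :d
        THEN hour * 60 + minute                     -- minutes since midnight
    WHEN date(delivery_date) = :d
        THEN CAST(strftime('%H', delivery_date) AS INTEGER) * 60
           + CAST(strftime('%M', delivery_date) AS INTEGER)
END ASC
"""
ordered_ids = [row[0] for row in conn.execute(query, {"d": "2018-12-01"})]
print(ordered_ids)
```

Converting both cases to minutes since midnight gives the CASE expression one comparable number per row, which is why the delivery at 09:30 slots in between the 08:30 and 10:30 bookings.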

Query to count records within time range SQL or Access Query

I have a table that looks like this:
Row,TimeStamp,ID
1,2014-01-01 06:01:01,5
2,2014-01-01 06:00:03,5
3,2014-01-01 06:02:00,5
4,2014-01-01 06:02:39,5
What I want to do is count the number of records for each ID, however I don't want to count records if a subsequent TimeStamp is within 30 seconds.
So in my above example the total count for ID 5 would be 3, because it wouldn't count Row 2 because it is within 30 seconds of the last timestamp.
I am building a Microsoft Access application, and currently using a Query, so this query can either be an Access query or a SQL query. Thank you for your help.
I think the query below does what you want however I don't understand your expected output. It returns a count of 4 (all the rows in your example) which I believe would be correct because all of your records are at least 30 seconds apart. No single timestamp has a subsequent timestamp within 30 seconds from it (in time).
Row 2 with a timestamp of '2014-01-01 06:00:03' is not within 30 seconds of any timestamp coming after. The closest is row #1 which is 58 seconds later (58 is greater than 30 so I don't know why you think it should be excluded (given what you said you wanted in your explanation)).
Rows 1/3/4 of your example data also are not within 30 seconds of each other.
This is a test of the sql below but like I said it returns all 4 rows (change to a count if you want the count, I brought back the rows to illustrate):
http://sqlfiddle.com/#!3/0d727/20/0
Now check this example with some added data: (I added a fifth row)
http://sqlfiddle.com/#!3/aee67/1/0
insert into tbl values ('2014-01-01 06:01:01',5);
insert into tbl values ('2014-01-01 06:00:03',5);
insert into tbl values ('2014-01-01 06:02:00',5);
insert into tbl values ('2014-01-01 06:02:39',5);
insert into tbl values ('2014-01-01 06:02:30',5);
Note how the query result shows only 3 rows. That is because the row I added (#5) is within 30 seconds of row #3, so #3 is excluded. Row #5 also gets excluded because row #4 is 9 seconds (<=30) later than it. Row #4 does come back because no subsequent timestamp is within 30 seconds (there are no subsequent timestamps at all).
Query to get the detail:
select *
from tbl t
where not exists
(select 1
from tbl x
where x.id = t.id
and x.timestamp > t.timestamp
and datediff(second, t.timestamp, x.timestamp) <= 30)
Query to get the count by ID:
select id, count(*)
from tbl t
where not exists
(select 1
from tbl x
where x.id = t.id
and x.timestamp > t.timestamp
and datediff(second, t.timestamp, x.timestamp) <= 30)
group by id
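For readers without SQL Server at hand, the count query can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: the column is renamed to ts, and datediff(second, ...) is replaced by a difference of epoch seconds from strftime('%s', ...), which is SQLite-specific. The five rows are the ones from the second fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ts TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        ("2014-01-01 06:01:01", 5),
        ("2014-01-01 06:00:03", 5),
        ("2014-01-01 06:02:00", 5),
        ("2014-01-01 06:02:39", 5),
        ("2014-01-01 06:02:30", 5),
    ],
)

query = """
SELECT t.id, COUNT(*)
FROM tbl t
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl x
    WHERE x.id = t.id
      AND x.ts > t.ts
      -- seconds between the two timestamps
      AND CAST(strftime('%s', x.ts) AS INTEGER)
        - CAST(strftime('%s', t.ts) AS INTEGER) <= 30
)
GROUP BY t.id
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

The rows at 06:02:00 and 06:02:30 are excluded because each has a later timestamp within 30 seconds, leaving a count of 3 for ID 5.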
To the best of my knowledge it is impossible to do with just a SQL statement as presented.
I use two approaches:
1. For small result sets, remove the surplus records inside your time windows in code, then calculate the relevant statistics. The main advantage to this approach is you do not have to alter the database structure.
2. Add a field to flag each record relative to the time window, then use code to preprocess your data & fill the indicator. You can now use SQL to aggregate / filter based on the new flag column. If you need to track multiple time windows, you can use multiple flags / multiple columns (e.g. 30 second window, 600 second window, etc)
For this, I'd recommend the second approach: it lets the database (SQL) do more of the work once the preprocessing step is done.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping through the results and pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Note that adding a computed index on the expression in the where clause (where the value of the expression makes up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table -- placeholder table name
group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min, even better would be to use one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
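The bucket-by-five-minutes version can be tried with Python's sqlite3 module. Treat it as a sketch under assumptions: the generated rows (one per minute at second :02, plus a duplicate within minute 07:00) are invented, and strftime('%M', ...) replaces extract(minute from ...). As in the answers above, the grouping only looks at the minute, so it fits data from a single hour; for longer ranges the date and hour would have to be part of the group key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (your_timestamp TEXT)")

# One hypothetical reading per minute of the 07:00 hour, plus a second
# reading inside minute 07:00 to show that duplicates are harmless.
stamps = [f"2011-01-01 07:{m:02d}:02" for m in range(60)]
stamps.append("2011-01-01 07:00:59")
conn.executemany("INSERT INTO your_table VALUES (?)", [(s,) for s in stamps])

query = """
SELECT MIN(your_timestamp)
FROM your_table
-- integer division puts minutes 0-4 in bucket 0, 5-9 in bucket 1, ...
GROUP BY CAST(strftime('%M', your_timestamp) AS INTEGER) / 5
ORDER BY 1
"""
picked = [row[0] for row in conn.execute(query)]
print(len(picked), picked[0], picked[-1])
```

This returns 12 rows, the earliest reading in each five-minute bucket.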

SQL: need only 1 row per particular timestamp

I have some SQL code that is inserting values from another (non-SQL-based) system. One of the values I get is a timestamp.
I can get multiple inserts that have the same timestamp (albeit different values for other fields).
My problem is that I am trying to get the first insert happening every day (based upon timestamp) since a particular day (i.e. give me the first insert of each day since January 28, 2007...)
my code to get the first timestamp of every day is as follows:
SELECT MIN(my_timestamp) AS first_timestamp
FROM my_schema.my_table
WHERE my_col1 = 'WHATEVER'
AND my_timestamp > timestamp '2010-Jul-27 07:45:24' - INTERVAL '365 DAY'
GROUP BY DATE (my_timestamp);
This gives me the list of times available. But when I join against these times, I can get several rows, as there are lots of rows that match these times. So for 365 days, I may get 5,000 rows (I could be inserting 100 rows at 00:00:00 every day).
Assuming, in the example above, my_table has columns my_col1 and my_col2, how can I get exactly 365 rows that contain my_col1 & my_col2? It doesn't matter which row I get back if there are multiple rows for a date; any row will suffice.
It's an odd question. The overall problem is: given a timestamp, how can one get 1-row-per-timestamp even if there are multiple rows that have said timestamp (assuming there is no other priority)?
Thanks for the help in advance.
EDIT:
So, let's say for example, this table has the following columns: my_col1, my_col2, and my_timestamp.
Here are example values (in order of my_col1 - my_col2 - my_timestamp):
'my_val1' - 10 - '2010-07-01 01:01:01'
'my_val2' - 11 - '2010-07-01 01:01:01'
'my_val3' - 12 - '2010-07-01 01:01:01'
'my_val4' - 13 - '2010-07-01 01:01:02'
'my_val5' - 14 - '2010-07-02 01:01:01'
'my_val6' - 15 - '2010-07-02 01:01:01'
'my_val7' - 16 - '2010-07-03 01:01:01'
In the end, I would want only 3 rows: one with the timestamp '2010-07-01 01:01:01', one with '2010-07-02 01:01:01', and one with '2010-07-03 01:01:01'. The third one is easy, since there is only 1 row with that last timestamp. But the first two are the tricky ones. The SQL I posted above will ignore the row with 'my_val4'.
I need a query that will return all of the columns, not just the dates.
How would I get SQL to give me either the first or last of the values that would match that timestamp (it doesn't matter either way, I just need to get 1 row per day's first timestamp)?
select distinct on (date(my_timestamp)) *
from my_table
order by date(my_timestamp), my_timestamp
This selects all columns, exactly one row per date(my_timestamp). The single row per day is the first row for the group, as determined by order by (so that's the row with minimal my_timestamp).
Of course you can add whatever joins, wheres etc. you need. But this is the stub you're looking for.
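DISTINCT ON is a PostgreSQL extension. For databases without it, a window function gives the same one-row-per-day result. The following is a sketch of that substitute technique, not the answer's own query, run through Python's sqlite3 module (assuming SQLite 3.25 or later) on the example rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_col1 TEXT, my_col2 INTEGER, my_timestamp TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("my_val1", 10, "2010-07-01 01:01:01"),
        ("my_val2", 11, "2010-07-01 01:01:01"),
        ("my_val3", 12, "2010-07-01 01:01:01"),
        ("my_val4", 13, "2010-07-01 01:01:02"),
        ("my_val5", 14, "2010-07-02 01:01:01"),
        ("my_val6", 15, "2010-07-02 01:01:01"),
        ("my_val7", 16, "2010-07-03 01:01:01"),
    ],
)

query = """
SELECT my_col1, my_col2, my_timestamp
FROM (
    SELECT *,
           -- number the rows of each day, earliest timestamp first
           ROW_NUMBER() OVER (
               PARTITION BY date(my_timestamp)
               ORDER BY my_timestamp
           ) AS rn
    FROM my_table
)
WHERE rn = 1
ORDER BY my_timestamp
"""
one_per_day = conn.execute(query).fetchall()
print(one_per_day)
```

Exactly three rows come back, one per day, each carrying the day's earliest timestamp. Which of the tied rows is chosen for 2010-07-01 and 2010-07-02 is not determined by this ORDER BY; add a tie-breaker column if that matters.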
The solution is to use the SQL's DISTINCT statement (http://www.sql-tutorial.com/sql-distinct-sql-tutorial/):
SELECT DISTINCT MIN(my_timestamp) AS first_timestamp FROM my_schema.my_table WHERE my_col1 = 'WHATEVER' AND my_timestamp > timestamp '2010-Jul-27 07:45:24' - INTERVAL '365 DAY' GROUP BY DATE (my_timestamp);
I know you already have an answer, but I still don't understand why you have mentioned a join in your question. Why not just include the rest of the columns in your query, like this:
SELECT MIN(my_timestamp) AS first_timestamp, my_col1, my_col2
FROM my_table
GROUP BY DATE(my_timestamp);
This works in MySQL. Does it not return the expected result in PostgreSQL?