SELECT statement optimization

SELECT statement optimization - sql

I'm not so expert in SQL queryes, but not even a complete newbie.
I'm exporting data from a MS-SQL database to an excel file using a SQL query.
I'm exporting many columns and two of this columns contain a date and an hour, this are the columns I use for the WHERE clause.
In detail I have about 200 rows for each day, everyone with a different hour, for many days. I need to extract the first value after the 15:00 of each day for more days.
Since the hours are different for each day i can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes 15:03 and so on (i'm looking for the closest value after the 15:00), for fix this i used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > "15:00"
in this way i can take the first value after the 15:00 for a day...the problem is that i need this for more days...for a user-specifyed interval of days. At the moment i fix this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data="first_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="second_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="third_day" AND hour > "15:00"
...and so on for all the days (i build the SQL string with a for each day in the specifyed interval).
Until now this worked, but now I need to expand the days interval (now is maximun a week, so 5 days) to up to 60 days. I don't want to build an huge query string, but i can't imagine an alternative way for write the SQL.
Any help appreciated
Ettore

I typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;

Related

How to select data but without similar times?

I have a table with create_dt times and i need to get records but without the datas that have similar create_dt time (15 minutes).
So i need to get only one record instead od two records if the create_dt is in 15 minutes of the first one.
Format of the date and time is '(29.03.2019 00:00:00','DD.MM.YYYY HH24:MI:SS'). Thanks

It's a bit unclear what exactly you want, but one thing I can think of, is to round all values to the nearest "15 minute" and then only pick one row from those "15 minute" intervals:
with rounded as (
select create_dt,
date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) as rounded,
... other columns ....
from your_table
), numbered as (
select create_dt,
rounded,
row_number() over (partition by rounded order by create_dt) as rn
... other columns ....
from rounded
)
select *
from numbered
where rn = 1;
The expression date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) will return create_dt rounded up or down to the next "15 minutes" interval.
The row_number() then assigns unique numbers for each distinct 15 minutes interval and the final select then always picks the first row for that interval.
Online example: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=e6c7ea651c26a6f07ccb961185652de7

I'm going to walk you through this conceptually. First of all, there's a difficulty in doing this that you might not have noticed.
Let's say you wanted one record from the same hour or day. But if there are two record created on the same day, you only want one in your results. Which one?
I mention this because to the designers of SQL, there is not a single answer that they can provide SQL to pick. Then cannot show data from both records without both records being in the tabular output.
This is a common problem, but when the designers of SQL provided a feature to handle it, it can only work if there is no ambiguity of how to have one row of result for two records. That solution is GROUP BY, but it only works for showing the fields other than the timestamp if they are the same for all the records which match the time period. You have to include all the fields in your select clause and if multiple records in your time period are the same, they will create multiple records in your output. So although there is a tool GROUP BY for this problem, you might not be able to use it.
So here is the solution you want. If multiple records are close together, then don't include the records after the first one. So you want a WHERE clause which will exclude a record if another record recently proceeds it. So the test for each record in the result will involve other records in the table. You need to join the table to itself.
Let's say we have a table named error_events. If we get multiples of the same value in the field error_type very close to the time of other similar events, we only want to see the first one. The SQL will look something like this:
SELECT A.*
FROM error_events A
INNER JOIN error_events B ON A.error_type = B.error_type
WHERE ???
You will have to figure out the details of the WHERE clause, and the functions for the timestamp will depend you when RDBMS product you are using. (mysql and postgres for instance may work differently.)
You want only the records where there is no record which is earlier by less then 15 minutes. You do want the original record. That record will match itself in the join, but it will be the only record in the time period between its timestamp and 15 minutes prior.
So an example WHERE clause would be
WHERE B.create_dt BETWEEN [15 minutes before A.create_dt] and A.create_dt
GROUP BY A.*
HAVING 1 = COUNT(B.pkey)
Like we said, you will have to find out how your database product subtracts time, and how 15 minutes is represented in that difference.

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2

You didn't show us your table definition, but something like this:
select *
from (
select t.*,
t.timestamp_column,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp not date (otherwise the result of the difference wouldn't be an interval)

Multiple aggregate sums from different conditions in one sql query

Whereas I believe this is a fairly general SQL question, I am working in PostgreSQL 9.4 without an option to use other database software, and thus request that any answer be compatible with its capabilities.
I need to be able to return multiple aggregate totals from one query, such that each sum is in a new row, and each of the groupings are determined by a unique span of time, e.g. WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'. The number of records that satisfy there WHERE clause is unknown and may be zero, in which case ideally the result is "0". This is what I have worked out so far:
(
SELECT SUM(minutes) AS min
FROM downtime
WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-14' AND '2016-02-21'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-28' AND '2016-03-06'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-06' AND '2016-03-13'
)
UNION ALL
(
SELECT SUM(minutes))
FROM downtime
WHERE time_stamp BETWEEN '2016-03-13' AND '2016-03-20'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-20' AND '2016-03-27'
)
Result:
min
---+-----
1 | 119
2 | 4
3 | 30
4 |
5 | 62
6 | 350
That query gets me almost the exact result that I want; certainly good enough in that I can do exactly what I need with the results. Time spans with no records are blank but that was predictable, and whereas I would prefer "0" I can account for the blank rows in software.
But, while it isn't terrible for the 6 weeks that it represents, I want to be flexible and to be able to do the same thing for different time spans, and for a different number of data points, such as each day in a week, each week in 3 months, 6 months, each month in 1 year, 2 years, etc... As written above, it feels as if it is going to get tedious fast... for instance 1 week spans over a 2 year period is 104 sub-queries.
What I'm after is a more elegant way to get the same (or similar) result.
I also don't know if doing 104 iterations of a similar query to the above (vs. the 6 that it does now) is a particularly efficient usage.
Ultimately I am going to write some code which will help me build (and thus abstract away) the long, ugly query--but it would still be great to have a more concise and scale-able query.

In Postgres, you can generate a series of times and then use these for the aggregation:
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-02-07'::timestamp, '2016-03-20'::timestamp, interval '7 day') g(dte) left join
downtime dt
on dt.timestamp >= g.dte and dt.timestamp < g.dte + interval '7 day'
group by g.dte
order by g.dte;

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?

Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.

Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;

If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn

How about this:
select min(ts), extract(minute from ts)::integer / 5
as bucket group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29

This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.

MySQL to get the count of rows that fall on a date for each day of a month

I have a table that contains a list of community events with columns for the days the event starts and ends. If the end date is 0 then the event occurs only on the start day. I have a query that returns the number of events happening on any given day:
SELECT COUNT(*) FROM p_community e WHERE
(TO_DAYS(e.date_ends)=0 AND DATE(e.date_starts)=DATE('2009-05-13')) OR
(DATE('2009-05-13')>=DATE(e.date_starts) AND DATE('2009-05-13')<=DATE(e.date_ends))
I just sub in any date I want to test for "2009-05-13".
I need to be be able to fetch this data for every day in an entire month. I could just run the query against each day one at a time, but I'd rather run one query that can give me the entire month at once. Does anyone have any suggestions on how I might do that?
And no, I can't use a stored procedure.

Try:
SELECT COUNT(*), DATE(date) FROM table WHERE DATE(dtCreatedAt) >= DATE('2009-03-01') AND DATE(dtCreatedAt) <= DATE('2009-03-10') GROUP BY DATE(date);
This would get the amount for each day in may 2009.
UPDATED: Now works on a range of dates spanning months/years.

Unfortunately, MySQL lacks a way to generate a rowset of given number of rows.
You can create a helper table:
CREATE TABLE t_day (day INT NOT NULL PRIMARY KEY)
INSERT
INTO t_day (day)
VALUES (1),
(2),
…,
(31)
and use it in a JOIN:
SELECT day, COUNT(*)
FROM t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day
Or you may use an ugly subquery:
SELECT day, COUNT(*)
FROM (
SELECT 1 AS day
UNION ALL
SELECT 2 AS day
…
UNION ALL
SELECT 31 AS day
) t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas