PL/SQL: check a time period and repeat until 600 records are reached, from a large database - sql

What would be the best way to check if there has been data within a 3 month period up until a maximum of 600 records, then repeat for the 3 months before that if 600 hasn't been reached? Also it's a large table so querying the whole thing could take a few minutes or completely hang Oracle SQL Developer.
ROWNUM seems to give row numbers to the whole table before returning the result of the query, so that seems to take too long. The way we are currently doing it is entering a time period explicitly that we guess there will be enough records within and then limiting the rows to 600. This only takes 5 seconds, but needs to be changed constantly.
I was thinking to do a FOR loop through each row, but am having trouble storing the number of results outside of the query itself to check whether or not 600 has been reached.
I was also thinking about creating a data index? But I don't know much about that. Is there a way to sort the data by date before grabbing the whole table that would be faster?
Thank you

check if there has been data within a 3 month period up until a maximum of 600 records, then repeat for the 3 months before that if 600 hasn't been reached?
Find the latest date and filter to only allow the rows that are within 6 months of it and then fetch the first 600 rows:
SELECT *
FROM (
  SELECT t.*,
         MAX(date_column) OVER () AS max_date_column
  FROM table_name t
)
WHERE date_column > ADD_MONTHS(max_date_column, -6)
ORDER BY date_column DESC
FETCH FIRST 600 ROWS ONLY;
If there are 600 or more within the latest 3 months then they will be returned; otherwise it will extend the result set into the next 3 month period.
If you intend to repeat the extension over more than two 3-month periods then just use:
SELECT *
FROM table_name
ORDER BY date_column DESC
FETCH FIRST 600 ROWS ONLY;
I was also thinking about creating a data index? But I don't know much about that. Is there a way to sort the data by date before grabbing the whole table that would be faster?
Yes, creating an index on the date column would, typically, make filtering the table faster.
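To make the index advice concrete, here is a minimal sketch using SQLite through Python; the table and column names (events, event_date) are made up, and on the Oracle side the equivalent would simply be CREATE INDEX ... ON your_table (date_column):

```python
import sqlite3

# Hypothetical table and column names; the real schema is Oracle-side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"2023-{(i % 12) + 1:02d}-15") for i in range(1, 1001)])

# An index on the date column lets the engine satisfy date-range filters
# (and ORDER BY event_date DESC) without scanning the whole table.
conn.execute("CREATE INDEX idx_events_date ON events (event_date)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events "
    "WHERE event_date > '2023-06-01' ORDER BY event_date DESC LIMIT 600"
).fetchall()
print(plan)  # the plan detail should mention idx_events_date
```

The same principle applies in Oracle: with an index on date_column, both the ADD_MONTHS filter and the ORDER BY ... FETCH FIRST can be served from the index instead of a full table scan.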

Related

Big Query SQL - Group into every n numbers

I have a table that includes a column called minutes_since. It is an integer containing the number of minutes since a pre-defined event. Multiple rows may fall within the same minute.
I want to group and aggregate the rows into every n minutes. For example, I want to get the average of another column for all rows occurring within 5 minute intervals.
How could this be achieved in BigQuery Standard SQL?
#standardSQL
SELECT
  MIN(minutes_since) AS minute_start,
  MAX(minutes_since) AS minute_end,
  AVG(value) AS value_avg
FROM `project.dataset.table`
GROUP BY DIV(minutes_since - 1, 5)
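The bucketing trick can be checked on a small sample. This sketch uses SQLite through Python, where plain integer division plays the role of BigQuery's DIV(); the table and values are made up:

```python
import sqlite3

# Illustrative stand-in for the BigQuery table; SQLite integer division
# of (minutes_since - 1) by 5 assigns each row to a 5-minute bucket.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (minutes_since INTEGER, value REAL)")
rows = [(1, 10.0), (2, 20.0), (5, 30.0),   # bucket covering minutes 1-5
        (6, 40.0), (9, 60.0)]              # bucket covering minutes 6-10
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

result = conn.execute("""
    SELECT MIN(minutes_since) AS minute_start,
           MAX(minutes_since) AS minute_end,
           AVG(value)         AS value_avg
    FROM readings
    GROUP BY (minutes_since - 1) / 5
    ORDER BY minute_start
""").fetchall()
print(result)  # → [(1, 5, 20.0), (6, 9, 50.0)]
```

Subtracting 1 before dividing makes the buckets run 1-5, 6-10, ... rather than 0-4, 5-9, which matches the "every 5 minutes" phrasing of the question.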

Get the number of records from 2 columns where the time is overlapping

I am new to MS ACCESS and am having trouble trying to get the number of records from overlapping time ranges. This is an example of my data.
example of raw data
What I am trying to do is get the column number_of_records. For example, if there are 4 records added at 5.11, the number_of_records should become 8, as 4 records were added at 5.10.
example of raw data with no_of_records column
There is a mistake in my image above: I forgot to mention that, for example, if the time hits 6:00, the number of records should not add on to the previous records and should start afresh.
Do any of you have any suggestions?
Consider the correlated count subquery:
SELECT t.time_column_1, t.time_column_2,
       (SELECT COUNT(*) FROM myTable sub
        WHERE sub.time_column_1 <= t.time_column_1
          AND sub.time_column_2 = t.time_column_2) AS number_of_records
FROM myTable t
ORDER BY t.time_column_2, t.time_column_1
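A small sketch of the correlated-count idea, using SQLite through Python with made-up column names (t1 is the record time, t2 is the hour bucket in which the count resets):

```python
import sqlite3

# Toy data: counts accumulate within an hour bucket (t2) and reset
# when the bucket changes, mirroring the "start afresh at 6:00" rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (t1 TEXT, t2 TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("5:10", "5"), ("5:10", "5"),   # 2 records at 5:10
    ("5:11", "5"),                  # 1 more at 5:11 -> running count 3
    ("6:00", "6"),                  # new hour bucket -> count restarts
])

result = conn.execute("""
    SELECT t.t1, t.t2,
           (SELECT COUNT(*) FROM my_table sub
            WHERE sub.t1 <= t.t1 AND sub.t2 = t.t2) AS number_of_records
    FROM my_table t
    ORDER BY t.t2, t.t1
""").fetchall()
print(result)
# → [('5:10', '5', 2), ('5:10', '5', 2), ('5:11', '5', 3), ('6:00', '6', 1)]
```

The correlated subquery re-counts per row, which is fine for modest table sizes but can get slow on large ones.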

SQL: Minimising rows in subqueries/partitioning

So here's an odd thing. I have limited SQL access to a database - the most relevant restriction here being that if I create a query, a maximum of 10,000 rows is returned.
Anyway, I've been trying to have a query return individual case details, but only at busy times - say when 50+ cases are attended to in an hour. So, I inserted the following line:
COUNT(CaseNo) OVER (PARTITION BY DATEADD(hh,
DATEDIFF(hh, 0, StartDate), 0)) AS CasesInHour
... And then used this as a subquery, selecting only those cases where CasesInHour >= 50
However, it turns out that the 10,000 rows limit affects the partitioning - when I tried to run over a longer period nothing came up, as it was counting the cases in any given hour from only a (fairly random) much smaller selection.
Can anyone think of a way to get around this limit? The final total returned will be much lower than 10,000 rows, but it will be looking at far more than 10,000 as a starting point.
If this is really MySQL we're talking about, sql_big_selects and max_join_size affect the number of rows examined, not the number of rows "returned". So, you'll need to reduce the number of rows examined by being more selective and using proper indexes.
For example, the following query may be examining over 10,000 rows:
SELECT * FROM stats
To limit the selectivity, you might want to grab only the rows from the last 30 days:
SELECT * FROM stats
WHERE created > DATE_SUB(NOW(), INTERVAL 30 DAY)
However, this only reduces the number of rows examined if there is an index on the created column and the cardinality of the index is sufficient to reduce the rows examined.
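As a rough illustration of the pattern, here is a SQLite sketch through Python in which datetime('now', '-30 days') stands in for MySQL's DATE_SUB(NOW(), INTERVAL 30 DAY); the table and column names are made up:

```python
import sqlite3

# SQLite stand-in for the MySQL pattern: an indexed range filter on a
# date column so only recent rows are examined.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER, created TEXT)")
conn.execute("CREATE INDEX idx_stats_created ON stats (created)")
conn.execute("INSERT INTO stats VALUES (1, '2000-01-01 00:00:00')")  # old row
conn.execute("INSERT INTO stats VALUES (2, datetime('now'))")        # recent row

recent = conn.execute(
    "SELECT id FROM stats WHERE created > datetime('now', '-30 days')"
).fetchall()
print(recent)  # → [(2,)]
```

With the index on created, the range predicate narrows the scan to the recent slice of the index rather than touching every row.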

SQL Select statement Where time is *:00

I'm attempting to make a filtered table based off an existing table. The current table has rows for every minute of every hour of 24 days based off of locations (tmcs).
I want to filter this table into another table that has just one row per hour for each of the 24 days, based off the locations (tmcs).
Here is the SQL statement that I thought would have done it...
SELECT
Time_Format(t.time, '%H:00') as time, ROUND(AVG(t.avg), 0) as avg,
tmc, Date, Date_Time FROM traffic t
GROUP BY time, tmc, Date
The problem is I still get 247,000 rows affected... and according to simple math I should only have:
Locations (TMCS): 14
Hours in a day: 24
Days tracked: 24
Total = 14 * 24 * 24 = 12,096
My original table has 477,277 rows
When I make a new table off this query I get right around 247,000, which makes no sense, so my query must be wrong.
The reason I did this method instead of a WHERE clause is because I wanted to find the average speed (avg) per hour. This is not mandatory, so I'd be fine with using a WHERE clause for time, but I just don't know how to do this based off *:00.
Any help would be much appreciated
Fix the GROUP BY so it's standard, rather than relying on the nonstandard MySQL extension:
SELECT
  TIME_FORMAT(t.time, '%H:00') AS time,
  ROUND(AVG(t.avg), 0) AS avg,
  tmc, Date, Date_Time
FROM traffic t
GROUP BY
  TIME_FORMAT(t.time, '%H:00'), tmc, Date, Date_Time
Run this with SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; to see the errors that other RDBMSs would give you, and to make MySQL behave properly.
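The corrected grouping can be tried out on a toy table. This SQLite-through-Python sketch uses strftime('%H:00', ...) as a stand-in for MySQL's TIME_FORMAT, and leaves Date_Time out for simplicity; the sample rows are made up:

```python
import sqlite3

# Per-hour averaging: every non-aggregated output column also appears
# in the GROUP BY, so each (hour, tmc, Date) combination yields one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (time TEXT, avg REAL, tmc TEXT, Date TEXT)")
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", [
    ("08:05", 60.0, "A", "2015-01-01"),
    ("08:40", 40.0, "A", "2015-01-01"),   # same hour, same tmc/date
    ("09:10", 30.0, "A", "2015-01-01"),   # different hour
])

result = conn.execute("""
    SELECT strftime('%H:00', time) AS hour,
           ROUND(AVG(avg), 0) AS speed,
           tmc, Date
    FROM traffic
    GROUP BY strftime('%H:00', time), tmc, Date
    ORDER BY hour
""").fetchall()
print(result)
# → [('08:00', 50.0, 'A', '2015-01-01'), ('09:00', 30.0, 'A', '2015-01-01')]
```

The two 08:xx rows collapse into one hourly row with their average, which is the per-hour reduction the question is after.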

Date range intersection in SQL

I have a table where each row has a start and stop date-time. These can be arbitrarily short or long spans.
I want to query the sum duration of the intersection of all rows with two start and stop date-times.
How can you do this in MySQL?
Or do you have to select the rows that intersect the query start and stop times, then calculate the actual overlap of each row and sum it client-side?
To give an example, using milliseconds to make it clearer:
Some rows:
ROW START STOP
1 1010 1240
2 950 1040
3 1120 1121
And we want to know the sum time that these rows were between 1030 and 1100.
Let's compute the overlap of each row:
ROW INTERSECTION
1 70
2 10
3 0
So the sum in this example is 80.
Taking the intersections from your example (70, 10, 0), and assuming @range_start and @range_end as your condition parameters:
SELECT SUM( LEAST(@range_end, stop) - GREATEST(@range_start, start) )
FROM Table
WHERE @range_start < stop AND @range_end > start
Using GREATEST/LEAST together with the date functions, you should be able to get what you need operating directly on the date type.
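That query can be sanity-checked against the example rows. In this SQLite-through-Python sketch the scalar two-argument MIN()/MAX() functions play the roles of LEAST()/GREATEST(), and named parameters stand in for @range_start/@range_end:

```python
import sqlite3

# Clamp each span to the query window, take the length, and sum:
# SUM(MIN(range_end, stop) - MAX(range_start, start)) over overlapping rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (start INTEGER, stop INTEGER)")
conn.executemany("INSERT INTO spans VALUES (?, ?)",
                 [(1010, 1240), (950, 1040), (1120, 1121)])

total, = conn.execute("""
    SELECT SUM(MIN(:range_end, stop) - MAX(:range_start, start))
    FROM spans
    WHERE :range_start < stop AND :range_end > start
""", {"range_start": 1030, "range_end": 1100}).fetchone()
print(total)  # → 80
```

The WHERE clause filters out non-overlapping rows (row 3 here), so the clamped difference is always non-negative, and the result matches the 70 + 10 + 0 = 80 worked out in the example.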
I fear you're out of luck.
Since you don't know the number of rows that you will be "cumulatively intersecting", you need either a recursive solution, or an aggregation operator.
The aggregation operator you need is no option because SQL does not have the data type that it is supposed to operate on (that type being an interval type, as described in "Temporal Data and the Relational Model").
The recursive solution may be possible, but it is likely to be difficult to write, difficult for other programmers to read, and it is questionable whether the optimizer can turn such a query into an optimal data access strategy.
Or I misunderstood your question.
There's a fairly interesting solution if you know the maximum time you'll ever have. Create a table with all the numbers in it from one to your maximum time.
millisecond
-----------
1
2
3
...
1240
Call it time_dimension (this technique is often used in dimensional modelling in data warehousing.)
Then this:
SELECT COUNT(*)
FROM your_data
INNER JOIN time_dimension
        ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
WHERE time_dimension.millisecond BETWEEN 1030 AND 1100
...will give you the total number of milliseconds of running time between 1030 and 1100.
Of course, whether you can use this technique depends on whether you can safely predict the maximum number of milliseconds that will ever be in your data.
This is often used in data warehousing, as I said; it fits well with some kinds of problems -- for example, I've used it for insurance systems, where a total number of days between two dates was needed, and where the overall date range of the data was easy to estimate (from the earliest customer date of birth to a date a couple of years into the future, beyond the end date of any policies that were being sold.)
Might not work for you, but I figured it was worth sharing as an interesting technique!
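Here is the technique tried out on the example rows, using SQLite through Python. One caveat worth noting: BETWEEN is inclusive at both ends, so counting milliseconds that way picks up one extra point per overlapping row; the sketch below uses half-open comparisons (>= start, < stop) instead, which makes the count agree with the interval arithmetic:

```python
import sqlite3

# Dimensional-modelling trick: one row per millisecond, then a join
# counts the milliseconds each span spends inside the query window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_data (start INTEGER, stop INTEGER)")
conn.executemany("INSERT INTO your_data VALUES (?, ?)",
                 [(1010, 1240), (950, 1040), (1120, 1121)])
conn.execute("CREATE TABLE time_dimension (millisecond INTEGER)")
conn.executemany("INSERT INTO time_dimension VALUES (?)",
                 [(ms,) for ms in range(1, 1241)])

total, = conn.execute("""
    SELECT COUNT(*)
    FROM your_data
    INNER JOIN time_dimension
            ON time_dimension.millisecond >= your_data.start
           AND time_dimension.millisecond <  your_data.stop
    WHERE time_dimension.millisecond >= 1030
      AND time_dimension.millisecond <  1100
""").fetchone()
print(total)  # → 80
```

With the half-open bounds, the count reproduces the 80 from the worked example; the dimension table only needs to span the largest time value you expect in the data.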
After you added the example, it is clear that indeed I misunderstood your question.
You are not "cumulatively intersecting rows".
The steps that will bring you to a solution are :
intersect each row's start and end points with the given start and end points. This should be doable using CASE expressions, something in the style of:
SELECT CASE WHEN startdate < givenstartdate THEN givenstartdate ELSE startdate END AS retainedstartdate,
       CASE WHEN enddate > givenenddate THEN givenenddate ELSE enddate END AS retainedenddate
FROM ...
Cater for NULLs and that sort of thing as needed.
With the retainedstartdate and retainedenddate, use a date function to compute the length of the retained interval (which is the overlap of your row with the given time section).
SELECT the SUM() of those.
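The steps above can be sketched directly, again with SQLite through Python; givenstartdate/givenenddate become query parameters, and the table and column names are made up:

```python
import sqlite3

# Step 1: clamp each row to the given window with CASE expressions.
# Step 2: subtract to get the retained interval length.
# Step 3: SUM the lengths over the rows that overlap the window at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (startdate INTEGER, enddate INTEGER)")
conn.executemany("INSERT INTO spans VALUES (?, ?)",
                 [(1010, 1240), (950, 1040), (1120, 1121)])

total, = conn.execute("""
    SELECT SUM(retainedenddate - retainedstartdate) FROM (
        SELECT CASE WHEN startdate < :given_start THEN :given_start
                    ELSE startdate END AS retainedstartdate,
               CASE WHEN enddate > :given_end THEN :given_end
                    ELSE enddate END AS retainedenddate
        FROM spans
        WHERE startdate < :given_end AND enddate > :given_start
    )
""", {"given_start": 1030, "given_end": 1100}).fetchone()
print(total)  # → 80
```

This is the CASE-expression spelling of the same clamp-and-sum idea as the GREATEST/LEAST answer, and it gives the same 80 on the example data.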