SQL SELECT statement WHERE time is *:00

I'm attempting to make a filtered table based on an existing table. The current table has a row for every minute of every hour of 24 days, broken down by location (tmc).
I want to filter this into another table that has just one row per hour for each of the 24 days, per location (tmc).
Here is the SQL statement that I thought would have done it:
SELECT
    TIME_FORMAT(t.time, '%H:00') AS time,
    ROUND(AVG(t.avg), 0) AS avg,
    tmc, Date, Date_Time
FROM traffic t
GROUP BY time, tmc, Date
The problem is I still get about 247,000 rows affected, and according to simple math I should only have:
Locations (TMCS): 14
Hours in a day: 24
Days tracked: 24
Total = 14 * 24 * 24 = 8,064
My original table has 477,277 rows
When I make a new table from this query I get right around 247,000 rows, which makes no sense, so my query must be wrong.
The reason I used this method instead of a WHERE clause is that I wanted to find the average speed (avg) per hour. That part is not mandatory, so I'd be fine with a WHERE clause on time, but I don't know how to write one that matches *:00.
Any help would be much appreciated.

Fix the GROUP BY so it's standard, rather than relying on the nonstandard MySQL extension:
SELECT
    TIME_FORMAT(t.time, '%H:00') AS time,
    ROUND(AVG(t.avg), 0) AS avg,
    tmc, Date, Date_Time
FROM traffic t
GROUP BY
    TIME_FORMAT(t.time, '%H:00'), tmc, Date, Date_Time
Run this with SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; to see the errors that other RDBMSs would give you, and to make MySQL behave properly.
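As long as the per-minute Date_Time column stays in the SELECT and GROUP BY lists, the result can never collapse to one row per hour, which is why the row count stays so high. Here is a hedged sketch of both approaches mentioned in the question, assuming the traffic table and column names shown above:
-- Sketch 1: the aggregate route. Dropping the per-minute Date_Time column
-- lets the groups collapse to one averaged row per tmc, per Date, per hour.
SELECT tmc, Date,
    TIME_FORMAT(t.time, '%H:00') AS hour_of_day,
    ROUND(AVG(t.avg), 0) AS avg_speed
FROM traffic t
GROUP BY tmc, Date, TIME_FORMAT(t.time, '%H:00');

-- Sketch 2: the WHERE-clause route the question asked about.
-- Keep only the rows whose time falls exactly on *:00.
SELECT *
FROM traffic t
WHERE MINUTE(t.time) = 0;
Either version should yield 14 * 24 * 24 = 8,064 rows, one per location per hour per day (for Sketch 2, assuming exactly one reading per minute per tmc).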

Related

SQL - Returning max count, after breaking down a day into hourly rows

I need to write a SQL query that returns the highest count of orders in a given hourly range. The problem is that my table just logs orders as they come in and has no identifier that separates one hour from the next.
So basically, I need to find the highest number of orders in any given hour from 7/08/2022 - 7/15/2022, from a table that does not distinguish distinct hour sets and logs orders as they come.
I have tried a query combining MAX(), COUNT(), and DATETIME(), but to no avail.
Can I please receive some help?
I've had to tackle this kind of measurement in the past.
Here's what I did for 15-minute intervals:
My datetime column is named datreg in my database log area.
cast(round(floor(cast(datreg as float(53))*24*4)/(24*4),5) as smalldatetime)
I multiply by 4 in this formula to get four intervals inside each hour of my 24-hour period. For you it would look like this to get hourly intervals:
cast(round(floor(cast(datreg as float(53))*24)/(24),5) as smalldatetime)
This is a little piece of magic when it comes to dashboards and reports.
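To answer the original question with that hourly bucket, here is a hedged sketch, assuming SQL Server and a hypothetical orders table with the datreg column from above; the date range comes from the question:
-- Count orders per hourly bucket inside the range, then keep the busiest hour.
SELECT TOP 1
    cast(round(floor(cast(datreg as float(53))*24)/(24),5) as smalldatetime) AS hour_bucket,
    COUNT(*) AS order_count
FROM orders
WHERE datreg >= '2022-07-08' AND datreg < '2022-07-16'
GROUP BY cast(round(floor(cast(datreg as float(53))*24)/(24),5) as smalldatetime)
ORDER BY order_count DESC;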

Use SQL to ensure I have data for each day of a certain time period

I'm looking to select only one data point from each date in my report. I want to ensure each day is accounted for and has at least one row of information, as we had to do a few different things to move a large data file into our data warehouse (import one large Google Sheet for some data, use Python for daily pulls of some of the other data) and I want to make sure no date was left out. The data runs from last summer through now. I could use COUNT(DISTINCT ...) to check that the count matches the number of days between the first data point and yesterday (the latest data point), but I want to verify that each individual day is accounted for. I should mention I am in BigQuery. Also, an example of the created_at format is: 2021-02-09 17:05:44.583 UTC
This is what I have so far:
SELECT FIRST(created_at)
FROM `large_table`
ORDER BY created_at
I know FIRST is probably not the best function for this case, and it currently just grabs the very first created_at value, but it's a jumping-off point.
You can use aggregation:
select any_value(lt).*
from large_table lt
group by created_at
order by min(created_at);
Note: This assumes that created_at is a date -- or at least only has one value per date. You might need to convert it to a date:
select any_value(lt).*
from large_table lt
group by date(created_at)
order by min(created_at);
The BigQuery equivalent of the query in your question:
SELECT created_at
FROM `large_table`
ORDER BY created_at
LIMIT 1
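To directly verify that no calendar day is missing, here is a hedged sketch using BigQuery's GENERATE_DATE_ARRAY; the table and column names come from the question, and the '2021-06-01' start date is a placeholder assumption for "last summer":
-- List every date in the range that has no rows in large_table.
SELECT d AS missing_date
FROM UNNEST(GENERATE_DATE_ARRAY('2021-06-01', CURRENT_DATE())) AS d
LEFT JOIN (
    SELECT DISTINCT DATE(created_at) AS day
    FROM `large_table`
) t ON t.day = d
WHERE t.day IS NULL;
An empty result means every day in the range is accounted for.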

Oracle SQL group by time interval

I have a table with the following columns:
order_number
customer_number
creation_date
estimated_ship_date
I need to select the max estimated_ship_date for any records with the same customer_number and the creation_date is within 15 minutes of each other. It could be any number of records, 1-50 max I would say.
So basically group by customer_number and creation_date within 15 minutes of each other. It is the creation_date within 15 minutes of each other that I am stuck on.
If you will only ever have to consider rows as a group when they all fall within a single 15-minute span, then you could use a windowing clause:
select order_number, customer_number, creation_date, estimated_ship_date,
       max(estimated_ship_date) over (partition by customer_number order by creation_date
           range between 15/1440 preceding and 15/1440 following) as max_estimated_ship_date
from cust_orders;
That gets each row of your table back with an additional column that shows the maximum ship date for any row fifteen minutes* either side of the current one.
It might not do quite what you want if you have a series of orders where each is within 15 minutes of the previous one, but they are not all within 15 minutes of all of the others - as in the example in my earlier comment. It sounds like you maybe don't expect that situation or wouldn't want to group them all together anyway, but you'd need to look at how the grouping worked if it did happen and maybe adjust it slightly.
* Oracle date arithmetic is based on 1 representing a full day, so 15 minutes is (15*60)/(24*60*60) of a day; or 900/86400; or 15/(24*60); or 15/1440; or 1/96; etc. Which representation you use is a matter of taste and maintainability. You can also use intervals if you prefer.
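If fixed quarter-hour buckets are acceptable instead of a sliding window, one common alternative is to truncate creation_date to its 15-minute bucket and group on that. A hedged sketch, reusing the cust_orders name from above rather than anything confirmed by the question:
-- Truncate to the hour, then add the 15-minute bucket back as a fraction of a day.
select customer_number,
       trunc(creation_date, 'HH24')
         + floor(to_number(to_char(creation_date, 'MI')) / 15) * 15 / 1440 as bucket_start,
       max(estimated_ship_date) as max_estimated_ship_date
from cust_orders
group by customer_number,
         trunc(creation_date, 'HH24')
           + floor(to_number(to_char(creation_date, 'MI')) / 15) * 15 / 1440;
Unlike the windowed query, this returns one row per customer per bucket, but two orders a few minutes apart can land in different buckets if they straddle a quarter-hour boundary.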

Query to find a weekly average

I have an SQLite database with the following fields, for example:
date (yyyymmdd format)
total (0.00 format)
There are typically two months of records in the database. Does anyone know a SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide total by 7, but unless the number of days in the db happens to be divisible by 7 I don't think it will be very accurate, especially if there are fewer than 7 days of records.
To get a daily summary it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
SELECT strftime('%W', thedate) theweek, avg(total) theaverage
FROM table GROUP BY strftime('%W', thedate)
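One caveat: the question stores date as yyyymmdd text, which strftime won't parse directly, so the value has to be rebuilt as an ISO date first. A hedged sketch using the table and column names from the question:
SELECT strftime('%W', substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2)) AS theweek,
       AVG(total) AS theaverage
FROM stats_adsense
GROUP BY theweek;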
I'm not sure how the syntax would work in SQLite, but one way would be to parse out the date parts of each [date] field, specify the WEEK and DAY boundaries in your WHERE clause, and then GROUP BY the week. This will give you a true average regardless of whether there are rows or not.
Something like this (using T-SQL):
SELECT DATEPART(w, theDate), Avg(theAmount) as Average
FROM Table
GROUP BY DATEPART(w, theDate)
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is
daily * 7
Obviously this doesn't take into account specific weeks, but you can get that by narrowing the result set to a date range.
You'll have to omit from the sum those records which don't belong to a full week. So, prior to summing up, find the min and max of the dates, adjust them so that they form whole weeks, and then run your original query with a WHERE clause that limits the date values to the new range. Maybe you can even put all this into one query; I'll leave that up to you. ;-)
The truncated values are simply not used then. If there aren't enough values for even one full week, there's no result at all, but there's no way around that.
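Here is a hedged sketch of that trimming idea in SQLite, using the table and column names from the question and assuming at least one record per day:
-- Sum per week, keep only weeks with all seven days present, then average.
SELECT AVG(weekly_total) AS avg_weekly_total
FROM (
    SELECT strftime('%Y-%W', substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2)) AS week,
           SUM(total) AS weekly_total,
           COUNT(DISTINCT date) AS days_in_week
    FROM stats_adsense
    GROUP BY week
)
WHERE days_in_week = 7;  -- partial weeks at either end drop out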

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree, you have to figure out what Peak Load is first before you can start to create reports on it.
The first thing I would do is figure out how I am going to define peak load. For example, am I going to look at an hour-by-hour breakdown?
Next I would GROUP BY the CreateDate formatted to seconds (no milliseconds). As part of the group by I would take an average based on the number of records.
I don't think you'd need to know the peak hours in advance; you can generate them with SQL by wrapping the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (SQL Server syntax):
select datepart(dy,createdate) as DayOfYear,
       datepart(hour,createdate) as Hour,
       count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(day,-7,getdate())
group by datepart(dy,createdate), datepart(hour,createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
    select count(*) as MessagesPerMinute
    from message
    where CreateDate > dateadd(day,-7,getdate())
    group by datepart(dy,createdate), datepart(hour,createdate), datepart(minute,createdate)
) as x
Grouping by datepart(dy,...) is an easy way to distinguish between days without worrying about month boundaries. It works until you select more than a year back, but that would be unusual for performance queries.
Warning: these will run slowly!
This will group your data into one-second buckets and list them from the most activity to the least:
SELECT
    CONVERT(char(19), CreateDate, 120) AS CreateDateBucket, COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(char(19), CreateDate, 120)
ORDER BY 2 DESC
This will group your data into one-minute buckets and list them from the most activity to the least:
SELECT
    LEFT(CONVERT(char(19), CreateDate, 120), 16) AS CreateDateBucket, COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19), CreateDate, 120), 16)
ORDER BY 2 DESC
I'd take those values and calculate whatever is wanted from there.
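For example, to reduce the per-second buckets to the single peak figure that was asked for, a hedged sketch (same Message table as above; the derived-table alias is mine):
SELECT MAX(CountOf) AS PeakMessagesPerSecond
FROM (
    SELECT CONVERT(char(19), CreateDate, 120) AS CreateDateBucket, COUNT(*) AS CountOf
    FROM Message
    GROUP BY CONVERT(char(19), CreateDate, 120)
) AS buckets;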