SQL Finding maximum average time for distinct cell - sql

I have a table with a large number of records, and I am trying to find only the 10 numbers with the largest average time per number.
So the table may look like so:
number | time
012345 | 10s
012345 | 20s
055555 | 50s
055555 | 30s
068976 | 11s
etc...
and the output should look like so:
number | time
012345 | 15s
055555 | 40s
068976 | 11s
I tried this, but to no avail:
select distinct(destination), avg(totalqueuetime)
from call
group by destination, totalqueuetime
order by totalqueue time desc limit 10;
it does not seem to group the numbers.

Please try the following code, which has been tested and confirmed as effective.
(If you wish to sort by average total queue time, as your code sample above suggests)
SELECT destination,
AVG( totalqueuetime ) AS avgTQT
FROM call
GROUP BY destination
ORDER BY avgTQT DESC LIMIT 10;
(If you wish to sort by destination, as your desired output sample above suggests)
SELECT destination,
AVG( totalqueuetime ) AS avgTQT
FROM call
GROUP BY destination
ORDER BY destination DESC LIMIT 10;
If you have any questions or comments, then please feel free to post a Comment accordingly.
Note: As for your supplied code, if you remove totalqueuetime from the GROUP BY clause you will not need to use DISTINCT. With totalqueuetime in the GROUP BY, every distinct combination of destination and totalqueuetime forms its own group, so AVG just returns that group's own totalqueuetime and the same destination can appear many times. Grouping by destination alone reduces the list to one row, with one average, per destination.

Your group by has two keys. It should only have one:
select destination, avg(totalqueuetime)
from call
group by destination
order by avg(totalqueuetime) desc
limit 10;
Notes on the use of distinct. select distinct is almost never needed with group by. In fact, in almost all cases, you don't need select distinct at all -- because you can use group by.
In addition, distinct is not a function. It applies to the entire row. So, don't use parentheses around the first column, unless you want to confuse yourself.
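To see the difference between the two GROUP BY variants, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the asker's database (an assumption; the engine is not stated in the question). The table and column names come from the question, and the times are stored as plain integers of seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call (destination TEXT, totalqueuetime INTEGER)")
conn.executemany(
    "INSERT INTO call VALUES (?, ?)",
    [("012345", 10), ("012345", 20), ("055555", 50), ("055555", 30), ("068976", 11)],
)

# Two grouping keys: every distinct (destination, totalqueuetime) pair is its own group.
two_keys = conn.execute(
    "SELECT destination, AVG(totalqueuetime) FROM call "
    "GROUP BY destination, totalqueuetime"
).fetchall()

# One grouping key: one row per destination, highest average first.
one_key = conn.execute(
    "SELECT destination, AVG(totalqueuetime) AS avg_tqt FROM call "
    "GROUP BY destination ORDER BY avg_tqt DESC LIMIT 10"
).fetchall()

print(len(two_keys))  # one row per input row here, nothing was collapsed
print(one_key)
```

With the sample data from the question, the two-key version returns five rows, while the one-key version returns one averaged row per destination.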

Related

Nested SQL in Presto not resolving a column name when trying to apply WHERE

I have been trying to create an automated chart / query for the utilisation of my router.
I have a nested query that returns the following:
Record_Date | Mbps_IN | Mbps_OUT
YYYYMMDD HH:00 | 1234 | 1234
This should have one entry per hour, but due to data collection issues from my router there are often hours or even days of data missing. The counter is a "delta": elsewhere in the raw data I capture the change in data volume since the previous record. After a gap this produces a flat line for a number of hours followed by a very large value, often 2-3 times bigger than normal, because several hours of utilisation are recorded against the first hour in which the data feed returned.
Ultimately I would like to find a way to smooth / build an average from this spike and backfill the missing hours. (but that is a challenge for another day).
In the first instance I would simply like to select only the rows where the value in Mbps_In is less than 1000.
However, when I do this from either Metabase or a DBeaver connection direct to my PrestoDB I get an error:
Column 'results.Mbps_In' cannot be resolved {:message "line 27:7: Column 'results.Mbps_in' cannot be resolved", :errorCode 47, :errorName "COLUMN_NOT_FOUND",
My Query works just fine to give the tabular output including the outliers as follows:
select
metrics_date_hour Record_Date
,round(In_Utilisation_Mbps_Total,2) as Mbps_In
,round(Out_Utilisation_Mbps_Total,2) as Mbps_Out
from (
nested query
) results
-- WHERE results.Mbps_In < 1000
Group By Record_Date
Order By Record_Date desc
When I uncomment the Where clause I get the error on the failure to resolve the column name.
I feel like this should not be difficult but I have tried a few variations and efforts at referencing some of the original columns that were processed earlier to get to this results output but I am still failing to correctly reference the column from the results table.
Updated with successful query:
select
metrics_date_hour Record_Date
,round(sum(In_Utilisation_Mbps_Total),2) as Mbps_In
,round(sum(Out_Utilisation_Mbps_Total),2) as Mbps_Out
from (
nested query
) results
-- WHERE results.Mbps_In < 1000 - I didn't get this to work
Group By Record_Date
Having sum(In_Utilisation_Mbps_Total) < 1000
Order By Record_Date desc
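The error is produced because you don't have a column named Mbps_In in your nested query; Mbps_In is only an alias defined in the outer SELECT, and WHERE is evaluated before that alias exists. I think that you really need a HAVING clause, not a WHERE. Try changing it to this: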
select
metrics_date_hour Record_Date
,round(In_Utilisation_Mbps_Total,2) as Mbps_In
,round(Out_Utilisation_Mbps_Total,2) as Mbps_Out
from (
nested query
) results
Group By Record_Date
Having Mbps_In<1000
Order By Record_Date desc
If you still want to use the WHERE clause, you need to reference the original column name instead of the alias:
select
metrics_date_hour Record_Date
,round(In_Utilisation_Mbps_Total,2) as Mbps_In
,round(Out_Utilisation_Mbps_Total,2) as Mbps_Out
from (
nested query
) results
Where In_Utilisation_Mbps_Total<1000
Group By Record_Date
Order By Record_Date desc
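The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite stands in for Presto, the table `results` and the column `in_mbps` are hypothetical replacements for the elided nested query, and the sample rows are made up. WHERE filters individual rows before they are summed, while HAVING filters whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the output of the nested query.
conn.execute("CREATE TABLE results (metrics_date_hour TEXT, in_mbps REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [
        ("20240101 00:00", 400.0),
        ("20240101 00:00", 350.0),
        ("20240101 01:00", 900.0),
        ("20240101 01:00", 2100.0),  # made-up spike after a gap in the feed
        ("20240101 02:00", 120.0),
    ],
)

# HAVING: drop every hour whose summed value is 1000 or more.
having_rows = conn.execute(
    "SELECT metrics_date_hour AS record_date, ROUND(SUM(in_mbps), 2) AS mbps_in "
    "FROM results GROUP BY metrics_date_hour "
    "HAVING SUM(in_mbps) < 1000 "
    "ORDER BY metrics_date_hour DESC"
).fetchall()

# WHERE: drop individual rows of 1000 or more, then sum what is left.
where_rows = conn.execute(
    "SELECT metrics_date_hour AS record_date, ROUND(SUM(in_mbps), 2) AS mbps_in "
    "FROM results WHERE in_mbps < 1000 "
    "GROUP BY metrics_date_hour "
    "ORDER BY metrics_date_hour DESC"
).fetchall()

print(having_rows)
print(where_rows)
```

In this sample the 01:00 hour disappears entirely under HAVING, but survives under WHERE with only its non-spike row counted, so the two filters are not interchangeable.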

SELECT MIN from a subset of data obtained through GROUP BY

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id date_and_time entry_category
1 2017/01/20 12:00 type_1
2 2017/01/20 13:00 type_1
3 2017/01/20 12:00 type_2
4 2017/01/20 12:00 type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
However now, I want to find which is the date and time which is the LEAST RECENT among the datetime's I obtained with the query listed above. I will need to use somehow SELECT MIN(date_and_time), but how do I let SQL know I want to treat the output of my previous query as a "new table" to apply a new SELECT query on? The output of my total query should be a single value—in case of the sample displayed above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but can't seem to make them do the trick; they only rename existing columns or tables (or I'm misusing them). There are many questions out there that try to list the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but I now want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!
You can use your first query as a sub-query, which is what you describe: the first query's output becomes the input for the second query. This gives you the one-row output of the minimum date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) as date_and_time, entry_category
FROM timeseries_table
GROUP BY entry_category)a;
Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
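With TOP 1, if several categories tie for the least recent time, only one of them is returned, and which one is arbitrary. To make the choice deterministic, include an additional sort key: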
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;
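Both approaches can be tried on the sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; the question does not name the engine), so `LIMIT 1` replaces `TOP 1`, and the timestamps are stored as fixed-width text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeseries_table "
    "(id INTEGER PRIMARY KEY, date_and_time TEXT, entry_category TEXT)"
)
conn.executemany(
    "INSERT INTO timeseries_table VALUES (?, ?, ?)",
    [
        (1, "2017/01/20 12:00", "type_1"),
        (2, "2017/01/20 13:00", "type_1"),
        (3, "2017/01/20 12:00", "type_2"),
        (4, "2017/01/20 12:00", "type_3"),
    ],
)

# Sub-query approach: latest time per category, then the minimum of those.
oldest_latest = conn.execute(
    "SELECT MIN(date_and_time) FROM ("
    "  SELECT MAX(date_and_time) AS date_and_time, entry_category"
    "  FROM timeseries_table GROUP BY entry_category"
    ") a"
).fetchone()[0]

# Sort-and-limit approach, with entry_category as a tie-breaker.
first_row = conn.execute(
    "SELECT MAX(date_and_time) AS latest, entry_category "
    "FROM timeseries_table GROUP BY entry_category "
    "ORDER BY latest ASC, entry_category LIMIT 1"
).fetchone()

print(oldest_latest)
print(first_row)
```

The sub-query form returns only the timestamp, while the sort-and-limit form also tells you which category it belongs to.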

% of total calculation without subquery in Postgres

I'm trying to create a "Percentage of Total" column and currently using a subquery with no issues:
SELECT ID, COUNT(*), COUNT(*) / (SELECT COUNT(*)
FROM DATA) AS "% OF TOTAL" FROM DATA GROUP BY ID;
| ID | COUNT | % OF TOTAL |
| 1 | 100 | 0.10 |
| 2 | 800 | 0.80 |
| 3 | 100 | 0.10 |
However, for reasons outside the scope of this question, I'm looking to see if there is any way to accomplish this without using a subquery. Essentially, the application uses logic outside of the SQL query to determine what the WHERE clause is and injects it into the query. That logic does not account for the existence of subqueries like the above, so before going back and rebuilding all of the existing logic to account for this scenario, I figured I'd see if there's another solution first.
I've tried accomplishing this effect with a window function, but to no avail.
Use window functions:
SELECT ID, COUNT(*),
COUNT(*) / SUM(COUNT(*)) OVER () AS "% OF TOTAL"
FROM DATA
GROUP BY ID;
SELECT id, count(*) AS ct
, round(count(*)::numeric
/ sum(count(*)) OVER (ORDER BY id), 2) AS pct_of_running_total
FROM data
GROUP BY id;
You must add ORDER BY to the window function or the order of rows is arbitrary. It may seem correct at first, but that can change at any time and without warning. It seems you want to order rows by id.
And you obviously don't want integer division, which would truncate fractional digits. I cast to numeric and round the result to two fractional digits like in your result.
Related answer:
Postgres window function and group by exception
Key to understanding why this works is the sequence of events in a SELECT query:
Best way to get result count before LIMIT was applied
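The two window definitions used in the answers above give different results, which a small sketch makes visible. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption; it needs a bundled SQLite with window-function support, 3.25 or later). The counts are scaled down to 1, 8 and 1 rows, and `* 1.0` replaces the `::numeric` cast to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER)")
# Scaled-down sample: id 1 once, id 2 eight times, id 3 once.
conn.executemany("INSERT INTO data VALUES (?)", [(1,)] + [(2,)] * 8 + [(3,)])

rows = conn.execute(
    "SELECT id, COUNT(*) AS ct, "
    # Empty OVER (): the window is all groups, so this is the share of the grand total.
    "  ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_total, "
    # OVER (ORDER BY id): the window grows row by row, so this is the share of a running total.
    "  ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (ORDER BY id), 2) AS pct_of_running_total "
    "FROM data GROUP BY id ORDER BY id"
).fetchall()

for row in rows:
    print(row)
```

The window function runs after GROUP BY, which is why `SUM(COUNT(*))` can add up the per-group counts without a subquery.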

Oracle SQL last n records

I have read tons of articles about getting the last n records in Oracle SQL using ROWNUM, but in my case it does not give me the correct rows.
I have 3 columns in my table: message (varchar), mes_date (date) and mes_time (varchar2).
Let's say it contains 3 records:
Hello world | 20-OCT-14 | 23:50
World Hello | 21-OCT-14 | 02:32
Hello Hello | 20-OCT-14 | 23:52
I want to get the last 2 records ordered by its date and time (first row the oldest, and second the newest date/time)
I am using this query:
SELECT *
FROM (SELECT message
FROM messages
ORDER
BY MES_DATE, MES_TIME DESC
)
WHERE ROWNUM <= 2 ORDER BY ROWNUM DESC;
Instead of getting row #3 first and row #2 second, I get row #1 and then row #3.
What should I do to get the older dates/times on top, followed by the newest?
Maybe that helps:
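SELECT *
FROM (SELECT message,
mes_date,
mes_time,
-- rank 1 = newest row; the ' ' keeps the date and time parts apart for TO_DATE
ROW_NUMBER() OVER (ORDER BY TO_DATE(TO_CHAR(mes_date, 'YYYY-MM-DD') || ' ' || mes_time, 'YYYY-MM-DD HH24:MI') DESC) rank
FROM messages
)
WHERE rank <= 2
-- of the two newest rows, show the older one first
ORDER BY rank DESC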
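The underlying idea (sort newest first, keep two rows, then re-sort oldest first) can be sketched outside Oracle. The example below uses Python's built-in sqlite3 module purely as an illustration: `LIMIT` replaces the Oracle row-numbering, and the dates from the question are stored as ISO text (`2014-10-20`), which is an assumption made so that they sort correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message TEXT, mes_date TEXT, mes_time TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("Hello world", "2014-10-20", "23:50"),
        ("World Hello", "2014-10-21", "02:32"),
        ("Hello Hello", "2014-10-20", "23:52"),
    ],
)

last_two = conn.execute(
    "SELECT message, mes_date, mes_time FROM ("
    # Inner query: sort BOTH columns descending and keep the two newest rows.
    "  SELECT message, mes_date, mes_time FROM messages"
    "  ORDER BY mes_date DESC, mes_time DESC LIMIT 2"
    ") "
    # Outer query: put those two rows back into oldest-first order.
    "ORDER BY mes_date, mes_time"
).fetchall()

for row in last_two:
    print(row)
```

Note that in the question's query only MES_TIME is sorted descending; MES_DATE keeps the default ascending order, which is why the oldest date ends up among the first two rows.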
I am really sorry to disappoint - but in Oracle there's no such thing as "the last two records".
The table structure does not allocate data at the end, and does not keep a visible property of time (the only time being held is for the sole purpose of "flashback queries" - supplying results as of point in time, such as the time the query started...).
The last inserted record is not something you can query using the database.
What can you do? You can create a trigger that orders the inserted records using a sequence, and select based on it (so SELECT * from (SELECT * FROM table ORDER BY seq DESC) where rownum < 3) - that will assure order only if the sequence CACHE value is 1.
Notice that if the column that contains the message date does not have many events in a second, you can use that column, as the other solution suggested - e.g. if you have more than 2 events that arrive in a second, the query above will give you random two records, and not the actual last two.
AGAIN - Oracle cannot be queried for the last two rows inserted, since its data structures do not track the order of inserts, and the ordering you see when running "SELECT *" is independent of the actual inserts in some specific cases.
If you have any questions regarding any part of this answer - post it down here, and I'll focus on explaining it in more depth.
select * from table
minus
select * from table
where rownum<=(select count(*) from table)-n

SQL AVG() function returning incorrect values

I want to use the AVG function in SQL to return a working average for some values (i.e. based on the last week, not an overall average). I have two values I am calculating, weight and restingHR (heart rate). I have the following SQL statements for each:
SELECT AVG( weight ) AS average
FROM stats
WHERE userid='$userid'
ORDER BY date DESC LIMIT 7
SELECT AVG( restingHR ) AS average
FROM stats
WHERE userid='$userid'
ORDER BY date DESC LIMIT 7
The value I get for weight is 82.56 but it should be 83.35
This is not a massive error and I'm rounding it when I use it, so it's not too big a deal.
However for restingHR I get 45.96 when it should be 57.57 which is a massive difference.
I don't understand why this is going so wrong. Any help is much appreciated.
Thanks
Use a subquery to separate selecting the rows from computing the average:
SELECT AVG(weight) average
FROM (SELECT weight
FROM stats
WHERE userid = '$userid'
ORDER BY date DESC
LIMIT 7) subq
It seems you want to filter your data with ORDER BY date DESC LIMIT 7, but you have to consider that ORDER BY and LIMIT take effect after everything else is done. So your AVG() function considers all values of restingHR for your $userid, not just the 7 latest, and the LIMIT is then applied to the single row that AVG() has already produced.
To overcome this...okay, Barmar just posted a query.
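The effect is easy to reproduce. Here is a minimal sketch with Python's built-in sqlite3 module standing in for the asker's database (an assumption), ten made-up daily weights for a single hypothetical user, and a bound parameter in place of the interpolated '$userid'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (userid TEXT, date TEXT, weight REAL)")
# Made-up data: ten consecutive days with weights 80.0, 81.0, ..., 89.0.
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [("u1", f"2024-01-{day:02d}", 79.0 + day) for day in range(1, 11)],
)

# Original form: AVG sees all ten rows; LIMIT 7 only limits the one-row result.
naive = conn.execute(
    "SELECT AVG(weight) AS average FROM stats "
    "WHERE userid = ? ORDER BY date DESC LIMIT 7",
    ("u1",),
).fetchone()[0]

# Subquery form: pick the 7 newest rows first, then average only those.
last_seven = conn.execute(
    "SELECT AVG(weight) AS average FROM ("
    "  SELECT weight FROM stats WHERE userid = ?"
    "  ORDER BY date DESC LIMIT 7"
    ") subq",
    ("u1",),
).fetchone()[0]

print(naive)       # average over all ten days
print(last_seven)  # average over the seven newest days
```

The same subquery pattern applies to restingHR; only the column name changes.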