PostgreSQL: Calculate Average Handling Time - sql

I have a sample table that looks like this
I need to to a SQL script to get the Average Handling Time of a Case, I researched for suggestions but never worked with timestamps and I'm really lost on how to do it.

If you subtract one timestamp from another, you get an interval. And you can calculate the average over intervals.
select avg(close_timestamp - create_timestamp)
from the_table;

You can calculate the AVG of the difference of the timestamp.
SELECT agent, avg(close_timestamp - create_timestamp) average_timestamp
FROM your_table
GROUP BY agent
ORDER BY agent
You can format the solution for obtain it in days/hours/minutes/seconds.

Related

Finding the initial sampled time window after using SAMPLE BY again

I can't seem to find a perhaps easy solution to what I'm trying to accomplish here, using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for the year's highest galon price.
Output
Using the following query, I can get the average galon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
timestamp
avg_per_month
2000-06-05T00:00:00.000000Z
1.6724
2000-07-05T00:00:00.000000Z
1.69275
2000-08-05T00:00:00.000000Z
1.635
...
...
Then, I get all these monthly averages, group them by year and return the maximum galon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M
timestamp
max_per_year
2000-01-05T00:00:00.000000Z
1.69275
2001-01-05T00:00:00.000000Z
1.767399999999
2002-01-05T00:00:00.000000Z
1.52075
...
...
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum galon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp
max_per_year
which_month_is_responsible
2000-01-05T00:00:00.000000Z
1.69275
2000-07-05T00:00:00.000000Z
2001-01-05T00:00:00.000000Z
1.767399999999
2001-05-05T00:00:00.000000Z
...
...
What did I try?
I tried by adding a subquery to the SELECT to have a "duplicate" of some sort for the timestamp column but that's apparently never valid in QuestDB (?), so probably the solution is by adding even more subqueries in the FROM? Or a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
I think 'wanted output' can be achieved with window functions.
Please have a look at:
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);
INSERT INTO electricity
SELECT (x*1000000)::timestamp, rnd_double()
FROM long_sequence(10000000);
SELECT day, ts, max_per_day
FROM
(
SELECT timestamp_floor('d', ts) as day,
ts,
avg_in_15_min as max_per_day,
row_number() OVER (PARTITION BY timestamp_floor('d', ts) ORDER BY avg_in_15_min desc) as rn_per_day
FROM
(
SELECT ts, avg(consumption) as avg_in_15_min
FROM electricity
SAMPLE BY 15m
)
) WHERE rn_per_day = 1

difference between 2 datetimes in UTC in Bigquery

I am returning a set of values in a bigquery select statement like this
I need to add a compute field Utlization for each row like this formulae (end_time - start_time)*cores Utilization
This time format is in UTC so I am not sure how to do this , I want to do this in the select statement itself. I am new to BigQuery. Kindly Help . Thanks
You need to use TIMESTAMP_DIFF function in standard SQL.
In your particular case the query would be something like
SELECT TIMESTAMP_DIFF(end_time , start_time , second)*cores as Utilization
FROM <yourtable>
Take into consideration that you can change the time unit of the result and you should change it to fit your needs. I inserted second but you can use microsecond, millisecond, second, minute, hour or day.

PostgreSQL - GROUP BY timestamp values?

I've got a table with purchase orders stored in it. Each row has a timestamp indicating when the order was placed. I'd like to be able to create a report indicating the number of purchases each day, month, or year. I figured I would do a simple SELECT COUNT(xxx) FROM tbl_orders GROUP BY tbl_orders.purchase_time and get the value, but it turns out I can't GROUP BY a timestamp column.
Is there another way to accomplish this? I'd ideally like a flexible solution so I could use whatever timeframe I needed (hourly, monthly, weekly, etc.) Thanks for any suggestions you can give!
This does the trick without the date_trunc function (easier to read).
// 2014
select created_on::DATE from users group by created_on::DATE
// updated September 2018 (thanks to #wegry)
select created_on::DATE as co from users group by co
What we're doing here is casting the original value into a DATE rendering the time data in this value inconsequential.
Grouping by a timestamp column works fine for me here, keeping in mind that even a 1-microsecond difference will prevent two rows from being grouped together.
To group by larger time periods, group by an expression on the timestamp column that returns an appropriately truncated value. date_trunc can be useful here, as can to_char.

Query to find a weekly average

I have an SQLite database with the following fields for example:
date (yyyymmdd fomrat)
total (0.00 format)
There is typically 2 months of records in the database. Does anyone know a SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide total by 7 but unless there is exactly x days that are divisible by 7 in the db I don't think it will be very accurate, especially if there is less than 7 days of records.
To get a daily summary it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
SELECT strftime('%W', thedate) theweek, avg(total) theaverage
FROM table GROUP BY strftime('%W', thedate)
I'm not sure how the syntax would work in SQLite, but one way would be to parse out the date parts of each [date] field, and then specifying which WEEK and DAY boundaries in your WHERE clause and then GROUP by the week. This will give you a true average regardless of whether there are rows or not.
Something like this (using T-SQL):
SELECT DATEPART(w, theDate), Avg(theAmount) as Average
FROM Table
GROUP BY DATEPART(w, theDate)
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is
daily * 7
Obviously this doesn't take in to account specific weeks, but you can get that by narrowing the result set in a date range.
You'll have to omit those records in the addition which don't belong to a full week. So, prior to summing up, you'll have to find the min and max of the dates, manipulate them such that they form "whole" weeks, and then run your original query with a WHERE that limits the date values according to the new range. Maybe you can even put all this into one query. I'll leave that up to you. ;-)
Those values which are "truncated" are not used then, obviously. If there's not enough values for a week at all, there's no result at all. But there's no solution to that, apparently.

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree, you have to figure out what Peak Load is first before you can start to create reports on it.
The first thing I would do is figure out how I am going to define peak load. Ex. Am I going to look at an hour by hour breakdown.
Next I would do a group by on the CreateDate formated in seconds (no milleseconds). As part of the group by I would do an avg based on number of records.
I don't think you'd need to know the peak hours; you can generate them with SQL, wrapping a the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (Sql Server syntax):
select datepart(dy,createdate) as DayOfYear,
hour(createdate) as Hour,
count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(week,-7,getdate())
group by datepart(dy,createdate), hour(createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
select count(*) as MessagesPerMinute
from message
where CreateDate > dateadd(days,-7,getdate())
group by datepart(dy,createdate),hour(createdate),minute(createdate)
)
Grouping by datepart(dy,...) is an easy way to distinguish between days without worrying about month borders. It works until you select more that a year back, but that would be unusual for performance queries.
warning, these will run slow!
this will group your data into "second" buckets and list them from the most activity to least:
SELECT
CONVERT(char(19),CreateDate,120) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(Char(19),CreateDate,120)
ORDER BY 2 Desc
this will group your data into "minute" buckets and list them from the most activity to least:
SELECT
LEFT(CONVERT(char(19),CreateDate,120),16) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19),CreateDate,120),16)
ORDER BY 2 Desc
I'd take those values and calculate what they want