SQL - calculate average frequency along with other aggregations - sql

I'm trying to make a query for a table where each row is an order. This query should get the following numbers for every day of the week:
[x] Total count of orders
[x] Sum of total_amount
[x] Total count of unique customers
[ ] Average order frequency of customers
I can't find a way to get the last one (average order frequency). I tried the query below, but I get ERROR: aggregate function calls cannot be nested:
SELECT weekday,
COUNT(*) AS orders,
SUM(total_amount) AS revenue,
COUNT(DISTINCT customer_id) AS customers,
AVG(COUNT(DISTINCT customer_id)) AS avg_order_freq -- Average order frequency
FROM orders
GROUP BY weekday
I was hoping there was a single aggregate function for this. It would have been easier if I only had to get the average order frequency, but I also have to aggregate other columns.
I'm using PostgreSQL.
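One way to read "average order frequency" is the average number of orders per customer within each weekday. Under that assumption no nested aggregate is needed, because the average of per-customer order counts equals the order count divided by the number of distinct customers. The sketch below runs such a query through Python's sqlite3 module only to make it runnable; the sample rows are made up, and the table layout is taken from the question.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (weekday TEXT, customer_id INTEGER, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Mon", 1, 10.0),
        ("Mon", 1, 20.0),
        ("Mon", 2, 30.0),
        ("Tue", 1, 5.0),
        ("Tue", 3, 15.0),
    ],
)

# Assumption: "average order frequency" = orders per distinct customer.
# Multiplying by 1.0 avoids integer division.
QUERY = """
SELECT weekday,
       COUNT(*)                    AS orders,
       SUM(total_amount)           AS revenue,
       COUNT(DISTINCT customer_id) AS customers,
       COUNT(*) * 1.0 / COUNT(DISTINCT customer_id) AS avg_order_freq
FROM orders
GROUP BY weekday
ORDER BY weekday
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL the last column can be written as COUNT(*)::numeric / COUNT(DISTINCT customer_id). If customer_id can be NULL, those rows are counted by COUNT(*) but not by COUNT(DISTINCT customer_id), so the ratio would need a filter.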

Related

SQL Percent Rank use 4-week average number to find percentage rank within full FlowFunds values. Partition by FlowGroup

I want to find the percentage rank of where a 4-week average sits across an entire time series of data.
[table attached here]
In this example, I want to find how the 4-week avg of FlowGroup a (507.5858) ranks in comparison to the other 4 values of FlowGroup a (804.6113, 1781.3388, 627.1612, -1182.7681).
The return value for FlowGroup a should be = 0.311
Currently I am ranking each individual Flow$ against the total like this:
PERCENT_RANK ()
OVER (PARTITION BY FlowGroup ORDER BY Flow$) as PctRank
I do not know how I can percent rank the 4-week average against the total Flow$, partitioned by FlowGroup.
Cheers!
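The expected value of 0.311 is not explained in the question, but it is consistent with a percent rank that interpolates linearly between the two neighboring values, as spreadsheet-style PERCENTRANK functions do. The Python sketch below checks that reading against the numbers quoted above; the function name is hypothetical, and treating the four listed Flow$ values as the whole partition is an assumption.

```python
from bisect import bisect_left


def interpolated_percent_rank(values, x):
    """Percent rank of x within values, interpolating between neighbors."""
    s = sorted(values)
    n = len(s)
    if n < 2 or x < s[0] or x > s[-1]:
        raise ValueError("x must lie within the range of at least two values")
    i = bisect_left(s, x)
    if s[i] == x:
        # x is one of the values: plain (rank - 1) / (n - 1).
        return i / (n - 1)
    lo, hi = s[i - 1], s[i]
    # x falls between s[i - 1] and s[i]: interpolate between their ranks.
    return ((i - 1) + (x - lo) / (hi - lo)) / (n - 1)


# The four Flow$ values of FlowGroup a quoted in the question.
flows_a = [804.6113, 1781.3388, 627.1612, -1182.7681]
four_week_avg = 507.5858

rank = interpolated_percent_rank(flows_a, four_week_avg)
print(round(rank, 3))
```

PERCENT_RANK() in SQL only ranks rows that are actually in the partition, so it would return a step value rather than this interpolated one.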

How to apply aggregate functions to results of other aggregate functions in single query?

I have a table BIKE_TABLE containing columns Rented_Bike_Count, Hour, and Season. My goal is to determine average hourly rental count per season, as well as the MIN, MAX, and STDDEV of the average hourly rental count per season. I need to do this in a single query.
I used:
SELECT
SEASONS,
HOUR,
ROUND(AVG(RENTED_BIKE_COUNT),2) AS AVG_RENTALS_PER_HR
FROM TABLE
GROUP BY HOUR, SEASONS
ORDER BY SEASONS
and this gets me close, returning 96 rows (4 seasons x 24 hours per) like:
SEASON   HOUR   AVG_RENTALS_PER_HR
Autumn   0      709.44
Autumn   1      552.5
Autumn   2      377.48
Autumn   3      256.55
But I cannot figure out how to return the following results that use ROUND(AVG(RENTED_BIKE_COUNT), 2) as their basis:
What is the average hourly rental count per season? The answer should be four lines, like: Autumn, [avg. number of bikes rented per hour]
What is the MIN of the average hourly rental count per season?
Same for MAX
Same for STDDEV.
I tried running
MIN(AVG(RENTED_BIKE_COUNT)) AS MIN_AVG_HRLY_RENTALS_BY_SEASON,
MAX(AVG(RENTED_BIKE_COUNT)) AS MAX_AVG_HRLY_RENTALS_BY_SEASON,
STDDEV(AVG(RENTED_BIKE_COUNT)) AS STNDRD_DEV_AVG_HRLY_RENTALS_BY_SEASON
as nested SELECT and then as nested FROM clauses, but I cannot seem to get it right. Am I close? Any assistance greatly appreciated.
I think that you are overcomplicating the task. Does this give you your answers? If not, please tell me the difference between its output and your desired output.
Of course, you can add ROUND() to each column etc. as you see fit.
SELECT
SEASONS,
MIN(RENTED_BIKE_COUNT) minimum,
MAX(RENTED_BIKE_COUNT) maximum,
STDDEV(RENTED_BIKE_COUNT) sDev,
AVG(RENTED_BIKE_COUNT) average
FROM TABLE
GROUP BY SEASONS
ORDER BY SEASONS;
According to your comment, it seems that you may want the following query.
WITH seasons AS(
SELECT
Season,
AVG(RENTED_BIKE_COUNT) seasonAverage
FROM TABLE
GROUP BY season)
SELECT
AVG(seasonAverage) average,
MIN(seasonAverage) minimum,
MAX(seasonAverage) maximum,
STDDEV(seasonAverage) sDev
FROM
seasons;
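If the goal is instead the MIN, MAX and average of the hourly averages within each season, a variant of the same pattern may be closer: aggregate per season and hour first, then aggregate those rows per season. The sketch below is an assumption about the intent, run through Python's sqlite3 module on made-up rows; the table name bike_table and the lowercase column names are stand-ins for the asker's schema.

```python
import sqlite3

# Hypothetical sample data standing in for the asker's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bike_table (season TEXT, hour INTEGER, rented_bike_count INTEGER)"
)
conn.executemany(
    "INSERT INTO bike_table VALUES (?, ?, ?)",
    [
        ("Autumn", 0, 100), ("Autumn", 0, 200),  # hour 0 average: 150
        ("Autumn", 1, 50), ("Autumn", 1, 70),    # hour 1 average: 60
        ("Winter", 0, 10), ("Winter", 0, 30),    # hour 0 average: 20
        ("Winter", 1, 40),                       # hour 1 average: 40
    ],
)

# Inner level: one average per (season, hour).
# Outer level: aggregate those hourly averages per season.
QUERY = """
WITH hourly AS (
    SELECT season, hour, AVG(rented_bike_count) AS avg_rentals
    FROM bike_table
    GROUP BY season, hour
)
SELECT season,
       AVG(avg_rentals) AS avg_of_hourly_avgs,
       MIN(avg_rentals) AS min_hourly_avg,
       MAX(avg_rentals) AS max_hourly_avg
FROM hourly
GROUP BY season
ORDER BY season
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

SQLite has no built-in STDDEV, so it is left out here; on a database that provides it, STDDEV(avg_rentals) can be added as another column of the outer SELECT.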

SQL query for cumulative sum of purchases across category

Write SQL query to get the cumulative sum of purchases across each category.
(Cumulative sum should be calculated following the order of purchases)
You are looking for the cumulative sum function, which uses a window frame:
sum(purchases) over (partition by category order by order_date)
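As a minimal sketch of that expression in a full query, the example below runs it through Python's sqlite3 module on made-up rows. The table name purchase_log and the sample data are hypothetical; the column names follow the expression above.

```python
import sqlite3

# Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_log (category TEXT, order_date TEXT, purchases INTEGER)"
)
conn.executemany(
    "INSERT INTO purchase_log VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 10),
        ("A", "2024-01-02", 5),
        ("A", "2024-01-03", 20),
        ("B", "2024-01-01", 7),
        ("B", "2024-01-04", 3),
    ],
)

# The running total restarts for each category and follows order_date.
QUERY = """
SELECT category,
       order_date,
       purchases,
       SUM(purchases) OVER (
           PARTITION BY category
           ORDER BY order_date
       ) AS running_total
FROM purchase_log
ORDER BY category, order_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With only ORDER BY in the window, rows that share the same order_date within a category receive the same running total; adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives a strictly row-by-row sum.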

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute average daily sales per vendor for the year 2019 and get a single number, which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendors' average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
-- Get average number of daily sales for each vendor
SELECT vendor_daily_totals.vendorid, AVG(vendor_daily_totals.cnt)
avg_daily_sales
FROM (
-- Get total number of sales for each vendor on each day
SELECT vendorid, COUNT(*) cnt
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid, month, day
) vendor_daily_totals
GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
SELECT vendorid,
COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid
) t
This gets the total and divides by the number of days. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.
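To check that the one-shot form agrees with the nested form, here is a small sketch that runs both through Python's sqlite3 module on made-up rows. It only compares results and says nothing about Impala performance. SQLite does not accept COUNT(DISTINCT month, day), so the two columns are folded into a single number (month * 100 + day) for this test; that workaround and the sample data are assumptions of the sketch.

```python
import sqlite3

# Hypothetical sample data; one row per sale, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_sales (vendorid INTEGER, year INTEGER, month INTEGER, day INTEGER)"
)
sales = (
    [(1, 2019, 1, 1)] * 2    # vendor 1: 2 sales on Jan 1
    + [(1, 2019, 1, 2)] * 4  # vendor 1: 4 sales on Jan 2 (daily average 3)
    + [(2, 2019, 2, 1)]      # vendor 2: 1 sale on Feb 1 (daily average 1)
    + [(2, 2018, 2, 1)] * 5  # outside 2019, must be ignored
)
conn.executemany("INSERT INTO vendor_sales VALUES (?, ?, ?, ?)", sales)

# Nested form: daily counts, then per-vendor average, then overall average.
NESTED = """
SELECT AVG(avg_daily_sales)
FROM (
    SELECT vendorid, AVG(cnt) AS avg_daily_sales
    FROM (
        SELECT vendorid, COUNT(*) AS cnt
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid, month, day
    ) AS vendor_daily_totals
    GROUP BY vendorid
) AS vendor_avgs
"""

# One-shot form: total sales divided by the number of distinct sale days.
ONE_SHOT = """
SELECT AVG(avgs)
FROM (
    SELECT vendorid,
           COUNT(*) * 1.0 / COUNT(DISTINCT month * 100 + day) AS avgs
    FROM vendor_sales
    WHERE year = 2019
    GROUP BY vendorid
) AS t
"""

nested_result = conn.execute(NESTED).fetchone()[0]
one_shot_result = conn.execute(ONE_SHOT).fetchone()[0]
print(nested_result, one_shot_result)
```

Both forms only count days on which a vendor made at least one sale; days with no sales are not part of either average.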

BigQuery: Calculating averages in daily partitioned tables

I have a problem with getting averages out of several partitioned daily tables. We have partitioned tables for every day. I want an SQL query that calculates the average number of requests over N days, grouped by country.
So this is the schema:
date (string)
country (string)
req (integer)
What I have so far:
SELECT country, avg(req) as AvgReq
FROM TABLE_DATE_RANGE([thePartitionedTable_],
DATE_ADD(CURRENT_TIMESTAMP(), -2, 'DAY'), CURRENT_TIMESTAMP())
GROUP BY country
This works for 1 day of course, but the data is skewed when I try it for 2 or more days. What is the problem in my logic? How does the AVG() function work in this case? Do I need to group by date as well?
So I want the daily average of thePartitionedTable_today and the daily average of thePartitionedTable_yesterday, and then I want the average of their averages, if that makes sense. So if thePartitionedTable_today has a daily average of 2 for Nigeria and thePartitionedTable_yesterday had a daily average of 3 for Nigeria, then the average for Nigeria over those two days should be 2.5. I really appreciate your time!
Using standard SQL:
with avg_byday AS (
SELECT
country,
AVG(req) AS req_avg
FROM
`thePartitionedTable_*`
GROUP BY
_TABLE_SUFFIX,
country)
SELECT
country,
AVG(req_avg)
FROM
avg_byday
GROUP BY
country
The subquery will also give you average requests per country for each day.
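To illustrate why a single AVG(req) over several days looks skewed, the sketch below compares it with the average of daily averages on made-up rows, run through Python's sqlite3 module. A single table with a day column stands in for the daily tables and _TABLE_SUFFIX; that layout, the table name requests and the sample data are assumptions of this example.

```python
import sqlite3

# Hypothetical sample data: one row per request record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, country TEXT, req INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("yesterday", "Nigeria", 3),
        ("yesterday", "Nigeria", 3),
        ("yesterday", "Nigeria", 3),
        ("yesterday", "Nigeria", 3),  # daily average 3, from four rows
        ("today", "Nigeria", 1),
        ("today", "Nigeria", 3),      # daily average 2, from two rows
    ],
)

# Pooled average: every row counts equally, so the day with more rows dominates.
pooled = conn.execute(
    "SELECT AVG(req) FROM requests WHERE country = 'Nigeria'"
).fetchone()[0]

# Average of daily averages: every day counts equally.
avg_of_avgs = conn.execute(
    """
    WITH avg_byday AS (
        SELECT day, country, AVG(req) AS req_avg
        FROM requests
        GROUP BY day, country
    )
    SELECT AVG(req_avg) FROM avg_byday WHERE country = 'Nigeria'
    """
).fetchone()[0]

print(pooled, avg_of_avgs)
```

The two numbers only coincide when every day contributes the same number of rows per country.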