BigQuery: Calculating averages in daily partitioned tables - sql

I have a problem getting averages out of several daily partitioned tables. We have one partitioned table for every day. I want an SQL query that calculates the average number of requests over N days, grouped by country.
So this is the schema:
date (string)
country (string)
req (integer)
What I have until now:
SELECT country, avg(req) as AvgReq
FROM TABLE_DATE_RANGE([thePartitionedTable_],
DATE_ADD(CURRENT_TIMESTAMP(), -2, 'DAY'), CURRENT_TIMESTAMP())
GROUP BY country
This works for 1 day of course, but the data is skewed when I try it for 2 or more days. What is the problem in my logic? How does the AVG() function work in this case? Do I need to group by date as well?
So I want the daily average of thePartitionedTable_today and the daily average of thePartitionedTable_yesterday, and then the average of those averages, if that makes sense. So if thePartitionedTable_today has a daily average of 2 for Nigeria and thePartitionedTable_yesterday had a daily average of 3 for Nigeria, then the average for Nigeria over those two days should be 2.5. I really appreciate your time!

Using standard SQL:
with avg_byday AS (
SELECT
country,
AVG(req) AS req_avg
FROM
`thePartitionedTable_*`
GROUP BY
_TABLE_SUFFIX,
country)
SELECT
country,
AVG(req_avg)
FROM
avg_byday
GROUP BY
country
The subquery will also give you average requests per country for each day.
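To see why a single AVG() over the whole date range is skewed, here is a minimal sketch that uses Python's built-in sqlite3 module instead of BigQuery. The single table `requests` with a `day` column is an assumption standing in for the daily tables and `_TABLE_SUFFIX`, and the rows are made-up sample data. When the days have different row counts, the pooled average weights each row equally, while the average of daily averages weights each day equally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the daily tables: "day" plays the role of _TABLE_SUFFIX.
conn.execute("CREATE TABLE requests (day TEXT, country TEXT, req INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("20240101", "Nigeria", 2),
        ("20240101", "Nigeria", 2),
        ("20240101", "Nigeria", 2),
        ("20240102", "Nigeria", 3),
    ],
)

# Pooled average: every row counts equally, so the day with more rows dominates.
pooled = conn.execute(
    "SELECT AVG(req) FROM requests WHERE country = 'Nigeria'"
).fetchone()[0]

# Average of daily averages: aggregate per day first, then average those results.
avg_of_daily = conn.execute(
    """
    WITH avg_byday AS (
        SELECT day, country, AVG(req) AS req_avg
        FROM requests
        GROUP BY day, country
    )
    SELECT AVG(req_avg) FROM avg_byday WHERE country = 'Nigeria'
    """
).fetchone()[0]

print(pooled)        # 2.25
print(avg_of_daily)  # 2.5
```

The two numbers agree only when every day contributes the same number of rows per country.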

Related

How to apply aggregate functions to results of other aggregate functions in single query?

I have a table BIKE_TABLE containing columns Rented_Bike_Count, Hour, and Season. My goal is to determine average hourly rental count per season, as well as the MIN, MAX, and STDDEV of the average hourly rental count per season. I need to do this in a single query.
I used:
SELECT
SEASONS,
HOUR,
ROUND(AVG(RENTED_BIKE_COUNT),2) AS AVG_RENTALS_PER_HR
FROM BIKE_TABLE
GROUP BY HOUR, SEASONS
ORDER BY SEASONS
and this gets me close, returning 96 rows (4 seasons x 24 hours per) like:
SEASON   HOUR   AVG_RENTALS_PER_HR
Autumn   0      709.44
Autumn   1      552.5
Autumn   2      377.48
Autumn   3      256.55
But I cannot figure out how to return the following results that use ROUND(AVG(RENTED_BIKE_COUNT)) as their basis:
What is the average hourly rental count per season? The answer should be four lines, like: Autumn, [avg. number of bikes rented per hour]
What is the MIN of the average hourly rental count per season?
Same for MAX
Same for STDDEV.
I tried running
MIN(AVG(RENTED_BIKE_COUNT)) AS MIN_AVG_HRLY_RENTALS_BY_SEASON,
MAX(AVG(RENTED_BIKE_COUNT)) AS MAX_AVG_HRLY_RENTALS_BY_SEASON,
STDDEV(AVG(RENTED_BIKE_COUNT)) AS STNDRD_DEV_AVG_HRLY_RENTALS_BY_SEASON
as nested SELECT and then as nested FROM clauses, but I cannot seem to get it right. Am I close? Any assistance greatly appreciated.
I think that you are overcomplicating the task. Does this give you your answers? If not, please tell me the difference between its output and your desired output.
Of course, you can add ROUND() to each column as you see fit.
SELECT
SEASONS,
MIN(RENTED_BIKE_COUNT) minimum,
MAX(RENTED_BIKE_COUNT) maximum,
STDDEV(RENTED_BIKE_COUNT) sDev,
AVG(RENTED_BIKE_COUNT) average
FROM BIKE_TABLE
GROUP BY SEASONS
ORDER BY SEASONS;
According to your comment, it seems that you may want the following query.
WITH seasons AS(
SELECT
Season,
AVG(RENTED_BIKE_COUNT) seasonAverage
FROM BIKE_TABLE
GROUP BY season)
SELECT
AVG(seasonAverage) average,
MIN(seasonAverage) minimum,
MAX(seasonAverage) maximum,
STDDEV(seasonAverage) sDev
FROM
seasons;
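If the goal is instead one row per season, with the MIN, MAX, and average taken over that season's hourly averages (one possible reading of the question), the same CTE pattern can group by season and hour first and by season afterwards. The sketch below runs on Python's built-in sqlite3 module with made-up rows. The lowercase table and column names are assumptions, and STDDEV is left out because SQLite has no built-in standard deviation aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of BIKE_TABLE with two hours and two seasons.
conn.execute(
    "CREATE TABLE bike_table (seasons TEXT, hour INTEGER, rented_bike_count INTEGER)"
)
rows = [
    ("Autumn", 0, 700), ("Autumn", 0, 720),
    ("Autumn", 1, 550), ("Autumn", 1, 554),
    ("Winter", 0, 100), ("Winter", 0, 120),
    ("Winter", 1, 60), ("Winter", 1, 80),
]
conn.executemany("INSERT INTO bike_table VALUES (?, ?, ?)", rows)

query = """
WITH hourly AS (
    -- First level: one average per (season, hour).
    SELECT seasons, hour, AVG(rented_bike_count) AS avg_rentals_per_hr
    FROM bike_table
    GROUP BY seasons, hour
)
-- Second level: aggregate the hourly averages within each season.
SELECT seasons,
       AVG(avg_rentals_per_hr) AS avg_hourly,
       MIN(avg_rentals_per_hr) AS min_hourly,
       MAX(avg_rentals_per_hr) AS max_hourly
FROM hourly
GROUP BY seasons
ORDER BY seasons
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# ('Autumn', 631.0, 552.0, 710.0)
# ('Winter', 90.0, 70.0, 110.0)
```

In a database that provides STDDEV, it could be added as a fourth aggregate over `avg_rentals_per_hr` in the outer SELECT.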

How to write an SQL query to get max number of counts for the most number of travelling of a user within a month

I have been given a task by my manager to write a SQL query that selects the maximum count (number of records) for the user who has travelled the most within a month, with the condition that if the user travels to multiple places on the same date, it is counted as one. For instance, if you look at the following table design, my query must return a count of 2 for this scenario. Although traveller_id "1" has travelled three times within the month, he travelled to Thailand and the USA on the same date, so his count is reduced to 2.
I have also developed my logic for this query but I am unable to write it due to lack of syntax knowledge. I split up this query into 3 parts:
Select All records from the table within a month using the MONTH function of SQL
Select All distinct DateTime records from the above result so that the same DateTime gets eliminated.
Select max number of counts for the traveller who visited most places.
Please help me in completing my query. You can also use a different approach from mine.
You can use the count aggregation in a cte then select top(1):
with u as
(select traveller_id,
count(distinct visit_date) as n
from travellers_log
where visit_date between '2022-03-01' and '2022-03-31'
group by traveller_id)
select top(1) traveller_id, name, n from u inner join table_travellers
on u.traveller_id = table_travellers.id
order by n desc;
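If visit_date stores a time as well as a date (the question mentions DateTime values), counting distinct raw values would treat two trips on the same day as different. A minimal sketch of one way to handle that, using Python's built-in sqlite3 module: the `place` column and the sample rows are assumptions, SQLite's `date()` strips the time part, and `LIMIT 1` takes the place of `top(1)`. The half-open date range also keeps visits late on the last day of the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log table; visit_date holds ISO-8601 date-time strings.
conn.execute(
    "CREATE TABLE travellers_log (traveller_id INTEGER, place TEXT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO travellers_log VALUES (?, ?, ?)",
    [
        (1, "Thailand", "2022-03-05 09:00:00"),
        (1, "USA", "2022-03-05 18:30:00"),   # same day as Thailand: counted once
        (1, "Japan", "2022-03-20 11:00:00"),
        (2, "France", "2022-03-31 22:00:00"),
        (2, "Spain", "2022-04-02 08:00:00"),  # outside March: ignored
    ],
)

query = """
SELECT traveller_id, COUNT(DISTINCT date(visit_date)) AS n
FROM travellers_log
WHERE visit_date >= '2022-03-01' AND visit_date < '2022-04-01'
GROUP BY traveller_id
ORDER BY n DESC
LIMIT 1
"""
top = conn.execute(query).fetchone()
print(top)  # (1, 2)
```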

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute average daily sales per vendor for the year 2019, and get a single number. Which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendor's average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
-- Get average number of daily sales for each vendor
SELECT vendor_daily_totals.vendorid, AVG(vendor_daily_totals.cnt)
avg_daily_sales
FROM (
-- Get total number of sales for each vendor
SELECT vendorid, COUNT(*) cnt
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid, month, day
) vendor_daily_totals
GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
SELECT vendorid,
COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid
) t
This gets the total and divides by the number of days. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.
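The equivalence rests on the fact that, for one vendor, the average of the daily counts equals the total count divided by the number of distinct days that have sales. Here is a small check of that claim on made-up rows, using Python's built-in sqlite3 module rather than Impala. SQLite does not accept `COUNT(DISTINCT month, day)`, so the sketch assumes `month * 100 + day` as a single-column stand-in for the pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_sales (vendorid INTEGER, year INTEGER, month INTEGER, day INTEGER)"
)
# Hypothetical data: vendor 1 sells 3 and 1 items on two days, vendor 2 sells 2 and 4.
sales = (
    [(1, 2019, 1, 1)] * 3 + [(1, 2019, 1, 2)] * 1
    + [(2, 2019, 1, 1)] * 2 + [(2, 2019, 2, 1)] * 4
)
conn.executemany("INSERT INTO vendor_sales VALUES (?, ?, ?, ?)", sales)

# Nested form: daily totals -> per-vendor average -> average of averages.
nested = conn.execute("""
SELECT AVG(avg_daily_sales) FROM (
    SELECT vendorid, AVG(cnt) AS avg_daily_sales FROM (
        SELECT vendorid, COUNT(*) AS cnt
        FROM vendor_sales WHERE year = 2019
        GROUP BY vendorid, month, day
    ) GROUP BY vendorid
)
""").fetchone()[0]

# One-shot form: total sales divided by the number of distinct sale days.
one_shot = conn.execute("""
SELECT AVG(avgs) FROM (
    SELECT vendorid,
           COUNT(*) * 1.0 / COUNT(DISTINCT month * 100 + day) AS avgs
    FROM vendor_sales WHERE year = 2019
    GROUP BY vendorid
)
""").fetchone()[0]

print(nested, one_shot)  # 2.5 2.5
```

Both forms ignore days on which a vendor had no sales at all, since those days produce no rows.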

Calculation of weighted average counts in SQL

I have a query that I am currently using to find counts
select Name, Count(Distinct(ID)), Status, Team, Date from list
In addition to the counts, I need to calculate a goal based on weighted average of counts per status and team, for each day.
For example, if Name 1 counts are divided into 50% Status1-Team1(X) and 50% Status2-Team2(Y) yesterday, then today's goal for Name1 needs to be (X+Y)/2.
The table would look like this, with the 'Goal' field needed as the output:
What is the best way to do this in the same query?
I'm almost guessing here since you did not provide more details but maybe you want to do this:
SELECT name, status, team, data,
       (select sum(data) / (select count(*) from list where name = q.name))
FROM (SELECT Name, Count(Distinct(ID)) as data, Status, Team, Date
      FROM list) as q

SQL daily sum report

I have two tables. One is income, which consists of ID, income_amount and date;
the other is expenses, which consists of ID, amount_spent and date.
I'm trying to display a table with three columns: the daily total of income, the daily total of amount spent, and the date for that day. A given day may have no income or no expenses, though not necessarily both.
I was able to display the table in Visual C# by gathering them in individual tables and deriving the results programmatically, but is there a way to achieve that table with just a single SQL query?
trunc_to_day here is a hypothetical function which truncates a date to its day (you didn't specify what RDBMS you are using):
select sum(incomes), sum(spent), day from (
(select income_amount incomes, 0 spent, trunc_to_day(datecol) day from income_table)
union all
(select 0 incomes, amount_spent spent, trunc_to_day(datecol) day from spent_table)
) t group by day;
Finally, if you want to limit the result to certain days, add a WHERE clause on it.
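As a concrete instance of the UNION ALL approach, here is a minimal sketch using Python's built-in sqlite3 module, where SQLite's `date()` function plays the role of the hypothetical trunc_to_day. The table and column names follow the question, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (id INTEGER, income_amount REAL, date TEXT)")
conn.execute("CREATE TABLE expenses (id INTEGER, amount_spent REAL, date TEXT)")
conn.executemany("INSERT INTO income VALUES (?, ?, ?)", [
    (1, 100.0, "2024-01-01 09:00:00"),
    (2, 50.0, "2024-01-01 15:00:00"),
    (3, 70.0, "2024-01-03 10:00:00"),
])
conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", [
    (1, 30.0, "2024-01-01 12:00:00"),
    (2, 20.0, "2024-01-02 08:00:00"),
])

# Stack both tables into one (incomes, spent, day) shape, then sum per day.
# The column is written as "date" in quotes to keep it distinct from the date() function.
query = """
SELECT day, SUM(incomes) AS total_income, SUM(spent) AS total_spent
FROM (
    SELECT income_amount AS incomes, 0 AS spent, date("date") AS day FROM income
    UNION ALL
    SELECT 0 AS incomes, amount_spent AS spent, date("date") AS day FROM expenses
) AS t
GROUP BY day
ORDER BY day
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Days that appear in only one of the two tables still show up, with a zero total for the missing side.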