SQL GROUP BY vs AVG (Problem 25 of SQL Practice Problems Book)

The question is taken from problem 25 of SQL Practice Problems Book...
[Problem 25][1]
[1]: https://i.stack.imgur.com/fmJjs.jpg
High freight charges
Some of the countries we ship to have very high freight charges.
We'd like to investigate some more shipping options for our customers, to be able to offer them lower freight charges.
Return the three ship countries with the highest average freight overall, in descending order by average freight.
My first intuition was to run the following query:
SELECT shipcountry, freight
from orders
GROUP BY shipcountry
ORDER BY freight DESC
LIMIT 3;
which is apparently not the correct way to do it. The query below uses the AVG aggregate function and is the correct one. I wanted to know why we need the AVG function when we already have the GROUP BY.
SELECT shipcountry, avg(freight) as mean_freight
FROM orders
GROUP BY shipcountry
ORDER BY mean_freight DESC
LIMIT 3;

The question asks about average freight cost per country; that strongly implies you'll need to use the AVG function (unless you like making extra work for yourself and you decide that using SUM and COUNT is more interesting in some way).
You can't avoid using aggregates somewhere along the line since the query is about aggregate values.
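To make the point concrete: GROUP BY only collapses rows into one group per country, so a bare freight column has no single value for the group. Many engines reject such a query outright, and others return the value from an arbitrary row. Below is a minimal sketch using Python's built-in sqlite3 module; the sample rows are invented for illustration and are not from the book's database.

```python
import sqlite3

# In-memory table with made-up freight values (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (shipcountry TEXT, freight REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("Austria", 100.0), ("Austria", 300.0),
        ("Ireland", 150.0), ("Ireland", 50.0),
        ("USA", 120.0),
        ("Brazil", 10.0), ("Brazil", 30.0),
    ],
)

# AVG reduces each country's group of rows to one number,
# which is what the problem statement asks to rank by.
top_three = conn.execute(
    """
    SELECT shipcountry, AVG(freight) AS mean_freight
    FROM orders
    GROUP BY shipcountry
    ORDER BY mean_freight DESC
    LIMIT 3
    """
).fetchall()

print(top_three)
```

Note that Ireland has the second-highest single freight charge in this sample but only the third-highest average, which is why ordering by a raw freight value does not answer the question.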

Related

PostgreSQL question - finding sum of sales by state

I need help writing the query, we have a table called SALES, which has 3 columns as below:
Column Names: sale_id, state, sale_amount_cents
I assume the sale_amount_cents has the sale amount in cents as opposed to dollars, and our end answer needs to be in dollars so we would have to multiply by 100.
Can someone please help writing the query to sum sales, in dollars, by date, rounding to two decimal places, and sorting from the greatest sale amount to the least?
I assume the query would look like this:
UPDATE SALES SET sale_amount_cents=sale_amount_cents*100
SELECT SUM(sale_amount_cents) from SALES
GROUP BY STATE
ORDER BY sale_amount_cents DESC;
SELECT state,
       ROUND(SUM(sale_amount_cents) / 100.0, 2) AS sales_in_dollars  -- 100.0 avoids integer division
FROM SALES
GROUP BY state
ORDER BY SUM(sale_amount_cents) DESC;
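Converting cents to dollars means dividing by 100, not multiplying, and no UPDATE is needed because the conversion can happen in the SELECT. The sketch below runs that idea against SQLite through Python's sqlite3 module, assuming sale_amount_cents is an integer column; the sample rows are invented, and the rounding behaviour in PostgreSQL may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, state TEXT, sale_amount_cents INTEGER)"
)
# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "CA", 1999), (2, "CA", 501), (3, "NY", 1050), (4, "TX", 333), (5, "TX", 333)],
)

# Dividing by 100.0 (not 100) forces non-integer division,
# so the cents are not truncated before rounding.
rows = conn.execute(
    """
    SELECT state,
           ROUND(SUM(sale_amount_cents) / 100.0, 2) AS sales_in_dollars
    FROM sales
    GROUP BY state
    ORDER BY sales_in_dollars DESC
    """
).fetchall()

for state, dollars in rows:
    print(state, dollars)
```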

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute average daily sales per vendor for the year 2019, and get a single number. Which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendors' average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
    -- Get average number of daily sales for each vendor
    SELECT vendor_daily_totals.vendorid,
           AVG(vendor_daily_totals.cnt) avg_daily_sales
    FROM (
        -- Get total number of sales for each vendor on each day
        SELECT vendorid, COUNT(*) cnt
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid, month, day
    ) vendor_daily_totals
    GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
    SELECT vendorid,
           COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
    FROM vendor_sales
    WHERE year = 2019
    GROUP BY vendorid
) t
This gets the total and divides by the number of days. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.
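The one-shot form works because a vendor's average daily count equals its total number of sales divided by the number of distinct days on which it sold. The sketch below checks that equivalence on a tiny invented data set using Python's sqlite3 module. One adaptation to note: SQLite does not accept a multi-column COUNT(DISTINCT month, day), so the sketch encodes the pair as month * 100 + day, which is an assumption of this example and not part of the Impala query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_sales (vendorid INTEGER, year INTEGER, month INTEGER, day INTEGER)"
)
# Made-up rows: vendor 1 sells twice on Jan 1 and four times on Jan 2;
# vendor 2 sells once on Mar 5, plus one 2018 sale that must be filtered out.
conn.executemany(
    "INSERT INTO vendor_sales VALUES (?, ?, ?, ?)",
    [(1, 2019, 1, 1)] * 2 + [(1, 2019, 1, 2)] * 4 + [(2, 2019, 3, 5), (2, 2018, 3, 5)],
)

# Nested form: daily counts per vendor, then per-vendor average, then overall average.
nested = conn.execute(
    """
    SELECT AVG(vendor_avgs.avg_daily_sales)
    FROM (
        SELECT vendorid, AVG(cnt) AS avg_daily_sales
        FROM (
            SELECT vendorid, COUNT(*) AS cnt
            FROM vendor_sales
            WHERE year = 2019
            GROUP BY vendorid, month, day
        ) AS vendor_daily_totals
        GROUP BY vendorid
    ) AS vendor_avgs
    """
).fetchone()[0]

# One-shot form: total sales divided by number of distinct selling days.
one_shot = conn.execute(
    """
    SELECT AVG(t.avgs)
    FROM (
        SELECT vendorid,
               COUNT(*) * 1.0 / COUNT(DISTINCT month * 100 + day) AS avgs
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid
    ) AS t
    """
).fetchone()[0]

print(nested, one_shot)
```

Both forms only count days on which a vendor had at least one sale; days with zero sales are not part of either average.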

How to use aggregate function if we need aggregate of specific rows in SQL Server 2014

I have the following code which has 7 rows. I would like to take the avg, min, and max of cost across rows that share the same product name. For example, I have 3 products named 'Shoes' with costs of 50, 45, and 60, and one product named 'Hat' with a cost of 50. I would like the average of the 3 matching rows (50, 45, 60), which should display 51.66; for the other row it should display 50, and so forth.
My problem is that the query below displays the avg, max, and min of each individual row instead of the avg, min, and max across the rows that share a product name.
SELECT
PRODUCT.ProductName,
Vendor.VendorName, VendorProduct.Cost,
AVG(COST) AS AVG,
MIN(COST) AS MIN,
MAX(COST) AS MAX
FROM
PRODUCT
JOIN
VendorProduct ON VendorProduct.ProductID = PRODUCT.ProductID
JOIN
Vendor ON Vendor.VendorID = VendorProduct.VendorID
GROUP BY
PRODUCT.ProductName, VendorProduct.Cost, Vendor.VendorName
Any help is appreciated.
I am guessing that you want:
SELECT p.ProductName,
AVG(vp.COST) AS AVG, MIN(vp.COST) AS MIN, MAX(vp.COST) AS MAX
FROM PRODUCT p join
VendorProduct vp
on vp.ProductID = p.ProductID
GROUP BY p.ProductName;
Notes:
I have qualified all the column names using abbreviations for the table.
I removed the vendor name from the SELECT, because you want the averages by product.
I removed the vendor name and price from the GROUP BY for the same reason.
AVG, MIN, and MAX are bad names for columns, because these are SQL keywords.
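As a rough check of the grouped query, here is a sketch run against SQLite via Python's sqlite3 module, using the 'Shoes' and 'Hat' costs from the question. The table layouts and ID values are assumptions made for the example, and the aliases avg_cost, min_cost, and max_cost are hypothetical names chosen to avoid the keyword problem noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Product (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE VendorProduct (VendorID INTEGER, ProductID INTEGER, Cost REAL);
    INSERT INTO Product VALUES (1, 'Shoes'), (2, 'Hat');
    -- Three vendors sell Shoes at different costs; one vendor sells Hat.
    INSERT INTO VendorProduct VALUES (10, 1, 50), (11, 1, 45), (12, 1, 60), (10, 2, 50);
    """
)

# Grouping by product name alone yields one row per product,
# so each aggregate covers every vendor's cost for that product.
rows = conn.execute(
    """
    SELECT p.ProductName,
           AVG(vp.Cost) AS avg_cost,
           MIN(vp.Cost) AS min_cost,
           MAX(vp.Cost) AS max_cost
    FROM Product p
    JOIN VendorProduct vp ON vp.ProductID = p.ProductID
    GROUP BY p.ProductName
    ORDER BY p.ProductName
    """
).fetchall()

for row in rows:
    print(row)
```

The original query grouped by cost and vendor name as well, which made every group a single row, so each aggregate just echoed that row's cost.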
Change
AVG(cost) as avg
To
AVG(cost) OVER(PARTITION BY productname) as [avg]
Make the same change (adding an OVER clause) to the other aggregates, and remove the GROUP BY line entirely.
This will give you the same number of rows as you're getting now, but the aggregates will repeat for every row with the same product name. You also get to keep the vendor details.
What is it? In SQL Server it's called a window function: a way of grouping a data set and implicitly connecting the grouped results to the detail rows, without losing the detail rows' detail. MSDN will have a lot to say about them if your curiosity is piqued.
Also note Gordon's advice on column naming.
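The window-function variant can be sketched the same way. This example uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, so the bracketed [avg] alias is replaced with plain hypothetical names; the table layouts, IDs, and vendor names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Product (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE Vendor (VendorID INTEGER, VendorName TEXT);
    CREATE TABLE VendorProduct (VendorID INTEGER, ProductID INTEGER, Cost REAL);
    INSERT INTO Product VALUES (1, 'Shoes'), (2, 'Hat');
    INSERT INTO Vendor VALUES (10, 'Acme'), (11, 'Bolt'), (12, 'Crest');
    INSERT INTO VendorProduct VALUES (10, 1, 50), (11, 1, 45), (12, 1, 60), (10, 2, 50);
    """
)

# No GROUP BY: every detail row survives, and each aggregate is
# computed over the partition of rows sharing the same product name.
rows = conn.execute(
    """
    SELECT p.ProductName, v.VendorName, vp.Cost,
           AVG(vp.Cost) OVER (PARTITION BY p.ProductName) AS avg_cost,
           MIN(vp.Cost) OVER (PARTITION BY p.ProductName) AS min_cost,
           MAX(vp.Cost) OVER (PARTITION BY p.ProductName) AS max_cost
    FROM Product p
    JOIN VendorProduct vp ON vp.ProductID = p.ProductID
    JOIN Vendor v ON v.VendorID = vp.VendorID
    ORDER BY p.ProductName, v.VendorName
    """
).fetchall()

for row in rows:
    print(row)
```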

SQL DateDiff Syntax

I have a homework problem that I'm having a lot of trouble with... I don't expect the answer and I truly want to learn it. Could somebody help me out with the syntax?
Problem:
For each Sales Order, show how many days it took to ship the order in order by the longest order, then by Sales Order Number. Display Sales Order Number and the number of days to ship. Include the orders that have not yet shipped.
So far I have:
SELECT SalesOrder.SalesOrderNumber,
DATEDIFF (d, MIN(SalesOrder.OrderDate), MAX(Shipment.ShipmentDate)) AS "DaysToShip"
FROM SalesOrder, Shipment
GROUP BY SalesOrder.SalesOrderNumber;
Sometimes it's helpful to see an intermediate form of your query to evaluate if it's providing the correct data at some stage.
Consider the following query, pulled from your example minus some elements:
SELECT SalesOrder.SalesOrderNumber, SalesOrder.OrderDate, Shipment.ShipmentDate
FROM SalesOrder, Shipment
You should observe the results of this query and see how they differ from what you expect. In this case, you haven't indicated how SalesOrder and Shipment are related. The result will be many more rows than there are orders, with each SalesOrder paired with every Shipment record (a cross join).
Once you provide the correct join condition and achieve the desired results at that stage, try adding in aggregation (GROUP BY, MIN, MAX) and test that form of your query. Finally, when you're convinced that you have the correct inputs, add in DATEDIFF and you'll have your final query.
SELECT SalesOrder.SalesOrderNumber,
DATEDIFF (d, MAX(SalesOrder.OrderDate), MAX(Shipment.ShipmentDate)) AS "DaysToShip"
FROM SalesOrder, Shipment
GROUP BY SalesOrder.SalesOrderNumber;
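For illustration only, here is one way the stepwise advice could end up once a join condition is supplied. The schema is not given in the question, so this sketch assumes a hypothetical SalesOrderNumber column on Shipment as the link between the two tables. It runs on SQLite through Python's sqlite3 module, which has no DATEDIFF, so the day count is computed from julianday() differences instead. A LEFT JOIN keeps orders that have not shipped, which show up with a NULL day count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: Shipment.SalesOrderNumber is an assumed link column.
    CREATE TABLE SalesOrder (SalesOrderNumber INTEGER, OrderDate TEXT);
    CREATE TABLE Shipment (ShipmentID INTEGER, SalesOrderNumber INTEGER, ShipmentDate TEXT);
    INSERT INTO SalesOrder VALUES
        (1001, '2020-01-01'), (1002, '2020-01-02'), (1003, '2020-01-05');
    -- Order 1003 has no shipment yet.
    INSERT INTO Shipment VALUES (1, 1001, '2020-01-04'), (2, 1002, '2020-01-12');
    """
)

rows = conn.execute(
    """
    SELECT so.SalesOrderNumber,
           CAST(julianday(MAX(sh.ShipmentDate)) - julianday(MIN(so.OrderDate))
                AS INTEGER) AS DaysToShip
    FROM SalesOrder so
    LEFT JOIN Shipment sh ON sh.SalesOrderNumber = so.SalesOrderNumber
    GROUP BY so.SalesOrderNumber
    ORDER BY DaysToShip DESC, so.SalesOrderNumber
    """
).fetchall()

for row in rows:
    print(row)
```

Where the NULL rows land in a descending sort depends on the database engine, so the position of unshipped orders may need an explicit rule in the ORDER BY.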

sql average of a sum divided by a count

I am trying to do an SQL query to get the average spend for a specific customer.
I have written the following SQL (this is slightly cut down for this example):
SELECT SUM(price) as sumPrice, COUNT(transactionId) as transactionCount, customerName
FROM customers, transactions
WHERE customers.customerId = transactions.customerId
AND transactiontypeId = 1
GROUP BY customers.customerId
This gives me the sum of the transaction and the count. With this I can then divide the sum by the count to get the average spend. However I would like to be able to get the Average as a value straight out of the database rather than manipulate the data once I have got it out.
Is there any way to do this? I have played around with writing a select within a select but haven't had much luck as of yet, hence asking on here.
Thanks in advance
MySQL has a mean average function built-in.
SELECT AVG(price) AS averageSpend, customerName
FROM customers, transactions
WHERE customers.customerId = transactions.customerId
AND transactiontypeId = 1
GROUP BY customers.customerId
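A quick way to convince yourself that AVG gives the same number as dividing the sum by the count is to compute both side by side. The sketch below does this on SQLite via Python's sqlite3 module instead of MySQL; the column types and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customerId INTEGER, customerName TEXT);
    CREATE TABLE transactions (
        transactionId INTEGER, customerId INTEGER,
        transactiontypeId INTEGER, price REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    -- Alice's type-2 transaction must be excluded by the filter.
    INSERT INTO transactions VALUES
        (1, 1, 1, 10.0), (2, 1, 1, 20.0), (3, 1, 2, 100.0), (4, 2, 1, 30.0);
    """
)

# AVG(price) and SUM(price) / COUNT(transactionId) are computed over the
# same filtered group, so they should agree for every customer.
rows = conn.execute(
    """
    SELECT c.customerName,
           AVG(t.price) AS averageSpend,
           SUM(t.price) * 1.0 / COUNT(t.transactionId) AS manualAverage
    FROM customers c
    JOIN transactions t ON c.customerId = t.customerId
    WHERE t.transactiontypeId = 1
    GROUP BY c.customerId, c.customerName
    ORDER BY c.customerName
    """
).fetchall()

for row in rows:
    print(row)
```

The two values can differ if price contains NULLs while transactionId does not, because AVG ignores NULL prices but COUNT(transactionId) still counts those rows.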