using a subquery with a having - sql

So the goal is to get a list of customers who, on average, have ordered more than the overall average across all customers.
Select customerNumber, customerName, orderNumber, SUM(quantityOrdered)as 'total_qty', ROUND(AVG(quantityOrdered),2) as 'avg'
From customers
join orders using(customerNumber)
join orderdetails using (orderNumber)
Group by customerNumber, OrderNumber
Having ROUND(AVG(quantityOrdered),2) > ROUND(AVG(quantityOrdered),2) IN
(SELECT ROUND(AVG(quantityOrdered),2) FROM orderdetails)
ORDER BY customerName;
My code runs, but it doesn't filter the results on the avg quantity ordered column to show only results above the overall average of 35.22.

Possibly, you mean:
select c.customerNumber, c.customerName,
sum(od.quantityOrdered) as sum_qty,
round(avg(od.quantityOrdered), 2) as avg_qty
from customers c
join orders o using(customerNumber)
join orderdetails od using (orderNumber)
group by c.customerNumber, c.customerName
having avg(od.quantityOrdered) > (select avg(quantityOrdered) from orderdetails)
Rationale:
you talk about the average ordered, but what your query actually compares is the average order-detail quantity per customer; this answer assumes the latter is what you want
then: since you want an average per customer, do not put the order number in the GROUP BY
no need for IN or the like in the HAVING clause: just compare the customer's average against a scalar subquery that computes the overall average
Notes:
don't use single quotes for identifiers (such as column aliases) - they are meant for literal strings
table aliases make the query easier to write and read; prefixing all columns with the alias of the table they belong to makes the query understandable
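A minimal sketch that runs the suggested query against a tiny in-memory SQLite database through Python's standard sqlite3 module (Python is only the harness; the SQL is the point). The table and column names follow the question; the rows are invented sample data, so the overall average here is about 30.83 rather than 35.22.

```python
import sqlite3

# Invented sample rows; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO orderdetails VALUES (10, 50), (10, 40), (11, 30), (12, 10), (12, 20), (13, 35);
""")

# Group per customer only (no order number), and compare each customer's
# average with a scalar subquery that computes the overall average.
rows = conn.execute("""
SELECT c.customerNumber, c.customerName,
       SUM(od.quantityOrdered) AS sum_qty,
       ROUND(AVG(od.quantityOrdered), 2) AS avg_qty
FROM customers c
JOIN orders o USING (customerNumber)
JOIN orderdetails od USING (orderNumber)
GROUP BY c.customerNumber, c.customerName
HAVING AVG(od.quantityOrdered) > (SELECT AVG(quantityOrdered) FROM orderdetails)
ORDER BY c.customerName
""").fetchall()

for row in rows:
    print(row)
```

Alice (average 40) and Carol (average 35) are kept; Bob (average 15) is filtered out by the HAVING clause.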

Related

SQL Server clause issue

Select the name, contact, and postal code of the customer who made the most transactions in the month of June.
SELECT
Customer.customer_name,
Customer.customer_email,
Customer.customer_postcode
FROM
Customer
INNER JOIN
Sales on Customer.customer_id = Sales.customer_id
WHERE
MAX(Sales.customer_id) IN (SELECT COUNT((sales.customer_id)) AS 'transactions'
FROM sales
GROUP BY (sales.customer_id))
AND MONTH(date_purchased) = 6;
But I get this error:
Msg 147, Level 15, State 1, Line 4
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference
You're taking the MAX of the customer_id, but what you want is the customer_id with the highest number of transactions. Start with your inner query, and get the top customers using ORDER BY ... DESC.
SELECT Sales.customer_id, count(Sales.customer_id) as transactions
FROM Sales
GROUP BY Sales.customer_id
ORDER BY transactions DESC;
Now that you have the top customer_id, you should be able to join that result to the Customer table (using this as a CTE or an inner query) to get the name, contact, and postal code.
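A minimal sketch of that two-step approach, run on SQLite through Python's sqlite3 module. Assumptions: the sample rows are invented, dates are stored as ISO 'YYYY-MM-DD' text, the year 2020 is assumed, and LIMIT 1 stands in for SQL Server's TOP (1). The June filter is added to the inner query, since without it the count covers every month.

```python
import sqlite3

# Invented sample rows; table/column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       customer_email TEXT, customer_postcode TEXT);
CREATE TABLE Sales (customer_id INTEGER, date_purchased TEXT);
INSERT INTO Customer VALUES (1, 'Ann', 'ann@example.com', 'AB1'),
                            (2, 'Ben', 'ben@example.com', 'CD2');
INSERT INTO Sales VALUES (1, '2020-06-03'), (1, '2020-06-20'),
                         (2, '2020-06-05'), (2, '2020-05-01'),
                         (2, '2020-05-30'), (2, '2020-07-01');
""")

# Step 1 (CTE): count June 2020 transactions per customer, keep the top one.
# Step 2: join that single row back to Customer for the contact details.
rows = conn.execute("""
WITH top_customer AS (
    SELECT customer_id, COUNT(*) AS transactions
    FROM Sales
    WHERE date_purchased >= '2020-06-01' AND date_purchased < '2020-07-01'
    GROUP BY customer_id
    ORDER BY transactions DESC
    LIMIT 1
)
SELECT c.customer_name, c.customer_email, c.customer_postcode, t.transactions
FROM top_customer t
JOIN Customer c ON c.customer_id = t.customer_id
""").fetchall()

print(rows)
```

With the June filter, Ann comes out on top with 2 transactions, even though Ben has more transactions overall.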
Your current query has a number of issues:
Aggregates such as MAX cannot be used in the WHERE clause; they must go in the HAVING clause.
Even if you change it to HAVING, the subquery is wrong because it doesn't filter on June.
A much simpler method is to just join the tables, group, and then sort by COUNT and take the first row.
The outer June filter should use start and end dates, not the MONTH function, to improve performance (a plain range on date_purchased can use an index; MONTH(date_purchased) cannot).
You should use proper table aliasing.
SELECT TOP (1)
c.customer_name,
c.customer_email,
c.customer_postcode
FROM
Customer c
INNER JOIN
Sales s on c.customer_id = s.customer_id
WHERE
s.date_purchased >= '20200601' AND s.date_purchased < '20200701'
-- note the use of half-open interval >= AND <
GROUP BY
c.customer_id,
c.customer_name,
c.customer_email,
c.customer_postcode
ORDER BY COUNT(*) DESC;
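A sketch of the join, group, sort, take-the-first-row method in SQLite via Python's sqlite3 (LIMIT 1 plays the role of TOP (1); the rows are invented). It also contrasts the half-open June 2020 range with a month-only filter, which matches June of every year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       customer_email TEXT, customer_postcode TEXT);
CREATE TABLE Sales (customer_id INTEGER, date_purchased TEXT);
INSERT INTO Customer VALUES (1, 'Ann', 'ann@example.com', 'AB1'),
                            (2, 'Ben', 'ben@example.com', 'CD2');
INSERT INTO Sales VALUES (1, '2020-06-03'), (1, '2020-06-20'),
                         (2, '2020-06-05'), (2, '2019-06-10'), (2, '2019-06-11');
""")

# The filter is spliced in from fixed strings below (never from user input).
QUERY = """
SELECT c.customer_name
FROM Customer c
JOIN Sales s ON c.customer_id = s.customer_id
WHERE {date_filter}
GROUP BY c.customer_id, c.customer_name
ORDER BY COUNT(*) DESC
LIMIT 1
"""

# Half-open interval: start <= date < end, pinned to June 2020.
in_june_2020 = conn.execute(QUERY.format(
    date_filter="s.date_purchased >= '2020-06-01' AND s.date_purchased < '2020-07-01'"
)).fetchone()

# Month-only filter (SQLite's analogue of MONTH(...) = 6) also counts June 2019.
any_june = conn.execute(QUERY.format(
    date_filter="strftime('%m', s.date_purchased) = '06'"
)).fetchone()

print(in_june_2020, any_june)
```

The ranged filter picks Ann (2 sales in June 2020); the month-only filter picks Ben, because his June 2019 sales are counted too.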

PostgreSQL Query to JOIN and SUM

I have 2 tables:
orders
orderItems
The SUM TOTAL (total of product prices) of each order is saved in the total field of the orders table. I need to join these 2 tables and get the sum total and count from the values saved in the orders table; an example is below:
SELECT
count(orders.id), sum(orders.total)
FROM
orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
AND orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1','2','3','4')
How do I get the sum and count from single query?
This is a stab in the dark, but based on your updated comments I think I might know what you are dealing with. It seems like you are doing a sum and count on the order header level from the "orders" table, but by joining to the lines table you are getting multiple records, thus getting a seemingly arbitrary multiplication of both aggregates.
If this is the case, where you only want to sum and count the order header if one or more of its lines meet your criteria (pCode in 1, 2, 3, 4), then what you want is a semi-join, using an EXISTS clause.
SELECT
count(o.id), sum(o.total)
FROM
orders o
where
o.order_no like 'P%' and
exists (
select null
from orderItems i
where
o.order_no = i.order_no and
i.pCode in ('1', '2', '3', '4')
)
What this does is even if you have multiple lines meeting your condition(s), it will still only sum each header once. The syntax takes some getting used to, but the construct itself is very useful and efficient. The alternative would be a subquery "in" list, which on PostgreSQL would not run as efficiently for large datasets.
If that's not what you meant, please edit your question with the sample data and what you expect to see for the final output.
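A small reproduction of the inflation and of the EXISTS fix, using SQLite through Python's sqlite3 module; the rows are invented, and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER);
CREATE TABLE orderItems (order_no TEXT, pCode TEXT);
INSERT INTO orders VALUES (1, 'P001', 100), (2, 'P002', 50), (3, 'Q003', 70);
INSERT INTO orderItems VALUES ('P001', '1'), ('P001', '2'), ('P002', '9'), ('Q003', '1');
""")

# Plain join: P001 has two matching lines, so its header is counted twice.
joined = conn.execute("""
SELECT count(orders.id), sum(orders.total)
FROM orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1', '2', '3', '4')
""").fetchone()

# Semi-join: each qualifying header is counted once, however many lines match.
semi = conn.execute("""
SELECT count(o.id), sum(o.total)
FROM orders o
WHERE o.order_no LIKE 'P%'
  AND EXISTS (
    SELECT NULL
    FROM orderItems i
    WHERE o.order_no = i.order_no
      AND i.pCode IN ('1', '2', '3', '4')
  )
""").fetchone()

print(joined, semi)
```

The join reports 2 orders totalling 200; the EXISTS version reports the true 1 order totalling 100.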
If you want to use aggregates (e.g. SUM, COUNT) across values (e.g. pCode) then you need to use a GROUP BY clause on the non-aggregated columns:
SELECT
orderItems.pCode,
COUNT(orders.id) AS order_count,
SUM(orders.total) AS order_total
FROM orders
INNER JOIN orderItems
ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%'
AND orderItems.pCode IN ('1','2','3','4')
GROUP BY
orderItems.pCode
Note how orderItems.pCode is in both the SELECT clause and the GROUP BY clause. If you wanted to list by orders.order_no as well then you would add that column to both clauses too.
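A sketch of the grouped query on invented rows, run in SQLite via Python's sqlite3; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER);
CREATE TABLE orderItems (order_no TEXT, pCode TEXT);
INSERT INTO orders VALUES (1, 'P001', 100), (2, 'P002', 50), (3, 'Q003', 70);
INSERT INTO orderItems VALUES ('P001', '1'), ('P001', '2'), ('P002', '1'),
                              ('P002', '9'), ('Q003', '1');
""")

# One output row per pCode; pCode appears in both SELECT and GROUP BY.
rows = conn.execute("""
SELECT orderItems.pCode,
       COUNT(orders.id) AS order_count,
       SUM(orders.total) AS order_total
FROM orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%'
  AND orderItems.pCode IN ('1', '2', '3', '4')
GROUP BY orderItems.pCode
ORDER BY orderItems.pCode
""").fetchall()

for row in rows:
    print(row)
```

pCode '1' appears on P001 and P002 (2 orders, 150); pCode '2' only on P001 (1 order, 100). Q003 is dropped by the LIKE 'P%' filter and pCode '9' by the IN list.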

GROUP BY clause: do I have to list all the columns I use in SELECT?

Do I need to put all the column names that I have in SELECT into GROUP BY?
For example, in this simple query:
Select
CustomerID,
CompanyName,
ContactName,
ContactTitle,
City,
Country
From
Customers
Group By
Country,
CompanyName,
ContactName,
ContactTitle,
City,
Country,
CustomerID
Do I always have to put the same columns in GROUP BY that I used in SELECT?
If you're just selecting columns and you want the returned records to discard exact duplicate rows, then there are 2 methods:
1) group by
2) distinct
Your query doesn't use any aggregate functions, such as COUNT, MIN, MAX, SUM, and so on.
So your query could use DISTINCT instead of a GROUP BY.
select DISTINCT
CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
But if CustomerID is a primary key, then CustomerID would already make the result unique.
So then this query doesn't need a GROUP BY or a DISTINCT to only get unique records.
select CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
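A quick check of both points on invented rows, in SQLite via Python's sqlite3: without aggregates, DISTINCT and GROUP BY on the same columns return the same rows, and once the primary key is selected every row is unique anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT,
                        City TEXT, Country TEXT);
INSERT INTO Customers VALUES (1, 'Alfa', 'Berlin', 'Germany'),
                             (2, 'Beta', 'Berlin', 'Germany'),
                             (3, 'Gamma', 'London', 'UK');
""")

# No aggregates: DISTINCT and GROUP BY on the same columns drop the same duplicates.
distinct_rows = conn.execute(
    "SELECT DISTINCT City, Country FROM Customers ORDER BY City").fetchall()
grouped_rows = conn.execute(
    "SELECT City, Country FROM Customers GROUP BY City, Country ORDER BY City").fetchall()

# With the primary key in the select list, no row can be a duplicate.
plain_rows = conn.execute(
    "SELECT CustomerID, City, Country FROM Customers ORDER BY CustomerID").fetchall()

print(distinct_rows == grouped_rows, len(plain_rows))
```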
Note that one could have both DISTINCT and GROUP BY in the same query, but that's pointless. A GROUP BY already enforces uniqueness, so adding a DISTINCT on top adds nothing and can only cost extra work.
As for why all the columns in the SELECT also have to be listed in the GROUP BY: some databases, e.g. MySQL, can be more tolerant about not grouping on all columns. But it's a rule from one of the SQL standards, so most databases enforce it. It's there to avoid potentially misleading results.
GROUP BY x, y means you want one result row per x and y. So if you have a table with bills, you could group by year and month for instance and thus get the number of bills (count(*)) and the total (sum(amount)) per month.
So the question is which rows you want to see. A row per company (with the number of its customers), maybe? A row per city? The GROUP BY clause contains exactly the columns that define such a row.
Your GROUP BY clause does exactly nothing, as you select customers and group by customer ID (which should be the customer table's primary key).
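The bills example, sketched in SQLite via Python's sqlite3. The table layout (a bill_date column stored as ISO text, an integer amount) and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bills (id INTEGER PRIMARY KEY, bill_date TEXT, amount INTEGER);
INSERT INTO bills (bill_date, amount) VALUES
    ('2023-01-05', 10), ('2023-01-20', 15), ('2023-02-03', 7);
""")

# GROUP BY year, month: exactly one result row per (year, month) pair.
rows = conn.execute("""
SELECT strftime('%Y', bill_date) AS yr,
       strftime('%m', bill_date) AS mon,
       count(*) AS bill_count,
       sum(amount) AS total
FROM bills
GROUP BY yr, mon
ORDER BY yr, mon
""").fetchall()

for row in rows:
    print(row)
```

January gets one row (2 bills, 25 in total) and February another (1 bill, 7).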

PostgreSQL: get the min of a column with its associated city

I have been at this for the past two hours and have tried many different approaches with subqueries and joins. Here's the exact question: "Get the name and city of customers who live in the city where the least number of products are made"
Here is a snapshot of the database tables
I know how to get the min
select min(quantity)
from products
but this returns just the min without the city attached to it, so I can't search for the city in the customers table.
I have also tried group by and found it gave me 3 minimums (one for each city), which I believe may help me:
select city,min(quantity)
from products
group by city
Putting everything together I got something that looks like
SELECT
c.name,c.city
FROM
customers c
INNER JOIN
(
SELECT
city,
MIN(quantity) AS min_quantity
FROM
products
GROUP BY
city
) AS SQ ON
SQ.city = c.city
But this returns multiple customers, which isn't correct. Looking at the database, the city with the lowest number of products seems to be Newark, and no customers reside in Newark, so I expect this query to return 0 rows. Thank you for your time.
Example
Here is an example "Get the pids of products ordered through any agent who makes at least one order for a customer in Kyoto"
and the answer I provided is
select pid
from orders
inner join agents
on orders.aid = agents.aid
inner join customers
on customers.cid = orders.cid
where customers.city = 'Kyoto'
In PostgreSQL you have sophisticated tools, viz., window functions and CTEs.
WITH
find_least_sumq AS
(SELECT city, RANK() OVER ( ORDER BY SUM(quantity) ) AS r
FROM products
GROUP BY city)
SELECT name, city
FROM customers NATURAL JOIN find_least_sumq /* ON city */
WHERE r=1; /* rank 1 is smallest summed quantity including ties */
In Drew's answer, you are zeroing in on the cities where the smallest quantity of any single product is made. I interpret the question as wanting the sum of items made in each city.
I guess it would be something along these lines:
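A sketch of the rank-over-sums idea in SQLite (3.25 or later, for window functions) via Python's sqlite3, with invented rows. The ranking runs over the per-city sums, so the CTE groups by city and does not partition by city (partitioning by city would give every city rank 1). The data is chosen so the two readings disagree: Dallas has the single product with the smallest quantity, but Newark has the smallest total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (pid TEXT, city TEXT, quantity INTEGER);
CREATE TABLE customers (cid TEXT, name TEXT, city TEXT);
INSERT INTO products VALUES ('p1', 'Dallas', 20), ('p2', 'Dallas', 300),
                            ('p3', 'Newark', 60), ('p4', 'Newark', 50),
                            ('p5', 'Duluth', 400);
INSERT INTO customers VALUES ('c1', 'Ann', 'Dallas'), ('c2', 'Bob', 'Newark'),
                             ('c3', 'Cy', 'Duluth');
""")

# Rank the per-city sums (GROUP BY city, no PARTITION BY); rank 1 keeps ties.
# NATURAL JOIN matches on the only shared column name, city.
rows = conn.execute("""
WITH find_least_sumq AS (
    SELECT city, RANK() OVER (ORDER BY SUM(quantity)) AS r
    FROM products
    GROUP BY city
)
SELECT name, city
FROM customers NATURAL JOIN find_least_sumq
WHERE r = 1
""").fetchall()

print(rows)
```

The per-city sums are Dallas 320, Newark 110 and Duluth 400, so only Bob (Newark) is returned.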
select customers.name, city.city, city.min
from customers
join (
select city, sum (quantity) as min
from products
group by city
--filter by the cities whose total quantity equals the smallest per-city total
having sum (quantity) = (
--get the minimum of the per-city totals
select min(city_total) from (select sum(quantity) as city_total from products group by city) totals
)
) city on customers.city = city.city
This can be made so much simpler. Just sort the output by the field you want to get the minimum of.
SELECT city, quantity FROM products ORDER BY quantity LIMIT 1;
I have just figured out my own answer. I guess taking a break and coming back to it was all I needed. For future readers: this answer uses a subquery to get the min of a column and compare a different column (of that same row) to a column in a different table.
This example finds the city where the least number of products are made (quantity column) in the products table and compares that city to the city column in the customers table, then prints the names and cities of those customers. (To help clarify, use the link in the original question to look at the structure of the database I am talking about.) The first step is to sum the products for their respective cities, then take the min of those sums, and then find the customers in that city. Here was my solution:
with citySum as(
select city,sum(quantity) as sum
from products
group by city)
select name,city
from customers
where city
in
(select city
from citySum
where sum =(
select min(sum)
from citySum))
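The same CTE-plus-subquery shape, run in SQLite via Python's sqlite3 on invented rows where Newark and Erie tie for the smallest total (the sum column is renamed city_total here). Because the filter is "equals the minimum", every tied city is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (pid TEXT, city TEXT, quantity INTEGER);
CREATE TABLE customers (cid TEXT, name TEXT, city TEXT);
INSERT INTO products VALUES ('p1', 'Dallas', 320), ('p2', 'Newark', 60),
                            ('p3', 'Newark', 50), ('p4', 'Erie', 110);
INSERT INTO customers VALUES ('c1', 'Ann', 'Dallas'), ('c2', 'Bob', 'Newark'),
                             ('c3', 'Dee', 'Erie');
""")

# Step 1: per-city totals; step 2: keep cities whose total equals the minimum;
# step 3: return the customers living in those cities.
rows = conn.execute("""
WITH citySum AS (
    SELECT city, sum(quantity) AS city_total
    FROM products
    GROUP BY city
)
SELECT name, city
FROM customers
WHERE city IN (
    SELECT city FROM citySum
    WHERE city_total = (SELECT min(city_total) FROM citySum)
)
ORDER BY name
""").fetchall()

print(rows)
```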
Here is another solution I found today that also works, using only subqueries:
select c.name,c.city
from customers c
where c.city
in
(select city
from
(select p.city,sum(p.quantity) as lowestSum
from products p
group by p.city) summedCityQuantities
order by lowestsum asc
limit 1)
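One behavioral difference between the two solutions is worth checking: with LIMIT 1, a tie for the smallest sum keeps only one city, and which one is not guaranteed. A sketch in SQLite via Python's sqlite3, with invented rows where Newark and Erie tie at 110:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (pid TEXT, city TEXT, quantity INTEGER);
CREATE TABLE customers (cid TEXT, name TEXT, city TEXT);
INSERT INTO products VALUES ('p1', 'Dallas', 320), ('p2', 'Newark', 60),
                            ('p3', 'Newark', 50), ('p4', 'Erie', 110);
INSERT INTO customers VALUES ('c1', 'Ann', 'Dallas'), ('c2', 'Bob', 'Newark'),
                             ('c3', 'Dee', 'Erie');
""")

# Sort the per-city sums and keep the first city only: ties are cut arbitrarily.
rows = conn.execute("""
SELECT c.name, c.city
FROM customers c
WHERE c.city IN (
    SELECT city
    FROM (SELECT p.city, sum(p.quantity) AS lowestSum
          FROM products p
          GROUP BY p.city) summedCityQuantities
    ORDER BY lowestSum ASC
    LIMIT 1
)
""").fetchall()

print(rows)
```

Only one of Bob or Dee comes back, whereas the CTE version above returns both.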

get row with max from group by results

I have sql such as:
select
c.customerID, sum(o.orderCost)
from customer c, orders o
where c.customerID=o.customerID
group by c.customerID;
This returns a list of
customerID, orderCost
where orderCost is the total cost of all orders the customer has made. I want to select the customer who has paid us the most (who has the highest orderCost). Do I need to create a nested query for this?
You need a nested query, but you don't have to access the tables twice if you use analytic functions.
select customerID, sumOrderCost from
(
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc) as rn
from (
select c.customerID, sum(o.orderCost) as sumOrderCost
from customer c, orders o
where c.customerID=o.customerID
group by c.customerID
) totals
) ranked
where rn = 1;
The rank() function ranks the results from your original query by the sum() value, then you only pick those with the highest rank - that is, the row(s) with the highest total order cost.
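A sketch of the ranking approach in SQLite (3.25+ for rank()) through Python's sqlite3, with invented rows in which customers 2 and 3 tie at 200. The derived tables are given aliases, which SQLite accepts and some engines, such as MySQL, require; the final ORDER BY only makes the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customerID INTEGER PRIMARY KEY);
CREATE TABLE orders (customerID INTEGER, orderCost INTEGER);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO orders VALUES (1, 100), (1, 50), (2, 200), (3, 120), (3, 80);
""")

# Innermost: total per customer; middle: rank the totals; outer: keep rank 1.
rows = conn.execute("""
SELECT customerID, sumOrderCost FROM (
    SELECT customerID, sumOrderCost,
           rank() OVER (ORDER BY sumOrderCost DESC) AS rn
    FROM (
        SELECT c.customerID, sum(o.orderCost) AS sumOrderCost
        FROM customer c, orders o
        WHERE c.customerID = o.customerID
        GROUP BY c.customerID
    ) totals
) ranked
WHERE rn = 1
ORDER BY customerID
""").fetchall()

print(rows)
```

Both tied customers share rank 1, so both are returned.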
If more than one customer has the same total order cost, this will return both. If that isn't what you want you'll have to decide how to determine which single result to use. If you want the lowest customer ID, for example, add that to the ranking function:
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc, customerID) as rn
You can adjust your original query to return other data as well, used just for the ordering, and not include it in the outer select.
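Adding customerID as a tie-breaker inside the ranking function, on the same invented rows (SQLite via Python's sqlite3), leaves a single row: the lowest customerID among the tied totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customerID INTEGER PRIMARY KEY);
CREATE TABLE orders (customerID INTEGER, orderCost INTEGER);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO orders VALUES (1, 100), (1, 50), (2, 200), (3, 120), (3, 80);
""")

# The extra ORDER BY key makes ranks unique, so exactly one row gets rank 1.
rows = conn.execute("""
SELECT customerID, sumOrderCost FROM (
    SELECT customerID, sumOrderCost,
           rank() OVER (ORDER BY sumOrderCost DESC, customerID) AS rn
    FROM (
        SELECT c.customerID, sum(o.orderCost) AS sumOrderCost
        FROM customer c, orders o
        WHERE c.customerID = o.customerID
        GROUP BY c.customerID
    ) totals
) ranked
WHERE rn = 1
""").fetchall()

print(rows)
```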
You need to create a nested query for this: two queries.