MS ACCESS / SQL: Select maximum count per person, per variable

In MS Access, I've joined two tables: one is a list of sales and the city each took place in (Sales), and the other is a table of persons and the sales they participated in (SalePersons).
When joining the two tables, you can see that the combined table details many sales per person across many cities. My goal is to obtain the most-frequented city for sales for each person.
For example, Customer 1 might have 2 sales in Baltimore, 1 sale in New York, and 3 sales in Washington; Customer 2 might have 3 sales in Washington, 4 sales in Wichita, and 1 sale in New York. The table needs to have only "Washington" listed for Customer 1, and only "Wichita" listed for Customer 2. If there's a tie, I'd like to list all the tied cities.
So far, I only have the initial join working.
SELECT SalePersons.PersonID, Count(Sales.SaleNum) AS CountOfSaleNum, Sales.CITY
FROM Sales INNER JOIN SalePersons ON Sales.SaleNum = SalePersons.SaleNum
GROUP BY SalePersons.PersonID, Sales.CITY;
But, as you might guess, this join only gives me the count of sales per city, per person, across all cities. I need to retrieve only the single most-frequented city per person.
I thought I could make this a subquery and wrap it all in a SELECT MAX(CountOfSaleNum) clause, but that didn't work. I still have much to learn.
Thank you in advance! I don't know what I'd do without this site sometimes.

You can use window functions:
SELECT sp.*
FROM (SELECT sp.PersonID, COUNT(*) AS CountOfSaleNum, s.CITY,
ROW_NUMBER() OVER (PARTITION BY sp.PersonID ORDER BY COUNT(*) DESC) as seqnum
FROM Sales s INNER JOIN
SalePersons sp
ON s.SaleNum = sp.SaleNum
GROUP BY sp.PersonID, s.CITY
) sp
WHERE seqnum = 1;
In MS Access, you are stuck with a more complicated query:
SELECT sp.PersonID, COUNT(*) AS CountOfSaleNum, s.CITY
FROM Sales as s INNER JOIN
SalePersons as sp
ON s.SaleNum = sp.SaleNum
GROUP BY sp.PersonID, s.CITY
HAVING s.City = (SELECT TOP 1 s2.City
FROM Sales as s2 INNER JOIN
SalePersons as sp2
ON s2.SaleNum = sp2.SaleNum
WHERE sp2.PersonID = sp.PersonId
GROUP BY sp2.PersonId, s2.City
ORDER BY COUNT(*) DESC, s2.City
);
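Note that both queries above return a single city per person, while the question asks for all tied cities. A minimal sketch of a tie-preserving variant follows, swapping ROW_NUMBER() for RANK(). It runs on SQLite through Python's sqlite3 module purely as a stand-in for an engine with window functions (it is not Access SQL), and the sample rows are hypothetical.

```python
import sqlite3

# SQLite (3.25+) stands in for any engine with window functions; not Access SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SaleNum INTEGER PRIMARY KEY, CITY TEXT);
CREATE TABLE SalePersons (PersonID INTEGER, SaleNum INTEGER);
""")

# Hypothetical data: person 1 has a clear winner, person 2 has a two-way tie.
sales = [(1, "Baltimore"), (2, "Baltimore"), (3, "New York"),
         (4, "Washington"), (5, "Washington"), (6, "Washington"),
         (7, "Wichita"), (8, "Wichita"), (9, "New York"), (10, "New York")]
persons = [(1, n) for n in range(1, 7)] + [(2, n) for n in range(7, 11)]
conn.executemany("INSERT INTO Sales VALUES (?, ?)", sales)
conn.executemany("INSERT INTO SalePersons VALUES (?, ?)", persons)

# Count sales per person and city first, then rank the counts per person.
# RANK() gives every tied city rank 1, so ties survive the final filter.
rows = conn.execute("""
WITH counts AS (
    SELECT sp.PersonID, s.CITY, COUNT(*) AS CountOfSaleNum
    FROM Sales s
    INNER JOIN SalePersons sp ON s.SaleNum = sp.SaleNum
    GROUP BY sp.PersonID, s.CITY
)
SELECT PersonID, CITY, CountOfSaleNum
FROM (SELECT c.*,
             RANK() OVER (PARTITION BY PersonID
                          ORDER BY CountOfSaleNum DESC) AS rnk
      FROM counts c) AS ranked
WHERE rnk = 1
ORDER BY PersonID, CITY
""").fetchall()

for row in rows:
    print(row)
```

With this data, person 1 gets one row and person 2 gets one row per tied city.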

Related

(Simple?) SQL Query: Display all customer information for customers with 2+ orders

I'm doing practice exam material for a distance education course. I have the following three relations (simplified here):
salesperson(emp#, name, salary)
order(order#, cust#, emp#, total)
customer(cust#, name, city)
I'm stuck on a pair of SQL queries.
Display all customer info for customers with at least 1 order.
SELECT * FROM customer
INNER JOIN order ON order.cust# = customer.cust#
GROUP BY cust#;
Display all customer info for customers with at least 2 orders.
SELECT cust#, name, city, industry-type FROM customer
INNER JOIN order ON order.cust# = customer.cust#
GROUP BY cust#
HAVING COUNT(cust#) > 2;
I realize these are misguided attempts resulting from a poor understanding of SQL, but I've spent a ton of time on W3School's SQL Query example tool (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_where) without getting anywhere, and I finally need some "real" help.
You can use a subquery to count the orders per cust#, then inner join it to customer:
SELECT c.*
FROM (
SELECT cust# , COUNT(*) cnt
FROM order
GROUP BY cust#
) o INNER JOIN customer c ON c.cust# = o.cust#
WHERE o.cnt >= 2
You can change the table names to match your DB. The following queries can be run directly in the W3Schools editor:
Display all customer info for customers with at least 1 order.
SELECT * FROM customers as cust
JOIN orders as o ON o.customerid = cust.customerid
GROUP BY o.customerid;
Display all customer info for customers with at least 2 orders.
SELECT * FROM customers as cust
JOIN orders as O ON O.CustomerID = cust.CustomerID
GROUP BY cust.CustomerID
HAVING COUNT(cust.CustomerID) >= 2;
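As a rough, runnable illustration of both requirements, here is a sketch using SQLite through Python's sqlite3 module. The names are assumptions made for the example: cust# becomes cust_no and the order table becomes orders (ORDER is a reserved word), and the rows are invented. "At least 1 order" is expressed with EXISTS so each customer appears once, and "at least 2 orders" uses COUNT(*) >= 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: cust_no replaces cust#, orders replaces order.
conn.executescript("""
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_no INTEGER PRIMARY KEY, cust_no INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Ann', 'Austin'), (2, 'Bob', 'Boston'),
                            (3, 'Cy', 'Chicago');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# At least 1 order: EXISTS returns each qualifying customer exactly once,
# with no GROUP BY needed.
at_least_one = conn.execute("""
SELECT c.*
FROM customer c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.cust_no = c.cust_no)
ORDER BY c.cust_no
""").fetchall()

# At least 2 orders: aggregate orders per customer, keep counts >= 2,
# then join back to fetch the full customer row.
at_least_two = conn.execute("""
SELECT c.*
FROM customer c
INNER JOIN (SELECT cust_no
            FROM orders
            GROUP BY cust_no
            HAVING COUNT(*) >= 2) o ON o.cust_no = c.cust_no
ORDER BY c.cust_no
""").fetchall()

print(at_least_one)
print(at_least_two)
```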

SQL Query with row_number() not returning expected output

My goal is to write a query that returns the city that produced the highest average sales for each item category.
This is the expected output:
item_category|city
books |los_angeles
toys |austin
electronics |san_fransisco
My 3 table schemas look like this:
users
user_id|city
sales
user_id|item_id|sales_amt
items
item_id|item_category
These are further notes to consider:
1. sales_amt is the only column that may have Null values. If no users have placed a sale for a particular item category (no rows in sales with a non-Null sales_amt), then the city name should be Null.
2. Only 1 row per distinct item category. If more than 1 city qualifies, pick the first one alphabetically.
The attempt I took looks like this but it does not produce the right output:
select a.item_category,a.city from (
select
i.item_category,
u.city,
row_number() over (partition by i.item_category,u.city order by avg(s.sales_amt) desc)rk
from sales s
join users u on s.user_id=u.user_id
join items i on i.item_id=s.item_id
group by i.item_category,u.city)a
where a.rk=1
My output does not return the Null cases for sales_amt. Also, I get non-unique rows. Therefore, I am worried that I am not properly incorporating the 2 notes.
I hope someone can help.
my goal is to write a query that should return the cities which produced the highest avg. sales for each item-category.
This can be calculated using aggregation and window functions:
select ic.*
from (select i.item_category, u.city,
row_number() over(partition by i.item_category order by avg(s.sales_amt) desc, u.city) as seqnum
from users u join
sales s
on s.user_id = u.user_id join
items i
on i.item_id = s.item_id
group by i.item_category, u.city
) ic
where seqnum = 1;
Your question explicitly says "average" which is why this uses avg(). However, I suspect that you really want the sum in each city, which would be sum().
Notes:
You want one row so row_number() instead of rank().
You need sales to calculate the average, so join, instead of left join.
You want one row per item_category, so that is used for partitioning.
My take on it is a mix of GMB's and Gordon's advice. GMB points out that left joins are needed, but I think his starting table, partition, and choice of rank() are wrong (his query cannot generate null city names as requested, and could generate duplicates tied on the same avg). Gordon picked up on things like ordering by city on a tied avg, which GMB did not, but missed the "if no sales of any items in category X, put null for the city" requirement. Both left cancelled orders floating around the system, which introduces errors:
select *
from (
select
i.item_category,
u.city,
row_number() over(partition by i.item_category order by avg(s.sales_amt) desc, u.city asc) rn
from items i
left join (select * from sales where sales_amt is not null) s on i.item_id = s.item_id
left join users u on s.user_id = u.user_id
group by i.item_category, u.city
) t
where rn = 1
We start from items so that categories having no sales get nulls for their sales amount and city.
We also need to consider that any sales that didn't fulfil will have null in their amount. We exclude these with a subquery; otherwise they would link through to users and give a false positive (even though the avg calculates as null for a category that only has cancelled orders, the city would still show when it should not). I could also have done this with an "and sales_amt is not null" predicate in the join, but I think this way is clearer. This should not be done with a predicate in the where clause, because that would eliminate the sale-less categories we are trying to preserve.
Row number is used on avg, with city name to break any ties. It's a simpler function than rank and cannot generate duplicate values.
Finally, we pull the rn = 1 rows to get the top-averaging cities.
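To check the two notes from the question against this approach, here is a small sketch that runs on SQLite through Python's sqlite3 module. The rows are invented for illustration: books has two cities tied on average, electronics has only a Null-amount sale, and garden has no sales at all. The average is computed in a CTE first and ranked afterwards, which is a restructuring for readability rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_category TEXT);
CREATE TABLE sales (user_id INTEGER, item_id INTEGER, sales_amt REAL);
-- Hypothetical sample data
INSERT INTO users VALUES (1, 'austin'), (2, 'boston'), (3, 'los_angeles');
INSERT INTO items VALUES (1, 'books'), (2, 'toys'),
                         (3, 'electronics'), (4, 'garden');
INSERT INTO sales VALUES (1, 1, 10.0),   -- austin, books
                         (2, 1, 10.0),   -- boston, books (tie with austin)
                         (3, 2, 20.0),   -- los_angeles, toys
                         (1, 2, 5.0),    -- austin, toys
                         (2, 3, NULL);   -- boston, electronics, Null amount
""")

rows = conn.execute("""
WITH city_avg AS (
    -- Start from items so categories without usable sales are preserved;
    -- Null-amount sales are dropped before they can link to a user.
    SELECT i.item_category, u.city, AVG(s.sales_amt) AS avg_amt
    FROM items i
    LEFT JOIN (SELECT * FROM sales WHERE sales_amt IS NOT NULL) s
           ON s.item_id = i.item_id
    LEFT JOIN users u ON u.user_id = s.user_id
    GROUP BY i.item_category, u.city
)
SELECT item_category, city
FROM (SELECT item_category, city,
             ROW_NUMBER() OVER (PARTITION BY item_category
                                ORDER BY avg_amt DESC, city) AS rn
      FROM city_avg) AS ranked
WHERE rn = 1
ORDER BY item_category
""").fetchall()

for row in rows:
    print(row)
```

The tie in books resolves alphabetically, and the two categories without a non-Null sale come back with a Null (None) city.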
I think you want left joins starting from users in the inner query to preserve cities without sales.
As for the ranking: if you want one record per city, then do not put columns other than city in the partition (your current partition gives you one record per city and per category, which is not what you want).
Consider:
select *
from (
select
i.item_category,
u.city,
rank() over(partition by u.city order by avg(s.sales_amt) desc) rk
from users u
left join sales s on s.user_id = u.user_id
left join items i on i.item_id = s.item_id
group by i.item_category, u.city
) t
where rk = 1

Access 2002 SQL for joining three tables

I have been trying to get this to work for a while now. I have 3 tables. First table has the Sales for customers which include the CustomerID, DateOfSales (Which always has the first of the month). The second table has the CustomerName, CustomerID. The third table has which customers buy what product lines. They are stored by CustomerID, ProductID.
I want to get a list (from one SQL hopefully) that has ALL the customers that are listed as buying a certain ProductID AND the maxDate from the Sales. I can get all of them IF there are sales for that customer. How the heck do I get ALL customers that buy the certain ProductID AND the maxDate from Sales or NULL if there is no sales found?
SalesList |CustomerList|WhoBuysWhat
----------|------------|-----------
maxDate |CustomerID |CustomerID
CustomerID| |ProductID=17
This is as close as I got. It gets all max dates but only if there have been sales. I want the CustomerID and a NULL for the maxDate if there were no sales recorded yet.
SELECT WhoBuysWhat.CustomerID, CustomerList.CustomerName,
Max(SalesList.MonthYear) AS MaxOfMonthYear FROM (CustomerList INNER
JOIN SalesList ON CustomerList.CustomerID = SalesList.CustomerID) INNER
JOIN WhoBuysWhat ON CustomerList.CustomerID = WhoBuysWhat.CustomerID
WHERE (((SalesList.ProductID)=17)) GROUP BY WhoBuysWhat.CustomerID,
CustomerList.CustomerName;
Is it possible or do I need to use multiple SQL statements? I know we should get something newer than Access 2002 but that is what they have.
You want LEFT JOINs:
SELECT cl.CustomerID, cl.CustomerName,
Max(sl.MonthYear) AS MaxOfMonthYear
FROM (CustomerList as cl LEFT JOIN
(SELECT sl.*
FROM SalesList sl
WHERE sl.ProductID = 17
) as sl
ON cl.CustomerID = sl.CustomerID
) LEFT JOIN
WhoBuysWhat wbw
ON cl.CustomerID = wbw.CustomerID
GROUP BY cl.CustomerID, cl.CustomerName;
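A hedged sketch of the same idea follows, under a different reading of the schema: it assumes ProductID lives in WhoBuysWhat (as the question's table sketch suggests) and that SalesList holds only CustomerID and MonthYear. It runs on SQLite through Python's sqlite3 module as a stand-in; Access would additionally need its parenthesized join syntax. The rows are invented. The inner join limits the result to buyers of product 17, and the left join keeps buyers that have no sales yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: ProductID is stored in WhoBuysWhat, not in SalesList.
conn.executescript("""
CREATE TABLE CustomerList (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE WhoBuysWhat (CustomerID INTEGER, ProductID INTEGER);
CREATE TABLE SalesList (CustomerID INTEGER, MonthYear TEXT);
-- Hypothetical sample data
INSERT INTO CustomerList VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
INSERT INTO WhoBuysWhat VALUES (1, 17), (2, 17), (3, 5);
INSERT INTO SalesList VALUES (1, '2004-01-01'), (1, '2004-03-01'),
                             (3, '2004-02-01');
""")

rows = conn.execute("""
SELECT cl.CustomerID, cl.CustomerName,
       MAX(sl.MonthYear) AS MaxOfMonthYear
FROM WhoBuysWhat wbw
INNER JOIN CustomerList cl ON cl.CustomerID = wbw.CustomerID
LEFT JOIN SalesList sl ON sl.CustomerID = cl.CustomerID  -- keeps sale-less buyers
WHERE wbw.ProductID = 17
GROUP BY cl.CustomerID, cl.CustomerName
ORDER BY cl.CustomerID
""").fetchall()

for row in rows:
    print(row)
```

Bolt buys product 17 but has no sales, so it appears with a NULL (None) date; Core is excluded because it does not buy product 17.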

Select the row with the max value in a specific column, SQL Server

I've been working on a school project for the past few days, and I chose to work on a DVD club database. I have six tables, but for this question only two are relevant: the clients table and the loans table. What I am trying to do is count, for every client, how many loans he has made so far, and then pick the client with the max number of loans so he can be rewarded with a free DVD next month. Here is the code I've written, but it doesn't pick the specific client; it shows all the clients, each with the max number of loans of one specific client:
SELECT tblClients.Client_ID, MAX(x.Number_Of_Loans) AS MAX_NOL
FROM
(
SELECT COUNT(tblLoans.Client_ID) AS Number_Of_Loans
FROM tblClients, tblLoans WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblLoans.Client_ID
)x, tblClients, tblLoans
WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID, tblClients.Given_Name,
tblClients.Family_Name, tblClients.Phone, tblClients.Address, tblClients.Town_ID
Use the following:
SELECT TOP 1 tblClients.Client_ID,COUNT(tblLoans.Client_ID) AS MAX_NOL
FROM tblClients, tblLoans
WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID
ORDER BY COUNT(tblLoans.Client_ID) DESC
You can do this with a single aggregate GROUP, ordered by the client with the max loans:
SELECT TOP 1 tblClients.Client_ID, tblClients.Given_Name, tblClients.Family_Name,
tblClients.Phone, tblClients.Address, tblClients.Town_ID,
COUNT(tblLoans.Client_ID) AS MAX_NOL
FROM
tblClients INNER JOIN tblLoans
ON tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID, tblClients.Given_Name, tblClients.Family_Name,
tblClients.Phone, tblClients.Address, tblClients.Town_ID
ORDER BY MAX_NOL DESC;
Any selected columns from the client need to be included in the GROUP, and I would recommend using JOINs instead of WHERE joins.
Edit
What might be tidier is to split the determination of the ClientId with the most loans and the concern of fetching the rest of the client's data, like so (rather than the ungainly GROUP BY over many columns):
SELECT c.Client_ID, c.Given_Name, c.Family_Name,
c.Phone, c.Address, c.Town_ID,
x.MaxLoans
FROM
tblClients c
INNER JOIN
(SELECT TOP 1 tblClients.Client_ID, COUNT(tblLoans.Client_ID) AS MaxLoans
FROM tblClients
INNER JOIN tblLoans
ON tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID
ORDER BY MaxLoans DESC) x
ON c.Client_ID = x.Client_ID;

SQL List the representative that handles the most customers

I have 2 tables SALESREP and CUSTOMER
I need to find out which salesrep has most customers
I have the following code:
select rep_lname, count(cust_num)
from customer inner join salesrep
on customer.REP_NUM = SALESREP.REP_NUM
group by rep_lname
This gives me all the rows with the number of customers each salesrep has, instead I need only one row that has the most customers.
How can I find the row with MAX num of customers?
select rep_lname, count(cust_num)
from customer inner join salesrep
on customer.REP_NUM = SALESREP.REP_NUM
group by rep_lname order by count(cust_num) desc limit 1;
I'm sure there's another way using having, but I can't seem to figure it out at the moment. Perhaps somebody else will chime in with it?
SELECT TOP 1 WITH TIES rep_lname, COUNT(cust_num)
FROM customer inner join salesrep
ON customer.REP_NUM = SALESREP.REP_NUM
GROUP BY rep_lname
ORDER BY count(cust_num) DESC
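The HAVING-based alternative mentioned in the first answer can be sketched as follows: keep only the groups whose count equals the largest per-rep count. This is a minimal example on SQLite through Python's sqlite3 module, with invented rows, and it assumes every customer has a non-NULL REP_NUM. It groups by REP_NUM as well as rep_lname so two reps sharing a last name are not merged, which is a small departure from the queries above. Like TOP 1 WITH TIES, it returns every rep tied for first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesrep (rep_num INTEGER PRIMARY KEY, rep_lname TEXT);
CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, rep_num INTEGER);
-- Hypothetical sample data: Kaiser and Hull tie with 2 customers each.
INSERT INTO salesrep VALUES (1, 'Kaiser'), (2, 'Hull'), (3, 'Perez');
INSERT INTO customer VALUES (100, 1), (101, 1), (102, 2), (103, 2), (104, 3);
""")

rows = conn.execute("""
SELECT r.rep_lname, COUNT(c.cust_num) AS num_customers
FROM customer c
INNER JOIN salesrep r ON c.rep_num = r.rep_num
GROUP BY r.rep_num, r.rep_lname
-- Keep only groups whose count matches the highest per-rep count.
HAVING COUNT(c.cust_num) = (SELECT MAX(cnt)
                            FROM (SELECT COUNT(*) AS cnt
                                  FROM customer
                                  GROUP BY rep_num) AS per_rep)
ORDER BY r.rep_lname
""").fetchall()

for row in rows:
    print(row)
```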