I am getting too many solutions when I need only one - sql

I use a query that returns the name of the city with the highest number of orders placed.
This is what I have:
SELECT MAX(o.OrderID) AS [Number of Orders], od.ShipCity
FROM Orders o, [Order Details] od
GROUP BY o.ShipCity
ORDER BY [Number of Orders] DESC
I got all of the cities and their orders instead of just the one city with the most orders.
What happened?

Yeah, there's a couple of things wrong with your query. First, you're getting the max order id , which is presumably some autoincrement column. It's like Karl's answer is first, mine is second, SELECT MAX(answerid) FROM this_discussion = 2.... but that doesn't mean I have more answers than he does.
Rnofx5 is also right... you need to tell your table what to join ON, cause right now it's creating a Cartesian Product. If you're not sure what that is, for now accept that it's a horrible, evil, wicked thing to do and then Google it after we're done fixing the query.
So, we have orders and order details. Presumably orders does not contain City, so we need order details
SELECT count(o.OrderID), od.ShipCity
FROM orders AS o
INNER JOIN [Order Details] AS od
ON o.{a varible that both Orders and Order Details have in common} = .{a varible that both Orders and Order Details have in common}
GROUP BY od.ShipCity
ORDER BY count(o.OrderID) DESC
LIMIT 1;
Okay, so we're joining Orders with Order Details. In order to do that, we need to associate every order with something in Order Details. I don't know your schema, but from the sounds of it probably each order has a corresponding record in Order Details. In that case, you join these two tables using their ID. Something like
ON o.OrderID = od.OrderID
Now, we are counting all of the orders associated with a particular city... and we sorting them by our count, in descending order. And then we are keeping only the very first record that's returned (LIMIT 1)
Depending on your SQL implementation, you may need TOP 1 instead of LIMIT 1. You tagged mysqli, so presumably this is MySQL and in that case you'd want LIMIT not TOP. But be aware that that's a syntax variation you may encounter at some point

You are getting the highest orderID (MAX) per ship city rather then the count.
You instead need COUNT(o.OrderID)
And in MySQL you need to use LIMIT 1 on the end to get only the top most result.

Related

Query GROUP BY and COUNT

I'm new to SQL and taking COURSERA's "SQL for Data Science" course.I have the following question in a summary assignment:
Show the number of orders placed by each customer and sort the result by the number of orders in descending order.
Having failed to write the correct code, the answer would be as follows (of course one of several options):
SELECT *
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC
I am still having trouble understanding the query logic. I would appreciate your assistance in understanding this query.
I seriously hope that Coursera isn't giving you the query you cited above as the recommended answer. It won't run on most databases, and even in cases such as MySQL where it might run, it is not completely correct. You should be using this version:
SELECT CustomerId, COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC;
A basic rule of GROUP BY is that the only columns available for selection are those which appear in the GROUP BY clause. In addition to these columns, aggregates of any column(s) may also appear in the select. The version I gave you above follows these rules, and is ANSI compliant, meaning it would run on any database.
When you say SELECT * it represents ALL COLUMNS. But you are grouping by only CustomerId which is wrong in SQL.
Specify the other columns in the group section that you want to show
The script should be something like
SELECT CustomerName, DateEntered
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId, CustomerName, DateEntered
ORDER BY number_of_orders DESC

Student Seeking Advice for a CSC Exam

I'm a student taking a course on SQL and DB. My question is this: how does one get good at hand writing queries? Our final exam will consist of many of these questions, and I want to do well. We aren't allowed any sort of reference sheet either, just fyi.
I suppose what I'm asking is: how would you approach this?
In short, You require practice aka hands on sql.
You will probably get many opinions on this from others. Aside from practice and reading, try to ensure you understand the absolute basics and sequence of query.
Always use table.column or alias.column to help prevent any ambiguity of where something is coming from.
Know the overall basic segments of writing a query such as
select
[all your alias.columns comma separated]
from
[your primary and/or JOIN/LEFT JOIN/etc tables]
where
[what is the criteria you are looking for]
AND [use proper parenthesis to prevent ambiguity if so needed]
group by
[any columns if doing aggregates such as count, min, max, avg, etc]
[you need to list all NON-AGGREGATE alias.columns]
having
[if any, such as count(*) > someValue]
order by
[any specific columns and ascending or descending order]
[such as orderDate DESC to put most recent order at top]
In my opinion, getting your FROM clause is one of the most important and I try to always list my table JOIN clauses on first table/alias = second table/alias. Indentation helps here so you can see how you get from one table to the next. At this point, do not think of your filtering (YET), just HOW the tables are related. Then you can add "AND" criteria for something you are specifically looking for from that source.
An example of orders. Looking for customers who ordered in the last 30 days. Start with that source as your first FROM table, everything else off of that. So I start with the orders because I care about WHEN something was ordered. I can then join to customers to get their name.
select
c.LastName,
c.FirstName,
o.OrderDate
from
Orders o
JOIN Customers c
on o.CustomerID = c.CustomerID
where
o.OrderDate > [sql-specific current date - 30 days]
order by
c.LastName,
c.FirstName
Another example of orders that ordered a specific item in the last 30 days. In this case, I could reverse the order of details as specific things being ordered might be smaller granularity vs everything. So, altering above such as
select
c.LastName,
c.FirstName,
o.OrderDate
from
Items i
JOIN OrderDetails od
on i.ItemID = od.ItemID
JOIN Orders o
on od.OrderID = o.OrderID
AND o.OrderDate > [sql-specific current date - 30 days]
JOIN Customers c
on o.CustomerID = c.CustomerID
where
i.ItemDescription = 'SomeThing'
order by
c.LastName,
c.FirstName
Notice my indentation nesting. Personal style preference, but at least you can see how alias i to od, od to o, o to c. In my preference, easier to see the trail of tables and how each are directly related. I also added the "AND" clause to filter out orders within the last 30 days directly in the JOIN to the orders table.
LEFT JOINs, I do the same and keep the criteria directly at the JOIN level. If you put a criteria of a left-join into the WHERE clause (without explicitly handling NULL OR [condition] it turns a left-join into an [INNER] join.
Hope this basic guidance helps you get more comfortable as you get more into writing your own queries and course/test preparation.

My question is about SQL, using a TOP function inside a sub-query in MS Access

Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database. To achieve this I've tried making a query showing how many times a customer has ordered an item, and now I am trying to create a sub-query in it using TOP1 to discern the most bought items.
With the SQL from the first query (looking weird because I made it with the Access automatic creator):
SELECT
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName,
COUNT(SalesQuantity.ProductCode) AS CountOfProductCode
FROM (Employees
INNER JOIN (Customers
INNER JOIN Sales
ON Customers.CustomerCode = Sales.CustomerCode)
ON Employees.EmployeeCode = Sales.EmployeeCode)
INNER JOIN (Products
INNER JOIN SalesQuantity
ON Products.ProductCode = SalesQuantity.ProductCode)
ON Sales.SalesCode = SalesQuantity.SalesCode
GROUP BY
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName
ORDER BY
COUNT(SalesQuantity.ProductCode) DESC;
I have tried putting in a subquery after FROM line:
(SELECT TOP1 CountOfProduct(s)
FROM (.....)
ORDER by Count(SalesQuantity.ProductCode) DESC)
I'm just not sure what to put in for the "from"-every other tutorial has the data from an already created table, however this is from a query that is being made at the same time. Just messing around I've put "FROM" and then listed every table, as well as
FROM Count(SalesQuantity.ProductCode)
just because I've seen that in the order by from the other code, and assume that the query is discerning from this count. Both tries have ended with an error in the syntax of the "FROM" line.
I'm new to SQL so sorry if it's blatantly obvious, but any help would be greatly appreciated.
Thanks
As I understand, you want the most purchased product for each customer.
So, begin by building aggregate query that counts product purchases by customer (appears to be done in the posted image). Including customer ID in the query would simplify the next step which is to build another query with TOP N nested query.
Part of what complicates this is unique record identifier is lost because of aggregation. Have to use other fields from the aggregate query to provide unique identifier. Consider:
SELECT * FROM Query1 WHERE CustomerID & ProductName IN
(SELECT TOP 1 CustomerID & ProductName FROM Query1 AS Dupe
WHERE Dupe.CustomerID = Query1.CustomerID
ORDER BY Dupe.CustomerID, Dupe.CountOfProductCode DESC);
Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database.
This answers your question. It does not modify your query which is only tangentially related.
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode;
To get the most ordered items, you can use this twice:
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode
HAVING sq.ProductCode IN (SELECT TOP 1 sq2.ProductCode
FROM Sales as s2 INNER JOIN
SalesQuantity as sq2
ON s2.SalesCode = sq2.SalesCode
WHERE s2.CustomerCode = s.CustomerCode
GROUP BY sq2.ProductCode
);
In almost any other database, this would be simpler, because you would be able to use window functions.

Get the min() value from columns where duplicates are associated with ids

So, to simplify my problem is as follows:
A. I pass in CustomerID, there can be multiple Orders per Customer. I need to find the minimum ProductID per Order per Customer.
B. Once that is complete, I need run the main query (to return the dataset) in which I would like to display all the Orders, along with Category and Code (per Product). The result should display NULL for Product, Category and Code where the ProductID <> the ProductID found in Part A.
Thanks
(All these are separate tables with foreign keys. I can handle the joins, it's just the "minimum ProductID per Order per Customer" thing that's throwing me.)
You did not provide much information about your specific database schema, so an exact answer is hard to provide. Basically, you want to use a combination of MIN and Group By (and perhaps a join or two depending on your schema).
The following provides a rough example:
select min(ProductID) from Orders o
inner join Customers c
on o.CustomerID = c.CustomerID
group by (productID)
having c.CustomerID = <customer id value>
Regards,

When do you give up set operations in SQL and go procedural?

I was once given this task to do in an RDBMS:
Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.
For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.
After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.
I regard this a symptom of one or more of the following:
I'm a lazy idiot and should have seen how to do it in SQL
Set operations are not as expressive as procedural operations
SQL is not as expressive as it should be
Did I do the right thing? Did I have other options?
You definitely should be able to do this exercise without doing the work equivalent to a JOIN in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one. JOIN is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.
One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc. so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try do all your computation in a single line of code, or even in a single function (hopefully).
Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).
To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:
SELECT o.*, l.*, p.*
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
ORDER BY o.order_date;
It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.
Year of first purchase is available from the previous query, if you sort by the order_date and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:
SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;
Sum of product purchases for the last 12 months is best calculated by a separate query:
SELECT SUM(l.quantity * p.price)
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
AND o.order_date > CURDATE() - INTERVAL 1 YEAR;
edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:
SELECT o1.order_date
FROM Orders o1
LEFT OUTER JOIN Orders o2
ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date
OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
WHERE o1.customer_id = ?
GROUP BY o1.order_id
HAVING COUNT(*) <= 3;
If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase TOP n, or MySQL/PostgreSQL LIMIT:
SELECT TOP 3 order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC;
SELECT order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC
LIMIT 3;
Set operations are not as expressive as procedural operations
Perhaps more like: "Set operations are not as familiar as procedural operations to a developer used to procedural languages" ;-)
Doing it iteratively as you have done now is fine for small sets of data, but simply doesn't scale the same way. The answer to whether you did the right thing depends on whether you are satisfied with the performance right now and/or don't expect the amount of data to increase much.
If you could provide some sample code, we might be able to help you find a set-based solution, which will be faster to begin with and scale far, far better. As GalacticCowboy mentioned, techniques such as temporary tables can help make the statements far more readable while largely retaining the performance benefits.
In most RDBMS you have the option of temporary tables or local table variables that you can use to break up a task like this into manageable chunks.
I don't see any way to easily do this as a single query (without some nasty subqueries), but it still should be doable without dropping out to procedural code, if you use temp tables.
This problem may not have been solvable by one query. I see several distinct parts...
For one customer
Get a list of all products ordered (with product name)
Get year of first purchase
Get dates of last three purchases
Get comment on latest order
Get sum of product purchases for the last 12 months
Your procedure is steps 1 - 5 and SQL gets you the data.
Sounds like a data warehouse project to me. If you need things like "three most recent things" and "sum of something over the last 12 months" then store them i.e. denormalize.
EDIT: This is a completely new take on the solution, using no temp tables or strange sub-sub-sub queries. However, it will ONLY work on SQL 2005 or newer, as it uses the "pivot" command that is new in that version.
The fundamental problem is the desired pivot from a set of rows (in the data) into columns in the output. While noodling on the issue, I recalled that SQL Server now has a "pivot" operator to deal with this.
This works on SQL 2005 only, using the Northwind sample data.
-- This could be a parameter to a stored procedure
-- I picked this one because he has products that he ordered 4 or more times
declare #customerId nchar(5)
set #customerId = 'ERNSH'
select c.CustomerID, p.ProductName, products_ordered_by_cust.FirstOrderYear,
latest_order_dates_pivot.LatestOrder1 as LatestOrderDate,
latest_order_dates_pivot.LatestOrder2 as SecondLatestOrderDate,
latest_order_dates_pivot.LatestOrder3 as ThirdLatestOrderDate,
'If I had a comment field it would go here' as LatestOrderComment,
isnull(last_year_revenue_sum.ItemGrandTotal, 0) as LastYearIncome
from
-- Find all products ordered by customer, along with first year product was ordered
(
select c.CustomerID, od.ProductID,
datepart(year, min(o.OrderDate)) as FirstOrderYear
from Customers c
join Orders o on o.CustomerID = c.CustomerID
join [Order Details] od on od.OrderID = o.OrderID
group by c.CustomerID, od.ProductID
) products_ordered_by_cust
-- Find the grand total for product purchased within last year - note fudged date below (Northwind)
join (
select o.CustomerID, od.ProductID,
sum(cast(round((od.UnitPrice * od.Quantity) - ((od.UnitPrice * od.Quantity) * od.Discount), 2) as money)) as ItemGrandTotal
from
Orders o
join [Order Details] od on od.OrderID = o.OrderID
-- The Northwind database only contains orders from 1998 and earlier, otherwise I would just use getdate()
where datediff(yy, o.OrderDate, dateadd(year, -10, getdate())) = 0
group by o.CustomerID, od.ProductID
) last_year_revenue_sum on last_year_revenue_sum.CustomerID = products_ordered_by_cust.CustomerID
and last_year_revenue_sum.ProductID = products_ordered_by_cust.ProductID
-- THIS is where the magic happens. I will walk through the individual pieces for you
join (
select CustomerID, ProductID,
max([1]) as LatestOrder1,
max([2]) as LatestOrder2,
max([3]) as LatestOrder3
from
(
-- For all orders matching the customer and product, assign them a row number based on the order date, descending
-- So, the most recent is row # 1, next is row # 2, etc.
select o.CustomerID, od.ProductID, o.OrderID, o.OrderDate,
row_number() over (partition by o.CustomerID, od.ProductID order by o.OrderDate desc) as RowNumber
from Orders o join [Order Details] od on o.OrderID = od.OrderID
) src
-- Now, produce a pivot table that contains the first three row #s from our result table,
-- pivoted into columns by customer and product
pivot
(
max(OrderDate)
for RowNumber in ([1], [2], [3])
) as pvt
group by CustomerID, ProductID
) latest_order_dates_pivot on products_ordered_by_cust.CustomerID = latest_order_dates_pivot.CustomerID
and products_ordered_by_cust.ProductID = latest_order_dates_pivot.ProductID
-- Finally, join back to our other tables to get more details
join Customers c on c.CustomerID = products_ordered_by_cust.CustomerID
join Orders o on o.CustomerID = products_ordered_by_cust.CustomerID and o.OrderDate = latest_order_dates_pivot.LatestOrder1
join [Order Details] od on od.OrderID = o.OrderID and od.ProductID = products_ordered_by_cust.ProductID
join Products p on p.ProductID = products_ordered_by_cust.ProductID
where c.CustomerID = #customerId
order by CustomerID, p.ProductID
SQL queries return results in the form of a single "flat" table of rows and columns. Reporting requirements are often more complex than this, demanding a "jagged" set of results like your example. There is nothing wrong with "going procedural" to solve such requirements, or using a reporting tool that sits on top of the database. However, you should use SQL as far as possible to get the best performance from the database.