Join with a GROUP BY operation in SQL

I am playing with the W3Schools SQL environment, where a predefined database is set up.
Tables to be used: Customers and Orders.
To get all the info from Customer we can do:
SELECT * FROM [Customers]
To get customers who have fewer than 3 orders, we do:
SELECT CustomerID, count(*) as num_orders FROM [Orders] group by customerID having num_orders<3
To get the Customers we have in London, we do:
SELECT * FROM [Customers] where city="London"
Question: How can I get, for every customer in London (with fewer than 3 orders), how many orders they have?
I know it has to be a left join, as I want to keep all customers even if they have N/A orders (so, no records in "Orders"), but I am having a hard time making it work.
I tried:
SELECT * FROM [Customers] where city="London"
left join (SELECT CustomerID, count(*) as num_orders FROM [Orders] group by customerID having num_orders<3) as data
on customers.CustomerID= data.CustomerID
But the environment gives no meaningful info about the error.

The proper syntax is:
SELECT c.*, o.num_orders
FROM [Customers] c LEFT JOIN
(SELECT o.CustomerID, COUNT(*) as num_orders
FROM [Orders] o
GROUP BY o.customerID
) o
ON c.CustomerID = o.CustomerID
WHERE c.city = 'London';
Notes:
The most important difference is the order of the clauses. WHERE comes after the FROM clause.
The HAVING clause is removed, because the question is for all customers in London.
Single quotes are used to quote London. Single quotes are the standard string delimiter.
The query uses table aliases, and these are specifically chosen to be very short and related to the table names.
All columns are qualified.
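As a minimal sketch, the query above can be tried against an in-memory SQLite database from Python. The sample rows are made up for illustration (the real W3Schools tables have more columns), and the COALESCE filter for "fewer than 3 orders" is an assumption about what the question intended, not part of the answer above. Filtering on the coalesced count keeps customers with no orders at all, which a plain HAVING inside the subquery would not.

```python
import sqlite3

# Hypothetical sample data; the real W3Schools tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann', 'London'), (2, 'Bob', 'London'),
                                 (3, 'Cid', 'London'), (4, 'Dee', 'Paris');
    INSERT INTO Orders (CustomerID) VALUES (1), (1), (1), (2), (4);
""")

query = """
    SELECT c.CustomerID, COALESCE(o.num_orders, 0) AS num_orders
    FROM Customers c
    LEFT JOIN (SELECT CustomerID, COUNT(*) AS num_orders
               FROM Orders
               GROUP BY CustomerID) o
           ON c.CustomerID = o.CustomerID
    WHERE c.City = 'London'
      AND COALESCE(o.num_orders, 0) < 3   -- assumed reading of "fewer than 3 orders"
    ORDER BY c.CustomerID
"""
london_counts = conn.execute(query).fetchall()
print(london_counts)  # Ann (3 orders) and Dee (Paris) are filtered out
```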

Related

(Simple?) SQL Query: Display all customer information for customers with 2+ orders

I'm doing practice exam material for a distance education course. I have the following three relations (simplified here):
salesperson(emp#, name, salary)
order(order#, cust#, emp#, total)
customer(cust#, name, city)
I'm stuck on a pair of SQL queries.
Display all customer info for customers with at least 1 order.
SELECT * FROM customer
INNER JOIN order ON order.cust# = customer.cust#
GROUP BY cust#;
Display all customer info for customers with at least 2 orders.
SELECT cust#, name, city, industry-type FROM customer
INNER JOIN order ON order.cust# = customer.cust#
GROUP BY cust#
HAVING COUNT(cust#) > 2;
I realize these are misguided attempts resulting from a poor understanding of SQL, but I've spent a ton of time on W3School's SQL Query example tool (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_where) without getting anywhere, and I finally need some "real" help.
You can use a subquery to get the count per cust#, then inner join it to customer.
SELECT c.*
FROM (
SELECT cust# , COUNT(*) cnt
FROM order
GROUP BY cust#
) o INNER JOIN customer c ON c.cust# = o.cust#
WHERE o.cnt >= 2
You can change the table names to match your DB. The following queries can be run directly in W3Schools.
Display all customer info for customers with at least 1 order.
SELECT * FROM customers as cust JOIN orders as o ON o.customerid =
cust.customerid GROUP BY o.customerid;
Display all customer info for customers with at least 2 orders.
SELECT * FROM customers as cust JOIN orders as O ON O.CustomerID = cust.CustomerID GROUP BY cust.CustomerID HAVING COUNT(cust.CustomerID) >= 2;
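Note that "at least 2 orders" corresponds to a count of `>= 2`; a condition of `> 2` requires three or more. The sketch below, using made-up rows in an in-memory SQLite database and a hypothetical helper `customers_with`, shows the difference with the subquery approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
    INSERT INTO customers VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Cid', 'Bonn');
    -- Ann has 1 order, Bob has 2, Cid has 3
    INSERT INTO orders (customerid) VALUES (1), (2), (2), (3), (3), (3);
""")

def customers_with(condition):
    # condition is a trusted literal such as '>= 2', never user input
    sql = f"""
        SELECT c.customerid, c.name
        FROM customers c
        JOIN (SELECT customerid, COUNT(*) AS cnt
              FROM orders
              GROUP BY customerid) o
          ON o.customerid = c.customerid
        WHERE o.cnt {condition}
        ORDER BY c.customerid
    """
    return conn.execute(sql).fetchall()

at_least_two = customers_with(">= 2")   # Bob and Cid
more_than_two = customers_with("> 2")   # Cid only
print(at_least_two, more_than_two)
```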

SQL Query to identify accommodation booked

I need to know what SQL statement to use in an accommodation database looking for a trip user
The error message is clear enough: the group by clause needs to be consistent with the select. Some databases are smart enough to understand that the name of the customer is functionally dependent on its id, and do not require that you put the name in the group by - but not SQL Server.
Also, you need to count on something that comes from the left joined table if you want 0 for customers without bookings.
Consider:
select c.customer_id, c.customer_name, count(ab.customer_id) as [number of accomm slots]
from customers c
left join accommodation_bookings ab on c.customer_id = ab.customer_id
group by c.customer_id, c.customer_name
I would take one step forward and pre-aggregate in a subquery. This is usually more efficient:
select c.customer_id, c.customer_name, coalesce(ab.cnt, 0) [number of accomm slots]
from customers c
left join (
select customer_id, count(*) cnt
from accommodation_bookings
group by customer_id
) ab on c.customer_id = ab.customer_id
You could also express this with a correlated subquery, or a lateral join:
select c.customer_id, c.customer_name, ab.*
from customers c
outer apply (
select count(*) [number of accomm slots]
from accommodation_bookings ab
where c.customer_id = ab.customer_id
) ab
This would take advantage of an index on accommodation_bookings(customer_id) (which should already be there if you have set up a foreign key).
Note: don't use single quotes for identifiers - they are meant for literal strings. In SQL Server, use the square brackets instead.
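To illustrate why the count has to be taken on a column from the left joined table, here is a small sketch using SQLite from Python. The answer above targets SQL Server; SQLite is used here only because it ships with Python, and the sample rows and the `booking_id` column are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE accommodation_bookings (booking_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann has two bookings, Bob has none
    INSERT INTO accommodation_bookings (customer_id) VALUES (1), (1);
""")

rows = conn.execute("""
    SELECT c.customer_id,
           COUNT(*)              AS count_star,    -- counts the NULL-extended row too
           COUNT(ab.customer_id) AS count_joined   -- ignores NULLs from the left join
    FROM customers c
    LEFT JOIN accommodation_bookings ab ON c.customer_id = ab.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

For Bob, `COUNT(*)` reports 1 because the left join still produces one row for him, while `COUNT(ab.customer_id)` reports 0.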

Count(*) with inner join and group by

I have 2 tables.
I need to get the number of counts with id from table 2 and get the name for the id from table 1.
I tried the following code. Didn't work!
select orders.CustomerID, customers.ContactName , count(*)
from Orders
left join customers on Customers.CustomerID= Orders.customerid
group by Orders.customerid;
Pls explain my shortcomings if possible.
When grouping, the GROUP BY section of the query needs to mention all columns that appear outside of an aggregate like COUNT. You're missing the ContactName here.
A fixed version:
select orders.CustomerID, customers.ContactName , count(*)
from Orders
left join customers on Customers.CustomerID= Orders.customerid
group by Orders.customerid, customers.ContactName;
Alternatively you can group by ID alone and then make the join like this:
With OrderCounts AS
(
select orders.CustomerID , count(*) AS OrderCount
from Orders
group by Orders.customerid
)
SELECT OrderCounts.CustomerID
, customers.ContactName
, OrderCounts.OrderCount
FROM OrderCounts
left join customers on Customers.CustomerID= OrderCounts.CustomerID
The first version is shorter and easier to type. In some scenarios the second version will run faster as the group by occurs on a single table & column.
For the second to give the same results CustomerID must be unique in the customers table otherwise it will produce duplicates (but if that's the case the first example would double count orders).
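As a quick check, assuming CustomerID is unique in customers, both versions can be run side by side on made-up rows in an in-memory SQLite database. This is only a sketch; the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders (CustomerID) VALUES (1), (1), (2);
""")

# Version 1: join first, then group by every non-aggregated column.
grouped_after_join = conn.execute("""
    SELECT o.CustomerID, c.ContactName, COUNT(*) AS OrderCount
    FROM Orders o
    LEFT JOIN Customers c ON c.CustomerID = o.CustomerID
    GROUP BY o.CustomerID, c.ContactName
    ORDER BY o.CustomerID
""").fetchall()

# Version 2: aggregate Orders alone in a CTE, then join the names on.
grouped_before_join = conn.execute("""
    WITH OrderCounts AS (
        SELECT CustomerID, COUNT(*) AS OrderCount
        FROM Orders
        GROUP BY CustomerID
    )
    SELECT oc.CustomerID, c.ContactName, oc.OrderCount
    FROM OrderCounts oc
    LEFT JOIN Customers c ON c.CustomerID = oc.CustomerID
    ORDER BY oc.CustomerID
""").fetchall()

print(grouped_after_join == grouped_before_join)
```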

Give all rows appear in another table at least specific number of times

I am using the sample database and I want to write a query on the tables Customers and Orders that returns all the customers who have made more than 2 orders. I can achieve that with the query:
Select Customers.*
From Customers
Where Customers.CustomerID IN(
Select Orders.CustomerID
From Orders
Group by Orders.CustomerID
Having count(*)>2
);
I cannot understand why the query:
SELECT Customers.*
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID
GROUP BY Customers.CustomerID
HAVING COUNT(*)>2;
cannot give the same results. The message from the database is:
"Cannot group on fields selected with '*' (Customers)."
I was under the impression that it should work, since Customers.CustomerID is included in the columns requested in the SELECT statement. What is the problem, and how could I modify the second query to make it work, even though it probably executes superfluous statements?
From SQL GROUP BY Statement
The GROUP BY statement is used in conjunction with the aggregate
functions to group the result-set by one or more columns.
SQL GROUP BY Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
So to aggregate, you need to specify which columns you are grouping by, and for that you cannot use *.
You have to list the columns explicitly in both the SELECT and GROUP BY clauses.
Specify the columns you need in the SELECT statement:
SELECT Customers.CustomerID, Customers.CustomerName
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID
GROUP BY Customers.CustomerID, Customers.CustomerName
HAVING COUNT(*)>2;
Your first solution actually consists of two queries.
The inner part determines the returning customers by listing their CustomerID values:
Select o.CustomerID
From Sales.SalesOrderHeader o
Group by o.CustomerID
Having count(*)>2
The outer part then displays the customer details, using the list of returning customers produced by the inner query:
Select c.*
From Sales.Customer c
Where
c.CustomerID IN(
...
);
It is not possible to return all customer columns while using GROUP BY on the order table to find repeated customer IDs.
Instead of GROUP BY, an aggregate function (COUNT below) with a PARTITION BY clause can be used here.
Please check the tutorial http://www.kodyaz.com/t-sql/sql-count-function-with-partition-by-clause.aspx and have a look at the following query:
select * from (
SELECT distinct
c.*,
COUNT(o.SalesOrderID) over (partition by c.CustomerId) cnt
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c ON o.CustomerID = c.CustomerID
) t where cnt > 2
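The windowed count can be sketched on made-up data with SQLite from Python. This assumes an SQLite build with window function support (3.25 or later); the table and column names follow the question's sample database rather than the Sales schema used above, and the filter is `cnt > 2` to match "more than 2 orders".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann has 3 orders, Bob has 2, Cid has 1
    INSERT INTO Orders (CustomerID) VALUES (1), (1), (1), (2), (2), (3);
""")

frequent = conn.execute("""
    SELECT * FROM (
        SELECT DISTINCT
               c.CustomerID,
               c.CustomerName,
               -- every joined row carries its customer's total order count
               COUNT(o.OrderID) OVER (PARTITION BY c.CustomerID) AS cnt
        FROM Orders o
        JOIN Customers c ON o.CustomerID = c.CustomerID
    ) t
    WHERE cnt > 2
    ORDER BY CustomerID
""").fetchall()
print(frequent)
```

Because the window function does not collapse rows, DISTINCT is what reduces the output to one row per customer.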

SQL query for join with condition

I have these two tables:
Customers: Id, Name
Orders: Id, CustomerId, Time, Status
I want to get a list of customers for which the LAST order does not have a status of 'Wrong'.
I know how to use a LEFT JOIN to get a count of orders for each customer, but I don't know how I can use this statement for what I want. Maybe a JOIN is not the right thing to use too, I'm not sure.
It's possible that customers do not have any order, and they should be returned.
I'm abstracting the real tables here, but the scenario is for a windows phone app sending notifications. I want to get all clients for which their last notification does not have a 'Dropped' status. I can sort their notifications (orders) by the 'Time' field. Thanks for the help, while I continue experimenting with subqueries in the where clause.
Select ...
From Customers As C
Where Not Exists (
Select 1
From Orders As O1
Join (
Select O2.CustomerId, Max( O2.Time ) As Time
From Orders As O2
Group By O2.CustomerId
) As LastOrderTime
On LastOrderTime.CustomerId = O1.CustomerId
And LastOrderTime.Time = O1.Time
Where O1.Status = 'Dropped'
And O1.CustomerId = C.Id
)
There are obviously alternatives based on the actual database product and version. For example, in SQL Server one could use the TOP command or a CTE perhaps. However, without knowing what specific product is being used, the above solution should produce the results you want in almost any database product.
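A minimal sketch of the NOT EXISTS query above, run on made-up rows with SQLite from Python. The integer Time values and the 'Ok' status are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Time INTEGER, Status TEXT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Orders (CustomerId, Time, Status) VALUES
        (1, 1, 'Dropped'), (1, 2, 'Ok'),   -- Ann: last order is fine
        (2, 1, 'Ok'), (2, 2, 'Dropped');   -- Bob: last order was dropped
    -- Cid has no orders and should still be returned
""")

kept = conn.execute("""
    SELECT C.Id, C.Name
    FROM Customers AS C
    WHERE NOT EXISTS (
        SELECT 1
        FROM Orders AS O1
        JOIN (SELECT O2.CustomerId, MAX(O2.Time) AS Time
              FROM Orders AS O2
              GROUP BY O2.CustomerId) AS LastOrderTime
          ON LastOrderTime.CustomerId = O1.CustomerId
         AND LastOrderTime.Time = O1.Time
        WHERE O1.Status = 'Dropped'
          AND O1.CustomerId = C.Id
    )
    ORDER BY C.Id
""").fetchall()
print(kept)
```

Ann is kept even though an earlier order was dropped, because only her latest order is inspected.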
Addition
If you were using a product that supported ranking functions (which database product and version isn't mentioned) and common-table expressions, then an alternative solution might be something like so:
With RankedOrders As
(
Select O.CustomerId, O.Status
, Row_Number() Over( Partition By CustomerId Order By Time Desc ) As Rnk
From Orders As O
)
Select ...
From Customers As C
Where Not Exists (
Select 1
From RankedOrders As O1
Where O1.CustomerId = C.Id
And O1.Rnk = 1
And O1.Status = 'Dropped'
)
Assuming "last order" refers to the Time column, here is my query:
SELECT C.Id,
C.Name,
MAX(O.Time)
FROM
Customers C
INNER JOIN Orders O
ON C.Id = O.CustomerId
WHERE
O.Status != 'Wrong'
GROUP BY C.Id,
C.Name
EDIT:
Regarding your table configuration. You should really consider revising the structure to include a third table. They would look like this:
Customer
CustomerId | Name
Order
OrderId | Status | Time
CompletedOrders
CoId | CustomerId | OrderId
Now what you do is store the info about a customer or order in their respective tables ... then when an order is made you just create a CompletedOrders entry with the ids of the 2 individual records. This will allow for a 1 to Many relationship between customer and orders.
Didn't check it out, but something like this?
SELECT c.CustomerId, c.Name, MAX(o.Time)
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.Status <> 'Wrong'
GROUP BY c.CustomerId, C.Name
You can get the list of customers whose LAST order has a status of 'Wrong' with something like:
select customerId from orders where status='Wrong'
group by customerId
having time=max(time)