Group By not functioning as desired - sql

Let's say I have three tables - Orders, OrderDetails, and ProductType - and the Orders table includes a column for Customer. What I need to do is write a query that will show me a list of customers and how many orders each customer has placed, as well as displaying and grouping by another column, which is a boolean based on whether a particular type of product - say, telephones - is in the order.
For example, we might have:
Customer | NumOrders | IncludesPhone
---------------------------------
Jameson | 3 | Yes
Smith | 5 | Yes
Weber | 1 | Yes
Adams | 2 | No
Jameson | 1 | No
Smith | 7 | No
Weber | 2 | No
However, when I try to write the query for this, I'm getting multiple rows with the same values for Customer and IncludesPhone, each with a different value for NumOrders. Why is this happening? My query is below:
SELECT Customer, COUNT(Customer) AS NumOrders,
CASE WHEN (ProductType.Type = 'Phone') THEN 'Yes' ELSE 'No' END AS IncludesPhone
FROM Orders INNER JOIN OrderDetails INNER JOIN ProductType
GROUP BY Customer, Type
Order By IncludesPhone, Customer

Change the group by to
GROUP BY Customer,
CASE WHEN (ProductType.Type = 'Phone') THEN 'Yes' ELSE 'No' END

This query should work
SELECT Customer, COUNT(Customer) AS NumOrders,
CASE WHEN (ProductType.Type = 'Phone') THEN 'Yes' ELSE 'No' END AS IncludesPhone
FROM Orders INNER JOIN OrderDetails INNER JOIN ProductType
GROUP BY Customer,
CASE WHEN (ProductType.Type = 'Phone') THEN 'Yes' ELSE 'No' END
Order By IncludesPhone, Customer
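To see the difference between the two GROUP BY clauses on real rows, here is a small sketch using Python's built-in sqlite3 module. The join columns (OrderID, TypeID) and the sample rows are assumptions, since the question does not show its ON clauses, and each order has exactly one detail line so that COUNT(*) equals the number of orders.

```python
import sqlite3

# Hypothetical miniature schema; OrderID and TypeID are assumed join keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer TEXT);
CREATE TABLE ProductType (TypeID INTEGER PRIMARY KEY, Type TEXT);
CREATE TABLE OrderDetails (OrderID INTEGER, TypeID INTEGER);
INSERT INTO Orders VALUES (1, 'Smith'), (2, 'Smith'), (3, 'Smith');
INSERT INTO ProductType VALUES (1, 'Phone'), (2, 'Email'), (3, 'Personal');
INSERT INTO OrderDetails VALUES (1, 1), (2, 2), (3, 3);
""")

base = """
SELECT o.Customer,
       COUNT(*) AS NumOrders,
       CASE WHEN pt.Type = 'Phone' THEN 'Yes' ELSE 'No' END AS IncludesPhone
FROM Orders o
JOIN OrderDetails od ON od.OrderID = o.OrderID
JOIN ProductType pt ON pt.TypeID = od.TypeID
GROUP BY o.Customer, {group_expr}
ORDER BY IncludesPhone, o.Customer
"""

# Grouping by the raw Type column: one row per distinct type.
by_type = conn.execute(base.format(group_expr="pt.Type")).fetchall()

# Grouping by the CASE expression: one row per Yes/No value.
by_flag = conn.execute(base.format(
    group_expr="CASE WHEN pt.Type = 'Phone' THEN 'Yes' ELSE 'No' END")).fetchall()

print(by_type)
print(by_flag)
```

With this data, grouping by Type gives two separate 'No' rows for Smith (one for 'Email', one for 'Personal'), while grouping by the CASE expression merges them into a single 'No' row with a count of 2.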

Since you're grouping on both Customer and Type, your query returns the count of orders per customer per type. If you only want one row per customer, you should only group by Customer, and then use something like this, to determine whether a given customer bought a phone:
SELECT Customer, COUNT(Customer) AS NumOrders,
CASE
WHEN SUM(CASE WHEN (ProductType.Type = 'Phone') THEN 1 ELSE 0 END) > 0
THEN 'Yes'
ELSE 'No' END AS IncludesPhone
FROM Orders INNER JOIN OrderDetails INNER JOIN ProductType
GROUP BY Customer
Order By IncludesPhone, Customer
The inner sum basically counts the number of phones bought per customer. If this is more than 0, then the customer bought at least one phone and we return "Yes".
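A runnable sketch of the same idea with Python's sqlite3 module is below. The join keys (OrderID, TypeID) and the sample rows are assumptions for illustration. Note one deliberate deviation: it counts COUNT(DISTINCT o.OrderID) rather than COUNT(Customer), because the join produces one row per detail line and an order with several lines would otherwise be counted more than once.

```python
import sqlite3

# Hypothetical schema and data; OrderID and TypeID are assumed join keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer TEXT);
CREATE TABLE ProductType (TypeID INTEGER PRIMARY KEY, Type TEXT);
CREATE TABLE OrderDetails (OrderID INTEGER, TypeID INTEGER);
INSERT INTO Orders VALUES (1, 'Adams'), (2, 'Adams'), (3, 'Weber');
INSERT INTO ProductType VALUES (1, 'Phone'), (2, 'Email');
-- Order 3 has two detail lines, one of which is a phone.
INSERT INTO OrderDetails VALUES (1, 2), (2, 2), (3, 1), (3, 2);
""")

rows = conn.execute("""
SELECT o.Customer,
       COUNT(DISTINCT o.OrderID) AS NumOrders,
       CASE WHEN SUM(CASE WHEN pt.Type = 'Phone' THEN 1 ELSE 0 END) > 0
            THEN 'Yes' ELSE 'No' END AS IncludesPhone
FROM Orders o
JOIN OrderDetails od ON od.OrderID = o.OrderID
JOIN ProductType pt ON pt.TypeID = od.TypeID
GROUP BY o.Customer
ORDER BY IncludesPhone, o.Customer
""").fetchall()

print(rows)
```

Each customer now appears exactly once: Adams with 2 orders and 'No', Weber with 1 order and 'Yes'.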

That's because you're grouping by the Type column, so several types can map to the same IncludesPhone value. For example, for the types 'Email' and 'Personal' the IncludesPhone column will be 'No' in both cases, but because you're grouping by Type there will be two records in the output.
To fix this, you can use the same expression in the GROUP BY clause, or use a subquery or a common table expression:
with cte as (
select
Customer,
case when pt.Type = 'Phone' then 'Yes' else 'No' end as IncludesPhone
from Orders as o
inner join OrderDetails as od -- ???
inner join ProductType as pt -- ???
)
select Customer, IncludesPhone, count(*) as NumOrders
from cte
group by Customer, IncludesPhone
order by IncludesPhone, Customer
Using the same expression in the GROUP BY clause:
select
Customer,
case when pt.Type = 'Phone' then 'Yes' else 'No' end as IncludesPhone,
count(*) as NumOrders
from Orders as o
inner join OrderDetails as od -- ???
inner join ProductType as pt -- ???
group by Customer, case when pt.Type = 'Phone' then 'Yes' else 'No' end

You could try this query:
SELECT x.Customer,x.NumOrders,
CASE WHEN x.NumOrders>0 AND EXISTS(
SELECT *
FROM Orders o
INNER JOIN OrderDetails od ON ...
INNER JOIN ProductType pt ON ...
WHERE o.Customer=x.Customer
AND pt.Type = 'Phone'
) THEN 1 ELSE 0 END IncludesPhone
FROM
(
SELECT Customer,COUNT(Customer) AS NumOrders
FROM Orders
GROUP BY Customer
) x
Order By IncludesPhone, x.Customer;
or this one:
SELECT o.Customer,
COUNT(o.Customer) AS NumOrders,
MAX(CASE WHEN EXISTS
(
SELECT *
FROM OrderDetails od
JOIN ProductType pt ON ...
WHERE o.OrderID=od.OrderID -- Join predicated between Orders and OrderDetails table
AND pt.Type = 'Phone'
) THEN 1 ELSE 0 END) AS IncludesPhone
FROM Orders o
GROUP BY Customer
ORDER BY IncludesPhone, o.Customer

Related

How to calculate rows count in where statement in sql?

I have two tables in SQL Server:
order (columns: order_id, payment_id)
payment (columns: payment_id, is_pay)
I want to get all orders with two more properties:
How many rows where is_pay is 1:
where payment_id = <...> and payment.is_pay = 1
And the count of the rows (without the first filter)
select count(*)
from payment
where payment_id = <...>
So I wrote this query:
select
*,
(select count(1) from payment p
where p.payment_id = o.payment_id and p.is_pay = 1) as total
from
order o
The problem is how to calculate the rows without the is_pay = 1?
I mean the "some of many"
First aggregate in payment and then join to order:
SELECT o.*, p.total_pay, p.total
FROM [order] o
LEFT JOIN (
SELECT payment_id, SUM(is_pay) total_pay, COUNT(*) total
FROM payment
GROUP BY payment_id
) p ON p.payment_id = o.payment_id;
Change LEFT to INNER join if all orders have at least 1 payment.
Also, if is_pay's data type is BIT, change SUM(is_pay) to:
SUM(CASE WHEN is_pay = 1 THEN 1 ELSE 0 END)
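Here is a small, self-contained check of the aggregate-then-join approach using Python's sqlite3 module. The sample rows are made up, and SQLite stands in for SQL Server, so the table name is quoted as "order" rather than [order].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, payment_id INTEGER);
CREATE TABLE payment (payment_id INTEGER, is_pay INTEGER);
INSERT INTO "order" VALUES (1, 10), (2, 20), (3, 30);
-- payment_id 10: three rows, two paid; payment_id 20: one unpaid row;
-- payment_id 30: no payment rows at all.
INSERT INTO payment VALUES (10, 1), (10, 0), (10, 1), (20, 0);
""")

rows = conn.execute("""
SELECT o.order_id, p.total_pay, p.total
FROM "order" o
LEFT JOIN (
    SELECT payment_id,
           SUM(CASE WHEN is_pay = 1 THEN 1 ELSE 0 END) AS total_pay,
           COUNT(*) AS total
    FROM payment
    GROUP BY payment_id
) p ON p.payment_id = o.payment_id
ORDER BY o.order_id
""").fetchall()

print(rows)
```

Order 3 has no matching payments, so the LEFT JOIN returns NULL (None in Python) for both counts; wrap the columns in COALESCE(..., 0) if zeros are preferred.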
Use a join with conditional aggregation:
SELECT
o.payment_id,
COUNT(CASE WHEN p.is_pay = 1 THEN 1 END) AS pay_cnt,
COUNT(p.payment_id) AS all_cnt
FROM "order" o
LEFT JOIN payment p
ON o.payment_id = p.payment_id
GROUP BY
o.payment_id;
You can use a lateral join (outer apply) for this:
select o.*, p.*
from [order] o outer apply
(select count(*) as num_payments,
sum(case when is_pay = 1 then 1 else 0 end) as num_payments_1
from payment p
where p.payment_id = o.payment_id
) p;
Note: Assuming that is_pay only takes on the values of 0 and 1 (which seems reasonable given the name), you can simplify this to:
select o.*, p.*
from [order] o outer apply
(select count(*) as num_payments,
sum(is_pay) as num_payments_1
from payment p
where p.payment_id = o.payment_id
) p;
If you are looking for counts per payment id, then use this:
select
payment.payment_id,
count(*) as total,
count(case when payment.is_pay = 1 then 1 end) as total_is_pay_orders
from [order] o
left join payment
on o.payment_id = payment.payment_id
group by payment.payment_id

SQL. Find the customers who bought same brands and at-least 2 products in each brand

I have two tables :
Sales
columns: (Sales_id, Date, Customer_id, Product_id, Purchase_amount):
Product
columns: (Product_id, Product_Name, Brand_id,Brand_name)
I have to write a query to find the customers who bought the brands 'X' and 'Y' (both) and at least 2 products of each brand. Is the following query correct? Any recommended changes?
SELECT S.Customer_id "Customer ID"
FROM Sales S LEFT JOIN Product P
ON S.Product_id = P.Product_id
AND P.Brand_Name IN ('X','Y')
GROUP BY S.Customer_id
HAVING COUNT(DISTINCT S.Product_id)>=2 -----at least 2 products in each brand
AND COUNT(S.Customer_id) =2 ---------------customers who bought both brands
Any help will be appreciated. Thanks in advance
Use COUNT() window function to count the number of distinct brands and the number of distinct products of each brand that each customer has bought.
Then filter out the customers who haven't bought both brands and GROUP BY customer with a HAVING clause that filters out the customers who haven't bought at least 2 products of each brand.
Also your join should be an INNER join and not a LEFT join.
select t.customer_id "Customer ID"
from (
select s.customer_id,
count(distinct p.brand_id) over (partition by s.customer_id) brands_counter,
count(distinct p.product_id) over (partition by s.customer_id, p.brand_id) products_counter
from sales s inner join product p
on p.product_id = s.product_id
where p.brand_name in ('X', 'Y')
) t
where t.brands_counter = 2
group by t.customer_id
having min(t.products_counter) >= 2
Starting from your existing query, you can use the following HAVING clause:
HAVING
COUNT(DISTINCT CASE WHEN p.brand_name = 'X' then S.product_id end) >= 2
AND COUNT(DISTINCT CASE WHEN p.brand_name = 'Y' then S.product_id end) >= 2
This ensures that the customer bought at least two products of each brand. It also implicitly guarantees that the customer placed orders for both brands, so no additional logic is needed for that.
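As a quick sanity check of the conditional COUNT(DISTINCT ...) condition, here is a sketch with Python's sqlite3 module. The tables are trimmed to the columns the query needs, the rows are invented, and an inner join is used instead of the question's LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Brand_name TEXT);
CREATE TABLE Sales (Sales_id INTEGER PRIMARY KEY, Customer_id INTEGER, Product_id INTEGER);
INSERT INTO Product VALUES (1, 'X'), (2, 'X'), (3, 'Y'), (4, 'Y'), (5, 'Z');
-- Customer 1: two X products and two Y products (should qualify).
-- Customer 2: two X products but only one Y product.
-- Customer 3: the same X product twice, plus two Y products.
INSERT INTO Sales (Customer_id, Product_id) VALUES
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 1), (3, 3), (3, 4);
""")

qualifying = [row[0] for row in conn.execute("""
SELECT s.Customer_id
FROM Sales s
JOIN Product p ON p.Product_id = s.Product_id
WHERE p.Brand_name IN ('X', 'Y')
GROUP BY s.Customer_id
HAVING COUNT(DISTINCT CASE WHEN p.Brand_name = 'X' THEN s.Product_id END) >= 2
   AND COUNT(DISTINCT CASE WHEN p.Brand_name = 'Y' THEN s.Product_id END) >= 2
ORDER BY s.Customer_id
""")]

print(qualifying)
```

Customer 3 is rejected because DISTINCT collapses the repeated purchase of product 1, leaving only one distinct X product.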
You could also express this with MIN() and MAX():
HAVING
MIN(CASE WHEN p.brand_name = 'X' THEN S.product_id END)
<> MAX(CASE WHEN p.brand_name = 'X' then S.product_id end)
AND MIN(CASE WHEN p.brand_name = 'Y' THEN S.product_id END)
<> MAX(CASE WHEN p.brand_name = 'Y' then S.product_id end)
You can use two levels of aggregation:
SELECT Customer_id
FROM (SELECT S.Customer_id, P.Brand_Name, COUNT(DISTINCT S.Product_Id) as num_products
FROM Sales S JOIN
Product P
ON S.Product_id = P.Product_id
WHERE P.Brand_Name IN ('X', 'Y')
GROUP BY S.Customer_id, P.Brand_Name
) s
GROUP BY Customer_Id
HAVING COUNT(*) = 2 AND MIN(num_products) >= 2;

SQL - how do I return the number of failed orders if there is a condition?

So, I have two tables - table customers and table orders.
customer with attributes custid, name, address and orders with attributes customerid, orderid, date and status. I need to return the ids of those customers, who have had more than 15% of their orders with status "failed".
This is what I have written and does not currently work:
SELECT C.custid
FROM customers C
WHERE C.custid IN (SELECT O.customerid, COUNT(status)
FROM orders O
WHERE O.status='failed'
GROUP BY O.custid
HAVING COUNT(status)=0.15)
Here is one approach using aggregation on the orders table:
SELECT customerid
FROM orders
GROUP BY customerid
HAVING 1.0 * COUNT(CASE WHEN status = 'failed' THEN 1 END) / COUNT(*) > 0.15;
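One thing to watch with a ratio of two COUNTs: in engines where dividing two integers truncates (SQLite here, and the same holds for PostgreSQL and SQL Server), the quotient is 0 for every customer with fewer failed orders than total orders. The sketch below, using Python's sqlite3 module with invented data and a made-up 'ok' status, compares the plain division with a version that multiplies by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER, status TEXT);
-- Customer 1: 1 failed out of 4 (25%); customer 2: 1 failed out of 10 (10%).
INSERT INTO orders (customerid, status) VALUES
    (1, 'failed'), (1, 'ok'), (1, 'ok'), (1, 'ok'),
    (2, 'failed'), (2, 'ok'), (2, 'ok'), (2, 'ok'), (2, 'ok'),
    (2, 'ok'), (2, 'ok'), (2, 'ok'), (2, 'ok'), (2, 'ok');
""")

query = """
SELECT customerid
FROM orders
GROUP BY customerid
HAVING {ratio} > 0.15
ORDER BY customerid
"""
failed = "COUNT(CASE WHEN status = 'failed' THEN 1 END)"

# Integer division: 1 / 4 evaluates to 0, so nobody passes the filter.
integer_ratio = conn.execute(query.format(ratio=failed + " / COUNT(*)")).fetchall()

# Multiplying by 1.0 forces a floating-point division: 0.25 > 0.15.
decimal_ratio = conn.execute(query.format(ratio="1.0 * " + failed + " / COUNT(*)")).fetchall()

print(integer_ratio, decimal_ratio)
```

Only the second form returns customer 1. MySQL's / operator already returns a decimal, so the extra factor is harmless there.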

Multiple COUNT / GROUP BY statements return unexpected results

Querying a PostgreSQL 9.4 database, I want to be able to see how often each customer is interacting with each employee based on previous orders. My aim is to retrieve data in the following format:
CUSTOMER EMPLOYEE INTERACTIONS CUSTOMER_TOTAL
Customer1 EmployeeA 30 50
Customer1 EmployeeB 20 50
Customer2 EmployeeD 6 15
Customer2 EmployeeA 6 15
Customer2 EmployeeC 3 15
...where I have a separate record in the results for every combination of customer and employee (assuming at least one order has taken place between the two).
I want to include a column containing the number of orders between a customer and each individual employee (See column 3 above), and another column with the total number of orders for each customer overall (see column 4 above).
I've written the following query:
SELECT customer.name as Customer, employee.name as Employee,
SUM(CASE WHEN orders.employee_id = employee.id AND orders.customer_id = customer.id THEN 1 ELSE 0 END) AS Interactions,
SUM(CASE WHEN orders.customer_id = customer.id THEN 1 ELSE 0 END) AS Customer_Total
FROM tblcustomer customer
JOIN tblorder orders ON orders.customer_id = customer.id
LEFT JOIN tblemployee employee ON employee.id = orders.employee_id
GROUP BY customer.name, employee.name
ORDER BY Customer, Interactions DESC;
Which returned the following results:
CUSTOMER EMPLOYEE INTERACTIONS CUSTOMER_TOTAL
Customer1 EmployeeA 30 30
Customer1 EmployeeB 20 20
Customer2 EmployeeD 6 6
Customer2 EmployeeA 6 6
Customer2 EmployeeC 3 3
All rows / columns appear as expected, except for the final column. Instead of a count of total orders for each customer, it has returned only the orders where the employee is also a match. Where have I gone wrong?
I think you should be using LEFT JOIN here:
SELECT customer.name as Customer,
employee.name as Employee,
SUM(CASE WHEN orders.employee_id = employee.id AND
orders.customer_id = customer.id
THEN 1 ELSE 0 END) AS Interactions,
SUM(CASE WHEN orders.customer_id = customer.id
THEN 1 ELSE 0 END) AS Customer_Total
FROM tblcustomer customer
LEFT JOIN tblorder orders
ON orders.customer_id = customer.id
LEFT JOIN tblemployee employee
ON employee.id = orders.employee_id
GROUP BY customer.name,
employee.name
ORDER BY Customer,
Interactions DESC;
My hunch is that your INNER JOINs are filtering out records which would comprise the customer total. By using LEFT JOIN you are retaining these records.
A single LEFT JOIN to tblEmployee should be enough. An INNER JOIN to tblEmployee filters the result down to only those orders that have a matching employee, that is, the orders between a customer and each individual employee.
SELECT customer.name as Customer,
employee.name as Employee,
SUM(CASE WHEN orders.employee_id = employee.id AND
orders.customer_id = customer.id
THEN 1 ELSE 0 END) AS Interactions,
SUM(CASE WHEN orders.customer_id = customer.id
THEN 1 ELSE 0 END) AS Customer_Total
FROM tblcustomer customer
JOIN tblorder orders
ON orders.customer_id = customer.id
LEFT JOIN tblemployee employee
ON employee.id = orders.employee_id
GROUP BY customer.name,
employee.name
ORDER BY Customer,
Interactions DESC;
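Keep in mind that any aggregate in a query grouped by customer and employee only sees the rows of that one customer/employee pair, whatever the join type. One alternative (not taken from the answers above) is to compute the per-customer total in its own derived table and join it in. A sketch with Python's sqlite3 module and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblcustomer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tblemployee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tblorder (id INTEGER PRIMARY KEY, customer_id INTEGER, employee_id INTEGER);
INSERT INTO tblcustomer VALUES (1, 'Customer1');
INSERT INTO tblemployee VALUES (1, 'EmployeeA'), (2, 'EmployeeB');
-- Three orders with EmployeeA, two with EmployeeB.
INSERT INTO tblorder (customer_id, employee_id) VALUES (1, 1), (1, 1), (1, 1), (1, 2), (1, 2);
""")

rows = conn.execute("""
SELECT c.name AS Customer,
       e.name AS Employee,
       COUNT(*) AS Interactions,
       t.Customer_Total
FROM tblcustomer c
JOIN tblorder o ON o.customer_id = c.id
LEFT JOIN tblemployee e ON e.id = o.employee_id
JOIN (
    -- Per-customer total, computed independently of the employee grouping.
    SELECT customer_id, COUNT(*) AS Customer_Total
    FROM tblorder
    GROUP BY customer_id
) t ON t.customer_id = c.id
GROUP BY c.name, e.name, t.Customer_Total
ORDER BY Customer, Interactions DESC
""").fetchall()

print(rows)
```

Both rows for Customer1 now carry the overall total of 5. On PostgreSQL a window function such as SUM(COUNT(*)) OVER (PARTITION BY customer.name) would be another way to get the same column.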

Segment purchases based on new vs returning

I'm trying to write a query that can select a particular date and count how many of those customers have placed orders previously and how many are new. For simplicity, here is the table layout:
id (auto) | cust_id | purchase_date
-----------------------------------
1 | 1 | 2010-11-15
2 | 2 | 2010-11-15
3 | 3 | 2010-11-14
4 | 1 | 2010-11-13
5 | 3 | 2010-11-12
I was trying to select orders by a date and then join any previous orders on the same user_id from previous dates, then count how many had orders vs. how many didn't. This was my failed attempt:
SELECT SUM(
CASE WHEN id IS NULL
THEN 1
ELSE 0
END ) AS new, SUM(
CASE WHEN id IS NOT NULL
THEN 1
ELSE 0
END ) AS returning
FROM (
SELECT o1 . *
FROM orders AS o
LEFT JOIN orders AS o1 ON ( o1.user_id = o.user_id
AND DATE( o1.created ) = "2010-11-15" )
WHERE DATE( o.created ) < "2010-11-15"
GROUP BY o.user_id
) AS t
Given a reference date (2010-11-15), we are interested in the number of distinct customers who placed an order on that date (A), how many of those have placed an order previously (B), and how many have not (C). Clearly, A = B + C.
Q1: Count of customers who placed an order on the reference date
SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q2: List of customers placing order on reference date
SELECT DISTINCT Cust_ID
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q3: List of customers who placed an order on reference date who had ordered before
SELECT DISTINCT o1.Cust_ID
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q4: Count of customers who placed an order on the reference date and who had ordered before
SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q5: Combining Q1 and Q4
There are several ways to do the combining. One is to use Q1 and Q4 as (complicated) expressions in the select-list; another is to use them as tables in the FROM clause which don't need a join between them because each is a single-row, single-column table that can be joined in a Cartesian product. Another would be a UNION, where each row is tagged with what it calculates.
SELECT (SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15') AS Total_Customers,
(SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15') AS Returning_Customers
FROM Dual;
(I'm blithely assuming MySQL has a DUAL table - similar to Oracle's. If not, it is trivial to create a table with a single column containing a single row of data. Update 2: bashing the MySQL 5.5 Manual shows that 'FROM Dual' is supported but not needed; MySQL is happy without a FROM clause.)
Update 1: added qualifier 'o1.Cust_ID' in key locations to avoid 'ambiguous column name' as indicated in the comment.
How about
SELECT * FROM
(SELECT * FROM
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 1 AS OLD_CUSTOMER, 0 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT > 1)
UNION ALL
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 0 AS OLD_CUSTOMER, 1 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT = 1)) G
INNER JOIN
(SELECT CUST_ID, ORDER_DATE
FROM ORDERS) O
USING (CUST_ID)
WHERE ORDER_DATE = [date of interest] AND
OLD_CUSTOMER = [0 or 1, depending on what you want] AND
NEW_CUSTOMER = [0 or 1, depending on what you want]
Not sure if that'll do the whole thing, but it might provide a starting point.
Share and enjoy.
select count(distinct o1.cust_id) as repeat_count,
count(distinct o.cust_id)-count(distinct o1.cust_id) as new_count
from orders o
left join (select cust_id
from orders
where purchase_date < "2010-11-15"
group by cust_id) o1
on o.cust_id = o1.cust_id
where o.purchase_date = "2010-11-15"
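To check this last query against the sample table from the question, here is a sketch using Python's sqlite3 module. Dates are stored as ISO-8601 text (an assumption that makes plain string comparison order them correctly), and the reference date is passed as a named parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, purchase_date TEXT);
INSERT INTO orders VALUES
    (1, 1, '2010-11-15'),
    (2, 2, '2010-11-15'),
    (3, 3, '2010-11-14'),
    (4, 1, '2010-11-13'),
    (5, 3, '2010-11-12');
""")

reference_date = "2010-11-15"
repeat_count, new_count = conn.execute("""
SELECT COUNT(DISTINCT o1.cust_id) AS repeat_count,
       COUNT(DISTINCT o.cust_id) - COUNT(DISTINCT o1.cust_id) AS new_count
FROM orders o
LEFT JOIN (
    -- Customers with at least one order before the reference date.
    SELECT cust_id
    FROM orders
    WHERE purchase_date < :ref
    GROUP BY cust_id
) o1 ON o.cust_id = o1.cust_id
WHERE o.purchase_date = :ref
""", {"ref": reference_date}).fetchone()

print(repeat_count, new_count)
```

On 2010-11-15 customers 1 and 2 placed orders; customer 1 had ordered before (2010-11-13) and customer 2 had not, so the result is one returning and one new customer.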