Customers who have purchased two specific products - sql

I want to make two lists:
1. Customers who have bought product 'a' OR product 'b'
2. Customers who have bought product 'a' AND product 'b'
Number 1 is easy:
SELECT SalesTable.OrderAccount
FROM SalesTable
WHERE SalesTable.ItemID = 'a' OR SalesTable.ItemId = 'b'
How do I solve number 2?
Thank you

There are a few ways to do this.
One way is to select where SalesTable.ItemID IN ('a', 'b'), then GROUP BY the customer and keep the groups HAVING two records. You have to make sure each product is counted only once per customer, otherwise a customer who ordered the same product twice would also match. This method can be good because, done right, it needs only one pass through your table. It looks something like this:
With T As (
SELECT DISTINCT OrderAccount, ItemID
FROM SalesTable
WHERE ItemID IN ('a', 'b')
)
SELECT OrderAccount
FROM T
GROUP BY OrderAccount
HAVING COUNT(ItemID) = 2
Another way is to JOIN SalesTable to itself using a different alias name for each instance of the table, where the join conditions restrict each instance of the table to a different one of the desired products and the same customer. This is more reliable about multiple orders for the same product, but it has to look through the table twice and multiply items in the result set when there are multiple matches on both sides of the JOIN.
SELECT DISTINCT s1.OrderAccount
FROM SalesTable s1
INNER JOIN SalesTable s2 ON s1.OrderAccount = s2.OrderAccount AND s2.ItemID = 'b'
WHERE s1.ItemID = 'a'
Another option is a windowing function: partition by customer, use dense_rank() to number the distinct products within each customer, and look for rank 2. (Partitioning by both customer and product and looking for the second row would instead find customers who ordered the same product twice.) This fits somewhere in between the two: it only goes through the full table once, but must then review the (somewhat smaller) initial results to get to the final answer. However, the query optimizer can often make this perform just as well as the first option.
SELECT DISTINCT OrderAccount
FROM (
SELECT OrderAccount, dense_rank()
OVER (PARTITION BY OrderAccount ORDER BY ItemID) rn
FROM SalesTable
WHERE ItemID IN ('a', 'b')
) T
WHERE rn = 2
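As a rough check of the first two approaches, here is a minimal sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the helper name bought_both are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: alice repeats 'a', bob never buys 'b'.
SAMPLE = [
    ("alice", "a"), ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "a"),
    ("carol", "b"), ("carol", "c"),
    ("dave", "b"), ("dave", "a"),
]

def bought_both(rows, first="a", second="b"):
    """Return the accounts that bought both items, computed two ways."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SalesTable (OrderAccount TEXT, ItemID TEXT)")
    con.executemany("INSERT INTO SalesTable VALUES (?, ?)", rows)

    # Approach 1: collapse repeat purchases, then count products per customer.
    grouped = con.execute(
        """
        WITH T AS (
            SELECT DISTINCT OrderAccount, ItemID
            FROM SalesTable
            WHERE ItemID IN (?, ?)
        )
        SELECT OrderAccount
        FROM T
        GROUP BY OrderAccount
        HAVING COUNT(ItemID) = 2
        ORDER BY OrderAccount
        """,
        (first, second),
    ).fetchall()

    # Approach 2: self-join, one alias per product.
    joined = con.execute(
        """
        SELECT DISTINCT s1.OrderAccount
        FROM SalesTable s1
        INNER JOIN SalesTable s2
            ON s1.OrderAccount = s2.OrderAccount AND s2.ItemID = ?
        WHERE s1.ItemID = ?
        ORDER BY s1.OrderAccount
        """,
        (second, first),
    ).fetchall()

    con.close()
    return [r[0] for r in grouped], [r[0] for r in joined]
```

Calling bought_both(SAMPLE) should return the same two accounts (alice and dave) from both queries, while bob, who ordered 'a' twice but never 'b', is excluded.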

The first query can be written as
SELECT DISTINCT orderaccount
FROM salestable
WHERE itemid IN ('a', 'b');
The DISTINCT is necessary to get each orderaccount only once, no matter if they ordered only item a or only item b or both and whether they ordered them once or many times.
If you want the customers who bought both products, use the same query, but group by customer and count distinct products:
SELECT orderaccount
FROM salestable
WHERE itemid IN ('a', 'b')
GROUP BY orderaccount
HAVING COUNT(DISTINCT itemid) = 2;
As we GROUP BY orderaccount, we don't need to SELECT DISTINCT anymore, because with the GROUP BY clause we already aggregate per orderaccount and get each just once.
If you have an order account table, you can also use IN or EXISTS for the lookup. That way the DBMS can stop reading as soon as it has found a matching purchase. This may not matter here (a customer probably won't buy an item again and again and again), but in other situations (imagine a store selling an item a million times and you merely want to know whether the store sold it at least once or not at all) it can be very beneficial:
SELECT orderaccount
FROM orderaccounts
WHERE orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'a')
AND orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'b');
The same applies to the first query of course, where you would have OR instead of AND. With a large intermediate data set, DISTINCT can be a costly operation.
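To see why COUNT(DISTINCT itemid) matters, the following sketch compares it with a plain COUNT(*) and with the IN lookup, using Python's sqlite3 module. The sample rows and the function name compare_counts are assumptions for illustration, and the orderaccounts table is simply filled from the sales rows.

```python
import sqlite3

# Hypothetical rows: c1 buys 'b' twice, c2 buys 'a' twice and never 'b'.
SALES = [
    ("c1", "a"), ("c1", "b"), ("c1", "b"),
    ("c2", "a"), ("c2", "a"),
    ("c3", "a"), ("c3", "b"),
    ("c4", "b"),
]

def compare_counts(sales):
    """Return the accounts selected by three variants of the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE salestable (orderaccount TEXT, itemid TEXT)")
    con.execute("CREATE TABLE orderaccounts (orderaccount TEXT PRIMARY KEY)")
    con.executemany("INSERT INTO salestable VALUES (?, ?)", sales)
    con.execute(
        "INSERT INTO orderaccounts "
        "SELECT DISTINCT orderaccount FROM salestable"
    )

    def grouped(having):
        # Same query each time; only the HAVING condition differs.
        sql = (
            "SELECT orderaccount FROM salestable "
            "WHERE itemid IN ('a', 'b') "
            "GROUP BY orderaccount HAVING " + having +
            " ORDER BY orderaccount"
        )
        return [r[0] for r in con.execute(sql)]

    in_lookup = [r[0] for r in con.execute(
        """
        SELECT orderaccount
        FROM orderaccounts
        WHERE orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'a')
          AND orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'b')
        ORDER BY orderaccount
        """
    )]

    result = {
        "count_rows": grouped("COUNT(*) = 2"),
        "count_distinct": grouped("COUNT(DISTINCT itemid) = 2"),
        "in_lookup": in_lookup,
    }
    con.close()
    return result
```

With these rows, counting rows wrongly accepts c2 (two purchases of 'a') and misses c1 (three matching rows), whereas COUNT(DISTINCT itemid) and the IN lookup agree on c1 and c3.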

Related

PostgreSQL Query to JOIN and SUM

I have 2 tables:
orders
orderItems
The total of each order (the sum of its product prices) is saved in the total field of the orders table. I need to connect these 2 tables and get the sum of the totals and the count from the values saved in the orders table. An example is below:
SELECT
count(orders.id), sum(orders.total)
FROM
orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
AND orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1','2','3','4')
How do I get the sum and count from single query?
This is a stab in the dark, but based on your updated comments I think I might know what you are dealing with. It seems like you are doing a sum and count on the order header level from the "orders" table, but by joining to the lines table you are getting multiple records, thus getting a seemingly arbitrary multiplication of both aggregates.
If this is the case, where you only want to sum and count the order header if there is one or more lines that meet your criteria (pCode in 1, 2, 3, 4) then what you want is a semi-join, using the exists clause.
SELECT
count(o.id), sum(o.total)
FROM
orders o
where
o.order_no like 'P%' and
exists (
select null
from orderItems i
where
o.order_no = i.order_no and
i.pCode in ('1', '2', '3', '4')
)
Even if multiple lines meet your condition(s), this still sums each header only once. The syntax takes some getting used to, but the construct itself is very useful and efficient. The alternative would be a subquery "in" list, which on PostgreSQL may not run as efficiently for large datasets.
If that's not what you meant, please edit your question with the sample data and what you expect to see for the final output.
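The multiplication effect described above can be reproduced with a small sketch. It uses Python's sqlite3 module rather than PostgreSQL, and the sample orders, the totals, and the function name header_aggregates are all invented for illustration.

```python
import sqlite3

def header_aggregates():
    """Compare the joined aggregate with the EXISTS semi-join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, order_no TEXT, total INTEGER)")
    con.execute("CREATE TABLE orderItems (order_no TEXT, pCode TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, "P1", 100),   # two matching lines
        (2, "P2", 50),    # no matching line
        (3, "P3", 30),    # one matching line
        (4, "X1", 70),    # order_no does not start with P
    ])
    con.executemany("INSERT INTO orderItems VALUES (?, ?)", [
        ("P1", "1"), ("P1", "2"),
        ("P2", "9"),
        ("P3", "3"),
        ("X1", "1"),
    ])

    # Inner join: each header row is repeated once per matching line.
    joined = con.execute(
        """
        SELECT count(orders.id), sum(orders.total)
        FROM orders
        INNER JOIN orderItems ON orderItems.order_no = orders.order_no
        WHERE orders.order_no LIKE 'P%'
          AND orderItems.pCode IN ('1', '2', '3', '4')
        """
    ).fetchone()

    # Semi-join: each header row is counted at most once.
    semi = con.execute(
        """
        SELECT count(o.id), sum(o.total)
        FROM orders o
        WHERE o.order_no LIKE 'P%'
          AND EXISTS (
              SELECT NULL
              FROM orderItems i
              WHERE o.order_no = i.order_no
                AND i.pCode IN ('1', '2', '3', '4')
          )
        """
    ).fetchone()

    con.close()
    return joined, semi
```

Here the joined query reports 3 orders totalling 230 because P1 is counted twice, while the EXISTS version reports 2 orders totalling 130.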
If you want to use aggregates (e.g. SUM, COUNT) across values (e.g. pCode) then you need to use a GROUP BY clause on the non-aggregated columns:
SELECT
orderItems.pCode,
COUNT(orders.id) AS order_count,
SUM(orders.total) AS order_total
FROM orders
INNER JOIN orderItems
ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%'
AND orderItems.pCode IN ('1','2','3','4')
GROUP BY
orderItems.pCode
Note how orderItems.pCode is in both the SELECT clause and the GROUP BY clause. If you wanted to list by orders.order_no as well then you would add that column to both clauses too.

SQL querying a customer ID who ordered both product A and B

I'm having a bit of trouble figuring out how to write a query that returns the customers who ordered both A and B.
What I'm looking for is all customers who ordered both product A and product B.
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING COUNT(distinct product) = 2
I don't normally post code-only answers, but there isn't a lot that words can add to this; the query largely explains itself.
You can also
HAVING max(product) <> min(product)
It may be worth pointing out that in queries, the WHERE is performed, filtering to just products A and B. Then the GROUP BY is performed, grouping customer and counting the distinct number of products (or getting the min and max). Then the HAVING is performed, filtering to just those with 2 distinct products (or getting only those where MIN i.e. A, is different to MAX i.e. B)
If you've never encountered HAVING, it is logically equivalent to:
SELECT CustomerID
FROM(
SELECT CustomerID, COUNT(distinct product) as count_distinct_product
FROM table
WHERE product in ('a','b')
GROUP BY customerid
)z
WHERE
z.count_distinct_product = 2
In a HAVING clause you can only refer to columns that are mentioned in the group by. You can also refer to aggregate operations (such as count/min/max) on other columns not mentioned in the group by
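The equivalence between HAVING and filtering a derived table can be checked with a short sketch using Python's sqlite3 module. Because table is a reserved word, the example assumes a hypothetical table named purchases; the sample rows and the function name both_products are also made up.

```python
import sqlite3

# Hypothetical rows: customers 1 and 4 ordered both products.
ROWS = [
    (1, "a"), (1, "b"), (1, "a"),
    (2, "a"), (2, "a"),
    (3, "b"), (3, "c"),
    (4, "b"), (4, "a"),
]

def both_products(rows):
    """Run the three formulations and return one result list per query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE purchases (CustomerID INTEGER, product TEXT)")
    con.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

    queries = {
        # WHERE filters rows, GROUP BY aggregates, HAVING filters groups.
        "having_count": """
            SELECT CustomerID FROM purchases
            WHERE product IN ('a', 'b')
            GROUP BY CustomerID
            HAVING COUNT(DISTINCT product) = 2
            ORDER BY CustomerID""",
        # After the WHERE only 'a' and 'b' remain, so min <> max means both.
        "having_min_max": """
            SELECT CustomerID FROM purchases
            WHERE product IN ('a', 'b')
            GROUP BY CustomerID
            HAVING MAX(product) <> MIN(product)
            ORDER BY CustomerID""",
        # The same filter written as a WHERE on a derived table.
        "derived_table": """
            SELECT CustomerID
            FROM (
                SELECT CustomerID,
                       COUNT(DISTINCT product) AS count_distinct_product
                FROM purchases
                WHERE product IN ('a', 'b')
                GROUP BY CustomerID
            ) z
            WHERE z.count_distinct_product = 2
            ORDER BY CustomerID""",
    }
    result = {name: [r[0] for r in con.execute(sql)]
              for name, sql in queries.items()}
    con.close()
    return result
```

All three queries should return customers 1 and 4 for the sample rows. Note that the min/max variant only works because the WHERE clause has already reduced the rows to the two products of interest.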
I have never worked with SQLite, but since its specs say it is a relational database, it should allow the following query.
select distinct CustomerID
from table t
where exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'A'
)
and exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'B'
)
I'd use a correlated sub-query with a HAVING clause to scoop in both products in a single WHERE clause.
SELECT
t.Customer
FROM
#t AS t
WHERE
EXISTS
(
SELECT
1
FROM
#t AS s
WHERE
t.Customer = s.Customer
AND s.Product IN ('A', 'B')
HAVING
COUNT(DISTINCT s.Product) = 2
)
GROUP BY
t.Customer;
A condition such as having product like 'A' and product like 'B' does not work, because no single row can match both values. Use having count(distinct product) = 2 instead:
Select customerid from table where product in ('A', 'B') group by customerid having count(distinct product) = 2
The idea is that within the group for one customerid, say 1, however many A's and B's there are, count(distinct product) gives 2 if both were ordered and 1 otherwise.
Another way I just figured out was
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING sum(case when product ='a' then 1 else 0 end) > 0
and sum(case when product ='b' then 1 else 0 end) > 0

SQL: Need to find duplicate rows across multiple columns

I've tried using other solutions I've found, but none have worked. I have a table with four columns, Supplier, ProductCode, Description, and Price. The Supplier field is linked to another table with a list of suppliers. I need to find any records that have the exact same Supplier and ProductCode. Thanks in advance!!
I copied this code from another thread and tried to modify it for my table, but I get errors:
SELECT s.id, t.*
FROM ListPrices AS s
JOIN (SELECT Supplier, ProductCode, count(*) AS qty
FROM ListPrices GROUP BY Supplier, [ProductCode] HAVING count(*) > 1)
AS t ON (s.ProductCode = t.ProductCode) AND (s.Supplier = t.Supplier);
May not be exactly what you are looking for but one way to do this is to group by supplier and productcode and check if the count > 1
select * from (
select supplier,productcode,count(*) as count_rows
from listprices group by supplier,productcode) inner_table
where inner_table.count_rows > 1;
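If the full duplicate rows are needed rather than just the duplicated keys, the grouped result can be joined back to the table. The sketch below tries this with Python's sqlite3 module; the sample rows and the function name find_duplicates are assumptions, and the table has only the four columns named in the question.

```python
import sqlite3

# Hypothetical price list: (S1, P100) appears twice.
PRICES = [
    ("S1", "P100", "Widget", 9.99),
    ("S1", "P100", "Widget v2", 10.49),
    ("S1", "P200", "Gadget", 5.00),
    ("S2", "P100", "Widget", 9.49),
]

def find_duplicates(rows):
    """Return the duplicated keys and the full rows that share them."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ListPrices "
        "(Supplier TEXT, ProductCode TEXT, Description TEXT, Price REAL)"
    )
    con.executemany("INSERT INTO ListPrices VALUES (?, ?, ?, ?)", rows)

    # Step 1: which (Supplier, ProductCode) pairs occur more than once?
    keys = con.execute(
        """
        SELECT Supplier, ProductCode, COUNT(*) AS count_rows
        FROM ListPrices
        GROUP BY Supplier, ProductCode
        HAVING COUNT(*) > 1
        ORDER BY Supplier, ProductCode
        """
    ).fetchall()

    # Step 2: join back to list every row that belongs to such a pair.
    full_rows = con.execute(
        """
        SELECT s.Supplier, s.ProductCode, s.Description, s.Price
        FROM ListPrices AS s
        JOIN (
            SELECT Supplier, ProductCode
            FROM ListPrices
            GROUP BY Supplier, ProductCode
            HAVING COUNT(*) > 1
        ) AS t
          ON s.Supplier = t.Supplier AND s.ProductCode = t.ProductCode
        ORDER BY s.Supplier, s.ProductCode, s.Description
        """
    ).fetchall()

    con.close()
    return keys, full_rows
```

For the sample rows, the first query reports the pair (S1, P100) with a count of 2, and the second returns both of its rows.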

How do I select those records where the group by clause returns 2 or more?

I'd like to return a list of items of only those that have two or more in the group:
select count(item_id) from items group by type_id;
Specifically, I'd like to know the values of item_id when the count(item_id) == 2.
You're asking for something that's not particularly possible without a subquery.
Basically, you want to list all values in a column while aggregating on that same column. You can't do this. Aggregating on a column makes it impossible to list all the individual values from that column.
What you can do is find all type_id values which have an item_id count equal to 2, then select all item_ids from records matching those type_id values:
SELECT item_id
FROM items
WHERE type_id IN (
SELECT type_id
FROM items
GROUP BY type_id
HAVING COUNT(item_id) = 2
)
This is best expressed using a join rather than a WHERE IN clause, but the idea is the same no matter how you approach it. You may also want to select distinct item_ids in which case you'll need the DISTINCT keyword before item_id in the outer query.
If your SQL dialect includes GROUP_CONCAT(), that could be used to generate a list of items without the inner query. However, the results differ; the inner query returns one item id per row, where GROUP_CONCAT() returns multiple ids as a string.
SELECT type_id, GROUP_CONCAT(item_id), COUNT(item_id) as number
FROM items
GROUP BY type_id
HAVING number = 2
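The difference in result shape between the two queries can be seen in a small sketch run through Python's sqlite3 module (SQLite also provides GROUP_CONCAT). The sample items and the function name items_in_pairs are made up for illustration.

```python
import sqlite3

# Hypothetical items: only type 1 has exactly two items.
ITEMS = [
    (10, 1), (11, 1),
    (20, 2),
    (30, 3), (31, 3), (32, 3),
]

def items_in_pairs(rows):
    """Return item ids via the subquery and via GROUP_CONCAT."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
    con.executemany("INSERT INTO items VALUES (?, ?)", rows)

    # Subquery form: one item_id per result row.
    per_row = [r[0] for r in con.execute(
        """
        SELECT item_id
        FROM items
        WHERE type_id IN (
            SELECT type_id
            FROM items
            GROUP BY type_id
            HAVING COUNT(item_id) = 2
        )
        ORDER BY item_id
        """
    )]

    # GROUP_CONCAT form: one row per type_id, ids packed into a string.
    per_group = con.execute(
        """
        SELECT type_id, GROUP_CONCAT(item_id), COUNT(item_id) AS number
        FROM items
        GROUP BY type_id
        HAVING COUNT(item_id) = 2
        """
    ).fetchall()

    con.close()
    return per_row, per_group
```

With the sample data the subquery form yields the two ids 10 and 11 as separate rows, while the GROUP_CONCAT form yields a single row for type 1 whose second column is a comma-separated string of those ids.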
Try this sql query:
select count(item_id) from items group by type_id having count(item_id)=2;

Group related records, but pick certain fields from only the first record

I'm performing an aggregate function on multiple records, which are grouped by a common ID. The problem is, I also want to export some other fields, which might be different within the grouped records, but I want to get those certain fields from one of the records (the first one, according to the query's ORDER BY).
Starting point example:
SELECT
customer_id,
sum(order_total),
referral_code
FROM order
GROUP BY customer_id
ORDER BY date_created
I need to query the referral code, but doing it outside of an aggregate function means I have to group by that field as well, and that's not what I want - I need exactly one row per customer in this example. I really only care about the referral code from the first order, and I'm happy to throw out any later referral codes.
This is in PostgreSQL, but maybe syntax from other DBs could be similar enough to work.
Rejected solutions:
Can't use max() or min() because order is significant.
A subquery might work at first, but does not scale; this is an extremely reduced example. My actual query has dozens of fields like referral_code which I only want the first instance of, and dozens of WHERE clauses which, if duplicated in a subquery, would make for a maintenance nightmare.
Well, it's actually pretty simple.
First, let's write a query that will do the aggregation:
select customer_id, sum(order_total)
from order
group by customer_id
Now, let's write a query that returns the first referral_code and date_created for each customer_id:
select distinct on (customer_id) customer_id, date_created, referral_code
from order
order by customer_id, date_created
Now, you can simply join the 2 selects:
select
x1.customer_id,
x1.sum,
x2.date_created,
x2.referral_code
from
(
select customer_id, sum(order_total)
from order
group by customer_id
) as x1
join
(
select distinct on (customer_id) customer_id, date_created, referral_code
from order
order by customer_id, date_created
) as x2 using ( customer_id )
order by x2.date_created
I didn't test it, so there could be typos in it, but generally it should work.
You will need window functions.
They work somewhat like GROUP BY, but you can still access the individual rows.
I have only used the Oracle equivalent, though.
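A minimal sketch of the window-function idea, run through Python's sqlite3 module, might look like the following. It assumes an SQLite library of version 3.25 or later (the first with window functions); the table name customer_orders, the sample rows, and the function name first_referral_per_customer are hypothetical. The same SUM(...) OVER and FIRST_VALUE(...) OVER syntax is also accepted by PostgreSQL.

```python
import sqlite3

# Hypothetical orders: customer 1 has two orders, the earliest with code 'a'.
ORDERS = [
    (1, 10, "a", "2009-01-01"),
    (2, 15, "b", "2009-01-02"),
    (1, 35, "c", "2009-01-03"),
]

def first_referral_per_customer(rows):
    """Return (customer_id, order_sum, first_referral_code) per customer."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")

    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer_orders "
        "(customer_id INTEGER, order_total INTEGER, "
        "referral_code TEXT, date_created TEXT)"
    )
    con.executemany("INSERT INTO customer_orders VALUES (?, ?, ?, ?)", rows)

    # Each row sees its whole partition: the sum covers all of the
    # customer's orders, FIRST_VALUE picks the earliest referral code.
    # DISTINCT then collapses the identical rows to one per customer.
    result = con.execute(
        """
        SELECT DISTINCT
            customer_id,
            SUM(order_total) OVER (PARTITION BY customer_id) AS order_sum,
            FIRST_VALUE(referral_code) OVER (
                PARTITION BY customer_id ORDER BY date_created
            ) AS first_referral_code
        FROM customer_orders
        ORDER BY customer_id
        """
    ).fetchall()

    con.close()
    return result
```

Further "first order" fields can be added as more FIRST_VALUE columns over the same window, so the WHERE clauses are written only once.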
If the date_created is guaranteed to be unique per customer_id, then you can do this:
[simple table]
create table ordertable (customer_id int, order_total int, referral_code char, date_created datetime)
insert ordertable values (1,10, 'a', '2009-01-01')
insert ordertable values (2,15, 'b', '2009-01-02')
insert ordertable values (1,35, 'c', '2009-01-03')
[replace my lame table names with something better :)]
SELECT
orderAgg.customer_id,
orderAgg.order_sum,
referral.referral_code as first_referral_code
FROM (
SELECT
customer_id,
sum(order_total) as order_sum
FROM ordertable
GROUP BY customer_id
) as orderAgg join (
SELECT
customer_id,
min(date_created) as first_date
FROM ordertable
GROUP BY customer_id
) as dateAgg on orderAgg.customer_id = dateAgg.customer_id
join ordertable as referral
on dateAgg.customer_id = referral.customer_id
and dateAgg.first_date = referral.date_created
Perhaps something like:
SELECT
O1.customer_id,
O1.referral_code,
SQ.total
FROM
Orders O1
LEFT OUTER JOIN Orders O2 ON
O2.customer_id = O1.customer_id AND
O2.date_created < O1.date_created
INNER JOIN (
SELECT
customer_id,
SUM(order_total) AS total
FROM
Orders
GROUP BY
customer_id
) SQ ON SQ.customer_id = O1.customer_id
WHERE
O2.customer_id IS NULL
Would something like this do the trick?
SELECT
customer_id,
sum(order_total),
(SELECT referral_code
FROM order o
WHERE o.customer_id = order.customer_id
ORDER BY date_created
LIMIT 1) AS customers_referral_code
FROM order
GROUP BY customer_id, customers_referral_code
ORDER BY date_created
This doesn't require you to maintain the WHERE clause in two places and maintains the order significance, but would get pretty hairy if you needed "dozens of fields" like referral_code. It's also fairly slow (at least on MySQL).
It sounds to me like referral_code and the dozens of fields like it should be in the customer table, not the order table, since they're logically associated 1:1 with the customer, not the order. Moving them there would make the query MUCH simpler.
This might also do the trick (on PostgreSQL 9.3 or later, which supports LATERAL subqueries that can refer to the outer row):
SELECT
o.customer_id,
sum(o.order_total),
c.referral_code, c.x, c.y, c.z
FROM orders o LEFT JOIN LATERAL (
SELECT referral_code, x, y, z
FROM orders c
WHERE c.customer_id = o.customer_id
ORDER BY c.date_created
LIMIT 1
) AS c ON true
GROUP BY o.customer_id, c.referral_code, c.x, c.y, c.z
In PostgreSQL you can also select the whole first row as a record and pick fields from it:
SELECT customer_id, order_sum,
(first_record).referral_code, (first_record).other_column
FROM (
SELECT customer_id,
SUM(order_total) AS order_sum,
(
SELECT oi
FROM order oi
WHERE oi.customer_id = o.customer_id
ORDER BY oi.date_created
LIMIT 1
) AS first_record
FROM order o
GROUP BY
customer_id
) q