Finding customers that only bought items no one else bought - sql

Below is a list of orders. Is there a way to find the person_id of every customer who has only bought products that no one else has bought?
CREATE TABLE orders
AS
SELECT product_id, person_id
FROM ( VALUES
( 1 , 1 ),
( 2 , 1 ),
( 2 , 2 ),
( 3 , 3 ),
( 12, 6 ),
( 10, 3 )
) AS t(product_id, person_id);
The result would be the following table:
| person_id |
|-----------|
| 3 |
| 6 |
Do I have to find all the people who bought items that someone else also bought, and then build a table that excludes those people?

You want all the products purchased by the person to be unique.
select person_id
from (select t.*,
             min(person_id) over (partition by product_id) as minp,
             max(person_id) over (partition by product_id) as maxp
      from orders t
     ) t
group by person_id
having sum(case when minp <> maxp then 1 else 0 end) = 0;
You are probably thinking "Huh? What does this do?".
The subquery calculates the minimum and maximum person_id for each product. If these are the same, then that one person is the only purchaser.
The having then checks that there are no non-single-purchaser products for a given person.
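The two steps described above can be tried out with a small Python script. This is only a sketch: it assumes Python's bundled SQLite is version 3.25 or newer (needed for window functions) and loads the sample rows from the question into an in-memory table.

```python
import sqlite3

# In-memory database holding the orders from the question.
# Assumption: the bundled SQLite supports window functions (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_id INTEGER, person_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 3), (12, 6), (10, 3)],
)

# Step 1: tag every order with the smallest and largest buyer of its product.
tagged = conn.execute(
    """
    SELECT product_id, person_id,
           MIN(person_id) OVER (PARTITION BY product_id) AS minp,
           MAX(person_id) OVER (PARTITION BY product_id) AS maxp
    FROM orders
    ORDER BY product_id, person_id
    """
).fetchall()

# Step 2: keep the people who have no row where minp and maxp differ.
only_unique = [
    row[0]
    for row in conn.execute(
        """
        SELECT person_id
        FROM (SELECT o.*,
                     MIN(person_id) OVER (PARTITION BY product_id) AS minp,
                     MAX(person_id) OVER (PARTITION BY product_id) AS maxp
              FROM orders o) t
        GROUP BY person_id
        HAVING SUM(CASE WHEN minp <> maxp THEN 1 ELSE 0 END) = 0
        ORDER BY person_id
        """
    )
]

print(tagged)
print(only_unique)
```

Product 2 is the only one where minp (1) and maxp (2) differ, which is what removes persons 1 and 2 from the result.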
Perhaps a more intuitive phrasing of the logic would be:
select person_id
from (select t.*,
             count(distinct person_id) over (partition by product_id) as numpersons
      from orders t
     ) t
group by person_id
having max(numpersons) = 1;
Alas, Postgres doesn't support COUNT(DISTINCT) as a window function.

The traditional self join with boolean aggregation:
select o0.person_id
from
orders o0
left join
orders o1 on o0.product_id = o1.product_id and o0.person_id <> o1.person_id
group by o0.person_id
having bool_and(o1.product_id is null)
;
person_id
-----------
3
6
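bool_and is a Postgres aggregate. As a rough sketch of the same self join on an engine without it, the script below runs the query in SQLite from Python and swaps bool_and for MIN over a 0/1 flag; that substitution is mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_id INTEGER, person_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 3), (12, 6), (10, 3)],
)

rows = conn.execute(
    """
    SELECT o0.person_id
    FROM orders o0
    LEFT JOIN orders o1
           ON o0.product_id = o1.product_id
          AND o0.person_id <> o1.person_id
    GROUP BY o0.person_id
    -- "o1.product_id IS NULL" is 1 when nobody else bought the product.
    -- MIN(...) = 1 stands in for bool_and: every flag must be 1.
    HAVING MIN(o1.product_id IS NULL) = 1
    ORDER BY o0.person_id
    """
).fetchall()

result = [r[0] for r in rows]
print(result)
```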

The inline view finds every product_id that was bought by exactly one person. An inner join back to the orders would return everyone who bought at least one such product (including person 1), so the view is left-joined instead and a person is kept only when every one of their orders found a match.
SELECT c1.person_id
FROM orders c1
LEFT JOIN
(
    SELECT product_id
    FROM orders
    GROUP BY product_id
    HAVING COUNT(DISTINCT person_id) = 1
) c2
ON c1.product_id = c2.product_id
GROUP BY c1.person_id
HAVING COUNT(*) = COUNT(c2.product_id);

This is Gordon's logic using aggregates only:
SELECT person_id
FROM
(
    SELECT product_id,
           -- if count = 1 it's the only customer who bought this product
           min(person_id) as person_id,
           -- if the combination (person_id, product_id) is unique, DISTINCT can be removed
           count(distinct person_id) as cnt
    FROM orders
    GROUP BY product_id
) AS dt
GROUP BY person_id
-- only unique products; note that a shared product is counted only against its smallest person_id
HAVING max(cnt) = 1;

Here is another solution:
with unique_products as
(select product_id
from orders
group by product_id
having count(*) = 1)
select person_id
from orders
except
select person_id
from orders
where not exists
(select * from unique_products where unique_products.product_id = orders.product_id)
First, the identifiers of all products that appear in exactly one order are found. Then, from all the persons in orders, we subtract those who have an order for a product that is not in that set (i.e. everyone who has bought at least one product that somebody else also bought).
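To make the subtraction concrete, here is a small sketch that runs each piece separately in SQLite from Python, using the sample rows from the question. The intermediate variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_id INTEGER, person_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 3), (12, 6), (10, 3)],
)

UNIQUE_PRODUCTS = """
    SELECT product_id FROM orders GROUP BY product_id HAVING COUNT(*) = 1
"""

# Products that appear in exactly one order.
unique_products = sorted(r[0] for r in conn.execute(UNIQUE_PRODUCTS))

# People with at least one order for a product outside that set.
shared_buyers = sorted(
    r[0]
    for r in conn.execute(
        f"""
        SELECT DISTINCT person_id
        FROM orders
        WHERE NOT EXISTS (SELECT 1 FROM ({UNIQUE_PRODUCTS}) u
                          WHERE u.product_id = orders.product_id)
        """
    )
)

# The full query: everyone, minus the shared buyers.
result = sorted(
    r[0]
    for r in conn.execute(
        f"""
        WITH unique_products AS ({UNIQUE_PRODUCTS})
        SELECT person_id FROM orders
        EXCEPT
        SELECT person_id FROM orders
        WHERE NOT EXISTS (SELECT 1 FROM unique_products
                          WHERE unique_products.product_id = orders.product_id)
        """
    )
)

print(unique_products, shared_buyers, result)
```

Note that COUNT(*) = 1 treats a product as unique only if it appears in a single order row, so this assumes nobody ordered the same product twice.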

Related

sql - select all rows that have all multiple same cols

I have a table with 4 columns.
date
store_id
product_id
label_id
and I need to find all store_ids that have every product_id with the same label_id (for example 4) on one day.
For example:
| store_id | label_id | product_id | date |
|----------|----------|------------|------|
| 4 | 4 | 5 | 9/2 |
| 5 | 4 | 7 | 9/2 |
| 4 | 3 | 12 | 9/2 |
| 4 | 4 | 7 | 9/2 |
So it should return 4, because it is the only store that has all the products with label 4 on one day.
I have tried something like this:
(select store_id, date
from table
where label_id = 4
group by store_id, date
order by date)
I don't know how to write the outer query. I tried:
select * from table
where product_id = all(Inner query)
but it didn't work.
Thanks
It is unclear from your question whether the labels are specific to a given day or through the entire period. But a variation of Tim's answer seems appropriate. For any label:
SELECT t.date, t.label_id, t.store_id
FROM t
GROUP BY t.date, t.label_id, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
                                       FROM t t2
                                       WHERE t2.label_id = t.label_id
                                      );
For a particular label:
SELECT t.date, t.store_id
FROM t
WHERE t.label_id = 4
GROUP BY t.date, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
                                       FROM t t2
                                       WHERE t2.label_id = 4
                                      );
If the labels are specific to the date, then you need that comparison in the outer queries as well.
Here is one way:
SELECT date, store_id
FROM yourTable t1
GROUP BY date, store_id
HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(DISTINCT product_id)
                                     FROM yourTable t2
                                     WHERE t2.date = t1.date)
ORDER BY date, store_id;
This query reads in a pretty straightforward way: it finds every store, on some date, whose distinct product count is the same as the distinct product count on that day across all stores.
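As a quick check of the distinct-count comparison, the sketch below loads the four sample rows from the question into SQLite and runs the label-4 variant with the date comparison included. The table name sales is hypothetical, since the question never names the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical table name; the columns follow the question.
conn.execute(
    "CREATE TABLE sales (store_id INTEGER, label_id INTEGER, "
    "product_id INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(4, 4, 5, "9/2"), (5, 4, 7, "9/2"), (4, 3, 12, "9/2"), (4, 4, 7, "9/2")],
)

result = conn.execute(
    """
    SELECT t.date, t.store_id
    FROM sales t
    WHERE t.label_id = 4
    GROUP BY t.date, t.store_id
    -- a store qualifies when it carries as many distinct label-4 products
    -- as exist on that date across all stores
    HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
                                           FROM sales t2
                                           WHERE t2.label_id = 4
                                             AND t2.date = t.date)
    ORDER BY t.date, t.store_id
    """
).fetchall()

print(result)
```

On 9/2 there are two distinct label-4 products (5 and 7). Store 4 has both, store 5 only has product 7, so only store 4 is returned.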
I'd probably aggregate to lists of products in a string or array:
with products_per_day_and_store as
(
  select
    store_id,
    date,
    -- PostgreSQL syntax: string_agg needs a text argument and a delimiter
    string_agg(distinct product_id::text, ',' order by product_id::text) as products
  from mytable
  where label_id = 4
  group by store_id, date
)
, products_per_day as
(
  select
    date,
    string_agg(distinct product_id::text, ',' order by product_id::text) as products
  from mytable
  where label_id = 4
  group by date
)
select distinct ppdas.store_id
from products_per_day_and_store ppdas
join products_per_day ppd using (date, products);

SQL Select Group By Min() - but select other

I want to select the ID of the row with the lowest Price for each Product in the table Products.
| ID | Product | Price |
|----|---------|-------|
| 1 | 123 | 10 |
| 2 | 123 | 11 |
| 3 | 234 | 20 |
| 4 | 234 | 21 |
Which by logic would look like this:
SELECT
ID,
Min(Price)
FROM
Products
GROUP BY
Product
But I don't want to select the Price itself, just the ID.
Resulting in
1
3
EDIT: The DBMSes used are Firebird and Filemaker
You didn't specify your DBMS, so this is ANSI standard SQL:
select id
from (
  select id,
         row_number() over (partition by product order by price) as rn
  from Products
) t
where rn = 1
order by id;
If your DBMS doesn't support window functions, you can do that with joining against a derived table:
select o.id
from Products o
  join (
    select product,
           min(price) as min_price
    from Products
    group by product
  ) t on t.product = o.product and t.min_price = o.price;
Note that this will return a slightly different result than the first solution: if the minimum price for a product occurs more than once, all those IDs will be returned. The first solution will only return one of them. If you don't want that, you need to group again in the outer query:
select min(o.id)
from Products o
  join (
    select product,
           min(price) as min_price
    from Products
    group by product
  ) t on t.product = o.product and t.min_price = o.price
group by o.product;
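The difference between the variants only shows up when prices tie. The sketch below runs all three in SQLite from Python on the sample data plus one hypothetical extra row (ID 5) that ties with ID 1. It assumes SQLite 3.25+ for row_number(), and it adds ID as a tie-breaker in the window ordering so that the first variant is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER, Product INTEGER, Price INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, 123, 10),
        (2, 123, 11),
        (3, 234, 20),
        (4, 234, 21),
        (5, 123, 10),  # hypothetical row, not in the question: ties with ID 1
    ],
)


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Window function: exactly one row per product.
with_row_number = ids(
    """
    SELECT ID
    FROM (SELECT ID,
                 ROW_NUMBER() OVER (PARTITION BY Product
                                    ORDER BY Price, ID) AS rn
          FROM Products) t
    WHERE rn = 1
    ORDER BY ID
    """
)

# Join against the per-product minimum: every tied row comes back.
with_join = ids(
    """
    SELECT p.ID
    FROM Products p
    JOIN (SELECT Product, MIN(Price) AS min_price
          FROM Products
          GROUP BY Product) t
      ON t.Product = p.Product AND t.min_price = p.Price
    ORDER BY p.ID
    """
)

# Grouping again collapses the ties to the smallest ID.
with_regroup = ids(
    """
    SELECT MIN(p.ID) AS min_id
    FROM Products p
    JOIN (SELECT Product, MIN(Price) AS min_price
          FROM Products
          GROUP BY Product) t
      ON t.Product = p.Product AND t.min_price = p.Price
    GROUP BY p.Product
    ORDER BY min_id
    """
)

print(with_row_number, with_join, with_regroup)
```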
SELECT ID
FROM Products as A
where price = ( select Min(Price)
from Products as B
where B.Product = A.Product )
GROUP BY id
This returns the ID of the cheapest row for each product, which for the sample data is 1 and 3.

Return only unique rows from a Table

I have a table with 4 columns and 7 rows.
This table contains 1 customer with the same ID same LNAME and FNAME.
Also the table has 2 customers with the same ID, but different LNAME or FNAME.
That is the sales reps input error. Ideally my table should have only 2 rows (Row with ID_pk 3 and 7)
I need to have the following result-sets from the above table:
1. All unique rows by all the four columns (Row with ID_pk 3 and 7), excluding case 3 listed below.
2. All duplicates by all the four columns (Row with ID_pk 3 and 8).
3. All duplicates by Customer_ID but with not matching LNAME and/or FNAME (Row with ID_pk 1, 2, 4 and 5). These rows have to be sent back to sales reps for validation.
Doing stuff like this relies heavily on nested queries, the GROUP BY clause, and the COUNT function.
Part 1 - Unique rows
This query will show you all the rows where the customer ID has matching data.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers WHERE Customer_ID IN (
SELECT Customer_ID FROM (
SELECT DISTINCT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
) Customers
GROUP BY Customer_ID
HAVING COUNT(Customer_ID) = 1
)
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
Part 2 - Duplicates
This query will show you all the rows that have the same data entered more than once.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME
FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
HAVING COUNT(Customer_ID) > 1
Part 3 - Mismatched Data
This query is basically the same as the first, just looking for a different COUNT value.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers WHERE Customer_ID IN (
SELECT Customer_ID FROM (
SELECT DISTINCT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
) Customers
GROUP BY Customer_ID
HAVING COUNT(Customer_ID) > 1
)
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
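Since the question's table is not reproduced here, the following sketch uses a few hypothetical rows to show how Part 2 and Part 3 separate exact duplicates from mismatched names. It runs in SQLite from Python, so the dbo schema prefix is dropped, and the Part 3 query returns ID_pk as well so the offending rows can be identified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (ID_pk INTEGER PRIMARY KEY, Customer_ID INTEGER, "
    "Customer_FNAME TEXT, Customer_LNAME TEXT)"
)
# Hypothetical data: customer 100 is entered with two spellings,
# customer 200 is an exact duplicate, customer 300 appears once.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, 100, "Ann", "Lee"),
        (2, 100, "Anne", "Lee"),
        (3, 200, "Bob", "Ray"),
        (4, 200, "Bob", "Ray"),
        (5, 300, "Cy", "Fox"),
    ],
)

# Part 2: the same (ID, first name, last name) entered more than once.
duplicates = conn.execute(
    """
    SELECT Customer_ID, Customer_FNAME, Customer_LNAME
    FROM customers
    GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
    HAVING COUNT(*) > 1
    """
).fetchall()

# Part 3: Customer_IDs that map to more than one distinct name pair.
mismatched = conn.execute(
    """
    SELECT ID_pk, Customer_ID, Customer_FNAME, Customer_LNAME
    FROM customers
    WHERE Customer_ID IN (
        SELECT Customer_ID
        FROM (SELECT DISTINCT Customer_ID, Customer_FNAME, Customer_LNAME
              FROM customers) d
        GROUP BY Customer_ID
        HAVING COUNT(*) > 1
    )
    ORDER BY ID_pk
    """
).fetchall()

print(duplicates)
print(mismatched)
```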
You may use a CTE (Common Table expression): https://msdn.microsoft.com/en-us/library/ms175972.aspx
;WITH checkDup AS (
SELECT Customer_ID, ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Customer_ID) AS 'RN'
FROM Table)
SELECT Customer_ID FROM checkDup
WHERE RN = 1;
Will give you your example output.
You may manipulate the CTE to get the other results you seek.

ORACLE SQL Return only duplicated values (not the original)

I have a database with the following info
Customer_id, plan_id, plan_start_dte,
Since some customers switch plans, some customer_ids appear several times with different plan_start_dte values. I'm trying to count how many times per day members switch to the premium plan from any other plan (plan_id = 'premium').
That is, I'm trying to do roughly this: return all rows with duplicate customer_id, except for the original plan (min(plan_start_dte)), where plan_id = 'premium', and group them by plan_start_dte.
I'm able to get all duplicate records with their count:
with plan_counts as (
select c.*, count(*) over (partition by CUSTOMER_ID) ct
from CUSTOMERS c
)
select *
from plan_counts
where ct > 1
The other steps have me stuck. First I tried to select everything except the original plan:
SELECT CUSTOMERS c
where START_DTE not in (
select min(PLAN_START_DTE)
from CUSTOMERS i
where c.CUSTOMER_ID = i.CUSTOMER_ID
)
But this failed. If I can solve this I believe all I have to add is an additional condition where c.PLAN_ID = 'premium' and then group by date and do a count. Anyone have any ideas?
I think you want lag():
select c.*
from (select c.*,
             lag(plan_id) over (partition by customer_id order by plan_start_dte) as prev_plan_id
      from customers c
     ) c
where prev_plan_id <> 'premium' and plan_id = 'premium';
I'm not sure what output you want. For the number of times this occurs per day:
select plan_start_dte, count(*)
from (select c.*, lag(plan_id) over (partition by customer_id order by plan_start_dte) as prev_plan_id
      from customers c
     ) c
where prev_plan_id <> 'premium' and plan_id = 'premium'
group by plan_start_dte
order by plan_start_dte;
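A small sketch of the lag() approach, run in SQLite from Python. The plan history rows are hypothetical, the column is named plan_start_dte as in the question, and SQLite 3.25+ is assumed for window functions. Customer 2 starts on premium, so their row has no previous plan and is not counted as a switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, plan_id TEXT, plan_start_dte TEXT)"
)
# Hypothetical plan history, one row per plan a customer has been on.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "basic", "2024-01-01"),
        (1, "premium", "2024-02-01"),
        (2, "premium", "2024-01-15"),  # original plan is premium: not a switch
        (3, "basic", "2024-01-10"),
        (3, "premium", "2024-02-01"),
        (4, "standard", "2024-01-05"),
        (4, "premium", "2024-03-01"),
    ],
)

switches = conn.execute(
    """
    SELECT plan_start_dte, COUNT(*) AS switches
    FROM (SELECT c.*,
                 LAG(plan_id) OVER (PARTITION BY customer_id
                                    ORDER BY plan_start_dte) AS prev_plan_id
          FROM customers c) c
    -- prev_plan_id is NULL on a customer's first plan, so the comparison
    -- is not true there and the original plan is skipped
    WHERE prev_plan_id <> 'premium' AND plan_id = 'premium'
    GROUP BY plan_start_dte
    ORDER BY plan_start_dte
    """
).fetchall()

print(switches)
```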

T-Sql find duplicate row values

I want to write a stored procedure.
In that stored procedure, I want to find rows with duplicate values in a table, sum the quantities of those rows, and write the result back to the same table.
Let's say, I have a CustomerSales table;
| ID | SalesRepresentative | Customer | Quantity |
|----|---------------------|----------|----------|
| 1 | Michael | CustA | 55 |
| 2 | Michael | CustA | 10 |
and I need to turn table to...
| ID | SalesRepresentative | Customer | Quantity |
|----|---------------------|----------|----------|
| 1 | Michael | CustA | 65 |
| 2 | Michael | CustA | 0 |
When two or more rows share the same SalesRepresentative and Customer, I want to sum the Quantity values of these rows and assign the total to the first of them, and set the others to 0.
Could you help me?
To aggregate duplicates into one row:
SELECT min(ID) AS ID, SalesRepresentative, Customer
,sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
ORDER BY min(ID)
Or, if you actually want those extra rows with 0 as Quantity in the result:
SELECT ID, SalesRepresentative, Customer
,CASE
WHEN (count(*) OVER (PARTITION BY SalesRepresentative,Customer)) = 1
THEN Quantity
WHEN (row_number() OVER (PARTITION BY SalesRepresentative,Customer
ORDER BY ID)) = 1
THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative,Customer)
ELSE 0
END AS Quantity
FROM CustomerSales
ORDER BY ID
This makes heavy use of window functions.
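To see what the three window functions contribute, the query can be run in SQLite from Python as a sketch (SQLite 3.25+ assumed). The data is the CustomerSales sample from the question plus one hypothetical non-duplicated row (ID 3) to exercise the first CASE branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerSales (ID INTEGER, SalesRepresentative TEXT, "
    "Customer TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerSales VALUES (?, ?, ?, ?)",
    [
        (1, "Michael", "CustA", 55),
        (2, "Michael", "CustA", 10),
        (3, "Sarah", "CustB", 7),  # hypothetical row with no duplicate
    ],
)

rows = conn.execute(
    """
    SELECT ID,
           CASE
             -- a group of one keeps its own quantity
             WHEN COUNT(*) OVER (PARTITION BY SalesRepresentative, Customer) = 1
               THEN Quantity
             -- the first row of a duplicate group receives the group total
             WHEN ROW_NUMBER() OVER (PARTITION BY SalesRepresentative, Customer
                                     ORDER BY ID) = 1
               THEN SUM(Quantity) OVER (PARTITION BY SalesRepresentative, Customer)
             -- every later duplicate is zeroed
             ELSE 0
           END AS Quantity
    FROM CustomerSales
    ORDER BY ID
    """
).fetchall()

print(rows)
```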
Alternative version without window functions:
SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
UNION ALL
SELECT c.ID, c.SalesRepresentative, c.Customer, 0 AS Quantity
FROM CustomerSales c
LEFT JOIN (
    SELECT min(ID) AS ID
    FROM CustomerSales
    GROUP BY SalesRepresentative, Customer
) x ON (x.ID = c.ID)
WHERE x.ID IS NULL
ORDER BY ID
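A sketch of the UNION ALL approach, run in SQLite from Python. The second branch is written as a plain anti-join, with no GROUP BY and with the columns qualified by the alias c, and the sample data gets one hypothetical extra row (ID 3) that has no duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerSales (ID INTEGER, SalesRepresentative TEXT, "
    "Customer TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerSales VALUES (?, ?, ?, ?)",
    [
        (1, "Michael", "CustA", 55),
        (2, "Michael", "CustA", 10),
        (3, "Sarah", "CustB", 7),  # hypothetical row with no duplicate
    ],
)

rows = conn.execute(
    """
    -- Branch 1: one row per group, carrying the summed quantity.
    SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
    FROM CustomerSales
    GROUP BY SalesRepresentative, Customer
    UNION ALL
    -- Branch 2: every row that is not the first of its group, with quantity 0.
    SELECT c.ID, c.SalesRepresentative, c.Customer, 0 AS Quantity
    FROM CustomerSales c
    LEFT JOIN (
        SELECT min(ID) AS ID
        FROM CustomerSales
        GROUP BY SalesRepresentative, Customer
    ) x ON x.ID = c.ID
    WHERE x.ID IS NULL
    ORDER BY ID
    """
).fetchall()

print(rows)
```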