Select all customers loyal to one company?

I've got tables:
TABLE | COLUMNS
----------+----------------------------------
CUSTOMER | C_ID, C_NAME, C_ADDRESS
SHOP | S_ID, S_NAME, S_ADDRESS, S_COMPANY
ORDER | S_ID, C_ID, O_DATE
I want to select the ids of all customers who have ordered only from shops of one company, e.g. 'Samsung' ('LG', 'HP', ... it doesn't really matter, it's dynamic).
I've come up with only one solution, but I consider it ugly:
( SELECT DISTINCT c_id FROM order JOIN shop USING(s_id) WHERE s_company = 'Samsung' )
EXCEPT
( SELECT DISTINCT c_id FROM order JOIN shop USING(s_id) WHERE s_company != 'Samsung' );
Both are the same SQL query, just with the operator reversed. Isn't there an aggregate method that solves this better?
I mean, there could be millions of orders (I don't really have orders; I've got something that occurs more often).
Is it efficient to select thousands of orders and then compare them to hundreds of thousands of orders from other companies? I know that it compares sorted sets, so it's O( m + n + sort(n) + sort(m) ). But that's still a lot for millions of records, isn't it?
And one more question: how can I select all the customer values (name, address)? Can I just join them like this:
SELECT CUSTOMER.* FROM CUSTOMER JOIN ( (SELECT...) EXCEPT (SELECT...) ) USING (C_ID);
Disclaimer: This question isn't homework. It's preparation for an exam, plus a desire to do things more efficiently. My solution would be accepted at the exam, but I like efficient programming.

I like to approach this type of question using group by and a having clause. You can get the list of customers using:
select o.c_id
from orders o join
shops s
on o.s_id = s.s_id
group by o.c_id
having min(s.s_company) = max(s.s_company);
If you care about the particular company, then:
having min(s.s_company) = max(s.s_company) and
max(s.s_company) = 'Samsung'
If you want full customer information, you can join the customers table back in.
Whether this works better than the except version is something that would have to be tested on your system.
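As a quick way to try the min/max idea, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the reduced column set, and the helper name loyal_customers are assumptions made for illustration; the table names orders and shops follow the query above.

```python
import sqlite3

# Hypothetical sample data, reduced to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops  (s_id INTEGER PRIMARY KEY, s_company TEXT);
    CREATE TABLE orders (s_id INTEGER, c_id INTEGER);
    INSERT INTO shops  VALUES (1, 'Samsung'), (2, 'Samsung'), (3, 'LG');
    INSERT INTO orders VALUES
        (1, 10), (2, 10),   -- customer 10: Samsung shops only
        (1, 20), (3, 20),   -- customer 20: Samsung and LG
        (3, 30);            -- customer 30: LG only
""")

def loyal_customers(conn, company):
    """Return ids of customers whose orders all come from one given company."""
    query = """
        SELECT o.c_id
        FROM orders o
        JOIN shops s ON s.s_id = o.s_id
        GROUP BY o.c_id
        HAVING MIN(s.s_company) = MAX(s.s_company)
           AND MAX(s.s_company) = ?
        ORDER BY o.c_id
    """
    return [row[0] for row in conn.execute(query, (company,))]

samsung_only = loyal_customers(conn, "Samsung")
lg_only = loyal_customers(conn, "LG")
print(samsung_only, lg_only)
```

Customer 20 drops out because its minimum and maximum company differ, which is exactly what the having clause tests.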

How about a query that uses no aggregate functions like Min and Max?
select o.C_ID, s.S_COMPANY
from "order" o
join shop s
on s.S_ID = o.S_ID
group by o.C_ID, s.S_COMPANY;
Now we have a distinct list of customers and all the companies they shopped at. The loyal customers will be the ones who only appear once in the list.
select C_ID
from Q1
group by C_ID
having count(*) = 1;
Join back to the first query to get the company:
with
Q1 as(
select o.C_ID, s.S_COMPANY
from "order" o
join shop s
on s.S_ID = o.S_ID
group by o.C_ID, s.S_COMPANY
),
Q2 as(
select C_ID
from Q1
group by C_ID
having count(*) = 1
)
select Q1.C_ID, Q1.S_COMPANY
from Q1
join Q2
on Q2.C_ID = Q1.C_ID;
Now you have a list of loyal customers and the one company each is loyal to.
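The two-step idea (deduplicate customer/company pairs, then keep customers that have a single pair) can be sketched with Python's sqlite3 module as follows. This assumes the pairs come from joining the order table to shop on S_ID; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; "order" is quoted because ORDER is a reserved word.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (s_id INTEGER PRIMARY KEY, s_company TEXT);
    CREATE TABLE "order" (s_id INTEGER, c_id INTEGER);
    INSERT INTO shop VALUES (1, 'Samsung'), (2, 'Samsung'), (3, 'LG');
    INSERT INTO "order" VALUES (1, 10), (2, 10), (1, 20), (3, 20), (3, 30);
""")

query = """
    WITH q1 AS (            -- one row per (customer, company) pair
        SELECT o.c_id, s.s_company
        FROM "order" o
        JOIN shop s ON s.s_id = o.s_id
        GROUP BY o.c_id, s.s_company
    ),
    q2 AS (                 -- customers that appear with exactly one company
        SELECT c_id
        FROM q1
        GROUP BY c_id
        HAVING COUNT(*) = 1
    )
    SELECT q1.c_id, q1.s_company
    FROM q1
    JOIN q2 ON q2.c_id = q1.c_id
    ORDER BY q1.c_id
"""
loyal = conn.execute(query).fetchall()
print(loyal)
```

Customer 10 ordered from two different Samsung shops but still counts as loyal, because the pairs are grouped by company rather than by shop.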

Related

Find a value from table that is only unique to one id

I have a table "stats" that consists of 3 ids.
IDs: id_seller, id_part and id_proj
From this table, I want to find the projects (id_proj) that buy parts (id_part) which are available from only one seller.
In other words: find each id_proj that buys an id_part available from only one seller (for example, seller S5 is the only seller that sells P2).
So, in this example, P2 is the only such part, and it is sold only by seller S5.
The return should be: J2, J4
I've tried something like this:
SELECT DISTINCT s.id_proj
FROM stats s
WHERE NOT id_part IN (
SELECT s2.id_part
FROM stats s2
WHERE s2.id_seller = 'S5');
Use GROUP BY with HAVING. Group by id_part and keep only the parts that have exactly one seller.
Then join the result back to the main table to get the information you need.
SELECT s2.*
FROM
(SELECT id_part,max(id_seller) AS id_seller
FROM stats
GROUP BY id_part
HAVING COUNT(DISTINCT id_seller) = 1) s1
JOIN stats s2 ON s1.id_part = s2.id_part AND s1.id_seller = s2.id_seller
select distinct id_proj from stats
where id_part in
(SELECT s.id_part
FROM stats s
WHERE s.id_seller = 'S5'
and id_part not in (select id_part from stats where id_seller <> 'S5') -- keeps only parts sold exclusively by seller S5
)
If this is homework, though, you should really understand the concepts of IN and NOT IN, as in this thread, as well as EXISTS and NOT EXISTS.
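To check the single-seller logic without hard-coding S5, here is a small sketch using Python's sqlite3 module. The rows are invented so that P2 is sold only by S5 and is bought by J2 and J4, matching the expected result stated in the question; the original sample data is not shown there.

```python
import sqlite3

# Hypothetical rows: P1 and P3 have several sellers, P2 only has S5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (id_seller TEXT, id_part TEXT, id_proj TEXT);
    INSERT INTO stats VALUES
        ('S1', 'P1', 'J1'), ('S2', 'P1', 'J3'), ('S5', 'P1', 'J1'),
        ('S5', 'P2', 'J2'), ('S5', 'P2', 'J4'),
        ('S3', 'P3', 'J1'), ('S4', 'P3', 'J2');
""")

query = """
    SELECT DISTINCT id_proj
    FROM stats
    WHERE id_part IN (
        SELECT id_part
        FROM stats
        GROUP BY id_part
        HAVING COUNT(DISTINCT id_seller) = 1   -- parts with exactly one seller
    )
    ORDER BY id_proj
"""
projects = [row[0] for row in conn.execute(query)]
print(projects)
```

COUNT(DISTINCT id_seller) is used instead of COUNT(*) because the same seller can appear on several rows for one part.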

Postgres Question: Aren't both a and b correct?

For questions below, use the following schema definition.
restaurant(rid, name, phone, street, city, state, zip)
customer(cid, fname, lname, phone, street, city, state, zip)
carrier(crid, fname, lname, lp)
delivery(did, rid, cid, tim, size, weight)
pickup(did, tim, crid)
dropoff(did, tim, crid)
It's a schema for a food delivery business that employs food carriers (carrier table).
Customers (customer table) order food from restaurants (restaurant table).
The restaurants order a delivery (delivery table); to deliver food from restaurant to customer.
The pickup table records when carrier picks up food at restaurant.
The dropoff table records when carrier drops off food at customer.
1. Find customers who have less than 5 deliveries.
a. select cid,count(*)
from delivery
group by cid
having count(*) < 5;
b. select a.cid,count(*)
from customer a
inner join delivery b
using(cid)
group by a.cid
having count(*) < 5;
c. select a.cid,count(*)
from customer a
left outer join delivery b
on a.cid=b.cid
group by a.cid
having count(*) < 5;
d. select cid,sum(case when b.cid is not null then 1 else 0 end)
from customer a
left outer join delivery b
using (cid)
group by cid
having sum(case when b.cid is not null then 1 else 0 end) < 5;
e. (write your own answer)
No, they are not correct. They miss customers who have had no deliveries.
The last is the best of a bunch of not so good queries. A better version would be:
select c.cid, count(d.cid)
from customer c left outer join
delivery d
on c.cid = d.cid
group by c.cid
having count(d.cid) < 5;
The sum(case) is overkill. And Postgres even offers a better solution than that!
count(*) filter (where d.cid is not null)
But count(d.cid) is still more concise.
Also note the use of meaningful table aliases. Don't get into the habit of using arbitrary letters for tables. That just makes queries hard to understand.
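The difference between the variants is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module rather than Postgres, with made-up rows and only the columns the queries touch: customer 1 has five deliveries, customer 2 has two, and customer 3 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY);
    CREATE TABLE delivery (did INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO delivery (did, cid) VALUES
        (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
        (6, 2), (7, 2);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Option a: no customer table at all, so customer 3 never shows up.
option_a = run("""
    SELECT cid, COUNT(*) FROM delivery
    GROUP BY cid HAVING COUNT(*) < 5 ORDER BY cid
""")

# Option c: the left join keeps customer 3, but COUNT(*) counts the
# NULL-extended row, so the reported number of deliveries is 1, not 0.
option_c = run("""
    SELECT c.cid, COUNT(*) FROM customer c
    LEFT JOIN delivery d ON c.cid = d.cid
    GROUP BY c.cid HAVING COUNT(*) < 5 ORDER BY c.cid
""")

# Suggested version: COUNT(d.cid) ignores NULLs, so customer 3 gets 0.
improved = run("""
    SELECT c.cid, COUNT(d.cid) FROM customer c
    LEFT JOIN delivery d ON c.cid = d.cid
    GROUP BY c.cid HAVING COUNT(d.cid) < 5 ORDER BY c.cid
""")

print(option_a, option_c, improved)
```

On this data only the last query both includes the customer without deliveries and reports a count of zero for them.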

Making simple SQL more efficient

SQL Fiddle.
I'm having a slow start to the morning. I thought there was a more efficient way to make the following query using a join, instead of two independent selects -- am I wrong?
Keep in mind that I've simplified/reduced my query into this example for SO purposes, so let me know if you have any questions as well.
SELECT DISTINCT c.*
FROM customers c
WHERE c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
AND c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;
Sample Table Schemas
-- Simple tables to demonstrate point
CREATE TABLE customers (
customer_id serial,
name text
);
CREATE TABLE customers_cars (
customer_id integer,
car_make text
);
-- Populate tables
INSERT INTO customers(name) VALUES
('Joe Dirt'),
('Penny Price'),
('Wooten Nagen'),
('Captain Planet')
;
INSERT INTO customers_cars(customer_id,car_make) VALUES
(1,'BMW'),
(1,'Merc'),
(1,'Ford'),
(2,'BMW'),
(2,'BMW'), -- Notice car_make is not unique
(2,'Ferrari'),
(2,'Porche'),
(3,'BMW'),
(3,'Ford');
-- ids 1 and 3 both have BMW and Ford
Other Expectations
There are ~20 car_make in the database
There are typically 1-3 car_make per customer_id
There is expected to be not more than 50 car_make assignments per customer_id (generally 20-30)
The query is generally only going to look for 2-3 specific car_make per customer (e.g., BMW and Ford), but not 10-20
And here another option, don't know what the fastest one would be on large tables.
SELECT customers.*
FROM customers
JOIN customers_cars USING(customer_id)
WHERE car_make = ANY(ARRAY['BMW','Ford'])
GROUP BY
customer_id, name
HAVING array_agg(car_make) @> ARRAY['BMW','Ford'];
vol7ron:
Fiddle
The following is a modification of the above, using the same idea of an array for comparison. I'm not sure how much more efficient it would be than the dual-query approach, since it has to build an array in one pass and then do a more heavy-handed comparison of the array elements.
SELECT DISTINCT c.*
FROM customers c
WHERE customer_id IN (
select customer_id
from customers_cars
group by customer_id
having array_agg(car_make) @> ARRAY['BMW','Ford']
);
I would write it as
SELECT DISTINCT c.customer_id
FROM customers c
JOIN customers_cars cc_f on c.customer_id = cc_f.customer_id and cc_f.car_make = 'Ford'
JOIN customers_cars cc_b on c.customer_id = cc_b.customer_id and cc_b.car_make = 'BMW'
;
Whether this is better or not, I don't know. In some RDBMSs plain joins like this work better than subqueries, but I don't know about Postgres. From a readability point of view it is also questionable.
It seems to me that you are trying to find customers that have at least 1 BMW and at least 1 Ford car.
This query should get that for you:
SELECT
customers.customer_id
FROM
customers
INNER JOIN customers_cars ON
customers.customer_id = customers_cars.customer_id
AND customers_cars.car_make IN ('BMW', 'Ford')
GROUP BY
customers.customer_id
HAVING
COUNT(CASE WHEN car_make = 'BMW' THEN 1 ELSE NULL END) > 0
AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 ELSE NULL END) > 0
Make sure you have indexes on customers_cars.customer_id and customers_cars.car_make to achieve maximum performance.
You don't need to join to customers at all (given relational integrity).
Generally, this is a case of relational division. We assembled an arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
Unique combinations
If (customer_id, car_make) was defined unique in customers_cars, it would get much simpler:
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1
HAVING count(*) = 2;
Combinations not unique
Since (customer_id, car_make) is not unique, we need an extra step.
For only a few cars, your original query is not that bad. But (especially with duplicates!) EXISTS is typically faster than IN, and we don't need the final DISTINCT:
SELECT customer_id -- no DISTINCT needed.
FROM customers c
WHERE EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
Above query gets verbose and less efficient for a longer list of cars. For an arbitrary number of cars I suggest:
SELECT customer_id
FROM (
SELECT customer_id, car_make
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1, 2
) sub
GROUP BY 1
HAVING count(*) = 2;
SQL Fiddle.
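For an arbitrary list of makes, the deduplicate-then-count query can be generated from the list itself. Below is a minimal sketch using Python's sqlite3 module and the sample rows from the question; the helper name customers_with_all is hypothetical, and SQLite stands in for Postgres here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT);
    INSERT INTO customers_cars (customer_id, car_make) VALUES
        (1, 'BMW'), (1, 'Merc'), (1, 'Ford'),
        (2, 'BMW'), (2, 'BMW'), (2, 'Ferrari'), (2, 'Porche'),
        (3, 'BMW'), (3, 'Ford');
""")

def customers_with_all(conn, makes):
    """Return ids of customers owning at least one car of every listed make."""
    wanted = sorted(set(makes))                  # ignore repeated makes
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT customer_id
        FROM (
            SELECT customer_id, car_make         -- collapse duplicate rows
            FROM customers_cars
            WHERE car_make IN ({placeholders})
            GROUP BY customer_id, car_make
        ) sub
        GROUP BY customer_id
        HAVING COUNT(*) = ?                      -- one row per wanted make
        ORDER BY customer_id
    """
    return [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]

bmw_and_ford = customers_with_all(conn, ["BMW", "Ford"])
print(bmw_and_ford)
```

Only the placeholders are interpolated into the SQL text; the make names themselves are passed as bound parameters.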

sql query to select matching rows for all or nothing criteria

I have a table of cars where each car belongs to a company. In another table I have a list of company locations by city.
I want to select all cars from the cars table whose company has locations in all of the cities passed into the stored procedure, and otherwise exclude those cars altogether, even if the company falls short by just one city.
So, I've tried something like:
select id, cartype from cars where companyid in
(
select id from locations where cityid in
(
select id from cities
)
)
This doesn't work as it obviously satisfies the condition if ANY of the cities are in the list, not all of them.
It sounds like a group by count, but can't make it work with what I tried.
I'm using MS SQL Server 2005.
One example:
select id, cartype from cars c
where ( select count(1) from cities where id in (...))
= ( select count(distinct cityid)
from locations
where c.companyid = locations.id and cityid in (...) )
Maybe try counting all the cities, and then select the car if its company has the same number of distinct location cities as there are total cities.
SELECT id, cartype FROM cars
WHERE
--Subquery to find the number of locations belonging to car's company
(SELECT count(distinct cities.id) FROM cities
INNER JOIN locations on locations.cityid = cities.id
WHERE locations.companyId = cars.companyId)
=
--Subquery to find the total number of cities
(SELECT count(distinct cities.id) FROM cities)
I haven't tested this, and it may not be the most efficient query, but I think this might work.
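Here is one way the count comparison could be tried out, sketched with Python's sqlite3 module instead of SQL Server. The schema is an assumption: cars(id, cartype, companyid), locations(companyid, cityid) and cities(id), with invented rows in which company 1 covers every city and company 2 misses one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities    (id INTEGER PRIMARY KEY);
    CREATE TABLE locations (companyid INTEGER, cityid INTEGER);
    CREATE TABLE cars      (id INTEGER PRIMARY KEY, cartype TEXT, companyid INTEGER);
    INSERT INTO cities VALUES (1), (2), (3);
    INSERT INTO locations VALUES
        (1, 1), (1, 1), (1, 2), (1, 3),   -- company 1: all cities (city 1 twice)
        (2, 1), (2, 2);                   -- company 2: city 3 is missing
    INSERT INTO cars VALUES (1, 'sedan', 1), (2, 'van', 2), (3, 'coupe', 1);
""")

query = """
    SELECT id, cartype
    FROM cars
    WHERE (SELECT COUNT(DISTINCT cities.id)
           FROM cities
           JOIN locations ON locations.cityid = cities.id
           WHERE locations.companyid = cars.companyid)
        = (SELECT COUNT(DISTINCT cities.id) FROM cities)
    ORDER BY id
"""
cars_everywhere = conn.execute(query).fetchall()
print(cars_everywhere)
```

The DISTINCT matters: company 1 has two locations in city 1, and counting rows instead of distinct cities would give 4 rather than 3.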
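Try this (assuming locations has companyid and cityid columns):
SELECT c.*
FROM cars c
WHERE NOT EXISTS (
SELECT 1
FROM cities ci
WHERE NOT EXISTS (
SELECT 1
FROM locations l
WHERE l.cityid = ci.id
AND l.companyid = c.companyid
)
)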

SQL query for finding row with same column values that was created most recently

Say I have three columns in my MySQL table people: id, name, and created, where name is a string and created is a timestamp. What's the appropriate query for the following scenario? I have 10 rows, and each row is a record with a name. Each row has a unique id, but names can repeat, so you can have three Bobs, two Marys, one Jack and four Phils.
There is also a hobbies table with the columns id, hobby, person_id.
Basically I want a query that will do the following:
Return all of the people with zero hobbies, but only check the most recently created person for each name, if that makes sense. Meaning: if there is a Bob that was created yesterday and one created today, I only want to know whether the Bob created today has zero hobbies. The one from yesterday is no longer relevant.
select pp.id
from people pp, (select name, max(created) as created from people group by name) p
where pp.name = p.name
and pp.created = p.created
and pp.id not in ( select person_id from hobbies )
SELECT latest_person.* FROM (
SELECT p1.* FROM people p1
WHERE NOT EXISTS (
SELECT * FROM people p2
WHERE p1.name = p2.name AND p1.created < p2.created
)
) AS latest_person
LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
WHERE h.id IS NULL;
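The NOT EXISTS plus LEFT JOIN query can be exercised on a handful of rows. The sketch below uses Python's sqlite3 module rather than MySQL, and the rows are invented: the newest Bob and the only Jack have no hobbies, while Mary and the newest Phil each have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people  (id INTEGER PRIMARY KEY, name TEXT, created TEXT);
    CREATE TABLE hobbies (id INTEGER PRIMARY KEY, hobby TEXT, person_id INTEGER);
    INSERT INTO people VALUES
        (1, 'Bob',  '2024-01-01'), (2, 'Bob',  '2024-01-02'),
        (3, 'Mary', '2024-01-01'), (4, 'Jack', '2024-01-03'),
        (5, 'Phil', '2024-01-01'), (6, 'Phil', '2024-01-05');
    INSERT INTO hobbies VALUES
        (1, 'chess', 1),    -- only the older Bob has a hobby
        (2, 'golf',  3),    -- Mary has a hobby
        (3, 'chess', 6);    -- only the newest Phil has a hobby
""")

query = """
    SELECT latest_person.id
    FROM (
        SELECT p1.*
        FROM people p1
        WHERE NOT EXISTS (          -- no newer row with the same name
            SELECT 1
            FROM people p2
            WHERE p1.name = p2.name AND p1.created < p2.created
        )
    ) AS latest_person
    LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
    WHERE h.id IS NULL              -- keep only people without hobbies
    ORDER BY latest_person.id
"""
latest_without_hobbies = [row[0] for row in conn.execute(query)]
print(latest_without_hobbies)
```

The older Phil (id 5) has no hobbies but is not returned, because only the most recent row per name is checked.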
Try This:
Select *
From people p
Where created =
(Select Max(created)
From people
Where name = p.Name
And not exists
(Select * From hobbies
Where person_id = p.id))