SQL N:M query merging results by condition flag in intermediate table - sql

[Apologies if this is a duplicate. I couldn't find an existing answer, since this is an unusual workaround for an ORM limitation and I'm clearly a beginner at SQL.]
Domain requirements:
A brigade must be composed of one user (the commissar) and, optionally, exactly one assistant (1:1)
A user can only be part of one brigade (1:1)
CREATE TABLE Users
(
id SERIAL PRIMARY KEY,
username VARCHAR(100) NOT NULL UNIQUE,
password VARCHAR(100) NOT NULL
);
CREATE TABLE Brigades
(
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
-- N:M relationship with a flag that determines whether the user is the commissar
CREATE TABLE Brigade_User
(
brigade_id INT NOT NULL REFERENCES Brigades(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
user_id INT NOT NULL REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
is_commissar BOOLEAN NOT NULL,
PRIMARY KEY(brigade_id, user_id)
);
Ideally, since the relationships are 1:1, the Brigade_User intermediate table could be dropped and replaced by a Brigades table with two foreign keys (this is not supported by the Diesel Rust ORM, so I think I'm stuck with the first approach):
CREATE TABLE Brigades
(
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
-- 1:1
commissar_id INT NOT NULL REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
-- 1:1
assistant_id INT NOT NULL REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
An example...
> SELECT * FROM brigade_user LEFT JOIN brigades ON brigade_user.brigade_id = brigades.id;
brigade_id | user_id | is_commissar | id | name
------------+---------+--------------+----+------------------
1 | 1 | t | 1 | Patrulla gatuna
1 | 2 | f | 1 | Patrulla gatuna
2 | 3 | t | 2 | Patrulla perruna
2 | 4 | f | 2 | Patrulla perruna
3 | 6 | t | 3 | Patrulla canina
3 | 5 | f | 3 | Patrulla canina
(6 rows)
Is it possible to make a query which returns a table like this?
brigade_id | commissar_id | assistant_id | name
-----------+--------------+--------------+--------------------
1 | 1 | 2 | Patrulla gatuna
2 | 3 | 4 | Patrulla perruna
3 | 6 | 5 | Patrulla canina
Note that each pair of rows has been merged into one, depending on the flag (remember, a brigade is composed of one commissar and, optionally, one assistant).
Could this model be improved (keeping in mind the limitation on multiple foreign keys referencing the same table, discussed here)?

Try the following:
with cte as
(
SELECT A.brigade_id,A.user_id,A.is_commissar,B.name
FROM brigade_user A LEFT JOIN brigades B ON A.brigade_id = B.id
)
select C1.brigade_id, C1.user_id as commissar_id , C2.user_id as assistant_id, C1.name from
cte C1 left join cte C2
on C1.brigade_id=C2.brigade_id
and C1.user_id<>C2.user_id
where C1.is_commissar=true
See a demo here.
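As an alternative to the self-join above, the same pivot can be sketched with conditional aggregation. The sketch below is an illustration rather than the answerer's method. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, stores the flag as 0/1 because SQLite has no BOOLEAN type, and adds a hypothetical fourth brigade ("Patrulla solitaria") to show the case with no assistant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brigades (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE brigade_user (
    brigade_id   INTEGER NOT NULL REFERENCES brigades(id),
    user_id      INTEGER NOT NULL,
    is_commissar INTEGER NOT NULL,  -- 1 = commissar, 0 = assistant
    PRIMARY KEY (brigade_id, user_id)
);
INSERT INTO brigades VALUES
    (1, 'Patrulla gatuna'), (2, 'Patrulla perruna'),
    (3, 'Patrulla canina'), (4, 'Patrulla solitaria');  -- brigade 4 is hypothetical
INSERT INTO brigade_user VALUES
    (1, 1, 1), (1, 2, 0), (2, 3, 1), (2, 4, 0), (3, 6, 1), (3, 5, 0), (4, 7, 1);
""")

# One output row per brigade: each CASE picks the user_id only from the row
# whose flag matches, and MAX collapses the group to that single value
# (or NULL when the brigade has no assistant).
query = """
SELECT b.id AS brigade_id,
       MAX(CASE WHEN bu.is_commissar = 1 THEN bu.user_id END) AS commissar_id,
       MAX(CASE WHEN bu.is_commissar = 0 THEN bu.user_id END) AS assistant_id,
       b.name
FROM brigades b
JOIN brigade_user bu ON bu.brigade_id = b.id
GROUP BY b.id, b.name
ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL the two CASE expressions could also be written as MAX(user_id) FILTER (WHERE is_commissar) and MAX(user_id) FILTER (WHERE NOT is_commissar). Either form reads brigade_user once instead of joining it to itself.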

Related

Finding all entries with no new reference in another table within last two years

I have the following three tables:
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer (
id SERIAL PRIMARY KEY,
ext_id VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer_in_group (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
group_id INT NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL,
CONSTRAINT customer_id_fk
FOREIGN KEY(customer_id)
REFERENCES customer(id),
CONSTRAINT group_id_fk
FOREIGN KEY(group_id)
REFERENCES groups(id)
);
I need to find all groups that have not been referenced by any customer_in_group row (via its group_id column) inserted within the last two years. I then plan to delete all of the customer_in_group rows that reference those groups, and finally delete the groups themselves.
So basically given the following two groups and the following 3 customer_in_groups
Group
| id | name | insert_date |
|----|--------|--------------------------|
| 1 | group1 | 2011-10-05T14:48:00.000Z |
| 2 | group2 | 2011-10-05T14:48:00.000Z |
Customer In Group
| id | group_id | customer_id | insert_date |
|----|----------|-------------|--------------------------|
| 1 | 1 | 1 | 2011-10-05T14:48:00.000Z |
| 2 | 1 | 1 | 2020-10-05T14:48:00.000Z |
| 3 | 2 | 1 | 2011-10-05T14:48:00.000Z |
I would expect just to get back group2, since group1 has a customer_in_group referencing it inserted in the last two years.
I am not sure how I would write the query that would find all of these groups.
As a starter, I would recommend enabling on delete cascade on the foreign keys of customer_in_group.
Then, you can just delete the rows you want from groups, and it will drop the dependent rows in the child table. For this, you can use not exists:
delete from groups g
where not exists (
select 1
from customer_in_group cig
where cig.group_id = g.id and cig.insert_date >= now() - interval '2 year'
)
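The cascade plus NOT EXISTS approach can be tried end to end with the sketch below. It is a minimal illustration under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL, timestamps are stored as text, datetime('now', '-2 years') replaces now() - interval '2 year', and the sample dates are relative to the current time so the outcome does not depend on when it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE groups (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    insert_date TEXT NOT NULL
);
CREATE TABLE customer_in_group (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
    insert_date TEXT NOT NULL
);
INSERT INTO groups VALUES
    (1, 'group1', datetime('now', '-10 years')),
    (2, 'group2', datetime('now', '-10 years'));
-- group1 has one old and one recent membership; group2 only has an old one
INSERT INTO customer_in_group VALUES
    (1, 1, 1, datetime('now', '-10 years')),
    (2, 1, 1, datetime('now', '-1 year')),
    (3, 1, 2, datetime('now', '-10 years'));
""")

# Delete every group with no membership row newer than two years.
# The ON DELETE CASCADE foreign key removes the dependent rows.
conn.execute("""
DELETE FROM groups
WHERE NOT EXISTS (
    SELECT 1
    FROM customer_in_group cig
    WHERE cig.group_id = groups.id
      AND cig.insert_date >= datetime('now', '-2 years')
)
""")

remaining_groups = [r[0] for r in conn.execute("SELECT name FROM groups ORDER BY id")]
remaining_links = [r[0] for r in conn.execute("SELECT id FROM customer_in_group ORDER BY id")]
print(remaining_groups, remaining_links)
```

Note that NOT EXISTS also matches groups that have never had any customer_in_group row, so those would be deleted too. If that is not wanted, an extra condition on the group's own insert_date would be needed.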

PostgreSQL: Delete a row and all of its references (FK) existing in other tables?

Let's say I have a table named "users" like this:
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | 'Sid' | 'Barrett' |
| 2 | 'Roger' | 'Waters' |
| 3 | 'Richard' | 'Wright' |
| 4 | 'David' | 'Gilmour' |
| 5 | 'Nick' | 'Mason' |
+----+------------+-----------+
Each "user" has lots of references in several tables, all using the same column name "user_id" as a FK.
If I need to delete an element, I believe the procedure is to delete the relations first (to avoid the "violates foreign key constraint" error) and then the element itself in the "users" table, right?
But... is there any way to delete a "user" element together with all its references in all the other tables?
I'm working with NodeJS (Express) and using PostgreSQL.
Apologies if the answer is obvious, I'm a newbie in SQL.
Thanks a lot in advance!
Yes, use on delete cascade on the foreign keys.
create table users (
id bigserial primary key
...
);
create table posts (
...
user_id bigint not null references users on delete cascade
...
)
Now when a user is deleted, all their associated posts will also be deleted.
This will go on, for example, if a post has comments...
create table comments (
...
post_id bigint not null references posts on delete cascade
...
)
When a user is deleted their posts will be deleted, and those posts' comments will be deleted. That's the "cascade" part.
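The chain of cascades described above can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the tables are reduced to the key columns (the elided "..." columns from the answer are simply left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE
);
INSERT INTO users VALUES (1, 'Sid'), (2, 'Roger');
INSERT INTO posts VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO comments VALUES (100, 10), (101, 11), (200, 20);
""")

# A single DELETE on the parent row: posts of user 1 go first,
# then the comments on those posts follow through the second cascade.
conn.execute("DELETE FROM users WHERE id = 1")

remaining_posts = [r[0] for r in conn.execute("SELECT id FROM posts ORDER BY id")]
remaining_comments = [r[0] for r in conn.execute("SELECT id FROM comments ORDER BY id")]
print(remaining_posts, remaining_comments)
```

Only the rows belonging to user 2 survive. On an existing PostgreSQL schema, the foreign keys would have to be dropped and re-created with on delete cascade, since the action is part of the constraint definition.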

How to add foreign key constraint to Table A (id, type) referencing either of two tables Table B (id, type) or Table C (id, type)?

I'm looking to use two columns in Table A as foreign keys for either one of two tables: Table B or Table C. Using columns table_a.item_id and table_a.item_type_id, I want to force any new rows to either have a matching item_id and item_type_id in Table B or Table C.
Example:
Table A: Inventory
+---------+--------------+-------+
| item_id | item_type_id | count |
+---------+--------------+-------+
| 2 | 1 | 32 |
| 3 | 1 | 24 |
| 1 | 2 | 10 |
+---------+--------------+-------+
Table B: Recipes
+----+--------------+-------------------+-------------+----------------------+
| id | item_type_id | name | consistency | gram_to_fluid_ounces |
+----+--------------+-------------------+-------------+----------------------+
| 1 | 1 | Delicious Juice | thin | .0048472 |
| 2 | 1 | Ok Tasting Juice | thin | .0057263 |
| 3 | 1 | Protein Smoothie | heavy | .0049847 |
+----+--------------+-------------------+-------------+----------------------+
Table C: Products
+----+--------------+----------+--------+----------+----------+
| id | item_type_id | name | price | in_stock | is_taxed |
+----+--------------+----------+--------+----------+----------+
| 1 | 2 | Purse | $200 | TRUE | TRUE |
| 2 | 2 | Notebook | $14.99 | TRUE | TRUE |
| 3 | 2 | Computer | $1,099 | FALSE | TRUE |
+----+--------------+----------+--------+----------+----------+
Other Table: Item_Types
+----+-----------+
| id | type_name |
+----+-----------+
| 1 | recipes |
| 2 | products |
+----+-----------+
I want to be able to have an inventory table where employees can enter inventory counts regardless of whether an item is a recipe or a product. I don't want to have to have a product_inventory and recipe_inventory table as there are many operations I need to do across all inventory items regardless of item types.
One solution would be to create a reference table like so:
Table CD: Items
+---------+--------------+------------+-----------+
| item_id | item_type_id | product_id | recipe_id |
+---------+--------------+------------+-----------+
| 2 | 1 | NULL | 2 |
| 3 | 1 | NULL | 3 |
| 1 | 2 | 1 | NULL |
+---------+--------------+------------+-----------+
It just seems very cumbersome, plus I'd now need to add/remove products/recipes from this new table whenever they are added/removed from their respective tables. (Is there an automatic way to achieve this?)
CREATE TABLE [dbo].[inventory] (
[id] [bigint] IDENTITY(1,1) NOT NULL,
[item_id] [smallint] NOT NULL,
[item_type_id] [tinyint] NOT NULL,
[count] [float] NOT NULL,
CONSTRAINT [PK_inventory_id] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
What I would really like to do is something like this...
ALTER TABLE [inventory]
ADD CONSTRAINT [FK_inventory_sources] FOREIGN KEY ([item_id],[item_type_id])
REFERENCES {[products] ([id],[item_type_id]) OR [recipes] ([id],[item_type_id])}
Maybe there is no solution as I'm describing it, so if you have any ideas where I can maintain the same/similar schema, I'm definitely open to hearing them!
Thanks :)
Since your products and recipes are stored separately, and appear to have mostly separate columns, separate inventory tables are probably the correct approach, e.g.:
CREATE TABLE dbo.ProductInventory
(
Product_id INT NOT NULL,
[count] INT NOT NULL,
CONSTRAINT FK_ProductInventory__Product_id FOREIGN KEY (Product_id)
REFERENCES dbo.Product (Product_id)
);
CREATE TABLE dbo.RecipeInventory
(
Recipe_id INT NOT NULL,
[count] INT NOT NULL,
CONSTRAINT FK_RecipeInventory__Recipe_id FOREIGN KEY (Recipe_id)
REFERENCES dbo.Recipe (Recipe_id )
);
If you need all types combined, you can simply use a view:
CREATE VIEW dbo.Inventory
AS
SELECT Product_id AS item_id,
2 AS item_type_id,
[Count]
FROM ProductInventory
UNION ALL
SELECT recipe_id AS item_id,
1 AS item_type_id,
[Count]
FROM RecipeInventory;
GO
If you create a new item_type, you need to amend the DB design anyway to create a new table, so you would just need to amend the view at the same time.
Another possibility would be to have a single Items table and have Products/Recipes reference it. You start with your Items table, each row of which has a unique ID:
CREATE TABLE dbo.Items
(
item_id INT IDENTITY(1, 1) NOT NULL,
Item_type_id INT NOT NULL,
CONSTRAINT PK_Items__ItemID PRIMARY KEY (item_id),
CONSTRAINT FK_Items__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
CONSTRAINT UQ_Items__ItemID_ItemTypeID UNIQUE (Item_ID, Item_type_id)
);
Note the unique key added on (item_id, item_type_id), this is important for referential integrity later on.
Then each of your sub tables has a 1:1 relationship with this, so your product table would become:
CREATE TABLE dbo.Products
(
item_id INT NOT NULL, -- same type as dbo.Items.item_id, as the foreign key requires
Item_type_id AS 2 PERSISTED, -- computed columns must be PERSISTED to take part in a foreign key
name VARCHAR(50) NOT NULL,
Price DECIMAL(10, 4) NOT NULL,
InStock BIT NOT NULL,
CONSTRAINT PK_Products__ItemID PRIMARY KEY (item_id),
CONSTRAINT FK_Products__Item_Type_ID FOREIGN KEY (Item_Type_ID)
REFERENCES Item_Type (Item_Type_ID),
CONSTRAINT FK_Products__ItemID_ItemTypeID FOREIGN KEY (item_id, Item_Type_ID)
REFERENCES dbo.Items (item_id, item_type_id)
);
A few things to note:
item_id is again the primary key, ensuring the 1:1 relationship.
the computed column item_type_id (AS 2) ensures that every item_type_id is set to 2. This is key, as it allows a foreign key constraint to be added
the foreign key on (item_id, item_type_id) back to the items table. This ensures that you can only insert a record to the product table, if the original record in the items table has an item_type_id of 2.
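The effect of the composite foreign key described in these notes can be sketched as follows. This is an illustration under assumptions, not the SQL Server DDL above: Python's built-in sqlite3 module is used as a stand-in, the tables are reduced to a few columns, and a DEFAULT 2 plus CHECK (item_type_id = 2) replaces the computed column as the way of pinning the type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE item_types (id INTEGER PRIMARY KEY, type_name TEXT NOT NULL);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    item_type_id INTEGER NOT NULL REFERENCES item_types(id),
    UNIQUE (item_id, item_type_id)      -- target of the composite FK below
);
CREATE TABLE products (
    item_id INTEGER PRIMARY KEY,        -- primary key keeps the relation 1:1
    -- swapped-in technique: DEFAULT + CHECK instead of a computed column
    item_type_id INTEGER NOT NULL DEFAULT 2 CHECK (item_type_id = 2),
    name TEXT NOT NULL,
    FOREIGN KEY (item_id, item_type_id) REFERENCES items (item_id, item_type_id)
);
INSERT INTO item_types VALUES (1, 'recipes'), (2, 'products');
INSERT INTO items VALUES (1, 2), (2, 1);  -- item 1 is a product, item 2 a recipe
""")

# Item 1 is registered as a product, so this insert is accepted.
conn.execute("INSERT INTO products (item_id, name) VALUES (1, 'Purse')")

# Item 2 is registered as a recipe: (2, 2) has no match in items,
# so the composite foreign key rejects the row.
try:
    conn.execute("INSERT INTO products (item_id, name) VALUES (2, 'Notebook')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

product_names = [r[0] for r in conn.execute("SELECT name FROM products")]
print(product_names, rejected)
```

An inventory table can then reference items(item_id) with a single ordinary foreign key, regardless of whether the item is a product or a recipe.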
A third option would be a single table for recipes and products and make any columns not required for both nullable. This answer on types of inheritance is well worth a read.
I think there is a flaw in your database design. The best way to solve your actual problem is to have recipes and products in one single table. Right now you have a redundant column called item_type_id in each table. That column is worth nothing unless the items actually live in the same table. I say redundant because it has the same value for absolutely every entry in each table.
You have two options. If you cannot change the database design, work without foreign keys and make the logic layer select from the correct tables.
Or, if you can change the database design, make products and recipes exist in the same table. You already have an item_type table that can identify the item category, so it makes sense to put all items in the same table.
You can only add one constraint for a column or pair of columns. Think about apples and oranges. A column cannot refer to both oranges and apples. It must be either orange or apple.
As a side note, this can be achieved to some extent with PERSISTED COMPUTED columns, but it only introduces overhead and complexity.
Check this for reference.
You can add some computed columns to the Inventory table:
ALTER TABLE Inventory
ADD _recipe_item_id AS CASE WHEN item_type_id = 1 THEN item_id END persisted
ALTER TABLE Inventory
ADD _product_item_id AS CASE WHEN item_type_id = 2 THEN item_id END persisted
You can then add two separate foreign keys to the two tables, using those two columns instead of item_id. I'm assuming the item_type_id column in those two tables is already computed/constrained appropriately, but if not you may want to consider that too.
Because these computed columns are NULL when the wrong type is selected, and because SQL Server doesn't check FK constraints if at least one column value is NULL, they can both exist and only one or the other will be satisfied at any time.

Union all data unmanaged

Forgive me if this question has already been asked; I'm not much of a DB guy.
Here is what I tried:
select row_number() over (partition by name order by challanto_date) , *
from (
select
rma,
p.id,
p.name,
challanto_date,
CURRENT_TIMESTAMP as fromDate
from challan_to_vendor cv
left join challan_to_vendor_detail cvd on cv.id = cvd.challan_to_vendor_id
inner join main_product p on p.id = cvd.product_id
union all
select
rma,
p.id,
p.name,
CURRENT_TIMESTAMP as toDate,
challan_date
from challan_from_vendor cv
left join challan_from_vendor_detail cvd on cv.id = cvd.challan_from_vendor_id
inner join main_product p on p.id = cvd.product_id
) as a
Here is my create table script :
challan_from_vendor
CREATE TABLE public.challan_from_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT psk PRIMARY KEY (id)
)
challan_from_vendor_detail
CREATE TABLE public.challan_from_vendor_detail
(
id character varying NOT NULL,
challan_from_id character varying,
product_id character varying,
CONSTRAINT psks PRIMARY KEY (id),
CONSTRAINT fsks FOREIGN KEY (challan_from_id)
REFERENCES public.challan_from_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
challan_to_vendor
CREATE TABLE public.challan_to_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT pk PRIMARY KEY (id)
)
challan_to_vendor_detail
CREATE TABLE public.challan_to_vendor_detail
(
id character varying NOT NULL,
challan_to_id character varying,
product_id character varying,
CONSTRAINT pks PRIMARY KEY (id),
CONSTRAINT fks FOREIGN KEY (challan_to_id)
REFERENCES public.challan_to_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
product
CREATE TABLE public.product
(
id character varying NOT NULL,
product_name character varying,
CONSTRAINT pks PRIMARY KEY (id)
)
Here are my table structures and the desired output.
challan_from_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12012 | 0001 | 2018-11-10 |
| 123121 | 0001 | 2018-11-11 |
challan_to_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12 | 0001 | 2018-12-10 |
| 123 | 0001 | 2018-12-11 |
challan_from_vendor_detail
| id | challan_from_vendor_id | product_id |
|:-----------|------------:|:------------:|
| 121 | 12012 | 121313 |
| 1213 | 12012 | 131381 |
challan_to_vendor_detail
| id | challan_to_vendor_id | product_id |
|:-----------|------------------------|:------------:|
| 121 | 12 | 121313 |
| 1213 | 123 | 131381 |
product
| id | product_name |
|:-----------|------------:|
| 191313 | apple |
| 89113 | banana |
Output
| rma | product_id | challan_from_date | challan_to_date |
|:-----------|------------:|:-----------------:|:--------------:|
| 0001 | 191313 | 2018-11-10 | 2018-11-11 |
| 0001 | 89113 | 2018-12-10 | 2018-12-11 |
There are some strange things in the query you tried, so it is not clear which tables are involved, how they are related, or what columns they contain.
So, with some guessing, here is something to start off with:
select
main_product.*,
challan_to_vendor.toDate,
challan_from_vendor.fromDate
from main_product
join challan_to_vendor using(product_id)
join challan_from_vendor using(product_id)
If you explain more about your DB and what you want out of it, I might be able to help you more.
Edit: I could not run your create statements in my DB, since there were naming conflicts among other minor things. Here is some advice on the create process that I find useful:
Make the ids integer instead of character varying; otherwise the column is probably a name column and should not be called id. You also used integer ids in your examples.
Use SERIAL PRIMARY KEY (see tutorial) to help you with key creation. This also removes the naming conflicts, since the constraints are given implicit unique names.
Use the same column name for the same thing everywhere. This avoids the confusion of having multiple columns called id after a join, and it simplifies the join. For example, the id of the product should be product_id in all places; that way you can use using(product_id) as your join condition.
So, with the advice given above, here's how I would create one of your tables and then query them:
CREATE TABLE public.challan_to_vendor_detail
(
challan_to_vendor_detail_id SERIAL PRIMARY KEY,
challan_to_vendor_id integer,
product_id integer,
CONSTRAINT fks FOREIGN KEY (challan_to_vendor_id)
REFERENCES public.challan_to_vendor (challan_to_vendor_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
select
product_name,
challan_to_vendor.date_ad as date_to,
challan_from_vendor.date_ad as date_from
from product
join challan_to_vendor_detail using(product_id)
join challan_to_vendor using(challan_to_vendor_id)
join challan_from_vendor_detail using(product_id)
join challan_from_vendor using(challan_from_vendor_id)
Unfortunately the overall db-design does not make sense to me so I do not know if this is what you expect.
Good luck!

SQL Query 2 tables null results

I was asked this question in an interview:
From the 2 tables below, write a query to pull customers with no sales orders.
How many ways to write this query and which would have best performance.
Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate
Query:
SELECT *
FROM Customer C
RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL
Is my query correct and are there other ways to write the query and get the same results?
I'm answering for MySQL rather than SQL Server, because the SQL Server tag was added later and, since this was an interview question, I assumed the specific DBMS wouldn't matter to you. Note, though, that the queries I wrote are standard SQL and should run on practically every RDBMS. How each RDBMS executes them is another matter.
I wrote this little procedure to set up a test case. It creates the tables customers and orders as you specified, plus the primary and foreign keys one would usually add. There are no other indexes, since every column worth indexing here is already covered by a key. It creates 250 customers, 100 of whom placed an order (for convenience, none of them more than once). A dump of the data follows; I've posted the script in case you want to play around by increasing the numbers.
delimiter $$
create procedure fill_table()
begin
create table customers(customerId int primary key) engine=innodb;
set @x = 1;
while (@x <= 250) do
insert into customers values(@x);
set @x := @x + 1;
end while;
create table orders(orderId int auto_increment primary key,
customerId int,
orderDate timestamp,
foreign key fk_customer (customerId) references customers(customerId)
) engine=innodb;
insert into orders(customerId, orderDate)
select
customerId,
now() - interval customerId day
from
customers
order by rand()
limit 100;
end $$
delimiter ;
call fill_table();
For me, this resulted in this:
CREATE TABLE `customers` (
`customerId` int(11) NOT NULL,
PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250);
CREATE TABLE `orders` (
`orderId` int(11) NOT NULL AUTO_INCREMENT,
`customerId` int(11) DEFAULT NULL,
`orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`orderId`),
KEY `fk_customer` (`customerId`),
CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;
INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 
20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');
Okay, now to the queries. Three ways came to mind. I omitted the right join that MDiesel used, because it's really just another way of writing a left join. It was invented for lazy SQL developers who don't want to swap the table names and would rather rewrite a single word.
Anyway, first query:
select
c.*
from
customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;
Results in an execution plan like this:
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | SIMPLE | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using index |
| 1 | SIMPLE | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
Second query:
select
c.*
from
customers c
where c.customerId not in (select distinct customerId from orders);
Results in an execution plan like this:
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer | fk_customer | 5 | func | 2 | Using index |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
Third query:
select
c.*
from
customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);
Results in an execution plan like this:
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
We can see in all three execution plans that the customers table is read as a whole, but from the index (the implicit one, as the only column is the primary key). This may change when you select other columns from the table that are not in an index.
The first one seems to be the best. For each row in customers, only one row in orders is read. The id column suggests that MySQL can do this in one step, as only indexes are involved.
The second query seems to be the worst (though none of the three queries should perform too badly). For each row in customers the subquery is executed (the select_type column tells us this).
The third query is not much different in that it also uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far here. If you're interested, here's the manual page that explains what each column and its values mean: EXPLAIN output
Finally: I'd say that the first query will perform best, but as always, in the end one has to measure, measure and measure.
I can think of two other ways to write this query:
SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL
SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN(SELECT CustomerID FROM SalesOrder)
The solutions involving outer joins will generally perform better than one using NOT IN, although the difference depends on the DBMS and its optimizer, so measure on your own data.
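The variants discussed in this thread can be compared for correctness (not performance) with the sketch below. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for MySQL or SQL Server, and the four customers and their orders are made-up sample data. It also shows two NULL-related pitfalls: the question's WHERE SO.OrderID = NULL never matches, because a comparison with NULL is never true, and NOT IN returns nothing once the subquery yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE SalesOrder (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER,          -- nullable on purpose, see the pitfall below
    OrderDate TEXT
);
INSERT INTO Customer VALUES (1), (2), (3), (4);
INSERT INTO SalesOrder VALUES (10, 1, '2014-01-01'), (11, 3, '2014-01-02');
""")

LEFT_JOIN = """SELECT C.CustomerID FROM Customer C
LEFT JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL ORDER BY C.CustomerID"""
NOT_IN = """SELECT C.CustomerID FROM Customer C
WHERE C.CustomerID NOT IN (SELECT CustomerID FROM SalesOrder)
ORDER BY C.CustomerID"""
NOT_EXISTS = """SELECT C.CustomerID FROM Customer C
WHERE NOT EXISTS (SELECT 1 FROM SalesOrder SO WHERE SO.CustomerID = C.CustomerID)
ORDER BY C.CustomerID"""
EQUALS_NULL = """SELECT C.CustomerID FROM Customer C
LEFT JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL"""

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# With no NULLs in SalesOrder.CustomerID all three variants agree.
results = {name: ids(sql) for name, sql in
           [("left_join", LEFT_JOIN), ("not_in", NOT_IN), ("not_exists", NOT_EXISTS)]}

# '= NULL' evaluates to NULL rather than true, so no row passes the filter.
equals_null_result = ids(EQUALS_NULL)

# Pitfall: one order without a customer makes NOT IN return no rows at all.
conn.execute("INSERT INTO SalesOrder VALUES (12, NULL, '2014-01-03')")
not_in_with_null = ids(NOT_IN)
not_exists_with_null = ids(NOT_EXISTS)
left_join_with_null = ids(LEFT_JOIN)

print(results, equals_null_result, not_in_with_null, not_exists_with_null)
```

Because of the NULL behaviour, NOT EXISTS or the LEFT JOIN ... IS NULL form is the safer choice whenever the referencing column is nullable.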