Union all data unmanaged - sql
Forgive me if this question has already been asked; I'm not much of a DB person.
Here is what I tried:
select row_number() over (partition by name order by challanto_date) , *
from (
select
rma,
p.id,
p.name,
challanto_date,
CURRENT_TIMESTAMP as fromDate
from challan_to_vendor cv
left join challan_to_vendor_detail cvd on cv.id = cvd.challan_to_vendor_id
inner join main_product p on p.id = cvd.product_id
union all
select
rma,
p.id,
p.name,
CURRENT_TIMESTAMP as toDate,
challan_date
from challan_from_vendor cv
left join challan_from_vendor_detail cvd on cv.id = cvd.challan_from_vendor_id
inner join main_product p on p.id = cvd.product_id
) as a
Here are my CREATE TABLE scripts:
challan_from_vendor
CREATE TABLE public.challan_from_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT psk PRIMARY KEY (id)
)
challan_from_vendor_detail
CREATE TABLE public.challan_from_vendor_detail
(
id character varying NOT NULL,
challan_from_id character varying,
product_id character varying,
CONSTRAINT psks PRIMARY KEY (id),
CONSTRAINT fsks FOREIGN KEY (challan_from_id)
REFERENCES public.challan_from_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
challan_to_vendor
CREATE TABLE public.challan_to_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT pk PRIMARY KEY (id)
)
challan_to_vendor_detail
CREATE TABLE public.challan_to_vendor_detail
(
id character varying NOT NULL,
challan_to_id character varying,
product_id character varying,
CONSTRAINT pks PRIMARY KEY (id),
CONSTRAINT fks FOREIGN KEY (challan_to_id)
REFERENCES public.challan_to_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
product
CREATE TABLE public.product
(
id character varying NOT NULL,
product_name character varying,
CONSTRAINT pks PRIMARY KEY (id)
)
Here are my table structures and the desired output.
challan_from_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12012 | 0001 | 2018-11-10 |
| 123121 | 0001 | 2018-11-11 |
challan_to_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12 | 0001 | 2018-12-10 |
| 123 | 0001 | 2018-12-11 |
challan_from_vendor_detail
| id | challan_from_vendor_id | product_id |
|:-----------|------------:|:------------:|
| 121 | 12012 | 121313 |
| 1213 | 12012 | 131381 |
challan_to_vendor_detail
| id | challan_to_vendor_id | product_id |
|:-----------|------------------------|:------------:|
| 121 | 12 | 121313 |
| 1213 | 123 | 131381 |
product
| id | product_name |
|:-----------|------------:|
| 191313 | apple |
| 89113 | banana |
Output
| rma | product_id | challan_from_date | challan_to_date |
|:-----------|------------:|:-----------------:|:--------------:|
| 0001 | 191313| 2018-11-10 |2018-11-11 |
| 0001 | 89113 | 2018-12-10 |2018-12-11 |
There are some strange things in the query you tried, so it is not clear which tables are involved, how they are related, or what columns they contain.
So, with some guessing, here is something to start off with:
select
main_product.*,
challan_to_vendor.toDate,
challan_from_vendor.fromDate
from main_product
join challan_to_vendor using(product_id)
join challan_from_vendor using(product_id)
If you explain more about your DB and what you want out of it, I might be able to help you more.
Edit: I could not run your CREATE statements in my DB because of naming conflicts, among other minor things. Here is some advice on the create process that I find useful:
Make the ids integer instead of character varying; otherwise the column is probably a name column and should not be called id. You also used integer ids in your examples.
Use SERIAL PRIMARY KEY (see tutorial) to help with key creation. This also removes the naming conflict, since the constraints are given implicit unique names.
Use the same column name for the same thing everywhere. This avoids the confusion of having several columns called id after a join, and it simplifies the join. For example, the id of the product should be product_id in all places; that way you can use using(product_id) as your join condition.
With that advice in mind, here is how I would create one of your tables and then query them:
CREATE TABLE public.challan_to_vendor_detail
(
challan_to_vendor_detail_id SERIAL PRIMARY KEY,
challan_to_vendor_id integer,
product_id integer,
CONSTRAINT fks FOREIGN KEY (challan_to_vendor_id)
REFERENCES public.challan_to_vendor (challan_to_vendor_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
select
product_name,
challan_to_vendor.date_ad as date_to,
challan_from_vendor.date_ad as date_from
from product
join challan_to_vendor_detail using(product_id)
join challan_to_vendor using(challan_to_vendor_id)
join challan_from_vendor_detail using(product_id)
join challan_from_vendor using(challan_from_vendor_id)
Unfortunately the overall db-design does not make sense to me so I do not know if this is what you expect.
Good luck!
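The join above can be tried out on a small scale. The following is a minimal sketch, assuming the renamed columns suggested in the answer (product_id, challan_to_vendor_id, challan_from_vendor_id everywhere) and a few made-up sample rows; it uses Python's built-in sqlite3 module rather than PostgreSQL, so types are simplified and it is an illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical, simplified schema following the naming advice in the answer.
# All sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE challan_to_vendor (
    challan_to_vendor_id INTEGER PRIMARY KEY,
    rma     INTEGER,
    date_ad TEXT
);
CREATE TABLE challan_to_vendor_detail (
    challan_to_vendor_detail_id INTEGER PRIMARY KEY,
    challan_to_vendor_id INTEGER REFERENCES challan_to_vendor (challan_to_vendor_id),
    product_id           INTEGER REFERENCES product (product_id)
);
CREATE TABLE challan_from_vendor (
    challan_from_vendor_id INTEGER PRIMARY KEY,
    rma     INTEGER,
    date_ad TEXT
);
CREATE TABLE challan_from_vendor_detail (
    challan_from_vendor_detail_id INTEGER PRIMARY KEY,
    challan_from_vendor_id INTEGER REFERENCES challan_from_vendor (challan_from_vendor_id),
    product_id             INTEGER REFERENCES product (product_id)
);
INSERT INTO product VALUES (1, 'apple'), (2, 'banana');
INSERT INTO challan_to_vendor VALUES (1, 1, '2018-12-10'), (2, 1, '2018-12-11');
INSERT INTO challan_to_vendor_detail VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO challan_from_vendor VALUES (1, 1, '2018-11-10'), (2, 1, '2018-11-11');
INSERT INTO challan_from_vendor_detail VALUES (1, 1, 1), (2, 2, 2);
""")

# Same join shape as the answer's query; the aliases t and f only keep
# the two date_ad columns apart.
rows = conn.execute("""
SELECT product_name,
       t.date_ad AS date_to,
       f.date_ad AS date_from
FROM product
JOIN challan_to_vendor_detail USING (product_id)
JOIN challan_to_vendor AS t USING (challan_to_vendor_id)
JOIN challan_from_vendor_detail USING (product_id)
JOIN challan_from_vendor AS f USING (challan_from_vendor_id)
ORDER BY product_name
""").fetchall()

for row in rows:
    print(row)
```

Note that with several "to" and several "from" challans per product, this join returns every combination of them, so real data may need an extra matching condition or aggregation.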
Related
SQL N:M query merging results by condition flag in intermediate table
[First of all, sorry if this is a duplicate. I couldn't find an answer for this, as it is a strange workaround for a limitation of an ORM, and I'm clearly a newbie at SQL.]
Domain requirements:
A brigade must be composed of one user (the commissar) and, optionally, one and only one assistant (1:1)
A user can only be part of one brigade (1:1)
CREATE TABLE Users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    password VARCHAR(100) NOT NULL
);
CREATE TABLE Brigades (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
-- N:M relationship with a flag inside which determines whether that user is a commissar or not
CREATE TABLE Brigade_User (
    brigade_id INT NOT NULL REFERENCES Brigades(id) ON DELETE CASCADE ON UPDATE CASCADE,
    user_id INT NOT NULL REFERENCES Users(id) ON DELETE CASCADE ON UPDATE CASCADE,
    is_commissar BOOLEAN NOT NULL,
    PRIMARY KEY(brigade_id, user_id)
);
Ideally, as the relations are 1:1, the Brigade_User intermediate table could be removed and a Brigades table with two foreign keys could be created instead (this is not supported by the Diesel Rust ORM, so I think I'm tied to the first approach):
CREATE TABLE Brigades (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    -- 1:1
    commisar_id INT NOT NULL REFERENCES Users(id) ON DELETE CASCADE ON UPDATE CASCADE,
    -- 1:1
    assistant_id INT NOT NULL REFERENCES Users(id) ON DELETE CASCADE ON UPDATE CASCADE
);
An example...
> SELECT * FROM brigade_user LEFT JOIN brigades ON brigade_user.brigade_id = brigades.id;
 brigade_id | user_id | is_commissar | id |       name
------------+---------+--------------+----+------------------
          1 |       1 | t            |  1 | Patrulla gatuna
          1 |       2 | f            |  1 | Patrulla gatuna
          2 |       3 | t            |  2 | Patrulla perruna
          2 |       4 | f            |  2 | Patrulla perruna
          3 |       6 | t            |  3 | Patrulla canina
          3 |       5 | f            |  3 | Patrulla canina
(6 rows)
Is it possible to write a query which returns a table like this?
 brigade_id | commissar_id | assistant_id |       name
------------+--------------+--------------+------------------
          1 |            1 |            2 | Patrulla gatuna
          2 |            3 |            4 | Patrulla perruna
          3 |            6 |            5 | Patrulla canina
Note that each pair of rows has been merged into one, depending on the flag (remember, a brigade is composed of one commissar and, optionally, one assistant).
Could this model be improved (keeping in mind the limitation on multiple foreign keys referencing the same table, discussed here)?
Try the following:
with cte as (
    SELECT A.brigade_id, A.user_id, A.is_commissar, B.name
    FROM brigade_user A
    LEFT JOIN brigades B ON A.brigade_id = B.id
)
select C1.brigade_id, C1.user_id as commissar_id, C2.user_id as assistant_id, C1.name
from cte C1
left join cte C2 on C1.brigade_id = C2.brigade_id and C1.user_id <> C2.user_id
where C1.is_commissar = true
See a demo here.
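To see how the self-join behaves, here is a small sketch that runs the same query shape with Python's built-in sqlite3 module. The first three brigades are the rows from the question; the fourth brigade ("Patrulla solitaria", user 7) is a hypothetical addition to show what happens when there is no assistant. Booleans are stored as 1/0 here, which is an assumption made for SQLite portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brigades (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE brigade_user (
    brigade_id   INTEGER NOT NULL REFERENCES brigades (id),
    user_id      INTEGER NOT NULL,
    is_commissar INTEGER NOT NULL,  -- 1 = commissar, 0 = assistant
    PRIMARY KEY (brigade_id, user_id)
);
INSERT INTO brigades VALUES
    (1, 'Patrulla gatuna'), (2, 'Patrulla perruna'),
    (3, 'Patrulla canina'), (4, 'Patrulla solitaria');  -- brigade 4 is hypothetical
INSERT INTO brigade_user VALUES
    (1, 1, 1), (1, 2, 0),
    (2, 3, 1), (2, 4, 0),
    (3, 6, 1), (3, 5, 0),
    (4, 7, 1);  -- commissar only, no assistant
""")

# One row per commissar; the LEFT JOIN picks up the other member of the
# same brigade (if any) as the assistant.
merged = conn.execute("""
WITH cte AS (
    SELECT a.brigade_id, a.user_id, a.is_commissar, b.name
    FROM brigade_user a
    LEFT JOIN brigades b ON a.brigade_id = b.id
)
SELECT c1.brigade_id,
       c1.user_id AS commissar_id,
       c2.user_id AS assistant_id,
       c1.name
FROM cte c1
LEFT JOIN cte c2
       ON c1.brigade_id = c2.brigade_id AND c1.user_id <> c2.user_id
WHERE c1.is_commissar = 1
ORDER BY c1.brigade_id
""").fetchall()

for row in merged:
    print(row)
```

Because the second join is a LEFT JOIN, a brigade without an assistant still appears, with assistant_id set to NULL (None in Python).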
How to represent order content in SQL [closed]
I am creating a pizza shipping website, and I need to represent the orders and their content somewhere in the database. The problem is, I have absolutely no idea how to store items and their quantities.
My first thought is to create an order_content table, which contains the order's id as a foreign key and has two extra columns: item id and quantity.
Another problem: I have multiple types of items (pizzas, drinks, extras, etc.), so ids aren't unique across categories. I can't just say, for example, item_id 1, quantity 1 in order_content, because item_id 1 can mean the drink with id 1, the pizza with id 1, etc.
Another big problem: I have a custom pizza which can have 3 to 6 custom ingredients. I have ingredients in a table with their unique ids. How can I represent this custom pizza in orders?
Thank you.
PS: I am fairly new to SQL and relational databases.
Here is a schema that you should experiment with and modify to suit your needs.
create table clients(
    id serial,
    name varchar(25) not null,
    email varchar(25) not null,
    telephone varchar(25),
    constraint pk_clients_id primary key (id),
    constraint uq_clients_email unique(email)
);
create table items(
    id serial,
    name varchar(25),
    price decimal(5,2),
    constraint pk_items_id primary key (id)
);
create table orders(
    id serial,
    client_id int,
    constraint pk_orders_id primary key (id),
    constraint fk_orders_client foreign key (client_id) references clients(id)
);
create table toppings(
    id serial,
    name varchar(25) not null,
    constraint pk_topping primary key (id)
);
create table order_details(
    order_id int,
    item_id int,
    quantity int,
    topping_1 int,
    topping_2 int,
    topping_3 int,
    topping_4 int,
    topping_5 int,
    topping_6 int,
    constraint fk_order_details_order_id foreign key (order_id) references orders(id),
    constraint fk_custom_pizza_id foreign key (item_id) references items(id),
    constraint pk_order_details_order_item_ids primary key(order_id, item_id)
);
insert into clients (name, email, telephone) values ('Andrew','andrew@gmail.com','0123456789');
insert into items (name, price) values('custom pizza','20.00'),('1.5 litre coca-cola',5);
insert into toppings (name) values('mozzarella'),('parma ham'),('mushrooms'),('olives'),('red peppers'),('salmon');
1 rows affected
2 rows affected
6 rows affected
with order_number as (insert into orders (client_id) values (1) returning id)
insert into order_details
select order_number.id,1,1,1,2,3,4,5,6 from order_number
union all
select order_number.id,2,1,null,null,null,null,null,null from order_number;
2 rows affected
select
    c.id, c.name,
    o.id order_number,
    od.item_id, i.name, i.price,
    od.quantity * i.price as line_total,
    t1.name topping_1, t2.name topping_2, t3.name topping_3,
    t4.name topping_4, t5.name topping_5, t6.name topping_6
from clients c
left join orders o on c.id = o.client_id
left join order_details od on o.id = od.order_id
left join items i on od.item_id = i.id
left join toppings t1 on od.topping_1 = t1.id
left join toppings t2 on od.topping_2 = t2.id
left join toppings t3 on od.topping_3 = t3.id
left join toppings t4 on od.topping_4 = t4.id
left join toppings t5 on od.topping_5 = t5.id
left join toppings t6 on od.topping_6 = t6.id
id | name   | order_number | item_id | name                | price | line_total | topping_1  | topping_2 | topping_3 | topping_4 | topping_5   | topping_6
-: | :----- | -----------: | ------: | :------------------ | ----: | ---------: | :--------- | :-------- | :-------- | :-------- | :---------- | :--------
 1 | Andrew |            1 |       1 | custom pizza        | 20.00 |      20.00 | mozzarella | parma ham | mushrooms | olives    | red peppers | salmon
 1 | Andrew |            1 |       2 | 1.5 litre coca-cola |  5.00 |       5.00 | null       | null      | null      | null      | null        | null
db<>fiddle here
In general, the storage systems of shopping sites use a relation between an order and its order lines. You could organize your DB like this to address your problems:
| order | order_line | product | ingredient | ingredient_for_product |
|:------|:-----------|:--------|:-----------|:-----------------------|
| id | order_id | product_id | ingredient_id | ingredient_id |
| ... | quantity | current_unit_price | name | order_line_id |
| | unit_price | product_type | additional_price | quantity |
| | product_id | | | |
product is an abstract concept that holds all products sold by your company. If you need to be more precise, you can either add fields that are filled in depending on the value of product_type, or create another table with a one-to-one relationship with the product table.
You have a unit_price in the order_line table and a current_unit_price in the product table. This has two uses. First, if you change the price of your product afterwards, you still keep the price your customer actually paid. Second, it allows you to store a price that differs from your registered current_unit_price, for example one that includes the cost of supplementary ingredients.
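The point about keeping unit_price on the order line can be illustrated with a short sketch. It uses Python's built-in sqlite3 module, names the order table orders (ORDER is a reserved word), and stores prices as integer cents; the table layout beyond the columns named in the answer, the add_line and order_total helpers, and the sample products are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id         INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    product_type       TEXT NOT NULL,
    current_unit_price INTEGER NOT NULL   -- price in cents (assumption)
);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders (id),
    product_id INTEGER NOT NULL REFERENCES product (product_id),
    quantity   INTEGER NOT NULL,
    unit_price INTEGER NOT NULL,          -- price paid, copied at order time
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO product VALUES
    (1, 'custom pizza', 'pizza', 2000),
    (2, '1.5 litre coca-cola', 'drink', 500);
INSERT INTO orders VALUES (1);
""")


def add_line(order_id, product_id, quantity):
    """Add an order line, snapshotting the product's current price."""
    conn.execute(
        "INSERT INTO order_line (order_id, product_id, quantity, unit_price) "
        "SELECT ?, product_id, ?, current_unit_price "
        "FROM product WHERE product_id = ?",
        (order_id, quantity, product_id),
    )


def order_total(order_id):
    """Total of an order, computed from the prices stored on its lines."""
    return conn.execute(
        "SELECT SUM(quantity * unit_price) FROM order_line WHERE order_id = ?",
        (order_id,),
    ).fetchone()[0]


add_line(1, 1, 2)   # two custom pizzas at 20.00
add_line(1, 2, 1)   # one drink at 5.00

# A later price change must not affect the existing order.
conn.execute("UPDATE product SET current_unit_price = 2500 WHERE product_id = 1")

total = order_total(1)
current_price = conn.execute(
    "SELECT current_unit_price FROM product WHERE product_id = 1"
).fetchone()[0]
print(total, current_price)
```

The order total stays at the amount the customer paid, while new orders would pick up the updated current_unit_price.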
You might want to take a look at the definitions of fact tables and dimension tables. Once you understand this concept, it will be clear that the order table is a fact table and that tables such as ingredients are dimension tables.
Finding all entries with no new reference in another table within last two years
I have the following three tables:
CREATE TABLE group (
    id SERIAL PRIMARY KEY,
    name VARCHAR NOT NULL,
    insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer (
    id SERIAL PRIMARY KEY,
    ext_id VARCHAR NOT NULL,
    insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer_in_group (
    id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    group_id INT NOT NULL,
    insert_date TIMESTAMP WITH TIME ZONE NOT NULL,
    CONSTRAINT customer_id_fk FOREIGN KEY(customer_id) REFERENCES customer(id),
    CONSTRAINT group_id_fk FOREIGN KEY(group_id) REFERENCES group(id)
)
I need to find all of the groups that have not been referenced by the group_id column of any customer_in_group row within the last two years. I then plan to delete all of the customer_in_group rows that reference those groups, and finally delete the groups themselves.
So basically, given the following two groups and the following three customer_in_group rows:
Group
| id | name   | insert_date              |
|----|--------|--------------------------|
| 1  | group1 | 2011-10-05T14:48:00.000Z |
| 2  | group2 | 2011-10-05T14:48:00.000Z |
Customer In Group
| id | group_id | customer_id | insert_date              |
|----|----------|-------------|--------------------------|
| 1  | 1        | 1           | 2011-10-05T14:48:00.000Z |
| 2  | 1        | 1           | 2020-10-05T14:48:00.000Z |
| 3  | 2        | 1           | 2011-10-05T14:48:00.000Z |
I would expect to get back just group2, since group1 has a customer_in_group row referencing it that was inserted in the last two years. I am not sure how to write the query that finds all of these groups.
As a starter, I would recommend enabling on delete cascade on the foreign keys of customer_in_group. Then you can just delete the rows you want from groups, and the dependent rows in the child table will be dropped as well. For this, you can use not exists:
delete from groups g
where not exists (
    select 1
    from customer_in_group cig
    where cig.group_id = g.id
    and cig.insert_date >= now() - interval '2 year'
)
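Here is a minimal sketch of that approach on the sample data from the question, using Python's built-in sqlite3 module. Several things are assumptions made so the example is deterministic and runs on SQLite: the table is named groups, timestamps are stored as ISO-8601 text, the cutoff is passed in as a fixed parameter (CUTOFF) instead of now() - interval '2 year', and the DELETE refers to groups.id directly rather than through an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE groups (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, insert_date TEXT NOT NULL
);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY, ext_id TEXT NOT NULL, insert_date TEXT NOT NULL
);
CREATE TABLE customer_in_group (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (id),
    group_id    INTEGER NOT NULL REFERENCES groups (id) ON DELETE CASCADE,
    insert_date TEXT NOT NULL
);
INSERT INTO groups VALUES
    (1, 'group1', '2011-10-05T14:48:00.000Z'),
    (2, 'group2', '2011-10-05T14:48:00.000Z');
INSERT INTO customer VALUES (1, 'ext-1', '2011-10-05T14:48:00.000Z');
INSERT INTO customer_in_group VALUES
    (1, 1, 1, '2011-10-05T14:48:00.000Z'),
    (2, 1, 1, '2020-10-05T14:48:00.000Z'),
    (3, 1, 2, '2011-10-05T14:48:00.000Z');
""")

# Hypothetical fixed cutoff standing in for "two years ago".
CUTOFF = "2019-10-05T00:00:00.000Z"

# A group is stale when no membership row at or after the cutoff exists.
STALE = """NOT EXISTS (
    SELECT 1 FROM customer_in_group cig
    WHERE cig.group_id = groups.id AND cig.insert_date >= ?
)"""

stale_names = [r[0] for r in conn.execute(
    "SELECT name FROM groups WHERE " + STALE, (CUTOFF,))]

# Deleting the stale groups also removes their membership rows (cascade).
conn.execute("DELETE FROM groups WHERE " + STALE, (CUTOFF,))

remaining_groups = [r[0] for r in conn.execute(
    "SELECT name FROM groups ORDER BY id")]
remaining_links = [r[0] for r in conn.execute(
    "SELECT id FROM customer_in_group ORDER BY id")]
print(stale_names, remaining_groups, remaining_links)
```

Running the SELECT first is a cheap way to review what would be removed before issuing the DELETE.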
How to add foreign key constraint to Table A (id, type) referencing either of two tables Table B (id, type) or Table C (id, type)?
I'm looking to use two columns in Table A as foreign keys for either one of two tables: Table B or Table C. Using columns table_a.item_id and table_a.item_type_id, I want to force any new row to have a matching item_id and item_type_id in either Table B or Table C.
Example:
Table A: Inventory
+---------+--------------+-------+
| item_id | item_type_id | count |
+---------+--------------+-------+
|       2 |            1 |    32 |
|       3 |            1 |    24 |
|       1 |            2 |    10 |
+---------+--------------+-------+
Table B: Recipes
+----+--------------+-------------------+-------------+----------------------+
| id | item_type_id | name              | consistency | gram_to_fluid_ounces |
+----+--------------+-------------------+-------------+----------------------+
|  1 |            1 | Delicious Juice   | thin        |             .0048472 |
|  2 |            1 | Ok Tasting Juice  | thin        |             .0057263 |
|  3 |            1 | Protein Smoothie  | heavy       |             .0049847 |
+----+--------------+-------------------+-------------+----------------------+
Table C: Products
+----+--------------+----------+--------+----------+----------+
| id | item_type_id | name     | price  | in_stock | is_taxed |
+----+--------------+----------+--------+----------+----------+
|  1 |            2 | Purse    | $200   | TRUE     | TRUE     |
|  2 |            2 | Notebook | $14.99 | TRUE     | TRUE     |
|  3 |            2 | Computer | $1,099 | FALSE    | TRUE     |
+----+--------------+----------+--------+----------+----------+
Other Table: Item_Types
+----+-----------+
| id | type_name |
+----+-----------+
|  1 | recipes   |
|  2 | products  |
+----+-----------+
I want to have an inventory table where employees can enter inventory counts regardless of whether an item is a recipe or a product. I don't want separate product_inventory and recipe_inventory tables, as there are many operations I need to do across all inventory items regardless of item type.
One solution would be to create a reference table like so:
Table CD: Items
+---------+--------------+------------+-----------+
| item_id | item_type_id | product_id | recipe_id |
+---------+--------------+------------+-----------+
|       2 |            1 | NULL       |         2 |
|       3 |            1 | NULL       |         3 |
|       1 |            2 | 1          |      NULL |
+---------+--------------+------------+-----------+
It just seems very cumbersome, plus I'd now need to add/remove products/recipes from this new table whenever they are added to or removed from their respective tables. (Is there an automatic way to achieve this?)
CREATE TABLE [dbo].[inventory] (
    [id] [bigint] IDENTITY(1,1) NOT NULL,
    [item_id] [smallint] NOT NULL,
    [item_type_id] [tinyint] NOT NULL,
    [count] [float] NOT NULL,
    CONSTRAINT [PK_inventory_id] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
What I would really like to do is something like this...
ALTER TABLE [inventory]
ADD CONSTRAINT [FK_inventory_sources] FOREIGN KEY ([item_id],[item_type_id])
REFERENCES {[products] ([id],[item_type_id]) OR [recipes] ([id],[item_type_id])}
Maybe there is no solution as I'm describing it, so if you have any ideas where I can maintain the same or a similar schema, I'm definitely open to hearing them! Thanks :)
Since your products and recipes are stored separately, and appear to have mostly separate columns, separate inventory tables are probably the correct approach, e.g.
CREATE TABLE dbo.ProductInventory (
    Product_id INT NOT NULL,
    [count] INT NOT NULL,
    CONSTRAINT FK_ProductInventory__Product_id FOREIGN KEY (Product_id) REFERENCES dbo.Product (Product_id)
);
CREATE TABLE dbo.RecipeInventory (
    Recipe_id INT NOT NULL,
    [count] INT NOT NULL,
    CONSTRAINT FK_RecipeInventory__Recipe_id FOREIGN KEY (Recipe_id) REFERENCES dbo.Recipe (Recipe_id)
);
If you need all types combined, you can simply use a view:
CREATE VIEW dbo.Inventory
AS
SELECT Product_id AS item_id, 2 AS item_type_id, [Count] FROM ProductInventory
UNION ALL
SELECT recipe_id AS item_id, 1 AS item_type_id, [Count] FROM RecipeInventory;
GO
If you create a new item_type, you need to amend the DB design anyway to create a new table, so you would just need to amend the view at the same time.
Another possibility would be to have a single Items table and have Products/Recipes reference it. So you start with your items table, each row of which has a unique ID:
CREATE TABLE dbo.Items (
    item_id INT IDENTITY(1, 1) NOT NULL,
    Item_type_id INT NOT NULL,
    CONSTRAINT PK_Items__ItemID PRIMARY KEY (item_id),
    CONSTRAINT FK_Items__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
    CONSTRAINT UQ_Items__ItemID_ItemTypeID UNIQUE (Item_ID, Item_type_id)
);
Note the unique key added on (item_id, item_type_id); this is important for referential integrity later on.
Then each of your sub tables has a 1:1 relationship with this, so your product table would become:
CREATE TABLE dbo.Products (
    item_id INT NOT NULL,
    Item_type_id AS 2 PERSISTED,
    name VARCHAR(50) NOT NULL,
    Price DECIMAL(10, 4) NOT NULL,
    InStock BIT NOT NULL,
    CONSTRAINT PK_Products__ItemID PRIMARY KEY (item_id),
    CONSTRAINT FK_Products__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
    CONSTRAINT FK_Products__ItemID_ItemTypeID FOREIGN KEY (item_id, Item_Type_ID) REFERENCES dbo.Items (item_id, item_type_id)
);
A few things to note:
item_id is again the primary key, ensuring the 1:1 relationship.
The computed column item_type_id (as 2) ensures that every item_type_id is set to 2. This is key, as it allows a foreign key constraint to be added. The column is marked PERSISTED so that it can take part in a foreign key.
The foreign key on (item_id, item_type_id) points back to the items table. This ensures that you can only insert a record into the product table if the original record in the items table has an item_type_id of 2.
A third option would be a single table for recipes and products, making any columns not required for both nullable. This answer on types of inheritance is well worth a read.
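The supertype/subtype idea from the second option can be sketched in a runnable form. This example uses Python's built-in sqlite3 module, so it is not SQL Server syntax: in place of the computed column, it swaps in an ordinary column with a DEFAULT and a CHECK constraint that pins item_type_id to 2. The table and column names follow the answer loosely, and try_insert_product is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE item_types (id INTEGER PRIMARY KEY, type_name TEXT NOT NULL);
CREATE TABLE items (
    item_id      INTEGER PRIMARY KEY,
    item_type_id INTEGER NOT NULL REFERENCES item_types (id),
    UNIQUE (item_id, item_type_id)       -- target of the composite FK below
);
CREATE TABLE products (
    item_id      INTEGER PRIMARY KEY,    -- 1:1 with items
    -- Stand-in for the computed column: always 2 ('products').
    item_type_id INTEGER NOT NULL DEFAULT 2 CHECK (item_type_id = 2),
    name         TEXT NOT NULL,
    FOREIGN KEY (item_id, item_type_id)
        REFERENCES items (item_id, item_type_id)
);
INSERT INTO item_types VALUES (1, 'recipes'), (2, 'products');
INSERT INTO items VALUES (1, 2), (2, 1);  -- item 1 is a product, item 2 a recipe
""")


def try_insert_product(item_id, name):
    """Return True if the product row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO products (item_id, name) VALUES (?, ?)",
                (item_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_product(1, "Purse")           # items row has type 2
wrong_type = try_insert_product(2, "Not a product")  # items row has type 1
missing = try_insert_product(99, "No such item")     # no items row at all
print(accepted, wrong_type, missing)
```

The composite foreign key is what rejects the second insert: (2, 2) does not exist in items, because item 2 is registered as a recipe.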
I think there is a flaw in your database design. The best way to solve your actual problem is to have recipes and products in one single table. Right now you have a redundant column called item_type_id in each table. That column is not worth anything unless the items actually live in the same table. I say redundant because it has the same value for absolutely every entry in each table. You have two options. If you cannot change the database design, work without foreign keys and make the logic layer select from the correct tables. Or, if you can change the database design, make products and recipes exist in the same table. You already have an item_type table, which can identify the item category, so it makes sense to put all items in the same table.
You can only add one constraint for a column or pair of columns. Think about apples and oranges. A column cannot refer to both oranges and apples. It must be either orange or apple. As a side note, this can be somewhat achieved with PERSISTED COMPUTED columns; however, it only introduces overhead and complexity. Check this for reference.
You can add some computed columns to the Inventory table:
ALTER TABLE Inventory ADD _recipe_item_id AS CASE WHEN item_type_id = 1 THEN item_id END persisted
ALTER TABLE Inventory ADD _product_item_id AS CASE WHEN item_type_id = 2 THEN item_id END persisted
You can then add two separate foreign keys to the two tables, using those two columns instead of item_id. I'm assuming the item_type_id column in those two tables is already computed or constrained appropriately, but if not, you may want to consider that too. Because these computed columns are NULL when the wrong type is selected, and because SQL Server doesn't check FK constraints if at least one column value is NULL, they can both exist and only one or the other will be enforced at any time.
SQL Query 2 tables null results
I was asked this question in an interview: from the two tables below, write a query to pull customers with no sales orders. How many ways are there to write this query, and which would have the best performance?
Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate
Query:
SELECT *
FROM Customer C
RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL
Is my query correct, and are there other ways to write the query and get the same results?
I'm answering for MySQL instead of SQL Server, because you only tagged the question with SQL Server later, and I thought that, since this was an interview question, it wouldn't matter to you which DBMS this is for. Note, though, that the queries I wrote are standard SQL; they should run in every RDBMS out there. How each RDBMS handles those queries is another issue.
I wrote this little procedure for you to have a test case. It creates the tables customers and orders as you specified, and I added primary keys and foreign keys, as one usually would. There are no other indexes, as every column worth indexing here is already a primary key. 250 customers are created, and 100 of them made an order (though for convenience none of them more than once). A dump of the data follows; I posted the script just in case you want to play around a little by increasing the numbers.
delimiter $$
create procedure fill_table()
begin
    create table customers(customerId int primary key) engine=innodb;
    set @x = 1;
    while (@x <= 250) do
        insert into customers values(@x);
        set @x := @x + 1;
    end while;
    create table orders(
        orderId int auto_increment primary key,
        customerId int,
        orderDate timestamp,
        foreign key fk_customer (customerId) references customers(customerId)
    ) engine=innodb;
    insert into orders(customerId, orderDate)
    select customerId, now() - interval customerId day from customers order by rand() limit 100;
end $$
delimiter ;
call fill_table();
For me, this resulted in this:
CREATE TABLE `customers` (
    `customerId` int(11) NOT NULL,
    PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250); CREATE TABLE `orders` ( `orderId` int(11) NOT NULL AUTO_INCREMENT, `customerId` int(11) DEFAULT NULL, `orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`orderId`), KEY `fk_customer` (`customerId`), CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`) ) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8; INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 
20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 
20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07'); Okay, now to the queries. Three ways came to my mind, I omitted the right join that MDiesel did, because it's actually just another way of writing left join. It was invented for lazy sql developers, that don't want to switch table names, but instead just rewrite one word. 
Anyway, first query:
select c.* from customers c left join orders o on c.customerId = o.customerId where o.customerId is null;
Results in an execution plan like this:
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using index              |
|  1 | SIMPLE      | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
Second query:
select c.* from customers c where c.customerId not in (select distinct customerId from orders);
Results in an execution plan like this:
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type        | table  | type           | possible_keys | key         | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
|  1 | PRIMARY            | c      | index          | NULL          | PRIMARY     | 4       | NULL |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer   | fk_customer | 5       | func |    2 | Using index              |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
Third query:
select c.* from customers c where not exists (select 1 from orders o where o.customerId = c.customerId);
Results in an execution plan like this:
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | PRIMARY            | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
We can see in all execution plans that the customers table is read as a whole, but from the index (the implicit one, as the only column is the primary key). This may change when you select other columns from the table that are not in an index.
The first one seems to be the best. For each row in customers, only one row in orders is read. The id column suggests that MySQL can do this in one step, as only indexes are involved.
The second query seems to be the worst (though none of the three queries should perform too badly). For each row in customers, the subquery is executed (the select_type column tells us this).
The third query is not much different in that it uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far now. If you're interested, here's the manual page that explains what each column and its values mean: EXPLAIN output
Finally: I'd say that the first query will perform best, but as always, in the end one has to measure, measure, and measure again.
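As a quick sanity check that the three query forms are interchangeable on clean data, here is a small sketch using Python's built-in sqlite3 module instead of MySQL. The five customers and three orders are made-up sample rows, and the queries select only customerId so the results are easy to compare; it says nothing about performance, which has to be measured on the real DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
CREATE TABLE orders (
    orderId    INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES customers (customerId),
    orderDate  TEXT
);
INSERT INTO customers VALUES (1), (2), (3), (4), (5);
INSERT INTO orders (customerId, orderDate) VALUES
    (2, '2014-02-24'), (4, '2014-02-22'), (4, '2014-02-23');
""")

# The three formulations from the answer, keyed by a short label.
QUERIES = {
    "left join": """
        select c.customerId from customers c
        left join orders o on c.customerId = o.customerId
        where o.customerId is null""",
    "not in": """
        select c.customerId from customers c
        where c.customerId not in (select distinct customerId from orders)""",
    "not exists": """
        select c.customerId from customers c
        where not exists (select 1 from orders o
                          where o.customerId = c.customerId)""",
}

results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in QUERIES.items()
}

for label, ids in results.items():
    print(label, ids)
```

All three return the customers without orders, here 1, 3 and 5, as long as orders.customerId contains no NULL values.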
I can think of two other ways to write this query:
SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL

SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN (SELECT CustomerID FROM SalesOrder)
The solutions involving outer joins will generally perform better than a solution using NOT IN, although how much depends on the RDBMS and its optimizer.
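Apart from performance, the variants also differ in how they treat NULL, which matters for correctness. The sketch below (Python's built-in sqlite3 module, made-up sample rows, and a hypothetical run helper) shows two effects under standard SQL three-valued logic: comparing with = NULL never matches, and NOT IN returns nothing once the subquery produces a NULL, for example from an order whose customerId is not set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
CREATE TABLE orders (
    orderId    INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES customers (customerId)  -- nullable
);
INSERT INTO customers VALUES (1), (2), (3), (4), (5);
-- The last order has no customer assigned (NULL).
INSERT INTO orders (customerId) VALUES (2), (4), (NULL);
""")


def run(sql):
    """Run a query and return the selected customer ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# '= NULL' is never true, so this filter removes every row.
equals_null = run("""
    select c.customerId from customers c
    left join orders o on c.customerId = o.customerId
    where o.customerId = NULL""")

# 'IS NULL' is the correct test for "no matching order".
is_null = run("""
    select c.customerId from customers c
    left join orders o on c.customerId = o.customerId
    where o.customerId is null""")

# NOT IN yields NULL (not true) for unmatched ids when the list contains a NULL.
not_in = run("""
    select c.customerId from customers c
    where c.customerId not in (select customerId from orders)""")

# NOT EXISTS is unaffected by the NULL row.
not_exists = run("""
    select c.customerId from customers c
    where not exists (select 1 from orders o
                      where o.customerId = c.customerId)""")

print(equals_null, is_null, not_in, not_exists)
```

If the NOT IN form is kept, adding "where customerId is not null" to the subquery avoids the empty result.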