How to represent order content in SQL [closed] - sql
I am creating a pizza shipping website, and I need to represent the orders and their content somewhere in the database.
The problem is, I have absolutely no idea how to store items and their quantities.
My first thought is to create an order_content table that holds the order's id as a foreign key, plus two extra columns: item id and quantity.
Another problem: I have multiple types of items (pizzas, drinks, extras, etc.), so IDs aren't unique across categories. I can't just store, for example, item_id 1 with quantity 1 in order_content, because item_id 1 could mean the drink with id 1, the pizza with id 1, and so on.
Another big problem: I have a custom pizza that can have 3 to 6 custom ingredients. The ingredients are in a table with their own unique IDs. How can I represent this custom pizza in an order?
Thank you
PS: I am fairly new to SQL and relational databases.
Here is a schema that you should experiment with and modify to suit your needs.
create table clients(
id serial,
name varchar(25) not null,
email varchar(25) not null,
telephone varchar(25),
constraint pk_clients_id primary key (id),
constraint uq_clients_email unique(email)
);
create table items(
id serial,
name varchar(25),
price decimal(5,2),
constraint pk_items_id primary key (id)
);
create table orders(
id serial,
client_id int,
constraint pk_orders_id primary key (id),
constraint fk_orders_client foreign key (client_id) references clients(id)
);
create table toppings(
id serial,
name varchar(25) not null,
constraint pk_topping primary key (id));
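create table order_details(
order_id int,
item_id int,
quantity int,
-- topping columns stay null for items that have no toppings
topping_1 int references toppings(id),
topping_2 int references toppings(id),
topping_3 int references toppings(id),
topping_4 int references toppings(id),
topping_5 int references toppings(id),
topping_6 int references toppings(id),
constraint fk_order_details_order_id foreign key (order_id) references orders(id),
constraint fk_order_details_item_id foreign key (item_id) references items(id),
constraint pk_order_details_order_item_ids primary key(order_id, item_id)
);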
insert into clients (name, email, telephone) values ('Andrew','andrew@gmail.com','0123456789');
insert into items (name, price) values('custom pizza',20.00),('1.5 litre coca-cola',5.00);
insert into toppings (name) values('mozzarella'),('parma ham'),('mushrooms'),('olives'),('red peppers'),('salmon');
1 rows affected
2 rows affected
6 rows affected
with order_number as
(insert into orders (client_id) values (1)
returning id)
insert into order_details
select order_number.id,1,1,1,2,3,4,5,6 from order_number
union all
select order_number.id,2,1,null,null,null,null,null,null from order_number;
2 rows affected
select
c.id,
c.name,
o.id order_number,
od.item_id,
i.name,
i.price,
od.quantity * i.price as line_total,
t1.name topping_1,
t2.name topping_2,
t3.name topping_3,
t4.name topping_4,
t5.name topping_5,
t6.name topping_6
from clients c
left join orders o on c.id = o.client_id
left join order_details od on o.id = od.order_id
left join items i on od.item_id = i.id
left join toppings t1 on od.topping_1 = t1.id
left join toppings t2 on od.topping_2 = t2.id
left join toppings t3 on od.topping_3 = t3.id
left join toppings t4 on od.topping_4 = t4.id
left join toppings t5 on od.topping_5 = t5.id
left join toppings t6 on od.topping_6 = t6.id
id | name | order_number | item_id | name | price | line_total | topping_1 | topping_2 | topping_3 | topping_4 | topping_5 | topping_6
-: | :----- | -----------: | ------: | :------------------ | ----: | ---------: | :--------- | :-------- | :-------- | :-------- | :---------- | :--------
1 | Andrew | 1 | 1 | custom pizza | 20.00 | 20.00 | mozzarella | parma ham | mushrooms | olives | red peppers | salmon
1 | Andrew | 1 | 2 | 1.5 litre coca-cola | 5.00 | 5.00 | null | null | null | null | null | null
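As an alternative to the six topping_N columns, the toppings can be stored one per row in a junction table. The following is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs anywhere, and the table names order_lines and order_line_toppings are hypothetical, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE items    (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price NUMERIC NOT NULL);
CREATE TABLE toppings (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders   (id INTEGER PRIMARY KEY);
-- One row per line of an order (hypothetical table name).
CREATE TABLE order_lines (
    line_id  INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    item_id  INTEGER NOT NULL REFERENCES items(id),
    quantity INTEGER NOT NULL CHECK (quantity > 0)
);
-- One row per topping chosen for a line (hypothetical junction table).
CREATE TABLE order_line_toppings (
    line_id    INTEGER NOT NULL REFERENCES order_lines(line_id),
    topping_id INTEGER NOT NULL REFERENCES toppings(id),
    PRIMARY KEY (line_id, topping_id)
);
INSERT INTO items (name, price) VALUES ('custom pizza', 20.00), ('1.5 litre coca-cola', 5.00);
INSERT INTO toppings (name) VALUES ('mozzarella'), ('parma ham'), ('mushrooms'), ('olives');
INSERT INTO orders (id) VALUES (1);
INSERT INTO order_lines (order_id, item_id, quantity) VALUES (1, 1, 1), (1, 2, 1);
INSERT INTO order_line_toppings VALUES (1, 1), (1, 2), (1, 3);
""")

# One result row per order line, with its toppings collapsed into a string.
rows = conn.execute("""
    SELECT i.name, ol.quantity, group_concat(t.name, ', ') AS toppings
    FROM order_lines ol
    JOIN items i ON i.id = ol.item_id
    LEFT JOIN order_line_toppings olt ON olt.line_id = ol.line_id
    LEFT JOIN toppings t ON t.id = olt.topping_id
    GROUP BY ol.line_id, i.name, ol.quantity
    ORDER BY ol.line_id
""").fetchall()

# Number of toppings per line, e.g. for a 3-to-6 check in application code.
topping_counts = dict(conn.execute(
    "SELECT line_id, COUNT(*) FROM order_line_toppings GROUP BY line_id"))
```

With this layout the number of toppings is not fixed by the schema, so a rule such as "3 to 6 ingredients" would probably have to be checked in application code or a trigger.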
In general, shopping sites model this as a relation between an order and its order lines.
You could organize your database like this, which addresses your problems:
| order | order_line | product | ingredient | ingredient_for_product |
|---|---|---|---|---|
| id | order_id | product_id | ingredient_id | ingredient_id |
| ... | quantity | current_unit_price | name | order_line_id |
| | unit_price | product_type | additional_price | quantity |
| | product_id | | | |
product is an abstract concept that covers everything your company sells. If you need more detail, you can either add fields that are filled in depending on the value of product_type, or create another table with a one-to-one relationship to the product table.
You have a unit_price in the order_line table and a current_unit_price in the product table.
This has two uses:
If you change the price of a product afterwards, you still keep the price your customer paid.
It lets you store a price that differs from the registered current_unit_price, for example by adding the cost of supplementary ingredients.
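The two uses of unit_price can be sketched as follows. This is a hedged illustration rather than a definitive implementation: it runs on Python's built-in sqlite3 module, stores prices as integer cents (an assumption), and the helper add_line with its extras_cents parameter is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id         INTEGER PRIMARY KEY,
    product_type       TEXT NOT NULL,
    current_unit_price INTEGER NOT NULL   -- price in cents (an assumption)
);
CREATE TABLE order_line (
    order_line_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL,
    product_id    INTEGER NOT NULL REFERENCES product(product_id),
    quantity      INTEGER NOT NULL,
    unit_price    INTEGER NOT NULL        -- copied at order time
);
INSERT INTO product VALUES (1, 'pizza', 900);
""")

def add_line(order_id, product_id, quantity, extras_cents=0):
    """Hypothetical helper: snapshot the current price (plus extras) into the line."""
    conn.execute(
        """INSERT INTO order_line (order_id, product_id, quantity, unit_price)
           SELECT ?, product_id, ?, current_unit_price + ?
           FROM product WHERE product_id = ?""",
        (order_id, quantity, extras_cents, product_id))

add_line(1, 1, 2, extras_cents=150)

# A later price change does not touch what the customer already paid.
conn.execute("UPDATE product SET current_unit_price = 1100 WHERE product_id = 1")
paid = conn.execute(
    "SELECT quantity * unit_price FROM order_line WHERE order_id = 1").fetchone()[0]
```

Here paid is computed from the price stored on the order line, so it is unaffected by the later update of current_unit_price.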
You might want to look at the definitions of fact tables and dimension tables. Once you understand that concept, it will be clear that the order table is a fact table and that tables such as ingredients are dimension tables.
Related
How to add foreign key constraint to Table A (id, type) referencing either of two tables Table B (id, type) or Table C (id, type)?
I'm looking to use two columns in Table A as foreign keys for either one of two tables: Table B or Table C. Using columns table_a.item_id and table_a.item_type_id, I want to force any new rows to either have a matching item_id and item_type_id in Table B or Table C.
Example:
Table A: Inventory
| item_id | item_type_id | count |
|---|---|---|
| 2 | 1 | 32 |
| 3 | 1 | 24 |
| 1 | 2 | 10 |
Table B: Recipes
| id | item_type_id | name | consistency | gram_to_fluid_ounces |
|---|---|---|---|---|
| 1 | 1 | Delicious Juice | thin | .0048472 |
| 2 | 1 | Ok Tasting Juice | thin | .0057263 |
| 3 | 1 | Protein Smoothie | heavy | .0049847 |
Table C: Products
| id | item_type_id | name | price | in_stock | is_taxed |
|---|---|---|---|---|---|
| 1 | 2 | Purse | $200 | TRUE | TRUE |
| 2 | 2 | Notebook | $14.99 | TRUE | TRUE |
| 3 | 2 | Computer | $1,099 | FALSE | TRUE |
Other Table: Item_Types
| id | type_name |
|---|---|
| 1 | recipes |
| 2 | products |
I want to be able to have an inventory table where employees can enter inventory counts regardless of whether an item is a recipe or a product. I don't want to have to have a product_inventory and recipe_inventory table as there are many operations I need to do across all inventory items regardless of item types.
One solution would be to create a reference table like so:
Table CD: Items
| item_id | item_type_id | product_id | recipe_id |
|---|---|---|---|
| 2 | 1 | NULL | 2 |
| 3 | 1 | NULL | 3 |
| 1 | 2 | 1 | NULL |
It just seems very cumbersome, plus I'd now need to add/remove products/recipes from this new table whenever they are added/removed from their respective tables. (Is there an automatic way to achieve this?)
CREATE TABLE [dbo].[inventory] (
    [id] [bigint] IDENTITY(1,1) NOT NULL,
    [item_id] [smallint] NOT NULL,
    [item_type_id] [tinyint] NOT NULL,
    [count] [float] NOT NULL,
    CONSTRAINT [PK_inventory_id] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
What I would really like to do is something like this...
ALTER TABLE [inventory]
ADD CONSTRAINT [FK_inventory_sources] FOREIGN KEY ([item_id],[item_type_id])
REFERENCES {[products] ([id],[item_type_id]) OR [recipes] ([id],[item_type_id])}
Maybe there is no solution as I'm describing it, so if you have any ideas where I can maintain the same/similar schema, I'm definitely open to hearing them! Thanks :)
Since your products and recipes are stored separately, and appear to mostly have separate columns, separate inventory tables are probably the correct approach, e.g.
CREATE TABLE dbo.ProductInventory
(
    Product_id INT NOT NULL,
    [count] INT NOT NULL,
    CONSTRAINT FK_ProductInventory__Product_id FOREIGN KEY (Product_id) REFERENCES dbo.Product (Product_id)
);
CREATE TABLE dbo.RecipeInventory
(
    Recipe_id INT NOT NULL,
    [count] INT NOT NULL,
    CONSTRAINT FK_RecipeInventory__Recipe_id FOREIGN KEY (Recipe_id) REFERENCES dbo.Recipe (Recipe_id)
);
If you need all types combined, you can simply use a view:
CREATE VIEW dbo.Inventory
AS
    SELECT Product_id AS item_id, 2 AS item_type_id, [Count]
    FROM ProductInventory
    UNION ALL
    SELECT recipe_id AS item_id, 1 AS item_type_id, [Count]
    FROM RecipeInventory;
GO
If you create a new item_type, you need to amend the DB design anyway to create a new table, so you would just need to amend the view at the same time.
Another possibility would be to have a single Items table, and then have Products/Recipes reference it. So you start with your items table, each row of which has a unique ID:
CREATE TABLE dbo.Items
(
    item_id INT IDENTITY(1, 1) NOT NULL,
    Item_type_id INT NOT NULL,
    CONSTRAINT PK_Items__ItemID PRIMARY KEY (item_id),
    CONSTRAINT FK_Items__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
    CONSTRAINT UQ_Items__ItemID_ItemTypeID UNIQUE (Item_ID, Item_type_id)
);
Note the unique key added on (item_id, item_type_id); this is important for referential integrity later on.
Then each of your sub tables has a 1:1 relationship with this, so your product table would become:
CREATE TABLE dbo.Products
(
    item_id INT NOT NULL,
    Item_type_id AS 2 PERSISTED,
    name VARCHAR(50) NOT NULL,
    Price DECIMAL(10, 4) NOT NULL,
    InStock BIT NOT NULL,
    CONSTRAINT PK_Products__ItemID PRIMARY KEY (item_id),
    CONSTRAINT FK_Products__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
    CONSTRAINT FK_Products__ItemID_ItemTypeID FOREIGN KEY (item_id, Item_Type_ID) REFERENCES dbo.Items (item_id, item_type_id)
);
A few things to note:
item_id is again the primary key, ensuring the 1:1 relationship.
The computed column item_type_id (as 2) ensures all item_type_id values are set to 2. This is key, as it allows a foreign key constraint to be added. (A computed column has to be marked PERSISTED to take part in a foreign key.)
The foreign key on (item_id, item_type_id) points back to the items table. This ensures that you can only insert a record into the product table if the original record in the items table has an item_type_id of 2.
A third option would be a single table for recipes and products, making any columns not required for both nullable. This answer on types of inheritance is well worth a read.
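The Items supertype idea above can be tried out with a small runnable sketch. It is an approximation under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and it replaces the computed column with an ordinary column plus a CHECK constraint (a swapped-in technique, not what the answer uses). The column name item_count and the helper try_insert_product are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE items (
    item_id      INTEGER PRIMARY KEY,
    item_type_id INTEGER NOT NULL,        -- 1 = recipe, 2 = product
    UNIQUE (item_id, item_type_id)        -- target of the composite foreign key
);
CREATE TABLE products (
    item_id      INTEGER PRIMARY KEY,     -- primary key gives the 1:1 relationship
    -- CHECK stands in for the computed column "AS 2" (an assumption of this sketch)
    item_type_id INTEGER NOT NULL DEFAULT 2 CHECK (item_type_id = 2),
    name         TEXT NOT NULL,
    FOREIGN KEY (item_id, item_type_id) REFERENCES items (item_id, item_type_id)
);
-- Inventory only needs to reference the supertype.
CREATE TABLE inventory (
    item_id    INTEGER NOT NULL REFERENCES items (item_id),
    item_count INTEGER NOT NULL
);
INSERT INTO items VALUES (1, 1), (2, 2);
INSERT INTO products (item_id, name) VALUES (2, 'Notebook');
INSERT INTO inventory VALUES (1, 10), (2, 32);
""")

def try_insert_product(item_id, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO products (item_id, name) VALUES (?, ?)", (item_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

# Item 1 is registered as a recipe, so it cannot also become a product.
rejected = not try_insert_product(1, 'Purse')
```

The composite foreign key is what rejects the second insert: (1, 2) does not exist in items because item 1 was registered with type 1.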
I think there is a flaw in your database design. The best way to solve your actual problem is to have recipes and products in one single table. Right now you have a redundant column in each table called item_type_id. That column is not worth anything unless the items are actually in the same table. I say redundant because it has the same value for every entry in each table. You have two options. If you cannot change the database design, work without foreign keys and make the logic layer select from the correct tables. Or, if you can change the database design, make products and recipes exist in the same table. You already have an item_type table, which can identify the item category, so it makes sense to put all items in the same table.
A foreign key constraint on a column or pair of columns can reference only one table. Think about apples and oranges: a column cannot refer to both oranges and apples. It must be either orange or apple. As a side note, this can be achieved to some extent with PERSISTED COMPUTED columns, but it only introduces overhead and complexity. Check This for Reference
You can add some computed columns to the Inventory table:
ALTER TABLE Inventory ADD _recipe_item_id AS CASE WHEN item_type_id = 1 THEN item_id END persisted
ALTER TABLE Inventory ADD _product_item_id AS CASE WHEN item_type_id = 2 THEN item_id END persisted
You can then add two separate foreign keys to the two tables, using those two columns instead of item_id. I'm assuming the item_type_id column in those two tables is already computed/constrained appropriately, but if not you may want to consider that too. Because these computed columns are NULL when the wrong type is selected, and because SQL Server doesn't check FK constraints if at least one column value is NULL, both foreign keys can exist and only one or the other will be enforced for any given row.
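A related variation can be sketched with two ordinary nullable reference columns instead of persisted computed columns. This is a swapped-in technique, shown only to illustrate that a NULL foreign key value is not checked. It assumes Python's built-in sqlite3 module rather than SQL Server, and the names recipe_id, product_id, item_count and insert_ok are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE recipes  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE inventory (
    id         INTEGER PRIMARY KEY,
    recipe_id  INTEGER REFERENCES recipes (id),   -- NULL for products
    product_id INTEGER REFERENCES products (id),  -- NULL for recipes
    item_count INTEGER NOT NULL,
    -- exactly one of the two references must be filled in
    CHECK ((recipe_id IS NULL) <> (product_id IS NULL))
);
INSERT INTO recipes  VALUES (1, 'Delicious Juice');
INSERT INTO products VALUES (1, 'Purse');
INSERT INTO inventory (recipe_id, product_id, item_count)
VALUES (1, NULL, 32), (NULL, 1, 10);
""")

def insert_ok(recipe_id, product_id):
    """Return True if the inventory row was accepted."""
    try:
        conn.execute(
            "INSERT INTO inventory (recipe_id, product_id, item_count) VALUES (?, ?, 1)",
            (recipe_id, product_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    insert_ok(99, None),    # unknown recipe: foreign key fails
    insert_ok(1, 1),        # both set: CHECK fails
    insert_ok(None, None),  # neither set: CHECK fails
    insert_ok(None, 1),     # valid product row
]
```

The trade-off compared with the computed-column version is that the inventory table no longer has a single (item_id, item_type_id) pair, so queries across all items need a COALESCE or a view.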
Union all data unmanaged
Forgive me if this question has already been asked; I'm not much of a DB person. Here is what I tried:
select row_number() over (partition by name order by challanto_date), *
from (
    select rma, p.id, p.name, challanto_date, CURRENT_TIMESTAMP as fromDate
    from challan_to_vendor cv
    left join challan_to_vendor_detail cvd on cv.id = cvd.challan_to_vendor_id
    inner join main_product p on p.id = cvd.product_id
    union all
    select rma, p.id, p.name, CURRENT_TIMESTAMP as toDate, challan_date
    from challan_from_vendor cv
    left join challan_from_vendor_detail cvd on cv.id = cvd.challan_from_vendor_id
    inner join main_product p on p.id = cvd.product_id
) as a
Here are my create table scripts:
challan_from_vendor
CREATE TABLE public.challan_from_vendor (
    id character varying NOT NULL,
    date_ad date,
    rma integer DEFAULT 1,
    CONSTRAINT psk PRIMARY KEY (id)
)
challan_from_vendor_detail
CREATE TABLE public.challan_from_vendor_detail (
    id character varying NOT NULL,
    challan_from_id character varying,
    product_id character varying,
    CONSTRAINT psks PRIMARY KEY (id),
    CONSTRAINT fsks FOREIGN KEY (challan_from_id) REFERENCES public.challan_from_vendor (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
)
challan_to_vendor
CREATE TABLE public.challan_to_vendor (
    id character varying NOT NULL,
    date_ad date,
    rma integer DEFAULT 1,
    CONSTRAINT pk PRIMARY KEY (id)
)
challan_to_vendor_detail
CREATE TABLE public.challan_to_vendor_detail (
    id character varying NOT NULL,
    challan_to_id character varying,
    product_id character varying,
    CONSTRAINT pks PRIMARY KEY (id),
    CONSTRAINT fks FOREIGN KEY (challan_to_id) REFERENCES public.challan_to_vendor (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
)
product
CREATE TABLE public.product (
    id character varying NOT NULL,
    product_name character varying,
    CONSTRAINT pks PRIMARY KEY (id)
)
Here are my table structures and the desired output.
challan_from_vendor
| id | rma | date |
|:---|---:|:---:|
| 12012 | 0001 | 2018-11-10 |
| 123121 | 0001 | 2018-11-11 |
challan_to_vendor
| id | rma | date |
|:---|---:|:---:|
| 12 | 0001 | 2018-12-10 |
| 123 | 0001 | 2018-12-11 |
challan_from_vendor_detail
| id | challan_from_vendor_id | product_id |
|:---|---:|:---:|
| 121 | 12012 | 121313 |
| 1213 | 12012 | 131381 |
challan_to_vendor_detail
| id | challan_to_vendor_id | product_id |
|:---|---:|:---:|
| 121 | 12 | 121313 |
| 1213 | 123 | 131381 |
product
| id | product_name |
|:---|---:|
| 191313 | apple |
| 89113 | banana |
Output
| ram | product_id | challan_from_date | challan_to_date |
|:---|---:|:---:|:---:|
| 0001 | 191313 | 2018-11-10 | 2018-11-11 |
| 0001 | 89113 | 2018-12-10 | 2018-12-11 |
There are some strange things in the query you tried, so it is not clear which tables you have, how they are related, or what columns they contain. With some guessing, here is something to start with:
select main_product.*, challan_to_vendor.toDate, challan_from_vendor.fromDate
from main_product
join challan_to_vendor using(product_id)
join challan_from_vendor using(product_id)
If you explain more about your database and what you want out of it, I might be able to help you more.
Edit: I could not run your create statements in my database because of naming conflicts, among other minor things. Here is some advice on the create process that I find useful:
Make the ids integer instead of character varying; otherwise the column is probably a name column and should not be called id. You also used integer ids in your examples.
Use SERIAL PRIMARY KEY (see tutorial) to help you with key creation. This also removes the naming conflict, since the constraints are given implicit unique names.
Use the same column name for the same thing everywhere. This avoids the confusion of having several columns called id after a join, and it simplifies the join. For example, the id of the product should be product_id everywhere, so that you can use using(product_id) as your join condition.
With the advice above, here is how I would create one of your tables and then query them:
CREATE TABLE public.challan_to_vendor_detail (
    challan_to_vendor_detail_id SERIAL PRIMARY KEY,
    challan_to_vendor_id integer,
    product_id integer,
    CONSTRAINT fks FOREIGN KEY (challan_to_vendor_id) REFERENCES public.challan_to_vendor (challan_to_vendor_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
);
select product_name, challan_to_vendor.date_ad as date_to, challan_from_vendor.date_ad as date_from
from product
join challan_to_vendor_detail using(product_id)
join challan_to_vendor using(challan_to_vendor_id)
join challan_from_vendor_detail using(product_id)
join challan_from_vendor using(challan_from_vendor_id)
Unfortunately the overall database design does not make sense to me, so I do not know if this is what you expect. Good luck!
Sql request return wrong result
I have an issue with a SQL query or with my database structure. I would like to have a supplier/customer database. As the purchase price and the sales price of products can change, I'd like to keep a history of each order and sale so that I can build statistics on them in the future. I'm at the beginning of my project and using 6 simple tables:
Client: Client_ID (Pk), Name
OrderC: Order_ID (Pk), Client_ID (Fk), date
OrderDetail: OrderD_ID (Pk), Order_ID (Fk), Product_ID (Fk), Qty, PU_Vte (sales price), For_Cmd_ID (Fk from For_ID_CMD in table Forever_cmd)
Product: Product_ID (Pk), Name
Forever_cmd (supplier database): For_ID_CMD (Pk), date
For_Ord_Detail (detail of supplier order): For_Det_Id (Pk), ID_cmd_For (Fk from For_ID_CMD on Forever_cmd table), Product_ID (Fk source Product_ID on Product table), Qte (= quantity), PUHA (= supplier price)
Everything works fine until the same product appears on different supplier orders. My query then returns extra rows for the customers.
Example: I create 2 supplier orders (S1 and S2) containing the same product ID (P1) with different supplier prices. I create a customer order and choose to sell product P1 from S1. When I query for a view by client with their products, tracing them back to the supplier order, the result attaches the matching products from all supplier orders to the customer, instead of only the products the customer ordered. I don't know whether the problem is my query or the consistency of the database.
Here is my SQL query:
SELECT C.Name, O.order_id, F.For_ID_CMD, P.Name, D.Qte, D.PU_Vte, X.PUHA
FROM Client AS C
JOIN OrderC O ON C.Client_ID = O.Client_ID
JOIN OrderDetail D ON O.Order_ID = D.Order_ID
JOIN For_Ord_Detail X ON D.Product_ID = X.Product_ID
JOIN Forever_Cmd F ON F.For_ID_CMD = X.ID_cmd_For
JOIN Product P ON D.Product_ID = P.Product_ID;
Could someone please help me?
First, a quick code review:
"date" is a special word. I'd recommend not using it as a column header. Be a bit more descriptive about the type of date these are: OrderC_date and Forever_cmd_date, or OC_date and FC_date.
You've used both "Qty" and "Qte" for quantity. Be consistent. Or even better, be more descriptive of what those quantities are.
For your duplicates, remove your JOINs one at a time until you figure out which one is causing the dupes. I'm betting you need to narrow your JOIN criteria on one of those tables.
EDIT
This should point you in the right direction to find your duplicates.
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE Product ( Product_ID int, Name varchar(20) ) ;
INSERT INTO Product (Product_ID, Name)
VALUES (1, 'Widget1'), (2, 'Widget2') ;
CREATE TABLE Forever_cmd ( For_ID_CMD int, [date] date ) ;
INSERT INTO Forever_cmd ( For_ID_Cmd, [date] ) /* Why a date? */
VALUES (1,'10/26/1985'), (2,'6/27/2012');
CREATE TABLE For_Ord_Detail ( For_Det_Id int, ID_cmd_For int, Product_ID int, Qte int, PUHA decimal(10,2) ) ;
INSERT INTO For_Ord_Detail ( For_Det_Id, ID_cmd_For, Product_ID, Qte, PUHA )
VALUES (1,1,1,10,2.00), (2,1,2,10,12.00), (3,2,2,20,20.00) ;
Query 1:
SELECT F.For_ID_CMD, P.Name, X.PUHA
FROM Product P
LEFT OUTER JOIN For_Ord_Detail X ON P.Product_ID = X.Product_ID  -- <<<<<
LEFT OUTER JOIN Forever_Cmd F ON X.ID_cmd_For = F.For_ID_CMD
Results:
| For_ID_CMD | Name | PUHA |
|------------|---------|------|
| 1 | Widget1 | 2 |
| 1 | Widget2 | 12 | << Why did this "duplicate"?
| 2 | Widget2 | 20 | << Why did this "duplicate"?
Hint: How do you determine which supplier you get your product from if multiple suppliers have the same product?
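One possible reading of the hint, sketched under stated assumptions: the question's OrderDetail table already carries For_Cmd_ID, so the join to For_Ord_Detail can be narrowed to that supplier order as well as the product. The sketch below uses Python's built-in sqlite3 module instead of SQL Server, with made-up sample rows, and keeps only the two tables needed to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE For_Ord_Detail (
    For_Det_Id INTEGER PRIMARY KEY, ID_cmd_For INTEGER,
    Product_ID INTEGER, Qte INTEGER, PUHA REAL);
CREATE TABLE OrderDetail (
    OrderD_ID INTEGER PRIMARY KEY, Order_ID INTEGER, Product_ID INTEGER,
    Qty INTEGER, PU_Vte REAL, For_Cmd_ID INTEGER);
-- Product 1 was bought on two supplier orders at different prices.
INSERT INTO For_Ord_Detail VALUES (1, 1, 1, 10, 2.00), (2, 2, 1, 10, 3.00);
-- One customer line: product 1, sold from supplier order 1.
INSERT INTO OrderDetail VALUES (1, 1, 1, 4, 5.00, 1);
""")

# Joining on the product alone matches every supplier order containing it.
by_product_only = conn.execute("""
    SELECT D.OrderD_ID, X.ID_cmd_For, X.PUHA
    FROM OrderDetail D
    JOIN For_Ord_Detail X ON X.Product_ID = D.Product_ID
    ORDER BY X.ID_cmd_For
""").fetchall()

# Adding the supplier order to the join condition keeps one row per customer line.
by_product_and_supplier_order = conn.execute("""
    SELECT D.OrderD_ID, X.ID_cmd_For, X.PUHA
    FROM OrderDetail D
    JOIN For_Ord_Detail X
      ON X.Product_ID = D.Product_ID
     AND X.ID_cmd_For = D.For_Cmd_ID
""").fetchall()
```

The first query returns two rows for the single customer line, while the second returns only the row for supplier order 1.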
How do I delete duplicate records, or merge them with foreign-key constraints intact?
I have a database generated from XML documents, and it contains duplicate records. I know how to delete one record from the main table, but not records that have foreign-key constraints. I have a large number of XML documents, and they are inserted without checking for duplicates. One solution to removing duplicates is to just delete the lowest Primary_Key values (and all related foreign key records) and keep the highest. I don't know how to do that, though. The database looks like this:
Table 1: [type]
| Primary_Key | Food_ID | Food_Type |
|---|---|---|
| 70001 | 12345 | fruit |
| 70002 | 12345 | fruit |
| 70003 | 12345 | meat |
Table 2: [food_information] (Information_ID is linked to the primary key in the first table)
| Primary_Key | Information_ID | Food_Name | Information | Comments |
|---|---|---|---|---|
| 0001 | 70001 | banana | buy @ toms | delicious! |
| 0002 | 70002 | banana | buy @ mats | so-so |
| 0003 | 70003 | decade meat | buy @ sals | disgusting |
There are several other linked tables as well, which all have a foreign key value of the matched primary key value in the main table ([type]).
My question, based on which solution might be best:
1. How do I delete all of those records, except 70003 (the highest one)? We can't know if it's a duplicate record unless [Food_ID] shows up more than once. If it shows up more than once, we need to delete records from ALL tables (there are 10) based on the Primary_Key and Foreign_Key relationship.
2. How do I update/merge these SQL records on insertion to avoid having to delete multiples again?
I'd prefer #1, as it prevents me from having to rebuild the database, and it makes inserting much easier. Thanks!
Even if a [foodID] is not duplicated, you still get a max(Primary_Key) for it, so that row will not be deleted. Note that the where condition is NOT IN. Delete from each child table first:
delete tableX where tableX.informationID not in ( select max(Primary_Key) from [type] group by [foodID] )
Then do [type] last:
delete [type] where [type].[Primary_Key] not in ( select max(Primary_Key) from [type] group by [foodID] )
Then create a unique constraint on [foodID].
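The steps above can be sketched end to end. This is a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, only one child table instead of ten, and an extra non-duplicated row (70004) added here purely to show that it survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "type" (Primary_Key INTEGER PRIMARY KEY, Food_ID INTEGER, Food_Type TEXT);
CREATE TABLE food_information (
    Primary_Key    INTEGER PRIMARY KEY,
    Information_ID INTEGER REFERENCES "type" (Primary_Key),
    Food_Name      TEXT);
INSERT INTO "type" VALUES
    (70001, 12345, 'fruit'), (70002, 12345, 'fruit'),
    (70003, 12345, 'meat'),  (70004, 11111, 'taco');
INSERT INTO food_information VALUES
    (1, 70001, 'banana'), (2, 70002, 'banana'),
    (3, 70003, 'decade meat'), (4, 70004, 'taco taco');
""")

# The rows to keep: the highest Primary_Key for each Food_ID.
KEEP = 'SELECT MAX(Primary_Key) FROM "type" GROUP BY Food_ID'

with conn:  # one transaction: children first, parent last, then the constraint
    conn.execute(f"DELETE FROM food_information WHERE Information_ID NOT IN ({KEEP})")
    conn.execute(f'DELETE FROM "type" WHERE Primary_Key NOT IN ({KEEP})')
    conn.execute('CREATE UNIQUE INDEX uq_type_food_id ON "type" (Food_ID)')

remaining_types = [r[0] for r in conn.execute(
    'SELECT Primary_Key FROM "type" ORDER BY Primary_Key')]
remaining_info = [r[0] for r in conn.execute(
    "SELECT Primary_Key FROM food_information ORDER BY Primary_Key")]
```

The order matters: the child deletes must run while the parent table still contains all rows, because the keep-list is computed from the parent.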
Something like this. Assumed schema:
create table food ( primary_key int, food_id int, food_type varchar(20) );
insert into food values (70001,12345,'fruit');
insert into food values (70002,12345,'fruit');
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
create table info ( primary_key int, info_id int, food_name varchar(20) );
insert into info values (1,70001,'banana');
insert into info values (2,70002,'banana');
insert into info values (3,70003,'decade meat');
insert into info values (4,70004,'taco taco');
And then:
-- yields: 12345 70003
select food_id, max(info_id) as max_info_id
from food join info on food.primary_key=info.info_id
where food_id in (
    select food_id
    from food join info on food.primary_key=info.info_id
    group by food_id
    having count(*)>1)
group by food_id;
Then something like this to get the ones to delete. There might be a better way to write this; I'm thinking about it.
select *
from food
join info on food.primary_key=info.info_id
join (
    select food_id, max(info_id) as max_info_id
    from food join info on food.primary_key=info.info_id
    where food_id in (
        select food_id
        from food join info on food.primary_key=info.info_id
        group by food_id
        having count(*)>1)
    group by food_id
) as dont_delete on food.food_id=dont_delete.food_id and info.info_id<max_info_id
This gives you:
PRIMARY_KEY | FOOD_ID | FOOD_TYPE | INFO_ID | FOOD_NAME | MAX_INFO_ID
70001 | 12345 | fruit | 70001 | banana | 70003
70002 | 12345 | fruit | 70002 | banana | 70003
So you could just do
delete from food where primary_key in (select food.primary_key from that_big_query_up_there)
and
delete from info where info_id in (select food.primary_key from that_big_query_up_there)
For future issues, maybe consider a unique constraint on food, unique(primary_key,food_id) or something. But if it's one-to-one, why don't you just store them together?
SQL Query 2 tables null results
I was asked this question in an interview: From the 2 tables below, write a query to pull customers with no sales orders. How many ways are there to write this query, and which would have the best performance?
Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate
Query:
SELECT *
FROM Customer C
RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL
Is my query correct and are there other ways to write the query and get the same results?
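As a hedged illustration of the query in the question: a comparison written as = NULL is never true, so that filter returns no rows, and a RIGHT OUTER JOIN with Customer on the left preserves all sales orders rather than all customers. The sketch below uses Python's built-in sqlite3 module instead of SQL Server, with made-up sample rows, and writes the join as a LEFT JOIN from Customer; the helper ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE SalesOrder (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    OrderDate  TEXT);
INSERT INTO Customer VALUES (1), (2), (3);
INSERT INTO SalesOrder VALUES (10, 1, '2014-01-01'), (11, 1, '2014-02-01');
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# "= NULL" evaluates to unknown for every row, so nothing passes the filter.
equals_null = ids("""
    SELECT C.CustomerID FROM Customer C
    LEFT JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
    WHERE SO.OrderID = NULL""")

# Three common ways to express "customers with no orders".
left_join = ids("""
    SELECT C.CustomerID FROM Customer C
    LEFT JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
    WHERE SO.OrderID IS NULL
    ORDER BY C.CustomerID""")
not_exists = ids("""
    SELECT C.CustomerID FROM Customer C
    WHERE NOT EXISTS (SELECT 1 FROM SalesOrder SO WHERE SO.CustomerID = C.CustomerID)
    ORDER BY C.CustomerID""")
not_in = ids("""
    SELECT C.CustomerID FROM Customer C
    WHERE C.CustomerID NOT IN (SELECT CustomerID FROM SalesOrder)
    ORDER BY C.CustomerID""")
```

The NOT IN form is only safe here because SalesOrder.CustomerID is declared NOT NULL in this sketch; a NULL in the subquery would make NOT IN return no rows.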
I'm answering for MySQL instead of SQL Server, because you only tagged the question with SQL Server later, and since this was an interview question I assumed the DBMS would not matter much to you. Note, though, that the queries I wrote are standard SQL and should run in every RDBMS out there. How each RDBMS handles those queries is another issue.
I wrote this little procedure for you as a test case. It creates the tables customers and orders as you specified, and I added primary keys and foreign keys, as one would usually do. There are no other indexes, as every column worth indexing here is already a primary key. 250 customers are created, and 100 of them made an order (though for convenience none of them ordered more than once). A dump of the data follows; I posted the script in case you want to play around a little by increasing the numbers.
delimiter $$
create procedure fill_table()
begin
    create table customers(customerId int primary key) engine=innodb;
    set @x = 1;
    while (@x <= 250) do
        insert into customers values(@x);
        set @x := @x + 1;
    end while;
    create table orders(
        orderId int auto_increment primary key,
        customerId int,
        orderDate timestamp,
        foreign key fk_customer (customerId) references customers(customerId)
    ) engine=innodb;
    insert into orders(customerId, orderDate)
    select customerId, now() - interval customerId day
    from customers
    order by rand()
    limit 100;
end $$
delimiter ;
call fill_table();
For me, this resulted in the following:
CREATE TABLE `customers` ( `customerId` int(11) NOT NULL, PRIMARY KEY (`customerId`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250); CREATE TABLE `orders` ( `orderId` int(11) NOT NULL AUTO_INCREMENT, `customerId` int(11) DEFAULT NULL, `orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`orderId`), KEY `fk_customer` (`customerId`), CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`) ) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8; INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 
20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 
20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');
Okay, now to the queries. Three ways came to my mind. I omitted the right join that MDiesel did, because it's actually just another way of writing a left join. It was invented for lazy SQL developers who don't want to switch table names and would rather just rewrite one word.
Anyway, first query:
select c.*
from customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;
Results in an execution plan like this:
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using index              |
|  1 | SIMPLE      | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
Second query:
select c.*
from customers c
where c.customerId not in (select distinct customerId from orders);
Results in an execution plan like this:
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type        | table  | type           | possible_keys | key         | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
|  1 | PRIMARY            | c      | index          | NULL          | PRIMARY     | 4       | NULL |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer   | fk_customer | 5       | func |    2 | Using index              |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
Third query:
select c.*
from customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);
Results in an execution plan like this:
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | PRIMARY            | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
We can see in all execution plans that the customers table is read as a whole, but from the index (the implicit one, as the only column is the primary key). This may change when you select other columns from the table that are not in an index.
The first one seems to be the best. For each row in customers, only one row in orders is read. The id column suggests that MySQL can do this in one step, as only indexes are involved.
The second query seems to be the worst (though none of the three queries should perform too badly). For each row in customers, the subquery is executed (the select_type column tells us this).
The third query is not much different in that it also uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far now. If you're interested, here's the manual page that explains what each column and its values mean: EXPLAIN output
Finally: I'd say that the first query will perform best, but as always, in the end one has to measure, measure, and measure.
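Before measuring, it can help to confirm that the three forms really return the same rows. The following is only a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and a tiny made-up data set (5 customers, 3 orders) instead of the dump above. It checks result equivalence only; SQLite's planner is not MySQL's, so it says nothing about the execution plans shown here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        orderId    INTEGER PRIMARY KEY,
        customerId INTEGER REFERENCES customers(customerId)
    );
    CREATE INDEX fk_customer ON orders(customerId);
    INSERT INTO customers VALUES (1),(2),(3),(4),(5);
    -- customers 2 and 4 have orders; 1, 3 and 5 do not
    INSERT INTO orders VALUES (1,2),(2,4),(3,4);
""")

# The three query shapes from the answer, unchanged apart from layout.
queries = {
    "left join": """
        select c.* from customers c
        left join orders o on c.customerId = o.customerId
        where o.customerId is null""",
    "not in": """
        select c.* from customers c
        where c.customerId not in (select distinct customerId from orders)""",
    "not exists": """
        select c.* from customers c
        where not exists (select 1 from orders o
                          where o.customerId = c.customerId)""",
}

# Run each query and collect the customer ids it returns, sorted.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}

for name, ids in results.items():
    print(name, ids)
```

On this data all three queries return customers 1, 3 and 5. For timing, the same queries would have to be run against the real MySQL tables with realistic row counts.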
I can think of two other ways to write this query:
SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL

SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN (SELECT CustomerID FROM SalesOrder)
The solutions involving outer joins will perform better than a solution using NOT IN.
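Apart from speed, the two forms can also differ in results when the order table's CustomerID column is nullable (as customerId is in the orders table of the earlier answer, which is declared DEFAULT NULL). Under standard SQL three-valued logic, NOT IN against a set that contains a NULL never evaluates to true. The sketch below illustrates this; it assumes Python's sqlite3 module with an in-memory database as a stand-in, and the SalesOrderID column and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; SalesOrderID and the rows
# are hypothetical, only the Customer/SalesOrder names come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE SalesOrder (
        SalesOrderID INTEGER PRIMARY KEY,
        CustomerID   INTEGER            -- nullable on purpose
    );
    INSERT INTO Customer VALUES (1),(2),(3);
    -- one order for customer 2, one order with no customer at all
    INSERT INTO SalesOrder VALUES (1, 2), (2, NULL);
""")

def ids(sql):
    """Run a query and return the first column of each row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Outer join form: unaffected by the NULL row.
left_join = ids("""
    SELECT C.CustomerID FROM Customer C
    LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
    WHERE SO.CustomerID IS NULL""")

# NOT IN form: the NULL in the subquery makes the predicate unknown
# for every customer without an order, so no row qualifies.
not_in = ids("""
    SELECT C.CustomerID FROM Customer C
    WHERE NOT C.CustomerID IN (SELECT CustomerID FROM SalesOrder)""")

# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = ids("""
    SELECT C.CustomerID FROM Customer C
    WHERE NOT C.CustomerID IN (SELECT CustomerID FROM SalesOrder
                               WHERE CustomerID IS NOT NULL)""")

print("left join:       ", left_join)
print("not in:          ", not_in)
print("not in, filtered:", not_in_filtered)
```

Here the outer join returns customers 1 and 3, the plain NOT IN returns nothing, and NOT IN with the NULLs filtered out matches the outer join again. If the column is declared NOT NULL, the difference does not arise.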