What if there are no possible primary keys in a table? - sql

I have to design a wide table for a database (timescaleDB, which will create hypertables based on date), but it seems like there are no possible primary keys, even if we are talking about composite keys.
| id | attribute1 | attribute2 | attribute3 | attribute4 | date_time
| ---| ---------- | ---------- | ---------- | ---------- | -------------------
| P1 | A | 20 | NULL | NULL | 2021-01-01 00:00:00
| P1 | B | 10 | NULL | NULL | 2021-01-01 00:00:00
| P1 | NULL | NULL | 200 | 300 | 2021-01-01 00:00:00
| P2 | C | 25 | NULL | NULL | 2021-01-01 00:00:00
| P2 | NULL | NULL | 150 | 400 | 2021-01-01 00:00:00
The problem is that the data we scrape describes both P1, P2, etc. as a whole and individual parts of them (A and B are parts of P1, C is a part of P2, and so on).
Is there any way to make this work without splitting up the table?

You can follow the design below. This structure does not store any NULL values in the database:
create table parenttable
(
id int identity,
Name nvarchar(10),
primary key(id)
)
create table childtable
(
id int identity,
parent_id int,
attribute nvarchar(50),
valueattribute nvarchar(50),
date_time datetime,
primary key(id),
foreign key(parent_id)references parenttable
);
insert into parenttable values
('P1'),
('P2')
insert into childtable values
(1,'attribute1','A','2021-01-01 00:00:00'),
(1,'attribute2','20','2021-01-01 00:00:00'),
(1,'attribute1','B','2021-01-01 00:00:00'),
(1,'attribute2','10','2021-01-01 00:00:00'),
(1,'attribute3','200','2021-01-01 00:00:00'),
(1,'attribute4','300','2021-01-01 00:00:00'),
(2,'attribute1','C','2021-01-01 00:00:00'),
(2,'attribute2','25','2021-01-01 00:00:00'),
(2,'attribute3','150','2021-01-01 00:00:00'),
(2,'attribute4','400','2021-01-01 00:00:00')
select *
from parenttable p join childtable c on p.id = c.parent_id
result in dbfiddle: https://dbfiddle.uk

Attribute1 and attribute2 will always be NULL for P1 itself; they describe A and B (which belong to P1). Similarly, attribute3 and attribute4 will always be NULL for A, B, C, etc., because those attributes describe P1.
There is not enough information in your problem statement to answer your question.
I don't understand the above description, but it's enough to tell me you need to apply functional dependency analysis, and create as many tables as "those are describing" exist.
attribute3 and attribute4 ... are describing P1
That suggests you should have a table representing P1 things, with attribute3 and attribute4 as columns (preferably with meaningful names).
Organize your tables around the things you're modeling.
Look for columns that cannot be NULL for particular things. Those belong in the table depicting one kind of thing.
Then look for columns that might be NULL for a certain kind of thing. Those can be NULL-able columns, or a separate table sharing the same key, with optional cardinality.
There are no other kinds of columns.
Once you've grouped your columns into tables and distinguished what's necessary from what's not, you can look over the mandatory columns for a candidate key. There is always such a key, even if it includes all the non-NULL columns. Why? Because two identical rows are indistinguishable from each other. If you think you need two such rows, what you really need is 1 row, and a quantity column (not in the key) indicating how many such exist.
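As a rough illustration of the grouping described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names unit_reading and part_reading are hypothetical, and it assumes attribute1 identifies the part (A, B, C) while attribute3 and attribute4 describe the whole unit. It is not TimescaleDB-specific; it only shows that each table then has an obvious candidate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per kind of thing; all table names are hypothetical.
CREATE TABLE unit_reading (            -- facts about P1, P2, ... as a whole
    id         TEXT    NOT NULL,
    date_time  TEXT    NOT NULL,
    attribute3 INTEGER NOT NULL,
    attribute4 INTEGER NOT NULL,
    PRIMARY KEY (id, date_time)
);
CREATE TABLE part_reading (            -- facts about one part of a unit
    id         TEXT    NOT NULL,
    part       TEXT    NOT NULL,       -- assumed: this was attribute1
    date_time  TEXT    NOT NULL,
    attribute2 INTEGER NOT NULL,
    PRIMARY KEY (id, part, date_time)
);
INSERT INTO unit_reading VALUES
    ('P1', '2021-01-01 00:00:00', 200, 300),
    ('P2', '2021-01-01 00:00:00', 150, 400);
INSERT INTO part_reading VALUES
    ('P1', 'A', '2021-01-01 00:00:00', 20),
    ('P1', 'B', '2021-01-01 00:00:00', 10),
    ('P2', 'C', '2021-01-01 00:00:00', 25);
""")

# The wide view can still be rebuilt with a join when it is needed.
wide_rows = conn.execute("""
    SELECT p.id, p.part, p.attribute2, u.attribute3, u.attribute4, p.date_time
    FROM part_reading AS p
    JOIN unit_reading AS u ON u.id = p.id AND u.date_time = p.date_time
    ORDER BY p.id, p.part
""").fetchall()

# A second row with the same key is rejected by the composite primary key.
try:
    conn.execute(
        "INSERT INTO part_reading VALUES ('P1', 'A', '2021-01-01 00:00:00', 99)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

No column in either table needs to be NULL-able here, which is what makes the composite keys possible.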

Related

Union all data unmanaged

Forgive me if this question has already been asked; I'm not much of a DB guy.
Here is what I tried:
select row_number() over (partition by name order by challanto_date) , *
from (
select
rma,
p.id,
p.name,
challanto_date,
CURRENT_TIMESTAMP as fromDate
from challan_to_vendor cv
left join challan_to_vendor_detail cvd on cv.id = cvd.challan_to_vendor_id
inner join main_product p on p.id = cvd.product_id
union all
select
rma,
p.id,
p.name,
CURRENT_TIMESTAMP as toDate,
challan_date
from challan_from_vendor cv
left join challan_from_vendor_detail cvd on cv.id = cvd.challan_from_vendor_id
inner join main_product p on p.id = cvd.product_id
) as a
Here is my create table script :
challan_from_vendor
CREATE TABLE public.challan_from_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT psk PRIMARY KEY (id)
)
challan_from_vendor_detail
CREATE TABLE public.challan_from_vendor_detail
(
id character varying NOT NULL,
challan_from_id character varying,
product_id character varying,
CONSTRAINT psks PRIMARY KEY (id),
CONSTRAINT fsks FOREIGN KEY (challan_from_id)
REFERENCES public.challan_from_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
challan_to_vendor;
CREATE TABLE public.challan_to_vendor
(
id character varying NOT NULL,
date_ad date,
rma integer DEFAULT 1,
CONSTRAINT pk PRIMARY KEY (id)
)
challan_to_vendor_detail
CREATE TABLE public.challan_to_vendor_detail
(
id character varying NOT NULL,
challan_to_id character varying,
product_id character varying,
CONSTRAINT pks PRIMARY KEY (id),
CONSTRAINT fks FOREIGN KEY (challan_to_id)
REFERENCES public.challan_to_vendor (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
product
CREATE TABLE public.product
(
id character varying NOT NULL,
product_name character varying,
CONSTRAINT pks PRIMARY KEY (id)
)
Here are my table structures and desired output.
challan_from_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12012 | 0001 | 2018-11-10
| 123121 | 0001 | 2018-11-11
challan_to_vendor
| id | rma | date |
|:-----------|------------:|:------------:|
| 12 | 0001 | 2018-12-10
| 123 | 0001 | 2018-12-11
challan_from_vendor_detail
| id | challan_from_vendor_id | product_id |
|:-----------|------------:|:------------:|
| 121 | 12012 | 121313
| 1213 | 12012 | 131381
challan_to_vendor_detail
| id | challan_to_vendor_id | product_id |
|:-----------|------------------------|:------------:|
| 121 | 12 | 121313
| 1213 | 123 | 131381
product
| id | product_name |
|:-----------|------------:|
| 191313 | apple |
| 89113 | banana |
Output
| rma | product_id | challan_from_date | challan_to_date|
|:-----------|------------:|:-----------------:|:--------------:|
| 0001 | 191313| 2018-11-10 |2018-11-11 |
| 0001 | 89113 | 2018-12-10 |2018-12-11 |
There are some strange things in the query you tried, so it is not clear which tables exist, how they are related, or what columns they contain.
So, with some guessing, here is something to start off with:
select
main_product.*,
challan_to_vendor.toDate,
challan_from_vendor.fromDate
from main_product
join challan_to_vendor using(product_id)
join challan_from_vendor using(product_id)
If you explain more about your db and what you want out of it, I might be able to help you more.
Edit: I could not run your create statements in my db because there were naming conflicts, among other minor things. Here is some advice on the create process that I find useful:
Let the ids be integer instead of character varying; otherwise the column is probably a name column and should not be named id. You also used integer ids in your examples.
Use SERIAL PRIMARY KEY (see tutorial) to help you with key creation. This also removes the naming conflict, since the constraints are given implicit unique names.
Use the same column name for the same thing in all places. This avoids the confusion of having multiple columns called id after a join, and it simplifies the join. For example, the id of the product should be product_id in all places; that way you can use using(product_id) as your join condition.
With the advice above, here is how I would create one of your tables and then query them:
CREATE TABLE public.challan_to_vendor_detail
(
challan_to_vendor_detail_id SERIAL PRIMARY KEY,
challan_to_vendor_id integer,
product_id integer,
CONSTRAINT fks FOREIGN KEY (challan_to_vendor_id)
REFERENCES public.challan_to_vendor (challan_to_vendor_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
select
product_name,
challan_to_vendor.date_ad as date_to,
challan_from_vendor.date_ad as date_from
from product
join challan_to_vendor_detail using(product_id)
join challan_to_vendor using(challan_to_vendor_id)
join challan_from_vendor_detail using(product_id)
join challan_from_vendor using(challan_from_vendor_id)
Unfortunately, the overall db design does not make sense to me, so I do not know if this is what you expect.
Good luck!

postgres: Check pair of columns when inserting new row

I have a table like this:
id | person | supporter | referredby|
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
0 | ABC | DEF | |
1 | ABC | GHI | DEF |
2 | CBA | FED | |
3 | CBA | IHG | FED |
What I'm trying to accomplish is I'd like postgres to reject an INSERT if the value in referredby isn't in the supporter column for a specific person. (null referredby is ok)
For example, with the data above:
4, 'ABC', 'JKL', null: accepted (can be null)
4, 'ABC', 'JKL', 'IHG': rejected (IHG not listed as a supporter for ABC)
4, 'ABC', 'JKL', 'DEF': accepted (DEF is listed as a supporter for ABC)
Maybe a check constraint? I'm not sure how to piece it together
Add a foreign key that references (person, supporter). (The referenced pair needs a unique key.)
alter table t add constraint cname unique(person, supporter);
alter table t add constraint fk foreign key (person, referredby)
references t (person, supporter);
(ANSI SQL syntax, which PostgreSQL also supports. A NULL referredby is not checked under the default MATCH SIMPLE rule.)
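To see the behaviour without a PostgreSQL server, here is a small sketch using Python's built-in sqlite3 module, which also accepts composite self-referencing foreign keys. The table name t comes from the answer; the ids and supporter names of the test rows are made up so they do not collide with each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("""
    CREATE TABLE t (
        id         INTEGER PRIMARY KEY,
        person     TEXT NOT NULL,
        supporter  TEXT NOT NULL,
        referredby TEXT,
        UNIQUE (person, supporter),
        FOREIGN KEY (person, referredby) REFERENCES t (person, supporter)
    )
""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (0, "ABC", "DEF", None),
    (1, "ABC", "GHI", "DEF"),
    (2, "CBA", "FED", None),
    (3, "CBA", "IHG", "FED"),
])


def try_insert(row):
    """Return 'accepted' or 'rejected' depending on the constraints."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [try_insert(row) for row in [
    (4, "ABC", "JKL", None),    # NULL referredby is not checked
    (5, "ABC", "MNO", "IHG"),   # IHG supports CBA, not ABC
    (6, "ABC", "PQR", "DEF"),   # DEF is a supporter of ABC
]]
print(results)  # ['accepted', 'rejected', 'accepted']
```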

How do I delete duplicate records, or merge them with foreign-key constraints intact?

I have a database generated from an XML document with duplicate records. I know how to delete one record from the main table, but not those with foreign-key constraints.
I have a large number of XML documents, and they are inserted without checking for duplicates. One solution to removing duplicates is to just delete the lowest Primary_Key values (and all related foreign key records) and keep the highest. I don't know how to do that, though.
The database looks like this:
Table 1: [type]
+-------------+---------+-----------+
| Primary_Key | Food_ID | Food_Type |
+-------------+---------+-----------+
| 70001 | 12345 | fruit |
| 70002 | 12345 | fruit |
| 70003 | 12345 | meat |
+----^--------+---------+-----------+
|
|-----------------|
|
| Linked to primary key in the first table
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Information_ID | Food_Name | Information | Comments |
+-------------+-----------------+-------------+-------------+------------+
| 0001 | 70001 | banana | buy # toms | delicious! |
| 0002 | 70002 | banana | buy # mats | so-so |
| 0003 | 70003 | decade meat | buy # sals | disgusting |
+-------------+-----------------+-------------+-------------+------------+
^ Table 2: [food_information]
There are several other linked tables as well, which all have a foreign key value of the matched primary key value in the main table ([type]).
My question based on which solution might be best:
How do I delete all of those records, except 70003 (the highest one)? We can't know if it's duplicate record unless [Food_ID] shows up more than once. If it shows up more than once, we need to delete records from ALL tables (there are 10) based on the Primary_Key and Foreign_Key relationship.
How do I update/merge these SQL records on insertion to avoid having to delete multiples again?
I'd prefer #1, as it prevents me from having to rebuild the database, and it makes inserting much easier.
Thanks!
The subquery returns max(Primary_Key) for every [Food_ID], even one that is not duplicated, so those rows are never deleted.
Note that the WHERE condition is NOT IN.
Delete from each linked table first:
delete tableX
where tableX.Information_ID not in ( select max(Primary_Key)
from [type]
group by [Food_ID] )
Then do [type] last:
delete [type]
where [type].[Primary_Key] not in ( select max(Primary_Key)
from [type]
group by [Food_ID] )
Then create a unique constraint on [Food_ID].
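Here is a minimal sketch of that order of operations (children first, parent last, then the unique constraint), run against SQLite through Python's sqlite3 module. The table name food stands in for [type], only one of the ten linked tables is shown, and the extra row 70004 is made-up data for a Food_ID that is not duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE food (                      -- stands in for [type]
    Primary_Key INTEGER PRIMARY KEY,
    Food_ID     INTEGER NOT NULL,
    Food_Type   TEXT
);
CREATE TABLE food_information (
    Primary_Key    INTEGER PRIMARY KEY,
    Information_ID INTEGER REFERENCES food (Primary_Key),
    Food_Name      TEXT
);
INSERT INTO food VALUES
    (70001, 12345, 'fruit'), (70002, 12345, 'fruit'),
    (70003, 12345, 'meat'),  (70004, 11111, 'taco');
INSERT INTO food_information VALUES
    (1, 70001, 'banana'), (2, 70002, 'banana'),
    (3, 70003, 'decade meat'), (4, 70004, 'taco taco');
""")

# Highest Primary_Key per Food_ID: these are the rows to keep.
KEEP = "SELECT MAX(Primary_Key) FROM food GROUP BY Food_ID"

with conn:  # one transaction: linked tables first, main table last
    conn.execute(
        f"DELETE FROM food_information WHERE Information_ID NOT IN ({KEEP})"
    )
    conn.execute(f"DELETE FROM food WHERE Primary_Key NOT IN ({KEEP})")
    conn.execute("CREATE UNIQUE INDEX food_food_id_uq ON food (Food_ID)")

kept_food = [r[0] for r in conn.execute(
    "SELECT Primary_Key FROM food ORDER BY Primary_Key")]
kept_info = [r[0] for r in conn.execute(
    "SELECT Information_ID FROM food_information ORDER BY Information_ID")]

# From now on a duplicate Food_ID is refused at insert time.
try:
    conn.execute("INSERT INTO food VALUES (70005, 12345, 'fruit')")
    duplicate_refused = False
except sqlite3.IntegrityError:
    duplicate_refused = True
```

With the unique index in place, the loader has to decide what to do on conflict (skip, or update the existing row) instead of cleaning up afterwards.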
Something like this, assuming these tables:
create table food (
primary_key int,
food_id int,
food_type varchar(20)
);
insert into food values (70001,12345,'fruit');
insert into food values (70002,12345,'fruit');
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
create table info (
primary_key int,
info_id int,
food_name varchar(20)
);
insert into info values (1,70001,'banana');
insert into info values (2,70002,'banana');
insert into info values (3,70003,'decade meat');
insert into info values (4,70004,'taco taco');
and then...
-- yields: 12345 70003
select food_id, max(info_id) as max_info_id
from food
join info on food.primary_key=info.info_id
where food_id in (
select food_id
from food
join info on food.primary_key=info.info_id
group by food_id
having count(*)>1)
group by food_id;
Then something like this gets the rows to delete. There might be a better way to write it; I'm thinking about it.
select *
from food
join info on food.primary_key=info.info_id
join ( select food_id, max(info_id) as max_info_id
from food
join info on food.primary_key=info.info_id
where food_id in (
select food_id
from food
join info on food.primary_key=info.info_id
group by food_id
having count(*)>1)
group by food_id
) as dont_delete
on food.food_id=dont_delete.food_id and
info.info_id<max_info_id
gives you:
PRIMARY_KEY FOOD_ID FOOD_TYPE INFO_ID FOOD_NAME MAX_INFO_ID
70001 12345 fruit 70001 banana 70003
70002 12345 fruit 70002 banana 70003
So you could just run delete from food where primary_key in (select food.primary_key from that_big_query_up_there) and delete from info where info_id in (select food.primary_key from that_big_query_up_there).
For future issues, maybe consider a unique constraint on food, such as unique(food_id); a constraint that includes primary_key would not catch duplicates, because primary_key is already unique. But if the relationship is one-to-one, why not store the two tables together?

SQL Query 2 tables null results

I was asked this question in an interview:
From the 2 tables below, write a query to pull customers with no sales orders.
How many ways to write this query and which would have best performance.
Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate
Query:
SELECT *
FROM Customer C
RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL
Is my query correct and are there other ways to write the query and get the same results?
I'm answering for MySQL rather than SQL Server, because the SQL Server tag was added later, and since this was an interview question I assumed the specific DBMS would not matter to you. Note that the queries I wrote are standard SQL and should run in any RDBMS. How each RDBMS executes them is another matter, though.
I wrote this little procedure for you, to have a test case. It creates the tables customers and orders like you specified and I added primary keys and foreign keys, like one would usually do it. No other indexes, as every column worth indexing here is already primary key. 250 customers are created, 100 of them made an order (though out of convenience none of them twice / multiple times). A dump of the data follows, posted the script just in case you want to play around a little by increasing the numbers.
delimiter $$
create procedure fill_table()
begin
create table customers(customerId int primary key) engine=innodb;
set @x = 1;
while (@x <= 250) do
insert into customers values(@x);
set @x := @x + 1;
end while;
create table orders(orderId int auto_increment primary key,
customerId int,
orderDate timestamp,
foreign key fk_customer (customerId) references customers(customerId)
) engine=innodb;
insert into orders(customerId, orderDate)
select
customerId,
now() - interval customerId day
from
customers
order by rand()
limit 100;
end $$
delimiter ;
call fill_table();
For me, this resulted in this:
CREATE TABLE `customers` (
`customerId` int(11) NOT NULL,
PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250);
CREATE TABLE `orders` (
`orderId` int(11) NOT NULL AUTO_INCREMENT,
`customerId` int(11) DEFAULT NULL,
`orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`orderId`),
KEY `fk_customer` (`customerId`),
CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;
INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 
20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');
Okay, now to the queries. Three ways came to mind. I omitted the right join that MDiesel used, because it is just another way of writing a left join. It was invented for lazy SQL developers who don't want to switch table names and would rather rewrite one word.
Anyway, first query:
select
c.*
from
customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;
Results in an execution plan like this:
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | SIMPLE | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using index |
| 1 | SIMPLE | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
Second query:
select
c.*
from
customers c
where c.customerId not in (select distinct customerId from orders);
Results in an execution plan like this:
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer | fk_customer | 5 | func | 2 | Using index |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
Third query:
select
c.*
from
customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);
Results in an execution plan like this:
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
We can see in all execution plans that the customers table is read as a whole, but from the index (the implicit one, as the only column is the primary key). This may change when you select other columns that are not in an index.
The first one seems to be the best. For each row in customers, only one row in orders is read. The id column suggests that MySQL can do this in one step, as only indexes are involved.
The second query seems to be the worst (though none of the three should perform badly). The subquery is executed for each row in customers (the select_type column shows this).
The third query is not much different in that it also uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far here. If you're interested, the manual page on EXPLAIN output explains what each column and its values mean.
Finally: I'd say that the first query will perform best, but as always, in the end one has to measure, measure, and measure.
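Before measuring, it is worth checking that the three forms really return the same rows. Here is a small sketch with Python's built-in sqlite3 module on a made-up five-customer data set (not the 250-row dump above). It also shows one case where they differ: orders.customerId is nullable in the schema above, and a single NULL there makes the NOT IN form return nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
CREATE TABLE orders (
    orderId    INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES customers (customerId),  -- nullable
    orderDate  TEXT
);
INSERT INTO customers VALUES (1), (2), (3), (4), (5);
INSERT INTO orders VALUES
    (1, 2, '2014-02-24'), (2, 4, '2014-02-22'), (3, 4, '2014-02-23');
""")

QUERIES = {
    "left join": """
        SELECT c.customerId FROM customers c
        LEFT JOIN orders o ON c.customerId = o.customerId
        WHERE o.customerId IS NULL""",
    "not in": """
        SELECT c.customerId FROM customers c
        WHERE c.customerId NOT IN (SELECT customerId FROM orders)""",
    "not exists": """
        SELECT c.customerId FROM customers c
        WHERE NOT EXISTS (SELECT 1 FROM orders o
                          WHERE o.customerId = c.customerId)""",
}


def run_all():
    """Run every variant and return {name: sorted list of customer ids}."""
    return {name: sorted(row[0] for row in conn.execute(sql))
            for name, sql in QUERIES.items()}


before = run_all()  # every variant finds customers 1, 3 and 5

# An order without a customer: x NOT IN (..., NULL) is never true.
conn.execute("INSERT INTO orders VALUES (4, NULL, '2014-02-25')")
after = run_all()   # "not in" is now empty, the other two are unchanged
```

So if the column can be NULL, either filter the NULLs out in the NOT IN subquery or prefer one of the other two forms.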
I can think of two other ways to write this query:
SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL
SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN(SELECT CustomerID FROM SalesOrder)
The solutions involving outer joins will perform better than a solution using NOT IN.

improve database table design depending on a value of a type in a column

I have the following:
1. A table "patients" where I store patients data.
2. A table "tests" where I store data of tests done to each patient.
Now the problem comes as I have 2 types of tests "tests_1" and "tests_2"
So for each test done to particular patient I store the type and id of the type of test:
CREATE TABLE IF NOT EXISTS patients
(
id_patient INTEGER PRIMARY KEY,
name_patient VARCHAR(30) NOT NULL,
sex_patient VARCHAR(6) NOT NULL,
date_patient DATE
);
INSERT INTO patients values
(1,'Joe', 'Male' ,'2000-01-23');
INSERT INTO patients values
(2,'Marge','Female','1950-11-25');
INSERT INTO patients values
(3,'Diana','Female','1985-08-13');
INSERT INTO patients values
(4,'Laura','Female','1984-12-29');
CREATE TABLE IF NOT EXISTS tests
(
id_test INTEGER PRIMARY KEY,
id_patient INTEGER,
type_test VARCHAR(15) NOT NULL,
id_type_test INTEGER,
date_test DATE,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests values
(1,4,'test_1',10,'2004-05-29');
INSERT INTO tests values
(2,4,'test_2',45,'2005-01-29');
INSERT INTO tests values
(3,4,'test_2',55,'2006-04-12');
CREATE TABLE IF NOT EXISTS tests_1
(
id_test_1 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
data4 REAL,
data5 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_1 values
(10,4,100.7,1.8,10.89,20.04,5.29);
CREATE TABLE IF NOT EXISTS tests_2
(
id_test_2 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_2 values
(45,4,10.07,18.9,1.8);
INSERT INTO tests_2 values
(55,4,17.6,1.8,18.89);
Now I think this approach is redundant or not very good...
So I would like to improve queries like
select * from tests WHERE id_patient=4;
select * from tests_1 WHERE id_patient=4;
select * from tests_2 WHERE id_patient=4;
Is there a better approach?
In this example I have 1 test of type tests_1 and 2 tests of type tests_2 for patient with id=4.
Here is a fiddle
Add a table testtype (id_test, name_test) and reference it with an FK from the id_type_test field in the tests table. Do not create separate tables for test_1 and test_2.
It depends on the requirement
For OLTP I would do something like the following
STAFF:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | JOB_TITLE | ...
-------------------------------------------------------------
1 | harry | potter | 2001-01-01 | consultant | ...
2 | ron | weasley | 2001-02-01 | pathologist | ...
PATIENT:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | ...
-----------------------------------------------
1 | hermiony | granger | 2013-01-01 | ...
TEST_TYPE:
ID | CATEGORY | NAME | DESCRIPTION | ...
--------------------------------------------------------
1 | haematology | abg | arterial blood gasses | ...
REQUEST:
ID | TEST_TYPE_ID | PATIENT_ID | DATE_REQUESTED | REQUESTED_BY | ...
----------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 1 | ...
RESULT_TYPE:
ID | TEST_TYPE_ID | NAME | UNIT | ...
---------------------------------------
1 | 1 | co2 | kPa | ...
2 | 1 | o2 | kPa | ...
RESULT:
ID | REQUEST_ID | RESULT_TYPE_ID | DATE_RESULTED | RESULTED_BY | RESULT | ...
-------------------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 2 | 5 | ...
2 | 1 | 2 | 2013-01-02 | 2 | 5 | ...
A concern I have with the above is the unit of the test result: units can sometimes (not often) change. It may be better to place the unit in the result table.
Also consider breaking these into the major test categories as my understanding is they can be quite different e.g. histopathology and xrays are not resulted in the similar ways as haematology and microbiology are.
For OLAP I would combine request and result into one table adding derived columns such as REQUEST_TO_RESULT_MINS and make a single dimension from RESULT_TYPE and TEST_TYPE etc.
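To make the OLTP layout above a bit more concrete, here is a rough sketch of it in SQLite via Python's sqlite3 module. The column types and NOT NULL choices are my assumptions, the STAFF table and the "..." columns are left out, and the data is the sample data from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient   (id INTEGER PRIMARY KEY, forename TEXT, surname TEXT);
CREATE TABLE test_type (id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE request (
    id             INTEGER PRIMARY KEY,
    test_type_id   INTEGER NOT NULL REFERENCES test_type (id),
    patient_id     INTEGER NOT NULL REFERENCES patient (id),
    date_requested TEXT
);
CREATE TABLE result_type (
    id           INTEGER PRIMARY KEY,
    test_type_id INTEGER NOT NULL REFERENCES test_type (id),
    name         TEXT,
    unit         TEXT
);
CREATE TABLE result (
    id             INTEGER PRIMARY KEY,
    request_id     INTEGER NOT NULL REFERENCES request (id),
    result_type_id INTEGER NOT NULL REFERENCES result_type (id),
    result         REAL
);
INSERT INTO patient     VALUES (1, 'hermiony', 'granger');
INSERT INTO test_type   VALUES (1, 'haematology', 'abg');
INSERT INTO request     VALUES (1, 1, 1, '2013-01-02');
INSERT INTO result_type VALUES (1, 1, 'co2', 'kPa'), (2, 1, 'o2', 'kPa');
INSERT INTO result      VALUES (1, 1, 1, 5), (2, 1, 2, 5);
""")


def results_for_patient(patient_id):
    """All results for one patient, whatever the test type."""
    return conn.execute("""
        SELECT tt.name, rt.name, res.result, rt.unit
        FROM request rq
        JOIN test_type   tt  ON tt.id = rq.test_type_id
        JOIN result      res ON res.request_id = rq.id
        JOIN result_type rt  ON rt.id = res.result_type_id
        WHERE rq.patient_id = ?
        ORDER BY rt.id
    """, (patient_id,)).fetchall()


rows = results_for_patient(1)
```

A new kind of test then only needs new rows in test_type and result_type, not a new table.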
You can do this in a few ways; it is hard to pick one without knowing all the different types of cases you need to deal with.
The simplest would be 5 tables
Patients (like you described it)
Tests (like you described it)
TestType (like Declan_K suggested)
TestResultCode
TestResults
TestResultCode describes each value that is stored for each test. TestResults is a pivoted table that can store any number of test results per test:
Create table TestResultCode
(
idTestResultCode int
, Code varchar(10)
, Description varchar(200)
, DataType int -- 1= Real, 2 = Varchar, 3 = int, etc.
);
Create Table TestResults
(
idPatient int -- FK
, idTest int -- FK
, idTestType int -- FK
, idTestResultCode int -- FK
, ResultsI real
, ResultsV varchar(100)
, Resultsb int
, Created datetime
)
So, basically, you can fit the results you wanted to put into the tables "tests_1" and "tests_2", and any other tests you can think of, into this structure.
The application reading this table, can load each test and all its values. Of course the application needs to know how to deal with each case, but you can store any type of test in this structure.
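As a rough sketch of how the application side might look, here is a cut-down version of the two tables in SQLite via Python's sqlite3 module. Only the real-valued column is kept, the composite primary key is my assumption, and the store and load helpers are hypothetical names. It stores the question's tests_1 row and one of its tests_2 rows in the same structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestResultCode (
    idTestResultCode INTEGER PRIMARY KEY,
    Code             TEXT    NOT NULL,
    DataType         INTEGER NOT NULL   -- 1 = Real, as in the answer
);
CREATE TABLE TestResults (
    idTest           INTEGER NOT NULL,
    idTestResultCode INTEGER NOT NULL
        REFERENCES TestResultCode (idTestResultCode),
    ResultsI         REAL,
    PRIMARY KEY (idTest, idTestResultCode)  -- assumed: one value per code per test
);
INSERT INTO TestResultCode VALUES
    (1, 'data1', 1), (2, 'data2', 1), (3, 'data3', 1),
    (4, 'data4', 1), (5, 'data5', 1);
""")


def store(id_test, values):
    """Store one row per value; codes are assigned by position (an assumption)."""
    conn.executemany(
        "INSERT INTO TestResults VALUES (?, ?, ?)",
        [(id_test, code, value) for code, value in enumerate(values, start=1)],
    )


def load(id_test):
    """Return {code: value} for one test, however many values it has."""
    rows = conn.execute("""
        SELECT c.Code, r.ResultsI
        FROM TestResults r
        JOIN TestResultCode c USING (idTestResultCode)
        WHERE r.idTest = ?
        ORDER BY c.idTestResultCode
    """, (id_test,))
    return dict(rows.fetchall())


store(1, [100.7, 1.8, 10.89, 20.04, 5.29])  # the tests_1 row: five values
store(2, [10.07, 18.9, 1.8])                # a tests_2 row: three values

first = load(1)
second = load(2)
```

The trade-off is that the database no longer knows which values a given test type must have; that check moves into the application.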