How to add non-duplicate items inside an array in PostgreSQL?

question_id | person_id | question_title | question_source | question_likes | question_dislikes | created_at
-------------+-----------+--------------------------+--------------------------+----------------+-------------------+---------------------------------
2 | 2 | This is question Title 1 | This is question Video 1 | | | 2021-11-11 10:32:53.93505+05:30
3 | 3 | This is question Title 1 | This is question Video 1 | | | 2021-11-11 10:32:58.69947+05:30
1 | 1 | This is question Title 1 | This is question Video 1 | {2} | {1} | 2021-11-11 10:32:45.81733+05:30
I'm trying to add values inside question_likes and question_dislikes, which are arrays. I'm not able to figure out the query to add only non-duplicate items to the array using SQL; I want all items in the array to be unique.
Below is the query I used to create the questions table:
CREATE TABLE questions(
    question_id BIGSERIAL PRIMARY KEY,
    person_id VARCHAR(50) REFERENCES person(person_id) NOT NULL,
    question_title VARCHAR(100) NOT NULL,
    question_source VARCHAR(100),
    question_likes TEXT[] UNIQUE,
    question_dislikes TEXT[] UNIQUE,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
UPDATE questions
SET question_likes = array_append(question_likes,$1)
WHERE question_id=$2
RETURNING *
What changes can I make to the above query to make it work?

This is a drawback of using a de-normalized model. Just because Postgres supports arrays doesn't mean they are a good choice for problems that are better solved using the best practices of relational data modeling.
However, in your case you can simply prevent this by adding an additional condition to your WHERE clause:
UPDATE questions
SET question_likes = array_append(question_likes,$1)
WHERE question_id=$2
AND $1 <> ALL(question_likes)
RETURNING *
This prevents the update completely if the value of $1 is found anywhere in the array.
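One caveat: $1 <> ALL(question_likes) evaluates to NULL rather than true while question_likes is still NULL (as in the sample rows above that have no likes yet), so the very first value would never be appended. A minimal sketch covering that case as well, using the same placeholders:
UPDATE questions
SET question_likes = array_append(question_likes, $1)
WHERE question_id = $2
AND (question_likes IS NULL OR $1 <> ALL(question_likes))
RETURNING *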

Related

SQL N:M query merging results by condition flag in intermediate table

[First of all, sorry if this is a duplicate; I couldn't find a response for this, as this is a strange solution for a limitation of an ORM, and I'm clearly a newbie at SQL.]
Domain requirements:
A brigade must be composed of one user (the commissar) and, optionally, one and only one assistant (1:1)
A user can only be part of one brigade (1:1)
CREATE TABLE Users
(
    id SERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    password VARCHAR(100) NOT NULL
);
CREATE TABLE Brigades
(
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
-- N:M relationship with a flag inside which determines if that user is a commissar or not
CREATE TABLE Brigade_User
(
    brigade_id INT NOT NULL REFERENCES Brigades(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    user_id INT NOT NULL REFERENCES Users(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    is_commissar BOOLEAN NOT NULL,
    PRIMARY KEY(brigade_id, user_id)
);
Ideally, as the relations are 1:1, the Brigade_User intermediate table could be dropped and a Brigades table with two foreign keys could be created instead (this is not supported by the Diesel Rust ORM, so I think I'm stuck with the first approach):
CREATE TABLE Brigades
(
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    -- 1:1
    commissar_id INT NOT NULL REFERENCES Users(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    -- 1:1
    assistant_id INT NOT NULL REFERENCES Users(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
An example...
> SELECT * FROM brigade_user LEFT JOIN brigades ON brigade_user.brigade_id = brigades.id;
brigade_id | user_id | is_commissar | id | name
------------+---------+--------------+----+------------------
1 | 1 | t | 1 | Patrulla gatuna
1 | 2 | f | 1 | Patrulla gatuna
2 | 3 | t | 2 | Patrulla perruna
2 | 4 | f | 2 | Patrulla perruna
3 | 6 | t | 3 | Patrulla canina
3 | 5 | f | 3 | Patrulla canina
(6 rows)
Is it possible to make a query which returns a table like this?
brigade_id | commissar_id | assistant_id | name
-----------+--------------+--------------+--------------------
1 | 1 | 2 | Patrulla gatuna
2 | 3 | 4 | Patrulla perruna
3 | 6 | 5 | Patrulla canina
See that each pair of rows has been merged into one (remember, a brigade is composed of one commissar and, optionally, one assistant) depending on the flag.
Could this model be improved (bearing in mind the limitation on multiple foreign keys referencing the same table, discussed here)?
Try the following:
WITH cte AS
(
    SELECT a.brigade_id, a.user_id, a.is_commissar, b.name
    FROM brigade_user a LEFT JOIN brigades b ON a.brigade_id = b.id
)
SELECT c1.brigade_id, c1.user_id AS commissar_id, c2.user_id AS assistant_id, c1.name
FROM cte c1
LEFT JOIN cte c2
    ON c1.brigade_id = c2.brigade_id
    AND c1.user_id <> c2.user_id
WHERE c1.is_commissar = true;
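An alternative worth sketching, not from the original answer: conditional aggregation collapses the pair of rows per brigade without the self-join (the FILTER clause assumes PostgreSQL 9.4+):
SELECT b.id AS brigade_id,
       MAX(bu.user_id) FILTER (WHERE bu.is_commissar) AS commissar_id,
       MAX(bu.user_id) FILTER (WHERE NOT bu.is_commissar) AS assistant_id,
       b.name
FROM brigades b
LEFT JOIN brigade_user bu ON bu.brigade_id = b.id
GROUP BY b.id, b.name;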

store Perl hash data in a database

I have written Perl code that parses a text file and uses a hash to tally up the number of times a US State abbreviation appears in each file/record. I end up with something like this.
File: 521
OH => 4
PA => 1
IN => 2
TX => 3
IL => 7
I am struggling to find a way to store such hash results in an SQL database. I am using mariadb. Because the structure of the data itself varies, one file will have some states and the next may have others. For example, one file may contain only a few states, the next may contain a group of completely different states. I am even having trouble conceptualizing the table structure. What would be the best way to store data like this in a database?
There are many possible ways to store the data.
For the sake of simplicity, see if the following approach is an acceptable solution for your case. It is based on a single table with two indexes, one on the id column and one on the state column.
CREATE TABLE IF NOT EXISTS `state_count` (
`id` INT NOT NULL,
`state` VARCHAR(2) NOT NULL,
`count` INT NOT NULL,
INDEX `id` (`id`),
INDEX `state` (`state`)
);
INSERT INTO `state_count`
(`id`,`state`,`count`)
VALUES
('251','OH',4),
('251','PA',1),
('251','IN',2),
('251','TX',3),
('251','IL',7);
Sample SQL SELECT output
MySQL [dbs0897329] > SELECT * FROM state_count;
+-----+-------+-------+
| id | state | count |
+-----+-------+-------+
| 251 | OH | 4 |
| 251 | PA | 1 |
| 251 | IN | 2 |
| 251 | TX | 3 |
| 251 | IL | 7 |
+-----+-------+-------+
5 rows in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state='OH';
+-----+-------+-------+
| id | state | count |
+-----+-------+-------+
| 251 | OH | 4 |
+-----+-------+-------+
1 row in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state IN ('OH','TX');
+-----+-------+-------+
| id | state | count |
+-----+-------+-------+
| 251 | OH | 4 |
| 251 | TX | 3 |
+-----+-------+-------+
2 rows in set (0.001 sec)
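If you later need totals across files, the same table supports that directly with a plain aggregation; a small illustrative query (assuming that is the kind of report you are after):
SELECT `state`, SUM(`count`) AS total_count
FROM `state_count`
GROUP BY `state`
ORDER BY total_count DESC;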
It's a little unclear in what direction your question goes. But if you want a good relational model to store the data in, that would be three tables: one for the files, one for the states, and one for the count of the states in a file. For example:
The tables:
CREATE TABLE file
(id integer AUTO_INCREMENT,
 path varchar(256) NOT NULL,
 PRIMARY KEY (id),
 UNIQUE (path));

CREATE TABLE state
(id integer AUTO_INCREMENT,
 abbreviation varchar(2) NOT NULL,
 PRIMARY KEY (id),
 UNIQUE (abbreviation));

CREATE TABLE occurrences
(file integer,
 state integer,
 count integer NOT NULL,
 PRIMARY KEY (file, state),
 FOREIGN KEY (file) REFERENCES file (id),
 FOREIGN KEY (state) REFERENCES state (id),
 CHECK (count >= 0));
The data:
INSERT INTO file
       (path)
VALUES ('521');

INSERT INTO state
       (abbreviation)
VALUES ('OH'),
       ('PA'),
       ('IN'),
       ('TX'),
       ('IL');

INSERT INTO occurrences
       (file, state, count)
VALUES (1, 1, 4),
       (1, 2, 1),
       (1, 3, 2),
       (1, 4, 3),
       (1, 5, 7);
The states of course would be reused. Fill the table with all 50 and use them. They should not be inserted for every file again.
You can fill occurrences explicitly with a count of 0 for files where the respective state didn't appear, if you want to distinguish between "I know it's 0." and "I don't know the count.", the latter then being encoded by the absence of a corresponding row. If you don't want to distinguish those cases and no row simply means a count of 0, you can handle that in queries by using outer joins and coalesce() to "translate" the absence to 0.
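A minimal sketch of that last approach, using the tables as defined above:
SELECT f.path, s.abbreviation, coalesce(o.count, 0) AS count
FROM file f
CROSS JOIN state s
LEFT JOIN occurrences o
       ON o.file = f.id
      AND o.state = s.id
ORDER BY f.path, s.abbreviation;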

Result of query as column value

I've got three tables:
Lessons:
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
title text NOT NULL,
description text NOT NULL,
vocab_count integer NOT NULL
);
+----+------------+------------------+-------------+
| id | title | description | vocab_count |
+----+------------+------------------+-------------+
| 1 | lesson_one | this is a lesson | 3 |
| 2 | lesson_two | another lesson | 2 |
+----+------------+------------------+-------------+
Lesson_vocabulary:
CREATE TABLE lesson_vocabulary (
lesson_id integer REFERENCES lessons(id),
vocabulary_id integer REFERENCES vocabulary(id)
);
+-----------+---------------+
| lesson_id | vocabulary_id |
+-----------+---------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 4 |
+-----------+---------------+
Vocabulary:
CREATE TABLE vocabulary (
id integer PRIMARY KEY,
hiragana text NOT NULL,
reading text NOT NULL,
meaning text[] NOT NULL
);
Each lesson contains multiple vocabulary, and each vocabulary can be included in multiple lessons.
How can I get the vocab_count column of the lessons table to be calculated and updated whenever I add more rows to the lesson_vocabulary table? Is this possible, and how would I go about doing it?
Thanks
You can use SQL triggers to serve your purpose. This would be similar to a MySQL AFTER INSERT trigger that updates another table's column.
The trigger would look somewhat like this. I am using Oracle SQL, but there would be only minor tweaks for any other implementation.
CREATE TRIGGER vocab_trigger
AFTER INSERT ON lesson_vocabulary
FOR EACH ROW
BEGIN
  FOR lesson_cur IN (SELECT lesson_id, COUNT(vocabulary_id) voc_cnt
                     FROM lesson_vocabulary
                     GROUP BY lesson_id) LOOP
    UPDATE lessons
    SET vocab_count = lesson_cur.voc_cnt
    WHERE id = lesson_cur.lesson_id;
  END LOOP;
END;
It's better to create a view that calculates that (and get rid of the column in the lessons table):
select l.*, lv.vocab_count
from lessons l
left join (
select lesson_id, count(*)
from lesson_vocabulary
group by lesson_id
) as lv(lesson_id, vocab_count) on l.id = lv.lesson_id
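Wrapped up as an actual view (the view name is a placeholder, and this assumes the vocab_count column has been dropped from lessons as suggested):
CREATE VIEW lessons_with_vocab_count AS
SELECT l.*, coalesce(lv.vocab_count, 0) AS vocab_count
FROM lessons l
LEFT JOIN (
    SELECT lesson_id, count(*)
    FROM lesson_vocabulary
    GROUP BY lesson_id
) AS lv(lesson_id, vocab_count) ON l.id = lv.lesson_id;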
If you really want to update the lessons table each time the lesson_vocabulary changes, you can run an UPDATE statement like this in a trigger:
update lessons l
set vocab_count = t.cnt
from (
select lesson_id, count(*) as cnt
from lesson_vocabulary
group by lesson_id
) t
where t.lesson_id = l.id;
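Since the question's schema looks like PostgreSQL (SERIAL, text[]), here is a sketch of wiring that statement into a trigger; the function and trigger names are placeholders, and EXECUTE FUNCTION assumes PostgreSQL 11+ (older versions use EXECUTE PROCEDURE):
CREATE OR REPLACE FUNCTION refresh_vocab_count() RETURNS trigger AS $$
BEGIN
    UPDATE lessons l
    SET vocab_count = t.cnt
    FROM (
        SELECT lesson_id, count(*) AS cnt
        FROM lesson_vocabulary
        GROUP BY lesson_id
    ) t
    WHERE t.lesson_id = l.id;
    RETURN NULL; -- return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER lesson_vocabulary_changed
AFTER INSERT OR UPDATE OR DELETE ON lesson_vocabulary
FOR EACH STATEMENT
EXECUTE FUNCTION refresh_vocab_count();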
I would recommend using a query for this information:
select l.*,
(select count(*)
from lesson_vocabulary lv
where lv.lesson_id = l.id
) as vocabulary_cnt
from lessons l;
With an index on lesson_vocabulary(lesson_id), this should be quite fast.
I recommend this over an update, because the data remains correct.
I recommend this over a trigger, because it is simpler.
I recommend this over a subquery with aggregation because it should be faster, particularly if you are filtering on the lessons table.

SQL Query 2 tables null results

I was asked this question in an interview:
From the 2 tables below, write a query to pull customers with no sales orders.
How many ways to write this query and which would have best performance.
Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate
Query:
SELECT *
FROM Customer C
RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL
Is my query correct and are there other ways to write the query and get the same results?
Answering for MySQL instead of SQL Server, because you only tagged it with SQL Server later; since this was an interview question, I figured it wouldn't matter which DBMS this is for. Note, though, that the queries I wrote are standard SQL and should run in every RDBMS out there. How each RDBMS handles those queries is another issue.
I wrote this little procedure for you, to have a test case. It creates the tables customers and orders like you specified, and I added primary keys and foreign keys, like one usually would. No other indexes, as every column worth indexing here is already a primary key. 250 customers are created, 100 of whom made an order (though for convenience none of them ordered twice or more). A dump of the data follows; I posted the script just in case you want to play around a little by increasing the numbers.
delimiter $$
create procedure fill_table()
begin
    create table customers(customerId int primary key) engine=innodb;

    set @x := 1;
    while (@x <= 250) do
        insert into customers values(@x);
        set @x := @x + 1;
    end while;

    create table orders(orderId int auto_increment primary key,
                        customerId int,
                        orderDate timestamp,
                        foreign key fk_customer (customerId) references customers(customerId)
    ) engine=innodb;

    insert into orders(customerId, orderDate)
    select
        customerId,
        now() - interval customerId day
    from
        customers
    order by rand()
    limit 100;
end $$
delimiter ;
call fill_table();
For me, this resulted in this:
CREATE TABLE `customers` (
`customerId` int(11) NOT NULL,
PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250);
CREATE TABLE `orders` (
`orderId` int(11) NOT NULL AUTO_INCREMENT,
`customerId` int(11) DEFAULT NULL,
`orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`orderId`),
KEY `fk_customer` (`customerId`),
CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;
INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');
Okay, now to the queries. Three ways came to my mind; I omitted the right join that MDiesel did, because it's really just another way of writing a left join. It was invented for lazy SQL developers who don't want to switch table names, but instead just rewrite one word.
Anyway, first query:
select
c.*
from
customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;
Results in an execution plan like this:
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | SIMPLE | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using index |
| 1 | SIMPLE | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
Second query:
select
c.*
from
customers c
where c.customerId not in (select distinct customerId from orders);
Results in an execution plan like this:
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer | fk_customer | 5 | func | 2 | Using index |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
Third query:
select
c.*
from
customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);
Results in an execution plan like this:
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 250 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | o | ref | fk_customer | fk_customer | 5 | wtf.c.customerId | 1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
We can see in all execution plans that the customers table is read as a whole, but from the index (the implicit one, as the only column is the primary key). This may change when you select other columns from the table that are not in an index.
The first one seems to be the best. For each row in customers only one row in orders is read. The id column suggests that MySQL can do this in one step, as only indexes are involved.
The second query seems to be the worst (though all 3 queries shouldn't perform too badly). For each row in customers the subquery is executed (the select_type column tells us this).
The third query is not much different in that it uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far here. If you're interested, here's the manual page that explains what each column and its values mean: EXPLAIN output
Finally: I'd say, that the first query will perform best, but as always, in the end one has to measure, to measure and to measure.
I can think of two other ways to write this query:
SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL
SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN(SELECT CustomerID FROM SalesOrder)
The solutions involving an outer join or NOT EXISTS will generally perform better than the one using NOT IN; NOT IN also has a semantic trap, shown below.
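To illustrate that trap (general SQL behaviour, not specific to the tables here): if the subquery's column is nullable and contains a NULL, NOT IN returns no rows at all, so the NULLs have to be excluded explicitly:
SELECT C.*
FROM Customer C
WHERE C.CustomerID NOT IN (SELECT SO.CustomerID
                           FROM SalesOrder SO
                           WHERE SO.CustomerID IS NOT NULL)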

improve database table design depending on a value of a type in a column

I have the following:
1. A table "patients" where I store patient data.
2. A table "tests" where I store the data of tests done to each patient.
Now the problem comes as I have 2 types of tests, "tests_1" and "tests_2".
So for each test done to a particular patient I store the type and the id of the type of test:
CREATE TABLE IF NOT EXISTS patients
(
id_patient INTEGER PRIMARY KEY,
name_patient VARCHAR(30) NOT NULL,
sex_patient VARCHAR(6) NOT NULL,
date_patient DATE
);
INSERT INTO patients values
(1,'Joe', 'Male' ,'2000-01-23');
INSERT INTO patients values
(2,'Marge','Female','1950-11-25');
INSERT INTO patients values
(3,'Diana','Female','1985-08-13');
INSERT INTO patients values
(4,'Laura','Female','1984-12-29');
CREATE TABLE IF NOT EXISTS tests
(
id_test INTEGER PRIMARY KEY,
id_patient INTEGER,
type_test VARCHAR(15) NOT NULL,
id_type_test INTEGER,
date_test DATE,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests values
(1,4,'test_1',10,'2004-05-29');
INSERT INTO tests values
(2,4,'test_2',45,'2005-01-29');
INSERT INTO tests values
(3,4,'test_2',55,'2006-04-12');
CREATE TABLE IF NOT EXISTS tests_1
(
id_test_1 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
data4 REAL,
data5 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_1 values
(10,4,100.7,1.8,10.89,20.04,5.29);
CREATE TABLE IF NOT EXISTS tests_2
(
id_test_2 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_2 values
(45,4,10.07,18.9,1.8);
INSERT INTO tests_2 values
(55,4,17.6,1.8,18.89);
Now I think this approach is redundant, or not too good...
So I would like to improve queries like
select * from tests WHERE id_patient=4;
select * from tests_1 WHERE id_patient=4;
select * from tests_2 WHERE id_patient=4;
Is there a better approach?
In this example I have 1 test of type tests_1 and 2 tests of type tests_2 for patient with id=4.
Add a table testtype (id_test, name_test) and use it as an FK for the id_type_test field in the tests table. Do not create separate tables for test_1 and test_2.
It depends on the requirement
For OLTP I would do something like the following
STAFF:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | JOB_TITLE | ...
-------------------------------------------------------------
1 | harry | potter | 2001-01-01 | consultant | ...
2 | ron | weasley | 2001-02-01 | pathologist | ...
PATIENT:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | ...
-----------------------------------------------
1 | hermiony | granger | 2013-01-01 | ...
TEST_TYPE:
ID | CATEGORY | NAME | DESCRIPTION | ...
--------------------------------------------------------
1 | haematology | abg | arterial blood gasses | ...
REQUEST:
ID | TEST_TYPE_ID | PATIENT_ID | DATE_REQUESTED | REQUESTED_BY | ...
----------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 1 | ...
RESULT_TYPE:
ID | TEST_TYPE_ID | NAME | UNIT | ...
---------------------------------------
1 | 1 | co2 | kPa | ...
2 | 1 | o2 | kPa | ...
RESULT:
ID | REQUEST_ID | RESULT_TYPE_ID | DATE_RESULTED | RESULTED_BY | RESULT | ...
-------------------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 2 | 5 | ...
2 | 1 | 2 | 2013-01-02 | 2 | 5 | ...
A concern I have with the above is the unit of the test result; these can sometimes (not often) change. It may be better to place the unit in the result table.
Also consider breaking these into the major test categories, as my understanding is they can be quite different, e.g. histopathology and x-rays are not resulted in the same way as haematology and microbiology.
For OLAP I would combine request and result into one table adding derived columns such as REQUEST_TO_RESULT_MINS and make a single dimension from RESULT_TYPE and TEST_TYPE etc.
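For illustration, a query against the OLTP sketch above to pull all results for one patient; the table and column names follow the sample layouts and are assumptions, not tested DDL:
SELECT p.forename,
       tt.name AS test,
       rt.name AS result_type,
       r.result,
       rt.unit
FROM result r
JOIN request rq ON rq.id = r.request_id
JOIN patient p ON p.id = rq.patient_id
JOIN test_type tt ON tt.id = rq.test_type_id
JOIN result_type rt ON rt.id = r.result_type_id
WHERE p.id = 1;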
You can do this in a few ways, even without knowing all the different types of cases you need to deal with.
The simplest would be 5 tables
Patients (like you described it)
Tests (like you described it)
TestType (like Declan_K suggested)
TestResultCode
TestResults
TestResultCode describes each value that is stored for each test. TestResults is a pivoted table that can store any number of test results per test:
Create table TestResultCode
(
  idTestResultCode int
  , Code varchar(10)
  , Description varchar(200)
  , DataType int -- 1 = real, 2 = varchar, 3 = int, etc.
);

Create Table TestResults
(
  idPatient int -- FK
  , idTest int -- FK
  , idTestType int -- FK
  , idTestResultCode int -- FK
  , ResultsI real -- value column used when DataType = 1 (real)
  , ResultsV varchar(100) -- value column used when DataType = 2 (varchar)
  , Resultsb int -- value column used when DataType = 3 (int)
  , Created datetime
);
So, basically, you can fit the results you wanted to add into the tables "tests_1" and "tests_2", and any other tests you can think of, into this structure.
The application reading this table can load each test and all its values. Of course the application needs to know how to deal with each case, but you can store any type of test in this structure.
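A hypothetical example of storing the first tests_1 value (100.7 for patient 4) in this structure; the specific ids and the 'data1' code are made up for illustration, and NOW() assumes MySQL/PostgreSQL:
INSERT INTO TestResultCode (idTestResultCode, Code, Description, DataType)
VALUES (1, 'data1', 'first measurement of test type 1', 1);

INSERT INTO TestResults (idPatient, idTest, idTestType, idTestResultCode, ResultsI, Created)
VALUES (4, 1, 1, 1, 100.7, NOW());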