improve database table design depending on a value of a type in a column - sql

I have the following:
1. A table "patients" where I store patient data.
2. A table "tests" where I store data about the tests done on each patient.
The problem is that I have 2 types of tests, "tests_1" and "tests_2".
So for each test done on a particular patient I store the type of the test and the id of the row in that type's table:
CREATE TABLE IF NOT EXISTS patients
(
id_patient INTEGER PRIMARY KEY,
name_patient VARCHAR(30) NOT NULL,
sex_patient VARCHAR(6) NOT NULL,
date_patient DATE
);
INSERT INTO patients values
(1,'Joe', 'Male' ,'2000-01-23');
INSERT INTO patients values
(2,'Marge','Female','1950-11-25');
INSERT INTO patients values
(3,'Diana','Female','1985-08-13');
INSERT INTO patients values
(4,'Laura','Female','1984-12-29');
CREATE TABLE IF NOT EXISTS tests
(
id_test INTEGER PRIMARY KEY,
id_patient INTEGER,
type_test VARCHAR(15) NOT NULL,
id_type_test INTEGER,
date_test DATE,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests values
(1,4,'test_1',10,'2004-05-29');
INSERT INTO tests values
(2,4,'test_2',45,'2005-01-29');
INSERT INTO tests values
(3,4,'test_2',55,'2006-04-12');
CREATE TABLE IF NOT EXISTS tests_1
(
id_test_1 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
data4 REAL,
data5 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_1 values
(10,4,100.7,1.8,10.89,20.04,5.29);
CREATE TABLE IF NOT EXISTS tests_2
(
id_test_2 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_2 values
(45,4,10.07,18.9,1.8);
INSERT INTO tests_2 values
(55,4,17.6,1.8,18.89);
Now I think this approach is redundant, or at least not very good.
So I would like to improve queries like
select * from tests WHERE id_patient=4;
select * from tests_1 WHERE id_patient=4;
select * from tests_2 WHERE id_patient=4;
Is there a better approach?
In this example I have 1 test of type tests_1 and 2 tests of type tests_2 for patient with id=4.
Here is a fiddle

Add a table testtype (id_test, name_test) and reference it with a foreign key from the id_type_test field in the tests table. Do not create separate tables for test_1 and test_2.
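A minimal sketch of what that could look like, reusing the column names from the question and assuming a fresh schema that already contains the patients table. The UNIQUE constraint on name_test and the final query are my assumptions. The suggestion does not say where the per-type measurement columns (data1 ... data5) should live, so they are left out here:

```sql
-- Lookup table of test types, as suggested above.
CREATE TABLE IF NOT EXISTS testtype
(
    id_test   INTEGER PRIMARY KEY,
    name_test VARCHAR(15) NOT NULL UNIQUE  -- UNIQUE is an assumption
);
INSERT INTO testtype VALUES (1, 'test_1');
INSERT INTO testtype VALUES (2, 'test_2');

-- tests now points at testtype instead of storing the type name as text.
CREATE TABLE IF NOT EXISTS tests
(
    id_test      INTEGER PRIMARY KEY,
    id_patient   INTEGER NOT NULL,
    id_type_test INTEGER NOT NULL,
    date_test    DATE,
    FOREIGN KEY (id_patient)   REFERENCES patients(id_patient),
    FOREIGN KEY (id_type_test) REFERENCES testtype(id_test)
);

-- One query lists every test of a patient together with its type name.
SELECT t.id_test, tt.name_test, t.date_test
FROM tests t
JOIN testtype tt ON tt.id_test = t.id_type_test
WHERE t.id_patient = 4
ORDER BY t.date_test;
```

With the foreign key in place, a misspelled or unknown test type is rejected at insert time instead of silently ending up in a free-text column.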

It depends on the requirement
For OLTP I would do something like the following
STAFF:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | JOB_TITLE | ...
-------------------------------------------------------------
1 | harry | potter | 2001-01-01 | consultant | ...
2 | ron | weasley | 2001-02-01 | pathologist | ...
PATIENT:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | ...
-----------------------------------------------
1 | hermiony | granger | 2013-01-01 | ...
TEST_TYPE:
ID | CATEGORY | NAME | DESCRIPTION | ...
--------------------------------------------------------
1 | haematology | abg | arterial blood gasses | ...
REQUEST:
ID | TEST_TYPE_ID | PATIENT_ID | DATE_REQUESTED | REQUESTED_BY | ...
----------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 1 | ...
RESULT_TYPE:
ID | TEST_TYPE_ID | NAME | UNIT | ...
---------------------------------------
1 | 1 | co2 | kPa | ...
2 | 1 | o2 | kPa | ...
RESULT:
ID | REQUEST_ID | RESULT_TYPE_ID | DATE_RESULTED | RESULTED_BY | RESULT | ...
-------------------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 2 | 5 | ...
2 | 1 | 2 | 2013-01-02 | 2 | 5 | ...
One concern I have with the above is the unit of the test result: units can sometimes (not often) change, so it may be better to place the unit in the result table.
Also consider breaking these into the major test categories, as my understanding is that they can be quite different, e.g. histopathology and x-rays are not resulted in the same way as haematology and microbiology.
For OLAP I would combine request and result into one table adding derived columns such as REQUEST_TO_RESULT_MINS and make a single dimension from RESULT_TYPE and TEST_TYPE etc.
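To make the OLTP layout a bit more concrete, here is a rough sketch of how a patient's results could be read back. The table and column names follow the tables above, but the column types, the trimmed column lists and the query itself are assumptions for illustration:

```sql
-- Minimal DDL, trimmed to the columns the query needs (types are assumptions).
CREATE TABLE test_type   (id INTEGER PRIMARY KEY,
                          category VARCHAR(30),
                          name VARCHAR(30));
CREATE TABLE request     (id INTEGER PRIMARY KEY,
                          test_type_id INTEGER REFERENCES test_type(id),
                          patient_id INTEGER,
                          date_requested DATE);
CREATE TABLE result_type (id INTEGER PRIMARY KEY,
                          test_type_id INTEGER REFERENCES test_type(id),
                          name VARCHAR(30),
                          unit VARCHAR(10));
CREATE TABLE result      (id INTEGER PRIMARY KEY,
                          request_id INTEGER REFERENCES request(id),
                          result_type_id INTEGER REFERENCES result_type(id),
                          date_resulted DATE,
                          result REAL);

-- All results for patient 1, one row per measured value.
SELECT rq.id   AS request_id,
       tt.name AS test,
       rt.name AS measurement,
       rs.result,
       rt.unit,
       rs.date_resulted
FROM request rq
JOIN test_type   tt ON tt.id = rq.test_type_id
JOIN result      rs ON rs.request_id = rq.id
JOIN result_type rt ON rt.id = rs.result_type_id
WHERE rq.patient_id = 1
ORDER BY rq.date_requested, rt.name;
```

Adding a new kind of test then only means new rows in test_type and result_type, not a new table.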

You can do this in a few ways; it is hard to pick one without knowing all the different types of cases you need to deal with.
The simplest would be 5 tables
Patients (like you described it)
Tests (like you described it)
TestType (like Declan_K suggested)
TestResultCode
TestResults
TestResultCode describes each value that is stored for each test. TestResults is a pivoted table that can store any number of test results per test:
Create table TestResultCode
(
idTestResultCode int
, Code varchar(10)
, Description varchar(200)
, DataType int -- 1= Real, 2 = Varchar, 3 = int, etc.
);
Create Table TestResults
(
idPatient int -- FK
, idTest int -- FK
, idTestType int -- FK
, idTestResultCode int -- FK
, ResultsI real
, ResultsV varchar(100)
, Resultsb int
, Created datetime
);
So, basically, the results you wanted to put into the tables "tests_1" and "tests_2", and those of any other test you can think of, all fit into this structure.
The application reading this table, can load each test and all its values. Of course the application needs to know how to deal with each case, but you can store any type of test in this structure.
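As a sketch of how an application might read one test back from this structure: the query below returns every stored value of a test as text. It assumes that DataType 1 goes with ResultsI, 2 with ResultsV and 3 with Resultsb (the answer does not spell that mapping out), and idTest = 1 is only a placeholder:

```sql
-- Read back every stored value of one test as text.
SELECT c.Code,
       c.Description,
       CASE c.DataType
            WHEN 1 THEN CAST(r.ResultsI AS varchar(100))  -- real
            WHEN 2 THEN r.ResultsV                         -- varchar
            WHEN 3 THEN CAST(r.Resultsb AS varchar(100))   -- int
       END AS ResultValue
FROM TestResults r
JOIN TestResultCode c ON c.idTestResultCode = r.idTestResultCode
WHERE r.idTest = 1   -- placeholder test id
ORDER BY c.Code;
```

The trade-off is that the database can no longer enforce which values a given test type must have; that check moves into the application.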

Related

SQL N:M query merging results by condition flag in intermediate table

[First of all, if this is a duplicate, sorry, I couldn't find a response for this, as this is a strange solution for a limitation on an ORM and I'm clearly a noobie on SQL]
Domain requirements:
A brigade must be composed of one user (the commissar) and, optionally, one and only one assistant (1:1)
A user can only be part of one brigade (1:1)
CREATE TABLE Users
(
id SERIAL PRIMARY KEY,
username VARCHAR(100) NOT NULL UNIQUE,
password VARCHAR(100) NOT NULL
);
CREATE TABLE Brigades
(
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
-- N:M relationship with a flag inside which determine if that user is a commissar or not
CREATE TABLE Brigade_User
(
brigade_id INT NOT NULL REFERENCES Brigades(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
user_id INT NOT NULL REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
is_commissar BOOLEAN NOT NULL,
PRIMARY KEY(brigade_id, user_id)
);
Ideally, as relations are 1:1, Brigade_User intermediate table could be erased and a Brigade table with two foreign keys could be created instead (this is not supported by Diesel Rust ORM, so I think I'm coupled to first approach)
CREATE TABLE Brigades
(
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
-- 1:1
commissar_id INT NOT NULL REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
-- 1:1, optional (NULL when the brigade has no assistant)
assistant_id INT REFERENCES Users(id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
An example...
> SELECT * FROM brigade_user LEFT JOIN brigades ON brigade_user.brigade_id = brigades.id;
brigade_id | user_id | is_commissar | id | name
------------+---------+--------------+----+------------------
1 | 1 | t | 1 | Patrulla gatuna
1 | 2 | f | 1 | Patrulla gatuna
2 | 3 | t | 2 | Patrulla perruna
2 | 4 | f | 2 | Patrulla perruna
3 | 6 | t | 3 | Patrulla canina
3 | 5 | f | 3 | Patrulla canina
(6 rows)
Is it possible to make a query which returns a table like this?
brigade_id | commissar_id | assistant_id | name
-----------+--------------+--------------+--------------------
1 | 1 | 2 | Patrulla gatuna
2 | 3 | 4 | Patrulla perruna
3 | 6 | 5 | Patrulla canina
Note that each pair of rows has been merged into one, depending on the flag (remember, a brigade is composed of one commissar and, optionally, one assistant).
Could this model be improved (bearing in mind the limitation on multiple foreign keys referencing the same table, discussed here)?
Try the following:
with cte as
(
SELECT A.brigade_id,A.user_id,A.is_commissar,B.name
FROM brigade_user A LEFT JOIN brigades B ON A.brigade_id = B.id
)
select C1.brigade_id, C1.user_id as commissar_id , C2.user_id as assistant_id, C1.name from
cte C1 left join cte C2
on C1.brigade_id=C2.brigade_id
and C1.user_id<>C2.user_id
where C1.is_commissar=true
See a demo from here.
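An alternative sketch that avoids the self-join is conditional aggregation: group by brigade and pick the user id out of each row depending on the flag. This is written for PostgreSQL (it relies on is_commissar being a real boolean) and assumes each brigade has at most one commissar and at most one assistant, as the requirements state:

```sql
SELECT b.id AS brigade_id,
       -- the CASE yields NULL for non-matching rows, and MAX ignores NULLs
       MAX(CASE WHEN bu.is_commissar     THEN bu.user_id END) AS commissar_id,
       MAX(CASE WHEN NOT bu.is_commissar THEN bu.user_id END) AS assistant_id,
       b.name
FROM brigades b
JOIN brigade_user bu ON bu.brigade_id = b.id
GROUP BY b.id, b.name
ORDER BY b.id;
```

A brigade without an assistant comes back with assistant_id set to NULL. A brigade with no members at all is dropped by the inner join.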

How to efficiently insert tree-like data structure into postgres

Essentially, I want to efficiently store a tree-like data structure in a table with Postgres. Each row has an ID (auto-generated upon insert), a parent ID (referencing another row in the same table, possibly null), and some additional metadata. All of that data comes in at once, so I'm trying to store it all at once as efficiently as possible.
My current thought is to group all the data by the level of the tree it is at, and batch insert one level at a time. That way I can set parent IDs using the IDs generated by the previous level's inserts, so the number of batches equals the number of levels in the tree.
This is probably "good enough", but I'm wondering if there's a better way to do this kind of thing? It still seems like a lot of back and forth and unnecessary logic to me, when I have the whole tree of data already in memory and structured correctly.
Let me show how I would do it if I had some information on who is whose child record.
In my case, I use a staging table containing the data as it comes from the source. The records have a char-based primary key id and a self-referencing, nullable foreign key boss_id.
Here goes:
-- the input table with "business identifiers".
DROP TABLE IF EXISTS rec_input;
CREATE TABLE rec_input (
id CHAR(4)
, first_name VARCHAR(32)
, last_name VARCHAR(32)
, boss_id CHAR(4)
)
;
-- some data for it ...
INSERT INTO rec_input(id,first_name,last_name,boss_id)
SELECT 'A01','Arthur','Dent' ,NULL
UNION ALL SELECT 'A02','Ford','Prefect' ,'A01'
UNION ALL SELECT 'A03','Zaphod','Beeblebrox' ,'A01'
UNION ALL SELECT 'A04','Tricia','McMillan' ,'A01'
UNION ALL SELECT 'A05','Gag','Halfrunt' ,'A02'
UNION ALL SELECT 'A06','Prostetnic Vogon','Jeltz','A02'
UNION ALL SELECT 'A07','Lionel','Prosser' ,'A04'
UNION ALL SELECT 'A08','Benji','Mouse' ,'A04'
UNION ALL SELECT 'A09','Frankie','Mouse' ,'A04'
UNION ALL SELECT 'A10','Svlad','Cjelli' ,'A03'
;
-- create a lookup table. The surrogate key is created here.
DROP TABLE IF EXISTS lookup_help;
CREATE TABLE lookup_help (
sk SERIAL NOT NULL -- < here is the surrogate auto increment key
, id CHAR(4) -- same type as rec_input.id
);
-- fill the lookup table
INSERT INTO lookup_help(id)
SELECT id FROM rec_input;
-- test query
SELECT * FROM lookup_help;
-- this is the target table, with auto increment
-- and matching surrogate foreign key.
DROP TABLE IF EXISTS rec;
CREATE TABLE rec (
sk INTEGER NOT NULL -- surrogate key
, id CHAR(4) -- "business id"
, first_name VARCHAR(32)
, last_name VARCHAR(32)
, boss_id CHAR(4) -- "business foreign key", not needed really
, boss_sk INTEGER -- internal foreign key
)
;
INSERT INTO rec
SELECT
l.sk -- from lookup table, inner joined
, i.id -- from input table
, i.first_name
, i.last_name
, i.boss_id
, b.sk -- from lookup table, left outer joined
FROM rec_input i
JOIN lookup_help l USING(id) -- for the main sk
LEFT JOIN lookup_help b ON i.boss_id=b.id -- for the foreign sk
;
-- test query
SELECT * FROM rec;
-- out sk | id | first_name | last_name | boss_id | boss_sk
-- out ----+------+------------------+------------+---------+---------
-- out 2 | A02 | Ford | Prefect | A01 | 1
-- out 3 | A03 | Zaphod | Beeblebrox | A01 | 1
-- out 4 | A04 | Tricia | McMillan | A01 | 1
-- out 6 | A06 | Prostetnic Vogon | Jeltz | A02 | 2
-- out 5 | A05 | Gag | Halfrunt | A02 | 2
-- out 10 | A10 | Svlad | Cjelli | A03 | 3
-- out 7 | A07 | Lionel | Prosser | A04 | 4
-- out 8 | A08 | Benji | Mouse | A04 | 4
-- out 9 | A09 | Frankie | Mouse | A04 | 4
-- out 1 | A01 | Arthur | Dent | |
-- out (10 rows)
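To check that the surrogate keys really describe the tree, one option is to walk it from the root with a recursive query. This is only a sketch against the rec table built above; the depth column and the CTE name tree are my additions:

```sql
WITH RECURSIVE tree AS (
    -- anchor: the root row(s), i.e. records without a boss
    SELECT sk, id, first_name, last_name, boss_sk, 1 AS depth
    FROM rec
    WHERE boss_sk IS NULL
    UNION ALL
    -- step: every record whose boss is already in the tree
    SELECT c.sk, c.id, c.first_name, c.last_name, c.boss_sk, t.depth + 1
    FROM rec c
    JOIN tree t ON c.boss_sk = t.sk
)
SELECT depth, sk, id, first_name, last_name, boss_sk
FROM tree
ORDER BY depth, sk;
```

If this returns fewer rows than rec contains, some boss_id in the input did not match any id, so its subtree is not attached to the root.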
Perhaps, for your use case, you could try a NoSQL store; depending on the access pattern, querying such data may be more efficient and faster. Maybe give it a shot.
For development you have options like Apache CouchDB, Redis, etc.

Finding all entries with no new reference in another table within last two years

I have the following three tables:
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer (
id SERIAL PRIMARY KEY,
ext_id VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer_in_group (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
group_id INT NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL,
CONSTRAINT customer_id_fk
FOREIGN KEY(customer_id)
REFERENCES customer(id),
CONSTRAINT group_id_fk
FOREIGN KEY(group_id)
REFERENCES groups(id)
);
I need to find all of the groups that have not been referenced by the group_id column of any customer_in_group row inserted within the last two years. I then plan to delete all of the customer_in_group rows that reference those groups, and finally delete the groups themselves.
So basically given the following two groups and the following 3 customer_in_groups
Group
| id | name | insert_date |
|----|--------|--------------------------|
| 1 | group1 | 2011-10-05T14:48:00.000Z |
| 2 | group2 | 2011-10-05T14:48:00.000Z |
Customer In Group
| id | group_id | customer_id | insert_date |
|----|----------|-------------|--------------------------|
| 1 | 1 | 1 | 2011-10-05T14:48:00.000Z |
| 2 | 1 | 1 | 2020-10-05T14:48:00.000Z |
| 3 | 2 | 1 | 2011-10-05T14:48:00.000Z |
I would expect just to get back group2, since group1 has a customer_in_group referencing it inserted in the last two years.
I am not sure how I would write the query that would find all of these groups.
As a starter, I would recommend enabling on delete cascade on the foreign keys of customer_in_group.
Then, you can just delete the rows you want from groups, and it will drop the dependent rows in the child table. For this, you can use not exists:
delete from groups g
where not exists (
select 1
from customer_in_group cig
where cig.group_id = g.id and cig.insert_date >= now() - interval '2 year'
);
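For reference, a sketch of how the cascade could be switched on and how the affected groups could be previewed before deleting anything. It assumes PostgreSQL, the constraint name group_id_fk from the question, and a parent table called groups as in the delete statement above:

```sql
-- Recreate the foreign key with ON DELETE CASCADE.
ALTER TABLE customer_in_group
    DROP CONSTRAINT group_id_fk,
    ADD CONSTRAINT group_id_fk
        FOREIGN KEY (group_id) REFERENCES groups(id) ON DELETE CASCADE;

-- Preview: the same condition as the delete, but only selecting.
SELECT g.id, g.name
FROM groups g
WHERE NOT EXISTS (
    SELECT 1
    FROM customer_in_group cig
    WHERE cig.group_id = g.id
      AND cig.insert_date >= now() - interval '2 year'
);
```

Note that the condition also matches groups that have never had any customer_in_group row at all.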

Postgresql Sequence vs Serial

I was wondering when it is better to choose sequence, and when it is better
to use serial.
What I want is returning last value after insert using
SELECT LASTVAL();
I read this question
PostgreSQL Autoincrement
I have never used serial before.
Check out a nice answer about Sequence vs. Serial.
Sequence will just create sequence of unique numbers. It's not a datatype. It is a sequence. For example:
create sequence testing1;
select nextval('testing1'); -- 1
select nextval('testing1'); -- 2
You can use the same sequence in multiple places like this:
create sequence testing1;
create table table1(id int not null default nextval('testing1'), firstname varchar(20));
create table table2(id int not null default nextval('testing1'), firstname varchar(20));
insert into table1 (firstname) values ('tom'), ('henry');
insert into table2 (firstname) values ('tom'), ('henry');
select * from table1;
| id | firstname |
|----|-----------|
| 1 | tom |
| 2 | henry |
select * from table2;
| id | firstname |
|----|-----------|
| 3 | tom |
| 4 | henry |
Serial is a pseudo datatype. It will create a sequence object. Let's take a look at a straight-forward table (similar to the one you will see in the link).
create table test(field1 serial);
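This will cause a sequence to be created along with the table. The sequence is named <tablename>_<fieldname>_seq. The statement above is roughly equivalent to the following (serial additionally marks the sequence as owned by the column, so it is dropped together with it):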
create sequence test_field1_seq;
create table test(field1 int not null default nextval('test_field1_seq'));
Also see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html
You can reuse the sequence that is auto-created by serial datatype, or you may choose to just use one serial/sequence per table.
create table table3(id serial, firstname varchar(20));
create table table4(id int not null default nextval('table3_id_seq'), firstname varchar(20));
(The risk here is that if table3 is dropped and we continue using table3's sequence, we will get an error)
create table table5(id serial, firstname varchar(20));
insert into table3 (firstname) values ('tom'), ('henry');
insert into table4 (firstname) values ('tom'), ('henry');
insert into table5 (firstname) values ('tom'), ('henry');
select * from table3;
| id | firstname |
|----|-----------|
| 1 | tom |
| 2 | henry |
select * from table4; -- this uses sequence created in table3
| id | firstname |
|----|-----------|
| 3 | tom |
| 4 | henry |
select * from table5;
| id | firstname |
|----|-----------|
| 1 | tom |
| 2 | henry |
Feel free to try out an example: http://sqlfiddle.com/#!15/074ac/1
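Regarding the part of the question about getting the value back after an insert, here is a small sketch of the usual PostgreSQL options. The table demo_people is made up for the example:

```sql
CREATE TABLE demo_people (id serial PRIMARY KEY, firstname varchar(20));

-- Option 1: let the INSERT itself hand back the generated id.
INSERT INTO demo_people (firstname) VALUES ('tom') RETURNING id;

-- Option 2: ask the sequence behind the serial column, in the same session.
INSERT INTO demo_people (firstname) VALUES ('henry');
SELECT currval(pg_get_serial_sequence('demo_people', 'id'));

-- Option 3: lastval() returns the value most recently produced by nextval
-- on any sequence in this session, so a trigger that touches another
-- sequence can change what it reports.
SELECT lastval();
```

RETURNING is usually the least surprising of the three, since it does not depend on which sequence was touched last.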
2021 answer using identity
I was wondering when it is better to choose sequence, and when it is better to use serial.
This is not an answer to the whole question (only to the part quoted above), but I guess it could help future readers. You should use neither sequence nor serial; prefer identity columns instead:
create table apps (
id integer primary key generated always as identity
);
See this detailed answer: https://stackoverflow.com/a/55300741/978690 (and also https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial)
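A short sketch of how such a column behaves, assuming PostgreSQL 10 or later (where identity columns were introduced). The table apps_demo and its name column are made up for the example:

```sql
CREATE TABLE apps_demo (
    id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- The id is generated automatically and can be returned directly.
INSERT INTO apps_demo (name) VALUES ('first') RETURNING id;

-- With GENERATED ALWAYS, an explicit id is rejected unless the
-- statement opts in with OVERRIDING SYSTEM VALUE.
INSERT INTO apps_demo (id, name)
OVERRIDING SYSTEM VALUE
VALUES (100, 'imported');
```

GENERATED BY DEFAULT AS IDENTITY is the variant to use if explicit ids should be accepted without the override clause.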

Write SQL script to insert data

In a database that contains many tables, I need to write a SQL script that inserts data if it does not already exist.
Table currency
| id | Code | lastupdate | rate |
+--------+---------+------------+-----------+
| 1 | USD | 05-11-2012 | 2 |
| 2 | EUR | 05-11-2012 | 3 |
Table client
| id | name | createdate | currencyId|
+--------+---------+------------+-----------+
| 4 | tony | 11-24-2010 | 1 |
| 5 | john | 09-14-2010 | 2 |
Table: account
| id | number | createdate | clientId |
+--------+---------+------------+-----------+
| 7 | 1234 | 12-24-2010 | 4 |
| 8 | 5648 | 12-14-2010 | 5 |
I need to insert to:
currency (id=3, Code=JPY, lastupdate=today, rate=4)
client (id=6, name=Joe, createdate=today, currencyId=Currency with Code 'USD')
account (id=9, number=0910, createdate=today, clientId=Client with name 'Joe')
Problem:
the script must check whether a row exists before inserting new data
the script must allow a foreign key in the new row to refer to a row already present in the database (such as currencyId in the client table)
the script must allow the current datetime to be used as a column value in the insert statement (such as createdate in the client table)
the script must allow a foreign key in the new row to refer to a row inserted by the same script (such as clientId in the account table)
Note: I tried the following SQL statement but it solved only the first problem
INSERT INTO Client (id, name, createdate, currencyId)
SELECT 6, 'Joe', '05-11-2012', 1
WHERE not exists (SELECT * FROM Client where id=6);
This query runs without any error, but as you can see I wrote createdate and currencyId manually. I need to take the currency id from a select statement with a where clause (I tried to substitute a select statement for the 1, but the query failed).
This is an example of what I need; in my database, I need this script to insert more than 30 rows into more than 10 tables.
Any help is appreciated.
You wrote
I tried to substitute 1 by select statement but query failed
But I wonder why did it fail? What did you try? This should work:
INSERT INTO Client (id, name, createdate, currencyId)
SELECT
6,
'Joe',
current_date,
(select c.id from currency as c where c.code = 'USD') as currencyId
WHERE not exists (SELECT * FROM Client where id=6);
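The same pattern can be extended to the other two tables from the question. The sketch below is an assumption-laden illustration: it presumes that number is a text column (to keep the leading zero of 0910), that client names are unique enough for the lookup to return a single row, and that current_date is available (on SQL Server, GETDATE() would take its place):

```sql
-- Currency: inserted only if id 3 is not there yet.
INSERT INTO currency (id, Code, lastupdate, rate)
SELECT 3, 'JPY', current_date, 4
WHERE NOT EXISTS (SELECT 1 FROM currency WHERE id = 3);

-- (the client insert shown above runs here, before the account insert)

-- Account: the foreign key is looked up from the client row created earlier
-- in the same script. If more than one client is called 'Joe', the scalar
-- subquery fails.
INSERT INTO account (id, number, createdate, clientId)
SELECT 9,
       '0910',
       current_date,
       (SELECT c.id FROM client AS c WHERE c.name = 'Joe')
WHERE NOT EXISTS (SELECT 1 FROM account WHERE id = 9);
```

Because each statement checks for its own row first, the script can be run repeatedly without creating duplicates.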
It looks like you can work out if the data exists.
Here is a quick bit of code written in SQL Server / Sybase that I think answers your basic questions:
create table currency(
id numeric(16,0) identity primary key,
code varchar(3) not null,
lastupdated datetime not null,
rate smallint
);
create table client(
id numeric(16,0) identity primary key,
createddate datetime not null,
currencyid numeric(16,0) foreign key references currency(id)
);
insert into currency (code, lastupdated, rate)
values('EUR',GETDATE(),3)
--inserts the date and last allocated identity into client
insert into client(createddate, currencyid)
values(GETDATE(), @@IDENTITY)
go
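A variation on the above, sketched for SQL Server only: capturing the new identity value in a variable with SCOPE_IDENTITY() instead of reading @@IDENTITY. SCOPE_IDENTITY() is limited to the current scope, so an insert performed by a trigger on currency does not change the value. The variable name @currency_id is made up for the example:

```sql
declare @currency_id numeric(16,0);

insert into currency (code, lastupdated, rate)
values ('JPY', GETDATE(), 4);

-- identity value generated by the insert above, in this scope only
set @currency_id = SCOPE_IDENTITY();

insert into client (createddate, currencyid)
values (GETDATE(), @currency_id);
go
```

Keeping the value in a variable also means it can be reused for several dependent inserts in the same batch.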