Is there a way to generate a composite key efficiently in a PostgreSQL database?

Suppose I have a student table with id as primary key, which holds information about all students.
id | name
---+------
1 | aaron
2 | bob
In addition there is a table where id and tid form a composite key, which holds the scores for each test.
id | tid | score
---| --- | -----
Note: different students take different tests, with different counts and no correlation between students. tid does not identify a specific test at all; it is just a per-student test serial number. If id=1 and id=2 both have a row with tid=1, that does not mean it is the same test.
There are two ways to generate tid. One is globally unique and increases by 1 for each record inserted, e.g.
id | tid | score
-- | --- | -----
1 | 1 | 99
1 | 2 | 98
2 | 3 | 97
2 | 4 | 96
The other is unique only within a specific id, so different ids can have the same tid value, for example:
id | tid | score
-- | --- | -----
1 | 1 | 99
1 | 2 | 98
2 | 1 | 97
2 | 2 | 96
With the first scheme, a student with id=2 could roughly guess how many tests the whole school took in between, based on how his tid values change, because tid increases globally across all students. That is something I don't want. I could of course use non-repeating random numbers or a similar scheme, but I would prefer a compact, incrementing integer.
For the latter scheme, is there an efficient and simple way to implement it?

Option 1.
create table student(id integer PRIMARY KEY, name varchar);
create table test(tid integer PRIMARY KEY, name varchar, test_date date);
create table test_score(sid integer, tid integer references test, score integer, PRIMARY KEY(sid, tid));
UPDATE
Option 2.
Create a BEFORE INSERT trigger that uses a function that does roughly:
CREATE OR REPLACE FUNCTION tid_incr()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    max_ct integer;
BEGIN
    -- coalesce covers the student's first row, where max(tid) is NULL;
    -- concurrent inserts for the same student can still compute the same
    -- value, so serialize them (e.g. lock the student row) if that matters
    SELECT INTO max_ct coalesce(max(tid), 0) FROM score WHERE id = NEW.id;
    NEW.tid := max_ct + 1;
    RETURN NEW;
END;
$function$;
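To wire it up, a minimal sketch of the trigger itself (assuming the per-student table is the score table the function queries; on PostgreSQL versions before 11 use EXECUTE PROCEDURE instead of EXECUTE FUNCTION):
CREATE TRIGGER score_tid_incr
    BEFORE INSERT ON score
    FOR EACH ROW
    EXECUTE FUNCTION tid_incr();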

The correct design is option 1, because tid should refer (i.e. be a foreign key) to the test (i.e. a collection of questions) being taken, whether or not the test is custom for each student.
If students can take the same test more than once, add a date column (or a timestamp, if multiple attempts may be made on the same day) to distinguish the results of repeated attempts at the same test.
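As a sketch of what that could look like, extending option 1's score table with the attempt date in the key (the column name taken_on is illustrative):
create table test_score(
    sid integer references student,
    tid integer references test,
    taken_on date not null,
    score integer,
    PRIMARY KEY(sid, tid, taken_on));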

Related

Primary key collision in the scope of one transaction

I have a PostgreSQL database which relies heavily on events from the outside, e.g. an administrator changing or adding some fields or records might trigger a change in the overall field structure of other tables.
There lies the problem, however, as sometimes the fields changed by the trigger function are primary key fields. There is a table which uses two foreign key ids as its primary key, as in the example below:
# | PK id1 | PK id2 | data |
0 | 1 | 1 | ab |
1 | 1 | 2 | cd |
2 | 1 | 3 | ef |
However, within one transaction (if I may call it such, since, in fact, it is a plpgsql function), the structure might be changed to:
# | PK id1 | PK id2 | data |
0 | 1 | 3 | ab |
1 | 1 | 2 | cd |
2 | 1 | 1 | ef |
As you might have noticed, this changed the 0th record's second key column to 3 and the 2nd record's to 1, i.e. the opposite of what they were before.
It is 100% certain that after the function has taken its effect there will be no collisions whatsoever, but I'm wondering, how can this be implemented?
I could, in fact, use a synthetic BIGSERIAL primary key, yet the two ids would still need to be UNIQUE constrained, so that alone wouldn't do the trick, unfortunately.
You can declare a constraint as deferrable, for example a primary key:
CREATE TABLE elbat (id int,
                    nmuloc int,
                    PRIMARY KEY (id) DEFERRABLE);
You can then use SET CONSTRAINTS in a transaction to set deferrable constraints as deferred. That means that they can be violated temporarily during the transaction but must be fulfilled at the transaction's COMMIT.
Let's assume we have some data in our example table:
INSERT INTO elbat (id, nmuloc)
VALUES (1, 1),
       (2, 2);
We can now switch the IDs like this:
BEGIN TRANSACTION;
SET CONSTRAINTS ALL DEFERRED;
UPDATE elbat
SET id = 2
WHERE nmuloc = 1;
SELECT *
FROM elbat;
UPDATE elbat
SET id = 1
WHERE nmuloc = 2;
COMMIT;
There's no error even though the IDs are both 2 after the first UPDATE.
More on that can be found in the documentation, e.g. in CREATE TABLE (or ALTER TABLE) and SET CONSTRAINTS.
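Applied to the asker's case, this doesn't have to be the primary key; a deferrable UNIQUE constraint on the two ids works the same way (table and constraint names below are illustrative):
ALTER TABLE my_table
    ADD CONSTRAINT my_table_id1_id2_key
    UNIQUE (id1, id2) DEFERRABLE INITIALLY IMMEDIATE;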

How to create a dynamic unique constraint

I have a huge table that is partitioned by a partition id. Each partition can have a different number of fields in its unique constraint. Consider this table:
id | part_id | name  | age
---+---------+-------+----
 1 |       1 | James |  12
 2 |       1 | Mary  |  33
 3 |       2 | James |   1
 4 |       2 | Mike  |  19
 5 |       3 | James |  12
For part_id: 1 I need a unique constraint on fields name and age. part_id: 2 needs a unique constraint on name. part_id: 3 needs a unique constraint on name. I am open to any database that can accomplish this.
A classic RDBMS is designed to work with a stable schema. That means the structure of your tables, columns, indexes and relations doesn't change often, each table has a fixed number of columns with fixed types, and it is hard/inefficient to make them dynamic.
SQL Server has filtered indexes.
So, you can create a separate unique index for each partition.
CREATE UNIQUE NONCLUSTERED INDEX IX_Part1 ON YourTable
(
name ASC,
age ASC
)
WHERE (part_id = 1)
CREATE UNIQUE NONCLUSTERED INDEX IX_Part2 ON YourTable
(
name ASC
)
WHERE (part_id = 2)
CREATE UNIQUE NONCLUSTERED INDEX IX_Part3 ON YourTable
(
name ASC
)
WHERE (part_id = 3)
These DDL statements are static and the value of part_id is hard-coded in them. The optimiser is able to use such indexes in queries that have the same WHERE filter, so they are useful not just for enforcing the constraint.
You can always write a procedure that generates the text of the CREATE INDEX statement dynamically and runs it via EXEC/sp_executesql. There may be some clever use of triggers on YourTable to create it on the fly as the data in your table changes, but in the end it will still be a static CREATE INDEX statement.
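A minimal sketch of that dynamic generation (T-SQL; the variable @part_id and the assumption that the new partition only needs uniqueness on name are for illustration only):
DECLARE @part_id int = 4;
DECLARE @sql nvarchar(max) =
    N'CREATE UNIQUE NONCLUSTERED INDEX IX_Part' + CAST(@part_id AS nvarchar(10)) +
    N' ON YourTable (name ASC)' +
    N' WHERE (part_id = ' + CAST(@part_id AS nvarchar(10)) + N');';
EXEC sp_executesql @sql;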
You can create these indexes in advance for all possible values of part_id, even if there are no such actual values in the table yet.
If you have thousands of part_id and you want to create thousands of such unique constraints, then your current schema may not be quite appropriate.
SQL Server allows max 999 nonclustered indexes per table. See Maximum Capacity Specifications for SQL Server.
Are you trying to build some variation of EAV (entity-attribute-value) model?
Maybe there are non-relational DBMS that allow greater flexibility that would suit better for your task, but I don't have experience with them.
Oracle has no filtered indexes, but a function-based unique index can emulate them, because rows for which all indexed expressions evaluate to NULL are not stored in the index at all. A single index covering the three partitions from the question could look like this:
CREATE UNIQUE INDEX idx_part_id_dynamic ON partition_table
    (CASE WHEN part_id IN (1, 2, 3) THEN part_id END,
     CASE WHEN part_id IN (1, 2, 3) THEN name END,
     CASE WHEN part_id = 1 THEN age END);

How to lock a table when selecting records returns no data

Consider this table:
| id | objId | fieldNumber |
|------|-------|-------------|
| 902 | 1 | 1 |
| 908 | 1 | 2 |
| 1007 | 1 | 3 |
| 1189 | 8 | 1 |
| 1233 | 12 | 1 |
| 1757 | 15 | 1 |
I want to insert a new record for a non-existent obj, let's say objId: 16. The field number must increase by 1 for every record of obj 16. Take a look at obj 1: as you can see, it increases by 1. Now if two or more database connections try to insert obj 16 at the same time, I would end up with two obj 16 records with fieldNumber 1. This cannot happen. I must guarantee the field numbers are not the same and increase by one.
So my solution is to get all records by objId. If there is at least one record, place a lock on all records with that objId, then insert a record with the next fieldNumber.
Alternatively, when I get all records by objId and there are none, I will lock the whole table and then insert a record with fieldNumber 1.
How would I go about placing a lock on the whole table? Let me know if you have a better idea for doing this.
If you can handle exceptions, then probably the easiest way is to add a unique constraint on (objid, fieldnumber).
Then you can run a query, such as:
insert into t(objid, fieldnumber)
select #objid, coalesce(max(fieldnumber) + 1, 1)
from t
where objid = #objid;
If two simultaneous threads attempt to run the query, then the unique constraint will fail -- and the thread can re-try.
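For reference, the constraint that makes the retry approach safe could be added like this (the constraint name is illustrative):
alter table t add constraint uq_t_objid_fieldnumber unique (objid, fieldnumber);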
You can also use the SERIALIZABLE table hint.
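A sketch of that alternative (T-SQL, assuming the same table t; UPDLOCK plus SERIALIZABLE holds a range lock on the scanned objid range, so a concurrent insert for the same objid waits until this transaction commits):
declare @objid int = 16;

begin transaction;

insert into t(objid, fieldnumber)
select @objid, coalesce(max(fieldnumber) + 1, 1)
from t with (updlock, serializable)
where objid = @objid;

commit transaction;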
It seems that objId and fieldNumber together (should) form the primary key for this table. Normalized that way the constraint would be automatically enforced. I think this is a 'cleaner' solution. But if you can't drop an autonumbering PK scheme, then Gordon's solution is unbeatable.

Improve database table design depending on a value of a type in a column

I have the following:
1. A table "patients" where I store patients data.
2. A table "tests" where I store data of tests done to each patient.
Now the problem comes in because I have 2 types of tests, "tests_1" and "tests_2".
So for each test done to a particular patient I store the type and the id of that type of test:
CREATE TABLE IF NOT EXISTS patients
(
id_patient INTEGER PRIMARY KEY,
name_patient VARCHAR(30) NOT NULL,
sex_patient VARCHAR(6) NOT NULL,
date_patient DATE
);
INSERT INTO patients values
(1,'Joe', 'Male' ,'2000-01-23');
INSERT INTO patients values
(2,'Marge','Female','1950-11-25');
INSERT INTO patients values
(3,'Diana','Female','1985-08-13');
INSERT INTO patients values
(4,'Laura','Female','1984-12-29');
CREATE TABLE IF NOT EXISTS tests
(
id_test INTEGER PRIMARY KEY,
id_patient INTEGER,
type_test VARCHAR(15) NOT NULL,
id_type_test INTEGER,
date_test DATE,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests values
(1,4,'test_1',10,'2004-05-29');
INSERT INTO tests values
(2,4,'test_2',45,'2005-01-29');
INSERT INTO tests values
(3,4,'test_2',55,'2006-04-12');
CREATE TABLE IF NOT EXISTS tests_1
(
id_test_1 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
data4 REAL,
data5 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_1 values
(10,4,100.7,1.8,10.89,20.04,5.29);
CREATE TABLE IF NOT EXISTS tests_2
(
id_test_2 INTEGER PRIMARY KEY,
id_patient INTEGER,
data1 REAL,
data2 REAL,
data3 REAL,
FOREIGN KEY (id_patient) REFERENCES patients(id_patient)
);
INSERT INTO tests_2 values
(45,4,10.07,18.9,1.8);
INSERT INTO tests_2 values
(55,4,17.6,1.8,18.89);
Now I think this approach is redundant, or at least not too good...
So I would like to improve queries like
select * from tests WHERE id_patient=4;
select * from tests_1 WHERE id_patient=4;
select * from tests_2 WHERE id_patient=4;
Is there a better approach?
In this example I have 1 test of type tests_1 and 2 tests of type tests_2 for patient with id=4.
Add a table testtype (id_test, name_test) and use it as an FK to the id_type_test field in the tests table. Do not create separate tables for test_1 and test_2.
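A minimal sketch of that suggestion, following the names in this answer and the question's DDL:
CREATE TABLE IF NOT EXISTS testtype
(
 id_test INTEGER PRIMARY KEY,
 name_test VARCHAR(30) NOT NULL
);
ALTER TABLE tests
 ADD FOREIGN KEY (id_type_test) REFERENCES testtype(id_test);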
It depends on the requirement
For OLTP I would do something like the following
STAFF:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | JOB_TITLE | ...
-------------------------------------------------------------
1 | harry | potter | 2001-01-01 | consultant | ...
2 | ron | weasley | 2001-02-01 | pathologist | ...
PATIENT:
ID | FORENAME | SURNAME | DATE_OF_BIRTH | ...
-----------------------------------------------
1 | hermiony | granger | 2013-01-01 | ...
TEST_TYPE:
ID | CATEGORY | NAME | DESCRIPTION | ...
--------------------------------------------------------
1 | haematology | abg | arterial blood gasses | ...
REQUEST:
ID | TEST_TYPE_ID | PATIENT_ID | DATE_REQUESTED | REQUESTED_BY | ...
----------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 1 | ...
RESULT_TYPE:
ID | TEST_TYPE_ID | NAME | UNIT | ...
---------------------------------------
1 | 1 | co2 | kPa | ...
2 | 1 | o2 | kPa | ...
RESULT:
ID | REQUEST_ID | RESULT_TYPE_ID | DATE_RESULTED | RESULTED_BY | RESULT | ...
-------------------------------------------------------------------------------
1 | 1 | 1 | 2013-01-02 | 2 | 5 | ...
2 | 1 | 2 | 2013-01-02 | 2 | 5 | ...
A concern I have with the above is the unit of the test result; these can sometimes (not often) change. It may be better to place the unit in the result table.
Also consider breaking these into the major test categories, as my understanding is they can be quite different, e.g. histopathology and x-rays are not resulted in the same way that haematology and microbiology are.
For OLAP I would combine request and result into one table adding derived columns such as REQUEST_TO_RESULT_MINS and make a single dimension from RESULT_TYPE and TEST_TYPE etc.
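A minimal DDL sketch of the request/result half of the layout above, reusing the patients table from the question (names and types are illustrative, and the staff foreign keys are left out for brevity):
CREATE TABLE test_type (
    id          INTEGER PRIMARY KEY,
    category    VARCHAR(30) NOT NULL,
    name        VARCHAR(30) NOT NULL,
    description VARCHAR(200)
);
CREATE TABLE request (
    id             INTEGER PRIMARY KEY,
    test_type_id   INTEGER NOT NULL REFERENCES test_type(id),
    patient_id     INTEGER NOT NULL REFERENCES patients(id_patient),
    date_requested DATE NOT NULL,
    requested_by   INTEGER
);
CREATE TABLE result_type (
    id           INTEGER PRIMARY KEY,
    test_type_id INTEGER NOT NULL REFERENCES test_type(id),
    name         VARCHAR(30) NOT NULL,
    unit         VARCHAR(10)
);
CREATE TABLE result (
    id             INTEGER PRIMARY KEY,
    request_id     INTEGER NOT NULL REFERENCES request(id),
    result_type_id INTEGER NOT NULL REFERENCES result_type(id),
    date_resulted  DATE,
    resulted_by    INTEGER,
    result         REAL
);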
You can do this in a few ways, even without knowing all the different types of cases you need to deal with.
The simplest would be 5 tables
Patients (like you described it)
Tests (like you described it)
TestType (like Declan_K suggested)
TestResultCode
TestResults
TestResultCode describes each value that is stored for each test. TestResults is a pivoted table that can store any number of test results per test:
Create table TestResultCode
(
idTestResultCode int
, Code varchar(10)
, Description varchar(200)
, DataType int -- 1= Real, 2 = Varchar, 3 = int, etc.
);
Create Table TestResults
(
idPatient int -- FK
, idTest int -- FK
, idTestType int -- FK
, idTestResultCode int -- FK
, ResultsI real -- numeric results
, ResultsV varchar(100) -- text results
, Resultsb int -- integer/boolean results
, Created datetime
);
So, basically, you can fit the results you wanted to put into the tables "tests_1" and "tests_2", plus any other tests you can think of, into this structure.
The application reading this table can load each test and all its values. Of course the application needs to know how to deal with each case, but you can store any type of test in this structure.
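Illustrative only: how the question's tests_2 row (45, 4, 10.07, 18.9, 1.8) might be stored in that pivoted layout, assuming result codes 1-3 were defined in TestResultCode for data1/data2/data3 and test type 2 stands for "test_2":
Insert Into TestResults (idPatient, idTest, idTestType, idTestResultCode, ResultsI, Created)
Values (4, 45, 2, 1, 10.07, CURRENT_TIMESTAMP)
     , (4, 45, 2, 2, 18.9, CURRENT_TIMESTAMP)
     , (4, 45, 2, 3, 1.8, CURRENT_TIMESTAMP);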

Increasing a +1 to the id without changing the content of a column

I have this random table with random contents.
id | name| mission
1 | aaaa | kitr
2 | bbbb | etre
3 | ccccc| qwqw
4 | dddd | qwert
5 | eeee | potentials
6 | ffffffff | toto
What I want is to insert into the above table a new row with id=3, with a different name and a different mission, BUT the OLD id=3 row should become id=4 and keep the name and mission it had before as id=3, the old id=4 should become id=5 with its own name and mission, and so on.
It's like I want to insert a row in the middle and increase the id of every row below it by 1, while the rest of each row stays the same. Example below:
id | name| mission
1 | aaaa | kitr
2 | bbbb | etre
3 | zzzzzz| zzzzz
4 | ccccc| qwqw
5 | dddd | qwert
6 | eeee | potentials
7 | ffffffff | toto
Why do I want to do this? I have a table with 2 CLOB columns. Inside those CLOBs there are different queries, e.g. id=1 holds the creation of a table, id=2 the inserts for its columns, id=3 the creation of another table, id=4 some functions.
If you put all of these ids together into one text (or CLOB), they have to run in order: create, then inserts, then create, then functions. That table is effectively one huge script.
Why am I doing this? The developers are building their application and they want the SQL to run in a specific order. I have 6 developers, and I am organizing the data modelling, the performance and how the scripts are run. So the above table exists to organize the order in which their scripts are called.
Simply put, don't do it.
This case highlights why you should never use any business value, i.e. any 'real world values' for a Primary Key.
In your case I would recommend primary keys not be used for any other purposes.
I recommend you add an extra column 'order' and then change THAT column in order to re-order the rows. That way your primary key and all the other records will not need to be touched.
This avoids the issue that your approach would need to change ALL the records below the current one, which seems like a really bad approach. Just imagine trying to undo that update ;)
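A sketch of that order-column approach (run_order is a made-up name, since ORDER itself is a reserved word; the primary keys never change, only run_order does):
ALTER TABLE random_table ADD run_order INTEGER;

UPDATE random_table SET run_order = id;          -- seed from the current order
UPDATE random_table SET run_order = run_order + 1
 WHERE run_order > 2;                            -- make room at position 3

INSERT INTO random_table (id, name, mission, run_order)
VALUES (7, 'zzzzzz', 'zzzzz', 3);                -- the new row keeps a fresh id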
Some more info here: https://stackoverflow.com/a/8777574/631619
-- Shift every id above 2 up by one to make room for the new id = 3.
-- Note: if your database checks the primary key per row rather than per
-- statement (e.g. PostgreSQL without a DEFERRABLE constraint), the transient
-- duplicates created while shifting will raise an error.
UPDATE random_table
SET id = id + 1
WHERE id > 2;
Then insert the new value.
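For example, with the values from the question's desired result:
INSERT INTO random_table (id, name, mission) VALUES (3, 'zzzzzz', 'zzzzz');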