Is there a sensible way to import a csv to postgres where one column has multiple values per row? - sql

I'm new to relational databases and unsure what to do in the following scenario. I have 2 tables, one of which has an id primary key that is also referenced in the other.
Table 1:
CREATE TABLE table1 (
id int,
x int,
y int,
PRIMARY KEY (id)
);
Table 2:
CREATE TABLE table2 (
t1_id int,
id int,
w int,
z int,
PRIMARY KEY (id),
FOREIGN KEY (t1_id) REFERENCES table1(id)
);
For both of these tables I am importing data with \copy, for example:
\copy table1 from 'data/table1.csv' delimiter ',' csv header;
The issue is that whereas the id column in the csv that populates table1 has all ints, some of the values in the t1_id column of table2's csv are multiple ids separated by semicolon e.g. 1062;1553.
I'm not sure what the best approach is to represent this kind of data in a PostgreSQL database. Should I create a third intermediate table of some kind? I need to account for the fact that the foreign key in table2's data refers to the unique primary key from table1, but that there might be more than one (or zero) per row.

I can't promise this is efficient, but you could turn the t1_id column into an array of integers instead of an integer and then invoke a trigger function to check values before inserting.
Something like this should work:
CREATE TABLE table2 (
t1_id int[],
id int,
w int,
z int,
PRIMARY KEY (id)
);
CREATE OR REPLACE FUNCTION table2_insert_trigger()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
included_items int[];
BEGIN
select array_agg (id)
into included_items
from table1
where id = any (NEW.t1_id);
if coalesce(cardinality (NEW.t1_id), 0) = coalesce(cardinality (included_items), 0) then
return NEW;
else
raise exception 'Id(s) not found in table1';
end if;
END;
$function$;
create trigger insert_table2_trigger before insert
on table2 for each row execute procedure table2_insert_trigger();
If table1 contains id 1, 2, 3 and 4, this would work:
insert into table2 values (array[1,2], 1, 2, 3);
And this would fail:
insert into table2 values (array[1,5], 1, 2, 3);
SQL Error [P0001]: ERROR: Id(s) not found in table1 Where: PL/pgSQL
function table2_insert_trigger() line 13 at RAISE
Again, I can't swear to efficiency, but try it and see if it works.
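The "third intermediate table" the question asks about is the usual relational way to model a column that holds several ids. A minimal sketch, assuming the CSV is first loaded into a staging table with t1_id kept as text. The names table2_staging and table2_t1 and the path data/table2.csv are hypothetical, and table2 here is an alternative definition without the t1_id column:

```sql
-- Hypothetical staging table: same columns as the CSV, t1_id kept as raw text.
CREATE TABLE table2_staging (
    t1_id text,
    id    int,
    w     int,
    z     int
);

-- psql meta-command; the file path is an assumption.
\copy table2_staging from 'data/table2.csv' delimiter ',' csv header;

-- table2 without the multi-valued column.
CREATE TABLE table2 (
    id int PRIMARY KEY,
    w  int,
    z  int
);

-- Junction table: one row per (table2 row, referenced table1 row) pair.
CREATE TABLE table2_t1 (
    t2_id int NOT NULL REFERENCES table2 (id),
    t1_id int NOT NULL REFERENCES table1 (id),
    PRIMARY KEY (t2_id, t1_id)
);

INSERT INTO table2 (id, w, z)
SELECT id, w, z
FROM table2_staging;

-- Split '1062;1553' into one row per id.
INSERT INTO table2_t1 (t2_id, t1_id)
SELECT DISTINCT s.id, u.t1_id::int
FROM table2_staging AS s
CROSS JOIN LATERAL unnest(string_to_array(s.t1_id, ';')) AS u(t1_id);
```

Rows whose t1_id field is empty produce no junction rows, and an id that is missing from table1 makes the last INSERT fail on the foreign key, so no trigger is needed.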

Related

Update Postgres SQL table with SERIAL from previous insert [duplicate]

I'm very new to SQL and am creating two tables: one representing appliances, with a primary key, and a second representing, for example, microwaves, whose foreign key references the first table's primary key.
I'm using SERIAL for the id of the primary table, but I don't know how to insert into the second table using the specific value generated for the first.
I've created my tables using PSQL (Postgres15) like so:
CREATE TABLE Appliances (
id SERIAL NOT NULL,
field1 integer NOT NULL DEFAULT (0),
--
PRIMARY KEY (id),
UNIQUE(id)
);
CREATE TABLE Microwaves (
id integer NOT NULL,
field1 integer,
--
PRIMARY KEY (id),
FOREIGN KEY (id) REFERENCES Appliances(id)
);
Inserting my first row into the Appliance table:
INSERT INTO Appliances(field1) VALUES(1);
SELECT * FROM Appliances;
Yields:
And a query I found somewhere pulls the current increment of the SERIAL:
SELECT currval(pg_get_serial_sequence('Appliances', 'id'));
Yields:
I'm struggling to work out how to format the INSERT statement; I have tried several variations on the statement below:
INSERT INTO Microwaves VALUES(SELECT currval(pg_get_serial_sequence('Appliances', 'id'), 1));
Yields:
I'd appreciate feedback on solving the problem as presented, or a better way to tackle this in general.
Okay, it looks like I stumbled on at least one solution that works in my case, taken from https://stackoverflow.com/a/50004699/3564760:
DO $$
DECLARE appliance_id integer;
BEGIN
INSERT INTO Appliances(field1) VALUES(2) RETURNING id INTO appliance_id;
INSERT INTO Microwaves(id, field1) VALUES(appliance_id, 100);
END $$;
Still open to other answers if this isn't ideal.
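The same thing can probably be done in a single statement with a data-modifying CTE, which avoids the DO block. A sketch against the tables above (the values 2 and 100 are placeholders):

```sql
-- Insert the parent row and reuse its generated id for the child row.
WITH new_appliance AS (
    INSERT INTO Appliances (field1)
    VALUES (2)
    RETURNING id
)
INSERT INTO Microwaves (id, field1)
SELECT id, 100
FROM new_appliance;
```

Because the id comes straight from RETURNING, there is no need to look up the sequence with currval.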

How to insert a newly generated id into another table with a trigger in postgresql?

Basically, when users create a new record in mytable1, there is an id field that needs to be the same across multiple tables. I achieve this by having mytable2 with s_id as its primary key.
My current function looks like this:
CREATE OR REPLACE FUNCTION test.new_record()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
case when new.s_id in (select s_id from mytable1) then
insert into mytable2 (sprn, date_created) select max(s_id) +1, now() from mytable2 ;
update mytable1 set new.s_id = (select max(b.s_id) from mytable2 b);
end case;
RETURN new;
END;
$function$;
The intent was that when the s_id is duplicated, a new entry is created in mytable2, and that new entry's id is then written back to mytable1.
The problem is that the function does not recognise new in the update statement.
How can I make s_id take a new value on every insert?
If you want to have one "generator" across multiple tables, create one sequence that is used across all those tables for the default value:
create sequence the_id_sequence;
create table one
(
id integer primary key default nextval('the_id_sequence')
.... other columns
);
create table two
(
id integer primary key default nextval('the_id_sequence')
.... other columns ...
);
If you want to replicate an ID from one table to another during insert, you only need one sequence:
create table one
(
-- using identity is preferred over "serial" to auto-generate PK values
id integer primary key generated always as identity
);
create table two
(
id integer primary key
);
create or replace function insert_two()
returns trigger
as
$$
begin
insert into two (id) values (new.id);
return new;
end;
$$
language plpgsql;
create trigger replicate_id
before insert on one
for each row
execute procedure insert_two();
Then if you run:
insert into one (id) values (default);
A row with exactly the same id value will be inserted into table two.
If you don't have a generated ID column so far, use the following syntax:
alter table one
add testidcolumn bigint generated always as identity;

PostgreSQL constraint using prefixes

Let's say I have the following PostgreSQL table:
id | key
---+--------
1 | 'a.b.c'
I need to prevent inserting records with a key that is a prefix of another key. For example, I should be able to insert:
'a.b.b'
But the following keys should not be accepted:
'a.b'
'a.b.c'
'a.b.c.d'
Is there a way to achieve this, either by a constraint or by a locking mechanism (checking for existence before inserting)?
This solution is based on PostgreSQL user-defined operators and exclusion constraints.
NOTE: more testing shows this solution does not work (yet). See bottom.
Create a function has_common_prefix(text,text) which will calculate logically what you need. Mark the function as IMMUTABLE.
CREATE OR REPLACE FUNCTION
has_common_prefix(text,text)
RETURNS boolean
IMMUTABLE STRICT
LANGUAGE SQL AS $$
SELECT position ($1 in $2) = 1 OR position ($2 in $1) = 1
$$;
Create an operator for the index
CREATE OPERATOR <~> (
PROCEDURE = has_common_prefix,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = <~>
);
Create exclusion constraint
CREATE TABLE keys ( key text );
ALTER TABLE keys
ADD CONSTRAINT keys_cannot_have_common_prefix
EXCLUDE ( key WITH <~> );
However, the last point produces this error:
ERROR: operator <~>(text,text) is not a member of operator family "text_ops"
DETAIL: The exclusion operator must be related to the index operator class for the constraint.
This is because, to create an index, PostgreSQL needs logical operators to be bound to physical indexing methods via entities called "operator classes". So we need to provide that logic:
CREATE OR REPLACE FUNCTION keycmp(text,text)
RETURNS integer IMMUTABLE STRICT
LANGUAGE SQL AS $$
SELECT CASE
WHEN $1 = $2 OR position ($1 in $2) = 1 OR position ($2 in $1) = 1 THEN 0
WHEN $1 < $2 THEN -1
ELSE 1
END
$$;
CREATE OPERATOR CLASS key_ops FOR TYPE text USING btree AS
OPERATOR 3 <~> (text, text),
FUNCTION 1 keycmp (text, text)
;
ALTER TABLE keys
ADD CONSTRAINT keys_cannot_have_common_prefix
EXCLUDE ( key key_ops WITH <~> );
Now, it works:
INSERT INTO keys SELECT 'ara';
INSERT 0 1
INSERT INTO keys SELECT 'arka';
INSERT 0 1
INSERT INTO keys SELECT 'barka';
INSERT 0 1
INSERT INTO keys SELECT 'arak';
psql:test.sql:44: ERROR: conflicting key value violates exclusion constraint "keys_cannot_have_common_prefix"
DETAIL: Key (key)=(arak) conflicts with existing key (key)=(ara).
INSERT INTO keys SELECT 'bark';
psql:test.sql:45: ERROR: conflicting key value violates exclusion constraint "keys_cannot_have_common_prefix"
DETAIL: Key (key)=(bark) conflicts with existing key (key)=(barka).
NOTE: more testing shows this solution does not work yet: The last INSERT should fail.
INSERT INTO keys SELECT 'a';
INSERT 0 1
INSERT INTO keys SELECT 'ac';
ERROR: conflicting key value violates exclusion constraint "keys_cannot_have_common_prefix"
DETAIL: Key (key)=(ac) conflicts with existing key (key)=(a).
INSERT INTO keys SELECT 'ab';
INSERT 0 1
You can use the ltree module to achieve this; it lets you create hierarchical tree-like structures and saves you from reinventing the wheel with complicated regular expressions and so on. You just need to have the postgresql-contrib package installed. Take a look:
--Enabling extension
CREATE EXTENSION ltree;
--Creating our test table with a pre-loaded data
CREATE TABLE test_keys AS
SELECT
1 AS id,
'a.b.c'::ltree AS key_path;
--Now we'll do the trick with a before trigger
CREATE FUNCTION validate_key_path() RETURNS trigger AS $$
BEGIN
--This query will do our validation.
--It'll search if a key already exists in 'both' directions
--LIMIT 1 because one match is enough for our validation :)
PERFORM * FROM test_keys WHERE key_path @> NEW.key_path OR key_path <@ NEW.key_path LIMIT 1;
--If a match was found, raise an error
IF FOUND THEN
RAISE 'Duplicate key detected: %', NEW.key_path USING ERRCODE = 'unique_violation';
END IF;
--Great! Our new row is able to be inserted
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER test_keys_validator BEFORE INSERT OR UPDATE ON test_keys
FOR EACH ROW EXECUTE PROCEDURE validate_key_path();
--Creating a index to speed up our validation...
CREATE INDEX idx_test_keys_key_path ON test_keys USING GIST (key_path);
--The command below will work
INSERT INTO test_keys VALUES (2, 'a.b.b');
--And the commands below will fail
INSERT INTO test_keys VALUES (3, 'a.b');
INSERT INTO test_keys VALUES (4, 'a.b.c');
INSERT INTO test_keys VALUES (5, 'a.b.c.d');
Of course, I did not bother creating a primary key and other constraints for this test, but do not forget to do so. Also, there is much more to the ltree module than I'm showing; if you need something different, take a look at its docs, perhaps you'll find the answer there.
You can try the trigger below. Please note that key is an SQL keyword (non-reserved in PostgreSQL, so it does work as a column name), but I would suggest you avoid using it as a column name in your table.
I have also included my CREATE TABLE statement for testing purposes:
CREATE TABLE my_table
(myid INTEGER, mykey VARCHAR(50));
CREATE FUNCTION check_key_prefix() RETURNS TRIGGER AS $check_key_prefix$
DECLARE
v_match_keys INTEGER;
BEGIN
v_match_keys = 0;
SELECT COUNT(t.mykey) INTO v_match_keys
FROM my_table t
WHERE t.mykey LIKE CONCAT(NEW.mykey, '%')
OR NEW.mykey LIKE CONCAT(t.mykey, '%');
IF v_match_keys > 0 THEN
RAISE EXCEPTION 'Prefix Key Error occurred.';
END IF;
RETURN NEW;
END;
$check_key_prefix$ LANGUAGE plpgsql;
CREATE TRIGGER check_key_prefix
BEFORE INSERT OR UPDATE ON my_table
FOR EACH ROW
EXECUTE PROCEDURE check_key_prefix();
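One caveat with LIKE is that a % or _ inside a key acts as a wildcard. A sketch of the same trigger function using starts_with (available in PostgreSQL 11 and later, as far as I know), which compares prefixes literally. Replacing the function body leaves the existing trigger in place; the error message wording is my own:

```sql
CREATE OR REPLACE FUNCTION check_key_prefix() RETURNS TRIGGER AS $check_key_prefix$
BEGIN
    -- Reject the row if an existing key is a prefix of the new key,
    -- or the new key is a prefix of an existing key (equal keys included).
    IF EXISTS (
        SELECT 1
        FROM my_table t
        WHERE starts_with(t.mykey, NEW.mykey)
           OR starts_with(NEW.mykey, t.mykey)
    ) THEN
        RAISE EXCEPTION 'Prefix key error: % conflicts with an existing key', NEW.mykey;
    END IF;
    RETURN NEW;
END;
$check_key_prefix$ LANGUAGE plpgsql;
```

EXISTS also stops at the first conflicting row instead of counting all of them.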
Here is a CHECK-based solution; it may satisfy your needs.
CREATE TABLE keys ( id serial primary key, key text );
CREATE OR REPLACE FUNCTION key_check(text)
RETURNS boolean
STABLE STRICT
LANGUAGE SQL AS $$
SELECT NOT EXISTS (
SELECT 1 FROM keys
WHERE key ~ ( '^' || $1 )
OR $1 ~ ( '^' || key )
);
$$;
ALTER TABLE keys
ADD CONSTRAINT keys_cannot_have_common_prefix
CHECK ( key_check(key) );
PS. Unfortunately, it fails in one case: multi-row inserts.
SQL is a very powerful language, and you can usually do most things with plain SELECT statements. That is, if you do not like triggers, you can use this method for your inserts.
The only assumption is that at least one row exists in the table. (*)
The table:
create table my_table
(
id integer primary key,
key varchar(100)
);
Because of the assumption, we need at least one row. (*)
insert into my_table (id, key) values (1, 'a.b.c');
Now the magic SQL. The trick is to replace p_key with the key value you want to insert. I have intentionally not put the statement into a stored procedure, because I want it to be straightforward to carry over to your application side. Usually, though, putting SQL into a stored procedure is better.
insert into my_table (id, key)
select (select max(id) + 1 from my_table), p_key
from my_table
where not exists (select 'p' from my_table where key like p_key || '%' or p_key like key || '%')
limit 1;
Now the tests:
-- 'a.b.b' => Inserts
insert into my_table (id, key)
select (select max(id) + 1 from my_table), 'a.b.b'
from my_table
where not exists (select 'p' from my_table where key like 'a.b.b' || '%' or 'a.b.b' like key || '%')
limit 1;
-- 'a.b' => does not insert
insert into my_table (id, key)
select (select max(id) + 1 from my_table), 'a.b'
from my_table
where not exists (select 'p' from my_table where key like 'a.b' || '%' or 'a.b' like key || '%')
limit 1;
-- 'a.b.c' => does not insert
insert into my_table (id, key)
select (select max(id) + 1 from my_table), 'a.b.c'
from my_table
where not exists (select 'p' from my_table where key like 'a.b.c' || '%' or 'a.b.c' like key || '%')
limit 1;
-- 'a.b.c.d' does not insert
insert into my_table (id, key)
select (select max(id) + 1 from my_table), 'a.b.c.d'
from my_table
where not exists (select 'p' from my_table where key like 'a.b.c.d' || '%' or 'a.b.c.d' like key || '%')
limit 1;
(*) If you wish, you can get rid of the requirement for an existing row by introducing an Oracle-like dual table. Modifying the insert statement accordingly is straightforward. Let me know if you wish to do so.
One possible solution is to create a secondary table that holds the prefixes of your keys, and then use a combination of unique and exclusion constraints with an insert trigger to enforce the uniqueness semantics you want.
At a high level, this approach breaks each key down into a list of prefixes and applies something similar to readers-writer lock semantics: any number of keys may share a prefix as long as none of the keys equals the prefix. To accomplish that, the list of prefixes includes the key itself with a flag that marks it as a terminal prefix.
The secondary table looks like this. We use a CHAR rather than a BOOLEAN for the flag because later on we’ll be adding a constraint that doesn’t work on boolean columns.
CREATE TABLE prefixes (
id INTEGER NOT NULL,
prefix TEXT NOT NULL,
is_terminal CHAR NOT NULL,
CONSTRAINT prefixes_id_fk
FOREIGN KEY (id)
REFERENCES your_table (id)
ON DELETE CASCADE,
CONSTRAINT prefixes_is_terminal
CHECK (is_terminal IN ('t', 'f'))
);
Now we’ll need to define a trigger on insert into your_table to also insert rows into prefixes, such that
INSERT INTO your_table (id, key) VALUES (1, 'abc');
causes
INSERT INTO prefixes (id, prefix, is_terminal) VALUES (1, 'a', 'f');
INSERT INTO prefixes (id, prefix, is_terminal) VALUES (1, 'ab', 'f');
INSERT INTO prefixes (id, prefix, is_terminal) VALUES (1, 'abc', 't');
The trigger function might look like this. I’m only covering the INSERT case here, but the function could be made to handle UPDATE as well by deleting the old prefixes and then inserting the new ones. The DELETE case is covered by the cascading foreign-key constraint on prefixes.
CREATE OR REPLACE FUNCTION insert_prefixes() RETURNS TRIGGER AS $$
DECLARE
is_terminal CHAR := 't';
remaining_text TEXT := NEW.key;
BEGIN
LOOP
IF LENGTH(remaining_text) <= 0 THEN
EXIT;
END IF;
INSERT INTO prefixes (id, prefix, is_terminal)
VALUES (NEW.id, remaining_text, is_terminal);
is_terminal := 'f';
remaining_text := LEFT(remaining_text, -1);
END LOOP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
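The loop above could also be written as a single set-based INSERT. A sketch under the same table definitions; the function name insert_prefixes_setbased is hypothetical, and it would need to be named in the trigger in place of insert_prefixes:

```sql
CREATE OR REPLACE FUNCTION insert_prefixes_setbased() RETURNS TRIGGER AS $$
BEGIN
    -- One row per prefix length; only the full key is marked terminal.
    INSERT INTO prefixes (id, prefix, is_terminal)
    SELECT NEW.id,
           left(NEW.key, n),
           CASE WHEN n = length(NEW.key) THEN 't' ELSE 'f' END
    FROM generate_series(1, length(NEW.key)) AS n;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

It produces the same set of prefix rows as the loop, though not necessarily in the same order.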
We add this function to the table as a trigger in the usual way.
CREATE TRIGGER insert_prefixes
AFTER INSERT ON your_table
FOR EACH ROW
EXECUTE PROCEDURE insert_prefixes();
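An exclusion constraint and a partial unique index will enforce that a row where is_terminal = 't' can't collide with another row of the same prefix regardless of its is_terminal value, and that there's only one row per prefix with is_terminal = 't'. The exclusion constraint uses GiST with the plain = and <> operators, which requires the btree_gist extension to be installed: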
ALTER TABLE prefixes ADD CONSTRAINT prefixes_forbid_conflicts
EXCLUDE USING gist (prefix WITH =, is_terminal WITH <>);
CREATE UNIQUE INDEX ON prefixes (prefix) WHERE is_terminal = 't';
This allows new rows that don’t conflict but prevents ones that do conflict, including in multi-row INSERTs.
db=# INSERT INTO your_table (id, key) VALUES (1, 'a.b.c');
INSERT 0 1
db=# INSERT INTO your_table (id, key) VALUES (2, 'a.b.b');
INSERT 0 1
db=# INSERT INTO your_table (id, key) VALUES (3, 'a.b');
ERROR: conflicting key value violates exclusion constraint "prefixes_forbid_conflicts"
db=# INSERT INTO your_table (id, key) VALUES (4, 'a.b.c');
ERROR: duplicate key value violates unique constraint "prefixes_prefix_idx"
db=# INSERT INTO your_table (id, key) VALUES (5, 'a.b.c.d');
ERROR: conflicting key value violates exclusion constraint "prefixes_forbid_conflicts"
db=# INSERT INTO your_table (id, key) VALUES (6, 'a.b.d'), (7, 'a');
ERROR: conflicting key value violates exclusion constraint "prefixes_forbid_conflicts"

Define foreign key in Postgres to a subset of a target table

Example:
I have:
Table A:
int id
int table_b_id
Table B:
int id
text type
I want to add a constraint check on column table_b_id that will verify that it points only to rows in table B whose type value is 'X'.
I can't change table structure.
I understand it can be done with a CHECK and a Postgres function that runs the specific query, but I've seen people recommend avoiding it.
Any input on the best approach to implement it would be helpful.
What you are referring to is not a FOREIGN KEY, which, in PostgreSQL, refers to a (number of) column(s) in an other table where there is a unique index on that/those column(s), and which may have associated automatic actions when the value(s) of that/those column(s) change (ON UPDATE, ON DELETE).
You are trying to enforce a specific kind of referential integrity, similar to what a FOREIGN KEY does. You can do this with a CHECK clause and a function (because the CHECK clause does not allow sub-queries), you can also do it with table inheritance and range partitioning (refer to a child table which holds only rows where type = 'X'), but it is probably the easiest to do this with a trigger:
CREATE FUNCTION trf_test_type_x() RETURNS trigger AS $$
BEGIN
PERFORM * FROM tableB WHERE id = NEW.table_b_id AND type = 'X';
IF NOT FOUND THEN
-- RAISE NOTICE 'Foreign key violation...';
RETURN NULL;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_test_type_x
BEFORE INSERT OR UPDATE ON tableA
FOR EACH ROW EXECUTE PROCEDURE trf_test_type_x();
You can create a partial index on tableB to speed things up:
CREATE UNIQUE INDEX idx_type_X ON tableB(id) WHERE type = 'X';
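For comparison, the CHECK-plus-function route mentioned above might look roughly like this. It is a sketch only; the function name is_type_x and the constraint name table_a_b_is_type_x are hypothetical:

```sql
-- Returns true when the referenced tableB row exists and has type 'X'.
CREATE FUNCTION is_type_x(b_id int) RETURNS boolean AS $$
    SELECT EXISTS (SELECT 1 FROM tableB WHERE id = b_id AND type = 'X');
$$ LANGUAGE sql STABLE;

-- NULL references are allowed, as with an ordinary foreign key.
ALTER TABLE tableA
    ADD CONSTRAINT table_a_b_is_type_x
    CHECK (table_b_id IS NULL OR is_type_x(table_b_id));
```

The check only runs when a row in tableA is inserted or updated, so a later change to type in tableB goes unnoticed. That is probably why this approach tends to be discouraged.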
The most elegant solution, in my opinion, is to use inheritance to get a subtyping behavior:
PostgreSQL 9.3 Schema Setup with inheritance:
create table B ( id int primary key );
-- Instead of creating a 'type' field, inherit from B for
-- each type with custom properties:
create table B_X ( -- some_data varchar(10 ),
constraint pk primary key (id)
) inherits (B);
-- Sample data:
insert into B_X (id) values ( 1 );
insert into B (id) values ( 2 );
-- Now, instead of referencing B, you should reference B_X:
create table A ( id int primary key, B_id int references B_X(id) );
-- Here it is:
insert into A values ( 1, 1 );
--Inserting wrong values will cause a violation:
insert into A values ( 2, 2 );
ERROR: insert or update on table "a" violates foreign key constraint "a_b_id_fkey"
Detail: Key (b_id)=(2) is not present in table "b_x".
Retrieving all data from base table:
select * from B
Results:
| id |
|----|
| 2 |
| 1 |
Retrieving data with type:
SELECT p.relname, c.*
FROM B c inner join pg_class p on c.tableoid = p.oid
Results:
| relname | id |
|---------|----|
| b | 2 |
| b_x | 1 |

Syntax check for constraint that checks value of col1 = col2 * col3

Might be something very simple regarding syntax that I've been doing wrong, but for the past 2 hours I've been trying multiple statements, defining this constraint at both table and column level as part of CREATE TABLE and separately using ALTER TABLE, without any success:
Create table tb1 (
tb1_quantity number,
tb1_price number,
tb1_total number constraint tb1_total_CK
CHECK(tb1_total = SUM(tb1_quantity * tb1_price))
);
The other way I've been trying is:
Create table tb1 (
tb1_quantity number,
tb1_price number,
tb1_total number constraint tb1_total_CK
CHECK(SUM(tb1_quantity * tb1_price))
)
;
It seems to be something about the way I'm using the function, since I'm constantly getting the usual "ORA-00934: group function is not allowed here" message. I have read about alternative approaches using triggers and views, but I'm eager to get it to work using a constraint. Am I along the right lines with this syntax, or just not wording it properly?
Drop the SUM (aggregate functions are not allowed in a CHECK constraint) and define the check as an out-of-line constraint, since a column-level check cannot reference other columns, i.e.:
Create table tb1 (
tb1_quantity number,
tb1_price number,
tb1_total number,
constraint tb1_total_CK CHECK(tb1_total = tb1_quantity * tb1_price)
);
For example:
SQL> Create table tb1 (
2 tb1_quantity number,
3 tb1_price number,
4 tb1_total number,
5 constraint tb1_total_CK CHECK(tb1_total = tb1_quantity * tb1_price)
6 );
Table created.
SQL> insert into tb1 values (1, 1, 1);
1 row created.
SQL> insert into tb1 values (1, 1, 2);
insert into tb1 values (1, 1, 2)
*
ERROR at line 1:
ORA-02290: check constraint (DTD_TRADE.TB1_TOTAL_CK) violated
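If the total is always meant to equal quantity times price, an alternative to checking it might be to let Oracle derive it. A sketch assuming Oracle 11g or later, where virtual columns are available:

```sql
CREATE TABLE tb1 (
    tb1_quantity NUMBER,
    tb1_price    NUMBER,
    -- Derived from the other two columns; it cannot be inserted or updated directly.
    tb1_total    NUMBER GENERATED ALWAYS AS (tb1_quantity * tb1_price) VIRTUAL
);

INSERT INTO tb1 (tb1_quantity, tb1_price) VALUES (2, 3);
SELECT tb1_quantity, tb1_price, tb1_total FROM tb1;
```

This removes the possibility of an inconsistent total rather than rejecting it, so it only fits if nothing needs to write tb1_total itself.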