How to create a trigger that automatically generates a primary key from multiple fields - sql

I have a geospatial DB with (among others) a table with locations and a table with features. The primary key for the locations table is location_id. location_id is also a foreign key in the features table. The features table also includes the fields "type" (in which a two-letter code is entered to denote particular types of features) and "N" (which differentiates the different features that may be linked to one location). I figured a combination of location_id, type, and N would make a decent primary key for the features table. Previously, I entered these IDs manually. However, I would like this to be done automatically when a "user" enters a location_id, N, and type. (Ideally I want to find a way to automatically generate the correct N, so that "users" need only enter location_id and type, but I think this should be posted as a separate question?)
I have been trying to achieve this via triggers (see code below), but when I test it by trying to add a new data row to my features table, I get the error message "duplicate key value violates unique constraint features_pkey". Could someone point me in the direction of help for this issue?
CREATE OR REPLACE FUNCTION set_features_id()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
DECLARE
compos_id text;
BEGIN
SELECT loc_id || type || N FROM features INTO compos_id;
NEW.id := compos_id;
RETURN NEW;
END;
$$;
DROP TRIGGER IF EXISTS set_lf_id_trigger on public.landscape_features_point;
CREATE TRIGGER set_features_id_trigger
BEFORE INSERT
ON "features"
FOR EACH ROW
EXECUTE PROCEDURE set_features_id();
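The duplicate-key error most likely comes from the SELECT in the trigger function: it reads loc_id || type || N out of the existing features table with no WHERE clause, so every inserted row ends up with the same value rather than one built from the row being inserted. A minimal sketch of a trigger that uses the NEW record's own values instead (assuming the columns are named id, location_id, type and n) would be:
CREATE OR REPLACE FUNCTION set_features_id()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$$
BEGIN
    -- Build the composite id from the row being inserted, not from the table.
    NEW.id := NEW.location_id::text || NEW.type || NEW.n::text;
    RETURN NEW;
END;
$$;

CREATE TRIGGER set_features_id_trigger
BEFORE INSERT ON features
FOR EACH ROW
EXECUTE PROCEDURE set_features_id();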

Related

PostgreSQL: Inserting tuples in multiple tables using a view and a trigger

I am trying to build an order system that is able to insert a compound order that consists of multiple items and amounts. My database layout is as follows: I have an order table, containing an autoincrement id, item_id, amount and order_group_id columns. I also have an order_group table containing an autoincrement id and a person_id column. The idea is that when a person orders, one new order_group entry is created, and its id is used as the foreign key in the orders that the person has placed.
I presume that this would normally be done in the code of the application. However, I am using PostgREST to provide an API for me, which suggests creating a custom view to insert compound entries via that route. This is described here.
This is what I have so far:
CREATE FUNCTION kzc.new_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
DECLARE
group_id int;
BEGIN
INSERT INTO kzc.order_group (person) VALUES (new.person) RETURNING id INTO group_id;
INSERT INTO kzc."order" (item, amount, order_group) VALUES (new.item_id, new.amount, group_id);
RETURN new;
END;
$$;
CREATE TRIGGER new_order
INSTEAD OF INSERT ON kzc.new_order
FOR EACH ROW
EXECUTE FUNCTION kzc.new_order();
However, this code creates a new order_group entry for every order in the compound insert. How can I make it so that my code creates only one new order_group entry and assigns its id to all orders?
Thanks in advance!
I suggest that you add an order_group_id column to the new_order view and create a sequence for it. Then create a DEFAULT value for the column:
ALTER VIEW kzc.new_order
ALTER order_group_id SET DEFAULT currval('order_group_id_seq');
Add a BEFORE INSERT trigger FOR EACH STATEMENT that just calls nextval for the sequence. The currval calls will all pick up the same generated value.
Then you have that number in your trigger and can use it as a primary key for order_group.
To avoid adding the row multiple times, use
INSERT INTO kzc.order_group (id, person)
VALUES (NEW.order_group_id, NEW.person)
ON CONFLICT (id) DO NOTHING;
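A minimal sketch of that statement-level trigger (the function and trigger names here are illustrative; the sequence is the order_group_id_seq referenced by the DEFAULT above):
CREATE OR REPLACE FUNCTION kzc.advance_order_group_id()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- One nextval per INSERT statement; the view column's DEFAULT currval(...)
    -- then hands the same id to every row of the compound insert.
    PERFORM nextval('order_group_id_seq');
    RETURN NULL;
END;
$$;

CREATE TRIGGER advance_order_group_id
BEFORE INSERT ON kzc.new_order
FOR EACH STATEMENT
EXECUTE FUNCTION kzc.advance_order_group_id();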

Creating an Identifier that Combines Multiple Other Columns

I'm working on a DB and would like to implement a system where a table's unique ID is generated by combining several other IDs/factors. Basically, I'd want an ID that looks like this:
1234 (A reference to a standard incrementing serial ID from another table)
10 (A reference to a standard incrementing serial ID from another table)
1234 (A number that increments from 1000-9999)
So the ID would look like:
1234101234
Additionally, each of those "entries" will have multiple time sensitive instances that are stored in another table. For these IDs I want to take the above ID and append a time stamp, so it'll look like:
12341012341234567890123
I've looked a little bit at PostgreSQL sequences, but they seem mostly designed for simply incrementing up or down; I'm not sure how to do this sort of concatenation when creating an ID string, or whether it's even possible.
Don't do it! Just use a serial primary key id and then have three different columns:
otherTableID
otherTable2ID
timestamp
You can uniquely identify each row using your serial id. You can look up the other information. And -- even better -- you can create foreign key constraints to represent the relationships among the tables.
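A minimal sketch of that layout (all table and column names here are illustrative):
CREATE TABLE other_table  (id serial PRIMARY KEY);
CREATE TABLE other_table2 (id serial PRIMARY KEY);

CREATE TABLE entries (
    id              serial PRIMARY KEY,                            -- the only identifier you need
    other_table_id  integer NOT NULL REFERENCES other_table (id),
    other_table2_id integer NOT NULL REFERENCES other_table2 (id),
    created_at      timestamptz NOT NULL DEFAULT now()
);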
I'm not sure what you want to achieve, but
SELECT col_1::text || col_2::text || col_3::text || now()::text
should work. You should also add a UNIQUE constraint on the column, e.g.
ALTER TABLE this_table ADD CONSTRAINT this_new_column_unique UNIQUE (this_new_column);
But the real question is: why do you want to do this? If you just want a unique, meaningless ID, you only need to create a column of type serial.
create procedure f_return_unq_id(
    CONDITIONAL_PARAMS IN INTEGER,
    v_seq IN OUT INTEGER
)
is
    QUERY_1 VARCHAR2(200);
    RESP INTEGER;
BEGIN
    QUERY_1 := 'SELECT TAB1.SL_ID||TAB2.SL_ID||:v_seq||SYSTIMESTAMP FROM TABLE1 TAB1, TABLE2 TAB2 WHERE TAB1.CONDITION=:V_PARAMS';
    BEGIN
        EXECUTE IMMEDIATE QUERY_1 INTO RESP USING v_seq, CONDITIONAL_PARAMS;
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE(SQLCODE);
    END;
    v_seq := RESP;
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE(SQLCODE);
END;
Pass v_seq to this procedure as your 1000-9999 sequence number, together with any conditional parameters there may be.

How to ignore some rows while importing from a tab separated text file in PostgreSQL?

I have a 30 GB tab-separated text file with more than 100 million rows. When I import this text file into a PostgreSQL table using the \copy command, some rows cause errors. How can I ignore those rows, and also keep a record of the ignored rows, while importing into PostgreSQL?
I connect to my machine via SSH, so I cannot use pgAdmin!
It's very hard to edit the text file before importing because so many different rows have different problems. If there is a way to check the rows one by one before importing, and then run the \copy command for individual rows, that would be helpful.
Below is the code which generates the table:
CREATE TABLE Papers(
Paper_ID CHARACTER(8) PRIMARY KEY,
Original_paper_title TEXT,
Normalized_paper_title TEXT,
Paper_publish_year INTEGER,
Paper_publish_date DATE,
Paper_Document_Object_Identifier TEXT,
Original_venue_name TEXT,
Normalized_venue_name TEXT,
Journal_ID_mapped_to_venue_name CHARACTER(8),
Conference_ID_mapped_to_venue_name CHARACTER(8),
Paper_rank BIGINT,
FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));
Don't load directly into your destination table; load into a single-column staging table instead.
create table Papers_stg (rec text);
Once you have all the data loaded, you can do the verification on it using SQL.
Find records with wrong number of fields:
select rec
from Papers_stg
where cardinality(string_to_array(rec, e'\t')) <> 11
Create a table with all text fields
create table Papers_fields_text
as
select fields[1] as Paper_ID
,fields[2] as Original_paper_title
,fields[3] as Normalized_paper_title
,fields[4] as Paper_publish_year
,fields[5] as Paper_publish_date
,fields[6] as Paper_Document_Object_Identifier
,fields[7] as Original_venue_name
,fields[8] as Normalized_venue_name
,fields[9] as Journal_ID_mapped_to_venue_name
,fields[10] as Conference_ID_mapped_to_venue_name
,fields[11] as Paper_rank
from (select string_to_array(rec, e'\t') as fields
from Papers_stg
) t
where cardinality(fields) = 11
For field conversion checks you might want to use the concept described here.
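One common form of that concept is a "safe cast" helper that returns NULL instead of raising an error when a value cannot be converted; a minimal sketch for the date column (the function name is made up here):
CREATE OR REPLACE FUNCTION try_cast_date(p_value text)
RETURNS date
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN p_value::date;
EXCEPTION WHEN others THEN
    RETURN NULL;
END;
$$;

-- Rows whose publish date would not convert cleanly:
SELECT *
FROM Papers_fields_text
WHERE Paper_publish_date IS NOT NULL
  AND try_cast_date(Paper_publish_date) IS NULL;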
Your only option is to use row-by-row processing. Write a shell script (for example) that loops through the input file, sends each row to \copy, checks the execution result, and writes failed rows to some "err_input.txt".
More complicated logic can increase processing speed: load in batches instead of row by row, and fall back to row-by-row handling only for the failed batches.
Consider using pgloader
Check BATCHES AND RETRY BEHAVIOUR
You could use a BEFORE INSERT trigger and check your criteria. If the record fails the check, write a log entry (or an entry into a separate table) and return NULL. You could even correct some values, if possible and feasible.
Of course, if checking the criteria requires other queries (like finding duplicate keys, etc.), you might run into performance issues. But I'm not sure what kind of "different problems in different rows" you mean...
See also this answer on StackExchange Database Administrators, and the following example taken from Bartosz Dmytrak on the PostgreSQL forum:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
THEN
RETURN NULL;
ELSE
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();

SQL constraint to prevent updating a column based on its prior value

Can a CHECK constraint (or some other technique) be used to prevent a value from being set that contradicts its prior value when its record is updated?
One example would be a NULL timestamp indicating something happened, like "file_exported". Once a file has been exported and has a non-NULL value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only permitted to increase, but can never decrease.
If it helps, I'm using PostgreSQL, but I'd like to see solutions that fit any SQL implementation.
Use a trigger. This is a perfect job for a simple PL/PgSQL ON UPDATE ... FOR EACH ROW trigger, which can see both the NEW and OLD values.
See trigger procedures.
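For the first example, a minimal sketch of such a trigger that raises an error when the value would be cleared again (the files table and file_exported column are assumed here, not taken from the question):
CREATE OR REPLACE FUNCTION forbid_unset_file_exported()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- Once file_exported is set, refuse any update that sets it back to NULL.
    IF OLD.file_exported IS NOT NULL AND NEW.file_exported IS NULL THEN
        RAISE EXCEPTION 'file_exported cannot be reset to NULL';
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER forbid_unset_file_exported_trigger
BEFORE UPDATE ON files
FOR EACH ROW
EXECUTE PROCEDURE forbid_unset_file_exported();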
lfLoop has the best approach to the question. But to continue Craig Ringer's approach using triggers, here is an example. Essentially, you are setting the value of the column back to the original (old) value before you update.
CREATE OR REPLACE FUNCTION example_trigger()
RETURNS trigger AS
$BODY$
BEGIN
new.valuenottochange := old.valuenottochange;
new.valuenottochange2 := old.valuenottochange2;
RETURN new;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
DROP TRIGGER IF EXISTS trigger_name ON tablename;
CREATE TRIGGER trigger_name BEFORE UPDATE ON tablename
FOR EACH ROW EXECUTE PROCEDURE example_trigger();
One example would be a NULL timestamp indicating something happened,
like "file_exported". Once a file has been exported and has a non-NULL
value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only
permitted to increase, but can never decrease.
In both of these cases, I simply wouldn't record these changes as attributes on the annotated table; the 'exported' flag or 'hit count' is a distinct idea, representing a related but orthogonal real-world notion from the objects it relates to, so each would simply be its own relation. Since we only want "file_exported" to occur once:
CREATE TABLE thing_file_exported(
    thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
    file_name VARCHAR NOT NULL
);
The hit counter is similarly a different table:
CREATE TABLE thing_hits(
    thing_id INTEGER NOT NULL REFERENCES thing(id),
    hit_date TIMESTAMP NOT NULL,
    PRIMARY KEY (thing_id, hit_date)
);
And you might query with
SELECT thing.col1, thing.col2, tfe.file_name, count(th.thing_id)
FROM thing
LEFT OUTER JOIN thing_file_exported tfe
ON (thing.id = tfe.thing_id)
LEFT OUTER JOIN thing_hits th
ON (thing.id = th.thing_id)
GROUP BY thing.col1, thing.col2, tfe.file_name
Stored procedures and functions in PostgreSQL have access to both old and new values, and that code can access arbitrary tables and columns. It's not hard to build simple (crude?) finite state machines in stored procedures. You can even build table-driven state machines that way.

Including multiple columns in a single index in Postgres

I have a 'users' table with two columns, 'email' and 'new_email'. I need:
A case-insensitive uniqueness constraint covering both columns - i.e., if "Bob@Example.com" appears in one row's 'email' column, then inserting "bob@example.com" into another row's (or even the same row's) 'new_email' column should fail.
Fast case-insensitive searching for a given email address in either the 'email' or 'new_email' fields - i.e. find the row where the new_email OR email is "Bob@example.com", case-insensitive.
I know that I could do this more easily by creating a related 'emails' table, but I'm expecting to be looking up users in this table (by primary key) from several applications, and I'd like to avoid duplicating the join logic in various places to also retrieve their emails. So I think some kind of expression index would be best, if that's possible.
If this isn't possible, I suppose my next best option would be to create a view that the other applications could use to easily fetch a user's emails along with their other information, but I'm not sure how to do that either.
I'm using Postgres 8.4. Thank you!
I think you'll have to use a trigger to enforce your cross-column uniqueness constraint. If you add unique indexes on each column and then add a trigger something like this (untested, off-the-top-of-my-head code):
CREATE FUNCTION no_dups_allowed() RETURNS trigger AS $$
DECLARE
    r integer;
BEGIN
    SELECT 1 INTO r
    FROM users
    WHERE LOWER(email) = LOWER(NEW.new_email)
       OR LOWER(new_email) = LOWER(NEW.email);
    IF FOUND THEN
        -- Found a duplicate so it is time for a hissy fit!
        RAISE 'Duplicate email address found' USING ERRCODE = 'unique_violation';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
You'd want something like that as a BEFORE INSERT and BEFORE UPDATE trigger. That trigger would take care of catching cross-column duplicates and the unique indexes would take care of in-column duplicates.
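For reference, the per-column case-insensitive unique indexes the trigger relies on could look like this (the index names are made up):
CREATE UNIQUE INDEX users_email_lower_idx ON users (LOWER(email));
CREATE UNIQUE INDEX users_new_email_lower_idx ON users (LOWER(new_email));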
Some useful references:
FOUND
RAISE
Triggers
Trigger Procedures
You'll want the individual indexes for your queries anyway, and using the uniqueness half of the indexes simplifies your trigger by leaving it to deal only with the cross-column part; if you try to do it all in the trigger, you'll have to watch out for updates that don't really change the email or new_email columns.
For the querying half, you could create a view that used a UNION to combine the two columns. You could also create a function to merge the user's email addresses into one list. Hard to say which would be best without know more details of these other queries but I suspect that fixing all the other queries to know about email and email_new would be the best approach; you'll have to update all the other queries to use the view or function anyway so why build a view or function at all?
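A minimal sketch of such a UNION view (assuming the users table has an id primary key):
CREATE VIEW user_emails AS
SELECT id AS user_id, email FROM users WHERE email IS NOT NULL
UNION ALL
SELECT id AS user_id, new_email AS email FROM users WHERE new_email IS NOT NULL;

-- Case-insensitive lookup in either column:
SELECT user_id FROM user_emails WHERE LOWER(email) = LOWER('Bob@Example.com');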
No need for triggers. Try this:
create table et (email text, email2 text);
create unique index et_u on et (coalesce(lower(email),lower(email2)));
insert into et (email,email2) values ('scott@gmail.com',NULL);
insert into et (email,email2) values ('scott@gmail.com',NULL);
ERROR: duplicate key value violates unique constraint "et_u"
insert into et (email,email2) values (NULL,'scott@gmail.com');
ERROR: duplicate key value violates unique constraint "et_u"
insert into et (email,email2) values (NULL,'Scott@gmail.com');
ERROR: duplicate key value violates unique constraint "et_u"