SQL constraint to prevent updating a column based on its prior value

Can a CHECK constraint (or some other technique) be used to prevent a value from being set that contradicts its prior value when its record is updated?
One example would be a NULL timestamp indicating something happened, like "file_exported". Once a file has been exported and has a non-NULL value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only permitted to increase, but can never decrease.
If it helps, I'm using PostgreSQL, but I'd like to see solutions that fit any SQL implementation.

Use a trigger. This is a perfect job for a simple PL/pgSQL ON UPDATE ... FOR EACH ROW trigger, which can see both the NEW and OLD values.
See trigger procedures.
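A minimal sketch of such a trigger, assuming a table files with columns file_exported and hit_count (all names invented for illustration):
CREATE OR REPLACE FUNCTION guard_prior_values()
RETURNS trigger AS
$$
BEGIN
    -- once exported (non-NULL), the timestamp may never revert to NULL
    IF OLD.file_exported IS NOT NULL AND NEW.file_exported IS NULL THEN
        RAISE EXCEPTION 'file_exported cannot be reset to NULL';
    END IF;
    -- the hit counter may only increase
    IF NEW.hit_count < OLD.hit_count THEN
        RAISE EXCEPTION 'hit_count may not decrease (old %, new %)',
                        OLD.hit_count, NEW.hit_count;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER guard_prior_values
BEFORE UPDATE ON files
FOR EACH ROW EXECUTE PROCEDURE guard_prior_values();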

lfLoop has the best approach to the question. But to continue Craig Ringer's trigger approach, here is an example. Essentially, you set the column back to its original (old) value before the update completes.
CREATE OR REPLACE FUNCTION example_trigger()
RETURNS trigger AS
$BODY$
BEGIN
    -- discard any attempted change by restoring the old values
    NEW.valuenottochange  := OLD.valuenottochange;
    NEW.valuenottochange2 := OLD.valuenottochange2;
    RETURN NEW;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
DROP TRIGGER IF EXISTS trigger_name ON tablename;
CREATE TRIGGER trigger_name
BEFORE UPDATE ON tablename
FOR EACH ROW EXECUTE PROCEDURE example_trigger();

One example would be a NULL timestamp indicating something happened,
like "file_exported". Once a file has been exported and has a non-NULL
value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only
permitted to increase, but can never decrease.
In both of these cases, I simply wouldn't record these changes as attributes on the annotated table; 'exported' and 'hit count' are distinct ideas, representing related but orthogonal real-world notions from the objects they describe.
So they would simply be different relations. Since we only want "file_exported" to occur once:
CREATE TABLE thing_file_exported(
    thing_id  INTEGER PRIMARY KEY REFERENCES thing(id),
    file_name VARCHAR NOT NULL
);
The hit counter is similarly a different table:
CREATE TABLE thing_hits(
    thing_id INTEGER NOT NULL REFERENCES thing(id),
    hit_date TIMESTAMP NOT NULL,
    PRIMARY KEY (thing_id, hit_date)
);
And you might query with:
SELECT thing.col1, thing.col2, tfe.file_name, count(th.thing_id)
FROM thing
LEFT OUTER JOIN thing_file_exported tfe
    ON (thing.id = tfe.thing_id)
LEFT OUTER JOIN thing_hits th
    ON (thing.id = th.thing_id)
GROUP BY thing.col1, thing.col2, tfe.file_name;

Stored procedures and functions in PostgreSQL have access to both old and new values, and that code can access arbitrary tables and columns. It's not hard to build simple (crude?) finite state machines in stored procedures. You can even build table-driven state machines that way.
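For illustration, a table-driven check might look like this (the allowed_transition table and the status column are invented for this sketch):
CREATE TABLE allowed_transition (
    from_state text NOT NULL,
    to_state   text NOT NULL,
    PRIMARY KEY (from_state, to_state)
);
INSERT INTO allowed_transition VALUES
    ('new', 'exported'),
    ('exported', 'archived');
CREATE OR REPLACE FUNCTION enforce_transition()
RETURNS trigger AS
$$
BEGIN
    -- reject any status change that has no row in the transition table
    IF NEW.status IS DISTINCT FROM OLD.status
       AND NOT EXISTS (SELECT 1 FROM allowed_transition
                       WHERE from_state = OLD.status
                       AND   to_state   = NEW.status) THEN
        RAISE EXCEPTION 'illegal state change: % -> %', OLD.status, NEW.status;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;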


Is it safe to insert a row with a value incremented by a CTE select?

Say we have this table:
create table if not exists template (
    id         serial primary key,
    label      text not null,
    version    integer not null default 1,
    created_at timestamp not null default current_timestamp,
    unique(label, version)
);
The logic is to insert a new record, incrementing the version when a record with an equal label value already exists. The first intention is to do something like this:
with v as (
    select coalesce(max(version), 0) + 1 as new_version
    from template t where label = 'label1'
)
insert into template (label, version)
values ('label1', (select new_version from v))
returning *;
Although it works, I'm pretty sure it wouldn't be safe in case of simultaneous inserts. Am I right?
If I am, should I wrap this query in a transaction?
Or is there a better way to implement this kind of versioning?
Gap-less serial IDs per label are hard to come by. Your simple approach can easily fail with concurrent writes due to inherent race conditions. And "value-locking" is not generally implemented in Postgres.
But there is a way. Introduce a parent table label - if you don't already have one - and take a lock on the parent row. This keeps locking to a minimum and should avoid excessive costs from lock contention.
CREATE TABLE label (
label text PRIMARY KEY
);
CREATE TABLE version (
id serial PRIMARY KEY
, label text NOT NULL REFERENCES label
, version integer NOT NULL DEFAULT 1
, created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
, UNIQUE(label, version)
);
Then, in a single transaction:
BEGIN;
INSERT INTO label (label)
VALUES ('label1')
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false -- never executed, but still locks the row
RETURNING *; -- optional
INSERT INTO version (label, version)
SELECT 'label1', coalesce(max(v.version), 0) + 1
FROM version v
WHERE v.label = 'label1'
RETURNING *;
COMMIT;
The first UPSERT inserts a new label if it's not there yet, or locks the existing row if it is. Either way, the transaction now holds a lock on that label, excluding concurrent writes.
The second INSERT adds a new version, or the first one if there is none yet.
You could also move the UPSERT into a CTE attached to the INSERT, thus making it a single command and hence always a single transaction implicitly (see the sketch below). But the CTE is not needed per se.
This is safe under concurrent write load and works for all corner cases. You just have to make sure that all possibly competing write access takes the same route.
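For illustration, that single-command variant could look roughly like this (an untested sketch; as said, the CTE is not needed per se):
WITH lock_label AS (
   INSERT INTO label (label)
   VALUES ('label1')
   ON CONFLICT (label) DO UPDATE
   SET label = NULL WHERE false  -- never executed, but still locks the row
   RETURNING label
   )
INSERT INTO version (label, version)
SELECT 'label1', coalesce(max(v.version), 0) + 1
FROM   version v
WHERE  v.label = 'label1'
RETURNING *;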
You might wrap this into a function. This ...
... ensures a single transaction
... simplifies the call, with a single mention of the label value
... allows you to revoke write privileges from the tables and grant them only to this function if desired, enforcing the right access pattern.
CREATE FUNCTION f_new_label (_label text)
RETURNS TABLE (label text, version int)
LANGUAGE sql STRICT AS
$func$
INSERT INTO label (label)
VALUES (_label)
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false; -- never executed, but still locks the row
INSERT INTO version AS v (label, version)
SELECT _label, coalesce(max(v1.version), 0) + 1
FROM version v1
WHERE v1.label = _label
RETURNING v.label, v.version;
$func$;
Call:
SELECT * FROM f_new_label('label1');
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
UPDATE n:m relation in view as array (operations)
Yes, there could be collisions with simultaneous inserts.
Transactions could lead to locks, since you want to keep the state of the table unchanged until the insert with the new version number occurs. Apparently, Postgres can even end up in a deadlock with multiple concurrent inserts.
You can use a BEFORE INSERT trigger, which would help ensure that every insert gets a higher version number, as it processes rows one by one; see the sketch below.
But you have to remember that CPUs and the SQL server can reorder computation, so the rule "first come, first served" may not hold.
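A minimal sketch of that trigger idea against the template table from the question. Note that on its own it is still subject to the same race condition; the UNIQUE(label, version) constraint remains the final safety net:
CREATE OR REPLACE FUNCTION set_next_version()
RETURNS trigger AS
$$
BEGIN
    -- compute the next version for this label; still racy without a lock
    SELECT coalesce(max(version), 0) + 1
    INTO   NEW.version
    FROM   template
    WHERE  label = NEW.label;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_set_next_version
BEFORE INSERT ON template
FOR EACH ROW EXECUTE PROCEDURE set_next_version();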

PostgreSQL Trigger after update of a specific column

I'm exploring triggers and want to create one that fires after an UPDATE of the game_saved column. As I read in the PostgreSQL docs, it is possible to create triggers for specific columns. The column contains boolean values: a user may either add a game to his collection or remove it. So I want the trigger function to calculate the number of games set to TRUE in the game_saved column for a certain user, and then update total_game_count in the game_collection table.
game_collection
id - BIGSERIAL primary key
user_id - INTEGER REFERENCES users(id)
total_game_count - INTEGER
game_info
id - BIGSERIAL primary key
user_id - INTEGER REFERENCES users(id)
game_id - INTEGER REFERENCES games(id)
review - TEXT
game_saved - BOOLEAN
Here is my trigger (which is not working and I want to figure out why):
CREATE OR REPLACE FUNCTION total_games()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE game_collection
    SET    total_game_count = (SELECT COUNT(CASE WHEN game_saved THEN 1 END)
                               FROM game_info
                               WHERE game_collection.user_id = game_info.user_id)
    WHERE  user_id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_total_games
AFTER UPDATE OF game_saved FOR EACH ROW
EXECUTE PROCEDURE total_games();
If I change AFTER UPDATE OF game_saved (column) to AFTER UPDATE ON game_info (table) the trigger works correctly. So there is some problem with creating a trigger specifically for a column update.
Is it a good idea to fire the trigger on the column update or should I look for another approach here?
The syntax would be (as documented in the manual):
CREATE TRIGGER tr_total_games
AFTER UPDATE OF game_saved ON game_info
FOR EACH ROW
EXECUTE FUNCTION total_games();
For Postgres 10 or older, use:
...
EXECUTE PROCEDURE total_games();
See:
Trigger uses a procedure or a function?
But the whole approach is dubious. Keeping aggregates up to date via trigger is prone to errors under concurrent write load.
And without concurrent write load, there are simpler solutions: just add / subtract 1 from the current total ...
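A rough sketch of that incremental idea (assuming game_collection already holds one row per user; otherwise you would need an UPSERT):
CREATE OR REPLACE FUNCTION adjust_total_games()
RETURNS trigger AS
$$
BEGIN
    -- only act when the flag actually flips; IS TRUE also covers NULLs
    IF NEW.game_saved IS TRUE AND OLD.game_saved IS NOT TRUE THEN
        UPDATE game_collection
        SET    total_game_count = total_game_count + 1
        WHERE  user_id = NEW.user_id;
    ELSIF OLD.game_saved IS TRUE AND NEW.game_saved IS NOT TRUE THEN
        UPDATE game_collection
        SET    total_game_count = total_game_count - 1
        WHERE  user_id = NEW.user_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;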
A VIEW would be a reliable alternative. Remove the column game_collection.total_game_count altogether - and maybe the whole table game_collection, which does not seem to have any other purpose. Create a VIEW instead:
CREATE VIEW v_game_collection AS
SELECT user_id, count(*) AS total_game_count
FROM game_info
WHERE game_saved
GROUP BY user_id;
This returns all users with at least 1 row in game_info where game_saved IS TRUE (and omits all others).
For very big tables you might want a MATERIALIZED VIEW or related solutions to improve read performance. It's a trade-off between performance, storage / cache footprint, and being up to date.
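For example (same query as the view; the unique index is what allows REFRESH ... CONCURRENTLY):
CREATE MATERIALIZED VIEW mv_game_collection AS
SELECT user_id, count(*) AS total_game_count
FROM   game_info
WHERE  game_saved
GROUP  BY user_id;
CREATE UNIQUE INDEX ON mv_game_collection (user_id);
-- re-run whenever the data should be brought up to date:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_game_collection;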

How to ignore some rows while importing from a tab-separated text file in PostgreSQL?

I have a 30 GB tab-separated text file with more than 100 million rows. When I import it into a PostgreSQL table using the \copy command, some rows cause errors. How can I ignore those rows, and also keep a record of them, while importing?
I connect to my machine by SSH, so I cannot use pgAdmin.
It is very hard to edit the text file before importing, because many different rows have different problems. If there were a way to check the rows one by one before importing, and then run the \copy command for individual rows, that would be helpful.
Below is the code which generates the table:
CREATE TABLE Papers(
    Paper_ID                           CHARACTER(8) PRIMARY KEY,
    Original_paper_title               TEXT,
    Normalized_paper_title             TEXT,
    Paper_publish_year                 INTEGER,
    Paper_publish_date                 DATE,
    Paper_Document_Object_Identifier   TEXT,
    Original_venue_name                TEXT,
    Normalized_venue_name              TEXT,
    Journal_ID_mapped_to_venue_name    CHARACTER(8),
    Conference_ID_mapped_to_venue_name CHARACTER(8),
    Paper_rank                         BIGINT,
    FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
    FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));
Don't load directly into your destination table, but into a single-column staging table.
create table Papers_stg (rec text);
Once you have all the data loaded you can then do verifications on the data using SQL.
Find records with the wrong number of fields:
select rec
from Papers_stg
where cardinality(string_to_array(rec, E'\t')) <> 11;  -- split on tab: the file is tab-separated
Create a table with all text fields:
create table Papers_fields_text
as
select fields[1]  as Paper_ID
      ,fields[2]  as Original_paper_title
      ,fields[3]  as Normalized_paper_title
      ,fields[4]  as Paper_publish_year
      ,fields[5]  as Paper_publish_date
      ,fields[6]  as Paper_Document_Object_Identifier
      ,fields[7]  as Original_venue_name
      ,fields[8]  as Normalized_venue_name
      ,fields[9]  as Journal_ID_mapped_to_venue_name
      ,fields[10] as Conference_ID_mapped_to_venue_name
      ,fields[11] as Paper_rank
from  (select string_to_array(rec, E'\t') as fields  -- split on tab again
       from Papers_stg
      ) t
where cardinality(fields) = 11;
For field conversion checks you might want to use the concept described here.
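One common form of that concept (a guess at what the linked answer describes): a small helper that attempts the cast and returns NULL on failure, so bad values can be found with plain SQL.
-- hypothetical helper, not from the original answer
CREATE OR REPLACE FUNCTION try_cast_date(_in text)
RETURNS date AS
$$
BEGIN
    RETURN _in::date;
EXCEPTION WHEN others THEN
    RETURN NULL;  -- swallow the conversion error
END;
$$ LANGUAGE plpgsql STABLE;
-- staging rows whose publish date will not convert cleanly:
SELECT Paper_ID
FROM   Papers_fields_text
WHERE  Paper_publish_date IS NOT NULL
AND    try_cast_date(Paper_publish_date) IS NULL;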
Your only option is row-by-row processing. Write a shell script (for example) that loops through the input file, sends each row to copy, checks the result, and writes failed rows to some err_input.txt.
More complicated logic can increase processing speed: load in "portions" instead of row by row, and fall back to row-by-row handling only for failed segments.
Consider using pgloader.
Check its BATCHES AND RETRY BEHAVIOUR documentation.
You could use a BEFORE INSERT trigger and check your criteria. If the record fails the check, write a log entry (or an entry into a separate table) and return NULL. You could even correct some values, if possible and feasible.
Of course, if checking the criteria requires other queries (like finding duplicate keys etc.), you might run into performance issues. But I'm not sure which kind of "different problems in different rows" you mean...
See also this answer on StackExchange Database Administrators, and the following example taken from Bartosz Dmytrak at the PostgreSQL forum:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
THEN
RETURN NULL;
ELSE
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();

DB2 locking when no record yet exists

I have a table, something like:
create table state (foo int not null, bar int not null, baz varchar(32));
create unique index state_idx on state(foo, bar);
I'd like to lock for a unique record in this table. However, if there's no existing record I'd like to prevent anyone else from inserting a record, but without inserting myself.
I'd use "FOR UPDATE WITH RS USE AND KEEP EXCLUSIVE LOCKS" but that only seems to work if the record exists.
A) You can let DB2 generate every ID number. Let's say you have defined your Customers table:
CREATE TABLE Customers
( CustomerID Int NOT NULL
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY
, Name Varchar(50)
, Billing_Type Char(1)
, Balance Dec(9,2) NOT NULL DEFAULT
);
Insert rows without specifying the CustomerID, since DB2 will always produce the value for you.
INSERT INTO Customers
(Name, Billing_Type)
VALUES
(:cname, :billtype);
If you need to know what the last value assigned in your session was, you can then use the IDENTITY_VAL_LOCAL() function.
B) In my environment, I generally specify GENERATED BY DEFAULT. This is partly due to the nature of our principal programming language, ILE RPG-IV, where developers have traditionally let the compiler use the entire record definition. It also lets me tell everyone to use a sequence to generate ID values for a given table or set of tables.
You can grant SELECT to only yourself, but others with SECADM or similar privileges could still insert.
You can do something with a trigger: for example, check the current session user, and only allow the insert if it is your user:
if (SESSION_USER <> 'Alex') then
    rollback; -- or signal an exception
end if;
It seems that you also want to keep just one row; you can control that in a trigger as well:
select count(0) into value from state;
if (value > 1) then
    rollback; -- or signal an exception
end if;

Intervals: How can I make sure there is just one row with a null value in a timestamp column in a table?

I have a table with a column which contains a 'valid until' date, and I want to make sure that it can be set to NULL in only a single row within the table. Is there an easy way to do this?
My table looks like this (postgres):
CREATE TABLE "123".myTable(
    some_id     integer NOT NULL,
    valid_from  timestamp without time zone NOT NULL DEFAULT now(),
    valid_until timestamp without time zone,
    someString  character varying
);
some_id and valid_from are my PK. I want nobody to be able to enter a row with a NULL value in column valid_until if there is already a row with NULL for this some_id.
Thank you
In PostgreSQL, you have two basic approaches.
Use 'infinity' instead of null. Then your unique constraint works as expected. Or if you cannot do that:
CREATE UNIQUE INDEX null_valid_from ON myTable (some_id) WHERE valid_until IS NULL;
I have used both approaches. I find the first approach is usually cleaner, and it lets you make better use of range types and exclusion constraints in newer versions of PostgreSQL (to ensure no two time ranges overlap for a given some_id), but the second approach is often useful where the first cannot be done.
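A sketch of the first ('infinity') approach against the table from the question (schema qualification omitted; the constraint name is invented):
-- backfill open rows, then default to 'infinity' instead of NULL
UPDATE myTable SET valid_until = 'infinity' WHERE valid_until IS NULL;
ALTER TABLE myTable
    ALTER COLUMN valid_until SET DEFAULT 'infinity',
    ALTER COLUMN valid_until SET NOT NULL;
-- now at most one open ('infinity') row can exist per some_id
ALTER TABLE myTable
    ADD CONSTRAINT one_open_row UNIQUE (some_id, valid_until);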
Depending on the database, you may not be able to have NULL in a primary key (I don't know about all databases, but in SQL Server you can't). The easiest way around this I can think of is to set the datetime to the minimum value, and then add a unique constraint on it or make it part of the primary key.
I suppose another way would be to set up a trigger to check the other values in the table to see if another entry is NULL and, if there is one, disallow the insert.
As Kevin said in his answer, you can set up a database trigger to stop someone from inserting more than one row where the valid until date is NULL.
The SQL statement that checks for this condition is:
SELECT COUNT(*)
FROM   myTable
WHERE  valid_until IS NULL;
If the count is not equal to 1, then your table has a problem.
The process that adds a row to this table has to perform the following:
Find the row where the valid_until value is NULL
Update that row's valid_until to the current date, or some other meaningful date
Insert the new row with valid_until set to NULL
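In SQL those steps could look like this (values are placeholders):
BEGIN;
-- steps 1 and 2: find the open row and close it
UPDATE myTable
SET    valid_until = now()
WHERE  valid_until IS NULL;
-- step 3: insert the new open row
INSERT INTO myTable (some_id, valid_from, valid_until, someString)
VALUES (1, now(), NULL, 'placeholder');
COMMIT;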
I'm assuming you are storing effective-dated records and are also using a valid-from date.
If so, you could use CRUD stored procedures to enforce this rule, e.g. the insert procedure closes off any NULL valid-until dates before inserting a new record with a NULL valid-until date.
You will probably need other stored-procedure validation to avoid overlapping records and to allow deleting and editing records. It may be more efficient (simpler WHERE clauses / faster queries) to use a date far in the future rather than NULL.
I know only Oracle in sufficient detail, but the same might work in other databases:
Create another column which always contains a fixed value (say '0') and include this column in your unique key.
Don't use NULL but a specific very high or low value. In many cases this is actually easier to work with than a NULL value.
Make a function-based unique key on a function converting the date, including the null value, to some other value (e.g. a string representation for dates and 'x' for null).
Make a materialized view which gets updated on every change to your main table and put a constraint on that view.
select count(*) cnt from myTable where valid_until is NULL
might work as the select statement, combined with a check constraint limiting the cnt value to 0 and 1 (see the sketch below).
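In Oracle the materialized-view variant could look roughly like this (a sketch only: a materialized view log on the base table and suitable refresh options are also required):
CREATE MATERIALIZED VIEW mv_open_rows
REFRESH FAST ON COMMIT AS
SELECT count(*) AS cnt
FROM   myTable
WHERE  valid_until IS NULL;
-- the constraint is checked when the view refreshes on commit
ALTER TABLE mv_open_rows ADD CONSTRAINT chk_single_null CHECK (cnt <= 1);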
I would suggest inserting into that table through a stored procedure and putting your constraint in there, as triggers are quite hidden and will likely be forgotten about. If that's not an option, the following trigger will work:
CREATE TABLE dbo.TESTTRIGGER
(
    YourDate Date NULL
)
GO
CREATE TRIGGER DupNullDates
ON dbo.TESTTRIGGER
FOR INSERT, UPDATE
AS
    DECLARE @nullCount int
    SELECT @nullCount = (SELECT COUNT(*) FROM TESTTRIGGER WHERE YourDate IS NULL)
    IF (@nullCount > 1)
    BEGIN
        RAISERROR('Cannot have Multiple Nulls', 16, 1)
        ROLLBACK TRAN
    END
GO
Well, if you use MS SQL you can just add a unique index on that column. That will allow only one NULL. Don't count on other RDBMSs behaving the same way, though: PostgreSQL (before the NULLS NOT DISTINCT option of version 15) and Oracle, for example, allow multiple NULLs under a unique index.
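For example, against the table from the trigger answer above (index name invented):
-- SQL Server treats NULLs as equal in a unique index,
-- so a second row with a NULL date is rejected
CREATE UNIQUE INDEX UX_TESTTRIGGER_YourDate ON dbo.TESTTRIGGER (YourDate);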