Getting newest not handled, updated rows using txid - sql

I have a table in my PostgreSQL (actually its mulitple tables, but for the sake of simplicity lets assume its just one) and multiple clients that periodically need to query the table to find changed items. These are updated or inserted items (deleted items are handled by first marking them for deletion and then actually deleting them after a grace period).
Now the obvious solution would be to just keep a “modified” timestamp column for each row, remember it for each each client and then simply fetch the changed ones
SELECT * FROM the_table WHERE modified > saved_modified_timestamp;
The modified column would then be kept up to date using triggers like:
CREATE FUNCTION update_timestamp()
RETURNS trigger
LANGUAGE ‘plpgsql’
AS $$
BEGIN
NEW.modified = NOW();
RETURN NEW;
END;
$$;
CREATE TRIGGER update_timestamp_update
BEFORE UPDATE ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
CREATE TRIGGER update_timestamp_insert
BEFORE INSERT ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
The obvious problem here is that NOW() is the time the transation started. So it might happen that a transaction is not yet commited while fetching the updated rows and when its commited, the timestamp is lower than the saved_modified_timestamp, so the update is never registered.
I think I found a solution that would work and I wanted to see if you can find any flaws with this approach.
The basic idea is to use xmin (or rather txid_current()) instead of the timestamp and then when fetching the changes, wrap them in an explicit transaction with REPEATABLE READ and read txid_snapshot() (or rather the three values it contains txid_snapshot_xmin(), txid_snapshot_xmax(), txid_snapshot_xip()) from the transaction.
If I read the postgres documentation correctly, then all changes made transactions that are < txid_snapshot_xmax() and not in txid_snapshot_xip() should be returned in that fetch transaction. This information should then be all that is required to get all the update rows when fetching again. The select would then look like this, with xmin_version replacing the modified column:
SELECT * FROM the_table WHERE
xmin_version >= last_fetch_txid_snapshot_xmax OR xmin_version IN last_fetch_txid_snapshot_xip;
The triggers would then be simply like this:
CREATE FUNCTION update_xmin_version()
RETURNS trigger
LANGUAGE ‘plpgsql’
AS $$
BEGIN
NEW.xmin_version = txid_current();
RETURN NEW;
END;
$$;
CREATE TRIGGER update_timestamp_update
BEFORE UPDATE ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
CREATE TRIGGER update_timestamp_update_insert
BEFORE INSERT ON the_table
FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
Would this work? Or am I missing something?

Thank you for the clarification about the 64-bit return from txid_current() and how the epoch rolls over. I am sorry I confused that epoch counter with the time epoch.
I cannot see any flaw in your approach but would verify through experimentation that having multiple client sessions concurrently in repeatable read transactions taking the txid_snapshot_xip() snapshot does not cause any problems.
I would not use this method in practice because I assume that the client code will need to solve for handling duplicate reads of the same change (insert/update/delete) as well as periodic reconciliations between the database contents and the client's working set to handle drift due to communication failures or client crashes. Once that code is written, then using now() in the client tracking table, clock_timestamp() in the triggers, and a grace interval overlap when the client pulls changesets would work for use cases I have encountered.
If requirements called for stronger real-time integrity than that, then I would recommend a distributed commit strategy.

Ok, so I've tested it in depth now and haven't found any flaws so far. I had around 30 clients writing and reading to the database at the same time and all of them got consistent updates. So I guess this approach works.

Related

Looping through a CURSOR in PL/PGSQL without locking the table

I have a simple PL/PGSQL block Postgres 9.5 that loops over records in a table and conditionally updates some of the records.
Here's a simplified example:
DO $$
DECLARE
-- Define a cursor to loop through records
my_cool_cursor CURSOR FOR
SELECT
u.id AS user_id,
u.name AS user_name,
u.email AS user_email
FROM users u
;
BEGIN
FOR record IN my_cool_cursor LOOP
-- Simplified example:
-- If user's first name is 'Anjali', set email to NULL
IF record.user_name = 'Anjali' THEN
BEGIN
UPDATE users SET email = NULL WHERE id = record.user_id;
END;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
I'd like to execute this block directly against my database (from my app, via the console, etc...). I do not want to create a FUNCTION() or stored procedure to do this operation.
The Issue
The issue is that the CURSOR and LOOP create a table-level lock on my users table, since everything between the outer BEGIN...END runs in a transaction. This blocks any other pending queries against it. If users is sufficiently large, this locks it up for several seconds or even minutes.
What I tried
I tried to COMMIT after each UPDATE so that it clears the transaction and the lock periodically. I was surprised to see this error message:
ERROR: cannot begin/end transactions in PL/pgSQL
HINT: Use a BEGIN block with an EXCEPTION clause instead.
I'm not quite sure how this is done. Is it asking me to raise an EXCEPTION to force a COMMIT? I tried reading the documentation on Trapping Errors but it only mentions ROLLBACK, so I don't see any way to COMMIT.
How do I periodically COMMIT a transaction inside the LOOP above?
More generally, is my approach even correct? Is there a better way to loop through records without locking up the table?
1.
You cannot COMMIT within a PostgreSQL function or DO command at all (plpgsql or any other PL). The error message you reported is to the point (as far as Postgres 9.5 is concerned):
ERROR: cannot begin/end transactions in PL/pgSQL
A procedure could do that in Postgres 11 or later. See:
PostgreSQL cannot begin/end transactions in PL/pgSQL
In PostgreSQL, what is the difference between a “Stored Procedure” and other types of functions?
There are limited workarounds to achieve "autonomous transactions" in older versions:
How do I do large non-blocking updates in PostgreSQL?
Does Postgres support nested or autonomous transactions?
Do stored procedures run in database transaction in Postgres?
But you do not need any of this for the presented case.
2.
Use a simple UPDATE instead:
UPDATE users
SET email = NULL
WHERE user_name = 'Anjali'
AND email IS DISTINCT FROM NULL; -- optional improvement
Only locks the rows that are actually updated (with corner case exceptions). And since this is much faster than a CURSOR over the whole table, the lock is also very brief.
The added AND email IS DISTINCT FROM NULL avoids empty updates. Related:
Update a column of a table with a column of another table in PostgreSQL
How do I (or can I) SELECT DISTINCT on multiple columns?
It's rare that explicit cursors are useful in plpgsql functions.
If you want to avoid locking rows for a long time, you could also define a cursor WITH HOLD, for example using the DECLARE SQL statement.
Such cursors can be used across transaction boundaries, so you could COMMIT after a certain number of updates. The price you are paying is that the cursor has to be materialized on the database server.
Since you cannot use transaction statements in functions, you will either have to use a procedure or commit in your application code.

Trigger that insert into a dblink table

I am trying to do a trigger, that select values into some tables, and then insert them into another table.
So, for now I got this. There is a lot of columns, so I don't copy them, it is only varchar2 values, and this part works, so I don't think it is useful :
create or replace
TRIGGER TRIGGER_FICHE
AFTER INSERT ON T_AG
BEGIN
declare
begin
INSERT INTO t_ag_hab#DBLINK_DEV
()
values
();
/*commit;*/
end;
END;
Stored procedure where trigger will be called(again a lot of parameters, not relevant to copy them :
INSERT INTO T_AG()
VALUES
();
commit work;
The thing is, we cannot do commit into a trigger, I read that, and understand it.
But, how can I see an update of my table, with the new value?
When the process is runnign, there is nor error, but I don't see the new line into t_ag_hab.
I know it's not very clear, but I don't know how to explain it other way.
How can I fix this?,
Because you're inserting into a remove table via a database link, you have a distributed transaction:
... distributed transaction processing is more complicated because the database must coordinate the committing or rolling back of the changes in a transaction as an atomic unit. The entire transaction must commit or roll back.
When you commit, you're committing both the local insert and the remote insert performed by your trigger, as an atomic unit. You cannot commit one without the other, and you don't have to do anything extra to commit the remote change:
The two-phase commit mechanism is transparent to users who issue distributed transactions. In fact, users need not even know the transaction is distributed. A COMMIT statement denoting the end of a transaction automatically triggers the two-phase commit mechanism. No coding or complex statement syntax is required to include distributed transactions within the body of a database application.
If you can't see the inserted data from the remote database afterwards then something else has deleted it after the commit, or more likely you're looking at the wrong database.
One slight downside (though also a feature) of a database link is that it hides the details of where the work is being done. You can drop and recreate a link to make your code update a different target database without having to modify the code itself. But that means your code doesn't know where the insert is actually going - you'd need to check the data dictionary to see where the link is pointing. And even then you might not know as the link can be using a TNS alias to identify the database, and changes to the tnsnames.ora aren't visible from within the database.
If you can see the data after committing by querying t_ag_ab#dblink_dev from the same database you ran your procedure, but you can't see if when queried locally from the database you expect that to be pointing to, then the link isn't pointing where you think it is. The insert is going to one database, and you are performing your query against a different one. Only you can decide which is the 'correct' database though; and either redefine the link (or TNS entry, if appropriate), or change where you're doing the query.
I am not able to understand you requirement clearly. For updating records in main table and insert the old records in audit table. we can use the below query as a trigger.(MS-SQL)
Create trigger trg_update ON T_AGENT
AFTER UPDATE AS
BEGIN
UPDATE Tab1
SET COL1 = I.COL1, COL2=I.COL2
FROM INSERTED I INNER JOIN Tab1 ON I.COL3=Tab1.Col3
INSERT Tab1_Audit(COL1,COL2,COL3)
SELECT Tab1 FROM DELETED
RETURN;
END;
So far what you presented is just for Inserting trigger. If you want to see the update action done try to add Update like this example.
SQL> CREATE OR REPLACE TRIGGER validate_update
2 AFTER INSERT OR UPDATE ON T_AGENT
3 FOR EACH ROW
4 BEGIN
5 IF UPDATING('ACCOUNT_ID') THEN -- do something like this when updating
6 DBMS_OUTPUT.put_line ('ERROR'); -- add your action here
7 ELSIF INSERTING THEN
8 INSERT INTO t_ag_hab#DBLINK_DEV() values();
9 END IF;
10 END;
11 /
Trigger created.

I have a trigger autonomous but only execute one time in the same session

I have a trigger autonomous but only execute one time in the same session, then do nothing
CREATE OR REPLACE TRIGGER tdw_insert_unsus
BEFORE INSERT ON unsuscription_fact FOR EACH ROW
DECLARE
PRAGMA AUTONOMOUS_TRANSACTION;
v_id_suscription SUSCRIPTION_FACT.ID_SUSCRIPTION%TYPE;
v_id_date_suscription SUSCRIPTION_FACT.ID_DATE_SUSCRIPTION%TYPE;
v_id_date_unsuscription SUSCRIPTION_FACT.ID_DATE_UNSUSCRIPTION%TYPE;
v_suscription DATE;
v_unsuscription DATE;
v_live_time SUSCRIPTION_FACT.LIVE_TIME%TYPE;
BEGIN
SELECT id_suscription, id_date_suscription
INTO v_id_suscription, v_id_date_suscription
FROM(
SELECT id_suscription, id_date_suscription
FROM suscription_fact
WHERE id_mno = :NEW.ID_MNO
AND id_provider = :NEW.ID_PROVIDER
AND ftp_service_id = :NEW.FTP_SERVICE_ID
AND msisdn = :NEW.MSISDN
AND id_date_unsuscription IS NULL
ORDER BY id_date_suscription DESC
)
WHERE ROWNUM = 1;
-- calculate time
v_unsuscription := to_date(:NEW.id_date_unsuscription,'yyyymmdd');
v_suscription := to_date(v_id_date_suscription,'yyyymmdd');
v_live_time := (v_unsuscription - v_suscription);
UPDATE suscription_fact SET id_date_unsuscription = :NEW.id_date_unsuscription,
id_time_unsuscription = :NEW.id_time_unsuscription, live_time = v_live_time
WHERE id_suscription = v_id_suscription;
COMMIT;
EXCEPTION
WHEN NO_DATA_FOUND THEN
ROLLBACK;
END;
/
if I insert values works well the first o second time but after not work, but if I logout the session and login works for the first or second insertion
what is the problem?, I use oracle 10g
You're using an autonomous transaction to work around the fact that a trigger can not query its table itself. You've run into the infamous mutating table error and you have found that declaring the trigger as an autonomous transaction makes the error go away.
No luck for you though, this does not solve the problem at all:
First, any transaction logic is lost. You can't rollback the changes on the suscription_fact table, they are committed, while your main transaction is not and could be rolled back. So you've also lost your data integrity.
The trigger can not see the new row because the new row hasn't been committed yet! Since the trigger runs in an independent transaction, it can not see the uncommitted changes made by the main transaction: you will run into completely wrong results.
This is why you should never do any business logic in autonomous transactions. (there are legitimate applications but they are almost entirely limited to logging/debugging).
In your case you should either:
Update your logic so that it does not need to query your table (updating suscription_fact only if the new row is more recent than the old value stored in id_date_unsuscription).
Forget about using business logic in triggers and use a procedure that updates all tables correctly or use a view because here we have a clear case of redundant data.
Use a workaround that actually works (by Tom Kyte).
I would strongly advise using (2) here. Don't use triggers to code business logic. They are hard to write without bugs and harder still to maintain. Using a procedure guarantees that all the relevant code is grouped in one place (a package or a procedure), easy to read and follow and without unforeseen consequences.

Local Temporary table in Oracle 10 (for the scope of Stored Procedure)

I am new to oracle. I need to process large amount of data in stored proc. I am considering using Temporary tables. I am using connection pooling and the application is multi-threaded.
Is there a way to create temporary tables in a way that different table instances are created for every call to the stored procedure, so that data from multiple stored procedure calls does not mix up?
You say you are new to Oracle. I'm guessing you are used to SQL Server, where it is quite common to use temporary tables. Oracle works differently so it is less common, because it is less necessary.
Bear in mind that using a temporary table imposes the following overheads:read data to populate temporary tablewrite temporary table data to fileread data from temporary table as your process startsMost of that activity is useless in terms of helping you get stuff done. A better idea is to see if you can do everything in a single action, preferably pure SQL.
Incidentally, your mention of connection pooling raises another issue. A process munging large amounts of data is not a good candidate for running in an OLTP mode. You really should consider initiating a background (i.e. asysnchronous) process, probably a database job, to run your stored procedure. This is especially true if you want to run this job on a regular basis, because we can use DBMS_SCHEDULER to automate the management of such things.
IF you're using transaction (rather than session) level temporary tables, then this may already do what you want... so long as each call only contains a single transaction? (you don't quite provide enough detail to make it clear whether this is the case or not)
So, to be clear, so long as each call only contains a single transaction, then it won't matter that you're using a connection pool since the data will be cleared out of the temporary table after each COMMIT or ROLLBACK anyway.
(Another option would be to create a uniquely named temporary table in each call using EXECUTE IMMEDIATE. Not sure how performant this would be though.)
In Oracle, it's almost never necessary to create objects at runtime.
Global Temporary Tables are quite possibly the best solution for your problem, however since you haven't said exactly why you need a temp table, I'd suggest you first check whether a temp table is necessary; half the time you can do with one SQL what you might have thought would require multiple queries.
That said, I have used global temp tables in the past quite successfully in applications that needed to maintain a separate "space" in the table for multiple contexts within the same session; this is done by adding an additional ID column (e.g. "CALL_ID") that is initially set to 1, and subsequent calls to the procedure would increment this ID. The ID would necessarily be remembered using a global variable somewhere, e.g. a package global variable declared in the package body. E.G.:
PACKAGE BODY gtt_ex IS
last_call_id integer;
PROCEDURE myproc IS
l_call_id integer;
BEGIN
last_call_id := NVL(last_call_id, 0) + 1;
l_call_id := last_call_id;
INSERT INTO my_gtt VALUES (l_call_id, ...);
...
SELECT ... FROM my_gtt WHERE call_id = l_call_id;
END;
END;
You'll find GTTs perform very well even with high concurrency, certainly better than using ordinary tables. Best practice is to design your application so that it never needs to delete the rows from the temp table - since the GTT is automatically cleared when the session ends.
I used global temporary table recently and it was behaving very unwantedly manner.
I was using temp table to format some complex data in a procedure call and once the data is formatted, pass the data to fron end (Asp.Net).
In first call to the procedure, i used to get proper data and any subsequent call used to give me data from last procedure call in addition to current call.
I investigated on net and found out an option to delete rows on commit.
I thought that will fix the problem.. guess what ? when i used on commit delete rows option, i always used to get 0 rows from database. so i had to go back to original approach of on commit preserve rows, which preserves the rows even after commiting the transaction.This option clears rows from temp table only after session is terminated.
then i found out this post and came to know about the column to track call_id of a session.
I implemented that solution and still it dint fix the problem.
then i wrote following statement in my procedure before i starting any processing.
Delete From Temp_table;
Above statemnet made the trick. my front end was using connection pooling and after each procedure call it was commitng the transaction but still keeping the connection in connection pool and subsequent request was using the same connection and hence the database session was not terminated after every call..
Deleting rows from temp table before strating any processing made it work....
It drove me nuts till i found this solution....

SQL Server Race Condition Question

(Note: this is for MS SQL Server)
Say you have a table ABC with a primary key identity column, and a CODE column. We want every row in here to have a unique, sequentially-generated code (based on some typical check-digit formula).
Say you have another table DEF with only one row, which stores the next available CODE (imagine a simple autonumber).
I know logic like below would present a race condition, in which two users could end up with the same CODE:
1) Run a select query to grab next available code from DEF
2) Insert said code into table ABC
3) Increment the value in DEF so it's not re-used.
I know that, two users could get stuck at Step 1), and could end up with same CODE in the ABC table.
What is the best way to deal with this situation? I thought I could just wrap a "begin tran" / "commit tran" around this logic, but I don't think that worked. I had a stored procedure like this to test, but I didn't avoid the race condition when I ran from two different windows in MS:
begin tran
declare #x int
select #x= nextcode FROM def
waitfor delay '00:00:15'
update def set nextcode = nextcode + 1
select #x
commit tran
Can someone shed some light on this? I thought the transaction would prevent another user from being able to access my NextCodeTable until the first transaction completed, but I guess my understanding of transactions is flawed.
EDIT: I tried moving the wait to after the "update" statement, and I got two different codes... but I suspected that. I have the waitfor statement there to simulate a delay so the race condition can be easily seen. I think the key problem is my incorrect perception of how transactions work.
Set the Transaction Isolation Level to Serializable.
At lower isolation levels, other transactions can read the data in a row that is read, (but not yet modified) in this transaction. So two transactions can indeed read the same value. At very low isolation (Read Uncommitted) other transactions can even read data after it's been modified (but before committed)...
Review details about SQL Server Isolation Levels here
So bottom line is that the Isolation level is crtitical piece here to control what level of access other transactions get into this one.
NOTE. From the link, about Serializable
Statements cannot read data that has been modified but not yet committed by other transactions.
This is because the locks are placed when the row is modified, not when the Begin Trans occurs, So what you have done may still allow another transaction to read the old value until the point where you modify it. So I would change the logic to modify it in the same statement as you read it, thereby putting the lock on it at the same time.
begin tran
declare #x int
update def set #x= nextcode, nextcode += 1
waitfor delay '00:00:15'
select #x
commit tran
As other responders have mentioned, you can set the transaction isolation level to ensure that anything you 'read' using a SELECT statement cannot change within a transaction.
Alternatively, you could take out a lock specifically on the DEF table by adding the syntax WITH HOLDLOCK after the table name, e.g.,
SELECT nextcode FROM DEF WITH HOLDLOCK
It doesn't make much difference here, as your transaction is small, but it can be useful to take out locks for some SELECTs and not others within a transaction. It's a question of 'repeatability versus concurrency'.
A couple of relavant MS-SQL docs.
Isolation levels
Table hints
Late answer. You want to avoid a race condition...
"SQL Server Process Queue Race Condition"
Recap:
You began a transaction. This doesn't actually "do" anything in and of itself, it modifies subsequent behavior
You read data from a table. The default isolation level is Read Committed, so this select statement is not made part of the transaction.
You then wait 15 seconds
You then issue an update. With the declared transaction, this will generate a lock until the transaction is committed.
You then commit the transaction, releasing the lock.
So, guessing you ran this simultaneously in two windows (A and B):
A read the "next" value from table def, then went into wait mode
B read the same "next" value from the table, then went into wait mode. (Since A only did a read, the transaction did not lock anything.)
A then updated the table, and probably commited the change before B exited the wait state.
B then updated the table, after A's write was committed.
Try putting the wait statement after the update, before the commit, and see what happens.
It's not a real race condition. It's more a common problem with concurrent transactions. One solution is to set a read lock on the table and therefor have a serialization in place.
This is actually a common problem in SQL databases and that is why most (all?) of them have some built in features to take care of this issue of obtaining a unique identifier. Here are some things to look into if you are using Mysql or Postgres. If you are using a different database I bet the provide something very similar.
A good example of this is postgres sequences which you can check out here:
Postgres Sequences
Mysql uses something called auto increments.
Mysql auto increment
You can set the column to a computed value that is persisted. This will take care of the race condition.
Persisted Computed Columns
NOTE
Using this method means you do not need to store the next code in a table. The code column becomes the reference point.
Implementation
Give the column the following properties under computed column specification.
Formula = dbo.GetNextCode()
Is Persisted = Yes
Create Function dbo.GetNextCode()
Returns VarChar(10)
As
Begin
Declare #Return VarChar(10);
Declare #MaxId Int
Select #MaxId = Max(Id)
From Table
Select #Return = Code
From Table
Where Id = #MaxId;
/* Generate New Code ... */
Return #Return;
End