ACID transactions across multiple technologies - sql

I have an application that uses a local database and synchronizes to a remote database. The local database is SQLite, and for the remote database I'm using PostgreSQL. I need to move data from one database to the other and avoid duplicating information.
Roughly what I do right now:
BEGIN;                                      -- remote database (start transaction)
SELECT * FROM local.queued LIMIT 1;         -- local database (select first queued element)
INSERT INTO remote.queued VALUES (element); -- remote database (insert that element into the remote database)
BEGIN;                                      -- local database (start transaction)
DELETE FROM local.queued LIMIT 1;           -- local database (delete first queued element from the local database)
COMMIT;                                     -- local database (finalize local transaction)
COMMIT;                                     -- remote database (finalize remote transaction)
This works relatively well most of the time, but occasionally, after a hard reset of the program, I've noticed that a data record was duplicated. I believe this has something to do with how the transactions are finalized. Because I'm using two distinct technologies, it would be impossible to create a single atomic commit with WAL archiving.
Any ideas on how I could improve this approach to avoid duplicate entries?

The canonical way to do that is a distributed transaction using the two-phase commit protocol.
Unfortunately SQLite doesn't seem to support it, but since PostgreSQL does, you can still use it if only two databases are involved:
BEGIN; -- on PostgreSQL
BEGIN; -- on SQLite
/*
* Do work on both databases.
* On error, ROLLBACK both transactions.
*/
PREPARE TRANSACTION 'somename'; -- PostgreSQL
COMMIT; -- SQLite
COMMIT PREPARED 'somename'; -- PostgreSQL
Now if an error happens during the SQLite COMMIT, you run ROLLBACK PREPARED 'somename' on PostgreSQL. The idea is that everything that can fail during commit is done during PREPARE TRANSACTION, and the state of the transaction is persisted, so it stays open yet survives a server restart.
This is safe, but there is a caveat. Prepared transactions are dangerous, because they will hold locks and keep VACUUM from cleaning up (like all other transactions), but they are persistent and stick around until you explicitly remove them. So you need some piece of software, a distributed transaction manager, that is crash safe and keeps track of all distributed transactions. This transaction manager can clean up all prepared transactions after some outage.
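For the cleanup step, PostgreSQL lists surviving prepared transactions in the pg_prepared_xacts system view; a minimal recovery sketch (reusing the name 'somename' from the example above):
-- After a crash, see which prepared transactions are still pending
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-- Resolve each one according to whether the SQLite COMMIT had succeeded
COMMIT PREPARED 'somename';   -- the SQLite side committed
ROLLBACK PREPARED 'somename'; -- the SQLite side rolled back or never committed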

I think it would make sense to make your DML actions idempotent - that is to say, calling them multiple times has the same overall effect. For example, we can make the INSERT a no-op if the data already exists:
INSERT INTO x (id, name)
SELECT nu.id, nu.name
FROM (SELECT 1 AS id, 'a' AS name) AS nu
LEFT JOIN x ON nu.id = x.id
WHERE x.id IS NULL;
You can run this as many times as you like, and it'll only insert one record:
https://www.db-fiddle.com/f/nbHmy3PVDQ3RrGMqLni1su/0
You'll need to decide what to do if the record exists in an altered state - e.g. do you want to leave it alone, or reset it to the incoming values? That's a question for another time.
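As an aside, on PostgreSQL 9.5 and later (and in SQLite, via INSERT OR IGNORE) the same idempotent insert can be written more directly with ON CONFLICT; a minimal sketch, assuming id is the primary key of x:
-- Skip the row if the key already exists
INSERT INTO x (id, name) VALUES (1, 'a')
ON CONFLICT (id) DO NOTHING;
-- Or reset it to the incoming values instead of leaving it alone
INSERT INTO x (id, name) VALUES (1, 'a')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;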

Related

exclusive lock and shared lock for select statement - SQL Server

I am not able to understand how SELECT will behave while it's part of an exclusive transaction. Please consider the following scenario:
Scenario 1
Step 1.1
create table Tmp(x int)
insert into Tmp values(1)
Step 1.2 – session 1
begin tran
set transaction isolation level serializable
select * from Tmp
Step 1.3 – session 2
select * from Tmp
Even though the first session hasn't finished, session 2 is able to read the Tmp table. I thought Tmp would hold an exclusive lock, so that no shared lock could be granted to the SELECT query in session 2 - but that's not what happens. I have made sure that the default isolation level is READ COMMITTED.
Thanks in advance for helping me understand this behavior.
EDIT: Why do I need the SELECT under an exclusive lock?
I have an SP which generates sequential values. The flow is:
read the max value from the table and store it in a variable
update the table, setting value = value + 1
This SP is executed in parallel by several thousand instances. If two instances execute the SP at the same time, they will read the same value and both update it to value + 1, whereas I would like a distinct sequential value for every execution. I think that's possible only if the SELECT is also part of the exclusive lock.
If you want a transaction to be serializable, you have to change that option before you start the outermost transaction. So your first session is incorrect and is still actually running under read committed (or whatever other level was in effect for that session).
But even if you correct the order of statements, it still will not acquire an exclusive lock for a plain SELECT statement.
If you want the plain SELECT to acquire an exclusive lock, you need to ask for it:
select * from Tmp with (XLOCK)
or you need to execute a statement that actually requires an exclusive lock:
update Tmp set x = x
Your first session doesn't need an exclusive lock because it's not changing the data. If your first (serializable) session had run to completion and either rolled back or committed, before your second session was started, that session's results would still be the same because your first session didn't change the data - and so the "serializable" nature of the transaction was correct.
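For the sequence-generation flow in the EDIT, a common alternative is to fold the read into the UPDATE itself, so there is no separate SELECT to protect with an exclusive lock; a sketch, assuming a hypothetical single-row counter table Counters(value int):
DECLARE @next int;
-- Increment and capture the new value in one atomic statement;
-- the UPDATE takes the exclusive lock for us
UPDATE Counters SET @next = value = value + 1;
SELECT @next AS next_value;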

Design a Lock for SQL Server to help relax the conflict between INSERT and SELECT

The SQL Server is SQL Azure; basically it's SQL Server 2008 for normal purposes.
I have a table, called TASK, that constantly has new data coming in (new tasks) and old data removed (completed tasks).
For new data in, I use INSERT INTO .. SELECT ..., which most of the time takes very long, let's say dozens of minutes.
For old data out, I first use SELECT (WITH NOLOCK) to get a task, UPDATE to let other threads know this task has started processing, then DELETE once it's finished.
Deadlocks sometimes happen on the SELECT, but most of the time on the UPDATE and DELETE.
This is not a time-critical task, so I can start processing the new data once all the INSERTs have finished. Is there any kind of LOCK that asks the SELECT not to pick up rows before the INSERT has finished? Or any other suggestion to avoid the conflict? I can redesign the table if needed.
Since SQL Server 2005, resolving locks is easier.
For the conflict:
1. You can use Service Broker.
2. Use the isolation level.
Run dbcc useroptions; the last row shows that the default isolation level is read committed, at the session level.
You can change the level to read_committed_snapshot to reduce the conflict. SQL Server does not have true row locks like Oracle, but this method comes close:
ALTER DATABASE DBName
SET READ_COMMITTED_SNAPSHOT ON;
Enabling this feature requires the database to be in single-user mode (no other active connections).
Then you can test it with two sessions, A and B:
A: update table1 with (XLOCK) set name = 'new' where id = 1
B: can still update other rows and select all the data from the table.
Regarding locks, SQL Server offers three approaches:
1. Optimistic locking, using a timestamp (rowversion) column.
2. Pessimistic locking, forcing locks when reading the data with hints such as UPDLOCK, XLOCK, and so on.
3. Application locks, via sp_getapplock.
Consider using service broker if this is a processing queue.
There are a number of considerations that affect performance and locking. I surmise that the data is being updated and deleted in a separate session. Which transaction isolation level is in use for the insert session and the delete session?
Have the insert session and all its transactions committed and closed by the time the delete session runs? Are there multiple delete sessions running concurrently? It is very important to have an index on the columns you use to identify a task in the SELECT/UPDATE/DELETE statements, especially if you move to a higher isolation level such as REPEATABLE READ or SERIALIZABLE.
All of these issues could be solved by moving to Service Broker if it is appropriate.
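If Service Broker is not an option, a common T-SQL dequeue pattern combines READPAST with UPDLOCK so concurrent workers skip rows that another worker has already claimed, instead of blocking or deadlocking; a sketch, assuming hypothetical columns TASK(id, payload, status):
BEGIN TRAN;
-- Claim one unprocessed task; READPAST skips locked rows,
-- UPDLOCK/ROWLOCK keeps two workers from claiming the same row
UPDATE TOP (1) t
SET status = 'processing'
OUTPUT inserted.id, inserted.payload
FROM TASK AS t WITH (READPAST, UPDLOCK, ROWLOCK)
WHERE t.status = 'new';
COMMIT TRAN;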

oracle - commit over dblink?

If I connect to an Oracle database as user smith and issue the following 3 commands:
update smith.tablea
set col_name = 'florence' where col_id = 8;
insert into bob.other_table@mylink
values ('blah',2,'uncle','new');
commit;
Does this mean that the update to the local table (smith.tablea) and the insert to the remote db table (bob.other_table) have both been committed or that just the update to the local table has been committed?
Note: 'mylink' represents a dblink to a remote database.
From the documentation:
The Oracle two-phase commit mechanism is completely transparent to
users who issue distributed transactions. In fact, users need not even
know the transaction is distributed. A COMMIT statement denoting the
end of a transaction automatically triggers the two-phase commit
mechanism to commit the transaction. No coding or complex statement
syntax is required to include distributed transactions within the body
of a database application.
So - yes, if everything goes fine, both operations are committed.
The transaction will only succeed if both the remote and the local operations are successful.
More information about distributed transactions:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/ds_txnman.htm
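If a distributed transaction is interrupted between the prepare and commit phases, it becomes in-doubt; Oracle records it in the DBA_2PC_PENDING view, and it can then be resolved manually. A sketch (the transaction id value here is illustrative):
-- List in-doubt distributed transactions
SELECT local_tran_id, state, fail_time FROM dba_2pc_pending;
-- Force a resolution once the outcome on the remote side is known
COMMIT FORCE '1.21.17';
-- or
ROLLBACK FORCE '1.21.17';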

Obj-C, sqlite Commit Transaction, can use Selects within a transaction?

I'm improving the performance of queries in my app. I have several functions where I purely do inserts/updates, and adding BEGIN/COMMIT has greatly improved the speed.
However, my main function, which runs when the app starts up, has conditional inserts/updates and selects based on each other.
My worry is that I'll begin a transaction, insert some data into table X conditionally, then select on table X, and the query won't find the data until the transaction commits. Am I correct in my concern?
As a workaround, can I do my inserts/updates within BEGIN/COMMIT, then do my select, then do further work within another BEGIN/COMMIT?
PLEASE don't tell me to use FMDB or Core Data; I'm committed to this path to provide some fixes.
My worry is that I'll begin a transaction, insert some data into table X conditionally, then select on table X, and the query won't find the data until the transaction commits. Am I correct in my concern?
No. As the documentation for BEGIN TRANSACTION states, updates and inserts are always done in a transaction, so you would already be seeing this problem if it existed. This assumes, of course, that all of your SQL statements are being done with the same database connection.
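A quick way to convince yourself, on a single SQLite connection (the table x here is illustrative):
BEGIN;
INSERT INTO x (id, name) VALUES (1, 'a');
SELECT * FROM x WHERE id = 1; -- returns the new row: reads on the same connection see the transaction's own uncommitted changes
COMMIT;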

PostgreSQL Locking Questions

I am trying to figure out how to lock an entire table against writes in Postgres; however, it doesn't seem to be working, so I assume I am doing something wrong.
The table name is 'users', for example.
LOCK TABLE users IN EXCLUSIVE MODE;
When I check the pg_locks view, the lock doesn't seem to be in there. I've tried other locking modes as well, to no avail.
Other transactions are also able to run the LOCK statement and do not block as I assumed they would.
In the psql tool (8.1) I simply get back LOCK TABLE.
Any help would be wonderful.
There is no LOCK TABLE in the SQL standard, which instead uses SET TRANSACTION to specify concurrency levels on transactions. You should be able to use LOCK in transactions like this one:
BEGIN WORK;
LOCK TABLE table_name IN ACCESS EXCLUSIVE MODE;
SELECT * FROM table_name WHERE id = 10;
UPDATE table_name SET field1 = 'test' WHERE id = 10;
COMMIT WORK;
I actually tested this on my db.
Bear in mind that "lock table" only lasts until the end of a transaction. So it is ineffective unless you have already issued a "begin" in psql.
(in 9.0 this gives an error: "LOCK TABLE can only be used in transaction blocks". 8.1 is very old)
The lock is only active until the end of the current transaction and released when the transaction is committed (or rolled back).
Therefore, you have to embed the statement into a BEGIN and COMMIT/ROLLBACK block. After executing:
BEGIN;
LOCK TABLE users IN EXCLUSIVE MODE;
you could run the following query to see which locks are active on the users table at the moment:
SELECT * FROM pg_locks pl
LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
WHERE pl.relation = 'users'::regclass::oid;
The query should show the exclusive lock on the users table. After you perform a COMMIT and re-run the above-mentioned query, the lock should no longer be present.
In addition, you could use a lock tracing tool like https://github.com/jnidzwetzki/pg-lock-tracer/ to get real-time insights into the locking activity of the PostgreSQL server. Using such lock tracing tools, you can see which locks are taken and released in real-time.