BizTalk WCF-OracleDB (post-)polling(-available)-statements - wcf

I have a general question regarding polling within transaction using WCF-OracleDB adapter, to make sure one (or more) rows polled also gets updated to be polled only once.
In WCF-SQL I normally use polling(available)-statements. In PollingAvailableStatement I use a simple COUNT(*) query. In PollingStatement I normally query next row(s) and store its ID(s) locally. Then I both query data AND update the rows with status to make sure the polled row(s) never gets polled again. And, if something goes wrong, it is rolled back by the AmbientTransaction.
How do I use PostPollingStatement to achieve the same with the WCF-OracleDB-adapter? What I miss is a "reference" to the rows I polled, to make sure only correct rows gets updated status.
I have tried to google this, of course, but everyone seems to avoid this requirement...?

I have not been able to find any combination of features in Oracle to use same solution as for SQL Server having a procedure to both update polling status AND return data. Hence, there must be another solution for safe polling against an Oracle database.
Oracle has two modes of transaction isolation levels, and you need to use Serializable in order for polling to work satisfactory. I think it is like a "snapshot" of the database you get when you enter the serializable mode. And you are isolated from changes by other users you would get if you used ReadCommitted instead. When you commit, other users will see your changes and you will see theirs. To avoid other users to interfere while in serializable mode, make sure to lock rows using FOR UPDATE. It will be released when you (or BizTalk) commit changes.
Bottomline, in serializable mode, same query will always return same result (if you did not change it yourself). This is how you can update only rows that you previously polled, the missed "reference" in my question.
Also, UseAmbientTransaction must be used to ensure BizTalk to successfully roll back transaction if there are problems. Using PRAGMA AUTONOMOUS_TRANSACTION, as proposed by some, is not an option because it is a sub-transaction that cannot be rolled back by the transaction BizTalk started.
Note: EnableBizTalkCompatibilityMode should also be set to True, but I have
not found any explanation of why or what it does.
So, with the above set configuration, the WCF-OracleDB adapter will enter serializable mode automatically before executing PollingStatement, you will be able to receive data and update polling status using same query as PollingStatement, and adapter will commit changes if PostPollingStatement is successful. If not, AmbientTransaction makes sure your changes are not commited.
And, configuration of pollings statements:
PollingAvailableStatement
SELECT COUNT(*) FROM <Table> WHERE POLLED = 0
PollingStatement
SELECT * FROM <Table> WHERE POLLED = 0 ORDER BY Date ASC FOR UPDATE
PostPollingStatement
UPDATE <Table> SET POLLED = 1 WHERE POLLED = 0
In real world scenarios, you will probably also have to consider batching (using ROWNUM in Oracle), which means you need to construct your queries similar to this:
SELECT *
FROM <Table>
WHERE PK IN
(
SELECT PK
FROM (
SELECT *
FROM <Table>
WHERE POLLED = 0
ORDER BY PK ASC
)
WHERE ROWNUM <=10
)
FOR UPDATE;
and:
UPDATE <Table>
SET POLLED = 1
WHERE PK IN
(
SELECT PK
FROM (
SELECT *
FROM <Table>
WHERE POLLED = 0
ORDER BY PK ASC
)
WHERE ROWNUM <=10
)
Point here is that WHERE and ORDER BY must be executed first, then ROWNUM is based on that sub-query (or view).

Related

How to fix "UPDATE <table> OUTPUT WITH (READPAST)"

We are trying to retrieve and update the TOP X events from a table but without locking anything else than the "processed" rows. We looked into different SQL hints like ROWLOCK and READPAST, but haven't figured out what combination of those should be used in this scenario. Also, we need to make sure that the returned rows are unique across different concurrent executions of that query and that the same row will never be selected twice.
Note: This table has got many INSERTs happening concurrently.
UPDATE TOP(:batchSize) wsns WITH (READPAST)
SET consumer_ip = :consumerIP
OUTPUT inserted.id, inserted.another_id, inserted.created_time, inserted.scheduled_time
FROM table_A a
WHERE a.scheduled_time < GETUTCDATE() AND a.consumer_ip IS NULL
Any help is highly appreciated. Many thanks!
I don't quite follow how/why are you trying to use the READPAST hint here?
But anyway - to achieve what you want I would suggest:
WITH xxx AS
(
SELECT TOP(:batchSize) *
FROM table_A
)
UPDATE xxx
SET consumer_ip = :consumerIP
OUTPUT inserted.id, inserted.another_id, inserted.created_time, inserted.scheduled_time
FROM table_A a
WHERE a.scheduled_time < GETUTCDATE() AND a.consumer_ip IS NULL;
If all that could happen in the background are new inserts then, I can't see why this would be a problem. SQL Server optimiser most likely would decide for PAGE/ROW lock (but this is depending on your DB settings as well as indexes affected and their options). If by any reason you want to stop other transaction until this update is finished - hold an exclusive lock on the entire table, till the end of your transaction, you can just add WITH(TABLOCKX). Therefore, I would strongly recommend to have a good read on the SQL Server concurrency and isolation before you start messing with it in a production environment.

How can I update multiple items from a select query in Postgres?

I'm using node.js, node-postgres and Postgres to put together a script to process quite a lot of data from a table. I'm using the cluster module as well, so I'm not stuck with a single thread.
I don't want one of the child processes in the cluster duplicating the processing of another. How can I update the rows I just received from a select query without the possibility of another process or query having also selected the same rows?
I'm assuming my SQL query will look something like:
BEGIN;
SELECT * FROM mytable WHERE ... LIMIT 100;
UPDATE mytable SET status = 'processing' WHERE ...;
COMMIT;
Apologies for my poor knowledge of Postgres and SQL, I've used it once before in a simple PHP web app and never before with node.js.
If you're using multithreaded application you cannot and should not be using "for Update" (in the main thread anyway) what you need to be using is advisory lock. Each thread can query a row or mnany rows, verifying that they're not locked, and then locking them so no other session uses them. It's as simple as this within each thread:
select * from mytab
where pg_try_advisory_lock(mytab.id)
limit 100
at the end be sure to release the locks using pg_advisory_unlock
BEGIN;
UPDATE mytable SET status = 'processing' WHERE status <> "processing" and id in
( selecy ID FROM mytable where status <> "processing" limit 100) returning * ;
COMMIT;
There's a chance that's going to fail if some other query was working on the same rows
so if you get an error, retry it until you get some data or no rows returned.
if you get zero rows either you're finished or there;s too many other simultaneous proceses like yours.

exclusive lock and shared lock for select statement - SQL Server

I am not able to understand how select will behave while its part of exclusive transaction. Please consider following scenarios –
Scenario 1
Step 1.1
create table Tmp(x int)
insert into Tmp values(1)
Step 1.2 – session 1
begin tran
set transaction isolation level serializable
select * from Tmp
Step 1.3 – session 2
select * from Tmp
Even first session hasn't been finished, session 2 will be able to read tmp table. I thought Tmp will have exclusive lock and shared lock should not be issued to select query in session 2. And it’s not happening. I have made sure that default isolation level is READ COMMITED.
Thanks in advance for helping me in understanding this behavior.
EDIT : Why I need select in exclusive lock?
I have a SP which actually generate sequential values. So flow is -
read max values from Table and store value in variables
Update table set value=value+1
This SP is executed in parallel by several thousand instances. If two instances execute SP at same time, then they will read same value and will update value+1. Though I would like to have sequential value for every execution. I think its possible only if select is also part of exclusive lock.
If you want a transaction to be serializable, you have to change that option before you start the outermost transaction. So your first session is incorrect and is still actually running under read committed (or whatever other level was in effect for that session).
But even if you correct the order of statements, it still will not acquire an exclusive lock for a plain SELECT statement.
If you want the plain SELECT to acquire an exclusive lock, you need to ask for it:
select * from Tmp with (XLOCK)
or you need to execute a statement that actually requires an exclusive lock:
update Tmp set x = x
Your first session doesn't need an exclusive lock because it's not changing the data. If your first (serializable) session had run to completion and either rolled back or committed, before your second session was started, that session's results would still be the same because your first session didn't change the data - and so the "serializable" nature of the transaction was correct.

READ COMMITTED database isolation level in oracle

I'm working on a web app connected to oracle. We have a table in oracle with a column "activated". Only one row can have this column set to 1 at any one time. To enforce this, we have been using SERIALIZED isolation level in Java, however we are running into the "cannot serialize transaction" error, and cannot work out why.
We were wondering if an isolation level of READ COMMITTED would do the job. So my question is this:
If we have a transaction which involves the following SQL:
SELECT *
FROM MODEL;
UPDATE MODEL
SET ACTIVATED = 0;
UPDATE MODEL
SET ACTIVATED = 1
WHERE RISK_MODEL_ID = ?;
COMMIT;
Given that it is possible for more than one of these transactions to be executing at the same time, would it be possible for more than one MODEL row to have the activated flag set to 1 ?
Any help would be appreciated.
your solution should work: your first update will lock the whole table. If another transaction is not finished, the update will wait. Your second update will guarantee that only one row will have the value 1 because you are locking the table (it doesn't prevent INSERT statements however).
You should also make sure that the row with the RISK_MODEL_ID exists (or you will have zero row with the value '1' at the end of your transaction).
To prevent concurrent INSERT statements, you would LOCK the table (in EXCLUSIVE MODE).
You could consider using a unique, function based index to let Oracle handle the constraint of only having a one row with activated flag set to 1.
CREATE UNIQUE INDEX MODEL_IX ON MODEL ( DECODE(ACTIVATED, 1, 1, NULL));
This would stop more than one row having the flag set to 1, but does not mean that there is always one row with the flag set to 1.
If what you want is to ensure that only one transaction can run at a time then you can use the FOR UPDATE syntax. As you have a single row which needs locking this is a very efficient approach.
declare
cursor c is
select activated
from model
where activated = 1
for update of activated;
r c%rowtype;
begin
open c;
-- this statement will fail if another transaction is running
fetch c in r;
....
update model
set activated = 0
where current of c;
update model
set activated = 1
where risk_model_id = ?;
close c;
commit;
end;
/
The commit frees the lock.
The default behaviour is to wait until the row is freed. Otherwise we can specify NOWAIT, in which case any other session attempting to update the current active row will fail immediately, or we can add a WAIT option with a polling time. NOWAIT is the option to choose to absolutely avoid the risk of hanging, and it also gives us the chance to inform the user that someone else is updating the table, which they might want to know.
This approach is much more scalable than updating all the rows in the table. Use a function-based index as WW showed to enforce the rule that only one row can have ACTIVATED=1.

SQL problem - high volume trans and PK violation

I'm writing a high volume trading system. We receive messages at around 300-500 per second and these messages then need to be saved to the database as quickly as possible. These messages get deposited on a Message Queue and are then read from there.
I've implemented a Competing Consumer pattern, which reads from the queue and allows for multithreaded processing of the messages. However I'm getting a frequent primary key violation while the app is running.
We're running SQL 2008. The sample table structure would be:
TableA
{
MessageSequence INT PRIMARY KEY,
Data VARCHAR(50)
}
A stored procedure gets invoked to persist this message and looks something like this:
BEGIN TRANSACTION
INSERT TableA(MessageSequence, Data )
SELECT #MessageSequence, #Data
WHERE NOT EXISTS
(
SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = #MessageSequence
)
IF (##ROWCOUNT = 0)
BEGIN
UPDATE TableA
SET Data = #Data
WHERE MessageSequence = #MessageSequence
END
COMMIT TRANSACTION
All of this is in a TRY...CATCH block so if there's an error, it rolls back the transaction.
I've tried using table hints, like ROWLOCK, but it hasn't made a difference. Since the Insert is evaluated as a single statement, it seems ludicrous that I'm still getting a 'Primary Key on insert' issue.
Does anyone have an idea why this is happening? And have you got ANY ideas which may point me in the direction of a solution?
Why is this happening?
SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = #MessageSequence
This SELECT will try to locate the row, if not found the EXISTS operator will return FALSE and the INSERT will proceed. Hoewever, the decision to INSERT is based on a state that was true at the time of the SELECT, but that is no longer guaranteed to be true at the time of the INSERT. In other words, you have race conditions where two threads can both look up the same #MessageSequence, both return NOT EXISTS and both try to INSERT, when only the first one will succeed, second one will cause a PK violation.
How do I solve it?
The quickest fix is to add a WITH (UPDLOCK) hint to the SELECT, this will enforce the lock placed on the #MessageSequence key to be retained and thus the INSERT/SELECT to behave atomically:
INSERT TableA(MessageSequence, Data )
SELECT #MessageSequence, #Data
WHERE NOT EXISTS (
SELECT TOP 1 MessageSequence FROM TableA WITH(UPDLOCK) WHERE MessageSequence = #MessageSequence)
To prevent SQL from doing fancy stuff like page lock, you can also add the ROWLOCK hint.
However, that is not my recommendation. My recommendation may surpise you, but is this: do the operation that is most likely to succeed and handle the error if it failed. Ie. if your business case makes it more likely for the #MessageSequnce to be new, try an INSERT and handle the PK if it failed. This way you avoid the spurious look-ups, and hte cost of the catch/retry is amortized over the many cases when it succeeds from the first try.
Also, it is perhaps worth investigating using the built-in queues that come with SQL Server.
Common problem. Explained here:
Defensive database programming: eliminating IF statements
It might be related to the transaction isolation level. You might need
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
before you start the transaction.
Also, if you have more updates than inserts, you should try the update first and check rowcount and do the insert second.
This is very similar to post 939831. Ultimately you want to use the hints (ROWLOCK, READPAST, UPDLOCK). READPAST tells sql server to skip to the next record if the current one is locked. UPDLOCK tells sql server that the read lock is going to escalate to an update lock.
When I implemented something similar I locked the next record by the threadID
UPDATE TOP (1)
foo
SET
ProcessorID = #PROCID
FROM
OrderTable foo WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE
ProcessorID = 0
Then selected the record
SELECT *
FROM foo WITH (NOLOCK)
WHERE ProcessorID = #PROCID
Then marked it as processed
UPDATE foo
SET ProcessorID = -1
WHERE ProcessorID = #PROCID
Later in off hours I perform the relatively expensive operation of performing the delete operation to clear the queue of processed records.
The atomicity of the following statement is what you are after:
INSERT TableA(MessageSequence, Data )
SELECT #MessageSequence, #Data
WHERE NOT EXISTS
(
SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = #MessageSequence
)
According to this person, it depends on the current isolation level.
On a tangent, if you're thinking of a high volume trading system you might want to consider a tick database designed for such data [I'm not exactly sure what "message" you are storing here], such as discussed in this thread for example: http://www.elitetrader.com/vb/showthread.php?threadid=81345.
These are typically in-memory solutions with proprietary query languages. We use kdb+ at our shop.
Not sure what Messaging product you use - but it may be worth looking at the transactions not at the DB level, but at the MQ Level.
Of course, if you are using a TM (Transaction manager), the two operations : 1)Get from MQ and 2)Write to DB are both 'bracketed' under the same parent commit.
So I am not sure if you are using an implicit or explicit or any TM here (for example, Microsoft's DTC).
MessageSequence is the PK, so could the same Message from the MQ be getting processed twice.
When you perform a 'GET" from MQ, make sure the GET is committed (i.e. not a db-commit, but a MQ-commit) - that will ensure the same MessageID cannot be 'popped' by the next thread that writes messages to the DB.