I'm using node.js, node-postgres and Postgres to put together a script to process quite a lot of data from a table. I'm using the cluster module as well, so I'm not stuck with a single thread.
I don't want one of the child processes in the cluster duplicating the processing of another. How can I update the rows I just received from a select query without the possibility of another process or query having also selected the same rows?
I'm assuming my SQL query will look something like:
BEGIN;
SELECT * FROM mytable WHERE ... LIMIT 100;
UPDATE mytable SET status = 'processing' WHERE ...;
COMMIT;
Apologies for my poor knowledge of Postgres and SQL, I've used it once before in a simple PHP web app and never before with node.js.
If you're using a multithreaded application, you cannot and should not be using "FOR UPDATE" (in the main thread anyway); what you need is an advisory lock. Each thread can query one row or many rows, verifying that they're not locked, and then lock them so no other session uses them. It's as simple as this within each thread:
select * from mytab
where pg_try_advisory_lock(mytab.id)
limit 100
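At the end, be sure to release the locks using pg_advisory_unlock. Be aware that the LIMIT is not guaranteed to be applied before pg_try_advisory_lock is evaluated, so the query may acquire locks on more rows than it returns.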
BEGIN;
UPDATE mytable SET status = 'processing' WHERE status <> 'processing' AND id IN
(SELECT id FROM mytable WHERE status <> 'processing' LIMIT 100) RETURNING *;
COMMIT;
There's a chance this will fail if some other query is working on the same rows,
so if you get an error, retry until you either get some data or get no rows returned.
If you get zero rows, either you're finished or there are too many other simultaneous processes like yours.
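To make the "claim in a single statement" idea above concrete, here is a minimal sketch in Python against an in-memory SQLite database, used only so the example is self-contained. The claimed_by column, the 'new' status value and the claim_batch and make_demo_db helpers are assumptions for illustration, not part of the original schema. Each call stamps a unique token on the rows it claims and then reads back only rows carrying that token. On Postgres 9.5 or later, adding FOR UPDATE SKIP LOCKED to the inner SELECT is another option worth looking at.

```python
import sqlite3
import uuid


def make_demo_db(n_rows):
    """Build an in-memory table shaped like the one in the question (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable ("
        "id INTEGER PRIMARY KEY, payload TEXT, "
        "status TEXT NOT NULL DEFAULT 'new', claimed_by TEXT)"
    )
    with conn:  # commit the seed rows
        conn.executemany(
            "INSERT INTO mytable (id, payload) VALUES (?, ?)",
            [(i, f"row-{i}") for i in range(1, n_rows + 1)],
        )
    return conn


def claim_batch(conn, batch_size):
    """Claim up to batch_size unprocessed rows and return them as (id, payload) tuples."""
    token = uuid.uuid4().hex  # unique marker for this claim (assumed column: claimed_by)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE mytable SET status = 'processing', claimed_by = ? "
            "WHERE status = 'new' AND id IN "
            "(SELECT id FROM mytable WHERE status = 'new' ORDER BY id LIMIT ?)",
            (token, batch_size),
        )
    # Only rows stamped with our token are ours, whatever other workers did meanwhile.
    return conn.execute(
        "SELECT id, payload FROM mytable WHERE claimed_by = ? ORDER BY id",
        (token,),
    ).fetchall()
```

Calling claim_batch repeatedly returns disjoint batches until nothing is left, at which point it returns an empty list. The token also gives you a handle for resetting rows if a worker dies mid-batch.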
We are trying to retrieve and update the TOP X events from a table but without locking anything else than the "processed" rows. We looked into different SQL hints like ROWLOCK and READPAST, but haven't figured out what combination of those should be used in this scenario. Also, we need to make sure that the returned rows are unique across different concurrent executions of that query and that the same row will never be selected twice.
Note: This table has got many INSERTs happening concurrently.
UPDATE TOP(:batchSize) wsns WITH (READPAST)
SET consumer_ip = :consumerIP
OUTPUT inserted.id, inserted.another_id, inserted.created_time, inserted.scheduled_time
FROM table_A a
WHERE a.scheduled_time < GETUTCDATE() AND a.consumer_ip IS NULL
Any help is highly appreciated. Many thanks!
I don't quite follow how or why you are trying to use the READPAST hint here.
But anyway - to achieve what you want I would suggest:
WITH xxx AS
(
SELECT TOP(:batchSize) *
FROM table_A
WHERE scheduled_time < GETUTCDATE() AND consumer_ip IS NULL
)
UPDATE xxx
SET consumer_ip = :consumerIP
OUTPUT inserted.id, inserted.another_id, inserted.created_time, inserted.scheduled_time;
If all that can happen in the background is new inserts, I can't see why this would be a problem. The SQL Server optimiser would most likely choose page or row locks (though this depends on your database settings as well as on the affected indexes and their options). If for any reason you want to block other transactions until this update is finished, you can hold an exclusive lock on the entire table until the end of your transaction by adding WITH (TABLOCKX). In any case, I would strongly recommend reading up on SQL Server concurrency and isolation before you start changing locking behaviour in a production environment.
I have a general question regarding polling within a transaction using the WCF-OracleDB adapter: how do I make sure that the row (or rows) polled also get updated, so that they are polled only once?
In WCF-SQL I normally use polling(available)-statements. In PollingAvailableStatement I use a simple COUNT(*) query. In PollingStatement I normally query next row(s) and store its ID(s) locally. Then I both query data AND update the rows with status to make sure the polled row(s) never gets polled again. And, if something goes wrong, it is rolled back by the AmbientTransaction.
How do I use PostPollingStatement to achieve the same with the WCF-OracleDB adapter? What I am missing is a "reference" to the rows I polled, to make sure only the correct rows get their status updated.
I have tried to google this, of course, but everyone seems to avoid this requirement...?
I have not been able to find any combination of features in Oracle to use same solution as for SQL Server having a procedure to both update polling status AND return data. Hence, there must be another solution for safe polling against an Oracle database.
Oracle has two transaction isolation levels for read/write transactions, ReadCommitted and Serializable, and you need to use Serializable for polling to work satisfactorily. As I understand it, entering serializable mode gives you something like a "snapshot" of the database, isolating you from the changes by other users that you would see under ReadCommitted. When you commit, other users will see your changes and you will see theirs. To keep other users from interfering while you are in serializable mode, lock the rows using FOR UPDATE. The lock is released when you (or BizTalk) commit the changes.
Bottom line: in serializable mode, the same query will always return the same result (unless you changed the data yourself). This is how you can update only the rows that you previously polled, which is the missing "reference" in my question.
Also, UseAmbientTransaction must be enabled so that BizTalk can roll back the transaction if there are problems. Using PRAGMA AUTONOMOUS_TRANSACTION, as proposed by some, is not an option because it creates a sub-transaction that cannot be rolled back by the transaction BizTalk started.
Note: EnableBizTalkCompatibilityMode should also be set to True, but I have not found any explanation of why or what it does.
So, with the configuration above, the WCF-OracleDB adapter will enter serializable mode automatically before executing PollingStatement, you will be able to receive data and update the polling status using the same query as PollingStatement, and the adapter will commit the changes if PostPollingStatement succeeds. If not, AmbientTransaction makes sure your changes are not committed.
The polling statements are configured as follows:
PollingAvailableStatement
SELECT COUNT(*) FROM <Table> WHERE POLLED = 0
PollingStatement
SELECT * FROM <Table> WHERE POLLED = 0 ORDER BY Date ASC FOR UPDATE
PostPollingStatement
UPDATE <Table> SET POLLED = 1 WHERE POLLED = 0
In real world scenarios, you will probably also have to consider batching (using ROWNUM in Oracle), which means you need to construct your queries similar to this:
SELECT *
FROM <Table>
WHERE PK IN
(
SELECT PK
FROM (
SELECT *
FROM <Table>
WHERE POLLED = 0
ORDER BY PK ASC
)
WHERE ROWNUM <=10
)
FOR UPDATE;
and:
UPDATE <Table>
SET POLLED = 1
WHERE PK IN
(
SELECT PK
FROM (
SELECT *
FROM <Table>
WHERE POLLED = 0
ORDER BY PK ASC
)
WHERE ROWNUM <=10
)
The point here is that WHERE and ORDER BY must be applied first, in the inner query; ROWNUM is then evaluated against the result of that sub-query (or view).
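The ordering issue can be modelled without a database. The following Python sketch is only an analogy, with made-up rows and hypothetical function names: it contrasts taking the first N rows and then sorting them (what happens when ROWNUM and ORDER BY sit in the same query block) with sorting first and then taking N (what the nested sub-query achieves).

```python
# Rows in arbitrary storage order; pk and polled mirror the columns used above.
ROWS = [
    {"pk": 7, "polled": 0},
    {"pk": 3, "polled": 0},
    {"pk": 9, "polled": 1},
    {"pk": 1, "polled": 0},
    {"pk": 5, "polled": 0},
]


def limit_then_sort(rows, n):
    """Model of ROWNUM in the same block as ORDER BY: cut off first, sort afterwards."""
    unpolled = [r for r in rows if r["polled"] == 0]
    return sorted(unpolled[:n], key=lambda r: r["pk"])


def sort_then_limit(rows, n):
    """Model of the nested form: filter and sort in the inner query, cut off outside."""
    unpolled = [r for r in rows if r["polled"] == 0]
    return sorted(unpolled, key=lambda r: r["pk"])[:n]


wrong_batch = [r["pk"] for r in limit_then_sort(ROWS, 2)]  # [3, 7]
right_batch = [r["pk"] for r in sort_then_limit(ROWS, 2)]  # [1, 3]
```

Only the second form reliably returns the two lowest unpolled keys, which is what a batch that must be processed in order needs.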
I would like to find the proper documentation to confirm my thought about a SQL Server job I recently wrote. My fear is that data could be inconsistent for few milliseconds (timing between the start of the job execution and its end).
Let's say the job is setup to run every 30 minutes. It will only have one step with the following SQL statement:
DELETE FROM myTable
INSERT INTO myTable
SELECT *
FROM myTableTemp
Could it happen that a SELECT query is executed exactly in between the DELETE statement and the INSERT statement, and thus returns empty results?
And what if I had created 2 steps in my job, one for the DELETE query and another for the INSERT INTO? Does SQL Server guarantee atomicity across several steps of one job?
Thanks for your help on this one
No, there is no automatic atomic handling of jobs, whether they consist of multiple statements or multiple steps.
Use this:
begin transaction
delete...
insert....
... anything else you need to be atomic
commit work
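As a rough illustration of what wrapping the two statements in a transaction buys you, here is a small Python sketch using an in-memory SQLite database rather than SQL Server, so the locking details differ. The column layout and the refresh and make_demo_db helpers are assumptions for the demo. If the INSERT fails, the DELETE is rolled back as well and the old rows stay in place.

```python
import sqlite3


def make_demo_db():
    """Create myTable with one existing row and an empty myTableTemp (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE myTableTemp (id INTEGER, val TEXT)")
    with conn:  # commit the seed row
        conn.execute("INSERT INTO myTable VALUES (1, 'old')")
    return conn


def refresh(conn):
    """Replace the contents of myTable with myTableTemp as one atomic unit."""
    with conn:  # commits if both statements succeed, rolls back both if either fails
        conn.execute("DELETE FROM myTable")
        conn.execute("INSERT INTO myTable SELECT * FROM myTableTemp")
```

Readers on other connections see either the old contents or the new ones once the transaction commits, never the empty state in between (how they wait or what they see meanwhile depends on the engine and isolation level).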
I have a problem and I have the impression that the solution is simple but I cannot figure it out.
I have a multi-threaded environment and a PL/SQL stored procedure.
Inside this procedure I have something like that :
select count(*) into mycount from toto;
If mycount >0 then update...;
else insert ...;
The problem is that I have many threads calling this procedure.
Is there a simple way to have only one thread at a time executing the piece of code above ?
I know that I can use SELECT ... FOR UPDATE, but since I could end up doing either an UPDATE or an INSERT, I guess this does not work for me.
Thanks a lot.
Have a separate table MODIFY_CHECKER with just one column FLAG. Use this table & column as a means to allow only one thread to update/insert on your actual table (TOTO I suppose)
You could add something like the below to your existing PL/SQL procedure -
-- v_busy must be declared as a NUMBER variable in the procedure
SELECT COUNT(1) INTO v_busy FROM modify_checker WHERE flag = 1;
IF v_busy > 0 THEN
-- Another thread is already working, so just raise exception
RAISE <<exception>>;
ELSE
-- No other thread working on this, so go ahead
UPDATE modify_checker SET flag = 1;
COMMIT;
<<actual code to update or insert actual table>>
UPDATE modify_checker SET flag = 0;
COMMIT;
END IF;
The simplest solution for what you want would be to introduce an intermediary queue of some sort at the application level - the various threads would send a request to the queue and a queue processor would read requests off of the queue and make the necessary call to the DB.
This would, in my opinion, be the easiest solution in that you don't have to change too much to get it to work. The only potential problem is that this solution could be complicated if a response is necessary - if so this is still an option, but the app code is complicated a bit by having to deal with asynchronous responses.
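A minimal sketch of that queue idea in Python, assuming the application can route every request through one consumer thread. The store dictionary stands in for the TOTO table, and upsert, produce, consume and run are hypothetical names. Because only the consumer ever runs the check-then-write logic, the UPDATE-or-INSERT decision cannot race.

```python
import queue
import threading

store = {}  # stands in for the TOTO table in this sketch


def upsert(key):
    """Check-then-write; safe only because a single consumer thread calls it."""
    if key in store:
        store[key] += 1  # the UPDATE branch
    else:
        store[key] = 1  # the INSERT branch


def consume(requests):
    """Drain the queue one request at a time until the sentinel arrives."""
    while True:
        key = requests.get()
        if key is None:  # sentinel: no more work
            break
        upsert(key)


def produce(requests, keys):
    """What each application thread does instead of calling the database directly."""
    for key in keys:
        requests.put(key)


def run(keys, n_producers=4):
    requests = queue.Queue()
    consumer = threading.Thread(target=consume, args=(requests,))
    consumer.start()
    producers = [
        threading.Thread(target=produce, args=(requests, keys))
        for _ in range(n_producers)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    requests.put(None)  # all producers are done, tell the consumer to stop
    consumer.join()
```

For example, run(["a", "b", "a"]) with the default four producers leaves store with "a" counted eight times and "b" four times. If callers need a result back, each request would have to carry its own reply channel, which is the asynchronous complication mentioned above.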
You can take a lock as soon as you enter the procedure and then release it when you exit. This way only one session can execute it at a time.
ALTER TABLE tl_test ENABLE TABLE LOCK;
ALTER TABLE tl_test DISABLE TABLE LOCK;
I'm currently running the following statement
select * into adhoc..san_savedi from dps_san..savedi_record
It's taking a painfully long time and I'd like to see how far along it is so I ran this:
select count(*) from adhoc..san_savedi with (nolock)
That didn't return anything in a timely manner so for the heck of it I did this:
select top 1 * from adhoc..san_savedi with (nolock)
Even that seems to run indefinitely. I could understand if there are millions of records that the count(*) could take a long time, but I don't understand why selecting the top 1 record wouldn't come back pretty much immediately considering I specified nolock.
In the name of full disclosure, dps_san is a view that pulls from an odbc connection via linked server. I don't think that'd be affecting why I can't return the top row but just throwing it out there in case I'm wrong.
So I want to know what is keeping that statement from running?
EDIT:
As I mentioned above, yes dps_san..savedi_record is a view. Here's what it does:
select * from DPS_SAN..root.SAVEDI_RECORD
It's nothing more than an alias and does no grouping/sorting/etc so I don't think the problem lies here, but please enlighten me if I'm wrong about that.
SELECT queries with NOLOCK don't actually take no locks; they still need a SCH-S (schema stability) lock on the table (and, as it is a heap, they will also take a hobt lock).
Additionally before the SELECT can even begin SQL Server must compile a plan for the statement, which also requires it to take a SCH-S lock out on the table.
As your long running transaction creates the table via SELECT ... INTO it holds an incompatible SCH-M lock on it until the statement completes.
You can verify this by looking in sys.dm_os_waiting_tasks during the period of blocking.
When I tried the following in one connection
BEGIN TRAN
SELECT *
INTO NewT
FROM master..spt_values
/*Remember to rollback/commit this later*/
And then executing (or just simply trying to view the estimated execution plan)
SELECT *
FROM NewT
WITH (NOLOCK)
in a second connection, the reading query was blocked.
SELECT wait_type,
resource_description
FROM sys.dm_os_waiting_tasks
WHERE session_id = <spid_of_waiting_task>
This shows that the wait type is indeed LCK_M_SCH_S and that the blocking resource is held in Sch-M mode:
wait_type        resource_description
---------------- -------------------------------------------------------------------------------------------------------------------------------
LCK_M_SCH_S      objectlock lockPartition=0 objid=461960722 subresource=FULL dbid=1 id=lock4a8a540 mode=Sch-M associatedObjectId=461960722
It very well may be that there are no locks... If dps_san..savedi_record is a view, then it may be taking a long time to execute, because it may be accessing tables without using an index, or it may be sorting millions of records, or whatever reason. Then your query, even a simple top or count, will be only as fast as that view can be executed.
A few issues to consider here. Is dps_san..savedi_record a view? If so, it could just be taking a really long time to get your data. The other thing I can think of is that you're trying to create a temp table by using the select into syntax, which is a bad idea. The select * into ... syntax will lock tempdb for the duration of the select.
If you are creating the table using that syntax, then there is a workaround. First, create the table by throwing where 1=0 at the end of your initial statement:
select * into ... from ... where 1=0
This creates the table first (which is quick), after which you can insert into it with a regular INSERT ... SELECT because the table now exists (without the penalty of locking tempdb for the duration of the query).
Find the session_id that is performing the select into:
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.plan_handle) AS t
WHERE t.[text] LIKE '%select%into%adhoc..san_savedi%';
This should let you know if another session is blocking the select into or if it has a wait type that is causing a problem.
You can repeat the process in another window for the session that is trying to do the select. I suspect Martin is right and that my earlier comment about schema lock is relevant.