Update and then select updated rows? - sql

I have an application that selects rows with a particular status and then starts processing these rows. However, some long-running processing can cause a new instance of my program to be started, which selects the same rows again because the first instance hasn't had time to update the status yet. So I'm thinking of selecting my rows and then updating the status to something else so they cannot be selected again. I have done some searching and got the impression that the following should work, but it fails:
UPDATE table SET status = 5 WHERE status in
(SELECT TOP (10) * FROM table WHERE status = 1)
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
TLDR: Is it possible to both select and update rows at the same time? (The order doesn't really matter)

You can use an OUTPUT clause to update the rows and then return them as if they were selected.
As for updating only the top 100 rows, a safe approach is to use a CTE to select the relevant rows first (I assume that column id can be used to order the rows):
with cte as (select top (100) * from mytable where status = 1 order by id)
update cte set status = 5 output inserted.*
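The OUTPUT clause is SQL Server specific. As a rough, testable sketch of the same idea (claim rows and get them back as one atomic unit), the Python below uses the standard-library sqlite3 module as a stand-in. SQLite has no OUTPUT clause, so this sketch runs the SELECT and the UPDATE inside one BEGIN IMMEDIATE transaction, which takes the database write lock before reading. The table name mytable, the id column, and the status values 1 and 5 follow the examples above; the in-memory database and the row counts are illustrative assumptions.

```python
import sqlite3


def claim_rows(conn, batch_size=100):
    """Mark up to batch_size rows with status 1 as status 5 and return their ids.

    BEGIN IMMEDIATE acquires SQLite's write lock before the SELECT, so no
    other connection can claim the same rows between the read and the update.
    The connection must be opened with isolation_level=None (autocommit) so
    that this function controls the transaction explicitly.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM mytable WHERE status = 1 ORDER BY id LIMIT ?",
            (batch_size,),
        )]
        conn.executemany(
            "UPDATE mytable SET status = 5 WHERE id = ?", [(i,) for i in ids]
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


# Demo: 250 pending rows, then two consecutive claims of 100 rows each.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable (id, status) VALUES (?, 1)",
                 [(i,) for i in range(1, 251)])

first = claim_rows(conn)
second = claim_rows(conn)
print(first[:3], second[:3])  # [1, 2, 3] [101, 102, 103]
```

The second call never sees the rows returned by the first, because they no longer have status 1 once the first transaction commits.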

You can go directly with an UPDATE statement. It takes an exclusive lock on each row it modifies, so other concurrent transactions cannot read those rows until it commits (unless they read uncommitted data, as described below).
More information on locks:
Exclusive locks: Exclusive locks are used to lock data being modified by one transaction thus preventing modifications by other concurrent transactions. You can read data held by exclusive lock only by specifying a NOLOCK hint or using a read uncommitted isolation level. Because DML statements first need to read the data they want to modify you'll always find Exclusive locks accompanied by shared locks on that same data.
UPDATE TOP(10) table
SET status = 5 WHERE status =1

Related

Select query hangs when a large number of rows have an update lock

I am designing a program that will read a queue table. I am using locking to make sure that multiple instances do not pull the same row.
I am locking rows like this:
BEGIN TRANSACTION
UPDATE top(10) Source with (ROWLOCK, READPAST, updlock)
SET status = status + 1
With another connection I read the rows like this:
SELECT COUNT(*) FROM Source WITH (ROWLOCK, READPAST, updlock)
The count from the select statement does not include the rows I have locked. This is exactly what I want.
This works fine when I pick the top 10 rows, or 100, or even 1000. Somewhere around 4,690 (it's not consistent) the select begins to hang until the transaction is committed. It's not just slow; it waits for the transaction to end.
This is a test. My real query will not be using top. It will use a join which also causes the problem when too many rows are locked.
Any ideas on what may cause this?
Is there a better way to have multiple instances read a table and not have conflicts?

SQL Server filter rows which are being selected in other transactions

I have a couple of jobs that run UPDATE-from-SELECT queries, e.g.
UPDATE TABLE_X
SET "stopFlag" = 1
OUTPUT
INSERTED."RowID" AS "rowID"
WHERE "RowID" IN (
SELECT TOP 50
"RowID"
FROM
TABLE_X
WHERE
stopFlag=0
)
I assumed that one update could not conflict with another, but the logs of my database tables suggest that two different jobs processed the same row, which messed up the data. My question is: is this a proper way to keep rows from being selected twice? If it is, what am I missing?
A transaction is not necessary here, as every statement runs in an auto-commit transaction anyway.
You could raise the isolation level to SERIALIZABLE, which may be more consistent, at the cost of more blocking. You could also add an UPDLOCK hint to the inner reference of Table_X.
But I think the best fix here, which will also improve performance, is to avoid self-joining the table and update the derived table directly:
UPDATE x
SET stopFlag = 1
OUTPUT
inserted.RowID AS rowID
FROM (
SELECT TOP 50
RowID,
stopFlag
FROM
TABLE_X
WHERE
stopFlag = 0
) x;
An UPDLOCK is automatically taken on any rows read from the table reference which is being updated, so that is not necessary.
If you want the statements to run concurrently but mark and return disjoint rows, use READPAST. You can even introduce ordering guarantees, e.g.:
UPDATE TABLE_X
SET "stopFlag" = 1
OUTPUT
INSERTED."RowID" AS "rowID"
WHERE "RowID" IN (
SELECT TOP 50
"RowID"
FROM
TABLE_X with (rowlock, updlock, readpast)
WHERE
stopFlag=0
ORDER BY "RowID"
)
See generally Using tables as Queues.

In Sybase, how would I lock a stored procedure that is executing and alter the table that the stored procedure returns?

I have a table as follows:
id status
-- ------
1 pass
1 fail
1 pass
1 na
1 na
Also, I have a stored procedure that returns a table with top 100 records having status as 'na'. The stored procedure can be called by multiple nodes in an environment and I don't want them to fetch duplicate data. So, I want to lock the stored procedure while it is executing and set the status of the records obtained from the stored procedure to 'In Progress' and return that table and then release the lock, so that different nodes don't fetch the same data. How would I accomplish this?
There is already a solution for a similar question in MS SQL Server, but it raises errors when I use it in Sybase.
Assuming Sybase ASE ...
The bigger issue you'll likely want to consider is whether you want a single process to lock the entire table while you're grabbing your top 100 rows, or if you want other processes to still access the table?
Another question is whether you'd like multiple processes to concurrently pull 100 rows from the table without blocking each other?
I'm going to assume that you a) don't want to lock the entire table and b) you may want to allow multiple processes to concurrently pull rows from the table.
1 - if possible, make sure the table is using datarows locking (default is usually allpages); this will reduce the granularity of locks to the row level (as opposed to page level for allpages); the table will need to be datarows if you want to allow multiple processes to concurrently find/update rows in the table
2 - make sure the lock escalation setting on the table is high enough to ensure a single process's 100 row update doesn't lock the table (sp_setpglockpromote for allpages, sp_setrowlockpromote for datarows); the key here is to make sure your update doesn't escalate to a table-level lock!
3 - when it comes time to grab your set of 100 rows you'll want to ... inside a transaction ... update the 100 rows with a status value that's unique to your session, select the associated id's, then update the status again to 'In Progress'
The gist of the operation looks like the following:
declare @mysession varchar(10)
select @mysession = convert(varchar(10),@@spid) -- replace @@spid with anything that
-- uniquely identifies your session
set rowcount 100 -- limit the update to 100 rows
begin tran get_my_rows
-- start with an update so that we get exclusive access to the desired rows;
-- update the first 100 rows you find with your @@spid
update mytable
set status = @mysession -- need to distinguish your locked rows from
-- other processes; if we used 'In Progress'
-- we wouldn't be able to distinguish between
-- rows updated earlier in the day or updated
-- by other/concurrent processes
from mytable readpast -- 'readpast' allows your query to skip over
-- locks held by other processes but it only
-- works for datarows tables
where status = 'na'
-- select your reserved id's and send back to the client/calling process
select id
from mytable
where status = @mysession
-- update your rows with a status of 'In Progress'
update mytable
set status = 'In Progress'
where status = @mysession
commit -- close out txn and release our locks
set rowcount 0 -- set back to default of 'unlimited' rows
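The three steps of that transaction (reserve rows with a session-specific status, read them back, then mark them 'In Progress') can be sketched in Python with sqlite3 standing in for ASE. Assumptions, since SQLite has none of set rowcount, @@spid, or readpast: a LIMIT subquery caps the batch at 100, a random uuid4 token replaces @@spid, and BEGIN IMMEDIATE serializes writers instead of letting them read past each other's row locks. The sketch also assumes id is unique, unlike the sample data in the question.

```python
import sqlite3
import uuid


def get_my_rows(conn, limit=100):
    """Reserve up to `limit` 'na' rows for this session and return their ids."""
    # Stand-in for convert(varchar(10), @@spid): any value unique to this session.
    my_session = uuid.uuid4().hex
    conn.execute("BEGIN IMMEDIATE")
    try:
        # 1. Stamp rows with the session token (LIMIT plays the role of set rowcount 100).
        conn.execute(
            "UPDATE mytable SET status = ? WHERE id IN "
            "(SELECT id FROM mytable WHERE status = 'na' ORDER BY id LIMIT ?)",
            (my_session, limit),
        )
        # 2. Read back exactly the rows this session stamped.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM mytable WHERE status = ? ORDER BY id", (my_session,)
        )]
        # 3. Switch them to the shared 'In Progress' status.
        conn.execute("UPDATE mytable SET status = 'In Progress' WHERE status = ?",
                     (my_session,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany("INSERT INTO mytable (id, status) VALUES (?, ?)",
                 [(i, "na" if i <= 150 else "pass") for i in range(1, 161)])

batch1 = get_my_rows(conn)  # ids 1..100
batch2 = get_my_rows(conn)  # ids 101..150
batch3 = get_my_rows(conn)  # nothing left with status 'na'
```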
Potential issues:
if your table is large and you don't have an index on status then your queries could take longer than necessary to run; by making sure lock escalation is high enough and you're using datarows locking (so the readpast works) you should see minimal blocking of other processes regardless of how long it takes to find the desired rows
with an index on the status column, consider that all of these updates are going to force a lot of index updates which is probably going to lead to some expensive deferred updates
if using datarows and your lock escalation is too low then an update could lock the entire table, which would cause another (concurrent) process to readpast the table lock and find no rows to process
if using allpages you won't be able to use readpast so concurrent processes will block on your locks (ie, they won't be able to read around your lock)
if you've got an index on status, and several concurrent processes locking different rows in the table, there could be a chance for deadlocks to occur (likely in the index tree of the index on the status column) which in turn would require your client/application to be coded to expect and address deadlocks
To think about:
if the table is relatively small such that table scanning isn't a big cost, you could drop any index on the status column and this should reduce the performance overhead of deferred updates (related to updating the indexes)
if you can work with a session specific status value (eg, 'In Progress - @mysession') then you could eliminate the 2nd update statement (could come in handy if you're incurring deferred updates on an indexed status column)
if you have another column(s) in the table that you could use to uniquely identify your session's rows (eg, last_updated_by_spid = @@spid, last_updated_date = @mydate - where @mydate is initially set to getdate()) then your first update could set the status = 'In Progress', the select would use @@spid and @mydate for the where clause, and the second update would not be needed [NOTE: This is, effectively, the same thing Gordon is trying to address with his session column.]
assuming you can work with a session specific status value, consider using something that will allow you to track, and fix, orphaned rows (eg, row status remains 'In Progress - @mysession' because the calling process died and never came back to (re)set the status)
if you can pass the id list back to the calling program as a single string of concatenated id values you could use the method I outline in this answer to append the id's into a @variable during the first update, allowing you to set status = 'In Progress' in the first update and also allowing you to eliminate the select and the second update
how would you tell which rows have been orphaned? you may want the ability to update a (small)datetime column with the getdate() of when you issued your update; then, if you would normally expect the status to be updated within, say, 5 minutes, you could have a monitoring process that looks for orphaned rows where status = 'In Progress' and it's been more than, say, 10 minutes since the last update
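The orphan check from the last point can be sketched as follows, again with sqlite3 standing in for the real server. Assumptions for illustration only: a claimed_at column holding Unix seconds replaces the (small)datetime set from getdate(), the threshold is the "say, 10 minutes" from the text, and "fixing" an orphan means handing it back to the queue with status 'na'. The caller passes now explicitly so the logic is easy to test.

```python
import sqlite3

STALE_AFTER_SECONDS = 10 * 60  # "more than, say, 10 minutes" since the claim


def find_orphans(conn, now):
    """Return ids still 'In Progress' whose claim is older than the threshold."""
    cutoff = now - STALE_AFTER_SECONDS
    return [r[0] for r in conn.execute(
        "SELECT id FROM mytable WHERE status = 'In Progress' AND claimed_at < ? "
        "ORDER BY id",
        (cutoff,),
    )]


def reset_orphans(conn, now):
    """Hand stale rows back to the queue; returns how many rows were reset."""
    cutoff = now - STALE_AFTER_SECONDS
    cur = conn.execute(
        "UPDATE mytable SET status = 'na', claimed_at = NULL "
        "WHERE status = 'In Progress' AND claimed_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


now = 1_700_000_000  # fixed "current time" in Unix seconds, for a repeatable demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT, claimed_at INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "In Progress", now - 60),    # claimed a minute ago: still healthy
    (2, "In Progress", now - 900),   # 15 minutes old: orphaned
    (3, "na", None),                 # never claimed
    (4, "pass", now - 5000),         # finished long ago: not an orphan
    (5, "In Progress", now - 3600),  # an hour old: orphaned
])
conn.commit()

orphans = find_orphans(conn, now)  # [2, 5]
reset = reset_orphans(conn, now)   # 2
```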
If the datarows, readpast, lock escalation settings and/or deadlock potential are too much, and you can live with brief table-level locks on the table, you could have the process obtain an exclusive table level lock before performing the update and select statements; the exclusive lock would need to be obtained within a user-defined transaction in order to 'hold' the lock for the duration of your work; a quick example:
begin tran get_my_rows
-- request an exclusive table lock; wait until it's granted
lock table mytable in exclusive mode
update ...
select ...
update ...
commit
I'm not 100% sure how to do this in Sybase, but the idea is the following.
First, add a new column to the table that represents the session or connection used to change the data. You will use this column to provide isolation.
Then, update the rows:
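update top (100) t
set status = 'in progress',
session = @session
where status = 'na'
order by ?; -- however you define the "top" records; note that SQL Server's UPDATE TOP does not accept ORDER BY, so the ordering would have to go in a CTE or derived table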
Then, you can return or process the 100 ids that are "in progress" for the given connection.
Create another table, proc_lock, that has one row.
When control enters the stored procedure, start a transaction and do a select for update on the row in proc_lock (see this link). If that doesn't work for Sybase, then you could try the technique from this answer to lock the row.
Before the procedure exits, make sure to commit the transaction.
This will ensure that only one user can execute the proc at a time. When a second user tries to execute the proc, it will block until the first user's lock on the proc_lock row is released (e.g. when the transaction is committed).

Non-repeatable read from a database table in SQL Server

Suppose I have a table with 100 rows. I want to select the top 10 rows, but only those that have not been processed yet.
For this I have added a Flag column that I update whenever I process a row.
The problem arises when concurrent requests for the top 10 rows come in: both may get the same rows and try to update them (which I don't want).
I can't use BEGIN TRANSACTION here because it will lock the table and the concurrent request will not get handled.
Requirement: when I select the top 10 rows using the flag condition and update them, another request arriving at the same time should select a different set of 10 rows that is not being handled by request 1.
Example: my table contains 100 rows.
{
Select top 10 * from table_name where flag = 0
update top (10) table_name set flag = 1 where flag = 0
}
(This selects the top 10 of the 100 rows and updates them.)
If another request arrives while the request above is running:
{
Select top 10 * from table_name where flag = 0 (should skip the previous request's rows)
update top (10) table_name set flag = 1 where flag = 0
}
Needed: this request should select the top 10 of the remaining 90 rows and update them.
I need a lock on the first request's 10 rows, but the lock should make the second request skip those rows, even when the SELECT statements of both requests run simultaneously.
Please help me solve this.
You can use an OUTPUT clause to select the rows and update the flag in one statement, e.g.
UPDATE TOP (10) table_name
SET flag = 1
OUTPUT inserted.*
WHERE flag = 0
If I understand you correctly, you don't want to use a transaction because it will lock the table for the duration of the update.
Maybe you could split the process into one part that selects the rows and updates the flag, and a second part where you do the actual processing of the selected rows.
Use a transaction only for the first part of the task. This will ensure the table is only locked for the absolute minimum of time.
As for your non-repeatable reads:
If you really want to enforce this policy, you should delete the selected rows from the table and optionally save them to another table that keeps the read history. The most reliable low-level way to guarantee this is to update another flag (e.g. an "updated" flag) and use a trigger that fires after the update.
Transaction with ISOLATION LEVEL REPEATABLE READ
{
select top 10 rows
update select-flag
return the 10 rows
}
normal query
{
take the returned 10 rows and do something
change updated-flag
}
Trigger after update if updated-flag changed
{
copy updated to read-history-table
delete updated-rows
}
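The "trigger after update" step in the pseudocode above can be sketched with SQLite through Python's sqlite3 module. The table names work_items and read_history and the columns select_flag and updated_flag are hypothetical. The trigger copies a row into the history table and deletes it once updated_flag changes from 0 to 1, so a processed row can never be selected again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_items (
    id           INTEGER PRIMARY KEY,
    payload      TEXT NOT NULL,
    select_flag  INTEGER NOT NULL DEFAULT 0,  -- set when a worker picks the row
    updated_flag INTEGER NOT NULL DEFAULT 0   -- set when the worker has finished
);
CREATE TABLE read_history (
    id          INTEGER NOT NULL,
    payload     TEXT NOT NULL,
    archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Once updated_flag flips from 0 to 1, archive the row and remove it from the queue.
CREATE TRIGGER archive_processed
AFTER UPDATE OF updated_flag ON work_items
FOR EACH ROW WHEN OLD.updated_flag = 0 AND NEW.updated_flag = 1
BEGIN
    INSERT INTO read_history (id, payload) VALUES (NEW.id, NEW.payload);
    DELETE FROM work_items WHERE id = NEW.id;
END;
""")
conn.executemany("INSERT INTO work_items (id, payload) VALUES (?, ?)",
                 [(i, f"job-{i}") for i in range(1, 6)])
conn.commit()


def mark_finished(conn, ids):
    """The 'change updated-flag' step; the trigger does the archiving."""
    conn.executemany("UPDATE work_items SET updated_flag = 1 WHERE id = ?",
                     [(i,) for i in ids])
    conn.commit()


mark_finished(conn, [2, 4])
remaining = [r[0] for r in conn.execute("SELECT id FROM work_items ORDER BY id")]
archived = [r[0] for r in conn.execute("SELECT id FROM read_history ORDER BY id")]
print(remaining, archived)  # [1, 3, 5] [2, 4]
```

Keeping the archive-and-delete logic in the trigger means no client can forget it; the trade-off is that every flag update now also writes to the history table.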
ISOLATION LEVELS on MSDN
REPEATABLE READ: "Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes."

Is an UPDATE with a nested SELECT an atomic operation?

I need to select the first (let's say) 10,000 rows in the database and return them. Several clients may run this operation at the same time. I came up with this query:
update v set v.batch_Id = :batchId
from tblRedir v
inner join (
select top 10000 id
from tblRedir
where batch_Id is null
order by Date asc
) v2 on v.id=v2.id
It is an operation that consists of an update and a nested select. Both queries work on the same table (tblRedir). The idea is that the rows are first marked with a unique batchId and then returned via
select * from tblRedir where batch_id = :batchId
(the batchid is a unique identifier (e.g. timestamp or guid) for each this update)
My question:
I thought that an update with a nested select is atomic, meaning that every client receives its own unique set of rows (no other client receives a subset of its data).
However, it looks like I'm wrong: in some cases clients receive no data, probably because both first execute the select and then both execute the update (so the first client ends up with no marked rows).
Is this operation atomic or not?
I work with SQL Server 2005. The query is run via NHibernate like this:
session.CreateSQLQuery('update....')
SELECT places shared locks on the rows it reads, and under the READ COMMITTED isolation level these are released as soon as the rows have been read.
UPDATE places update locks, which are later promoted to exclusive locks. These are not released until the end of the transaction.
You need the locks to be retained from the moment they are placed.
You can do this by setting the transaction isolation level to REPEATABLE READ, which keeps the shared locks until the end of the transaction and prevents a concurrent query's UPDATE from locking these rows.
Alternatively, you can rewrite your query as follows:
WITH q AS
(
SELECT TOP 10000 *
FROM mytable WITH (ROWLOCK, READPAST)
WHERE batch_id IS NULL
ORDER BY
date
)
UPDATE q
SET batch_id = @myid;
This version simply skips rows that are locked by other clients.