SQL Server Update TOP by date entered

I have a table [occ] without PK or timestamps.
These are the columns: on_date, type_id, price_id, processed
I need to update 50 records to processed = 1 every X minutes.
My issue is that the code below behaves unpredictably: it may update newer records and leave out older ones, creating huge sync problems in my backend.
UPDATE TOP (50) occ
SET processed = 1
FROM occ
How could I make sure that my TOP 50 updated records are the oldest every time?
Thank you

SQL Server's UPDATE statement does not accept an ORDER BY clause directly, so select the oldest rows in a CTE, ordered by on_date, and update through it:
;WITH cte AS (
    SELECT TOP (50) *
    FROM occ
    WHERE processed = 0
    ORDER BY on_date
)
UPDATE cte
SET processed = 1

The answer is that unless you store a timestamp or a sequential ID (such as an identity column), you cannot guarantee that TOP 50 returns the oldest 50 records. SQL Server may often appear to return the earliest records first in such cases, but this is not guaranteed or predictable behaviour.

Create an ID (identity column) so that every row added to the table gets a sequential number, and order by Id the same way:
;WITH cte AS (
    SELECT TOP (50) *
    FROM occ
    WHERE processed = 0
    ORDER BY Id
)
UPDATE cte
SET processed = 1
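A minimal sketch of the schema change this answer assumes (the column name Id and the index name are illustrative); a narrow index also keeps the ordered TOP (50) lookup cheap:
ALTER TABLE occ ADD Id INT IDENTITY(1,1);
CREATE INDEX IX_occ_processed_Id ON occ (processed, Id);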

Related

Update and then select updated rows?

I have an application that selects rows with a particular status and then starts processing them. However, some long-running processing can cause a new instance of my program to start and select the same rows again, because the first instance hasn't had time to update the status yet. So I'm thinking of selecting my rows and then updating the status to something else so they cannot be selected again. I have done some searching and got the impression that the following should work, but it fails.
UPDATE table SET status = 5 WHERE status in
(SELECT TOP (10) * FROM table WHERE status = 1)
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
TLDR: Is it possible to both select and update rows at the same time? (The order doesn't really matter)
You can use an output clause to update the rows and then return them as if they were selected.
As for updating only the top 100 rows, a safe approach is to use a CTE to select the relevant rows first (I assumed that column id can be used to order the rows):
with cte as (select top (100) * from mytable where status = 1 order by id)
update cte set status = 5 output inserted.*
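If the claimed rows need further processing in the same batch, the OUTPUT can be captured into a table variable first (a sketch; the id and status columns are assumed from the question):
declare @claimed table (id int, status int)
;with cte as (select top (100) * from mytable where status = 1 order by id)
update cte set status = 5 output inserted.id, inserted.status into @claimed
select * from @claimed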
You can go directly for an UPDATE statement. It will take an exclusive lock on the rows, and other concurrent transactions cannot read them.
More information on locks
Exclusive locks: Exclusive locks are used to lock data being modified by one transaction, thus preventing modifications by other concurrent transactions. You can read data held by an exclusive lock only by specifying a NOLOCK hint or using the read uncommitted isolation level. Because DML statements first need to read the data they want to modify, you'll always find exclusive locks accompanied by shared locks on that same data.
UPDATE TOP (10) table
SET status = 5 WHERE status = 1

How to create a queue like structure in SQL Server

Is there a good way to create a queue like structure in SQL Server?
Requirements:
When I insert rows, I want them to default to the bottom of the queue
When I select rows, I want to easily be able to get the top of the queue
Here's the tough one: I want to be able to easily move something up the queue, and reorient the rest. Example: move item 5 up to number 1, then 1-4 becomes 2-5
A simple identity column would work for requirements 1 and 2, but how would I handle 3?
Solution
I ended up implementing the solution from @roger-wolf below.
One difference: I used a trigger rather than a stored procedure to renumber. Here's my trigger code:
CREATE TRIGGER [dbo].[TR_Queue]
ON [dbo].[Queue]
AFTER INSERT, DELETE, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Get the current max value in priority
    DECLARE @maxPriority INT = COALESCE((SELECT MAX([priority]) FROM [dbo].[Queue]), 0);
    WITH newValues AS (
        -- Renumber by priority, starting at 1
        SELECT [queueID]
            ,ROW_NUMBER() OVER(ORDER BY [priority] ASC) AS [priority]
        FROM (
            -- Pretend all nulls are greater than the previous max priority
            SELECT [queueID]
                ,COALESCE([priority], @maxPriority + 1) AS [priority]
            FROM [dbo].[Queue]
        ) AS tbl
    )
    UPDATE q
    SET q.[priority] = newValues.[priority]
    FROM [dbo].[Queue] AS q
    INNER JOIN newValues
        ON q.[queueID] = newValues.[queueID];
END
This works well for me, as the queue is always relatively small and infrequently updated, so I don't have to worry about the performance of the trigger.
Use a float column for prioritisation and an approach similar to Celko trees:
If you have items with priorities 1, 2, and 3 and the last needs to become second, calculate an average between its new neighbours, 1.5 in this example;
If another one needs to become second, its priority would be 1.25. This can go on for quite a while;
When displaying queued items by their priority, use row_number() instead of float values in UI;
If items become too close together (say, 1e-10 or less), have a stored procedure ready to renumber them as integers.
The only deficiency I see here is that it becomes a bit more difficult to find the N-th item in the middle of the queue, when it's neither first nor last. If you don't need that, the approach should work.
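A minimal sketch of the move operation under this scheme (the Queue table and column names are borrowed from the trigger example above; priority is assumed to be a float here):
-- Move item 5 between the items whose priorities are 1.0 and 2.0
-- by giving it the average of its new neighbours.
DECLARE @above FLOAT = 1.0, @below FLOAT = 2.0;
UPDATE [dbo].[Queue]
SET [priority] = (@above + @below) / 2.0  -- 1.5 in this example
WHERE [queueID] = 5;
-- The UI orders by row_number() over the float values:
SELECT [queueID], ROW_NUMBER() OVER (ORDER BY [priority]) AS display_position
FROM [dbo].[Queue];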
You could add a Priority column of type DateTime; when you set a row as a priority row, store the current date-time in the Priority column and then use that as part of your ORDER BY criteria.
I had a similar requirement in a past project, what I did (and it worked):
Add a column update_at_utc of type datetime2
When inserting, set update_at_utc = GETUTCDATE()
When retrieving, order by update_at_utc
When moving a row in the queue, for example between rows 3 and 4, simply take the average of update_at_utc of these rows and use it to set update_at_utc of the row being moved.
Note 1: Point 4 assumes that the frequency of inserts and of moving rows up/down the queue is such that the datetime2 type has sufficient resolution. For example, if you insert 2 rows 1 millisecond apart and then try to move 1000 rows between these 2 rows, the datetime2 resolution will be insufficient (https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime2-transact-sql?view=sql-server-2017). In that case, moving a row up/down the queue needs to be more complicated. When moving a row N places lower down:
Remember update_at_utc of the row N places lower down
For all rows between the current and the new position: assign row's update_at_utc to the preceding row's update_at_utc
Assign update_at_utc of the row being moved to the date remembered in point 1 above.
Note 2: I suggest UTC dates instead of local dates to avoid issues during a daylight saving switch.
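A sketch of the move in point 4 above, computing the midpoint of the two neighbours' timestamps (the table and ID values are illustrative; the millisecond arithmetic assumes the two neighbouring rows are close enough in time for DATEDIFF not to overflow):
DECLARE @prev DATETIME2, @next DATETIME2;
SELECT @prev = update_at_utc FROM dbo.Queue WHERE queueID = 3;
SELECT @next = update_at_utc FROM dbo.Queue WHERE queueID = 4;
-- Midpoint between the two neighbours
UPDATE dbo.Queue
SET update_at_utc = DATEADD(ms, DATEDIFF(ms, @prev, @next) / 2, @prev)
WHERE queueID = 7;  -- the row being moved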

T-SQL ways to avoid potentialy updating the same row based on subquery results

I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource-intensive, but it can take a while with all the parsing and SMTP overhead etc.
To speed things up I can easily run multiple instances of the mailer process over multiple servers, but I worry that if two were to start at almost the same time, they might still overlap a bit and send the same records.
Simplified for the question, my table looks something like this, with each record holding the data for the email:
queueItem
======================
queueItemID PK
...data...
processed bit
priority int
queuedStart datetime
rowLockName varchar
rowLockDate datetime
Batch 1 (Server 1)
starts at 12:00PM
lock/reserve the first 5000 rows (1-5000)
select the newly reserved rows
begin work
Batch 2 (Server 2)
starts at 12:15PM
lock/reserve the next 5000 rows (5001-10000)
select the newly reserved rows
begin work
To lock the rows I have been using the following:
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000
update queueItem
set rowLockName = @lockName,
    rowLockDate = getdate()
where queueItemID in (
    select top (@batchsize) queueItemID
    from queueItem
    where processed = 0
        and rowLockName is null
        and queuedStart <= getdate()
    order by priority, queueItemID
)
If I'm not mistaken, the query starts by executing the SELECT subquery and then locks the rows in preparation for the update; this is fast but not instantaneous.
My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not be complete, so Batch 2's SELECT would see the records as still available and attempt (and succeed) to overwrite Batch 1's reservation (a sort of race condition?).
I have run some tests and so far haven't had an issue with them overlapping, but is it a valid concern that will come back to haunt me at the worst of times?
Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
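One common way to make such a reservation atomic (a sketch, not taken from the thread: UPDLOCK makes the subquery's read take update locks, and READPAST makes it skip rows another batch has already locked):
update queueItem
set rowLockName = @lockName,
    rowLockDate = getdate()
where queueItemID in (
    select top (@batchsize) queueItemID
    from queueItem with (updlock, readpast, rowlock)
    where processed = 0
        and rowLockName is null
        and queuedStart <= getdate()
    order by priority, queueItemID
)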

Non Repeatable Read from database table in SQL Server

Suppose I have a table with 100 rows. I just want to select the top 10 rows of the table, but only those rows which were not previously processed.
For this I have added a Flag column, which I update whenever I process rows.
But the problem arises when concurrent requests come for the top 10 rows: both may get the same rows and try to update the same rows (which I don't want).
Here I can't use BEGIN TRANSACTION, because it will lock the table and concurrent requests will not get handled.
Requirement: My actual requirement is that while I am selecting the top 10 rows using the flag condition and updating them, if another request comes in for the same, it should select a different top 10 rows, not the ones being handled by Request 1.
Example: My table contains 100 rows.
{
Select top 10 * from table_name where flag=0
update table_name set top 10 flag = 1
}
(This will select the top 10 of the 100 rows and update them.)
If, at the same time during the above request, another request comes in:
{
Select top 10 * from table_name where flag=0 (should skip the first request's rows)
update table_name set top 10 flag = 1
}
Need: (It should select the top 10 out of the remaining 90 rows and update them.)
I need a lock on the top 10 rows taken by the first request, but the lock should make a simultaneous select statement from the second request skip those rows.
Please help me solve this.
You can use an OUTPUT clause to do both the selecting and the updating of the flag in one statement, e.g.
UPDATE TOP (10) table
SET flag = 1
OUTPUT inserted.*
WHERE flag = 0
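If concurrent requests should also skip rows that another request is still locking, a READPAST hint can be added (a sketch, not part of the original answer; READPAST makes the statement skip locked rows instead of waiting on them):
UPDATE TOP (10) table WITH (READPAST)
SET flag = 1
OUTPUT inserted.*
WHERE flag = 0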
If I understand you correctly, you don't want to use a transaction because it will lock the table for the duration of the update.
Maybe you could split the process into one part which selects the rows and updates the flag, and a second part where you actually do your work with the selected rows.
Use a transaction only for the first part of the task. This will ensure the table is only locked for the absolute minimum of time.
As for your non-repeatable reads: if you really want to enforce this policy, you should delete the selected rows from the table and optionally save them to another table where the read history stays. The lowest-level way to guarantee this is with an update of another flag (updated?) and a trigger after the update.
Transaction with ISOLATION LEVEL REPEATABLE READ
{
select top 10 rows
update select-flag
return the 10 rows
}
normal query
{
take the returned 10 rows and do something
change updated-flag
}
Trigger after update if updated-flag changed
{
copy updated to read-history-table
delete updated-rows
}
ISOLATION LEVELS on MSDN:
REPEATABLE READ: "Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes."
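A hedged sketch of the first, transactional part of the scheme above, combined with the OUTPUT approach from the earlier answer (table and column names are illustrative, and id is assumed usable for ordering):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
DECLARE @picked TABLE (id INT);
-- Select the top 10 unprocessed rows and set their select-flag in one statement
WITH cte AS (
    SELECT TOP (10) * FROM table_name WHERE flag = 0 ORDER BY id
)
UPDATE cte
SET flag = 1
OUTPUT inserted.id INTO @picked;
COMMIT TRANSACTION;
-- Return the 10 rows for the normal (non-transactional) processing step
SELECT t.* FROM table_name AS t
JOIN @picked AS p ON t.id = p.id;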

SQL: Updating a table updates only 7 rows at a time

While executing a SQL update,
update request set state_id = 2 where id in (
select id from request where state_id = 0 and entity_id in
(<list of numbers>)
)
This updates only 7 rows at a time.
The inner select query,
select id from request where state_id = 0 and entity_id in (<list of numbers>)
returns 2000+ records.
I am using the Squirrel SQL client to run the query, and I have checked that the row limit is set to 3000.
What makes it more interesting is that if I wait longer between successive executions of the update query, more rows get updated. When I waited for 10 seconds and then executed the update, 45 rows got updated, but when it is run rapidly, just 7 rows get updated.
Can someone please help me point out what I might be doing wrong?
I suggest writing a proper update, since I see no reason to introduce a more complex query involving a subquery. Also, just to be sure you didn't previously alter the ROWCOUNT setting:
SET ROWCOUNT 0;
UPDATE dbo.request
SET state_id = 2
WHERE state_id = 0
AND entity_id IN (<list of numbers>);
Also check that you haven't artificially limited the row count in Management Studio, under Tools > Options.
(Well, that would have been relevant if you had tagged correctly. Check whatever client tool you use against Sybase and see if it has some similar option.)