SQL Table locking for concurrency [duplicate] - sql

This question already has answers here:
Only inserting a row if it's not already there
(7 answers)
Closed 9 years ago.
I'm trying to make sure that one and only one row gets inserted into a table, but I'm running into issues where multiple processes are bumping into each other and I get more than one row. Here are the details (probably more detail than is needed, sorry):
There's a table called Areas that holds a hierarchy of "areas". Each "area" may have pending "orders" in the Orders table. Since it's a hierarchy, multiple "areas" can be grouped under a parent "area".
I have a stored procedure called FindNextOrder that, given an area, finds the next pending order (which could be in a child area) and "activates" it. "Activating" it means inserting the OrderID into the QueueActive table. The business rule is that an area can only have one active order at a time.
So my stored procedure has a statement like this:
IF EXISTS (SELECT 1 FROM QueueActive WHERE <Order exists for the given area>) RETURN
...
INSERT INTO QueueActive <Order data for the given area>
My problem is that every once in a while, two different processes will call this stored procedure at almost the same time. When each one does the check for an existing row, each comes back with zero rows. Because of that, both processes run the insert statement and I end up with TWO active orders instead of just one.
How do I prevent this? Oh, and I happen to be using SQL Server 2012 Express but I need a solution that works in SQL Server 2000, 2005, and 2008 as well.
I already did a search for exclusively locking a table and found this answer but my attempt to implement this failed.

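I would use some query hints on your SELECT statement. The trouble arises because your procedure only takes shared locks, so other calls to the procedure can run the same check at the same time.
Tag a WITH (ROWLOCK, XLOCK, READPAST) onto your SELECT, and run the SELECT and the INSERT inside one transaction so that the locks are held until the INSERT commits.
ROWLOCK asks SQL Server to lock at row level rather than at page or table level.
XLOCK takes an exclusive lock on the rows that are read, so no other transaction can lock them.
READPAST lets the query skip over any locked rows and keep working instead of waiting.
The last one is optional and depends upon your concurrency requirements. Bear in mind that a skipped row is treated as absent, so with READPAST the existence check can come back empty while another process holds the row.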
Further reading:
SQL Server ROWLOCK over a SELECT if not exists INSERT transaction
http://technet.microsoft.com/en-us/library/ms187373.aspx
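A row-level hint cannot lock a row that does not exist yet, which is exactly the case the existence check is guarding. A commonly used alternative is to take an update lock with serializable range semantics on the check and keep it until the insert commits. The following is only a sketch of that swapped-in technique, not the hints recommended above: the columns AreaID and OrderID and both variables are assumptions, since the question elides the real predicate and column list. As far as I know, UPDLOCK and HOLDLOCK are available from SQL Server 2000 onwards.

```sql
-- Sketch only: QueueActive(AreaID, OrderID) and the variables are assumed names.
DECLARE @AreaID int, @OrderID int;
SET @AreaID = 1;
SET @OrderID = 42;

SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK: the check takes a range lock that is held until COMMIT,
-- so a second caller running the same check for this area waits here.
IF NOT EXISTS (
    SELECT 1
    FROM QueueActive WITH (UPDLOCK, HOLDLOCK)
    WHERE AreaID = @AreaID
)
BEGIN
    INSERT INTO QueueActive (AreaID, OrderID)
    VALUES (@AreaID, @OrderID);
END

COMMIT TRANSACTION;
```

Without an index on the column used in the check, the range lock may widen to cover much more of the table, so an index on that column is worth considering.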

Have you tried creating a trigger that rolls back the second transaction if there is already an active order in the table?

Related

Updating different fields in different rows

I've tried to ask this question at least once, but I never seem to put it across properly. I really have two questions.
My database has a table called PatientCarePlans
( ID, Name, LastReviewed, LastChanged, PatientID, DateStarted, DateEnded). There are many other fields, but these are the most important.
Every hour, a JSON extract gets a fresh copy of the data for PatientCarePlans, which may or may not be different to the existing records. That data is stored temporarily in PatientCarePlansDump. Unlike other tables which will rarely change, and if they do only one or two fields, with this table there are MANY fields which may now be different. Therefore, rather than simply copy the Dump files to the live table based on whether the record already exists or not, my code does the no doubt wrong thing: I empty out any records from PatientCarePlans from that location, and then copy them all from the Dump table back to the live one. Since I don't know whether or not there are any changes, and there are far too many fields to manually check, I must assume that each record is different in some way or another and act accordingly.
My first question is how best (I have OKish basic knowledge, but this is essentially a useful hobby, and therefore have limited technical / theoretical knowledge) do I ensure that there is minimal disruption to the PatientCarePlans table whilst doing so? At present, my code is:
IF Object_ID('PatientCarePlans') IS NOT NULL
BEGIN
    BEGIN TRANSACTION
    DELETE FROM [PatientCarePlans] WHERE PatientID IN (SELECT PatientID FROM [Patients] WHERE location = @facility)
    COMMIT TRANSACTION
END
ELSE
    SELECT TOP 0 * INTO [PatientCarePlans]
    FROM [PatientCareplansDUMP]
INSERT INTO [PatientCarePlans] SELECT * FROM [PatientCarePlansDump]
DROP TABLE [PatientCarePlansDUMP]
My second question relates to how this process affects the numerous queries that run at or around the same time as this import. Very often those queries will act as though there are no records in the PatientCarePlans table, which causes obvious problems. I'm vaguely aware of transaction locks etc., but it goes a bit over my head given the hobby status! How can I ensure that a query is executed and results returned whilst this process is taking place? Is there a more efficient or less obstructive method of updating the table, rather than simply removing the records and re-adding them? I know there are merge and update commands, but none of the examples seem to fit my issue, which only confuses me more!
Apologies for the lack of knowhow, though that of course is why I'm here asking the question.
Thanks
I suggest you do not delete and re-create the table. The DDL script to create the table should be part of your database setup, not part of regular modification scripts.
You are going to want to do the DELETE and INSERT inside a transaction. Preferably you would do this under SERIALIZABLE isolation in order to prevent concurrency issues. (You could instead use a WITH (TABLOCK) hint, which would be less likely to cause a deadlock, but will completely lock the table.)
SET XACT_ABORT, NOCOUNT ON; -- always set XACT_ABORT if you have a transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DELETE FROM [PatientCarePlans]
WHERE PatientID IN (
    SELECT p.PatientID
    FROM [Patients] p
    WHERE p.location = @facility
);
INSERT INTO [PatientCarePlans] (YourColumnsHere) -- always specify columns
SELECT YourColumnsHere
FROM [PatientCarePlansDump];
COMMIT;
You could also do this with a single MERGE statement. However, it is complex to write (owing to the need to restrict the set of rows being targeted), is not usually more performant than separate statements, and also needs SERIALIZABLE.

How to establish read-only-once implement within SAP HANA?

Context: I am a long-time MSSQL developer... What I would like to know is how to implement a read-only-once select from SAP HANA.
High-level pseudo-code:
Collect request via db proc (query)
Call API with request
Store results of the request (response)
I have a table (A) that is the source of inputs to a process. Once a process has completed it will write results to another table (B).
Perhaps this is all solved if I just add a column to table A to avoid concurrent processors from selecting the same records from A?
I am wondering how to do this without adding the column to source table A.
What I have tried is a left outer join between tables A and B to get rows from A that have no corresponding rows (yet) in B. This doesn't work, or at least I haven't implemented it such that rows are processed only once by any of the processors.
I have a stored proc to handle batch selection:
/*
 * getBatch.sql
 *
 * SYNOPSIS: Retrieve the next set of criteria to be used in a search
 *           request. Use left outer join between input source table
 *           and results table to determine the next set of inputs, and
 *           provide support so that concurrent processes may call this
 *           proc and get their inputs exclusively.
 */
alter procedure "ACOX"."getBatch" (
     in in_limit int
    ,in in_run_group_id varchar(36)
    ,out ot_result table (
         id bigint
        ,runGroupId varchar(36)
        ,sourceTableRefId integer
        ,name nvarchar(22)
        ,location nvarchar(13)
        ,regionCode nvarchar(3)
        ,countryCode nvarchar(3)
    )
) language sqlscript sql security definer as
begin
    -- insert new records:
    insert into "ACOX"."search_result_v4" (
         "RUN_GROUP_ID"
        ,"BEGIN_DATE_TS"
        ,"SOURCE_TABLE"
        ,"SOURCE_TABLE_REFID"
    )
    select
         in_run_group_id as "RUN_GROUP_ID"
        ,CURRENT_TIMESTAMP as "BEGIN_DATE_TS"
        ,'acox.searchCriteria' as "SOURCE_TABLE"
        ,fp.descriptor_id as "SOURCE_TABLE_REFID"
    from
        acox.searchCriteria fp
        left join "ACOX"."us_state_codes" st
            on trim(fp.region) = trim(st.usps)
        left outer join "ACOX"."search_result_v4" r
            on fp.descriptor_id = r.source_table_refid
    where
        st.usps is not null
        and r.BEGIN_DATE_TS is null
    limit :in_limit;
    -- select records inserted for return:
    ot_result =
        select
             r.ID id
            ,r.RUN_GROUP_ID runGroupId
            ,fp.descriptor_id sourceTableRefId
            ,fp.merch_name name
            ,fp.Location location
            ,st.usps regionCode
            ,'USA' countryCode
        from
            acox.searchCriteria fp
            left join "ACOX"."us_state_codes" st
                on trim(fp.region) = trim(st.usps)
            inner join "ACOX"."search_result_v4" r
                on fp.descriptor_id = r.source_table_refid
                and r.COMPLETE_DATE_TS is null
                and r.RUN_GROUP_ID = in_run_group_id
        where
            st.usps is not null
        limit :in_limit;
end;
When running 7 concurrent processors, I get a 35% overlap. That is to say that out of 5,000 input rows, the resulting row count is 6,755. Running time is about 7 mins.
Currently my solution includes adding a column to the source table. I wanted to avoid that, but it seems to make for a simpler implementation. I will update the code shortly, but it includes an update statement prior to the insert.
Useful references:
SAP HANA Concurrency Control
Exactly-Once Semantics Are Possible: Here’s How Kafka Does It
First off: there is no "read-only-once" in any RDBMS, including MS SQL.
Literally, this would mean that a given record can only be read once and would then "disappear" for all subsequent reads. (that's effectively what a queue does, or the well-known special-case of a queue: the pipe)
I assume that that is not what you are looking for.
Instead, I believe you want to implement a processing-semantic analogous to "once-and-only-once" aka "exactly-once" message delivery. While this is impossible to achieve in potentially partitioned networks it is possible within the transaction context of databases.
This is a common requirement, e.g. with batch data loading jobs that should only load data that has not been loaded so far (i.e. the new data that was created after the last batch load job began).
Sorry for the long pre-text, but any solution for this will depend on being clear on what we want to actually achieve. I will get to an approach for that now.
The major RDBMS have long figured out that blocking readers is generally a terrible idea if the goal is to enable high transaction throughput. Consequently, HANA does not block readers - ever (ok, not ever-ever, but in the normal operation setup).
The main issue with the "exactly-once" processing requirement really is not the reading of the records, but the possibility of processing more than once or not at all.
Both of these potential issues can be addressed with the following approach:
1. SELECT ... FOR UPDATE ... the records that should be processed (based on e.g. unprocessed records, up to N records, even-odd-IDs, zip-code, ...). With this, the current session has an UPDATE TRANSACTION context and exclusive locks on the selected records. Other transactions can still read those records, but no other transaction can lock those records - neither for UPDATE, DELETE, nor for SELECT ... FOR UPDATE ... .
2. Now you do your processing - whatever this involves: merging, inserting, updating other tables, writing log-entries...
3. As the final step of the processing, you want to "mark" the records as processed. How exactly this is implemented does not really matter.
One could create a processed-column in the table and set it to TRUE when records have been processed. Or one could have a separate table that contains the primary keys of the processed records (and maybe a load-job-id to keep track of multiple load jobs).
In whatever way this is implemented, this is the point in time where this processed status needs to be captured.
4. COMMIT or ROLLBACK (in case something went wrong). This will COMMIT the records written to the target table, the processed-status information, and it will release the exclusive locks from the source table.
As you see, Step 1 takes care of the issue that records may be missed by selecting all wanted records that can be processed (i.e. they are not exclusively locked by any other process).
Step 3 takes care of the issue of records potentially being processed more than once by keeping track of the processed records. Obviously, this tracking has to be checked in Step 1 - both steps are interconnected, which is why I point them out explicitly. Finally, all the processing occurs within the same DB-transaction context, allowing for guaranteed COMMIT or ROLLBACK across the whole transaction. That means that no "record marker" will ever be lost when the processing of the records was committed.
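The four steps could look roughly like this in HANA SQL. Treat it as a sketch under assumptions: the tables source_a and target_b, the PROCESSED flag column, and the literal ids are hypothetical, and the session is assumed to run with autocommit switched off so that the locks survive until the COMMIT.

```sql
-- Step 1: lock the rows this worker will handle (names are hypothetical).
SELECT id
  FROM source_a
 WHERE processed = FALSE
   FOR UPDATE;

-- Suppose step 1 returned ids 101, 102 and 103. The later steps use exactly
-- those keys, so rows that arrive in the meantime are not touched.

-- Step 2: do the processing, here a plain copy into the target table.
INSERT INTO target_b (source_id, loaded_at)
SELECT id, CURRENT_TIMESTAMP
  FROM source_a
 WHERE id IN (101, 102, 103);

-- Step 3: record the processed status that step 1 filters on.
UPDATE source_a
   SET processed = TRUE
 WHERE id IN (101, 102, 103);

-- Step 4: make target rows and markers durable together and release the locks.
COMMIT;
```

If anything fails between step 1 and step 4, a ROLLBACK discards the target rows and the markers together, which is what keeps the two in sync.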
Now, why is this approach preferable to making records "un-readable"?
Because of the other processes in the system.
Maybe the source records are still read by the transaction system but never updated. This transaction system should not have to wait for the data load to finish.
Or maybe, somebody wants to do some analytics on the source data and also needs to read those records.
Or maybe you want to parallelise the data loading: it's easily possible to skip locked records and only work on the ones that are "available for update" right now. See e.g. Load balancing SQL reads while batch-processing? for that.
Ok, I guess you were hoping for something easier to consume; alas, that's my approach to this sort of requirement as I understood it.

TSQL - Delete all records in a database older than x days

I have a business requirement to delete all records older than 1 year from multiple databases. Is there a way to do this at the database level, or must each individual table be scripted?
I see there are various ways to do this at the table level, but I was wondering if it can be done at the database level?
Table level: (for anyone looking for how to delete at the table level):
sql delete all rows older than 30 days
Delete all rows with timestamp older than x days
The essence of this will be using SQL to write SQL. Here's a starting point:
SELECT DISTINCT REPLACE(REPLACE(REPLACE(
    'DELETE FROM {s}.{t} WHERE {c} < DATEADD(YEAR, -1, GetUtcDate())',
    '{s}', table_schema),
    '{t}', table_name),
    '{c}', column_name)
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    column_name like '%Created%'
It queries the info schema for a list of all the columns containing the word Created (assumption: your tables contain columns with names like CreatedAt, CreatedOn, CreatedDate, Created, RecordCreated etc.; adjust for your scenario) and puts the schema/table/column names into a string that is an SQL statement that deletes data.
You run this, then you copy the results out of the grid and into another query window and run them. It is thus an SQL that writes SQL.
Once you are comfortable with the concept, you could look to automate it by having some program or dynamic SQL select the delete queries and run them.
You can also "level up" if you have multiple databases, by making a query that hits sys.databases to find all the database names that need querying, and then writing an SQL that goes over the DB list and generates variations of this SQL for each specific DB, each of which in turn generates the SQL that hits specific tables for deletion.
It's like Inception; be sure you understand the outermost layer of the onion before you start digging deeper.
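As a sketch of the "automate it with dynamic SQL" idea, the generated statements can be walked with a cursor. The data-type filter and the restriction to base tables are additions of mine rather than part of the query above, and execution is deliberately commented out so that the loop only prints what it would run.

```sql
DECLARE @sql nvarchar(max);

-- Build one DELETE per matching column; QUOTENAME guards unusual identifiers.
DECLARE delete_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT
        N'DELETE FROM ' + QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME)
        + N' WHERE ' + QUOTENAME(c.COLUMN_NAME)
        + N' < DATEADD(YEAR, -1, GETUTCDATE());'
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    JOIN INFORMATION_SCHEMA.TABLES AS t
        ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
       AND t.TABLE_NAME = c.TABLE_NAME
    WHERE t.TABLE_TYPE = 'BASE TABLE'          -- skip views
      AND c.COLUMN_NAME LIKE '%Created%'
      AND c.DATA_TYPE IN ('date', 'datetime', 'datetime2', 'smalldatetime');

OPEN delete_cursor;
FETCH NEXT FROM delete_cursor INTO @sql;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @sql;                        -- review the generated statement first
    -- EXEC sys.sp_executesql @sql;    -- uncomment only after reviewing the output
    FETCH NEXT FROM delete_cursor INTO @sql;
END

CLOSE delete_cursor;
DEALLOCATE delete_cursor;
```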
Bear in mind also that your data may be subject to constraints that mean you either have to:
delete them in a certain order or
keep re-running the delete until no more deletes are done (the first time you try a parent-then-child delete it may fail because children exist, the second time you might have deleted all the children so the parent can be removed - if any children remain this route fails, but maybe you want it to fail) or
configure your constraints to delete cascade so that deleting a record from the parent deletes records from the children
Warning: this is a seriously destructive action, and you could easily end up removing more than you intended or leaving your database in a state where referential integrity is broken and your app stops working.
This is not something that can be flippantly designed in a quick 5 minutes of an SO question; this is a weeks+ long project to make sure you're only deleting relevant stuff. For example, configuring the keys for delete cascade and then accidentally removing the "Application Status: Approved" enum record just because it was put into the DB more than a year ago could wipe out the entire database, erasing every approved application (even those approved yesterday), every account that came from it, all their transactions etc.
Think very carefully about this.

Some confusion on the description of read consistency in Oracle

Below is a short description of read consistency from the Oracle Concepts guide.
What counts as a SQL statement here: just one SQL statement, or also a PL/SQL block or stored procedure? Can anyone provide a counterexample that shows what an inconsistent read would look like?
read consistency
A consistent view of data seen by a user. For example, in statement-level read
consistency the set of data seen by a SQL statement remains constant throughout
statement execution.
A "statement" in this context is one DML statement: a single SELECT, INSERT, UPDATE, DELETE, MERGE.
It is not a PL/SQL block. Similarly, multiple executions of the same DML statement (say, within a PL/SQL loop) are separate "statements". If you need consistency over multiple statements or within a PL/SQL block, you can achieve that using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE or SET TRANSACTION READ ONLY. Both introduce limitations.
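As a small illustration of the multi-statement case, a read-only transaction makes every query inside it see the data as of the moment the transaction began. BIG_TABLE and its ID column are borrowed from the example in this answer; the two queries themselves are only placeholders.

```sql
-- Must be the first statement of the transaction.
SET TRANSACTION READ ONLY;

-- Both queries see the same snapshot, even if another session
-- deletes rows and commits between them.
SELECT COUNT(*) FROM big_table;
SELECT COUNT(*) FROM big_table WHERE id >= 9000000;

-- Ends the read-only transaction.
COMMIT;
```

One of the limitations mentioned above is visible here: no INSERT, UPDATE or DELETE is allowed until the transaction ends.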
A counterexample, showing what an inconsistent read would look like in a database without this guarantee, would be as follows.
Starting conditions: table BIG_TABLE has 10 million rows.
User A at 10:00:
SELECT COUNT(*) FROM BIG_TABLE;
User B at 10:01:
DELETE FROM BIG_TABLE WHERE ID >= 9000000; -- delete the last million rows
User B at 10:02:
COMMIT;
User A at 10:03: query completes:
COUNT(*)
--------------
9309129
That is wrong. User A should have either gotten 10 million rows or 9 million rows. At no point were there 9309129 committed rows in the table. What has happened is that user A had read 309,129 rows that user B was deleting before Oracle actually processed the deletion (or before the COMMIT). Then, after the user B delete/commit, user A's query stopped seeing the deleted rows and stopped counting them.
This sort of problem is impossible in Oracle, thanks to its implementation of Multiversion Read Consistency.
In Oracle, in the above situation, as it encountered blocks that had rows deleted (and committed) by User B, User A's query would have used the UNDO data to reconstruct what those blocks looked like at 10:00 -- the time when user A's query started.
That's basically it -- Oracle statements operate on a version of the database as it existed as of a single point in time. This point in time is almost always the time when the statement started. There are some exception cases involving updates when that point in time will be moved to a point in time "mid statement". But it is always consistent as of one point in time or another.

SQL unique field: concurrency bugs? [duplicate]

This question already has answers here:
Only inserting a row if it's not already there
(7 answers)
Closed 9 years ago.
I have a DB table with a field that must be unique. Let's say the table is called "Table1" and the unique field is called "Field1".
I plan on implementing this by performing a SELECT to see if any Table1 records exist where Field1 = @valueForField1, and only updating or inserting if no such records exist.
The problem is, how do I know there isn't a race condition here? If two users both click Save on the form that writes to Table1 (at almost the exact same time), and they have identical values for Field1, isn't it possible that the following would happen?
User1 makes a SQL call, which performs the select operation and determines there are no existing records where Field1 = @valueForField1. User1's process is preempted by User2's process, which also finds no records where Field1 = @valueForField1, and performs an insert. User1's process is allowed to run again, and inserts a second record where Field1 = @valueForField1, violating the requirement that Field1 be unique.
How can I prevent this? I'm told that transactions are atomic, but then why do we need table locks too? I've never used a lock before and I don't know whether or not I need one in this case. What happens if a process tries to write to a locked table? Will it block and try again?
I'm using MS SQL 2008R2.
Add a unique constraint on the field. That way you won't have to SELECT. You will only have to insert. The first user will succeed; the second will fail.
On top of that, you may make the field auto-incremented so that you don't have to fill it in yourself, or you may add a default value, again so that you don't have to fill it in.
Some options would be an autoincremented INT field, or a unique identifier.
You can add a unique constraint. Example from http://www.w3schools.com/sql/sql_unique.asp:
CREATE TABLE Persons
(
P_Id int NOT NULL UNIQUE
)
EDIT: Please also read Martin Smith's comment below.
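Applied to the table in the question, the constraint plus "just insert" approach might look like the sketch below. The constraint name, the varchar(50) type, and the decision to print a message for a duplicate are my assumptions; error numbers 2627 and 2601 are the ones SQL Server raises for unique constraint and unique index violations. RAISERROR is used rather than THROW because the question targets 2008 R2.

```sql
-- One-time setup (constraint name is hypothetical).
ALTER TABLE Table1 ADD CONSTRAINT UQ_Table1_Field1 UNIQUE (Field1);

DECLARE @valueForField1 varchar(50), @msg nvarchar(2048);
SET @valueForField1 = 'example';

BEGIN TRY
    -- No prior SELECT: the constraint decides which of two racing inserts wins.
    INSERT INTO Table1 (Field1) VALUES (@valueForField1);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Field1 value already exists; nothing inserted.';
    ELSE
    BEGIN
        -- Anything else is unexpected, so pass it on to the caller.
        SET @msg = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
    END
END CATCH;
```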
jyparask has a good answer on how you can tackle this specific problem. However, I would like to elaborate on your confusion over locks, transactions, blocking, and retries. For the sake of simplicity, I'm going to assume transaction isolation level serializable.
Transactions are atomic. Under serializable isolation, the database guarantees that if you have two transactions, the result is as if all operations in one transaction occurred completely before the next one started, no matter what kind of race conditions there are. Even if two users access the same row at the same time (multiple cores), there is no chance of a race condition, because the database will ensure that one of them waits or fails.
How does the database do this? With locks. When you select a row, SQL Server will lock the row, so that other clients will block when they try to modify that row. Block means that their query is paused until that row is unlocked.
The database actually has a couple of things it can lock. It can lock the row, or the table, or somewhere in between. The database decides what it thinks is best, and it's usually pretty good at it.
There is never any retrying. The database will never retry a query for you. You need to explicitly tell it to retry a query. The reason is because the correct behavior is hard to define. Should a query retry with the exact same parameters? Or should something be modified? Is it still safe to retry the query? It's much safer for the database to simply throw an exception and let you handle it.
Let's address your example. Assuming you use transactions correctly and do the right query (Martin Smith linked to a few good solutions), then the database will create the right locks so that the race condition disappears. One user will succeed, and the other will fail. In this case, there is no blocking, and no retrying.
In the general case with transactions, however, there will be blocking, and you get to implement the retrying.
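A retry loop of the kind described could be sketched as follows. It retries only on error 1205 (the transaction was chosen as a deadlock victim). The limit of three attempts, the varchar(50) type, and the guarded insert used as the unit of work are assumptions for illustration.

```sql
DECLARE @valueForField1 varchar(50), @attemptsLeft int, @done bit, @msg nvarchar(2048);
SET @valueForField1 = 'example';
SET @attemptsLeft = 3;   -- assumed retry budget
SET @done = 0;

WHILE @done = 0 AND @attemptsLeft > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Unit of work: insert only if the value is not there yet.
        INSERT INTO Table1 (Field1)
        SELECT @valueForField1
        WHERE NOT EXISTS (
            SELECT 1
            FROM Table1 WITH (UPDLOCK, HOLDLOCK)
            WHERE Field1 = @valueForField1
        );

        COMMIT TRANSACTION;
        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        SET @attemptsLeft = @attemptsLeft - 1;

        -- Retry only deadlocks; give up on anything else or when out of attempts.
        IF ERROR_NUMBER() <> 1205 OR @attemptsLeft = 0
        BEGIN
            SET @msg = ERROR_MESSAGE();
            RAISERROR('%s', 16, 1, @msg);
            SET @done = 1;
        END
    END CATCH
END
```

Whether a retry is safe depends on the work being repeatable, which is the reason given above for the database not retrying on its own.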