Design a Lock for SQL Server to help relax the conflict between INSERT and SELECT - sql

SQL Server is SQL Azure, basically it's SQL Server 2008 for normal process.
I have a table, called TASK, constantly have new data in (new task), and removed (task complete)
For new data in, I use INSERT INTO .. SELECT ..., most of time takes very long, lets say dozen of minutes.
For old data out, I first use SELECT (WITH NOLOCK) to get task, UPDATE to let other thread know this task already starts to process, then DELETE once finished.
Dead lock sometime happens on SELECT, most time happens on UPDATE and DELETE.
this is not time critical task, so I can start process the new data once all INSERT finished. Is there any kind of LOCK to ask SELECT not to select it before the INSERT finished? Or any kind of other suggestion to avoid Conflict. I can redesign table if needed.

later the sqlserver2005,resolve lock is easy.
for conflict
1.you can use the service broker.
2.use the isolution level.
dbcc useroptions ,at last row ,you can see the deflaut isolution level is read_committed,this is the session level.
we can change the level to read_committed_snapshot for conflict,in sqlserver, not realy row lock like oracle.but we can use this method implement.
ALTER DATABASE DBName
SET READ_COMMITTED_SNAPSHOT ON;
open this feature,must in single user schame.
and you can test it.
for session A ,session B.
A:update table1 set name = 'new' with(Xlock) where id = 1
B:you still update other row and select all the data from table.
my english is not very good,but for lock ,i know.
in sqlserver,for function ,there are three locks.
1.optimistic lock ,use the timestamp(rowversion) control.
2.pessimism lock ,force lock when use the date.use Ulock,Xlock and so on.
3.virtual lock,use the proc getapplock().
if you need lock schame in system architecture,please me email : mjjjj2001#163.com

Consider using service broker if this is a processing queue.
There are a number of considerations that affect performance and locking. I surmise that the data is being updated and deleted in a separate session. Which transaction isolation level is in use for the insert session and the delete session.
Has the insert session and all transactions committed and closed when the delete session runs? Are there multiple delete sessions running concurrently? It is very important to have an index on the columns you are using to identify a task for the SELECT/UPDATE/DELETE statements, especially if you move to a higher isolation level such as REPEATABLE READ or SERIALIZED.
All of these issues could be solved by moving to Service Broker if it is appropriate.

Related

Understanding locks and query status in Snowflake (multiple updates to a single table)

While using the python connector for snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
, sometimes I get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries are trying to update a single table. What I don't understand is that when I see the query history in Snowflake, it says the query finished successfully (Succeded Status) but in reality, the Update never happened, because the table did not alter.
So according to https://community.snowflake.com/s/article/how-to-resolve-blocked-queries I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but still, nothing happened and even with the succeed status the query seems to not have executed at all. So my question is, how does this really work and how can a lock be released without losing the execution of the query (also, what happens to the other 20+ queries that are queued because of the lock, sometimes it seems that when the lock is released the next one takes the lock and have to be aborted as well).
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table. Based on my experience with snowflake below is my understanding.
In snowflake, every table operations also involves a change in the meta table which keeps track of micro partitions, min and max. This meta table supports only 20 concurrent DML statements by default. So if a table is continuously getting updated and getting hit at the same partition, there is a chance that this limit will exceed. In this case, we should look at redesigning the table updation/insertion logic. In one of our use cases, we increased the limit to 50 after speaking to snowflake support team
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they will be serialized as only one can take a lock on a table at at a time. Others will queue up in the "blocked" state until it is their turn to take the lock. There is a limit on the number of queries that can be waiting on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.

ACID transactions across multiple technologies

I have an application that uses a local database and a remote database to synchronize to. The local database uses SQLite and for the remote database I'm using postgres. I need to move data from one database to the other database and avoid duplicating information.
Roughly what I do right now:
BEGIN; //remote database (start transaction)
SELECT * FROM local.queued TOP 1; //local database (select first queued element)
INSERT INTO remote.queued VALUES ( element ) //remote database (insert first queued element on remote database)
BEGIN; //local database (start transaction)
DELETE * FROM local.queued LIMIT 1; //local database (delete first queued element on local database)
END; //local database (finalize transaction local database)
END; //remote database (finalize transaction remote database)
This works relatively well most of the times but incidentally, after giving a hard reset to the program I've noticed a data record was duplicated. I believe this is has something to do with the transaction finalizing. Because I'm using two distinct technologies it would be impossible to create a single atomic commit with WAL archiving.
Any ideas how I could improve this concept to avoid duplicative entries.
The canonical way to do that is a distributed transaction using the two-phase commit protocol.
Unfortunately SQLite doesn't seem to support it, but since PostgreSQL does, you can still use it if only two databases are involved:
BEGIN; -- on PostgreSQL
BEGIN; -- on SQLite
/*
* Do work on both databases.
* On error, ROLLBACK both transactions.
*/
PREPARE TRANSACTION 'somename'; -- PostgreSQL
COMMIT; -- SQLite
COMMIT PREPARED 'somename'; -- PostgreSQL
Now if an error happens during the SQLite COMMIT, you run ROLLBACK PREPARED 'sonename' on PostgreSQL. The idea is that everything that can fail during commit is done during PREPARE TRANSACTION, and the state of the transaction is persisted so that it stays open, but will still survive a server restart.
This is safe, but there is a caveat. Prepared transactions are dangerous, because they will hold locks and keep VACUUM from cleaning up (like all other transactions), but they are persistent and stick around until you explicitly remove them. So you need some piece of software, a distributed transaction manager, that is crash safe and keeps track of all distributed transactions. This transaction manager can clean up all prepared transactions after some outage.
I think it would make sense to make your DML actions idempotent - that is to say that if you call them multiple times they have the same overall effect. For example, we can make the INSERT a no-op if the data exists:
INSERT INTO x(id, name)
SELECT nu.id, nu.name
FROM
(SELECT 1 as id, 'a' as name) as nu
LEFT JOIN x ON nu.id = x.id
WHERE
x.id IS NULL
You can run this as many times as you like, and it'll only insert one record
https://www.db-fiddle.com/f/nbHmy3PVDQ3RrGMqLni1su/0
YOu'll need to decide what to do if the record exists in an altered state - eg do you want to leave it alone, or reset it to the incoming values - a question for another time

How to reduce downtime of table during inserts SQL Server

I have an operative table, call it Ops. The table gets queried by our customers via a web service every other second.
There are two processes that affect the table:
Deleting expired records (daily)
Inserting new records (weekly)
My goal is to reduce downtime to a minimum during these processes. I know Oracle, but this is the first time I'm using SQL Server and T-SQL. In Oracle, I would do a truncate to speed up the first process of deleting expired records and a partition exchange to insert new records.
Partition Exchanges for SQL Server seem a bit harder to handle, because from what I can read, one has to create file groups, partition schemes and partition functions (?).
What are your recommendations for reducing downtime?
A table is not offline because someone is deleting or inserting rows. The table can be read and updated concurrently.
However, under the default isolation level READ COMMITTED readers are blocked by writers and writers are blocked by readers. This means that a SELECT statement can take longer to complete because a not-yet-committed transaction is locking some rows the SELECT statement is trying to read. The SELECT statement is blocked until the transaction completes. This can be a problem if the transaction takes long time, since it appears as the table was offline.
On the other hand, under READ COMMITTED SNAPSHOT and SNAPSHOT isolation levels readers don't block writers and writers don't block readers. This means that a SELECT statement can run concurrently with INSERT, UPDATE and DELETE statements without waiting to acquire locks, because under these isolation levels SELECT statements don't request locks.
The simplest thing you can do is to enable READ COMMITTED SNAPSHOT isolation level on the database. When this isolation level is enabled it becomes the default isolation level, so you don't need to change the code of your application.
ALTER DATABASE MyDataBase SET READ_COMMITTED_SNAPSHOT ON
If your problem is "selects getting blocked," you can try 'NO LOCK' hint. But be sure to read the implications. You can check https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/ for details.

Why is an implicit table lock being released prior to end of transaction in RedShift?

I have an ETL process that is building dimension tables incrementally in RedShift. It performs actions in the following order:
Begins transaction
Creates a table staging_foo like foo
Copies data from external source into staging_foo
Performs mass insert/update/delete on foo so that it matches staging_foo
Drop staging_foo
Commit transaction
Individually this process works, but in order to achieve continuous streaming refreshes to foo and redundancy in the event of failure, I have several instances of the process running at the same time. And when that happens I occasionally get concurrent serialization errors. This is because both processes are replaying some of the same changes to foo from foo_staging in overlapping transactions.
What happens is that the first process creates the staging_foo table, and the second process is blocked when it attempts to create a table with the same name (this is what I want). When the first process commits its transaction (which can take several seconds) I find that the second process gets unblocked before the commit is complete. So it appears to be getting a snapshot of the foo table before the commit is in place, which causes the inserts/updates/deletes (some of which may be redundant) to fail.
I am theorizing based on the documentation http://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html where it says:
Concurrent transactions are invisible to each other; they cannot detect each other's changes. Each concurrent transaction will create a snapshot of the database at the beginning of the transaction. A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE, and TRUNCATE, and the following DDL commands :
ALTER TABLE (to add or drop columns)
CREATE TABLE
DROP TABLE
TRUNCATE TABLE
The documentation quoted above is somewhat confusing to me because it first says a snapshot will be created at the beginning of a transaction, but subsequently says a snapshot will be created only at the first occurrence of some specific DML/DDL operations.
I do not want to do a deep copy where I replace foo instead of incrementally updating it. I have other processes that continually query this table so there is never a time when I can replace it without interruption. Another question asks a similar question for deep copy but it will not work for me: How can I ensure synchronous DDL operations on a table that is being replaced?
Is there a way for me to perform my operations in a way that I can avoid concurrent serialization errors? I need to ensure that read access is available for foo so I can't LOCK that table.
OK, Postgres (and therefore Redshift [more or less]) uses MVCC (Multi Version Concurrency Control) for transaction isolation instead of a db/table/row/page locking model (as seen in SQL Server, MySQL, etc.). Simplistically every transaction operates on the data as it existed when the transaction started.
So your comment "I have several instances of the process running at the same time" explains the problem. If Process 2 starts while Process 1 is running then Process 2 has no visibility of the results from Process 1.

How to use transaction and select together, with sql2008

we have a ASP.NET site with a SQL2008 DB.
I'm trying to implement transactions in our current system.
We have a method that is inserting a new row to a table (Table_A), and it works fine with transaction. If an exception is throwed, it will do the rollback.
My problem is, that another user can't do anything that invole with Table_A in the same time - probably because it's locked.
I can understand why the transaction have to lock the new row, but I don't want it to lock the whole table - forcing user_B to wait for user_A to finish.
I have tryed to set isolationlevel ReadUncommitted and Serializable on the transaction, but it didn't work.
So i'm wondering if I also need to rewrite all my select methods, so they are able to select all the rows except the new row there is in the making?
Example:
Table T has 10 rows(records?)
User A is inserting a new row into table T. This take some time.
At the same time, User B want to select the 10 rows, and dosn't care about the new row.
So I guess I somehow need to rewrite user B's select query from a "select all query" to a "select all rows that isnt bind to a transaction query". :)
Any tips is much appriciated, thanks.
You'll need to make changes to your selects, not to the transaction that is doing the insert.
You can
set isolation level read uncommitted for the connection doing the select, or
specify the WITH (NOLOCK) hint against the table that is being locked, or
specify the WITH (READPAST) hint against the table being locked
As I say, all 3 of these options are things that you'd apply to the SELECT, not the INSERT.
A final option may be to enable SNAPSHOT isolation, and change the database default to use that instead (There are probably many warnings I should include here, if the application hasn't been built/tested with snapshot isolation turned on)
Probably what you need is to set your isolation level to Read Committed Snapshot.
Although it needs more space for the tempdb it is great when you don't want your selects to lock the tables.
Beware of posible problems when using this isolation level.