How are transactions partitioned/isolated in SQLite? - sql

I have been reading the SQLite documentation and also referencing code I have written previously but I don't seem to be able to find a definitive answer to what I imagine to be a rather simple question.
I would like to execute many (separate) compiled statements within a transaction, but child threads may also be creating transactions or just executing statements at the same time and I would not want them included in this particular transaction. Currently, I have a single database handle that I share between all threads.
So, my question is,
1) .. is it generally better to have some kind of semaphore around transactions to ensure they will not clash/collect with other statements being executed against a database handle. I already marshal writes to prevent problems with multithreaded issues with SQLite (although with WAL now it's very hard to unsettle it at all).
2) .. or are you expected to open multiple database connections and start/commit the transactions one per database connection if they will be concurrent?

Changes made in one database connection are invisible to all other database connections prior to commit.
So it seems a hybrid approach of having several connections open to the database provides adequate concurrency guarantees, trading off the expense of opening a new connection with the benefit of allowing multi-threaded write transactions.
A query sees all changes that are completed on the same database connection prior to the start of the query, regardless of whether or not those changes have been committed.
If changes occur on the same database connection after a query starts running but before the query completes, then it is undefined whether or not the query will see those changes.
If changes occur on the same database connection after a query starts running but before the query completes, then the query might return a changed row more than once, or it might return a row that was previously deleted.
For the purposes of the previous four items, two database connections that use the same shared cache and which enable PRAGMA read_uncommitted are considered to be the same database connection, not separate database connections.
Here is the SQLite information on isolation. Which is exceptionally useful to read and understand for this problem.

Related

SQLite concurrent connections issue

I am working on a VB.NET application.
As per the nature of the application, One module has to monitor database (SQLite DB) each second. This Monitoring is done by simple select statement which run to check data against some condition.
Other Modules performs a select,Insert and Update statements on same SQLite DB.
on SQLite concurrent select statements are working fine, but I'm having hard time here to find out, why it is not allowing Inset and Update.
I understand it's a file based lock, but is there anyway to get it done?
each module, in fact statement opens and close the connection to DB.
I've restricted user to run single statement at a time by GUI design.
any Help will be appreciated.
If your database file is not on a network, you could allow a certain amount of read/write concurrency by enabling WAL mode.
But perhaps you should use only a single connection and do your own synchronization for all DB accesses.
You can use some locking mechanism to make sure the database works in a multithreading situation. Since your application is a read intensive one according to what you said, you can consider using a ReaderWriterLock or ReaderWriterLockSlim. (refer to here and here for more details)
If you have only one database, then creating just one instance of the lock is OK; if you have more than one database, each of them can be assigned a lock. Every time you do some read or write, enter the lock (by EnterReadLock() for ReaderWriterLockSlim, or by AcquireReaderLock() for ReaderWriterLock) before you do something, and after you're done exit the lock. Note that you can place the exit of the lock in a finally clause lest you forget to release it.
The strategy above is being used in our production applications. It's not so good as to use a single thread in your case because you have to take performance into account.

What locks are enforced by SQL Server 2005 Express?

Consider a web page having grid-view connected to SqlDataSource having all permission to insert update and delete.
Publish the web page.
This is all on one computer local
Now
opening website on browser A - pressing edit of grid-view
opening website on broswer B - pressing edit of grid-view.
Now I edit in both browsers and press update one by one fine no problem
The last update is the one retained.
But hypothetical situation:
what if there were two computers, or
what if I had two mouse pointers controlled by two independent mice
Computer has capability of running two apps at the same time
Both users get ready and press the update in the browsers at the same time
Even if you consider two different computers this is not possible but for this question
Consider it as possible
Update from two different sources to the same database same table same same row
At the same time, same second, same micro second no delay, both hit the database server at the same time.
What will happen?
In theory I have studied that database management software implement locks when writing no reading, no other writing, etc but does SQL Server 2005 Express implement locks in practical or is it assumed that situation like above will never occur?
If locks there please provide explanation or resource which would explain it keeping in view different scenarios of access
Thank you
edit:-- I am not using control like sqldatasource so please when by providing statements to avoid bling update
its like-- algo---
sqlconnection conn=new .....
sqlcommand
command text is "sql statement for updating values of a particular row"
conn.Open();
cmd.ExecuteNonQuery();
conn.close;
so as seen how can I define the check that before executenonquery that if the data is recently changed are you sure you want to proceed? or something
I am kind of confused here I think..
}
This is solved by most applications using Optimistic Concurency control. Applications simply add more conditions to the update WHERE clause in order to detect changes that occured between the time the data was read and the moment the update is applied. Is called optimistic concurency because the applicaiton assumes no concurent changes will occur, and if they do occur they are are detected and the appplicaiton has to restart the operation. The alternative to optimistic concurency is pesimistic concurency where the applications explicitly locks the data it plans to update. In practice operaitons involving user interaction are never done under pesimistic concurency model.
Other concurency model, specially in distributed applications, is the one implied by the Fiefdom and Emissaries model.
So while database locks and transaction concurency models are always omnipresent in any database operation, when user interaction is involved no application will ever rely on the database locks. User interactions are simply way to long in terms of database transactions. Acquiring locks for the while forgetful Fred is out to lunch and has a data screen open on his desktop simply doesn't work.
SQL 2005 will enforce locks. Before a row can be updated the transaction must acquire an exclusive lock on it. Only 1 transaction can be granted this at a time so the other one will have to wait for that transaction to commit (2 phase locking) before being granted the lock that it needs for the update.
The second write will "win" in that it will overwrite the first one. You can implement optimistic concurrency controls in the sqldatasource to detect that the row has changed and abort the second one rather than blindly overwriting the first edit.
Edit
Following clarification to the question. If you want to roll your own you could add a timestamp column to the table (In SQL Server 2005 this is updated automatically when a row is updated) and bring that back as a hidden dataitem in the gridview then in your UPDATE statement add a where clause UPDATE ... WHERE PrimaryKeyColumn=#PKValue AND TimeStampCol=#OriginalTimestampValue If no rows were affected (retrievable from ExecuteNonQuery - generally) then another transaction modified the row. This might be a bit more lightweight than the alternative used by the data source control where it passes back the original values of all columns and adds them into the WHERE clause with similar logic.

Regarding SQL Server Locking Mechanism

I would like to ask couple of questions regarding SQL Server Locking mechanism
If i am not using the lock hint with SQL Statement, SQL Server uses PAGELOCK hint by default. am I right??? If yes then why? may be its due to the factor of managing too many locks this is the only thing i took as drawback but please let me know if there are others. and also tell me if we can change this default behavior if its reasonable to do.
I am writing a server side application, a Sync Server (not using sync framework) and I have written database queries in C# code file and using ODBC connection to execute them. Now question is what is the best way to change the default locking from Page to Row keeping drawbacks in mind (e.g. adding lock hint in queries this is what i am planning for).
What if a sql query(SELECT/DML) is being executed without the scope of transaction and statement contains lock hint then what kind of lock will be acquired (e.g. shared, update, exclusive)? AND while in transaction scope does Isolation Level of transaction has impact on lock type if ROWLOCK hint is being used.
Lastly, If some could give me sample so i could test and experience all above scenarios my self (e.g. dot net code or sql script)
Thanks
Mubashar
No. It locks as it sees fit and escalates locks as needed
Let the DB engine manage it
See point 2
See point 2
I'd only use lock hints if you want specific and certain behaviours eg queues or non-blocking (dirty) reads.
More generally, why do you think the DB engine can't do what you want by default?
The default locking is row locks not page locks, although the way in which the locking mechanism works means you will be placing locks on all the objects within the hierarchy e.g. reading a single row will place a shared lock on the table, a shared lock on the page and then a shared lock on the row.
This enables an action requesting an exclusive lock on the table to know it may not take it yet, since there is a shared lock present (otherwise it would have to check every page / row for locks.)
If you issue too many locks for an individual query however, it performs lock escalation which reduces the granularity of the lock - so that is it managing less locks.
This can be turned off using a trace flag but I wouldn't consider it.
Until you know you actually have a locking / lock escalation issue you risk prematurely optimizing a non-existant problem.

ORM Support for Handling Deadlocks

Do you know of any ORM tool that offers deadlock recovery? I know deadlocks are a bad thing but sometimes any system will suffer from it given the right amount of load. In Sql Server, the deadlock message says "Rerun the transaction" so I would suspect that rerunning a deadlock statement is a desirable feature on ORM's.
I don't know of any special ORM tool support for automatically rerunning transactions that failed because of deadlocks. However I don't think that a ORM makes dealing with locking/deadlocking issues very different. Firstly, you should analyze the root cause for your deadlocks, then redesign your transactions and queries in a way that deadlocks are avoided or at least reduced. There are lots of options for improvement, like choosing the right isolation level for (parts) of your transactions, using lock hints etc. This depends much more on your database system then on your ORM. Of course it helps if your ORM allows you to use stored procedures for some fine-tuned command etc.
If this doesn't help to avoid deadlocks completely, or you don't have the time to implement and test the real fix now, of course you could simply place a try/catch around your save/commit/persist or whatever call, check catched exceptions if they indicate that the failed transaction is a "deadlock victim", and then simply recall save/commit/persist after a few seconds sleeping. Waiting a few seconds is a good idea since deadlocks are often an indication that there is a temporary peak of transactions competing for the same resources, and rerunning the same transaction quickly again and again would probably make things even worse.
For the same reason you probably would wont to make sure that you only try once to rerun the same transaction.
In a real world scenario we once implemented this kind of workaround, and about 80% of the "deadlock victims" succeeded on the second go. But I strongly recommend to digg deeper to fix the actual reason for the deadlocking, because these problems usually increase exponentially with the number of users. Hope that helps.
Deadlocks are to be expected, and SQL Server seems to be worse off in this front than other database servers. First, you should try to minimize your deadlocks. Try using the SQL Server Profiler to figure out why its happening and what you can do about it. Next, configure your ORM to not read after making an update in the same transaction, if possible. Finally, after you've done that, if you happen to use Spring and Hibernate together, you can put in an interceptor to watch for this situation. Extend MethodInterceptor and place it in your Spring bean under interceptorNames. When the interceptor is run, use invocation.proceed() to execute the transaction. Catch any exceptions, and define a number of times you want to retry.
An o/r mapper can't detect this, as the deadlock is always occuring inside the DBMS, which could be caused by locks set by other threads or other apps even.
To be sure a piece of code doesn't create a deadlock, always use these rules:
- do fetching outside the transaction. So first fetch, then perform processing then perform DML statements like insert, delete and update
- every action inside a method or series of methods which contain / work with a transaction have to use the same connection to the database. This is required because for example write locks are ignored by statements executed over the same connection (as that same connection set the locks ;)).
Often, deadlocks occur because either code fetches data inside a transaction which causes a NEW connection to be opened (which has to wait for locks) or uses different connections for the statements in a transaction.
I had a quick look (no doubt you have too) and couldn't find anything suggesting that hibernate at least offers this. This is probably because ORMs consider this outside of the scope of the problem they are trying to solve.
If you are having issues with deadlocks certainly follow some of the suggestions posted here to try and resolve them. After that you just need to make sure all your database access code gets wrapped with something which can detect a deadlock and retry the transaction.
One system I worked on was based on “commands” that were then committed to the database when the user pressed save, it worked like this:
While(true)
start a database transaction
Foreach command to process
read data the command need into objects
update the object by calling the command.run method
EndForeach
Save the objects to the database
If not deadlock
commit the database transaction
we are done
Else
abort the database transaction
log deadlock and try again
EndIf
EndWhile
You may be able to do something like with any ORM; we used an in house data access system, as ORM were too new at the time.
We run the commands outside of a transaction while the user was interacting with the system. Then rerun them as above (when you use did a "save") to cope with changes other people have made. As we already had a good ideal of the rows the command would change, we could even use locking hints or “select for update” to take out all the write locks we needed at the start of the transaction. (We shorted the set of rows to be updated to reduce the number of deadlocks even more)

What are the problems of using transactions in a database?

From this post. One obvious problem is scalability/performance. What are the other problems that transactions use will provoke?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), deadlock possibility is more like a cost to pay for ensuring data consistency, rather than a problem with transactions use, or you would disagree? If so, what other solutions would you use to ensure data consistency which are deadlock free?
It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem is to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily as they could fail at any moment (the application needs to remember what it was in the middle of doing it and redo it). Stateless ones can just be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.
You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.
One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc in order to debug these interesting/frustrating problems.
-Adam
I think the major issue is at the design level. At what level or levels within my application do I utilise transactions.
For example I could:
Create transactions within stored procedures,
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
A distributed transaction in (via DTC / COM+).
Using more then one of these levels in the same application often seems to create performance and/or data integrity issues.