Keeping multi-user state across DB sessions - sql

The situation
Suppose we have a web application connected to a (Postgre)SQL database whose task can be summarized as:
A SELECT operation to visualize the data.
An UPDATE operation that stores modifications based on the visualized data.
Simple, but... the data involved isn't user specific, so it might potentially be changed during the process by other users. The editing task may take long time (perhaps more than an hour), meaning that the probability of these collisions happening isn't low: it makes sense to implement a robust solution to the problem.
The approach
The idea would be that, once the user tries to submit the changes (i.e. firing the UPDATE operation), a number of database checks will be triggered to ensure that the involved data didn't change in the meantime.
Assuming we have timestamped every change on the data, it would be as easy as keeping the access time when the data was SELECTed and ensuring that no new changes were done after that time on the involved data.
The problem
We could easily just keep that access time in the frontend application while the user performs the editing, and later provide it as an argument to the trigger function when performing the UPDATE, but that's not desirable for security reasons. The database should store the user's access time.
An intuitive solution could be a TEMPORARY TABLE associated to the database session. But, again, the user might take a long time doing the task, so capturing a connection from the pool and keeping it idle for such a long time doesn't seem like a good option either. The SELECT and the UPDATE operations will be performed under different sessions.
The question
Is there any paradigm or canonical way to address and solve this problem efficiently?

This problem is known as the "lost update" problem.
There are several solutions that depend on whether a connection pool is used or not and on the transaction isolation level used:
pessimistic locking with SELECT ... FOR UPDATE without connection pool
optimistic locking with timestamp column if connection pool is used.

Related

How are transactions partitioned/isolated in SQLite?

I have been reading the SQLite documentation and also referencing code I have written previously but I don't seem to be able to find a definitive answer to what I imagine to be a rather simple question.
I would like to execute many (separate) compiled statements within a transaction, but child threads may also be creating transactions or just executing statements at the same time and I would not want them included in this particular transaction. Currently, I have a single database handle that I share between all threads.
So, my question is,
1) .. is it generally better to have some kind of semaphore around transactions to ensure they will not clash/collect with other statements being executed against a database handle. I already marshal writes to prevent problems with multithreaded issues with SQLite (although with WAL now it's very hard to unsettle it at all).
2) .. or are you expected to open multiple database connections and start/commit the transactions one per database connection if they will be concurrent?
Changes made in one database connection are invisible to all other database connections prior to commit.
So it seems a hybrid approach of having several connections open to the database provides adequate concurrency guarantees, trading off the expense of opening a new connection with the benefit of allowing multi-threaded write transactions.
A query sees all changes that are completed on the same database connection prior to the start of the query, regardless of whether or not those changes have been committed.
If changes occur on the same database connection after a query starts running but before the query completes, then it is undefined whether or not the query will see those changes.
If changes occur on the same database connection after a query starts running but before the query completes, then the query might return a changed row more than once, or it might return a row that was previously deleted.
For the purposes of the previous four items, two database connections that use the same shared cache and which enable PRAGMA read_uncommitted are considered to be the same database connection, not separate database connections.
Here is the SQLite information on isolation. Which is exceptionally useful to read and understand for this problem.

Should I create separate SQL Server database for each user?

I am working on Asp.Net MVC web application, back-end is SQL Server 2012.
This application will provide billing, accounting, and inventory management. The user will create an account by signup. just like http://www.quickbooks.in. Each user will create some masters and various transactions. There is no limit, user can make unlimited records in the database.
I want to keep stable database performance, after heavy data load. I am maintaining proper indexing and primary keys in it, but there would be a heavy load on the database, per user.
So, should I create a separate database for each user, or should maintain one database with UserID. Add UserID in each table and making a partition based on UserID?
I am not an expert in SQL Server, so please provide suggestions with clear specifications.
Please inform me if there is any lack of information.
A DB per user is what happens when customers need to be able pack up and leave taking the actual database with them. Think of a self hosted wordpress website. Or if there are incredible risks to one user accidentally seeing another user's data, so it's safer to rely on the servers security model than to rely on remembering to add the UserId filter to all your queries. I can't imagine a scenario like that, but who knows-- maybe if the privacy laws allowed for jail time, I would rather data partitioned by security rules rather than carefully writing WHERE clauses.
If you did do user-per-database, creating a new user will be 10x more effort. While INSERT, UPDATE and so on stay the same from version to version, with each upgrade the syntax for database, user creation, permission granting and so on will evolve enough to break those scripts each SQL version upgrade.
Also, this will multiply your migration headaches by the number of users. Let's say you have 5000 users and you need to add some new columns, change a columns data type, update a trigger, and so on. Instead of needing to run that change script 1x, you need to run it 5000 times.
Per user Dbs also probably wastes disk space. Each of those databases is going to have a transaction log, sitting idle taking up the minimum log space.
As for load, if collectively your 5000 users are doing 1 billion inserts, updates and so on per day, my intuition tells me that it's going to be faster on one database, unless there is some sort of contension issue (everyone reading and writing to the same table at the same time and the same pages of the same table). Each database has machine resources (probably threads and memory) per database doing housekeeping, so these extra DBs can't be free.
Anyhow, the best thing to do is to simulate the two architectures and use a random data generator to simulate load and see how they perform.
It's not an easy answer to give.
First, there is logical design to be considered. Then you have integrity, security, management and performance (in this very order).
A database is a logical unit of data, self contained. Ideally, you should be able to take a database, move it to another instance, probably change the connection strings and be running again.
All the constraints are database-level. No foreign keys can exist referencing some object outside the database.
So, try thinking in these terms first.
How would you reliably prevent one user messing up the other user's data? Keep in mind that it's just a matter of time before someone opens an excel sheet and fire up queries on the database bypassing your application. Row level security in SQL Server is something you don't want to deal with.
Multiple databases mean that all management tasks should be scripted out and executed on all databases. Yes, there is some overhead to it, but once you set it up it's just the matter of monitoring. If a database goes suspect, it's a single customer down, not all of them. You can even have different versions for different customes if each customer have it's own database. Additionally, if you roll an upgrade, you can do it per customer, so the inpact will be much less.
Performance is the least relevant factor here. Of course, it really depends on how many customers and how much data, but proper indexing will solve these issues. Scale-out is much easier with multiple databases.
BTW, partitioning, as you mentioned it, is never a performance booster, it's simply a management feature, allowing for faster loading and evicting of data from a table.
I'd probably put each customer in separate database, but it's up to you eventually to make a decision for yourself. Hope I've helped some with this.

SQLite concurrent connections issue

I am working on a VB.NET application.
As per the nature of the application, One module has to monitor database (SQLite DB) each second. This Monitoring is done by simple select statement which run to check data against some condition.
Other Modules performs a select,Insert and Update statements on same SQLite DB.
on SQLite concurrent select statements are working fine, but I'm having hard time here to find out, why it is not allowing Inset and Update.
I understand it's a file based lock, but is there anyway to get it done?
each module, in fact statement opens and close the connection to DB.
I've restricted user to run single statement at a time by GUI design.
any Help will be appreciated.
If your database file is not on a network, you could allow a certain amount of read/write concurrency by enabling WAL mode.
But perhaps you should use only a single connection and do your own synchronization for all DB accesses.
You can use some locking mechanism to make sure the database works in a multithreading situation. Since your application is a read intensive one according to what you said, you can consider using a ReaderWriterLock or ReaderWriterLockSlim. (refer to here and here for more details)
If you have only one database, then creating just one instance of the lock is OK; if you have more than one database, each of them can be assigned a lock. Every time you do some read or write, enter the lock (by EnterReadLock() for ReaderWriterLockSlim, or by AcquireReaderLock() for ReaderWriterLock) before you do something, and after you're done exit the lock. Note that you can place the exit of the lock in a finally clause lest you forget to release it.
The strategy above is being used in our production applications. It's not so good as to use a single thread in your case because you have to take performance into account.

What locks are enforced by SQL Server 2005 Express?

Consider a web page having grid-view connected to SqlDataSource having all permission to insert update and delete.
Publish the web page.
This is all on one computer local
Now
opening website on browser A - pressing edit of grid-view
opening website on broswer B - pressing edit of grid-view.
Now I edit in both browsers and press update one by one fine no problem
The last update is the one retained.
But hypothetical situation:
what if there were two computers, or
what if I had two mouse pointers controlled by two independent mice
Computer has capability of running two apps at the same time
Both users get ready and press the update in the browsers at the same time
Even if you consider two different computers this is not possible but for this question
Consider it as possible
Update from two different sources to the same database same table same same row
At the same time, same second, same micro second no delay, both hit the database server at the same time.
What will happen?
In theory I have studied that database management software implement locks when writing no reading, no other writing, etc but does SQL Server 2005 Express implement locks in practical or is it assumed that situation like above will never occur?
If locks there please provide explanation or resource which would explain it keeping in view different scenarios of access
Thank you
edit:-- I am not using control like sqldatasource so please when by providing statements to avoid bling update
its like-- algo---
sqlconnection conn=new .....
sqlcommand
command text is "sql statement for updating values of a particular row"
conn.Open();
cmd.ExecuteNonQuery();
conn.close;
so as seen how can I define the check that before executenonquery that if the data is recently changed are you sure you want to proceed? or something
I am kind of confused here I think..
}
This is solved by most applications using Optimistic Concurency control. Applications simply add more conditions to the update WHERE clause in order to detect changes that occured between the time the data was read and the moment the update is applied. Is called optimistic concurency because the applicaiton assumes no concurent changes will occur, and if they do occur they are are detected and the appplicaiton has to restart the operation. The alternative to optimistic concurency is pesimistic concurency where the applications explicitly locks the data it plans to update. In practice operaitons involving user interaction are never done under pesimistic concurency model.
Other concurency model, specially in distributed applications, is the one implied by the Fiefdom and Emissaries model.
So while database locks and transaction concurency models are always omnipresent in any database operation, when user interaction is involved no application will ever rely on the database locks. User interactions are simply way to long in terms of database transactions. Acquiring locks for the while forgetful Fred is out to lunch and has a data screen open on his desktop simply doesn't work.
SQL 2005 will enforce locks. Before a row can be updated the transaction must acquire an exclusive lock on it. Only 1 transaction can be granted this at a time so the other one will have to wait for that transaction to commit (2 phase locking) before being granted the lock that it needs for the update.
The second write will "win" in that it will overwrite the first one. You can implement optimistic concurrency controls in the sqldatasource to detect that the row has changed and abort the second one rather than blindly overwriting the first edit.
Edit
Following clarification to the question. If you want to roll your own you could add a timestamp column to the table (In SQL Server 2005 this is updated automatically when a row is updated) and bring that back as a hidden dataitem in the gridview then in your UPDATE statement add a where clause UPDATE ... WHERE PrimaryKeyColumn=#PKValue AND TimeStampCol=#OriginalTimestampValue If no rows were affected (retrievable from ExecuteNonQuery - generally) then another transaction modified the row. This might be a bit more lightweight than the alternative used by the data source control where it passes back the original values of all columns and adds them into the WHERE clause with similar logic.

What are the problems of using transactions in a database?

From this post. One obvious problem is scalability/performance. What are the other problems that transactions use will provoke?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), deadlock possibility is more like a cost to pay for ensuring data consistency, rather than a problem with transactions use, or you would disagree? If so, what other solutions would you use to ensure data consistency which are deadlock free?
It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem is to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily as they could fail at any moment (the application needs to remember what it was in the middle of doing it and redo it). Stateless ones can just be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.
You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.
One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc in order to debug these interesting/frustrating problems.
-Adam
I think the major issue is at the design level. At what level or levels within my application do I utilise transactions.
For example I could:
Create transactions within stored procedures,
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
A distributed transaction in (via DTC / COM+).
Using more then one of these levels in the same application often seems to create performance and/or data integrity issues.