sql simultaneous update - sql

I'm trying to add a like and dislike button to my application and the backend runs on SQL, I've done my research and i read somewhere that it will handle updates in the same table and different rows, but i count the number of likes after the "SELECT " query and then i update the value so what happens if someone likes it at the same time and the value is +1 when it's going to update it? will it update it to +2 or will it stay the same?
thanks

You can do this in a single query:
UPDATE items SET likes = likes + 1 WHERE item_id = :id;
That statement, assuming you're using a DB that supports transactions, will run as a transaction, and so in isolation.
If you SELECT, modify, and then UPDATE in separate queries, then you run into the concurrent-update problem you described, where one user's changes can overwrite another's with an unfortunate interleaving.
You could do it using locks to ensure no bad interleaving, but that will increase complexity, and open up potential for concurrency bugs.

Related

Is bulk update faster than single update in db2?

I have a Table with 10 columns and in that table I have thousands/millions of rows.
In some scenario, I want to update more than 10K records at a time. currently my scenario code works sequentially like,
for i in (primary key ids for all records to be updated)
executeupdate(i)
what I thought is instead of running same query 10K times, I will add all ids in a string and run a single update query like,
executeupdate(all ids)
actual DB queries can be like this,
suppose I have primary key ids like,
10001,10002,10003,10004,10005
so in first case My queries will be like
update tab1 set status="xyz" where Id="10001"
update tab1 set status="xyz" where Id="10002"
update tab1 set status="xyz" where Id="10003"
update tab1 set status="xyz" where Id="10004"
update tab1 set status="xyz" where Id="10005"
and My bulk update query will be like,
update tab1 set status="xyz" where id in ("10001","10002","10003","10004","10005")
so My question is, will I get any Performance improvement (executime time) by doing bulk update
or total query execution time will be same as for each record index scan will happen and update will take place?
Note : I am using DB2 9.5 as database
Thanks.
In general, a "bulk" update will be faster, regardless of database. Of course, you can test the performance of the two, and report back.
Each call to update requires a bunch of overhead, in terms of processing the query, setting up locks on tables/pages/rows. Doing a single update consolidates this overhead.
The downside to a single update is that it might be faster overall, but it might lock underlying resources for longer periods of time. For instance, the single updates might take 10 milliseconds each, for an elapsed time of 10 seconds for 1,000 of them. However, no resource is locked for more than 10 milliseconds. The bulk update might take 5 seconds, but the resources would be locked for more of this period.
To speed these updates, be sure that id is indexed.
I should note. This is a general principle. I have not specifically tested single versus multiple update performance on DB2.
You will definitely see a performance improvement, because you will reduce the number of roundtrips.
However, this approach does not scale very well; thousands of ID's in one statement could get a bit tricky. Also, there is a limit on the size of your query (could be 64k). You could consider to 'page' through your table and update - say - 100 records per update statement.
I came here with same question a week back. Then I faced a situation where I had to update a table with around 3500 rows in mySQL database through JDBC.
I updated same table twice: once through a For loop, by iterating through a collection of objects, and once using a bulk update query. Here are my findings:
When I updated the data in the database through iteration, it took around 7.945 seconds to execute completely.
When I came up with a rather gigantic (where 'gigantic' means 183 pages long) update query and executed the same, it took around 2.24 seconds to complete the update process.
clearly, bulk update wins by a huge margin.
Why this Difference?
To answer this, let's see how a query actually gets executed in DBMS.
Unlike procedural languages, you instruct the DBMS what to do, but not how to do. The DBMS then does the followings.
Syntax checking, or more commonly called 'Parsing'. And parsing comprises of steps like Lexical Analysis, Syntactic Analysis, Semantic Parsing.
A series of optimization (Although the definition of 'optimization' itself may vary from product to product. At least that's what I learned while surfing through the internet. I don't have much knowledge about it though.).
execution.
Now, when you update a table in database row by row, each of the queries you execute goes through parsing, optimization and execution. In stead if you write a loop to create a rather long query, and then execute the same, it is parsed only once. And the amount of time you save by using batch update in place of iterative approach increases almost linearly with number of rows you update.
A few tips that might come handy while updating data in your database
It is always a good practice to use indexed columns as reference while writing Any query.
Try to use integers or numbers and not strings for sorting or searching data in database. Your server is way more comfortable in comparing two numbers than comparing two strings.
Avoid using views and 'in' clause. they make your task easier, but slows down your database. Use joins in stead.
If you are using .NET (and there's probably a similar option in other languages like Java), there is a option you can use on your DB2Connection class called BeginChain, which will greatly improve performance.
Basically, when you have the chain option activated, your DB2 client will keep all of the commands in a queue. When you call EndChain, the queue will be sent to the server at once, and processed at one time.
The documentation says that this should perform much better than non-chained UPDATE/INSERT/DELETEs (and this is what we've seen in my shop), but there are some differences you might need to be aware of:
No exceptions will be thrown on individual statements. They will all be batched up in one DB2Exception, which will contain multiple errors in the DB2Error property.
ExecuteNonQuery will return -1 when chaining is active.
Additionally, performance can be improved further by using a query with Parameter Markers instead of separate individual queries (assuming status can change as well, otherwise, you might just use a literal):
UPDATE tab1
SET status = #status
WHERE id = #id
Edit for comment: I'm not sure if the confusion is in using Parameter Markers (which are just placeholders for values in a query, see the link for more details), or in the actual usage of chaining. If it is the second, then here is some example code (I didn't verify that it works, so use at your own risk :)):
//Below is a function that returns an open DB2Connection
//object. It can vary by shop, so put it whatever you do.
using (var conn = (DB2Connection) GetConnection())
{
using (var trans = conn.BeginTransaction())
{
var sb = new StringBuilder();
sb.AppendLine("UPDATE tab1 ");
sb.AppendLine(" SET status = 'HISTORY' ");
sb.AppendLine(" WHERE id = #id");
trans.Connection.BeginChain();
using (var cmd = trans.Connection.CreateCommand())
{
cmd.CommandText = sb.ToString();
cmd.Transaction = trans;
foreach (var id in ids)
{
cmd.Parameters.Clear();
cmd.Parameters.Add("#id", id);
cmd.ExecuteNonQuery();
}
}
trans.Connection.EndChain();
trans.Commit();
}
}
One other aspect I would like to point out is the commit interval. If the single update statement updates few 100 K rows, the transaction log also grows acordingly, it might become slower. I have seen reduction in total time while using ETL tools like informatica which fired sets of update statements per record followed by a commit- compared to a single update statement based on conditions to do it in a single go. This was counter-intuitive for me.

SQL transaction affecting a big amount of rows

The situation is as follows:
A big production client/server system where one central database table has a certain column that has had NULL as default value but now has 0 as default value. But all the rows created before that change of course still have value as null and that generates a lot of unnecessary error messages in this system.
Solution is of course simple as that:
update theTable set theColumn = 0 where theColumn is null
But I guess it's gonna take a lot of time to complete this transaction? Apart from that, will there be any other issues I should think of before I do this? Will this big transaction block the whole database, or that particular table during the whole update process?
This particular table has about 550k rows and 500k of them has null value and will be affected by the above sql statement.
The impact on the performance of other connected clients depends on:
How fast the servers hardware is
How many indexes containing the column your update statement has to update
Which transaction isolation settings the other clients connect to the database
The db engine will acquire write locks, so when your clients only need read access to the table, it should not be a big problem.
500.000 records sounds not too much for me, but as i said, the time and resources the update takes depends on many factors.
Do you have a similar test system, where you can try out the update?
Another solution is to split the one big update into many small ones and call them in a loop.
When you have clients writing frequently to that table, your update statement might get blocked "forever". I have seen databases where performing the update row by row was the only way of getting the update through. But that was a table with about 200.000.000 records and about 500 very active clients!
it's gonna take a lot of time to complete this transaction
there's no definite way to say this. Depends a lot on the hardware, number of concurrent sessions, whether the table has got locks, the number of interdependent triggers et al.
Will this big transaction block the whole database, or that particular table during the whole update process
If the "whole database" is dependent on this table then it might.
will there be any other issues I should think of before I do this
If the table has been locked by other transaction - you might run into a row-lock situation. In rare cases, perhaps a dead lock situation. Best would be to ensure that no one is utilizing the table, check for any pre-exising locks and then run the statement.
Locking issues are vendor specific.
Asuming no triggers on the table, half a million rows is not much for a dediated database server even with many indexes on the table.

What is the purpose of ROWLOCK on Delete and when should I use it?

Ex)
When should I use this statement:
DELETE TOP (#count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = #productId_for_del;
And when should be just doing:
DELETE TOP (#count)
FROM ProductInfo
WHERE ProductId = #productId_for_del;
The with (rowlock) is a hint that instructs the database that it should keep locks on a row scope. That means that the database will avoid escalating locks to block or table scope.
You use the hint when only a single or only a few rows will be affected by the query, to keep the lock from locking rows that will not be deleted by the query. That will let another query read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that will delete a lot of rows, it may degrade the performance as the database will try to avoid escalating the locks to a larger scope, even if it would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only in situations where you get performance problems because the database made the wrong decision, that you should add such hints to a query.
Rowlock is a query hint that should be used with caution (as is all query hints).
Omitting it will likely still result in the exact same behaviour and providing it will not guarantee that it will only use a rowlock, it is only a hint afterall. If you do not have a very in depth knowledge of lock contention chances are that the optimizer will pick the best possible locking strategy, and these things are usually best left to the database engine to decide.
ROWLOCK means that SQL will lock only the affected row, and not the entire table or the page in the table where the data is stored when performing the delete. This will only affect other people reading from the table at the same time as your delete is running.
If a table lock is used it will cause all queries to the table to wait until your delete has completed, with a row lock only selects reading the specific rows will be made to wait.
Deleting top N where N is a number of rows will most likely lock the table in any case.
SQL Server defaults to page locks. This is the most efficient way for SQL server to process multiple date sets. But SQL server is not multi-user friendly sometimes; therefore you may need to incorporate locking methods so you can get your data to flow in and out of the database. This is why people approach that problem by using locking hints.
If everyone designed there database tables so that everything processed each row at page width - the system would be very fast. But no one spends that detailed amount of time.
So, you might see people use with(nolock) on their SELECT statements and the use of with(rowlock) on their UPDATE and DELETE statements. An INSERT does not matter because it will lock the PAGE automatically. Sometimes by using with(rowlock), you can get better multi-user (multiple user connections) performance.
The problem with(nolock) is that you can return the committed record sitting there in the DB already, plus the dirty record that is about to update the sitting record; thus a double return of records to your SELECT statement. If you know the personality of your system on how the data runs through it, you can use with(nolock) to your advantage quite a bit though.
When do you know when to use with(rowlock)? When your system isn't letting user play nice with each other in the same table / record. Though, query re-write / tune first and then adjust your locking as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do such. If you are the developer writing this code, just blame yourself.

Getting deadlocks in MySQL

We're very frustratingly getting deadlocks in MySQL. It isn't because of exceeding a lock timeout as the deadlocks happen instantly when they do happen. Here's the SQL code that is executing on 2 separate threads (with 2 separate connections from the connection pool) that produces a deadlock:
UPDATE Sequences SET Counter = LAST_INSERT_ID(Counter + 1) WHERE Sequence IS NULL
Sequences table has 2 columns: Sequence and Counter
The LAST_INSERT_ID allows us to retrieve this updated counter value as per MySQL's recommendation. That works perfect for us, but we get these deadlocks! Why are we getting them and how can we avoid them??
Thanks so much for any help with this.
EDIT: this is all in a transaction (required since I'm using Hibernate) and AUTO_INCREMENT doesn't make sense here. I should've been more clear. The Sequences table holds many sequences (in our case about 100 million of them). I need to increment a counter and retrieve that value. AUTO_INCREMENT plays no role in all of this, this has nothing to do with Ids or PRIMARY KEYs.
Wrap your sql statements in a transaction. If you aren't using a transaction you will get a race condition on LAST_INSERT_ID.
But really, you should have counter fields auto_increment, so you let mysql handle this.
Your third solution is to use LOCK_TABLES, to lock the sequence table so no other process can access it concurrently. This is the probably the slowest solution unless you are using INNODB.
Deadlocks are a normal part of any transactional database, and can occur at any time. Generally, you are supposed to write your application code to handle them, as there is no surefire way to guarantee that you will never get a deadlock. That being said, there are situations that increase the likelihood of deadlocks occurring, such as the use of large transactions, and there are things you can do to mitigate their occurrence.
First thing, you should read this manual page to get a better understanding of how you can avoid them.
Second, if all you're doing is updating a counter, you should really, really, really be using an AUTO_INCREMENT column for Counter rather than relying on a "select then update" process, which as you have seen is a race condition that can produce deadlocks. Essentially, the AUTO_INCREMENT property of your table column will act as a counter for you.
Finally, I'm going to assume that you have that update statement inside a transaction, as this would produce frequent deadlocks. If you want to see it in action, try the experiment listed here. That's exactly what's happening with your code... two threads are attempting to update the same records at the same time before one of them is committed. Instant deadlock.
Your best solution is to figure out how to do it without a transaction, and AUTO_INCREMENT will let you do that.
No other SQL involved ? Seems a bit unlikely to me.
The 'where sequence is null' probably causes a full table scan, causing read locks to be acquired on every row/page/... .
This becomes a problem if (your particular engine does not use MVCC and) there were an INSERT that preceded your update within the same transaction. That INSERT would have acquired an exclusive lock on some resource (row/page/...), which will cause the acquisition of a read lock by any other thread to go waiting. So two connections can first do their insert, causing each of them to have an exclusive lock on some small portion of the table, and then they both try to do your update, requiring each of them to be able to acquire a read lock on the entire table.
I managed to do this using a MyISAM table for the sequences.
I then have a function called getNextCounter that does the following:
performs a SELECT sequence_value FROM sequences where sequence_name = 'test';
performs the update: UPDATE sequences SET sequence_value = LAST_INSERT_ID(last_retrieved_value + 1) WHERE sequence_name = 'test' and sequence_value = last retrieved value;
repeat in a loop until both queries are successful, then retrieve the last insert id.
As it is a MyISAM table it won't be part of your transaction, so the operation won't cause any deadlocks.

SQL, selecting and updating

I am trying to select 100s of rows at a DB that contains 100000s of row and update those rows afters.
the problem is I don't want to go to DB twice for this purpose since update only marks those rows as "read".
is there any way I can do this in java using simple jdbc libraries? (hopefully without using stored procedures)
update: ok here is some clarification.
there are a few instance of same application running on different servers, they all need to select 100s of "UNREAD" rows sorted according to creation_date column, read blob data within it, write it to file and ftp that file to some server. (I know prehistoric but requirements are requirements)
The read and update part is for to ensure each instance getting diffent set of data. (in order, tricks like odds and evens wont work :/)
We select data for update. the data transfers through the wire (we wait and wait) and then we update them as "READ". then release lock for reading. this entire thing takes too long. By reading and updating at the same time, I would like to reduce lock time (from time we use select for update to actual update) so that using multiple instances would increase read rows per second.
Still have ideas?
It seems to me there might be more than one way to interpret the question here.
You are selecting the rows for the
sole purpose of updating them and
not reading them.
You are selecting the rows to show
to somebody, and marking them as
read either one at a time or all as a group.
You want to select the rows and mark
them as read at the time you select
them.
Let's take Option 1 first, as that seems to be the easiest. You don't need to select the rows in order to update them, just issue an update with a WHERE clause:
update table_x
set read = 'T'
where date > sysdate-1;
Looking at option 2, you want to mark them as read when a user has read them (or a down stream system has received it, or whatever). For this, you'll probably have to do another update. If you query for the primary key, in addition to the other columns you'll need in the first select, you will probably have an easier time of updating, as the DB won't have to do table or index scans to find the rows.
In JDBC (Java) there is a facility to do a batch update, where you execute a set of updates all at once. That's worked out well when I need to perform a lot of updates that are of the exact same form.
Option 3, where you want to select and update all in one shot. I don't find much use for this, personally, but that doesn't mean others don't. I suppose some kind of stored procedure would reduce the round trips. I'm not sure what db you are working with here and can't really offer specifics.
Going to the DB isn't so bad. If you aren't returning anything 'across the wire' then an update shouldn't do you too much damage and its only a few hundred thousand rows. What is your worry?
If you're doing a SELECT in JDBC and iterating over the ResultSet to UPDATE each row, you're doing it wrong. That's an (n+1) query problem that will never perform well.
Just do an UPDATE with a WHERE clause that determines which of those rows needs to be updated. It's a single network round trip that way.
Don't be too code-centric. Let the database do the job it was designed for.
Can't you just use the same connection without closing it?