I'm trying to find how I am supposed to do this, someone can give me an explanation or show examples?
This is the question:
Start two Batch sql instances beside each other, to login twice in the same database and run two concurrent transactions. Show the effect of commit and rollback, and what happens if the two transactions try to commit conflicting changes. (Some hints can be found in the transaction example from the lecture.) Remember that by default, each SQL statement is considered its own transaction in Batch sql , and you have to give the command start transaction to start a multi-statement transaction.
I've tried to look around on the internet for an answer but since this question is broad it's kinda difficult for my level of understanding.
I think you need to do things like below and document/explain the results.
Create a table
Add some data into it
Update the data
Update different data in two separate concurrent transactions
Update the same data in two separate concurrent transactions
Update one unique row to be the same as another
Set up a circular reference (deadlock). Can an INSERT operation result in a deadlock?
I suspect the last one is what they are after but I'm not your lecturer so hard to know :) Have you tried asking for more clarity?
Related
I have a database table that I use as a queue system, where separate process that talk to each other create and read entries in the table. For example, when a user initiates a search an entry is created, then another process that runs every second or two will pick up that new entry, update the status and then do a search, updating the entry again when the search is complete. This all seems to work well with thousands of searches per hour.
However, I have a master admin screen that lets me view the status of all of these 'jobs' but it runs very slowly. I basically return all entries in the table for the last hour so I can keep an eye on what's going on. I think that I am running into lock issues of some sort. I only need to read each entry, and don't really care if it the data is a little bit out of date. I just use a standard 'Select * from Table' statement so maybe it is waiting for other locks to expire before returning data as the jobs are constantly updating the data.
Would this be handled better by a certain kind of cursor to return each row one at a time, etc? Any other ideas?
Thanks
If you really don't care if the data is a bit out of date... or if you only need the data to be 99.99% accurate, consider using WITH (NOLOCK):
SELECT * FROM Table WITH (NOLOCK);
This will instruct your query to use the READ UNCOMMITTED ISOLATION LEVEL, which has the following behavior:
Specifies that dirty reads are allowed. No shared locks are issued to
prevent other transactions from modifying data read by the current
transaction, and exclusive locks set by other transactions do not
block the current transaction from reading the locked data.
Be aware that NOLOCK may cause some inaccuracies in your data, so it probably isn't a good idea to use it throughout the rest of your system.
You need FROM yourtable WITH (NOLOCK) table hint.
You may also want to look at transaction isolation in your update process, if you aren't already
An alternative to NOLOCK (which can lead to very bad things, such as missed rows or duplicated rows) is to allow read committed snapshot isolation at the database level and then issue your query with:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
The situation is as follows:
A big production client/server system where one central database table has a certain column that has had NULL as default value but now has 0 as default value. But all the rows created before that change of course still have value as null and that generates a lot of unnecessary error messages in this system.
Solution is of course simple as that:
update theTable set theColumn = 0 where theColumn is null
But I guess it's gonna take a lot of time to complete this transaction? Apart from that, will there be any other issues I should think of before I do this? Will this big transaction block the whole database, or that particular table during the whole update process?
This particular table has about 550k rows and 500k of them has null value and will be affected by the above sql statement.
The impact on the performance of other connected clients depends on:
How fast the servers hardware is
How many indexes containing the column your update statement has to update
Which transaction isolation settings the other clients connect to the database
The db engine will acquire write locks, so when your clients only need read access to the table, it should not be a big problem.
500.000 records sounds not too much for me, but as i said, the time and resources the update takes depends on many factors.
Do you have a similar test system, where you can try out the update?
Another solution is to split the one big update into many small ones and call them in a loop.
When you have clients writing frequently to that table, your update statement might get blocked "forever". I have seen databases where performing the update row by row was the only way of getting the update through. But that was a table with about 200.000.000 records and about 500 very active clients!
it's gonna take a lot of time to complete this transaction
there's no definite way to say this. Depends a lot on the hardware, number of concurrent sessions, whether the table has got locks, the number of interdependent triggers et al.
Will this big transaction block the whole database, or that particular table during the whole update process
If the "whole database" is dependent on this table then it might.
will there be any other issues I should think of before I do this
If the table has been locked by other transaction - you might run into a row-lock situation. In rare cases, perhaps a dead lock situation. Best would be to ensure that no one is utilizing the table, check for any pre-exising locks and then run the statement.
Locking issues are vendor specific.
Asuming no triggers on the table, half a million rows is not much for a dediated database server even with many indexes on the table.
This is in regards to MS SQL Server 2005.
I have an SSIS package that validates data between two different data sources. If it finds differences it builds and executes a SQL update script to fix the problem. The SQL Update script runs at the end of the package after all differences are found.
I'm wondering if it is necessary or a good idea to some how break down the sql update script into multiple transactions and whats the best way to do this.
The update script looks similar to this, but longer (example):
Update MyPartTable SET MyPartGroup = (Select PartGroupID From MyPartGroupTable
Where PartGroup = "Widgets"), PartAttr1 = 'ABC', PartAttr2 = 'DEF', PartAttr3 = '123'
WHERE PartNumber = 'ABC123';
For every error/difference found an additional Update query is added to the Update Script.
I only expect about 300 updates on a daily basis, but sometimes there could be 50,000. Should I break the script down into transactions every say 500 update queries or something?
don't optimize anything before you know there is a problem. if it is running fast, let it go. if it is running slow, make some changes.
No, I think the statement is fine as it is. It won't make much a of a difference in speed at all.
Billy Makes a valid point if you do care about the readability of the query(you should if it is a query that will be seen or used in the future.).
Would your system handle other processes reading the data that has yet to be updated? If so, you might want to perform multiple transactions.
The benefit of performing multiple transactions is that you will not continually accumulate locks. If you perform all these updates at once, SQL Server will eventually run out of small-grained lock resources (row/key) and upgrade to a table lock. When it does this, nobody else will be able to read from these tables until the transaction completes (unless they use dirty reads or are in snapshot mode).
The side effect is that other processes that read data may get inconsistent results.
So if nodoby else needs to use this data while you are updating, then sure, do all the updates in one transaction. If there are other processes that need to use the table, then yes, do it in chunks.
It shouldn't be a problem to split things up. However, if you want to A. maintain consistency between the items, and/or B. perform slightly better, you might want to use a single transaction for the while thing.
BEGIN TRANSACTION;
//Write 500 things
//Write 500 things
//Write 500 things
COMMIT TRANSACTION;
Transactions exist for just this reason -- where program logic would be clearer by splitting up queries but where data consistency between multiple actions is desired.
All records affected by the query will be either locked or copied into tempdb if the transaction operates in SNAPSHOT isolation level.
IF the number of records is high enough, the locks may be escalated.
If transaction isolation level is not SNAPSHOT, then a concurrent query will not be able to read the locked records which may be a concurrency problem for your application.
If transaction isolation level is SNAPSHOT, then tempdb should contain enough space to accomodate the old versions of the records, or the query will fail.
If either of this is a problem for you, then you should split the update into several chunks.
My employer has developed a utility that will run a stored procedure line by line against a DataTable, passing the fields of each row as parameters into the Stored Procedure. This is particularly useful for automated imports.
However, I now need to extend this to provide a transactional-ized version so that we can see the potential results of running the utility to provide a summary of what changes will make to the database. This could be as much as '3 rows inserted into Customer table', or '5 rows amended in Orders table'. The user could then decide whether to go ahead with the real import.
I know triggers could be set up on tables, however I'm not sure this would be possible in this case as all the tables referenced by the stored procedure would not be known.
Is there any other way of viewing changes made during a transaction, or does anyone have any other suggestions on how I could achieve this?
Many thanks.
Edited based on feedback and re-reading the question:
I agree with Remus in that no serious importer of data would want to visually inspect the data as it gets imported into the system.
As an ETL Writer, I would expect to do this in my staging area, and run queries that validate my data before it gets imported into the actual production place.
You could also get into issues with resources, deadlocks and blocks by implementing functionality that "holds" transactions until visually OK'ed by someone.
You snapshot the current LSN, run yout 'line by line' procedure in a transaction, then use fn_dblog to read back the log after the LSN you snapshotted. The changes made are the records in the log that a are stamped with the current transaction id. The wrapper transaction can be rolled back. Of course this will only work with an import of 3 rows in Customer and 5 rows Orders, no serious employer would consider doing something like this on a real sized import job. Imagine importing 1 mil Orders just to count them, then rolling back...
This will not work with any arbitrary procedure though as often time procedure make their own transaction management and they don't work as expected when invoked under a wrapping transaction.
I am developing a personal PHP/MySQL app, and I came across this particular scenario in my project:
I have various comment threads. This is handled by two tables - 'Comments' and 'Threads', with each comment in 'Comments' table having a 'thread_id' attribute indicating which thread the comment belongs to. When the user deletes a comment thread, currently I am doing two separate DELETE SQL queries:
First delete all the comments belonging to the thread in the 'Comments' table
Then, clearing the thread record from the 'Threads' table.
I also have another situation, where I need to insert data from a form into two separate tables.
Should I be using transactions for these kind of situations? If so, is it a general rule of thumb to use transactions whenever I need to perform such multiple SQL queries?
It depends on your actual needs, transactions are just a way of ensuring that all data manipulation that forms a single transaction gets executed successfully, and that transactions happen sequentially (a new transaction cannot be made until the previous one has either succeeded or failed). If one of the queries fails for whatever reason, the whole transaction will fail and the previous state will be restored.
If you absolutely need to make sure that no threads will be deleted unless all the comments have been deleted beforehand, go for transactions. If you need all the speed you can get, go for MyISAM
Yes, it is a general rule of thumb to use transactions when doing multiple operations that are related. If you do switch to InnoDB (usually a good idea, but not always. We didn't really discuss any requirements besides transactions, so I won't comment more), I'd also suggest setting a constraint on Comments that points to Threads as it sounds like a comment must be assigned to a thread. Deleting the thread would then remove associated comments in a single atomic statement.
If you want ACID transactions, you want InnoDB. If having one DELETE succeed and the other fail means having to manually DELETE the failed attempt, I'd say that's a hardship better handled with the database. Those situations call for transactions.
For the first part of your question I would recommend declaring thread_id as a foreign key in your comments table. This should reference the id column of the thread table. You can then set 'ON DELETE CASCADE' this means that when the ID is removed from the thread table all comments that reference that ID will also be deleted.