My employer has developed a utility that will run a stored procedure line by line against a DataTable, passing the fields of each row as parameters into the Stored Procedure. This is particularly useful for automated imports.
However, I now need to extend this to provide a transaction-wrapped version so that we can preview the potential results of running the utility and produce a summary of the changes it would make to the database. This could be something like '3 rows inserted into the Customer table' or '5 rows amended in the Orders table'. The user could then decide whether to go ahead with the real import.
I know triggers could be set up on the tables; however, I'm not sure that would be possible in this case, as the tables referenced by the stored procedure would not be known in advance.
Is there any other way of viewing changes made during a transaction, or does anyone have any other suggestions on how I could achieve this?
Many thanks.
Edited based on feedback and re-reading the question:
I agree with Remus in that no serious importer of data would want to visually inspect the data as it gets imported into the system.
As an ETL writer, I would expect to do this in my staging area, and run queries that validate my data before it gets imported into the actual production tables.
You could also run into issues with resources, deadlocks and blocking by implementing functionality that "holds" transactions open until they are visually OK'd by someone.
You snapshot the current LSN, run your 'line by line' procedure in a transaction, then use fn_dblog to read back the log after the LSN you snapshotted. The changes made are the records in the log that are stamped with the current transaction id. The wrapper transaction can then be rolled back. Of course, this will only work with an import of 3 rows in Customer and 5 rows in Orders; no serious employer would consider doing something like this on a real-sized import job. Imagine importing 1 million Orders just to count them, then rolling back...
This will not work with any arbitrary procedure, though, as procedures often do their own transaction management and don't behave as expected when invoked inside a wrapping transaction.
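For illustration, here is a minimal sketch of that pattern, assuming SQL Server and the undocumented fn_dblog function; dbo.usp_ImportRow and its parameter are hypothetical stand-ins for the 'line by line' call:

DECLARE @startLSN NVARCHAR(46), @xactId NVARCHAR(14);

-- Snapshot the most recent LSN before doing anything.
SELECT @startLSN = MAX([Current LSN]) FROM fn_dblog(NULL, NULL);

BEGIN TRANSACTION PreviewImport;

    EXEC dbo.usp_ImportRow @CustomerName = N'Test';   -- hypothetical import call(s)

    -- Find this transaction's id in the log, then summarise its changes per table.
    SELECT @xactId = [Transaction ID]
    FROM fn_dblog(NULL, NULL)
    WHERE [Current LSN] > @startLSN
      AND [Transaction Name] = N'PreviewImport';

    SELECT AllocUnitName, Operation, COUNT(*) AS LogRecords
    FROM fn_dblog(NULL, NULL)
    WHERE [Current LSN] > @startLSN
      AND [Transaction ID] = @xactId
    GROUP BY AllocUnitName, Operation;

ROLLBACK TRANSACTION;   -- nothing is persisted; the summary above is the preview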
I have a stored procedure that calculates some facts (say usp_calculate). It fills a cache-like table. The part of the table determined by the procedure's arguments must be recalculated every 20 minutes. Basically, usp_calculate returns early if the cached data is fresh, or it spends, say, a minute calculating... and returns after that.
The usp_calculate procedure is shared by several outer procedures that need the data. How should I prevent the time-consuming part of the procedure from starting if it has already been started by some other process? How can I implement a kind of signalling and waiting for the result instead of starting the calculation again?
Context: I have a SQL stored procedure named, say, usp_products. It ultimately performs a SELECT that returns rows with a product code, a product name, and calculated information -- a special price for a customer and for the storage location. There are a lot of combinations (customers, price lists, other conditions), which makes it impractical to precalculate the information in a separate process. It must be calculated on demand, for the specific combination.
The third-party database that is the source of the information is not designed for detecting changes. In any case, the time condition (not older than 20 minutes) is considered good enough to treat the data as "fresh".
The building block for this is probably going to be application locks.
You can obtain an exclusive application lock using sp_getapplock. You can then determine if the data is "fresh enough" either using freshness information inherently contained in that data or a separate table that you use for tracking this.
At this point, if necessary, you refresh the data and update the freshness information.
Finally, you release the lock using sp_releaseapplock and let all of the other callers have their chance to acquire the lock and discover that the data is fresh.
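A minimal sketch of that pattern, assuming a hypothetical freshness-tracking table dbo.CacheFreshness and a hypothetical dbo.usp_RefreshCache that does the slow recalculation:

DECLARE @rc INT;

-- Serialise the expensive refresh: whoever acquires the lock first does the work;
-- everyone else waits here and then finds the data already fresh.
EXEC @rc = sp_getapplock
    @Resource    = N'usp_calculate-cache',
    @LockMode    = 'Exclusive',
    @LockOwner   = 'Session',
    @LockTimeout = 120000;            -- wait up to 2 minutes for the other caller

IF @rc >= 0   -- 0 = granted immediately, 1 = granted after waiting
BEGIN
    IF NOT EXISTS (SELECT 1
                   FROM dbo.CacheFreshness
                   WHERE CacheKey = N'usp_calculate-cache'
                     AND RefreshedAt > DATEADD(MINUTE, -20, SYSUTCDATETIME()))
    BEGIN
        EXEC dbo.usp_RefreshCache;    -- hypothetical: the slow recalculation

        UPDATE dbo.CacheFreshness
        SET RefreshedAt = SYSUTCDATETIME()
        WHERE CacheKey = N'usp_calculate-cache';
    END

    EXEC sp_releaseapplock @Resource = N'usp_calculate-cache', @LockOwner = 'Session';
END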
I'm trying to figure out how I am supposed to do this; can someone give me an explanation or show some examples?
This is the question:
Start two Batch sql instances beside each other, log in twice to the same database, and run two concurrent transactions. Show the effect of commit and rollback, and what happens if the two transactions try to commit conflicting changes. (Some hints can be found in the transaction example from the lecture.) Remember that by default, each SQL statement is considered its own transaction in Batch sql, and you have to give the command start transaction to start a multi-statement transaction.
I've tried to look around on the internet for an answer but since this question is broad it's kinda difficult for my level of understanding.
I think you need to do things like below and document/explain the results.
Create a table
Add some data into it
Update the data
Update different data in two separate concurrent transactions
Update the same data in two separate concurrent transactions
Update one unique row to be the same as another
Set up a circular reference (deadlock). Can an INSERT operation result in a deadlock?
I suspect the last one is what they are after, but I'm not your lecturer so it's hard to know :) Have you tried asking for more clarity?
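As a rough sketch of the kind of two-session experiment that list describes (the table name and values are just examples; run each "Session" step in its own window):

-- Setup (run once in either session):
CREATE TABLE accounts (id INT PRIMARY KEY, balance INT);
INSERT INTO accounts VALUES (1, 100), (2, 100);

-- Session A, step 1:
START TRANSACTION;
UPDATE accounts SET balance = 50 WHERE id = 1;

-- Session B, step 2 (in the second window):
START TRANSACTION;
UPDATE accounts SET balance = 75 WHERE id = 1;   -- blocks: session A holds the row lock

-- Session A, step 3:
COMMIT;                                          -- session B's UPDATE now proceeds

-- Session B, step 4:
COMMIT;                                          -- balance ends up 75; repeat with ROLLBACK in A to compare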
I have a copy database that has one table. It needs to be refreshed each night. There are four approaches:
Use a VB6 recordset or .NET datareader to loop through all the records in the table. Make the appropriate changes.
Use SSIS to truncate the data in the table and then refresh it
Use a checksum to establish what records have changed then refresh that data using SSIS
Use a checksum to establish what records have changed then refresh that data using a recordset or datareader
The problem with approach one is that it is too slow. It takes about two weeks to run as there are 90,000,000 records. Approach four is too slow as well. There are about 20,000 updates per day.
Therefore I believe it is between option two and option three. Option two only takes about fifteen minutes. However, users could be searching the table whilst it is being truncated and refreshed.
I am wondering if I can use transactions to isolate the work. However, if I use a serializable transaction whilst the data is being refreshed, then the table is locked for fifteen minutes. Is there another option?
What about a stored procedure that gets kicked off with a SQL job? It would probably work similarly to #1 but be faster. Re-indexing before doing your update would probably help too.
Definitely option 3. Truncating and reloading 90 million rows just to refresh 20,000 is (as I'm sure you know) very inefficient and will just fill up your transaction log unnecessarily.
SSIS has full transaction handling support, so should easily be able to meet your needs.
Here is a good article on using transactions with SSIS: https://www.mssqltips.com/sqlservertip/1585/how-to-use-transactions-in-sql-server-integration-services-ssis/
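For option 3, a rough sketch of checksum-based change detection, assuming example table and column names (SourceTable, CopyTable, Id, Col1, Col2) since the real schema isn't shown:

-- Rows that are new or whose checksum differs are the ~20,000 to refresh.
WITH src AS (
    SELECT Id, CHECKSUM(Col1, Col2) AS RowChecksum
    FROM SourceDb.dbo.SourceTable
),
tgt AS (
    SELECT Id, CHECKSUM(Col1, Col2) AS RowChecksum
    FROM CopyDb.dbo.CopyTable
)
SELECT s.Id
FROM src AS s
LEFT JOIN tgt AS t ON t.Id = s.Id
WHERE t.Id IS NULL                       -- new rows
   OR t.RowChecksum <> s.RowChecksum;    -- changed rows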
I have one question regarding triggers.
The scenario is like this:
CREATE PROCEDURE dbo.usp_InsertXYZ @b INT
AS
BEGIN
    INSERT INTO dbo.XYZ (a) VALUES (@b);
END
Now I have placed an AFTER INSERT trigger on table XYZ.
That trigger contains business logic which takes 2-3 seconds to execute; the business logic is performed against other database tables, not against the XYZ table.
So what I need to confirm here is: once the INSERT has been done, will table XYZ be ready to accept inserts for other records, or will it be locked until the trigger has completed?
EDIT
I have done some more research on this issue and explain it below.
In the INSERT trigger, I have put my business logic and also the line below:
WAITFOR DELAY '00:01'
Now when I execute the above SP, it does not complete for 1 minute (as I have specified a delay of 1 minute in the trigger), and table XYZ is also locked during this period.
So this brings me to the conclusion that the trigger does LOCK the table even if you are not using the same table in the trigger. Am I right? Does anyone have a different opinion here?
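For reference, here is a reconstruction of the test described above (object names are assumed). The trigger runs inside the same transaction as the INSERT, which is why the statement does not complete until the trigger does:

CREATE TABLE dbo.XYZ (a INT);
GO
CREATE TRIGGER trg_XYZ_Insert ON dbo.XYZ
AFTER INSERT
AS
BEGIN
    -- business logic against other tables would go here
    WAITFOR DELAY '00:01';   -- simulate 1 minute of work
END;
GO
-- Session 1:
INSERT INTO dbo.XYZ (a) VALUES (1);   -- does not return for ~1 minute

-- Session 2 (run while session 1 is still waiting):
INSERT INTO dbo.XYZ (a) VALUES (2);   -- in the test described above, this waited too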
The question and answer linked to by @Hallainzil show one approach:
Wrap all table INSERTs and UPDATEs into Stored Procedures
The SP can then complete the additional business logic without holding a lock on the table, as sketched below
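A minimal sketch of that idea, with assumed names (dbo.usp_InsertXYZ, dbo.usp_DoBusinessLogic): commit the INSERT in a short transaction, then run the slow logic with no lock held on XYZ:

CREATE PROCEDURE dbo.usp_InsertXYZ
    @b INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Short transaction: the lock on XYZ is released as soon as this commits.
    BEGIN TRANSACTION;
        INSERT INTO dbo.XYZ (a) VALUES (@b);
    COMMIT TRANSACTION;

    -- The slow business logic against other tables runs afterwards.
    EXEC dbo.usp_DoBusinessLogic @a = @b;   -- hypothetical
END;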
There is also another approach which is slightly messier in several ways, but also more flexible in many ways:
Keep a record of which rows have been INSERTed or UPDATEd
Have an agent job fire repeatedly or overnight to process those changes
You may use a trigger to keep that record, maybe with a LastModifiedTime field, or a hasBeenProcessed field, or even a separate tracking table. It can be done in many ways, and is relatively lightweight to maintain (none of the business logic happens yet).
This releases your table from any locks as quickly as possible. It also means that you are able to deal with logins that have the ability to write directly to your table, circumventing your Stored Procedures.
The downside is that your INSERTs/UPDATEs and your business logic are processed asynchronously. Your other SQL code may need to check whether or not the business logic has been completed yet, rather than just assuming that both the INSERT and the business logic always happen atomically.
So, yes, there are ways of avoiding this locking. But you introduce additional constraints and/or complexity to your model. This is by no means a bad thing, but it needs to be considered within your overall design.
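A minimal sketch of the trigger-plus-tracking-table option, with assumed names; the trigger only records which rows changed, and the agent job does the heavy work later:

CREATE TABLE dbo.XYZ_Changes (
    XYZId     INT       NOT NULL,
    ChangedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Processed BIT       NOT NULL DEFAULT 0
);
GO
CREATE TRIGGER trg_XYZ_Track ON dbo.XYZ
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Cheap insert only, so the lock on XYZ is held very briefly.
    INSERT INTO dbo.XYZ_Changes (XYZId)
    SELECT a FROM inserted;
END;
GO
-- An agent job then periodically runs something like:
-- EXEC dbo.usp_ProcessXYZChanges;   -- hypothetical: business logic for Processed = 0 rows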
This is in regards to MS SQL Server 2005.
I have an SSIS package that validates data between two different data sources. If it finds differences it builds and executes a SQL update script to fix the problem. The SQL Update script runs at the end of the package after all differences are found.
I'm wondering whether it is necessary or a good idea to somehow break the SQL update script down into multiple transactions, and what's the best way to do this.
The update script looks similar to this, but longer (example):
UPDATE MyPartTable
SET MyPartGroup = (SELECT PartGroupID FROM MyPartGroupTable WHERE PartGroup = 'Widgets'),
    PartAttr1 = 'ABC', PartAttr2 = 'DEF', PartAttr3 = '123'
WHERE PartNumber = 'ABC123';
For every error/difference found, an additional UPDATE query is added to the update script.
I only expect about 300 updates on a daily basis, but sometimes there could be 50,000. Should I break the script down into transactions every say 500 update queries or something?
Don't optimize anything before you know there is a problem. If it is running fast, let it go. If it is running slow, make some changes.
No, I think the statement is fine as it is. It won't make much of a difference in speed at all.
Billy makes a valid point if you do care about the readability of the query (and you should, if it is a query that will be seen or used in the future).
Would your system handle other processes reading the data that has yet to be updated? If so, you might want to perform multiple transactions.
The benefit of performing multiple transactions is that you will not continually accumulate locks. If you perform all these updates at once, SQL Server will eventually run out of fine-grained lock resources (row/key) and escalate to a table lock. When it does this, nobody else will be able to read from these tables until the transaction completes (unless they use dirty reads or snapshot isolation).
The side effect is that other processes that read data may get inconsistent results.
So if nobody else needs to use this data while you are updating, then sure, do all the updates in one transaction. If there are other processes that need to use the table, then yes, do it in chunks.
It shouldn't be a problem to split things up. However, if you want to A. maintain consistency between the items, and/or B. perform slightly better, you might want to use a single transaction for the whole thing.
BEGIN TRANSACTION;
-- write 500 things
-- write 500 things
-- write 500 things
COMMIT TRANSACTION;
Transactions exist for just this reason -- where program logic would be clearer by splitting up queries but where data consistency between multiple actions is desired.
All records affected by the query will be either locked or, if the transaction operates under the SNAPSHOT isolation level, copied into tempdb as row versions.
If the number of records is high enough, the locks may be escalated.
If the transaction isolation level is not SNAPSHOT, then a concurrent query will not be able to read the locked records, which may be a concurrency problem for your application.
If the transaction isolation level is SNAPSHOT, then tempdb should contain enough space to accommodate the old versions of the records, or the query will fail.
If either of these is a problem for you, then you should split the update into several chunks.
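One simple way to split it into chunks, given that the package already generates a script of single-row UPDATE statements, is to commit every few hundred statements so locks are released as the script progresses (the statement bodies below are just examples):

BEGIN TRANSACTION;
UPDATE MyPartTable SET PartAttr1 = 'ABC' WHERE PartNumber = 'ABC123';
UPDATE MyPartTable SET PartAttr1 = 'GHI' WHERE PartNumber = 'DEF456';
-- ... up to roughly 500 single-row updates ...
COMMIT TRANSACTION;

BEGIN TRANSACTION;
-- ... the next batch of roughly 500 updates ...
COMMIT TRANSACTION;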