How do I speed up an SSIS Transfer Server Objects task that runs really slowly? - sql-server-2005

Within SSIS 2005 I used the Import/Export wizard to create a package that drops/recreates and replaces the data on some tables between my production server and development machine. The control flow that the wizard created was extremely complicated, so I created a new package and used the "Transfer SQL Server Objects Task", which is really easy to configure and set up compared with the crazy thing the wizard created. The problem is that the package I created takes over 3 minutes to run, while the wizard version takes about 20 seconds. They are basically doing the same thing, so why is there such a difference in execution time? Is there a setting I can change in the package that uses the Transfer Objects task to make it run quicker?
Here is the package that the wizard created. I have created similar packages before using the wizard that I had no problem editing, but I have never seen anything like this before. I cannot figure out where to modify the tables and schema that I drop and create.
(Screenshot: http://www.freeimagehosting.net/uploads/f7323b2ce3.png)
Here are the properties of the transfer task inside that For Loop container:
(Screenshot: http://www.freeimagehosting.net/uploads/6f0dfc8269.png)

What connection type are you using?
In my experience transferring between Oracle and SQL Server, the ADO.NET provider is miles slower than the Oracle OLE DB provider.

Why not use the wizard-generated package and figure out what it does? It is obviously doing things very efficiently.

It could be quite a number of things. Are you doing lookups? If so, use joins instead. You can also run a database profiler trace to compare what the wizard package does with what your custom package does.

I don't use the wizard, but could it have created a stored procedure that will actually do the work? That would explain how it is going faster, since the stored procedure can do all the work within the database.
I am curious what is within TransferTask, as that seems to be where all the work is done.
You could look at exporting the data to a flat file, then using a Bulk Import to do this faster.
For more thoughts on how fast things go, look here; the most important part is some of the comments, such as the one on how he used Bulk Insert wrong.
http://weblogs.sqlteam.com/mladenp/articles/10631.aspx
UPDATE:
You may want to also look at this:
http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/title-12 Toward the end, he shows how long his tests took, but the first comment may be the most useful part for speeding up your import.

This class of performance problem usually stems from "commit" levels and logging.
The illustrated wizard-generated task does a "start transaction" before entering the loop and commits after all the data is transferred, which is the best thing to do if the table is not enormous.
Have you left "autocommit" on in your hand-coded version?
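To make the commit point concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for SQL Server (that substitution is an assumption, and the `target` table and function names are hypothetical). It shows the two loading styles side by side: a commit after every row versus one transaction around the whole batch.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def load_autocommit_style(conn, rows):
    """Insert rows one at a time, committing after every single row."""
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        conn.commit()  # one commit per row


def load_single_transaction(conn, rows):
    """Insert all rows inside one transaction and commit once at the end."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)
```

Both functions leave the same rows in the table. On a disk-backed database the per-row version typically pays for a log flush on every commit, which is the kind of overhead the answer above is pointing at.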

Use the Fast Parse option on integer and date columns imports if not locale specific
Use the SQL Server Native Client 10.x OLE DB provider for an in-memory, high-performance connection, or consider using the Attunity drivers for SQL Server <---> Oracle
Set the IsSorted property on the output of an upstream data flow component to True.
Select the OLE DB Destination Data Access mode “Table or View – fast load”
Run tasks in parallel; do not add unneeded precedence constraints
Avoid using select * in the data flow task
Enable optimized mode (the [RunInOptimizedMode] property). Optimized mode improves performance by removing unused columns, outputs, and components from the data flow.
Uncheck the Check Constraints box
Set the network packet size to 32k instead of the default 4k
Drop indexes on truncated/reloaded tables, and consider using TRUNCATE instead of DELETE
If tables change slightly consider using Merge
Consider using a dynamic index rebuild SP like the famous one listed here:
Load test it using UAT with SQL Server Profiler set to filter on application "ssis-%"
The default buffer size is 10 megabytes, with a maximum buffer size of 100 megabytes.
Separate MDF/LDFs as well as TempDB, and defragment disks
Find bottleneck in your Database by using DMVs
Change to RAID 10 or 0 from RAID 5 or other

Related

SQL: Automatically copy records from one database to another database

I am trying to find an ideal way to automatically copy new records from one database to another. The databases have different structures. I achieved it by writing VBS scripts that copy the data from one to the other, and I triggered the scripts from another application that passes arguments to the script. But I ran into problems once there were more than 100 triggers, i.e. 100 wscript processes trying to access the database, and they couldn't complete the task.
I want to find out a simpler solution inside SQL, I read about setting triggers, Stored procedure and running them from SQL agent, replication etc. The requirement is that I have to copy records to another database periodically or when there is a new record into another database.
Which method will suit me the best?
You can use CDC (Change Data Capture) to do this. Create an SSIS package using CDC and run that package periodically through a SQL Server Agent job. CDC will store all the changes to that table and apply them to the destination table when you run the package. Please follow the link below.
http://sqlmag.com/sql-server-integration-services/combining-cdc-and-ssis-incremental-data-loads
The word periodically in your question suggests that you should go for Jobs. You can schedule jobs in SQL Server using Sql Server agent and assign a period. The job will run your script as per assigned frequency.
PrabirS: Change Data Capture
This is a good option because it uses the transaction log to create something similar to the Command Query Responsibility Segregation (CQRS) pattern.
Alok Gupta: A SQL Job that runs in the SQL Agent
This too is a good option, provided you have something like a modified date so that you can filter the altered data. You can create a stored procedure and let it run regularly in SQL Server Agent.
A third option could be triggers (the change will happen in the same transaction).
This option is useful for auditing and logging. But you should definitely avoid writing business logic in triggers, as triggers are more or less hidden and occur without directly calling them (similar to CDC actually). I have actually created a trigger about half a year ago that captured the data and inserted it somewhere else in xml-format as the columns in the original table could change over time (multiple projects using the same database(s)).
-Edit-
By the way, your question more or less suggests the lack of a clear design pattern, and that the technique used is not the main problem. You could read up on how an ETL layer is built, or try to implement a "separation of concerns". Note: it is hard to tell if this is the case, but given how you formulated your question, an unclear design is something that comes to mind as a possible problem.

SQL Server synonyms with concurrent execution

I work in a DW project where we do ETL using T-SQL with SQL Server 2012. Our code works well so far for a set of tables coming from a legacy system. However, the Data Architect has announced we will receive tables from other legacy systems. He wants the same piece of code to be able to handle all the tables of all the systems since they will have the same structure but exist in separate DBs in the same server.
On top of that, he requires the existing piece of code to run at the same time for all of the system's tables so we save execution time. I have been working on a POC using synonyms which seems to do the trick but it won't execute our code for different set of tables at the same time because we will end up overwriting the references of the synonym objects currently being executed. So I am now rethinking the whole problem: How to avoid rewriting the same code we have for each set of tables we get while at the same time allowing concurrent execution? Any insight or suggestion will be appreciated. Thanks.
I would look at using SSIS to do your ETL. You can change the location of the source data at run time and as long as the structure is the same it should work.
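The idea of choosing the source location at run time, rather than repointing a shared synonym, can be sketched roughly as follows in Python. Everything named here is hypothetical: the `dw.dbo.Customer` target, the column names, and the `execute` callable, which stands in for whatever sends a statement to the server on its own connection. Because each run gets its own fully qualified statement, concurrent runs do not overwrite each other's references.

```python
from concurrent.futures import ThreadPoolExecutor


def quote_name(identifier):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_load_sql(source_db):
    """Build the load statement for one legacy database.

    Table and column names are hypothetical placeholders.
    """
    src = quote_name(source_db)
    return (
        "INSERT INTO dw.dbo.Customer (CustomerId, Name) "
        f"SELECT CustomerId, Name FROM {src}.dbo.Customer;"
    )


def run_all(source_dbs, execute, max_workers=4):
    """Run the same load for every source database concurrently.

    `execute` is any callable that sends one SQL string to the server
    (an assumption; in practice it would wrap a database driver).
    """
    statements = [build_load_sql(db) for db in source_dbs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all workers and re-raises any worker exception
        list(pool.map(execute, statements))
    return statements
```

The same approach works inside T-SQL with dynamic SQL built from a database-name parameter; the sketch only shows the shape of it.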

Database caching

I have windows server 2008 r2 with microsoft sql server installed.
In my application, I am currently designing a tool for my users, that is querying database to see, if user has any notifications. Since my users can access the application multiple times in a short timespan, i was thinking about putting some kind of a cache on my query logic. But then I thought, that my ms sql server probably does that already for me. Am I right? Or do I need to configure something to make it happen? If it does, then for how long does it keep the cache up?
It's safe to assume that MSSQL has the caching worked out pretty well =)
Don't bother trying to build anything yourself on top of it, simply make sure that the method you use to query for changes is efficient (eg. don't query on non-indexed columns).
PS: wouldn't caching locally defeat the whole purpose of checking for changes on the database?
Internally the database does all sorts of things, including 'caching', but at all times it works incredibly hard to make sure your users see up-to-date data. So it has to do some work each time your application makes a request.
If you want to reduce the workload by keeping static data in your application then you have to implement it yourself.
The later versions of the .net framework have caching features built in so you should take a look at those (building your own caching can get very complex).
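If you do decide to keep results in the application for a short time, a minimal time-to-live cache could look something like this Python sketch. The class name, the `loader` callback (which would run the notification query), and the injectable `clock` are all assumptions made for illustration; the .NET caching features mentioned above would play the same role.

```python
import time


class TTLCache:
    """Cache one value per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value for key, calling loader(key) when stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the database round trip
        value = loader(key)  # missing or expired: query again
        self._entries[key] = (now + self.ttl, value)
        return value
```

The trade-off is the one raised above: a user may not see a new notification until the entry expires, so the time-to-live has to be short enough to be acceptable.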
SQL Server will handle caching for you, yes. When you create a query or a stored procedure SQL Server will cache that execution plan and reuse it accordingly. From MSDN:
SQL Server execution plans have the following main components:
Query Plan: The bulk of the execution plan is a re-entrant, read-only data structure used by any number of users. This is referred to as the query plan. No user context is stored in the query plan. There are never more than one or two copies of the query plan in memory: one copy for all serial executions and another for all parallel executions. The parallel copy covers all parallel executions, regardless of their degree of parallelism.
Execution Context: Each user that is currently executing the query has a data structure that holds the data specific to their execution, such as parameter values. This data structure is referred to as the execution context. The execution context data structures are reused. If a user executes a query and one of the structures is not being used, it is reinitialized with the context for the new user.
If you wish to clear this cache you can execute sp_recompile (for a specific object) or DBCC FREEPROCCACHE (for the whole plan cache).

How do you translate old SQL database data to a new table layout?

We have an old database with a poorly thought out table structure, virtually no relationships set up and no naming schemes. I've created a new database with a clean relational data structure that implements proper design practices.
I'm looking for advice on different methods to migrate the old data over to the new format. This will require a lot of data re-shaping which won't be fun. The data is heavily accessed and the challenge will be to keep both databases in sync for all relevant data (accounts, important services etc).
I thought triggers might be the way to go here - but maybe there is a different method that I am unaware of (maybe MS Sync Framework, or a code-level data adapter which will be more work because there is so much data access code spread all over the place, classic ASP and .Net over dozens of projects). The database in question is SQL Server 2005, running in SQL Server 2000 compatibility mode.
I think the way to go is to write a stored procedure in the new database, which will actually pull your delta changes (only the modifications that were done from the last run to the instant the stored proc is run), and put this stored procedure in the sql agent job.
Configure the sql agent job to run for every 15 minutes and let the data sync in.
Disadvantages of using triggers in this scenario:
Triggers will reduce performance, because SQL Server executes the trigger code as part of every update/insert/delete statement. For example, if your trigger code takes 2 seconds to execute and the update statement with no trigger takes 2 seconds, then the update will take 4 seconds with the trigger in place. So employing triggers in this case might result in a huge performance bottleneck.
I'm dealing with the same situation at my work, and I'm currently writing an application to do the migration. The original database has no established relationships, so it's really like a set of disconnected spreadsheets. By building my own application, I'm able to migrate the data using newly-established foreign keys, and assign data-specific defaults in place of nulls.

Queries for migrating data in live database?

I am writing code to migrate data from our live Access database to a new Sql Server database which has a different schema with a reorganized structure. This Sql Server database will be used with a new version of our application in development.
I've been writing migration code in C# that calls Sql Server and Access and transforms the data as required. I recently migrated, for the first time, a table whose entries relate to new entries in another table that I had not updated recently, and that caused an error because the record in the corresponding table in SQL Server could not be found.
So, my SqlServer productions table has data only up to 1/14/09, and I'm continuing to migrate more tables from Access. So I want to write an update method that can figure out what the new stuff is in Access that hasn't been reflected in Sql Server.
My current idea is to write a query on the SQL side which does SELECT Max(RunDate) FROM ProductionRuns, to give me the latest date in that field in the table. On the Access side, I would write a query that does SELECT * FROM ProductionRuns WHERE RunDate > ?, where the parameter is that max date found in SQL Server, and perform my translation step in code, and then insert the new data in Sql Server.
What I'm wondering is, do I have the syntax right for getting the latest date in that Sql Server table? And is there a better way to do this kind of migration of a live database?
Edit: What I've done is make a copy of the current live database, which I can then migrate without worrying about changes and use for testing during development. Then I can migrate the latest data whenever the new database and application go live.
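The MAX(RunDate) idea described in the question can be sketched as follows. This is only an illustration in Python, with two sqlite3 connections standing in for Access and Sql Server (an assumption); the `RunId` and `Quantity` columns and the `transform` hook are hypothetical, and dates are assumed to be stored as ISO-formatted text so that they compare correctly.

```python
import sqlite3


def migrate_new_runs(source, target, transform=lambda row: row):
    """Copy rows from source that are newer than the latest row in target."""
    # High-water mark: the latest RunDate already present in the target.
    (max_date,) = target.execute(
        "SELECT MAX(RunDate) FROM ProductionRuns"
    ).fetchone()
    if max_date is None:
        # Empty target: everything in the source is new.
        new_rows = source.execute(
            "SELECT RunId, RunDate, Quantity FROM ProductionRuns"
        ).fetchall()
    else:
        new_rows = source.execute(
            "SELECT RunId, RunDate, Quantity FROM ProductionRuns "
            "WHERE RunDate > ?",
            (max_date,),
        ).fetchall()
    with target:  # insert the whole batch in a single transaction
        target.executemany(
            "INSERT INTO ProductionRuns (RunId, RunDate, Quantity) "
            "VALUES (?, ?, ?)",
            [transform(row) for row in new_rows],
        )
    return len(new_rows)
```

One caveat with a strict `>` comparison: rows added to the source later with the same RunDate as the current maximum would be skipped, so a key-based comparison may be safer if several runs can share a date.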
I personally would divide the process into two steps.
I would create an exact copy of Access DB in SQLServer and copy all the data
Copy the data from this temporary SQLServer DB to your destination database
That way you can write a set of SQL scripts to accomplish the second step
Alternatively use SSIS
Generally when you convert data to a new database that will take its place in production, you shut out all users of the database for a period of time, run the migration and turn on the new database. This ensures no changes to the data are made while doing the conversion. Of course I never would have done this using C# either. Data migration is a database task and should have been done in SSIS (or DTS if you have an older version of SQL Server).
If the database you are converting to is just in development, I would create a backup of the Access database and load the data from there to test the data loading process and to get the data in so you can do the application development. Then when it is time to do the real load, you just close down the real database to users and use it to load from. If you are trying to keep both in sync while you develop, well, I wouldn't do that, but if you must, make a nightly backup of the file and load first thing in the morning using your process.
You may want to look at investing in a tool like SQL Data Compare.
I believe it has support for access databases too, and you can download a trial.
If you are happy with your C# code, but it fails because of the constraints in your destination database, you can temporarily disable them and then re-enable them after you copy the whole lot.
I am assuming that your destination database is a brand-new DB with no data, and not used by anyone when the transfer happens
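As a rough illustration of disabling constraints for the duration of a load, here is a Python sketch against sqlite3, where the switch is `PRAGMA foreign_keys`. Using SQLite is an assumption made so the example is runnable; on SQL Server the corresponding statements are `ALTER TABLE ... NOCHECK CONSTRAINT ALL` before the load and `ALTER TABLE ... WITH CHECK CHECK CONSTRAINT ALL` afterwards. The function name and the shape of `statements` are hypothetical.

```python
import sqlite3


def load_without_fk_checks(conn, statements):
    """Run load statements with foreign-key enforcement off, then verify.

    `statements` is a list of (sql, params) pairs. Returns the rows that
    violate a foreign key after the load; an empty list means none do.
    """
    conn.commit()  # the pragma has no effect inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        with conn:  # the whole load is one transaction
            for sql, params in statements:
                conn.execute(sql, params)
    finally:
        conn.execute("PRAGMA foreign_keys = ON")  # always re-enable
    return conn.execute("PRAGMA foreign_key_check").fetchall()
```

Re-enabling enforcement does not validate rows that are already there, which is why the sketch runs an explicit check at the end and reports any orphaned rows.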
It sounds like you have two problems:
You're migrating data from one database to another.
You're changing your schema.
Doing either of these things is tricky if you are trying to migrate the data while people are using the data.
The simplest approach is to migrate the data based on a static copy of the data, and also to queue updates to that data from the moment you captured the static copy. I don't know how easy this is in Access, but in SQLServer or Oracle you can use the redo logs for this or a manual solution using triggers. The poor-man's way of doing this is to make triggers for all the relevant tables that log the primary key of the records that have changed. Then after the old database is shut off you can iterate over those keys and get those records from the old database and put them into the new database. Just copy the whole record; if the record was deleted then delete it from the new database.
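The poor-man's trigger log described above can be sketched like this. It is a Python illustration using sqlite3 for both the old and new database (an assumption), with a hypothetical `accounts` table; a real migration would apply the schema transformation where this sketch simply copies the row.

```python
import sqlite3

CHANGE_LOG_SQL = """
CREATE TABLE changed_keys (id INTEGER PRIMARY KEY);  -- one row per touched key

CREATE TRIGGER accounts_ins AFTER INSERT ON accounts
BEGIN
    INSERT OR IGNORE INTO changed_keys (id) VALUES (NEW.id);
END;

CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts
BEGIN
    INSERT OR IGNORE INTO changed_keys (id) VALUES (OLD.id);
    INSERT OR IGNORE INTO changed_keys (id) VALUES (NEW.id);
END;

CREATE TRIGGER accounts_del AFTER DELETE ON accounts
BEGIN
    INSERT OR IGNORE INTO changed_keys (id) VALUES (OLD.id);
END;
"""


def install_change_log(old_db):
    """Start logging changed primary keys (run right after the static copy)."""
    old_db.executescript(CHANGE_LOG_SQL)


def replay_changes(old_db, new_db):
    """Apply every logged key to new_db: copy the row, or delete it if gone."""
    keys = [key for (key,) in old_db.execute("SELECT id FROM changed_keys")]
    with new_db:
        for key in keys:
            row = old_db.execute(
                "SELECT id, name FROM accounts WHERE id = ?", (key,)
            ).fetchone()
            if row is None:
                new_db.execute("DELETE FROM accounts WHERE id = ?", (key,))
            else:
                new_db.execute(
                    "INSERT OR REPLACE INTO accounts (id, name) VALUES (?, ?)",
                    row,
                )
    with old_db:
        old_db.execute("DELETE FROM changed_keys")  # log has been consumed
    return keys
```

Logging only the key keeps the triggers cheap, and the replay step always reads the current state of the row, so several changes to the same record collapse into one copy.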
Your problem is compounded by the fact that you can't simply copy the data, you have to transform it. This means you probably have to shut down both databases and re-migrate the records based on the change list. It will take a lot of planning to ensure you get things right and I'd recommend writing a testing script that can validate that the resulting data is correct.
Also I'd ensure that the code for the migration runs inside one of the databases if possible. Otherwise you are copying the data twice and this will significantly harm the performance.