I have more than a hundred tables in a linked server (let's say on SQL Server 1). I have to perform an initial load, basically a simple dump, by creating duplicate copies of those hundred-plus tables in the SQL Server 2 destination. I know how to use a Data Flow Task in SSIS to extract data from a source and load it into a destination (creating the table in the destination as well). With more than a hundred tables, though, I would need to create more than a hundred data flow tasks, which is very time consuming. I have heard about copying tables from source to destination dynamically by looping through them and creating variables. How do I do this? Remember, those hundred tables do not share the same structure. How can I perform this initial load faster without using multiple data flow tasks in SSIS? Please help! Thank you!
I would use the Import and Export Wizard - it can do all those tasks you described in one pass. You just tell it your Source and Destination and check the tables you want. It generates an SSIS package, which you can save and customize if you want.
It's easiest to find via the Windows Start Menu, under SQL Server [version] / Import and Export Data.
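If you later want to script the copy instead of clicking through the wizard, the looping idea from the question can also be done in plain T-SQL against the linked server. This is only a rough sketch, assuming a linked server named LINKEDSVR, a source database SourceDb, and that everything lives in the dbo schema (all placeholders); SELECT ... INTO creates each destination table with matching column types but no keys or indexes:

    DECLARE @tbl sysname, @sql nvarchar(max);

    -- loop over every user table exposed by the linked server
    DECLARE tbl_cursor CURSOR FOR
        SELECT TABLE_NAME
        FROM LINKEDSVR.SourceDb.INFORMATION_SCHEMA.TABLES
        WHERE TABLE_TYPE = 'BASE TABLE';

    OPEN tbl_cursor;
    FETCH NEXT FROM tbl_cursor INTO @tbl;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- SELECT INTO creates the destination table on the fly, whatever its structure
        SET @sql = N'SELECT * INTO dbo.' + QUOTENAME(@tbl)
                 + N' FROM LINKEDSVR.SourceDb.dbo.' + QUOTENAME(@tbl) + N';';
        EXEC sp_executesql @sql;
        FETCH NEXT FROM tbl_cursor INTO @tbl;
    END

    CLOSE tbl_cursor;
    DEALLOCATE tbl_cursor;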
If you would like to automate transferring data from 100+ tables, I would consider using BIML. BIML is a script language that enables you to generate SSIS packages based on the template you define. This template in your case may include the creation of the tables (if they do not exist) and the mapping / copying of the source. You can then wrap the resulting SSIS packages inside another BIML Master package.
It can be a little clunky if you are not using Mist, but it's incredibly powerful once you get into it. A good starting point for you would be Andy Leonard's Stairway to Biml series, as it provides a step-by-step walkthrough of moving data from source to target. After the stairway guide, check out BimlScript.
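Under the covers, a BimlScript template for this kind of job can simply loop over a metadata query against the source and emit one data flow per table. A rough sketch of that driver query (the database name is a placeholder):

    -- one row per source table; a Biml template can iterate over this result set
    SELECT TABLE_SCHEMA, TABLE_NAME
    FROM SourceDb.INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE'
    ORDER BY TABLE_SCHEMA, TABLE_NAME;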
I am attempting to create an ETL design document that details source and destination columns and their relations, by looking at metadata for the columns. The format should be similar to below:
Final document format
I have utilised something similar to the site below to get data for the output/destination database:
MS SQL TIPS - list-columns-and-attributes-for-every-table-in-a-sql-server-database
Now I am trying to get the same information for the source database, but I am not sure whether I am allowed to run such a query directly on the source, as it holds important data.
Is there a way I can use SSIS or look at the source in SSMS to see all relationships I need?
I have the packages in SSIS that detail what transformations I will apply to the source via SQL queries. I've tried looking at the packages individually, but there are a lot of them, and there should be an easier way that I am missing.
Depends on your sources. If they are all queries from tables, you can probably parse them from the .dtsx files. If any of them are stored procs or views, then there's probably nothing you can do without querying the source database.
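If you can get read access to the source, note that the column listing itself is only a metadata query along the same lines as the MSSQLTips approach used for the destination; it never reads the table data. A minimal sketch, assuming schema, table, column, type and nullability are all you need for the design document:

    SELECT c.TABLE_SCHEMA,
           c.TABLE_NAME,
           c.COLUMN_NAME,
           c.DATA_TYPE,
           c.CHARACTER_MAXIMUM_LENGTH,
           c.IS_NULLABLE
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    ORDER BY c.TABLE_SCHEMA, c.TABLE_NAME, c.ORDINAL_POSITION;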
I want to make a SSIS package that will load data from an excel file to a database table.
I have already made a package that completes the task, but the table needs to be recreated every time the Excel data is loaded, because the Excel data and its column definitions change every month. If the table is not recreated with every execution, there will be errors and the task will fail because the Excel data will be loaded under the wrong column definitions.
Is there any way to dynamically drop and create the table every time?
SSIS generates a lot of metadata behind the scenes which describe your source and destination fields and mappings. If it feels it isn't talking to the same data source/destination it will often throw a validation error and refuse to start.
To an extent you can mitigate this by setting DelayValidation to True on the connection manager properties. However this is unlikely to help in your case as your data spec genuinely is changing.
A further option to get around the tight controls on data format is to write your own custom source/transformation/destination logic in script tasks. You can write a script task to read the format of the incoming Excel file, and save those details to variables and pass them into an Execute SQL Task to create the table. Then you can write a script task to dynamically map your data to some generic intermediate columns that exist within your package. Finally you can write a script task destination to load that data into the newly created table.
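For the Execute SQL Task piece, the DDL it runs would be built from the column names the script task pulled out of the Excel file. A rough sketch of that statement, with the table name and column list as placeholders passed in from package variables (DROP TABLE IF EXISTS is SQL Server 2016+ syntax; use IF OBJECT_ID(...) IS NOT NULL on older versions):

    -- @columnList would be assembled by the script task from the Excel header row
    DECLARE @columnList nvarchar(max) = N'[Month] nvarchar(50), [Amount] decimal(18,2)';
    DECLARE @sql nvarchar(max);

    -- drop last month's version of the table, then recreate it with the new layout
    SET @sql = N'DROP TABLE IF EXISTS dbo.MonthlyImport;'
             + N' CREATE TABLE dbo.MonthlyImport (' + @columnList + N');';
    EXEC sp_executesql @sql;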
Basically you're bypassing all out of the box SSIS functionality and writing your own integration solution from the ground up, but it seems you don't have much choice.
I won't have access to SSIS until tomorrow so I thought I'd ask for advice before I start work on this project.
We currently use Access to store our data. It's not stored in a relational format so it's an awful mess. We want to move to a centralized database (SQL Server 2008 R2), which would require rewriting much of our codebase (which, incidentally, is also an awful mess.) Due to a time constraint, well before that can be done we are going to need to get a centralized database set up solely for the purpose of on-demand report generation for a client. So, our applications will still be running on Access. Instead of:
Receive data -> Import to Access initial file with one table -> Data processing -> Access result file with one table -> Report generation
The goal is:
Receive data -> Import to Access initial file with one table -> Import initial data to multiple tables in SQL Server -> Export Access working file with one table -> Data processing -> Access result file -> Import result to multiple tables in SQL Server -> Report generation whenever
We're going to use SSRS for the reporting component, which seems like it'll be straightforward enough. I'm not sure if SSIS alone would work well for splitting the Access data up into numerous tables, or if everything should be imported into a staging table with SSIS and then split up with stored procedures, or if I'll need to be writing a standalone application for this.
Haven't done much of any work with SQL Server before, so any advice is appreciated.
In SSIS package, you can write code (e.g. C#) to do your own/custom data transformations. However, SSIS comes with built-in transformations that may be good for your needs. SSIS is very powerful and flexible. Actually, you may do pretty much anything you want with the data in SSIS.
The high-level workflow for your task could look like the following:
1. Connect to the data source and pull the data
2. Transform the data
3. Output data to the destination data source
You certainly can split a data flow into two separate branches and send it to two destinations. All you need to do is put a Multicast transformation in the data flow, and the bulk of the transformations will happen after that.
From what you've said, however, a better solution might be to use the Access tables as a staging database and then grab the data from there and send it to SQL Server. That would mean two data flows but it will be a cleaner implementation.
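If you go the staging route, the "split up with stored procedures" step from the question is mostly INSERT ... SELECT statements that carve the one wide staging table into the normalized tables. A minimal sketch with made-up table and column names:

    -- load distinct customers out of the wide staging table
    INSERT INTO dbo.Customer (CustomerName, Region)
    SELECT DISTINCT s.CustomerName, s.Region
    FROM dbo.Staging AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customer AS c
                      WHERE c.CustomerName = s.CustomerName);

    -- then load the orders, looking the customer key back up
    INSERT INTO dbo.[Order] (CustomerId, OrderDate, Amount)
    SELECT c.CustomerId, s.OrderDate, s.Amount
    FROM dbo.Staging AS s
    JOIN dbo.Customer AS c ON c.CustomerName = s.CustomerName;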
Within SSIS 2005 I used the Import/Export Wizard to create a package that drops/recreates and replaces the data in some tables between my production server and development machine. The control flow that the wizard created was extremely complicated, so I created a new package and used the "Transfer SQL Server Objects Task", which is really easy to configure and set up as opposed to the crazy thing the wizard created. The problem is that the package I created takes over 3 minutes to run, while the wizard version takes about 20 seconds. They are basically doing the same thing, so why such a difference in execution time, and is there a setting I can change in the package that uses the Transfer Objects task to make it run quicker?
Here is the package that the wizard created. I have created similar packages before using the wizard that I had no problem editing, but I never saw anything like this before. I cannot figure out where to modify the tables and schema that I drop and create.
alt text http://www.freeimagehosting.net/uploads/f7323b2ce3.png
Here are the properties of the Transfer task inside that For Loop container:
alt text http://www.freeimagehosting.net/uploads/6f0dfc8269.png
What connection type are you using?
Here, when I've been wanting to transfer between Oracle and SQL Server, the ADO.NET provider has been miles slower than the Oracle OLE DB provider.
Why not use the wizard generated package and figure out what it does? It is obviously doing things very efficiently.
It could be quite a number of things. Are you doing lookups? If so, use joins instead. You can also run a Profiler trace to see what the wizard-generated package does as opposed to your custom package.
I don't use the wizard, but could it have created a stored procedure that will actually do the work? That would explain how it is going faster, since the stored procedure can do all the work within the database.
I am curious what is within TransferTask, as that seems to be where all the work is done.
You could look at exporting the data to a flat file, then using a Bulk Import to do this faster.
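A minimal sketch of that route, with the server, file path and table names as placeholders: export with bcp from the source, then BULK INSERT the file on the destination.

    -- run from a command prompt on the source side (placeholder names throughout):
    -- bcp SourceDb.dbo.MyTable out C:\dump\MyTable.dat -n -S SourceServer -T

    -- then on the destination, load the native-format file
    BULK INSERT dbo.MyTable
    FROM 'C:\dump\MyTable.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 100000);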
For some more thoughts on how fast things can go, look here; the most important part is some of the comments that were given, such as how he used Bulk Insert incorrectly.
http://weblogs.sqlteam.com/mladenp/articles/10631.aspx
UPDATE:
You may want to also look at this:
http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/title-12 as, toward the end, he shows how long his tests took; the first comment may be the most useful part for speeding up your import.
This class of performance problem usually stems from "commit" levels and logging.
The illustrated wizard-generated task does a "start transaction" before entering the loop and commits after all the data is transferred, which is the best thing to do if the table is not enormous.
Have you left "autocommit" on in your hand-coded version?
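If you are scripting any of this by hand, the difference is basically the one below: one explicit transaction around the whole copy versus an implicit commit per statement. A trivial sketch with placeholder table names:

    -- one commit for the whole load instead of one per statement
    BEGIN TRANSACTION;

    INSERT INTO dbo.TargetTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.SourceTable;

    COMMIT TRANSACTION;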
Use the Fast Parse option on integer and date column imports if they are not locale-specific.
Use the SQL Server Native Client 10.x OLE DB provider for an in-memory, high-performance connection, or consider using the Attunity drivers for SQL Server <---> Oracle transfers.
Set the IsSorted property on the output of an upstream data flow component to True (only if the data really is sorted on those keys).
Select the OLE DB Destination data access mode "Table or View - fast load".
Run tasks in parallel; do not add unneeded precedence constraints.
Avoid using SELECT * in data flow tasks.
Set the [RunInOptimizedMode] property to True. Optimized mode improves performance by removing unused columns, outputs, and components from the data flow.
Un-check the Check Constraints box on the destination.
Set the network packet size to 32K instead of the default 4K.
Drop indexes on truncate/reload tables, and consider using TRUNCATE instead of DELETE (see the sketch after this list).
If tables change only slightly, consider using MERGE instead of a full reload.
Consider using a dynamic index rebuild stored procedure like the famous one listed here:
Load test it in UAT with SQL Server Profiler set to filter on application "ssis-%".
Tune DefaultBufferSize and DefaultBufferMaxRows; the default buffer size is 10 megabytes, with a maximum buffer size of 100 megabytes.
Separate MDF/LDF files as well as TempDB, and defragment the disks.
Find bottlenecks in your database by using DMVs.
Change to RAID 10 or RAID 0 from RAID 5 or other configurations.
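On the drop-indexes point above, the usual pattern is to disable or drop nonclustered indexes before the load and rebuild them afterwards. A rough sketch with placeholder names, assuming no foreign keys reference the table (TRUNCATE would fail otherwise):

    -- before the load: disable the nonclustered index so inserts are not slowed down
    ALTER INDEX IX_BigTable_Customer ON dbo.BigTable DISABLE;

    -- faster and far less logging than DELETE
    TRUNCATE TABLE dbo.BigTable;

    -- ... run the SSIS data flow / bulk load here ...

    -- after the load: REBUILD re-enables the index in one pass
    ALTER INDEX IX_BigTable_Customer ON dbo.BigTable REBUILD;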
Is there an automatic way in SQL Server 2005 to create a database from several tables in another database? I need to work on a project and I only need a few tables to run it locally, and I don't want to make a backup of a 50 gig DB.
UPDATE
I tried Tasks -> Export Data in Management Studio, and while it created a new database with the tables I wanted, it did not copy over any table metadata, i.e. no PK/FK constraints and no identity information (even with Preserve Identity checked).
I obviously need these for it to work, so I'm open to other suggestions. I'll try that database publishing tool.
I don't have Integration Services available, and the two SQL Servers cannot directly connect to each other, so those are out.
Update of the Update
The Database Publishing Tool worked. The SQL it generated was slightly buggy, so a little hand editing was needed (it tried to reference nonexistent triggers), but once I did that I was good to go.
You can use the Database Publishing Wizard for this. It will let you select a set of tables with or without the data and export it into a .sql script file that you can then run against your other db to recreate the tables and/or the data.
Create your new database first. Then right-click on it and go to the Tasks sub-menu in the context menu. You should have some kind of import/export functionality in there. I can't remember exactly since I'm not at work right now! :)
From there, you will get to choose your origin and destination data sources and which tables you want to transfer. When you select your tables, click on the advanced (or options) button and select the check box called "preserve primary keys". Otherwise, new primary key values will be created for you.
I know this method can hardly be called automatic but why don't you use a few simple SELECT INTO statements?
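For anyone not familiar with it, SELECT INTO creates the target table on the fly from the query's result shape. A minimal sketch with placeholder names, assuming both databases are on the same instance:

    -- copies columns, types and data, but not PKs, FKs or indexes
    SELECT *
    INTO   NewDb.dbo.Customers
    FROM   BigDb.dbo.Customers
    WHERE  CreatedDate >= '20080101';   -- optional filter to keep the copy small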
Because I'd have to reconstruct the schema, constraints and indexes first. That's the part I want to automate... getting the data is the easy part.
Thanks for your suggestions everyone, looks like this is easy.
Integration Services can help accomplish this task. This tool provides advanced data transformation capabilities, so you will be able to get the exact subset of data that you need from the large database.
Assuming the data is needed for testing/debugging, you may consider applying Row Sampling to reduce the amount of data exported.
1. Create a new database
2. Right-click on it
3. Tasks -> Import Data
4. Follow the instructions