Will SSIS work well for importing to multiple tables?

I won't have access to SSIS until tomorrow so I thought I'd ask for advice before I start work on this project.
We currently use Access to store our data. It's not stored in a relational format, so it's an awful mess. We want to move to a centralized database (SQL Server 2008 R2), which would require rewriting much of our codebase (which, incidentally, is also an awful mess). Due to a time constraint, well before that can be done we are going to need a centralized database set up solely for on-demand report generation for a client. So, our applications will still be running on Access. Instead of:
Receive data -> Import to Access initial file with one table -> Data processing -> Access result file with one table -> Report generation
The goal is:
Receive data -> Import to Access initial file with one table -> Import initial data to multiple tables in SQL Server -> Export Access working file with one table -> Data processing -> Access result file -> Import result to multiple tables in SQL Server -> Report generation whenever
We're going to use SSRS for the reporting component, which seems like it'll be straightforward enough. I'm not sure if SSIS alone would work well for splitting the Access data up into numerous tables, or if everything should be imported into a staging table with SSIS and then split up with stored procedures, or if I'll need to be writing a standalone application for this.
Haven't done much of any work with SQL Server before, so any advice is appreciated.

In an SSIS package, you can write your own code (e.g. C#) to do custom data transformations. However, SSIS comes with built-in transformations that may be good enough for your needs. SSIS is very powerful and flexible; you can do pretty much anything you want with the data in it.
The high-level workflow for your task could look like the following:
1. Connect to the data source and pull the data
2. Transform the data
3. Output data to the destination data source

You certainly can split a data flow into two separate branches and send it to two destinations. All you need to do is put a Multicast transformation in the data flow, and then the bulk of the transformations will happen after that.
From what you've said, however, a better solution might be to use the Access tables as a staging database and then grab the data from there and send it to SQL Server. That would mean two data flows, but it will be a cleaner implementation.
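If you instead go with the staging-table-plus-stored-procedure option mentioned in the question, the split itself is just set-based T-SQL that an Execute SQL Task can run after the load. A rough sketch, using hypothetical staging and target tables (dbo.StagingOrders, dbo.Customer, dbo.[Order]):

    -- Hypothetical staging table loaded 1:1 from the Access file by SSIS;
    -- the split into normalized tables is then a couple of set-based inserts.

    -- Add customers that are not already present.
    INSERT INTO dbo.Customer (CustomerCode, CustomerName)
    SELECT DISTINCT s.CustomerCode, s.CustomerName
    FROM dbo.StagingOrders AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer AS c
                      WHERE c.CustomerCode = s.CustomerCode);

    -- Add the order rows, resolving the foreign key from the customer table.
    INSERT INTO dbo.[Order] (CustomerId, OrderDate, Amount)
    SELECT c.CustomerId, s.OrderDate, s.Amount
    FROM dbo.StagingOrders AS s
    JOIN dbo.Customer AS c ON c.CustomerCode = s.CustomerCode;

    -- Clear the staging table for the next load.
    TRUNCATE TABLE dbo.StagingOrders;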

Related

Is it possible to transfer data to SQL Server from an unconventional database

My company has a SQL Server database which they would like to populate with data from a hierarchical database (as opposed to relational). I have already written a .net application to map its schema to a relational database and have successfully done that. However my problem is that the tech being used here is so old that I see no obvious way of data transfer.
However, I have some ideas about how I can do this. They involve writing file scans in my unconventional database, dumping the data out as CSV files, and then doing a bulk upload into SQL Server. I am not keen on this approach because invalid data quite often terminates the bulk upload.
I was hoping to explore options around Service Broker: could I dump out live transactions whenever a record changes in my database and have them picked up somehow?
Secondly, if I dump out live or changed records to a file (I can format the file however it needs to be), is there something that can pull it into SQL Server?
Any help would be greatly appreciated.
Service Broker is a very powerful queue/messaging management system. I am not sure why you want to use it for this.
You can set up an SSIS job that keeps checking a folder for CSV files; when it detects a new one, it reads it into SQL Server, then zips it and archives it somewhere else. This is very common. SSIS can then either process the data (it's a wonderful ETL tool) or invoke procedures in SQL Server to process the data. SSIS is very fast and is rarely overwhelmed, so why would you use Service Broker?
If it's IMS (mainframe) type data, you have to convert it to flat tables and then export them as CSV text files for SQL Server to read.
SQL Server is very good at processing XML and, as of 2016, JSON-shaped data, so if that is your data type you can import it directly into SQL Server.
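On the CSV route, note that a bulk load does not have to die on the first bad row: BULK INSERT can log rejected rows to an error file and keep going. A minimal sketch, with a hypothetical staging table and file path:

    -- Hypothetical staging table and file path; adjust to the real layout.
    BULK INSERT dbo.StagingImport
    FROM 'C:\import\extract.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2,        -- skip the header row
        MAXERRORS       = 1000,     -- tolerate bad rows instead of aborting the load
        ERRORFILE       = 'C:\import\extract_rejects.log'
    );

The rejected rows end up in the error file, where they can be fixed up and re-imported separately.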
Skip bulk insert. The SQL Server xml data type lends itself to doing what you're doing. If you can output data from your other system into an XML format, you can push that XML directly into an argument for a stored procedure.
After that, you can use the functions for the XML type to iterate through the data as needed.
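A minimal sketch of that approach, with a hypothetical procedure name and XML shape:

    -- Hypothetical procedure that accepts the exported records as XML and
    -- shreds them into a relational table with the xml type's nodes()/value().
    CREATE PROCEDURE dbo.ImportRecords
        @payload xml
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.ImportedRecord (RecordId, RecordDate, Amount)
        SELECT
            r.value('(Id/text())[1]',         'int'),
            r.value('(RecordDate/text())[1]', 'datetime'),
            r.value('(Amount/text())[1]',     'decimal(18,2)')
        FROM @payload.nodes('/Records/Record') AS x(r);
    END;

You would call it with something like EXEC dbo.ImportRecords @payload = N'<Records><Record>...</Record></Records>';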

How to use Master Data Services data then? The MDS lifecycle

So I know that we need to create a Master Data Services database to make data clean, correct, consistent, etc. We can import some data there, then process it... and then? Should we then export it to another database? So MDS is like a set of tools to clean your data and get it right, and it is only for that one use, right? I mean: we have our data -> we load it into the MDS database with SSIS -> we process it, apply business rules, etc. -> we export it to our SQL Server database with SSIS -> then we can use it as we like.
Am I right?
I want to understand how MDS is used in practice: where the data goes after MDS processes it, and where the data comes from in the first place.
Thanks, and sorry if it is a dumb question.
Using Master Data Services in conjunction with Data Quality Services and Integration Services enables you to clean and normalize a data set, store it as your master (trusted) set of data, and let others in your organization view and share that master data set.
You may find this tutorial, last updated for SQL Server 2014, helpful in better understanding how to use these three technologies together -- https://technet.microsoft.com/en-us/library/jj819782(v=sql.120).aspx

How to load from multiple sources to multiple destination dynamically

I have more than a hundred tables in a linked server (let's say on SQL Server 1). I have to perform an initial load, basically a simple dump, by creating duplicate copies of those hundred tables in the SQL Server 2 destination. I know how to build a data flow task in SSIS to extract data from a source and load it into a destination (creating the table in the destination as well). With more than a hundred tables, I would need to create more than a hundred data flow tasks, which is very time consuming. So I have heard about copying tables from source to destination dynamically by looping through them and creating variables. Now, how do I do this? Remember, those hundred tables do not share the same structure. How can I perform this initial load faster without using multiple data flow tasks in SSIS? Please help! Thank you!
I would use the Import and Export Wizard - it can do all those tasks you described in one pass. You just tell it your Source and Destination and check the tables you want. It generates an SSIS package, which you can save and customize if you want.
It's easiest to find via the Windows Start Menu, under SQL Server [version] / Import and Export Data.
If you would like to automate transferring data from 100+ tables, I would consider using BIML. BIML is a scripting language that enables you to generate SSIS packages based on a template you define. In your case, the template might include the creation of the tables (if they do not exist) and the mapping/copying of the source. You can then wrap the resulting SSIS packages inside another BIML master package.
It can be a little clunky if you are not using Mist, but it's incredibly powerful once you get into it. A good starting point would be Andy Leonard's Stairway to Biml series, as it provides a step-by-step walkthrough of moving data from source to target. After the stairway guide, check out BimlScript.
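Neither answer shows it, but if BIML feels like too much tooling for a one-off initial load, the same loop can be sketched in plain T-SQL on the destination server: read the table list from the linked server and generate one SELECT ... INTO per table. A rough sketch, assuming a linked server named SQL1, a source database named SourceDb, and all source tables in the dbo schema:

    -- Generate one SELECT ... INTO statement per table on linked server SQL1.
    -- SELECT ... INTO copies the column structure and data but not indexes,
    -- keys, or constraints, which is usually fine for a simple initial dump.
    DECLARE @sql nvarchar(max) = N'';

    SELECT @sql += N'SELECT * INTO dbo.' + QUOTENAME(t.name) +
                   N' FROM SQL1.SourceDb.dbo.' + QUOTENAME(t.name) + N';' + CHAR(13)
    FROM SQL1.SourceDb.sys.tables AS t;

    SELECT @sql AS GeneratedSql;    -- review the generated statements first
    -- EXEC sp_executesql @sql;     -- then run them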

How do I speed up a SSIS Transfer Server Objects task that runs really slow?

Within SSIS 2005 I used the Import/Export wizard to create a package that drops/recreates and replaces the data on some tables between my production server and development machine. The control flow that was created by the wizard was extremely complicated so I created a new package and used the "Transfer SQL Server Objects Task" which is really easy to configure and setup as opposed to the crazy thing the wizard created. The problem is that the package that I created takes over 3 minutes to run while the wizard version takes about 20 seconds. They are basically doing the same thing, why such a difference in execution time and is there a setting that I can change in the package that is using the Transfer Objects task to make it run quicker?
Here is the package that the wizard created. I have created similar packages before using the wizard that I had no problem editing, but I never saw anything like this before. I cannot figure out where to modify the tables and schema that I drop and create.
(Screenshot: http://www.freeimagehosting.net/uploads/f7323b2ce3.png)
Here are the properties of the transfer task inside that For Loop container:
(Screenshot: http://www.freeimagehosting.net/uploads/6f0dfc8269.png)
What connection type are you using?
Here, when I've been transferring between Oracle and SQL Server, the ADO.NET provider is miles slower than the Oracle OLE DB provider.
Why not use the wizard generated package and figure out what it does? It is obviously doing things very efficiently.
Could be quite a number of things. Are you doing lookups? If so, use joins instead. You can also run a profiler trace to see what the crazy package does compared to your custom package.
I don't use the wizard, but could it have created a stored procedure that will actually do the work? That would explain how it is going faster, since the stored procedure can do all the work within the database.
I am curious what is within TransferTask, as that seems to be where all the work is done.
You could look at exporting the data to a flat file, then using a Bulk Import to do this faster.
For some more thoughts about how fast things can go, look here; most important are some of the comments that were given, such as how he used Bulk Insert incorrectly.
http://weblogs.sqlteam.com/mladenp/articles/10631.aspx
UPDATE:
You may want to also look at this:
http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/title-12 as, toward the end, he shows how long his tests took, but the first comment may be the most useful part, for speeding your import up.
This class of performance problem usually stems from "commit" levels and logging.
The illustrated wizard-generated task does a "start transaction" before entering the loop and commits after all the data is transferred, which is the best thing to do if the table is not enormous.
Have you left "autocommit" on in your hand-coded version?
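In T-SQL terms, the difference being described is roughly this (a sketch with hypothetical table names, not the wizard's actual generated code):

    -- Autocommit: each statement is its own transaction, so each one
    -- pays for its own commit and log flush.
    INSERT INTO dbo.Target SELECT * FROM dbo.Source WHERE BatchId = 1;
    INSERT INTO dbo.Target SELECT * FROM dbo.Source WHERE BatchId = 2;

    -- One explicit transaction around the whole transfer: a single commit
    -- at the end, which is what the wizard-generated package effectively does.
    BEGIN TRANSACTION;
        INSERT INTO dbo.Target SELECT * FROM dbo.Source WHERE BatchId = 1;
        INSERT INTO dbo.Target SELECT * FROM dbo.Source WHERE BatchId = 2;
    COMMIT TRANSACTION;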
Use the Fast Parse option on integer and date column imports if they are not locale-specific
Use the SQL Server Native Client 10.x OLE DB provider for an in-memory, high-performance connection, or consider using the Attunity drivers for SQL Server <-> Oracle transfers
Set the IsSorted property on the output of an upstream data flow component to True.
Select the OLE DB Destination Data Access mode “Table or View – fast load”
Run tasks in parallel; do not add unneeded precedence constraints
Avoid using SELECT * in data flow tasks
Set the RunInOptimizedMode property to True; optimized mode improves performance by removing unused columns, outputs, and components from the data flow
Uncheck the Check Constraints box on the destination
Set the network packet size to 32k instead of the default 4k
Drop indexes on truncate-and-reload tables, and consider using TRUNCATE instead of an unfiltered DELETE
If tables change only slightly, consider using MERGE (see the sketch after this list)
Consider using a dynamic index rebuild SP like the famous one listed here:
Load test it using UAT with SQL Server Profiler set to filter on application "ssis-%"
The default buffer size is 10 megabytes, with a maximum buffer size of 100 megabytes.
Separate MDFs/LDFs as well as TempDB, and defragment disks
Find bottleneck in your Database by using DMVs
Change to RAID 10 or 0 from RAID 5 or other
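On the MERGE point above, a minimal sketch with hypothetical source and target tables keyed on Id:

    -- Upsert changed rows and remove rows that no longer exist in the source.
    MERGE dbo.TargetTable AS t
    USING dbo.SourceTable AS s
        ON t.Id = s.Id
    WHEN MATCHED AND (t.Name <> s.Name OR t.Amount <> s.Amount) THEN
        UPDATE SET t.Name = s.Name, t.Amount = s.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Name, Amount) VALUES (s.Id, s.Name, s.Amount)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;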

Queries for migrating data in live database?

I am writing code to migrate data from our live Access database to a new Sql Server database which has a different schema with a reorganized structure. This Sql Server database will be used with a new version of our application in development.
I've been writing migration code in C# that calls SQL Server and Access and transforms the data as required. I migrated, for the first time, a table whose entries are related to newer entries in another table that I have not migrated recently, and that caused an error because the corresponding record in SQL Server could not be found.
So my SQL Server productions table has data only up to 1/14/09, and I'm continuing to migrate more tables from Access. I want to write an update method that can figure out what is new in Access that hasn't yet been reflected in SQL Server.
My current idea is to write a query on the SQL side which does SELECT Max(RunDate) FROM ProductionRuns, to give me the latest date in that field in the table. On the Access side, I would write a query that does SELECT * FROM ProductionRuns WHERE RunDate > ?, where the parameter is that max date found in SQL Server, and perform my translation step in code, and then insert the new data in Sql Server.
What I'm wondering is, do I have the syntax right for getting the latest date in that Sql Server table? And is there a better way to do this kind of migration of a live database?
Edit: What I've done is make a copy of the current live database, which I can then migrate without worrying about changes and use for testing during development. Then I can migrate the latest data whenever the new database and application go live.
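The MAX query described above is valid T-SQL; a minimal sketch, using the table and column names from the question and guarding against an empty table:

    -- Latest run date already migrated to SQL Server; falls back to an
    -- early date if the table is still empty.
    SELECT ISNULL(MAX(RunDate), '19000101') AS LastMigratedRunDate
    FROM dbo.ProductionRuns;

The Access-side query can then be parameterized on that value, e.g. SELECT * FROM ProductionRuns WHERE RunDate > ? with the date passed as the parameter.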
I personally would divide the process into two steps:
1. Create an exact copy of the Access DB in SQL Server and copy all the data into it.
2. Copy the data from this temporary SQL Server DB to your destination database.
That way you can write a set of SQL statements to accomplish the second step.
Alternatively, use SSIS.
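For step 1, one way to make the exact copy without hand-building a package is a distributed query against the Access file. A sketch, assuming the ACE OLE DB provider is installed, ad hoc distributed queries are enabled, and a hypothetical file path, staging schema, and table name:

    -- Copy one Access table 1:1 into a staging schema on SQL Server.
    -- Repeat (or generate) one of these per table; the staging schema must exist.
    SELECT *
    INTO staging.Orders
    FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                    'C:\data\live.accdb';'Admin';'',
                    'SELECT * FROM Orders');

Step 2 is then ordinary INSERT ... SELECT statements from the staging schema into the destination schema.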
Generally, when you convert data to a new database that will take its place in production, you shut out all users of the database for a period of time, run the migration, and turn on the new database. This ensures no changes to the data are made while doing the conversion. Of course, I never would have done this using C# either. Data migration is a database task and should have been done in SSIS (or DTS if you have an older version of SQL Server).
If the database you are converting to is just in development, I would create a backup of the Access database and load the data from there, to test the data loading process and to get the data in so you can do the application development. Then, when it is time to do the real load, you just close the real database to users and load from it. If you are trying to keep both in sync while you develop, well, I wouldn't do that, but if you must, make a nightly backup of the file and load it first thing in the morning using your process.
You may want to look at investing in a tool like SQL Data Compare.
I believe it has support for access databases too, and you can download a trial.
If you are happy with your C# code but it fails because of the constraints in your destination database, you can temporarily disable them and then re-enable them after you copy the whole lot.
I am assuming that your destination database is a brand new DB with no data and is not used by anyone while the transfer happens.
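If you go that route, the disable/re-enable is one statement per table (run it for each table involved, or generate the statements from sys.tables); dbo.ProductionRuns below is just the table name mentioned in the question:

    -- Temporarily disable all foreign key and check constraints on a table.
    ALTER TABLE dbo.ProductionRuns NOCHECK CONSTRAINT ALL;

    -- ... copy the data ...

    -- Re-enable them WITH CHECK so existing rows are validated and the
    -- constraints remain trusted by the optimizer.
    ALTER TABLE dbo.ProductionRuns WITH CHECK CHECK CONSTRAINT ALL;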
It sounds like you have two problems:
You're migrating data from one database to another.
You're changing your schema.
Doing either of these things is tricky if you are trying to migrate the data while people are using the data.
The simplest approach is to migrate the data based on a static copy of the data, and also to queue updates to that data from the moment you captured the static copy. I don't know how easy this is in Access, but in SQLServer or Oracle you can use the redo logs for this or a manual solution using triggers. The poor-man's way of doing this is to make triggers for all the relevant tables that log the primary key of the records that have changed. Then after the old database is shut off you can iterate over those keys and get those records from the old database and put them into the new database. Just copy the whole record; if the record was deleted then delete it from the new database.
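If the source were SQL Server, that poor-man's change log might look something like this (hypothetical log table, key column, and trigger names):

    -- Log the primary keys of rows touched in the source table so they
    -- can be re-migrated after the cutover.
    CREATE TABLE dbo.ChangeLog (
        TableName sysname  NOT NULL,
        RecordId  int      NOT NULL,
        ChangedAt datetime NOT NULL DEFAULT GETDATE()
    );
    GO

    CREATE TRIGGER trg_ProductionRuns_Log
    ON dbo.ProductionRuns
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- inserted covers inserts/updates, deleted covers updates/deletes;
        -- UNION removes the duplicate keys an update produces.
        INSERT INTO dbo.ChangeLog (TableName, RecordId)
        SELECT 'ProductionRuns', Id FROM inserted
        UNION
        SELECT 'ProductionRuns', Id FROM deleted;
    END;

After the old database is shut off, you iterate over the distinct keys in dbo.ChangeLog and re-copy (or delete) just those records.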
Your problem is compounded by the fact that you can't simply copy the data, you have to transform it. This means you probably have to shut down both databases and re-migrate the records based on the change list. It will take a lot of planning to ensure you get things right and I'd recommend writing a testing script that can validate that the resulting data is correct.
Also I'd ensure that the code for the migration runs inside one of the databases if possible. Otherwise you are copying the data twice and this will significantly harm the performance.