SSIS 2012 package to import Excel data to a database table

I want to build an SSIS package that loads data from an Excel file into a database table.
I have already made a package that completes the task, but the table needs to be dropped and recreated every time the Excel data is loaded, because the Excel file's columns change every month. If the table is not recreated on every execution, the load fails or the Excel data ends up under the wrong column definitions.
Is there any way to dynamically drop and create the table every time?

SSIS generates a lot of metadata behind the scenes describing your source and destination fields and mappings. If it detects that it isn't talking to the same data source/destination, it will often throw a validation error and refuse to start.
To an extent you can mitigate this by setting DelayValidation to True on the connection manager properties. However, this is unlikely to help in your case, as your data spec genuinely is changing.
A further option to get around the tight controls on data format is to write your own custom source/transformation/destination logic in Script Tasks and Script Components. You can write a Script Task to read the format of the incoming Excel file, save those details to variables, and pass them into an Execute SQL Task that creates the table. Then you can write a Script Component to dynamically map your data to some generic intermediate columns that exist within your package. Finally, you can write a Script Component destination to load that data into the newly created table.
Basically you're bypassing all of the out-of-the-box SSIS functionality and writing your own integration solution from the ground up, but it seems you don't have much choice.
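For the table-creation step, the Execute SQL Task can run dynamic SQL built from the column details the Script Task collected. A minimal sketch, with hypothetical values shown inline (in the real package the table name and column list would come from SSIS variables):

    -- @TableName and @ColumnList stand in for values supplied by SSIS package variables.
    DECLARE @TableName  sysname       = N'MonthlyImport';  -- hypothetical target table
    DECLARE @ColumnList nvarchar(max) = N'[Region] nvarchar(50), [Amount] decimal(18,2)';  -- built by the Script Task
    DECLARE @sql nvarchar(max) =
          N'IF OBJECT_ID(''dbo.' + @TableName + N''', ''U'') IS NOT NULL DROP TABLE dbo.' + QUOTENAME(@TableName) + N';'
        + N' CREATE TABLE dbo.' + QUOTENAME(@TableName) + N' (' + @ColumnList + N');';
    EXEC sys.sp_executesql @sql;

The data flow itself still needs the Script Component approach described above, because SSIS data flow column metadata is fixed at design time.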

Related

Extract ETL process of existing database

I have a pretty complex database; I have both its source code and access to it.
My job is to answer the question "from which file is a given column loaded?". Currently I just work through the code, from target column back to source file. I wonder if there is any tool that would automate this, or generate the lineage from the SQL?
I'm using SQL Server.

Excel sheet to SQL table upload automation

I am trying to find the simplest and quickest way to upload a sheet from an Excel file sitting in a folder to a table in SQL Server 2012, automatically every morning as a job.
SSIS is the ETL tool you could use, but if it's a very simple job you can just write a BCP command.
https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
The way to schedule it is to add the task as a SQL Server Agent job on the server (a sketch follows the checklist below). A few things to bear in mind with ETL:
Will your file be named the same each day?
Do you need to retain archived versions of the file?
How do you do error handling if it’s absent or malformed?
Does the DDL need to change periodically to accommodate new date ranges (i.e. a new day/month/year)?
Will this pattern be reused in the future?
Do you need logical checks on the data (duplicates, logic errors, referential integrity, etc.)?
Under whose account will the job run (hint, don’t use your own - get a service account)?
The more complex the answers to these questions, the more likely it is that you'll need a real ETL tool like SSIS.
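If the answers stay simple, the bcp-plus-Agent route can look roughly like the sketch below. All names (job, server, database, table, file path, schedule) are made up, and it assumes the sheet is first saved as a delimited text file, since bcp reads flat files rather than .xlsx workbooks:

    USE msdb;
    GO
    -- Create the job, a CmdExec step that runs bcp, and a daily 07:00 schedule.
    EXEC dbo.sp_add_job         @job_name = N'Load daily Excel extract';
    EXEC dbo.sp_add_jobstep     @job_name = N'Load daily Excel extract',
                                @step_name = N'bcp import',
                                @subsystem = N'CmdExec',
                                @command   = N'bcp SalesDb.dbo.DailySheet in "C:\Imports\daily.csv" -S MYSERVER -T -c -t,';
    EXEC dbo.sp_add_schedule    @schedule_name = N'Every morning 07:00',
                                @freq_type = 4,              -- daily
                                @freq_interval = 1,
                                @active_start_time = 70000;  -- 07:00:00
    EXEC dbo.sp_attach_schedule @job_name = N'Load daily Excel extract',
                                @schedule_name = N'Every morning 07:00';
    EXEC dbo.sp_add_jobserver   @job_name = N'Load daily Excel extract';

In practice the CmdExec step should run under a proxy tied to a service account rather than your own login, per the last point above.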

How can I Snapshot a database without losing undeleted data?

We have a shop floor database OPERATION that replicates selected data to a database BUSINESS that is used for reporting. The data in OPERATION is deleted daily by the third-party shop floor application, so in order to retain the data on BUSINESS I've set the article property for the DELETE delivery format to "Do not replicate DELETE statements".
This works well, but occasionally somebody wants something extra or different to be replicated. Depending on the nature of the change to the publication, it may prompt for reinitialization of the snapshot, which would of course blow away the database on BUSINESS (as I sadly discovered one day).
What's the best way around this?
I would suggest you implement an ETL process instead of replication.
You can use SSIS to extract data out of the OPERATION database and copy it to the BUSINESS database. In the SSIS package you have full control over the logic. For example, you can append the data to the existing data in BUSINESS, or use MERGE to insert new records and modify existing ones (this way it is safe to run the load repeatedly, as unchanged data is not overwritten).
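A minimal sketch of such a MERGE, with made-up table and column names (assume the SSIS package has just extracted the current OPERATION rows into a staging table on BUSINESS):

    -- Upsert freshly extracted rows into the reporting table. Rows that were
    -- deleted in OPERATION are simply left alone in BUSINESS, so history is kept.
    MERGE BUSINESS.dbo.Orders AS tgt                  -- hypothetical reporting table
    USING BUSINESS.dbo.Orders_Staging AS src          -- hypothetical staging table
       ON tgt.OrderId = src.OrderId
    WHEN MATCHED AND (tgt.Quantity <> src.Quantity OR tgt.Status <> src.Status) THEN
        UPDATE SET tgt.Quantity = src.Quantity,
                   tgt.Status   = src.Status
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderId, Quantity, Status)
        VALUES (src.OrderId, src.Quantity, src.Status);

Deliberately omitting a WHEN NOT MATCHED BY SOURCE clause is what keeps the rows that the shop floor application has already deleted.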
If someone requests additional data, you would just write a new SSIS package to transfer that data without affecting your main process.
SSIS packages can be scheduled to run from a SQL Server Agent job (using dtexec, for example).
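For a file-deployed package, the Agent job step (or any command line) would invoke something along these lines; the path and package name here are made up:

    dtexec /FILE "C:\SSISPackages\OperationToBusiness.dtsx"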

SQL: Automatically copy records from one database to another database

I am trying to find an ideal way to automatically copy new records from one database to another; the databases have different structures. I achieved it by writing VBS scripts which copy the data from one to the other, triggered from another application that passes arguments to the script. But I ran into issues at points where there were more than 100 triggers, i.e. 100 wscript processes trying to access the database at once, and they couldn't complete the task.
I want to find a simpler solution inside SQL Server. I read about triggers, stored procedures run from SQL Agent, replication, etc. The requirement is that I have to copy records to the other database periodically, or whenever a new record is inserted.
Which method will suit me the best?
You can use CDC to do this. Create an SSIS package using CDC and run that package periodically through a SQL Server Agent job. CDC captures all the changes to the source table, and the package applies those changes to the destination table when it runs. Please follow the link below.
http://sqlmag.com/sql-server-integration-services/combining-cdc-and-ssis-incremental-data-loads
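Enabling CDC itself is only two calls; a minimal sketch with a hypothetical database and table name (the SSIS CDC components described in the linked article then read from the capture instance this creates):

    -- Enable CDC at the database level, then for the table to be tracked.
    -- Note: SQL Server Agent must be running for the capture job to work.
    USE SourceDb;   -- hypothetical source database
    GO
    EXEC sys.sp_cdc_enable_db;
    GO
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Customers',  -- hypothetical table
         @role_name     = NULL;          -- no gating role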
The word periodically in your question suggests that you should go for jobs. You can schedule jobs in SQL Server using SQL Server Agent and assign a frequency; the job will then run your script on that schedule.
PrabirS: Change Data Capture
This is a good option, because it uses the transaction log to create something similar to the Command Query Responsibility Segregation (CQRS) pattern.
Alok Gupta: A SQL Job that runs in the SQL Agent
This too is a good option, provided you have something like a modified-date column so you can filter the altered data. You can create a stored procedure and let it run regularly from SQL Agent.
A third option could be triggers (the copy happens in the same transaction as the original change).
This option is useful for auditing and logging, but you should definitely avoid writing business logic in triggers, as triggers are more or less hidden and fire without being called directly (similar to CDC, actually). I actually created a trigger about half a year ago that captured the data and inserted it somewhere else in XML format, because the columns in the original table could change over time (multiple projects using the same database(s)).
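For completeness, a rough sketch of that kind of capture trigger; the table and column names are invented, and the captured rows are stored as XML so later schema changes in the source table do not break the audit table:

    -- Audit table that only stores an XML payload, so it survives column changes.
    CREATE TABLE dbo.OrdersAudit
    (
        AuditId    int IDENTITY(1,1) PRIMARY KEY,
        CapturedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
        Payload    xml NOT NULL
    );
    GO
    -- Fires after inserts on the (hypothetical) source table and captures the new rows.
    CREATE TRIGGER dbo.trg_Orders_CaptureInsert
    ON dbo.Orders
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT dbo.OrdersAudit (Payload)
        SELECT (SELECT * FROM inserted FOR XML PATH('row'), ROOT('rows'), TYPE);
    END;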
-Edit-
By the way, your question more or less suggests the lack of a clear design pattern, and that the technique used is not the main problem. You could try to read up on how an ETL layer is built, or try to implement a "separation of concerns". Note: it is hard to tell if this is the case, but given how you formulated your question, an unclear design is something that pops up in my mind as a possible problem.

How to load from multiple sources to multiple destination dynamically

I have more than a hundred tables in a linked server (let's say on SQL Server 1). I have to perform an initial load, basically a simple dump, by creating duplicate copies of those hundred tables in the SQL Server 2 destination. I know how to build a data flow task in SSIS to extract data from a source and load it into a destination (creating the destination table as well). With more than a hundred tables, I would need to create more than a hundred data flow tasks, which is very time-consuming. I have heard about copying tables from source to destination dynamically by looping through them and creating variables. Now, how do I do this? Remember, those hundred tables do not share the same structure. How can I perform this initial load faster without using multiple data flow tasks in SSIS? Please help! Thank you!
I would use the Import and Export Wizard - it can do all those tasks you described in one pass. You just tell it your Source and Destination and check the tables you want. It generates an SSIS package, which you can save and customize if you want.
It's easiest to find via the Windows Start Menu, under SQL Server [version] / Import and Export Data.
If you would like to automate transferring data from 100+ tables, I would consider using BIML (Business Intelligence Markup Language). BIML is a markup and scripting language that enables you to generate SSIS packages based on a template you define. In your case the template may include the creation of the tables (if they do not exist) and the mapping/copying of the source. You can then wrap the resulting SSIS packages inside another BIML master package.
It can be a little clunky if you are not using MIST, but it's incredibly powerful once you get into it. A good starting point for you would be Andy Leonard's Stairway to Biml series, as it provides a step-by-step walkthrough of moving data from source to target. After the stairway guide, check out BimlScript.
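Whichever route you take (BIML or a metadata-driven loop), you will need a list of the source tables to iterate over. A minimal sketch, assuming a linked server named SQLSERVER1 and a source database named SourceDb (both made up):

    -- Enumerate the base tables on the linked server; the result can feed a BIML
    -- template or an SSIS Foreach Loop via an object variable.
    SELECT TABLE_SCHEMA, TABLE_NAME
    FROM   [SQLSERVER1].[SourceDb].INFORMATION_SCHEMA.TABLES
    WHERE  TABLE_TYPE = 'BASE TABLE'
    ORDER  BY TABLE_SCHEMA, TABLE_NAME;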