SSIS Pass Datasource Between Control Flow Tasks

I'm having trouble solving this little problem; hopefully someone can help me.
In my SSIS package I have a data flow task.
There are several different transforms, merges and conversions that happen in here.
At the end of the data flow task there are two datasets: one that contains two numbers that need to be compared, and another that contains a bunch of records.
Ideally, I would like to have these passed on to a whole new data flow task (in a separate sequence container) where I can do some validation work on them and separate the logic.
I can't for the life of me figure out how to do it. I've tried looking into scripting and storing the datasets as variables, but I'm not sure this is the right way to do it.
The next step is to export the large dataset as a spreadsheet, but before this happens I need to compare the two numbers from the other dataset and ensure they're correct.

To pass data flowing in one data flow to another, you have to have a temporary location.
This means you have to write the data to a destination in one data flow and then read that data back in the other data flow.
You can write the data to a number of destinations:
database table
raw file
flat file
dataset variable (recordset destination)
any other destination component that you can read back from with a corresponding source component, or by writing a script
Raw files are meant to be used for cases like this. They are binary and as such they are extremely fast to write to and read from.
If you insist on using the Recordset Destination, take a look at http://consultingblogs.emc.com/jamiethomson/archive/2006/01/04/SSIS_3A00_-Recordsets-instead-of-raw-files.aspx, because there is no Recordset Source component.
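If you do end up with the Recordset Destination, the usual workaround for the missing source component is a Script Component configured as a source in the second data flow. Here is a minimal sketch of that approach; the variable name User::rsNumbers, the output name Numbers and its columns Expected/Actual are placeholders, not anything prescribed by SSIS.

```csharp
// Script Component (configured as a Source) in the second Data Flow Task.
// Prerequisites (all names are placeholders):
//   - Object variable User::rsNumbers, filled by a Recordset Destination in
//     the first Data Flow, listed under ReadOnlyVariables here.
//   - An output named "Numbers" with two int columns, Expected and Actual.
//   - using System.Data; and using System.Data.OleDb; at the top of ScriptMain.
public override void CreateNewOutputRows()
{
    var table = new DataTable();

    // OleDbDataAdapter can materialize the ADO recordset held in the
    // Object variable into a DataTable we can loop over.
    using (var adapter = new OleDbDataAdapter())
    {
        adapter.Fill(table, Variables.rsNumbers);
    }

    foreach (DataRow row in table.Rows)
    {
        NumbersBuffer.AddRow();
        NumbersBuffer.Expected = Convert.ToInt32(row["Expected"]);
        NumbersBuffer.Actual   = Convert.ToInt32(row["Actual"]);
    }
}
```

The raw file route avoids this extra code entirely, which is one more reason it is usually the better default.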

A Data Flow Task needs to have a destination; another Data Flow Task is NOT a destination, and without one the data doesn't go anywhere in the pipeline. From my experience, your best bets are to:
1) Pump the data into staging tables in SQL Server, and then pick up the validations from there.
2) Do the validations in the same Data Flow task.

Related

SSIS Mapping and Transformation

I'm new to building SSIS packages; in fact, this is my first package. I need to pull data from a DB view on an Azure managed instance to SQL Server on-prem. I have built out the data flow and all. I'm moving data from a database view into another database table, but the destination table has a column that the source doesn't have, hence my destination mapping view looks like the attached image. How do I fix this, or what are my options?
If this column needs to stay empty and you don't have it in the source, your best and only option is to leave it like this. The mapping simply ignores it, so no information will be fed into that column. That will work.
If you need to populate it with, say, the current date, you can add a Derived Column transformation between your source and destination in the Data Flow, where you can add the current date (the GETDATE() expression) or additional columns that come from variables, for example.
It's self-explanatory that "ignore (optional)" means the mapping for those columns can be ignored, and if you want a column to be mapped to a calculated value you can do it using the Derived Column SSIS component (Reference).
For your use case, try to use the OLE DB destination component instead of the ADO.NET component to optimize performance for a relatively large data set.

Excel to CSV Plugin for Kettle

I am trying to develop a reusable component in Pentaho which will take an Excel file and convert it to a CSV with an encoding option.
In short, I need to develop a transformation that has an Excel input and a CSV output.
I don't know the columns in advance. The columns have to be dynamically injected into the Excel input.
That's a perfect candidate for Pentaho Metadata Injection.
You should have a template transformation which contains the basic workflow (read from the Excel file, write to the text file), but without specifying the input and/or output formats. Then you should store your metadata (the list of columns and their properties) somewhere. In Pentaho's example an Excel spreadsheet is used, but you're not limited to that. I've used a couple of database tables to store the metadata, for example: one for the input format and another one for the output format.
You also need a transformation that has the Metadata Injection step to "inject" the metadata into the template transformation. What it basically does is create a new transformation at runtime, using the template and the fields you set to be populated, and then run it.
Pentaho's example is pretty clear if you follow it step by step, and from that you can then create a more elaborated solution.
You'll need at least two steps in a transformation:
Input step: Microsoft Excel input
Output step: Text file output
So, here is the solution: in your Excel input step, in the Fields section, define the maximum number of fields that can appear in any Excel file. Then route the Excel input to the text file output based on the number of fields that are actually present. You need to use the Switch/Case step here.

Reading metadata CSV from a datalake, too big for a lookup activity

I need to create a pipeline to read CSVs from a folder and load them from row 8 into an Azure SQL table; the first 5 rows will go into a different table ([tblMetadata]).
So far I have done it using Lookup Activity, works fine, but one of the files is bigger than 6 MB and it fails.
I checked all the options in Lookup and read everything about the Copy activity (which I am using to load the main data, skipping 7 rows). The pipeline is created using the GUI.
The output from the Lookup is used as parameters for a stored procedure to insert into tblMetadata.
Can someone advise me how to deal with this? At the moment I am in training, and no one can help me on site.
You could probably do this with a single Data Flow activity that has a couple of transformations.
You would use a Source transformation that reads from a folder using folder paths and wildcards, then add a conditional split transformation to send different rows to different sinks.
I worked around it in a different way: I modified the CSVs being imported to have the whole metadata in the first row (as this was part of a different project of mine), then used "First row only" in the Lookup.

Passing data from one Pentaho transformation to another in a job?

Fairly straightforward question I think, I just haven't been able to find a clear example. I have a very complex transformation that I'm breaking down into a job. Having never created a job before, I'm struggling to send the data from one transformation to another. I used Copy Rows to Result in the first one and Get Rows From Result in the second one, but I feel like I'm still missing something. When I used Get Rows, I had to specify the row names - there was no sort of Get Fields button. I also can't preview the data in the transformation without running the job and having it save to an Excel file. When I did that, ALL of the fields were in the output file -- instead of just the ones I'd specified in the second transformation.
I've searched through the documentation and tried Googling but I can't find a clear walkthrough just on how to smoothly move data from one transformation to another. Any responses would be appreciated even if it's just pointing me towards something I've overlooked.
Thanks!
The most common way is to use Copy rows to result at the end of one KTR and Get rows from result as the starting point for the next one. Though you really can't "see" the result while operating in the next KTR, what you can do to ease the reading is set a preview window and leave it open to see all the column names and data.
However, if you want to pass just a few values through to the next KTR, you can use Set variables as the ending step of the first KTR and capture those variables at any time in the second using Get variables steps. Don't forget that if you do so, you need to define the variables in the parent KJB (the job that called the first KTR) with no default value, and the variable scope of the Set variables step has to be set to "Valid in the parent job".
The best way is to create the KTRs and run/test each one. This way you can examine the resulting data and then integrate all the individual transformations into the final job.

Get list of columns of source flat file in SSIS

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we have currently is an SSIS package to import the columns that have been defined. Since we've assigned the mapping, SSIS only throws an error when a column is absent. However, when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not mapped. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach this by writing a script task that reads the flat file using the file system object and a StreamReader object, and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accommodate new columns; it will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it and save yourself the trouble of pre-reading the column list.
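To make the idea concrete, here is a minimal sketch of such a Script Task. It assumes a comma-delimited file with the column names in the first line, plus three package variables whose names (User::FilePath, User::ExpectedHeader, User::NewColumnList) are placeholders you would replace with your own.

```csharp
// Main() of the Script Task (inside the ScriptMain class SSIS generates).
// Placeholder variables: User::FilePath and User::ExpectedHeader (read-only),
// User::NewColumnList (read-write). Requires "using System.Linq;" in ScriptMain.
public void Main()
{
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();
    string expected = Dts.Variables["User::ExpectedHeader"].Value.ToString();

    string header;
    using (var reader = new System.IO.StreamReader(filePath))
    {
        // The column names are assumed to be in the first line of the file.
        header = reader.ReadLine() ?? string.Empty;
    }

    var actualColumns = header.Split(',').Select(c => c.Trim());
    var expectedColumns = expected.Split(',').Select(c => c.Trim());

    // Columns present in the file but missing from the defined mapping.
    var newColumns = actualColumns
        .Except(expectedColumns, StringComparer.OrdinalIgnoreCase)
        .ToList();

    Dts.Variables["User::NewColumnList"].Value = string.Join(",", newColumns);
    Dts.TaskResult = (int)ScriptResults.Success;
}
```

A downstream precedence constraint or Send Mail task can then check whether User::NewColumnList is empty and raise the alert.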
I agree with the answer provided by @TabAlleman. SSIS can't natively handle dynamic columns (and neither can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a C# Script Task. One way to do this would be to create a flat file connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row. Save that row to a Recordset object. Any change? Send an email.
The "Get Header Row" DataFlow would look like this. Row Number if needed.
The Control Flow level would look like this. Use a Foreach Loop (ADO enumerator) over the recordset object to assign the header row value to an SSIS variable, CurrentHeader.
Above, the precedence constraints (the fx icons) with the expressions
@[User::ExpectedHeader] == @[User::CurrentHeader]
@[User::ExpectedHeader] != @[User::CurrentHeader]
determine whether you load data or send the email.
Hope this helps!
I have worked for banking clients, and for banks, randomly adding columns to a DB is not possible due to federal requirements and rules. That said, I get that yours is not a federally regulated business. So here are some steps.
This is not a code issue but more a matter of soft skills and working with other teams (yours and your vendor's).
Steps you can take are:
(1) Agree on a solid column structure that you always require, because for newer columns, older data rows will carry NULL.
(2) If a new column is going to be sent by the vendor, you or your team needs to make the DDL/DML changes to the table where the data will be inserted, of course with the correct data type.
(3) Document this change in the data dictionary, as over time you or another team member will do analysis on this data and will want to know the purpose of each attribute or column.
(4) Long term, you do not want to keep changing the table structure monthly because one of your many vendors decided to change the way they send you data. Some clients push back very aggressively, others not so much.
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic.
One thing I always do is use a Script Task to read the first and last lines of a file.
If it is not the expected list of CSV columns, I mark the file as errored and continue/fail as required.
Headers are obviously important, but so are footers. Files can, through some unknown issue, be partially built; asking for the header to also be placed at the end of the file gives you a double check.
I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add/change order of columns and assume things will still work.
1- SSIS does not provide dynamic source and destination mapping, but some third-party components, such as Data Flow Task Plus, support this feature.
2- We can achieve this using an SSIS Script Task.
3- If the header is correct, process further for migration; otherwise fail the package before the DFT executes.
4- Read the header line using the Script Task and store it in an array or list object.
5- Then compare those array values to user-defined variables declared earlier that contain the default column names.
6- If the values match exactly, proceed further; otherwise fail it (a sketch follows below).
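A rough sketch of the Main() body for steps 4-6 could look like this; User::FilePath and User::ExpectedHeader (a comma-separated list of the default column names) are placeholder variable names, and the comparison shown is only one way to implement the "exact match" check.

```csharp
// Main() of the Script Task for steps 4-6: read the header, compare it to the
// expected column list held in a variable, and fail the task on mismatch so
// the Data Flow Task never executes. Variable names are placeholders.
public void Main()
{
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();
    string expectedHeader = Dts.Variables["User::ExpectedHeader"].Value.ToString();

    string actualHeader;
    using (var reader = new System.IO.StreamReader(filePath))
    {
        actualHeader = (reader.ReadLine() ?? string.Empty).Trim();
    }

    // Step 6: proceed only when the header matches exactly, otherwise fail.
    if (string.Equals(actualHeader, expectedHeader.Trim(), StringComparison.Ordinal))
    {
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    else
    {
        Dts.Events.FireError(0, "Header check",
            "Flat file header does not match the expected column list.",
            string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
```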