Read For Each Loop container variable in an OLE DB source variable window - SQL

I have a table named QueryTable that stores 4 SQL queries, each with different metadata. I want to store the results of these four queries in an Excel sheet.
1) First I added an Execute SQL Task, configured the connection and the query statement, and set the Result Set to Full Result Set. Then, on the Result Set tab, I mapped the result to Query_variable, a variable of type Object.
2) I dragged in a Foreach Loop container, chose the Foreach ADO Enumerator in the Collection pane, and assigned Query_variable. In the variable mappings I created a new String-type variable to receive each of the four queries in turn.
3) Finally I added a Data Flow Task with an OLE DB source configured to use the same variable (the one I mapped in the Foreach Loop container).
Right now it shows the default value I gave User::Variable. I can iterate queries with the same number of columns (the same metadata) and store the results in an Excel destination. The problem is that when the variable moves on to a query that returns fewer or more columns, the package fails; it can't handle tables with different metadata.
Please assist me: can we iterate over queries with different metadata in the same loop and get proper output? I hope I have explained the problem I'm facing.

Set the default value of User::Variable to one of the queries, so that BIDS can validate the package at design time.
You can also try setting DelayValidation to true, but that might not be enough in this case.

Set DelayValidation to true for both the Data Flow Task and the Foreach Loop container.
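
Note that neither setting changes the fact that a Data Flow's column layout is fixed at design time. If the four queries genuinely return different columns, one workaround, offered here only as a sketch and not something the answers above propose, is to run each query from a Script Task inside the loop, since a DataTable can take whatever shape the current query returns. The connection string below is a placeholder; User::Variable is the string variable from the question.

// SSIS Script Task sketch, placed inside the Foreach Loop container.
// Runs whichever query the loop put into User::Variable and loads it
// into a DataTable, which tolerates changing column metadata.
using System.Data;
using System.Data.OleDb;

public void Main()
{
    // The loop maps the current query text into User::Variable.
    string query = Dts.Variables["User::Variable"].Value.ToString();

    using (var connection = new OleDbConnection(
        "Provider=MSOLEDBSQL;Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI;"))
    using (var adapter = new OleDbDataAdapter(query, connection))
    {
        var table = new DataTable();
        adapter.Fill(table);   // columns and types follow whatever the current query returns

        // table.Columns / table.Rows can now be written out, e.g. one
        // worksheet per query, without a fixed design-time column list.
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

Writing each DataTable to its own worksheet would then be the script's job, since a single Excel destination in a Data Flow still expects fixed metadata.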

Related

SSIS Mapping and Transformation

I'm new to building SSIS packages; in fact, this is my first package. I need to pull data from a DB view on an Azure managed instance to SQL on-prem. I have built out the data flow and all. I'm moving data from a database view into another database table, but the destination table has a column that the source doesn't have, so my destination mapping view looks like the attached image. How do I fix this, or what are my options?
If this column needs to stay empty and you don't have it in the source, your best and only option is to leave it like this. The destination simply ignores the column, so no information will be fed to it. That will work.
In case you need it populated with, say, the current date, you can add a Derived Column transformation between your source and destination in the Data Flow, where you can add the current date (the expression GETDATE()) or more columns that come from variables, for example (see the sketch after these answers).
It's self-explanatory that "ignore (optional)" means the mapping for those columns can be ignored, and if you want a column mapped to a calculated value you can do it with the Derived Column SSIS component (see the linked reference).
For your use case, try the OLE DB destination component instead of the ADO.NET component
to optimize performance on a relatively large data set.
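
Where a Derived Column fits, an expression like GETDATE() is all you need. If the new value requires logic beyond SSIS expressions, a Script Component (transformation) is another option; this is only a sketch, and "LoadDate" is a hypothetical output column added on the component's Inputs and Outputs page:

// Script Component sketch: populate a destination-only column the source lacks.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Runs once per row flowing through the Data Flow.
    Row.LoadDate = DateTime.Now;
}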

Azure Data Factory Lookup and For Each

I have a Data Factory pipeline that I want to iterate through the rows of a SQL Lookup activity. I have narrowed the query down to three columns and 500 rows.
I understand that to reference a value in the table I use:
@{activity('lookupActivity').output.value[row#].colname}
However, the ForEach needs something to iterate over. My first guess is to set some array variable to the rows of the returned SQL query. So what do I set that variable to?
@activity('lookupActivity').output.value?
Lastly, it looks like almost all data in ADF is represented as JSON; is this true? And how can I view the output of this Lookup as JSON so I can understand what my dynamic content needs to look like?
You're right that nearly everything is JSON. (Exception: Azure Data Factory v2: Activity execute pipeline output.)
So you can put your @activity('lookupActivity').output.value, which is an array, into the ForEach activity's Items field on the Settings tab.
Then, inside your ForEach loop, you reference the current value of one of the columns as @item().colname.
You can pass the output value to the ForEach activity and go through the rows one at a time, sequentially or in parallel depending on your needs.

Set variable as today and yesterday in Pentaho DI

I am creating a transformation in Pentaho DI to extract data from Google Analytics. I need to set the start date and end date in "Query Definition" to yesterday and today. I understand this can be done by creating two variables, e.g. ${today} and ${yesterday}. However, I don't know how to make their values change dynamically at every run. Any idea on how to do this?
Thanks,
I can think of an easy way to do this. The first thing to know is that you can't declare and use variables in the same transformation. I would suggest you approach this problem in the following way:
Create a transformation before this one, say "set variables transformation". In this transformation you will set the variables.
You can use the Get System Info step to set today's and yesterday's dates as variables. Use the Copy rows to result step to pass these rows to the next transformation.
In the next transformation, the one in your screenshot, use the Get Variables step and use these variables in your input step. Or you can use the Get rows from result step as well.
You don't need to worry about the dates any more, because they will be generated at each run and your variables will get their values dynamically.
You can check this article if you want to learn more about how to pass values from one transformation to another:
https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Hope it helps!
For that, you have to use a job: add the first transformation and, inside it, use the
Get System Info step to add today's and yesterday's dates as variables, and link it to a Set Variables step. Set the scope of the variables to "Valid in the parent job".
In the second transformation, use Get Variables.
It took me a while to solve this myself, and the way I ended up doing it is as follows:
I created a transformation (called "set formatted_today variable") that contains two steps:
A "Table input" step with a query like:
select to_char(current_timestamp, 'YYYY-MM-DD-HH-MI') as formatted_today
The output of the "Table input" step goes to a "Set Variables" step; you can use the "Get Fields" button to wire the fields named in your query to the variables you want to set. In this case, my field is called "formatted_today" and so is my variable.
In my main job, I have a "set session variables" entry that creates my "formatted_today" variable.
Immediately after it, I call my "set formatted_today variable" transformation.
Anywhere I need this variable, I insert ${formatted_today} in the text.

Passing data from one Pentaho transformation to another in a job?

Fairly straightforward question, I think; I just haven't been able to find a clear example. I have a very complex transformation that I'm breaking down into a job. Having never created a job before, I'm struggling to send the data from one transformation to another. I used Copy rows to result in the first one and Get rows from result in the second one, but I feel like I'm still missing something. When I used Get rows from result, I had to specify the field names myself; there was no Get Fields button. I also can't preview the data in the transformation without running the job and having it save to an Excel file. When I did that, ALL of the fields were in the output file, instead of just the ones I'd specified in the second transformation.
I've searched through the documentation and tried Googling, but I can't find a clear walkthrough on how to move data smoothly from one transformation to another. Any responses would be appreciated, even if it's just pointing me towards something I've overlooked.
Thanks!
The most common way is to use Copy rows to result at the end of one KTR and Get rows from result as the starting point of the next one. Though you can't really "see" the result while working in the next KTR, what you can do to ease the reading is open a preview window and leave it open to see all the column names and data.
However, if you only want to pass a few values through to the next KTR, you can use a Set Variables step at the end of the first KTR and read those variables at any time in the second with Get Variables steps. Don't forget that if you do so, you need to define the variables in the parent KJB (the job that calls the first KTR) with no default value, and the variable scope type of the Set Variables step has to be set to "Valid in the parent job".
The best way is to create the KTRs and run/test each one on its own. That way you can examine the resulting data and then integrate all the individual transformations into the final job.

Get list of columns of source flat file in SSIS

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we currently have is an SSIS package that imports the columns that have been defined. Since we've assigned the mappings, SSIS only throws an error when a column is absent. However, when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not mapped. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file, so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach it by writing a Script Task that reads the flat file with a StreamReader and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your Data Flow Task to accommodate new columns; it will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it and save yourself the trouble of pre-reading the column list.
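
A minimal sketch of that Script Task idea, assuming a comma-delimited file whose first line names the columns; the variable names ("User::FilePath", "User::ExpectedHeader", "User::HeaderChanged") are hypothetical, not from the question:

// Script Task sketch: read the header line, derive the column list,
// and flag a change so the control flow can branch and send the alert.
using System;
using System.IO;
using System.Linq;

public void Main()
{
    string path = Dts.Variables["User::FilePath"].Value.ToString();
    string expected = Dts.Variables["User::ExpectedHeader"].Value.ToString();

    string header;
    using (var reader = new StreamReader(path))
    {
        header = reader.ReadLine() ?? string.Empty;   // first line = column names
    }

    string[] columns = header.Split(',');             // the column list the question asks for
    string[] expectedColumns = expected.Split(',');

    // Any difference in names, order, or count counts as a change.
    bool changed = !columns.SequenceEqual(expectedColumns, StringComparer.OrdinalIgnoreCase);
    Dts.Variables["User::HeaderChanged"].Value = changed;

    Dts.TaskResult = (int)ScriptResults.Success;
}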
I agree with the answer provided by @TabAlleman. SSIS can't natively handle dynamic columns (and neither can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a C# Script Task. One way to do this is to create a flat file connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row, and save that row to a Recordset object. Any change? Send email.
The "Get Header Row" data flow would look like the first screenshot (add a Row Number if needed).
The control flow level would look like the second screenshot: use a Foreach ADO enumerator over the Recordset object to assign the header row value to an SSIS variable, CurrentHeader.
Above, the precedence constraints (the fx icons) of
@[User::ExpectedHeader] == @[User::CurrentHeader]
@[User::ExpectedHeader] != @[User::CurrentHeader]
determine whether you load data or send email.
Hope this helps!
I have worked for banking clients, and for banks, randomly adding columns to a DB is not possible due to federal requirements and rules. That said, I gather yours is not a fed-regulated business, so here are some steps.
This is not a code issue but more one of soft skills and working with other teams (yours and your vendor's).
Steps you can take are:
(1) Agree on a solid column structure that you always require, because for newer columns the older data rows will carry NULL.
(2) If a new column is going to be sent by the vendor, you or your team needs to make the DDL/DML changes to the table where the data will be inserted, of course with the correct data type.
(3) Document this change in the data dictionary, since over time you or another team member will analyze this data and will want to know the use of each attribute or column.
(4) Long-term, you do not want to keep changing the table structure monthly because one of your many vendors decided to change the style in which they send you data. Some clients push back very aggressively, others not so much.
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic.
One thing I always do is use a Script Task to read the first and last lines of a file.
If they are not the expected list of CSV columns, I mark the file as errored and continue or fail as required.
Headers are obviously important, but so are footers: a file can, through any unknown issue, be only partially built. Asking for the header to be repeated at the end of the file gives you a double check (sketched below).
I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add or reorder columns and assume things will still work.
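
A sketch of that first-and-last-line check, under the assumption (as suggested above) that the vendor repeats the header as the file's final line; the helper name is mine, not the answerer's:

// Sketch: a file passes only if both its first and last lines match the expected
// header, guarding against layout changes and partially written files.
using System;
using System.IO;

public static bool HeaderAndFooterMatch(string path, string expectedHeader)
{
    string first = null, last = null;
    foreach (string line in File.ReadLines(path))   // streams the file once
    {
        if (first == null) first = line;
        last = line;
    }

    return string.Equals(first, expectedHeader, StringComparison.OrdinalIgnoreCase)
        && string.Equals(last, expectedHeader, StringComparison.OrdinalIgnoreCase);
}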
1- SSIS does not provide dynamic source-to-destination mapping, but some third-party components, such as Data Flow Task Plus, support this feature.
2- We can achieve the check using an SSIS Script Task.
3- If the header is correct, process further with the migration; otherwise fail the package before the DFT executes.
4- Read the header line using the Script Task and store it in an array or list object.
5- Then compare those array values to a user-defined variable, declared earlier, whose default value contains the column names.
6- If the values match exactly, progress further; otherwise fail the package (see the sketch after this list).
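
A minimal Script Task sketch of steps 4-6; "User::FilePath" and "User::ExpectedColumns" (a comma-separated default value holding the column names) are hypothetical variable names:

// Steps 4-6: read the header into an array, compare it to the expected
// column names, and fail the task (and hence the package) on any mismatch.
using System;
using System.IO;
using System.Linq;

public void Main()
{
    string path = Dts.Variables["User::FilePath"].Value.ToString();

    // Step 5's user-defined variable, declared earlier with the column names as default value.
    string[] expected = Dts.Variables["User::ExpectedColumns"].Value.ToString().Split(',');

    string[] actual;
    using (var reader = new StreamReader(path))
    {
        actual = (reader.ReadLine() ?? string.Empty).Split(',');   // step 4
    }

    // Step 6: an exact match lets the package continue to the DFT; otherwise fail first.
    Dts.TaskResult = actual.SequenceEqual(expected, StringComparer.Ordinal)
        ? (int)ScriptResults.Success
        : (int)ScriptResults.Failure;
}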