Azure Data Factory Lookup and For Each - azure-data-factory-2

I have a Data Factory pipeline that I want to iterate through the rows returned by a SQL Lookup activity. I have narrowed the query down to three columns and 500 rows.
I understand that to reference a value in the table I use:
@{activity('lookupActivity').output.value[row#].colname}
However, the ForEach needs something to iterate over. My first guess is to set some array variable to the rows of the returned SQL query. So what do I set that variable to?
@activity('lookupActivity').output.value?
Lastly, it looks like almost all data in ADF is represented as JSON. Is this true? And how could I view the output of this Lookup as JSON, so I can understand what my dynamic content needs to look like?

You're right that nearly everything is JSON. (One exception is the output of the Execute Pipeline activity; see "Azure Data Factory v2: Activity execute pipeline output".)
So you can put @activity('lookupActivity').output.value, which is an array, into the ForEach activity's Items box on the Settings tab.
Then, inside your ForEach loop, you reference the current value of one of the columns as @item().colname.
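To see what the Lookup actually returns, run the pipeline in Debug mode and click the output of the Lookup activity in the run results. As a rough sketch (the column names here are placeholders), a Lookup with "First row only" unchecked produces output shaped roughly like this:

{
    "count": 500,
    "value": [
        { "col1": "a", "col2": "b", "col3": "c" },
        { "col1": "d", "col2": "e", "col3": "f" }
    ]
}

The value array is what goes into the ForEach Items box, and each element of it becomes @item() inside the loop.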

You can pass the Lookup's output value to the ForEach activity and go through the rows one at a time. The loop can run sequentially or in parallel, depending on your needs.

Related

Pass Multiple Value to a For Each in Azure Data Factory

Can someone let me know if it's possible to pass a parameter and an activity to a For Each in Azure Data Factory?
I want to pass the parameter 'relativeURLs' into a For Each.
I would then like to do a For Each on the Lookup activity 'CompanyId Lookup'.
Is that possible?
I am not sure I have understood the ask correctly:
"I would then like to do a For Each on the Lookup activity 'CompanyId Lookup'. Is that possible?"
This should go in the ForEach Items:
@activity('your lookup activity name').output.value
Since the relative URL is a parameter, you can reference it directly inside the ForEach loop.
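For example, an activity inside the loop could read the parameter with an expression like @pipeline().parameters.relativeURLs, and the current Lookup row with something like @item().CompanyId (assuming, hypothetically, that the Lookup returns a column named CompanyId).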
Here is the procedure to pass multiple values to a For Each in Azure Data Factory.
1. Create the linked service and dataset.
2. Create a pipeline parameter relativeURL (an array) with the respective values.
3. Read the data with the Lookup activity.
4. Set the ForEach Items so the loop iterates over the parameter's indexes:
@range(0, length(pipeline().parameters.relativeURL))
5. Use the two values inside the ForEach via their indexes (see the sketch after these steps).
6. In the ForEach, check the Sequential option.
7. Create variables for the different values.
8. The value of one Set Variable comes from the Lookup activity:
@string(activity('Lookup1').output.value[0].data[item()])
9. The value of the other Set Variable comes from the pipeline parameter relativeURL:
@pipeline().parameters.relativeURL[item()]
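As a worked sketch of how the indexes line up, suppose relativeURL is set to ["companies", "invoices"] (an assumed value) and the Lookup's first row has a data column holding a matching two-element array. The ForEach then iterates over the indexes 0 and 1, and on the first iteration (item() = 0) the two Set Variable expressions evaluate to:

@string(activity('Lookup1').output.value[0].data[0])    -> first value from the Lookup row
@pipeline().parameters.relativeURL[0]                   -> "companies"

On the second iteration item() = 1 picks the second element of each array, so the two variables always stay in step.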

Read for-each-loop container variable in OLEDB source variable windows

I made a table named QueryTable that stores 4 SQL queries, each with different metadata.
I want to store the results of these four queries in an Excel sheet.
1) First I added an Execute SQL Task, configured the connection and the query statement, and set the Result Set to Full result set.
Then, on the Result Set tab, I mapped the result to Query_variable, an Object-type variable.
2) I dragged in a Foreach Loop Container, set the enumerator to Foreach ADO Enumerator in the Collection section, and assigned Query_variable.
In the Variable Mappings section I created a new String-type variable to hold each query.
3) Finally I added a Data Flow Task with an OLE DB Source configured to use that same variable (the one mapped in the Foreach Loop Container).
Right now it only shows the default value that I gave User::Variable.
I can iterate over queries that return the same number of columns (the same metadata) and store them in the Excel destination.
But the problem is that when the variable moves to the next query, which returns fewer or more columns, the package fails; it cannot handle tables with different metadata.
Please assist me: can we iterate over queries with different metadata in the same loop and still get proper output?
I hope I have explained the problem I am facing clearly.
Set the default value of User::Variable to one of the queries, so that BIDS can validate it at design time.
You can also try setting "DelayValidation" to true, but that might not be enough in this case.
Set DelayValidation to true for both the Data Flow Task and the Foreach Loop Container.

Pentaho - Having multiple Copy rows to result results in Get rows from result empty

I'm trying to process some data and store it in a data warehouse. To do that, I wanted to load the dimensions in one transformation and the fact (I only have one) in another transformation, so I can use a job to execute the first one, copy rows to result, and get them into the second transformation.
In the first transformation, I read an Excel file and split the data into several streams. It is data from a baptism, so I have one stream for the person, another for the parents, another for the sponsors, and so on... At the end of each stream, I insert the data into the database and get back the autogenerated PK (an auto-increment id).
In the second one, I only have Get rows from result and want to write the rows to a txt file (just to check that it is being done correctly). The problem is that the file is created but it is empty. I assumed that if I leave the Fields grid in Get rows from result empty, it picks up all fields.
What am I doing wrong?
What I ultimately want is one Copy rows to result at the end of each stream in the first transformation, and to get all of that data in the second one.
In "Insert Pare Padrina" I return id_pare_padrina, which is autogenerated, and the same with "Insert Mare Padrina" (I have more streams that I also have to include in the result). This transformation is not executed per row because I need values from other rows.
Thank you!
In order to pass the data from the first transformation to the second transformation, you need to configure a few things:
1. First of all, in the job-level settings of the second transformation entry, enable the following options:
Copy previous results to parameters ensures that all the rows produced by the "Copy rows to result" step are properly passed to the next level.
Execute for every input row executes the second transformation once for every row coming from the first transformation. This is optional, depending on your requirement.
2. In the same settings, define the parameters in the Parameters tab.
Here, NAME is the parameter I have defined. When you use "Get rows from result", you can refer to these parameter names.
3. Instead of "Get rows from result", you can alternatively use the "Get Variables" step to fetch the values coming from the previous transformation. All you need to do is define the parameter names inside the ktr file (Ctrl+T). (I have implemented it that way in practice and it worked for me.)
4. Since the "Copy rows to result" step uses heap memory, defining multiple instances of this step might exhaust the memory quickly and get your job into trouble. Ideally, use a single instance of this step.
If your data is only a single row, the best option would be to use the "Set Variables" step instead.
I assume you might have missed some of these sections in the job.
You can read more on Copy rows to result here.
Hope it helps :)

Adding two extra columns to input data - Pentaho Kettle

I am working on a transformation step for Pentaho Kettle. It selects several input columns and, based on those, adds two new columns during the transformation. I am unable to understand (based on code from other plugins) how I can add the two new columns so that 1) downstream steps are aware of these columns and 2) I can push the transformed data into these columns.
Thanks in advance.
You might need to override meta.getStepFields() to add new ValueMetaInterface objects to the RowMetaInterface passed in. This is the standard way to add columns at runtime; however, the row's metadata (i.e. list of ValueMetaInterface objects) must be the same from row to row or else the next step in your transformation will complain.
Often when doing data-driven custom plugins, you consume as many rows as you need (using getRow()) in order to figure out what the outgoing row format/metadata will be, then you can construct a RowMetaInterface (usually using meta.getStepFields()) that will be passed into the putRow() call. If you intend to pass through the incoming fields, do something like:
RowMetaInterface outputRowMeta = (RowMetaInterface) getInputRowMeta().clone();
If you're creating new rows use this:
RowMetaInterface outputRowMeta = new RowMeta();
Either way when you call meta.getStepFields(outputRowMeta, ...) it should populate outputRowMeta with the appropriate fields, by adding/changing/removing ValueMetaInterface objects from outputRowMeta.
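As a rough Java sketch of the pass-through case (the method name and signature follow the PDI 4.x/5.x step API as I recall it, and the column names newColA/newColB are just placeholders), the meta class announces the two extra fields and the step class then fills them in:

// (imports from org.pentaho.di.core.row, org.pentaho.di.core.exception and
//  org.pentaho.di.core.variables assumed)
// In the step's meta class: append the two new columns so downstream steps see them
public void getFields( RowMetaInterface inputRowMeta, String origin, RowMetaInterface[] info,
    StepMeta nextStep, VariableSpace space ) throws KettleStepException {
  ValueMetaInterface colA = new ValueMeta( "newColA", ValueMetaInterface.TYPE_STRING );
  colA.setOrigin( origin );
  inputRowMeta.addValueMeta( colA );

  ValueMetaInterface colB = new ValueMeta( "newColB", ValueMetaInterface.TYPE_STRING );
  colB.setOrigin( origin );
  inputRowMeta.addValueMeta( colB );
}

// In the step's processRow(): widen each incoming row and fill the two new slots
Object[] outputRow = RowDataUtil.resizeArray( row, data.outputRowMeta.size() );
outputRow[ data.outputRowMeta.size() - 2 ] = computedValueA;
outputRow[ data.outputRowMeta.size() - 1 ] = computedValueB;
putRow( data.outputRowMeta, outputRow );

Here data.outputRowMeta is the cloned-and-populated RowMetaInterface described above, typically built once when the first row arrives.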
I've got a blog post using Groovy to add/replace fields in the incoming rows here:
http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
Not sure if that is similar to your use case or not. If you have more questions, feel free to find me on IRC at ##pentaho (my nick is usually mburgess_pdi)
If I have understood your question correctly, I think you are trying to create an output file with dynamic columns. You can do this by checking the "fast dumping" option in the Text file output step. While doing so, do not define any column names in the "Fields" tab.
Hope it helps :)

Read column values to variable in SSIS using for each loop

I want to read the result set of a table using the following statement:
SELECT col1 AS A, col2 AS B FROM tablename;
Then, I want to read each row of the result set into local variables of the SSIS package and for each row I have to pass the values to the script task.
I want to use a Foreach Loop in SSIS, and I chose the Foreach Item Enumerator.
The question: how do I read the values into the variables using the Foreach Item Enumerator? And for the iteration count, can I use SELECT COUNT(*) FROM tablename;, pass that value to a variable, and use the count value in the Foreach loop?
I'm stuck on how to assign the count value and how to read the columns into variables. Can anyone help with these?
Thanks in advance.
I'm not exactly sure what it is that you're trying to do, but it would seem that you're trying to process data in your control flow. The foreach iterator is not made for processing data sets, it's made for iterating over multiple data sets and doing something to each of them, usually passing them to a data flow.
You might find it more useful to create a data flow. Start with a data source component that gets the data that you want and then pass the data to a Script Component to do the processing.