Accept parameters from a SQL table in an ADF pipeline - azure-sql-database

I have a request to select values from a SQL table and pass them to an ADF pipeline.
Example:
SQL table --> abc with columns (col1, col2, col3, col4, col5)
I need to pass the col1 to col5 values as input to a pipeline abc.
How can we do that? Please suggest whether this is a feasible solution.

You can use the flow below to achieve this:
Lookup activity: this would contain your select query.
After the Lookup activity >>> Execute Pipeline activity, where you pass the Lookup activity's output values as input to the child pipeline:
@activity('LookupActivityNm').output.value[0].col1
@activity('LookupActivityNm').output.value[0].col2
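As a rough sketch, assuming the child pipeline abc defines parameters named p_col1 through p_col5 (illustrative names) and the Lookup returns a single row, the parameter values on the Execute Pipeline activity would look like:
p_col1: @activity('LookupActivityNm').output.value[0].col1
p_col2: @activity('LookupActivityNm').output.value[0].col2
p_col3: @activity('LookupActivityNm').output.value[0].col3
p_col4: @activity('LookupActivityNm').output.value[0].col4
p_col5: @activity('LookupActivityNm').output.value[0].col5
If the Lookup is configured with "First row only" enabled, the equivalent reference is @activity('LookupActivityNm').output.firstRow.col1.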

Related

How do I write a Spark SQL query which doesn't fail if a field is not present in the JSON data

I am writing a Spark SQL job to process JSON data daily. Some days, a field is not present at all in the JSON. In that case, the corresponding column is missing from the schema inferred from the JSON. Is there a way to get around this?
select col1, col2, col3.data from table1
When col3.data is not present, the job fails. How can I handle this gracefully?
I tried using the following to handle this:
* COALESCE(col3.data, 'dummyData')
This works when data is NULL, but not when the data field itself is not present.
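A rough illustration of the failure mode, using the same query shape (the comments describe assumed daily inputs):
-- If at least one of the day's records contains col3.data, the column is in the
-- inferred schema and COALESCE covers the NULL cases.
-- If none of the day's records contain col3.data, the inferred schema has no such
-- field, so the query below fails to resolve col3.data before COALESCE can help.
select col1, col2, coalesce(col3.data, 'dummyData') as data from table1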

Why is the SQL sink TableName = "DefaultTableName" and not the value supplied in the parameter?

I am moving data from Dataverse / Data Lake into an Azure SQL server. I have a parameter defined for the TableName.
The source uses the parameter to read data from the required table, but the sink does not.
(The original question included screenshots showing where the parameters are defined and passed down from the pipeline, the sink, and the sink settings that use the dataset parameter.)
I get a table created called [landing].[DefaultTableName] no matter what I set as my TableName parameter.
Here is a working example (each step was illustrated with a screenshot in the original answer):
* DataFlow with dynamic SQL sink
* Parameterized table name in the sink Dataset
* Pipeline setup and new parameter value input
* Execute
Conclusions:
What you are trying to do is pass the Dataflow parameter value to the sink dataset parameter, which is not possible. You can only pass values into Dataflow parameters from the pipeline, not the other way around. And you cannot refer to a Dataflow parameter in a Dataset because they are created in different scopes.
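A rough sketch of that working setup, assuming a pipeline parameter named TableName (the names here are illustrative): supply the value from the pipeline to each scope separately rather than trying to forward it from the Dataflow to the dataset, e.g.
Data Flow activity > Settings > sink dataset parameter TableName: @pipeline().parameters.TableName
Data Flow activity > Parameters > TableName (if the data flow itself needs it): @pipeline().parameters.TableName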

Transform Data using Stored Procedure in Azure Data Factory

I'm trying to write a pipeline using Azure Data Factory v2. The data comes from Azure Table Storage, which I then need to transform by retrieving more data from Azure SQL, and then send to another SQL database for insertion.
What I have in mind is to use:
Lookup from Table Storage -> For Each Row -> Execute SP -> Append Data to Lookup Output -> Execute SP to insert into another SQL database.
I am not sure if what I want to achieve is doable with Data Factory, or if I'm even approaching this from the right angle.
I don't understand what you mean by the step "Append Data to Lookup Output". You cannot add data to the result of a Lookup activity.
What you can do is store that output in a table in the first Azure SQL database and perform another Lookup to grab all the data.
Hope this helped!
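A minimal sketch of that staging approach, with hypothetical table and column names (none of them come from the question):
-- Staging table in the first Azure SQL database; the ForEach / stored procedure step
-- writes each enriched row here instead of appending to the Lookup output.
CREATE TABLE dbo.EnrichedRowsStaging (
    SourceRowKey nvarchar(128),
    EnrichedValue nvarchar(256)
);
-- Query for the second Lookup (or a Copy activity source) that grabs all the data
-- for the final insert into the other SQL database.
SELECT SourceRowKey, EnrichedValue FROM dbo.EnrichedRowsStaging;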

Pentaho: set a variable for jobs

I am new to Pentaho. I have a job with 3 transformations, and all 3 transformations are similar. Each transformation has a SQL query, something like:
select * from table1 where table1.col1='XXX' and tab2.col2='YYYY'
The value of col1 remains the same in all of them. I want to pass it as a variable in the job instead of replacing it in each transformation. What are the steps to do that?
Split your transformation in two:
* one for setting the variable,
* a second one that uses the variable set in step #1.
Please refer to the Pentaho documentation available online:
http://wiki.pentaho.com/display/COM/Using+Variables+in+Kettle
http://wiki.pentaho.com/display/EAI/Set+Variables
You cannot use variables in the same transformation where you set them.
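A rough sketch, with an illustrative variable name: set a variable such as COL1_VALUE with a Set Variables step in the first transformation, then in the Table Input step of each later transformation enable "Replace variables in script?" and reference it in the query:
select * from table1 where table1.col1 = '${COL1_VALUE}' and tab2.col2 = 'YYYY'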

Pentaho Data Integration (PDI): after selecting records I need to update the field value in the table using a Pentaho transformation

I have a requirement to create a transformation where I have to run a select statement. After selecting the values, it should update the status so it doesn't process the same record again.
Select file_id, location, name, status
from files
OUTPUT:
1, c/user/, abc, PROCESS
Updated output should be:
1, c/user/, abc, INPROCESS
Is it possible, in a single PDI transformation, to do a database select and cache the records so the same record isn't processed again, meaning I wouldn't need to update the status in the database? Something similar to a dynamic lookup in Informatica. If not, what's the best possible way to update the database after doing the select?
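For illustration only, using the table and column names from the sample above (the choice of step is an assumption), the status update could be run from the transformation with an Update step or an Execute SQL script step:
-- Mark the selected row so a later run skips it; in practice the literal 1 would be
-- replaced by the file_id value coming from the stream.
UPDATE files SET status = 'INPROCESS' WHERE file_id = 1;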
Thanks, that helps. You wouldn't do this in a single transformation, because of the multi-threaded execution model of PDI transformations: you can't count on a variable being set until the transformation ends.
The way to do it is to put two transformations in a job and create a variable in the job. The first transformation runs your select and flows the result into a Set Variables step; configure it to set the variable you created in your job. Then you run the second transformation, which contains your Excel Input step, and specify your job-level variable as the file name.
If the select gives more than one result, you can store the file names in the job's file results area. You do this with a Set files in result step. Then you can configure the job to run the second transformation once for each result file.