HTTP request in Azure Data Factory

In Azure Data Factory, I need to tap into HTTP requests via URL using the HTTP connector. I was able to do this, as well as set up the dataset. Where I'm having issues is in the pipeline. Here's what I need to do; what is the best way to accomplish it?
Call out to the service base URL and retrieve the TotalPages header it returns.
Using the value of TotalPages, make subsequent requests to the URL with the page parameter (e.g., page=1, page=2, etc.), using the TotalPages value to form those requests.
Thanks.

OK, so the issue here is that you cannot nest control-flow activities (like ForEach) in Data Factory more than one level deep. The solution is to create two or more pipelines (aka master and child).
From the master pipeline, retrieve the number of tasks you will need to execute and pass it to a ForEach activity. Within the ForEach, launch a child pipeline for each item, which then executes the second activity.
If the activity is simple enough, you can skip the child pipeline altogether and run it directly inside the ForEach.
As a JSON representation, the pipelines in question should look something like this:
{
    "name": "generic_master",
    "properties": {
        "activities": [
            {
                "name": "Web1",
                "type": "WebActivity",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "url": "https://jsonplaceholder.typicode.com/posts/1",
                    "method": "GET"
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Web1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Web1').output",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "Execute Pipeline1",
                            "type": "ExecutePipeline",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "generic_child",
                                    "type": "PipelineReference"
                                },
                                "waitOnCompletion": true
                            }
                        }
                    ]
                }
            }
        ],
        "annotations": []
    }
}
{
    "name": "generic_child",
    "properties": {
        "activities": [
            {
                "name": "Web1",
                "type": "WebActivity",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "url": "https://jsonplaceholder.typicode.com/posts/1",
                    "method": "POST"
                }
            }
        ],
        "annotations": []
    }
}
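Note that, as written, the master sample doesn't pass anything down to the child. In practice you would define a parameter on generic_child (for example page; the name is purely illustrative) and forward the current ForEach item through the Execute Pipeline activity, along these lines (a sketch, not UI-generated code):

"typeProperties": {
    "pipeline": {
        "referenceName": "generic_child",
        "type": "PipelineReference"
    },
    "parameters": {
        "page": {
            "value": "@item()",
            "type": "Expression"
        }
    },
    "waitOnCompletion": true
}

The child can then reference @pipeline().parameters.page in its own activities.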

In order to read the TotalPages value from the HTTP request's response, you can use a Lookup activity to submit the HTTP request, then store the TotalPages value in a variable with the Set variable activity.
Actions:
Pipeline level:
create a variable called TotalPages
Lookup activity:
tick the first row only box on the Settings tab
As a source dataset, use the data set defined for your HTTP request
Select the GET method.
Set variable activity:
Select the TotalPages variable on the Variables tab
In the value box, click on "Add dynamic content" and enter something like this: @{activity('GetTotalPages').output.firstRow.RegisterSearch['#TotalPages']}
In my case the Lookup activity is called GetTotalPages, and my HTTP request returns the total number of pages in a RegisterSearch array, under the column name #TotalPages.
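With TotalPages stored, the page-by-page calls from the original question can be generated by a ForEach whose items are a range expression, with the page number interpolated into each request. A minimal sketch (the URL is a placeholder; in a real pipeline you would more likely pass @item() to a dataset parameter feeding a Copy activity):

{
    "name": "ForEachPage",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@range(1, int(variables('TotalPages')))",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "GetPage",
                "type": "WebActivity",
                "typeProperties": {
                    "url": "https://example.com/api/records?page=@{item()}",
                    "method": "GET"
                }
            }
        ]
    }
}

range(1, N) yields the integers 1 through N, so each iteration's @item() is the page number.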

Related

REST dataset for Copy Activity Source give me error Invalid PaginationRule

My Copy Activity is set up to use a REST GET API call as my source. I keep getting Error Code 2200, Invalid PaginationRule, RuleKey=supportRFC5988.
I can call the GET REST URL using the Web activity, but this isn't optimal, since I then have to pass the output to a stored procedure to load the data into the table. I would much rather use the Copy Activity.
Any ideas why I would get an Invalid PaginationRule error on the call?
I'm using a REST Linked Service with the following properties:
Name: Workday
Connect via integration runtime: link-unknown-self-hosted-ir
Base URL: https://wd2-impl-services1.workday.com/ccx/service
Authentication type: Basic
User name: Not telling
Azure Key Vault for password
Server Certificate Validation is enabled
Parameters: Name:format Type:String Default value:json
Datasource:
"name": "Workday_Test_REST_Report",
"properties": {
"linkedServiceName": {
"referenceName": "Workday",
"type": "LinkedServiceReference",
"parameters": {
"format": "json"
}
},
"folder": {
"name": "Workday"
},
"annotations": [],
"type": "RestResource",
"typeProperties": {
"relativeUrl": "/customreport2/company1/person%40company.com/HIDDEN_BI_RaaS_Test_Outbound"
},
"schema": []
}
}
Copy Activity
{
    "name": "Copy Test Workday REST API output to a table",
    "properties": {
        "activities": [
            {
                "name": "Copy data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "RestSource",
                        "httpRequestTimeout": "00:01:40",
                        "requestInterval": "00.00:00:00.010",
                        "requestMethod": "GET",
                        "paginationRules": {
                            "supportRFC5988": "true"
                        }
                    },
                    "sink": {
                        "type": "SqlMISink",
                        "tableOption": "autoCreate"
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "Workday_Test_REST_Report",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "Destination_db",
                        "type": "DatasetReference",
                        "parameters": {
                            "schema": "ELT",
                            "tableName": "WorkdayTestReportData"
                        }
                    }
                ]
            }
        ],
        "folder": {
            "name": "Workday"
        },
        "annotations": []
    }
}
Well, after posting this I noticed that in the copy activity code there is a nugget about "supportRFC5988": "true". I switched the true to false, and everything just worked for me. I don't see a way to change this in the Copy Activity GUI.
Editing the source code and setting this option to false helped!
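For reference, this is the relevant source block after the edit; only the pagination rule changes from the JSON above:

"source": {
    "type": "RestSource",
    "httpRequestTimeout": "00:01:40",
    "requestInterval": "00.00:00:00.010",
    "requestMethod": "GET",
    "paginationRules": {
        "supportRFC5988": "false"
    }
}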

Karate For loop to get ids based on pattern and then use a delete feature

I have a response from an API call that gives me a list of devices, each with an id. Some of these devices are test devices, with ids starting with the prefix 'Test', for example Test319244.
I wish to retrieve only those ids with the prefix 'Test', perhaps in an array, and be able to pass them to another feature file which takes the device id as the parameter to delete it. Basically, I want to delete all the test devices.
Here is the sample response that contains all the device ids:
{
    "items": [
        {
            "deviceId": "004401784033074000",
            "deviceType": "AVMAP_TMR",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-20T00:00:00.000+00:00",
                "modifiedAt": "2020-07-20T00:00:00.000+00:00"
            }
        },
        {
            "deviceId": "Test319246",
            "deviceType": "AVMAP_TMR",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-21T00:00:00.000+00:00",
                "modifiedAt": "2020-07-21T00:00:00.000+00:00"
            }
        },
        {
            "deviceId": "Test319245",
            "deviceType": "AVMAP_TMR",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-21T00:00:00.000+00:00",
                "modifiedAt": "2020-07-21T00:00:00.000+00:00"
            }
        },
        {
            "deviceId": "Test319244",
            "deviceType": "AVMAP_TMR",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-21T00:00:00.000+00:00",
                "modifiedAt": "2020-07-21T00:00:00.000+00:00"
            }
        },
        {
            "deviceId": "command-service",
            "deviceType": "service",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-20T00:00:00.000+00:00",
                "modifiedAt": "2020-07-20T00:00:00.000+00:00"
            }
        },
        {
            "deviceId": "kafka-connect-all",
            "deviceType": "kafka-connect",
            "disabled": false,
            "metadata": {
                "createdAt": "2020-07-20T00:00:00.000+00:00",
                "modifiedAt": "2020-07-20T00:00:00.000+00:00"
            }
        }
    ],
    "metadata": {
        "pagination": {
            "limit": 50,
            "offset": 0,
            "previousOffset": 0,
            "nextOffset": 0,
            "totalCount": 15
        },
        "sortedBy": [
            {
                "field": "deviceId",
                "order": "ASC"
            }
        ]
    }
}
Here in the above example, I only want to delete the devices with the ids Test319244, Test319245 and Test319246.
How can I get an array of ids based on the pattern (Testxxxxxx) and pass it on to another feature file?
I need help to define an array of ids like:
* def ids = extract the ids based on the pattern
# pass the ids to the delete feature which would send the id one at a time and delete the device.
* def delete = call(delete.feature) ids
This is how the delete scenario feature file looks:
Scenario: Delete Device
# device_registry_url defined in karate-config.js
Given url device_registry_url
And path '/device/'+DeviceID
And header Authorization = authheader
And request ''
When method delete
Then status 200
Would this be the right approach, or could we do it in a better way? If so, can someone kindly help me with how to do it, please?
Just use karate.filter() and then you know what to do:
* def fun = function(x){ return x.deviceId.startsWith('Test') }
* def filtered = karate.filter(response.items, fun)
* call read('delete.feature') filtered
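One caveat worth checking (an observation, not part of the original answer): delete.feature reads a DeviceID variable, while each filtered item carries a deviceId key. If the names don't line up in your setup, reshape the array before the call, for example:

* def toParams = function(x){ return { DeviceID: x.deviceId } }
* def params = karate.map(filtered, toParams)
* call read('delete.feature') params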

GraphJSON serialization in Gremlin.Net

I'm trying to query a TinkerPop server (hosted inside a Docker container) via the CosmosDB client library, which uses Gremlin.Net under the hood. I managed to connect to it and insert data; here's the intercepted WebSocket request:
!application/vnd.gremlin-v1.0+json{
    "requestId": "b64bd2eb-46c3-4095-9eef-768bca2a14ed",
    "op": "eval",
    "processor": "",
    "args": {
        "gremlin": "g.addV(\"User\").property(\"UserId\",2).property(\"CustomerId\",1)"
    }
}
The response:
{
    "requestId": "b64bd2eb-46c3-4095-9eef-768bca2a14ed",
    "status": {
        "message": "",
        "code": 200,
        "attributes": {
            "host": "/172.19.0.1:38848"
        }
    },
    "result": {
        "data": [
            {
                "id": 0,
                "label": "User",
                "type": "vertex",
                "properties": {}
            }
        ],
        "meta": {}
    }
}
The problem is that I do see those properties when I'm connected via the Gremlin console:
gremlin> g.V().hasLabel("User").has("CustomerId",1).has("UserId",2).limit(1).valueMap()
==>{UserId=[2], CustomerId=[1]}
Also, I'm able to query the TinkerPop server with Gremlin.Net:
!application/vnd.gremlin-v1.0+json{
    "requestId": "de35909f-4bc1-4aae-aa5f-28361b3c0933",
    "op": "eval",
    "processor": "",
    "args": {
        "gremlin": "g.V().hasLabel(\"User\").has(\"CustomerId\",1).has(\"UserId\",2).limit(1)"
    }
}
But it returns a payload with a zero-valued id and without any properties included:
{
    "requestId": "de35909f-4bc1-4aae-aa5f-28361b3c0933",
    "status": {
        "message": "",
        "code": 200,
        "attributes": {
            "host": "/172.19.0.1:38858"
        }
    },
    "result": {
        "data": [
            {
                "id": 0,
                "label": "User",
                "type": "vertex",
                "properties": {}
            }
        ],
        "meta": {}
    }
}
I tried swapping between GraphSON v1, v2 and v3 with no luck. The documentation says that script serializers should include all the properties. Do I have to tweak the config somehow to make this work and return the properties?

It seems that with version 3.4 of the Gremlin server, ReferenceElementStrategy was added by default to traversals, to preserve compatibility between binary and script serializers. In our case we wanted to mimic the behavior of CosmosDB, so to get the desired behavior just remove the strategy from the init script (in our case it was empty-sample.groovy), changing
globals << [g : graph.traversal().withStrategies(ReferenceElementStrategy.instance())]
to
globals << [g : graph.traversal()]
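If changing the server's init script isn't an option, a standard Gremlin alternative (not from the original answer) is to request the properties explicitly in the query, since ReferenceElementStrategy only strips them from returned elements:

g.V().hasLabel("User").has("CustomerId",1).has("UserId",2).limit(1).valueMap(true)

valueMap(true) returns the id and label alongside all property values.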

Azure Data Factory v2 If activity always fails

I'm currently struggling with the Azure Data Factory v2 If activity, which always fails with this error message: "Activity failed: Activity failed because an inner activity failed."
I've designed two separate pipelines: one takes the full snapshot of the data (1333 records) from the on-premises SQL Server and loads it into the Azure SQL Database, and the other just takes the delta from the same source.
Both pipelines work fine when executed independently.
I then decided to wrap these two pipelines into one parent pipeline which would do this:
1. Execute a Lookup activity to check if the target table in the Azure SQL Database has any records, with a basic Select Count(Request_ID) As record_count From target_table - the activity works fine, and I can preview the returned record count.
2. Pass the output from the Lookup activity to the If activity, with the condition that if record_count = 0 the parent pipeline invokes the full load pipeline, and otherwise the parent pipeline invokes the delta load pipeline.
This is the actual expression:
@{activity('lookup_sites_record_count').output.firstRow.record_count}==0
Whenever I try to execute this parent pipeline, it fails with the above message of "Activity failed: Activity failed because an inner activity failed."
Both inner activities, that is, full load and delta load pipelines, work just fine when triggered independently.
What am I missing?
Many thanks in advance :).
mikhailg
Pipeline's JSON definition below:
{
    "name": "pl_remedyreports_load_rs_sites",
    "properties": {
        "activities": [
            {
                "name": "lookup_sites_record_count",
                "type": "Lookup",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false
                },
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "Select Count(Request_ID) As record_count From mdp.RS_Sites;"
                    },
                    "dataset": {
                        "referenceName": "ds_azure_sql_db_sites",
                        "type": "DatasetReference"
                    }
                }
            },
            {
                "name": "If_check_site_record_count",
                "type": "IfCondition",
                "dependsOn": [
                    {
                        "activity": "lookup_sites_record_count",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "typeProperties": {
                    "expression": {
                        "value": "@{activity('lookup_sites_record_count').output.firstRow.record_count}==0",
                        "type": "Expression"
                    },
                    "ifFalseActivities": [
                        {
                            "name": "pl_remedyreports_invoke_load_sites_inc",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "pl_remedyreports_load_sites_inc",
                                    "type": "PipelineReference"
                                }
                            }
                        }
                    ],
                    "ifTrueActivities": [
                        {
                            "name": "pl_remedyreports_invoke_load_sites_full",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "pl_remedyreports_load_sites_full",
                                    "type": "PipelineReference"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "folder": {
            "name": "Load Remedy Reference Data"
        }
    }
}
Your expression should be:
@equals(activity('lookup_sites_record_count').output.firstRow.record_count, 0)
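In the pipeline JSON above, the If activity's expression block would then read:

"expression": {
    "value": "@equals(activity('lookup_sites_record_count').output.firstRow.record_count, 0)",
    "type": "Expression"
}

The original version fails because @{...} is string interpolation: the condition evaluates to a string such as "1333==0", which is not a valid boolean for the If activity.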

Copy Activity Properties to update Azure DW data from an on prem SQL stored procedure in data factory

I'm not sure that what I'm trying to achieve is even possible in Data Factory, but I guess there should be a way.
Simply put, I have a table in the DW that needs to be updated by a stored procedure once a day.
This stored procedure resides on the source DB; I am looking for a way to pass some IDs to it, get the results back, and store them in the DW.
Any help would be appreciated. The pipeline below is all I could think of:
{
    "name": "UpdateColumnX",
    "properties": {
        "activities": [
            {
                "type": "SqlServerStoredProcedure?? Not Really Sure",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "$$Text.Format('Passing IDs to the stored Procedure', Time.AddHours(WindowStart,10), Time.AddHours(WindowEnd,10))\n"
                    },
                    "storedProcedureName": "UpdateDataThroughSP",
                    "storedProcedureParameters": {
                        "StartDate": "$$Text.Format('{0:yyyy-MM-dd HH:mm:ss}', Time.AddHours(WindowStart,10))",
                        "EndDate ": "$$Text.Format('{0:yyyy-MM-dd HH:mm:ss}', Time.AddHours(WindowEnd,10))"
                    }
                },
                "inputs": [
                    {
                        "name": "Not Sure which table should be my Input, the DW table having the IDs or the source table? "
                    }
                ],
                "outputs": [
                    {
                        "name": "Sames and Input not sure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "offset": "20:30:00"
                },
                "name": "Update Data through Source SP"
            }
        ],
        "start": "2017-09-13T20:30:00.045Z",
        "end": "2099-12-30T13:00:00Z",
        "isPaused": false,
        "hubName": "HubName",
        "pipelineMode": "Scheduled"
    }
}
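For what it's worth, a Copy activity's SqlSource can invoke a stored procedure on the source side via sqlReaderStoredProcedureName, which seems closer to what's described. A sketch only (dataset names are hypothetical; the input dataset points at the source database and mainly drives scheduling, while the output dataset points at the DW table):

{
    "name": "CopySpResultsToDW",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderStoredProcedureName": "UpdateDataThroughSP",
            "storedProcedureParameters": {
                "StartDate": { "value": "$$Text.Format('{0:yyyy-MM-dd HH:mm:ss}', Time.AddHours(WindowStart,10))" },
                "EndDate": { "value": "$$Text.Format('{0:yyyy-MM-dd HH:mm:ss}', Time.AddHours(WindowEnd,10))" }
            }
        },
        "sink": {
            "type": "SqlDWSink"
        }
    },
    "inputs": [ { "name": "SourceSqlTableDataset" } ],
    "outputs": [ { "name": "TargetDwTableDataset" } ],
    "policy": { "timeout": "01:00:00", "concurrency": 1, "retry": 3 },
    "scheduler": { "frequency": "Day", "interval": 1, "offset": "20:30:00" }
}

Note this loads the SP's result set into the DW table; it does not update rows in place. If the SP only mutates data on the source side, a SqlServerStoredProcedure activity is the better fit.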