Copy Data from REST API to Azure SQL DB with Data Factory

Beginner here, I want to copy some stock data from a REST API into my Azure SQL database. I've set up a pipeline with a Lookup activity to get the needed strings (with stock symbols) and use them in a ForEach to call the API and copy the data to the SQL database.
Problem: the copy activity works for some stock symbols, but not for all.
My Lookup concatenates the stock symbols into comma-separated strings, which are then appended to the API call URL.
Here is the concatenation statement and the final output of the pipeline:
SELECT STRING_AGG(Symbol, ',') AS Symbollist
FROM [dbo].[Tab_Symbols]
GROUP BY GroupOfTen;
Green symbols --> appear in the SQL database
Red symbols --> do not appear in the SQL database
Why are the first and last symbols always missing?
I've tested the API call with the strings and I receive data for all stock symbols in JSON format, as it should be. Maybe the problem is my mapping or the sink SQL table?
Thank you! Fabian
Here are screenshots of my copy activity:
REST dataset settings and dynamic URL string:
For Each settings:
Source settings and use of dynamic string (use for each item):
Sink settings:
Mapping:

I solved it. In the source settings of my copy activity, the column reference '.Symbollist' on the lookup output was missing.
@item().Symbollist
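For anyone hitting the same issue: a minimal sketch of how the relative URL can then use that value (the endpoint path and query parameter name here are placeholders, not the real API):
@concat('/stock/quote?symbols=', item().Symbollist)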

Related

Trouble accessing Json from Web API call in ADF and inserting in SQL table

I have a REST API that I use to extract the data.
I am using Azure Data Factory to do that, using a Web activity and then taking that JSON output and trying to insert it into a database table with a stored procedure.
When I test in SQL with sample JSON it works fine, whereas the JSON coming from the Web activity call in ADF has a backslash in front of each double quote, like below.
[{\"company_id\":\"test\",\"condition_assessments_ID\":\"SON-testMOTION-SON-MOTION ANALYSIS - SMA & test-2022-04-27#10_00_00\"}]
I think that because of that I was getting an error like:
"Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0."
This is one issue, and to avoid it I tried converting the JSON to a string, but that adds even more backslashes and doesn't help.
How can I preserve the JSON formatting so that it's straightforward for a stored procedure to insert it?
The escape character backslash (\) appears wherever double quotes are present in Azure Data Factory output.
Even though the stored procedure input is shown with backslashes, when the data is loaded to the sink (SQL table) the backslashes are removed and the data is inserted in JSON format.
I have reproduced this in my lab with sample API data and was able to load it as expected.
Web activity output:
Pass the output of the Web activity to the stored procedure parameter by converting it to a string. Here I am only passing the data part of the output:
@string(activity('Web1').output.data)
Stored procedure Input:
SQL table data after the stored procedure is run:
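For completeness, a minimal sketch of the kind of stored procedure that consumes that string parameter, assuming a hypothetical target table dbo.ConditionAssessments (the table name and column types are guesses based on the sample payload above):
CREATE PROCEDURE dbo.usp_InsertAssessments
    @json NVARCHAR(MAX)
AS
BEGIN
    -- Parse the JSON array passed in from the Stored Procedure activity
    INSERT INTO dbo.ConditionAssessments (company_id, condition_assessments_ID)
    SELECT company_id, condition_assessments_ID
    FROM OPENJSON(@json)
    WITH (
        company_id NVARCHAR(100) '$.company_id',
        condition_assessments_ID NVARCHAR(200) '$.condition_assessments_ID'
    );
END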

How to get an array from JSON in the Azure Data Factory?

My actual (not properly working) setup has two pipelines:
Get API data to lake: for each row in a metadata table in SQL, call the REST API and copy the reply (JSON files) to the Blob data lake.
Copy data from the lake to SQL: for each file, auto-create a table in SQL.
The result is the correct number of tables in SQL, only the content of the tables is not what I hoped for: they all contain one column named odata.metadata and one entry, the link to the metadata.
If I manually remove the metadata from the JSON in the datalake and then run the second pipeline, the SQL table is what I want to have.
Have:
{ "odata.metadata":"https://test.com",
"value":[
{
"Key":"12345",
"Title":"Name",
"Status":"Test"
}]}
Want:
[
  {
    "Key": "12345",
    "Title": "Name",
    "Status": "Test"
  }
]
I tried to add $.['value'] in the API call. The result then was no odata.metadata line, but the array started with {value:, which resulted in an error when copying to SQL.
I also tried to use mapping (in the sink) to SQL. That gives the wanted result for the dataset I manually specified the mapping for, but it only works for datasets with the same number of columns in the array. I don't want to do the mapping manually for 170 calls...
Does anyone know how to handle this in ADF? For now I feel like the only solution is to add a Python step to the pipeline, but I hope there is a somewhat standard ADF way to do this!
You can add another pipeline with a data flow to remove that content from the JSON file before copying the data to SQL, using the flatten formatter.
Before flattening the JSON file:
This is what I see when the JSON data is copied to the SQL database without flattening:
After flattening the JSON file:
I added a pipeline with a data flow that flattens the JSON file and removes the 'odata.metadata' content from the array.
Source preview:
Flatten formatter:
Select the required object from the Input array
After selecting the value object from the input array, you can see only the values under value in the flatten formatter preview.
Sink preview:
File generated after flattening.
Copy the generated file as input to SQL.
Note: If your input file schema is not constant, you can enable Allow schema drift to allow schema changes.
Reference: Schema drift in mapping data flow
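If you prefer to stay with a plain Copy activity instead of a data flow, the mapping tab can also point at the nested array through a collection reference. A hedged sketch of what that mapping JSON looks like for the sample document above (you would still need a mapping per distinct schema, which is why the data flow with schema drift scales better for 170 calls):
"translator": {
    "type": "TabularTranslator",
    "collectionReference": "$['value']",
    "mappings": [
        { "source": { "path": "['Key']" }, "sink": { "name": "Key" } },
        { "source": { "path": "['Title']" }, "sink": { "name": "Title" } },
        { "source": { "path": "['Status']" }, "sink": { "name": "Status" } }
    ]
}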

Azure Data Factory 2 : How to split a file into multiple output files

I'm using Azure Data Factory and am looking for the complement to the "Lookup" activity. Basically I want to be able to write a single line to a file.
Here's the setup:
Read from a CSV file in blob store using a Lookup activity
Connect the output of that to a For Each
Within the For Each, take each record (a line from the file read by the Lookup activity) and write it to a distinct file, named dynamically.
Any clues on how to accomplish that?
Use a Data Flow: use the Derived Column transformation to create a filename column, then use that filename column in the sink. Details on how to implement dynamic filenames in ADF are described here: https://kromerbigdata.com/2019/04/05/dynamic-file-names-in-adf-with-mapping-data-flows/
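A hedged sketch of that derived column, assuming the CSV has some column (called id here purely as a placeholder) to base the name on; in the sink you would then set the file name option to "As data in column" and point it at fileName:
fileName = concat(toString(byName('id')), '.csv')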
Data Flow would probably be better for this, but as a quick hack, you can do the following to read the text file line by line in a pipeline:
Define your source dataset to output a line as a single column. Normally I would use "NoDelimiter" for this, but that isn't supported by Lookup. As a workaround, define it with an incorrect Column Delimiter (like | or \t for a CSV file). You should also go to the Schema tab, and CLEAR the schema. This will generate a column in the output named "Prop_0".
In the foreach activity, set the Items to the Lookup's "output.value" and check "Sequential".
Inside the foreach, you can use item().Prop_0 to grab the text of the line:
To the best of my understanding, creating a blob isn't directly supported by pipelines [hence my suggestion above to look into Data Flow]. It is, however, very simple to do in Logic Apps. If I was tackling this problem, I would create a logic app with an HTTP Request Received trigger, then call it from ADF with a Web activity and send the text line and dynamic file name in the payload.
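If you go the Logic App route, the Web activity body is just dynamic content. A rough sketch of the payload, where the property names and the guid()-based file name are purely placeholders for whatever your Logic App expects (and which assumes the line itself contains no double quotes):
@json(concat('{ "fileName": "', guid(), '.csv", "line": "', item().Prop_0, '" }'))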

Create an Azure Data Factory pipeline to copy new records from DocumentDB to Azure SQL

I am trying to find the best way to copy yesterday's data from DocumentDB to Azure SQL.
I have a working DocumentDB database that is recording data gathered via a web service. I would like to routinely (daily) copy all new records from the DocumentDB to an Azure SQL DB table. In order to do so I have created and successfully executed an Azure Data Factory Pipeline that copies records with a datetime > '2018-01-01', but I've only ever been able to get it to work with an arbitrary date - never getting the date from a variable.
My research on DocumentDB SQL querying shows that it has Mathematical, Type checking, String, Array, and Geospatial functions but no date-time functions equivalent to SQL Server's getdate() function.
I understand that Data Factory Pipelines have some system variables that are accessible, including utcnow(). I cannot figure out, though, how to actually use those by editing the JSON successfully. If I try just including utcnow() within the query I get an error from DocumentDB that "'utcnow' is not a recognized built-in function name".
"query": "SELECT * FROM c where c.StartTimestamp > utcnow()",
If I try instead to build the string within the JSON using utcnow() I can't even save it because of a syntax error:
"query": "SELECT * FROM c where c.StartTimestamp > " + utcnow(),
I am willing to try a different technology than a Data Factory Pipeline, but I have a lot of data in our DocumentDB so I'm not interested in abandoning that, and I have much greater familiarity with SQL programming and need to move the data there for joining and other analysis.
What is the easiest and best way to copy those new entries over every day into the staging table in Azure SQL?
Are you using ADF V2 or V1?
For ADF V2.
I think you can follow the incremental approach that they recommend. For example, you could have a watermark table (it could be in your target Azure SQL database) and two Lookup activities: one Lookup obtains the previous run's watermark value (it could be a date, an integer, whatever your audit value is), and the other obtains the MAX watermark value (i.e. date) from your source documents. A Copy activity then gets all the values where c.StartTimestamp <= MaxWatermarkValueFromSource AND c.StartTimestamp > LastWatermarkValue.
I followed this example using the Python SDK and it worked for me:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-powershell
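As a hedged sketch, the Copy activity's source query could then look like this once both Lookup activities exist (the activity names and firstRow property names are placeholders for whatever your lookups actually return):
"query": "SELECT * FROM c WHERE c.StartTimestamp > '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}' AND c.StartTimestamp <= '@{activity('LookupNewWatermark').output.firstRow.NewWatermarkValue}'"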

Does External file format in polybase support Row terminator?

I am loading BCPed data in which a few rows contain a newline character in the data itself, so when I try to import that data it throws an error. To solve this issue I need to specify the row terminator in the external file format as \r\n. Does PolyBase allow a row terminator? If so, how?
Does PolyBase allow a row terminator?
The row terminator feature of PolyBase is currently not supported. There is a feature request related to your question on the Azure feedback page; the state of this feature is still under review. The link below is for your reference.
Polybase: allow field/row terminators within string fields