I'm working with a pipeline that dynamically loads table data from an on-premises SQL Server to data lake CSV files, sinking one .csv file for each table I have registered for loading in a versionControl table in Azure SQL, using a ForEach activity.
After loading the data, I want to update the versionControl table with the lastUpdate date, based on the MAX(lastUpdate) field of each .csv file loaded. To accomplish that, I know I need to add a data flow after the copy activity so I can use the Aggregate transformation, but I don't know how to pass the filename to the data flow's source dynamically in a parameter.
Thanks!
Two options:
1. Parameterized dataset. Use a source dataset in the data flow that has a parameter for the file name. You can then pass that filename in as a pipeline parameter (a rough sketch follows after this list).
2. Parameterized source wildcard. You can also use a source dataset in the data flow that points just to a folder in your container. You can then parameterize the wildcard property in the Source and send the filename in there as a pipeline parameter.
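As a rough sketch of option 1 (the dataset, linked service, folder, and parameter names below are placeholders, not from the original pipeline), the source dataset can declare a fileName parameter and use it in its file path:

```json
{
    "name": "DelimitedTextSource",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorage",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "datalake",
                "folderPath": "exports",
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
```

When the data flow runs inside the ForEach, the data flow activity's settings then supply this fileName dataset parameter, for example with the same expression the copy activity used for its sink file name.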
Related
I have a Data Flow which reads multiple CSV files from Azure Data Lake and writes them into Azure Blob Storage as a single file. I need to get the URL of the file written to the blob.
This data flow is part of a pipeline, and I need to provide the blob URL as the output of the pipeline. Is there any way to achieve this? Thanks in advance for your help.
To create a new column in your data stream that contains the source file name and path, use the "Column to store file name" field under Source transformation --> Source Options.
(Screenshots in the original answer: Source Options settings and Data Preview.)
You can also have a parameter inside your data flow to hold the file name or file path, and add that parameter value as a new column using a Derived Column transformation.
Please note: you need to supply the value for your data flow parameter from the Data Flow activity.
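For example (a sketch only; the storage account, container, parameter, and column names are mine), a Derived Column named fileUrl could combine a data flow parameter with the captured file name column using an expression like:

```
concat('https://mystorageaccount.blob.core.windows.net/mycontainer/', $folderPath, '/', fileName)
```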
I'm aware that it's possible to load data from files in S3 (e.g. CSV, Parquet or JSON) into Snowflake by creating an external stage with file format type CSV and then loading it into a table with one column of type VARIANT. But this needs a manual step to cast the data into the correct types in order to create a view which can be used for analysis.
Is there a way to automate this loading process from S3 so the table column data types are either inferred from the CSV file or specified elsewhere by some other means? (Similar to how a table can be created in Google BigQuery from CSV files in GCS with an inferred table schema.)
As of today, the single VARIANT column solution you are adopting is the closest you can get with Snowflake's out-of-the-box tools to achieve your goal, which, as I understand from your question, is to let the loading process infer the source file structure.
In fact, the COPY command needs to know the structure of the file it is going to load data from, through FILE_FORMAT.
More details: https://docs.snowflake.com/en/user-guide/data-load-s3-copy.html#loading-your-data
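For reference, a COPY that spells out the structure through a file format looks roughly like this (the stage, table, and format names are placeholders, and the target table must already exist with explicit column types):

```sql
-- Placeholder object names; the target table's columns define the data types up front
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

COPY INTO my_table
  FROM @my_s3_stage/exports/
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
```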
I'm using Azure Data Factory and am looking for the complement to the "Lookup" activity. Basically I want to be able to write a single line to a file.
Here's the setup:
1. Read from a CSV file in blob store using a Lookup activity.
2. Connect the output of that to a ForEach activity.
3. Within the ForEach, take each record (a line from the file read by the Lookup activity) and write it to a distinct file, named dynamically.
Any clues on how to accomplish that?
Use a Data Flow: use the Derived Column transformation to create a filename column, then use that column in the sink. Details on how to implement dynamic file names in ADF with Mapping Data Flows are described here: https://kromerbigdata.com/2019/04/05/dynamic-file-names-in-adf-with-mapping-data-flows/
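As a rough illustration (the key column and naming scheme are mine, not from the linked post), the Derived Column could build a fileName column with an expression such as the one below, and the sink's file name option would then be set to name files from the data in that column:

```
concat(toString(customerId), '.csv')
```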
Data Flow would probably be better for this, but as a quick hack, you can do the following to read the text file line by line in a pipeline:
1. Define your source dataset to output a line as a single column. Normally I would use "NoDelimiter" for this, but that isn't supported by Lookup. As a workaround, define it with an incorrect column delimiter (like | or \t for a CSV file). You should also go to the Schema tab and CLEAR the schema. This will generate a column in the output named "Prop_0".
2. In the ForEach activity, set the Items to the Lookup's "output.value" and check "Sequential".
3. Inside the ForEach, you can use item().Prop_0 to grab the text of the line, as in the sketch below.
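A minimal sketch of steps 2 and 3 (the Lookup name "LookupLines" and the inner Set Variable activity, along with its lineText variable, are placeholders just to show where the expressions go):

```json
{
    "name": "ForEachLine",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": true,
        "items": {
            "value": "@activity('LookupLines').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "SetLineText",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "lineText",
                    "value": {
                        "value": "@item().Prop_0",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
}
```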
To the best of my understanding, creating a blob isn't directly supported by pipelines [hence my suggestion above to look into Data Flow]. It is, however, very simple to do in Logic Apps. If I were tackling this problem, I would create a Logic App with an HTTP Request Received trigger, then call it from ADF with a Web activity and send the text line and the dynamic file name in the payload.
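If you go the Logic App route, the Web activity's body could pass both pieces along; something like the following, pasted into the Body with dynamic content (the property names and the guid()-based file name are placeholders for whatever your Logic App's Request trigger expects):

```json
{
    "fileName": "@{concat(guid(), '.txt')}",
    "content": "@{item().Prop_0}"
}
```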
I am reading Azure SQL table data (which actually consists of directory paths) from Azure Data Factory. Using those paths, how can I dynamically get the files from the data lake?
Can anyone tell me what I should give in the dataset?
You could use a Lookup activity to read the data from Azure SQL, followed by a ForEach activity. Then pass @item().<your path column> to your dataset parameter k1.
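For example (assuming the Lookup returns a column I'll call filePath; the dataset name is also a placeholder), the activity inside the ForEach that uses the parameterized dataset would pass k1 like this:

```json
"dataset": {
    "referenceName": "DataLakeFiles",
    "type": "DatasetReference",
    "parameters": {
        "k1": {
            "value": "@item().filePath",
            "type": "Expression"
        }
    }
}
```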
Is it possible to write an output parameter to a dataset?
I have a Get Metadata activity that stores the file name of an Azure Blob dataset, and I would like to write that value into another Azure Blob dataset as an additional column via a Copy activity.
Thanks
If you are looking to get the output of the previous operation as input to the next operation, you could proceed in the following manner.
I am hoping that the attribute you are getting is childItems; its values can be obtained in the next step using the following expression:
@activity('Name_of_activity').output.childItems
This will return an array of your subfolders.
The following link should help you with the expression in ADF
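A minimal sketch of producing that array (the activity and dataset names are placeholders): a Get Metadata activity that requests childItems for a folder dataset:

```json
{
    "name": "GetFolderMetadata",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceFolder",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}
```

A subsequent ForEach can then set its Items to @activity('GetFolderMetadata').output.childItems and reference @item().name for each file or subfolder inside the loop.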