Stream Analytics possible output path prefix for Data Lake Store - azure-data-lake

Is there any way to set an output path prefix in a Stream Analytics job that stores data to Data Lake Store, so that the data is written into separate files depending on device ID, for example data/2017/5/3/device1.csv, data/2017/5/3/device2.csv, ...? Or what is the best way to do this after Stream Analytics has stored the data in one file?
My input is IoT Hub.

Is there any way to set an output path prefix in a Stream Analytics job that stores data to Data Lake Store, so that the data is written into separate files depending on device ID, for example data/2017/5/3/device1.csv, data/2017/5/3/device2.csv?
According to the documentation, as I understand it, we can set an output path prefix with {date}/{time}, and there is no requirement that these variables must be used.
It does not appear to be supported to set an output path with a dynamic device ID.
If creating multiple Stream Analytics jobs is an option, we could create one job per device and add an output with the static device ID as the path prefix.
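If a single job writing one combined file is acceptable, the split by device ID can also be done as a post-processing step. Below is a minimal Python sketch of that idea; the DeviceId column name, the combined file name, and the data/2017/5/3/ layout are assumptions taken from the example paths above.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Assumption: the Stream Analytics output is a CSV with a DeviceId column.
SOURCE = Path("data/2017/5/3/output.csv")   # hypothetical combined output file
TARGET_DIR = SOURCE.parent                  # write device1.csv, device2.csv, ... next to it

rows_by_device = defaultdict(list)
with SOURCE.open(newline="") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    for row in reader:
        rows_by_device[row["DeviceId"]].append(row)

# One file per device, e.g. data/2017/5/3/device1.csv
for device_id, rows in rows_by_device.items():
    out_path = TARGET_DIR / f"{device_id}.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```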

Related

Append data to a file using REST API

Does anyone have an example of appending data to a file stored in Azure Data Lake from a source using Data Factory and the REST API here?
Can I use a copy activity with a REST dataset on the sink side?
Below is my pipeline; it consists of a copy activity inside a ForEach loop. My requirement is: if the file already exists on the sink, then append the data to that same file. (The copy activity here overwrites the existing file with just the new data.)
Sink:
Currently, appending data to a file in Azure Data Lake is not supported in Azure Data Factory.
As a workaround:
Load the multiple files into the data lake using a ForEach activity.
Merge the individual files into a single file (the final file) using a copy data activity.
Delete the individual files after merging.
Please refer to this SO thread for a similar process.
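For reference, here is a minimal Python sketch of the same merge-and-delete idea using the azure-storage-file-datalake SDK instead of Data Factory activities; the account URL, file system, and folder/file names are assumptions.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Assumptions: account name, file system, and folder/file names are placeholders.
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("my-filesystem")

SOURCE_FOLDER = "staging/individual-files"   # files loaded by the ForEach step
FINAL_FILE = "output/final.csv"              # single merged file

# 1. Collect the individual files.
individual = [p.name for p in fs.get_paths(path=SOURCE_FOLDER) if not p.is_directory]

# 2. Merge them into one file (simple byte concatenation; adjust if each file has a CSV header).
merged = b"".join(fs.get_file_client(name).download_file().readall() for name in individual)
fs.get_file_client(FINAL_FILE).upload_data(merged, overwrite=True)

# 3. Delete the individual files after merging.
for name in individual:
    fs.get_file_client(name).delete_file()
```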

Get the blob URL of output file written by sink operation in a Data Flow Activity - Azure Synapse Analytics

I have a Data Flow which reads multiple CSV files from Azure Data Lake and writes them into Azure Blob Storage as a single file. I need to get the url of the file written to the blob.
This data flow is part of a pipeline, and I need to give the blob URL as the output of the pipeline. Is there any way to achieve this? Thanks in advance for your help.
To create a new column in your data stream that holds the source file name and path, use the "Column to store file name" field under Source transformation --> Source options.
Source Options settings
Data Preview
You can also have a parameter inside your data flow to hold the file name or file path, and add that parameter value as a new column using a derived column transformation.
Please note: you need to supply the value for your data flow parameter from the data flow activity.
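If the goal is simply to hand the written file's URL back from the pipeline, another option is to look it up after the data flow has run. Below is a minimal Python sketch using the azure-storage-blob SDK; the account URL, container, and sink folder are assumptions, and it presumes the data flow writes its single output file into a known folder.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Assumptions: account, container, and sink folder are placeholders.
service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("output-container")

# Folder the data flow sink writes its single output file into.
SINK_FOLDER = "dataflow-output/"

# Find the file the sink produced and build its full blob URL.
blobs = list(container.list_blobs(name_starts_with=SINK_FOLDER))
latest = max(blobs, key=lambda b: b.last_modified)       # newest blob in the sink folder
blob_url = container.get_blob_client(latest.name).url    # full URL of the written blob
print(blob_url)
```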

Automatic ETL data before loading to Bigquery

I have CSV files added to a GCS bucket daily or weekly; each file name contains a date and a specific parameter.
The files contain the schema (id + name) columns, and we need to automatically load/ingest these files into a BigQuery table so that the final table has 4 columns (id, name, date, specific parameter).
We have tried Dataflow templates, but we couldn't get the date and the specific parameter from the file name into the Dataflow job.
We also tried a Cloud Function (we can get the date and specific parameter value from the file name), but we couldn't add them as columns during ingestion.
Any suggestions?
Disclaimer: I have authored an article about this kind of problem using Cloud Workflows, for cases where you want to extract parts of the filename to use in the table definition later.
We will create a Cloud Workflow to load data from Google Storage into BigQuery. This linked article is a complete guide on how to work with workflows, connecting any Google Cloud APIs, working with subworkflows, arrays, extracting segments, and calling BigQuery load jobs.
Let’s assume we have all our source files in Google Storage. Files are organized in buckets, folders, and could be versioned.
Our workflow definition will have multiple steps.
(1) We will start by using the GCS API to list files in a bucket, by using a folder as a filter.
(2) For each file, we will then use parts of the filename in BigQuery's generated table name.
(3) The workflow’s last step will be to load the GCS file into the indicated BigQuery table.
We are going to use BigQuery query syntax to parse and extract the segments from the URL and return them as a single row result. This way we will have an intermediate lesson on how to query from BigQuery and process the results.
Full article with lots of Code Samples is here: Using Cloud Workflows to load Cloud Storage files into BigQuery
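If Cloud Workflows is not a requirement, the same three steps can be sketched directly in Python with the google-cloud-storage and google-cloud-bigquery clients. The bucket, prefix, dataset/table names, and filename pattern below are assumptions; the file name is assumed to look like 2023-01-15_paramvalue.csv.

```python
import re
from google.cloud import bigquery, storage

# Assumptions: bucket, prefix, table names, and the filename pattern are placeholders.
BUCKET = "my-bucket"
PREFIX = "incoming/"
FINAL_TABLE = "my_project.my_dataset.final"                   # columns: id, name, date, specific_parameter
STAGING_TABLE = "my_project.my_dataset.staging"
FILENAME_RE = re.compile(r"(\d{4}-\d{2}-\d{2})_(.+)\.csv$")   # e.g. 2023-01-15_paramvalue.csv

gcs = storage.Client()
bq = bigquery.Client()

# (1) List the files in the bucket, filtered by folder.
for blob in gcs.list_blobs(BUCKET, prefix=PREFIX):
    match = FILENAME_RE.search(blob.name)
    if not match:
        continue
    file_date, specific_parameter = match.groups()            # (2) extract parts from the filename

    # (3) Load the CSV (id, name) into a staging table, then insert with the extra columns.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        schema=[bigquery.SchemaField("id", "STRING"), bigquery.SchemaField("name", "STRING")],
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    bq.load_table_from_uri(f"gs://{BUCKET}/{blob.name}", STAGING_TABLE, job_config=job_config).result()

    bq.query(
        f"INSERT INTO `{FINAL_TABLE}` (id, name, date, specific_parameter) "
        f"SELECT id, name, DATE '{file_date}', '{specific_parameter}' FROM `{STAGING_TABLE}`"
    ).result()
```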

Getting files and folders in the datalake while reading from datafactory

While reading Azure SQL table data (which actually consists of directory paths) from Azure Data Factory, how can I use those paths to dynamically get the files from the data lake?
Can anyone tell me what I should give in the dataset?
Screenshot
You could use a Lookup activity to read the data from Azure SQL, followed by a ForEach activity. Then pass @item(). to your dataset parameter k1.
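For illustration only, the same Lookup-then-ForEach pattern looks roughly like the Python sketch below when done outside Data Factory: read the directory paths from the Azure SQL table, then list the files under each path in the data lake. The connection string, table and column names, and file system are assumptions.

```python
import pyodbc
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Assumptions: connection string, table/column names, and file system are placeholders.
SQL_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=tcp:<server>.database.windows.net;Database=<db>;..."
lake = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = lake.get_file_system_client("my-filesystem")

# "Lookup": read the directory paths stored in the Azure SQL table.
with pyodbc.connect(SQL_CONN) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT directory_path FROM dbo.DirectoryPaths")
    paths = [row[0] for row in cursor.fetchall()]

# "ForEach": for every path, dynamically list the files under it in the data lake.
for directory in paths:
    for item in fs.get_paths(path=directory):
        if not item.is_directory:
            print(item.name)
```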

How to get the Number of records in a Data lake File with Logicapp Data Lake connector?

I have a requirement to get the table data from an Azure database table and upload it to a Data Lake file using a Logic App. Once the upload is complete, I need to get the number of records present in the Data Lake file. Does Logic App have any expressions or built-in methods to get the number of records in a Data Lake file?
At the Data Lake Store level, there is no notion of records. You can query information about the file, such as how many bytes long it is. The concept of a record is interpreted by the application that reads the file, based on the type of data (CSV, JSON, etc.) and the delimiter that makes sense.
You will need to do this as a separate step before or after saving the file.
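As such a separate step (for example, in an Azure Function that the Logic App calls), the record count can be computed by reading the file back and counting lines. Below is a minimal Python sketch for a CSV stored in ADLS; the account, file system, and file path are assumptions, and it assumes one header row.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Assumptions: account, file system, and file path are placeholders; the file is a CSV with one header row.
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_client = service.get_file_system_client("my-filesystem").get_file_client("exports/table-data.csv")

content = file_client.download_file().readall().decode("utf-8")
lines = [line for line in content.splitlines() if line.strip()]   # ignore trailing blank lines
record_count = max(len(lines) - 1, 0)                             # subtract the header row
print(record_count)
```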