Write to Data Lake from Stream Analytics

Is there a way to have an output to Data Lake from Stream Analytics that uses an AAD app, or something else other than my account, to write to the Data Lake? It is not practical to have a user account be the identity that writes to the data lake.

Based on your description, I checked and tested the Azure Data Lake Store output for Azure Stream Analytics, and I confirmed that this output uses the signed-in user account for authorization, as you mentioned.
Moreover, the Renew Data Lake Store authorization section mentions the following:
Currently, there is a limitation where the authentication token needs to be manually refreshed every 90 days for all jobs with Data Lake Store output.
For your requirement, I would suggest adding feedback here. Alternatively, you could choose another output type to temporarily store your results, then use a background task that picks the records up from the temporary output store and writes them to your Data Lake. For this approach, you could leverage service-to-service authentication with Data Lake Store.
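For the background-task part of that approach, a minimal sketch of service-to-service authentication against Data Lake Store (Gen1) with an AAD app might look like this (the tenant/app IDs, store name, and output path are placeholders, and it assumes the azure-datalake-store Python package):

```python
# Sketch: a background task writing to Data Lake Store (Gen1) with an AAD app
# (service-to-service auth) instead of a user account. The tenant/app IDs,
# store name, and output path are placeholders.
from azure.datalake.store import core, lib

token = lib.auth(
    tenant_id="<tenant-id>",
    client_id="<aad-app-client-id>",
    client_secret="<aad-app-secret>",
    resource="https://datalake.azure.net/",
)
adls = core.AzureDLFileSystem(token, store_name="<data-lake-store-name>")

# Pick up records from the temporary output (blob, event hub, etc.) and
# append them to a file in the Data Lake Store.
with adls.open("/streamanalytics/output/records.json", "ab") as f:
    f.write(b'{"deviceId": "sensor-1", "avgTemp": 21.4}\n')
```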

For now the answer is no, but support is planned to become available by the end of 2018:
https://feedback.azure.com/forums/270577-stream-analytics/suggestions/15367185-please-provide-support-for-azure-data-lake-store-o

Related

Using Dataverse Synapse Link as reference data for Azure Stream Analytics

We are trying to use our Dataverse data as reference data for Azure Stream Analytics. The idea is to couple customer activities with their CRM profiles to create meaningful actions for them. We are currently moving from DES to Dataverse Synapse Link, and have created the data lake where the data gets dumped and can see it in Synapse Studio. However, Stream Analytics does not take the CDM format out of the box; it seems it can only handle CSV (with headers) and JSON formats.
What is the best approach to get our Dataverse data in as reference data for Stream Analytics (and as close to real time as possible)? Should we create a custom deserializer, use ADF, or something else?
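For what it's worth, one common workaround (not something the question confirms) is a small scheduled conversion job that flattens the CDM output into a headered CSV that the Stream Analytics reference-data input accepts. A hedged sketch, assuming the usual Synapse Link layout of model.json plus headerless CSV partitions, with hypothetical paths and entity name:

```python
# Sketch: flatten a Dataverse Synapse Link (CDM) entity into a headered CSV
# that Stream Analytics can consume as reference data. Paths, entity name,
# and storage layout are assumptions; adjust to your container layout.
import json
import pandas as pd

MODEL_JSON_PATH = "/mnt/dataverse/model.json"      # hypothetical mount of the CDM folder
ENTITY_NAME = "contact"                            # hypothetical Dataverse entity
OUTPUT_CSV = "/mnt/refdata/contact_reference.csv"  # hypothetical reference-data drop path

with open(MODEL_JSON_PATH) as f:
    model = json.load(f)

# Pull the column names and partition locations for the entity from model.json.
entity = next(e for e in model["entities"] if e["name"] == ENTITY_NAME)
columns = [a["name"] for a in entity["attributes"]]
partition_urls = [p["location"] for p in entity.get("partitions", [])]

# CDM partitions are headerless CSVs; attach the schema and concatenate them.
# (Partition URLs may need a SAS token, or read them via a mounted path instead.)
frames = [pd.read_csv(url, header=None, names=columns) for url in partition_urls]
combined = pd.concat(frames, ignore_index=True)

# Write a single CSV *with* headers, which the Stream Analytics reference input accepts.
combined.to_csv(OUTPUT_CSV, index=False)
```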

Treat Delta Lake table as a transactional store for external API?

I apologize that I am likely going to be showing my ignorance of this space. I am just starting to look into Delta Lake, and I have a feeling my initial concepts were incorrect.
Today I have millions of documents in Azure Cosmos DB. I have a service that combines data from various container tables and merges them into combined JSON documents, which are then indexed into Elasticsearch.
Our current initiative is to use Synapse to enrich the data before indexing into Elasticsearch. The initial idea was that we would stream the Cosmos DB updates into ADLS via the change feed. We would then combine (i.e., replace what the combiner service is doing) and enrich in Synapse.
The logic in the combiner service is very complex and difficult to rewrite from scratch (it is currently an Azure Service Fabric stateless .NET application). I had thought that I could just have my combiner write the final copy (i.e., the JSON we are currently indexing as the end product) to ADLS, and then we would only need to do our enrichments as additive data. I believe this is a misunderstanding of what Delta Lake is. I have been thinking of it as similar to Cosmos DB, where I can push a JSON document via a REST call. I don't think this is a valid scenario, but I can't find any information that states this (perhaps because the assumption is so far off base that it never comes up).
In this scenario, would my only option be to have my service write the consolidated document back to Cosmos DB, and then sync that doc into ADLS?
Thanks!
John
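To make the distinction concrete: Delta Lake is a table format that is written through an engine such as Spark (batch or streaming), not a document store with a per-document REST endpoint like Cosmos DB. A minimal, hypothetical sketch of appending staged JSON documents to a Delta table from a Synapse Spark pool (paths are placeholders) might look like this:

```python
# Sketch: append consolidated JSON documents to a Delta Lake table from Spark.
# Unlike Cosmos DB, there is no per-document REST call; writes go through a
# Spark (or similar) engine as batch/streaming appends. Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# JSON documents staged in ADLS (e.g., dropped there by the combiner service).
staged_docs = spark.read.json("abfss://staging@<account>.dfs.core.windows.net/combined-docs/")

# Append them to a Delta table; a MERGE could be used instead for upsert semantics.
(staged_docs.write
    .format("delta")
    .mode("append")
    .save("abfss://lake@<account>.dfs.core.windows.net/delta/combined_documents"))
```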

Azure Synapse copy from Azure SQL to Data Lake table

I want to copy data from an Azure SQL table to a Data Lake Storage account table using Synapse Analytics. In the Data Lake table I want to store the table name and max ID for the incremental load. Is this possible?
If your requirement is only to transfer the data from Azure SQL Database to a Data Lake Storage (ADLS) account and no big data analysis is required, you can simply use the Copy activity in either Azure Data Factory (ADF) or a Synapse pipeline.
ADF also allows you to perform any required transformations on your data before storing it in the destination, using the Data Flow activity.
Refer to this official tutorial to copy data from a SQL Server database to Azure Blob storage.
Now, coming to incremental load: ADF and Synapse pipelines both provide built-in support for it. You need to select a column as the watermark column in your source table.
A watermark column in the source data store can be used to slice the new or updated records for every run. Normally, the data in this selected column (for example, last_modify_time or ID) keeps increasing when rows are created or updated. The maximum value in this column is used as the watermark.
Microsoft provides a complete step-by-step tutorial, Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal, which you can follow and implement with appropriate changes for your use case.
Apart from the watermark technique, there are other methods you can use to manage incremental load. Check here.
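To make the watermark mechanics concrete, here is a minimal sketch of the same pattern written as a standalone Python job rather than a pipeline; in ADF/Synapse you would implement this with Lookup and Copy activities as in the tutorial above. The table name, watermark column, connection strings, and paths are placeholders:

```python
# Sketch of the watermark pattern outside ADF/Synapse pipelines. Table names,
# the watermark column ("ID"), connection strings, and file paths are placeholders.
import io
import csv
import pyodbc
from azure.storage.filedatalake import DataLakeServiceClient

SQL_CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...;..."  # placeholder
ADLS_CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;..."  # placeholder
SOURCE_TABLE = "dbo.Orders"   # hypothetical source table
WATERMARK_COLUMN = "ID"       # column whose max value acts as the watermark

def load_incremental(last_watermark: int) -> int:
    """Copy rows newer than the last watermark to ADLS and return the new watermark."""
    with pyodbc.connect(SQL_CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(
            f"SELECT * FROM {SOURCE_TABLE} WHERE {WATERMARK_COLUMN} > ?", last_watermark
        )
        rows = cursor.fetchall()
        columns = [c[0] for c in cursor.description]

    if not rows:
        return last_watermark  # nothing new since the previous run

    # Serialize the delta as CSV and land it in the lake, one file per run.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)

    service = DataLakeServiceClient.from_connection_string(ADLS_CONN_STR)
    fs = service.get_file_system_client("raw")
    new_watermark = max(getattr(r, WATERMARK_COLUMN) for r in rows)
    file = fs.get_file_client(f"orders/incremental_{new_watermark}.csv")
    file.upload_data(buffer.getvalue(), overwrite=True)

    # The table name plus the new max ID would be persisted (e.g., in a small
    # watermark table or file) so the next run knows where to resume.
    return new_watermark
```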

How to use output of Azure Data Factory Web Activity in next copy activity?

I have an ADF Web activity from which I'm getting metadata as output. I want to copy this metadata into an Azure Postgres DB. How do I use the Web activity output as the source for the next Copy activity?
According to this answer, I think we can use two Web activities to store the output of your first Web activity.
Use the @activity('Web1').output.Response expression in the second Web activity to save the output as a blob in the container. Then we can use a Copy activity to copy this blob into Azure Postgres DB.
Since I do not have permission to set role permissions, I did not test this, but I think the solution is feasible.
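For illustration only (this is not something the answer above tested), the same data movement can be sketched outside ADF in Python, which may help clarify what each activity in the pipeline is doing; the endpoint, connection strings, container, and table names below are all hypothetical:

```python
# Sketch of the same flow outside ADF: call the API, stage the response as a
# blob, then load it into Postgres. URL, container, and table names are hypothetical.
import json
import requests
import psycopg2
from azure.storage.blob import BlobClient

# 1. Equivalent of the Web activity: fetch the metadata.
response = requests.get("https://example.com/api/metadata")  # placeholder endpoint
payload = response.json()

# 2. Equivalent of the second Web activity / staging step: save it as a blob.
blob = BlobClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",  # placeholder
    container_name="staging",
    blob_name="web1-output.json",
)
blob.upload_blob(json.dumps(payload), overwrite=True)

# 3. Equivalent of the Copy activity: insert the staged document into Postgres.
with psycopg2.connect("host=... dbname=... user=... password=...") as conn:  # placeholder
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO metadata_landing (raw_document) VALUES (%s)",
            [json.dumps(payload)],
        )
```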

Azure Stream Analytics Output to Data Lake Storage Gen2 with System-Assigned Managed Identity

I have a Stream Analytics job with Use System-assigned Managed Identity enabled, and I would like to output its results to Data Lake Storage Gen2.
As far as I understand, I should only need to go into the storage account's IAM settings and add the Stream Analytics identity as a Storage Blob Data Owner. However, I don't see a category for Stream Analytics jobs in the dropdown, and I can't seem to find the service principal in any of the other ones.
Am I missing something here or is this scenario just not supported yet?
When adding the role assignment, just search for the name of your Stream Analytics job in the Select field; the job's system-assigned managed identity will show up there and you can add it.
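If you prefer to script the assignment instead of using the portal, a hedged sketch with the Azure Python SDK might look like the following (subscription, resource group, storage account, and the managed identity's object ID are placeholders; this assumes a recent azure-mgmt-authorization package where RoleAssignmentCreateParameters takes principal_id and role_definition_id directly):

```python
# Sketch: assign "Storage Blob Data Owner" to the Stream Analytics job's
# system-assigned managed identity, as an alternative to the portal steps.
# All resource names and the principal ID below are placeholders.
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"
storage_scope = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)
# Built-in role definition ID for Storage Blob Data Owner.
role_definition_id = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
    "/roleDefinitions/b7e6dc6d-f1e8-4753-8033-0f276bb0955b"
)
# Object (principal) ID of the Stream Analytics job's managed identity,
# shown on the job's Identity blade.
principal_id = "<stream-analytics-managed-identity-object-id>"

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
client.role_assignments.create(
    scope=storage_scope,
    role_assignment_name=str(uuid.uuid4()),
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id=principal_id,
        principal_type="ServicePrincipal",
    ),
)
```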