Check file encoding with an Azure Data Factory activity - azure-data-factory-2

I'd like to be able to check the encoding of an input file within the flow of my pipeline. Any idea how to do that with one of the activities provided by Azure Data Factory?
Thanks for the tips

It's actually not supported out of the box by any of the activities at this time, but you can do it using other services with connectors available in ADF, such as Azure Functions. You will need to develop the algorithm that detects the encoding and deploy it as an Azure Function yourself (of course other services like Azure Batch, Notebooks, etc. could be used instead).
That said, it would be really useful to add this information to the Get Metadata activity (I just posted the idea at https://feedback.azure.com/forums/270578-data-factory/suggestions/37452187-add-encoding-into-the-get-a-file-s-metadata-activi)
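If it helps, here is a minimal Python sketch of the kind of detection logic such an Azure Function could wrap: it checks for a byte-order mark and falls back to the third-party chardet library. The function name, the 64 KB sample size, and the chardet dependency are illustrative assumptions, not part of ADF.

```python
import codecs

import chardet  # third-party: pip install chardet


def detect_encoding(raw: bytes) -> str:
    """Best-effort detection of a file's text encoding."""
    # Byte-order marks give an unambiguous answer when present.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    # Otherwise fall back to statistical detection (may guess wrong for short files).
    guess = chardet.detect(raw)
    return guess["encoding"] or "unknown"


if __name__ == "__main__":
    with open("input.csv", "rb") as f:
        print(detect_encoding(f.read(64 * 1024)))  # sample the first 64 KB
```

Exposed as an HTTP-triggered function, the pipeline could call it from an Azure Function activity and branch on the returned encoding.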

Related

Can MuleSoft API/Anypoint copy data from one database's table to another database's table without any additional step (any custom code)?

Goal:
I have two SQL Server databases (DB-A and DB-B) located on two different servers on the same network.
DB-A has a table T1, and I want to copy data from DB-A's table T1 (source) to DB-B's table T2 (destination). This DB sync should take place any time a record in T1 is added, updated, or deleted.
Please note: all DB-to-DB data sync options are out of consideration; I must use a MuleSoft API for this job.
Background:
I am new to MuleSoft and its products; I am told the MuleSoft platform can help with building and managing APIs.
I searched the web for MuleSoft offerings, and there are many articles (listed below) suggesting that MuleSoft itself can read from one DB table and write to another DB table (using DB connectors, etc.).
Questions:
Is it possible for MuleSoft itself to get this data sync job done without us writing our own MuleSoft API invoker or MuleSoft API consumer (to trigger the MuleSoft API from one end, or to receive data from the MuleSoft API on the other end and write it to the DB table)?
What are the key steps to get this data transfer working? Any reference that shows the step-by-step journey to achieve the goal would be a huge help.
Links:
https://help.mulesoft.com/s/question/0D52T00004mXXGDSA4/copy-data-from-one-oracle-table-to-another-oracle-table
https://help.mulesoft.com/s/question/0D52T00004mXStnSAG/select-insert-data-from-one-database-to-another
https://help.mulesoft.com/s/question/0D72T000003rpJJSAY/detail
First, let's clarify the terminology, since the question mixes several concepts in a confusing way. MuleSoft is a company with several products that may apply here. A "MuleSoft API" should be read as an API created by MuleSoft; since you are clearly talking about APIs created by you or your organization, that would be an incorrect description. What you are really talking about are Mule applications, which are applications deployed and executed in a Mule runtime. Mule applications may implement your APIs, or they may implement integrations. After all, Mule originally was an ESB product used to integrate other systems, before REST APIs were a thing. You can deploy Mule applications to Anypoint Platform, specifically to the CloudHub component of the platform, or to an on-prem instance of the Mule runtime.
In any case, a Mule application is perfectly capable of implementing APIs, integrations, or both. It does not need to implement an API or call another API if that is not what you want. You do need to trigger the flow somehow: by reading directly from the database to find new rows, with a scheduler that executes a query at a given time, with an HTTP request, or even by having an API listening for requests that trigger the flow.
As an example, the application can use the <db:listener> source of the Database connector to start the flow by fetching rows. You need to take care of the watermark column configuration so that only new rows are detected. See the documentation at https://docs.mulesoft.com/db-connector/1.13/database-documentation#listener for details.
Alternatively you can trigger the flow in another way and just use a select operation.
After that, use DataWeave to transform the records as needed, then use insert or update operations.
There are examples in the documentation that can help you get started. If you are not familiar with Mule you should start by reading the documentation and doing some training until you get the concepts.
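To make the pattern concrete, here is a minimal Python sketch of the same watermark-based logic (poll the source for rows newer than the last seen ID, transform them, insert them into the target). It only illustrates what a Mule flow built from <db:listener>, DataWeave, and an insert operation would do; the pyodbc driver, connection strings, table names, and the id watermark column are assumptions for illustration.

```python
import pyodbc  # assumed driver; any DB-API-compatible client would do

SOURCE_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db-a;DATABASE=DBA;..."
TARGET_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db-b;DATABASE=DBB;..."


def sync_once(last_watermark: int) -> int:
    """Copy rows from DB-A.T1 to DB-B.T2 that are newer than the watermark."""
    with pyodbc.connect(SOURCE_DSN) as src, pyodbc.connect(TARGET_DSN) as dst:
        rows = src.execute(
            "SELECT id, name, email FROM T1 WHERE id > ? ORDER BY id",
            last_watermark,
        ).fetchall()
        for row in rows:
            # "DataWeave" step: reshape/rename fields for the target table.
            dst.execute(
                "INSERT INTO T2 (id, full_name, email) VALUES (?, ?, ?)",
                row.id, row.name, row.email,
            )
        dst.commit()
        # Return the new watermark so the next run only sees newer rows.
        return rows[-1].id if rows else last_watermark
```

Note that this only picks up new rows; reacting to updates and deletes would need additional watermark columns (for example a last-modified timestamp) or another change-detection mechanism.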

How to create a process in Dell Boomi that will get data from one database and then send that data to a SaaS

I would like to know how do I create a process in Dell Boomi that will meet the following criteria:
Read data directly from a production database table, then send the data to a SaaS (public internet) using a REST API.
Another process will read data from the SaaS (REST API) and then write it to another database table.
Please see the attached link for what I have done so far; I really don't know how to proceed. Hope you can help me out. Thank you. Boomi DB connector
You are actually making a good start. For the first process (DB > SaaS) you need to:
Ensure you have access to the DB - if your Atom is local then this shouldn't be much of an issue, but if it is on the Boomi Cloud, then you need to enable access to this DB from the internet (not something I would recommend).
Check what you need to read and define the Boomi Operation - from the image you have linked I can see that you are doing that, but without knowing what data you need and how it is structured, it is impossible to say whether you have defined everything correctly.
Transform the data to the output system's format - once you get the data from the DB, use the Map shape to map it to the Profile of the SaaS you are sending your data to.
Send data to the SaaS - you can use the HttpClient connector to send data in JSON or XML (or any other format you like) to the SaaS REST API (see the sketch after this answer).
For the other process (SaaS > DB) the steps are practically the same but in reverse order.
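Outside of Boomi itself, the shape of that first process (read rows, map them to the SaaS profile, POST them over REST) looks roughly like the following Python sketch. The table name, field names, endpoint URL, and API key are hypothetical placeholders, not anything defined by the Boomi platform or the asker's SaaS.

```python
import pyodbc    # assumed DB driver
import requests  # assumed HTTP client

DB_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=prod-db;DATABASE=prod;..."
SAAS_URL = "https://example-saas.com/api/v1/records"  # hypothetical endpoint
API_KEY = "..."                                        # hypothetical credential


def run_once() -> None:
    # Read the source rows (the "DB connector / Operation" part).
    with pyodbc.connect(DB_DSN) as conn:
        rows = conn.execute("SELECT id, name, amount FROM production_table").fetchall()

    for row in rows:
        # "Map shape" step: reshape the DB record into the SaaS JSON profile.
        payload = {"externalId": row.id, "displayName": row.name, "total": float(row.amount)}
        # "HttpClient connector" step: send the record to the SaaS REST API.
        resp = requests.post(
            SAAS_URL,
            json=payload,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
```

The reverse process would GET from the SaaS API, map the JSON back to the table's columns, and INSERT into the target database.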

Automating scale-up of streaming units - Stream Analytics job

We would like to automate scale-up of streaming units for a certain Stream Analytics job when its 'SU utilization' is high. Is it possible to achieve this using PowerShell? Thanks.
Firstly, as Pete M said, we could call the REST API to create or update a transformation within a job.
Besides that, the Azure Stream Analytics cmdlet New-AzureRmStreamAnalyticsTransformation can be used to update a transformation within a job.
It depends on what you mean by "automate". You can update a transformation via the API from a scheduled job, including the streaming unit allocation. I'm not sure if you can do this via the PowerShell object model, but you can always make a REST call:
https://learn.microsoft.com/en-us/rest/api/streamanalytics/stream-analytics-transformation
If you mean you want to use PowerShell to create and configure a job that automatically scales on its own, unfortunately that isn't possible today regardless of how you create the job. ASA doesn't support elastic scaling. You have to do it "manually", either by hand or via some manner of scheduled WebJob or similar.
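For reference, a scheduled job making that REST call could look roughly like the Python sketch below. The resource path and body follow the transformation REST API linked above, but the api-version, the transformation name, and how the bearer token is obtained are assumptions; verify them against the current documentation before relying on this.

```python
import requests

# Hypothetical identifiers; substitute your own.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
JOB_NAME = "<job-name>"
TRANSFORMATION = "Transformation"  # common default name; verify yours
API_VERSION = "2020-03-01"         # assumed; check the REST docs for the latest

URL = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.StreamAnalytics/streamingjobs/{JOB_NAME}"
    f"/transformations/{TRANSFORMATION}?api-version={API_VERSION}"
)


def set_streaming_units(units: int, bearer_token: str) -> None:
    """PATCH the transformation so that only streamingUnits changes."""
    resp = requests.patch(
        URL,
        json={"properties": {"streamingUnits": units}},
        headers={"Authorization": f"Bearer {bearer_token}"},
        timeout=30,
    )
    resp.raise_for_status()
```

Driven by a scheduled WebJob, Azure Automation runbook, or an alert on SU utilization, this gives you the "manual" scaling the answer describes.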
It is three years later now, but I think you can use App Insights to automatically create an alert rule based on percent utilization. Is it an absolute MUST that you use PowerShell? If so, there is an Azure Automation script on GitHub:
https://github.com/Azure/azure-stream-analytics/blob/master/Autoscale/StepScaleUp.ps1

How to use Apache NiFi to query a REST API?

For a project I need to develop an ETL (extract, transform, load) process that reads data from a (legacy) tool that exposes its data on a REST API. This data needs to be stored in Amazon S3.
I'd really like to try this with Apache NiFi, but I honestly have no clue yet how I can connect to the REST API, or where/how I can implement some business logic to 'talk the right protocol' with the source system. For example, I'd like to keep track of what data has been written so far so it can resume loading where it left off.
So far I have been reading the NiFi documentation and I'm getting a better insight into what the tool provides/entails. However, it's not clear to me how I could implement this task within the NiFi architecture.
Hopefully someone can give me some guidance?
Thanks,
Paul
The InvokeHTTP processor can be used to query a REST API.
Here is a simple flow that:
Queries the REST API at https://api.exchangeratesapi.io/latest every 10 minutes
Sets the output-file name (exchangerates_<ID>.json)
Stores the query response in the output file on the local filesystem (under /tmp/data-out)
I exported the flow as a NiFi template and stored it in a gist. The template can be imported into a NiFi instance and run as is.
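For readers who want to see what those three steps amount to outside NiFi, here is a plain Python sketch of the equivalent logic. The polling interval, file naming, and output directory mirror the flow described above; nothing in this sketch is NiFi-specific, and the uuid-based ID is an assumption standing in for the flowfile ID.

```python
import time
import uuid
from pathlib import Path

import requests

URL = "https://api.exchangeratesapi.io/latest"
OUT_DIR = Path("/tmp/data-out")


def fetch_once() -> Path:
    # Step 1: query the REST API (what InvokeHTTP does in the flow).
    resp = requests.get(URL, timeout=30)
    resp.raise_for_status()
    # Step 2: set the output-file name (exchangerates_<ID>.json).
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    out_file = OUT_DIR / f"exchangerates_{uuid.uuid4()}.json"
    # Step 3: store the query response on the local filesystem.
    out_file.write_bytes(resp.content)
    return out_file


if __name__ == "__main__":
    while True:
        print(f"wrote {fetch_once()}")
        time.sleep(600)  # every 10 minutes, like the flow's schedule
```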

Bulk user account creation from CSV data import/ingestion

Hi all brilliant minds,
I am currently working on a fairly complex problem and I would love to get some idea brainstorming going on. I have a C# .NET web application running in Windows Azure, using SQL Azure as the primary datastore.
Every time a new user creates an account, all they need to provide is a name, email, and password. Upon account creation, we store the core membership data in the SQL database, and all the secondary operations (e.g. sending emails, establishing social relationships, creating profile assets, etc.) get pushed onto an Azure Queue and get picked up/processed later.
Now I have a couple of CSV files that contain hundreds of new users (names & emails) that need to be created on the system. I am thinking of automating this by breaking it into two parts:
Part 1: Write a service that ingests the CSV files, parses out the names & emails, and saves this data in storage A
This service should be flexible enough to take files with different formats
This service does not actually create the user accounts, so this is decoupled from the business logic layer of our application
The choice of storage does not have to be SQL; it could also be a non-relational datastore (e.g. Azure Tables)
This service could be a third-party solution outside of our application platform - so it is open to all suggestions
Part 2: Write a process that periodically goes through storage A and creates the user accounts from there
This is in the "business logic layer" of our application
Whenever an account is successfully created, mark that specific record in storage A as processed
This needs to be retry-able in case of failures in user account creations
I'm wondering if anyone has experience with importing bulk "users" from files, and whether what I am suggesting sounds like a decent solution.
Note that Part 1 could be a third-party solution outside of our application platform, so there's no restriction on what language/platform it has to run in. We are thinking about either using BULK INSERT or Microsoft SQL Server Integration Services 2008 (SSIS) to ingest and load the data from CSV into the SQL datastore. If anyone has worked with these and can provide some pointers, that would be greatly appreciated too. Thanks so much in advance!
If I understand this correctly, you already have a process that picks up messages from a queue and runs its core logic to create the user assets, etc. So it sounds like you only need to automate parsing the CSV files and dumping their contents into queue messages? That sounds like a trivial task.
You can also kick off the processing of a CSV file via a queue message (to a different queue). The message would contain the location of the CSV file, and a Worker Role running in Azure would pick it up (it could be the same worker role as the one that processes new users if the usual load is not high).
Since you're utilizing queues, the process is retriable.
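A minimal sketch of that CSV-to-queue step, using the azure-storage-queue Python package for illustration; the queue name, CSV column names, and message shape are assumptions, and your worker role would define the real message contract:

```python
import csv
import json

from azure.storage.queue import QueueClient  # pip install azure-storage-queue

CONNECTION_STRING = "<storage-account-connection-string>"
QUEUE_NAME = "new-user-signups"  # hypothetical queue name


def enqueue_users_from_csv(csv_path: str) -> int:
    """Parse a CSV of names/emails and push one queue message per user."""
    queue = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)
    sent = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Assumes a header row with "name" and "email" columns.
        for row in csv.DictReader(f):
            message = {"name": row["name"].strip(), "email": row["email"].strip()}
            queue.send_message(json.dumps(message))
            sent += 1
    return sent


if __name__ == "__main__":
    print(enqueue_users_from_csv("new_users.csv"), "users queued")
```

If the CSV files vary in format, the parsing step is the only part that needs to be made configurable; the queue message keeps the downstream account-creation logic unchanged.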
HTH