Synchronous transformation jobs using kettle - pentaho

Is there any way to run transformation jobs for synchronous calls? The requirement is that for each API call I need to execute the .ktr and return the response. Synchronous calls might happen, and the input file size can also change. Can I handle this requirement using a Kettle transformation? Please help.

You can use the REST Client Kettle step at the end of your transformation to send the data or any message to the API. That is what I usually do.

Yes, you can do this with the EE server, which lets you expose transformations as web services.
If you want to do it with CE, then Carte will help you, but it doesn't seem to support waiting for the job/transformation to finish.
So if you want that option, use CDA or SPARKL to expose your job or transformation as a web service within the BA server, and that will give you what you're looking for.
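If the server side cannot block until the transformation finishes, one common workaround is to wait on the client side by polling. The following is only a minimal sketch of that idea, not Carte's API: `fetch_status` is a hypothetical callable that you would have to implement against whatever status endpoint your server offers, and the status strings in `done_states` are assumptions.

```python
import time
from typing import Callable, Sequence


def wait_for_completion(
    fetch_status: Callable[[], str],
    done_states: Sequence[str] = ("Finished", "Stopped"),  # assumed status names
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it reports a terminal state or the timeout expires.

    fetch_status is hypothetical: it should return the current status of the
    running job/transformation as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

An API handler could start the transformation, call `wait_for_completion`, and only then build its response, which makes the call look synchronous to the caller.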

Related

Can MuleSoft API/Anypoint copy data from one database's table to another database's table without any additional step (any custom code)?

Goal:
I have two SQL Server databases (DB-A and DB-B) located on two different servers in the same network.
DB-A has a table T1, and I want to copy data from DB-A's table T1 (source) to DB-B's table T2 (destination). This DB sync should take place any time a record in T1 is added, updated, or deleted.
Please note: all DB-to-DB data sync options are out of consideration; I must use a MuleSoft API for this job.
Background:
I am new to MuleSoft and its products. I am told the MuleSoft platform can help with building and managing APIs.
I searched the web for MuleSoft offerings, and many articles (linked below) suggest that MuleSoft itself can read from one DB table and write to another DB table (using DB connectors etc.).
Questions:
Can MuleSoft itself get this data sync job done without us writing our own MuleSoft API invoker or MuleSoft API consumer (to trigger the MuleSoft API on one end, or to receive data from the MuleSoft API on the other end and write it to the DB table)?
What are the key steps to get this data transfer working? Any reference that shows the step-by-step journey to achieve the goal would be a huge help.
Links:
https://help.mulesoft.com/s/question/0D52T00004mXXGDSA4/copy-data-from-one-oracle-table-to-another-oracle-table
https://help.mulesoft.com/s/question/0D52T00004mXStnSAG/select-insert-data-from-one-database-to-another
https://help.mulesoft.com/s/question/0D72T000003rpJJSAY/detail
First, let's clarify the terminology, since the question mixes several concepts in a confusing way. MuleSoft is a company with several products that may apply. A "MuleSoft API" would be an API created by MuleSoft. Since you are clearly talking about APIs created by you or your organization, that description is incorrect. What you are really talking about are Mule applications, which are applications deployed and executed in a Mule runtime. Mule applications may implement your APIs, or they may implement integrations. After all, Mule was originally an ESB product used to integrate other systems, before REST APIs were a thing. You may deploy Mule applications to Anypoint Platform, specifically to its CloudHub component, or to an on-prem instance of the Mule runtime.
In any case, a Mule application is perfectly capable of implementing APIs, integrations, or both. It does not need to implement an API or call another API if that is not what you want. You do need to trigger the flow somehow: by reading directly from the database to find new rows, with a scheduler that executes a query at a given time, with an HTTP request, or even with an API that listens for requests.
As an example, the application can use the <db:listener> source of the Database connector to start the flow by fetching rows. You need to take care of the watermark column configuration so that only new rows are detected. See the documentation at https://docs.mulesoft.com/db-connector/1.13/database-documentation#listener for details.
Alternatively you can trigger the flow in another way and just use a select operation.
After that use DataWeave to transform the records as needed. Then use insert or update operations.
There are examples in the documentation that can help you get started. If you are not familiar with Mule, you should start by reading the documentation and doing some training until you get the concepts.
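The watermark idea described above can be illustrated outside Mule. The following is a minimal sketch in Python with in-memory SQLite standing in for the two databases. It is not Mule configuration: the `id` and `name` columns, the `id`-based watermark, and the upper-casing step (a stand-in for a DataWeave transformation) are all assumptions made for illustration.

```python
import sqlite3


def make_demo_databases():
    """Create two in-memory databases with a source table T1 and a destination table T2."""
    source = sqlite3.connect(":memory:")
    dest = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, name TEXT)")
    dest.execute("CREATE TABLE T2 (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO T1 (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    source.commit()
    return source, dest


def sync_new_rows(source, dest, watermark):
    """Copy rows with id > watermark from T1 to T2 and return the new watermark."""
    rows = source.execute(
        "SELECT id, name FROM T1 WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    for row_id, name in rows:
        # Stand-in for the transformation step; INSERT OR REPLACE acts as an upsert.
        dest.execute(
            "INSERT OR REPLACE INTO T2 (id, name) VALUES (?, ?)",
            (row_id, name.upper()),
        )
    dest.commit()
    # Keep the old watermark when nothing new was found.
    return rows[-1][0] if rows else watermark


if __name__ == "__main__":
    src, dst = make_demo_databases()
    print(sync_new_rows(src, dst, 0))
```

Note that a watermark on an ever-increasing `id` only detects inserted rows. Detecting updates would need a different watermark column, such as a last-modified timestamp, and deletes are not visible to this kind of query at all.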

Mosaic-Decisions: Flow import/export by API

We have 21 Mosaic instances, and it is very difficult to migrate flows across 21 environments. We need to automate this process with a CI/CD pipeline.
How can we import/export a Mosaic flow by API? If this is available, please describe the steps.
Any advice is greatly appreciated.
Yes, Mosaic Decisions supports flow migration. The following migrations are available in Mosaic Decisions:
Single flow export-import
Bulk flow export-import
Whole project export-import
As for triggering it from a terminal, it can be done in two steps:
Run a curl command against the API that exports the flow(s)
Run a curl command against the API that imports the flow(s)
Please note, you need to have access to the cluster and the project where the flow/s are getting imported.
In upcoming versions, Mosaic Decisions will also support export-import in a single step, either through the UI or through a single API call.
Hope this resolves your query.
For API related queries, you can connect with the product support of Mosaic.

Check file encoding thanks to Azure Data Factory activity

I'd like to be able to check the encoding of an input file in the flow of my pipeline. Any idea how to do that with one of the activities provided by Azure Data Factory?
Thanks for the tips
It's actually not supported by any of the activities out of the box at this time, but you can do it using other services that have connectors available in ADF, such as Azure Functions. You will need to develop the algorithm to detect the encoding and an Azure Function to run it. (Of course, other services like Azure Batch or notebooks could also be used.)
That said, it would be really useful to add this information to the Get Metadata activity (I just posted the idea at https://feedback.azure.com/forums/270578-data-factory/suggestions/37452187-add-encoding-into-the-get-a-file-s-metadata-activi).
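As a starting point for the detection algorithm mentioned above, here is a minimal sketch in Python that could sit inside such a function. It only recognises files that start with a byte order mark (BOM) or that decode cleanly as UTF-8. Anything else is reported as unknown, because reliably telling legacy single-byte encodings apart needs heuristics that are out of scope here. The function name and return values are assumptions for illustration.

```python
import codecs
from typing import Optional

# UTF-32 BOMs must be checked before UTF-16 ones, because the UTF-32-LE BOM
# starts with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes) -> Optional[str]:
    """Return an encoding name for data, or None if it cannot be determined."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        # No BOM: accept the data as UTF-8 only if it decodes without errors.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None
```

For large files it is usually enough to pass the first few kilobytes for the BOM check, although a truncated sample can cut a multi-byte UTF-8 character in half and cause a false "unknown".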

Automating scaleup of Streaming units - Stream analytics job

We would like to automate scale up of streaming units for certain stream analytics job if the 'SU utilization' is high. Is it possible to achieve this using PowerShell? Thanks.
First, as Pete M said, you can call the REST API to create or update a transformation within a job.
In addition, the Azure Stream Analytics cmdlet New-AzureRmStreamAnalyticsTransformation can be used to update a transformation within a job.
It depends on what you mean by "automate". You can update a transformation via the API from a scheduled job, including its streaming unit allocation. I'm not sure whether you can do this via the PowerShell object model, but you can always make a REST call:
https://learn.microsoft.com/en-us/rest/api/streamanalytics/stream-analytics-transformation
If you mean you want to use PowerShell to create and configure a job that scales automatically on its own, unfortunately that isn't possible today, regardless of how you create the job. ASA doesn't support elastic scaling. You have to do it "manually", either by hand or with some kind of scheduled WebJob or similar.
It is three years later now, but I think you can use App Insights to automatically create an alert rule based on percent utilization. Is it an absolute MUST that you use PowerShell? If so, there is an Azure Automation script on GitHub:
https://github.com/Azure/azure-stream-analytics/blob/master/Autoscale/StepScaleUp.ps1
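The decision logic behind a scheduled "step scale up" can be sketched independently of how the new value is applied. The following Python sketch is not a transcription of the linked script. The 80% threshold and the list of allowed streaming unit values are assumptions that you should check against your own job, and applying the result would still go through the REST call or cmdlet mentioned above.

```python
from typing import Sequence


def next_streaming_units(
    current: int,
    su_utilization_pct: float,
    threshold_pct: float = 80.0,  # assumed threshold
    allowed_steps: Sequence[int] = (1, 3, 6, 12, 18, 24),  # assumed valid SU values
) -> int:
    """Return the streaming unit count to use after one scale-up check."""
    if su_utilization_pct < threshold_pct:
        return current  # utilization is acceptable, keep the current allocation
    for step in sorted(allowed_steps):
        if step > current:
            return step  # move up to the next larger allowed value
    return current  # already at the largest allowed value
```

A scheduled job could read the 'SU utilization' metric, call this function, and update the transformation only when the returned value differs from the current one.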

How to use Apache Nifi to query a REST API?

For a project I need to develop an ETL (extract, transform, load) process that reads data from a (legacy) tool that exposes its data through a REST API. This data needs to be stored in Amazon S3.
I would really like to try this with Apache NiFi, but I honestly have no clue yet how to connect to the REST API, or where and how I can implement some business logic to 'talk the right protocol' with the source system. For example, I would like to keep track of what data has been written so far, so that loading can resume where it left off.
So far I have been reading the NiFi documentation, and I'm getting a better insight into what the tool provides and entails. However, it's not clear to me how I could implement the task within the NiFi architecture.
Hopefully someone can give me some guidance?
Thanks,
Paul
The InvokeHTTP processor can be used to query a REST API.
Here is a simple flow that:
Queries the REST API at https://api.exchangeratesapi.io/latest every 10 minutes
Sets the output-file name (exchangerates_<ID>.json)
Stores the query response in the output file on the local filesystem (under /tmp/data-out)
I exported the flow as a NiFi template and stored it in a gist. The template can be imported into a NiFi instance and run as is.
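For readers who want to see what the three steps amount to before importing the template, here is a rough Python equivalent of one run of the flow. It is only a sketch of the logic, not of NiFi itself: the random UUID used for `<ID>` is an assumption, and `fetch` is made replaceable so the function can be tried without network access.

```python
import os
import urllib.request
import uuid
from typing import Callable, Optional

API_URL = "https://api.exchangeratesapi.io/latest"
OUT_DIR = "/tmp/data-out"


def http_get(url: str) -> bytes:
    """Fetch the raw response body for url."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def run_once(
    fetch: Callable[[str], bytes] = http_get,
    out_dir: str = OUT_DIR,
    file_id: Optional[str] = None,
) -> str:
    """Query the API once, store the response, and return the output path."""
    body = fetch(API_URL)                    # step 1: query the REST API
    file_id = file_id or uuid.uuid4().hex    # assumed ID scheme
    file_name = f"exchangerates_{file_id}.json"  # step 2: set the output file name
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, file_name)
    with open(path, "wb") as handle:         # step 3: store the response on disk
        handle.write(body)
    return path
```

Calling `run_once()` in a loop with a 600-second pause between calls would mirror the 10-minute schedule. In NiFi the scheduling, retries, and file handling are configured on the processors instead of being written by hand.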