I'm trying to copy data from a database and place it in S3 using NiFi, and that part works: the data is pulled from the database and landed in S3. Now I'm trying to add error handling to this flow. I added a PutEmail processor for error notification and deliberately gave a wrong bucket name to force a failure and validate the email. The PutEmail processor is getting triggered for each and every flow file (there are 100 flow files, so the mail is sent 100 times). I want this PutEmail notification to fire only once whenever there is an error in the flow. Any suggestions on this, please?
Below is the Flow:
Any suggestions on better (generic) error handling would also be helpful.
For your use case, MergeContent would allow you to batch several FlowFiles over a given duration and roll them up into a single email.
You could also apply some transforms beforehand to keep only the key parts of the content and/or attributes, so that the FlowFiles fed to MergeContent produce a summary listing in the message that gets sent.
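To illustrate that transform step, here is a minimal ExecuteScript (Jython) sketch, not the only way to do it: it rewrites each failed FlowFile's content to one summary line built from the standard filename and uuid attributes (anything more failure-specific would have to come from attributes your own processors set), so that a downstream MergeContent (e.g. with a Max Bin Age of a few minutes) can concatenate many of them into a single body for one PutEmail.

    from org.apache.nifi.processor.io import OutputStreamCallback

    # Replace the FlowFile content with one human-readable summary line.
    class WriteLine(OutputStreamCallback):
        def __init__(self, line):
            self.line = line
        def process(self, outputStream):
            outputStream.write(bytearray(self.line.encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        line = '%s (uuid=%s) failed to reach S3\n' % (
            flowFile.getAttribute('filename'), flowFile.getAttribute('uuid'))
        flowFile = session.write(flowFile, WriteLine(line))
        session.transfer(flowFile, REL_SUCCESS)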
You can also implement a custom ReportingTask, which will periodically send reports as needed.
I can configure Zabbix to send me mail warning notifications only if a certain amount of time has passed and the trigger problem is still active on the dashboard.
Now, Zabbix doesn't have an option under "Recovery operations" for a delay like "Operations" has, but is there a way to configure something so I receive a "RESOLVED" mail only if there was a "PROBLEM" mail for that trigger in the first place?
The way it works now is: if I set up 'recovery operations' to send me a 'resolved' mail, it sends it regardless of whether a 'problem' mail was ever sent.
I want to solve this because getting all the notifications is very annoying, but I still need some of them. For example, when a problem is active for more than 20 minutes, I want to see only the problem and resolved notifications for that.
Unfortunately there's no way out of the box to manage the recovery operation.
You can find more details in the documentation:
Recovery operations do not support escalating - all operations are assigned to a single step.
If this is an important issue for you, there are some ways to mitigate it, but any workaround that comes to mind is time-consuming.
You can implement multiple triggers with tags and tag-bound actions (i.e. duplicate triggers with different actions and recovery actions), manage the issue with an agent in your mailbox (horrible!), or write a custom script to be used as the default recovery action.
This script should receive the problem event ID as a parameter and use the API to check whether it needs to silently close the issue, send an email, set a trigger with a specific tag and use it with another Zabbix action, and so on.
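As a rough sketch of that last option in Python, assuming the recovery action passes the problem event ID as the first argument (e.g. via the {EVENT.ID} macro), a Zabbix version whose API still accepts user.login, and placeholder URL, credentials and SMTP settings:

    import json
    import smtplib
    import sys
    import time
    import urllib.request
    from email.message import EmailMessage

    ZABBIX_API = "https://zabbix.example.com/api_jsonrpc.php"   # placeholder
    MIN_PROBLEM_AGE = 20 * 60   # only mail if the problem was active for more than 20 minutes

    def api_call(method, params, auth=None):
        payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1, "auth": auth}
        req = urllib.request.Request(ZABBIX_API, json.dumps(payload).encode("utf-8"),
                                     {"Content-Type": "application/json-rpc"})
        return json.loads(urllib.request.urlopen(req).read())["result"]

    def main():
        problem_event_id = sys.argv[1]   # passed by the action, e.g. as {EVENT.ID}
        auth = api_call("user.login", {"user": "api-user", "password": "secret"})   # placeholders
        events = api_call("event.get", {"eventids": [problem_event_id], "output": ["clock"]}, auth)
        if not events:
            return
        problem_age = time.time() - int(events[0]["clock"])
        if problem_age < MIN_PROBLEM_AGE:
            return   # no PROBLEM mail would have been sent, so stay silent on recovery too

        msg = EmailMessage()
        msg["Subject"] = "RESOLVED: event %s" % problem_event_id
        msg["From"] = "zabbix@example.com"
        msg["To"] = "me@example.com"
        msg.set_content("Problem %s resolved after %d seconds." % (problem_event_id, problem_age))
        with smtplib.SMTP("localhost") as smtp:   # assumes a local MTA
            smtp.send_message(msg)

    if __name__ == "__main__":
        main()

The same check could just as well decide to set a tag or call another API method instead of sending the mail.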
As part of my ETL process in NiFi, I need to send some files to HDFS. What is needed is to check whether there's a failure. If so, it should wait one second and then retry. If that fails, it should wait another second and retry again. Finally, if that fails, it should send an email to me and another admin. That last part is easy enough; I just route it to a PutEmail processor. For the retries, I set it up as per this article: https://kisstechdocs.wordpress.com/2015/01/15/creating-a-limited-failure-loop-in-nifi/
However, it won't wait before each retry. I thought about adding an expression like ${now():plus(1000)}, but it's expecting a boolean expression. Is there a way to express the same idea as a boolean? Or will I need to scrap this and find a different approach? Thanks in advance for any assistance.
Try ControlRate for this: you can configure the throughput to be one flow file per second, or tweak the settings to allow multiple flow files through based on a grouping attribute. For example, if you get 5 failed flow files in the queue at once, it will take 5 seconds to release them all if you send one flow file per second "globally". If your use case only needs each flow file to be delayed by one second (meaning after one second, all 5 flow files are sent), I believe you can set the properties to support that as well.
An alternative solution is a scripting processor: you could read in a (configurable) number of flow files at once, delay one second, then transfer them all.
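If you do go the scripting route, a minimal ExecuteScript (Jython) body along these lines would work, assuming the failure relationship feeds this processor and its success relationship loops back for the retry; note that it simply blocks for the delay, which is crude but fine for small batches:

    from time import sleep

    flow_files = session.get(5)   # pull up to 5 queued (failed) flow files
    if not flow_files.isEmpty():
        sleep(1)                  # wait one second before releasing them for the retry
        session.transfer(flow_files, REL_SUCCESS)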
I want to develop an Emailer microservice which provides a SendEmail command. Inside the microservice I have an aggregate in mind which represents the whole email process with the following events:
Aggregate Email:
(EmailCreated)
EmailDeliveryStarted
EmailDeliveryFailed
EmailRecipientDelivered when one of the recipients received the email
EmailRecipientDeliveryFailed when one of the recipients could not receive the email
etc.
In the background the email delivery service SendGrid is used; my microservice works like a facade for that with my own events. The incoming webhooks from SendGrid are translated to proper domain events.
The process would look like this:
Command SendEmail ==> EmailCreated
EmailCreatedHandler ==> Email.Send (to SendGrid)
Incoming webhook ==> EmailDeliveryStarted
Further webhooks ==> EmailRecipientDelivered, EmailRecipientDeliveryFailed, etc.
Of course, if I wanted to replace the external web service and it applied other messaging strategies, I would adapt to that but keep my domain model with its events. I don't want the client to worry about the concrete email delivery strategy.
Now the crucial problem I face: I want to accept SendEmail commands even if SendGrid is not available at that very moment, which entails storing the whole email data (including attachments) and then starting the sending process with an event handler. On the other hand, I don't want to bloat my initial EmailCreated event with this BLOB data. And I want to be able to clean this data up after SendGrid has accepted my send request.
I could also try both sending the email to SendGrid and storing an initial EmailDeliveryStarted event in the SendEmail command. But this feels like a two-phase commit: if SendGrid accepted my call but my repository somehow failed to store the EmailDeliveryStarted event, the client would be told that something went wrong and would try again, which would be a disaster.
So I don't know how to design my aggregate and, more importantly, my EmailCreated event, since it should not contain BLOB data such as attachments.
I found this question interesting and it took me a little while to reflect on it.
First things first - I do not see an obligation to store the email attachments in the event. You can simply store the fully qualified names of the attached files. That keeps the event log smaller and probably removes the need to "delete" the event (and you know that, in an event-sourced model, you should not do that).
Secondly, assuming the project is not building an e-mail client, I don't see a need to model an e-mail as an aggregate root. Aggregate roots represent business-relevant concepts, not utility tasks like sending an e-mail. You could model this far more easily with a database table / document that keeps track of what has and hasn't been sent yet. I see sending e-mails through SendGrid as a reaction to a business event, certainly to be tracked, but not an aggregate root in its own right.
Lastly, if you want to accept SendEmail commands even when SendGrid is offline, have the aggregate emit an EmailQueued event. The EmailQueuedHandler then adds a line to the read model of the process in charge of taking all the emails in the queued state and batching them for sending. If the communication with SendGrid fails, you can either (see the sketch after this list):
Do nothing; the sender process will pick up the email on the next attempt
Emit an EmailSendFailed event, intercepted by a handler that increases the retry count (if you want to stop after a certain number of retries).
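To make the shape of that concrete, here is a rough Python-flavoured sketch of the idea, not tied to any framework; the collaborators (read_model, blob_store, sendgrid_client, repository) and the retry limit are illustrative assumptions:

    MAX_RETRIES = 3  # assumption: give up after three failed attempts

    class EmailAggregate:
        """Aggregate that records events; attachments are stored by reference only."""

        def __init__(self, email_id):
            self.id = email_id
            self.events = []

        def queue(self, recipients, subject, body_ref, attachment_refs):
            # EmailQueued carries blob-store keys, never the BLOBs themselves
            self.events.append(("EmailQueued", {
                "email_id": self.id,
                "recipients": recipients,
                "subject": subject,
                "body_ref": body_ref,
                "attachment_refs": attachment_refs,
            }))

        def mark_delivery_started(self):
            self.events.append(("EmailDeliveryStarted", {"email_id": self.id}))

        def mark_send_failed(self, reason, retry_count):
            self.events.append(("EmailSendFailed", {
                "email_id": self.id, "reason": reason, "retry_count": retry_count,
            }))

    def send_queued_emails(read_model, blob_store, sendgrid_client, repository):
        # Background sender: drains the read model of queued emails and hands them to SendGrid.
        for entry in read_model.emails_in_state("queued"):
            email = repository.load(entry["email_id"])
            try:
                payload = blob_store.resolve(entry["body_ref"], entry["attachment_refs"])
                sendgrid_client.send(entry["recipients"], entry["subject"], payload)
                email.mark_delivery_started()
                blob_store.delete(entry["attachment_refs"])   # clean up once SendGrid accepted the request
            except Exception as exc:
                if entry["retry_count"] + 1 <= MAX_RETRIES:
                    email.mark_send_failed(str(exc), entry["retry_count"] + 1)
                # otherwise leave it for a dead-letter / manual follow-up process
            repository.save(email)

The EmailQueued event then stays small, and cleaning up the attachment blobs after SendGrid has accepted the request never touches the event log.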
Hope that is sufficiently clear and best of luck with your project.
In Mule, I have a CSV file containing 50 rows of records with fields like product ID, product name, quantity, price, offer, and expiry date. I want to convert the CSV format to JSON and call an external API with that data. If the API or network goes down in the middle of the calls, how can I retry with the same data?
Can anybody shed light on this?
You would have to create an exception catcher that logs the failed requests, then run through those again.
Do not keep "looping" the same record until the API comes up. This will block all others from being processed and your queue will not decrease.
See this post, which explains how to set up batch jobs, and this post on how to create error handlers; both are from the MuleSoft blog.
Take a look at the until-successful scope: https://developer.mulesoft.com/docs/display/current/Until+Successful+Scope
NOTE: This is by default asynchronous, but can be configured to be synchronous.
I have a flow that inserts objects in a Mongo database using the Mongo connector, which uses the MongoClientImpl provided by the connector. This client has a line that tries to cast the _id to an ObjectId prior to returning that value as a string to the user after the insert has been submitted to the database. As far as I can tell, this line does not impact whether the object is inserted, but it does throw an exception when trying to cast a string to an ObjectId.
My flow is throwing hundreds of ClassCastExceptions. It also does not appear to be processing nearly as many inserts as I would expect. I expect to see tens of thousands, but instead the flow is only inserting 136 documents.
Is there a limit to the number of exceptions that can be thrown and captured by Mule's DefaultMessagingExceptionStrategy before the flow will stop processing a given message?
The answer is no, but you can implement some custom logic and stop the flow. You would probably need to start it again manually afterwards. This is how I would do it:
Use the Object Store module to count the occurrences of a given exception.
Once a defined number of occurrences is reached, stop the flow programmatically; one option would be something like #[groovy:<flowName>.stop()], given that Groovy has direct access to the registry.
Send an email notification so you can be aware of the errors (especially useful in production).