Sending the exception message from Step Functions to AWS CloudWatch Events logs - amazon-cloudwatch

When an AWS Step Functions execution fails, its output is null and that null is what gets passed to the CloudWatch Events log; the error message from the exception is not passed along. How can I send that exception message to the CloudWatch Events log so that I can process it downstream?

CloudWatch Events and CloudWatch Logs are two different services. If you enable logging at the ERROR, FATAL, or ALL log level, you'll see the errors in CloudWatch Logs. See https://docs.aws.amazon.com/step-functions/latest/dg/cw-logs.html
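A minimal sketch of turning that logging on from Python, assuming boto3 and an existing CloudWatch Logs log group (the ARN in the test values is hypothetical). The helper only builds the loggingConfiguration value that the Step Functions CreateStateMachine/UpdateStateMachine calls accept; it does not call AWS itself.

```python
from typing import Any, Dict

# Log levels accepted by Step Functions logging.
VALID_LEVELS = {"ALL", "ERROR", "FATAL", "OFF"}


def build_logging_configuration(
    log_group_arn: str,
    level: str = "ERROR",
    include_execution_data: bool = True,
) -> Dict[str, Any]:
    """Build the loggingConfiguration parameter for create/update_state_machine."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported Step Functions log level: {level!r}")
    return {
        "level": level,
        # Whether execution data (e.g. state input/output) is written to the log.
        "includeExecutionData": include_execution_data,
        "destinations": [
            {"cloudWatchLogsLogGroup": {"logGroupArn": log_group_arn}}
        ],
    }
```

Usage, with a hypothetical state machine ARN and a log group ARN in the ":*" form used in the AWS examples: boto3.client("stepfunctions").update_state_machine(stateMachineArn="arn:aws:states:...", loggingConfiguration=build_logging_configuration("arn:aws:logs:us-east-1:123456789012:log-group:demo:*")).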
I assume you're referring to CloudWatch Events/EventBridge here. The events Step Functions emits to CloudWatch Events contain the result of the DescribeExecution API, which doesn't include errors, just the input and output of the execution. The error and cause of the failure are in the executionFailedEventDetails field of the last event in the execution history. You can retrieve that last event by calling GetExecutionHistory with "reverseOrder": true and "maxResults": 1.
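A minimal sketch of that GetExecutionHistory call in Python, assuming boto3 (the client is passed in, so any object with a compatible get_execution_history method works). handle_status_change is a hypothetical downstream handler for the EventBridge "Step Functions Execution Status Change" event, whose detail carries executionArn and status.

```python
from typing import Any, Dict, Optional


def get_failure_details(sfn_client: Any, execution_arn: str) -> Optional[Dict[str, str]]:
    """Return {"error", "cause"} for a failed execution, or None otherwise.

    sfn_client is assumed to be boto3.client("stepfunctions").
    """
    # Ask only for the newest event; for a failed execution it is ExecutionFailed.
    response = sfn_client.get_execution_history(
        executionArn=execution_arn,
        reverseOrder=True,
        maxResults=1,
    )
    events = response.get("events", [])
    # Timed-out or aborted executions end with a different event type.
    if not events or events[0].get("type") != "ExecutionFailed":
        return None
    details = events[0].get("executionFailedEventDetails", {})
    # error and cause are optional in the response, so default to "".
    return {"error": details.get("error", ""), "cause": details.get("cause", "")}


def handle_status_change(event: Dict[str, Any], sfn_client: Any) -> Optional[Dict[str, str]]:
    """Hypothetical handler for a Step Functions status-change event."""
    detail = event.get("detail", {})
    if detail.get("status") != "FAILED":
        return None
    return get_failure_details(sfn_client, detail["executionArn"])
```

In a Lambda function targeted by the EventBridge rule, you would pass boto3.client("stepfunctions") as sfn_client.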

I ended up creating a boto3.client('stepfunctions') and calling its get_execution_history() method to retrieve the cause and error.

Related

Delete message from Azure Storage Queue using REST API

I am trying to simply get a message off an Azure Storage Queue and then delete it using the REST API.
I can retrieve the message and get a pop receipt, but when I try to use it to delete the message, I keep getting "The specified message does not exist."
In the documentation (https://learn.microsoft.com/en-us/rest/api/storageservices/delete-message2) it looks like you only have to supply the pop receipt; however, further down the page, it says:
After a client retrieves a message with the Get Messages operation, the client is expected to process and delete the message. To delete the message, you must have two items of data returned in the response body of the Get Messages operation:
The message ID, an opaque GUID value that identifies the message in the queue.
A valid pop receipt, an opaque value that indicates that the message has been retrieved.
So this implies you do need to send the MessageId as well, but nothing in the docs specifies where to put the message ID.
The URI in the docs says to send the DELETE request to
https://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=string-value and I have tried replacing messageid with the actual message ID from the GET, but this does not seem to be correct.
Has anyone used this who can explain why I always get "The specified message does not exist" when trying to DELETE the message from the queue, or am I missing something?
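The request shape the question describes (message ID in the path, pop receipt in the popreceipt query parameter) can be sketched in Python with the standard library only. Two assumptions: the Get Messages body has the documented QueueMessagesList/QueueMessage layout, and the pop receipt, being an opaque value, is percent-encoded as a precaution. The account and queue names are placeholders, and request authorization (Shared Key or SAS) is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import quote


def parse_get_messages(xml_body: str) -> List[Tuple[str, str]]:
    """Extract (MessageId, PopReceipt) pairs from a Get Messages response body."""
    root = ET.fromstring(xml_body)
    return [
        (msg.findtext("MessageId", default=""), msg.findtext("PopReceipt", default=""))
        for msg in root.findall("QueueMessage")
    ]


def build_delete_url(account: str, queue: str, message_id: str, pop_receipt: str) -> str:
    """Delete Message URI: message ID as the last path segment, pop receipt
    percent-encoded in the popreceipt query parameter."""
    return (
        f"https://{account}.queue.core.windows.net/{queue}/messages/"
        f"{quote(message_id, safe='')}?popreceipt={quote(pop_receipt, safe='')}"
    )
```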
When a message is requested with Get Messages (i.e., dequeued), it becomes invisible for a certain amount of time. If the process that dequeued it does not delete it within that time, the message becomes visible again, as Gaurav Mantri said, and can be picked up by another process: when a Get Messages operation is performed after the visibility timeout has been reached, another message object is returned with a new pop receipt.
So please check when you dequeue the message, and either wait for the timeout to pass or delete the message in time, since the encoding process may sometimes take longer than the message's invisibility timeout.
Also please note that the maximum timeout interval for Queue service operations is 30 seconds. If the server timeout interval elapses before the service has finished processing the request, the service returns an error.
For the next process, the pop receipt is updated, so using the old one returns error code 404 (Not Found), because no message with a matching pop receipt can be found.
So please check by gradually increasing the visibility timeout.
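The two checks the answer suggests can be sketched in Python: requesting a longer visibility timeout on Get Messages (visibilitytimeout query parameter, plus the optional server-side timeout that the answer says is capped at 30 seconds), and comparing the current time with the message's TimeNextVisible before deleting. Once that time has passed, another consumer may have dequeued the message and received a new pop receipt, which would explain the 404. Account and queue names are placeholders; still_invisible is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.parse import urlencode

# Maximum server-side timeout for Queue service operations, as stated in the answer.
MAX_SERVER_TIMEOUT_S = 30


def build_get_messages_url(account: str, queue: str, visibility_timeout_s: int,
                           server_timeout_s: Optional[int] = None) -> str:
    """Get Messages URI with an explicit visibilitytimeout (and optional timeout)."""
    params = {"visibilitytimeout": visibility_timeout_s}
    if server_timeout_s is not None:
        if server_timeout_s > MAX_SERVER_TIMEOUT_S:
            raise ValueError("the Queue service timeout cannot exceed 30 seconds")
        params["timeout"] = server_timeout_s
    return f"https://{account}.queue.core.windows.net/{queue}/messages?{urlencode(params)}"


def still_invisible(time_next_visible: str, now: Optional[datetime] = None) -> bool:
    """True if the message is still hidden from other consumers.

    time_next_visible is the RFC 1123 TimeNextVisible value from the Get
    Messages response, e.g. "Fri, 09 Oct 2009 23:29:20 GMT".
    """
    deadline = parsedate_to_datetime(time_next_visible)
    current = now if now is not None else datetime.now(timezone.utc)
    return current < deadline
```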
References:
Queue getmessage fails - some messages only. (microsoft.com)
Setting timeouts for Queue service operations (REST API) - Azure Storage | Microsoft Docs

Setting SQS Destination for S3 Event Notification - Invalid S3 Resource Format

I was attempting to set up an event notification in S3 that would send a message to an SQS queue generated by Snowflake on any object-creation event. This is fairly straightforward, but when I tested the configuration, I noticed that no event had occurred. On closer inspection, I noticed this:
When I click on the error, I get a message that the ARN has an invalid S3 resource format. I received this ARN from Snowflake, so I'm sure it's valid. So why is this error occurring, and how can I get around it?
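For reference, a sketch of where an SQS ARN goes in an S3 notification configuration, assuming boto3 and assuming the ARN from Snowflake is an SQS queue ARN of the form arn:aws:sqs:region:account-id:queue-name. It only checks the ARN's shape and builds the QueueConfigurations entry; it does not diagnose the console error above. The regex is an illustrative approximation, not an official format definition.

```python
import re
from typing import Any, Dict

# Approximate shape of an SQS queue ARN: arn:<partition>:sqs:<region>:<account>:<name>
SQS_ARN_RE = re.compile(r"^arn:aws[a-z-]*:sqs:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]{1,80}$")


def build_queue_notification(queue_arn: str) -> Dict[str, Any]:
    """NotificationConfiguration sending all object-created events to one queue."""
    if not SQS_ARN_RE.match(queue_arn):
        raise ValueError(f"not an SQS queue ARN: {queue_arn!r}")
    return {
        # SQS destinations belong under QueueConfigurations
        # (not TopicConfigurations or LambdaFunctionConfigurations).
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }
```

Usage, with a hypothetical bucket name: boto3.client("s3").put_bucket_notification_configuration(Bucket="my-bucket", NotificationConfiguration=build_queue_notification(arn)). Note that this call replaces the bucket's entire existing notification configuration.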

Using Sagas with Recoverability

We are having an issue with recovery for messages originating from Sagas.
When a saga sends a message for processing, the message handler can sometimes fail with an exception. We currently use a try/catch, and when an exception is thrown, we "Reply" to the saga with a failure message. The issue with this approach is that Recoverability retries don't happen, since we are handling the error in the message handler.
My thought was to add custom logic to the pipeline: if the command message implements some special interface, the custom logic would send a failure response to the saga when an exception occurs (after the retries fail). But I'm not sure where to plug into the pipeline so that I can send messages after the retries fail.
Is this a valid approach? If not, how can I get failure messages from the handler back to the saga after the retries are exhausted?
You can use immediate dispatch to not wait for a handler to complete.
However, I would like to suggest an alternate approach. Why not create a timeout in the saga? If the reply from the processing handler isn't received within a certain TimeSpan, you take an alternate path. For example, the processing handler gets 5 minutes, and if it doesn't respond within 5 minutes, we do something else. If it still responds after 6 minutes, we know we've already taken the alternate path (use a boolean flag or similar, stored in the saga data) and put aside the reply that arrived too late.
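The timeout-plus-flag logic can be sketched in plain Python. This is not NServiceBus code: in NServiceBus the saga would request the timeout (RequestTimeout) and handle it in a timeout handler, while the class and method names here are hypothetical. The point is that the flag lives in the persisted saga data, so a late reply can be recognized and set aside.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingSagaData:
    """Persisted saga state (hypothetical fields)."""
    reply_received: bool = False
    took_alternate_path: bool = False


@dataclass
class ProcessingSaga:
    data: ProcessingSagaData = field(default_factory=ProcessingSagaData)
    log: List[str] = field(default_factory=list)

    def on_reply(self, succeeded: bool) -> None:
        """Reply from the processing handler, possibly arriving late."""
        if self.data.took_alternate_path:
            # The timeout already fired, so put the late reply aside.
            self.log.append("late reply set aside")
            return
        self.data.reply_received = True
        self.log.append("reply handled" if succeeded else "failure reply handled")

    def on_timeout(self) -> None:
        """Fires after the allotted time (e.g. 5 minutes)."""
        if self.data.reply_received:
            # The reply came in time; nothing to do.
            return
        self.data.took_alternate_path = True
        self.log.append("alternate path taken")
```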

How to set a different MaxRetries per message type at any given endpoint in NServiceBus?

I have a requirement for an endpoint to receive commands from a client and also to subscribe to events from another endpoint, such that:
1- a received command is tried only once, then sent to the error queue if an exception occurs
2- a received event is retried indefinitely until it is processed
Could MaxRetries be set differently depending on the message type?
The NServiceBus MaxRetries setting is meant to handle things like deadlocks, so it's not really what you want for this scenario.
What you want is to use second-level retries (SLRs) to handle this situation.
To filter based on an exception type, have a look at http://andreasohlund.net/2012/09/26/disabling-second-level-retries-for-specific-exceptions/
Hope this helps!
I looked into the NSB source and noticed that in NSB 3.3, MaxRetries can't take different values for different messages. Happily, you can override the class that forwards messages to the error queue and implement your own version that checks whether the failed message is an event; if it is, instead of forwarding it to the error queue, you send it again to the current endpoint.
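The decision this answer describes (a command gets one attempt and then goes to the error queue; an event is never sent to the error queue but resent to the current endpoint) can be sketched in plain Python. This is not NServiceBus code: the enums, on_failure, and the attempt counts are hypothetical; in NSB 3.3 the equivalent logic would live in your override of the class that forwards failed messages to the error queue.

```python
from enum import Enum


class MessageKind(Enum):
    COMMAND = "command"
    EVENT = "event"


class FailureAction(Enum):
    RETRY = "retry"
    SEND_TO_ERROR_QUEUE = "error_queue"
    RESEND_TO_SELF = "resend_to_self"


# Hypothetical attempt limits per round: commands are tried only once.
MAX_ATTEMPTS = {MessageKind.COMMAND: 1, MessageKind.EVENT: 5}


def on_failure(kind: MessageKind, attempts_so_far: int) -> FailureAction:
    """Decide what to do after a message has failed attempts_so_far times."""
    if attempts_so_far < MAX_ATTEMPTS[kind]:
        return FailureAction.RETRY
    if kind is MessageKind.EVENT:
        # Instead of the error queue, put the event back on this endpoint's
        # input queue so it keeps being retried until it is processed.
        return FailureAction.RESEND_TO_SELF
    return FailureAction.SEND_TO_ERROR_QUEUE
```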

Error queue and batch sending of messages

Hopefully this is a simple question, but I need to verify that my assumption is correct: if I send 4 messages in one batch send and one of the 4 messages causes a fault and fails its retries in its handler, does that single message get forwarded to the error queue, or does the entire batch get placed into the error queue?
Common sense tells me that the single message would be moved to the error queue, since the batch has been unwrapped and each message delegated to its handlers.
The transaction boundary is the handler, and therefore each message has its own set of retries. The only complication is that if you are using a pipeline of message handlers, you also have to consider that if any of the handlers fails for a given message, a retry of that message will occur.
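The behaviour the answer describes can be simulated in plain Python. This is a sketch, not NServiceBus code: process_batch and the handler functions are hypothetical, and it assumes that a retry re-runs the whole handler pipeline for that message. In a real transactional endpoint the earlier handlers' work would also be rolled back; the sketch only shows that they run again and that only the failing message ends up in the error queue.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


def process_batch(messages: List[str], handlers: List[Handler],
                  max_attempts: int) -> Dict[str, List[str]]:
    """Process each message independently with its own retries."""
    result: Dict[str, List[str]] = {"succeeded": [], "error_queue": []}
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                # A failure in any handler aborts this attempt for this message only.
                for handler in handlers:
                    handler(msg)
            except Exception:
                if attempt == max_attempts:
                    # Only the failing message is moved, not the whole batch.
                    result["error_queue"].append(msg)
            else:
                result["succeeded"].append(msg)
                break
    return result
```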