Hystrix calls the fallbackMethod even when the backend API gets executed

I am trying to implement Hystrix in our microservices project. To keep the question simple and concise, I will describe the scenario below:
(a) A call is made to a backend service that performs slowly at times (e.g. a Payment Service).
(b) I have annotated the method from which I make the call to the payment service with @HystrixCommand, and I have implemented the corresponding fallback method.
(c) Below is the code snippet for the same.
@HystrixCommand(fallbackMethod = "fallbackProcessPayment")
public String processPayment(User user) throws Exception
{
    // Call the payment service (which is slow in nature) to process the payment for the user....
}

public String fallbackProcessPayment(User user)
{
    // This is the fallback method for processPayment....
    // Gracefully handle the processPayment failure.
}
In the config.properties file, the timeout is configured as:
hystrix.command.getUseCase1.execution.isolation.thread.timeoutInMilliseconds=2000
Current behavior: as soon as the call to the backend Payment service is made from the processPayment(..) method, it takes longer (~4000 ms) than the limit I have set in hystrix.command.getUseCase1.execution.isolation.thread.timeoutInMilliseconds (2000 ms), and hence Hystrix calls fallbackProcessPayment(...). But what I also see is that the backend Payment service still gets executed, albeit slowly.
This is undesired behaviour: the payment is being processed in the background while I notify the user (through the fallback method) that we are unable to process the payment (because the call timed out; the payment service took 4 seconds to respond whereas Hystrix expected a response within 2 seconds, per the timeoutInMilliseconds configuration).
Is there any configuration I am missing to make this work properly?
Any pointers on this would be of great help.
Thanks for your time

Well, this is the expected behaviour of Hystrix. The timeout does not cancel the backend call itself: by the time Hystrix gives up waiting, the request has already reached the payment service, which will carry on processing it regardless of what the client does. You have a couple of options.
1. Either increase the timeout
2. In your fallback method, check why the method failed, i.e. which exception was thrown. (You can find this out by adding an argument of type Throwable to your fallback method; it will receive the exception that triggered the fallback.) If the failure was due to a timeout, you can write a piece of code that checks whether the previous process completed before returning the response.
But the second approach isn't always feasible: if your circuit-breaker threshold is 5 and 5 successive requests fail due to timeouts, the 6th request will go straight to your fallback method without the backend ever being called, and there, checking whether a previous process completed makes no sense.

Related

What are the Implications of WCF TimeoutException vs CommunicationException on method execution?

I'm making a call to a WCF service that will initiate a batch of credit card charges. If an exception occurs, I want to know whether it occurred prior to the method executing and cards actually being charged. For example, with a TimeoutException, there's no way to know whether the WCF method executed so I need to make sure those charges aren't retried until the situation is investigated. But if the network was just down, or the server cert expired, or anything else happened prior to the method actually executing, I can un-lock my records to be retried later without human intervention.
try
{
    var response = wcfClient.ProcessBatch(paymentBatch);
    wcfClient.Close();
    //...
}
catch (CommunicationException)
{
    //Safe to assume ProcessBatch did not execute?
    wcfClient.Abort();
}
catch (TimeoutException)
{
    //Indeterminate state. Have to assume the operation may have succeeded server-side
    wcfClient.Abort();
}
catch (Exception)
{
    //Assuming the operation may have succeeded server-side
}
This is using a wsHttpBinding. Does a CommunicationException guarantee the method did not execute or could it also be thrown during the response?
According to the CommunicationException documentation on MSDN, CommunicationException is a superclass of errors that fall into two subcategories, both related to errors in the SOAP datagram.
Conversely, a TimeoutException is pretty straightforward: "The exception that is thrown when the time allotted for a process or operation has expired." The allotted time is probably set by the owner of the service, and you may or may not have a mechanism by which to override it.
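For what it's worth, the timeouts on the client side come from the binding and can be raised before the channel is created. A minimal sketch, assuming a hypothetical IPaymentService contract and a placeholder address:

using System;
using System.ServiceModel;

// Sketch only: raising the client-side timeouts on a wsHttpBinding.
// IPaymentService and the endpoint address are placeholders.
var binding = new WSHttpBinding();
binding.SendTimeout = TimeSpan.FromMinutes(5);     // covers sending the request and waiting for the reply
binding.ReceiveTimeout = TimeSpan.FromMinutes(10); // idle time allowed on the session

var factory = new ChannelFactory<IPaymentService>(
    binding, new EndpointAddress("https://example.com/PaymentService.svc"));
IPaymentService client = factory.CreateChannel();

The server's own limits (e.g. how long it will spend processing the batch) are still controlled by the service owner, which is the part you may not be able to override.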
For future reference, two quick Bing searches returned both of the articles cited herein.

Automated Debugging of failed API test cases

Our application runs on 35 web servers with around 100 different APIs executing on them.
These APIs call each other internally and also execute independently.
We have automated test cases for around 30 APIs, but some of our tests fail because other APIs, on which the API under test depends, fail.
So how can our automated test cases tell us the reason for each test failure?
Example Scenario:
We have a test case to validate the API that fetches a user's bank account balance.
We hit this API through Rest Assured and try to assert the expected output. The request first goes to the ledger server, which internally hits an auth server to validate the request's authenticity, then a counter server to log the fetchBalance request, then several other servers to get the correct balance for the user, and finally responds to our request.
The problem is that this may break at any point, and if it breaks, the ledger server always returns the same error string: "Something failed underhood". Debugging then becomes a challenge: we have to go to each server and search the logs to find the actual cause.
I want to write a solution that can trace the complete lifecycle of such a request and report where it actually failed.
For this problem, you should first identify the most common failure reasons, and then implement a strategy based on them.
For example, when you send a request to the server, the API may involve security validations, several processing steps, and integrations with different components.
If you can identify the likely failure points, you can implement checkpoints against each of them:
If the request failed at security validation, there are known error codes for that, so write logic around them.
If there is a failure at a processing step, there is a bounded set of possible reasons to check for.
If there is a failure at an integration point, there must be error codes there too, and you can implement logic around them.
Also validate the state of the data before each interaction with the server. For example:
assert expression1 : expression2
where expression2 is evaluated and reported if expression1 fails. (This is a Groovy example, but you can adapt it as needed.)
An example expression2 message could be something like: "Failure occurred when trying to send the 'so-and-so' request!"
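The same idea in C# with xUnit, purely as an illustration (the client setup, endpoint, and message text are assumptions, not from the question):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public class BalanceApiTests
{
    [Fact]
    public async Task FetchBalance_NamesSuspectedHopOnFailure()
    {
        // Hypothetical client and endpoint; adjust to your environment.
        var httpClient = new HttpClient { BaseAddress = new Uri("https://ledger.example.com") };
        var response = await httpClient.GetAsync("/users/42/balance");

        // Fail with a message that names the suspected hop, so the test
        // report points somewhere more useful than "Something failed underhood".
        Assert.True(response.IsSuccessStatusCode,
            $"fetchBalance failed with status {(int)response.StatusCode}. " +
            "Check the auth server first, then the counter server logs.");
    }
}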

HttpRequest not aborted (cancelled) on browser abort in ASP.NET Core MVC

I wrote the following MVC Controller to test cancellation functionality:
public class MyController : Controller
{
    private readonly ILogger<MyController> logger; // assumed to be injected via the constructor

    [HttpGet("api/CancelTest")]
    public async Task<IActionResult> Get()
    {
        await Task.Delay(1000);
        CancellationToken token = HttpContext.RequestAborted;
        bool cancelled = token.IsCancellationRequested;
        logger.LogDebug(cancelled.ToString());
        return Ok();
    }
}
Say I want to cancel the request so that the value 'true' is logged in the controller action above. This is possible server-side if the server implements IHttpRequestLifetimeFeature. Luckily Kestrel does, and it can be accomplished the following way:
var feature = (IHttpRequestLifetimeFeature) HttpContext.Features[typeof(IHttpRequestLifetimeFeature)];
feature.Abort();
The problem, however, is that I want to cancel the request on the client side, for example in the browser. In pre-Core versions of ASP.NET MVC/Web API, the cancellation token would automatically be cancelled if the browser aborted a request. Example: refresh the page a couple of times in Chrome; in the Network tab of the Chrome dev tools you can then see the previous (unfinished) request being cancelled.
The thing is: in ASP.NET Core running on Kestrel, I only see the following entry in the log:
Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvException:
Error -4081 ECANCELED operation canceled
So the abort request from the browser DOES arrive and is handled by the Kestrel web server. It does not, however, affect the RequestAborted property of the HttpContext in the controller, because the value 'false' is still logged by the method.
Question:
Is there a way to abort/cancel my controller's method, so that the HttpContext.RequestAborted property will be marked as cancelled?
Perhaps I can make something that subscribes to Kestrel's operation-cancelled trigger and calls the IHttpRequestLifetimeFeature.Abort() method?
Update:
I did some further testing, and it seems the HttpRequest IS in fact aborted, but there appears to be some kind of delay before the cancellation actually takes place. The delay is not time-factored and seems to come straight from libuv (the library that the Kestrel web server is built on top of). I posted more info at https://github.com/aspnet/KestrelHttpServer/issues/1103
More updates:
The issue has been moved to a new one, because the previous one contained multiple problems: https://github.com/aspnet/KestrelHttpServer/issues/1139
It turns out that simply using HttpContext.RequestAborted is indeed the right way, but due to a bug in Kestrel (the order in which FIN/RST packets were handled), the request was not aborted on a browser abort.
The bug should finally be fixed in Kestrel 2.0.
See the updates in my question for more information.
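For completeness: with that fix in place, you can also let MVC hand you the token directly, since ASP.NET Core binds a CancellationToken action parameter to HttpContext.RequestAborted. A minimal sketch of the same action written that way:

[HttpGet("api/CancelTest")]
public async Task<IActionResult> Get(CancellationToken cancellationToken)
{
    // The bound token is HttpContext.RequestAborted; passing it to Task.Delay
    // makes the delay itself throw a TaskCanceledException on a client abort.
    await Task.Delay(1000, cancellationToken);
    return Ok();
}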

Ensure a web server's query will complete even if the client browser page is closed

I am trying to write a control panel to
Inform about certain KPIs
Enable the user to initiate certain requests/jobs by pressing a button that then runs a stored proc on the DB, sets a specific setting, etc.
So far, so good, except I would like to run some bigger jobs whose running time is unknown and could exceed both the script timeout period AND the time the user is willing to wait for a response.
What I want is a "fire and forget" process: the user hits the button, and even if they kill the page or turn off their phone, they know the job has been initiated and WILL complete.
I was looking into C#'s BeginExecuteNonQuery, which is an asynchronous call to the query: the proc is fired, but the control doesn't have to wait for a response from it to carry on. However, I don't know what happens when the page/app that fired it is shut down.
I was also thinking of some sort of Ajax command that fires the code in a page behind the scenes, so the user doesn't know it is running; but then again, I believe that if the user shuts the page down, the script will die and the command will die on the server as well.
The only way I know of for certain is a "queue" table: jobs are inserted into this table, and a SQL Server Agent job comes along every minute or two, checks for new inserts, and runs the code if there are any. That way it is all on the DB, and only a DB crash will destroy it. It won't help when multiple long-running jobs are waiting to run concurrently, but it's the only thing I can be sure of that will ensure the code is run at all.
Any ideas?
Any language is okay.
Since web browsers are disconnected from the server's processing once a request has been sent, requests always take the full amount of time server-side; the governing factor isn't what the browser does, but how long the web site itself will allow an action to continue.
IIS (and web servers in general) has a timeout period for requests: if the work being done simply takes too long, the request is terminated. This involves abruptly stopping whatever is taking so long, such as a database call or running code.
Simply making your long-running actions asynchronous may seem like a good idea, but I would recommend against it. The reason is that in ASP and ASP.NET, asynchronously-called code still consumes a thread in a way that blocks other legitimate requests from getting through (in some cases you can end up consuming two threads!). This can have performance implications in non-obvious ways. It's better to just increase the timeout and allow the synchronously blocking task to complete. There's nothing special you have to do to make such a request complete fully; it will run to completion even if the sender closes his browser or turns off his phone immediately afterwards (presuming the entire request was received).
If you're still concerned about making certain work finish, no matter what is going on with the web request, then it's probably better to create an out-of-process server/service that does the work and to which such tasks can be handed off. Your web site then invokes a method that, inside the service, starts its own async thread to do the work and then immediately returns. Perhaps it also returns a request ID, so that the web page can check on the status of the requested work later through other methods.
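A minimal sketch of the hand-off half of that idea, assuming a hypothetical Jobs table (the table, columns, and connection string are placeholders; the out-of-process service would poll the table and do the actual work):

using System.Data.SqlClient;

// The web request only records the job and returns immediately.
int QueueJob(string connectionString, string jobType, string payload)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO Jobs (JobType, Payload, Status) " +
        "OUTPUT INSERTED.Id VALUES (@type, @payload, 'Pending')", conn))
    {
        cmd.Parameters.AddWithValue("@type", jobType);
        cmd.Parameters.AddWithValue("@payload", payload);
        conn.Open();
        return (int)cmd.ExecuteScalar(); // job id the page can use to poll for status
    }
}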
You may use an asynchronous method and call the query from it.
A simple method can be changed into an asynchronous one in the following manner.
Suppose you have a TestMethod to be called asynchronously:
class AsyncDemo
{
    public string TestMethod(out int threadId)
    {
        // Call your (long-running) query here.
        threadId = Thread.CurrentThread.ManagedThreadId;
        return "The query has completed.";
    }
}

// Create an asynchronous handler delegate:
public delegate string AsyncMethodCaller(out int threadId);
In your main program, or wherever you need to call TestMethod:
public static void Main()
{
    // The asynchronous method puts the thread id here.
    int threadId;

    // Create an instance of the test class.
    AsyncDemo ad = new AsyncDemo();

    // Create the delegate.
    AsyncMethodCaller caller = new AsyncMethodCaller(ad.TestMethod);

    // Initiate the asynchronous call.
    IAsyncResult result = caller.BeginInvoke(out threadId, null, null);

    // Call EndInvoke to wait for the asynchronous call to complete,
    // and to retrieve the result.
    string returnValue = caller.EndInvoke(out threadId, result);

    Console.WriteLine("The call executed on thread {0}, with return value \"{1}\".",
        threadId, returnValue);
}
In my experience, a Classic ASP or ASP.NET page will run until complete even if the client disconnects, unless you have something in place that checks whether the client is still connected (and acts if they are not), or a timeout is reached.
However, it would probably be better practice to run these sorts of jobs as scheduled tasks.
On submission, your web page could record in a database that the task needs to be run; when the scheduled task next runs, it checks for this and starts the job.
Many web hosts and/or web control panels allow you to create scheduled tasks that call a URL on a schedule.
Alternatively, if you have direct access to the web server, you could create a scheduled task on the server to call a URL on a schedule.
Or, in ASP.NET, you can put some code in global.asax to run on a schedule. Be aware, though, that if your website is set to stop after a certain period of inactivity, this will not work unless there is frequent, continuous activity.
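A rough sketch of that global.asax variant, with the caveat above in mind (RunPendingJobs is a placeholder for your own queue-table check):

public class Global : System.Web.HttpApplication
{
    // Keep a static reference so the timer is not garbage collected.
    private static System.Threading.Timer _timer;

    protected void Application_Start(object sender, EventArgs e)
    {
        // Poll for queued work every 60 seconds.
        _timer = new System.Threading.Timer(
            _ => RunPendingJobs(), null, TimeSpan.Zero, TimeSpan.FromSeconds(60));
    }

    private static void RunPendingJobs()
    {
        // Placeholder: check the queue table and run any pending jobs.
    }
}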

In the new ASP.NET Web API, how do I design for "Batch" requests?

I'm creating a web API based on the new ASP.NET Web API. I'm trying to understand the best way to handle people submitting multiple data-sets at the same time. If they have 100,000 requests it would be nice to let them submit 1,000 at a time.
Let's say I have a "create new Contact" method in my Contacts controller:
public string Put(Contact _contact)
{
    // Add the new _contact to the repository.
    repository.Add(_contact);

    // Return success.
    return "Contact added";
}
What's the proper way to allow users to "Batch" submit new contacts? I'm thinking:
public string BatchPut(IEnumerable<Contact> _contacts)
{
    foreach (var contact in _contacts)
    {
        repository.Add(contact);
    }
    return "Batch accepted"; // return success, mirroring Put
}
Is this good practice? Will this correctly parse a PUT request with a JSON array of Contacts (assuming they are correctly formatted)?
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
Thanks a million!
When you PUT a collection, you are either inserting the whole collection or replacing an existing collection as if it were a single resource. It is very similar to GET, DELETE or POST on a collection: it is an atomic operation. Using it as a substitute for individual calls to PUT a contact may not be very RESTful (but that is really open for debate).
You may want to look at HTTP pipelining and send multiple PutContact requests over the same socket. With each request you can return a standard HTTP status for that single request.
I implemented batch updates in the past with SOAP and we encountered a number of unforeseen issues when the system was under load. I suspect you will run into the same issues if you don't pay attention.
For example, the database would time out in the middle of a batch update, and then all hell broke loose in terms of failures, reliability, transactions, etc., and the poor client had to figure out what was actually updated and try again.
When there were too many records to update, the HTTP request would time out because we took too long, which opened another can of worms.
Another concern was how much data to accept during the update. Is 10 MB of contacts enough? Perhaps 1 MB? Larger buffers have numerous implications in terms of memory usage and security.
Hence my suggestion to look at HTTP pipelining.
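For what it's worth, with the classic HttpWebRequest the client opts into pipelining via the Pipelined property; a tiny sketch (the URI is a placeholder):

using System.Net;

// Pipelined PUTs over a single keep-alive connection.
var request = (HttpWebRequest)WebRequest.Create("https://example.com/api/contacts/1");
request.Method = "PUT";
request.KeepAlive = true;
request.Pipelined = true; // allow this request to be pipelined on the connection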
Update
My suggestion would be to handle batch creation of contacts as an asynchronous process. Just assume that a "job" is the same as a "batch create" process. The service might look as follows:
public class JobService
{
    // POST
    public void Create(CreateJobRequest job)
    {
        // 1. Create the job in the database with status "pending"
        // 2. Save the job details to disk (or S3)
        // 3. Submit the job to MSMQ (or SQS)
        // 4. For 20 seconds, poll the database to see if the job completed
        // 5. If the job completed, return 201 with a URI to the "Get" method below
        // 6. If not, return 202 (aka the request was accepted for processing, but has not completed)
    }

    // GET
    public Job Get(string id)
    {
        // 1. Fetch the job from the database
        // 2. Return the job if it exists or 404
        return null; // placeholder for the lookup described above
    }
}
The background process that consumes items from the queue can update the database directly, or alternatively perform a PUT to the service to update the job's status to running and then completed.
You'll need another service to navigate through the data that was just processed, address errors, and so forth.
Your background process may need to be tolerant of validation errors. If it isn't, or if your service does the validation (assuming you are not doing database calls etc. for which response times cannot be guaranteed), you can return a structure like CreateJobResponse that contains enough information for your client to fix the issue and resubmit the request. If some validation is time-consuming, do it in the background process, mark the job as failed, and update the job with the information that will allow a client to fix the errors and resubmit the request. This assumes that the client can do something useful with the fact that the job failed.
If the Create method breaks the job request into many smaller "jobs", it may not be atomic, which poses numerous challenges in monitoring whether the jobs completed successfully.
A PUT operation is supposed to replace a resource. Normally you do this against a single resource, but doing it against a collection means replacing the original collection with the set of data passed. I'm not sure whether you mean to do that; I assume you are just updating a subset of the collection, in which case a PATCH method would be more appropriate.
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
That is really up to you. There is only a single response, so you can send a 200 OK or a 400 Bad Request and put the details in the body.
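For example, a response body along these lines would let the client see exactly which 4 of the 300 failed (the shape is just a suggestion, not part of Web API):

using System.Collections.Generic;

// Suggested response body for a batch request; all names are illustrative.
public class BatchItemError
{
    public int Index { get; set; }      // position in the submitted array
    public string Reason { get; set; }  // why this contact was rejected
}

public class BatchResult
{
    public int Succeeded { get; set; }
    public int Failed { get; set; }
    public List<BatchItemError> Errors { get; set; }
}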