How to create a immediately-completed client response? - jax-rs

I want to stub-out a JAX-RS client request. Instead of making an HTTP call, I want to return an immediately-completed client Response. I tried invoking javax.ws.rs.core.Response.ok().build(), unfortunately when the application invokes Response.getEntity() later on it gets this exception:
java.lang.IllegalStateException: Method not supported on an outbound message.
at org.glassfish.jersey.message.internal.OutboundJaxrsResponse.readEntity(OutboundJaxrsResponse.java:144
I dug into Jersey's source-code but couldn't figure out a way to do this. How can one translate a server-side Response to a client-side Response in Jersey (or more generally JAX-RS)?
Use-case
I want to stub-out the HTTP call in a development environment (not a unit test) in order to prove that the network call is responsible for a performance problem I am seeing. I tried using a profiler to do this, but the call is asynchronous with 500+ threads and some network calls return fast (~100ms) while others return much slower (~1.5 seconds). Profilers do not follow asynchronous workflows well, and even if they did they only display the average time consumed across all invocations. I need to see the timing for each individual call. Stubbing-out the network call allows me to test whether the server is returning calls with such a large delta (100ms to 1.5 seconds) or whether the surrounding code is responsible.

Related

Can I send an API response before successful persistence of data?

I am currently developing a Microservice that is interacting with other microservices.
The problem now is that those interactions are really time-consuming. I already implemented concurrent calls via Uni and uses caching where useful. Now I still have some calls that still need some seconds in order to respond and now I thought of another thing, which I could do, in order to improve the performance:
Is it possible to send a response before the sucessfull persistence of data? I send requests to the other microservices where they have to persist the results of my methods. Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull?
With that, the front-end could already begin working even though my API is not 100% finished.
I saw that there is a possible status-code 207 but it's rather used with streams where someone wants to split large files. Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can and should, but it is a philosophy change in your API and possibly you have to consider some edge cases and techniques to deal with them.
In case of a long running API call, you can issue an "ack" response, a traditional 200 one, only the answer would just mean the operation is asynchronous and will complete in the future, something like { id:49584958, apicall:"create", status:"queued", result:true }
Then you can
poll your API with the returned ID to see if the operation that is still ongoing, has succeeded or failed.
have a SSE channel (realtime server side events) where your server can issue status messages as pending operations finish
maybe using persistent connections and keepalives, or flushing the response in the middle, you can achieve what you point out, ie. like a segmented response. I am not familiar with that approach as I normally go for the suggesions above.
But in any case, edge cases apply exactly the same: For example, what happens if then through your API a user issues calls dependent on the success of an ongoing or not even started previous command? like for example, get information about something still being persisted?
You will have to deal with these situations with mechanisms like:
Reject related operations until pending call is resolved "server side": Api could return ie. a BUSY error informing that operations are still ongoing when you want to, for example, delete something that still is being created.
Queue all operations so the server executes all them sequentially.
Allow some simulatenous operations if you find they will not collide (ie. create 2 unrelated items)

Is the WebSocket returned from HttpContext.WebSockets.AcceptWebSocketAsync thread-safe?

In ASP.NET Core v2, is the WebSocket returned by HttpContext.WebSockets.AcceptWebSocketAsync thread-safe?
More specifically, can I call ReceiveAsync in parallel with a thread that calls SendAsync?
I'd like to be able to have a message loop receiving messages like the close event, while at the same time be able to send messages in response to server-side events (that is, not in response to received events).
I haven't found any documentation specifying what implementation AcceptWebSocketAsync returns, but in practice it appears to consistently return a ManagedWebSocket instance.
I haven't found any API documentation for ManagedWebSocket. Fortunately the source code has been published and it contains this helpful note:
Thread-safety:
It's acceptable to call ReceiveAsync and SendAsync in parallel. One of each may run concurrently.
It's acceptable to have a pending ReceiveAsync while CloseOutputAsync or CloseAsync is called.
Attemping to invoke any other operations in parallel may corrupt the instance. Attempting to invoke a send operation while another is in progress or a receive operation while another is in progress will result in an exception.
— (source1)
(source2)
tl;dr: not thread-safe in general, but the read and send in parallel scenario is supported

Why should I build an API with an asynchronous/non-blocking framework?

I have been looking into the Play Framework as a possible candidate for helping me to build a simple API. However, the Django Rest Framework (DRF) also seems to be a pretty strong contenter.
As far as I can tell, the DRF does not advertise itself to be an asynchronous (or non-blocking) framework like the Play Framework does, but I am interested in whether or not this even matters. The situation that I keep thinking of is sending an email to a user via Mandrill -- I do not want my API to get bogged-down waiting for the Mandrill API to tell it whether or not the email was sent.
Thus, I think the question can be summarized like this: is there a benefit from the client's perspective that will result from my building an API with an asynchronous/non-blocking framework like Play over the DRF, or am I missing the point?
I'm a Django REST framework contributor (and user), so my perspective is biased towards that.
Django REST framework is built on Django, which is a synchronous framework for web applications. If you're already using a synchronous framework like Django, having a synchronous API is less of an issue.
Now, just because it is synchronous, that doesn't mean that only a single request can ever be handled at a time. Most web servers that are handling Django applications can handle multiple requests, some of theme even do it somewhat asynchronously across multiple threads. Usually this isn't actually an issue, as your web server can typically handle many concurrent requests, even if some of them are blocking. And when you have long, blocking calls you usually don't want that done within the API - you should be delegating that to background workers like Celery or Resque.
This isn't just specific to Django, many of the same principles apply to other synchronous frameworks like Rails and ASP.NET MVC. If you have long-running requests, you generally should be delegating work to other processes instead of holding up the request. It's common to use the 202 response code for these cases.
Now, that doesn't necessarily mean that asynchronous frameworks are useless. In runtimes such as Node.js, most frameworks handle requests asynchronously. It doesn't make sense to use a synchronous framework in these languages, so most libraries are built to be asynchronous.
What you choose very much depends on the tools that you are already using.
Regarding the clients connecting to your app there should be no difference at all if your server uses asynchronous/non-blocking (ANB) technologies or not. But it may make a lot of difference in the number of requests your app can handle.
Suppose the following scenario: a request that checks if a FB/Google/etc access token is valid, and then uses it to get the social profile of your user and then returns something back.
If you are using a blocking http client in your server, during each of the 2 http requests the thread serving that request can be blocked a lot of time doing nothing.
If you are using a non-blocking http client (like the one Play brings) while the HTTP request is made and the response comes back the thread can be used to do something else (ex: process part of another request).
Note that to solve this "problem" you wouldn't need an ANB framework, just an ANB http client. So you should look more to the kind of operations you will have in your app and check how your chosen framework will deal with them. For example: if your app consists almost of DB CRUD operations and the DB driver is blocking (like JDBC in Java and probably the ones used by Django) it really does not matter much if the framework is asynchronous or not, you will be blocking most of the time on that specific component.
In your email example probably Django+Celery will do just as fine as Play/Akka.
Non async frameworks usually do long-running tasks passing them to some external process (e.g. Resque/DelayedJob/sidekiq for Rails development)
just wanted to add that Mandrill API supports async parameter for sending emails.
Here is what's their docs are saying:
enable a background sending mode that is optimized for bulk sending. In async mode, messages/send will immediately return a status of "queued" for every recipient. To handle rejections when sending in async mode, set up a webhook for the 'reject' event.
So in case using async set to true you'll get handle immediately after performing a call to the API without waiting for all emails to be sent.
https://mandrillapp.com/api/docs/messages.JSON.html#method-send
(I took JSON version of the API just as example)
The Django community is working on this thing for now if you want you can utilise the sync_to_async() adapter .
It comes with some limitations and performance penalties but the community is still working on the same .
The link below will help you to work with the sync_to_async() adapter
https://docs.djangoproject.com/en/3.2/topics/async/

Find the best solution for read file and call web service

I have a text file which has about 100,000 records of identifier.
I must read all of record, each record i do request to web service and receive the result from web service, the result i write to another file.
I'm confuse between two solution:
- Read identifier file to a list of identifier, iterate this list, call web service, ....
- Read identifier line on each line, call web service, .....
Do you think what solution will be better ? program will do faster ?
Thanks for all.
As Dukeling says, using different threads to read the file, send requests and write to file can increase the speed of the program, rather the one thread solutions you propose.
I recommend that you would start using asynchronous calls to your web service. You make the call, but don't wait for a response (you handle the responses in the callback). When you make a lot of calls to the web service in parallel (as you want speed), this frees up some I/O threads on your hosting machine and can improve the rate/time of processed requests sometimes.
Then you can have a thread that reads from the file, starts the asynchronous call and repeats. On the callback function you implement the writing to file. You should at this level implement a logic that insures that your responses are written in the right order.
On the other hand, calling the web service for each record may be too chatty.
I would suggest an implementation similar to pagging: loading a certain amount of records, sending them to operation and receiving the responses in bulk. You should take care of not failing the whole package for one recors, have a logic for resending only a part of the tasks and so on.

Is a status method necessary for an API?

I am building an API and I was wondering is it worth having a method in an API that returns the status of the API whether its alive or not?
Or is this pointless, and its the API users job to be able to just make a call to the method that they need and if it doesn't return anything due to network issues they handle it as needed?
I think it's quite useful to have a status returned. On the one hand, you can provide more statuses than 'alive' or not and make your API more poweful, and on the other hand, it's more useful for the user, since you can tell him exactly what's going on (e.g. 'maintainance').
But if your WebService isn't available at all due to network issues, then, of course, it's up to the user to catch that exception. But that's not the point, I guess, and it's not something you could control with your API.
It's useless.
The information it returns is completely out of date the moment it is returned to you because the service may fail right after the status return call is dispatched.
Also, if you are load balancing the incoming requests and your status request gets routed to a failing node, the reply (or lack thereof) would look to the client like a problem with the whole API service. In the meantime, all the other nodes could be happily servicing requests. Now your client will think that the whole API service is down but subsequent requests would work just fine (assuming your load balancer would remove the failed node or restart it).
HTTP status codes returned from your application's requests are the correct way of indicating availability. Your clients of course have to be coded to tolerate and handle them.
What is wrong with standard HTTP response status codes? 503 Service Unavailable comes to mind. HTTP clients should already be able to handle that without writing any code special to your API.
Now, if the service is likely to be unavailable frequently and it is expensive for the client to discover that but cheap for the server, then it might be appropriate to have a separate 'health check' URL that can quickly let the client know that the service is available (at the time of the GET on the health check URL).
It is not necessary most of the time. At least when it returns simple true or false. It just makes client code more complicated because it has to call one more method. Even if your client received active=true from service, next useful call may still fail. Let you client make the calls that they need during normal execution and have them handle network, timeout and HTTP errors correctly. Very useful pattern for such cases is called Circuit Breaker.
The reasons where status check may be useful:
If all the normal calls are considered to be expensive there may be an advantage in first calling lightweight status-check method (just to avoid expensive call).
Service can have different statuses and client can change its behavior depending on these statuses.
It might also be worth looking into stateful protocols like XMPP.