I'm working on a program that needs to make multiple calls to various microservices, possibly do some processing on those results, and then return a combination of those processes.
A very basic example might look like:
(def urls ["http://localhost:8080" "http://localhost:8080"])
(defn handle-http-request [url]
(let [request (http-kit/get url)]
(do-some-processing (:body #request))))
(defn home-page
[request]
(let [response (pmap handle-http-request urls)]
(ring-resp/response {:buildings (first response) :characters (second response)})))
In this case i'm using pmap to handle running all my requests in parallel, and then return them as JSON that the UI making the request can handle. In the real environment there will be more URLs each fetching different data.
My question is whether this is an appropriate way to handle this problem? I've looked some core.async, and see it as a possible way to handle this but worry it might be overkill? My second concern is handling errors, it would appear that core.async might be able to more elegantly handle issues where a remote has timed out. Am I right in this assumption, or is using pmap okay in this situation?
Lastly, are there any accepted patterns or reading on how to handle microservice architectures like this? I see my problem as relatively specific, but feel the idea of a server making requests to many others and compiling those results is nothing new.
The HTTP-Kit documentation
provides an example for this using futures:
Combined, concurrent requests, handle results synchronously
(let [urls ["http://server.com/api/1" "http://server.com/api/2" "http://server.com/api/3"]
;; send the request concurrently (asynchronously)
futures (doall (map http/get urls))]
(doseq [resp futures]
;; wait for server response synchronously
(println (-> #resp :opts :url) " status: " (:status #resp))
)
Yet, often this is still not enough. Make sure you configure timeouts
and you have an escalation strategy once the timeouts hit. The more
other services you are going to hit, the more complex this task gets.
Maybe have a look at libraries like
Resilience4JClj, that
provide configurable ways to deal with retrying, caching, timeouts, ...
Note: pmap is best suited to CPU intensive tasks. Or as the docs state:
Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead.
Related
I read there are web servers their behaviors are called blocking whereas Node.js's is said non-blocking.
Would a blocking web server get hung up to the sense it needs restarting, if many http clients send requests at most in parallel?
As a complement, I don't say that it needs restarting while it potentially works fine again after the flood of parallel requests have stopped.
And I currently don't understand how request buffers and overflows work for web servers.
Although technically it could be possible to make a single-thread, single-process blocking server that can only handle 1 request at a time, it doesn't really practically make sense. Concurrency is kind of important.
The three main paradigms for parallelism (that I know of) are:
Multi-process/forking
Threading
Using an event loop/reactor pattern
Node falls in the third category, and also a bit in the second category depending on how you look at it.
Most languages can look at a socket and read from it, and immediately move on if there was nothing to read. Therefore most languages can have this non-blocking behavior.
I'm a bit confused about how Cro handles client requests and, specifically, why some requests seem to cause Cro's memory usage to balloon.
A minimal example of this shows up in the literal "Hello world!" Cro server.
use Cro::HTTP::Router;
use Cro::HTTP::Server;
my $application = route {
get -> {
content 'text/html', 'Hello Cro!';
}
}
my Cro::Service $service = Cro::HTTP::Server.new:
:host<localhost>, :port<10000>, :$application;
$service.start;
react whenever signal(SIGINT) {
$service.stop;
exit;
}
All that this server does is respond to GET requests with "Hello Cro!' – which certainly shouldn't be taxing. However, if I navigate to localhost:10000 and then rapidly refresh the page, I notice Cro's memory use start to climb (and then to stay elevated).
This only seems to happen when the refreshes are rapid, which suggests that the issue might be related either to not properly closing connections or to a concurrency issue (a maybe-slightly-related prior question).
Is there some performance technique or best practice that this "Hello world" server has omitted for simplicity? Or am I missing something else about how Cro is designed to work?
The Cro request processing pipeline is a chain of supply blocks that requests and, later, responses pass through. Decisions about the optimal number of processing threads to create are left to the Raku ThreadPoolScheduler implementation.
So far as connection lifetime goes, it's up to the client - that is, the web browser - as to how eagerly connections are closed; if the browser uses a keep-alive HTTP/1.1 connection or retains a HTTP/2.0 connection, Cro respects that request.
Regarding memory use, growth up to a certain point isn't surprising; it's only a problem if it doesn't eventually level out. Causes include:
The scheduler determining more threads are required to handle the load. Each OS thread comes with some overhead inside the VM, the majority of it being that the GC nursery is per thread to allow simple bump-the-pointer allocation.
The MoarVM optimizer using memory for specialized bytecode and JIT-compiled machine code, which it produces in the background as the application runs, and is driven by certain bits of code having been executed enough times.
The GC trying to converge on a full collection threshold.
I work on splitting monoliths into microservices. With the monolith, I had a single source of truth and can just GET /resources/123 right after the PATCH /resources/123 and be sure that the database has the up-to-date data I need.
With microservices and CQRS in place, there is a risk that the query service has not seen yet the latest update to the record when I perform a GET request.
What is the best or standard approach to making sure that the client receives back the up-to-date value? I know that the client may compare resource versions that he receives after PATCH and after GET and retry requests, but is there a known API design to tell the server something like GET /resources/123 and wait up to 5 sec for the resource version 45 or bigger to arrive?
Since a PATCH request allows a response body, to my mind there's nothing wrong with the response including the object after patching. The requestor who sent the PATCH can use the response in lieu of a GET; for others, the eventual consistency delay for the GET isn't observable (since they don't know when the PATCH was issued).
CQRS means to not contort your write model for the sake of reads. If there's a read that is easily performed based on the write model, that read can be done against the write model.
Generally a better design might be for the PATCH request to delay its own response, if that's an option.
However, your GET request can also just 'hang' until it's ready. This generally feels like a better design than polling.
A client could indicate to the server how long it's willing to wait using a Prefer: wait= header: https://datatracker.ietf.org/doc/html/rfc7240#section-4.3
This could be used both for the GET or the PATCH request.
I don't think there's a standard HTTP way to say: this resource is not available right now, but will be in the future. However, there is a standard HTTP header to tell clients when to retry the request:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
This is mainly used for 429 and 503 errors but neither seem appropriate here.
Frankly this is one of the first thing I've heard in a while that could be a good new HTTP status code. 425 Too Early exists but its a different use-case.
I am currently developing a Microservice that is interacting with other microservices.
The problem now is that those interactions are really time-consuming. I already implemented concurrent calls via Uni and uses caching where useful. Now I still have some calls that still need some seconds in order to respond and now I thought of another thing, which I could do, in order to improve the performance:
Is it possible to send a response before the sucessfull persistence of data? I send requests to the other microservices where they have to persist the results of my methods. Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull?
With that, the front-end could already begin working even though my API is not 100% finished.
I saw that there is a possible status-code 207 but it's rather used with streams where someone wants to split large files. Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can and should, but it is a philosophy change in your API and possibly you have to consider some edge cases and techniques to deal with them.
In case of a long running API call, you can issue an "ack" response, a traditional 200 one, only the answer would just mean the operation is asynchronous and will complete in the future, something like { id:49584958, apicall:"create", status:"queued", result:true }
Then you can
poll your API with the returned ID to see if the operation that is still ongoing, has succeeded or failed.
have a SSE channel (realtime server side events) where your server can issue status messages as pending operations finish
maybe using persistent connections and keepalives, or flushing the response in the middle, you can achieve what you point out, ie. like a segmented response. I am not familiar with that approach as I normally go for the suggesions above.
But in any case, edge cases apply exactly the same: For example, what happens if then through your API a user issues calls dependent on the success of an ongoing or not even started previous command? like for example, get information about something still being persisted?
You will have to deal with these situations with mechanisms like:
Reject related operations until pending call is resolved "server side": Api could return ie. a BUSY error informing that operations are still ongoing when you want to, for example, delete something that still is being created.
Queue all operations so the server executes all them sequentially.
Allow some simulatenous operations if you find they will not collide (ie. create 2 unrelated items)
So I'm trying to make API request in parallel, but it doesn't seems any faster. Am I doing it wrong? Here are my codes.
fun getUserInfo(username: String): Mono<String> {
return webclient
// some config and params
.post()
.bodyToMono(String::class)
.subscribeOn(Schedulers.parallel())
}
fun main(){
val time = measureTimeMillis {
Mono.zip(getUserInfo("doge"), getUserInfo("cheems"), etc...)
.map { user ->listOf(it.t1, it.t2, etc...) }
.block()
}
// give the same amount of time doesn't seems faster
// with and without subscribeOn(Schedulers.parallel())
}
It has nothing to do with your code.
You must understand that any i/o-work is mostly spent waiting. As in waiting for a response.
If we look at the lifecycle of a thread, it will do a bit of preprocessing and then send the request. When the request has been sent it has to wait for the response. This is where the majority of the time is spent, then you get a response and process the response. Here's the thing, as much as 90% of the request time could be spent just waiting. This is a lot of wasted resources having the thread just waiting.
This is what is good with webflux/reactor. When a request is sent, the thread will not wait, it will go on to process other requests/responses, and when that first requests response comes back, any free thread can pick up that response, it does not have to the be the thread that sent the request in the first place.
What i have just described is usually what is called async or asynchronous work.
So lets look at what you are doing. You want to run your requests in parallel, meaning utilizing multiple threads on multiple cores at the same time.
For this to work, you will need to contact the other cpus and tell them to get prepared for incoming work. The cpus will then need to initialize a number of threads on each cpu and then the data must be sent to all the cpus. As you can see there is a setup time involved here.
Then all the requests are made from multiple cpus at the same time, but the waiting for the responses are constant! Its the exact same waiting time as before (up to as much as 90% of the total request time). Then when all responses are return they are collected and processed on multiple cpus, and then they are sent back to the original thread on the original cpu.
What have you gained? Most likely almost nothing, but you also most likely utilized a lot more resources for this very, very minimal gain.
Parallel is usually good if you have the need for raw cpu computing power, like calculations of some sort, examples could be a 3D renderer, or I don't know, cracking hashes etc. not i/o work.
I/O work is usually more about orchestration than raw cpu power. Slow responses will not be solved by parallel computing a slow response is always a slow response. You will just consume more resources to no good.
This is the reason why just regular flatMap Is so powerful in reactor.
It will perform everything async for you without needing to deal with threads, locks, synchronization, joins etc. It will perform all work async as quick as possible utilizing as few resources as possible.