Understanding Cro request/response cycle and memory use - raku

I'm a bit confused about how Cro handles client requests and, specifically, why some requests seem to cause Cro's memory usage to balloon.
A minimal example of this shows up in the literal "Hello world!" Cro server.
use Cro::HTTP::Router;
use Cro::HTTP::Server;

my $application = route {
    get -> {
        content 'text/html', 'Hello Cro!';
    }
}

my Cro::Service $service = Cro::HTTP::Server.new:
    :host<localhost>, :port<10000>, :$application;
$service.start;

react whenever signal(SIGINT) {
    $service.stop;
    exit;
}
All this server does is respond to GET requests with "Hello Cro!", which certainly shouldn't be taxing. However, if I navigate to localhost:10000 and then rapidly refresh the page, I notice Cro's memory use start to climb (and then stay elevated).
This only seems to happen when the refreshes are rapid, which suggests the issue might be related either to connections not being closed properly or to concurrency (a possibly related earlier question).
Is there some performance technique or best practice that this "Hello world" server has omitted for simplicity? Or am I missing something else about how Cro is designed to work?

The Cro request processing pipeline is a chain of supply blocks that requests and, later, responses pass through. Decisions about the optimal number of processing threads to create are left to the Raku ThreadPoolScheduler implementation.
As far as connection lifetime goes, it's up to the client (that is, the web browser) how eagerly connections are closed; if the browser uses a keep-alive HTTP/1.1 connection or retains an HTTP/2 connection, Cro respects that request.
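Connection reuse is easy to observe outside Cro too. The sketch below uses Python's standard library as a stand-in for both the server and the browser (it is not Cro and says nothing about Cro's internals): a tiny HTTP/1.1 server counts how many TCP connections it accepts while a client sends a batch of requests, either reusing one keep-alive connection or opening a fresh connection per request. `HelloHandler` and `count_connections` are hypothetical names for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections
    connections = 0                # TCP connections accepted so far
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request
        super().setup()
        with HelloHandler.lock:
            HelloHandler.connections += 1

    def do_GET(self):
        body = b"Hello Cro!"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(requests, reuse):
    """Send `requests` GETs; return how many connections the server accepted."""
    HelloHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port)
        for _ in range(requests):
            if not reuse:
                conn.close()  # like a client that never keeps connections open
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the connection can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return HelloHandler.connections
```

With reuse, twenty requests share a single connection; without it, each request opens its own.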
Regarding memory use, growth up to a certain point isn't surprising; it's only a problem if it doesn't eventually level out. Causes include:
The scheduler determining that more threads are required to handle the load. Each OS thread comes with some overhead inside the VM, most of it because the GC nursery is per-thread, which allows simple bump-the-pointer allocation.
The MoarVM optimizer using memory for specialized bytecode and JIT-compiled machine code, which it produces in the background as the application runs, driven by certain pieces of code having been executed enough times.
The GC trying to converge on a full collection threshold.
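The "only a problem if it doesn't level out" test can be made concrete. This is a minimal sketch, assuming you collect periodic resident-memory readings for the Cro process yourself (for example by running `ps -o rss= -p <PID>` in a loop); the function name, window size and 2% tolerance are arbitrary choices for illustration, not Cro or MoarVM settings.

```python
def has_levelled_out(samples, window=5, tolerance=0.02):
    """Heuristic plateau check on a series of memory readings.

    Returns True if the last `window` readings vary by at most `tolerance`
    (as a fraction of their peak); False while memory is still climbing or
    when there are too few readings to judge.
    """
    if window < 2 or len(samples) < window:
        return False
    recent = samples[-window:]
    peak = max(recent)
    return peak - min(recent) <= tolerance * peak


climbing = [100, 120, 140, 160, 180, 200, 220]         # e.g. RSS in MiB
plateau = [100, 150, 180, 195, 200, 201, 200, 202, 201]
print(has_levelled_out(climbing))  # False
print(has_levelled_out(plateau))   # True
```

Early growth followed by a flat tail is the expected shape; a series that keeps failing this check over a long run is the one worth investigating.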

Related

Would a blocking web server hang to the point of needing a restart if many HTTP clients send requests in parallel?

I've read that some web servers are described as blocking, whereas Node.js is described as non-blocking.
Would a blocking web server hang to the point of needing a restart if many HTTP clients sent as many requests in parallel as they could?
To clarify, I don't mean the case where it recovers on its own once the flood of parallel requests stops; that wouldn't count as needing a restart.
And I currently don't understand how request buffers and overflows work for web servers.
Although technically it could be possible to make a single-thread, single-process blocking server that can only handle 1 request at a time, it doesn't really practically make sense. Concurrency is kind of important.
The three main paradigms for parallelism (that I know of) are:
Multi-process/forking
Threading
Using an event loop/reactor pattern
Node falls in the third category, and also a bit in the second category depending on how you look at it.
Most languages can look at a socket and read from it, and immediately move on if there was nothing to read. Therefore most languages can have this non-blocking behavior.
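Here is a sketch in Python (standing in for "most languages") of both halves of that pattern: a read on a non-blocking socket that returns immediately when nothing is waiting, and a `selectors` call, the building block of an event loop, that asks the OS which sockets are ready. `try_read` and `demo` are illustrative names; a connected `socketpair()` stands in for a client connection.

```python
import selectors
import socket


def try_read(sock):
    """Return whatever bytes are waiting, or None if nothing is (never blocks)."""
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None  # nothing to read yet: move on instead of waiting


def demo():
    a, b = socket.socketpair()   # b plays the server's side of a connection
    sel = selectors.DefaultSelector()
    try:
        b.setblocking(False)     # recv() now returns at once instead of waiting
        before = try_read(b)     # nothing has been sent yet

        # An event loop asks the OS which sockets are ready, rather than
        # dedicating a blocked thread to each one.
        sel.register(b, selectors.EVENT_READ)
        idle = sel.select(timeout=0)      # poll: nothing ready
        a.sendall(b"ping")
        ready = sel.select(timeout=1.0)   # now b is readable
        after = try_read(b)
        return before, len(idle), len(ready), after
    finally:
        sel.close()
        a.close()
        b.close()


print(demo())  # (None, 0, 1, b'ping')
```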

Recovering from OOM on DirectByteBuffer allocation on a WebFlux Web Server

I'm not sure if this makes sense so please comment if I need to provide more info:
My web server is used to upload files (it receives files as multipart/form-data and uploads them to another service). Using WebFlux, the controller declares the argument as @RequestPart(name = "payload") final Part payload, which wraps the headers and a Flux.
Reactor / Netty uses DirectByteBuffers to accommodate the payload. If the request handler cannot get enough direct memory to handle the request, it's going to fail with an OOM and return a 500. So this is normal / expected.
However, what's supposed to happen after?
I'm running load tests by sending multiple requests at the same time (either lots of requests with small files or fewer requests with bigger files). Once I get the first 500 due to an OOM, the system becomes unstable. Some requests go through, and others fail with OOM (even requests with very small payloads can fail).
This behaviour leads me to believe the allocated pooled buffers are not shared between IO channels? That seems odd, though; it would make the system very easy to DDoS.
From my tests, I get the same behaviour using unpooled data buffers, although for a different reason. I do see the memory being deallocated in jcmd <PID> VM.native_memory, but it isn't released to the OS according to metrics and htop. For instance, the reserved memory shown by jcmd goes back down, but htop still reports the previous high, and the process eventually hits an OOM.
So, my question:
Is that totally expected, or am I missing a config value somewhere?
Setup:
Spring-boot 2.5.5 on openjdk11:jdk-11.0.10_9
Netty config:
-Dio.netty.allocator.type=pooled -Dio.netty.leakDetectionLevel=paranoid -Djdk.nio.maxCachedBufferSize=262144 -XX:MaxDirectMemorySize=1g -Dio.netty.maxDirectMemory=0

Async WCF and Protocol Behaviors

FYI: This will be my first real foray into Async/Await; for too long I've been settling for the familiar territory of BackgroundWorker. It's time to move on.
I wish to build a WCF service, self-hosted in a Windows service running on a remote machine in the same LAN, that does this:
Accepts a request for a single .ZIP archive
Creates the archive and packages several files
Returns the archive as its response to the request
I have to support archives as large as 10GB. Needless to say, this scenario isn't covered by basic WCF designs; we must take additional steps to meet the requirement. We must eliminate timeouts while the archive is building and memory errors while it's being sent. Both of these occur under basic WCF designs, depending on the size of the file returned.
My plan is to proceed using task-based asynchronous WCF calls and streaming mode.
I have two concerns:
Is this the proper approach to the problem?
Microsoft has done a nice job at abstracting all of this, but what of the underlying protocols? What goes on 'under the hood?' Does the server keep the connection alive while the archive is building (could be several minutes) or instead does it close the connection and initiate a new one once the operation is complete, thereby requiring me to properly route the request through the client machine firewall?
For #2, clearly I'm hoping for the former (keep-alive). But after some searching I'm not easily finding an answer. Perhaps you know.
You need streaming for big payloads. That is the right approach. This has nothing at all to do with asynchronous IO. The two are independent. The client cannot even tell that the server is async internally.
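The core idea behind streaming, sending a large archive in fixed-size chunks so memory use does not grow with the file size, can be sketched in Python. This is an analogy, not WCF code (in WCF the equivalent is the binding's streamed transfer mode); building the ZIP on disk first, the 64 KiB chunk size, and the names `build_archive`, `stream_file` and `demo` are assumptions for illustration.

```python
import io
import os
import tempfile
import zipfile

CHUNK_SIZE = 64 * 1024  # assumption: 64 KiB per write; tune for your transport


def build_archive(paths, archive_path):
    """Write the files into a ZIP on disk rather than in memory."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))


def stream_file(path, chunk_size=CHUNK_SIZE):
    """Yield the file in fixed-size chunks; only one chunk is held at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def demo():
    with tempfile.TemporaryDirectory() as d:
        files = []
        for name in ("a.txt", "b.bin"):
            p = os.path.join(d, name)
            with open(p, "wb") as f:
                f.write(os.urandom(200_000))
            files.append(p)
        archive = os.path.join(d, "out.zip")
        build_archive(files, archive)

        sizes = []
        received = io.BytesIO()  # the demo receiver buffers only to verify the result
        for chunk in stream_file(archive):
            sizes.append(len(chunk))
            received.write(chunk)
        with zipfile.ZipFile(received) as zf:
            names = sorted(zf.namelist())
        return max(sizes), names
```

A real receiver would write each chunk straight to disk instead of collecting it in memory.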
I'll add my standard answers for whether to use async IO or not:
https://stackoverflow.com/a/25087273/122718 Why does the EF 6 tutorial use asychronous calls?
https://stackoverflow.com/a/12796711/122718 Should we switch to use async I/O by default?
Each request runs over a single connection that is kept alive. This goes for both streaming big amounts of data as well as big initial delays. Not sure why you are concerned about routing. Does your router kill such connections? That's a problem.
Regarding keep-alive, nothing needs to go over the wire for that. TCP sessions can stay open indefinitely without any kind of wire traffic.
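That last point can be checked locally. In this Python sketch (loopback only, so no router or firewall is involved), the server reads a request, stays completely silent for a while as if it were building the archive, and then replies on the same connection. `idle_then_reply` is a hypothetical name; on a real network, a device that drops idle connections is exactly the problem raised above.

```python
import socket
import threading
import time


def idle_then_reply(delay=1.0):
    """Send one request, let the connection sit idle for `delay` seconds
    on the server side, then read the reply on the same connection."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.recv(1024)                 # read the request
            time.sleep(delay)               # "building the archive": no bytes sent
            conn.sendall(b"archive ready")  # same connection, after the silence

    worker = threading.Thread(target=serve)
    worker.start()
    try:
        with socket.create_connection(("127.0.0.1", port)) as client:
            start = time.monotonic()
            client.sendall(b"request archive")
            reply = client.recv(1024)       # blocks through the silent period
            elapsed = time.monotonic() - start
    finally:
        worker.join()
        server.close()
    return reply, elapsed
```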

How to capture screen shot of 1000 web pages concurrently in c#

I need to take screenshots of 1000 URLs using Parallel.ForEach in a Windows service. I tried to use the WebBrowser control, but it throws an error since it only runs in an STA. Please tell me how to achieve this using Parallel.ForEach.
Edit: I am using a third-party trial version DLL in the code below to process it:
Parallel.ForEach(webpages, webPage =>
{
    GetScreenShot(webPage);
});

public void GetScreenShot(string webPage)
{
    WebsitesScreenshot.WebsitesScreenshot _Obj = new WebsitesScreenshot.WebsitesScreenshot();
    try
    {
        WebsitesScreenshot.WebsitesScreenshot.Result _Result = _Obj.CaptureWebpage(webPage);
        if (_Result == WebsitesScreenshot.WebsitesScreenshot.Result.Captured)
        {
            _Obj.ImageFormat = WebsitesScreenshot.WebsitesScreenshot.ImageFormats.PNG;
            _Obj.SaveImage(somePath); // somePath: output file path, defined elsewhere
        }
    }
    finally
    {
        _Obj.Dispose(); // release the control even if capturing throws
    }
}
Most of the time this code runs fine for up to about 80 URLs, but after that some tasks get blocked. I don't know why.
Sometimes the error is ContextSwitchDeadlock, as shown below:
ContextSwitchDeadlock was detected
Message: The CLR has been unable to transition from COM context 0x44d3a8 to COM context 0x44d5d0 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
This error indicates that a CLR thread is not sending any messages for an extended period of time. If a process is resource starved causing extended waits during processing this error can occur.
Given that you are trying to process 1000 web pages simultaneously, it would be no surprise if at least some of the threads became resource starved. Personally, I'm surprised you can get through 80 websites without seeing errors.
Back off the number of websites you are trying to process in parallel and your problems will likely disappear. Since you are running the trial version, there is little else you can do. If you licensed the commercial version you might be able to get support from the vendor. But at a guess, they would simply tell you to do the same thing.
The Websites.Screenshot library can be quite resource intensive depending upon the web page, esp. if the pages have flash. Think of it as being logically equivalent to opening 80 tabs simultaneously in a web browser.
You don't mention whether you are using the 32-bit or the 64-bit version, but the 64-bit version is likely to have fewer resource constraints, esp. memory. IMHO the .NET Framework does a poor job of minimizing memory usage, so memory problems can crop up earlier than you would think.
ADDED
Please try limiting the number of threads first, e.g.:
Parallel.ForEach(
    webpages,
    new ParallelOptions { MaxDegreeOfParallelism = 10 }, // 10 thread limit
    webPage => { GetScreenShot(webPage); }
);
Without access to the source code, you may not be able to change the threading model at all. You might also try setting the timeout to a higher value.
I don't have this control personally and am not willing to install it on my machine to answer a question re: changing the threading model. Unless it is a documented feature, you probably won't be able to do it without changing or at least inspecting the source.

Twisted - success (or failure) callback for LineReceiver sendLine

I'm still trying to master Twisted while in the midst of finishing an application that uses it.
My question is:
My application uses LineReceiver.sendLine to send messages from a Twisted TCP server.
I would like to know if the sendLine succeeded.
I gather that I need to somehow add a success (and error?) callback to sendLine but I don't know how to do this.
Thanks for any pointers / examples
You need to define "succeeded" in order to come up with an answer to this.
All sendLine does immediately (probably) is add some bytes to a send buffer. In some sense, as long as it doesn't raise an exception (e.g., MemoryError because your line is too long, or TypeError because your line was the number 3 instead of an actual line), it has succeeded.
That's not a very useful kind of success, though. Unfortunately, the useful kind of success is more like "the bytes were added to the send buffer, the send buffer was flushed to the socket, the peer received the bytes, and the receiving application acted on the data in a persistent way".
Nothing in LineReceiver can tell you that all those things happened. The standard solution is to add some kind of acknowledgement to your protocol: when the receiving application has acted on the data, it sends back some bytes that tell the original sender the message has been handled.
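The acknowledgement idea can be sketched without any framework. This version uses Python's `asyncio` streams instead of Twisted's `LineReceiver`, purely to keep it self-contained; the wire format (`<id> <payload>` lines answered by `ACK <id>`) and all function names are hypothetical. The sender treats a message as delivered only once the matching ACK comes back.

```python
import asyncio


async def handle_receiver(reader, writer, store):
    """Receiving side: act on each line, then acknowledge it by id."""
    while True:
        line = await reader.readline()
        if not line:  # peer closed the connection
            break
        msg_id, _, payload = line.decode().rstrip("\n").partition(" ")
        store.append(payload)  # "act on the data" (here: just keep it)
        writer.write(f"ACK {msg_id}\n".encode())
        await writer.drain()
    writer.close()


async def send_with_ack(reader, writer, msg_id, payload, timeout=5.0):
    """Sending side: success means the receiver confirmed handling the message."""
    writer.write(f"{msg_id} {payload}\n".encode())
    await writer.drain()
    reply = await asyncio.wait_for(reader.readline(), timeout)
    return reply.decode().strip() == f"ACK {msg_id}"


async def main():
    store = []
    server = await asyncio.start_server(
        lambda r, w: handle_receiver(r, w, store), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    results = [await send_with_ack(reader, writer, i, f"hello {i}") for i in range(3)]
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return results, store
```

A timeout or a missing ACK is the "failure" case that sendLine on its own cannot report.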
You won't get LineReceiver.sendLine to help you much here because all it really knows how to do is send some bytes in a particular format. You need a more complex protocol to handle acknowledgements.
Fortunately, Twisted comes with a few. twisted.protocols.amp is one: it offers remote method calls (complete with responses) as a basic feature. I find that AMP is suitable for a wide range of applications so it's often safe to recommend for new development. It largely supersedes the older twisted.spread (aka "PB") which also provides both remote method calls and remote object references (and is therefore more complex - in my experience, more complex than most applications need). There are also some options that are a bit more standard: for example, Twisted Web includes an HTTP implementation (HTTP, as you may know, is good at request/response style interaction).