Understanding Fiddler statistics - WCF

We are sending an HTTP WCF request to a third-party system hosted on our servers and are experiencing a significant delay between sending the request and getting the response. The third party claims that they complete their work in a few seconds, but in Fiddler I can see a significant gap between ServerBeginResponse and GotResponseHeaders.
I'm not sure what could account for this delay. Could someone explain what the ServerBeginResponse and GotResponseHeaders timers in Fiddler actually mean?

The timers mean pretty much exactly what they say. The ServerGotRequest timer is set when Fiddler is done transmitting the HTTP request to the server. The GotResponseHeaders timer is set when Fiddler has read the complete set of response headers from the server.
In your screenshot, there's a huge delay between ServerBeginResponse (which is set when the first byte of the server's response is returned) and GotResponseHeaders, which suggests that the server spent a significant amount of time completing the return of the HTTP response headers.
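To see where the time goes, it can help to turn the timers into gaps. The following is a minimal Python sketch, not part of Fiddler: the helper `timer_gaps` is hypothetical, the timestamps are made up, and it assumes the timers were copied by hand in an HH:MM:SS.mmm format from a single session that does not cross midnight.

```python
from datetime import datetime


def timer_gaps(timers):
    """Return the seconds elapsed between consecutive timers, in time order."""
    parsed = sorted(
        (datetime.strptime(value, "%H:%M:%S.%f"), name)
        for name, value in timers.items()
    )
    return {
        f"{earlier_name} -> {later_name}": (later - earlier).total_seconds()
        for (earlier, earlier_name), (later, later_name) in zip(parsed, parsed[1:])
    }


# Made-up values for illustration only; they are not taken from the question.
sample = {
    "ClientDoneRequest": "10:15:30.100",
    "ServerGotRequest": "10:15:30.120",
    "ServerBeginResponse": "10:15:32.500",
    "GotResponseHeaders": "10:15:47.900",
}

gaps = timer_gaps(sample)
for label, seconds in gaps.items():
    print(f"{label}: {seconds:.3f}s")
```

With numbers shaped like these, the largest gap would sit between ServerBeginResponse and GotResponseHeaders, which is the pattern described above: the server started answering early but took a long time to finish sending the headers.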
If you send me (via Help > Send Feedback) a SAZ capture of this traffic, I can take a closer look at it.


ASP.NET Core and 102 status code implementation

I have a long-running operation that is called via Web API. The definition of status code 102 says:
An interim response used to inform the client that the server has
accepted the complete request, but has not yet completed it.
This status code SHOULD only be sent when the server has a reasonable
expectation that the request will take significant time to complete.
As guidance, if a method is taking longer than 20 seconds (a
reasonable, but arbitrary value) to process the server SHOULD return a
102 (Processing) response. The server MUST send a final response after
the request has been completed.
So I want to return a 102 status code to the client, and then have the client wait for the response with the result of the operation. How can I implement this in .NET?
I read this thread: How To Return Http 102 Processing in Asp.Net Web Api?
That thread has a good explanation of what is necessary, but no answer. I don't understand how to implement it in .NET in practice, as opposed to in theory.
Using HTTP 102 requires that the server send two responses for one request. ASP.NET (Core or not) does not support sending a response to the client without completely ending the request. Any attempt to send two responses will end up throwing an exception and will not work. (I tried a couple of different ways.)
There's a good discussion here about how it's not actually in the core HTTP spec (102 was defined for WebDAV in RFC 2518), so implementing it isn't really required.
There are a couple alternatives I can think of:
Use web sockets (a persistent connection that allows data to be sent back and forth), like with SignalR, for example.
If your request takes a long time because it's getting data from elsewhere, you can try pulling in that data via a stream and send it to the client via a stream. That will send the data as it's coming in, rather than loading it all into memory first before sending it. Here's an example of streaming data from a database to the response: https://stackoverflow.com/a/45682190/1202807
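The streaming alternative can be sketched independently of the framework. This is a minimal illustration in Python rather than .NET, and `stream_chunks` is a hypothetical helper: it reads from any file-like source in fixed-size pieces and yields each piece as soon as it is available, so the whole payload never has to sit in memory at once.

```python
import io


def stream_chunks(source, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the source has no more data.
            break
        yield chunk


# Stand-in for a slow upstream source such as a database reader or remote file.
upstream = io.BytesIO(b"abcdefghij")
chunks = list(stream_chunks(upstream, chunk_size=4))
print(chunks)
```

In a real handler the chunks would be written to the response as they are produced instead of being collected into a list.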

Is there a way to add a header to the Apache response showing how long it took to retrieve a resource?

Is there a module or a built-in function in Apache which I can use/activate to send information about how long it took to retrieve/process a resource?
For example, when the resource http://dom.net/resource is accessed, the response headers would include the total time spent waiting for the resource to be ready before it was sent back to the client.
Apache doesn't really 'wait' until the resource is ready before sending the response back to you - it streams data back to the client as and when it receives it.
Depending on what you're interested in measuring, you could record the time taken for the client to receive the first byte/last byte back from Apache, or measure the time taken for Apache to receive the first byte from the (remote?) resource. The time taken for Apache to receive the entire response back from the remote resource is not something you can send in the headers, as the headers will have been sent to the client before the remote response is fully received. This information could trivially be written to the Apache logs, however.
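The constraint that headers leave before the body is complete can be illustrated outside Apache. Here is a minimal sketch in Python, assuming a WSGI application rather than Apache itself: the hypothetical middleware `timing_middleware` adds a header (the name `X-Time-To-Headers-Us` is made up) holding the microseconds elapsed up to the moment the headers are handed off. Anything that happens afterwards, such as producing the rest of the body, cannot appear in a header and would have to go to a log instead.

```python
import time


def timing_middleware(app, header_name="X-Time-To-Headers-Us"):
    """Wrap a WSGI app and report the time spent before headers are sent."""

    def wrapped(environ, start_response):
        started = time.perf_counter()

        def timed_start_response(status, headers, exc_info=None):
            # Measured when the app hands over its headers, not when the body ends.
            elapsed_us = int((time.perf_counter() - started) * 1_000_000)
            timed_headers = list(headers) + [(header_name, str(elapsed_us))]
            return start_response(status, timed_headers, exc_info)

        return app(environ, timed_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny example application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```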

What Is Meant By Server Response Time

I'm doing website optimisations using Google's Pagespeed Insights to test improvements. Among the high-priority fix suggestions is this:
Reduce server response time
In our test, your server responded in 2.1 seconds.
I read the 'helpful' doc linked in this section, and now I'm really confused.
Is the server response time the DNS response, the time to first-byte, or a combination? Is it purely a server-side thing, or could this be affected by, for example, a slow JavaScript resource or ready events in the DOM?
My first guess would have been that it's the time taken from the moment the request was issued to the first byte received from the server. However, Google's definition is not quite that:
(from this page https://developers.google.com/speed/docs/insights/Server)
Server response time measures how long it takes to load the necessary
HTML to begin rendering the page from your server, subtracting out the
network latency between Google and your server. There may be variance
from one run to the next, but the differences should not be too large.
In fact, highly variable server response time may indicate an
underlying performance issue.
To take 2.1 seconds would suggest to me that your application/webserver is buffering its output, so all your server-side processing is happening before it sends the content. If you don't buffer, the HTML can begin being sent to the browser sooner, which may help. However, you lose the ability to do things like change response headers late in your logic.

What is the purpose of the MaxReceivedMessageSize on the client-side?

I did a test against a WCF server where the response from the server exceeds the MaxReceivedMessageSize property defined in the client-side binding object, resulting in a CommunicationException. I examined the request and response using Fiddler. Despite exceeding the MaxReceivedMessageSize, the entirety of the response is sent to the client.
I believe I am missing the point of this behavior. As I see it, no bandwidth is saved, as the data has already been received. The client application could have processed the data, but the client binding has discarded it before it is given to the application.
If saving bandwidth is not the purpose of the MaxReceivedMessageSize on the client-side, what is it for?
The answer is simple: security.
It would indeed be better for the bandwidth if your client could say to the server: "oh, by the way, don't bother sending me replies bigger than X bytes", but that is something they didn't implement :-)
And even if it was, what if the server has a bug, or is intentionally misbehaving...
What if the server returned a 2 TB string? Your client would then try to allocate a 2 TB buffer to receive the response and would probably get an OutOfMemoryException. That would bring your client down.
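The protective idea behind a receive limit can be shown with a small sketch. This is not how WCF implements it; it is an illustrative Python example in which `read_limited` and `MessageTooLarge` are hypothetical names. The reader gives up as soon as the limit is crossed, so a misbehaving server cannot make the client buffer an arbitrarily large message.

```python
import io


class MessageTooLarge(Exception):
    """Raised when the incoming message exceeds the configured limit."""


def read_limited(stream, max_bytes, chunk_size=8192):
    """Read a whole message, refusing to buffer more than max_bytes."""
    received = bytearray()
    while True:
        # Ask for at most one byte beyond the remaining budget, which is
        # enough to detect an oversized message without buffering it all.
        chunk = stream.read(min(chunk_size, max_bytes - len(received) + 1))
        if not chunk:
            return bytes(received)
        received.extend(chunk)
        if len(received) > max_bytes:
            raise MessageTooLarge(f"message exceeds {max_bytes} bytes")


small = read_limited(io.BytesIO(b"hello"), max_bytes=5)
print(small)
```

The bytes may still travel over the network, as observed in the question, but memory use on the client stays bounded by the limit.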

Heroku: I have a request that takes more than 30 seconds and it breaks

I have a request that takes more than 30 seconds and it breaks.
What is the solution for this? I am not sure whether adding more dynos will help.
You should probably read the Heroku Dev Center article on this, as it will be more helpful than anything I can write. Here's a small summary:
To answer the timeout question:
Cedar supports long-polling and streaming responses. Your app has an initial 30 second window to respond with a single byte back to the client. After each byte sent (either received from the client or sent by your application) you reset a rolling 55 second window. If no data is sent during the 55 second window your connection will be terminated.
(That is, if you had Cedar instead of Aspen or Bamboo you could send a byte every thirty seconds or so just to trick the system. It might work.)
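The "send a byte every so often" trick can be sketched as a generator. This is a minimal Python illustration, not Heroku-specific code: `stream_with_heartbeat` is a hypothetical helper, the 20 second default is an assumption chosen to stay under both windows quoted above, and it only helps if the web framework actually flushes each yielded chunk to the client.

```python
import threading


def stream_with_heartbeat(work, interval=20.0, heartbeat=b" "):
    """Run work() in a thread, yielding a heartbeat byte until it finishes."""
    outcome = {}

    def run():
        try:
            outcome["value"] = work()
        except Exception as exc:  # Re-raised in the caller's thread below.
            outcome["error"] = exc

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    while True:
        # Wait up to `interval` seconds for the work to finish.
        worker.join(timeout=interval)
        if not worker.is_alive():
            break
        yield heartbeat
    if "error" in outcome:
        raise outcome["error"]
    yield outcome["value"]
```

The final chunk is the real result, so the client has to tolerate the leading heartbeat bytes (whitespace is harmless before a JSON body, for example).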
To answer your dynos question:
Additional concurrency is of no help whatsoever if you are encountering request timeouts. You can crank your dynos to the maximum and you'll still get a request timeout, since it is a single request that is failing to serve in the correct amount of time. Extra dynos increase your concurrency, not the speed of your requests.
(That is, don't bother adding more dynos.)
On request timeouts:
Check your code for infinite loops. If you're doing something big, you should move this heavy lifting into a background job which can run asynchronously from your web request. See Queueing for details.
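The background-job pattern looks roughly like this. It is a minimal in-process sketch in Python using only the standard library; `enqueue`, `jobs`, and `results` are hypothetical names, and a real deployment would more likely use a separate worker process with a persistent queue, so treat it only as an outline of the request/worker split.

```python
import queue
import threading
import uuid

jobs = queue.Queue()
results = {}  # job_id -> (state, value)


def worker():
    """Pull jobs off the queue forever and record their outcome."""
    while True:
        job_id, func, args = jobs.get()
        try:
            results[job_id] = ("done", func(*args))
        except Exception as exc:
            results[job_id] = ("failed", repr(exc))
        finally:
            jobs.task_done()


def enqueue(func, *args):
    """Called from the web request: register the job and return at once."""
    job_id = uuid.uuid4().hex
    results[job_id] = ("pending", None)
    jobs.put((job_id, func, args))
    return job_id


threading.Thread(target=worker, daemon=True).start()
```

The web request returns the job id immediately, well inside the 30 second window, and the client checks `results` (or an equivalent status endpoint) later to pick up the outcome.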