G-WAN report.c statistics - performance counters

I am testing G-WAN server performance and it's amazing! Here is the output from report.c:
Requests
    All:    5,725 (6.06% of Cache misses)
    HTTP:      66 (1.15% of all requests)
    Errors:    70 (1.22% of all requests)
    CSP:    5,650 (98.69% of all requests)   Exceptions: 1

Connections
    Accepted: 4,717 (1.21 requests per connection)
    Closed:   4,372
    Timeouts:   682 (14.46%) Accept:682 Read:0 Slow:0 Build:0 Send:0 Close:0
    Busy:       345 (Waiting: 334 Reading: 9 Replying: 2 Sending: 0 Pushing: 0 Relaying: 0 Closing: 0)
I found that the Errors rate seems to be quite high, and there is an exception on CSP too. Could anyone tell me what "Errors" means and how to avoid them? Thanks!

the "Errors" rate seem to be quite high
That's HTTP errors (wrong requests coming from a client, not found resources, etc. - look at the error.log file for a trace).
The only way to avoid HTTP errors is to prevent clients from connecting to the server.
If you can't live with this "high rate of HTTP errors" (1.22% of all requests), then use a G-WAN connection handler (with the HTTP_ERROR notification) to make G-WAN ignore HTTP errors and close the connection without sending an HTTP error message (just return 0; in the handler) - but that's probably not what most users want.
there is an exception on CSP too
An exception means a 'graceful crash report' was issued for a servlet bug. As you have only 1 crash in 5,650 dynamic requests, it probably happened during servlet development. Look at your error.log and trace files to check what happened.
Note that the "cache misses" statistics are for static content only (1.15% of all your HTTP requests).
Apparently, not all of your clients are responding in a timely fashion: you have timeouts and pending requests.

Related

Cloudflare returning 520 due to empty server response from Heroku

My Rails app, which has been working great for years, suddenly started returning Cloudflare 520 errors. Specifically, backend calls to api.exampleapp.com return the 520, whereas hits to the frontend www.exampleapp.com subdomain work just fine.
The hard part is that nothing has changed in my configuration or code at all. Cloudflare believes this is happening because the Heroku server is returning an empty response.
> GET / HTTP/1.1
> Host: api.exampleapp.com
> Accept: */*
> Accept-Encoding: deflate, gzip
>
{ [5 bytes data]
* TLSv1.2 (IN), TLS alert, close notify (256):
{ [2 bytes data]
* Empty reply from server
* Connection #0 to host ORIGIN_IP left intact
curl: (52) Empty reply from server
error: exit status 52
On the Heroku end, my logs don't even seem to register the request when I hit any of these URLs. I also double-checked my SSL setup (an Origin certificate created at Cloudflare and installed on Heroku), just in case, and it is correct and not expired.
The app has been down for a couple of days now, users are complaining, and there has been no response from either customer care team despite my being a paying customer. My DevOps knowledge is fairly limited.
Welcome to the club: https://community.cloudflare.com/t/sometimes-a-cf-520-error/288733
It seems to be a Cloudflare issue introduced in late July, affecting hundreds of sites running very different configurations. It's been almost a month since the issue was first reported; Cloudflare has "fixed" it twice, but it's still there. Very frustrating.
Change your web server's log level to info and check whether your application is exceeding some HTTP/2 limit while the connection is being processed.
If that is the case, try increasing the relevant directives:
# nginx
server {
    ...
    # Maximum size of a single HPACK-compressed request header field.
    http2_max_field_size 64k;
    # Maximum size of the entire request header list after decompression.
    http2_max_header_size 64k;
    # Note: nginx 1.19.7+ deprecates these http2_* limits in favour of
    # large_client_header_buffers.
}

Spring Cloud Gateway hides backend WebSocket handshake 401 failures from clients

I'm reverse proxying a WebSocket backend API with spring-cloud-gateway 2.2.3. When this backend API rejects a WebSocket handshake request with a 401 Unauthorized response, spring-cloud-gateway still returns a 101 handshake status to the client (which gets confused and then misbehaves).
I need spring-cloud-gateway to return the original 401 WebSocket handshake error to the client, so that the SCG reverse proxy is transparent to the client (which conforms to the WebSocket handshake spec).
Here are the full wiretap traces and exception (I have redacted hostnames).
The client-side response for this WSS request is available as a HAR file captured from Chrome, which displays in Chrome as this screenshot.
Here is my Spring Cloud Gateway configuration:
spring:
  cloud:
    gateway:
      routes:
        - id: route_shield
          uri: https://shield-webui-cf-mysql.nd-int-cfapi.was.redacted
          predicates:
            - Host=**
          filters:
            - SetRequestHostHeader=shield-webui-cf-mysql.nd-int-cfapi.was.redacted
      ssl:
        useInsecureTrustManager: true
I'm wondering whether this is a spring-cloud-gateway bug or desired behavior that I can override.
To override it, here are alternatives I'm considering:
using circuit breaker filter and fallback to a local handler returning a 401
write a custom post-filter (see the sketch after the trace below)
Override/patch the WebsocketRoutingFilter
However, my debugger breakpoint in the handle(WebSocketSession session) method does not trigger, so I suspect it is not called.
I would likely need to provide a RequestUpgradeStrategy bean as an alternative to the default implementation of org.springframework.web.reactive.socket.server.upgrade.ReactorNettyRequestUpgradeStrategy#getNativeResponse mentioned in the trace:
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 401 Unauthorized
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:274) ~[netty-codec-http-4.1.51.Final.jar:4.1.51.Final]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ http://localhost:8080/v2/events [ReactorNettyRequestUpgradeStrategy]
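For the "custom post-filter" alternative above, here is a minimal sketch of what I have in mind. The class name and order are placeholders, and it assumes the WebSocketHandshakeException from the trace actually propagates up the gateway filter chain (which, given the breakpoint that never triggers, may not be the case):

import io.netty.handler.codec.http.websocketx.WebSocketHandshakeException;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.HttpStatus;
import org.springframework.http.server.reactive.ServerHttpResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// Hypothetical filter: map the backend's rejected WebSocket handshake to a 401
// instead of letting the gateway report a 101 to the client.
@Component
public class HandshakeRejectionFilter implements GlobalFilter, Ordered {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        return chain.filter(exchange)
                .onErrorResume(WebSocketHandshakeException.class, ex -> {
                    ServerHttpResponse response = exchange.getResponse();
                    if (!response.isCommitted()) {
                        // Only possible if the 101 has not already been written to the client.
                        response.setStatusCode(HttpStatus.UNAUTHORIZED);
                    }
                    return response.setComplete();
                });
    }

    @Override
    public int getOrder() {
        // Run early so this filter wraps the Mono returned by the routing filters.
        return Ordered.HIGHEST_PRECEDENCE;
    }
}

If the exception never reaches a filter like this, overriding the WebsocketRoutingFilter or providing a custom RequestUpgradeStrategy bean would be the remaining options.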

Regarding NAT type analysis

I downloaded the STUN client from http://www.stunprotocol.org/ and am trying to figure out the NAT type with the command stunclient --mode full stun.stunprotocol.org --verbosity 9, and I got the response below.
config.fBehaviorTest = true
config.fFilteringTest = true
config.timeoutSeconds = 0
config.uMaxAttempts = 0
config.addrServer = 52.86.10.164:3478
socketconfig.addrLocal = 0.0.0.0:0
Sending message to 52.86.10.164:3478
Got response (68 bytes) from 52.86.10.164:3478 on inter
Other address is 52.201.75.212:3479
Sending message to 52.201.75.212:3478
Got response (68 bytes) from 52.201.75.212:3478 on inte
Sending message to 52.201.75.212:3479
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Sending message to 52.201.75.212:3479
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Sending message to 52.86.10.164:3478
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Sending message to 52.86.10.164:3478
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Sending message to 52.86.10.164:3478
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Sending message to 52.86.10.164:3478
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Continuing to wait for response...
Binding test: success
Local address: 10.64.60.58:58841
Mapped address: 125.19.34.60:24604
Behavior test: fail
Filtering test: success
Nat filtering: Address and Port Dependent Filtering
I am working in a corporate network, so for security reasons a NAT type of "Address and Port Dependent Filtering" seems plausible.
But as a general phenomenon, it seems to me that for peer-to-peer connections the NAT type will be "Address and Port Dependent Filtering" most of the time, and therefore a TURN server is required for any media communication.
However, searching Google for WebRTC suggests that 90% of peer-to-peer communication is established through the STUN server itself (by hole punching, etc.). This means the NAT type in those cases allows the connection to be established.
Do experts have any opinions on which NAT type analysis to consider for peer-to-peer communication?
The stunclient program could use a bit more logging to indicate what it's doing. Since I know a little about the code, here's how I interpret it.
Stunclient does two different sets of tests. The first is the "mapping behavior" set, which is the most important for understanding how your NAT/firewall will impact P2P connectivity. The other is the "filtering" set, which indicates how "open" your NAT is with regard to receiving traffic from other IP/port combinations.
Your behavior test "failed". Based on your log output, here is what that likely means:
Test 1: Pick a random port, 58841 in this case. From this local port, do a basic binding test to stun.stunprotocol.org:3478. This is where the client receives a response in which the server indicates the mapped address (125.19.34.60:24604) and that the STUN alternate IP for subsequent behavior and filtering tests is 52.201.75.212.
Test 2: Same local port, 58841. Send a binding request to the alternate IP and primary port (52.201.75.212:3478). In your case, it appears the mapped address that came back was a different IP or port, so "test 3" was required.
Test 3: Same local port, 58841. Send a binding request to the alternate IP and alternate port (52.201.75.212:3479) so the client can distinguish between "address dependent" and "address and port dependent" mappings. And this is the interesting part - you never got a response, despite being able to communicate with both IP addresses on port 3478. This is why the test came back as failed.
It could be one of two things:
a) Your NAT/firewall actually is open for port 3478, but not 3479. To check, run this from the command line:
stunclient 52.201.75.212 3479
And if that succeeds in getting a mapped address, then immediately do this:
stunclient 52.86.10.164 3478
Try the other combinations of those two IP addresses and ports as well; they are listed below.
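For reference, the two combinations not covered above, in the same stunclient syntax and using the primary and alternate server addresses from your log, would be:
stunclient 52.86.10.164 3479
stunclient 52.201.75.212 3478
Comparing which of these return a mapped address should show whether only port 3479 is being blocked (case a) or whether the second possibility applies: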
b) Your NAT/firewall declines to create a port mapping when both the remote IP and port have changed. That means your network environment is even more restrictive than an "Address and Port Dependent Mapping" NAT, more commonly known as a symmetric NAT.
As for the filtering tests, just ignore that result. The filtering tests try to detect whether you can send to one ip:port but receive from a different IP or port. 99% of the time NATs disallow this, so the test almost always results in "Address and Port Dependent Filtering". The filtering test result is not very indicative of whether your NAT will succeed at P2P connectivity.
And just because your enterprise network is very restrictive, it doesn't mean it's impossible for you to communicate with a peer on another network. If the peer has a better-behaved NAT with Endpoint Independent Mapping, there's still a chance P2P connectivity will succeed.
I haven't kept up with NAT trends in recent years, but 80-90% of connections succeeding with STUN alone sounds right. The rest will need a relay solution such as TURN.
I imagine you want to communicate peer-to-peer in all possible scenarios (including symmetric NAT).
My suggestion: try using WebRTC with both STUN and TURN servers in the ICE server list. This will give you a range of ICE candidates, and WebRTC will take care of connecting via the best possible candidate pair.
This should save you from your NAT type concerns.

What does Apache Bench consider a failure?

I have been using Apache Bench at work to benchmark a number of servers. After testing one of them, I got this result:
Concurrency Level: 10
Time taken for tests: 13.564 seconds
Complete requests: 500
Failed requests: 497
(Connect: 0, Length: 497, Exceptions: 0)
There is no sign of an error in the server's log file. This makes me believe that it is Apache Bench that is treating successful requests as failures. With that in mind, can anyone explain what Apache Bench considers a failed request?
Apache Bench marks a response as a failure if the actual response length doesn't match the byte count stated in the response header.
Possible duplicate of: investigating apache benchmark failed request
Apache Bench also seems to consider it a failed transaction, even if the byte count of the body and the Content-Length header match, if the same URI returns a variable-length body on each request.
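To make that length check concrete, here is a minimal, hypothetical sketch (not ab's actual code; the URL is a placeholder) that fetches one response and compares the declared Content-Length with the number of body bytes actually received:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LengthCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point it at the page you are benchmarking with ab.
        URL url = new URL(args.length > 0 ? args[0] : "http://localhost:8080/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        long declared = conn.getContentLengthLong(); // Content-Length header (-1 if absent)
        long actual = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                actual += n;
            }
        }
        conn.disconnect();

        // A mismatch here is roughly what ab reports as a "Length" failure; ab also
        // compares each body length against that of the first response it saw, which
        // is why variable-length bodies are flagged even with a correct Content-Length.
        System.out.printf("declared=%d actual=%d mismatch=%b%n",
                declared, actual, declared != -1 && declared != actual);
    }
}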

Apache, mod_ssl "request failed: error reading the headers" for a specific user

Currently we have an Apache 2.2.3 server with mod_ssl 2.2.3 running Django, with users authenticating using an x509 certificate.
So far the system has been running perfectly except for a single user who, when trying to upload a file, receives a 400 Bad Request error. The contents of the ssl_error_log for this operation are:
[<date>] [error] [client <client ip>] request failed: error reading the headers, referer: <referrer url>
The contents of the ssl_access_log are:
<client ip> - - [<date>] "POST <target page> HTTP/1.1" 400 321
Also, the user's browser is Firefox as far as I know.
I am completely unable to reproduce this bug, and so far none of the other users have experienced it. Could you point out some reasons why this might happen?
I've experienced connectivity that cuts off the upstream after a certain number of bytes is sent. The limit was pretty low: enough to request some simple pages, but not to handle Ajax requests, much less upload files. As far as I recall, this connectivity problem occurred only when tethering (from a specific Android phone; I didn't test other phones).
So if the upstream gets interrupted and the upload stalls, it makes sense that Apache would return this error, according to this post: "Apache waits a time equal to the Timeout directive (defaults to 5 minutes if not defined) for a response from the client. It is likely Apache is waiting for the CRLF that indicates the end of the headers, yet it is never received."