Health check for Apache Knox

I want to create a health check mechanism to make sure I remove unhealthy Knox instances that are configured behind a load balancer.
A normal ping to the underlying instances will tell me whether the machine is reachable, but it will not tell me whether the gateway is healthy and running to serve incoming requests on that instance.
I can make a request to Knox through the LB, but it will go to only one instance and there is no way of knowing which one.
Is there any way to determine this? Or is there a mechanism provided in Knox itself through which I can make an HTTP (non-secure, as direct HTTPS calls to the instance are not permitted) call to the gateway server and check?
Thanks!!

I am not sure which load balancer you are using; from "health check" I am assuming an Elastic Load Balancer.
Create a health check with the TCP protocol. It will only check whether the port is open. If Knox is not running, those instances will be marked out of service and incoming requests will be redirected to the instances that are still in service.
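For a Classic ELB, the same TCP health check can also be configured from the CLI; a rough sketch, where the load balancer name and port are placeholders for your own values:
# Hypothetical ELB name and Knox port; mark an instance unhealthy after 2 failed TCP probes.
aws elb configure-health-check \
  --load-balancer-name my-knox-elb \
  --health-check Target=TCP:8443,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2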

I don't know how your load balancer is configured, but you could try hitting knox_host:knox_port directly; that would at least tell you whether Knox is up and running (and listening).
If you want to know whether Knox is healthy (specifically, your topology), you can try issuing a test request periodically and look for a 200 response code.
e.g.
curl -i -u guest:guest-password -X GET \
'http://<direct-knox>:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS'
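A minimal sketch of turning that into a periodic check, assuming the sandbox topology from the curl above; the host, port and credentials are placeholders:
# Hypothetical host, port, topology and credentials; exit non-zero so your LB / cron / monitoring
# can pull the instance out of rotation when the gateway stops answering with 200.
KNOX_URL="http://<direct-knox>:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS"
status=$(curl -s -o /dev/null -w '%{http_code}' -u guest:guest-password "$KNOX_URL")
if [ "$status" -eq 200 ]; then
  echo "Knox healthy"
else
  echo "Knox unhealthy (HTTP $status)" >&2
  exit 1
fi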
Hope that helps!

Related

AWS - NLB Performance Issue

I am using a Network Load Balancer in front of a private VPC in API Gateway. Basically, for the APIs in the gateway, the endpoint is the Network Load Balancer's DNS name.
The issue is that performance is terrible (5+ seconds). If I use the IP address of the EC2 instance instead of the NLB DNS name, the response is very fast (less than 100 ms).
Can somebody point out what the issue is? Is there some configuration I screwed up while creating the NLB?
I have been researching for the past 2 days and couldn't find any solution.
Appreciate your response.
I had a similar issue that was due to failing health checks. When all health checks fail, the targets are tried randomly (typically a target in each AZ); however, at that stage I had only configured an EC2 instance in one of the AZs. The solution was to fix the health checks: they require the security group on the EC2 instances to allow the entire VPC CIDR range (or at least the port the health checks are using).
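A rough sketch of that security group change, where the group ID, port and CIDR are placeholders for your own values:
# Hypothetical security group, port and VPC CIDR; lets the NLB health checks reach the targets.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 443 \
  --cidr 10.0.0.0/16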

AWS CLI via 2 proxies

I have a scenario where I need to execute AWS CLI commands against AWS CloudWatch via two proxies.
Server A(AWS CLI) -----> Server B (Apache proxy Web server) -----> Corporate Proxy IP (X.X.X.X) -----> Internet
My challenge here is that the AWS CLI requests do not have a context path (/something) on which a rewrite rule (to be written on Server B) could match in order to forward the request from Server A to the corporate proxy IP and finally to the internet (AWS).
Connectivity from the corporate proxy IP to the internet is already in place.
My main goal is to fetch CloudWatch metrics on Server A via the two proxies. As far as I can tell this is not achievable, but I need input on whether it can be done and, if yes, what rewrite rule should be written on Server B to proxy the AWS CLI requests to the corporate proxy.
An example AWS CLI command would be:
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=InstanceId,Value=i-xxx --statistics Average --start-time $(date -u -d '10 mins ago' '+%FT%TZ') --end-time $(date -u '+%FT%TZ') --period 60
I'm aware that we can use HTTP_PROXY to forward requests via a proxy; however, that would only forward my requests from Server A to Server B (the Apache proxy web server).
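For reference, this is the standard single-proxy setup I mean; the host name and port for Server B are placeholders:
# Hypothetical host/port for Server B; the AWS CLI honours these environment variables.
export HTTP_PROXY=http://server-b.example.com:3128
export HTTPS_PROXY=http://server-b.example.com:3128
aws cloudwatch list-metrics --namespace AWS/EC2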
Thanks in advance & appreciate a quick response.
Ok, so I actually recently built out a reverse proxy server (nginx) specifically to forward AWS CLI requests to help with corporate firewalls. Unfortunately, I cannot publish the code used to make that work, but can give you some insights into the issues with setting up a system like this.
1. This one's the most obvious. You'll need a redirection rule that understands the request being pushed through it and rewrites it to a syntax that the upstream AWS server can understand. In default AWS commands, that context is part of the URL (e.g. https://ec2.us-west-2.amazonaws.com). If you're passing through an upstream reverse proxy, you'll need to pass that context up somehow. You can either have a wildcard DNS record to capture all requests to your proxy the same way Amazon does it (e.g. *.*.{proxy-address} => {proxy-ip}, then aws ec2 --endpoint-url https://ec2.us-west-2.proxy describe-instances), or you can manually inject the information into the path (e.g. aws ec2 --endpoint-url https://proxy/ec2/us-west-2 describe-instances). Then, on your proxy server, you parse out the information and set your upstream based on it. My final solution was to place the full default AWS endpoint URL into the path of my proxy, https://proxy/ec2.us-west-2.amazonaws.com, then use a regex to parse the endpoint URL back out (in case there is information in the path placed by Amazon) and set the upstream server to the endpoint URL resolved by the regex.
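For illustration, the path-embedding variant applied to the CloudWatch case from the question might look like the sketch below; the proxy host name is a placeholder, Server B would have to extract the real endpoint from the path, and (as point 2 explains) the request would still need to be re-signed on the way through.
# Hypothetical proxy host; the real CloudWatch endpoint is embedded in the path so the
# proxy can parse it out and use it as the upstream.
aws cloudwatch get-metric-statistics \
  --endpoint-url https://server-b.example.com/monitoring.us-east-1.amazonaws.com \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-xxx \
  --statistics Average \
  --start-time $(date -u -d '10 mins ago' '+%FT%TZ') \
  --end-time $(date -u '+%FT%TZ') \
  --period 60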
2. After you complete #1, you'll run into the second issue: generated signatures. If you're using the --endpoint-url flag with the CLI, it will sign the request with the Host header set to the proxy server URL. When this is rewritten upstream, that Host header will no longer match the signature, so you'll need to re-sign any request passing through the proxy. There are a couple of sneaky ways around this. What I ended up doing was creating an AWS CLI wrapper which overloaded the signing mechanism to sign the request as if it were being sent to the default AWS endpoint, then overwriting the Host header to point at my reverse proxy. Re-signing this way is advantageous because it removes proxy latency (the proxy no longer has to translate the request), but it is quite difficult to implement in a way that will dynamically pick up any new signature methods AWS may release.
It is also worth noting that if you dig deep enough into the botocore source code, you'll find some reverse proxy support that is built in but appears to be defunct/unused (it is not exposed to the client). Hopefully they flesh out that functionality in the near future and this will no longer be an issue.

Google Compute Engine: how to find why load balancing health checks are failing?

I've been trying to create a Google Compute Engine network load balancing health check for an HTTPS (port 443) endpoint. The same endpoint, when accessed over HTTP (port 80), is healthy. Also, the HTTPS endpoint, when accessed for example with curl, correctly returns a 200 OK response, which should be the required condition for a healthy check.
It would be extremely helpful if there were a way to get a more detailed report of why the health check is failing, because it's probably something quite easy to fix, but the total lack of detail in the web interface makes it guesswork. Trying to find out where such detailed information might be available, I have come up empty.
I believe this is because load balancer health checks don't currently support HTTPS.
The 200 OK is relevant to the health check, but if your TCP connection is not closed properly, that can also cause this issue. If you run tcpdump -A -n host <your_host_ip>, you can confirm whether the TCP connection is closed with a FIN/ACK. If you see the [R] flag in the output, it indicates that the connection is being reset instead of closed properly.
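A slightly more targeted capture, restricted to the health-check traffic (the source address is a placeholder for whatever is probing you), could be:
# Hypothetical health-check source address; look for F (FIN) plus ACK on close rather than R (RST).
sudo tcpdump -A -n 'host <health-check-source-ip> and port 443'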
For more information, visit this link https://developers.google.com/compute/docs/load-balancing/health-checks#steps_to_set_up_health_checks

Delay issue with Websocket over SSL on Amazon's ELB

I followed the instructions from this link:
How do you get Amazon's ELB with HTTPS/SSL to work with Web Sockets? to set up ELB to work with WebSocket (having ELB forward 443 to 8443 in TCP mode). Now I am seeing this issue for wss: the server sends message1 and the client does not receive it; after a few seconds the server sends message2 and the client receives both messages (both messages are around 30 bytes). I can reproduce the issue fairly easily. If I set up port forwarding with iptables on the server and have the client connect directly to the server (port 443), I don't have the problem. Also, the issue seems to happen only with wss; ws works fine.
The server is running Jetty 8.
I checked EC2 forums and did not really find anything. I am wondering if anyone has seen the same issue.
Thanks
From what you describe, this pretty likely is a buffering issue with ELB. Quick research suggests that this actually is the issue.
From the ELB docs:
When you use TCP for both front-end and back-end connections, your load balancer will forward the request to the back-end instances without modification to the headers. This configuration will also not insert cookies for session stickiness or the X-Forwarded-* headers.
When you use HTTP (layer 7) for both front-end and back-end connections, your load balancer parses the headers in the request and terminates the connection before re-sending the request to the registered instance(s). This is the default configuration provided by Elastic Load Balancing.
From the AWS forums:
I believe this is HTTP/HTTPS specific but not configurable, but can't say I'm sure. You may want to try to use the ELB in just plain TCP mode on port 80, which I believe will just pass the traffic to the client and vice versa without buffering.
Can you try to make more measurements and see how this delay depends on the message size?
Now, I am not entirely sure what you have already tried and what did or did not fail. From the docs and the forum post, however, the solution seems to be using the TCP/SSL (layer 4) listener type for both front-end and back-end.
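For a Classic ELB, that listener setup could be sketched from the CLI roughly as below; the load balancer name, ports and certificate ARN are placeholders:
# Hypothetical name, ports and certificate; SSL front-end, plain TCP back-end (no HTTP parsing/buffering).
aws elb create-load-balancer-listeners \
  --load-balancer-name my-websocket-elb \
  --listeners Protocol=SSL,LoadBalancerPort=443,InstanceProtocol=TCP,InstancePort=8443,SSLCertificateId=<certificate-arn>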
The symptoms also resonate with Nagle's algorithm: the TCP stack can be configured to bundle small writes before sending them over the wire to reduce traffic. This could explain what you are seeing, so disabling it (TCP_NODELAY on the server socket) is worth a try.

All jmeter requests going to only one server with haproxy

I'm using JMeter to load test my web application. I have two web servers and we are using HAProxy for load balancing. All my tests run fine and are configured correctly. I have three JMeter remote clients so I can run my tests distributed. The problem I'm facing is that ALL my JMeter requests are being processed by only one of the web servers. For some reason it's not balancing, and I'm getting many timeouts and huge response times. I've looked around a lot for a way to get these requests balanced, but I'm having no luck so far. Does anyone know what could be the cause of this behavior? Please let me know if you need to know anything about my environment and I will provide the answers.
Check your haproxy configuration:
What is its load balancing policy? If it is not round-robin, is it based on source IP or some other information that might be common to your three remote machines? (See the sketch after this list.)
Are you sure load balancing is working right? Try testing with a browser first; if you can, add some information identifying the web server to the response to help debugging.
Check your test plan:
Are you sure you don't have a hardcoded session ID somewhere in your requests?
How many threads did you configure?
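Two quick ways to check the first points from the shell; the config path and stats socket location are assumptions about your setup:
# Look for the balancing algorithm and any stickiness directives in the (assumed) config path;
# 'balance source' or a stick-table on src would pin each client IP to one server,
# while 'balance roundrobin' should spread requests across both web servers.
grep -nE 'balance|cookie|stick' /etc/haproxy/haproxy.cfg
# If the admin/stats socket is enabled (hypothetical path), per-server session counters
# confirm whether traffic is actually reaching both servers.
echo "show stat" | socat stdio unix-connect:/var/run/haproxy.sock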
In your JMeter script, the HTTP Request "Use KeepAlive" option is checked by default.
Keep-Alive is a header that maintains a persistent connection between client and server, preventing the connection from breaking intermittently. Also known as HTTP keep-alive, it can be described as a method of reusing the same TCP connection for HTTP communication instead of opening a new connection for each request.
This may cause all requests to go to the same server. Just uncheck the option, save, stop your script and re-run.