AWS - NLB Performance Issue - api

I am using a Network Load Balancer in front of a private VPC for API Gateway. Basically, for the APIs in the gateway, the endpoint is the Network Load Balancer's DNS name.
The issue is that performance is poor (more than 5 seconds). If I use the IP address of the EC2 instance instead of the NLB DNS name, the response is very good (less than 100 ms).
Can somebody point out what the issue is? Did I make a configuration mistake while creating the NLB?
I have been researching this for the past 2 days and couldn't find any solution.
Appreciate your response.

I had a similar issue that was due to failing health checks. When all health checks fail, the targets are tried randomly (typically a target in each AZ); however, at that stage I had only configured an EC2 instance in one of the AZs. The solution was to fix the health checks: they require the security group on the EC2 instances to allow the entire VPC CIDR range (or at least the port the health checks use).
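For reference, here is a rough boto3 sketch of adding that rule; the region, security group ID, VPC CIDR, and health check port below are placeholders, not values from the original setup:

```python
# Hypothetical values -- replace with your own security group ID,
# VPC CIDR block, and the port your NLB health checks target.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",           # SG attached to the EC2 targets
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 80,                   # health check port
            "ToPort": 80,
            "IpRanges": [
                {
                    "CidrIp": "10.0.0.0/16",  # entire VPC CIDR range
                    "Description": "Allow NLB health checks from within the VPC",
                }
            ],
        }
    ],
)
```

Once the targets report healthy in the target group, requests via the NLB DNS name should stop being retried across AZs with no registered target.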

Related

Kubernetes application authentication

Maybe this is a dumb question, but I really don't know if I have to secure applications with tokens etc. within a Kubernetes cluster.
So, for example, I make a gRPC call from a client within the cluster to a server within the cluster.
I thought this should be secure without authenticating the client with a token or something like that, because (if I understood it right) Kubernetes pods and services work within a private network which won't be exposed as long as it's not told to.
But is this really secure, or should I somehow build an authorization system within my cluster?
Also, how can I use a Service to load balance the gRPC calls over the server pods without exposing the server outside the cluster?
If you have a Service, it already has a built-in load balancer out of the box when you have more than one replica.
Also, Kubernetes traffic is internal to the cluster out of the box, unless you explicitly expose it using a LoadBalancer, Ingress, or NodePort.
Does it mean traffic is safe? No.
By default, everything is allowed within a Kubernetes cluster, so every service can reach every other service or pod (including pods in StatefulSet apps).
You can use NetworkPolicy to allow traffic from one service to another service and nothing else. That would increase security.
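As a rough illustration (the namespace, labels, and gRPC port below are assumptions, not from the question), a NetworkPolicy like this, created here with the official Python kubernetes client, would only let pods labelled app=client reach the server pods on the gRPC port:

```python
# A minimal sketch, assuming namespace "default", labels app=server / app=client,
# and gRPC on port 50051 -- adjust to your own cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="allow-client-to-server"),
    spec=client.V1NetworkPolicySpec(
        # The policy applies to the server pods...
        pod_selector=client.V1LabelSelector(match_labels={"app": "server"}),
        policy_types=["Ingress"],
        ingress=[
            client.V1NetworkPolicyIngressRule(
                # ...and only allows ingress from the client pods on the gRPC port.
                _from=[client.V1NetworkPolicyPeer(
                    pod_selector=client.V1LabelSelector(match_labels={"app": "client"})
                )],
                ports=[client.V1NetworkPolicyPort(protocol="TCP", port=50051)],
            )
        ],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="default", body=policy
)
```

Note that NetworkPolicy only takes effect if your cluster runs a network plugin that enforces it (Calico, Cilium, etc.).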
Does it mean traffic is safe now? It depends.
Authentication would add an additional security layer in case a container is compromised. There could be more scenarios, but I can't think of any right now.
So internal authentication is usually used to improve security in production systems.
I hope it answers the question.

Traefik, Metallb portforwarding

I'm having problems port forwarding Traefik. I have a deployment in Rancher, where I'm using MetalLB with Traefik to have SSL certs applied to my services. All of this is working locally, and I'm not seeing any error messages in the Traefik logs. The odd thing is that at times I am able to reach my service from outside my network, but other times not.
I have port forwarded 80, 433, and 8080 to 192.168.87.135.
What am I doing wrong? Are there some ports I'm missing?
[Picture of Traefik logs]
[Picture of the exposed Traefik load balancer]
IPv4 specifies private IP address ranges that are not reachable from the internet because:
"The Internet has grown beyond anyone's expectations. Sustained exponential growth continues to introduce new challenges. One challenge is a concern within the community that globally unique address space will be exhausted."
(source: RFC 1918, Address Allocation for Private Internets)
IP addresses from these private IP ranges are not accessible from the internet. Your IP address 192.168.87.235 is part of the private IP address range 192.168.0.0/16 (the old class C private range), hence it is by nature not reachable from the internet.
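You can check this yourself with Python's standard ipaddress module (using the address from your setup):

```python
# Quick check that the address falls in an RFC 1918 private range.
import ipaddress

addr = ipaddress.ip_address("192.168.87.235")
print(addr.is_private)                                 # True
print(addr in ipaddress.ip_network("192.168.0.0/16"))  # True
```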
Furthermore you state yourself that it is working correctly within your local network.
A follow-up question to this is: how can I access my network if it's a private network?
To access your local network you need a gateway that has both an internal and a public IP, so that you can reach your network through the public IP. One solution could be to have a DNS name that's mapped to the public IP and internally routed to the internal load balancer IP 192.168.87.235 by a reverse proxy.
Unfortunately I can't tell you why it is working occasionally, because that would require far more knowledge about your local network. But I guess it could, for example, be that you are connected to your local network with a VPN, or that you already have a reverse proxy that is just not online all the time.
Edit after watching your video:
Your cluster is still reachable from the internet at the end of the video. You get the message "Service unavailable", which is in fact returned by Traefik every time you try to access an unhealthy application. Your problem is that the demo application is not starting up after you restart the VM. So what you need to do next is check why the demo app is not starting. This includes checking the logs and events of the failing pod.
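If it helps, here is a hedged sketch using the Python kubernetes client to pull both; the namespace and pod name are placeholders for your failing demo app:

```python
# Placeholder namespace/pod name -- substitute the failing demo app's values.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

namespace, pod = "default", "demo-app-xxxxx"

# Equivalent of `kubectl logs` for the pod.
print(v1.read_namespaced_pod_log(name=pod, namespace=namespace, tail_lines=50))

# Equivalent of `kubectl get events` filtered to that pod.
events = v1.list_namespaced_event(
    namespace=namespace,
    field_selector=f"involvedObject.name={pod}",
)
for e in events.items:
    print(e.last_timestamp, e.reason, e.message)
```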
Another topic I'd like to touch on is Traefik and what it actually does. First, calling Traefik only a reverse proxy, while not false, is not the entire truth. Traefik in a Kubernetes environment is an ingress controller. That means it is a reverse proxy configured by Kubernetes resources, namely by the "Ingress" object or the "IngressRoute" object. The latter is a custom resource introduced by Traefik itself (read here for further information) because it adds advanced options for configuring Traefik.
The reason I tell you this is that you actually have two ingress controllers installed in your cluster, "Traefik" and "nginx-ingress-controller", and you only need one.

Action Required: S3 shutting down legacy application server capacity

I got an email from Amazon Web Services regarding S3, stating the details below:
"We are writing to you today to let you know about changes which impact your use of the Amazon Simple Storage Service (S3). In efforts to best serve our customers, we have improved the systems powering the Amazon S3 API and are in the process of shutting down legacy application server capacity. We have detected access on the legacy capacity for Amazon S3 buckets that you own. The legacy capacity is no longer in service, as the DNS entry for the S3 endpoint no longer includes the IP addresses associated with it. We will be shutting down the legacy capacity and retiring the set of IP addresses fronting this capacity after April 1, 2020."
I want to find out which legacy system I am using, and how to prevent this from affecting my services.
Imagine you had a web site, www.example.com.
In DNS, that name was pointed to your web server at 203.0.113.100.
You decide to buy a new web server, and you give it a new IP address, let's say 203.0.113.222.
You update the DNS for example.com to point to 203.0.113.222. Within seconds, traffic starts arriving at the new server. Over the coming minutes, more and more traffic arrives at the new server, and less and less arrives at the old server.
Yet, for some strange reason, a few of your site's prior visitors are still hitting that old server. You check the DNS and it's correct. Days go by, then weeks, and somehow a few visitors who used your old server before the cutover are still hitting it.
How is that possible?
That's the gist of the communication here from AWS. They see your traffic arriving on unexpected S3 server IP addresses, for no reason that they can explain.
You're trying to connect to the right endpoint -- that's not the issue -- the problem is that for some reason you have somehow "cached" (using the term in a very imprecise sense) an old DNS lookup and are accessing a bucket by hitting a wrong, old S3 IP address.
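One quick way to see what your host currently resolves for a bucket endpoint (the bucket name below is a placeholder) is a lookup with Python's standard library:

```python
# Resolve the S3 endpoint the way your application would and list the IPs.
# "my-bucket" is a placeholder; a regional endpoint works the same way.
import socket

host = "my-bucket.s3.amazonaws.com"
ips = sorted({info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)})
print(ips)
# Compare this with what your resolver returns directly; a stale or pinned
# address here points at cached DNS, /etc/hosts entries, or a long-lived client.
```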
If you have a Java backend service accessing S3, those can be notorious for holding on to DNS lookups forever. You might need to restart that service, and look into how to resolve that issue and enable correct behavior, which is -- as I understand it -- not how Java behaves by default. (Not claiming to be a Java expert, but I've encountered this sort of DNS behavior many times.)
If you have an HAProxy or Nginx server that's front-ending for an S3 bucket and has been up for a while, those might need a restart, and you should look into how to configure them correctly so they don't resolve DNS only at startup. I ran into exactly this issue once, years ago, except my HAProxy was forwarding requests to Amazon CloudFront on only one of the several IP addresses it could have been using. They took that CloudFront edge server offline, or it failed, or whatever, and the DNS was updated... but my proxy was not able to re-query DNS, so it just kept trying and failing until I restarted it. Then I fixed it so that it periodically repeated the DNS lookup and always had a current address.
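As a rough sketch of that "periodically repeat the DNS lookup" idea (not the exact fix I used; the hostname and interval are placeholders):

```python
# A minimal sketch of periodic DNS re-resolution, assuming a 60-second refresh
# interval (roughly in line with typical AWS endpoint TTLs).
import socket
import time

def watch_dns(host, interval=60):
    """Re-resolve `host` every `interval` seconds and report when the IPs change."""
    previous = set()
    while True:
        current = {info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)}
        if current != previous:
            print(host, "now resolves to", sorted(current))
            previous = current
        time.sleep(interval)

# In a real proxy or client this would run in a background thread so connections
# can be re-established against the fresh addresses.
watch_dns("example-bucket.s3.amazonaws.com")
```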
If you have your own DNS resolver servers, you might want to verify that they aren't somehow misbehaving, and you might want to ensure that you don't for some reason have any static host entries in /etc/hosts (or equivalent) for anything related to S3.
There could be any number of causes but I'm confident at least in my interpretation of what they say is happening.

AWS Route 53 Redirect to Status Page

First question, so if I get this wrong somehow be kind.
We are using Route 53 with Amazon and have our primary front end servers behind an ELB. Our app also routes all requests through HTTPS. We are utilizing an offsite status page via statuspage.io.
What I am trying to accomplish: if the primary site goes down, I'd like to have Route 53 redirect both the SSL and non-SSL traffic to our status page.
I originally tried setting up a static page in S3 but still had issues with HTTPS requests made to our site.
Has anyone done this successfully? I imagine it has to be possible, but it's definitely outside my realm of expertise.
Thank you very much for your time and help.
You are right, S3 website hosting doesn't support HTTPS. However, CloudFront does [1]. What you can do is fail over to CloudFront and have your origin be your S3 website or your statuspage.io page.
Steps:
1. Create a distribution and set the CNAMEs to match your DNS entries.
2. Upload and associate your SSL cert with your distribution.
3. Update the failover target to be your CloudFront distribution and set it as an alias.
[1] http://aws.amazon.com/about-aws/whats-new/2014/03/05/amazon-cloudront-announces-sni-custom-ssl/
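For step 3, a hedged boto3 sketch of what the secondary (failover) alias record could look like; the hosted zone ID, record name, and CloudFront domain are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone ID used for CloudFront aliases:

```python
# Placeholder IDs/names -- substitute your hosted zone, record name,
# and CloudFront distribution domain.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z111111QQQQQQQ",  # your hosted zone
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "www.example.com.",
                    "Type": "A",
                    "SetIdentifier": "secondary-statuspage",
                    "Failover": "SECONDARY",  # served only when the primary fails its health check
                    "AliasTarget": {
                        "HostedZoneId": "Z2FDTNDATAQYW2",  # fixed zone ID for CloudFront aliases
                        "DNSName": "d1234abcd.cloudfront.net.",
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    },
)
```

The matching PRIMARY record (pointing at your ELB) needs an associated Route 53 health check for the failover to trigger.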
Route 53 manages the DNS, which is not what you want to rely on here (even if you changed the DNS, it would take the TTL to propagate). What you should do is use a combination of auto-scaling policies and health checks. These health checks will be performed by the ELB every 30 seconds, and if two consecutive checks fail it will mark the instance as out of service and stop directing traffic to it (the ELB directs traffic to your instances in a round-robin manner).
Having more than one instance and using auto-scaling rules is the key: it will enable AWS to terminate the unhealthy instance and spin up a new instance instead (in the same ASG, with the same AMI, etc.).
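For a classic ELB, a hedged boto3 sketch of configuring such a health check (the load balancer name and target path are placeholders):

```python
# Placeholder load balancer name and health check target -- adjust to your setup.
import boto3

elb = boto3.client("elb")  # classic ELB API

elb.configure_health_check(
    LoadBalancerName="my-frontend-elb",
    HealthCheck={
        "Target": "HTTPS:443/healthcheck",  # protocol:port/path the ELB probes
        "Interval": 30,                     # check every 30 seconds
        "Timeout": 5,
        "UnhealthyThreshold": 2,            # two consecutive failures -> out of service
        "HealthyThreshold": 2,
    },
)
```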

Are AWS ELB IP addresses unique to an ELB?

Does anyone know how AWS ELB with SSL works behind the scenes? Running an nslookup on my ELB's domain name, I get 4 unique IP addresses. If my ELB is SSL-enabled, is it possible for AWS to share these same IPs with other SSL-enabled ELBs (not necessarily owned by me)?
As I understand it, the hostname in a web request is inside the encrypted payload of an HTTPS request. If this is the case, does AWS have to give each SSL-enabled ELB unique IP addresses that are never shared with anyone else's SSL ELB instance? Put another way -- does AWS give 4 unique IP addresses for every SSL ELB you've requested?
Does anyone know how AWS ELB with SSL works behind the scenes? [...] Put another way -- does AWS give 4 unique IP addresses for every SSL ELB you've requested?
Elastic Load Balancing (ELB) employs a scalable architecture in itself, meaning the number of unique IP addresses assigned to your ELB does in fact vary depending on the capacity needs and respective scaling activities of your ELB; see the section Scaling Elastic Load Balancers within Best Practices in Evaluating Elastic Load Balancing (which provides a pretty detailed explanation of the Architecture of the Elastic Load Balancing Service and How It Works):
The controller will also monitor the load balancers and manage the capacity [...]. It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, [...]. By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered [...]. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones. [emphasis mine]
This is further detailed in section DNS Resolution, including an important tip for load testing an ELB setup:
When Elastic Load Balancing scales, it updates the DNS record with the new list of IP addresses. [...] It is critical that you factor this changing DNS record into your tests. If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses. [emphasis mine]
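As a hedged illustration of that advice, a load-test client could re-resolve the ELB's DNS name on every request (the hostname below is a placeholder) so newly allocated load balancer IPs actually get used:

```python
# Placeholder ELB hostname -- the point is simply to re-resolve on each request
# instead of pinning one IP for the whole test.
import random
import socket
import urllib.request

ELB_HOST = "my-elb-1234567890.us-east-1.elb.amazonaws.com"

def resolve_ips(host):
    """Fresh DNS lookup; ELB returns multiple, randomly ordered A records."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)})

def one_request():
    ip = random.choice(resolve_ips(ELB_HOST))  # spread load across the current IPs
    req = urllib.request.Request(f"http://{ip}/", headers={"Host": ELB_HOST})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

for _ in range(100):
    print(one_request())
```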
The entire topic is explored in much more detail within Shlomo Swidler's excellent analysis The “Elastic” in “Elastic Load Balancing”: ELB Elasticity and How to Test it, which meanwhile refers to the aforementioned Best Practices in Evaluating Elastic Load Balancing by AWS as well, basically confirming his analysis but lacking the illustrative step-by-step samples Shlomo provides.