How to use envoy to load balance between multiple endpoints? - load-balancing

I currently have this clusters config, aware that it is invalid since address must be a host rather than a url. I'm trying to load balance between two API providers that have the same API structure, however different ways of authenticating: one using basic http auth and the other with an api key embedded in the path. Is there a way to do this in a single Envoy config? If not, does this mean I have to have envoys running for each api endpoint to use prefix_rewrite?
- name: cluster_1
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: cluster_1
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: https://username:somepass#acoolapi.com/
port_value: 80
- endpoint:
address:
socket_address:
address: https://anotherapi.com/apikey/
port_value: 80
```

Related

A few questions about authentication and authorization with Kong jwt in microservices architecture

As a newbie in microservices architecture, I need to ask a few questions about implementing JWT authentication using Kong.
The architecture of my application looks like in the picture below:
So far I have only used Kong as a proxy and load balancer. The Authentication Service was responsible for creating the token. The token was created during registration and logging in. During registration or logging in, the authentication service asked the user service, and the user service checked the user's data in the mongodb database. Each endpoint from the other services had to receive a JWT in the header and had a function along with a secret which decoded the token. However, it seems to me that this is an unnecessary duplication of code and the whole process of creating and decoding JWT may or even should be done in Kong with JWT plugin.
I tried to follow a couple of tutorials and YouTube guides just like this one:
JWT Kong Gateway
Unfortunately each of the tutorials shows how to create a JWT only for a single consumer, without Kong being connected to the base.
My kong.yml file:
_format_version: "3.0"
_transform: true
services:
- name: building_service
url: http://building_service/building
routes:
- name: building_service_route
paths:
- /building
- name: user_service
url: http://user_service/user
routes:
- name: user_service_route
paths:
- /user
- name: role_service
url: http://role_service/role
routes:
- name: role_service_route
paths:
- /role
- name: task_service
url: http://task_service/task
routes:
- name: task_service_route
paths:
- /task
- name: authorization_service
url: http://authorization_service/authorization
routes:
- name: authorization_service_route
paths:
- /authorization
plugins:
- name: jwt
route: building_service_route
enabled: true
config:
key_claim_name: kid
claims_to_verify:
- exp
# consumers:
# - username: login_server_issuer
# jwt_secrets:
# - consumer: login_server_issuer
# secret: "secret-hash-brown-bear-market-rate-limit"
- name: bot-detection
- name: rate-limiting
config:
minute: 60
policy: local
Kongo service in docker-compose.yml:
services:
kong:
build: ./App/kong
volumes:
- ./App/kong/kong.yml:/usr/local/kong/declarative/kong.yml
container_name: kong
environment:
KONG_DATABASE: 'off'
KONG_PROXY_ACCESS_LOG: '/dev/stdout'
KONG_ADMIN_ACCESS_LOG: '/dev/stdout'
KONG_PROXY_ERROR_LOG: '/dev/stderr'
KONG_ADMIN_ERROR_LOG: '/dev/stderr'
KONG_ADMIN_LISTEN: "0.0.0.0:8001, 0.0.0.0:8444 ssl"
KONG_DECLARATIVE_CONFIG: "/usr/local/kong/declarative/kong.yml"
command: "kong start"
networks:
- api-network
ports:
- "8000:8000"
- "8443:8443"
- "127.0.0.1:8001:8001"
- "127.0.0.1:8444:8444"
List of my questions:
How authentication service should connect to Kong and create JWT with chosen user (as I understand consumer) data?
Should Kong be somehow connected to database to get required user data and create secret?
How to decode JWT with kong and transfer it to other services in header?
Can anyone provide an example of how to achieve desired result?
Do I misunderstood something about JWT or Kong and what I want to achieve is impossible?
If you can consider using Keycloak for user management, then you can have a look at the jwt-keycloak plugin:
https://github.com/gbbirkisson/kong-plugin-jwt-keycloak

OpenShift: Pod often does not return expected requests

I'm running a .dotnet 3.1 RestAPI inside a pod in Openshift, and it process every request smoothly - all transactions to the database (outside the Openshift network) are executed properly, all programatic executions are being finalized without errors. However, 1 in 15 requests will always ECONNRESET and fail to return the HTTP request.
Let's say I make a GET to /users/id/3 - I can see this request hitting my restAPI, being processed all the way down the infra layer, fetch data from the DB, wrap the return, and send it back finishing the request, however at this point, no return is received by the frontend, or postman, but I can see on the API logs that the request was finished and returned.
All of this requests take 2.3min to execute, and often finish in a ECONNRESET. I'm at odds at how to troubleshoot this. I have tried curl'ing the resource in another pod and the same behaviour appears.
I think these requests sometimes are getting lost in the cluster network, so I tried playing with the sessionaffinity of the service config but it's not really tied to this, as far as I understood. Do I have a wrong route config, or service config?
Route config
spec:
host: api.com.cloud
to:
kind: Service
name: api
weight: 100
port:
targetPort: 8080-tcp
tls:
termination: edge
wildcardPolicy: None
status:
ingress:
- host: api.com.cloud
routerName: default
conditions:
- type: Admitted
status: 'True'
lastTransitionTime: XXXX
wildcardPolicy: None
routerCanonicalHostname: router-default.apps.com.cloud
Service config
spec:
clusterIP: XXXX
ipFamilies:
- IPv4
ports:
- name: 8080-tcp
protocol: TCP
port: 8080
targetPort: 8080
internalTrafficPolicy: Cluster
clusterIPs:
- XXX
type: ClusterIP
ipFamilyPolicy: SingleStack
sessionAffinity: None
selector:
deploymentconfig: api
status:
loadBalancer: {}

Connection refused when using load balancing with Traefik 2

I have yml configuration file with a router and a service. Every time I get a 404 error. I know the URL works and I can access the server from Traefik server. What am I missing? Also, for some reason the request reroutes to https. Perhaps a conflicting rule?
Also note, Traefik runs in docker, but the connecting server does not. The goal here is to add multiple nodes to the load balancer.
http:
routers:
demo_1-rtr:
rule: "Host(`http://demo.lab.local`)"
service: demo_1
entryPoints:
- http
services:
demo_1:
loadBalancer:
servers:
- url: "http://172.16.9.90:16000"
Traefik Config:
global:
checkNewVersion: true
sendAnonymousUsage: true
api:
insecure: true
providers:
docker:
endpoint: "unix://var/run/docker.sock"
exposedByDefault: false
file:
directory: /rules
watch: true
log:
level: DEBUG
accessLog: {}
entryPoints:
http:
address: ":80"
I suspect it would be this
--api.insecure=true global argument and it should work.
So in your case add the following in traefik.toml
[api]
insecure = true
Otherwise I would need more information to debug more.

Let's encrypt SSL with traefick on ECS Fargate

I've been trying to solve this for days, but without any luck:
Situation:
I have a ECS cluster on AWS using Fargate, this cluster contains an instance of Traefick 2.3.4 and other containers. I'm using Traefick as reverse proxy to forward the requests to the other containers.
Using HTTP everything works fine, so I've decided to add also the secure connection to Traefick. I've tried everything that I could find on the Internet but nothing works, when I try to connect to the specified domain with curl it returns:
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
Here there are some test that I've done:
traefick.yml:
log:
level: DEBUG
api:
dashboard: true
entryPoints:
web:
address: :80
http:
redirections:
entryPoint:
to: websecure
scheme: https
websecure:
address: ":443"
providers:
ecs:
clusters:
- tools-cluster
region: eu-west-2
exposedByDefault: false
certificatesResolvers:
letsencrypt:
acme:
caServer: https://acme-staging-v02.api.letsencrypt.org/directory
email: #########################
storage: acme.json
httpchallenge:
entrypoint: web
Labels:
"dockerLabels": {
"traefik.enable": "true",
"traefik.http.services.traefik.loadbalancer.server.port": "8080",
"traefik.http.routers.traefik.rule": "Host(`${host}`)",
"traefik.http.routers.traefik.entrypoints": "websecure",
"traefik.http.routers.traefik.tls.certresolver": "letsencrypt",
"traefik.http.routers.traefik.service": "api#internal"
}
this version returns this error:
rror: 400 :: urn:ietf:params:acme:error:connection :: Fetching https://traefik.baaluu.com/.well-known/acme-challenge/td8IdOvJ1_GkigY-jPYaA4YsgeiS5FUiuUS-avbpsuY: Error getting validation data, url
It tries to retrieve that data but it can't because it is redirected to the https and it can't retrieve because https doesn't work, I've tried also without the auto redirect, and it returns a similar error, it can't retrieve that data.
But following this guide it should work correctly.
So I've decided to move to the dnsChallenge with this configuration:
Traefick.yml
log:
level: DEBUG
api:
dashboard: true
entryPoints:
web:
address: :80
websecure:
address: ":443"
providers:
ecs:
clusters:
- tools-cluster
region: eu-west-2
exposedByDefault: false
certificatesResolvers:
letsencrypt:
acme:
caServer: https://acme-staging-v02.api.letsencrypt.org/directory
email: ######################
storage: acme.json
dnsChallenge:
provider: route53
delayBeforeCheck: 3
and same labels as before:
"dockerLabels": {
"traefik.enable": "true",
"traefik.http.services.traefik.loadbalancer.server.port": "8080",
"traefik.http.routers.traefik.rule": "Host(`${host}`)",
"traefik.http.routers.traefik.entrypoints": "websecure",
"traefik.http.routers.traefik.tls.certresolver": "letsencrypt",
"traefik.http.routers.traefik.service": "api#internal"
}
Still nothing, and I've this inside the logs:AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/170242259"
That url contains:
{
"type": "urn:ietf:params:acme:error:malformed",
"detail": "Method not allowed",
"status": 405
}
The latest test that I did is to remove the staging ca server:
log:
level: DEBUG
api:
dashboard: true
entryPoints:
web:
address: :80
websecure:
address: :443
providers:
ecs:
clusters:
- tools-cluster
region: eu-west-2
exposedByDefault: false
certificatesResolvers:
letsencrypt:
acme:
email: ###############
storage: acme.json
dnsChallenge:
provider: route53
delayBeforeCheck: 2
The ssl still doesn't work but I don't see any error message inside the logs: this is the last message that I get about a certificate:
Try to challenge certificate for domain [traefik.baaluu.com] found in HostSNI rule" providerName=letsencrypt.acme routerName=traefik#ecs rule="Host(`traefik.baaluu.com`)"
And there is not much more after that:
(I'm sorry for the picture but I don't find a way to extract that logs from ECS)
The other containers are still reachable on the http protocol.
If I try to connect to it using telnet I can reach the service:
telnet traefik.baaluu.com 443
Trying 3.8.30.164...
Connected to traefik-1547500306.eu-west-2.elb.amazonaws.com.
Escape character is '^]'.
Same goes for the 80
Looking better inside the logs I've also find this
retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/chall-v3/9205340157/1Wh0tQ :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"0004cbkFTGjCALFGDYOmhruMl6_F_fRSj33cOMvdpx5Xd2M\", url: "
time="2020-12-10T13:08:21Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/chall-v3/9205340157/1Wh0tQ :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"0004cbkFTGjCALFGDYOmhruMl6_F_fRSj33cOMvdpx5Xd2M\", url: "
that contains this url: https://acme-v02.api.letsencrypt.org/acme/chall-v3/9205340157/1Wh0tQ
{
"type": "dns-01",
"status": "valid",
"url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/9205340157/1Wh0tQ",
"token": "44R4gD4_ZmemiCn5rtkqJyWOcjoj09sEgobUvZLH6yc",
"validationRecord": [
{
"hostname": "traefik.baaluu.com"
}
]
}
So I suppose that the ssl has been generated correctly but I'm not sure.
Any idea or suggestion?
Thanks in advance.
H2K
Edit:
I've removed the ssl from the dashboard and I've put it on another container, now entering inside the dashboard I can see this:
So I suppose that the ssl is working for that domain, but I still can't connect to it.
Edit 2:
with telnet if I connect to that url on the port 443 and I request the page I can see the content:
telnet xxxxxxxxxxxxxxxxx 443
Trying 3.10.148.201...
Connected to traefik-1547500306.eu-west-2.elb.amazonaws.com.
Escape character is '^]'.
GET /index.html HTTP/1.1
Host: xxxxxxxxxxxxxxxxx
And the content of the page appears, so it is not a load balacer problem or routing problem, it seems that I can reach the container using the 443, simply the ssl is not there. It is like to have 2 http port and both are behaving in the same way. The 443 at the moment is like a port 80.
I've have also spent a number of days trying to work it out so i feel your pain.
The error is misleading, the request doesn't even make it past the ALB let alone traefik.
There are two factors to this issue,
The first being that when you specify a port 443 through docker compose as "443:443" you would assume that this creates a HTTPS listener, it actually creates a listener for 443 on the HTTP protocol. In addition the listener also sent the data to the fargate HTTP port and didn't redirect. I'm not sure if this is a bug, or because because i haven't specified that the protocol should be "x-aws-protocol: https" on the target port.
I also found some AWS documentation that said if you use a HTTPS port on a ALB that you need an SSL certificate in place at a ALB level. This kind of makes sense that you can't terminate the connection at a task level if you consider the swarm nature and security implications (better minds are welcome to explain)
With the above in mind i created a certificate in the ACM that covered all the the domains that i needed, changed the listener to the HTTPS protocol and specified the certificate i created. At this point i was able to configure traefik to accept traefik to the frontend.

Express Gateway degraded causing huge delays on requests

I have a Gateway configured to handle some services, but there is a huge performance issue and I don't know how can I address this. This is the config for this particular service:
http:
port: 3020
hostname: 'xxxx.prod.xx.xxx.net'
apiEndpoints:
laPrdTest:
host: 'xxx-xxxxxxxx.xxxxxxxx.com'
paths: '/api/v1/esign/*'
serviceEndpoints:
laPrdTest:
urls:
- http://xxxxxx-app1.prd.xxx.xxxxxxxx.net
- http://xxxxxx-app2.prd.xxx.xxxxxxxx.net
policies:
- basic-auth
- cors
- expression
- log
- jwt
- proxy
- rate-limit
pipelines:
laPrdTest:
apiEndpoints:
- laPrdTest
policies:
- cors:
- log:
- action:
message: ${req.method} ${req.originalUrl} ${JSON.stringify(req.headers)}
- rate-limit:
- action:
max: 50
windowMs: 120000
rateLimitBy: "${req.ip}"
- proxy:
- action:
serviceEndpoint: laPrdTest
changeOrigin: true
Check on this gif so you can see how this is very bad on performance:
And this is a call directly to the service API with no performance issue:
Why is this happening?
What can I do to solve this issue?
Finally I couldn't use this anymore an move to do the proxy through Nginx directly.
Any request made through the Gateway it took seconds almost minutes and now using Nginx as a gateway it takes miliseconds.